AI and the Age of Creative Superhumans
AI is rocking the creative world. Some things to consider.
Welcome to People vs Algorithms #47.
I look for patterns in media, business and culture. My POV is informed by 30 years of leadership in media and advertising businesses, most recently as global President of Hearst Magazines, one of the largest publishers in the world.
Brian Morrissey and I talk about Meta’s next chapter, AI and more on this week’s podcast. Listen here.
This story was not written by AI. It probably could have been. It certainly will be in the future. It was written by me.
But what it means to create something, what makes a thing authentic and who gets credit for it is about to get much harder to decipher.
Earlier this year I wrote a note on what I saw as our progression toward "full composability": a state of total creative fluidity, where anything you can dream up can be made, driven by advances in compute and accessible content creation tools. I saw the present as accelerating a long history of innovation in our ability to make media, a history whose early chapters I witnessed at the start of my career with the wide adoption of the PC and desktop publishing software.
When I wrote it I did not fully appreciate AI’s disruptive potential. The past 12 months have seen a Cambrian explosion of innovation, extending the building blocks of large language models (LLMs) like GPT-3 to all forms of media creation. The most profound and disruptive result, particularly for its ability to capture the popular imagination, has been the emergence of "text-to-image": the ability to render wholly new creative works from simple text prompts.
Software innovations led by companies like Adobe used to push the creative community forward in slow, predictable cycles.
Innovation is now distributed and breakneck. The dry brush of a hyper-networked world, broad technical literacy, the open source movement and on-demand computing, all create conditions for the rapid spread of new technological ideas and capabilities. Generative AI is a wildfire, capturing the attention of non-technical users and thousands of developers looking to profit from what seems to have been a fundamental technological shift.
And, in a twist on long-held conceptions about technology’s ability to displace workers, it turns out it’s not the blue-collar class being challenged, but the creative knowledge workers we imagined would be the last to be affected.
These moments don't come along very often. They happen when streams of technological innovation converge on a single use case, usually one with outsized social or economic ramifications. It happened fifteen years ago with the introduction of the iPhone, when convergent innovation in microprocessors, 3G networks, GPS, battery tech, cameras and touchscreens combined to set off a technical and cultural revolution that has touched every part of our lives.
This time the change was born of advances in large language models; breakthrough thinking in how semantic concepts map to image data; systems trained on enormous datasets of images and related metadata, made possible by decades of knowledge digitization; and steady advances in raw computing power, in particular the adoption of powerful GPUs (graphics processing units). All of it fed by enormous investment from companies like Google, Facebook and Microsoft, dripping in surplus profits from oligopolistic ad and software businesses.
More recently, open source models, specifically Stability AI’s Stable Diffusion, have unlocked the energy and imagination of the development community at a staggering pace. DALL-E, from OpenAI, was the first material release of text-to-image technology, in January of 2021. As you can see from many of the links below, the amount of innovation in under two years is staggering.
If you are sick of hearing about AI, this is only the beginning. Everything on the frontier is X plus AI: truly the age of the tech-powered superhuman. This is the next chapter of technical disruption after the internet and the mobile phone. It's worth paying attention.
Below are a few early observations and examples that will help you understand the landscape, starting with a useful summary of text-to-image from the thoughtful people at Vox:
Text-to-image becomes text to anything
AI has evolved from language to images, and it's moving quickly to video, 3D and music. Text-to-image becomes text-to-anything: "create a melody inspired by Fleetwood Mac's 'Dreams' but incorporating a dubstep bass line."
Meta builds on text-to-image with text-to-video: Make-A-Video.
Google follows suit with Imagen Video.
Deforum Stable Diffusion shows how we can easily extend images to animation.
Phenaki is a new model that is making text-to-video more efficient. Early demos show a path from text to rendered video narrative.
AI podcasting. Rogan interviews Steve Jobs.
Researchers at Google, with DreamFusion, imagine text-to-3D.
The art of the prompt
Prompts are text queries that tell an AI what to make. Good prompts are creative expressions informed by an understanding of how the underlying models work. You can get good at them in the same way a librarian gets good at finding the perfect book.
Members of the Hugging Face AI community offer a service to help you craft creative prompts.
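To make the idea concrete, a well-crafted prompt is often just a subject plus stacked stylistic cues. Here is a minimal sketch of assembling one programmatically; the function name, fields and example values are all illustrative, not any real tool's API:

```python
def build_prompt(subject, style=None, modifiers=None):
    """Assemble a text-to-image prompt: subject first, then style cues."""
    parts = [subject]
    if style:
        parts.append(f"in the style of {style}")
    if modifiers:
        parts.extend(modifiers)  # extra cues like medium, lighting, detail
    return ", ".join(parts)

prompt = build_prompt(
    "a lighthouse on a cliff at dusk",
    style="Edward Hopper",
    modifiers=["oil painting", "dramatic lighting", "highly detailed"],
)
print(prompt)
# a lighthouse on a cliff at dusk, in the style of Edward Hopper, oil painting, dramatic lighting, highly detailed
```

The string that comes out is what you would paste into a text-to-image tool; getting good at prompting is largely getting good at choosing those parts.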
AI personalization touches you in subtle ways today, like when Google automatically completes a search query based on location data. AI will drive a much more intimate future in which software agents deeply personalize information retrieval, organization and synthesis.
Imagine asking for a list of the best articles on AI and marketing automation. An AI-powered bot will retrieve and organize a list of content worthy of your time based on a deep understanding of your specific interests, level of expertise, research history, etc. It will author a useful summary.
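The retrieval half of that idea can be sketched in a few lines: score each article by how well it covers a reader's interest profile, then rank. This is a deliberately naive toy with invented data, not any real product's method:

```python
def score(article_tags, interests):
    """Fraction of the reader's interests an article covers."""
    hits = set(article_tags) & set(interests)
    return len(hits) / len(interests) if interests else 0.0

# Hypothetical articles tagged with topics.
articles = [
    ("Getting started with marketing automation", ["marketing", "automation"]),
    ("A history of the printing press", ["media", "history"]),
    ("AI for ad targeting", ["ai", "marketing"]),
]
interests = ["ai", "marketing", "automation"]  # the reader's profile

# Rank articles by interest overlap, best first.
ranked = sorted(articles, key=lambda a: score(a[1], interests), reverse=True)
for title, tags in ranked:
    print(f"{score(tags, interests):.2f}  {title}")
```

A real agent would replace the tag overlap with semantic embeddings and fold in expertise level and reading history, but the shape of the problem (profile in, ranked reading list out) is the same.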
Everything we see will have potential for deep personalization, but now at a visual level. Vacation merchandising will feature your family at a beach resort or on a ski hill. Movie or game characters will be replaced by anyone you want.
Or say you want to reimagine a room. Upload a photo of any environment, and AI helps you imagine it myriad ways.
Remix culture comes to all media
We are now accustomed to the remix as a foundational part of modern music. What if we could do the same with any media type? And what if anyone could do it? AI will enable versioning of entire creative works with new characters, new backdrops, new storylines. Reimagine The Godfather as warring clans of cat people? Coming up. Naturally, the technology will unleash massive new complexities in copyright and rights management.
Here’s a fun little project… making movie posters from films that do not exist.
Marketing and merchandising disruption
Programmatic advertising automated much of the media buying process, or at least drove out a lot of its inefficiencies. AI will write the chapter in which media targeting and optimization decisions disappear behind the algorithm. Google's Performance Max shows that future today, taking away much of the complexity of media planning, buying and optimization.
Creative has been slower to automate. AI will change all of that. Dynamic funnels will automatically create visual marketing narratives based on your personal profile. The laborious work of digital merchandising will be fundamentally changed: a pair of shoes can be instantly rendered from any angle, on any foot, in any setting, all from a model trained on a handful of images.
Cracks in Google's search monopoly
Related: search queries are just prompts. We are accustomed to Google returning curated results against a query. Imagine a future where the query does a lot more, from hyper-personalized results to AI-generated images and multimedia content. It has been hard to imagine a disruptive competitor to Google for a long time. This is an opening.
Superhuman creators are tool experts
Our son is a blossoming musician and producer. My consistent advice to him: master the process and tools of making. The next generation of creators wins because they know how to turn inspiration into content quickly. In the future, any creative person becomes an orchestrator of tools that enable unimaginably sophisticated media creation anywhere.
It changes coding too. AI already makes you a faster programmer: GitHub Copilot uses OpenAI's Codex model to suggest the next line of code in real time. The future of programming looks less and less like understanding the intricacies of programming languages.
Perhaps the easiest way to see how quickly things are evolving today: check what Runway.ai is about to do to video editing.
Diffusion of diffusers
AI is not monolithic. Taking open source models and augmenting them with new datasets yields new creative outcomes. Midjourney, for example, has a particular visual vibe.
A fun art project called An Improbable Future shows industrial and transportation design, using AI-generated images in one specific retro-futuristic aesthetic.
Disco Diffusion (aka DD) is a CLIP-guided diffusion model that can generate amazing images from text prompts, and it is particularly impressive at generating abstract art. Here's a primer. Here's a Reddit forum showing tons of examples. Here's a comparison between Stable and Disco Diffusion.
Longer prompts… here's how we are beginning to extend image to story.
Playground AI provides a showcase of all kinds of generative images.
All of this will force a fundamental refactoring of the role of the knowledge worker, of how we create media, and of how we price and value creativity, and it will certainly force a complex examination of intellectual property rights.
I suspect through it all we will come to value what is uniquely human ever more. Like the track below from Beabadoobee.
Have a great weekend…/ Troy