Pictures produced by AI went viral, stirred controversy, and drove investment.

What happened: A new generation of text-to-image generators inspired a flood of experimentation, transforming text descriptions into mesmerizing artworks and photorealistic fantasies. Commercial enterprises were quick to press the technology into service, making image generation a must-have feature in software for creating and editing graphics.

Driving the story: Models that generate media became the public face of AI thanks to friendly user interfaces, highly entertaining output, and open APIs and models.

  • OpenAI introduced DALL·E 2 in April. More than 1.5 million users beta-tested the model, and in September, the company made it widely available. Microsoft, which funds OpenAI in exchange for exclusive commercial rights to its work, integrated the model into its Azure AI-as-a-service platform.
  • By July, push-button artists were flooding social media with relatively crude images produced by the simpler Craiyon.
  • Stability AI soon upped the ante with Stable Diffusion, an open source model updated to version 2.0 in November. The model's popularity helped the company attract more than $100 million in fresh capital.
  • Adobe and stock-photo kingpins Getty Images and Shutterstock integrated image-generation models into their own products and services.
  • Such programs produce radically different results depending on the text prompt they’re given, as the sketch after this list illustrates. PromptBase opened a marketplace for text strings that generate interesting output.
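To make the point concrete, here is a minimal sketch of prompt-driven generation using the open source Stable Diffusion 2.0 weights via Hugging Face’s diffusers library. The library, model ID, and prompts are our own illustrative choices, not details from the story; the same pipeline returns strikingly different images for different prompts.

```python
# Minimal sketch: one open source model, two prompts, two very different images.
# Assumes the diffusers and torch packages and a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",  # Stable Diffusion 2.0 weights
    torch_dtype=torch.float16,
).to("cuda")

prompts = [
    "a watercolor painting of a fox in a misty forest",
    "a photorealistic portrait of a fox, studio lighting",
]
for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]  # the pipeline returns PIL images
    image.save(f"fox_{i}.png")
```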

Yes, but: Such models are trained on images scraped from the web. Like large language models, they inherit biases embedded in online content and can mimic inflammatory styles of expression found there.

  • Lensa AI, a photo-editing app that generates artistic avatars from users’ selfies, reached the top of mobile app store charts. Its success came with a dose of controversy as users, particularly women, found that the app sexualized their images.
  • ArtStation, an online community for visual artists, launched its own text-to-image features. Many artists, feeling threatened by programs that can reproduce a hard-won personal style in seconds, boycotted the site.

Behind the news: Diffusion models generate output by starting with noise and removing it selectively over a series of steps. Introduced in 2015 by researchers at UC Berkeley and Stanford, they remained in the background for several years until further work showed that they could produce images competitive with the output of generative adversarial networks (GANs). Stability AI put a diffusion model at the heart of Stable Diffusion. OpenAI, which based the initial version of DALL·E on an autoregressive transformer, switched to a diffusion model for DALL·E 2.
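A toy version of that denoising loop, in the common DDPM formulation (Ho et al., 2020), shows the idea: begin with pure Gaussian noise and subtract predicted noise step by step. Every name and constant below is illustrative, not drawn from any system in this story, and the dummy predictor stands in for a trained network.

```python
# Sketch of DDPM sampling: start from noise, denoise over T steps.
import torch

def sample(eps_model, shape, T=50):
    # Linear noise schedule; alpha_bar is the cumulative signal fraction.
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)  # step T: pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t)  # model predicts the noise present at step t
        # Remove the predicted noise (scaled) to step toward a clean image.
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            # Re-inject a little noise on all but the final step.
            x = x + torch.sqrt(betas[t]) * torch.randn(shape)
    return x

# Stand-in for a trained denoiser (a real one would be a U-Net).
dummy_model = lambda x, t: torch.zeros_like(x)
image = sample(dummy_model, shape=(1, 3, 64, 64))
print(image.shape)  # torch.Size([1, 3, 64, 64])
```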

Where things stand: The coming year is shaping up to bring a revolution in computer-aided creativity. And the groundswell of generated imagery isn’t going to stop at pictures. Google and Meta released impressive text-to-video models this year, and OpenAI accelerated text-to-3D-object generation by an order of magnitude.
