Text-to-Image Generation and the Path to Truly Open AI

Left: Panda drawn by Andrew Ng in 10 minutes | Right: Panda generated by Stable Diffusion in seconds

Dear friends,

Stable Diffusion, an image generation model that takes a text prompt and produces an image, was released a few weeks ago in a landmark event for AI. While similar programs like DALL·E and Craiyon can be used via API calls or a web user interface, Stable Diffusion can be freely downloaded and run on the user’s hardware.
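If you'd like to try it yourself, here's a minimal sketch of running the model locally. It assumes the Hugging Face diffusers library and the publicly released v1-4 weights; your setup may differ, and smaller GPUs may need additional memory-saving options.

```python
# A minimal sketch of running Stable Diffusion on your own hardware
# via the Hugging Face diffusers library. The model ID and options
# below are assumptions; adjust them to your environment.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
).to("cuda")

# Turn a text prompt into an image.
image = pipe("a watercolor painting of a panda riding a bicycle").images[0]
image.save("panda.png")
```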

I'm excited by the artwork produced by such programs (developer Simon Willison posted a fun tweetstorm that highlights some of the creativity they’ve unleashed), and also by the ways in which other developers are incorporating Stable Diffusion into their own drawing tools. Ironically, Stable Diffusion’s manner of release moves us closer to “open AI” than the release of DALL·E by the company named OpenAI. Kudos to Emad Mostaque and the Stability AI team that developed the program.

If you want to learn about how diffusion models like Stable Diffusion work, you can find a concise description here.
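The core idea, as a toy sketch rather than any model's actual training code: a diffusion model learns to undo gradual noising. During training, noise is added to an image and a network learns to predict that noise; generation then applies the learned denoiser step by step, starting from pure noise. Something like this (all names and the simple linear schedule are illustrative):

```python
# Toy sketch of the core diffusion training step: add noise to an
# image, then train a network to predict that noise. Real systems
# use tuned noise schedules, timestep embeddings, and a U-Net.
import torch
import torch.nn.functional as F

def diffusion_training_step(model, images, num_steps=1000):
    # Sample a random timestep per image and a matching noise level
    # (a plain linear schedule stands in for real schedules).
    t = torch.randint(0, num_steps, (images.shape[0],))
    alpha = (1.0 - t.float() / num_steps).view(-1, 1, 1, 1)  # signal fraction

    noise = torch.randn_like(images)
    noisy = alpha.sqrt() * images + (1 - alpha).sqrt() * noise

    # The model predicts the injected noise from the noisy image and
    # the timestep; mean squared error on the noise is the loss.
    predicted_noise = model(noisy, t)
    return F.mse_loss(predicted_noise, noise)
```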

Image generation is still maturing, but it’s a big deal. Many people have the creativity to produce art but lack the drawing skill to do so. As an amateur illustrator (I like to draw pandas in the Procreate paint app to entertain my daughter), I find that my meager skill limits what I can create. But sitting in front of the DALL·E or Stable Diffusion user interface, I can ask her what she wants to see a panda doing and render a picture for her.

Artists with greater skill than mine can use image generators to create stunning artworks more efficiently. In fact, an image produced this way recently won an art competition at the Colorado State Fair.

The rise of inexpensive smartphone cameras brought an explosion in photography, and while expensive DSLRs still have a role, they now produce a minuscule fraction of all pictures taken. I expect AI-powered image generators to do something similar in art: Ever-improving models and user interfaces will make it much more efficient to generate art using AI than without. I see a future where most art is generated using AI, and novices who have great creativity but little drawing skill will be able to participate.

My friend and collaborator Curt Langlotz, addressing the question of whether AI will replace radiologists, said that radiologists who use AI will replace radiologists who don’t. The same will be true here: Artists who use AI will (largely) replace artists who don’t. Imagine the transition in the 1800s from the time when each artist had to source their own minerals to mix shades of paint to when they could purchase ready-mixed paint in a tube. This development made it easier for any artist to paint whatever and whenever they wished. I see a similar transition ahead. What an exciting time!

Separately from generating images for human consumption, these algorithms have great potential to generate images for machine consumption. A number of companies have been developing image generation techniques to produce training images for computer vision algorithms. But because of the difficulty of generating realistic images, many have focused on vertical applications that are sufficiently valuable to justify their investment, such as generating road scenes to train self-driving cars or portraits of diverse faces to train face recognition algorithms.
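As a hypothetical illustration of the idea (the prompts, labels, and pipeline usage here are my own sketch, not any particular company's system), one could generate a labeled synthetic dataset like this:

```python
# Hypothetical sketch: generating labeled synthetic images to augment
# a computer vision training set. Prompts and class names are
# illustrative; real pipelines add filtering and quality checks.
from pathlib import Path
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

classes = ["stop sign", "yield sign", "speed limit sign"]
out_dir = Path("synthetic_data")

for label in classes:
    (out_dir / label).mkdir(parents=True, exist_ok=True)
    for i in range(100):  # 100 images per class, stored by label
        prompt = f"a photo of a {label} on a suburban road, overcast day"
        pipe(prompt).images[0].save(out_dir / label / f"{i:03d}.png")
```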

Will image generation algorithms reduce the cost of data generation and other machine-to-machine processes? I believe so. It will be interesting to see this space evolve.

Keep learning!

Andrew

