ChatGPT’s image generator is available via API.
What’s new: GPT Image 1, which produces images from text or other images, has proven enormously popular among ChatGPT users. The OpenAI Images API enables developers to incorporate OpenAI’s most sophisticated image generator into their own software tools and platforms.
- Input/output: Text and images in, images out
- Architecture: Autoregressive (details undisclosed)
- Performance: Currently tops Artificial Analysis’ Image Arena leaderboard.
- Price: $5 per 1 million tokens of text input, $10 per 1 million tokens of image input, $40 per 1 million tokens of image output (roughly $0.02, $0.07, and $0.19 per generated image for low, medium, and high-quality square images, respectively)
- Undisclosed: Architecture details, parameter count, training data, training methods
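To make the token-based prices concrete, here is a minimal sketch of a cost estimate. It uses only the per-million-token rates quoted above. The function name `estimate_cost` and the token counts in the usage lines are illustrative assumptions, not part of OpenAI's API or documentation.

```python
# USD per 1 million tokens, as quoted in the price list above.
PRICE_PER_MILLION = {"text_in": 5.0, "image_in": 10.0, "image_out": 40.0}


def estimate_cost(text_in=0, image_in=0, image_out=0):
    """Return the estimated cost in USD for the given token counts.

    Hypothetical helper: the argument names are our own labels for the
    three token types that are billed separately.
    """
    counts = {"text_in": text_in, "image_in": image_in, "image_out": image_out}
    total = sum(counts[kind] * PRICE_PER_MILLION[kind] for kind in counts)
    return total / 1_000_000


# Illustrative token counts (assumed, not measured): a short text prompt
# plus an output image that consumes a few thousand image tokens.
print(f"${estimate_cost(text_in=50, image_out=4750):.4f}")
```

At $40 per 1 million output tokens, the quoted figure of roughly $0.19 for a high-quality square image implies an output on the order of a few thousand image tokens. That number is back-calculated from the rounded price rather than taken from published token counts.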
How it works: GPT Image 1 generates images in a wide range of styles, edits existing images, renders text, and follows detailed instructions. Shortly after its debut, the version of GPT-4o equipped with GPT Image 1 rose to the No. 1 spot on the Artificial Analysis Image Arena leaderboard.
- The model employs an autoregressive design, using already-generated parts of an image to predict the next part, rather than the more typical diffusion architecture used by models such as OpenAI’s DALL·E 3.
- Its pricing structure differs from that of rivals: It charges by input and output tokens rather than per generated image.
- The model’s output is watermarked unobtrusively with C2PA data that identifies it as AI-generated.
- The model may struggle with non-English text, small or rotated type, varying colors and styles, counting, and spatial localization such as the positions of pieces on a game board.
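For developers, a request to the Images API might look like the following minimal sketch. It assumes the official `openai` Python package, whose `client.images.generate` call accepts `model`, `prompt`, `quality`, and `size` and returns base64-encoded image data for this model. The wrapper name `generate_image`, the prompt, and the output filename are hypothetical.

```python
import base64


def generate_image(client, prompt, quality="low", size="1024x1024"):
    """Request one image and return its raw bytes.

    `client` is expected to be an `openai.OpenAI()` instance. This wrapper
    is a hypothetical convenience function, not part of the library.
    """
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        quality=quality,  # "low", "medium", or "high"
        size=size,        # square output, matching the prices quoted above
    )
    # The response carries the image as a base64 string; decode it to bytes.
    return base64.b64decode(result.data[0].b64_json)


# Usage (requires the `openai` package and an OPENAI_API_KEY environment variable):
#
#   from openai import OpenAI
#   image_bytes = generate_image(OpenAI(), "A watercolor of a lighthouse at dawn")
#   with open("lighthouse.png", "wb") as f:
#       f.write(image_bytes)
```

Passing the client in as an argument keeps the function easy to test with a stand-in object and keeps credentials out of the helper itself.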
Behind the news: In March, OpenAI attracted huge public interest when it deployed the model, then unnamed, in ChatGPT. Within the first week, 130 million users created more than 700 million images with it.
Why it matters: Adding GPT Image 1 to the API enables developers to use OpenAI’s most sophisticated image generator in a wide variety of automated workflows. OpenAI’s initial API partners include design companies (Adobe and Canva), marketers (HubSpot), and web designers (GoDaddy), all of which are using GPT Image 1.
We’re thinking: GPT Image 1 is part of an exciting trend toward unification of multimodal architectures. Researchers have progressed from text-in, text-out to text/images-in, text-out and increasingly text/images/audio-in, text/images/audio-out. This paints a beautiful picture of where multimodal models can go!