The competitive landscape of large language models (LLMs) is evolving quickly. The ultimate winners are yet to be determined, but the current dynamics are already exciting. Let me share a few observations, focusing on direct-to-consumer chat interfaces and the LLM infrastructure and application layers.
First, ChatGPT is a new category of product. It’s not just a better search engine, auto-complete, or something else we already knew. It overlaps with other categories, but people also use it for entirely different purposes such as writing and brainstorming. Companies like Google and Microsoft that are integrating LLMs into existing products may find that the complexity of switching not only technologies but also product categories raises unique challenges.
OpenAI is clearly in the lead in offering this new product category, and ChatGPT is a compelling direct-to-consumer product. While competitors are emerging, OpenAI’s recent move to have ChatGPT support third-party plugins, if widely adopted, could make its business much more defensible, much like the app stores for iOS and Android helped make those platforms very defensible businesses.
Second, the LLM infrastructure layer, which enables developers to interact with LLMs via an API, looks extremely competitive. OpenAI/Microsoft leads in this area as well, but Google and Amazon have announced their own offerings, and players such as Hugging Face, Meta, Stability AI, and many academic institutions are busy training and releasing open source models. It remains to be seen how many applications will need the power of the largest models, such as GPT-4, versus smaller (and cheaper) models offered by cloud providers or even hosted locally, like gpt4all, which runs on a desktop.
Finally, the application layer, in which teams build on top of LLMs, looks less competitive and full of creativity. While many teams are piling onto “obvious” ideas (say, building question-answering bots or summarizers on top of online content), the sheer diversity of potential LLM-powered applications leaves many ideas relatively unexplored in verticals including specialized coaching and robotic process automation. AI Fund, the venture studio I lead, is working with entrepreneurs to build applications like these. Competition feels less intense when you can identify a meaningful use case and go deep to solve it.
LLMs are a general-purpose technology that’s making many new applications possible. Consider a lesson from an earlier era of tech: After the iPhone came out, I paid $1.99 for an app that turned my phone into a flashlight. It was a good idea, but that business didn’t last: The app was easy for others to replicate and sell for less, and eventually Apple integrated a flashlight into iOS. In contrast, other entrepreneurs built highly valuable and hard-to-build businesses such as Airbnb, Snapchat, Tinder, and Uber, and those apps are still with us. We may already have seen this phenomenon in generative AI: Lensa grew rapidly through last December, but its revenue run appears to have collapsed.
Today, in a weekend hackathon, you can build a shallow app that does impressive things by taking advantage of powerful APIs. But over the long term, what excites me are the valuable solutions to hard problems that LLMs make possible. Who will build generative AI’s lasting successes? Maybe you!
One challenge is that the know-how for building LLM products is still evolving. While academic studies are important, current research offers a limited view of how to use LLMs. As the InstructGPT paper says, “Public NLP datasets are not reflective of how our language models are used. . . . [They] are designed to capture tasks that are easy to evaluate with automatic metrics.”
In light of this, community is more important than ever. Talking to friends who are working on LLM products often teaches me non-intuitive tricks for improving how I use them. I will continue trying to help others wherever I can.
P.S. On Tuesday, April 25, 2023, I’ll share early ideas on Visual Prompting in a livestream on behalf of my team Landing AI. LLMs let users enter a text prompt and quickly get a text output, which has transformed natural language processing. I’m excited about taking these ideas from text to computer vision so we can let users enter a visual prompt (labeling a few pixels) and quickly get a visual output. You can sign up for the livestream here.