An Expanding Universe of Large Language Models From ChatGPT to the open source GPT4All, the bounty of large language models means opportunities for users and developers alike.

Letters

Technical Insights

Published

Nov 29, 2023

Reading time

3 min read

Dear friends,

One year since the launch of ChatGPT on November 30, 2022, it’s amazing how many large language models are available.

A year ago, ChatGPT was pretty much the only game in town for consumers (using a web user interface) who wanted to use a large language model (LLM), and a handful of models from OpenAI were the only options for developers (making API calls). Today, numerous open and closed source models are within easy reach. ChatGPT is the most popular way for consumers to chat with an LLM, but others abound, including Microsoft Bing, Google Bard, and offerings from startups such as Anthropic Claude, Inflection Pi, and perplexity.ai. There are also multiple options for developers, including APIs from Amazon Web Services, Azure, Cohere, Google Cloud, Hugging Face, OpenAI, and many others. The proliferation of options is exciting, and I hope it will continue!

For both consumer and developer use cases, open source models that you can host yourself, or even run locally on your laptop, are getting surprisingly good. For many applications, a good open source model can perform perhaps on par with the ChatGPT-3.5 of a year ago. The open source GPT4All and MLC, and closed source LM Studio (which has a very nice user interface) are making it easier than ever to run models locally. Running models locally used to be an esoteric act restricted to developers who were willing to struggle through complex installation and configuration processes, but it’s now becoming much more widely accessible.

I regularly use a chatbot as a thought partner. These days, I find myself using an LLM running on my laptop (which runs fairly quickly and guarantees privacy, since my data stays on my machine) about as often as a cloud-hosted one. I use a cloud-hosted model when I need a level of performance I can't get from a smaller, locally run, open source one. For instance, I often use GPT-4 for tricky problems and creative brainstorming.

While safety is important — we don't want LLMs to casually hand out harmful instructions — I find that offerings from most of the large providers have been tuned to be "safer" than I would like for some use cases. For example, sometimes a model has refused to answer basic questions about an activity that harms the environment, even though I was just trying to understand that activity and had no intention of committing that harm. There are now open source alternatives that are less aggressively safety-tuned that I can use responsibly for particular applications.

The wealth of alternatives is also a boon to developers. An emerging design pattern is to quickly build a prototype or initial product that delivers good performance by prompting an LLM, perhaps an expensive one like GPT-4. Later, if you need cheaper inference or better performance for a particular, narrowly scoped task, you can fine-tune one of the huge number of open source LLMs to your task. (Some developers reportedly are using data generated by GPT-4 for their own fine-tuning, although it’s not clear whether this violates its terms of use.)

In a year, we've gone from having essentially one viable option to having at least dozens. The explosion of options brings with it the cost of choosing a good one; hopefully our short courses on generative AI can help with that. If you have experience with open source LLMs that you’d like to share, or if you’ve found some models more useful than others in particular applications or situations, please let me know on social media!

Keep learning!

Andrew

P.S. Our new short course on advanced retrieval augmented generation (RAG) techniques is out! Taught by Jerry Liu and Anupam Datta of Llama Index and TruEra, it teaches retrieval techniques such as sentence-window retrieval and auto-merging retrieval (which organizes your document into a hierarchical tree structure to let you pick the most relevant chunks). The course also teaches a methodology to evaluate the key steps of RAG separately (using context relevance, answer relevance, and groundedness) to analyze errors and improve performance. Please check out this course!

Subscribe to The Batch