The Unexpected Power of Large Language Models
Training on massive amounts of text partly offsets lack of exposure to other data types.


Dear friends,

Recent successes with large language models have brought to the surface a long-running debate within the AI community: What kinds of information do learning algorithms need in order to gain intelligence?

The vast majority of human experience is not based on language. The taste of food, the beauty of a sunrise, the touch of a loved one: such experiences are independent of language. But large language models have shown that it's possible to capture a surprisingly rich facsimile of human experiences by consuming far more language than any human can in a lifetime.

Prior to recent advances in large language models, much of the AI community had viewed text as a very limited source of information for developing general-purpose intelligence. After all, animals evolved intelligence without language. Intelligence includes perceiving the world through sight, sound, and other senses; knowing how to move our bodies; having a common-sense understanding of physics, such as how to knock a fruit off a high tree; and being able to plan simple actions to find food, shelter, or a mate. Writing is a relatively recent invention that dates back only around 5,500 years. Spoken language arose roughly 100,000 years ago. In contrast, mammals have existed for about 200 million years.

If AI development were to follow the path of evolution, we would start by trying to build insect-level intelligence, then mouse-level intelligence, perhaps followed by dog-level, monkey-level, and finally human-level. We would focus on tasks like vision and psychomotor skills long before the ability to use language.

But models like ChatGPT show that language, when accessed at massive scale, overcomes many of its limitations as a source of information. Large language models can learn from more words (several orders of magnitude more!) than any individual human can.

  • In a typical year, a child might hear around 10 million words (with huge variance depending on factors such as the family). So, by age 10, the child might have heard 100 million words.
  • If you read 24/7 for a year at a rate of 250 words per minute, you’d read about 130 million words annually.
  • GPT-3 was trained on about 500 billion words.
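To make the "several orders of magnitude" claim concrete, here is a small back-of-the-envelope sketch in Python. It simply reuses the rough figures from the list above (10 million words heard per year, 250 words per minute of nonstop reading, about 500 billion words for GPT-3) and assumes a 365-day year; the helper name `orders_of_magnitude` is hypothetical, introduced only for illustration.

```python
import math

# Rough figures taken from the list above (order-of-magnitude estimates only).
CHILD_WORDS_PER_YEAR = 10_000_000       # ~10 million words heard per year
READING_WPM = 250                       # words per minute, reading nonstop
GPT3_TRAINING_WORDS = 500_000_000_000   # ~500 billion words

# Assumption: a 365-day year, ignoring leap years.
MINUTES_PER_YEAR = 60 * 24 * 365        # 525,600 minutes

child_words_by_10 = CHILD_WORDS_PER_YEAR * 10
reading_words_per_year = READING_WPM * MINUTES_PER_YEAR


def orders_of_magnitude(big: float, small: float) -> float:
    """Return the base-10 logarithm of the ratio big / small."""
    return math.log10(big / small)


if __name__ == "__main__":
    print(f"Child by age 10:       {child_words_by_10:,} words")
    print(f"One year of reading:   {reading_words_per_year:,} words")
    print(f"GPT-3 vs. child at 10: "
          f"{orders_of_magnitude(GPT3_TRAINING_WORDS, child_words_by_10):.1f} orders of magnitude")
    print(f"GPT-3 vs. a year of reading: "
          f"{orders_of_magnitude(GPT3_TRAINING_WORDS, reading_words_per_year):.1f} orders of magnitude")
```

With these inputs, a year of nonstop reading comes to 131.4 million words, and GPT-3's training text exceeds both the child's 100 million words and that year of reading by between three and four orders of magnitude.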

An individual human would need dozens of lifetimes spent doing nothing but reading to see the number of words that GPT-3 considered during its training. But the web aggregates text written for or by billions of individuals, and computers have ready access to much of it. Through this data, large language models (LLMs) capture a wealth of knowledge about the human experience. Even though an LLM has never seen a sunrise, it has read enough text about sunrises to describe persuasively what one looks like.
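The "dozens of lifetimes" figure can be checked with the same reading rate. The sketch below assumes a 365-day year and, as a loudly labeled assumption not stated in the letter, an 80-year lifetime spent reading around the clock; `years_of_nonstop_reading` is a hypothetical helper for illustration.

```python
# Reading rate and corpus size from the letter; lifetime length is an assumption.
READING_WPM = 250                       # words per minute, reading 24/7
MINUTES_PER_YEAR = 60 * 24 * 365        # assumes a 365-day year
GPT3_TRAINING_WORDS = 500_000_000_000   # ~500 billion words
ASSUMED_LIFETIME_YEARS = 80             # assumption: not from the letter


def years_of_nonstop_reading(total_words: int, wpm: int = READING_WPM) -> float:
    """Years needed to read total_words at wpm, with no breaks at all."""
    words_per_year = wpm * MINUTES_PER_YEAR
    return total_words / words_per_year


years = years_of_nonstop_reading(GPT3_TRAINING_WORDS)
lifetimes = years / ASSUMED_LIFETIME_YEARS

if __name__ == "__main__":
    print(f"{years:,.0f} years of reading, or about "
          f"{lifetimes:.0f} lifetimes of {ASSUMED_LIFETIME_YEARS} years")
```

Under these assumptions the total is roughly 3,800 years, or a little under 50 lifetimes, which is consistent with "dozens."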

So, even though language is a small part of human experience, LLMs are able to learn a huge amount of information about the world. It goes to show that there are multiple paths to building intelligence, and that the path followed by evolution or human children may not be the most efficient way for an engineered system.

Seeing the entire world only through the lens of text, as rich as it turns out to be and as valuable as systems trained on text have become, still ultimately yields an impoverished world compared to the one we live in. But relying on text alone has already taken us quite far, and I expect this direction to lead to exciting progress for years to come.

Keep learning!
