Large Language Models Shrink

Gopher and RETRO prove lean language models can push boundaries.

Published Dec 15, 2021

DeepMind released three papers that push the boundaries of large language models and examine the issues they raise.

What’s new: The UK-based subsidiary of Alphabet, Google’s parent company, unveiled a pair of transformer models that take different approaches to achieving state-of-the-art performance in a variety of language tasks. The company also pinpointed risks that are likely to intensify as such models continue to improve.

How it works: The company detailed its findings in three papers.

Gopher is based on OpenAI's GPT-2. The 280-billion-parameter model was trained on a 10.5-terabyte corpus called MassiveText, which comprises news, books, Wikipedia articles, and other web pages. Tested on 152 tasks, including the BIG-bench and MMLU benchmarks, it set a new state of the art on 80 percent of them.
Retrieval-Enhanced Transformer (RETRO) achieved results similar to Gopher's with only 7.5 billion parameters. It makes up for its smaller size by retrieving passages from MassiveText and integrating them through what DeepMind calls chunked cross-attention, which relates the input to the retrieved data.
A third paper offers a taxonomy of 21 social and ethical risks that such models pose. For instance, they could inadvertently perpetuate stereotypes and toxic language, spread harmful misinformation, disclose sensitive information, and create an undue environmental burden from energy use. The paper lists strategies to alleviate such risks, including developing better datasets and building more transparent models.
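To make RETRO's chunked cross-attention more concrete, here is a minimal Python sketch built on simplifying assumptions of our own. The input is split into fixed-size chunks, each chunk has a set of retrieved neighbor vectors, and each token attends to the neighbors of the most recent chunk that ended at or before its position, so it never uses passages retrieved with text it has not yet seen. The names `attend` and `chunked_cross_attention` are hypothetical. The published model is considerably more involved (for example, it encodes retrieved passages with their own transformer encoder and uses learned, multi-head projections), and none of that is reproduced here.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(weights)
    return [
        sum(w * value[d] for w, value in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]


def chunked_cross_attention(
    hidden: List[Vector], neighbors: List[List[Vector]], chunk_size: int
) -> List[Vector]:
    """Simplified sketch (assumption): each token attends to the passages
    retrieved for the latest chunk that has completed at its position.

    hidden: one vector per input token; length must be a multiple of chunk_size.
    neighbors[k]: vectors for the passage tokens retrieved for chunk k.
    """
    if len(hidden) % chunk_size != 0:
        raise ValueError("input length must be a multiple of chunk_size")
    if len(neighbors) != len(hidden) // chunk_size:
        raise ValueError("need one neighbor set per chunk")
    out: List[Vector] = []
    for pos, h in enumerate(hidden):
        if pos < chunk_size - 1:
            # No chunk has finished yet, so there is nothing retrieved to attend to.
            out.append(list(h))
            continue
        # Index of the most recent chunk whose last token is at or before pos.
        k = (pos - (chunk_size - 1)) // chunk_size
        # Keys and values are the raw neighbor vectors (no learned projections here).
        update = attend(h, neighbors[k], neighbors[k])
        out.append([a + b for a, b in zip(h, update)])  # residual connection
    return out


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
neighbors = [[[1.0, 0.0]], [[0.0, 2.0]]]  # one retrieved vector per chunk
print(chunked_cross_attention(hidden, neighbors, chunk_size=2))
# [[1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0]]
```

With a single retrieved vector per chunk, the softmax weight is 1, so each update equals that vector. Tokens 1 and 2 both draw on the first chunk's neighbor, and only the final token sees the second chunk's neighbor.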

Behind the news: Gopher and RETRO run counter to the trend toward ever-larger language models. Meanwhile, RETRO's querying strategy extends recent research into connecting language models with external sources of knowledge.

Considering its performance, Gopher's 280-billion-parameter count is modest compared with those of Microsoft and Nvidia's Megatron-Turing NLG (530 billion) and the Beijing Academy of Artificial Intelligence's WuDao 2.0 (1.75 trillion).
RETRO’s ability to gather external information is similar to that of Facebook’s RAG and Google’s REALM. An additional benefit: The database can be updated, giving the model access to newer or more accurate information without retraining.
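The benefit of an updatable database can be illustrated with a toy retrieval store. This sketch assumes a bag-of-words cosine similarity and exact search in place of the frozen BERT embeddings and approximate nearest-neighbor search RETRO actually uses; correcting a fact means replacing a stored passage, while the model that would consume the retrieved text is untouched. `embed`, `cosine`, and `RetrievalStore` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a hypothetical stand-in for a frozen neural encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class RetrievalStore:
    """Hypothetical in-memory passage store searched by exact cosine similarity."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Counter, str]] = []

    def add(self, passage: str) -> None:
        # Indexing a passage only stores its vector; no model weights change.
        self._entries.append((embed(passage), passage))

    def remove(self, passage: str) -> None:
        self._entries = [e for e in self._entries if e[1] != passage]

    def nearest(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [passage for _, passage in ranked[:k]]


store = RetrievalStore()
store.add("the office opens at 9 am")
store.add("the cafeteria serves lunch at noon")
print(store.nearest("when does the office open"))  # ['the office opens at 9 am']

# Swap in corrected information without retraining anything.
store.remove("the office opens at 9 am")
store.add("the office now opens at 8 am")
print(store.nearest("when does the office open"))  # ['the office now opens at 8 am']
```

Scanning every entry is fine for a handful of passages; a database on the scale of MassiveText would call for an approximate index instead.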

Why it matters: Language models have made great strides in recent years, but much work remains to make them reliable and compact enough for a wide variety of applications. With this triad of papers, DeepMind attacks that challenge from several directions at once.

We’re thinking: The idea that machine learning models don’t need to learn everything but can query external sources during inference could be a key to building more efficient systems.
