

Right-Sizing Models for the Dataset: Finding the Best Data-To-Parameter Ratio for NLP Models

The standard route to improving transformer-based language models like GPT-3 and Gopher, which are trained on immense quantities of text scraped from the web, has been to increase their parameter counts. But research shows that, given a fixed compute budget, bigger doesn't necessarily mean better.
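The tradeoff can be sketched numerically. A minimal sketch, assuming the widely cited approximation that training compute is roughly C ≈ 6·N·D (N parameters, D training tokens) and that the compute-optimal ratio is on the order of 20 tokens per parameter; the function name and exact constants are illustrative, not the papers' exact fit:

```python
def optimal_size(compute_flops, tokens_per_param=20.0):
    """Return (parameters, training tokens) for a given FLOP budget.

    Assumes C = 6 * N * D and D = tokens_per_param * N, so
    N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly the training budget reported for a ~70B-parameter model:
params, tokens = optimal_size(5.76e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

Under these assumptions, a larger model trained on too few tokens wastes compute that would be better spent on more data at a smaller size.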

Yale Song: Foundation models for vision

Large models pretrained on immense quantities of text have proven to be strong foundations for solving specialized language tasks. My biggest hope for AI in 2022 is...

Large Language Models Shrink: Gopher and RETRO prove lean language models can push boundaries.

DeepMind released three papers that push the boundaries — and examine the issues — of large language models.
