Itching to get your hands on a fully trained large language model? The wait is over.
What’s new: Meta introduced the OPT family of transformer-based language models with nearly unfettered access to source code and trained weights. The family’s eight models range in size from 125 million to 175 billion parameters.
How it works: The OPT architecture is similar to that of OpenAI’s GPT-3. The models were trained on publicly available datasets that include novels, news articles, Reddit posts, and a subset of The Pile.
- The 175 billion parameter version, OPT-175B, is designed to approximate GPT-3. It has the same number of parameters, performs with comparable accuracy, and shows a similar propensity to generate worrisome output. It’s available for non-commercial use to researchers affiliated with organizations in academia, industry, government, and civil society, but not to military researchers or those who work with biometric or surveillance data. Access is granted on request.
- The smaller versions — 125 million, 350 million, 1.3 billion, 2.7 billion, 6.7 billion, 13 billion, and 30 billion parameters — are freely available to anyone. Meta hopes this will encourage researchers to study the effects of varying scale.
- The release includes a log that documents successes, failures, bugs, and breakthroughs the team encountered while training OPT-175B over three months.
Behind the news: OPT-175B is the largest and most ambitious open-source language model to date, but it’s not the first.
- Last year, Google published the code library for the 1.6 trillion parameter Switch Transformer. It didn’t provide access to the trained weights.
- In February, the machine learning collective EleutherAI released its trained 20 billion-parameter GPT-NeoX. The group is also responsible for The Pile, an 825-gigabyte compilation of 22 text datasets.
- Hugging Face’s BigScience project aims to release a trained 200 billion-parameter language model. So far, it has open-sourced the 11 billion-parameter T0 series.
Yes, but: A parameter count of 175 billion is mouthwatering, but it takes a lot of horsepower to drive a model that large. As Maarten Sap of the Allen Institute for Artificial Intelligence told IEEE Spectrum, “[I’d] love to use OPT-175B,” but “few research labs actually have the infrastructure to run this model.”
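To put rough numbers on that horsepower, here is a back-of-the-envelope sketch of the memory needed just to hold each model's weights. It rests on assumptions that are not from Meta's release: 2 bytes per parameter (16-bit floats) and a hypothetical 80 GB accelerator. It ignores activations, caches, and optimizer state, so real requirements would be higher.

```python
import math

# Assumptions for illustration only (not taken from the OPT release):
BYTES_PER_PARAM = 2    # weights stored as 16-bit floats
GPU_MEMORY_GB = 80     # hypothetical accelerator with 80 GB of memory

# Parameter counts of the eight OPT models named in the article.
OPT_SIZES = {
    "125M": 125_000_000,
    "350M": 350_000_000,
    "1.3B": 1_300_000_000,
    "2.7B": 2_700_000_000,
    "6.7B": 6_700_000_000,
    "13B": 13_000_000_000,
    "30B": 30_000_000_000,
    "175B": 175_000_000_000,
}


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Gigabytes (10**9 bytes) needed to store the weights alone."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params, gpu_memory_gb=GPU_MEMORY_GB):
    """Lower bound on accelerators needed just to fit the weights."""
    return math.ceil(weight_memory_gb(num_params) / gpu_memory_gb)


if __name__ == "__main__":
    for name, count in OPT_SIZES.items():
        print(f"OPT-{name:>4}: {weight_memory_gb(count):7.2f} GB, "
              f"at least {min_gpus(count)} GPU(s)")
```

Under these assumptions, the weights of the largest model alone come to 350 GB, more than four such accelerators can hold, while every model up to 30 billion parameters fits on one.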
Why it matters: For researchers (well, for anyone interested in language modeling, really) the opportunity is obvious. OPT comes pretrained, ready to be used, fine-tuned, dissected, or adapted for any purpose the AI community dreams up. No more APIs! No more paywalls! It’s your party, so indulge yourself. For Meta, open-sourcing these models may have several benefits. Giving away OPT is a community-minded gesture at a time when the company has been under fire for proliferating hatred, misinformation, and disinformation on a grand scale. It’s a bid to attract talent and could help break in young engineers to the company’s coding practices. And it’s a shot at OpenAI, the former nonprofit, open-source shop, which was criticized for keeping GPT-3’s code under wraps.
We’re thinking: The OPT-175B training log offers a rare look at a large-scale machine learning project. While the mass media may imagine bespectacled programmers in airy, well-lit rooms debating the nature of intelligence, technology development is often messy as researchers struggle to visualize what an algorithm is doing or trace the source of a GPU crash. Worth a look!