Cutting the Carbon Cost of Training

A New Tool Helps NLP Models Lower Their Gas Emissions

Published Jul 20, 2022

You can reduce your model’s carbon emissions by being choosy about when and where you train it.

What’s new: Researchers at the Allen Institute for AI, Hugging Face, Microsoft, the University of Washington, Carnegie Mellon University, and the Hebrew University of Jerusalem developed a tool that measures atmospheric carbon emitted by cloud servers while training machine learning models. After a model’s size, the biggest variables were the server’s location and the time of day it was active.

How it works: The authors’ calculations account for the kilowatt hours used by a cloud computing system, emissions from the local electrical grid, and emissions from manufacturing and disposing of the system’s hardware. They based their method on an approach developed by the Green Software Foundation.

The authors trained or fine-tuned 11 language and vision models: two BERTs, one 6.1 billion-parameter Transformer language model (which they trained only to 13 percent completion), three DenseNets with parameter counts ranging from 8 million to 20 million, and five Vision Transformers from 20 million to 632 million parameters.
They drew on data that described the carbon cost of generating electricity in eight U.S. regions, six European regions, and one region each in Canada and Australia. They used historical data to analyze how emissions would differ depending on the time of day or year.
They tested the impact of two emissions-reduction options offered by Microsoft’s Azure Cloud. Flexible Start starts processing at times that are expected to reduce carbon emissions. Pause and Resume processes intermittently during low-emission time frames.
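
The accounting and the two scheduling options can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' tool or Azure's implementation: the function names, the constant power draw, the hourly carbon-intensity figures, and the optional embodied-emissions term are all hypothetical, and the cost of pausing and restarting a job is ignored.

```python
from typing import List, Sequence, Tuple


def run_emissions(power_kw: float, intensity: Sequence[float],
                  start: int, hours: int, embodied_g: float = 0.0) -> float:
    """Grams of CO2 for a job that runs `hours` consecutive hours from `start`.

    intensity[h] is the grid's carbon intensity (gCO2 per kWh) in hour h.
    Operational emissions are energy times intensity, summed per hour;
    embodied_g is an assumed fixed share of hardware manufacture and disposal.
    """
    operational = sum(power_kw * intensity[h] for h in range(start, start + hours))
    return operational + embodied_g


def flexible_start(power_kw: float, intensity: Sequence[float],
                   hours: int) -> Tuple[int, float]:
    """Choose the start hour that minimizes emissions for an uninterrupted run."""
    candidates = range(len(intensity) - hours + 1)
    best = min(candidates, key=lambda s: run_emissions(power_kw, intensity, s, hours))
    return best, run_emissions(power_kw, intensity, best, hours)


def pause_and_resume(power_kw: float, intensity: Sequence[float],
                     hours: int) -> Tuple[List[int], float]:
    """Run only during the `hours` lowest-intensity hours, pausing in between."""
    by_intensity = sorted(range(len(intensity)), key=lambda h: intensity[h])
    chosen = sorted(by_intensity[:hours])
    return chosen, sum(power_kw * intensity[h] for h in chosen)


# Hypothetical 8-hour forecast (gCO2/kWh) and a 3-hour job drawing 2 kW.
forecast = [500, 200, 450, 250, 600, 300, 550, 400]
immediate = run_emissions(2.0, forecast, start=0, hours=3)   # 2300.0 g
start_hour, flexible = flexible_start(2.0, forecast, 3)      # hour 1, 1800.0 g
active_hours, paused = pause_and_resume(2.0, forecast, 3)    # [1, 3, 5], 1500.0 g
print(immediate, (start_hour, flexible), (active_hours, paused))
```

In this toy forecast, shifting the start saves emissions relative to starting immediately, and running intermittently saves more, at the cost of a longer wall-clock time. Real savings depend on how much the grid's intensity varies over the job's duration.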

Results: Training a model in a low-emissions region like France or Norway could save over 70 percent of the carbon that would be emitted in a carbon-heavy region like the central United States or Germany.

The time of day had a subtle impact on emissions. Starting a training run at midnight, for instance, increased emissions by 8 percent compared to starting at 6:00 a.m.
The Azure Cloud options had little impact on emissions released in training smaller models over short periods of time (less than 30 minutes). However, when the authors trained the 6.1 billion-parameter Transformer over eight days, the options cut emissions by up to 25 percent.

Yes, but: A 2021 study found that large transformers consume more energy, and yield more carbon emissions, during inference than training.

Behind the news: Energy consumption and the associated carbon emissions are growing concerns as machine learning models and datasets balloon.

A 2019 study of deep learning’s carbon footprint found that training a single large language model could release the same quantity of CO2 as a car over five years of driving.
In 2021, the MLPerf processing benchmark added an energy-efficiency test.

Why it matters: Atmospheric carbon is causing changes in climate that are devastating many communities across the globe. Data centers alone accounted for 1 percent of electricity consumed globally in 2020 (although the portion of data center usage devoted to AI is unknown). Machine learning engineers can do their part to reduce carbon emissions by choosing carefully when and where to train models.

We’re thinking: It's impractical to expect every team to minimize carbon emissions by choosing times and locations to process training jobs. We urge cloud providers to consider pricing and other signals that would help — better yet, incentivize — engineers to cut emissions.
