Here’s a conundrum: Deep learning could help address a variety of intractable problems, climate change among them. Yet neural networks can consume gargantuan quantities of energy, potentially dumping large amounts of heat-trapping gas into the atmosphere.
What’s new: Researchers studying the energy implications of deep learning systems report that training the latest language models can generate as much atmospheric carbon as five cars over their lifetime, including manufacturing.
How bad is it? Training a Transformer with 65 million parameters generates 26 pounds of carbon dioxide equivalent, a bit more than burning a gallon of fuel. OK, we’ll bike to work. But a Transformer roughly four times the size tuned using neural architecture search generated 626,155 pounds (a car produces 126,000 pounds from factory to scrapyard). To make matters worse, developing an effective model generally requires several training cycles. A typical figure is around 78,000 pounds, the researchers conclude.
How they measured: Emma Strubell, Ananya Ganesh, and Andrew McCallum at the University of Massachusetts Amherst considered Transformer, BERT, ELMo, and GPT-2. For each model, they:
- Trained for one day.
- Sampled energy consumption throughout.
- Multiplied energy consumption by total training time reported by each model’s developers.
- Converted total energy consumption into pounds of CO2 equivalent based on average power generation in the U.S.
The debate: The conclusions sparked much discussion on Twitter and Reddit. The researchers based their estimates on the average U.S. energy mix. However, some of the biggest AI platforms are far less carbon-intensive. Google claims its AI platform runs on 100 percent renewable energy. Amazon claims to be 50 percent renewable. The researchers trained on a GPU, ignoring the energy efficiency of more specialized chips like Google’s TPU. Moreover, the most carbon-intensive scenario cost between $1 million and $3 million — not an everyday expense. Yes, AI is energy-intensive, but further research is needed to find the best ways to minimize the impact.
Everything is political: Ferenc Huszár, a Bayes booster and candidate for the International Conference on Machine Learning’s board of directors, tried to take advantage of the buzz. He proposed “phasing out deep learning within the next five years” and advised his Twitter followers to “vote Bayesian” in the ICML's upcoming election. ¯\_(ツ)_/¯
We’re thinking: This work is an important first step toward raising awareness and quantifying deep learning's potential CO2 impact. Ever larger models are bound to gobble up energy saved by more efficient architectures and specialized chips. But the real issue is how we generate electricity. The AI community has a special responsibility to support low-carbon computing and sensible clean energy initiatives.