Chips specially designed for AI are becoming much faster at training neural networks, judging from recent trials.
What’s new: MLPerf, an organization that’s developing standards for hardware performance in machine learning tasks, released results from its third benchmark competition. Nvidia’s latest products led the pack, but Google’s forthcoming hardware surpassed Nvidia’s scores.
Start your engines: MLPerf measures how long it takes various hardware configurations to train particular machine learning models. Tasks include object detection, image classification, language translation, recommendation, and reinforcement learning goals.
- Systems from nine organizations trained models 2.7 times faster, on average, than they did in tests conducted last November, demonstrating the rapid evolution of AI hardware (and enabling software such as compilers).
- Nvidia submitted 40 different configurations. Those based on its A100 graphics processing unit (GPU) scored highest among commercially available systems.
- Showing off capabilities that aren’t yet on the market, Google dominated six of the eight tasks with its fourth-generation tensor processing unit (TPU). Earlier versions are available via the Google Cloud platform.
- Alibaba, Fujitsu, Intel, Inspur, Shenzhen Institute, and Tencent also joined the competition. Conspicuously absent: AI hardware upstarts Cerebras and Graphcore (see “New Horsepower for Neural Nets” below).
Behind the news: Nvidia’s GPUs have long been the premier machine learning chips, thanks to their ability to process large volumes of floating point integers per second. But startups including Cerebras, Graphcore, and Habana (acquired by Intel in December) are vying for that position, and Google Cloud is making a strong play for AI workloads.
Why it matters: It’s good to be past the era of Mythbusters videos as a way to compare AI hardware. Machine learning engineers benefit from faster, more energy-efficient hardware systems, but we need clear, consistent metrics like MLPerf to evaluate hardware performance with particular models.
We’re thinking: Since MLPerf’s first tests two years ago, the time required to train some models has plummeted from hours to seconds. Clearly semiconductor companies have been chipping away at the problem.