Billions of dollars invested to create novel AI hardware will bear their early fruit in 2020.
Google unleashed a financial avalanche with its tensor processing unit in 2017. The past year saw specialized AI processors from Alibaba, Cerebras, Graphcore, Habana, and Intel, with many others in the pipeline. These new chips will find their way slowly into research labs and data centers. I hope the AI community will embrace the best of them, pushing the field toward better models and more valuable applications.
How can machine learning engineers know whether a newfangled alternative performs better than the conventional CPU-plus-GPUs combo?
Computer architecture is graded on a curve rather than an absolute scale. To account for differing computer sizes, we normalize performance by price, power, or number of chips. Competitors select a set of representative programs to serve as a benchmark. An average of scores across many of these programs is more likely to reflect real performance than a score on any single one.
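To make the arithmetic concrete, here is a minimal sketch of that kind of scoring. Everything in it is an assumption for illustration: the benchmark names, the run times, and the 2x price ratio are invented, and the choice of a geometric mean to summarize the ratios is one common convention, not something MLPerf or any vendor is claimed to use.

```python
from math import prod


def normalized_speedups(times, baseline_times, resource=1.0, baseline_resource=1.0):
    """Speedup over the baseline on each program, divided by relative resource use.

    `resource` can be price, power, or chip count, as long as both systems
    are measured in the same unit.
    """
    relative_cost = resource / baseline_resource
    return {
        name: (baseline_times[name] / times[name]) / relative_cost
        for name in times
    }


def geometric_mean(values):
    """Summarize ratios with a geometric mean (an assumed, common convention)."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


# Hypothetical run times in minutes; these are not real measurements.
baseline_times = {"image_classification": 100.0, "translation": 200.0, "recommendation": 50.0}
candidate_times = {"image_classification": 25.0, "translation": 100.0, "recommendation": 50.0}

# Assume the candidate system costs twice as much as the baseline.
per_benchmark = normalized_speedups(
    candidate_times, baseline_times, resource=2.0, baseline_resource=1.0
)
summary = geometric_mean(per_benchmark.values())

print(per_benchmark)
print(summary)
```

In this made-up case the candidate is 4x faster on one program, yet once price is factored in and all three programs are averaged, it comes out even with the baseline. That is the gap between a single headline number and a normalized score across a suite.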
MLPerf is a recent benchmark for machine learning created by representatives from more than 50 companies and nine universities. It includes programs, data sets, and ground rules for testing both inference and training, specifying important details like the accuracy target and valid hyperparameter values. New versions are released every three months (alternating between inference and training) to keep up with rapid advances in machine learning.
Not every product can win a fair comparison, so some marketing departments may sidestep MLPerf with some version of, “Our customers don’t care about the programs in MLPerf.” But don’t be fooled. First, MLPerf welcomes new programs, so if a given workload isn’t in MLPerf, it can be added. Second, competitors check one another’s MLPerf results for fairness to ensure apples-to-apples comparisons.
Caveat emptor. Ask to see MLPerf scores!
David Patterson is a professor of computer science at UC Berkeley.