A new consortium of companies, schools, and research labs is building open tools for next-generation machine learning.
What’s new: MLCommons aims to foster innovation in machine learning by developing new benchmarks, datasets, and best practices. Its founding board includes representatives of Alibaba, Facebook, Google, Intel, and DeepLearning.AI’s sister company Landing AI.
Fresh resources: The group kicked off by releasing two products:
- People’s Speech contains 87,000 hours of spoken-word audio in 59 languages (mostly English). It includes recordings from the internet and audiobooks, plus about 5,000 hours of synthetic speech: text generated by GPT-3 and rendered by a voice synthesizer.
- MLCube is an interface for sharing models, along with their data and parameters, via container systems such as Docker. Packaged models can run locally or in the cloud.
Behind the news: MLCommons grew out of the development of MLPerf, a benchmark for measuring hardware performance on machine learning tasks. MLCommons will continue to steward MLPerf.
Why it matters: Publicly available datasets and benchmarks have spurred much of AI’s recent progress. Producing such resources is expensive, and doing it well requires expertise from several subdisciplines of AI. MLCommons brings together more than 50 organizations to keep the community fueled with the tools necessary to continue innovating.
We’re thinking: Datasets from the Linguistic Data Consortium and others have been a boon for speech recognition research in academia, but academic researchers still lack datasets on the scale used by big tech companies. Access to 87,000 hours of speech will help these groups develop cutting-edge speech systems.