Data-Centric AI

2 Posts


Massively Multilingual Translation: Machine Learning Model Trained to Translate 1,000 Languages

Recent work showed that multilingual machine translation models can expand the number of languages they translate by scraping the web for pairs of equivalent sentences in different languages. A new study pushed the repertoire much further, to around 1,000 languages, by also training on untranslated (monolingual) web text.

Tradeoffs for Higher Accuracy: Data Augmentation Plus Weight Decay Can Boost Some AI Models

Vision models can be improved by training them on several altered versions of each image (data augmentation) and by encouraging their weights to stay close to zero (weight decay). Recent research showed that both techniques can have adverse effects that may be difficult to detect.
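For readers unfamiliar with the two techniques, here is a minimal sketch of each. It is not the study's code: it assumes images are plain 2D lists, uses random cropping as the example augmentation, and applies weight decay as an L2 term in a plain SGD update. The function names are hypothetical.

```python
import random


def random_crop(image, size, rng):
    """Return a random size x size crop of a 2D image (a list of rows).

    Calling this repeatedly on the same image yields different altered
    versions of it, which is the idea behind crop-based augmentation.
    """
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size)   # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in image[top:top + size]]


def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One SGD step with L2 weight decay.

    The extra weight_decay * w term adds a pull toward zero on every
    weight, on top of the ordinary gradient update.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    # Two randomly cropped views of the same 4x4 image.
    print(random_crop(image, 2, rng))
    print(random_crop(image, 2, rng))
    # With zero gradient, weight decay alone shrinks the weights.
    print(sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5))
```

With a zero gradient, the update reduces each weight by a factor of (1 - lr * weight_decay), which makes the "pull toward zero" explicit.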

