University College Dublin

3 Posts

Abeba Birhane: Clean up web datasets

From language to vision models, deep neural networks are marked by improved performance, higher efficiency, and better generalization. Yet these systems are also marked by the perpetuation of bias and injustice.



Crawl the Web, Absorb the Bias: NLP Models Absorb Biases from Web Training Data

The emerging generation of trillion-parameter models needs datasets of billions of examples, but the most readily available source of examples on that scale, the web, is polluted with bias and antisocial expressions. A new study examines the issue.


Tiny Images, Outsized Biases: Why MIT withdrew the Tiny Images dataset

MIT withdrew a popular computer vision dataset after researchers found that it was rife with social bias. They found racist, misogynistic, and demeaning labels among the nearly 80 million pictures in Tiny Images, a collection of 32-by-32-pixel color photos.
