Ready, Set, Improve the Data!

Published

Jun 16, 2021

Dear friends,

I’m thrilled to announce the first data-centric AI competition! I invite you to participate.

For decades, model-centric AI competitions, in which the dataset is held fixed while you iterate on the code, have driven our field forward. But deep learning has matured to the point that, for many applications, an open-source model works just fine — if we can prepare the right data to train it. What we urgently need now are methods, tools, and platforms for getting the data we need efficiently and systematically.

This competition, a collaboration between Landing AI and DeepLearning.AI, offers an opportunity to develop methods for improving data.

In the grand tradition of MNIST, the dataset assembled by Yann LeCun and his colleagues that has driven much model-centric progress, this competition will use a new dataset called Roman MNIST. It’s a noisy collection of handwritten Roman numerals, meant as a starting point for building a better dataset for this task.

Can you develop a dataset that results in the best performance on this problem?

The competition will end on September 4, 2021 — the birthday of John McCarthy, who coined the term artificial intelligence. The winners will be invited to join me at a private roundtable event to share ideas about how to grow the data-centric movement, and I will highlight their work here in The Batch.

I’m grateful to Chris Ré at Stanford and D Sculley at Google for advising us on this competition, and to everyone who contributed their thoughts on social media.

There will be more data-centric AI competitions in the future. But if you join this one with me, you’ll be able to tell your friends that you were there at the very beginning of the data-centric AI movement! You’ll find further information here.

Keep preparing data!

Andrew
