We just launched a Data-Centric AI Resource Hub to help you improve the performance of AI systems by systematically engineering the underlying data. It offers new articles by Nvidia director of machine learning research Anima Anandkumar, Stanford computer science professor Michael Bernstein, and Google Brain director of engineering D. Sculley. It also includes talks from the NeurIPS Data-Centric AI Workshop that was held in December. We’ll be adding more helpful articles and videos in the coming months.
Working effectively with human labelers is a key part of Data-Centric AI. My friend Michael Bernstein is an expert in human-computer interaction (HCI), a discipline that offers many insights for empowering labelers. His article explains some of the most important ones.
For example, given a task in computer vision, natural language processing, or speech recognition, it’s common to ask several crowdsourced labelers to annotate the same example and take the mean or majority-vote label. Many clever ideas have been proposed to improve the labeling process, such as testing labeler accuracy, developing novel voting mechanisms, and routing examples to labelers in sophisticated ways.
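To make the aggregation step concrete, here is a minimal Python sketch of majority-vote and mean labeling. It is only an illustration: the function names, example IDs, and labels are hypothetical and do not come from Michael's article or any particular labeling tool.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return (winning_label, agreement) for one example.

    agreement is the fraction of labelers who chose the winning label.
    Ties go to whichever tied label appears first in the list.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def mean_label(scores):
    """Average numeric annotations, e.g. 1-5 quality ratings."""
    return mean(scores)


# Hypothetical annotations: three labelers per example.
annotations = {
    "image_001": ["cat", "cat", "dog"],
    "image_002": ["dog", "dog", "dog"],
}

consensus = {
    example_id: majority_vote(labels)[0]
    for example_id, labels in annotations.items()
}

# Examples where labelers disagree may point to unclear instructions.
disputed = [
    example_id
    for example_id, labels in annotations.items()
    if majority_vote(labels)[1] < 1.0
]
```

Low agreement on an example is worth a second look, since it can signal an ambiguity in the labeling instructions rather than a careless labeler.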
Surprisingly, Michael has found that it's often better to invest in hiring and training a few annotators than to focus on improving the process. Alternatively, the best process may be one that enables you to build a small team of skilled labelers.
Working with a smaller, committed team also makes it easier to discover and fix ambiguities in your labeling instructions. Michael writes, “When something goes wrong, your reactions should be, ‘What did I do wrong in communicating my intent?,’ not, ‘Why weren’t they paying attention?’”
Every machine learning engineer and data scientist can take advantage of Data-Centric AI techniques. And, because the data-centric approach changes the workflow of AI development, software engineers and product managers can also benefit. So please visit the Data-Centric AI Resource Hub, and tell your friends and colleagues about it, too.