Yann LeCun — Learning From Observation The power of self-supervised learning

Published

Jan 01, 2020

Reading time

2 min read

How is it that many people learn to drive a car fairly safely in 20 hours of practice, while current imitation learning algorithms take hundreds of thousands of hours, and reinforcement learning algorithms take millions of hours? Clearly we’re missing something big.

It appears that humans learn efficiently because we build a model of the world in our head. Human infants can hardly interact with the world, but over the first few months of life they absorb a huge amount of background knowledge by observation. A large part of the brain apparently is devoted to understanding the structure of the world and predicting things we can’t directly observe because they’re in the future or otherwise hidden.

This suggests that the way forward in AI is what I call self-supervised learning. It’s similar to supervised learning, but instead of training the system to map data examples to a classification, we mask some examples and ask the machine to predict the missing pieces. For instance, we might mask some frames of a video and train the machine to fill in the blanks based on the remaining frames.

This approach has been extremely successful lately in natural language understanding. Models such as BERT, RoBERTa, XLNet, and XLM are trained in a self-supervised manner to predict words missing from a text. Such systems hold records in all the major natural language benchmarks.

In 2020, I expect self-supervised methods to learn features of video and images. Could there be a similar revolution in high-dimensional continuous data like video?

One critical challenge is dealing with uncertainty. Models like BERT can’t tell if a missing word in a sentence is “cat” or “dog,” but they can produce a probability distribution vector. We don’t have a good model of probability distributions for images or video frames. But recent research is coming so close that we’re likely to find it soon.

Suddenly we’ll get really good performance predicting actions in videos with very few training samples, where it wasn’t possible before. That would make the coming year a very exciting time in AI.

Yann LeCun is vice president and chief AI scientist at Facebook and a professor of computer science at New York University.

Subscribe to The Batch