Recent advances in deep learning largely have come by brute force: taking the latest architectures and scaling up compute power, data, and engineering. Do we have the architectures we need, and all that remains is to develop better hardware and datasets so we can keep scaling up? Or are we still missing something?
I believe we’re missing something, and I hope for progress toward finding it in the coming year.
I’ve been studying, in collaboration with neuroscientists and cognitive neuroscientists, the performance gap between state-of-the-art systems and humans. The differences lead me to believe that simply scaling up is not going to fill the gap. Instead, building into our models a human-like ability to discover and reason with high-level concepts and relationships between them can make the difference.
Consider the number of examples necessary to learn a new task, known as sample complexity. It takes a huge amount of gameplay to train a deep learning model to play a new video game, while a human can learn this very quickly. Related issues fall under the rubric of reasoning. A computer needs to consider numerous possibilities to plan an efficient route from here to there, while a human doesn’t.
Humans can select the right pieces of knowledge and paste them together to form a relevant explanation, answer, or plan. Moreover, given a set of variables, humans are pretty good at deciding which is a cause of which. Current AI techniques don’t come close to this human ability to generate reasoning paths. Often, they’re highly confident that their decision is right, even when it’s wrong. Such issues can be amusing in a text generator, but they can be life-threatening in a self-driving car or medical diagnosis system.
Current systems behave in these ways partly because they’ve been designed that way. For instance, text generators are trained simply to predict the next word rather than to build an internal data structure that accounts for the concepts they manipulate and how they are related to each other. But I think we can design systems that track the meanings at play and reason over them while keeping the numerous advantages of current deep learning methodologies. In doing so, we can address a variety of challenges from excessive sample complexity to overconfident incorrectness.
I’m excited by generative flow networks, or GFlowNets, an approach to training deep nets that my group started about a year ago. This idea is inspired by the way humans reason through a sequence of steps, adding a new piece of relevant information at each step. It’s like reinforcement learning, because the model sequentially learns a policy to solve a problem. It’s also like generative modeling, because it can sample solutions in a way that corresponds to making a probabilistic inference.
If you think of an interpretation of an image, your thought can be converted to a sentence, but it’s not the sentence itself. Rather, it contains semantic and relational information about the concepts in that sentence. Generally, we represent such semantic content as a graph, in which each node is a concept or variable. GFlowNets generate such graphs one node or edge at a time, choosing which concept should be added and connected to which others in what kind of relation.
I don’t think this is the only possibility, and I look forward to seeing a multiplicity of approaches. Through a diversity of exploration, we’ll increase our chance to find the ingredients we’re missing to bridge the gap between current AI and human-level AI.
Yoshua Bengio is a professor of computer science at Université de Montréal and scientific director of Mila - Quebec AI Institute. He received the 2018 A.M. Turing Award, along with Geoffrey Hinton and Yann LeCun, for his contribution to breakthroughs in deep learning.