AI researchers keep coming up with impressive innovations: transformer-based language models, self-supervised learning, deep reinforcement learning, small data. All of these developments hold great promise. But some will continue to improve over time and set new directions for AI, and others will turn out to have less impact.
How can you tell which is which?
I remember seeing early data, over a decade ago, that indicated deep learning algorithms could scale up to become very useful. Similarly, I remember thinking that sequence-to-sequence models, when they were first presented and not yet working well, set a new direction. In these instances, my instincts turned out to be right. But I’ve been wrong, too. For example, in the mid-2000s, I thought that mobile manipulation would take off faster than it has so far.
I’ve thought about how to evaluate whether an exciting idea that doesn’t yet work well is likely to become a winner or whether it’s unlikely to improve much for a long time. Over the past decade, three major drivers of improvement in AI performance have been:
- Computational scaling: Does running an algorithm on computers 10 or 100 times faster result in better performance?
- Data scaling: Does feeding an AI system more data improve its performance?
- Algorithmic improvements: Does the data available still hold a significant amount of information that current algorithms do not extract?
I believe these three factors will continue to drive AI performance for years to come. Thus, nascent ideas that can take advantage of them seem more promising to me. If the “only” thing a new algorithm requires to be useful is a 10x improvement in computation speed, you have Nvidia, Intel, and AMD working hard to make that improvement, so it’s a good bet that it will happen.
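As a rough illustration of the data-scaling question, one could train the same model on a few dataset sizes, record the error at each, and fit a power law to see what more data might buy. The sketch below assumes error falls roughly as `a * size**(-b)`, which is a simplifying assumption rather than a guarantee. The function names and the measurements are hypothetical, made up for illustration.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~ a * size**(-b) by least squares in log-log space.

    Returns (a, b). A larger b means error drops faster as data grows.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def projected_error(a, b, size):
    """Extrapolate the fitted curve to a dataset size not yet tried."""
    return a * size ** (-b)


# Hypothetical measurements: test error after training on each dataset size.
sizes = [1_000, 10_000, 100_000]
errors = [0.40, 0.20, 0.10]

a, b = fit_power_law(sizes, errors)
print(f"fitted exponent b = {b:.3f}")
print(f"projected error at 1,000,000 examples = {projected_error(a, b, 1_000_000):.3f}")
```

An exponent near zero would suggest that more data alone is unlikely to help much. A clearly positive exponent suggests the idea can ride data scaling, at least over the range measured. Extrapolating far beyond the measured sizes should be treated with caution.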
This reasoning leads me to believe that GPT-3 is setting a new direction for building language models and applications. I see a clear path toward scaling computation (by making models cheaper to run or building bigger ones) and algorithmic improvements. At AI Fund (where I’m managing general partner), we’re seeing many entrepreneurs looking to build new companies using GPT-3.
On the other hand, I don’t expect quantum computing to have a dramatic impact on AI any time soon. I look forward to quantum AI and I’m glad that many groups are investing in it. But it doesn’t appear to ride any of the three drivers above, and I believe it will take a significant amount of time to become practical for machine learning.
Regarding algorithmic improvements, it’s important to note that the information must be in the data for an algorithm to extract it. If someone’s DNA doesn’t contain enough information to determine whether that person will develop diabetes, then no amount of algorithmic work will yield the ability to predict the disease from only the genetic sequence. Conversely, if humans can perform a task, that’s strong evidence that the data available to humans holds information helpful for completing that task, which points to the possibility that algorithmic improvements can enable AI to complete it, too.
This is why I believe that small data is a promising area: A handful of pictures contains sufficient information for a human to learn to recognize a new object. This offers hope that improved algorithms will be able to extract that information and learn from far fewer examples than are required today.
When you hear about an exciting category of emerging AI technology, you might ask yourself whether it can ride on the backs of computational scaling, data scaling, and algorithmic improvement. If so, it’s more likely to make a big impact in the future. We can create immense value if we can get better at recognizing new ideas that, although they may not work well today, have the potential to become tomorrow’s top performers.