How will we know if someone succeeds in building artificial general intelligence (AGI)? A recent paper defines milestones on the road from calculator to superintelligence.
What’s new: Researchers at Google led by Meredith Ringel Morris propose a taxonomy of AI systems according to their degree of generality and ability to perform cognitive tasks. They consider today’s large multimodal models to be “emerging AGI.”
AGI basics: Artificial general intelligence is commonly defined as AI that can perform any intellectual task a human can. Shane Legg (who co-founded DeepMind) and Ben Goertzel (co-founder and CEO of SingularityNet) coined the term AGI for a 2007 collection of essays. Subsequently, companies like DeepMind and OpenAI, which explicitly aim to develop AGI, propelled the idea into the mainstream.
How it works: The taxonomy categorizes systems as possessing narrow skills (not AGI) or general capabilities (AGI). It divides both narrow and general systems into five levels of performance beyond calculator-grade Level 0. It also includes a metric for degree of autonomy.
- Narrow systems perform one distinct task; they may perform at one of the five levels, but they are not AGI. General systems perform a range of tasks (which the authors don’t specify) that align with real-world activities of broad value to people, including but not limited to linguistic, mathematical, logical, spatial reasoning, social, learning, and creative tasks. Crucially, they can learn how to learn new skills and when to ask humans for more information. The authors classify general systems as AGI at various levels of performance.
- Level 1 (“emerging”) matches or slightly exceeds unskilled humans. Levels 2 (“competent”), 3 (“expert”), and 4 (“virtuoso”) systems surpass the 50th, 90th and 99th percentiles of skilled human performance, respectively. Level 5 (“superhuman” or “artificial superintelligence”) outperforms 100 percent of skilled humans.
- Most current systems that perform at Level 2 or higher are narrow. For example, AlphaFold, which finds the shapes of protein molecules, achieves Level 5 performance but only in a single task. On the other hand, the authors consider large language models like Bard, ChatGPT, and Lama 2 to be general systems at Level 1 (although their performance may achieve Level 2 in some tasks).
- The authors’ autonomy scale ranges from tools for which humans control the task while the system automates subtasks (the first level of autonomy) to agents that act independently (the fifth). Higher levels of performance can unlock higher levels of autonomy. For instance, Level 4 AGI may be necessary to enable fully autonomous vehicles that are safe and trustworthy.
Yes, but: The authors’ definition identifies some classes of tasks that contribute to generality, but it includes neither a list of tasks a system must perform to be considered general nor a method for selecting them. Rather, the authors call on the research community to develop a “living benchmark” for generality that includes a mechanism for adding novel tasks.
Why it matters: AGI is one of the tech world’s hottest buzzwords, yet it has had no clear definition, and various organizations propose different definitions. This lack of specificity makes it hard to talk about related technology, regulation, and other topics. The authors’ framework, on the other hand, supports a more nuanced discussion of the path toward AGI. And it may have high-stakes business implications: Under the terms of their partnership, OpenAI can withhold from Microsoft models that attain AGI. Applying the authors’ taxonomy would make it harder for one of the parties to move the goalposts.
We’re thinking: Defining AGI is tricky! For instance, OpenAI defines AGI as “a highly autonomous system that outperforms humans at most economically valuable work.” This definition, had it been formulated in the early 1900s, when agriculture accounted for 70 percent of work globally, would have described the internal combustion engine.