One Model Does It All Multi-task AI models got more sophisticated in 2022.

Published

Dec 21, 2022

Reading time

2 min read

Individual deep learning models proved their mettle in hundreds of tasks.

What happened: The scope of multi-task models expanded dramatically in the past year.

Driving the story: Researchers pushed the limits of how many different skills a neural network can learn. They were inspired by the emergent skills of large language models — say, the ability to compose poetry and write computer programs without architectural tuning for either — as well as the capacity of models trained on both text and images to find correspondences between the disparate data types.

In spring, Google’s PaLM showed state-of-the-art results in few-shot learning on hundreds of tasks that involve language understanding and generation. In some cases, it outperformed fine-tuned models or average human performance.
Shortly afterward, DeepMind announced Gato, a transformer that It learned over 600 diverse tasks — playing Atari games, stacking blocks using a robot arm, generating image captions, and so on — though not necessarily as well as separate models dedicated to those tasks. The system underwent supervised training on a wide variety of datasets simultaneously, from text and images to actions generated by reinforcement learning agents.
As the year drew to a close, researchers at Google brought a similar range of abilities to robotics. RT-1 is a transformer that enables robots to perform over 700 tasks. The system, which tokenizes actions as well as images, learned from a dataset of 130,000 episodes collected from a fleet of robots over nearly a year and a half. It achieved outstanding zero-shot performance in new tasks, environments, and objects compared to prior techniques.

Behind the news: The latest draft of the European Union’s proposed AI Act, which could become law in 2023, would require users of general-purpose AI systems to register with the authorities, assess their systems for potential misuse, and conduct regular audits. The draft defines general-purpose systems as those that “perform generally applicable functions such as image/speech recognition, audio/video generation, pattern-detection, question-answering, translation, etc.,” and are able to “have multiple intended and unintended purposes.” Some observers have criticized the definition as too broad. The emerging breed of truly general-purpose models may prompt regulators to sharpen their definition.

Where things stand: We’re still in the early phases of building algorithms that generalize to hundreds of different tasks, but the year showed that deep learning has the potential to get us there.

Subscribe to The Batch