Coordinating Robot Teams

Google DeepMind’s RoboBallet blends GNNs with RL to drive teams of 8 robot arms

[Image: In a lab, four robot arms coordinated by a graph neural network move a metal frame on a platform.]

In factories, where teams of robotic arms work in tight spaces, their motions are programmed by hand to keep them from interfering with one another. Researchers automated this programming using graph neural networks trained via reinforcement learning.

What’s new: Matthew Lai, Keegan Go, and colleagues at Google DeepMind, University College London, and robotics software shop Intrinsic developed RoboBallet, a graph neural network that coordinates robotic arms.

Key insight: Coordinating several robot arms is computationally infeasible for traditional search-based planners that figure out how to reach a target by searching through possible joint movements while checking for collisions. Each additional robot or obstacle multiplies the number of possible configurations. A graph neural network can overcome this limitation by learning to produce synchronized, collision-free motions in large numbers of simulated setups with different robot placements, obstacles, and target positions.
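
To make that scaling concrete, here’s a back-of-the-envelope count (our illustration, not a figure from the paper): discretizing each joint of a 7-joint arm into just 10 positions yields 10^7 configurations for one arm, but 10^56 for a team of eight.

```python
# Back-of-the-envelope count of the joint configurations a search-based
# planner would face. The discretization is illustrative, not from the paper.
JOINTS_PER_ARM = 7    # a Franka Panda arm has 7 joints
BINS_PER_JOINT = 10   # coarse discretization of each joint's motion range

for num_arms in (1, 4, 8):
    configs = BINS_PER_JOINT ** (JOINTS_PER_ARM * num_arms)
    print(f"{num_arms} arm(s): {configs:.1e} possible joint configurations")
# 1 arm(s): 1.0e+07
# 4 arm(s): 1.0e+28
# 8 arm(s): 1.0e+56
```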

How it works: RoboBallet is a graph neural network that takes as input the positions and orientations of robots, obstacles, and targets and generates joint velocities that move each arm from its current position toward a target. The authors trained it entirely in simulation using TD3, an actor-critic reinforcement learning algorithm. They generated about 1 million simulated workspaces, each of which contained a team of 4 or 8 simulated 7-joint Franka Panda robotic arms attached at random to the sides of a table, 30 obstacle blocks placed at random, and 40 target positions/orientations per team. They rejected configurations that started in collision.
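
Here’s a minimal sketch of that kind of randomized workspace generation. The helper names, coordinate ranges, and collision stub are our assumptions for illustration; the authors’ simulator is far more detailed.

```python
import numpy as np

# Illustrative randomized workspace generation. Helper names, coordinate
# ranges, and the collision stub are assumptions, not the paper's setup.
rng = np.random.default_rng(0)

def random_pose():
    """Random 3D position plus an orientation as a unit quaternion."""
    position = rng.uniform(low=[-1.0, -1.0, 0.0], high=[1.0, 1.0, 1.0])
    quaternion = rng.normal(size=4)
    return position, quaternion / np.linalg.norm(quaternion)

def sample_workspace(num_arms, num_obstacles=30, num_targets=40):
    return {
        "arm_bases": [random_pose() for _ in range(num_arms)],  # around the table
        "obstacles": [random_pose() for _ in range(num_obstacles)],
        "targets": [random_pose() for _ in range(num_targets)],
    }

def in_collision(workspace):
    # Stub that always passes; a real check would query the physics simulator.
    return False

workspaces = []
while len(workspaces) < 10:  # the authors generated roughly 1 million
    candidate = sample_workspace(num_arms=int(rng.choice([4, 8])))
    if not in_collision(candidate):  # reject setups that start in collision
        workspaces.append(candidate)
```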

  • Given a workspace, the authors represented robots, obstacles, and target positions as nodes in a graph. Edges connected each robot’s tip to its target position, to each obstacle, and to each other robot. Each edge embedding encoded how the robot’s tip related to the other object’s or position’s centroid, size, and orientation (a construction sketch follows this list).
  • During training, every 100 milliseconds, the model selected joint velocities for all robots, effectively telling each arm how to move (the actor role in the actor-critic learning algorithm). In parallel, it evaluated how good each action was – that is, how much total reward the current action and all actions likely to follow would yield (the critic role).
  • The authors rewarded the model for arms that touched the target positions and penalized collisions. Because the arms rarely touched the target positions, they used Hindsight Experience Replay, a method that turns failed attempts into useful examples by treating points the arm reached accidentally as if they had been the intended goals (see the relabeling sketch after this list).
  • The loss encouraged the actor to produce actions that the critic predicted would lead to higher long-term rewards, helping the model learn to prefer actions that pay off over time rather than those that maximize immediate reward (see the training sketch after this list).
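
To make the graph representation concrete, here is a minimal construction sketch. The node and edge feature layouts, and the use of centroids only, are our simplifications; the paper’s features also encode sizes and orientations in more detail.

```python
import numpy as np

# Illustrative graph construction: nodes for robots, targets, and obstacles;
# edges from each robot's tip to its target, to each obstacle, and to the
# other robots. tips/targets/obstacles are (N, 3) arrays of 3D centroids.
def build_graph(tips, targets, obstacles):
    tips, targets, obstacles = map(np.asarray, (tips, targets, obstacles))
    R, O = len(tips), len(obstacles)

    def node(pos, kind):
        return np.concatenate([pos, np.eye(3)[kind]])  # position + type one-hot

    nodes = ([node(p, 0) for p in tips] +       # robot nodes:    0 .. R-1
             [node(p, 1) for p in targets] +    # target nodes:   R .. 2R-1
             [node(p, 2) for p in obstacles])   # obstacle nodes: 2R .. 2R+O-1

    edges, edge_feats = [], []
    def connect(src, dst, src_pos, dst_pos):
        edges.append((src, dst))
        edge_feats.append(dst_pos - src_pos)  # relative displacement feature

    for i in range(R):
        connect(i, R + i, tips[i], targets[i])            # tip -> its target
        for j in range(O):                                # tip -> each obstacle
            connect(i, 2 * R + j, tips[i], obstacles[j])
        for k in range(R):                                # tip -> other robots
            if k != i:
                connect(i, k, tips[i], tips[k])
    return np.array(nodes), np.array(edges), np.array(edge_feats)
```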
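
Hindsight Experience Replay itself can be sketched in a few lines. This is a generic rendering of the technique, not the authors’ implementation; the reward function and tolerance are illustrative.

```python
import numpy as np

# Generic Hindsight Experience Replay relabeling (not the authors' exact code).
# A failed episode aimed at its intended goal is replayed as if a point the
# arm actually reached had been the goal, so even misses yield reward signal.
def relabel_with_hindsight(episode):
    """episode: list of (state, action, achieved_point, intended_goal)."""
    hindsight_goal = episode[-1][2]  # what the arm actually reached at the end
    relabeled = []
    for state, action, achieved, _ in episode:
        reward = sparse_reward(achieved, hindsight_goal)
        relabeled.append((state, action, hindsight_goal, reward))
    return relabeled

def sparse_reward(achieved, goal, tol=0.05):
    # 0 within tolerance of the goal, -1 otherwise (a common sparse scheme).
    dist = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
    return 0.0 if dist < tol else -1.0
```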
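
And here is a simplified view of the TD3-style training objectives, written with PyTorch. This is our sketch: real TD3 also uses twin critics, target-policy smoothing, and delayed actor updates, and RoboBallet’s actor and critic are graph neural networks over the workspace.

```python
import torch
import torch.nn.functional as F

# Simplified TD3-style losses (a sketch, not the authors' code).
def critic_loss(critic, target_critic, target_actor, batch, gamma=0.99):
    states, actions, rewards, next_states = batch
    with torch.no_grad():
        # Bootstrapped estimate of long-term reward from the next state.
        next_actions = target_actor(next_states)
        target_q = rewards + gamma * target_critic(next_states, next_actions)
    return F.mse_loss(critic(states, actions), target_q)

def actor_loss(critic, actor, states):
    # Push the actor toward actions the critic scores highly: actions expected
    # to pay off over time rather than merely maximize immediate reward.
    return -critic(states, actor(states)).mean()
```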

Results: The authors tested the trained model in the real world, generating graphs from the known geometry of a physical workspace, including robot placements and 3D meshes of obstacles.

  • Given new workspaces, the model generated collision-free trajectories for up to 8 Franka Panda robotic arms.
  • RoboBallet effectively parallelized work. Average time to move robots to 20 target positions dropped from 7.5 seconds with 4 arms to 4.3 seconds with 8 arms.
  • In a simplified benchmark with 4 robots and 20 target positions, RoboBallet produced trajectories that matched the best hand-optimized baselines, reaching all target poses within the same 8-to-11-second range.

Why it matters: RoboBallet shows that a learning algorithm can coordinate many robots working together in real-world setups, and it can do so after training exclusively in simulation. The model is also more robust than hard-coded routines, which can’t adapt when a robot fails. Because the graph neural network continuously tracks how robots, tasks, and obstacles relate, it can revise its plan on the fly if a robot drops out.

We’re thinking: Representing the world as a graph builds relational structure into the data, capturing relative positions and relationships among objects. Other data structures don’t encode those relationships, so a network that learns from them must learn the relationships on top of the task itself. A graph spares the network that extra work, making the task easier to learn.
