Roboticists often train their machines in simulation, where the controller model can learn from millions of hours of experience. A new method trained robots in the real world in 20 minutes.
What's new: Laura Smith, Ilya Kostrikov, and Sergey Levine at UC Berkeley introduced a process to rapidly train a quadruped robot to walk in a variety of real-world terrains and settings.
Key insight: One way to train a model on less data is to train it repeatedly on the same examples (in this case, the robot's orientation, velocity, and joint angles at specific points in time). However, this may lead the model to overfit (for instance, the robot may learn to walk effectively only on the terrain used in training). Regularization or normalization enables a model to train multiple times on the same examples without overfitting.
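One such normalization, layer normalization, rescales a layer's activations to zero mean and unit variance before they pass to the next layer. A minimal plain-Python sketch (the activation values are made up for illustration; in practice a learned per-unit gain and bias are applied afterward):

```python
import math

def layer_norm(activations, eps=1e-5):
    """Rescale a layer's activations to zero mean and unit variance.
    eps avoids division by zero when the activations are all equal."""
    mean = sum(activations) / len(activations)
    var = sum((a - mean) ** 2 for a in activations) / len(activations)
    return [(a - mean) / math.sqrt(var + eps) for a in activations]

# Made-up activations for one layer of four units
normed = layer_norm([2.0, -1.0, 0.5, 3.5])
```

Because the normalized activations always land in the same range, the layer's inputs stay well-behaved even when the network trains repeatedly on the same examples.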
How it works: The authors trained a motion-planning model to move a Unitree A1 robot forward on a given terrain using an actor-critic algorithm, a reinforcement-learning method in which an actor function learns to take actions that maximize the total return (roughly the sum of all rewards) estimated by a critic function. The actor was a vanilla neural network and the critic was an ensemble of such networks.
- The actor, given the robot’s current orientation, angular and linear velocity, joint angles, joint velocities, which feet were touching the ground, and the previous action, generated target joint angles.
- The critic encouraged the actor to move the robot forward within a range of speed defined by the authors. It also discouraged the actor from turning sideways.
- After each movement, the critic learned to estimate the expected future reward by minimizing the difference between its estimate before the movement and the sum of the actual reward and its estimate after the movement.
- The actor-critic algorithm updated the actor to favor moves for which the critic estimated a higher reward.
- The authors applied layer normalization to the critic, enabling it to update 20 times per movement without overfitting. They updated the actor once per movement.
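The update rule in these bullets can be sketched with a toy example. Everything below is illustrative, not the authors' implementation: the real actor and critic are neural networks fed by the robot's sensors, while this sketch uses a lookup-table critic, a made-up speed reward, and made-up hyperparameters to show the critic's update and the 20-to-1 update ratio:

```python
GAMMA = 0.99          # discount factor for future rewards (illustrative)
LR = 0.1              # critic learning rate (illustrative)
CRITIC_UPDATES = 20   # critic updates per movement, as in the text

critic = {}           # (state, action) -> estimated future reward

def estimate(state, action):
    return critic.get((state, action), 0.0)

def critic_update(state, action, reward, next_state, next_action):
    """Move the critic's estimate before the movement toward the actual
    reward plus the discounted estimate after the movement."""
    target = reward + GAMMA * estimate(next_state, next_action)
    error = target - estimate(state, action)
    critic[(state, action)] = estimate(state, action) + LR * error

def speed_reward(v, lo=0.5, hi=1.0):
    """Illustrative reward: full credit for forward speed inside a target
    range, a penalty proportional to the miss outside it."""
    if lo <= v <= hi:
        return 1.0
    return -min(abs(v - lo), abs(v - hi))

# One simulated movement: the critic trains 20 times on the same
# transition (layer normalization makes this safe in the real model),
# after which the actor would update once using the critic's estimate.
r = speed_reward(0.8)  # robot moved forward at a speed within the range
for _ in range(CRITIC_UPDATES):
    critic_update("s0", "a0", r, "s1", "a1")
```

Each pass nudges the critic's estimate for the state-action pair toward the same target, so repeating the update 20 times extracts more learning from a single real-world movement.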
Results: The authors trained the model to walk the robot on each of five surfaces (starting from scratch for each surface): flat ground, mulch, lawn, a hiking trail, and a memory foam mattress. The robot learned to walk on each in about 20 minutes, which is roughly equivalent to 20,000 examples. Competing methods require either simulation or more time in the real world. For example, the authors of DayDreamer: World Models for Physical Robot Learning trained the same type of robot to walk on an indoor surface without simulation, but it took one hour and 3.6 times as many examples.
Why it matters: Training on simple features (those with a small number of dimensions, such as robot orientation and velocity) rather than complex features (such as images) reduces the number of examples required to learn a task, and regularizing the model prevents overfitting. This is a simple, general setup to train reinforcement learning models in the real world.
We're thinking: Reinforcement learning algorithms are famously data-hungry, which is why much of the progress in the past decade was made in simulated environments. A recipe for training a quadruped rapidly in the real world is a great step forward!