Autonomous robots trained to navigate in a simulation often struggle in the real world. New work helps bridge the gap in a counterintuitive way.
What’s new: Joanne Truong and colleagues at Georgia Institute of Technology and Meta proposed a training method that gives robots a leg up in the transition from simulation to reality. They found that training in a crude simulation produced better performance in the real world than training in a more realistic sim.
Key insight: When using machine learning to train a robot to navigate, it stands to reason that a more realistic simulation would ease its transition to the real world — but this isn’t necessarily so. The more detailed the simulation, the more likely the robot’s motion planning algorithm will overfit to the simulation’s flaws or bog down in processing, hindering real-world operation. One way around this is to separate motion planning from low-level control and train the motion planner while “teleporting” the robot from one place to another without locomotion. Once deployed, the motion planner can pass commands to an off-the-shelf, non-learning, low-level controller, which in turn calculates the details of locomotion. This avoids both the simulation errors and intensive processing, enabling the robot to operate more smoothly in the real world.
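The split described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names (`MotionPlanner`, `teleport_step`) and the heuristic policy are assumptions, and the real planner would run a CNN+LSTM over depth images.

```python
import numpy as np

class MotionPlanner:
    """Stand-in for the learned high-level policy: maps an observation
    and a goal to a velocity command (linear speed, angular speed)."""
    def act(self, depth_image: np.ndarray, goal: np.ndarray) -> tuple[float, float]:
        # Placeholder heuristic: turn toward the goal and move forward.
        # In the paper's setup a CNN+LSTM produces this command instead.
        heading = float(np.arctan2(goal[1], goal[0]))
        return 0.5, float(np.clip(heading, -1.0, 1.0))

def teleport_step(pose: np.ndarray, cmd: tuple[float, float],
                  dt: float = 0.1) -> np.ndarray:
    """Training-time 'low-level controller': integrate the velocity
    command directly, moving the robot without simulating legs."""
    x, y, theta = pose
    v, w = cmd
    theta += w * dt
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta])
```

At deployment, the same velocity commands would instead be handed to a non-learning controller (such as Spot's built-in one) that works out the leg motions.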
How it works: The authors trained two motion planners (each made up of a convolutional neural network and an LSTM) to move a Boston Dynamics Spot through simulated environments. One learned to navigate by teleporting, the other by moving simulated legs.
- The motion planners used the reinforcement learning method DD-PPO to navigate to goal locations in over 1,000 high-resolution 3D models of indoor environments.
- They were rewarded for reaching their goals and penalized for colliding with obstacles, moving backward, or falling.
- Given a goal location and a series of depth images from the robot’s camera, the motion planners learned to estimate a velocity (speed plus direction) to move the robot’s center of mass.
- In simulation, one motion planner sent velocities to a low-level controller that simply teleported the robot to a new location without moving its legs. The other sent velocities to a low-level controller, adopted from other work, that converted the output into motions of simulated legs (and thus raised the chance of being penalized).
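The reward structure in the bullets above can be sketched as a per-step function. The coefficients below are placeholders for illustration, not the paper's values; the distance-based progress term is a common shaping choice in DD-PPO point-goal setups and is an assumption here.

```python
def step_reward(reached_goal: bool, collided: bool, backward: bool,
                fell: bool, dist_prev: float, dist_now: float) -> float:
    """Reward for one step: progress toward the goal, a bonus for
    success, and penalties for collisions, backward motion, and falls.
    All constants are illustrative placeholders."""
    r = dist_prev - dist_now  # reward shaping: progress toward the goal
    if reached_goal:
        r += 10.0   # success bonus
    if collided:
        r -= 1.0    # collision penalty
    if backward:
        r -= 0.1    # backward-motion penalty
    if fell:
        r -= 5.0    # fall penalty
    return r
```

Because the teleporting controller never simulates legs, its agent rarely triggers the fall penalty, while the legged agent is exposed to all of these penalties during training.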
Results: The authors tested each motion planner on a real Spot unit in an office lobby, replacing the low-level controllers used in training with Spot’s built-in controller. The motion planner trained with teleportation took the robot to its goal 100 percent of the time, while the one trained with simulated legs succeeded 67.7 percent of the time.
Yes, but: Dividing robotic control between high- and low-level policies enabled the authors to dramatically simplify the training simulation. However, they didn’t compare their results with those of systems that calculate robot motion end-to-end.
Why it matters: Overcoming the gap between simulation and reality is a major challenge in robotics. The finding that lower-fidelity simulation can narrow the gap defies intuition.
We’re thinking: Simplifying simulations may benefit other reinforcement learning models that are expected to generalize to the real world.