A dog doesn’t think twice about fetching a tennis ball, but an autonomous robot typically suffers from delays between perception and action. A new machine-learning model helped a quadruped robot coordinate its sensors and actuators.
What's new: Chieko Sarah Imai and colleagues at the University of California devised a reinforcement learning method, Multi-Modal Delay Randomization (MMDR), that approximates real-world latency in a simulated training environment, enabling a learned control policy to compensate for it.
Key insight: Most robot simulations assume that the machine acts the instant its surroundings change. In the real world, though, it takes time for a sensor to read the environment, for a neural network to compute an action, and for motors to execute it; by then, the environment has shifted again. Simulating the latency of sensors that track position and movement during training helps a model learn to adjust accordingly, but that doesn't account for lags in reading and processing visual input. Simulating a separate latency for vision addresses this issue.
How it works: The authors trained their system to compute optimal angles for a simulated robot's joints using the reinforcement learning algorithm proximal policy optimization. The virtual robot traversed uneven virtual ground between box-like obstacles in a PyBullet simulation.
- During training, the authors maintained a buffer of 16 frames from a virtual depth camera. They split the buffer into quarters and randomly selected a frame from each quarter to simulate variable latency in real-world depth perception.
- Similarly, they buffered readings from position and movement sensors, such as the angles of the robot's joints. To vary the simulated latency more finely, they randomly chose two adjacent readings and interpolated between them.
- Selected frames of depth information went to a convolutional neural network, and position and movement sensor readings went to a vanilla neural network. The system concatenated the representations from both networks and passed them to another vanilla neural network, which generated target angles for each joint.
- The reward function encouraged the virtual robot to keep moving forward and not to fall while minimizing the virtual motors’ energy cost.
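The quarter-wise frame sampling and proprioceptive interpolation described above can be sketched in a few lines of Python. Buffer contents, image size, and the number of joint readings here are our assumptions for illustration, not the paper's exact values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 16 buffered 64x64 depth frames and 16 buffered
# proprioceptive readings (e.g., 12 joint angles).
depth_buffer = rng.random((16, 64, 64))
proprio_buffer = rng.random((16, 12))

def sample_depth(buffer, rng):
    """Pick one random frame from each quarter of the 16-frame buffer,
    simulating variable latency in depth perception."""
    quarters = np.split(buffer, 4)               # four chunks of 4 frames
    picks = [q[rng.integers(len(q))] for q in quarters]
    return np.stack(picks)                       # shape: (4, H, W)

def sample_proprio(buffer, rng):
    """Blend two randomly chosen adjacent readings to simulate
    fine-grained latency variation."""
    i = rng.integers(len(buffer) - 1)
    alpha = rng.random()                         # blend factor in [0, 1)
    return (1 - alpha) * buffer[i] + alpha * buffer[i + 1]

depth_obs = sample_depth(depth_buffer, rng)      # (4, 64, 64)
proprio_obs = sample_proprio(proprio_buffer, rng)  # (12,)
```

Sampling one frame per quarter preserves temporal ordering while varying how stale each frame is, which is the point of the randomization.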
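The two-branch architecture can be sketched as a forward pass. To keep the sketch short and dependency-free, a flatten-and-project layer stands in for the convolutional depth encoder, and all dimensions and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0)

# Assumed inputs: 4 sampled 64x64 depth frames and 12 proprioceptive values.
depth = rng.random((4, 64, 64))
proprio = rng.random(12)

W_depth = rng.standard_normal((4 * 64 * 64, 32)) * 0.01  # stand-in for the conv encoder
W_prop = rng.standard_normal((12, 32)) * 0.1             # proprioceptive encoder
W_head = rng.standard_normal((64, 12)) * 0.1             # policy head

depth_feat = relu(depth.reshape(-1) @ W_depth)           # (32,) depth representation
prop_feat = relu(proprio @ W_prop)                       # (32,) proprioceptive representation

# Concatenate both representations and map to one target angle per joint.
joint_targets = np.concatenate([depth_feat, prop_feat]) @ W_head  # (12,)
```

In the paper's system, the depth branch is a convolutional network and the other two components are plain fully connected networks, as described above.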
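A reward of the kind described in the last bullet might look like the following. The weights and the specific penalty terms are hypothetical; the paper's reward may combine its terms differently:

```python
def reward(forward_velocity, motor_energy, fell,
           v_weight=1.0, e_weight=0.005, fall_penalty=10.0):
    """Illustrative reward: pay for forward progress, charge for
    motor energy use, and penalize falling. Weights are assumptions."""
    r = v_weight * forward_velocity - e_weight * motor_energy
    if fell:
        r -= fall_penalty
    return r
```

Under this shaping, the policy earns more by moving forward efficiently and loses a large lump sum for falling, matching the incentives described above.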
Results: The authors tested a Unitree A1 robot in the real world, comparing MMDR with two alternatives they call No-Delay and Frame-Extract. No-Delay used only the four most recent frames as input. Frame-Extract was similar to MMDR but used the initial frame from each buffered sequence. MMDR consistently traveled the most steps across a variety of terrains. For example, in nine forest trials, the robot using MMDR moved an average of 992.5 steps versus 733.8 for No-Delay and 572.4 for Frame-Extract.
Why it matters: Robots in the wild often face different mechanical and environmental conditions than a simulation can reproduce. To build autonomous machines that work in the real world, it’s critical to account for all kinds of latency in the system.
We're thinking: Roboticists and mechanical engineers who work with physical robots have been accounting for various control latencies for decades. But much of the recent activity in reinforcement learning has involved simulated environments. We’re glad to see researchers working to bridge the sim-to-real gap and address the challenges of working with physical robots.