Armchair speed demons have a new nemesis.
What’s new: Peter Wurman and a team at Sony developed Gran Turismo Sophy (GT Sophy), a reinforcement learning model that defeated human champions of Gran Turismo Sport, a PlayStation game that simulates auto races right down to tire friction and air resistance.
Key insight: It’s okay to bump another car while racing (as in the video above), but the line between innocuous contact and contact that gives the offender an advantage is thin and subjective. In official Gran Turismo Sport competitions, as in real-world races, a human referee makes these calls and penalizes errant drivers. A reinforcement learning algorithm can model such judgments by assigning a cost to each collision, but that cost must be tuned carefully to avoid hurting performance: too high a penalty and drivers become timid; too low and they become dangerous. Penalizing common situations in which a driver typically would be judged at fault, such as rear-ending, side-swiping, and colliding on a curve, should help a neural network learn to drive boldly without ramming opponents to gain an advantage.
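A minimal sketch of such a collision cost, assuming the game can report whether the learning car rear-ended an opponent, side-swiped it, or made contact on a curve. The `Contact` fields, the `collision_penalty` function, and its default weights are hypothetical illustrations, not GT Sophy’s actual reward terms or values.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One car-to-car contact seen from the learning agent's side (hypothetical fields)."""
    rear_ended_opponent: bool  # agent's front hit the opponent's rear
    side_swiped: bool          # agent struck the opponent side to side
    on_curve: bool             # contact happened in a corner


def collision_penalty(c: Contact, base_cost: float = 1.0, at_fault_cost: float = 5.0) -> float:
    """Return a negative reward for one contact.

    Every collision pays base_cost; each at-fault pattern it matches adds
    at_fault_cost. Both weights are illustrative and would need tuning:
    too large and the agent turns timid, too small and it turns reckless.
    """
    # Python counts True as 1, so this is the number of at-fault patterns matched.
    at_fault_matches = sum([c.rear_ended_opponent, c.side_swiped, c.on_curve])
    return -(base_cost + at_fault_cost * at_fault_matches)


# A light bump that matches no at-fault pattern costs little...
print(collision_penalty(Contact(False, False, False)))  # -1.0
# ...while rear-ending someone in a corner costs much more.
print(collision_penalty(Contact(True, False, True)))    # -11.0
```

Separating a small base cost from a larger at-fault surcharge is one way to let harmless contact through while still discouraging the kinds of hits a referee would penalize.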
How it works: Given information about the car and its environment, a vanilla neural network decided how to steer and accelerate. The authors trained the network on three virtual tracks and in custom scenarios, such as setting up a slingshot pass (drafting behind an opponent, then pulling out to overtake), that pitted the model against itself, previous iterations of itself, and the in-game AI.
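Training against a mix of opponents can be pictured as a sampling step at the start of each practice race. In the sketch below, `sample_opponents`, the string stand-ins for policies, and the 40/40/20 mixing weights are all hypothetical; the article says only that the model faced itself, previous iterations of itself, and the in-game AI.

```python
import random


def sample_opponents(current, snapshots, builtin_ai, n, rng, weights=(0.4, 0.4, 0.2)):
    """Choose a controller for each of n opponent cars in one training race.

    Each car is driven by the current policy (pure self-play), a randomly
    chosen earlier snapshot of it, or the game's built-in AI. The 40/40/20
    mix is an illustrative assumption, not a value reported by the authors.
    """
    pools = [[current], list(snapshots), [builtin_ai]]
    # Skip empty pools (e.g. no snapshots saved yet) together with their weights.
    available = [(pool, w) for pool, w in zip(pools, weights) if pool]
    picked_pools = rng.choices(
        [pool for pool, _ in available],
        weights=[w for _, w in available],
        k=n,
    )
    # Within the chosen pool, pick one controller uniformly at random.
    return [rng.choice(pool) for pool in picked_pools]


rng = random.Random(0)
# Seven opponents for an eight-car race; strings stand in for policy objects.
print(sample_opponents("sophy_latest", ["sophy_ckpt_3", "sophy_ckpt_7"], "game_ai", 7, rng))
```

Keeping older snapshots in the pool is a common way to keep self-play from overfitting to the quirks of the current policy.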
- Ten times a second, the network decided how much to accelerate or brake and how much to turn left or right based on the car’s velocity, acceleration, orientation, weight on each tire, and position; data points describing the track ahead; the positions of surrounding cars; whether it was colliding with a wall or another car; and whether it was off-course.
- During training, a reinforcement learning algorithm rewarded the model for making progress along the track and for gaining ground on opponents. It penalized skidding, touching a wall, letting an opponent gain ground, going off-course, and colliding with an opponent, and it applied additional penalties in the typical at-fault scenarios (rear-ending, side-swiping, and colliding on a curve).
- A separate vanilla neural network, given the information about the car and environment, learned to predict the future reward for taking a given action.
- The first network learned to take actions that maximized the predicted future reward.
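The division of labor between the two networks can be illustrated with a toy actor-critic loop. This is a deliberately simplified sketch, not the authors’ training algorithm: it assumes a one-dimensional state and action, a hypothetical one-step `reward` whose best action is `a = 0.5 * s` (so "future reward" reduces to immediate reward), and linear models in place of the two neural networks. The critic is fit to observed rewards; the actor then follows the critic’s gradient toward actions the critic predicts will pay off.

```python
import numpy as np

rng = np.random.default_rng(0)


def reward(s, a):
    # Hypothetical one-step reward: the best action is a = 0.5 * s.
    return -(a - 0.5 * s) ** 2


def features(s, a):
    # Quadratic features so a linear critic can represent reward(s, a) exactly.
    return np.stack([a**2, a * s, s**2, a, s, np.ones_like(s)], axis=-1)


# 1) Critic: learn to predict reward from (state, action) pairs gathered with random actions.
s_data = rng.uniform(-1, 1, 500)
a_data = rng.uniform(-1, 1, 500)
theta, *_ = np.linalg.lstsq(features(s_data, a_data), reward(s_data, a_data), rcond=None)


def critic(s, a):
    # Predicted reward for taking action a in state s.
    return features(s, a) @ theta


def critic_grad_a(s, a):
    # d/da of theta . [a^2, a*s, s^2, a, s, 1]
    return 2 * theta[0] * a + theta[1] * s + theta[3]


# 2) Actor: a = w * s, trained by gradient ascent on the critic's prediction.
w, lr = 0.0, 0.5
for _ in range(200):
    s = rng.uniform(-1, 1, 64)
    a = w * s
    grad_w = np.mean(critic_grad_a(s, a) * s)  # chain rule: dQ/dw = dQ/da * da/dw
    w += lr * grad_w

print(f"learned actor weight w = {w:.4f}")  # prints 0.5000
```

The actor never sees `reward` directly; it improves only by asking the critic which way to adjust its actions, which mirrors the relationship between the two networks described above.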
Results: In time trials, GT Sophy achieved faster lap times than three of the world’s top Gran Turismo Sport drivers. In addition, a team of four GT Sophys faced off against four of the best human drivers in two sets of three head-to-head races held months apart. Points were awarded based on each car’s finishing position: 10 points for first place, 8 for second, 6 for third, and 5 down to 1 for fourth through eighth. The human team won the first set 86 to 70. The developers then increased the model size and changed some rewards and features, among other tweaks, and the GT Sophy team won the second set 104 to 52.
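To make the scoring concrete, here is a small tally function. The finishing order in the example is hypothetical, not the result of an actual race.

```python
# Points for 1st through 8th place, per the scoring described above.
POINTS = [10, 8, 6, 5, 4, 3, 2, 1]


def team_scores(finishing_order):
    """Sum points per team; finishing_order lists each car's team label, winner first."""
    scores = {}
    for place, team in enumerate(finishing_order):
        scores[team] = scores.get(team, 0) + POINTS[place]
    return scores


# Hypothetical eight-car finish, four cars per team.
print(team_scores(["sophy", "human", "sophy", "sophy", "human", "human", "sophy", "human"]))
# {'sophy': 23, 'human': 16}
```

With eight cars, each race distributes 39 points. Three such races would total 117, fewer than the 156 points in each reported set, so the official events must have weighted at least one race more heavily (double points for one race would give 4 × 39 = 156); the article doesn’t give the details.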
Why it matters: Unlike board games such as chess and Go, in which learning algorithms have beaten human champions, winning a car race requires making complex decisions at high speed while tracing a fine line between nudging and disabling opponents. That said, there’s still a significant gap between doing well in even an exceptionally realistic video game and driving a real car.
We’re thinking: Autonomous driving requires perception, planning, and control. We have little doubt that the latest algorithms can outperform most human drivers in control, but a substantial gap remains in perception and planning.