Visual Strategies for RL

Plan2Vec helps reinforcement learning with complex tasks.

Published: Jun 10, 2020

Reinforcement learning can beat humans at video games, but humans are better at coming up with strategies to master more complex tasks. New work enables neural networks to connect the dots.

What’s new: Ge Yang and Amy Zhang led researchers at Facebook, McGill University, and UC Berkeley to create Plan2Vec, a method that helps reinforcement learning systems strategize by representing each observation of a given task as a point on a surface.

Key insight: Reinforcement learning tasks generally involve reaching a goal as efficiently as possible. If a model can represent the task at hand as a weighted graph of points in space, then a conventional planning algorithm can find the shortest path between any two points. For example, Plan2Vec observes solutions to a maze and distorts its representation of the maze so that points along a path to the exit lie closer together.
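
To make the planning step concrete, here is a minimal sketch of a conventional shortest-path search (Dijkstra's algorithm) over a small weighted graph. The toy graph, its node names, and the function name `shortest_path` are illustrative assumptions and do not come from the Plan2Vec paper.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts weighted graph.

    Returns (total_cost, path), or (inf, []) if the goal is unreachable.
    """
    best = {start: 0.0}        # cheapest known cost to each node
    parent = {}                # back-pointers for rebuilding the path
    frontier = [(0.0, start)]  # priority queue ordered by cost
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(frontier, (new_cost, neighbor))
    return float("inf"), []


# Hypothetical toy graph: edge weights stand in for distances between states.
maze = {
    "start": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "goal": 5.0},
    "b": {"goal": 1.0},
    "goal": {},
}
cost, path = shortest_path(maze, "start", "goal")
print(cost, path)
```

The search prefers the three-step route through `a` and `b` over either two-step route because its total weight is lower, which is the sense in which a good distance representation turns planning into a routine graph problem.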

How it works: Training data for a reinforcement learning task consists of sequences of states and actions. The distance between any two states in general is not known, but the distances between states in a sequence are known.

Plan2Vec first learns to distinguish whether or not states are neighbors using noise-contrastive estimation. This method teaches the network to mark consecutive states in a sequence as close together and non-consecutive states as far apart.
From the predicted neighboring states, Plan2Vec extrapolates whether states from different sequences are neighbors, producing a graph that connects identified neighbors.
A planning algorithm uses the graph to generate a continuous surface that captures the predicted distances between all states.
To solve a task, Plan2Vec places the starting and goal states on the surface. A planning algorithm then finds the shortest path between them.
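
The steps above can be sketched roughly as follows. This is a simplified illustration under loud assumptions, not the authors' implementation: states are 2D points rather than images, a Euclidean distance threshold stands in for the neighbor predictor learned with noise-contrastive estimation, breadth-first search over unit-length edges stands in for the planner, and the learned continuous surface is omitted entirely. The names `build_graph`, `plan`, and `is_neighbor` are hypothetical.

```python
import math
from collections import deque


def build_graph(trajectories, is_neighbor):
    """Link consecutive states, then link states the predictor calls neighbors."""
    states = [s for traj in trajectories for s in traj]
    edges = {s: set() for s in states}
    # Consecutive states within one sequence are known to be neighbors.
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            edges[a].add(b)
            edges[b].add(a)
    # Assumption: is_neighbor stands in for the learned neighbor predictor
    # and is applied to every pair, including pairs from different sequences.
    for i, a in enumerate(states):
        for b in states[i + 1:]:
            if is_neighbor(a, b):
                edges[a].add(b)
                edges[b].add(a)
    return edges


def plan(edges, start, goal):
    """Breadth-first search: the path with the fewest hops, or [] if none."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in edges[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []


# Two hypothetical sequences that never share a state.
traj_a = [(0, 0), (1, 0), (2, 0)]
traj_b = [(2, 1), (3, 1), (4, 1)]
graph = build_graph(
    [traj_a, traj_b],
    is_neighbor=lambda a, b: math.dist(a, b) <= 1.0,
)
route = plan(graph, (0, 0), (4, 1))
print(route)
```

Neither sequence alone connects `(0, 0)` to `(4, 1)`. The route exists only because the stand-in predictor links `(2, 0)` and `(2, 1)` across sequences, which mirrors the role the extrapolated neighbor graph plays in the description above.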

Results: Plan2Vec completed a 2D maze 80 percent of the time compared with a variational autoencoder (VAE) approach’s 53 percent. It solved StreetLearn, which requires navigation based on scenes along a path rather than a map, 92 percent of the time, while the VAE was successful in 26 percent of attempts.

Why it matters: VAEs are good at extracting low-dimensional features from images, but the meaning of those features may not be easy to interpret. Plan2Vec creates a surface that represents how various states in a task relate to one another. This representation makes it easier to learn — and interpret — efficient solutions.

We’re thinking: If we could see the strategic surface of Go, would Move 37 make sense to someone who isn’t a Grandmaster?
