Eyes on the Prize

Vision-only reinforcement learning improves generalizability.

Published Jul 29, 2020

When the chips are down, humans can track critical details without being distracted by irrelevancies. New research helps reinforcement learning models similarly focus on the most important details.

What’s new: Google’s Yujin Tang, Duong Nguyen, and David Ha developed a reinforcement learning approach that teaches an agent to pay attention only to visual information that helps accomplish a task. This strategy makes it easier for the agent to perform similar tasks in new environments.

Key insight: In the earlier World Models approach, an agent learned a representation of everything it observed and used that knowledge to predict the outcomes of future experiences. Memorizing the entire world isn’t necessary, because many observable details, such as background color, don’t help solve the task. An agent should perform better, especially in unfamiliar environments, if it blocks out such details.

How it works: The authors’ approach effectively preprocesses an image before the agent considers it in selecting an action.

Presented with a new image, the model splits it into small patches. It multiplies each patch’s pixel values by a matrix to transform them into a four-dimensional vector (four being a hyperparameter).
A self-attention layer, stripped of the usual feed-forward layer to reduce the number of parameters, scores each patch’s relevance to the task at hand, and the patches are ranked by score.
Sorting the patches by rank isn’t differentiable, so the agent can’t learn which patches are most relevant through backpropagation. Instead, the researchers used covariance matrix adaptation evolution strategy (CMA-ES), an evolutionary technique that repeatedly samples a population of candidate models, evaluates each one, and shifts its sampling distribution toward the best performers.
The highest-ranked patches (how many is a hyperparameter) feed an LSTM layer, which predicts an action.
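The patch-ranking steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors’ code: the patch size (7), stride (4), number of kept patches (k = 10), and random weights are hypothetical choices. Turning attention into a relevance score by summing the columns of the softmax attention matrix is one plausible reading of "ranks each patch’s relevance," and the sketch also skips the value projection, since only the attention weights are needed for ranking. It returns the selected patches and their centre coordinates and stops before the LSTM.

```python
import numpy as np

def extract_patches(image, patch_size=7, stride=4):
    """Cut an (H, W, C) image into overlapping square patches.

    Returns an (N, patch_size * patch_size * C) array of flattened
    patches and an (N, 2) array with each patch's centre (row, col).
    """
    h, w, _ = image.shape
    patches, centres = [], []
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            patches.append(image[r:r + patch_size, c:c + patch_size].reshape(-1))
            centres.append((r + patch_size / 2, c + patch_size / 2))
    return np.stack(patches), np.array(centres)

def patch_importance(patches, w_q, w_k):
    """Score patches with a self-attention layer that has no feed-forward
    layer (and, as a simplification here, no value projection).

    Each flattened patch is multiplied by two matrices to get a
    d-dimensional query and key (d = 4 in the article). Row i of the
    softmax attention matrix shows how patch i spreads its attention over
    all patches; summing each column gives the total attention a patch
    receives, which this sketch uses as its importance score.
    """
    q = patches @ w_q                              # (N, d)
    k = patches @ w_k                              # (N, d)
    logits = q @ k.T / np.sqrt(q.shape[1])         # (N, N) scaled dot products
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # row-wise softmax
    return attn.sum(axis=0)                        # (N,) total attention received

def select_top_k(image, w_q, w_k, k=10, patch_size=7, stride=4):
    """Return the k highest-scoring patches and their centres.

    The argsort here is the non-differentiable ranking step.
    """
    patches, centres = extract_patches(image, patch_size, stride)
    order = np.argsort(-patch_importance(patches, w_q, w_k), kind="stable")
    return patches[order[:k]], centres[order[:k]]

# Usage with random stand-in data (a real agent would use game frames
# and weights found by the evolutionary search).
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
patch_dim = 7 * 7 * 3
w_q = rng.normal(size=(patch_dim, 4))
w_k = rng.normal(size=(patch_dim, 4))
top_patches, top_centres = select_top_k(image, w_q, w_k, k=10)
print(top_patches.shape, top_centres.shape)  # (10, 147) (10, 2)
```

Because each softmax row sums to 1, the importance scores always sum to the number of patches, so they behave like a fixed budget of attention that the patches compete for.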
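The reason an evolutionary method sidesteps the non-differentiable ranking is that it only ever looks at how well each candidate scores, never at gradients. CMA-ES itself also adapts a full covariance matrix and is best taken from a library (the `cma` Python package is one implementation). To show just the core idea, here is a deliberately simpler stand-in: an elite-averaging evolution strategy with a fixed, decaying step size. It is not CMA-ES, and the function name `evolve`, its defaults, and the toy fitness function are hypothetical.

```python
import numpy as np

def evolve(fitness, dim, pop_size=32, elite_frac=0.25, sigma=1.0,
           decay=0.97, generations=300, seed=0):
    """Maximise `fitness` using only its scores (no gradients).

    A simplified elite-averaging evolution strategy, NOT CMA-ES:
    sample a population around the current mean, keep the best
    fraction, move the mean to their average, and shrink the
    sampling spread on a fixed schedule.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(generations):
        population = mean + sigma * rng.standard_normal((pop_size, dim))
        scores = np.array([fitness(p) for p in population])
        # Only the ranking of scores matters, so the fitness may be
        # non-differentiable (it could involve sorting, as in the agent).
        elite = population[np.argsort(-scores)[:n_elite]]
        mean = elite.mean(axis=0)
        sigma *= decay
    return mean

# Toy fitness: negative L1 distance to a target, which has no gradient
# at the optimum. In an RL setting the fitness would typically be the
# reward an agent earns in an episode using parameter vector `params`.
target = np.array([1.0, -2.0, 3.0])

def toy_fitness(params):
    return -np.abs(params - target).sum()

best = evolve(toy_fitness, dim=3)
print(np.round(best, 2))
```

Each generation costs `pop_size` fitness evaluations, which in reinforcement learning means full episodes. That cost grows with the number of parameters being searched, one reason small models pair well with evolutionary training.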

Results: The researchers tested their method on the Car Racing and Doom Take Cover tasks from OpenAI Gym. On both tasks, it surpassed OpenAI Gym’s benchmark score for solving the task.

Why it matters: Giving the agent fewer inputs made it possible to shrink the model, and training with an evolutionary technique kept the number of parameters devoted to self-attention low. The researchers’ agent needed only about 3,700 parameters. World Models, which was already lean compared with earlier approaches, required 4.7 million, more than 1,000 times as many.

We’re thinking: We love AI approaches to car racing, and it looks like this work is braking new ground.
