Ready or Not

Published

Sep 25, 2019

Reading time

2 min read

Independent research lab OpenAI designed virtual agents to play hide-and-seek. They evolved increasingly clever strategies, eventually hacking the game world’s physics to gain advantage.

What happened: The researchers trained the agents to navigate and manipulate their environment and juiced them with reinforcement learning. Then they divided their creations into teams of hiders and seekers and set them loose in a virtual world that included movable blocks, walls, and ramps.

How it works: Seekers scored points if they caught sight of a hider. Hiders scored if they finished a game without being seen. An agent could move or lock objects in place; but only the agent that locked a given object could unlock it again.

The agents figured out the basics over the first several million rounds. Around game 22 million, hiders — which were given a grace period at the start of each round to scramble for cover — began building shelters out of the movable objects.
Roughly 100 million rounds in, seekers learned to infiltrate these hideaways using ramps. A few million later, the hiders stymied this strategy by locking the ramps.
The researchers say they didn’t expect the agents to learn much more. But around game 450 million, seekers discovered they could push blocks around even if they were standing on top. This allowed them to surf to hiders’ walls and walk right into their hideaways (as seen in the animation above).
Hiders eventually discovered the final, unbeatable strategy: Lock up every moveable object they wouldn’t be using as a barricade, then lock themselves inside a shelter of movable walls.

Why it matters: Hide-and-seek strategies could map to many real-world applications. For instance, rescue robots could be programmed as seekers — with rules restricting which types of objects are okay to pick up or move — to sift through rubble for survivors after a disaster.

We’re thinking: Reinforcement learning continues to find clever solutions. But the need to play 480 million rounds limits such techniques to simulated environments. We look forward to breakthroughs in small-data RL that make it possible to apply such techniques to physical robots that can play, say, thousands of games before they wear out. Meta learning, which organizations including OpenAI have worked on, could be an important step in this direction.

Subscribe to The Batch