Artificial intelligence has bested humanity at chess, Go, even StarCraft II. But those games are played against a single opponent in one sitting. Poker requires an AI to learn skills like tracking a table full of players and maximizing winnings over many games.
What’s new: Researchers at Facebook and Carnegie Mellon University developed Pluribus, an AI system that plays No-Limit Texas Hold 'Em against a full table of five opponents. The program fleeced a dozen professional gamblers over 12 days and 10,000 hands.
How the game is played: Pluribus learned by playing hundreds of thousands of hands against versions of itself.
- Rather than beat its opponents in a single game, the program aims to make the most money over a series of games.
- After each play (check, bet, raise, call, or fold), Pluribus predicts how the action will affect the next two or three players’ actions.
- By strategizing only a few moves at a time rather than computing the end-of-game outcome, it economizes compute cycles.
- In the trial, the algorithm showed it had figured out how to bluff. It also became less predictable, for instance by placing unusually big bets to psych out human players.
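To see why looking only a few moves ahead saves compute, here is a minimal sketch of generic depth-limited lookahead on a hypothetical toy game tree. It is not Pluribus's actual search, which the article doesn't detail, and it isn't poker. The names `Node`, `build_tree`, `search`, and `compare` are illustrative, and averaging over opponent replies (as if each were equally likely) is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A position in a hypothetical toy game tree (not a poker state)."""
    value: float  # heuristic estimate of how good this position is for us
    children: List["Node"] = field(default_factory=list)


def build_tree(height: int, branching: int, value: float = 0.0) -> Node:
    """Build a complete tree; child i shifts the parent's value by a fixed amount."""
    node = Node(value)
    if height > 0:
        for i in range(branching):
            shift = i - (branching - 1) / 2  # spreads children around the parent value
            node.children.append(build_tree(height - 1, branching, value + shift))
    return node


def search(node: Node, depth: int, our_turn: bool, stats: Dict[str, int]) -> float:
    """Depth-limited lookahead: stop after `depth` moves and trust the heuristic.

    On our turn we pick the best move (max). On an opponent's turn we
    assume, as a simplification, that every reply is equally likely (average).
    """
    stats["nodes"] += 1  # count every position we examine
    if depth == 0 or not node.children:
        return node.value  # leaf estimate replaces playing the game to the end
    scores = [search(child, depth - 1, not our_turn, stats) for child in node.children]
    return max(scores) if our_turn else sum(scores) / len(scores)


def compare(height: int = 8, branching: int = 3, horizon: int = 2) -> Dict[str, int]:
    """Count positions visited by a full-depth search versus a short-horizon one."""
    root = build_tree(height, branching)
    full, short = {"nodes": 0}, {"nodes": 0}
    search(root, height, True, full)    # searches all the way to the end
    search(root, horizon, True, short)  # looks only `horizon` moves ahead
    return {"full": full["nodes"], "limited": short["nodes"]}


if __name__ == "__main__":
    print(compare())  # {'full': 9841, 'limited': 13}
```

With three choices per move, searching eight moves deep touches 9,841 positions, while a two-move horizon touches 13. The cost of a full search grows exponentially with depth, which is why stopping early and relying on an estimate is so much cheaper.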
A chip and a chair: Pluribus didn’t win every hand, but it won the equivalent of roughly $1,000 an hour, counting each chip as a dollar. The researchers didn’t calculate the human players' rate, but the software's accumulation of chips, especially late in the trial, indicates that it maintained a steady advantage.
Behind the news: Ever since IBM Deep Blue’s 1997 chess victory over world champion Garry Kasparov, engineers have used strategy games to hone machine intelligence. In 2016, AlphaGo, developed by Google’s DeepMind, defeated champion Lee Sedol using strategies nobody had ever seen before. DeepMind struck again earlier this year, taking down a string of StarCraft II pros.
Why it matters: Authors Noam Brown and Tuomas Sandholm believe their technology could be useful in negotiations. Poker mirrors a variety of real-world scenarios, especially in business, which typically involves more than two people, each of whom has hidden motives, with stakes that are rarely zero-sum.
Win more, spend less: Pluribus’ winnings at the table are impressive, but so are its savings at the server. The model was trained on a 64-core server using less than 512 gigabytes of memory. The authors estimate the training cost using cloud computing at around $144. Compare that with the $10,000-plus it can take to train a state-of-the-art language model.
We’re thinking: Pluribus proves that lightweight models are capable of heavy-duty decision making. This could be a boon for resource-strapped developers — and a bane for online gamblers.