Poker Pros School Computer on No-Limit Texas Hold'em
A supercomputer-powered artificial intelligence that had previously beaten all other computer rivals at playing no-limit Texas hold’em fell short of victory when it challenged four of the world’s best human poker players. The unprecedented showdown took place during a two-week competition lasting from April 24 to May 8.
But computer scientists at Carnegie Mellon University in Pittsburgh have already started poring through the competition’s results for lessons on how to improve their poker-playing AI. One of them believes that the computer program could eventually beat the best human players sometime within the next five years.
The “Brains Vs. Artificial Intelligence” competition held at the Rivers Casino in Pittsburgh showed big differences in play style between the computer program and its human opponents. The Carnegie Mellon University computer program, named Claudico, made bets and played hands according to the best poker-playing strategy it could generate without ever adjusting to the human opponents’ play styles. By comparison, the human poker pros were quick to notice any patterns in Claudico’s play that they could exploit.
“The humans looked for holes in Claudico’s strategy all day long, and they were very astute and attuned to finding those holes,” said Tuomas Sandholm, a computer scientist at Carnegie Mellon University. “They were testing Claudico’s strategy in a very different way than our computer opponents did.”
The no-limit Texas hold’em version of poker is a good test for computers because the unrestricted bet sizes add complexity to the problem. The University of Alberta in Canada previously developed a computer program capable of near-optimally solving limit Texas hold’em, a simpler version of poker with restricted bet sizes. But no computer program, including Claudico, has managed to come up with a solution for no-limit Texas hold’em. The Carnegie Mellon University team hopes that pitting Claudico against the best human opponents will help it greatly improve its computer algorithms.
One of Claudico’s signature moves was “limping,” a tactic in which the first player to act simply calls the “big blind” bet by matching its size instead of raising or folding. The human poker pros noticed Claudico’s unusual reliance on limping right away, because most human players consider it a weak strategy. But a preliminary look at the competition results suggests that Claudico’s limping worked out fairly well and was not seriously exploited by its human opponents.
The computer program’s methodical approach has several advantages over the humans, Sandholm said. For example, Claudico’s randomized strategy provided one of its greatest defenses against being exploited. Whenever the program bet a certain amount of money, it always made that bet size with a balanced set of bad and good hands. That meant the human poker pros couldn’t simply assume that a certain bet size from Claudico indicated either a strong or weak hand.
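One simplified way to see why a balanced bet size gives little away is to apply Bayes' rule to a single bet. The sketch below is only an illustration and not a description of Claudico's actual strategy; the 30 percent share of strong hands and the betting frequencies are hypothetical numbers chosen for the example.

```python
def posterior_strong(prior_strong, bet_freq_strong, bet_freq_weak):
    """Bayes' rule: probability the bettor holds a strong hand,
    given that this particular bet size was just made."""
    strong_and_bet = prior_strong * bet_freq_strong
    weak_and_bet = (1.0 - prior_strong) * bet_freq_weak
    return strong_and_bet / (strong_and_bet + weak_and_bet)


# Hypothetical assumption: 30% of the bettor's hands are strong.
PRIOR_STRONG = 0.30

# A "transparent" player makes this bet mostly with strong hands.
transparent = posterior_strong(PRIOR_STRONG, bet_freq_strong=0.90, bet_freq_weak=0.10)

# A "balanced" player makes this bet equally often with strong and weak hands.
balanced = posterior_strong(PRIOR_STRONG, bet_freq_strong=0.50, bet_freq_weak=0.50)

print(f"Transparent player, chance of a strong hand after the bet: {transparent:.1%}")
print(f"Balanced player, chance of a strong hand after the bet:    {balanced:.1%}")
```

Under these assumed numbers, the bet from the transparent player sharply raises the odds of a strong hand, while the same bet from the balanced player leaves an observer at the original 30 percent.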
A second strength for the computer program came from its wide range of bet sizes. Most human poker players typically use one or two bet sizes in any given situation. But Claudico was willing to bet many different amounts of money in any given situation, including very big and very small amounts that wouldn’t make much sense to humans. Doug Polk, widely considered the best heads-up (two-player) no-limit Texas hold’em player, observed that “betting $19,000 to win a $700 pot just isn’t something that a person would do.”
A third strength for Claudico came from its ability to base its play in part on the probability that the two-card hand held by an opponent contains a particular card. Good human players consider the range of cards an opponent might hold, but it’s much more difficult for humans to assign a precise percentage to the chance of a specific card showing up in that hand.
The computer program also had its weaknesses. One weakness, mentioned earlier, was its failure to adapt to individual opponents and exploit their weaknesses. A second weakness came from Claudico’s inability to fully take advantage of the “card removal” effect, which means considering how the cards held by the computer affect the probability of certain other cards appearing in an opponent’s hand.
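The card removal effect can be made concrete with a small counting exercise. The sketch below assumes a standard 52-card deck and an opponent whose two cards are drawn at random from whatever the player cannot see; the function name and the example hand are illustrative and have nothing to do with Claudico's code.

```python
from fractions import Fraction
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # 52 cards, e.g. "As"


def prob_opponent_has_pair(rank, known_cards=()):
    """Chance that a random two-card hand is a pair of `rank`,
    once the cards in `known_cards` are removed from the deck."""
    remaining = [card for card in DECK if card not in known_cards]
    hands = list(combinations(remaining, 2))
    hits = sum(1 for a, b in hands if a[0] == rank and b[0] == rank)
    return Fraction(hits, len(hands))


# No information: 6 ways to hold two aces out of 1,326 possible hands.
no_info = prob_opponent_has_pair("A")

# Holding the ace of spades ourselves removes half of those combinations.
holding_ace = prob_opponent_has_pair("A", known_cards=("As", "Kd"))

print(f"Opponent holds pocket aces, no information: {float(no_info):.3%}")
print(f"Opponent holds pocket aces, we hold an ace: {float(holding_ace):.3%}")
```

Holding a single ace leaves only three of the six possible pocket-ace combinations available to the opponent, which is the kind of adjustment the card removal effect refers to.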
But Claudico continually improved its performance against the human poker pros throughout the two-week competition. The computer program had already been using the Pittsburgh Supercomputing Center’s Blacklight supercomputer to continuously calculate a better overall strategy for several months before the competition, and it continued to refine its poker-playing strategy during the competition. Sandholm’s team also tried out different versions of Claudico’s software modules to see what might improve its play against the human players.
“I think Claudico’s strategy did get better over the match,” Sandholm said. “To us and the human players, it looks like Claudico played much stronger against the humans in the second half of the match.”
As a result, the human poker pros had a smaller winning edge in the second half of the competition compared with the first half. That didn’t stop them from pulling out a victory, finishing a combined $732,713 ahead of Claudico over a combined 80,000 hands of poker. The total amount of money “bet” during the competition was about $170 million. (In reality, the human pros took home appearance fees from $100,000 donated by the Rivers Casino and Microsoft Research.)
But in the bigger sense, the human margin of victory over Claudico was not large enough to statistically determine whether the humans or the computer program were actually the better poker players in the long run. The computer scientists and poker pros agreed on playing a total of 80,000 hands in part because of the limits of human endurance. But a test that pits computers against humans with statistically significant results seems to require an even higher number of hands to be played. Sandholm explains:
“80,000 hands is already a number that humans can barely do during two weeks of full-time play. As the program gets closer and closer to the quality of play for the best humans in the world, it requires that many more hands to get statistically significant results. It’s quite possible that the AI is going to catch up to and surpass the humans and we won’t be able to tell.”
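A rough calculation shows why a shrinking edge demands so many more hands. The sketch below uses the match totals reported above, but the per-hand standard deviation is a hypothetical figure chosen only for illustration, and the calculation treats hands as independent, which is a simplification. It is not the statistical analysis the researchers performed.

```python
import math

HUMAN_MARGIN = 732_713  # dollars the humans finished ahead (from the match)
HANDS_PLAYED = 80_000   # total hands played (from the match)

# Hypothetical assumption: swings of about $3,000 per hand (standard deviation).
ASSUMED_STD_PER_HAND = 3_000.0
Z_95 = 1.96  # two-sided 95% confidence threshold for a normal approximation


def z_score(edge, std_per_hand, hands):
    """How many standard errors the observed per-hand edge sits above zero."""
    return edge * math.sqrt(hands) / std_per_hand


def hands_needed(edge, std_per_hand, z=Z_95):
    """Hands required before an edge of this size reaches the z threshold."""
    return (z * std_per_hand / edge) ** 2


edge = HUMAN_MARGIN / HANDS_PLAYED  # average dollars won per hand

print(f"Observed edge per hand: ${edge:.2f}")
print(f"z-score after {HANDS_PLAYED:,} hands: "
      f"{z_score(edge, ASSUMED_STD_PER_HAND, HANDS_PLAYED):.2f}")
for shrink in (1, 2, 4):
    needed = hands_needed(edge / shrink, ASSUMED_STD_PER_HAND)
    print(f"Edge of ${edge / shrink:.2f} per hand needs about {needed:,.0f} hands")
```

Whatever standard deviation is assumed, the required number of hands grows with the square of the inverse edge: halve the gap between the players and the match has to be four times as long.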
Still, Sandholm believes that Claudico’s successors could become better players of no-limit Texas hold’em than the best human poker pros sometime in the next five years. That could prove good news for everyone, because Sandholm’s team has no real interest in dominating poker tournaments. Instead, the researchers are using poker to improve Claudico’s algorithms so that they can tackle tough real-world problems in security, business and medicine, problems that also feature complexities and unknown factors not unlike those of no-limit Texas hold’em.
The four human poker pros also took away lessons from their match against Claudico, Sandholm said. But the poker pros prefer to keep quiet about those lessons so that they can beat other people more readily in no-limit Texas hold’em tournaments.
“I don’t want to spill the beans,” Sandholm said. “They’re great guys who study hard, use computational tools in their studies and are open to new ideas.”