Experiments

Self-play experiments offer a convenient method for the comparison of two or more versions of the program. Our experiments use a duplicate tournament system, based on the same principle as duplicate bridge. Since each hand can be played with no memory of preceding hands, it is possible to replay the same deal, but with the participants holding a different set of hole cards each time. Our tournament system simulates a ten-player game, where each deal is replayed ten times, shuffling the seating arrangement so that every participant has the opportunity to play each set of hole cards once. This arrangement greatly reduces the "luck element" of the game, since each player will have the same number of good and bad hands. The differences in the performance of players will therefore be based more strongly on the quality of the decisions made in each situation. This reduction in natural variance allows meaningful results to be obtained with a smaller number of trials than in a typical game setting. Nevertheless, it is important to not over-interpret the results of one experiment.
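The seating scheme described above can be sketched in a few lines. This is only an illustration, not Loki's actual tournament code: the text says the seating is shuffled so that every participant plays each set of hole cards once, and the sketch assumes a simple cyclic rotation as one way to achieve that. The names `duplicate_seatings` and `hands_seen` are hypothetical.

```python
from collections import defaultdict


def duplicate_seatings(players):
    """Return one seating per replay of a deal.

    Seat i always receives the i-th set of hole cards, so rotating the
    players through the seats gives each player every set exactly once.
    (Cyclic rotation is an assumption; any Latin-square schedule works.)
    """
    n = len(players)
    return [[players[(seat + shift) % n] for seat in range(n)]
            for shift in range(n)]


def hands_seen(players):
    """Map each player to the set of seats (hole-card sets) it has held."""
    seen = defaultdict(set)
    for seating in duplicate_seatings(players):
        for seat, player in enumerate(seating):
            seen[player].add(seat)
    return dict(seen)


players = [f"P{i}" for i in range(10)]
coverage = hands_seen(players)
```

With ten players this produces ten replays per deal, and `coverage` confirms that each player has held all ten sets of hole cards.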

Experiments have been performed with Loki to measure the performance of generic opponent modeling (GOM), simulation (S), and both combined (GOM+S). The results were obtained by playing a self-play tournament containing two enhanced versions of Loki against eight unenhanced versions. A tournament consisted of 2,500 different deals (i.e. 25,000 games, since each deal is replayed ten times). Each simulation consisted of 500 trials, since the results obtained after 500 trials were reasonably stable (see footnote 4).

The metric used to measure program performance is the average number of small bets won per hand (sb/hand), a metric that is sometimes used by human players. For example, in a game of $10/$20 Hold'em, an improvement of +0.10 sb/hand translates into an extra $30 per hour (based on 30 hands per hour). Anything above +0.05 small bets per hand is considered a large improvement. In play on an Internet poker server against human opponents, Loki has consistently performed at or above the +0.05 sb/hand level.
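The conversion in the example above is a simple product, which the following sketch makes explicit. The function name `dollars_per_hour` is hypothetical; the inputs are the figures quoted in the text ($10 small bet, 30 hands per hour).

```python
def dollars_per_hour(sb_per_hand, small_bet_dollars, hands_per_hour):
    """Convert a win rate in small bets per hand into dollars per hour."""
    return sb_per_hand * small_bet_dollars * hands_per_hour


# $10/$20 Hold'em: the small bet is $10; the text assumes 30 hands per hour.
rate_at_010 = dollars_per_hour(0.10, 10, 30)  # the +0.10 sb/hand example
rate_at_005 = dollars_per_hour(0.05, 10, 30)  # the "large improvement" threshold
```

Under the same assumptions, the +0.05 sb/hand threshold corresponds to half the hourly figure given for +0.10 sb/hand.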

The experiments showed that GOM improved performance by 0.031 ± 0.019 sb/hand, simulations improved it by 0.093 ± 0.04 sb/hand, and the combination was worth 0.095 ± 0.045 sb/hand (note that these are newer numbers than those appearing in [2,3,4]). The results reported here may be slightly misleading since each experiment used two similar programs. As has been shown in chess, one has to be careful about interpreting the results of these types of experiments.

GOM is a significant gain, as expected. Given that all players in the tournaments were variants of Loki, the wide variety of play that is seen in human play is missing. Hence, GOM may be of greater benefit against typical human opponents. Simulations, on the other hand, are a huge win in self-play experiments against non-simulation opponents. As expected, they have a naturally higher variance. The use of simulations represents a large improvement in the quality and variety of the betting strategies employed by Loki (or, possibly, overcomes a serious weakness in the older version of the program). Whereas our initial knowledge-based betting strategy routine [1,14] was limited by the amount of knowledge we could code and tune, the simulation-based approach has no such restrictions. The simulations implicitly enable advanced betting strategies, with a degree of unpredictability that makes it harder for the opponents to model Loki.

Footnote 4: The average absolute difference in expected value in increasing from 500 to 2,000 trials was small and seldom resulted in a significant change to an assessment. The difference between 100 trials and 500 trials was much more significant; the variance with 100 trials was too high.

Note that although each feature is a win by itself, the combination is not necessarily additive because there may be some interdependence between GOM and simulations (i.e. both ideas may exploit the same weaknesses). As well, the magnitude of the simulation improvement is such that it hides the effects of combining it with GOM. The larger the winning margin, the smaller the opportunity there is for demonstrating further improvement against the same opposition.
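The non-additivity can be checked directly against the reported figures. The sketch below is only a reading aid: the text does not say what the ± values represent, so treating them as symmetric half-widths of an interval is an assumption, and the helper names are hypothetical.

```python
# Reported gains in sb/hand as (mean, plus-or-minus value), taken from the text.
results = {
    "GOM": (0.031, 0.019),
    "S": (0.093, 0.04),
    "GOM+S": (0.095, 0.045),
}


def interval(mean, half_width):
    """Symmetric interval around the mean (assumed meaning of the ± value)."""
    return (mean - half_width, mean + half_width)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# If the two features were purely additive, the combination would be worth this:
naive_sum = results["GOM"][0] + results["S"][0]

# How far the measured combination falls short of the naive sum:
shortfall = naive_sum - results["GOM+S"][0]

# Whether simulation alone and the combination can be told apart at all:
sim_vs_combined = overlaps(interval(*results["S"]), interval(*results["GOM+S"]))
```

Under this reading, the measured combination falls below the naive sum of the two individual gains, and the intervals for simulation alone and for the combination overlap, which is consistent with the simulation improvement hiding the effect of adding GOM.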

Each set of improvements reported over the past two years was measured against the previous strongest version of Loki. As a result, the magnitude of the change may be dampened over time, simply because it is being tested against generally stronger opposition. For example, if you have three generations of poker-playing programs (A, B, and C), with B defeating A by 0.1 sb/hand and C defeating B by 0.1 sb/hand, it does not follow that C will be 0.2 sb/hand better than A.

Specific opponent modeling (SOM) is harder to measure, due in part to the nature of our self-play experiments. In previous work we demonstrated improvements for both GOM and SOM against a static default model [2]. However, since that time Loki has improved significantly (for example, with improved reweighting and simulations). A consequence is that our simplistic SOM model has not yet added significantly to the performance of the stronger version of Loki. Improving SOM is our current focus, and some of the ideas we are pursuing are discussed in the next section.

Loki has also been tested in more realistic games against human opposition. For this purpose, the program participates in an on-line poker game running on Internet Relay Chat (IRC). Human players connect to IRC and participate in games conducted by dedicated server programs. No real money is at stake, but bankroll statistics on each player are maintained. The new versions of Loki using GOM and simulations win consistently when playing on the IRC server. Although there is a high level of variance in this environment, there is strong evidence that GOM is a major advance in the program's playing strength against human opposition (as opposed to the self-play experiments, where the advantage was not as significant). The performance of the program depends strongly on which players happen to be playing, and on IRC these range from novices to professional players. Consequently, it is dangerous to quantify the results of our recent improvements to Loki.
