LOKI'S ARCHITECTURE

This section gives a brief overview of the important components of Loki's architecture [4]. Figure 1 illustrates how these components interact. In the diagram, rectangles are major components, rounded rectangles are major data structures, and ovals are actions. The data follows the arrows between components. An annotated arrow indicates how many times data moves between the components for each of our betting actions.

The architecture revolves around generating and using probability triples. A probability triple is an ordered triple of values, PT = [f,c,r], such that f + c + r = 1.0, representing the probability distribution that the next betting action in a given context should be a fold, call, or raise, respectively. The Triple Generator contains our poker knowledge and is analogous to an evaluation function in two-player games. The Triple Generator calls the Hand Evaluator to evaluate any two-card hand in the current context. It uses the resulting hand value, the current game state, and expert-defined betting rules to compute the triple. To evaluate a hand, the Hand Evaluator enumerates all possible opponent hands and counts how many of them would win against, lose to, or tie the given hand.
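To make the enumeration concrete, here is a minimal Python sketch of what the Hand Evaluator computes. It is our illustration, not Loki's code. Cards are encoded as two-character strings such as "Ah", the helper names (rank5, best_rank, enumerate_outcomes, hand_strength) are hypothetical, and the counts are summarized as (ahead + tied/2) / total, which counts a tie as half a win. Every opponent hand is treated as equally likely here.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"  # index 0 is the deuce, index 12 the ace
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # e.g. "Ah" = ace of hearts

# Category numbers for paired hands, keyed by the multiset of rank counts.
PAIRED_CATEGORY = {(4, 1): 7, (3, 2): 6, (3, 1, 1): 3, (2, 2, 1): 2, (2, 1, 1, 1): 1}


def rank5(cards):
    """Comparable key for exactly five cards: a larger tuple is a stronger hand."""
    values = sorted((RANKS.index(c[0]) for c in cards), reverse=True)
    # Most frequent rank first, ties broken by higher rank (trips before the pair).
    groups = sorted(Counter(values).items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    shape = tuple(n for _, n in groups)
    ordered = [v for v, _ in groups]
    flush = len({c[1] for c in cards}) == 1
    straight_high = None
    if len(groups) == 5:  # five distinct ranks
        if values[0] - values[4] == 4:
            straight_high = values[0]
        elif values == [12, 3, 2, 1, 0]:  # A-2-3-4-5: the ace plays low
            straight_high = 3
    if straight_high is not None:
        return (8 if flush else 4, straight_high)  # straight flush or straight
    if flush:
        return (5, *values)
    return (PAIRED_CATEGORY.get(shape, 0), *ordered)  # 0 = high card


def best_rank(cards):
    """Best five-card key from five to seven cards."""
    return max(rank5(five) for five in combinations(cards, 5))


def enumerate_outcomes(hole, board):
    """Count the opponent two-card hands we are ahead of, tied with, and behind."""
    dead = set(hole) | set(board)
    live = [c for c in DECK if c not in dead]
    ours = best_rank(list(hole) + list(board))
    ahead = tied = behind = 0
    for opp in combinations(live, 2):
        theirs = best_rank(list(opp) + list(board))
        if ours > theirs:
            ahead += 1
        elif ours == theirs:
            tied += 1
        else:
            behind += 1
    return ahead, tied, behind


def hand_strength(hole, board):
    """Fraction of opponent hands beaten, counting a tie as half a win (our assumption)."""
    ahead, tied, behind = enumerate_outcomes(hole, board)
    return (ahead + tied / 2) / (ahead + tied + behind)
```

For example, with hole cards Ac-Ad on a flop of Ah-7s-2c, we are ahead of all 1,081 possible opponent hands, so hand_strength returns 1.0. On the flop there are only C(47,2) = 1,081 opponent hands, which is why exhaustive counting is practical.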

Each time it is Loki's turn to bet, the Action Selector uses a single probability triple to decide what action to take. For example, if the triple [0.0,0.8,0.2] were generated, then the Action Selector would never fold, call 80% of the time and raise 20% of the time. A random number is generated to select one of these actions so that the program varies its play, even in identical situations. Although this is analogous to a mixed strategy in game theory, the probability triple implicitly contains contextual information.
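The Action Selector's sampling step can be sketched in a few lines of Python. This is an illustration under our own assumptions, not Loki's implementation: ProbabilityTriple and select_action are hypothetical names, and the third field is spelled raise_ because raise is a Python keyword.

```python
import random
from typing import NamedTuple, Optional


class ProbabilityTriple(NamedTuple):
    fold: float
    call: float
    raise_: float  # trailing underscore: `raise` is reserved in Python


def select_action(triple: ProbabilityTriple, rng: Optional[random.Random] = None) -> str:
    """Pick fold, call, or raise with the probabilities given by the triple."""
    if abs(sum(triple) - 1.0) > 1e-9:
        raise ValueError("a probability triple must sum to 1")
    rng = rng if rng is not None else random.Random()
    r = rng.random()  # uniform in [0, 1)
    if r < triple.fold:
        return "fold"
    if r < triple.fold + triple.call:
        return "call"
    return "raise"
```

Drawing a fresh random number for each decision is what lets identical situations produce different actions; passing a seeded random.Random makes the behaviour reproducible for testing.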

After the flop, the possible opponent hands are no longer equally likely. For example, the probability that an opponent holds Ace-Ace is much higher than the probability that they hold 7-2, since most players will fold 7-2 before the flop. There is a Weight Table for each opponent. Each Weight Table contains one value for each possible two-card hand that the opponent could hold. The value is the probability that the hand would be played exactly as that opponent has played so far. For example, assume that an opponent called before the flop. The updated probability value for the hand 7-2 might be 2%, since it normally should be folded. Similarly, the probability of Ace-King might be 60%, since it would seldom be folded before the flop but is often raised. After an opponent action, the Opponent Modeler updates the Weight Table for that opponent in a process called re-weighting. The value for each hand is increased or decreased to be consistent with the opponent's action. The Hand Evaluator uses the Weight Table in assessing the strength of each possible hand, and these values are in turn used to update the Weight Table after each opponent action. The absolute values of these probabilities are of little consequence, since only the relative weights affect the later calculations. The details are discussed in Section 6.
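A Weight Table and one possible re-weighting step might look like the following Python sketch. The update rule, multiplying each hand's weight by the probability that a per-hand triple assigns to the observed action, is a simplified assumption of ours and not the procedure of Section 6. toy_preflop_triple and its numbers are hypothetical, chosen only to echo the 2% and 60% figures in the example above. weighted_strength shows how the Hand Evaluator could count each opponent hand in proportion to its weight; the outcome function it takes is a stand-in for the evaluator's win/tie/loss comparison.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]
ACTIONS = {"fold": 0, "call": 1, "raise": 2}  # positions within a (f, c, r) triple


def initial_weights(dead_cards):
    """One weight per two-card hand the opponent could hold; all start equal."""
    dead = set(dead_cards)
    live = [c for c in DECK if c not in dead]
    return {frozenset(hand): 1.0 for hand in combinations(live, 2)}


def reweight(weights, action, triple_for_hand):
    """Scale each weight by the probability that this hand would take `action`.

    triple_for_hand(hand) -> (f, c, r) is a hypothetical per-hand model of the opponent.
    """
    i = ACTIONS[action]
    for hand in weights:
        weights[hand] *= triple_for_hand(hand)[i]


def weighted_strength(weights, outcome):
    """Hand strength where each opponent hand counts in proportion to its weight.

    outcome(hand) -> "ahead" | "tied" | "behind" is assumed to come from the Hand Evaluator.
    """
    totals = {"ahead": 0.0, "tied": 0.0, "behind": 0.0}
    for hand, w in weights.items():
        totals[outcome(hand)] += w
    return (totals["ahead"] + totals["tied"] / 2) / sum(totals.values())


def toy_preflop_triple(hand):
    """Hypothetical numbers: 7-2 calls 2% of the time, A-K calls 60%."""
    hi, lo = sorted((RANKS.index(c[0]) for c in hand), reverse=True)
    ten = RANKS.index("T")
    if hi == lo or lo >= ten:
        return (0.0, 0.6, 0.4)  # pairs and two high cards: seldom folded, often raised
    if hi >= ten:
        return (0.3, 0.6, 0.1)
    return (0.97, 0.02, 0.01)  # weak hands are usually folded
```

With these toy numbers, after reweight(w, "call", toy_preflop_triple) an A-K holding keeps weight 0.6 while 7-2 drops to 0.02. Multiplying every weight by the same constant leaves weighted_strength unchanged, which is why only the relative weights matter.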

Figure 1. The architecture of Loki.

Probability triples are used in three places in Loki. The Action Selector uses a probability triple to decide on a course of action (fold, call, raise) as previously described. The Simulator uses probability triples to choose actions for simulated opponent hands (see Section 5). The Opponent Modeler uses an array of probability triples to update the model of each opponent (see Section 6).

An important advantage of the probability triple representation is that imperfect information is restricted to the Triple Generator and does not affect the rest of the program. This is similar to the way that alpha-beta search restricts knowledge to the evaluation function. The probability triple framework allows the "messy" elements of the program to be amalgamated into one component, which can then be treated as a "black box" by the rest of the system. Thus, aspects like game-specific information, complex expert-defined rule systems, and knowledge of human behavior are all isolated from the engine that uses this input.
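The "black box" boundary can be expressed as an interface. In this Python sketch, which is our illustration rather than Loki's design, TripleGenerator is a typing.Protocol whose only obligation is to return a validated ProbabilityTriple. ThresholdGenerator (with made-up thresholds) and FixedGenerator are hypothetical generators, and ActionSelector consumes either one without knowing anything about poker.

```python
import random
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class ProbabilityTriple:
    fold: float
    call: float
    raise_: float  # trailing underscore: `raise` is a Python keyword

    def __post_init__(self):
        parts = (self.fold, self.call, self.raise_)
        if min(parts) < 0 or abs(sum(parts) - 1.0) > 1e-9:
            raise ValueError(f"not a probability distribution: {parts}")


class TripleGenerator(Protocol):
    """The only contract the rest of the program relies on."""

    def generate(self, context: dict) -> ProbabilityTriple: ...


class ThresholdGenerator:
    """Hypothetical stand-in for the knowledge-heavy component."""

    def generate(self, context: dict) -> ProbabilityTriple:
        strength = context["hand_strength"]
        if strength > 0.8:
            return ProbabilityTriple(0.0, 0.3, 0.7)
        if strength > 0.5:
            return ProbabilityTriple(0.0, 0.9, 0.1)
        return ProbabilityTriple(0.7, 0.3, 0.0)


class FixedGenerator:
    """Always returns the same triple; handy for testing consumers in isolation."""

    def __init__(self, triple: ProbabilityTriple):
        self.triple = triple

    def generate(self, context: dict) -> ProbabilityTriple:
        return self.triple


class ActionSelector:
    """Samples an action from whatever triple its generator supplies."""

    def __init__(self, generator: TripleGenerator, rng: Optional[random.Random] = None):
        self.generator = generator
        self.rng = rng if rng is not None else random.Random()

    def act(self, context: dict) -> str:
        t = self.generator.generate(context)
        return self.rng.choices(("fold", "call", "raise"), weights=(t.fold, t.call, t.raise_))[0]
```

Swapping FixedGenerator for ThresholdGenerator changes the program's knowledge without touching ActionSelector, in the same way that an evaluation function can be replaced without rewriting alpha-beta search.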
