Game-playing
CITS3001 Algorithms, Agents and Artificial Intelligence
2021, Semester 2
Tim French
Department of Computer Science and Software Engineering
The University of Western Australia
Introduction
• We will motivate the investigation of games in AI
• We will apply our ideas on search to game trees
– Minimax
– Alpha-beta pruning
• We will introduce the idea of an evaluation function
– And some concepts important to their design
Broadening our worldview
• In our discussions so far, we have assumed that world descriptions have been
– Complete – all information needed to solve the problem is available to the search algorithm
– Deterministic – the effects of actions are uniquely determined and predictable
• But this is rarely the case with real-world problems!
• Sources of incompleteness include
– Sensor limitations – it may be impossible to perceive the entire state of the world
– Intractability – the full state description may be too large to store, or too large to compute
• Sources of non-determinism are everywhere
– e.g. people, weather, mechanical failure, dice, etc.
• Incompleteness ↔ non-determinism?
– Both imply uncertainty
– Addressing them involves similar techniques
Three Approaches to Uncertainty
• Contingency planning
– Build all possibilities into the plan
– Often makes the tree very large
– Can only guarantee a solution if the number of contingencies is tractable
• Interleaving, or adaptive planning
– Alternate between planning, acting, and sensing
– Requires extra work during execution
– Unsuitable for offline planning
• Strategy learning
– Learn, from examples, strategies that can be applied in any situation
– Must decide on parameterisation, state-evaluation, suitable examples to study, etc.
Why Study Games
• Games provide
– An abstraction of the real world
– Well-defined, clear state descriptions
– Limited operations with well-defined consequences
– A way of making incremental, controllable changes
– A way of including hostile agents
• So they provide a forum for investigating many of the real-world issues outlined
previously
– More like the real world than previous examples…
• The initial state and the set of actions (the moves of the game) define a game tree
that serves as the search tree
– But of course different players get to choose actions at various points
– So our previous search algorithms don’t work!
• Games are to AI as F1 is to car design
Example: noughts and crosses
• Each level down represents a move by one player
– Known as one ply
– Stop when we get to a goal state (three in a line)
• What is the “size” of this problem?
Noughts and Crosses Vital Statistics
• The game tree as drawn above has 9! = 362,880 leaves (complete games)
– But that includes games that continue after a victory
– Removing these gives 255,168 complete games
• Combining equivalent game boards leaves 26,830 games
– Mostly this means resolving rotations and reflections
• Each square can be a cross, a circle, or empty
– Therefore there are 3^9 = 19,683 distinct boards
– But that includes (e.g.) boards with five crosses and two circles
– Removing these gives 5,478 distinct legal boards
• Resolving rotations and reflections leaves 765 distinct legal boards
• The takeaway message is “think before you code”!
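These counts can be reproduced with a few lines of code. Below is a minimal Python sketch: it enumerates all 3^9 boards and keeps those reachable in legal play (crosses move first, play stops at a win), printing 19,683 and 5,478.

from itertools import product

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def wins(board, p):
    # does player p have three in a line?
    return any(all(board[i] == p for i in line) for line in LINES)

def legal(board):
    x, o = board.count('X'), board.count('O')
    if x not in (o, o + 1):
        return False                      # X always moves first
    if wins(board, 'X') and wins(board, 'O'):
        return False                      # both players cannot have a line
    if wins(board, 'X') and x != o + 1:
        return False                      # a win by X ends the game on X's move
    if wins(board, 'O') and x != o:
        return False                      # a win by O ends the game on O's move
    return True

boards = list(product('XO.', repeat=9))
print(len(boards))                        # 19683 = 3^9 distinct boards
print(sum(legal(b) for b in boards))      # 5478 distinct legal boards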
Noughts and Crosses Scenarios
• You get to choose your opponent’s moves, and you know the goal, but you don’t know what is a good move
– Normal search works, because you control everything
– What is the best uninformed search strategy?
• How many states does it visit?
– What is a good heuristic for A* here?
• How many states does it visit?
• Your opponent plays randomly
– Does normal search work?
– Uninformed strategy?
– A* heuristic?
• Your opponent tries to win
– We know it’s a draw really
• One important difference with games is that we don’t get to dictate all of the actions chosen; the opponent has a say too!
Perfect Play: the Minimax algorithm
• Consider a two-player game between MAX and MIN
– Moves alternate between the players
• Assume it is a zero-sum game
– Whatever is good for one player, is bad for the other
• Assume also that we have a utility function that we can apply to any game position
– utility(s) returns r ∈ ℝ
– +∞ if s is a win for MAX
– positive if s is good for MAX
– 0 if s is even
– negative if s is good for MIN
– –∞ if s is a win for MIN
• Whenever MAX has the move in position s, they choose the move that maximises the minimax value of the resulting position
– Assuming that MIN chooses optimally
• Conversely for MIN
Minimax(s)
= utility(s), if terminal(s)
= max{Minimax(result(s, a)) | a ∈ actions(s)}, if player(s) = MAX
= min{Minimax(result(s, a)) | a ∈ actions(s)}, if player(s) = MIN
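A direct transcription of this recurrence into Python is sketched below. The game object, with terminal, utility, player, actions and result methods, is an assumed interface mirroring the functions named above.

def minimax(s, game):
    # Minimax value of state s; `game` is an assumed interface exposing
    # terminal(s), utility(s), player(s), actions(s) and result(s, a)
    if game.terminal(s):
        return game.utility(s)
    values = [minimax(game.result(s, a), game) for a in game.actions(s)]
    return max(values) if game.player(s) == 'MAX' else min(values)

MAX would then play the action a in actions(s) that maximises minimax(result(s, a), game), and MIN the action that minimises it.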
Minimax operation
• We imagine that the game tree is expanded to some definition of terminals
– This will depend on the search depth
• In the figure, two ply
– This will depend on the available resources
– In general, it won’t be uniform across the tree
• The tree is generated top-down, starting from the current position
– Then Minimax is applied bottom-up, from the leaves back to the current position
• At each of MAX’s choices, they (nominally) choose the move that maximises the
utility
– Conversely for MIN
Minimax Performance
• Complete: yes, for a finite tree
• Optimal: yes, against an optimal opponent
• Time: O(b^m), all nodes examined
• Space: O(bm), depth-first (or depth-limited) search
• Minimax can be extended straightforwardly
to multi-player games
– Section 5.2.2 of AIMA
• But for a “big” game like chess, expanding to the terminals is completely infeasible
• The standard approach is to employ
– A cut-off test, e.g. a depth limit
• Possibly with quiescence search
– An evaluation function
• Heuristic used to estimate
the desirability of a position
• This will still be perfect play… if we have a perfect evaluation function…
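A hedged sketch of this standard approach, reusing the assumed game interface from above and adding an assumed eval method: the same recursion as before, but cut off at a fixed depth, where the evaluation function stands in for the true utility.

def h_minimax(s, depth, game):
    # depth-limited minimax with a heuristic evaluation at the cut-off
    if game.terminal(s):
        return game.utility(s)
    if depth == 0:
        return game.eval(s)     # heuristic estimate of the position's desirability
    values = [h_minimax(game.result(s, a), depth - 1, game) for a in game.actions(s)]
    return max(values) if game.player(s) == 'MAX' else min(values)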
Example: Chess
• Average branching factor is 35
– Search tree has maybe 35^100 nodes
– Although only around 10^40 distinct legal positions
• Clearly cannot solve by brute force
– Intractable nature → incomplete search
– So offline contingency planning is impossible
• Interleave time- or space-limited search with moves
– This lecture
– Algorithm for perfect play [Von Neumann, 1944]
– Finite-horizon, approximate evaluation [Zuse, 1945]
– Pruning to reduce search costs [McCarthy, 1956]
• Or use/learn strategies to facilitate move-choice based on
current position
– Later in CITS3001
• What do humans do?
Evaluation Functions
• If we cannot expand the game tree to terminal nodes, we expand “as far as we can” and apply some judgement to decide which positions are best
• A standard approach is to define a linear weighted sum of relevant features
– e.g. in chess: 1 for each pawn, 3 for each knight or bishop, 5 for each rook, 9 for each queen
– Plus positional considerations, e.g. centre control
– Plus dynamic considerations, e.g. threats
• eval(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s)
– e.g. w₁ = 9
– e.g. f₁(s) = number of white Qs – number of black Qs
• Non-linear combinations are also used
– e.g. reward pairs of bishops
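A minimal sketch of such a linear weighted sum for chess material (illustrative only; the board representation, a string of piece letters with white pieces upper-case and black lower-case, is an assumption made here for the example):

def material_eval(board):
    # eval(s) = sum of w_i * f_i(s), where each f_i is a material difference
    # `board` is assumed to be a string of piece letters (white upper-case, black lower-case)
    weights = {'p': 1, 'n': 3, 'b': 3, 'r': 5, 'q': 9}
    return sum(w * (board.count(p.upper()) - board.count(p))
               for p, w in weights.items())

print(material_eval("RNBQKBNRPPPPPPP" + "ppppppppnbqkbnr"))   # rook up, pawn down: 5 - 1 = 4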
[Figure: chess positions illustrating material advantage vs. positional advantage]
Properties of good evaluation functions
• Usually the quality of the player depends critically on the quality of the evaluation function
• An evaluation function should
– Agree with the utility function on terminal states
– Reflect the probability of winning
– Be time efficient, to allow maximum search depth
• Note that the exact values returned seldom matter
– Only the ordering matters
• An evaluation could also be accompanied by a measure of certainty
– e.g. we may prefer high certainty when we are ahead, low certainty when we are behind
Cutting off Search
• We can cut off search at a fixed depth
– Works well for simple games
– Depth-limited search
• Often we are required to manage the time taken per move
– Can be hard to turn time into a cut-off depth
– Use iterative-deepening
• An anytime algorithm
– Sacrifice (some) depth for flexibility
• Sometimes we are required to manage the time taken for a series of moves
– More complicated again
– Sometimes we can anticipate changes in the branching factor
• Seldom want cut-off depth to be uniform across the tree
– Two particular issues that arise often are quiescence and the horizon effect
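A sketch of turning a per-move time budget into an anytime search via iterative deepening. It assumes the hypothetical h_minimax and game interface sketched earlier, and that it is MAX to move; a real implementation would also check the clock inside the search, not just between depths.

import time

def choose_move(s, game, seconds):
    # deepen until the budget expires; keep the best move from the last completed depth
    deadline = time.monotonic() + seconds
    best, depth = None, 1
    while time.monotonic() < deadline and not game.terminal(s):
        best = max(game.actions(s),
                   key=lambda a: h_minimax(game.result(s, a), depth - 1, game))
        depth += 1
    return best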
Quiescence
• A quiescent situation is one where values from the evaluation function are unlikely to change much in the near future
• Using a fixed search-depth can mean relying on the evaluations of non-quiescent situations
– Can avoid this by e.g. extending the search to the end of a series of captures
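A hedged sketch of that idea, on the same assumed interface plus a hypothetical is_capture predicate: at the nominal cut-off, keep expanding only capture moves and evaluate once the position is quiet. Real quiescence search (e.g. with stand-pat bounds) is more involved.

def quiescence(s, game):
    # extend the search through capture sequences; evaluate only quiet positions
    if game.terminal(s):
        return game.utility(s)
    captures = [a for a in game.actions(s) if game.is_capture(s, a)]
    if not captures:
        return game.eval(s)             # quiet position: trust the evaluation function
    values = [quiescence(game.result(s, a), game) for a in captures]
    return max(values) if game.player(s) == 'MAX' else min(values)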
The Horizon Effect
• If we are searching to k ply, something bad that will happen on the (k+1)th ply (or later) will be invisible
• In extreme cases, we may even select bad moves, simply to postpone the inevitable
– If “the inevitable” scores –x, any move that scores better than –x in the search window looks good
– Even if the inevitable is still guaranteed to happen later!
• No general solution to this problem
– It is fundamentally a problem with lack of depth
Alpha-Beta Pruning
• One way we can reduce the number of nodes examined by Minimax is to identify
nodes that cannot be better than those that we have already seen
– This will enable a deeper search in the same time
• Consider again Fig. 5.2
• Minimax(A) = max(min(_,_,_), min(_,_,_), min(_,_,_))
– Working from left-to-right
– First we inspect the 3, 12, and 8
• Minimax(A) = max(3, min(_, _, _), min(_, _, _))
– Next we inspect the first 2
• Minimax(A) = max(3, min(2, _, _), min(_, _, _))
– This is less than the 3
– The next two leaves are immediately irrelevant
• Minimax(A) = max(3, ≤2, min(_, _, _)) = max(3, ≤2, 2) = 3
• We do not need to inspect the 5th and 6th leaves
– But we do need to inspect the 8th and 9th…
Alpha-Beta Operation
• We need to keep track of the range of
possible values for each internal node
• In Fig. 5.6, if
– On the left sub-tree, we know definitely
that we can choose a move that gives
score m, and
– On the right sub-tree, we know that the
opponent can choose a move that
limits the score to n ≤ m
• Then we will never (rationally) choose the
move that leads to the right sub-tree
Alpha-beta pseudo-code
αβsearch(s):
return a ∈ actions(s) with value maxvalue(s, –∞, +∞)
maxvalue(s, α, β):
if terminal(s) return utility(s)
else
v = –∞
for a in actions(s)
v = max(v, minvalue(result(s, a), α, β))
if v ≥ β return v
α = max(α, v)
return v
minvalue(s, α, β):
if terminal(s) return utility(s)
else
w = +∞
for a in actions(s)
w = min(w, maxvalue(result(s, a), α, β))
if w ≤ α return w
β = min(β, w)
return w
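The pseudo-code rendered as runnable Python is sketched below, against the same assumed game interface used earlier; the root search is written as an explicit loop that keeps the best action found so far.

import math

def alphabeta_search(s, game):
    # return the action whose backed-up value equals maxvalue(s, -inf, +inf)
    best_a, alpha = None, -math.inf
    for a in game.actions(s):
        v = min_value(game.result(s, a), game, alpha, math.inf)
        if v > alpha:
            alpha, best_a = v, a
    return best_a

def max_value(s, game, alpha, beta):
    if game.terminal(s):
        return game.utility(s)
    v = -math.inf
    for a in game.actions(s):
        v = max(v, min_value(game.result(s, a), game, alpha, beta))
        if v >= beta:
            return v                    # MIN already has a better alternative elsewhere
        alpha = max(alpha, v)
    return v

def min_value(s, game, alpha, beta):
    if game.terminal(s):
        return game.utility(s)
    w = math.inf
    for a in game.actions(s):
        w = min(w, max_value(game.result(s, a), game, alpha, beta))
        if w <= alpha:
            return w                    # MAX already has a better alternative elsewhere
        beta = min(beta, w)
    return w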
Alpha-beta in action
• αβsearch(A) = maxvalue(A, –∞, +∞), v = –∞
• call minvalue(B, –∞, +∞), w = +∞
– call maxvalue(B1, –∞, +∞)
• returns 3, w = 3, β = 3
– call maxvalue(B2, –∞, 3)
• returns 12
– call maxvalue(B3, –∞, 3)
• returns 8
– returns 3, v = 3, α = 3
• call minvalue(C, 3, +∞), w = +∞
– call maxvalue(C1, 3, +∞)
• returns 2, w = 2
– returns 2
• call minvalue(D, 3, +∞), w = +∞
– call maxvalue(D1, 3, +∞)
• returns 14, w = 14, β = 14
– call maxvalue(D2, 3, 14)
• returns 5, w = 5, β = 5
– call maxvalue(D3, 3, 5)
• returns 2, w = 2
– returns 2
• returns 3
Alpha-beta discussion
• Pruning does not affect the final result
– It simply gets us there sooner
• A good move ordering means we can prune more
– e.g. if we had inspected D3 first, we could have pruned D1 and D2
• We want to test expected good moves first
– “Good” from the POV of that node’s player
• “Perfect ordering” can double our search depth
– Obviously perfection is unattainable, but e.g. in chess we might test
• Captures
• Threats
• Forward moves
• Backward moves
• Sometimes we can learn good orderings
– Known as speedup learning
– Can play either faster at the same standard, or better in the same time
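One simple ordering heuristic, sketched on the assumed interface from earlier: sort moves by a shallow evaluation of the resulting position, so that likely-good moves are tried first. Real engines prefer cheaper signals such as captures and threats, or learned orderings.

def ordered_actions(s, game):
    # order moves best-first for the player to move, to increase alpha-beta pruning
    return sorted(game.actions(s),
                  key=lambda a: game.eval(game.result(s, a)),
                  reverse=(game.player(s) == 'MAX'))

Substituting ordered_actions(s, game) for actions(s) in maxvalue and minvalue leaves the result unchanged but typically prunes far more.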
Game playing agents….
• Checkers (Draughts)
– Marion Tinsley ruled Checkers for forty years, losing only seven games in that time
– In 1994 Tinsley’s health forced him to resign from a match against Chinook, which was
crowned world champion shortly afterwards
– At that time, Chinook used a database of 443,748,401,247 endgame positions
– Checkers has since been proved to be a draw with perfect play
• The proof was announced at UWA
• Chinook now plays perfectly, using αβ search and a database of
39,000,000,000,000 positions
• Chess
– Deep Blue defeated Garry Kasparov in a six-game match in 1997
– Deep Blue searches 200,000,000 positions/second, up to 40 ply deep
• Othello
– Look-ahead is very difficult for humans in Othello
– The Moor became world champion in 1980
– These days computers are banned from championship play
Game playing agents….
• Go
– 19×19 Go has a branching factor of over 300, making look-ahead very difficult for programs
• Conventional programs play at a “good amateur” level, although still improving
– They are much better at 9×9
– DeepMind AlphaGo defeated Go champion Lee Sedol in March 2016.
• Backgammon
– Dice rolls increase the branching factor
• 21 possible rolls with two dice
– About 20 legal moves with most positions and rolls
• Although approx 6,000 sometimes with a 1+1!
• Depth 4 means 20 × (21 × 20)^3 ≈ 1,500,000,000 possibilities
– Obviously most of this search is “wasted”
• Value of look-ahead is much diminished
– TD-Gammon (1992) used depth-2 search plus a very good evaluation function to reach “almost” world champion level
• Players have since copied its style!
– Modern programs based on neural networks are believed to be better than the best humans
• Poker
– Pluribus from Facebook and CMU recently beat a group of professional poker players.
http://www.theverge.com/2016/3/9/11184362/google-alphago-go-deepmind-result