

Game-playing

CITS3001 Algorithms, Agents and Artificial Intelligence

2021, Semester 2
Tim French
Department of Computer Science and Software Engineering
The University of Western Australia

Introduction

• We will motivate the investigation of games in AI
• We will apply our ideas on search to game trees

– Minimax
– Alpha-beta pruning

• We will introduce the idea of an evaluation function

– And some concepts important to their design


Broadening our worldview

• In our discussions so far, we have assumed that world descriptions have been
– Complete – all information needed to solve the problem is available to the search algorithm
– Deterministic – the effects of actions are uniquely determined and predictable

• But this is rarely the case with real-world problems!

• Sources of incompleteness include
– Sensor limitations – it may be impossible to perceive the entire state of the world
– Intractability – the full state description may be too large to store, or too large to compute

• Sources of non-determinism are everywhere
– e.g. people, weather, mechanical failure, dice, etc.

• Incompleteness ↔ non-determinism?
– Both imply uncertainty
– Addressing them involves similar techniques

Three Approaches to Uncertainty

• Contingency planning
– Build all possibilities into the plan
– Often makes the tree very large
– Can only guarantee a solution if the number of contingencies is tractable

• Interleaving, or adaptive planning
– Alternate between planning, acting, and sensing
– Requires extra work during execution
– Unsuitable for offline planning

• Strategy learning
– Learn, from examples, strategies that can be applied in any situation
– Must decide on parameterisation, state-evaluation, suitable examples to study, etc.


Why Study Games

• Games provide
– An abstraction of the real world
– Well-defined, clear state descriptions
– Limited operations with well-defined consequences
– A way of making incremental, controllable changes
– A way of including hostile agents

• So they provide a forum for investigating many of the real-world issues outlined previously
– More like the real world than previous examples…

• The initial state and the set of actions (the moves of the game) define a game tree that serves as the search tree
– But of course different players get to choose actions at various points
– So our previous search algorithms don’t work!

• Games are to AI as F1 is to car design

Example: noughts and crosses

• Each level down represents a move by one player
– Known as one ply
– Stop when we get to a goal state (three in a line)

• What is the “size” of this problem?

Noughts and Crosses Vital Statistics

• The game tree as drawn above has 9! = 362,880 leaves (complete games)
– But that includes games that continue after a victory
– Removing these gives 255,168 games

• Combining equivalent games leaves 26,830
– Mostly this means resolving rotations and reflections

• Each square can be a cross, a circle, or empty
– Therefore there are 3^9 = 19,683 distinct boards
– But that includes (e.g.) boards with five crosses and two circles
– Removing these gives 5,478 distinct legal boards

• Resolving rotations and reflections leaves 765 distinct legal boards

• The takeaway message is “think before you code”!
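These counts are small enough to check directly. A minimal Python sketch that enumerates all 3^9 boards and keeps only those reachable in a legal game (the function names and the winning-line test are mine, not from the slides):

    from itertools import product

    # The eight winning lines of a 3x3 board, indexed 0..8 in row-major order
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def wins(board, p):
        """True if player p ('X' or 'O') has three in a line."""
        return any(all(board[i] == p for i in line) for line in LINES)

    def legal(board):
        """Reachable in a legal game: move counts alternate correctly,
        and play stops as soon as one player completes a line."""
        x, o = board.count('X'), board.count('O')
        if x not in (o, o + 1):
            return False
        if wins(board, 'X') and wins(board, 'O'):
            return False
        if wins(board, 'X') and x != o + 1:
            return False
        if wins(board, 'O') and x != o:
            return False
        return True

    boards = list(product('XO.', repeat=9))
    print(len(boards))                     # 19,683 distinct boards
    print(sum(legal(b) for b in boards))   # 5,478 distinct legal boards

Folding out the eight symmetries of the square reduces the 5,478 further, to the 765 quoted above.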

Noughts and Crosses Scenarios

• You get to choose your opponent’s moves, and you know the goal, but you don’t know what is a good move
– Normal search works, because you control everything
– What is the best uninformed search strategy?
  • How many states does it visit?
– What is a good heuristic for A* here?
  • How many states does it visit?

• Your opponent plays randomly
– Does normal search work?
– Uninformed strategy?
– A* heuristic?

• Your opponent tries
– We know it’s a draw really

• One important difference with games is that we don’t get to dictate all of the actions chosen – the opponent has a say too!

Perfect Play: the Minimax algorithm

• Consider a two-player game between MAX and MIN
– Moves alternate between the players

• Assume it is a zero-sum game
– Whatever is good for one player, is bad for the other

• Assume also that we have a utility function that we can apply to any game position
– utility(s) returns r ∈ ℝ
– +∞ if s is a win for MAX
– positive if s is good for MAX
– 0 if s is even
– negative if s is good for MIN
– –∞ if s is a win for MIN

• Whenever MAX has the move in position s, they choose the move that maximises the value of utility(s)
– Assuming that MIN chooses optimally
• Conversely for MIN

Minimax(s)
= utility(s), if terminal(s)
= max{Minimax(result(s, a)) | a ∈ actions(s)}, if player(s) = MAX
= min{Minimax(result(s, a)) | a ∈ actions(s)}, if player(s) = MIN
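Read as code, the recurrence is a single recursive function. A minimal Python sketch, assuming a game object that supplies the functions named above (terminal, utility, player, actions, result):

    def minimax(s, game):
        """Minimax value of state s under the recurrence above."""
        if game.terminal(s):
            return game.utility(s)
        values = [minimax(game.result(s, a), game) for a in game.actions(s)]
        # MAX takes the largest achievable value, MIN the smallest
        return max(values) if game.player(s) == 'MAX' else min(values)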

Minimax operation

• We imagine that the game tree is expanded to some definition of terminals
– This will depend on the search depth
  • In the figure, two ply
– This will depend on the available resources
– In general, it won’t be uniform across the tree

• The tree is generated top-down, starting from the current position
– Then Minimax is applied bottom-up, from the leaves back to the current position

• At each of MAX’s choices, they (nominally) choose the move that maximises the utility
– Conversely for MIN

Minimax Performance

• Complete: yes, for a finite tree

• Optimal: yes, against an optimal opponent

• Time: O(b^m), all nodes examined
• Space: O(bm), depth-first (or depth-limited) search

• Minimax can be extended straightforwardly to multi-player games
– Section 5.2.2 of AIMA

• But for a “big” game like chess, expanding to the terminals is completely infeasible

• The standard approach is to employ
– A cut-off test, e.g. a depth limit
  • Possibly with quiescence search
– An evaluation function
  • A heuristic used to estimate the desirability of a position

• This will still be perfect play… if we have a perfect evaluation function…
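The cut-off test and evaluation function slot straight into the recurrence: the terminal test becomes “terminal or cut off”, and utility is replaced by the heuristic estimate at cut-off nodes. A sketch under the same assumed game interface as before, using a fixed depth limit as the cut-off test:

    def h_minimax(s, depth, game, eval_fn, limit=4):
        """Depth-limited minimax with a heuristic evaluation at the cut-off."""
        if game.terminal(s):
            return game.utility(s)
        if depth >= limit:
            return eval_fn(s)   # heuristic estimate, not the true utility
        values = [h_minimax(game.result(s, a), depth + 1, game, eval_fn, limit)
                  for a in game.actions(s)]
        return max(values) if game.player(s) == 'MAX' else min(values)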

Example: Chess
• Average branching factor is 35

– Search tree has maybe 35^100 nodes

– Although only around 10^40 distinct legal positions

• Clearly cannot solve by brute force

– Intractable nature → incomplete search

– So offline contingency planning is impossible

• Interleave time- or space-limited search with moves

– This lecture

– Algorithm for perfect play [Von Neumann, 1944]

– Finite-horizon, approximate evaluation [Zuse, 1945]

– Pruning to reduce search costs [McCarthy, 1956]

• Or use/learn strategies to facilitate move-choice based on

current position

– Later in CITS3001

• What do humans do?

Evaluation Functions

• If we cannot expand the game tree to terminal nodes, we expand “as far as we can” and apply some judgement to decide which positions are best

• A standard approach is to define a linear weighted sum of relevant features
– e.g. in chess: 1 for each pawn, 3 for each knight or bishop, 5 for each rook, 9 for each queen
– Plus positional considerations, e.g. centre control
– Plus dynamic considerations, e.g. threats

• eval(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s)
– e.g. w₁ = 9
– e.g. f₁(s) = number of white Qs – number of black Qs

• Non-linear combinations are also used
– e.g. reward pairs of bishops

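Such an evaluation is just a dot product of weights and feature values. A sketch with illustrative chess-style material features; the piece weights and the assumed s.count(piece, colour) accessor are examples only, not from the slides:

    # Illustrative material weights: pawn 1, knight/bishop 3, rook 5, queen 9
    WEIGHTS = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

    def features(s):
        """f_i(s) = (white pieces of type i) - (black pieces of type i).
        Assumes the state offers s.count(piece, colour)."""
        return {p: s.count(p, 'white') - s.count(p, 'black') for p in WEIGHTS}

    def eval_linear(s):
        """eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
        f = features(s)
        return sum(WEIGHTS[p] * f[p] for p in WEIGHTS)

Positional and dynamic terms would simply add more (feature, weight) pairs to the sum; non-linear terms, such as a bonus for holding both bishops, fall outside this linear form.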

[Slide diagrams: example chess positions illustrating material advantage and positional advantage]

Properties of good evaluation functions

• Usually the quality of the player depends critically on the quality of the evaluation function

• An evaluation function should
– Agree with the utility function on terminal states
– Reflect the probability of winning
– Be time efficient, to allow maximum search depth

• Note that the exact values returned seldom matter
– Only the ordering matters

• An evaluation could also be accompanied by a measure of certainty
– e.g. we may prefer high certainty when we are ahead, low certainty when we are behind

Cutting off Search

• We can cut off search at a fixed depth
– Works well for simple games
– Depth-limited search

• Often we are required to manage the time taken per move
– Can be hard to turn time into a cut-off depth
– Use iterative deepening (sketched in code below)
  • An anytime algorithm
  • Sacrifice (some) depth for flexibility

• Sometimes we are required to manage the time taken for a series of moves
– More complicated again
– Sometimes we can anticipate changes in the branching factor

• We seldom want the cut-off depth to be uniform across the tree
– Two particular issues that arise often are quiescence and the horizon effect
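A minimal sketch of iterative deepening as an anytime algorithm, assuming a per-move time budget and a helper best_move_at_depth (e.g. a depth-limited search such as h_minimax above); both names are assumptions for illustration:

    import time

    def anytime_search(s, game, eval_fn, budget=1.0):
        """Search one ply deeper each iteration; always hold the best move
        from the deepest iteration completed within the time budget."""
        deadline = time.monotonic() + budget
        best, depth = None, 1
        while time.monotonic() < deadline:
            best = best_move_at_depth(s, game, eval_fn, depth)  # assumed helper
            depth += 1
        return best

A real implementation would also abort an iteration in progress once the deadline passes; this naive loop only checks the clock between iterations.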

Quiescence

• A quiescent situation is one where values from the evaluation function are unlikely to change much in the near future

• Using a fixed search depth can mean relying on the evaluations of non-quiescent situations
– Can avoid this by e.g. extending the search to the end of a series of captures
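One simple (and much simplified) reading of this as code: at a cut-off node, rather than returning eval(s) immediately, keep following “noisy” moves such as captures until the position is quiet. The noisy_actions helper and the game interface are assumptions:

    def quiescence(s, game, eval_fn):
        """Extend the search through capture-like moves until the
        position is quiet (a much-simplified sketch)."""
        if game.terminal(s):
            return game.utility(s)
        stand_pat = eval_fn(s)            # value of stopping the search here
        noisy = noisy_actions(s, game)    # e.g. available captures; assumed helper
        if not noisy:
            return stand_pat              # quiet position: trust the evaluation
        values = [quiescence(game.result(s, a), game, eval_fn) for a in noisy]
        best = max(values) if game.player(s) == 'MAX' else min(values)
        # the side to move may also decline every noisy continuation
        return max(stand_pat, best) if game.player(s) == 'MAX' else min(stand_pat, best)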

The Horizon Effect

• If we are searching to k ply, something bad that will happen on the (k+1)th ply (or later) will be invisible

• In extreme cases, we may even select bad moves, simply to postpone the inevitable
– If “the inevitable” scores –x, any move that scores better than –x in the search window looks good
– Even if the inevitable is still guaranteed to happen later!

• No general solution to this problem
– It is fundamentally a problem with lack of depth

Alpha-Beta Pruning

• One way we can reduce the number of nodes examined by Minimax is to identify nodes that cannot be better than those that we have already seen
– This will enable a deeper search in the same time

• Consider again Fig. 5.2

• Minimax(A) = max(min(_,_,_), min(_,_,_), min(_,_,_))
– Working from left to right
– First we inspect the 3, 12, and 8

• Minimax(A) = max(3, min(_,_,_), min(_,_,_))
– Next we inspect the first 2

• Minimax(A) = max(3, min(2,_,_), min(_,_,_))
– This is less than the 3
– The next two leaves are immediately irrelevant

• Minimax(A) = max(3, min(_,_,_)) = max(3, 2) = 3

• We do not need to inspect the 5th and 6th leaves
– But we do need to inspect the 8th and 9th…

Alpha-Beta Operation

• We need to keep track of the range of possible values for each internal node

• In Fig. 5.6, if
– On the left sub-tree, we know definitely that we can choose a move that gives score m, and
– On the right sub-tree, we know that the opponent can choose a move that limits the score to n ≤ m

• Then we will never (rationally) choose the move that leads to the right sub-tree

Alpha-beta pseudo-code

αβsearch(s):
    return a ∈ actions(s) with value maxvalue(s, –∞, +∞)

maxvalue(s, α, β):
    if terminal(s) return utility(s)
    else
        v = –∞
        for a in actions(s)
            v = max(v, minvalue(result(s, a), α, β))
            if v ≥ β return v
            α = max(α, v)
        return v

minvalue(s, α, β):
    if terminal(s) return utility(s)
    else
        w = +∞
        for a in actions(s)
            w = min(w, maxvalue(result(s, a), α, β))
            if w ≤ α return w
            β = min(β, w)
        return w

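A directly runnable Python version of the pseudo-code above, keeping the same structure; the game interface (terminal, utility, actions, result) is assumed as before:

    import math

    def ab_search(s, game):
        """Return the action for MAX whose alpha-beta value is largest."""
        return max(game.actions(s),
                   key=lambda a: min_value(game.result(s, a), game,
                                           -math.inf, math.inf))

    def max_value(s, game, alpha, beta):
        if game.terminal(s):
            return game.utility(s)
        v = -math.inf
        for a in game.actions(s):
            v = max(v, min_value(game.result(s, a), game, alpha, beta))
            if v >= beta:
                return v            # prune: MIN would never allow this branch
            alpha = max(alpha, v)
        return v

    def min_value(s, game, alpha, beta):
        if game.terminal(s):
            return game.utility(s)
        w = math.inf
        for a in game.actions(s):
            w = min(w, max_value(game.result(s, a), game, alpha, beta))
            if w <= alpha:
                return w            # prune: MAX would never allow this branch
            beta = min(beta, w)
        return w

For simplicity this ab_search evaluates each root child with a fresh (–∞, +∞) window; sharing α across the root’s children, as the pseudo-code implies, prunes a little more.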

Alpha-beta in action
• αβsearch(A) = maxvalue(A, –∞, +∞), v = –∞

• call minvalue(B, –∞, +∞), w = +∞
– call maxvalue(B1, –∞, +∞)
  • returns 3, w = 3, β = 3
– call maxvalue(B2, –∞, 3)
  • returns 12
– call maxvalue(B3, –∞, 3)
  • returns 8
– returns 3, v = 3, α = 3

• call minvalue(C, 3, +∞), w = +∞
– call maxvalue(C1, 3, +∞)
  • returns 2, w = 2
– returns 2

• call minvalue(D, 3, +∞), w = +∞
– call maxvalue(D1, 3, +∞)
  • returns 14, w = 14, β = 14
– call maxvalue(D2, 3, 14)
  • returns 5, w = 5, β = 5
– call maxvalue(D3, 3, 5)
  • returns 2, w = 2
– returns 2

• returns 3
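This trace can be reproduced by wrapping the example tree (three MIN nodes with leaves 3, 12, 8 / 2, 4, 6 / 14, 5, 2, as in AIMA Fig. 5.2) in the assumed game interface and calling max_value from the sketch above; the 4 and 6 under C, which are never inspected, are my reading of that figure:

    import math

    class ToyGame:
        """The two-ply example tree: A -> B, C, D; leaves carry utilities."""
        TREE = {'A': ['B', 'C', 'D'],
                'B': [3, 12, 8],
                'C': [2, 4, 6],      # only the 2 is ever inspected
                'D': [14, 5, 2]}
        def terminal(self, s):  return isinstance(s, int)
        def utility(self, s):   return s
        def actions(self, s):   return range(len(self.TREE[s]))
        def result(self, s, a): return self.TREE[s][a]

    print(max_value('A', ToyGame(), -math.inf, math.inf))   # prints 3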

Alpha-beta discussion
• Pruning does not affect the final result

– It simply gets us there sooner

• A good move ordering means we can prune more
– e.g. if we had inspected D3 first, we could have pruned D1 and D2

• We want to test expected good moves first (see the sketch after this slide)
– “Good” from the POV of that node’s player

• “Perfect ordering” can double our search depth
– Obviously perfection is unattainable, but e.g. in chess we might test
  • Captures
  • Threats
  • Forward moves
  • Backward moves

• Sometimes we can learn good orderings
– Known as speedup learning
– Can play either faster at the same standard, or better in the same time
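In the alpha-beta sketch above, move ordering is nothing more than sorting the actions before the loop. A hedged illustration, assuming some cheap static scorer move_score (captures scored above threats, threats above quiet moves, and so on):

    def ordered_actions(s, game, move_score):
        """Try the most promising moves first so that cutoffs happen early."""
        return sorted(game.actions(s),
                      key=lambda a: move_score(s, a), reverse=True)

Inside max_value and min_value the for-loop would then iterate over ordered_actions(s, game, move_score) instead of game.actions(s).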

Game playing agents….

• Checkers (Draughts)
– Marion Tinsley ruled Checkers for forty years, losing only seven games in that time
– In 1994 Tinsley’s health forced him to resign from a match against Chinook, which was crowned world champion shortly afterwards
– At that time, Chinook used a database of 443,748,401,247 endgame positions
– Checkers has since been proved to be a draw with perfect play
  • The proof was announced at UWA
  • Chinook now plays perfectly, using αβ search and a database of 39,000,000,000,000 positions

• Chess
– Deep Blue defeated Garry Kasparov in a six-game match in 1997
– Deep Blue searches 200,000,000 positions/second, up to 40 ply deep

• Othello
– Look-ahead is very difficult for humans in Othello
– The Moor became world champion in 1980
– These days computers are banned from championship play

Game playing agents….
• Go

– 19×19 Go has a branching factor of over 300, making look-ahead very difficult for programs
  • Programs play at a “good amateur” level, although still improving
– They are much better at 9×9
– DeepMind AlphaGo defeated Go champion Lee Sedol in March 2016.

• Backgammon
– Dice rolls increase the branching factor
  • 21 possible rolls with two dice
– About 20 legal moves with most positions and rolls
  • Although approx. 6,000 sometimes with a roll of double 1s!
  • Depth 4 means 20 × (21 × 20)^3 ≈ 1,500,000,000 possibilities
– Obviously most of this search is “wasted”
  • Value of look-ahead is much diminished
– TD-Gammon (1992) used depth 2 search plus a very good evaluation function to reach “almost” world champion level
  • Players have since copied its style!
– Modern programs based on neural networks are believed to be better than the best humans

• Poker
– Pluribus from Facebook and CMU recently beat a group of professional poker players.

http://www.theverge.com/2016/3/9/11184362/google-alphago-go-deepmind-result