CS代考 COMP3308/COMP3608, Lecture 4

COMP3308/COMP3608, Lecture 4
ARTIFICIAL INTELLIGENCE
Game Playing
Reference: Russell and Norvig, ch. 5

Copyright By PowCoder代写 加微信 powcoder

, COMP3308/3608 AI, week 4, 2022 1

• Characteristics
• Games as search
• Games of perfect information • Minimax
• Alpha-beta pruning
• Games of imperfect information
• Games of chance
, COMP3308/3608 AI, week 4, 2022 2

Why Study Games?
• Why are games attractive to humans? They:
• are fun!
• offer competition – the goal is to win
• have simple rules
• challenge our ability to think – test our “intelligence”
• Why are games appealing target for AI research?
1. Games are too hard to solve
• Very large search space for common games
Chess: b=35, m=100 (50 moves by each player) => 35100 nodes in the search tree (legal positions “only” 1040)
tic-tac-toe: b=5, m=9 => 59=125 nodes
• Require decisions in limited time
No time to calculate the exact consequences of each move; need to make best guess based on previous experience, heuristics
, COMP3308/3608 AI, week 4, 2022 3

Games are Attractive for AI (cont.)
2. More like the real world than the toy search problems
• Optimal decision is infeasible, yet some decision is required
• There is an “unpredictable” opponent to be considered
3. Easy to represent as search problems
• A state is a board configuration
• Moves: typically a small number and well defined
• Clear criterion for success
, COMP3308/3608 AI, week 4, 2022 4

Characteristics of Games
• Deterministic vs chance
• Deterministic: no chance element (no die rolls, no coin flips)
• Perfect vs imperfect information
• Perfect: no hidden information; fully observable game, each
player can see the complete game
• Imperfect: some information is hidden, e.g. in card games the player’s cards are hidden from the other players
• Zero-sum vs non zero-sum
• Zero-sum (adversarial): one player’s gain is the other player’s
, COMP3308/3608 AI, week 4, 2022 5

Types of Games
deterministic chance
perfect information
imperfect information
• Chess=? Checkers=? Backgammon? Bridge=? Poker=? Monopoly-=? Connect-4=? Scrabble?
, COMP3308/3608 AI, week 4, 2022 6

deterministic
perfect information
imperfect information
COMP3308/3608 AI, week 4, 2022 7

Specific Setting
• 2 players – MAX and MIN
• Players take turns
• No chance involved (e.g. no dice)
• Perfect information (fully observable) game
• Zero-sum game – a win for MAX is a loss for MIN and vice versa, i.e. MAX wants MIN to lose and vice versa
• Examples: chess, checkers, tic-tac-toe, connect-4, Go, Othello
, COMP3308/3608 AI, week 4, 2022 8

A Possible Way to Play
• Consider all legal moves you can make from the current position
• Compute all new positions resulting from these moves
• Evaluate each new position and determine the best move
• Make that move
• Wait for your opponent to move and then repeat
• Key steps:
• Representing a position
• Evaluating a position
• Generating all legal moves from a given position
• This can be represented as a search problem
, COMP3308/3608 AI, week 4, 2022 9

Game Playing as Search
• Computers play games by searching a game tree
• State – board configuration
• Initial state – initial board configuration + who goes first
• Terminal states – board configurations where the game is over
• Terminal test – when is the game over?
• Operators – legal moves a player can make
• Utility function (=payoff function) – numeric value associated with terminal states representing the game outcome, e.g.
• e.g. a positive value – win for MAX, negative – loss for MAX, 0 – draw (e.g. 1, -1 and 0 can be used)
• the higher the value = the better the outcome for MAX
• Evaluation function – numeric value associated with non-terminal
states; shows how good the state is, e.g. how close it is to a win
• Game tree – represents all possible game scenarios
, COMP3308/3608 AI, week 4, 2022 10

Partial Game Tree for Tic-Tac-Toe
zero-sum game: the utility values at the end of the game are equal and opposite, e.g. 1 (win) and -1 (loss) from the first player’s
perspective
, COMP3308/3608 AI, week 4, 2022 11

Evaluation Function
• A numeric value e(s) associated with each non-terminal state s
• Similar to the heuristic functions we studied before (e.g. an estimate
of the distance to the goal)
• It gives the expected outcome of the game from the current position, for a given player, e.g. for MAX:
• state s → number e(s)
• e(s) is a heuristic value that estimates how favorable s is for MAX
• e(s) > 0 – s is favorable for MAX
• e(s) < 0 - s is favorable for MIN • e(s) = 0 - s is neutral • Or more generally: the higher the value e(s), the more favorable the position s is for MAX , COMP3308/3608 AI, week 4, 2022 12 Evaluation Function – Example for Tic-Tac-Toe MAX: cross, MIN: circle e(s) = number of rows, columns and diagonals open for MAX - number of rows, columns and diagonals open for MIN , COMP3308/3608 AI, week 4, 2022 13 8-8 = 0 6-4 = 2 http://ai.stanford.edu/~latombe/cs121/2011/slides/L-adversarial.ppt Game Playing as Normal Search • Consider the following: • Player 1 (MAX) is the first player • He/she searches for a sequence of moves leading to a terminal state that is a winner (according to the utility function) • Let’s apply a normal search strategy, e.g. greedy search • Expand each path to a terminal state, i.e. build the game tree • Choose the move with the best heuristic value for each player , COMP3308/3608 AI, week 4, 2022 14 Game Playing Using Greedy Search Example from http://pages.cs.wisc.edu/~skrentny/cs540/slides/7_gamePlaying.ppt 1) Player 1 (MAX) chooses C (the max value) 2) Player 2 (MIN) chooses J (the min value) • J is terminal node, the game ends • J’s value is negative => player 2 wins and player 1 loses
• What is the problem with greedy search? Does it guarantee anything (e.g. a win) for player 1 or player 2?
A Player 1 (MAX)’s
possible moves
BCDE BCDE 5923
FGHIJKLMNO
-7 -5 3 9 -6 0 2 1 3 2
Player 2 (MIN)’s possible moves
terminal states
, COMP3308/3608 AI, week 4, 2022 15

Minimax Algorithm – Idea
• Given: a 2-player, deterministic, perfect information game
• Minimax gives the perfect (best, optimal) move for a player,
assuming that its opponent plays optimally
• i.e. assuming the worst case scenario – optimal play by the opponent
• A player plays optimally if he/she always selects the best move based on the evaluation function
• MAX – always select the node with the max evaluation value
• MIN – always selects the node with the min evaluation value
, COMP3308/3608 AI, week 4, 2022 16

Minimax Algorithm – Idea (2)
• Minimax computes a value for each non-terminal node, called a minimax value
• It then chooses the move to the position with the best minimax value for the current player (the highest values for MAX or the lowest value for MIN )
• This position has the best achievable outcome for the current player, given that the other player selects the best move, i.e. plays optimally
, COMP3308/3608 AI, week 4, 2022 17

Minimax Algorithm
• Generate the game tree
• Create start node as a MAX node (MAX’s turn to move) with the current
board configuration
• Expand nodes down to terminal states [or to some depth (look ahead) if there is resource limitation – time or memory]
• Evaluate the utility of the terminal states (leaf nodes)
• Starting at the leaf nodes and going back to the root of the tree,
compute recursively the minimax-value of each node:
• at MIN nodes (i.e. in MIN’s turn) – take the min of the children’s values
• at MAX nodes (i.e. in MAX’s turn) – take the max of the children’s values
• When the root node is reached, select the best move for MAX
• i.e. the move which determined the value at the root (= the max of the
minimax values of children)
, COMP3308/3608 AI, week 4, 2022 18

Example – Game Tree for a 2-ply game
• MAX wants the highest score
• MIN wants the lowest score
, COMP3308/3608 AI, week 4, 2022 19

Example – Game Tree for a 2-ply game (2)
• MAX wants the highest score
• MIN wants the lowest score
At MIN nodes – take the min of the children’s values
, COMP3308/3608 AI, week 4, 2022 20

Example – Game Tree for a 2-ply game (3)
The minimax decision
• MAX wants the highest score
• MIN wants the lowest score
At MAX nodes – take the max of the children’s values
, COMP3308/3608 AI, week 4, 2022 21

Example – Game Tree for a 2-ply game (3)
• Which move should MAX choose? The one with the highest evaluation function => the left most (3)
• If MIN plays optimally, what will be the result of the game?
• 3 – MIN will choose the lowest value (the left branch)
, COMP3308/3608 AI, week 4, 2022 22

Check that Mimimax Achieved the Outcome for MAX Assuming MIN Played Optimally
3 options for MAX:
1) If MAX follows left branch and MIN plays optimally: utility =3
2) If MAX follows middle branch and MIN plays optimally: utility=2
3) If MAX follows right branch and MIN plays optimally: utility=2
• => yes, 1) is the best option for MAX (highest utility) = optimal move
, COMP3308/3608 AI, week 4, 2022 23

Another Example

8 7 2 9 1 6 2 4 1 1 3 5 3 9 2 6 5 2 1 2 3 9 7 2 16 6 4 , COMP3308/3608 AI, week 4, 2022 24

Minimax Algorithm – Pseudocode
function MINIMAX-DECISION(state) returns an action inputs: state, current state in game vMAX-VALUE(state)
return the action in SUCCESSORS(state) with value v
function MAX-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state) v-∞
for a,s in SUCCESSORS(state) do v  MAX(v,MIN-VALUE(s))
function MIN-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state) v∞
for a,s in SUCCESSORS(state) do v  MIN(v,MAX-VALUE(s))
, COMP3308/3608 AI, week 4, 2022 25

Minimax Algorithm – Typical Implementation
http://pages.cs.wisc.edu/~skrentny/cs540/slides/7_gamePlaying.ppt
For each move by MAX (first player):
Perform depth-first search to a terminal state Evaluate each terminal state
Propagate upwards the minimax values
if MIN’s move, minimum value of children backed up
if MAX’s move, maximum value of children backed up
Choose move with the maximum of minimax values of children
• minimax values gradually propagate upwards as DFS proceeds: i.e.,
minimax values propagate up in left-to-right fashion
• minimax values for sub-tree backed up as-we-go,
so only O(bd) nodes need to be kept in memory at any time
, COMP3308/3608 AI, week 4, 2022 26

What if MIN Does Not Play Optimally?
• In Minimax MAX assumes that MIN plays optimally: maximizes worst-case outcome for MAX
• But if MIN does not play optimally, MAX will do even better! (proven)
, COMP3308/3608 AI, week 4, 2022 27

Checking that MAX will do Even Better if MIN does not Play Optimally
1) If MAX follows left branch and MIN does not play optimally: utility = 12 or 8, better result for MAX than utility=3 if MIN plays optimally
2) If MAX follows middle branch and MIN does not play optimally: utility = 4 or 6, better result for MAX than utility=2 if MIN plays optimally
3) If MAX follows right branch and MIN dies not play optimally: utility = 14 or 5, better than utility=2 if MIN plays optimally
, COMP3308/3608 AI, week 4, 2022 28

Games with More Than Two Players
• Many games allow more than two players
• Single minimax values become vectors with a value for each player
• Each player is maximizing its value
• The whole vector is backed up, not a single value
• Example: a three player game
The first 3 plies of a game with 3 players (A, B, C)
Each node is labelled with utility values from the viewpoint of each player
, COMP3308/3608 AI, week 4, 2022 29

Multiplayer Games (2)
• Backed-up value of a node n is the utility vector of whichever child has the highest value for the player choosing at n
• Example: player B chooses a move
• Compare the second element of (1,2,6) and (6,1,2), 2>1 => B
chooses the node (1,2,6)
for A for B
Tie for A ( 1 and 1) – choose randomly the left branch – there should be a rule for resolving ties
COMP3308/3608 AI, week 4, 2022 30

Properties of Minimax
• Implemented as DFS
• Assumptions: branching factor b, all terminal nodes at depth d
• Optimal? – will the optimal score be reached?
• “optimal score”- the score of the terminal node that will be reached if
both players play optimally
• Yes, against an optimal opponent (i.e. MAX and MIN play optimally)
• Against a suboptimal player? I.e. MAX plays optimally but MIN plays sub-optimally: the score of the terminal node reached will never be lower than the optimal score (see slide 28 and the tutorial exercises)
• Time complexity? O(bm) as in DFS
• Space complexity? O(bm) as in DFS
• Time is the major problem – moves need to be chosen in a limited time
, COMP3308/3608 AI, week 4, 2022 31

Resource Limits
• Minimax assumes that the program has time to search all the way to terminal states, which is not always possible
• In most cases we have limited resources, especially time
• How far ahead can we look? Consider chess and suppose that:
• we have 100 seconds per move
• in 1 second we can explore 10 000 nodes
=> i.e. at each move we can explore 106 nodes: => bm=106, for b=35 => m=4 ply ahead
• But this is not good enough
• 4 ply = human novice
• 8 ply = human master
• 12 ply = Kasparov and Deep Blue
• Is there any algorithm that is guaranteed to produce the
same result as minimax but will generate less nodes? Yes!
, COMP3308/3608 AI, week 4, 2022 32

Not All Nodes Are Worth Exploring
• There is no need to evaluate the other children of B as we already know that MAX will not choose B, it will choose A
value >=100
B value <=20 A value =100 100 “If you have an idea that is surely bad, don’t waste time to see how truly awful it is.” , COMP3308/3608 AI, week 4, 2022 33 Why Not All Nodes Are Worth Exploring Let the values of the 2 non-evaluated children of B are x and y minimax(S) = max(A,B) = max(min(200,100),min(20,x,y)) = = max(100,  20) =100 => The value of the root is independent of the values of the leaves x and y => we can prune x and y – there is no need to generate and evaluate them
, COMP3308/3608 AI, week 4, 2022 34

Alpha-Beta Pruning – Idea
• No need to examine every node; prune the nodes that do not influence the decision
• If alpha-beta pruning is applied to a standard minimax tree, it returns the same move as minimax, but prunes brunches that cannot influence the final decision
, COMP3308/3608 AI, week 4, 2022 35

Alpha-Beta Pruning
• While doing DFS of game tree, along the path, keep 2 values:
• Alpha value = the best value for MAX (the highest) so far along the
path (initialise to –)
• Beta value = the best value for MIN (the lowest) so far along the
path (initialise to +)
• If a MAX node exceeds beta, prune the sub-tree below
• IF a MIN node is below alpha, prune the sub-tree below
, COMP3308/3608 AI, week 4, 2022 36

Closer Look at the Pruning – Alpha Cut-off
IF a MIN node is below alpha, prune it (alpha cut-off)
• i.e. if child’s beta <= parent’s alpha • No need to expand B further as MAX can make a better move =100 MAX  =20 MIN DEF... 100 120 20 Alpha cutoff (B)=20<(S)=100 , COMP3308/3608 AI, week 4, 2022 37 Closer Look at the Pruning – Beta Cut-off • If a MAX node exceeds beta, prune it (beta cut-off) • child’s alpha >= parent’s beta
• No need to expand B further as MIN can make a better move
Beta cutoff (B)=25>(S)=20
=20 MIN B  =25 MAX
DEF… -10 -20 25
, COMP3308/3608 AI, week 4, 2022 38

Another Way to Test if it’s Possible to Prune
Keep alpha and beta bounds, not values
If the intervals they define overlap, pruning is not possible
If the intervals they define do not overlap, pruning is possible
Do we need to evaluate the other children of B or we can prune them?
Need to evaluate them as <=120 and >=100 overlap
DE…… 100 120
, COMP3308/3608 AI, week 4, 2022 39

Alpha-Beta Pruning – Example
• We always need to evaluate all children of the first branch – it is not possible to prune them
, COMP3308/3608 AI, week 4, 2022 40

Alpha-Beta Pruning – Example
• The intervals <=2 and >=3 do not overlap, so no need to evaluate the 2 children
, COMP3308/3608 AI, week 4, 2022 41

Alpha-Beta Pruning – Example
• The intervals <=14 and >=3 overlap, so the evaluation of children should continue
, COMP3308/3608 AI, week 4, 2022 42

Alpha-Beta Pruning – Example
• The intervals <=5 and >=3 overlap, so we should evaluate the remaining children (no pruning is possible)
, COMP3308/3608 AI, week 4, 2022 43

Alpha-Beta Pruning – Example
• MAX should choose the left most path (value =3)
• MIN also should choose the left most path (value =3)
, COMP3308/3608 AI, week 4, 2022 44

Another Example
, COMP3308/3608 AI, week 4, 2022 45

Alpha-Beta Algorithm
• Traverse the search tree in depth-first order
• Apply utility function at each leaf node that is generated
• Compute values backwards
• At each non-leaf node store a value indicating the best backed up value so far
• MAX nodes: alpha value = best (max) value found so far
• MIN nodes: beta value = best (min) value found so far
• Given a node n, cutoff the search below n if:
• n is a MAX node and alpha(n)  beta (i) for some MIN node i
that is ancestor of n (beta cutoff)
• n is a MIN node and beta(n)  alpha(i) for some MAX node i
that is ancestor of n (alpha cutoff)
, COMP3308/3608 AI, week 4, 2022 46

Alpha-Beta Algorithm – Pseudocode
, COMP3308/3608 AI, week 4, 2022 47

Alpha-Beta Algorithm – Pseudocode (cont.)
, COMP3308/3608 AI, week 4, 2022 48

Properties of Alpha-Beta Pruning
• Pruning does not affect the final result

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com