COMP 424 – Artificial Intelligence Game Playing
Instructor: Jackie CK Cheung
Readings: R&N Ch. 5
Quick recap
Standard assumptions (except for the lecture on uncertainty):
• Discrete (vs continuous) state space
• Deterministic (vs stochastic) environment
• Observable (vs unobservable) environment
• Static (vs changing) environment
• There is only a single AI agent
• In the Tic-tac-toe exercise, we pushed uncertainty from the other agent’s moves into “the environment” (AND nodes)
• This doesn’t account for the fact that the other player has its own goal
Today’s lecture
• Adversarial search → games with 2 players
• Planning ahead in a world where other agents are planning
against us
• Minimax search
• Evaluation functions
• Alpha-beta pruning
• State-of-the-art game playing programs
Game playing
• One of the oldest, best studied domains in AI! Why?
• They’re fun! People enjoy games and are good at playing them
• Many games are hard:
• State spaces can be huge and complicated
• Chess has a branching factor of about 35, and games run to about 50 moves per player → roughly 35^100 (≈ 10^154) nodes in the search tree!
• Games may be stochastic and partially observable
• Real-time constraints (e.g., fixed amount of time between moves)
• Games require the ability to make some decision even when calculating the optimal decision is infeasible
• Clear, clean description of the environment and actions
• Easy performance indicator: winning is everything
Types of games
• Perfect information vs. Imperfect information
• Perfect: players observe the exact state of the game
• Imperfect: information is hidden from players
• Fully observable vs. partially observable
• Deterministic vs. Stochastic
• Deterministic: state changes are fully determined by player moves
• Stochastic: state changes are partially determined by chance
What type of game?
                        Deterministic                   Stochastic
Perfect information     Checkers, Othello, Chess, Go    Backgammon, Monopoly
Imperfect information   Mastermind, Solitaire*          Scrabble, most card games
Game playing as search
• Consider 2-player, turn-taking, perfect information, deterministic games
• Can we formulate them as a search problem? Yes!
• State, s: the state of the board + which player moves next
• Operators, o: the legal moves in a given state
• Transition fcn: defines the result of a move
• Terminal states: states in which the game is over (won/lost/drawn)
• Utility fcn: defines a numeric value of the game for player p when the game ends in a terminal state
• Simple case: +1 for a win, -1 for a loss, 0 for a draw. More complex cases: points won, money, …
• We want to find a strategy (a way of picking moves) that maximizes utility
• i.e., maximize the probability of winning or the expected points, minimize the cost, …
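To make this formulation concrete, here is a minimal Python sketch of a game interface matching the items above; the class and method names (Game, legal_moves, result, is_terminal, utility, player) are our illustrative choices, not part of the lecture.

# Minimal sketch of the search formulation above; all names are illustrative only.
class Game:
    def initial_state(self):
        """Starting board plus the player to move."""
        raise NotImplementedError
    def player(self, s):
        """Whose turn it is in state s: "MAX" or "MIN"."""
        raise NotImplementedError
    def legal_moves(self, s):
        """Operators: the legal moves in state s."""
        raise NotImplementedError
    def result(self, s, move):
        """Transition function: the state reached by playing move in s."""
        raise NotImplementedError
    def is_terminal(self, s):
        """True if the game is over (won/lost/drawn) in s."""
        raise NotImplementedError
    def utility(self, s):
        """Numeric value of terminal state s, e.g. +1 win for Max, -1 loss, 0 draw."""
        raise NotImplementedError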
Game search challenge
• It’s not quite the same as simple searching
• There’s an opponent! That opponent is adversarial!
• The opponent has its own goals, which don’t match our goals
• The opponent tries to make things good for itself and bad for us
• We must simulate the opponent’s decisions
Quick aside
• In games, there’s an adversarial opponent
• It has its own goals which don’t match our goals
• A special case: the zero-sum game
Zero-sum games
• Each player’s gain or loss of utility is exactly balanced by the losses or gains of utility of the other player
• If I win, you lose: U(p1) = +1 → U(p2) = -1, so U(p1) + U(p2) = 0
Game search challenge
• The opponent is adversarial: it has its own goal, which does not match ours, and we must simulate its decisions
• Key idea: Define
• a max player (who wants to maximize the utility)
• a min player (who wants to minimize it)
Example: Game Tree for Tic-Tac-Toe
Ply: the term for the move of one player. A full move is two ply.
Search Tree:
A tree superimposed on the full game tree that examines enough nodes for the player to determine what move to make.
Minimax search
An algorithm for finding the optimal strategy: the best move to play at each state (node)
How it works:
• Expand the complete search tree until terminal states have been reached
• Compute the utilities of the terminal states
• Back up from the leaves towards the current game state:
• At each min node: back up the worst value among its children
• At each max node: back up the best value among its children
Minimax search (worked example; tree figures omitted)
The minimax value at each node tells us how to play an optimal game!
Minimax Algorithm
operator MinimaxDecision()
    for each legal operator o:
        apply the operator o and obtain the new game state s
        Value[o] = MinimaxValue(s)
    return the operator o with the highest value Value[o]

double MinimaxValue(s)
    if isTerminal(s), return Utility(s)
    for each state s’ in Successors(s)
        let Value(s’) = MinimaxValue(s’)
    if it is Max’s turn to move in s, return max_{s’} Value(s’)
    if it is Min’s turn to move in s, return min_{s’} Value(s’)
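For concreteness, here is the same algorithm as a small runnable Python sketch, written against the hypothetical Game interface sketched earlier (all names are ours, not the lecture’s):

def minimax_value(game, s):
    """Backed-up minimax value of state s (utilities are from Max's point of view)."""
    if game.is_terminal(s):
        return game.utility(s)
    values = [minimax_value(game, game.result(s, m)) for m in game.legal_moves(s)]
    # Max backs up the best child value, Min backs up the worst.
    return max(values) if game.player(s) == "MAX" else min(values)

def minimax_decision(game, s):
    """Best move in s, assuming the player to move in s is Max."""
    return max(game.legal_moves(s),
               key=lambda m: minimax_value(game, game.result(s, m)))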
Exercise: apply minimax search to the given game search tree (tree figure omitted).
Properties of Minimax search
• Complete? Yes, if the game tree is finite
• Optimal? Against an optimal opponent, yes
• It maximizes the worst-case outcome for Max
• There might exist better strategies against suboptimal opponents
• Time complexity? O(b^m)
• Space complexity? O(b·m) if we use DFS
• So is Minimax suitable for solving chess?
• In chess: b ≈ 35, m ≈ 100, so an exact solution is completely infeasible!
Coping with resource limitations
• Suppose we have 100 seconds to make a move, and we can search 10^4 nodes per second
• Then we can only search 10^6 nodes
(Or even fewer, if we spend time deciding which nodes to search)
• Possible approach:
• Use a cutoff test (e.g., based on a depth limit)
• Use an evaluation function for the nodes where we cut the search (since they’re not terminal states, we don’t know the true utility)
• Need to think about real-time search
Evaluation functions
• An evaluation function v(s) estimates the expected utility
of a state (e.g., the likelihood of winning from that state)
• Performance depends strongly on the quality of v(s)
• If it is too inaccurate it will lead the agent astray
• Desiderata:
• It should order the terminal states according to their true utility
• It should be relatively quick to compute
• For nonterminal states, it should be strongly correlated with the actual chances of winning
• An evaluation function can be designed by an expert or learned from experience
Evaluation functions
• Most evaluation functions work by calculating features of the state (e.g., # of pawns, queens, etc.)
• Features define categories or equivalence classes of states
• Typically, we compute features separately and then combine
them for an aggregate value
• E.g., if the features of the board are independent, use a weighted linear function:
v(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• Independence is a strong, likely incorrect assumption. But it could still be useful!
• An extra bishop is worth more in the endgame; its weight should depend on the “move number” feature
Example: Chess
(Board diagrams omitted: “Black to move, White slightly better” and “White to move, Black winning”)
• Linear evaluation function: v(s) = w1 f1(s) + w2 f2(s)
• f1(s) = (# white queens) – (# black queens)
• f2(s) = (# white pawns) – (# black pawns)
• w1 = 9, w2 = 3
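As an illustration of the weighted linear form above, here is a minimal Python sketch using the slide’s weights; the board object and its count method are assumptions for illustration, not a real API.

# Hypothetical board.count(colour, piece) accessor; weights taken from the slide.
W1_QUEEN_DIFF = 9
W2_PAWN_DIFF = 3

def evaluate(board):
    """Weighted linear evaluation v(s) = w1*f1(s) + w2*f2(s)."""
    f1 = board.count("white", "queen") - board.count("black", "queen")
    f2 = board.count("white", "pawn") - board.count("black", "pawn")
    return W1_QUEEN_DIFF * f1 + W2_PAWN_DIFF * f2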
How precise should the evaluation fcn be?
• For deterministic games, all that matters is that the fcn preserve the ordering of the nodes
• Thus, the move chosen is invariant under monotonic transformations of the evaluation function
• In deterministic games, payoff acts as an ordinal utility function
Minimax with an evaluation function
• Use the evaluation function to evaluate non-terminal nodes
• This helps make a decision without searching until the end of the game
• Minimax cutoff algorithm:
Same as standard Minimax, except stop at some maximum depth m, use the evaluation function on those nodes, back up from there
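A minimal sketch of the cutoff variant, again against the hypothetical Game interface and an evaluate function as above (depth counts remaining plies; names are ours):

def minimax_cutoff_value(game, s, depth, evaluate):
    """Minimax with a depth cutoff: estimate non-terminal nodes at depth 0 with evaluate."""
    if game.is_terminal(s):
        return game.utility(s)
    if depth == 0:
        return evaluate(s)   # cut off the search and estimate instead
    values = [minimax_cutoff_value(game, game.result(s, m), depth - 1, evaluate)
              for m in game.legal_moves(s)]
    return max(values) if game.player(s) == "MAX" else min(values)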
Minimax cutoff in chess
• How many moves ahead can we search in chess?
If our hardware can search 10^6 nodes in the available time, then minimax cutoff with b = 35 can search about 4 moves ahead
• Is that good?
• 4 moves ahead ≈ novice player
• 8 moves ahead ≈ human master, typical PC
• 12 moves ahead ≈ Deep Blue, Kasparov
• Key idea: instead of exhaustive search, let’s search a few lines of play, but deeply
• We need pruning!
α-β pruning
• Simple idea: if a path looks worse than what we already have, then discard it (prune)
• If the best move at a node cannot change (regardless of what we would find by searching), there’s no need to search further!
• α-β is a standard technique for deterministic, perfect information games
• How does it work?
• It uses two parameters, α and β, to track bounds on the utility or
evaluation values
α-β pruning
• Proceed like Minimax, with DFS and then back ups
• Additionally, keep track of:
• the best (highest) leaf value found for Max (in α)
• the best (lowest) leaf value found for Min (in β)
• At max nodes, update α only
• At min nodes, update β only
• Prune in the event of inconsistency (α ≥ β)
Worked example (tree figures omitted): initialize α = –∞ and β = +∞ at the root, then search depth-first, updating α at max nodes and β at min nodes, and pruning whenever α ≥ β.
In this search, we pruned away two leaf nodes compared to Minimax.
• The textbook’s version of this example (Edition 3, p. 168, Figure 5.5) shows the correct steps, but with incorrect and confusing alpha and beta values, which don’t match their algorithm!
α-β pruning algorithm
double MaxValue(s, α, β)
    if cutoff(s), return Evaluation(s)
    for each state s’ in Successors(s)
        let α = max { α, MinValue(s’, α, β) }
        if α ≥ β, return β
    return α

double MinValue(s, α, β)
    if cutoff(s), return Evaluation(s)
    for each state s’ in Successors(s)
        let β = min { β, MaxValue(s’, α, β) }
        if α ≥ β, return α
    return β
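The same procedure as a runnable Python sketch, combined with the depth cutoff and evaluation function from earlier (the Game interface and all names remain our assumptions):

import math

def alphabeta_value(game, s, depth, evaluate, alpha=-math.inf, beta=math.inf):
    """Depth-limited alpha-beta; values are from Max's point of view."""
    if game.is_terminal(s):
        return game.utility(s)
    if depth == 0:
        return evaluate(s)
    if game.player(s) == "MAX":
        for m in game.legal_moves(s):
            alpha = max(alpha, alphabeta_value(game, game.result(s, m),
                                               depth - 1, evaluate, alpha, beta))
            if alpha >= beta:
                return beta   # Min already has a better alternative elsewhere: prune
        return alpha
    else:
        for m in game.legal_moves(s):
            beta = min(beta, alphabeta_value(game, game.result(s, m),
                                             depth - 1, evaluate, alpha, beta))
            if alpha >= beta:
                return alpha  # Max already has a better alternative elsewhere: prune
        return beta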
Important lessons of α-β pruning
• Pruning can greatly increase efficiency!
• Pruning does not affect the final result.
• The best moves are the same as those returned by Minimax (assuming the opponent is optimal and the evaluation function is correct)
• Order matters for search efficiency!
• α-β pruning demonstrates the value of reasoning about
which computations are important!
Efficiency of α-β pruning
• With bad move ordering, time complexity is O(b^m)
• The same as Minimax, since nothing is pruned
• With perfect ordering, time complexity is O(b^(m/2))
• This means we can double the search depth for the same resources!
• In chess: the difference between a novice and an expert agent
• On average, O(b^(3m/4)), if we expect to find the max/min after b/2 expansions
• Randomizing the move ordering can achieve the average
• An evaluation function can be used to order the nodes (see the sketch after this list)
• A useful ordering fcn for chess: try captures first, then threats, then forward moves, then backward moves
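One simple way to approach the good-ordering case, as the bullet flagged above suggests, is to sort successors by the evaluation function before recursing. A sketch under the same assumptions as the earlier code:

def ordered_moves(game, s, evaluate):
    """Order moves so the most promising child for the player to move is searched first."""
    maximizing = game.player(s) == "MAX"
    return sorted(game.legal_moves(s),
                  key=lambda m: evaluate(game.result(s, m)),
                  reverse=maximizing)

Iterating over ordered_moves(...) instead of legal_moves(...) inside the alpha-beta sketch tends to produce earlier cutoffs.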
Drawbacks of α-β
• If the branching factor is really big, search depth is still too limited. E.g., Go, where branching factor is about 300
• Optimal play is guaranteed against an optimal opponent if search proceeds to the end of the game. But the opponent may not be optimal!
• If heuristics are used, this assumption turns into the opponent playing optimally according to the same heuristic function as the player.
• This is a very big assumption! What if the opponent plays very differently?
Forward pruning
Idea (for domains with large branching factor): Only explore the n best moves for current state (according to the evaluation function)
• Unlike α-β pruning, this can lead to suboptimal solutions
• But it can be very efficient with a good evaluation function
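A sketch of forward pruning under the same assumptions: keep only the n most promising successors according to the evaluation function (n = 10 is an arbitrary illustrative choice):

def top_n_moves(game, s, evaluate, n=10):
    """Forward pruning: keep only the n best-looking moves for the player to move."""
    maximizing = game.player(s) == "MAX"
    moves = sorted(game.legal_moves(s),
                   key=lambda m: evaluate(game.result(s, m)),
                   reverse=maximizing)
    return moves[:n]

Searching only these moves can discard the optimal one, which is exactly the suboptimality noted above.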
State-of-the-art game playing programs
Chinook (Schaeffer et al., U. of Alberta)
• Best checkers player (since 1990’s)
• Plain α-β search, performed on standard PCs
• Evaluation function based on expert features of the board
• Opening move database
• HUGE endgame database!
Chinook has perfect information for all checkers positions involving 8
or fewer pieces on the board (a total of 443,748,401,247 positions)
• Only a few moves in middle of the game are actually searched
• Checkers has since been solved by exhaustive search: with optimal play, at least a draw can always be forced
Deep Blue (IBM)
• Specialized chess processor, special-purpose memory architecture
• Very sophisticated evaluation function (expert features, tuned weights)
• Database of standard openings/closings
• Uses a version of α-β pruning (with undisclosed improvements)
• Can search up to 40-deep in some branches
• Can search about 200 million positions per second!
• Overall, an impressive engineering feat
Computer Chess
• Now, several computer programs running on regular hardware are on par with human champions (e.g., Fritz, Stockfish)
• ELO ratings of top chess bots are estimated at over 3500
• C.f., rating of 2500 needed to become a Grandmaster
• Top human players at just over 2800
• Human chess players use chess bots as a way to analyze and improve their game
• Interesting preview of how humans and AI can cooperate?
AlphaGo (DeepMind, 2016)
• Uses Monte Carlo tree search (more on this next class!) to simulate future game states
• Uses deep reinforcement learning (i.e., multi-layer neural networks + reinforcement learning) to learn how to value moves (policy network) and board states (value network):
• These are machine learning techniques
• Requires access to a database of previous games by expert players
• Performance:
• March 2016: 4W-1L record against Lee Sedol
• 2017: 60W-0L record after further training against human pros
• AlphaZero (2017) removed the need for human game data, learned to play “from scratch” against itself!
MuZero (DeepMind, 2020)
Some possible strategies for AI in games
• Design a compact state representation
• Search using iterative deepening for real-time play (see the sketch after this list)
• Use alpha-beta pruning with an evaluation function
• Order moves using the evaluation function
• Tune the evaluation function using domain knowledge, trial-and-error, learning
• Searching deeper is often more important than having a good evaluation function
• Consider using different strategies for opening, middle and endgame (including look-ups)
• Consider that the opponent might not be optimal
• Decide where to spend the computation effort!
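The iterative-deepening bullet flagged above can be sketched as a wrapper around any depth-limited value function (for example the alpha-beta sketch earlier); the time budget, max_depth, and the assumption that Max moves at s are our illustrative choices:

import time

def iterative_deepening_decision(game, s, value_fn, time_limit=1.0, max_depth=64):
    """Deepen the search until the time budget runs out; return the best move found so far.
    value_fn(game, state, depth) is any depth-limited value function."""
    deadline = time.monotonic() + time_limit
    best_move = None
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break   # a real implementation would also abort inside an iteration
        # Assume the player to move in s is Max: keep the move whose child value is best.
        best_move = max(game.legal_moves(s),
                        key=lambda m: value_fn(game, game.result(s, m), depth - 1))
    return best_move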
• Understand the different types of games
• Understand Minimax and alpha-beta pruning, what they compute, how to implement, and common methods to make them work better (e.g., node ordering)
• Understand the role of the evaluation function and desirable characteristics
• Get familiar with the state-of-the-art for solving some common games
Extra Reading (R&N)
• Stochastic games
• Add chance nodes to the search tree
• Minimax → Expectiminimax (see the sketch after this list)
• α-β applies (with modifications for chance nodes)
• More sophisticated move ordering schemes
• killer moves
• transposition tables
• More sophisticated cutoffs
• quiescence search
• singular extensions
• Multiplayer games
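A minimal sketch of the expectiminimax extension flagged above, assuming chance nodes expose their outcomes and probabilities through a hypothetical chance_outcomes method (all names are ours):

def expectiminimax_value(game, s):
    """Like minimax, but chance nodes back up the probability-weighted average of their children."""
    if game.is_terminal(s):
        return game.utility(s)
    turn = game.player(s)
    if turn == "CHANCE":
        return sum(p * expectiminimax_value(game, s2)
                   for p, s2 in game.chance_outcomes(s))   # hypothetical accessor
    values = [expectiminimax_value(game, game.result(s, m)) for m in game.legal_moves(s)]
    return max(values) if turn == "MAX" else min(values)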
Human or computer – who is better?
• 1994: Chinook