COMP111: Artificial Intelligence
Section 6. Adversarial search (Game playing)
Frank Wolter
Outline
We will look at how search can be applied to playing games:
- Types of games
- Perfect play:
  - the minimax algorithm
  - α–β pruning
- Playing with limited resources (heuristics)
Games vs. search problems
- In search we make all the moves ourselves. In games we play against an unpredictable opponent:
  - a solution is a strategy specifying a move for every possible move of the opponent.
- A method is needed for selecting good moves that stand a good chance of achieving a winning state whatever the opponent does!
- Because of combinatorial explosion, in practice we must approximate using heuristics.
Types of Games
- In some games we have a fully observable environment: the position is known completely. These are called games with perfect information.
  - Examples: chess, go, backgammon, monopoly.
- In others we have a partially observable environment; for example, we cannot see the opponent's cards. These are called games with imperfect information.
  - Examples: battleships, bridge, poker.
- Some games are deterministic: chess, go.
- Others have an element of chance: backgammon, monopoly, bridge, poker.
The games we consider
We consider special kinds of games:
- Deterministic
- Two-player
- Zero-sum:
  - the utility values at the end are equal and opposite
  - example: one player wins (+1), the other loses (−1).
- Perfect information
Problem Formulation
The search graph gives, for every state, the successor states obtained by making a move. The set of goal states is replaced by a utility function.
- Initial state s_start:
  - the initial board position, and which player moves first.
- Successor function:
  - provides, for every state s and move, the new state after the move.
- Terminal test:
  - determines when the game is over.
- Utility function:
  - a numeric value for terminal states.
  - E.g. chess: +1, −1, 0.
  - E.g. backgammon: +192 to −192.
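The four components above can be sketched concretely for a tiny made-up game (a hypothetical illustration, not part of the course material): two players alternately remove 1 or 2 sticks from a pile, and whoever takes the last stick wins.

```python
# Sketch of the game-problem formulation: initial state, successor
# function, terminal test, and utility function for a toy stick game.

class StickGame:
    def __init__(self, sticks=5):
        self.start = sticks

    def initial_state(self):
        # (sticks remaining, player to move); player 0 (MAX) moves first.
        return (self.start, 0)

    def successors(self, state):
        """All (move, new_state) pairs: take 1 or 2 sticks if available."""
        sticks, player = state
        return [(take, (sticks - take, 1 - player))
                for take in (1, 2) if take <= sticks]

    def is_terminal(self, state):
        return state[0] == 0

    def utility(self, state):
        # The winner is whoever took the last stick, i.e. the player
        # NOT to move in the terminal state; +1 if MAX (player 0) won.
        sticks, player = state
        return +1 if player == 1 else -1
```

The same four-method shape fits chess or backgammon; only the state representation and the utility range change.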
Possible Development
[Figure, shown over several slides: a possible development of a noughts-and-crosses (tic-tac-toe) game, expanded move by move into a game tree. Terminal positions are labelled with their utilities: +1 where X wins and −1 where O wins.]
Game Tree
- Each level is labelled with the player to move.
- Each level represents a ply:
  - half a turn.
- The tree represents what happens with competing agents.
Introducing MIN and MAX
MIN and MAX are two players:
- MAX wants to win (maximise utility).
- MIN wants MAX to lose (minimise utility for MAX).
- MIN is the opponent.
Both players will play to the best of their ability:
- MAX wants a strategy for maximising utility, assuming MIN will do its best to minimise MAX's utility.
- Consider the minimax value of each state: the utility of a state given that both players play optimally.
Example Game Tree
[Figure: the game tree for noughts and crosses. Levels alternate between MAX (X) and MIN (O); the terminal level is labelled with utilities −1, 0, and +1.]
Minimax Value
- The utility (= minimax value) of a terminal state is given by its utility already (as an input).
- The utility (= minimax value) of a MAX-state (when MAX moves) is the maximum of the utilities of its successor states.
- The utility (= minimax value) of a MIN-state (when MIN moves) is the minimum of the utilities of its successor states.
- Thus, we can compute the minimax value recursively, starting from the terminal states.
Formally, let Succ(s) denote the set of successor states of state s. Define the function MinimaxV(s) recursively as follows:

MinimaxV(s) =
  Utility(s)                        if s is terminal
  max_{n ∈ Succ(s)} MinimaxV(n)     if MAX moves in s
  min_{n ∈ Succ(s)} MinimaxV(n)     if MIN moves in s
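The recursion above can be sketched in a few lines of Python. The tree representation here is a hypothetical toy encoding chosen for brevity: a terminal state is an int (its utility), and a non-terminal state is a list of successor states, with the player to move alternating by depth.

```python
# A minimal sketch of MinimaxV: terminal states are ints (their utility),
# non-terminal states are lists of successors; MAX moves at the root.

def minimax_value(state, max_to_move=True):
    if isinstance(state, int):        # terminal: utility given as input
        return state
    values = [minimax_value(n, not max_to_move) for n in state]
    return max(values) if max_to_move else min(values)

# Leaf utilities matching the two-ply example tree in these slides:
# three MIN nodes with leaves (3, 12, 8), (2, 4, 6) and (14, 5, 2).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
minimax_value(tree)   # MIN values 3, 2, 2; the MAX root takes 3
```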
Minimax algorithm
- Calculate the minimax value of each state using the equation above, starting from the terminal states.
- The game tree becomes a minimax tree:
Minimax Tree
- MIN takes the minimal value from its successors.
- MAX takes the maximal value from its successors.
[Figure: a two-ply minimax tree. The MAX root has three MIN successors whose leaf utilities are (3, 12, 8), (2, 4, 6) and (14, 5, 2); the MIN nodes therefore take the values 3, 2 and 2, and the MAX root takes the value 3.]
Properties of minimax
Assume MAX always moves to the state with the maximal minimax value.
- Optimal: against an optimal opponent; otherwise MAX will do even better. There may, however, be better strategies against suboptimal opponents.
- Time complexity: can be implemented (depth-first) so that the time complexity is b^m (branching factor b, depth m).
- Space complexity: can be implemented (depth-first) so that the space complexity is bm.
For chess, b ≈ 35 and m ≈ 100 for "reasonable" games:
- about 10^154 paths to explore
- infeasible!
But do we need to explore every path?
Removing redundant information: α–β pruning
If you know half-way through a calculation that it will succeed or fail, then there is no point in doing the rest of it!
[Figure, shown over several slides: α–β pruning on the two-ply minimax tree. The first MIN node evaluates to min(3, 12, 8) = 3, so MAX is guaranteed at least 3 at the root. The first leaf of the second MIN node is 2, so that node's value is at most 2 < 3 and its remaining leaves are pruned (marked X X). The third MIN node's leaves 14, 5 and 2 give it the value 2. The root's minimax value 3 is thus obtained without examining every leaf.]
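The pruning idea can be sketched in Python using the same hypothetical toy encoding as before (a terminal state is an int giving its utility, a non-terminal state is a list of successors): α is the best value MAX is already guaranteed, β the best value MIN is guaranteed, and a branch is abandoned as soon as α ≥ β.

```python
import math

# Minimax with alpha-beta pruning; `pruned` (if supplied) collects the
# successors that were never examined.

def alphabeta(state, alpha=-math.inf, beta=math.inf, max_to_move=True,
              pruned=None):
    if isinstance(state, int):
        return state
    if max_to_move:
        value = -math.inf
        for i, n in enumerate(state):
            value = max(value, alphabeta(n, alpha, beta, False, pruned))
            alpha = max(alpha, value)
            if alpha >= beta:                 # MIN will never let us get here
                if pruned is not None:
                    pruned.extend(state[i + 1:])
                break
        return value
    else:
        value = math.inf
        for i, n in enumerate(state):
            value = min(value, alphabeta(n, alpha, beta, True, pruned))
            beta = min(beta, value)
            if beta <= alpha:                 # MAX already has something better
                if pruned is not None:
                    pruned.extend(state[i + 1:])
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
skipped = []
alphabeta(tree, pruned=skipped)   # value 3; leaves 4 and 6 are pruned
```

On this tree the second MIN node is cut off after its first leaf (2 ≤ α = 3), exactly as in the figure; the result is the same minimax value with fewer leaves examined.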
Cutoffs and Heuristics
- Cut off the search according to some cutoff test.
- Problem: utilities are defined only at terminal states.
- Solution: evaluate the pre-terminal leaf states using a heuristic evaluation function rather than the actual utility function.
Cutoff Value
Instead of MinimaxV(s) we compute CutoffV(s). Assume that we can compute a function Evaluation(s) which gives us a utility value for any state s which we do not want to explore (every cutoff state). Then define CutoffV(s) recursively:

CutoffV(s) =
  Utility(s)                       if s is terminal
  Evaluation(s)                    if s is a cutoff state
  max_{n ∈ Succ(s)} CutoffV(n)     if MAX moves in s
  min_{n ∈ Succ(s)} CutoffV(n)     if MIN moves in s
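One common cutoff test is a depth limit, which gives a simple sketch of CutoffV (same hypothetical toy encoding as before: ints are terminal utilities, lists are successor sets; the evaluation function is supplied by the caller):

```python
# CutoffV as depth-limited minimax: states reached at depth 0 that are
# not terminal are treated as cutoff states and scored heuristically.

def cutoff_value(state, depth, evaluation, max_to_move=True):
    if isinstance(state, int):      # terminal: exact utility
        return state
    if depth == 0:                  # cutoff: heuristic estimate
        return evaluation(state)
    values = [cutoff_value(n, depth - 1, evaluation, not max_to_move)
              for n in state]
    return max(values) if max_to_move else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
cutoff_value(tree, 2, lambda s: 0)   # deep enough to reach all terminals: 3
```

With a depth limit large enough to reach every terminal state, CutoffV coincides with MinimaxV; with a shallower limit the answer depends entirely on how good Evaluation is.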
Example: Chess (I)
- Assume MAX is white.
- Assume each piece has the following material value:
  - pawn = 1;
  - knight = 3;
  - bishop = 3;
  - rook = 5;
  - queen = 9.
- Let w = the sum of the values of the white pieces.
- Let b = the sum of the values of the black pieces.

Evaluation(s) = (w − b) / (w + b)
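A sketch of this material evaluation in Python, assuming (hypothetically) that a position is given simply as the two lists of piece names still on the board, with at least one piece remaining:

```python
# Material evaluation (w - b) / (w + b): positive when white (MAX) is
# ahead in material, negative when black is, zero when equal.

MATERIAL = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluation(white_pieces, black_pieces):
    w = sum(MATERIAL[p] for p in white_pieces)
    b = sum(MATERIAL[p] for p in black_pieces)
    return (w - b) / (w + b)

# White has a queen and a pawn, black has a rook: w = 10, b = 5,
# so the evaluation is 5/15.
evaluation(["queen", "pawn"], ["rook"])
```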
Example: Chess (II)
- The previous evaluation function naively gave the same weight to a piece regardless of its position on the board…
- Let X_i be the number of squares the i-th piece attacks:

Evaluation(s) = piece_1 value · X_1 + piece_2 value · X_2 + …
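The weighted-mobility sum above can be sketched as follows, assuming (hypothetically) that for each piece we are already given its material value and the number of squares it attacks:

```python
# Mobility-weighted evaluation: sum of material value times the number
# of squares attacked, over the pieces of the player being evaluated.

def mobility_evaluation(pieces):
    """pieces: list of (material_value, squares_attacked) pairs."""
    return sum(value * attacked for value, attacked in pieces)

# A queen attacking 12 squares and a knight attacking 4:
# 9 * 12 + 3 * 4 = 120.
mobility_evaluation([(9, 12), (3, 4)])
```

Counting attacked squares for a real position requires move generation, which is beyond the scope of this sketch.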
Game playing: summary
- The minimax algorithm (with α–β pruning) is fundamental for game playing.
- It is not efficient enough for games such as chess, go, etc.
- Evaluation functions are needed to replace terminal states by cutoff states.
- There are various approaches to defining the evaluation function.
- The most successful approach: machine learning. Evaluate positions using experience from previous games.