COMP111: Artificial Intelligence
Section 6. Adversarial search (Game playing)

Frank Wolter

Outline

We will look at how search can be applied to playing games:

▶ Types of games
▶ Perfect play:
  ▶ the minimax algorithm
  ▶ α-β pruning
▶ Playing with limited resources (heuristics)

Games vs. search problems

▶ In search we make all the moves ourselves. In games we play
  against an unpredictable opponent:
  ▶ a solution is a strategy specifying a move for every possible
    move of the opponent.
▶ A method is needed for selecting good moves that stand a good
  chance of achieving a winning state whatever the opponent does!
▶ Because of combinatorial explosion, in practice we must
  approximate using heuristics.

Types of Games

▶ In some games we have a fully observable environment: the
  position is known completely. These are called games with
  perfect information.
  ▶ Examples: chess, go, backgammon, monopoly.
▶ In others we have a partially observable environment; for
  example, we cannot see the opponent's cards. These are called
  games with imperfect information.
  ▶ Examples: battleships, bridge, poker.
▶ Some games are deterministic: chess, go.
▶ Others have an element of chance: backgammon, monopoly,
  bridge, poker.

The games we consider

We consider special kinds of games:

▶ Deterministic
▶ Two-player
▶ Zero-sum:
  ▶ the utility values at the end are equal and opposite;
  ▶ example: one player wins (+1), the other loses (−1).
▶ Perfect information

Problem Formulation

The search graph gives, for every state, the successor states
obtained by making a move. The set of goal states is replaced by a
utility function.

▶ Initial state s_start:
  ▶ the initial board position and which player moves first.
▶ Successor function:
  ▶ gives, for every state s and move, the new state after the move.
▶ Terminal test:
  ▶ determines when the game is over.
▶ Utility function:
  ▶ numeric value for terminal states;
  ▶ e.g. chess: +1, −1, 0;
  ▶ e.g. backgammon: +192 to −192.

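The four ingredients above can be written down as a small program
interface. A minimal sketch in Python; the class and method names
(Game, successors, ...) are illustrative, not from the lecture:

```python
class Game:
    """Abstract deterministic, two-player, perfect-information game."""

    def initial_state(self):
        """Initial board position, including which player moves first."""
        raise NotImplementedError

    def successors(self, state):
        """Yield (move, new_state) pairs, one per legal move in state."""
        raise NotImplementedError

    def is_terminal(self, state):
        """Terminal test: True when the game is over in state."""
        raise NotImplementedError

    def utility(self, state):
        """Numeric value of a terminal state, e.g. +1, -1 or 0 in chess."""
        raise NotImplementedError
```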

Possible Development

[Figure: a tic-tac-toe game tree, developed move by move from the
current position, with X and O alternating. Each branch ends in a
terminal position labelled with its utility for the X player:
+1 for a win, −1 for a loss.]

Game Tree

▶ Each level is labelled with the player to move.
▶ Each level represents a ply:
  ▶ half a turn (a move by one of the two players).
▶ The tree represents what happens with competing agents.

Introducing MIN and MAX

MIN and MAX are two players:

▶ MAX wants to win (maximise utility).
▶ MIN wants MAX to lose (minimise utility for MAX).
▶ MIN is the opponent.

Both players play to the best of their ability:

▶ MAX wants a strategy for maximising utility, assuming MIN
  does its best to minimise MAX's utility.
▶ Consider the minimax value of each state: the utility of a state
  given that both players play optimally.

Example Game Tree

[Figure: the game tree for tic-tac-toe. MAX (X) moves at the root,
MIN (O) replies at the next level, and the two players alternate
down to the TERMINAL positions, whose utilities for MAX are
−1, 0 or +1.]

Minimax Value

▶ The utility (= minimax value) of a terminal state is given by
  its utility already (as an input).
▶ The utility (= minimax value) of a MAX-state (when MAX
  moves) is the maximum of the utilities of its successor states.
▶ The utility (= minimax value) of a MIN-state (when MIN
  moves) is the minimum of the utilities of its successor states.
▶ Thus, we can compute the minimax value recursively, starting
  from the terminal states.

Formally, let Succ(s) denote the set of successor states of state s.
Define the function MinimaxV(s) recursively as follows:

MinimaxV(s) =
\begin{cases}
\text{Utility}(s) & \text{if } s \text{ is terminal} \\
\max_{n \in \text{Succ}(s)} \text{MinimaxV}(n) & \text{if MAX moves in } s \\
\min_{n \in \text{Succ}(s)} \text{MinimaxV}(n) & \text{if MIN moves in } s
\end{cases}

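The recursive definition translates directly into code. A minimal
sketch, reusing the illustrative Game interface from the problem
formulation:

```python
def minimax_value(game, state, max_to_move):
    """MinimaxV(s): the minimax value of state s, computed recursively."""
    if game.is_terminal(state):
        return game.utility(state)          # utility given as an input
    values = [minimax_value(game, s, not max_to_move)
              for _, s in game.successors(state)]
    # MAX takes the maximum, MIN the minimum, of the successor values.
    return max(values) if max_to_move else min(values)
```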

Minimax algorithm
▶ Calculate the minimax value of each state using the equation
  above, starting from the terminal states.
▶ The game tree then becomes a minimax tree:

Minimax Tree

▶ MIN takes the minimal value from its successors.
▶ MAX takes the maximal value from its successors.

Consider:

[Figure: a two-ply minimax tree. The MAX root has three MIN
successors, whose leaves have utilities 3, 12, 8 and 6, 4, 2 and
14, 5, 2. The MIN nodes therefore take the values 3, 2 and 2, and
the minimax value of the root is 3.]
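The same computation on this small tree, written out directly; the
nested-list encoding of the leaves is just for illustration:

```python
# One inner list of leaf utilities per MIN node.
leaves = [[3, 12, 8], [6, 4, 2], [14, 5, 2]]

min_values = [min(branch) for branch in leaves]   # [3, 2, 2]
root_value = max(min_values)                      # 3
```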

Properties of minimax

Assume MAX always moves to the state with the maximal minimax
value.

▶ Optimal: against an optimal opponent. Otherwise MAX will
  do even better. There may, however, be better strategies
  against suboptimal opponents.
▶ Time complexity: can be implemented (depth-first) so that
  the time complexity is b^m (branching factor b, depth m).
▶ Space complexity: can be implemented (depth-first) so that
  the space complexity is b · m.

For chess, b ≈ 35 and m ≈ 100 for “reasonable” games, giving
35^100 ≈ 10^154 paths to explore: infeasible.

But do we need to explore every path?

Removing redundant information: α-β-Pruning

If you know half-way through a calculation that it will succeed or
fail, then there is no point in doing the rest of it!

[Figure: α-β pruning on the minimax tree above. The first MIN node
evaluates its leaves 3, 12, 8 to the value 3. At the second MIN node
the first leaf already gives the value 2 ≤ 3, so the remaining leaves
are pruned (marked X X): MAX would never choose this branch anyway.
The third MIN node evaluates its leaves 14, 5, 2 to the value 2. The
minimax value of the root is 3, obtained without inspecting every
leaf.]
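A minimal sketch of minimax with α-β pruning, again on the
illustrative Game interface. Here α is the best value MAX can already
guarantee on the current path and β the best value MIN can already
guarantee:

```python
def alphabeta_value(game, state, max_to_move,
                    alpha=float("-inf"), beta=float("inf")):
    """Minimax value of state, skipping branches that cannot matter."""
    if game.is_terminal(state):
        return game.utility(state)
    if max_to_move:
        value = float("-inf")
        for _, s in game.successors(state):
            value = max(value, alphabeta_value(game, s, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN has a better option elsewhere: prune
                break
        return value
    else:
        value = float("inf")
        for _, s in game.successors(state):
            value = min(value, alphabeta_value(game, s, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:   # MAX has a better option elsewhere: prune
                break
        return value
```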

Cutoffs and Heuristics

▶ Cut off the search according to some cutoff test.
▶ Problem: utilities are defined only at terminal states.
▶ Solution: evaluate the pre-terminal leaf states using a heuristic
  evaluation function rather than the actual utility function.

Cutoff Value

Instead of MinimaxV(s) we compute CutoffV(s).

Assume that we can compute a function Evaluation(s) which gives us
a utility value for any state s that we do not want to explore
(every cutoff state). Then define CutoffV(s) recursively:

CutoffV(s) =
\begin{cases}
\text{Utility}(s) & \text{if } s \text{ is terminal} \\
\text{Evaluation}(s) & \text{if } s \text{ is a cutoff state} \\
\max_{n \in \text{Succ}(s)} \text{CutoffV}(n) & \text{if MAX moves in } s \\
\min_{n \in \text{Succ}(s)} \text{CutoffV}(n) & \text{if MIN moves in } s
\end{cases}
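With a depth bound as the cutoff test, the definition is a small
variation of the minimax code. A sketch; evaluation is a
caller-supplied heuristic, and using depth as the cutoff test is an
assumption:

```python
def cutoff_value(game, state, max_to_move, depth, evaluation):
    """CutoffV(s): minimax value with a depth cutoff."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:              # cutoff state: estimate instead of searching on
        return evaluation(state)
    values = [cutoff_value(game, s, not max_to_move, depth - 1, evaluation)
              for _, s in game.successors(state)]
    return max(values) if max_to_move else min(values)
```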

Example: Chess (I)

▶ Assume MAX is white.
▶ Assume each piece has the following material value:
  ▶ pawn = 1;
  ▶ knight = 3;
  ▶ bishop = 3;
  ▶ rook = 5;
  ▶ queen = 9.
▶ Let w = the sum of the values of the white pieces.
▶ Let b = the sum of the values of the black pieces.

Evaluation(s) = \frac{w - b}{w + b}
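A sketch of this material evaluation in Python; representing pieces
by name strings is an illustrative choice. The fraction normalises
the score to the interval [−1, +1], matching the utility range:

```python
PIECE_VALUE = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_evaluation(white_pieces, black_pieces):
    """(w - b) / (w + b): material balance from white's (MAX's) viewpoint."""
    w = sum(PIECE_VALUE[piece] for piece in white_pieces)
    b = sum(PIECE_VALUE[piece] for piece in black_pieces)
    if w + b == 0:              # bare kings: no material on either side
        return 0.0
    return (w - b) / (w + b)
```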

Example: Chess (II)

▶ The previous evaluation function naively gave the same weight
  to a piece regardless of its position on the board.
▶ Let X_i be the number of squares the i-th piece attacks. Then:

Evaluation(s) = piece_1 value · X_1 + piece_2 value · X_2 + …
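A sketch of this weighted-mobility evaluation, reusing PIECE_VALUE
from the previous sketch; attacked_squares is a hypothetical helper
that a move generator would provide:

```python
def mobility_evaluation(pieces, state, attacked_squares):
    """Sum over pieces of material value times number of attacked squares."""
    return sum(PIECE_VALUE[piece] * attacked_squares(piece, state)
               for piece in pieces)
```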

Game playing: summary

▶ The minimax algorithm (with α-β pruning) is fundamental for
  game playing.
▶ On its own it is not efficient enough for games such as chess,
  go, etc.
▶ Evaluation functions are needed to replace terminal states by
  cutoff states.
▶ There are various approaches to defining the evaluation function.
▶ The most successful approach is machine learning: evaluate
  positions using experience from previous games.
