CSE 3521: Adversarial Search
[Many slides are adapted from CS188: Introduction to AI at UC Berkeley and from previous offerings of CSE 3521 at OSU.]
Game Playing Agents
Build an agent to play:
Tic-Tac-Toe
Checkers
Chess
Go
Poker
Games have been used by humans for centuries as tests of intelligence
Recent successes in building game programs now challenge human supremacy
Zero-Sum Games
Agents have opposite utilities (values on outcomes)
Let us think of a single value that one maximizes and the other minimizes
Adversarial, pure competition
General Games
Agents have independent utilities (values on outcomes)
Cooperation, indifference, competition, and more are all possible
Types
Many kinds of games!
Axes:
Deterministic or stochastic?
One, two, or more players?
Zero sum?
Perfect information (can you see the state)?
Want algorithms for calculating a strategy (i.e., agent function, policy) which recommends a move (action) from each state
Game-playing agents
Two-player games
In many games someone wins, someone loses
These types of games are adversarial
Player A will try to maximize the chance of A winning
Player B will try to maximize the chance of B winning
Completely adversarial: zero-sum game
Zero-sum game: the total payoff to all players is always the same. For chess, there is one winner or a draw, so the possible payoff pairs are (1, 0), (0, 1), and (½, ½).
Can We Use Search?
Tic-Tac-Toe
X goes first; trade X/O between games
What strategies do you use?
How do you choose your first move?
What would a bad first move be?
Do you look ahead?
Tic-Tac-Toe
Two-player
Turn-taking
Fully observable
Zero-sum
Time-constrained game?
Adversarial Search: Game Tree
[Figure: game tree with levels alternating Agent, Opponent, Agent]
You choose moves in red
Opponent moves in blue
You don’t have complete control
Adversarial Search: Game Tree
Agent generates a new Game Tree for each turn
Successor function: generates the next possible moves
Agent is called MAX
Opponent is called MIN
Assume MAX plays the first move (initial state)
Terminal Test
Is a state a win/loss/draw for MAX
All states are fully observable
Uncertainty is caused by the action of another
MAX wants MIN to lose (and vice versa)
Two-player game tree
Adversarial Search: Game Tree
No plan exists that guarantees success for either player
At each turn, the choice of which action to perform must be made within a specified time limit
The state space is enormous for difficult games (Chess, Go, etc.)
Only a tiny fraction of this space can be explored within the time limit
The branching factor and the depth of the terminal states are large
Chess
Number of states: ~10^40
Branching factor: ~35
Number of total moves in a game: ~100
Adversarial Search: Game Tree
Diagram does not show all symmetries
[Figure: partial tic-tac-toe game tree alternating MAX’s play and MIN’s play, with MAX nodes, MIN nodes, and a terminal state (win for MAX)]
Adversarial Search: Game Tree
Agent’s turn
Build a game tree from the current state to a maximal depth h (called the horizon) so that it is feasible within the time limit
Evaluate the states at the leaf nodes
How???
[Figure: game tree expanded over turns 1–3; agent chooses moves in red, opponent in blue]
Evaluation Function
Evaluation function e(s): state s → scalar
A utility with respect to the agent
A heuristic function that estimates how favorable state s is for MAX (agent)
e(s) > 0: s is favorable to MAX
e(s) < 0: s is favorable to MIN
e(s) = 0: s is neutral
If the agent is playing X, which states are favorable to MAX?
Evaluation Function: Example Tic-Tac-Toe
A possible heuristic function:
e(s) = # rows/columns/diagonals open for MAX –
# rows/columns/diagonals open for MIN
[Figure: three example boards scoring 8 − 8 = 0, 6 − 4 = 2, and 3 − 3 = 0]
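The open-lines heuristic above can be sketched directly. This is a minimal illustration, assuming boards are 3×3 nested lists of 'X', 'O', or '' (the representation and names are ours, not from the slides):

```python
# Open-lines heuristic for tic-tac-toe: a line (row, column, or diagonal)
# is "open" for a player if the opponent occupies none of its squares.

LINES = [
    [(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)],  # rows
    [(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)],  # columns
    [(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)],                            # diagonals
]

def open_lines(board, player):
    """Count lines containing no opponent marks."""
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[r][c] != opponent for r, c in line))

def e(board):
    """e(s) = # lines open for MAX ('X') minus # lines open for MIN ('O')."""
    return open_lines(board, 'X') - open_lines(board, 'O')

empty = [['', '', ''], ['', '', ''], ['', '', '']]
print(e(empty))  # 0 — all 8 lines open for each player: the empty board is neutral
```

With X in the center and O on an edge, X blocks 4 of O's lines while O blocks only 2 of X's, giving e(s) = 6 − 4 = 2, matching the slide.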
Evaluation Function
Usually a weighted sum of “features”:
What features can you define for chess?
Number of pieces of each type
Number of possible moves
Number of squares controlled
Evaluation functions
Evaluation functions score non-terminals in depth-limited search
Ideal function: returns the actual minimax value of the position
In practice: typically weighted linear sum of feature functions:
e.g. f1(s) = (num white queens – num black queens), etc.
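A weighted linear sum of features can be sketched as below. The feature functions, the state encoding (a plain dict), and the weights are all illustrative assumptions, not tuned values from any real engine:

```python
# Weighted linear evaluation: e(s) = sum_i w_i * f_i(s).
# The state dict fields and weights here are made up for illustration.

def f1(s):
    """Material feature: num white queens - num black queens."""
    return s['white_queens'] - s['black_queens']

def f2(s):
    """Mobility feature: num white moves - num black moves."""
    return s['white_moves'] - s['black_moves']

def e(s, weights=(9.0, 0.1)):
    """Weighted sum of feature values."""
    features = (f1(s), f2(s))
    return sum(w * f for w, f in zip(weights, features))

# A state where White is up a queen but slightly behind in mobility:
s = {'white_queens': 2, 'black_queens': 1, 'white_moves': 30, 'black_moves': 35}
print(e(s))  # 9.0*1 + 0.1*(-5) = 8.5
```

The weights encode how much each feature matters; learning or hand-tuning them is where most of the design effort goes.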
Adversarial Search (Minimax)
Deterministic, zero-sum games:
Tic-tac-toe, chess, checkers
One player maximizes result
The other minimizes result
Minimax search:
A state-space search tree
Players alternate turns
Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary
Assumes the opponent also is using the same game tree and evaluation function
[Figure: MAX root over two MIN nodes; terminal values 8, 2 and 5, 6 back up to MIN values 2 and 5, so the root’s minimax value is 5]
Terminal values: part of the game
Minimax values: computed recursively
Example
[Figure: MAX root over two MIN nodes; leaves 10, 100 back up to MIN value 10 and leaves 2, 20 back up to MIN value 2, so the root’s minimax value is 10]
You assume the opponent will act rationally
Example: Tic-Tac-Toe at horizon (h) = 2
[Figure: tic-tac-toe tree to horizon 2; leaf evaluations (e.g., 6−5=1, 5−6=−1, 5−5=0, 6−4=2, 4−6=−2) back up to MIN values −1, −2, 1, 1; the best move is a branch achieving value 1]
Example: Continued
[Figure: the search continued from the chosen move; leaf evaluations backed up the tree as before]
Minimax implementation (recursively using DFS)
def min_value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max_value(successor))
    return v
def max_value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min_value(successor))
    return v
Minimax implementation (recursively using DFS)
def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max_value(state)
    if the next agent is MIN: return min_value(state)
def min_value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
def max_value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
Successors must be expanded in some order, for two reasons: the processor is sequential, and the order matters for pruning
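The pseudocode above becomes runnable with a minimal toy representation, which we assume here purely for illustration: a state is a nested list of successors, and a number is a terminal utility.

```python
# Recursive DFS minimax on nested-list game trees.
# An inner list is a state whose elements are its successors;
# a bare number is a terminal state's utility.

def value(state, is_max):
    if isinstance(state, (int, float)):   # terminal state: return its utility
        return state
    return max_value(state) if is_max else min_value(state)

def max_value(state):
    v = float('-inf')
    for successor in state:
        v = max(v, value(successor, is_max=False))
    return v

def min_value(state):
    v = float('inf')
    for successor in state:
        v = min(v, value(successor, is_max=True))
    return v

# Three actions a1, a2, a3 lead to MIN nodes over leaves 4,10,5 / 3,1,16 / 6,2,11:
tree = [[4, 10, 5], [3, 1, 16], [6, 2, 11]]
print(value(tree, is_max=True))  # max(min(4,10,5), min(3,1,16), min(6,2,11)) = 4
```

The MIN values are 4, 1, and 2, so MAX's best achievable utility against an optimal opponent is 4.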
Minimax example
[Figure: MAX root with actions a1, a2, a3 leading to MIN nodes, with leaf utilities 4, 10, 5 under a1; 3, 1, 16 under a2; and 6, 2, 11 under a3]
Search proceeds in depth-first manner: O(b^m) time
Minimax example
[Figure: the completed search; backed-up MIN values are 4, 1, 2, so the root’s minimax value is 4 and MAX plays a1]
Minimax efficiency
How efficient is minimax?
Just like (exhaustive) DFS: “for each successor”
Time: O(b^m)
Space: O(bm)
Example: For chess, b ≈ 35, m ≈ 100
Exact solution is completely infeasible
But do we need to explore the whole tree?
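A quick sanity check of the infeasibility claim, using the slide's figures b ≈ 35 and m ≈ 100:

```python
# Size of the full chess game tree with branching factor b and depth m.
b, m = 35, 100
nodes = b ** m                       # total number of tree nodes at full depth (order of magnitude)
digits = len(str(nodes)) - 1
print(f"full tree ~10^{digits} nodes")  # about 10^154
# Even at 1M nodes per move (10K nodes/sec for 100s), we can touch only an
# astronomically small fraction of the tree.
```

For comparison, the observable universe contains roughly 10^80 atoms, so no conceivable hardware explores this tree exhaustively.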
Resource limits
The search process: we don’t know the values of middle nodes before reaching the terminal states
Problem: In realistic games, we cannot search to leaves!
Solution: Depth-limited search
Instead, search only to a limited depth in the tree
Replace terminal utilities with an evaluation function for non-terminal positions (like heuristics)
Example:
Suppose we have 100 seconds, and can explore 10K nodes/sec
So can check 1M nodes per move
This reaches about depth 8: a decent chess program
Guarantee of optimal play is gone
More plies makes a BIG difference
[Figure: depth-limited tree; cut-off nodes marked ? are scored by the evaluation function; values −1, −2, 4, 9 back up through MIN values −2 and 4, giving the root value 4]
Remember: we perform an entire round of minimax search mainly to determine just the next move!
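Depth-limited search can be sketched on a toy representation (nested lists for states, numbers for terminal utilities; the averaging evaluation function is a stand-in we invented for illustration):

```python
# Depth-limited minimax: stop at the horizon and score cut-off states with
# an evaluation function e(s) instead of a true terminal utility.

def e(state):
    """Stand-in evaluation: mean of all leaf utilities below state."""
    if isinstance(state, (int, float)):
        return state
    scores = [e(s) for s in state]
    return sum(scores) / len(scores)

def dl_value(state, depth, is_max):
    if isinstance(state, (int, float)):   # true terminal: exact utility
        return state
    if depth == 0:                        # horizon reached: estimate with e(s)
        return e(state)
    children = [dl_value(s, depth - 1, not is_max) for s in state]
    return max(children) if is_max else min(children)

tree = [[4, 10, 5], [3, 1, 16], [6, 2, 11]]
print(dl_value(tree, depth=2, is_max=True))  # 4: deep enough to see true terminals
print(dl_value(tree, depth=1, is_max=True))  # 20/3 ≈ 6.67: estimated at the horizon
```

Note that at horizon 1 the estimate prefers the middle action, while the full search plays the first: the guarantee of optimal play is gone, and more plies make a big difference.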
Depth matters
Evaluation functions are always imperfect
The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
An important example of the tradeoff between complexity of evaluation functions and complexity of computation
Weighted linear evaluation: e(s) = Σ_{i=1}^{n} w_i f_i(s)