PowerPoint Presentation
CSE 3521: Adversarial Search
[Many slides are adapted from the UC Berkeley. CS188 Intro to AI at UC Berkeley and previous CSE 3521 course at OSU.]
Game tree pruning
Ignore portions of search tree that make no difference to final choice
Minimax example
12
8
5
2
3
2
14
4
6
3
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
12
8
2
3
2
14
14
5
5
2
2
2
3
Note: Only showed MIN pruning here
Minimax pruning
12
8
5
2
3
2
14
5
Can We Do Better?
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
Alpha-Beta Pruning
MAX
MIN
Terminal
≤ 3
3
Beta
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
3
12
8
Final score
≤ 3
Alpha
Alpha-Beta Pruning
MAX
MIN
Terminal
3
2
3
12
8
2
3 > 2 !!!
No “final” score
But fine to tell MAX “2”
3
2
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
12
8
2
3
2
14
14
3 > 14 ?
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
12
8
2
3
2
14
14
5
5
3 > 5 ?
Example
Example
b = 2
2
The beta value of a MIN
node is an upper bound on
the final backed-up value.
It can never increase
Example
The beta value of a MIN
node is an upper bound on
the final backed-up value.
It can never increase
1
b = 1
2
Example
a = 1
The alpha value of a MAX
node is a lower bound on
the final backed-up value.
It can never decrease
1
b = 1
2
Example
a = 1
1
b = 1
2
-1
b = -1
Example
a = 1
1
b = 1
2
-1
b = -1
Search can be discontinued below
any MIN node whose beta value is
less than or equal to the alpha value
of one of its MAX ancestors
Alpha-Beta Pruning
3
-1
Pruning
-1
3
This part of the tree can’t have any effect on the value that will be backed up to the root
Alpha-Beta Pruning (Algorithm)
Explore the game tree to depth h (horizon) in depth-first manner
Back up alpha and beta values whenever possible
Prune branches that can’t lead to changing the final decision
Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued
Discontinue the search below a MAX node N if its alpha value is the beta value of a MIN ancestor of N
Discontinue the search below a MIN node N if its beta value is the alpha value of a MAX ancestor of N
Alpha-Beta implementation (recursively)
def min-value(state , α, β):
initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α return v
β = min(β, v)
return v
def max-value(state, α, β):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β return v
α = max(α, v)
return v
α: MAX’s best option on path to root so far: the lowest score MAX can possibly get
β: MIN’s best option on path to root so far: the highest score MIN can possibly get
In the beginning, α= -∞ and β= +∞
Pick an order for two reasons: sequential processor and pruning
20
Alpha-Beta Pruning
MAX
MIN
Terminal
≤ 3
3
Beta
def min-value(state , α= -∞, β= +∞):
initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α return v
β = min(β, v)
return v
def max-value(state, α= -∞, β= +∞):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β return v
α = max(α, v)
return v
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
3
12
8
Final score
≤ 3
Alpha
def min-value(state , α= -∞, β= 3):
initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α return v
β = min(β, v)
return v
def max-value(state, α= -∞, β= +∞):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β return v
α = max(α, v)
return v
Alpha-Beta Pruning
MAX
MIN
Terminal
3
2
3
12
8
2
3 > 2 !!!
No “final” score
But fine to tell MAX “2”
3
2
def min-value(state , α= 3, β= +∞):
initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α return v
β = min(β, v)
return v
def max-value(state, α= 3, β= +∞):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β return v
α = max(α, v)
return v
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
12
8
2
3
2
14
14
3 > 14 ?
def min-value(state , α= 3, β= +∞):
initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α return v
β = min(β, v)
return v
def max-value(state, α= 3, β= +∞):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β return v
α = max(α, v)
return v
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
12
8
2
3
2
14
14
5
5
3 > 5 ?
def min-value(state , α= 3, β= 14):
initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α return v
β = min(β, v)
return v
def max-value(state, α= 3, β= +∞):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β return v
α = max(α, v)
return v
Alpha-Beta Pruning
MAX
MIN
Terminal
3
3
12
8
2
3
2
14
14
5
5
2
2
2
3
Note: Only showed MIN pruning here
def min-value(state , α= 3, β= 5):
initialize v = +∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α return v
β = min(β, v)
return v
def max-value(state, α= 3, β= +∞):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β return v
α = max(α, v)
return v
Another Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
-3
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
-3
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
-3
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
-3
3
3
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
-3
3
3
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
5
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
5
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
-5
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
-5
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
-5
-5
-5
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
-5
-5
-5
0
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
-5
-5
-5
0
1
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
-5
-5
-5
2
2
2
2
1
1
Example
0
5
-3
2
5
-2
3
2
-3
0
3
3
-5
0
1
-3
5
0
1
-5
5
3
2
-3
5
0
0
0
0
-3
3
3
0
2
2
2
2
1
1
-3
1
1
-5
-5
-5
1
2
2
2
2
1
How much do we gain?
Consider these two cases:
3
a = 3
-1
b=-1
(4)
3
a = 3
4
b=4
-1
Alpha-Beta Pruning properties
This pruning has no effect on minimax value computed for the root!
Values of intermediate nodes might be wrong
Important: children of the root may have the wrong values
Good child ordering improves effectiveness of pruning
With “perfect ordering”:
Time complexity drops to O(bm/2) [Knuth and Moore 1975]
Doubles solvable depth!
Full search of, e.g., chess, is still hopeless…
This is a simple example of meta-reasoning (computing about what to compute)
10
10
0
max
min
55
Exercise-1 (adversarial)
Answer Exercise-1 (adversarial)
4
8
8
Exercise-1 (Alpha-Beta)
Answer Exercise-1 (Alpha-Beta)
4
8
8
Exercise-2 (adversarial)
Answer Exercise-2 (adversarial)
10
100
2
20
2
10
10
Exercise-2 (Alpha-Beta)
Answer Exercise-2 (Alpha-Beta)
10
100
2
-∞
2
10
10
Uncertain Outcomes
Worst Case vs. Average Case
10
10
9
100
max
min
Idea: Uncertain outcomes controlled by chance, thus not adversary!
At least 10, or otherwise just rather dead; or just 100, or otherwise rather dead; that’s yet something different than minimax or expectimax
65
Reminder: Probabilities
A random variable represents an event whose outcome is unknown
A probability distribution is an assignment of weights to outcomes
Example: Traffic on freeway
Random variable: T = whether there’s traffic
Outcomes: T in {none, light, heavy}
Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
Some laws of probability (more later):
Probabilities are always non-negative
Probabilities over all possible outcomes sum to one
As we get more evidence, probabilities may change:
P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
We’ll talk about methods for reasoning and updating probabilities later
0.25
0.50
0.25
The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
Example: How long to get to the airport?
Reminder: Expectations
0.25
0.50
0.25
Probability:
20 min
30 min
60 min
Time:
35 min
x
x
x
+
+
67
Expectimax Search
Why wouldn’t we know what the result of an action will be?
Explicit randomness: flip a coin or rolling dice
Unpredictable opponents: the ghosts respond randomly
Actions can fail: when moving a robot, wheels might slip
Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
Expectimax search: compute the average score under optimal play
Max nodes as in minimax search
Chance nodes are like min nodes, but the outcome is uncertain
Calculate their expected utilities
I.e., take weighted average (expectation) of children
10
4
5
7
max
chance
10
10
9
100
68
Expectimax pseudocode
def value(state):
if the state is a terminal state: return the state’s utility
if the next agent is MAX: return max-value(state)
if the next agent is EXP: return exp-value(state)
def exp-value(state):
initialize v = 0
for each successor of state:
p = probability(successor)
v += p * value(successor)
return v
def max-value(state):
initialize v = -∞
for each successor of state:
v = max(v, value(successor))
return v
Pick an order for two reasons: sequential processor and pruning
69
Expectimax pseudocode
def exp-value(state):
initialize v = 0
for each successor of state:
p = probability(successor)
v += p * value(successor)
return v
5
7
8
24
-12
1/2
1/3
1/6
v = (1/2) (8) + (1/3) (24) + (1/6) (-12) = 10
Expectimax example
12
9
6
0
3
2
15
4
6
Expectimax pruning?
12
9
3
2
Depth-Limited Expectimax
…
…
492
362
…
400
300
Estimate of true expectimax value (which would require a lot of work to compute)
73
Expectimax: Coin Flip
Expected value = P(heads) * MIN(heads) +P(tails) * MIN(tails)
74
Evaluation function meaning
A1 is best move
A2 is best move
2 outcomes with probabilities {.9, .1}
Modeling Assumptions
The dangers of optimism and pessimism
Dangerous Optimism
Assuming chance when the world is adversarial
Dangerous Pessimism
Assuming the worst case when it’s not likely
77
Uncertainty everywhere
Not just for games of chance!
I’m sick: will I sneeze this minute?
Email contains “FREE!”: is it spam?
Tooth hurts: have cavity?
60 min enough to get to the airport?
Robot rotated wheel three times; how far did it advance?
Safe to cross street? (Look both ways!)
Sources of uncertainty in random variables:
Inherently random process (dice, etc.)
Insufficient or weak evidence
Ignorance of underlying processes
Unmodeled variables
The world’s just noisy – it doesn’t behave according to plan!
Compared to fuzzy logic, which has degrees of truth, rather than just degrees of belief (Talk about next)
/docProps/thumbnail.jpeg