
CIS 471/571(Fall 2020): Introduction to Artificial Intelligence
Lecture 6: Adversarial Search
Thanh H. Nguyen
Source: http://ai.berkeley.edu/home.html

Reminders
§ Project 2:
  § Deadline: Oct 27th, 2020
§ Written assignment 2:
  § Deadline: Oct 24th, 2020

Adversarial Games

Types of Games
§Many different kinds of games!
§ Axes:
§Deterministic or stochastic?
§One, two, or more players?
§Zero sum?
§ Perfect information (can you see the state)?
§Want algorithms for calculating a strategy (policy) which recommends a move from each state

Deterministic Games
§ Many possible formalizations, one is:
  § States: S (start at s0)
  § Players: P = {1…N} (usually take turns)
  § Actions: A (may depend on player / state)
  § Transition Function: S × A → S
  § Terminal Test: S → {t, f}
  § Terminal Utilities: S × P → R
§ Solution for a player is a policy: S → A
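To make this formalization concrete, here is a minimal Python sketch using a tiny Nim-like game; the names (NimState, actions, result, is_terminal, utility) are illustrative choices for this sketch, not part of the course code.

from dataclasses import dataclass

@dataclass(frozen=True)
class NimState:
    """Two players alternate removing 1 or 2 stones; taking the last stone wins."""
    stones: int
    to_move: int          # player index in P = {0, 1}

def actions(s):           # A: legal actions, here depending on the state
    return [n for n in (1, 2) if n <= s.stones]

def result(s, a):         # transition function: S × A → S
    return NimState(s.stones - a, 1 - s.to_move)

def is_terminal(s):       # terminal test: S → {t, f}
    return s.stones == 0

def utility(s, player):   # terminal utilities: S × P → R
    # The game ended on the opponent's turn to move, so whoever is
    # *not* to move took the last stone and wins.
    return +1 if s.to_move != player else -1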

Zero-Sum Games
§ Zero-Sum Games
  § Agents have opposite utilities (values on outcomes)
  § Lets us think of a single value that one maximizes and the other minimizes
  § Adversarial, pure competition
§ General Games
  § Agents have independent utilities (values on outcomes)
  § Cooperation, indifference, competition, and more are all possible
  § More later on non-zero-sum games

Adversarial Search

Single-Agent Trees
[Figure: single-agent game tree; node values shown are 8 and 20, …, 26, …, 46]

Value of a State
Value of a state: The best achievable outcome (utility) from that state
Non-terminal states: V(s) = max over successors s′ of V(s′)  (the node shown has value 8)
Terminal states: V(s) = known utility  (leaf values 20, …, 26, …, 46)

Adversarial Game Trees
[Figure: adversarial game tree with terminal utilities -20, -8, …, -18, -5, …, -10, +4, -20, +8]

Minimax Values
States under agent's control: V(s) = max over successors' values
States under opponent's control: V(s) = min over successors' values
Terminal states: V(s) = known utility
[Figure: minimax values -8, -5, -10, +8 at the opponent's nodes of the tree above]

Tic-Tac-Toe Game Tree

Adversarial Search (Minimax)
§ Deterministic, zero-sum games:
  § Tic-tac-toe, chess, checkers
  § One player maximizes result
  § The other minimizes result
§ Minimax search:
§ A state-space search tree
§ Players alternate turns
§ Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary
Minimax values: computed recursively
[Figure: MAX root with value 5 over two MIN nodes with values 2 and 5; terminal values 8, 2, 5, 6 are part of the game]

Minimax Implementation
def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
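A runnable Python version of this pseudocode, sketched under the assumption that the game tree is given as nested lists with numeric leaves (terminal utilities) and that MAX moves at the root; the representation is my choice for the sketch, not the course's.

# Minimax over a nested-list game tree: lists are internal nodes,
# numbers are terminal utilities.
def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):    # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The example tree from the next slide: MIN values are 3, 2, 2, so MAX picks 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert minimax(tree) == 3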

Minimax Example
[Figure: MAX root over three MIN nodes with leaf utilities (3, 12, 8), (2, 4, 6), (14, 5, 2); the MIN values are 3, 2, 2, so the MAX root value is 3]

Minimax Properties
[Figure: MAX root over two MIN nodes with leaves (10, 10) and (9, 100); the MIN values are 10 and 9, so MAX chooses 10]
Optimal against a perfect player. Otherwise?

Minimax Efficiency
§ How efficient is minimax?
  § Just like (exhaustive) DFS
  § Time: O(b^m)
  § Space: O(bm)
§ Example: For chess, b ≈ 35, m ≈ 100
  § Exact solution is completely infeasible (about 35^100 ≈ 10^154 nodes)
  § But, do we need to explore the whole tree?

Resource Limits

Game Tree Pruning

Minimax Example
[Figure: the same minimax example tree: MAX root over MIN nodes with leaves (3, 12, 8), (2, 4, 6), (14, 5, 2)]

Minimax Pruning
[Figure: the example tree with pruning; once the second MIN node sees the leaf 2 (no better for MAX than the 3 already secured), its remaining leaves are skipped, leaving only 3, 12, 8, 2, 14, 5, 2 visited]

Alpha-Beta Pruning
§ Alpha α: value of the best choice so far for MAX (lower bound on MAX's utility)
§ Beta β: value of the best choice so far for MIN (upper bound on MIN's utility)
§ Expanding at a MAX node n: update α
  § If a child of n has value ≥ β, stop expanding n
  § Reason: the MIN parent of n would not choose the action that leads to n
§ At a MIN node n: update β
  § If a child of n has value ≤ α, stop expanding n
  § Reason: the MAX parent of n would not choose the action that leads to n

Alpha-Beta Implementation
def value(state, α, β):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state, α, β)
    if the next agent is MIN: return min-value(state, α, β)

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
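The same nested-list sketch as before, extended with alpha-beta to mirror the pseudocode above; running it on the earlier example tree exercises the pruning (the leaves 4 and 6 are never visited).

import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:          # MIN above would never let us get here
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:             # MAX above already has a better option
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert alphabeta(tree) == 3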

Alpha-Beta Pruning Properties
§ This pruning has no effect on the minimax value computed for the root!
§ Values of intermediate nodes might be wrong
  § Important: children of the root may have the wrong value
  § So the most naïve version won't let you do action selection
§ Good child ordering improves the effectiveness of pruning (with perfect ordering, time complexity drops to roughly O(b^(m/2)))
[Figure: MAX root over MIN nodes with values 10, 9, 0; after pruning, an intermediate node's reported value may only be a bound on its true value]

Alpha-Beta Quiz
[Figure sequence (animation frames): a MAX root over two MIN nodes, every window starting at [𝛼, 𝛽] = [−∞, +∞]]
§ Left MIN node: after its first leaf (10) its window is [𝛼, 𝛽] = [−∞, 10]; after the next leaf (8) it is [−∞, 8]; its value is 8
§ Root MAX node: α is updated, so its window becomes [8, +∞], which the right MIN node inherits
§ Right MIN node: its first leaf has value 4 ≤ α = 8, so its remaining leaves are pruned; the root value is 8

Alpha-Beta Quiz 2
[Figure sequence (animation frames): a three-ply tree (MAX root, MIN layer, MAX layer at the bottom), every window starting at [𝛼, 𝛽] = [−∞, +∞]]
§ Left MIN node, first MAX child: after its leaf (10) its window is [10, +∞] and it returns 10, so the MIN node's window becomes [−∞, 10]
§ Left MIN node, second MAX child: inherits [−∞, 10]; its first leaf (100) gives v = 100 ≥ β = 10, so its remaining leaves are pruned and it returns 100
§ Left MIN node value: 10; the root's window becomes [10, +∞]
§ Right MIN node: inherits [10, +∞]; its MAX child returns 2, and since 2 ≤ α = 10 its remaining children are pruned
§ Root value: 10

Resource Limits

Resource Limits
§ Problem: In realistic games, cannot search to leaves!
§ Solution: Depth-limited search
  § Instead, search only to a limited depth in the tree
  § Replace terminal utilities with an evaluation function for non-terminal positions
§ Example:
  § Suppose we have 100 seconds, can explore 10K nodes / sec
  § So can check 1M nodes per move
  § α-β reaches about depth 8 – decent chess program
§ Guarantee of optimal play is gone
§ More plies makes a BIG difference
§ Use iterative deepening for an anytime algorithm
[Figure: depth-limited tree; MAX root estimated at 4, MIN children -2 and 4, frontier evaluations -1, -2, 4, 9, with unexplored subtrees below marked "?"]
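A sketch of depth-limited search on the same nested-list representation used earlier: recurse as before, but at the depth limit score the node with an evaluation function instead of searching on. The avg_value heuristic below is a deliberately crude stand-in for a real evaluation function, invented for this sketch.

def depth_limited(node, depth, maximizing, evaluate):
    if isinstance(node, (int, float)):    # true terminal: exact utility
        return node
    if depth == 0:                        # cutoff: estimate, don't recurse
        return evaluate(node)
    values = [depth_limited(c, depth - 1, not maximizing, evaluate) for c in node]
    return max(values) if maximizing else min(values)

def avg_value(node):                      # toy evaluation: average of subtree values
    if isinstance(node, (int, float)):
        return node
    vals = [avg_value(c) for c in node]
    return sum(vals) / len(vals)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
# 1-ply lookahead returns ~7.67, overestimating the true minimax value of 3:
# a concrete reminder that the optimality guarantee is gone.
print(depth_limited(tree, 1, True, avg_value))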

Why Pacman Starves
§A danger of replanning agents!
§ He knows his score will go up by eating the dot now (west, east)
§ He knows his score will go up just as much by eating the dot later (east, west)
§ There are no point-scoring opportunities after eating the dot (within the horizon, two here)
§ Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

Evaluation Functions

Evaluation Functions
§ Evaluation functions score non-terminals in depth-limited search
§ Ideal function: returns the actual minimax value of the position
§ In practice: typically a weighted linear sum of features:
  § Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
  § e.g. f1(s) = (num white queens – num black queens), etc.
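In code, such a weighted linear sum is just a dot product of feature values. The features, weights, and state representation below are invented for illustration (classic material counts), not tuned values from any real program.

# Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
def evaluate(state, features, weights):
    return sum(w * f(state) for f, w in zip(features, weights))

# Hypothetical chess-ish features over a dict of piece counts.
features = [
    lambda s: s["white_queens"] - s["black_queens"],
    lambda s: s["white_pawns"] - s["black_pawns"],
]
weights = [9.0, 1.0]   # illustrative material values

state = {"white_queens": 1, "black_queens": 0, "white_pawns": 6, "black_pawns": 8}
print(evaluate(state, features, weights))   # 9*1 + 1*(-2) = 7.0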

Evaluation for Pacman

Depth Matters
§Evaluation functions are always imperfect
§The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
§An important example of the tradeoff between complexity of features and complexity of computation
[Demo: depth limited (L6D4, L6D5)]

Synergies between Evaluation Function and Alpha-Beta?
§ Alpha-Beta: amount of pruning depends on expansion ordering
  § Evaluation function can provide guidance to expand most promising nodes first (which later makes it more likely there is already a good alternative on the path to the root)
  § (somewhat similar to the role of the A* heuristic, or filtering in CSPs)
§ Alpha-Beta: (similar with the roles of min and max swapped)
  § Value at a MIN node will only keep going down
  § Once the value of a MIN node is lower than a better option for MAX along the path to the root, we can prune
  § Hence: IF the evaluation function provides an upper bound on the value at a MIN node, AND that upper bound is already lower than a better option for MAX along the path to the root, THEN we can prune without expanding; a sketch follows below
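A sketch of that last idea grafted onto the earlier alpha-beta code: before expanding a MIN node, consult an upper-bound function and prune immediately if the bound cannot beat α. The upper_bound argument and the max_leaf bound are illustrative assumptions for the nested-list toy trees; any sound upper bound works.

import math

def alphabeta_with_bounds(node, alpha, beta, maximizing, upper_bound):
    if isinstance(node, (int, float)):
        return node
    # Bound-based pruning at MIN nodes: if even an optimistic estimate of
    # this node's value is <= alpha, MAX would never come here.
    if not maximizing and upper_bound(node) <= alpha:
        return upper_bound(node)
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta_with_bounds(child, alpha, beta, False, upper_bound))
            if v >= beta:
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta_with_bounds(child, alpha, beta, True, upper_bound))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v

def max_leaf(node):   # a sound upper bound: no subtree value exceeds its max leaf
    return node if isinstance(node, (int, float)) else max(max_leaf(c) for c in node)

# The last MIN node's bound (max leaf 2) is already <= alpha = 3,
# so it is pruned without expanding a single child.
tree = [[3, 12, 8], [2, 4, 6], [1, 2]]
print(alphabeta_with_bounds(tree, -math.inf, math.inf, True, max_leaf))  # 3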