Game Playing and Adversarial Search
Chapter 5, Sections 1–5
AIMA Slides © and ; Chapter 5, Sections 1–5 1
♢ Perfect play
♢ Resource limits
♢ α–β pruning
♢ Games of chance
Games vs. search problems
“Unpredictable” opponent ⇒ solution is a contingency plan
Time limits ⇒ unlikely to find goal, must approximate
Plan of attack:
• algorithm for perfect play ( , 1944)
• finite horizon, approximate evaluation (Zuse, 1945; Shannon, 1950; Samuel,
• pruning to reduce costs (McCarthy, 1956)
Types of games
Representing a game as a search problem
We can formally define a strategic two-player game by:
• initial state
• terminal test (i.e. win / lose / draw)
• utility function (i.e. numeric reward for outcome)
chess: +1, 0, -1
poker: cash won or lost
In a zero-sum game with 2 players:
each player’s utility for a state is equal and opposite to the other’s
Perfect play for deterministic, perfect-information games
Idea: choose move to position with highest minimax value
= best achievable payoff against best play
E.g., 2-ply game:
Minimax algorithm
function Minimax-Decision(game) returns an operator
for each op in Operators[game] do
Value[op]←Minimax-Value(Apply(op, game), game)
return the op with the highest Value[op]
function Minimax-Value(state, game) returns a utility value
if Terminal-Test[game](state) then
return Utility[game](state)
else if MAX is to move in state then
return the highest Minimax-Value of Successors(state)
else
return the lowest Minimax-Value of Successors(state)
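The pseudocode above can be sketched in Python. This toy version is my own illustration, not the book’s code: it represents the game tree as nested lists, with leaves holding MAX’s utility.

```python
class TreeGame:
    """Toy game: a state is either a list of successor states or a
    terminal utility (a number, from MAX's point of view)."""
    def is_terminal(self, state):
        return not isinstance(state, list)
    def utility(self, state):
        return state
    def successors(self, state):
        return state

def minimax_value(state, game, maximizing=True):
    """Minimax-Value: highest child value at MAX nodes, lowest at MIN nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax_value(s, game, not maximizing)
              for s in game.successors(state)]
    return max(values) if maximizing else min(values)

def minimax_decision(state, game):
    """Minimax-Decision: pick the move (child index) with the best value for MAX."""
    moves = game.successors(state)
    return max(range(len(moves)),
               key=lambda i: minimax_value(moves[i], game, maximizing=False))

# The classic 2-ply tree: MAX chooses a branch, then MIN picks a leaf
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
game = TreeGame()
print(minimax_value(tree, game), minimax_decision(tree, game))  # 3 0
```

MIN forces the minimum within each branch (3, 2, 2), so MAX’s best move is the first branch, with value 3.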
Properties of minimax
Complete??
Optimal??
Time complexity??
Space complexity??
Properties of minimax
Complete?? Yes, if tree is finite (chess has specific rules for this)
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games
⇒ exact solution completely infeasible
Resource limits
Suppose we have 100 seconds and can explore 10^4 nodes/second
⇒ 10^6 nodes per move
Standard approach:
• cutoff test
e.g., depth limit (perhaps add quiescence search)
• evaluation function
= estimated desirability of position
Evaluation functions
[Figure: two example chess positions — one where White is slightly better, one where Black is winning]
For chess, typically linear weighted sum of features
Eval(s) = w1f1(s) + w2f2(s) + . . . + wnfn(s)
e.g., w1 = 9 with
f1(s) = (number of white queens) – (number of black queens)
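A minimal sketch of such a linear weighted evaluation (the piece letters and weights are the conventional material values, used here as an assumption — the slides only specify the queen weight w1 = 9):

```python
# Eval(s) = w1*f1(s) + ... + wn*fn(s), with material-count features
# fi(s) = (number of white pieces of type i) - (number of black pieces)
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(white_counts, black_counts):
    """white_counts/black_counts map piece letters to counts for each side."""
    return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
               for p, w in WEIGHTS.items())

# White has an extra queen, Black an extra rook: 9 - 5 = +4 for White
print(material_eval({"Q": 2, "R": 1}, {"Q": 1, "R": 2}))  # 4
```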
Digression: Exact values don’t matter
[Figure: the same 2-ply tree with leaf values 1, 2, 2, 4 and, after a monotonic transformation, 1, 20, 20, 400 — MAX makes the same choice in both]
Behaviour is preserved under any monotonic transformation of Eval
Only the order matters:
payoff in deterministic games acts as an ordinal utility function
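This invariance is easy to check directly. The small Python demo below (my own illustration) relabels the leaf values 1, 2, 4 as 1, 20, 400 — an order-preserving mapping — and confirms MAX’s move choice is unchanged:

```python
def minimax_choice(branches):
    """MAX's move in a 2-ply game: pick the branch whose minimum leaf is largest."""
    mins = [min(b) for b in branches]
    return mins.index(max(mins))

original = [[1, 2], [2, 4]]                 # leaf values as on the slide
transform = {1: 1, 2: 20, 4: 400}           # an order-preserving relabeling
relabeled = [[transform[v] for v in b] for b in original]
print(minimax_choice(original), minimax_choice(relabeled))  # same move: 1 1
```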
Cutting off search
Minimax-Cutoff is identical to Minimax-Value except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
b^m = 10^6, b = 35 ⇒ m = 4
4-ply lookahead is a hopeless chess player!
4-ply ≈ human novice
8-ply ≈ typical PC, human master
12-ply ≈ IBM’s Deep Blue, Kasparov
α–β pruning example

[Figure: a sequence of slides stepping through α–β pruning on the 2-ply minimax tree, pruning subtrees as the α and β bounds tighten]
Properties of α–β
Pruning does not affect final result
Good move ordering improves effectiveness of pruning
With “perfect ordering,” time complexity = O(b^(m/2))
⇒ doubles depth of search
⇒ can easily reach depth 8 and play good chess
A simple example of the value of reasoning about which computations are
relevant (a form of metareasoning)
Why is it called α–β?
α is the best value (to max) found so far off the current path
If V is worse than α, max will avoid it ⇒ prune that branch
Define β similarly for min
The α–β algorithm
Basically Minimax + keep track of α, β + prune
function Max-Value(state, game,α,β) returns the minimax value of state
inputs: state, current state in game
game, game description
α, the best score for max along the path to state
β, the best score for min along the path to state
if Cutoff-Test(state) then return Eval(state)
for each s in Successors(state) do
α←Max(α,Min-Value(s, game,α,β))
if α ≥ β then return β
return α
function Min-Value(state, game,α,β) returns the minimax value of state
if Cutoff-Test(state) then return Eval(state)
for each s in Successors(state) do
β←Min(β,Max-Value(s, game,α,β))
if β ≤ α then return α
return β
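A compact Python sketch of the same algorithm (my own illustration, operating directly on nested-list trees rather than a game object, and using Eval = Utility at the leaves):

```python
import math

def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning over an explicit tree:
    leaves are numbers, internal nodes are lists of children."""
    if not isinstance(node, list):        # cutoff/terminal: return Eval(state)
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, False, alpha, beta))
            if alpha >= beta:
                return beta               # prune: MIN will never allow this line
        return alpha
    for child in node:
        beta = min(beta, alphabeta(child, True, alpha, beta))
        if beta <= alpha:
            return alpha                  # prune: MAX already has something better
    return beta

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # 3 — same value as plain minimax, fewer leaves visited
```

On this tree the second branch is cut off after its first leaf (2 ≤ α = 3), and the third after its last; the root value is unchanged.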
Deterministic games in practice
Checkers: Chinook ended the 40-year reign of the human world champion in 1994. It used an endgame database defining perfect play for all
positions involving 8 or fewer pieces on the board, a total of 443,748,401,247
positions.
Chess: Deep Blue defeated the human world champion in a six-game match
in 1997. Deep Blue searches 200 million positions per second, uses a very
sophisticated evaluation function, and undisclosed methods for extending
some lines of search up to 40 ply.
Othello: human champions refuse to compete against computers, who are too good.
Go: human champions refuse to compete against computers, who are too
bad. In go, b > 300, so most programs use random moves initially, along
with pattern knowledge bases to suggest plausible moves.
Nondeterministic games
E.g., in backgammon, the dice rolls determine the legal moves
Simplified example with coin-flipping instead of dice-rolling:
Algorithm for nondeterministic games
Expectiminimax gives perfect play
Just like Minimax, except we must also handle chance nodes:
if state is a chance node then
return the probability-weighted average of ExpectiMinimax-Value over Successors(state)
A version of α–β pruning is possible
but only if the leaf values are bounded. Why??
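A minimal Python sketch of expectiminimax (my own illustration; the tree encoding with tagged tuples is an assumption, not from the slides). The coin-flip example mirrors the simplified game above: a fair chance node sits between MAX’s choice and MIN’s replies.

```python
def expectiminimax(node):
    """node is ('max'|'min'|'chance'|'leaf', payload).
    For chance nodes the payload is a list of (probability, child) pairs."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(c) for c in payload)
    if kind == "min":
        return min(expectiminimax(c) for c in payload)
    # chance node: probability-weighted average of child values
    return sum(p * expectiminimax(c) for p, c in payload)

# MAX chooses between a fair coin flip (MIN replies after the flip)
# and a sure payoff of 1.5
root = ("max", [
    ("chance", [(0.5, ("min", [("leaf", 2), ("leaf", 4)])),
                (0.5, ("min", [("leaf", 0), ("leaf", 6)]))]),
    ("leaf", 1.5),
])
print(expectiminimax(root))  # max(0.5*2 + 0.5*0, 1.5) = 1.5
```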
Nondeterministic games in practice
Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with 1-1 roll)
depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9
As depth increases, probability of reaching a given node shrinks
⇒ value of lookahead is diminished
α–β pruning is much less effective
TD-Gammon uses depth-2 search + very good Eval
≈ world-champion level
Digression: Exact values DO matter
Behaviour is preserved only by positive linear transformation of Eval
Hence Eval should be proportional to the expected payoff
Games illustrate several important points about AI
♢ perfection is unattainable ⇒ must approximate and make trade-offs
♢ uncertainty limits the value of look-ahead
♢ can programs learn for themselves as they play? (stay tuned…)
Examples of skills expected:
♢ Demonstrate operation of game search algorithms
♢ Discuss and evaluate the properties of game search algorithms
♢ Design suitable evaluation functions for a game
♢ Explain how to search in nondeterministic games