FIT3080 – Artificial Intelligence
Adversarial Search Algorithms
www.monash.edu.au
Assignment 1 is out!
Modelling and solving problems using search
4 parts (mix of conceptual and practical tasks)
Check Moodle for the specification
Due date: September 7, 11:55pm (Melbourne)
Announcements
Informed tree/graph search
Expand ‘most promising’ nodes first,
Using evaluation function: f(n) = g(n) + h(n)
What are they
Properties & guarantees
Effectiveness
Heuristics
Bounded suboptimal search
Route planning
Route planning
Coordination problems
This week: games!
Tic-Tac-Toe
Games with an adversary
Man vs. Machine!
This week: games!
Chinook vs. Tinsley (1993/4). Watch Jonathan Schaeffer's wonderful account:
This week: games!
Deep Blue vs. Kasparov (1997)
A short recap from the BBC:
This week: games!
Go: AlphaGo vs. Lee Sedol (2016)
There’s a wonderful documentary about this match:
Types of games
Perfect information (deterministic, fully observable):
Chess, Go, noughts-and-crosses (tic-tac-toe), draughts (checkers), etc.
Imperfect information (stochastic and/or partially observable):
Poker, Texas hold'em (a variant of poker), bridge, backgammon, etc.
n players (e.g., n = 2 for chess)
Sequential (vs. simultaneous)
Zero-sum (vs. non-zero-sum) games
Our focus: 2-player, sequential, perfect-information, zero-sum games
Game Trees
Start state: the initial configuration of the board
Players are MAX and MIN
A position favourable to MAX has a value > 0 (winning: often +∞)
A position favourable to MIN has a value < 0 (winning: often −∞)
Each player chooses the most promising move for them
Each move adds new nodes to the frontier
The set of nodes generated for one player is called a ply
Each node has an associated utility for each player
Leaves of the tree are called terminal nodes
Game Tree (2-player, Deterministic, Turns)
Solving Game Trees
A solution is a game-tree where we can label the root node with an outcome (win, loss, draw, or a utility value)
Each move available to the opponent is a subproblem (another game)
All subproblems need to be solved before the parent node can be considered solved
A strategy for MAX (resp. MIN) is a solution-tree which contains exactly one successor for each MAX (resp. MIN) node.
A winning strategy for MAX (resp. MIN) is a solution-tree in which every terminal node is a WIN for MAX (resp. a LOSS for MIN), or, with numeric utilities, achieves the maximum utility value.
Goal: find a winning strategy for MAX
For all nodes representing a game situation where it is MIN’s move next, show that MAX can win from every position to which MIN might move
For all nodes representing a game situation where it is MAX’s move next, show that MAX can win from at least one position to which MAX might move
Solving Game Trees
Unpredictable opponent:
Must specify a move for every possible opponent reply
Time limits:
not all games can be searched to the end
find a good first move
Games versus Search Problems
If MAX were to choose among tip nodes (or available nodes), she would take the node with the largest value
If MIN were to choose among tip nodes (or available nodes), she would take the node with the smallest value
Choose a move to the position with the highest minimax value: best achievable payoff against best play
E.g., 2-ply game:
Search ideas
Minimax Algorithm
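The minimax pseudocode itself appears as a figure on this slide; as a rough illustration, here is a minimal Python sketch, assuming a hypothetical Game interface with is_terminal(state), utility(state) (terminal utility from MAX's point of view), actions(state) and result(state, action):

```python
# Minimal minimax sketch. Assumes a hypothetical Game interface providing
# is_terminal(state), utility(state) (from MAX's point of view),
# actions(state) and result(state, action).

def minimax_value(game, state, maximizing):
    """Return the backed-up minimax value of `state`."""
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax_value(game, game.result(state, a), not maximizing)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)


def minimax_decision(game, state):
    """MAX picks the move leading to the position with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), maximizing=False))
```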
Minimax Example: Tic-Tac-Toe
Evaluation function:
Eval(s) = (# rows, columns, diagonals available to MAX) − (# rows, columns, diagonals available to MIN)
[Figure: example board positions with their evaluations, e.g., 6 − 5 = 1, 5 − 5 = 0, 4 − 5 = −1, 5 − 4 = 1, 6 − 4 = 2]
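A minimal sketch of this evaluation function, assuming (as an illustration, not from the slides) that the board is a 3×3 list of lists containing 'X' for MAX, 'O' for MIN, or None, and that a line is "available" to a player if the opponent has no piece on it:

```python
# Sketch of the slide's tic-tac-toe evaluation:
# open lines for MAX ('X') minus open lines for MIN ('O').
# Assumes board is a 3x3 list of lists containing 'X', 'O' or None.

LINES = ([[(r, c) for c in range(3)] for r in range(3)] +               # rows
         [[(r, c) for r in range(3)] for c in range(3)] +               # columns
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])  # diagonals

def open_lines(board, player):
    """Count rows/columns/diagonals with no opponent piece on them."""
    opponent = 'O' if player == 'X' else 'X'
    return sum(all(board[r][c] != opponent for r, c in line) for line in LINES)

def evaluate(board):
    """Eval(s) for MAX: on an empty board every line is open to both players, so 8 - 8 = 0."""
    return open_lines(board, 'X') - open_lines(board, 'O')
```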
Properties of Minimax
Complete? Yes (if the tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^m) (depth-first exploration of all paths to depth m)
Space complexity? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games → exact solution completely infeasible
We can improve performance by eliminating symmetric moves
Pruning the Search Space
We can improve performance by eliminating redundant nodes
Pruning the Search Space
α-value of a MAX node: the current largest final backed-up value of its successors
The α-value is a lower bound on the MAX node’s backed-up value
β-value of a MIN node: the current smallest final backed-up value of its successors
The β-value is an upper bound on the MIN node’s backed-up value
Definitions: α and β Values
Rules for discontinuing the search:
α cut-off: search can be discontinued below any MIN node having a β-value ≤ α-value of any of its MAX node ancestors
● The final backed-up value of this MIN node is set to its β-value
β cut-off: search can be discontinued below any MAX node having an α-value ≥ β-value of any of its MIN node ancestors
● The final backed-up value of this MAX node is set to its α-value
α-β Procedure
● All the successors of the start node are given final backed-up values
● The best first move is that which creates the successor with the highest backed-up value
Termination Condition
Minimax Algorithm (again)
The α-β Algorithm (I)
α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN.
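The full α-β pseudocode is shown as a figure in the slides; the following is a minimal Python sketch of the idea, reusing the same hypothetical Game interface as the minimax sketch above. The two cut tests correspond to the α and β cut-off rules given earlier:

```python
import math

def alphabeta(game, state, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of `state` with alpha-beta pruning (same hypothetical Game interface)."""
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        value = -math.inf
        for a in game.actions(state):
            value = max(value, alphabeta(game, game.result(state, a), alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:      # beta cut-off: a MIN ancestor already has a better option
                break
        return value
    else:
        value = math.inf
        for a in game.actions(state):
            value = min(value, alphabeta(game, game.result(state, a), alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:      # alpha cut-off: a MAX ancestor already has a better option
                break
        return value
```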
[Figure: α-β pruning example, showing an α cut below a MIN node with β = 2]
α-β Pruning – Example
α-β Pruning – Large Example
[Figure: a large game tree of alternating MAX and MIN nodes; the backed-up value of the start node is +1, with leaf values ranging from −5 to +5]
[Figure: part of the large example, showing β cuts below MAX nodes with α = +2 and MIN nodes with β values of −3, +3, +5 and +2]
α-β Pruning – Part of Large Example
α-β Pruning – Part of Large Example (continued)
Does the order of the moves matter?
The effectiveness of the αβ algorithm depends on the order in which states are examined
With perfect ordering:
● time complexity = O(b^(m/2)) = O((√b)^m)
i.e., depth of search can be doubled
Adding dynamic ordering schemes brings us close to the theoretical limit
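As a sketch of a simple static ordering scheme (an illustration, not a scheme prescribed by the slides), successors can be sorted by a heuristic evaluation before α-β recurses into them; eval_fn is a hypothetical static evaluation function:

```python
def ordered_actions(game, state, eval_fn, maximizing):
    """Order moves so that alpha-beta examines the most promising successors first."""
    return sorted(game.actions(state),
                  key=lambda a: eval_fn(game.result(state, a)),
                  reverse=maximizing)   # best-for-MAX first when maximizing
```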
Move Ordering
Suppose we have 100 secs per move, and we explore 10^4 nodes/sec → 10^6 nodes per move
Still might not reach a terminal node!
Idea: treat the frontier nodes as terminal states
Heuristics for Games
Some ideas for generating these.
From experience:
Material values in chess (1 = pawn, 3 = bishop, ..., 9 = queen)
State features:
f1(s) = (# of white queens) – (# of black queens)
Combinations of features, e.g., a linear weighted sum (see the sketch after this list):
Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
From preprocessing:
End-game databases
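A minimal sketch of such a linear weighted evaluation; the feature names in the usage comment are hypothetical examples, not taken from the slides:

```python
def linear_eval(state, features, weights):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical usage with made-up chess features:
# features = [queen_difference, rook_difference, pawn_difference]
# weights  = [9, 5, 1]
# score = linear_eval(state, features, weights)
```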
Evaluation Functions
Depth-limited search (fixed cost)
Depth-first iterative deepening (anytime)
Beam search (forward pruning)
Don’t search at all! (use table lookups)
How do we search?
Potentially misleading evaluations
Search considerations
Starting from a fixed board position: (state, player)
1) Run AB up to some limit (DFS or DFID)
2) Treat the frontier nodes s’ as terminal states
3) Compute a score using EVAL(s’, p)
4) Back up the values
5) Pick the best move
6) Repeat from the new board position
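Putting the steps together, a minimal sketch of a depth-limited α-β move chooser, assuming the same hypothetical Game interface as before and an eval_fn that scores frontier nodes (the EVAL of the steps above):

```python
import math

def h_alphabeta(game, state, depth, eval_fn,
                alpha=-math.inf, beta=math.inf, maximizing=True):
    """Depth-limited alpha-beta: frontier nodes (depth == 0) are scored with eval_fn."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return eval_fn(state)              # steps 2-3: treat the frontier node as terminal
    best = -math.inf if maximizing else math.inf
    for a in game.actions(state):
        v = h_alphabeta(game, game.result(state, a), depth - 1, eval_fn,
                        alpha, beta, not maximizing)
        if maximizing:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if alpha >= beta:                  # prune
            break
    return best


def choose_move(game, state, depth, eval_fn):
    """Steps 4-5: back the heuristic values up one ply and pick MAX's best move."""
    return max(game.actions(state),
               key=lambda a: h_alphabeta(game, game.result(state, a), depth - 1,
                                         eval_fn, maximizing=False))
```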
Heuristic AB Tree Search
Types of Search
Type 1: wide, but shallow
Type 2: narrow, but deep
Historically, practitioners preferred Type 1 searches.
More recently, Type 2 approaches (and Type 1/2 combinations) have emerged.
Use sampling to obtain estimates of utility at a given state
Starting from a given position (s, p)
“Roll out” the game until a terminal state
A fixed policy determines the moves at each ply (fast!)
Repeat many times to obtain an “average” utility
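A minimal rollout sketch, assuming the same hypothetical Game interface as earlier and a random default policy; the "average" utility is simply the mean terminal utility over many rollouts:

```python
import random

def rollout(game, state):
    """Roll out one game from `state` to a terminal state using a fast fixed (random) policy."""
    while not game.is_terminal(state):
        state = game.result(state, random.choice(game.actions(state)))
    return game.utility(state)

def estimate_utility(game, state, n_rollouts=1000):
    """Average terminal utility over many rollouts: a sampled estimate of the state's value."""
    return sum(rollout(game, state) for _ in range(n_rollouts)) / n_rollouts
```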
Monte Carlo Simulation
Four main ingredients:
1) Selection: pick a node from the frontier
2) Expand: generate successors to extend the frontier
3) Simulate games from the newly added node
4) Propagate the results
Monte Carlo Tree Search
Instead of always maximising average utility, we can bias the choices: exploration vs exploitation
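One standard way to implement this bias is the UCB1 rule used by UCT (a common MCTS variant, named here as a swapped-in standard technique, not something the slides prescribe); a minimal sketch, where Node is a hypothetical structure holding the statistics gathered from simulations:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical MCTS tree node: statistics accumulated from simulations."""
    visits: int = 0
    total_utility: float = 0.0
    children: list = field(default_factory=list)

def ucb1(parent, child, c=math.sqrt(2)):
    """Exploitation (average utility) plus an exploration bonus for rarely visited children."""
    if child.visits == 0:
        return math.inf                    # always try unvisited children first
    exploit = child.total_utility / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def select_child(parent, c=math.sqrt(2)):
    """Selection step biased by UCB1 rather than by average utility alone."""
    return max(parent.children, key=lambda ch: ucb1(parent, ch, c))
```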
Monte Carlo Tree Search
Deterministic Games in Practice
Checkers: Chinook defeated the world champion in an abbreviated game in 1990. It uses αβ search combined with a pre-computed database defining perfect play for 39 trillion endgame positions.
Chess: Deep Blue defeated the human world champion in a six-game match in 1997. Deep Blue searches 30 billion positions per move (200 million per second), normally searching to depth 14, and extending the search up to depth 40 for promising options. Heuristics reduce the effective branching factor (EBF) to about 3.
Othello: In 1997, a computer defeated the world champion 6-0. Humans are no match for computers.
Go: b ~ 19 x 19 = 361, which is too large for αβ. In 2016, AlphaGo, which uses Deep Learning, beat the world champion 4-1.
AlphaGo also used Monte Carlo Tree Search (MCTS)
Summary: Adversarial Search
Games illustrate important points about AI
Perfection is unattainable → must approximate
This forces us to think about which problems are worth thinking about
The basic ideas discussed today generalise:
● >2 players → more plies and more costs
● Stochastic or hidden information → probabilistic reasoning (expectimax)
Propositional logic
Inference
Resolution
Satisfiability
First-order logic
Logic and agents
Next week – Logical Agents