COMP3620 / 6320 – S1 2022
This lecture will be recorded
Turn off your camera if you do not want to be in the recording
If you cannot see my whole slide, go to View Options -> Zoom Ratio and select Fit to Window
If you want to ask a question, either:
• Raise your hand and I will open your mic, or
• Type the question in the chat box
Outline for this Week
• Finish A*
• Generalizing Tree Search to Graph Search
• How to construct (Admissible) Heuristics
• Adversarial Search
Recap: A* Search
• Idea: avoid expanding paths that are already expensive
• Evaluation function f(n) = g(n) + h(n)
– g(n) = cost so far to reach n
– h(n) = estimated cost from n to the closest goal
– f(n) = estimated total cost of the path through n to a goal
• Admissible heuristic:
– ∀n, h(n) ≤ h∗(n), where h∗(n) is the true cost from n to the closest goal
– Also require h(n) ≥ 0, so h(G) = 0 for any goal G
• Theorem: if h is admissible, A∗ search finds the optimal solution
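As a concrete reference, here is a minimal sketch of A* tree search with a priority queue ordered by f(n) = g(n) + h(n). The problem interface (start, goal_test, successors, h) is an illustrative assumption, not code from the course.

```python
import heapq
import itertools

def a_star_tree_search(start, goal_test, successors, h):
    """A* tree search: expand nodes in order of f(n) = g(n) + h(n).

    successors(state) yields (action, next_state, step_cost) triples.
    With an admissible h, the first goal popped off the frontier is optimal.
    """
    counter = itertools.count()   # tie-breaker so the heap never compares states
    frontier = [(h(start), next(counter), start, 0, [start])]  # (f, tie, state, g, path)
    while frontier:
        f, _, state, g, path = heapq.heappop(frontier)
        if goal_test(state):      # goal test on expansion, not on generation
            return path, g
        for action, nxt, cost in successors(state):
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h(nxt), next(counter), nxt, g2, path + [nxt]))
    return None
```

Note that the goal test is applied when a node is popped (expanded), not when it is generated; testing at generation time would break optimality.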
Consistency
• A heuristic is consistent if h(n) ≤ c(n, a, n′) + h(n′) for every node n and every successor n′ of n reached by an action a
• If h is consistent, then
– h is admissible, and
– f(n) is nondecreasing along any path:
f(n′) = g(n′) + h(n′) = g(n) + c(n, a, n′) + h(n′) ≥ g(n) + h(n) = f(n)
➢ Consequently, when expanding a node we cannot generate a successor with a smaller f, so the f-value of the best node on the frontier never decreases.
• When we dequeue a node labelled with a new state, we have found the optimal path to that state:
➢ any other node m labelled with the same state satisfies f(n) ≤ f(m) and h(n) = h(m), hence g(n) ≤ g(m).
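To make the definition concrete, here is a small sketch that checks consistency of a heuristic on an explicit edge list (the representation and the example graph are assumptions for illustration):

```python
def is_consistent(h, edges, goals):
    """Check h(n) <= c(n, a, n') + h(n') on every edge, and h(G) = 0 for goals.

    edges: iterable of (n, n_prime, step_cost) triples; h: dict or callable.
    """
    hv = h if callable(h) else h.__getitem__
    if any(hv(g) != 0 for g in goals):
        return False
    return all(hv(n) <= cost + hv(n2) for n, n2, cost in edges)

# Tiny example: h is consistent on this graph
edges = [("S", "A", 1), ("A", "G", 2), ("S", "G", 4)]
h = {"S": 3, "A": 2, "G": 0}
print(is_consistent(h, edges, goals={"G"}))   # True
```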
Optimality of A* (based on consistency)
• Consistency: A* expands nodes in order of increasing f value
• Gradually expands “f-contours” of nodes
• Contour i has all nodes with f ≤ fi, where fi < fi+1
• (breadth-first expands layers, uniform-cost expands g-contours)
Properties of A*
• A* expands
– all nodes with f(n) < C* (C* is the optimal solution cost)
– some nodes with f(n) = C* (but only one goal node)
– no nodes with f(n) > C*
• Complete? Yes, unless there are infinitely many nodes with f ≤ C*
• Time? Exponential in [relative error in h × length of solution]
• Space? Exponential
• Optimal? Yes! Cannot expand fi+1 until fi is finished
IDA* is a version of iterative deepening with a cutoff on f
Tree Search and Repeated States
• For many problems, the state space is a graph rather than a tree
• Cycles can prevent termination
• Failure to detect repeated states can turn a linear problem into an exponential one!
Graph search (simple)
• At most one instance of each state in explored ∪ frontier
• All expanded nodes kept in memory!
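A minimal sketch of this scheme (shown here with a FIFO frontier, but any frontier discipline fits); the `seen` set plays the role of explored ∪ frontier:

```python
from collections import deque

def graph_search_simple(start, goal_test, successors):
    """Simple graph search: each state enters the frontier at most once.

    Complete on finite graphs, but not optimal for arbitrary step costs,
    because the first path found to a state is the only one kept.
    """
    frontier = deque([(start, [start])])
    seen = {start}                      # states in explored ∪ frontier
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for action, nxt, cost in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None
```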
Graph search and optimality
• When seeking optimal solutions, multiple paths to the same state may need to be explored and compared to find the optimal
Number inside the nodes represents 𝑔(𝑛), i.e., the path cost
(Figure: a second path to the same state is found, with a better path cost of 5.)
Graph search and optimality
• We may need to keep the better node and update the depths and path- costs of the descendants of the worst one.
Number inside the nodes represents 𝑔(𝑛), i.e., the path cost
Graph search and optimality
• Trick to avoid updating descendants: re-open the explored node; its descendants will be updated when it is re-expanded.
Number inside the nodes represents 𝑔(𝑛), i.e., the path cost
(Figure: the explored node is re-opened with the better path cost of 5.)
Graph Search (optimal)
Same as before
• The internal block (else if) is needed with A*
– Uniform cost: it also needs this block (unless all step costs are equal)
• If h is consistent (not just admissible), no re-opening (last 2 lines) is needed.
– Uniform cost: no reopening is needed because h = 0 is consistent
(Annotation on the pseudocode: “Found a cheaper path to m”.)
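Since the slide’s pseudocode is not reproduced in this text, here is a sketch of A* graph search with re-opening, using the same illustrative problem interface as the tree-search sketch above. The branch that improves best_g corresponds to “found a cheaper path”; re-pushing an already expanded state is the re-opening trick.

```python
import heapq
import itertools

def a_star_graph_search(start, goal_test, successors, h):
    """A* graph search with re-opening (optimal for admissible h).

    best_g[s] is the cheapest cost found so far to reach s.  When a cheaper
    path to an already expanded state is discovered, the state is simply
    pushed back onto the frontier instead of updating its descendants.
    """
    counter = itertools.count()
    best_g = {start: 0}
    frontier = [(h(start), next(counter), start, 0, [start])]
    while frontier:
        f, _, state, g, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                    # stale entry: a cheaper path was found meanwhile
        if goal_test(state):
            return path, g
        for action, nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):   # new state, or a cheaper path to it
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), next(counter), nxt, g2, path + [nxt]))
    return None
```

With a consistent h the improvement test never fires for an already expanded state, so no re-opening happens, matching the remark above.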
Which algorithm and strategy to use
• Depth-first search: useful when many solutions exist
– tree-search for finite acyclic graphs
– recursive algorithm for finite acyclic graphs
– add cycle detection for finite graphs
• Breadth-first search: useful when shallow solutions exist
– tree search
– graph search (simple) may improve performance
• Uniform-cost search (priority queue ordered by g): useful when a good admissible heuristic is lacking
– tree search for trees
– graph-search (simple) for equal step costs
– graph-search (optimal) for arbitrary step costs (no reopening needed)
• A* search (priority queue ordered by f): useful when good admissible heuristics exist
– tree search for trees and admissible heuristics
– graph-search (optimal) for admissible heuristics (no reopening needed for consistent heuristics)
• Greedy search (priority queue ordered by h): useful when good (inadmissible) heuristics exist; solution quality is arbitrary but maybe good
– tree search for finite acyclic graphs
– graph search (simple) for finite graphs
Admissible heuristics – Example
(Figure: 8-puzzle start state and goal state.)
• h1(n) = number of misplaced tiles
– h1(start state above) = 6
• h2(n) = total Manhattan distance
(i.e., no. of squares from desired location of each tile)
– h2(start state above) = 4+0+3+3+1+0+2+1 = 14
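A sketch of both heuristics, assuming a state is a tuple of 9 entries in row-major order with 0 for the blank (this encoding is an assumption, not from the slides):

```python
def h1_misplaced(state, goal):
    """Number of tiles (ignoring the blank) that are not in their goal square."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2_manhattan(state, goal):
    """Sum over tiles of horizontal plus vertical distance to the goal square."""
    goal_pos = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:
            r, c = divmod(i, 3)
            gr, gc = goal_pos[tile]
            total += abs(r - gr) + abs(c - gc)
    return total

# The max of two admissible heuristics is admissible and dominates both
def h_max(state, goal):
    return max(h1_misplaced(state, goal), h2_manhattan(state, goal))
```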
• Given two admissible heuristics ha, hb, if hb(n) ≥ ha(n) for all n then ➢ hb dominates ha and is better for search
– In the 8-puzzle h2 (Manhattan d.) dominates h1 (# misplaced). Typical search costs:
d = 14: IDS = 3,473,941 nodes; A*(h1) = 539 nodes; A*(h2) = 113 nodes
d = 24: IDS ≈ 54,000,000,000 nodes; A*(h1) = 39,135 nodes; A*(h2) = 1,641 nodes
• There is a trade-off between the accuracy of h and the time to compute h
• Given two admissible heuristics ha and hb:
– h(n) = max(ha(n), hb(n)) is also admissible and dominates both ha and hb
– Is ha(n) + hb(n) admissible? No! It can double-count work
𝑑: depth of the shallowest solution
Relaxed Problems
• Admissible heuristics can be derived from the optimal solution cost of a relaxed version of the problem
➢ Key point: the optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem
• Rules of the 8-puzzle:
a tile can move from square A to square B if A is adjacent to B and B is blank; get all tiles in their correct positions.
• If we relax the rules so that a tile can move anywhere – then h1(n) gives the shortest solution
• If we relax the rules so that a tile can move to any adjacent square – then h2(n) gives the shortest solution
Relaxed Problems
Rules of the 8-puzzle: a tile can move from square A to square B if A is adjacent to B and B is blank; get all tiles in their correct positions.
• Relaxing the rules so that only some tiles need to get in their correct positions and solving the relaxed problem optimally yields another admissible heuristic
(Figure: 8-puzzle start state and goal state.)
– Heuristics derived from the cost of an optimal solution to a smaller subproblem are used in pattern databases to store solutions for every possible subproblem up to a given size.
Relaxed Problems
• Well-known example: travelling salesperson problem (TSP), i.e., find the least-
cost tour visiting all cities exactly once
• A minimum spanning tree (MST) of the cities can be computed in O(n²)
– and the sum of the edge costs of the MST is a lower bound on the cost of the optimal (open) tour
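A sketch of this lower bound as a heuristic, using Prim’s algorithm on a symmetric cost function (the representation and names are illustrative):

```python
def mst_cost(cities, dist):
    """Total edge cost of a minimum spanning tree over `cities` (Prim's algorithm).

    dist(a, b) must be a symmetric, non-negative cost function.
    This simple array-based version runs in O(n^2).
    """
    cities = list(cities)
    if len(cities) <= 1:
        return 0.0
    # best[c] = cheapest edge connecting c to the tree built so far
    best = {c: dist(cities[0], c) for c in cities[1:]}
    total = 0.0
    while best:
        c = min(best, key=best.get)     # attach the cheapest remaining city
        total += best.pop(c)
        for other in best:
            best[other] = min(best[other], dist(c, other))
    return total

# Admissible heuristic for TSP search: any tour that still has to visit the
# remaining cities must pay at least the cost of an MST over those cities.
def h_mst(remaining_cities, dist):
    return mst_cost(remaining_cities, dist)
```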
Announcements
• No class on Monday! (Canberra Day)
– No make-up class needed
• First Tutorials and Labs next week
• Quiz:
– during your tutorial
– After the tutorial, your quiz password will stop working
– open book!
– If you have an in-person tutorial: you must bring a device (e.g., laptop or tablet) to do the quiz. A smartphone is OK
• A1 due in 3 weeks
– No labs in the week A1 is due
Adversarial search (game playing)
• Perfect play
– minimax decisions
– α–β pruning
• Imperfect decisions in real time
Adversarial Search Problems (Games)
• Arise in competitive multi-agent environments
• In Game Theory, a multi-agent environment is called a game
• In AI, a game is often a deterministic, turn-taking, two-player, zero-sum game of perfect information:
– deterministic
– two agents
– whose actions alternate
– utility values are opposite, e.g. (+1,-1)
– fully observable
• We will write algorithms that play such games against an opponent.
Games Definition
A Game consists of:
• sets of players P, states S (board and player to play), and moves M
• an initial state s0 ∈ S which specifies how the game is set up
• Player(s) ∈ P : defines the player to move in state s
• Moves(s) ∈ 2^M: defines the set of legal moves in state s
• Result(s,m) ∈ S: defines the result of performing move m in state s
• Terminal(s) ∈ {T,F}: the terminal test says whether the game is over
• Utility(s,p) ∈ R: the utility function gives a numeric value to terminal states from the point of view of a given player, e.g.:
– {+1, −1, 0} for chess
– {−192, . . . , 192} for backgammon
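As a sketch, this formal definition maps onto a small abstract interface (the class below is illustrative; the method names mirror the slide’s functions). The tic-tac-toe sketch a little further down implements the same interface.

```python
from abc import ABC, abstractmethod

class Game(ABC):
    """Abstract deterministic, turn-taking, two-player, zero-sum game."""

    initial_state = None                # s0: how the game is set up

    @abstractmethod
    def player(self, s):
        """The player whose turn it is to move in state s."""

    @abstractmethod
    def moves(self, s):
        """The legal moves in state s."""

    @abstractmethod
    def result(self, s, m):
        """The state reached by performing move m in state s."""

    @abstractmethod
    def terminal(self, s):
        """True iff the game is over in state s."""

    @abstractmethod
    def utility(self, s, p):
        """Numeric value of terminal state s from player p's point of view."""
```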
• “Unpredictable” opponent
⇒ solution for a player is not a sequence of actions but a strategy
• A strategy for a player: a function mapping the player’s states to (legal) moves.
• A winning strategy always leads the player to a win from s0
Example: Tic-tac-toe
• States? content of each cell {X, O, empty}, and the player to play
• Moves? an empty cell
• Result? content of chosen cell is X or O depending on the player playing; next player
• Terminal test? are 3 Os or 3 Xs aligned or is the board full?
• Utility function? for a given player: +1 if the player has aligned 3 tokens, −1 if the opponent has, and 0 otherwise. Both players have a strategy that guarantees at least a draw.
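A compact sketch of this formalisation with the interface above; encoding a state as a (board, player-to-move) pair is an assumption for illustration:

```python
class TicTacToe:
    """Tic-tac-toe with the player/moves/result/terminal/utility interface."""

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
             (0, 4, 8), (2, 4, 6)]                 # diagonals

    initial_state = ((" ",) * 9, "X")              # empty board, X moves first

    def player(self, s):
        return s[1]

    def moves(self, s):
        return [i for i, c in enumerate(s[0]) if c == " "]

    def result(self, s, m):
        board, p = s
        return board[:m] + (p,) + board[m + 1:], ("O" if p == "X" else "X")

    def _winner(self, board):
        for a, b, c in self.LINES:
            if board[a] != " " and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def terminal(self, s):
        return self._winner(s[0]) is not None or " " not in s[0]

    def utility(self, s, p):
        w = self._winner(s[0])
        return 0 if w is None else (+1 if w == p else -1)
```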
Game tree (2-player, deterministic, turns)
The game tree is defined by moves, result and terminal
Max is the player using the algorithm to compute his strategy
Max is always at the root of the tree
• Perfect play for deterministic, two-player, zero-sum, perfect-information games
• Idea: choose move to position with highest minimax value
➢best achievable utility against best possible opponent
E.g. 2-ply game:
Computing the Minimax value:
• Apply utility function to each leaf of the game tree
• Back-up values from the leaves through inner nodes up to the root:
– min node: compute the min of its children values
– max node: compute the max of its children values
• At the root: choose the move leading to the child of highest value
• Better method: use a depth-first like approach to save space
Minimax Algorithm
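The slide’s pseudocode is not reproduced in this text; here is a minimal Python sketch of depth-first minimax over any game object exposing the player/moves/result/terminal/utility interface sketched earlier (the function names are assumptions):

```python
def minimax_value(game, s, max_player):
    """Depth-first computation of the minimax value of state s for max_player."""
    if game.terminal(s):
        return game.utility(s, max_player)
    values = [minimax_value(game, game.result(s, m), max_player) for m in game.moves(s)]
    return max(values) if game.player(s) == max_player else min(values)

def minimax_decision(game, s):
    """The move with the highest minimax value for the player to move in s."""
    me = game.player(s)
    return max(game.moves(s), key=lambda m: minimax_value(game, game.result(s, m), me))

# e.g. on the tic-tac-toe sketch above:
#   game = TicTacToe()
#   best_opening_move = minimax_decision(game, game.initial_state)
```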
Properties of Minimax
• Complete? Yes, if tree is finite
• Optimal? Yes, against an optimal opponent. Otherwise?
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
– For chess, b ≈ 35, m ≈ 100 for “reasonable” games
⇒ exact solution completely infeasible
– But do we need to explore every path?
𝑏: max. branching factor
𝑑: depth of the shallowest solution 𝑚: max. depth of the state space
Do we need to explore every path?
(Figure: two-ply game tree with leaf values 3, 12, 8, 2, 4, 6, 14, 2, 5.)
α–β pruning
• α is the best value (to max, i.e. highest) found so far
• If v is not better (greater) than α, max will avoid it
➢ prune that branch
• Define β similarly for min
α–β pruning
• β is the best value (to min, i.e. lowest) found so far
• If v is not worse (lower) than β, min will avoid it
➢ prune that branch
α–β pruning
– α is the best (largest) value found by max on path to current node
– β is the best (lowest) value found by min on path to current node
– Branches whose value is known to fall outside the interval [α, β] can be pruned
• Algorithm:
– Node passes its current values for α and β to its children in turn
– Child passes back up its value to Node
– Node updates its current value v (max or min with child’s value)
– Node checks whether v ≤ α (for a min node) or v ≥ β (for a max node)
• If so, the child’s remaining siblings can be pruned and v is returned to Parent
• Otherwise β (min node) or α (max node) is updated
The α–β algorithm
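The pseudocode itself did not survive extraction; here is a sketch over the same assumed game interface, mirroring the description above:

```python
import math

def ab_value(game, s, max_player, alpha, beta):
    """Minimax value of s, pruning branches that cannot affect the root decision."""
    if game.terminal(s):
        return game.utility(s, max_player)
    if game.player(s) == max_player:                  # max node
        v = -math.inf
        for m in game.moves(s):
            v = max(v, ab_value(game, game.result(s, m), max_player, alpha, beta))
            if v >= beta:
                return v                              # beta cut: the min ancestor will avoid this node
            alpha = max(alpha, v)
        return v
    else:                                             # min node
        v = math.inf
        for m in game.moves(s):
            v = min(v, ab_value(game, game.result(s, m), max_player, alpha, beta))
            if v <= alpha:
                return v                              # alpha cut: the max ancestor will avoid this node
            beta = min(beta, v)
        return v

def alphabeta_decision(game, s):
    """Same move as minimax_decision, usually after exploring far fewer nodes."""
    me = game.player(s)
    best_m, alpha = None, -math.inf
    for m in game.moves(s):
        v = ab_value(game, game.result(s, m), me, alpha, math.inf)
        if v > alpha:
            best_m, alpha = m, v
    return best_m
```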
α–β pruning example
(Figure: step-by-step pruning on the two-ply tree with leaf values 3, 12, 8, 2, 4, 6, 14, 2, 5. The first MIN node’s interval shrinks from [−∞, +∞] to [−∞, 3] and settles at 3. The second MIN node is abandoned after its first leaf, 2, since 2 ≤ α = 3. The third MIN node’s interval goes from [3, +∞] to [3, 14] and then to [3, 2], so its remaining leaf is pruned. The root MAX value is 3.)
Remarks on and Properties of α–β
• Alpha-Beta pruning is used with depth-first exploration of the game tree (not complete generation!)
• Pruning does not affect the final result:
– value at the root of the tree is the same (but not the values of the inner nodes)
• Good move ordering improves effectiveness of pruning
– Perfect ordering (unachievable): examine the best move first at every node, i.e. the highest-value child first at max nodes and the lowest-value child first at min nodes
– With perfect ordering, the time complexity is asymptotically O(b^(m/2))
– With random ordering we get O(b^(3m/4))
• Unfortunately, 35^50 is still impossible!
𝑏: max. branching factor
𝑚: max. depth of the state space