Adversarial Search
[AIMA G4] Chapter 6
Sections 6.1-6.3
(6.4, 15.1-15.3)
COSC1127/1125
Artificial Intelligence
Semester 2, 2021
Prof.
* Some slides are based on slides from … and those from Fahiem Bacchus.
Wominjeka! Welcome!
I acknowledge, and invite you all to acknowledge, the Traditional Owners (Wurundjeri people of the Kulin Nations) of the land on which we will conduct this session for AI21. I recognise their continuing connection to land, waters and culture, as well as the challenges and injustices that continue up to our times, unfortunately.
IMPORTANT DISCLAIMER
This pack of slides is NOT core study material. The core & main material is the book or any specific resources given for the week.
The slides in this presentation were only meant to support the instructor during the lectorial.
This pack of slides does not represent the complete material of the week, but only a few bits here and there around what was discussed in the lectorials. Most of the week's content will NOT show up in this pack of slides.
Please refer to the book (and any resources specifically given) for comprehensive coverage of the week's material.
Looking forward to seeing you f2f soon!
Overview
Book-keeping info & news.
Questions?
A* and heuristics.
Admissibility & consistency.
Mini-max for multi-agent search.
DFS mini-max.
Alpha-beta pruning.
How did you spend the 6-8 self-study hours on this course last week?
Read the book? Watched the videos? Completed tute 2?
Worked on project 1?
Additional drop-in lab
Additional drop-in lab tomorrow Wednesday 2pm-3pm.
Book here!
If you book:
Make 100% sure you will use it.
Make a PRIVATE post beforehand to tell us what you want to chat about.
Include the link to your repo.
Remember Monday 4pm-5pm Consultation Time
Posts on Project 1 – Search
On Q7 timeout and borrowing heuristics from others – post #66 and #65
On early test goal in DFS/BFS – post #58
On modifying other files – post #53
Many others under “Assignments – P1” tag
On the topic of forum posts…
So far it has been REALLY good, professional and insightful, thanks!
But remember:
Mark as RESOLVED when your question has been addressed successfully. Don't just leave it open.
Make posts self-contained:
add screenshots, links, exact sections in book, etc.
Avoid open-ended questions that show you have not made enough effort: "how does X work?" or "what is X?"
Overall, help others to help you!
Forum behaviour
We encourage exchange of:
conceptual ideas (e.g., the explored set contains…);
discussion on pseudo-code (e.g., line 4 in the book);
Python programming tricks, information about Pacman infrastructure.
But avoid:
Showing any code at all, unless it is a pure Python issue.
Anything that starts with “I did X and Y…”
Questions like: “How do I do X or Y?”
Abstract questions/requests: “Help me with X”
Common pitfalls
“I didn’t copy & paste, I just used it as inspiration”
“I used that code, but made sure I understand it”
“I was stuck and thought looking at a solution would help”
“I told her/him that she cannot use it”
“I always use Google in my programming, it’s great!”
“I am not a fan of re-inventing the wheel…”
AI’21 Honors Code
Your solutions must be your own work.
The code representing your actual solution to the problem should be your code – check here.
You may only reuse non-essential code, provided you understand it and acknowledge it clearly.
Do not Google solutions: do it yourself!
CODEQUIRY knows them all! Not worth the risk…
You may not share your code with, or show it to, anyone.
You may not engage in any other activities that will dishonestly improve your results or dishonestly improve or damage the results of others.
*** DO NOT LET US DOWN ***
Other news
Tute 2 solutions are in EdStem.
Tute 3 sheet for this week available in EdStem.
Much shorter than 1!
Project 0 marking on the way…
Sorry, had to swap to Project 1 work for the past days (including the weekend!) to support those students who went a bit beyond with optimised DFS/BFS: early goal checking. Well done there!
You will get an email soon…
Project 1 deadline end of this week: Sunday 11:59pm.
Project 0 lessons..
To make “cheap” errors now and avoid them later for the “real” projects!
No tag “submission” found: check about git tagging
Capitalization: “Submission” != “submission”.
Branches are not tags.
A commit message “submission” is not a tag.
Wrong tag (e.g., tagging just the initial commit).
Check with git log or GitHub web interface.
Did not run the autograder!
It is there for your benefit, use it!
No certification = No marking = Zero marks
Broken code:
Wrong Python, code indented wrongly, edited wrong file
Teaching staff will not debug or repair your submission. Wrong submissions will attract no marks and will not warrant re-submission.
post #33
Questions?
What is search?
Search is an exploration of possibilities.
Particularly useful when a 'direct' method/algorithm is not known.
One of the first topics studied in AI:
Newell and Simon (1961) General Problem Solver.
Central component to many AI systems:
Automated reasoning, theorem proving, navigation, VLSI layout, scheduling, game playing, optimisation, …
Search Strategies
Depth-first: LIFO
Breadth-first: FIFO, g(n) = depth(n)
Uniform cost: ordered by g(n)
Depth-limited: depth bound
Iterative deepening: LIFO + depth bound
Greedy best-first: ordered by h(n)
A*: ordered by g(n) + h(n)
Admissible Heuristics & A* Optimality
h is admissible if it "never overestimates the cost".
That is, heuristic h is "optimistic"!
Formally: h(n) ≤ h*(n), for all nodes n.
Theorem: If the heuristic is admissible, then A* with tree-search is optimal.
This implies that A* approaches the optimal cost "from below": the heuristic is a lower bound.
Every node with f(n) < C* will be expanded (C* is the cost of the optimal path to the goal).
Can you prove it formally?
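One possible proof sketch (tree-search case, notation as above): suppose A* returns a suboptimal goal G2, so g(G2) > C*. At that moment some node n on an optimal path is still on the frontier, and by admissibility
f(n) = g(n) + h(n) ≤ g(n) + h*(n) = C* < g(G2) = f(G2),
so A* must expand n (and eventually the optimal goal) before it can return G2: contradiction.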
Dominant Heuristics
h2 is more informed than h1 if h1(n) ≤ h2(n) ≤ h*(n) for all nodes n
Why is h2 better than h1?
Heuristics allow us to prune the search space:
if h2(n) ≥ h1(n) for all n, then A* with h2 will expand only a subset of the nodes expanded with h1
Hence, it is always better to use a heuristic function with higher values, provided it does not overestimate and that its computation time is not too large.
Optimality of A* on graph-search
For searching graphs we may want to ask for more than admissibility:
Consistency (monotonicity):
h(n) ≤ cost(n, n′) + h(n′), for every node n and each successor n′ of n
Almost any admissible heuristic function will also be consistent.
Theorem: If the heuristic is consistent, then A* graph-search is optimal.
Can you prove it formally?
Proof idea: If h is consistent, once a node is expanded, the cost by which it was reached is the lowest possible!
Admissibility vs Consistency
Admissible:
h(n) ≤ h*(n), for all nodes n
Consistent (monotonicity):
h(n) ≤ cost(n, n′) + h(n′), for every node n and each successor n′ of n
Consistency implies admissibility. Prove that it is TRUE!
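To make the two definitions concrete, here is a small self-contained Python check on a toy graph (the graph, the heuristic and all names are invented for illustration): it computes the true cost-to-goal h*(n) with Dijkstra and then tests both conditions.

import heapq

# Toy weighted graph: node -> [(neighbour, step cost)]; the goal is 'G'.
graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('G', 5)], 'C': [('G', 1)], 'G': []}
h = {'A': 3, 'B': 3, 'C': 1, 'G': 0}  # candidate heuristic

def h_star(graph, goal):
    # True cost-to-goal h*(n): Dijkstra from the goal over reversed edges.
    rev = {n: [] for n in graph}
    for n, edges in graph.items():
        for m, c in edges:
            rev[m].append((n, c))
    dist, pq = {goal: 0}, [(0, goal)]
    while pq:
        d, n = heapq.heappop(pq)
        for m, c in rev[n]:
            if d + c < dist.get(m, float('inf')):
                dist[m] = d + c
                heapq.heappush(pq, (d + c, m))
    return dist

hs = h_star(graph, 'G')
admissible = all(h[n] <= hs.get(n, float('inf')) for n in graph)
consistent = all(h[n] <= c + h[m] for n in graph for m, c in graph[n])
print(admissible, consistent)  # -> True True for this h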
How well do you know IDS and A*?
Code 8974 0091 @ menti.com
Cannot tackle these yet…
Chance
Hidden states
Infinite states
Games against an adversary
All of the above
Generalizing Search...
So far: assumed agent has complete control of environment
State does not change unless the agent changes it.
All we need to compute is a single path to a goal state.
Assumption not always reasonable!
Stochastic environment (e.g., the weather, traffic accidents).
Other agents whose interests conflict with yours.
Generalize search to handle state changes that are not in the control of the agent.
Adversarial/Game Search
There are two (or more) agents making changes to the world (the state)
Each agent has their own interests
e.g., each agent has a different goal; or assigns different costs to different paths/states
Each agent tries to alter the world so as to best benefit itself.
How you should play depends on how you think the other person will play; but how they play depends on how they think you will play; so how you should play depends on how you think they think you will play; but how they play should depend on how they think you think they think you will play; …
Games considered here
Zero-sum games: Fully competitive
If one player wins, the others lose (e.g., Poker): you win what the other players lose.
Deterministic: no chance involved
No dice, or random deals of cards, or coin flips, etc.
Perfect information
All aspects of the state are fully observable, e.g., no hidden cards.
Minimax Game Search
Two players: MAX and MIN
Set of positions P (states of the game)
Starting position s0 (where the game begins)
Transition model Result(s,a): next game position
Set of terminal positions T (where the game can end)
Utility U(s): utility of position s for player MAX
Utility at a terminal in chess is typically 1, 0, or 0.5.
Utility at a terminal in tic-tac-toe is 1, 0, or -1.
Turn function Player(s): who moves in state s
Initial (root) state is a MAX node
Children of MAX state are MIN states
Children of MIN state are MAX states
Why don’t we need a utility function for MIN?
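As a sketch, these components map onto a small Python interface (the names are illustrative only, not the Pacman or AIMA codebase):

MAX, MIN = 'MAX', 'MIN'

class Game:
    def initial_state(self): ...   # s0: where the game begins
    def player(self, s): ...       # Player(s): whose turn it is, MAX or MIN
    def actions(self, s): ...      # legal moves in position s
    def result(self, s, a): ...    # Result(s, a): the transition model
    def is_terminal(self, s): ...  # is s in the set T of terminal positions?
    def utility(self, s): ...      # U(s): payoff of terminal s for player MAX

MIN needs no separate utility because the game is zero-sum: MIN's payoff at a terminal is just the negation of MAX's.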
Game Tree Search
[Figure: a game tree alternating MAX nodes and MIN nodes, with MAX actions, MIN actions, and terminal positions at the leaves.]
Tree Game Search: Intuitions
Players alternate moves (starting with Max)
Game ends when some terminal state t in T is reached
A game state: a state-player pair
State we're in and whose move it is
Utility function and terminals replace goals
MAX wants to maximize the terminal payoff
MIN wants to minimize the terminal payoff
Think of it as:
MAX gets Utility(t) for terminal node t
MIN gets –Utility(t) for terminal node t
This is why it’s called zero (or constant) sum
Game Tree Search: Utilities
[Figure: a two-ply game tree, MAX at the root and MIN below, with utilities at the terminals.]
Tic-tac-toe Tree Search
[Figure: a tic-tac-toe game tree.]
Tree Game Search: Intuitions
Game tree looks like a search tree
Layers reflect alternating moves between players
Player MAX doesn’t decide where to go alone
After MAX moves to a state, MIN decides which of that state's children to move to
Thus MAX must have a strategy
Must know what to do for each possible move of MIN
One sequence of moves will not suffice: “What to do” will depend on how MIN will play
What is a reasonable strategy?
Minimax Strategy
Assume that the other player will always play their best move.
You always play a move that will minimize the payoff that could be gained by the other player.
By minimizing the other player's payoff, you maximize your own.
If however you know that Min will play poorly in some circumstances, there might be a better strategy than MiniMax (i.e., a strategy that gives you a better payoff).
But in the absence of that knowledge minimax “plays it safe”.
Game Tree Search: Utilities
[Figure: backing up utilities through a two-ply game tree, MAX at the root and MIN below.]
Leaf node t: use Utility(t)
MAX node: maximum of its children's values
MIN node: minimum of its children's values
Tic-tac-toe Tree Search
[Figure: a tic-tac-toe game tree with backed-up minimax values.]
Minimax Strategy
Build full game tree (all leaves are terminals)
Root is start state, edges are possible moves, etc.
Label terminal nodes with utilities
Back values up the tree
Utility(t) is defined for all terminals (part of input)
Utility(n) =
min { Utility(c) : c is a child of n }, if n is a MIN node
max { Utility(c) : c is a child of n }, if n is a MAX node
Minimax Strategy (cont.)
The values labeling each state are the values that MAX will achieve in that state.
MAX plays a move to change the state to the highest valued min child.
MIN plays a move to change the state to the lowest valued max child.
If MIN plays poorly MAX could do better but never worse.
If MAX however knows that MIN will play poorly, there might be a better strategy of play for MAX than Minimax.
MIN plays very badly
Suppose you know that MIN is a horrible player: MIN always plays badly.
[Figure: the same game tree; MIN's poor replies can hand MAX a higher payoff than the minimax value.]
MIN plays very badly (cont.)
Suppose you know that MIN is a horrible player: MIN always plays badly.
[Figure: exploiting MIN's mistakes yields a payoff higher than the minimax value.]
BETTER THAN MINIMAX!
Problem!
Mini-max search tree will generally be HUGE!!
Too large for any domain...
After the break: Pruning the tree
How can we address this to save on space + time?
Depth-first implementation of minimax: save space!
Prune the tree on branches that are useless.
Depth-limited trees + heuristic evaluations.
Minimax is too demanding
Building the entire game tree and backing up values gives each player their strategy, good!
However, the game tree is exponential in size (wrt number of moves).
Too large for almost any interesting domain...
Depth‐First Implementation of Minimax
def dfs_minimax(n, player):
    # Return the minimax utility of state n for MAX, with `player` to move.
    if is_terminal(n):
        return utility(n)            # terminal state's utility
    # Apply player's moves to get the successor states.
    children = successors(n, player)
    if player == MIN:
        return min(dfs_minimax(c, MAX) for c in children)
    else:                            # player is MAX
        return max(dfs_minimax(c, MIN) for c in children)
# (is_terminal, utility and successors are game-specific; see the toy run below.)
Check Minimax search and Alpha-Beta Pruning
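To see it run, here is a self-contained toy instance where the game tree is hard-coded as nested lists and leaves are utilities (all names invented for illustration):

MAX, MIN = 'MAX', 'MIN'

def is_terminal(n):          # in this toy encoding, a leaf is just a number
    return isinstance(n, (int, float))

def utility(n):              # a leaf carries its own utility for MAX
    return n

def successors(n, player):   # the children are simply the list elements
    return n

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # MAX at the root, MIN one ply below
print(dfs_minimax(tree, MAX))               # -> 3

MIN drives each subtree down to its minimum (3, 2, 2) and MAX then takes the best of those: 3.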
Depth‐First Implementation of Minimax
Advantage of DFS implementation: space efficient
Never more than one branch in memory...
What about time?
Well, minimax will expand O(b^d) states, which is both a BEST and WORST case scenario.
We must traverse the entire search tree to evaluate all options.
We can’t be lucky as in regular search and find a path to a goal before searching the entire tree.
Prune the tree!
Not all branches need expanding to make the right decision!
[Figure: a two-ply game tree with leaf utilities 5, 1, 3 and 7.]
Does it matter anymore?
Alpha pruning
[Figure: a MIN node with one child worth 5 and a MAX child whose explored leaves are 1, 3 and 7, with X and Y still unexplored.]
The MAX child will have value 7 or more.
MIN will pick 5 regardless of X and Y.
Alpha cuts: X and Y need not be expanded.
Alpha pruning II
[Figure: the same tree, with X and Y revealed as 12 and 23.]
The MAX child will have value 7 or more.
MIN will pick 5 regardless of 12 or 23.
Alpha cuts
Beta pruning
[Figure: a MAX node with one child worth 5 and a MIN child whose explored leaves are 8, 6 and 2, with X and Y still unexplored.]
The MIN child will have value 2 or less.
MAX will pick 5 regardless of X and Y.
Beta cuts: X and Y need not be expanded.
Beta pruning II
[Figure: the same tree, with X and Y revealed as 1 and 0.]
The MIN child will have value 2 or less.
MAX will pick 5 regardless of X and Y.
Beta cuts
Alpha-beta pruning
Same search as for Minimax, except that:
At every node keep:
alpha = value of the best (i.e., highest) choice found so far at any MAX node.
beta = value of the best (i.e., lowest) choice found so far at any MIN node.
MIN node: stop if current value ≤ alpha
(it will not change the MAX decision above!)
MAX node: stop if current value ≥ beta
(it will not change the MIN decision above!)
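As a sketch in Python, on the same toy nested-list trees as before (illustrative code, not the project's API): alpha and beta are threaded down the recursion, and a node returns early as soon as its value crosses the relevant bound.

import math

MAX, MIN = 'MAX', 'MIN'

def alpha_beta(n, player, alpha=-math.inf, beta=math.inf):
    if isinstance(n, (int, float)):    # leaf: utility for MAX
        return n
    if player == MAX:
        v = -math.inf
        for c in n:
            v = max(v, alpha_beta(c, MIN, alpha, beta))
            if v >= beta:              # cut: the MIN node above will never allow this
                return v
            alpha = max(alpha, v)
    else:                              # MIN node
        v = math.inf
        for c in n:
            v = min(v, alpha_beta(c, MAX, alpha, beta))
            if v <= alpha:             # cut: the MAX node above will never allow this
                return v
            beta = min(beta, v)
    return v

print(alpha_beta([[3, 12, 8], [2, 4, 6], [14, 5, 2]], MAX))  # -> 3, skipping two leaves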
Alpha cuts
[Figure: a MIN node whose best child so far is worth 4; its MAX child's explored leaves are 1, 3 and 7, with X and Y unexplored.]
beta = 4 (best MIN so far)
current best = 1, 3, 7
current best ≥ beta: alpha cut, X and Y are skipped
Beta cuts
[Figure: a MAX node whose best child so far is worth 5; its MIN child's explored leaves are 8, 6 and 2, with X and Y unexplored.]
alpha = 5 (best MAX so far)
current best = 8, 6, 2
current best ≤ alpha: beta cut
Whole subtrees are not expanded!
Alpha-beta pruning: Savings
Quality of pruning depends on order of generation.
Try to generate children in "likely" order.
Can reduce minimax from O(b^d) to O(b^(d/2)).
Assuming perfect ordering: best nodes first.
Searches twice as deep a tree in same time.
Reduces effective branching factor from b to √b.
For chess 6 instead of 35.
Let’s prune...
[Figure: step-by-step alpha-beta pruning on a larger game tree.]
MAX best = 5 ≥ beta = 3
MIN best = 6 ≤ alpha = 8
MIN best = 2 ≤ alpha = 5
Step by Step: Alpha Beta Pruning
Games and Search: recap.
Set up game as a kind of search tree
Use minimax method to choose moves (in depth-first)
Use alpha-beta pruning to cut down alternatives
(Some implementation tricks such as transposition tables as hash tables)
So ... all good?
Practical Matters
Games involve search, but …
Search space can be huge (chess: ~10^40 states!)
All "real" games are too large to enumerate the tree
Depth-10 tree in chess: 2,700,000,000,000,000 nodes
Even alpha-beta pruning won't help here!
Not all moves may be chosen by the opponent.
Time limits may limit search.
Exhaustive search is not how humans do it!
Practical Matters (cont.)
We must limit depth of search tree.
Can’t expand all the way to terminal nodes.
We must make heuristic estimates about the values of the (non-terminal) states at the leaves of the tree.
These heuristics are often called evaluation functions.
Evaluation functions are often learned.
Evaluation Function
The idea goes back to Shannon in 1950 (!!)
Limit depth + estimate utility via a heuristic evaluation function (an "educated" guess)
Then, calculate H-Minimax(s, d):
Eval(s), if Cutoff-Test(s, d)
max_a H-Minimax(Result(s, a), d+1), if Player(s) = MAX
min_a H-Minimax(Result(s, a), d+1), if Player(s) = MIN
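A sketch of that recursion in Python, again on toy nested-list trees (the crude Eval and every name here are invented for illustration):

MAX, MIN = 'MAX', 'MIN'

def h_minimax(n, player, d, limit, eval_fn):
    # Depth-limited minimax: fall back on eval_fn once the cutoff is reached.
    if isinstance(n, (int, float)) or d == limit:  # terminal, or Cutoff-Test(s, d)
        return eval_fn(n)
    other = MIN if player == MAX else MAX
    values = [h_minimax(c, other, d + 1, limit, eval_fn) for c in n]
    return max(values) if player == MAX else min(values)

# Eval must match Utility on terminals; here it just guesses 0 elsewhere.
eval_fn = lambda n: n if isinstance(n, (int, float)) else 0
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, MAX, 0, 1, eval_fn))  # cutoff after 1 ply -> 0 (all guesses)
print(h_minimax(tree, MAX, 0, 2, eval_fn))  # deep enough to reach terminals -> 3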
Evaluation Function (cont.)
So what is a good evaluation function?
Efficient.
Match utility on terminal states
winning better than draws better than losses
Correlate with winning chances on non-terminal states.
Will have some level of uncertainty (as it is an estimate)
Evaluation Function (cont.)
Often a weighted sum of factors:
Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
Assumes the factors are independent.
Common chess function:
9·#Queens + 4·#Bishops + 3·#Knights + #Pawns
(Deep Blue used about 6,000 features in its evaluation function)
Says nothing about position…
Evaluation Function (cont.)
For chess the weights and factors could be:
w1·f1(s) = 9 × (number of white queens - number of black queens)
w2·f2(s) = 4 × (number of white bishops - number of black bishops)
w3·f3(s) = 3 × (number of white knights - number of black knights)
w4·f4(s) = 1 × (number of white pawns - number of black pawns)
… and so on and so forth…
With this simple function we can estimate which player is closer to winning.
Checkers?
Tic-tac-toe?
Your favourite game?
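That weighted material count is a few lines of Python (a sketch: the weights follow the slide, everything else is invented):

# Weights from the slide: queen 9, bishop 4, knight 3, pawn 1.
WEIGHTS = {'Q': 9, 'B': 4, 'N': 3, 'P': 1}

def material_eval(counts):
    # counts maps a piece letter to a (white, black) pair of piece counts.
    total = 0
    for piece, w in WEIGHTS.items():
        white, black = counts.get(piece, (0, 0))
        total += w * (white - black)
    return total

# White is a bishop up, everything else equal:
print(material_eval({'Q': (1, 1), 'B': (2, 1), 'N': (2, 2), 'P': (8, 8)}))  # -> 4

Positive values suggest White (MAX) is ahead; negative values favour Black.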
Horizon effect
How can we deal with horizon effect?
Apply Eval only to quiescent states (not 'I am about to lose my queen!')
Can check for non-quiescent states (i.e., something is under attack) and insist that they are expanded
Deterministic, Perfect Information
              | Perfect information                       | Imperfect information
Deterministic | Chess, draughts (checkers), Go, Othello,  | Kriegspiel, Battleship
              | noughts and crosses (tic-tac-toe), …      |
Stochastic    | Backgammon, Monopoly                      | Bridge, 500, Scrabble
Stochastic Games
Involve some ‘chance’ factor:
Dice.
Deal in a card game.
Tossing a coin.
Backgammon:
Deterministic moves, but which ones are legal is determined by the dice.
Requires chance nodes in the game tree (2 per ply)
Stochastic Games
Calculate Expect-Minimax(s):
Utility(s), if Terminal-Test(s)
max_a Expect-Minimax(Result(s, a)), if Player(s) = MAX
min_a Expect-Minimax(Result(s, a)), if Player(s) = MIN
Σ_r P(r) · Expect-Minimax(Result(s, r)), if Player(s) = CHANCE
Complicates alpha-beta pruning…
Can use Monte Carlo (random!) choices over a large enough sample…
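A sketch of Expect-Minimax in Python, encoding a node as either a utility or a (player, children) pair, where a CHANCE node's children carry probabilities (this encoding is invented for illustration):

MAX, MIN, CHANCE = 'MAX', 'MIN', 'CHANCE'

def expect_minimax(node):
    if isinstance(node, (int, float)):  # terminal: Utility(s)
        return node
    player, children = node
    if player == MAX:
        return max(expect_minimax(c) for c in children)
    if player == MIN:
        return min(expect_minimax(c) for c in children)
    # CHANCE node: probability-weighted average over the outcomes r
    return sum(p * expect_minimax(c) for p, c in children)

# MAX chooses between a sure 3 and a fair coin flip between 10 and -10:
print(expect_minimax((MAX, [3, (CHANCE, [(0.5, 10), (0.5, -10)])])))  # -> 3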
State of the Art
@ Chess
Deep Blue (1997!):
minimax + alpha-beta search + evaluation functions.
lots of specialist processors.
Typically 14-ply (half-moves), sometimes up to 40-ply (!!)
'Opening book' of 4,000 positions.
Database of 700,000 grandmaster games.
Endgame database of all 5-piece checkmates.
A good PC with the right program can match a human world champion…
Alpha-Go: Search + ML
AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play.
Which one?
Conclusion
Game Search
Not just you doing actions in the tree!
Minimax strategy
Assumes optimal players
Reduce exploration:
DFS minimax.
Alpha-beta pruning.
Limit look-ahead.
Evaluation functions.
Expected-Minimax:
Deal with chance
Good luck with project 1, enjoy & learn from it!