
Adversarial Search
[AIMA G4] Chapter 6
Sections 6.1-6.3
(6.4, 15.1-15.3)
COSC1127/1125
Artificial Intelligence
Semester 2, 2021
Prof.
* Some slides are based on slides from … and those from Fahiem Bacchus.


Wominjeka! Welcome!

I acknowledge, and invite you all to acknowledge, the Traditional Owners (Wurundjeri people of the Kulin Nations) of the land on which we will conduct this session for AI21. I recognise their continuing connection to land, waters and culture, as well as the challenges and injustices that continue up to our times, unfortunately.

IMPORTANT DISCLAIMER
This pack of slides is NOT core study material. The core & main material is the book or any specific resources given for the week.
The slides in this presentation were only for supporting the instructor during the lectorial.
This pack of slides does not represent the complete material of the week, but only a few bits here and there around what was discussed in lectorials. Most of the week's content will NOT show up in this pack of slides.
Please refer to the book (and any resources specifically given) for comprehensive coverage of the week's material.

Looking forward to seeing you f2f soon…



Overview
Book-keeping info & news.
Questions?
A* and heuristics.
Admissibility & consistency.
Mini-max for multi-agent search.
DFS mini-max.
Alpha-beta pruning.

How did you spend the 6-8 self-study hours on this course last week?
Read the book? Watched the videos? Completed tute 2?
Worked on project 1?


Additional drop-in lab
Additional drop-in lab tomorrow Wednesday 2pm-3pm.
Book here!
If you book:
Make 100% sure you will use it.
Make a PRIVATE post beforehand to tell us what you want to chat about.
Include the link to your repo.

Remember Monday 4pm-5pm Consultation Time

Posts on Project 1 – Search
On the Q7 timeout and borrowing heuristics from others – posts #66 and #65
On early test goal in DFS/BFS – post #58
On modifying other files – post #53
Many others under “Assignments – P1” tag


On the topic of forum posts…
So far the forum has been REALLY good, professional and insightful, thanks!

But remember:
Mark your question as RESOLVED when it has been addressed successfully. Don’t just leave it open.
Make posts self-contained:
add screenshots, links, exact sections in book, etc.
Avoid open-ended questions that show you have not made enough effort: “how does X work?” or “what is X?”
Overall, help others to help you!

Forum behaviour
We encourage exchange of:
conceptual ideas (e.g., what the explored set contains…);
discussion of pseudo-code (e.g., line 4 in the book);
Python programming tricks, and information about the Pacman infrastructure.

But avoid:
Showing any code at all, unless it is a pure Python issue.
Anything that starts with “I did X and Y…”
Questions like: “How do I do X or Y?”
Abstract questions/requests: “Help me with X”

Common pitfalls
“I didn’t copy & paste, I just used it as inspiration”
“I used that code, but made sure I understand it”
“I was stuck and thought looking at a solution would help”
“I told them that they cannot use it”
“I always use Google in my programming, it’s great!”
“I am not a fan of re-inventing the wheel…”

AI’21 Honors Code
Your solutions must be your own work.
The code representing your actual solution to the problem should be your code – check here.
You may only reuse non-essential code, provided you understand it and acknowledge it clearly.
Do not Google solutions: do it yourself!
CODEQUIRY knows them all! Not worth the risk…
You may not share your code with, or show it to, anyone.
You may not engage in any other activities that will dishonestly improve your results or dishonestly improve or damage the results of others.
*** DO NOT LET US DOWN ***

Other news
Tute 2 solutions are in EdStem.
The tute 3 sheet for this week is available in EdStem.
Much shorter than the previous one!
Project 0 marking is on the way…
Sorry, I had to swap to Project 1 work for the past days (including the weekend!) to support those students who went a bit beyond with optimised DFS/BFS: early goal checking. Well done there!
You will get an email soon…
Project 1 deadline is the end of this week: Sunday 11:59pm.

Project 0 lessons…
Make “cheap” errors now so you avoid them later in the “real” projects!
No tag “submission” found: read up on git tagging.
Capitalization matters: “Submission” != “submission”.
Branches are not tags.
A commit message “submission” is not a tag.
Wrong tag (e.g., tagging just the initial commit).
Check with git log or the GitHub web interface.
Did not run the autograder!
It is there for your benefit, use it!
No certification = no marking = zero marks.
Broken code:
wrong Python version, wrongly indented code, edited the wrong file.
Teaching staff will not debug or repair your submission. Wrong submissions will attract no marks and will not warrant re-submission.

[Poll (see post #33): scale from Strongly agree to Strongly disagree.]

Questions?


What is search?
Search is an exploration of possibilities.
Particularly useful when a ‘direct’ method/algorithm is not known.
One of the first topics studied in AI:
Newell and Simon (1961) General Problem Solver.
Central component of many AI systems:
automated reasoning, theorem proving, navigation, VLSI layout, scheduling, game playing, optimisation, …

Search Strategies
Depth-first: LIFO frontier.
Breadth-first: FIFO frontier; g(n) = depth(n).
Uniform cost: frontier ordered by g(n).
Depth-limited: LIFO frontier with a depth bound.
Iterative deepening: LIFO frontier + iterated depth bound.
Greedy best-first: frontier ordered by h(n).
A*: frontier ordered by f(n) = h(n) + g(n). (See the sketch below.)
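These strategies differ only in how the frontier is ordered. Below is a minimal Python sketch of that idea (toy code: the best_first name, the successors/is_goal callables, and the graph interface are assumptions for illustration, not the Pacman API):

import heapq
from itertools import count

def best_first(start, successors, is_goal, priority):
    # priority(g, n) is the only thing that changes across strategies
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(priority(0, start), next(tie), 0, start, [start])]
    explored = set()
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if state in explored:
            continue
        explored.add(state)
        for nxt, step_cost in successors(state):
            g2 = g + step_cost
            heapq.heappush(frontier,
                           (priority(g2, nxt), next(tie), g2, nxt, path + [nxt]))
    return None

# Uniform cost, greedy best-first, and A* then differ only in the priority
# (h is whatever heuristic is available):
#   ucs    = lambda g, n: g
#   greedy = lambda g, n: h(n)
#   astar  = lambda g, n: g + h(n)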

Admissible Heuristics & A* Optimality

h is admissible if it “never overestimates the cost”.
That is, heuristic h is “optimistic”!
Formally: h(n) <= h*(n), for all nodes n.
Theorem: if the heuristic is admissible, then A* with tree-search is optimal.
This implies that A* search approaches the optimal cost h* “from below”: the heuristic is a lower bound.
Every node with f(n) < C* will be expanded (C* is the cost of the optimal path to the goal).
Can you prove it formally?

Dominant Heuristics
h2 is more informed than h1 if h2(n) >= h1(n) and h2(n) <= h*(n).
Why is h2 better than h1?
Heuristics allow us to prune the search space: if h2(n) >= h1(n), then h2 will cause us to search only a subset of the nodes searched by h1.
Hence, it is always better to use a heuristic function with higher values, provided it does not overestimate and its computation time is not too large.

Optimality of A* on graph-search
For searching graphs we may want to ask for more than admissibility:
Consistency (monotonicity): h(n) <= cost(n,n’) + h(n’), for all nodes n, n’.
Almost any admissible heuristic function will also be consistent.
Theorem: if the heuristic is consistent, then A* graph-search is optimal.
Can you prove it formally?
Proof idea: if h is consistent, then once a node is expanded, the cost by which it was reached is the lowest possible!

Admissibility vs Consistency
Admissible: h(n) <= h*(n), for all nodes n.
Consistent (monotonic): h(n) <= cost(n, n’) + h(n’), for all nodes n, n’.
Prove that consistency implies admissibility! (Both conditions can also be checked mechanically: see the sketch after the chess slide below.)

How well do you know IDS and A*?
Code 8974 0091 @ menti.com

Cannot tackle these yet…
Chance? Hidden states? Infinite states? Games against an adversary? All of the above?

Generalizing Search...
So far we assumed the agent has complete control of the environment:
the state does not change unless the agent changes it;
all we need to compute is a single path to a goal state.
This assumption is not always reasonable!
Stochastic environment (e.g., the weather, traffic accidents).
Other agents whose interests conflict with yours.
Generalize search to handle state changes that are not in the control of the agent.

Adversarial/Game Search
There are two (or more) agents making changes to the world (the state).
Each agent has their own interests:
e.g., each agent has a different goal, or assigns different costs to different paths/states.
Each agent tries to alter the world so as to best benefit itself.
How you should play depends on how you think the other person will play; but how they play depends on how they think you will play; so how you should play depends on how you think they think you will play; but how they play should depend on how they think you think they think you will play; …

Games considered here
Zero-sum games: fully competitive.
If one player wins, the others lose (e.g., Poker): you win what the other players lose.
Deterministic: no chance involved.
No dice, random deals of cards, coin flips, etc.
Perfect information: all aspects of the state are fully observable.
E.g., no hidden cards.

State of the Art @ Chess
Deep Blue (1997!): minimax + alpha-beta search + evaluation functions.
Lots of specialist processors.
Typically 14-ply (moves), sometimes up to 40-ply (!!).
‘Opening book’ of 4,000 positions.
Database of 700,000 grandmaster games.
Endgame database of all 5-piece checkmates.
A good PC with the right program can match a human world champion …
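Admissibility and consistency are both easy to check mechanically on a small explicit graph. A minimal sketch (the toy graph, heuristic values, and true costs-to-goal below are all made up for illustration):

# Toy directed graph with goal G: edge costs, true cost-to-goal h*, and
# a candidate heuristic h (all values assumed for illustration).
cost = {('A', 'B'): 1, ('B', 'G'): 2, ('A', 'G'): 4}
h_star = {'A': 3, 'B': 2, 'G': 0}
h = {'A': 2, 'B': 2, 'G': 0}

# Admissible: h(n) <= h*(n) for every node n.
admissible = all(h[n] <= h_star[n] for n in h)

# Consistent: h(n) <= cost(n, n') + h(n') for every edge (n, n').
consistent = all(h[n] <= c + h[m] for (n, m), c in cost.items())

print(admissible, consistent)  # True True for these numbers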
Minimax Game Search
Two players: MAX and MIN.
Set of positions P (states of the game).
Starting position s0 (where the game begins).
Transition model Result(s,a): next game position.
Set of terminal positions T (where the game can end).
Utility U(s): utility of position s for player MAX.
Utility for terminals in chess is generally 1, 0, or 0.5.
Utility for terminals in tic-tac-toe is 1, 0, or -1.
Turn function Player(s): who moves in state s.
The initial (root) state is a MAX node.
Children of a MAX state are MIN states; children of a MIN state are MAX states.
Why don’t we need a utility function for MIN?

Game Tree Search
[Figure: a game tree alternating MAX nodes and MIN nodes, with MAX actions, MIN actions, and terminal positions at the leaves.]

Tree Game Search: Intuitions
Players alternate moves (starting with MAX).
The game ends when some terminal state t in T is reached.
A game state is a state–player pair: the state we’re in and whose move it is.
Utility function and terminals replace goals:
MAX wants to maximize the terminal payoff; MIN wants to minimize it.
Think of it as: MAX gets Utility(t) for terminal node t, and MIN gets –Utility(t).
This is why it’s called zero (or constant) sum.

Game Tree Search: Utilities
[Figure: a game tree with utilities at the terminals (MAX at the root, MIN below); also a tic-tac-toe search tree.]

Tree Game Search: Intuitions
A game tree looks like a search tree: layers reflect alternating moves between players.
Player MAX doesn’t decide where to go alone: after MAX moves to a state, MIN decides which of that state’s children to move to.
Thus MAX must have a strategy:
MAX must know what to do for each possible move of MIN.
One sequence of moves will not suffice: “what to do” will depend on how MIN plays.
What is a reasonable strategy?

Minimax Strategy
Assume that the other player will always play their best move.
You always play a move that will minimize the payoff that could be gained by the other player.
By minimizing the other player’s payoff, you maximize your own.
If, however, you know that MIN will play poorly in some circumstances, there might be a better strategy than minimax (i.e., a strategy that gives you a better payoff).
But in the absence of that knowledge, minimax “plays it safe”.

Game Tree Search: Utilities
[Figure: the same game tree with values backed up from the terminals.]
Leaf node t: use Utility(t).
MAX node: maximum of children.
MIN node: minimum of children.

Minimax Strategy
Build the full game tree (all leaves are terminals):
the root is the start state, edges are possible moves, etc.
Label terminal nodes with utilities.
Back values up the tree (see the sketch below):
Utility(t) is defined for all terminals (part of the input).
Utility(n) = min {Utility(c) : c is a child of n} if n is a MIN node;
Utility(n) = max {Utility(c) : c is a child of n} if n is a MAX node.

Minimax Strategy (cont.)
The values labeling each state are the values that MAX will achieve in that state.
MAX plays a move to change the state to the highest-valued MIN child.
MIN plays a move to change the state to the lowest-valued MAX child.
If MIN plays poorly, MAX could do better, but never worse.
If, however, MAX knows that MIN will play poorly, there might be a better strategy of play for MAX than minimax.
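Backing values up the tree is only a few lines of code. A minimal Python sketch (the nested-tuple encoding of the game tree, with leaves as MAX utilities, is an assumption for illustration, not the book's code):

def minimax(node, is_max):
    # Leaves carry terminal utilities (for MAX); internal nodes are
    # tuples of children; players alternate by depth.
    if not isinstance(node, tuple):
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# MAX at the root, MIN at depth 1: MIN backs up 1 and 3; MAX picks 3.
print(minimax(((5, 1), (3, 7)), True))  # 3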
MIN plays very badly
[Figure: the backed-up game tree again. Suppose you know that MIN is a horrible player and always plays badly: exploiting that does BETTER THAN MINIMAX!]

Problem!
The minimax search tree will generally be HUGE!!
Too large for any domain...
After the break: pruning the tree.
How can we address this to save on space + time?
Depth-first implementation of minimax: save space!
Prune the tree on branches that are useless.
Depth-limited trees + heuristic evaluations.

Minimax is too demanding
Building the entire game tree and backing up values gives each player their strategy, good!
However, the game tree is exponential in size (w.r.t. the number of moves).
Too large for almost any interesting domain...

Depth-First Implementation of Minimax

DFMiniMax(n, Player)  // return utility of state n for Player
  If n is TERMINAL
    Return Utility(n)  // the terminal state's utility
  // Apply Player's moves to get successor states.
  ChildList = n.Successors(Player)
  If Player == MIN
    return minimum of {DFMiniMax(c, MAX) | c in ChildList}
  Else  // Player is MAX
    return maximum of {DFMiniMax(c, MIN) | c in ChildList}

Check Minimax search and Alpha-Beta Pruning.

Depth-First Implementation of Minimax
Advantage of the DFS implementation: space efficient.
Never more than one branch in memory...
What about time? Minimax will expand O(b^d) states, which is both the BEST and WORST case scenario:
we must traverse the entire search tree to evaluate all options.
We can’t be lucky as in regular search and find a path to a goal before searching the entire tree.

Prune the tree!
Not all branches need expanding to make the right decision!
[Figure: a minimax tree with leaves 5, 1, 3, 7 partly evaluated: does the rest matter anymore?]

Alpha pruning
[Figure: MIN chooses between a branch worth 5 and a MAX branch that, after seeing a 7, will have 7 or more; MIN will pick 5 regardless of the unexplored leaves X and Y. These are alpha cuts.]

Alpha pruning II
[Figure: the same situation with concrete leaves 12 and 23: the MAX branch will have 7 or more, so MIN picks 5 regardless of 12 or 23.]

Beta pruning
[Figure: MAX chooses between a branch worth 5 and a MIN branch that, after seeing a 2, will have 2 or less; MAX will pick 5 regardless of the unexplored leaves X and Y. These are beta cuts.]

Beta pruning II
[Figure: the same situation with concrete leaves 1 and 0: the MIN branch will have 2 or less, so MAX picks 5 regardless.]

Alpha-beta pruning
Same search as for minimax, except that at every node we keep:
alpha = value of the best (i.e., highest) choice found so far at any MAX node;
beta = value of the best (i.e., lowest) choice found so far at any MIN node.
MIN node: stop if the current value <= alpha (it will not change the MAX decision above!).
MAX node: stop if the current value >= beta (it will not change the MIN decision above!).
(A sketch in code follows the alpha-cuts figure below.)

Alpha cuts
[Figure: a MIN node with leaves 1, 3, 7 under a root where beta = 4 (best MIN value so far). The node's current best runs 1, 3, 7; once current best >= beta, the remaining leaves X and Y are cut.]
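A minimal sketch of the algorithm just described, on the same assumed tuple trees as the minimax sketch above (both kinds of cut appear in the worked figures around this point):

def alphabeta(node, is_max, alpha=float('-inf'), beta=float('inf')):
    if not isinstance(node, tuple):  # leaf: terminal utility
        return node
    if is_max:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            if v >= beta:      # beta cut: the MIN node above won't allow this
                return v
            alpha = max(alpha, v)
    else:
        v = float('inf')
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta))
            if v <= alpha:     # alpha cut: the MAX node above won't allow this
                return v
            beta = min(beta, v)
    return v

# Same tree and value as the minimax sketch, but fewer leaves visited.
print(alphabeta(((5, 1), (3, 7)), True))  # 3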

Beta cuts
[Figure: a MAX node with leaves 8, 6, 2 under a root where alpha = 5 (best MAX value so far). The node's current best runs 8, 6, 2; once current best <= alpha, the remaining leaves X and Y are cut.]
Whole subtrees are not expanded!

Alpha-beta pruning: Savings
The quality of pruning depends on the order of generation.
Try to generate children in the “likely” order.
Can reduce minimax from O(b^d) to O(b^(d/2)), assuming perfect ordering: best nodes first.
Searches twice as deep a tree in the same time.
Reduces the effective branching factor from b to √b.
For chess: 6 instead of 35.
(A toy measurement follows below.)

Let’s prune...
[Figure: a worked pruning example over the earlier game tree, with cuts such as MAX best = 5 >= beta = 3, MIN best = 6 <= alpha = 8, and MIN best = 2 <= alpha = 5.]
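The savings are easy to see empirically by counting leaf evaluations. A toy measurement (assuming the alphabeta sketch above; the random tree stands in for a real game):

import random

random.seed(1)

def random_tree(b, d):
    # Leaves carry random utilities; internal nodes are tuples of children.
    return (random.randint(0, 99) if d == 0
            else tuple(random_tree(b, d - 1) for _ in range(b)))

leaves = [0]

def ab(node, is_max, alpha=float('-inf'), beta=float('inf')):
    if not isinstance(node, tuple):
        leaves[0] += 1
        return node
    v = float('-inf') if is_max else float('inf')
    for child in node:
        w = ab(child, not is_max, alpha, beta)
        if is_max:
            v, alpha = max(v, w), max(alpha, w)
            if v >= beta:
                return v
        else:
            v, beta = min(v, w), min(beta, w)
            if v <= alpha:
                return v
    return v

ab(random_tree(4, 6), True)
print(leaves[0])  # typically far fewer than the 4**6 = 4096 leaves minimax visits

With good move ordering (best children first) the count approaches the O(b^(d/2)) bound; with bad ordering the benefit shrinks.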
Step by Step: Alpha-Beta Pruning

Games and Search: recap
Set up the game as a kind of search tree.
Use the minimax method to choose moves (in depth-first fashion).
Use alpha-beta pruning to cut down alternatives.
(Plus some implementation tricks, such as transposition tables as hash tables.)
So ... all good?

Practical Matters
Games involve search, but …
The search space can be huge (chess: 10^40 states!).
All “real” games are too large to enumerate the tree.
A depth-10 tree in chess: 2,700,000,000,000,000 nodes.
Even alpha-beta pruning won’t help here!
Not all moves may be chosen by the opponent.
Time limits may limit search.
Exhaustive search is not how humans do it!

Practical Matters (cont.)
We must limit the depth of the search tree: we can’t expand all the way to terminal nodes.
We must make heuristic estimates about the values of the (non-terminal) states at the leaves of the tree.
These heuristics are often called evaluation functions.
Evaluation functions are often learned.

Evaluation Function
The idea goes back to Claude Shannon in 1950 (!!).
Limit depth + estimate utility via a heuristic evaluation function (an “educated” guess).
Then, calculate H-Minimax(s, d):
  Eval(s) if Cutoff-test(s, d);
  max_a H-Minimax(Result(s,a), d+1) if Player(s) = MAX;
  min_a H-Minimax(Result(s,a), d+1) if Player(s) = MIN.

Evaluation Function (cont.)
So what is a good evaluation function?
Efficient.
Matches the utility on terminal states: winning better than draws, better than losses.
Correlates with winning chances on non-terminal states.
Will have some level of uncertainty (as it is an estimate).

Evaluation Function (cont.)
Often a weighted sum of factors:
Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
This assumes the factors are independent.
A common chess function: 9·#Queens + 4·#Bishops + 3·#Knights + #Pawns.
(Deep Blue used about 6,000 features in its evaluation function.)
It says nothing about position …

Evaluation Function (cont.)
For chess the weights and factors could be (a sketch in code follows the stochastic-games slides):
w1·f1(s) = 9 × (number of white queens – number of black queens)
w2·f2(s) = 4 × (number of white bishops – number of black bishops)
w3·f3(s) = 3 × (number of white knights – number of black knights)
w4·f4(s) = 1 × (number of white pawns – number of black pawns)
… and so on and so forth …
With this simple function we can estimate which player is closer to winning.
Checkers? Tic-tac-toe? Your favourite game?

Horizon effect
How can we deal with the horizon effect?
Apply the evaluation function only to quiescent states (not “I am about to lose my queen!”).
Can check for non-quiescent states (i.e., something is under attack) and insist that they are expanded.

Deterministic, Perfect Information
                 Perfect information                           Imperfect information
Deterministic    Chess, Draughts (checkers), Go, Othello,      Kriegspiel, Battleship
                 Noughts and Crosses (tic-tac-toe), …
Stochastic       Backgammon, Monopoly                          Bridge, 500, Scrabble

Stochastic Games
Involve some ‘chance’ factor:
dice, the deal in a card game, tossing a coin.
Backgammon: deterministic moves, but which ones are legal is determined by the dice.
Requires chance nodes in the game tree (2 per ply).

Stochastic Games
Calculate Expect-Minimax(s):
  Utility(s) if Terminal-test(s);
  max_a Expect-Minimax(Result(s,a)) if Player(s) = MAX;
  min_a Expect-Minimax(Result(s,a)) if Player(s) = MIN;
  sum_r P(r) · Expect-Minimax(Result(s,r)) if Player(s) = CHANCE.
This complicates alpha-beta pruning …
Can use Monte Carlo (random!) choices over a large enough sample …
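Both the weighted-sum evaluation and Expect-Minimax are short in code. A minimal sketch (the piece-count encoding and the tree encoding below are assumptions for illustration):

# Toy material evaluation with the weights from the slide; counts maps
# piece -> (white count, black count).
def material_eval(counts):
    weights = {'Q': 9, 'B': 4, 'N': 3, 'P': 1}
    return sum(w * (counts[p][0] - counts[p][1]) for p, w in weights.items())

print(material_eval({'Q': (1, 1), 'B': (2, 1), 'N': (1, 2), 'P': (8, 6)}))  # 3

# Expect-Minimax: numbers are terminal utilities; ('max'|'min', children)
# are player nodes; ('chance', [(probability, child), ...]) averages.
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)  # chance node

# MAX chooses between a sure 3 and a fair coin flip between 0 and 10.
print(expectiminimax(('max', [3, ('chance', [(0.5, 0), (0.5, 10)])])))  # 5.0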
State of the Art @ Chess
Deep Blue (1997!): minimax + alpha-beta search + evaluation functions.
Lots of specialist processors.
Typically 14-ply (moves), sometimes up to 40-ply (!!).
‘Opening book’ of 4,000 positions.
Database of 700,000 grandmaster games.
Endgame database of all 5-piece checkmates.
A good PC with the right program can match a human world champion …

Alpha-Go: Search + ML
AlphaGo’s algorithm uses a combination of machine learning and tree-search techniques, combined with extensive training from both human and computer play.
Which one?

Conclusion
Game search: it is not just you doing actions in the tree!
Minimax strategy: assumes optimal players.
Reduce exploration: DFS minimax; alpha-beta pruning.
Limit look-ahead: evaluation functions.
Expect-Minimax: deals with chance.
Good luck with project 1, enjoy & learn from it!