COMP30024 Artificial Intelligence
Week 5 Problem Sheet
Weekly Topic Review
In adversarial search, two players, MAX and MIN, compete in zero-sum, perfect information games. Both players want to find a sequence of actions leading to a win, contingent on the strategy of their opponent. What strategy is reasonable? The minimax algorithm develops a strategy assuming both players play optimally until the game terminates. Put another way, either player must find an optimal sequence of actions in the ’worst-case’ scenario, assuming perfect play by the opponent.
Copyright By PowCoder代写 加微信 powcoder
Recall the utility U(σ) for a given player is the payoff for that player upon game completion in state
σ. The minimax value of node σ is the utility for being players play optimally until the end.
U ( σ ) ,
MM(σ) = maxa∈A(σ) MM(σ′),
mina∈A(σ) MM(σ′),
• In state σ, the current player chooses an action a
in the corresponding state, assuming both
for terminal σ
for MAX, σ′ = a(σ) for MIN, σ′ = a(σ)
from the set of total valid actions A(σ) in the current state leading to the child σ′ with highest minimax value,
• MM(σ) is computed recursively for each node in tree via depth-first traversal, with the relevant utilities propagated upwards from the leaf nodes.
– Time complexity O(bm) for max. depth m. – Space complexity O(bm).
• Minimax is trivially complete + optimal as we fully enumerate the game tree and take the best action at each juncture.
Note the need to simulate until termination – but recursion in combinatorially large state space is generally infeasible! Put another way, complete enumeration is impossible for games with large b, m. For example, in Chess the state space is estimated to be around O(10154) nodes, an unfeasibly large number to physically enumerate. We need to focus on ’good’, likely high-utility regions of state space, and this is where the notion of state pruning enters.
Section 5.3. Alpha–Beta Pruning
Pruning the search graph refers to ignoring re- gions of state space that cannot lead to opti- mal or complete solutions. We stop when we establish that a partial solution is worse than the current best solution, or cannot be extended into a complete solution. This ensures we ig- nore regions of state space that provably cannot contain the optimal solution. This doesn’t fully enumerate the exponential growth of the search graph, but we can come close to exponentially reduce its size, on average.
The form of pruning relevant to minimax is
alpha-beta pruning. The intuition is this:
consider a possible successor node n in tree. If
MAX/MIN has a better choice m at any point
before n, n will never be generated and can be
ignored. Once we have found out enough infor-
mation about n (i.e. by expanding some of its
descendants to obtainFaigbuourned5o.6n theTmhineimgeanxeral case for alpha–beta pruning. If m is better than value of n) to reach this conclusion, we may ig-
Player Opponent
will never get to n in play.
nore it hereafter – it makes no difference to gameplay as the minimax decision is not affected by the
true value of n.
For α-β pruning, two parameters are needed along the minimax depth-first path in tree:
α = the value of the best (i.e., highest-value) choice we have found so far a along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far a • α: The value of the best (highest-value) choice along path for MAX – α = ”at least”.
α-β search updabtersanthcehveaslueastofaαn, βodaes it(ir.eec.u,rsteesrtmhrionuagthetshethtree.reItcutersmivineatcesatlhl)e raescusrsoivoencaalls the val
along the path for MIN.
• β: The value of the best (lowest-value) choice along path for MIN – β = ”at most”.
Alpha–beta search updates the values of α and β as it goes along and pru
as soon as it establishes the minimax value v of the current node is v ≤ α (for MAX) or v ≧ β (for
node is known to be worse than the current α or β value for MAX or MIN,
MIN). The performance is highly contingent on move ordering, as this determines how efficiently we complete algorithm is given in Figure 5.7. We encourage you to trace i
can establish a bound on the minimax value for each state in the tree. In class you will show that assuming optimal move ordering, α-β search reduces the complexity of standard minimax search
applied to the tree in Figure 5.5.
from O(bm) to O(bm/2), and this is the best we can do. 5.3.1 Move ordering
The effectiveness of alpha–beta pruning is highly dependent on the order i are examined. For example, in Figure 5.5(e) and (f), we could not prune an at all because the worst successors (from the point of view of MIN) were the third successor of D had been generated first, we would have been able two. This suggests that it might be worthwhile to try to examine first the s likely to be best.
If this can be done,2 then it turns out that alpha–beta needs to exam nodes to pick the best move, instead of O bm for minimax. This means
1. Noughts and Crosses [T] Here you will investigate the fundamentals of minimax and alpha- beta pruning. Consider noughts and crosses on a 3 × 3 grid. Define Xn as the number of rows, columns or diagonals with exactly n crosses and no noughts. Define On analogously. Assume our utility function assigns +1 to any position with X3 = 1 and −1 to any position with O3 = 1, with all other positions having zero utility. For nonterminal positions, use the evaluation function f (s) = 3X2 (s) + X1 (s) − (3O2 (s) + O1 (s)) .
(a) Find an (possibly approximate) expression for the size of the state space in terms of n, i.e. how many possible configurations exist.
(b) Construct the whole game tree starting with an empty board down to depth 2 (i.e., one X and one O on the board), taking symmetry into account. Mark the evaluations of all positions at depth 2.
(c) Use minimax to mark the backed-up values for the positions at depth 1, and use these values to choose the optimal move for the X player.
(d) Circle the nodes at depth 2 that would not be evaluated if alpha-beta pruning was applied, assuming the nodes were generated in the optimal order for alpha-beta pruning.
2. Great Expectations [T] Why is minimax less reasonable than expectimax as a practical decision-making principle for a complex agent acting in the real world?
3. Suboptimal [T] The minimax algorithm returns the best move for MAX under the assumption that MIN plays optimally. What happens when MIN plays suboptimally?
4. Optimal Ordering [T] Prove that alpha–beta pruning takes time O(bm/2) with optimal move ordering (i.e. the best move for each player is always expanded first), where m is the maximum depth of the game tree and b the branching fraction. Is this possible in practice?
5. Variations [T] Minimax and alpha-beta as presented assume a zero-sum setting with bounded utility functions. This question prompts you to consider the consequences of removing these assumptions.
(a) Describe how the minimax and alpha–beta algorithms change for two-player, non-zero- sum games in which each player has a distinct utility function and both utility functions are known to both players.
(b) What happens in the almost cooperative case, where the utility functions differ by at most a positive constant k > 0, i.e. UA = UB + k?
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com