
COMP30024 Artificial Intelligence
Week 5 Problem Sheet Solutions
• Tutors: These are not unique, complete (and definitely not optimally pedagogical) solutions. There are other ways of deriving these results, and the way you prefer is to some extent an aesthetic judgement.
• Students: If you want to attain any benefit from the problems you should make a genuine attempt at them before you see the solutions.


Comments/corrections are as always greatly appreciated by the teaching team.
1. Noughts and Crosses [T] Here you will investigate the fundamentals of minimax and alpha-beta pruning. Consider noughts and crosses on a 3 × 3 grid. Define Xn as the number of rows, columns or diagonals with exactly n crosses and no noughts. Define On analogously. Assume our utility function assigns +1 to any position with X3 = 1 and −1 to any position with O3 = 1, with all other positions having zero utility. For nonterminal positions, use the evaluation function f(s) = 3X2(s) + X1(s) − (3O2(s) + O1(s)). Assume the cross player goes first.
(a) Find a (possibly approximate) expression for the size of the state space in terms of n, i.e. how many possible configurations exist.
• b = 9, m = 9; naively obvious bounds are 9^9, 9! or 3^9. Note that the number of possible games exceeds the number of unique states, since a game also records the order in which states are visited.
• A slightly tighter bound: note the game finishes after 9 moves (5 by X, 4 by O).
– How many ways to place the first X on the board? 9 = $\binom{9}{1}$.
– How many ways to place the first O on the board? 72 = $\binom{9}{1} \times \binom{8}{1}$.
– How many ways to place the second X on the board? $\binom{9}{2} \times \binom{7}{1}$.
– How many ways to place the second O on the board? $\binom{9}{2} \times \binom{7}{2}$.
– How many ways to place the third X on the board? $\binom{9}{3} \times \binom{6}{2}$.
– How many ways to place the third O on the board? $\binom{9}{3} \times \binom{6}{3}$, etc.
– Summing all of these (plus the empty board) gives 6046; a short numerical check of this sum appears after this list.

• This can be reduced a bit more via symmetry.
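As a sanity check on the 6046 figure, here is a short Python sketch (not part of the original solution) that sums the same binomial coefficients over all reachable piece counts:

from math import comb

total = 0
for x in range(0, 6):               # number of crosses on the board, 0..5
    for o in (x - 1, x):            # noughts trail crosses by one or zero
        if 0 <= o <= 4:
            # choose squares for the crosses, then squares for the noughts
            total += comb(9, x) * comb(9 - x, o)
print(total)                        # 6046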
(b) Construct the whole game tree starting with an empty board down to depth 2 (i.e., one X and one O on the board), taking symmetry into account. Mark the evaluations of all positions at depth 2.
• Root has a minimax value of +1.
• At depth 1, MIN encounters the following values:
– X at top left: −1.
– X at top mid: −2.
– X at centre: +1.
• At depth 2, the twelve leaf nodes are left as an exercise.
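To make the evaluation function concrete, here is a minimal Python sketch of f(s); the length-9 string board representation and the helper names are choices made for this illustration, not anything prescribed by the question.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
         (0, 4, 8), (2, 4, 6)]               # diagonals

def count_lines(board, player, n):
    """Lines containing exactly n of player's marks and none of the opponent's."""
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if [board[i] for i in line].count(player) == n
               and all(board[i] != opponent for i in line))

def evaluate(board):
    """f(s) = 3 X2(s) + X1(s) - (3 O2(s) + O1(s))."""
    return (3 * count_lines(board, 'X', 2) + count_lines(board, 'X', 1)
            - 3 * count_lines(board, 'O', 2) - count_lines(board, 'O', 1))

# Example: X in the centre, O in the top-left corner (row-major board string).
print(evaluate('O...X....'))                 # 1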
(c) Use minimax to mark the backed-up values for the positions at depth 1, and use these
values to choose the optimal move for the X player. See above.
(d) Circle the nodes at depth 2 that would not be evaluated if alpha-beta pruning was applied, assuming the nodes were generated in the optimal order for alpha-beta pruning.
In total, 8 of the 12 leaf nodes can be pruned by alpha-beta with optimal move ordering.
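For reference, a generic depth-limited alpha-beta sketch is given below (an illustration only, not the marked solution); successors(state) and evaluate(state) are assumed helpers supplied by whichever game it is applied to.

import math

def alphabeta(state, depth, alpha, beta, maximising, successors, evaluate):
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    if maximising:
        value = -math.inf
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         successors, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break       # beta cut-off: MIN will never allow this branch
        return value
    value = math.inf
    for child in children:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     successors, evaluate))
        beta = min(beta, value)
        if beta <= alpha:
            break           # alpha cut-off: MAX already has a better option
    return value

Calling alphabeta(root, 2, -math.inf, math.inf, True, ...) searches the depth-2 tree from parts (b)–(d); how many of the leaves are skipped depends on the order in which successors returns moves.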
2. Great Expectations [T] Why is minimax less reasonable than expectiminimax as a practical decision-making principle for a complex agent acting in the real world?
The real world is often stochastic, and in complex environments with inherent randomness it is generally necessary to use probability distributions to parameterize our uncertainty. More practically, minimax assumes an optimal adversary, whereas in reality optimal responses to an agent's actions are not guaranteed. Minimax makes decisions based on a potentially low-probability worst-case scenario. Recall that expectiminimax instead backs up the expected utility over a node's children: events and their corresponding outcomes are weighted by their probability, so the resulting decisions may be more robust to highly negative but very rare events.
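The following toy Python sketch makes this concrete (the node encoding is an assumption of this example, not part of the solution): a rare catastrophic outcome barely shifts the expectiminimax value of a branch, whereas minimax would value that branch at its worst case.

def expectiminimax(node):
    # A node is either a terminal utility (number) or a pair (kind, children),
    # where kind is 'max', 'min' or 'chance'; chance children are
    # (probability, subtree) pairs.
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # chance node

risky = ('chance', [(0.99, 10), (0.01, -100)])   # disaster with probability 0.01
safe = ('chance', [(1.0, 1)])
print(expectiminimax(('max', [risky, safe])))    # about 8.9: the rare disaster barely matters

A worst-case (minimax-style) agent would value the risky branch at −100 and take the safe branch instead, which is overly conservative if the probabilities are trustworthy.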
3. Suboptimal [T] The minimax algorithm returns the best move for MAX under the assumption that MIN plays optimally. What happens when MIN plays suboptimally?
Assertion: For every game tree, the utility obtained by MAX using minimax decisions against a suboptimal MIN will never be lower than the utility obtained playing against an optimal MIN.
• Consider a MIN node whose children are terminal nodes. Against a suboptimal opponent, the utility MAX receives at this node is greater than or equal to its value when MIN plays optimally: U(σmin) ≥ U∗(σmin).
• Hence the value of the parent MAX node, which takes the maximum over all its children, is greater than or equal to its value against an optimal MIN player.
• Repeat this argument up to the root node to conclude that the assertion is true: the MAX player always receives a payoff greater than or equal to that obtained against an optimal MIN player.
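The assertion can be checked numerically on a toy tree. In the sketch below (an illustration only, with the tree and the suboptimal policy chosen arbitrarily), MAX always follows its minimax policy while MIN plays either optimally or as badly as possible; MAX's payoff never decreases against the weaker opponent.

def minimax_value(node, maximising=True):
    if isinstance(node, (int, float)):
        return node
    values = [minimax_value(child, not maximising) for child in node]
    return max(values) if maximising else min(values)

def play(node, min_policy, maximising=True):
    """Payoff when MAX follows minimax and MIN follows min_policy."""
    if isinstance(node, (int, float)):
        return node
    if maximising:
        best = max(node, key=lambda c: minimax_value(c, maximising=False))
        return play(best, min_policy, maximising=False)
    return play(min_policy(node), min_policy, maximising=True)

tree = [[3, 12], [2, 8]]                  # MAX moves first, then MIN picks a leaf
optimal = lambda children: min(children, key=minimax_value)
suboptimal = lambda children: max(children, key=minimax_value)
print(play(tree, optimal), play(tree, suboptimal))    # 3 12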

4. Optimal Ordering [T] Prove that alpha–beta pruning takes time O(b^{m/2}) with optimal move ordering (i.e. the best move for each player is always expanded first), where m is the maximum depth of the game tree and b the branching factor. Is this possible in practice?
The following is an intuitive ’proof’, for a more rigorous proof see Section 6 of Knuth and Moore (1975). Consider a node σ in the tree and a 2-ply section of the total game tree extending below that – assume without loss of generality that the first player is MAX and the second player is MIN.
• We want to calculate the exact minimax value of the node σ. This requires us to calculate the exact value of its best child c∗, and a bound on the state value for the remaining b − 1 children {c1, . . . , cb−1}. Assuming optimal ordering, we expand c∗ first and then the other b − 1 children in decreasing order of their minimax value. So the first ply must expand all b nodes.
• To obtain a bound on the minimax value of a given node, we need the exact value of at least one of its children (the best child). Here by optimal ordering, the best move is expanded first, so we obtain the correct bound just by expanding the first child. The second ply expands just one child for each of the b nodes expanded in the first ply.
• Hence the number of nodes that must be expanded is O(b) + n(c∗), where n(c∗) is the number of nodes expanded when calculating the exact value of the best child c∗ in the first ply, versus O(b^2) without the optimal ordering assumption.
• Now it seems plausible that we get a square-root reduction in the number of nodes that have to be considered in every 2-ply section of the tree to calculate the exact value of a node, assuming optimal ordering. Repeat inductively down to the terminal depth m to show that the time complexity is O(b^{m/2}).
This is an exponential reduction in the number of nodes that have to be considered relative to minimax, allowing alpha-beta search with this optimality property to look roughly twice as deep as minimax using the same amount of computation. Perfect move ordering is not achievable in practice, since knowing the best move in advance would require the very search we are trying to avoid, but simple ordering heuristics (e.g. trying previously successful moves first) can get close to this bound.
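The effect of ordering can also be demonstrated empirically. The Python sketch below (my own illustration, not part of the marked solution) builds a complete b-ary tree with random leaf utilities and counts how many leaves alpha-beta evaluates with and without best-first ordering. Note that the "optimal" ordering is obtained by cheating with a full minimax search of each subtree, which is exactly why the bound is not attainable in practice.

import math
import random

def build(b, m):
    # Complete b-ary tree of depth m with random (distinct) leaf utilities.
    if m == 0:
        return random.uniform(-1.0, 1.0)
    return [build(b, m - 1) for _ in range(b)]

def minimax_value(node, maximising):
    if not isinstance(node, list):
        return node
    values = [minimax_value(child, not maximising) for child in node]
    return max(values) if maximising else min(values)

def alphabeta(node, alpha, beta, maximising, ordered, counter):
    if not isinstance(node, list):
        counter[0] += 1                       # count leaf evaluations
        return node
    children = node
    if ordered:
        # Best move (for the player to move) first, found by full minimax;
        # only possible for this toy demonstration.
        children = sorted(node, key=lambda c: minimax_value(c, not maximising),
                          reverse=maximising)
    value = -math.inf if maximising else math.inf
    for child in children:
        v = alphabeta(child, alpha, beta, not maximising, ordered, counter)
        if maximising:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break
    return value

b, m = 3, 6
tree = build(b, m)
for ordered in (False, True):
    counter = [0]
    alphabeta(tree, -math.inf, math.inf, True, ordered, counter)
    print('ordered' if ordered else 'unordered', counter[0])
# The ordered count stays on the order of b**(m/2) = 27, far below the
# b**m = 729 leaves of the full tree.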
5. Variations [T] Minimax and alpha-beta as presented assume a zero-sum setting with bounded utility functions. This question prompts you to consider the consequences of removing these assumptions.
(a) Describe how the minimax and alpha–beta algorithms change for two-player, non-zero-sum games in which each player has a distinct utility function and both utility functions are known to both players.
For non-zero sum games between two players A and B, a single value is not sufficient to describe the utility of a node, and should instead be replaced by a tuple of values (vA,vB), giving the minimax value/utility of the state for each respective player. At each stage the player whose turn it is to move selects the action that yields the highest vi value.
Alpha-beta pruning is not possible in this context as the game is not necessarily adversarial and the two players are not necessarily competing against each other – the utility functions may be entirely independent, and neither player may be particularly fussed about what the other player does. There may no longer be situations where one player will never let the other go down a particular branch of the game tree.
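A minimal sketch of this generalised backup is given below (the list/tuple tree encoding and the two-element payoff vectors are assumptions of this illustration): every terminal stores a payoff pair (vA, vB), the player to move maximises their own component, and no pruning is attempted.

def backup(node, player):
    # node is either a terminal payoff pair (vA, vB) or a list of children;
    # player 0 is A, player 1 is B, and they alternate moves.
    if isinstance(node, tuple):
        return node
    child_values = [backup(child, 1 - player) for child in node]
    return max(child_values, key=lambda v: v[player])

# A to move at the root, B replies: A ends up with (1, 4) rather than the
# (3, 1) leaf, because B would steer the left subtree to (0, 5).
tree = [[(3, 1), (0, 5)], [(2, 2), (1, 4)]]
print(backup(tree, player=0))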
(b) What happens in the almost cooperative case, where the utility functions differ by at most a positive constant k > 0, i.e. UA = UB + k?
Since UA = UB + k, the two utility functions induce exactly the same preference ordering over outcomes, so the optimal strategy in this scenario is for both players to cooperate to reach the maximum-valued leaf – the agents work together to achieve a mutually desirable goal.
References
Donald E. Knuth and Ronald W. Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293–326, 1975.

Figure 1: Pitfalls of optimal zero-sum game playing in an adversarial environment.
