COMP4418, 2017 – Assignment 3
Due: Wednesday, 22 November, 23:59:59 Worth: 15%
1. [20 Marks] (Social Choice and Game Theory)
abc
d
efg
(a)
In the tournament in the above Figure, assuming all the arcs missing from the figure are downward arc, list
• the uncovered set;
• the top cycle;
• the set of Copeland winners; • the set of Banks winners; and • the set of Condorcet winners.
(b) Compute all the Nash equilibria of the following two player game.
DE A
B
2, 4 |
8, 5 |
6, 6 |
4, 4 |
1
2. [30 Marks] (Decision Making)
(a) For each of the following games, choose the best model among (A) Markov chain (Markov process); (B) Markov decision process (MDP); (C) Hidden Markov model (HMM); (D)
Partially-observable Markov decision process (POMDP); and (E) • Blackjack
• Candy Crush
• Chess
• Minesweeper
• Snakes and Ladders
• Texas Hold ’em Poker
None/Other.
For the next questions, consider the Markov Decision Process depicted below. Edges are labelled “name of the action (reward associated), probability of the transition”.
Leave (5), 0.5
Leave (5), 0.5 Stay (−2), 1
Stay (1), 1 s1
Leave (0), 1
s2 Stay (0), 1 s3
Leave (−2), 1
- (b) Using your intuition, give an optimal policy for situations where the discount factor is very high (for instance, δ = 0.999)? Explain your reasoning in two or three sentences.
- (c) Using your intuition, give an optimal policy for situations where the discount factor is very low (for instance, δ = 0.001)? Explain your reasoning in two or three sentences.
- (d) Represent the values computed during the first three iterations of the Value Iteration algorithm using the following format where L represents the action Leave and S represents the action Stay. Use a discounting factor of 0.6.
V0(s) V0(s, S) V0(s, L) V1(s) V1(s, S) V1(s, L) V2(s) V2(s, S) V2(s, L) V3(s)
s1 0 1 … s2 0 …
s3 0
- (e) Let π be the following policy: π(s1) = L, π(s2) = L, π(s3) = S. If π is assumed to hold, the MDP turns into a Markov Chain. Represent this Markov Chain / Markov Process.
- (f) Assuming the agent uses π, express the value associated to each state as a function of the discount factor δ. Provide the formal derivation of the result as part of your answer. Elaborate on whether the computations of this question support the intuition of questions 2b and 2c.
2
Submission
• Put your written solutions in a single PDF file assn3.pdf
• Submit using the command: give cs4418 assn3 assn3.pdf
Late Submissions
Due to the assignment due date being extended to the 22nd November there will be no late submissions allowed.
Academic Honesty and Plagiarism
All work submitted for assessment must be your own work. Assignments must be completed individually. We regard copying of assignments, in whole or part, as a very serious offence. Be warned that:
- the submission of work derived from another person, or jointly written with someone else will, at the very least, result in automatic failure for COMP4418 with a mark of zero;
- allowing another student to copy from you will, at the very least, result in a mark of zero for your own assignment; and
- severe or second offences will result in automatic failure, exclusion from the University, and possibly other academic discipline.
Collaborative work in the form of “think tanking” is encouraged, but students are not allowed to derive solutions together as a group during such discussions. Students are also warned not to send solution fragments of the assignments to each other in any form (e.g. as email or listings). In addition, copy- ing/purchasing of solutions that is available on the web is also not permitted. Students are strongly advised to protect their work. Do not leave your terminal/computer unattended, or leave printouts at the printer for others to take. Read the study guide carefully for the rules regarding plagiarism.
3