Practice exam quiz (long answer questions)
Quiz Instructions
These are practice long-answer questions, based on sample exam questions compiled from previous years.
This is NOT intended to be a sample exam: there will be fewer questions on the final exam, and the final exam will also include short-answer questions.
The subject staff have worked hard to make answering the questions on the real exam as straightforward as possible, so it should hopefully be smoother than these sample questions; nevertheless, these questions are illustrative of the type of long-answer questions on the final exam.
Blind and heuristic search
Question 1 0 pts
The following search algorithms can be implemented in a similar manner, differing only in the abstract data structure used to implement the open list:
- Breadth First Search
- Depth First Search
- Uniform Cost Search
Answer the following questions:
I. State which abstract data structure you would use for Breadth First Search. Briefly explain, in one or two sentences, why you would choose the particular data structure.
II. State which abstract data structure you would use for Depth First Search. Briefly explain, in one or two sentences, why you would choose the particular data structure.
III. State which abstract data structure you would use for Uniform Cost Search. Briefly explain, in one or two sentences, why you would choose the particular data structure.
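As a point of reference, the claim in the question can be made concrete with a minimal sketch. The frontier classes and the `generic_search` interface below are illustrative assumptions, not subject code: a single search loop whose behaviour changes depending purely on which frontier object implements the open list.

```python
from collections import deque
import heapq

class FIFOFrontier:                     # first-in, first-out
    def __init__(self):
        self._q = deque()
    def push(self, cost, item):
        self._q.append((cost, item))
    def pop(self):
        return self._q.popleft()
    def empty(self):
        return not self._q

class LIFOFrontier:                     # last-in, first-out
    def __init__(self):
        self._q = []
    def push(self, cost, item):
        self._q.append((cost, item))
    def pop(self):
        return self._q.pop()
    def empty(self):
        return not self._q

class PriorityFrontier:                 # ordered by accumulated path cost
    def __init__(self):
        self._q = []
        self._n = 0                     # tie-breaker so items never compare
    def push(self, cost, item):
        heapq.heappush(self._q, (cost, self._n, item))
        self._n += 1
    def pop(self):
        cost, _, item = heapq.heappop(self._q)
        return cost, item
    def empty(self):
        return not self._q

def generic_search(start, is_goal, successors, frontier):
    """One search loop; only `frontier` differs between the algorithms.
    `successors(state)` is assumed to yield (next_state, step_cost) pairs."""
    frontier.push(0, (start, [start]))
    closed = set()
    while not frontier.empty():
        cost, (state, path) = frontier.pop()
        if is_goal(state):
            return path
        if state in closed:
            continue
        closed.add(state)
        for nxt, step in successors(state):
            frontier.push(cost + step, (nxt, path + [nxt]))
    return None
```

For instance, `generic_search(s0, goal, succ, FIFOFrontier())` versus the same call with `LIFOFrontier()` or `PriorityFrontier()`; which data structure yields which algorithm, and why, is what parts I–III ask you to state and justify.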
Question 2 0 pts
Recall that A* requires a consistent heuristic to guarantee finding an optimal plan without reopening nodes. Draw or define a graph and make up an admissible but inconsistent heuristic function where A* returns a suboptimal solution.
Write down the steps involved in the A* algorithm, devising a simple example to illustrate them.
Note: avoid making large examples; a graph with 4 nodes should be sufficient.
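For reference, here is a compact sketch of A* without node reopening (the interfaces are assumed for illustration, not subject code). A closed list prevents re-expansion, which is exactly the variant that can return a suboptimal solution when the heuristic is admissible but inconsistent.

```python
import heapq
from itertools import count

def astar_no_reopening(start, is_goal, successors, h):
    """A* with a closed list and no reopening. `successors(state)` is
    assumed to yield (next_state, edge_cost) pairs; `h` is the heuristic."""
    tie = count()                                  # heap tie-breaker
    open_list = [(h(start), next(tie), 0, start, [start])]  # (f, tie, g, s, path)
    closed = set()
    while open_list:
        _, _, g, state, path = heapq.heappop(open_list)
        if is_goal(state):
            return path, g                         # plan and its cost
        if state in closed:
            continue                               # never re-expanded
        closed.add(state)
        for nxt, cost in successors(state):
            if nxt not in closed:
                g2 = g + cost
                heapq.heappush(open_list,
                               (g2 + h(nxt), next(tie), g2, nxt, path + [nxt]))
    return None, float("inf")
```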
The next five questions refer to the following example. They concern a classical planning problem where a robot can move horizontally and vertically to adjacent cells as depicted in the figure below. Note that the robot cannot move diagonally between cells and the hashed cell is inaccessible to the robot.
In answering the sub-questions below, you are allowed to use variables as arguments for the actions (action schemes), specifying the values of the variables. Note: it is not compulsory to use PDDL syntax, as long as you can convey the main ideas.
Hint: consider that the position of the robot could be modelled as either:
- row/column tuples, e.g. <0, 0> could refer to the lower left cell, or
- single cells, e.g. position < > could refer to the lower left cell.
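If the STRIPS notation is unfamiliar: an action is just three sets of ground atoms (preconditions, an add list, and a delete list). Below is a minimal Python illustration of one grounded move action under the row/column encoding; the atom names and the adjacency predicate are assumptions for illustration, not the expected answer to the questions that follow.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A STRIPS action: precondition, add and delete sets of ground atoms."""
    name: str
    precond: frozenset
    add: frozenset
    delete: frozenset

# Hypothetical grounding of a move(x, y) scheme: step from cell <0,0> to <0,1>.
move_00_01 = Action(
    name="move(<0,0>, <0,1>)",
    precond=frozenset({"at(<0,0>)", "adjacent(<0,0>, <0,1>)"}),
    add=frozenset({"at(<0,1>)"}),
    delete=frozenset({"at(<0,0>)"}),
)
```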
Question 3 0 pts
Describe briefly in STRIPS how to model the problem where a robot can move horizontally and vertically among adjacent cells, such that the hadd heuristic estimates the same values as the Manhattan distance heuristic.
Question 4 0 pts
Describe briefly in STRIPS how to model the problem where a robot can move horizontally and vertically among adjacent cells, such that the hadd heuristic estimates the same values as the optimal heuristic h∗.
Question 5 0 pts
Using your last STRIPS encoding (where hadd = h∗), with initial state s0 = robot at location I and goal state sg = robot at location K, compute hadd(sg) and hff from the best supporters induced by hadd.
Show your working.
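As a reminder (these are the standard definitions, independent of your particular encoding), for an atom p and state s:

$$h_{\mathrm{add}}(p; s) = \begin{cases} 0 & \text{if } p \in s \\ \min_{a:\, p \in \mathrm{add}(a)} \big[ \mathrm{cost}(a) + h_{\mathrm{add}}(\mathrm{pre}(a); s) \big] & \text{otherwise} \end{cases} \qquad h_{\mathrm{add}}(P; s) = \sum_{p \in P} h_{\mathrm{add}}(p; s)$$

The best supporter of p is the minimising action $a_p = \operatorname{argmin}_{a:\, p \in \mathrm{add}(a)} [\mathrm{cost}(a) + h_{\mathrm{add}}(\mathrm{pre}(a); s)]$, and $h_{\mathrm{FF}}$ is the cost of the relaxed plan obtained by recursively collecting best supporters for the goal atoms, counting each action once.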
Question 6 0 pts
We have 2 rooms, A and B, and 2 objects, o1 and o2. A robot can load and unload objects if it is at the same location as the object, and can move from one room to the other. We want to get o1 and o2 into room B, given that both are initially in A.
Model the problem in STRIPS in such a way that the optimal plan would be the following:
pick(o1, A), move(A, B), drop(o1, B), move(B, A), pick(o2, A), move(A, B), drop(o2, B)
Question 7 0 pts
What is the hmax(s0) value of your STRIPS model, where s0 stands for the initial state?
Show your working.
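For reference, hmax is defined like hadd but with a maximisation in place of the sum over sets of atoms:

$$h_{\max}(p; s) = \begin{cases} 0 & \text{if } p \in s \\ \min_{a:\, p \in \mathrm{add}(a)} \big[ \mathrm{cost}(a) + h_{\max}(\mathrm{pre}(a); s) \big] & \text{otherwise} \end{cases} \qquad h_{\max}(P; s) = \max_{p \in P} h_{\max}(p; s)$$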
MDPs and reinforcement learning
The next six questions refer to the following example.
Imagine a kitchen robot whose task is to ask the user what kind of cereal he/she wants for breakfast, wait for the user's answer, and then hand out the appropriate cereal box once it knows the desired cereal. We want to design a simple dialogue system to handle the interaction with the user.
A simple way to model it is via a discount-reward MDP with only two states: UnknownCereal, where the robot does not know which cereal to give, and KnownCereal, where the robot knows the cereal to hand out.
There are only two possible actions in the model:
Action AskCerealType corresponds to the robot asking the user for the cereal box he/she wishes to have. The action is only available in state UnknownCereal, and has a reward of -1 (a cost) in that state.
The probability of reaching state KnownCereal is 0.8 (if the user answers the robot’s question), and otherwise (probability 0.2) the robot remains in UnknownCereal (if the user ignores the question or provides an unclear answer).
Action GiveCereal corresponds to the robot physically giving the cereal to the user. The action is available in the state KnownCereal, and has a reward of 5. When the robot executes this action in state KnownCereal, the MDP reaches an absorbing goal state and finishes. As such, there are no actions or rewards available from this absorbing goal state.
When the robot executes this action in the UnknownCereal state, the MDP reaches the absorbing goal state with probability 0.3 (it gets lucky and hands over the right cereal) and receives a reward of 5; otherwise the person rejects the cereal, and the robot goes back to the UnknownCereal state with probability 0.7 and receives a reward of -2.
Question 8 0 pts
Assuming a discount factor γ = 0.9, calculate the value function V for each of the states UnknownCereal and KnownCereal using value iteration, for the 2nd and 3rd iterations.
Show your working.
Iteration 1: V(KnownCereal) = 0.0, V(UnknownCereal) = 0.0
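For reference, the value iteration (Bellman) backup being applied, starting from the iteration-1 values above, is:

$$V_{k+1}(s) = \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\,\big[ r(s, a, s') + \gamma\, V_k(s') \big]$$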
Question 9 0 pts
Given the value function that you calculated above, what policy would maximise the robot's expected reward?
Show your working.
Question 10 0 pts
In your own words, explain the difference between value iteration and policy iteration.
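As a reminder of the two algorithms being compared (standard formulations): policy iteration alternates a policy evaluation step and a policy improvement step,

$$V^{\pi}(s) = \sum_{s'} P(s' \mid s, \pi(s))\,\big[ r(s, \pi(s), s') + \gamma\, V^{\pi}(s') \big], \qquad \pi'(s) = \operatorname*{argmax}_{a} \sum_{s'} P(s' \mid s, a)\,\big[ r(s, a, s') + \gamma\, V^{\pi}(s') \big]$$

whereas value iteration repeatedly applies the single backup shown in Question 8 until the values converge, extracting a policy only at the end.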
Question 11 0 pts
The robot designers have found that the probabilities used for outcomes are incorrect and different for each household. As such, they decide to instead use reinforcement learning to learn the policy after deployment.
A few weeks after deployment, one such robot has the following Q-table:
State            AskCerealType   GiveCereal
UnknownCereal    7.2             1.9
KnownCereal      —               3.4
Assume a discount factor of 0.9 and a learning rate of 0.5.
In the state UnknownCereal, the robot executes GiveCereal, which is rejected. It then decides to execute GiveCereal again.
Update the Q-table using both the 1-step Q-learning and 1-step SARSA updates for the first GiveCereal action. That is, calculate two new Q-tables. Show your working.
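For reference, the two 1-step updates (standard forms, with learning rate α and discount factor γ) are:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big] \quad \text{(Q-learning)}$$

$$Q(s,a) \leftarrow Q(s,a) + \alpha\big[ r + \gamma\, Q(s',a') - Q(s,a) \big] \quad \text{(SARSA)}$$

The two differ only in whether the next action is the greedy one (Q-learning) or the one actually executed (SARSA).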
Question 12 0 pts
In the state UnknownCereal, the robot executes GiveCereal three times in a row. Each time the cereal is rejected.
Calculate the 2-step SARSA update for this execution of three actions.
Assume the same parameters (discount factor, learning rate, etc.) as in the previous question.
Show your working.
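For reference, the 2-step SARSA target sums two rewards before bootstrapping (the standard n-step form with n = 2):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\big[ r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, Q(s_{t+2}, a_{t+2}) - Q(s_t, a_t) \big]$$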
Question 13 0 pts
The robot designers are finding that the owners of the robots are demanding a refund because the robots get their cereal preference wrong too often. Give one technique that the robot designers could use to improve the situation.
Question 14 0 pts
Below is a tree from MCTS in which a collection of roll-outs has been performed; six expanded nodes are shown (others are omitted as they are not relevant to the question). The notation V(s) = v indicates that v is the value of state s, N = n indicates that the node has been visited n times, and r = R indicates that reward R is given when this state is visited.
Using back-propagation and assuming γ = 0.9, calculate the values of V and N for the states s and v.
Show your working.
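One common formulation of the back-propagation step (assumed here; check that it matches the course definition) increments the visit count and moves the value toward the discounted return G observed below the node:

$$N(s) \leftarrow N(s) + 1, \qquad V(s) \leftarrow V(s) + \frac{G - V(s)}{N(s)}$$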
Question 15 0 pts
Given your answer to the previous question, what action should be chosen at state s to maximise expected reward?
Show your working.
Question 16 0 pts
Imagine a reinforcement learning algorithm that monitors heartbeat information from a fitness device, such as a FitBit, to determine whether a person is developing a heart problem, such as an irregular or elevated heartbeat.
List one potential ethical dilemma that could occur in such a situation. Justify why you believe this could be a serious problem.
Game theory
Question 17 0 pts
Consider the following abstract two-player game in normal form. Find all pure and mixed-strategy equilibria for this game.
HINT: Consider the notion of dominated strategies: some strategies are strictly dominated by others, and so can be discarded.
Show your working.
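As a reminder of the standard technique: once dominated strategies are removed, a mixed equilibrium in a 2×2 game is found by choosing each player's mixing probability so that the opponent is indifferent. If player 1 plays strategy a with probability p (and b otherwise), then p must satisfy

$$p\,u_2(a, L) + (1-p)\,u_2(b, L) = p\,u_2(a, R) + (1-p)\,u_2(b, R)$$

and symmetrically for player 2's probability over L and R, using player 1's payoffs. (The strategy names here are placeholders; use the payoffs from the question's matrix.)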
Question 18 0 pts
Challenge Question: Two players, A and B play the following game.
First A must choose IN or OUT.
If A chooses OUT the game ends, and the payoffs are: A gets 2, and B gets 0. If A chooses IN then B observes this and must then choose IN or OUT.
If B chooses OUT the game ends, and the payoffs are: B gets 2, and A gets 0. If B chooses IN then they play the following simultaneous move game:
Draw the extensive-form tree for this game and calculate the equilibria of the extensive-form game.
Note: to ‘draw’ the tree, you can use text format; e.g.
U -1.a-> X
X -2.b-> Y (0,1)
X -2.c-> Z (1,0)
represents a tree with root node U. From U, player 1 must select move a, and then player 2 chooses between moves b and c.
Alternatively, you may draw it by hand on paper or with a stylus, and then insert the image using the icon at the top of the text box. However, this often leads to failed uploads or the incorrect figure being uploaded, so it is strongly recommended that you do NOT use images.
Question 19 0 pts
Challenge Question. The following question is under-specified, but is intended to improve your understanding of subject content and general problem-solving ability, rather than act as an entirely accurate reflection of an exam question.
Consider a person who is mugged on the street by someone with a gun. The person has an unloaded gun in their own pocket. They could get the gun out to try to scare off the mugger; however, they risk the mugger shooting them instead. If they do not get out the unloaded gun, the mugger has time to search only their left or right pocket, but not both, and the mugger will not shoot them. The person has $100 in their left pocket and nothing in their right pocket, and the mugger knows this. Assume that if the mugger does not get the $100, they get no payoff.
Should the person draw their unloaded gun or not?
Justify your answer.