2021 Semester 1
Exam: AI Planning for Autonomy (COMP90054_2021_SM1)
Attempt due: Jun 7 at 17:15
Started: Jun 7 at 15:00
Quiz Instructions
Academic Integrity Declaration
By commencing and/or submitting this assessment I agree that I have read and understood the University’s policy on academic integrity.
I also agree that:
Unless paragraph 2 applies, the work I submit will be original and solely my own work (cheating);
I will not seek or receive any assistance from any other person (collusion) except where the work is for a designated collaborative task, in which case the individual contributions will be indicated; and,
I will not use any sources without proper acknowledgment or referencing (plagiarism).
Where the work I submit is a computer program or code, I will ensure that:
a. any code I have copied is clearly noted by identifying the source of that code at the start of the program or in a header file or, that comments inline identify the start and end of the copied code; and
b. any modifications to code sourced from elsewhere will be commented upon to show the nature of the modification.
Multiple choice questions [7 marks]
Question 1 1 pts
Assume the initial state is I and the goal state is G. Which connection needs to be removed from the following graph to make the heuristic admissible:
I->A A->D B->C B->E E->G
Which of the following statements are true?
hff is always closer to h* than hmax or hadd
The h+ heuristic is sometimes inadmissible for all STRIPS planning problems
The A* algorithm will always return an optimal solution without reopening nodes if the heuristic used is consistent
Enforced hill climbing is complete if the heuristic used is admissible
Depth first search has a lower space complexity than breadth first search or A*
Consider a policy 𝜋 that takes a state and returns the action a that should be chosen in state s. What type of policy is this?
A random policy
An initial policy
A deterministic policy
A stochastic policy
A local policy
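Which of the following are correct formulae for policy extraction from a Q-function $Q(s, a)$? Select all correct answers.
$\operatorname{argmax}_{a \in A(s)} \sum_{s' \in S} P_a(s' \mid s)\,[r(s, a, s') + \gamma V(s')]$
$\operatorname{argmax}_{a \in A(s)} \sum_{s' \in S} P_a(s' \mid s)\,[r(s, a, s') + \gamma Q(s', a)]$
$\operatorname{argmax}_{a \in A(s)} \sum_{s' \in S} P_a(s' \mid s)\,[r(s, a, s') + \gamma \max_{a' \in A(s')} Q(s', a')]$
$Q(s, a)$
$\operatorname{argmax}_{a' \in A(s)} Q(s, a')$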
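As an illustration of greedy policy extraction from a Q-table, the following minimal Python sketch picks, for each state, the action with the highest Q-value. The state names, action names and Q-values are hypothetical and are not taken from the exam.

```python
def extract_policy(q_table, actions):
    """Return the greedy policy: for each state, the action with the highest Q-value.

    q_table maps (state, action) -> Q-value.
    actions maps state -> list of actions available in that state.
    """
    return {
        state: max(available, key=lambda a: q_table[(state, a)])
        for state, available in actions.items()
    }


# Hypothetical Q-table, for illustration only.
q_table = {
    ("s0", "left"): 1.0,
    ("s0", "right"): 2.5,
    ("s1", "left"): 0.3,
    ("s1", "right"): -1.0,
}
actions = {"s0": ["left", "right"], "s1": ["left", "right"]}

policy = extract_policy(q_table, actions)
print(policy)
```

Note that this form needs no transition model, only the Q-values themselves.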
Select the statement below that is false
Q-learning and SARSA are both temporal difference learning
Q-learning learns a Q-function and SARSA learns a policy
n-step reinforcement learning can be applied to both Q-learning and SARSA
Q-learning is temporal difference learning
SARSA is temporal difference learning
Backward induction and multi-agent reinforcement learning are both techniques for solving extensive form games. Under which circumstances would you choose to use backward induction instead of multi-agent reinforcement learning?
If we do not want to use deep Q-learning
If we need an optimal solution
If one of the players is the environment
If we require a policy rather than an equilibrium
Select all pure-strategy Nash equilibria for the following game. Select none if there are no pure-strategy Nash equilibria.
Player 1 Up Down
4, 6 9, 7 5, 5 4, 6
Player 2 Left
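A pure-strategy Nash equilibrium is a cell of the payoff matrix in which neither player can gain by unilaterally changing their action. The Python sketch below checks every cell of a two-player game for this property. Because the layout of the matrix above did not survive extraction, the example uses a hypothetical coordination game rather than the exam's payoffs.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        u1, u2 = payoffs[(r, c)]
        # Row player cannot improve by switching rows while the column is fixed.
        row_best = all(payoffs[(r2, c)][0] <= u1 for r2 in rows)
        # Column player cannot improve by switching columns while the row is fixed.
        col_best = all(payoffs[(r, c2)][1] <= u2 for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Hypothetical coordination game, NOT the exam's matrix.
game = {
    ("Up", "Left"): (2, 2),
    ("Up", "Right"): (0, 0),
    ("Down", "Left"): (0, 0),
    ("Down", "Right"): (1, 1),
}
print(pure_nash_equilibria(game))
```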
Short answer questions [18 marks]
The following information relates to the next three questions
A cleaning robot can move between four rooms (A, B, C & D) in either direction along the lines indicated in the diagram below.
The robot has two fluents: at(X) and cleaned(X), where X can take the value A, B, C or D.
The robot has two actions:
move(X, Y), which moves from room X to room Y with a cost of 1. Its formal specification is below:
preconditions: at(X)
effects: at(Y), not at(X)
clean(X), which cleans a room with a cost of 2. Its formal specification is below:
preconditions: at(X)
effects: cleaned(X)
The robot is initially at room A and none of the rooms have been cleaned. The goal is to clean all four rooms.
Calculate hmax(I) for the robot described above
The robot is now modified in the following way:
Once the robot enters a room that has not been cleaned it must clean it before moving to another room. The formal specification of the move(X, Y) action is now:
preconditions: at(X), cleaned(X)
effects: at(Y), not at(X)
Calculate hadd for the modified robot, assuming the initial and goal states are unchanged.
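Both hmax and hadd can be computed by iterating the delete-relaxed cost equations to a fixed point, combining precondition costs with max (hmax) or sum (hadd). The Python sketch below does this for the cleaning robot. The room diagram is not available in this text, so the connectivity in EDGES (a simple line A-B-C-D) is a hypothetical assumption, and the numbers it produces apply only to that assumed layout, not necessarily to the exam's diagram.

```python
import math

ROOMS = "ABCD"
# HYPOTHETICAL layout: the exam's diagram is missing, so a line A-B-C-D is assumed.
EDGES = [("A", "B"), ("B", "C"), ("C", "D")]


def build_actions(modified):
    """Return (preconditions, add effects, cost) triples; delete effects are ignored."""
    actions = []
    for x, y in EDGES:
        for src, dst in ((x, y), (y, x)):
            pre = [f"at({src})"]
            if modified:
                # Modified robot: a room must be cleaned before leaving it.
                pre.append(f"cleaned({src})")
            actions.append((pre, [f"at({dst})"], 1))
    for room in ROOMS:
        actions.append(([f"at({room})"], [f"cleaned({room})"], 2))
    return actions


def relaxed_costs(actions, init, combine):
    """Fixed-point iteration over fact costs; combine is max (hmax) or sum (hadd)."""
    facts = {f for pre, add, _ in actions for f in pre + add}
    cost = {f: (0 if f in init else math.inf) for f in facts}
    changed = True
    while changed:
        changed = False
        for pre, add, action_cost in actions:
            new_cost = action_cost + combine([cost[p] for p in pre])
            for fact in add:
                if new_cost < cost[fact]:
                    cost[fact] = new_cost
                    changed = True
    return cost


def heuristic(modified, combine):
    """Heuristic value of the initial state (robot at A, nothing cleaned)."""
    cost = relaxed_costs(build_actions(modified), {"at(A)"}, combine)
    return combine([cost[f"cleaned({room})"] for room in ROOMS])


print("hmax, original robot:", heuristic(False, max))
print("hadd, modified robot:", heuristic(True, sum))
```

Changing EDGES to match the actual diagram is the only adjustment needed to apply the sketch to the exam's layout.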
Question 10
Calculate hff for the modified robot described in the previous question, as induced by hadd.
Question 14
Question 11 2 pts
Create a STRIPS problem with at most four actions such that the Bellman-Ford table for hmax(I) has not fully converged by the point where all propositions have non-infinite costs. You need to specify the following:
– The initial state of the problem
– The goal state of the problem
– The preconditions, effects and costs of each action
Question 12 3 pts
Assume that the domain only includes the following five propositions {a,b,c,d,e}. With reference to the diagram above, answer the following questions:
(i) Assuming the goal node is Node 9, specify a set of two propositions that could hold in Node 5 that would allow IW(1) to find the goal node.
(ii) Assuming the goal node is Node 9, specify one proposition that could hold in Node 5 that would allow IW(2) but not IW(1) to find the goal node.
(iii) Assuming the goal node is Node 9, specify a set of two propositions that could hold in Node 5 that would not allow either of IW(2) or IW(1) to find the goal node.
Note that a node’s numeric label indicates the order in which it is considered for generation and expansion. Each character denotes a proposition, e.g. node 2 has a state which contains only the proposition a.
(i): 1. a, b 2. a, e
(ii): a, b, e
(iii): 1. d, e 2. a, d, e
Question 13 2 pts
Consider a robot called MedAssist, which takes medical kits from their storage location to an operating theatre in a hospital. When it is not being used, it stays at its base station to charge.
It can be in one of five states:
1. Base: It is at its base station
2. No Kit: It is not at its base station and does not have a medical kit
3. Kit 1: It has collected medical kit 1
4. Kit 2: It has collected medical kit 2
5. Delivered: It has delivered a medical kit 1 or 2
There are three actions available:
1. get_kit1: MedAssist goes to collect medical kit 1. If kit 1 is there, the MDP transitions to state Kit 1. If kit 1 is missing, the MDP will stay in state No Kit. No immediate reward is received.
2. get_kit2: MedAssist goes to collect medical kit 2. There is a 1.0 chance kit 2 will be there (transition to state Kit 2). No immediate reward is received.
3. deliver: Deliver the kit that is being carried
The MDP transition probabilities and rewards are:
s        a          s’          P(s,a,s’)   r(s,a,s’)
Base     get_kit1   No Kit      0.8         0
Base     get_kit1   Kit 1       0.2         0
Base     get_kit2   Kit 2       1.0         0
No Kit   get_kit2   Kit 2       1.0         0
Kit 1    deliver    Delivered   1.0         10
Kit 2    deliver    Delivered   1.0         5
Using value iteration, we end up with the following value function for MedAssist after four iterations using a discount factor γ = 0.9.
State   Base   No Kit   Kit 1   Kit 2   Delivered
Value   4.9    4.4      10      7       0
Apply one more iteration to calculate the value V(Base) after five iterations. Enter your final answer to two decimal places in the box below
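One way to check a single value-iteration step is to code the Bellman backup directly. The Python sketch below does so for MedAssist, assuming the transition table above is read column by column (first s entry with first a entry, and so on) and using the four-iteration values as given. The names TRANSITIONS, V4, q_value and bellman_backup are illustrative.

```python
GAMMA = 0.9

# (state, action) -> list of (next_state, probability, reward),
# assuming the flattened table pairs its columns in order.
TRANSITIONS = {
    ("Base", "get_kit1"): [("No Kit", 0.8, 0), ("Kit 1", 0.2, 0)],
    ("Base", "get_kit2"): [("Kit 2", 1.0, 0)],
    ("No Kit", "get_kit2"): [("Kit 2", 1.0, 0)],
    ("Kit 1", "deliver"): [("Delivered", 1.0, 10)],
    ("Kit 2", "deliver"): [("Delivered", 1.0, 5)],
}

# Value function after four iterations, as stated in the question.
V4 = {"Base": 4.9, "No Kit": 4.4, "Kit 1": 10, "Kit 2": 7, "Delivered": 0}


def q_value(state, action, values):
    """Expected one-step return of taking action in state, then following values."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for next_state, prob, reward in TRANSITIONS[(state, action)]
    )


def bellman_backup(state, values):
    """One value-iteration update: maximise the Q-value over available actions."""
    qs = [q_value(s, a, values) for (s, a) in TRANSITIONS if s == state]
    return max(qs) if qs else 0.0


for action in ("get_kit1", "get_kit2"):
    print(action, round(q_value("Base", action, V4), 3))
v5_base = bellman_backup("Base", V4)
print(f"V5(Base) = {v5_base:.2f}")
```

Under this reading, get_kit1 gives 0.8 × 0.9 × 4.4 + 0.2 × 0.9 × 10 = 4.968 and get_kit2 gives 0.9 × 7 = 6.3, so the backup selects get_kit2.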
Match the techniques below with their properties. Multiple techniques can match to one property.
Value iteration
Approximate Q-learning
Question 15
Model-based
Model-free, model-based, or simulation-based
Model-free, model-based, or simulation-based
Consider a reinforcement learning agent that is trying to learn how fast a vacuum cleaning robot can travel without over-heating.
There are two states: cool and warm.
There are two actions: slow and fast.
If the robot goes fast, it is more likely to transition to a warm state than if it goes slow. Using a learning rate of 0.5 and a discount factor of 0.8, we arrive at the following Q-table:
Q(cool, fast) 11
Q(cool, slow) 7
Q(warm, fast) 2
Q(warm, slow) 5
The agent executes the action fast in the state cool, receives a reward of 6, and is now in the warm state. It will execute the action slow next.
Calculate the new value for Q(cool, fast) using 1-step Q-learning to 2 decimal places. 10.75
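The 1-step Q-learning update is Q(s, a) ← Q(s, a) + α[r + γ max over a' of Q(s', a') − Q(s, a)]. The Python sketch below applies it to the Q-table as printed above, with α = 0.5 and γ = 0.8; the function and variable names are illustrative. Note that Q-learning bootstraps from the best next action, so the action the agent actually executes next does not enter the update.

```python
ALPHA = 0.5  # learning rate, from the question
GAMMA = 0.8  # discount factor, from the question
ACTIONS = ("fast", "slow")

# Q-table as printed in the question.
q = {
    ("cool", "fast"): 11.0,
    ("cool", "slow"): 7.0,
    ("warm", "fast"): 2.0,
    ("warm", "slow"): 5.0,
}


def q_learning_update(q, state, action, reward, next_state):
    """Return the updated Q(state, action) after one observed transition."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    return q[(state, action)] + ALPHA * (td_target - q[(state, action)])


new_q = q_learning_update(q, "cool", "fast", 6, "warm")
print(f"Q(cool, fast) = {new_q:.2f}")
```

With these inputs the update is 11 + 0.5 × (6 + 0.8 × 5 − 11) = 10.50. This differs from the 10.75 recorded after the question, so that figure should be checked against the original Q-table layout.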
Question 16
If two spiders find a dead insect at the same time, each spider will make menacing gestures to scare off the other. If one spider backs down, that spider gets nothing and the other spider gets the insect to itself. If both spiders back down, they can share the insect. If neither backs down, the spiders will fight. The payoffs resulting from the fight depend on the sizes of the spiders (represented as x and y) and are described using the following normal form game matrix:
Spider 1 Back down
Suppose the spiders are the same size so that x=y. What is the smallest value of x that gives both spiders a strictly dominant strategy?
Question 17 2 pts
In your own words, compare the concepts of a normal form game and an extensive form game.
A normal form game is expressed in the form of a matrix and an extensive form game is expressed in the form of a decision tree.
In normal form games, players choose their actions simultaneously, but in extensive form games, players choose their actions in a certain order.
It is easy to identify dominated strategies and Nash equilibria in normal form games. In extensive form games, players can observe the other players’ actions.
Long answer questions [15 marks]
Question 18 3 pts
Argument: If autonomous vehicles can be shown to drive more safely than people on average, people should be banned from driving and all vehicles on roads should be autonomously controlled.
Give one short argument (2-3 sentences) that argues against this point from a utilitarian or deontological perspective. Specify which ethical perspective you take.
Note: you do not need to agree with the argument that you give.
Utilitarian ethics: only outcomes matter. The outcome in this argument is that all vehicles on roads should be autonomously controlled. However, it only considers people on average; there are still many people who drive more safely than autonomous vehicles, so the outcome is not optimal.
Deontological ethics: we have a duty not to harm others. In this argument, people are banned from driving and all vehicles on roads are autonomously controlled. This hurts many people who love to drive cars.
Question 19 5 pts
A treasure-hunting robot (R) can move horizontally and vertically to adjacent cells within a grid. One cell in the grid contains a shovel (S) that the robot may pick up and use to dig to find treasure in a particular cell. Note that the robot cannot move diagonally between cells and cannot move into a cell in which it has already dug for treasure but can move into the same location as the shovel (S). If the robot is in the same cell as the shovel it can pick it up. If the robot is holding the shovel it can dig to expose any treasure contained in that cell, provided it has not already dug in that cell. If the robot is in the same cell as the treasure and the treasure is exposed it can pick up the treasure. The goal is to collect the treasure and return it to the initial cell.
The robot has four actions: (1) move; (2) pickup-shovel; (3) pickup-treasure; and (4) dig. Describe briefly in STRIPS how to model the domain described. Include a specification of the parameters the four actions will take, and the preconditions and postconditions of each action. Include a description of the goal state of the problem, and an initial state that makes sure the goal is reachable.
You are allowed to use variables as arguments for the actions (action schemes), specifying the values of the variables. Note: it is not compulsory to use PDDL syntax, as long as you can convey the main ideas.
Question 20 0 pts
MedAssist (from the earlier question) is updated, and it no longer knows the probability transition matrix, so it does not know the probability of Kit 1 being present.
Assume the following Q-function for MedAssist, implemented as a Q-table. Note that some Q-values are omitted as they are not important for this question:
Q(Base, get_kit1) 4
Q(Base, get_kit2) 2
Q(No Kit, get_kit2) 3
Q(Kit 1, deliver) 8
Q(Kit 2, deliver) 5
Consider the following partial episode from the state Base: get_kit1 → No Kit → get_kit2 → Kit 2 → deliver → Delivered
The episode terminates once the robot reaches the state Delivered.
Assuming that α = 0.4 and γ = 0.9, perform a 3-step update for Q(Base, get_kit1) using 3-step SARSA. Show your working.
Enter your final answer to two decimal places in the box below, and your working in the next question box.
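An n-step SARSA update replaces the 1-step target with the discounted sum of the next n rewards plus a bootstrapped Q-value at step n. The Python sketch below works through the 3-step case for the episode above. Two assumptions are made and are not stated in this question: the rewards are taken from the earlier MedAssist table (0 for each get_kit action, 5 for delivering kit 2), and the Q-value bootstrapped from the terminal state Delivered is 0.

```python
ALPHA = 0.4  # learning rate, from the question
GAMMA = 0.9  # discount factor, from the question

# Q-table entries given in the question.
q = {
    ("Base", "get_kit1"): 4.0,
    ("Base", "get_kit2"): 2.0,
    ("No Kit", "get_kit2"): 3.0,
    ("Kit 1", "deliver"): 8.0,
    ("Kit 2", "deliver"): 5.0,
}

# (state, action, reward) triples for the episode.
# ASSUMPTION: rewards are reused from the earlier MedAssist transition table.
episode = [
    ("Base", "get_kit1", 0.0),
    ("No Kit", "get_kit2", 0.0),
    ("Kit 2", "deliver", 5.0),
]


def n_step_sarsa_update(q, episode, n, bootstrap_q):
    """Return the updated Q-value of the first (state, action) in the episode.

    bootstrap_q is Q(s_n, a_n) for the state reached after n steps;
    it is 0 when that state is terminal (assumed here for Delivered).
    """
    state, action, _ = episode[0]
    n_step_return = sum((GAMMA ** k) * episode[k][2] for k in range(n))
    n_step_return += (GAMMA ** n) * bootstrap_q
    return q[(state, action)] + ALPHA * (n_step_return - q[(state, action)])


new_q = n_step_sarsa_update(q, episode, n=3, bootstrap_q=0.0)
print(f"Q(Base, get_kit1) = {new_q:.2f}")
```

Under these assumptions the 3-step return is 0 + 0.9 × 0 + 0.81 × 5 = 4.05, and the update is 4 + 0.4 × (4.05 − 4) = 4.02.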
Question 21
Question 22 5 pts
Challenge question.
In a small town, a pie shop makes two types of pie: gourmet and basic. Gourmet pies sell for $6 while basic pies sell for $3.
Sometimes tour buses stop at the shop for lunch. They always carry about 100 people on a gourmet food tour, or 100 backpackers on a budget tour. People on gourmet food tours prefer gourmet pies, while people on budget tours prefer the cheaper, basic pies. Each time a bus comes, the shop owner needs to decide whether to make 100 gourmet pies or 100 basic pies.
The shop owner knows the following:
The shop needs to make 100 gourmet pies or 100 basic pies, it cannot make both types on the same day. The passengers will decide whether to buy pies based on the type of pie that is made.
On a gourmet food tour:
All passengers on a gourmet food tour bus would have a utility of 1 for buying a gourmet pie, and a utility of -1 if they did not.
All passengers on a gourmet food tour bus would have a utility of -1 for buying a basic pie, and a utility of 1 if they did not.
On a budget tour:
All passengers on a budget tour would have a utility of 1 for buying a basic pie, and a utility of -1 if they did not.
All passengers on a budget tour would have a utility of -1 for buying a gourmet pie, and a utility of 1 if they did not.
The shop owner learns that a tour bus carrying 100 passengers is stopping at the shop for lunch the next day, but does not know which type of bus it is. Based on prior experience, the shop owner believes there is a 65% probability of the bus being a budget tour bus, and 35% probability of the bus being a gourmet food bus.
Using your knowledge from across this subject, calculate whether the shop should make 100 gourmet pies or 100 basic pies before the bus arrives, assuming that the shop owner wants to maximise their expected reward/utility for the day. Show your working.
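One possible way to set up the calculation is as an expected-reward comparison in which the passengers best-respond to the shop's choice. The Python sketch below is only one reading of the question and rests on assumptions the question does not state: the shop's reward is sales revenue, production costs and unsold pies are ignored, and every passenger buys exactly when buying gives them higher utility than not buying.

```python
PRICE = {"gourmet": 6, "basic": 3}                 # sale price per pie, from the question
PRIOR = {"budget": 0.65, "gourmet_tour": 0.35}     # shop owner's belief about the bus type
PASSENGERS = 100

# Passenger utility as (utility if they buy, utility if they do not buy).
UTILITY = {
    ("gourmet_tour", "gourmet"): (1, -1),
    ("gourmet_tour", "basic"): (-1, 1),
    ("budget", "basic"): (1, -1),
    ("budget", "gourmet"): (-1, 1),
}


def passengers_buy(tour, pie):
    """ASSUMPTION: passengers buy exactly when buying has the higher utility."""
    buy, skip = UTILITY[(tour, pie)]
    return buy > skip


def expected_revenue(pie):
    """ASSUMPTION: the shop's reward is revenue only, with no production cost."""
    return sum(
        prob * PASSENGERS * PRICE[pie]
        for tour, prob in PRIOR.items()
        if passengers_buy(tour, pie)
    )


revenues = {pie: expected_revenue(pie) for pie in PRICE}
best = max(revenues, key=revenues.get)
print(revenues, "->", best)
```

Under these assumptions, gourmet pies yield 0.35 × 100 × $6 = $210 in expectation and basic pies yield 0.65 × 100 × $3 = $195, so the sketch favours gourmet pies. A different reward model (for example, one including production costs) could change the conclusion.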
Enter your working for the previous question here.