1. Uninformed, Informed, and Game Tree Search
(a) Consider the following grid representing the states and transitions in a search problem. States are labelled with letters. An agent can move between states provided the two states are adjacent and not blocked by a wall (the black squares). Diagonal movement is not permitted. Each move into an adjacent white square costs 1 resource. A move into a grey square (H, P, R) costs 3 resources. E.g., a move from D to H costs 3 resources but a move from H to J costs 1 resource. The grid also includes a tunnel that allows the agent to travel from one side of the grid to the other. An agent in state E can use the tunnel to travel to state P at a cost of 3 resources.
If S is the start state and G is the goal state, determine the order in which states are expanded, as well as the goal path returned, for each of the following search methods. For best-first (greedy) search, include the h-value of each expanded state. For A* search, include the f-value of each expanded state. To estimate the distance from any state to the goal, the Manhattan distance heuristic should be used. Assume that when states are added to the fringe, ties are resolved so that states appearing earlier in alphabetical order are expanded first, and that no state is expanded more than once.
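As a point of reference only (this is not the exam grid; the coordinates, adjacency function, and costs below are hypothetical placeholders), a minimal Python sketch of how A* expansion with the Manhattan heuristic and alphabetical tie-breaking can be organised is:

import heapq

def manhattan(p, q):
    """Manhattan distance between two (row, col) grid coordinates."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def a_star(start, goal, neighbours, coords):
    """A* over a labelled grid.

    neighbours(s) yields (next_state, step_cost) pairs and coords maps
    state labels to (row, col) positions. Ties on f are broken
    alphabetically because the state label is the second element of the
    priority tuple, and no state is expanded more than once.
    """
    fringe = [(manhattan(coords[start], coords[goal]), start, 0, [start])]
    expanded, closed = [], set()
    while fringe:
        f, state, g, path = heapq.heappop(fringe)
        if state in closed:
            continue
        closed.add(state)
        expanded.append((state, f))
        if state == goal:
            return expanded, path
        for nxt, cost in neighbours(state):
            if nxt not in closed:
                g2 = g + cost
                f2 = g2 + manhattan(coords[nxt], coords[goal])
                heapq.heappush(fringe, (f2, nxt, g2, path + [nxt]))
    return expanded, None

if __name__ == "__main__":
    # Hypothetical 3-state toy grid, not the grid from the question.
    coords = {"S": (0, 0), "A": (0, 1), "G": (0, 2)}
    adj = {"S": [("A", 1)], "A": [("G", 1)], "G": []}
    print(a_star("S", "G", lambda s: adj[s], coords))
    # ([('S', 2), ('A', 2), ('G', 2)], ['S', 'A', 'G'])

For greedy best-first search the priority would be the heuristic value h alone rather than g + h.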
i) Depth-first search (2)
States expanded:
Goal path:

ii) Breadth-first search (2)
States expanded:
Goal path:
iii) Best-first (greedy) search (3)
States expanded (with h values):
Goal path:

iv) A* search (3)
States expanded (with f values):
Goal path:

v) Is the Manhattan heuristic admissible for the given grid? Explain why or why not with specific references to the grid, costs, and heuristic. (2)

vi) Say the grid was modified so that R was no longer a grey square but was an ordinary white square. Is the Manhattan heuristic admissible for the resulting grid? Explain why or why not with specific references to the grid, costs, and heuristic. (2)
(b) Consider the following game tree. Upward-facing triangles (A, B, D, F, I) represent the game moves for the maximising player. Downward-facing triangles (C, G, H) represent the game moves for the minimising player. Squares represent terminal states.
i) Assuming both players play optimally, what is the minimax value for each of the states A, B, C, D, E, F, G, H, I in the game tree? According to minimax, which move should the maximising player make? (3)

ii) What states are pruned by alpha-beta pruning when applied to the above tree? (3)

iii) Assume that each of the minimising states (C, G, H) is now a chance state, and that each outcome from a chance state is equally likely. What is the expectimax value for each of the states A, B, C, D, E, F, G, H, I in the tree? According to expectimax, which move should the maximising player make? (3)

iv) Consider a game where a minimising player can make one of two possible moves, M1 or M2. After each move, a 6-sided die is rolled. If move M1 was made, the resulting value of the game will be the value shown on the die roll. If move M2 was made, the resulting value will be twice the value shown on the die roll. Describe the game tree that represents this game. (2)
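As an illustration of the expectimax computation referred to in (b)(iii), using a made-up tree rather than the one in the figure, a short Python sketch is:

def expectimax(node):
    """Expectimax value of a game-tree node.

    A node is either a number (terminal utility), a pair
    ('max', children) for the maximising player, or a pair
    ('chance', children) for a chance node whose outcomes
    are equally likely.
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    values = [expectimax(child) for child in children]
    if kind == 'max':
        return max(values)
    return sum(values) / len(values)  # uniform chance node

# Hypothetical tree (not the figure): a max root over two chance nodes.
tree = ('max', [('chance', [3, 7]), ('chance', [4, 4])])
print(expectimax(tree))  # 5.0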
2. Knowledge Representation and Automated Planning
Consider the following description of a planning domain. A house contains four rooms: a kitchen, a bedroom, an office, and a hall. A mobile robot can pass between adjacent rooms provided it is holding a power cell. Moving to an adjacent room uses up the power cell and the robot cannot move again until it has another power cell. A power cell is contained in each room. A special locked and alarmed door (D) prevents the robot from going outside. Picking up a security pass turns off the alarm. Pressing a special button unlocks the door, provided the alarm is off. The robot can hold multiple objects at one time. Assume that the domain contains five locations (kitchen, bedroom, office, hall, outside), four power cells (c1, c2, c3, c4), a security pass (p), and a button (b). A floorplan showing the room locations, locked door, and object locations is given below:
The robot starts in the kitchen and has three actions available to it:
• move-robot(?x ?y ?z): move from location ?x to location ?y using power cell ?z
• pickup-object(?x ?y): pick up object ?x at location ?y
• press-button(?b ?x ?y ?z): press button ?b in location ?x to unlock the door between ?y and ?z
Answer the following questions using PDDL notation where necessary.
(a) Write PDDL actions for move-robot, pickup-object, and press-button. Use descriptive names for the domain properties but do not change the action parameters or use typing. For each action, also include a short English description of the action’s preconditions and effects. (11)
(b) Say the robot in the above scenario would like to go outside while holding the security pass. Define the initial state and goal for a planning problem that models this problem using PDDL notation. (5)
(c) State a plan that achieves the goal of going outside with the security pass from the initial state described in the problem description. Use PDDL notation for each action in the plan and its instantiated parameters. Also provide a short explanation of the plan in plain English. (3)
(d) Say that the robot house scenario describes a real-world scenario. How would the agent and environment types of a real-world scenario compare to the assumptions of modelling the problem as a classical planning problem? Make reference to specific agent and environment types in your answer. (4)
(e) Say that oneMove(R,X,Y) is the head of a Prolog rule that encodes that robot R can travel from X to Y by making one move, and hasPower(R,P) is the head of a Prolog rule that encodes that robot R is carrying a power cell P that lasts for one move. Write a Prolog rule called twoMove(R,X,Y) that encodes the idea that robot R can make two consecutive moves between two different locations X and Y, with sufficient power. (2)
3. Markov Decision Processes (MDPs) and Reinforcement Learning (RL)
(a) Consider the following Markov Decision Process (MDP) of a robot running with an ice-cream:
• The actions are either to run or walk.
• The three states are: having one scoop of ice-cream (1S), having two scoops (2S), or having none (0S).
• Walking will always give the robot a reward of +1. Running with one scoop will give a reward of +2, and it might, with a probability of 0.5, be rewarded with another scoop of ice cream.
• However, running with 2 scoops is very risky: with a probability of 0.8 the robot will drop both scoops and receive a reward of -10; otherwise it will not drop any scoop and will receive a reward of +2.
• Assume no discount of future actions (γ = 1.0).
Using Policy Evaluation, evaluate the policy π for 2 iterations, where π(1S) = Run and π(2S) = Run. Present the results in tabular format as shown below. The first row (Vπ0) has been initialised to 0. (6)
        1S    2S
Vπ0      0     0
Vπ1
Vπ2
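For reference, one standard way of writing the iterative policy-evaluation backup (here with γ = 1.0) is:

Vπ_{k+1}(s) = Σ_{s'} T(s, π(s), s') [ R(s, π(s), s') + γ Vπ_k(s') ]

where the sum runs over the possible successor states s' of taking the policy's action π(s) in state s.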
(b) You are given the Gridworld shown below. Assume that the Markov Decision Process is unknown. However, you can observe the following four episodes generated. Each training episode specifies a sequence of transitions with each transition comprised of the current state s, the chosen action a, the new state s′ and the received immediate reward, in that order. Assume that there is no discount (γ = 1.0).
Use model-based Reinforcement Learning to estimate the transition function for the following state-action pairs:
T(A, south, C) =
T(B, east, C) =
T(C, south, E) =
T(C, south, D) =
(4)
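As a reminder, the standard model-based estimate of the transition function is the normalised count over the observed episodes:

T̂(s, a, s') = count(s, a, s') / count(s, a)

i.e. the number of observed transitions from s to s' under action a, divided by the number of times action a was taken in state s.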
(c) Using Direct Evaluation, evaluate the policy π specified by the arrows in the same grid figure in part (b) above. Provide values for the following quantities:
V̂π(A) =
V̂π(B) =
V̂π(C) =
V̂π(D) =
V̂π(E) =
(5)
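For reference, Direct Evaluation estimates the value of a state as the empirical average of the returns observed after visiting it:

V̂π(s) = (1 / N(s)) Σ_i G_i(s)

where N(s) is the number of times s is visited across the episodes and G_i(s) is the total (undiscounted, since γ = 1.0) reward accumulated from the i-th visit to s until the end of that episode.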
(d) Briefly explain in plain English what the Direct Evaluation method tries to estimate, and how. Also, in what sense is Direct Evaluation a model-free method? Use the MDP and Episodes in (b) above to illustrate your answers. (2)
(e) Using Q-Learning (Model-free), and assuming that all Q-Values for every state-action pair are initially zero, compute the Q-Values obtained from the episodes in (b) and fill in the table below.
        A     B     C     D     E
Up
Down
Right
Left
Exit

(8)
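For reference, the standard Q-Learning update applied after each observed transition (s, a, s', r), with learning rate α, is:

Q(s, a) ← (1 − α) Q(s, a) + α [ r + γ max_{a'} Q(s', a') ]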
4. Language Models & Syntactic Parsing of Natural Language
(a) You are given the following corpus, C, of sentences:
⟨s⟩ It is not red ⟨/s⟩
⟨s⟩ It is green ⟨/s⟩
⟨s⟩ The colour is not red ⟨/s⟩
⟨s⟩ The shape is square ? ⟨/s⟩
⟨s⟩ It is what ? ⟨/s⟩
⟨s⟩ What is the shape ? ⟨/s⟩
⟨s⟩ What is the colour ? ⟨/s⟩
⟨s⟩ What colour is it not ? ⟨/s⟩
⟨s⟩ Red is not the colour ⟨/s⟩
⟨s⟩ Square is the shape ⟨/s⟩
⟨s⟩ The colour is green ⟨/s⟩
⟨s⟩ Is it not red ? ⟨/s⟩
⟨s⟩ What is it ? ⟨/s⟩
⟨s⟩ What shape is it ? ⟨/s⟩
Assuming a Bigram Language Model trained on C using Maximum Likelihood Estimation (MLE), and without any smoothing, answer the following questions. Note that the question mark ‘?’ is also a token of the language. Don’t forget to show how you arrived at your answers.
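For reference, the MLE bigram estimate and the resulting sentence probability (under the common convention that the boundary markers ⟨s⟩ and ⟨/s⟩ are included as tokens) are:

P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1})
P(⟨s⟩ w_1 … w_n ⟨/s⟩) = Π_i P(w_i | w_{i-1})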
• What is the probability of generating the following sentences:
  i. The colour is square ?
  ii. What is not red
iii. Given the sequence of words, “It is …”, what is the most likely next word, and what is its probability?
(7)
(b) You are given the following Context-Free Grammar, G:
Grammar:
S → NP VP | VP
S → Aux NP VP
NP → Det Nom | NPN
Nom → Nom N | Adj Nom | Nom PP | N
VP → V NP | V
VP → V PP | VP PP | V Adv
PP → Prep NP

Lexicon:
Det → a | the
N → run | mile
V → run
NPN → London
Aux → does
Adv → fast
Adj → fast
Prep → in
Convert it to Chomsky Normal Form (CNF). Note that you do not have to deal with the rules under the Lexicon column. (7)
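As an illustration of the kind of rewriting CNF requires (using made-up rules, not rules of G): a rule whose right-hand side has three symbols, such as X → A B C, is binarised by introducing a fresh non-terminal,

X → A X1
X1 → B C

and a unit production such as X → Y is eliminated by copying Y's rules up into X (e.g. X → A B whenever Y → A B).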
(c) Using the original grammar, G, in part (c), parse the sentence ‘Run a mile in London’, clearly showing how you applied the rules of the grammar by showing the sequence of resulting rewrites. The order in which you apply the rewrite rules does not matter for this question as long as you have applied them correctly.
(4)
(d) Now assume that we add the following rule to the original grammar G:
VP → V NP PP
Explain how this extra rule gives rise to structural ambiguity in parsing the same sentence, “Run a mile in London”. Using the new expanded grammar, produce two parse trees for the same sentence, one corresponding to your derivation in (c), and another (you do not need to give the derivation for this). (7)
END OF PAPER