CIS 471/571: Introduction to Artificial Intelligence, Fall 2020
FINAL EXAM
• You have approximately 120 minutes.
• The exam is open book.
First Name
Last Name
UID
Q1. Graph Search (15 points)
Q1.1. Search Algorithms. (7.5 points) Consider the following graph. For the following sub-questions, ties are broken in alphabetical order.
[Figure: state-space graph from Start to Goal over states A, B, C, D, E, F, with edge costs on the arcs and a heuristic value h(s) labeled at each state (h(Goal) = 0).]
1. (2.5 pts) What path would UCS return?
2. (2.5 pts) What path would Greedy Search return?
3. (2.5 pts) What path would A∗ return?
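For reference only (not part of the exam): the three algorithms above differ only in the priority used when popping nodes from the frontier. Below is a minimal sketch, assuming the graph is encoded as a nested dictionary of edge costs and the heuristic as a dictionary h; the function and argument names are illustrative, not taken from the exam.

    import heapq

    def graph_search(graph, h, start, goal, mode="astar"):
        """Generic best-first graph search.  The priority is g (UCS), h (Greedy),
        or g + h (A*).  Ties fall back to comparing the path tuples in the heap,
        which breaks ties roughly alphabetically."""
        frontier = [(0, (start,), 0)]          # (priority, path, cost so far)
        closed = set()
        while frontier:
            _, path, g = heapq.heappop(frontier)
            node = path[-1]
            if node == goal:
                return list(path), g
            if node in closed:
                continue
            closed.add(node)
            for nbr, cost in sorted(graph[node].items()):   # expand alphabetically
                g2 = g + cost
                prio = {"ucs": g2, "greedy": h[nbr], "astar": g2 + h[nbr]}[mode]
                heapq.heappush(frontier, (prio, path + (nbr,), g2))
        return None, float("inf")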
Q1.2. Missing Heuristic Values. (7.5 points) Consider the state space graph shown below in which some of the states are missing a heuristic value. Determine the possible range for each missing heuristic value so that the heuristic is admissible and consistent. If this isn’t possible, write so.
[Figure: the same state-space graph as in Q1.1, but with the heuristic values at some states replaced by h = ?.]
State | Range for h(s)
------+-----------------------
  A   | ____ ≤ h(A) ≤ ____
  D   | ____ ≤ h(D) ≤ ____
  E   | ____ ≤ h(E) ≤ ____
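For reference only: admissibility requires h(s) ≤ h*(s), the true cheapest cost from s to the goal, and consistency requires h(s) ≤ c(s, s′) + h(s′) for every edge (s, s′). A minimal sketch for checking a candidate assignment, under the same assumed dictionary encoding as above:

    import heapq

    def true_costs(graph, goal):
        """Cheapest cost from every state to the goal (Dijkstra on reversed edges)."""
        rev = {}
        for u, nbrs in graph.items():
            for v, c in nbrs.items():
                rev.setdefault(v, {})[u] = c
        dist = {goal: 0}
        pq = [(0, goal)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v, c in rev.get(u, {}).items():
                if d + c < dist.get(v, float("inf")):
                    dist[v] = d + c
                    heapq.heappush(pq, (d + c, v))
        return dist

    def admissible_and_consistent(graph, h, goal):
        """Return (admissible?, consistent?) for the heuristic dictionary h."""
        hstar = true_costs(graph, goal)
        admissible = all(h[s] <= hstar.get(s, float("inf")) for s in h)
        consistent = all(h[u] <= c + h[v]
                         for u, nbrs in graph.items() for v, c in nbrs.items())
        return admissible, consistent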
Q2. Adversarial Search (20 points)
Q2.1. Search Algorithms. (15 points) Consider the zero-sum game tree shown below. Triangles that point up, such as at the top node (root), represent choices for the maximizing player; triangles that point down represent choices for the minimizing player. Outcome values for the maximizing player are listed for each leaf node, represented by the values in circles at the bottom of the tree.
[Figure: game tree. Root A is a Player 1 (max) node; its children B, C, D are Player 2 (min) nodes; their children E, F, G, H, K, L are Player 1 nodes; the leaf values below them, left to right, are -5, -9, -4, 10, -6, -7, -5, 8, -5, -1, -2, 10.]
1. (5 points) Assuming both players act optimally, carry out the minimax search algorithm. Enter the values for the letter nodes.
2. (10 points) Assuming both players act optimally, use alpha-beta pruning to find the value of the root node. The search goes from left to right; when choosing which child to visit first, choose the left-most unvisited child.
• Enter the values of the letter nodes.
• Select the leaf nodes that don’t get visited due to pruning.
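For reference only, a minimal sketch of minimax with alpha-beta pruning over a tree encoded as nested lists of leaf values, visiting children left to right; the encoding and names are illustrative assumptions, not the exam's figure.

    def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf"), visited=None):
        """Return the minimax value of a nested-list tree; leaves are numbers.
        Visited leaves are appended to `visited`, so pruned leaves are the rest."""
        if not isinstance(node, list):          # leaf
            if visited is not None:
                visited.append(node)
            return node
        best = float("-inf") if maximizing else float("inf")
        for child in node:                      # left-to-right order
            val = alphabeta(child, not maximizing, alpha, beta, visited)
            if maximizing:
                best = max(best, val)
                alpha = max(alpha, best)
            else:
                best = min(best, val)
                beta = min(beta, best)
            if alpha >= beta:                   # remaining children are pruned
                break
        return best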
Q2.2. Unknown Leaf Nodes. (5 points) Consider the following game tree, which has the same setting as Q2.1 except that one leaf has an unknown payoff, y.
[Figure: the same game tree as in Q2.1, with the leaf value -1 replaced by the unknown payoff y (leaf values left to right: -5, -9, -4, 10, -6, -7, -5, 8, -5, y, -2, 10).]
Similar to Q2.1.2, alpha-beta pruning prunes some branches under nodes B and C. For what values of y will the pruning shown in the figure still be achieved? The search goes from left to right; when choosing which child to visit first, choose the left-most unvisited child. Please specify your answer in one of the following forms:
• Write All if y can take on all values
• Write None if y has no possible values
• Use an inequality in the form y < {value}, y > {value}, or {value1} < y < {value2} to specify an interval of values. As an example, if you think y can take on all values larger than 16, you should enter y > 16.
Q3. Markov Decision Process (15 points)
Pacman is an agent in a deterministic MDP with states A, B, C, D, E, F, G. He can deterministically choose to follow any edge pointing out of the state he is currently in. He cannot stay in place. D, E, F, and G are terminal states. Let the discount factor be γ = 0.8. Pacman receives the reward value labeled underneath a state upon entering that state. For example, R(A, A → C, C) = R(C) = +10.
[Figure: MDP transition graph over states A–G; the rewards labeled underneath the states are A: +10, B: -10, C: +10, D: +100, E: -1000, F: -50, G: +50.]
1. Write down the optimal value V*(s) and the optimal policy π*(s) for s = A and s = C.
2. Pacman now becomes indecisive when he enters state C. In state C, he finds the two best actions (among the four actions he can take from C) and randomly, with equal probability, chooses between them. Let V̄(s) be the values under the policy in which Pacman acts according to π*(s) for all s ≠ C and follows the indecisive policy at state C. What are the values V̄(s) for s = A and s = C?
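For reference only: under the policy in part 2, transitions are deterministic everywhere except at C, where each of the two best actions is taken with probability 1/2, so V̄ satisfies the usual policy-evaluation equations. A minimal sketch of iterative policy evaluation, under an assumed dictionary encoding; the state names and numbers in the commented example are purely illustrative, not the exam's figure.

    def policy_evaluation(policy, rewards, terminals, gamma=0.8, iters=500):
        """policy[s] is a list of (probability, next_state) pairs; rewards[s2] is the
        reward received upon entering s2.  Terminal states keep value 0."""
        V = {s: 0.0 for s in list(policy) + list(terminals)}
        for _ in range(iters):
            for s, choices in policy.items():
                V[s] = sum(p * (rewards[s2] + gamma * V[s2]) for p, s2 in choices)
        return V

    # Example shape of the inputs (hypothetical names and numbers):
    # policy    = {"s0": [(1.0, "s1")], "s1": [(0.5, "t1"), (0.5, "t2")]}
    # rewards   = {"s1": 1, "t1": 2, "t2": 3}
    # terminals = {"t1", "t2"}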
Q4. Reinforcement Learning (15 points)
Q4.1. True/False. (5 points) Which of the following statements are true?
1. In Direct Evaluation, you recalculate state values after each transition you experience.
2. The exploration function f = R(s, a, s′) + ln(1 + 1/N(s, a)) (where N(s, a) refers to the number of times that you have visited state s and taken action a in your samples) would encourage the agent to visit unseen states and actions.
3. In approximate Q-learning, different states always have different feature-representations.
Q4.2. Q-learning (10 points) Consider a non-deterministic MDP, shown in the following figure, with five states {A, B, C, D, E}. The agent can choose to follow any edge pointing out of the state he is currently in. The transition and reward functions are unknown. When the agent takes an action from a state, he may end up staying still or moving to one of the neighboring states that can be reached from that state. For example, if the agent is at state B and takes action B → C, the resulting state can be any state in {A, B, C}. The discount factor is γ = 0.5 and the learning rate is α = 0.5.
[Figure: MDP graph over the five states A, B, C, D, E.]
Suppose the agent chooses actions according to some policy π and generates the following sequence of actions and rewards in the unknown game:
 t | s_t | a_t   | s_{t+1} | r_t
---+-----+-------+---------+-----
 0 |  B  | B → C |    B    |  4
 1 |  B  | B → C |    C    | -3
 2 |  C  | C → E |    E    |  3
 3 |  E  | E → A |    A    |  1

Assume that all Q-values are initialized as 0. What are the Q-values learned by running Q-learning with the above experience sequence?
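For reference only: each row of the table triggers one tabular Q-learning update, Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max over a′ of Q(s′, a′)]. A minimal sketch of that update follows; the actions_from helper and the commented replay loop are placeholder assumptions, not part of the exam.

    from collections import defaultdict

    def q_update(Q, s, a, s_next, r, actions_from, alpha=0.5, gamma=0.5):
        """One sample-based update:
        Q(s,a) <- (1 - alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a')).
        actions_from(s') lists the actions available in s' (empty for terminals)."""
        future = max((Q[(s_next, a2)] for a2 in actions_from(s_next)), default=0.0)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * future)

    Q = defaultdict(float)          # all Q-values start at 0
    # Replay an experience list of (s, a, s_next, r) tuples in order, e.g.:
    # for s, a, s_next, r in experience:
    #     q_update(Q, s, a, s_next, r, actions_from)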
Q5. Bayes Nets (35 points)
Q5.1. Independence. (10 points) Consider the following Bayesian network with 6 variables {X1,X2,…,X6}.
[Figure: Bayes net over the six variables X1, X2, X3, X4, X5, X6.]
Which of the following statements are true?
1. X2 ⊥ X6 | X1, X5
2. X4 ⊥ X5 | X3
Q5.2. Inference (15 points) Assume the following Bayes Net and corresponding CPTs.
Compute the following conditional probabilities:
• P(C = 1 | A = 1)
• P(E = 1 | A = 1, D = 1)
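For reference only: with the CPT tables in hand, queries like these can be answered by inference by enumeration, summing the full joint over the hidden variables and renormalizing. A minimal, generic sketch; the joint_prob callable (the product of the Bayes net's CPT entries for a complete assignment) and the binary domains are assumptions, not the exam's tables.

    import itertools

    def query(query_var, evidence, variables, joint_prob):
        """P(query_var | evidence) by enumerating all assignments of the hidden
        variables.  joint_prob(assignment) must return the full-joint probability
        of a complete {variable: value} assignment."""
        hidden = [v for v in variables if v != query_var and v not in evidence]
        dist = {}
        for val in (0, 1):
            total = 0.0
            for combo in itertools.product((0, 1), repeat=len(hidden)):
                assignment = dict(zip(hidden, combo))
                assignment.update(evidence)
                assignment[query_var] = val
                total += joint_prob(assignment)
            dist[val] = total
        norm = sum(dist.values())
        return {v: p / norm for v, p in dist.items()}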
Q5.3. Gibbs Sampling. (10 points) We will work on the same Bayes net and its corresponding CPTs as in Q5.2. We observe the value of the variable C = 0. In this question, we will perform Gibbs sampling, using the random samples given in the table below.
When generating random samples, use as many values as needed from the table below, which we generated independently and uniformly at random from [0, 1). Use numbers from left to right. To sample a binary variable W with probability P(W = 0) = p and P(W = 1) = 1 − p using a value a from the table, choose W = 0 if a < p and W = 1 otherwise.
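For reference only: the sampling convention above maps each uniform value a to a binary outcome by comparing it with P(W = 0). A minimal sketch of one Gibbs step under that convention; the cond_prob_zero helper and next_uniform source are assumptions standing in for the CPT computation and the table of random numbers.

    def sample_binary(p_zero, a):
        """Return 0 if the uniform draw a falls below P(W = 0) = p_zero, else 1."""
        return 0 if a < p_zero else 1

    def gibbs_step(state, var, cond_prob_zero, next_uniform):
        """Resample one non-evidence variable from its conditional given the others."""
        p0 = cond_prob_zero(var, state)   # P(var = 0 | all other variables fixed as in state)
        state[var] = sample_binary(p0, next_uniform())
        return state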