
UC Berkeley – Computer Science
CS188: Introduction to Artificial Intelligence Josh Hug and Adam Janin
Final, Fall 2016
This test has 10 questions worth a total of 100 points, to be completed in 170 minutes. The exam is closed book, except that you are allowed to use three two-sided handwritten cheat sheets. No calculators or other electronic devices are permitted. Give your answers and show your work in the space provided.
Write the statement out below in the blank provided and sign. You may do this before the exam begins.
Any plagiarism, no matter how minor, will result in an F.
“I have neither given nor received any assistance in the taking of this exam.” _________________________________________________________________________________________ _________________________________________________________________________________________
Signature: _______________________________
Name: _____________________________ Your EdX Login: ___________________________
SID: _____________________________ Name of person to left: _______________________
Exam Room: _______________________ Name of person to right: _____________________
Primary TA: _______________________
Tips:
● ◯ indicates that only one circle should be filled in.
● ▢ indicates that more than one box may be filled in.
● Be sure to fill in the ◯ ​and ​▢ boxes completely and erase fully if you change your answer.
● There may be partial credit for incomplete answers. Write as much of the solution as you can, but bear in mind that we may deduct points if your answers are much more complicated than necessary.
● There are a lot of problems on this exam. Work through the ones with which you are comfortable first. Do not get overly captivated by interesting problems or complex corner cases you're not sure about.
● Not all information provided in a problem may be useful.
● There are some problems on this exam with a slow brute force approach and a faster, clever approach. Think before you start calculating with lots of numbers!
● Write the last four digits of your SID on each page in case pages get shuffled during scanning.
Optional. Mark along the line to show your feelings on the spectrum between :( and ☺.
Before exam: [:( ____________________ ☺]. After exam: [:( ____________________ ☺].
Problem   1    2    3     4     5    6    7    8    9     10
Points    7    6    8.5   8.5   17   14   9    9    10.5  10.5

1. (7 pts) Naive Bayes – Josh Detector
Suppose we want to find if Josh is in this office (Y = 1) or not (Y = 0) by analyzing the measurements of the following three sensors deployed in his office:
– a chair pressure sensor informs you if someone is on his chair (Xc = 1) or not (Xc = 0)
– a motion detector tells you if someone moves in the room (Xm = 1) or not (Xm = 0)
– a power meter indicates if someone consumes electricity in the room (Xp = 1) or not (Xp = 0)
Each sensor above only reveals partial information about Josh's presence. For instance, if Josh is in his office but not sitting on the chair, then the chair pressure sensor cannot reliably indicate his presence. Suppose we have some historical data logs of sensor measurements and Josh's presence status:
Day   Xc   Xm   Xp   Y
1     0    0    1    0
2     0    0    1    0
3     0    0    0    0
4     0    0    0    0
5     0    0    1    1
6     1    0    1    1
7     0    0    1    1
8     1    0    1    1
9     0    1    0    1
10    1    1    0    1
i. (2 pts) Let us use Naive Bayes to model the sensor measurements and Josh’s presence status as shown in the Bayes Net model below. Fill in the maximum likelihood estimate (MLE) of each entry in the probability tables.
(B1: Naive Bayes model, with Y the parent of Xc, Xm, and Xp.)

Y    P(Y)
0    2/5
1    3/5

Y    Xc   P(Xc|Y)
0    0    1
0    1    0
1    0    1/2
1    1    1/2

Y    Xm   P(Xm|Y)
0    0    1
0    1    0
1    0    2/3
1    1    1/3

Y    Xp   P(Xp|Y)
0    0    1/2
0    1    1/2
1    0    1/3
1    1    2/3
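For reference, here is a minimal Python sketch (not part of the original exam) showing how these MLE tables can be read off the 10-day log by simple counting; the function names and data encoding are illustrative.

```python
from collections import Counter
from fractions import Fraction

# The 10-day sensor log from the problem: one (Xc, Xm, Xp, Y) tuple per day.
LOG = [(0,0,1,0), (0,0,1,0), (0,0,0,0), (0,0,0,0), (0,0,1,1),
       (1,0,1,1), (0,0,1,1), (1,0,1,1), (0,1,0,1), (1,1,0,1)]

def mle_prior(log):
    """MLE of P(Y): the fraction of days with each label."""
    counts = Counter(y for *_, y in log)
    n = len(log)
    return {y: Fraction(c, n) for y, c in counts.items()}

def mle_conditional(log, idx):
    """MLE of P(X|Y) for the sensor in column idx (0=Xc, 1=Xm, 2=Xp)."""
    joint = Counter((row[3], row[idx]) for row in log)
    y_counts = Counter(y for *_, y in log)
    return {(y, x): Fraction(c, y_counts[y]) for (y, x), c in joint.items()}

print(mle_prior(LOG))            # P(Y):    {0: 2/5, 1: 3/5}
print(mle_conditional(LOG, 0))   # P(Xc|Y): {(0,0): 1, (1,0): 1/2, (1,1): 1/2}
```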
ii. (1 pt) Suppose we get a new set of sensor observations: Xc = 0, Xm = 0, Xp = 1. According to the probability tables' maximum likelihood estimates, is Josh more likely to be present or absent?
◯ Present  ⬤ Absent  ◯ Equally Likely  ◯ Not Enough Information
Explanation: We compare:
P(Y = 0, Xc = 0, Xm = 0, Xp = 1) = 2/5 * 1 * 1 * 1/2 = 1/5
P(Y = 1, Xc = 0, Xm = 0, Xp = 1) = 3/5 * 1/2 * 2/3 * 2/3 = 2/15
The former is larger, so we predict that Josh is more likely to be absent.
Notice that this is true even though (Y = 0, Xc = 0, Xm = 0, Xp = 1) and (Y = 1, Xc = 0, Xm = 0, Xp = 1) appeared equally frequently in our samples.
iii. (1 pt) Suppose that instead of maximum likelihood, we now use Laplace Smoothing to estimate each entry in the following probability tables. Assume the prior strength k = 1 .
Y    P(Y)
0    5/12
1    7/12

Y    Xc   P(Xc|Y)
0    0    5/6
0    1    1/6
1    0    1/2
1    1    1/2
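As a sanity check on the entries above, here is a small sketch (not from the exam) of the Laplace-smoothed estimate (count + k) / (total + k·|domain|), assuming binary variables; the function name is illustrative.

```python
from fractions import Fraction

def laplace(count, total, k=1, domain_size=2):
    """Laplace-smoothed estimate: (count + k) / (total + k * domain_size)."""
    return Fraction(count + k, total + k * domain_size)

print(laplace(4, 10))   # P(Y=0)        = 5/12
print(laplace(6, 10))   # P(Y=1)        = 7/12
print(laplace(4, 4))    # P(Xc=0 | Y=0) = 5/6
print(laplace(3, 6))    # P(Xc=0 | Y=1) = 1/2
```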
iv. (1 pt) What happens to our CPTs as we increase k, i.e. what sort of distribution do we get in the large k limit?
Solution: In the large k limit, we get uniform distributions in each CPT.
v. (1 pt) Suppose we want to generate exactly two samples from the distribution P(Y, Xc, Xm, Xp) given by the CPTs you computed from MLE estimates in part i. Suppose we run Gibbs sampling for a long time so that the samples are indeed drawn from the correct distribution, and get the resulting sample GS1: (Y = 1, Xc = 0, Xm = 1, Xp = 0). Suppose we then generate a new sample GS2 by resampling from P(Xp | Y = 1, Xc = 0, Xm = 1). What are the different possible values for GS2, and how likely is each?
Possible Value #1 for GS2: Y = 1, Xc = 0, Xm = 1, Xp = 0, probability: 1/3
Possible Value #2 for GS2: Y = 1, Xc = 0, Xm = 1, Xp = 1, probability: 2/3
Explanation: The probabilities follow from P(Xp | Y, Xc, Xm) = P(Xp | Y).
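A minimal sketch of the single resampling step described above: since Xp is conditionally independent of Xc and Xm given Y, the new Xp is drawn directly from P(Xp | Y = 1). The table values come from part i; the function name is illustrative.

```python
import random

# From the MLE table in part i: P(Xp | Y = 1).
P_XP_GIVEN_Y1 = {0: 1/3, 1: 2/3}

def resample_xp(rng=random):
    """Resample Xp for GS2, holding Y=1, Xc=0, Xm=1 fixed."""
    return 0 if rng.random() < P_XP_GIVEN_Y1[0] else 1

samples = [resample_xp() for _ in range(10000)]
print(sum(s == 1 for s in samples) / len(samples))  # roughly 2/3
```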
vi. (1 pt) Suppose we instead use prior sampling to generate samples PS1 and PS2. Which samples are of higher quality, the two samples generated by Gibbs Sampling or the two samples generated by Prior Sampling? We have intentionally left “higher quality” undefined. Use your best judgment. For partial credit in the event of a wrong answer, very briefly explain your answer below.
​⬤​ GS1 and GS2 are higher quality ​⬤​ PS1 and PS2 are higher quality ◯ They are the same quality
Intended solution:​ When we have many samples, prior sampling and Gibbs sampling perform equally well. However, when there are only a few samples, prior sampling is better than Gibbs sampling, because the samples generated by Gibbs sampling are correlated with one another (each successive sample only differs by a single variable). For example, if there are only two samples for four variables, Gibbs sampling will only see one value for three of the variables, while prior sampling avoids this correlation bias.
Alternate interpretation: Some students didn't realize that the samples are for the same query as in the previous part (v) (where there is no evidence), and considered the possibility that we have a query that does have evidence. In that case, Gibbs sampling is better than prior sampling, since the latter generates samples without paying attention to the evidence.
2. MDPs [6 pts]
Consider the following Markov Decision Process. Unlike most MDP models where actions have many potential outcomes of varying probability, assume that the transitions are ​deterministic​, i.e. occur with T(s, a, s’)=100% probability as shown in the graph below. The reward for each transition is displayed adjacent to each edge:
i. (3 pts) Compute the following quantities, assuming a discount factor of 1. Recall that Vk(s) represents the expected utility of the state s under the optimal policy assuming the game ends in k more time steps, and V*(s) is the expected utility starting in state s if we play optimally forever.
V2(A) = R(A,C) + R(C,D) = 13
V5(A) = R(A,C) + R(C,D) + R(D,B) + R(B,E) + R(E,E) = 26
V*(A) = ∞, from taking the cycle C-D-B repeatedly.
V2(B) = R(B,E) + R(E,E) = 9
V5(B) = R(B,C) + R(C,D) + R(D,B) + R(B,E) + R(E,E) = 17
V*(B) = ∞, from taking the cycle B-C-D repeatedly.
ii. (1 pt) Consider using feature-based Q-learning to learn what the optimal policy should be. Suppose we use the feature f1(s, a), equal to the smallest number of edges from s to E, and the feature f2(s, a), equal to the smallest number of edges from a's target to E. For example, f1(A, →B) is 2, and f2(C, →D) is 1.
Let the current weights for these features be [3, 2]. What is the current prediction for Q(C, →D) given these weights for these two features?
Q(C, →D) = f1(C, →D)*w1 + f2(C, →D)*w2 = 2*3 + 1*2 = 8
iii. (2 pts) Suppose we update w based on a single observation of a transition from C to D, assuming α = 0.5. What is w after updating based on this observation?
w = [6, 3.5]
Explanation:
The difference between the observed sample (reward plus future value) and the current Q estimate is:
Δ = (R(C, →D) + γ max_a′ Q(D, a′)) − Q(C, →D) = (6 + 5) − 8 = 3, with γ = 1.
We then update the weights: w ← w + α*Δ*f = [3, 2] + 0.5*3*[2, 1] = [6, 3.5]
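For reference, a minimal sketch (not part of the exam) of the approximate Q-learning prediction and update used in parts ii and iii; the feature values, reward, and max_a′ Q(D, a′) = 5 are taken from the problem, and the helper names are illustrative.

```python
def q_value(features, weights):
    """Linear approximate Q-value: sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(weights, features))

def q_update(weights, features, reward, next_max_q, alpha, gamma=1.0):
    """One approximate Q-learning step: w_i <- w_i + alpha * delta * f_i."""
    delta = (reward + gamma * next_max_q) - q_value(features, weights)
    return [w + alpha * delta * f for w, f in zip(weights, features)]

w = [3, 2]
f_CD = [2, 1]                      # f1(C, ->D), f2(C, ->D)
print(q_value(f_CD, w))            # 8
print(q_update(w, f_CD, reward=6, next_max_q=5, alpha=0.5))  # [6.0, 3.5]
```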
3. Probability (8.5 pts)
Consider the following CPT, ​to be used for part i only!
A    B    C    P(A, B, C)
+a   +b   +c   0.15
+a   +b   -c   0.2
+a   -b   +c   0.3
+a   -b   -c   0.1
-a   +b   +c   0.15
-a   +b   -c   0
-a   -b   +c   0.05
-a   -b   -c   0.05
i. (2 pts) Compute the following expressions for the table above. Write each answer as a single numerical value. It is OK to leave your answer as a fraction of decimal numbers.
● P(+a | -c) = P(+a, -c) / P(-c) = (0.2 + 0.1) / (0.2 + 0.1 + 0 + 0.05) = 6/7
● P(+b | -a, -c) = P(-a, +b, -c) / P(-a, -c) = 0 / (0 + 0.05) = 0
● P(-a | +b, +c) = P(-a, +b, +c) / P(+b, +c) = 0.15 / (0.15 + 0.15) = 1/2
● P(-b, +c | -a) = P(-a, -b, +c) / P(-a) = 0.05 / (0.15 + 0 + 0.05 + 0.05) = 1/5
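A minimal sketch (not from the exam) of the computation pattern used above, marginalizing the joint table and then dividing; the dictionary mirrors the CPT, and the helper names are illustrative.

```python
from fractions import Fraction

# Joint distribution P(A, B, C) from the table, keyed by (a, b, c) with +/- as True/False.
JOINT = {(True, True, True): Fraction(15, 100),  (True, True, False): Fraction(20, 100),
         (True, False, True): Fraction(30, 100), (True, False, False): Fraction(10, 100),
         (False, True, True): Fraction(15, 100), (False, True, False): Fraction(0, 100),
         (False, False, True): Fraction(5, 100), (False, False, False): Fraction(5, 100)}

def prob(pred):
    """Sum the joint over all assignments satisfying a predicate."""
    return sum(p for assignment, p in JOINT.items() if pred(*assignment))

def conditional(query, evidence):
    """P(query | evidence) = P(query AND evidence) / P(evidence)."""
    return prob(lambda a, b, c: query(a, b, c) and evidence(a, b, c)) / prob(evidence)

print(conditional(lambda a, b, c: a, lambda a, b, c: not c))        # P(+a | -c)     = 6/7
print(conditional(lambda a, b, c: not a, lambda a, b, c: b and c))  # P(-a | +b, +c) = 1/2
```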
ii. (1 pt) Let A, B, and C be binary random variables. Select all of the following CPTs that are guaranteed to have 4 rows for ANY distribution, not the table above.
⬛ P(+a | B, C)   ⬛ P(B, C | +a)   ⬛ P(B | A)   ▢ P(+a, C | +b)
Explanation: Since the variables are binary, the number of rows in a table is 2 raised to the number of unspecified variables; P(+a, C | +b) leaves only C unspecified, so it has 2 rows.
iii. (2 pts) Select all of the following that are guaranteed to equal 1 for ANY distribution, not the table above.
⬛ P(+a) + P(-a)
⬛ P(+a | -c)P(-c) / (P(+a, -c, +b) + P(+a, -c, -b))
▢ P(+a | +b)P(+b) + P(-a | -b)P(-b)
▢ P(-c | -a)P(-a) + P(-c | +a)P(+a)
Explanation:
The first equals 1 by the law of total probability.
In the second, the numerator equals P(+a, -c) by the product rule, and the denominator sums out B, also giving P(+a, -c), so the ratio is 1.
The third equals P(+a, +b) + P(-a, -b), which need not equal 1.
The fourth equals P(-a, -c) + P(+a, -c) = P(-c), which need not equal 1.
For parts iv – vi: It's winter break, and you're staying at Sherdil's Casino. Every morning, you receive (as a gift) a random odd number of gambling chips, C, in the range 1 to k, where k is some odd integer constant, according to the distribution P(C = c) = 2/(k + 1) for each odd c in {1, 3, ..., k}.
Assume the number of chips you receive on a particular day is independent of the number of chips you received on any and all prior days.
iv. (1 pt) Let the number of chips you receive on days 1, 2, and 3 be C1, C2, and C3 respectively. Is C3 conditionally independent of C1 given C2? (Answer yes or no. No explanation is needed.)
◯ Yes  ◯ No
Explanation: It depends on whether we know k.
If we don't know k, then the answer is no, because C1 gives us information about k, which C3 depends on. If we know k, then the answer is yes.
v. (1 pt) Suppose k is 5. On average, how many chips can you expect to receive per day?
E[# chips] = 3
Explanation: E[C] = 1/3*1 + 1/3*3 + 1/3*5 = 3.
vi. (1.5 pts) Suppose you stay for five days and receive the following chip amounts: {5, 11, 1, 9, 9}. Based on your five samples, compute the maximum likelihood estimate for ​k​.
MLE of k = ​11
Explanation:
Intuitively, as long as k is at least 11, the smaller the k, the larger the probability of any particular outcome.
Formally, the log likelihood of the outcome is:
ll(k | C) = log P(C | k) = log ∏ 2/(k+1) = ∑ (log 2 − log(k+1)) = 5 log 2 − 5 log(k+1) if k ≥ 11; otherwise the likelihood is 0 (log likelihood −∞).
This function is strictly decreasing as k increases, so k = 11 is the maximum likelihood estimate.
Common Mistakes:
● The question says that k is an odd integer. Any answer that is not an odd integer is not a plausible answer; this can be a good sanity check on the answer.
● The most common incorrect answer was 13. This presumably comes from a method-of-moments style argument (finding k such that E[C] equals the sample average), since if k = 13 then E[C] = 7, which matches the sample mean of {5, 11, 1, 9, 9}. The method of moments sometimes, but does not always, find the maximum likelihood estimate.
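A small numeric check (not part of the exam) of the argument above, evaluating the likelihood of the sample under a few candidate odd values of k; names are illustrative.

```python
samples = [5, 11, 1, 9, 9]

def likelihood(k, data):
    """P(data | k): product of 2/(k+1) terms if every sample is a valid odd value <= k, else 0."""
    if any(c > k or c % 2 == 0 for c in data):
        return 0.0
    return (2.0 / (k + 1)) ** len(data)

for k in [9, 11, 13, 15]:
    print(k, likelihood(k, samples))
# k = 9 gives 0.0 (11 would be impossible); the likelihood then strictly decreases past k = 11.
```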
2016 Most Valuable Goat Celebration Zone
4. CSPs (8.5 pts)
In the game of Hidato, we start with a partially filled grid with N blanks, and our goal is to fill in all the blanks with integers 1 through N such that every number is unique, and every number q (except N) appears adjacent to its successor q + 1, with diagonal moves allowed. For example, if we start from the figure on the left, we can get the solution on the right. Black squares act as obstacles and never receive numbers.
We can formulate this as a CSP, where the domain of the variables is {1, …, N}, where N is the number of non-black (i.e. initially blank) squares.
i. (2.5 pts) If we start from the figure below with 4 variables assigned (to 1, 3, 9, and 14), and run AC-3 (our arc-consistency algorithm), what values remain in the domain of the variable marked with an X? Assume that all 10 blank squares (including X and Y) are unassigned.
Arc-consistency involves processing of binary arcs only, so we must start by deciding what binary arcs exist. Our original intention was that you would derive and apply the following two constraints:
● NO_EQUAL_SQUARES (NES): There are binary arcs between all squares which specify that they cannot be equal (and a separate set of constraints that require numbers to be in a sequence)
● NO_DISTANT_SUCCESSORS (NDS): There are binary arcs between non-adjacent squares specifying that their values cannot differ by exactly one.
In addition, you can also add a 3rd constraint that is a more powerful version of NDS:
● RADIAL_DIFFERENCE_TEST (RDT): There are binary arcs between every pair of squares requiring that, if they take values Xi and Xj respectively, then |Xi − Xj| is at least the grid distance between those squares (values closer together than the distance are forbidden). It is redundant with NDS as far as solutions go, but it makes arc consistency more powerful.
However: it is also possible to cast the Hidato problem without any binary constraints, in which case arc-consistency won't do anything at all. For example, consider the following multi-variable constraints:
● ALL_DISTINCT: A single N-ary constraint that ensures all variables are distinct.
● ALL_PRESENT: A single N-ary constraint that ensures every value i in {1, ..., N} is taken by some variable.
● NEIGHBOR_IS_ADJACENT: For each node, its successor must be one of its neighbors. This is a Q-ary constraint, where Q − 1 is the number of neighbors.
In retrospect, we should have asked you to explicitly enumerate your binary constraints, but we did not, so we gave you credit for ANY valid answer, even if you did not write out your binary constraints.
Assuming no binary constraints, the right answer is: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
If we apply arc-consistency considering only the arcs from X to the squares assigned 1, 3, 9, and 14, then:
● NES eliminates: {1, 3, 9, 14}
● NDS eliminates: {8, 10}
● RDT eliminates: {7, 8, 10, 11}
Completing the entire AC-3 algorithm yields no further eliminations from X. So any answer consisting of {1, ..., 14} minus any of the entire sets above is eligible for full credit. Sample correct answers:
Domain of X (assuming no binary constraints): {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
Domain of X (assuming NO_EQUAL_SQUARES): {2, 4, 5, 6, 7, 8, 10, 11, 12, 13}
Domain of X (assuming NES and NDS): {2, 4, 5, 6, 7, 11, 12, 13}
Domain of X (assuming NES and RDT): {2, 4, 5, 6, 12, 13}
ii. (2.5 pts) If we assign X = 2 and re-run AC-3, what will Y's domain be after AC-3 completes? Assume that all 9 blank squares (including Y) other than X are unassigned.
Similar to above, the answer depends on which binary constraints you came up with and applied. In this case, the possibilities are:
● NES eliminates: {1, 2, 3, 9, 14}
● NDS eliminates: {8, 10, 13}
● RDT eliminates: (nothing)
As before, any combination that eliminates zero or more of the entire sets above is fine. For example:
Domain of Y (assuming no binary arcs): {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
Domain of Y (assuming NES): {4, 5, 6, 7, 8, 10, 11, 12, 13}
Domain of Y (assuming NES and NDS): {4, 5, 6, 7, 11, 12}
Domain of Y (assuming NDS only): {1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 14}
iii. (1 pt) If we ensure arc-consistency after every assignment in any Hidato puzzle, we'll never backtrack.
◯ True  ⬤ False
Explanation: Even if we have only binary constraints, the constraint graph is not a tree, so backtracking is possible.
iv. (2.5 pts) Repeat part i, but suppose we run an algorithm that ensures 3-consistency instead of arc-consistency (recall that arc-consistency is just 2-consistency). In this case, what is the domain of X?
Explanation: In the CSP lecture, we strongly implied that you didn't need to know how k-consistency works for k > 2. Oops.
Domain of X: ___​[Beyond scope of course]​_____
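For reference, a minimal sketch (not part of the exam) of the AC-3 "revise" step for the two intended binary constraints, NES and NDS; the domains and the adjacency flag below are illustrative stand-ins for the actual grid geometry.

```python
def nes_ok(xi, xj):
    """NO_EQUAL_SQUARES: two squares may not hold the same number."""
    return xi != xj

def nds_ok(xi, xj, adjacent):
    """NO_DISTANT_SUCCESSORS: non-adjacent squares may not hold consecutive numbers."""
    return adjacent or abs(xi - xj) != 1

def revise(domain_i, domain_j, ok):
    """Keep only values of domain_i that have some supporting value in domain_j (one AC-3 revise)."""
    return {xi for xi in domain_i if any(ok(xi, xj) for xj in domain_j)}

# Example: X's arcs to the square already assigned 9, with X non-adjacent to that square.
domain_x = set(range(1, 15))
domain_x = revise(domain_x, {9}, nes_ok)                            # drops 9
domain_x = revise(domain_x, {9}, lambda a, b: nds_ok(a, b, False))  # drops 8 and 10
print(sorted(domain_x))
```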
5. (17 pts) Potpourri
i. (2.5 pts) You are given a choice between the following:
● A random lottery in which you receive either $A or $B uniformly at random.
● An event in which you receive $​k​, where ​k​ is a positive real-valued constant.
Your utility function for money is U(d) = d², where U(d) is the utility corresponding to d dollars. For what values of k should you take the lottery? Give your answer as an interval or a union of intervals as appropriate, e.g., [12, ∞) or [5, 9] ⋃ [7, ∞). You may handle ties however you wish. Assume that A, B, and k are non-negative.
k ∈ [0, √((A² + B²)/2)]
Explanation: We value the lottery at E[U(lottery)] = (A² + B²)/2. Thus, we take the lottery whenever U(k) = k² is less than or equal to this, i.e. k ≤ √((A² + B²)/2).
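A quick numeric illustration (not from the exam) of the risk-seeking behavior implied by U(d) = d²; the lottery amounts below are made-up examples, not values from the problem.

```python
from math import sqrt

# With U(d) = d^2, the agent prefers the {A, B} lottery whenever
# k^2 <= (A^2 + B^2) / 2, i.e. k <= sqrt((A^2 + B^2) / 2).
A, B = 0, 10                         # illustrative lottery amounts
threshold = sqrt((A**2 + B**2) / 2)
print(threshold)                     # ~7.07: take the lottery for any sure amount k below this,
                                     # even though the lottery's expected *money* is only 5.
```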
For parts ii and iii: Let X, Y, and Z be independent random variables with the following distributions:

X     P(X)
3     2/3
−6    1/3

Y     P(Y)
2     1/2
10    1/2

Z     P(Z)
−2    1/4
6     3/4
You can take either action 1 or action 2.
● Action 1 has utility U1(X, Y, Z) = 10*X − 3*Y*Z
● Action 2 has utility U2(X, Y, Z) = 10*X + 5*Y − 5*Y*Z
ii. (3.5 pts) Calculate the maximum expected utility. Hint: to avoid tedious calculations, recall from your prior coursework that E[A*B] = E[A]*E[B] so long as A and B are independent variables, and that E[X+Y] = E[X] + E[Y] for any random variables X and Y.
MEU({}) = −72
Explanation:
We first compute E[X] = 0, E[Y] = 6, E[Z] = 4. Since Y and Z are independent, E[YZ] = E[Y]E[Z] = 24. We can then calculate E[U1] = 10E[X] − 3E[YZ] = −72 and E[U2] = 10E[X] + 5E[Y] − 5E[YZ] = −90, so we should take action 1, getting MEU({}) = −72.
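A minimal sketch (not from the exam) of the expectation bookkeeping above, with the distributions copied from the table; the names are illustrative.

```python
# Distributions from the problem.
PX = {3: 2/3, -6: 1/3}
PY = {2: 1/2, 10: 1/2}
PZ = {-2: 1/4, 6: 3/4}

def expectation(dist):
    return sum(value * p for value, p in dist.items())

EX, EY, EZ = expectation(PX), expectation(PY), expectation(PZ)
EYZ = EY * EZ                           # Y and Z are independent
EU1 = 10 * EX - 3 * EYZ                 # E[U1] = -72
EU2 = 10 * EX + 5 * EY - 5 * EYZ        # E[U2] = -90
print(max(EU1, EU2))                    # MEU({}) = -72
```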
iii. (2 pts) Which variable has the lowest VPI? Justify your answer either in words or numerically. Please restrict any answers to fewer than 20 words.
⬤ X   ◯ Y   ◯ Z
Explanation: VPI(X) = 0 because the way X affects the utility is independent of the action. Since VPIs are nonnegative, X must have the lowest VPI (or be tied for it, but one can verify that the VPIs of the other two variables are positive).
Specifically, one way of seeing this is to note that U1(X, Y, Z) − U2(X, Y, Z) = 2YZ − 5Y, which does not depend on X.
Common Mistakes:
● Saying that VPI(X) = 0 because E[X] = 0, or that E[X] …
[Only fragments of the intervening problems survive here, including part of an Always/Sometimes/Never table from Part 1 of the speech-recognition problem: a row comparing counts to N_t(1) ("Could be in any order") and a row F_t(i) = N_t(i) ("By coincidence").]
Part 2: One way to try to detect words is to build an HMM for each possible word. The state transition diagram for the HMM is shown below, where state i is the ith phoneme of the word, and Ti,j is the probability of moving from state i to state j. For example, for the word waka, we might assume that there is a 20% chance that a speaker saying the /a/ sound keeps making that sound 10 ms later, and an 80% chance that they move on to the /k/ sound. In that case, T1,1 would be 0.2 and T1,2 would be 0.8. T3,4 is the probability that the speaker finishes the word. If you'd like, you can think of the terminal state as the speaker being silent.
ii. (2.5 pts) Fill in the table, similar to what you did in part i. Mark “Sometimes” even if the chance is very small.
Statement                       Always   Sometimes   Never   Reason
0 <= Ti,k <= 1.0                ⬤        ◯           ◯       Probability
T0,1 + T0,0 = 1.0               ⬤        ◯           ◯       Probability
T0,1 + T1,2 + T2,3 = 1.0        ◯        ⬤           ◯       By coincidence only
T1,1 > T2,2                     ◯        ⬤           ◯       By coincidence only
T1,1 > T3,3                     ◯        ◯           ⬤       These are the same value. See NOTE below.
Part 3: Suppose we have a sound file containing a single word. Suppose this sound file is S samples long, has 13 bins of spectral energy, and that Pacmanian has 20 phonemes. ​For iii through vi, suppose that we want to use our HMM from above to determine whether the word is “waka”.
iii. (2 pts) How many ​hidden variables must our HMM have? Recall that an HMM is just a Bayes Net with a special structure.
Solution: S, as we have a hidden variable Qt for each time step, where P(Qt = qt) represents the probability that the phoneme spoken during time t is qt. Each variable ranges over the 20 phonemes, so its distribution is a vector of 20 probabilities (just like the weather HMM example from lecture, where the variables had values "cloudy", "rainy", etc.).
iv. (2 pts) If q​5 is our 5th hidden variable, and q​6 is our 6th hidden variable, how many rows does P(q​6 | q​5)​ have? If the problem seems ambiguous, make sure to state your assumptions.
Solution: 400: there are 20 possible values for q5 and 20 possible values for q6.
Alternate solution: 441, where we also allow silence, is also accepted.
Alternate solution 2: If you assume that the HMM only needs to support the word "waka", then the only rows the transition matrix needs to support are a → a, a → k, k → a, k → k, w → a, w → w, and maybe also a → EXIT. Other solutions also exist depending on assumptions about what's in the CPT; see the rubric on Gradescope. See also the NOTE below.
v. (2 pt) For each hidden variable, how many scalar evidence values are there?
Solution:​ 13: one for each spectral bin.
NOTE: Several students have asked about the two /a/'s in waka. They are the same /a/. The issue is the same as in the weather HMM example, where all you have is a probability of it being cloudy tomorrow given that it was cloudy today; it is not dependent on the history. In fact, researchers have tried making each phoneme word-position dependent, but it very quickly becomes intractable. Imagine if each and every /a/ were its own entry; the CPTs would be huge.
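To make the structure concrete, here is a minimal sketch (not part of the exam) of forward-algorithm scoring under a word HMM like the one above; the transition probabilities and per-frame emission likelihoods are illustrative placeholders, not values from the problem.

```python
# States for "waka": 0=/w/, 1=/a/, 2=/k/, 3=/a/ (plus an implicit exit).
# T[i][j] is the probability of moving from state i to state j every 10 ms.
T = {0: {0: 0.3, 1: 0.7},      # illustrative numbers only
     1: {1: 0.2, 2: 0.8},
     2: {2: 0.4, 3: 0.6},
     3: {3: 0.5}}              # the remaining 0.5 mass exits the word

def forward(emission_probs):
    """emission_probs[t][i] = P(observed spectrum at time t | state i)."""
    alpha = {0: emission_probs[0].get(0, 0.0)}   # assume the word starts in state 0 (/w/)
    for obs in emission_probs[1:]:
        new_alpha = {}
        for i, a in alpha.items():
            for j, p in T[i].items():
                new_alpha[j] = new_alpha.get(j, 0.0) + a * p * obs.get(j, 0.0)
        alpha = new_alpha
    return sum(alpha.values())   # likelihood of the observation sequence under this word model

# Toy 3-frame observation likelihoods per state (illustrative).
print(forward([{0: 0.9}, {0: 0.2, 1: 0.7}, {1: 0.5, 2: 0.3}]))
```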
Appendix: Alternate approaches to Problem #6
6.i One common incorrect answer was to either have transitions directly between Q-states (top two diagrams), or to have a special “opponent” node with probabilistic behavior. None of these is an MDP search tree. The top left model is fundamentally broken, since it has no Q state for p2=”pass”. The top right model is strange because we have to allow transitions directly between Q-states, but could theoretically work. The bottom model is an abuse of the symbol for a minimizing agent, but is ultimately equivalent to the top right model.
6.ii If you had one of the incorrect solutions for part i, then you didn’t actually draw an MDP search tree. In this case, the question is a bit open to interpretation: the smallest transition probability was either 1/4, 1/8, or 1/32, depending on if you counted the entire aggregate transition probability from the Q-states given to the roots, as well as how you built your model.
6.iv. If you used model 1 or 2, the correct answer is zero, not Z.