COMP3411/9814 Artificial Intelligence 20T1, 2020
Tutorial Solutions – Week 8
Question 1
Consider a world with two states S = {S1, S2} and two actions A = {a1, a2}, where the transitions δ and reward r for each state and action are as follows:
(i) Draw a picture of this world, using circles for the states and arrows for the transitions.
(ii) Assuming a discount factor of γ = 0.9, determine:
(iii) Write the Q values in a table.
(iv) Trace through the first few steps of the Q-learning algorithm, with all Q values initially set to zero. Explain why it is necessary to force exploration through probabilistic choice of actions in order to ensure convergence to the true Q values.
At this point, the table looks like this:
If the agent always greedily chooses the action with the highest current Q value, it can lock onto a suboptimal action, e.g. a1 in state S2, and never sufficiently explore action a2. This means that Q(S2,a2) will remain zero forever, instead of converging to the true value of 21.58. With exploration, the next few steps might look like this:
Now we have this table:
From this point on, the agent will prefer action a2 both in state S1 and in state S2. Further steps refine the Q value estimates, and, in the limit, they will converge to their true values.
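As a check on these numbers, here is a minimal sketch in Python. The transition/reward table it uses is an assumption, chosen to be consistent with the true value Q(S2,a2) = 21.58 quoted above (δ(S1,a1)=S1 with r=0, δ(S1,a2)=S2 with r=-1, δ(S2,a1)=S2 with r=+1, δ(S2,a2)=S1 with r=+5). Value iteration recovers the exact Q values, and ε-greedy Q-learning approaches them:

```python
import random

GAMMA = 0.9
ACTIONS = ("a1", "a2")
# ASSUMED world: (state, action) -> (next state, reward); chosen to be
# consistent with the stated true value Q(S2,a2) = 21.58.
WORLD = {
    ("S1", "a1"): ("S1", 0),
    ("S1", "a2"): ("S2", -1),
    ("S2", "a1"): ("S2", 1),
    ("S2", "a2"): ("S1", 5),
}

# Exact Q values by value iteration: Q(s,a) = r + gamma * max_b Q(s',b)
Q = {sa: 0.0 for sa in WORLD}
for _ in range(300):
    Q = {(s, a): r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
         for (s, a), (s2, r) in WORLD.items()}
print({sa: round(v, 2) for sa, v in Q.items()})
# -> Q(S1,a1)=16.58, Q(S1,a2)=18.42, Q(S2,a1)=20.42, Q(S2,a2)=21.58

# Deterministic-world Q-learning with epsilon-greedy exploration.
# A purely greedy agent can get stuck on (S2,a1); the occasional random
# move guarantees every state-action pair keeps being visited.
Qhat = {sa: 0.0 for sa in WORLD}
epsilon, s = 0.2, "S1"
for _ in range(10000):
    if random.random() < epsilon:
        a = random.choice(ACTIONS)                    # explore
    else:
        a = max(ACTIONS, key=lambda b: Qhat[(s, b)])  # exploit
    s2, r = WORLD[(s, a)]
    Qhat[(s, a)] = r + GAMMA * max(Qhat[(s2, b)] for b in ACTIONS)
    s = s2
print({sa: round(v, 2) for sa, v in Qhat.items()})    # ~ the exact values
```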
Question 2
Consider the task of predicting whether children are likely to be hired to play members of the Von Trapp Family in a production of The Sound of Music, based on these data:
height   hair    eyes    hired
short    blond   blue    +
tall     red     blue    +
tall     blond   blue    +
tall     blond   brown   -
short    dark    blue    -
tall     dark    blue    -
tall     dark    brown   -
short    blond   brown   -
a. Compute the information (entropy) gain for each of the three attributes (height, hair, eyes) in terms of classifying objects as belonging to the class, + or – .
There are 3 objects in class ‘+’ and 5 in ‘-’, so the entropy is:
Entropy(parent) = -Σi pi log2 pi = -(3/8)log2(3/8) - (5/8)log2(5/8) = 0.954
Suppose we split on height:
Of the 3 ‘short’ items, 1 is ‘+’ and 2 are ‘-’, so Entropy(short) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.918
Of the 5 ‘tall’ items, 2 are ‘+’ and 3 are ‘-’, so Entropy(tall) = -(2/5)log2(2/5) - (3/5)log2(3/5) = 0.971
The average entropy after splitting on ‘height’ is: Entropy(height) = (3/8)(0.918) + (5/8)(0.971) = 0.951
The information gained by testing this attribute is: 0.954 - 0.951 = 0.003 (i.e. very little)
If we try splitting on ‘hair’ we find that the branch for ‘dark’ has 3 items, all ‘-’, and the branch for ‘red’ has 1 item, in ‘+’. Thus, these branches require no further information to make a decision. The branch for ‘blond’ has 2 ‘+’ and 2 ‘-’ items and so requires 1 bit. That is,
Entropy(hair) = (3/8)(0) + (1/8)(0) + (4/8)(1) = 0.5
and the information gained by testing hair is 0.954 – 0.5 = 0.454 bits.
By a similar calculation, the entropy for testing ‘eyes’ is (5/8)(0.971) + (3/8)(0) = 0.607, so the information gained is 0.954 – 0.607 = 0.347 bits.
Thus ‘hair’ gives us the maximum information gain.
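A minimal Python sketch that reproduces these calculations (the `entropy` and `info_gain` helpers are illustrative names):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H = -sum_i p_i log2(p_i) over the class proportions in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# The eight examples above: (height, hair, eyes, hired)
data = [
    ("short", "blond", "blue",  "+"),
    ("tall",  "red",   "blue",  "+"),
    ("tall",  "blond", "blue",  "+"),
    ("tall",  "blond", "brown", "-"),
    ("short", "dark",  "blue",  "-"),
    ("tall",  "dark",  "blue",  "-"),
    ("tall",  "dark",  "brown", "-"),
    ("short", "blond", "brown", "-"),
]
parent = entropy([row[-1] for row in data])        # 0.954

def info_gain(idx):
    """Parent entropy minus the size-weighted entropy of each branch."""
    values = {row[idx] for row in data}
    remainder = sum(
        (len(branch) / len(data)) * entropy([r[-1] for r in branch])
        for branch in ([r for r in data if r[idx] == v] for v in values)
    )
    return parent - remainder

for idx, name in enumerate(("height", "hair", "eyes")):
    print(name, round(info_gain(idx), 3))          # 0.003, 0.454, 0.347
```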
b. Construct a decision tree based on the minimum entropy principle.
Since the ‘blond’ branch for hair still contains a mixed population, we need to apply the procedure recursively to these four items. Note that we now only need to test ‘height’ and ‘eyes’, since the ‘hair’ attribute has already been used. If we split on ‘height’, the branches for ‘tall’ and ‘short’ will each contain one ‘+’ and one ‘-’, so the information gain is zero. If we split on ‘eyes’, the ‘blue’ branch contains the two ‘+’s and the ‘brown’ branch the two ‘-’s, so the tree is complete:

hair?
  dark  -> -
  red   -> +
  blond -> eyes?
             blue  -> +
             brown -> -
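For completeness, a short recursive sketch of the same minimum-entropy construction. It continues from the previous sketch (reusing its `entropy` helper, `Counter` import and `data` table); the `id3` function is an illustrative name, not part of the original solution:

```python
def id3(rows, attrs):
    """attrs: list of (column index, name) pairs still available to test."""
    labels = [r[-1] for r in rows]
    if not attrs or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]   # leaf: (majority) class
    # choose the attribute with minimum average branch entropy (max gain)
    def remainder(a):
        idx = a[0]
        values = {r[idx] for r in rows}
        return sum((len(b) / len(rows)) * entropy([r[-1] for r in b])
                   for b in ([r for r in rows if r[idx] == v] for v in values))
    best = min(attrs, key=remainder)
    idx, name = best
    rest = [a for a in attrs if a != best]
    return (name, {v: id3([r for r in rows if r[idx] == v], rest)
                   for v in {r[idx] for r in rows}})

print(id3(data, [(0, "height"), (1, "hair"), (2, "eyes")]))
# -> ('hair', {'dark': '-', 'red': '+',
#              'blond': ('eyes', {'blue': '+', 'brown': '-'})})
```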
Question 3
The Laplace error estimate for pruning a node in a Decision Tree is given by:
Error = 1 - (n + 1) / (N + k)
where N is the total number of items, n is the number of items in the majority class and k is the number of classes. Given the following subtree, should the children be pruned or not? Show your calculations.
Error(Parent) = 1 - (7+1)/(11+2) = 1 - 8/13 = 5/13 = 0.385
Error(Left) = 1 - (2+1)/(3+2) = 1 - 3/5 = 2/5 = 0.4
Error(Right) = 1 - (6+1)/(8+2) = 1 - 7/10 = 3/10 = 0.3
Backed Up Error = (3/11)(0.4) + (8/11)(0.3) = 0.327 < 0.385
Since the Error of the Parent is larger than the Backed Up Error ⇒ Don't Prune.
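A minimal sketch of this calculation (the `laplace_error` function name is illustrative):

```python
def laplace_error(N, n, k):
    """N items at the node, n in the majority class, k classes."""
    return 1 - (n + 1) / (N + k)

parent = laplace_error(11, 7, 2)                      # 0.385
left   = laplace_error(3, 2, 2)                       # 0.4
right  = laplace_error(8, 6, 2)                       # 0.3
backed_up = (3 / 11) * left + (8 / 11) * right        # 0.327
print("prune" if parent <= backed_up else "don't prune")   # don't prune
```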
Question 4
Construct a Decision Tree for the following set of examples.
What class is assigned to the instance {D15, Sunny, Hot, High, Weak}?
Assuming these are the standard PlayTennis examples (Mitchell, 1997), as the attribute values of D15 suggest, the minimum-entropy procedure puts ‘Outlook’ at the root: the ‘Overcast’ branch is classified ‘Yes’, the ‘Sunny’ branch goes on to test ‘Humidity’, and the ‘Rain’ branch goes on to test ‘Wind’. The instance D15 follows the ‘Sunny’ branch and, since Humidity = High, is classified ‘No’.
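A minimal sketch of the classification step under that same assumption. The nested (attribute, {value: subtree-or-class}) tuple encoding of the ASSUMED PlayTennis tree is illustrative:

```python
# ASSUMED final tree for Mitchell's PlayTennis examples.
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Weak": "Yes", "Strong": "No"}),
})

def classify(node, instance):
    while isinstance(node, tuple):       # descend until we hit a class label
        attr, branches = node
        node = branches[instance[attr]]
    return node

d15 = {"Outlook": "Sunny", "Temperature": "Hot",
       "Humidity": "High", "Wind": "Weak"}
print(classify(tree, d15))               # -> No
```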