COMP9414: Artificial Intelligence
Tutorial Week 10: Neural Networks/Reinforcement Learning
1. (i) Construct by hand a Perceptron which correctly classifies the data in the table below; use your knowledge of plane geometry to choose values for the weights w0, w1 and w2.
(ii) Simulate the Perceptron Learning Algorithm on the above data, using a learning rate of 1.0 and initial weight values of w0 = −0.5, w1 = 0 and w2 = 1. In your answer, clearly indicate the new weight values at the end of each training step.
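A hand trace of part (ii) can be checked by running the algorithm directly. The following is a minimal sketch (Python), assuming the usual conventions for this course: the perceptron outputs +1 when w0 + w1*x1 + w2*x2 ≥ 0 and −1 otherwise, and each misclassified example triggers the update w ← w + η·y·(1, x1, x2).

```python
# Perceptron learning on the Question 1 data (table below).
# Assumed conventions: output +1 if w0 + w1*x1 + w2*x2 >= 0, else -1;
# update w <- w + eta * y * (1, x1, x2) after each misclassified example.

data = [("a", 0, 1, -1), ("b", 2, 0, -1), ("c", 1, 1, +1)]
w = [-0.5, 0.0, 1.0]   # initial weights w0, w1, w2 from the question
eta = 1.0              # learning rate

converged = False
epoch = 0
while not converged:
    epoch += 1
    converged = True
    for name, x1, x2, y in data:
        s = w[0] + w[1] * x1 + w[2] * x2
        pred = 1 if s >= 0 else -1
        if pred != y:
            converged = False
            w[0] += eta * y
            w[1] += eta * y * x1
            w[2] += eta * y * x2
            print(f"epoch {epoch}, example {name}: w = {w}")

print("final weights:", w)
```

Under these conventions the run converges to w0 = −2.5, w1 = 1, w2 = 2, which classifies all three examples correctly; a different tie-breaking convention at s = 0 would not change this trace, since the weighted sum never hits exactly zero.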
2. Explain how each of the following could be constructed:
(i) Perceptron to compute the OR function of m inputs
(ii) Perceptron to compute the AND function of n inputs
(iii) 2-layer neural network to compute any (given) logical expression written in CNF
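For parts (i) and (ii), a standard construction sets every input weight to 1 and chooses the bias to place the threshold: OR should fire when at least one input is 1 (bias −0.5), while AND should fire only when all n inputs are 1 (bias −(n − 0.5)). A minimal sketch verifying both constructions exhaustively (Python; inputs assumed 0/1, output +1/−1; the exact thresholds are one choice among many):

```python
# OR and AND perceptrons over 0/1 inputs, outputting +1 / -1.
# Construction (one standard choice):
#   OR:  all weights 1, bias -0.5        -> fires iff at least one input is 1
#   AND: all weights 1, bias -(n - 0.5)  -> fires iff all n inputs are 1
from itertools import product

def perceptron(weights, bias, xs):
    s = bias + sum(w * x for w, x in zip(weights, xs))
    return 1 if s >= 0 else -1

def or_gate(xs):
    return perceptron([1.0] * len(xs), -0.5, xs)

def and_gate(xs):
    n = len(xs)
    return perceptron([1.0] * n, -(n - 0.5), xs)

# Exhaustive check for m = n = 3
for xs in product([0, 1], repeat=3):
    assert or_gate(xs)  == (1 if any(xs) else -1)
    assert and_gate(xs) == (1 if all(xs) else -1)
```

For (iii): a CNF formula is an AND of clauses, each clause an OR of literals, so one hidden OR unit per clause feeding a single AND output unit suffices; a negated literal in a clause is handled by giving that input weight −1 and shifting the clause's bias up by 1.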
Data for Question 1:

Training Example   x1   x2   Class
       a            0    1    −1
       b            2    0    −1
       c            1    1    +1

3. Consider a world with two states S = {S1, S2} and two actions A = {a1, a2}, where the transitions δ and reward r for each state and action are as follows:

δ(S1,a1) = S1    δ(S1,a2) = S2    δ(S2,a1) = S2    δ(S2,a2) = S1
r(S1,a1) = 0     r(S1,a2) = −1    r(S2,a1) = +1    r(S2,a2) = +5
(i) Draw a picture of this world, using circles for the states and arrows for the transitions.
(ii) Assuming a discount factor of γ = 0.9, determine:
(a) the optimal policy π∗ : S → A
(b) the value function V∗ : S → R
(c) the Q function Q : S × A → R
(iii) Write the Q values in a table.
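Parts (ii) and (iii) can be checked mechanically by iterating the Bellman optimality update on the Q function, Q(s,a) ← r(s,a) + γ·max_b Q(δ(s,a), b), with the world hard-coded from the question. A sketch (Python):

```python
# Q-function value iteration for the two-state world of Question 3, gamma = 0.9.
gamma = 0.9
delta = {("S1", "a1"): "S1", ("S1", "a2"): "S2",
         ("S2", "a1"): "S2", ("S2", "a2"): "S1"}
reward = {("S1", "a1"): 0, ("S1", "a2"): -1,
          ("S2", "a1"): 1, ("S2", "a2"): 5}

Q = {sa: 0.0 for sa in delta}
for _ in range(1000):   # plenty of sweeps for numerical convergence
    Q = {(s, a): reward[(s, a)]
               + gamma * max(Q[(delta[(s, a)], b)] for b in ("a1", "a2"))
         for (s, a) in delta}

V = {s: max(Q[(s, a)] for a in ("a1", "a2")) for s in ("S1", "S2")}
policy = {s: max(("a1", "a2"), key=lambda a: Q[(s, a)]) for s in ("S1", "S2")}

for sa, q in sorted(Q.items()):
    print(sa, round(q, 3))
print("V* =", {s: round(v, 3) for s, v in V.items()}, "policy =", policy)
```

This agrees with solving the pair of equations V∗(S1) = −1 + 0.9·V∗(S2) and V∗(S2) = 5 + 0.9·V∗(S1) by hand: π∗(S1) = π∗(S2) = a2, with V∗(S1) = 350/19 ≈ 18.42 and V∗(S2) = 410/19 ≈ 21.58.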
(iv) Trace through the first few steps of the Q-learning algorithm, with all Q values initially set to zero. Explain why it is necessary to force exploration through probabilistic choice of actions in order to ensure convergence to the true Q values.
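To see part (iv) concretely, the sketch below runs tabular Q-learning choosing actions uniformly at random (learning rate 1 is valid here because the world is deterministic). With a purely greedy choice instead, the agent can lock onto the first action whose Q estimate becomes positive and never try the alternative, so the other Q values are never updated and the estimates cannot converge to the true values.

```python
import random

# Tabular Q-learning on the Question 3 world, gamma = 0.9.
# Deterministic world, so learning rate 1 suffices:
#   Q(s,a) <- r(s,a) + gamma * max_b Q(s', b)
gamma = 0.9
delta = {("S1", "a1"): "S1", ("S1", "a2"): "S2",
         ("S2", "a1"): "S2", ("S2", "a2"): "S1"}
reward = {("S1", "a1"): 0, ("S1", "a2"): -1,
          ("S2", "a1"): 1, ("S2", "a2"): 5}

random.seed(0)            # reproducible run
Q = {sa: 0.0 for sa in delta}
s = "S1"
for _ in range(20000):
    a = random.choice(("a1", "a2"))   # forced exploration: every action keeps being tried
    s2 = delta[(s, a)]
    Q[(s, a)] = reward[(s, a)] + gamma * max(Q[(s2, b)] for b in ("a1", "a2"))
    s = s2

print({sa: round(q, 3) for sa, q in Q.items()})
```

After many random steps all four estimates settle close to the true Q values (≈ 16.58, 18.42, 20.42, 21.58), precisely because every state–action pair is tried again and again.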