COMP9414: Artificial Intelligence
Tutorial 9: Neural Networks/Reinforcement Learning
1. (i) Construct by hand a perceptron which correctly classifies the following data; use your
knowledge of plane geometry to choose values for the weights w0, w1 and w2.
Training Example x1 x2 Class
a 0 1 −
b 2 0 −
c 1 1 +
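For part (i), one candidate answer (among many) can be checked mechanically. The sketch below assumes the usual convention that the perceptron outputs + when w0 + w1·x1 + w2·x2 > 0, and tries the separating line x1 + 2·x2 = 2.5, i.e. w0 = −2.5, w1 = 1, w2 = 2; any line with example c on one side and a, b on the other would do.

```python
# Check one hand-chosen weight vector for Question 1(i).
# Convention assumed: output is "+" when w0 + w1*x1 + w2*x2 > 0.
examples = {"a": ((0, 1), "-"), "b": ((2, 0), "-"), "c": ((1, 1), "+")}
w0, w1, w2 = -2.5, 1.0, 2.0   # the line x1 + 2*x2 = 2.5

results = {}
for name, ((x1, x2), cls) in examples.items():
    s = w0 + w1 * x1 + w2 * x2
    results[name] = "+" if s > 0 else "-"
    print(name, s, results[name], "target:", cls)
```

Running this shows a and b land at s = −0.5 (class −) and c at s = +0.5 (class +), so the line separates the data correctly.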
(ii) Simulate the perceptron learning algorithm on the above data, using a learning rate of
1.0 and initial weight values of w0 = −0.5, w1 = 0 and w2 = 1. In your answer, clearly
indicate the new weight values at the end of each training step.
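The hand simulation in part (ii) can be cross-checked with a short script. This is a sketch, not the required hand trace; it assumes the convention that the output is +1 when w0 + w1·x1 + w2·x2 > 0 and that a mistake triggers the update w ← w + η·t·(1, x1, x2) with target t ∈ {−1, +1} (check this against the lecture's exact update rule).

```python
# Perceptron learning sketch for Question 1(ii):
# eta = 1.0, initial weights w0 = -0.5, w1 = 0, w2 = 1.
def predict(w, x):
    # threshold unit: +1 if w0 + w1*x1 + w2*x2 > 0, else -1
    s = w[0] + w[1] * x[0] + w[2] * x[1]
    return 1 if s > 0 else -1

def train(data, w, eta=1.0, max_epochs=100):
    for epoch in range(max_epochs):
        mistakes = 0
        for x, t in data:
            if predict(w, x) != t:
                # update on a mistake: w <- w + eta * t * (1, x1, x2)
                w = [w[0] + eta * t,
                     w[1] + eta * t * x[0],
                     w[2] + eta * t * x[1]]
                mistakes += 1
                print(f"after ({x[0]},{x[1]}): w = {w}")
        if mistakes == 0:
            return w   # converged: a full pass with no errors
    return w

data = [((0, 1), -1), ((2, 0), -1), ((1, 1), +1)]  # examples a, b, c
w = train(data, [-0.5, 0.0, 1.0])
```

The printed lines give the new weight values after each training step, which is exactly what the question asks you to record by hand.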
2. Explain how each of the following could be constructed:
(i) Perceptron to compute the OR function of m inputs
(ii) Perceptron to compute the AND function of n inputs
(iii) 2-Layer neural network to compute any (given) logical expression written in CNF
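One standard construction for all three parts, sketched in code so the weight/bias choices can be verified exhaustively. It assumes inputs in {0, 1} and units that fire when w0 + Σ wi·xi > 0; the CNF encoding below (clauses as layer-1 OR-like units, negated literals with weight −1 and the bias shifted by +1 each, an AND unit on top) is one common scheme, not the only one.

```python
def threshold_unit(w0, ws, xs):
    # fires (outputs 1) when w0 + sum(wi * xi) > 0
    return 1 if w0 + sum(w * x for w, x in zip(ws, xs)) > 0 else 0

def OR(xs):
    # (i) OR of m inputs: bias -0.5, each weight 1;
    # any single 1 pushes the sum past zero
    return threshold_unit(-0.5, [1] * len(xs), xs)

def AND(xs):
    # (ii) AND of n inputs: bias 0.5 - n, each weight 1;
    # only all n ones overcome the bias
    return threshold_unit(0.5 - len(xs), [1] * len(xs), xs)

def cnf_network(clauses, xs):
    # (iii) 2-layer network for a CNF expression.
    # Each clause (pos, neg) lists indices of positive and negated literals.
    # Layer 1: one unit per clause, weight +1 on positive literals, -1 on
    # negated ones, bias len(neg) - 0.5, so it fires iff some literal holds.
    hidden = []
    for pos, neg in clauses:
        s = sum(xs[i] for i in pos) - sum(xs[i] for i in neg)
        hidden.append(1 if (len(neg) - 0.5) + s > 0 else 0)
    # Layer 2: AND of the clause outputs
    return AND(hidden)
```

For example, `clauses = [([0], [1]), ([1, 2], [])]` encodes (x1 ∨ ¬x2) ∧ (x2 ∨ x3) over inputs (x1, x2, x3).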
3. Consider a world with two states S = {S1, S2} and two actions A = {a1, a2}, where the
transitions δ and reward r for each state and action are as follows:
δ(S1, a1) = S1 r(S1, a1) = 0
δ(S1, a2) = S2 r(S1, a2) = −1
δ(S2, a1) = S2 r(S2, a1) = +1
δ(S2, a2) = S1 r(S2, a2) = +5
(i) Draw a picture of this world, using circles for the states and arrows for the transitions.
(ii) Assuming a discount factor of γ = 0.9, determine:
(a) the optimal policy π∗ : S → A
(b) the optimal value function V∗ : S → R
(c) the Q function Q : S × A → R for the optimal policy
(iii) Write the Q values in a table.
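Parts (ii) and (iii) are meant to be derived by hand from the Bellman equations, but the answers can be sanity-checked with value iteration on this small deterministic world:

```python
# Value-iteration check for Question 3(ii)-(iii), gamma = 0.9.
GAMMA = 0.9
STATES, ACTIONS = ["S1", "S2"], ["a1", "a2"]
delta = {("S1", "a1"): "S1", ("S1", "a2"): "S2",
         ("S2", "a1"): "S2", ("S2", "a2"): "S1"}
reward = {("S1", "a1"): 0, ("S1", "a2"): -1,
          ("S2", "a1"): 1, ("S2", "a2"): 5}

# iterate V(s) <- max_a [ r(s,a) + gamma * V(delta(s,a)) ] to a fixed point
V = {s: 0.0 for s in STATES}
for _ in range(500):
    V = {s: max(reward[s, a] + GAMMA * V[delta[s, a]] for a in ACTIONS)
         for s in STATES}

# Q(s,a) = r(s,a) + gamma * V*(delta(s,a)); the optimal policy is argmax_a Q
Q = {(s, a): reward[s, a] + GAMMA * V[delta[s, a]]
     for s in STATES for a in ACTIONS}
policy = {s: max(ACTIONS, key=lambda a: Q[s, a]) for s in STATES}
print(policy, V, Q)
```

The fixed point satisfies V∗(S1) = −1 + γ·V∗(S2) and V∗(S2) = 5 + γ·V∗(S1), i.e. V∗(S1) = 3.5/0.19 ≈ 18.42 and V∗(S2) ≈ 21.58, with the optimal policy taking a2 in both states.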
(iv) Trace through the first few steps of the Q-learning algorithm on some randomly chosen
input, with all Q values initially set to zero. Explain why it is necessary for the agent
to explore the environment through probabilistic choice of actions in order to ensure
convergence to the true Q values.
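A sketch of the Q-learning trace in part (iv), assuming the deterministic-world update Q(s, a) ← r + γ·max_a′ Q(s′, a′) and ε-greedy action choice as one form of the probabilistic exploration the question refers to:

```python
# Tabular Q-learning sketch for Question 3(iv): all Q values start at 0.
import random

random.seed(0)
GAMMA, EPSILON = 0.9, 0.5
STATES, ACTIONS = ["S1", "S2"], ["a1", "a2"]
delta = {("S1", "a1"): "S1", ("S1", "a2"): "S2",
         ("S2", "a1"): "S2", ("S2", "a2"): "S1"}
reward = {("S1", "a1"): 0, ("S1", "a2"): -1,
          ("S2", "a1"): 1, ("S2", "a2"): 5}

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
s = "S1"
for step in range(10000):
    if random.random() < EPSILON:
        a = random.choice(ACTIONS)               # explore: random action
    else:
        a = max(ACTIONS, key=lambda a: Q[s, a])  # exploit: current best
    s2 = delta[s, a]
    # deterministic-world update: Q(s,a) <- r + gamma * max_a' Q(s',a')
    Q[s, a] = reward[s, a] + GAMMA * max(Q[s2, a2] for a2 in ACTIONS)
    s = s2
```

Setting EPSILON = 0 illustrates why exploration is necessary: with this tie-breaking, the greedy agent starting in S1 sees Q(S1, a1) = Q(S1, a2) = 0, picks a1, stays in S1 forever, and every Q estimate remains stuck at zero, so the true Q values are never learned.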