2020/8/14 COMP9444 Exercises 5 Solutions
COMP9444 Neural Networks and Deep Learning Term 2, 2020
Solutions to Exercise 5: Hidden Units and Convolution This page was last updated: 06/22/2020 15:47:48
1. Hidden Unit Geometry
Consider a fully connected feedforward neural network with 6 inputs, 2 hidden units and 3 outputs, using tanh activation at the hidden units and sigmoid at the outputs. Suppose this network is trained on the following data, and that the training is successful.
Item   Inputs   Outputs
       123456   123
 1.    100000   000
 2.    010000   001
 3.    001000   010
 4.    000100   100
 5.    000010   101
 6.    000001   110
Draw a diagram showing:
a. for each input, a point in hidden unit space corresponding to that input, and
b. for each output, a line dividing the hidden unit space into regions for which the value of that output is greater/less than one half.
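The original solution presents the answer as a drawn diagram. As a hedged sketch (the initialisation, learning rate, and iteration count are assumptions, not from the course material), the hidden-space points in part (a) can be obtained by training the 6-2-3 network with NumPy and reading off the tanh activations:

```python
# Hedged sketch: train the 6-2-3 network on the table above,
# then print each input's point in the 2-D hidden unit space.
import numpy as np

rng = np.random.default_rng(0)

X = np.eye(6)                                   # the six one-hot inputs
Y = np.array([[0,0,0],[0,0,1],[0,1,0],
              [1,0,0],[1,0,1],[1,1,0]], float)  # target outputs

W1 = rng.normal(0, 0.5, (6, 2)); b1 = np.zeros(2)  # input -> 2 hidden units
W2 = rng.normal(0, 0.5, (2, 3)); b2 = np.zeros(3)  # hidden -> 3 outputs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):                 # plain full-batch gradient descent
    H = np.tanh(X @ W1 + b1)           # tanh hidden activations
    O = sigmoid(H @ W2 + b2)           # sigmoid outputs
    dO = O - Y                         # grad of cross-entropy w.r.t. logits
    dH = (dO @ W2.T) * (1 - H**2)      # backprop through tanh
    W2 -= 0.1 * H.T @ dO; b2 -= 0.1 * dO.sum(0)
    W1 -= 0.1 * X.T @ dH; b1 -= 0.1 * dH.sum(0)

H = np.tanh(X @ W1 + b1)               # one 2-D point per input
print(np.round(H, 2))
```

Plotting the six rows of `H` gives the points for part (a); the three output lines in part (b) are where each sigmoid input `H @ W2 + b2` crosses zero.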
2. Softmax
Recall that the formula for Softmax is Prob(i) = exp(zi) / Σj exp(zj)
Consider a classification task with three classes 1, 2, 3. Suppose a particular input is presented, producing outputs
z1 = 1.0, z2 = 2.0, z3 = 3.0
https://www.cse.unsw.edu.au/~cs9444/20T2/tut/sol/Ex5_Convolution_sol.html 1/3
and that the correct class for this input is Class 2. Compute the following, to two decimal places:
a. Prob(i), for i = 1, 2, 3

Prob(1) = e^1/(e^1 + e^2 + e^3) = 2.718/30.193 = 0.09
Prob(2) = e^2/(e^1 + e^2 + e^3) = 7.389/30.193 = 0.24
Prob(3) = e^3/(e^1 + e^2 + e^3) = 20.086/30.193 = 0.67
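These probabilities can be checked numerically (a sketch using only the standard library):

```python
# Softmax of z = (1.0, 2.0, 3.0), rounded to two decimal places.
import math

z = [1.0, 2.0, 3.0]
exps = [math.exp(zi) for zi in z]      # e^1, e^2, e^3
total = sum(exps)                      # approximately 30.193
probs = [e / total for e in exps]
print([round(p, 2) for p in probs])    # -> [0.09, 0.24, 0.67]
```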
b. d(log Prob(2))/dzi, for i = 1, 2, 3

d(log Prob(2))/dz1 = d(z2 - log Σj exp(zj))/dz1 = -exp(z1)/Σj exp(zj) = -0.09
d(log Prob(2))/dz2 = d(z2 - log Σj exp(zj))/dz2 = 1 - exp(z2)/Σj exp(zj) = 1 - 0.24 = 0.76
d(log Prob(2))/dz3 = d(z2 - log Σj exp(zj))/dz3 = -exp(z3)/Σj exp(zj) = -0.67
Note how the correct class (2) is pushed up, while the incorrect class with the highest activation (3) is pushed down the most.
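The derivation above reduces to the well-known identity that the gradient is the one-hot target minus the softmax output, which can be verified directly:

```python
# Gradient of log Prob(c) w.r.t. each logit:
# d(log Prob(c))/dz_i = (1 if i == c else 0) - Prob(i).
import math

z = [1.0, 2.0, 3.0]
c = 1                                  # correct class 2, zero-indexed
total = sum(math.exp(zi) for zi in z)
probs = [math.exp(zi) / total for zi in z]
grads = [(1.0 if i == c else 0.0) - probs[i] for i in range(3)]
print([round(g, 2) for g in grads])    # -> [-0.09, 0.76, -0.67]
```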
3. Convolutional Network Architecture
One of the early papers on Deep Q-Learning for Atari games (Mnih et al, 2013)
contains this description of its Convolutional Neural Network:
“The input to the neural network consists of an 84 × 84 × 4 image. The first hidden layer convolves 16 8 × 8 filters with stride 4 with the input image and applies a rectifier nonlinearity. The second hidden layer convolves 32 4 × 4 filters with stride 2, again followed by a rectifier nonlinearity. The final hidden layer is fully-connected and consists of 256 rectifier units. The output layer is a fully-connected linear layer with a single output for each valid action. The number of valid actions varied between 4 and 18 on the games we considered.”
For each layer in this network, compute the number of
a. weights per neuron in this layer (including bias)
b. neurons in this layer
c. connections into the neurons in this layer
d. independent parameters in this layer
You should assume the input images are gray-scale, there is no padding, and there are 18 valid actions (outputs).
First Convolutional Layer:
J = K = 84, L = 4, M = N = 8, P = 0, s = 4
weights per neuron: 1 + M × N × L = 1 + 8 × 8 × 4 = 257
width and height of layer: 1 + (J - M)/s = 1 + (84 - 8)/4 = 20
neurons in layer: 20 × 20 × 16 = 6400
connections: 20 × 20 × 16 × 257 = 1644800
independent parameters: 16 × 257 = 4112
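These counts follow a fixed pattern, so they can be generated by a small helper (the function name and argument order are my own, not from the course material):

```python
# Counts for a square convolutional layer with no padding:
# output size 1 + (J - M)/s, weights per neuron 1 + M*M*L.
def conv_layer_counts(J, L, M, s, filters):
    out = 1 + (J - M) // s              # output width and height
    w_per_neuron = 1 + M * M * L        # incl. bias
    neurons = out * out * filters
    connections = neurons * w_per_neuron
    params = filters * w_per_neuron     # weights shared across positions
    return out, w_per_neuron, neurons, connections, params

print(conv_layer_counts(84, 4, 8, 4, 16))  # -> (20, 257, 6400, 1644800, 4112)
```

The same call with `(20, 16, 4, 2, 32)` reproduces the second-layer numbers below.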
Second Convolutional Layer:
J = K = 20, L = 16, M = N = 4, P = 0, s = 2
weights per neuron: 1 + M × N × L = 1 + 4 × 4 × 16 = 257
width and height of layer: 1 + (J - M)/s = 1 + (20 - 4)/2 = 9
neurons in layer: 9 × 9 × 32 = 2592
connections: 9 × 9 × 32 × 257 = 666144
independent parameters: 32 × 257 = 8224

Fully Connected Layer:
weights per neuron: 1 + 2592 = 2593
neurons in layer: 256
connections: 256 × 2593 = 663808
independent parameters: 663808

Output Layer:
weights per neuron: 1 + 256 = 257
neurons in layer: 18
connections: 18 × 257 = 4626
independent parameters: 4626
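The question does not ask for a total, but summing the per-layer parameter counts is a quick sanity check on the arithmetic above:

```python
# Independent parameters per layer: conv1, conv2, fully connected, output.
params = [16 * 257, 32 * 257, 256 * 2593, 18 * 257]
print(params, sum(params))   # -> [4112, 8224, 663808, 4626] 680770
```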