Note: An indicative mark is in front of each question. The total mark is 12. You may mark your own work when we release the solutions.
1. Use the definitions of o and h on slide 10 of Lecture 7 to show that if the activation function is linear, i.e. g(a) = a, then the one-hidden-layer network on that slide encodes a linear relationship between the input x and the output o. Include all steps.
MLAI Week 7 Exercise: Neural Networks
$h = g((W^{(1)})^T x + b^{(1)})$
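$o = g((W^{(2)})^T h + b^{(2)})$
Substituting h into o:
$o = g((W^{(2)})^T g((W^{(1)})^T x + b^{(1)}) + b^{(2)})$
g is defined as $g(a) = a$, so both applications of g can be dropped:
$o = (W^{(2)})^T ((W^{(1)})^T x + b^{(1)}) + b^{(2)}$
$o = (W^{(2)})^T (W^{(1)})^T x + (W^{(2)})^T b^{(1)} + b^{(2)}$
Substitute $W = (W^{(2)})^T (W^{(1)})^T$ and $b = (W^{(2)})^T b^{(1)} + b^{(2)}$:
$o = Wx + b$
Since W and b are constants, o is a linear function of x.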
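The collapse of the two layers into one can also be checked numerically. The sketch below is only a sanity check, not part of the required derivation: the layer sizes (2 inputs, 3 hidden units, 2 outputs), the weight values and the helper names are all made up for illustration and are not taken from the slides.

```python
# Sanity check: with g(a) = a, two stacked layers equal a single linear map.
# All numbers and helper names here are illustrative assumptions.

def transpose(m):
    return [list(row) for row in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

W1 = [[1, 2, 0], [0, -1, 3]]     # inputs x hidden units (2 x 3)
b1 = [1, 0, -2]
W2 = [[2, 1], [0, 1], [-1, 4]]   # hidden units x outputs (3 x 2)
b2 = [5, -1]
x = [3, -2]

# Layer by layer, with the identity activation g(a) = a.
h = add(matvec(transpose(W1), x), b1)
o_two_layer = add(matvec(transpose(W2), h), b2)

# Collapsed form: W = (W2)^T (W1)^T, b = (W2)^T b1 + b2.
W = matmul(transpose(W2), transpose(W1))
b = add(matvec(transpose(W2), b1), b2)
o_linear = add(matvec(W, x), b)

print(o_two_layer, o_linear)  # [21, -21] [21, -21]
```

Integer values are used so that the two results can be compared exactly.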
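2. In Slide 38, we change the 3×3 kernel to
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
What will be the 3×3 convolved features? What features can this kernel detect?
\[
\text{Image} = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}
\]
(Only the first four rows of the image are legible in this copy.)
\[
\text{Kernel} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]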
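Apply the kernel to the image in the first position. Each entry covered by the kernel is written as image value × kernel value:
\[
\text{Image} = \begin{bmatrix} 1{\times}1 & 1{\times}0 & 1{\times}0 & 0 & 0 \\ 0{\times}0 & 1{\times}1 & 1{\times}0 & 1 & 0 \\ 0{\times}0 & 0{\times}0 & 1{\times}1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}
\]
The convolved feature for the first position is the sum of all the products between the image and the kernel values:
First convolved feature = 1 + 0 + 0 + 0 + 1 + 0 + 0 + 0 + 1 = 3
\[
\text{Convolved features} = \begin{bmatrix} 3 & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix}
\]
(Dots mark entries not yet computed.)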
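The next step is to shift the kernel one column to the right and perform the same operation:
\[
\text{Image} = \begin{bmatrix} 1 & 1{\times}1 & 1{\times}0 & 0{\times}0 & 0 \\ 0 & 1{\times}0 & 1{\times}1 & 1{\times}0 & 0 \\ 0 & 0{\times}0 & 1{\times}0 & 1{\times}1 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}
\]
Calculate the new convolved feature:
\[
\text{Convolved features} = \begin{bmatrix} 3 & 3 & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix}
\]
Perform the same operation for the entire image, resulting in the final convolved features. The first two rows are:
\[
\text{Convolved features} = \begin{bmatrix} 3 & 3 & 3 \\ 1 & 3 & 2 \\ \cdot & \cdot & \cdot \end{bmatrix}
\]
(The third row is not legible in this copy.)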
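The kernel responds most strongly where the image has ones along the top-left to bottom-right diagonal, so it can detect diagonal edges in that direction.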
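The sliding-window computation can be sketched in a few lines of Python. This is a minimal illustration, assuming stride 1 and no padding; the function name `convolve_valid` is hypothetical. It uses only the four image rows that appear in the worked steps, which is enough to reproduce the first two rows of the convolved features.

```python
def convolve_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and,
    at each position, sum the element-wise products."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]

# Rows 1 to 4 of the image, as they appear in the worked steps.
image_rows = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
]
kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

features = convolve_valid(image_rows, kernel)
print(features)  # [[3, 3, 3], [1, 3, 2]]
```

Each output value counts how many ones lie on the main diagonal of the 3×3 window, which is why the maximum response of 3 appears where the image has a full diagonal run of ones.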
3 3. We have a 512×512×3 colour image. We apply 100 5×5 filters with stride 7 and padding 2 to obtain a convolution output. What is the output volume size? How many parameters are needed for such a layer?
Size of output:
Output Size = (Image Length − Filter Size + 2 × Padding) / Stride + 1
Image Length = 512
Filter Size = 5
Stride = 7
Padding = 2
After applying the first 5 × 5 filter:
Output Size After First Filter = (512 − 5 + 2 × 2)/7 + 1 = 74
Final Output Shape = Number of Filters × Output Size × Output Size
Final Output Shape = 100 × 74 × 74
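The output-size formula above can be written as a small helper. This is a sketch: the function name `conv_output_size` is hypothetical, and the floor division is an assumption about how a non-integer result would be handled (it makes no difference here, since 511 is divisible by 7).

```python
def conv_output_size(image_length, filter_size, padding, stride):
    """Output length along one spatial dimension of a convolution."""
    # Floor division is an assumption for the non-divisible case.
    return (image_length - filter_size + 2 * padding) // stride + 1

size = conv_output_size(image_length=512, filter_size=5, padding=2, stride=7)
num_filters = 100
output_shape = (num_filters, size, size)
print(output_shape)  # (100, 74, 74)
```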
Number of parameters:
Number of parameters = (Filter Width × Filter Height × Filters in Previous Layer + 1) × Number of Filters
Filter Width = 5
Filter Height = 5
Filters in Previous Layer = 3
Number of Filters = 100
Number of parameters = (5×5×3+1)×100 = 7600
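The same count can be reproduced with a one-line helper. This is a sketch with a hypothetical function name; the "+ 1" is the single bias attached to each filter.

```python
def conv_layer_params(filter_width, filter_height, in_channels, num_filters):
    """Weights plus one bias per filter."""
    return (filter_width * filter_height * in_channels + 1) * num_filters

params = conv_layer_params(filter_width=5, filter_height=5,
                           in_channels=3, num_filters=100)
print(params)  # 7600
```

The count does not depend on the image size, the stride or the padding, because the same filter weights are reused at every position.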
6 4. For the AlexNet depicted in Slide 35 of Lecture 6, there are about 60 million learnable parameters. With the help of the illustration https://static.packt-cdn.com/products/9781789956177/graphics/assets/ec08175c-5282-4be2-b6e7-6f2d99272166.png, compute the exact number of learnable parameters in AlexNet, showing the steps.
The AlexNet consists of convolutional layers, pooling layers and fully connected layers. The pooling layer does not have any learnable parameters.
The number of weights in a convolutional layer is:
$W_c = K^2 \times C \times N$, (1)
where K is the size of the kernel, C is the number of channels in the input and N is the number of kernels. In addition to the weights, there are also N bias values. The final number of parameters is $P_c = N + W_c$.
There are also two types of fully connected (FC) layer: the first is where the last pooling layer is connected to a FC layer, and the other is where a FC layer is connected to another FC layer.
The number of weights in the first case is:
$W_{fc} = O^2 \times N \times F$, (2)
where O is the size of the convolved output, N is the number of kernels in the previous convolutional layer and F is the number of neurons in the layer. The convolved output is flattened to a vector of length $O \times O \times N$. In addition to the weights, there are also F bias values. The total number of parameters in this layer is $P_{fc} = F + W_{fc}$.
In the case where an FC layer is connected to another FC layer:
$W_{fc} = F_{-1} \times F$, (3)
where $F_{-1}$ is the number of neurons in the previous layer and F is the number of neurons in the current layer. The total number of parameters in this layer is $P_{fc} = F + W_{fc}$.
For example, in the first layer:
$P_1 = 11^2 \times 3 \times 96 + 96 = 34944$. (4)
The second layer:
$P_2 = 5^2 \times 96 \times 256 + 256 = 614656$. (5)
After performing the appropriate operations at each layer, the total number of parameters in AlexNet is 62,378,344. Parameters in each layer:
• Conv Layer 1: 34944
• Conv Layer 2: 614656
• Conv Layer 3: 885120
• Conv Layer 4: 1327488
• Conv Layer 5: 884992
• FC layer 1: 37752832
• FC layer 2: 16781312
• FC layer 3: 4097000
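The per-layer counts can be reproduced with the three formulas above. This is a sketch under stated assumptions: the function names are hypothetical, and the layer dimensions for Conv Layers 3 to 5 and the FC layers (3×3 kernels, 384/384/256 kernels, a 6×6×256 pooled output, 4096/4096/1000 neurons) are the commonly quoted AlexNet sizes rather than values stated in the text. They do reproduce every count in the list.

```python
def conv_params(k, c, n):
    """K^2 * C * N weights plus N biases."""
    return k * k * c * n + n

def fc_after_conv_params(o, n, f):
    """O^2 * N * F weights plus F biases (flattened conv output to FC)."""
    return o * o * n * f + f

def fc_params(f_prev, f):
    """F_prev * F weights plus F biases (FC to FC)."""
    return f_prev * f + f

# Layer sizes beyond Conv Layers 1 and 2 are assumed (standard AlexNet).
layers = {
    "Conv Layer 1": conv_params(11, 3, 96),
    "Conv Layer 2": conv_params(5, 96, 256),
    "Conv Layer 3": conv_params(3, 256, 384),
    "Conv Layer 4": conv_params(3, 384, 384),
    "Conv Layer 5": conv_params(3, 384, 256),
    "FC layer 1": fc_after_conv_params(6, 256, 4096),
    "FC layer 2": fc_params(4096, 4096),
    "FC layer 3": fc_params(4096, 1000),
}

for name, count in layers.items():
    print(f"{name}: {count}")

total = sum(layers.values())
print("Total:", total)  # Total: 62378344
```

The first fully connected layer alone accounts for more than half of the total, which is why the overall figure sits close to 60 million.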