CPSC 425 Neural Networks 20/21 (Term 21) Practice Questions
Multiple Part True/False Questions. For each question, indicate which of the statements, (A)–(D), are true and which are false. Note: Questions may have zero, one, or multiple statements that are true.
Question 1. Which of the following statements about max-pooling are true? Which are false? [Adopted from Stanford’s CS230]
(A) It allows a neuron in a network to have information about features in a larger part of the image, compared to a neuron at the same depth in a network without max pooling.
(B) It increases the number of parameters when compared to a similar network without max pooling.
(C) It increases the sensitivity of the network towards the position of features within an image.
Question 2. Consider a simple convolutional neural network with one convolutional layer. Which of the following statements is true about this network? [Adopted from Stanford’s CS230]
(A) It is scale invariant.
(B) It is rotation invariant.
(C) It is translation invariant.
(D) All of the above.
Short Answer Questions.
Question 3. What are the two passes of Backpropagation and their roles?
Question 4. Recall that using stochastic gradient descent the parameters are updated using the following equation: $W_{i,j,k} = W_{i,j,k} - \lambda \frac{\partial L(f(x_i), y_i)}{\partial W_{i,j,k}}$, where $W_{i,j,k}$ is an element of the weight matrix (or, more broadly, a parameter of the neural net), $f(x_i)$ is the mapping function that the neural network implements applied to sample $x_i$, $y_i$ is the ground truth label for this sample, and $L$ is the loss. What is $\lambda$ called? What would you expect to happen if you set $\lambda$ to be small? Large?
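The update rule in Question 4 can be tried out on a toy problem. The following is a minimal sketch, not part of the original question: the loss L(w) = w² (whose gradient is 2w), the function names, and the starting value are all assumptions made for illustration.

```python
def sgd_step(w, grad, lam):
    """Apply one update of the form w <- w - lam * dL/dw."""
    return w - lam * grad


def run_sgd(lam, w0=5.0, steps=20):
    """Minimise the toy loss L(w) = w**2 starting from w0 (hypothetical setup)."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w  # analytic gradient of w**2
        w = sgd_step(w, grad, lam)
    return w


if __name__ == "__main__":
    # Try several values of lam and compare how close w gets to the minimum at 0.
    for lam in (0.001, 0.1, 0.5, 1.1):
        print(lam, run_sgd(lam))
```

Running `run_sgd` with a range of `lam` values is one way to explore the second part of the question empirically before answering it.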
Question 5. You want to map every possible image of size 64 × 64 to a binary category (cat or non-cat). Each image has 3 channels and each pixel in each channel can take an integer value between (and including) 0 and 255. How many bits do you need to represent this mapping? [Adopted from Stanford’s CS230]
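The counting argument behind Question 5 can be checked on a much smaller case. The sketch below assumes the mapping is stored as an explicit lookup table with one entry per possible image; the function name and its parameters are hypothetical, and the example uses a tiny 2 × 2 binary image rather than the sizes in the question.

```python
def lookup_table_bits(height, width, channels, levels, bits_per_entry=1):
    """Bits needed to store one label per possible image in a lookup table.

    Each of the height*width*channels values can independently take
    `levels` distinct values, so there are levels**(H*W*C) possible images.
    """
    n_images = levels ** (height * width * channels)
    return n_images * bits_per_entry


if __name__ == "__main__":
    # 2 x 2 single-channel image with binary pixels: 2**4 = 16 possible images.
    print(lookup_table_bits(2, 2, 1, levels=2))  # 16
```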
Question 6. The mapping from the previous question clearly cannot be stored in memory. Instead, it is typically encoded using a neural network classifier. Recall the simple single hidden layer classifier that was illustrated in class, containing an input layer, a hidden layer and an output layer. Assume we use a single hidden layer of size 100 for this task.
(a) What is the size of the weight matrices W1 and W2?
(b) What about the bias vectors b1 and b2?
(c) How many learnable parameters are there?
(d) How many bits do you need to store your two layer neural network (you may ignore the biases b1 and b2)? [Adopted from Stanford’s CS230]
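Shapes and parameter counts for a single hidden layer network follow mechanically from the layer sizes. The sketch below assumes the convention h = W1 x + b1 and y = W2 h + b2, so W1 has shape (hidden, input); the in-class convention may be the transpose. The function names and the example sizes (784 inputs, 50 hidden units, 10 outputs) are hypothetical and deliberately differ from the question.

```python
from math import prod


def mlp_shapes(n_in, n_hidden, n_out):
    """Parameter shapes of a single hidden layer network (assumed convention)."""
    return {
        "W1": (n_hidden, n_in),
        "b1": (n_hidden,),
        "W2": (n_out, n_hidden),
        "b2": (n_out,),
    }


def count_params(shapes):
    """Total number of scalar parameters across all listed tensors."""
    return sum(prod(shape) for shape in shapes.values())


if __name__ == "__main__":
    shapes = mlp_shapes(784, 50, 10)
    print(shapes)
    print(count_params(shapes))  # 39200 + 50 + 500 + 10 = 39760
```

Multiplying the parameter count by the number of bits per stored value (for example 32 for single-precision floats, which is an assumption about the storage format) gives the storage cost.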
Question 7(a). Consider the problem of detecting whether there is a car in any given grayscale satellite image of size 1024 × 1024 pixels. Considering that the largest instance of a car that can appear in such an image is 11 × 11 pixels, design the simplest two-layer neural network consisting of one Convolutional and one layer. What would be a reasonable design? How many learnable parameters would such a network have?
Question 7(b). Now consider that you have access to a library that ONLY supports 3 × 3 × 1 convolutional filters. How could you implement the architecture that could conceptually have the same behavior? How many parameters would such an architecture have?
Question 8. Consider an intermediate convolutional layer within a CNN which receives as input the output of the previous layer, of size 512 × 512 × 128. Assume that the convolutional layer we are considering applies 48 filters of size 7 × 7 at stride 2 with no padding. What is the number of parameters that need to be learned in this layer? What is the size of the resulting activation map?
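The two quantities asked for in Question 8 follow from standard formulas, sketched below under stated assumptions: square inputs and filters, each filter spanning all input channels, one bias per filter, and the output size rounded down when the stride does not divide evenly (a common library convention, though others exist). The function names are hypothetical, and the usage example uses different numbers from the question.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution on an n x n input with f x f filters."""
    return (n + 2 * pad - f) // stride + 1


def conv_params(f, in_channels, n_filters, bias=True):
    """Learnable parameters: each filter has f*f*in_channels weights plus an optional bias."""
    per_filter = f * f * in_channels + (1 if bias else 0)
    return n_filters * per_filter


if __name__ == "__main__":
    # Example: 32 x 32 x 3 input, ten 5 x 5 filters, stride 1, padding 2.
    out = conv_output_size(32, 5, stride=1, pad=2)
    print(out, out, 10)           # 32 32 10
    print(conv_params(5, 3, 10))  # 10 * (5*5*3 + 1) = 760
```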
Question 9. Let's say you want to re-implement the template matching assignment on face detection using a neural network. What architectural design choices would you make and how would you set up the parameters? What would be the key difference between your original filtering implementation and the neural network implementation?
Question 10. Consider a convolution layer. The input consists of 6 feature maps of size 20 × 20. The output consists of 8 feature maps, and the filters are of size 5 × 5. The convolution is done with a stride of 2 and with zero padding, so the output feature maps are of size 10 × 10. [Adopted from UofT’s CSC321]
(A) Determine the number of weights in this convolution layer.
(B) Now suppose we made this a fully connected layer, but where the number of input and output units are kept the same as in the network described above. Determine the number of weights in this layer.
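The comparison in Question 10 comes down to weight sharing: a convolution reuses the same filter weights at every spatial position, while a fully connected layer has a separate weight for every input-output pair. The sketch below counts weights only (biases are excluded, as the question asks about weights). The function names are hypothetical and the example sizes differ from the question.

```python
from math import prod


def conv_weights(f, in_maps, out_maps):
    """Weights in a conv layer: one f x f x in_maps filter per output map."""
    return f * f * in_maps * out_maps


def fc_weights(in_shape, out_shape):
    """Weights in a fully connected layer linking every input unit to every output unit."""
    return prod(in_shape) * prod(out_shape)


if __name__ == "__main__":
    # Example: 3 input maps of 8 x 8, 4 output maps of 4 x 4, 3 x 3 filters.
    print(conv_weights(3, 3, 4))               # 3*3*3*4 = 108
    print(fc_weights((3, 8, 8), (4, 4, 4)))    # 192 * 64 = 12288
```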