Tutorial Questions | Week 3
COSC2779 – Deep Learning
This tutorial is aimed at reviewing feed-forward neural networks. Please try the questions before you join the session.
1. For a fully-connected deep network with one hidden layer, increasing the number of hidden units should
have what effect on bias and variance?
Solution: Adding more hidden units should decrease bias and increase variance. In general, more
complicated models will result in lower bias but larger variance, and adding more hidden units certainly
makes the model more complex.
2. Consider the following one hidden layer network.

[Figure: inputs $x_1$ and $x_2$ feed a single hidden unit $a^{[1]}$ through weights $w^{(1)}_1$ and $w^{(1)}_2$; the hidden unit feeds the output $\hat{y}$ through weight $w^{(2)}_1$.]

$a^{[1]} = \sigma_1\left( w^{(1)}_1 x_1 + w^{(1)}_2 x_2 + b^{(1)} \right)$

$y = \sigma_2\left( w^{(2)}_1 a^{[1]} + b^{(2)} \right)$
Show that if σ1 is linear, the network can be represented by a one-layer perceptron.
Solution: Linear activation means σ1(z) = αz for some constant α. Substituting into the output expression:

$y = \sigma_2\left( w^{(2)}_1 a^{[1]} + b^{(2)} \right)$

$y = \sigma_2\left( w^{(2)}_1 \alpha \left( w^{(1)}_1 x_1 + w^{(1)}_2 x_2 + b^{(1)} \right) + b^{(2)} \right)$

$y = \sigma_2\left( \alpha w^{(2)}_1 w^{(1)}_1 x_1 + \alpha w^{(2)}_1 w^{(1)}_2 x_2 + \alpha w^{(2)}_1 b^{(1)} + b^{(2)} \right)$

If we define

$\omega_1 = \alpha w^{(2)}_1 w^{(1)}_1, \qquad \omega_2 = \alpha w^{(2)}_1 w^{(1)}_2, \qquad \beta = \alpha w^{(2)}_1 b^{(1)} + b^{(2)}$

then

$y = \sigma_2\left( \omega_1 x_1 + \omega_2 x_2 + \beta \right)$,

which is exactly a one-layer perceptron with weights ω1, ω2 and bias β.
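This collapse can also be confirmed numerically. The NumPy sketch below is illustrative only: the weights, biases, slope α and inputs are arbitrary assumed values, and the output activation is assumed to be a sigmoid.

```python
import numpy as np

sigma2 = lambda z: 1.0 / (1.0 + np.exp(-z))   # assumed output activation
alpha = 2.0                                   # slope of the linear hidden activation sigma_1(z) = alpha*z
w1_1, w1_2, b1 = 0.5, -0.3, 0.1               # hidden-layer weights and bias (w^(1), b^(1))
w2_1, b2 = 1.5, -0.2                          # output-layer weight and bias (w^(2), b^(2))
x1, x2 = 0.7, -1.2                            # an arbitrary input

# Two-layer network with a linear hidden activation
a1 = alpha * (w1_1 * x1 + w1_2 * x2 + b1)
y_two_layer = sigma2(w2_1 * a1 + b2)

# Collapsed one-layer perceptron with the derived weights
omega1 = alpha * w2_1 * w1_1
omega2 = alpha * w2_1 * w1_2
beta = alpha * w2_1 * b1 + b2
y_one_layer = sigma2(omega1 * x1 + omega2 * x2 + beta)

print(np.isclose(y_two_layer, y_one_layer))   # True
```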
3. You want to map every possible image of size 64 × 64 to a binary category (cat or non-cat). Each image
has 3 channels and each pixel in each channel can take an integer value between (and including) 0 and 255.
How many bits do you need to represent this mapping?
Solution: There are $256^{3 \times 64 \times 64}$ possible images, and the mapping assigns one bit (cat or non-cat) to each, so $256^{3 \times 64 \times 64}$ bits are needed.
4. The mapping from question (3) clearly cannot be stored in memory. Instead, you will build a classifier to do this mapping. You will use a single hidden layer of size 100 for this task. Each weight in the two weight matrices can be represented in memory using a float of size 64 bits. How many bits do you need to store your two-layer neural network?

Solution: The hidden layer has $100 \times (64 \times 64 \times 3 + 1)$ parameters (one weight per input plus a bias for each of the 100 units) and the output layer has $100 + 1$. Storing each parameter in 64 bits therefore requires $64 \left( 100 (64 \times 64 \times 3 + 1) + (100 + 1) \right)$ bits.
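The counts in questions (3) and (4) can be checked with a few lines of Python; the layer sizes below come from the questions, and the rest is plain arithmetic.

```python
# Sanity check of the counting in questions 3 and 4.
n_inputs = 64 * 64 * 3        # 64x64 pixels x 3 channels = 12288
n_hidden = 100
bits_per_float = 64

# Q3: one bit of output per possible image; Python big integers handle this exactly.
n_images = 256 ** n_inputs
print(f"mapping needs 256^{n_inputs} bits, roughly 10^{len(str(n_images)) - 1}")

# Q4: hidden layer (weights + bias per unit) plus output layer (weights + bias).
n_params = n_hidden * (n_inputs + 1) + (n_hidden + 1)   # 1,229,001 parameters
print("bits to store the network:", bits_per_float * n_params)   # 78,656,064
```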
5. One of the difficulties with the logistic activation function is that of saturated units. Briefly explain the
problem, and whether switching to tanh fixes the problem. Recall:
$\sigma(z) = \dfrac{1}{1 + \exp(-z)} \qquad \tanh(z) = \dfrac{\exp(z) - \exp(-z)}{\exp(z) + \exp(-z)}$
Solution: No, switching to tanh does not fix the problem. The derivative of σ(z) is small for large negative or positive z, so a saturated unit passes back almost no gradient and its incoming weights barely update. The same problem persists in tanh(z): both functions have a sigmoidal shape and saturate for large |z|. Indeed, tanh is effectively a scaled and translated sigmoid: tanh(z) = 2σ(2z) − 1.
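Both the vanishing derivatives and the scaled-sigmoid identity can be checked with a short NumPy sketch; the sample points below are arbitrary.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
d_sigmoid = lambda z: sigmoid(z) * (1.0 - sigmoid(z))   # derivative of the logistic function
d_tanh = lambda z: 1.0 - np.tanh(z) ** 2                # derivative of tanh

z = np.array([0.0, 2.0, 5.0, 10.0])
print(d_sigmoid(z))   # approx [0.25, 0.105, 6.6e-03, 4.5e-05] -> vanishes as |z| grows
print(d_tanh(z))      # approx [1.0, 0.071, 1.8e-04, 8.2e-09]  -> same saturation problem
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))  # True: tanh(z) = 2*sigmoid(2z) - 1
```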
6. You are asked to develop a NN to identify whether a given image contains a cat and/or a dog. Note that some images may contain both a cat and a dog. What would be a possible output activation and loss function?

Solution:

Option 1: Use four output units (none, cat, dog, both) with a softmax activation and the categorical cross-entropy loss.

Option 2: Use two sigmoid-activated output units (one per animal) with a combined loss given by the sum of the two binary cross-entropy terms.
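A rough Keras sketch of Option 2 is shown below; it assumes the 64 × 64 × 3 images from questions (3)–(4) and reuses the 100-unit hidden layer purely for illustration.

```python
import tensorflow as tf

# Multi-label head: two independent sigmoid units, one per animal.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(2, activation='sigmoid'),  # y[0] = P(cat), y[1] = P(dog)
])

# Keras' built-in binary cross-entropy averages the two per-unit terms rather than
# summing them; this only rescales the loss by a constant and has the same optimum.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy'])

# Targets are independent {0,1} labels per class, e.g. an image with both animals
# has target [1, 1].
```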