Sample Questions

Question 1
With the following code:

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [5, 6]])

B is:

A. A numpy matrix
B. An ordinary Python list (of lists)
C. A numpy array
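
(A quick check you can run yourself; this snippet is an addition, not part of the original question.)

import numpy as np
B = np.array([[5, 6], [5, 6]])
print(type(B))   # <class 'numpy.ndarray'> -- a numpy array, not np.matrix or a list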

Question 2
With the above code and
A*B

the result is

A. Concatenation of the two lists
B. A matrix product matrix([[15, 18], [35, 42]])
C. An elementwise product of the matrix elements: array([[ 5, 12], [15, 24]])
D. TypeError: can't multiply sequence by non-int of type 'list'

Hint: you need to be familiar with the difference between elementwise multiplication and matrix
multiplication.
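
To see the difference concretely, here is a small runnable check (an addition, not part of the original questions):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [5, 6]])

print(A * B)   # elementwise product: [[ 5 12], [15 24]]
print(A @ B)   # matrix product:      [[15 18], [35 42]]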

Question 3
Can deep neural networks be trained in an unsupervised way?
Yes
No

Hint: what is unsupervised learning? -> what is the difference between unsupervised and supervised
learning? -> What kinds of algorithms are trained in unsupervised ways? -> Can we apply them to a
DNN?
Note: generative models can be considered a kind of self-supervised learning (unsupervised learning); see the autoencoder sketch below.
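
One concrete example is an autoencoder, which trains a network without labels by reconstructing its own input. A minimal numpy sketch (an illustration added here, not from the original document):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))             # unlabeled data: 100 samples, 8 features
W1 = rng.normal(scale=0.1, size=(8, 3))   # encoder weights (8 -> 3)
W2 = rng.normal(scale=0.1, size=(3, 8))   # decoder weights (3 -> 8)

for step in range(500):
    Z = X @ W1                   # encode to a 3-dimensional code
    Xhat = Z @ W2                # decode back to 8 dimensions
    E = Xhat - X                 # reconstruction error -- no labels needed
    loss = (E ** 2).mean()       # unsupervised objective: reconstruct the input
    gXhat = 2 * E / E.size       # gradient of the mean squared loss
    gW2 = Z.T @ gXhat
    gW1 = X.T @ (gXhat @ W2.T)
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2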

Question 4
There are exactly six fish tanks in a room of the aquarium. The six tanks contain the
following numbers of fish:
x1 = 5, x2 = 5, x3 = 8, x4 = 12, x5 = 15, x6 = 18. The variance of the population is

A. 10.5
B. 24.25
C. 29.1
D. 145.5
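
As a quick sanity check (an addition, not part of the original question): the population variance divides by n, while the sample variance divides by n − 1.

import numpy as np

x = np.array([5, 5, 8, 12, 15, 18])
print(x.mean())            # 10.5
print(np.var(x))           # 24.25  (population variance, divides by n)
print(np.var(x, ddof=1))   # 29.1   (sample variance, divides by n - 1)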

Question 5

Page 2 of 3

Given s as the input of the Tanh function, when would the Tanh function lead to the vanishing
gradient problem?

A. s → 1
B. s → 0
C. s → −∞
D. s → e

Hint: go back and check the figure of Tanh, and think about what the vanishing gradient problem is.
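
For reference (an addition to the hint): the gradient of Tanh is tanh'(s) = 1 − tanh²(s), which goes to 0 wherever Tanh saturates. A quick numerical check:

import numpy as np
s = np.array([-10.0, 0.0, 10.0])
print(1 - np.tanh(s) ** 2)   # approximately [0, 1, 0]: the gradient vanishes as |s| grows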

Question 6
Suppose we have trained a neural network (ReLU on the hidden layer) for a 3-category
classification problem, in which the weights and biases are:

[weight and bias matrices given as a figure in the original]

Consider a test example [1, −1, 2, −3]^T.

1) What is the output ŷ of the network?
2) The ground-truth output is [0, 1, 0]^T. Given the squared loss function (1/2)‖y − ŷ‖², what is the
prediction error of this test example?
3) Given a softmax layer before y, what is the output of the softmax layer?
4) Given a softmax layer before y, what is the final cross-entropy loss of this test example?

Hint: follow the forward propagation to get the output. Note: the final layer also contains the
activation function.

1) [0.12, 0.00, 0.99]^T
2) 0.99
3) [0.2378, 0.1947, 0.5675]^T
4) 1.6363
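
The computation can be reproduced with a short numpy sketch. The parameters W1, b1, W2, b2 below are placeholders (the actual values are in the original figure), and the layer sizes are assumed:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max())   # subtract the max for numerical stability
    return e / e.sum()

# Placeholder parameters -- substitute the matrices from the original figure.
W1 = np.zeros((3, 4)); b1 = np.zeros(3)   # hidden layer: 4 inputs -> 3 units (assumed sizes)
W2 = np.zeros((3, 3)); b2 = np.zeros(3)   # output layer: 3 hidden units -> 3 classes

x = np.array([1.0, -1.0, 2.0, -3.0])      # test example
y = np.array([0.0, 1.0, 0.0])             # ground-truth output

h = relu(W1 @ x + b1)                     # hidden layer with ReLU
y_hat = relu(W2 @ h + b2)                 # output; the final layer also has the activation
loss = 0.5 * np.sum((y - y_hat) ** 2)     # squared loss (1/2)||y - y_hat||^2

p = softmax(W2 @ h + b2)                  # with a softmax layer before y instead
ce = -np.log(p[np.argmax(y)])             # cross-entropy against the one-hot ground truth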

Question 7
(a) Can we initialize a deep neural network with all zeros for optimization? What is the reason?
What would be a better way to initialize a deep neural network?
(b) There are many popular CNN architectures, such as AlexNet, VGG, GoogLeNet, and ResNet.
Choose at least two CNN architectures that you are familiar with. Introduce their major
characteristics, and explain how these characteristics lead to performance improvement of the
networks.

Hint: (a) if we zero-initialize the network, all weights and biases will be zero. -> what
happens if we multiply by zero and then add zero? (check week 2's slides, pages 34-46)
(b) explain and compare their model designs (major characteristics), and discuss why these designs
could lead to performance improvements (check week 7's slides).
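
A small numpy sketch of the symmetry problem with zero initialization (an illustration, not from the slides): every hidden unit computes the same value, so every unit receives the same gradient and they never differentiate.

import numpy as np

x = np.array([1.0, -2.0, 0.5])
W = np.zeros((4, 3))              # zero-initialized hidden layer
b = np.zeros(4)

h = np.tanh(W @ x + b)            # every hidden unit outputs the same value
print(h)                          # [0. 0. 0. 0.] -- identical units, identical gradients

# A better way: small random values, e.g. Xavier/Glorot-style scaling
W = np.random.randn(4, 3) * np.sqrt(1.0 / 3)
print(np.tanh(W @ x + b))         # units now start out different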


Question 8
There are many nonlinear activation functions, such as Sigmoid, Tanh, Hardtanh, ReLU, and leaky
ReLU.
(a) Give the equations of at least three nonlinear activation functions and plot their curves.
(b) Give the derivatives (w.r.t. x) of the activation functions chosen in (a).
(c) What are the disadvantages of Sigmoid? What are the disadvantages of ReLU?

Hint: check the slides of week 3, pp. 30-34. (a) Select three activation functions you prefer, and
then write out their equations. (c) check the figures of these activation functions. Sigmoid (non-negative output),
ReLU (non-negative output, dead ReLU -> leaky ReLU).
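
As a starting point for (a) and (b), here is a sketch (an addition, not from the slides) of three common activation functions and their derivatives:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))          # sigmoid(x) = 1 / (1 + e^(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)                   # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))

def d_tanh(x):
    return 1 - np.tanh(x) ** 2           # tanh'(x) = 1 - tanh(x)^2

def relu(x):
    return np.maximum(0, x)              # ReLU(x) = max(0, x)

def d_relu(x):
    return (x > 0).astype(float)         # ReLU'(x) = 1 if x > 0 else 0

x = np.linspace(-5, 5, 200)
for f, df, name in [(sigmoid, d_sigmoid, "Sigmoid"),
                    (np.tanh, d_tanh, "Tanh"),
                    (relu, d_relu, "ReLU")]:
    plt.plot(x, f(x), label=name)
    plt.plot(x, df(x), "--", label=name + " derivative")
plt.legend()
plt.show()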

Question 9
Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) are popular techniques in
deep learning.
(a) Briefly introduce the major structures of RNN and LSTM.
(b) Why can LSTM solve the vanishing gradient problem?
(c) What advantages does LSTM have over convolutional layers for processing sequential data?

Hint: (a) describe the model design of both RNN and LSTM (RNN has memory to store the output of the
hidden layer; LSTM has memory and gates).
(b) what is the problem of RNN? Why can RNN lead to the vanishing gradient problem? -> LSTM uses an
additive gradient structure to replace the gradient multiplication in RNN (check week 8's slides,
p. 33).
(c) sequential data -> requires analysing the relationship of the current input with previous data. LSTM
has memory to store previous information. A sketch of a single LSTM step follows below.
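
For reference, a minimal numpy sketch of one LSTM step (an illustration added here; the notation follows the standard formulation, not a specific slide):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W stacks the four gate weight matrices."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate cell state
    c = f * c_prev + i * g    # additive cell update: gradients flow through f * c_prev
    h = o * np.tanh(c)        # hidden state passed to the next step
    return h, c

# Tiny usage example with random parameters (hypothetical sizes)
d_in, d_h = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * d_h, d_in + d_h))
b = np.zeros(4 * d_h)
h = c = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):   # a sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, b)    # the cell state c carries memory across steps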