COMP9444 Neural Networks and Deep Learning Quiz 5 (Recurrent Networks)
This is an optional quiz to test your understanding of Recurrent Networks from Week 5.
1. Explain the format and method by which input was fed to the NetTalk system, and the target output.
Characters were fed to NetTalk using a sliding window approach. The characters in a 7-character window were encoded with a 1-hot encoding to form an input of size 7 × 29. The network had 26 outputs – each corresponding to a letter of the phonetic alphabet. The target output was the correct pronunciation of the central character in the window.
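As an illustration (not the original NetTalk code), here is a minimal NumPy sketch of the sliding-window, 1-hot input encoding described above. The exact composition of the 29-symbol alphabet and the padding at the edges of the text are assumptions made for this example.

```python
# Minimal sketch of a 7-character sliding-window, 1-hot encoding.
# The 29-symbol alphabet (26 letters plus three extra symbols) is an assumption.
import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + [" ", ".", ","]  # assumed 29 symbols
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}
WINDOW = 7  # central character plus 3 characters of context on each side

def encode_window(text, centre):
    """1-hot encode the 7-character window centred on position `centre`."""
    half = WINDOW // 2
    x = np.zeros((WINDOW, len(ALPHABET)))
    for k, pos in enumerate(range(centre - half, centre + half + 1)):
        ch = text[pos] if 0 <= pos < len(text) else " "  # pad beyond the text edges
        x[k, CHAR_TO_IDX[ch]] = 1.0
    return x.flatten()  # 7 x 29 = 203 input values

x = encode_window("this is a test", centre=5)  # window centred on the "i" of "is"
print(x.shape)  # (203,)
```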
2. Explain the role of the context layer in an Elman network.
The context layer is a copy of the hidden layer at the previous timestep. The hidden layer receives connections from both the input layer and the context layer. This in theory allows the network to retain “state” information for an indefinite period of time.
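As an illustration, one step of an Elman network can be sketched as below (assuming tanh hidden units); the context layer is simply the stored copy of the previous hidden activation that is fed back in at the next step.

```python
# Minimal sketch of a simple recurrent (Elman) network step; sizes are toy values.
import numpy as np

def elman_step(x_t, h_prev, W_x, W_h, b):
    """New hidden state from the current input and the context (previous hidden) layer."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# toy dimensions: 4 inputs, 3 hidden units
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)                        # context starts empty
for x_t in rng.normal(size=(5, 4)):    # unroll over a sequence of 5 inputs
    h = elman_step(x_t, h, W_x, W_h, b)  # h is copied into the context each step
```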
3. Draw a diagram of an LSTM and write the equations for its operation.
Gates:
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)   [forget gate]
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)   [input gate]
g_t = tanh(W_g x_t + U_g h_{t-1} + b_g)   [candidate cell state]
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)   [output gate]
State:
c_t = c_{t-1} ⊗ f_t + i_t ⊗ g_t
Output:
h_t = tanh(c_t) ⊗ o_t
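A minimal NumPy sketch of one LSTM step following these equations (σ is the logistic sigmoid, ⊗ the elementwise product); the parameter names and toy sizes are assumptions made for illustration.

```python
# One LSTM step, transcribed directly from the equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    f_t = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])  # forget gate
    i_t = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])  # input gate
    g_t = np.tanh(p["Wg"] @ x_t + p["Ug"] @ h_prev + p["bg"])  # candidate cell state
    o_t = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])  # output gate
    c_t = c_prev * f_t + i_t * g_t                              # new cell state
    h_t = np.tanh(c_t) * o_t                                    # new output
    return h_t, c_t

# toy usage with assumed sizes: 4 inputs, 3 hidden units
n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
p = {}
for gate in "figo":
    p["W" + gate] = rng.normal(size=(n_hid, n_in))
    p["U" + gate] = rng.normal(size=(n_hid, n_hid))
    p["b" + gate] = np.zeros(n_hid)
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), p)
```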
4. Draw a diagram of a Gated Recurrent Unit and write the equations for its operation.
Gates:
z_t = σ(W_z x_t + U_z h_{t-1} + b_z)   [update gate]
r_t = σ(W_r x_t + U_r h_{t-1} + b_r)   [reset gate]
Candidate Activation:
ĥ_t = tanh(W x_t + U(r_t ⊗ h_{t-1}) + b_h)
Output:
h_t = (1 – z_t) ⊗ h_{t-1} + z_t ⊗ ĥ_t
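Mirroring the LSTM sketch above, a minimal NumPy sketch of one GRU step following these equations; again the parameter names are assumptions for illustration.

```python
# One GRU step, transcribed directly from the equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])          # update gate
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])          # reset gate
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r_t * h_prev) + p["bh"])  # candidate activation
    return (1.0 - z_t) * h_prev + z_t * h_cand                           # new hidden state
```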
5. Briefly describe the problem of long range dependencies, and discuss how well each of the following architectures is able to deal with long range dependencies:
a. sliding window approach
b. Simple Recurrent (Elman) Network
c. Long Short Term Memory (LSTM)
d. Gated Recurrent Unit (GRU)
For sequence processing tasks, it can happen that the correct output depends on inputs that occurred many timesteps earlier. The sliding window approach is unable to take account of any input beyond the edge of the window. Simple Recurrent Networks can learn medium-range dependencies but may struggle with long range dependencies unless the training data are carefully constructed and the amount of “state” information is limited. LSTMs and GRUs are more successful at learning long range dependencies because they can learn to use some dimensions for short-term processing and others for long-term information.