Recurrent Neural Networks
Dr Chang Xu
School of Computer Science
Example Application
Image credit to https://twitter.com/SamsungMobile/status/967807667463958531
Recurrent Neural Network
- Input
  - Word sequence
  - Each word is represented by a vector
- Methods
  - One-hot encoding
  - Word hashing
  - Word embedding
Recurrent Neural Network
- One-hot encoding
  - The length of the vector is the lexicon size
  - Each dimension corresponds to a word in the lexicon
  - The dimension for the word is 1, and all others are 0

lexicon = {apple, bag, cat, dog, elephant}

Word       D_1  D_2  D_3  D_4  D_5
apple       1    0    0    0    0
bag         0    1    0    0    0
cat         0    0    1    0    0
dog         0    0    0    1    0
elephant    0    0    0    0    1
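As a minimal sketch (plain Python; the function name is illustrative), one-hot encoding over this lexicon looks like:

    lexicon = ["apple", "bag", "cat", "dog", "elephant"]

    def one_hot(word):
        # vector of lexicon size, 1 at the word's dimension, 0 elsewhere
        vec = [0] * len(lexicon)
        vec[lexicon.index(word)] = 1
        return vec

    print(one_hot("cat"))  # [0, 0, 1, 0, 0]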
Recurrent Neural Network
- Word hashing
  - Represent w = "apple" by its letter trigrams: a-p-p, p-p-l, p-l-e
  - The vector has 26 × 26 × 26 dimensions, one per possible trigram (a-a-a, a-a-b, ..., z-z-z)
  - The dimensions for the trigrams that occur in the word are 1, and all others are 0
- Dimension for "other"
  - With lexicon = {apple, bag, cat, dog, elephant, "other"}, out-of-vocabulary words such as w = "Sauron" or w = "Gandalf" map to the "other" dimension: (0, 0, 0, 0, 0, 1)
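A sketch of word hashing under the same scheme (letter trigrams only; no word-boundary markers, which a real system might add):

    import itertools
    import string

    # one dimension per possible trigram: a-a-a, a-a-b, ..., z-z-z (26 * 26 * 26)
    trigrams = ["".join(t) for t in itertools.product(string.ascii_lowercase, repeat=3)]
    index = {t: i for i, t in enumerate(trigrams)}

    def word_hash(word):
        vec = [0] * len(trigrams)
        for i in range(len(word) - 2):
            vec[index[word[i:i + 3]]] = 1  # mark each trigram occurring in the word
        return vec

    v = word_hash("apple")     # trigrams: app, ppl, ple
    print(sum(v), len(v))      # 3 17576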
Recurrent Neural Network
- Output
  - A probability distribution over sentiment classes: negative or positive

"Complicated? Yes, and a little slow, too."
  -> y_negative = 0.9, y_positive = 0.1

"But the animation is as colorful as the story and emotions do eventually reach the desired level."
  -> y_negative = 0.3, y_positive = 0.7
Recurrent Neural Network
"Complicated? Yes, and a little slow, too. But the animation is as colorful as the story and emotions do eventually reach the desired level."

Negative or positive? The early words suggest negative, the later words positive: to classify the whole review, the network needs memory.
Recurrent Neural Network
- Memory can be considered as another input.
- The outputs of the hidden layer are stored in the memory, then fed back at the next step.
Recurrent Neural Network
Toy example: all the weights are 1, no bias, and all activation functions are linear. The memory is initialized to (0, 0).

Input sequence: (1, 1), (2, 2), (3, 3), ...

Step 1: input (1, 1), memory (0, 0) -> hidden (2, 2) -> output (4, 4); (2, 2) is stored in the memory.
Recurrent Neural Network
Step 2: input (2, 2), memory (2, 2) -> hidden (8, 8) -> output (16, 16); (8, 8) is stored in the memory.

The same input value can produce different outputs depending on what is already in the memory, so the order of the input sequence matters.
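A quick check of the two steps above in plain Python (the network is the one in the figure: two inputs, two hidden units with memory, two outputs, all weights 1, linear activations):

    memory = [0, 0]
    for x in [(1, 1), (2, 2), (3, 3)]:
        h = [x[0] + x[1] + memory[0] + memory[1]] * 2  # each hidden unit sums all its inputs
        y = [h[0] + h[1]] * 2                          # each output sums both hidden units
        memory = h                                     # store the hidden outputs
        print(x, "->", y)
    # (1, 1) -> [4, 4]
    # (2, 2) -> [16, 16]
    # (3, 3) -> [44, 44]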
Recurrent Neural Network
Unrolled over a sequence: each input word x_t ("Complicated", "Yes", "And", ...) produces a hidden state h_t and an output y_t, and h_t is passed on to the next step.
Recurrent Neural Network
Memory can be considered as another input: the same network is applied at x_1, x_2, x_3, ..., each time receiving the previous hidden state.
Recurrent Neural Network
- Formally
  - x_t is the input
  - h_t is the hidden state
  - y_t is the output
  - h_0 = 0
  - h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
  - y_t = W_yh h_t
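A minimal NumPy sketch of this forward pass (the dimensions are made up: 4-dimensional word vectors, 3 hidden units, 2 outputs):

    import numpy as np

    rng = np.random.default_rng(0)
    W_hh = rng.normal(size=(3, 3))   # hidden-to-hidden
    W_hx = rng.normal(size=(3, 4))   # input-to-hidden
    W_yh = rng.normal(size=(2, 3))   # hidden-to-output
    b_h = np.zeros(3)

    def rnn_forward(xs):
        h, ys = np.zeros(3), []                      # h_0 = 0
        for x in xs:
            h = np.tanh(W_hh @ h + W_hx @ x + b_h)   # hidden state h_t
            ys.append(W_yh @ h)                      # output y_t
        return ys

    ys = rnn_forward(rng.normal(size=(5, 4)))        # a sequence of 5 word vectors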
Recurrent Neural Network
- Vanishing gradient problem
  - h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
  - y_t = W_yh h_t
  - For every step t, its loss is L_t

By the chain rule:

  dL_t/dW_hh = (dL_t/dy_t) (dy_t/dh_t) (dh_t/dW_hh)

Since h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h):

  dh_t/dW_hh = tanh'(.) (h_{t-1} + W_hh dh_{t-1}/dW_hh)
  dh_{t-1}/dW_hh = tanh'(.) (h_{t-2} + W_hh dh_{t-2}/dW_hh)
  ...

Expanding the recursion produces products of the form

  tanh'(.) W_hh · tanh'(.) W_hh · tanh'(.) W_hh · ...
Recurrent Neural Network
- Vanishing gradient problem
  - dL_t/dW_hh = (dL_t/dy_t) (dy_t/dh_t) (dh_t/dW_hh), which contains the product
    tanh'(.) W_hh · tanh'(.) W_hh · tanh'(.) W_hh · ...
  - |tanh'(x)| <= 1
  - If 0 < W_hh < 1, the product tends to 0: vanishing gradients (0.9^1000 ≈ 0)
  - If W_hh is larger, the product tends to infinity: exploding gradients (1.01^1000 ≈ 21000)
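The effect is easy to see numerically. Taking the best case tanh'(.) = 1 and a scalar W_hh repeated over 1000 steps:

    print(0.9 ** 1000)   # ~1.7e-46: the gradient vanishes
    print(1.01 ** 1000)  # ~2.1e+04: the gradient explodes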
Long Short-Term Memory
Long Short-Term Memory
An LSTM is built from a special neuron with 4 inputs and 1 output:
- the cell input
- a signal controlling the input gate
- a signal controlling the forget gate
- a signal controlling the output gate
The memory cell holds a value from one time step to the next, and the output is read from it through the output gate.
Long Short-Term Memory
Inside the cell, let z be the cell input, z_i, z_f, z_o the three gate control signals, and c the value currently stored in the memory. g and h are activation functions; f is usually a sigmoid function, whose value between 0 and 1 mimics an open or closed gate. With * denoting element-wise multiplication:

  c' = g(z) * f(z_i) + c * f(z_f)     (new memory value)
  a  = h(c') * f(z_o)                 (cell output)
Long Short-Term Memory
Example of the desired memory behaviour, with inputs (x_1, x_2, x_3):
- When x_2 = 1, add the number in x_1 into the memory.
- When x_2 = -1, reset the memory.
- When x_3 = 1, output the number in the memory.

x_1:     1   3   2   4   2   1   3   6   1
x_2:     0   1   0   1   0   0  -1   1   0
x_3:     0   0   0   0   0   1   0   0   1
memory:  0   3   3   7   7   7   0   6   6
y:       0   0   0   0   0   7   0   0   6
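The behaviour in this table can be checked with a few lines of Python (the gating rules are exactly the ones stated above):

    seq = [(1, 0, 0), (3, 1, 0), (2, 0, 0), (4, 1, 0), (2, 0, 0),
           (1, 0, 1), (3, -1, 0), (6, 1, 0), (1, 0, 1)]
    memory, ys = 0, []
    for x1, x2, x3 in seq:
        if x2 == 1:                # input gate open: add x1 into the memory
            memory += x1
        elif x2 == -1:             # reset the memory
            memory = 0
        ys.append(memory if x3 == 1 else 0)  # output gate
    print(ys)                      # [0, 0, 0, 0, 0, 7, 0, 0, 6]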
This behaviour can be implemented by hand-picking the weights of a single LSTM neuron (inputs x_1, x_2, x_3 plus a constant bias input of 1):
- cell input: z = x_1 (weights (1, 0, 0), bias 0)
- input gate: z_i = 100 * x_2 - 10, open (≈1) only when x_2 = 1
- forget gate: z_f = 100 * x_2 + 10, closed (≈0) only when x_2 = -1
- output gate: z_o = 100 * x_3 - 10, open (≈1) only when x_3 = 1
Step 1: input (3, 1, 0): the input gate (≈1) and forget gate (≈1) are open, the output gate (≈0) is closed; the memory becomes 0 + 3 = 3 and y ≈ 0.
Step 2: input (4, 1, 0): 4 is added and the memory becomes 3 + 4 = 7; the output gate is still closed, so y ≈ 0.
Step 3: input (2, 0, 0): the input gate is closed (≈0), so nothing is added; the memory stays 7 and y ≈ 0.
Step 4: input (1, 0, 1): the input gate is closed, but the output gate opens (≈1), so y ≈ 7.
Step 5: input (3, -1, 0): the forget gate closes (≈0), resetting the memory to 0; y ≈ 0.
Long Short-Term Memory

Turning a vanilla RNN into an LSTM network replaces each neuron with an LSTM cell, so the network has 4 times as many parameters: the cell input and the three gates each have their own weights on x_t and h_{t-1}.
Long Short-Term Memory
- Forget gate
  - f_t = σ(W_fh h_{t-1} + W_fx x_t + b_f)
- Input gate
  - i_t = σ(W_ih h_{t-1} + W_ix x_t + b_i)
- Output gate
  - o_t = σ(W_oh h_{t-1} + W_ox x_t + b_o)
Long Short-Term Memory
- Candidate cell state
  - c̃_t = tanh(W_ch h_{t-1} + W_cx x_t + b_c)
- Update cell state
  - c_t = f_t * c_{t-1} + i_t * c̃_t
- Output state
  - h_t = o_t * tanh(c_t)
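A NumPy sketch of one LSTM step following these equations (the parameter dictionaries W and b are illustrative; any compatible shapes work):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, b):
        f = sigmoid(W["fh"] @ h_prev + W["fx"] @ x + b["f"])     # forget gate
        i = sigmoid(W["ih"] @ h_prev + W["ix"] @ x + b["i"])     # input gate
        o = sigmoid(W["oh"] @ h_prev + W["ox"] @ x + b["o"])     # output gate
        c_tilde = np.tanh(W["ch"] @ h_prev + W["cx"] @ x + b["c"])
        c = f * c_prev + i * c_tilde                             # update cell state
        h = o * np.tanh(c)                                       # output state
        return h, c

    rng = np.random.default_rng(0)
    d_x, d_h = 4, 3
    W = {k: rng.normal(size=(d_h, d_h if k[1] == "h" else d_x))
         for k in ("fh", "fx", "ih", "ix", "oh", "ox", "ch", "cx")}
    b = {k: np.zeros(d_h) for k in "fioc"}
    h, c = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h), W, b)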
Long Short-Term Memory
- Learning target
  - The review is fed in word by word ("Complicated", "Yes", ..., "level"); the final hidden state h is mapped to the prediction y, which is trained against the label (e.g. (1, 0) for a negative review).
Long Short-Term Memory
- Stacked LSTM
  - Several LSTM layers are stacked: the output sequence y_t, y_{t+1}, y_{t+2}, ... of one layer becomes the input sequence of the next.
Long Short-Term Memory
- Bidirectional RNN
  - One LSTM reads the input forwards (x_t, x_{t+1}, x_{t+2}, ...) and another reads it backwards; each output y_t combines both hidden states, so it depends on the whole sentence (e.g. "A Little Slow").
LSTM vs. RNN
Prevent vanishing gradient

- RNN
  - h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
- LSTM
  - c_t = f_t * c_{t-1} + i_t * c̃_t

The LSTM's cell-state update is additive rather than multiplicative: dc_t/dc_{t-1} = f_t, so the error gradient flows back through the cell state scaled only by the forget-gate activations, instead of being multiplied by tanh'(.) W_hh at every step. As long as the network learns to keep the forget gate close to 1, the gradient is preserved over many time steps.
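A rough numerical illustration of why the additive path helps (the factor values are made up: tanh'(.) lies in (0, 1], and the forget gates are assumed to have been learned to stay near 1):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 100
    rnn_factors = 0.9 * rng.uniform(0.0, 1.0, size=T)   # tanh'(.) * W_hh per step
    forget_gates = rng.uniform(0.9, 1.0, size=T)        # dc_t/dc_{t-1} = f_t per step
    print(np.prod(rnn_factors))    # ~1e-48: vanished
    print(np.prod(forget_gates))   # ~5e-3: still usable after 100 steps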
Gated Recurrent Unit
Gated Recurrent Unit

Recall the vanilla RNN: the hidden state is stored and fed back as an extra input at the next step.
Gated Recurrent Unit

- Reset gate
  - r_t = σ(W_rh h_{t-1} + W_rx x_t + b_r)
  - h̃_{t-1} = r_t * h_{t-1}
- Candidate state
  - h̃_t = tanh(W_hh h̃_{t-1} + W_hx x_t + b_h)
- Update gate (forget gate + input gate combined)
  - z_t = σ(W_zh h_{t-1} + W_zx x_t + b_z)
  - h_t = z_t * h_{t-1} + (1 - z_t) * h̃_t
Application
- Visual Question Answering (VQA)
- Reading Comprehension
- Sequence to Sequence (Seq2seq)
  - Language translation
  - Chatbot
VQA
Q: "What's the mustache made of?" -> AI System -> "Banana"
Image credit to http://www.visualqa.org/challenge.html
VQA
- Question Attention
  - Each word of "What's the mustache made of" is embedded and encoded by an LSTM; the stacked hidden states pass through a convolution and a softmax, giving an attention weight a_i for each word.
  - Question Vector = a_1 × (What's) + a_2 × (the) + ... + a_n × (of)
VQA
- Object Detection
  - A detector finds the objects in the image (face, eye, hair, nose, ear, mouth, banana, ...), and each detected region is represented by an object vector.
VQA
- Image Attention
  - The question vector is tiled to match the object vectors; both pass through CNNs and are combined by an element-wise product. L2 normalization and a softmax then give an attention weight for each object, producing the image attention vector.
VQA
Examples: "What color is illuminated on the traffic light?" / "What is the man holding?"
Image credit to "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering".
VQA
– Question & Image Fusion
Image Attention Vector Question Vector
FC Layer FC LayerElementwise
Product
L2 Norm
FC Layer Softmax Banana
Reading Comprehension
- Answering questions based on given facts
- Example
  A. Brian is a frog.
  B. Lily is gray.
  C. Brian is yellow.
  D. Julius is green.
  E. Greg is a frog.
  Q: What color is Greg?
Reading Comprehension
Each fact is turned into a vector by an LSTM and stored in a memory. The query attends over the memory to select the relevant facts, and the answer is produced from the selection.
Reading Comprehension
End-To-End Memory Networks. S. Sukhbaatar, A.
Szlam, J. Weston, R. Fergus. NIPS, 2015.
Seq2seq
- Language Translation
  我是一个学生 -> "I am a student"
Seq2seq
- Chatbot
  "Where do you come from?" -> "I am from Sydney, and you?"
Seq2seq: Encoder and Decoder

The encoder reads the source sequence 我 / 是 / 一 / 个 / 学生 through embedding (E) and LSTM (L) layers. The decoder then generates "I", "am", "a", "stu(dent)", ..., feeding each generated token back in as the next input.
Seq2seq

- Attention
  - At each decode step, an attention module compares the current decoder state with every encoder state (我, 是, 一, 个, 学生) and feeds a weighted summary of them into the decoder.
  - Step 1 generates "I"; step 2 takes "I" back in and generates "am"; step 3 generates "a"; and so on.
Seq2seq
- Two stages
  - Encode stage
  - Decode stage
- Attention
  - A weight for each encoded symbol at each decode step
  - The weighted context is another input to the LSTM, represented as a_t
  - f_t = σ(W_fh h_{t-1} + W_fx x_t + W_fa a_t + b_f)
  - i_t = σ(W_ih h_{t-1} + W_ix x_t + W_ia a_t + b_i)
  - o_t = σ(W_oh h_{t-1} + W_ox x_t + W_oa a_t + b_o)
  - c̃_t = tanh(W_ch h_{t-1} + W_cx x_t + W_ca a_t + b_c)
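A sketch of one attention step (a dot-product score is used here for brevity; the paper cited below learns the score with a small network instead):

    import numpy as np

    def attention(decoder_h, encoder_hs):
        scores = encoder_hs @ decoder_h            # one score per encoded symbol
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                   # softmax -> attention weights
        return weights @ encoder_hs                # context vector a_t

    rng = np.random.default_rng(0)
    a_t = attention(rng.normal(size=8), rng.normal(size=(5, 8)))   # 5 source symbols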
Seq2seq
Image credit to Neural Machine Translation by
Jointly Learning to Align and Translate
Thank you!