
The University of Sydney Page 1

Recurrent Neural Networks

Dr Chang Xu

School of Computer Science

The University of Sydney Page 2

Example Application

Image credit to https://twitter.com/SamsungMobile/status/967807667463958531

The University of Sydney Page 3

Recurrent Neural Network

- Input
  - A word sequence
  - Each word is represented by a vector
- Methods
  - One-hot encoding
  - Word hashing
  - Word embedding

(Figure: the words of the sentence, e.g. "Complicated", are fed to the network as input vectors x_1, x_2, ..., producing outputs y_1, y_2, ...)

The University of Sydney Page 4

Recurrent Neural Network

- One-hot encoding
  - The length of the vector is the lexicon size
  - Each dimension corresponds to a word in the lexicon
  - The dimension for the given word is 1, and all others are 0

lexicon = {apple, bag, cat, dog, elephant}

Word D_1 D_2 D_3 D_4 D_5
Apple 1 0 0 0 0
Bag 0 1 0 0 0
Cat 0 0 1 0 0
Dog 0 0 0 1 0
Elephant 0 0 0 0 1
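
As an illustration (not from the slides), a minimal sketch of this one-hot encoding for the toy lexicon above:

```python
# One-hot encoding for a toy lexicon (illustrative sketch, not lecture code).
lexicon = ["apple", "bag", "cat", "dog", "elephant"]
index = {word: i for i, word in enumerate(lexicon)}

def one_hot(word):
    """Return a vector of lexicon size with a 1 in the word's dimension."""
    vec = [0] * len(lexicon)
    vec[index[word]] = 1
    return vec

print(one_hot("cat"))  # [0, 0, 1, 0, 0]
```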

The University of Sydney Page 5

Recurrent Neural Network

Two ways to handle words that are not in the lexicon:

- Dimension for "Other"
  - Add one extra dimension to the one-hot vector for words outside the lexicon.
  - For w = "Sauron" or w = "Gandalf": every lexicon dimension (apple, bag, cat, dog, elephant) is 0 and the "other" dimension is 1.
- Word hashing
  - Represent a word by its letter trigrams: the vector has 26 × 26 × 26 dimensions (a-a-a, a-a-b, ..., z-z-z).
  - For w = "apple", the dimensions for the trigrams it contains (... a-p-p, p-p-l, p-l-e ...) are set to 1 and all others to 0.
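
A minimal sketch of the word-hashing idea, assuming no special boundary symbols (the slides do not specify the exact trigram bookkeeping):

```python
# Word hashing with letter trigrams (illustrative sketch).
from itertools import product
import string

# One dimension per trigram over the 26 lowercase letters: 26*26*26 dimensions.
trigrams = ["".join(t) for t in product(string.ascii_lowercase, repeat=3)]
dim = {t: i for i, t in enumerate(trigrams)}

def trigram_vector(word):
    """Set the dimension of every trigram contained in the word to 1."""
    vec = [0] * len(trigrams)
    w = word.lower()
    for i in range(len(w) - 2):
        tri = w[i:i + 3]
        if tri in dim:          # skip trigrams containing non-letter characters
            vec[dim[tri]] = 1
    return vec

v = trigram_vector("apple")     # the "app", "ppl", "ple" dimensions are 1
print(sum(v))                   # 3
```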

The University of Sydney Page 6

Recurrent Neural Network

- Output
  - A probability distribution over whether the sentence is negative (-1) or positive (+1).

Examples:
- "Complicated? Yes, and a little slow, too."  →  y_1 (negative) = 0.9, y_2 (positive) = 0.1
- "But the animation is as colorful as the story and emotions do eventually reach the desired level."  →  y_1 (negative) = 0.3, y_2 (positive) = 0.7

The University of Sydney Page 7

Recurrent Neural Network

"Complicated? Yes, and a little slow, too. But the animation is as colorful as the story and emotions do eventually reach the desired level."

Negative or positive? The first sentence alone sounds negative, but the second turns the review around. To classify the whole passage correctly, the network needs memory of the earlier words.

The University of Sydney Page 8

Recurrent Neural Network

Memory can be considered as another input.

The outputs of the hidden layer are stored in the memory.

(Figure: a network with inputs x_1, x_2 and outputs y_1, y_2; the hidden-layer values are stored in memory cells h_1, h_2, which are fed back into the hidden layer at the next step.)

The University of Sydney Page 9

Recurrent Neural Network

All the weights are 1, there are no biases, and all activation functions are linear.

Input sequence: [1, 1], [2, 2], [3, 3]

Step 1: the memory cells start at [0, 0]. With input [1, 1], each hidden unit computes 1 + 1 + 0 + 0 = 2, so the hidden values are [2, 2] and each output is 2 + 2 = 4. Output so far: [4, 4]. The hidden values [2, 2] are stored in the memory.

The University of Sydney Page 10

Recurrent Neural Network

Input sequence: [1, 1], [2, 2], [3, 3]

Step 2: the memory now holds [2, 2]. With input [2, 2], each hidden unit computes 2 + 2 + 2 + 2 = 8, so the hidden values are [8, 8] and each output is 8 + 8 = 16. Output sequence so far: [4, 4], [16, 16]. The memory is updated to [8, 8].

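A minimal sketch reproducing the toy computation above; the loop structure is mine, but the weights (all 1, no bias, linear activations) and the numbers match the slides:

```python
# Toy recurrent network from the slides: 2 inputs, 2 linear hidden units with
# memory, 2 linear outputs, every weight equal to 1 and no bias.
def run(inputs):
    memory = [0, 0]
    outputs = []
    for x1, x2 in inputs:
        # Each hidden unit sums both inputs and both memory values.
        h = x1 + x2 + memory[0] + memory[1]
        hidden = [h, h]
        # Each output unit sums both hidden values.
        y = hidden[0] + hidden[1]
        outputs.append([y, y])
        memory = hidden          # store the hidden values for the next step
    return outputs

print(run([(1, 1), (2, 2), (3, 3)]))  # [[4, 4], [16, 16], [44, 44]]
```
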
The University of Sydney Page 11

Recurrent Neural Network

(Figure: the network unrolled over the words "Complicated", "Yes", "And" — at each step t, the input x_t and the stored hidden state produce a new hidden state h_t and an output y_t.)
The University of Sydney Page 12

Recurrent Neural Network

Memory can be considered as another input.

(Figure: the same unrolled network, with the hidden state h_t passed forward as an extra input to step t+1.)
The University of Sydney Page 13

Recurrent Neural Network

- Formally
  - x_t is the input
  - h_t is the hidden state
  - y_t is the output
  - h_0 = 0
  - h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
  - y_t = W_yh h_t

(Figure: the unrolled network, with the weight matrices W_hh, W_hx and W_yh shared across all time steps.)
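
A minimal numpy sketch of this forward pass; the shapes, random example weights and variable names beyond the slide's symbols are assumptions:

```python
import numpy as np

def rnn_forward(xs, W_hh, W_hx, W_yh, b_h):
    """Vanilla RNN: h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h), y_t = W_yh h_t."""
    h = np.zeros(W_hh.shape[0])          # h_0 = 0
    ys = []
    for x in xs:
        h = np.tanh(W_hh @ h + W_hx @ x + b_h)
        ys.append(W_yh @ h)
    return ys, h

# Tiny example with random weights: 3 steps of 4-dimensional inputs, 5 hidden units.
rng = np.random.default_rng(0)
xs = [rng.standard_normal(4) for _ in range(3)]
W_hh, W_hx = rng.standard_normal((5, 5)), rng.standard_normal((5, 4))
W_yh, b_h = rng.standard_normal((2, 5)), np.zeros(5)
ys, h_last = rnn_forward(xs, W_hh, W_hx, W_yh, b_h)
print(len(ys), ys[0].shape)  # 3 (2,)
```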

The University of Sydney Page 14

Recurrent Neural Network

- Vanishing gradient problem
  - h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
  - y_t = W_yh h_t
  - Every step t has its own loss L_t

Consider the loss at step 3:

∂L_3/∂W_hh = (∂L_3/∂y_3) (∂y_3/∂h_3) (∂h_3/∂W_hh)

Since h_3 = tanh(W_hh h_2 + W_hx x_3 + b_h),

∂h_3/∂W_hh = tanh'(·) (h_2 + W_hh ∂h_2/∂W_hh)

and likewise

∂h_2/∂W_hh = tanh'(·) (h_1 + W_hh ∂h_1/∂W_hh)

Expanding the recursion, the gradient contains products of the form

tanh'(·) W_hh · tanh'(·) W_hh · tanh'(·) W_hh ⋯

The University of Sydney Page 15

Recurrent Neural Network
- Vanishing gradient problem
  - h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
  - y_t = W_yh h_t
  - Every step t has its own loss L_t

∂L_3/∂W_hh = (∂L_3/∂y_3) (∂y_3/∂h_3) (∂h_3/∂W_hh)

which expands into products of the form  tanh'(·) W_hh · tanh'(·) W_hh · tanh'(·) W_hh ⋯ , where |tanh'(·)| ≤ 1.
If 0 < W_hh < 1, the product tanh'(·) W_hh · tanh'(·) W_hh · tanh'(·) W_hh ⋯ → 0  (vanishing gradients).
If W_hh is larger, the product tanh'(·) W_hh · tanh'(·) W_hh · tanh'(·) W_hh ⋯ → ∞  (exploding gradients).

0.9^1000 ≈ 0          1.01^1000 ≈ 21000

The University of Sydney Page 16

Long Short-Term Memory

The University of Sydney Page 17

Long Short-Term Memory

An LSTM is a special neuron with 4 inputs and 1 output. It consists of:
- A memory cell
- An input gate, with a signal that controls the input gate
- A forget gate, with a signal that controls the forget gate
- An output gate, with a signal that controls the output gate

The University of Sydney Page 18

Long Short-Term Memory

Let z be the cell input and z_i, z_f, z_o the signals controlling the input, forget and output gates. The gate function f is usually a sigmoid, so its value lies between 0 and 1 and mimics an open or closed gate. Using element-wise multiplication:
- New cell value: c' = g(z) f(z_i) + c f(z_f)
- Output: a = h(c') f(z_o)

The University of Sydney Page 19

Long Short-Term Memory

Toy example. Each input has three components (x_1, x_2, x_3):
- When x_2 = 1, add x_1 into the memory
- When x_2 = -1, reset the memory
- When x_3 = 1, output the number stored in the memory

x_1:     1   3   2   4   2   1   3   6   1
x_2:     0   1   0   1   0   0  -1   1   0
x_3:     0   0   0   0   0   1   0   0   1
memory:  0   3   3   7   7   7   0   6   6
output:  0   0   0   0   0   7   0   0   6

The University of Sydney Pages 20-25

Long Short-Term Memory

(Figures: the toy example worked through step by step with hand-set weights. The cell input takes x_1 with weight 1; the input and forget gates are driven by x_2 and the output gate by x_3, with weights of 100 and biases of -10 or 10, so each sigmoid gate saturates to roughly 0 or 1. Feeding (3,1,0), (4,1,0), (2,0,0), (1,0,1), (3,-1,0) reproduces the memory and output values above.)

The University of Sydney Page 26

Long Short-Term Memory

Compared with a vanilla RNN neuron, an LSTM unit needs 4 times as many parameters: the same input is fed to the cell input and to each of the three gates, each with its own weights.

The University of Sydney Page 27

Long Short-Term Memory

- Forget gate
  - f_t = σ(W_fh h_{t-1} + W_fx x_t + b_f)
- Input gate
  - i_t = σ(W_ih h_{t-1} + W_ix x_t + b_i)
- Output gate
  - o_t = σ(W_oh h_{t-1} + W_ox x_t + b_o)

The University of Sydney Page 28

Long Short-Term Memory

- Candidate cell state
  - c̃_t = tanh(W_ch h_{t-1} + W_cx x_t + b_c)
- Update cell state
  - c_t = f_t ∗ c_{t-1} + i_t ∗ c̃_t
- Output state
  - h_t = o_t ∗ tanh(c_t)

The University of Sydney Page 29

Long Short-Term Memory

- Learning target

(Figure: the LSTM reads the sentence "Complicated", "Yes", ..., "level" word by word; the final hidden state is mapped to a prediction y, which is compared with the label.)

The University of Sydney Page 30

Long Short-Term Memory

- Stacked LSTM

(Figure: several LSTM layers stacked on top of each other; the hidden-state sequence of one layer is the input sequence of the layer above.)

The University of Sydney Page 31

Long Short-Term Memory

- Bidirectional RNN

(Figure: a forward LSTM and a backward LSTM read the inputs "A", "Little", "Slow" in opposite directions; the output at each position combines the hidden states of both directions.)
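
A minimal numpy sketch of one LSTM step using the gate equations above; for brevity each gate's two weight matrices are folded into one matrix applied to the concatenation [h_{t-1}; x_t], and the example weights are random assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: gates f, i, o, candidate cell state, then c_t and h_t.

    W maps each gate name to a matrix applied to [h_{t-1}; x_t]; b holds biases.
    """
    z = np.concatenate([h_prev, x])
    f = sigmoid(W["f"] @ z + b["f"])            # forget gate
    i = sigmoid(W["i"] @ z + b["i"])            # input gate
    o = sigmoid(W["o"] @ z + b["o"])            # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c = f * c_prev + i * c_tilde                # update cell state
    h = o * np.tanh(c)                          # output state
    return h, c

# Tiny example: 4-dimensional input, 5 hidden units, random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 5
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in [rng.standard_normal(n_in) for _ in range(3)]:
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (5,)
```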
The University of Sydney Page 32

LSTM vs. RNN

The University of Sydney Page 33

Prevent vanishing gradient

- RNN
  - h_t = tanh(W_hh h_{t-1} + W_hx x_t + b_h)
- LSTM
  - c_t = f_t ∗ c_{t-1} + i_t ∗ c̃_t

LSTMs address the problem with an additive gradient structure: the cell state is updated by addition, and the gradient flowing through it is gated directly by the forget gate's activations, so at every time step the network can learn how much error signal to preserve.

The University of Sydney Page 34

Gated Recurrent Unit

The University of Sydney Page 35

Gated Recurrent Unit

(Figure: the vanilla RNN again — inputs x_1, x_2, outputs y_1, y_2, hidden values stored in memory cells h_1, h_2.)

The University of Sydney Page 36

Gated Recurrent Unit

- Reset gate
  - r_t = σ(W_rh h_{t-1} + W_rx x_t + b_r)
  - h'_{t-1} = r_t ∗ h_{t-1}
- Candidate state
  - h̃_t = tanh(W_hh h'_{t-1} + W_hx x_t + b_h)
- Update gate (forget gate + input gate combined)
  - z_t = σ(W_zh h_{t-1} + W_zx x_t + b_z)
  - h_t = z_t ∗ h_{t-1} + (1 − z_t) ∗ h̃_t

The University of Sydney Page 37

Application

- Visual Question Answering (VQA)
- Reading Comprehension
- Sequence to Sequence (Seq2seq)
  - Language translation
  - Chat robot

The University of Sydney Page 38

VQA

"What's the mustache made of?" → AI System → "Banana"

Image credit to http://www.visualqa.org/challenge.html

The University of Sydney Page 39

VQA

The University of Sydney Page 40

VQA

- Question Attention

(Figure: each word of "What's the mustache made of" is embedded and passed through an LSTM; the LSTM outputs are stacked, a convolution and softmax produce attention weights over the words, and the question vector is the weighted sum of the word representations.)

The University of Sydney Page 41

VQA

- Object Detection

(Figure: an object detector finds regions such as face, eyes, hair, nose, ears, mouth and bananas; each detected object is represented by an object vector.)

The University of Sydney Page 42

VQA

- Image Attention

(Figure: the question vector is tiled and combined with the object vectors through CNNs and an element-wise product; after L2 normalisation and a softmax over objects, the weighted object features give the image attention vector.)

The University of Sydney Page 43

VQA

Example questions: "What color is illuminated on the traffic light?"  "What is the man holding?"

Image credit to Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

The University of Sydney Page 44

VQA – Question & Image Fusion

(Figure: the image attention vector and the question vector each pass through an FC layer, are combined by an element-wise product, L2-normalised, and fed through a final FC layer and softmax to produce the answer, e.g. "Banana".)

The University of Sydney Page 45

Reading Comprehension

- Answering a question based on given facts
- Example
  A. Brian is a frog.
  B. Lily is gray.
  C. Brian is yellow.
  D. Julius is green.
  E. Greg is a frog.
  Question: What color is Greg?

The University of Sydney Page 46

Reading Comprehension

(Figure: each fact is encoded into a vector by an LSTM and stored in a memory; attention driven by the query selects the relevant facts, which are combined to produce the answer.)

The University of Sydney Page 47

Reading Comprehension

End-To-End Memory Networks. S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. NIPS, 2015.

The University of Sydney Page 48

Seq2seq

- Language Translation
  - 我是一个学生 → I am a student

The University of Sydney Page 49

Seq2seq

- Chat Robot
  - "Where do you come from?" → "I am from Sydney, and you?"

The University of Sydney Page 50

Seq2seq

(Figure: encoder-decoder architecture. The encoder LSTM reads the embeddings of 我, 是, 一, 个, 学生; the decoder LSTM then generates the translation "I am a student" word by word, feeding each generated word back in as the next decoder input.)
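
A minimal sketch of the encoder-decoder idea in the figure; the plain tanh cell, random weights, token ids and greedy decoding loop are simplified stand-ins for the lecture's LSTM-based model:

```python
import numpy as np

def step(W, x, h):
    """One recurrent step shared by encoder and decoder (toy tanh cell)."""
    return np.tanh(W["hh"] @ h + W["hx"] @ x + W["b"])

def translate(src_ids, enc, dec, emb, W_out, bos, eos, max_len=10):
    # Encode: read the source tokens, keep only the final hidden state.
    h = np.zeros(enc["hh"].shape[0])
    for i in src_ids:
        h = step(enc, emb[i], h)
    # Decode greedily: feed the previously generated token back in.
    out, token = [], bos
    for _ in range(max_len):
        h = step(dec, emb[token], h)
        token = int(np.argmax(W_out @ h))    # most probable next token
        if token == eos:
            break
        out.append(token)
    return out

# Toy setup: vocabulary of 8 token ids, random (untrained) weights, ids 0/1 as BOS/EOS.
rng = np.random.default_rng(0)
n_hid, n_emb, vocab = 6, 4, 8
emb = rng.standard_normal((vocab, n_emb))

def make_cell():
    return {"hh": rng.standard_normal((n_hid, n_hid)),
            "hx": rng.standard_normal((n_hid, n_emb)),
            "b": np.zeros(n_hid)}

enc, dec = make_cell(), make_cell()
W_out = rng.standard_normal((vocab, n_hid))
print(translate([2, 3, 4, 5, 6], enc, dec, emb, W_out, bos=0, eos=1))
```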

The University of Sydney Page 51

Seq2seq

- Attention

(Figure: the encoder LSTM reads the embeddings of 我 ... 学生; an attention module compares the decoder state with every encoder hidden state and feeds the result into the decoder, which outputs "I".)

The University of Sydney Page 52

Seq2seq

- Attention

(Figure: the same model one step later; with "I" already generated and fed back in, the attention module again weights the encoder states and the decoder outputs "am".)

The University of Sydney Page 53

Seq2seq

- Attention

(Figure: one step further; with "I" and "am" generated, the attention module re-weights the encoder states and the decoder outputs "a".)
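
A minimal sketch of the attention module in these figures, using a simple dot-product score; the lecture does not commit to a particular scoring function, so that choice is an assumption:

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Weight each encoder hidden state by its relevance to the decoder state."""
    scores = np.array([h @ decoder_state for h in encoder_states])  # dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                        # softmax over encoder steps
    context = sum(w * h for w, h in zip(weights, encoder_states))   # weighted sum
    return weights, context

rng = np.random.default_rng(0)
encoder_states = [rng.standard_normal(6) for _ in range(5)]  # one per source word
weights, context = attend(rng.standard_normal(6), encoder_states)
print(weights.round(2), context.shape)
```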

The University of Sydney Page 54

Seq2seq

- Two stages
  - Encode stage
  - Decode stage

- Attention
  - A weight for each encoded symbol at every decoding step
  - The resulting context is another input to the LSTM, represented as a_t
  - f_t = σ(W_fh h_{t-1} + W_fx x_t + W_fa a_t + b_f)
  - i_t = σ(W_ih h_{t-1} + W_ix x_t + W_ia a_t + b_i)
  - o_t = σ(W_oh h_{t-1} + W_ox x_t + W_oa a_t + b_o)
  - c̃_t = tanh(W_ch h_{t-1} + W_cx x_t + W_ca a_t + b_c)

The University of Sydney Page 55

Seq2seq

Image credit to Neural Machine Translation by Jointly Learning to Align and Translate

The University of Sydney Page 56

Thank you!