
COMP90042
Workshop
Week 5

20 April

1

Neural Network Language Model

Parameters of a Neural Network

RNN vs N-gram language model

Vanishing Gradient Problem in RNN

2
Table of Contents


What is a Neural Network Language Model?
Essentially, a language model that uses a neural network. The network can be a feed-forward neural network, an RNN, a CNN, etc.

4
Neural Network Language Model

How does a Neural Network Language Model deal with sparsity? Why is this an advantage over an n-gram Language Model?

5
Continuous vs. Discrete
Context   Cat   Dog   Eat
Run         1     0     0
Walk        0     1     0
Banana      0     0     1

          Dim 1   Dim 2   Dim 3
Cat         0.8     0.9     0.1
Walk        0.9     0.7     0.0
Banana      0.0     0.0     0.9

Discrete representation
Continuous representation
NN models map words into a continuous vector space (i.e., word embeddings).
Word embeddings capture syntactic and semantic relationships between words.
Continuous representations generalize well to unseen sequences, as the sketch below illustrates.
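To make the contrast concrete, here is a minimal NumPy sketch reusing the toy values from the tables above: one-hot vectors carry no notion of similarity between distinct words, while dense vectors do.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Discrete (one-hot) vectors: every pair of distinct words has similarity 0,
# so the model cannot tell that "cat" is more like "walk" than like "banana".
cat_1hot, dog_1hot = np.array([1.0, 0, 0]), np.array([0, 1.0, 0])
print(cos(cat_1hot, dog_1hot))       # 0.0

# Continuous vectors (values copied from the table above):
cat    = np.array([0.8, 0.9, 0.1])
walk   = np.array([0.9, 0.7, 0.0])
banana = np.array([0.0, 0.0, 0.9])
print(round(cos(cat, walk), 2))      # ~0.98: related words end up close
print(round(cos(cat, banana), 2))    # ~0.08: unrelated words end up far apart
```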

Consider the following two sentences:

Sentence 1 (in corpus)
The cat is walking in the bedroom.

Sentence 2 (unseen)
A dog was running in a room.

6
Word vectors in action
        Cat    Dog    Walk   Run
Cat     1.00   0.80   0.27   0.26
Dog     0.80   1.00   0.37   0.30
Walk    0.27   0.37   1.00   0.55
Run     0.26   0.30   0.55   1.00

Word vector similarity in spaCy
The semantics of the second sentence can be inferred by looking at similar sentences.
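A similarity table like this can be reproduced in a few lines. A minimal sketch, assuming the en_core_web_md model (which ships with word vectors) is installed:

```python
import spacy

# Requires the medium English model, e.g.:
#   python -m spacy download en_core_web_md
# (the small model ships without real word vectors, so its
#  similarity scores would be unreliable)
nlp = spacy.load("en_core_web_md")

tokens = [nlp(w)[0] for w in ["cat", "dog", "walk", "run"]]
for a in tokens:
    # Token.similarity returns the cosine similarity of the word vectors
    print(a.text, [round(float(a.similarity(b)), 2) for b in tokens])
```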

Neural Network Language Model

Parameters of a Neural Network

RNN vs N-gram language model

Vanishing Gradient Problem in RNN

7
Table of Contents

8

Feedforward Neural Network


Input (Embedding) layer
Hidden layer
Output layer
Layer               # of parameters
Input (Embedding)   |V| x d
Hidden              d x d_h + d_h
Output              d_h x |V| + |V|

Consider a bigram feed-forward NN with:
one 300-dimensional hidden layer
10K unique words

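As a sanity check on the table above, here is a minimal PyTorch sketch of this bigram model. The slide does not state the embedding size, so the 300-dimensional embedding is an assumption:

```python
import torch
import torch.nn as nn

V, d, d_h = 10_000, 300, 300   # vocabulary, embedding (assumed), hidden sizes

class BigramFFNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, d)    # |V| x d          = 3,000,000
        self.hidden = nn.Linear(d, d_h)    # d x d_h + d_h    =    90,300
        self.output = nn.Linear(d_h, V)    # d_h x |V| + |V|  = 3,010,000

    def forward(self, prev_word):          # prev_word: (batch,) word ids
        return self.output(torch.tanh(self.hidden(self.embed(prev_word))))

model = BigramFFNNLM()
print(sum(p.numel() for p in model.parameters()))   # 6,100,300
```

Note how the embedding and output layers, both proportional to |V|, dominate the parameter count.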

9
Example 2
Recurrent Neural Network

Layer               # of parameters
Input (Embedding)   |V| x d
Hidden              d x d_h + d_h x d_h + d_h
Output              d_h x |V| + |V|

Input (Embedding) layer
Hidden layer
Output layer
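The same check for the RNN, again assuming 300-dimensional embeddings and hidden states; the only new term is the d_h x d_h recurrent weight matrix:

```python
import torch.nn as nn

V, d, d_h = 10_000, 300, 300   # vocabulary, embedding (assumed), hidden sizes

embed = nn.Embedding(V, d)                  # |V| x d
rnn = nn.RNN(d, d_h, batch_first=True)      # d x d_h + d_h x d_h (+ biases)
output = nn.Linear(d_h, V)                  # d_h x |V| + |V|

for name, module in [("embedding", embed), ("hidden", rnn), ("output", output)]:
    print(name, sum(p.numel() for p in module.parameters()))
# Note: nn.RNN keeps two bias vectors (bias_ih and bias_hh), so its count
# is d_h larger than the single-bias formula in the table above.
```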

Neural Network Language Model

Parameters of a Neural Network

RNN vs N-gram language model

Vanishing Gradient Problem in RNN

10
Table of Contents

What advantage does an RNN language model have over an N-gram language model?

An RNN can, in principle, condition on the entire history through its hidden state, whereas an N-gram LM is limited to a fixed-size context window.

11
RNN vs N-gram language model


Neural Network Language Model

Parameters of a Neural Network

RNN vs N-gram language model

Vanishing Gradient Problem in RNN

12
Table of Contents

What is a gradient?
The gradient of a function can be interpreted as its direction and rate of fastest increase.

13
Vanishing Gradient Problem


14
Vanishing Gradient Problem
Why does the gradient vanish?

Chain rule: a formula for computing derivatives of composite functions:
dz/dx = (dz/dy) * (dy/dx)

Now, recall the RNN recurrence:
h_t = tanh(W_x x_t + W_h h_{t-1} + b)

Let's derive the partial derivative of h_t with respect to h_1 at time step t. Applying the chain rule across time steps:
∂h_t/∂h_1 = ∏_{i=2}^{t} ∂h_i/∂h_{i-1}
Each factor involves W_h and the derivative of tanh, both typically smaller than 1 in magnitude.


15
Vanishing Gradient Problem
[Diagram: the RNN cell unrolled across time steps: RNN → RNN → RNN → ...]

An RNN, once unrolled, is a very deep network.

The gradient may vanish/explode as we move backwards to earlier time steps.

Eventually the model learns far too slowly to be practical.
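A toy numeric sketch of why the product above shrinks: with a scalar hidden state, each backward step through time contributes one factor of w * tanh'(h), and any factor below 1 compounds geometrically (w = 0.5 and h = 0.8 are arbitrary illustrative values):

```python
import math

w, h = 0.5, 0.8   # hypothetical recurrent weight and fixed pre-activation
grad = 1.0
for t in range(1, 21):
    grad *= w * (1 - math.tanh(h) ** 2)   # one chain-rule factor per step
    if t % 5 == 0:
        print(f"after {t:2d} steps: |gradient| = {grad:.2e}")
# Each factor here is about 0.28, so by 20 steps the gradient is ~1e-11:
# updates from distant time steps are effectively invisible.
```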

16
Gated Recurrent Neural Networks

Ideally, we want each factor in that repeated gradient product to be close to either 0 or 1.
Challenge: how do we pick 0 or 1?
Gated Neural Networks:
Long short-term memory networks (LSTM)
Gated Recurrent Units (GRU)

Can we let the model learn that, too?


17
LSTM

RNN Cell

LSTM uses "gates" to control the flow of information.
Besides the hidden state, an LSTM also has a cell state for holding its "memory".

LSTM Cell
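A minimal sketch of a single LSTM step, with hypothetical packed weight matrices W (input), U (recurrent), and bias b; the sigmoid gates are exactly the learned near-0-or-1 factors from the previous slide:

```python
import torch

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. Assumes the input, forget, output, and candidate
    weights are packed side by side in W (d x 4*d_h), U (d_h x 4*d_h),
    and b (4*d_h,)."""
    gates = x_t @ W + h_prev @ U + b
    i, f, o, g = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates in (0, 1)
    g = torch.tanh(g)                  # candidate memory
    c_t = f * c_prev + i * g           # cell state: forget old, write new "memory"
    h_t = o * torch.tanh(c_t)          # hidden state: gated read of the memory
    return h_t, c_t
```

Because the cell state is updated additively (c_t = f * c_prev + i * g) rather than by repeated matrix multiplication, gradients can flow across long spans when the forget gate stays near 1, which is the advantage of LSTM over a vanilla RNN.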

Takeaways
Neural Network Language Model
Definition
How to deal with sparsity

Parameters of a Neural Network
Feedforward neural network
Recurrent neural network (RNN)

RNN vs N-gram language model

Vanishing Gradient Problem in RNN
Reason
Advantage of LSTM

18

Thank you
