COMP90042 Workshop
Week 5, 20 April
Table of Contents
Neural Network Language Model
Parameters of a Neural Network
RNN vs N-gram language model
Vanishing Gradient Problem in RNN
Neural Network Language Model
What is a Neural Network Language Model?
Simply put, a language model built on a neural network. It can be a feed-forward neural network, an RNN, a CNN, etc.
How does a Neural Network Language Model deal with sparsity? Why is this an advantage over an n-gram language model?
Continuous vs. Discrete
Discrete representation (each context is its own dimension):
Context   Cat  Dog  Eat
Run        1    0    0
Walk       0    1    0
Banana     0    0    1

Continuous representation (word embeddings):
          Dim 1  Dim 2  Dim 3
Cat        0.8    0.9    0.1
Walk       0.9    0.7    0
Banana     0      0      0.9
NN models map words into a continuous vector space (i.e. word embeddings).
Word embeddings capture syntactic & semantic relationships between words.
Continuous representations generalize well to unseen sequences.
Consider the following two sentences:
Sentence 1 (in corpus)
The cat is walking in the bedroom.
Sentence 2 (unseen)
A dog was running in a room.
Word vectors in action
Word vector similarity in spaCy:
        Cat    Dog    Walk   Run
Cat     1      0.80   0.27   0.26
Dog     0.80   1      0.37   0.30
Walk    0.27   0.37   1      0.55
Run     0.26   0.30   0.55   1
The semantics of the second, unseen sentence can be inferred by looking at similar sentences seen in the corpus.
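A minimal sketch of how a similarity table like the one above can be produced with spaCy, assuming a model with word vectors (e.g. en_core_web_md) is installed; the exact values depend on the model version:

```python
# Reproducing a word-vector similarity table.
# Requires vectors: python -m spacy download en_core_web_md
import spacy

nlp = spacy.load("en_core_web_md")
words = ["cat", "dog", "walk", "run"]
tokens = [nlp(w)[0] for w in words]

print("      " + "  ".join(f"{w:>5}" for w in words))
for a in tokens:
    row = "  ".join(f"{a.similarity(b):5.2f}" for b in tokens)
    print(f"{a.text:>5} {row}")
```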
Feedforward Neural Network
Consider a bi-gram feed-forward NN language model with:
1 hidden layer of 300 dimensions
10K unique words
[Figure: feed-forward NN with an input (embedding) layer, a hidden layer, and an output layer]
Layer               # of parameters
Input (Embedding)
Hidden
Output
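A sketch of how the counts in the table can be worked out. The slide does not state the embedding dimension, so d_emb = 300 below is an assumption, and the exact totals depend on it:

```python
# Worked parameter count for the bi-gram feed-forward LM above.
V = 10_000    # vocabulary size
d_emb = 300   # assumed embedding dimension
d_hid = 300   # hidden layer size
context = 1   # bi-gram LM: one previous word as input

embedding = V * d_emb                        # lookup table
hidden = (context * d_emb) * d_hid + d_hid   # weights + biases
output = d_hid * V + V                       # weights + biases

print(embedding, hidden, output)  # 3000000 90300 3010000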
Example 2: Recurrent Neural Network
[Figure: unrolled RNN with an input (embedding) layer, a hidden layer, and an output layer]
Layer               # of parameters
Input (Embedding)
Hidden
Output
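The same sketch for the RNN, under the same assumed sizes (d_emb = 300 is still an assumption); the only new parameters relative to the feed-forward model are in the recurrent matrix W_h:

```python
# Worked parameter count for a simple (Elman) RNN LM.
V, d_emb, d_hid = 10_000, 300, 300

embedding = V * d_emb                           # lookup table
hidden = d_emb * d_hid + d_hid * d_hid + d_hid  # W_x, W_h, bias
output = d_hid * V + V                          # weights + biases

print(embedding, hidden, output)  # 3000000 180300 3010000
```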
RNN vs N-gram language model
What advantage does an RNN language model have over an N-gram language model?
An RNN can capture longer (in principle unbounded) context, whereas the context size of an N-gram LM is fixed.
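A toy contrast, with hypothetical helper names (ngram_context and rnn_state are illustrative, not from the slides): however long the history grows, the n-gram LM only ever conditions on a fixed window, while the RNN folds every word into its hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "is": 2, "walking": 3}
d = 4
E = rng.normal(size=(len(vocab), d))   # toy embeddings
W_x = rng.normal(size=(d, d)) * 0.1
W_h = rng.normal(size=(d, d)) * 0.1

def ngram_context(history, n=2):
    # An n-gram LM only ever sees the last n-1 words.
    return tuple(history[-(n - 1):])

def rnn_state(history):
    # The RNN folds the entire history into one hidden state.
    h = np.zeros(d)
    for w in history:
        h = np.tanh(W_x @ E[vocab[w]] + W_h @ h)
    return h

hist = ["the", "cat", "is", "walking"]
print(ngram_context(hist))   # ('walking',) -- fixed window
print(rnn_state(hist))       # depends on every word in hist
```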
Vanishing Gradient Problem
What is a gradient?
The gradient of a function can be interpreted as the "direction and rate of fastest increase".
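For instance (a worked example added here, not on the slide): for $f(x, y) = x^2 + y^2$,

```latex
\nabla f(x, y) = \left(\frac{\partial f}{\partial x},\, \frac{\partial f}{\partial y}\right) = (2x,\, 2y),
\qquad \nabla f(1, 2) = (2, 4)
```

so at the point $(1, 2)$, $f$ increases fastest in the direction $(2, 4)$, at rate $\lVert(2, 4)\rVert = 2\sqrt{5}$.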
Why do gradients vanish?
Chain rule: a formula for computing derivatives of composite functions,
$\frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x)$
Now, recall the RNN update formula:
$h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$
Let's try to get the partial derivative of the loss $L_t$ with respect to the hidden state $h_k$ at an earlier time step $k$.
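Filling in the derivation (the slide's equations did not survive extraction; this is the standard chain-rule expansion for the update above):

```latex
\frac{\partial L_t}{\partial h_k}
= \frac{\partial L_t}{\partial h_t} \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
= \frac{\partial L_t}{\partial h_t} \prod_{i=k+1}^{t} \mathrm{diag}\!\left(1 - h_i^{2}\right) W_h
```

Each factor contains the tanh derivative $1 - h_i^2 \le 1$, so the product over $t - k$ steps can shrink exponentially (vanish); if $W_h$ has large singular values, it can instead grow exponentially (explode).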
[Figure: an RNN unrolled across time steps, one RNN cell per step]
An RNN, once unrolled, is a very deep network.
The gradient may vanish or explode as we move backwards to earlier time steps.
Eventually the model learns too slowly to be acceptable.
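A small numerical illustration (added here, not from the slides): backpropagate a gradient through many tanh RNN steps with random weights, scaled so the recurrent matrix sits near spectral norm 1, and watch its norm shrink:

```python
import numpy as np

rng = np.random.default_rng(42)
d, T = 50, 60
W_h = rng.normal(scale=0.5 / np.sqrt(d), size=(d, d))

h = np.zeros(d)
states = []
for _ in range(T):                    # forward pass
    h = np.tanh(W_h @ h + rng.normal(size=d))
    states.append(h)

grad = np.ones(d)                     # dummy dL/dh_T
for t in reversed(range(T)):          # backward pass
    # dh_t/dh_{t-1} = diag(1 - h_t^2) W_h
    grad = W_h.T @ ((1 - states[t] ** 2) * grad)
    if t % 15 == 0:
        print(f"step {t}: |grad| = {np.linalg.norm(grad):.2e}")
```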
Gated Recurrent Neural Networks
Ideally, we want the repeated factor $\frac{\partial h_i}{\partial h_{i-1}}$ above (the part shown in red on the slide) to be close to either 0 or 1.
Challenge: how do we pick 0 or 1? Can we also let the model learn that?
Gated Neural Networks:
Long short-term memory networks (LSTM)
Gated Recurrent Unit (GRU)
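One common way to write the gating idea (a GRU-style update gate, following Cho et al.'s convention; this is a sketch, conventions vary across texts):

```latex
z_t = \sigma\!\left(W_z x_t + U_z h_{t-1}\right), \qquad
h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
```

When $z_t \approx 0$ the old state passes through unchanged (a factor close to 1); when $z_t \approx 1$ it is overwritten by the candidate $\tilde{h}_t$ (a factor close to 0). Because $z_t$ is computed from learned weights, the model learns when to do which.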
LSTM
[Figure: an RNN cell compared with an LSTM cell]
LSTM uses "gates" to control the flow of information.
Besides the hidden state, an LSTM also has a cell state for holding its "memory".
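A minimal single-step LSTM cell in numpy (a sketch, not the exact parameterization from the lecture): three sigmoid gates plus a tanh candidate, with a separate cell state c carrying the "memory":

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # W, U, b hold parameters for gates f, i, o and candidate g.
    f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])  # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])  # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h + b["g"])  # candidate
    c = f * c + i * g      # cell state: mostly additive updates
    h = o * np.tanh(c)     # hidden state exposed to the next step
    return h, c

rng = np.random.default_rng(0)
d_in, d_hid = 3, 4
W = {k: rng.normal(size=(d_hid, d_in)) for k in "fiog"}
U = {k: rng.normal(size=(d_hid, d_hid)) for k in "fiog"}
b = {k: np.zeros(d_hid) for k in "fiog"}
h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
print(h, c)
```

The additive update of the cell state (f * c + i * g) is what keeps the backward-flowing gradient close to the learned gate values rather than forcing it through a repeated matrix product, which is the advantage over the plain RNN.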
Takeaways
Neural Network Language Model
Definition
How to deal with sparsity
Parameters of a Neural Network
Feedforward neural network
Recurrent neural network (RNN)
RNN vs N-gram language model
Vanishing Gradient Problem in RNN
Reason
Advantage of LSTM
Thank you