COMP90042 Workshop
Week 5
20 April
Table of Contents
Neural Network Language Model
Parameters of a Neural Network
RNN vs N-gram language model
Vanishing Gradient Problem in RNN
Neural Network Language Model

What is a Neural Network Language Model?
Simply a language model that uses a neural network. The network can be a feed-forward neural network, an RNN, a CNN, etc.

How does a Neural Network Language Model deal with sparsity, and why is this an advantage over an n-gram language model?
Neural language models (NLMs) address the n-gram data-sparsity issue by parameterising words as vectors (word embeddings) and using those vectors as inputs to a neural network. The parameters are learned as part of the training process. Word embeddings obtained through NLMs exhibit the property that semantically and syntactically close words are also close in the induced vector space.
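To make the "words as vectors" idea concrete, here is a minimal PyTorch sketch (the sizes and word ids are hypothetical, chosen to match the 10K-word / 300-dimension example used later in this workshop):

import torch
import torch.nn as nn

# Each word id is mapped to a learned 300-dimensional vector; these vectors,
# not the raw word identities, are what the language model's network consumes.
emb = nn.Embedding(num_embeddings=10_000, embedding_dim=300)

context_ids = torch.tensor([[42, 1337]])   # two hypothetical context-word ids
vectors = emb(context_ids)                 # shape (1, 2, 300)
x = vectors.flatten(start_dim=1)           # shape (1, 600): concatenated context,
                                           # ready to feed into a hidden layer

Because the embeddings are shared across all contexts, an n-gram never seen in training can still be scored sensibly, provided its words have embeddings close to those of words seen in similar contexts.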
Consider the following two sentences:
1. The cat is walking in the bedroom.
2. A dog was running in a room.
(Figure omitted: word vectors' cosine similarity scores.)
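The omitted figure reports cosine similarity scores between word vectors. As a self-contained sketch of how such a score is computed (the toy vectors below are made up; real scores would use learned embeddings):

import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 4-d "embeddings", invented purely for illustration.
vec = {
    "cat":  np.array([0.9, 0.1, 0.3, 0.0]),
    "dog":  np.array([0.8, 0.2, 0.3, 0.1]),
    "room": np.array([0.0, 0.9, 0.1, 0.7]),
}
print(cosine(vec["cat"], vec["dog"]))   # high: semantically close words
print(cosine(vec["cat"], vec["room"]))  # lower: less related words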
Parameters of a Neural Network
Example 1: Feedforward Neural Network
Assume a vocabulary of 10K words and a standard feed-forward LM architecture (embedding layer, one hidden layer, softmax output layer).
Assume the dimension of the embeddings and of the hidden layer is 300, and that we use the 2 previous words as context.
Input word embeddings: 300 x 10K = 3,000,000
Hidden layer W1: 300 x 600 = 180,000 (the two 300-dimensional context embeddings are concatenated into a 600-dimensional input)
Hidden layer bias b1: 300
Output layer W2: 10K x 300 = 3,000,000
Most of the parameters of a neural network are in its input and output layers.
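As a quick sanity check, the counts above can be reproduced in a few lines of Python (a sketch of the arithmetic; the output-layer bias is omitted, as on the slide):

V, d, context = 10_000, 300, 2    # vocabulary size, embedding/hidden size, context words

input_embeddings = d * V          # 300 x 10K  = 3,000,000
W1 = d * (context * d)            # 300 x 600  = 180,000
b1 = d                            #              300
W2 = V * d                        # 10K x 300  = 3,000,000

print(input_embeddings + W1 + b1 + W2)   # 6,180,300 parameters in total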
Example 2: Recurrent Neural Network
Assume a unidirectional RNN with the same vocabulary (10K words) and dimension (300).
Input word embeddings: 300 x 10K = 3,000,000
Wx (input-to-hidden): 300 x 300 = 90,000
Ws (hidden-to-hidden): 300 x 300 = 90,000
b (hidden bias): 300
Output word embeddings Wy: 10K x 300 = 3,000,000
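The recurrent layer's shapes can be checked with PyTorch (a sketch; note that nn.RNN stores two bias vectors, bias_ih and bias_hh, where the slide counts a single b):

import torch.nn as nn

rnn = nn.RNN(input_size=300, hidden_size=300, batch_first=True)
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (300, 300)  -> Wx
# weight_hh_l0 (300, 300)  -> Ws
# bias_ih_l0 (300,) and bias_hh_l0 (300,) together play the role of b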
RNN vs N-gram language model
What advantage does an RNN language model have over an N-gram language model?
An RNN language model can capture arbitrarily long contexts, as opposed to an N-gram language model, which uses a fixed-width context.
The RNN does this by processing the sequence one word at a time, applying a recurrence formula (e.g. s_t = tanh(Wx x_t + Ws s_{t-1} + b)) and maintaining a state vector s_t that summarises the words processed so far.
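A minimal NumPy sketch of that recurrence (all sizes and weights below are made up): the state vector s is updated once per word, so nothing limits how far back the context can reach.

import numpy as np

d = 4                                      # toy embedding/state size
rng = np.random.default_rng(0)
Wx, Ws, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)

s = np.zeros(d)                            # initial state
for x in rng.normal(size=(6, d)):          # six "word embeddings", one per time step
    s = np.tanh(Wx @ x + Ws @ s + b)       # s_t = tanh(Wx x_t + Ws s_{t-1} + b)
print(s)                                   # final state summarises the whole sequence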
Vanishing Gradient Problem
What is the vanishing gradient problem in an RNN, and what causes it?
If we have a very deep neural network, such as an RNN unrolled over time (think of a typical document with 200 words), the gradients of the weights in the earlier layers tend to vanish (become nearly zero) or explode (become very large), because the gradient terms are multiplied together during backpropagation and can therefore shrink or grow exponentially with the number of layers between the loss and the weight.
How can we solve it?
More sophisticated RNN variants such as the LSTM and GRU use “memory cells” that preserve gradients across time.
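A toy illustration of why a long product of gradient terms is a problem (the factors below are made up; in a real RNN they come from the recurrent weights and activation derivatives):

steps = 200                 # e.g. backpropagating through a 200-word document

print(0.9 ** steps)         # ~7e-10: factors slightly below 1 -> gradient vanishes
print(1.1 ** steps)         # ~1.9e+08: factors slightly above 1 -> gradient explodes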
Takeaways
Neural Network Language Model
- Definition
- How to deal with sparsity
Parameters of a Neural Network
- Feedforward neural network
- Recurrent neural network (RNN)
RNN vs N-gram language model
Vanishing Gradient Problem in RNN

Thank you