COMP90042 Workshop
Week 5
20 April
Table of Contents
Neural Network Language Model
Parameters of a Neural Network
RNN vs N-gram language model
Vanishing Gradient Problem in RNN
Neural Network Language Model

What is a Neural Network Language Model?
Simply a language model that uses a neural network. The network can be a feed-forward neural network, an RNN, a CNN, etc.

How does a Neural Network Language Model deal with sparsity, and why is this an advantage over an n-gram language model?
Neural language models (NLMs) address the n-gram data-sparsity issue by parameterising words as vectors (word embeddings) and using those vectors as inputs to a neural network. The parameters are learned as part of the training process. Word embeddings obtained through NLMs exhibit the property that semantically and syntactically close words are also close in the induced vector space.
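To make the "words as vectors" idea concrete, here is a minimal PyTorch sketch (the sizes and word ids are hypothetical, chosen to match the 10K-word / 300-dimension example used later in this workshop):

import torch
import torch.nn as nn

# Each word id is mapped to a learned 300-dimensional vector; these vectors,
# not the raw word identities, are what the language model's network consumes.
emb = nn.Embedding(num_embeddings=10_000, embedding_dim=300)

context_ids = torch.tensor([[42, 1337]])   # two hypothetical context-word ids
vectors = emb(context_ids)                 # shape (1, 2, 300)
x = vectors.flatten(start_dim=1)           # shape (1, 600): concatenated context,
                                           # ready to feed into a hidden layer

Because the embeddings are shared across all contexts, an n-gram never seen in training can still be scored sensibly, provided its words have embeddings close to those of words seen in similar contexts.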
Consider the following two sentences:
1. The cat is walking in the bedroom.
2. A dog was running in a room.
(Figure omitted: word vectors' cosine similarity scores.)
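The omitted figure reports cosine similarity scores between word vectors. As a self-contained sketch of how such a score is computed (the toy vectors below are made up; real scores would use learned embeddings):

import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 4-d "embeddings", invented purely for illustration.
vec = {
    "cat":  np.array([0.9, 0.1, 0.3, 0.0]),
    "dog":  np.array([0.8, 0.2, 0.3, 0.1]),
    "room": np.array([0.0, 0.9, 0.1, 0.7]),
}
print(cosine(vec["cat"], vec["dog"]))   # high: semantically close words
print(cosine(vec["cat"], vec["room"]))  # lower: less related words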
Parameters of a Neural Network
Example 1: Feedforward Neural Network
Assume a vocabulary of 10K words and a standard feed-forward LM architecture (embedding layer, one hidden layer, softmax output layer).
Assume the dimension of the embeddings and of the hidden layer is 300, and that we use the 2 previous words as context.
Input word embeddings: 300 x 10K = 3,000,000
Hidden layer W1: 300 x 600 = 180,000 (the two 300-dimensional context embeddings are concatenated into a 600-dimensional input)
Hidden layer bias b1: 300
Output layer W2: 10K x 300 = 3,000,000
Most of the parameters of a neural network are in its input and output layers.
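As a quick sanity check, the counts above can be reproduced in a few lines of Python (a sketch of the arithmetic; the output-layer bias is omitted, as on the slide):

V, d, context = 10_000, 300, 2    # vocabulary size, embedding/hidden size, context words

input_embeddings = d * V          # 300 x 10K  = 3,000,000
W1 = d * (context * d)            # 300 x 600  = 180,000
b1 = d                            #              300
W2 = V * d                        # 10K x 300  = 3,000,000

print(input_embeddings + W1 + b1 + W2)   # 6,180,300 parameters in total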
Example 2: Recurrent Neural Network
Assume a unidirectional RNN with the same vocabulary (10K words) and dimension (300).
Input word embeddings: 300 x 10K = 3,000,000
Wx (input-to-hidden): 300 x 300 = 90,000
Ws (hidden-to-hidden): 300 x 300 = 90,000
b (hidden bias): 300
Output word embeddings Wy: 10K x 300 = 3,000,000
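The recurrent layer's shapes can be checked with PyTorch (a sketch; note that nn.RNN stores two bias vectors, bias_ih and bias_hh, where the slide counts a single b):

import torch.nn as nn

rnn = nn.RNN(input_size=300, hidden_size=300, batch_first=True)
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (300, 300)  -> Wx
# weight_hh_l0 (300, 300)  -> Ws
# bias_ih_l0 (300,) and bias_hh_l0 (300,) together play the role of b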
RNN vs N-gram language model
What advantage does an RNN language model have over an N-gram language model?
An RNN language model can capture arbitrarily long contexts, as opposed to an N-gram language model, which uses a fixed-width context.
The RNN does this by processing the sequence one word at a time, applying a recurrence formula (e.g. s_t = tanh(Wx x_t + Ws s_{t-1} + b)) and maintaining a state vector s_t that summarises the words processed so far.
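A minimal NumPy sketch of that recurrence (all sizes and weights below are made up): the state vector s is updated once per word, so nothing limits how far back the context can reach.

import numpy as np

d = 4                                      # toy embedding/state size
rng = np.random.default_rng(0)
Wx, Ws, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)

s = np.zeros(d)                            # initial state
for x in rng.normal(size=(6, d)):          # six "word embeddings", one per time step
    s = np.tanh(Wx @ x + Ws @ s + b)       # s_t = tanh(Wx x_t + Ws s_{t-1} + b)
print(s)                                   # final state summarises the whole sequence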
Vanishing Gradient Problem
What is the vanishing gradient problem in an RNN, and what causes it?
If we have a very deep neural network, such as an RNN unrolled over time (think of a typical document with 200 words), the gradients of the weights in the earlier layers tend to vanish (become nearly zero) or explode (become very large), because the gradient terms are multiplied together during backpropagation and can therefore shrink or grow exponentially with the number of layers between the loss and the weight.
How can we solve it?
More sophisticated RNN variants such as the LSTM and GRU use “memory cells” that preserve gradients across time.
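A toy illustration of why a long product of gradient terms is a problem (the factors below are made up; in a real RNN they come from the recurrent weights and activation derivatives):

steps = 200                 # e.g. backpropagating through a 200-word document

print(0.9 ** steps)         # ~7e-10: factors slightly below 1 -> gradient vanishes
print(1.1 ** steps)         # ~1.9e+08: factors slightly above 1 -> gradient explodes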
Takeaways
Neural Network Language Model
- Definition
- How to deal with sparsity
Parameters of a Neural Network
- Feedforward neural network
- Recurrent neural network (RNN)
RNN vs N-gram language model
Vanishing Gradient Problem in RNN

Thank you