COMP5046
Natural Language Processing
Lecture 4: Word Classification and Machine Learning 2
Dr. Caren Han
Semester 1, 2021
School of Computer Science, University of Sydney
LECTURE PLAN
Lecture 4: Word Classification and Machine Learning 2
1. Machine Learning and NLP: Finish
2. Seq2Seq Learning
3. Seq2Seq Deep Learning
   1. RNN (Recurrent Neural Network)
   2. LSTM (Long Short-Term Memory)
   3. GRU (Gated Recurrent Unit)
4. Data Transformation for Deep Learning NLP
5. Next Week Preview
• Natural Language Processing Stack
… and an interesting notice at the end of the lecture!
Machine Learning and NLP
The purpose of Natural Language Processing: Overview
[Figure: from the NLP stack to applications.
Applications: Understanding, Searching, Dialog, Translation; e.g. Sentiment Analysis, Topic Classification, Topic Modelling, Search, Entity Extraction ("When Sebastian Thrun …"), …
NLP stack:
• Parsing: "Claudia sat on a stool"
• PoS Tagging: "She sells seashells" → [she/PRP] [sells/VBZ] [seashells/NNS]
• Stemming: Drinking, Drank, Drunk → Drink
• Tokenisation: "How is the weather today" → [How] [is] [the] [weather] [today]]
Machine Learning and NLP
Problem Abstraction
• N-to-1 Problem: Sentiment Analysis, Topic Classification
• N-to-N Problem: PoS Tagging
• N-to-Path Problem: Parsing
• N-to-M Problem: Dialog, Translation
Machine Learning and NLP
Prediction
[Figure: next-word prediction – the one-hot vectors for "boy is spreading" pass through (optional) word embeddings and a hidden layer, and a softmax produces the output distribution over the vocabulary (e.g. "smiles", "butter").]
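To make this concrete, here is a minimal next-word prediction sketch in PyTorch (not the lecture's code; the toy vocabulary, context size and layer dimensions are assumptions): word indices are embedded, concatenated, passed through a hidden layer, and a softmax gives the output distribution over the vocabulary.

import torch
import torch.nn as nn

# Minimal sketch (illustrative, not the lecture's exact model):
# word index -> word embedding -> hidden layer -> softmax over the vocabulary.
vocab = ["boy", "is", "spreading", "smiles", "butter"]   # toy vocabulary (assumed)
word2id = {w: i for i, w in enumerate(vocab)}

emb_dim, hidden_dim = 8, 16
embedding = nn.Embedding(len(vocab), emb_dim)            # plays the role of the one-hot input
hidden = nn.Linear(emb_dim * 3, hidden_dim)              # context of 3 previous words (assumed)
output = nn.Linear(hidden_dim, len(vocab))               # a score for every vocabulary word

context = torch.tensor([[word2id["boy"], word2id["is"], word2id["spreading"]]])
x = embedding(context).view(1, -1)                       # concatenate the 3 embeddings
h = torch.tanh(hidden(x))
probs = torch.softmax(output(h), dim=-1)                 # output distribution
print(probs)  # probability of each word ("smiles", "butter", ...) following the context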
What if we consider this as a sequential input? Let's add the concept of 'time'.
LECTURE PLAN
Lecture 4: Word Classification and Machine Learning 2
1. Machine Learning and NLP: Finish
2. Seq2Seq Learning
3. Seq2Seq Deep Learning
   1. RNN (Recurrent Neural Network)
   2. LSTM (Long Short-Term Memory)
   3. GRU (Gated Recurrent Unit)
4. Data Transformation for Deep Learning NLP
5. Next Week Preview
• Natural Language Processing Stack
Sequence 2 Sequence Learning
Illustration
Sequence 2 Sequence Learning
Running time
[Figure: over time, the model first performs sequence feeding (N = number of input items) and then sequence generation (M = number of output items).]
Sequence 2 Sequence Learning
[Figure: seq2seq settings by input length N and output length M: N = M, N ≠ M, M > 1, M = 1.]
Sequence 2 Sequence Learning
Seq2Seq – Speech Recognition
Input: Speech Signal
Output: Text – "How is the weather today"
Sequence 2 Sequence Learning
Seq2Seq – Movie Frame Labelling
Input: Video Frames
Output: Scene Labels – Swing, Swing, Hit, Bat_Broken
Sequence 2 Sequence Learning
Seq2Seq – PoS Tagging
Input: Text – "How is the weather today"
Output: Part of Speech – ADV VERB DET NOUN NOUN
Sequence 2 Sequence Learning
Seq2Seq – Arithmetic Calculation
Input: Math Expression – 7 × 5
Output: Numbers – 35
Sequence 2 Sequence Learning
Seq2Seq – Machine Translation
Input: English Text – "How is the weather today"
Output: Chinese Text – 今天 天气 怎么 样?
Sequence 2 Sequence Learning
Seq2Seq – Sentence Completion
• How is the weather today?
• How long does it take?
• Let's go to the opera house
• It is quite hot inside
• I may need to stop by Darling Harbour
• When is the dinner appointment
• Change the schedule
• Text him that I cannot meet at 6:30pm
• I like learning Natural Language Processing
Sequence 2 Sequence Learning
Seq2Seq – Sentence Completion
Input: Partial Sentence – "I like learning"
Output: Sentence Completion – "Natural Language Processing"
Sequence 2 Sequence Learning
Seq2Seq – Conversation Modelling
Input: Utterance – "It is quite hot inside"
Output: Utterance – "Okay. I will open windows for you"
LECTURE PLAN
Lecture 4: Word Classification and Machine Learning 2
1. Machine Learning and NLP: Finish
2. Seq2Seq Learning
3. Seq2Seq Deep Learning
   1. RNN (Recurrent Neural Network)
   2. LSTM (Long Short-Term Memory)
   3. GRU (Gated Recurrent Unit)
4. Data Transformation for Deep Learning NLP
5. Next Week Preview
• Natural Language Processing Stack
Seq2Seq with Deep Learning
Prediction
[Figure: prediction with a feed-forward network – input layer ("NLP Course is so") → hidden layer → output layer.]
Seq2Seq with Deep Learning
Prediction + Convolution Idea
[Figure: the same prediction network with a convolution-style connection pattern between the input ("NLP Course is so") and the hidden layer.]
Seq2Seq with Deep Learning
Prediction + Memory = Sequence Modelling
[Figure: prediction with memory – input layer ("NLP Course", "is so") → hidden layer with a memory component → output layer.]
Seq2Seq with Deep Learning
Prediction + Memory = Sequence Modelling
[Figure: the same network unrolled over three steps (the previous, the present, and the present of the next stage), with the memory connecting the hidden layers across steps.]
Seq2Seq with Deep Learning
Neural Network + Memory
Memory is vital to experience: it is the retention of information over time for the purpose of influencing future action.
[Figure: at each step (the previous, the present, and the present of the next stage) the network reasons and then acts, with the memory carrying information between steps.]
Seq2Seq with Deep Learning
Neural Network + Memory
[Figure: the network unrolled over the previous, the present and the next step – input layer → hidden layer → output layer at each step, trained with a forward pass and backpropagation.]
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
[Figure: one RNN step – input X1 → input layer → hidden layer (with a recurrent connection back to itself) → output layer → Y1, computed in the forward pass.]
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network (RNN)
[Figure: the standard RNN (input Xn → hidden layer with memory → output Yn) is equal ("=") to the unfolded RNN laid out across time steps.]
NOTICE: the same function and the same set of parameters W are used at every time step.
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
[Figure: in the hidden layer, the previous state h_{t-1} and the current input x_t are combined through the weights W_hh and W_xh to give the new hidden state h_t.]

h_t = f_W(h_{t-1}, x_t)

(new hidden state = a function f with parameters W applied to the previous state and the current input)
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
[Figure: the RNN cell – the previous state h_{t-1} and the input x_t produce the new hidden state h_t.]

h_t = tanh(W_hh · h_{t-1} + W_xh · x_t + b_h)

(new hidden state = a function with parameters W applied to the previous state and the current input)
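As a sketch of this recurrence (illustrative dimensions, not the lecture's code), note that the same W_xh, W_hh and b_h are reused at every time step:

import torch

torch.manual_seed(0)
input_dim, hidden_dim = 4, 3

# Shared parameters, reused at every time step (the same W for all t).
W_xh = torch.randn(hidden_dim, input_dim)
W_hh = torch.randn(hidden_dim, hidden_dim)
b_h  = torch.zeros(hidden_dim)

def rnn_step(h_prev, x_t):
    # h_t = tanh(W_hh · h_{t-1} + W_xh · x_t + b_h)
    return torch.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

xs = [torch.randn(input_dim) for _ in range(5)]   # a toy 5-step input sequence
h = torch.zeros(hidden_dim)                       # h_0
for x_t in xs:
    h = rnn_step(h, x_t)                          # the new state becomes the previous state
print(h)                                          # final hidden state h_5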
Seq2Seq with Deep Learning
Tanh activation
The tanh activation is used to help regulate the values flowing through the network. The tanh function squishes values to always be between -1 and 1.
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
With Sequence Input
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Q: Why do we need the tanh function?
[Figure: vector transformations without tanh vs. with tanh – without tanh the values can grow without bound, while tanh keeps every value between -1 and 1.]
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Several Variants of RNN
[Figure: RNN input/output configurations –
• one to one: vanilla neural network
• one to many: e.g. image captioning
• many to one: e.g. sentiment classification
• many to many (encoder–decoder): e.g. machine translation or dialog system
• many to many (aligned): e.g. part-of-speech tagger]
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Backpropagation through time
[Figure: the unrolled RNN – inputs x_1…x_5, hidden states h_0…h_5, outputs y_1…y_5.]
• Similar to standard backpropagation on the unrolled network
• Similar to training a very deep network with tied parameters
What about a very long sequence? Use truncated backpropagation through time.
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Truncated Backpropagation through time
[Figure: the loss is computed over one chunk of the unrolled network.]
Run forward and backward through chunks of the sequence instead of the whole sequence.
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Truncated Backpropagation through time
[Figure: hidden states are carried forward from chunk to chunk, but the loss and backpropagation are limited to the current chunk.]
Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.
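A rough sketch of truncated backpropagation through time in PyTorch (the model, chunk size and data are toy assumptions): the hidden state is carried forward across chunks but detached, so gradients only flow within the current chunk.

import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_len, chunk = 4, 8, 100, 20   # assumed sizes

rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(1, seq_len, input_dim)      # toy input sequence
y = torch.randn(1, seq_len, 1)              # toy targets

h = torch.zeros(1, 1, hidden_dim)           # initial hidden state
for start in range(0, seq_len, chunk):
    x_chunk = x[:, start:start + chunk]
    y_chunk = y[:, start:start + chunk]
    out, h = rnn(x_chunk, h)                # forward pass through this chunk only
    loss = ((head(out) - y_chunk) ** 2).mean()
    opt.zero_grad()
    loss.backward()                         # backpropagation stays inside the chunk
    opt.step()
    h = h.detach()                          # carry the state forward, cut the gradient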
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Many to 1
[Figure: inputs x_1…x_5 → hidden states h_0…h_5 (tanh, with weights W_xh and W_hh shared across steps) → a single output y taken from the last hidden state.]
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Many to 1 – Text Classification
[Figure: the words "I am crazy in love" as inputs x_1…x_5 → hidden states h_1…h_5 → softmax over the output classes (Positive / Negative) from the last hidden state.]
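A minimal many-to-one classifier in this shape (hyperparameters, class labels and token ids are toy assumptions): only the last hidden state feeds the softmax.

import torch
import torch.nn as nn

class ManyToOneClassifier(nn.Module):
    """Sentence -> single class, using the last hidden state only."""
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        x = self.emb(token_ids)                   # (batch, seq_len, emb_dim)
        _, h_last = self.rnn(x)                   # h_last: (1, batch, hidden_dim)
        logits = self.out(h_last.squeeze(0))      # (batch, num_classes)
        return torch.softmax(logits, dim=-1)      # e.g. P(Positive), P(Negative)

model = ManyToOneClassifier(vocab_size=1000)
ids = torch.tensor([[11, 42, 7, 3, 99]])          # "I am crazy in love" as toy ids
print(model(ids))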
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Many to Many
[Figure: the words "I am crazy in love" as inputs x_1…x_5 → hidden states h_1…h_5 → one output y_1…y_5 per time step.]
Seq2Seq with Deep Learning
Neural Network + Memory = Recurrent Neural Network
Many to Many
[Figure: the words "I am crazy in love" as inputs x_1…x_5 → hidden states h_1…h_5 → one tag per time step: PRP VBP JJ IN NN.]
Seq2Seq with Deep Learning
Limitation of Vanilla RNN
The Problem of Learning Long-Range Dependencies
[Figure: an unrolled RNN with inputs Input1 … InputN and outputs Out1 … OutN – Out1 does not cover Data2 and Data3, and if Data1 is too far back, a later output cannot cover Data1 either.]
Seq2Seq with Deep Learning
Limitation of Vanilla RNN
“I grew up in Italy … (5 more sentences)… My grandma’s house was very cosy and… (5 more sentences)… I speak fluent ____”
Seq2Seq with Deep Learning
Limitation of Vanilla RNN
Limitation 1: Vanishing Gradients
During back-propagation, the gradients tend to get smaller and smaller as we move backward through the network. This means that the neurons in the earlier layers learn very slowly compared to the neurons in the later layers of the hierarchy.
Seq2Seq with Deep Learning
Limitation of Vanilla RNN
Limitation 2: Exploding Gradients
In an RNN, error gradients can accumulate during an update and result in very large gradients. These in turn cause large updates to the network weights and an unstable network. At the extreme, the weight values can become so large that they overflow and produce NaN values that can no longer be updated.
Seq2Seq with Deep Learning
LSTM (Long Short-Term Memory) – Idea
[Figure: the standard RNN cell (input Xn → memory → output Yn) compared with the Long Short-Term Memory cell, which surrounds the memory with an input gate, a forget gate and an output gate.]
Seq2Seq with Deep Learning
LSTM (Long Short-Term Memory) – Idea
• 4 times more parameters than an RNN
• Mitigates the vanishing gradient problem through gating
• Widely used; was the SOTA (state-of-the-art) in many sequence learning problems
Seq2Seq with Deep Learning
Sigmoid activation
A sigmoid activation is similar to the tanh activation. Instead of squishing values between -1 and 1, it squishes values between 0 and 1.
Any number multiplied by 0 is 0, causing values to disappear or be "forgotten."
Any number multiplied by 1 keeps the same value, so that value stays the same or is "kept."
Seq2Seq with Deep Learning
LSTM (Long Short-Term Memory) – Forget Gate
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

Decides what information should be thrown away or kept.
Information from the previous hidden state and from the current input is passed through the sigmoid function, giving values between 0 and 1: closer to 0 means forget, closer to 1 means keep.
Seq2Seq with Deep Learning
LSTM (Long Short-Term Memory) – Input Gate
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)

1. Pass the previous hidden state and the current input into a sigmoid function
2. Pass the previous hidden state and the current input into a tanh function to squish values between -1 and 1 and help regulate the network
3. Multiply the tanh output with the sigmoid output
* The sigmoid output decides which information from the tanh output is important to keep
Seq2Seq with Deep Learning
LSTM (Long Short-Term Memory) – Cell States
C_t = f_t * C_{t-1} + i_t * C̃_t

• The previous cell state is pointwise multiplied by the forget vector
• The output from the input gate is added pointwise, updating the cell state with the new values the network finds relevant
• That gives us the new cell state
Seq2Seq with Deep Learning
LSTM (Long Short-Term Memory) – Output Gate
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)

Decides what the next hidden state should be:
• Pass the previous hidden state and the current input into a sigmoid function
• Pass the newly updated cell state through the tanh function
• Multiply the tanh output with the sigmoid output to decide what information the hidden state should carry
Seq2Seq with Deep Learning
LSTM (Long Short-Term Memory) – Overall
[Figure: the complete LSTM cell – the forget gate, input gate, cell state update and output gate working together.]
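Putting the four gate equations together, here is one LSTM step written out by hand in PyTorch tensors (a sketch with assumed dimensions, not torch.nn.LSTM itself):

import torch

torch.manual_seed(0)
input_dim, hidden_dim = 4, 3
concat = input_dim + hidden_dim

# One weight matrix and bias per gate, applied to [h_{t-1}, x_t].
W_f, b_f = torch.randn(hidden_dim, concat), torch.zeros(hidden_dim)
W_i, b_i = torch.randn(hidden_dim, concat), torch.zeros(hidden_dim)
W_C, b_C = torch.randn(hidden_dim, concat), torch.zeros(hidden_dim)
W_o, b_o = torch.randn(hidden_dim, concat), torch.zeros(hidden_dim)

def lstm_step(h_prev, C_prev, x_t):
    hx = torch.cat([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = torch.sigmoid(W_f @ hx + b_f)                # forget gate
    i_t = torch.sigmoid(W_i @ hx + b_i)                # input gate
    C_tilde = torch.tanh(W_C @ hx + b_C)               # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde                 # new cell state
    o_t = torch.sigmoid(W_o @ hx + b_o)                # output gate
    h_t = o_t * torch.tanh(C_t)                        # new hidden state
    return h_t, C_t

h, C = torch.zeros(hidden_dim), torch.zeros(hidden_dim)
for x_t in [torch.randn(input_dim) for _ in range(5)]:
    h, C = lstm_step(h, C, x_t)
print(h, C)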
Seq2Seq with Deep Learning
Gated Recurrent Unit
• The GRU first computes an update gate based on the current input word vector and the hidden state
• It computes a reset gate similarly, but with different weights
• If the reset gate is ~0, the unit ignores the previous memory and only stores the new word information
• The final memory at each time step combines the current and the previous time steps
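A hand-written GRU step following these bullets (a sketch: the weight names and sizes are assumptions, and the update-gate convention shown is one common variant):

import torch

torch.manual_seed(0)
input_dim, hidden_dim = 4, 3
concat = input_dim + hidden_dim

W_z = torch.randn(hidden_dim, concat)   # update gate weights
W_r = torch.randn(hidden_dim, concat)   # reset gate weights (same shape, different values)
W_h = torch.randn(hidden_dim, concat)   # candidate memory weights

def gru_step(h_prev, x_t):
    hx = torch.cat([h_prev, x_t])
    z_t = torch.sigmoid(W_z @ hx)                       # update gate
    r_t = torch.sigmoid(W_r @ hx)                       # reset gate
    # if r_t ~ 0, the previous memory is ignored and only the new word is stored
    h_tilde = torch.tanh(W_h @ torch.cat([r_t * h_prev, x_t]))
    # final memory combines the previous state and the new candidate
    return (1 - z_t) * h_prev + z_t * h_tilde

h = torch.zeros(hidden_dim)
for x_t in [torch.randn(input_dim) for _ in range(5)]:
    h = gru_step(h, x_t)
print(h)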
Seq2Seq Modelling
Seq2Seq – PoS tagger
Input: Text – "How is the weather today"
Output: Part of Speech – ADV VERB DET NOUN NOUN
Seq2Seq Modelling
Sequence Modelling for POS Tagging
[Figure: an N-to-N RNN for PoS tagging – the words "How is the weather today" are mapped from one-hot vectors to distributed vectors through a symbol-to-vector lookup table, fed through the RNN hidden layer, and the output layer predicts one tag per word: ADV VERB DET NOUN NOUN.]
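A minimal N-to-N tagger in this shape (the vocabulary size, tagset and token ids are toy assumptions): the embedding plays the role of the symbol-to-vector lookup table, and the output layer produces one tag score vector per token.

import torch
import torch.nn as nn

class RNNTagger(nn.Module):
    """Word ids -> embeddings -> RNN -> one tag score vector per token."""
    def __init__(self, vocab_size, tagset_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # symbol-to-vector lookup table
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tagset_size)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.emb(token_ids)
        h, _ = self.rnn(x)                             # (batch, seq_len, hidden_dim)
        return self.out(h)                             # (batch, seq_len, tagset_size)

tags = ["ADV", "VERB", "DET", "NOUN"]                  # toy tagset (assumed)
tagger = RNNTagger(vocab_size=1000, tagset_size=len(tags))
ids = torch.tensor([[5, 12, 3, 48, 77]])               # "How is the weather today" as toy ids
scores = tagger(ids)
print([tags[i] for i in scores.argmax(dim=-1)[0].tolist()])   # predicted tag per word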
LECTURE PLAN
Lecture 4: Word Classification and Machine Learning 2
1. Machine Learning and NLP: Finish
2. Seq2Seq Learning
3. Seq2Seq Deep Learning
   1. RNN (Recurrent Neural Network)
   2. LSTM (Long Short-Term Memory)
   3. GRU (Gated Recurrent Unit)
4. Data Transformation for Deep Learning NLP
5. Next Week Preview
• Natural Language Processing Stack
Data Transformation for Deep Learning NLP
ImageNet: Image Classification
[Figure: ImageNet classification – image pixels as input, classified into labels such as cat, leopard, jaguar, cheetah.]
Data Transformation for Deep Learning NLP
Topic Classification
[Figure: topic classification – news articles as input, classified into topics such as politics, entertainment, sports, education.]
Data Transformation for Deep Learning NLP
Visual Question Answering
Data Transformation for Deep Learning NLP
Classification Formulation
[Figure: classification formulation – input → a big & deep network → Class 1 / Class 2 / Class 3 / Class 4.]
Data Transformation for Deep Learning NLP
Classification
Data Transformation for Deep Learning NLP
Graphical Notation for Data
[Figure: graphical notation – the data X, a 9-dimensional vector (10, 2, 8, 2, 15, 3, 5, 1, 5), is drawn as a single box.]
Data Transformation for Deep Learning NLP
V to 1
[Figure: V to 1 – the 9-dimensional vector (10, 2, 8, 2, 15, 3, 5, 1, 5) has to be summarised into a single value (?).]
Data Transformation for Deep Learning NLP
V to 1 – Simple Method
Data Transformation for Deep Learning NLP
V to 1 – Weighted Method
Element-wise multiplication of the data vector with a weight vector (followed by summation, as in the linear algebra form below).
Data Transformation for Deep Learning NLP
V to 1 – General Form
Element-wise multiplication
Data Transformation for Deep Learning NLP
V to 1 – Linear Algebra
[Figure: V to 1 in linear algebra – the [1 x 9] data vector X = (x_1, …, x_9) multiplied by a [9 x 1] weight vector (w_1, …, w_9) gives a [1 x 1] result:]

X · W = Σ_{i=1}^{9} x_i * w_i
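The same weighted summarisation computed directly (the data values are from the slide; the weights are arbitrary illustrative numbers):

import torch

x = torch.tensor([10., 2., 8., 2., 15., 3., 5., 1., 5.])          # the 9-value data vector
w = torch.tensor([0.1, 0.2, 0.0, 0.3, 0.1, 0.0, 0.1, 0.1, 0.1])   # assumed weights

# V to 1: element-wise multiplication followed by summation = dot product
value = (x * w).sum()
print(value, torch.dot(x, w))   # both give the same single number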
Data Transformation for Deep Learning NLP
Convolutional Neural Network (1)
Data Abstraction
Mairal et al., Convolutional Kernel Networks, 2014
Data Transformation for Deep Learning NLP
Convolutional Neural Network (2)
Mairal et al., Convolutional Kernel Networks, 2014
Data Transformation for Deep Learning NLP
V to V’
V= 9 V’= 2
10
2
8
2
15
3
5
1
5
?
?
10
2
8
2
15
3
5
1
5
?
?
4
Data Transformation for Deep Learning NLP
V to V’ – generalized method
[Figure: the generalised method – each of the V' output values is a weighted sum (Σ) over the V input values.]
Data Transformation for Deep Learning NLP
V to V’ – Projection Notation
[Figure: projection notation – the (1×9) input vector is projected to a (1×2) vector V' (by a 9×2 weight matrix).]
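The projection written as a matrix product (the 9×2 weight matrix here is random, purely for illustration):

import torch

torch.manual_seed(0)
x = torch.tensor([[10., 2., 8., 2., 15., 3., 5., 1., 5.]])   # (1 x 9)
W = torch.randn(9, 2)                                        # (9 x 2) projection weights

v_prime = x @ W          # (1 x 9) @ (9 x 2) = (1 x 2)
print(v_prime.shape, v_prime)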
Data Transformation for Deep Learning NLP
V to V’ – Projection with Context (1)
Data Transformation for Deep Learning NLP
V to V’ – Projection with Context (2)
Data Transformation for Deep Learning NLP
V to V’ with Context – Linear Algebra
Data Transformation for Deep Learning NLP
V to V’ with Context – Linear Algebra (Simplified)
Data Transformation for Deep Learning NLP
V→V’→1
Data Transformation for Deep Learning NLP
Seq2Seq Encoding
Single Item Summarisation → Multiple Item Summarisation: how do we summarise multiple items?
Data Transformation for Deep Learning NLP
Multiple Item Summarisation
[Figure: multiple item summarisation – three data items, each a 9-dimensional vector: Data 1 = (10, 2, 8, 2, 15, 3, 5, 1, 5), Data 2 = (13, 4, 8, 4, 5, 2, 1, 45, 31), Data 3 = (6, 3, 4, 1, 7, 1, 3, 4, 0). The V values per item are to be summarised into V' values and finally into a single representation.]
Data Transformation for Deep Learning NLP
Vs to V’
Element-wise Average
[Figure: the element-wise average of Data 1 = (10, 2, 8, 2, 15, 3, 5, 1, 5), Data 2 = (13, 4, 8, 4, 5, 2, 1, 45, 31) and Data 3 = (6, 3, 4, 1, 7, 1, 3, 4, 0); e.g. the first element is (10 + 13 + 6) / 3 ≈ 9.6 and the third is (8 + 8 + 4) / 3 ≈ 6.6.]
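The element-wise average computed directly (values copied from the figure):

import torch

data = torch.tensor([
    [10.,  2., 8., 2., 15., 3., 5.,  1.,  5.],   # Data 1
    [13.,  4., 8., 4.,  5., 2., 1., 45., 31.],   # Data 2
    [ 6.,  3., 4., 1.,  7., 1., 3.,  4.,  0.],   # Data 3
])

avg = data.mean(dim=0)    # element-wise average across the three items
print(avg)                # first element (10+13+6)/3 ≈ 9.67, third element (8+8+4)/3 ≈ 6.67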
Data Transformation for Deep Learning NLP
Temporal Summarisation
Context
How do we include temporal information?
Data Transformation for Deep Learning NLP
Vs→V’s→V’
[Figure: Data 2 is transformed by the weight matrix W into its hidden representation (2).]
Data Transformation for Deep Learning NLP
Vs→V’s→V’ Context
[Figure: temporal summarisation – Data 1, Data 2 and Data 3 are each transformed by W into hidden representations 1, 2 and 3; a context weight matrix W_c links consecutive hidden representations, so representation 2 reflects Data 2 (the present) and the past Data 1, representation 3 reflects Data 3 (the present) and the past data, and the final representation reflects all the information.]
Data Transformation for Deep Learning NLP
Graphical Notation
[Figure: graphical notation for the RNN layer – the input data at steps t-1, t and t+1 feed the hidden states through the weights U (input to hidden) and W (hidden to hidden); the simplified version draws the whole Vs→V's RNN layer as one box.]
Data Transformation for Deep Learning NLP
Forward/Backward RNN
[Figure: a forward RNN reads Data 1 → Data 2 → Data 3, while a backward RNN reads Data 3 → Data 2 → Data 1; each maps Vs→V's through the weights U and W.]
Data Transformation for Deep Learning NLP
Bidirectional RNN
V’ V’
V
concatenation
Data 1
Data 2
Vs → (2*V’)s
Data 3
4
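In PyTorch this concatenation is what the bidirectional flag gives you (sizes are illustrative):

import torch
import torch.nn as nn

V, V_prime = 9, 5                                  # input size and per-direction hidden size
birnn = nn.RNN(V, V_prime, batch_first=True, bidirectional=True)

seq = torch.randn(1, 3, V)                         # Data 1..3, each with V values
out, _ = birnn(seq)
print(out.shape)                                   # (1, 3, 2*V_prime): forward and backward concatenated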
Data Transformation for Deep Learning NLP
Stacking RNN
[Figure: stacked RNNs – the Vs→V's outputs of one RNN layer over Data 1, Data 2, Data 3 are fed as inputs to the next RNN layer.]
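Stacking corresponds to the num_layers argument (again only a sketch): the first layer's Vs→V's output sequence becomes the second layer's input.

import torch
import torch.nn as nn

stacked = nn.RNN(input_size=9, hidden_size=5, num_layers=2, batch_first=True)

seq = torch.randn(1, 3, 9)        # Data 1..3
out, h_n = stacked(seq)
print(out.shape)                  # (1, 3, 5): the top layer's V' for each step
print(h_n.shape)                  # (2, 1, 5): final hidden state of each stacked layer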
Data Transformation for Deep Learning NLP
RNN: Input and Output
[Figure: two ways of using RNN outputs –
• one output per step (Out 1, Out 2, Out 3 for Data 1, Data 2, Data 3): Vs→V's, with Len(Vs) → Len(V's)
• one summarisation output for the whole sequence: Vs→1]
Data Transformation for Deep Learning NLP
Seq2Seq Encoding and Decoding – Dialog System
[Figure: an encoder–decoder for a dialog system – the encoder reads "How are you ?" as one-hot vectors through an embedding layer and a recurrent layer; the decoder starts from the encoder state and, through its own embedding layer, recurrent layer and output layer, generates "I am fine" one token at a time.]
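A minimal encoder–decoder sketch in the shape of this figure (the vocabulary, token ids and greedy decoding loop are illustrative assumptions, not the lecture's code):

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.enc_emb = nn.Embedding(vocab_size, emb_dim)   # encoder embedding layer
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.dec_emb = nn.Embedding(vocab_size, emb_dim)   # decoder embedding layer
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # decoder output layer

    def forward(self, src_ids, bos_id, max_len=5):
        # Encoder: read "How are you ?" and keep only the final hidden state.
        _, h = self.encoder(self.enc_emb(src_ids))
        # Decoder: generate "I am fine" one token at a time (greedy decoding).
        token = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            dec_out, h = self.decoder(self.dec_emb(token), h)
            token = self.out(dec_out).argmax(dim=-1)       # most likely next word id
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = Seq2Seq(vocab_size=1000)
src = torch.tensor([[17, 52, 96, 4]])      # "How are you ?" as toy ids
print(model(src, bos_id=1))                # generated token ids for the reply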
LECTURE PLAN
Lecture 4: Word Classification and Machine Learning 2
1. Machine Learning and NLP: Finish
2. Seq2Seq Learning
3. Seq2Seq Deep Learning
   1. RNN (Recurrent Neural Network)
   2. LSTM (Long Short-Term Memory)
   3. GRU (Gated Recurrent Unit)
4. Data Transformation for Deep Learning NLP
5. Next Week Preview
• Natural Language Processing Stack
Next Week Preview
The purpose of Natural Language Processing: Overview
[Figure: from the NLP stack to applications.
Applications: Understanding, Searching, Dialog, Translation; e.g. Sentiment Analysis, Topic Classification, Topic Modelling, Search, Entity Extraction ("When Sebastian Thrun …"), …
NLP stack:
• Parsing: "Claudia sat on a stool"
• PoS Tagging: "She sells seashells" → [she/PRP] [sells/VBZ] [seashells/NNS]
• Stemming: Drinking, Drank, Drunk → Drink
• Tokenisation: "How is the weather today" → [How] [is] [the] [weather] [today]]
Reference
Reference for this lecture
• Deng, L., & Liu, Y. (Eds.). (2018). Deep Learning in Natural Language Processing. Springer.
• Rao, D., & McMahan, B. (2019). Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O'Reilly Media.
• Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
• Blunsom, P 2017, Deep Natural Language Processing, lecture notes, Oxford University
• Manning, C 2017, Natural Language Processing with Deep Learning, lecture notes, Stanford University
• Sordoni, A., Bengio, Y., Vahabi, H., Lioma, C., Grue Simonsen, J., & Nie, J. Y. (2015, October). A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (pp. 553-562). ACM.
Figure Reference
• https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f
• https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21