13_tcn_rnn
Qiuhong Ke
Temporal Convolutional Network
& Recurrent Neural Network
COMP90051 Statistical Machine Learning
Copyright: University of Melbourne
Sit down or stand up?
We need to learn the temporal information to understand the action.
Temporal information: the temporal evolution (changes) of the human pose.
Temporal information does matter
Change the order: different actions
What is the next word?
This game is really fun
This cookie is really tasty
Like? Dislike? Sentiment analysis
Amazon product data
I bought this charger in Jul 2003 and it worked OK for a while. The design is nice and convenient.
However, after about a year, the batteries would not hold a charge. Might as well just get alkaline
disposables, or look elsewhere for a charger that comes with batteries that have better staying
power.
http://jmcauley.ucsd.edu/data/amazon/
Outline
• Temporal convolutional network (TCN)
• Recurrent neural network (RNN)
1D Convolution
A kernel (filter) w = (w_0, w_1, w_2) slides along the input like a sliding window:
y(i) = Σ_{j=0}^{K−1} x(i + j) · w(j),   i.e.   y = x ∗ w
where x is the input vector and y is the output vector (feature map).
[Figure: a length-7 input x_0 … x_6 convolved with the three-tap kernel (×w_0, ×w_1, ×w_2, then Σ) gives a length-5 output y_0 … y_4.]
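To make the sliding window concrete, here is a minimal NumPy sketch (the input and kernel values are made up for illustration, not taken from the slide):

```python
import numpy as np

# 1D (valid) convolution as a sliding window; illustrative values only.
x = np.array([1., 2., 0., -1., 3., 2., 1.])   # input vector, 7 time-steps
w = np.array([0.5, 1., -0.5])                 # kernel (filter), K = 3

K = len(w)
y = np.array([np.sum(x[i:i + K] * w) for i in range(len(x) - K + 1)])
print(y)                                      # feature map of length 7 - 3 + 1 = 5
print(np.correlate(x, w, mode="valid"))       # same result computed by NumPy
```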
2D Convolution? Next lecture
Figure 9.1 in Deep learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville
Temporal convolution: 1D convolution (on the temporal dimension)
[Figure: the same sliding-window convolution, now read along the temporal dimension — inputs x_0 … x_6, kernel weights w_0, w_1, w_2, outputs y_0 … y_4.]
x: 7 time-steps of input, each with 1 dimension (feature); y: output sequence.
A 1D input feature vector is a special case of sequence data: a sequence of 7 time-steps with one feature in each step.
Temporal convolution on a sequence of feature vectors
Input:
• a sequence of N = 4 time-steps
• each time-step is a feature vector of dimension d = 2 (2 features)
Kernel: a weight matrix (d × K)
• one dimension is the same as the feature dimension d
• the other dimension is the size of the temporal convolution window, K
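A small NumPy sketch of this multi-feature case (N = 4 and d = 2 as above; the window size K = 2 and all values are made-up stand-ins):

```python
import numpy as np

# Temporal convolution over a sequence of feature vectors (illustrative values).
X = np.array([[0.2, 1.0],
              [0.5, -0.3],
              [1.0, 0.0],
              [-0.4, 0.8]])       # shape (N, d) = (4, 2): 4 time-steps, 2 features each
W = np.array([[0.5, -1.0],
              [1.0, 0.5]])        # kernel stored as (K, d) = (2, 2) to align with X

N, d = X.shape
K = W.shape[0]
# At each position, element-wise multiply the K x d window with the kernel and sum.
y = np.array([np.sum(X[t:t + K] * W) for t in range(N - K + 1)])
print(y)                          # output sequence of length N - K + 1 = 3
```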
Temporal convolution for text
https://www.youtube.com/watch?time_continue=227&v=0N3PsjfXW9g&feature=emb_logo
How to represent words at each time-step before temporal convolution?
I bought this charger in Jul 2003 and it worked OK for a while. The design is nice and convenient.
However, after about a year, the batteries would not hold a charge. Might as well just get alkaline
disposables, or look elsewhere for a charger that comes with batteries that have better staying
power.
• One-hot encoding
• Word embedding
One-hot encoding
• Create a feature vector
• Dimension of the vector == size of vocabulary
• If the word is the ith word in the vocabulary, then the ith element of the vector is 1 and all the others are 0
• Sparse and high-dimensional
dic = {"the", "cat", "sat", "on", "mat", ".", "these", "are", "other", "words"}
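A short Python sketch of one-hot encoding with this toy vocabulary:

```python
import numpy as np

# One-hot encoding over the toy vocabulary from the slide.
dic = ["the", "cat", "sat", "on", "mat", ".", "these", "are", "other", "words"]

def one_hot(word):
    # vector of length |vocabulary| with a 1 at the word's index, 0 elsewhere
    vec = np.zeros(len(dic))
    vec[dic.index(word)] = 1.0
    return vec

sentence = ["the", "cat", "sat", "on", "the", "mat", "."]
X = np.stack([one_hot(w) for w in sentence])   # shape (7, 10): sparse and high-dimensional
print(X)
```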
Word embedding: learn low-dimensional embedding from data
Figure 6.1 and Figure 6.3 in Deep learning with python by Francois Chollet
Word embedding: learn low-dimensional embedding from data
Figure 6.4 in Deep learning with python by Francois Chollet
Embedding layer:
• Create a weight matrix of d (emb_dim) rows and N (vocabulary_size) columns
• Return the ith column as the feature vector for the ith word
• Learn word embeddings jointly with the main task, or load pretrained word embeddings
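A minimal sketch of the lookup behind an embedding layer (the matrix here is random as a stand-in; in practice it is learned with the main task or loaded from pretrained embeddings):

```python
import numpy as np

# Embedding layer as a lookup table: a d x N weight matrix whose i-th column
# is the feature vector of the i-th word (random stand-in values).
vocabulary_size, emb_dim = 10, 4
rng = np.random.default_rng(0)
E = rng.normal(size=(emb_dim, vocabulary_size))   # d rows, N columns

word_indices = [0, 1, 2]            # indices of the words in the vocabulary
vectors = E[:, word_indices].T      # one emb_dim-dimensional vector per word
print(vectors.shape)                # (3, 4)
```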
Example
[Figure: worked example of temporal convolution for text — each word is mapped to an embedding vector, a kernel slides over the time-steps, and at each position the element-wise multiplication and sum gives one output value; with 2 kernels, each time-step of the output has 2 values (one per kernel).]
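Putting the pieces together, a hedged Keras-style sketch of a temporal-convolution classifier for review text; the vocabulary size, embedding dimension, sequence length, number of filters and window size are illustrative choices, not values from the lecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch: word embedding -> temporal convolution -> pooling -> like/dislike.
vocab_size, emb_dim, max_len = 10000, 32, 200     # illustrative sizes

model = keras.Sequential([
    keras.Input(shape=(max_len,)),                                # word indices
    layers.Embedding(vocab_size, emb_dim),                        # word embeddings
    layers.Conv1D(filters=16, kernel_size=3, activation="relu"),  # temporal convolution, K = 3
    layers.GlobalMaxPooling1D(),                                  # collapse the time dimension
    layers.Dense(1, activation="sigmoid"),                        # like / dislike
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```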
A special version: Dilated causal convolution
Bai, Shaojie, J. Zico Kolter, and Vladlen Koltun. "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling." arXiv preprint arXiv:1803.01271 (2018).
Dilated: expand the alignment of the kernel weights by the dilation factor.
Dilation factor (d): the step between every two adjacent filter taps. A larger dilation factor places the weights further apart (i.e., more sparsely), so the convolution covers a longer history with the same kernel size.
Dilated causal convolution
Causal: the output at time-step t does not depend on input information after time-step t (useful for, e.g., real-time recognition and prediction).
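In Keras, for instance, both properties can be requested on a Conv1D layer; a sketch with illustrative sizes:

```python
from tensorflow.keras import layers

# Dilated causal temporal convolution (illustrative sizes).
# padding="causal" left-pads the input so the output at time t never sees inputs after t;
# dilation_rate=2 places the kernel taps two steps apart, widening the receptive field.
conv = layers.Conv1D(filters=16, kernel_size=3,
                     padding="causal", dilation_rate=2)
```

Stacking such layers with exponentially growing dilation factors (1, 2, 4, …) is how the TCN of Bai et al. covers a long history with few layers.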
Networks with loops
[Figure: an RNN with a recurrent connection is equivalent to the same RNN cell unrolled over time — at each step t = 1, 2, 3, …, K it takes input x_t and state s_t and produces output o_t, which becomes the state of the next step.]
Process the sequence step by step.
Recurrent connection: the output hidden feature vector (state) of each step is connected to the next step.
RNN: Process sequence step by step
At each step t: combine the current time-step x_t (the feature vector of the input sequence at time-step t) with the historical information, i.e., the state s_t (a feature vector), to generate the output o_t:
o_t = activation(W · x_t + U · s_t + b)
s_t = o_{t−1}
s_1: zero feature vector
W, U and b are shared over time.
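A minimal NumPy sketch of this recurrence (dimensions and the random weights are stand-ins for illustration):

```python
import numpy as np

# Simple-RNN forward pass following the recurrence above (illustrative sizes).
rng = np.random.default_rng(0)
d_in, d_out, T = 3, 4, 5

X = rng.normal(size=(T, d_in))            # input sequence x_1 .. x_T
W = rng.normal(size=(d_out, d_in))        # weights shared over time
U = rng.normal(size=(d_out, d_out))
b = np.zeros(d_out)

s = np.zeros(d_out)                       # s_1: zero feature vector
outputs = []
for x_t in X:
    o_t = np.tanh(W @ x_t + U @ s + b)    # o_t = activation(W·x_t + U·s_t + b)
    outputs.append(o_t)
    s = o_t                               # s_{t+1} = o_t
print(np.stack(outputs).shape)            # (5, 4): one output per time-step
```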
A simple RNN
Figure 6.13 in Deep learning with python by Francois Chollet
Example: RNN for sentiment analysis
Input: a review (a piece of text), e.g. "It works well"
Output: 1 (like) or 0 (dislike)
[Figure: the words "It", "works", "well" are mapped to vectors by a word embedding; an RNN processes them step by step with states s_1, s_2, s_3; the final output o_3 goes through an FC layer and a sigmoid to give z and y, trained with the binary cross-entropy loss L.]
o_t = activation(W · x_t + U · s_t + b)
s_t = o_{t−1}
s_1: zero feature vector
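The same pipeline as a hedged Keras sketch (vocabulary size, embedding and state dimensions, and sequence length are illustrative choices):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch: word embedding -> RNN -> FC layer + sigmoid, binary cross-entropy loss.
vocab_size, emb_dim, state_dim, max_len = 10000, 32, 16, 100   # illustrative sizes

model = keras.Sequential([
    keras.Input(shape=(max_len,)),            # word indices of the review
    layers.Embedding(vocab_size, emb_dim),    # word embedding
    layers.SimpleRNN(state_dim),              # returns the final output o_T
    layers.Dense(1, activation="sigmoid"),    # like (1) / dislike (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```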
Recap of Chain rule
Given z = g(u) and u = f(x):
dz/dx = (dz/du) · (du/dx)
Example: z = sin(x²), i.e. u = x² and z = sin(u)
dz/du = cos(u)
dz/dx = (dz/du) · (du/dx) = 2x cos(u)
Recap of Multi-layer perceptron: Function composition
Forward prediction: x → r → u → s → z, where f denotes the activation functions:
r_j = Σ_{i=0}^{m} x_i v_ij
u_j = f(r_j) = sigmoid(r_j) = 1 / (1 + e^{−r_j})
s = Σ_{j=0}^{p} w_j u_j
z = f(s) = sigmoid(s) = 1 / (1 + e^{−s})
[Figure: the network x_1 … x_m → hidden units u_1 … u_p → output z, with loss L.]
Recap of Multi-layer perceptron: Chain rule
Forward: x → r → u → s → z, with r_j = Σ_{i=0}^{m} x_i v_ij, u_j = sigmoid(r_j) = 1 / (1 + e^{−r_j}), s = Σ_{j=0}^{p} u_j w_j, z = f(s), and loss L.
Backward propagation:
∂L/∂v_ij = (∂L/∂z) · (∂z/∂s) · (∂s/∂u_j) · (∂u_j/∂r_j) · (∂r_j/∂v_ij)
[Figure: the same network, with the gradient flowing backwards from the loss L to the weight v_ij.]
BackPropagation Through Time
[Figure: the sentiment-analysis RNN unrolled over "It", "works", "well", with the gradient of the loss L flowing back through y, z and the outputs o_3, o_2, o_1.]
o_t = activation(W · x_t + U · s_t + b), with s_t = o_{t−1}
The gradient with respect to the shared weights W sums the contribution of every time-step:
∂L/∂W = (∂L/∂y)(∂y/∂z)(∂z/∂o_3)(∂o_3/∂W)
      + (∂L/∂y)(∂y/∂z)(∂z/∂o_3)(∂o_3/∂o_2)(∂o_2/∂W)
      + (∂L/∂y)(∂y/∂z)(∂z/∂o_3)(∂o_3/∂o_2)(∂o_2/∂o_1)(∂o_1/∂W)
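To see the sum at work, here is a toy scalar check (a simplified model with a squared-error loss on o_3 in place of the FC + sigmoid + cross-entropy head; all values are made up) comparing the summed chain-rule gradient with a numerical estimate:

```python
import numpy as np

# Toy BPTT check with scalar weights: o_t = tanh(W·x_t + U·s_t + b), s_{t+1} = o_t,
# L = 0.5 (o_3 - target)^2. All values are made up for illustration.
x = np.array([0.5, -1.0, 2.0])        # inputs x_1, x_2, x_3
W, U, b, target = 0.3, 0.8, 0.1, 1.0

def forward(W):
    s, outs = 0.0, []
    for xt in x:
        o = np.tanh(W * xt + U * s + b)
        outs.append(o)
        s = o                          # the output becomes the next state
    return outs

o1, o2, o3 = forward(W)
dL_do3 = o3 - target
d1, d2, d3 = (1 - o1**2), (1 - o2**2), (1 - o3**2)   # tanh derivatives

# Sum of the three chain-rule paths shown on the slide:
dL_dW = dL_do3 * (d3 * x[2]
                  + d3 * U * d2 * x[1]
                  + d3 * U * d2 * U * d1 * x[0])

# Numerical check: the two printed values should agree.
eps = 1e-6
Lp = 0.5 * (forward(W + eps)[-1] - target) ** 2
Lm = 0.5 * (forward(W - eps)[-1] - target) ** 2
print(dL_dW, (Lp - Lm) / (2 * eps))
```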
A special RNN: Long short-term memory (LSTM)
Figure 6.14 in Deep learning with python by Francois Chollet
Figure 6.15 in Deep learning with python by Francois Chollet
Input gate: i_t = sigmoid(W_i · x_t + U_i · s_t + b_i)
Forget gate: f_t = sigmoid(W_f · x_t + U_f · s_t + b_f)
Candidate carry: c̃_{t+1} = tanh(W_c · x_t + U_c · s_t + b_c)
Carry update: c_{t+1} = c_t * f_t + c̃_{t+1} * i_t
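A NumPy sketch of one step of these gate equations (dimensions and the random weights are stand-ins; the output/state update that a full LSTM also performs is omitted here, as in the equations above):

```python
import numpy as np

# One step of the LSTM carry update above (illustrative sizes and random weights).
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
d_in, d_state = 3, 4
x_t = rng.normal(size=d_in)          # input at time-step t
s_t = np.zeros(d_state)              # state from the previous step
c_t = np.zeros(d_state)              # carry (cell) state

# One weight matrix / bias per gate, shared over time.
Wi, Ui, bi = rng.normal(size=(d_state, d_in)), rng.normal(size=(d_state, d_state)), np.zeros(d_state)
Wf, Uf, bf = rng.normal(size=(d_state, d_in)), rng.normal(size=(d_state, d_state)), np.zeros(d_state)
Wc, Uc, bc = rng.normal(size=(d_state, d_in)), rng.normal(size=(d_state, d_state)), np.zeros(d_state)

i_t = sigmoid(Wi @ x_t + Ui @ s_t + bi)       # input gate
f_t = sigmoid(Wf @ x_t + Uf @ s_t + bf)       # forget gate
c_tilde = np.tanh(Wc @ x_t + Uc @ s_t + bc)   # candidate carry
c_next = c_t * f_t + c_tilde * i_t            # keep what f_t allows, add what i_t allows
print(c_next)
```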
Summary
• How does temporal convolution work?
• How does dilated causal convolution work?
• How does a recurrent neural network work?
• How does an LSTM work?