
13_tcn_rnn

Qiuhong Ke

Temporal Convolutional Network
& Recurrent Neural Network
COMP90051 Statistical Machine Learning

Copyright: University of Melbourne

Sit down or stand up?
We need to learn the temporal information to understand the action.

Temporal information: the temporal evolution (changes) of the human pose.

Temporal information does matter


Change the order: different actions

What is the next word?

This game is really fun

This cookie is really tasty

Like? Dislike? Sentiment analysis

Amazon product data

I bought this charger in Jul 2003 and it worked OK for a while. The design is nice and convenient.

However, after about a year, the batteries would not hold a charge. Might as well just get alkaline
disposables, or look elsewhere for a charger that comes with batteries that have better staying
power.

http://jmcauley.ucsd.edu/data/amazon/

Outline

• Temporal convolutional network (TCN)

• Recurrent neural network (RNN)


1D Convolution

A kernel (filter) w = (w0, w1, w2) is slid over the input as a sliding window:

y(i) = Σ_{j=0}^{K−1} a(i + j) ⋅ w(j),  i.e.,  y = a ∗ w

[Figure: at each window position the kernel weights ×w0, ×w1, ×w2 multiply adjacent inputs from a1 … a7 and are summed (Σ), producing the outputs y1 … y5]

a is the input vector

y is the output vector (feature map)
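A minimal NumPy sketch of this sliding-window computation (the input and kernel values below are made up for illustration):

```python
import numpy as np

def conv1d(a, w):
    """'Valid' 1D convolution (cross-correlation): slide kernel w over input a."""
    N, K = len(a), len(w)
    y = np.zeros(N - K + 1)
    for i in range(N - K + 1):
        y[i] = np.sum(a[i:i + K] * w)   # element-wise multiply and sum
    return y

a = np.array([1., 2., 3., 4., 5., 6., 7.])   # 7 time-steps, 1 feature each
w = np.array([0.5, 1.0, -0.5])               # kernel (filter) of size K = 3
print(conv1d(a, w))                          # 5 outputs: the feature map
```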


2D Convolution? Next lecture

Figure 9.1 in Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville

Temporal convolution: 1D convolution (on the temporal dimension)

[Figure: the same sliding-window convolution, applied along the temporal dimension: kernel weights ×w0, ×w1, ×w2 over inputs a1 … a7, summed (Σ) to give outputs y1 … y5]

a: 7 time-steps of inputs, each with 1 dimension (one feature per step)

y: output sequence

A 1D input feature vector is a special case of sequence data: a sequence of time-steps with one feature in each step.

Temporal convolution on a sequence of feature vectors

Input:
• a sequence of N = 4 time-steps
• each time-step is a feature vector (dimension: d = 2, i.e., 2 features)

Kernel: a weight matrix (d × K)
• one dimension is the same as d
• the other dimension is the size of the temporal convolutional window, K
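As a sketch of this computation in NumPy (the input values and the window size K = 2 are made up; the kernel is stored here as a (K, d) array, which holds the same weights as the d × K matrix described above):

```python
import numpy as np

def temporal_conv(X, W):
    """X: (N, d) sequence of N time-steps with d features each.
    W: (K, d) kernel covering a window of K time-steps across all d features.
    Returns N - K + 1 output values."""
    N, d = X.shape
    K = W.shape[0]
    return np.array([np.sum(X[t:t + K] * W) for t in range(N - K + 1)])

X = np.array([[0.1, 0.2],      # N = 4 time-steps, d = 2 features per step
              [0.3, 0.4],
              [0.5, 0.6],
              [0.7, 0.8]])
W = np.random.randn(2, 2)      # temporal window K = 2, feature dimension d = 2
print(temporal_conv(X, W))     # 3 output values
```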

Temporal convolution for text

https://www.youtube.com/watch?time_continue=227&v=0N3PsjfXW9g&feature=emb_logo

How to represent words at each time-step before temporal convolution?

I bought this charger in Jul 2003 and it worked OK for a while. The design is nice and convenient.

However, after about a year, the batteries would not hold a charge. Might as well just get alkaline
disposables, or look elsewhere for a charger that comes with batteries that have better staying
power.

• One-hot encoding
• Word embedding

One-hot encoding

• Create a feature vector

• Dimension of the vector == size of vocabulary

• If the word is the ith word in the vocabulary, then the ith element of the vector is 1 and all the others are 0

• Sparse and high-dimensional

dic = {"the", "cat", "sat", "on", "mat", ".", "these", "are", "other", "words"}
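A minimal sketch of one-hot encoding with the vocabulary above (the helper name one_hot is made up for this illustration):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", ".",
         "these", "are", "other", "words"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector of length len(vocab) with a 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word_to_index[word]] = 1.0
    return v

print(one_hot("cat"))   # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
```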

Word embedding: learn low-dimensional embeddings from data

Figures 6.1, 6.3 and 6.4 in Deep Learning with Python by Francois Chollet

Embedding layer:

• Create a weight matrix of d (emb_dim) rows and N (vocabulary_size) columns

• Return the ith column as the feature vector for the ith word

• Learn the word embeddings jointly with the main task, or load pretrained word embeddings
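A minimal sketch of such an embedding lookup (the weight matrix here is random; in practice it is learned with the main task or loaded from pretrained embeddings):

```python
import numpy as np

vocab_size = 10     # N: size of the vocabulary
emb_dim = 4         # d: embedding dimension

# Weight matrix with d rows and N columns: one column per word in the vocabulary
E = np.random.randn(emb_dim, vocab_size)

def embed(word_index):
    """Return the column of E corresponding to the given word."""
    return E[:, word_index]

print(embed(3))     # 4-dimensional feature vector for word 3
```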

Example

[Figure: the word-embedded input is convolved with one kernel; at each window position the input and the kernel are multiplied element-wise and summed, giving one output value at a time (e.g., 0.2, then 0.7)]

Example

[Figure: the same word-embedded input is convolved with 2 kernels; each kernel produces its own output sequence (e.g., values such as 0.0, 0.5, 0.4 and -0.4)]

A special version: Dilated causal convolution

Bai, Shaojie, J. Zico Kolter, and Vladlen Koltun. "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling." arXiv preprint arXiv:1803.01271 (2018).

Dilated: expand the span of the kernel weights by the dilation factor.

Dilation factor (d): the step between every two adjacent filter taps. A larger dilation factor places the weights farther apart (i.e., more sparsely), so the kernel covers a longer range of the input.

Causal: the output at time-step t does not depend on input information after time-step t (useful e.g. for real-time recognition and prediction).
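A NumPy sketch of a dilated causal convolution under these definitions (the input, kernel and dilation values are made up; out-of-range taps act as zero-padding on the left):

```python
import numpy as np

def dilated_causal_conv(a, w, dilation=1):
    """Causal 1D convolution: y[t] depends only on a[t] and earlier inputs.
    Adjacent kernel taps are placed `dilation` steps apart."""
    N, K = len(a), len(w)
    y = np.zeros(N)
    for t in range(N):
        for j in range(K):
            idx = t - j * dilation           # look back j * dilation steps
            if idx >= 0:                     # taps before the start act as zero-padding
                y[t] += a[idx] * w[j]
    return y

a = np.arange(1., 9.)                        # 8 time-steps
w = np.array([0.5, 0.3, 0.2])                # K = 3 filter taps
print(dilated_causal_conv(a, w, dilation=1))
print(dilated_causal_conv(a, w, dilation=2)) # larger dilation: taps placed farther apart
```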


Networks with loops

[Figure: an RNN with a recurrent connection from its output o back to its input x, unrolled over time-steps t = 1, 2, 3, …, K; at step t the RNN takes the input xt and the state st and produces the output ot, which is passed on as the next state]

Process the sequence step by step.
Recurrent connection: the output hidden feature vector (state) of each step is connected to the next step.

A simple RNN

Process the sequence step by step. At each step t: combine the current time-step xt (the feature vector of the input sequence at time-step t) with the historical information, i.e., the state st (a feature vector), to generate the output ot:

ot = activation(W ⋅ xt + U ⋅ st + b)

st = ot−1

s1 : zero feature vector

The weights W, U and b are shared over time.

Figure 6.13 in Deep Learning with Python by Francois Chollet
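A minimal NumPy sketch of this recurrence (tanh is assumed as the activation; all sizes are arbitrary):

```python
import numpy as np

def simple_rnn_forward(X, W, U, b):
    """X: (T, input_dim) input sequence. Returns the T outputs o1 … oT."""
    T = X.shape[0]
    s = np.zeros(b.shape[0])                  # s1: zero feature vector
    outputs = []
    for t in range(T):
        o = np.tanh(W @ X[t] + U @ s + b)     # ot = activation(W·xt + U·st + b)
        outputs.append(o)
        s = o                                 # st+1 = ot
    return np.stack(outputs)

X = np.random.randn(5, 3)     # 5 time-steps, 3 input features each
W = np.random.randn(4, 3)     # input-to-hidden weights (shared over time)
U = np.random.randn(4, 4)     # state-to-hidden weights (shared over time)
b = np.zeros(4)
print(simple_rnn_forward(X, W, U, b).shape)   # (5, 4)
```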

Example: RNN for sentiment analysis

Input: a review (a piece of text), e.g., "It works well"
Output: 1 (like) or 0 (dislike)

[Figure: each word ("It", "works", "well") is mapped to a feature vector xt by a word-embedding layer; the RNN processes the vectors step by step (states s1, s2, s3); the last output o3 goes through a fully connected (FC) layer to give z, then a sigmoid to give the score y, and the loss L is the binary cross-entropy]

ot = activation(W ⋅ xt + U ⋅ st + b)

s1 : zero feature vector
st = ot−1
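A hedged Keras sketch of such a model (the vocabulary size and the layer widths below are made up; SimpleRNN returns only its last output, which then goes through the fully connected sigmoid layer):

```python
import tensorflow as tf

vocab_size = 10000    # assumed vocabulary size
emb_dim = 32          # assumed embedding dimension

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, emb_dim),   # word embedding
    tf.keras.layers.SimpleRNN(32),                    # RNN; outputs the last state o_T
    tf.keras.layers.Dense(1, activation="sigmoid"),   # FC layer + sigmoid -> y
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",             # binary cross-entropy loss L
              metrics=["accuracy"])
model.summary()
```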

Recap of Chain rule

Given z = g(u), u = f(x):

dz/dx = (dz/du) (du/dx)

Example: z = sin(x²)

u = x², z = sin(u)

dz/du = cos(u)

dz/dx = (dz/du) (du/dx) = 2x cos(u)
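A quick numerical sanity check of this chain-rule result using a finite difference (the evaluation point and step size are arbitrary):

```python
import numpy as np

z = lambda x: np.sin(x ** 2)

x, h = 1.3, 1e-6
numerical = (z(x + h) - z(x - h)) / (2 * h)   # central finite difference
analytical = 2 * x * np.cos(x ** 2)           # chain rule: dz/dx = 2x cos(u), u = x^2
print(numerical, analytical)                  # the two values should agree closely
```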

Recap of Multi-layer perceptron: Function composition

x → r → u → s → z

Forward prediction (f: activation functions):

rj = Σ_{i=0}^{m} xi vij

uj = f(rj) = sigmoid(rj) = 1 / (1 + e−rj)

s = Σ_{j=0}^{p} wj uj

z = f(s) = sigmoid(s) = 1 / (1 + e−s)

[Figure: a multi-layer perceptron with inputs xi, hidden units uj, output z and loss L]

Recap of Multi-layer perceptron: Chain rule

x → r → u → s → z

Forward:

rj = Σ_{i=0}^{m} xi vij,  uj = sigmoid(rj) = 1 / (1 + e−rj),  s = Σ_{j=0}^{p} uj wj,  z = f(s)

Backward propagation:

∂L/∂vij = (∂L/∂z) (∂z/∂s) (∂s/∂uj) (∂uj/∂rj) (∂rj/∂vij)

[Figure: the same network, with the gradient propagated backward from the loss L through z, s, uj and rj to the weight vij]
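A small NumPy sketch of this forward pass and of the chain-rule gradient for vij (bias terms are omitted and a squared-error loss L = 0.5 (z − t)² is assumed purely for this illustration):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = np.array([1.0, 0.5, -0.3])     # inputs x_i (m = 3 features)
V = np.random.randn(3, 4)          # weights v_ij: inputs -> hidden
w = np.random.randn(4)             # weights w_j: hidden -> output
t = 1.0                            # target for the assumed loss L = 0.5 (z - t)^2

# Forward: x -> r -> u -> s -> z
r = x @ V
u = sigmoid(r)
s = u @ w
z = sigmoid(s)

# Backward: dL/dv_ij = dL/dz * dz/ds * ds/du_j * du_j/dr_j * dr_j/dv_ij
dL_dz = z - t
dz_ds = z * (1 - z)                          # sigmoid'(s)
delta = dL_dz * dz_ds * w * u * (1 - u)      # one value per hidden unit j
dL_dV = np.outer(x, delta)                   # dr_j/dv_ij = x_i
print(dL_dV.shape)                           # (3, 4): one gradient per weight v_ij
```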

BackPropagation Through Time

[Figure: the sentiment-analysis RNN unrolled over "It", "works", "well"; the gradient of the loss L flows back through y, z and o3, and then through the recurrent connections to o2 and o1]

ot = activation(W ⋅ xt + U ⋅ st + b)

Because W is shared over time, its gradient sums the contributions of every time-step:

∂L/∂W = (∂L/∂y) (∂y/∂z) (∂z/∂o3) (∂o3/∂W)
      + (∂L/∂y) (∂y/∂z) (∂z/∂o3) (∂o3/∂o2) (∂o2/∂W)
      + (∂L/∂y) (∂y/∂z) (∂z/∂o3) (∂o3/∂o2) (∂o2/∂o1) (∂o1/∂W)
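A scalar NumPy sketch of this sum over time-steps, checked against a finite difference (the FC layer and sigmoid are folded away and tanh is assumed as the activation, purely to keep the illustration short; all numbers are made up):

```python
import numpy as np

x = np.array([0.3, -0.5, 0.8])     # 3 scalar time-steps
W, U, b, target = 0.7, 0.4, 0.1, 1.0

def forward(W):
    """o_t = tanh(W*x_t + U*s_t + b), s_{t+1} = o_t, s_1 = 0."""
    s, states = 0.0, []
    for t in range(3):
        o = np.tanh(W * x[t] + U * s + b)
        states.append((s, o))
        s = o
    return states

states = forward(W)
chain = states[-1][1] - target     # dL/do3 for the assumed loss L = 0.5 (o3 - target)^2

# BPTT: add up each time-step's contribution to dL/dW
dL_dW = 0.0
for t in reversed(range(3)):
    s_t, o_t = states[t]
    do_da = 1 - o_t ** 2                # tanh'
    dL_dW += chain * do_da * x[t]       # direct dependence of o_t on W
    chain = chain * do_da * U           # propagate back through s_t = o_{t-1}

# Finite-difference check of dL/dW
L = lambda W: 0.5 * (forward(W)[-1][1] - target) ** 2
h = 1e-6
print(dL_dW, (L(W + h) - L(W - h)) / (2 * h))   # should agree closely
```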

A special RNN: Long short-term memory (LSTM)

Figures 6.14 and 6.15 in Deep Learning with Python by Francois Chollet

Input gate: it = sigmoid(Wi ⋅ xt + Ui ⋅ st + bi)

Forget gate: ft = sigmoid(Wf ⋅ xt + Uf ⋅ st + bf)

Candidate cell state: c̃t+1 = tanh(Wc ⋅ xt + Uc ⋅ st + bc)

Cell state update: ct+1 = ct * ft + c̃t+1 * it
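A NumPy sketch of one step of this cell, following the equations above (the sizes are arbitrary, and the output/state computation that is not shown in the equations above is likewise not shown here):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

d_in, d_h = 3, 4                                   # arbitrary input / hidden sizes

def rand_params():
    # One (W, U, b) triple: input weights, recurrent weights, bias
    return np.random.randn(d_h, d_in), np.random.randn(d_h, d_h), np.zeros(d_h)

(Wi, Ui, bi), (Wf, Uf, bf), (Wc, Uc, bc) = rand_params(), rand_params(), rand_params()

def lstm_cell_step(x_t, s_t, c_t):
    """Return the next cell state c_{t+1} given input x_t, state s_t and cell c_t."""
    i_t = sigmoid(Wi @ x_t + Ui @ s_t + bi)        # input gate
    f_t = sigmoid(Wf @ x_t + Uf @ s_t + bf)        # forget gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ s_t + bc)    # candidate cell state
    return c_t * f_t + c_tilde * i_t               # new cell state c_{t+1}

x_t, s_t, c_t = np.random.randn(d_in), np.zeros(d_h), np.zeros(d_h)
print(lstm_cell_step(x_t, s_t, c_t).shape)         # (4,)
```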

Summary

• How does temporal convolution work?

• How does dilated causal convolution work?

• How does a recurrent neural network work?

• How does an LSTM work?