
13_tcn_rnn

Qiuhong Ke

Temporal Convolutional Network
& Recurrent Neural Network
COMP90051 Statistical Machine Learning

Copyright: University of Melbourne

Sit down or stand up?
We need to learn the temporal information to understand the action.

Temporal information: the temporal evolution (changes) of the human pose.

Temporal information does matter


Change the order: different actions

What is the next word?

This game is really fun

This cookie is really tasty

Like? Dislike? Sentiment analysis

Amazon product data

I bought this charger in Jul 2003 and it worked OK for a while. The design is nice and convenient.

However, after about a year, the batteries would not hold a charge. Might as well just get alkaline
disposables, or look elsewhere for a charger that comes with batteries that have better staying
power.

http://jmcauley.ucsd.edu/data/amazon/

Outline

• Temporal convolutional network (TCN)

• Recurrent neural network (RNN)


1D Convolution

A kernel (filter) w = (w0, w1, w2) is slid over the input as a sliding window:

y(i) = Σ_{j=0}^{K−1} a(i + j) ⋅ w(j),  i.e.,  y = a ∗ w

[Figure: at each window position the kernel weights ×w0, ×w1, ×w2 multiply adjacent inputs from a1 … a7 and are summed (Σ), producing the outputs y1 … y5]

a is the input vector

y is the output vector (feature map)
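A minimal NumPy sketch of this sliding-window computation (the input and kernel values below are made up for illustration):

```python
import numpy as np

def conv1d(a, w):
    """'Valid' 1D convolution (cross-correlation): slide kernel w over input a."""
    N, K = len(a), len(w)
    y = np.zeros(N - K + 1)
    for i in range(N - K + 1):
        y[i] = np.sum(a[i:i + K] * w)   # element-wise multiply and sum
    return y

a = np.array([1., 2., 3., 4., 5., 6., 7.])   # 7 time-steps, 1 feature each
w = np.array([0.5, 1.0, -0.5])               # kernel (filter) of size K = 3
print(conv1d(a, w))                          # 5 outputs: the feature map
```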


2D Convolution? Next lecture

Figure 9.1 in Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville

Temporal convolution: 1D convolution (on the temporal dimension)

[Figure: the same sliding-window convolution, applied along the temporal dimension: kernel weights ×w0, ×w1, ×w2 over inputs a1 … a7, summed (Σ) to give outputs y1 … y5]

a: 7 time-steps of inputs, each with 1 dimension (one feature per step)

y: output sequence

A 1D input feature vector is a special case of sequence data: a sequence of time-steps with one feature in each step.

Temporal convolution on a sequence of feature vectors

Input:
• a sequence of N = 4 time-steps
• each time-step is a feature vector (dimension: d = 2, i.e., 2 features)

Kernel: a weight matrix (d × K)
• one dimension is the same as d
• the other dimension is the size of the temporal convolutional window, K
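As a sketch of this computation in NumPy (the input values and the window size K = 2 are made up; the kernel is stored here as a (K, d) array, which holds the same weights as the d × K matrix described above):

```python
import numpy as np

def temporal_conv(X, W):
    """X: (N, d) sequence of N time-steps with d features each.
    W: (K, d) kernel covering a window of K time-steps across all d features.
    Returns N - K + 1 output values."""
    N, d = X.shape
    K = W.shape[0]
    return np.array([np.sum(X[t:t + K] * W) for t in range(N - K + 1)])

X = np.array([[0.1, 0.2],      # N = 4 time-steps, d = 2 features per step
              [0.3, 0.4],
              [0.5, 0.6],
              [0.7, 0.8]])
W = np.random.randn(2, 2)      # temporal window K = 2, feature dimension d = 2
print(temporal_conv(X, W))     # 3 output values
```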

Temporal convolution for text

https://www.youtube.com/watch?time_continue=227&v=0N3PsjfXW9g&feature=emb_logo

How to represent words at each time-step before temporal convolution?

I bought this charger in Jul 2003 and it worked OK for a while. The design is nice and convenient.

However, after about a year, the batteries would not hold a charge. Might as well just get alkaline
disposables, or look elsewhere for a charger that comes with batteries that have better staying
power.

• One-hot encoding
• Word embedding

One-hot encoding

• Create a feature vector

• Dimension of the vector == size of vocabulary

• If the word is the ith word in the vocabulary, then the ith element of the vector is 1 and all the others are 0

• Sparse and high-dimensional

dic = {"the", "cat", "sat", "on", "mat", ".", "these", "are", "other", "words"}
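A minimal sketch of one-hot encoding with the vocabulary above (the helper name one_hot is made up for this illustration):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", ".",
         "these", "are", "other", "words"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector of length len(vocab) with a 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word_to_index[word]] = 1.0
    return v

print(one_hot("cat"))   # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
```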

Word embedding: learn low-dimensional embeddings from data

Figures 6.1, 6.3 and 6.4 in Deep Learning with Python by Francois Chollet

Embedding layer:

• Create a weight matrix of d (emb_dim) rows and N (vocabulary_size) columns

• Return the ith column as the feature vector for the ith word

• Learn the word embeddings jointly with the main task, or load pretrained word embeddings
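A minimal sketch of such an embedding lookup (the weight matrix here is random; in practice it is learned with the main task or loaded from pretrained embeddings):

```python
import numpy as np

vocab_size = 10     # N: size of the vocabulary
emb_dim = 4         # d: embedding dimension

# Weight matrix with d rows and N columns: one column per word in the vocabulary
E = np.random.randn(emb_dim, vocab_size)

def embed(word_index):
    """Return the column of E corresponding to the given word."""
    return E[:, word_index]

print(embed(3))     # 4-dimensional feature vector for word 3
```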

Example

[Figure: the word-embedded input is convolved with one kernel; at each window position the input and the kernel are multiplied element-wise and summed, giving one output value at a time (e.g., 0.2, then 0.7)]

Example

[Figure: the same word-embedded input is convolved with 2 kernels; each kernel produces its own output sequence (e.g., values such as 0.0, 0.5, 0.4 and -0.4)]

A special version: Dilated causal convolution

Bai, Shaojie, J. Zico Kolter, and Vladlen Koltun. "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling." arXiv preprint arXiv:1803.01271 (2018).

Dilated: expand the span of the kernel weights by the dilation factor.

Dilation factor (d): the step between every two adjacent filter taps. A larger dilation factor places the weights farther apart (i.e., more sparsely), so the kernel covers a longer range of the input.

Causal: the output at time-step t does not depend on input information after time-step t (useful e.g. for real-time recognition and prediction).
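A NumPy sketch of a dilated causal convolution under these definitions (the input, kernel and dilation values are made up; out-of-range taps act as zero-padding on the left):

```python
import numpy as np

def dilated_causal_conv(a, w, dilation=1):
    """Causal 1D convolution: y[t] depends only on a[t] and earlier inputs.
    Adjacent kernel taps are placed `dilation` steps apart."""
    N, K = len(a), len(w)
    y = np.zeros(N)
    for t in range(N):
        for j in range(K):
            idx = t - j * dilation           # look back j * dilation steps
            if idx >= 0:                     # taps before the start act as zero-padding
                y[t] += a[idx] * w[j]
    return y

a = np.arange(1., 9.)                        # 8 time-steps
w = np.array([0.5, 0.3, 0.2])                # K = 3 filter taps
print(dilated_causal_conv(a, w, dilation=1))
print(dilated_causal_conv(a, w, dilation=2)) # larger dilation: taps placed farther apart
```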


Networks with loops

[Figure: an RNN with a recurrent connection from its output o back to its input x, unrolled over time-steps t = 1, 2, 3, …, K; at step t the RNN takes the input xt and the state st and produces the output ot, which is passed on as the next state]

Process the sequence step by step.
Recurrent connection: the output hidden feature vector (state) of each step is connected to the next step.

A simple RNN

Process the sequence step by step. At each step t: combine the current time-step xt (the feature vector of the input sequence at time-step t) with the historical information, i.e., the state st (a feature vector), to generate the output ot:

ot = activation(W ⋅ xt + U ⋅ st + b)

st = ot−1

s1 : zero feature vector

The weights W, U and b are shared over time.

Figure 6.13 in Deep Learning with Python by Francois Chollet
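A minimal NumPy sketch of this recurrence (tanh is assumed as the activation; all sizes are arbitrary):

```python
import numpy as np

def simple_rnn_forward(X, W, U, b):
    """X: (T, input_dim) input sequence. Returns the T outputs o1 … oT."""
    T = X.shape[0]
    s = np.zeros(b.shape[0])                  # s1: zero feature vector
    outputs = []
    for t in range(T):
        o = np.tanh(W @ X[t] + U @ s + b)     # ot = activation(W·xt + U·st + b)
        outputs.append(o)
        s = o                                 # st+1 = ot
    return np.stack(outputs)

X = np.random.randn(5, 3)     # 5 time-steps, 3 input features each
W = np.random.randn(4, 3)     # input-to-hidden weights (shared over time)
U = np.random.randn(4, 4)     # state-to-hidden weights (shared over time)
b = np.zeros(4)
print(simple_rnn_forward(X, W, U, b).shape)   # (5, 4)
```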

Example: RNN for sentiment analysis

Input: a review (a piece of text), e.g., "It works well"
Output: 1 (like) or 0 (dislike)

[Figure: each word ("It", "works", "well") is mapped to a feature vector xt by a word-embedding layer; the RNN processes the vectors step by step (states s1, s2, s3); the last output o3 goes through a fully connected (FC) layer to give z, then a sigmoid to give the score y, and the loss L is the binary cross-entropy]

ot = activation(W ⋅ xt + U ⋅ st + b)

s1 : zero feature vector
st = ot−1
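A hedged Keras sketch of such a model (the vocabulary size and the layer widths below are made up; SimpleRNN returns only its last output, which then goes through the fully connected sigmoid layer):

```python
import tensorflow as tf

vocab_size = 10000    # assumed vocabulary size
emb_dim = 32          # assumed embedding dimension

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, emb_dim),   # word embedding
    tf.keras.layers.SimpleRNN(32),                    # RNN; outputs the last state o_T
    tf.keras.layers.Dense(1, activation="sigmoid"),   # FC layer + sigmoid -> y
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",             # binary cross-entropy loss L
              metrics=["accuracy"])
model.summary()
```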

Recap of Chain rule

Given z = g(u), u = f(x):

dz/dx = (dz/du) (du/dx)

Example: z = sin(x²)

u = x², z = sin(u)

dz/du = cos(u)

dz/dx = (dz/du) (du/dx) = 2x cos(u)
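A quick numerical sanity check of this chain-rule result using a finite difference (the evaluation point and step size are arbitrary):

```python
import numpy as np

z = lambda x: np.sin(x ** 2)

x, h = 1.3, 1e-6
numerical = (z(x + h) - z(x - h)) / (2 * h)   # central finite difference
analytical = 2 * x * np.cos(x ** 2)           # chain rule: dz/dx = 2x cos(u), u = x^2
print(numerical, analytical)                  # the two values should agree closely
```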

Recap of Multi-layer perceptron: Function composition

x → r → u → s → z

Forward prediction (f: activation functions):

rj = Σ_{i=0}^{m} xi vij

uj = f(rj) = sigmoid(rj) = 1 / (1 + e−rj)

s = Σ_{j=0}^{p} wj uj

z = f(s) = sigmoid(s) = 1 / (1 + e−s)

[Figure: a multi-layer perceptron with inputs xi, hidden units uj, output z and loss L]

Recap of Multi-layer perceptron: Chain rule

x → r → u → s → z

Forward:

rj = Σ_{i=0}^{m} xi vij,  uj = sigmoid(rj) = 1 / (1 + e−rj),  s = Σ_{j=0}^{p} uj wj,  z = f(s)

Backward propagation:

∂L/∂vij = (∂L/∂z) (∂z/∂s) (∂s/∂uj) (∂uj/∂rj) (∂rj/∂vij)

[Figure: the same network, with the gradient propagated backward from the loss L through z, s, uj and rj to the weight vij]
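A small NumPy sketch of this forward pass and of the chain-rule gradient for vij (bias terms are omitted and a squared-error loss L = 0.5 (z − t)² is assumed purely for this illustration):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = np.array([1.0, 0.5, -0.3])     # inputs x_i (m = 3 features)
V = np.random.randn(3, 4)          # weights v_ij: inputs -> hidden
w = np.random.randn(4)             # weights w_j: hidden -> output
t = 1.0                            # target for the assumed loss L = 0.5 (z - t)^2

# Forward: x -> r -> u -> s -> z
r = x @ V
u = sigmoid(r)
s = u @ w
z = sigmoid(s)

# Backward: dL/dv_ij = dL/dz * dz/ds * ds/du_j * du_j/dr_j * dr_j/dv_ij
dL_dz = z - t
dz_ds = z * (1 - z)                          # sigmoid'(s)
delta = dL_dz * dz_ds * w * u * (1 - u)      # one value per hidden unit j
dL_dV = np.outer(x, delta)                   # dr_j/dv_ij = x_i
print(dL_dV.shape)                           # (3, 4): one gradient per weight v_ij
```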

BackPropagation Through Time

[Figure: the sentiment-analysis RNN unrolled over "It", "works", "well"; the gradient of the loss L flows back through y, z and o3, and then through the recurrent connections to o2 and o1]

ot = activation(W ⋅ xt + U ⋅ st + b)

Because W is shared over time, its gradient sums the contributions of every time-step:

∂L/∂W = (∂L/∂y) (∂y/∂z) (∂z/∂o3) (∂o3/∂W)
      + (∂L/∂y) (∂y/∂z) (∂z/∂o3) (∂o3/∂o2) (∂o2/∂W)
      + (∂L/∂y) (∂y/∂z) (∂z/∂o3) (∂o3/∂o2) (∂o2/∂o1) (∂o1/∂W)
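A scalar NumPy sketch of this sum over time-steps, checked against a finite difference (the FC layer and sigmoid are folded away and tanh is assumed as the activation, purely to keep the illustration short; all numbers are made up):

```python
import numpy as np

x = np.array([0.3, -0.5, 0.8])     # 3 scalar time-steps
W, U, b, target = 0.7, 0.4, 0.1, 1.0

def forward(W):
    """o_t = tanh(W*x_t + U*s_t + b), s_{t+1} = o_t, s_1 = 0."""
    s, states = 0.0, []
    for t in range(3):
        o = np.tanh(W * x[t] + U * s + b)
        states.append((s, o))
        s = o
    return states

states = forward(W)
chain = states[-1][1] - target     # dL/do3 for the assumed loss L = 0.5 (o3 - target)^2

# BPTT: add up each time-step's contribution to dL/dW
dL_dW = 0.0
for t in reversed(range(3)):
    s_t, o_t = states[t]
    do_da = 1 - o_t ** 2                # tanh'
    dL_dW += chain * do_da * x[t]       # direct dependence of o_t on W
    chain = chain * do_da * U           # propagate back through s_t = o_{t-1}

# Finite-difference check of dL/dW
L = lambda W: 0.5 * (forward(W)[-1][1] - target) ** 2
h = 1e-6
print(dL_dW, (L(W + h) - L(W - h)) / (2 * h))   # should agree closely
```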

A special RNN: Long short-term memory (LSTM)

Figures 6.14 and 6.15 in Deep Learning with Python by Francois Chollet

Input gate: it = sigmoid(Wi ⋅ xt + Ui ⋅ st + bi)

Forget gate: ft = sigmoid(Wf ⋅ xt + Uf ⋅ st + bf)

Candidate cell state: c̃t+1 = tanh(Wc ⋅ xt + Uc ⋅ st + bc)

Cell state update: ct+1 = ct * ft + c̃t+1 * it
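A NumPy sketch of one step of this cell, following the equations above (the sizes are arbitrary, and the output/state computation that is not shown in the equations above is likewise not shown here):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

d_in, d_h = 3, 4                                   # arbitrary input / hidden sizes

def rand_params():
    # One (W, U, b) triple: input weights, recurrent weights, bias
    return np.random.randn(d_h, d_in), np.random.randn(d_h, d_h), np.zeros(d_h)

(Wi, Ui, bi), (Wf, Uf, bf), (Wc, Uc, bc) = rand_params(), rand_params(), rand_params()

def lstm_cell_step(x_t, s_t, c_t):
    """Return the next cell state c_{t+1} given input x_t, state s_t and cell c_t."""
    i_t = sigmoid(Wi @ x_t + Ui @ s_t + bi)        # input gate
    f_t = sigmoid(Wf @ x_t + Uf @ s_t + bf)        # forget gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ s_t + bc)    # candidate cell state
    return c_t * f_t + c_tilde * i_t               # new cell state c_{t+1}

x_t, s_t, c_t = np.random.randn(d_in), np.zeros(d_h), np.zeros(d_h)
print(lstm_cell_step(x_t, s_t, c_t).shape)         # (4,)
```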

Summary

• How does temporal convolution work?

• How does dilated causal convolution work?

• How does a recurrent neural network work?

• How does an LSTM work?