
The University of Sydney Page 1

Convolutional
Neural Networks

Dr Chang Xu
School of Computer Science


History of CNNs

Neocognitron (Kunihiko Fukushima, 1980)


History of CNNs

LeNet-5 (LeCun et al, 1998)
– Built the modern framework of CNNs: convolutional layer, pooling layer, and fully-connected layer
– Trained with the backpropagation algorithm
– Classified handwritten digits. However, it could not perform well on more complex problems, e.g., large-scale image and video classification


History of CNNs

Linear classifier: 8% ~ 12% error
K-nearest-neighbor: 1.x% ~ 5% error
Support Vector Machine: 0.6% ~ 1.4% error
(Conventional) Neural Nets: 1% ~ 5% error

“The MNIST Database”


History of CNNs

AlexNet (Krizhevsky et al, 2012)
– Significant improvement on the image classification task, ImageNet 2012
– The network achieved a top-5 error of 15.3%, more than 10.8 percentage points ahead of the runner-up
– Basic framework of CNNs with a deeper structure
– Benefited from the ImageNet dataset, GPUs, ReLU, Dropout, …

5 convolutional layers and 3 fully connected layers


Today, CNNs are everywhere
– Image classification, image segmentation, pose estimation, style transfer, object detection, image captioning, …

(Krizhevsky et al, 2012) (Shaoli et al, 2017)
(Jianfeng et al, 2017) (Xinyuan et al, 2018)


Basic CNN Components


A general CNN
– Convolutional Layer
– Pooling
– Fully-connected Layer

(https://leonardoaraujosantos.gitbooks.io)


A toy example

https://github.com/pytorch/examples/blob/master/mnist/main.py
https://pytorch.org/docs/stable/nn.html#linear
https://pytorch.org/docs/stable/nn.html#convolution-layers


Convolution layers in PyTorch

https://pytorch.org/docs/stable/nn.html#convolution-layers
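As a minimal sketch of the linked `nn.Conv2d` API (assuming PyTorch is installed; the layer sizes are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

# One input channel, one 3x3 filter, stride 1, no padding, no bias:
# a 6x6 input shrinks to 4x4, following (W - F + 2P) / S + 1.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)

x = torch.randn(1, 1, 6, 6)   # (batch, channels, height, width)
y = conv(x)
print(y.shape)                # torch.Size([1, 1, 4, 4])
```

Note that `nn.Conv2d` actually computes a cross-correlation (no kernel flip), which is the convention used throughout this lecture.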


Convolutional Layer

– A simple example: take a grayscale image as input

Grayscale image: X    Filter: W

(Illustration: convolving X with W produces a feature map with entries x_{1,1}, …, x_{3,3}.)


Convolutional Layer

– Convolution: slide the filter over the input; at each position, multiply elementwise and sum to produce one entry of the feature map.

Input (6×6):
1 2 0 1 0 1
2 1 1 0 0 1
1 0 0 2 1 0
2 0 0 0 2 1
0 1 1 2 0 2
1 0 1 0 1 1

Filter (3×3):
1 0 -1
-1 0 0
0 0 1

Feature map (4×4):
-1 2 0 0
0 1 3 -2
0 0 -1 4
3 -1 -2 -2
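The sliding-window computation above can be reproduced in a few lines of NumPy (a sketch; `conv2d_valid` is my own helper, implementing the cross-correlation that CNNs use):

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' cross-correlation: the filter only visits positions
    where it fully overlaps the input."""
    f = w.shape[0]
    out = np.empty((x.shape[0] - f + 1, x.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the filter with the current patch, summed
            out[i, j] = np.sum(x[i:i + f, j:j + f] * w)
    return out

x = np.array([[1, 2, 0, 1, 0, 1],
              [2, 1, 1, 0, 0, 1],
              [1, 0, 0, 2, 1, 0],
              [2, 0, 0, 0, 2, 1],
              [0, 1, 1, 2, 0, 2],
              [1, 0, 1, 0, 1, 1]])
w = np.array([[1, 0, -1],
              [-1, 0, 0],
              [0, 0, 1]])

print(conv2d_valid(x, w))
# [[-1.  2.  0.  0.]
#  [ 0.  1.  3. -2.]
#  [ 0.  0. -1.  4.]
#  [ 3. -1. -2. -2.]]
```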

Convolutional Layer

– Stride

Stride = 1: the filter moves one position at a time (as in the example above).

The stride size is defined by how much you want to shift your filter at each step.

Convolutional Layer

– Stride

Stride = 3: the filter moves three positions at a time, so the same 6×6 input and 3×3 filter produce only a 2×2 feature map (first row: -1, 0).

Convolutional Layer

– Zero padding (pad = 1): surround the input with a border of zeros, so a 6×6 input becomes 8×8.
By doing this you can apply the filter to every element of your input matrix.

Convolutional Layer
– Output Size

Output Size = (W − F + 2P) / S + 1

where W is the input size, F is the filter size, P is the zero-padding, and S is the stride.
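The output-size formula can be checked with a small helper (a sketch; the function name is my own):

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

# The 6x6 input with a 3x3 filter (no padding, stride 1) gives a 4x4 output;
# with stride 3 it gives 2x2; with padding 1 the output size is preserved.
print(conv_output_size(6, 3))        # 4
print(conv_output_size(6, 3, s=3))   # 2
print(conv_output_size(6, 3, p=1))   # 6
```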

Learn multiple filters


Learn multiple filters

Input (6×6):
1 2 0 1 0 1
2 1 1 0 0 1
1 0 0 2 1 0
2 0 0 0 2 1
0 1 1 2 0 2
1 0 1 0 1 1

Filter 1:
1 0 -1
-1 0 0
0 0 1

Filter 2:
0 2 1
0 1 -1
-1 1 0

Feature map from Filter 1:
-1 2 0 0
0 1 3 -2
0 0 -1 4
3 -1 -2 -2

Feature map from Filter 2:
3 2 4 -1
1 0 1 4
1 2 4 1
-1 0 3 4
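Both feature maps can be checked numerically (a sketch; `conv2d_valid` is my own helper, and Filter 2's values are as reconstructed from the slide):

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' cross-correlation of a 2-D input with a square filter."""
    f = w.shape[0]
    out = np.empty((x.shape[0] - f + 1, x.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + f, j:j + f] * w)
    return out

x = np.array([[1, 2, 0, 1, 0, 1],
              [2, 1, 1, 0, 0, 1],
              [1, 0, 0, 2, 1, 0],
              [2, 0, 0, 0, 2, 1],
              [0, 1, 1, 2, 0, 2],
              [1, 0, 1, 0, 1, 1]])
filters = np.stack([np.array([[1, 0, -1], [-1, 0, 0], [0, 0, 1]]),
                    np.array([[0, 2, 1], [0, 1, -1], [-1, 1, 0]])])

# Each filter yields its own 4x4 feature map; stacking them gives a
# (num_filters, 4, 4) volume, one channel per filter.
fmaps = np.stack([conv2d_valid(x, w) for w in filters])
print(fmaps.shape)   # (2, 4, 4)
```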

Convolutional Layer
– Above, we have only considered a 2-D image as input
– When the input has depth (e.g., RGB images), the convolution generalizes: an H1×W1×D1 input is convolved with filters of size Fh×Fw×D1, each producing an H2×W2 feature map

Convolutional Layer

Two filters
Stride=2
Zero-padding=1


Convolutional Layer
– Suppose the stride is (S_h, S_w) and the pad is (P_h, P_w). Then

H2 = (H1 − F_h + 2P_h) / S_h + 1  and  W2 = (W1 − F_w + 2P_w) / S_w + 1

for an H1×W1×D1 input and filters of size F_h×F_w×D1, giving an H2×W2 output per filter.

Convolution as a matrix operation

– If the input x^(l-1) and the output x^l are unrolled into vectors, the convolution can be represented as a sparse matrix W̃^(l-1) whose non-zero elements are the elements w_{i,j} of the kernel:

W̃^(l-1) (x^(l-1)_{1,1}, ⋯, x^(l-1)_{4,4})^T = (x^l_{1,1}, ⋯, x^l_{2,2})^T

– For a 4×4 input and a 3×3 kernel (valid convolution, 2×2 output), W̃^(l-1) is the 4×16 matrix

w_{1,1} w_{1,2} w_{1,3} 0  w_{2,1} w_{2,2} w_{2,3} 0  w_{3,1} w_{3,2} w_{3,3} 0  0 0 0 0
0  w_{1,1} w_{1,2} w_{1,3} 0  w_{2,1} w_{2,2} w_{2,3} 0  w_{3,1} w_{3,2} w_{3,3} 0 0 0 0
0 0 0 0  w_{1,1} w_{1,2} w_{1,3} 0  w_{2,1} w_{2,2} w_{2,3} 0  w_{3,1} w_{3,2} w_{3,3} 0
0 0 0 0  0  w_{1,1} w_{1,2} w_{1,3} 0  w_{2,1} w_{2,2} w_{2,3} 0  w_{3,1} w_{3,2} w_{3,3}
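The unrolling can be verified in NumPy (a sketch; `conv_as_matrix` is my own helper):

```python
import numpy as np

def conv_as_matrix(w, in_size=4):
    """Build the sparse matrix W_tilde such that W_tilde @ x.ravel()
    equals the 'valid' convolution of x with w, unrolled into a vector."""
    f = w.shape[0]
    out_size = in_size - f + 1
    m = np.zeros((out_size * out_size, in_size * in_size))
    for i in range(out_size):
        for j in range(out_size):
            row = i * out_size + j
            for a in range(f):
                for b in range(f):
                    # each row holds one shifted copy of the kernel
                    m[row, (i + a) * in_size + (j + b)] = w[a, b]
    return m

w = np.array([[1., 0., -1.], [-1., 0., 0.], [0., 0., 1.]])
x = np.arange(16.0).reshape(4, 4)

# Direct sliding-window result for comparison
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * w) for j in range(2)]
                   for i in range(2)])
unrolled = conv_as_matrix(w) @ x.ravel()
print(np.allclose(unrolled, direct.ravel()))   # True
```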

Back-propagation in convolutional layer

– Forward pass (in unrolled form): W̃^(l-1) (x^(l-1)_{1,1}, ⋯, x^(l-1)_{4,4})^T = (x^l_{1,1}, ⋯, x^l_{2,2})^T

– Backward pass, gradient with respect to the kernel weights:

∂Loss/∂w^l_{i,j} = Σ_{h,k} (∂Loss/∂x^l_{h,k}) (∂x^l_{h,k}/∂w^l_{i,j}), where ∂x^l_{h,k}/∂w^l_{i,j} = x^(l-1)_{h+i-1, k+j-1}

– Backward pass, gradient with respect to the input:

∂Loss/∂x^(l-1)_{i,j} = ∂Loss/∂x^(l-1)_u = Σ_v (∂Loss/∂x^l_v)(∂x^l_v/∂x^(l-1)_u) = Σ_v (∂Loss/∂x^l_v) W̃_{v,u} = (W̃_{*,u})^T (∂Loss/∂x^l)

(Note that x^(l-1)_u denotes the u-th element of the unrolled vector x^(l-1); here u = (i − 1)×W + j for an input of width W.)
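The input-gradient formula, ∂Loss/∂x^(l-1) = W̃^T ∂Loss/∂x^l, can be sanity-checked against finite differences (a sketch; all names are my own):

```python
import numpy as np

def conv_as_matrix(w, in_size=4):
    """Unrolled-matrix form of a 'valid' convolution."""
    f = w.shape[0]
    out = in_size - f + 1
    m = np.zeros((out * out, in_size * in_size))
    for i in range(out):
        for j in range(out):
            for a in range(f):
                for b in range(f):
                    m[i * out + j, (i + a) * in_size + (j + b)] = w[a, b]
    return m

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
x = rng.standard_normal(16)    # unrolled 4x4 input x^(l-1)
g = rng.standard_normal(4)     # upstream gradient dLoss/dx^l (2x2, unrolled)

W = conv_as_matrix(w)
analytic = W.T @ g             # dLoss/dx^(l-1) = W_tilde^T dLoss/dx^l

# Finite-difference check on the scalar loss  Loss(x) = g . (W x)
eps = 1e-6
numeric = np.array([(g @ (W @ (x + eps * np.eye(16)[k]))
                     - g @ (W @ (x - eps * np.eye(16)[k]))) / (2 * eps)
                    for k in range(16)])
print(np.allclose(analytic, numeric, atol=1e-5))   # True
```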

Receptive Field


Receptive Field
– The receptive field of a unit in a Convolutional Neural Network (CNN) is the region of the input space that affects that unit.
– In this example, we use a convolution filter of size k = 3×3, padding p = 1, stride s = 2×2.


Receptive Field
– From the left column, it is hard to tell the receptive field size, especially for deep CNNs.
– The right column shows the fixed-sized CNN visualization, which solves the problem by keeping the size of all feature maps constant and equal to the input map. Each feature is then marked at the center of its receptive field location.

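The receptive field of a stack of layers can be computed with the standard recurrence r ← r + (k − 1)·j, j ← j·s, where j ("jump") is the distance in input pixels between adjacent units of the current feature map (a sketch; `receptive_field` is my own helper):

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) per layer, input side first.
    Returns the receptive-field size (in input pixels) of one unit
    in the last layer."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump   # each layer widens the field by (k-1) jumps
        jump *= s             # stride compounds across layers
    return r

# 3x3 stride-2 layers, as in the slide's example:
print(receptive_field([(3, 2)]))          # 3
print(receptive_field([(3, 2), (3, 2)]))  # 7
```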

Convolution layers in PyTorch

https://pytorch.org/docs/stable/nn.html#convolution-layers


Dilated Convolution

– In simple terms, dilated convolution is just a convolution applied to the input with defined gaps.
– Dilation: spacing between kernel elements. Default: 1.
– Dilation = 2 means one input pixel is skipped between consecutive kernel elements.
– The receptive field grows exponentially while the number of parameters grows linearly.

(Yu et al, 2015)

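The exponential-growth claim can be checked by treating a dilated kernel as an effective kernel of size d·(k − 1) + 1 (a sketch; the function names are my own):

```python
def effective_kernel(k, d):
    """Dilation d inserts d - 1 gaps between kernel taps."""
    return d * (k - 1) + 1

def stacked_receptive_field(kernels_dilations):
    """Receptive field of stride-1 dilated layers stacked in order."""
    r = 1
    for k, d in kernels_dilations:
        r += effective_kernel(k, d) - 1
    return r

# 3x3 kernels with dilations 1, 2, 4: the receptive field grows 3 -> 7 -> 15,
# roughly doubling per layer, while each layer still has only 9 weights.
print(stacked_receptive_field([(3, 1)]))                 # 3
print(stacked_receptive_field([(3, 1), (3, 2)]))         # 7
print(stacked_receptive_field([(3, 1), (3, 2), (3, 4)])) # 15
```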

Pooling


Pooling
Max pooling
– Filter size: (2,2)
– Stride: (2,2)
– Pooling op: max(·) over each window

Feature map:
-1 2 0 0
0 1 3 -2
0 0 -1 4
3 -1 -2 -2

Subsampled map:
2 3
3 4
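The pooling step can be sketched in NumPy (`pool2d` is my own helper; passing `np.mean` in place of `np.max` gives average pooling):

```python
import numpy as np

def pool2d(x, size=2, stride=2, op=np.max):
    """Apply op to each size x size window, moving by stride."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = op(x[i * stride:i * stride + size,
                             j * stride:j * stride + size])
    return out

fmap = np.array([[-1, 2, 0, 0],
                 [0, 1, 3, -2],
                 [0, 0, -1, 4],
                 [3, -1, -2, -2]])
print(pool2d(fmap))   # [[2. 3.] [3. 4.]]
```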

Motivation: Pooling
– Pooling helps the representation become slightly invariant to small translations of the input
– Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is
– Taking max pooling as an example:

(Illustration: shifting the detector outputs by one pixel changes every detector value, but only some of the max-pooled values change.)

Motivation: Pooling

– Because pooling summarizes the responses over a whole neighbourhood, it is possible to use fewer pooling units than detector units

– Since pooling is used for down-sampling, it can be used to handle inputs of varying sizes

Pooling
Average pooling
– Filter size: (2,2)
– Stride: (2,2)
– Pooling op: mean(·) over each window

Feature map:
-1 4 1 2
0 1 3 -2
1 5 -2 6
3 -1 -2 -2

Max pooling:
4 3
5 6

Average pooling:
1 1
2 0

Pooling
ℓ2-norm pooling
– Filter size (Gaussian kernel size): (2,2)
– Stride: (2,2)
– Pooling op: b_k = sqrt( Σ_i g_i a_{k,i}² ), where the a_{k,i} are the feature-map values in the k-th window and the g_i are the Gaussian window weights

Feature map: a_{1,1}, …, a_{4,4}    Gaussian window: g_1, …, g_4    Output: b_1, …, b_4

Pooling

– Other pooling
– ℓp pooling (preserves the class-specific spatial/geometric information in the pooled features)

y_j = ( (1/N) Σ_i x_{j,i}^p )^(1/p)

– Mixed pooling (addresses the over-fitting problem)

y_j = λ·max(x_{j,1}, …, x_{j,N}) + (1 − λ)·mean(x_{j,1}, …, x_{j,N})

– Stochastic pooling (hyper-parameter free, regularizes large CNNs)

y_j = x_{j,t}, where t ~ Multinomial(p_1, ⋯, p_N) and p_i = x_{j,i} / Σ_k x_{j,k}

– Spectral pooling (preserves considerably more information per parameter than other pooling strategies): take the Fourier transform of the feature map, y = ℱ(x) ∈ ℂ^(M×N); crop the frequency representation to the target size, ŷ ∈ ℂ^(H×W); then apply the inverse transform, ℱ⁻¹(ŷ).

Why CNNs?

Motivation: convolution
– Problems of fully connected neural networks
– Every output unit interacts with every input unit
– The number of weights grows rapidly with the size of the input image
– Pixels far apart are less correlated

Motivation: convolution
– Locally connected neural net
– Sparse connectivity: a hidden unit is only connected to a local patch
– It is inspired by biological systems, where a cell is sensitive to a small sub-region, called a receptive field
– Here, the receptive field can be called a filter or kernel

Motivation: convolution
– Properties of locally connected neural nets
– The learned filter is a spatially local pattern
– A hidden node at a higher layer has a larger receptive field in the input
– Stacking many such layers leads to "filters" (no longer linear) which become increasingly "global"

(Ranzato, CVPR'13)

Motivation: convolution
– Shared weights
– Translation invariance: capture statistics in local patches, independent of location
– Hidden nodes at different locations share the same weights, which greatly reduces the number of parameters to learn

Example: a 1000×1000 image with 1 filter of size 10×10 needs only 100 parameters.

(Ranzato, CVPR'13)

Motivation: convolution
– Multiple filters
– Multiple filters make it possible to detect the spatial distributions of multiple visual patterns
– One filter builds one feature map; multiple filters build a stack of feature maps

Example: a 1000×1000 image with 100 filters of size 10×10 needs 10k parameters.

(Ranzato, CVPR'13)

Motivation: convolution
– Multiple filters, intuitive examples: image blurring, edge detection, image enhancement, vertical-edge detection

Visualize features
– Why do CNNs work so well? Hierarchical convolution and nonlinear operations (ReLU, max pooling, …). What happens inside the hidden layers?
– The network maps an input image to class scores (e.g., 1000 numbers)

Visualize features
– Give insight into the function of intermediate feature layers and the operation of the classifier (Zeiler and Fergus, 2014)

Thank you!
It greatly reduces the number of parameters to learn Example: 1000x1000 image 1 Filters Filter size: 10x10 100 parameters Ranzato CVPR’13 The University of Sydney Page 53 Motivation: convolution – Multiple filters – Multiple filters provide the probability of detecting the spatial distributions of multiple visual patterns – One filter can build a feature map, multiple filters will build a stack of feature maps Example: 1000x1000 image 100 Filters Filter size: 10x10 10k parameters Ranzato CVPR’13 The University of Sydney Page 54 Motivation: convolution – Multiple filters: intuitive examples Input Image blurring Edge detection Image enhancement Vertical detection The University of Sydney Page 55 Visualize features The University of Sydney Page 56 Visualize features – Why CNNs work so well? Hierarchical Convolution, Nonlinear operations (ReLU, max pooling…) What happens inside hidden layers? Class Scores: 1000 numbers Input image The University of Sydney Page 57 Visualize features – Give insight into the function of intermediate feature layers and the operation of the classifier (Zeiler and Fergus, 2014) The University of Sydney Page 58 Thank you!