
Lecture 13: Convolutional Neural Networks

Qiuhong Ke

COMP90051 Statistical Machine Learning

Copyright: University of Melbourne


Multi-layer perceptron: A fully connected network

[Figure: a 9×9 image is flattened into an 81×1 vector and fed through an input layer, a hidden layer, and an output layer]

Consists of only fully connected (FC) layers


Disadvantage: not spatially invariant (the same pattern appearing at a different location in the image is processed by different weights)

Disadvantage: more parameters with more hidden layers

[Figure: a fully connected network with several hidden layers; every additional hidden layer adds a full matrix of weights]

Source: Welinder, Peter, et al. "Caltech-UCSD birds 200." (2010)

Convolutional Neural Network (CNN)
Convolution, Max-Pooling, and Fully Connected (FC) layers

LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

[Figure: AlexNet, from "ImageNet Classification with Deep Convolutional Neural Networks" (Krizhevsky et al., 2012)]

Outline

• Convolutional layer

• Max-Pooling layer

• Additional notes on training neural networks

• Batch size

• Optimisation algorithms

• Activation function

• How to prevent overfitting


Tool: Keras
Easy, simple and powerful

• Build the architecture (add layers from input to output, e.g. FC layer, convolution layer, …), as in the sketch below
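A minimal sketch of this step (assuming the TensorFlow backend; the layer sizes here are illustrative, not from the slides):

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),                     # e.g. 32x32 grayscale images
    keras.layers.Conv2D(6, (5, 5), activation="relu"),  # convolution layer
    keras.layers.MaxPooling2D(pool_size=2),             # max-pooling layer
    keras.layers.Flatten(),                             # flatten feature maps to a vector
    keras.layers.Dense(128, activation="relu"),         # FC layer
    keras.layers.Dense(10, activation="softmax"),       # output layer (10 classes)
])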


Tool: Keras
Easy, simple and powerful

• Select an optimisation algorithm (e.g. SGD; more in this lecture)

• Select the loss function

• Compile the model and train the model, as in the sketch below
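A sketch of these steps, continuing the model above (x_train and y_train are placeholders for your data, not names from the slides):

# optimisation algorithm (e.g. SGD) and loss function
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train the model; x_train / y_train are placeholder arrays
model.fit(x_train, y_train, batch_size=32, epochs=10)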


• To do classification, we can first extract local features (identify local patterns) and then combine the local features for classification.

An image can be decomposed into local patches


• Different local patches could have different patterns


Identify different patterns at local patches


Identify different patterns


[Figure: a filter (kernel) is applied to a local patch by element-wise multiplication followed by a sum; a patch whose pattern matches the kernel gives a high response (sum = 2), a non-matching patch a low one (sum = 1)]

Input and kernel have the same pattern: high response


[Figure: the same kernel applied to other patches: the matching patch again scores high (sum = 2), the non-matching one low (sum = 1)]

Identify different patterns


[Figure: element-wise multiplication and sum with another kernel; both example patches here give sum = 2]

Different kernels identify different patterns


[Figure: a different kernel on the same patches gives different responses (sum = 2 vs sum = 5): different kernels identify different patterns]

Convolution on 2D


Use the kernel to perform element-wise multiplication and summation on every local patch.

Figure 9.1 in Deep learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville
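A minimal NumPy sketch of this operation (no padding, stride 1; strictly speaking this is cross-correlation, which is what deep learning libraries compute):

import numpy as np

def conv2d_valid(image, kernel):
    # slide the kernel over every local patch:
    # element-wise multiply, then sum
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))
print(conv2d_valid(image, kernel).shape)   # (2, 2), i.e. 4 - 3 + 1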

Response map (feature map)

Feature map: a 2D map of the presence of a pattern at different locations in an input


[Figure: an input convolved with a kernel produces a response (feature) map]

Figure 5.3 in Deep learning with python by Francois Chollet


Different kernels identify different patterns: use multiple filters in each layer


The number of filters determines the number of output feature maps (multiple filters give multiple response maps).


Two key parameters in convolution:

Filter (kernel) size: the size of the patches extracted from the input

Number of filters: the depth (number of channels) of the output feature map

[Figure: a 32×32×1 input passes through a convolution layer with 6 filters of size 5×5; the input has 1 channel, the output has 6 channels]
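These two parameters map directly onto Keras's Conv2D arguments; a sketch of the layer in the figure (TensorFlow backend assumed):

from tensorflow import keras

inputs = keras.Input(shape=(32, 32, 1))    # 1-channel input
x = keras.layers.Conv2D(filters=6, kernel_size=(5, 5))(inputs)
model = keras.Model(inputs, x)
model.summary()   # output shape (None, 28, 28, 6): 32 - 5 + 1 = 28, depth 6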


Convolution on multiple-channel input

[Figure: an RGB input: kernel channel 1 is convolved with R (feature map 1), kernel channel 2 with G (feature map 2), kernel channel 3 with B (feature map 3); the three maps are summed element-wise into one output channel]

The kernel has the same number of channels (depth) as the input. One kernel gives one output channel; multiple kernels give multiple output channels.

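A NumPy sketch of one multi-channel kernel (the shapes are illustrative):

import numpy as np

def conv2d_multichannel(image, kernel):
    # image: (H, W, C); kernel: (kh, kw, C), same depth as the input.
    # Multiply element-wise across all channels, then sum to one value.
    H, W, C = image.shape
    kh, kw, _ = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out

rgb = np.random.rand(5, 5, 3)    # toy RGB input
k = np.random.rand(3, 3, 3)      # one kernel of depth 3
print(conv2d_multichannel(rgb, k).shape)   # (3, 3): one output channel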

Advantage: learns translation-invariant patterns (a pattern learned at one location can be recognised at any other location)


Advantage: weight sharing and sparse connections

Fully connected layer: each arrow is a separate weight (no sharing)


Convolutional layer: the same kernel weights are reused at every location, and each output unit is connected only to a local patch of the input

Figure 9.3 in Deep learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville


Advantage: learns hierarchical patterns


More layers: larger receptive field (a larger window of the input is seen)

Figure 5.2 in Deep learning with python by Francois Chollet; Figure 9.4 in Deep learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville


[Figures: visualisations of the patterns captured at successive layers, up to Layer 4 and Layer 5: deeper layers respond to increasingly complex, object-level patterns]

Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European conference on computer vision. Springer, Cham, 2014.



[Figure: 1D convolution: inputs x1 … x7 are combined with kernel weights w1, w2, w3 to give outputs y1 … y5, where y_i = w1·x_i + w2·x_(i+1) + w3·x_(i+2)]

x is the input vector, y is the output vector

output size ≠ input size


Figure 9.1 in Deep learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville

Padding


Adding an appropriate number of rows and columns on each side of the input feature map, so that the output keeps the input's spatial size.

Figure 5.6 in Deep learning with python by Francois Chollet

Example: a 7×7 input convolved with a 3×3 kernel and no padding gives an output of size N − k + 1 = 7 − 3 + 1 = 5, i.e. 5×5; padding the input keeps the output at 7×7.

Stride

The distance between two successive windows.

With padding: output_size = ceiling(input_size / stride)
No padding: output_size = ceiling((input_size − kernel_size + 1) / stride)

(ceiling(x) is the smallest integer ≥ x; with a stride larger than one, the output is smaller than the input)

Figure 5.5 in Deep learning with python by Francois Chollet

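These two formulas as a small Python sketch (the function name is ours, not from the slides):

import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    # "same": padded so the output covers every input position
    if padding == "same":
        return math.ceil(input_size / stride)
    # "valid": no padding
    return math.ceil((input_size - kernel_size + 1) / stride)

print(conv_output_size(7, 3))                   # 5, i.e. 7 - 3 + 1
print(conv_output_size(7, 3, padding="same"))   # 7
print(conv_output_size(7, 3, stride=2))         # 3, i.e. ceiling(5 / 2)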

Convolutional layer


filters: the number of filters in the convolution

kernel_size: the height and width of the 2D convolution window

padding: one of "valid" (no padding is performed) or "same" (the output size equals the input size)

stride: the strides of the convolution along the height and width
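A Keras sketch of these arguments (note that Keras spells the last one strides):

from tensorflow import keras

inputs = keras.Input(shape=(7, 7, 1))
valid = keras.layers.Conv2D(1, 3, padding="valid")(inputs)   # no padding: 7 - 3 + 1 = 5
same = keras.layers.Conv2D(1, 3, padding="same")(inputs)     # output size == input size
strided = keras.layers.Conv2D(1, 3, strides=2, padding="same")(inputs)
print(valid.shape, same.shape, strided.shape)
# (None, 5, 5, 1) (None, 7, 7, 1) (None, 4, 4, 1)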

Max-pooling layer

Advantage: downsamples the feature map, reducing the computational burden


!! !” !# !$ !% !& !’

0.9 0.7 0.3 1 0.4 0.8

0.9 1 0.8

!(

Max-pooling

Conv
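The figure's numbers can be reproduced with a small NumPy sketch:

import numpy as np

def max_pool1d(x, pool_size=2, stride=2):
    # take the maximum over each window
    return np.array([x[i:i + pool_size].max()
                     for i in range(0, len(x) - pool_size + 1, stride)])

conv_out = np.array([0.9, 0.7, 0.3, 1.0, 0.4, 0.8])
print(max_pool1d(conv_out))   # [0.9 1.  0.8]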

Advantage: increases the size of the receptive field (a larger window of the input is seen)

!! !” !# !$ !% !& !’

0.9 0.7 0.3 1 0.4 0.8

0.9 1 0.8

!(!! !” !# !$ !% !& !’

!(

0.9 0.7 0.3 1 0.4 0.8

Max-pooling


Other pooling method: Average pooling
taking the average value over the patch


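Both pooling types are single Keras layers (a sketch):

from tensorflow import keras

inputs = keras.Input(shape=(6, 6, 1))
mp = keras.layers.MaxPooling2D(pool_size=2)(inputs)      # max of each 2x2 window
ap = keras.layers.AveragePooling2D(pool_size=2)(inputs)  # mean of each 2x2 window
print(mp.shape, ap.shape)   # (None, 3, 3, 1) (None, 3, 3, 1)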

Why max-pooling to downsample feature map?

Convolution with stride > 1: misses feature-presence information

Average pooling: dilutes feature-presence information

(Recall: a feature map is a 2D map of the presence of a pattern at different locations in an input; Figure 5.3 in Deep learning with python by Francois Chollet)


Outline


• Batch size

• Other optimisation methods (optimisers)

• Momentum

• Adaptive gradient (AdaGrad)

• Root mean square propagation (RMSProp)

• Adaptive moment estimation (Adam)

• Activation function

• How to prevent overfitting

Gradient descent algorithm

• Randomly shuffle/split all training examples into B batches
• Choose initial θ(0)
• For i from 1 to T
    • For j from 1 to B
        • Do gradient descent update using data from batch j

• Advantage of such an approach: computational feasibility for large datasets


Iterations over the entire dataset are called epochs.

Stochastic gradient descent: B = N

Choose initial guess θ(0), k = 0
(here θ is the set of all weights from all layers)

For i from 1 to T (epochs)
    For j from 1 to N (training examples)
        Consider example (x_j, y_j)
        Update: θ(k+1) = θ(k) − η ∇L(θ(k)); k ← k + 1


Stochastic gradient descent (SGD)

Batch number == N (batch size == 1)

Quick update each step, but:
• high variance in the gradient
• the model is updated too often

[Figure: a noisy optimisation path on the error surface]

Batch SGD (batch gradient descent): batch number == 1 (batch size == N)

Stable updates, but:
• not computationally feasible for large datasets
• each step takes a long time

[Figure: a smooth but slow optimisation path on the error surface]

Mini-batch SGD

Mini-batch: 1 < batch size < N, a compromise between the two extremes (see the sketch below)
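A minimal NumPy sketch of mini-batch SGD on a toy linear model (the data, model, and hyperparameters are placeholders, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
w = np.zeros(5)                       # linear model with squared loss
eta, batch_size, epochs = 0.01, 32, 5

for epoch in range(epochs):           # one epoch = one pass over the data
    order = rng.permutation(len(X))   # shuffle examples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # mini-batch gradient
        w -= eta * grad               # gradient descent update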