Lecture 13: Convolutional Neural Networks
Qiuhong Ke
COMP90051 Statistical Machine Learning
Copyright: University of Melbourne
Multi-layer perceptron: A fully connected network
[Figure: a 9×9 input image is flattened into an 81×1 vector and passed through the input layer, hidden layer, and output layer.]
An MLP consists of only fully connected (FC) layers.
Disadvantage: not spatially invariant. Shifting a pattern to a different location changes the flattened input vector, so the network treats the two images as different inputs.
Disadvantage: the number of parameters grows rapidly as more hidden layers (and units) are added.
[Image: examples from the Caltech-UCSD Birds 200 dataset. Source: Welinder, Peter, et al. "Caltech-UCSD Birds 200." (2010).]
Convolutional Neural Network (CNN)
Convolution, Max-Pooling, and Fully Connected (FC) layers
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
AlexNet – ImageNet Classification with Deep Convolutional Neural Networks
Outline
• Convolutional layer
• Max-pooling layer
• Additional notes on training neural networks
  • Batch size
  • Optimisation algorithms
  • Activation function
  • How to prevent overfitting
Tool: Keras
Easy, simple, and powerful:
• Build the architecture (add layers from input to output, e.g. FC layers, convolution layers, …)
• Select an optimisation algorithm (e.g. SGD; more later in this lecture)
• Select the loss function
• Compile the model and train the model (a sketch of all three steps follows below)
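As a minimal sketch of these steps in Keras (the 784-dimensional flattened input, the layer sizes, and the 10 output classes are illustrative assumptions, not from the slides; the training data here is random dummy data just so the snippet runs):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Build the architecture: add layers from input to output
model = keras.Sequential([
    keras.Input(shape=(784,)),               # flattened image input (assumed size)
    layers.Dense(128, activation="relu"),    # hidden FC layer
    layers.Dense(10, activation="softmax"),  # output layer (assumed 10 classes)
])

# Select an optimisation algorithm (SGD) and a loss function, then compile
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data for illustration only
x_train = np.random.rand(100, 784)
y_train = keras.utils.to_categorical(np.random.randint(10, size=100), 10)

# Train the model
model.fit(x_train, y_train, batch_size=32, epochs=10)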
• To do classification, we can first extract local features (i.e. identify local patterns) and then combine the local features for classification.
• An image can be decomposed into local patches, and different local patches can contain different patterns.
Identify different patterns at local patches
A filter (kernel) is applied to a local patch by element-wise multiplication followed by a sum; the result is the response of the filter at that patch.
[Figure: element-wise multiplication of a kernel with two different patches; one gives sum = 2, the other sum = 1.]
Input and kernel have the same pattern: high response
[Figure: the same kernel gives sum = 1 on a patch with a different pattern and sum = 2 on a patch matching its own pattern.]
Identify different patterns
[Figure: a single kernel gives the same response (sum = 2) on two different patches, so one kernel alone cannot distinguish every pattern.]
Different kernels identify different patterns
[Figure: a different kernel applied to the same two patches gives distinct responses: sum = 2 and sum = 5.]
Convolution on 2D
Use a kernel to perform element-wise multiplication and a sum for every local patch of the 2D input.
(Figure 9.1 in Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville.)
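A minimal NumPy sketch of this operation (stride 1, no padding; the input and kernel sizes are illustrative):

import numpy as np

def conv2d(image, kernel):
    """Element-wise multiply and sum the kernel with every local patch."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # one value of the feature map
    return out

image = np.random.rand(7, 7)        # toy 7x7 input
kernel = np.random.rand(3, 3)       # 3x3 filter
print(conv2d(image, kernel).shape)  # (5, 5), since 7 - 3 + 1 = 5

The 2D array this returns is exactly the response map discussed next.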
Response map (feature map)
A feature map is a 2D map of the presence of a pattern at different locations in an input.
(Figure 5.3 in Deep Learning with Python by François Chollet.)
Different kernels identify different patterns: use multiple filters in each layer
The number of filters determines the number of output feature maps (multiple filters give multiple response maps).
Two key parameters in convolution
• Filter (kernel) size: the size of the patches extracted from the input.
• Number of filters: the depth (number of channels) of the output feature map.
Example: a 32×32×1 input (1 channel) passed through a convolution layer with 6 filters of size 5×5 produces a 6-channel output.
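In Keras these two parameters correspond directly to the filters and kernel_size arguments; a sketch reproducing the example above:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),           # 1-channel 32x32 input
    layers.Conv2D(filters=6, kernel_size=5),  # 6 filters of size 5x5
])
model.summary()  # output shape (28, 28, 6): 6 feature maps of size 32 - 5 + 1 = 28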
Convolution on multiple-channel input
The kernel has the same number of channels (depth) as the input. For an RGB input, each channel (R, G, B) is convolved with the corresponding kernel channel, giving one feature map per channel; the three feature maps are then summed element-wise into a single output channel. One kernel therefore produces one output channel, and using multiple kernels produces multiple output channels.
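A minimal NumPy sketch of one kernel applied to a multi-channel input (sizes are illustrative):

import numpy as np

def conv2d_multichannel(image, kernel):
    """One kernel (same depth as the input) -> one output channel."""
    H, W, C = image.shape                   # input: height x width x channels
    k = kernel.shape[0]                     # kernel: k x k x C
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k, j:j + k, :]
            # per-channel products, then a sum over all positions and channels
            out[i, j] = np.sum(patch * kernel)
    return out

rgb = np.random.rand(5, 5, 3)     # toy RGB input
kernel = np.random.rand(3, 3, 3)  # kernel depth matches the input depth
print(conv2d_multichannel(rgb, kernel).shape)  # (3, 3): a single channel

Stacking the outputs of several such kernels along the channel axis gives a multi-channel output.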
Advantage: learns translation-invariant patterns
Because the same kernel is slid over every position, a pattern learned in one location can be recognised anywhere in the input.
Advantage: weight sharing and sparse connections
Fully connected layer: each arrow is a separate weight (no sharing), and every output depends on every input.
Convolutional layer: the same kernel weights are reused at every location (sharing), and each output depends only on a small local window of the input (sparse connections).
(Figure 9.3 in Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville.)
Advantage: learns hierarchical patterns
More layers give a larger receptive field: a larger window of the input is seen by units in deeper layers, so later layers can combine simple local patterns into more complex ones.
(Figure 5.2 in Deep Learning with Python by François Chollet; Figure 9.4 in Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville.)
[Figures: visualisations of the patterns captured by successive convolutional layers, up to Layer 4 and Layer 5. Source: Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer, Cham, 2014.]
Convolution in 1D: for an input vector $x = (x_1, \dots, x_7)$ and kernel weights $(w_1, w_2, w_3)$, each output element is
$$y_i = \sum_{j=1}^{3} w_j \, x_{i+j-1},$$
where $x$ is the input vector and $y = (y_1, \dots, y_5)$ is the output vector. Note that the output size ≠ the input size: 7 inputs convolved with a size-3 kernel give only 5 outputs.
(Figure 9.1 in Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville.)
Padding
Padding adds an appropriate number of rows and columns (typically zeros) on each side of the input feature map.
Without padding, the output shrinks: for input size N and kernel size k, the output size is N − k + 1. For example, a 7×7 input with a 3×3 kernel gives a 7 − 3 + 1 = 5, i.e. 5×5, output; conversely, padding a 5×5 input up to 7×7 lets a 3×3 kernel produce a 5×5 output, preserving the spatial size.
(Figure 5.6 in Deep Learning with Python by François Chollet.)
Stride
The stride is the distance between two successive windows. If the stride is larger than one, the output size is smaller.
With padding: output_size = ceiling(input_size / stride)
No padding: output_size = ceiling((input_size − kernel_size + 1) / stride)
Here ceiling(·) denotes the smallest integer ≥ its argument.
(Figure 5.5 in Deep Learning with Python by François Chollet.)
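These two formulas can be checked with a small helper (a sketch; the "same"/"valid" names follow the Keras convention introduced next):

import math

def conv_output_size(input_size, kernel_size, stride, padding):
    if padding == "same":  # padded: ceiling(input_size / stride)
        return math.ceil(input_size / stride)
    else:                  # "valid": no padding
        return math.ceil((input_size - kernel_size + 1) / stride)

print(conv_output_size(7, 3, stride=1, padding="valid"))  # 5 = 7 - 3 + 1
print(conv_output_size(7, 3, stride=2, padding="valid"))  # 3: stride > 1 shrinks the output
print(conv_output_size(7, 3, stride=2, padding="same"))   # 4 = ceiling(7 / 2)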
Convolutional layer
The key arguments of a Keras convolution layer:
• filters: the number of filters in the convolution.
• kernel_size: the height and width of the 2D convolution window.
• padding: one of "valid" (do not perform any padding) or "same" (pad so that the output size equals the input size).
• strides: the strides of the convolution along the height and width.
A sketch of the two padding modes follows below.
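The effect of the two padding modes in Keras (the input size is illustrative):

from tensorflow import keras
from tensorflow.keras import layers

x = keras.Input(shape=(32, 32, 1))
same = layers.Conv2D(6, kernel_size=5, strides=1, padding="same")(x)
valid = layers.Conv2D(6, kernel_size=5, strides=1, padding="valid")(x)
print(same.shape)   # (None, 32, 32, 6): "same" keeps the spatial size
print(valid.shape)  # (None, 28, 28, 6): "valid" shrinks it to 32 - 5 + 1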
Max-pooling layer
Max-pooling downsamples a feature map by taking the maximum value over each local patch.
Advantage: downsamples the feature map, reducing the computational burden
[Figure: a convolution over inputs x_1, …, x_7 produces the feature map (0.9, 0.7, 0.3, 1, 0.4, 0.8); max-pooling with window size 2 and stride 2 reduces it to (0.9, 1, 0.8).]
Advantage: increases the size of the receptive field (a larger window of the input is seen)
[Figure: in the same 1D example, each value after convolution depends on a small window of x_1, …, x_7, while each value after max-pooling summarises a larger window of the input.]
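A minimal NumPy sketch reproducing the 1D example above (window size 2, stride 2):

import numpy as np

def max_pool_1d(x, size=2, stride=2):
    # take the maximum over each successive window of the feature map
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, stride)])

conv_out = np.array([0.9, 0.7, 0.3, 1.0, 0.4, 0.8])
print(max_pool_1d(conv_out))  # [0.9 1.  0.8]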
Other pooling method: average pooling
Average pooling takes the average value over each patch instead of the maximum.
[Figure: average pooling applied to a 2D feature map.]
Why max-pooling to downsample the feature map?
• Convolution with stride > 1: misses feature-presence information.
• Average pooling: dilutes feature-presence information.
Recall that a feature map is a 2D map of the presence of a pattern at different locations in an input; taking the maximum over each patch keeps the strongest evidence that the pattern is present there. (Figure 5.3 in Deep Learning with Python by François Chollet.)
Outline
• Batch size
• Other optimisation methods (optimisers)
  • Momentum
  • Adaptive gradient (AdaGrad)
  • Root mean square propagation (RMSProp)
  • Adaptive moment estimation (Adam)
• Activation function
• How to prevent overfitting
Gradient descent algorithm
• Randomly shuffle/split all training examples into $B$ batches
• Choose initial $\theta^{(0)}$
• For $t$ from 1 to $T$
  • For $b$ from 1 to $B$
    • Do a gradient descent update using the data from batch $b$
• Advantage of such an approach: computational feasibility for large datasets
Iterations over the entire dataset are called epochs.
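A minimal NumPy sketch of this loop (the per-batch gradient function grad and the learning rate eta are assumed to be supplied by the model; all names are illustrative):

import numpy as np

def batched_gradient_descent(theta, X, y, grad, eta=0.01, B=10, T=100):
    """grad(theta, X_b, y_b) is assumed to return the gradient on one batch."""
    N = len(X)
    for t in range(T):                        # each pass over the data is an epoch
        idx = np.random.permutation(N)        # randomly shuffle the examples
        for batch in np.array_split(idx, B):  # split into B batches
            theta = theta - eta * grad(theta, X[batch], y[batch])
    return theta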
Stochastic gradient descent: $B = N$
Choose initial guess $\theta^{(0)}$, $k = 0$ (here $\theta$ is the set of all weights from all layers)
For $t$ from 1 to $T$ (epochs)
  For $i$ from 1 to $N$ (training examples)
    Consider example $(x_i, y_i)$
    Update: $\theta^{(k+1)} = \theta^{(k)} - \eta \nabla L(\theta^{(k)})$, where $L$ is the loss on example $(x_i, y_i)$; $k \leftarrow k + 1$
Stochastic gradient descent (SGD): batch number = N (batch size = 1)
Quick update at each step, but:
• high variance in the gradients
• updates the model too often
[Figure: a noisy SGD trajectory on the error surface.]
Batch SGD: batch number = 1 (batch size = N)
Stable updates, but:
• not computationally feasible for large datasets
• takes a long time to move each step
[Figure: trajectory on the error surface.]
Mini-batch SGD: 1 < batch size < N
A compromise between the two extremes: updates are more stable than single-example SGD, while each step remains computationally feasible.
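In Keras the choice between these regimes is just the batch_size argument to fit; a sketch (assuming the model, x_train and y_train from the Keras example earlier):

from tensorflow import keras

# batch_size = 1      -> SGD: quick but noisy updates
# batch_size = N      -> batch gradient descent: stable but slow per step
# 1 < batch_size < N  -> mini-batch SGD: a compromise between the two
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy")
model.fit(x_train, y_train, batch_size=32, epochs=10)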