Deep Learning – COSC2779
Convolutional Neural Networks

Dr. Ruwan Tennakoon

August 9, 2021

Reference: Chapter 9 of Ian Goodfellow et al., “Deep Learning”, MIT Press, 2016.

Lecture 4 (Part 1) Deep Learning – COSC2779 August 9, 2021 1 / 42

Outline

1 Motivation
2 2D Convolution in Traditional Computer Vision
3 Basic Convolution Operation
4 Pooling Operation
5 Variants of the Basic Convolution
6 The Neuro-scientific Basis for Convolutional Networks


Machine Learning

The Task can be expressed as an unknown target function: y = f(x).
ML finds a Hypothesis (model), h(·), from the hypothesis space H, which approximates the unknown target function:
ŷ = h∗(x) ≈ f(x)

The Experience is typically a data set, D, of values.

The Performance is typically a numerical measure that determines how well the hypothesis matches the experience.

Nearly all machine learning algorithms can
be described with the following fairly
simple recipe:

Dataset
Cost function (Objective, loss)
Model
Optimization procedure

This week: we discuss another representation of the hypothesis (model).


Neural Network Resurgence

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was an annual competition held between 2010 and 2017.

The datasets comprised
approximately 1 million images
and 1,000 object classes.

The annual challenge focuses on
multiple tasks for image
classification.

Image source: ImageNet
Alex Krizhevsky et al., in “ImageNet Classification with Deep Convolutional Neural Networks”, developed a convolutional neural network that achieved top results on the ILSVRC-2010 and ILSVRC-2012 image classification tasks.

AlexNet:

[Figure: AlexNet architecture, with Convolution + Pooling stages followed by an MLP.]
Image: ImageNet Classification with Deep Convolutional Neural Networks

Convolutions allow the network to have lots of neurons while keeping the number of actual parameters that need to be learned fairly small.

Convolutions have become very popular for a variety of tasks, including vision, NLP and speech processing.


https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Objective

Understand the importance of convolutional neural networks.
Understand variants of the basic convolution operation.


Learning Hierarchical Representations

[Figure: three pipelines labelled Traditional, Unsupervised mid and Deep Learning, built from Feature Extractor, Mid-level Features and Trainable Classifier blocks.]

Handcrafted features are time consuming, brittle and not scalable in practice. DL learns the underlying features directly from data.


Apply Neural Networks to Images

[Figure: the image is flattened and fed to a fully connected network that predicts ŷ.]

Each feature in the first hidden layer has a connection to each pixel.

In computer vision, features are usually local in space and the same operation is applied across different locations.

Spatial Relationship

Natural language Processing (NLP):

Image: The Unreasonable Effectiveness of Recurrent Neural Networks

Speech Recognition (Voice to text):
“We have one hour before our appointment with the real estate agent.”
“There is no right way to write a great novel”

Not all tasks have such relationships, e.g. predicting house prices using attributes such as “number of rooms”, “suburb”, etc.


http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Apply Neural Networks to Images

In our ML model we would like to have these two properties.
Feature extraction usually happens locally – sparse connectivity.
In feature extraction the same operation is applied at different locations – parameter
sharing.

p1 p2 p3 p4 p5

h1 h2 h3 h4 h5


Sparse Connectivity

Feature extraction usually happens locally – sparse connectivity.
In feature extraction the same operation is applied at different locations – parameter sharing.

p1 p2 p3 p4 p5

h1 h2 h3 h4 h5

Fully Connected

p1 p2 p3 p4 p5

h1 h2 h3 h4 h5

Sparse connected


Parameter Sharing

Feature extraction usually happens locally – sparse connectivity.
In feature extraction the same operation is applied at different locations – parameter sharing.

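p1 p2 p3 p4 p5

h1 h2 h3 h4 h5

[Figure: sparsely connected layer in which every connection between a pixel and a hidden unit has its own weight $w_{i,j}$.]

Sparse connected

p1 p2 p3 p4 p5

h1 h2 h3 h4 h5

[Figure: the same connections, but every hidden unit reuses the same three weights $w_1$, $w_2$, $w_3$.]

Shared weights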


1D Convolution

Both ideas, sparse connectivity and parameter sharing, can be achieved with convolutions.

Convolutions can also be implemented in a hierarchy, where each layer acts on the features extracted by the layer below.

p1 p2 p3 p4 p5

$h^{(1)}_1 \quad h^{(1)}_2 \quad h^{(1)}_3 \quad h^{(1)}_4 \quad h^{(1)}_5$

$h^{(2)}_1 \quad h^{(2)}_2 \quad h^{(2)}_3 \quad h^{(2)}_4 \quad h^{(2)}_5$

Note that this network has only 6 weights and 2 biases, compared to (25 + 5) + (25 + 5) parameters in a fully connected network with the same number of neurons.
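
The parameter count can be checked with a minimal sketch of the two-layer network. It assumes zero padding at the borders and a ReLU activation, neither of which is stated on the slide; the function name `conv1d_same` and all numeric values are illustrative only.

```python
def conv1d_same(inputs, weights, bias):
    """1D layer with sparse connectivity and shared weights.

    Each output h[i] sees only inputs[i-1], inputs[i], inputs[i+1], and every
    position reuses the same three weights. Borders are zero padded (assumption).
    """
    padded = [0.0] + list(inputs) + [0.0]
    outputs = []
    for i in range(len(inputs)):
        z = sum(w * x for w, x in zip(weights, padded[i:i + 3])) + bias
        outputs.append(max(0.0, z))  # ReLU activation (assumption)
    return outputs


p = [1.0, 2.0, 3.0, 4.0, 5.0]                # illustrative inputs p1..p5
h1 = conv1d_same(p, [0.5, 1.0, -0.5], 0.0)   # layer 1: 3 weights + 1 bias
h2 = conv1d_same(h1, [1.0, 1.0, 1.0], 0.1)   # layer 2: 3 weights + 1 bias

conv_params = 2 * (3 + 1)        # 6 weights + 2 biases
dense_params = 2 * (5 * 5 + 5)   # (25 + 5) + (25 + 5)
print(conv_params, dense_params)
```

Both layers still produce five outputs each, so the number of neurons matches the fully connected case while the number of learned values drops from 60 to 8.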


2D Convolution in Traditional Computer Vision

[Figure: the traditional pipeline, Feature Extractor followed by Trainable Classifier, illustrated by extracting edge features.]


The Sobel operator is human engineered.


$$I_o(i, j) = \sum_{p=-1}^{1} \sum_{q=-1}^{1} w_{p,q} \times I_{in}(i - p, j - q)$$
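
The double sum can be sketched directly in Python. The kernel below holds the standard Sobel values for horizontal gradients; the function name `convolve2d` and the small test image are hypothetical, and border pixels are simply skipped (an assumption, since the slide does not say how borders are handled).

```python
# Sobel kernel for horizontal gradients, stored as w[p + 1][q + 1] for p, q in {-1, 0, 1}.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def convolve2d(image, w):
    """Evaluate I_o(i, j) = sum_p sum_q w[p][q] * I_in(i - p, j - q) at interior pixels."""
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            total = 0
            for p in (-1, 0, 1):
                for q in (-1, 0, 1):
                    total += w[p + 1][q + 1] * image[i - p][j - q]
            out[i - 1][j - 1] = total
    return out


# Hypothetical test image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
edges = convolve2d(image, SOBEL_X)
print(edges[0])  # [0, -4, -4, 0]: non-zero only next to the edge
```

The response is negative because the formula indexes the input with i − p and j − q, which flips the kernel; sliding the unflipped kernel would give +4 at the same positions.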


2D Convolution

1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6

∗
w11 w12 w13
w21 w22 w23
w31 w32 w33

= 31 46 36 ... (first three values of the top output row)

Image ∗ Weight Filter = Output

The weights [w11, w12, · · · , w33] are learned from data. For this example, let's assume all weights are 1: [w11, w12, · · · , w33] = [1, 1, · · · , 1].
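
The three output values can be reproduced with a short sketch. It slides the filter without flipping it, as deep learning libraries typically do (with an all-ones filter the flip makes no difference); the function name `conv2d_valid` is illustrative.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]
ONES = [[1, 1, 1] for _ in range(3)]  # all weights set to 1, as in the example


def conv2d_valid(image, kernel):
    """Slide the kernel over every position where it fits entirely inside the image."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


output = conv2d_valid(IMAGE, ONES)
print(output[0][:3])  # [31, 46, 36]
```

Without padding, the 7 × 7 image and 3 × 3 filter give a 5 × 5 output.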


Padding

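0 0 0 0 0 0 0 0 0
0 1 6 2 8 1 2 7 0
0 1 6 2 8 1 2 5 0
0 0 5 8 1 5 7 1 0
0 1 7 1 3 5 8 0 0
0 5 2 4 4 5 8 4 0
0 8 2 3 7 3 8 2 0
0 1 2 3 6 5 9 6 0
0 0 0 0 0 0 0 0 0

∗
w11 w12 w13
w21 w22 w23
w31 w32 w33

= 14 at the top-left corner of the output; the values 31 46 36 from the unpadded example now sit one row down and one column in.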

The weights [w11, w12, · · · , w33] are learned from data. For this example, let's assume all weights are 1: [w11, w12, · · · , w33] = [1, 1, · · · , 1].

Padding makes it possible to get an output that is the same size as the input. The most common type of padding is zero padding.
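
A minimal sketch of zero padding on the same example follows. The helper names `zero_pad` and `conv2d_valid` are illustrative; one ring of zeros is used because the filter is 3 × 3.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]
ONES = [[1, 1, 1] for _ in range(3)]  # all weights set to 1, as in the example


def zero_pad(image, pad):
    """Surround the image with `pad` rings of zeros."""
    width = len(image[0]) + 2 * pad
    border = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    return border + middle + [list(row) for row in border]


def conv2d_valid(image, kernel):
    """Slide the kernel over every position where it fits entirely inside the image."""
    k = len(kernel)
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
            for j in range(len(image[0]) - k + 1)
        ]
        for i in range(len(image) - k + 1)
    ]


same = conv2d_valid(zero_pad(IMAGE, 1), ONES)
print(len(same), len(same[0]))  # 7 7: same size as the input
print(same[0][0])               # 14
print(same[1][1:4])             # [31, 46, 36]
```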


Stride

1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6

∗
w11 w12 w13
w21 w22 w23
w31 w32 w33

=
31 36

The weights [w11, w12, · · · , w33] are learned from data. For this example, let's assume all weights are 1: [w11, w12, · · · , w33] = [1, 1, · · · , 1]. A stride of 2 is used.

A stride greater than 1 downsizes the output by a factor of roughly the stride value.
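
The effect of the stride on the output size can be sketched as follows. The size formula floor((n + 2·pad − k) / stride) + 1 is the usual one for convolutions; the function names are illustrative.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]
ONES = [[1, 1, 1] for _ in range(3)]  # all weights set to 1, as in the example


def output_size(n, k, stride=1, pad=0):
    """Number of output positions along one axis."""
    return (n + 2 * pad - k) // stride + 1


def conv2d_strided(image, kernel, stride):
    """Unpadded convolution that moves the kernel `stride` pixels at a time."""
    k = len(kernel)
    rows = output_size(len(image), k, stride)
    cols = output_size(len(image[0]), k, stride)
    return [
        [
            sum(
                kernel[a][b] * image[i * stride + a][j * stride + b]
                for a in range(k)
                for b in range(k)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


strided = conv2d_strided(IMAGE, ONES, stride=2)
print(strided[0])                    # [31, 36, 31]
print(len(strided), len(strided[0]))  # 3 3, down from 5 x 5 with stride 1
```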


2D Convolution Multiple Input Channels

Image with $C_i$ channels:

1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6

∗

Weight filter with $C_i$ channels:

w11 w12 w13
w21 w22 w23
w31 w32 w33

The convolution operation can be easily extended to handle multiple input
channels.

The number of channels in the weight filter must equal the number of channels in the input.


Multiple Filters

Image with $C_i$ channels:

1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6

∗

$C_o$ weight filters:

w11 w12 w13
w21 w22 w23
w31 w32 w33

= output with $C_o$ channels

We can have multiple weight filters. The number of output channels will then be equal to the number of weight filters ($C_o$).
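
A minimal sketch of a convolution with several input channels and several filters follows. The sizes, the constant pixel and weight values, and the function name `conv2d_multi` are hypothetical; one bias per filter is assumed, which matches the `use_bias=True` default of Keras `Conv2D`.

```python
def conv2d_multi(image, filters, biases):
    """image: [H][W][C_in]; filters: [C_out][k][k][C_in]; returns [H-k+1][W-k+1][C_out]."""
    k = len(filters[0])
    c_in = len(image[0][0])
    output = []
    for i in range(len(image) - k + 1):
        row = []
        for j in range(len(image[0]) - k + 1):
            pixel = []
            for f, bias in zip(filters, biases):
                total = bias
                # Each filter sums over its spatial window AND over all input channels.
                for a in range(k):
                    for b in range(k):
                        for c in range(c_in):
                            total += f[a][b][c] * image[i + a][j + b][c]
                pixel.append(total)
            row.append(pixel)
        output.append(row)
    return output


# Hypothetical sizes: a 5 x 5 image with C_i = 3 channels and C_o = 2 filters of size 3 x 3.
H, W, C_IN, C_OUT, K = 5, 5, 3, 2, 3
image = [[[1] * C_IN for _ in range(W)] for _ in range(H)]
filters = [[[[f + 1] * C_IN for _ in range(K)] for _ in range(K)] for f in range(C_OUT)]
biases = [0] * C_OUT

output = conv2d_multi(image, filters, biases)
n_params = K * K * C_IN * C_OUT + C_OUT  # weights plus one bias per filter
print(len(output), len(output[0]), len(output[0][0]))  # 3 3 2
print(output[0][0], n_params)                          # [27, 54] 56
```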


Multiple Layers

[Figure: input tensor (labelled W1, C2) → Conv 3 × 3, Ch: C2 → Activation ‘ReLU’ → Conv 3 × 3, Ch: C3]

We can represent a 2D convolution in tensor representation. The input to the convolution is a tensor of size [B, H1, W1, C1] and, if padding is ‘same’ and the stride is 1, the output is a tensor of size [B, H1, W1, C2]. B is the batch size.

tf.keras.layers.Conv2D(
    filters, kernel_size, strides=(1, 1), padding='valid', data_format=None,
    dilation_rate=(1, 1), groups=1, activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, bias_constraint=None, **kwargs
)


Convolutions can be stacked one after the other.


Invariance to Translation

Image: Goodfellow, 2016

We would like our feature representations to change only minimally when the input is shifted or rotated slightly.


Pooling

Image: Goodfellow, 2016

Pooling helps reduce redundant information and provides some level of invariance to translations.

Pooling can use simple mathematical functions like max, sum, etc. “Max-pooling” is the most common.


2D Pooling

1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6

=
6 8 2
7

In this example we use 2D max pooling with a 2 × 2 window and stride 2.

Pooling is done on each channel separately.
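
The pooled values can be reproduced with a short single-channel sketch. The function name `max_pool2d` is illustrative; windows that would run past the image edge are dropped, which corresponds to 'valid' padding.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]


def max_pool2d(image, size=2, stride=2):
    """Take the maximum of each size x size window, moving `stride` pixels at a time."""
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


pooled = max_pool2d(IMAGE)
for row in pooled:
    print(row)
# [6, 8, 2]
# [7, 8, 8]
# [8, 7, 8]
```

For an input with several channels, the same function would be applied to each channel on its own.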


Image: https://cs231n.github.io/convolutional-networks/


Pooling & Convolutions

[Figure: input tensor (labelled W1, C2) → Conv 3 × 3, Ch: C2 → Activation + Pool → Conv 3 × 3, Ch: C3]

Convolutions can be combined with pooling to construct a chain of layers.

tf.keras.layers.MaxPool2D(
    pool_size=(2, 2), strides=None, padding='valid', data_format=None, **kwargs
)


LeNet Architecture
“LeNet is a classic example of convolutional neural network to successfully predict
handwritten digits.” [LeNet]

import tensorflow as tf
from tensorflow.keras.layers import AveragePooling2D, Conv2D, Dense, Flatten

input_shape = (32, 32, 1)  # assumption: 32 x 32 grayscale images, as in the original LeNet-5

model = tf.keras.Sequential()
model.add(Conv2D(6, kernel_size=(5, 5), strides=(1, 1), activation='tanh', input_shape=input_shape, padding='valid'))
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
model.add(Conv2D(16, kernel_size=(5, 5), strides=(1, 1), activation='tanh', padding='valid'))
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
model.add(Flatten())
model.add(Dense(120, activation='tanh'))
model.add(Dense(84, activation='tanh'))
model.add(Dense(10, activation='softmax'))
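
The tensor sizes and parameter counts of this model can be traced by hand. The sketch below does the arithmetic without TensorFlow; it assumes a 32 × 32 single-channel input (the slide leaves `input_shape` undefined), and the helper name `conv_out` is illustrative.

```python
def conv_out(n, k, stride=1):
    """Output length along one axis for an unpadded ('valid') window of size k."""
    return (n - k) // stride + 1


size, channels = 32, 1   # assumption: 32 x 32 grayscale input
params = 0

for n_filters, kernel in [(6, 5), (16, 5)]:
    params += kernel * kernel * channels * n_filters + n_filters  # Conv2D weights + biases
    size = conv_out(size, kernel)                                 # 5 x 5 'valid' convolution
    size = conv_out(size, 2, stride=2)                            # 2 x 2 average pooling, stride 2
    channels = n_filters

flat = size * size * channels   # length of the vector produced by Flatten
units = flat
for width in [120, 84, 10]:
    params += units * width + width                               # Dense weights + biases
    units = width

print(size, channels, flat)  # 5 16 400
print(params)                # 61706
```

Under the same input assumption, `model.summary()` on the Keras model should report the same total.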


https://ieeexplore.ieee.org/abstract/document/726791


Strided Convolutions

Image: Goodfellow, 2016

Convolution + pooling can be replaced with strided convolutions.


Receptive Field

Assume 3 by 3 convolutions in each layer

The receptive field of a feature in a convolutional network is defined as the size of the region in the input that produces the feature.

Receptive field increases with the number of layers (depth).
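
The growth of the receptive field with depth can be sketched with the usual recurrence: each layer adds (kernel − 1) steps, scaled by the product of the strides of the layers below it. The function name `receptive_field` and the layer lists are illustrative.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # this layer widens the field by (k - 1) input steps
        jump *= stride                # striding spreads the taps of later layers apart
    return field


for depth in range(1, 4):
    print(depth, receptive_field([(3, 1)] * depth))
# 1 3
# 2 5
# 3 7
```

With 3 × 3 convolutions and stride 1, each extra layer widens the receptive field by 2 pixels along each axis.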

Dilated Convolutions

What if you want to increase the receptive field without having so many layers?

Dilated convolutions deliver a wider field of view (receptive field) at the same computational cost. They are also known as atrous convolutions.
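
A minimal 1D sketch shows the idea: the same three weights cover a wider span of the input when their taps are spaced apart. The function name `dilated_conv1d` and the signal values are illustrative, and no padding is applied.

```python
def dilated_conv1d(signal, weights, dilation=1):
    """Unpadded 1D convolution (no kernel flip) whose taps are `dilation` samples apart."""
    span = dilation * (len(weights) - 1) + 1  # input samples covered by one output
    return [
        sum(w * signal[i + t * dilation] for t, w in enumerate(weights))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
weights = [1, 1, 1]  # the same three weights in both cases

dense = dilated_conv1d(signal, weights, dilation=1)    # each output spans 3 inputs
dilated = dilated_conv1d(signal, weights, dilation=2)  # each output spans 5 inputs
print(dense)    # [6, 9, 12, 15, 18]
print(dilated)  # [9, 12, 15]
```

Each output still costs three multiplications, but with dilation 2 it depends on inputs that are five samples apart end to end.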


Transpose Convolution
Convolution + pooling (or strided convolutions) are usually used to reduce the output tensor
width and height in subsequent layers of a network.

What if you want to increase the output tensor width and height (e.g. for image segmentation)? You can use transposed convolution, also known as deconvolution.

Image: https://github.com/vdumoulin/conv_arithmetic
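
A minimal 1D sketch of a transposed convolution follows: every input value adds a scaled copy of the kernel to the output, and the stride controls how far apart those copies start. The function name `conv_transpose1d` and the values are illustrative, and no padding or cropping is applied.

```python
def conv_transpose1d(inputs, weights, stride=1):
    """Scatter a scaled copy of the kernel into the output for every input value."""
    out_len = (len(inputs) - 1) * stride + len(weights)
    output = [0] * out_len
    for i, x in enumerate(inputs):
        for t, w in enumerate(weights):
            output[i * stride + t] += x * w
    return output


upsampled = conv_transpose1d([1, 2, 3], [1, 1, 1], stride=2)
print(upsampled)       # [1, 1, 3, 2, 5, 3, 3]
print(len(upsampled))  # 7: longer than the 3-element input
```

Where neighbouring copies overlap (positions 2 and 4 here), their contributions are summed.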


Feature Maps

Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future

Visualizing and Understanding Deep Neural Networks by Matt Zeiler


https://arxiv.org/pdf/2001.07092.pdf

Gabor Kernels

Image: Goodfellow, 2016

Gabor functions with a variety of parameter settings.


Gabor-like Learned Kernels

Image: Goodfellow, 2016

Many machine learning algorithms learn features that detect edges or specific colors of edges when applied to natural images. These feature detectors are reminiscent of the Gabor functions known to be present in the primary visual cortex.


Summary

1 Main components of CNN and why they work.
2 Extensions of the basic convolution operation.

Lab: Experiment with different components of feed forward neural networks. Why do they work?

Next week:
1 Popular Convolutional neural network architectures.
