Deep Learning – COSC2779 – Convolutional Neural Networks
Deep Learning – COSC2779
Convolutional Neural Networks
Dr. Ruwan Tennakoon
August 9, 2021
Reference: Chapter 9 of Ian Goodfellow et al., “Deep Learning”, MIT Press, 2016.
Lecture 4 (Part 1) Deep Learning – COSC2779 August 9, 2021 1 / 42
Outline
1 Motivation
2 2D Convolution in Traditional Computer Vision
3 Basic Convolution Operation
4 Pooling Operation
5 Variants of the Basic Convolution
6 The Neuro-scientific Basis for Convolutional Networks
Machine Learning
The Task can be expressed as an unknown
target function: y = f(x)
ML finds a Hypothesis (model), h (·), from
hypothesis space H, which approximates
the unknown target function.
ŷ = h∗ (x) ≈ f (x)
The Experience is typically a data set, D,
of values
The Performance is typically a numerical
measure that determines how well the
hypothesis matches the experience.
Nearly all machine learning algorithms can
be described with the following fairly
simple recipe:
Dataset
Cost function (Objective, loss)
Model
Optimization procedure
This week: we discuss another
representation of the hypothesis (model).
Neural Network Resurgence
The ImageNet Large Scale Visual
Recognition Challenge (ILSVRC)
was an annual competition held
between 2010 and 2017.
The datasets comprised
approximately 1 million images
and 1,000 object classes.
The annual challenge focused on
multiple tasks for image
classification.
Image source: ImageNet
Alex Krizhevsky et al., in “ImageNet Classification with Deep Convolutional Neural Networks”,
developed a convolutional neural network that achieved top results on the ILSVRC-2010 and
ILSVRC-2012 image classification tasks.
Neural Network Resurgence
AlexNet:
[Figure: AlexNet architecture, Convolution + Pooling stages followed by an MLP.]
Image: ImageNet Classification with Deep Convolutional Neural Networks
Convolutions allow the network to have lots of neurons while keeping the
number of parameters that need to be learned fairly small.
Convolutions have become very popular for a variety of tasks, including vision,
NLP and speech processing.
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Objective
Understand the importance of convolutional neural networks.
Understand variants of the basic convolution operation.
Learning Hierarchical Representations
[Figure: three pipelines labelled Traditional, Unsupervised mid and Deep Learning, built from Feature Extractor, Mid-level Features and Trainable Classifier blocks.]
Handcrafted features are time consuming to design, brittle and not scalable in
practice. Deep learning learns the underlying features directly from data.
Apply Neural Networks to Images
[Figure: the image pixels p1 … p5 are flattened and fed to a fully connected network that predicts ŷ.]
Each feature in the first hidden layer has a connection to each pixel.
In computer vision, features are usually local in space and the same operation is
applied across different locations.
Spatial Relationship
Natural Language Processing (NLP):
Image: The Unreasonable Effectiveness of Recurrent Neural Networks
Speech Recognition (Voice to text):
“We have one hour before our appointment with the real estate agent.”
“There is no right way to write a great novel”
Not all tasks have such relationships, e.g. predicting house prices using
attributes like “number of rooms”, “suburb”, etc.
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Apply Neural Networks to Images
In our ML model we would like to have these two properties.
Feature extraction usually happens locally – sparse connectivity.
In feature extraction the same operation is applied at different locations – parameter
sharing.
[Figure: inputs p1 … p5 and hidden units h1 … h5.]
Sparse Connectivity
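[Figure: two networks over inputs p1 … p5 and hidden units h1 … h5.
Fully connected: every hidden unit is connected to every input.
Sparsely connected: each hidden unit is connected only to a few neighbouring inputs.]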
Parameter Sharing
[Figure: two networks over inputs p1 … p5 and hidden units h1 … h5.
Sparse connected: every connection has its own weight (w1,1, w1,2, w2,1, w2,2, w2,3, w3,2, w3,3, w3,4, w4,3, w4,4, w4,5, w5,4, w5,5).
Shared weights: the same three weights w1, w2, w3 are reused at every location.]
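The two properties can be checked numerically. The following is a minimal sketch, assuming each hidden unit looks at three neighbouring inputs (as in the figure) and that inputs outside p1 … p5 contribute nothing. The function names and the weight values are illustrative and not part of the lecture.

```python
def fully_connected(p, W):
    """h[i] = sum_j W[i][j] * p[j]; every hidden unit sees every input."""
    return [sum(w * x for w, x in zip(row, p)) for row in W]


def banded_weights(n, w):
    """Weight matrix implied by sparse connectivity plus parameter sharing:
    row i holds w[0], w[1], w[2] at columns i-1, i, i+1 and zeros elsewhere."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, weight in enumerate(w):
            j = i + k - 1
            if 0 <= j < n:
                W[i][j] = weight
    return W


def shared_weight_layer(p, w):
    """Same computation without the matrix: slide w over p
    (inputs outside the range are skipped, i.e. treated as 0)."""
    n = len(p)
    out = []
    for i in range(n):
        total = 0.0
        for k, weight in enumerate(w):
            j = i + k - 1
            if 0 <= j < n:
                total += weight * p[j]
        out.append(total)
    return out


p = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative inputs p1..p5
w = [0.5, 1.0, -0.5]            # illustrative shared weights w1, w2, w3

W = banded_weights(len(p), w)
nonzero = sum(1 for row in W for v in row if v != 0.0)

print("fully connected weights:", len(p) * len(p))    # 25
print("sparse connections:", nonzero)                  # 13
print("shared weights:", len(w))                       # 3
print(shared_weight_layer(p, w) == fully_connected(p, W))
```

The banded matrix makes the point explicit: the shared-weight layer is still a linear layer, but only 3 of its 25 possible weights are free parameters.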
1D Convolution
Both ideas, sparse connectivity and parameter sharing, can be achieved with convolutions.
Convolutions can also be arranged in a hierarchy, where each layer acts on the features extracted by the
layer below.
[Figure: inputs p1 … p5, first-layer features h(1)1 … h(1)5 and second-layer features h(2)1 … h(2)5.]
Note that this network has only 6 weights and 2 biases, compared to (25 + 5) + (25 + 5) = 60 parameters
in a fully connected network with the same number of neurons.
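The parameter counts above can be reproduced with a short sketch. It assumes a filter of width 3, zero padding at both ends so that each layer keeps five units, and no activation function (omitted for brevity). All names and numeric values are illustrative.

```python
def conv1d_same(x, w, b):
    """1D layer with a 3-wide shared filter w, one bias b and zero padding,
    so the output has as many units as the input."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(w[k] * padded[i + k] for k in range(3)) + b for i in range(len(x))]


def count_conv_params(num_layers, kernel_width=3):
    """Each layer shares one filter (kernel_width weights) and one bias."""
    return num_layers * kernel_width, num_layers


def count_dense_params(layer_sizes):
    """Fully connected: n_in * n_out weights plus n_out biases per layer."""
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases


p = [1.0, 2.0, 3.0, 4.0, 5.0]
h1 = conv1d_same(p, w=[1.0, 1.0, 1.0], b=0.0)    # first layer acts on the input
h2 = conv1d_same(h1, w=[1.0, 1.0, 1.0], b=0.0)   # second layer acts on h1

print(h1)
print(h2)
print(count_conv_params(2))             # (weights, biases) for the convolutional stack
print(count_dense_params([5, 5, 5]))    # (weights, biases) for the dense equivalent
```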
2D Convolution in Traditional Computer Vision
[Figure: traditional pipeline, Feature Extractor followed by Trainable Classifier.]
Extracting Edge Features
The Sobel operator is human engineered.
$$I_o(i, j) = \sum_{p=-1}^{1} \sum_{q=-1}^{1} w_{p,q} \times I_{in}(i - p, j - q)$$
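A direct transcription of this sum may help. The sketch below assumes one common form of the horizontal Sobel kernel and a small synthetic image with a vertical edge; both are illustrative choices, as is the function name. The output is computed only where the 3 × 3 window fits inside the image.

```python
def convolve2d(image, w):
    """I_o(i, j) = sum over p, q in {-1, 0, 1} of w[p][q] * I_in(i - p, j - q),
    evaluated only where the 3x3 window fits inside the image."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(1, rows - 1):
        out_row = []
        for j in range(1, cols - 1):
            total = 0
            for p in (-1, 0, 1):
                for q in (-1, 0, 1):
                    # w is stored as a 3x3 list, so indices p, q are shifted by +1
                    total += w[p + 1][q + 1] * image[i - p][j - q]
            out_row.append(total)
        out.append(out_row)
    return out


# Horizontal-gradient Sobel kernel: rows are p = -1, 0, 1 and columns are q = -1, 0, 1
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Illustrative 4x6 image: dark (0) on the left, bright (10) on the right
image = [[0, 0, 0, 10, 10, 10] for _ in range(4)]

for row in convolve2d(image, sobel_x):
    print(row)   # non-zero only around the vertical edge
```

Because the formula indexes the image at (i − p, j − q), the kernel is effectively flipped, so a dark-to-bright edge gives a negative response with this kernel. The response is zero in flat regions.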
2D Convolution
Image (7 × 7):
1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6
∗
Weight filter (3 × 3):
w11 w12 w13
w21 w22 w23
w31 w32 w33
=
Output (first three entries of the top row):
31 46 36
The weights [w11, w12, · · · , w33] are learned from data. For this example, let's
assume all weights are 1: [w11, w12, · · · , w33] = [1, 1, · · · , 1].
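The output values on this slide can be reproduced with a small sliding-window routine. This is a sketch with illustrative names. It slides the filter without flipping it, which is how deep learning libraries usually compute a "convolution". For the all-ones filter used here, flipping makes no difference.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]
ONES = [[1, 1, 1] for _ in range(3)]   # all weights set to 1, as on the slide


def conv2d_valid(image, kernel):
    """Slide the kernel over every position where it fits entirely
    inside the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[p][q] * image[i + p][j + q]
                for p in range(kh) for q in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


out = conv2d_valid(IMAGE, ONES)
print(len(out), len(out[0]))   # a 7x7 input and a 3x3 filter give a 5x5 output
print(out[0][:3])              # [31, 46, 36]
```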
Padding
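Zero-padded image (9 × 9):
0 0 0 0 0 0 0 0 0
0 1 6 2 8 1 2 7 0
0 1 6 2 8 1 2 5 0
0 0 5 8 1 5 7 1 0
0 1 7 1 3 5 8 0 0
0 5 2 4 4 5 8 4 0
0 8 2 3 7 3 8 2 0
0 1 2 3 6 5 9 6 0
0 0 0 0 0 0 0 0 0
∗
Weight filter (3 × 3):
w11 w12 w13
w21 w22 w23
w31 w32 w33
=
Output (selected entries): 14 in the top-left corner; 31 46 36 in the second row, starting at the second column.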
The weights [w11, w12, · · · , w33] are learned from data. For this example, let's
assume all weights are 1: [w11, w12, · · · , w33] = [1, 1, · · · , 1].
Padding makes it possible to get an output that is the same size as the input. The
most common type of padding is zero padding.
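As a sketch of the padding example, the image can be surrounded by one ring of zeros before the same sliding-window sum is applied. The function names are illustrative. The all-ones filter from the slide is assumed.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]
ONES = [[1, 1, 1] for _ in range(3)]


def zero_pad(image, pad=1):
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    bottom = [[0] * width for _ in range(pad)]
    body = [[0] * pad + list(row) + [0] * pad for row in image]
    return top + body + bottom


def conv2d_valid(image, kernel):
    """Sum of element-wise products at every position where the kernel fits."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[p][q] * image[i + p][j + q]
                for p in range(kh) for q in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


padded = zero_pad(IMAGE, pad=1)
out = conv2d_valid(padded, ONES)

print(len(padded), len(padded[0]))   # 9 9
print(len(out), len(out[0]))         # 7 7, the same size as the input
print(out[0][0])                     # 14
print(out[1][1:4])                   # [31, 46, 36]
```

For a k × k filter with odd k, padding by (k − 1) / 2 on each side keeps the output the same size as the input at stride 1.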
Stride
1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6
∗
w11 w12 w13
w21 w22 w23
w31 w32 w33
=
31 36
The weights [w11, w12, · · · , w33] are learned from data. For this example, let's
assume all weights are 1: [w11, w12, · · · , w33] = [1, 1, · · · , 1]. A stride of 2 is
used.
Striding downsizes the output by a factor of roughly the stride value along each spatial dimension.
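The effect of the stride on the output size can be sketched as follows. The size formula floor((n + 2·pad − k) / stride) + 1 is the standard one for a window of size k. The function names are illustrative and the all-ones filter from the slide is assumed.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]
ONES = [[1, 1, 1] for _ in range(3)]


def output_size(n, k, stride=1, pad=0):
    """Number of window positions along one dimension."""
    return (n + 2 * pad - k) // stride + 1


def conv2d_strided(image, kernel, stride=1):
    """Unpadded sliding-window sum where the window moves `stride` pixels per step."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = output_size(len(image), kh, stride)
    out_w = output_size(len(image[0]), kw, stride)
    return [
        [
            sum(kernel[p][q] * image[i * stride + p][j * stride + q]
                for p in range(kh) for q in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


out = conv2d_strided(IMAGE, ONES, stride=2)
print(len(out), len(out[0]))   # 3 3 instead of 5 5
print(out[0][:2])              # [31, 36]
```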
2D Convolution Multiple Input Channels
Image (Ci channels; one channel shown):
1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6
∗
Weight filter (Ci channels; one channel shown):
w11 w12 w13
w21 w22 w23
w31 w32 w33
=
The convolution operation can be easily extended to handle multiple input
channels.
The number of channels in the weight filter must equal the number of
channels in the input.
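A minimal sketch of the multi-channel case: the products are summed over the channel axis as well as over the window, so one filter still produces a single output channel. The channels-last layout, the tiny two-channel image and the weight values are assumptions made for illustration.

```python
def conv2d_multichannel(image, kernel):
    """image: [H][W][C_i], kernel: [kh][kw][C_i].
    Returns a single-channel output of size [H - kh + 1][W - kw + 1]."""
    kh, kw, channels = len(kernel), len(kernel[0]), len(kernel[0][0])
    if len(image[0][0]) != channels:
        raise ValueError("filter and input must have the same number of channels")
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[p][q][c] * image[i + p][j + q][c]
                for p in range(kh) for q in range(kw) for c in range(channels))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Channel 0 is the top-left 3x4 corner of the slide's image; channel 1 is all ones
channel0 = [[1, 6, 2, 8], [1, 6, 2, 8], [0, 5, 8, 1]]
image = [[[v, 1] for v in row] for row in channel0]

# Weight 1 everywhere on channel 0 and weight 2 everywhere on channel 1
kernel = [[[1, 2] for _ in range(3)] for _ in range(3)]

print(conv2d_multichannel(image, kernel))   # [[49, 64]]
```

Each output value is the channel-0 window sum (31 and 46, as before) plus 2 × 9 from the all-ones second channel.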
Multiple Filters
Image (Ci channels; one channel shown):
1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6
∗
w11 w12 w13
w21 w22 w23
w31 w32 w33
(Co weight filters; one shown)
=
Output with Co channels
We can have multiple weight filters. The number of output channels will then
be equal to the number of weight filters (Co).
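The relation between the number of filters and the number of output channels can be sketched like this. The channels-last layout, the two example filters and the helper names are illustrative assumptions. The parameter count assumes one bias per filter.

```python
def conv2d_filters(image, filters):
    """image: [H][W][C_i]; filters: list of C_o kernels, each [kh][kw][C_i].
    Returns [H'][W'][C_o]: one output channel per filter."""
    kh, kw = len(filters[0]), len(filters[0][0])
    c_in = len(image[0][0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    def response(kernel, i, j):
        return sum(kernel[p][q][c] * image[i + p][j + q][c]
                   for p in range(kh) for q in range(kw) for c in range(c_in))

    return [[[response(k, i, j) for k in filters]
             for j in range(out_w)] for i in range(out_h)]


def count_params(kh, kw, c_in, c_out, use_bias=True):
    """Weights of all filters plus one bias per filter."""
    return kh * kw * c_in * c_out + (c_out if use_bias else 0)


channel0 = [[1, 6, 2, 8], [1, 6, 2, 8], [0, 5, 8, 1]]
image = [[[v, 1] for v in row] for row in channel0]          # 3 x 4 x 2 input

filter_a = [[[1, 2] for _ in range(3)] for _ in range(3)]    # weighted sum of both channels
filter_b = [[[0, 0] for _ in range(3)] for _ in range(3)]
filter_b[1][1][0] = 1                                        # copies the centre pixel of channel 0

out = conv2d_filters(image, [filter_a, filter_b])
print(out)                        # [[[49, 6], [64, 2]]]
print(len(out[0][0]))             # 2 output channels, one per filter
print(count_params(3, 3, 2, 2))   # 38
```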
Multiple Layers
[Figure: input tensor H1 × W1 × C1 → Conv 3 × 3 (Ch: C2) → Activation ‘ReLU’ → Conv 3 × 3 (Ch: C3).]
We can represent a 2D convolution in tensor form. The input to the convolution is
a tensor of size [B, H1, W1, C1] and the output is a tensor of size [B, H1, W1, C2] (if padding is
‘same’ and the stride is 1). B is the batch size.
tf.keras.layers.Conv2D(
    filters, kernel_size, strides=(1, 1), padding='valid', data_format=None,
    dilation_rate=(1, 1), groups=1, activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, bias_constraint=None, **kwargs
)
Convolutions can be stacked one after the other.
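The tensor shapes through a stack of convolutions can be traced without running a network. The helper below is a sketch, not part of Keras. It assumes the usual rules for the two padding modes: 'same' gives ceil(n / stride) and 'valid' gives floor((n − k) / stride) + 1. The input shape is hypothetical.

```python
import math


def conv2d_output_shape(input_shape, filters, kernel_size, stride=1, padding="valid"):
    """Shape [B, H, W, C] after a square-kernel 2D convolution (channels last)."""
    batch, height, width, _ = input_shape

    def spatial(n):
        if padding == "same":
            return math.ceil(n / stride)
        if padding == "valid":
            return (n - kernel_size) // stride + 1
        raise ValueError("padding must be 'same' or 'valid'")

    # The number of output channels equals the number of filters
    return (batch, spatial(height), spatial(width), filters)


shape = (8, 32, 32, 3)    # hypothetical [B, H1, W1, C1]
first = conv2d_output_shape(shape, filters=16, kernel_size=3, padding="same")
second = conv2d_output_shape(first, filters=32, kernel_size=3, padding="valid")

print(first)    # (8, 32, 32, 16): 'same' padding keeps H and W
print(second)   # (8, 30, 30, 32): 'valid' padding trims the border
```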
Invariance to Translation
Image: Goodfellow, 2016
We would like our feature representations to change only minimally when the input is shifted or
rotated slightly.
Pooling
Image: Goodfellow, 2016
Pooling helps reduce redundant information and provides some level of invariance to
translations.
Pooling can use simple mathematical functions like max, sum, etc. “Max-pooling” is
the most common.
2D Pooling
1 6 2 8 1 2 7
1 6 2 8 1 2 5
0 5 8 1 5 7 1
1 7 1 3 5 8 0
5 2 4 4 5 8 4
8 2 3 7 3 8 2
1 2 3 6 5 9 6
=
Output (first entries):
6 8 2
7
In this example we use 2D max pooling with a 2 × 2 window and stride 2.
Pooling is done on each channel separately.
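The pooled values can be reproduced with a few lines. This sketch handles a single channel with no padding, so the last row and column of the 7 × 7 image are dropped. The function name is illustrative.

```python
IMAGE = [
    [1, 6, 2, 8, 1, 2, 7],
    [1, 6, 2, 8, 1, 2, 5],
    [0, 5, 8, 1, 5, 7, 1],
    [1, 7, 1, 3, 5, 8, 0],
    [5, 2, 4, 4, 5, 8, 4],
    [8, 2, 3, 7, 3, 8, 2],
    [1, 2, 3, 6, 5, 9, 6],
]


def max_pool2d(image, size=2, stride=2):
    """Take the maximum of each size x size window, moving `stride` pixels per step.
    Windows that do not fit entirely inside the image are skipped."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(image[i * stride + p][j * stride + q]
                for p in range(size) for q in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


pooled = max_pool2d(IMAGE)
for row in pooled:
    print(row)
```

The first row of the result is 6 8 2 and the second row starts with 7, matching the slide. For a multi-channel input, the same function would be applied to every channel independently.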
2D Pooling
Image: https://cs231n.github.io/convolutional-networks/
Pooling & Convolutions
[Figure: input tensor H1 × W1 × C1 → Conv 3 × 3 (Ch: C2) → Activation + Pool → Conv 3 × 3 (Ch: C3).]
Convolutions can be combined with pooling to construct a chain of layers.
tf.keras.layers.MaxPool2D(
    pool_size=(2, 2), strides=None, padding='valid', data_format=None, **kwargs
)
LeNet Architecture
“LeNet is a classic example of convolutional neural network to successfully predict
handwritten digits.” [LeNet]
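import tensorflow as tf
from tensorflow.keras.layers import AveragePooling2D, Conv2D, Dense, Flatten

input_shape = (32, 32, 1)  # assumption: 32 x 32 grayscale digits; not stated on the slide

model = tf.keras.Sequential()
# Two convolution + average-pooling stages extract the features
model.add(Conv2D(6, kernel_size=(5, 5), strides=(1, 1), activation='tanh', input_shape=input_shape, padding='valid'))
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
model.add(Conv2D(16, kernel_size=(5, 5), strides=(1, 1), activation='tanh', padding='valid'))
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
# Fully connected classifier on the flattened features, with 10 output classes
model.add(Flatten())
model.add(Dense(120, activation='tanh'))
model.add(Dense(84, activation='tanh'))
model.add(Dense(10, activation='softmax'))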
https://ieeexplore.ieee.org/abstract/document/726791
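To see how the tensor shrinks through the LeNet layers listed above, the shapes and parameter counts can be traced by hand. This sketch does not use Keras. It assumes a 32 × 32 × 1 input, which the slide does not state, and the helper names are illustrative.

```python
def conv2d_valid(shape, filters, kernel):
    """Output shape and parameter count of a 'valid', stride-1 convolution."""
    height, width, channels = shape
    params = kernel * kernel * channels * filters + filters   # weights + biases
    return (height - kernel + 1, width - kernel + 1, filters), params


def avg_pool(shape, size=2):
    """Non-overlapping pooling shrinks height and width; it has no parameters."""
    height, width, channels = shape
    return (height // size, width // size, channels), 0


def dense(units_in, units_out):
    """Fully connected layer: one weight per input-output pair plus biases."""
    return units_out, units_in * units_out + units_out


shape = (32, 32, 1)   # assumed input_shape; not given on the slide
total = 0
trace = []

for kind, filters in [("conv", 6), ("pool", None), ("conv", 16), ("pool", None)]:
    if kind == "conv":
        shape, n = conv2d_valid(shape, filters, kernel=5)
    else:
        shape, n = avg_pool(shape)
    total += n
    trace.append((kind, shape, n))

units = shape[0] * shape[1] * shape[2]   # Flatten
for width in (120, 84, 10):
    units, n = dense(units, width)
    total += n
    trace.append(("dense", units, n))

for entry in trace:
    print(entry)
print("total parameters:", total)
```

Under this assumption the feature maps go 28 × 28 × 6, 14 × 14 × 6, 10 × 10 × 16 and 5 × 5 × 16. Most of the parameters sit in the first dense layer rather than in the convolutions.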
Strided Convolutions
Image: Goodfellow, 2016
Convolution + pooling can be replaced with strided convolutions.
Receptive Field
Assume 3 by 3 convolutions in each layer
The receptive field of a feature in a convolutional network is defined as the size of the region
in the input that produces that feature.
Receptive field increases with the number of layers (depth).
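The growth of the receptive field with depth can be computed with a short recurrence. This is a sketch along one spatial dimension: each layer adds (kernel size − 1) times the current spacing between neighbouring features, and the spacing is multiplied by the layer's stride. The function name is illustrative.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from the input upwards.
    Returns the receptive field size along one dimension."""
    size, jump = 1, 1
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump   # each layer widens the field
        jump *= stride                     # strides spread neighbouring features apart
    return size


# 3 x 3 convolutions with stride 1, as assumed on the slide
for depth in range(1, 5):
    print(depth, receptive_field([(3, 1)] * depth))   # 3, 5, 7, 9

# A 2 x 2, stride-2 pooling layer between two convolutions
print(receptive_field([(3, 1), (2, 2), (3, 1)]))      # 8
```

With stride-1 3 × 3 layers the receptive field grows by only 2 pixels per layer. Pooling and strides make it grow faster.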
Dilated Convolutions
What if you want to increase the receptive field without having so many layers?
Dilated convolutions deliver a wider field of view (receptive field) at the same
computational cost. They are also known as atrous convolutions.
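A 1D sketch of the idea: the filter keeps the same number of weights, but its taps are spread `dilation` positions apart, so it covers k + (k − 1)(dilation − 1) inputs. The function names and the toy input are illustrative.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent of the input covered by a dilated filter."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(x, w, dilation=1):
    """Unpadded 1D sliding-window sum where the filter taps are `dilation` apart."""
    span = effective_kernel_size(len(w), dilation)
    return [
        sum(w[k] * x[i + k * dilation] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


x = list(range(10))   # toy input 0..9
w = [1, 1, 1]         # three weights in both cases

print(effective_kernel_size(3, 1), dilated_conv1d(x, w, dilation=1))
print(effective_kernel_size(3, 2), dilated_conv1d(x, w, dilation=2))
```

With dilation 2, each output still costs three multiplications but sees five consecutive input positions.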
Transpose Convolution
Convolution + pooling (or strided convolutions) are usually used to reduce the output tensor
width and height in subsequent layers of a network.
What if you want to increase the output tensor width and height? You can use a transpose
convolution, also known as deconvolution (used e.g. in image segmentation).
Image: https://github.com/vdumoulin/conv_arithmetic
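How a transpose convolution enlarges its input can be sketched in 1D: every input value adds a scaled copy of the filter to the output, and consecutive copies are placed `stride` positions apart. With no padding the output length is (n − 1) · stride + k. The function name and the values are illustrative.

```python
def conv_transpose1d(x, w, stride=2):
    """Scatter a copy of w, scaled by each input value, into the output.
    Overlapping copies are summed."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, value in enumerate(x):
        for k, weight in enumerate(w):
            out[i * stride + k] += value * weight
    return out


x = [1, 2, 3]
w = [1, 1, 1]

up = conv_transpose1d(x, w, stride=2)
print(len(x), "->", len(up))   # 3 -> 7
print(up)                      # [1, 1, 3, 2, 5, 3, 3]
```

The positions where neighbouring copies overlap (values 3 and 5 here) receive contributions from two inputs.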
Feature Maps
Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future
Visualizing and Understanding Deep Neural Networks by Matt Zeiler
https://arxiv.org/pdf/2001.07092.pdf
Gabor Kernels
Image: Goodfellow, 2016
Gabor functions with a variety of parameter settings.
Gabor-like Learned Kernels
Image: Goodfellow, 2016
Many machine learning algorithms learn features that detect edges or specific colors of edges
when applied to natural images. These feature detectors are reminiscent of the Gabor
functions known to be present in the primary visual cortex.
Summary
1 Main components of CNN and why they work.
2 Extensions of the basic convolution operation.
Lab: Experiment with different components of feed forward neural networks.
Why do they work?
Next week:
1 Popular Convolutional neural network architectures.