CSC 311: Introduction to Machine Learning
Lecture 11 – Convolutional Neural Networks
Anthony Bonner &
Slides by , Amir-massoud Farahmand, and
Intro ML (UofT) CSC311-Lec11 1 / 1
Neural Nets for Visual Object Recognition
People are very good at recognizing shapes
  - Intrinsically difficult, computers are bad at it
Why is it difficult?
CSC411 Lec11 2 / 43
Why is it a Problem?
Difficult scene conditions
[From: Grauman & Leibe]
Why is it a Problem?
Huge within-class variations. Recognition is mainly about modeling variation.
[Pic from: S. Lazebnik]
Why is it a Problem?
Tons of classes
[Biederman]
Neural Nets for Object Recognition
People are very good at recognizing objects
  - Intrinsically difficult, computers are bad at it
Some reasons why it is difficult:
  - Segmentation: Real scenes are cluttered
  - Invariances: We are very good at ignoring all sorts of variations that do not affect class
  - Deformations: Natural object classes allow variations (faces, letters, ...)
  - A huge amount of computation is required
How to Deal with Large Input Spaces
How can we apply neural nets to images?
Images can have millions of pixels, i.e., x is very high dimensional
How many parameters do I have?
Prohibitive to have fully-connected layers
What can we do?
We can use a locally connected layer
Locally Connected Layer
Example: 200×200 image, 40K hidden units, filter size 10×10
40K hidden units × 100 weights per unit = 4M parameters
Note: This parameterization is good when input image is registered (e.g., face recognition). [Ranzato]
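The parameter counts in this example can be checked with a few lines of arithmetic. This is a minimal sketch that ignores bias terms (an assumption; the slide does not say whether biases are counted) and compares the locally connected layer with a fully connected layer of the same width.

```python
# Parameter counts for a 200x200 input with 40,000 hidden units (biases ignored).
num_inputs = 200 * 200          # pixels in the image
num_hidden = 40_000             # hidden units, as in the example
filter_size = 10 * 10           # each hidden unit sees one 10x10 patch

fully_connected_params = num_inputs * num_hidden      # every unit sees every pixel
locally_connected_params = num_hidden * filter_size   # every unit sees only its patch

print(f"fully connected:   {fully_connected_params:,}")    # 1,600,000,000
print(f"locally connected: {locally_connected_params:,}")  # 4,000,000
```

Local connectivity cuts the count by a factor of 400 here, but each hidden unit still has its own private weights.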
When Will this Work?
This is good when the input is (roughly) registered
General Images
The object can be anywhere
[Slide: Y. Zhu]
The Invariance Problem
Our perceptual systems are very good at dealing with invariances
  - translation, rotation, scaling
  - deformation, contrast, lighting
We are so good at this that it's hard to appreciate how difficult it is
  - It's one of the main difficulties in making computers perceive
  - We still don't have generally accepted solutions
Locally Connected Layer
Stationarity? Statistics are similar at different locations
The replicated feature approach
The red connections all have the same weight.
Adopt approach apparently used in monkey visual systems
Use many different copies of the same feature detector.
  - Copies have slightly different positions.
  - Could also replicate across scale and orientation.
  - Tricky and expensive
  - Replication reduces the number of free parameters to be learned.
Use several different feature types, each with its own replicated pool of detectors.
  - Allows each patch of image to be represented in several ways.
Convolutional Neural Net
Idea: statistics are similar at different locations (LeCun 1998)
Connect each hidden unit to a small input patch and share the weights across space
This is called a convolution layer and the network is a convolutional network
Convolutional Layer
$h^{n} = \max(0,\; h^{n-1} * w^{n})$
Zemel, Urtasun, Fidler (UofT) CSC 411: 11-Neural Networks II 17 / 55
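The layer equation h^n = max(0, h^{n-1} * w^n) can be sketched in NumPy for a single 2-D feature map and a single filter. This is an illustrative sketch, not the lecture's implementation: the names `conv2d_valid` and `conv_layer`, the 8×8 input, the 3×3 filter, and the choice to keep only positions where the filter fits entirely inside the input are all assumptions.

```python
import numpy as np

def conv2d_valid(h, w):
    """2-D convolution of h with kernel w, keeping only positions where w fits fully."""
    kh, kw = w.shape
    w_flipped = w[::-1, ::-1]   # true convolution flips the kernel in both axes
    out_h = h.shape[0] - kh + 1
    out_w = h.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(h[i:i + kh, j:j + kw] * w_flipped)
    return out

def conv_layer(h_prev, w):
    """h^n = max(0, h^{n-1} * w^n): convolve, then apply ReLU elementwise."""
    return np.maximum(0.0, conv2d_valid(h_prev, w))

rng = np.random.default_rng(0)
h_prev = rng.standard_normal((8, 8))   # hypothetical 8x8 input feature map
w = rng.standard_normal((3, 3))        # hypothetical 3x3 filter
h_next = conv_layer(h_prev, w)
print(h_next.shape)   # (6, 6)
```

The same nine weights are reused at all 36 output positions, which is the weight sharing that distinguishes this from a locally connected layer.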
Convolution
Convolution layers are named after the convolution operation. If a and b are two arrays,
$(a * b)_t = \sum_{\tau} a_{\tau}\, b_{t-\tau}.$
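The 1-D definition translates directly into a double loop. The sketch below assumes that entries outside the range of either array count as zero (the usual "full" convolution); the function name `conv1d` and the example arrays are made up for illustration. `np.convolve` computes the same quantity.

```python
import numpy as np

def conv1d(a, b):
    """Full 1-D convolution: (a * b)_t = sum over tau of a[tau] * b[t - tau]."""
    out = np.zeros(len(a) + len(b) - 1)
    for t in range(len(out)):
        for tau in range(len(a)):
            if 0 <= t - tau < len(b):   # terms outside b's range are treated as zero
                out[t] += a[tau] * b[t - tau]
    return out

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
result = conv1d(a, b)
print(result)               # [0.  1.  2.5 4.  1.5]
print(np.convolve(a, b))    # same values from NumPy's built-in
```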
Convolution
“Flip and Filter” interpretation:
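One way to read "flip and filter" in code: reverse one array, slide it across a zero-padded copy of the other, and take a dot product at each shift. The helper name `flip_and_filter` and the example arrays are hypothetical; the result should agree with `np.convolve`.

```python
import numpy as np

def flip_and_filter(a, b):
    """Convolve by flipping b, then sliding it across a zero-padded copy of a."""
    b_flipped = b[::-1]
    pad = len(b) - 1
    a_padded = np.concatenate([np.zeros(pad), a, np.zeros(pad)])
    # At each shift t, take the dot product of the flipped kernel with the window.
    return np.array([
        np.dot(a_padded[t:t + len(b)], b_flipped)
        for t in range(len(a) + len(b) - 1)
    ])

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
flipped_result = flip_and_filter(a, b)
print(flipped_result)   # [0.  1.  2.5 4.  1.5]
```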
2-D Convolution
2-D convolution is analogous:
$(A * B)_{ij} = \sum_{s} \sum_{t} A_{st}\, B_{i-s,\, j-t}.$
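The 2-D sum can be computed by noting that each entry A[s, t] adds a copy of B, scaled by A[s, t] and shifted by (s, t), to the output. The sketch below assumes out-of-range entries are zero; `conv2d_full` and the two small matrices are illustrative names and values.

```python
import numpy as np

def conv2d_full(A, B):
    """Full 2-D convolution: (A * B)_ij = sum_s sum_t A[s, t] * B[i - s, j - t]."""
    out = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1] + B.shape[1] - 1))
    for s in range(A.shape[0]):
        for t in range(A.shape[1]):
            # A[s, t] contributes a copy of B shifted by (s, t).
            out[s:s + B.shape[0], t:t + B.shape[1]] += A[s, t] * B
    return out

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
C = conv2d_full(A, B)
print(C)
# [[0. 1. 2.]
#  [1. 5. 4.]
#  [3. 4. 0.]]
```

Swapping the arguments gives the same output, since convolution is commutative.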
2-D Convolution
The thing we convolve by is called a kernel, or filter.
What does this convolution kernel do?
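To get a feel for what a kernel does, it helps to apply one to a toy image. The two kernels below (a 3×3 averaging kernel and a Sobel-style kernel) are assumptions chosen for illustration and are not necessarily the kernels pictured on the slides; `conv2d_valid` is a hypothetical helper that keeps only positions where the kernel fits inside the image.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution keeping only positions where the kernel fits in the image."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Hypothetical kernels chosen for illustration.
blur = np.full((3, 3), 1.0 / 9.0)              # averages each 3x3 neighbourhood
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [2.0, 0.0, -2.0],
                          [1.0, 0.0, -1.0]])   # Sobel-style horizontal difference

# Toy image: dark left half, bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

blurred = conv2d_valid(image, blur)
edges = conv2d_valid(image, vertical_edge)
print(blurred[0])   # approximately [0, 1/3, 2/3, 1]: the sharp step becomes a ramp
print(edges[0])     # [0. 4. 4. 0.]: nonzero only where the intensity changes
```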
Convolutional Layer
Learn multiple filters.
E.g.: 200×200 image, 100 filters, filter size 10×10
10K parameters
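The 10K figure follows from counting one set of weights per filter, ignoring biases (an assumption, as before):

```python
num_filters = 100
filter_height, filter_width = 10, 10

# Each filter is shared across all image locations, so the image size does not appear.
conv_params = num_filters * filter_height * filter_width
print(conv_params)   # 10000
```

Compared with the 4M parameters of the locally connected layer on the same 200×200 image, weight sharing reduces the count by a factor of 400 while still providing 100 different feature types.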
Convolutional Layer
Figure: Left: CNN, right: Each neuron computes a linear function followed by an activation function
Hyperparameters of a convolutional layer:
The number of filters (controls the depth of the output volume)
The stride: how many units apart do we apply a filter spatially (this controls the spatial size of the output volume)
The size w × h of the filters
[http://cs231n.github.io/convolutional-networks/]
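How these hyperparameters determine the output volume can be sketched as follows. The sketch assumes no zero padding, in which case a filter of size F applied with stride S to an input of size W fits at (W - F) / S + 1 positions along each spatial dimension; the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, filter_size, stride):
    """Number of filter positions along one spatial dimension, with no zero padding."""
    if (input_size - filter_size) % stride != 0:
        raise ValueError("filter does not tile the input evenly at this stride")
    return (input_size - filter_size) // stride + 1

# A 200x200 image with 10x10 filters, as in the earlier example.
print(conv_output_size(200, 10, stride=1))    # 191
print(conv_output_size(200, 10, stride=10))   # 20

# With 100 filters and stride 1, the output volume is 191 x 191 x 100.
num_filters = 100
side = conv_output_size(200, 10, stride=1)
output_volume = (side, side, num_filters)
print(output_volume)
```

The number of filters sets the depth, while the stride and filter size set the spatial extent.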
MLP vs ConvNet
Figure: Top: MLP, bottom: Convolutional neural network
[http://cs231n.github.io/convolutional-networks/]
Pooling Layer
By “pooling” (e.g., taking max) filter responses at different locations we gain robustness to the exact spatial location of features.
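A small 1-D illustration of this robustness: if a strong filter response moves by one position but stays inside the same pooling window, the pooled output does not change. The response values and the helper `max_pool_1d` are made up for the example.

```python
import numpy as np

def max_pool_1d(x, size):
    """Non-overlapping max pooling over windows of the given size."""
    return x.reshape(-1, size).max(axis=1)

# Hypothetical filter responses: one strong response at position 4.
responses = np.array([0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0])
shifted = np.roll(responses, 1)   # the same feature, moved one position to the right

print(max_pool_1d(responses, 4))   # [0. 9.]
print(max_pool_1d(shifted, 4))     # [0. 9.]
```

The robustness is only approximate: a shift that carries the response across a window boundary does change the pooled output.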
Pooling Options
Max Pooling: return the maximal argument
Average Pooling: return the average of the arguments
Other types of pooling exist.
Figure: Left: Pooling, right: max pooling example
Hyperparameters of a pooling layer:
The spatial extent F
The stride
[http://cs231n.github.io/convolutional-networks/]
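A pooling layer with spatial extent F and a given stride can be sketched with one function that takes the reduction as an argument, so the same loop covers max and average pooling. The function name `pool2d` and the 4×4 input are assumptions for illustration.

```python
import numpy as np

def pool2d(x, extent, stride, reduce_fn):
    """Apply reduce_fn (e.g. np.max or np.mean) to each extent x extent window."""
    out_h = (x.shape[0] - extent) // stride + 1
    out_w = (x.shape[1] - extent) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + extent,
                       j * stride:j * stride + extent]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1.0, 1.0, 2.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [3.0, 2.0, 1.0, 0.0],
              [1.0, 2.0, 3.0, 4.0]])

max_pooled = pool2d(x, extent=2, stride=2, reduce_fn=np.max)
avg_pooled = pool2d(x, extent=2, stride=2, reduce_fn=np.mean)
print(max_pooled)   # [[6. 8.] [3. 4.]]
print(avg_pooled)   # [[3.25 5.25] [2.   2.  ]]
```

With F = 2 and stride 2 the windows do not overlap and each spatial dimension is halved.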
Backpropagation with Weight Constraints
The backprop procedure from last lecture can be applied directly to conv nets.
This is covered in CSC421.
As a user, you don't need to worry about the details, since they're handled by automatic differentiation packages.
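For intuition about the weight constraint: when one weight is shared across many positions, its gradient is the sum of the gradients from every position where it is used. The sketch below is a hand-written 1-D example under several assumptions (a single shared filter applied without flipping or padding, a squared-error loss, made-up data), checked against finite differences. In practice an automatic differentiation package does this bookkeeping.

```python
import numpy as np

def forward(x, w):
    """Slide the shared filter w over x (no flipping, no padding)."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def loss(x, w, target):
    return 0.5 * np.sum((forward(x, w) - target) ** 2)

def grad_w(x, w, target):
    """Gradient of the loss w.r.t. the shared weights, summed over all positions."""
    k = len(w)
    error = forward(x, w) - target        # dL/dy at each output position
    grad = np.zeros_like(w)
    for i, e in enumerate(error):
        grad += e * x[i:i + k]            # every position contributes to the same w
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
w = rng.standard_normal(3)
target = rng.standard_normal(8)

analytic = grad_w(x, w, target)

# Finite-difference check of the analytic gradient.
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = eps
    numeric[j] = (loss(x, w + step, target) - loss(x, w - step, target)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```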
Here’s the LeNet architecture, which was applied to handwritten digit recognition on MNIST in 1998:
The architecture of LeNet5
ImageNet, biggest dataset for object classification: http://image-net.org/
1000 classes, 1.2M training images, 150K for test
AlexNet, 2012. 8 weight layers. 16.4% top-5 error (i.e. the network gets 5 tries to guess the right category).
The two processing pathways correspond to 2 GPUs. (At the time, the network couldn't fit on one GPU.)
AlexNet's stunning performance on the ILSVRC is what set off the deep learning boom of the last 6 years.
Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.
The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48.
(Krizhevsky et al., 2012)
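The top-5 error quoted for AlexNet counts a prediction as correct when the true class is among the five highest-scoring classes. A minimal sketch of the metric, with a made-up score matrix and a hypothetical helper name `top_k_error`:

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]       # indices of the k largest scores
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Hypothetical scores for 3 examples over 10 classes.
scores = np.array([
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0.5, 0.4, 0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.0],
])
labels = np.array([4, 0, 9])

top5 = top_k_error(scores, labels, k=5)   # only the first example is a top-5 hit
top1 = top_k_error(scores, labels, k=1)   # no example has its label ranked first
print(top5, top1)
```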
150 Layers!
Networks are now at 150 layers
They use skip connections of a special form
In fact, they don’t fit on this screen
Amazing performance!
A lot of “mistakes” are due to wrong ground-truth
[He, K., Zhang, X., Ren, S. and Sun, J., 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2016]
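In the cited paper, the skip connection adds a block's input back to the output of its layers, so the layers only need to learn a residual F(x). The sketch below uses small fully connected layers instead of convolutions to keep it short; that simplification, the weight shapes, and the function names are assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x), where F is a small two-layer transformation."""
    f = W2 @ relu(W1 @ x)    # residual branch F(x)
    return relu(f + x)       # identity skip connection adds the input back

rng = np.random.default_rng(0)
x = relu(rng.standard_normal(4))   # hypothetical nonnegative activations
W1 = rng.standard_normal((4, 4))
W2 = np.zeros((4, 4))              # with F(x) = 0 the block reduces to the identity

y = residual_block(x, W1, W2)
print(np.allclose(y, x))   # True
```

Because a block with zero residual weights passes its input through unchanged, stacking more blocks does not have to make the function harder to represent.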
Results: Object Classification
Slide: R. Liao, Paper: [He, K., Zhang, X., Ren, S. and Sun, J., 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2016]
Results: Object Detection
Slide: R. Liao, Paper: [He, K., Zhang, X., Ren, S. and Sun, J., 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2016]
What do CNNs Learn?
Figure: Filters in the first convolutional layer of Krizhevsky et al.
What do CNNs Learn?
Figure: Filters in the second layer [http://arxiv.org/pdf/1311.2901v3.pdf]
What do CNNs Learn?
Figure: Filters in the third layer [http://arxiv.org/pdf/1311.2901v3.pdf]
What do CNNs Learn?
[http://arxiv.org/pdf/1311.2901v3.pdf]
Tricking a Neural Net
Read about it here (and try it!): https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture
Watch: https://www.youtube.com/watch?v=M2IebCN9Ht4
More on NNs
Figure: Generate images: http://arxiv.org/pdf/1511.06434v1.pdf
More on NNs
Generate text: https://vimeo.com/146492001, https://github.com/karpathy/neuraltalk2, https://github.com/ryankiros/visual-semantic-embedding
More on NNs
Figure: Compose music: https://www.youtube.com/watch?v=0VTI1BBLydE
Great course dedicated to NN: http://cs231n.stanford.edu
Open source frameworks:
  - PyTorch http://pytorch.org/
  - TensorFlow https://www.tensorflow.org/
  - Caffe http://caffe.berkeleyvision.org/
Most cited NN papers:
https://github.com/terryum/awesome-deep-learning-papers