Neural Networks III
Today: Outline
• Neural networks cont’d
• Types of networks: feed-forward networks, convolutional networks, recurrent networks
• ConvNets: multiplication vs convolution; filters (or kernels); convolutional layers; 1D and 2D convolution; pooling layers; LeNet, CIFAR10Net
Machine Learning 2017, Kate Saenko 2
Network Architectures
Neural networks: recap
• Learn parameters via gradient descent
• Backpropagation efficiently computes the cost (forward pass) and the gradient (backward pass)
[Figure: a feed-forward network mapping input 𝑥 through hidden layers h𝑖 to the output hΘ(𝑥)]
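As a minimal illustration of the forward and backward pass, here is a single sigmoid neuron in NumPy; the input, target, initial weights, and learning rate are made up for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical input, target, and initial weights
x = np.array([1.0, 2.0])
y = 1.0
w = np.array([0.1, -0.2])

# forward pass: compute the prediction and the squared-error cost
h = sigmoid(w @ x)
cost = 0.5 * (h - y) ** 2

# backward pass: the chain rule gives the gradient of the cost w.r.t. w
grad = (h - y) * h * (1 - h) * x

# one gradient-descent step (learning rate 0.5)
w = w - 0.5 * grad
```

One step along the negative gradient lowers the cost on this example, which is the whole loop that training repeats over many examples.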
Network architectures
• Feed-forward, fully connected
• Convolutional
• Recurrent (unrolled over time)
[Figure: diagrams of the three architectures, each with input, hidden, and output layers]
Convolutional Architectures
Multiplication vs convolution
• Recall that a neuron can be thought of as learning to spot certain features in the input
• E.g., this neuron detects a change from high to low (light to dark) between the 3rd and 4th inputs
[Figure: a 5 x 1 input multiplied elementwise by the weights [0, 0, +1, −1, 0], summed, and squashed into an activation]
Multiplication vs convolution
• What if the change happens between the 1st and 2nd inputs? The neuron no longer activates
• Must we have a new neuron for each new location of the pattern? This is not efficient
• Solution: use convolution instead of multiplication
[Figure: the same weights applied to the shifted input sum to 0, so the activation is 0]
Multiplication vs convolution
• The new weights are of size 2 x 1; this is called a filter, or kernel (here [+1, −1])
• The new output is the size of the input minus 1, because of the boundary
• The new convolutional neurons all share the same weights! This is much more efficient: we learn the weights once instead of once per position
[Figure: the 2 x 1 filter slides along the 5 x 1 input, producing a 4 x 1 output]
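This 1-D convolution can be sketched in NumPy; the [+1, −1] filter is the one from the slide, but the input values are made up:

```python
import numpy as np

def conv1d_valid(x, w):
    """Slide the filter w along x with stride 1 and no padding ("valid")."""
    n, k = len(x), len(w)
    return np.array([np.dot(w, x[i:i + k]) for i in range(n - k + 1)])

x = np.array([0.0, 3.0, 3.0, 0.0, 0.0])  # hypothetical input with a high-to-low edge
w = np.array([+1.0, -1.0])               # the 2 x 1 edge filter
out = conv1d_valid(x, w)                 # length 5 - 2 + 1 = 4
```

The same two shared weights fire wherever the high-to-low edge occurs, which is exactly the efficiency argument above.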
Multiplication vs convolution
• The new output is the size of the input minus 1, because of the boundary
• We can fix this boundary effect by padding the input with a 0 and adding one more neuron
[Figure: the padded input convolved with the [+1, −1] filter, so the output is the same size as the original input]
Multiplication vs convolution
• Note that we move the filter by 1 position each time; this step size is called the stride
[Figure: the padded input convolved with the filter at stride 1]
Multiplication vs convolution
• Note that we move the filter by 1 position each time; this step size is called the stride
• The stride can be larger; e.g., here is stride 2
[Figure: the padded input convolved with the filter at stride 2, producing a smaller output]
Multiplication vs convolution
• We can add another filter, this time detecting the opposite change, with weights [−1, +1]
• Each unique filter produces its own output map; these are called channels
[Figure: two filters, [+1, −1] and [−1, +1], each convolved with the padded input to give one channel]
Multiplication vs convolution
To summarize, this layer has:
• Input 5 x 1, padded to 6 x 1
• Kernel 2 x 1 with weights [+1, −1]
• Stride 2
• Output 3 x 1
• Number of channels K
[Figure: simplified view of the layer]
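The layer summarized above can be sketched as follows. The kernel [+1, −1], the padding to 6 x 1, and stride 2 are from the slide; the input values are hypothetical, and the single zero is assumed to be appended at the end of the input:

```python
import numpy as np

def conv1d(x, w, stride=1, pad=0):
    """1-D convolution with zero-padding and a configurable stride."""
    xp = np.concatenate([x, np.zeros(pad)])  # pad zeros (assumed at the end)
    n, k = len(xp), len(w)
    return np.array([np.dot(w, xp[i:i + k])
                     for i in range(0, n - k + 1, stride)])

x = np.array([0.0, 3.0, 3.0, 0.0, 0.0])  # hypothetical 5 x 1 input
w = np.array([+1.0, -1.0])               # 2 x 1 kernel
out = conv1d(x, w, stride=2, pad=1)      # padded to 6 x 1, output is 3 x 1
```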
Convolutional Neural Networks
For images and other 2-D signals
Representing images
• Fully connected network: reshape the image into a vector
[Figure: image pixels copied into a single input-layer vector]
2D Input: fully connected network
• Vectorize the input by copying its rows into a single column
2D Input: fully connected network
• Problem: shifting, scaling, and other distortions change the locations of features
[Figure: the same image shifted left]
2D Input: fully connected network
• Not invariant to translation!
[Figure: a shift left by 2 changes 154 inputs: 77 change from black to white and 77 from white to black]
Convolution layer in 2D
• Detect the same feature at different positions in the input, e.g. an image
• Preserve the input's topology
Convolution layer in 2D
• Convolve the input image with the filter
  −1 0 +1
  −1 0 +1
  −1 0 +1
  then apply a nonlinearity 𝑓 to produce the output map
Convolution layer in 2D
• Convolve the input image 𝑥 with the 3 x 3 weight matrix
  𝑤11 𝑤12 𝑤13
  𝑤21 𝑤22 𝑤23
  𝑤31 𝑤32 𝑤33
  then apply a nonlinearity 𝑓
• Each output value is 𝑎 = 𝑓(𝑤11𝑥11 + 𝑤12𝑥12 + 𝑤13𝑥13 + ⋯ + 𝑤33𝑥33)
• The output map looks like an image
What weights correspond to these output maps?
• These are output maps before thresholding
• Hint: filters look like the input they fire on
• ∂f(x, y)/∂x corresponds to the vertical-edge filter
  −1 0 +1
  −1 0 +1
  −1 0 +1
• ∂f(x, y)/∂y corresponds to the horizontal-edge filter
  −1 −1 −1
   0  0  0
  +1 +1 +1
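A minimal 2-D version of the same operation, applying the vertical-edge filter above to a tiny made-up image:

```python
import numpy as np

def conv2d_valid(img, w):
    """Slide the filter over the image with stride 1 and no padding."""
    H, W = img.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * img[i:i + k, j:j + k])
    return out

# the 3 x 3 filter from the slide (responds to vertical edges)
w = np.array([[-1, 0, 1],
              [-1, 0, 1],
              [-1, 0, 1]])

# hypothetical 4 x 4 image: dark left half, bright right half
img = np.array([[0, 0, 9, 9],
                [0, 0, 9, 9],
                [0, 0, 9, 9],
                [0, 0, 9, 9]])

out = conv2d_valid(img, w)  # 2 x 2 output map, large along the vertical edge
```

As in most deep-learning libraries, the filter is applied without flipping (strictly, cross-correlation), matching the slides' usage of the word "convolution".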
What will the output map look like?
[Figure: an input image convolved with a filter that looks like Waldo; the output map is brightest at Waldo's location. Here is Waldo ☺]
Stacking convolutional layers
• Each layer outputs multi-channel feature maps (like images)
• The next layer learns filters over the previous layer's feature maps (its channels)
Pooling layers
• Convolution with stride > 1 reduces the size of the input
• Another way to downsize the feature map is with pooling
• A pooling layer subsamples the input in each sub-window
  – max-pooling: choose the max in each window
  – mean-pooling: take the average
[Figure: inputs → convolution → pooling]
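A sketch of 2 x 2 max-pooling on a made-up feature map:

```python
import numpy as np

def max_pool2d(fm, size=2):
    """Take the max over non-overlapping size x size windows."""
    H, W = fm.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fm[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

# hypothetical 4 x 4 feature map
fm = np.array([[1, 3, 2, 0],
               [4, 2, 0, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]])

pooled = max_pool2d(fm)  # subsampled to 2 x 2
```

Swapping `.max()` for `.mean()` gives mean-pooling instead.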
Pooling layer
• The pooling layers reduce the spatial resolution of each feature map
• The goal is to gain a degree of shift and distortion invariance
[Figure: after pooling, a distorted input produces a smaller difference in the feature map than before pooling]
Pooling layer
• Weight sharing is also applied in pooling layers
• For mean/max pooling, no weights are needed
[Figure: a feature map subsampled into a pooled output]
Putting it all together…
[Figure: input image → convolution + ReLU → pooling, repeated]
Convolutional Neural Network
A better architecture for 2-D signals
LeNet
[Figure: the LeNet architecture]
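Using the size formula out = (in + 2·padding − kernel) // stride + 1, we can trace the feature-map sizes through a LeNet-style stack. The 28 x 28 input, 5 x 5 kernels, 2 x 2 pools, and 16 channels here are an assumed, typical configuration, not values taken from the slide:

```python
def out_size(n, k, stride=1, pad=0):
    """Spatial size after a convolution or pooling layer."""
    return (n + 2 * pad - k) // stride + 1

n = 28                         # assumed 28 x 28 input image
n = out_size(n, 5)             # 5 x 5 conv  -> 24
n = out_size(n, 2, stride=2)   # 2 x 2 pool  -> 12
n = out_size(n, 5)             # 5 x 5 conv  -> 8
n = out_size(n, 2, stride=2)   # 2 x 2 pool  -> 4
flat = 16 * n * n              # with 16 channels, the fully connected layers see 256 features
```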
Deep Convolutional Networks
The Unreasonable Effectiveness of Deep Features
• Maximal activations of pool5 units reveal the rich visual structure of features deep in the hierarchy [R-CNN]
• conv5 DeConv visualization [Zeiler-Fergus]
Convolutional Neural Nets
Why they rule
Why CNNs rule: Sparsity
• CNNs have sparse interactions, because the kernel is smaller than the input
• E.g., in an image of thousands or millions of pixels, they can detect small, meaningful features such as edges
• Very efficient computation!
  – For m inputs and n outputs, matrix multiplication requires O(m × n) runtime (per example)
  – With k connections to each output, we need only O(k × n) runtime
• Deep layers have larger effective inputs, or receptive fields
Why CNNs rule: Parameter sharing
• Kernel weights are shared across all locations
• Statistically efficient – we learn from more data per parameter
• Memory efficient – we store only k parameters, since k ≪ m
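A quick back-of-the-envelope comparison for a 1-D layer, using hypothetical sizes m = n = 1000 and kernel size k = 3:

```python
m, n, k = 1000, 1000, 3      # hypothetical layer sizes

fc_params = m * n            # fully connected: one weight per input-output pair
conv_params = k              # convolutional: k shared weights, reused at every position

fc_ops = m * n               # O(m x n) multiply-adds per example
conv_ops = k * n             # O(k x n) multiply-adds per example
```

The convolutional layer stores roughly a million times fewer weights here, and its runtime scales with the kernel size rather than the input size.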