
Announcements
• Reminder: ps3 due Thursday 10/8 at midnight (Boston)
• ps4 out Thursday, due 10/15 (1 week)
• Lab this week – neural network learning
• ps3 self-grading form out Monday, due 10/19

Neural Networks III

Today: Outline
• Neural networks cont’d
• Types of networks: Feed-forward networks,
convolutional networks, recurrent networks
• ConvNets: multiplication vs convolution; filters (or kernels); convolutional layers; 1D and 2D convolution; pooling layers; LeNet, CIFAR10Net
Machine Learning 2017, Kate Saenko 3

Neural Networks III
Network Architectures

Neural networks: recap
• Learn parameters via gradient descent
• Backpropagation efficiently computes the cost (forward pass) and the gradient (backward pass)
[Diagram: input 𝑥 feeds through hidden layers to the output hΘ(𝑥)]

Network architectures
• Feed-forward, fully connected
• Recurrent (connections across time)
• Convolutional
[Diagrams: input → hidden layers → output for each architecture]

Neural Networks III
Convolutional Architectures

Multiplication vs convolution
• Recall, a neuron can be thought of as learning to spot certain features in the input
• E.g., this neuron detects a change from high to low (light to dark) between the 3rd and 4th inputs
[Diagram: the input values are multiplied by the weights, summed, and squashed to give the activation]
Deep Learning 2017, Brian Kulis & Kate Saenko 8

Multiplication vs convolution
• What if the change happens between the 1st and 2nd inputs? The neuron no longer activates
• Must we have a new neuron for each new location of the pattern? This is not efficient
• Solution: use convolution instead of multiplication
[Diagram: the shifted input no longer matches the fixed weights, so the activation is 0]

Multiplication vs convolution
• The new weights are of size 2 × 1; this is called a filter, or kernel
• The new output is the size of the input minus 1, because of the boundary
• The new convolutional neurons all share the same weights! This is much more efficient: we learn the weights once, instead of once per position
[Diagram: the 2 × 1 filter slides along the input, producing one output per position]
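The shared-weight idea above can be sketched in a few lines of numpy (illustrative code, not from the slides; the filter weights [+1, −1] are taken from the layer summary later in this deck). The single shared filter fires wherever the high-to-low change occurs, at any position:

```python
import numpy as np

def conv1d_valid(x, w):
    """Slide filter w across x (stride 1, no padding).

    The output has len(x) - len(w) + 1 values: one per filter position.
    """
    k = len(w)
    return np.array([np.dot(w, x[i:i + k]) for i in range(len(x) - k + 1)])

# One shared 2 x 1 filter: fires where the input drops from high to low.
w = np.array([+1.0, -1.0])

x = np.array([2.0, 2.0, 2.0, 0.0, 0.0])   # change between 3rd and 4th inputs
print(conv1d_valid(x, w))                  # fires at the 3rd position

x2 = np.array([2.0, 0.0, 0.0, 0.0, 0.0])  # change between 1st and 2nd inputs
print(conv1d_valid(x2, w))                 # the same filter still fires
```

The same k weights are reused at every position, which is exactly what makes the convolutional layer efficient to learn.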

Multiplication vs convolution
• The new output is the size of the input minus 1, because of the boundary
• We can fix this boundary effect by padding the input with 0 and adding one more neuron
[Diagram: the input padded with a zero, making the output the same size as the original input]
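A small numpy sketch of the padding fix described above (illustrative code, not from the slides): one extra zero lets one more filter position fit, restoring the original input size.

```python
import numpy as np

def conv1d(x, w):
    """1-D convolution (cross-correlation), stride 1, no padding."""
    k = len(w)
    return np.array([np.dot(w, x[i:i + k]) for i in range(len(x) - k + 1)])

w = np.array([+1.0, -1.0])
x = np.array([2.0, 2.0, 2.0, 0.0, 0.0])

print(len(conv1d(x, w)))  # 4: input size minus 1 (boundary effect)

# Pad with a single leading zero so one more neuron fits,
# making the output the same size as the original input.
x_padded = np.concatenate([[0.0], x])
print(len(conv1d(x_padded, w)))  # 5
```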

Multiplication vs convolution
• Note, we move the filter by 1 each time; this step size is called the stride
[Diagram: the filter moving across the padded input one position at a time (stride 1)]

Multiplication vs convolution
• Note, we move the filter by 1 each time; this step size is called the stride
• The stride can be larger, e.g. here is stride 2
[Diagram: the filter moving across the padded input two positions at a time (stride 2)]
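The effect of the stride on output size can be sketched directly (illustrative numpy code, not from the slides); a larger stride simply skips filter positions:

```python
import numpy as np

def conv1d_stride(x, w, stride=1):
    """1-D convolution with a configurable stride: the filter jumps
    `stride` positions between outputs."""
    k = len(w)
    return np.array([np.dot(w, x[i:i + k])
                     for i in range(0, len(x) - k + 1, stride)])

w = np.array([+1.0, -1.0])
x = np.array([0.0, 2.0, 2.0, 2.0, 0.0, 0.0])  # the padded 6 x 1 input

print(len(conv1d_stride(x, w, stride=1)))  # 5 outputs
print(len(conv1d_stride(x, w, stride=2)))  # 3 outputs: every other position
```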

Multiplication vs convolution
• We can add another filter, this time to detect the opposite change, with weights [−1, +1]
• The output of each distinct filter is called a channel
[Diagram: two filters applied to the padded input, producing two output channels]


Multiplication vs convolution
To summarize, this layer has:
• Input 5 × 1, padded to 6 × 1
• Kernel 2 × 1 with weights [+1, −1]
• Stride 2
• Output 3 × 1
• Number of channels K
[Diagram: simplified view of the padded input, the sliding kernel, and the 3 × 1 output per channel]
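The summarized layer can be checked end to end in numpy (an illustrative sketch, not from the slides; the second filter [−1, +1] is the opposite-change detector introduced earlier):

```python
import numpy as np

def conv_layer_1d(x, filters, pad=1, stride=2):
    """The layer summarized above: zero-pad the input, then slide each
    filter across it with the given stride. Returns one row per channel."""
    x = np.concatenate([np.zeros(pad), x])  # pad 5 x 1 input to 6 x 1
    k = filters.shape[1]
    return np.array([[np.dot(w, x[i:i + k])
                      for i in range(0, len(x) - k + 1, stride)]
                     for w in filters])

filters = np.array([[+1.0, -1.0],    # detects high-to-low change
                    [-1.0, +1.0]])   # detects low-to-high change
x = np.array([2.0, 2.0, 2.0, 0.0, 0.0])

out = conv_layer_1d(x, filters)
print(out.shape)  # (2, 3): K = 2 channels, each output 3 x 1
```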

Convolutional Neural Networks
For images and other 2-D signals

Representing images
Fully connected: reshape the image into a vector and feed it to the input layer
[Diagram: image pixels reshaped into a single input vector]

2D Input: fully connected network
Vectorize the input by copying its rows into a single column

2D Input: fully connected network
Problem: shifting, scaling, and other distortions change the locations of features
[Diagram: the same pattern shifted left in the image]

2D Input: fully connected network
Not invariant to translation! A 2-pixel shift left changes 154 inputs: 77 from black to white, 77 from white to black

Convolution layer in 2D
• Detect the same feature at different positions in the input, e.g. an image
• Preserve the input topology

Convolution layer in 2D
Convolve the input image with the 3 × 3 filter
  −1 0 +1
  −1 0 +1
  −1 0 +1
then apply a nonlinearity f to produce the output map

Convolution layer in 2D
Convolve the input image with a 3 × 3 filter of weights, then apply a nonlinearity f. The output looks like an image (the output map):

  w11 w12 w13       𝑥11 𝑥12 𝑥13
  w21 w22 w23   *   𝑥21 𝑥22 𝑥23
  w31 w32 w33       𝑥31 𝑥32 𝑥33

𝑎 = 𝑓(𝑤11𝑥11 + 𝑤12𝑥12 + 𝑤13𝑥13 + ⋯ + 𝑤33𝑥33)
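The per-output formula above can be verified with a short numpy sketch (illustrative code, not from the slides; the nonlinearity f is omitted for clarity). Here the 3 × 3 vertical-edge filter from this slide responds along a dark-to-bright edge:

```python
import numpy as np

def conv2d_valid(img, w):
    """2-D convolution (cross-correlation): each output value is the
    weighted sum of one image patch, a = sum(w * patch)."""
    kh, kw = w.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * img[i:i + kh, j:j + kw])
    return out

# The vertical-edge filter from the slide.
w = np.array([[-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0]])

# A dark-to-bright step edge down the middle of a 3 x 6 image.
img = np.array([[0.0, 0.0, 0.0, 9.0, 9.0, 9.0]] * 3)
print(conv2d_valid(img, w))  # [[ 0. 27. 27.  0.]] -- strong response at the edge
```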

What weights correspond to these output maps?
These are output maps before thresholding.
Hint: filters look like the input they fire on.

Horizontal derivative ∂f(x, y)/∂x:     Vertical derivative ∂f(x, y)/∂y:
  −1 0 +1                                −1 −1 −1
  −1 0 +1                                 0  0  0
  −1 0 +1                                +1 +1 +1

What will the output map look like?
[Diagram: the input image and the filter]

What will the output map look like?
Here is Waldo
[Diagram: the output map, with the strongest response at the filter's matching location]

Stacking convolutional layers
• Each layer outputs multi-channel feature maps (like images)
• The next layer learns filters on the previous layer’s feature maps
[Diagram: stacked layers, each producing multiple channels]

Pooling layers
• Convolution with stride > 1 reduces the size of the input
• Another way to downsize the feature map is with pooling
• A pooling layer subsamples the input in each sub-window
• Max-pooling: choose the max in a window
• Mean-pooling: take the average
[Diagram: inputs → convolution → pooling]
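Both pooling variants can be sketched in numpy (illustrative code, not from the slides), subsampling non-overlapping 2 × 2 windows:

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Subsample each non-overlapping size x size window by its max or mean."""
    H, W = fmap.shape
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = fmap[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = reduce_fn(window)
    return out

fmap = np.array([[1.0, 3.0, 2.0, 0.0],
                 [4.0, 2.0, 1.0, 1.0],
                 [0.0, 1.0, 5.0, 2.0],
                 [2.0, 2.0, 0.0, 4.0]])

print(pool2d(fmap, mode="max"))   # [[4. 2.] [2. 5.]]
print(pool2d(fmap, mode="mean"))  # [[2.5 1.] [1.25 2.75]]
```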

Pooling layer
• The pooling layers reduce the spatial resolution of each feature map
• The goal is to get a certain degree of shift and distortion invariance

Distortion invariance
[Diagram: a large difference between the raw inputs becomes a smaller difference after pooling]

Pooling layer
• Weight sharing is also applied in pooling layers
• For mean/max pooling, no weights are needed
[Diagram: a feature map and its pooled output]

Putting it all together…
[Diagram: input image → convolution + ReLU → pooling]

Convolutional Neural Network
A better architecture for 2-D signals: LeNet
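One stage of such a network (convolution → ReLU → pooling) can be composed from the pieces above. This is a minimal numpy sketch, not the actual LeNet implementation; the image size and random filter are assumptions for illustration:

```python
import numpy as np

def conv2d(img, w):
    """Valid 2-D convolution (cross-correlation) with a single filter."""
    kh, kw = w.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * img[i:i + kh, j:j + kw])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(fmap, size=2):
    """Max over non-overlapping size x size windows."""
    H, W = fmap.shape
    return np.array([[fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
                      for j in range(W // size)] for i in range(H // size)])

img = np.random.rand(10, 10)  # toy grayscale "image" (hypothetical input)
w = np.random.randn(3, 3)     # one 3 x 3 filter (random, stands in for learned weights)

fmap = max_pool(relu(conv2d(img, w)))  # one convolution -> ReLU -> pooling stage
print(fmap.shape)  # (4, 4): 10 -> 8 after the 3 x 3 conv, 8 -> 4 after 2 x 2 pooling
```

A real network stacks several such stages, then flattens the final maps into fully connected layers.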

Deep Convolutional Networks
The Unreasonable Effectiveness of Deep Features
• Rich visual structure of features deep in the hierarchy
• Maximal activations of pool5 units [R-CNN]
• conv5 DeConv visualization [Zeiler-Fergus]

Convolutional Neural Nets
Why they rule

Why CNNs rule: Sparsity
• CNNs have sparse interactions, because the kernel is smaller than the input
• E.g., in an image of thousands or millions of pixels, we can detect small meaningful features such as edges
• Very efficient computation!
• For m inputs and n outputs, matrix multiplication requires O(m × n) runtime (per example)
• With only k connections to each output, we need only O(k × n) runtime
• Deeper layers have larger effective inputs, or receptive fields
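The O(m × n) vs O(k × n) comparison is easy to make concrete (illustrative arithmetic; the sizes below are hypothetical, not from the slides):

```python
# Connections (runtime) needed to produce n outputs from m inputs.
m = 1_000_000  # e.g. a megapixel image
n = 1_000_000  # one output unit per pixel
k = 9          # a 3 x 3 kernel connects each output to only 9 inputs

dense_ops = m * n  # O(m x n): every output sees every input
conv_ops = k * n   # O(k x n): every output sees a k-sized neighborhood

print(dense_ops // conv_ops)  # 111111: convolution is ~10^5x cheaper here
```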

Why CNNs rule: Parameter sharing
• Kernel weights are shared across all locations
• Statistically efficient – learn from more data
• Memory efficient – store only k parameters, since k ≪ m
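The memory saving from sharing can be quantified with the same illustrative numbers (hypothetical sizes, not from the slides): a fully connected layer stores a unique weight per (input, output) pair, while a convolutional layer stores only the k kernel weights, reused everywhere.

```python
# Stored parameters (memory), illustrative sizes.
m, n = 1_000_000, 1_000_000  # inputs and outputs
k = 9                        # a 3 x 3 shared kernel

fc_params = m * n  # fully connected: one weight per (input, output) pair
conv_params = k    # convolutional: the same k weights at every location

print(fc_params)    # 1000000000000
print(conv_params)  # 9
```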