Announcements
Reminder: self-grading forms for ps1 and ps2 due 10/5 at midnight (Boston)
• ps3 out on Thursday, due 10/8 (1 week)
• LAB this week: go over solutions for the first two homeworks
Agglomerative Clustering Example (bottom-up clustering)
Image source: https://en.wikipedia.org/wiki/Hierarchical_clustering
K-Means for Image Compression
Choose subspace with minimal “information loss”
[Figure: 3D data and the projection directions $u^{(1)}, u^{(2)} \in \mathbb{R}^3$ spanning the chosen subspace]
Reduce from 2 dimensions to 1 dimension: find a direction (a vector $u^{(1)}$) onto which to project the data so as to minimize the projection error.
Reduce from n dimensions to K dimensions: find K vectors $u^{(1)}, u^{(2)}, \ldots, u^{(K)}$ onto which to project the data so as to minimize the projection error.
PCA Solution
• The solution turns out to be the first K eigenvectors of the data covariance matrix (see Bishop 12.1 for details)
• Closed-form solution: apply Singular Value Decomposition (SVD) to the covariance matrix, as in the sketch below
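A minimal numpy sketch of this recipe (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca(X, K):
    """Project data X (num_samples x num_features) onto its top-K principal components."""
    Xc = X - X.mean(axis=0)            # center the data
    cov = Xc.T @ Xc / Xc.shape[0]      # data covariance matrix
    U, S, Vt = np.linalg.svd(cov)      # SVD of the symmetric covariance matrix
    U_K = U[:, :K]                     # first K eigenvectors u^(1), ..., u^(K)
    return Xc @ U_K, U_K               # K-dimensional projections and the basis
```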
What features to use?
Edges? Shapes? Color?
Today: Outline
• Neural networks: artificial neuron, MLP, sigmoid units; neuroscience inspiration, output vs hidden layers; linear vs nonlinear networks;
• Feed-forward networks
Intro to Neural Networks
Motivation
Recall: Logistic Regression
sigmoid/logistic function: $g(z) = \frac{1}{1 + e^{-z}}$
The output is the probability of label 1 given the input:
$p(y = 1 \mid x) = \frac{1}{1 + e^{-\theta^T x}}$
[Plot: sigmoid curve as a function of z, rising from 0 through 0.5 to 1]
Predict “y = 1” if $h_\theta(x) \ge 0.5$; predict “y = 0” if $h_\theta(x) < 0.5$.
Recall: Logistic Regression Cost
Logistic Regression Hypothesis: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
$\theta$: parameters
$D = \{(x_i, y_i)\}$: data ($m$ examples)
Logistic Regression Cost Function: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right) \right]$
Goal: minimize the cost, $\min_\theta J(\theta)$
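A minimal numpy sketch of this hypothesis and cost (function and variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Average cross-entropy cost over the data D = {(x_i, y_i)}."""
    h = sigmoid(X @ theta)   # h_theta(x_i) = p(y = 1 | x_i) for each example
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```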
Cost: Intuition
Logistic regression cost function, case y = 1: cost $= -\log h_\theta(x)$
[Plot: cost vs. $h_\theta(x)$ on [0, 1]; the cost goes to 0 as $h_\theta(x) \to 1$ and blows up as $h_\theta(x) \to 0$]
Cost: Intuition
Logistic regression cost function, case y = 0: cost $= -\log\left(1 - h_\theta(x)\right)$
[Plot: cost vs. $h_\theta(x)$ on [0, 1]; the cost goes to 0 as $h_\theta(x) \to 0$ and blows up as $h_\theta(x) \to 1$]
Decision boundary
[Figure: training data in the (x1, x2) plane separated by a linear decision boundary]
Non-linear decision boundaries
Predict “y = 1” if a non-linear function of the inputs is ≥ 0.
[Figure: data in the (x1, x2) plane, axes from −1 to 1, separated by a circular decision boundary]
Replace the features with non-linear functions of the inputs, e.g. log, cosine, or polynomial (a small sketch follows below).
Then predict “y = 1” if a linear function of the transformed features is ≥ 0.
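A tiny sketch of such a hand-designed transformation (the particular feature choices are illustrative):

```python
import numpy as np

def nonlinear_features(x1, x2):
    """Map (x1, x2) to hand-chosen non-linear features for a linear classifier."""
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

# A linear boundary in these features can be a circle in the original (x1, x2) plane.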
Limitations of linear models
• Logistic regression and other linear models cannot handle nonlinear decision boundaries
• Must use non-linear feature transformations
• Up to the designer to specify which one
• Can we instead learn the transformation?
• Yes, this is what neural networks do!
• A neural network chains together many layers of “neurons” such as logistic units (logistic regression functions)
Neural Networks learn features
Image: http://www.amax.com/blog/wp-content/uploads/2015/12/blog_deeplearning3.jpg
Neurons in the Brain
Inspired “Artificial Neural Networks”
Neurons are cells that process chemical and electrical signals and transmit them to other neurons and other types of cells
Neuron in the brain
• dendrites: “input wires”
• axon: “output wire”
• nucleus and cell body
• Can measure the electrical activity (spikes) of a single neuron by placing electrodes
Image: http://webspace.ship.edu/cgboer/neuron.gif
Neural network in the brain
• Micro networks: several connected neurons perform sophisticated tasks: mediate reflexes, process sensory information, generate locomotion and mediate learning and memory.
• Macro networks: perform higher brain functions such as object recognition and cognition.
Logistic Unit as Artificial Neuron
[Diagram: input → multiply by weights → sum → squash → output]
Logistic Unit as Artificial Neuron
[Diagram: worked example with numeric inputs and weights; the weighted sum is squashed to give the output]
Logistic Unit as Artificial Neuron
Neurons learn patterns!
[Diagram: the same unit applied to two input patterns; one gives a large negative sum and output 0, the other a large positive sum and output 1]
Artificial Neuron Learns Patterns
• Classify the input into class 0 or 1
• Teach the neuron to predict the correct class label
• Detect the presence of a simple “feature”
[Diagram: a two-input neuron with example weights; the activation is high for the matching input pattern and decreases for other patterns]
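A minimal numpy sketch of such a pattern-detecting unit (the weights and inputs below are illustrative, not the slide's exact numbers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w):
    """Artificial neuron: multiply inputs by weights, sum, then squash."""
    return sigmoid(np.dot(w, x))

w = np.array([+2.0, -2.0])                  # weights: the "pattern" this unit detects
print(neuron(np.array([+4.0, -3.0]), w))    # matching pattern: large positive sum, output near 1
print(neuron(np.array([-4.0, +3.0]), w))    # other pattern: large negative sum, output near 0
```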
Neural Networks: Learning
Intuition
Artificial Neuron: Learning
Start with random weights.
[Diagram: with the initial weights the weighted sum is negative, so the activation is 0, but the true class is 1]
Adjust the weights.
Artificial Neuron: Learning
[Diagram: after the update the weighted sum is positive, so the activation is 1, matching the true class 1]
Adjust the weights.
Artificial Neuron: Learning
Start with random weights.
[Diagram: forward pass with the initial weights; the activation is 0 but the true class is 1]
Adjust the weights via gradient descent, the same as in logistic regression (a small sketch follows below).
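A minimal sketch of that gradient-descent loop for a single logistic unit (learning rate, step count, and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_unit(X, y, lr=0.1, steps=1000):
    """Fit one logistic unit by gradient descent on the cross-entropy cost."""
    theta = 0.01 * np.random.randn(X.shape[1])   # start with (small) random weights
    for _ in range(steps):
        h = sigmoid(X @ theta)                   # forward pass: activations
        grad = X.T @ (h - y) / len(y)            # gradient of the cost w.r.t. theta
        theta -= lr * grad                       # adjust the weights
    return theta
```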
Neural Networks: Learning
Multi-layer network
Artificial Neuron: simplify
[Diagram: the worked neuron example redrawn in compact form: inputs, weights, sum and squash, activation]
Artificial Neuron: simplify
[Diagram: simplified notation: input → weights → output]
Artificial Neural Network
[Diagram: input → input layer → hidden layer → output layer → output]
Single Neuron
Neural Network
Deep Network: many hidden layers
Multi-layer perceptron (MLP)
• Just another name for a feed-forward neural network
• Logistic regression is a special case of the MLP with no hidden layer and sigmoid output
Neural Networks Learn Features
logistic regression unit == artificial neuron
chain several units together == neural network
“earlier” units learn a non-linear feature transformation
[Figure: the non-linear decision boundary in the (x1, x2) plane realized by a simple neural network with hidden units h(1), h(2), h(3)]
$h(x) = g\left(\theta_0 + \theta_1 h^{(1)}(x) + \theta_2 h^{(2)}(x) + \theta_3 h^{(3)}(x)\right)$
Example
Training a neural net: Demo
TensorFlow Playground
Artificial Neural Network: general notation
[Diagram: input layer $x = (x_1, \ldots, x_5)$, hidden layer $(h_1, h_2, h_3)$, output layer]
input: $x = (x_1, \ldots, x_5)$
hidden layer activations: $h_i = g\left(\Theta^{(1)}_i x\right)$, where $\Theta^{(1)}_i$ is the $i$-th row of $\Theta^{(1)}$ and $g(z) = \frac{1}{1 + \exp(-z)}$
output: $h_\Theta(x) = g\left(\Theta^{(2)} a\right)$, where $a$ collects the hidden activations
weights: $\Theta^{(1)} = \begin{bmatrix} \theta_{11} & \cdots & \theta_{15} \\ \vdots & \ddots & \vdots \\ \theta_{31} & \cdots & \theta_{35} \end{bmatrix}, \qquad \Theta^{(2)} = \begin{bmatrix} \theta_{11} & \cdots & \theta_{13} \\ \vdots & \ddots & \vdots \\ \theta_{31} & \cdots & \theta_{33} \end{bmatrix}$
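A minimal numpy sketch of this forward pass (bias terms omitted; shown here with a single output unit, so the shapes are illustrative rather than the slide's exact ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Feed-forward pass: hidden activations a = g(Theta1 x), then output g(Theta2 a)."""
    a = sigmoid(Theta1 @ x)       # hidden layer activations h_i
    return sigmoid(Theta2 @ a)    # network output h_Theta(x)

x = np.random.randn(5)            # 5-dimensional input
Theta1 = np.random.randn(3, 5)    # input layer -> 3 hidden units
Theta2 = np.random.randn(1, 3)    # hidden layer -> single output unit
print(forward(x, Theta1, Theta2))
```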
Cost function
Neural network cost = training error + regularization
Gradient computation
Need code to compute:
– the cost $J(\Theta)$
– the partial derivatives of $J(\Theta)$ with respect to every weight
Cover next time!
Use the “Backpropagation algorithm”:
– Efficient way to compute these derivatives
– Computes the gradient incrementally by “propagating” backwards through the network
Network architectures
[Diagrams: feed-forward (input → hidden layers → output, Layer 1 through Layer 4), recurrent (connections through time), fully connected, and convolutional architectures]
Representing images
Fully connected: reshape the image into a vector and feed it to the input layer (see the sketch below).
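A tiny sketch of that reshaping (the 28×28 image size is just an example):

```python
import numpy as np

image = np.random.rand(28, 28)   # e.g. a 28x28 grayscale image
x = image.reshape(-1)            # flatten into a 784-dimensional input vector
print(x.shape)                   # (784,)
```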
Convolutional Neural Network
A better architecture for 2d signals
LeNet
Why Deep Learning?
The Unreasonable Effectiveness of Deep Features
Maximal activations of pool5 units
Rich visual structure of features deep in the hierarchy.
[R-CNN]
conv5 DeConv visualization [Zeiler-Fergus]
Summary so far
• Neural network chains together many layers of “neurons” such as logistic units
• Hidden neurons learn more and more abstract non-linear features
Next Class
Neural Networks I: Learning:
Learning via gradient descent; computation graphs, backpropagation algorithm.
Reading: Bishop Ch 5.1-5.3