Today: Outline
• Neural networks: artificial neuron, MLP, sigmoid units; neuroscience inspiration; output vs. hidden layers; linear vs. non-linear networks
• Feed-forward networks
Intro to Neural Networks
Motivation
Recall: Logistic Regression
The sigmoid/logistic function: the output is the probability of label 1 given the input.

$p(y = 1 \mid x) = \dfrac{1}{1 + e^{-\theta^T x}}$

[Figure: the sigmoid curve as a function of z, rising from 0 through 0.5 to 1.]

Predict "y = 1" if $h_\theta(x) \geq 0.5$; predict "y = 0" if $h_\theta(x) < 0.5$.
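A minimal numpy sketch of this hypothesis and decision rule (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """p(y = 1 | x) for logistic regression with parameters theta."""
    return sigmoid(theta @ x)

# Predict "y = 1" when the probability crosses 0.5.
theta = np.array([0.5, -1.0, 2.0])   # example parameters
x = np.array([1.0, 0.3, 0.8])        # example input (first entry = bias feature)
label = 1 if predict_proba(theta, x) >= 0.5 else 0
```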
Recall: Logistic Regression Cost
Logistic Regression Hypothesis: $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
$\theta$: parameters
$D = \{(x_i, y_i)\}$: data
Logistic Regression Cost Function:
$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[\, y_i \log h_\theta(x_i) + (1 - y_i) \log\big(1 - h_\theta(x_i)\big) \,\right]$
Goal: minimize cost
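As a sketch, this cost can be computed over a dataset in a few lines of numpy (rows of X are examples, with a bias feature included; names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Average cross-entropy cost J(theta) over the dataset (X, y)."""
    h = sigmoid(X @ theta)   # predicted p(y = 1 | x) for every example
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```

Minimizing this cost, e.g. by gradient descent, fits the parameters to the data.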
Cost: Intuition
Logistic regression cost function, case y = 1: the cost is $-\log h_\theta(x)$, which is 0 when the prediction $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$.
Case y = 0: the cost is $-\log\big(1 - h_\theta(x)\big)$, which is 0 when $h_\theta(x) = 0$ and grows without bound as $h_\theta(x) \to 1$.
[Figure: the two cost curves plotted over $h_\theta(x) \in (0, 1)$.]
Decision boundary
[Figure: training points in the $(x_1, x_2)$ plane separated by a linear decision boundary.]
Non-linear decision boundaries
[Figure: data in the $(x_1, x_2)$ plane requiring a circular decision boundary of radius 1 around the origin.]
Predict "y = 1" if, e.g., $-1 + x_1^2 + x_2^2 \geq 0$.
Replace features with non-linear functions, e.g. log, cosine, or polynomial.
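For instance, a hand-designed polynomial feature map lets a linear classifier realize the circular boundary above; a sketch (feature choice is illustrative):

```python
import numpy as np

def circle_features(x1, x2):
    """Hand-designed non-linear features: [1, x1, x2, x1^2, x2^2]."""
    return np.array([1.0, x1, x2, x1**2, x2**2])

# With theta = [-1, 0, 0, 1, 1], predict "y = 1" iff x1^2 + x2^2 >= 1.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def predict(x1, x2):
    return 1 if theta @ circle_features(x1, x2) >= 0 else 0
```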
Limitations of linear models
• Logistic regression and other linear models cannot handle non-linear decision boundaries
• Must use non-linear feature transformations
• Up to the designer to specify which one
• Can we instead learn the transformation? Yes, this is what neural networks do!
• A neural network chains together many layers of “neurons” such as logistic units (logistic regression functions)
Neural Networks learn features
Image: http://www.amax.com/blog/wp-content/uploads/2015/12/blog_deeplearning3.jpg
Neurons in the Brain
Inspired “Artificial Neural Networks”
Neurons are cells that process chemical and electrical signals and transmit these signals to other neurons and other types of cells
Neuron in the brain
[Figure: dendrites (“input wires”), cell body containing the nucleus, and axon (“output wire”). The electrical activity (spikes) of a single neuron can be measured by placing electrodes.]
Image: http://webspace.ship.edu/cgboer/neuron.gif
Neural network in the brain
• Micro networks: several connected neurons perform sophisticated tasks: mediate reflexes, process sensory information, generate locomotion, and mediate learning and memory.
• Macro networks: perform higher brain functions such as object recognition and cognition.
Logistic Unit as Artificial Neuron
[Figure: a logistic unit. Each input is multiplied by a weight, the weighted inputs are summed, and the sum is squashed by the sigmoid to produce the output.]
[Figure: worked example. An input whose values line up with the positive weights gives a large positive weighted sum and an output near 1; an input that lines up with the negative weights gives a negative sum and an output near 0.]
Neurons learn patterns!
Artificial Neuron Learns Patterns
• Classify the input into class 0 or 1
• Teach the neuron to predict the correct class label
• Detect the presence of a simple “feature”
[Figure: example inputs and weights; the activation is high for the matching pattern and decreases for other patterns.]
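A sketch of a single logistic unit acting as a pattern detector (the specific weights and inputs here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = np.array([0.0, 2.0, 4.0, -2.0])     # the "pattern" the neuron detects

matching = np.array([0.0, 1.0, 1.0, -1.0])    # lines up with the weights
other    = np.array([0.0, -1.0, -1.0, 1.0])   # anti-aligned pattern

print(sigmoid(weights @ matching))   # ~1: feature present
print(sigmoid(weights @ other))      # ~0: feature absent
```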
Neural Networks: Learning
Intuition
Artificial Neuron: Learning
• Start with random weights.
• Compute the activation for a training input and compare it to the true class label; initially they may disagree (e.g., activation 0 for class 1).
• Adjust the weights and repeat until the activation matches the class label (activation 1 for class 1).
• Weights are adjusted via gradient descent, the same as in logistic regression.
[Figure: the logistic unit (input, multiply by weights, sum, squash) before and after the weight adjustment.]
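The update is the familiar logistic regression gradient step; a minimal sketch for a single training example (learning rate and data are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=3)            # start with random weights
lr = 0.1                          # learning rate

def sgd_step(w, x, y):
    """One gradient-descent step on the cross-entropy cost."""
    a = sigmoid(w @ x)            # activation
    grad = (a - y) * x            # dJ/dw for a single example
    return w - lr * grad

x, y = np.array([1.0, 0.5, -0.5]), 1
for _ in range(100):
    w = sgd_step(w, x, y)         # activation is pushed toward the class label
```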
Neural Networks: Learning
Multi-layer network
Artificial Neuron: simplify
[Figure: the multiply-by-weights, sum, and squash stages are collapsed into a single node, drawn simply as input → weights → output.]
Artificial Neural Network
[Figure: from a single neuron to a neural network with an input layer, a hidden layer, and an output layer. A deep network has many hidden layers.]
Multi-layer perceptron (MLP)
• Just another name for a feed-forward neural network
• Logistic regression is a special case of the MLP with no hidden layer and sigmoid output
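A sketch of that equivalence: with zero hidden layers, the MLP forward pass collapses to the logistic regression hypothesis (names are illustrative):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_no_hidden(Theta, x):
    """MLP with no hidden layer and sigmoid output."""
    return g(Theta @ x)   # identical to logistic regression h_theta(x)
```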
Neural Networks Learn Features
logistic regression unit == artificial neuron
chain several units together == neural network
“earlier” units learn a non-linear feature transformation
[Figure: a simple neural network with three hidden units $h^{(1)}, h^{(2)}, h^{(3)}$ carving a non-linear decision boundary in the $(x_1, x_2)$ plane.]
$h(x) = g\big(\theta_0 + \theta_1 h^{(1)}(x) + \theta_2 h^{(2)}(x) + \theta_3 h^{(3)}(x)\big)$
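To make “earlier units learn a non-linear feature transformation” concrete, here is a minimal sketch with hand-set (illustrative, not learned) weights: two hidden logistic units and one output unit compute XOR, a function no single linear unit can represent.

```python
import numpy as np

def g(z):                          # sigmoid squashing function
    return 1.0 / (1.0 + np.exp(-z))

def xor_net(x1, x2):
    h1 = g(20*x1 + 20*x2 - 10)     # hidden feature ~ OR(x1, x2)
    h2 = g(-20*x1 - 20*x2 + 30)    # hidden feature ~ NOT AND(x1, x2)
    return g(20*h1 + 20*h2 - 30)   # output ~ AND(h1, h2) = XOR(x1, x2)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b)))   # prints 0, 1, 1, 0
```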
Training a neural net: Demo
TensorFlow playground: http://playground.tensorflow.org
Artificial Neural Network: general notation
[Figure: input layer $x_1, \dots, x_5$, hidden layer $h_1, h_2, h_3$, output layer; each unit applies the sigmoid $g$.]

input: $x = \begin{bmatrix} x_1 \\ \vdots \\ x_5 \end{bmatrix}$

hidden layer activations: $h = g(\Theta^{(1)} x)$, where $g(z) = \dfrac{1}{1 + \exp(-z)}$

output: $h_\Theta(x) = g(\Theta^{(2)} h)$

weights: $\Theta^{(1)} = \begin{bmatrix} \theta_{11} & \cdots & \theta_{15} \\ \vdots & \ddots & \vdots \\ \theta_{31} & \cdots & \theta_{35} \end{bmatrix}, \quad \Theta^{(2)} = \begin{bmatrix} \theta_{11} & \cdots & \theta_{13} \\ \vdots & \ddots & \vdots \\ \theta_{31} & \cdots & \theta_{33} \end{bmatrix}$
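In this notation, a full forward pass is two matrix-vector products with a squash in between; a sketch with the shapes from the figure (random weights for illustration):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 5))   # input (5) -> hidden (3)
Theta2 = rng.normal(size=(3, 3))   # hidden (3) -> output, as drawn on the slide
                                   # (a single-output network would use shape (1, 3))

x = rng.normal(size=5)             # input vector x1..x5
h = g(Theta1 @ x)                  # hidden activations h = g(Theta^(1) x)
out = g(Theta2 @ h)                # output h_Theta(x) = g(Theta^(2) h)
```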
Cost function
Neural network cost = training error + regularization:
$J(\Theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[\, y_i \log h_\Theta(x_i) + (1 - y_i) \log\big(1 - h_\Theta(x_i)\big) \,\right] + \dfrac{\lambda}{2m} \sum_{l} \sum_{i} \sum_{j} \big(\Theta^{(l)}_{ij}\big)^2$
(first term: training error; second term: regularization)
Gradient computation
Need code to compute:
– the cost $J(\Theta)$
– the gradients $\partial J(\Theta) / \partial \Theta^{(l)}_{ij}$
Cover next time!
Use the “Backpropagation algorithm”:
– an efficient way to compute the gradients
– computes the gradient incrementally by “propagating” backwards through the network
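As a preview (details next lecture), a minimal sketch of backpropagation for the two-layer network above, assuming sigmoid layers and the cross-entropy cost; variable names are illustrative:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(Theta1, Theta2, x, y):
    # Forward pass, caching the activations.
    h = g(Theta1 @ x)
    out = g(Theta2 @ h)
    # Backward pass: the error flows output -> hidden -> weights.
    delta2 = out - y                             # output-layer error
    delta1 = (Theta2.T @ delta2) * h * (1 - h)   # propagated back through the sigmoid
    grad2 = np.outer(delta2, h)                  # dJ/dTheta2
    grad1 = np.outer(delta1, x)                  # dJ/dTheta1
    return grad1, grad2
```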
Network architectures
[Figure: feed-forward (fully connected, Layer 1 → Layer 2 → Layer 3 → Layer 4), recurrent (hidden state unrolled through time), and convolutional architectures, each mapping input through hidden layers to output.]
Representing images
Fully connected: reshape the image into a vector and feed it to the input layer.
Convolutional Neural Network: a better architecture for 2D signals (e.g., LeNet).
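E.g., flattening an image for a fully connected input layer (array shapes are illustrative):

```python
import numpy as np

image = np.zeros((28, 28))   # a 28x28 grayscale image
x = image.reshape(-1)        # flattened into a 784-dimensional input vector
```

A convolutional network instead keeps the 2D structure and shares weights across spatial locations.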
Why Deep Learning?
The Unreasonable Effectiveness of Deep Features
[Figure: maximal activations of pool5 units, showing the rich visual structure of features deep in the hierarchy. [R-CNN]]
[Figure: conv5 DeConv visualization. [Zeiler-Fergus]]
Summary so far
• A neural network chains together many layers of “neurons” such as logistic units
• Hidden neurons learn more and more abstract non-linear features