
Announcements
• Reminder: self-grading forms for ps1 and ps2 due 10/5 at midnight (Boston)
• ps3 out on Thursday, due 10/8 (1 week)
• LAB this week: go over solutions for the first two homeworks

Agglomerative Clustering Example (bottom-up clustering)
Image source: https://en.wikipedia.org/wiki/Hierarchical_clustering

K-Means for Image Compression

Choose subspace with minimal “information loss”
[Figure: 3D data projected onto a plane spanned by two vectors 𝑢(1), 𝑢(2) ∈ 𝑅3]
Reduce from 2 dimensions to 1 dimension: find a direction (a vector 𝑢(1)) onto which to project the data so as to minimize the projection error.
Reduce from n dimensions to K dimensions: find K vectors 𝑢(1), 𝑢(2), …, 𝑢(𝐾) onto which to project the data so as to minimize the projection error.

PCA Solution
• The solution turns out to be the first K eigenvectors of the data covariance matrix (see Bishop 12.1 for details)
• Closed-form solution: compute it via Singular Value Decomposition (SVD) of the covariance matrix
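To make the recipe concrete, here is a minimal NumPy sketch of PCA via SVD of the covariance matrix. The function name, the centering step, and the variable names (X, K) are illustrative assumptions, not code from the course.

```python
import numpy as np

def pca(X, K):
    """Project an N x n data matrix X onto its top-K principal directions."""
    X_centered = X - X.mean(axis=0)            # center each feature
    Sigma = np.cov(X_centered, rowvar=False)   # n x n data covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)            # Sigma is symmetric: columns of U are its eigenvectors
    U_K = U[:, :K]                             # first K eigenvectors u(1), ..., u(K)
    Z = X_centered @ U_K                       # N x K projections (the compressed representation)
    return Z, U_K
```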

What features to use?
Edges? Shapes? Color?

Today: Outline
• Neural networks: artificial neuron, MLP, sigmoid units; neuroscience inspiration; output vs. hidden layers; linear vs. nonlinear networks
• Feed-forward networks

Intro to Neural Networks
Motivation

Recall: Logistic Regression
Output is probability of label 1 given input:
p(y = 1 | x) = 1 / (1 + e^(−θᵀx))
[Figure: the sigmoid/logistic function g(z), rising from 0 through 0.5 at z = 0 toward 1]
Predict “y = 1” if hθ(x) ≥ 0.5; predict “y = 0” if hθ(x) < 0.5

Recall: Logistic Regression Cost
Logistic Regression Hypothesis:
hθ(x) = 1 / (1 + e^(−θᵀx))
θ: parameters
D = {(xᵢ, yᵢ)}: data
Logistic Regression Cost Function:
J(θ) = −(1/N) Σᵢ [ yᵢ log hθ(xᵢ) + (1 − yᵢ) log(1 − hθ(xᵢ)) ]
Goal: minimize the cost J(θ) with respect to θ
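As a sanity check, a minimal NumPy sketch of the hypothesis and cost above, averaged over N examples. The function names (sigmoid, logistic_cost) and the small epsilon for numerical safety are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Cross-entropy cost for logistic regression.
    X: N x d data matrix, y: length-N vector of 0/1 labels, theta: length-d weights."""
    h = sigmoid(X @ theta)                 # h_theta(x_i) for every example
    eps = 1e-12                            # avoid log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```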

Cost: Intuition
Logistic regression cost function
If y = 1: cost = −log hθ(x)
[Figure: −log hθ(x) plotted over hθ(x) ∈ [0, 1] — the cost is 0 when hθ(x) = 1 and grows without bound as hθ(x) → 0]

Cost: Intuition
Logistic regression cost function
If y = 0: cost = −log(1 − hθ(x))
[Figure: −log(1 − hθ(x)) plotted over hθ(x) ∈ [0, 1] — the cost is 0 when hθ(x) = 0 and grows without bound as hθ(x) → 1]

Decision boundary
[Figure: linearly separable data in the (x1, x2) plane; a straight line separates the points predicted “y = 1” from those predicted “y = 0”]

Non-linear decision boundaries
[Figure: data in the (x1, x2) plane that can only be separated by a curved boundary]
Replace features with non-linear functions, e.g. log, cosine, or polynomial
Predict “y = 1” if the classifier’s output on the transformed features is at least 0.5
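For instance, a hand-specified polynomial transformation lets a linear classifier draw a curved boundary. The particular feature set below is just one illustrative possibility, not the one used on the slide.

```python
import numpy as np

def poly_features(X):
    """Map (x1, x2) to hand-chosen non-linear features: [1, x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# Logistic regression trained on poly_features(X) can now learn a circular
# decision boundary, even though the model itself is still linear in its inputs.
```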

Limitations of linear models
• Logistic regression and other linear models cannot handle nonlinear decision boundaries
• Must use non-linear feature transformations
• Up to the designer to specify which one
• Can we instead learn the transformation?
• Yes, this is what neural networks do!
• A neural network chains together many layers of “neurons” such as logistic units (logistic regression functions)

Neural Networks learn features
Image: http://www.amax.com/blog/wp-content/uploads/2015/12/blog_deeplearning3.jpg

Neurons in the Brain
Inspired “Artificial Neural Networks”
Neurons are cells that process chemical and electrical signals and transmit these signals to other neurons and other types of cells

Neuron in the brain
[Figure: biological neuron — dendrites (“input wires”), cell body with nucleus, axon (“output wire”)]
Can measure electrical activity (spikes) of a single neuron by placing electrodes
Image: http://webspace.ship.edu/cgboer/neuron.gif

Neural network in the brain
• Micro networks: several connected neurons perform sophisticated tasks: mediate reflexes, process sensory information, generate locomotion and mediate learning and memory.
• Macro networks: perform higher brain functions such as object recognition and cognition.

Logistic Unit as Artificial Neuron
[Figure: an artificial neuron — the inputs are multiplied by weights, summed, and “squashed” by the sigmoid to produce the output]

Logistic Unit as Artificial Neuron
[Figure: worked example — the inputs are multiplied by the weights and summed; here the weighted sum is −8, which the sigmoid squashes to an output near 0]

Logistic Unit as Artificial Neuron
Neurons learn patterns!
[Figure: two input patterns shown with the same weights — one gives a weighted sum of −8 and output near 0, the other a weighted sum of +8 and output near 1]
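A minimal sketch of the computation in these figures: multiply the inputs by the weights, sum, and squash with the sigmoid. The weight and input values below are illustrative, chosen to reproduce the ±8 sums in the figures.

```python
import numpy as np

def neuron(x, w):
    """One artificial neuron: weighted sum of the inputs, squashed by the sigmoid."""
    z = np.dot(w, x)                 # multiply by weights and sum
    return 1.0 / (1.0 + np.exp(-z))  # squash to (0, 1)

w = np.array([0.0, -2.0, 0.0, 2.0])                   # weights (illustrative values)
print(neuron(np.array([4.0,  2.0, 3.0, -2.0]), w))    # sum = -8  -> output ~ 0
print(neuron(np.array([4.0, -2.0, 3.0,  2.0]), w))    # sum = +8  -> output ~ 1
```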

Artificial Neuron Learns Patterns
• Classify input into class 0 or 1
• Teach the neuron to predict the correct class label
• Detect the presence of a simple “feature”
[Figure: example input patterns — the neuron outputs class 1 when the learned pattern is present, and class 0 for other patterns or as the input values decrease]

Neural Networks: Learning
Intuition

Artificial Neuron: Learning
[Figure: start with random weights; multiply the inputs by the weights, sum, and squash. Here the activation is 0 but the class label is 1, so the weights must be adjusted]

Artificial Neuron: Learning
[Figure: after adjusting the weights, the same input produces an activation of 1, which now matches the class label 1]

Artificial Neuron: Learning
[Figure: start with random weights; while the activation does not match the class label, adjust the weights]
Adjust the weights via gradient descent — same as in logistic regression
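A minimal sketch of “adjust weights via gradient descent” for a single logistic neuron, using the standard logistic-regression (cross-entropy) gradient. The function name, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_neuron(X, y, lr=0.1, steps=1000):
    """Fit one logistic neuron by gradient descent on the average cross-entropy cost."""
    w = np.random.randn(X.shape[1]) * 0.01   # start with small random weights
    for _ in range(steps):
        h = sigmoid(X @ w)                   # activations for all examples
        grad = X.T @ (h - y) / len(y)        # gradient of the average cross-entropy
        w -= lr * grad                       # adjust weights downhill
    return w
```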

Neural Networks: Learning
Multi-layer network

Artificial Neuron: simplify
[Figure: simplified neuron diagram — inputs multiplied by weights, summed, and squashed into an activation]

Artificial Neuron: simplify
[Figure: neuron drawn compactly as input → weights → output]

Artificial Neural Network
[Figure: from a single neuron to a neural network — input layer, hidden layer, output layer]
Deep Network: many hidden layers

Multi-layer perceptron (MLP)
• Just another name for a feed-forward neural network
• Logistic regression is a special case of the MLP with no hidden layer and sigmoid output

Neural Networks Learn Features
logistic regression unit == artificial neuron
chain several units together == neural network
“earlier” units learn a non-linear feature transformation
[Figure: a simple neural network whose hidden units carve up the (x1, x2) plane, producing a non-linear decision boundary]
h(x) = g(θ0 + θ1 h(1)(x) + θ2 h(2)(x) + θ3 h(3)(x))

Example

Training a neural net: Demo
Tensorflow playground

Artificial Neural Network:
general notation
[Figure: network with input units x1 … x5, hidden units h1, h2, h3, and an output layer]
input: x = (x1, …, x5)
hidden layer activations: a = (h1, h2, h3), a = g(Θ(1) x), where g(z) = 1 / (1 + exp(−z))
output: hΘ(x) = g(Θ(2) a)
weights:
Θ(1) = [ θ11 ⋯ θ15 ]      Θ(2) = [ θ11 ⋯ θ13 ]
       [  ⋮  ⋱  ⋮  ]             [  ⋮  ⋱  ⋮  ]
       [ θ31 ⋯ θ35 ]             [ θ31 ⋯ θ33 ]
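A minimal NumPy sketch of this forward pass for the network drawn above (5 inputs, 3 hidden units). The weight values are random placeholders, and the single output row in Theta2 is an assumption made for illustration.

```python
import numpy as np

def g(z):
    """Sigmoid nonlinearity."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Forward pass: hidden activations a = g(Theta1 x), output h = g(Theta2 a)."""
    a = g(Theta1 @ x)      # hidden layer activations, one per hidden unit
    return g(Theta2 @ a)   # network output

# Illustrative shapes: 5 inputs, 3 hidden units, 1 output.
Theta1 = np.random.randn(3, 5)
Theta2 = np.random.randn(1, 3)
x = np.random.randn(5)
print(forward(x, Theta1, Theta2))
```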

Cost function
Neural network cost = training error + regularization
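As a hedged sketch (not necessarily the exact form used in this course), a common choice for a sigmoid-output network combines the cross-entropy training error with an L2 penalty on the weight matrices:

```latex
J(\Theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log h_\Theta(x_i) + (1-y_i)\log\big(1-h_\Theta(x_i)\big) \Big]
            \;+\; \frac{\lambda}{2N}\sum_{l} \big\lVert \Theta^{(l)} \big\rVert_F^2
```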

Gradient computation
Need code to compute:
– the cost J(Θ)
– the partial derivatives of J(Θ) with respect to the weights Θ

Cover next time!
Use “Backpropagation algorithm”
– Efficient way to compute these partial derivatives
– Computes gradient incrementally by “propagating” backwards through the network

Network architectures
Architectures shown: Feed-forward, Recurrent (with connections across time), Fully connected, Convolutional
[Figure: each network drawn as layers (Layer 1 … Layer 4) mapping input through hidden layers to output]

Representing images
Fully connected network: reshape the image into a vector and feed it to the input layer
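A minimal sketch of “reshape into a vector” for a fully connected input layer; the 28×28 image size is an assumption made for illustration.

```python
import numpy as np

image = np.random.rand(28, 28)   # a grayscale image (assumed 28 x 28 pixels)
x = image.reshape(-1)            # flatten to a length-784 vector
# x can now be fed to the input layer of a fully connected network,
# but flattening throws away the 2D neighborhood structure of the pixels.
```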

Convolutional Neural Network
A better architecture for 2D signals such as images
LeNet
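To contrast with the flattened, fully connected input above, here is a minimal sketch of the 2D convolution operation that convolutional layers are built from (single channel, no padding or stride); this is an illustrative sketch, not the actual LeNet code.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image with a small kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)  # same weights reused at every location
    return out

# Because the same small kernel slides over the whole image, a convolutional
# layer has far fewer parameters than a fully connected layer on the flattened pixels.
```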

Why Deep Learning?
The Unreasonable Effectiveness of Deep Features
Maximal activations of pool5 units [R-CNN]
Rich visual structure of features deep in the hierarchy
conv5 DeConv visualization [Zeiler-Fergus]

Summary so far
• A neural network chains together many layers of “neurons” such as logistic units
• Hidden neurons learn more and more abstract non-linear features

Next Class
Neural Networks I: Learning
Learning via gradient descent; computation graphs; the backpropagation algorithm.
Reading: Bishop Ch 5.1-5.3