
CS 4610/5335
Deep Learning and Computer Vision
Robert Platt, Northeastern University
Material adapted from:
1. Lawson Wong, CS 5100

Classification
Use features (x) to predict targets (y)
Targets y are now either:
– Binary: {0, 1}
– Multi-class: {1, 2, …, K}
We will focus on binary case (Ex5 Q6 covers multi-class)

Classification
Focus: Supervised learning (e.g., regression, classification)
Use features (x) to predict targets (y)
Input: Dataset of n samples: {x(i), y(i)}, i = 1, …, n
Each x(i) is a p-dimensional vector of feature values
Output: Hypothesis hθ(x) in some hypothesis class H
H is parameterized by d-dim. parameter vector θ
Goal: Find the best hypothesis θ* within H
What does “best” mean? Optimizes objective function:
J(θ): Error fn. L(pred, y): Loss fn.
A learning algorithm is the procedure for optimizing J(θ)
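One common convention (an assumption here — the slide leaves the relationship implicit) is that the error function averages the per-sample loss over the training set:
J(θ) = (1/n) · Σi=1..n L(hθ(x(i)), y(i))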

Biological neuron
[Figure: labeled diagram of a biological neuron: dendrites, cell body, nucleus, axon, myelin sheath (Schwann cells), nodes of Ranvier, axon terminals]

Artificial neuron
McCulloch-Pitts model (1943): fixed weights
Rosenblatt (1957): learnable weights + bias term
Learning algorithm: Perceptron

Artificial neuron
Artificial neuron can represent basic logic gates (assume threshold fires when weighted sum ≥ 0)
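As a concrete sketch (my own example, not from the slides): AND, OR, and NOT each correspond to one choice of weights for a single threshold unit that fires when w0 + w1·a1 + … ≥ 0.

```python
# A single threshold neuron: outputs 1 ("fires") when the weighted sum >= 0.
def neuron(weights, inputs):
    w0, ws = weights[0], weights[1:]
    z = w0 + sum(w * a for w, a in zip(ws, inputs))
    return 1 if z >= 0 else 0

# One possible set of weights [w0, w1, w2] realizing each gate:
AND = [-1.5, 1.0, 1.0]   # fires only when both inputs are 1
OR  = [-0.5, 1.0, 1.0]   # fires when at least one input is 1
NOT = [ 0.5, -1.0]       # single input: fires when the input is 0

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "AND:", neuron(AND, [a, b]), "OR:", neuron(OR, [a, b]))
```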

Artificial neural networks
Artificial neuron can represent basic logic gates (assume threshold fires when weighted sum ≥ 0)
Artificial neural network (ANN) can represent any logical circuit / function!
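For example (my own construction, not from the slides): a single threshold unit cannot compute XOR, but a two-layer network of the same units can, via XOR(a, b) = AND(OR(a, b), NAND(a, b)).

```python
def step(z):                       # hard threshold: fires when z >= 0
    return 1 if z >= 0 else 0

def xor(a, b):
    h1 = step(-0.5 + a + b)        # hidden unit 1: OR(a, b)
    h2 = step( 1.5 - a - b)        # hidden unit 2: NAND(a, b)
    return step(-1.5 + h1 + h2)    # output unit:   AND(h1, h2)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```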

Artificial neural networks
How do we train a neuron?

Artificial neural networks
How do we train a neuron?
Parameters: w (on input links)
Hypothesis: Output = g(w0 + w1 a1 + … + wp ap)
Objective: Error / loss function between output and target y

Artificial neural networks
Objective: Error / loss function between output and target y
g = Hard threshold: Perceptron algorithm
Works well for single neurons, but not for networks
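A minimal sketch of the perceptron rule in this setting (standard form; labels y in {0, 1}, and the learning rate and epoch count below are arbitrary choices, not values from the slides):

```python
import numpy as np

def perceptron(X, y, epochs=10, lr=1.0):
    """X: n x p feature matrix; y: n labels in {0, 1}."""
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])      # prepend bias feature x0 = 1
    w = np.zeros(p + 1)
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if w @ xi >= 0 else 0    # hard-threshold output g
            w += lr * (yi - pred) * xi        # update only on mistakes
    return w
```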

Artificial neural networks
Objective: Error / loss function between output and target y
g = Hard threshold: Perceptron algorithm
Works well for single neurons, but not for networks
How about gradient descent? Need smooth g

Artificial neural networks
How about gradient descent? Need smooth g
Many choices of activation functions!

Artificial neural networks
How about gradient descent? Need smooth g
Many choices of activation functions!
Most popular: Rectified linear unit (ReLU)

Artificial neural networks
How about gradient descent? Need smooth g
Many choices of activation functions!
Most popular: Rectified linear unit (ReLU)
We will consider sigmoid (logistic)
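A quick sketch (mine, not from the slides) of the two activations just mentioned, along with the derivatives that gradient descent needs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # identity: sigma'(z) = sigma(z) * (1 - sigma(z))

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)    # (sub)gradient; 0 is used at z = 0
```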

Artificial neural networks
Parameters: w (on input links)
Hypothesis: Output = σ(w0 + w1 a1 + … + wp ap)
Seem familiar?

Artificial neural networks
Parameters: w (on input links)
Hypothesis: Output = σ(w0 + w1 a1 + … + wp ap)
Seem familiar? Logistic regression = learning single neuron

Artificial neural networks
Input: x (x0 = bias)
Parameters: w
Weighted input: z11 (weighted sum Σ w·x)
Activation function: σ (sigmoid)
Activation: a11 = σ(z11)
Prediction = a11

Artificial neural networks
Assume squared-error loss
Compute gradient, perform SGD
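A sketch of one stochastic gradient step for the single sigmoid neuron, in the z11/a11 notation of the previous slide (the 1/2 factor in the loss and the learning rate are my assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, x, y, lr=0.1):
    """One SGD step on the loss L = 0.5 * (a11 - y)^2."""
    xb = np.concatenate(([1.0], x))              # x0 = 1 pairs with the bias weight w0
    z11 = w @ xb                                 # weighted input
    a11 = sigmoid(z11)                           # activation = prediction
    grad = (a11 - y) * a11 * (1.0 - a11) * xb    # chain rule: (a11 - y) * sigma'(z11) * x
    return w - lr * grad
```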

Artificial neural networks
Input: x (x0 = bias)
Parameters: v (layer 1), w (layer 2)
Weighted input: z21 (weighted sum Σ w·a1 of the layer-1 activations)
Activation function: σ (sigmoid)
Activation: a21 = σ(z21)
Prediction = a21
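A forward-pass sketch in the same notation: layer-1 weights v (a matrix V, one row per hidden unit), layer-2 weights w, sigmoid everywhere (my own code; shapes are chosen by the caller):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, V, w):
    """V: layer-1 weights, shape (h, p+1); w: layer-2 weights, shape (h+1,)."""
    xb = np.concatenate(([1.0], x))      # bias input x0 = 1
    z1 = V @ xb                          # layer-1 weighted inputs z1j
    a1 = sigmoid(z1)                     # layer-1 activations a1j
    a1b = np.concatenate(([1.0], a1))    # bias unit for layer 2
    z21 = w @ a1b                        # layer-2 weighted input
    a21 = sigmoid(z21)                   # prediction
    return a21
```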

Artificial neural networks
Assume squared-error loss
Compute gradient, perform SGD

Artificial neural networks
Underlined terms are the same!

Artificial neural networks
Underlined terms are the same!
They will appear in every gradient term in all layers
Avoid recomputing this term

Artificial neural networks
Underlined terms are the same!
They will appear in every gradient term in all layers
Avoid recomputing this term: Key idea of backpropagation
Bryson & Ho (1969)
Linnainmaa (1970)
Werbos (1974)
Rumelhart, Hinton, Williams (1986)

Artificial neural networks
Underlined terms are the same!
They will appear in every gradient term in all layers
Avoid recomputing this term: Key idea of backpropagation
Learning with backprop
= using gradient descent to learn neural networks,
where gradients are computed efficiently

Artificial neural networks
Backpropagation
Forward pass: compute activations (a)
Backward pass: compute errors (Δ)
Adjust weights
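A compact sketch of one backprop step for the two-layer sigmoid network sketched earlier, under squared-error loss (the 1/2 loss factor and the learning rate are my assumptions). Note that the output error delta2 is computed once in the backward pass and then reused in both layers' weight updates — this is the shared term the earlier slides underline:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(V, w, x, y, lr=0.1):
    """One SGD step on L = 0.5 * (a2 - y)^2 for a two-layer sigmoid network."""
    # Forward pass: compute activations
    xb = np.concatenate(([1.0], x))
    z1 = V @ xb
    a1 = sigmoid(z1)
    a1b = np.concatenate(([1.0], a1))
    z2 = w @ a1b
    a2 = sigmoid(z2)
    # Backward pass: compute errors (deltas)
    delta2 = (a2 - y) * a2 * (1.0 - a2)          # output-layer error
    delta1 = delta2 * w[1:] * a1 * (1.0 - a1)    # hidden-layer errors (bias weight skipped)
    # Adjust weights
    w = w - lr * delta2 * a1b                    # dL/dw = delta2 * a1
    V = V - lr * np.outer(delta1, xb)            # dL/dV = outer(delta1, x)
    return V, w
```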

Convolutional layers
Deep multi-layer perceptron networks
– general purpose
– involve huge numbers of weights
We want:
– special purpose network for image and NLP data
– fewer parameters
– fewer local minima
Answer: convolutional layers!

Convolutional layers
[Figure: a filter sliding over an image, annotated with filter size, stride, and pixels]

Convolutional layers
All of these weight groupings are tied to each other
[Figure: a filter sliding over an image, annotated with filter size, stride, and pixels]

Convolutional layers
All of these weight groupings are tied to each other
[Figure: a filter sliding over an image, annotated with filter size, stride, and pixels]
Because of the way weights are tied together:
– reduces number of parameters (dramatically)
– encodes a prior on structure of data
In practice, convolutional layers are essential to computer vision…
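A rough back-of-the-envelope comparison (my own numbers, not the slide's) of why tying the weights cuts the parameter count so dramatically:

```python
# Fully connected layer: every output unit has its own weight for every input pixel.
H, W = 28, 28                    # a small grayscale image
fc_params = (H * W) * (H * W)    # 784 inputs -> 784 outputs: 614,656 weights

# Convolutional layer: one 5x5 filter, shared (tied) across every image position.
conv_params = 5 * 5              # 25 weights, regardless of image size

print(fc_params, conv_params)    # 614656 vs. 25 (plus biases in both cases)
```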

Convolutional layers
Two dimensional example:
Why do you think they call this “convolution”?
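A minimal sketch of the two-dimensional operation (my own example). Strictly speaking, what deep-learning libraries call "convolution" is usually cross-correlation: the kernel is slid over the image and a dot product is taken at each position, without the flip used in the signal-processing definition.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image; take a dot product at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride + kh, j*stride:j*stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d(image, kernel))     # 4x4 convolved feature map
```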

Think-pair-share
What would the convolved feature map be for this kernel?

Example: MNIST digit classification with LeNet
MNIST dataset: 10,000 images of handwritten digits
Objective: classify each image as the corresponding digit

Example: MNIST digit classification with LeNet
LeNet:
– two convolutional layers (conv, relu, pooling)
– two fully connected layers (relu)
– last layer has logistic activation function

Example: MNIST digit classification with LeNet
Load dataset, create train/test splits

Example: MNIST digit classification with LeNet
Define the neural network structure:
Input → Conv1 → Conv2 → FC1 → FC2
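The slides build this network in MATLAB (that code is not reproduced here). A rough PyTorch equivalent of the Input → Conv1 → Conv2 → FC1 → FC2 stack might look like the sketch below; all layer sizes are my assumptions, not values from the slides.

```python
import torch.nn as nn

lenet = nn.Sequential(
    # Conv1: conv -> relu -> pooling
    nn.Conv2d(1, 6, kernel_size=5, padding=2),   # 1x28x28 -> 6x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> 6x14x14
    # Conv2: conv -> relu -> pooling
    nn.Conv2d(6, 16, kernel_size=5),             # -> 16x10x10
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> 16x5x5
    nn.Flatten(),
    # FC1: relu
    nn.Linear(16 * 5 * 5, 120),
    nn.ReLU(),
    # FC2: one score per digit class (the slides' logistic/softmax output is
    # typically folded into the training loss rather than added as a layer here)
    nn.Linear(120, 10),
)
```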

Example: MNIST digit classification with LeNet
Train network, classify test set, measure accuracy
– notice we test on a different set (a holdout set) than we trained on
Using the GPU makes a huge difference…
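The example in the slides is MATLAB; below is a hedged PyTorch sketch of the same train-then-evaluate-on-a-holdout-set loop, reusing the lenet stack sketched above (dataset paths, batch sizes, epochs, and optimizer settings are my assumptions).

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"   # the GPU helps a lot

train = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
test = datasets.MNIST("data", train=False, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train, batch_size=64, shuffle=True)
test_loader = DataLoader(test, batch_size=256)

model = lenet.to(device)                  # the LeNet-style stack sketched above
opt = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):                    # train on the training split
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

correct = 0                               # evaluate on the holdout (test) split
with torch.no_grad():
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
print("test accuracy:", correct / len(test))
```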

Deep learning packages
You don’t need to use Matlab (obviously)
TensorFlow is probably the most popular platform
Caffe and Theano are also big

Another example: image classification w/ AlexNet
ImageNet dataset: millions of images of objects
Objective: classify each image as the corresponding object (1k categories in ILSVRC)

Another example: image classification w/ AlexNet
AlexNet has 8 layers: five conv followed by three fully connected
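For reference (a sketch, not part of the slides): a model with the same 5-conv + 3-fully-connected shape ships with torchvision, though it is a later variant of AlexNet rather than an exact reproduction of the 2012 network.

```python
from torchvision import models

alexnet = models.alexnet()    # randomly initialized; 5 conv layers + 3 fully connected
print(alexnet)                # .features holds the conv stack, .classifier the FC stack
```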

Another example: image classification w/ AlexNet
AlexNet won the 2012 ILSVRC challenge – sparked the deep learning craze