Introduction to Machine Learning: Convolutional Neural Networks
Prof. Kutty
Input layer
Neural Networks
architecture
Hidden layers
Fully connected (FC): each node is connected to all nodes from the previous layer
Output layer
h(x̄, W) = f(z)
examples of activation functions:
• logistic: σ(z) = 1/(1 + e^(−z)), range 0 to 1
• hyperbolic tangent: tanh(z) = 2σ(2z) − 1, range −1 to 1
• ReLU: f(z) = max(0, z)
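A quick numpy sketch of these three activations (the helper names below are mine, not from the slides):

```python
import numpy as np

# the three activation functions listed above
def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid; range (0, 1)

def tanh(z):
    return 2.0 * logistic(2.0 * z) - 1.0   # same as np.tanh(z); range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # range [0, inf)
```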
at each node:
weighted sum of inputs
non-linear transformation
Input layer
Hidden layer
Neural Networks
z_1^{(2)} = W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{10}^{(1)},   h_1^{(2)} = g(z_1^{(2)})
z_2^{(2)} = W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{20}^{(1)},   h_2^{(2)} = g(z_2^{(2)})
z_3^{(2)} = W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{30}^{(1)},   h_3^{(2)} = g(z_3^{(2)})
The hidden layer essentially transforms the input into φ(x̄) = [h_1^{(2)}, h_2^{(2)}, h_3^{(2)}, 1]…
Output layer:
z^{(3)} = W_{11}^{(2)} h_1^{(2)} + W_{12}^{(2)} h_2^{(2)} + W_{13}^{(2)} h_3^{(2)} + W_{10}^{(2)}
h(x̄, W) = f(z^{(3)})
…so that we can learn a linear classifier in a different feature space.
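A minimal numpy sketch of this two-input, three-hidden-unit forward pass; the choices of g = tanh and an identity output f below are only for illustration (the slides leave g and f generic):

```python
import numpy as np

def forward(x, W1, b1, W2, b2, g=np.tanh, f=lambda z: z):
    z2 = W1 @ x + b1      # z_j^{(2)} = W_{j1}^{(1)} x_1 + W_{j2}^{(1)} x_2 + W_{j0}^{(1)}
    h2 = g(z2)            # h_j^{(2)} = g(z_j^{(2)})
    z3 = W2 @ h2 + b2     # z^{(3)} = sum_j W_{1j}^{(2)} h_j^{(2)} + W_{10}^{(2)}
    return f(z3)          # h(x̄, W) = f(z^{(3)})

# tiny example: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
print(forward(np.array([1.0, -2.0]), W1, b1, W2, b2))
```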
Training Neural Networks
Idea: use back-propagation
back-propagation = gradient descent + chain rule
Algorithm Overview
• Initialize parameters θ̄ to small random values
• For each training instance (x̄^{(i)}, y^{(i)})
– make a prediction h(x̄^{(i)}, θ̄)
• called forward propagation
– measure the loss of the prediction h(x̄^{(i)}, θ̄) with respect to the true label y^{(i)}
• Loss(y^{(i)} h(x̄^{(i)}, θ̄))
– go through each layer in reverse to measure the error contribution of each connection
• called backward propagation
– tweak the weights to reduce the error (SGD update step)
SGD for a 2-layer NN
Idea: sample a point at random, nudge parameters toward values that would improve classification on that particular example
(0) Initialize parameters to small random values
(1) Select a point at random
(2) Update the parameters based on that point and the gradient:
θ̄^{(k+1)} = θ̄^{(k)} − η_k ∇_{θ̄} Loss(y^{(i)} h(x̄^{(i)}; θ̄))
(for a two-layer NN):
(2nd layer of weights)  v_j^{(k+1)} = v_j^{(k)} + η_k y h_j [[(1 − yz) > 0]]
(1st layer of weights)  w_{ji}^{(k+1)} = w_{ji}^{(k)} + η_k y [[(1 − yz) > 0]] v_j [[z_j > 0]] x_i
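A numpy sketch of these updates for one sampled point (x̄^{(i)}, y^{(i)}), assuming ReLU hidden units z_j = w̄_j · x̄ with h_j = max(0, z_j), output z = v̄ · h̄, and hinge loss max(0, 1 − yz); bias terms are folded into x̄ and both gradients use the pre-update parameter values:

```python
import numpy as np

def sgd_step(W, v, x, y, eta):
    """One SGD step for the two-layer network above.
    W: (m, d) first-layer weights w_ji; v: (m,) second-layer weights v_j."""
    z_hidden = W @ x                  # z_j
    h = np.maximum(0.0, z_hidden)     # ReLU hidden units h_j
    z = v @ h                         # output z
    if 1.0 - y * z > 0.0:             # indicator [[(1 - yz) > 0]]
        v_new = v + eta * y * h                                # 2nd-layer update
        W_new = W + eta * y * np.outer(v * (z_hidden > 0), x)  # 1st-layer update, uses [[z_j > 0]]
        return W_new, v_new
    return W, v
```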
Good news… universal approximation theorem
Bad news… prone to overfitting
Reducing Overfitting
1. Early stopping: interrupt training when performance on the validation set starts to decrease
2. L1 & L2 regularization: applied as before to all weights
3. Dropout
Srivastava, Hinton, Krizhevsky, Sutskever and Salakhutdinov
(a) Standard Neural Net (b) After applying dropout.
Figure 1: Dropout Neural Net Model. Left: A standard neural net with 2 hidden layers. Right: An example of a thinned net produced by applying dropout to the network on the left.
Crossed units have been dropped.
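A minimal sketch of dropout on a vector of hidden activations; this is the "inverted dropout" variant (rescaling at training time), which matches the paper's formulation in expectation, with an assumed drop probability p:

```python
import numpy as np

def dropout(h, p=0.5, training=True):
    if not training:
        return h                              # no units are dropped at test time
    mask = np.random.rand(*h.shape) > p       # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)               # rescale so the expected activation is unchanged
```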
Vanishing/Exploding Gradients
• It turns out that the gradient in deep neural networks
is unstable, tending to either explode or vanish in earlier layers.
• e.g., sometimes gradients get smaller and smaller as the algorithm progresses down to the lower layers; this can leave the lower-layer connection weights virtually unchanged, so training never converges to a good solution
Convolutional Neural Networks
https://xkcd.com/1425/
Convolutional Neural Networks aka ConvNets or CNNs
• class of Neural Networks used primarily in vision
– image recognition and classification
– identifying faces, objects and traffic signs
3 × 224 × 224
3 color channels (RGB); depth is 3
CNNs exploit the inherent structure in images
• this has the effect of reducing the number of parameters
Example CNN architecture
https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html
• Convolution (CONV)
• Rectified Linear Unit (ReLU)
• Pooling (POOL)
• Fully Connected Layer (FC)
CNNs: types of layers
• Convolution (CONV)
– each neuron in this layer is connected to a small region of the previous layer
– called ‘filters’
input image
• Rectified Linear Unit (ReLU)
– elementwise activation function max(0, z)
– no parameters to learn here
• Pooling (POOL)
– downsampling
– no parameters to learn here
• Fully Connected Layer (FC)
w1x1 + … + wjxj
CNNs: CONV layer
output layer
Image source: https://github.com/vdumoulin/conv_arithmetic
CNNs: CONV layer
input: image, filter
output: convolved feature or feature map
filter
stride length = 1
example from: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
CNNs: CONV layer
input: image, filter
output: convolved feature or feature map
feature map
input image
w1x1 + … + wjxj
• each neuron (filter) in this layer is connected to a small region of the previous layer
• applied to all channels (i.e., each filter has same depth as image)
Note: the connectivity is local in space, but full along the input depth. Figure from: Fundamentals of Deep Learning
image depth
each filter produces a new feature map
Figure from: Fundamentals of Deep Learning
6 filters: each 3x3x3
output of depth 6
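A naive loop-based sketch of this: 6 filters, each 3x3x3, slid over a 3-channel image with stride 1 and no padding; each filter spans the full input depth and yields one feature map, so the output has depth 6 (the small image size in the demo is just to keep the loops fast):

```python
import numpy as np

def conv_layer(image, filters, biases):
    C, H, W = image.shape                    # channels, height, width (e.g., 3 x 224 x 224)
    F, _, k, _ = filters.shape               # number of filters, depth, filter size (e.g., 6 x 3 x 3 x 3)
    out = np.zeros((F, H - k + 1, W - k + 1))
    for f in range(F):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                patch = image[:, i:i + k, j:j + k]                    # local region, full depth
                out[f, i, j] = np.sum(patch * filters[f]) + biases[f]
    return out

image = np.random.rand(3, 32, 32)            # small stand-in for a 3 x 224 x 224 image
filters = np.random.rand(6, 3, 3, 3)         # 6 filters, each 3x3x3
print(conv_layer(image, filters, np.zeros(6)).shape)   # (6, 30, 30): output of depth 6
```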
zero-padding
usually done to preserve the original image size in the CONV layer.
zero padding
zero padding 1, stride length 2
Image source: https://github.com/vdumoulin/conv_arithmetic
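The usual output-size bookkeeping (a standard formula, stated here as an aside rather than taken from the slides): with input width W, filter size F, zero padding P and stride S, the output width is (W − F + 2P)/S + 1.

```python
def conv_output_size(W, F, P, S):
    # (W - F + 2P) / S + 1, assuming the numbers divide evenly
    return (W - F + 2 * P) // S + 1

# e.g., a 5x5 input with a 3x3 filter, zero padding 1, stride length 2 -> 3x3 output
print(conv_output_size(5, 3, 1, 2))   # 3
```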
• Each filter produces a new feature map.
– Intuitively, the network learns filters that activate on details in initial layers and more 'holistic' features in later layers
• e.g., edge detection in the first layer and wheel-like patterns in higher layers
• Hyperparameters:
– number of filters
– stride length
– filter size
– zero-padding
• Efficiency concerns
– parameter sharing (same weights across all depths)
CNNs: types of layers
• Convolution (CONV)
• Rectified Linear Unit (ReLU)
• Pooling (POOL)
• Fully Connected Layer (FC)
CNNs: ReLU layer
f(z) = max(0,z)
images from: http://cs231n.github.io/convolutional-networks/; https://medium.com/data-science-group-iitr/building-a-convolutional-neural-network-in-python-with-tensorflow-d251c3ca8117
activation map
CNNs: types of layers
• Convolution (CONV)
• Rectified Linear Unit (ReLU)
• Pooling (POOL)
• Fully Connected Layer (FC)
CNNs: Pooling layer
Idea: reduce the dimensionality of the feature map while retaining significant information; as a consequence, this helps control overfitting.
CONV; ReLU
In max pooling, the pooling operation is applied separately to each feature map so the depth dimension remains unchanged
example from: http://cs231n.github.io/convolutional-networks/#pool
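A minimal numpy sketch of max pooling (a 2x2 window with stride 2 is assumed here), applied to each feature map independently so the depth is preserved:

```python
import numpy as np

def max_pool(feature_maps, size=2, stride=2):
    D, H, W = feature_maps.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((D, out_h, out_w))
    for d in range(D):                        # pool each feature map separately
        for i in range(out_h):
            for j in range(out_w):
                window = feature_maps[d,
                                      i * stride:i * stride + size,
                                      j * stride:j * stride + size]
                out[d, i, j] = window.max()   # keep the strongest activation in the window
    return out

print(max_pool(np.random.rand(6, 224, 224)).shape)   # (6, 112, 112): depth unchanged
```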
CNNs: types of layers
• Convolution (CONV)
• Rectified Linear Unit (ReLU)
• Pooling (POOL)
• Fully Connected Layer (FC)
CONV; ReLU; POOL
Neural Networks: Example
Softmax neurons
These give a real-valued output that is a smooth and bounded function of their total input.
– The outputs sum up to 1 (useful for classification problems)
– They have nice derivatives which make analysis easier!
softmax: p_i = e^{z_i} / Σ_j e^{z_j}
image src: https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax
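A small numpy sketch of the softmax computation (subtracting the max is a standard numerical-stability trick, not something the slides require):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability; does not change the result
    return e / e.sum()          # positive outputs that sum to 1

print(softmax(np.array([2.0, 1.0, 0.1])))   # approx [0.66, 0.24, 0.10]
```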
CNNs: types of layers
• Convolution (CONV)
• Rectified Linear Unit (ReLU)
• Pooling (POOL)
• Fully Connected Layer (FC)
Neat visualization available here: http://scs.ryerson.ca/~aharley/vis/conv/flat.html
Training a CNN
• Initialize parameters
• Forward propagate a training example (image)
– i.e., convolution, ReLU, pooling and Fully Connected layers
– compute class probabilities
• Calculate the loss at the output layer
• Use backprop to calculate the error contribution of each weight
• Update parameter values to minimize the output error.
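A hedged sketch of this training loop in PyTorch; the framework, the tiny architecture, and the random stand-in data below are all assumptions for illustration (the slides do not prescribe any of them):

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=3, padding=1),   # CONV: 6 filters over a 3-channel image
    nn.ReLU(),                                   # ReLU
    nn.MaxPool2d(2),                             # POOL: 224x224 -> 112x112
    nn.Flatten(),
    nn.Linear(6 * 112 * 112, 10),                # FC layer -> class scores
)
loss_fn = nn.CrossEntropyLoss()                  # softmax + loss at the output layer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# stand-in for a real data loader: one batch of 4 random 3x224x224 "images"
loader = [(torch.randn(4, 3, 224, 224), torch.randint(0, 10, (4,)))]

for images, labels in loader:
    optimizer.zero_grad()
    scores = model(images)           # forward propagation through CONV/ReLU/POOL/FC
    loss = loss_fn(scores, labels)   # loss at the output layer
    loss.backward()                  # backprop: error contribution of each parameter
    optimizer.step()                 # update parameters to reduce the output error
```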