Introduction to Machine Learning: Training Neural Networks
Prof. Kutty
Review: Supervised Learning
• Perceptron
  • with and without offset
  • convergence
• Regression
  • linear regression with squared loss
  • closed form solution
• Regularization
  • ridge regression: penalty on model complexity
  • closed form solution
• Decision trees
• Ensemble Methods
• (Stochastic) Gradient Descent
  • linear classifier with hinge loss
• Support Vector Machines
  • Soft Margin SVMs
  • Kernel trick
• Neural Networks
Announcements
• Midterm evaluations will open on Feb. 21 and close at 11:59 pm (EST) on Mar. 5 (set by registrar – see your email to confirm dates/times).
• All lecture recordings should now be accessible through the lecture recordings tab (including the 3-part Lecture 8)
• The midterm exam will take place 7:00 – 9:00 pm ET on Wednesday 2/23.
• All students approved for the alternate midterm and/or SSD accommodations have been contacted, and this list is finalized.
• There will be no lecture quiz due this week, no lecture on Wednesday (2/23), and no discussion this week.
• Please see the calendar for up-to-date office hour information.
• Sample exam solutions have been released and are available on Canvas. Please review them for your understanding.
Midterm Info
• Please read the preamble (including Additional Instructions) carefully. This will be very similar to the preamble on the actual midterm exam.
• Since the exam is virtual, you may take it from any location:
– quiet room: BBB 1670 and BBB 1690 from 7-9pm
– computer with a stable internet connection during the exam: https://caen.engin.umich.edu/computers/list/
• We will be available on Zoom; the link is on the exam, the calendar, and in an announcement
– only join if you have questions!
Review: Supervised Learning
• Perceptron
  • with and without offset
  • convergence
• Regression
  • linear regression with squared loss
  • closed form solution
• Regularization
  • ridge regression
  • closed form solution
• Decision trees
• Ensemble Methods
• (Stochastic) Gradient Descent
  • linear classifier with hinge loss
• Support Vector Machines
  • Soft Margin SVMs
  • Kernel trick
• Neural Networks
Introduction to Deep Learning
Neural Networks: architecture
• Input layer (e.g., a 28 x 28 image gives 784 input units)
• Hidden layers
• Output layer: $h(\bar{x}, W) = f(z)$
At each node: a weighted sum of inputs, followed by a non-linear transformation.
Single layer NN
• Input layer $x_1, \ldots, x_d$ with a bias/offset unit $+1$ (weight $w_0$)
• Output: $h(\bar{x}; \bar{w}) = \mathrm{sign}(z)$, where $z = \sum_{j=1}^{d} w_j x_j + w_0$
• Want to learn the weights $\bar{w} = [w_1, w_2, \ldots, w_d]^T$ and the offset $w_0$
Example: Two-layer NN
By convention, this is called a two-layer network, since there are two sets of weights.
• Input layer: $x_1, x_2$ and a bias unit $+1$
• Hidden layer: units $h^{(2)} = g(z^{(2)})$
• Output layer: $h(\bar{x}, W) = f(z^{(3)})$
Notation: $W_{ki}^{(j)}$ is the weight of layer $j$, unit $k$, input $i$.
Activation Functions
Single layer NN: input layer $x_1, \ldots, x_d$ with a bias/offset unit $+1$; parameters are the weights $\bar{w}$ and the offset $w_0$.
At each node: a weighted sum of inputs $z$, followed by a non-linear transformation given by the activation function.
Examples of activation functions (see the sketch after this list):
• threshold
• logistic: $\sigma(z) = \frac{1}{1 + e^{-z}}$; range 0 to 1
• hyperbolic tangent: $\tanh(z) = 2\sigma(2z) - 1$; range -1 to 1
• ReLU: $f(z) = \max(0, z)$
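As a quick illustration of these activation functions (a sketch, not from the slides; the function names are mine), here they are in NumPy, including a check of the identity $\tanh(z) = 2\sigma(2z) - 1$:

```python
import numpy as np

def threshold(z):
    # hard threshold: +1 if z >= 0, else -1
    return np.where(z >= 0, 1.0, -1.0)

def logistic(z):
    # sigma(z) = 1 / (1 + e^(-z)); range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh_act(z):
    # tanh(z) = 2*sigma(2z) - 1; range (-1, 1)
    return 2.0 * logistic(2.0 * z) - 1.0

def relu(z):
    # f(z) = max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(logistic(z))                            # values in (0, 1)
print(np.allclose(tanh_act(z), np.tanh(z)))   # True: the tanh identity holds
print(relu(z))                                # negative inputs clipped to 0
```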
Neural Networks: forward propagation

Input layer: $x_1, x_2$ and a bias unit $+1$.

Hidden layer:
$z_1^{(2)} = W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{10}^{(1)}$,  $h_1^{(2)} = g(z_1^{(2)})$
$z_2^{(2)} = W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{20}^{(1)}$,  $h_2^{(2)} = g(z_2^{(2)})$
$z_3^{(2)} = W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{30}^{(1)}$,  $h_3^{(2)} = g(z_3^{(2)})$
where $W_{ki}^{(j)}$ is the weight of layer $j$, unit $k$, input $i$.

Output layer:
$z^{(3)} = W_{11}^{(2)} h_1^{(2)} + W_{12}^{(2)} h_2^{(2)} + W_{13}^{(2)} h_3^{(2)} + W_{10}^{(2)}$
$h(\bar{x}, W) = f(z^{(3)})$
Note: $f()$ is a potentially different activation function.

Computing the layers in sequence like this is called forward propagation.

The hidden layer essentially transforms the input, $\phi(\bar{x}) = [h_1^{(2)}, h_2^{(2)}, h_3^{(2)}, 1]$, so that we can learn a linear classifier in a different feature space.
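A minimal NumPy sketch of this forward propagation for the 2-input, 3-hidden-unit, 1-output network above. The particular weight values and the choices $g$ = ReLU, $f$ = sign are assumptions for illustration, not values from the slides:

```python
import numpy as np

def g(z):                    # hidden-layer activation (assumed: ReLU)
    return np.maximum(0.0, z)

def f(z):                    # output activation (assumed: sign; may differ from g)
    return np.sign(z)

# W1[k] = [W_k1^(1), W_k2^(1), W_k0^(1)]: layer-1 weights of hidden unit k (last entry is the bias)
W1 = np.array([[ 1.0, -1.0,  0.5],
               [ 0.5,  0.5, -1.0],
               [-1.0,  1.0,  0.0]])
# W2 = [W_11^(2), W_12^(2), W_13^(2), W_10^(2)]: layer-2 weights of the output unit
W2 = np.array([1.0, -2.0, 0.5, -0.1])

def forward(x, W1, W2):
    # hidden layer: z_k^(2) = W_k1^(1) x_1 + W_k2^(1) x_2 + W_k0^(1),  h_k^(2) = g(z_k^(2))
    z2 = W1 @ np.append(x, 1.0)
    h2 = g(z2)
    # output layer: z^(3) = sum_k W_1k^(2) h_k^(2) + W_10^(2),  h(x, W) = f(z^(3))
    z3 = W2 @ np.append(h2, 1.0)
    return f(z3), h2         # prediction and the transformed features phi(x) (without the constant 1)

pred, phi = forward(np.array([1.0, 2.0]), W1, W2)
print(pred, phi)
```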
Neural Networks: Example

Given the following data, you learn a two-layer NN with two hidden units, described by the following equations:
$z_1^{(2)} = W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{10}^{(1)}$,  $h_1^{(2)} = \max\{0, z_1^{(2)}\}$
$z_2^{(2)} = W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{20}^{(1)}$,  $h_2^{(2)} = \max\{0, z_2^{(2)}\}$
Show that the points in the new feature space $(h_1^{(2)}, h_2^{(2)})$ are linearly separable (a numerical sketch follows below).

[Figure: the labeled data in the $(x_1, x_2)$ plane with the lines $z_1^{(2)} = 0$ and $z_2^{(2)} = 0$, and the transformed points in the $(h_1^{(2)}, h_2^{(2)})$ plane.]

Related idea: universal approximation.
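A small numerical check of this idea. The dataset and the hidden-unit weights below are assumptions chosen to mimic the standard XOR construction (they are not the values from the slide): the four points are not linearly separable in $(x_1, x_2)$, but after the ReLU transform they are separable in $(h_1^{(2)}, h_2^{(2)})$.

```python
import numpy as np

# assumed XOR-like data: label -1 for (0,0) and (1,1); label +1 for (0,1) and (1,0)
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([-1, -1, 1, 1])

# assumed hidden-unit weights: z1 = x1 + x2 - 0.5,  z2 = x1 + x2 - 1.5
W1 = np.array([[1., 1., -0.5],
               [1., 1., -1.5]])

H = np.maximum(0.0, X @ W1[:, :2].T + W1[:, 2])   # h_k^(2) = max{0, z_k^(2)}

# in the new feature space, the linear classifier h1 - 3*h2 - 0.25 separates the two classes
scores = H @ np.array([1.0, -3.0]) - 0.25
print(H)                                  # transformed points
print(np.all(np.sign(scores) == y))       # True: linearly separable after the transform
```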
Training Neural Networks
Single layer NN
• Input layer $x_1, x_2, \ldots, x_d$ with a bias/offset unit $+1$ (weight $w_0$)
• Output: $h(\bar{x}; \bar{w}) = f(z)$, where $z = \sum_{j=1}^{d} w_j x_j + w_0$
• Want to learn the weights $\bar{w} = [w_1, w_2, \ldots, w_d]^T$ and the offset $w_0$
Learning Neural Networks
Training data: $S_n = \{(\bar{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$, with $\bar{x} \in \mathbb{R}^d$ and $y \in \{-1, +1\}$.
Goal: learn model parameters $\bar{\theta}$ so as to minimize the loss over the training examples:
$\frac{1}{n} \sum_{i=1}^{n} \mathrm{Loss}(y^{(i)}, \bar{x}^{(i)}; \bar{\theta})$
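Read concretely (a sketch assuming hinge loss on a single layer model with $z = \bar{w} \cdot \bar{x} + w_0$; the tiny dataset below is made up), this objective is just the average per-example loss:

```python
import numpy as np

def hinge_loss(yz):
    # Loss(yz) = max{1 - yz, 0}
    return np.maximum(1.0 - yz, 0.0)

def training_objective(theta, X, y):
    # average loss over the n training examples; theta = [w_1, ..., w_d, w_0]
    w, w0 = theta[:-1], theta[-1]
    z = X @ w + w0
    return np.mean(hinge_loss(y * z))

X = np.array([[1., 2.], [-1., 0.5], [0.5, -1.]])
y = np.array([1, -1, -1])
print(training_objective(np.zeros(3), X, y))   # 1.0 at theta = 0, since every margin yz is 0
```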
Overview of Optimization Procedure: SGD
Idea: sample a point at random, nudge parameters toward values that would improve classification on that particular example
(0) Initialize parameters to small random values
(1) Select a point at random
(2) Update the parameters based on that point and the gradient:
$\bar{\theta}^{(k+1)} = \bar{\theta}^{(k)} - \eta_k \nabla_{\bar{\theta}} \mathrm{Loss}(y^{(i)}, \bar{x}^{(i)}; \bar{\theta}) \big|_{\bar{\theta} = \bar{\theta}^{(k)}}$
Let’s compute this for a single layer NN
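Before specializing to the single layer NN, here is a generic sketch of the SGD procedure above (assumptions: a constant step size, a fixed number of iterations, and a caller-supplied per-example gradient function):

```python
import numpy as np

def sgd(grad_loss, theta0, X, y, eta=0.01, n_steps=1000, seed=0):
    """grad_loss(theta, x_i, y_i) returns the gradient of the loss on one example."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()          # (0) start from small (random) initial values
    for _ in range(n_steps):
        i = rng.integers(len(y))   # (1) select a training point at random
        # (2) nudge the parameters against the gradient of the loss on that point
        theta -= eta * grad_loss(theta, X[i], y[i])
    return theta
```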
Single layer NN
Input layer $x_1, \ldots, x_d$ with a bias/offset unit $+1$; weights $\bar{w} = [w_1, w_2, \ldots, w_d]^T$ and offset $w_0$.
Suppose the output is $h(\bar{x}; \bar{w}) = f(z)$, where $z = \sum_{j=1}^{d} w_j x_j + w_0$.
The loss of this classifier with respect to example $(\bar{x}, y)$ is given by $\mathrm{Loss}(yz)$.
Assuming hinge loss and applying the chain rule, $\frac{\partial \mathrm{Loss}(yz)}{\partial w_j} = \frac{\partial \mathrm{Loss}(yz)}{\partial z} \cdot \frac{\partial z}{\partial w_j}$, so
$\nabla_{\bar{w}} \mathrm{Loss}(yz) = \nabla_{\bar{w}} \max\{1 - yz, 0\} = \begin{cases} -y\,\bar{x} & \text{if } 1 - yz > 0 \\ \bar{0} & \text{otherwise} \end{cases}$
SGD for a Single Layer NN
Idea: sample a point at random, nudge parameters toward values that would improve classification on that particular example
(0) Initialize parameters to small random values
(1) Select a point at random
(2) Update the parameters based on that point and the gradient:
$\bar{\theta}^{(k+1)} = \bar{\theta}^{(k)} - \eta_k \nabla_{\bar{\theta}} \mathrm{Loss}(y^{(i)}, \bar{x}^{(i)}; \bar{\theta}) \big|_{\bar{\theta} = \bar{\theta}^{(k)}}$
For the single layer NN with hinge loss, this becomes (for each weight $w_j$, with $x_0^{(i)} = 1$ for the offset):
$w_j^{(k+1)} = w_j^{(k)} + \eta_k\, y^{(i)} x_j^{(i)}$ if $1 - y^{(i)} z^{(i)} > 0$
$w_j^{(k+1)} = w_j^{(k)}$ otherwise
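Putting the pieces together, here is a sketch of this single layer, hinge-loss SGD update (the step size and the number of iterations are assumptions):

```python
import numpy as np

def sgd_single_layer(X, y, eta=0.1, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)   # (0) small random initial weights
    w0 = 0.0
    for _ in range(n_steps):
        i = rng.integers(n)              # (1) pick a training point at random
        z = X[i] @ w + w0
        if 1.0 - y[i] * z > 0:           # (2) hinge loss active: gradient is -y*x (and -y for the offset)
            w += eta * y[i] * X[i]
            w0 += eta * y[i]
        # otherwise the gradient is zero and the parameters are unchanged
    return w, w0

# usage on a tiny linearly separable dataset
X = np.array([[2., 1.], [1., 2.], [-1., -2.], [-2., -1.]])
y = np.array([1, 1, -1, -1])
w, w0 = sgd_single_layer(X, y)
print(np.sign(X @ w + w0))   # should match y
```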
Learning Neural Networks
Idea: sample a point at random, nudge parameters toward values that would improve classification on that particular example
(0) Initialize parameters to small random values
(1) Select a point at random
(2) Update the parameters based on that point and the gradient:
$\bar{\theta}^{(k+1)} = \bar{\theta}^{(k)} - \eta_k \nabla_{\bar{\theta}} \mathrm{Loss}(y^{(i)}, \bar{x}^{(i)}; \bar{\theta}) \big|_{\bar{\theta} = \bar{\theta}^{(k)}}$
Let’s compute this for 2 layer NNs
Good luck on your midterm! Have fun on your break! See you on the other side!