Introduction to Machine Learning Neural Networks
Deep Learning
Prof. Kutty
Announcements
• Midterm exam Wed Feb 23 at 7pm sharp (remote)
• Midterm conflicts/accommodation emails are out – email us asap if you haven’t received yours!
• Midterm sample exam out soon!
• Midterm review in this week’s discussion; sample exam review this Sunday
• No discussions next week (i.e., week of midterm)
Today’s Agenda
• Ensemble Methods – Boosting
• Intro to Deep Learning
– Feed-Forward Neural Networks
[Figure: boosting pipeline]
• Training phase: training data (x(i), y(i)), i ∈ {1, …, n}; decision stump 1, decision stump 2, …, decision stump M are trained in sequence, each on the weighted samples produced by the previous round.
• To classify a new example x: decision stumps 1, …, M each make a prediction, and the weighted prediction gives the predicted label y.
Example: AdaBoost
Training Phase (M = 3 in this example)
For each round m:
• Find $\hat{\theta}_m = \arg\min_{\theta} \epsilon_m(\theta)$, where $\theta$ represents a decision stump and
$$\epsilon_m(\theta) = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \theta) \,]\!]$$
is the total weight of points classified incorrectly by the m-th decision stump; let $\hat{\epsilon}_m = \epsilon_m(\hat{\theta}_m)$.
• $\alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$ is the score assigned to the m-th classifier.
• Update the weights:
$$\tilde{w}_m(i) = \frac{\tilde{w}_{m-1}(i)\,\exp\!\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)}{Z_m}$$
The exponential factor is $e^{-\alpha_m}$ when data point $i$ is correctly classified by the m-th stump and $e^{\alpha_m}$ otherwise, so correctly classified points are down-weighted and misclassified points are up-weighted.
Decision stumps as a weak classifier
$h(\bar{x}; \theta) = \mathrm{sign}\big(\theta_1 (x_k - \theta_0)\big)$, where $\theta = [k, \theta_0, \theta_1]^T$: $k$ indexes the coordinate the stump splits on, $\theta_0$ is the threshold, and $\theta_1$ gives the sign/direction of the split.
[Figure: points labeled − and + along the $x_k$ coordinate axis, separated by the threshold $\theta_0$]
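A minimal sketch of this decision stump as a Python function (the names `decision_stump_predict`, `k`, `theta0`, `theta1` simply mirror the parameters above and are illustrative):

```python
import numpy as np

def decision_stump_predict(X, k, theta0, theta1):
    """Decision stump h(x; theta) = sign(theta1 * (x_k - theta0)).

    X: (n, d) array of examples; k: coordinate index;
    theta0: threshold; theta1: +1 or -1 (direction of the split).
    """
    preds = np.sign(theta1 * (X[:, k] - theta0))
    preds[preds == 0] = 1  # break ties toward +1
    return preds
```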
AdaBoost: Classify
The ensemble prediction for a new example $\bar{x}$ is
$$\hat{y} = \mathrm{sign}\big(h_M(\bar{x})\big), \qquad h_M(\bar{x}) = \sum_{m=1}^{M} \alpha_m h(\bar{x}; \hat{\theta}_m)$$
where $\alpha_m \geq 0$ is the score of the m-th weak classifier and $h(\bar{x}; \hat{\theta}_m)$ is its prediction (M = 3 in the running example).
AdaBoost Algorithm
• Set $\tilde{w}_0(i) = \frac{1}{n}$ for $i = 1, \ldots, n$
• for $m = 1, \ldots, M$
– find $\hat{\theta}_m = \arg\min_{\theta} \epsilon_m(\theta)$, where $\epsilon_m(\theta) = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \theta) \,]\!]$
– Let $\hat{\epsilon}_m = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \hat{\theta}_m) \,]\!]$
– Let $\alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$
– Update $\tilde{w}_m(i) = \frac{\tilde{w}_{m-1}(i)\,\exp\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)}{Z_m}$ for $i = 1, \ldots, n$, where $Z_m$ normalizes the weights so they sum to 1
• Output classifier $h_M(\bar{x}) = \sum_{m=1}^{M} \alpha_m h(\bar{x}; \hat{\theta}_m)$
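The following is a minimal sketch of this algorithm in Python, assuming binary labels in {−1, +1}; the exhaustive stump search over coordinates and thresholds is one simple way to implement the argmin step, not the only one, and the function names here are illustrative:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted-error-minimizing decision stump (exhaustive search)."""
    n, d = X.shape
    best = None
    for k in range(d):
        for theta0 in np.unique(X[:, k]):
            for theta1 in (+1, -1):
                preds = np.sign(theta1 * (X[:, k] - theta0))
                preds[preds == 0] = 1
                err = np.sum(w * (preds != y))
                if best is None or err < best[0]:
                    best = (err, k, theta0, theta1)
    return best  # (weighted error, k, theta0, theta1)

def adaboost_fit(X, y, M):
    """AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                      # w_0(i) = 1/n
    stumps, alphas = [], []
    for m in range(M):
        eps, k, theta0, theta1 = fit_stump(X, y, w)
        eps = np.clip(eps, 1e-12, 1 - 1e-12)     # guard against eps = 0 or 1
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_m = 1/2 ln((1-eps)/eps)
        preds = np.sign(theta1 * (X[:, k] - theta0))
        preds[preds == 0] = 1
        w = w * np.exp(-y * alpha * preds)       # reweight points
        w /= w.sum()                             # Z_m normalization
        stumps.append((k, theta0, theta1))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Ensemble prediction sign(sum_m alpha_m h(x; theta_m))."""
    scores = np.zeros(X.shape[0])
    for (k, theta0, theta1), alpha in zip(stumps, alphas):
        preds = np.sign(theta1 * (X[:, k] - theta0))
        preds[preds == 0] = 1
        scores += alpha * preds
    return np.sign(scores)
```

For example, `adaboost_predict(X, *adaboost_fit(X, y, M=3))` corresponds to the M = 3 training example above.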
AdaBoost Algorithm: weight update
$$\tilde{w}_m(i) = \frac{\tilde{w}_{m-1}(i)\,\exp\!\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)}{Z_m} \quad \text{for } i = 1, \ldots, n$$
By definition $y^{(i)} h(\bar{x}^{(i)}; \hat{\theta}_m) = +1$ when data point $i$ is correctly classified by the m-th stump and $-1$ otherwise, so the exponential factor equals $e^{-\alpha_m}$ for correctly classified points and $e^{\alpha_m}$ for misclassified ones. Working out the normalizer $Z_m$ is left as an exercise.
AdaBoost optimizes exponential loss
• Exponential Loss: $\mathrm{Loss}_{\exp}(z) = \exp(-z)$
• Ensemble classifier: $h_M(\bar{x}) = \sum_{j=1}^{M} \alpha_j h(\bar{x}; \hat{\theta}_j)$
• Consider some intermediate round $m$, $1 \leq m \leq M$
• Want to pick $\alpha_m, \bar{\theta}_m$ so that the ensemble classifier minimizes the exponential loss
$$J(\alpha_m, \bar{\theta}_m) = \sum_{i=1}^{n} \mathrm{Loss}_{\exp}\big(y^{(i)} h_m(\bar{x}^{(i)})\big)$$
where
$$h_m(\bar{x}) = \alpha_m h(\bar{x}; \hat{\theta}_m) + \sum_{j=1}^{m-1} \alpha_j h(\bar{x}; \hat{\theta}_j) = \alpha_m h(\bar{x}; \hat{\theta}_m) + h_{m-1}(\bar{x})$$
AdaBoost optimizes exponential loss (continued)
• With $h_m(\bar{x}) = \alpha_m h(\bar{x}; \hat{\theta}_m) + h_{m-1}(\bar{x})$,
$$J(\alpha_m, \bar{\theta}_m) = \sum_{i=1}^{n} \exp\big(-y^{(i)} h_m(\bar{x}^{(i)})\big) = \sum_{i=1}^{n} \exp\big(-y^{(i)} \alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)\exp\big(-y^{(i)} h_{m-1}(\bar{x}^{(i)})\big)$$
$$= \sum_{i=1}^{n} \exp\big(-y^{(i)} \alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)\,\widetilde{W}_{m-1}(i)$$
where $\widetilde{W}_{m-1}(i) = \exp\big(-y^{(i)} h_{m-1}(\bar{x}^{(i)})\big)$ is the exponential loss of the current (round $m-1$) ensemble classifier on point $i$.
• Want to pick $\alpha_m, \bar{\theta}_m$ that minimize $J(\alpha_m, \bar{\theta}_m)$.
Notice that
• The loss associated with point $i$ is
$$\exp\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)\,\widetilde{W}_{m-1}(i) = \begin{cases} e^{-\alpha_m}\,\widetilde{W}_{m-1}(i) & \text{if } i \text{ is correctly classified} \\ e^{\alpha_m}\,\widetilde{W}_{m-1}(i) & \text{if } i \text{ is misclassified} \end{cases}$$
• Therefore
$$J(\alpha_m, \bar{\theta}_m) = e^{-\alpha_m}\!\!\sum_{i:\, y^{(i)} = h(\bar{x}^{(i)};\, \theta)}\!\!\widetilde{W}_{m-1}(i) \;+\; e^{\alpha_m}\!\!\sum_{i:\, y^{(i)} \neq h(\bar{x}^{(i)};\, \theta)}\!\!\widetilde{W}_{m-1}(i)$$
• Since $e^{-\alpha_m}$ and $e^{\alpha_m}$ are non-negative, minimizing $J$ over $\bar{\theta}_m$ amounts to minimizing the weighted training error of the weak classifier,
$$\hat{\theta}_m = \arg\min_{\theta} \sum_{i=1}^{n} \widetilde{W}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \theta) \,]\!],$$
which is exactly how the weak classifier is trained.
• Let $\hat{\epsilon}_m = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \hat{\theta}_m) \,]\!]$ denote the normalized weighted error. Then, up to a constant factor,
$$J(\alpha_m, \hat{\theta}_m) \propto e^{-\alpha_m}(1 - \hat{\epsilon}_m) + e^{\alpha_m}\hat{\epsilon}_m$$
• Setting the partial derivative with respect to $\alpha_m$ to 0:
$$-e^{-\alpha_m}(1 - \hat{\epsilon}_m) + e^{\alpha_m}\hat{\epsilon}_m = 0 \;\Longrightarrow\; e^{2\alpha_m} = \frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m} \;\Longrightarrow\; \alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$$
which is the optimal $\alpha_m$ used in the algorithm.
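As a quick sanity check (not part of the original slides), a few lines of Python confirm that $\alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$ minimizes $J(\alpha) = e^{-\alpha}(1-\hat{\epsilon}_m) + e^{\alpha}\hat{\epsilon}_m$ for a sample value of $\hat{\epsilon}_m$:

```python
import numpy as np

eps_hat = 0.2                                  # example weighted error
J = lambda a: np.exp(-a) * (1 - eps_hat) + np.exp(a) * eps_hat

alpha_star = 0.5 * np.log((1 - eps_hat) / eps_hat)
alphas = np.linspace(alpha_star - 1.0, alpha_star + 1.0, 2001)

# The closed-form alpha should achieve the smallest loss on the grid.
assert np.isclose(alphas[np.argmin(J(alphas))], alpha_star, atol=1e-3)
print(alpha_star, J(alpha_star))               # ~0.693, ~0.8
```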
Introduction to Deep Learning
Neural Networks
Neural Networks: architecture
[Figure: a feed-forward network for a 28 × 28 = 784-pixel input image: an input layer of 784 neurons, hidden layers, and an output layer]
The network computes $h(\bar{x}, W) = f(z^{(3)})$, where $f$ is the activation (thresholding) function applied at the output.
Neural Networks: Example (multiclass classification)
Single-layer NN
[Figure: input layer with bias/offset $+1$ and inputs $x_1, x_2, \ldots, x_d$, each connected by a weight $w_i$ to a single unit]
$$h(\bar{x}; \bar{w}) = f\Big(\sum_{i=1}^{d} w_i x_i + w_0\Big)$$
Want to learn the weights $\bar{w} = [w_0, w_1, \ldots, w_d]^T$
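A minimal sketch of this single unit in Python (the logistic function is one possible choice of $f$, used here purely for illustration):

```python
import numpy as np

def single_layer_nn(x, w):
    """One unit: h(x; w) = f(sum_i w_i * x_i + w_0).

    x: (d,) input vector; w: (d + 1,) weights, with w[0] playing the
    role of the bias/offset w_0.
    """
    z = w[0] + np.dot(w[1:], x)      # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))  # f = logistic activation (one choice)
```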
Example: Two-layer NN
[Figure: fully connected network with an input layer ($x_1$, $x_2$, plus offset), one hidden layer, and an output layer]
By convention this is called a two-layer network since there are two sets of weights (input-to-hidden and hidden-to-output).
$h(\bar{x}, W) = f(z^{(3)})$
Notation: $W^{(j)}_{ki}$ is the weight of layer $j$, unit $k$, input $i$.
Activation Functions
[Figure: a single unit computing a weighted sum of inputs (including the bias/offset) followed by a non-linear transformation $h(z)$; parameters $\bar{\theta} = [w_1, w_2, w_3, w_4]^T$]
Examples of activation functions:
• threshold
• logistic: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$, range 0 to 1
• hyperbolic tangent: $\tanh(z) = 2\sigma(2z) - 1$, range −1 to 1
• ReLU (rectified linear unit): $f(z) = \max(0, z)$
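These activation functions are only a line or two each in Python (a sketch; numerically stabler variants exist in standard libraries):

```python
import numpy as np

def threshold(z):
    return np.where(z >= 0, 1.0, 0.0)      # hard threshold, outputs {0, 1}

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))        # sigma(z), range (0, 1)

def tanh(z):
    return 2.0 * logistic(2.0 * z) - 1.0   # tanh(z) = 2*sigma(2z) - 1, range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # rectified linear unit
```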
Neural Networks
ifz(2) = W (1)x1 + W (1)x2 + W (1), h(2) = g(z(2)) 1 11 12 101 1
Hidden layer
Input layer
-(4)is the weight of layer j, unit k, input i 3$
Neural Networks
Input layer
Hidden layer
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 1 11 12 101 1
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 2 21 22 202 2
z(2) = W (1)x1 + W (1)x2 + W (1), h(2) = g(z(2)) 3 31 32 303 3
-(4)is the weight of layer j, unit k, input i 3$
Neural Networks
Input layer
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 1 11 12 101 1
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 2 21 22 202 2
Hidden layer
z(2) = W (1)x1 + W (1)x2 + W (1), h(2) = g(z(2)) h(2) 3 31 32 30 3 3
g activation function
Neural Networks
Output layer:
$$z^{(3)} = W^{(2)}_{11}h^{(2)}_1 + W^{(2)}_{12}h^{(2)}_2 + W^{(2)}_{13}h^{(2)}_3 + W^{(2)}_{10}, \qquad h(\bar{x}, W) = f(z^{(3)})$$
Note: $f(\cdot)$ is a potentially different activation function from $g$.
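Putting the pieces together, a minimal sketch of the forward pass for this two-layer network in Python (the weight shapes and the choices of `g` and `f` here are assumptions for illustration; the bias is carried in the last column of each weight matrix, matching the $W^{(j)}_{k0}$ terms above):

```python
import numpy as np

def forward(x, W1, W2, g=np.tanh, f=lambda z: z):
    """Forward pass h(x, W) = f(z3) for the two-layer network above.

    x:  (2,) input vector [x1, x2]
    W1: (3, 3) hidden-layer weights; W1[k] = [W_k1, W_k2, W_k0]
    W2: (1, 4) output weights;       W2[0] = [W_11, W_12, W_13, W_10]
    """
    x_aug = np.append(x, 1.0)        # append 1 so the last weight acts as the offset
    z2 = W1 @ x_aug                  # z_k^(2) = W_k1 x1 + W_k2 x2 + W_k0
    h2 = g(z2)                       # hidden activations h_k^(2) = g(z_k^(2))
    z3 = W2 @ np.append(h2, 1.0)     # z^(3) = sum_k W_1k^(2) h_k^(2) + W_10^(2)
    return f(z3)                     # output activation f(z^(3))
```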