Introduction to Machine Learning Neural Networks
Deep Learning
Prof. Kutty
Announcements
• Midterm exam Wed Feb 23 at 7pm sharp (remote)
• Midterm conflicts/accommodation emails are out – email us asap if you haven’t received yours!
• Midterm sample exam out soon!
• Midterm review in this week’s discussion; sample exam review this Sunday
• No discussions next week (i.e., week of midterm)
Today’s Agenda
• Ensemble Methods – Boosting
• Intro to Deep Learning
– Feed-Forward Neural Networks
[Figure: boosting pipeline]
• Training phase: training data (x(i), y(i)), i ∈ {1, …, n}; decision stump 1, decision stump 2, …, decision stump M are trained in sequence, each on the weighted samples produced by the previous round.
• To classify a new example x: decision stumps 1, …, M each make a prediction, and the weighted prediction gives the predicted label y.
Example: AdaBoost
Training Phase (M = 3 in this example)
For each round m:
• Find $\hat{\theta}_m = \arg\min_{\theta} \epsilon_m(\theta)$, where $\theta$ represents a decision stump and
$$\epsilon_m(\theta) = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \theta) \,]\!]$$
is the total weight of points classified incorrectly by the m-th decision stump; let $\hat{\epsilon}_m = \epsilon_m(\hat{\theta}_m)$.
• $\alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$ is the score assigned to the m-th classifier.
• Update the weights:
$$\tilde{w}_m(i) = \frac{\tilde{w}_{m-1}(i)\,\exp\!\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)}{Z_m}$$
The exponential factor is $e^{-\alpha_m}$ when data point $i$ is correctly classified by the m-th stump and $e^{\alpha_m}$ otherwise, so correctly classified points are down-weighted and misclassified points are up-weighted.
Decision stumps as a weak classifier
$h(\bar{x}; \theta) = \mathrm{sign}\big(\theta_1 (x_k - \theta_0)\big)$, where $\theta = [k, \theta_0, \theta_1]^T$: $k$ indexes the coordinate the stump splits on, $\theta_0$ is the threshold, and $\theta_1$ gives the sign/direction of the split.
[Figure: points labeled − and + along the $x_k$ coordinate axis, separated by the threshold $\theta_0$]
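A minimal sketch of this decision stump as a Python function (the names `decision_stump_predict`, `k`, `theta0`, `theta1` simply mirror the parameters above and are illustrative):

```python
import numpy as np

def decision_stump_predict(X, k, theta0, theta1):
    """Decision stump h(x; theta) = sign(theta1 * (x_k - theta0)).

    X: (n, d) array of examples; k: coordinate index;
    theta0: threshold; theta1: +1 or -1 (direction of the split).
    """
    preds = np.sign(theta1 * (X[:, k] - theta0))
    preds[preds == 0] = 1  # break ties toward +1
    return preds
```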
AdaBoost: Classify
The ensemble prediction for a new example $\bar{x}$ is
$$\hat{y} = \mathrm{sign}\big(h_M(\bar{x})\big), \qquad h_M(\bar{x}) = \sum_{m=1}^{M} \alpha_m h(\bar{x}; \hat{\theta}_m)$$
where $\alpha_m \geq 0$ is the score of the m-th weak classifier and $h(\bar{x}; \hat{\theta}_m)$ is its prediction (M = 3 in the running example).
AdaBoost Algorithm
• Set $\tilde{w}_0(i) = \frac{1}{n}$ for $i = 1, \ldots, n$
• for $m = 1, \ldots, M$
– find $\hat{\theta}_m = \arg\min_{\theta} \epsilon_m(\theta)$, where $\epsilon_m(\theta) = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \theta) \,]\!]$
– Let $\hat{\epsilon}_m = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \hat{\theta}_m) \,]\!]$
– Let $\alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$
– Update $\tilde{w}_m(i) = \frac{\tilde{w}_{m-1}(i)\,\exp\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)}{Z_m}$ for $i = 1, \ldots, n$, where $Z_m$ normalizes the weights so they sum to 1
• Output classifier $h_M(\bar{x}) = \sum_{m=1}^{M} \alpha_m h(\bar{x}; \hat{\theta}_m)$
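The following is a minimal sketch of this algorithm in Python, assuming binary labels in {−1, +1}; the exhaustive stump search over coordinates and thresholds is one simple way to implement the argmin step, not the only one, and the function names here are illustrative:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted-error-minimizing decision stump (exhaustive search)."""
    n, d = X.shape
    best = None
    for k in range(d):
        for theta0 in np.unique(X[:, k]):
            for theta1 in (+1, -1):
                preds = np.sign(theta1 * (X[:, k] - theta0))
                preds[preds == 0] = 1
                err = np.sum(w * (preds != y))
                if best is None or err < best[0]:
                    best = (err, k, theta0, theta1)
    return best  # (weighted error, k, theta0, theta1)

def adaboost_fit(X, y, M):
    """AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                      # w_0(i) = 1/n
    stumps, alphas = [], []
    for m in range(M):
        eps, k, theta0, theta1 = fit_stump(X, y, w)
        eps = np.clip(eps, 1e-12, 1 - 1e-12)     # guard against eps = 0 or 1
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_m = 1/2 ln((1-eps)/eps)
        preds = np.sign(theta1 * (X[:, k] - theta0))
        preds[preds == 0] = 1
        w = w * np.exp(-y * alpha * preds)       # reweight points
        w /= w.sum()                             # Z_m normalization
        stumps.append((k, theta0, theta1))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Ensemble prediction sign(sum_m alpha_m h(x; theta_m))."""
    scores = np.zeros(X.shape[0])
    for (k, theta0, theta1), alpha in zip(stumps, alphas):
        preds = np.sign(theta1 * (X[:, k] - theta0))
        preds[preds == 0] = 1
        scores += alpha * preds
    return np.sign(scores)
```

For example, `adaboost_predict(X, *adaboost_fit(X, y, M=3))` corresponds to the M = 3 training example above.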
AdaBoost Algorithm: weight update
$$\tilde{w}_m(i) = \frac{\tilde{w}_{m-1}(i)\,\exp\!\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)}{Z_m} \quad \text{for } i = 1, \ldots, n$$
By definition $y^{(i)} h(\bar{x}^{(i)}; \hat{\theta}_m) = +1$ when data point $i$ is correctly classified by the m-th stump and $-1$ otherwise, so the exponential factor equals $e^{-\alpha_m}$ for correctly classified points and $e^{\alpha_m}$ for misclassified ones. Working out the normalizer $Z_m$ is left as an exercise.
AdaBoost optimizes exponential loss
• Exponential Loss: $\mathrm{Loss}_{\exp}(z) = \exp(-z)$
• Ensemble classifier: $h_M(\bar{x}) = \sum_{j=1}^{M} \alpha_j h(\bar{x}; \hat{\theta}_j)$
• Consider some intermediate round $m$, $1 \leq m \leq M$
• Want to pick $\alpha_m, \bar{\theta}_m$ so that the ensemble classifier minimizes the exponential loss
$$J(\alpha_m, \bar{\theta}_m) = \sum_{i=1}^{n} \mathrm{Loss}_{\exp}\big(y^{(i)} h_m(\bar{x}^{(i)})\big)$$
where
$$h_m(\bar{x}) = \alpha_m h(\bar{x}; \hat{\theta}_m) + \sum_{j=1}^{m-1} \alpha_j h(\bar{x}; \hat{\theta}_j) = \alpha_m h(\bar{x}; \hat{\theta}_m) + h_{m-1}(\bar{x})$$
AdaBoost optimizes exponential loss (continued)
• With $h_m(\bar{x}) = \alpha_m h(\bar{x}; \hat{\theta}_m) + h_{m-1}(\bar{x})$,
$$J(\alpha_m, \bar{\theta}_m) = \sum_{i=1}^{n} \exp\big(-y^{(i)} h_m(\bar{x}^{(i)})\big) = \sum_{i=1}^{n} \exp\big(-y^{(i)} \alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)\exp\big(-y^{(i)} h_{m-1}(\bar{x}^{(i)})\big)$$
$$= \sum_{i=1}^{n} \exp\big(-y^{(i)} \alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)\,\widetilde{W}_{m-1}(i)$$
where $\widetilde{W}_{m-1}(i) = \exp\big(-y^{(i)} h_{m-1}(\bar{x}^{(i)})\big)$ is the exponential loss of the current (round $m-1$) ensemble classifier on point $i$.
• Want to pick $\alpha_m, \bar{\theta}_m$ that minimize $J(\alpha_m, \bar{\theta}_m)$.
Notice that
• The loss associated with point $i$ is
$$\exp\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)}; \hat{\theta}_m)\big)\,\widetilde{W}_{m-1}(i) = \begin{cases} e^{-\alpha_m}\,\widetilde{W}_{m-1}(i) & \text{if } i \text{ is correctly classified} \\ e^{\alpha_m}\,\widetilde{W}_{m-1}(i) & \text{if } i \text{ is misclassified} \end{cases}$$
• Therefore
$$J(\alpha_m, \bar{\theta}_m) = e^{-\alpha_m}\!\!\sum_{i:\, y^{(i)} = h(\bar{x}^{(i)};\, \theta)}\!\!\widetilde{W}_{m-1}(i) \;+\; e^{\alpha_m}\!\!\sum_{i:\, y^{(i)} \neq h(\bar{x}^{(i)};\, \theta)}\!\!\widetilde{W}_{m-1}(i)$$
• Since $e^{-\alpha_m}$ and $e^{\alpha_m}$ are non-negative, minimizing $J$ over $\bar{\theta}_m$ amounts to minimizing the weighted training error of the weak classifier,
$$\hat{\theta}_m = \arg\min_{\theta} \sum_{i=1}^{n} \widetilde{W}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \theta) \,]\!],$$
which is exactly how the weak classifier is trained.
• Let $\hat{\epsilon}_m = \sum_{i=1}^{n} \tilde{w}_{m-1}(i)\,[\![\, y^{(i)} \neq h(\bar{x}^{(i)}; \hat{\theta}_m) \,]\!]$ denote the normalized weighted error. Then, up to a constant factor,
$$J(\alpha_m, \hat{\theta}_m) \propto e^{-\alpha_m}(1 - \hat{\epsilon}_m) + e^{\alpha_m}\hat{\epsilon}_m$$
• Setting the partial derivative with respect to $\alpha_m$ to 0:
$$-e^{-\alpha_m}(1 - \hat{\epsilon}_m) + e^{\alpha_m}\hat{\epsilon}_m = 0 \;\Longrightarrow\; e^{2\alpha_m} = \frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m} \;\Longrightarrow\; \alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$$
which is the optimal $\alpha_m$ used in the algorithm.
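As a quick sanity check (not part of the original slides), a few lines of Python confirm that $\alpha_m = \frac{1}{2}\ln\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}$ minimizes $J(\alpha) = e^{-\alpha}(1-\hat{\epsilon}_m) + e^{\alpha}\hat{\epsilon}_m$ for a sample value of $\hat{\epsilon}_m$:

```python
import numpy as np

eps_hat = 0.2                                  # example weighted error
J = lambda a: np.exp(-a) * (1 - eps_hat) + np.exp(a) * eps_hat

alpha_star = 0.5 * np.log((1 - eps_hat) / eps_hat)
alphas = np.linspace(alpha_star - 1.0, alpha_star + 1.0, 2001)

# The closed-form alpha should achieve the smallest loss on the grid.
assert np.isclose(alphas[np.argmin(J(alphas))], alpha_star, atol=1e-3)
print(alpha_star, J(alpha_star))               # ~0.693, ~0.8
```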
Introduction to Deep Learning
Neural Networks
Neural Networks: architecture
[Figure: a feed-forward network for a 28 × 28 = 784-pixel input image: an input layer of 784 neurons, hidden layers, and an output layer]
The network computes $h(\bar{x}, W) = f(z^{(3)})$, where $f$ is the activation (thresholding) function applied at the output.
Neural Networks: Example (multiclass classification)
Single-layer NN
[Figure: input layer with bias/offset $+1$ and inputs $x_1, x_2, \ldots, x_d$, each connected by a weight $w_i$ to a single unit]
$$h(\bar{x}; \bar{w}) = f\Big(\sum_{i=1}^{d} w_i x_i + w_0\Big)$$
Want to learn the weights $\bar{w} = [w_0, w_1, \ldots, w_d]^T$
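A minimal sketch of this single unit in Python (the logistic function is one possible choice of $f$, used here purely for illustration):

```python
import numpy as np

def single_layer_nn(x, w):
    """One unit: h(x; w) = f(sum_i w_i * x_i + w_0).

    x: (d,) input vector; w: (d + 1,) weights, with w[0] playing the
    role of the bias/offset w_0.
    """
    z = w[0] + np.dot(w[1:], x)      # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))  # f = logistic activation (one choice)
```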
Example: Two-layer NN
[Figure: fully connected network with an input layer ($x_1$, $x_2$, plus offset), one hidden layer, and an output layer]
By convention this is called a two-layer network since there are two sets of weights (input-to-hidden and hidden-to-output).
$h(\bar{x}, W) = f(z^{(3)})$
Notation: $W^{(j)}_{ki}$ is the weight of layer $j$, unit $k$, input $i$.
Activation Functions
[Figure: a single unit computing a weighted sum of inputs (including the bias/offset) followed by a non-linear transformation $h(z)$; parameters $\bar{\theta} = [w_1, w_2, w_3, w_4]^T$]
Examples of activation functions:
• threshold
• logistic: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$, range 0 to 1
• hyperbolic tangent: $\tanh(z) = 2\sigma(2z) - 1$, range −1 to 1
• ReLU (rectified linear unit): $f(z) = \max(0, z)$
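These activation functions are only a line or two each in Python (a sketch; numerically stabler variants exist in standard libraries):

```python
import numpy as np

def threshold(z):
    return np.where(z >= 0, 1.0, 0.0)      # hard threshold, outputs {0, 1}

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))        # sigma(z), range (0, 1)

def tanh(z):
    return 2.0 * logistic(2.0 * z) - 1.0   # tanh(z) = 2*sigma(2z) - 1, range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # rectified linear unit
```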
Neural Networks
ifz(2) = W (1)x1 + W (1)x2 + W (1), h(2) = g(z(2)) 1 11 12 101 1
Hidden layer
Input layer
-(4)is the weight of layer j, unit k, input i 3$
Neural Networks
Input layer
Hidden layer
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 1 11 12 101 1
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 2 21 22 202 2
z(2) = W (1)x1 + W (1)x2 + W (1), h(2) = g(z(2)) 3 31 32 303 3
-(4)is the weight of layer j, unit k, input i 3$
Neural Networks
Input layer
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 1 11 12 101 1
z(2) =W(1)x1 +W(1)x2 +W(1),h(2) =g(z(2)) 2 21 22 202 2
Hidden layer
z(2) = W (1)x1 + W (1)x2 + W (1), h(2) = g(z(2)) h(2) 3 31 32 30 3 3
g activation function
Neural Networks
Output layer:
$$z^{(3)} = W^{(2)}_{11}h^{(2)}_1 + W^{(2)}_{12}h^{(2)}_2 + W^{(2)}_{13}h^{(2)}_3 + W^{(2)}_{10}, \qquad h(\bar{x}, W) = f(z^{(3)})$$
Note: $f(\cdot)$ is a potentially different activation function from $g$.
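Putting the pieces together, a minimal sketch of the forward pass for this two-layer network in Python (the weight shapes and the choices of `g` and `f` here are assumptions for illustration; the bias is carried in the last column of each weight matrix, matching the $W^{(j)}_{k0}$ terms above):

```python
import numpy as np

def forward(x, W1, W2, g=np.tanh, f=lambda z: z):
    """Forward pass h(x, W) = f(z3) for the two-layer network above.

    x:  (2,) input vector [x1, x2]
    W1: (3, 3) hidden-layer weights; W1[k] = [W_k1, W_k2, W_k0]
    W2: (1, 4) output weights;       W2[0] = [W_11, W_12, W_13, W_10]
    """
    x_aug = np.append(x, 1.0)        # append 1 so the last weight acts as the offset
    z2 = W1 @ x_aug                  # z_k^(2) = W_k1 x1 + W_k2 x2 + W_k0
    h2 = g(z2)                       # hidden activations h_k^(2) = g(z_k^(2))
    z3 = W2 @ np.append(h2, 1.0)     # z^(3) = sum_k W_1k^(2) h_k^(2) + W_10^(2)
    return f(z3)                     # output activation f(z^(3))
```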