
Introduction to Machine Learning: Boosting
Prof. Kutty

Announcements


Homework 2
• due on Wednesday 2/16 at 10pm ET.
• check your remaining late days — use them wisely!
Midterm alternate exam request form/SSD Accommodations
• due tonight Monday 2/14 at 10pm
• If your alternate exam request is approved, you will receive an
email confirming this by Thursday 2/17.
• If we have your accommodation request from SSD, you will also receive an email confirming this by Thursday 2/17.
• A separate announcement will be made once these emails have been sent out.

Today’s Agenda
• Ensemble Methods
– Bootstrap Sampling
– Bagging
– Random Forests
– Boosting

Classifiers
Training phase
Training Data (x(i), y(i)), i ∈ {1, …, n} → learning algorithm → parameters
To classify a new example
new example x → learning algorithm (parameters learnt in training phase) → predicted label y
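As a minimal sketch of these two phases (not from the slides), the snippet below uses scikit-learn's DecisionTreeClassifier as a stand-in for the learning algorithm and a made-up toy dataset; any learner with a fit/predict interface would do.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Training phase: learn parameters from the training data (x(i), y(i)), i = 1..n
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])  # toy features
y_train = np.array([+1, +1, -1, -1])                                  # toy labels
clf = DecisionTreeClassifier(max_depth=1)   # the "learning algorithm"
clf.fit(X_train, y_train)                   # learnt parameters are stored inside clf

# Classification phase: apply the learnt parameters to a new example x
x_new = np.array([[0.5, 0.9]])
print(clf.predict(x_new))                   # predicted label y
```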

Ensemble Classifiers: better generalization error

1. Bagging
Training phase
Training Data (x(i), y(i)), i ∈ {1, …, n}
→ bootstrap sampling → decision tree 1
→ bootstrap sampling → decision tree 2
…
→ bootstrap sampling → decision tree M
To classify a new example
new example x → decision tree 1, decision tree 2, …, decision tree M
→ majority voting → predicted label y
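A minimal sketch of the bagging pipeline pictured above, assuming scikit-learn decision trees as the base learners, labels in {−1, +1}, and a placeholder number of trees M; none of these choices come from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=10, seed=0):
    """Train M decision trees, each on an independent bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)   # bootstrap sample: n draws with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Majority vote over the trees (assumes labels in {-1, +1})."""
    votes = np.sum([t.predict(X) for t in trees], axis=0)
    return np.where(votes >= 0, 1, -1)     # break ties toward +1
```

The bootstrap resampling (drawing n indices with replacement) is what makes the M trees differ; the final label is their majority vote.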

2. Boosting
Training phase
Training Data (x(i), y(i)), i ∈ {1, …, n}
→ weighted samples → base learning algorithm 1 (parameters)
→ weighted samples → base learning algorithm 2 (parameters)
…
→ base learning algorithm M (parameters)
To classify a new example
new example x → base learning algorithm 1 (parameters), base learning algorithm 2 (parameters), …, base learning algorithm M (parameters)
→ weighted prediction → predicted label y

Boosting
• a general strategy for combining weak classifiers into a strong classifier
• weak classifier: misclassification rate < 0.5

Boosting: AdaBoost
• Adaptive Boosting
• formulated by Freund and Schapire – 2003 Gödel Prize
• AdaBoost (with decision trees as the weak learners) is one of the best out-of-the-box classifiers

Example: AdaBoost Training Phase (M = 3)
[Figure: positive and negative training points; over three rounds, each weak classifier is assigned a weight based on its performance]

AdaBoost: Classify
sign(h_M(x̄)), where h_M(x̄) = ∑_{m=1}^M α_m h(x̄; θ̄_m) and α_m ≥ 0
(M is a hyperparameter)

AdaBoost optimizes exponential loss
• Exponential Loss: Loss_exp(z) = exp(−z)
• Ensemble classifier: h_M(x̄) = ∑_{m=1}^M α_m h(x̄; θ̄_m)
AdaBoost aims to minimize
∑_{i=1}^n Loss_exp( y^(i) h_M(x̄^(i)) )
the exponential loss of the ensemble classifier on the training examples,
where each weak classifier h(x̄; θ̄) is a decision stump with parameters θ̄ = (k, θ_1, θ_0).
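A short sketch (made-up inputs, not from the slides) of how this objective would be evaluated: given the weak classifiers' predictions and their weights α_m, compute the total exponential loss of the ensemble.

```python
import numpy as np

def ensemble_exp_loss(alphas, weak_preds, y):
    """
    alphas:     shape (M,)   classifier weights alpha_m
    weak_preds: shape (M, n) weak_preds[m, i] = h(x^(i); theta_m), values in {-1, +1}
    y:          shape (n,)   labels in {-1, +1}
    Returns sum_i exp(-y^(i) * h_M(x^(i))), the quantity AdaBoost drives down.
    """
    h_M = alphas @ weak_preds          # ensemble score h_M(x^(i)) for each training point
    return float(np.sum(np.exp(-y * h_M)))
```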
Decision stumps as a weak classifier
h(x̄; θ̄) = sign( θ_1 (x_k − θ_0) ), with θ̄ = (k, θ_1, θ_0): coordinate k, direction θ_1, threshold θ_0
[Figure: original dataset laid out along one coordinate; the stump splits it at a threshold into a − region and a + region]
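A sketch of one way to fit such a stump on weighted data by brute-force search over coordinates, thresholds, and directions; the helper names fit_stump / stump_predict and the choice of candidate thresholds at the observed feature values are illustrative assumptions, not the course's prescribed implementation.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return ((k, sign, threshold), weighted_error) minimizing sum_i w_i [[y_i != h(x_i)]]."""
    n, d = X.shape
    best, best_err = (0, 1, 0.0), np.inf
    for k in range(d):
        for theta0 in np.unique(X[:, k]):        # candidate thresholds at observed values
            for sign in (+1, -1):                # direction theta_1 in {+1, -1}
                pred = np.where(sign * (X[:, k] - theta0) > 0, 1, -1)
                err = np.sum(w * (pred != y))    # weighted training error
                if err < best_err:
                    best_err, best = err, (k, sign, theta0)
    return best, best_err

def stump_predict(stump, X):
    """Predict labels in {-1, +1} with a fitted stump (k, sign, threshold)."""
    k, sign, theta0 = stump
    return np.where(sign * (X[:, k] - theta0) > 0, 1, -1)
```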

Example: AdaBoost Training Phase (M = 3)
[Figure: the example dataset over three boosting rounds; each round's classifier weight is assigned based on its performance, and the point weights are re-normalized]

AdaBoost Algorithm
• Set W̃_0(i) = 1/n for i = 1, …, n
• for m = 1, …, M            (M is a hyperparameter)
  – find θ̄_m = argmin_θ̄ ε_m(θ̄), where ε_m(θ̄) = ∑_{i=1}^n W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄) ]]
  – Let ε̂_m = ∑_{i=1}^n W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄_m) ]]
    (ε̂_m is the sum of the weights of the points misclassified by the mth weak classifier)
  – Let α_m = ½ ln( (1 − ε̂_m) / ε̂_m )
    (α_m is the contribution of the mth weak classifier to the ensemble prediction)
  – Update W̃_m(i) = W̃_{m−1}(i) exp( −α_m y^(i) h(x̄^(i); θ̄_m) ) / Z_m   for i = 1, …, n
    (Z_m is a normalization constant so that the updated weights sum to 1)
• Output classifier h_M(x̄) = ∑_{m=1}^M α_m h(x̄; θ̄_m)
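A compact NumPy sketch of the loop above, reusing the hypothetical fit_stump / stump_predict helpers from the decision-stump sketch earlier; clipping ε̂_m away from 0 and 1 is an implementation convenience (the slides instead ask what happens when ε̂_m = 0).

```python
import numpy as np

def adaboost_fit(X, y, M, fit_weak, predict_weak):
    """AdaBoost with y in {-1, +1}. fit_weak(X, y, w) -> (params, weighted_error)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # W~_0(i) = 1/n
    params_list, alphas = [], []
    for m in range(M):
        params, _ = fit_weak(X, y, w)            # theta_m: minimize the weighted error
        pred = predict_weak(params, X)
        eps = np.sum(w * (pred != y))            # eps_m: sum of weights of misclassified points
        eps = np.clip(eps, 1e-12, 1 - 1e-12)     # avoid log(0) / division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)    # classifier weight alpha_m
        w = w * np.exp(-alpha * y * pred)        # upweight mistakes, downweight correct points
        w = w / w.sum()                          # normalize: divide by Z_m
        params_list.append(params)
        alphas.append(alpha)
    return params_list, np.array(alphas)

def adaboost_predict(params_list, alphas, X, predict_weak):
    """Weighted prediction: sign of sum_m alpha_m * h(x; theta_m)."""
    scores = sum(a * predict_weak(p, X) for p, a in zip(params_list, alphas))
    return np.sign(scores)
```

Usage with the toy helpers would look like: models, alphas = adaboost_fit(X_train, y_train, M=3, fit_weak=fit_stump, predict_weak=stump_predict).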
AdaBoost: High level view
Given training data examples (x̄^(i), y^(i))
• assign uniform weights W̃_0(i) to the data, i = 1, …, n
• for m = 1, …, M
  – learn a weak classifier (restricted to a simple hypothesis class) on the weighted data; call the learnt parameter θ̄_m
  – based on the performance (weighted training loss) of this weak classifier, assign a classifier weight α_m ≥ 0
  – based on the ensemble predictions, upweight misclassified and downweight correctly classified data, i.e., compute W̃_m(i) for i = 1, …, n
Output classifier: sign(h_M(x̄)) where h_M(x̄) = ∑_{m=1}^M α_m h(x̄; θ̄_m)

What is α_m?
– Let ε̂_m = ∑_{i=1}^n W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄_m) ]]   (the weight of the misclassified points)
– Let α_m = ½ ln( (1 − ε̂_m) / ε̂_m )
• What if ε̂_m is 0?
• What if ε̂_m is 0.5?

Boosting Property
The weighted error relative to the updated weights is exactly 0.5:
∑_{i=1}^n W̃_m(i) [[ y^(i) ≠ h(x̄^(i); θ̄_m) ]] = 0.5

Boosting in Practice
The weighted error ε̂_m tends to increase with each boosting iteration.
[Figure 2: Weighted error ε̂_m as a function of the number of iterations m. Figure from Jaakkola, Machine learning (Fall 2010), lecture 10.]

AdaBoost optimizes for exponential loss
The exponential loss over the training examples is an upper bound on the training error, and the bound
∏_{k=1}^m 2 √( ε̂_k (1 − ε̂_k) )
goes down monotonically with m provided the base learners do something at each iteration (their weighted errors are less than half).

Understanding Boosting
Empirically, the ensemble training error tends to decrease.
[Figure 3: Training error of the ensemble and the corresponding exponential loss (upper bound, in red) as a function of the boosting iterations. Figure from Jaakkola.]

Understanding Boosting
Empirically, the ensemble test error tends not to increase: boosting continues to modify the ensemble in sensible ways (increasing the margin) even after the training error is zero. Another reason for the resistance to overfitting is that the sequential procedure for optimizing the exponential loss is not very effective; we would overfit more quickly if we reoptimized the {α_j}'s jointly rather than sequentially.
[Figure 5: Training and test error of the ensemble as a function of the boosting iterations. Figure from Jaakkola.]
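A quick numerical check of the Boosting Property above, with made-up labels and a pretend weak classifier that misclassifies three of eight points; after one round of reweighting, its weighted error relative to the new weights comes out to exactly 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
y = rng.choice([-1, 1], size=n)          # made-up labels
pred = y.copy()
pred[:3] *= -1                           # a weak classifier that misclassifies 3 points
w = np.full(n, 1.0 / n)                  # uniform weights before the round

eps = np.sum(w * (pred != y))            # weighted error before the update (3/8 here)
alpha = 0.5 * np.log((1 - eps) / eps)
w_new = w * np.exp(-alpha * y * pred)    # AdaBoost weight update
w_new /= w_new.sum()                     # normalize

print(np.sum(w_new * (pred != y)))       # prints 0.5 (up to floating point)
```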
Boosting – Example
Blue circles: positive, red diamonds: negative.
You apply the AdaBoost algorithm and learn the first decision stump, pictured below.
[Figure: the training points and the first decision stump]
Determine:
a) ε̂_1 and α̂_1
b) W̃_1(i) for i = 1, …, n
c) the weighted training error relative to the updated weights (from b)

Solution:
• In the first round all points have the same weight, W̃_0(i) = 1/n, and only point 4 is misclassified by this decision stump, so ε̂_1 = W̃_0(4) = 1/n and α̂_1 = ½ ln( (1 − ε̂_1)/ε̂_1 ) = ½ ln(n − 1).
• For points correctly classified by this decision stump, the unnormalized updated weight is W̃_0(i) exp(−α̂_1).
• For the point incorrectly classified by this decision stump, the unnormalized updated weight is W̃_0(i) exp(α̂_1).
• The normalization constant Z_1 is the sum of these unnormalized weights; dividing by Z_1 gives W̃_1(i).
• Since only one point is misclassified and the weights are normalized, the weighted training error of decision stump 1 relative to the updated weights is 1/2 (also see the Boosting Property slide).
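To make the reasoning concrete, here is the same computation in NumPy with an assumed n = 10 points (the actual count is in the figure, which is not reproduced here); only "point 4" is misclassified by the first stump.

```python
import numpy as np

n = 10                                    # assumed number of training points (placeholder)
w0 = np.full(n, 1.0 / n)                  # round 1: all points have the same weight 1/n
miscl = np.zeros(n, dtype=bool)
miscl[3] = True                           # only "point 4" (index 3, 0-based) is misclassified

eps1 = np.sum(w0[miscl])                  # eps_1 = 1/n
alpha1 = 0.5 * np.log((1 - eps1) / eps1)  # alpha_1 = 0.5 ln(n - 1)

# unnormalized updates: exp(+alpha1) for the misclassified point, exp(-alpha1) otherwise
w1 = w0 * np.where(miscl, np.exp(alpha1), np.exp(-alpha1))
Z1 = w1.sum()                             # normalization constant Z_1
w1 /= Z1

print(eps1, alpha1)                       # answers to part a)
print(w1)                                 # answer to part b)
print(np.sum(w1[miscl]))                  # part c): exactly 0.5
```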
