Introduction to Machine Learning Boosting
Prof. Kutty
Announcements
Homework 2
• due on Wednesday 2/16 at 10pm ET.
• check your remaining late days — use them wisely!
Midterm alternate exam request form/SSD Accommodations
• due tonight Monday 2/14 at 10pm
• If your alternate exam request is approved, you will receive an
email confirming this by Thursday 2/17.
• If we have your accommodation request from SSD, you will also receive an email confirming this by Thursday 2/17.
• A separate announcement will be made once these emails have been sent out.
Today’s Agenda
• Ensemble Methods
– Bootstrap Sampling
– Bagging
– Random Forests
– Boosting
Classifiers
Training phase
Training Data (x(i), y(i)), i ∈ {1, …, n} → learning algorithm (parameters)
To classify a new example
new example x → learning algorithm (parameters learnt in training phase) → predicted label y
Ensemble Classifiers: better generalization error
1. Bagging
Training phase
Training Data (x(i), y(i)), i ∈ {1, …, n}
→ bootstrap sampling → decision tree 1, decision tree 2, …, decision tree M

To classify a new example
new example x → decision tree 1, decision tree 2, …, decision tree M → majority voting → predicted label y
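The bagging pipeline above can be sketched in a few lines of Python. This is an illustration rather than anything from the course materials: it assumes X is a NumPy array of features and y a vector of ±1 labels, it borrows scikit-learn's DecisionTreeClassifier, and the names bagging_fit and bagging_predict are made up for this sketch.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, M, seed=0):
        """Train M decision trees, each on a bootstrap sample of (X, y)."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        trees = []
        for _ in range(M):
            idx = rng.integers(0, n, size=n)   # bootstrap sample: n indices drawn with replacement
            trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return trees

    def bagging_predict(trees, X):
        """Majority vote over the ensemble (labels assumed to be -1/+1; a tie gives 0)."""
        votes = np.sum([t.predict(X) for t in trees], axis=0)
        return np.sign(votes)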
2. Boosting
Training phase
Training Data (x(i), y(i)), i ∈ {1, …, n}
→ weighted samples → base learning algorithm 1 (parameters)
→ weighted samples → base learning algorithm 2 (parameters)
→ … → base learning algorithm M (parameters)

To classify a new example
new example x → base learning algorithm 1 (parameters), base learning algorithm 2 (parameters), …, base learning algorithm M (parameters) → weighted prediction → predicted label y

Boosting is a general strategy for combining weak classifiers into a strong classifier.
weak classifier: misclassification rate < 0.5
Boosting: AdaBoost
• Adaptive Boosting
• formulated by Freund and Schapire
– 2003 Gödel Prize
• AdaBoost (with decision trees as the weak learners)
– one of the best out-of-the-box classifiers
Example: AdaBoost Training Phase, M = 3
[figure: positive and negative training examples; classifier weights are assigned based on performance]
AdaBoost: Classify
ŷ = sign(h_M(x̄)), where h_M(x̄) = Σ_{m=1}^{M} α_m h(x̄; θ̄_m) and α_m ≥ 0
(M is a hyperparameter)
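In code the classification rule is just the sign of a weighted vote. A minimal sketch, assuming each weak learner is a callable returning ±1 and alphas holds the learnt classifier weights (both names are made up here):

    import numpy as np

    def adaboost_classify(x, weak_learners, alphas):
        """sign of h_M(x) = sum_m alpha_m * h(x; theta_m)."""
        score = sum(a * h(x) for a, h in zip(alphas, weak_learners))
        return np.sign(score)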
AdaBoost optimizes exponential loss
• Exponential Loss: Loss_exp(z) = exp(−z)
• Ensemble classifier: h_M(x̄) = Σ_{m=1}^{M} α_m h(x̄; θ̄_m)
• AdaBoost aims to minimize
  (1/n) Σ_{i=1}^{n} Loss_exp( y^(i) h_M(x̄^(i)) )
  where each weak classifier is a decision stump with parameters θ̄ = (k, θ_1, θ_0)
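As a quick illustration of this objective (with made-up margins y^(i) h_M(x̄^(i))), the exponential loss always sits above the 0-1 training error, which is why driving it down also drives the training error down:

    import numpy as np

    margins = np.array([2.1, 0.4, -0.3, 1.7, -1.2])  # hypothetical values of y^(i) * h_M(x^(i))
    exp_loss = np.mean(np.exp(-margins))             # exponential loss that AdaBoost minimizes
    zero_one = np.mean(margins <= 0)                 # 0-1 training error
    print(exp_loss, zero_one)                        # exp(-z) >= [[z <= 0]], so exp_loss >= zero_one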
Decision stumps as a weak classifier
original dataset: points labeled + and − along a single coordinate
[figure: a decision stump splits the data at a threshold on one coordinate, predicting +1 on one side and −1 on the other]
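A weighted decision stump can be fit by brute force: try every coordinate, every threshold, and both orientations, and keep the split with the smallest weighted error. A rough sketch (the helper names and the (k, theta0, s) parameterization are choices made for this illustration):

    import numpy as np

    def fit_stump(X, y, w):
        """Return the stump (k, theta0, s) minimizing the weighted error of s * sign(x_k - theta0)."""
        n, d = X.shape
        best, best_err = None, np.inf
        for k in range(d):
            for theta0 in np.unique(X[:, k]):        # candidate thresholds at observed values
                for s in (+1, -1):
                    pred = s * np.sign(X[:, k] - theta0)
                    pred[pred == 0] = s              # break ties consistently
                    err = np.sum(w * (pred != y))    # weighted misclassification
                    if err < best_err:
                        best, best_err = (k, theta0, s), err
        return best, best_err

    def stump_predict(X, stump):
        k, theta0, s = stump
        pred = s * np.sign(X[:, k] - theta0)
        pred[pred == 0] = s
        return pred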
Example: AdaBoost Training Phase, M = 3
[figure: training examples reweighted based on classifier performance; the weights are normalized]

AdaBoost Algorithm
• Set W̃_0(i) = 1/n for i = 1, …, n
• for m = 1, …, M (M is a hyperparameter)
  – find θ̄_m = argmin_θ̄ ε_m(θ̄), where ε_m(θ̄) = Σ_{i=1}^{n} W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄) ]]
  – Let ε̂_m = Σ_{i=1}^{n} W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄_m) ]]
    (ε̂_m is the sum of the weights of the points misclassified by this weak classifier)
  – Let α_m = ½ ln( (1 − ε̂_m) / ε̂_m )
    (α_m is the contribution of weak classifier m to the ensemble prediction)
  – Update W̃_m(i) = W̃_{m−1}(i) exp(−α_m y^(i) h(x̄^(i); θ̄_m)) / Z_m for i = 1, …, n
    (W̃_m(i) is the weight of point i; Z_m is a normalization constant so the weights sum to 1)
• Output classifier h_M(x̄) = Σ_{m=1}^{M} α_m h(x̄; θ̄_m)
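Putting these steps together, here is a compact sketch of the training loop. It reuses the fit_stump and stump_predict helpers sketched earlier (hypothetical names), assumes labels in {−1, +1}, and clips ε̂_m away from 0 and 1 to keep α_m finite. Prediction is then sign(Σ_m α_m h(x̄; θ̄_m)), as in the classification sketch above.

    import numpy as np

    def adaboost_train(X, y, M, eps=1e-10):
        """M rounds of AdaBoost with decision stumps as the weak learners."""
        n = X.shape[0]
        w = np.full(n, 1.0 / n)                  # W~_0(i) = 1/n
        stumps, alphas = [], []
        for m in range(M):
            stump, err = fit_stump(X, y, w)      # weak learner minimizing the weighted error
            err = np.clip(err, eps, 1 - eps)     # guard against err = 0 or 1
            alpha = 0.5 * np.log((1 - err) / err)
            pred = stump_predict(X, stump)
            w = w * np.exp(-alpha * y * pred)    # upweight mistakes, downweight correct points
            w = w / w.sum()                      # normalize (divide by Z_m)
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas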
AdaBoost: High level view
Given training data (x̄^(i), y^(i)), i = 1, …, n
• assign uniform weights W̃_0(i) to the data, i = 1, …, n
• for m = 1, …, M
  – learn a weak classifier (restricted to a simple hypothesis class) on the weighted data; call the learnt parameter θ̄_m
  – based on the performance (weighted training loss) of this weak classifier, assign it a classifier weight α_m ≥ 0
  – based on the ensemble predictions, upweight misclassified and downweight correctly classified data, i.e., compute W̃_m(i) for i = 1, …, n
Output classifier: sign(h_M(x̄)), where h_M(x̄) = Σ_{m=1}^{M} α_m h(x̄; θ̄_m)
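A library version is also an option. A minimal scikit-learn sketch with depth-1 trees (decision stumps) as the weak learners follows; the toy dataset is generated only for illustration, and older scikit-learn releases spell the keyword base_estimator rather than estimator:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)  # toy data

    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps as weak learners
        n_estimators=50,                                # M, the number of boosting rounds
    )
    clf.fit(X, y)
    print(clf.score(X, y))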
What is α_m?
– Let ε̂_m = Σ_{i=1}^{n} W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄_m) ]]  (the total weight of the misclassified points)
– Let α_m = ½ ln( (1 − ε̂_m) / ε̂_m )
• What if ε̂_m is 0?
• What if ε̂_m is 0.5?
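A quick numerical look at those two extremes (made-up error values): α_m grows without bound as ε̂_m approaches 0, and drops to 0 when the weak classifier is no better than chance at ε̂_m = 0.5.

    import numpy as np

    for err in [1e-4, 0.1, 0.25, 0.4, 0.5]:       # hypothetical weighted errors
        alpha = 0.5 * np.log((1 - err) / err)
        print(f"eps_hat = {err:>6}: alpha = {alpha:.3f}")
    # eps_hat near 0  -> alpha very large (this classifier dominates the vote)
    # eps_hat = 0.5   -> alpha = 0 (the classifier is ignored)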
Boosting Property
Weighted error relative to updated weights is exactly 0.5
Σ_{i=1}^{n} W̃_m(i) [[ y^(i) ≠ h(x̄^(i); θ̄_m) ]] = 1/2
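One way to see this: the update multiplies the misclassified weight mass ε̂_m by exp(α_m) = sqrt((1 − ε̂_m)/ε̂_m) and the correctly classified mass (1 − ε̂_m) by exp(−α_m), so both become sqrt(ε̂_m(1 − ε̂_m)) and end up equal after normalization. The snippet below checks this on made-up weights and predictions:

    import numpy as np

    # made-up weights, labels, and weak-classifier predictions (illustration only)
    w = np.array([0.1, 0.2, 0.3, 0.4])       # W~_{m-1}, already normalized
    y = np.array([1, 1, -1, -1])
    pred = np.array([1, 1, -1, 1])           # misclassifies only the last point

    err = np.sum(w * (pred != y))            # eps_hat_m = 0.4
    alpha = 0.5 * np.log((1 - err) / err)
    w_new = w * np.exp(-alpha * y * pred)
    w_new = w_new / w_new.sum()              # divide by Z_m

    print(np.sum(w_new * (pred != y)))       # 0.5: weighted error w.r.t. the updated weights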
Boosting in Practice
• The weighted error ε̂_m tends to increase with each boosting iteration.
[slide shows Jaakkola's AdaBoost pseudocode (the same steps as the algorithm above) and Figure 2 from Jaakkola, Machine Learning (Fall 2010), lecture 10: the weighted error ε̂_m as a function of the iteration m; x-axis: number of iterations, y-axis: weighted training error]
AdaBoost optimizes for exponential loss
• The exponential loss over the training examples is an upper bound on the training error:
  (1/n) Σ_{i=1}^{n} exp(−y^(i) h_m(x̄^(i))) = Π_{k=1}^{m} 2 sqrt( ε̂_k (1 − ε̂_k) )
• The upper bound goes down monotonically with m provided that the base learners do something at each iteration (their weighted errors are less than half).

Understanding Boosting
• Empirically, the ensemble training error tends to decrease.
[Figure 3 from Jaakkola: the training error of the ensemble and the corresponding exponential loss (upper bound, shown in red) as a function of the number of boosting iterations; y-axis: training error]
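A quick check of the bound above with made-up weighted errors: each factor 2·sqrt(ε̂_k(1 − ε̂_k)) is below 1 whenever ε̂_k < 0.5, so the bound on the training error shrinks with every useful round.

    import numpy as np

    eps_hats = [0.2, 0.3, 0.35, 0.4, 0.45]                # hypothetical weighted errors per round
    factors = [2 * np.sqrt(e * (1 - e)) for e in eps_hats]
    bounds = np.cumprod(factors)                          # bound on the training error after each round
    print(np.round(bounds, 4))                            # strictly decreasing since every eps_hat < 0.5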
Understanding Boosting
• Empirically, the ensemble test error tends not to increase.
• Boosting can continue to modify the ensemble in sensible ways (increasing the margin) even after the training error is zero; the margin can in turn be related to the generalization error.
• Another reason for the resistance to overfitting is that the sequential procedure for optimizing the exponential loss is not very effective; we would overfit more quickly if we reoptimized the α_j's jointly rather than sequentially (cf. the view of boosting as gradient descent).
[Figure 5 from Jaakkola: the training error of the ensemble and the corresponding generalization (test) error as a function of the number of boosting iterations; y-axis: training/test error]
Boosting: Example
Blue circles are positive, red diamonds are negative. You apply the AdaBoost algorithm and learn the first decision stump h(x̄; θ̄_1) pictured below. Determine:
a) ε̂_1 and α̂_1
b) W̃_1(i) for i = 1, …, n
c) the weighted training error relative to the updated weights (from b)
[figure: the training data and the first decision stump]

For reference, the AdaBoost steps:
• Set W̃_0(i) = 1/n for i = 1, …, n
• for m = 1, …, M
  – find θ̄_m = argmin_θ̄ ε_m(θ̄), where ε_m(θ̄) = Σ_{i=1}^{n} W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄) ]]
  – Let ε̂_m = Σ_{i=1}^{n} W̃_{m−1}(i) [[ y^(i) ≠ h(x̄^(i); θ̄_m) ]]
  – Let α_m = ½ ln( (1 − ε̂_m) / ε̂_m )
  – Update W̃_m(i) = W̃_{m−1}(i) exp(−α_m y^(i) h(x̄^(i); θ̄_m)) / Z_m for i = 1, …, n
• Output classifier h_M(x̄) = Σ_{m=1}^{M} α_m h(x̄; θ̄_m)
Solution notes (reconstructed from the handwritten annotations):
• In the first round all points have the same weight, W̃_0(i) = 1/n, and only point 4 is misclassified by this decision stump, so ε̂_1 = 1/n and α̂_1 = ½ ln( (1 − ε̂_1) / ε̂_1 ).
• For points correctly classified by this decision stump, the unnormalized updated weight is W̃_0(i) exp(−α̂_1).
• For the point incorrectly classified by this decision stump, the unnormalized updated weight is W̃_0(i) exp(α̂_1).
• Divide by the normalization constant Z_1 so that the updated weights sum to 1.
• Since only one point is misclassified and the weights are normalized, the weighted training error of decision stump 1 relative to the updated weights is 1/2 (also see the Boosting Property slide).
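The actual dataset only appears in the figure, so the numbers below are hypothetical: a 1-D toy set of 8 points on which a single stump misclassifies exactly one point, mirroring the structure of the solution above.

    import numpy as np

    # hypothetical 1-D data: the stump "predict +1 if x > 0.35" misclassifies only the point at x = 0.4
    x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
    y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
    pred = np.where(x > 0.35, 1, -1)

    n = len(x)
    w0 = np.full(n, 1.0 / n)                 # W~_0(i) = 1/n
    eps1 = np.sum(w0 * (pred != y))          # a) eps_hat_1 = 1/8
    alpha1 = 0.5 * np.log((1 - eps1) / eps1) #    alpha_hat_1 = 0.5 * ln(7)

    w1 = w0 * np.exp(-alpha1 * y * pred)     # b) unnormalized updated weights
    w1 = w1 / w1.sum()                       #    divide by Z_1

    print(eps1, alpha1)
    print(np.round(w1, 4))
    print(np.sum(w1 * (pred != y)))          # c) weighted error w.r.t. W~_1 is 0.5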