Ensemble Learning

Outline
- Why ensembles
- What choices
- How
- Applications
Why ensembles?
- Two heads are better than one (the Chinese proverb: three unskilled cobblers with their wits combined surpass Zhuge Liang).
- Simple and easy to use, yet powerful: many SOTA results are ensembles, and they are easy to understand.
- Little overfitting, empirically.
- The preferred choice for many applications.
A simple demo. Suppose M individual classifiers (ICs) are available; combine them into an ensemble, e.g., by majority vote, and compare testing results.
- Case I: each of three ICs has accuracy 2/3, but they misclassify different points. The majority vote corrects every individual error: accuracy 100%, better than any single IC.
- Case II: the ICs' opinions are too unanimous (no diversity). If they all misclassify the same points, voting makes no difference; if their errors overlap so that a majority is wrong on most points, voting is even worse: three equally bad cobblers only compound each other's badness. A sketch of both cases follows.

[Figure: toy testing results for Cases I and II, showing each IC's predictions and the majority vote.]
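A minimal numpy sketch of this demo (the toy predictions are illustrative assumptions, not the slide's data):

```python
import numpy as np

y_true = np.array([1, 1, 1])                    # true labels of 3 test points

# Case I: each IC is 2/3 accurate, but they err on DIFFERENT points.
preds_I = np.array([[-1,  1,  1],               # IC1 wrong on point 1
                    [ 1, -1,  1],               # IC2 wrong on point 2
                    [ 1,  1, -1]])              # IC3 wrong on point 3

# Case II: the errors overlap so that a majority is wrong on every point.
preds_II = np.array([[-1, -1,  1],
                     [ 1, -1, -1],
                     [-1,  1, -1]])

for name, preds in [("Case I", preds_I), ("Case II", preds_II)]:
    vote = np.sign(preds.sum(axis=0))           # majority vote
    indiv = (preds == y_true).mean(axis=1)      # each IC's accuracy
    print(name, "individual:", indiv, "vote:", (vote == y_true).mean())
# Case I:  individuals 2/3 each, vote 100% (better).
# Case II: individuals 1/3 each, vote 0% (worse than any IC).
```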
Desirable properties of the ICs:
- Each IC is OK: better than random guessing (not too bad).
- Diverse, possibly heterogeneous: each has its own skills; they have complementary strengths and weaknesses, so the ensemble can draw on each one's strengths.
What choices?
- Parallel: the inputs are fed to IC_1, ..., IC_M independently and their outputs are combined, e.g., Bagging, Random Forests.
- Sequential: each IC is built on the output of the previous one, e.g., AdaBoost.
- The ICs may be all similar (e.g., tree stumps) or all different (e.g., linear regression + logistic regression + CART). A sketch of the parallel case with heterogeneous ICs follows this list.
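As a sketch of the parallel case (the dataset and the three base models here are illustrative assumptions), scikit-learn's VotingClassifier trains heterogeneous ICs independently and combines them by majority vote:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Heterogeneous ICs, fitted in parallel, combined by majority ("hard") vote.
ensemble = VotingClassifier([
    ("logistic", LogisticRegression(max_iter=1000)),
    ("cart", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
], voting="hard")

print(cross_val_score(ensemble, X, y, cv=5).mean())
```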
Example: a group of friends deciding where to go for dinner.
- Parallel: everyone names a restaurant at the same time and the group takes a (possibly weighted) vote.
- Sequential: friends speak one after another, each reacting to the suggestions made so far.
How? We focus on the sequential case: Boosting.
- AdaBoost (Freund & Schapire, 1996, 1997).
- Extensions: GBM/GBR, Stochastic Gradient Boosting (SGB), XGBoost, LightGBM.

AdaBoost was called "the best off-the-shelf classifier in the world" (Breiman, NeurIPS 1996).
Hierarchical structure:

[Figure: the training sample fits $G_1$; a weighted sample fits $G_2$; ...; a weighted sample fits $G_M$; the final classifier is $G(x) = \mathrm{sign}\big(\sum_{m=1}^{M} \alpha_m G_m(x)\big)$.]
AdaBoost algorithm. Input: $\{(x_i, y_i)\}_{i=1}^{N}$ with $y_i \in \{-1, +1\}$.
1. Initialize the observation weights $w_i = 1/N$, $i = 1, \ldots, N$.
2. For $m = 1, \ldots, M$:
   (a) Fit a classifier $G_m(x)$ to the training data using weights $w_i$.
   (b) Compute $\mathrm{err}_m = \dfrac{\sum_{i=1}^{N} w_i I\big(y_i \ne G_m(x_i)\big)}{\sum_{i=1}^{N} w_i}$.
   (c) Compute $\alpha_m = \log\dfrac{1 - \mathrm{err}_m}{\mathrm{err}_m}$.
   (d) Update $w_i \leftarrow w_i \exp\big(\alpha_m I(y_i \ne G_m(x_i))\big)$.
3. Output $G(x) = \mathrm{sign}\big(\sum_{m=1}^{M} \alpha_m G_m(x)\big)$.

$G_m$ can be any classifier, e.g., a tree stump (a single split such as Height > 1.8): a weak classifier, just better than random guessing.
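A from-scratch sketch of this algorithm (function names are mine; the weak classifier is a scikit-learn decision stump, and $0 < \mathrm{err}_m < 1$ is assumed):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M):
    """y in {-1, +1}. Returns the stumps G_m and their weights alpha_m."""
    N = len(y)
    w = np.full(N, 1 / N)                         # initialize w_i = 1/N
    stumps, alphas = [], []
    for m in range(M):
        G = DecisionTreeClassifier(max_depth=1)   # tree stump
        G.fit(X, y, sample_weight=w)              # fit using weights w_i
        miss = G.predict(X) != y
        err = w[miss].sum() / w.sum()             # weighted error err_m
        alpha = np.log((1 - err) / err)           # alpha_m
        w = w * np.exp(alpha * miss)              # upweight misclassified points
        stumps.append(G)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    votes = sum(a * G.predict(X) for G, a in zip(stumps, alphas))
    return np.sign(votes)                         # G(x) = sign(sum_m alpha_m G_m(x))
```

For comparison, sklearn.ensemble.AdaBoostClassifier implements the same reweighting scheme.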
Remarks:
- More weights to misclassified points. The update $w_i \leftarrow w_i \exp\big(\alpha_m I(y_i \ne G_m(x_i))\big)$ multiplies $w_i$ by $e^{\alpha_m}$ if $y_i \ne G_m(x_i)$ and leaves it unchanged if $y_i = G_m(x_i)$. If $\mathrm{err}_m \le 1/2$, then $\alpha_m = \log\frac{1 - \mathrm{err}_m}{\mathrm{err}_m} \ge 0$, so the misclassified points get more weight and the correctly classified points get relatively less; in the next round the weak classifier pays more attention to the misclassified points.
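For instance, if $\mathrm{err}_m = 0.2$, then $\alpha_m = \log(0.8/0.2) = \log 4 \approx 1.39$, so each misclassified point has its weight multiplied by $e^{\alpha_m} = 4$ while the correctly classified weights are unchanged; the misclassified points carry far more influence in round $m+1$.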
[Figure: AdaBoost on toy data. Classifier 1 ($G_1$) splits the data; the misclassified points are upweighted before fitting Classifier 2 ($G_2$), and again before Classifier 3 ($G_3$); the combination $G(x) = \mathrm{sign}\big(\sum_{m=1}^{M} \alpha_m G_m(x)\big)$ weights each $G_m$ by its quality $\alpha_m$.]
- Stubborn points: if some points keep being misclassified, more and more weight is assigned to them, a danger of overfitting. Surprisingly, empirically there is little overfitting.
- Extension from binary $y_i \in \{-1, +1\}$ to multi-class $y_i \in \{1, 2, \ldots, K\}$: same procedure, with the last step replaced by $G(x) = \arg\max_k \sum_{m=1}^{M} \alpha_m I\big(G_m(x) = k\big)$. A sketch of this vote follows.
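A small numpy sketch of this multi-class vote (it assumes classifiers and alphas were already fitted by the same reweighting procedure, with integer labels $1, \ldots, K$):

```python
import numpy as np

def adaboost_predict_multiclass(classifiers, alphas, X, K):
    # votes[i, k-1] accumulates sum_m alpha_m * I(G_m(x_i) = k)
    votes = np.zeros((X.shape[0], K))
    for G, a in zip(classifiers, alphas):
        pred = G.predict(X)                       # labels in {1, ..., K}
        votes[np.arange(X.shape[0]), pred - 1] += a
    return votes.argmax(axis=1) + 1               # G(x) = argmax_k weighted votes
```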
Bias vs. variance. A tree stump (inputs, one split, output) has low variance but high bias (it uses only one feature): a stable weak learner. Boosting gradually reduces the bias without increasing the variance; slow, but effective.
Statistical view of AdaBoost. Let $f_m(x) = \sum_{l=1}^{m} \alpha_l G_l(x)$, so the final classifier is $\mathrm{sign}(f_M(x))$ with $f_M(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$, an additive model. AdaBoost is Forward Stagewise Additive Modeling (FSAM) with the exponential loss $L(y, f(x)) = \exp(-y f(x))$.
Algorithm (Forward Stagewise Additive Modeling):
1. Initialize $f_0(x) = 0$.
2. For $m = 1, \ldots, M$:
   (a) Compute $(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} L\big(y_i,\ f_{m-1}(x_i) + \beta G(x_i)\big)$, e.g., with $G$ a tree stump.
   (b) Update $f_m(x) = f_{m-1}(x) + \beta_m G_m(x)$.
Proof. For AdaBoost the basis functions are classifiers $G(x) \in \{-1, +1\}$. Under the exponential loss,
$$(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\big(-y_i (f_{m-1}(x_i) + \beta G(x_i))\big) = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\big(-\beta\, y_i G(x_i)\big),$$
where $w_i^{(m)} = \exp\big(-y_i f_{m-1}(x_i)\big)$ is free of $\beta$ and $G$. Solve in two stages: first fix $\beta > 0$ and find $G_m$; then, given $G_m$, find $\beta_m$.
First stage. Now
$$\sum_{i} w_i^{(m)} e^{-\beta y_i G(x_i)} = e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)} = (e^{\beta} - e^{-\beta}) \sum_{i} w_i^{(m)} I\big(y_i \ne G(x_i)\big) + e^{-\beta} \sum_{i} w_i^{(m)},$$
where the last term is free of $G$. Hence, for fixed $\beta > 0$,
$$G_m = \arg\min_{G} \sum_{i} w_i^{(m)} \exp\big(-\beta\, y_i G(x_i)\big) = \arg\min_{G} \sum_{i} w_i^{(m)} I\big(y_i \ne G(x_i)\big),$$
the minimizer of the weighted training error.
Second stage. $\beta_m = \arg\min_{\beta} \sum_{i} w_i^{(m)} \exp\big(-\beta\, y_i G_m(x_i)\big) = \arg\min_{\beta}\ a e^{-\beta} + b e^{\beta}$, where $a = \sum_{y_i = G_m(x_i)} w_i^{(m)}$ and $b = \sum_{y_i \ne G_m(x_i)} w_i^{(m)}$. Let $g(\beta) = a e^{-\beta} + b e^{\beta}$; setting $g'(\beta) = -a e^{-\beta} + b e^{\beta} = 0$ gives
$$\beta_m = \frac{1}{2} \log\frac{a}{b} = \frac{1}{2} \log\frac{1 - \mathrm{err}_m}{\mathrm{err}_m}, \qquad \mathrm{err}_m = \frac{\sum_{i} w_i^{(m)} I\big(y_i \ne G_m(x_i)\big)}{\sum_{i} w_i^{(m)}}.$$
Update. From $f_m(x) = f_{m-1}(x) + \beta_m G_m(x)$,
$$w_i^{(m+1)} = \exp\big(-y_i f_m(x_i)\big) = w_i^{(m)} \exp\big(-\beta_m y_i G_m(x_i)\big) = w_i^{(m)} \exp\big(2\beta_m I(y_i \ne G_m(x_i))\big)\cdot e^{-\beta_m},$$
using $-y_i G_m(x_i) = 2 I\big(y_i \ne G_m(x_i)\big) - 1$. The factor $e^{-\beta_m}$ multiplies all weights equally and can be dropped, so with $\alpha_m = 2\beta_m$ this is exactly the AdaBoost weight update.
Finally, $f_M(x) = \sum_{m=1}^{M} \beta_m G_m(x)$, and
$$\mathrm{sign}\big(f_M(x)\big) = \mathrm{sign}\Big(\sum_{m=1}^{M} \beta_m G_m(x)\Big) = \mathrm{sign}\Big(\sum_{m=1}^{M} \alpha_m G_m(x)\Big),$$
which is the AdaBoost output. $\blacksquare$
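As a numerical sanity check of this derivation (toy data and variable names are mine), running FSAM with the exponential loss and the closed-form $\beta_m$ reproduces the AdaBoost sequence of weighted-error stumps, with $\alpha_m = 2\beta_m$:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=200, random_state=0)
y = 2 * y01 - 1                               # labels in {-1, +1}

f = np.zeros(len(y))                          # f_0 = 0
for m in range(5):
    w = np.exp(-y * f)                        # w_i^(m) = exp(-y_i f_{m-1}(x_i))
    G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    miss = G.predict(X) != y
    err = w[miss].sum() / w.sum()             # weighted error err_m
    beta = 0.5 * np.log((1 - err) / err)      # closed-form beta_m
    f = f + beta * G.predict(X)               # f_m = f_{m-1} + beta_m G_m
    print(f"round {m + 1}: err_m={err:.3f}, alpha_m = 2*beta_m = {2 * beta:.3f}")
```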
Variants of AdaBoost. The statistical view gives a better understanding and extensions: Boosting for regression, GBM/GBR, XGBoost, LightGBM. Many variants exist.
Boosting regression. Data $(x_i, y_i)$, $i = 1, \ldots, N$, with $y_i$ continuous; model $Y = f(X) + \epsilon$; squared-error loss $L(y, f(x)) = (y - f(x))^2$.

Boosting regression fits the residuals sequentially. Why? At stage $m$,
$$\min_{\beta, G} \sum_{i=1}^{N} L\big(y_i,\ f_{m-1}(x_i) + \beta G(x_i)\big) = \min_{\beta, G} \sum_{i=1}^{N} \big(r_{im} - \beta G(x_i)\big)^2,$$
where $r_{im} = y_i - f_{m-1}(x_i)$ is the residual from the last round. So each new term $\beta G$ is simply a regression fit to the current residuals.
Choice of the basis $G(x; \gamma)$:
- a regression tree gives a Boosted Regression Tree;
- a simple linear regression (one feature only) gives Boosted Linear Regression;
- neural nets (e.g., CNN) give a Boosted NN.
Hyperparameters:
- $M$: the number of trees; choose by cross-validation.
- $d$: the tree depth; $d = 1$ gives tree stumps and no interactions, larger $d$ allows interactions between features.
- learning rate $\lambda$: too big may not work well; too small is too slow. A cross-validation sketch follows this list.
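A sketch of tuning these by cross-validation with scikit-learn's GradientBoostingRegressor (the dataset and grid values are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)

grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={
        "n_estimators": [100, 500, 1000],   # M
        "max_depth": [1, 2, 3],             # d
        "learning_rate": [0.01, 0.1],       # lambda
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```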
Algorithm (boosted regression tree):
1. Initialize $f_0(x) = 0$ and residuals $r_i = y_i$, $i = 1, \ldots, N$.
2. For $m = 1, \ldots, M$:
   (a) Fit a regression tree $G_m$ with $d$ leaves to the data $(x_i, r_i)$.
   (b) Update $f(x) \leftarrow f(x) + \lambda G_m(x)$.
   (c) Update the residuals $r_i \leftarrow r_i - \lambda G_m(x_i)$.
3. Output $f(x) = \sum_{m=1}^{M} \lambda G_m(x)$.
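A from-scratch sketch of this algorithm (names are mine; the base learner is a scikit-learn regression tree restricted to $d$ leaves):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_reg_tree_fit(X, y, M=1000, n_leaves=2, lam=0.1):
    r = y.astype(float).copy()                     # r_i = y_i
    trees = []
    for m in range(M):
        G = DecisionTreeRegressor(max_leaf_nodes=n_leaves)  # tree with d leaves
        G.fit(X, r)                                # fit the tree to the residuals
        r -= lam * G.predict(X)                    # r_i <- r_i - lambda * G_m(x_i)
        trees.append(G)
    return trees

def boosted_reg_tree_predict(trees, X, lam=0.1):
    # Use the same lambda as in fitting: f(x) = sum_m lambda * G_m(x)
    return lam * sum(G.predict(X) for G in trees)
```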
Simulation. $Y = \sin(2\pi X) + \epsilon$, with $X \sim \mathrm{Unif}(0, 1)$, $\epsilon \sim N(0, 1^2)$ i.i.d., $n = 500$.
- Boosted regression tree with $d = 2$ leaves (tree stumps), $M = 1, 10, 10^2, 10^3$.
- Boosted linear regression with $G(x) = \alpha + \beta x$, a simple linear regression (i.e., one feature only).

[Figure: fitted curves for increasing $M$.] For boosted linear regression, the gradient version performs better than plain forward stagewise. A simulation sketch follows.
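A sketch reproducing the boosted-tree part of this simulation (it reuses boosted_reg_tree_fit and boosted_reg_tree_predict from above; the seed and plotting details are mine):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0, 1, size=(n, 1))                 # X ~ Unif(0, 1)
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 1, size=n)  # noise sd 1 as read from the slide

xs = np.linspace(0, 1, 200).reshape(-1, 1)
for M in [1, 10, 100, 1000]:
    trees = boosted_reg_tree_fit(X, y, M=M, n_leaves=2, lam=0.1)
    plt.plot(xs, boosted_reg_tree_predict(trees, xs), label=f"M={M}")
plt.scatter(X, y, s=5, alpha=0.3)
plt.plot(xs, np.sin(2 * np.pi * xs), "k--", label="truth")
plt.legend()
plt.show()
```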