
AdaBoost

Daniel Hsu (COMS 4771)

The algorithm

The input training data is $\{(x_i, y_i)\}_{i=1}^n$ from $X \times \{-1, +1\}$.

• Initialize $D_1(i) := 1/n$ for each $i = 1, \dots, n$.
• For $t = 1, \dots, T$, do:
  – Give $D_t$-weighted examples to Weak Learner; get back $h_t \colon X \to \{-1, +1\}$.
  – Compute weight on $h_t$ and update weights on examples:
$$s_t := \sum_{i=1}^n D_t(i) \cdot y_i h_t(x_i)$$
$$\alpha_t := \frac{1}{2} \ln \frac{1 + s_t}{1 - s_t}$$
$$D_{t+1}(i) := \frac{D_t(i) \cdot \exp(-\alpha_t \cdot y_i h_t(x_i))}{Z_t} \quad \text{for each } i = 1, \dots, n$$
where
$$Z_t := \sum_{i=1}^n D_t(i) \cdot \exp(-\alpha_t \cdot y_i h_t(x_i))$$
is the normalizer that makes $D_{t+1}$ a probability distribution.
• Final hypothesis is $\hat{h}$ defined by $\hat{h}(x) := \operatorname{sign}\left(\sum_{t=1}^T \alpha_t \cdot h_t(x)\right)$ for $x \in X$.
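To make the loop above concrete, here is a minimal sketch of the algorithm in Python/NumPy. It is only an illustration, not part of the original notes: the decision-stump weak learner, the function names, and the small numerical clip on $s_t$ are assumptions made for the example.

```python
import numpy as np


def stump_weak_learner(X, y, D):
    """Hypothetical weak learner: returns the single-feature threshold stump
    maximizing the D-weighted correlation sum_i D(i) * y_i * h(x_i)."""
    best, best_corr = None, -np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sgn in (+1.0, -1.0):
                pred = np.where(X[:, j] > thr, sgn, -sgn)
                corr = np.sum(D * y * pred)
                if corr > best_corr:
                    best, best_corr = (j, thr, sgn), corr
    j, thr, sgn = best
    return lambda Xq: np.where(Xq[:, j] > thr, sgn, -sgn)


def adaboost(X, y, T, weak_learner=stump_weak_learner):
    """AdaBoost as described above, with alpha_t = (1/2) ln((1+s_t)/(1-s_t))."""
    n = X.shape[0]
    D = np.full(n, 1.0 / n)                    # D_1(i) = 1/n
    hypotheses, alphas = [], []
    for _ in range(T):
        h = weak_learner(X, y, D)              # weak hypothesis h_t
        margins = y * h(X)                     # y_i h_t(x_i), values in {-1, +1}
        s = np.sum(D * margins)                # s_t
        s = np.clip(s, -1 + 1e-12, 1 - 1e-12)  # numerical guard; not in the notes
        alpha = 0.5 * np.log((1 + s) / (1 - s))
        D = D * np.exp(-alpha * margins)
        D /= D.sum()                           # dividing by Z_t keeps D a distribution
        hypotheses.append(h)
        alphas.append(alpha)

    def h_hat(Xq):
        l_hat = sum(a * h(Xq) for a, h in zip(alphas, hypotheses))
        return np.where(l_hat >= 0, 1.0, -1.0)  # sign of the weighted vote

    return h_hat, hypotheses, alphas


if __name__ == "__main__":
    # Tiny synthetic check: labels determined by a linear rule on two features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1.0, -1.0)
    h_hat, _, _ = adaboost(X, y, T=20)
    print("training error rate:", np.mean(h_hat(X) != y))
```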
Training error rate bound

Let $\hat{l}$ be the function defined by
$$\hat{l}(x) := \sum_{t=1}^T \alpha_t \cdot h_t(x) \quad \text{for } x \in X$$
so $\hat{h}(x) = \operatorname{sign}(\hat{l}(x))$. The training error rate of $\hat{h}$ can be bounded above by the average exponential loss of $\hat{l}$:
$$\frac{1}{n} \sum_{i=1}^n 1\{\hat{h}(x_i) \neq y_i\} \;\le\; \frac{1}{n} \sum_{i=1}^n \exp(-y_i \hat{l}(x_i)).$$
This holds because
$$\hat{h}(x_i) \neq y_i \;\Longrightarrow\; -y_i \hat{l}(x_i) \ge 0 \;\Longrightarrow\; \exp(-y_i \hat{l}(x_i)) \ge 1.$$
Furthermore, the average exponential loss of $\hat{l}$ equals the product of the normalizers from all rounds:
$$\begin{aligned}
\frac{1}{n} \sum_{i=1}^n \exp(-y_i \hat{l}(x_i))
&= \sum_{i=1}^n D_1(i) \cdot \exp\left(-\sum_{t=1}^T \alpha_t \cdot y_i h_t(x_i)\right) \\
&= Z_1 \sum_{i=1}^n \frac{D_1(i) \cdot \exp(-\alpha_1 \cdot y_i h_1(x_i))}{Z_1} \cdot \exp\left(-\sum_{t=2}^T \alpha_t \cdot y_i h_t(x_i)\right) \\
&= Z_1 \sum_{i=1}^n D_2(i) \cdot \exp\left(-\sum_{t=2}^T \alpha_t \cdot y_i h_t(x_i)\right) \\
&= Z_1 Z_2 \sum_{i=1}^n \frac{D_2(i) \cdot \exp(-\alpha_2 \cdot y_i h_2(x_i))}{Z_2} \cdot \exp\left(-\sum_{t=3}^T \alpha_t \cdot y_i h_t(x_i)\right) \\
&= Z_1 Z_2 \sum_{i=1}^n D_3(i) \cdot \exp\left(-\sum_{t=3}^T \alpha_t \cdot y_i h_t(x_i)\right) \\
&= \cdots \\
&= \prod_{t=1}^T Z_t.
\end{aligned}$$

Since each $y_i h_t(x_i) \in \{-1, +1\}$, the normalizer $Z_t$ can be written as
$$\begin{aligned}
Z_t &= \sum_{i=1}^n D_t(i) \cdot \exp(-\alpha_t \cdot y_i h_t(x_i)) \\
&= \sum_{i=1}^n D_t(i) \cdot \left( \frac{1 + y_i h_t(x_i)}{2} \exp(-\alpha_t) + \frac{1 - y_i h_t(x_i)}{2} \exp(\alpha_t) \right) \\
&= \sum_{i=1}^n D_t(i) \cdot \left( \frac{1 + y_i h_t(x_i)}{2} \sqrt{\frac{1 - s_t}{1 + s_t}} + \frac{1 - y_i h_t(x_i)}{2} \sqrt{\frac{1 + s_t}{1 - s_t}} \right) \\
&= \frac{1 + s_t}{2} \sqrt{\frac{1 - s_t}{1 + s_t}} + \frac{1 - s_t}{2} \sqrt{\frac{1 + s_t}{1 - s_t}} \qquad \text{(using } \textstyle\sum_i D_t(i) = 1 \text{ and } \sum_i D_t(i)\, y_i h_t(x_i) = s_t\text{)} \\
&= \sqrt{(1 + s_t)(1 - s_t)} \\
&= \sqrt{1 - s_t^2}.
\end{aligned}$$
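As a quick numerical sanity check of this identity (not part of the notes; the random distribution and $\pm 1$ margins below are made up for the test), one can draw an arbitrary $D_t$ and arbitrary values $y_i h_t(x_i) \in \{-1, +1\}$, set $\alpha_t$ as in the algorithm, and compare $Z_t$ with $\sqrt{1 - s_t^2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
D = rng.random(n)
D /= D.sum()                                  # an arbitrary distribution D_t
margins = rng.choice([-1.0, 1.0], size=n)     # values y_i h_t(x_i) in {-1, +1}

s = np.sum(D * margins)                       # s_t
alpha = 0.5 * np.log((1 + s) / (1 - s))       # alpha_t as defined in the algorithm
Z = np.sum(D * np.exp(-alpha * margins))      # the normalizer Z_t

print(Z, np.sqrt(1 - s**2))                   # the two values agree up to rounding
```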
So, we conclude the following bound on the training error rate of $\hat{h}$:
$$\frac{1}{n} \sum_{i=1}^n 1\{\hat{h}(x_i) \neq y_i\} \;\le\; \frac{1}{n} \sum_{i=1}^n \exp(-y_i \hat{l}(x_i)) \;=\; \prod_{t=1}^T Z_t \;=\; \prod_{t=1}^T \sqrt{1 - s_t^2} \;\le\; \exp\left(-\frac{1}{2} \sum_{t=1}^T s_t^2\right)$$
where the last step uses the fact that $1 + x \le e^x$ for any real number $x$.

(The bound is usually written in terms of $\gamma_t := s_t/2$, i.e., as $\exp(-2 \sum_{t=1}^T \gamma_t^2)$.)
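To get a concrete sense of the rate, here is an illustrative calculation (the numbers are assumptions for the example, not from the notes): if every round satisfies $s_t \ge 0.2$, i.e., $\gamma_t \ge 0.1$, then
$$\frac{1}{n} \sum_{i=1}^n 1\{\hat{h}(x_i) \neq y_i\} \;\le\; \exp\left(-\frac{1}{2} \sum_{t=1}^T s_t^2\right) \;\le\; \exp\left(-\frac{T}{2} (0.2)^2\right) \;=\; e^{-T/50},$$
which is below $e^{-10} \approx 4.5 \times 10^{-5}$ after $T = 500$ rounds; in particular, once this bound drops below $1/n$, the training error rate must be exactly $0$.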
Margins on training examples

Let $\hat{g}$ be the function defined by
$$\hat{g}(x) := \frac{\sum_{t=1}^T \alpha_t \cdot h_t(x)}{\sum_{t=1}^T |\alpha_t|} \quad \text{for } x \in X$$
so $y_i \hat{g}(x_i)$ is the margin achieved on example $(x_i, y_i)$. We may assume without loss of generality that $\alpha_t \ge 0$ for each $t = 1, \dots, T$ (by replacing $h_t$ with $-h_t$ as needed). Fix a value $\theta \in (0, 1)$, and consider the fraction of training examples on which $\hat{g}$ achieves a margin at most $\theta$:
$$\frac{1}{n} \sum_{i=1}^n 1\{y_i \hat{g}(x_i) \le \theta\}.$$

This quantity can be bounded above using the arguments from before:
$$\begin{aligned}
\frac{1}{n} \sum_{i=1}^n 1\{y_i \hat{g}(x_i) \le \theta\}
&= \frac{1}{n} \sum_{i=1}^n 1\left\{ y_i \hat{l}(x_i) \le \theta \sum_{t=1}^T \alpha_t \right\} \\
&\le \exp\left(\theta \sum_{t=1}^T \alpha_t\right) \cdot \frac{1}{n} \sum_{i=1}^n \exp(-y_i \hat{l}(x_i)) \\
&= \exp\left(\theta \sum_{t=1}^T \alpha_t\right) \cdot \prod_{t=1}^T \sqrt{1 - s_t^2} \\
&= \prod_{t=1}^T \sqrt{(1 + s_t)^{1+\theta} (1 - s_t)^{1-\theta}}.
\end{aligned}$$

Suppose that for some $\gamma > 0$, $s_t \ge 2\gamma$ for all $t = 1, \dots, T$. If $\theta < \gamma$, then using calculus, it can be shown that each term in the product is less than $1$:
$$\sqrt{(1 + s_t)^{1+\theta} (1 - s_t)^{1-\theta}} < 1.$$
Hence, the bound decreases to zero exponentially fast with $T$.
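The calculus behind that last claim is easy to spot-check numerically. The snippet below is only an illustrative check with made-up values of $\gamma$ and $\theta$ (it is not part of the notes): it evaluates the product term on a grid of edges $s \ge 2\gamma$ and margin levels $\theta < \gamma$ and confirms each value stays below $1$.

```python
import numpy as np

# Illustrative check (values assumed for the example): whenever s >= 2*gamma
# and theta < gamma, the factor sqrt((1+s)^(1+theta) * (1-s)^(1-theta)) is < 1,
# so the product over T rounds shrinks geometrically.
gamma = 0.15
for theta in np.linspace(0.0, gamma - 0.01, 8):
    for s in np.linspace(2 * gamma, 0.99, 20):
        term = np.sqrt((1 + s) ** (1 + theta) * (1 - s) ** (1 - theta))
        assert term < 1.0, (theta, s, term)
print("each factor is strictly below 1 on the tested grid")
```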