One-against-all
Daniel Hsu (COMS 4771)
Theorem. Let $\hat{\eta}_1, \dots, \hat{\eta}_K \colon \mathcal{X} \to [0,1]$ be estimates of the conditional probability functions $x \mapsto \Pr(Y = k \mid X = x)$ for $k = 1, \dots, K$, and let
\[
  \varepsilon := \mathbb{E}\Bigl[ \max_{k=1,\dots,K} \bigl| \hat{\eta}_k(X) - \Pr(Y = k \mid X) \bigr| \Bigr].
\]
Let $\hat{f} \colon \mathcal{X} \to \{1,\dots,K\}$ be the one-against-all classifier based on $\hat{\eta}_1, \dots, \hat{\eta}_K$, i.e.,
\[
  \hat{f}(x) = \operatorname*{argmax}_{k=1,\dots,K} \hat{\eta}_k(x), \qquad x \in \mathcal{X}
\]
(with ties broken arbitrarily), and let $f^\star \colon \mathcal{X} \to \{1,\dots,K\}$ be the Bayes optimal classifier. Then
\[
  \Pr(\hat{f}(X) \neq Y) - \Pr(f^\star(X) \neq Y) \le 2\varepsilon.
\]
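As a concrete illustration of the classifier in the theorem, the one-against-all rule simply takes an argmax of the per-class probability estimates at each point. A minimal sketch, assuming NumPy and a hypothetical array of estimates `eta_hat`:

```python
import numpy as np

def one_against_all(prob_estimates):
    """Given an (n, K) array with eta_hat[i, k] estimating
    P(Y = k | X = x_i), return the one-against-all predictions
    f_hat(x_i) = argmax_k eta_hat[i, k] (ties broken by lowest index)."""
    return np.argmax(prob_estimates, axis=1)

# Hypothetical estimates for three points and K = 3 classes.
eta_hat = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.4, 0.3],
                    [0.1, 0.1, 0.8]])
print(one_against_all(eta_hat))  # [0 1 2]
```

Note that the rows need not sum to one: the theorem only requires each $\hat{\eta}_k$ to map into $[0,1]$.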
Proof. Fix $x \in \mathcal{X}$, $y^\star := f^\star(x)$, and $\hat{y} := \hat{f}(x)$. Let $\eta_k(x) := \Pr(Y = k \mid X = x)$ for all $k = 1, \dots, K$. Then
\[
\begin{aligned}
  \Pr(\hat{f}(X) \neq Y \mid X = x) - \Pr(f^\star(X) \neq Y \mid X = x)
  &= \sum_{k \neq \hat{y}} \eta_k(x) - \sum_{k \neq y^\star} \eta_k(x) \\
  &= \eta_{y^\star}(x) - \eta_{\hat{y}}(x) \\
  &= \underbrace{\hat{\eta}_{y^\star}(x) - \hat{\eta}_{\hat{y}}(x)}_{\le 0}
     + \eta_{y^\star}(x) - \hat{\eta}_{y^\star}(x)
     + \hat{\eta}_{\hat{y}}(x) - \eta_{\hat{y}}(x) \\
  &\le 2 \max_{k=1,\dots,K} \bigl| \hat{\eta}_k(x) - \eta_k(x) \bigr|,
\end{aligned}
\]
where the first term is at most zero because $\hat{y}$ maximizes $\hat{\eta}_k(x)$ over $k = 1,\dots,K$.
Therefore, taking expectations with respect to $X$,
\[
  \Pr(\hat{f}(X) \neq Y) - \Pr(f^\star(X) \neq Y)
  \le 2 \cdot \mathbb{E}\Bigl[ \max_{k=1,\dots,K} \bigl| \hat{\eta}_k(X) - \eta_k(X) \bigr| \Bigr]
  = 2\varepsilon.
\]
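The bound can be checked numerically on a small finite example. The sketch below (an illustration under assumed names, not part of the note) draws true conditionals on a five-point input space with a uniform marginal, perturbs them to form estimates, and compares the exact excess risk with $2\varepsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5, 3  # a small finite X = {x_1, ..., x_n} with uniform marginal

# True conditionals eta[i, k] = P(Y = k | X = x_i); each row sums to 1.
eta = rng.dirichlet(np.ones(K), size=n)
# Estimates: true values plus bounded noise, clipped to [0, 1]
# (the theorem does not require the estimates to sum to 1).
eta_hat = np.clip(eta + rng.uniform(-0.1, 0.1, size=(n, K)), 0.0, 1.0)

f_star = eta.argmax(axis=1)      # Bayes-optimal labels
f_hat = eta_hat.argmax(axis=1)   # one-against-all labels

# Error probability of a classifier g: average over x of 1 - eta[x, g(x)].
risk_hat = np.mean(1.0 - eta[np.arange(n), f_hat])
risk_star = np.mean(1.0 - eta[np.arange(n), f_star])
eps = np.mean(np.abs(eta_hat - eta).max(axis=1))

excess = risk_hat - risk_star
print(f"excess risk {excess:.4f} <= 2*eps = {2 * eps:.4f}")
```

By the theorem, `excess` is always between $0$ and `2 * eps`, whatever the noise draw.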
The bound on the excess risk is tight. To see this, suppose for a given $x \in \mathcal{X}$ (with $y^\star = f^\star(x)$ and $\hat{y} = \hat{f}(x)$) we have $\hat{\eta}_{y^\star}(x) = \hat{\eta}_{\hat{y}}(x) - \delta$, but $\eta_{y^\star}(x) = \hat{\eta}_{y^\star}(x) + \varepsilon$ and $\eta_{\hat{y}}(x) = \hat{\eta}_{\hat{y}}(x) - \varepsilon$. Then
\[
  \eta_{y^\star}(x) - \eta_{\hat{y}}(x)
  = (\hat{\eta}_{y^\star}(x) + \varepsilon) - (\hat{\eta}_{\hat{y}}(x) - \varepsilon)
  = 2\varepsilon - \delta,
\]
which tends to $2\varepsilon$ as $\delta \to 0$.
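The tightness construction can be instantiated with concrete (hypothetical) numbers: the estimates rank $\hat{y}$ only $\delta$ above $y^\star$, while the true conditionals err by exactly $\varepsilon$ in opposite directions.

```python
# Hypothetical values instantiating the construction above.
eps, delta = 0.1, 0.01
eta_hat_yhat = 0.5
eta_hat_ystar = eta_hat_yhat - delta  # f_hat narrowly prefers yhat ...
eta_ystar = eta_hat_ystar + eps       # ... but y_star is truly better,
eta_yhat = eta_hat_yhat - eps         # and yhat is truly worse.

# Conditional excess risk at x: eta_{y_star}(x) - eta_{yhat}(x).
excess = eta_ystar - eta_yhat
print(excess, 2 * eps - delta)  # equal up to floating-point rounding
```

Shrinking `delta` toward zero drives the excess risk up to the $2\varepsilon$ ceiling, as in the remark.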