One-against-all
Daniel Hsu (COMS 4771)
Theorem. Let $\hat\eta_1, \dots, \hat\eta_K \colon \mathcal{X} \to [0,1]$ be estimates of the conditional probability functions $x \mapsto \Pr(Y = k \mid X = x)$ for $k = 1, \dots, K$, and let
$$\varepsilon := \mathbb{E}\left[ \max_{k=1,\dots,K} \left| \hat\eta_k(X) - \Pr(Y = k \mid X) \right| \right].$$
Let $\hat f \colon \mathcal{X} \to \{1, \dots, K\}$ be the one-against-all classifier based on $\hat\eta_1, \dots, \hat\eta_K$, i.e.,
$$\hat f(x) = \arg\max_{k=1,\dots,K} \hat\eta_k(x), \quad x \in \mathcal{X}$$
(with ties broken arbitrarily), and let $f^\star \colon \mathcal{X} \to \{1, \dots, K\}$ be the Bayes optimal classifier. Then
$$\Pr(\hat f(X) \neq Y) - \Pr(f^\star(X) \neq Y) \le 2\varepsilon.$$
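As an illustrative sketch (not part of the note itself), the one-against-all rule is just an argmax over the $K$ estimated conditional probabilities. The score values below are made-up estimates $\hat\eta_k(x)$, one row per point $x$:

```python
import numpy as np

# Made-up estimated conditional probabilities: row i holds
# (eta_hat_1(x_i), ..., eta_hat_K(x_i)) for K = 3 classes.
scores = np.array([
    [0.2, 0.5, 0.3],
    [0.7, 0.1, 0.2],
    [0.3, 0.3, 0.4],
])

# One-against-all prediction: the class with the largest estimated
# conditional probability. np.argmax breaks ties by first occurrence,
# which is one arbitrary tie-breaking choice, as the theorem allows.
predictions = np.argmax(scores, axis=1) + 1  # classes labeled 1..K

print(predictions)  # -> [2 1 3]
```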
Proof. Fix $x \in \mathcal{X}$, and let $y^\star := f^\star(x)$ and $\hat y := \hat f(x)$. Let $\eta_k(x) := \Pr(Y = k \mid X = x)$ for all $k = 1, \dots, K$. Then
$$\begin{aligned}
\Pr(\hat f(X) \neq Y \mid X = x) - \Pr(f^\star(X) \neq Y \mid X = x)
&= \sum_{k \neq \hat y} \eta_k(x) - \sum_{k \neq y^\star} \eta_k(x) \\
&= \eta_{y^\star}(x) - \eta_{\hat y}(x) \\
&= \underbrace{\hat\eta_{y^\star}(x) - \hat\eta_{\hat y}(x)}_{\le 0} + \eta_{y^\star}(x) - \hat\eta_{y^\star}(x) + \hat\eta_{\hat y}(x) - \eta_{\hat y}(x) \\
&\le 2 \max_{k=1,\dots,K} \left| \hat\eta_k(x) - \eta_k(x) \right|.
\end{aligned}$$
Therefore, taking expectations with respect to $X$,
$$\Pr(\hat f(X) \neq Y) - \Pr(f^\star(X) \neq Y) \le 2 \cdot \mathbb{E}\left[ \max_{k=1,\dots,K} \left| \hat\eta_k(X) - \eta_k(X) \right| \right] = 2\varepsilon. \qquad \blacksquare$$
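The pointwise inequality in the proof can be spot-checked numerically. The sketch below uses made-up conditional probabilities (Dirichlet draws) and made-up uniform perturbations as estimates; it is only a sanity check, not part of the argument:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    eta = rng.dirichlet(np.ones(4))                           # true eta_k(x), k = 1..4
    eta_hat = np.clip(eta + rng.uniform(-0.1, 0.1, 4), 0, 1)  # perturbed estimates
    y_star = np.argmax(eta)                                   # Bayes prediction f*(x)
    y_hat = np.argmax(eta_hat)                                # one-against-all prediction
    excess = eta[y_star] - eta[y_hat]                         # excess conditional risk at x
    # The proof's pointwise bound: excess <= 2 * max_k |eta_hat_k - eta_k|.
    assert excess <= 2 * np.max(np.abs(eta_hat - eta)) + 1e-12

print("pointwise bound holds on all trials")
```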
The bound on the excess risk is tight. To see this, suppose for a given $x \in \mathcal{X}$ (with $y^\star = f^\star(x)$ and $\hat y = \hat f(x)$) we have $\hat\eta_{y^\star}(x) = \hat\eta_{\hat y}(x) - \delta$, but $\eta_{y^\star}(x) = \hat\eta_{y^\star}(x) + \varepsilon$ and $\eta_{\hat y}(x) = \hat\eta_{\hat y}(x) - \varepsilon$. Then
$$\eta_{y^\star}(x) - \eta_{\hat y}(x) = (\hat\eta_{y^\star}(x) + \varepsilon) - (\hat\eta_{\hat y}(x) - \varepsilon) = 2\varepsilon - \delta,$$
which tends to $2\varepsilon$ as $\delta \to 0$.
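Plugging hypothetical numbers into this construction (say $\varepsilon = 0.1$ and $\delta = 10^{-6}$; all values below are made up, and with $K \ge 3$ the remaining probability mass can sit on other classes) shows the excess conditional risk approaching $2\varepsilon$:

```python
# Made-up numbers instantiating the tightness construction above.
eps, delta = 0.1, 1e-6
eta_hat_yhat = 0.5                    # estimate for the predicted class yhat
eta_hat_ystar = eta_hat_yhat - delta  # estimate for y* trails by delta
eta_ystar = eta_hat_ystar + eps       # true eta_{y*}(x): underestimated by eps
eta_yhat = eta_hat_yhat - eps         # true eta_{yhat}(x): overestimated by eps

excess = eta_ystar - eta_yhat         # excess conditional risk at x
print(excess)                         # equals 2*eps - delta, close to 2*eps = 0.2
```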