One-against-all

Daniel Hsu (COMS 4771)

Theorem. Let η̂1, . . . , η̂K : X → [0, 1] be estimates of the conditional probability functions x ↦ P(Y = k | X = x) for k = 1, . . . , K, and let

    ε := E[ max_{k=1,…,K} |η̂k(X) − P(Y = k | X)| ].
Let f̂ : X → {1, . . . , K} be the one-against-all classifier based on η̂1, . . . , η̂K, i.e.,

    f̂(x) = arg max_{k=1,…,K} η̂k(x),   x ∈ X

(with ties broken arbitrarily), and let f⋆ : X → {1, . . . , K} be the Bayes optimal classifier. Then

    P(f̂(X) ≠ Y) − P(f⋆(X) ≠ Y) ≤ 2ε.
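As a concrete illustration, here is a minimal Python sketch of the one-against-all rule (the function and variable names are my own, not from the note): given K estimated conditional-probability functions, predict the class whose estimate is largest.

```python
import numpy as np

def one_against_all(prob_estimates):
    """Build a one-against-all classifier from a list of K functions,
    each mapping x to an estimate of P(Y = k | X = x)."""
    def classifier(x):
        scores = np.array([eta_hat(x) for eta_hat in prob_estimates])
        # Ties are broken by argmax's convention: the lowest index wins.
        return int(np.argmax(scores))
    return classifier

# Toy example with K = 3 constant estimators (values invented for illustration).
eta_hats = [lambda x: 0.2, lambda x: 0.5, lambda x: 0.3]
f_hat = one_against_all(eta_hats)
# f_hat(x) returns 1 for every x, since η̂1 is the largest estimate.
```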

Proof. Fix x ∈ X, and let y⋆ := f⋆(x) and ŷ := f̂(x). Let ηk(x) := P(Y = k | X = x) for all k = 1, . . . , K. Then

    P(f̂(X) ≠ Y | X = x) − P(f⋆(X) ≠ Y | X = x) = ∑_{k≠ŷ} ηk(x) − ∑_{k≠y⋆} ηk(x)
        = ηy⋆(x) − ηŷ(x)
        = [η̂y⋆(x) − η̂ŷ(x)] + [ηy⋆(x) − η̂y⋆(x)] + [η̂ŷ(x) − ηŷ(x)]
        ≤ 2 max_{k=1,…,K} |η̂k(x) − ηk(x)|,

where the last inequality uses η̂y⋆(x) − η̂ŷ(x) ≤ 0, which holds because ŷ maximizes η̂k(x) over k.

Therefore, taking expectations with respect to X,

    P(f̂(X) ≠ Y) − P(f⋆(X) ≠ Y) ≤ 2 · E[ max_{k=1,…,K} |η̂k(X) − ηk(X)| ] = 2ε. ∎
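The bound can be sanity-checked numerically. The following sketch (all numbers invented for illustration) takes a small discrete distribution with known conditional probabilities ηk, perturbs them to play the role of the estimates η̂k, and verifies that the excess risk of the one-against-all classifier is at most 2ε.

```python
import numpy as np

# X is uniform over 3 points; eta[i, k] = P(Y = k | X = x_i). Rows sum to 1.
eta = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.5, 0.3],
                [0.3, 0.3, 0.4]])

# Perturbed estimates η̂ (may leave [0, 1] slightly; harmless for this check).
rng = np.random.default_rng(0)
eta_hat = eta + rng.uniform(-0.15, 0.15, size=eta.shape)

bayes_risk = np.mean(1.0 - eta.max(axis=1))       # risk of f⋆ (pointwise argmax of η)
f_hat = eta_hat.argmax(axis=1)                    # one-against-all predictions
risk = np.mean(1.0 - eta[np.arange(3), f_hat])    # true risk of f̂
eps = np.mean(np.abs(eta_hat - eta).max(axis=1))  # ε from the theorem

# Excess risk is bounded by 2ε, as the theorem guarantees.
assert risk - bayes_risk <= 2 * eps + 1e-12
```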
The bound on the excess risk is tight. To see this, suppose that for a given x ∈ X (with y⋆ = f⋆(x) and ŷ = f̂(x)) we have η̂y⋆(x) = η̂ŷ(x) − δ for some δ > 0, while ηy⋆(x) = η̂y⋆(x) + ε and ηŷ(x) = η̂ŷ(x) − ε. Then

    ηy⋆(x) − ηŷ(x) = (η̂y⋆(x) + ε) − (η̂ŷ(x) − ε) = 2ε − δ,

which tends to 2ε as δ → 0.
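The tightness construction can be checked with a quick computation (the specific values of ε, δ, and η̂ŷ(x) below are invented for illustration):

```python
# Tightness check: the conditional excess risk at x equals exactly 2ε − δ.
eps, delta = 0.1, 1e-3

eta_hat_yhat = 0.45                     # η̂_ŷ(x), an arbitrary estimate value
eta_hat_ystar = eta_hat_yhat - delta    # η̂_y⋆(x) = η̂_ŷ(x) − δ, so ŷ attains the argmax
eta_ystar = eta_hat_ystar + eps         # true η_y⋆(x): underestimated by ε
eta_yhat = eta_hat_yhat - eps           # true η_ŷ(x): overestimated by ε

gap = eta_ystar - eta_yhat              # conditional excess risk at x
assert abs(gap - (2 * eps - delta)) < 1e-12
```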
