
Proof that the Bayes Decision Rule is Optimal
Theorem For any decision function $g : \mathbb{R}^d \to \{0, 1\}$,
\[ \Pr\{g(X) \neq Y\} \;\geq\; \Pr\{g^*(X) \neq Y\}, \]
where $g^*$ denotes the Bayes decision rule.
We prove it for the two-class problem.
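Before the proof, a minimal numerical sketch may help make the claim concrete. In the following Python snippet, the toy joint distribution and all names (joint, error_rate, bayes_rule) are invented purely for illustration; the sketch enumerates every decision function on a three-valued feature and checks that none beats the Bayes rule.

    import itertools

    # Toy joint distribution over a feature x in {0, 1, 2} and a label y in {0, 1}.
    # The probabilities are invented for illustration and sum to 1.
    joint = {
        (0, 0): 0.30, (0, 1): 0.10,
        (1, 0): 0.15, (1, 1): 0.15,
        (2, 0): 0.05, (2, 1): 0.25,
    }

    def error_rate(g):
        # Pr{g(X) != Y} under the toy joint distribution.
        return sum(p for (x, y), p in joint.items() if g(x) != y)

    def bayes_rule(x):
        # g*(x) = 1 iff Pr{Y = 1 | X = x} > 1/2, i.e. iff the joint mass
        # at (x, 1) exceeds the joint mass at (x, 0).
        return 1 if joint[(x, 1)] > joint[(x, 0)] else 0

    # Enumerate all 2^3 = 8 decision functions on {0, 1, 2} and verify that
    # none has a smaller error rate than the Bayes rule.
    bayes_err = error_rate(bayes_rule)
    for bits in itertools.product([0, 1], repeat=3):
        assert error_rate(lambda x, b=bits: b[x]) >= bayes_err - 1e-12

    print("Bayes error:", bayes_err)  # 0.10 + 0.15 + 0.05 = 0.30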
Proof
First we focus on the error rate (the probability of classification error) of the generic decision function $g(\cdot)$. Look at a SPECIFIC feature vector (namely, condition on $X = x$), and recall that uppercase letters denote random variables, while lowercase letters denote values.
\[ \Pr\{g(X) \neq Y \mid X = x\} = 1 - \Pr\{g(X) = Y \mid X = x\}, \]
so, when $X = x$, the probability of error is 1 minus the probability of correct decision. We make a correct decision if $g(X) = 1$ and $Y = 1$, OR if $g(X) = 0$ and $Y = 0$. Note that the events are disjoint, so the probability of the union (OR) is the sum of the probabilities:
\[ 1 - \Pr\{g(X) = Y \mid X = x\} = 1 - \Pr\{g(X) = 1, Y = 1 \mid X = x\} - \Pr\{g(X) = 0, Y = 0 \mid X = x\}. \]
We now show that, conditional on $X = x$, the events $\{g(X) = k\}$ and $\{Y = k\}$ are independent (surprising, isn't it?).
First, note that, conditional on $X = x$, $g(X) = g(x)$: that is, $g(X)$ is just the value of $g$ evaluated at $x$, which is either 0 or 1.
Assume WLOG that $g(x) = 1$. Then $\Pr\{g(X) = 0, Y = 0 \mid X = x\}$ is equal to zero, because $g(x)$ is equal to 1. Note, therefore, that the event $\{g(X) = 0\}$ has conditional probability 0, and an event of probability 0 is independent of any other event; hence $\{g(X) = 0\}$ is conditionally independent of the event $\{Y = 0\}$ given $X = x$. Therefore:
\[ \Pr\{g(X) = 0, Y = 0 \mid X = x\} = \Pr\{g(X) = 0 \mid X = x\}\,\Pr\{Y = 0 \mid X = x\}. \]
Similarly, $\Pr\{g(X) = 1, Y = 1 \mid X = x\} = \Pr\{Y = 1 \mid X = x\}$ because, by assumption, $\Pr\{g(X) = 1 \mid X = x\} = 1$.
BUT an event having probability 1 is independent of any other event (can you prove it? a one-line argument is sketched below), so
\[ \Pr\{g(X) = 1, Y = 1 \mid X = x\} = \Pr\{g(X) = 1 \mid X = x\}\,\Pr\{Y = 1 \mid X = x\} \]
by definition of independence.
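As for the "can you prove it?" above, here is one short answer (an addition to the notes, not part of the original argument): if $\Pr(A) = 1$, then for any event $B$,
\[ \Pr(A \cap B) = \Pr(B) - \Pr(B \cap A^c) = \Pr(B) = \Pr(A)\,\Pr(B), \]
since $\Pr(B \cap A^c) \leq \Pr(A^c) = 0$.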
Thus, for each $x$ where $g(x) = 1$,
\[ \Pr\{g(X) = k, Y = k \mid X = x\} = \Pr\{g(X) = k \mid X = x\}\,\Pr\{Y = k \mid X = x\}, \]
for $k = 0, 1$, and independence for this case is proved.

The same argument applies for each $x$ where $g(x) = 0$: thus we can always write
\[ \Pr\{g(X) = k, Y = k \mid X = x\} = \Pr\{g(X) = k \mid X = x\}\,\Pr\{Y = k \mid X = x\}, \]
for $k = 0, 1$, which concludes the independence proof.
Now note that $\Pr\{g(X) = k \mid X = x\} = 1$ if $g(x) = k$, and $= 0$ if $g(x) \neq k$. Using the notation $\mathbf{1}_A$ to denote the indicator of the set $A$, we can write:
\[ 1 - \Pr\{g(X) = Y \mid X = x\} = 1 - \bigl( \mathbf{1}_{g(x)=1}\,\Pr\{Y = 1 \mid X = x\} + \mathbf{1}_{g(x)=0}\,\Pr\{Y = 0 \mid X = x\} \bigr). \]
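As a quick sanity check (with invented numbers): suppose $\Pr\{Y = 1 \mid X = x\} = 0.7$ and $g(x) = 1$. Then $\mathbf{1}_{g(x)=1} = 1$ and $\mathbf{1}_{g(x)=0} = 0$, so
\[ 1 - \Pr\{g(X) = Y \mid X = x\} = 1 - (1 \cdot 0.7 + 0 \cdot 0.3) = 0.3 . \]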
Let's now subtract $\Pr\{g(X) = Y \mid X = x\}$ from $\Pr\{g^*(X) = Y \mid X = x\}$:
\[
\begin{aligned}
&\Pr\{g^*(X) = Y \mid X = x\} - \Pr\{g(X) = Y \mid X = x\} \\
&\quad = \Pr\{Y = 1 \mid X = x\}\,\bigl(\mathbf{1}_{g^*(x)=1} - \mathbf{1}_{g(x)=1}\bigr) + \Pr\{Y = 0 \mid X = x\}\,\bigl(\mathbf{1}_{g^*(x)=0} - \mathbf{1}_{g(x)=0}\bigr)
\end{aligned}
\]
(simple algebra). Noting that $\Pr\{Y = 0 \mid X = x\} = 1 - \Pr\{Y = 1 \mid X = x\}$, we can then write
\[
\begin{aligned}
&\Pr\{g^*(X) = Y \mid X = x\} - \Pr\{g(X) = Y \mid X = x\} \\
&\quad = \Pr\{Y = 1 \mid X = x\}\,\bigl(\mathbf{1}_{g^*(x)=1} - \mathbf{1}_{g(x)=1}\bigr) + \bigl(1 - \Pr\{Y = 1 \mid X = x\}\bigr)\,\bigl(\mathbf{1}_{g^*(x)=0} - \mathbf{1}_{g(x)=0}\bigr)
\end{aligned}
\tag{1}
\]
Now, note that $\mathbf{1}_{g^*(x)=0} = 1 - \mathbf{1}_{g^*(x)=1}$, and similarly for $g$. Hence,
\[
\begin{aligned}
&\Pr\{g^*(X) = Y \mid X = x\} - \Pr\{g(X) = Y \mid X = x\} \\
&\quad = \Pr\{Y = 1 \mid X = x\}\,\bigl(\mathbf{1}_{g^*(x)=1} - \mathbf{1}_{g(x)=1}\bigr) + \bigl(1 - \Pr\{Y = 1 \mid X = x\}\bigr)\,\bigl(1 - \mathbf{1}_{g^*(x)=1} - 1 + \mathbf{1}_{g(x)=1}\bigr) \\
&\quad = \bigl(2\Pr\{Y = 1 \mid X = x\} - 1\bigr)\,\bigl(\mathbf{1}_{g^*(x)=1} - \mathbf{1}_{g(x)=1}\bigr)
\end{aligned}
\tag{2}
\]
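Continuing the invented numerical example from above: with $\Pr\{Y = 1 \mid X = x\} = 0.7$, $g^*(x) = 1$ and $g(x) = 0$, Eq. (2) gives $(2 \cdot 0.7 - 1)(1 - 0) = 0.4$, which matches the direct computation $0.7 - 0.3 = 0.4$ for the difference of the two conditional probabilities of correct decision.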
Now, note that, for each x,
• if $\Pr\{Y = 1 \mid X = x\} > 1/2$, then, by definition of the Bayes Decision Rule, $\mathbf{1}_{g^*(x)=1} = 1$ and, in general, $\mathbf{1}_{g(x)=1} \leq 1$; thus Eq. (2) $\geq 0$.
• if $\Pr\{Y = 1 \mid X = x\} < 1/2$, then, again by definition of the Bayes Decision Rule, $\mathbf{1}_{g^*(x)=1} = 0$ and, in general, $\mathbf{1}_{g(x)=1} \geq 0$; thus both factors in Eq. (2) are nonpositive, and Eq. (2) $\geq 0$.
• if $\Pr\{Y = 1 \mid X = x\} = 1/2$, the first factor in Eq. (2) is zero, so Eq. (2) $= 0$.
This is true for every $X = x$; now, take the expectation with respect to the density $f$ of $X$.
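Since Eq. (2) is nonnegative for every $x$, this expectation is nonnegative as well:
\[
\Pr\{g^*(X) = Y\} - \Pr\{g(X) = Y\} = \int \bigl(2\Pr\{Y = 1 \mid X = x\} - 1\bigr)\,\bigl(\mathbf{1}_{g^*(x)=1} - \mathbf{1}_{g(x)=1}\bigr)\,f(x)\,dx \;\geq\; 0,
\]
that is, $\Pr\{g(X) \neq Y\} \geq \Pr\{g^*(X) \neq Y\}$, which proves the theorem. $\Box$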