ANLY-601
Advanced Pattern Recognition
Spring 2018
L6 – Bounds on Error Rates
Bounding the Bayes Error Rate
Recall the likelihood ratio test, written in terms of the negative log of the likelihood ratio:

$$h(x) \equiv -\log \frac{p(x|\omega_1)}{p(x|\omega_2)} \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; t$$

Computation of the Bayes error rate for all but the simplest cases (e.g. Gaussian features x with equal class-conditional covariance matrices) is difficult. Calculating the integrals

$$\varepsilon_1 = \int_{L_2} p(x|\omega_1)\,dx = \int_t^{\infty} p(h|\omega_1)\,dh$$

$$\varepsilon_2 = \int_{L_1} p(x|\omega_2)\,dx = \int_{-\infty}^{t} p(h|\omega_2)\,dh$$

is difficult or impossible in closed form.
Bounding the Bayes Error Rate
Instead, we seek bounds on the error that are
• hopefully tight
• easy to calculate
In practice we will use empirical error estimates, but the theoretical bounds here provide useful tools to reason with.
The bounds point out factors entering the error rate.
That is, the theoretical framework helps you
conceptualize the problem.
A Calculable Case
Let both class-conditional densities be normal, with the same covariance matrix, $\Sigma_1 = \Sigma_2 = \Sigma$. The negative log-likelihood ratio is then

$$h(x) = (M_2 - M_1)^T\, \Sigma^{-1} x + \tfrac{1}{2}\left(M_1^T \Sigma^{-1} M_1 - M_2^T \Sigma^{-1} M_2\right)$$

Since x is Gaussian and h is linear in x, h is Gaussian. Its mean and variance are

$$E[h|\omega_1] = -\tfrac{1}{2}(M_2 - M_1)^T \Sigma^{-1} (M_2 - M_1) \equiv -\tfrac{1}{2}\eta$$

$$E[h|\omega_2] = +\tfrac{1}{2}(M_2 - M_1)^T \Sigma^{-1} (M_2 - M_1) = +\tfrac{1}{2}\eta$$

$$\mathrm{Var}[h|\omega_i] = (M_2 - M_1)^T \Sigma^{-1} (M_2 - M_1) = \eta, \qquad i = 1, 2$$
Calculable Case
The error rates become
[Figure: densities p(h|ω1) and p(h|ω2) with threshold t dividing the h axis into decision regions L1 and L2.]
$$\varepsilon_1 = \int_t^{\infty} p(h|\omega_1)\,dh = \frac{1}{2} - \mathrm{erf}\!\left(\frac{t + \eta/2}{\sqrt{\eta}}\right)$$

$$\varepsilon_2 = \int_{-\infty}^{t} p(h|\omega_2)\,dh = \frac{1}{2} - \mathrm{erf}\!\left(\frac{\eta/2 - t}{\sqrt{\eta}}\right)$$

where

$$\mathrm{erf}(z) \equiv \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\xi^2/2}\, d\xi$$
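A quick numerical check of these expressions is straightforward. The sketch below (assuming NumPy and SciPy; the function name and example numbers are illustrative, not from the lecture) evaluates ε1 and ε2 via the Gaussian CDF Φ, which relates to the erf convention above by Φ(u) = 1/2 + erf(u).

```python
import numpy as np
from scipy.stats import norm

def error_rates_equal_cov(M1, M2, Sigma, t=0.0):
    """Epsilon_1 and epsilon_2 for Gaussian classes with equal covariance."""
    d = M2 - M1
    eta = d @ np.linalg.solve(Sigma, d)   # eta = (M2-M1)^T Sigma^{-1} (M2-M1)
    s = np.sqrt(eta)
    eps1 = norm.sf((t + eta / 2) / s)     # P(h > t | omega_1), h ~ N(-eta/2, eta)
    eps2 = norm.cdf((t - eta / 2) / s)    # P(h < t | omega_2), h ~ N(+eta/2, eta)
    return eps1, eps2

# Illustrative example: unit covariance, means one unit apart per coordinate
print(error_rates_equal_cov(np.zeros(2), np.ones(2), np.eye(2)))
```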
Back to Bounds
Chernoff Bound
The classification error rate is

$$\varepsilon = P_1 \int_{L_2} p(x|\omega_1)\,dx + P_2 \int_{L_1} p(x|\omega_2)\,dx = \int \min\left[P_1\, p(x|\omega_1),\; P_2\, p(x|\omega_2)\right] dx$$

Use the inequality

$$\min[a, b] \le a^s\, b^{1-s}; \qquad a, b \ge 0, \quad 0 \le s \le 1$$

to obtain

$$\varepsilon \le \varepsilon_U = \min_s\, P_1^s\, P_2^{1-s} \int p^s(x|\omega_1)\, p^{1-s}(x|\omega_2)\, dx = \min_s\, P_1^s\, P_2^{1-s}\, e^{-\mu(s)}$$

where $\mu(s) \equiv -\ln \int p^s(x|\omega_1)\, p^{1-s}(x|\omega_2)\, dx$.
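For Gaussian class-conditional densities, μ(s) has a closed form (its s = 1/2 case appears on the next slide), so the minimization over s reduces to a one-dimensional numerical search. A minimal sketch, assuming NumPy/SciPy; the function names and test values are illustrative, not from the lecture:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_mu(s, M1, M2, S1, S2):
    """Chernoff distance mu(s) for two Gaussian class-conditional densities."""
    d = M2 - M1
    S = s * S1 + (1 - s) * S2
    quad = s * (1 - s) / 2 * (d @ np.linalg.solve(S, d))
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    return quad + 0.5 * (logdet_S - s * logdet_1 - (1 - s) * logdet_2)

def chernoff_bound(P1, M1, M2, S1, S2):
    """min over s in (0,1) of P1^s P2^(1-s) exp(-mu(s))."""
    P2 = 1 - P1
    obj = lambda s: P1**s * P2**(1 - s) * np.exp(-chernoff_mu(s, M1, M2, S1, S2))
    res = minimize_scalar(obj, bounds=(1e-6, 1 - 1e-6), method="bounded")
    return res.fun, res.x   # bound value and the optimal s

M1, M2 = np.zeros(2), np.ones(2)
S1, S2 = np.eye(2), 2 * np.eye(2)
print(chernoff_bound(0.5, M1, M2, S1, S2))
```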
Simplification – Bhattacharyya Bound
Don’t insist on minimizing with respect to s; just take s = 1/2:

$$\varepsilon \le \varepsilon_U = \sqrt{P_1 P_2} \int \sqrt{p(x|\omega_1)\, p(x|\omega_2)}\; dx = \sqrt{P_1 P_2}\; e^{-\mu(1/2)}$$

For x normal, the exponent is

$$\mu(1/2) = \tfrac{1}{8}\,(M_2 - M_1)^T \left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1} (M_2 - M_1) + \tfrac{1}{2} \ln \frac{\left|\tfrac{\Sigma_1 + \Sigma_2}{2}\right|}{\sqrt{|\Sigma_1|\, |\Sigma_2|}}$$
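Continuing the sketch from the Chernoff slide, the Bhattacharyya bound is the same objective evaluated at s = 1/2 instead of minimized, so it can never be tighter than the Chernoff bound:

```python
def bhattacharyya_bound(P1, M1, M2, S1, S2):
    """Bhattacharyya bound: the Chernoff objective fixed at s = 1/2."""
    mu_half = chernoff_mu(0.5, M1, M2, S1, S2)   # chernoff_mu from the sketch above
    return np.sqrt(P1 * (1 - P1)) * np.exp(-mu_half)

print(bhattacharyya_bound(0.5, M1, M2, S1, S2))  # >= the minimized Chernoff bound
```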
Bhattacharyya Bound – Special Cases
• Equal means ($M_1 = M_2$):

$$\mu(1/2) = \tfrac{1}{2} \ln \frac{\left|\tfrac{\Sigma_1 + \Sigma_2}{2}\right|}{\sqrt{|\Sigma_1|\, |\Sigma_2|}}$$

• Equal covariances ($\Sigma_1 = \Sigma_2 = \Sigma$):

$$\mu(1/2) = \tfrac{1}{8}\,(M_2 - M_1)^T\, \Sigma^{-1}\, (M_2 - M_1)$$
Single Hypothesis Tests
Single Hypothesis
May have data predominantly from one class — e.g. in failure analysis one usually has many examples of “healthy” function and only a few failures, possibly spread over many different failure modes. Given an observation x, we want to know whether it corresponds to a healthy or a failed example. Also called “novelty detection.”
May use some kind of distance measure – e.g.

$$d^2 = (x - m)^T\, \Sigma^{-1}\, (x - m)$$

where $\Sigma$ and $m$ are the covariance and mean of class $\omega_1$, the class that one is able to model well.
Single Hypothesis
To use this in classification: given x, compute $d^2$ and compare with a threshold,

$$d^2 \;\underset{\omega_1}{\overset{\text{not } \omega_1}{\gtrless}}\; c$$

where the threshold c is chosen based on both in-class and out-of-class data, or …
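A minimal sketch of this test in NumPy. The data and threshold below are illustrative; in practice m and Σ are estimated from the well-sampled class and c is tuned as discussed above:

```python
import numpy as np

def mahalanobis_d2(x, m, Sigma):
    """Squared Mahalanobis distance of x from the modeled class."""
    diff = x - m
    return diff @ np.linalg.solve(Sigma, diff)

def is_novel(x, m, Sigma, c):
    """Single-hypothesis test: flag x as 'not class 1' when d^2 > c."""
    return mahalanobis_d2(x, m, Sigma) > c

# Fit m and Sigma on healthy examples only (synthetic data for illustration)
rng = np.random.default_rng(0)
healthy = rng.normal(size=(500, 3))
m, Sigma = healthy.mean(axis=0), np.cov(healthy, rowvar=False)
print(is_novel(np.array([4.0, 4.0, 4.0]), m, Sigma, c=9.0))
```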
Single Hypothesis
Statistics of $d^2$
$d^2$ is a random variable, with mean and variance

$$E[d^2] = E\left[(x - m)^T \Sigma^{-1} (x - m)\right] = \mathrm{Trace}\left(\Sigma^{-1}\, E\left[(x - m)(x - m)^T\right]\right) = \mathrm{Trace}\left(\Sigma^{-1} \Sigma\right) = N$$

$$\mathrm{var}[d^2] = E[d^4] - \left(E[d^2]\right)^2 = 2N \quad \text{(for Gaussian } x\text{)}$$

Note: $d^2$ is the sum of N identically distributed random variables. For large dimension N we can invoke the central limit theorem, with the result that $d^2$ becomes normally distributed.
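A quick Monte Carlo check of these moments (illustrative, not from the lecture): for m = 0 and Σ = I the distance reduces to a sum of squares, and the sample mean and variance of d² come out near N and 2N.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10                                  # feature dimension
x = rng.normal(size=(100_000, N))       # class-1 samples: m = 0, Sigma = I
d2 = np.sum(x**2, axis=1)               # Mahalanobis distance for identity Sigma
print(d2.mean(), d2.var())              # ~N and ~2N; near-Gaussian for large N
```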
Single Hypothesis
Other “distance” measures
– Model the probability distribution for objects in the known class – e.g. as a Gaussian mixture

$$p(x|\omega_1) = \sum_{k=1}^{Q} a_k\, (2\pi)^{-n/2}\, |\Sigma_k|^{-1/2} \exp\left[-\tfrac{1}{2}\,(x - m_k)^T\, \Sigma_k^{-1}\, (x - m_k)\right]$$

where the parameters $a_k, m_k, \Sigma_k$ are fit by maximum likelihood. Then classify a new datum $x_0$ using a threshold test

$$\ln p(x_0|\omega_1) \;\underset{\text{not } \omega_1}{\overset{\omega_1}{\gtrless}}\; c$$
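One way to realize this is with scikit-learn's GaussianMixture, whose score_samples method returns ln p(x|ω1) under the fitted mixture. A sketch under assumed settings (the component count, data, and rejection quantile are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
healthy = rng.normal(size=(1000, 3))          # known-class training data

gmm = GaussianMixture(n_components=4, random_state=0).fit(healthy)

# score_samples returns ln p(x | omega_1); threshold it to flag novelties
c = np.quantile(gmm.score_samples(healthy), 0.01)   # e.g. 1% in-class rejection
x0 = np.array([[5.0, 5.0, 5.0]])
print("novel" if gmm.score_samples(x0)[0] < c else "class 1")
```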
Single Hypothesis
Other “distance” measures, continued
– Model the known class by a PCA subspace. Then classify a new datum $x_0$ by a threshold test on its perpendicular distance to the hyperplane. The hyperplane serves as a geometric model of the known class. See Oja, Subspace Methods of Pattern Recognition, Wiley and Sons, for other examples.

[Figure: datum x0 and its perpendicular distance to the hyperplane spanned by the leading m principal components.]
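A minimal sketch of the subspace test, assuming NumPy (the function name and data are illustrative): take the leading m right singular vectors of the centered training data as the hyperplane basis, and threshold the squared residual of x0 off that subspace.

```python
import numpy as np

def pca_residual_d2(x0, X, m_components):
    """Squared perpendicular distance from x0 to the PCA hyperplane of X."""
    mean = X.mean(axis=0)
    # Leading m principal directions from the SVD of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:m_components].T                     # (n_features, m) basis
    r = (x0 - mean) - V @ (V.T @ (x0 - mean))   # residual off the subspace
    return r @ r

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5)) @ np.diag([3, 2, 1, 0.1, 0.1])
print(pca_residual_d2(np.ones(5), X, m_components=2))  # compare to a threshold c
```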
Single Hypothesis
Other distance measures (cont’d)
– Model the known class as a curved manifold (constructed e.g. with a neural network). Measure the distance between a datum $x_0$ and its projection $F(x_0)$ onto the manifold. The threshold test is

$$d^2\left(x_0;\, F(x_0)\right) \;\underset{\omega_1}{\overset{\text{not } \omega_1}{\gtrless}}\; c$$

[Figure: datum x0 and its projection F(x0) onto the curved manifold.]
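A sketch of the test, assuming the projection F comes from some fitted model (e.g. an autoencoder's encode-decode map). Here, as a stand-in, F projects onto the unit circle, a simple curved manifold; the names and threshold are illustrative:

```python
import numpy as np

def F(x):
    """Stand-in manifold projection: nearest point on the unit circle.
    In practice F would be, e.g., a trained autoencoder's reconstruction."""
    return x / np.linalg.norm(x)

def manifold_test(x0, c):
    """Flag x0 as 'not class 1' when its distance to the manifold exceeds c."""
    d2 = np.sum((x0 - F(x0))**2)
    return d2 > c

print(manifold_test(np.array([1.1, 0.1]), c=0.25))   # near the circle -> class 1
print(manifold_test(np.array([3.0, 3.0]), c=0.25))   # far away -> novel
```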
Problems with Single Hypothesis Tests
• Single hypothesis tests usually perform worse than full Bayesian tests: if you have enough data to model both classes, performance should improve.
• We turned to the single hypothesis test because one of the two classes is too sparsely sampled to model well. But that raises the question: how does one pick the decision threshold c?