ANLY-601
Advanced Pattern Recognition

Spring 2018

L3-4 – Bayes Classifiers (cont’d)

General Cost

• Suppose the two types of classification error have
different costs. What is the ideal decision strategy?

e.g., in a detection problem (medical screening, fire
alarm), false negatives may be much more costly than
false positives.

• Define an average “loss” (or cost) function and
devise a decision rule that minimizes it.

Loss Function

Let $\lambda_{ij}$ denote the cost incurred for choosing class $\omega_i$ when $\omega_j$ is the actual class.

The function to minimize is the average cost, or "risk":

$$\begin{aligned}
R ={}& \lambda_{11}\, P(\text{choose } \omega_1 \mid \omega_1 \text{ true})\, P_1
  + \lambda_{12}\, P(\text{choose } \omega_1 \mid \omega_2 \text{ true})\, P_2 \\
 &+ \lambda_{21}\, P(\text{choose } \omega_2 \mid \omega_1 \text{ true})\, P_1
  + \lambda_{22}\, P(\text{choose } \omega_2 \mid \omega_2 \text{ true})\, P_2
\end{aligned}$$

Loss Function

Now

$$P(\text{choose } \omega_1 \mid \omega_2 \text{ true}) = \int_{L_1} p(x \mid \omega_2)\, d^n x$$

and similarly for the other terms. So

$$R = \int_{L_1} \left[ \lambda_{11} P_1\, p(x \mid \omega_1) + \lambda_{12} P_2\, p(x \mid \omega_2) \right] d^n x
  + \int_{L_2} \left[ \lambda_{21} P_1\, p(x \mid \omega_1) + \lambda_{22} P_2\, p(x \mid \omega_2) \right] d^n x$$

where $L_1$ and $L_2$ are the regions of feature space assigned to $\omega_1$ and $\omega_2$, respectively.

Re-write Loss

Note that

$$\int_{L_2} p(x \mid \omega_i)\, d^n x = 1 - \int_{L_1} p(x \mid \omega_i)\, d^n x$$

so the risk becomes

$$R = \lambda_{21} P_1 + \lambda_{22} P_2
  + \int_{L_1} \left[ (\lambda_{11} - \lambda_{21}) P_1\, p(x \mid \omega_1)
  + (\lambda_{12} - \lambda_{22}) P_2\, p(x \mid \omega_2) \right] d^n x$$
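As a quick sanity check on this algebra, the sketch below (same assumed setup as the previous sketch) evaluates $R$ both ways and confirms the two forms agree up to quadrature error.

```python
import numpy as np
from scipy.stats import norm

# Same assumed setup as the previous sketch.
P1, P2 = 0.6, 0.4
p1, p2 = norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf
lam = np.array([[0.0, 4.0], [1.0, 0.0]])
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
in_L1 = x < 1.0   # L1 = {x < 1}, L2 is the complement

# Original form: integrals over both regions.
R_orig = (np.sum((lam[0, 0]*P1*p1(x) + lam[0, 1]*P2*p2(x))[in_L1])
          + np.sum((lam[1, 0]*P1*p1(x) + lam[1, 1]*P2*p2(x))[~in_L1])) * dx

# Rewritten form: constant term plus a single integral over L1.
g = (lam[0, 0] - lam[1, 0]) * P1 * p1(x) + (lam[0, 1] - lam[1, 1]) * P2 * p2(x)
R_new = lam[1, 0] * P1 + lam[1, 1] * P2 + np.sum(g[in_L1]) * dx

print(f"R original  = {R_orig:.6f}")
print(f"R rewritten = {R_new:.6f}")   # the two agree
```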

Minimum Loss

To minimize this, we want the integrand to be as
negative as possible, so we include in $L_1$ every point $x$ where the integrand is negative:

$$(\lambda_{11} - \lambda_{21}) P_1\, p(x \mid \omega_1) + (\lambda_{12} - \lambda_{22}) P_2\, p(x \mid \omega_2) < 0$$

i.e.

$$(\lambda_{12} - \lambda_{22})\, P_2\, p(x \mid \omega_2) < (\lambda_{21} - \lambda_{11})\, P_1\, p(x \mid \omega_1) \;\Rightarrow\; x \in L_1$$

This tells us how to assign each $x$ to either $L_1$ or $L_2$.
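Here is the resulting decision rule as a tiny classifier, again under the assumed Gaussian setup from the earlier sketches; the function name `choose_class` and all constants are my own illustration.

```python
import numpy as np
from scipy.stats import norm

# Assumed setup, as in the earlier sketches.
P1, P2 = 0.6, 0.4
p1, p2 = norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf
lam11, lam12, lam21, lam22 = 0.0, 4.0, 1.0, 0.0

def choose_class(x):
    """Minimum-risk assignment: x -> L1 (report w1) exactly when
    (lam12 - lam22) P2 p(x|w2) < (lam21 - lam11) P1 p(x|w1)."""
    lhs = (lam12 - lam22) * P2 * p2(x)
    rhs = (lam21 - lam11) * P1 * p1(x)
    return np.where(lhs < rhs, 1, 2)

print(choose_class(np.array([-1.0, 0.5, 1.5, 3.0])))   # -> [1 1 2 2]
```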

Minimum Loss

Or (multiply both sides by $-1$, reverse the inequality, and rearrange):

$$l(x) \equiv \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)}
\;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\;
\frac{(\lambda_{12} - \lambda_{22})\, P_2}{(\lambda_{21} - \lambda_{11})\, P_1}$$

another likelihood ratio test.

Neyman-Pearson Test

• Suppose we don't know the cost of each type of error, or the priors. How do we proceed?
• We can only work with the conditional errors

$$\mathcal{E}_1 = P(\text{choose } \omega_2 \mid \omega_1 \text{ true}) = \int_{L_2} p(x \mid \omega_1)\, d^n x$$

$$\mathcal{E}_2 = P(\text{choose } \omega_1 \mid \omega_2 \text{ true}) = \int_{L_1} p(x \mid \omega_2)\, d^n x$$

• A way to proceed is to minimize $\mathcal{E}_1$ subject to some specified acceptable $\mathcal{E}_2 = \mathcal{E}_0$. This is a constrained minimization problem that uses the Lagrange multiplier formulation.

Neyman-Pearson Test

We want to minimize $\mathcal{E}_1$ subject to the constraint $\mathcal{E}_2 = \mathcal{E}_0$. The Lagrangian (the function to minimize) is

$$r = \mathcal{E}_1 + \mu\, (\mathcal{E}_2 - \mathcal{E}_0)
  = \int_{L_2} p(x \mid \omega_1)\, d^n x
  + \mu \left( \int_{L_1} p(x \mid \omega_2)\, d^n x - \mathcal{E}_0 \right)$$

Neyman-Pearson

Re-write the Lagrangian using $\int_{L_2} p(x \mid \omega_1)\, d^n x = 1 - \int_{L_1} p(x \mid \omega_1)\, d^n x$:

$$r = 1 - \mu\, \mathcal{E}_0 + \int_{L_1} \left[ \mu\, p(x \mid \omega_2) - p(x \mid \omega_1) \right] d^n x$$

$r$ will be minimized when the integrand is kept negative, so the decision rule is

$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \mu$$

another likelihood ratio test.

Neyman-Pearson Test

• What's the threshold $\mu$? It's set by requiring that the constraint be satisfied:

$$\mathcal{E}_0 = \int_{L_1(\mu)} p(x \mid \omega_2)\, d^n x$$

Hypothesis Test Continued

Minimax Test -- We've been using likelihood ratio tests like

$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)}
\;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\;
\frac{(\lambda_{12} - \lambda_{22})\, P_2}{(\lambda_{21} - \lambda_{11})\, P_1}$$

but what happens if the priors change after the test is designed? One approach: construct the test so that its performance is no worse than that of the worst possible Bayes test.

Minimax Test

Rewrite the expected loss, using $P_2 = 1 - P_1$:

$$R = \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{L_1} p(x \mid \omega_2)\, d^n x
  + P_1 \left[ (\lambda_{11} - \lambda_{22})
  + (\lambda_{21} - \lambda_{11}) \int_{L_2} p(x \mid \omega_1)\, d^n x
  - (\lambda_{12} - \lambda_{22}) \int_{L_1} p(x \mid \omega_2)\, d^n x \right]
  \qquad \text{eqn (***)}$$

What does this look like as a function of $P_1$?

Minimax Test

[Figure: $R$ versus $P_1$, from 0 to 1.0. The Bayes expected cost, with $L_1$ and $L_2$ chosen optimally for each prior, is a concave-down curve; the expected cost of a model with $L_1$ and $L_2$ chosen for a design prior $P_{10}$, then used at varying $P_1$, is the straight line tangent to that curve at $P_{10}$.]

Minimax Test

[Figure: the same plot, annotated at $P_{10}$. Where the tangent line is horizontal, performance is no worse than the worst Bayes performance; this is the best operating point if the priors are not known.]

We want $dR/dP_1$ for CONSTANT $L_1$ to be zero.

Minimax Test

From eqn (***),

$$\left. \frac{dR}{dP_1} \right|_{\text{fixed } L_1, L_2} = 0
  = (\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11})\, \mathcal{E}_1 - (\lambda_{12} - \lambda_{22})\, \mathcal{E}_2$$

For example, if the costs for the two kinds of correct decision are the same, and the costs for the two types of error are the same,

$$\lambda_{12} = \lambda_{21} \quad \text{and} \quad \lambda_{11} = \lambda_{22}$$

the above condition becomes

$$\mathcal{E}_1 = \mathcal{E}_2$$

choose the operating point that gives equal rates for both kinds of error.
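For the equal-cost case just described, the minimax operating point is where $\mathcal{E}_1 = \mathcal{E}_2$. Below is a minimal numerical sketch of finding it (the Gaussian class conditionals are assumed, as in the earlier sketches): for equal-variance Gaussians the likelihood-ratio test reduces to a single boundary $b$ on $x$, so we can bisect on $b$ until the two conditional errors match.

```python
from scipy.stats import norm

# Assumed class conditionals: p(x|w1) = N(0,1), p(x|w2) = N(2,1).
# With L1 = {x < b}: E1 = P(x >= b | w1), E2 = P(x < b | w2).
def errors(b):
    E1 = 1.0 - norm(0.0, 1.0).cdf(b)   # P(choose w2 | w1 true)
    E2 = norm(2.0, 1.0).cdf(b)         # P(choose w1 | w2 true)
    return E1, E2

# E1 - E2 decreases monotonically in b, so bisection finds E1 = E2.
lo, hi = -10.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    E1, E2 = errors(mid)
    if E1 > E2:
        lo = mid    # boundary too low: E1 still exceeds E2
    else:
        hi = mid
b = 0.5 * (lo + hi)
E1, E2 = errors(b)
print(f"minimax boundary b = {b:.4f}, E1 = {E1:.4f}, E2 = {E2:.4f}")
# Symmetry of this example puts b at the midpoint, b = 1, with E1 = E2.
```

The same bisection, with the target $\mathcal{E}_2 = \mathcal{E}_0$ instead of $\mathcal{E}_1 = \mathcal{E}_2$, would set the Neyman-Pearson threshold.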
ROC Curves

• Likelihood ratio tests are threshold tests, with the threshold defined by the priors and the decision costs. As the priors and decision costs change, the threshold changes, and the rate of each kind of error changes.
• The Receiver Operating Characteristic (ROC) curve shows the system behavior over the full range of thresholds.
• The ROC is determined only by the class-conditional probability distributions of the measured features.

ROC Curves

Recall the error rates

$$\mathcal{E}_j = P(\text{choose } \omega_i \mid \omega_j \text{ true}) = \int_{L_i} p(x \mid \omega_j)\, d^n x, \qquad i \neq j$$

where the different regions $L_1$ and $L_2$ are defined by the likelihood ratio test

$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \eta$$

To display the system performance at a glance, we plot the error rates against each other over the full range of the threshold $\eta$. This is the ROC curve.

ROC Curves

[Figure: ROC curve plotting $1 - \mathcal{E}_1$ (vertical) against $\mathcal{E}_2$ (horizontal); the curve rises from $(0, 0)$ toward $(1, 1)$, where $\eta = 0$.]

• Concave downward
• ROCs are above the diagonal
• Slope = threshold $\eta$

Log-Likelihood

Sometimes it's more convenient to work with the conditional distribution of the negative log-likelihood than with the conditional distribution of the features $x$.

Recall the likelihood ratio test with threshold $\eta$:

$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \eta$$

In terms of the negative log-likelihood $h \equiv -\log \dfrac{p(x \mid \omega_1)}{p(x \mid \omega_2)}$, the test becomes

$$h \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; -\log \eta$$

$x$ is a random vector, so $h$ is a random scalar, and has distribution $p(h \mid \omega_i)$ when $\omega_i$ is true.

Log-Likelihood

We can rewrite the error probabilities in terms of integrals over the distribution for $h$:

$$\mathcal{E}_1 = \int_{L_2} p(x \mid \omega_1)\, d^n x = \int_{-\log \eta}^{\infty} p(h \mid \omega_1)\, dh$$

$$\mathcal{E}_2 = \int_{L_1} p(x \mid \omega_2)\, d^n x = \int_{-\infty}^{-\log \eta} p(h \mid \omega_2)\, dh$$
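To tie the ROC and log-likelihood views together, here is a final sketch tracing the ROC for the same assumed Gaussian pair. For these equal-variance Gaussians the log-likelihood ratio is linear in $x$, so sweeping a boundary $b$ on $x$ is equivalent to sweeping the threshold $\eta$; the grid and densities are assumptions of the sketch.

```python
import numpy as np
from scipy.stats import norm

# Assumed class conditionals: p(x|w1) = N(0,1), p(x|w2) = N(2,1).
# Here log l(x) = log p(x|w1) - log p(x|w2) = 2 - 2x, so the test
# l(x) > eta is the same as x < b with b = (2 - log eta) / 2.
b = np.linspace(-4.0, 6.0, 201)
E1 = 1.0 - norm(0.0, 1.0).cdf(b)   # P(choose w2 | w1 true)
E2 = norm(2.0, 1.0).cdf(b)         # P(choose w1 | w2 true)

# ROC: plot 1 - E1 against E2, one point per threshold.
roc = np.column_stack([E2, 1.0 - E1])
print(roc[::50])   # runs from near (0, 0) up to near (1, 1)

# Slope property: d(1-E1)/dE2 = p(b|w1)/p(b|w2) = l(b) = eta,
# i.e. the slope of the ROC at an operating point is the threshold.
eta = norm(0.0, 1.0).pdf(b) / norm(2.0, 1.0).pdf(b)
print(np.round(eta[::50], 4))
```

The printed slopes fall from very large values near $(0, 0)$ to near zero at $(1, 1)$, which is exactly the concave-down, above-the-diagonal shape described in the bullets above.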