ANLY-601
Advanced Pattern Recognition
Spring 2018
L5 – Bayes Classifiers (cont’d)
Summary of Dichotomy (Two-Class) Hypothesis Tests
• Bayes least error rate:

$$ l(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \frac{P_2}{P_1} $$

• Bayes least cost:

$$ l(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \frac{(c_{12}-c_{22})\,P_2}{(c_{21}-c_{11})\,P_1} $$
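A minimal sketch of both rules, assuming hypothetical Gaussian class-conditional densities and priors (the slides do not specify any); the thresholds follow the formulas above:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional densities p(x|w1), p(x|w2) and priors P1, P2.
p1 = norm(loc=0.0, scale=1.0).pdf
p2 = norm(loc=2.0, scale=1.0).pdf
P1, P2 = 0.6, 0.4

def bayes_least_error(x):
    """Choose w1 if l(x) = p(x|w1)/p(x|w2) > P2/P1, else w2."""
    return 1 if p1(x) / p2(x) > P2 / P1 else 2

def bayes_least_cost(x, c):
    """c[i][j] = cost c_{ij} of choosing w_{i+1} when w_{j+1} is true."""
    threshold = (c[0][1] - c[1][1]) * P2 / ((c[1][0] - c[0][0]) * P1)
    return 1 if p1(x) / p2(x) > threshold else 2

print(bayes_least_error(0.5))                     # -> 1
print(bayes_least_cost(0.5, [[0, 10], [1, 0]]))   # a high cost c_12 raises the threshold
```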
Summary of Dichotomy Hypothesis Tests
• Neyman-Pearson:

$$ \frac{p(x|\omega_1)}{p(x|\omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \eta \qquad \text{with } \eta \text{ chosen so that } E_1 = \int_{L_2} p(x|\omega_1)\, d^n x = E_0 $$

• Minimax:

$$ \frac{p(x|\omega_1)}{p(x|\omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \eta \qquad \text{with threshold } \eta \text{ such that } E_1 = E_2 $$
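A sketch of picking the Neyman-Pearson threshold for the same hypothetical equal-variance Gaussian pair: there the likelihood ratio is monotone decreasing in x, so the decision boundary x* follows directly from the target error rate E0 (all parameter values are assumptions):

```python
from scipy.stats import norm

# Hypothetical densities: w1 ~ N(0,1), w2 ~ N(2,1); the likelihood ratio
# p(x|w1)/p(x|w2) is monotone decreasing in x, so L2 = {x > x*}.
mu1, mu2, sigma = 0.0, 2.0, 1.0

def neyman_pearson_threshold(E0):
    """Find eta with E1 = P(decide w2 | w1) = P(x > x* | w1) = E0."""
    x_star = norm.ppf(1.0 - E0, loc=mu1, scale=sigma)  # decision boundary
    eta = norm.pdf(x_star, mu1, sigma) / norm.pdf(x_star, mu2, sigma)
    return eta, x_star

eta, x_star = neyman_pearson_threshold(E0=0.05)
print(eta, x_star)   # decide w1 when p(x|w1)/p(x|w2) > eta, i.e. x < x_star
```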
Multi-Hypotheses
Suppose there are L classes $\omega_1, \ldots, \omega_L$ and decision costs $c_{ij}$ for choosing $\omega_i$ when $\omega_j$ is true. Then the minimal cost decision rule is: pick $\omega_k$ where

$$ k = \arg\min_i R(\omega_i|x) = \arg\min_i \sum_{j=1}^{L} c_{ij}\, p(\omega_j|x) $$

When $c_{ii} = 0$ and $c_{ij} = 1$ for $i \neq j$, the cost is just the average error rate, and the decision rule is: pick $\omega_k$ where

$$ k = \arg\max_i\; p(\omega_i|x) $$
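A minimal sketch of the minimum-risk rule; the cost matrix and posterior values are made-up illustrations:

```python
import numpy as np

def min_risk_decision(posteriors, cost):
    """posteriors[j] = p(w_{j+1} | x); cost[i, j] = c_{ij}.
    Conditional risk R(w_i|x) = sum_j c_{ij} p(w_j|x); pick the argmin."""
    risks = cost @ posteriors
    return int(np.argmin(risks))

# With 0/1 costs (c_ii = 0, c_ij = 1 for i != j) the rule reduces to
# picking the class with the largest posterior.
L = 3
zero_one = np.ones((L, L)) - np.eye(L)
post = np.array([0.2, 0.5, 0.3])
assert min_risk_decision(post, zero_one) == int(np.argmax(post))
```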
Reject Option
For a 2-class, least error rate problem, when the posteriors are close to 0.5 the error rate will be large. One might want to establish a window for rejection within which we refuse to make a judgment: defining

$$ \epsilon(x) = \min_i\; p(\omega_i|x), $$

we reject (make no decision) when $\epsilon(x) > t$ for some threshold $t < 0.5$.

[Figure: posteriors $p(\omega_1|x)$ and $p(\omega_2|x)$ versus x; the reject region L(t) is the interval around their crossing at 0.5 where the lower posterior exceeds t.]
Reject Option

Reject rate:

$$ \mathrm{Prob}\{x \in L(t)\} = \int_{L(t)} p(x)\, d^n x $$

Error rate:

$$ E = \int_{\overline{L(t)}} \min\left[\, p(\omega_1|x),\; p(\omega_2|x) \,\right] p(x)\, d^n x $$

[Figure: posteriors $p(\omega_1|x)$ and $p(\omega_2|x)$ versus x; reject region L(t) where the lower posterior is greater than t.]
Reject Option

Error rate:

$$ E = \int_{\overline{L(t)}} \min\left[\, p(\omega_1|x),\; p(\omega_2|x)\,\right] p(x)\, d^n x = \int_{\overline{L(t)}} \min\left[\, P_1\, p(x|\omega_1),\; P_2\, p(x|\omega_2)\,\right] d^n x = E_1 + E_2 $$

[Figure: class-conditional curves $P_1\, p(x|\omega_1)$ and $P_2\, p(x|\omega_2)$ versus x, with reject region L(t) between them.]
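A numerical sketch of these integrals for a hypothetical pair of equal-prior Gaussian classes, showing how the reject rate and error rate trade off as t varies:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class problem: w1 ~ N(0,1), w2 ~ N(2,1), equal priors.
P1 = P2 = 0.5
x = np.linspace(-6.0, 8.0, 4001)
f1, f2 = norm.pdf(x, 0.0, 1.0), norm.pdf(x, 2.0, 1.0)
px = P1 * f1 + P2 * f2                       # mixture density p(x)
eps = np.minimum(P1 * f1, P2 * f2) / px      # eps(x) = min_i p(w_i|x)

def rates(t):
    """Reject rate: integral of p(x) over L(t) = {eps(x) > t};
    error rate: integral of eps(x) p(x) over the complement."""
    reject = eps > t
    reject_rate = np.trapz(np.where(reject, px, 0.0), x)
    error_rate = np.trapz(np.where(~reject, eps * px, 0.0), x)
    return reject_rate, error_rate

for t in (0.5, 0.4, 0.3):
    print(t, rates(t))   # smaller t: larger reject rate, smaller error rate
```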
Reject Option
• Reject option lowers error rate by refusing to make
decisions on feature values x where the error rate is high
(near the crossing of the posterior curves).
• A larger reject region (smaller t) lowers the error and increases the rate at which we refuse to make a decision.

[Figure: same posterior plot as before, with reject region L(t) where the lower posterior is greater than t.]
Sequential Hypothesis Tests
Have a sequence of observations $x_1, x_2, \ldots, x_n$, assumed to be independent and identically distributed (i.i.d.). They may be from a time series, e.g. speech segments, a manufacturing production run …

Each sequence is from one of two possible classes.

Suppose we want to continue to accrue information from this sequence until we have enough information to make a decision -- e.g. maybe we have a reject threshold to overcome.

It seems clear that if we make many measurements (e.g. on consecutive items in a manufacturing production run) we'll improve our classification results.
Sequential Hypothesis Tests
The log likelihood ratio for m observations is

$$ H(x_1, x_2, \ldots, x_m) = \ln \frac{p(x_1, x_2, \ldots, x_m|\omega_1)}{p(x_1, x_2, \ldots, x_m|\omega_2)} = \sum_{i=1}^{m} \ln \frac{p(x_i|\omega_1)}{p(x_i|\omega_2)} = \sum_{i=1}^{m} h(x_i) $$

How does H behave under each hypothesis $\omega_i$? Let's look at its mean and variance. By independence,

$$ E[H|\omega_i] = m\, E[h|\omega_i] \equiv m\,\eta_i, \qquad \mathrm{var}[H|\omega_i] = m\, \mathrm{var}[h|\omega_i] \equiv m\,\sigma_i^2 $$
Sequential Hypothesis Test
Conditional mean:

$$ \eta_i = E[h|\omega_i] = \int \ln\!\left[\frac{p(x|\omega_1)}{p(x|\omega_2)}\right] p(x|\omega_i)\, d^n x $$

Can bound $\eta_i$ even for arbitrary densities by appeal to the inequality $\ln z \le z - 1$. This gives

$$ \eta_2 = \int \ln\!\left[\frac{p(x|\omega_1)}{p(x|\omega_2)}\right] p(x|\omega_2)\, d^n x \;\le\; \int \left[\frac{p(x|\omega_1)}{p(x|\omega_2)} - 1\right] p(x|\omega_2)\, d^n x = 1 - 1 = 0 $$

Similarly $\eta_1 \ge 0$. So $\eta_1 \ge 0$ and $\eta_2 \le 0$.
Sequential Hypothesis Test

[Figure: densities $p(H|\omega_1)$ and $p(H|\omega_2)$ versus h, centered at $m\eta_1 > 0$ and $m\eta_2 < 0$; the separation increases with increasing number of observations as $m^{1/2}$.]

We have

$$ E[H|\omega_1] = m\,\eta_1 \ge 0, \qquad E[H|\omega_2] = m\,\eta_2 \le 0, \qquad \mathrm{var}[H|\omega_i] = m\,\sigma_i^2 $$

A convenient measure of separation between the two classes is

$$ d^2 = \frac{\big( E[H|\omega_1] - E[H|\omega_2] \big)^2}{\mathrm{var}[H|\omega_1] + \mathrm{var}[H|\omega_2]} = \frac{m\,(\eta_1 - \eta_2)^2}{\sigma_1^2 + \sigma_2^2} $$

which grows linearly in m, so the separation d grows as $m^{1/2}$.
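A quick simulation sketch (hypothetical Gaussian classes) of the $m^{1/2}$ separation growth; here $h(x) = \ln[p(x|\omega_1)/p(x|\omega_2)] = -2x + 2$ for $\omega_1 \sim N(0,1)$ and $\omega_2 \sim N(2,1)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def H_samples(true_class, m, n=20000):
    """Draw n values of H = sum_k h(x_k) for m i.i.d. observations."""
    mu = 0.0 if true_class == 1 else 2.0
    xs = rng.normal(mu, 1.0, size=(n, m))
    return (-2.0 * xs + 2.0).sum(axis=1)     # h(x) = -2x + 2

for m in (1, 10, 50):
    H1, H2 = H_samples(1, m), H_samples(2, m)
    d = (H1.mean() - H2.mean()) / np.sqrt(H1.var() + H2.var())
    print(m, round(float(d), 2))   # d = 4m / sqrt(8m) = sqrt(2m), so d ~ m^{1/2}
```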
Sequential Hypothesis Tests

[Figure: distributions of H for m = 1, 10, and 50, showing the separation growing with m.]
Wald Test for Sequential Observations

Terminate the sequence of observations when H reaches some threshold. With

$$ H_m = \sum_{k=1}^{m} h(x_k), \qquad E[H_m|\omega_1] = m\,\eta_1 \ge 0, \qquad E[H_m|\omega_2] = m\,\eta_2 \le 0, $$

stop when

$$ H_m \ge a \;\Rightarrow\; \text{choose } \omega_1 \qquad \text{or} \qquad H_m \le b \;\Rightarrow\; \text{choose } \omega_2; $$

otherwise, continue gathering measurements.
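A minimal sketch of this stopping rule for the same hypothetical Gaussian pair; the thresholds a and b are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def h(x):
    """Per-observation log likelihood ratio for N(0,1) vs N(2,1)."""
    return -2.0 * x + 2.0

def wald_sprt(stream, a=4.0, b=-4.0):
    """Accumulate H_m = sum_k h(x_k). Stop with w1 when H_m >= a,
    with w2 when H_m <= b; otherwise keep taking observations."""
    H, m = 0.0, 0
    for x in stream:
        m += 1
        H += h(x)
        if H >= a:
            return 1, m
        if H <= b:
            return 2, m
    return None, m   # data exhausted before either threshold was reached

# Example: observations actually drawn from w2 ~ N(2,1).
decision, steps = wald_sprt(iter(rng.normal(2.0, 1.0, size=1000)))
print(decision, steps)
```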
Wald Test

• Wald showed that
  – Error rates: with $A = e^{a}$ and $B = e^{b}$, when h(x) is small (so the thresholds are not overshot by much),

$$ E_1 = \frac{B\,(A-1)}{A-B}, \qquad E_2 = \frac{1-B}{A-B} $$

  – Average sequence length to reach threshold is

$$ E[m|\omega_1] = \frac{(1-E_1)\, a + E_1\, b}{\eta_1}, \qquad E[m|\omega_2] = \frac{E_2\, a + (1-E_2)\, b}{\eta_2} $$
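A short simulation check of these approximations, continuing the hypothetical example above (data drawn from $\omega_1$, where $h(x) = -2x + 2$ has mean $\eta_1 = 2$):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 4.0, -4.0
A, B = np.exp(a), np.exp(b)
print("Wald approx: E1 ~", B * (A - 1) / (A - B), " E2 ~", (1 - B) / (A - B))

# Empirical error rate E1 and mean sequence length under w1 ~ N(0,1).
errors, lengths, trials = 0, [], 5000
for _ in range(trials):
    H, m = 0.0, 0
    while b < H < a:
        m += 1
        H += -2.0 * rng.normal(0.0, 1.0) + 2.0   # h(x) for a draw from w1
    errors += (H <= b)            # chose w2 although w1 is true
    lengths.append(m)
print("empirical:  E1 ~", errors / trials, " mean m ~", np.mean(lengths))
```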