ANLY-601
Advanced Pattern Recognition
Spring 2018
L02 – Bayes Classifier
Bayesian Decision Theory
Two classes ω1, ω2
Class priors P1, P2
Class-conditional densities p(x | ωi), also called likelihoods
Posteriors by Bayes rule:
P(ωi | x) = Pi p(x | ωi) / p(x)
where the unconditional density is
p(x) = P1 p(x | ω1) + P2 p(x | ω2)
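To make the rule concrete, here is a minimal sketch (not from the lecture; the two-Gaussian setup and all numbers are hypothetical) of computing posteriors from priors and likelihoods:

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posteriors(x, priors, likelihoods):
    """Bayes rule: P(wi | x) = Pi * p(x | wi) / p(x),
    where p(x) = sum_i Pi p(x | wi) normalizes the posteriors."""
    joint = [P * lik(x) for P, lik in zip(priors, likelihoods)]
    px = sum(joint)                      # unconditional density p(x)
    return [j / px for j in joint]

# Hypothetical classes: w1 ~ N(0, 1), w2 ~ N(2, 1), equal priors
priors = [0.5, 0.5]
likelihoods = [lambda x: gauss_pdf(x, 0.0, 1.0),
               lambda x: gauss_pdf(x, 2.0, 1.0)]

# x = 1 is equidistant from both means, so the posteriors are equal
post = posteriors(1.0, priors, likelihoods)
```

By symmetry of the chosen means, `post` comes out to [0.5, 0.5], and the two posteriors always sum to one.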
Posterior
The posterior probability
P(ωi | x) = Pi p(x | ωi) / p(x)
is proportional to the product
prior × likelihood.
The proportionality constant 1/p(x) ensures normalization: P(ω1 | x) + P(ω2 | x) = 1.
Bayes Decision Rule
It seems intuitive to choose the most likely class, given the feature measurement vector x and the class priors:
If P(ω1 | x) > P(ω2 | x), choose ω1; otherwise choose ω2.
Since p(x) is common to both posteriors, we don't need that factor:
If p(x | ω1) P1 > p(x | ω2) P2, choose ω1; otherwise choose ω2.
As we show next, this is the proper rule to use if we want to minimize the error rate.
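In code, dropping the common p(x) factor means comparing the products Pi p(x | ωi) directly (a sketch with a hypothetical two-Gaussian setup):

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_decide(x, priors, likelihoods):
    """Choose the class maximizing Pi * p(x | wi); the common factor
    1/p(x) does not change the argmax, so it is dropped."""
    scores = [P * lik(x) for P, lik in zip(priors, likelihoods)]
    return 1 if scores[0] > scores[1] else 2

# Hypothetical classes: w1 ~ N(0, 1), w2 ~ N(2, 1), equal priors
priors = [0.5, 0.5]
likelihoods = [lambda x: gauss_pdf(x, 0.0, 1.0),
               lambda x: gauss_pdf(x, 2.0, 1.0)]

label_a = bayes_decide(0.5, priors, likelihoods)  # nearer mean 0 -> class 1
label_b = bayes_decide(1.5, priors, likelihoods)  # nearer mean 2 -> class 2
```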
Bayes Error Rate
The Bayes decision rule induces a decision surface in the feature space, splitting it into a region L1 where we choose ω1 and a region L2 where we choose ω2.
[Figure: decision regions L1 and L2 in the feature space x, separated by the decision surface.]
Error Rate
The error rate for the feature vector x is
P(error | x) = P(ω2 | x) if we choose ω1, and P(ω1 | x) if we choose ω2,
or, in terms of the decision regions,
P(error | x) = P(ω2 | x) for x ∈ L1, and P(ω1 | x) for x ∈ L2.
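Under the Bayes rule the conditional error is simply the posterior of the class we did not choose, i.e. the smaller of the two posteriors (a sketch with hypothetical numbers):

```python
def conditional_error(posteriors):
    """P(error | x): choosing the class with the larger posterior means
    we are wrong with probability equal to the smaller posterior."""
    return min(posteriors)

err_at_x = conditional_error([0.8, 0.2])  # choose w1, so err with prob 0.2
```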
Minimum Error Rate
The total error rate is
P(error) = ∫ P(error | x) p(x) dx = ∫_{L2} P(ω1 | x) p(x) dx + ∫_{L1} P(ω2 | x) p(x) dx
which is minimized if
P(ω1 | x) > P(ω2 | x) for x ∈ L1, and P(ω2 | x) > P(ω1 | x) for x ∈ L2,
i.e., if each x is assigned to the class with the larger posterior.
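With Bayes-optimal regions the integrand becomes min(P1 p(x | ω1), P2 p(x | ω2)), so the total error can be checked numerically. A sketch for two hypothetical univariate Gaussians (trapezoid rule on a grid; not from the lecture):

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_error(p1, mu1, mu2, sigma, lo=-10.0, hi=10.0, n=20001):
    """Numerically integrate P(error) = ∫ min(P1 p(x|w1), P2 p(x|w2)) dx.
    The min() reflects the Bayes rule, which assigns each x to the
    class with the larger Pi p(x | wi)."""
    p2 = 1.0 - p1
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * dx
        f = min(p1 * gauss_pdf(x, mu1, sigma), p2 * gauss_pdf(x, mu2, sigma))
        w = 0.5 if k in (0, n - 1) else 1.0   # trapezoid endpoint weights
        total += w * f * dx
    return total

err = bayes_error(0.5, 0.0, 2.0, 1.0)
```

For equal priors and unit-variance Gaussians centered at 0 and 2, the decision boundary sits at x = 1 and the Bayes error equals Φ(−1) ≈ 0.1587, which the numerical integral reproduces.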
Bayes Minimal Error Rule
Decision rule: assign x to the class with the highest posterior:
choose ω1 if P(ω1 | x) > P(ω2 | x), and ω2 if P(ω1 | x) < P(ω2 | x).
In terms of likelihood ratios:
choose ω1 if l(x) ≡ p(x | ω1) / p(x | ω2) > P2 / P1 ≡ η (the threshold), and ω2 if l(x) < η.
Sometimes we use the log-likelihood ratio h(x) ≡ −log l(x); the minus sign flips the inequality:
choose ω2 if h(x) > log(P1 / P2), and ω1 if h(x) < log(P1 / P2).
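The three forms of the rule are algebraically equivalent, which a quick sketch can confirm (the two-Gaussian setup and priors below are hypothetical, not from the lecture):

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density N(mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class setup: w1 ~ N(0, 1), w2 ~ N(2, 1)
P1, P2 = 0.7, 0.3

def decide_posterior(x):
    # Posterior form: compare P1 p(x|w1) with P2 p(x|w2)
    return 1 if P1 * gauss_pdf(x, 0, 1) > P2 * gauss_pdf(x, 2, 1) else 2

def decide_llr(x):
    # Log-likelihood-ratio form: h(x) = -log l(x);
    # choose w2 when h(x) > log(P1/P2), else w1
    h = -(math.log(gauss_pdf(x, 0, 1)) - math.log(gauss_pdf(x, 2, 1)))
    return 2 if h > math.log(P1 / P2) else 1

# Both rules agree on a grid of test points
agree = all(decide_posterior(x) == decide_llr(x)
            for x in [i * 0.1 - 3 for i in range(61)])
```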
Bayes Decision Surface
e.g. Gaussian class-conditional densities
p(x | ωi) = (2π)^{-N/2} |Σi|^{-1/2} exp( -(1/2) (x - μi)^T Σi^{-1} (x - μi) )
with mean μi = E[x | ωi] and covariance Σi = cov[x | ωi] = E[(x - μi)(x - μi)^T | ωi].
Bayes Decision Surface for Gaussian Densities
The likelihood ratio is
l(x) = p(x | ω1) / p(x | ω2) = (|Σ2|^{1/2} / |Σ1|^{1/2}) exp( -(1/2)(x - μ1)^T Σ1^{-1}(x - μ1) + (1/2)(x - μ2)^T Σ2^{-1}(x - μ2) )
and its -log is
h(x) = -log l(x) = (1/2)(x - μ1)^T Σ1^{-1}(x - μ1) - (1/2)(x - μ2)^T Σ2^{-1}(x - μ2) + (1/2) log|Σ1| - (1/2) log|Σ2|
     = x^T M x + B^T x + C, a quadratic form in x.
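The Gaussian h(x) above can be checked directly against the log of the density ratio, since the (2π)^{-N/2} factors cancel. A sketch with hypothetical parameters:

```python
import numpy as np

def gauss_pdf(x, mu, S):
    # Multivariate Gaussian density N(mu, S) in N dimensions
    N = len(x)
    d = x - mu
    norm = np.sqrt((2 * np.pi) ** N * np.linalg.det(S))
    return np.exp(-0.5 * d @ np.linalg.inv(S) @ d) / norm

def h(x, mu1, S1, mu2, S2):
    """h(x) = -log l(x) for two Gaussian classes:
    (1/2)(x-mu1)^T S1^{-1} (x-mu1) - (1/2)(x-mu2)^T S2^{-1} (x-mu2)
    + (1/2) log|S1| - (1/2) log|S2|   (the (2*pi)^{-N/2} factors cancel)."""
    d1, d2 = x - mu1, x - mu2
    q1 = d1 @ np.linalg.inv(S1) @ d1
    q2 = d2 @ np.linalg.inv(S2) @ d2
    return 0.5 * (q1 - q2) + 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))

# Hypothetical parameters; verify h(x) == -log(p(x|w1)/p(x|w2))
mu1, S1 = np.array([0.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
mu2, S2 = np.array([1.0, 1.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([1.0, -0.5])
direct = -np.log(gauss_pdf(x, mu1, S1) / gauss_pdf(x, mu2, S2))
ok = abs(h(x, mu1, S1, mu2, S2) - direct) < 1e-10
```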
Bayes Decision Surface
Gaussian Class-Conditional Densities
Σ1 ≠ Σ2: the quadratic terms in h(x) survive, so the decision surface h(x) = log(P1/P2) is quadratic in x.
Σ1 = Σ2: the quadratic and log-determinant terms cancel, and the decision surface is linear in x.
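When Σ1 = Σ2 = Σ, expanding h(x) gives the linear form h(x) = w^T x + b with w = Σ^{-1}(μ2 − μ1) and b = (1/2)(μ1^T Σ^{-1} μ1 − μ2^T Σ^{-1} μ2). A sketch confirming the cancellation numerically (all parameter values hypothetical):

```python
import numpy as np

mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, 1.0])
S = np.array([[1.5, 0.4], [0.4, 1.0]])   # shared covariance
Sinv = np.linalg.inv(S)

def h(x):
    # log-determinant terms cancel when the covariances are equal
    d1, d2 = x - mu1, x - mu2
    return 0.5 * (d1 @ Sinv @ d1 - d2 @ Sinv @ d2)

# Closed-form linear coefficients predicted by expanding h(x)
w = Sinv @ (mu2 - mu1)
b = 0.5 * (mu1 @ Sinv @ mu1 - mu2 @ Sinv @ mu2)

x = np.array([0.7, -1.2])
match = abs(h(x) - (w @ x + b)) < 1e-12
```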
Generalized Cost
Suppose each of the two classification error types has a different cost. What's the ideal decision strategy?
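The slide poses the question; a standard answer (sketched here with illustrative costs, anticipating the development rather than quoting the slides) is to compare conditional risks, i.e. cost-weighted posteriors, rather than the posteriors themselves:

```python
def min_risk_decide(post1, post2, c21=1.0, c12=5.0):
    """Minimum-risk rule, assuming zero cost for correct decisions:
      risk(choose w1) = c12 * P(w2 | x)   # cost of calling it w1 when truth is w2
      risk(choose w2) = c21 * P(w1 | x)   # cost of calling it w2 when truth is w1
    Choose the decision with the smaller conditional risk."""
    return 1 if c12 * post2 < c21 * post1 else 2

# With a 5x cost for mistaking w2 as w1, even a likely w1 gets called w2
d_costly = min_risk_decide(0.7, 0.3)             # 5*0.3 = 1.5 > 0.7 -> choose 2
d_equal = min_risk_decide(0.7, 0.3, 1.0, 1.0)    # equal costs -> Bayes rule -> 1
```

With equal costs the rule reduces to the minimum-error Bayes rule; unequal costs shift the likelihood-ratio threshold.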