ANLY-601
Advanced Pattern Recognition
Spring 2018
L7 – Parametric Classifiers
Parametric Classifiers
We’ve been talking about Bayes optimal classifiers, built from the class-conditional densities $p(x\,|\,\omega_i)$ and the priors $P_i$.
We showed that whatever cost you want to minimize, Bayes optimal classifiers reduce to likelihood ratio tests
$$ l(x) \equiv \frac{p(x\,|\,\omega_1)}{p(x\,|\,\omega_2)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \tau $$
or, in terms of the negative log-likelihood ratio,
$$ h(x) \equiv -\ln l(x) = -\ln\frac{p(x\,|\,\omega_1)}{p(x\,|\,\omega_2)} \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; -\ln\tau $$
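As a concrete illustration of the test above, here is a minimal sketch in Python. The unit-covariance Gaussian class-conditional densities and all numerical values are assumptions chosen for the example; the test itself applies to any densities.

```python
import numpy as np

# Likelihood-ratio test sketch. The unit-covariance Gaussian class
# densities and all parameter values are illustrative assumptions.
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])  # class means

def log_p(x, m):
    # log of a unit-covariance Gaussian; the normalization constant
    # is omitted because it cancels in the likelihood ratio
    return -0.5 * np.sum((x - m) ** 2)

def h(x):
    # negative log-likelihood ratio h(x) = -ln[p(x|w1)/p(x|w2)]
    return -(log_p(x, m1) - log_p(x, m2))

tau = 1.0                                  # threshold on l(x)
x = np.array([0.5, 0.2])
choice = 1 if h(x) < -np.log(tau) else 2   # h < -ln(tau) -> choose w1
print(f"h(x) = {h(x):.3f} -> choose class {choice}")
```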
Parametric Classifiers
Let’s leave probabilities aside for the moment, and realize that these tests are asking us to construct a function f(x) that enables us to pick which class x belongs to – a discriminant function.
The discriminant function, along with the threshold, determines the boundary between the decision regions L1 and L2.
So let’s forget about probabilities and just try to model good discriminant functions. This is, in fact, easier. (Why?)
Parametric Discriminant Functions
Linear Discriminant
2-class parametric classifier:
$$ f(x;\theta) \;\underset{\text{choose }\omega_1}{\overset{\text{choose }\omega_2}{\gtrless}}\; 0 $$
where f(x; θ) is the discriminant function, and θ is a vector of parameters whose values fully specify f.
Linear example:
$$ f(x;\theta) = V^T x + v_0 $$
where θ consists of the vector V and the scalar $v_0$ (which is minus the threshold).
Once we’ve chosen to construct the linear model, the classifier design reduces to determining V and $v_0$.
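A minimal sketch of this linear decision rule; the particular V and $v_0$ below are placeholders, since choosing them is exactly the design problem discussed in the rest of the lecture.

```python
import numpy as np

# Linear discriminant f(x; theta) = V^T x + v0.
# V and v0 are placeholder values; picking them is the design problem.
V = np.array([1.0, -2.0])   # weight vector
v0 = 0.5                    # bias (minus the threshold)

def classify(x):
    # choose w1 if f(x) < 0, w2 if f(x) > 0
    f = V @ x + v0
    return 1 if f < 0 else 2

print(classify(np.array([0.0, 1.0])))   # f = -1.5 -> class 1
print(classify(np.array([2.0, 0.0])))   # f = +2.5 -> class 2
```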
Examples of Linear Classifiers
Inner-product, “correlation”, or “matched filter” classifiers take a prototype $M_i$ for each class and compare the feature vector x to each:
$$ M_1^T x \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; M_2^T x + C $$
Distance classifiers compare the squared (Euclidean in this example) distance between class prototypes and the feature vector:
$$ \|x - M_1\|^2 \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; \|x - M_2\|^2 + C $$
which reduces to
$$ (M_2 - M_1)^T x \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; \tfrac{1}{2}\left(M_2^T M_2 - M_1^T M_1 + C\right) $$
– again linear in x.
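A quick numerical check of this reduction; the prototypes $M_1$, $M_2$ and the constant C are made-up values.

```python
import numpy as np

# Check that the squared-distance rule reduces to a linear one.
# Prototypes M1, M2 and the offset C are made-up values.
M1, M2, C = np.array([0.0, 0.0]), np.array([3.0, 1.0]), 0.2

def quadratic_form(x):
    # ||x - M1||^2 - ||x - M2||^2 - C   (> 0 -> choose w2)
    return np.sum((x - M1) ** 2) - np.sum((x - M2) ** 2) - C

def linear_form(x):
    # 2 [ (M2 - M1)^T x - (M2^T M2 - M1^T M1 + C)/2 ] -- same sign
    return 2.0 * ((M2 - M1) @ x - (M2 @ M2 - M1 @ M1 + C) / 2.0)

x = np.random.default_rng(0).normal(size=2)
assert np.isclose(quadratic_form(x), linear_form(x))
```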
When is Linear Bayes Optimal?
• Gaussian x with equal class covariance matrices (the quadratic terms in the two log-densities cancel, leaving a function linear in x).
• Binary variables with a Bernoulli distribution:
$$ x \in \{0,1\}, \qquad p(x\,|\,\omega_i) = \pi_i^{\,x}(1-\pi_i)^{1-x}, $$
so $p(x{=}1\,|\,\omega_i) = \pi_i$ and $p(x{=}0\,|\,\omega_i) = 1-\pi_i$. The negative log-likelihood ratio is
$$ -\ln\frac{p(x\,|\,\omega_1)}{p(x\,|\,\omega_2)} = -\,x\ln\frac{\pi_1(1-\pi_2)}{\pi_2(1-\pi_1)} - \ln\frac{1-\pi_1}{1-\pi_2} $$
which is linear in x (for a vector of independent binary components it is a sum of such terms, hence still linear – see the sketch below).
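A small sketch of that vector Bernoulli case, assuming independent binary components; the $\pi$ values are illustrative, not from the lecture. The log-likelihood ratio collapses to a fixed weight vector plus a bias, i.e., a linear discriminant.

```python
import numpy as np

# Log-likelihood ratio for independent binary features collapses to a
# linear function w @ x + b. The pi values are illustrative assumptions.
pi1 = np.array([0.8, 0.3, 0.6])   # p(x_J = 1 | w1)
pi2 = np.array([0.2, 0.5, 0.4])   # p(x_J = 1 | w2)

w = np.log(pi1 * (1 - pi2) / (pi2 * (1 - pi1)))   # per-feature weights
b = np.sum(np.log((1 - pi1) / (1 - pi2)))         # constant term

def llr(x):
    # ln p(x|w1)/p(x|w2), linear in x
    return w @ x + b

# brute-force check against the product-form densities
x = np.array([1, 0, 1])
direct = np.sum(x * np.log(pi1 / pi2) + (1 - x) * np.log((1 - pi1) / (1 - pi2)))
assert np.isclose(llr(x), direct)
```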
When is Linear Bayes Optimal?
• Exponential distribution: x is an n-dimensional vector whose components are all non-negative, $x_J > 0$, $J = 1,\dots,n$ [x in $(\mathbb{R}^+)^n$]:
$$ p(x\,|\,\omega_i) = \prod_{J=1}^{n} \frac{1}{\lambda_{Ji}} \exp\left(-\frac{x_J}{\lambda_{Ji}}\right) $$
(Note there are 2n values of $\lambda_{Ji}$ for a 2-class problem.)
The negative log-likelihood ratio, $\sum_J \left(\tfrac{1}{\lambda_{J1}} - \tfrac{1}{\lambda_{J2}}\right) x_J + \sum_J \ln\tfrac{\lambda_{J1}}{\lambda_{J2}}$, is again linear in x.
Linear Classifier Design
• Decision rule
$$ h(x) = V^T x + v_0 \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; 0 $$
is optimized by adjusting V and $v_0$. The goal is to adjust these to minimize the classification error rate.
• The perceptron learning algorithm directly minimizes classification error, but it converges only if the error can be reduced to zero on the training data, i.e., only if the classes are linearly separable (a sketch follows below).
• Usually we optimize by minimizing (or maximizing) some criterion function that is intuitively related to performance.
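For reference, a minimal sketch of the classical perceptron update (not necessarily the exact variant used in this course): nudge $(V, v_0)$ after each mistake and stop once the training error reaches zero.

```python
import numpy as np

# Classical perceptron sketch: update (V, v0) after each misclassified
# sample. Converges only if the training data are linearly separable.
def perceptron(X, y, epochs=100, lr=1.0):
    # X: (N, d) features; y: labels in {-1, +1} (-1 for w1, +1 for w2)
    V, v0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            if t * (V @ x + v0) <= 0:   # wrong side of the boundary
                V += lr * t * x         # move the boundary toward x
                v0 += lr * t
                errors += 1
        if errors == 0:                 # zero training error: done
            break
    return V, v0
```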
Criterion Functions
We’ll look at several, but first some notation
$$ \eta_i \equiv E[\,h(x)\,|\,\omega_i\,] = E[\,V^T x + v_0\,|\,\omega_i\,] = V^T m_i + v_0 $$
$$ \sigma_i^2 \equiv \mathrm{var}[\,h(x)\,|\,\omega_i\,] = \mathrm{var}[\,V^T x + v_0\,|\,\omega_i\,] = V^T \Sigma_i V $$
where $m_i$ and $\Sigma_i$ are the mean and covariance of x in class $\omega_i$.
Criterion Functions
Fisher criterion:
$$ f = \frac{\left[V^T(m_2 - m_1)\right]^2}{V^T\Sigma_1 V + V^T\Sigma_2 V} = \frac{(\eta_2 - \eta_1)^2}{\sigma_1^2 + \sigma_2^2} $$
The numerator is a measure of the separation of the class means along V. The denominator is the sum of the two class variances along V.
Maximizing f with respect to V (it’s independent of $v_0$) gives
$$ V = (\Sigma_1 + \Sigma_2)^{-1}(m_2 - m_1) $$
up to an arbitrary positive scale, since f is unchanged when V is rescaled.
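A short sketch of computing this direction from class statistics; the means and covariances below are made-up values standing in for sample estimates.

```python
import numpy as np

# Fisher direction V = (S1 + S2)^{-1} (m2 - m1), computed from class
# statistics. The values here are made up, standing in for estimates.
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[1.5, -0.1], [-0.1, 0.5]])

V = np.linalg.solve(S1 + S2, m2 - m1)   # solve, rather than invert

# Fisher criterion at V; any positive rescaling of V gives the same f
f = (V @ (m2 - m1)) ** 2 / (V @ S1 @ V + V @ S2 @ V)
print(V, f)
```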
Criterion Functions
Prior-weighted discriminant:
$$ f = \frac{P_1\eta_1^2 + P_2\eta_2^2}{P_1\sigma_1^2 + P_2\sigma_2^2} $$
Setting the derivatives of f with respect to V and $v_0$ to zero gives
$$ V = (P_1\Sigma_1 + P_2\Sigma_2)^{-1}(m_2 - m_1), \qquad v_0 = -V^T(P_1 m_1 + P_2 m_2) $$
Criterion Functions
Mean-squared error
Define g to be the class indicator function
$$ g(x) = \begin{cases} -1, & x \in \omega_1 \\ +1, & x \in \omega_2 \end{cases} $$
and take the criterion function to be the mean squared difference between g and the discriminant function h:
$$ \bar\varepsilon^2 = E\left[\left(h(x) - g(x)\right)^2\right] = P_1\,E\left[(h(x)+1)^2 \,|\, \omega_1\right] + P_2\,E\left[(h(x)-1)^2 \,|\, \omega_2\right] $$
$$ = P_1\left(\mathrm{var}[h(x)|\omega_1] + (\eta_1+1)^2\right) + P_2\left(\mathrm{var}[h(x)|\omega_2] + (\eta_2-1)^2\right) $$
$$ = P_1\sigma_1^2 + P_2\sigma_2^2 + P_1(\eta_1+1)^2 + P_2(\eta_2-1)^2 $$
Criterion Functions
Mean Square Error (cont’d)
We have
$$ \bar\varepsilon^2 = E\left[\left(h(x) - g(x)\right)^2\right] = P_1\left[\sigma_1^2 + (\eta_1+1)^2\right] + P_2\left[\sigma_2^2 + (\eta_2-1)^2\right] $$
Using the expressions above for the conditional mean and variance of h(x), and minimizing $\bar\varepsilon^2$ with respect to V and $v_0$, gives
$$ V = (P_1\Sigma_1 + P_2\Sigma_2)^{-1}(m_2 - m_1) $$
(up to a positive scale factor) and an expression for $v_0$ that I leave to you to derive.
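A numerical sanity check of this result on assumed synthetic Gaussian data: the direction of a plain least-squares fit of $(V, v_0)$ to the ±1 targets should match $(P_1\Sigma_1 + P_2\Sigma_2)^{-1}(m_2 - m_1)$. The rank-one term $P_1P_2(m_2-m_1)(m_2-m_1)^T$ hidden in the mixture covariance only rescales V along that same direction.

```python
import numpy as np

# Least squares on [x, 1] vs. the closed-form direction. Data synthetic.
rng = np.random.default_rng(1)
n1, n2 = 4000, 6000                       # implies P1 = 0.4, P2 = 0.6
X1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], n1)  # w1
X2 = rng.multivariate_normal([2, 1], [[1, 0.0], [0.0, 2]], n2)  # w2
X = np.vstack([X1, X2])
g = np.concatenate([-np.ones(n1), np.ones(n2)])  # targets -1 / +1

A = np.hstack([X, np.ones((len(X), 1))])         # columns [x, 1]
sol, *_ = np.linalg.lstsq(A, g, rcond=None)      # minimizes mean (h-g)^2
V_ls = sol[:2]

P1, P2 = n1 / (n1 + n2), n2 / (n1 + n2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1, S2 = np.cov(X1.T, bias=True), np.cov(X2.T, bias=True)
V_cf = np.linalg.solve(P1 * S1 + P2 * S2, m2 - m1)

print(V_ls / np.linalg.norm(V_ls))   # the two unit vectors agree
print(V_cf / np.linalg.norm(V_cf))
```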
Design from Data
How do you design classifiers from (class-labeled)
data?
– Use training data to estimate priors, means,
covariance matrices.
– Form discriminant from parameters.
– Exercise the classifier on test data to estimate out-of-sample performance (a sketch of the whole recipe follows below).
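A self-contained sketch of this recipe on synthetic Gaussian data; the distributions, sample sizes, and the prior-weighted/MSE form of the discriminant are assumptions chosen for illustration.

```python
import numpy as np

# Design-from-data sketch: estimate priors, means, covariances on
# training data, form the linear discriminant, test on held-out data.
rng = np.random.default_rng(7)

def sample(n):
    # synthetic 2-class Gaussian data, for illustration only
    X1 = rng.multivariate_normal([0, 0], np.eye(2), n)   # class 1
    X2 = rng.multivariate_normal([2, 1], np.eye(2), n)   # class 2
    return np.vstack([X1, X2]), np.concatenate([np.ones(n), 2 * np.ones(n)])

Xtr, ytr = sample(500)
Xte, yte = sample(500)

# estimate priors, class means, and class covariances from training data
P = np.array([np.mean(ytr == c) for c in (1, 2)])
m = [Xtr[ytr == c].mean(axis=0) for c in (1, 2)]
S = [np.cov(Xtr[ytr == c].T) for c in (1, 2)]

# form the discriminant using the prior-weighted/MSE solution above
V = np.linalg.solve(P[0] * S[0] + P[1] * S[1], m[1] - m[0])
v0 = -V @ (P[0] * m[0] + P[1] * m[1])

# exercise on test data: h(x) < 0 -> class 1, h(x) > 0 -> class 2
pred = np.where(Xte @ V + v0 < 0, 1, 2)
print("estimated out-of-sample error rate:", np.mean(pred != yte))
```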