
ANLY-601
Advanced Pattern Recognition

Spring 2018

L7 – Parametric Classifiers

Parametric Classifiers

We’ve been talking about Bayes-optimal classifiers, built from the class-conditional densities p(x | ωᵢ) and the prior probabilities Pᵢ.

We showed that whatever cost you want to minimize, Bayes
optimal classifiers reduce to likelihood ratio tests

$$
l(x) \;=\; \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;\;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\;\; \text{threshold}
$$

or, in terms of the negative log-likelihood ratio,

$$
h(x) \;=\; -\ln l(x) \;=\; -\ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;\;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\;\; \text{threshold}
$$

Parametric Classifiers

Let’s leave probabilities aside for the moment and recognize that these tests ask us to construct a function f(x) that enables us to pick which class x belongs to: a discriminant function.

The discriminant function, along with the threshold, determines the boundary between the decision regions L1 and L2.

So let’s forget about probabilities and just try to model
good discriminant functions. This is, in fact, easier.
(Why?)


Parametric Discriminant Functions

Linear Discriminant

2-class parametric classifier

$$
f(x;\theta)
\;\begin{cases}
< 0 & \Rightarrow \text{choose } \omega_1 \\[2pt]
> 0 & \Rightarrow \text{choose } \omega_2
\end{cases}
$$

where f(x; θ) is the discriminant function and θ is a vector of parameters whose values fully specify f.

Linear example

$$
f(x;\theta) \;=\; V^{T}x + v_0 \;\;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\;\; 0
$$

where θ consists of the vector V and the scalar v₀ (which is minus the threshold).

Once we’ve chosen to construct the linear model, the classifier design reduces to determining V and v₀.
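To make the linear rule concrete, here is a minimal sketch in Python/NumPy (illustrative, not from the slides; the particular V, v₀, and the helper names are made up):

```python
import numpy as np

def linear_discriminant(x, V, v0):
    """Evaluate the linear discriminant f(x; theta) = V^T x + v0."""
    return V @ x + v0

def classify(x, V, v0):
    """Choose class 1 when f(x) < 0, class 2 when f(x) > 0
    (the sign convention used for h(x) in these notes)."""
    return 1 if linear_discriminant(x, V, v0) < 0 else 2

# toy example: V and v0 chosen by hand
V = np.array([1.0, -2.0])
v0 = 0.5
print(classify(np.array([0.2, 1.0]), V, v0))   # f = 0.2 - 2.0 + 0.5 = -1.3 -> class 1
```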

Examples of Linear Classifiers

Inner-product, “correlation”, or “matched filter” classifiers take a prototype Mᵢ for each class and compare the feature vector x to each:

$$
M_1^{T}x \;-\; M_2^{T}x \;\;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\;\; C
$$

Distance classifiers compare the squared (Euclidean, in this example) distance between each class prototype and the feature vector:

$$
\|x - M_1\|^{2} \;-\; \|x - M_2\|^{2} \;\;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\;\; C
$$

which reduces to

$$
M_1^{T}x \;-\; M_2^{T}x \;\;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\;\; \tfrac{1}{2}\!\left(M_1^{T}M_1 - M_2^{T}M_2 - C\right),
$$

again a linear discriminant, with V = M₁ - M₂.
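A quick numerical check of the reduction above (a sketch, not from the slides; the prototypes M1, M2 and the threshold C are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
M1, M2, C = np.array([1.0, 0.0]), np.array([-1.0, 2.0]), 0.3

def distance_rule(x):
    # choose class 1 when x is closer to M1 (up to the threshold C)
    return 1 if np.sum((x - M1)**2) - np.sum((x - M2)**2) < C else 2

def linear_rule(x):
    # the equivalent linear form: M1^T x - M2^T x  vs  (M1^T M1 - M2^T M2 - C) / 2
    return 1 if (M1 - M2) @ x > (M1 @ M1 - M2 @ M2 - C) / 2 else 2

# the two rules agree on random points
xs = rng.normal(size=(1000, 2))
print(all(distance_rule(x) == linear_rule(x) for x in xs))   # True
```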

When is Linear Bayes Optimal?

• Gaussian x with equal class covariance matrices (worked out below).

• Binary variables with Bernoulli distributions: the components of x are x_J ∈ {0,1}, J = 1…n, with

$$
p(x \mid \omega_i) \;=\; \prod_{J=1}^{n} \alpha_{iJ}^{\,x_J}\,(1-\alpha_{iJ})^{\,1-x_J},
\qquad
p(x_J = 1 \mid \omega_i) = \alpha_{iJ}, \quad p(x_J = 0 \mid \omega_i) = 1 - \alpha_{iJ}.
$$

The negative log-likelihood ratio is

$$
-\ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)}
\;=\; -\sum_{J=1}^{n}\left[\, x_J \ln\frac{\alpha_{1J}}{\alpha_{2J}} \;+\; (1 - x_J)\,\ln\frac{1-\alpha_{1J}}{1-\alpha_{2J}} \,\right],
$$

which is linear in x.
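For the Gaussian bullet above, here is the standard calculation (a reconstruction, not reproduced on the slide) showing why equal class covariances make the Bayes rule linear. Assuming p(x | ωᵢ) = N(mᵢ, Σ) with a shared covariance Σ:

$$
\begin{aligned}
h(x) &= -\ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)}
      = \tfrac{1}{2}(x - m_1)^{T}\Sigma^{-1}(x - m_1)
      - \tfrac{1}{2}(x - m_2)^{T}\Sigma^{-1}(x - m_2) \\
     &= (m_2 - m_1)^{T}\Sigma^{-1} x
      + \tfrac{1}{2}\left(m_1^{T}\Sigma^{-1} m_1 - m_2^{T}\Sigma^{-1} m_2\right),
\end{aligned}
$$

which has the form VᵀX + v₀ with V = Σ⁻¹(m₂ - m₁).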

When is Linear Bayes Optimal?

• Exponential distributions: x is an n-dimensional vector whose components are all non-negative, x_J > 0, J = 1…n [i.e., x ∈ (R⁺)ⁿ], with

$$
p(x \mid \omega_i) \;=\; \prod_{J=1}^{n} \frac{1}{\lambda_{iJ}} \exp\!\left(-\frac{x_J}{\lambda_{iJ}}\right).
$$

(Note there are 2n values of λ_{iJ} for a 2-class problem.)
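The slide leaves the corresponding test implicit; writing out the negative log-likelihood ratio for this case (standard algebra, not on the original slide):

$$
-\ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)}
\;=\; \sum_{J=1}^{n}\left(\frac{1}{\lambda_{1J}} - \frac{1}{\lambda_{2J}}\right) x_J
\;+\; \sum_{J=1}^{n} \ln\frac{\lambda_{1J}}{\lambda_{2J}},
$$

which is again linear in x.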

Linear Classifier Design

• Decision rule

$$
h(x) \;=\; V^{T}x + v_0 \;\;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\;\; 0
$$

is optimized by adjusting V and v₀. The goal is to adjust them to minimize the classification error rate.

• The perceptron learning algorithm directly minimizes classification error, but it converges only if the error can be reduced to zero (on the training data).

• Usually we optimize by minimizing (or maximizing) some criterion function that is intuitively related to performance.
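A minimal perceptron sketch (illustrative, not from the slides), using the convention above that h(x) < 0 means class ω₁ and labeling ω₁ as -1, ω₂ as +1:

```python
import numpy as np

def perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron learning on labels y in {-1, +1}.
    Updates (V, v0) whenever a training point is misclassified;
    converges only if the data are linearly separable."""
    n, d = X.shape
    V, v0 = np.zeros(d), 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            if np.sign(V @ x + v0) != t:          # misclassified
                V  += lr * t * x                  # push h(x) toward the correct sign
                v0 += lr * t
                errors += 1
        if errors == 0:                           # zero training error -> stop
            break
    return V, v0

# separable toy data: class -1 near (-2, 0), class +1 near (+2, 0)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2, 0], 0.5, (50, 2)),
               rng.normal([+2, 0], 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
V, v0 = perceptron(X, y)
print(np.mean(np.sign(X @ V + v0) == y))          # training accuracy, should be 1.0
```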

Criterion Functions

We’ll look at several criteria, but first some notation:

$$
E[\,h(x) \mid \omega_i\,] \;=\; E[\,V^{T}x + v_0 \mid \omega_i\,] \;=\; V^{T}m_i + v_0 \;\equiv\; \eta_i
$$

$$
\operatorname{var}[\,h(x) \mid \omega_i\,] \;=\; \operatorname{var}[\,V^{T}x + v_0 \mid \omega_i\,] \;=\; V^{T}\operatorname{var}[\,x \mid \omega_i\,]\,V \;=\; V^{T}\Sigma_i V \;\equiv\; \sigma_i^{2}
$$

where mᵢ and Σᵢ are the mean and covariance of x in class ωᵢ.
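These projected statistics are easy to estimate from labeled samples (a sketch; `X_i` denotes the samples of class ωᵢ, one feature vector per row):

```python
import numpy as np

def projected_stats(X_i, V, v0):
    """Estimate eta_i = E[h | omega_i] and sigma_i^2 = var[h | omega_i]
    from the samples X_i of one class."""
    h = X_i @ V + v0
    return h.mean(), h.var()
```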

Criterion Functions

Fisher criterion

$$
f \;=\; \frac{(\eta_2 - \eta_1)^{2}}{\sigma_1^{2} + \sigma_2^{2}}
\;=\; \frac{\left[\,V^{T}(m_2 - m_1)\,\right]^{2}}{V^{T}\left(\Sigma_1 + \Sigma_2\right)V}
$$

The numerator is a measure of the separation of the class means along V. The denominator is the sum of the two class variances along V.

Maximizing f with respect to V (it is independent of v₀) gives

$$
V \;=\; \left(\Sigma_1 + \Sigma_2\right)^{-1}(m_2 - m_1)
$$
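A small NumPy sketch of the Fisher solution (illustrative; the sample matrices `X1`, `X2` and the toy data are made up):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher direction V = (Sigma1 + Sigma2)^{-1} (m2 - m1),
    with class means and covariances estimated from the samples."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    return np.linalg.solve(S1 + S2, m2 - m1)

# toy data: two Gaussian clouds
rng = np.random.default_rng(2)
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], 200)
X2 = rng.multivariate_normal([2, 1], [[1.0, 0.3], [0.3, 1.0]], 200)
print(fisher_direction(X1, X2))
```

Note that the criterion fixes only the direction of V; any positive rescaling of V (with a matching threshold) gives the same classifier.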

Criterion Functions

Prior-weighted discriminant

$$
f \;=\; \frac{P_1\,\eta_1^{2} + P_2\,\eta_2^{2}}{P_1\,\sigma_1^{2} + P_2\,\sigma_2^{2}}
$$

Maximizing this with respect to V and v₀ gives

$$
V \;=\; \left(P_1\Sigma_1 + P_2\Sigma_2\right)^{-1}(m_2 - m_1),
\qquad
v_0 \;=\; -\,V^{T}\left(P_1 m_1 + P_2 m_2\right)
$$
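In code, the closed-form solution is straightforward (a sketch; the inputs are the priors, class means, and class covariances, however you estimate them):

```python
import numpy as np

def prior_weighted_solution(P1, P2, m1, m2, S1, S2):
    """V = (P1*S1 + P2*S2)^{-1} (m2 - m1),  v0 = -V^T (P1*m1 + P2*m2)."""
    V = np.linalg.solve(P1 * S1 + P2 * S2, m2 - m1)
    v0 = -V @ (P1 * m1 + P2 * m2)
    return V, v0
```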

Criterion Functions

Mean-squared error

Define γ(x) to be the class indicator function

$$
\gamma(x) \;=\;
\begin{cases}
-1, & x \in \omega_1 \\[2pt]
+1, & x \in \omega_2
\end{cases}
$$

and take the criterion function to be the mean-squared difference between γ and the discriminant function h:

$$
\begin{aligned}
\bar{\varepsilon}^{2}
&= E\!\left[\left(h(x) - \gamma(x)\right)^{2}\right]
 \;=\; E\!\left[h^{2}(x)\right] - 2\,E\!\left[h(x)\gamma(x)\right] + E\!\left[\gamma^{2}(x)\right] \\[4pt]
&= P_1\,E\!\left[h^{2}(x)\mid\omega_1\right] + P_2\,E\!\left[h^{2}(x)\mid\omega_2\right]
   \;-\; 2\left\{P_1\,E\!\left[h(x)\gamma(x)\mid\omega_1\right] + P_2\,E\!\left[h(x)\gamma(x)\mid\omega_2\right]\right\} \;+\; 1 \\[4pt]
&= P_1\left\{\operatorname{var}\!\left(h(x)\mid\omega_1\right) + E\!\left[h(x)\mid\omega_1\right]^{2}\right\}
 + P_2\left\{\operatorname{var}\!\left(h(x)\mid\omega_2\right) + E\!\left[h(x)\mid\omega_2\right]^{2}\right\}
 \;+\; 2\left\{P_1\,E\!\left[h(x)\mid\omega_1\right] - P_2\,E\!\left[h(x)\mid\omega_2\right]\right\} \;+\; 1
\end{aligned}
$$

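A quick Monte Carlo check of this decomposition (a sketch; the Gaussian mixture and the particular V, v₀ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
P1, P2 = 0.4, 0.6
m1, m2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
V, v0 = np.array([0.8, -0.3]), -0.5

# sample the two classes in proportion to the priors
n = 200_000
n1 = int(P1 * n)
X1 = rng.multivariate_normal(m1, np.eye(2), n1)
X2 = rng.multivariate_normal(m2, np.eye(2), n - n1)

h1, h2 = X1 @ V + v0, X2 @ V + v0
gamma = np.concatenate([-np.ones(n1), np.ones(n - n1)])   # -1 on class 1, +1 on class 2
h = np.concatenate([h1, h2])

direct = np.mean((h - gamma) ** 2)
decomposed = (P1 * (h1.var() + h1.mean() ** 2)
              + P2 * (h2.var() + h2.mean() ** 2)
              + 2 * (P1 * h1.mean() - P2 * h2.mean()) + 1)
print(direct, decomposed)   # the two agree up to Monte Carlo error
```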

Criterion Functions

Mean Square Error (cont’d)

We have

$$
\bar{\varepsilon}^{2}
\;=\; E\!\left[\left(h(x) - \gamma(x)\right)^{2}\right]
\;=\; P_1\!\left(\sigma_1^{2} + \eta_1^{2}\right) + P_2\!\left(\sigma_2^{2} + \eta_2^{2}\right)
\;+\; 2\left(P_1\eta_1 - P_2\eta_2\right) \;+\; 1
$$

Using the earlier expressions for the conditional mean and variance of h(x), and minimizing this with respect to V and v₀, gives

$$
V \;=\; \left(P_1\Sigma_1 + P_2\Sigma_2\right)^{-1}(m_2 - m_1)
$$

and an expression for v₀ that I leave to you to derive.

Design from Data

How do you design classifiers from (class-labeled) data?

– Use training data to estimate priors, means, and covariance matrices.

– Form the discriminant from those parameter estimates.

– Exercise it on test data to estimate out-of-sample performance (a short end-to-end sketch follows).

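Putting the three steps together (a sketch, not from the slides; it reuses the prior-weighted solution from earlier with synthetic data and a simple train/test split):

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_linear(X1, X2):
    """Steps 1-2: estimate priors, means, covariances from training data
    and form the linear discriminant h(x) = V^T x + v0."""
    n1, n2 = len(X1), len(X2)
    P1, P2 = n1 / (n1 + n2), n2 / (n1 + n2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    V = np.linalg.solve(P1 * S1 + P2 * S2, m2 - m1)
    v0 = -V @ (P1 * m1 + P2 * m2)
    return V, v0

def error_rate(V, v0, X, labels):
    """Step 3: exercise the classifier on held-out data (labels are 1 or 2)."""
    pred = np.where(X @ V + v0 < 0, 1, 2)
    return np.mean(pred != labels)

# synthetic class-labeled data, split into train and test halves
X1 = rng.multivariate_normal([0, 0], np.eye(2), 400)
X2 = rng.multivariate_normal([2, 1], np.eye(2), 600)
V, v0 = fit_linear(X1[:200], X2[:300])
X_test = np.vstack([X1[200:], X2[300:]])
y_test = np.array([1] * 200 + [2] * 300)
print("test error rate:", error_rate(V, v0, X_test, y_test))
```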