
L09_Stats_B.ppt


Classification (II)

• Feature Transformation
– Generates a feature vector y from the observations x
– y is usually of lower dimension than x

• Classifier
– Partitions the feature space into different regions

[Diagram: Observations x → Feature Transformation → Feature Vector y → Classifier → Output]

Example

y1 = x1 + x2

[Figure: Cancerous and Normal cells plotted in the (x1, x2) plane, with y1 increasing along the projection direction; the class histograms N(y1) for Normal and Cancerous overlap, so the decision boundary is uncertain ("Decision Boundary?")]

Misclassification

• Impossible to completely separate classes
• Some will always be misclassified
• Good classifier will make fewest mistakes
• Need probability theory to analyse performance

[Figure: class histograms N(y1) for Normal and Cancerous cells, overlapping around the uncertain decision boundary]

Statistical Decisions
[Figure: class histograms N(y1) for Class 1 and Class 2, and the corresponding class-conditional densities p(y|class 1) and p(y|class 2), with the decision boundary to be chosen]

Decision Rules

• Best decision rule should make fewest mistakes
• Need to quantify probability of error
• Optimal decision rule is one which minimises the probability of error

Likelihood Ratio Test

• Classify y by choosing the class, ci, which has the highest conditional probability, P(ci|y)

P(ci|y) = p(y|ci) P(ci) / p(y)
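This rule follows from Bayes' theorem. As a minimal sketch (not from the slides; the two Gaussian likelihoods and equal priors are assumed for illustration):

```python
import numpy as np

def posteriors(y, likelihoods, priors):
    """Bayes' rule: P(ci|y) = p(y|ci) P(ci) / p(y)."""
    joint = np.array([p(y) * P for p, P in zip(likelihoods, priors)])
    return joint / joint.sum()  # p(y) = sum_i p(y|ci) P(ci)

# Assumed example: two unit-variance Gaussian classes with means 3 and 5
gauss = lambda mu: (lambda y: np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi))
print(posteriors(4.2, [gauss(3), gauss(5)], [0.5, 0.5]))  # -> [~0.40, ~0.60]
```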


Likelihood Ratio Test

• For two classes we have:
– Choose class 1 if

p(y|c1) P(c1) / p(y) > p(y|c2) P(c2) / p(y)

– Choose class 1 if

L(y) = p(y|c1) / p(y|c2) > P(c2) / P(c1)

where L(y) is called the likelihood ratio

Example

• Suppose we wish to decide if a cell is cancerous by measuring how red (r) it is.
– Cancerous cells have

p(r|c) = (1/√(2π)) exp(−0.5 (r − 5)²)

– Normal cells have

p(r|n) = (1/√(2π)) exp(−0.5 (r − 3)²)

– If cancerous cells and normal cells are equally likely, what is the best classification of a cell with redness r?

Example (cont)

• The likelihood ratio is

L(r) = exp(−0.5 (r − 5)²) / exp(−0.5 (r − 3)²)

• Cell is cancerous if

L(r) > P(n)/P(c) = 1

exp(−0.5 (r − 5)²) / exp(−0.5 (r − 3)²) > 1

Taking logs:

−0.5 (r − 5)² + 0.5 (r − 3)² > 0

4r − 16 > 0

r > 4
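A quick numerical check of this threshold (a sketch; the grid and variable names are ours, not the slides'):

```python
import numpy as np

r = np.linspace(0, 9, 901)
p_c = np.exp(-0.5 * (r - 5) ** 2) / np.sqrt(2 * np.pi)  # p(r|c)
p_n = np.exp(-0.5 * (r - 3) ** 2) / np.sqrt(2 * np.pi)  # p(r|n)

# With equal priors, classify as cancerous where L(r) = p_c / p_n > 1
print(r[p_c / p_n > 1].min())  # ~4.01, matching the analytic result r > 4
```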

Example

• This is a special case (both distributions have equal variance)

[Figure: p(r|n)P(n) and p(r|c)P(c) plotted against r; with equal variances there is a single decision threshold, at r = 4]

Differing variances

[Figure: p(r|n)P(n) and p(r|c)P(c) with differing variances; the cancerous class density exceeds the normal one only on an interval, flanked by normal regions on both sides]

−0.5 (r − µc)²/σc² + 0.5 (r − µn)²/σn² > 0

which is a quadratic inequality in r (the constant from the differing normalisations is absorbed into c):

ar² + br + c > 0

(Gives two thresholds in general)
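A sketch of how the two thresholds could be found numerically (the parameter values are made up for illustration, assuming equal priors):

```python
import numpy as np

# Assumed example: normal ~ N(3, 1^2), cancerous ~ N(5, 2^2)
mu_n, s_n, mu_c, s_c = 3.0, 1.0, 5.0, 2.0

# log p(r|c) - log p(r|n) > 0 expands to a*r^2 + b*r + c > 0
a = 0.5 / s_n**2 - 0.5 / s_c**2
b = mu_c / s_c**2 - mu_n / s_n**2
c = 0.5 * mu_n**2 / s_n**2 - 0.5 * mu_c**2 / s_c**2 + np.log(s_n / s_c)
print(np.roots([a, b, c]))  # two decision thresholds, here ~4.24 and ~0.43
```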

1D Classifier

• Given examples {ai} from class A, {bi} from class B.
• Estimate distributions p(x|A), p(x|B)
– For a normal pdf, compute the mean and variance
• Select priors P(A), P(B).
• To classify a new example x:
– Select class A if p(x|A)P(A) > p(x|B)P(B) (see the sketch below)
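A minimal sketch of this recipe (function names and the synthetic data are ours; assumes 1D numpy arrays of training examples):

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate the mean and variance of p(x|class)."""
    return samples.mean(), samples.var()

def classify(x, pa, pb, P_A=0.5, P_B=0.5):
    """Select class A if p(x|A)P(A) > p(x|B)P(B), else class B."""
    gauss = lambda x, mu, var: np.exp(-0.5 * (x - mu)**2 / var) / np.sqrt(2 * np.pi * var)
    return 'A' if gauss(x, *pa) * P_A > gauss(x, *pb) * P_B else 'B'

rng = np.random.default_rng(0)
a = rng.normal(3, 1, 100)  # examples {ai} from class A
b = rng.normal(5, 1, 100)  # examples {bi} from class B
print(classify(4.5, fit_gaussian(a), fit_gaussian(b)))  # -> 'B'
```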


Modifying the threshold

• If t = P(B)/P(A) we make fewest errors
• If t < P(B)/P(A) we classify more A correctly, but make more mistakes on B
• If t > P(B)/P(A) we classify more B correctly, but make more mistakes on A

Choose class A if L(y) = p(y|A)/p(y|B) > t

ROC Curves

• “Receiver Operating Characteristic”
• Summarises performance of a classifier as the threshold is changed
• Plot true positives (A's correctly classified) against false positives (B's misclassified as A) for different thresholds.
• Allows choice of threshold to achieve a particular error performance (see the sketch below)
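A sketch of how an ROC curve can be traced by sweeping the threshold over the classifier's scores (names and the synthetic scores are illustrative):

```python
import numpy as np

def roc_points(scores_a, scores_b):
    """TPR/FPR pairs as the threshold t sweeps over all observed scores.

    scores_a: L(y) for true class-A examples; scores_b: for class-B examples."""
    ts = np.sort(np.concatenate([scores_a, scores_b]))
    tpr = np.array([(scores_a > t).mean() for t in ts])  # A's correctly classified
    fpr = np.array([(scores_b > t).mean() for t in ts])  # B's misclassified as A
    return fpr, tpr

rng = np.random.default_rng(1)
fpr, tpr = roc_points(rng.normal(1, 1, 200), rng.normal(-1, 1, 200))
```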

ROC Curves

• True positives (A's correctly classified)
• False positives (B's misclassified as A)

[Figure: ROC curve of true +ives (0 to 1) against false +ives (0 to 1); the diagonal corresponds to a random classifier, and the equal error point is marked]

ROC Curves

• The closer the curve is to the top-left, the better

[Figure: ROC curve lying close to the top-left corner of the true +ives vs false +ives plot]

Multi-variate Distributions

• PDFs extend to n dimensions
• In 1D: P(x ∈ [x, x + dx]) ≈ p(x) dx
• In 2D: P(x ∈ [x, x + dx], y ∈ [y, y + dy]) ≈ p(x, y) dx dy

[Figure: equal probability contours of p(x, y) in the (x, y) plane; the probability of falling in a small dx × dy box is p(x, y) dx dy]

Multivariate Normal PDF

• In n dimensions, the normal distribution with mean m and covariance S has pdf:

p(x : m, S) = c exp(−0.5 M)

where

M = (x − m)ᵀ S⁻¹ (x − m)

c = 1 / ((2π)^(n/2) |S|^(1/2))

• The covariance of N samples is

S = (1/N) Σ_i (x_i − m)(x_i − m)ᵀ
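A direct numpy transcription of these formulas (a sketch; names follow the slide):

```python
import numpy as np

def normal_pdf(x, m, S):
    """p(x : m, S) = c * exp(-0.5 * M) with M = (x-m)^T S^-1 (x-m)."""
    n = len(m)
    c = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(S)))
    d = x - m
    M = d @ np.linalg.solve(S, d)  # avoids forming S^-1 explicitly
    return c * np.exp(-0.5 * M)

def covariance(X):
    """S = (1/N) sum_i (x_i - m)(x_i - m)^T, for samples x_i in the rows of X."""
    d = X - X.mean(axis=0)
    return d.T @ d / len(X)
```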


Quadratic Classifiers

• Suppose we have training vectors from several different classes

• For each class, compute the mean and covariance to generate a normal distribution, p(x|ci)

• To classify a new vector x, choose the class which maximises p(x|ci) P(ci) (see the sketch below)
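A sketch of the resulting classifier, reusing normal_pdf and covariance from the block above (the fit_class helper is ours):

```python
import numpy as np

def fit_class(X):
    """Mean and covariance of one class's training vectors (rows of X)."""
    return X.mean(axis=0), covariance(X)

def classify(x, models, priors):
    """Choose the class ci which maximises p(x|ci) P(ci)."""
    scores = [normal_pdf(x, m, S) * P for (m, S), P in zip(models, priors)]
    return int(np.argmax(scores))
```

The decision boundaries between classes are quadratic surfaces in x, hence the name.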

Nearest Neighbour Classifiers

• Useful non-linear classifier
• Retain all of the training set
• Select the class of a new example as that of the ‘closest’ vector in the training set
• Requires a distance metric d(x1, x2)
• A common metric is the Euclidean distance,

d(x1, x2) = |x1 − x2|²

(the square is monotone in the distance, so it gives the same nearest neighbour)

k-NN Classifier

• Rather than choosing the single closest,
• Find the k closest (k odd)
– If kA are from class A and kB from class B
– Choose class A if kA > kB
• More robust than the single nearest neighbour (a sketch follows below)
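A minimal sketch with Euclidean distance (array shapes and names assumed; k = 1 gives the plain nearest-neighbour classifier of the previous slide):

```python
import numpy as np

def knn_classify(x, train_X, train_y, k=3):
    """Vote among the k training vectors closest to x."""
    d2 = ((train_X - x) ** 2).sum(axis=1)  # squared distances |xi - x|^2
    nearest = train_y[np.argsort(d2)[:k]]  # labels of the k closest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]       # majority class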

Support Vector Machines

• A powerful new type of (2-class) classifier
• Designed to minimise the expected error over an unseen test set:
– “Structural Risk Minimisation”
– Avoids problems of overspecialisation on the training set
• Particularly useful for small training sets (see the illustrative sketch below)
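The slides include no code for this; as an illustration only, scikit-learn's SVC implements a kernel (non-linear) SVM:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2-class training data
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='rbf')  # RBF kernel gives a non-linear decision boundary
clf.fit(X, y)
print(clf.predict([[1.5, 1.5]]))
```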

Example of (Non-Linear) SVM Classification Space