L09_Stats_B.ppt
Classification (II)
• Feature Transformation
– Generates features y from x
– y usually lower dimension than x
• Classifier
– Partitions feature space into different regions
[Diagram: Observations x → Feature Transformation → Feature Vector y → Classifier → Output]
Example
[Figure: Cancerous and Normal cells plotted in the $(x_1, x_2)$ plane; the feature $y_1 = x_1 + x_2$ increases along the projection direction. The histograms $N(y_1)$ for Normal and Cancerous overlap, raising the question: where should the decision boundary go?]
Misclassification
• Impossible to completely separate classes
• Some will always be misclassified
• Good classifier will make fewest mistakes
• Need probability theory to analyse performance
[Figure: the same overlapping histograms $N(y_1)$ for Normal and Cancerous; wherever the decision boundary is placed, some cells are misclassified]
Statistical Decisions
[Figure: histograms $N(y_1)$ for Class 1 and Class 2, and the corresponding conditional densities $p(y \mid \text{class 1})$ and $p(y \mid \text{class 2})$, with the decision boundary between them]
Decision Rules
• Best decision rule should make fewest
mistakes
• Need to quantify probability of error
• Optimal decision rule is one which
minimises the probability of error
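To make that concrete (a standard expansion, not spelled out on the slide): for two classes with decision regions $R_1$ and $R_2$, the probability of error is
$$P(\text{error}) = \int_{R_2} p(\mathbf{y} \mid c_1)\,P(c_1)\,d\mathbf{y} + \int_{R_1} p(\mathbf{y} \mid c_2)\,P(c_2)\,d\mathbf{y}$$
Assigning each $\mathbf{y}$ to the class with the larger posterior makes the integrand as small as possible at every point, which is what the likelihood ratio test on the next slide achieves.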
Likelihood Ratio Test
• Classify $\mathbf{y}$ by choosing the class $c_i$ which has the highest conditional probability $P(c_i \mid \mathbf{y})$
$$P(c_i \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid c_i)\,P(c_i)}{p(\mathbf{y})}$$
Likelihood Ratio Test
• For two classes we have:
– Choose class 1 if
$$\frac{p(\mathbf{y} \mid c_1)\,P(c_1)}{p(\mathbf{y})} > \frac{p(\mathbf{y} \mid c_2)\,P(c_2)}{p(\mathbf{y})}$$
– Equivalently, choose class 1 if
$$L(\mathbf{y}) = \frac{p(\mathbf{y} \mid c_1)}{p(\mathbf{y} \mid c_2)} > \frac{P(c_2)}{P(c_1)}$$
where $L(\mathbf{y})$ is called the likelihood ratio
Example
• Suppose we wish to decide if a cell is
cancerous by measuring how red (r) it is.
– Cancerous cells have $p(r \mid c) = \frac{1}{\sqrt{2\pi}} \exp\left(-0.5(r-5)^2\right)$
– Normal cells have $p(r \mid n) = \frac{1}{\sqrt{2\pi}} \exp\left(-0.5(r-3)^2\right)$
– If cancerous cells and normal cells are equally likely, what is the best classification of a cell with redness r?
Example (cont)
• The likelihood ratio is
$$L(r) = \frac{\exp(-0.5(r-5)^2)}{\exp(-0.5(r-3)^2)}$$
• The cell is classified as cancerous if
$$L(r) > \frac{P(n)}{P(c)} = 1$$
$$\frac{\exp(-0.5(r-5)^2)}{\exp(-0.5(r-3)^2)} > 1$$
$$-0.5(r-5)^2 + 0.5(r-3)^2 > 0$$
$$4r - 16 > 0$$
$$r > 4$$
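A minimal Python sketch of this worked example (the function names are mine, not from the slides) confirms that the decision flips at r = 4:

```python
import numpy as np

# Unit-variance Gaussians from the example: mean 5 (cancerous), mean 3 (normal),
# with equal priors P(c) = P(n) = 0.5.
def p_r_given_c(r):
    return np.exp(-0.5 * (r - 5.0) ** 2) / np.sqrt(2.0 * np.pi)

def p_r_given_n(r):
    return np.exp(-0.5 * (r - 3.0) ** 2) / np.sqrt(2.0 * np.pi)

def classify(r):
    # Likelihood ratio test: cancerous if L(r) = p(r|c)/p(r|n) > P(n)/P(c) = 1
    return "cancerous" if p_r_given_c(r) > p_r_given_n(r) else "normal"

for r in [3.9, 4.1]:
    print(r, classify(r))   # prints "normal" then "cancerous"
```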
Example
• This is a special case (both distributions
have equal variance)
[Plot: $p(r \mid n)P(n)$ and $p(r \mid c)P(c)$ against r with equal variances; the curves cross once, at the decision threshold r = 4]
Differing variances
[Plot: $p(r \mid n)P(n)$ and $p(r \mid c)P(c)$ with differing variances; the curves cross twice, partitioning the axis into regions labelled normal, cancerous and normal]
• Taking logs of the likelihood ratio, the cell is cancerous if
$$-\frac{0.5(r-\mu_c)^2}{\sigma_c^2} + \frac{0.5(r-\mu_n)^2}{\sigma_n^2} > 0$$
which expands to a quadratic
$$ar^2 + br + c > 0$$
(Gives two thresholds in general)
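As an illustration (my own sketch, not from the slides; it keeps the $\ln(\sigma_n/\sigma_c)$ term from the normalising constants, which the slide absorbs into c), the two thresholds can be found by solving the quadratic directly:

```python
import numpy as np

def quadratic_thresholds(mu_n, sigma_n, mu_c, sigma_c):
    """Decision thresholds for two 1D Gaussians with equal priors.
    Expands the log likelihood ratio into a*r^2 + b*r + c > 0."""
    a = 0.5 / sigma_n**2 - 0.5 / sigma_c**2
    b = mu_c / sigma_c**2 - mu_n / sigma_n**2
    c = (0.5 * mu_n**2 / sigma_n**2 - 0.5 * mu_c**2 / sigma_c**2
         + np.log(sigma_n / sigma_c))
    return np.roots([a, b, c])   # the two thresholds, in general

print(quadratic_thresholds(3.0, 1.0, 5.0, 2.0))
```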
1D Classifier
• Given examples $\{a_i\}$ from class A, $\{b_i\}$ from class B.
• Estimate distributions $p(x \mid A)$, $p(x \mid B)$
– For a normal pdf, compute the mean and variance
• Select priors P(A), P(B).
• To classify a new example x:
– Select class A if $p(x \mid A)\,P(A) > p(x \mid B)\,P(B)$
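A compact Python sketch of this recipe (illustrative names, invented data):

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate the mean and variance of a 1D normal pdf from examples."""
    return np.mean(samples), np.var(samples)

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def classify(x, params_a, params_b, prior_a=0.5, prior_b=0.5):
    # Select class A if p(x|A)P(A) > p(x|B)P(B)
    if gauss_pdf(x, *params_a) * prior_a > gauss_pdf(x, *params_b) * prior_b:
        return "A"
    return "B"

params_a = fit_gaussian([2.9, 3.1, 3.0, 2.8, 3.2])   # examples {a_i}
params_b = fit_gaussian([4.9, 5.2, 5.1, 4.8, 5.0])   # examples {b_i}
print(classify(3.5, params_a, params_b))
```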
Modifying the threshold
• If t=P(B)/P(A) we make fewest errors
• If t< P(B)/P(A) we classify more A
correctly, but make more mistakes on B
• If t> P(B)/P(A) we classify more B
correctly, but make more mistakes on A
• Choose class A if
$$L(\mathbf{y}) = \frac{p(\mathbf{y} \mid A)}{p(\mathbf{y} \mid B)} > t$$
ROC Curves
• “Receiver Operating Characteristic”
• Summarises the performance of a classifier as the threshold is changed
• Plots true positives (A's correctly classified) against false positives (B's misclassified as A) for different thresholds
• Allows a choice of threshold to achieve a particular error performance
ROC Curves
• True positives (A’s correctly classified)
• False positives (B’s misclassified as A)
[Plot: True +ives (0 to 1) against False +ives (0 to 1); the diagonal corresponds to a random classifier, and the equal error point is marked on the curve]
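As a sketch of how such a curve is computed (hypothetical code, not from the course): score every example, then sweep the threshold and record the two rates.

```python
import numpy as np

def roc_points(scores_a, scores_b):
    """ROC points: a larger score means 'classify as A'."""
    thresholds = np.sort(np.concatenate([scores_a, scores_b]))[::-1]
    tpr = np.array([(scores_a >= t).mean() for t in thresholds])  # true +ives
    fpr = np.array([(scores_b >= t).mean() for t in thresholds])  # false +ives
    return fpr, tpr

rng = np.random.default_rng(0)
fpr, tpr = roc_points(rng.normal(1.0, 1.0, 200),   # class A scores
                      rng.normal(0.0, 1.0, 200))   # class B scores
```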
ROC Curves
• The closer the curve is to the top-left, the better
[Plot: two ROC curves of True +ives against False +ives; the curve nearer the top-left corner is the better classifier]
Multi-variate Distributions
• PDFs extend to n dimensions
• In 1D: $P(x \in [x, x+dx]) \approx p(x)\,dx$
• In 2D: $P(x \in [x, x+dx],\ y \in [y, y+dy]) \approx p(x, y)\,dx\,dy$
[Figure: equal-probability contours of $p(x, y)$; the probability of falling in a small $dx \times dy$ box is $p(x, y)\,dx\,dy$]
Multivariate Normal PDF
• In n dimensions, the normal distribution with
mean m and covariance S has pdf:
$$p(\mathbf{x} : \mathbf{m}, S) = c\,\exp(-0.5\,M), \qquad c = \frac{1}{(2\pi)^{n/2}\,|S|^{1/2}}, \qquad M = (\mathbf{x}-\mathbf{m})^T S^{-1} (\mathbf{x}-\mathbf{m})$$
• The covariance of N samples is
$$S = \frac{1}{N} \sum_{i=1}^{N} (\mathbf{x}_i - \mathbf{m})(\mathbf{x}_i - \mathbf{m})^T$$
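A direct Python transcription of these formulas (a sketch under the slide's definitions):

```python
import numpy as np

def mvn_pdf(x, m, S):
    """Normal pdf p(x : m, S) in n dimensions."""
    n = len(m)
    d = x - m
    M = d @ np.linalg.solve(S, d)      # (x - m)^T S^-1 (x - m)
    c = 1.0 / ((2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(S)))
    return c * np.exp(-0.5 * M)

X = np.random.default_rng(0).normal(size=(100, 2))   # N sample vectors x_i
m = X.mean(axis=0)
S = (X - m).T @ (X - m) / len(X)       # covariance of the N samples
print(mvn_pdf(np.zeros(2), m, S))
```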
Quadratic Classifiers
• Suppose we have training vectors from
several different classes
• For each class, compute the mean and covariance to generate a normal distribution, $p(\mathbf{x} \mid c_i)$
• To classify a new vector, choose the class which maximises $p(\mathbf{x} \mid c_i)\,P(c_i)$
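A sketch of the procedure (reusing the hypothetical mvn_pdf helper from the previous block):

```python
import numpy as np

def fit_class(X):
    """Mean and covariance of one class's training vectors."""
    m = X.mean(axis=0)
    return m, (X - m).T @ (X - m) / len(X)

def classify(x, models, priors):
    # Choose the class c_i which maximises p(x|c_i) P(c_i)
    scores = [mvn_pdf(x, m, S) * P for (m, S), P in zip(models, priors)]
    return int(np.argmax(scores))
```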
Nearest Neighbour Classifiers
• Useful non-linear classifier
• Retain the whole training set
• Select the class of a new example as that of the closest vector in the training set
• Requires a distance metric $d(\mathbf{x}_1, \mathbf{x}_2)$
• A common metric is the (squared) Euclidean distance, $d(\mathbf{x}_1, \mathbf{x}_2) = \|\mathbf{x}_1 - \mathbf{x}_2\|^2$
k-NN Classifier
• Rather than choosing the single closest, find the k closest (k odd)
– If $k_A$ are from class A and $k_B$ are from class B, choose class A if $k_A > k_B$
• More robust than the single nearest neighbour
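A minimal k-NN sketch (my own illustration; labels are assumed to be small non-negative integers so that np.bincount can tally the votes):

```python
import numpy as np

def knn_classify(x, train_X, train_y, k=3):
    """Classify x by majority vote among its k nearest training vectors."""
    d = np.sum((train_X - x) ** 2, axis=1)   # squared Euclidean distances
    nearest = np.argsort(d)[:k]              # indices of the k closest
    return np.bincount(train_y[nearest]).argmax()

# k = 1 recovers the plain nearest-neighbour classifier
```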
Support Vector Machines
• A powerful new type of (2 class) classifier
• Designed to minimise expected error over
an unseen test set:
– “Structural Risk Minimisation”
– Avoids problems of over-specialisation on the training set
• Particularly useful for small training sets
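The slides give no code, but a minimal example with scikit-learn's SVC (assuming scikit-learn is installed) shows a non-linear two-class SVM in use:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # non-linearly separable

clf = SVC(kernel="rbf")   # RBF kernel gives a non-linear decision boundary
clf.fit(X, y)
print(clf.predict([[0.0, 0.0], [2.0, 2.0]]))
```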
Example of (Non-Linear) SVM Classification Space
[Figure: a non-linear SVM decision boundary separating two classes in the classification space]