ANLY-601
Advanced Pattern Recognition
Spring 2018
L17 — Nonparametric Classifiers
Nonparametric Classifiers
Kernel or Parzen Classifiers
The obvious approach is to use a kernel density estimate for each of the
class-conditional densities and proceed with a likelihood ratio test.
To set the kernel width, use leave-one-out error estimation (see cross-validation).
Leave-one-out is cheap for kernel estimators:
The class-conditional Parzen estimates are

$$\hat p(x \mid \omega_1) = \frac{1}{N_1}\sum_{j=1}^{N_1} k\!\left(x - x_j^{(1)}\right),
\qquad
\hat p(x \mid \omega_2) = \frac{1}{N_2}\sum_{j=1}^{N_2} k\!\left(x - x_j^{(2)}\right),$$

and the leave-one-out estimate at a training point $x_i$ follows from the full estimate without refitting:

$$\hat p_{-i}(x_i) = \frac{1}{N-1}\sum_{j \neq i} k\!\left(x_i - x_j\right)
= \frac{1}{N-1}\left[\,N\,\hat p(x_i) - k(0)\,\right].$$
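As a concrete illustration, here is a minimal sketch of our own (not from the lecture; the Gaussian-kernel choice, the grid of candidate widths, and all function names are assumptions): a 1-D Parzen classifier whose kernel width is chosen by maximizing the leave-one-out likelihood, using the identity above so no refitting is needed.

```python
import numpy as np

def gaussian_kernel(u, h):
    """Gaussian kernel of width h, evaluated at u (1-D)."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a 1-D sample under a Parzen estimate.

    Uses the identity  p_hat_{-i}(x_i) = [N p_hat(x_i) - k(0)] / (N - 1),
    so the N leave-one-out values come from a single evaluation pass.
    """
    N = len(x)
    K = gaussian_kernel(x[:, None] - x[None, :], h)  # K[i, j] = k(x_i - x_j)
    p_full = K.mean(axis=1)                          # p_hat(x_i), self-term included
    p_loo = (N * p_full - gaussian_kernel(0.0, h)) / (N - 1)
    return np.sum(np.log(p_loo))

def parzen_classify(x_test, x1, h1, x2, h2, prior1=0.5):
    """Likelihood-ratio test between two class-conditional Parzen estimates."""
    p1 = gaussian_kernel(x_test[:, None] - x1[None, :], h1).mean(axis=1)
    p2 = gaussian_kernel(x_test[:, None] - x2[None, :], h2).mean(axis=1)
    return np.where(prior1 * p1 >= (1.0 - prior1) * p2, 1, 2)

# Pick each class's kernel width by maximizing the leave-one-out likelihood.
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 200)                  # hypothetical class-1 sample
x2 = rng.normal(2.0, 1.0, 200)                  # hypothetical class-2 sample
widths = np.logspace(-1, 0.5, 20)
h1 = max(widths, key=lambda h: loo_log_likelihood(x1, h))
h2 = max(widths, key=lambda h: loo_log_likelihood(x2, h))
labels = parzen_classify(np.array([-1.0, 1.0, 3.0]), x1, h1, x2, h2)
```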
KNN Classifiers
Two paradigms:
• KNN density estimates followed by likelihood ratio test
• Voting KNN. This is what people usually mean when they
refer to a KNN classifier.
For each test sample, select the K nearest neighbors from the
training data (all classes pooled). The sample is assigned
the majority class among those K neighbors.
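A minimal voting-KNN sketch of our own (brute-force Euclidean distances; labels assumed to be integers 0..C-1, and k assumed smaller than the training-set size):

```python
import numpy as np

def knn_classify(x_test, x_train, y_train, k=5):
    """Voting KNN: each test point gets the majority class among its
    k nearest training points (Euclidean distance, brute force)."""
    # Pairwise squared distances, shape (n_test, n_train)
    d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest training points for each test point
    nn = np.argpartition(d2, k, axis=1)[:, :k]
    # Majority vote among the neighbors' labels
    return np.array([np.bincount(v).argmax() for v in y_train[nn]])
```

With k = 1 this reduces to the nearest-neighbor rule whose asymptotic error is analyzed next.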
Asymptotics for Large N
Error Rates for Voting KNN
K = 1: classify the point $x$ as the class of its nearest neighbor $x_{NN}$.
The conditional risk (the classification error rate given $x$ and $x_{NN}$) is

$$r_1(x, x_{NN})
= \mathrm{Prob}(\omega_1\ \text{at}\ x\ \text{and}\ \omega_2\ \text{at}\ x_{NN})
+ \mathrm{Prob}(\omega_2\ \text{at}\ x\ \text{and}\ \omega_1\ \text{at}\ x_{NN})
= P(\omega_1 \mid x)\,P(\omega_2 \mid x_{NN}) + P(\omega_2 \mid x)\,P(\omega_1 \mid x_{NN}).$$

As $N \to \infty$ the data points become closer and closer together, $x_{NN} \to x$,
and we can replace $P(\omega_i \mid x_{NN})$ with $P(\omega_i \mid x)$, leaving

$$r_1^* \equiv \lim_{N\to\infty} r_1 = 2\,P(\omega_1 \mid x)\,P(\omega_2 \mid x).$$
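A quick numerical illustration of this limit (our own construction, assuming two equiprobable unit-variance Gaussian classes; the finite-N error only approximates the limit):

```python
import numpy as np
rng = np.random.default_rng(0)

mu = np.array([0.0, 2.0])   # two equiprobable classes: N(0,1) and N(2,1)

def sample(n):
    y = rng.integers(0, 2, n)         # class labels
    return rng.normal(mu[y], 1.0), y  # features, labels

def posterior1(x):
    """P(omega_1 | x) for equal priors and unit variances."""
    a = np.exp(-0.5 * (x - mu[0]) ** 2)
    b = np.exp(-0.5 * (x - mu[1]) ** 2)
    return a / (a + b)

# Predicted asymptotic 1-NN risk: E_x[ 2 P(w1|x) P(w2|x) ]
x, _ = sample(200_000)
p1 = posterior1(x)
print("predicted r1*        :", 2 * np.mean(p1 * (1 - p1)))

# Empirical 1-NN error with a large (finite) training set
xtr, ytr = sample(5_000)
xte, yte = sample(2_000)
nn = np.abs(xte[:, None] - xtr[None, :]).argmin(axis=1)  # nearest-neighbor index
print("empirical 1-NN error :", np.mean(ytr[nn] != yte))
# For reference, the Bayes error here is Phi(-1), about 0.159.
```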
For an Odd Number of Neighbors in General
Asymptotically, for an odd number of neighbors $k$, the error rate at $x$ becomes

$$r_k^* \equiv \lim_{N\to\infty} r_k
= \sum_{i=0}^{(k-1)/2} \binom{k}{i}\left[\,P(\omega_1 \mid x)^{\,i+1}\,P(\omega_2 \mid x)^{\,k-i}
+ P(\omega_2 \mid x)^{\,i+1}\,P(\omega_1 \mid x)^{\,k-i}\,\right].$$

Writing $P_i \equiv P(\omega_i \mid x)$, the first few odd cases are

$$\begin{aligned}
r_1^* &= 2\,P_1 P_2\\
r_3^* &= P_1 P_2 + 4\,(P_1 P_2)^2\\
r_5^* &= P_1 P_2 + (P_1 P_2)^2 + 12\,(P_1 P_2)^3\\
r_7^* &= P_1 P_2 + (P_1 P_2)^2 + 2\,(P_1 P_2)^3 + 40\,(P_1 P_2)^4
\end{aligned}$$
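The general formula is easy to check numerically. A small sketch of ours (the helper name `knn_asymptotic_risk` is our own) that verifies the closed forms above:

```python
import numpy as np
from math import comb

def knn_asymptotic_risk(k, p1):
    """Asymptotic conditional k-NN risk at a point with P(w1|x) = p1, odd k."""
    p2 = 1.0 - p1
    return sum(comb(k, i) * (p1 ** (i + 1) * p2 ** (k - i)
                             + p2 ** (i + 1) * p1 ** (k - i))
               for i in range((k - 1) // 2 + 1))

# Check the closed forms above, with q = P1 * P2
for p1 in np.linspace(0.01, 0.99, 9):
    q = p1 * (1 - p1)
    assert np.isclose(knn_asymptotic_risk(1, p1), 2 * q)
    assert np.isclose(knn_asymptotic_risk(3, p1), q + 4 * q**2)
    assert np.isclose(knn_asymptotic_risk(5, p1), q + q**2 + 12 * q**3)
    assert np.isclose(knn_asymptotic_risk(7, p1), q + q**2 + 2 * q**3 + 40 * q**4)
```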
Bayes vs KNN Error
Recall that the Bayes risk at $x$ is

$$r_B = \min\!\big(P(\omega_1 \mid x),\, P(\omega_2 \mid x)\big)
= \frac{1}{2}\left[\,1 - \sqrt{1 - 4\,P(\omega_1 \mid x)\,P(\omega_2 \mid x)}\,\right],$$

and the error rate for (odd) KNN with infinitely many training points is

$$r_k^* = \sum_{i=0}^{(k-1)/2} \binom{k}{i}\left[\,P(\omega_1 \mid x)^{\,i+1}\,P(\omega_2 \mid x)^{\,k-i}
+ P(\omega_2 \mid x)^{\,i+1}\,P(\omega_1 \mid x)^{\,k-i}\,\right].$$

One can use these expressions, or a Taylor series expansion of the square root, to find the famous bound

$$r_B \;\le\; \cdots \;\le\; r_5^* \;\le\; r_3^* \;\le\; r_1^* \;\le\; 2\,r_B.$$
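Reusing `knn_asymptotic_risk` from the sketch above, one can spot-check this chain of bounds on a grid of posteriors:

```python
import numpy as np

# Spot-check  r_B <= r5* <= r3* <= r1* <= 2 r_B,
# reusing knn_asymptotic_risk() from the previous sketch.
eps = 1e-12
for p1 in np.linspace(0.0, 1.0, 101):
    rB = min(p1, 1.0 - p1)
    r1, r3, r5 = (knn_asymptotic_risk(k, p1) for k in (1, 3, 5))
    assert rB <= r5 + eps
    assert r5 <= r3 + eps and r3 <= r1 + eps
    assert r1 <= 2 * rB + eps
```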
Bayes vs KNN Error
The KNN bound is tighter than the Bhattacharyya bound: at every $x$,
$2\,P(\omega_1 \mid x)\,P(\omega_2 \mid x) \le \sqrt{P(\omega_1 \mid x)\,P(\omega_2 \mid x)}$,
because $\sqrt{P(\omega_1 \mid x)\,P(\omega_2 \mid x)} \le \tfrac12$.

[Figure: $r_1^*$, $r_3^*$, $r_5^*$, $r_7^*$, the Bayes risk, twice the Bayes risk, and the Bhattacharyya bound, plotted against $P(\omega_1 \mid x)\,P(\omega_2 \mid x)$.]
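A small matplotlib script (our own reconstruction of the figure, again reusing `knn_asymptotic_risk` from the sketch above) that regenerates the comparison:

```python
import numpy as np
import matplotlib.pyplot as plt

p1 = np.linspace(0.0, 0.5, 200)  # posterior P(w1|x); curves are symmetric in p1 <-> 1-p1
q = p1 * (1 - p1)                # horizontal axis: P(w1|x) * P(w2|x)

plt.plot(q, np.minimum(p1, 1 - p1), "k", label="Bayes")
plt.plot(q, 2 * np.minimum(p1, 1 - p1), "k--", label="2 x Bayes")
plt.plot(q, np.sqrt(q), label="Bhattacharyya")
for k in (1, 3, 5, 7):
    plt.plot(q, [knn_asymptotic_risk(k, p) for p in p1], label=f"$r_{k}^*$")
plt.xlabel("$P(\\omega_1|x)\\,P(\\omega_2|x)$")
plt.ylabel("conditional risk at x")
plt.legend()
plt.show()
```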
Error Rate for an Even Number of Neighbors
With an even number of neighbors, we can have a tie (for two classes).
When there is a tie, we do not make a choice, and the rejected sample is
not counted as an error. This leads to a slightly lower error rate

$$r_{2k}^* = \sum_{i=0}^{k-1} \binom{2k}{i}\left[\,P(\omega_1 \mid x)^{\,i+1}\,P(\omega_2 \mid x)^{\,2k-i}
+ P(\omega_2 \mid x)^{\,i+1}\,P(\omega_1 \mid x)^{\,2k-i}\,\right]$$

and a sequence of bounds

$$\tfrac{1}{2}\,r_B \;\le\; r_2^* \;\le\; r_4^* \;\le\; r_6^* \;\le\; \cdots \;\le\; r_B
\;\le\; \cdots \;\le\; r_5^* \;\le\; r_3^* \;\le\; r_1^* \;\le\; 2\,r_B.$$
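A companion sketch of ours for the even-K risk with ties rejected (the helper name `knn_even_risk` is hypothetical), spot-checking the even part of the chain:

```python
import numpy as np
from math import comb

def knn_even_risk(two_k, p1):
    """Asymptotic conditional risk for even K = 2k with ties rejected."""
    p2 = 1.0 - p1
    k = two_k // 2
    return sum(comb(two_k, i) * (p1 ** (i + 1) * p2 ** (two_k - i)
                                 + p2 ** (i + 1) * p1 ** (two_k - i))
               for i in range(k))

# Spot-check  r_B/2 <= r2* <= r4* <= r6* <= r_B  on a grid of posteriors
eps = 1e-12
for p1 in np.linspace(0.0, 1.0, 101):
    rB = min(p1, 1.0 - p1)
    r2, r4, r6 = (knn_even_risk(K, p1) for K in (2, 4, 6))
    assert rB / 2 <= r2 + eps
    assert r2 <= r4 + eps and r4 <= r6 + eps
    assert r6 <= rB + eps
```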