ANLY-601
Advanced Pattern Recognition
Spring 2018
L16 — Nonparametric Density Models (cont'd)
K-NN Estimates
Parzen window (uniform kernel) — the kernel volume was fixed and we counted the number of samples falling inside the volume to estimate p(x).

K-nearest neighbor estimator — choose the point x at which we estimate the density, and construct the smallest region L(x) that contains k points. Then estimate the density at x as

$$\hat p(x) = \frac{k-1}{N\,V(x)}$$

where N is the total number of points and V(x) is the volume of the minimal n-dimensional region containing k points.

The numerator k − 1 gives the estimate lower bias than if it were k.
If r is the distance from x to the k-th nearest neighbor, then we can take V(x) to be the volume inside the n-sphere of radius r,

$$V(x) = \frac{2\pi^{n/2}}{n\,\Gamma\!\left(\frac{n}{2}\right)}\, r^n$$

where Γ is the Euler gamma function.
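A minimal Python sketch of this estimator (my own construction; the function name and test data are illustrative, not from the lecture): the distance to the k-th nearest neighbor sets r, the n-sphere formula above gives V(x), and the estimate is (k − 1)/(N V(x)).

```python
import numpy as np
from scipy.special import gammaln

def knn_density(x, data, k):
    """k-NN density estimate p_hat(x) = (k-1) / (N V(x))."""
    N, n = data.shape
    # distance from x to its k-th nearest neighbor among the samples
    r = np.sort(np.linalg.norm(data - x, axis=1))[k - 1]
    # log-volume of the n-sphere of radius r: V = 2 pi^(n/2) r^n / (n Gamma(n/2))
    log_V = (np.log(2.0) + (n / 2.0) * np.log(np.pi) + n * np.log(r)
             - np.log(n) - gammaln(n / 2.0))
    return (k - 1) / (N * np.exp(log_V))

# Example: standard 2-D Gaussian; the true density at the origin is 1/(2 pi) ~ 0.159
rng = np.random.default_rng(0)
data = rng.standard_normal((5000, 2))
print(knn_density(np.zeros(2), data, k=50))
```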
Bias and Variance of KNN
Bias and variance of $\hat p(x) = \dfrac{k-1}{N\,V(x)}$:

$$E\bigl[\hat p(x)\bigr] \approx p(x)\left[1 + \beta\left(\frac{k}{N}\right)^{2/n}\right],
\qquad
\operatorname{var}\bigl[\hat p(x)\bigr] \approx \frac{p^2(x)}{k},$$

where n is the dimension and β depends on the density and dimension. The second term in the mean is usually small, so $\hat p(x)$ is approximately unbiased.

Notice that to decrease the variance, we increase the number of nearest neighbors k used. But this grows V and gives a coarser estimate of p(x) — i.e. the bias increases. So we have a bias/variance trade-off to contend with.
Source: Keinosuke Fukunaga, Statistical Pattern Recognition, 2nd Ed., Academic Press.
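A quick simulation (reusing the illustrative knn_density sketch above) is consistent with the 1/k variance scaling: resample the data many times, re-estimate at a fixed point, and compare the empirical variance with p²(x)/k.

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = np.zeros(2)
p_true = 1.0 / (2.0 * np.pi)   # standard 2-D Gaussian density at the origin
for k in (10, 40, 160):
    # 300 independent datasets of 2000 points; estimate p(x0) on each
    est = [knn_density(x0, rng.standard_normal((2000, 2)), k) for _ in range(300)]
    print(f"k={k:4d}  empirical var={np.var(est):.2e}  p^2/k={p_true**2 / k:.2e}")
```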
Conceptual Interlude:
Intrinsic Dimensionality
Data handed to us in high-dimensional spaces may in fact lie near some lower-dimensional sub-manifold. We can find the local dimension of this sub-manifold using the k-NN density estimate.
The idea is that the intrinsic dimension (of the manifold) dictates how
the number of data points will increase as we increase the radius of
the region L(x).
Consider data that lies on a plane in 3-space:

[Figure: data points on a plane in 3-space; the query point x is shown as an open circle.]
If we erect a small sphere about the point x (open circle) and grow the radius of the sphere, the number of points inside will grow only as $r^2$ (the area in the plane intercepted by the sphere), not as $r^3$ as would be the case if the data filled 3-space rather than just the plane.
Intrinsic Dimensionality
We can derive the scaling of the number of points in the sphere with its radius on purely dimensional grounds. From the k-NN estimate $p \approx k/(N\,V_k)$, the volume of the n-sphere containing k points is

$$V_k \approx \frac{k}{p\,N}.$$

The radius is related to the volume by

$$V = c\,r^{n_e}, \quad\text{so}\quad r = \left(\frac{V}{c}\right)^{1/n_e},$$

where $n_e$ is the intrinsic dimension and $c$ is a constant. So the radius of the sphere containing k points is

$$r_k = \left(\frac{V_k}{c}\right)^{1/n_e} = \left(\frac{k}{c\,p\,N}\right)^{1/n_e}.$$

The radius containing k+1 points is

$$r_{k+1} = \left(\frac{k+1}{c\,p\,N}\right)^{1/n_e}.$$

Consequently

$$\frac{r_{k+1}}{r_k} = \left(\frac{k+1}{k}\right)^{1/n_e} = \left(1 + \frac{1}{k}\right)^{1/n_e} \approx 1 + \frac{1}{n_e\,k} \quad \text{(last equality for large } k\text{)},$$

and we can solve for the data dimensionality:

$$n_e = \frac{\ln\bigl((k+1)/k\bigr)}{\ln\bigl(r_{k+1}/r_k\bigr)}.$$
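A sketch of this estimator in Python (my construction from the ratio formula above; names and test data are illustrative): estimate $n_e$ at many query points from the k-th and (k+1)-th neighbor distances, then take the median.

```python
import numpy as np

def intrinsic_dimension(data, k=10, n_queries=200, seed=0):
    """Local intrinsic dimension n_e = ln((k+1)/k) / ln(r_{k+1}/r_k)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=n_queries, replace=False)
    estimates = []
    for i in idx:
        d = np.sort(np.linalg.norm(data - data[i], axis=1))
        r_k, r_k1 = d[k], d[k + 1]      # d[0] = 0 is the query point itself
        estimates.append(np.log((k + 1) / k) / np.log(r_k1 / r_k))
    return np.median(estimates)         # median tames the per-point noise

# Data on a 2-D plane embedded in 3-space: the estimate should be near 2, not 3
rng = np.random.default_rng(2)
uv = rng.uniform(-1, 1, size=(4000, 2))
plane = uv @ np.array([[1.0, 0.0, 0.5],
                       [0.0, 1.0, -0.3]])
print(intrinsic_dimension(plane, k=10))
```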
Non-Parametric Methods —
Expansion in Ortho-normal Basis Functions
Our last non-parametric technique is the use of orthogonal basis functions to represent the density. The model is

$$p(x) = \sum_{i=1}^{\infty} c_i\,\varphi_i(x),$$

where the basis functions satisfy the orthogonality condition (with weight function $g(x)$)

$$\int \varphi_i(x)\,\varphi_j(x)\,g(x)\,d^n x = \lambda_i\,\delta_{ij}$$

and the completeness condition

$$g(x')\,\sum_{i=1}^{\infty} \frac{1}{\lambda_i}\,\varphi_i(x')\,\varphi_i(x) = \delta(x' - x).$$

Orthogonality provides the solution for the coefficients,

$$c_i = \frac{1}{\lambda_i} \int p(x)\,\varphi_i(x)\,g(x)\,d^n x.$$
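Since the coefficient formula is an expectation, $c_i = \frac{1}{\lambda_i}\,E_{x\sim p}\bigl[\varphi_i(x)\,g(x)\bigr]$, the $c_i$ can be estimated as sample averages. A small sketch using an orthonormal cosine basis on [0, 1] (so $g(x) = 1$ and $\lambda_i = 1$); the basis and target density are stand-ins of my choosing, not from the lecture.

```python
import numpy as np

def phi(i, x):
    """Orthonormal cosine basis on [0,1]: phi_0 = 1, phi_i = sqrt(2) cos(i pi x)."""
    return np.ones_like(x) if i == 0 else np.sqrt(2.0) * np.cos(i * np.pi * x)

# Target density p(x) = 1 + 0.8 cos(2 pi x) on [0,1], sampled by rejection
rng = np.random.default_rng(3)
xs = rng.uniform(0.0, 1.0, 200_000)
keep = rng.uniform(0.0, 1.8, xs.size) < 1.0 + 0.8 * np.cos(2 * np.pi * xs)
samples = xs[keep]

# c_i as sample averages; true values: c_0 = 1, c_2 = 0.8/sqrt(2) ~ 0.566, rest ~ 0
for i in range(4):
    print(f"c_{i} ~= {phi(i, samples).mean():.3f}")
```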
Completeness
Orthogonality is probably familiar to you, but completeness may not
be. It is simply the statement that any function can be expanded in
the basis – it’s a complete set.
The analogs of

$$\int \varphi_i(x)\,\varphi_j(x)\,g(x)\,d^n x = \lambda_i\,\delta_{ij}
\qquad\text{and}\qquad
g(x')\,\sum_{i=1}^{\infty} \frac{1}{\lambda_i}\,\varphi_i(x')\,\varphi_i(x) = \delta(x' - x)$$

for orthogonal, unit-norm basis vectors $e_i$, $i = 1 \ldots N$, in a finite-dimensional vector space are

$$e_i^T e_j = \delta_{ij}
\qquad\text{and}\qquad
\sum_{i=1}^{N} e_i\,e_i^T = I \quad\text{(the identity matrix)}.$$
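In code, the finite-dimensional statements are quick to verify for any orthonormal basis (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # columns of Q = orthonormal e_i
print(np.allclose(Q.T @ Q, np.eye(5)))             # e_i^T e_j = delta_ij
print(np.allclose(sum(np.outer(q, q) for q in Q.T), np.eye(5)))  # sum e_i e_i^T = I
```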
Error in Terminating the Series
We're obviously NOT going to use the whole infinite series, but rather will terminate it. The error incurred in terminating the series at m terms is

$$p(x) - p_m(x) = \sum_{i=1}^{\infty} c_i\,\varphi_i(x) - \sum_{i=1}^{m} c_i\,\varphi_i(x) = \sum_{i=m+1}^{\infty} c_i\,\varphi_i(x);$$

a convenient integrated measure is

$$\int g(x)\,\bigl(p(x) - p_m(x)\bigr)^2\,dx
= \int g(x) \sum_{i=m+1}^{\infty}\sum_{j=m+1}^{\infty} c_i\,c_j\,\varphi_i(x)\,\varphi_j(x)\,dx
= \sum_{i=m+1}^{\infty} \lambda_i\,c_i^2,$$

so the best basis will have $\lambda_i\,c_i^2$ drop off quickly with increasing i.
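Continuing the illustrative cosine-basis sketch from above (g = 1, λ_i = 1, same phi and stand-in density), the identity can be checked numerically:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20001)
dx = x[1] - x[0]
p = 1.0 + 0.8 * np.cos(2 * np.pi * x)               # same stand-in density
c = [np.sum(p * phi(i, x)) * dx for i in range(8)]  # c_i = integral of p phi_i
m = 1                                               # keep only terms i = 0..m
p_m = sum(c[i] * phi(i, x) for i in range(m + 1))
err = np.sum((p - p_m) ** 2) * dx                   # integrated squared error
print(err, sum(ci ** 2 for ci in c[m + 1:]))        # both ~ c_2^2 = 0.32
```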
Example Basis Functions:
Hermite Polynomials
A useful basis for densities close to Gaussian comes from Hermite polynomials $H_i(x)$ times Gaussians:

$$\varphi_i(x) = \exp\!\left(-\frac{x^2}{2\sigma^2}\right) H_i\!\left(\frac{x}{\sigma}\right),
\qquad
H_i(x) = (-1)^i \exp\!\left(\frac{x^2}{2}\right) \frac{d^i}{dx^i}\,\exp\!\left(-\frac{x^2}{2}\right).$$

The first few, with $u = x/\sigma$:

$$H_0(u) = 1, \quad H_1(u) = u, \quad H_2(u) = u^2 - 1, \quad H_3(u) = u^3 - 3u.$$
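A sketch of this basis with NumPy's probabilists' Hermite polynomials (numpy.polynomial.hermite_e). The orthogonality check uses the weight $g(x) = \exp(x^2/2\sigma^2)$, which is my reading of how these $\varphi_i$ fit the earlier framework; with that weight, $\lambda_i = \sigma\sqrt{2\pi}\,i!$.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite_e import hermeval

sigma = 1.0

def phi_hermite(i, x):
    """phi_i(x) = exp(-x^2 / (2 sigma^2)) * H_i(x / sigma)."""
    coef = np.zeros(i + 1)
    coef[i] = 1.0                                   # select H_i in the series
    return np.exp(-x**2 / (2 * sigma**2)) * hermeval(x / sigma, coef)

# Numerical orthogonality check: int phi_i phi_j g dx = lambda_i delta_ij
x = np.linspace(-10.0, 10.0, 40001)
dx = x[1] - x[0]
g = np.exp(x**2 / (2 * sigma**2))
for i in range(3):
    for j in range(3):
        val = np.sum(phi_hermite(i, x) * phi_hermite(j, x) * g) * dx
        lam = sigma * sqrt(2 * pi) * factorial(i) * (i == j)
        print(i, j, round(val, 4), "expected", round(lam, 4))
```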
Hermite Polynomial Basis Functions
The first few basis functions $\varphi_i$ look like this:

[Figure: plots of the first few Hermite basis functions $\varphi_i(x)$.]
Orthogonal Function Expansions
Example — see the Gram-Charlier and Edgeworth expansions of classical statistics, e.g. in Kendall & Stuart, The Advanced Theory of Statistics:

$$p(x) = \varphi_0(x) + 0.15\,\varphi_1(x) + 0.2\,\varphi_2(x) + 0.15\,\varphi_3(x)$$
Binary Input Variables
For binary n-vectors — you only need $2^n$ basis functions to represent the density without error.

One such basis is the Walsh functions that appear in digital image processing.
[Figure: Walsh functions. From http://finitegeometry.org/sc/ph/timefold.html]
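A minimal construction of such a basis (a Sylvester/Hadamard-style indexing $\varphi_s(x) = (-1)^{s\cdot x}$ over $s, x \in \{0,1\}^n$; my choice among several equivalent conventions): the $2^n$ functions are mutually orthogonal over the $2^n$ binary vectors, so any density on them is reproduced exactly.

```python
import numpy as np
from itertools import product

n = 3
points = np.array(list(product([0, 1], repeat=n)))   # all 2^n binary vectors
Phi = (-1.0) ** (points @ points.T)                  # Phi[s, x] = (-1)^(s . x)
print(np.allclose(Phi @ Phi.T, 2**n * np.eye(2**n))) # orthogonality

# Exact expansion of an arbitrary density p on {0,1}^n
rng = np.random.default_rng(5)
p = rng.random(2**n)
p /= p.sum()
c = Phi @ p / 2**n                                   # coefficients c_s
print(np.allclose(Phi.T @ c, p))                     # reconstruction is exact
```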