
ANLY-601
Advanced Pattern Recognition

Spring 2018

L16 — Nonparametric Density Models (cont’d)

K-NN Estimates
Parzen window (uniform kernel) — the kernel volume was fixed and we counted the number of samples falling inside the volume to estimate p(x).

K-nearest neighbor estimator: choose the point x at which we estimate the density, and construct the smallest region L(x) that contains k points. Then estimate the density at x as

$$\hat p(x) = \frac{k-1}{N\,V(x)}$$

where N is the total number of points and V(x) is the volume of the minimal n-dimensional region containing k points. Using k−1 in the numerator gives the estimate lower bias than using k.




If r is the distance from x to the k-th nearest neighbor, then we can take V(x) to be the volume inside the n-sphere of radius r,

$$V(x) = \frac{2\,\pi^{n/2}}{n\,\Gamma(n/2)}\,r^n$$

where Γ is the Euler gamma function.
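A minimal sketch of this estimator in Python (assuming NumPy and SciPy; the function and variable names are illustrative, not from the lecture):

import numpy as np
from scipy.special import gammaln

def knn_density(x, data, k):
    """k-NN density estimate: p_hat(x) = (k - 1) / (N * V(x)),
    with V(x) the volume of the n-sphere reaching the k-th neighbor."""
    N, n = data.shape
    dists = np.linalg.norm(data - x, axis=1)
    r = np.sort(dists)[k - 1]                 # distance to the k-th neighbor
    # log-volume of the n-sphere: log(2 pi^(n/2) r^n / (n Gamma(n/2)))
    log_V = (np.log(2.0) + (n / 2) * np.log(np.pi)
             + n * np.log(r) - np.log(n) - gammaln(n / 2))
    return (k - 1) / (N * np.exp(log_V))

# Sanity check: a standard 2-D normal has density 1/(2 pi) ~ 0.159 at the origin.
rng = np.random.default_rng(0)
samples = rng.standard_normal((5000, 2))
print(knn_density(np.zeros(2), samples, k=50))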


Bias and Variance of KNN

Bias and variance of $\hat p(x) = \dfrac{k-1}{N\,V(x)}$:

$$E\left[\hat p(x)\right] \approx p(x)\left[1 + \beta\left(\frac{k}{N}\right)^{2/n}\right] \qquad\qquad \mathrm{var}\left[\hat p(x)\right] \approx \frac{p^2(x)}{k}$$

where n is the dimension and the constant β depends on the density and the dimension. The second term in the expectation is usually small, so $\hat p(x)$ is approximately unbiased.

Notice that to decrease the variance, we increase the number of nearest neighbors k used. But this grows V and gives a coarser estimate of p(x) — i.e., the bias increases. So we have a bias/variance trade-off to contend with.
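A quick empirical illustration of the trade-off (a sketch, assuming NumPy; in one dimension the "volume" reaching the k-th neighbor of x = 0 is the interval of length 2r):

import numpy as np

# Bias/variance of the 1-D k-NN estimate at x = 0 for a standard normal,
# whose true density there is 1/sqrt(2 pi) ~ 0.399.
rng = np.random.default_rng(1)
N, trials = 2000, 200
for k in (5, 20, 100, 500):
    est = []
    for _ in range(trials):
        data = rng.standard_normal(N)
        r = np.sort(np.abs(data))[k - 1]   # distance to the k-th neighbor of 0
        est.append((k - 1) / (N * 2 * r))  # V(x) is the interval of length 2r
    est = np.array(est)
    print(f"k={k:4d}  mean={est.mean():.4f}  var={est.var():.2e}")
# The variance falls as k grows while the mean drifts from 0.399: bias rises.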

Source: Keinosuke Fukunaga, Statistical Pattern Recognition, 2nd Ed., Academic Press.


Conceptual Interlude:
Intrinsic Dimensionality

Data handed to us in high-dimensional spaces may actually lie near some lower-dimensional sub-manifold. We can find the local dimension of this sub-manifold by using the k-NN density estimate.

The idea is that the intrinsic dimension (of the manifold) dictates how the number of data points will increase as we increase the radius of the region L(x).


Consider data that lies on a plane in 3-space.

[Figure: points on a plane in 3-space; the query point x is marked by an open circle.]

If we erect a small sphere about the point x (the open circle) and grow the radius of the sphere, the number of points inside will grow only as r² (the area in the plane intercepted by the sphere), not as r³ as would be the case if the data filled 3-space rather than just the plane.
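This is easy to see numerically (a sketch assuming NumPy; the point counts and radii are arbitrary choices):

import numpy as np

# Points confined to the z = 0 plane inside the unit cube of 3-space.
rng = np.random.default_rng(2)
pts = np.zeros((100000, 3))
pts[:, :2] = rng.uniform(-1, 1, size=(100000, 2))

d = np.linalg.norm(pts, axis=1)        # distances from the origin
for r in (0.05, 0.1, 0.2):
    print(r, np.sum(d < r))            # roughly quadruples per doubling: ~ r^2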


Intrinsic Dimensionality
We can derive the scaling of the number of points in the sphere with its radius on purely dimensional grounds. The volume of the n-sphere containing k points is

$$V_k = \frac{k}{N\,p}\,, \qquad \text{so} \qquad V_{k+1} = \frac{k+1}{N\,p}$$

The radius is related to the volume by

$$V = c\,r^{n_e}\,, \qquad \text{so} \qquad r = \left(\frac{V}{c}\right)^{1/n_e}$$

where $n_e$ is the effective (intrinsic) dimension. So the radius of the sphere containing k points is

$$r_k = \left(\frac{V_k}{c}\right)^{1/n_e} = \left(\frac{k}{N\,p\,c}\right)^{1/n_e}$$

The radius containing k+1 points is

$$r_{k+1} = \left(\frac{k+1}{N\,p\,c}\right)^{1/n_e}$$

Consequently

$$\frac{r_{k+1}}{r_k} = \left(\frac{k+1}{k}\right)^{1/n_e} = \left(1 + \frac{1}{k}\right)^{1/n_e} \approx 1 + \frac{1}{k\,n_e} \quad \text{(last equality for large } k\text{)}$$

and we can solve for the data dimensionality

$$n_e = \frac{\ln\!\left((k+1)/k\right)}{\ln\!\left(r_{k+1}/r_k\right)} \approx \frac{1}{k\left(r_{k+1}/r_k - 1\right)}$$
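The last relation gives a simple estimator of the intrinsic dimension from nearest-neighbor radii. A sketch in Python (assuming NumPy; the function name, the averaging over query points, and the test data are illustrative choices, not from the lecture):

import numpy as np

def intrinsic_dimension(data, k, n_queries=500, seed=0):
    """Estimate n_e = log((k+1)/k) / log(r_{k+1}/r_k), averaging the
    log radius ratio over many query points to tame its large variance."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=n_queries, replace=False)
    log_ratios = []
    for i in idx:
        d = np.sort(np.linalg.norm(data - data[i], axis=1))
        log_ratios.append(np.log(d[k + 1] / d[k]))  # d[0] = 0 is the point itself
    return np.log((k + 1) / k) / np.mean(log_ratios)

# Data on a 2-D plane embedded in 3-space: the estimate should be near 2.
rng = np.random.default_rng(3)
pts = np.zeros((5000, 3))
pts[:, :2] = rng.standard_normal((5000, 2))
print(intrinsic_dimension(pts, k=20))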








Non-Parametric Methods —
Expansion in Ortho-normal Basis Functions

Our last non-parametric technique is the use of orthogonal basis functions to represent the density. The model is

$$p(x) = \sum_{i=1}^{\infty} c_i\,\varphi_i(x)$$

where the basis functions satisfy the orthogonality condition

$$\int \varphi_i(x)\,\varphi_j(x)\,g(x)\,d^n x = \lambda_i\,\delta_{ij}$$

which provides the solution for the coefficients

$$c_i = \lambda_i^{-1} \int p(x)\,\varphi_i(x)\,g(x)\,d^n x$$

and the completeness condition

$$g(x')\sum_{i=1}^{\infty} \lambda_i^{-1}\,\varphi_i(x')\,\varphi_i(x) = \delta(x - x')$$
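Since each c_i is an integral of φ_i against p, the coefficients are expectations under p and can be estimated by sample averages. A sketch with a simple stand-in basis (the cosine basis on [0, 1], for which g(x) = 1 and λ_i = 1; NumPy assumed, and the Beta density is just an illustrative target):

import numpy as np

# Cosine basis on [0, 1]: phi_0(x) = 1, phi_i(x) = sqrt(2) cos(i pi x).
def phi(i, x):
    return np.ones_like(x) if i == 0 else np.sqrt(2) * np.cos(i * np.pi * x)

# With g = 1 and lambda_i = 1, c_i = E_p[phi_i(X)]: a sample average.
rng = np.random.default_rng(4)
samples = rng.beta(2, 5, size=10000)          # draws from the target density
m = 8                                          # keep the first m terms
c = [phi(i, samples).mean() for i in range(m)]

x = np.linspace(0, 1, 5)
p_hat = sum(c[i] * phi(i, x) for i in range(m))
print(np.round(p_hat, 3))                      # series estimate on a grid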





Completeness

Orthogonality is probably familiar to you, but completeness may not be. It is simply the statement that any function can be expanded in the basis – it’s a complete set.

The analogs of

$$\int \varphi_i(x)\,\varphi_j(x)\,g(x)\,d^n x = \lambda_i\,\delta_{ij} \qquad\text{and}\qquad g(x')\sum_{i=1}^{\infty} \lambda_i^{-1}\,\varphi_i(x')\,\varphi_i(x) = \delta(x - x')$$

for orthogonal, unit-norm basis vectors $e_i$, $i = 1 \ldots N$, in a finite-dimensional vector space are

$$e_i^T e_j = \delta_{ij} \qquad\text{and}\qquad \sum_{i=1}^{N} e_i\,e_i^T = I \quad\text{(the identity matrix)}$$
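The finite-dimensional version is easy to check numerically (a sketch assuming NumPy; the QR factorization is just one convenient way to get an orthonormal basis):

import numpy as np

# An orthonormal basis of R^4 from the QR decomposition of a random matrix.
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
E = Q.T                                    # rows are the basis vectors e_i

print(np.allclose(E @ E.T, np.eye(4)))     # orthogonality: e_i^T e_j = delta_ij
print(np.allclose(sum(np.outer(e, e) for e in E), np.eye(4)))  # completeness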


Error in Terminating the Series

We’re obviously NOT going to use the whole infinite series, but rather will terminate it. The error incurred in terminating the series at m terms is

$$p(x) - p_m(x) = \sum_{i=1}^{\infty} c_i\,\varphi_i(x) - \sum_{i=1}^{m} c_i\,\varphi_i(x) = \sum_{i=m+1}^{\infty} c_i\,\varphi_i(x)$$

A convenient integrated measure is

$$\int g(x)\left(p(x) - p_m(x)\right)^2 dx = \int g(x)\left(\sum_{i=m+1}^{\infty} c_i\,\varphi_i(x)\right)\left(\sum_{j=m+1}^{\infty} c_j\,\varphi_j(x)\right) dx = \sum_{i=m+1}^{\infty} \lambda_i\,c_i^2$$

where the cross terms vanish by orthogonality. So the best basis will have $\lambda_i\,c_i^2$ drop off quickly with increasing i.
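Continuing the cosine-basis sketch above (SciPy assumed for the quadrature; the Beta(2, 5) density is the same illustrative target), the tail sum can be watched shrinking as m grows:

import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

def phi(i, x):
    return 1.0 if i == 0 else np.sqrt(2) * np.cos(i * np.pi * x)

# Exact coefficients c_i for the Beta(2, 5) density by quadrature (g = 1).
c = [quad(lambda x, i=i: beta.pdf(x, 2, 5) * phi(i, x), 0, 1)[0]
     for i in range(40)]

for m in (2, 5, 10, 20):
    tail = sum(ci**2 for ci in c[m:])   # truncation error of the m-term series
    print(m, round(tail, 6))            # drops off quickly with m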


Example Basis Functions:
Hermite Polynomials

A useful basis for densities close to Gaussian comes from Hermite polynomials H_i(x) times Gaussians:

$$\varphi_i(x) = \exp\!\left(-\frac{x^2}{2\sigma^2}\right) H_i\!\left(\frac{x}{\sigma}\right)$$

$$H_i\!\left(\frac{x}{\sigma}\right) \exp\!\left(-\frac{x^2}{2\sigma^2}\right) = (-\sigma)^i\,\frac{d^i}{dx^i}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)$$

The first few are

$$H_0(x) = 1 \qquad H_1(x) = x/\sigma$$
$$H_2(x) = \left(x/\sigma\right)^2 - 1 \qquad H_3(x) = \left(x/\sigma\right)^3 - 3\left(x/\sigma\right)$$
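A sketch generating these basis functions via the standard Hermite recurrence H_{i+1}(u) = u H_i(u) − i H_{i−1}(u) (NumPy assumed; the function name is illustrative):

import numpy as np

def hermite_phi(i, x, sigma=1.0):
    """phi_i(x) = exp(-x^2 / (2 sigma^2)) * H_i(x / sigma), with H_i built
    by the recurrence H_{i+1}(u) = u H_i(u) - i H_{i-1}(u)."""
    u = x / sigma
    H_prev, H = np.ones_like(u), u          # H_0 and H_1
    if i == 0:
        H = H_prev
    else:
        for n in range(1, i):
            H_prev, H = H, u * H - n * H_prev
    return np.exp(-x**2 / (2 * sigma**2)) * H

x = np.linspace(-4, 4, 9)
print(np.round(hermite_phi(3, x), 3))       # (u^3 - 3u) damped by the Gaussian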






Hermite Polynomial Basis Functions

The first few basis functions $\varphi_i$ look like this:

[Figure: plots of the first few Hermite basis functions $\varphi_i$.]


Orthogonal Function Expansions

Example —

See the Gram-Charlier and Edgeworth expansions of classical statistics – e.g. in Kendall & Stuart, The Advanced Theory of Statistics.

$$p(x) = \varphi_0(x) + 0.15\,\varphi_1(x) + 0.2\,\varphi_2(x) + 0.15\,\varphi_3(x)$$
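Evaluating this example is one line with NumPy's built-in probabilists' Hermite module (a sketch assuming σ = 1; hermeval sums a series in the H_i):

import numpy as np
from numpy.polynomial.hermite_e import hermeval

# p(x) = phi_0 + 0.15 phi_1 + 0.2 phi_2 + 0.15 phi_3,
# with phi_i(x) = exp(-x^2/2) H_i(x) for sigma = 1.
x = np.linspace(-4, 4, 401)
p = np.exp(-x**2 / 2) * hermeval(x, [1.0, 0.15, 0.2, 0.15])
print(round(x[np.argmax(p)], 2))   # the odd-order terms skew the peak off zero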


Binary Input Variables

For binary n-vectors you only need $2^n$ basis functions to represent the density without error.

One such basis is the Walsh functions that appear in digital image processing.

[Figure: Walsh functions. From http://finitegeometry.org/sc/ph/timefold.html]
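A sketch of such a basis (NumPy assumed): the Walsh functions on binary n-vectors can be written χ_s(x) = (−1)^{s·x}, one per binary vector s, giving 2^n orthogonal functions that represent any density on the 2^n points exactly.

import numpy as np
from itertools import product

n = 3
points = np.array(list(product([0, 1], repeat=n)))      # all 2^n binary vectors
W = np.array([[(-1) ** int(s @ x) for x in points] for s in points])

print(np.allclose(W @ W.T, 2**n * np.eye(2**n)))        # orthogonality
# Any density p on the 2^n points is recovered exactly from 2^n coefficients.
p = np.arange(1, 2**n + 1, dtype=float); p /= p.sum()
c = W @ p / 2**n                                        # expansion coefficients
print(np.allclose(W.T @ c, p))                          # exact reconstruction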