程序代写代做代考 flex algorithm decision tree ECE 657A: Classification – Lecture 7: k-NN, SVM and Kernal Methods

ECE 657A: Classification – Lecture 7: k-NN, SVM and Kernal Methods

ECE 657A: Classification
Lecture 7: k-NN, SVM and Kernal Methods

Mark Crowley

February 15, 2017

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 1 / 22

Class Admin Announcements

Today’s Class

Announcements

K-nearest neighbor classification

Kernel Methods

Support Vector Machines

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 2 / 22

Class Admin Announcements

Announcements

Assignment 1 complete!

Next week: no classes

Project Proposal due Friday.

TA: Rasoul Mohammadi Nasiri – r26mohammadinasiri@uwaterloo.ca –
E5 5013

New TA: Atinderdeep Saini – as3saini@uwaterloo.ca

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 3 / 22

Similarity Based Classifiers

Outline of

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 4 / 22

Similarity Based Classifiers

Parametric vs Non-parametric Models

Analysing some data, produce a probability model

Supervised: p(y |x, θ)
Unsupervised: p(x, θ)

Number of parameters θ of the model

Parametric: fix in size, regardless of the amount of data N (µ, σ)
Usually fast.
Use strong assumptions, could be wrong.
Can use MLE or MAP.

Examples: linear regression, logistic regression, Naive Bayes, Simple
Neural Networks, k-means, HMMs

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 5 / 22

Similarity Based Classifiers

Parametric vs Non-parametric Models

Non-parametric: number of parameters grows with the amount of training
data.

More flexible.
Slower than parametric or even completely intractable
for large datasets

Examples: k-nearest neighbors, Support Vector Machines (SVMs),
Parzen Window/KDE, Decision Trees, Gaussian processes

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 6 / 22

Similarity Based Classifiers k-Nearest Neighbor Classifier

k-Nearest Neighbor (KNN) Classifier

A simple example of a supervised, non-parametric, classification model.
General Algorithm

For some test datapoint x define a set of K points in training set
nearest to x

Count how many members of each class are in the nearest neighbors
set

Return empirical fraction for each class as a probability

Optionally take highest probability as class label for x

p(y = c |x,D,K ) =
1

K


i

∈ NK (x,D)I(yi = c)

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 7 / 22

Similarity Based Classifiers k-Nearest Neighbor Classifier

Figure: KNN in 2D for K=3.

x1 :p(y = 1|x1,D,K = 3) = 2/3 (1)
x1 :p(y = 0|x1,D,K = 3) = 1/3 (2)
x2 :p(y = 1|x2,D,K = 3) = 3/3 = 1 (3)
x2 :p(y = 1|x2,D,K = 3) = 0/3 = 0 (4)

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 8 / 22

Similarity Based Classifiers k-Nearest Neighbor Classifier

Special case K=1

If K = 1 then we have 1-Nearest Neighbour.
Induces a Voronoi Diagram. All points in polygons have point as their
nearest point.

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 9 / 22

Similarity Based Classifiers k-Nearest Neighbor Classifier

KNN Example

−3 −2 −1 0 1 2 3

−2

−1

0

1

2

3

4

5

train p(y=1|data,K=10)

20 40 60 80 100

20

40

60

80

100

120

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

p(y=2|data,K=10)

20 40 60 80 100

20

40

60

80

100

120

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

−3 −2 −1 0 1 2 3
−2

−1

0

1

2

3

4

5

predicted label, K=10

c1

c2

c3

Figure: (a) 3-class dataset. (b) Prob of class 1 for K = 10. (c) Class 2. (d) MAP
estimate of class label.

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 10 / 22

Similarity Based Classifiers k-Nearest Neighbor Classifier

Comments on KNN

Non-parametric because the number of neighboring points used
depends on the data.

Biased towards classes with more samples.

Sensitive to noise in the data (eg. random neighbors).

Computationally expensive in large dimensions or large K

Could compute distance using only some dimensions (which ones?)
Remove redundant points from training set (eg. ones surrounded by
same class)
For low dimensions could use search trees to organize training points
into independent subsets.

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 11 / 22

Similarity Based Classifiers Parzen Window Density Estimation

Density Estimation

Estimate the density of points in a part of the dataset.

parametric approach: Could set location one or multiple
distributions to model clusters and fit parameters to the data.

non-parametric approach: Allocate a cluster for each training data
point and combines the distributions.

Parzen window density estimation
Kernal density estimation (KDE)

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 12 / 22

Similarity Based Classifiers Parzen Window Density Estimation

Parzen Window

p̂h(x|D) =
1

Nh

N∑
i=1

κ(
x− xi
h

)

where h is called the bandwidth (usually chosen by cross-validation on the
risk).
Kernal Function κ

κ(x) = N (x|xi , σ2I)

=
1


e−x

2/2 Gaussian kernal

or = I(|x | ≤ 1) Boxcar kernal

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 13 / 22

Similarity Based Classifiers Parzen Window Density Estimation

Parzen Window Example

−5 0 5 10
0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

unif, h=1.000

−5 0 5 10
0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

unif, h=2.000

−5 0 5 10
0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

gauss, h=1.000

−5 0 5 10
0

0.01

0.02

0.03

0.04

0.05

0.06

gauss, h=2.000

Figure: Boxcar kernal vs Gaussian kernal on h = 1 and h = 2

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 14 / 22

Similarity Based Classifiers Parzen Window Density Estimation

From KDE to KNN

We can define K -Nearest Neighbors using the Parzen window or KDE
approach

Instead of a single bandwidth h, define hi for each datapoint xi such
that the K points nearest to xi are in the box.

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 15 / 22

Kernel Methods Definition

Outline of

Class Admin
Announcements

Similarity Based Classifiers
k-Nearest Neighbor Classifier
Parzen Window Density Estimation

Kernel Methods
Definition

The Kernel Trick

Suport Vector Machines

Definitions

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 16 / 22

The Kernel Trick

Kernel Function

Until now we assumed each object/data point can be represented
with fixed-size feature vector xi ∈ RD for purposes of estimation,
classification.

What if the feature is variable length? (eg. text document, protein
geometry, evolutionary tree)

One solution is to define a similarity measure called a kernel
function that doesn’t require processing the features directly, maps
them to a new space of an arbitrary dimensionality.

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 17 / 22

The Kernel Trick

Kernel Function

κ(x, x′) ∈ R and ≤ 0

For x, x′ ∈ X some abstract space X .
Usually symmetric, so κ(x, x′) = κ(x′, x)

Mercer kernal or positive definite kernel has a number of useful
properties that allow extraction of features if the kernel is learned
directly

Requires that the Gram matrix is positive definite

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 18 / 22

The Kernel Trick

Some example kernels

Gaussian Kernel

κ(x, x′) = exp

(

1

2
(x− x′)TΣ−1(x− x′)

)
Radial Basis Function

κ(x, x′) = exp

(

||x− x′||2

2σ2

)
Cosine Similatiry

κ(xi, xi′) =
xi

Txi′

||xi||2||xi′ ||2
where xij is number of times word j occurs in document i .

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 19 / 22

The Kernel Trick

Linear Kernels

Use a linear function ψ(x) = x on the features.
Then we define a linear kernel

κ(x, x′) = ψ(x)Tψ(x′)xTx′

This assumes

Features are linearly seperable

Each feature is independently informative about the datapoint

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 20 / 22

Definitions

Definitions:

R: the set of real numbers
D: training data D = {x1, . . . , xn}
K : number of nearest neighbors to include in kNN algorithm

Nk(x,D): set of K nearest neighbors of point x
I(COND): Indicator function which equals 1 if and only if COND is true.

κ(x):

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 21 / 22

Definitions

Summary

K-nearest neighbor classification

Kernel Methods

Support Vector Machines

Mark Crowley ECE 657A : Lecture 7 February 15, 2017 22 / 22

Class Admin
Announcements

Similarity Based Classifiers
k-Nearest Neighbor Classifier
Parzen Window Density Estimation

Kernel Methods
Definition

The Kernel Trick
Suport Vector Machines
Definitions