
Kernel Method

Lingqiao Liu
University of Adelaide

Outline


• Why Kernel methods?

– Motivation

– Benefits

• Kernel Method

– Criteria for kernel functions

– Kernel SVM

– Commonly used kernels

• Kernelizing machine learning algorithms

– Simple practice

– Kernel k-means

– Kernel Regression

– Kernel PCA

Motivation


• Limitation of linear classifiers: a single linear decision boundary cannot separate
classes that are not linearly separable (e.g. concentric circles or an XOR-style pattern)

Feature transform


• Idea: transform the features into a higher-dimensional space in which the classes become linearly separable
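One standard illustration (an assumed example, not necessarily the one used in the demo): map a 2-D point into 3-D with

φ(x) = φ((x1, x2)) = (x1^2, x2^2, sqrt(2) x1 x2)

Two classes arranged as concentric circles, which no straight line can separate in 2-D, satisfy z1 + z2 = x1^2 + x2^2 = radius^2 after the mapping, so a plane separates them in the 3-D space.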

Demo


Feature transform

• Importance of the feature transform

– Mapping to a sufficiently high-dimensional feature space usually makes the data (approximately) linearly separable

• The transform function

– Can be a simple arithmetic operation

– Can be anything!



Issue

• The dimensionality of the mapped feature

– can be high, or even infinite

– so computing the mapped features explicitly can be costly or infeasible

– and it is more convenient to define the mapping implicitly


Kernel trick

• In many cases, we are only interested in the inner
product of the mapped features

• In that case, we can define the feature map implicitly

• Kernel function

– can be more concise than the feature map
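In symbols (writing the implicit feature map as φ), a kernel function computes the inner product of the mapped features directly from the original inputs:

K(x, z) = φ(x)^T φ(z)

so an algorithm that only ever needs such inner products never has to evaluate φ(x) explicitly.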


Polynomial kernel

• Definition (a common form; c ≥ 0 and the degree d are hyper-parameters):

K(x, z) = (x^T z + c)^d

• Can be expanded as (when d = 2 and c = 0):

K(x, z) = (x^T z)^2 = Σ_i Σ_j (x_i x_j)(z_i z_j)

i.e. the inner product of feature vectors containing all degree-2 monomials x_i x_j. Recall the explicit feature transform introduced earlier: the kernel computes the same inner product without ever forming the mapped vectors.


Gaussian RBF Kernel

• Definition (a common parameterisation, with bandwidth σ, or equivalently γ = 1/(2σ^2)):

K(x, z) = exp(-||x - z||^2 / (2σ^2))

– the corresponding feature map is infinite-dimensional, so this kernel is only usable through the kernel trick


Understand Kernel

• Intuitively, a kernel models the similarity between two feature
points

– this intuition is a useful guideline for designing kernels

– but not every similarity measure is a valid kernel function


Outline


• Why Kernel methods?

– Motivation

– Benefits

• Kernel Method

– Criteria for kernel functions

– Kernel SVM

– Commonly used kernels

• Kernelizing machine learning algorithms

– Simple practice

– Kernel k-means

– Kernel Regression

– Kernel PCA

Criterion for a kernel function

• By definition, check whether the function can be written as an inner
product of transformed features, K(x, z) = φ(x)^T φ(z), for some feature map φ

• Equivalently, check whether, for an arbitrary set of m samples, the
kernel matrix K with entries K_ij = K(x_i, x_j) is positive semi-definite


Criterion for a kernel function

• The definition above implies that the kernel matrix K must be positive semi-definite

• Definition: K is positive semi-definite if a^T K a ≥ 0 for every vector a in R^m

• We can extend this to an infinite number of samples
(a function space)
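A small numerical illustration of this criterion (a sketch; the RBF kernel, the sample size and the random data are assumptions made only for this example): build the kernel matrix for a set of samples and check that its eigenvalues are non-negative.

import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # Gaussian RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

# Any set of m samples should give a positive semi-definite kernel matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))           # m = 10 samples, 3 dimensions
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

# All eigenvalues of a valid kernel matrix are >= 0 (up to numerical error)
eigvals = np.linalg.eigvalsh(K)
print(np.all(eigvals >= -1e-10))       # expected: True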


Criterion for a kernel function

• Mercer condition (informally): a symmetric function K(x, z) is a valid kernel
if and only if ∫∫ K(x, z) g(x) g(z) dx dz ≥ 0 for every square-integrable
function g, the function-space analogue of positive semi-definiteness


Summary so far

• Kernel function

– Sometimes we are interested in the inner product of transformed
features

– We define a function to calculate the inner product between a
pair of features, called the kernel function

– Defining a kernel function is equivalent to (implicitly) defining a feature
transform

• Application?


Kernel SVM

• Linear SVM (Primal form)
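For reference, the usual soft-margin primal (standard notation; the exact form on the original slide may differ slightly) is

min over w, b, ξ of (1/2) ||w||^2 + C Σ_i ξ_i
subject to y_i (w^T x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0 for all i.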


Kernel SVM

• Dual form
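In standard notation, the dual is

max over α of Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i^T x_j)
subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.

The data enter only through the inner products x_i^T x_j, so replacing them with kernel values K(x_i, x_j) = φ(x_i)^T φ(x_j) gives the kernel SVM.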



Kernel SVM

• Decision function for kernel SVM
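In standard form (the notation follows the dual above) it can be written as

f(x) = sign( Σ_i α_i y_i K(x_i, x) + b )

where the sum effectively runs only over the support vectors, i.e. the training samples with α_i > 0.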


There is no need to access the exact values of the transformed features;
only their inner products, i.e. the kernel values, are required

Kernel SVM

• Decision boundary of a Kernel SVM


Lesson learned

• Kernel SVMs produce more flexible decision boundaries
and can therefore lead to better classification performance

• Choosing a good kernel is the key to success


Commonly used kernels

• Linear kernel: K(x, z) = x^T z

• Polynomial kernel: K(x, z) = (x^T z + c)^d

• Gaussian RBF kernel: K(x, z) = exp(-||x - z||^2 / (2σ^2))
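As a usage sketch (assuming scikit-learn is available; the toy dataset and parameter values are illustrative assumptions, not recommendations), these kernels map directly onto the kernel argument of a standard SVM implementation:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel, params in [("linear", {}),
                       ("poly", {"degree": 2, "coef0": 1.0}),
                       ("rbf", {"gamma": 1.0})]:
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X, y)
    # the poly/rbf kernels separate this data; the linear kernel cannot
    print(kernel, clf.score(X, y))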


Commonly used kernels

• Histogram-like features: the bag-of-words model


Commonly used kernels

• Histogram-like features: the bag-of-features model


Commonly used kernels for histogram-like features

• Hellinger kernel: K(x, z) = Σ_i sqrt(x_i z_i)

• Histogram intersection kernel: K(x, z) = Σ_i min(x_i, z_i)

• RBF Kernel


Lesson learned (from kernel SVM)

• Kernel SVMs produce more flexible decision boundaries
and can therefore lead to better classification performance

• Choosing a good kernel is the key to success

• We need to reformulate the original problem so that it
depends only on the inner products of transformed features


Outline


• Why Kernel methods?

– Motivation

– Benefits

• Kernel Method

– Criteria for kernel functions

– Kernel SVM

– Commonly used kernels

• Kernelizing machine learning algorithms

– Simple practice

– Kernel k-means

– Kernel Regression

– Kernel PCA

Kernelize Machine Learning Algorithm

• How to kernelize an ML algorithm

– Assume we already know the kernel function K(x, z) = φ(x)^T φ(z)

– Reformulate the calculation so that the algorithm depends only on
inner products of transformed features, i.e. on kernel values

• Revisit ML algorithms

– Kernel K-means

– Kernel regression

– Kernel PCA


Warm-up Practice

• Euclidean distance
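A worked version of this practice (expanding the squared distance between two transformed features):

||φ(x) - φ(z)||^2 = φ(x)^T φ(x) - 2 φ(x)^T φ(z) + φ(z)^T φ(z)
                  = K(x, x) - 2 K(x, z) + K(z, z)

so Euclidean distances in the transformed space can be computed from kernel values alone.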


K-means algorithm
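For reference (a brief recap of the standard algorithm; the kernelized version below reuses the same structure), Lloyd's k-means alternates two steps until the assignments stop changing:

– Assignment step: assign each sample to its nearest cluster centre

– Update step: recompute each centre as the mean of the samples assigned to it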


How to kernelize k-means

• Key step: assigning each sample to the nearest centre, i.e. calculating
the distance between a transformed feature and a centre

• Challenges

– We cannot explicitly access the transformed features φ(x_i)

– Consequently, we cannot explicitly access the cluster centres
(the means of the transformed features)

• How do we represent a centre?


How to kernelize k-means


Define the centre of a cluster as the mean of the transformed features currently
assigned to it; the distance computation can then be rewritten in kernel values,
as worked out below.

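A sketch of the kernelized distance (an assumed reconstruction of the derivation): write the centre of cluster C_k as the mean of its assigned transformed features, μ_k = (1/|C_k|) Σ_{i in C_k} φ(x_i). Expanding the squared distance then gives

||φ(x) - μ_k||^2 = K(x, x) - (2/|C_k|) Σ_{i in C_k} K(x, x_i) + (1/|C_k|^2) Σ_{i,j in C_k} K(x_i, x_j)

Every term is a kernel value, so assigning a sample to its nearest centre never needs φ or μ_k explicitly.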

Put them together: Kernel K-means

• We can combine the E step and the M step into a single step:

– Calculate the assignment of each sample to its nearest centre using the
kernelized distance above (see the sketch after this list)

– Iterate this step until the assignments no longer change
between iterations
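A minimal Python sketch of this procedure (a sketch, not a reference implementation; it assumes the kernel matrix K has already been computed and uses a random initial assignment):

import numpy as np

def kernel_kmeans(K, k, n_iters=100, seed=0):
    """Kernel k-means given an m x m kernel matrix K."""
    m = K.shape[0]
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=m)          # random initial assignment
    for _ in range(n_iters):
        # Squared distance of each point to each cluster centre in feature space:
        # K_ii - 2/|C| * sum_{j in C} K_ij + 1/|C|^2 * sum_{j,l in C} K_jl
        dist = np.zeros((m, k))
        for c in range(k):
            members = np.where(assign == c)[0]
            if len(members) == 0:
                dist[:, c] = np.inf
                continue
            Kc = K[np.ix_(members, members)]
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].mean(axis=1)
                          + Kc.mean())
        new_assign = dist.argmin(axis=1)
        if np.array_equal(new_assign, assign):   # stop when assignments no longer change
            break
        assign = new_assign
    return assign

Given data X, one would first form K (for example with an RBF kernel) and then call kernel_kmeans(K, k).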


Kernel Regression

• Linear regression problem: find w minimizing the squared training error,
e.g. min over w of Σ_i (w^T x_i - y_i)^2, possibly with a regularizer λ||w||^2

• However, after replacing x_i with φ(x_i), directly expanding this objective does
not reduce it to inner products between transformed features, so it cannot be
kernelized in this form


Represent w by training data

• The transformed training features span a linear subspace

• You can imagine it as a plane in the high-dimensional feature space

• We can decompose any vector in that space, in particular the parameter
vector w, into a component in the subspace and a component orthogonal
to the subspace


Kernel Regression

• Rewrite w as the sum of two components: w = Σ_i α_i φ(x_i) + w_perp


Σ_i α_i φ(x_i) is the component in the subspace. Because it lies in the subspace, we
can find combination weights α_i that reconstruct it from the
transformed training samples.

w_perp is the component orthogonal to the subspace, so w_perp^T φ(x_i) = 0 for every
training sample; it does not change the predictions on the training data and can be
set to zero.

Kernel Regression


So instead of solving for w, we solve for the combination weights α, and then use

f(x) = w^T φ(x) = Σ_i α_i φ(x_i)^T φ(x) = Σ_i α_i K(x_i, x)

to calculate the prediction.
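Assuming an L2-regularized (ridge) least-squares objective, the reformulated problem can be worked out as follows: substituting w = Σ_i α_i φ(x_i) turns the objective into

min over α of ||K α - y||^2 + λ α^T K α

whose solution is α = (K + λI)^{-1} y, where K is the kernel matrix on the training data and y the vector of targets. Predictions then use only kernel values, f(x) = Σ_i α_i K(x_i, x).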

Lesson Learned

• Represent the model parameters as a linear combination of the
training features, and learn the combination weights instead

• Representer theorem:
https://en.wikipedia.org/wiki/Representer_theorem


PCA and kernel PCA


• Basic concept

– The dimensionality reduction function is a mapping that takes the
original features to low-dimensional features

– In general, this mapping is learned from training data and then
applied to other data

– The parameters of the mapping in PCA (and KPCA):

• Mean vector

• Projection matrix P

Key steps in PCA


• Learning the parameters:

– Centralize the data (subtract the mean vector)

– Calculate the covariance matrix

– Calculate its eigenvectors and eigenvalues

– Sort the eigenvalues and their corresponding eigenvectors; keep the
top-k eigenvectors as the projection matrix P

• Applying the transform:

– Centralize all data with the mean vector calculated from the training data

– Apply the learned projection matrix to the centralized data
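A minimal numpy sketch of these steps (the helper names and the choice of k are illustrative assumptions):

import numpy as np

def pca_fit(X, k):
    """Learn the PCA parameters (mean vector and projection matrix P) from training data X (n x d)."""
    mean = X.mean(axis=0)                    # mean vector
    Xc = X - mean                            # centralize the data
    cov = Xc.T @ Xc / X.shape[0]             # covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # sort descending, keep the top-k
    P = eigvecs[:, order]                    # projection matrix (d x k)
    return mean, P

def pca_transform(X, mean, P):
    """Centralize with the training mean, then apply the learned projection."""
    return (X - mean) @ P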

Kernel PCA


• Main differences from PCA:

– The inner product between two samples is measured by the
kernel function

– Each sample is essentially represented by its transformed feature φ(x)

– We do not know the actual (explicit) representation of φ(x)

• Challenges

– Challenge 1: we cannot directly centralize the data (for training and testing)

– Challenge 2: we cannot directly calculate the covariance matrix

– Challenge 3: we cannot directly calculate the projection matrix P


Overcome Challenge 2


Note: in the kernel-matrix derivation sketched below, it is the transformed
features that are assumed not yet to be centred; the centring concerns the
transformed features, not the original features.
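A sketch of the standard argument (an assumed reconstruction; centring is ignored here and handled under Challenge 1): the covariance matrix in feature space is C = (1/m) Σ_i φ(x_i) φ(x_i)^T, and every eigenvector v of C lies in the span of the transformed training features, so it can be written as v = Σ_i α_i φ(x_i). Substituting this into C v = λ v and taking inner products with each φ(x_j) gives

K α = m λ α

so the coefficient vectors α are eigenvectors of the m × m kernel matrix, and the covariance matrix (Challenge 2) never has to be formed explicitly.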


Overcome Challenge 1 (at training stage)
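The standard resolution (a sketch): centring the transformed features only changes the kernel matrix, and the centred kernel matrix can be computed from the uncentred one as

Kc = K - 1_m K - K 1_m + 1_m K 1_m

where 1_m is the m × m matrix whose every entry is 1/m. Everything derived for Challenge 2 is then applied with Kc in place of K.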



Overcome Challenge 3 and Challenge 1 (at
testing time)


• Recall that the projection matrix P in PCA is obtained by stacking the
eigenvectors of the covariance matrix (sorted by their corresponding
eigenvalues in descending order)
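Concretely (a sketch using the notation above): since each feature-space eigenvector can be written as v = Σ_i α_i φ(x_i), projecting a new sample x onto v never needs φ explicitly,

v^T φ(x) = Σ_i α_i φ(x_i)^T φ(x) = Σ_i α_i K(x_i, x)

where the kernel values are centred with the training statistics, and each α is rescaled so that the corresponding v has unit norm.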


Putting together: KPCA

• Training

– Compute the kernel matrix on the training data and centralize it (as above)

– Calculate the eigenvectors and eigenvalues of the centralized kernel matrix
(obtaining the coefficient vectors)

– Sort the eigenvectors by their corresponding eigenvalues and keep the top
ones, the same as in PCA

• Testing

– Use the projection equation above to calculate the dimension-reduced
features of new samples
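A compact numpy sketch of the whole procedure (a sketch assuming an RBF kernel; the helper names are illustrative, and scikit-learn's sklearn.decomposition.KernelPCA provides a ready-made implementation):

import numpy as np

def rbf_kernel_matrix(A, B, gamma=1.0):
    # K_ij = exp(-gamma * ||a_i - b_j||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def kpca_fit(X, k, gamma=1.0):
    """Training: centre the kernel matrix and keep the top-k eigenvectors."""
    m = X.shape[0]
    K = rbf_kernel_matrix(X, X, gamma)
    one = np.full((m, m), 1.0 / m)
    Kc = K - one @ K - K @ one + one @ K @ one        # centred kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]             # keep the top-k eigenpairs
    # Rescale so the corresponding feature-space directions have unit norm
    alphas = eigvecs[:, order] / np.sqrt(np.maximum(eigvals[order], 1e-12))
    return K, alphas

def kpca_transform(Xnew, Xtrain, K, alphas, gamma=1.0):
    """Testing: project new samples using only kernel values against the training set."""
    m = Xtrain.shape[0]
    Knew = rbf_kernel_matrix(Xnew, Xtrain, gamma)
    one_m = np.full((m, m), 1.0 / m)
    one_n = np.full((Knew.shape[0], m), 1.0 / m)
    # centre the test kernel values with the training statistics
    Knew_c = Knew - one_n @ K - Knew @ one_m + one_n @ K @ one_m
    return Knew_c @ alphas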


Summary

• Kernel Method: theory

– Motivation

– Criterion for kernel functions.

– Use of Kernels: Kernel SVM

– Commonly used kernels

• Kernel Method: kernelizing machine learning algorithms

– Kernel k-means

– Kernel Regression

– Kernel PCA
