
https://xkcd.com/456/
Not the same kind of kernels …

Announcements


From the course reps
Education info session: FAQ on admin
After the break:
Week 7 Monday – Easter; Week 8 Monday – ANZAC Day: no lecture or tutorial; join another online tutorial

Basis functions recap
Non-parametric methods (Chap 2.5)
Dual representation (of linear models, Chap 6.1)
Constructing kernels (Chap 6.2)
Next time:
maximum-margin classifiers (Chap 7.1)

Recap: basis functions in linear models for classification
Linear w.r.t. input x
Linear w.r.t. basis φ(x)

Recap: basis functions in linear models for regression
Linear w.r.t. input x
Linear w.r.t. basis φ(x)
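To make the contrast concrete, this is the standard form (cf. Bishop Chap 3): the model is non-linear in the input x but linear in the parameters w.

y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}) \qquad \text{(regression)}

y(\mathbf{x}) = f\!\left( \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}) \right) \qquad \text{(classification, with activation } f\text{)}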

Basis functions CAN:
● Represent non-linear decision boundaries in the input space.
● Separate classes linearly (in the feature space) for classes that are not linearly separable in the input space (see the sketch after this list).
Basis functions CAN NOT:
● Remove class overlap that already exists in the input space.
● Adapt their own shape to data (but neural nets can, as "adaptive basis functions" – next lecture …).
● Adapt the number of features to data (this lecture and next lecture …).
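A minimal sketch of the second CAN point, on made-up data: two classes on concentric circles are not linearly separable in (x1, x2), but the radial feature φ(x) = x1² + x2² separates them with a simple threshold, i.e. a linear boundary in feature space.

import numpy as np

rng = np.random.default_rng(0)

# Two classes on concentric circles (radii 1 and 3), plus a little noise.
n = 100
theta = rng.uniform(0, 2 * np.pi, size=n)
r = np.where(np.arange(n) < n // 2, 1.0, 3.0)
X = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
X += 0.1 * rng.standard_normal(X.shape)
y = (np.arange(n) >= n // 2).astype(int)

# Radial basis feature: phi(x) = x1^2 + x2^2. In feature space the classes
# separate with a threshold (any value between 1^2 and 3^2 works).
phi = (X ** 2).sum(axis=1)
accuracy = ((phi > 4.0) == y).mean()
print(f"accuracy with radial feature: {accuracy:.2f}")  # ~1.00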

Where are we going? Two key ideas
● Instead of "summarising" the training data into a set of weights (with fixed length), why not ask the features to adapt to data?
Similarity: each training pair (input, target) tells us something about the possible targets in the neighbourhood of the input.
Continuity: mostly, targets don't change abruptly.
Kernels formalise these ideas.
● Nonparametric methods: do not rely on a fixed number of parameters, but rather usually on storing the entire training set (various loose definitions are used here).

How do we get there?

Density estimation
Observe data points {x_n}, n = 1, …, N, drawn i.i.d. from p(x)
Goal: estimate p(x) from data
Histograms:
Partition the space into bins of width Δ
Count the number of points falling in each bin
Normalise to obtain a density (formula below)
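Written out (cf. Bishop eq. (2.241)): with n_i points falling in bin i of width Δ_i, the histogram density estimate is

p_i = \frac{n_i}{N \Delta_i}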

Histograms as density estimators
● Depend on the bin width Δ.
● Each data point is used once. Applicable to sequentially arriving data.
● Data points can be thrown away once the histogram is computed – fixed storage cost given the bin width.
● Good for quickly visualising densities in 1 or 2 dimensions.
● Have discontinuities due to bin edges.
● Do not work well in >2 dimensions – curse of dimensionality.
Implicit assumption: a distance measure defining some local neighbourhood.
Two methods that build on this idea: kernel density estimators and nearest-neighbour methods.

Non-parametric density estimation
N data points drawn i.i.d. from an unknown distribution p(x) in R^D
Probability mass in a small region R
Probability that K of the N points fall within R: binomial
For large N:
● Volume V of R sufficiently small, s.t. p(x) is approximately constant in R
● Volume V of R sufficiently large, s.t. K is large and the binomial is sharply peaked
Together these give p(x) ≃ K/(NV) (derivation below).
→ Fix K – K-nearest neighbours
→ Fix V – kernel density estimation
It can be shown that both the K-nearest-neighbour density estimator and the kernel density estimator converge to the true probability density in the limit N → ∞ provided V shrinks suitably with N , and K grows with N (Duda and Hart, 1973).
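Putting the pieces together (cf. Bishop Chap 2.5):

P = \int_R p(\mathbf{x})\, \mathrm{d}\mathbf{x}, \qquad
K \simeq N P \quad (\text{large } N), \qquad
P \simeq p(\mathbf{x})\, V \quad (\text{small } R)

\Rightarrow \quad p(\mathbf{x}) \simeq \frac{K}{N V}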

Kernel density estimation (KDE), a.k.a. Parzen windows
Hypercube kernel: count the points falling inside a hypercube of side h centred on x (written out below).
Still discontinuous at the cube boundaries – so far we haven't improved on histograms.
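For reference, the hypercube (Parzen window) kernel and the resulting estimate, as in Bishop Chap 2.5:

k(\mathbf{u}) =
\begin{cases}
1, & |u_i| \le 1/2, \quad i = 1, \dots, D, \\
0, & \text{otherwise},
\end{cases}
\qquad
p(\mathbf{x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^D}\, k\!\left( \frac{\mathbf{x} - \mathbf{x}_n}{h} \right)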

Kernel density estimation with Gaussian kernels

Kernel density estimation in general
Choose any kernel k(u) s.t. k(u) ≥ 0 and ∫ k(u) du = 1; the resulting estimate is a valid density.
Pro: the training data are simply stored; no computation is needed for "training".
Con: the entire training set must be stored, and evaluating the density is expensive (sketch below).
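A minimal numpy sketch of KDE with a Gaussian kernel; the bandwidth h is a free parameter that must be chosen, and the value used below is arbitrary.

import numpy as np

def gaussian_kde(x_query, X_train, h):
    """Evaluate the Gaussian-kernel density estimate at the query points.

    x_query: (M, D) points where the density is evaluated
    X_train: (N, D) stored training points
    h:       kernel bandwidth
    """
    D = x_query.shape[1]
    N = X_train.shape[0]
    # Squared distances between every query point and every training point.
    sq_dists = ((x_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    # Sum of Gaussian kernels, one centred on each stored training point.
    norm = (2 * np.pi * h ** 2) ** (D / 2)
    return np.exp(-sq_dists / (2 * h ** 2)).sum(axis=1) / (N * norm)

# Usage: estimate a 1-D density from 200 samples of a standard normal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
xs = np.linspace(-3, 3, 7)[:, None]
print(gaussian_kde(xs, X, h=0.3))   # should roughly track N(0, 1)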

Nearest neighbour methods
Drawback of KDE: fixed V (not adapting to data), fixed k(u), having to choose h (the 1-d kernel width).
Alternative: fix K, use data to find V.
Consider a small sphere centred on x, and then allow the radius to increase until the sphere contains exactly K data points. Set V to the volume of the resulting sphere (formula below).
! Does not produce a true density estimate (its integral over all space diverges).
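In formula form: if r_K(x) is the distance from x to its K-th nearest training point, V is the volume of the D-dimensional ball with that radius, giving

p(\mathbf{x}) \simeq \frac{K}{N V}, \qquad
V = \frac{\pi^{D/2}}{\Gamma(D/2 + 1)}\, r_K(\mathbf{x})^D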

Nearest neighbour for classification
To classify a new point x, draw a sphere centred on x containing precisely K points irrespective of their class, and assign the majority class among them (sketch below).
K = 1: the nearest-neighbour rule
In the limit N → ∞, the error rate of 1-NN is never more than twice the minimum achievable error rate of an optimal classifier, i.e., one that uses the true class distributions (Cover and Hart, 1967).
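A minimal numpy sketch of the K-NN classification rule described above (brute-force Euclidean distances; the np.bincount tie-break towards the smallest label is an arbitrary choice).

import numpy as np

def knn_predict(x_query, X_train, y_train, K):
    """Classify each query point by majority vote among its K nearest
    training points (Euclidean distance, brute force)."""
    sq_dists = ((x_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(sq_dists, axis=1)[:, :K]   # indices of K neighbours
    votes = y_train[nearest]                        # (M, K) class labels
    return np.array([np.bincount(v).argmax() for v in votes])

# Usage with toy data: two 2-D Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_predict(np.array([[0.0, 0.0], [4.0, 4.0]]), X, y, K=5))  # [0 1]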

Basis functions recap
Non-parametric methods (Chap 2.5)
Dual representation (of linear models, Chap 6.1)
Constructing kernels (Chap 6.2)

The role of training data

vs Features

Kernel methods in one slide

Optimal regularised least squares (cf. Bishop eq. (3.27)): the minimising weight vector is a linear combination of the training feature vectors φ(x_n).
The prediction function for a new x then involves the training inputs only through inner products of feature vectors – this leads to kernels.

Dual representation
Rewrite the regularised least-squares problem in terms of a coefficient vector a with one entry per training point, instead of the weight vector w.
Solving for a, the prediction function for a new x depends on the data only through the kernel function (derivation sketched below).
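Sketch of the derivation (cf. Bishop eqs. (6.2)–(6.9)). Setting the gradient of the regularised objective to zero shows w is a linear combination of the training feature vectors:

J(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) - t_n \right)^2 + \frac{\lambda}{2}\, \mathbf{w}^\top \mathbf{w}
\quad\Rightarrow\quad
\mathbf{w} = \boldsymbol{\Phi}^\top \mathbf{a}

With the Gram matrix K = ΦΦᵀ, i.e. K_nm = φ(x_n)ᵀφ(x_m) = k(x_n, x_m), substituting back gives

\mathbf{a} = (\mathbf{K} + \lambda \mathbf{I}_N)^{-1} \mathbf{t}, \qquad
y(\mathbf{x}) = \mathbf{k}(\mathbf{x})^\top (\mathbf{K} + \lambda \mathbf{I}_N)^{-1} \mathbf{t}, \quad
k_n(\mathbf{x}) = k(\mathbf{x}_n, \mathbf{x})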

Basis functions recap
Non-parametric methods (Chap 2.5)
Dual representation (of linear models, Chap 6.1)
Constructing kernels (Chap 6.2)

The kernel function: k(x, x′) = φ(x)ᵀφ(x′)

Gaussian kernel / Radial Basis Function (RBF)
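In symbols (cf. Bishop eq. (6.23)):

k(\mathbf{x}, \mathbf{x}') = \exp\!\left( -\frac{\| \mathbf{x} - \mathbf{x}' \|^2}{2\sigma^2} \right)

This is a valid kernel whose implicit feature space is infinite-dimensional.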

Kernels over graphs, strings, sets

Kernels from probabilistic generative models
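Two standard constructions (cf. Bishop eqs. (6.28)–(6.33)): given a generative model, take

k(\mathbf{x}, \mathbf{x}') = p(\mathbf{x})\, p(\mathbf{x}'), \qquad
k(\mathbf{x}, \mathbf{x}') = \sum_i p(\mathbf{x} \mid i)\, p(\mathbf{x}' \mid i)\, p(i),

or the Fisher kernel k(x, x′) = g(θ, x)ᵀ F⁻¹ g(θ, x′), where g(θ, x) = ∇_θ ln p(x | θ) is the Fisher score and F the Fisher information matrix.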

Kernels for regression and classification:
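Putting the dual representation and the RBF kernel together: a minimal kernel ridge regression sketch on made-up data (the bandwidth σ and regulariser λ below are arbitrary choices, not tuned values).

import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gram matrix of the Gaussian/RBF kernel between rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

# Toy 1-D regression data: a noisy sine.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, (30, 1))
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

# Dual solution: a = (K + lambda I)^-1 t, prediction y(x) = k(x)^T a.
lam = 0.1
K = rbf_kernel(X, X)
a = np.linalg.solve(K + lam * np.eye(len(X)), t)

X_new = np.array([[0.0], [np.pi / 2], [np.pi]])
y_new = rbf_kernel(X_new, X) @ a
print(y_new)   # roughly sin(0), sin(pi/2), sin(pi) = 0, 1, 0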

Basis functions recap
Non-parametric methods (Chap 2.5)
Dual representation (of linear models, Chap 6.1)
Constructing kernels (Chap 6.2)
