
https://xkcd.com/1381/
Not the same kind of margin either … 🙂

Announcements


Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation (cultural exposure)
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression
The kernel trick
More exposure to optimisation (Lagrange multipliers, KKT conditions, transforming a problem … )

From kernels to sparse kernel machines

Separating two classes with a linear model
Assume: the training dataset is linearly separable, i.e. there exist w and b such that
y(x) = wᵀ𝝓(x) + b satisfies tn y(xn) > 0 for every training point (xn, tn).

The multiple separator problem

Maximum margin classifier
Figure incomplete in pdf version of the book.

Flashback: Discriminant for two classes
Q: The target encoding is changed to {-1, +1} from the one-of-K encoding used in Chapter 4. Does this change the meaning and value of the separating hyperplane, i.e. are the two classes still separated by y(x)=0? Why or why not?

Maximum margin solution: objective
Separating hyperplane: y(x) = 0
Distance of a correctly classified point xn from the separating hyperplane y(x)=0: tn y(xn) / ||w||
Solve the following to obtain the maximum margin solution:
arg maxw,b { (1/||w||) minn [ tn (wᵀ𝝓(xn) + b) ] }
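
A minimal numpy sketch of the quantity above (not from the slides; the toy data and the candidate (w, b) are made up for illustration): the signed distance of each point is tn y(xn)/||w||, and the margin of a given separator is the minimum over the training set.

```python
import numpy as np

# Toy linearly separable data (illustrative values only)
X = np.array([[2.0, 2.0], [3.0, 3.5], [-1.0, -1.5], [-2.0, -0.5]])
t = np.array([+1, +1, -1, -1])            # targets encoded as {-1, +1}

w = np.array([1.0, 1.0])                   # a candidate separating hyperplane
b = -0.5

y = X @ w + b                              # y(x_n) = w^T x_n + b (identity features)
assert np.all(t * y > 0), "this (w, b) does not separate the data"

distances = t * y / np.linalg.norm(w)      # t_n y(x_n) / ||w||
margin = distances.min()                   # margin = distance of the closest point
print(distances, margin)
```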

Normalising the margin
Rescaling w → 𝜅w and b → 𝜅b doesn’t change the distance of any data point {xn, tn} from the decision boundary: tn y(xn) / ||w|| is unchanged.
For the points closest to the decision boundary, set: tn (wᵀ𝝓(xn) + b) = 1
This implies that all data points lie “on or outside the margin”: tn (wᵀ𝝓(xn) + b) >= 1

Equivalent objective function
For the points closest to the decision boundary, set tn y(xn) = 1.
This implies that all data points lie “on or outside the margin”: tn y(xn) >= 1, and there is always at least one active constraint, where tn y(xn) = 1.
Maximising the margin 1/||w|| is then equivalent to minimising (1/2)||w||² subject to these constraints.
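
A small numpy check of the two claims on this slide, continuing the toy (w, b) from the sketch above (all values illustrative): rescaling (w, b) by 𝜅 leaves every distance unchanged, and once we rescale so that the closest point has tn y(xn) = 1, maximising the margin is the same as minimising ||w||².

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.5], [-1.0, -1.5], [-2.0, -0.5]])
t = np.array([+1, +1, -1, -1])
w, b = np.array([1.0, 1.0]), -0.5

def distances(w, b):
    return t * (X @ w + b) / np.linalg.norm(w)

# 1) Rescaling w -> kw, b -> kb leaves all distances unchanged
kappa = 3.7
assert np.allclose(distances(w, b), distances(kappa * w, kappa * b))

# 2) Canonical normalisation: rescale so the closest point has t_n y(x_n) = 1
scale = 1.0 / (t * (X @ w + b)).min()
w_c, b_c = scale * w, scale * b
print((t * (X @ w_c + b_c)).min())        # = 1: all points satisfy t_n y(x_n) >= 1
print(1.0 / np.linalg.norm(w_c))          # margin = 1/||w_c||, so maximise it by minimising ||w_c||^2
```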

Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression

Flashback: Lagrange multipliers (Appendix E)
The first encounter in SML – we’ll see it again in kernel methods.
Objective function: maximize f(x), subject to the equality constraint g(x) = 0.
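
A hedged worked example of the equality-constrained case, in the spirit of the textbook's example: maximise f(x1, x2) = 1 − x1² − x2² subject to x1 + x2 − 1 = 0, solved here with sympy by finding a stationary point of the Lagrangian L = f + 𝜆g.

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)

f = 1 - x1**2 - x2**2          # objective to maximise
g = x1 + x2 - 1                # equality constraint g(x) = 0

L = f + lam * g                # Lagrangian L(x, lam) = f(x) + lam * g(x)

# Stationarity w.r.t. x1, x2 and lam (the last equation recovers the constraint)
sols = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam], dict=True)
print(sols)                    # [{x1: 1/2, x2: 1/2, lam: 1}]
```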

Lagrange multipliers with inequality constraints
Objective function: maximize f(x), subject to the inequality constraint g(x) >= 0.
Constraint inactive: g(x) > 0 → the stationary point of f(x) alone, i.e. 𝜆 = 0.
Constraint active: g(x) = 0 → stationary point of f(x) + 𝜆g(x) with 𝜆 > 0.
Both cases satisfy 𝜆 g(x) = 0, giving the Karush-Kuhn-Tucker (KKT) conditions:
g(x) >= 0, 𝜆 >= 0, 𝜆 g(x) = 0
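
A small numerical illustration (not from the slides): solve max f(x) = 1 − x1² − x2² subject to g(x) = x1 + x2 − 1 >= 0 with scipy, observe that the constraint is active at the optimum, and recover the multiplier from the stationarity condition to check 𝜆 g(x) = 0.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: -(1 - x[0]**2 - x[1]**2)        # negate: scipy minimises
g = lambda x: x[0] + x[1] - 1                  # inequality constraint g(x) >= 0

res = minimize(f, x0=[1.0, 0.0], method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': g}])

x = res.x
print(x, g(x))                                 # x ≈ [0.5, 0.5], g(x) ≈ 0: constraint active

# Recover the multiplier from stationarity: grad f(x) + lam * grad g(x) = 0
# grad f = (-2 x1, -2 x2), grad g = (1, 1)  =>  lam = 2 * x1 ≈ 1 > 0
lam = 2 * x[0]
print(lam, lam * g(x))                         # lam > 0 and lam * g(x) ≈ 0 (complementary slackness)
```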

The KKT conditions were originally named after Harold W. Kuhn and Albert W. Tucker, who first published the conditions in 1951. Later scholars discovered that the necessary conditions for this problem had been stated by William Karush in his master’s thesis in 1939.
https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions

The max-margin problem and its Lagrangian (primal)
The Lagrangian: L(w, b, a) = (1/2)||w||² − 𝝨n an { tn (wᵀ𝝓(xn) + b) − 1 }, with one an >= 0 for each constraint in (7.5).
Set its derivatives w.r.t. w and b to 0.

The dual representation
Setting the derivatives of L w.r.t. w and b to 0 gives
w = 𝝨n an tn 𝝓(xn)   (7.8)   and   𝝨n an tn = 0   (7.9)
Substitute w with (7.8) and use (7.9) – see notes – to obtain the dual: maximise
L̃(a) = 𝝨n an − (1/2) 𝝨n 𝝨m an am tn tm k(xn, xm), subject to an >= 0 and 𝝨n an tn = 0.

Quadratic program: optimise a quadratic function of a subject to a set of linear inequality constraints.
Assume we have the solution for a; now solve for b.
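
A hedged sketch of the whole pipeline using scikit-learn (assumed available): a linear SVC with a very large C approximates the hard-margin solution; its dual_coef_ stores an tn for the support vectors, from which w = 𝝨n an tn xn and b can be recovered. The toy data is illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative)
X = np.array([[2.0, 2.0], [3.0, 3.5], [-1.0, -1.5], [-2.0, -0.5]])
t = np.array([+1, +1, -1, -1])

# Very large C ~ hard margin (essentially no slack tolerated)
svm = SVC(kernel='linear', C=1e6).fit(X, t)

a_t = svm.dual_coef_[0]              # a_n * t_n, one entry per support vector
sv = svm.support_vectors_            # the support vectors x_n with a_n > 0

w = a_t @ sv                         # w = sum_n a_n t_n x_n   (7.8)
# b from an active constraint t_n y(x_n) = 1, averaged over the support vectors
t_sv = t[svm.support_]
b = np.mean(t_sv - sv @ w)

print(w, b, svm.coef_[0], svm.intercept_[0])   # the two recoveries should agree
print((t * (X @ w + b)).min())                 # ≈ 1: the closest points sit on the margin
```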

KKT conditions → support vectors
KKT conditions: an >= 0, tn y(xn) − 1 >= 0, an { tn y(xn) − 1 } = 0.
So for every point either an = 0 or tn y(xn) = 1; the points with an > 0 are the support vectors.
Make predictions: y(x) = 𝝨n an tn k(x, xn) + b, a sum over the support vectors only.

Geometric insight for sparsity

Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression

What happens if the classes overlap?
Allow some data points to be on the ’wrong side’ of the decision boundary.
Increase a penalty with distance from the decision boundary.
First, replace the constraints with: tn y(xn) >= 1 − 𝝃n, with slack variables 𝝃n >= 0.

Soft-margin optimisation, compared with “hard margins”: minimise C 𝝨n 𝝃n + (1/2)||w||², subject to tn y(xn) >= 1 − 𝝃n and 𝝃n >= 0.
𝝃n > 1 if the point is mis-classified. Therefore 𝝨n 𝝃n is an upper bound on the number of misclassified points.
C controls the trade-off between the slack variable penalty and the margin: the objective tries to minimise “total slack” across all training points while keeping the margin large.
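
A small scikit-learn sketch of this trade-off (toy overlapping data, values illustrative): a small C tolerates more slack and keeps more support vectors with a wide margin, while a large C pushes towards hard-margin behaviour.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs (illustrative)
X = np.vstack([rng.normal(loc=-1.0, scale=1.2, size=(50, 2)),
               rng.normal(loc=+1.0, scale=1.2, size=(50, 2))])
t = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    svm = SVC(kernel='linear', C=C).fit(X, t)
    margin = 1.0 / np.linalg.norm(svm.coef_[0])
    n_sv = len(svm.support_)
    err = np.mean(svm.predict(X) != t)
    # Small C: wide margin, many support vectors; large C: narrower margin, fewer slack violations
    print(f"C={C:7.2f}  margin={margin:.3f}  support vectors={n_sv}  training error={err:.2f}")
```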

KKT conditions for the soft-margin optimisation
What changed, if any, in its solution?

The dual problem of the soft margin
Set derivatives of L to zero to eliminate w, b, 𝝃n, 𝜇n.
The dual has the same form as before, but now with box constraints 0 <= an <= C.
Prediction function unchanged.
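
A quick scikit-learn check of the box constraints (same kind of toy overlapping data as above; all values illustrative): dual_coef_ holds an tn, so |dual_coef_| should never exceed C.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.2, size=(50, 2)),
               rng.normal(+1.0, 1.2, size=(50, 2))])
t = np.array([-1] * 50 + [+1] * 50)

C = 1.0
svm = SVC(kernel='linear', C=C).fit(X, t)

a = np.abs(svm.dual_coef_[0])        # a_n = |a_n t_n| for the support vectors
assert np.all(a <= C + 1e-8)         # box constraint 0 <= a_n <= C
# Points with a_n = C are the ones that violate the hard margin, i.e. have slack xi_n > 0
print(np.sum(np.isclose(a, C)), "support vectors at the upper bound a_n = C")
```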

Limitations of support vector machines

Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression

Re-writing the SVM objective with hinge loss
When yn tn >= 1, 𝝃n = 0; otherwise, 𝝃n = 1 − yn tn.
So 𝝃n = max(0, 1 − yn tn), the hinge loss on the margin yn tn.
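
A small numpy comparison (illustrative, anticipating the “Loss functions” slide that follows) of three losses as functions of the margin z = t·y(x): the SVM hinge loss max(0, 1 − z), the logistic loss rescaled to pass through (0, 1), and the 0-1 misclassification loss.

```python
import numpy as np

z = np.linspace(-2, 2, 9)                           # margin values z = t * y(x)

hinge = np.maximum(0.0, 1.0 - z)                    # SVM hinge loss
logistic = np.log(1.0 + np.exp(-z)) / np.log(2.0)   # logistic loss, rescaled so it passes through (0, 1)
zero_one = (z <= 0).astype(float)                   # 0-1 misclassification loss

for zi, h, l, e in zip(z, hinge, logistic, zero_one):
    print(f"z={zi:+.1f}  hinge={h:.2f}  logistic={l:.2f}  0-1={e:.0f}")
```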

Loss functions

𝜀-insensitive error

SVM regression with two sets of slack variables

Prediction
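
A hedged scikit-learn sketch of 𝜀-insensitive regression (toy 1-D data, values illustrative): points strictly inside the 𝜀-tube incur no loss and get zero dual coefficients in the exact solution, so the prediction depends only on the points on or outside the tube.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(-3, 3, size=(40, 1)), axis=0)
t = np.sin(X).ravel() + rng.normal(0, 0.1, size=40)   # noisy 1-D targets (illustrative)

eps = 0.2
svr = SVR(kernel='rbf', C=10.0, epsilon=eps).fit(X, t)

resid = np.abs(t - svr.predict(X))
# Only points on or outside the eps-tube carry nonzero dual coefficients,
# so they are the only support vectors (up to solver tolerance).
print("support vectors:", len(svr.support_), "of", len(X))
print("points with |t - y(x)| >= eps:", np.sum(resid >= eps - 1e-3))
```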

Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression
