https://xkcd.com/1381/
Not the same kind of margin either … 🙂
Announcements
Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation (cultural exposure)
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression
The kernel trick.
More exposure to optimisation (Lagrange multipliers, KKT conditions, transforming a problem … )
From kernels to sparse kernel machines.
Separating two classes with a linear model
Assume: the training dataset is linearly separable, i.e. there exist w and b such that
tn y(xn) > 0 for all n,  where y(x) = wᵀφ(x) + b.
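A minimal sketch in code (a made-up toy dataset, with the identity feature map φ(x) = x) of what this separability condition means:

import numpy as np

# Toy 2-D data with labels tn in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
t = np.array([+1, +1, -1, -1])

# A candidate linear discriminant y(x) = w^T x + b  (phi(x) = x here).
w = np.array([1.0, 1.0])
b = 0.0

y = X @ w + b                # y(xn) for every training point
print(np.all(t * y > 0))     # True => this (w, b) separates the two classes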
The multiple separator problem
Maximum margin classifier
Figure incomplete in pdf version of the book.
Flashback: Discriminant for two classes
Q: The target encoding is changed to {-1, +1} from the one-of-K encoding used in Chapter 4. Does this change the meaning or the value of the separating hyperplane, i.e. are the two classes still separated by y(x) = 0? Why or why not?
Maximum margin solution: objective
Separating hyperplane: y(x) = 0
Distance of a point x from the separating hyperplane y(x) = 0: |y(x)| / ‖w‖; for a correctly classified point xn this equals tn y(xn) / ‖w‖.
Solve the following to obtain the maximum margin solution:
arg max_{w, b} { (1/‖w‖) min_n [ tn y(xn) ] }     (7.3)
Normalising the margin
Rescaling w → κw, b → κb doesn’t change the distance of any data point {xn, tn} from the boundary: tn y(xn) / ‖w‖ is unchanged.
For the point(s) closest to the decision boundary, set tn y(xn) = tn (wᵀφ(xn) + b) = 1.
This implies that all data points lie “on or outside the margin”:
tn (wᵀφ(xn) + b) ≥ 1,  n = 1, …, N.     (7.5)
Equivalent objective function
Maximising the margin is then equivalent to minimising (1/2)‖w‖², subject to the constraints (7.5).     (7.6)
In the solution, there is at least one active constraint, where tn y(xn) = 1.
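A minimal numerical sketch of this quadratic program (made-up toy data; scipy’s general-purpose SLSQP solver stands in for a dedicated QP solver):

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data, labels tn in {-1, +1}; phi(x) = x.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
t = np.array([+1.0, +1.0, -1.0, -1.0])

# Pack the variables as z = [w1, w2, b] and minimise (1/2)||w||^2.
def objective(z):
    return 0.5 * np.dot(z[:2], z[:2])

# One constraint per data point: tn * (w^T xn + b) - 1 >= 0   (7.5)
constraints = [{'type': 'ineq',
                'fun': lambda z, x=x, tn=tn: tn * (np.dot(z[:2], x) + z[2]) - 1.0}
               for x, tn in zip(X, t)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, ", b =", b)
print("tn * y(xn):", t * (X @ w + b))   # active constraints sit at ~1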
Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression
Flashback: Lagrange multipliers (Appendix E)
The first encounter in SML – we’ll see it again in kernel methods.
Objective function: maximise f(x); equality constraint: subject to g(x) = 0.
Lagrangian: L(x, λ) = f(x) + λ g(x).
Lagrange multipliers with inequality constraints
Objective function: maximise f(x); inequality constraint: subject to g(x) >= 0.
Constraint inactive: g(x) > 0 → λ = 0, and the solution is a stationary point of f(x) alone.
Constraint active: g(x) = 0 → λ > 0, and the solution is a stationary point of the Lagrangian f(x) + λ g(x).
Karush-Kuhn-Tucker (KKT) conditions
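The conditions themselves, restated from Bishop Appendix E (the original formula image did not survive extraction): maximising f(x) subject to g(x) >= 0 via the Lagrangian L = f(x) + λ g(x) requires
g(x) ≥ 0,   λ ≥ 0,   λ g(x) = 0,
i.e. either the constraint is active (g(x) = 0) or its multiplier vanishes (λ = 0).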
The KKT conditions were originally named after Harold W. Kuhn and Albert W. Tucker, who first published the conditions in 1951. Later scholars discovered that the necessary conditions for this problem had been stated by William Karush in his master’s thesis in 1939.
https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions
The max-margin problem and its Lagrangian (primal)
Define the Lagrangian L(w, b, a), with one Lagrange multiplier an ≥ 0 for each constraint in (7.5).
Set its derivatives w.r.t. w and b to 0.
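For reference (equation numbers follow Bishop Chap 7; the formulas are restated here because the originals were images):
L(w, b, a) = (1/2)‖w‖² − 𝝨n an { tn (wᵀφ(xn) + b) − 1 },  with an ≥ 0     (7.7)
∂L/∂w = 0  →  w = 𝝨n an tn φ(xn)     (7.8)
∂L/∂b = 0  →  𝝨n an tn = 0     (7.9)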
The dual representation
Set the derivative of L w.r.t. w and b to 0
Substitute w with (7.8) and use (7.9) – see notes.
Quadratic program: optimize a quadratic function of a, subject to a set of linear inequality constraints.
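Written out (restating Bishop 7.10–7.12, since the formula image is missing), the dual is:
maximise  L̃(a) = 𝝨n an − (1/2) 𝝨n 𝝨m an am tn tm k(xn, xm)     (7.10)
subject to  an ≥ 0 for all n,     (7.11)
and  𝝨n an tn = 0,     (7.12)
where k(x, x′) = φ(x)ᵀφ(x′) is the kernel function.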
Assume we have solutions for a, now solve for b
KKT conditions → support vectors
Make predictions
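A minimal numpy sketch of these two steps, assuming the dual solution a (the array of an), the kernel k, and the training data (X, t) are already in hand; b follows Bishop (7.18) and the prediction follows (7.13):

import numpy as np

def linear_kernel(x, z):
    # k(x, z) = x^T z; swap in any other kernel here.
    return np.dot(x, z)

def svm_bias(a, t, X, kernel=linear_kernel, tol=1e-8):
    # Support vectors are the points with an > 0 (KKT: an [tn y(xn) - 1] = 0).
    sv = a > tol
    K = np.array([[kernel(xn, xm) for xm in X] for xn in X])
    # b = average over support vectors of ( tn - sum_m am tm k(xn, xm) ).
    return np.mean(t[sv] - K[sv] @ (a * t))

def svm_predict(x, a, t, X, b, kernel=linear_kernel):
    # y(x) = sum_n an tn k(x, xn) + b; classify by the sign of y(x).
    return np.sign(sum(an * tn * kernel(x, xn) for an, tn, xn in zip(a, t, X)) + b)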
KKT conditions
Geometric insight for sparsity
Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression
What happens if the classes overlap, i.e. cannot be separated with “hard margins”?
Allow some data points to be on the ’wrong side’ of the decision boundary.
Increase a penalty with distance from the decision boundary.
First, replace the hard-margin constraints with tn y(xn) ≥ 1 − 𝝃n, using slack variables 𝝃n ≥ 0.
𝝃n > 1 if the point is mis-classified. Therefore 𝝨n 𝝃n is an upper bound on the number of misclassified points.
Soft-margin optimisation
Minimise  C 𝝨n 𝝃n + (1/2)‖w‖²  subject to the constraints above.     (7.21)
C controls the trade-off between the slack-variable penalty and the margin: the objective tries to minimise the “total slack” across all training points.
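A quick illustrative sketch (scikit-learn’s SVC on a made-up pair of overlapping Gaussian blobs) of how C moves this trade-off; smaller C tolerates more slack and typically keeps more support vectors:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)),
               rng.normal(loc=+1.0, size=(50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel='linear', C=C).fit(X, t)
    print(f"C={C:7}: {int(clf.n_support_.sum()):3d} support vectors, "
          f"train accuracy {clf.score(X, t):.2f}")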
KKT conditions of the soft-margin optimisation
Q: What changed, if any, in its solution?
The dual problem of the soft margin
Set derivatives of L to zero to eliminate w, b, 𝝃n , 𝜇n
Prediction function unchanged
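For reference, the soft-margin dual (Bishop 7.32–7.33) keeps the same objective L̃(a) as (7.10); only the constraints change, becoming the “box constraints” 0 ≤ an ≤ C together with 𝝨n an tn = 0.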
Limitations of support vector machines
Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression
Re-writing the SVM objective with the hinge loss
When yn tn ≥ 1, 𝝃n = 0; otherwise, 𝝃n = 1 – yn tn.
So 𝝃n = max(0, 1 – yn tn), and the soft-margin objective can be written as 𝝨n ESV(yn tn) + λ‖w‖², where ESV(z) = max(0, 1 – z) is the hinge loss.
Loss functions
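A small numpy sketch of the loss curves usually compared on this slide, as functions of the margin z = t·y(x) (exactly which curves the original figure shows is an assumption):

import numpy as np

z = np.linspace(-2, 2, 401)                     # margin z = t * y(x)

hinge    = np.maximum(0.0, 1.0 - z)             # SVM (hinge) loss
logistic = np.log1p(np.exp(-z)) / np.log(2.0)   # logistic loss, rescaled to pass through (0, 1)
zero_one = (z <= 0).astype(float)               # misclassification (0-1) loss
squared  = (1.0 - z) ** 2                       # squared error loss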
𝜀-insensitive error
SVM regression with two sets of slack variables
Prediction
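A minimal sketch of the 𝜀-insensitive error and of fitting an SVM regressor with scikit-learn on made-up toy data (SVR’s C and epsilon parameters correspond to the penalty and the tube width above):

import numpy as np
from sklearn.svm import SVR

def eps_insensitive(y_pred, target, eps=0.1):
    # E_eps(y - t) = 0 inside the eps-tube, and grows linearly outside it.
    return np.maximum(0.0, np.abs(y_pred - target) - eps)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 60))
t = np.sin(x) + 0.1 * rng.normal(size=x.shape)

model = SVR(kernel='rbf', C=10.0, epsilon=0.1).fit(x[:, None], t)
print("support vectors:", len(model.support_))   # points on or outside the tube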
Lagrange multipliers, take 2
Maximum-margin classifiers (Chap 7.1)
● The intuition of margins
● Constructing the max-margin classifier
● The dual representation
● Support vectors and their geometric intuitions
SVM for overlapping class distributions
Relations to logistic regression
SVM for regression