ISML_3: Linear Classifier and (Linear) Support
Vector Machine
Lingqiao Liu
Outline
• Linear Classifier
• Support Vector Machine: Primal form
– Separable case
– Non-separable case
• Support Vector Machine: Dual form
– KKT condition
– Dual form of separable linear SVM
– Dual form of non-separable linear SVM
• Solving optimization problem with CVX
• Summary
Requirements
• Basic concepts of linear classifier
• Basic ideas and concepts in SVMs
• Hard-margin SVMs:
– formulation (but not derivation)
– How to calculate margin and find support vectors
• Soft-margin SVMs:
– Motivation
– Formulation
– Hinge loss formulation
• SVM dual
– How to derive
– Formulation
– Relationship to primal problem solutions
Outline
• Linear Classifier
• Support Vector Machine: Primal form
– Separable case
– Non-separable case
• Support Vector Machine: Dual form
– KKT condition
– Dual form of separable linear SVM
– Dual form of non-separable linear SVM
• Solving optimization problem with CVX
• Summary
Linear classifier
• Suppose each data sample can be represented as a vector (called a feature vector)
• In a linear classifier, classification is done by computing a linear combination of the feature values, f(x) = w^T x + b
– w and b are the parameters of the classifier
– For binary classification, we only need one set of parameters (w, b): if the decision value w^T x + b is larger than 0, the prediction is class 1, otherwise class -1
– For multi-class classification, we need to learn c sets of parameters (w_k, b_k), one for each class. The class that gives the highest decision value is the predicted class (see the sketch below)
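As a quick illustration, a minimal NumPy sketch of these decision rules (the weights and inputs are made-up values, not from the lecture):

import numpy as np

def binary_predict(w, b, x):
    # Binary linear classifier: class +1 if w^T x + b > 0, otherwise class -1
    return 1 if w @ x + b > 0 else -1

def multiclass_predict(W, b, x):
    # One-vs-rest multiclass: pick the class with the highest decision value
    scores = W @ x + b          # one score per class
    return int(np.argmax(scores))

w = np.array([0.5, -1.0]); b0 = 0.2
print(binary_predict(w, b0, np.array([1.0, 0.3])))      # +1

W = np.array([[0.5, -1.0], [1.2, 0.1], [-0.3, 0.8]])    # 3 classes, 2 features
b = np.array([0.2, -0.1, 0.0])
print(multiclass_predict(W, b, np.array([1.0, 0.3])))   # class index 1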
Geometrically
The linear decision boundary forms a hyperplane in the high-dimensional feature space
Multiclass classification
• Multiclass classification can
be converted into c binary
classification problems by
using the one-vs-rest scheme
– For the k-th problem, we use the k-th class as the positive class and all the other classes as the negative class
– We focus on binary
classification today
Outline
• Linear Classifier
• Support Vector Machine: Primal form
– Separable case
– Non-separable case
• Support Vector Machine: Dual form
– KKT condition
– Dual form of separable linear SVM
– Dual form of non-separable linear SVM
• Solving optimization problem with CVX
• Summary
Motivation
• To separate two classes, which hyperplane should we choose?
What makes a good separating hyperplane
• Training data is just a sampled subset of all possible data
• Suppose the hyperplane is close to one sample i
• If we see new samples close to sample i, they are likely to be wrongly placed on the other side of the hyperplane
– Assumption: similar samples have similar labels
• This leads to poor generalization
What makes a good separating hyperplane
• Hyperplane should be as far as possible from every
sample
• In this way, new samples close to old samples will be
classified correctly
• Good generalization
Support Vector Machine
• Idea: maximize the distance to the closest example =
margin
• This leads to an optimization problem
Properties of the optimal hyperplane
• Distance to the closest negative example = distance to
the closest positive example
– Why?
• Examples closest to the hyperplane are support vectors.
– Removing non-support vectors will not affect the optimal hyperplane
Optimization problem for finding the optimal hyperplane
• The primal optimization problem for the linear support vector machine (hard-margin SVM):
  min_{w,b} (1/2)||w||^2   s.t.  y_i(w^T x_i + b) ≥ 1,  i = 1, ..., n
• Decision rule: predict sign(w^T x + b)
Proof: (1) Related Math Conclusions
• Cosine of the angle between two vectors: cos θ = a^T b / (||a|| ||b||)
• Normal vector of a hyperplane: the hyperplane w^T x + b = 0 has normal vector w
Proof: (1) Related Math Conclusions
• Distance from a point to a hyperplane: d(x_0) = |w^T x_0 + b| / ||w|| (a quick numerical check is sketched below)
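A quick numerical check of the distance formula (w, b and x_0 are made-up values):

import numpy as np

w = np.array([3.0, 4.0])
b = -5.0
x0 = np.array([2.0, 1.0])

dist = abs(w @ x0 + b) / np.linalg.norm(w)   # |w^T x0 + b| / ||w||
print(dist)                                  # (6 + 4 - 5) / 5 = 1.0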
Proof: (2) Original optimization problem
SVM objective
• (1) maximize the distance to the closest example
• (2) positive class and negative class on each side of the hyperplane
This gives
  max_{w,b}  min_i |w^T x_i + b| / ||w||    ← the minimal distance between the hyperplane and the samples
  s.t.  y_i(w^T x_i + b) > 0 for all i      ← correct classification
Proof: (2) transformed optimization problem
Rescaling (w, b) by a positive constant does not change the hyperplane, so we can fix min_i y_i(w^T x_i + b) = 1; maximizing the margin then becomes maximizing 1/||w||, i.e., minimizing (1/2)||w||^2, subject to y_i(w^T x_i + b) ≥ 1.
Proof: (2) transformed optimization problem (continued)
Take a closer look at the SVM problem
• Margin: the distance from the hyperplane to the closest samples is 1/||w||, so the width between the two margin hyperplanes w^T x + b = ±1 is 2/||w||
• All the support vectors lie on the margin hyperplanes, i.e., satisfy y_i(w^T x_i + b) = 1 (see the sketch below)
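A minimal sketch of computing the margin and locating the support vectors, assuming made-up trained parameters and toy data:

import numpy as np

w = np.array([1.0, 1.0]); b = -3.0
X = np.array([[1.0, 1.0], [0.5, 1.0], [1.0, 0.5],    # class -1
              [2.0, 2.0], [3.0, 2.0], [2.0, 3.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

margin_width = 2.0 / np.linalg.norm(w)     # distance between the two margin hyperplanes
functional_margin = y * (X @ w + b)        # y_i (w^T x_i + b)
support_vectors = X[np.isclose(functional_margin, 1.0)]
print(margin_width)                        # 2 / sqrt(2) ≈ 1.414
print(support_vectors)                     # [[1. 1.] [2. 2.]]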
Non-separable case
What if the training set is not linearly separable?
In that case, not all
inequality constraints can
be satisfied
• Slack variables ξ_i ≥ 0 indicate how much a sample violates the inequality constraint: y_i(w^T x_i + b) ≥ 1 − ξ_i
• We also want to minimize the total violation, which gives the soft-margin SVM
  min_{w,b,ξ} (1/2)||w||^2 + C Σ_i ξ_i   s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0
• C is a hyper-parameter that controls how heavily violations are penalized (larger C allows less violation).
  – Question: how to choose C?
Slack variables and soft-margin SVM
Hinge loss view of soft-margin SVM
• The soft-margin SVM can also be viewed from the perspective of the hinge loss:
  min_{w,b} (1/2)||w||^2 + C Σ_i max(0, 1 − y_i(w^T x_i + b))
Hinge loss view of soft-margin SVM
• If y_i(w^T x_i + b) ≥ 1, which means the sample has been correctly classified with sufficient confidence (the decision value is not just > 0, but ≥ 1), then the loss incurred for this sample is 0
• This means that if the “confidence” is large enough, we do not pursue an even larger “confidence”.
• Instead, we focus on the samples that have not been confidently classified (see the sketch below).
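A minimal NumPy sketch of the hinge loss (the parameters and data are made up):

import numpy as np

def hinge_loss(w, b, X, y):
    # Average of max(0, 1 - y_i (w^T x_i + b)) over the samples
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins))

w = np.array([1.0, 1.0]); b = -3.0
X = np.array([[1.0, 1.0], [2.0, 2.0], [1.5, 1.5]])
y = np.array([-1, 1, 1])
print(hinge_loss(w, b, X, y))   # the first two samples incur zero loss, the third incurs 1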
Outline
• Linear Classifier
• Support Vector Machine: Primal form
– Separable case
– Non-separable case
• Support Vector Machine: Dual form
– KKT condition
– Dual form of separable linear SVM
– Dual form of non-separable linear SVM
• Solving optimization problem with CVX
• Summary
SVM dual problem
• Sometimes it is beneficial to convert the SVM optimization problem into another equivalent optimization problem, called the dual formulation (or dual problem) of the SVM
• By applying certain transformations, we can derive a dual problem for any optimization problem.
  – An important type of dual problem is the Lagrangian dual problem
  – If the original problem is convex, then there is an equivalence between the optimal solutions of the dual problem and the original problem. In other words, a convex optimization problem can be solved either in its original form (called the primal form) or in its dual form
Convex optimization
Convex function: f(λx + (1−λ)y) ≤ λ f(x) + (1−λ) f(y) for all x, y and all λ ∈ [0, 1]
Convex set: a set C such that for any x, y ∈ C and λ ∈ [0, 1], the point λx + (1−λ)y is also in C
An optimization problem min_x f(x) s.t. x ∈ C is a convex optimization problem if f is a convex function and the feasible set C is a convex set.
If f is a convex quadratic function and the constraint functions g are linear, the problem is convex.
Lagrangian dual problem
• Original problem (primal problem):  min_x f(x)  s.t.  g_i(x) ≤ 0,  i = 1, ..., m
• Define the Lagrangian function:  L(x, λ) = f(x) + Σ_i λ_i g_i(x),  with λ_i ≥ 0
• Lagrangian dual problem:  max_{λ ≥ 0}  min_x L(x, λ)
• Assume f and g_i are convex functions, e.g., linear or quadratic functions
Relationship between dual problem and
primal problem
• Notation: use x* and λ* to denote the optimal solutions of the primal problem and the dual problem, respectively
• Note that the inner problem min_x L(x, λ) is an unconstrained optimization problem, so at the optimum we should have ∇_x L(x*, λ*) = 0
• Complementary slackness: λ_i* g_i(x*) = 0 for every constraint i
• More information (if you want to know “why”) can be found in this tutorial:
  lagrangian_duality.pdf (stanford.edu)
https://cs.stanford.edu/people/davidknowles/lagrangian_duality.pdf
Summary: Karush–Kuhn–Tucker (KKT)
conditions
1. Stationarity:  ∇_x L(x*, λ*) = ∇f(x*) + Σ_i λ_i* ∇g_i(x*) = 0
2. Primal feasibility:  g_i(x*) ≤ 0
3. Dual feasibility:  λ_i* ≥ 0
4. Complementary slackness:  λ_i* g_i(x*) = 0
The relationship between the solutions of the primal and dual problems is specified by the KKT conditions. The dual problem can be derived by applying the KKT conditions and making some transformations.
Derive dual problem for soft-margin SVMs
1. Stationarity
Step 1: take the derivatives of the Lagrangian function with respect to the primal variables w, b, ξ and set them to zero (a sketch of this step follows below)
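A sketch of this step in standard notation, where α_i and μ_i are the dual variables for the constraints y_i(w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0:

\begin{aligned}
L(w, b, \xi, \alpha, \mu)
  &= \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i
     - \sum_i \alpha_i\big(y_i(w^\top x_i + b) - 1 + \xi_i\big)
     - \sum_i \mu_i \xi_i \\
\partial L / \partial w = 0 &\;\Rightarrow\; w = \sum_i \alpha_i y_i x_i \\
\partial L / \partial b = 0 &\;\Rightarrow\; \sum_i \alpha_i y_i = 0 \\
\partial L / \partial \xi_i = 0 &\;\Rightarrow\; C - \alpha_i - \mu_i = 0
\end{aligned}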
Derive dual problem for soft-margin SVMs
Step 2: express the primal variables in terms of the dual variables and substitute them into the Lagrangian function
Derive dual problem for soft-margin SVMs
Step 3: check the constraints implied by dual feasibility: the dual variables must be non-negative, i.e., α_i ≥ 0 and μ_i ≥ 0, which together with C − α_i − μ_i = 0 gives 0 ≤ α_i ≤ C. (The resulting dual problem is sketched below.)
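Putting steps 2 and 3 together, a sketch of the resulting dual problem in standard notation:

\begin{aligned}
\max_{\alpha}\;& \sum_i \alpha_i - \tfrac{1}{2}\sum_i\sum_j \alpha_i \alpha_j \, y_i y_j \, x_i^\top x_j \\
\text{s.t. }\;& 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0
\end{aligned}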
SVM dual problem
• Dual problem:  max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j   s.t.  0 ≤ α_i ≤ C,  Σ_i α_i y_i = 0
• Primal problem:  min_{w,b,ξ} (1/2)||w||^2 + C Σ_i ξ_i   s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0
Convert dual solutions to primal solutions
• From the stationarity condition, w = Σ_i α_i y_i x_i
• How about b?
  – By using complementary slackness
Convert dual solutions to primal solutions
• A closer look at the complementary slackness conditions α_i (y_i(w^T x_i + b) − 1 + ξ_i) = 0 and μ_i ξ_i = 0:
  – If α_i = 0, then μ_i = C > 0, so ξ_i = 0 and y_i(w^T x_i + b) ≥ 1
  – If 0 < α_i < C, then μ_i > 0, so ξ_i = 0 and y_i(w^T x_i + b) = 1
  – How about α_i = C? Then μ_i = 0, so ξ_i ≥ 0 and y_i(w^T x_i + b) ≤ 1
Points satisfying the last two conditions are called support vectors
How to solve b
• Find any sample i with 0 < α_i < C (so that y_i(w^T x_i + b) = 1), and solve for b by b = y_i − w^T x_i (see the sketch below)
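A minimal sketch, assuming the dual variables alpha have already been obtained by some solver, of recovering the primal solution (w, b):

import numpy as np

def primal_from_dual(alpha, X, y, C, tol=1e-6):
    # w = sum_i alpha_i y_i x_i
    w = (alpha * y) @ X
    # pick a support vector with 0 < alpha_i < C, where y_i (w^T x_i + b) = 1
    idx = np.where((alpha > tol) & (alpha < C - tol))[0][0]
    b = y[idx] - w @ X[idx]
    return w, b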
SVM dual problem
• Primal problem
– Number of variables is proportional to the number of feature
dimensions
– Number of constraints is proportional to the number of samples
• Dual problem
– Number of variables is proportional to the number of samples
• Solving SVM from dual or primal?
– Depends on which one results in a smaller optimization problem, i.e., fewer variables to optimize
– Efficient solvers have also been developed by exploiting the relationship between the dual problem and the primal problem
Outline
• Linear Classifier
• Support Vector Machine: Primal form
– Separable case
– Non-separable case
• Support Vector Machine: Dual form
– KKT condition
– Dual form of separable linear SVM
– Dual form of non-separable linear SVM
• Solving optimization problem with CVX
• Summary
Optimization solver
• Optimization problems are usually solved with existing solvers
• For the optimization problem in SVMs
– Using standard convex optimization solver
• CVX Linear program — CVXPY 1.1.13 documentation
– Using efficient solver specifically developed for SVMs
• LIBSVM — A Library for Support Vector Machines (ntu.edu.tw)
• LIBLINEAR — A Library for Large Linear Classification
(ntu.edu.tw)
https://www.cvxpy.org/examples/basic/linear_program.html
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
https://www.csie.ntu.edu.tw/~cjlin/liblinear/
How to use CVX for solving optimization
problem
• Scientific programming in Python
– Commonly used packages: Numpy, Scipy
– NumPy Tutorial (tutorialspoint.com)
– Data are represented as ndarrays, which support various matrix operations
• Using CVX
– CVX provides a more user-friendly interface to define your
optimization problem
– Without CVX, you would need to convert your problem into a canonical form yourself, e.g., a quadratic programming problem
https://www.tutorialspoint.com/numpy/index.htm
Example
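A minimal sketch of solving the soft-margin SVM primal with CVXPY (the toy data and C = 1 are assumptions made for illustration):

import numpy as np
import cvxpy as cp

# Toy 2-class data: rows of X are samples, labels y are in {-1, +1}
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
n, d = X.shape
C = 1.0

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n)

# Soft-margin primal: minimize 1/2 ||w||^2 + C * sum(xi)
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)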
Example
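Similarly, a sketch of solving the dual form with CVXPY, reusing X, y, n and C from the previous example (the small diagonal term added to Q is an assumption, used only to keep the quadratic form numerically positive semidefinite):

import numpy as np
import cvxpy as cp

G = y[:, None] * X                       # rows are y_i * x_i
Q = G @ G.T + 1e-9 * np.eye(n)           # Q_ij = y_i y_j x_i^T x_j (plus a tiny ridge)

alpha = cp.Variable(n)
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, Q))
constraints = [alpha >= 0, alpha <= C, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

w = (alpha.value * y) @ X                # recover w = sum_i alpha_i y_i x_i
print(w)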
Important skills
• Converting summation to matrix forms
• For example, Σ_i α_i = 1^T α
• And Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j = α^T Q α, where Q_ij = y_i y_j x_i^T x_j
Important skills
• Linear constraints such as Σ_i α_i y_i = 0 become y^T α = 0
• Box constraints 0 ≤ α_i ≤ C become elementwise vector inequalities (a quick numerical check follows below)
• Please refer to the documents and examples of CVX
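A quick NumPy check of the summation-to-matrix identities above (random values, for illustration only):

import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
alpha = rng.random(n)

Q = (y[:, None] * X) @ (y[:, None] * X).T    # Q_ij = y_i y_j x_i^T x_j

# sum_i alpha_i == 1^T alpha
print(np.isclose(alpha.sum(), np.ones(n) @ alpha))
# sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j == alpha^T Q alpha
double_sum = sum(alpha[i] * alpha[j] * y[i] * y[j] * (X[i] @ X[j])
                 for i in range(n) for j in range(n))
print(np.isclose(double_sum, alpha @ Q @ alpha))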
Summary
• Concept of linear classifier
• Basic idea of linear SVMs: margin maximization
• Primal problem (hard-margin)
• Related concepts
– Support vectors
– Margin
Summary
• Primal problem (soft-margin)
• Dual problem: concepts and derivation
Summary
• Dual problem
– Relationship with primal problem
• Support vector
• Calculating w and b
• Using optimization toolbox
– CVX
– Skills for converting summation into matrix operations