ISML_3: Linear Classifier and (Linear) Support
Vector Machine

Lingqiao Liu

Outline

• Linear Classifier

• Support Vector Machine: Primal form
– Separable case

– Non-separable case

• Support Vector Machine: Dual form
– KKT condition

– Dual form of separable linear SVM

– Dual form of non-separable linear SVM

• Solving optimization problems with CVX

• Summary

Requirements

• Basic concepts of linear classifier

• Basic ideas and concepts in SVMs

• Hard-margin SVMs:
– formulation (but not derivation)

– How to calculate margin and find support vectors

• Soft-margin SVMs:
– Motivation

– Formulation

– Hinge loss formulation

• SVM dual
– How to derive

– Formulation

– Relationship to primal problem solutions


Outline

• Linear Classifier

• Support Vector Machine: Primal form
– Separable case

– Non-separable case

• Support Vector Machine: Dual form
– KKT condition

– Dual form of separable linear SVM

– Dual form of non-separable linear SVM

• Solving optimization problems with CVX

• Summary

Linear classifier

• Suppose each data sample can be represented as a vector
(called a feature vector)

• In a linear classifier, classification is done by calculating a
linear combination of the feature values, f(x) = w^T x + b

– w (the weight vector) and b (the bias) are the parameters of the classifier

– For binary classification, we only need one (w, b): if the decision value
f(x) is larger than 0, the prediction is class 1, otherwise class -1
(see the NumPy sketch below)

– For multi-class classification, we need to learn c sets of (w, b), one
for each class. The class that gives the highest decision value is
the predicted class

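A minimal NumPy sketch of this binary decision rule; the weights, bias and sample below are made-up values for illustration:

import numpy as np

# Hypothetical learned parameters of a binary linear classifier
w = np.array([0.5, -1.2, 0.3])   # weight vector
b = 0.1                          # bias

x = np.array([1.0, 0.2, -0.5])   # a feature vector to classify

decision_value = w @ x + b       # linear combination of the feature values
prediction = 1 if decision_value > 0 else -1
print(decision_value, prediction)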

Geometrically


The linear decision boundary forms a hyperplane in the high-dimensional feature space

Multiclass classification

• Multiclass classification can be converted into c binary
classification problems by using the one-vs-rest scheme (a small
argmax sketch is given below)

– For the c-th problem, we use the c-th class as the positive class
and all the others as the negative class

– We focus on binary
classification today

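A minimal NumPy sketch of one-vs-rest prediction; the per-class weights, biases and the sample are made-up values for illustration:

import numpy as np

# Hypothetical parameters of c = 3 one-vs-rest linear classifiers:
# one weight vector per row of W, one bias per entry of b
W = np.array([[0.5, -1.2, 0.3],
              [-0.7, 0.4, 1.1],
              [0.2, 0.9, -0.6]])
b = np.array([0.1, -0.2, 0.05])

x = np.array([1.0, 0.2, -0.5])             # feature vector to classify

scores = W @ x + b                         # decision value of each binary classifier
predicted_class = int(np.argmax(scores))   # class with the highest decision value
print(scores, predicted_class)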

Outline

• Linear Classifier

• Support Vector Machine: Primal form
– Separable case

– Non-separable case

• Support Vector Machine: Dual form
– KKT condition

– Dual form of separable linear SVM

– Dual form of non-separable linear SVM

• Solving optimization problems with CVX

• Summary

Motivation

• To separate two classes, which hyperplane should we
choose?


What makes a good separating hyperplane

• Training data is just a sampled subset of all possible data

• Suppose the hyperplane is close to one training sample, say sample i

• If a new sample appears close to sample i, it is likely to fall
on the wrong side of the hyperplane

– Assumption: similar samples have similar labels

• This leads to poor generalization


What makes a good separating hyperplane

• The hyperplane should be as far as possible from every
sample

• In this way, new samples close to old samples will still be
classified correctly

• This gives good generalization


Support Vector Machine

• Idea: maximize the distance from the hyperplane to the closest
example; this distance is called the margin

• This leads to an optimization problem


Properties of the optimal hyperplane

• Distance to the closest negative example = distance to
the closest positive example

– Why?

• The examples closest to the hyperplane are called support vectors.

– Removing a non-support vector will not affect the optimal
hyperplane


Optimization problem for finding the
optimal hyperplane

• The primal optimization problem for the linear support
vector machine (hard-margin SVM) is given below

• Decision rule: predict sign(w^T x + b)

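For reference, the standard hard-margin primal (with labels y_i ∈ {−1, +1}) and the decision rule are:

\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \;\; i = 1, \dots, n

\hat{y}(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)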

Proof: (1) Related Math Conclusions

• Cosine of the angle between two vectors u and v:
cos θ = (u^T v) / (‖u‖ ‖v‖)

• Normal vector of a hyperplane: for the hyperplane
w^T x + b = 0, the normal vector is w


Proof: (1) Related Math Conclusions

• Distance from a point to a hyperplane (see below)

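In the same notation, the distance from a point x_0 to the hyperplane w^T x + b = 0 is:

d(\mathbf{x}_0) = \frac{\lvert \mathbf{w}^\top \mathbf{x}_0 + b \rvert}{\lVert \mathbf{w} \rVert}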


Proof: (2) Original optimization problem

SVM objective

• (1) maximize the distance to the closest example

• (2) keep the positive class and the negative class on opposite
sides of the hyperplane


The objective maximizes the minimal distance between the hyperplane and the
training samples, subject to correct classification of every sample.
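In symbols, this original max-min objective is commonly written as:

\max_{\mathbf{w},\, b} \;\; \min_i \; \frac{\lvert \mathbf{w}^\top \mathbf{x}_i + b \rvert}{\lVert \mathbf{w} \rVert} \quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) > 0, \;\; i = 1, \dots, n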

Proof: (2) transformed optimization problem

Since rescaling (w, b) by a positive constant does not change the hyperplane,
we can fix the scale so that min_i y_i(w^T x_i + b) = 1. The margin then equals
1/‖w‖, and maximizing it is equivalent to minimizing (1/2)‖w‖^2 subject to
y_i(w^T x_i + b) ≥ 1 for all i, which is exactly the hard-margin problem above.

Take a closer look at the SVM problem


• Margin: the distance from the hyperplane to the closest
example is 1/‖w‖, so the separation between the two classes is 2/‖w‖

• All the support vectors lie on the margin boundaries, i.e., they
satisfy y_i(w^T x_i + b) = 1

Non-separable case


What if the training set is not linearly separable?

In that case, not all inequality constraints can be satisfied.

Slack variables and soft-margin SVM

• The slack variable ξ_i indicates how much sample i violates the
inequality constraint

• We want to minimize the total violation

• C is a hyper-parameter that controls how much violation is
allowed.

– Question: How to choose C?

The resulting soft-margin primal problem is given below.
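In standard notation (with slack variables ξ_i ≥ 0), the soft-margin primal is:

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, n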

Hinge loss view of soft-margin SVM

• The soft-margin SVM can also be viewed from the
perspective of the hinge loss (the formulation is given below)


Hinge loss view of soft-margin SVM

• If y_i(w^T x_i + b) ≥ 1, which means the sample has been
correctly classified with sufficient confidence (not just
> 0, but at least 1), then the loss incurred for this sample is 0

• This means, if the “confidence” is large enough, we do
not pursue a larger “confidence”.

• Instead, we focus on those samples that haven’t been
confidently classified.

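Concretely, eliminating the slack variables gives the equivalent unconstrained hinge-loss form:

\min_{\mathbf{w},\, b} \;\; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)\bigr)

where max(0, 1 − y_i(w^T x_i + b)) is the hinge loss of sample i and equals the optimal slack ξ_i.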

Outline

• Linear Classifier

• Support Vector Machine: Primal form
– Separable case

– Non-separable case

• Support Vector Machine: Dual form
– KKT condition

– Dual form of separable linear SVM

– Dual form of non-separable linear SVM

• Solving optimization problems with CVX

• Summary

SVM dual problem

• Sometimes it is beneficial to convert the SVM
optimization problem into an equivalent optimization
problem called the dual formulation of the SVM, or the
dual problem of the SVM

• By applying certain transformations, we can derive a dual
problem for any optimization problem.

– An important type of dual problem is the Lagrangian dual
problem

– If the original problem is convex, there is an equivalence
between the optimal solutions of the dual problem and those of
the original problem. In other words, a convex optimization
problem can be solved in its original form (called the primal
form) or in its dual form


Convex optimization


Convex function: f is convex if f(λx + (1−λ)y) ≤ λ f(x) + (1−λ) f(y)
for all x, y and all λ ∈ [0, 1]

Convex set: a set S is convex if, for any x, y ∈ S and any λ ∈ [0, 1],
the point λx + (1−λ)y is also in S

An optimization problem min_x f(x) s.t. x ∈ S is a convex optimization problem if f is a
convex function and the feasible set S is a convex set

If f is a convex quadratic function (as in the SVM objective) and the constraints are linear, the problem is convex.

Lagrangian dual problem

• Original problem (primal problem)

• Lagrangian dual problem

Define the Lagrangian function (shown below)

Assume the objective and constraint functions are convex, e.g.,
linear or quadratic functions
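Using the standard setup (as in the referenced Stanford tutorial), with primal variable x and dual variables λ_i ≥ 0 for the constraints g_i(x) ≤ 0:

Primal problem:       \min_{x} \; f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \;\; i = 1, \dots, m

Lagrangian function:  L(x, \boldsymbol{\lambda}) = f(x) + \sum_{i=1}^{m} \lambda_i \, g_i(x)

Dual problem:         \max_{\boldsymbol{\lambda} \ge 0} \;\; \min_{x} \; L(x, \boldsymbol{\lambda})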

Relationship between dual problem and
primal problem

• Notation: use x* and λ* to denote the optimal solutions of the
primal problem and the dual problem, respectively

• Note that the inner problem min_x L(x, λ) is an unconstrained
optimization problem, so at the optimum we should have the
stationarity condition ∇_x L(x*, λ*) = 0

• Complementary slackness: λ*_i g_i(x*) = 0 for every constraint i

• More information (if you want to know “why”) can be
found in this tutorial

lagrangian_duality.pdf (stanford.edu)


https://cs.stanford.edu/people/davidknowles/lagrangian_duality.pdf

Summary: Karush–Kuhn–Tucker (KKT)
conditions


1. Stationarity

2. Primal feasibility

3. Dual feasibility

4. Complementary slackness
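For the generic problem above, with optimal primal solution x* and optimal dual solution λ*, the four conditions read:

1. Stationarity:             \nabla f(x^*) + \sum_i \lambda_i^* \nabla g_i(x^*) = 0
2. Primal feasibility:       g_i(x^*) \le 0 \;\; \text{for all } i
3. Dual feasibility:         \lambda_i^* \ge 0 \;\; \text{for all } i
4. Complementary slackness:  \lambda_i^* \, g_i(x^*) = 0 \;\; \text{for all } i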

The relationship between the solutions of the primal problem and the dual
problem is specified by the KKT conditions. The dual problem can be derived
by applying the KKT conditions and making some transformations.

Derive dual problem for soft-margin SVMs


1. Stationarity

Step 1: take the derivatives of the Lagrangian function with respect to the primal variables w, b and ξ, and set them to zero

Derive dual problem for soft-margin SVMs


Step 2: represent primal variables with dual variables, substitute into the Lagrangian
function
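Writing α_i ≥ 0 for the dual variables of the constraints y_i(w^T x_i + b) ≥ 1 − ξ_i, and μ_i ≥ 0 for those of ξ_i ≥ 0, the Lagrangian and the stationarity conditions of Step 1 are:

L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_i \xi_i - \sum_i \alpha_i \bigl[y_i(\mathbf{w}^\top\mathbf{x}_i + b) - 1 + \xi_i\bigr] - \sum_i \mu_i \xi_i

\partial L / \partial \mathbf{w} = 0 \;\Rightarrow\; \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i
\partial L / \partial b = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0
\partial L / \partial \xi_i = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0

Substituting these back into the Lagrangian (Step 2) eliminates w, b and ξ, leaving a function of α only.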


Step 3: check the constraints on the dual variables using dual feasibility

Any constraints? The dual variables must be no less than 0; combined with
C − α_i − μ_i = 0 and μ_i ≥ 0, this gives 0 ≤ α_i ≤ C.

SVM dual problem

• Dual problem (stated below)

• Primal problem (repeated below for comparison)

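For reference, the two problems side by side:

Primal (soft-margin):
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^\top\mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

Dual (soft-margin):
\max_{\boldsymbol{\alpha}} \; \sum_i \alpha_i - \tfrac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j \mathbf{x}_i^\top \mathbf{x}_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0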

Convert dual solutions to primal solutions


• From the stationarity condition, w* = Σ_i α_i y_i x_i

• How about b?

– By using complementary slackness

Convert dual solutions to primal solutions

• A closer look at the complementary slackness condition

– If α_i = 0, then ξ_i = 0, and y_i(w^T x_i + b) ≥ 1: the sample is
correctly classified and outside the margin

– If 0 < α_i < C, then ξ_i = 0, and y_i(w^T x_i + b) = 1: the sample
lies exactly on the margin boundary

– How about α_i = C? Then y_i(w^T x_i + b) = 1 − ξ_i ≤ 1: the sample
is inside the margin or misclassified


Points satisfying the last two conditions (i.e., points with α_i > 0)
are called support vectors

How to solve b

• Find any sample i with 0 < α_i < C; it satisfies y_i(w^T x_i + b) = 1,
so b = y_i − w^T x_i (a small NumPy sketch follows below)

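A minimal NumPy sketch of this recovery step; recover_primal is a hypothetical helper (not from the slides), and alpha, X, y, C are assumed to come from whichever dual solver was used:

import numpy as np

def recover_primal(alpha, X, y, C, eps=1e-6):
    """Recover (w, b) of a soft-margin SVM from the optimal dual variables alpha.

    alpha: (n,) dual solution, X: (n, d) training samples,
    y: (n,) labels in {-1, +1}, C: soft-margin hyper-parameter.
    """
    w = (alpha * y) @ X                       # w = sum_i alpha_i y_i x_i
    # a sample with 0 < alpha_i < C lies exactly on the margin: y_i (w^T x_i + b) = 1
    idx = np.where((alpha > eps) & (alpha < C - eps))[0][0]
    b = y[idx] - X[idx] @ w                   # b = y_i - w^T x_i  (since y_i^2 = 1)
    return w, b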

SVM dual problem

• Primal problem

– Number of variables is proportional to the number of feature
dimensions

– Number of constraints is proportional to the number of samples

• Dual problem

– Number of variables is proportional to the number of samples

• Solve the SVM in the primal or the dual?

– Depends on which one results in the smaller optimization problem,
i.e., fewer variables to optimize

– Efficient solvers have also been developed by exploiting the
relationship between the dual problem and the primal problem


Outline

• Linear Classifier

• Support Vector Machine: Primal form
– Separable case

– Non-separable case

• Support Vector Machine: Dual form
– KKT condition

– Dual form of separable linear SVM

– Dual form of non-separable linear SVM

• Solving optimization problems with CVX

• Summary

Optimization solver

• Optimization problems are usually solved with existing solvers

• For the optimization problems in SVMs

– Using a standard convex optimization solver

• CVX Linear program — CVXPY 1.1.13 documentation

– Using efficient solvers specifically developed for SVMs

• LIBSVM — A Library for Support Vector Machines (ntu.edu.tw)

• LIBLINEAR — A Library for Large Linear Classification
(ntu.edu.tw)


https://www.cvxpy.org/examples/basic/linear_program.html
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
https://www.csie.ntu.edu.tw/~cjlin/liblinear/

How to use CVX for solving optimization
problems

• Scientific programming in Python

– Commonly used packages: NumPy, SciPy

– NumPy Tutorial (tutorialspoint.com)

– Data are represented as ndarrays, which support various
matrix operations

• Using CVX

– CVX provides a more user-friendly interface for defining your
optimization problem

– Without CVX, you would need to convert your problem into a
canonical form, e.g., that of a quadratic programming problem


https://www.tutorialspoint.com/numpy/index.htm

Example

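As one way to set this up, here is a minimal CVXPY sketch of the soft-margin primal on made-up toy data; the data, the value of C, and all variable names are assumptions for illustration, not the example from the slides:

import numpy as np
import cvxpy as cp

# Toy two-class data: rows of X are samples, labels y are in {-1, +1}
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+1.5, size=(20, 2)),
               rng.normal(loc=-1.5, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])
n, d = X.shape
C = 1.0                                    # soft-margin hyper-parameter

# Primal variables
w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n)                        # slack variables

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi,
               xi >= 0]

prob = cp.Problem(objective, constraints)
prob.solve()

print("w =", w.value, "b =", b.value)

CVXPY accepts the problem in this natural form and converts it to a canonical quadratic program internally, which is exactly the convenience mentioned above.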


Important skills

• Converting summations to matrix form (see the sketch below)

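For instance, the double summation in the dual objective, Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j, can be written as α^T Q α with Q = (y y^T) ∘ (X X^T); a small NumPy check with made-up arrays:

import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))            # samples as rows
y = rng.choice([-1.0, 1.0], size=n)    # labels in {-1, +1}
alpha = rng.random(n)                  # some dual variables (illustrative)

# Summation form
loop_val = sum(alpha[i] * alpha[j] * y[i] * y[j] * (X[i] @ X[j])
               for i in range(n) for j in range(n))

# Matrix form: Q = (y y^T) * (X X^T), elementwise product
Q = np.outer(y, y) * (X @ X.T)
matrix_val = alpha @ Q @ alpha

print(np.isclose(loop_val, matrix_val))   # True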

Important skills

• Please refer to the documentation and examples of CVX


Summary

• Concept of linear classifier

• Basic idea of linear SVMs: margin maximization

• Primal problem (hard-margin)

• Related concepts

– Support vectors

– Margin


Summary

• Primal problem (soft-margin)

• Dual problem: concepts and derivation


Summary

• Dual problem

– Relationship with primal problem

• Support vector

• Calculating w and b

• Using optimization toolbox

– CVX

– Skills for converting summation into matrix operations
