
Introduction to Machine Learning
Dual formulation of SVMs
Prof. Kutty

Today’s Agenda


• Recap: Hard and Soft Margin SVMs
• Section 1: Dual Formulation in General
• Section 2: Dual Formulation for SVMs
• Section 3: SVMs and the kernel trick

Recap: Hard and Soft Margin SVMs

Hard Margin SVM (assuming data are linearly separable)
• Find a boundary that classifies the training set correctly
• and that is maximally removed from the training examples closest to the decision boundary

Quadratic Program (QP):
\min_{\bar\theta, b} \frac{1}{2}\|\bar\theta\|^2
subject to y^{(i)} (\bar\theta \cdot \bar{x}^{(i)} + b) \ge 1 for i \in \{1, \dots, n\}

Linear classifier output by this QP: h(\bar{x}) = \mathrm{sign}(\bar\theta \cdot \bar{x} + b)
Parameters: \bar\theta, b
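To make the QP concrete, here is a minimal sketch that solves this primal problem with the cvxpy modeling library; the toy dataset, variable names, and solver defaults are illustrative assumptions, not part of the lecture.

import numpy as np
import cvxpy as cp

# Hypothetical toy dataset (illustrative, not from the lecture):
# two features, labels in {-1, +1}, linearly separable
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

theta = cp.Variable(d)  # weight vector theta-bar
b = cp.Variable()       # offset

# min (1/2)||theta||^2  s.t.  y_i (theta . x_i + b) >= 1 for all i
constraints = [cp.multiply(y, X @ theta + b) >= 1]
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(theta)), constraints)
problem.solve()

print("theta* =", theta.value, " b* =", b.value)
# The learned linear classifier is h(x) = sign(theta* . x + b*)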

Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Soft-Margin SVMs
2. Map to a higher dimensional space
3. SVM dual and the kernel trick

Soft-Margin SVM

Soft-margin SVM advantages:
• can handle data that are not linearly separable
• reduces the effect of outliers, with less chance of overfitting

Objective = original objective + penalty term:
\min_{\bar\theta, b, \bar\xi} \frac{1}{2}\|\bar\theta\|^2 + C \sum_{i=1}^{n} \xi_i
subject to y^{(i)} (\bar\theta \cdot \bar{x}^{(i)} + b) \ge 1 - \xi_i
for i \in \{1, \dots, n\}, and \xi_i \ge 0

The \xi_i are slack variables: with them we can solve problems whose examples are no longer linearly separable. C is a hyperparameter that trades the margin off against the slack (see the sketch below).

[Figure: decision boundaries \bar\theta \cdot \bar{x} + b = \pm 1 for different values of C, e.g. C = 100 vs. C = 0.01]
Image credit: Barzilay & Jaakkola
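As a hedged illustration of the role of C, the sketch below fits scikit-learn's SVC (which solves a soft-margin SVM) at two values of C; the overlapping toy dataset is made up for the example.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical overlapping classes, so some slack is unavoidable
X = np.vstack([rng.normal([1.0, 1.0], 1.0, size=(30, 2)),
               rng.normal([-1.0, -1.0], 1.0, size=(30, 2))])
y = np.array([1] * 30 + [-1] * 30)

for C in (100, 0.01):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C tolerates more margin violations, so the margin widens
    # and more points typically become support vectors
    print(f"C = {C}: {clf.n_support_.sum()} support vectors")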

Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Soft-Margin SVMs
2. Map to a higher dimensional space
3. SVM dual and the kernel trick

Linear classifiers
in higher dimensional spaces: idea
map data to a higher dim. space in which there exists a separating hyperplane (corresponds to a non-linear decision boundary in the original space)

Implications for SVM

\min_{\bar\theta, b, \bar\xi} \frac{1}{2}\|\bar\theta\|^2 + C \sum_{i=1}^{n} \xi_i
subject to y^{(i)} (\bar\theta \cdot \phi(\bar{x}^{(i)}) + b) \ge 1 - \xi_i
and \xi_i \ge 0 for i \in \{1, \dots, n\}

Feature map \phi: \mathbb{R}^d \to \mathbb{R}^p … parameters are learned in \mathbb{R}^p
To classify a new data point: h(\bar{x}) = \mathrm{sign}(\bar\theta \cdot \phi(\bar{x}) + b)
Issue: potentially very inefficient (see the dimension count below)
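To see why working in \mathbb{R}^p explicitly can be inefficient, note that a degree-k polynomial feature map on d-dimensional inputs has on the order of C(d + k, k) coordinates. A quick back-of-the-envelope check (the specific d and k values are illustrative):

from math import comb

# Number of monomials of degree <= k in d variables is C(d + k, k),
# which is the dimension p of a degree-k polynomial feature map
for d in (10, 100, 1000):
    for k in (2, 3, 5):
        print(f"d = {d}, degree = {k}: p = {comb(d + k, k):,}")
# Already at d = 1000, k = 5, p is in the trillions, while the kernel
# trick (Section 3) only ever needs dot products in R^d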

Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Soft-Margin SVMs
2. Map to a higher dimensional space
3. SVM dual and the kernel trick (how to do this efficiently)

Section 1: Dual Formulation in General

Support Vector Machines
QP formulation for the hard margin SVM (with no offset parameter):
\min_{\bar\theta} \frac{1}{2}\|\bar\theta\|^2 subject to y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}) \ge 1 for i \in \{1, \dots, n\}

Goal: rewrite in dual form
\max_{\bar\alpha,\ \alpha_i \ge 0} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)}

Dual Formulation in General

Relating the Lagrangian to f(\bar\theta)

Original problem: \min_{\bar\theta} f(\bar\theta) s.t. h_j(\bar\theta) \le 0 for j = 1, \dots, m
Lagrangian: L(\bar\theta, \bar\lambda) = f(\bar\theta) + \sum_{j=1}^{m} \lambda_j h_j(\bar\theta)
Define: \tilde{L}(\bar\theta) = \max_{\bar\lambda,\ \lambda_j \ge 0} L(\bar\theta, \bar\lambda)

Claim: \tilde{L}(\bar\theta) = f(\bar\theta) if the constraints are satisfied, and \infty otherwise.

Case 1: constraints are satisfied.
Substituting in the definition of L:
\tilde{L}(\bar\theta) = \max_{\bar\lambda,\ \lambda_j \ge 0} [\, f(\bar\theta) + \lambda_1 h_1(\bar\theta) + \dots + \lambda_m h_m(\bar\theta) \,]
Each h_j(\bar\theta) \le 0 and each \lambda_j \ge 0, so every term \lambda_j h_j(\bar\theta) \le 0. The maximum is attained at \lambda_j = 0 for all j, giving \tilde{L}(\bar\theta) = f(\bar\theta).

Case 2: constraints are not satisfied, say h_j(\bar\theta) > 0 for some j.
\tilde{L}(\bar\theta) = \max_{\bar\lambda,\ \lambda_j \ge 0} [\, f(\bar\theta) + \lambda_1 h_1(\bar\theta) + \dots + \lambda_j h_j(\bar\theta) + \dots + \lambda_m h_m(\bar\theta) \,]
Taking \lambda_j \to \infty on the violated constraint makes the objective arbitrarily large, so \tilde{L}(\bar\theta) = \infty.

Primal formulation

Original problem: \min_{\bar\theta} f(\bar\theta) s.t. h_j(\bar\theta) \le 0 for j = 1, \dots, m
Lagrangian: L(\bar\theta, \bar\lambda) = f(\bar\theta) + \sum_{j=1}^{m} \lambda_j h_j(\bar\theta)
Define: \tilde{L}(\bar\theta) = \max_{\bar\lambda,\ \lambda_j \ge 0} L(\bar\theta, \bar\lambda)
\tilde{L}(\bar\theta) = f(\bar\theta) if the constraints are satisfied, \infty otherwise

We want to pick a \bar\theta that minimizes \tilde{L}. Note that
\min_{\bar\theta} \tilde{L}(\bar\theta) = \min_{\bar\theta} \max_{\bar\lambda,\ \lambda_j \ge 0} L(\bar\theta, \bar\lambda) = \min_{\bar\theta} f(\bar\theta)
if the constraints are satisfiable.

Primal formulation: \min_{\bar\theta} \max_{\bar\lambda,\ \lambda_j \ge 0} L(\bar\theta, \bar\lambda)

Primal vs Dual formulation

Original problem: \min_{\bar\theta} f(\bar\theta) s.t. h_j(\bar\theta) \le 0 for j = 1, \dots, m
Lagrangian: L(\bar\theta, \bar\lambda) = f(\bar\theta) + \sum_{j=1}^{m} \lambda_j h_j(\bar\theta)
Define: \tilde{L}(\bar\theta) = \max_{\bar\lambda,\ \lambda_j \ge 0} L(\bar\theta, \bar\lambda) = f(\bar\theta) if the constraints are satisfied, \infty otherwise

Primal formulation: \min_{\bar\theta} \max_{\bar\lambda,\ \lambda_j \ge 0} L(\bar\theta, \bar\lambda)
Dual formulation: \max_{\bar\lambda,\ \lambda_j \ge 0} \min_{\bar\theta} L(\bar\theta, \bar\lambda)

Duality gap

Primal formulation: p^* = \min_{\bar\theta} \max_{\bar\lambda,\ \lambda_j \ge 0} L(\bar\theta, \bar\lambda)
Dual formulation: d^* = \max_{\bar\lambda,\ \lambda_j \ge 0} \min_{\bar\theta} L(\bar\theta, \bar\lambda)

1. The difference between these solutions, p^* - d^*, is called the duality gap.
2. The dual gives a lower bound on the solution of the primal: d^* \le p^*.
3. Under certain conditions*, however, the duality gap is zero.

These conditions hold for our problem, i.e., the duality gap is zero.
* quadratic convex objective, affine constraint functions, primal/dual feasible
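As a concrete illustration (not from the slides): a one-dimensional problem where the duality gap can be checked by hand is \min_\theta \theta^2 subject to 1 - \theta \le 0.

p^* = \min_{\theta \,:\, 1 - \theta \le 0} \theta^2 = 1 \quad (\text{at } \theta = 1)

L(\theta, \lambda) = \theta^2 + \lambda (1 - \theta), \qquad
\min_{\theta} L(\theta, \lambda) = \lambda - \frac{\lambda^2}{4} \quad (\text{at } \theta = \lambda / 2)

d^* = \max_{\lambda \ge 0} \left( \lambda - \frac{\lambda^2}{4} \right) = 1 \quad (\text{at } \lambda = 2)

The objective is quadratic convex and the constraint is affine, so, as claimed, d^* = p^* and the duality gap is zero.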

Support vector machines (SVMs) at work:
distinguishing acute lymphoblastic leukemia (ALL) from acute myeloid leukemia (AML).
Noble, "What is a support vector machine?" Nature Biotechnology (2006)

Vladimir Vapnik

Section 2: Dual Formulation for SVMs

Support Vector Machines
Quadratic Program (QP) formulation for the hard margin SVM (without offset):
\min_{\bar\theta} \frac{1}{2}\|\bar\theta\|^2 subject to y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}) \ge 1 for i \in \{1, \dots, n\}

Goal: rewrite in dual form
\max_{\bar\alpha,\ \alpha_i \ge 0} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)}

Classifier output by this optimization problem:
h(\bar{x}) = \mathrm{sign}(\bar\theta^* \cdot \bar{x}), where \bar\theta^* = \sum_{i=1}^{n} \alpha_i y^{(i)} \bar{x}^{(i)}

Kernelized Dual SVM

\max_{\bar\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \phi(\bar{x}^{(i)}) \cdot \phi(\bar{x}^{(j)})
subject to \alpha_i \ge 0 \ \forall i = 1, \dots, n

\max_{\bar\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} K(\bar{x}^{(i)}, \bar{x}^{(j)})
subject to \alpha_i \ge 0 \ \forall i = 1, \dots, n

• Intuitively, we can think of K(\bar{x}^{(i)}, \bar{x}^{(j)}) as a measure of similarity between \bar{x}^{(i)} and \bar{x}^{(j)}
• Sometimes it is much more efficient to compute K(\bar{x}^{(i)}, \bar{x}^{(j)}) directly

Classifying a new example

Previously: h(\bar{x}) = \mathrm{sign}(\bar\theta \cdot \bar{x})
Recall: \bar\theta^* = \sum_{i=1}^{n} \alpha_i y^{(i)} \bar{x}^{(i)}
So: h(\bar{x}) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y^{(i)} \, \bar{x}^{(i)} \cdot \bar{x} \right)
With a feature map, replace the dot product by the kernel K(\bar{x}^{(i)}, \bar{x}):
h(\bar{x}) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y^{(i)} K(\bar{x}^{(i)}, \bar{x}) \right)
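A minimal sketch of this prediction rule in NumPy, assuming the dual variables alpha have already been found; the function name and the choice of quadratic kernel are illustrative.

import numpy as np

def predict(x_new, X, y, alpha, kernel):
    """h(x) = sign( sum_i alpha_i y_i K(x_i, x) )."""
    score = sum(a * yi * kernel(xi, x_new)
                for a, yi, xi in zip(alpha, y, X))
    return np.sign(score)

# Example kernel: the quadratic kernel K(x, z) = (x . z)^2 from Section 3
quadratic_kernel = lambda x, z: float(x @ z) ** 2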

Deriving the dual formulation for SVM
Step 1: Compose the Lagrangian

Original problem: \min_{\bar\theta} f(\bar\theta) s.t. h_j(\bar\theta) \le 0 for j = 1, \dots, m
Lagrangian: L(\bar\theta, \bar\lambda) = f(\bar\theta) + \sum_{j=1}^{m} \lambda_j h_j(\bar\theta)

Hard margin SVM (without offset): \min_{\bar\theta} \frac{1}{2}\|\bar\theta\|^2 s.t. y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}) \ge 1 for i = 1, \dots, n

Which of the following is L(\bar\theta, \bar\alpha) (with \alpha_i \ge 0)?
A. \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta \cdot \bar{x}))
B. \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}))
C. \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i y^{(i)} (\bar\theta \cdot \bar{x}^{(i)})
D. \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i + y^{(i)} (\bar\theta \cdot \bar{x}^{(i)})

Deriving the dual formulation for SVM
Step 1: Compose the Lagrangian

Original problem: \min_{\bar\theta} f(\bar\theta) s.t. h_j(\bar\theta) \le 0 for j = 1, \dots, m
Lagrangian: L(\bar\theta, \bar\lambda) = f(\bar\theta) + \sum_{j=1}^{m} \lambda_j h_j(\bar\theta)

\min_{\bar\theta} \frac{1}{2}\|\bar\theta\|^2 s.t. y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}) \ge 1 for i = 1, \dots, n
L(\bar\theta, \bar\alpha) = \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta \cdot \bar{x}^{(i)})), with \alpha_i \ge 0

Deriving the dual formulation for SVM
Step 1: Compose the Lagrangian

\min_{\bar\theta} \frac{1}{2}\|\bar\theta\|^2 s.t. y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}) \ge 1 for i = 1, \dots, n
L(\bar\theta, \bar\alpha) = \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta \cdot \bar{x}^{(i)})), with \alpha_i \ge 0

Step 2: Write the dual formulation
\max_{\bar\alpha,\ \alpha_i \ge 0} \min_{\bar\theta} L(\bar\theta, \bar\alpha)

Deriving the dual formulation
Step 3: Rewrite the primal variables in terms of the dual variables

\max_{\bar\alpha,\ \alpha_i \ge 0} \min_{\bar\theta} L(\bar\theta, \bar\alpha)
where L(\bar\theta, \bar\alpha) = \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}))

Set \nabla_{\bar\theta} L(\bar\theta, \bar\alpha) \big|_{\bar\theta = \bar\theta^*} = 0:
\nabla_{\bar\theta} L = \bar\theta - \sum_{i=1}^{n} \alpha_i y^{(i)} \bar{x}^{(i)} = 0
\Rightarrow \bar\theta^* = \sum_{i=1}^{n} \alpha_i y^{(i)} \bar{x}^{(i)}

Deriving the dual formulation
Step 4: Simplify the dual formulation

\max_{\bar\alpha,\ \alpha_i \ge 0} \min_{\bar\theta} L(\bar\theta, \bar\alpha)
L(\bar\theta, \bar\alpha) = \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}))
\bar\theta^* = \sum_{i=1}^{n} \alpha_i y^{(i)} \bar{x}^{(i)}

\max_{\bar\alpha,\ \alpha_i \ge 0} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)}

Substituting \bar\theta^* into L:
L(\bar\theta^*, \bar\alpha) = \frac{1}{2}\|\bar\theta^*\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta^* \cdot \bar{x}^{(i)}))
= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)} + \sum_{i=1}^{n} \alpha_i - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)}
= \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)}
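As a sanity check on Steps 1–4, the sketch below solves this dual QP with cvxpy on a hypothetical toy dataset and then recovers \bar\theta^* = \sum_i \alpha_i y^{(i)} \bar{x}^{(i)}; the data and the tiny ridge added for numerical stability are illustrative assumptions.

import numpy as np
import cvxpy as cp

# Hypothetical toy dataset
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# G_ij = y_i y_j (x_i . x_j); positive semidefinite by construction
G = (y[:, None] * X) @ (y[:, None] * X).T

alpha = cp.Variable(n)
# max sum_i alpha_i - (1/2) alpha^T G alpha  s.t.  alpha >= 0
# (a tiny ridge keeps the quadratic form numerically PSD)
dual = cp.sum(alpha) - 0.5 * cp.quad_form(alpha, G + 1e-9 * np.eye(n))
cp.Problem(cp.Maximize(dual), [alpha >= 0]).solve()

theta_star = (alpha.value * y) @ X  # theta* = sum_i alpha_i y_i x_i
print("alpha-hat =", np.round(alpha.value, 4))
print("theta*    =", np.round(theta_star, 4))
# Only the support vectors get alpha-hat_i > 0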

Dual variables and Support Vectors

\max_{\bar\alpha,\ \alpha_i \ge 0} \min_{\bar\theta} L(\bar\theta, \bar\alpha),
where L(\bar\theta, \bar\alpha) = \frac{1}{2}\|\bar\theta\|^2 + \sum_{i=1}^{n} \alpha_i (1 - y^{(i)} (\bar\theta \cdot \bar{x}^{(i)}))

The solution satisfies the "complementary slackness" constraints. Let the optimal values be given by \bar\theta^* and \hat\alpha_1, \dots, \hat\alpha_n. Then:

\hat\alpha_i > 0: y^{(i)} (\bar\theta^* \cdot \bar{x}^{(i)}) = 1 (support vector)
\hat\alpha_i = 0: y^{(i)} (\bar\theta^* \cdot \bar{x}^{(i)}) > 1 (non-support vector)

In other words, either the primal inequality is satisfied with equality, or the dual variable is zero.
\bar\theta^* = \sum_{i=1}^{n} \alpha_i y^{(i)} \bar{x}^{(i)}

Dual variables and Support Vectors
• Support vectors are the most important datapoints in the dataset → non-zero duals → the separating hyperplane depends on these (see the sketch below)
• for hard margin SVMs, support vectors are
  – points on the margin
• for soft margin SVMs, support vectors are
  – points on the "wrong side" of the margin
    • misclassified points
    • points within the margin
  – points on the margin
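A hedged sketch of inspecting support vectors with scikit-learn's SVC; the toy dataset and the use of a very large C to approximate the hard margin case are assumptions for illustration.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([2.0, 2.0], 0.5, size=(20, 2)),
               rng.normal([-2.0, -2.0], 0.5, size=(20, 2))])
y = np.array([1] * 20 + [-1] * 20)

# A very large C approximates the hard margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)
# dual_coef_ stores alpha_i * y_i for the support vectors only;
# all other training points have alpha_i = 0
print("alpha_i * y_i:", clf.dual_coef_)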

Section 3: (Efficient) Non-linear decision boundaries with SVMs
SVMs and the kernel trick

Dual formulation of Hard-Margin SVM

\max_{\bar\alpha,\ \alpha_i \ge 0} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)}

Feature mapping with SVM

\max_{\bar\alpha,\ \alpha_i \ge 0} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \bar{x}^{(i)} \cdot \bar{x}^{(j)}

With a feature map \phi:
\max_{\bar\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \phi(\bar{x}^{(i)}) \cdot \phi(\bar{x}^{(j)})
subject to \alpha_i \ge 0 \ \forall i = 1, \dots, n
Issue: potentially very inefficient

Kernelized Dual SVM

\max_{\bar\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \, \phi(\bar{x}^{(i)}) \cdot \phi(\bar{x}^{(j)})
subject to \alpha_i \ge 0 \ \forall i = 1, \dots, n

\max_{\bar\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} K(\bar{x}^{(i)}, \bar{x}^{(j)})
subject to \alpha_i \ge 0 \ \forall i = 1, \dots, n

• Intuitively, we can think of K(\bar{x}^{(i)}, \bar{x}^{(j)}) as a measure of similarity between \bar{x}^{(i)} and \bar{x}^{(j)}
• Sometimes it is much more efficient to compute K(\bar{x}^{(i)}, \bar{x}^{(j)}) directly

Kernels and Feature Maps
Consider the feature map \phi(\bar{x}) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2) for \bar{x} \in \mathbb{R}^2. Here \phi: \mathbb{R}^2 \to \mathbb{R}^3.
\phi(\bar{x}) \cdot \phi(\bar{z}) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2) \cdot (z_1^2, z_2^2, \sqrt{2}\, z_1 z_2)
= x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2
= (\bar{x} \cdot \bar{z})^2
So the kernel K(\bar{x}, \bar{z}) = (\bar{x} \cdot \bar{z})^2 = \phi(\bar{x}) \cdot \phi(\bar{z}) computes the dot product in \mathbb{R}^3 without ever forming \phi explicitly.
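A quick numeric check of this identity (the specific test points are arbitrary):

import numpy as np

def phi(v):
    # Feature map from this slide: phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)
    return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = phi(x) @ phi(z)   # dot product in R^3
kernel = (x @ z) ** 2        # K(x, z) = (x . z)^2, computed in R^2

print(explicit, kernel)      # both are 1.0: (1*3 + 2*(-1))^2 = 1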