Recitation 4: Recap of SVMs and Complementary Slackness
Vishakh (CDS)
February 16, 2022
Recap: Hyperplanes to SVMs
Hard-margin vs Soft-margin SVMs
Preview of Complementary Slackness + Kernelization
Component of v1, v2 in the Direction w

[Figure: the component of a vector v in the direction of w is w^T v / ||w||_2; the figure shows this projection for v_1 and v_2.]
Intro Review
Level Surfaces of f(v) = w^T v with ||w||_2 = 1

[Figure: parallel level surfaces w^T v = c for c = 5, 4, 3, 2, 1, 0, −1, −2, −3, −4.]
Sides of the Hyperplane w^T v = 15

Points with w^T v − 15 > 0 lie on one side of the hyperplane w^T v = 15; points with w^T v − 15 < 0 lie on the other.
Signed Distance from x1, x2 to the Hyperplane w^T v = 20

The signed distance from a point x to the hyperplane w^T v = 20 is (w^T x − 20)/||w||_2. In the figure, ||w||_2 = √10 and x_2 lies on the level surface w^T v = 27, so its signed distance is (27 − 20)/√10 = 7/√10; the signed distance of x_1 is (w^T x_1 − 20)/||w||_2.
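The signed-distance formula can be checked numerically. A small sketch: the hyperplane and the point below are made up to match the figure's numbers (w = (1, 3) gives ||w||_2 = √10).

```python
import numpy as np

def signed_distance(x, w, c):
    """Signed distance from point x to the hyperplane {v : w^T v = c}."""
    return (w @ x - c) / np.linalg.norm(w)

# Hyperplane w^T v = 20 with w = (1, 3), so ||w||_2 = sqrt(10).
w = np.array([1.0, 3.0])
x = np.array([0.0, 9.0])          # w^T x = 27, on the level surface w^T v = 27
print(signed_distance(x, w, 20))  # 7 / sqrt(10) ≈ 2.2136
```

The sign tells you which side of the hyperplane the point is on; the magnitude is the Euclidean distance.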
Linearly Separable
Definition

We say (x_i, y_i) for i = 1, …, n, with y_i ∈ {±1}, are linearly separable if there exist w ∈ R^d and a ∈ R such that y_i(w^T x_i + a) > 0 for all i. The set {v ∈ R^d | w^T v + a = 0} is called a separating hyperplane.
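The definition translates directly into code. `separates` below is a small illustrative helper (the data and candidate hyperplanes are made up):

```python
import numpy as np

def separates(w, a, X, y):
    """True iff y_i (w^T x_i + a) > 0 for every example (the definition above)."""
    return bool(np.all(y * (X @ w + a) > 0))

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
print(separates(np.array([1.0, 1.0]), 0.0, X, y))   # True: every point on its correct side
print(separates(np.array([-1.0, 1.0]), 0.0, X, y))  # False: strict inequality fails
```

Note the inequality is strict: a point lying exactly on the hyperplane means (w, a) does not separate the data.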
Linearly Separable Data
Many Separating Hyperplanes Exist
Maximum Margin Separating Hyperplane

[Figure: the separating hyperplane (w^T v + a)/||w||_2 = 0 with the two margin boundaries (w^T v + a)/||w||_2 = M and (w^T v + a)/||w||_2 = −M.]
Maximizing the Margin

We can rewrite this in a more standard form:

    maximize_{w,a,M}  M
    subject to  y_i(w^T x_i + a)/||w||_2 ≥ M  for all i.

Fixing the norm ||w||_2 = 1/M (and writing b for a), we obtain:

    maximize  1/||w||_2
    subject to  y_i(w^T x_i + b) ≥ 1  for all i.

This is equivalent to solving the minimization problem:

    minimize  (1/2)||w||_2^2
    subject to  y_i(w^T x_i + b) ≥ 1  for all i.
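The rescaling step can be sanity-checked numerically: for any feasible (w, b), multiplying both by 1/(M · ||w||_2) leaves the geometric margin unchanged while normalizing the constraint to y_i(w^T x_i + b) ≥ 1. The data and candidate (w, b) below are made up for illustration.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """min_i y_i (w^T x_i + b) / ||w||_2 -- the quantity the first program maximizes."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 1.0]), 0.0

M = geometric_margin(w, b, X, y)
s = 1.0 / (M * np.linalg.norm(w))   # rescale so the closest point satisfies the constraint with equality

print(np.isclose(geometric_margin(s * w, s * b, X, y), M))  # True: margin is scale-invariant
print(np.min(y * (X @ (s * w) + s * b)))                    # 1.0: constraint normalized
```

This is exactly why "maximize the margin" and "minimize (1/2)||w||_2^2 subject to y_i(w^T x_i + b) ≥ 1" have the same solutions up to scaling.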
Soft Margin SVM (unlabeled points have ξ_i = 0)
Soft Margin SVM

Introduce slack variables:

    minimize  (1/2)||w||_2^2 + (C/n) Σ_{i=1}^n ξ_i
    subject to  y_i(w^T x_i + b) ≥ 1 − ξ_i  for all i
                ξ_i ≥ 0  for all i.

If ξ_i = 0 for all i, this reduces to the hard-margin SVM.
If ξ_i > 0, example i violates the margin; if ξ_i > 1, it is misclassified, i.e. it is on the wrong side of the hyperplane.
C controls the penalty for each unit of slack, and hence for each misclassification.
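At the optimum each slack takes the closed form ξ_i = max(0, 1 − y_i(w^T x_i + b)), the hinge loss. A minimal sketch (the (w, b) and the data are made up for the example):

```python
import numpy as np

def slacks(w, b, X, y):
    """xi_i = max(0, 1 - y_i (w^T x_i + b)): 0 on the good side of the margin,
    in (0, 1] for a margin violation, > 1 for a misclassification."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-0.5, 0.0]])
y = np.array([1.0, 1.0, 1.0])
print(slacks(w, b, X, y))  # slacks 0, 0.5, 1.5: safe, margin violation, misclassified
```

The three cases in the printout illustrate the three regimes described above.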
Soft Margin SVM (unlabeled points have ξ_i = 0)

1. If your data is linearly separable, which SVM (hard margin or soft margin) would you use?
2. Consider the optimization problem:

       minimize_{w,a,ξ}  Σ_{i=1}^n ξ_i
       subject to  y_i(w^T x_i + a) ≥ 1 − ξ_i,  ξ_i ≥ 0  for all i,
                   ||w||_2^2 ≤ r^2.
Optimize Over Cases Where Margin Is At Least 1/r
Overfitting: Tight Margin With No Misclassifications
Almost no margin
Training Error But Large Margin
Large margin
SVM Review : Primal and Dual Formulations
The SVM primal problem:

    minimize  (1/2)||w||_2^2 + (C/n) Σ_{i=1}^n ξ_i
    subject to  ξ_i ≥ 0  for i = 1, …, n
                1 − y_i[w^T x_i + b] − ξ_i ≤ 0  for i = 1, …, n.
Form the Lagrangian with a multiplier for each constraint:

    Constraint: −ξ_i ≤ 0                         multiplier: λ_i
    Constraint: 1 − y_i[w^T x_i + b] − ξ_i ≤ 0   multiplier: α_i

    L(w, b, ξ, α, λ) = (1/2)||w||_2^2 + (C/n) Σ_{i=1}^n ξ_i
                       + Σ_{i=1}^n α_i (1 − y_i[w^T x_i + b] − ξ_i)
                       + Σ_{i=1}^n λ_i (−ξ_i)
The SVM Dual Problem

By Slater's condition, we have strong duality (convex optimization + affine constraints + feasibility).
We can draw some insights from complementary slackness.
If x* is primal optimal and λ* is dual optimal, then f_0(x*) = g(λ*), so

    f_0(x*) = g(λ*) = f_0(x*) + Σ_{i=1}^m λ_i* f_i(x*).

Each term in the sum Σ_{i=1}^m λ_i* f_i(x*) must actually be 0. That is,

    λ_i* > 0  ⟹  f_i(x*) = 0    and    f_i(x*) < 0  ⟹  λ_i* = 0    for all i.
We found that the SVM dual problem can be written as:

    maximize_α  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j x_j^T x_i
    subject to  Σ_{i=1}^n α_i y_i = 0
                α_i ∈ [0, C/n],  i = 1, …, n

(from the first-order conditions on the Lagrangian).
Given a solution α* to the dual problem, the primal solution is

    w* = Σ_{i=1}^n α_i* y_i x_i.

Each α_i* y_i is a scalar, so the optimal solution lies in the span of the input examples.
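The recovery w* = Σ_i α_i* y_i x_i is one line of numpy. The α values below are made up (not an actual dual solution); the point is only the span property and that examples with α_i = 0 drop out.

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.0], [-1.0, -1.0]])  # rows are examples x_i
y = np.array([1.0, 1.0, -1.0])
alpha = np.array([0.3, 0.0, 0.3])                     # hypothetical dual variables
assert abs((alpha * y).sum()) < 1e-12                 # dual feasibility: sum_i alpha_i y_i = 0

w_star = (alpha * y) @ X   # w* = sum_i alpha_i y_i x_i
print(w_star)              # 0.3*x_1 - 0.3*x_3 = [0.6, 0.9]

# w* is a linear combination of the rows of X, i.e. it lies in their span;
# the example with alpha_i = 0 (here x_2) contributes nothing.
```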
We also know that α_i* ∈ [0, C/n], which is the 'weight' associated with each example, so C controls the maximum weight on each example.
Support Vectors and the Margin

Recall the "slack variable" ξ_i* = max(0, 1 − y_i f*(x_i)) is the hinge loss on (x_i, y_i).
Suppose ξ_i* = 0. Then y_i f*(x_i) ≥ 1: x_i is "on the margin" (= 1) or "on the good side" (> 1).
Complementary Slackness Conditions

Recall our primal constraints and Lagrange multipliers:

    Constraint: −ξ_i ≤ 0                      multiplier: λ_i
    Constraint: (1 − y_i f(x_i)) − ξ_i ≤ 0    multiplier: α_i

By strong duality, we must have complementary slackness: each term λ_i* f_i(x*) must be 0, so

    α_i* (1 − y_i f*(x_i) − ξ_i*) = 0
    λ_i* ξ_i* = (C/n − α_i*) ξ_i* = 0.

Recall that the first-order condition ∇_{ξ_i} L = 0 gave us λ_i* = C/n − α_i*.
Consequences of Complementary Slackness

By strong duality, we must have complementary slackness:

    α_i* (1 − y_i f*(x_i) − ξ_i*) = 0
    (C/n − α_i*) ξ_i* = 0.

If y_i f*(x_i) > 1, then x_i is strictly on the right side of the margin, the slack ξ_i* = 0, and we get α_i* = 0.
If y_i f*(x_i) < 1, then the margin constraint is violated and the slack ξ_i* > 0, so α_i* = C/n.
We also know that α_i* ∈ [0, C/n].

If α_i* = 0, then ξ_i* = 0, which implies no loss, so y_i f*(x_i) ≥ 1.
If α_i* ∈ (0, C/n), then ξ_i* = 0, which implies 1 − y_i f*(x_i) = 0.
Support Vectors

If α* is a solution to the dual problem, then the primal solution is w* = Σ_{i=1}^n α_i* y_i x_i, with α_i* ∈ [0, C/n] as the "weight" associated with that example.
In the case where α_i* = 0, w* has no dependence on the example x_i.
The x_i's corresponding to α_i* > 0 are called support vectors.
Few margin errors or "on the margin" examples ⟹ sparsity in input examples.
Complementary Slackness Results: Summary

    α_i* = 0         ⟹  y_i f*(x_i) ≥ 1
    α_i* ∈ (0, C/n)  ⟹  y_i f*(x_i) = 1
    α_i* = C/n       ⟹  y_i f*(x_i) ≤ 1

    y_i f*(x_i) < 1  ⟹  α_i* = C/n
    y_i f*(x_i) = 1  ⟹  α_i* ∈ [0, C/n]
    y_i f*(x_i) > 1  ⟹  α_i* = 0
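The forward implications can be read as a lookup from α_i* to where x_i can sit relative to the margin. `margin_position` is a hypothetical helper encoding that lookup, nothing more:

```python
def margin_position(alpha_i, c_over_n):
    """What the value of alpha_i* implies about y_i f*(x_i) (summary table above)."""
    if alpha_i == 0:
        return "y_i f*(x_i) >= 1  (on the margin or the good side)"
    if alpha_i == c_over_n:
        return "y_i f*(x_i) <= 1  (on the margin or violating it)"
    return "y_i f*(x_i) == 1  (exactly on the margin)"

print(margin_position(0.0, 0.5))
print(margin_position(0.2, 0.5))
print(margin_position(0.5, 0.5))
```

Note the converse directions are weaker: a point exactly on the margin only pins α_i* to the interval [0, C/n].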
Teaser for Kernelization
Dual Problem: Dependence on x Through Inner Products

The SVM dual problem:

    maximize_α  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j x_j^T x_i
    subject to  Σ_{i=1}^n α_i y_i = 0
                α_i ∈ [0, C/n],  i = 1, …, n.

Note that all dependence on the inputs x_i and x_j is through their inner product ⟨x_j, x_i⟩ = x_j^T x_i.
We can replace x_j^T x_i by any other inner product. This is a "kernelized" objective function.
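Swapping the inner product amounts to swapping the Gram matrix fed to the dual objective. A sketch with the RBF kernel k(x, x') = exp(−||x − x'||^2 / (2σ^2)) standing in for x_j^T x_i (the α, y, and data below are made-up illustrative values):

```python
import numpy as np

def linear_gram(X):
    """G[i, j] = x_i^T x_j -- the inner products the dual objective uses."""
    return X @ X.T

def rbf_gram(X, sigma=1.0):
    """G[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), a drop-in replacement."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma**2))

def dual_objective(alpha, y, G):
    """sum_i alpha_i - (1/2) sum_{ij} alpha_i alpha_j y_i y_j G[i, j]."""
    return alpha.sum() - 0.5 * (alpha * y) @ G @ (alpha * y)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
alpha = np.array([0.1, 0.1, 0.2])  # feasible: sum_i alpha_i y_i = 0
y = np.array([1.0, 1.0, -1.0])
print(dual_objective(alpha, y, linear_gram(X)))  # objective with the linear kernel
print(dual_objective(alpha, y, rbf_gram(X)))     # same objective, kernelized
```

Only the Gram matrix changes; the optimization problem over α is otherwise identical, which is what makes kernelization so cheap to apply.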