
Recitation 4
Recap of SVMs and Complementary Slackness
Vishakh (CDS)
February 16, 2022


Recap: Hyperplanes to SVMs
Hard-margin vs Soft-margin SVMs
Preview of Complementary Slackness + Kernelization

Component of $v_1$, $v_2$ in the Direction $w$

[Figure: projections of $v_1$ and $v_2$ onto $w$.]

The component of a point $v$ in the direction $w$ is $\frac{w^T v}{\|w\|_2}$, so the two points project to $\frac{w^T v_1}{\|w\|_2}$ and $\frac{w^T v_2}{\|w\|_2}$.

Intro Review

Level Surfaces of $f(v) = w^T v$ with $\|w\|_2 = 1$

[Figure: parallel level surfaces $w^T v = c$ for integer values of $c$; each level surface is a hyperplane orthogonal to $w$, and increasing $c$ shifts the hyperplane in the direction of $w$.]

Sides of the Hyperplane $w^T v = 15$

[Figure: the hyperplane $w^T v = 15$ splits the space into the side where $w^T v - 15 > 0$ and the side where $w^T v - 15 < 0$.]

Signed Distance from $x_1$, $x_2$ to the Hyperplane $w^T v = 20$

[Figure: $x_2$ lies on the level surface $w^T v = 27$.] The signed distances are $\frac{w^T x_1 - 20}{\|w\|_2}$ and $\frac{w^T x_2 - 20}{\|w\|_2} = \frac{7}{\sqrt{10}}$.

Linearly Separable

Definition. We say $(x_i, y_i)$ for $i = 1, \dots, n$, with $y_i = \pm 1$, are linearly separable if there is a $w \in \mathbb{R}^d$ and an $a \in \mathbb{R}$ such that $y_i (w^T x_i + a) > 0$ for all $i$. The set $\{v \in \mathbb{R}^d \mid w^T v + a = 0\}$ is called a separating hyperplane.
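As a quick numerical companion, here is a minimal numpy sketch with hypothetical points, chosen so one distance matches the $\frac{7}{\sqrt{10}}$ above:

```python
import numpy as np

# Hypothetical hyperplane w^T v = 20 with ||w||_2 = sqrt(10),
# chosen to match the slide's numbers.
w = np.array([1.0, 3.0])
a = -20.0                           # hyperplane: w^T v + a = 0

def signed_distance(x):
    """Signed distance from x to the hyperplane w^T v + a = 0."""
    return (w @ x + a) / np.linalg.norm(w)

x1 = np.array([2.0, 2.0])           # w^T x1 = 8  -> distance -12/sqrt(10)
x2 = np.array([3.0, 8.0])           # w^T x2 = 27 -> distance   7/sqrt(10)
print(signed_distance(x1), signed_distance(x2))

# Separability check straight from the definition: y_i (w^T x_i + a) > 0 for all i.
X, y = np.array([x1, x2]), np.array([-1, 1])
print(np.all(y * (X @ w + a) > 0))  # True: (w, a) separates these two points
```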

Linearly Separable Data

Many Separating Hyperplanes Exist

Maximum Margin Separating Hyperplane

[Figure: the separating hyperplane $\frac{w^T v + a}{\|w\|_2} = 0$ with margin boundaries $\frac{w^T v + a}{\|w\|_2} = M$ and $\frac{w^T v + a}{\|w\|_2} = -M$.]

Maximizing the Margin

We can rewrite this in a more standard form:

maximize over $w, b, M$:   $M$
subject to:   $\frac{y_i (w^T x_i + b)}{\|w\|_2} \ge M$   for all $i$.

Let's fix the norm $\|w\|_2$ to $1/M$ to obtain:

maximize over $w, b$:   $\frac{1}{\|w\|_2}$
subject to:   $y_i (w^T x_i + b) \ge 1$   for all $i$.

This is equivalent to solving the minimization problem:

minimize over $w, b$:   $\frac{1}{2} \|w\|_2^2$
subject to:   $y_i (w^T x_i + b) \ge 1$   for all $i$.
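The final problem is a quadratic program, so any off-the-shelf QP solver applies. Below is a minimal sketch using cvxpy; the variable names mirror the slide, and the small separable dataset is an assumption made for illustration.

```python
import cvxpy as cp
import numpy as np

# Hypothetical separable data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)

# Hard-margin SVM: minimize (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1.
w, b = cp.Variable(2), cp.Variable()
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w + b) >= 1])
prob.solve()
print("w* =", w.value, "b* =", b.value)
print("margin M = 1/||w*|| =", 1 / np.linalg.norm(w.value))
```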

Soft Margin SVM (unlabeled points have $\xi_i = 0$)

[Figure: soft-margin SVM with slack values $\xi_i$ marked for the points that violate the margin.]

Soft Margin SVM

Introduce slack variables:

minimize over $w, b, \xi$:   $\frac{1}{2} \|w\|_2^2 + \frac{c}{n} \sum_{i=1}^n \xi_i$
subject to:   $y_i (w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$   for all $i$.

If $\xi_i = 0$ for all $i$, the problem reduces to the hard-margin SVM.
If $\xi_i > 0$, example $i$ violates the margin; once $\xi_i > 1$ it is on the wrong side of the hyperplane, i.e., misclassified.
$c$ controls the penalty for each margin violation.
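A matching sketch for the soft-margin problem, again via cvxpy on hypothetical (now overlapping) data, keeping the $\frac{c}{n}$ scaling used above; $c = 1$ is an arbitrary illustrative choice.

```python
import cvxpy as cp
import numpy as np

# Hypothetical overlapping data, so some slack is unavoidable.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.0, (20, 2)), rng.normal(1, 1.0, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
n, c = len(y), 1.0

# minimize (1/2)||w||^2 + (c/n) sum_i xi_i
# s.t. y_i (w^T x_i + b) >= 1 - xi_i  and  xi_i >= 0.
w, b, xi = cp.Variable(2), cp.Variable(), cp.Variable(n)
prob = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w) + (c / n) * cp.sum(xi)),
    [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0])
prob.solve()
print("margin violations (xi_i > 0):", int(np.sum(xi.value > 1e-6)))
```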

Soft Margin SVM (unlabeled points have $\xi_i = 0$)

1. If your data is linearly separable, which SVM (hard margin or soft margin) would you use?

2. Consider the optimization problem:

minimize over $w, a, \xi$:   $\sum_{i=1}^n \xi_i$
subject to:   $y_i (w^T x_i + a) \ge 1 - \xi_i$ and $\xi_i \ge 0$   for all $i$,
              $\|w\|_2^2 \le r^2$.

Optimize Over Cases Where Margin Is At Least 1/r

Overfitting: Tight Margin With No Misclassifications
Almost no margin

Training Error But Large Margin
Large margin

SVM Review: Primal and Dual Formulations

The SVM primal problem:

minimize over $w, b, \xi$:   $\frac{1}{2} \|w\|_2^2 + \frac{c}{n} \sum_{i=1}^n \xi_i$
subject to:   $-\xi_i \le 0$   for $i = 1, \dots, n$,
              $1 - y_i \left[ w^T x_i + b \right] - \xi_i \le 0$   for $i = 1, \dots, n$.

SVM Review: Primal and Dual Formulations

Attach a Lagrange multiplier to each primal constraint:

Constraint                                           Multiplier
$-\xi_i \le 0$                                       $\lambda_i$
$1 - y_i \left[ w^T x_i + b \right] - \xi_i \le 0$   $\alpha_i$

$L(w, b, \xi, \alpha, \lambda) = \frac{1}{2} \|w\|_2^2 + \frac{c}{n} \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i \left( 1 - y_i \left[ w^T x_i + b \right] - \xi_i \right) + \sum_{i=1}^n \lambda_i (-\xi_i)$
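To preview the first-order conditions used on the next slides, here is a small sympy sketch differentiating this Lagrangian in the simplest case, $n = 1$ and $d = 1$ (so $\frac{c}{n}$ collapses to $c$); the symbol names are illustrative stand-ins for those on the slide.

```python
import sympy as sp

# One example (n = 1) in one dimension (d = 1); lam plays the slide's lambda_i.
w, b, xi, alpha, lam, x, y, c = sp.symbols('w b xi alpha lambda x y c')

L = sp.Rational(1, 2) * w**2 + c * xi \
    + alpha * (1 - y * (w * x + b) - xi) + lam * (-xi)

print(sp.solve(sp.diff(L, w), w))   # [alpha*x*y] : w* = alpha y x (span of inputs)
print(sp.diff(L, b))                # -alpha*y    : generalizes to sum_i alpha_i y_i = 0
print(sp.diff(L, xi))               # -alpha + c - lambda : lambda = c - alpha
```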

SVM Review: Primal and Dual Formulations

The SVM Dual Problem

By Slater's conditions, we have strong duality (convex optimization + affine constraints + feasibility).
We can draw some insights from complementary slackness.
If $x^*$ is primal optimal and $\lambda^*$ is dual optimal, then $f_0(x^*) = g(\lambda^*)$, so
$f_0(x^*) = g(\lambda^*) = f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*)$.
Each term in the sum $\sum_{i=1}^m \lambda_i^* f_i(x^*)$ must actually be 0: each $\lambda_i^* \ge 0$ and $f_i(x^*) \le 0$, so every term is nonpositive, yet the terms sum to 0.
That is, $\lambda_i^* > 0 \implies f_i(x^*) = 0$ and $f_i(x^*) < 0 \implies \lambda_i^* = 0$ for all $i$.

We found the SVM dual problem can be written as:

maximize over $\alpha$:   $\sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_j^T x_i$
subject to:   $\sum_{i=1}^n \alpha_i y_i = 0$   (first-order conditions on the Lagrangian),
              $\alpha_i \in \left[ 0, \frac{c}{n} \right]$,   $i = 1, \dots, n$.

Given a solution $\alpha^*$ to the dual problem, the primal solution is $w^* = \sum_{i=1}^n \alpha_i^* y_i x_i$.
Each $\alpha_i^* y_i$ is a scalar, so the optimal solution lies in the span of the input examples.
We also know that $\alpha_i^* \in \left[ 0, \frac{c}{n} \right]$, which is the 'weight' associated with each example. So $c$ controls the max weight on each example.

Support Vectors and the Margin

Recall the "slack variable" $\xi_i^* = \max(0, 1 - y_i f^*(x_i))$ is the hinge loss on $(x_i, y_i)$.
Suppose $\xi_i^* = 0$. Then $y_i f^*(x_i) \ge 1$: "on the margin" ($= 1$) or "on the good side" ($> 1$).
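A hedged cvxpy sketch of the dual on hypothetical data, recovering $w^* = \sum_i \alpha_i^* y_i x_i$ and counting the nonzero weights. The quadratic term is written as $\|M^T \alpha\|_2^2$ with $M = \mathrm{diag}(y) X$, which equals $\sum_{i,j} \alpha_i \alpha_j y_i y_j x_j^T x_i$ and lets the solver see a concave objective.

```python
import cvxpy as cp
import numpy as np

# Hypothetical data; c = 1 is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
n, c = len(y), 1.0

M = y[:, None] * X                       # row i is y_i x_i
alpha = cp.Variable(n)
dual = cp.Problem(
    cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(M.T @ alpha)),
    [alpha @ y == 0, alpha >= 0, alpha <= c / n])
dual.solve()

w_star = M.T @ alpha.value               # w* = sum_i alpha_i* y_i x_i
print("w* =", w_star)
print("examples with alpha_i* > 0 (support vectors):",
      int(np.sum(alpha.value > 1e-6)))
```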

Complementary Slackness Conditions

Recall our primal constraints and Lagrange multipliers:

Constraint                                  Multiplier
$-\xi_i \le 0$                              $\lambda_i$
$(1 - y_i f(x_i)) - \xi_i \le 0$            $\alpha_i$

By strong duality, we must have complementary slackness. Each term of $\sum_{i=1}^m \lambda_i^* f_i(x^*)$ must be 0:

$\alpha_i^* \left( 1 - y_i f^*(x_i) - \xi_i^* \right) = 0$
$\lambda_i^* \xi_i^* = \left( \frac{c}{n} - \alpha_i^* \right) \xi_i^* = 0$

Recall the first-order condition $\nabla_{\xi_i} L = 0$ gave us $\lambda_i^* = \frac{c}{n} - \alpha_i^*$.

Complementary Slackness Conditions

Consequences of Complementary Slackness

By strong duality, we must have complementary slackness:

$\alpha_i^* \left( 1 - y_i f^*(x_i) - \xi_i^* \right) = 0$
$\left( \frac{c}{n} - \alpha_i^* \right) \xi_i^* = 0$

If $y_i f^*(x_i) > 1$, then you're on the right side of the margin, so the slack $\xi_i^* = 0$ and we get $\alpha_i^* = 0$.
If $y_i f^*(x_i) < 1$, then the margin is violated (the point is outright misclassified once $y_i f^*(x_i) < 0$); the slack $\xi_i^* > 0$, so $\alpha_i^* = \frac{c}{n}$.

Complementary Slackness Conditions

Consequences of Complementary Slackness (continued)

By strong duality, we must have complementary slackness:

$\alpha_i^* \left( 1 - y_i f^*(x_i) - \xi_i^* \right) = 0$
$\left( \frac{c}{n} - \alpha_i^* \right) \xi_i^* = 0$

We also know that $\alpha_i^* \in \left[ 0, \frac{c}{n} \right]$.
If $\alpha_i^* = 0$, then $\lambda_i^* = \frac{c}{n} > 0$ forces $\xi_i^* = 0$, which implies no loss, so $y_i f^*(x_i) \ge 1$.
If $\alpha_i^* \in \left( 0, \frac{c}{n} \right)$, then again $\xi_i^* = 0$, and $\alpha_i^* > 0$ forces $1 - y_i f^*(x_i) = 0$, i.e., the example sits exactly on the margin.

Complementary Slackness Conditions

Support Vectors

If $\alpha^*$ is a solution to the dual problem, then the primal solution is $w^* = \sum_{i=1}^n \alpha_i^* y_i x_i$,
with $\alpha_i^* \in \left[ 0, \frac{c}{n} \right]$ as the 'weight' associated with each example.
In the case where $\alpha_i^* = 0$, $w^*$ has no dependence on that example $x_i$.
The $x_i$'s corresponding to $\alpha_i^* > 0$ are called support vectors.
Few margin errors or "on the margin" examples $\implies$ sparsity in the input examples.

Complementary Slackness Conditions

Complementary Slackness Results: Summary

$\alpha_i^* = 0 \implies y_i f^*(x_i) \ge 1$
$\alpha_i^* \in \left( 0, \frac{c}{n} \right) \implies y_i f^*(x_i) = 1$
$\alpha_i^* = \frac{c}{n} \implies y_i f^*(x_i) \le 1$

$y_i f^*(x_i) < 1 \implies \alpha_i^* = \frac{c}{n}$
$y_i f^*(x_i) = 1 \implies \alpha_i^* \in \left[ 0, \frac{c}{n} \right]$
$y_i f^*(x_i) > 1 \implies \alpha_i^* = 0$
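These implications can be spot-checked against a fitted solver. The sketch below uses scikit-learn's SVC, whose parameterization is $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$, so its dual variables live in $[0, C]$ rather than $[0, \frac{c}{n}]$; the conditions are otherwise identical, and the checks only hold up to solver tolerance.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping data so all three alpha regimes appear.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.0, (50, 2)), rng.normal(1, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)
margin = y * clf.decision_function(X)             # y_i f*(x_i)

alpha = np.zeros(len(y))
alpha[clf.support_] = np.abs(clf.dual_coef_[0])   # dual_coef_ stores alpha_i y_i

tol = 1e-2                                        # loose solver tolerance
print("alpha = 0     => y f >= 1:", np.all(margin[alpha < 1e-8] >= 1 - tol))
free = (alpha > 1e-8) & (alpha < C - 1e-8)
print("0 < alpha < C => y f  = 1:", np.allclose(margin[free], 1, atol=tol))
print("alpha = C     => y f <= 1:", np.all(margin[alpha > C - 1e-8] <= 1 + tol))
```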

Teaser for Kernelization

Dual problem: dependence on $x$ only through inner products. The SVM dual problem:

maximize over $\alpha$:   $\sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_j^T x_i$
subject to:   $\sum_{i=1}^n \alpha_i y_i = 0$,
              $\alpha_i \in \left[ 0, \frac{c}{n} \right]$,   $i = 1, \dots, n$.

Note that all dependence on the inputs $x_i$ and $x_j$ is through their inner product: $\langle x_j, x_i \rangle = x_j^T x_i$.
We can replace $x_j^T x_i$ by any other inner product... This is a "kernelized" objective function.
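A minimal numpy sketch of that substitution: the dual touches the data only through the Gram matrix $K_{ij} = \langle x_i, x_j \rangle$, so swapping the linear Gram matrix for, say, an RBF kernel changes nothing else in the problem; the bandwidth gamma below is an illustrative choice.

```python
import numpy as np

def linear_gram(X):
    """K[i, j] = x_j^T x_i, exactly the inner products in the dual objective."""
    return X @ X.T

def rbf_gram(X, gamma=0.5):
    """K[i, j] = exp(-gamma ||x_i - x_j||^2), a drop-in replacement."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * (X @ X.T)   # pairwise squared distances
    return np.exp(-gamma * d2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(linear_gram(X))   # the kernelized dual just replaces this matrix...
print(rbf_gram(X))      # ...with this one (or any other valid kernel).
```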
