Introduction to Machine Learning Soft Margin SVMs
(Efficient) Feature Maps
Prof. Kutty
Announcements
• HW1 due on Wednesday
– remember early upload!
– late days are for emergency use!
• Project 1 released after HW1 is due
• Project 1 QuickStart on Thursday
Today's Agenda
• Recap: Hard Margin SVMs
• Section 1: Soft Margin SVMs
• Section 2: Feature maps
• Section 3:
Recap: Hard Margin SVMs
Perceptron algorithm: finds some hyperplane $\mathrm{sign}(\bar{\theta} \cdot \bar{x} + b)$ that classifies the training set correctly. Hard Margin SVM: finds the maximum-margin one.
Assuming data are linearly separable, we want a boundary that
• classifies the training set correctly
• is maximally removed from the training examples closest to the decision boundary

$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$

That is, among the classifiers that correctly classify the training data, choose the one farthest from the datapoints closest to the decision boundary.
Hard Margin SVM
Assuming data are linearly separable.
Support vectors lie on the margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$; the decision boundary is $\bar{\theta} \cdot \bar{x} + b = 0$.

$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$

Linear classifier output by this Quadratic Program (QP): $h(\bar{x}) = \mathrm{sign}(\bar{\theta} \cdot \bar{x} + b)$
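To make the QP concrete, here is a minimal sketch that solves this program directly with the cvxpy modeling library; the toy data, variable names, and the use of cvxpy are assumptions for illustration, not part of the lecture.

```python
# Minimal hard-margin SVM as a quadratic program (sketch, assumes linearly separable data).
import numpy as np
import cvxpy as cp

# Toy linearly separable data: rows of X are examples, labels y in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
n, d = X.shape

theta = cp.Variable(d)   # weight vector (theta-bar in the slides)
b = cp.Variable()        # offset

# Objective: minimize ||theta||^2 / 2
objective = cp.Minimize(0.5 * cp.sum_squares(theta))
# Constraints: y^(i) (theta . x^(i) + b) >= 1 for every training example
constraints = [cp.multiply(y, X @ theta + b) >= 1]

cp.Problem(objective, constraints).solve()
print("theta =", theta.value, "b =", b.value)
# Resulting classifier: h(x) = sign(theta . x + b)
```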
Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Constraints seem too restrictive
– Fix the constraints: Soft-Margin SVMs
2. Map to a higher dimensional space
Section 1: Soft Margin SVMs
Soft-Margin SVM
Suppose data are not linearly separable.
Check, for each training example, whether the hard-margin constraint is satisfied:
CONSTRAINT: $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i = 1, \dots, 7$    SATISFIED? [see figure: some constraints are violated]

Hard-margin formulation:
$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$
Soft-Margin SVM
Suppose data are not linearly separable.
For each example, record whether the constraint holds and, if not, a penalty $\xi_i$ ($\xi$ is the Greek letter xi):
• $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) = 1$ or $> 1$: constraint satisfied, no penalty
• constraint violated: penalty $\xi_i > 0$ (e.g., $0 < \xi_i < 1$ for a point that is correctly classified but inside the margin)

Hard-margin formulation (to be relaxed):
$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$
Soft-Margin SVM
Suppose data are not linearly separable. Introduce slack variables $\xi_i$:

$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i \qquad \text{(objective function)}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$ (soft constraints)
where $\bar{\theta} \in \mathbb{R}^d$, $b \in \mathbb{R}$.
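A minimal sketch of this soft-margin program with explicit slack variables, again using cvxpy; the toy data and the choice C = 1.0 are assumptions for illustration.

```python
# Soft-margin SVM with explicit slack variables xi_i (sketch).
import numpy as np
import cvxpy as cp

# Toy data that are NOT linearly separable (one "+" point sits among the "-" points).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0], [-1.5, -0.5]])
y = np.array([1, 1, -1, -1, 1])
n, d = X.shape
C = 1.0  # hyperparameter trading off margin size vs. total slack

theta = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n, nonneg=True)          # slack variables, xi_i >= 0

# Objective: ||theta||^2 / 2 + C * sum_i xi_i
objective = cp.Minimize(0.5 * cp.sum_squares(theta) + C * cp.sum(xi))
# Soft constraints: y^(i) (theta . x^(i) + b) >= 1 - xi_i
constraints = [cp.multiply(y, X @ theta + b) >= 1 - xi]

cp.Problem(objective, constraints).solve()
print("slacks:", np.round(xi.value, 3))  # nonzero slack marks margin violations
```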
Soft-Margin SVM
Soft-margin SVM advantages:
• can handle data that are not linearly separable
• reduces the effect of outliers and lowers the chance of overfitting

$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$
where $C$ is a hyperparameter.

[Figure: effect of the slack penalty; margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$ shown for different values of $C$, e.g. $C = 100$ and $C = 0.01$]
Image credit: Barzilay & Jaakkola
Soft-Margin SVM hyperparameter
[Figure: decision boundary $\bar{\theta} \cdot \bar{x} + b = 0$ and margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$ for higher values of $C$]
Image credit: Barzilay & Jaakkola
Soft-Margin SVM hyperparameter
[Figure: decision boundary $\bar{\theta} \cdot \bar{x} + b = 0$ and margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$ for lower values of $C$ (e.g. $C = 0.01$); $C$ acts as the regularization parameter]
Image credit: Barzilay & Jaakkola
Soft-Margin SVM: hyperparameter
$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$

With C = 0, this is the hard margin SVM
A. True
B. False
Soft-Margin SVM: hyperparameter
$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$

Intuitively
as $C \uparrow$, the penalty on errors/misclassifications $\uparrow$
as $C \to \infty$, in the limit this is the hard margin SVM
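One way to check this intuition empirically, sketched below with scikit-learn on assumed synthetic data: fit a linear-kernel SVM for several values of C and count margin violations and support vectors.

```python
# Effect of the hyperparameter C on a soft-margin SVM (sketch using scikit-learn).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Overlapping blobs, so the data are not linearly separable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
y = 2 * y - 1  # relabel to {-1, +1}

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins = y * clf.decision_function(X)   # y^(i) (theta . x^(i) + b)
    violations = np.sum(margins < 1)          # examples with nonzero slack
    print(f"C={C:7.2f}  margin violations: {violations:3d}  "
          f"support vectors: {clf.n_support_.sum()}")
```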
Data that are not linearly separable
[Figure: interleaved + and − examples that no linear boundary separates]
Idea: minimize empirical risk with hinge loss using gradient descent (see the sketch below)
Idea: Soft Margin SVMs
Idea: feature maps
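A rough illustration of the first idea above: a minimal (sub)gradient-descent sketch for the average hinge loss. The data, learning rate, and iteration count are assumptions, not the course's reference implementation.

```python
# Minimizing average hinge loss with (sub)gradient descent (sketch, no regularization).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=100) > 0, 1, -1)

theta = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(200):
    margins = y * (X @ theta + b)          # y^(i) (theta . x^(i) + b)
    active = margins < 1                   # examples with nonzero hinge loss
    # Subgradient of (1/n) * sum_i max(0, 1 - y^(i)(theta . x^(i) + b))
    grad_theta = -(y[active][:, None] * X[active]).sum(axis=0) / len(X)
    grad_b = -y[active].sum() / len(X)
    theta -= lr * grad_theta
    b -= lr * grad_b

print("training error:", np.mean(np.sign(X @ theta + b) != y))
```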
Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Constraints seem too restrictive
– Fix the constraints: Soft-Margin SVMs
2. Map to a higher dimensional space
– map data to a higher dim. space in which there exists a separating hyperplane (corresponds to a non-linear decision boundary in the original space)
Section 2: Feature maps
Linear Separability with offset
[Figure: "Original 1D space" with classifier $h(z) = \mathrm{sign}(z^2 - 4z + 3.5)$ and decision boundary $z^2 - 4z + 3.5 = 0$; "Projected 2D space" in which the boundary of $\mathrm{sign}(\bar{\theta} \cdot \phi(z) + b)$ is linear]

What feature map will allow us to learn a linear decision boundary with zero training error in the mapped space?
A. $\phi(z) = [z, z^2]^T$
B. $\phi(z) = [z, z^2, 1]^T$
C. $\phi(z) = [z, z^2, z^3, 1]^T$
D. all of the above

Classifier $h(z) = \mathrm{sign}(z^2 - 4z + 3.5)$
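One way to sanity-check a candidate map (a sketch with assumed toy inputs): apply it and verify that some linear weight vector in the mapped space reproduces $h(z) = \mathrm{sign}(z^2 - 4z + 3.5)$. The weights below correspond to option B and are chosen by hand for illustration.

```python
# Checking a candidate feature map against h(z) = sign(z^2 - 4z + 3.5) (sketch).
import numpy as np

z = np.linspace(0, 4, 41)                      # toy 1D inputs
labels = np.sign(z**2 - 4*z + 3.5)             # targets from the original classifier

def phi(z):
    # Candidate map B: phi(z) = [z, z^2, 1]^T for each input z
    return np.stack([z, z**2, np.ones_like(z)], axis=1)

# Linear weights in the mapped space that reproduce z^2 - 4z + 3.5:
theta = np.array([-4.0, 1.0, 3.5])
pred = np.sign(phi(z) @ theta)
print("zero training error:", np.all(pred == labels))
```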
Linear classifiers in higher dimensional spaces: exercise
'ideal' decision boundary: circle with center at (2,2) and radius 1
$(x_1 - 2)^2 + (x_2 - 2)^2 = 1 \iff x_1^2 + x_2^2 - 4x_1 - 4x_2 + 7 = 0$
[Figure: + and − examples separated by this circle]
Feature map: $\phi(\bar{x}) = [x_1^2, x_2^2, x_1, x_2, 1]^T$ with $\bar{\theta} = [1, 1, -4, -4, 7]^T$
Linear classifier $\mathrm{sign}(\bar{\theta} \cdot \phi(\bar{x}))$ separates positive from negative examples
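A small numerical check of this exercise (a sketch; the two sample points are assumptions):

```python
# Circle decision boundary becomes linear after the feature map (sketch).
import numpy as np

def phi(x):
    # phi(x) = [x1^2, x2^2, x1, x2, 1]^T
    x1, x2 = x
    return np.array([x1**2, x2**2, x1, x2, 1.0])

theta = np.array([1.0, 1.0, -4.0, -4.0, 7.0])  # theta . phi(x) = (x1-2)^2 + (x2-2)^2 - 1

inside = np.array([2.0, 2.0])      # inside the circle of radius 1 centered at (2, 2)
outside = np.array([4.0, 4.0])     # outside the circle
print(np.sign(theta @ phi(inside)))   # -1: points inside the circle get one label
print(np.sign(theta @ phi(outside)))  # +1: points outside get the other label
```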
Linear classifiers
in higher dimensional spaces: idea
map data to a higher dim. space in which there exists a separating hyperplane (corresponds to a non-linear decision boundary in the original space)
Implications for SVM
Apply the soft-margin SVM to the mapped data $\phi(\bar{x}^{(i)})$:
$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \phi(\bar{x}^{(i)}) + b) \geq 1 - \xi_i$ for $i \in \{1, \dots, n\}$ and $\xi_i \geq 0$
Issue: potentially very inefficient (the mapped space may be very high dimensional)
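To see why this can be inefficient, count how fast the mapped dimension grows. The sketch below uses scikit-learn's PolynomialFeatures as an assumed stand-in for a polynomial feature map:

```python
# Dimensionality blow-up of explicit polynomial feature maps (sketch).
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

for d in [2, 10, 100]:
    for degree in [2, 3]:
        X = np.zeros((1, d))  # a single dummy example, just to size the map
        n_features = PolynomialFeatures(degree=degree).fit_transform(X).shape[1]
        print(f"input dim d={d:4d}, degree={degree}: mapped dim = {n_features}")
# Solving the SVM directly over phi(x) must handle vectors of this mapped size.
```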
Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Soft-Margin SVMs
2. Map to a higher dimensional space
3. SVM dual and the kernel trick