Introduction to Machine Learning Soft Margin SVMs
(Efficient) Feature Maps
Prof. Kutty
Announcements
• HW1 due on Wednesday
– remember early upload!
– late days are for emergency use!
• Project 1 released after HW1 is due
• Project 1 QuickStart on Thursday
Today's Agenda
• Recap: Hard Margin SVMs
• Section 1: Soft Margin SVMs
• Section 2: Feature maps
• Section 3:
Recap: Hard Margin SVMs
Perceptron algorithm: finds some hyperplane $\mathrm{sign}(\bar{\theta} \cdot \bar{x} + b)$ that classifies the training set correctly. Hard Margin SVM: finds the maximum-margin one.
Assuming data are linearly separable, we want a boundary that
• classifies the training set correctly
• is maximally removed from the training examples closest to the decision boundary

$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$

That is, among the classifiers that correctly classify the training data, choose the one farthest from the datapoints closest to the decision boundary.
Hard Margin SVM
Assuming data are linearly separable.
Support vectors lie on the margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$; the decision boundary is $\bar{\theta} \cdot \bar{x} + b = 0$.

$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$

Linear classifier output by this Quadratic Program (QP): $h(\bar{x}) = \mathrm{sign}(\bar{\theta} \cdot \bar{x} + b)$
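To make the QP concrete, here is a minimal sketch that solves this program directly with the cvxpy modeling library; the toy data, variable names, and the use of cvxpy are assumptions for illustration, not part of the lecture.

```python
# Minimal hard-margin SVM as a quadratic program (sketch, assumes linearly separable data).
import numpy as np
import cvxpy as cp

# Toy linearly separable data: rows of X are examples, labels y in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
n, d = X.shape

theta = cp.Variable(d)   # weight vector (theta-bar in the slides)
b = cp.Variable()        # offset

# Objective: minimize ||theta||^2 / 2
objective = cp.Minimize(0.5 * cp.sum_squares(theta))
# Constraints: y^(i) (theta . x^(i) + b) >= 1 for every training example
constraints = [cp.multiply(y, X @ theta + b) >= 1]

cp.Problem(objective, constraints).solve()
print("theta =", theta.value, "b =", b.value)
# Resulting classifier: h(x) = sign(theta . x + b)
```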
Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Constraints seem too restrictive
– Fix the constraints: Soft-Margin SVMs
2. Map to a higher dimensional space
Section 1: Soft Margin SVMs
Soft-Margin SVM
Suppose data are not linearly separable.
Check, for each training example, whether the hard-margin constraint is satisfied:
CONSTRAINT: $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i = 1, \dots, 7$    SATISFIED? [see figure: some constraints are violated]

Hard-margin formulation:
$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$
Soft-Margin SVM
Suppose data are not linearly separable.
For each example, record whether the constraint holds and, if not, a penalty $\xi_i$ ($\xi$ is the Greek letter xi):
• $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) = 1$ or $> 1$: constraint satisfied, no penalty
• constraint violated: penalty $\xi_i > 0$ (e.g., $0 < \xi_i < 1$ for a point that is correctly classified but inside the margin)

Hard-margin formulation (to be relaxed):
$$\min_{\bar{\theta}, b} \frac{\|\bar{\theta}\|^2}{2}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1$ for $i \in \{1, \dots, n\}$
Soft-Margin SVM
Suppose data are not linearly separable. Introduce slack variables $\xi_i$:

$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i \qquad \text{(objective function)}$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$ (soft constraints)
where $\bar{\theta} \in \mathbb{R}^d$, $b \in \mathbb{R}$.
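A minimal sketch of this soft-margin program with explicit slack variables, again using cvxpy; the toy data and the choice C = 1.0 are assumptions for illustration.

```python
# Soft-margin SVM with explicit slack variables xi_i (sketch).
import numpy as np
import cvxpy as cp

# Toy data that are NOT linearly separable (one "+" point sits among the "-" points).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0], [-1.5, -0.5]])
y = np.array([1, 1, -1, -1, 1])
n, d = X.shape
C = 1.0  # hyperparameter trading off margin size vs. total slack

theta = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n, nonneg=True)          # slack variables, xi_i >= 0

# Objective: ||theta||^2 / 2 + C * sum_i xi_i
objective = cp.Minimize(0.5 * cp.sum_squares(theta) + C * cp.sum(xi))
# Soft constraints: y^(i) (theta . x^(i) + b) >= 1 - xi_i
constraints = [cp.multiply(y, X @ theta + b) >= 1 - xi]

cp.Problem(objective, constraints).solve()
print("slacks:", np.round(xi.value, 3))  # nonzero slack marks margin violations
```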
Soft-Margin SVM
Soft-margin SVM advantages:
• can handle data that are not linearly separable
• reduces the effect of outliers and lowers the chance of overfitting

$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$
where $C$ is a hyperparameter.

[Figure: effect of the slack penalty; margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$ shown for different values of $C$, e.g. $C = 100$ and $C = 0.01$]
Image credit: Barzilay & Jaakkola
Soft-Margin SVM hyperparameter
[Figure: decision boundary $\bar{\theta} \cdot \bar{x} + b = 0$ and margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$ for higher values of $C$]
Image credit: Barzilay & Jaakkola
Soft-Margin SVM hyperparameter
[Figure: decision boundary $\bar{\theta} \cdot \bar{x} + b = 0$ and margin boundaries $\bar{\theta} \cdot \bar{x} + b = \pm 1$ for lower values of $C$ (e.g. $C = 0.01$); $C$ acts as the regularization parameter]
Image credit: Barzilay & Jaakkola
Soft-Margin SVM: hyperparameter
$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$

With C = 0, this is the hard margin SVM
A. True
B. False
Soft-Margin SVM: hyperparameter
$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \bar{x}^{(i)} + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i \in \{1, \dots, n\}$

Intuitively
as $C \uparrow$, the penalty on errors/misclassifications $\uparrow$
as $C \to \infty$, in the limit this is the hard margin SVM
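One way to check this intuition empirically, sketched below with scikit-learn on assumed synthetic data: fit a linear-kernel SVM for several values of C and count margin violations and support vectors.

```python
# Effect of the hyperparameter C on a soft-margin SVM (sketch using scikit-learn).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Overlapping blobs, so the data are not linearly separable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)
y = 2 * y - 1  # relabel to {-1, +1}

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins = y * clf.decision_function(X)   # y^(i) (theta . x^(i) + b)
    violations = np.sum(margins < 1)          # examples with nonzero slack
    print(f"C={C:7.2f}  margin violations: {violations:3d}  "
          f"support vectors: {clf.n_support_.sum()}")
```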
Data that are not linearly separable
[Figure: interleaved + and − examples that no linear boundary separates]
Idea: minimize empirical risk with hinge loss using gradient descent (see the sketch below)
Idea: Soft Margin SVMs
Idea: feature maps
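A rough illustration of the first idea above: a minimal (sub)gradient-descent sketch for the average hinge loss. The data, learning rate, and iteration count are assumptions, not the course's reference implementation.

```python
# Minimizing average hinge loss with (sub)gradient descent (sketch, no regularization).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=100) > 0, 1, -1)

theta = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(200):
    margins = y * (X @ theta + b)          # y^(i) (theta . x^(i) + b)
    active = margins < 1                   # examples with nonzero hinge loss
    # Subgradient of (1/n) * sum_i max(0, 1 - y^(i)(theta . x^(i) + b))
    grad_theta = -(y[active][:, None] * X[active]).sum(axis=0) / len(X)
    grad_b = -y[active].sum() / len(X)
    theta -= lr * grad_theta
    b -= lr * grad_b

print("training error:", np.mean(np.sign(X @ theta + b) != y))
```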
Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Constraints seem too restrictive
– Fix the constraints: Soft-Margin SVMs
2. Map to a higher dimensional space
– map data to a higher dim. space in which there exists a separating hyperplane (corresponds to a non-linear decision boundary in the original space)
Section 2: Feature maps
Linear Separability with offset
[Figure: "Original 1D space" with classifier $h(z) = \mathrm{sign}(z^2 - 4z + 3.5)$ and decision boundary $z^2 - 4z + 3.5 = 0$; "Projected 2D space" in which the boundary of $\mathrm{sign}(\bar{\theta} \cdot \phi(z) + b)$ is linear]

What feature map will allow us to learn a linear decision boundary with zero training error in the mapped space?
A. $\phi(z) = [z, z^2]^T$
B. $\phi(z) = [z, z^2, 1]^T$
C. $\phi(z) = [z, z^2, z^3, 1]^T$
D. all of the above

Classifier $h(z) = \mathrm{sign}(z^2 - 4z + 3.5)$
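One way to sanity-check a candidate map (a sketch with assumed toy inputs): apply it and verify that some linear weight vector in the mapped space reproduces $h(z) = \mathrm{sign}(z^2 - 4z + 3.5)$. The weights below correspond to option B and are chosen by hand for illustration.

```python
# Checking a candidate feature map against h(z) = sign(z^2 - 4z + 3.5) (sketch).
import numpy as np

z = np.linspace(0, 4, 41)                      # toy 1D inputs
labels = np.sign(z**2 - 4*z + 3.5)             # targets from the original classifier

def phi(z):
    # Candidate map B: phi(z) = [z, z^2, 1]^T for each input z
    return np.stack([z, z**2, np.ones_like(z)], axis=1)

# Linear weights in the mapped space that reproduce z^2 - 4z + 3.5:
theta = np.array([-4.0, 1.0, 3.5])
pred = np.sign(phi(z) @ theta)
print("zero training error:", np.all(pred == labels))
```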
Linear classifiers in higher dimensional spaces: exercise
'ideal' decision boundary: circle with center at (2,2) and radius 1
$(x_1 - 2)^2 + (x_2 - 2)^2 = 1 \iff x_1^2 + x_2^2 - 4x_1 - 4x_2 + 7 = 0$
[Figure: + and − examples separated by this circle]
Feature map: $\phi(\bar{x}) = [x_1^2, x_2^2, x_1, x_2, 1]^T$ with $\bar{\theta} = [1, 1, -4, -4, 7]^T$
Linear classifier $\mathrm{sign}(\bar{\theta} \cdot \phi(\bar{x}))$ separates positive from negative examples
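A small numerical check of this exercise (a sketch; the two sample points are assumptions):

```python
# Circle decision boundary becomes linear after the feature map (sketch).
import numpy as np

def phi(x):
    # phi(x) = [x1^2, x2^2, x1, x2, 1]^T
    x1, x2 = x
    return np.array([x1**2, x2**2, x1, x2, 1.0])

theta = np.array([1.0, 1.0, -4.0, -4.0, 7.0])  # theta . phi(x) = (x1-2)^2 + (x2-2)^2 - 1

inside = np.array([2.0, 2.0])      # inside the circle of radius 1 centered at (2, 2)
outside = np.array([4.0, 4.0])     # outside the circle
print(np.sign(theta @ phi(inside)))   # -1: points inside the circle get one label
print(np.sign(theta @ phi(outside)))  # +1: points outside get the other label
```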
Linear classifiers
in higher dimensional spaces: idea
map data to a higher dim. space in which there exists a separating hyperplane (corresponds to a non-linear decision boundary in the original space)
Implications for SVM
Apply the soft-margin SVM to the mapped data $\phi(\bar{x}^{(i)})$:
$$\min_{\bar{\theta}, b, \bar{\xi}} \frac{\|\bar{\theta}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$
subject to $y^{(i)}(\bar{\theta} \cdot \phi(\bar{x}^{(i)}) + b) \geq 1 - \xi_i$ for $i \in \{1, \dots, n\}$ and $\xi_i \geq 0$
Issue: potentially very inefficient (the mapped space may be very high dimensional)
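To see why this can be inefficient, count how fast the mapped dimension grows. The sketch below uses scikit-learn's PolynomialFeatures as an assumed stand-in for a polynomial feature map:

```python
# Dimensionality blow-up of explicit polynomial feature maps (sketch).
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

for d in [2, 10, 100]:
    for degree in [2, 3]:
        X = np.zeros((1, d))  # a single dummy example, just to size the map
        n_features = PolynomialFeatures(degree=degree).fit_transform(X).shape[1]
        print(f"input dim d={d:4d}, degree={degree}: mapped dim = {n_features}")
# Solving the SVM directly over phi(x) must handle vectors of this mapped size.
```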
Linear Separability
What if data are not linearly separable? How can we handle such cases?
1. Soft-Margin SVMs
2. Map to a higher dimensional space
3. SVM dual and the kernel trick