08_hard-margin-SVM
Qiuhong Ke
Hard-margin Support Vector Machines
COMP90051 Statistical Machine Learning
Copyright: University of Melbourne
Before we start…
About me
• 2015.02-2018.04: PhD at UWA
• 2018.05-2019.12: Post-doc at MPII
• From 2020.01: Lecturer at UniMelb
• Research: Action recognition and prediction using machine learning
• Contact: qiuhong. .com; qiuhong. .au
Outline
• Linear Classifier
• Deriving Hard-margin SVM Objective
• SVM Objective as Regularised Loss Function
Linear separability
Form
• Line
• Plane
• Higher-dimensional features: hyperplane
Formula
$y = f(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + b = \mathbf{w}^T\mathbf{x} + b$
• $\mathbf{w}$: $d$-dimensional weight vector (column)
• $\mathbf{x}$: $d$-dimensional feature vector (column)
• $b$: bias
• $T$: transpose
$\mathbf{w}^T\mathbf{x} = \|\mathbf{w}\|\,\|\mathbf{x}\|\cos\theta$
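As a quick illustration of the formula above, here is a minimal NumPy sketch (with made-up weights, bias and feature values) that evaluates $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$:

```python
import numpy as np

# Made-up d = 3 example: weight vector, bias, and one feature vector.
w = np.array([0.5, -1.0, 2.0])   # d-dimensional weight vector
b = 0.1                          # bias
x = np.array([1.0, 3.0, 0.5])    # d-dimensional feature vector

score = w @ x + b                # same as sum_i w_i * x_i + b
print(score)                     # the classifier's decision is based on this score
```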
Recap of Perceptron
$y = +1$, if $f(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + b \ge 0$
$y = -1$, if $f(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + b < 0$

Example: classify angry birds
[Figure: two cartoon birds, one of each class]
Source: Papachan (CC BY), https://www.sketchport.com/drawing/4524248689278976/bomb-and-chuck

Use perceptron as the classifier
Perceptron: all boundaries are equally good, as long as they separate the classes perfectly.

One possible (bad) decision boundary
[Figure: a decision boundary $f(\mathbf{x}) = 0$ separating the regions $f(\mathbf{x}) < 0$ and $f(\mathbf{x}) > 0$]
Which prediction is more confident: the one for point A or the one for point B?
A slight change to the decision boundary or to the data points can change the prediction.
Data points closer to the decision boundary receive less confident predictions.
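A minimal numeric sketch of this point, using made-up weights and two made-up points (one close to the boundary, one far from it); the distance formula $|f(\mathbf{x})|/\|\mathbf{w}\|$ is derived later in this lecture:

```python
import numpy as np

# Hypothetical boundary and two hypothetical points.
w, b = np.array([2.0, -1.0]), 0.5
x_A = np.array([0.0, 0.6])     # close to the boundary
x_B = np.array([3.0, -2.0])    # far from the boundary

for name, x in [("A", x_A), ("B", x_B)]:
    score = w @ x + b
    dist = abs(score) / np.linalg.norm(w)   # Euclidean distance to the boundary
    print(name, round(score, 2), round(dist, 2))

# A's score is very close to 0, so a small change to the boundary (or to x_A)
# flips its predicted class; B's prediction is far more confident.
```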
Another possible (bad) decision boundary
[Figure: another decision boundary $f(\mathbf{x}) = 0$ with regions $f(\mathbf{x}) < 0$ and $f(\mathbf{x}) > 0$]
Good decision boundary
Goal of SVM
[Figure: a decision boundary with a large margin ☺]
Margin: 2× the minimum distance between the boundary and the data points
Small-margin decision boundary
[Figure: a decision boundary with a small margin]
[Figure: another decision boundary, also with a small margin]
SVM
[Figure: a linear classifier with a maximum-margin decision boundary ☺]
Goal: find the maximum-margin decision boundary (robust and confident prediction).
Hard-margin SVM: all points are perfectly classified and lie on or outside the margin (hard constraint).
Soft-margin SVM? Next lecture.
Margin and weights
SVM decision boundary: $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0$
SVM goal: find the parameters of the maximum-margin decision boundary.
Margin: 2× the minimum distance between the boundary and the data points.
What is the relationship between the parameters and the margin?
Weight vector is orthogonal to the decision boundary
[Figure: weight vector $\mathbf{w}$ and two points $\mathbf{x}_1$, $\mathbf{x}_2$ on the decision boundary $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0$]
For any two points $\mathbf{x}_1$ and $\mathbf{x}_2$ on the boundary:
$\mathbf{w}^T\mathbf{x}_1 + b = 0$
$\mathbf{w}^T\mathbf{x}_2 + b = 0$
Subtracting: $\mathbf{w}^T(\mathbf{x}_1 - \mathbf{x}_2) = 0$, i.e. $\|\mathbf{w}\|\,\|\mathbf{x}_1 - \mathbf{x}_2\|\cos\theta = 0$
So $\mathbf{w}$ is orthogonal to every direction lying in the decision boundary.
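A tiny sketch checking this numerically, with a made-up weight vector and bias and two points chosen to lie on the boundary:

```python
import numpy as np

# Made-up boundary f(x) = x_1 + 2*x_2 - 4 = 0.
w, b = np.array([1.0, 2.0]), -4.0

x1 = np.array([4.0, 0.0])   # on the boundary: 1*4 + 2*0 - 4 = 0
x2 = np.array([0.0, 2.0])   # on the boundary: 1*0 + 2*2 - 4 = 0

print(w @ (x1 - x2))        # 0.0: w is orthogonal to the direction x1 - x2
```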
Margin hyperplanes
[Figure: decision boundary $\mathbf{w}^T\mathbf{x} + b = 0$ between the margin hyperplanes $\mathbf{w}^T\mathbf{x} + b = +b_0$ (positive side) and $\mathbf{w}^T\mathbf{x} + b = -b_0$ (negative side), each a distance $\gamma^*$ from the boundary, so Margin $= 2\gamma^*$]
Margin: 2× the minimum distance between the boundary and the data points.
Positive margin hyperplane: $f(\mathbf{x}) = +b_0$
Negative margin hyperplane: $f(\mathbf{x}) = -b_0$
$(b_0 > 0)$
Distance for positive class
[Figure: a point $\mathbf{x}_+$ on the positive margin hyperplane $f(\mathbf{x}) = +b_0$ and its projection $\mathbf{x}'_+$ onto the decision boundary $f(\mathbf{x}) = 0$]
$\mathbf{w}^T\mathbf{x}_+ + b = b_0$
$\mathbf{w}^T\mathbf{x}'_+ + b = 0$
Subtracting: $\mathbf{w}^T(\mathbf{x}_+ - \mathbf{x}'_+) = b_0$, i.e. $\|\mathbf{w}\|\,\|\mathbf{x}_+ - \mathbf{x}'_+\|\cos\theta = b_0$
Since $\mathbf{x}_+ - \mathbf{x}'_+$ is parallel to $\mathbf{w}$, $\cos\theta = 1$, so the distance is
$\gamma^* = \|\mathbf{x}_+ - \mathbf{x}'_+\| = \dfrac{b_0}{\|\mathbf{w}\|}$
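The same result can be checked numerically. This sketch uses made-up values of $\mathbf{w}$, $b$ and $b_0$, constructs a point on the positive margin hyperplane, projects it back onto the boundary, and compares the distance with $b_0/\|\mathbf{w}\|$:

```python
import numpy as np

# Made-up parameters (so ||w|| = 5).
w, b, b0 = np.array([3.0, 4.0]), -6.0, 2.0

x0 = np.array([2.0, 0.0])              # on the boundary: w @ x0 + b = 0
x_plus = x0 + b0 * w / (w @ w)         # shifted along w so that w @ x_plus + b = b0

# Project x_plus back onto the boundary w^T x + b = 0.
x_proj = x_plus - (w @ x_plus + b) / (w @ w) * w

print(np.linalg.norm(x_plus - x_proj))     # 0.4
print(b0 / np.linalg.norm(w))              # 0.4 = gamma*
```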
Distance for negative class
[Figure: a point $\mathbf{x}_-$ on the negative margin hyperplane $f(\mathbf{x}) = -b_0$ and its projection $\mathbf{x}'_-$ onto the decision boundary $f(\mathbf{x}) = 0$]
$\mathbf{w}^T\mathbf{x}_- + b = -b_0$
$\mathbf{w}^T\mathbf{x}'_- + b = 0$
Subtracting: $\mathbf{w}^T(\mathbf{x}_- - \mathbf{x}'_-) = -b_0$, i.e. $\|\mathbf{w}\|\,\|\mathbf{x}_- - \mathbf{x}'_-\|\cos\theta = -b_0$
Again the distance is $\gamma^* = \|\mathbf{x}_- - \mathbf{x}'_-\| = \dfrac{b_0}{\|\mathbf{w}\|}$
Geometric Margin
[Figure: decision boundary $\mathbf{w}^T\mathbf{x} + b = 0$ and margin hyperplanes $\mathbf{w}^T\mathbf{x} + b = \pm b_0$, each a distance $\gamma^* = \dfrac{b_0}{\|\mathbf{w}\|}$ from the boundary]
$\gamma_i$: Euclidean distance from data point $i$ to the hyperplane.
If $y_i = +1$: $\gamma_i = \dfrac{\mathbf{w}^T\mathbf{x}_i + b}{\|\mathbf{w}\|}$
If $y_i = -1$: $\gamma_i = -\dfrac{\mathbf{w}^T\mathbf{x}_i + b}{\|\mathbf{w}\|}$
Combined: $\gamma_i = \dfrac{y_i(\mathbf{w}^T\mathbf{x}_i + b)}{\|\mathbf{w}\|}$
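A short sketch computing $\gamma_i$ for two labelled points. The boundary matches the one used in the example further below; the point coordinates are made up for illustration:

```python
import numpy as np

# Boundary f(x) = 3*x_1 + 2*x_2 - 12 = 0; point coordinates are made up.
w, b = np.array([3.0, 2.0]), -12.0

points = [(np.array([5.0, 4.0]), +1),   # a positive example
          (np.array([1.0, 1.0]), -1)]   # a negative example

for x_i, y_i in points:
    gamma_i = y_i * (w @ x_i + b) / np.linalg.norm(w)
    print(y_i, round(gamma_i, 3))       # positive gamma_i => correctly classified
```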
Functional margin
For data point $i$: $\hat{\gamma}_i = y_i(\mathbf{w}^T\mathbf{x}_i + b)$, where $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$.
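A one-line computation contrasting the functional margin with the geometric margin, for the same hypothetical point as in the previous sketch:

```python
import numpy as np

# Same made-up boundary and point as in the previous sketch.
w, b = np.array([3.0, 2.0]), -12.0
x_i, y_i = np.array([5.0, 4.0]), +1

functional = y_i * (w @ x_i + b)              # functional margin
geometric = functional / np.linalg.norm(w)    # geometric margin
print(functional, round(geometric, 3))

# Doubling w and b doubles the functional margin but leaves the geometric
# margin unchanged -- the issue examined on the following slides.
```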
Example
Functional margin $\hat{\gamma}_i = y_i(\mathbf{w}^T\mathbf{x}_i + b)$ indicates the confidence and correctness of the prediction for different points given the same boundary.
[Figure: boundary $f(\mathbf{x}) = 3x_1 + 2x_2 - 12 = 0$ with points A, B, C, D, E; $\hat{\gamma}_A < \hat{\gamma}_B$, $\hat{\gamma}_C < \hat{\gamma}_D$, $\hat{\gamma}_E < 0$]

Problem of functional margin
Rescaling $\mathbf{w}$ and $b$ changes the functional margin value:
• For $f(\mathbf{x}) = 3x_1 + 2x_2 - 12 = 0$: $\hat{\gamma}_A = 4$, $\hat{\gamma}_B = 2$
• For $f(\mathbf{x}) = 6x_1 + 4x_2 - 24 = 0$ (the same boundary): $\hat{\gamma}_A = 8$, $\hat{\gamma}_B = 4$
How about the geometric margin?

Maximum margin objective
SVM objective: $\max_{\mathbf{w},\,b}$ margin of the linear classifier $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$, where (Geometric) Margin $= 2\gamma^* = \dfrac{2b_0}{\|\mathbf{w}\|}$. Any problem?

Non-unique representation
The same boundary $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0$ can also be written as $f(\mathbf{x}) = a\mathbf{w}^T\mathbf{x} + ab = 0$, e.g. $f(\mathbf{x}) = x_1 + 2x_2 - 4 = 0$ and $f(\mathbf{x}) = 2x_1 + 4x_2 - 8 = 0$.

Resolving ambiguity
Rescale so that the margin hyperplanes become $\mathbf{w}^T\mathbf{x} + b = \pm 1$: replace $\dfrac{\mathbf{w}}{b_0} \to \mathbf{w}$ and $\dfrac{b}{b_0} \to b$. The margin $2\gamma^* = \dfrac{2b_0}{\|\mathbf{w}\|}$ then becomes $2\gamma^* = \dfrac{2}{\|\mathbf{w}\|}$.

SVM: Constrained optimisation problem
$\max_{\mathbf{w}} \dfrac{2}{\|\mathbf{w}\|}$ subject to ($i = 1, \dots, n$ data points):
if $y^{(i)} = +1$: $f(\mathbf{x}^{(i)}) = \mathbf{w}^T\mathbf{x}^{(i)} + b \ge +1$
if $y^{(i)} = -1$: $f(\mathbf{x}^{(i)}) = \mathbf{w}^T\mathbf{x}^{(i)} + b \le -1$
[Figure: boundary $f(\mathbf{x}) = 0$ with margin hyperplanes $f(\mathbf{x}) = +1$ and $f(\mathbf{x}) = -1$, each a distance $\dfrac{1}{\|\mathbf{w}\|}$ from the boundary]
Equivalently:
$\min_{\mathbf{w}} \dfrac{\|\mathbf{w}\|^2}{2}$ s.t. $1 - y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)} + b) \le 0$, $i = 1, \dots, n$ (data points)
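To make the constrained problem concrete, here is a minimal sketch that solves it on a made-up, linearly separable toy set with SciPy's general-purpose SLSQP solver (a dedicated QP solver would normally be used; this is only an illustration of the objective and constraints):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up, linearly separable 2D toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(theta):
    w = theta[:2]                       # theta packs (w_1, w_2, b)
    return 0.5 * (w @ w)                # min ||w||^2 / 2

def constraints(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w + b) - 1.0        # each entry >= 0  <=>  y_i (w^T x_i + b) >= 1

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "ineq", "fun": constraints}])
w, b = res.x[:2], res.x[2]
print("w =", w.round(3), "b =", round(b, 3),
      "margin =", round(2.0 / np.linalg.norm(w), 3))
```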
Previous ML classification algorithms…

Ridge regression vs Hard-margin SVM objective
Recall the ridge regression objective:
$\min_{\mathbf{w}} \left( \sum_{i=1}^{n} (y^{(i)} - \mathbf{w}^T\mathbf{x}^{(i)})^2 + \lambda \|\mathbf{w}\|_2^2 \right)$
(a data-dependent training error plus a data-independent regularisation term).
Hard-margin SVM objective:
$\min_{\mathbf{w}} \dfrac{\|\mathbf{w}\|^2}{2}$ s.t. $1 - y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)} + b) \le 0$, $i = 1, \dots, n$
It has the same regularised-loss form: $\dfrac{\|\mathbf{w}\|^2}{2}$ is the data-independent regularisation term, and a per-point loss gives the data-dependent training error:
• Hard-margin SVM loss: $l_\infty = \begin{cases} 0 & \text{if } 1 - y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)} + b) \le 0 \\ \infty & \text{otherwise} \end{cases}$
• Soft-margin SVM loss (hinge loss): $l_h = \begin{cases} 0 & \text{if } 1 - y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)} + b) \le 0 \\ 1 - y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)} + b) & \text{otherwise} \end{cases}$
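A minimal sketch of the two losses as functions of $y^{(i)}(\mathbf{w}^T\mathbf{x}^{(i)} + b)$, matching the case definitions above:

```python
import numpy as np

def hard_margin_loss(margin):
    # margin = y_i * (w^T x_i + b); zero loss iff the point is on or outside
    # the margin hyperplane (margin >= 1), infinite loss otherwise.
    return 0.0 if 1.0 - margin <= 0.0 else np.inf

def hinge_loss(margin):
    # Soft-margin SVM loss: zero if the constraint holds, else the violation.
    return max(0.0, 1.0 - margin)

for m in [2.0, 1.0, 0.3, -0.5]:
    print(m, hard_margin_loss(m), hinge_loss(m))
```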
Summary
• What is the goal of SVM?
• What is the objective of hard-margin SVM?
• How can the SVM objective be represented as a regularised loss function?