Perceptron (continued)
Perceptron through origin
perceptron_through_origin(𝒟, T):
    $\theta$ = zeros(d)
    for t = 1 to T
        for i = 1 to n
            if $y^{(i)} \, \theta^T x^{(i)} \le 0$:
                $\theta = \theta + y^{(i)} x^{(i)}$
    return $\theta$
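The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the dataset is assumed to be a list of `(x, y)` pairs with `y` in {+1, -1}, and the helper `dot` and the toy data are made up for the example.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def perceptron_through_origin(data, T):
    """data: list of (x, y) pairs, x a list of floats, y in {+1, -1}."""
    d = len(data[0][0])
    theta = [0.0] * d
    for _ in range(T):
        for x, y in data:
            # A point counts as a mistake if it is misclassified
            # or lies exactly on the separator.
            if y * dot(theta, x) <= 0:
                theta = [t + y * xi for t, xi in zip(theta, x)]
    return theta

# Toy dataset (assumed for illustration), separable through the origin.
data = [([2.0, 1.0], 1), ([1.0, 3.0], 1),
        ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
theta = perceptron_through_origin(data, T=10)
print(theta)
```

On this toy data the only update happens on the first point, where $\theta$ is still zero; every later point is already classified correctly.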
Linear separability
• Property of a dataset 𝒟
• Linearly separable if there are some $\theta$, $\theta_0$ such that $y^{(i)} (\theta^T x^{(i)} + \theta_0) > 0$ for all $i$.
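As a small illustration of the condition above, the following sketch checks whether one given candidate separator classifies every point strictly correctly. It only verifies a candidate; it does not search for a separator. The function name `separates` and the sample points are assumptions made for the example.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def separates(theta, theta_0, data):
    """True if y * (theta . x + theta_0) > 0 for every (x, y) in data."""
    return all(y * (dot(theta, x) + theta_0) > 0 for x, y in data)

# Toy dataset (assumed for illustration).
data = [([1.0, 1.0], 1), ([3.0, 2.0], 1), ([-1.0, 0.0], -1)]

ok = separates([1.0, 0.0], 0.0, data)    # this candidate separates the data
bad = separates([1.0, 0.0], -2.0, data)  # this one misclassifies the first point
print(ok, bad)
```

A dataset is linearly separable as soon as one such candidate exists, so a single `True` result is enough to establish the property, while a `False` result says nothing about other candidates.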
Margin
Signed perpendicular distance from hyperplane to origin
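• Projection of vector $G$ onto vector $H$: $\frac{G \cdot H}{\|H\|}$
[Figure: axes $x_1$, $x_2$; a point $x$, the vector $\theta$, and the distance in question marked "?".]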
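The projection formula can be tried out numerically. The sketch below computes the scalar projection $G \cdot H / \|H\|$, which is signed: it is negative when $G$ points away from $H$. The function name `scalar_projection` and the example vectors are chosen for illustration only.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def scalar_projection(g, h):
    """Signed length of the projection of g onto h: (g . h) / ||h||."""
    return dot(g, h) / math.sqrt(dot(h, h))

along = scalar_projection([3.0, 4.0], [1.0, 0.0])     # component along +x1
against = scalar_projection([3.0, 4.0], [0.0, -2.0])  # h points opposite to g's x2 part
print(along, against)
```

Note that the result does not depend on the length of `h`, only on its direction.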
Signed perpendicular distance from point to hyperplane
[Figure: axes $x_1$, $x_2$; a point $p$, the vector $\theta$, and the distance in question marked "?".]
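A minimal numeric sketch of this quantity follows. It assumes the hyperplane is written as $\theta^T x + \theta_0 = 0$ and uses the standard formula $(\theta^T p + \theta_0) / \|\theta\|$, with the convention (an assumption here) that the distance is positive on the side $\theta$ points toward. The function name and the numbers are illustrative.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def signed_distance(theta, theta_0, p):
    """Signed distance from the hyperplane theta . x + theta_0 = 0 to p."""
    return (dot(theta, p) + theta_0) / math.sqrt(dot(theta, theta))

theta, theta_0 = [3.0, 4.0], -5.0
d_pos = signed_distance(theta, theta_0, [3.0, 4.0])   # on the side theta points to
d_zero = signed_distance(theta, theta_0, [3.0, -1.0])  # lies on the hyperplane
d_neg = signed_distance(theta, theta_0, [0.0, 0.0])    # the origin, on the other side
print(d_pos, d_zero, d_neg)
```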
Additional explanation: the vector $p - x$
[Figure: axes $x_1$, $x_2$; the points $p$ and $x$, the vector $p - x$, and the vector $\theta$.]
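One way to read the role of $p - x$ is as a short derivation. It assumes $x$ is any point on the hyperplane $\theta^T x + \theta_0 = 0$, and it applies the projection formula to the vector $p - x$ and the normal $\theta$:

```latex
\begin{align*}
\text{signed distance to } p
  &= \frac{\theta \cdot (p - x)}{\|\theta\|}
   = \frac{\theta^T p - \theta^T x}{\|\theta\|} \\
  &= \frac{\theta^T p + \theta_0}{\|\theta\|}
  \qquad \text{since } \theta^T x = -\theta_0 .
\end{align*}
```

The choice of $x$ drops out, so any point on the hyperplane gives the same distance. For a separator through the origin ($\theta_0 = 0$) this reduces to $\theta^T p / \|\theta\|$.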
Margin
• Margin of a labeled data point $(x, y)$ w.r.t. a separator $\theta$: $y \cdot \frac{\theta^T x}{\|\theta\|}$
• Margin of 𝒟 w.r.t. $\theta$: $\min_i \; y^{(i)} \cdot \frac{\theta^T x^{(i)}}{\|\theta\|}$
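Both definitions translate directly into code. The sketch below is an illustration for a separator through the origin; the function names `point_margin` and `dataset_margin` and the sample data are assumptions made for the example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def point_margin(theta, x, y):
    """Margin of one labeled point: y * (theta . x) / ||theta||."""
    return y * dot(theta, x) / math.sqrt(dot(theta, theta))

def dataset_margin(theta, data):
    """Margin of the dataset: the smallest point margin."""
    return min(point_margin(theta, x, y) for x, y in data)

theta = [3.0, 4.0]
data = [([1.0, 0.0], 1), ([0.0, -1.0], -1), ([3.0, 4.0], 1)]

margins = [point_margin(theta, x, y) for x, y in data]
m = dataset_margin(theta, data)
print(margins, m)
```

A point's margin is positive exactly when it is classified correctly, so the dataset margin is positive only if $\theta$ separates every point.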
Perceptron convergence theorem
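• If:
    • there is $\theta^*$ s.t. $y^{(i)} \frac{\theta^{*T} x^{(i)}}{\|\theta^*\|} \ge \gamma > 0$ for all $i$ (the margin of 𝒟 w.r.t. $\theta^*$ is $\ge \gamma$)
    • $\|x^{(i)}\| \le R$ for all $i$
Then the Perceptron will make at most $\left(\frac{R}{\gamma}\right)^2$ mistakes.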
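The bound can be checked empirically on a small example. The sketch below counts the updates the through-origin perceptron makes and compares the count with $(R/\gamma)^2$. The toy data and the choice $\theta^* = (1, 1)$ are assumptions for illustration; any separator with positive margin gives a valid, possibly looser, bound.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def count_mistakes(data, max_passes=1000):
    """Run the through-origin perceptron until a full pass has no mistakes."""
    theta = [0.0] * len(data[0][0])
    mistakes = 0
    for _ in range(max_passes):
        changed = False
        for x, y in data:
            if y * dot(theta, x) <= 0:
                theta = [t + y * xi for t, xi in zip(theta, x)]
                mistakes += 1
                changed = True
        if not changed:
            break
    return mistakes

data = [([2.0, 1.0], 1), ([1.0, 3.0], 1),
        ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
theta_star = [1.0, 1.0]  # assumed known separator for this toy data

gamma = min(y * dot(theta_star, x) / norm(theta_star) for x, y in data)
R = max(norm(x) for x, _ in data)
bound = (R / gamma) ** 2
mistakes = count_mistakes(data)
print(mistakes, bound)
```

Here $\gamma = 3/\sqrt{2}$ and $R = \sqrt{10}$, so the bound is $20/9 \approx 2.22$, and the run makes a single mistake.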