The Perceptron Algorithm: A Geometrical Proof
Marc de Kamps
Institute for Artificial and Biological Intelligence University of Leeds
The Perceptron: What is the definition?
We introduced the perceptron:
h = w1x1 + w2x2 + w3x3 − θ
if h > 0 then o = 1
if h ≤ 0 then o = 0
h is sometimes called the local field
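As an aside, a minimal Python sketch of this definition (the function name and the numbers are illustrative, not from the slides):

def perceptron_output(w, x, theta):
    """Threshold unit: o = 1 if the local field h > 0, else o = 0."""
    h = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if h > 0 else 0

print(perceptron_output([3.0, 1.0], [0.2, 0.5], 1.0))  # h = 0.1 > 0, so o = 1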
The Perceptron: What is the definition?
Data set: linearly separable
Perceptron Algorithm: will always converge to a set of weights and a bias
These weights and bias classify the data set correctly
Figure: Data set
Figure: linearly separable
Perceptron Algorithm: Proof
Recall the perceptron:
h = w1x1 + w2x2 + w3x3 − θ
if h > 0 then o = 1
if h ≤ 0 then o = 0
h is sometimes called the local field
Perceptron Algorithm: Get Rid of the Bias
Consider a 2D classification problem: find two weights w1, w2 and one bias θ
But note: w1x1 + w2x2 − θ is equivalent to
w1x1 + w2x2 + w3x3, if x3 ≡ 1 and w3 ≡ −θ
Perceptron Algorithm: Get Rid of the Bias
In the data set this amounts to adding an extra column, e.g.:
x1  x2 | classification   →   x1  x2  x3 (≡ 1) | classification
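A minimal NumPy sketch of this bias trick (matrix names and values are illustrative, not from the slides):

import numpy as np

X = np.array([[0.5, 1.2],
              [0.1, 0.4]])        # 2D patterns, one per row (illustrative)
theta = 1.0
w = np.array([3.0, 1.0])

# Append the constant column x3 = 1 and absorb the bias as w3 = -theta.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w_aug = np.append(w, -theta)

# For every pattern: w . x - theta equals w_aug . x_aug
assert np.allclose(X @ w - theta, X_aug @ w_aug)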
Perceptron Algorithm: Get Rid of the Bias
Effectively: a new data set, 3D instead of 2D
The data points hover in the plane z = 1
The separating plane passes through the origin (hard to see, but easy to verify algebraically):
3x + y − z = 0, which at z = 1 gives y = −3x + 1
The Perceptron Algorithm
Linearly Classifiable Dataset
Assume that there are two classes of input patterns in an N-dimensional space: C1 and C2. Also assume that a set of weights w⃗∗ exists, such that for all patterns ⃗x ∈ C1:
w⃗∗ · ⃗x < 0
and for all patterns ⃗y ∈ C2:
w⃗∗ · ⃗y ≥ 0
The Perceptron Algorithm
1 Start with a random set of weights w⃗0.
2 Pick an arbitrary pattern ⃗x ∈ C1 ∪ C2.
3 If ⃗x ∈ C1 and ⃗x · w⃗ < 0, go to 2. Else w⃗ → w⃗ − λ⃗x. Go to 2.
4 If ⃗x ∈ C2 and ⃗x · w⃗ ≥ 0, go to 2. Else w⃗ → w⃗ + λ⃗x. Go to 2.
The Perceptron Theorem states that this algorithm will hit the ’Else’ clauses only a finite number of times if and only if the data set is linearly classifiable.
0 < λ ≤ 1 is the learning rate.
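A minimal NumPy sketch of these steps, under two assumptions that are mine, not the slides': the patterns are swept in a fixed order rather than picked arbitrarily, and a bounded number of epochs stands in for "repeat until no 'Else' clause is hit".

import numpy as np

def perceptron(C1, C2, lam=1.0, max_epochs=1000, seed=0):
    """Find w with x . w < 0 for all x in C1 and x . w >= 0 for all x in C2."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=C1.shape[1])      # step 1: random initial weights
    for _ in range(max_epochs):
        updated = False
        for x in C1:
            if x @ w >= 0:                # misclassified C1 pattern
                w = w - lam * x           # step 3, 'Else' clause
                updated = True
        for x in C2:
            if x @ w < 0:                 # misclassified C2 pattern
                w = w + lam * x           # step 4, 'Else' clause
                updated = True
        if not updated:                   # no 'Else' clause hit: done
            return w
    raise RuntimeError("no convergence: data may not be linearly classifiable")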
Perceptron Algorithm: Leave Only One Class
Construct a single class C as follows:
Take a pattern ⃗x from C1
Replace it by −⃗x and add it to C; remove the original pattern from C1
Repeat until C1 is empty
The patterns from C2 are added to C unchanged. Since w⃗∗ · ⃗x < 0 for ⃗x ∈ C1, we have w⃗∗ · (−⃗x) > 0, so all patterns in C lie on the same side of the plane.
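In code, the folding is a one-liner (a sketch, assuming C1 and C2 are NumPy arrays of patterns):

import numpy as np

def fold_classes(C1, C2):
    """C = C2 together with the negated patterns of C1."""
    return np.vstack([-C1, C2])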
Perceptron Algorithm: Leave Only One Class
Simplified Problem
Given that we have a set of input vectors, and given the existence of a plane through the origin such that all input vectors of the class C are on one side of the plane, find such a plane. That plane is also a solution of our original classification problem of separating the two classes C1 and C2.
The Simplified Perceptron Algorithm
1 Start with a random set of weights w⃗0.
2 Pick an arbitrary pattern ⃗x ∈ C.
3 If ⃗x · w⃗ ≥ 0, go to 2. Else w⃗ → w⃗ + λ⃗x. Go to 2.
The Perceptron Theorem states that this algorithm will hit the ’Else’ clauses only a finite number of times if and only if the data set is linearly classifiable.
0 < λ ≤ 1 is the learning rate.
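The corresponding sketch of the simplified loop (same caveats as before: fixed sweep order, bounded epochs, illustrative names):

import numpy as np

def simplified_perceptron(C, lam=1.0, max_epochs=1000, seed=0):
    """Find w with x . w >= 0 for every pattern x in C."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=C.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x in C:
            if x @ w < 0:                 # 'Else' clause: update
                w = w + lam * x
                updated = True
        if not updated:
            return w
    raise RuntimeError("no convergence: data may not be linearly classifiable")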
Perceptron Algorithm: Leave Only One Class
Normalizing:
Normalizing does not alter the classification
It amounts to multiplying all weights by the same positive constant
Put the patterns on the unit sphere
They are still on the same side of the plane
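As a one-line sketch, normalizing the patterns row by row:

import numpy as np

def normalize_rows(C):
    """Scale each pattern to unit length; the sign of w . x is unchanged."""
    return C / np.linalg.norm(C, axis=1, keepdims=True)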
Perceptron Theorem
We now have a single class C of patterns x⃗i with |x⃗i| = 1. We assume the existence of a set of weights w⃗∗ such that for each ⃗x ∈ C,
⃗x · w⃗∗ ≥ δ > 0.
w⃗∗ is a solution to our classification problem that we do not know, but that we know exists. (Without loss of generality: |w⃗∗| = 1.)
We will look at what happens to
w⃗∗ · w⃗n / |w⃗n|
during the execution of the Perceptron Algorithm.
Perceptron Theorem
Strategy: assume we have had n misclassifications so far. The current set of weights is w⃗n, reflecting that the weights have been updated n times. What happens to
w⃗∗ · w⃗n / |w⃗n| ?
We will show that every weight update increases this quantity by a finite amount.
Perceptron Theorem
Observe that
w⃗∗ · w⃗n / |w⃗n|
is nothing but cos(w⃗∗, w⃗n), so it is ≤ 1.
Observe the numerator under a weight change:
w⃗∗ · w⃗n → w⃗∗ · w⃗n+1
and the denominator:
w⃗n · w⃗n → w⃗n+1 · w⃗n+1
Numerator
For an input pattern ⃗x (taking λ = 1):
w⃗∗ · w⃗n+1 = w⃗∗ · w⃗n + w⃗∗ · ⃗x
We know that:
w⃗∗ · ⃗x ≥ δ > 0
So the numerator increases with every step by at least a fixed amount δ:
w⃗∗ · w⃗n+1 ≥ w⃗∗ · w⃗n + δ
Denominator
For an input pattern ⃗x:
w⃗n+1 · w⃗n+1 = |w⃗n|² + 2 ⃗x · w⃗n + |⃗x|²
Since w⃗n · ⃗x < 0 (the weights only change on a misclassified pattern) and |⃗x|² = 1:
w⃗n+1 · w⃗n+1 < |w⃗n|² + 1
Conclusion
Since after the n-th weight update (taking w⃗0 = 0 for simplicity),
w⃗∗ · w⃗n ≥ n δ and |w⃗n|² < n, i.e. |w⃗n| < √n,
it follows that:
cos(w⃗∗, w⃗n) = w⃗∗ · w⃗n / |w⃗n| > n δ / √n = √n δ,
i.e. the lower bound grows with every update. Since cos(w⃗∗, w⃗n) ≤ 1, there can only be a finite number of updates; otherwise cos(w⃗∗, w⃗n) > 1, which is a contradiction. In fact, √n δ ≤ 1 forces n ≤ 1/δ².
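A small numerical illustration of the proof (my construction, not from the slides): generate unit patterns with margin δ = 0.1 around a known plane w⃗∗, run the simplified update rule with λ = 1, and watch cos(w⃗∗, w⃗n) climb while staying ≤ 1.

import numpy as np

rng = np.random.default_rng(42)
w_star = np.array([3.0, 1.0, -1.0])            # hypothetical solution weights
w_star /= np.linalg.norm(w_star)

pts = rng.normal(size=(500, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
delta = 0.1
C = pts[pts @ w_star >= delta]                 # unit patterns with margin delta

w = rng.normal(size=3)                          # random initial weights
updates, changed = 0, True
while changed:
    changed = False
    for x in C:
        if x @ w < 0:                           # 'Else' clause: update
            w = w + x
            updates += 1
            changed = True
print("updates:", updates, "cos:", (w_star @ w) / np.linalg.norm(w))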