Perceptron and Online Perceptron

Daniel Hsu (COMS 4771)

Margins
Let S be a collection of labeled examples from Rd × {−1, +1}. We say S is linearly separable if there exists
w ∈ Rd such that

min_{(x,y)∈S} y〈w, x〉 > 0,

and we call w a linear separator for S.

The (minimum) margin of a linear separator w for S is the minimum distance from x to the hyperplane
orthogonal to w (i.e., {z ∈ Rd : 〈w, z〉 = 0}), over all (x, y) ∈ S. Note that this notion of margin is
invariant to positive scaling of w. If we rescale w so that

min_{(x,y)∈S} y〈w, x〉 = 1,

then this minimum distance is 1/‖w‖₂. Therefore, the linear separator with the largest minimum margin is
described by the following mathematical optimization problem:

min_{w∈Rd} (1/2)‖w‖₂²   s.t.   y〈w, x〉 ≥ 1 for all (x, y) ∈ S.
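
To make this concrete, here is a minimal Python/NumPy sketch (the function name and toy data are our own, not from the notes) that computes the minimum margin of a given separator w for S:

    import numpy as np

    def minimum_margin(w, X, y):
        # min_{(x, y) in S} y<w, x> / ||w||_2: the minimum distance from the
        # points of S to the hyperplane orthogonal to w (positive iff w separates S).
        return np.min(y * (X @ w)) / np.linalg.norm(w)

    # Toy example: S = {((2, 1), +1), ((-1, 3), -1)} is separated by w = (1, 0).
    X = np.array([[2.0, 1.0], [-1.0, 3.0]])
    y = np.array([+1, -1])
    w = np.array([1.0, 0.0])
    print(minimum_margin(w, X, y))  # prints 1.0; rescaling w leaves this unchanged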

Perceptron algorithm
The Perceptron algorithm is given as follows (a short code sketch follows the pseudocode). The input to the
algorithm is a collection S of labeled examples from Rd × {−1, +1}.

• Begin with ŵ1 := 0 ∈ Rd.
• For t = 1, 2, . . .:

– If there is a labeled example in S (call it (xt, yt)) such that yt〈ŵt, xt〉 ≤ 0, then set ŵt+1 := ŵt + ytxt.
– Else, return ŵt.
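
A direct translation of this loop into Python (a NumPy sketch; the update counter and the max_iters guard are our additions):

    import numpy as np

    def perceptron(X, y, max_iters=10_000):
        # X: n-by-d array of examples; y: length-n array of labels in {-1, +1}.
        # Returns a linear separator for S and the number of updates performed.
        w = np.zeros(X.shape[1])             # w_1 := 0
        for t in range(max_iters):
            margins = y * (X @ w)            # y<w_t, x> for every (x, y) in S
            if np.all(margins > 0):
                return w, t                  # no misclassified example: done
            i = int(np.argmin(margins))      # pick (x_t, y_t) with y_t<w_t, x_t> <= 0
            w = w + y[i] * X[i]              # w_{t+1} := w_t + y_t x_t
        raise RuntimeError("no separator found within max_iters")

The max_iters guard matters only in the non-separable case; on linearly separable input, the theorem below guarantees termination.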

Theorem. Let S be a collection of labeled examples from Rd × {−1, +1}. Suppose there exists a vector
w⋆ ∈ Rd such that

min_{(x,y)∈S} y〈w⋆, x〉 = 1.

Then Perceptron on input S halts after at most ‖w⋆‖₂²L² loop iterations, where L := max_{(x,y)∈S} ‖x‖₂.

Proof. Suppose Perceptron does not exit the loop in the t-th iteration. Then there is a labeled example
(xt, yt) ∈ S such that

yt〈w⋆, xt〉 ≥ 1,
yt〈ŵt, xt〉 ≤ 0.

We bound 〈w⋆, ŵt+1〉 from above and below to deduce a bound on the number of loop iterations. First, we
bound 〈w⋆, ŵt+1〉 from below:

〈w⋆, ŵt+1〉 = 〈w⋆, ŵt〉 + yt〈w⋆, xt〉 ≥ 〈w⋆, ŵt〉 + 1.


Since ŵ1 = 0, unrolling this inequality over the first t iterations gives

〈w⋆, ŵt+1〉 ≥ t.

We now bound 〈w⋆, ŵt+1〉 from above. By Cauchy-Schwarz,

〈w⋆, ŵt+1〉 ≤ ‖w⋆‖₂‖ŵt+1‖₂.

Also,

‖ŵt+1‖₂² = ‖ŵt‖₂² + 2yt〈ŵt, xt〉 + yt²‖xt‖₂² ≤ ‖ŵt‖₂² + L²,

since yt〈ŵt, xt〉 ≤ 0, yt² = 1, and ‖xt‖₂ ≤ L.

Since ŵ1 = 0, we have

‖ŵt+1‖₂² ≤ L²t,

so

〈w⋆, ŵt+1〉 ≤ ‖w⋆‖₂L√t.

Combining the upper and lower bounds on 〈w⋆, ŵt+1〉 shows that

t ≤ 〈w⋆, ŵt+1〉 ≤ ‖w⋆‖₂L√t,

which in turn implies the inequality t ≤ ‖w⋆‖₂²L².
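
As a quick sanity check of this bound (reusing the perceptron sketch above; the toy data and the choice of w⋆ are ours):

    import numpy as np

    X = np.array([[2.0, 1.0], [-1.0, 3.0]])
    y = np.array([+1, -1])
    w_star = np.array([1.0, 0.0])                # min y<w_star, x> = 1 on this S
    L = np.max(np.linalg.norm(X, axis=1))        # L = sqrt(10)
    w, updates = perceptron(X, y)
    assert updates <= np.linalg.norm(w_star)**2 * L**2   # here: 2 <= 10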

Online Perceptron algorithm
The Online Perceptron algorithm is given as follows (again, a code sketch appears after the description). The
input to the algorithm is a sequence (x1, y1), (x2, y2), . . . of labeled examples from Rd × {−1, +1}.

• Begin with ŵ1 := 0 ∈ Rd.
• For t = 1, 2, . . .:

– If yt〈ŵt, xt〉 ≤ 0, then set ŵt+1 := ŵt + ytxt.
– Else, ŵt+1 := ŵt.

We say that Online Perceptron makes a mistake in round t if yt〈ŵt, xt〉 ≤ 0.
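
In code, the online version processes each example exactly once (again a Python/NumPy sketch with names of our own choosing):

    import numpy as np

    def online_perceptron(examples, d):
        # examples: iterable of (x, y) pairs from R^d x {-1, +1}.
        # Returns the final weight vector and the number of mistakes made.
        w = np.zeros(d)                  # w_1 := 0
        mistakes = 0
        for x, y in examples:
            if y * (x @ w) <= 0:         # mistake in this round
                w = w + y * x            # w_{t+1} := w_t + y_t x_t
                mistakes += 1
            # else: w_{t+1} := w_t (no change)
        return w, mistakes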

Theorem. Let (x1, y1), (x2, y2), . . . be a sequence of labeled examples from Rd × {−1, +1} such that there
exists a vector w⋆ ∈ Rd satisfying

min_{t=1,2,...} yt〈w⋆, xt〉 = 1.

Then Online Perceptron on input (x1, y1), (x2, y2), . . . makes at most ‖w⋆‖₂²L² mistakes, where
L := max_{t=1,2,...} ‖xt‖₂.

Proof. The proof of this theorem is essentially the same as the proof of the iteration bound for Perceptron.

Online Perceptron may be applied to a collection of labeled examples S by considering the labeled examples
in S in any (e.g., random) order. If S is linearly separable, then the number of mistakes made by Online
Perceptron can be bounded using the theorem.
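
For example, a single random pass over a small separable S using the online_perceptron sketch above (the data and seed are arbitrary):

    import numpy as np

    X = np.array([[2.0, 1.0], [-1.0, 3.0], [0.5, -2.0]])
    y = np.array([+1, -1, +1])
    order = np.random.default_rng(0).permutation(len(X))   # random order over S
    w, mistakes = online_perceptron(zip(X[order], y[order]), d=2)
    print(mistakes)   # at most ||w_star||_2^2 L^2 for any separator w_star scaled as above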

However, Online Perceptron is also useful when S is not linearly separable. This is especially notable in
comparison to Perceptron, which never terminates if S is not linearly separable.

Theorem. Let (x1, y1), (x2, y2), . . . be a sequence of labeled examples from Rd × {−1, +1}. Online Perceptron
on input (x1, y1), (x2, y2), . . . makes at most

min_{w⋆∈Rd} [ ‖w⋆‖₂²L² + ‖w⋆‖₂L√(Σ_{t∈M} ℓ(〈w⋆, xt〉, yt)) + Σ_{t∈M} ℓ(〈w⋆, xt〉, yt) ]

mistakes, where L := max_{t=1,2,...} ‖xt‖₂, M is the set of rounds on which Online Perceptron makes a mistake,
and ℓ(ŷ, y) := [1 − ŷy]₊ = max{0, 1 − ŷy} is the hinge loss of ŷ when y is the correct label.
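
The minimand is easy to evaluate for any fixed candidate w⋆ once the mistake rounds M are known; a sketch (the function name is ours):

    import numpy as np

    def mistake_bound(w_star, X, y, M):
        # ||w_star||^2 L^2 + ||w_star|| L sqrt(H) + H, where H is the total hinge
        # loss of w_star on the mistake rounds M (given as a list of row indices).
        L = np.max(np.linalg.norm(X, axis=1))
        H = np.sum(np.maximum(0.0, 1.0 - (X[M] @ w_star) * y[M]))
        R = np.linalg.norm(w_star) * L
        return R**2 + R * np.sqrt(H) + H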


Proof. Fix any w⋆ ∈ Rd, and consider any round t in which Online Perceptron makes a mistake. Let
Mt := |{1, . . . , t} ∩ M| denote the number of mistakes made by Online Perceptron through the first t
rounds. We will bound 〈w⋆, ŵt+1〉 from above and below to deduce a bound on Mt. First we bound
〈w⋆, ŵt+1〉 from above. By Cauchy-Schwarz,

〈w⋆, ŵt+1〉 ≤ ‖w⋆‖₂‖ŵt+1‖₂.

Moreover, since round t is a mistake round,

‖ŵt+1‖₂² = ‖ŵt‖₂² + 2yt〈ŵt, xt〉 + yt²‖xt‖₂² ≤ ‖ŵt‖₂² + L².

Since ŵ1 = 0 and ŵ changes only in rounds with mistakes, we have

‖ŵt+1‖₂² ≤ L²Mt,

and thus

〈w⋆, ŵt+1〉 ≤ ‖w⋆‖₂L√Mt.

We now bound 〈w⋆, ŵt+1〉 from below. In any mistake round s,

〈w⋆, ŵs+1〉 = 〈w⋆, ŵs〉 + ys〈w⋆, xs〉 = 〈w⋆, ŵs〉 + 1 − [1 − ys〈w⋆, xs〉]
           ≥ 〈w⋆, ŵs〉 + 1 − [1 − ys〈w⋆, xs〉]₊
           = 〈w⋆, ŵs〉 + 1 − ℓ(〈w⋆, xs〉, ys).

Since ŵ1 = 0 and ŵ is unchanged in all other rounds, summing over the mistake rounds s ≤ t gives

〈w⋆, ŵt+1〉 ≥ Mt − Ht,

where

Ht := Σ_{s∈M : s≤t} ℓ(〈w⋆, xs〉, ys).

Combining the upper and lower bounds on 〈w⋆, ŵt+1〉 shows that

Mt − Ht ≤ 〈w⋆, ŵt+1〉 ≤ ‖w⋆‖₂L√Mt,

i.e.,

Mt − ‖w⋆‖₂L√Mt − Ht ≤ 0.

This inequality is quadratic in √Mt. By solving it¹, we deduce the bound

Mt ≤ (1/2)‖w⋆‖₂²L² + (1/2)‖w⋆‖₂L√(‖w⋆‖₂²L² + 4Ht) + Ht,

which can be further loosened to the following (slightly more interpretable) bound:

Mt ≤ ‖w⋆‖₂²L² + ‖w⋆‖₂L√Ht + Ht.

The claim follows, since Ht is at most the total hinge loss over M, t was an arbitrary mistake round, and
w⋆ ∈ Rd was arbitrary.

¹The inequality is of the form x² − bx − c ≤ 0 for some non-negative b and c. This implies that x ≤ (b + √(b² + 4c))/2, and
hence x² ≤ (b² + 2b√(b² + 4c) + b² + 4c)/4. We can then use the fact that √(A + B) ≤ √A + √B for any non-negative A and B to
deduce x² ≤ b² + b√c + c.
