ELEC 400M / EECE 571M
Homework 1: Perceptron
Homework is not handed in. Please solve for your own benefit.
• Exercise 1.3, solution attached
• Problem 1.2, solution attached
• Problem 1.3, solution attached
• Problems 1.4 and 1.5; results attached (these are programming exercises, similar to one of
the problems in Assignment 1)
• Exercise 1.3
(a)
\[
y_n w^T(t) x_n = |y_n w^T(t) x_n|\,\mathrm{sign}(y_n w^T(t) x_n)
= |y_n w^T(t) x_n|\,\mathrm{sign}(y_n)\,\mathrm{sign}(w^T(t) x_n)
= |y_n w^T(t) x_n|\, y_n\,\mathrm{sign}(w^T(t) x_n)
\]
For a misclassified $x_n$ we have $y_n \neq \mathrm{sign}(w^T(t) x_n)$, so $y_n\,\mathrm{sign}(w^T(t) x_n) = -1$ and hence $y_n w^T(t) x_n < 0$.
(b) PLA update: $w(t+1) = w(t) + y_n x_n$. Then
\[
y_n w^T(t+1) x_n = y_n w^T(t) x_n + y_n^2\, x_n^T x_n = y_n w^T(t) x_n + \|x_n\|^2 \ge y_n w^T(t) x_n.
\]
(c) By (b), the PLA update increases $y_n w^T(t) x_n$, moving it toward a positive value and hence toward a correct classification of $x_n$.
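To see the effect of the update in (b) numerically, here is a minimal Python sketch (the weights, input, and label below are hypothetical values, not taken from the course material) verifying that $y_n w^T x_n$ grows by $\|x_n\|^2$ after one PLA update:

```python
import numpy as np

# Hypothetical misclassified example: y_n * w(t)^T x_n < 0 before the update.
w = np.array([0.0, -1.0, 0.5])   # current weights w(t), including bias weight w_0
x = np.array([1.0, 2.0, -1.0])   # augmented input x_n = [1, x_1, x_2]
y = 1.0                          # true label y_n

before = y * (w @ x)             # y_n w^T(t) x_n (negative: x_n is misclassified)
w_next = w + y * x               # PLA update: w(t+1) = w(t) + y_n x_n
after = y * (w_next @ x)         # y_n w^T(t+1) x_n

print(before, after)             # 'after' exceeds 'before' by ||x_n||^2
assert np.isclose(after, before + x @ x)
```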
• Problem 1.2
The perceptron is given by $h(x) = \mathrm{sign}(w^T x)$ with $w = [w_0, w_1, w_2]^T$ and $x = [1, x_1, x_2]^T$.
(a) The two regions $h(x) = +1$ and $h(x) = -1$ imply $w^T x > 0$ and $w^T x < 0$, respectively. Hence, they are separated by $w^T x = w_0 + w_1 x_1 + w_2 x_2 = 0$, which is the equation of a line. We can also write it as $x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$.
(b) Please draw the picture yourself.
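As a starting point for the picture in (b), here is a minimal plotting sketch (the weight values are hypothetical, and matplotlib is assumed to be available) that draws the separating line $x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$ from (a):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical weights w = [w0, w1, w2]; any values with w2 != 0 will do.
w0, w1, w2 = 1.0, 2.0, 3.0

x1 = np.linspace(-3, 3, 100)
x2 = -(w1 / w2) * x1 - w0 / w2            # separating line from part (a)

plt.plot(x1, x2, label=r"$w_0 + w_1 x_1 + w_2 x_2 = 0$")
plt.fill_between(x1, x2, x2.max() + 2, alpha=0.2,
                 label=r"$h(x) = +1$ side (for $w_2 > 0$)")
plt.xlabel(r"$x_1$")
plt.ylabel(r"$x_2$")
plt.legend()
plt.show()
```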
• Problem 1.3
(a) $y_n = \mathrm{sign}(w^{*T} x_n)$ implies for all $n$: $y_n w^{*T} x_n = \mathrm{sign}(w^{*T} x_n)\, w^{*T} x_n = |w^{*T} x_n| > 0$, and thus also $\rho = \min_n y_n w^{*T} x_n > 0$.
(b) Part I:
\[
w^T(t) w^* = [w(t-1) + y(t-1) x(t-1)]^T w^*
= w^T(t-1) w^* + y(t-1) w^{*T} x(t-1)
\ge w^T(t-1) w^* + \rho
\]
Part II: Proof by induction, using Part I:
$t = 1$: $w^T(1) w^* \ge w^T(0) w^* + \rho = 1 \cdot \rho$, since $w(0) = 0$.
Assume the claim holds for $t$ and show that it follows for $t+1$:
\[
w^T(t+1) w^* \ge w^T(t) w^* + \rho \ge t\rho + \rho = (t+1)\rho
\]
(c) We can write
\[
\|w(t)\|^2 = \|w(t-1) + y(t-1) x(t-1)\|^2
= \|w(t-1)\|^2 + \|x(t-1)\|^2 + 2\, y(t-1)\, w^T(t-1) x(t-1)
\le \|w(t-1)\|^2 + \|x(t-1)\|^2,
\]
where the last step holds because $x(t-1)$ was misclassified, so $y(t-1) w^T(t-1) x(t-1) \le 0$.
(d) $t = 0$: $\|w(0)\|^2 = 0 \le 0 \cdot R^2$
Assume the claim holds for $t$ and show that it follows for $t+1$:
\[
\|w(t+1)\|^2 \le \|w(t)\|^2 + \|x(t)\|^2 \le t R^2 + R^2 = (t+1) R^2,
\]
using $\|x(t)\|^2 \le R^2$ with $R = \max_n \|x_n\|$.
(e) From (b) and (d) we get
\[
\frac{w^T(t) w^*}{\|w(t)\|} \ge \frac{t\rho}{\sqrt{t}\, R} = \frac{\sqrt{t}\,\rho}{R}.
\]
Furthermore, from the Cauchy-Schwarz inequality we have
\[
w^T(t) w^* \le \|w(t)\|\,\|w^*\| \quad\Leftrightarrow\quad \frac{w^T(t) w^*}{\|w(t)\|} \le \|w^*\|,
\]
and hence
\[
\|w^*\| \ge \frac{\sqrt{t}\,\rho}{R} \quad\Rightarrow\quad t \le \frac{R^2 \|w^*\|^2}{\rho^2}.
\]
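The bound in (e) can also be checked numerically. The sketch below uses hypothetical toy data (a known $w^*$ generating the labels, and $w(0) = 0$ as assumed in (b) and (d)) and compares the actual PLA iteration count against $R^2 \|w^*\|^2 / \rho^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical separable toy data: a known w_star defines the labels.
w_star = np.array([0.2, 1.0, -1.0])
X = np.c_[np.ones(20), rng.uniform(-1, 1, (20, 2))]   # augmented inputs [1, x1, x2]
y = np.sign(X @ w_star)

# PLA with w(0) = 0: update on the first misclassified sample until none remain.
w = np.zeros(3)
t = 0
while True:
    mis = np.where(y * (X @ w) <= 0)[0]
    if len(mis) == 0:
        break
    n = mis[0]
    w = w + y[n] * X[n]                               # w(t+1) = w(t) + y_n x_n
    t += 1

R = np.max(np.linalg.norm(X, axis=1))                 # R = max_n ||x_n||
rho = np.min(y * (X @ w_star))                        # rho = min_n y_n w*^T x_n
bound = R**2 * np.linalg.norm(w_star)**2 / rho**2
print(f"iterations: {t}, upper bound: {bound:.1f}")   # t should not exceed the bound
```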
• Problem 1.4
(a)+(b)
The PLA took 67 iterations to converge for this example. It got fairly close to the target function, since the samples leave relatively little room for the decision boundary. (A code sketch for reproducing these experiments is given after part (h) below.)
(c)
The PLA took 2 iterations to converge for this example. Compared to case (b), the samples leave more room for the decision boundary, and the result is not as close to the target function.
(d)
The PLA took 148 iterations to converge. With the increased number of samples, the
PLA takes longer and can learn a decision boundary that is fairly close to the target
function.
(e)
The PLA took 669 iterations to converge. The conclusions from (d) are further emphasized.
(f) For the test case with d = 10, the PLA took 2454 iterations to converge, notably more than for the test cases with d = 2 above.
(g)
The number of iterations over the 100 trials ranges from 958 to 3644. The plot shows a histogram of these iteration counts with 10 equally sized bins.
(h) Will be revisited in Assignment 1.
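The experiments above could be reproduced with a sketch along the following lines. The function and variable names are hypothetical, and the data-generation details (sample range, values of N and d) should be taken from the problem statement; samples are drawn uniformly from $[-1,1]^d$ here as an assumption.

```python
import numpy as np

def run_pla_experiment(N=20, d=2, rng=None):
    """Run one PLA trial on randomly generated, linearly separable data.

    A hypothetical reconstruction of the experiment: a random target weight
    vector labels N samples drawn uniformly from [-1, 1]^d, and PLA is run
    from w(0) = 0 until every sample is classified correctly.
    Returns the number of PLA iterations.
    """
    if rng is None:
        rng = np.random.default_rng()
    w_target = rng.standard_normal(d + 1)               # random target function f
    X = np.c_[np.ones(N), rng.uniform(-1, 1, (N, d))]   # augmented inputs [1, x_1, ..., x_d]
    y = np.sign(X @ w_target)

    w = np.zeros(d + 1)
    t = 0
    while True:
        mis = np.where(y * (X @ w) <= 0)[0]
        if len(mis) == 0:                                # converged: no misclassified samples
            return t
        n = rng.choice(mis)                              # pick a misclassified sample
        w = w + y[n] * X[n]                              # PLA update
        t += 1

# e.g. repeated trials as in (g); N and d here are placeholders for the values
# given in the problem statement.
iters = [run_pla_experiment(N=1000, d=10) for _ in range(100)]
print(min(iters), max(iters))
```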
• Problem 1.5
The following results are for one sample set; they will look different for other sample sets, so experiment with different random sets. (A sketch of the training loop is given at the end of this problem.)
(a)
The figure shows the norm of the weight vector over the iterations. For $\eta = 100$ the algorithm does not converge; instead, the weight vector oscillates and moves away from an optimal solution.
(b)
The learning algorithm stops after 56 iterations on the training set and achieves an error
rate of 0.021 for the test set.
(c)
The learning algorithm runs all 1,000 iterations on the training set and achieves an error
rate of 0.042 for the test set.
(d)
The learning algorithm runs all 1,000 iterations on the training set and achieves an error
rate of 0.121 for the test set.
(e) We observe slower learning for smaller $\eta$. With longer training, we could get a smaller test error with smaller $\eta$. Your results may look different for your sets of training and test data.
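A minimal sketch of a training loop matching the setup above (learning rate $\eta$, a cap of 1,000 passes, and a test-error evaluation) is given below. The update rule shown, $w \leftarrow w + \eta\,(y_n - w^T x_n)\,x_n$ applied when $y_n w^T x_n \le 1$, is an Adaline-style rule assumed here and should be checked against the problem statement; the data sets in the usage comment are placeholders.

```python
import numpy as np

def train_with_learning_rate(X, y, eta, max_iters=1000, rng=None):
    """Adaline-style variant of PLA with learning rate eta (assumed update rule).

    Assumed rule: pick a sample with y_n * w^T x_n <= 1 and update
    w <- w + eta * (y_n - w^T x_n) * x_n. Stops early if no sample violates
    the condition. Returns (w, iterations used, history of ||w||).
    """
    if rng is None:
        rng = np.random.default_rng()
    w = np.zeros(X.shape[1])
    norms = []
    for t in range(max_iters):
        s = X @ w
        viol = np.where(y * s <= 1)[0]
        if len(viol) == 0:
            return w, t, norms                     # stopped before the iteration cap
        n = rng.choice(viol)
        w = w + eta * (y[n] - s[n]) * X[n]
        norms.append(np.linalg.norm(w))            # norm trace, e.g. for the plot in (a)
    return w, max_iters, norms

def test_error(w, X_test, y_test):
    """Fraction of test samples where sign(w^T x) disagrees with the label."""
    return float(np.mean(np.sign(X_test @ w) != y_test))

# Usage (X_train, y_train, X_test, y_test are placeholders for your data sets):
# w, iters, norms = train_with_learning_rate(X_train, y_train, eta=1.0)
# print(iters, test_error(w, X_test, y_test))
```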