ELEC 400M / EECE 571M Homework 1: Perceptron
This homework is not handed in; please solve it for your own benefit.
• Exercise 1.3, solution attached
• Problem 1.2, solution attached
• Problem 1.3, solution attached
• Problems 1.4 and 1.5; results attached (these are programming exercises, similar to one of the problems in Assignment 1)
• Exercise 1.3

(a) For a misclassified $x_n$, we have $y_n \neq \mathrm{sign}(w^T(t)x_n)$, and hence $y_n w^T(t)x_n < 0$. In detail, since $\mathrm{sign}(y_n) = y_n$,
$$y_n w^T(t)x_n = |y_n w^T(t)x_n|\,\mathrm{sign}(y_n w^T(t)x_n) = |y_n w^T(t)x_n|\,\mathrm{sign}(y_n)\,\mathrm{sign}(w^T(t)x_n) = |y_n w^T(t)x_n|\,y_n\,\mathrm{sign}(w^T(t)x_n) < 0,$$
because $y_n\,\mathrm{sign}(w^T(t)x_n) = -1$ when $x_n$ is misclassified.

(b) PLA: $w(t+1) = w(t) + y_n x_n$. Then
$$y_n w^T(t+1)x_n = y_n w^T(t)x_n + y_n^2\, x_n^T x_n = y_n w^T(t)x_n + \|x_n\|^2 \ge y_n w^T(t)x_n.$$
(c) By (b), the PLA update increases $y_n w^T x_n$ (by $\|x_n\|^2$), bringing it closer to the condition $y_n w^T x_n > 0$ for a correct classification. Hence the update moves $w(t+1)$ closer to classifying $x_n$ correctly.
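As a quick numerical sanity check of (a) and (b), here is a minimal Python sketch (the weights, point, and label below are made-up illustrative values, not from the homework) that verifies $y_n w^T(t)x_n < 0$ for a misclassified point and that one PLA update increases this quantity by $\|x_n\|^2$.

```python
import numpy as np

# Made-up example: a labelled point and a weight vector that misclassifies it.
w = np.array([0.5, -1.0, 2.0])   # current weights w(t)
x = np.array([1.0, 2.0, 1.0])    # input x_n (with bias coordinate 1)
y = -1.0                         # true label y_n

before = y * w.dot(x)            # y_n w^T(t) x_n
assert before < 0                # part (a): negative for a misclassified point

w_new = w + y * x                # PLA update: w(t+1) = w(t) + y_n x_n
after = y * w_new.dot(x)         # y_n w^T(t+1) x_n

# Part (b): the update adds ||x_n||^2, so the quantity can only increase.
assert np.isclose(after, before + x.dot(x))
assert after >= before
print(before, after)
```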
• Problem 1.2
The perceptron is given by $h(x) = \mathrm{sign}(w^T x)$ with $w = [w_0, w_1, w_2]^T$ and $x = [1, x_1, x_2]^T$.

(a) The two regions $h(x) = +1$ and $h(x) = -1$ correspond to $w^T x > 0$ and $w^T x < 0$, respectively. Hence they are separated by $w^T x = w_0 + w_1 x_1 + w_2 x_2 = 0$, which is the equation of a line. We can also write it as $x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$.
(b) Please draw the picture yourself.
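If you want to check your drawing, here is a minimal matplotlib sketch (the weight values $w_0, w_1, w_2$ are made-up examples) that plots the boundary $x_2 = -\frac{w_1}{w_2}x_1 - \frac{w_0}{w_2}$ and shades the regions $h(x) = +1$ and $h(x) = -1$.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example weights; here w2 > 0, so h(x) = +1 above the line.
w0, w1, w2 = 1.0, 2.0, 3.0

x1 = np.linspace(-3, 3, 200)
x2_boundary = -(w1 / w2) * x1 - w0 / w2      # the line w0 + w1*x1 + w2*x2 = 0

plt.plot(x1, x2_boundary, "k-", label="$w^T x = 0$")
plt.fill_between(x1, x2_boundary, 3, alpha=0.2, label="$h(x) = +1$")   # w^T x > 0
plt.fill_between(x1, -3, x2_boundary, alpha=0.2, label="$h(x) = -1$")  # w^T x < 0
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.legend()
plt.show()
```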
• Problem 1.3
(a) $y_n = \mathrm{sign}(w^{*T} x_n) \;\Rightarrow\; \forall n:\; y_n w^{*T} x_n = \mathrm{sign}(w^{*T} x_n)\, w^{*T} x_n = |w^{*T} x_n| > 0$, and thus also $\rho = \min_n y_n w^{*T} x_n > 0$.
(b) Part I:
$$w^T(t)w^* = \left[w(t-1) + y(t-1)x(t-1)\right]^T w^* = w^T(t-1)w^* + y(t-1)\,w^{*T}x(t-1) \ge w^T(t-1)w^* + \rho.$$

Part II: Proof by induction, using Part I:

$t = 1$: $\;w^T(1)w^* \ge w^T(0)w^* + \rho = 1 \cdot \rho$ (using $w(0) = 0$).

Assume the claim holds for $t$ and show that it follows for $t+1$:
$$w^T(t+1)w^* \ge w^T(t)w^* + \rho \ge t\rho + \rho = (t+1)\rho.$$
(c) We can write
$$\|w(t)\|^2 = \|w(t-1) + y(t-1)x(t-1)\|^2 = \|w(t-1)\|^2 + \|x(t-1)\|^2 + 2\,y(t-1)\,w^T(t-1)x(t-1) \le \|w(t-1)\|^2 + \|x(t-1)\|^2,$$
where the last step uses that $x(t-1)$ was misclassified by $w(t-1)$, so $y(t-1)\,w^T(t-1)x(t-1) \le 0$.
(d) $t = 0$: $\;\|w(0)\|^2 = 0 \le 0 \cdot R^2$.

Assume the claim holds for $t$ and show that it follows for $t+1$:
$$\|w(t+1)\|^2 \le \|w(t)\|^2 + \|x(t)\|^2 \le tR^2 + R^2 = (t+1)R^2.$$

(e) From (b) and (d) we get
$$\frac{w^T(t)w^*}{\|w(t)\|} \ge \frac{t\rho}{\sqrt{t}\,R} = \frac{\sqrt{t}\,\rho}{R}.$$
Furthermore, from the Cauchy-Schwarz inequality we have
$$w^T(t)w^* \le \|w(t)\|\,\|w^*\| \;\Leftrightarrow\; \frac{w^T(t)w^*}{\|w(t)\|} \le \|w^*\|,$$
and hence
$$\frac{\sqrt{t}\,\rho}{R} \le \|w^*\| \;\Rightarrow\; t \le \frac{R^2\|w^*\|^2}{\rho^2}.$$
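To see the bound in action, here is a small Python sketch (the data set, the target $w^*$, and the PLA loop are illustrative choices, not the course code) that computes $\rho$, $R$, and the bound $R^2\|w^*\|^2/\rho^2$ for a separable data set and checks that the number of PLA updates stays below it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative separable data set: labels come from a known w* (bias term included).
w_star = np.array([0.1, 1.0, -1.0])
X = np.hstack([np.ones((50, 1)), rng.uniform(-1, 1, (50, 2))])
y = np.sign(X @ w_star)

rho = np.min(y * (X @ w_star))                    # rho = min_n y_n w*^T x_n  (> 0)
R = np.max(np.linalg.norm(X, axis=1))             # R = max_n ||x_n||
bound = R**2 * np.dot(w_star, w_star) / rho**2    # t <= R^2 ||w*||^2 / rho^2

# Plain PLA: update on the first misclassified point until none remain.
w = np.zeros(3)
updates = 0
while True:
    mis = np.where(y * (X @ w) <= 0)[0]
    if len(mis) == 0:
        break
    w += y[mis[0]] * X[mis[0]]
    updates += 1

print(f"updates = {updates}, bound = {bound:.1f}")
assert updates <= bound
```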
• Problem 1.4

(a)+(b) The PLA took 67 iterations to converge for this example. It got fairly close to the target function, as the samples leave relatively little room for the decision boundary. (A minimal PLA sketch for these experiments is given after part (c).)

(c) The PLA took 2 iterations to converge for this example. Compared to the example from (b), the samples leave more room for the decision boundary, and the result is not as close to the target function as in case (b).
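The sketch referenced above: a minimal version of the kind of PLA experiment behind parts (a)-(c), assuming a random linear target on $[-1,1]^2$ and $N = 20$ samples. The function names and data generation are illustrative, and the later parts only change $N$ and $d$.

```python
import numpy as np

rng = np.random.default_rng()

def random_target(d=2):
    """Random linear target f(x) = sign(w_f^T [1, x]) (illustrative choice)."""
    return rng.uniform(-1, 1, d + 1)

def generate_data(w_f, N=20, d=2):
    """N points uniform on [-1, 1]^d, labelled by the target."""
    X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, (N, d))])
    return X, np.sign(X @ w_f)

def pla(X, y):
    """Run PLA until no point is misclassified; return weights and iteration count."""
    w = np.zeros(X.shape[1])
    iterations = 0
    while True:
        mis = np.where(y * (X @ w) <= 0)[0]
        if len(mis) == 0:
            return w, iterations
        n = rng.choice(mis)          # pick a misclassified point at random
        w += y[n] * X[n]
        iterations += 1

w_f = random_target()
X, y = generate_data(w_f, N=20)
w, iterations = pla(X, y)
print("PLA converged after", iterations, "iterations")
```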
(d) The PLA took 148 iterations to converge. With the increased number of samples, the PLA takes longer and can learn a decision boundary that is fairly close to the target function.

(e) The PLA took 669 iterations to converge. The conclusions from (d) are further emphasized.

(f) For the test case with d = 10, the PLA took 2454 iterations to converge, which is notably larger than for the test cases with d = 2 above.

(g) The number of iterations for the 100 trials is between 958 and 3644. The plot shows a histogram with 10 equally sized bins.

(h) Will be revisited in Assignment 1.
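For parts (f) and (g), a sketch of the repeated experiment with d = 10 and 100 trials might look like the following (again an illustrative implementation, not the code that produced the attached histogram):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

def pla_iterations(N=1000, d=10):
    """Generate a separable data set from a random linear target and count PLA updates."""
    w_f = rng.uniform(-1, 1, d + 1)                      # random target weights
    X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, (N, d))])
    y = np.sign(X @ w_f)
    w = np.zeros(d + 1)
    iterations = 0
    while True:
        mis = np.where(y * (X @ w) <= 0)[0]
        if len(mis) == 0:
            return iterations
        n = rng.choice(mis)
        w += y[n] * X[n]
        iterations += 1

counts = [pla_iterations() for _ in range(100)]          # 100 independent trials
plt.hist(counts, bins=10)                                # histogram with 10 bins
plt.xlabel("PLA iterations to converge")
plt.ylabel("number of trials")
plt.show()
```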
• Problem 1.5

The following results are for one sample set; they will look different for other sample sets, so play around with different random sets.
(a) The figure shows the norm of the weight vector. In this case, the algorithm does not converge for η = 100; instead, it starts oscillating away from an optimal solution.

(b) The learning algorithm stops after 56 iterations on the training set and achieves an error rate of 0.021 for the test set.

(c) The learning algorithm runs all 1,000 iterations on the training set and achieves an error rate of 0.042 for the test set.

(d) The learning algorithm runs all 1,000 iterations on the training set and achieves an error rate of 0.121 for the test set.
(e) We observe slower learning for smaller η. With longer training, we could get a smaller test error with smaller η. Your results may look different for your sets of training and test data.
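For reference, here is a sketch of the kind of training and evaluation loop behind these numbers. The update rule shown is an Adaline-style rule, $w \leftarrow w + \eta\,(y_n - w^T x_n)\,x_n$ applied to a point with $y_n w^T x_n \le 1$, which is an assumption here: substitute the update rule specified in the problem. The training/test set sizes and the set of η values are also illustrative.

```python
import numpy as np

rng = np.random.default_rng()

def generate_data(w_f, N, d=2):
    """Points uniform on [-1, 1]^d with labels from the linear target w_f (illustrative)."""
    X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, (N, d))])
    return X, np.sign(X @ w_f)

def train(X, y, eta, max_iter=1000):
    """Assumed Adaline-style update: pick a point with y_n w^T x_n <= 1 and move
    w by eta * (y_n - w^T x_n) * x_n; stop early if no such point remains."""
    w = np.zeros(X.shape[1])
    norms = []                                    # track ||w(t)|| as in part (a)
    for _ in range(max_iter):
        s = X @ w
        viol = np.where(y * s <= 1)[0]
        if len(viol) == 0:
            break                                 # early stop, cf. part (b)
        n = rng.choice(viol)
        w += eta * (y[n] - s[n]) * X[n]
        norms.append(np.linalg.norm(w))
        if not np.isfinite(w).all():
            break                                 # weights diverged (can happen for large eta)
    return w, norms

def test_error(w, X, y):
    """Fraction of test points classified incorrectly."""
    return np.mean(np.sign(X @ w) != y)

w_f = rng.uniform(-1, 1, 3)                       # random target (illustrative)
X_train, y_train = generate_data(w_f, N=100)
X_test, y_test = generate_data(w_f, N=10000)

for eta in [100, 1, 0.01, 0.0001]:                # assumed set of learning rates
    w, norms = train(X_train, y_train, eta)
    print(f"eta={eta}: {len(norms)} iterations, test error {test_error(w, X_test, y_test):.3f}")
```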