SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS

1. Recognition of one pattern.
a) Define

Q(μ,ν) = Σ_{j=1}^{42} ζ_j^(μ) ζ_j^(ν). (1)

Bit j contributes +1 to Q(μ,ν) if ζ_j^(μ) = ζ_j^(ν), and −1 if ζ_j^(μ) ≠ ζ_j^(ν). Since the number of bits is 42, we have Q(μ,ν) = 42 − 2 H(μ,ν), where H(μ,ν) is the number of bits that differ between pattern μ and pattern ν (the Hamming distance). We find:
• H(1,1) = 0 ⇒ Q(1,1) = 42
• H(1,2) = 10 ⇒ Q(1,2) = 22
• H(1,3) = 2 ⇒ Q(1,3) = 38
• H(1,4) = 42 ⇒ Q(1,4) = −42
• H(1,5) = 21 ⇒ Q(1,5) = 0
• H(2,1) = H(1,2) = 10 ⇒ Q(2,1) = 22
• H(2,2) = 0 ⇒ Q(2,2) = 42
• H(2,3) = 10 ⇒ Q(2,3) = 22
• H(2,4) = 42 − H(2,1) = 32 ⇒ Q(2,4) = −22
• H(2,5) = 11 ⇒ Q(2,5) = 20
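Purely as a sanity check of the relation Q(μ,ν) = 42 − 2 H(μ,ν), here is a minimal numerical sketch; the two ±1 patterns are random stand-ins, since the exam's patterns are not reproduced in these solutions.

```python
import numpy as np

# Sanity check of Q(mu,nu) = N - 2*H(mu,nu) for N = 42, eq (1).
# The two +-1 patterns are random stand-ins (the exam's patterns are not given here).
rng = np.random.default_rng(0)
N = 42
zeta_mu = rng.choice([-1, 1], size=N)
zeta_nu = rng.choice([-1, 1], size=N)

Q = int(zeta_mu @ zeta_nu)              # overlap, eq (1)
H = int(np.sum(zeta_mu != zeta_nu))     # Hamming distance
print(Q, N - 2 * H)                     # the two values coincide
```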
b) We have that

b_i^(ν) = Σ_j w_ij ζ_j^(ν) = (1/42) Σ_j [ζ_i^(1) ζ_j^(1) + ζ_i^(2) ζ_j^(2)] ζ_j^(ν)
= (1/42) ζ_i^(1) Σ_j ζ_j^(1) ζ_j^(ν) + (1/42) ζ_i^(2) Σ_j ζ_j^(2) ζ_j^(ν)
= (1/42) ζ_i^(1) Q(1,ν) + (1/42) ζ_i^(2) Q(2,ν). (2)

From a), we have that:

b_i^(1) = (Q(1,1)/42) ζ_i^(1) + (Q(2,1)/42) ζ_i^(2) = ζ_i^(1) + (22/42) ζ_i^(2),
b_i^(2) = (Q(1,2)/42) ζ_i^(1) + (Q(2,2)/42) ζ_i^(2) = (22/42) ζ_i^(1) + ζ_i^(2),
b_i^(3) = (Q(1,3)/42) ζ_i^(1) + (Q(2,3)/42) ζ_i^(2) = (38/42) ζ_i^(1) + (22/42) ζ_i^(2),
b_i^(4) = (Q(1,4)/42) ζ_i^(1) + (Q(2,4)/42) ζ_i^(2) = −ζ_i^(1) − (22/42) ζ_i^(2),
b_i^(5) = (Q(1,5)/42) ζ_i^(1) + (Q(2,5)/42) ζ_i^(2) = (20/42) ζ_i^(2). (3)
c) From b), we find that:

ζ_i^(1) → sgn(b_i^(1)) = ζ_i^(1),
ζ_i^(2) → sgn(b_i^(2)) = ζ_i^(2),
ζ_i^(3) → sgn(b_i^(3)) = ζ_i^(1),
ζ_i^(4) → sgn(b_i^(4)) = −ζ_i^(1) = ζ_i^(4),
ζ_i^(5) → sgn(b_i^(5)) = ζ_i^(2). (4)

Thus patterns ζ^(1), ζ^(2) and ζ^(4) are stable.
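The stability argument in b)-c) can be illustrated numerically. The sketch below is an assumption-laden stand-in: the patterns are random except that they are built to reproduce the Hamming distances from a); which particular bits are flipped is not specified in the exam, so that choice is arbitrary.

```python
import numpy as np

# Numerical illustration of 1b)-1c): store zeta^(1) and zeta^(2) with Hebb's rule and
# apply one deterministic update to each of the five patterns. The patterns are random
# stand-ins constructed to reproduce the Hamming distances from 1a); which particular
# bits are flipped is an arbitrary assumption.
rng = np.random.default_rng(0)
N = 42
z1 = rng.choice([-1, 1], size=N)
z2 = z1.copy(); z2[:10] *= -1               # H(1,2) = 10
z3 = z1.copy(); z3[[0, 40]] *= -1           # H(1,3) = 2,  H(2,3) = 10
z4 = -z1                                    # H(1,4) = 42
z5 = z1.copy(); z5[:21] *= -1               # H(1,5) = 21, H(2,5) = 11
patterns = [z1, z2, z3, z4, z5]

W = (np.outer(z1, z1) + np.outer(z2, z2)) / N          # Hebb's rule: only zeta^(1), zeta^(2) stored
for mu, z in enumerate(patterns, start=1):
    b = W @ z                                          # local fields b_i^(mu), eq (2)
    print(mu, np.array_equal(np.sign(b), z))           # True for mu = 1, 2, 4: the stable patterns
```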
2. Linearly inseparable problem.
a) In the figure below ξ(A) and ξ(B) are to have output 1 and ξ(C) and ξ(D) are to have output 0. There is no straight line that can separate patterns ξ(A) and ξ(B) from patterns ξ(C) and ξ(D).
b) The triangle corners are:

ξ^(1) = [−4, 0]^T,   ξ^(2) = [4, −1]^T,   ξ^(3) = [0, 3]^T. (5)

• Let v_1 = 0 at ξ^(1) and ξ^(2). This implies
0 = w_11 ξ_1^(1) + w_12 ξ_2^(1) − θ_1 = −4 w_11 − θ_1
0 = w_11 ξ_1^(2) + w_12 ξ_2^(2) − θ_1 = 4 w_11 − w_12 − θ_1
⇒ θ_1 = −4 w_11 and w_12 = 4 w_11 − θ_1 = 8 w_11. (6)
We choose w_11 = 1, w_12 = 8 and θ_1 = −4.
• Let v_2 = 0 at ξ^(2) and ξ^(3). This implies
0 = w_21 ξ_1^(2) + w_22 ξ_2^(2) − θ_2 = 4 w_21 − w_22 − θ_2
0 = w_21 ξ_1^(3) + w_22 ξ_2^(3) − θ_2 = 3 w_22 − θ_2
⇒ w_22 = 4 w_21 − θ_2 = 4 w_21 − 3 w_22 ⇒ w_22 = w_21 and θ_2 = 3 w_22. (7)
We choose w_21 = w_22 = 1 and θ_2 = 3.

• Let v_3 = 0 at ξ^(3) and ξ^(1). This implies
0 = w_31 ξ_1^(3) + w_32 ξ_2^(3) − θ_3 = 3 w_32 − θ_3
0 = w_31 ξ_1^(1) + w_32 ξ_2^(1) − θ_3 = −4 w_31 − θ_3
⇒ 3 w_32 = −4 w_31 and θ_3 = 3 w_32. (8)
We choose w_32 = 4, w_31 = −3 and θ_3 = 12. In summary:

w = [[1, 8], [1, 1], [−3, 4]]   and   θ = [−4, 3, 12]^T. (9)
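As a quick check of eqs (6)-(8), the sketch below verifies that each hidden unit's activation w_k·ξ − θ_k vanishes at the two triangle corners used to construct it.

```python
import numpy as np

# Check of eqs (6)-(8): each hidden unit's activation w_k . xi - theta_k vanishes at the
# two triangle corners used in its construction.
w = np.array([[1.0, 8.0], [1.0, 1.0], [-3.0, 4.0]])    # eq (9)
theta = np.array([-4.0, 3.0, 12.0])                    # eq (9)
corners = [np.array([-4.0, 0.0]), np.array([4.0, -1.0]), np.array([0.0, 3.0])]  # eq (5)
pairs = [(0, 1), (1, 2), (2, 0)]                       # unit k is zero at these two corners
for k, (a, b) in enumerate(pairs):
    print(k + 1, w[k] @ corners[a] - theta[k], w[k] @ corners[b] - theta[k])    # both 0.0
```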
The origin maps to

v = H(w·0 − θ) = H([4, −3, −12]^T) = [1, 0, 0]^T. (10)
We know that the origin maps to v = [1, 0, 0]T and that the hidden neurons
change values at the dashed lines:
Thus we can conclude that the regions in input space map to these regions in the hidden space:
We want v = [1,0,0]T to map to 1 and all other possible values of v to map to 0. The hidden space can be illustrated as this:
W must be normal to the plane passing through the crosses in the picture above. Also, W points to v = [1, 0, 0]^T from v = [0, 1, 1]^T. We may choose

W = [1, 0, 0]^T − [0, 1, 1]^T = [1, −1, −1]^T. (11)

We know that the point v = [1/2, 0, 0]^T lies on the decision boundary we are looking for. So

W^T [1/2, 0, 0]^T − T = 0 ⇒ T = 1/2. (12)
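A minimal end-to-end check of the network constructed in 2b) follows; the test points are arbitrary choices (points inside the triangle should give output 1, points outside output 0), and T = 1/2 is taken from eq (12).

```python
import numpy as np

# End-to-end check of the network from 2b); T = 1/2 is taken from eq (12).
# The test points are arbitrary: the triangle interior should map to 1, the outside to 0.
H = lambda x: (x > 0).astype(float)                  # Heaviside step function
w = np.array([[1.0, 8.0], [1.0, 1.0], [-3.0, 4.0]])  # hidden weights, eq (9)
theta = np.array([-4.0, 3.0, 12.0])                  # hidden thresholds, eq (9)
W = np.array([1.0, -1.0, -1.0])                      # output weights, eq (11)
T = 0.5                                              # output threshold, eq (12)

def output(x):
    v = H(w @ x - theta)                             # hidden layer
    return float(W @ v - T > 0)                      # output unit

for x, expected in [((0.0, 0.0), 1), ((0.0, 0.7), 1), ((10.0, 10.0), 0),
                    ((-10.0, 10.0), 0), ((0.0, -10.0), 0)]:
    print(x, output(np.array(x)), "expected", expected)
```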
3. Backpropagation.
a) Let N_m denote the number of weights w_ij^(m). Let n_m denote the number of hidden units v_i^(m,μ) for m = 1, …, L−1, let n_0 denote the number of input units and let n_L denote the number of output units. We find that the number of weights is

Σ_{m=1}^{L} N_m = Σ_{m=1}^{L} n_{m−1} n_m, (13)

and that the number of thresholds is

Σ_{m=1}^{L} n_m. (14)
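A one-line check of eqs (13)-(14); the layer sizes below are just an assumed example specification.

```python
# Check of eqs (13)-(14) for an assumed example layer specification
# n = [n_0, n_1, ..., n_L]; the sizes are arbitrary.
n = [3, 5, 4, 1]                                            # n_0 = 3 inputs, n_L = 1 output
n_weights = sum(n[m - 1] * n[m] for m in range(1, len(n)))  # eq (13): sum_m n_{m-1} n_m
n_thresholds = sum(n[m] for m in range(1, len(n)))          # eq (14): sum_m n_m
print(n_weights, n_thresholds)                              # 3*5 + 5*4 + 4*1 = 39, 5 + 4 + 1 = 10
```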
b)

∂v_i^(m,μ)/∂w_qr^(p) = ∂/∂w_qr^(p) g(b_i^(m,μ)) = g'(b_i^(m,μ)) ∂b_i^(m,μ)/∂w_qr^(p)
= g'(b_i^(m,μ)) ∂/∂w_qr^(p) [−θ_i^(m) + Σ_j w_ij^(m) v_j^(m−1,μ)]
= g'(b_i^(m,μ)) Σ_j ∂/∂w_qr^(p) [w_ij^(m) v_j^(m−1,μ)] (15)
= g'(b_i^(m,μ)) Σ_j [∂w_ij^(m)/∂w_qr^(p) v_j^(m−1,μ) + w_ij^(m) ∂v_j^(m−1,μ)/∂w_qr^(p)]. (16)

c) From b), we have:

∂v_i^(m,μ)/∂w_qr^(p) = g'(b_i^(m,μ)) Σ_j ∂/∂w_qr^(p) [w_ij^(m) v_j^(m−1,μ)]. (17)

But since p = m (so that v_j^(m−1,μ) does not depend on w_qr^(p), and ∂w_ij^(m)/∂w_qr^(m) = δ_qi δ_rj), we find:

∂v_i^(m,μ)/∂w_qr^(p) = g'(b_i^(m,μ)) Σ_j δ_qi δ_rj v_j^(m−1,μ) = g'(b_i^(m,μ)) δ_qi v_r^(m−1,μ). (18)

Using that p < m (so that w_ij^(m) does not depend on w_qr^(p)), we find:

∂v_i^(m,μ)/∂w_qr^(p) = g'(b_i^(m,μ)) Σ_j w_ij^(m) ∂v_j^(m−1,μ)/∂w_qr^(p). (19)
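The two cases (18)-(19) can be checked against a finite-difference derivative. The sketch below assumes tanh activations and small random layer sizes (neither is specified in the problem); it verifies the derivative of a second-layer unit with respect to a first-layer weight.

```python
import numpy as np

# Finite-difference check of eqs (18)-(19): derivative of a layer-2 unit with respect
# to a layer-1 weight. The tanh activation and the layer sizes are assumptions made
# only for this sketch; they are not specified in the problem.
rng = np.random.default_rng(0)
g = np.tanh
gprime = lambda b: 1.0 - np.tanh(b) ** 2

n0, n1, n2 = 3, 4, 2                        # input and layer sizes (assumed)
w1, th1 = rng.normal(size=(n1, n0)), rng.normal(size=n1)
w2, th2 = rng.normal(size=(n2, n1)), rng.normal(size=n2)
xi = rng.normal(size=n0)                    # one input pattern

def forward(w1):
    b1 = w1 @ xi - th1                      # b_j^(1)
    v1 = g(b1)                              # v_j^(1)
    b2 = w2 @ v1 - th2                      # b_i^(2)
    return b1, v1, b2, g(b2)                # ..., v_i^(2)

b1, v1, b2, v2 = forward(w1)
i, q, r = 1, 2, 0                           # which derivative to check (arbitrary indices)

# Eq (18) with m = 1: dv_j^(1)/dw_qr^(1) = g'(b_j^(1)) delta_qj xi_r
dv1 = np.zeros(n1)
dv1[q] = gprime(b1[q]) * xi[r]
# Eq (19) with m = 2, p = 1: dv_i^(2)/dw_qr^(1) = g'(b_i^(2)) sum_j w2_ij dv_j^(1)/dw_qr^(1)
analytic = gprime(b2[i]) * (w2[i] @ dv1)

eps = 1e-6                                  # one-sided finite difference of the same quantity
w1_pert = w1.copy()
w1_pert[q, r] += eps
numeric = (forward(w1_pert)[3][i] - v2[i]) / eps

print(analytic, numeric)                    # should agree to about 1e-6
```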
d) We have w_qr^(L−2) ← w_qr^(L−2) + δw_qr^(L−2), where

δw_qr^(L−2) = −η ∂H/∂w_qr^(L−2).

We differentiate the energy function:

∂H/∂w_qr^(L−2) = ∂/∂w_qr^(L−2) [(1/2) Σ_{μ,i} (O_i^(μ) − ζ_i^(μ))²] (20)
= Σ_{μ,i} (O_i^(μ) − ζ_i^(μ)) ∂O_i^(μ)/∂w_qr^(L−2). (21)

From 3b) and 3c), we have:

∂v_i^(m,μ)/∂w_qr^(p) = g'(b_i^(m,μ)) δ_qi v_r^(m−1,μ)   if p = m,
∂v_i^(m,μ)/∂w_qr^(p) = g'(b_i^(m,μ)) Σ_j w_ij^(m) ∂v_j^(m−1,μ)/∂w_qr^(p)   if p < m. (22)
10. Even if the weight vector in Oja’s rule equals its stable steady state at one iteration, it may change in the following iterations. TRUE (it is only a statistically steady state).
11. If your Kohonen network is supposed to learn the distribution P(ξ), it is important to generate the patterns ξ(μ) before you start training the network. FALSE (training your network does not affect which pattern you draw from your distribution).
12. All one-dimensional Boolean problems are linearly separable. TRUE (two different points can always be separated by a line).
13. In Kohonen’s algorithm, the neurons have fixed positions in the output space. TRUE (it is the weights, in the input space, that are updated).
14. Some elements of the covariance matrix are variances. TRUE (the diagonal elements).
5. Oja’s rule.
a)
0 = ⟨δw⟩ = ⟨η ζ (ξ − ζ w)⟩
⇒ ⟨ξ ζ⟩ = ⟨ζ² w⟩. (23)

Insert

ζ = ξ^T w = w^T ξ: (24)

0 = ⟨δw⟩ ⇒ ⟨ξ ξ^T⟩ w = ⟨(w^T ξ)(ξ^T w)⟩ w ⇒ ⟨ξ ξ^T⟩ w = (w^T ⟨ξ ξ^T⟩ w) w. (25)

⟨ξ ξ^T⟩ = C is a matrix, so:

0 = ⟨δw⟩ ⇒ C w = (w^T C w) w. (26)

We see that ⟨δw⟩ = 0 implies that w is an eigenvector of C with eigenvalue λ = w^T C w:

λ = w^T C w = w^T λ w = λ w^T w ⇒ w^T w = 1 (27)

(note that w^T w = Σ_i w_i w_i).
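As an illustration of a), the sketch below iterates Oja's rule on zero-mean Gaussian data; the covariance matrix, learning rate and number of iterations are arbitrary assumptions. After training, w should have unit norm (eq (27)) and point along the top eigenvector of C.

```python
import numpy as np

# Illustration of a): iterate Oja's rule dw = eta*zeta*(xi - zeta*w) on zero-mean
# Gaussian data. The covariance matrix, learning rate and iteration count are
# arbitrary assumptions for this sketch.
rng = np.random.default_rng(1)
C_true = np.array([[3.0, 1.0], [1.0, 2.0]])        # assumed data covariance
L = np.linalg.cholesky(C_true)
w = rng.normal(size=2)
eta = 0.01
for _ in range(20000):
    xi = L @ rng.normal(size=2)                    # sample with covariance C_true
    zeta = xi @ w                                  # eq (24)
    w += eta * zeta * (xi - zeta * w)              # Oja's rule

eigvals, eigvecs = np.linalg.eigh(C_true)
print(np.linalg.norm(w))                           # ~1, as required by eq (27)
print(w, eigvecs[:, -1])                           # w is (up to sign) the top eigenvector
```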
b) Are the patterns centered?

Σ_{μ=1}^{5} ξ_1^(μ) = −6 − 2 + 2 + 1 + 5 = 0, (28)
Σ_{μ=1}^{5} ξ_2^(μ) = −5 − 4 + 2 + 3 + 4 = 0. (29)

So ⟨ξ⟩ = 0, and the patterns are centered. This means that the covariance matrix is

C = (1/5) Σ_{μ=1}^{5} ξ^(μ) ξ^(μ)T = ⟨ξ ξ^T⟩. (30)
We have

ξ^(1) ξ^(1)T = [[36, 30], [30, 25]],   ξ^(2) ξ^(2)T = [[4, 8], [8, 16]],   ξ^(3) ξ^(3)T = [[4, 4], [4, 4]],
ξ^(4) ξ^(4)T = [[1, 3], [3, 9]],   ξ^(5) ξ^(5)T = [[25, 20], [20, 16]]. (31)

We compute the elements of C:

5 C_11 = 36 + 4 + 4 + 1 + 25 = 70,
5 C_12 = 5 C_21 = 30 + 8 + 4 + 3 + 20 = 65,
5 C_22 = 25 + 16 + 4 + 9 + 16 = 70. (32)

We find that

C = [[14, 13], [13, 14]]. (33)
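The computation of C can be reproduced directly. The patterns below are one sign assignment consistent with eqs (28)-(31); only the outer products appear in the solution above, so the overall signs of the patterns are an assumption.

```python
import numpy as np

# The patterns below are one sign assignment consistent with eqs (28)-(31); only the
# outer products appear in the solution, so the overall signs are an assumption.
xi = np.array([[-6.0, -5.0],
               [-2.0, -4.0],
               [ 2.0,  2.0],
               [ 1.0,  3.0],
               [ 5.0,  4.0]])
print(xi.sum(axis=0))              # [0, 0]: the patterns are centered, eqs (28)-(29)
C = (xi.T @ xi) / len(xi)          # C = (1/5) sum_mu xi^(mu) xi^(mu)T, eq (30)
print(C)                           # [[14, 13], [13, 14]], eq (33)
```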
Maximal eigenvalue:

0 = det([[14 − λ, 13], [13, 14 − λ]]) = (14 − λ)² − 13² = λ² − 28λ + 14² − 13² = λ² − 28λ + 27
⇒ λ = 14 ± √(14² − 27) = 14 ± √169 = 14 ± 13 ⇒ λ_max = 27. (34)

Eigenvector u:

[[14, 13], [13, 14]] [u_1, u_2]^T = 27 [u_1, u_2]^T ⇒ u_1 = u_2. (35)

So an eigenvector corresponding to the largest eigenvalue of C is given by

u = t [1, 1]^T (36)

for an arbitrary t ≠ 0. This is the principal component.
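A quick check of eqs (34)-(36) with numpy's symmetric eigensolver:

```python
import numpy as np

# Check of eqs (34)-(36) with numpy's symmetric eigensolver.
C = np.array([[14.0, 13.0], [13.0, 14.0]])
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)                     # [1, 27]: the maximal eigenvalue is 27
print(eigvecs[:, -1])              # proportional to (1, 1): the principal component
```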
6. General Boolean problems. There was a typo in Eqn. (18) of the exam. The correct equation is:
v_i^(μ) = 1 if −θ_i + Σ_j w_ij ξ_j^(μ) > 0,
v_i^(μ) = 0 if −θ_i + Σ_j w_ij ξ_j^(μ) ≤ 0.
a) The solution uses w_ij = ξ_j^(i). This means that the ith row of the weight matrix w is the vector w^(i) = ξ^(i):
From the figure above, we see that:
• w^(i)T ξ^(i) = 1 + 1 + 1 = 3.
• w^(i)T ξ^(μ) = 1 + 1 − 1 = 1 for μ = j1, j4 and j3.
• w^(i)T ξ^(μ) = 1 − 1 − 1 = −1 for μ = j2, j5 and j7.
• w^(i)T ξ^(j6) = −1 − 1 − 1 = −3.

Using that θ_i = 2, we note that:

w^(i)T ξ^(μ) − θ_i is > 0 if μ = i, and < 0 if μ ≠ i. (37)

So we have:

v^(i,μ) = 1 if i = μ, and v^(i,μ) = 0 if i ≠ μ. (38)
We see that corner μ of the cube of possible inputs is separated from the other corners in that it assigns 1 to the μth hidden neuron and 0 to all the others.
From Figure 4 in the exam, we see that exactly 4 of the 8 possible inputs ξ^(μ) are to be mapped to O^(μ) = 1. These are ξ^(μ) for μ = 2, 4, 5 and 7. These inputs will assign, respectively:

v^(2) = [0, 1, 0, 0, 0, 0, 0, 0]^T,   v^(4) = [0, 0, 0, 1, 0, 0, 0, 0]^T,
v^(5) = [0, 0, 0, 0, 1, 0, 0, 0]^T,   v^(7) = [0, 0, 0, 0, 0, 0, 1, 0]^T. (39)

The weights W are now to 'detect' these and only these patterns, so that

O^(μ) = W^T v^(μ) = 1 for μ ∈ {2, 4, 5, 7} and 0 for μ ∈ {1, 3, 6, 8}. (40)

This is achieved by letting:

W = [0, 1, 0, 1, 1, 0, 1, 0]^T. (41)
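The construction in a) can be verified as below. The ordering of the cube corners is an assumption here (the labelling is fixed by Figure 4 of the exam, which is not reproduced), so the check concerns only the structure v^(i,μ) = δ_iμ and O^(μ) = W_μ.

```python
import numpy as np
from itertools import product

# Verification of the construction in 6a). The ordering of the cube corners below is an
# assumption (the actual labelling is fixed by Figure 4 of the exam, not reproduced here),
# so the check concerns only the structure v^(i,mu) = delta_{i,mu} and O^(mu) = W_mu.
corners = np.array(list(product([-1, 1], repeat=3)))    # the 8 corners of the input cube
w = corners.astype(float)                               # w_ij = xi_j^(i): row i is corner i
theta = 2.0                                             # theta_i = 2
W = np.array([0, 1, 0, 1, 1, 0, 1, 0])                  # eq (41)

for mu, xi in enumerate(corners):
    v = (w @ xi - theta > 0).astype(int)                # corrected eq (18) of the exam
    O = W @ v                                           # eq (40)
    assert v[mu] == 1 and v.sum() == 1                  # only hidden unit mu fires
    print(mu + 1, v, O)                                 # O = 1 exactly where W_mu = 1
```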
b) The solution in 6a) implies separating each corner ξ^(μ) of the cube of input patterns by letting

v^(i,μ) = 1 if i = μ, and v^(i,μ) = 0 if i ≠ μ. (42)

Thus the solution requires 2³ = 8 hidden neurons. The analogous solution in 2D is to separate each corner of a square, and it requires 2^N = 2² = 4 neurons. The decision boundaries of the hidden neurons are shown here: