RE-EXAM for ARTIFICIAL NEURAL NETWORKS

Time: January 20, 2018, at 8:30–12:30
Place: SB Multisal
Teachers: Bernhard Mehlig
Allowed material: Mathematics Handbook for Science and Engineering
Not allowed: Any other written material, calculator

Maximum score on this exam: 12 points.
Maximum score for homework problems: 12 points.
To pass the course it is necessary to score at least 5 points on this written exam.
1. Recognition of one pattern. In the deterministic Hopfield model, the state S_i of the i-th neuron is updated according to the McCulloch-Pitts rule

    S_i ← sgn(b_i),                          (1)

where

    b_i = Σ_{j=1}^{N} w_{ij} S_j.            (2)
Here N is the number of neurons and w_{ij} are the weights. The weights depend on p patterns ζ^{(μ)} = (ζ_1^{(μ)}, …, ζ_N^{(μ)})^T stored in the network according to Hebb's rule:

    w_{ij} = (1/N) Σ_{μ=1}^{p} ζ_i^{(μ)} ζ_j^{(μ)}   for i, j = 1, …, N.   (3)

Here ζ_i^{(μ)} takes values 1 or −1.
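For concreteness, here is a minimal NumPy sketch of eqs. (1)–(3): it builds the Hebbian weight matrix from stored patterns and applies one synchronous update step. The array name stored, the helper names, and the random patterns are illustrative assumptions, not part of the exam.

    import numpy as np

    def hebb_weights(stored):
        """Hebb's rule, eq. (3): stored has shape (p, N), entries +1/-1."""
        p, N = stored.shape
        return stored.T @ stored / N        # w_ij = (1/N) sum_mu zeta_i^(mu) zeta_j^(mu)

    def synchronous_update(w, S):
        """One synchronous McCulloch-Pitts step, eqs. (1)-(2)."""
        b = w @ S                           # local fields b_i
        return np.where(b >= 0, 1, -1)      # sgn(b_i); the convention at b = 0 is a choice

    # Example: a pattern is stable if one update step maps it to itself.
    rng = np.random.default_rng(0)
    stored = rng.choice([-1, 1], size=(2, 42))   # p = 2 patterns of N = 42 bits
    w = hebb_weights(stored)
    print(np.array_equal(synchronous_update(w, stored[0]), stored[0]))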

a) Consider the patterns in Figure 1. Compute the quantity

    Σ_{j=1}^{N} ζ_j^{(μ)} ζ_j^{(ν)}    (4)

for the following ten combinations of μ and ν:

• μ = 1 and ν = 1, …, 5,
• μ = 2 and ν = 1, …, 5.

(Hint: The result can be read off from the Hamming distances between the patterns shown in Figure 1.) (0.5 p)
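The hint rests on a standard identity: bits that agree contribute +1 to the sum in eq. (4) and bits that disagree contribute −1. Writing d_H(μ, ν) for the Hamming distance (the number of differing bits; here N = 42), this gives

    Σ_{j=1}^{N} ζ_j^{(μ)} ζ_j^{(ν)} = (N − d_H(μ, ν))·(+1) + d_H(μ, ν)·(−1) = N − 2 d_H(μ, ν).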
b) The patterns stored in the network are ζ^{(1)} and ζ^{(2)}. Let b_i^{(ν)} denote b_i obtained by substituting S_j by ζ_j^{(ν)} in eq. (2). Use your answer to a) to compute b_i^{(ν)} for ν = 1, …, 5. Express these results as linear combinations of ζ_i^{(1)} and ζ_i^{(2)}. (1 p)
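To see why such a linear combination arises, substitute Hebb's rule (3) with p = 2 into eq. (2); the sums computed in part a) then appear as coefficients. This is only the general structure, not the numerical answer:

    b_i^{(ν)} = Σ_{j=1}^{N} w_{ij} ζ_j^{(ν)} = (1/N) Σ_{μ=1}^{2} ζ_i^{(μ)} ( Σ_{j=1}^{N} ζ_j^{(μ)} ζ_j^{(ν)} ).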
c) Feed the patterns in Figure 1 to the network. Which ones are stable under synchronous updating? Show how you arrive at your results. (0.5 p)
Figure 1: The patterns studied in Question 1, shown in five panels: (i) pattern ζ^{(1)}, (ii) pattern ζ^{(2)}, (iii) pattern ζ^{(3)}, (iv) pattern ζ^{(4)}, (v) pattern ζ^{(5)}. Each pattern consists of 42 bits ζ_i^{(μ)} taking values +1 or −1. A black square for bit i in pattern μ stands for ζ_i^{(μ)} = 1, a white square stands for ζ_i^{(μ)} = −1.
2. Linearly inseparable problem. A classification problem is specified in Figure 2, where a grey triangle in input space is shown. The aim is to map input patterns ξ^{(μ)} to outputs O^{(μ)} as follows: if a point with coordinate vector ξ^{(μ)} lies inside the triangle it is mapped to O^{(μ)} = 1, but if ξ^{(μ)} is outside the triangle it is mapped to O^{(μ)} = 0. How patterns on the boundary of the triangle are classified is not important.
a) This problem is not linearly separable. Show this by constructing a counter-example using four input patterns. (0.5 p)
b) The problem can be solved by a perceptron with one hidden layer of three neurons v_i, i = 1, 2, 3, where neuron i is given the value

    v_i^{(μ)} = H( −θ_i + Σ_{j=1}^{2} w_{ij} ξ_j^{(μ)} )    (5)

and the output is given the value

    O^{(μ)} = H( −T + Σ_{i=1}^{3} W_i v_i^{(μ)} ).           (6)

Here w_{ij} and W_i are weights and θ_i and T are thresholds. In eqs. (5) and (6),

    H(b) = { 0 if b ≤ 0,
             1 if b > 0.                                      (7)

Find weights and thresholds that solve the classification problem. (1 p)

Figure 2: Specification of the classification problem in Question 2 (see text). The input space is the (ξ_1, ξ_2)-plane.
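A minimal sketch of the architecture in eqs. (5)–(7), written for a hypothetical triangle with corners (0, 0), (1, 0) and (0, 1), since the actual triangle is only given in Figure 2. The weights, thresholds and test points below are illustrative placeholders, not the intended exam answer.

    import numpy as np

    def H(b):
        """Heaviside step of eq. (7): 0 for b <= 0, 1 for b > 0."""
        return np.where(b > 0, 1, 0)

    # Hypothetical triangle with corners (0,0), (1,0), (0,1): one hidden
    # neuron per edge (eq. 5); the output neuron ANDs them together (eq. 6).
    w = np.array([[ 1.0,  0.0],    # inside means  xi_1 > 0
                  [ 0.0,  1.0],    # inside means  xi_2 > 0
                  [-1.0, -1.0]])   # inside means  xi_1 + xi_2 < 1
    theta = np.array([0.0, 0.0, -1.0])
    W = np.array([1.0, 1.0, 1.0])
    T = 2.5                        # output fires only when all three hidden neurons fire

    def output(xi):
        v = H(-theta + w @ xi)         # eq. (5)
        return int(H(-T + W @ v))      # eq. (6)

    print(output(np.array([0.2, 0.2])))   # 1: inside the assumed triangle
    print(output(np.array([0.8, 0.8])))   # 0: outside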

3. Backpropagation. A multilayer perceptron has L−1 hidden layers, one input layer and one output layer. The state of a neuron v_i^{(m,μ)} in the m-th layer is given by

    v_i^{(m,μ)} = g( b_i^{(m,μ)} ),                                (8)

where

    b_i^{(m,μ)} = −θ_i^{(m)} + Σ_j w_{ij}^{(m)} v_j^{(m−1,μ)}      (9)

for m > 1 and

    b_i^{(m,μ)} = −θ_i^{(m)} + Σ_j w_{ij}^{(m)} ξ_j^{(μ)}          (10)

for m = 1. In eqs. (9) and (10), w_{ij}^{(m)} are weights and θ_i^{(m)} are thresholds. g = g(b) in eq. (8) is the activation function, and ξ_j^{(μ)} in eq. (10) is the input to the perceptron. An element O_i^{(μ)} of the output is given by

    O_i^{(μ)} = g( b_i^{(L,μ)} ),                                  (11)

with b_i^{(L,μ)} given by eq. (9).

a) Draw this network. Indicate where the elements ξ_i^{(μ)}, b_i^{(m,μ)}, v_i^{(m,μ)}, O_i^{(μ)}, w_{ij}^{(m)} and θ_i^{(m)} belong in your illustration. How does the total number of weights and thresholds depend on the network architecture? (0.5 p)

b) Find the recursive rule for how the derivatives

    ∂v_i^{(m,μ)} / ∂w_{qr}^{(p)}       (12)

depend on the derivatives

    ∂v_j^{(m−1,μ)} / ∂w_{qr}^{(p)}     (13)

if p < m.
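As a point of reference, a minimal sketch of the forward pass defined by eqs. (8)–(11), assuming tanh as the activation function g (the exam leaves g unspecified); the layer sizes and random weights are illustrative only.

    import numpy as np

    def forward(xi, weights, thetas, g=np.tanh):
        """Forward pass of eqs. (8)-(11).

        weights[m] has shape (N_m, N_{m-1}) and thetas[m] has shape (N_m,),
        for layers m = 1, ..., L (stored here at list indices 0, ..., L-1).
        """
        v = xi                                # layer m = 0: the input of eq. (10)
        for w, theta in zip(weights, thetas):
            b = -theta + w @ v                # eqs. (9)/(10)
            v = g(b)                          # eq. (8); the last layer gives O, eq. (11)
        return v

    # Illustrative architecture: 3 inputs, hidden layers of 4 and 5 neurons, 2 outputs (L = 3).
    rng = np.random.default_rng(1)
    sizes = [3, 4, 5, 2]
    weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
    thetas = [np.zeros(n) for n in sizes[1:]]
    print(forward(rng.standard_normal(3), weights, thetas))

    # Counts relevant to part a): one weight per connected pair, one threshold per neuron.
    print(sum(w.size for w in weights), sum(t.size for t in thetas))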
4. Indicate whether each of the following statements is true or false.

9. Consider p > 2 patterns uniformly distributed on a circle. None of the eigenvalues of the covariance matrix of the patterns is zero.
10. Even if the weight vector in Oja’s rule equals its stable steady state at one iteration, it may change in the following iterations.
11. If your Kohonen network is supposed to learn the distribution P(ξ), it is important to generate the patterns ξ^{(μ)} before you start training the network.

12. All one-dimensional Boolean problems are linearly separable.
13. In Kohonen’s algorithm, the neurons have fixed positions in the output space.
14. Some elements of the covariance matrix are variances.
5. Oja’s rule.
a) The output of Oja's rule for the input pattern ξ^{(μ)} is

    ζ^{(μ)} = Σ_i w_i ξ_i^{(μ)},                          (15)

and the update rule based on this pattern is w_i → w_i + δw_i^{(μ)} with

    δw_i^{(μ)} = η ζ^{(μ)} ( ξ_i^{(μ)} − ζ^{(μ)} w_i ).   (16)

Let ⟨δw_i⟩ denote the update of w_i averaged over the input patterns. Show that ⟨δw_i⟩ = 0 implies

    Σ_i w_i w_i = 1.                                      (17)

(1 p)
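A minimal numerical sketch of eqs. (15)–(16), assuming synthetic zero-mean Gaussian input data (the data, learning rate and sample count are illustrative): after training, the weight vector has unit length, consistent with eq. (17), and points along the principal direction of the data.

    import numpy as np

    rng = np.random.default_rng(2)
    # Illustrative data: zero-mean Gaussian inputs, elongated along (1, 1)/sqrt(2).
    C = np.array([[3.0, 2.0],
                  [2.0, 3.0]])
    xi = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=5000)

    eta = 0.01                       # learning rate
    w = rng.standard_normal(2)       # initial weights
    for x in xi:
        zeta = w @ x                         # output, eq. (15)
        w += eta * zeta * (x - zeta * w)     # Oja's update, eq. (16)

    print(np.linalg.norm(w))   # close to 1, as in eq. (17)
    print(w)                   # roughly +/- (1, 1)/sqrt(2), the leading eigenvector of C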
b) Calculate the principal component of the patterns in Figure 3. (1 p)
Figure 3: The five patterns ξ^{(1)}, …, ξ^{(5)} studied in Question 5b, shown as points in the (ξ_1, ξ_2)-plane.

6. General Boolean problems. Any N-dimensional Boolean problem can be solved using a perceptron with one hidden layer consisting of 2^N neurons. Here we consider N = 3 and N = 2.
a) The three-dimensional parity problem is specified in Figure 4. Here the input bits ξ_i^{(μ)} for i = 1, 2, 3 are either +1 or −1. The problem is solved if the output O^{(μ)} of the network is +1 when there is an odd number of positive bits in ξ^{(μ)}, and −1 when the number of positive bits is even. In one solution, a neuron v_i^{(μ)} for i = 1, …, 2^N in the hidden layer depends on the input ξ^{(μ)} according to

    v_i^{(μ)} = { 1 if −θ_i + Σ_j w_{ij} ξ_j^{(μ)} > 0,
                  0 if −θ_i + Σ_j w_{ij} ξ_j^{(μ)} ≤ 0,    (18)

where the weights and thresholds are given by

    w_{ij} = ξ_j^{(i)}   and   θ_i = 2.                    (19)

The output is given by

    O^{(μ)} = Σ_j W_j v_j^{(μ)}.                           (20)

Determine the weights W_j. (1 p)
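A short sketch of this construction, building the hidden layer of eqs. (18)–(19) and showing that each input switches on exactly one hidden neuron. The ordering of the eight patterns below is an assumption (the exam's Figure 4 fixes which corner is called ξ^{(1)}, ξ^{(2)}, and so on), and the output weights W_j are deliberately left open since determining them is part a).

    import numpy as np
    from itertools import product

    # The 2^N = 8 three-bit patterns with entries +1/-1 (ordering is illustrative).
    patterns = np.array(list(product([-1, 1], repeat=3)))   # shape (8, 3)

    w = patterns.copy()        # eq. (19): w_ij = xi_j^(i)
    theta = np.full(8, 2)      # eq. (19): theta_i = 2

    def hidden(xi):
        """Hidden layer of eq. (18): 1 if -theta_i + sum_j w_ij xi_j > 0, else 0."""
        return ((-theta + w @ xi) > 0).astype(int)

    # Each input activates only the hidden neuron whose weight vector equals that
    # input, so the hidden layer is a one-hot code of the input pattern.
    for mu, xi in enumerate(patterns):
        v = hidden(xi)
        print(mu, v, "-> active neuron:", int(np.argmax(v)))

    # The output of eq. (20) is O = sum_j W_j v_j; choosing the W_j is the task in a).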

Figure 4: The three-dimensional parity problem (see text). The eight input patterns ξ^{(1)}, …, ξ^{(8)} are the corners of a cube in (ξ_1, ξ_2, ξ_3)-space. A white ball indicates O^{(μ)} = −1, and a black ball indicates O^{(μ)} = +1.
b) The two-dimensional problem analogous to the problem in a) is the XOR problem. Draw the decision boundaries of the hidden neurons in the solution to the XOR problem that is analogous to the solution given in a). (0.5 p)