
COMP400: Bio-Inspired Computing Hopfield Networks
Marc de Kamps March 23, 2020
1 Introduction
1.1 Why Hopfield Networks


Unlike multi-layer perceptrons and backpropagation, Hopfield networks are not at the forefront in machine learning at the moment. Yet, they illustrate some extremely important topics:
• Content-Addressable Memory (CAM). Memories can be retrieved using parts of a memory. Think of remembering a summer night long ago, based on a smell, or perhaps a song, which suddenly takes you back and creates a vivid memory of the entire scene. We will see that a Hopfield network can recreate patterns it has learnt when only part of the original pattern is entered into the network.
• Holographic memory. A memory in a Hopfield network does not have a particular location; it is distributed over the whole network. Damaging a few connections does not destroy the memory as it would in a conventional computer memory. It degrades the overall quality, but unless the damage is extensive, the original memory can still be retrieved.
• Human memory appears to have similar characteristics. The Hopfield network is an important model for human memory.
• The relatively simple dynamics of the Hopfield network allows capacity calculations for memory. An important result in this area is that the number of memories that can be stored in the network scales with the number of nodes, not the number of connections. This is important in the light of the opinion of some psychologists that the astronomical number of connections in the human brain, O(10^15), would imply an effectively infinite capacity for long-term memory. Memory capacity scaling with the number of nodes, rather than the number of connections, suggests this is not the case.
• When multiple patterns are stored in a Hopfield network, this can be interpreted as imposing constraints that are partially in contradiction. Hopfield networks can be seen as networks that satisfy the maximum number of constraints that are imposed. We will study a paper by Hopfield on how he used his network to find acceptable solutions for the Traveling Salesman Problem.
• When nodes are not updated using a deterministic rule, but a stochastic one, Hopfield networks become Boltzmann Machines. Restricted Boltzmann Machines (RBMs) played an important role in the breakthrough of deep learning. The mathematics underpinning RBMs still plays an important role in understanding autoencoders.
• Therefore, studying Hopfield networks sets you up for understanding this important area of machine learning. We will come back to this later.

2 The Hopfield network

Hopfield networks are constructed from artificial neurons (see Fig. 1). These artificial neurons have N inputs. With each input i there is a weight wi associated. They also have an output. The state of the output is maintained until the neuron is updated. Updating the neuron entails the following operations:

• The value of each input xi is determined and the weighted sum of all inputs, \sum_i w_i x_i, is calculated. In Hopfield networks a threshold is not used.
• The output state of the neuron is set to +1 if the weighted input sum is larger than or equal to 0. It is set to -1 if the weighted input sum is smaller than 0.
• A neuron retains its output state until it is updated again.

Written as a formula:

o = \begin{cases} +1 & : \sum_i w_i x_i \geq 0 \\ -1 & : \sum_i w_i x_i < 0 \end{cases}

Figure 1: An artificial neuron as used in a Hopfield network.

A Hopfield network is a network of N such artificial neurons, which are fully connected. The connection weight from neuron j to neuron i is given by a number wij. The collection of all such numbers is represented by the weight matrix W, whose components are wij.

Now, given the weight matrix and the updating rule for neurons, the dynamics of the network is defined once we specify in which order we update the neurons. There are two ways of updating them (a small code sketch of both schedules is given at the end of this section):

Asynchronous: one picks one neuron, calculates the weighted input sum and updates it immediately. This can be done in a fixed order, or neurons can be picked at random, which is called asynchronous random updating.

Synchronous: the weighted input sums of all neurons are calculated without updating the neurons. Then all neurons are set to their new value, according to the value of their weighted input sum.

The lecture slides contain an explicit example of synchronous updating.

2.1 Use of the Hopfield network

The way in which the Hopfield network is used is as follows. A pattern is entered into the network by setting all nodes to a specific value, or by setting only part of the nodes. The network is then subjected to a number of iterations using asynchronous or synchronous updating. This is stopped after a while. The network neurons are then read out to see which pattern is in the network.

The idea behind the Hopfield network is that patterns are stored in the weight matrix. The input must contain part of a training pattern. The dynamics of the network then retrieve the patterns stored in the weight matrix. This is called Content Addressable Memory (CAM).

Figure 2: A Hopfield network as an autoassociator. One enters a pattern in the blue nodes and lets the network evolve. After a while one reads out the yellow nodes. The memory of the Hopfield network associates the yellow node pattern with the blue node pattern.

The network can also be used for auto-association. The patterns that are stored in the network are divided into two parts: cue and association (see Fig. 2). By entering the cue into the network, the entire pattern, which is stored in the weight matrix, is retrieved. In this way the network restores the association that belongs to a given cue.

The stage is now almost set for the Hopfield network; we must only decide how we determine the weight matrix. We will do that in the next section, but in general we always impose two conditions on the weight matrix:

• symmetry: wij = wji
• no self connections: wii = 0

It turns out that we can guarantee that the network converges under asynchronous updating when we use these conditions.
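The two update schedules can be made concrete in a short simulation. The following is a minimal sketch in Python (not code provided with the module); the class name, the use of NumPy and the ±1 state convention are my own choices.

```python
import numpy as np

class HopfieldNetwork:
    """Minimal Hopfield network with +/-1 states and a fixed weight matrix W."""

    def __init__(self, W):
        self.W = np.asarray(W, dtype=float)   # assumed symmetric with zero diagonal
        self.x = np.ones(self.W.shape[0])     # current output state of the N neurons

    def update_async(self, rng=None):
        """One sweep of asynchronous random updating: neurons are updated one at a time."""
        rng = rng or np.random.default_rng()
        for i in rng.permutation(len(self.x)):
            h_i = self.W[i] @ self.x                 # weighted input sum (local field)
            self.x[i] = 1.0 if h_i >= 0 else -1.0    # threshold at zero, as in the text

    def update_sync(self):
        """Synchronous updating: compute all weighted input sums first, then set all neurons."""
        h = self.W @ self.x
        self.x = np.where(h >= 0, 1.0, -1.0)
```

A pattern is entered by assigning to the state attribute (completely or in part), after which repeated calls to update_async or update_sync implement the dynamics described above.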
3 Training the network

3.1 A simple example

Consider the two nodes in Fig. 3 A. Depending on which values they contain initially, they will reach end states {+1, +1} or {-1, -1} under asynchronous updating. The nodes in Fig. 3 B, on the other hand, will reach end states {-1, +1} or {+1, -1}.

Exercise: Check this!

Exercise: Check that under synchronous updating there are no stable states in the network.

Figure 3: A two-neuron network. There are just two options for the weight matrix: wij = 1 (A) or wij = -1 (B). If the weight is 1, there are two stable states under asynchronous updating, {+1, +1} or {-1, -1}. For a weight of -1, the stable states will be {-1, +1} or {+1, -1}, depending on initial conditions. Under synchronous updating the states are oscillatory.

This simple example illustrates two important aspects of the Hopfield network: the steady final state is determined by the value of the weight, and the network is 'sign-blind'; the initial conditions determine which final state will be reached, {+1, +1} or {-1, -1} for Fig. 3 A. It also shows that synchronous updating can lead to oscillatory states, something which is not true for asynchronous updating (see section 4.2).

4 Setting the weight matrix

4.1 A single pattern

Consider two neurons. If the weight between them is positive, they will tend to drive each other in the same direction: this is clear from the two-neuron network in the example. It is also true in larger networks: suppose we have neuron i connected to neuron j with a weight of +1, then the contribution of neuron i to the weighted input sum of neuron j is positive if xi = 1 and negative if xi = -1. In other words, neuron i tries to drive neuron j to the same value as it currently has. If the connection weight between them is negative, however, neuron i will try to drive neuron j to the opposite value!

This inspires the choice of the Hebb rule: given a network of N nodes and faced with a pattern ⃗x = (x1, ..., xN) that we want to store in the network, we choose the values of the weight matrix as follows:

w_{ij} = x_i x_j \qquad (1)

To see how this works out in a network of four nodes, consider Fig. 4. Note that the weights are optimal for this example in the sense that if the pattern (1, -1, 1, 1) is present in the network, each node receives the maximal or minimal (in the case of neuron 2) input sum possible!

Figure 4: A four-node network to store pattern (1, -1, 1, 1). Because wij = wji we have represented the two weights by a single link.

We can actually prove this: suppose the pattern ⃗x, which has been used to train the network, is present in the network and we update neuron i; what will happen? The weighted input sum of a node is often denoted by h_i \equiv \sum_j w_{ij} x_j, which is called the local field. If wij was chosen to store this very same pattern, we can calculate hi:

h_i = \sum_{j \neq i} w_{ij} x_j = \sum_{j \neq i} x_i x_j x_j = x_i + x_i + x_i = 3 x_i \qquad (2)

This is true for all nodes i. So the other three nodes (self connections are forbidden!) in the network give a summed contribution of 3xi. This means that xi will never change value: the pattern is stable!

What happens if instead of ⃗x we have another pattern in the network, ⃗v = -⃗x? What happens now if we try to update it? Again for the local field hi we find:

h_i = \sum_{j \neq i} w_{ij} v_j = -\sum_{j \neq i} w_{ij} x_j = -\sum_{j \neq i} x_i x_j x_j = -3 x_i = 3 v_i \qquad (3)

Again, this pattern is maximally stable: the Hopfield network is 'sign-blind'.
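A quick numerical check of the four-node example and of equations (2) and (3); this is my own sketch using NumPy, with the zero diagonal (no self connections) made explicit.

```python
import numpy as np

x = np.array([1, -1, 1, 1], dtype=float)   # the pattern of Fig. 4

# Hebb rule (equation 1): w_ij = x_i * x_j, with self connections removed
W = np.outer(x, x)
np.fill_diagonal(W, 0.0)

h = W @ x                                  # local fields when x itself is in the network
print(h)                                   # [ 3. -3.  3.  3.]  i.e. h_i = 3 x_i, equation (2)
assert np.array_equal(h, 3 * x)

h_inv = W @ (-x)                           # the inverted pattern v = -x
print(h_inv)                               # [-3.  3. -3. -3.]  i.e. h_i = 3 v_i, equation (3)
assert np.array_equal(h_inv, 3 * (-x))     # the network is 'sign-blind'
```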
4.2 Energy

We can define an energy for each node:

E_i = -\frac{1}{2} h_i x_i \qquad (4)

Note that the energy is positive if the signs of hi and xi are different! An update of node i will result in a sign change in this case, because the local field hi has a different sign. The update will change the energy from a positive number to a negative number. This corresponds to the intuitive idea that stable states have a low energy.

We can define an energy for the entire network:

E(\vec{x}) = \sum_i E_i = -\sum_i \frac{1}{2} h_i x_i = -\frac{1}{2} \sum_{ij} w_{ij} x_i x_j \qquad (5)

Again, it is clear that for the pattern that has been used to train the weight matrix the energy is minimal. In this case:

E = -\frac{1}{2} \sum_{i \neq j} w_{ij} x_i x_j = -\frac{1}{2} \sum_{i \neq j} x_i x_j x_i x_j = -\frac{1}{2} \sum_{i \neq j} 1 = -\frac{1}{2} N(N-1) \qquad (6)

5 Stability of a single pattern

So far we have looked at the situation where the pattern in the network was the same one that was used to train the network. Now let us assume that another pattern is in the network. It is pattern ⃗y, which is the same as pattern ⃗x except for three nodes. The network, as usual, has N nodes. Now we are updating a node i of the network. What is the local field? It is given by:

h_i = \sum_j w_{ij} y_j = \sum_j x_i x_j y_j \qquad (7)

We can split this sum into 3 nodes, which have a value opposite to that of ⃗x, and N - 3 nodes which have the same value:

h_i = \sum_{j \,\mathrm{same}} x_i x_j x_j + \sum_{j \,\mathrm{flipped}} x_i x_j \cdot (-x_j) = (N-3) x_i - 3 x_i = (N-6) x_i

For N > 6 all the local fields point in the direction of xi, the pattern that was used to define the weight matrix! So updating will result in no change (if the value of node i is equal to xi) or an update (if the value of node i is equal to -xi). So the pattern that is stored in the network is stable: up to half of the nodes can be inverted and the network will still reproduce ⃗x.
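A small experiment illustrating the last point: flip three entries of a stored pattern and check that every local field still points towards the original pattern, and that the corrupted state has a higher energy. The pattern, the seed and N = 20 are arbitrary choices of mine.

```python
import numpy as np

def energy(W, s):
    """Network energy, equation (5): E = -1/2 * sum_ij w_ij s_i s_j."""
    return -0.5 * s @ W @ s

rng = np.random.default_rng(0)
N = 20
x = rng.choice([-1.0, 1.0], size=N)   # a single stored pattern

W = np.outer(x, x)                    # Hebb rule, equation (1)
np.fill_diagonal(W, 0.0)              # no self connections

y = x.copy()
y[:3] *= -1                           # invert three of the N nodes

h = W @ y                             # local fields, equation (7)
print(np.all(np.sign(h) == x))        # True: every local field points towards x (cf. (N - 6) x_i)
print(energy(W, x), energy(W, y))     # the stored pattern has strictly lower energy
```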
Exercise: To which pattern will the network converge if more than half the nodes are inverted?

There is also another way to look at this. The pattern that was used to train the network is an absolute energy minimum. This follows from equation 6. One can show the following important proposition, which is true for any symmetric weight matrix:

Proposition 1
The energy of a Hopfield network can only decrease or stay the same. If an update causes a neuron to change sign, then the energy will decrease; otherwise it will stay the same.
The statement that the energy of a Hopfield network can only decrease rests only on the condition that the weight matrix is symmetric. This holds for the Hebb rule and also for the generalised Hebb rule (see below). The undergraduates need not know this proof, but the postgraduates must be able to reproduce it. We will prove Proposition 1 in section A.
Since there is only one absolute minimum in a Hopfield network that has been trained with a single pattern (equation 6 is proof of that), and the minimum is reached when the training pattern (or its negative inverse, i.e. the same pattern multiplied by -1) is in the network, the network must converge to the trained pattern (or its negative inverse).
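Proposition 1 can be checked numerically: for any symmetric weight matrix with zero diagonal, the energy never increases under asynchronous updating. Below is a sketch of such a check with a randomly generated weight matrix (my own construction, not a trained network).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30

# an arbitrary symmetric weight matrix with zero diagonal (no training involved)
A = rng.normal(size=(N, N))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)

x = rng.choice([-1.0, 1.0], size=N)              # random initial state
energies = [-0.5 * x @ W @ x]

for _ in range(10):                              # ten asynchronous sweeps in random order
    for i in rng.permutation(N):
        x[i] = 1.0 if W[i] @ x >= 0 else -1.0    # asynchronous update of neuron i
        energies.append(-0.5 * x @ W @ x)

print(np.all(np.diff(energies) <= 1e-12))        # True: the energy never increases
```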
6 The Hebb rule and the generalised Hebb rule
Instead of setting the weights by equation 1 we usually use:
w_{ij} = \frac{1}{N} x_i x_j \qquad (8)
Exercise: Why does this not make a difference for the dynamics of the network?
How do we store more than one single pattern in the network? Let us assume we have two patterns ⃗x(1) and ⃗x(2) that we want to store. The trick is to calculate the weight matrix for pattern one, as if the other pattern does not exist:
w^{(1)}_{ij} = \frac{1}{N} x^{(1)}_i x^{(1)}_j

w^{(2)}_{ij} = \frac{1}{N} x^{(2)}_i x^{(2)}_j

and then to add them:

w_{ij} = w^{(1)}_{ij} + w^{(2)}_{ij}

For p patterns the procedure is the same, and this leads to the generalised Hebb rule:

w_{ij} = \frac{1}{N} \sum_{k=1}^{p} x^{(k)}_i x^{(k)}_j

where x^{(k)}_i is the value of node i in pattern k.
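In code the generalised Hebb rule is a single sum of outer products. The sketch below is my own (the function name hebb_weights is not from the notes); it also checks that, for p ≪ N, each stored pattern is a fixed point of the update rule, which is what section 8 argues should happen.

```python
import numpy as np

def hebb_weights(patterns):
    """Generalised Hebb rule: w_ij = (1/N) * sum_k x_i^(k) x_j^(k), with zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    p, N = patterns.shape
    W = patterns.T @ patterns / N          # sum of the outer products of the p patterns
    np.fill_diagonal(W, 0.0)               # no self connections
    return W

rng = np.random.default_rng(2)
p, N = 5, 200                              # p << N
patterns = rng.choice([-1.0, 1.0], size=(p, N))
W = hebb_weights(patterns)

for x in patterns:
    print(np.array_equal(np.sign(W @ x), x))   # expected True for every training pattern
```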
7 Energy and the generalised Hebb rule
The two weight matrices interfere with each other. The trained patterns are no longer absolute energy minima. In section 8, however, we will see that they are still local energy minima, if the network is large and the patterns do not resemble each other too closely (are uncorrelated). Here 'large' means p ≪ N, where p is the number of patterns used to train the network and N the number of nodes. This means that if you enter a pattern in the network and then let the network dynamics take over, you will still end up in an energy minimum. The generalised Hebb rule still satisfies the condition of Proposition 1, so the network dynamics guarantees that the network will converge towards an energy minimum and therefore to a steady final state. It is highly likely that such a minimum will correspond to one of the training patterns. It is not guaranteed, however.
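The retrieval behaviour described here can be demonstrated directly: corrupt one of the stored patterns, run asynchronous sweeps until nothing changes any more, and measure the overlap with each training pattern. The sizes, the seed and the 15% corruption level below are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(3)
p, N = 5, 200
patterns = rng.choice([-1.0, 1.0], size=(p, N))

W = patterns.T @ patterns / N                  # generalised Hebb rule
np.fill_diagonal(W, 0.0)

x = patterns[0].copy()
flipped = rng.choice(N, size=30, replace=False)
x[flipped] *= -1                               # corrupt 30 of the 200 nodes of pattern 1

for _ in range(20):                            # asynchronous sweeps until a fixed point
    previous = x.copy()
    for i in rng.permutation(N):
        x[i] = 1.0 if W[i] @ x >= 0 else -1.0
    if np.array_equal(x, previous):
        break

overlaps = patterns @ x / N                    # +1 means identical, 0 means uncorrelated
print(np.round(overlaps, 2))                   # expected: ~1.0 for pattern 1, ~0 for the rest
```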
7.1 Spurious minima
There are so-called spurious minima. These are patterns which are local minima, but do not correspond to a training pattern. If more than three patterns are stored in the network, one can show, for example, that asymmetric 3-mixtures are also local minima. What are those? Given three training patterns, one can create a fourth one: for each node, its value is defined by the majority of the values that the three training patterns have at that node. If the three patterns at node i have the values -1, -1, 1, respectively, the value of node i in the 3-mixture will be -1. For example, suppose that in a 10-node network there are three training patterns:

x^{(1)} = (1,-1,1,-1,1,-1,1,-1,1,-1) \qquad (9)
x^{(2)} = (1,-1,-1,-1,1,1,1,-1,-1,-1)
x^{(3)} = (1,1,1,1,1,-1,-1,-1,-1,-1)

Then the 3-mixture is:

(1,-1,1,-1,1,-1,1,-1,-1,-1)
The redeeming quality of Hopfield networks is that the so-called ’basin of attraction’ is much smaller for 3-mixtures than for training patterns. So, although one may sometimes end up in a mixture of training patterns, as the demonstration showed, one is much more likely to end up in a training pattern, provided p ≪ N and the training patterns are not too strongly correlated.
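The majority-vote construction of the 3-mixture is easy to write out; this short sketch of mine reproduces the 10-node example above (the sum of three ±1 values is never zero, so its sign is exactly the majority value).

```python
import numpy as np

x1 = np.array([1, -1,  1, -1, 1, -1,  1, -1,  1, -1])
x2 = np.array([1, -1, -1, -1, 1,  1,  1, -1, -1, -1])
x3 = np.array([1,  1,  1,  1, 1, -1, -1, -1, -1, -1])

# majority vote per node: the sum of three +/-1 values is never 0, so sign() gives the majority
mixture = np.sign(x1 + x2 + x3)
print(mixture)   # [ 1 -1  1 -1  1 -1  1 -1 -1 -1], the 3-mixture quoted in the text
```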
8 Estimating the Capacity of a Hopfield network
The stability of Hopfield networks can be investigated with the generalised training rule: consider a Hopfield network of N nodes that has been trained with p patterns. Now assume that the patterns are generated randomly, that the elements of individual patterns are uncorrelated, and that the elements of different training patterns are also uncorrelated. Tossing a coin to determine the individual elements of the training patterns would achieve exactly this.
Question: If you try to store an image in the network (black and white images), are the individual pixels uncorrelated?
8.1 The Random Walk
Set a drunk at position 0 at t = 0. Assume that he has a probability of 1/2 of moving left and also 1/2 of moving right. Assume that at each time step he makes a right-left decision. Now follow him for 100 steps. Individual paths are unpredictable, see Figure 5, taken from http://mahalanobis.twoday.net/stories/210704.
Figure 5: Individual random walks cannot be predicted.
Obviously, you cannot tell where he is after 100 steps, but you can calculate the probability of where he could be. This probability density is given by the binomial distribution. For a large number of steps, large meaning larger than 5, a Gaussian distribution is a good approximation. It is centred at 0 and has a σ = √N. This means that the probability is highest to find him close to the original position, and with 68% probability he will be in the range of [-10, 10] steps away. See also Figure 6 (taken from http://zoonek2.free.fr/UNIX/48_R/07.html). In general, for N steps the range is [-√N, √N].
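These statements are easy to verify by simulation. A small sketch (the number of walkers is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(4)
n_steps, n_walkers = 100, 100_000

steps = rng.choice([-1, 1], size=(n_walkers, n_steps))   # left/right with probability 1/2 each
endpoints = steps.sum(axis=1)                            # position after 100 steps

print(endpoints.mean())                  # close to 0
print(endpoints.std())                   # close to sqrt(100) = 10
print(np.mean(np.abs(endpoints) <= 10))  # close to the one-sigma probability (68% for a Gaussian;
                                         # slightly higher here because the walk is discrete)
```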
8.2 The Dynamics of Patterns Close to a Training Pattern
If we have p training patterns ⃗x(μ), μ = 1, ..., p, where we have determined the elements of each vector, we can determine the weight matrix. It is

w_{ij} = \frac{1}{N} \sum_{k=1}^{p} x^{(k)}_i x^{(k)}_j \qquad (10)
Now assume that we take one of these training patterns. As argued above, we can assume that we have generated these patterns by tossing a coin, so they are random. There is therefore no harm in selecting training pattern 1. Assume that this is the pattern which we enter into a Hopfield network whose weight matrix is given by Equation 10. We will calculate the local field at an arbitrary node in the network. Assume that the current pattern in the network is ⃗x. We know that ⃗x is equal to ⃗x(1). Now assume that we investigate the local field at node i.
h_i = \frac{1}{N} \sum_j \sum_{k=1}^{p} x^{(k)}_i x^{(k)}_j x^{(1)}_j \qquad (11)

by virtue of the generalised Hebb rule. This looks complex, but we can separate the sum over the training patterns into two parts. This is equal to

h_i = \frac{1}{N} \sum_j x^{(1)}_i x^{(1)}_j x^{(1)}_j + \frac{1}{N} \sum_j \sum_{k=2}^{p} x^{(k)}_i x^{(k)}_j x^{(1)}_j \qquad (12)
This looks hopelessly complicated, but is actually quite simple. First, observe that the second sum is a random walk of (p - 1)N steps.
Question: Why?
This means that we cannot calculate it, but we can estimate that with a probability of 66% it is less than √((p - 1)N). This follows from the following observations:
1. For a random sum S_n of n random ±1 terms one can calculate directly that

E(S_n) = 0, \qquad E(S_n^2) = n,

which hints that the number of steps scales with √n. In fact, one can show that E(|S_n|) also grows proportionally to √n.

2. In general, for a distribution with mean 0 and standard deviation σ one can show that:

\sqrt{E(S_n^2)} = \sigma \sqrt{n}

3. In general, the variance of a quantity x is defined as:

\sigma^2 = E[x^2] - E[x]^2

4. A Gaussian is defined as the following distribution:

N(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}

5. For a Gaussian, the part of the curve which contains 66% of the probability density is given by the interval [\mu - \sigma, \mu + \sigma].
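The claim that the second sum behaves like a random walk of (p − 1)N steps can be checked numerically: generate random patterns, compute the crosstalk term of equation (12) without the 1/N prefactor (as in the estimate above), and compare its spread with √((p − 1)N). The sizes and the seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(5)
p, N = 10, 500
patterns = rng.choice([-1.0, 1.0], size=(p, N))
x1 = patterns[0]

# Crosstalk at node i: sum over j != i and over the p - 1 other patterns of
# x_i^(k) x_j^(k) x_j^(1), i.e. the second sum of equation (12) without the 1/N factor.
crosstalk = np.empty(N)
for i in range(N):
    total = 0.0
    for k in range(1, p):
        total += patterns[k, i] * (patterns[k] @ x1 - patterns[k, i] * x1[i])  # drop the j = i term
    crosstalk[i] = total

print(crosstalk.std())           # empirically close to the random-walk estimate below
print(np.sqrt((p - 1) * N))      # sqrt(4500), about 67
```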

The important fact to remember is that the sum of N random terms scales with √N; we will typically ignore the scaling factor.

Question: What happens in the 33% of the cases that the sum is larger than √(pN)? What if this probability is not acceptable (what does this mean)? What can you change?

The first sum is even simpler:

\frac{1}{N} \sum_j x^{(1)}_i x^{(1)}_j x^{(1)}_j = \frac{1}{N} \sum_j x^{(1)}_i = x^{(1)}_i

One can therefore write the local field as:

h_i = x^{(1)}_i + R

where x^{(1)}_i = \pm 1 and R is probably smaller than

R < \sqrt{(p-1)N} \approx \sqrt{pN} \qquad (13)

In other words, the local field at node i probably points in the same direction as element x^{(1)}_i! This again proves that a training pattern is stable, even if multiple training patterns have been used. R can be interpreted as a noise term, which represents interference in the Hopfield network due to the other training patterns. This calculation explicitly shows that this influence is limited as long as the number of training patterns is smaller than the number of nodes.

Figure 6: The distribution of end points of a random walk can be estimated with a Gaussian distribution.

Now consider the situation where we had not entered training pattern 1 into the network, but a pattern that is different at k random positions. By the same reasoning