
Outline:
- Overview
- Use of the Hopfield network
- Non-examinable
- Summary
COMP5400M Bio-Inspired Computing
Dr. Marc de Kamps
Lecture Hopfield (2)


Reminder of the last lecture
Yesterday we looked at how to implement the Hopfield model:
- A discrete-time, recurrent neural network with N neurons.
- No explicit inputs/outputs.
- Each neuron can be in one of two states, +1 or −1.
- The activity pattern for the whole network can be represented by a vector x,

      x = (x_1, x_2, . . . , x_N),

  where each x_i ∈ {−1, +1}.
- Connections are given by a weight matrix w_ij:
  - Symmetric: w_ij = w_ji.
  - No self connections: w_ii = 0.
- Converges to a stable activity pattern x under asynchronous updates.
- Convergence can be proven using the network energy E.
- Initialise the weights in terms of p storage patterns x^(α):

      w_ij = (1/N) Σ_{α=1}^{p} x_i^(α) x_j^(α)   for i ≠ j,
      w_ij = 0                                   for i = j.

- This is known as the (generalised) Hebb rule.
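As a concrete reminder, here is a minimal NumPy sketch of the Hebb rule and of asynchronous updating (the helper names hebb_weights and run_async are my own, not from the lecture):

```python
import numpy as np

def hebb_weights(patterns):
    """Hebb rule: W = (1/N) * sum over patterns of outer products, zero diagonal."""
    X = np.asarray(patterns, dtype=float)   # shape (p, N), entries are +1/-1
    N = X.shape[1]
    W = X.T @ X / N
    np.fill_diagonal(W, 0.0)                # no self connections
    return W

def run_async(W, x, rng, sweeps=20):
    """Asynchronous updates: visit the neurons one at a time in random order."""
    x = x.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(x)):
            x[i] = 1 if W[i] @ x >= 0 else -1
    return x
```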
Overview of today’s lecture
At the end of today’s lecture, you should be able to:
- Appreciate how Hopfield networks can be used for storage:
  - Content-addressable memory.
  - Auto-association.
- Understand how spurious minima can reduce reliability.
- Estimate the capacity of Hopfield networks.
- (Non-examinable) See how deep learning relates to the models in this module.
I will also ask you to fill out the module feedback form. This is the last lecture of material for this course.
Hopfield networks as storage systems
Content-Addressable Memory (CAM)
Hopfield networks can be used as a system for Content-Addressable Memory (CAM):
- The p training patterns are stored in the weight matrix.
- An activity pattern x is applied.
- The activity converges to a steady state under asynchronous updating.
- This steady state should be one of the training patterns x^(α).

Therefore the training patterns can be recovered from noisy data.
- Can recover patterns from real-world measurements.
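A small end-to-end illustration (a sketch; the network size, noise level, and seed are my own choices): store a few random patterns, corrupt one of them, and let the network clean it up. With p ≪ N and only 10% of bits flipped, the original pattern is almost always recovered exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 5
patterns = rng.choice([-1, 1], size=(p, N))      # p random +1/-1 training patterns

W = patterns.T @ patterns / N                    # Hebb rule
np.fill_diagonal(W, 0.0)                         # no self connections

x = patterns[0].copy()                           # take one stored pattern...
flip = rng.choice(N, size=N // 10, replace=False)
x[flip] *= -1                                    # ...and flip 10% of its bits (a noisy 'key')

changed = True
while changed:                                   # asynchronous updates until a fixed point
    changed = False
    for i in rng.permutation(N):
        new = 1 if W[i] @ x >= 0 else -1
        if new != x[i]:
            x[i], changed = new, True

print("original pattern recovered:", np.array_equal(x, patterns[0]))
```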
Auto-association
Hopfield networks can also be used for auto-association:
- The training patterns are divided into two parts, the cue and the association.
  - i.e. some neurons are 'cue', some are 'association', and the remainder (if any) are neither.
- If the cue is entered into the network and the activity stabilised, the entire training pattern, including the association, is retrieved.
- In this way, the network restores the association that belongs to a given cue.
Enter a pattern in the blue neurons, let the pattern stabilise, then read out the yellow neurons.

Figure 2: A Hopfield network as an autoassociator. One enters a pattern in the blue nodes and lets the network evolve. After a while one reads out the yellow nodes. The memory of the Hopfield network associates the yellow node pattern with the blue node pattern.
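In code, the same idea looks as follows (a sketch only; the split into 20 'cue' and 40 'association' neurons, and the other parameters, are illustrative assumptions). With p ≪ N the full stored pattern, including the association part, is usually retrieved.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_cue = 60, 20                               # neurons 0..19 act as the cue, the rest as the association
patterns = rng.choice([-1, 1], size=(3, N))     # stored cue+association patterns

W = patterns.T @ patterns / N                   # Hebb rule
np.fill_diagonal(W, 0.0)

x = rng.choice([-1, 1], size=N)                 # start from noise...
x[:n_cue] = patterns[0, :n_cue]                 # ...but enter the cue of pattern 0

for _ in range(50 * N):                         # asynchronous updates
    i = rng.integers(N)
    x[i] = 1 if W[i] @ x >= 0 else -1

print("association recovered:",
      np.array_equal(x[n_cue:], patterns[0, n_cue:]))
```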
Spurious minima
Consider a network with a single storage pattern x^(1):

    w_ij = x_i^(1) x_j^(1)   for i ≠ j,      w_ij = 0   for i = j.

This pattern is stable, since

    Σ_{j} w_ij x_j^(1) ∝ x_i^(1)

with a positive constant of proportionality¹. Since the right-hand side, which is the input to each neuron, has the same sign as x_i^(1) for each element, it is stable.

¹ The constant is Σ_{j=1, j≠i}^{N} (x_j^(1))² = N − 1 (since (x_j^(1))² = 1).
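A quick numerical check of this statement (a sketch; N = 50 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x1 = rng.choice([-1, 1], size=N).astype(float)

W = np.outer(x1, x1)                   # single-pattern weights w_ij = x_i x_j
np.fill_diagonal(W, 0.0)               # w_ii = 0

print(np.allclose(W @ x1, (N - 1) * x1))   # True: the input to each neuron is (N-1) * x_i
```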
Multiple training patterns
With multiple training patterns, i.e. p > 1, two related questions arise:
1. Do patterns always converge to one of the training patterns?
2. How many training patterns can be stored in the network?

The answer to the first question is: 'not always.'

Consider an N = 10-neuron network with p = 3 training patterns:

    x^(1) = (1,−1,1,−1,1,−1,1,−1,1,−1)
    x^(2) = (1,−1,−1,−1,1,1,1,−1,−1,−1)
    x^(3) = (1,1,1,1,1,−1,−1,−1,−1,−1)
Now consider the 3-mixture:
x = (1,−1,1,−1,1,−1,1,−1,−1,−1)
This is constructed by taking the majority values for each neuron, i.e. 1 when 2 or 3 values in the training patterns are 1, and −1 when 2 or 3 patterns are −1.
It can be shown that this 4th pattern is also stable. This is known as a spurious minimum; 'minimum' because it is a local minimum of the energy E.
- Patterns 'close' to x will converge to x, not to one of the training patterns x^(α).
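To make the construction explicit, the 3-mixture can be formed by an element-wise majority vote; a short sketch (since p = 3 is odd, the sign of the sum gives the majority):

```python
import numpy as np

x1 = np.array([1, -1,  1, -1,  1, -1,  1, -1,  1, -1])
x2 = np.array([1, -1, -1, -1,  1,  1,  1, -1, -1, -1])
x3 = np.array([1,  1,  1,  1,  1, -1, -1, -1, -1, -1])

mixture = np.sign(x1 + x2 + x3)   # element-wise majority vote
print(mixture)                    # [ 1 -1  1 -1  1 -1  1 -1 -1 -1]
```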
Basins of attraction
In general, the range of patterns that converge to the same final state x corresponds to a region in the space of all possible patterns.
- This region is known as its basin of attraction.

The redeeming quality of Hopfield networks is that the basin of attraction for 3-mixtures is much smaller than for the training patterns.

Therefore, starting from an arbitrary pattern it is much more likely to converge to one of the training patterns than to a spurious minimum, such as this 3-mixture (the sketch below lets you check this empirically).
- This assumes p ≪ N and that the training patterns are not too strongly correlated.
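A rough empirical check (a sketch; the sizes, number of trials, and the way final states are classified are my own choices): start from many random states, run the dynamics to a fixed point, and count how often the result is one of the stored patterns, or its inverse, which is an equally stable state. Following the lecture's argument, for p ≪ N and uncorrelated patterns the stored patterns should capture the large majority of starts.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, trials = 200, 3, 200
patterns = rng.choice([-1, 1], size=(p, N))      # p random training patterns

W = patterns.T @ patterns / N                    # Hebb rule
np.fill_diagonal(W, 0.0)

hits = 0
for _ in range(trials):
    x = rng.choice([-1, 1], size=N)              # an arbitrary starting pattern
    changed = True
    while changed:                               # asynchronous updates to a fixed point
        changed = False
        for i in rng.permutation(N):
            new = 1 if W[i] @ x >= 0 else -1
            if new != x[i]:
                x[i], changed = new, True
    # Does the final state equal a stored pattern (up to an overall sign flip)?
    if any(abs(patterns[a] @ x) == N for a in range(p)):
        hits += 1

print(f"{hits}/{trials} random starts ended on a stored pattern (up to sign)")
```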
Capacity of Hopfield networks
How many patterns can we store and expect to recover? In other words, what is its capacity?
It is possible to estimate the capacity for training patterns that are random and uncorrelated.
- Uncorrelated means e.g. that x_i^(1) is not more or less likely to be +1 just because x_i^(2) is.
- We also assume the elements within a training vector are uncorrelated, i.e. x_i^(α) is uncorrelated with all x_j^(α), j ≠ i.

Note that real-world data are often correlated. For instance, if you tried to store a black-and-white image in a Hopfield network, the individual pixels would be correlated.
Random walks
Suppose a drunk is at a location x = 0 at time t = 0. Each time step, the drunk is just as likely to move forwards as backwards.
[Figure: example random-walk trajectories; horizontal axis: time, vertical axis: location. Source: https://people.duke.edu/~rnau/411rand.htm]
Note this is stochastic, i.e. each realisation of a random walk will be different.
[Figure: a contrasting example trajectory, titled 'Not a Random Walk'.]
- In code, each walk would correspond to a different sequence of numbers generated by the pseudo-random number generator.
- A classic and well-studied random process.
- Used to describe e.g. diffusion of molecules in the air.
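A quick simulation (a sketch; the numbers of walkers and steps are arbitrary choices) makes the spread discussed next easy to check empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
n_walkers, n_steps = 10_000, 400

steps = rng.choice([-1, 1], size=(n_walkers, n_steps))   # each step is +1 or -1 with equal probability
positions = np.cumsum(steps, axis=1)                     # location of every walker at every time step

for t in (100, 400):
    print(f"t = {t}: spread of walkers = {positions[:, t - 1].std():.1f}, "
          f"sqrt(t) = {np.sqrt(t):.1f}")
```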
Since random walks are stochastic, the best we can hope for is to derive the probability distribution P(x, t).
- P(x, t) is the probability that the walker is at location x after t steps, starting from x = 0 at t = 0.

Although the exact form of P(x, t) is known¹, it is usually more convenient to approximate it with a Normal distribution with zero mean and standard deviation σ(t):
- σ(t) increases as the square root of time, i.e. σ(t) ≈ √t.
- Around 66% of all walkers are in the range −σ(t) < x < σ(t).
- There is a very small, but non-zero, chance that the walker is as far out as ±t.

¹ It is the binomial distribution B(t, 1/2).

Network capacity

Suppose we have an N-neuron network and p randomised, uncorrelated training vectors x^(α), α = 1 . . . p. The weights are

    w_ij = (1/N) Σ_{α=1}^{p} x_i^(α) x_j^(α)

For simplicity, we do not require the diagonal elements to be zero².

Take one training pattern. Since they have been randomised, we may as well take x^(1). As usual, stability can be inferred from the signs of the elements of the vector W x^(1).

² It is not difficult to consider w_ii = 0, but the basic result is the same.

Substitute the full expression for w_ij,

    (W x^(1))_i = Σ_{j=1}^{N} w_ij x_j^(1) = (1/N) Σ_{j=1}^{N} Σ_{α=1}^{p} x_i^(α) x_j^(α) x_j^(1)

Separate the α sum into the α = 1 and α > 1 parts,

    (W x^(1))_i = (1/N) Σ_{j=1}^{N} x_i^(1) x_j^(1) x_j^(1) + (1/N) Σ_{j=1}^{N} Σ_{α=2}^{p} x_i^(α) x_j^(α) x_j^(1)

The first term on the right-hand side is easy to calculate:

    (1/N) Σ_{j=1}^{N} x_i^(1) x_j^(1) x_j^(1) = x_i^(1) (1/N) Σ_{j=1}^{N} x_j^(1) x_j^(1) = x_i^(1),

observing that x_j^(1) x_j^(1) = 1 always.
Consider the second term. Under our assumptions of uncorrelated x^(α), each factor x_i^(α) x_j^(α) x_j^(1) is a random variable taking the values {−1, +1} with equal chance.
- It is a random walk as just described.

The total number of terms (steps) in this double sum is (p − 1)N, therefore its standard deviation σ is √((p − 1)N).
- There is around a 66% chance that it lies in the range −√((p − 1)N) to √((p − 1)N).

With the factor 1/N, the whole term is Normally distributed with σ ≈ √(p/N) (assuming p is large, so p − 1 ≈ p).
- Denote this term R.
We now have

    (W x^(1))_i = x_i^(1) + R,

where R has mean zero and width √(p/N).

Therefore, if p ≪ N, the magnitude of R will probably be less than the magnitude of x_i^(1) (which is 1).
- The sign of x_i^(1) + R is then the same as the sign of x_i^(1).
- The pattern is stable.

We can say that reliable storage of patterns is possible if p ≪ N.
- This is the capacity of an N-neuron Hopfield network.

Recall that this assumes uncorrelated training vectors.
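This argument can be checked numerically; a sketch (N and the values of p are arbitrary choices, and the diagonal is kept non-zero exactly as in the derivation above). The fraction of stored elements whose sign is already correct drops as p/N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500

for p in (10, 50, 100, 250):
    X = rng.choice([-1, 1], size=(p, N)).astype(float)
    W = X.T @ X / N                       # weights as above (diagonal not zeroed)
    H = X @ W                             # row alpha is W x^(alpha), using W = W^T
    stable = np.mean(np.sign(H) == X)     # fraction of elements with the correct sign
    print(f"p/N = {p/N:.2f}: {stable:.3f} of stored elements are stable")
```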
The path to deep learning
To finish the module, it might be interesting to see how what we have done connects with the current 'hot topic' in machine learning, usually called deep learning.
None of the material on these final few slides will appear in the exam.
Recall from earlier in the course the Multi-Layer Perceptron, or MLP. This had many early successes:
- NETTalk, the MNIST data set for hand-written character recognition, etc.
- It was less successful in object recognition.
Failure of early MLPs
Complex applications of MLPs suffer from:
- Sub-optimal local minima.
- Weak gradients (i.e. slow learning).

During the 1990s, so-called kernel-based methods (e.g. support vector machines) all but replaced neural networks.

This changed in 2006, when neural networks were used to reduce the dimensionality of data sets by means of a small hidden layer¹.
- The authors devised a way to accelerate training by choosing initial weights close to their trained values.

¹ Hinton and Salakhutdinov, Science 313, 504 (2006).
Since then a variety of training methods have been developed.
These are generally referred to as deep learning: not really a single common approach, but a set of approaches with commonalities.

For instance, Boltzmann machines have a 'temperature' T.
- For T = 0 the dynamics converge to an energy minimum, much like Hopfield networks.
- The dynamics become stochastic when T > 0, visiting all states/patterns but dwelling for longer in low-energy states.
- Techniques from statistical mechanics can be used to accelerate learning; in particular, the Boltzmann distribution of energies is known.
- Trained by gradient-based maximisation of a likelihood function.
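None of this is examinable, but to make the role of T concrete, here is a rough sketch of a temperature-dependent (Glauber-style) update rule on the Hopfield energy. This is a generic illustration, not the training procedure of any particular paper; the function name and parameters are my own.

```python
import numpy as np

def stochastic_update(W, x, T, rng, steps=10_000):
    """At T = 0 this reduces to the deterministic Hopfield update; at T > 0 the
    network sometimes moves uphill in energy, so it can escape local minima."""
    x = x.copy()
    for _ in range(steps):
        i = rng.integers(len(x))
        h = W[i] @ x                                      # local field on neuron i
        if T == 0:
            x[i] = 1 if h >= 0 else -1
        else:
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * h / T))   # Boltzmann probability of state +1
            x[i] = 1 if rng.random() < p_plus else -1
    return x
```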
Deep learning may involve:
- Novel training methods, e.g. contrastive divergence.
- Efficient use of general-purpose graphics processing units (GPUs), which are cost-effective and highly parallel; this reduces training times from e.g. weeks to days.
- Application to deep neural networks, i.e. MLPs with many hidden layers.

Software for deep learning includes:
- TensorFlow, often used through the Keras interface, which builds computation graphs (symbolic expressions) that can be evaluated efficiently on GPUs.
- Other frameworks include PyTorch and Caffe.
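Purely as an illustration (a minimal example of my own, not part of the module material), an MLP with several hidden layers can be written in a few lines of Keras:

```python
import tensorflow as tf

# A small multi-layer perceptron for flattened 28x28 grey-scale images
# (e.g. the MNIST digits mentioned earlier), with 10 output classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)   # training data not shown here
```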
Summary
Today we have:
- Seen how Hopfield networks can be used for storage, either as content-addressable memory or for auto-association.
- Seen that spurious minima can result in stable patterns that were not stored.
- Estimated the capacity of the network, i.e. how many patterns it can store and expect to retrieve.
- Found that p patterns can be stored in N neurons if p ≪ N.