What is a Neural Network?
A Neural Network (NN) is a network of connected elements called neurons:
for example, as found in a (biological) brain.
What is a Brain?
A highly complex, non-linear and parallel information processing system (a computer)
It is composed of lots of neurons:
~ 85 billion (8.5×10^10) for humans
~ 1 billion (1×10^9) for cats
~ 1 million (1×10^6) for bees or cockroaches
● What is a Neural Network?
– biological neural networks
– artificial neural networks
● general principles and terminology
● why study artificial neural networks?
● Types of artificial neural network
– Linear Threshold Units (Perceptrons)
● Delta Learning Algorithm
– Competitive Learning Networks
– Negative Feedback Networks
– Autoencoder Networks
What is a Neuron?
Like all cells, all neurons are small, complex, biological machines.
What is a Neuron?
Neurons consist of a number of different parts:
● A cell body (or ‘soma’), which contains the nucleus
● Many dendrites
● Many synapses
What is a Neuron?
A neuron is a type of cell.
Neurons come in many different types.
What is a Synapse?
Each synapse is also a small, complex, biological machine.
● Activity in the pre-synaptic neuron causes the release of neurotransmitters from synaptic vesicles.
● These neurotransmitters diffuse across the gap to receptors on the post-synaptic neuron and cause activity there.
● ~100 different neurotransmitters, e.g., dopamine, serotonin, and acetylcholine.
● A synapse converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal.
Information flow through a neuron
The dendrites transmit input signals from the synapses to the soma.
The soma integrates inputs and, if the total is sufficiently strong, the soma emits an action potential (it ‘fires’ or ‘spikes’).
The axon transmits the action potential to the dendrites of other neurons, via the synapses.
What is a Synapse?
Synapses are the connections between the axon of one neuron and the dendrites of another neuron.
pre-synaptic neuron
post-synaptic neuron
What is an Action Potential?
There is a threshold for spiking.
Reaching spiking threshold requires either:
● repetitive stimulation of the same synapses (temporal summation)
● simultaneous stimulation of a large number of synapses (spatial summation)
● or (most typically) a combination of both
Synaptic influences on spiking
The arrival of an impulse at a synapse may have the opposite effect, i.e., it may render the post-synaptic neuron less activated by other stimuli (inhibition).
There are thus two types of synapse:
● Excitatory
– tend to cause spiking in the postsynaptic neuron
– e.g. glutamate
● Inhibitory
– tend to prevent spiking in the postsynaptic neuron
– e.g. GABA
What is an Action Potential?
An electrical impulse.
A neuron is an electronic device with a voltage, or potential. The dendrites and axons act like (highly nonlinear) wires.
rapid spikes riding on top of a more slowly varying subthreshold potential
only the action potentials get transmitted along the axon
What is a biological Neural Network?
● A network of neurons interconnected through synapses
● Each neuron makes about 10^3 to 10^4 connections, hence, networks are very large:
~ 10^14 synapses in a human brain
~ 10^12 synapses in a cat brain
~ 10^9 synapses in a bee brain
● Neurons communicate via synapses
● Neurons perform computations on their synaptic inputs and if certain conditions are met (the potential exceeds a threshold) they produce an output
● Neurons can adapt, which will change the behaviour of the network
What is a Neural Network: Artificial NNs
Synaptic influences on spiking
Different synapses may have stronger or weaker effects on the postsynaptic neuron
– synapses have different strengths (or ‘weights’)
– synapses may change their strength
Synaptic plasticity
● Plasticity permits the nervous system to adapt to its
environment: learning, memory, new computations
– Two mechanisms:
● creation of new synaptic connections between neurons
● modification of existing synapses
What is an artificial Neural Network?
A network of simple processing units which communicate by sending signals to each other over weighted connections.
processing units –
analogous to neurons
weighted connections –
analogous to synapses
A hugely simplified model of a biological neural network
What is an artificial Neural Network?
An ANN is a parallel computational system consisting of many simple processing elements connected together in a specific way in order to perform a particular task.
Also known as:
● parallel distributed processing (PDP), or
● connectionist models
Why study artificial neural networks?
From a biological perspective:
● To build models of brain function by simulation.
➔ to understand human intelligence
➔ to understand brain dysfunction
● To mimic certain cognitive capabilities of humans/animals.
➔ to build artificial intelligence
➔ e.g. the brain can perform certain pattern recognition tasks much faster and more accurately than any conventional computer
Why study artificial neural networks?
From a practical perspective:
● ANNs are extremely powerful computational devices
● Turing equivalent, universal computers.
● Any continuous function from input to output can be implemented in a three-layer ANN
● universal function approximators
• i.e. y=f(x) for any continuous function f can be approximated arbitrarily well using an ANN
● universal classifiers
• recall classification can be defined as trying to approximate ω=f(x)
What is an artificial Neural Network?
In diagrams:
● processing units usually shown as circles
● weighted connections usually shown as arrows
Why study artificial neural networks?
Lots of variables
processing units: how many? what computation should they perform?
weighted connections: which nodes are connected? what weight values to use? adaptable, if so how?
Processing units: layers
● Typically, processing units are arranged into populations or layers.
● Layers can be:
– visible – receive inputs from, or send outputs to, the
external environment
– hidden – only receive inputs and send output to other
processing units
Why study artificial neural networks?
The catch:
● In principle, ANNs can perform any calculation.
● In practice, it is problematic to find the neural network architecture (and set its parameters) to solve a particular problem.
Processing units: layers
Input layers typically are linear – they don’t perform any processing
● Hence, often ignored
➔ In which case the network below would be described as 2-layer, rather than 3-layer
Some people count the layers of connections (not layers of processing units)
➔ In which case the network below would be described as 2-layer, rather than 3-layer
Processing units: function
● Each unit receives a vector of inputs, x (from other units or the external environment).
● Each input is associated with a weight, w, which determines the strength of influence of each input.
● Each w can be either +ve (excitatory) or -ve (inhibitory).
Processing units: layers
Visible layers can be sub-divided into:
● input layers: receive signals from the environment
– for classification, this is the feature vector which is to be classified
● output layers: which send signals to the environment
– for classification, this is the predicted class label associated
with the feature vector
– typically, “1-hot” encoded
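The “1-hot” encoding of a class label can be illustrated with a short Python sketch. The helper names `one_hot` and `decode` are illustrative and not from the lecture, and class labels are assumed to be integers starting at 0.

```python
def one_hot(label, num_classes):
    """Return a list with a 1 at index `label` and 0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1 if i == label else 0 for i in range(num_classes)]


def decode(outputs):
    """Recover the class index as the position of the largest output."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


print(one_hot(2, 4))         # [0, 0, 1, 0]
print(decode([0, 0, 1, 0]))  # 2
```

With this encoding an output layer needs one unit per class, and the predicted class is read off as the most active unit.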
Processing units: function
● Output, y, is computed as simple function of x and w
● This response function often split into two component parts:
– A transfer function that determines how the inputs are integrated.
– An activation function that determines the output the neuron produces.
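The split of the response function into two parts can be sketched in Python. This is a minimal sketch, assuming a weighted-sum transfer function and a step activation that outputs 1 only when its argument is strictly positive. The names `transfer`, `heaviside` and `unit_output`, and the example weights, are illustrative and not from the lecture.

```python
def transfer(w, x):
    """Transfer function: integrate the inputs as a weighted sum."""
    return sum(wi * xi for wi, xi in zip(w, x))


def heaviside(z):
    """Activation function: step output (assumed to be 1 only when z > 0)."""
    return 1 if z > 0 else 0


def unit_output(w, x, theta=0.0):
    """Response of one unit: activation applied to (transfer value - threshold)."""
    return heaviside(transfer(w, x) - theta)


# One excitatory (+ve) and one inhibitory (-ve) weight; values chosen arbitrarily.
w = [1.0, -0.5]
print(transfer(w, [1, 1]))            # 0.5
print(unit_output(w, [1, 1], 0.25))   # 1
print(unit_output(w, [0, 1], 0.25))   # 0
```

Other choices of transfer or activation function can be swapped in without changing the overall structure of the unit.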
Connection weights can be defined by:
● Setting weights explicitly using prior knowledge.
● Optimising connectivity to achieve some objective (e.g. using a genetic algorithm, or gradient descent).
● Training the network by feeding it training data and allowing it to adapt the connection weights.
Training can be:
– supervised
● e.g. Delta Learning Rule (see later)
– or unsupervised
● e.g. Hebbian learning rule (see later)
Processing units: function
● Typically all processing units, in each layer, perform the same computation.
● Computation is typically simple.
Specific artificial neural networks
● Linear Threshold Unit (or Perceptron)
● Competitive Learning Networks
● Inhibitory Feedback Networks
● Autoencoder Networks
● Radial Basis Function Networks
● Convolutional Neural Networks
● Generative Adversarial Networks
● Restricted Boltzmann Machines
● Hopfield Networks
● Kohonen Networks
● Capsule Networks
● there are lots more
future weeks
Linear Threshold Units
A generic artificial neural network
● An environment in which the system operates (inputs to the network, outputs from the network).
● A set of processing units (‘neurons’, ‘cells’, ‘nodes’).
● A set of weighted connections between units, wji, which
determines the influence of unit i on unit j.
● A transfer function that determines how the inputs to a unit are integrated.
● An activation function that determines the output the neuron produces.
● An activation state, yj, for every unit (‘response’, ‘output’).
● A method for setting/changing the connection weights.
Linear Threshold Unit
● Also known as a Perceptron
● (restricted case, where weights and activations are binary, known as Threshold Logic Unit, or McCulloch-Pitts neuron)
Linear Threshold Unit
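$y_j = H\left(\sum_i w_{ji} x_i - \theta_j\right)$
● transfer function: the weighted sum $\sum_i w_{ji} x_i$
● activation function: the Heaviside step function $H$, applied after subtracting the threshold $\theta_j$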
Linear Threshold Unit
$y = H\left(\sum_i w_i x_i - \theta\right)$
Vector notation (w row, x column): $y = H(\mathbf{w}\mathbf{x} - \theta)$
Augmented vector notation: $y = H(\mathbf{w}\mathbf{x})$, with $\mathbf{w} = [-\theta, w_1, \ldots, w_n]$ and $\mathbf{x} = [1, x_1, \ldots, x_n]^T$
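The equivalence of the threshold form and the augmented form can be checked with a short Python sketch. It assumes the usual augmentation, in which a constant 1 is prepended to x and −θ is prepended to w, and a step function that outputs 1 only for strictly positive arguments. The function names are illustrative; the example weights (w = [1, 1], θ = 1.5) are those of the AND unit in these notes.

```python
def heaviside(z):
    """Step function, assumed to return 1 only when z > 0."""
    return 1 if z > 0 else 0


def ltu(w, x, theta):
    """Threshold form: y = H(sum_i w_i x_i - theta)."""
    return heaviside(sum(wi * xi for wi, xi in zip(w, x)) - theta)


def ltu_augmented(w_aug, x):
    """Augmented form: y = H(w_aug . [1, x]), where w_aug = [-theta, w_1, ..., w_n]."""
    x_aug = [1] + list(x)
    return heaviside(sum(wi * xi for wi, xi in zip(w_aug, x_aug)))


w, theta = [1, 1], 1.5
w_aug = [-theta] + w  # fold the threshold into the weight vector

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, ltu(w, x, theta), ltu_augmented(w_aug, x))
```

Both forms give the same output for every input; the augmented form is convenient because the threshold is then learnt like any other weight.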
Linear Threshold Unit
Simplified notation: (figure not reproduced)
Linear Threshold Unit
Logical NOT: w = −1, θ = −0.5
$y = H\left(\sum_i w_i x_i - \theta\right)$
x1 | Transfer fn, Σwixi | Activation fn, y
---|--------------------|------------------
0  | (-1×0)=0           | H(0-(-0.5))=1
1  | (-1×1)=-1          | H(-1-(-0.5))=0
It is possible to build any logical circuit out of AND and NOT gates, therefore, it is possible to build any computational device using a network of linear threshold units.
Linear Threshold Unit
Logical OR: θ = 0.5
$y = H\left(\sum_i w_i x_i - \theta\right)$
x1 | x2 | Transfer fn, Σwixi | y
Linear Threshold Unit
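Logical AND: w1 = w2 = 1, θ = 1.5
$y = H\left(\sum_i w_i x_i - \theta\right)$
x1 | x2 | Transfer fn, Σwixi | Activation fn, y
---|----|--------------------|------------------
0  | 0  | (1×0)+(1×0)=0      | H(0-1.5)=0
0  | 1  | (1×0)+(1×1)=1      | H(1-1.5)=0
1  | 0  | (1×1)+(1×0)=1      | H(1-1.5)=0
1  | 1  | (1×1)+(1×1)=2      | H(2-1.5)=1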
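The gate constructions can be checked with a small Python sketch. The NOT and AND parameters are taken from the tables in these notes. For OR only the threshold 0.5 is given, so the weights [1, 1] used here are an assumption. The function names are illustrative.

```python
from itertools import product


def ltu(w, theta, x):
    """Linear threshold unit: 1 if sum_i w_i x_i - theta > 0, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z - theta > 0 else 0


def logical_not(x1):
    return ltu([-1], -0.5, [x1])      # w = -1, theta = -0.5 (from the notes)


def logical_and(x1, x2):
    return ltu([1, 1], 1.5, [x1, x2])  # w = [1, 1], theta = 1.5 (from the notes)


def logical_or(x1, x2):
    return ltu([1, 1], 0.5, [x1, x2])  # theta from the notes; weights assumed


print("NOT:", [logical_not(x) for x in (0, 1)])  # NOT: [1, 0]
for x1, x2 in product([0, 1], repeat=2):
    print(x1, x2, "AND:", logical_and(x1, x2), "OR:", logical_or(x1, x2))
```

Each gate is a single unit; only the weights and the threshold differ between them.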
Linear Threshold Unit
● w and θ define a hyperplane that divides the input space into two parts
● This hyperplane is called the “decision boundary.”
Linear Threshold Unit
● AND and OR are linearly separable functions
● However, not all logical functions are linearly separable (e.g. XOR)
● So not all logical functions can be implemented by a single perceptron
● But can be represented by a multi-layer network of perceptrons
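One way to represent XOR with a two-layer network of linear threshold units is sketched below in Python. The decomposition XOR = (x1 OR x2) AND NOT (x1 AND x2) and all weight values are one possible choice made for illustration, not taken from the lecture.

```python
def ltu(w, theta, x):
    """Linear threshold unit: 1 if sum_i w_i x_i - theta > 0, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z - theta > 0 else 0


def xor_network(x1, x2):
    # Hidden layer: two units that each compute a linearly separable function.
    h_or = ltu([1, 1], 0.5, [x1, x2])    # x1 OR x2
    h_and = ltu([1, 1], 1.5, [x1, x2])   # x1 AND x2
    # Output unit: fires when h_or is on and h_and is off.
    return ltu([1, -1], 0.5, [h_or, h_and])


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_network(*x))
```

The hidden units re-represent the inputs so that the output unit only has to solve a linearly separable problem.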
Linear Threshold Unit
Arbitrary Logical Function: (x1 AND NOT x2) OR (x1 AND x3) OR (NOT x2 AND x3)
Weights: w1 = 2, w2 = −2, w3 = 2; threshold θ = 1
$y = H\left(\sum_i w_i x_i - \theta\right)$
x1 | x2 | x3 | Transfer fn, Σwixi = 2x1 − 2x2 + 2x3 | y
Equivalence to Linear Discriminant Functions
A single linear threshold unit is equivalent to a single linear discriminant function
● and can therefore act as a dichotomizer
– assign x to ω1 if g(x)>0 where $g(\mathbf{x}) = \mathbf{a}^T \mathbf{y}$, with $\mathbf{y} = [1, \mathbf{x}^T]^T$
– assign x to ω1 if y=1 where $y = H(\mathbf{w}\mathbf{x})$
● note the unfortunate clash in standard notation!
Equivalence to Linear Discriminant Functions
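A layer of linear threshold units (without saturation, i.e. no Heaviside function) is equivalent to a set of linear discriminant functions
● and can therefore act as a linear machine
– assign x to ωj if $g_j(\mathbf{x}) > g_i(\mathbf{x}) \;\forall\, i \neq j$, where $g(\mathbf{x}) = \mathbf{a}^T \mathbf{y}$
≡
– assign x to ωj if $y_j > y_i \;\forall\, i \neq j$, where $\mathbf{y} = \mathbf{w}\mathbf{x}$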
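A layer of linear units acting as a linear machine can be sketched in Python as follows. The weight matrix `W` (one row per class, in augmented notation) is a hypothetical example chosen only to make the three classes easy to tell apart, and ties are broken in favour of the lowest class index, which is an assumption of this sketch.

```python
def layer_output(W, x):
    """Linear layer without a Heaviside function: y = W [1, x]."""
    x_aug = [1] + list(x)
    return [sum(wi * xi for wi, xi in zip(row, x_aug)) for row in W]


def classify(W, x):
    """Assign x to the class j whose unit has the largest output y_j."""
    y = layer_output(W, x)
    return max(range(len(y)), key=lambda j: y[j])


# Hypothetical weights: rows are [w0, w1, w2] for classes 0, 1 and 2.
W = [
    [0, 1, 0],    # class 0 responds to large x1
    [0, 0, 1],    # class 1 responds to large x2
    [1, -1, -1],  # class 2 responds when both are small
]

print(classify(W, (2, 0)))  # 0
print(classify(W, (0, 2)))  # 1
print(classify(W, (0, 0)))  # 2
```

Each row of `W` plays the role of one linear discriminant function, and the decision is made by comparing their outputs.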
Equivalence to Linear Discriminant Functions
Recall from an earlier lecture:
A linear discriminant function is a linear combination of feature values:
– $g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$
that divides feature-space using a linear decision boundary, which lies at a distance $|w_0| / \|\mathbf{w}\|$ from the origin.
Equivalence to Linear Discriminant Functions
A linear threshold unit can implement the same function as a generalised linear discriminant function:
y = w0 + w1f1(x) + w2f2(x) + … + wNfN(x)
where fi(x) for i=1 to N can be any scalar function of x
● fi(x) can be implemented by another neuron.
● both the expansion of the feature-space and the discrimination can be performed by different layers of neurons in a multilayer neural network (see later lecture).
The Delta Learning Algorithm
Equivalence to Linear Discriminant Functions
Recall from an earlier lecture:
Generalised linear discriminant functions can produce non-linear decision boundaries in the original feature space by applying a linear discriminant function to an expanded feature vector:
g(x) = w0 + w1f1(x) + w2f2(x) + … + wNfN(x)
where fi(x) for i=1 to N can be any scalar function of x
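As an illustration of a generalised linear discriminant, the Python sketch below expands a two-dimensional input with the product feature x1·x2, which is enough to separate XOR. The choice of feature functions and the weight values are assumptions made for this example, not part of the lecture.

```python
def features(x):
    """Expanded feature vector [f1(x), f2(x), f3(x)] = [x1, x2, x1*x2]."""
    x1, x2 = x
    return [x1, x2, x1 * x2]


def g(x, w0, w):
    """Generalised linear discriminant: g(x) = w0 + sum_i w_i f_i(x)."""
    return w0 + sum(wi * fi for wi, fi in zip(w, features(x)))


w0, w = -0.5, [1, 1, -2]  # hypothetical weights

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    label = 1 if g(x, w0, w) > 0 else 0
    print(x, g(x, w0, w), label)
```

The discriminant is linear in the expanded features but non-linear in the original inputs, which is why it can represent a function that no single linear boundary in (x1, x2) can.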
Weights for a linear threshold unit can be learnt using the same methods used to set the parameters of a linear discriminant function:
● Perceptron Learning
● Minimum Squared Error Learning (Widrow-Hoff)
In addition:
● Delta Learning Rule
Delta Learning Rule
● Supervised.
● Adjust weights in proportion to the difference between:
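– the desired output, t, and
– the actual output, y
● sequential (online) update is: $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(t - y)\,\mathbf{x}^T$
● batch update is: $\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_p (t_p - y_p)\,\mathbf{x}_p^T$
(using augmented vector notation i.e. $y = H(\mathbf{w}\mathbf{x})$)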
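A single batch update can be sketched in Python as below. The sketch uses the AND function as a toy dataset, starts from zero weights with η = 1, and assumes a step function that outputs 1 only for strictly positive arguments. These choices and the function names are illustrative, not from the lecture.

```python
def heaviside(z):
    """Step function, assumed to return 1 only when z > 0."""
    return 1 if z > 0 else 0


def batch_delta_step(samples, w, eta=1):
    """One batch update: w <- w + eta * sum_p (t_p - y_p) x_p (augmented x)."""
    total = [0] * len(w)
    for x, t in samples:
        x_aug = [1] + list(x)
        y = heaviside(sum(wi * xi for wi, xi in zip(w, x_aug)))
        for i, xi in enumerate(x_aug):
            total[i] += (t - y) * xi  # accumulate the error-weighted inputs
    return [wi + eta * ti for wi, ti in zip(w, total)]


and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1 = batch_delta_step(and_data, [0, 0, 0])
w2 = batch_delta_step(and_data, w1)
print(w1)  # [1, 1, 1]
print(w2)  # [-2, 0, 0]
```

Unlike the sequential update, all outputs are computed with the same weights before any change is applied, so the order of the samples does not matter.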
Delta Learning Rule
Two types of update, both given by $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(t - y)\,\mathbf{x}^T$:
● False negative (t = 1, y = 0)
– Make w more like x.
– $\mathbf{w} \leftarrow \mathbf{w} + \eta\,\mathbf{x}^T$
● False positive (t = 0, y = 1)
– Make w less like x.
– $\mathbf{w} \leftarrow \mathbf{w} - \eta\,\mathbf{x}^T$
● Similar to (Multiclass) Perceptron learning rule.
Delta Learning Rule
Implemented using Gradient Descent.
e.g. pseudocode for sequential delta learning algorithm:
– Set value of hyper-parameter (η)
– Initialise w to arbitrary solution
– For each sample, (xk, tk) in the dataset in turn
● update weights: $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(t_k - H(\mathbf{w}\mathbf{x}_k))\,\mathbf{x}_k^T$
Run through all samples in turn as many times as necessary for convergence (i.e. until no changes in w).
If dataset is not linearly separable, learning will not converge!
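The sequential algorithm, and its failure to converge on data that is not linearly separable, can be sketched in Python. The `max_epochs` cap is an addition of this sketch so that the loop always terminates; the AND and XOR datasets, the zero initial weights and the function names are likewise illustrative assumptions.

```python
def heaviside(z):
    """Step function, assumed to return 1 only when z > 0."""
    return 1 if z > 0 else 0


def predict(w, x):
    """Output of the unit in augmented notation: H(w . [1, x])."""
    x_aug = [1] + list(x)
    return heaviside(sum(wi * xi for wi, xi in zip(w, x_aug)))


def sequential_delta(samples, w, eta=1.0, max_epochs=100):
    """Sequential delta learning; returns (weights, converged flag)."""
    w = list(w)
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            y = predict(w, x)
            if y != t:
                x_aug = [1] + list(x)
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x_aug)]
                changed = True
        if not changed:  # a full pass with no change in w: converged
            return w, True
    return w, False      # gave up after max_epochs passes


and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, ok_and = sequential_delta(and_data, [0, 0, 0])
w_xor, ok_xor = sequential_delta(xor_data, [0, 0, 0])
print("AND converged:", ok_and)  # AND converged: True
print("XOR converged:", ok_xor)  # XOR converged: False
```

For XOR every pass through the data contains at least one misclassified sample, so the weights keep changing until the cap is reached.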
Delta Learning Rule
● Similar to Widrow-Hoff learning rule (except arbitrary margin, b, replaced by desired output, t).
● Similar to Perceptron learning rule (as learning only occurs for misclassified exemplars, i.e. when y≠t).
Sequential Delta Learning Algorithm
• Initialise w to arbitrary solution and select learning rate
• Until convergence (all samples correctly classified)
• For each sample, xk, in the dataset in turn
− $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(t_k - H(\mathbf{w}\mathbf{x}_k))\,\mathbf{x}_k^T$
Example: w initialised to [-1.5, 5, -1], η = 1, augmented notation with x = [1, x1, x2]

Iteration | x         | t | y = H(wx)   | w after update
----------|-----------|---|-------------|------------------------------------------------------
1         | [1, 0, 0] | 1 | H(-1.5) = 0 | [-1.5, 5, -1] + 1×(1−0)×[1, 0, 0] = [-0.5, 5, -1]
2         | [1, 1, 0] | 1 | H(4.5) = 1  | [-0.5, 5, -1] + 1×(1−1)×[1, 1, 0] = [-0.5, 5, -1]
3         | [1, 2, 1] | 1 | H(8.5) = 1  | [-0.5, 5, -1] + 1×(1−1)×[1, 2, 1] = [-0.5, 5, -1]
4         | [1, 0, 1] | 0 | H(-1.5) = 0 | [-0.5, 5, -1] + 1×(0−0)×[1, 0, 1] = [-0.5, 5, -1]
5         | [1, 1, 2] | 0 | H(2.5) = 1  | [-0.5, 5, -1] + 1×(0−1)×[1, 1, 2] = [-1.5, 4, -3]
6         | [1, 0, 0] | 1 | H(-1.5) = 0 | [-1.5, 4, -3] + 1×(1−0)×[1, 0, 0] = [-0.5, 4, -3]
7         | [1, 1, 0] | 1 | H(3.5) = 1  | [-0.5, 4, -3] + 1×(1−1)×[1, 1, 0] = [-0.5, 4, -3]
8         | [1, 2, 1] | 1 | H(4.5) = 1  | [-0.5, 4, -3] + 1×(1−1)×[1, 2, 1] = [-0.5, 4, -3]
9         | [1, 0, 1] | 0 | H(-3.5) = 0 | [-0.5, 4, -3] + 1×(0−0)×[1, 0, 1] = [-0.5, 4, -3]
10        | [1, 1, 2] | 0 | H(-2.5) = 0 | [-0.5, 4, -3] + 1×(0−0)×[1, 1, 2] = [-0.5, 4, -3]
11        | [1, 0, 0] | 1 | H(-0.5) = 0 | [-0.5, 4, -3] + 1×(1−0)×[1, 0, 0] = [0.5, 4, -3]
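The worked example can be reproduced with a short Python sketch. The five augmented samples and their targets are read off the update steps of the example (initial w = [-1.5, 5, -1], η = 1). The function name, the `history` record and the `max_epochs` safety cap are assumptions of this sketch, as is the convention that H outputs 1 only for strictly positive arguments.

```python
def heaviside(z):
    """Step function, assumed to return 1 only when z > 0."""
    return 1 if z > 0 else 0


def sequential_delta(samples, w, eta, max_epochs=100):
    """Sequential delta learning on augmented samples; logs every iteration."""
    w = list(w)
    history = []  # entries: (iteration, x, t, y, weights after the update)
    iteration = 0
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            iteration += 1
            y = heaviside(sum(wi * xi for wi, xi in zip(w, x)))
            if y != t:
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
                changed = True
            history.append((iteration, x, t, y, list(w)))
        if not changed:  # a full pass without any update: converged
            break
    return w, history


# Augmented samples [1, x1, x2] with targets, in the order used in the example.
samples = [
    ([1.0, 0.0, 0.0], 1),
    ([1.0, 1.0, 0.0], 1),
    ([1.0, 2.0, 1.0], 1),
    ([1.0, 0.0, 1.0], 0),
    ([1.0, 1.0, 2.0], 0),
]

w_final, history = sequential_delta(samples, [-1.5, 5.0, -1.0], eta=1.0)

for it, x, t, y, w_after in history[:11]:
    print(it, x, "t =", t, "y =", y, "w =", w_after)
print("final w:", w_final)  # final w: [0.5, 4.0, -3.0]
```

Under these assumptions the weights change at iterations 1, 5, 6 and 11, after which a full pass through the data produces no further change.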