
1 Introduction
2 Feedforward Neural Networks
Neuron Model
Three-Layer Fully-Connected Feedforward Neural Networks
Activation/Transfer Functions


General Feed-Forward Operation
Network Topology
3 Backpropagation Algorithm
Network Operation Modes
Network Learning
Hidden-to-Output Weights
Input-to-Hidden Weights
Pseudo-Code Algorithm
Learning Curves
4 Radial Basis Function Neural Networks
RBF Network Learning
5 Comparison of RBF and MLP Networks
6 Conclusion
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 2 / 52

Introduction
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 3 / 52

Introduction
Goal: Classify objects by learning non-linearity
Inspired by biological neural networks ⇒ artificial neural networks (ANNs)
There are many problems for which linear discriminants are insufficient for minimum error.
In previous methods, the central difficulty was the choice of the appropriate nonlinear functions.
A “brute-force” approach might be to select a complete basis set such as all polynomials; such a classifier would require too many parameters to be determined from a limited number of training samples.
With multilayer neural networks, the form of the nonlinearity is learned from the training data.
Careful choice of network topology is required, nevertheless, to avoid an over-complex network or one that performs poorly.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 4 / 52

Introduction
Characteristics
Massive parallelism
Distributed representation and computation
Learning ability
Generalisation ability
Adaptivity
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 5 / 52

Introduction
Applications
Pattern recognition/classification, e.g., speech or handwritten symbol recognition, classifying a cardiogram as normal or abnormal.
Forecasting/prediction, e.g., stock market prediction, weather forecasting, time series analysis.
Function approximation, e.g., model construction.
Clustering/categorisation, e.g., data mining, data compression.
Control, e.g., controllers for engineering applications such as engine idle-speed control (throttle angle and load torque as inputs).
Associative memory, e.g., retrieving an airplane partially occluded by clouds.
[Figure: example applications — (a) classification (a pattern classifier over-fitting to noisy training data), (b) prediction (stock value over time t1, t2, ..., tn, tn+1), (c) approximation, (d) clustering, (e) control (controller driving an engine: throttle angle, load torque, idle speed), and associative memory (retrieving an airplane partially occluded by clouds).]
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 6 / 52

Introduction
Network Architectures
Feedforward networks: no loop exists (static system)
Feedback/Recurrent networks: loop exists (dynamic system)
Neural Networks
Feedforward Networks: Single-layer Perceptron, Multilayer Perceptron, Radial Basis Function Nets
Feedback/Recurrent Networks: Competitive Networks, Kohonen’s SOM, Hopfield Network, ART Models
Figure 1: Different network architectures
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 7 / 52

Introduction
Learning paradigms
Supervised learning: Target is known for every input pattern. Weights are determined so that the network can produce outputs as close as possible to the known targets. Reinforcement learning is a special case of supervised learning where the weights are determined by critiques on the correctness of network outputs, e.g., a reward function.
Unsupervised learning: Target is not known. Weights are determined by exploring the underlying structure in the data or correlations between patterns in the data.
Hybrid learning: Combines supervised and unsupervised learning. A portion of the weights is determined by supervised learning while the others are determined by unsupervised learning.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 8 / 52

Feedforward Neural Networks
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 9 / 52

Feedforward Neural Networks
A feedforward neural network (multilayer perceptron (MLP)) consists of one input layer, some hidden layers and one output layer.
Each layer is an array of neurons.
Layers are interconnected by links.
Each link is associated with a connection weight.
Figure 2: A diagram of three-layer fully-connected feedforward neural network.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 10 / 52

Neuron Model
A single “bias unit” is connected to each unit other than the input units.
netj = x1wj1 + x2wj2 + ··· + xdwjd + wj0
yj = f(netj)
Figure 3: A diagram of multi-input-single-output neuron model.
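The following is a minimal Python/NumPy sketch of this neuron model (the function and variable names are illustrative, not from the slides): it computes netj = ∑_{i} xiwji + wj0 and then yj = f(netj), here with tanh as an example activation.

```python
import numpy as np

# Minimal sketch of the multi-input-single-output neuron above:
# net_j = sum_i x_i * w_ji + w_j0, followed by y_j = f(net_j).
def neuron_output(x, w, w0, f=np.tanh):
    """x: inputs (d,), w: weights (d,), w0: bias weight, f: activation function."""
    net = np.dot(w, x) + w0          # net activation net_j
    return f(net)                    # unit output y_j = f(net_j)

x = np.array([0.5, -1.2, 2.0])       # made-up input pattern
w = np.array([0.1, 0.4, -0.3])       # made-up weights
print(neuron_output(x, w, w0=0.2))
```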
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 11 / 52

Three-Layer Fully-Connected Feedforward Neural Networks
A three-layer neural network consists of an input layer, a hidden layer and an output layer interconnected by modifiable weights represented by links between layers.
A three-layer neural network can, in principle, model any nonlinear function arbitrarily well in a compact domain. (Universal approximator)
Typical structure: d − nH − c.
Network training is often based on the backpropagation algorithm, which is a form of gradient-descent procedure (covered later in this lecture).
Figure 4: A diagram of 2-4-1 neural network.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 12 / 52
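As an illustration only (the sizes and initialisation scale are assumptions, not from the slides), the weight arrays needed for a d − nH − c network such as the 2-4-1 network of Figure 4 could be set up as follows, with one extra bias column per layer to match x0 = 1 and y0 = 1.

```python
import numpy as np

# Illustrative sketch: weight arrays for a d - nH - c network,
# augmented with one bias column per layer (x0 = 1, y0 = 1).
d, nH, c = 2, 4, 1                                   # the 2-4-1 network of Figure 4
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.1, size=(nH, d + 1))   # input-to-hidden weights w_ji (incl. bias w_j0)
W_output = rng.normal(scale=0.1, size=(c, nH + 1))   # hidden-to-output weights w_kj (incl. bias w_k0)
print(W_hidden.shape, W_output.shape)                # (4, 3) (1, 5)
```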

Three-Layer Fully-Connected Feedforward Neural Networks
Net activation: netj = ∑_{i=1}^{d} xiwji + wj0 = ∑_{i=0}^{d} xiwji = wjT x,
where wj = [wj0, wj1, ..., wjd]T and x = [x0, x1, ..., xd]T (augmented vectors),
the subscript i indexes units in the input (source) layer and j indexes units in the hidden layer; wji denotes the input-to-hidden weight at hidden unit j, and x0 = 1.
In neurobiology, such weights or connections are called “synapses”.
Each hidden unit emits an output that is a nonlinear function of its activation, that is:
yj =f(netj).
The function f (·) is called the activation function or transfer function.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 13 / 52
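A quick numerical check (with made-up values) that the augmented inner product wjT x with x0 = 1 equals the explicit sum x1wj1 + ··· + xdwjd + wj0:

```python
import numpy as np

# Tiny illustrative check of the augmented-vector form of the net activation.
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.1, 0.4, -0.3]); w0 = 0.2
x_aug = np.concatenate(([1.0], x))               # x0 = 1 prepended
w_aug = np.concatenate(([w0], w))                # w_j0 prepended
print(np.dot(w, x) + w0, np.dot(w_aug, x_aug))   # both give the same net_j
```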

Three-Layer Fully-Connected Feedforward Neural Networks
Each output unit similarly computes its net activation based on the hidden-unit signals:
netk = ∑_{j=1}^{nH} yjwkj + wk0 = ∑_{j=0}^{nH} yjwkj = wkT y,
zk = f(netk),
where the subscript k indexes units in the output layer, nH denotes the number of hidden units, and y0 = 1.
In the case of c outputs (classes), we can view the network as computing c discriminant functions zk = gk(x) and classify the input x according to the largest discriminant function gk(x), k = 1, ..., c.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 14 / 52
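A minimal sketch (the helper name is mine, not from the slides) of using the c outputs as discriminant functions and assigning x to the class with the largest gk(x):

```python
import numpy as np

# Treat the c network outputs z_k = g_k(x) as discriminant functions
# and assign the input to the class with the largest one.
def classify(z):
    """z: network outputs (c,) for one input pattern x."""
    return int(np.argmax(z))          # index k of the largest discriminant g_k(x)

z = np.array([0.12, 0.85, -0.3])      # example outputs for c = 3 classes
print(classify(z))                    # -> 1
```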

Activation/Transfer Functions
Commonly used activation/transfer functions:
Symmetric hard limit (signum) transfer function: f(n) = sgn(n) = 1 if n ≥ 0, −1 if n < 0
Linear transfer function: f(n) = n
Symmetric sigmoid transfer function: f(n) = 2/(1 + e^(−2n)) − 1
Logarithmic sigmoid transfer function: f(n) = 1/(1 + e^(−n))
Radial basis transfer function: f(n) = e^(−n²)
Figure 5: Activation/transfer functions.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 15 / 52
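The listed transfer functions can be written directly in NumPy; this is an illustrative sketch (note that the symmetric sigmoid is identical to tanh):

```python
import numpy as np

# Vectorised versions of the transfer functions listed above.
def signum(n):        return np.where(n >= 0, 1.0, -1.0)            # symmetric hard limit
def linear(n):        return n
def sym_sigmoid(n):   return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0   # equals tanh(n)
def log_sigmoid(n):   return 1.0 / (1.0 + np.exp(-n))
def radial_basis(n):  return np.exp(-n ** 2)

n = np.linspace(-4, 4, 9)
print(np.allclose(sym_sigmoid(n), np.tanh(n)))   # True
```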
General Feed-Forward Operation

Recall yj = f(netj) = f(∑_{i=1}^{d} xiwji + wj0); zk = f(netk) = f(∑_{j=1}^{nH} yjwkj + wk0).
Hidden units enable us to express more complicated nonlinear functions and thus extend the classification capability:
gk(x) = zk = f(∑_{j=1}^{nH} wkj f(∑_{i=1}^{d} wjixi + wj0) + wk0), k = 1, ..., c.
In matrix form: Z = [z1, z2, ...]T = f(Wkj Y + Wk0) = f(Wkj f(Wji x + Wj0) + Wk0).
The activation function does not have to be a signum function; it is often required to be continuous and differentiable.
We can allow the activation function in the output layer to be different from the activation function in the hidden layer, or have different activation functions for each individual unit.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 16 / 52

General Feed-Forward Operation
Example: a 2–2–2 network with input layer (x1, x2), hidden layer (y1, y2), output layer (z1, z2), input-to-hidden weights wij (biases wi0) and hidden-to-output weights mij (biases mi0):
y1 = f(w11x1 + w12x2 + w10); y2 = f(w21x1 + w22x2 + w20);
z1 = f(m11y1 + m12y2 + m10); z2 = f(m21y1 + m22y2 + m20).
In matrix form:
[y1; y2] = f([w11 w12; w21 w22][x1; x2] + [w10; w20]) ⇒ Y = f(Wji x + Wj0),
[z1; z2] = f([m11 m12; m21 m22][y1; y2] + [m10; m20]) ⇒ Z = f(Wkj Y + Wk0),
hence Z = f(Wkj Y + Wk0) = f(Wkj f(Wji x + Wj0) + Wk0).
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 17 / 52
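A minimal NumPy sketch of the matrix-form forward pass Z = f(Wkj f(Wji x + Wj0) + Wk0) for the 2–2–2 example above; all numerical values are made up for illustration.

```python
import numpy as np

# Matrix-form forward pass for the 2-2-2 example network.
f = np.tanh                                  # any continuous, differentiable activation

Wji = np.array([[0.5, -0.2], [0.3, 0.8]])    # hidden weights [[w11, w12], [w21, w22]]
Wj0 = np.array([0.1, -0.1])                  # hidden biases  [w10, w20]
Wkj = np.array([[1.0, -0.5], [0.2, 0.7]])    # output weights [[m11, m12], [m21, m22]]
Wk0 = np.array([0.0, 0.2])                   # output biases  [m10, m20]

x = np.array([0.6, -1.0])                    # one input pattern
Y = f(Wji @ x + Wj0)                         # hidden-layer outputs y1, y2
Z = f(Wkj @ Y + Wk0)                         # network outputs z1, z2
print(Y, Z)
```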
Network Topology
Any function from input to output can be implemented as a three-layer neural network, i.e., such networks are universal approximators.
These results are of greater theoretical interest than practical, since the construction of such a network requires knowledge of the nonlinear functions and the weight values, which are unknown.
Hence, some approach is needed to train the weights based on suitable criteria.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 18 / 52

Backpropagation Algorithm
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 19 / 52

Backpropagation Algorithm
Backpropagation is one of the simplest and most general methods for supervised training of multilayer neural networks.
Our goal now is to set the interconnection weights based on the training patterns and the desired outputs.
The power of backpropagation is that it enables us to compute an effective error for each hidden unit, and thus derive a learning rule for the input-to-hidden weights; this solves what is known as “the credit assignment problem”.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 20 / 52

Network Operation Modes
Feed-forward networks have two modes of operation:
Feed-forward: The feed-forward operation consists of presenting a pattern to the input units and passing (or feeding) the signals through the network in order to obtain the outputs (no cycles!).
Supervised learning: The supervised learning operation consists of presenting an input pattern and modifying the network parameters (weights) to reduce the distance between the computed output and the desired/teaching/target output pattern.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 21 / 52

Network Learning
Figure 6: A diagram of a d − nH − c fully-connected three-layer feed-forward neural network (outputs z1, ..., zc with corresponding targets t1, ..., tc; a bias unit 1 feeds the hidden and output layers).
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 22 / 52

Network Learning
Input dataset (training/validation/test dataset), one column per pattern:
x1 → [x11 x12 ··· x1n]
x2 → [x21 x22 ··· x2n]
...
xd → [xd1 xd2 ··· xdn]
(pattern 1, pattern 2, ..., pattern n), e.g.,
x1 → [−0.2 1.1 ··· 3.6], x2 → [−7.1 −8.6 ··· 2.9], x3 → [4.7 5.5 ··· 2.87].
Target dataset:
t1 → [t11 t12 ··· t1n]
t2 → [t21 t22 ··· t2n]
...
tc → [tc1 tc2 ··· tcn]
(pattern 1, pattern 2, ..., pattern n), e.g.,
t1 → [1 0.2 ··· −1], t2 → [−1 −0.5 ··· 2].
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 23 / 52

Network Learning
Before training: network initialisation
Choose the network size, i.e., d, nH, c.
Choose the number of hidden layers (for a network with more than one hidden layer).
Choose the activation functions.
Initialise the weights with pseudo-random values.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 24 / 52

Network Learning
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 25 / 52

Network Learning
The backpropagation learning rule is based on gradient descent (c.f. the MSE or LMS method).
The training error (at any instant):
J(w) = (1/2) ∑_{k=1}^{c} (tk − zk)² = (1/2) ∥t − z∥²,
where tk is the kth desired or target output; zk is the kth computed output, k = 1, ..., c; w represents all weights of the network; ∥·∥ denotes the Euclidean norm operator, i.e., for a = [a1, a2, ..., an], ∥a∥ = √(a1² + a2² + ··· + an²).
The weights are changed (updated) in a direction that will reduce the error:
w(m + 1) = w(m) + ∆w(m),  ∆w(m) = −η ∂J(w(m))/∂w(m),
where m denotes the iteration number and η > 0 is the (pre-defined) learning rate (indicating the relative size of the change in weights).
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 26 / 52
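A small sketch (the helper names are mine) of the training error J(w) = ½∥t − z∥² for one pattern and a generic gradient-descent step w ← w − η ∂J/∂w:

```python
import numpy as np

# Training error for one pattern and a generic gradient-descent update step.
def training_error(t, z):
    return 0.5 * np.sum((t - z) ** 2)   # J = 1/2 * sum_k (t_k - z_k)^2

def gradient_step(w, grad_J, eta=0.1):
    return w - eta * grad_J             # w(m+1) = w(m) - eta * dJ/dw

t = np.array([1.0, -1.0])               # made-up targets
z = np.array([0.3, -0.6])               # made-up network outputs
print(training_error(t, z))             # 0.5*(0.7**2 + 0.4**2) = 0.325
```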

Hidden-to-Output Weights
Error gradient with respect to the hidden-to-output weights (at the mth iteration):
∂J/∂wkj = (∂J/∂netk)(∂netk/∂wkj) = −δk (∂netk/∂wkj),
where the sensitivity of unit k is defined as
δk = −∂J/∂netk,
which describes how the overall error J changes with the unit’s net activation netk.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 27 / 52

Hidden-to-Output Weights
δk = −∂J/∂netk = −(∂J/∂zk)(∂zk/∂netk) = (tk − zk) ∂f(netk)/∂netk = (tk − zk) f′(netk)
netk = ∑_{j=1}^{nH} yjwkj + wk0 = ∑_{j=0}^{nH} yjwkj = wkT y  ⇒  ∂netk/∂wkj = yj (y0 = 1 for the bias).
Conclusion: the weight update (learning rule) for the hidden-to-output weights wkj is
∆wkj = −η ∂J/∂wkj = η δk yj = η (tk − zk) f′(netk) yj.
New weight wkj at the next iteration:
wkj(m+1) = wkj(m) + ∆wkj(m), j = 0 (bias), 1, ..., nH; k = 1, ..., c; y0 = 1.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 28 / 52
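A hedged NumPy sketch of this hidden-to-output update, assuming tanh output units so that f′(net) = 1 − f(net)²; the function name and shapes are illustrative, not from the slides.

```python
import numpy as np

# Hidden-to-output update: delta_k = (t_k - z_k) * f'(net_k), dW_kj = eta * delta_k * y_j.
def output_layer_update(t, z, net_k, y, eta=0.1):
    """t, z, net_k: (c,) targets, outputs and net activations; y: (nH+1,) with y[0] = 1 (bias)."""
    f_prime = 1.0 - np.tanh(net_k) ** 2          # f'(net_k) for tanh units
    delta_k = (t - z) * f_prime                  # sensitivities of the output units
    delta_Wkj = eta * np.outer(delta_k, y)       # weight changes, shape (c, nH+1)
    return delta_k, delta_Wkj
```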

Input-to-Hidden Weights
Error gradient with respect to the input-to-hidden weights (at the mth iteration):
∂J/∂wji = (∂J/∂yj)(∂yj/∂netj)(∂netj/∂wji).
Since J = (1/2) ∑_{k=1}^{c} (tk − zk)², yj = f(netj) and netj = ∑_{i=1}^{d} xiwji + wj0, then
∂J/∂yj = ∑_{k=1}^{c} (∂J/∂zk)(∂zk/∂yj) = −∑_{k=1}^{c} (tk − zk)(∂zk/∂yj)
       = −∑_{k=1}^{c} (tk − zk)(∂zk/∂netk)(∂netk/∂yj) = −∑_{k=1}^{c} (tk − zk) f′(netk) wkj = −∑_{k=1}^{c} δk wkj,
∂yj/∂netj = f′(netj),  ∂netj/∂wji = xi (x0 = 1 for the bias weight wj0).
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 29 / 52

Input-to-Hidden Weights
Combining the above,
∂J/∂wji = −(∑_{k=1}^{c} δk wkj) f′(netj) xi = −δj xi.
The sensitivity at a hidden unit, δj = f′(netj) ∑_{k=1}^{c} wkj δk, is simply the sum of the individual sensitivities at the output units (i.e., δk) weighted by the hidden-to-output weights wkj, all multiplied by f′(netj).
Conclusion: the learning rule for the input-to-hidden weights wji is
∆wji = −η ∂J/∂wji = η δj xi = η f′(netj) (∑_{k=1}^{c} wkj δk) xi,
where i = 0 (bias), 1, ..., d; j = 1, ..., nH; k = 1, ..., c.
New weight wji at the next iteration:
wji(m + 1) = wji(m) + ∆wji(m),
where i = 0 (bias), 1, ..., d; j = 1, ..., nH; k = 1, ..., c; x0 = 1, y0 = 1.
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 30 / 52
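A matching sketch (same assumptions as the previous snippet: tanh units, illustrative names) of the input-to-hidden update δj = f′(netj) ∑_{k} wkj δk and ∆wji = η δj xi.

```python
import numpy as np

# Input-to-hidden update: back-propagate the output sensitivities through W_kj.
def hidden_layer_update(delta_k, Wkj, net_j, x, eta=0.1):
    """delta_k: (c,) output sensitivities; Wkj: (c, nH+1) with bias in column 0;
    net_j: (nH,) hidden net activations; x: (d+1,) with x[0] = 1 (bias)."""
    f_prime = 1.0 - np.tanh(net_j) ** 2              # f'(net_j) for tanh units
    delta_j = f_prime * (Wkj[:, 1:].T @ delta_k)     # hidden sensitivities, shape (nH,)
    delta_Wji = eta * np.outer(delta_j, x)           # weight changes, shape (nH, d+1)
    return delta_Wji
```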

Pseudo-Code Algorithm
Algorithm: Stochastic Backpropagation
begin initialise network topology (d, nH, c), weights w, criterion θ, learning rate η, m ← 0
  do m ← m + 1
     xm ← randomly chosen pattern
     wkj ← wkj + η δk yj
     wji ← wji + η δj xi
  until |∆J(w(m))| = |J(w(m)) − J(w(m − 1))| < θ
  return w
end
Table 1: Algorithm for Stochastic Backpropagation.
Remark: δj and δk are calculated with the old values of wji and wkj.
Question: Must initialise w with non-zero vector, why?
Dr H.K. Lam (KCL) Multilayer Neural Networks and Backpropagation 7CCSMPNN 2020-21 31 / 52
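Below is a runnable sketch of the stochastic backpropagation loop in Table 1 for a small d − nH − c network with tanh units; the toy data, initialisation and hyper-parameters are all assumptions for illustration only.

```python
import numpy as np

# Stochastic backpropagation sketch for a d-nH-c network with tanh units.
rng = np.random.default_rng(1)
d, nH, c, eta, theta = 2, 4, 1, 0.1, 1e-8
Wji = rng.normal(scale=0.5, size=(nH, d + 1))     # input-to-hidden weights (bias in column 0)
Wkj = rng.normal(scale=0.5, size=(c, nH + 1))     # hidden-to-output weights (bias in column 0)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy patterns
T = np.array([[-1.], [1.], [1.], [-1.]])                 # toy targets (XOR-like)

J_prev = np.inf
for m in range(100000):
    i = rng.integers(len(X))                       # x^m <- randomly chosen pattern
    x = np.concatenate(([1.0], X[i]))              # x0 = 1 (bias)
    net_j = Wji @ x
    y = np.concatenate(([1.0], np.tanh(net_j)))    # y0 = 1 (bias)
    net_k = Wkj @ y
    z = np.tanh(net_k)
    delta_k = (T[i] - z) * (1 - z ** 2)            # output sensitivities (old weights)
    delta_j = (1 - np.tanh(net_j) ** 2) * (Wkj[:, 1:].T @ delta_k)
    Wkj += eta * np.outer(delta_k, y)              # w_kj <- w_kj + eta * delta_k * y_j
    Wji += eta * np.outer(delta_j, x)              # w_ji <- w_ji + eta * delta_j * x_i
    J = 0.5 * np.sum((T[i] - z) ** 2)
    if abs(J - J_prev) < theta:                    # stopping criterion |dJ| < theta (as in Table 1)
        break
    J_prev = J
print(m, J)
```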
Pseudo-Code Algorithm
Stopping Criterion: The algorithm terminates when the change in the criterion function J(w) is smaller than some preset value θ.
So far, we have considered the error on a single pattern. A weight update may reduce …