Backpropagation of Error
Marc de Kamps
Institute for Artificial and Biological Intelligence, University of Leeds
Multilayer Perceptrons
The single perceptron is important and powerful, especially as part of more complex tools:
Multi-layered networks; Support Vector Machines
But it fails on relatively simple problems: XOR
We have already established that a three-layer network can solve XOR, with hand-picked weights
This is very laborious: we need an automatic procedure
Backpropagation of Error
Backpropagation
Distinguish between the network and the algorithm: the multi-layer perceptron is a network architecture,
whereas backpropagation is a way to set the weights. Modern deep learners use clever ways to pre-train,
but backpropagation is still the most widely used method to finish training, even in modern architectures
A Graded Perceptron Continuous Dependency on Weights
The classic perceptron can be described using the Heaviside function:
$$o = H\left(\sum_i w_i x_i\right)$$
Here we used the Heaviside function as the squashing function:
$$H(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases}$$
Instead, we will now use the sigmoid:
$$f(x) = \frac{1}{1 + e^{-\beta x}}$$
Typically, β is fixed. It can be set to give a harder or softer decision.
Graded response
Rather than a hard decision, a perceptron using a sigmoid produces a graded response. This can be used to:
Express uncertainty (e.g. as a probability); Approximate functions
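A minimal sketch of such a graded perceptron in Python with NumPy; the weights, input, and β value are illustrative, not taken from the lecture:

import numpy as np

def sigmoid(x, beta=1.0):
    # Squashing function f(x) = 1 / (1 + exp(-beta * x)); larger beta gives a harder decision
    return 1.0 / (1.0 + np.exp(-beta * x))

def graded_perceptron(w, x, beta=1.0):
    # Graded response in (0, 1) instead of the Heaviside 0/1 output
    return sigmoid(np.dot(w, x), beta)

# Illustrative values (not from the lecture)
w = np.array([0.5, -0.3, 0.8])
x = np.array([1.0, 2.0, 0.5])
print(graded_perceptron(w, x, beta=2.0))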
A Perceptron Network
Perceptrons can be grouped together, forming a two-layer network
This network can be trained by a vector version of the perceptron algorithm
No lateral connections in this network, so you can train each node individually
A Perceptron Network Matrix-Vector Representation
Represent the output by a vector $\vec{o}$
Now we need a weight matrix W:
$$\vec{o} = f(W \cdot \vec{x})$$
where f is applied component-wise
Components $w_{ij}$: i runs over the output nodes, j over the inputs
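A sketch of this matrix-vector form, assuming the sigmoid from above; the 2×4 weight matrix is randomly generated and stands in for trained weights:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W = np.random.randn(2, 4)   # w_ij: i runs over outputs, j over inputs
x = np.array([1.0, 2.0, 3.0, 4.0])
o = sigmoid(W @ x)          # one graded output per output node
print(o.shape)              # (2,)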
Different Ways of Looking at the Network 1 - As a Matrix-Vector Operation
$$\vec{o} = W \vec{x}$$
Different Ways of Looking at the Network 2 - In Terms of Components
$$o_i = \sum_j w_{ij} x_j$$
Different Ways of Looking at the Network 3 - In Terms of a Numerical Computation
$$\begin{pmatrix} o_1 \\ o_2 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$$
Direction is Important! It Features in Backpropagation
In the forward direction:
$$\begin{pmatrix} o_1 \\ o_2 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \qquad o_i = \sum_j w_{ij} x_j$$
In the backward direction the transposed matrix is used:
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \\ w_{14} & w_{24} \end{pmatrix} \begin{pmatrix} o_1 \\ o_2 \end{pmatrix}, \qquad x_i = \sum_j w_{ji} o_j$$
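A sketch of the same weight matrix used in both directions: the forward pass multiplies by W, the backward pass by its transpose (matrix and vectors are random placeholders):

import numpy as np

W = np.random.randn(2, 4)      # 4 inputs -> 2 outputs
x = np.random.randn(4)

o = W @ x                      # forward:  o_i = sum_j w_ij x_j
x_back = W.T @ o               # backward: x_i = sum_j w_ji o_j (transposed matrix)
print(o.shape, x_back.shape)   # (2,) (4,)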
The Error Function (AKA Loss Function)
We are still considering supervised learning
Data points are presented as vectors $\vec{x}^{(i)}$
They come with a desired classification (function value) $\vec{d}^{(i)}$
A network is a machine that transforms an input vector $\vec{x}^{(i)}$ into an output $\vec{o}^{(i)}$ (observed value)
$$E = \frac{1}{2}\sum_i \sum_k \left(d_k^{(i)} - o_k^{(i)}\right)^2$$
E is a measure of how well the machine approximates the function that produced the data
E is a function of the weights only, given a fixed data set
The sum over k runs over all output nodes
The sum over i runs over all data points
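A sketch of this loss over a fixed data set; targets and outputs are hypothetical arrays with one row per data point and one column per output node:

import numpy as np

def loss(targets, outputs):
    # E = 1/2 * sum over data points i and output nodes k of (d_k^(i) - o_k^(i))^2
    return 0.5 * np.sum((targets - outputs) ** 2)

# Hypothetical example: 3 data points, 2 output nodes
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
o = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.6]])
print(loss(d, o))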
The Error Function A Single Perceptron
The equation of a single perceptron is:
$$o = f\left(\sum_k w_k x_k\right)$$
therefore the error function is:
$$E = \frac{1}{2}\sum_i \left(d^{(i)} - f\left(\sum_k w_k x_k^{(i)}\right)\right)^2$$
or in full:
$$E = \frac{1}{2}\sum_i \left(d^{(i)} - \frac{1}{1 + e^{-\sum_k w_k x_k^{(i)}}}\right)^2$$
The latter form is a bit impractical
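Viewed as a function of the weights alone, with the data fixed, this error can be sketched as follows; the weights and the small data set (the OR function) are purely illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_error(w, X, d):
    # X: one data point per row; d: desired values; w: weight vector
    o = sigmoid(X @ w)
    return 0.5 * np.sum((d - o) ** 2)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = np.array([0.0, 1.0, 1.0, 1.0])   # OR function, as an illustration
print(perceptron_error(np.array([2.0, 2.0]), X, d))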
The Data Matrix
We have treated a single data point as a vector, e.g.:
$$\vec{x}^{(i)} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$
The vector is sometimes written as $X^{(i)}$. Aligning all vectors as columns produces a matrix:
$$X = \left[X^{(1)} X^{(2)} \cdots X^{(m)}\right]$$
$x_{ik}$: components of X
n: dimension of the data points; m: number of data points
i labels the component of data point k (row)
k labels the data point (column)
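A sketch of this data-matrix convention, with n = 3 components and m = 4 made-up data points (indices are 0-based in Python, unlike on the slides):

import numpy as np

# Columns are data points x^(k); rows are components, so X has shape (n, m)
X = np.array([[1.0, 0.5, 2.0, 1.5],
              [2.0, 1.0, 0.0, 3.0],
              [3.0, 2.5, 1.0, 0.5]])
n, m = X.shape          # n: dimension of data points, m: number of data points
print(X[:, 1])          # the second data point
print(X[0, 1])          # component x_{ik} with i = 0 (component), k = 1 (data point)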
Neural Networks Perform Regression
Regression requires a model f(x). The residual of a single data point is:
$$r_i = \left(y^{(i)} - f(x^{(i)})\right)^2$$
Linear regression uses a line as the regression model:
$$r_i = \left(y^{(i)} - (a x^{(i)} + b)\right)^2$$
For a minimum of the total residual $R = \sum_i r_i$ the following conditions must hold:
$$\frac{\partial R}{\partial a} = 0, \qquad \frac{\partial R}{\partial b} = 0$$
The gradient is:
$$\frac{\partial R}{\partial a} = -2\sum_i \left(y^{(i)} - (a x^{(i)} + b)\right)x^{(i)}$$
$$\frac{\partial R}{\partial b} = -2\sum_i \left(y^{(i)} - (a x^{(i)} + b)\right)$$
Two equations with two unknowns: the least-squares method
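A sketch of the least-squares fit of the line y = a x + b; the data is made up, and NumPy's lstsq is used as a convenience for solving the resulting linear system:

import numpy as np

# Made-up data roughly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Setting dR/da = 0 and dR/db = 0 gives two linear equations in (a, b)
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)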
Steepest Gradient Descent Recap
Calculate the gradient: $\frac{\partial E}{\partial w_j}$
Update: $w_j \to w_j - \lambda \frac{\partial E}{\partial w_j}$
λ is the learning rate
Repeat
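A generic sketch of this loop; grad_E is a hypothetical function returning the gradient of E at the current weights, and the toy objective is purely illustrative:

import numpy as np

def gradient_descent(w, grad_E, lam=0.1, steps=100):
    # Repeatedly step against the gradient: w_j -> w_j - lambda * dE/dw_j
    for _ in range(steps):
        w = w - lam * grad_E(w)
    return w

# Toy example: minimise E(w) = ||w||^2 / 2, whose gradient is w itself
print(gradient_descent(np.array([3.0, -2.0]), lambda w: w))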
Steepest Gradient Descent Single Perceptron
The gradient. For a single data point, the error of a single perceptron is:
$$E = \frac{1}{2}(o - d)^2$$
$$\frac{\partial E}{\partial w_p} = (o - d)\, f'\!\left(\sum_l w_l x_l\right) \frac{\partial \sum_m w_m x_m}{\partial w_p}$$
Use $f' = f(1 - f)$:
$$\frac{\partial E}{\partial w_p} = (o - d)\, o\,(1 - o)\, x_p$$
This gives the so-called delta rule
This is for a single data point! What is the only change if there are more?
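A sketch of the delta rule for a single sigmoid perceptron and a single data point; for more data points one would simply sum these gradients over the data (weights and input are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_rule_gradient(w, x, d):
    # dE/dw_p = (o - d) * o * (1 - o) * x_p, using f' = f(1 - f)
    o = sigmoid(np.dot(w, x))
    return (o - d) * o * (1.0 - o) * x

w = np.array([0.2, -0.4, 0.1])
x = np.array([1.0, 0.5, -1.0])
print(delta_rule_gradient(w, x, d=1.0))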
Multi-layer Perceptron Steepest Gradient Descent
Input vector $\vec{x}$, hidden layer $\vec{h}$, output vector $\vec{o}$
Two weight matrices: V and W
$$\vec{o} = f(W \cdot f(V \cdot \vec{x}))$$
or equivalently $\vec{o} = f(W \cdot \vec{h})$, with:
$$\vec{h} = f(V \cdot \vec{x})$$
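A sketch of this two-layer forward pass; layer sizes and the random weight matrices are arbitrary placeholders:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(V, W, x):
    h = sigmoid(V @ x)   # hidden layer: h = f(V . x)
    o = sigmoid(W @ h)   # output layer: o = f(W . h)
    return h, o

V = np.random.randn(3, 4)   # 4 inputs -> 3 hidden units
W = np.random.randn(2, 3)   # 3 hidden units -> 2 outputs
x = np.random.randn(4)
h, o = forward(V, W, x)
print(h.shape, o.shape)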
Multi-layer Perceptron Steepest Gradient Descent
Determine $\frac{\partial E}{\partial w_{ij}}$
Determine $\frac{\partial E}{\partial v_{kl}}$
Derivation in the handout
Backpropagation by Error A Single Data Point
$$E = \frac{1}{2}\left(\vec{d} - \vec{o}\right)^2$$
$$\vec{o} = f(W \cdot f(V \cdot \vec{x}))$$
Backpropagation by Error Output layer
Define
$$\Delta^{(2)}_p = (o_p - d_p)\, o_p (1 - o_p)$$
Then
$$\frac{\partial E}{\partial w_{pq}} = (\vec{o} - \vec{d}) \cdot \frac{\partial \vec{o}}{\partial w_{pq}} = (\vec{o} - \vec{d}) \cdot f'(W \cdot \vec{h})\, \frac{\partial (W \cdot \vec{h})}{\partial w_{pq}}$$
$$= \sum_k (o_k - d_k)\, o_k (1 - o_k)\, \frac{\partial \sum_l w_{kl} h_l}{\partial w_{pq}} = (o_p - d_p)\, o_p (1 - o_p)\, h_q = \Delta^{(2)}_p h_q$$
Backpropagation by Error Hidden layer
The hidden layer is more work:
$$\frac{\partial o_k}{\partial v_{pq}} = f'(\vec{w}_k \cdot \vec{h})\, w_{kp}\, f'(\vec{v}_p \cdot \vec{x})\, \frac{\partial (\vec{v}_p \cdot \vec{x})}{\partial v_{pq}} = o_k (1 - o_k)\, w_{kp}\, h_p (1 - h_p)\, x_q$$
where $\vec{w}_k$ and $\vec{v}_p$ denote the k-th row of W and the p-th row of V, so that
$$\frac{\partial E}{\partial v_{pq}} = \sum_k (o_k - d_k)\, o_k (1 - o_k)\, w_{kp}\, h_p (1 - h_p)\, x_q = \sum_k \Delta^{(2)}_k w_{kp}\, h_p (1 - h_p)\, x_q$$
This is again of the form $\Delta^{(1)}_p x_q$, with
$$\Delta^{(1)}_p = \sum_k \Delta^{(2)}_k w_{kp}\, h_p (1 - h_p)$$
Backpropagation by Error Interpretation
We can find the gradient of the weight between node p in layer i + 1 and node q in layer i as follows:
$$\frac{\partial E}{\partial w^{(i)}_{pq}} = \Delta^{(i+1)}_p x_q$$
where $x_q$ is the activity of node q in layer i. For the output layer we have:
$$\Delta^{(\mathrm{out})}_p = (o_p - d_p)\, o_p (1 - o_p)$$
For the layers below we have:
$$\Delta^{(i)}_p = \sum_k \Delta^{(i+1)}_k w_{kp}\, h_p (1 - h_p)$$
Observe the summation order: the sum runs over the index of the layer above, so the error is propagated backwards through the weights, hence the name backpropagation
Backpropagation by Error First Step
Calculate the error:
$$\Delta^{(w)}_p = (o_p - d_p)\, o_p (1 - o_p)$$
The gradient is given by:
$$\frac{\partial E}{\partial w_{pq}} = \Delta^{(w)}_p h_q$$
This is the perceptron learning rule!
Backpropagation by Error Second Step
Backpropagate the error:
$$\Delta^{(v)}_p = \sum_k \Delta^{(w)}_k w_{kp}\, h_p (1 - h_p)$$
Apart from the factor $h_p(1 - h_p)$, this amounts to propagating the error backwards through the (transposed) weight matrix
Backpropagation by Error Third Step
Calculate the gradient of the V matrix:
$$\frac{\partial E}{\partial v_{kl}} = \Delta^{(v)}_k x_l$$
Backpropagation by Error Fourth Step
Update weights:
$$W \to W - \lambda \frac{\partial E}{\partial W}$$
$$V \to V - \lambda \frac{\partial E}{\partial V}$$
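Putting the four steps together for a single data point, as a sketch; the layer sizes, learning rate, and random initial weights are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(V, W, x, d, lam=0.5):
    # Forward pass
    h = sigmoid(V @ x)
    o = sigmoid(W @ h)
    # Step 1: calculate the error (delta) at the output layer
    delta_w = (o - d) * o * (1.0 - o)
    # Step 2: backpropagate the error through W^T, times h(1 - h)
    delta_v = (W.T @ delta_w) * h * (1.0 - h)
    # Step 3: gradients are outer products of deltas with the layer inputs
    dE_dW = np.outer(delta_w, h)
    dE_dV = np.outer(delta_v, x)
    # Step 4: update the weights
    return V - lam * dE_dV, W - lam * dE_dW

V = np.random.randn(3, 2)
W = np.random.randn(1, 3)
x = np.array([1.0, 0.0])
d = np.array([1.0])
V, W = backprop_step(V, W, x, d)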
General Remarks