
Backpropagation of Error
Marc de Kamps
Institute for Artificial and Biological Intelligence, University of Leeds


Multilayer-Perceptrons
Single Perceptron: important, powerful, especially as part of more complex tools:
Multi-layered networks
Support Vector Machines
Fails on relatively simple problems: XOR
We have already established that a three-layer network can solve XOR, using hand-picked weights (see the sketch below)
Hand-picking weights is very laborious: we need an automatic procedure
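To make the hand-picked-weights point concrete, here is a minimal Python/NumPy sketch of a three-layer (input, hidden, output) network of Heaviside perceptrons wired by hand to compute XOR. The particular weights and thresholds are illustrative choices, not values taken from the lecture.

```python
import numpy as np

def heaviside(z):
    """Step activation: 1 where z >= 0, else 0."""
    return (z >= 0).astype(float)

# Hand-picked weights (illustrative values, not taken from the lecture).
# Hidden unit 1 computes x1 OR x2, hidden unit 2 computes x1 AND x2;
# the output unit fires when OR is on and AND is off, i.e. XOR.
V = np.array([[1.0, 1.0],      # weights into hidden unit 1 (OR)
              [1.0, 1.0]])     # weights into hidden unit 2 (AND)
v_bias = np.array([-0.5, -1.5])
W = np.array([[1.0, -1.0]])    # output unit: h1 - h2
w_bias = np.array([-0.5])

def xor_net(x):
    h = heaviside(V @ x + v_bias)      # hidden layer
    return heaviside(W @ h + w_bias)   # output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", int(xor_net(np.array(x, dtype=float))[0]))
# expected: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
```

Finding such weights by inspection is feasible for XOR, but it clearly does not scale; hence the need for an automatic procedure.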
Backpropagation
Distinguish between the network and the algorithm:
the multi-layer perceptron is a network architecture;
backpropagation is an algorithm for setting its weights.
Modern deep learners use clever ways to pre-train,
but backpropagation is still the most widely used method to finish training, even in modern architectures.

A Graded Perceptron: Continuous Dependency on Weights
The classic perceptron can be described using the Heaviside function:
o = H\left( \sum_i w_i x_i \right)
Here we used the Heaviside function as the squashing function:

H(x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases}

Instead, we will now use the sigmoid:

f(x) = \frac{1}{1 + e^{-\beta x}}

Typically, β is fixed. It can be set to give a hard or a soft decision.

Graded Response
Rather than a hard decision, a perceptron using a sigmoid produces a graded response.
This can be used to:
express uncertainty, e.g. a probability
approximate functions

A Perceptron Network
Perceptrons can be grouped together, forming a two-layer network.
This network can be trained by a vector version of the perceptron algorithm.
There are no lateral connections in this network, so each node can be trained individually.

A Perceptron Network: Matrix-Vector Representation
Represent the output by a vector \vec{o}. We now need a weight matrix W:

\vec{o} = f(W \cdot \vec{x})

Its components are w_{ij}, where i runs over output nodes and j over input nodes.

Different Ways of Looking at the Network
1 - As a matrix-vector operation:

\vec{o} = W \vec{x}

2 - In terms of components:

o_i = \sum_j w_{ij} x_j

3 - As a numerical computation:

\begin{pmatrix} o_1 \\ o_2 \end{pmatrix} =
\begin{pmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}

Direction is Important! Features in Backpropagation
In the forward direction the weight matrix is applied to the inputs:

o_i = \sum_j w_{ij} x_j

In the backward direction the transposed weight matrix is applied to the outputs:

x_i = \sum_j w_{ji} o_j

The Error Function (aka Loss Function)
We are still considering supervised learning.
Data points are presented as vectors x^{(i)}.
They come with a desired classification (function value) d^{(i)}.
A network is a machine that transforms an input vector x^{(i)} into an output o^{(i)} (the observed value).

E = \frac{1}{2} \sum_i \sum_k \left( d_k^{(i)} - o_k^{(i)} \right)^2

E is a measure of how well the machine approximates the function that produced the data.
Given a fixed data set, E is a function of the weights only.
The sum over k runs over all output nodes; the sum over i runs over all data points.

The Error Function: A Single Perceptron
The equation of a single perceptron is:

o = f\left( \sum_k w_k x_k \right)

Therefore the error function is:

E = \frac{1}{2} \sum_i \left( d^{(i)} - f\left( \sum_k w_k x_k^{(i)} \right) \right)^2

or, written out in full:

E = \frac{1}{2} \sum_i \left( d^{(i)} - \frac{1}{1 + e^{-\sum_k w_k x_k^{(i)}}} \right)^2

The latter form is a bit impractical.

The Data Matrix
We have treated a single data point as a vector, e.g.:

\vec{x}^{(i)} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}

The vector is sometimes written as X^{(i)}. Aligning all vectors produces a matrix:

X = \left[ X^{(1)} \, X^{(2)} \cdots X^{(m)} \right]

Here x_{ik} are the components of X, n is the dimension of the data points and m is the number of data points:
i labels the component of data point k (the row), k labels the data point (the column).

Neural Networks Perform Regression
Regression requires a model f(x). The residual of a single data point is:

r_i = \left( y^{(i)} - f(x^{(i)}) \right)^2

Linear regression uses a line as the regression model:

r_i = \left( y^{(i)} - (a x^{(i)} + b) \right)^2

For a minimum of the total residual R the following conditions must hold:

\frac{\partial R}{\partial a} = 0, \quad \frac{\partial R}{\partial b} = 0

The gradient is:

\frac{\partial R}{\partial a} = -2 \sum_i \left( y^{(i)} - (a x^{(i)} + b) \right) x^{(i)}
\frac{\partial R}{\partial b} = -2 \sum_i \left( y^{(i)} - (a x^{(i)} + b) \right)

Two equations with two unknowns: the least-squares method.

Steepest Gradient Descent: Recap
Calculate the gradient \partial E / \partial w_j, then update:

w_j \rightarrow w_j - \lambda \frac{\partial E}{\partial w_j}

where λ is the learning rate. Repeat.

Steepest Gradient Descent: A Layer of Sigmoid Perceptrons
For a single data point the error is:

E = \frac{1}{2} \sum_k (o_k - d_k)^2, \quad o_k = f\left( \sum_l w_{kl} x_l \right)

The gradient is:

\frac{\partial E}{\partial w_{pq}} = \sum_k (o_k - d_k) \, f'\left( \sum_l w_{kl} x_l \right) \frac{\partial \sum_m w_{km} x_m}{\partial w_{pq}}

Using f' = f(1 - f):

\frac{\partial E}{\partial w_{pq}} = (o_p - d_p)\, o_p (1 - o_p)\, x_q

This gives the so-called delta rule.
This is for a single data point! What is the only change if there are more?
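As a concrete illustration of the delta rule, here is a minimal NumPy sketch that applies the single-layer update one data point at a time. Everything in it is illustrative: β is taken as 1, the data set is generated from a made-up "teacher" weight matrix so that the targets are actually representable, and the learning rate and epoch count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * z))

# Toy problem (made up for illustration): the desired outputs come from a
# "teacher" weight matrix, so a single layer of sigmoid units can fit them.
n_in, n_out, m = 4, 2, 100
X = rng.normal(size=(m, n_in))
W_true = rng.normal(size=(n_out, n_in))
D = sigmoid(X @ W_true.T)                      # desired outputs d^(i)

W = np.zeros((n_out, n_in))                    # weight matrix w_pq, to be learned
lam = 0.5                                      # learning rate lambda

def error(W):
    return 0.5 * np.sum((sigmoid(X @ W.T) - D) ** 2)

print("error before training:", error(W))
for epoch in range(2000):
    for x, d in zip(X, D):                     # one data point at a time
        o = sigmoid(W @ x)                     # forward pass: o_p = f(sum_q w_pq x_q)
        delta = (o - d) * o * (1.0 - o)        # delta rule: (o_p - d_p) o_p (1 - o_p)
        W -= lam * np.outer(delta, x)          # dE/dw_pq = delta_p x_q, steepest descent
print("error after training: ", error(W))      # should be much smaller
```

With more than one data point, the only change is that the per-point gradients are summed over the data set; the inner loop above applies them one at a time instead.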
Multi-layer Perceptron: Steepest Gradient Descent
Input vector \vec{x}, hidden layer \vec{h}, output vector \vec{o}, and two weight matrices, V and W:

\vec{o} = f(W \cdot f(V \cdot \vec{x}))

that is, \vec{o} = f(W \cdot \vec{h}), with \vec{h} = f(V \cdot \vec{x}).

We must determine \partial E / \partial w_{ij} and \partial E / \partial v_{kl}. The full derivation is in the hand-out.

Backpropagation of Error: A Single Data Point

E = \frac{1}{2} \left( \vec{d} - \vec{o} \right)^2, \quad \vec{o} = f(W \cdot f(V \cdot \vec{x}))

Backpropagation of Error: Output Layer
Define:

\Delta^{(2)}_p = (o_p - d_p)\, o_p (1 - o_p)

Then:

\frac{\partial E}{\partial w_{pq}} = (\vec{o} - \vec{d}) \cdot \frac{\partial \vec{o}}{\partial w_{pq}}
= (\vec{o} - \vec{d}) \cdot f'(W \cdot \vec{h}) \, \frac{\partial (W \cdot \vec{h})}{\partial w_{pq}}
= \sum_k (o_k - d_k)\, o_k (1 - o_k) \frac{\partial \sum_l w_{kl} h_l}{\partial w_{pq}}
= (o_p - d_p)\, o_p (1 - o_p)\, h_q
= \Delta^{(2)}_p h_q

Backpropagation of Error: Hidden Layer
The hidden layer is more work:

\frac{\partial o_k}{\partial v_{pq}} = f'\left( \sum_l w_{kl} h_l \right) w_{kp} \, f'\left( \sum_m v_{pm} x_m \right) x_q
= o_k (1 - o_k)\, w_{kp}\, h_p (1 - h_p)\, x_q

so that:

\frac{\partial E}{\partial v_{pq}} = \sum_k (o_k - d_k)\, o_k (1 - o_k)\, w_{kp}\, h_p (1 - h_p)\, x_q
= \sum_k \Delta^{(2)}_k w_{kp}\, h_p (1 - h_p)\, x_q

This is again of the form \Delta^{(1)}_p x_q, with:

\Delta^{(1)}_p = \sum_k \Delta^{(2)}_k w_{kp}\, h_p (1 - h_p)

Backpropagation of Error: Interpretation
We can find the gradient of the weight between node p in layer i+1 and node q in layer i as follows:

\frac{\partial E}{\partial w^{(i)}_{pq}} = \Delta^{(i+1)}_p x_q

For the output layer we have:

\Delta^{(out)}_p = (o_p - d_p)\, o_p (1 - o_p)

For the layers below:

\Delta^{(i)}_p = \sum_k \Delta^{(i+1)}_k w_{kp}\, h_p (1 - h_p)

Observe the summation order: the deltas are propagated backwards through the weights, hence backpropagation.

Backpropagation of Error: First Step
Calculate the error at the output:

\Delta^{(w)}_p = (o_p - d_p)\, o_p (1 - o_p)

and from it the gradient:

\frac{\partial E}{\partial w_{pq}} = \Delta^{(w)}_p h_q

This is the perceptron learning rule!

Backpropagation of Error: Second Step
Backpropagate the error:

\Delta^{(v)}_p = \sum_k \Delta^{(w)}_k w_{kp}\, h_p (1 - h_p)

Apart from the extra factor h_p (1 - h_p), this amounts to backpropagation of the error.

Backpropagation of Error: Third Step
Calculate the gradient of the V matrix:

\frac{\partial E}{\partial v_{kl}} = \Delta^{(v)}_k x_l

Backpropagation of Error: Fourth Step
Update the weights:

W \rightarrow W - \lambda \frac{\partial E}{\partial W}, \quad V \rightarrow V - \lambda \frac{\partial E}{\partial V}
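The four steps translate almost line for line into code. Below is a minimal NumPy sketch of the two-layer network trained on XOR. It makes some assumptions not present in the derivation above: a constant input of 1 is appended so the hidden units get bias weights (a common convenience), and the layer sizes, random seed, learning rate and epoch count are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR as training data; a constant 1 is appended to each input so that the
# hidden units get bias weights (not shown in the derivation above).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hidden, n_out = 3, 3, 1
V = rng.normal(scale=0.5, size=(n_hidden, n_in))   # input -> hidden weight matrix
W = rng.normal(scale=0.5, size=(n_out, n_hidden))  # hidden -> output weight matrix
lam = 0.5                                          # learning rate lambda

for epoch in range(10000):
    for x, d in zip(X, D):
        # forward pass: h = f(V x), o = f(W h)
        h = sigmoid(V @ x)
        o = sigmoid(W @ h)
        # step 1: output error  Delta^(w)_p = (o_p - d_p) o_p (1 - o_p)
        delta_w = (o - d) * o * (1.0 - o)
        # step 2: backpropagate  Delta^(v)_p = sum_k Delta^(w)_k w_kp h_p (1 - h_p)
        delta_v = (W.T @ delta_w) * h * (1.0 - h)
        # step 3: gradients  dE/dw_pq = Delta^(w)_p h_q,  dE/dv_kl = Delta^(v)_k x_l
        grad_W = np.outer(delta_w, h)
        grad_V = np.outer(delta_v, x)
        # step 4: steepest-descent update
        W -= lam * grad_W
        V -= lam * grad_V

# After training the outputs should approach 0, 1, 1, 0; the exact values
# depend on the random initialisation and the learning rate.
print(np.round(sigmoid(W @ sigmoid(V @ X.T)), 3))
```

As in the derivation, each gradient here is computed for a single data point; batch training would simply accumulate the gradients over the data set before updating.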