Backpropagation
1. Why do we need to rescale the input data before training an MLP?
MLPs use metric information (distances) to determine the error. Therefore, dimensions whose scale is far larger than that of the others will dominate the error, making the smaller-scale features effectively irrelevant.
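As a concrete illustration, here is a minimal NumPy sketch of the usual fix, z-score standardisation of each feature; the array name X and the layout (rows = samples, columns = features) are assumptions for the example, not part of the question:

    import numpy as np

    # Toy design matrix: rows are samples, columns are features.
    # The second feature has a much larger scale than the first.
    X = np.array([[0.1, 1000.0],
                  [0.3, 2500.0],
                  [0.2, 1800.0]])

    # Standardise each feature to zero mean and unit variance so that
    # no single dimension dominates the distance-based error.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    X_scaled = (X - mean) / std

    print(X_scaled.mean(axis=0))  # approximately 0 for every feature
    print(X_scaled.std(axis=0))   # exactly 1 for every feature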
2. How does the error propagate from the output layer to the hidden layer?

Through the δ terms. For an output neuron k with activation z_k and target t_k, δ_k = (z_k − t_k) z_k (1 − z_k); each hidden neuron j then receives the error as the weighted sum Σ_k δ_k w_jk of the deltas of the layer above (see the derivation in question 5).
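A minimal NumPy sketch of this propagation for a single hidden layer of sigmoid units; the names (z_hidden, W_out, y, t) and the toy sizes are my own for the example, not from the text:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Assumed toy dimensions: 3 hidden units, 2 output units.
    z_hidden = np.array([0.2, 0.7, 0.5])   # hidden activations z_j
    W_out = np.random.randn(2, 3) * 0.1    # output weights w_jk (one row per output neuron)
    y = sigmoid(W_out @ z_hidden)          # output activations
    t = np.array([1.0, 0.0])               # desired targets

    # Deltas of the output layer: delta_k = (y_k - t_k) * y_k * (1 - y_k)
    delta_out = (y - t) * y * (1 - y)

    # Each hidden neuron j receives the weighted sum of the output deltas,
    # multiplied by the derivative of its own sigmoid: z_j * (1 - z_j).
    delta_hidden = (W_out.T @ delta_out) * z_hidden * (1 - z_hidden)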
3. Why do we compute the gradient of the error?
In order to perform gradient descent: the anti-gradient (the direction opposite to the gradient) is the direction of steepest descent.
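For instance, a minimal Python sketch of one gradient-descent step on a toy function f(x) = x², whose gradient is 2x; the starting point and learning rate are assumptions for illustration:

    # One gradient-descent step on f(x) = x^2, whose gradient is 2x.
    def grad_f(x):
        return 2.0 * x

    x = 3.0        # current point
    eta = 0.1      # learning rate

    x_next = x - eta * grad_f(x)   # move along the anti-gradient
    print(x_next)                  # 2.4: closer to the minimum at x = 0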
4. Derive the update rule of the output neurons according to backpropagation.
We aim to minimise the squared error E = ½ (y − t)², where y is the output of the MLP and t is the desired class. The output y = σ(a) is a sigmoid applied to the dot product between the weights of the output neuron and its inputs (including the bias input): a = Σ_i w_i z_i. We derive the gradient of the error with respect to each weight by applying the chain rule:

∂E/∂w_i = (∂E/∂a) · (∂a/∂w_i) = (y − t) y (1 − y) z_i,

where we used the derivative of the sigmoid, σ'(x) = σ(x)(1 − σ(x)). Substituting this gradient into the general update rule for gradient descent, x^(t+1) = x^(t) − η ∇f(x^(t)), we obtain the update rule for the output weights:

w_i^(t+1) = w_i^(t) − η (y − t) y (1 − y) z_i.
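A minimal NumPy sketch of this update for a single sigmoid output neuron; the inputs z (with the bias as the last component), the weights, the target and the learning rate eta are illustrative assumptions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    z = np.array([0.3, 0.8, 1.0])    # inputs z_i (last entry is the bias input)
    w = np.array([0.1, -0.2, 0.05])  # weights w_i (last entry is the bias weight)
    t = 1.0                          # desired class
    eta = 0.5                        # learning rate

    a = w @ z                        # a = sum_i w_i z_i
    y = sigmoid(a)                   # y = sigma(a)

    # Gradient of E = 1/2 (y - t)^2 with respect to each weight: (y - t) y (1 - y) z_i
    grad = (y - t) * y * (1 - y) * z

    # Gradient-descent update: w_i <- w_i - eta * (y - t) y (1 - y) z_i
    w = w - eta * grad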
5. Derive the update rule of the hidden neurons according to backpropagation.
For a hidden neuron j, we need to compute its δ:

δ_j = ∂E/∂a_j = Σ_k (∂E/∂a_k) · (∂a_k/∂a_j) = Σ_k δ_k w_jk z_j (1 − z_j),

where the a_k are the weighted input sums of the neurons that follow neuron j in the network; since a_k = Σ_j w_jk σ(a_j), we have ∂a_k/∂a_j = w_jk σ(a_j)(1 − σ(a_j)), and σ(a_j) = z_j. Then we chain this with the derivative of a_j with respect to any of the weights connected to its input, w_ij: ∂a_j/∂w_ij = z_i. So the gradient of the error with respect to w_ij is

∂E/∂w_ij = z_i Σ_k δ_k w_jk z_j (1 − z_j),

which leads to the update rule:

w_ij^(t+1) = w_ij^(t) − η z_i Σ_k δ_k w_jk z_j (1 − z_j).
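A minimal NumPy sketch combining the hidden-layer rule derived here with the output-layer rule from question 4, for one hidden layer and one output neuron; the shapes, initial weights and learning rate are assumptions for the example, and bias terms are omitted to keep the sketch short:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    x = np.array([0.5, -0.2])                 # inputs z_i to the hidden layer
    t = np.array([1.0])                       # target
    W1 = rng.normal(scale=0.1, size=(3, 2))   # hidden weights w_ij (3 hidden units)
    W2 = rng.normal(scale=0.1, size=(1, 3))   # output weights w_jk
    eta = 0.5                                 # learning rate

    # Forward pass.
    a_hidden = W1 @ x                    # a_j
    z_hidden = sigmoid(a_hidden)         # z_j = sigma(a_j)
    y = sigmoid(W2 @ z_hidden)           # network output

    # Backward pass: deltas of the output and hidden layers.
    delta_out = (y - t) * y * (1 - y)                              # delta_k
    delta_hidden = (W2.T @ delta_out) * z_hidden * (1 - z_hidden)  # sum_k delta_k w_jk z_j (1 - z_j)

    # Gradient-descent updates: w <- w - eta * delta * input.
    W2 -= eta * np.outer(delta_out, z_hidden)   # output rule from question 4
    W1 -= eta * np.outer(delta_hidden, x)       # hidden rule derived above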
