The Multi-layer Perceptron
1. What is the difference between a batch and a single stochastic update? What are the reasons to prefer one over the other?
A batch update computes the gradient from a set of misclassified points, while a single stochastic (online) update uses one point. The batch update gives a better estimate of the gradient: if the set used is the whole set of misclassified points, the computed gradient is exact. Stochastic gradient descent, by contrast, converges only in expectation: a single update may increase the error on the whole dataset, although the error decreases on average. A batch update is therefore preferable in terms of accuracy, but it is more computationally expensive. The single stochastic update is cheaper to compute and can be performed online as new data points arrive, which is useful if the training set grows over time.
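As a minimal sketch of the two kinds of update, the code below applies the perceptron rule once in batch form and once for a single point. The toy data, the learning rate `eta`, and all function names are illustrative assumptions; weights are written as ⟨bias, w_x, w_y⟩, as in the answers further down.

```python
# Hypothetical toy data: ((x, y), class). Not taken from the worksheet.
data = [((1, 1), 0), ((2, 1), 0), ((1, 3), 1), ((2, 4), 1)]

def predict(w, x):
    """Step-function perceptron: 1 if bias + w_x*x + w_y*y > 0, else 0."""
    return 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else 0

def misclassified(w, points):
    """Return the points whose predicted class differs from the target."""
    return [(x, t) for x, t in points if predict(w, x) != t]

def batch_update(w, points, eta=0.1):
    """Sum the correction over every misclassified point, then step once."""
    total = [0.0, 0.0, 0.0]
    for x, t in misclassified(w, points):
        sign = 1 if t == 1 else -1  # push towards the target class
        for i, xi in enumerate((1.0, x[0], x[1])):
            total[i] += sign * xi
    return [wi + eta * ti for wi, ti in zip(w, total)]

def single_update(w, point, eta=0.1):
    """Correct the weights using one point only (no change if it is correct)."""
    x, t = point
    if predict(w, x) == t:
        return list(w)
    sign = 1 if t == 1 else -1
    return [wi + eta * sign * xi for wi, xi in zip(w, (1.0, x[0], x[1]))]

w0 = [0.0, 0.0, 0.0]
print("batch :", batch_update(w0, data))
print("single:", single_update(w0, data[2]))
```

The batch step moves along the summed correction of both misclassified points, while the single step follows only one of them, so it is cheaper but noisier.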
2. Why do we need to substitute the step function with the sigmoid for the Multi-Layer Perceptron (MLP)?

Because the step function is not differentiable at 0, and its gradient is zero everywhere else, so it gives gradient descent no information about how to change the weights. The sigmoid function, on the other hand, is smooth and has a non-zero gradient everywhere.
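A small numerical check can make this concrete. The sketch below compares a finite-difference estimate of the step function's slope with the sigmoid's analytic derivative; the helper names and the step size `h` are assumptions chosen for illustration.

```python
import math

def step(z):
    """Hard threshold: 1 for positive input, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Analytic derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_grad(f, z, h=1e-6):
    """Central finite-difference estimate of df/dz."""
    return (f(z + h) - f(z - h)) / (2 * h)

for z in (-2.0, -0.5, 0.5, 2.0):
    # The step's slope is 0 away from the origin; the sigmoid's never is.
    print(z, numeric_grad(step, z), sigmoid_grad(z))
```

Away from 0 the step function's estimated slope is exactly zero, so a weight update proportional to it would never move, whereas the sigmoid always provides a usable slope.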
3. What optimization algorithm do we use, both for the perceptron and the MLP?
Gradient descent.
4. Given the dataset: <1,1,0>,<2,1,0>,<1,3,1>,<2,-1,1>, where the last element of each vector is the class, construct an MLP that separates the classes.
This data set is not linearly separable, so a single neuron is not enough. There are infinitely many solutions; an easy one to compute uses two neurons with decision boundaries y=2 and y=0, corresponding to the weight vectors ⟨−2,0,1⟩ and ⟨0,0,1⟩ respectively (bias first, then the weights for x and y). The normal of the first boundary, ⟨0,1⟩, points upward, so the first neuron returns 1 above the line y=2, which is what we want. The normal of the second boundary, also ⟨0,1⟩, points upward as well, but in this case we want the neuron to return 1 below the line y=0. We therefore invert the second weight vector, which becomes ⟨0,0,−1⟩. These two vectors constitute the first layer of our MLP.
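The output layer has to compute the union of the two areas in which the first layer returns 1. This corresponds to a logical OR operator, realised, for instance, by the discriminating boundary y=−x+1/2 in the space of the two first-layer outputs, which corresponds to the weight vector ⟨−1/2,1,1⟩. The resulting MLP therefore has two first-layer neurons, with weight vectors ⟨−2,0,1⟩ and ⟨0,0,−1⟩, feeding an output neuron with weight vector ⟨−1/2,1,1⟩.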
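The construction can be checked numerically. The sketch below assumes a hard threshold (step) activation, as used in the hand construction, weight vectors ordered as ⟨bias, w_1, w_2⟩, and OR weights read as ⟨−1/2, 1, 1⟩; the function and variable names are illustrative.

```python
def step(z):
    """Hard threshold: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def neuron(w, inputs):
    """Single neuron with weights <bias, w_1, w_2> and two inputs."""
    return step(w[0] + w[1] * inputs[0] + w[2] * inputs[1])

# First layer: fires above y = 2, and below y = 0, respectively.
HIDDEN = [(-2, 0, 1), (0, 0, -1)]
# Output layer: logical OR of the two first-layer outputs.
OUTPUT = (-0.5, 1, 1)

def mlp(x, y):
    hidden_out = [neuron(w, (x, y)) for w in HIDDEN]
    return neuron(OUTPUT, hidden_out)

# Dataset from question 4: (x, y, class).
DATA = [(1, 1, 0), (2, 1, 0), (1, 3, 1), (2, -1, 1)]
for x, y, label in DATA:
    print((x, y), "predicted", mlp(x, y), "target", label)
```

Each of the four points should come out with a prediction equal to its target, which confirms that the two boundaries plus the OR separate the classes.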

5. Given the dataset: <2,1,0>,<1,2,0>,<-1,2,1>,<-1,-3,1>,<2,3,1>, construct an MLP that separates the classes.
Again, we begin by choosing two possible discriminating lines, for instance x=0 and y=2.5, with vectors w1=⟨0,1,0⟩ and w2=⟨−2.5,0,1⟩. The first vector returns 1 on the right-hand side of the corresponding vertical line, but the class-1 points that this line isolates lie on its left, so we need to invert it: w1=⟨0,−1,0⟩. These two vectors form the first layer. The second layer is also an OR, so it is the same as in the previous answer. The resulting MLP therefore has two first-layer neurons, with weight vectors ⟨0,−1,0⟩ and ⟨−2.5,0,1⟩, feeding the same OR output neuron as before.
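As a quick check of this answer, the sketch below evaluates the network on the five points and also reports which first-layer neuron fires. It assumes a hard threshold (step) activation, weight vectors ordered as ⟨bias, w_x, w_y⟩, and an OR output neuron with weights ⟨−1/2, 1, 1⟩; all names are illustrative.

```python
def fires(w, a, b):
    """Step-activated neuron with weights <bias, w_a, w_b>."""
    return 1 if w[0] + w[1] * a + w[2] * b > 0 else 0

W1 = (0, -1, 0)       # fires to the left of the line x = 0
W2 = (-2.5, 0, 1)     # fires above the line y = 2.5
W_OR = (-0.5, 1, 1)   # assumed OR weights for the output neuron

def classify(x, y):
    """Return (first-layer outputs, final class) for a point (x, y)."""
    h1, h2 = fires(W1, x, y), fires(W2, x, y)
    return (h1, h2), fires(W_OR, h1, h2)

# Dataset from question 5: (x, y, class).
POINTS = [(2, 1, 0), (1, 2, 0), (-1, 2, 1), (-1, -3, 1), (2, 3, 1)]
for x, y, label in POINTS:
    hidden, out = classify(x, y)
    print((x, y), "hidden", hidden, "predicted", out, "target", label)
```

The two points with negative x are picked up by the first neuron, the point (2, 3) by the second, and the two class-0 points by neither.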
