
The Perceptron
1. Consider the function $f(x) = x^3 + 2x^2$ and the current solution $x_t = \langle 2 \rangle$; compute one step of gradient descent with learning rate $\eta = 0.1$.

$\nabla f(x) = 3x^2 + 4x$, so $\nabla f(x_t) = 3(2)^2 + 4(2) = 20$ and $x_{t+1} = x_t - \eta \nabla f(x_t) = 2 - 0.1 \cdot 20 = 0$.

The new point is $x_{t+1} = 0$.


We can verify that it is an improvement over the previous point, because the value of the function is lower: $f(2) = 16$, while $f(0) = 0$. The new point is also a minimum, since $\nabla f(0) = 0$.

2. Consider the function $f(x,y,z) = x^2 + yz + yz^2$ and the current solution $x_t = \langle 1, 1, -1 \rangle$; compute one step of gradient descent with learning rate $\eta = 0.1$.

$\nabla f(x,y,z) = \langle 2x,\ z + z^2,\ y + 2yz \rangle$, so $\nabla f(x_t) = \langle 2, 0, -1 \rangle$ and $x_{t+1} = \langle 1, 1, -1 \rangle - 0.1 \cdot \langle 2, 0, -1 \rangle = \langle 0.8, 1, -0.9 \rangle$.
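To make the update concrete, here is a minimal sketch in plain Python (the names gd_step, grad1 and grad2 are illustrative, not from the exercise) that reproduces both steps:

```python
# A minimal sketch of one gradient-descent step, reproducing the two
# worked examples above (functions and gradients as derived there).

def gd_step(x, grad, eta=0.1):
    """One step of gradient descent: x - eta * grad(x), component-wise."""
    return [xi - eta * gi for xi, gi in zip(x, grad(x))]

# Question 1: f(x) = x^3 + 2x^2, so grad f(x) = 3x^2 + 4x.
grad1 = lambda v: [3 * v[0] ** 2 + 4 * v[0]]
print(gd_step([2.0], grad1))             # -> [0.0]

# Question 2: f(x, y, z) = x^2 + yz + yz^2,
# so grad f = <2x, z + z^2, y + 2yz>.
grad2 = lambda v: [2 * v[0], v[2] + v[2] ** 2, v[1] + 2 * v[1] * v[2]]
print(gd_step([1.0, 1.0, -1.0], grad2))  # -> [0.8, 1.0, -0.9]
```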
3. The threshold of the activation function of the perceptron is a parameter to be learned, just like the weights. How can we put the threshold in the same form as the weights, so that the same update rule can be used for it?

The output of the perceptron is:

$o(x) = \begin{cases} 1 & \text{if } w^T x > k \\ 0 & \text{if } w^T x \le k \end{cases}$

Both inequalities can be rewritten by moving the threshold $k$ to the other side and incorporating it into the weight vector through a constant input:

$w^T x > k \iff w^T x - k > 0 \iff \sum_i w_i x_i + k \cdot (-1) > 0$

where $-1$ is added to the inputs (forming the bias input) and $k$ to the weights.
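As a sanity check of this rewriting, a small sketch (hypothetical helper names, hand-picked example values) showing that the thresholded and augmented forms always agree:

```python
# Sanity check of the bias trick above: appending -1 to the input and k to
# the weights turns the comparison against k into a comparison against 0.

def fires(w, k, x):
    """Original form: does w^T x exceed the threshold k?"""
    return sum(wi * xi for wi, xi in zip(w, x)) > k

def fires_aug(w_aug, x_aug):
    """Augmented form: threshold folded into the weights, compared to 0."""
    return sum(wi * xi for wi, xi in zip(w_aug, x_aug)) > 0

w, k, x = [0.5, -0.2], 0.3, [1.0, 2.0]
assert fires(w, k, x) == fires_aug(w + [k], x + [-1.0])
```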
4. What is the equation of the decision boundary for a perceptron? What does it represent?
$w^T x + w_0 = 0$. It is the equation of a hyperplane, which separates the region where the perceptron outputs 1 from the region where it outputs 0.
5. Given the equation of the decision boundary of a perceptron in 2D (a straight line), what is the slope and what is the intercept?

In 2D the equation is $w_0 + w_1 x_1 + w_2 x_2 = 0$, from which $x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$, where $-\frac{w_1}{w_2}$ is the slope and $-\frac{w_0}{w_2}$ is the intercept.
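A tiny illustrative helper (assuming $w_2 \ne 0$) that reads the slope and intercept off a weight vector:

```python
# Hypothetical helper: slope and intercept of the 2D decision boundary
# w0 + w1*x1 + w2*x2 = 0 (assumes w2 != 0).

def slope_intercept(w0, w1, w2):
    return -w1 / w2, -w0 / w2

# E.g. the line x1 + x2 - 1 = 0 has slope -1 and intercept 1:
print(slope_intercept(-1.0, 1.0, 1.0))  # -> (-1.0, 1.0)
```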
6. What does it mean for two classes to be linearly separable in D dimensions?
That there exists a hyperplane in D dimensions that separates the points of the two classes.
7. Will the perceptron algorithm always converge to a point with zero error?
Only if the dataset is linearly separable. If it is not, the weights keep changing at every pass over the data and the algorithm never converges.
8. What error function do we use to derive the update rule for the perceptron? Why not the number of errors on a dataset? What is the advantage of the function we use over the number of errors?
We use the error function $E(X) = \sum_{x_n \in X} w^T x_n (y_n - t_n)$, where $X$ is the dataset, $y_n$ is the output of the perceptron on point $x_n$, and $t_n$ is the desired output for point $x_n$. We prefer this function to the number of errors because it is proportional to the distance of the misclassified points from the decision boundary and is differentiable. The number of errors, on the other hand, does not provide any direction for improvement, since it does not change smoothly as the hyperplane moves.
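A short sketch of this error function in plain Python (illustrative names; inputs are assumed to already include the constant bias component):

```python
# E(X) = sum_n w^T x_n (y_n - t_n). Correctly classified points contribute
# zero, since y_n = t_n there; misclassified points contribute >= 0.

def output(w, x):
    """Perceptron output: 1 if w^T x > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def perceptron_error(w, X, T):
    return sum(
        sum(wi * xi for wi, xi in zip(w, x)) * (output(w, x) - t)
        for x, t in zip(X, T)
    )
```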
9. Construct a perceptron able to separate the points: <1,1,0>, <2,3,1> where the last element is the class.
One possible solution is given by the perceptron implementing the boundary $y = 2$, which corresponds to the weight vector $\langle -2, 0, 1 \rangle$. Note that any line separating the classes is a valid solution!

[Figure: the resulting perceptron, with weights $\langle -2, 0, 1 \rangle$]

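A quick check of this solution, using the augmented-input convention $\langle 1, x_1, x_2 \rangle$ from the previous questions:

```python
# The weights <-2, 0, 1> with a leading constant-1 input implement
# -2 + 0*x1 + 1*x2 > 0, i.e. the boundary x2 = 2.

w = [-2.0, 0.0, 1.0]
for x, t in [([1.0, 1.0, 1.0], 0), ([1.0, 2.0, 3.0], 1)]:
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
    print(x[1:], "->", y, "(expected:", t, ")")   # both points match
```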
10. Add a new point to the dataset from the previous question, so that the perceptron you computed misclassifies it. Perform one step of gradient descent on the perceptron error to reduce the error on the new data point.
The point $\langle 2, 1 \rangle$ of class 1 is misclassified by the perceptron above. The error is:

$E(X) = \sum w^T x (y - t) = \langle -2, 0, 1 \rangle \cdot \langle 1, 2, 1 \rangle \, (0 - 1) = 1.$

Stochastic gradient descent performs the update $w_{k+1} = w_k - \eta x (y - t)$; in our case:

$w_{k+1} = \langle -2, 0, 1 \rangle - 0.1 \cdot \langle 1, 2, 1 \rangle \cdot (0 - 1) = \langle -1.9, 0.2, 1.1 \rangle.$

The error of the new weight vector is $\langle -1.9, 0.2, 1.1 \rangle \cdot \langle 1, 2, 1 \rangle \, (0 - 1) = 0.4$, which is lower than before.
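The same step in code (a sketch; variable names are illustrative):

```python
# One SGD step on the misclassified point <2, 1>, which becomes the
# augmented input <1, 2, 1> with target t = 1.

eta = 0.1
w = [-2.0, 0.0, 1.0]
x, t = [1.0, 2.0, 1.0], 1

y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0   # y = 0 here
w = [wi - eta * xi * (y - t) for wi, xi in zip(w, x)]
print(w)   # -> [-1.9, 0.2, 1.1], matching the computation above
```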
11. Construct a perceptron able to separate the points <1,1,0>, <1, -1, 1>, and <-1, 1, 1> where the last element is the class.
An equation that separates the points is $y = -x + 1$, which is equivalent to $x + y - 1 = 0$ and leads to the weight vector $\langle -1, 1, 1 \rangle$. The part of the vector that multiplies the variables (note how this is the normal vector of the line) is $\langle 1, 1 \rangle$ and points towards the point of class 0, while we want the perceptron to return 1 in the other half-plane. Therefore we need to multiply the vector by $-1$, obtaining the weight vector $w = \langle 1, -1, -1 \rangle$.

[Figure: the corresponding perceptron, with weights $\langle 1, -1, -1 \rangle$]

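And a quick check that these weights classify all three points correctly (same conventions as the earlier sketches):

```python
# Checking the question 11 weights w = <1, -1, -1> on all three points,
# using the same leading-1 augmented-input convention as before.

w = [1.0, -1.0, -1.0]
points = [([1.0, 1.0, 1.0], 0), ([1.0, 1.0, -1.0], 1), ([1.0, -1.0, 1.0], 1)]
for x, t in points:
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
    assert y == t   # all three points are classified correctly
```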