
Assignment 2
EECS 4404/5327
The assignment is due on Thursday, March 14, 4pm.
Place your submission in the dropbox in the Lassonde building, ground floor.
1. Gradient computation
Let X ∈ Rd×d be some matrix. Consider the function f : Rd → R defined by f(w) = wT Xw.
Show that the gradient of this function with respect to w is ∇f(w) = wT (X + XT ). (10 marks)
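Although not part of the required proof, the claimed identity can be sanity-checked numerically against central finite differences. A minimal sketch, using NumPy for illustration (the assignment itself uses Matlab/Octave) with random data standing in for X and w:

```python
import numpy as np

# Hypothetical sizes and random data, purely for a numerical sanity check.
rng = np.random.default_rng(0)
d = 5
X = rng.standard_normal((d, d))   # a general (not necessarily symmetric) matrix
w = rng.standard_normal(d)

f = lambda v: v @ X @ v           # f(w) = w^T X w

# Claimed gradient: w^T (X + X^T), i.e. the vector (X + X^T) w.
grad_analytic = (X + X.T) @ w

# Central finite differences, one coordinate at a time.
eps = 1e-6
grad_numeric = np.empty(d)
for i in range(d):
    e = np.zeros(d)
    e[i] = eps
    grad_numeric[i] = (f(w + e) - f(w - e)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # difference should be negligible
```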
2. Stochastic Gradient Descent
Recall the logistic loss function, pointwise defined as

ℓ_logist(y_w, (x, t)) = ln(1 + exp(−t ⟨w, x⟩)).

We know that the empirical logistic loss over a dataset,

L_D^logist(y_w) = (1/N) Σ_{n=1}^{N} ln(1 + exp(−t_n ⟨w, x_n⟩)),

is a convex function in w. We will now consider running stochastic gradient descent to minimize the logistic loss over a data set.

(a) Compute the gradient with respect to w of the logistic loss ℓ_logist(y_w, (x, t)) on a data point (x, t).

(b) Describe Stochastic Gradient Descent with a fixed stepsize η with respect to the logistic loss.

(c) Compare the derived algorithm to the SGD algorithm we derived for SVM. What is similar? What is different? On what type of data would you expect to see different behavior?

(5 + 8 + 7 marks)
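For reference, fixed-stepsize SGD on the logistic loss can be sketched as follows. The update uses the pointwise gradient −t x σ(−t ⟨w, x⟩) with σ(z) = 1/(1 + exp(−z)); the toy dataset is hypothetical and NumPy stands in for Matlab/Octave:

```python
import numpy as np

def sgd_logistic(X, t, eta=0.1, n_iters=1000, seed=0):
    """Plain SGD with fixed stepsize eta on the logistic loss.

    X: (N, d) feature matrix; t: (N,) labels in {-1, +1}.
    Pointwise gradient: -t_n * x_n * sigma(-t_n <w, x_n>).
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        n = rng.integers(N)                    # sample one data point uniformly
        margin = t[n] * (w @ X[n])
        sigma = 1.0 / (1.0 + np.exp(margin))   # equals sigma(-t_n <w, x_n>)
        w += eta * t[n] * sigma * X[n]         # step against the gradient
    return w

# Toy usage on hypothetical, roughly separable 2-D data.
rng = np.random.default_rng(1)
Xpos = rng.standard_normal((50, 2)) + 2.0
Xneg = rng.standard_normal((50, 2)) - 2.0
X = np.vstack([Xpos, Xneg])
t = np.concatenate([np.ones(50), -np.ones(50)])
w = sgd_logistic(X, t)
print(np.mean(np.sign(X @ w) == t))   # training accuracy
```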

3. SoftSVM optimization
In this question you will implement the SGD optimization method for SoftSVM to find a linear predictor with minimal empirical loss. The first dataset is stored in the file bg.txt. The file contains only the feature vectors, one per line. The first 100 lines correspond to positive instances (label +1) and the next 100 lines to negative instances (label −1). The data can be loaded into a matrix with the load function in Matlab/Octave.
(a) Implement SGD for SoftSVM. You may skip the averaging over weight vectors for the output and instead, simply output the last iterate. During the optimization, keep track of both the empirical and the hinge loss of the current weight vector. Include a printout of your code.
(2 marks)
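One possible shape for this implementation, sketched in NumPy for illustration (the course tooling is Matlab/Octave). It assumes the objective (λ/2)‖w‖² + mean hinge loss with the Pegasos-style stepsize η_i = 1/(λ·i); if the class uses a different scaling of λ or a fixed stepsize, adjust accordingly. The toy data here is synthetic; the real data would be loaded from bg.txt:

```python
import numpy as np

def softsvm_sgd(X, t, lam=0.1, n_iters=2000, seed=0):
    """SGD for SoftSVM, last-iterate output (no weight averaging).

    Minimizes (lam/2)*||w||^2 + (1/N) * sum_n max(0, 1 - t_n <w, x_n>),
    with stepsize eta_i = 1/(lam*i). Tracks the binary (0-1) loss and
    the hinge loss of the current iterate, as the assignment asks.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    binary_loss, hinge_loss = [], []
    for i in range(1, n_iters + 1):
        n = rng.integers(N)
        eta = 1.0 / (lam * i)
        # Subgradient step on the regularized pointwise objective.
        if t[n] * (w @ X[n]) < 1:
            w = (1.0 - eta * lam) * w + eta * t[n] * X[n]
        else:
            w = (1.0 - eta * lam) * w
        margins = t * (X @ w)
        binary_loss.append(float(np.mean(margins <= 0)))
        hinge_loss.append(float(np.mean(np.maximum(0.0, 1.0 - margins))))
    return w, binary_loss, hinge_loss

# Toy usage on hypothetical data (for bg.txt, use np.loadtxt("bg.txt") instead),
# with the constant-1 bias feature appended as in part (b).
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((100, 2)) + 2.0,
               rng.standard_normal((100, 2)) - 2.0])
X = np.hstack([X, np.ones((X.shape[0], 1))])
t = np.concatenate([np.ones(100), -np.ones(100)])
w, bin_l, hin_l = softsvm_sgd(X, t, lam=0.1)
print(bin_l[-1], hin_l[-1])
```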
(b) Run the optimization method with various values for the regularization parameter λ = {100,10,1,.1,.01,.001} on the data (remember to first add an additional feature with value 1 to each datapoint, so that you are actually training a general linear classifier). For each value plot the binary loss of the iterates and the hinge loss of the iterates (in separate plots). Include three plots of your choice where you observe distinct behavior (you may run the method several times for each parameter setting and choose).
(4 marks)
(c) Discuss the plots. Are the curves monotone? Are they approximately monotone? Why or why not? How does the choice of λ affect the optimization? How would you go about finding a linear predictor of minimal binary loss?
(4 marks)
(d) Download the “seeds” data set from the UCI repository: https://archive.ics.uci.edu/ml/datasets/seeds.
That data is also stored in a text file and can be loaded the same way. It contains 210 instances with three different labels (the last column in the file corresponds to the label).
(e) Train three binary linear predictors. w1 should separate the first class from the other two (i.e., the first 70 instances are labeled +1 and the next 140 instances −1). w2 should separate the second class from the other two, and w3 should separate the third class from the first two classes (i.e., for training w2, label the middle 70 instances positive and the rest negative, and analogously for w3). Report the binary loss that you achieve with w1, w2 and w3 for each of these tasks.
(5 marks)
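The three one-vs-rest label vectors for part (e) can be built directly from the block structure of the file. A sketch, assuming the 210 rows appear class by class in blocks of 70 as described above:

```python
import numpy as np

N = 210                        # seeds data: three classes, 70 instances each
tasks = []
for k in range(3):             # k = 0, 1, 2 correspond to w1, w2, w3
    t = -np.ones(N)            # instances of the other two classes get -1
    t[70 * k : 70 * (k + 1)] = 1.0   # class k+1 is the positive class
    tasks.append(t)
t1, t2, t3 = tasks
print(int(t1.sum()), int(t2.sum()), int(t3.sum()))  # 70 - 140 = -70 each
```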

(f) Turn the three linear separators into a multi-class predictor for the three different classes in the seeds dataset using the following rule:
y(x) = argmaxi∈{1,2,3} ⟨wi, x⟩
(See also UML Chapter 17, pages 190–191.) Compute and report the loss on the original labels using this rule.
Store the three weight vectors in a text file, w1 on the first line, w2 on the second and w3 on the third. Make sure that the last entry for each of these corresponds to the bias. Submit the file by email to ruth@eecs.yorku.ca. For this last part, you may also choose a different method to find the three linear separators that give the best multi-class prediction. Report what you did and what the multi-class error of your submitted vectors is.
(5 marks)
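The argmax rule in part (f) is a one-line computation once the three weight vectors are stacked into a matrix. A sketch in NumPy; the weights and inputs here are random placeholders standing in for the trained w1, w2, w3 and the (bias-augmented) seeds features:

```python
import numpy as np

def predict_multiclass(W, X):
    """y(x) = argmax_{i in {1,2,3}} <w_i, x>, returned as labels 1, 2, 3.

    W: (3, d) matrix whose rows are w1, w2, w3; X: (m, d) inputs.
    """
    return np.argmax(X @ W.T, axis=1) + 1

# Hypothetical placeholders: 7 seed features + 1 bias entry per vector.
rng = np.random.default_rng(2)
W = rng.standard_normal((3, 8))
X = rng.standard_normal((10, 8))
print(predict_multiclass(W, X))
```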
IF ANYTHING IS UNCLEAR, COME TO MY OFFICE HOURS OR WRITE ME EMAILS AND ASK QUESTIONS 🙂