STA314 Fall 2021 Homework 2
Homework 2 – V2 Oct. 3
Deadline: Thursday, Oct. 14, at 11:59pm.
Submission: You need to submit one or two files through Quercus with your answers to Questions 1, 2, and 3, as well as the code requested for Question 2. You can submit a single PDF titled hw2_writeup.pdf with screenshots of your code, or you may upload a PDF titled hw2_writeup.pdf together with a Python file titled q2.py. You can produce the PDF file however you like (e.g., LaTeX, Microsoft Word, scanner), as long as it is readable.
Neatness Point: One point will be given for neatness. You will receive this point as long as we don’t have a hard time reading your solutions or understanding the structure of your code.
Late Submission: 10% of the total possible marks will be deducted for each day late, up to a maximum of 3 days. After that, no submissions will be accepted.
Computing: To install Python and required libraries, see the instructions on the course web page. Homeworks are to be done alone or in pairs. See the Course Information handout¹ for detailed policies.
1. [2pts] Linear Models. Recall that a function f : R^D → R is linear if both of the following conditions hold.
• For all x, y ∈ R^D, f(x + y) = f(x) + f(y).
• For all x ∈ R^D and a ∈ R, f(ax) = af(x).
Consider the hypothesis class of linear regression. That is, any predictor
y(x) = w⊤x with w ∈ R^D. (0.1)
Prove that every linear regression predictor is a linear function. You may use the definition of w⊤x = ∑_j w_j x_j as well as any basic fact about arithmetic without proof.
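As an informal sanity check (not a substitute for the proof), both conditions can be verified numerically on random vectors. The snippet below is illustrative only; the dimension, seed, and scalar are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 5
    w = rng.standard_normal(D)

    def f(x):
        # The linear regression predictor without a bias: f(x) = w^T x.
        return w @ x

    x = rng.standard_normal(D)
    y = rng.standard_normal(D)
    a = 2.7

    # Both properties should hold up to floating-point error.
    print(np.isclose(f(x + y), f(x) + f(y)))  # additivity
    print(np.isclose(f(a * x), a * f(x)))     # homogeneity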
2. [8pts] Robust Regression. One problem with linear regression using squared error loss is that it can be sensitive to outliers. This can be a problem if the training set does not have the same distribution as the validation or testing sets. We will explore this scenario in this question.
We can use different loss functions to make training robust to outliers. Recall that an outlier is a data point that differs significantly from other observations. In our context, we will consider a few targets t(i) that are outliers in the sense that they are drawn from a conditional p(t|x) that is distinct from the one used in the validation set, potentially with much larger variance. To cope with this, we will use the Huber loss, parameterized by a hyperparameter δ > 0:
Lδ(y, t) = Hδ(y − t), where

Hδ(a) = ½a²            if |a| ≤ δ,
Hδ(a) = δ(|a| − ½δ)    if |a| > δ.
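For reference, Hδ translates directly into a vectorized numpy function. This is a minimal sketch under our own naming (huber is not a name from the starter code):

    import numpy as np

    def huber(a, delta):
        # H_delta applied elementwise: quadratic for |a| <= delta,
        # linear in the tails otherwise.
        return np.where(np.abs(a) <= delta,
                        0.5 * a ** 2,
                        delta * (np.abs(a) - 0.5 * delta))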
1 https://www.cs.toronto.edu/~cmaddis/courses/sta314_f21/sta314_f21_syllabus.pdf
(a) [1pt] Sketch the Huber loss Lδ(y, t) and squared error loss LSE(y, t) = ½(y − t)² for t = 0, either by hand or using a plotting library. Based on your sketch, why would you expect the Huber loss to be more robust to a label t(i) that is an outlier?
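If you use a plotting library, a minimal matplotlib sketch along these lines would do; δ = 1 is an arbitrary choice for illustration:

    import numpy as np
    import matplotlib.pyplot as plt

    delta = 1.0                      # arbitrary value for illustration
    y = np.linspace(-4.0, 4.0, 401)  # with t = 0, each loss depends on y alone

    squared = 0.5 * y ** 2
    huber_vals = np.where(np.abs(y) <= delta,
                          0.5 * y ** 2,
                          delta * (np.abs(y) - 0.5 * delta))

    plt.plot(y, squared, label="squared error")
    plt.plot(y, huber_vals, label="Huber, delta = 1")
    plt.xlabel("y - t")
    plt.ylabel("loss")
    plt.legend()
    plt.show()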
(b) [2pt] Just as with linear regression, assume a linear model without a bias: y(x) = x⊤w.
As usual, the cost is the average loss over the training set:

R̂ = (1/N) ∑_{i=1}^{N} Lδ(y(i), t(i)).
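For concreteness, this cost vectorizes in the same way as the loss itself. A minimal sketch, assuming a design matrix X of shape (N, D), a target vector t of length N, and our own function name:

    import numpy as np

    def average_huber_cost(X, w, t, delta):
        # R_hat = (1/N) * sum_i H_delta(y_i - t_i), computed without a for-loop.
        a = X @ w - t
        H = np.where(np.abs(a) <= delta,
                     0.5 * a ** 2,
                     delta * (np.abs(a) - 0.5 * delta))
        return np.mean(H)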
Derive a sequence of vectorized mathematical expressions for the gradient of the cost (averaged over the training set) with respect to w. Recall that the inputs are organized into a design matrix X with one row per training example, and recall there is no bias term. The expressions should be something you can translate into a Python program without requiring a for-loop. Your answer should look like:
y = ···
∂R̂/∂y = ···
∂R̂/∂w = ···
We recommend you find a formula for the derivative H′δ(a). Then give your answers in terms of H′δ(y − t), where we assume that H′δ is applied pointwise to the vector y − t. Remember that ∂R̂/∂w denotes the gradient vector

∂R̂/∂w = (∂R̂/∂w₁, …, ∂R̂/∂w_D)⊤.
(c) [2pt] We have provided Python starter code to perform gradient descent on this model, and you need to write two functions. Complete the function robust_regression_grad, which computes the gradient of the robust regression model for a weight vector w. You should be able to read the expected parameters and return values from the docstring. You will want to find the functions np.where, np.abs, np.dot, np.shape, and np.sign. You may submit your code as a screenshot or as the completed script q2.py.
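A minimal sketch of one way such a function could look, assuming it receives X, w, t, and δ; the actual parameters and return values are fixed by the starter code's docstring and may differ:

    import numpy as np

    def robust_regression_grad(X, w, t, delta):
        # Sketch only -- check the docstring for the real interface.
        y = np.dot(X, w)        # predictions, shape (N,)
        a = y - t               # residuals
        # H'_delta(a) is a inside [-delta, delta] and delta * sign(a) outside.
        dH = np.where(np.abs(a) <= delta, a, delta * np.sign(a))
        # Chain rule through y = Xw, averaged over the N training examples.
        return np.dot(X.T, dH) / np.shape(X)[0]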
(d) [2pt] Complete the function optimization. This function initializes a weight vector at 0 and runs gradient descent for num_iterations iterations. There is no need to modify num_iterations or the initialization of w. You should use your function robust_regression_grad in this function. You may submit your code as a screenshot or as the completed script q2.py.
(e) [1pt] We provided a script that tries 5 different δ values for the Huber loss during training and reports validation losses. For this experiment, we generated a dataset in which the training set has target values t(i) that are outliers, i.e., some small subset of training points are not i.i.d. with the validation set, and the noise that we add to these points is much larger. You can see how we generated the data in q2_data.py.
Run the script q2.py (or the equivalent notebook). The model in the script is trained with the Huber loss on the training set, which is robust to these outliers, but we report the standard squared error loss on the validation and training sets. In sum, we report:
i. the average squared error on the validation set of a linear regression model;
ii. for each δ, the average squared error on the validation set of a robust regression model trained with the Huber loss;
iii. for each δ, the average squared error on the training set of a robust regression model trained with the Huber loss.
If you implemented your functions correctly, you should see that the training squared error of the robust model goes down as δ increases and approaches the loss of the linear regression (not robust) model. On the other hand, you should see that there is an optimal δ value for the validation squared error. Why do you think this is? Answer this question briefly in a few sentences.
3. [4pts] Feature Maps. A 1-D binary classification training set {(x(i), t(i))}_{i=1}^{N} with x(i) ∈ R and t(i) ∈ {0, 1} is linearly separable if there exists a threshold a ∈ R such that x(i) < a for every i with t(i) = 0 and x(i) > a for every i with t(i) = 1.