
10-301/10-601 Fall 2020 Midterm 1 Practice Problems
Solutions
1 K-Nearest Neighbors
1. Select all that apply: Which of the following statements about k-NN are true?
□ k-NN works great with a small amount of data, but it is too slow when the amount of data becomes large.
□ k-NN is sensitive to outliers; therefore, in general, we decrease k to avoid overfitting.
□ k-NN can only be applied to classification problems, and it cannot be used to solve regression problems.
□ We can always achieve zero training error (perfect classification) on a consistent data set with k-NN, but it may not generalize well in testing.
Option 1: True: k-NN stores the entire training set and must compute a distance to every training point at prediction time, so it becomes slow as the data set grows.
Option 2: False: we increase k to avoid overfitting.
Option 3: False: k-NN can also be used for regression, e.g., by averaging the neighbors' y values.
Option 4: True: by setting k = 1 (a quick sketch verifying this follows).
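A minimal sketch (made-up data, scikit-learn assumed available) of Option 4: with k = 1, every training point is its own nearest neighbor, so a consistent data set is classified perfectly.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, consistent data set: no two identical inputs carry different labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # k = 1 always scores 1.0 here: each point is its own nearest neighbor.
    # Larger k typically scores below 1.0 near the class boundary.
    print(k, knn.score(X, y))
```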
2. (1 point) Select one: A k-Nearest Neighbor model with a large value of k is analogous to…
⃝ A short Decision Tree with a low branching factor
⃝ A short Decision Tree with a high branching factor
⃝ A long Decision Tree with a low branching factor
⃝ A long Decision Tree with a high branching factor
A short Decision Tree with a low branching factor
3. (1 point) Select one: Imagine you are using a k-Nearest Neighbor classifier on a data set with lots of noise. You want your classifier to be less sensitive to the noise. Which change is more likely to help, and with what side effect?
⃝ Increase the value of k → Increase in prediction time
⃝ Decrease the value of k → Increase in prediction time
⃝ Increase the value of k → Decrease in prediction time
⃝ Decrease the value of k → Decrease in prediction time

Increase the value of k → Increase in prediction time
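A rough timing sketch (synthetic data; absolute numbers are machine-dependent). For brute-force k-NN the dominant prediction cost is computing distances to all N training points; a larger k adds the smaller cost of selecting and aggregating more neighbors.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_test = rng.normal(size=(200, 10))

for n in (1_000, 50_000):
    X = rng.normal(size=(n, 10))
    y = rng.integers(0, 2, size=n)
    for k in (1, 100):
        knn = KNeighborsClassifier(n_neighbors=k, algorithm="brute").fit(X, y)
        start = time.perf_counter()
        knn.predict(X_test)
        print(f"N={n:6d} k={k:3d} predict time={time.perf_counter() - start:.4f}s")
```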

2 Model Selection and Errors
1. Training Sample Size: In this problem, we will consider the effect of training sample size N on a linear regression problem with M features.
The following plot shows the general trend for how the training and testing error change as we increase the training sample size N. Your task in this question is to analyze this plot and identify which curve corresponds to the training error and which to the test error. Specifically:
1. Which curve represents the training error? Please provide 1–2 sentences of justification.

Curve (ii) is the training error. Training error increases as the training set grows, since there are more points to fit, but the increase tapers off once the model generalizes well. Curve (i) is therefore the test error: larger training sets produce models that generalize better, which drives the test error down.
2. In one word, what does the gap between the two curves represent?

Overfitting
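A sketch (synthetic data with an assumed linear ground truth) reproducing the trend the plot describes: training error rises toward the noise floor as N grows, test error falls, and the gap between them shrinks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
M = 5
w_true = rng.normal(size=M)
X_test = rng.normal(size=(500, M))
y_test = X_test @ w_true + rng.normal(scale=0.5, size=500)

for N in (10, 50, 200, 1000):
    X = rng.normal(size=(N, M))
    y = X @ w_true + rng.normal(scale=0.5, size=N)
    model = LinearRegression().fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"N={N:5d}  train={train_mse:.3f}  test={test_mse:.3f}  gap={test_mse - train_mse:.3f}")
```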

3 Linear Regression
1. (1 point) Select one: The closed form solution for linear regression is θ̂ = (XᵀX)⁻¹Xᵀy. Suppose you have N = 35 training examples and M = 5 features (excluding the bias term). Once the bias term is included, what are the dimensions of X, y, and θ̂ in the closed form equation?
⃝ X is 35 × 6, y is 35 × 1, θ̂ is 6 × 1
⃝ X is 35 × 6, y is 35 × 6, θ̂ is 6 × 6
⃝ X is 35 × 5, y is 35 × 1, θ̂ is 5 × 1
⃝ X is 35 × 5, y is 35 × 5, θ̂ is 5 × 5
The first option: X is 35 × 6, y is 35 × 1, θ̂ is 6 × 1. Including the bias term adds a column of ones to X, so the feature dimension becomes M + 1 = 6.
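A quick NumPy check of these dimensions (the random data is only a placeholder):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 35, 5
# Prepend a column of ones for the bias term: X becomes 35 x 6.
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, M))])
y = rng.normal(size=(N, 1))

# Closed-form solution: theta_hat = (X^T X)^{-1} X^T y.
theta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(X.shape, y.shape, theta_hat.shape)  # (35, 6) (35, 1) (6, 1)
```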
2. Consider linear regression on N 1-dimensional points x(i) ∈ ℝ with labels y(i) ∈ ℝ. We apply linear regression in both directions on this data, i.e., we first fit y with x and get y = β1x as the fitted line, then we fit x with y and get x = β2y as the fitted line. Discuss the relation between β1 and β2:
True or False: The two fitted lines are always the same, i.e., we always have β2 = 1/β1.
⃝ True
⃝ False
False. β1 = (xᵀy)/(xᵀx) and β2 = (yᵀx)/(yᵀy). By the Cauchy–Schwarz inequality, β1β2 = (xᵀy)²/((xᵀx)(yᵀy)) ≤ 1, with equality only when the points lie exactly on a line through the origin, so in general β2 ≠ 1/β1.
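A numerical sketch (random data) of the two no-intercept fits; the product β1β2 stays below 1 unless the points are perfectly collinear:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)  # noisy linear relation

beta1 = (x @ y) / (x @ x)  # fit y = beta1 * x
beta2 = (y @ x) / (y @ y)  # fit x = beta2 * y
# beta1 * beta2 < 1 for noisy data, so beta2 != 1 / beta1.
print(beta1, beta2, beta1 * beta2)
```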
3. Please circle True or False for the following questions, providing brief explanations to support your answer.
(i) [3 pts] Consider a linear regression model with only one parameter, the bias, i.e., y = b. Then given N data points (x(i), y(i)) (where x(i) is the feature and y(i) is the output), minimizing the sum of squared errors results in b being the median of the y(i) values.
Circle one: True False Brief explanation:
False. The training cost is ∑_{i=1}^N (y(i) − b)², which when differentiated and set to zero gives b = (1/N) ∑_{i=1}^N y(i), the mean of the y(i) values.
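Writing the differentiation step out in full (standard calculus; the notation is the only thing added here):

```latex
\[
J(b) = \sum_{i=1}^{N} \bigl(y^{(i)} - b\bigr)^2, \qquad
\frac{dJ}{db} = -2 \sum_{i=1}^{N} \bigl(y^{(i)} - b\bigr) = 0
\;\Longrightarrow\;
b = \frac{1}{N} \sum_{i=1}^{N} y^{(i)}.
\]
```

Minimizing the sum of absolute errors, by contrast, would give the median.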
(ii) [3 pts] Given data D = {(x(1), y(1)), …, (x(N), y(N))}, we obtain θ̂, the parameters that minimize the training error cost for the linear regression model y = θᵀx we learn from D.

Consider a new dataset Dnew generated by duplicating the points in D and adding 10 points that lie along y = θ̂ᵀx. Then the θ̂new that we learn for y = θᵀx from Dnew is equal to θ̂.
Circle one: True False Brief explanation:
True. The new squared error can be written as 2ε1 + ε2, where ε1 is the old squared error. ε2 = 0 for the 10 points that lie along the line, which is the lowest possible value for ε2, and 2ε1 is least when ε1 is least, which is when the parameters do not change.
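A numerical check of this claim on synthetic data (the data itself is made up; the invariance of θ̂ is the point):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 40, 3
X = rng.normal(size=(N, M))
y = X @ rng.normal(size=M) + rng.normal(scale=0.3, size=N)

# theta_hat minimizes the squared error on D.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# D_new: duplicate D, then add 10 points lying exactly on y = theta_hat^T x.
X_extra = rng.normal(size=(10, M))
X_new = np.vstack([X, X, X_extra])
y_new = np.concatenate([y, y, X_extra @ theta_hat])

theta_new, *_ = np.linalg.lstsq(X_new, y_new, rcond=None)
print(np.allclose(theta_hat, theta_new))  # True: the parameters do not change
```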
4. We have an input x and we want to estimate an output y using linear regression.
Consider the dataset S plotted in Fig. 1 along with its associated regression line. For each of the altered data sets Snew plotted in Fig. 2, indicate which regression line (relative to the original one) in Fig. 3 corresponds to the regression line for the new data set. Write your answers in the table below.
Dataset          (a)  (b)  (c)  (d)  (e)
Regression line  (b)  (c)  (b)  (a)  (a)
Figure 1: An observed data set and its associated regression line.

(a) Adding one outlier to the original data set.
(b) Adding two outliers to the original data set.
(c) Adding three outliers to the original data set: two on one side and one on the other.
(d) Duplicating the original data set.
(e) Duplicating the original data set and adding four points that lie on the trajectory of the original regression line.
Figure 2: The altered data sets Snew.

Figure 3: New regression lines for altered data sets Snew.
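As a closing check, a quick sketch (synthetic 1-D data) of two of the cases from Figure 2: duplicating the data set, case (d), leaves the fitted line exactly unchanged, while a single far-away outlier, case (a), pulls the line toward it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 1.5 * x + 2 + rng.normal(scale=1.0, size=30)

def fit_line(x, y):
    # Least-squares fit of y = slope * x + intercept.
    return np.polyfit(x, y, deg=1)

print(fit_line(x, y))                                    # original line
print(fit_line(np.tile(x, 2), np.tile(y, 2)))            # case (d): identical coefficients
print(fit_line(np.append(x, 10.0), np.append(y, 60.0)))  # case (a): slope pulled up by the outlier
```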