Winter 2022 Midterm APS1070
University of Toronto
Faculty of Applied Science and Engineering
Midterm – Limited 2.5-Hour Window
(Feb 15 at 9:00am to Feb 16 at 3:00pm)
APS1070 – Foundations of Data Analytics and Machine Learning
Examiner: and Sinisa Colic
Please read the instructions and honour agreement below carefully.
Family Name(s): Given Name(s): Student Number:
Instructions to Candidate:
This exam paper has 8 pages (including this one)
Duration: 2.5 hours timed (1.5 hours to answer questions)
Maximum mark is 14% of final grade
Show important steps leading to final answers
Late submissions will receive a grade of 0
Read over and sign the honour agreement
In submitting this assessment, I confirm that my conduct during this take-home exam adheres to the Code of Behaviour on Academic Matters. I confirm that I have not acted in such a way that would constitute cheating, misrepresentation, or unfairness, including but not limited to, using unauthorized aids and assistance, impersonating another person, and committing plagiarism. I pledge upon my honour that I have not violated the Faculty of Applied Science & Engineering’s during this assessment.
Question 1. [5 MARKS]
Part (a) [1 mark]
What is the computational complexity of the following sample code?
ii. O(sqrt(n))
iii. O(log n)
iv. O(n log n)
Part (b) [1 mark]
We would like to use 3-Nearest Neighbours to classify point p using the data to the right. What is our prediction if we use cosine distance? If we use Euclidean distance?
i. Cosine distance: O, Euclidean distance: O
ii. Cosine distance: X, Euclidean distance: O
iii. Cosine distance: O, Euclidean distance: X
iv. Cosine distance: X, Euclidean distance: X
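The two metrics can disagree because cosine distance compares only direction, while Euclidean distance also depends on magnitude. The sketch below illustrates this with made-up points (the exam's figure is not reproduced here, so `points`, `labels`, and `p` are purely hypothetical), and `knn_predict` is an illustrative helper, not part of the exam.

```python
import numpy as np
from collections import Counter

def knn_predict(points, labels, p, k, metric):
    """Majority vote among the k nearest points under the chosen metric."""
    if metric == "euclidean":
        dist = np.linalg.norm(points - p, axis=1)
    elif metric == "cosine":
        # Cosine distance = 1 - cosine similarity (direction only).
        sims = points @ p / (np.linalg.norm(points, axis=1) * np.linalg.norm(p))
        dist = 1.0 - sims
    else:
        raise ValueError(f"unknown metric: {metric}")
    nearest = np.argsort(dist)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: the O points sit close to p, the X points share p's direction.
points = np.array([[1.0, 0.0], [0.0, 1.0], [1.5, 0.2],
                   [5.0, 5.0], [6.0, 6.0], [7.0, 7.0]])
labels = ["O", "O", "O", "X", "X", "X"]
p = np.array([1.0, 1.0])

pred_euclidean = knn_predict(points, labels, p, k=3, metric="euclidean")
pred_cosine = knn_predict(points, labels, p, k=3, metric="cosine")
print(pred_euclidean, pred_cosine)
```

With these hypothetical points the Euclidean neighbours are the three nearby O points, whereas the cosine neighbours are the three X points lying along the same direction as p.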
Part (c) [1 mark]
Which of the following statements regarding Decision Trees is false?
i. Performance is affected by rotations.
ii. Nonparametric instance-based learning algorithm.
iii. It has a strong tendency to overfit the training data.
iv. Standardization can help improve performance.
Part (d) [1 mark]
Which of the following about a high bias model (in the context of bias-variance tradeoff) is true, compared to a high variance model?
i. A high bias model is more prone to overfitting.
ii. A high bias model requires less training data to train.
iii. A high bias model will have a lower training accuracy.
iv. Both (ii) and (iii) are true.
Part (e) [1 mark]
Which of the following matrices performs a mirroring (reflection) and scaling transformation?
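i. $A = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}$
ii. $A = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$
iii. $A = \begin{bmatrix} -2 & 2 \\ 2 & 2 \end{bmatrix}$
iv. none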
while a > 1:
Question 2. [5 MARKS]
Select either “True” or “False” for each of the below statements.
Decision trees are more appropriate for anomaly detection than KNNs.
In general, a mixture of Gaussians model will perform better if you restrict the covariance matrix to be diagonal.
Worst case running-time for retrieval in hash tables is constant time.
Answer the questions pertaining to the three vectors: 𝑥1 = [−1 1 −1]𝑇, 𝑥2 = [1 0 −1]𝑇, 𝑥3 = [1 0 4]𝑇
From the three vectors above, 𝑥1 has the smallest L1 norm.
None of the pairs of above vectors are orthogonal.
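The last two statements can be checked numerically. The sketch below assumes the three vectors read as the columns x1 = (−1, 1, −1), x2 = (1, 0, −1), x3 = (1, 0, 4), which is an interpretation of the layout above; the variable names are illustrative.

```python
import numpy as np
from itertools import combinations

# Assumed reading of the three column vectors from the question.
vectors = {
    "x1": np.array([-1, 1, -1]),
    "x2": np.array([1, 0, -1]),
    "x3": np.array([1, 0, 4]),
}

# L1 norm: sum of absolute values of the components.
l1_norms = {name: int(np.sum(np.abs(v))) for name, v in vectors.items()}

# Two vectors are orthogonal exactly when their dot product is zero.
dot_products = {
    (a, b): int(vectors[a] @ vectors[b]) for a, b in combinations(vectors, 2)
}

print(l1_norms)
print(dot_products)
```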
Question 3. [8 MARKS]
Provided below is sample code for k-means clustering. You may assume all the necessary libraries are included and that there are no syntax errors.
def kmeans(x, k, n_iter):
    ind = np.random.randint(0, len(x)-1, k)
    centroids = x[ind, :]
    distances = compute_distances(x, centroids)
    labels = np.array([np.argmin(i) for i in distances])
    for _ in range(n_iter):
        centroids = []
        for ind in range(k):
            cent = x[labels==ind].mean(axis=0)
            centroids.append(cent)
        centroids = np.vstack(centroids)
        distances = compute_distances(x, centroids)
        labels = np.array([np.argmin(i) for i in distances])
    return (labels, centroids)
Part (a) [3 marks]
Fill in the compute_distances function to obtain the Euclidean distance between all points and k clusters.
def compute_distances(x, centroids):
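For illustration, one possible way to complete the stub is sketched below. It assumes `x` is a NumPy array of shape (n, d) and `centroids` has shape (k, d), and it returns an (n, k) array so that `np.argmin` over each row picks the nearest centroid, as `kmeans` expects. Other formulations (for example an explicit double loop) are equally valid.

```python
import numpy as np

def compute_distances(x, centroids):
    """Euclidean distance from every point to every centroid.

    x:         array of shape (n, d)
    centroids: array of shape (k, d)
    returns:   array of shape (n, k)
    """
    # Broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d) array of differences.
    diffs = x[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Sum squared differences over the feature axis, then take the root.
    return np.sqrt(np.sum(diffs ** 2, axis=2))

# Small usage example with hypothetical values.
x_demo = np.array([[0.0, 0.0], [3.0, 4.0]])
centroids_demo = np.array([[0.0, 0.0], [3.0, 0.0]])
distances_demo = compute_distances(x_demo, centroids_demo)
print(distances_demo)
```

This version forms n·k·d differences, so a single call costs O(nkd) time.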
Part (b) [3 marks]
Determine the asymptotic complexity of kmeans in terms of big O notation for a dataset with n samples of d features and k centroids over a fixed number of iterations. Show your work.
Part (c) [2 marks]
What two key things would you add and/or change in the kmeans function above to ensure that we converge to a global minimum? You may provide your answer(s) in/next to the code above. Assume the number of centroids, k, is fixed.
Question 4. [6 MARKS]
You have a binary classifier that uses a series of features to predict the probability of a transaction being fraudulent. The model prediction and correct label are provided below for 30 test samples.
Part (a) [1 mark]
At a standard threshold of 0.5, what is the accuracy of this classification model?
Part (b) [1 mark]
Indicate the threshold in the diagram above for which you would achieve the best accuracy on the classification task. Then compute the score.
Part (c) [1 mark]
Indicate the threshold in the diagram above for which you would achieve the best precision on the classification task. Then compute the score.
Part (d) [1 mark]
Indicate the threshold in the diagram above for which you would achieve the best recall on the classification task. Then compute the score.
Part (e) [2 marks]
Indicate the threshold in the diagram above for which you would achieve the best F1-score. Then compute the score.
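The metrics named in parts (a) to (e) can be computed at any threshold as sketched below. The exam's 30 predictions are not reproduced here, so `probs` and `labels` are hypothetical. The helper `metrics_at_threshold` is illustrative, and it assumes a sample is predicted positive when its probability is greater than or equal to the threshold.

```python
import numpy as np

def metrics_at_threshold(probs, labels, threshold):
    """Accuracy, precision, recall and F1 for a given decision threshold."""
    preds = (probs >= threshold).astype(int)  # assumption: >= counts as positive
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    tn = int(np.sum((preds == 0) & (labels == 0)))
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical model outputs and true labels (1 = fraud).
probs = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])
labels = np.array([0, 0, 1, 0, 1, 1])

result = metrics_at_threshold(probs, labels, threshold=0.5)
print(result)
```

Sweeping `threshold` over the midpoints between consecutive sorted probabilities and keeping the best value of each metric is one way to locate the thresholds the question asks for.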
Question 5. [5 MARKS]
You have the following dataset:
X1   1   0   1   0   1   1   0   0   1   1
X2  -1   2   2  -1  -1  -1   2   2   2  -1
Part (a) [3 marks]
Calculate the covariance matrix of X. Show your work.
Part (b) [1 mark]
Calculate the correlation coefficient.
Part (c) [1 mark]
Determine whether the variables X1 and X2 are independent.
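A hand calculation for this question can be cross-checked with NumPy, as in the sketch below. It reads the dataset as ten paired observations. Note that `np.cov` divides by n − 1 by default; whether the expected answer uses n − 1 or n is an assumption, so both are computed. The independence check compares one joint relative frequency with the product of its marginals, which is only one of the cells that would need to match.

```python
import numpy as np

x1 = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1, 1], dtype=float)
x2 = np.array([-1, 2, 2, -1, -1, -1, 2, 2, 2, -1], dtype=float)

# 2x2 covariance matrices: sample (n - 1) and population (n) normalisation.
cov_sample = np.cov(x1, x2)
cov_population = np.cov(x1, x2, ddof=0)

# Pearson correlation coefficient (independent of the normalisation choice).
corr = np.corrcoef(x1, x2)[0, 1]

# Empirical check of independence: P(X1=1, X2=2) versus P(X1=1) * P(X2=2).
p_x1 = np.mean(x1 == 1)
p_x2 = np.mean(x2 == 2)
p_joint = np.mean((x1 == 1) & (x2 == 2))

print(cov_sample)
print(cov_population)
print(corr)
print(p_joint, p_x1 * p_x2)
```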
Question 6. [6 MARKS]
Given that V ∊ R2 with basis vectors 𝑣1 = [1 1]𝑇 and 𝑣2 = [2 1]𝑇. Answer the following questions
given their coordinates in two subspaces U and W are:
• [𝑣1]𝑊 = [7 1]𝑇, [𝑣2]𝑊 = [13 3]𝑇
• [𝑣1]𝑈 = [−1 25]𝑇, [𝑣2]𝑈 = [−6 22]𝑇
Part (a) [4 marks]
Compute the transformation matrix 𝐴𝑈→𝑊.
Part (b) [2 marks]
If [𝑣3]𝑊 = [1 −4]𝑇, calculate [𝑣3]𝑈.
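A numerical cross-check for this question is sketched below. It assumes that A_{U→W} denotes the matrix mapping U-coordinates to W-coordinates, i.e. A [v]_U = [v]_W for every v, so that stacking the given coordinates as columns gives A · U_coords = W_coords. The variable names are illustrative.

```python
import numpy as np

# Columns are the coordinates of v1 and v2 in each coordinate system.
W_coords = np.array([[7.0, 13.0],
                     [1.0, 3.0]])
U_coords = np.array([[-1.0, -6.0],
                     [25.0, 22.0]])

# A @ U_coords = W_coords  =>  A = W_coords @ inv(U_coords).
A_u_to_w = W_coords @ np.linalg.inv(U_coords)

# Part (b): recover [v3]_U from [v3]_W by solving A @ [v3]_U = [v3]_W.
v3_w = np.array([1.0, -4.0])
v3_u = np.linalg.solve(A_u_to_w, v3_w)

print(A_u_to_w)
print(v3_u)
```

Under this convention the check A · [v1]_U = [v1]_W and A · [v2]_U = [v2]_W confirms the matrix. If the course defines the transformation in the opposite direction, the inverse of this matrix applies instead.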