APS1070 – Foundations of Data Analytics and Machine Learning
Midterm Examination Fall 2019

Open book
Non-programmable & non-communicating calculators are allowed
Time allotted: 90 minutes

1. We discussed k-Nearest Neighbour (k-NN) classification in class, a simple and
intuitive way of classifying data.

a) Here are data points plotted in 2D space:

[Figure: 2D scatterplot of labelled training points from three classes (circles, triangles and x's), with the query points marked "+".]

What is the predicted class of a new data point at x = 5, y = 5, using a k-NN classifier and Euclidean distance with k = 3? ("circle", "triangle" or "x") [2]

b) In the dataset above, what is the predicted class of a new data point at x = 11, y = 7, using Manhattan distance, for k = 5? ("circle", "triangle" or "x") [2]

c) In general, if k is increased, which of the following statements is correct? [2]

i. The k-NN decision boundary is smoothed and the noise sensitivity is increased.

ii. The k-NN decision boundary is jagged and the noise sensitivity is increased.

iii. The k-NN decision boundary is smoothed and the noise sensitivity is decreased.

iv. The k-NN decision boundary is jagged and the noise sensitivity is decreased.

Solution (a): Using Euclidean distance, the three nearest neighbours of (5, 5) lie at d = 1.5, d = 2 and d = 3; they comprise 2 x's, 1 circle and 0 triangles, so the predicted class is x.

Solution (b): Using Manhattan distance from (11, 7), there are 2 circles and 1 triangle at d < 4, and four further points tied at d = 4 (3 triangles and 1 circle). With k = 5 the predicted class is either circle or triangle depending on how ties are handled; if all 7 points at d <= 4 are considered, triangles outnumber circles 4 to 3 and the predicted class is triangle.
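These answers can be sanity-checked mechanically. Below is a minimal Python sketch of a k-NN classifier supporting both metrics; because the figure's coordinates are not reproduced in this text, the training points used here are hypothetical placeholders rather than the exam's actual data.

```python
# Minimal k-NN sketch for parts (a) and (b).
# NOTE: X and y below are HYPOTHETICAL stand-ins for the exam figure's points.
import numpy as np
from collections import Counter

def knn_predict(X, y, query, k, metric="euclidean"):
    """Return the majority class among the k nearest training points."""
    diff = X - query
    if metric == "euclidean":
        dists = np.sqrt((diff ** 2).sum(axis=1))   # L2 distance
    else:
        dists = np.abs(diff).sum(axis=1)           # Manhattan (L1) distance
    # argsort breaks distance ties by index order, so points tied at the
    # k-th distance are kept or dropped arbitrarily -- the issue in part (b).
    nearest = np.argsort(dists)[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical labelled points.
X = np.array([[4.0, 4.0], [5.5, 6.0], [7.0, 5.0], [2.0, 8.0], [9.0, 9.0]])
y = np.array(["x", "x", "circle", "triangle", "triangle"])

print(knn_predict(X, y, np.array([5.0, 5.0]), k=3))                       # part (a)
print(knn_predict(X, y, np.array([11.0, 7.0]), k=5, metric="manhattan"))  # part (b)
```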

d) In general, if you build a k-NN classifier that achieves high accuracy on training data, but gets poor accuracy on test data, which of the following statements is most likely correct? [2]

i. The model is overfitting.

ii. The model is underfitting.

iii. The model is neither overfitting nor underfitting.

iv. The model is both overfitting and underfitting.

2. Here are four scatterplots, each expressing the relation between two variables:

[Figure: four scatterplots labelled A, B, C and D.]

Rank the datasets A, B, C and D in terms of correlation coefficient, from lowest to highest. [2]

Solution: A shows negative correlation, B approximately zero correlation, C positive correlation and D large negative correlation; from lowest to highest: D, A, B, C.

3. Here are two vectors x1 and x2:

x1 = [2, 1]^T,  x2 = [1, -2]^T

a) Are x1 and x2 orthogonal? [2]

b) Calculate the norm of x1 and the norm of x2. [2]

c) Do x1 and x2 form an orthonormal basis for the vector space R^2? Why? [2]

Solution (a): x1^T x2 = 2*1 + 1*(-2) = 0, so yes, they are orthogonal.

Solution (b): Assuming the Euclidean (L2) norm, ||x1|| = sqrt(x1^T x1) = sqrt(5) and ||x2|| = sqrt(x2^T x2) = sqrt(5).

Solution (c): No. They are orthogonal, but an orthonormal basis also requires each vector to have norm 1, and both have norm sqrt(5).

4. Calculate the inverse of matrix A by Gaussian elimination. [2]

A = [  1  1  0 ]
    [ -1  0  0 ]
    [  0  1  1 ]

Solution: Augment A with the identity and row-reduce [A | I] to [I | A^-1]:

[  1  1  0 | 1 0 0 ]
[ -1  0  0 | 0 1 0 ]
[  0  1  1 | 0 0 1 ]

R2 -> R2 + R1 gives [0 1 0 | 1 1 0]; then R1 -> R1 - R2 and R3 -> R3 - R2 yield

[ 1 0 0 |  0 -1  0 ]
[ 0 1 0 |  1  1  0 ]
[ 0 0 1 | -1 -1  1 ]

so

A^-1 = [  0 -1  0 ]
       [  1  1  0 ]
       [ -1 -1  1 ]

5. You build a classification model for cancer detection using an imbalanced training dataset and achieve an accuracy of 97% when testing on new data. Explain how this performance can be deceiving, and what performance metric(s) might be more appropriate. [2]

Solution:
1. The model could simply be outputting the majority class (i.e. no cancer), in which case it would get it right most of the time while detecting no cancer at all.
2. A confusion matrix, the F1-score or an ROC curve would be a better performance metric than raw accuracy.
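For question 2, the Pearson correlation coefficient can be computed with NumPy's np.corrcoef. The four datasets below are synthetic stand-ins (the real ones exist only in the figure), constructed so their correlations fall in the same order as the solution:

```python
# Synthetic stand-ins for datasets A-D, ordered by Pearson correlation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
datasets = {
    "A": -0.5 * x + rng.normal(scale=1.0, size=200),  # moderate negative
    "B": rng.normal(size=200),                        # roughly zero
    "C": 0.8 * x + rng.normal(scale=0.5, size=200),   # positive
    "D": -2.0 * x + rng.normal(scale=0.2, size=200),  # strongly negative
}
# Sorting by r prints the datasets from lowest to highest correlation.
for name, ycol in sorted(datasets.items(), key=lambda kv: np.corrcoef(x, kv[1])[0, 1]):
    print(name, round(float(np.corrcoef(x, ycol)[0, 1]), 2))  # D, A, B, C
```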
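The dot products and norms in question 3 translate directly to NumPy, which makes the orthogonal-but-not-orthonormal distinction easy to check:

```python
# Orthogonality and norm checks for question 3.
import numpy as np

x1 = np.array([2.0, 1.0])
x2 = np.array([1.0, -2.0])

print(x1 @ x2)             # 0.0 -> the vectors are orthogonal
print(np.linalg.norm(x1))  # sqrt(5) ~ 2.236
print(np.linalg.norm(x2))  # sqrt(5) ~ 2.236
# An orthonormal basis needs unit vectors; dividing by the norms gives one.
u1, u2 = x1 / np.linalg.norm(x1), x2 / np.linalg.norm(x2)
print(np.linalg.norm(u1), np.linalg.norm(u2))  # 1.0 1.0
```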
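A short Gauss-Jordan routine mirrors the hand elimination in question 4. This is a generic sketch (with partial pivoting added for numerical safety), not anything specific to the exam's worked steps:

```python
# Matrix inverse via Gauss-Jordan elimination on [A | I].
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
n = len(A)
aug = np.hstack([A, np.eye(n)])                      # augmented matrix [A | I]

for col in range(n):
    pivot = col + np.argmax(np.abs(aug[col:, col]))  # partial pivoting
    aug[[col, pivot]] = aug[[pivot, col]]            # swap pivot row up
    aug[col] /= aug[col, col]                        # scale pivot to 1
    for row in range(n):
        if row != col:
            aug[row] -= aug[row, col] * aug[col]     # clear the column

A_inv = aug[:, n:]                                   # right half is A^-1
print(A_inv)      # [[ 0. -1.  0.], [ 1.  1.  0.], [-1. -1.  1.]]
print(A @ A_inv)  # identity matrix, as a check
```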
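Question 5's pitfall is easy to reproduce. In the toy example below, the 97/3 class split and the always-negative classifier are assumptions chosen to mirror the question; scikit-learn's standard metrics expose the problem:

```python
# A majority-class "detector" on a 97%-negative dataset.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = np.array([0] * 97 + [1] * 3)  # 97 healthy, 3 cancer cases
y_pred = np.zeros(100, dtype=int)      # model always predicts "no cancer"

print(accuracy_score(y_true, y_pred))             # 0.97 -- looks strong
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0  -- reveals the failure
print(confusion_matrix(y_true, y_pred))           # every positive is missed
```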