Instructions:
0. 1.
• Open book exam, feel free to use any lecture notes;
• Discussion is not permitted;
• Always give your answer and explain it (guaranteed 5 point for nonempty answer); • 20 points per problem, totally 110 points (20 * 5 + 10).
Write down your name. (10 pts)
Consider the following data points (represented by circles) in 2-dimensional space.
• Illustrate the group structure discovered by sparse subspace clustering in the left panel;
• Illustrate the clustering result of single iteration of k-means with k = 3 and initial centers
(−10, 0), (0, −10), (10, 10) in the right panel.
CS 541-B Artificial Intelligence: Final Exam
Instructor: Jie Shen 12/14/2020, 6:30 pm – 9:00 pm EST
10
5
0
-5
-10
-10 -5 0 5 10
10
5
0
-5
-10
-10 -5 0 5 10
1
2. An outlier is a data point that behaves abnormally, that its magnitude is out of control. For example, consider the age of a patient in his transcript: it is an outlier if it shows age = 2000, or if age = 25.5. Even if we observe a “normal” value of 85 is may also be an outlier provided that his actual age is 8.
Outlier may incur for a variety of reasons, e.g. human mistakes. In this case, we have to take such dirty data into algorithmic design. Consider the simple regression model in high-dimensional regime:
yi =⟨ai,wtrue⟩+ei, ∥wtrue∥0 ≤k, i=1,…,n. (0.1) In the above expression, ai ∈ Rd is the feature vector, wtrue is the groundtruth model we aim to
estimate, ei ∈ R is the possible outlier for the ith sample and yi ∈ R is the corrupted response.
• Given {ai,yi}ni=1 for sufficiently large n, say n → ∞, is it possible to learn the true model
without any prior knowledge/condition on {ei}ni=1?
• Now suppose that only n1 samples are corrupted by outliers where n1 ≪ n. In other words, among {ei}ni=1, n − n1 of them are zeros (but we do not know which of them have zero values). Give a proper formulation under which it is possible to simultaneously recover wtrue and detect the outliers.
Suppose we have a set of data from n patients as in Table 1, where for each column and each row there are some missing entries (represented by the symbol “?”). State when and how we can estimate all the missing values.
Table 1: Patient data.
3.
Age Weight Height Gender Blood Pressure · · · Sharp Pain
Patient 1 Patient 2 Patient 3 .
Patient n
?z12z13z14 ?···z1m
? z22 ? z24 z25 ··· ? z31 ? z33 ? ? ··· z3m . . . . . . .
? zn2 zn3 ? zn5 ··· ?
4. In real-world applications the positive and negative classes are often imbalanced. For example, there are a massive amount of data from airplanes that operate normally, while the number of aviation accidents is relatively few. Therefore, in order for a machine learning system to detect abnormal status, the algorithm must appreciate those negative examples (i.e. the data from flight recorder). Given a set of training samples {(xi, yi)}ni=1 ⊂ Rd × {+1, −1}, state how you will design a learning algorithm for such abnormality detection system.
5. Consider two functions F1(w) and F2(w): both of them are strongly convex, but F1 is smooth and F2 is non-smooth. Suppose we apply GD to optimize these two functions. The following figure shows two convergence curves: a solid line and a dashed line. One is for F1(wt) and another for F2(wt).
• Explain which curve may correspond to F2; 2
• Plot another possible convergence curve of applying GD to optimize F2 with a different initial iterate or learning rate.
1 0.8 0.6 0.4 0.2 0
5 10 15 20 t
3
F(wt)