ECE 421S — Introduction to Machine Learning
Final Examination

First Name:                Last Name:
Student Number:

6 a.m. April 22nd – 6 a.m. April 23rd, 2020
Instructors: Ashish Khisti and Ben Liang
Instructions
The exam starts at 6:00 a.m. on April 22nd, EDT (UTC-04:00). You must submit your completed exam before 6:00 a.m. on April 23rd, EDT (UTC-04:00).
The completed exam must be submitted as a PDF file. You have the following options:
1. Write directly on your screen in the provided space and submit the resulting file as a PDF;
2. Print out this file, write in the provided space with a pen or pencil, scan all pages into a PDF file, and submit that PDF file;
3. Write your answers on blank sheets of paper (clearly labeling all parts), scan all pages into a PDF file, and submit that PDF file.
Please submit the same PDF file twice: (1) on Quercus and (2) by email to ece421uoft@gmail.com.
This examination is open-book. You may use any material at your disposal, including material from the Internet, but you must work alone. You are not allowed to receive any assistance from others in answering these exam questions, nor are you allowed to discuss these exam questions with anyone until after 6:00 a.m. on April 23rd.
Before you start this exam, you must take the following pledge by handwriting it in the space provided below (or on the first page of your submitted PDF file):
I, [Please enter full first name, last name, and any initials], University of Toronto student number [Please enter full student number], pledge to honour myself and my community by assuring that the work I do on this assessment fully represents my own knowledge and ideas. I will feel proud of my work here when I am done because I know that it was my own and only mine.
After you have completed this exam, in the space below (or on the last page of your submitted PDF file), you must choose one of the following two declarations applicable to you and provide your signature.
– I confirm that the work I’ve done here is my own and no one else’s, in line with the principles of scholarship and the University of Toronto’s Code of Behaviour on Academic Matters.
– I regret that I violated the Code of Behaviour on Academic Matters on this assessment and would like to admit that now so that I can take responsibility for my mistake.
SIGNATURE: DATE: LOCATION:
1. (15 MARKS) Consider binary linear classifiers of the form $\hat{y} = \mathrm{sign}(w_1 x_1 + w_2 x_2 + b)$, where $\mathbf{w} = (w_1, w_2)$ is the weight vector, $b$ is the bias, the data vectors are $\mathbf{x} \in \mathbb{R}^2$, and the labels are $y \in \{-1, +1\}$. Our training dataset consists of the following labeled samples:

$\mathbf{x}_1 = (1, 0)$, $\mathbf{x}_2 = (2, 0)$, $\mathbf{x}_3 = (0, 1)$, with labels $y_1 = y_2 = y_3 = +1$;
$\mathbf{x}_4 = (-1, 0)$, $\mathbf{x}_5 = (-2, 0)$, $\mathbf{x}_6 = (0, -1)$, with labels $y_4 = y_5 = y_6 = -1$.

5 marks

(a) Using the given data samples, write down a quadratic-programming problem to find the maximum-margin classifier. You should express your optimization as minimizing an objective function $f(\mathbf{w}, b)$ subject to constraints of the form $g_n(\mathbf{w}, b) \ge 1$, for $n = 1, 2, \ldots, 6$, and specify the functions $f(\cdot)$ and $g_n(\cdot)$.
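A minimal numerical sketch for checking a candidate answer, assuming scikit-learn is available: a linear SVC with a very large penalty C closely approximates the hard-margin maximum-margin classifier on these six samples.

```python
import numpy as np
from sklearn.svm import SVC

# The six labeled training samples from the problem statement.
X = np.array([[1, 0], [2, 0], [0, 1], [-1, 0], [-2, 0], [0, -1]])
y = np.array([+1, +1, +1, -1, -1, -1])

# A very large C makes the soft-margin SVM behave like the hard-margin one.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("w =", clf.coef_[0])              # learned weight vector (w1, w2)
print("b =", clf.intercept_[0])         # learned bias
print("support vectors:\n", clf.support_vectors_)
```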
5 marks
(b) Find the maximum-margin classifier and specify the support vectors. It may help to first draw a figure to observe the position of the data samples on the 2D plane.
5 marks
(c) Write the Lagrange function using $\alpha_n$, $n = 1, 2, \ldots, 6$, as the Lagrange multipliers. Which Lagrange multipliers must be zero? Explain why.
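As a reminder, in the standard hard-margin derivation (this is the general textbook form, not the answer for this particular dataset) the Lagrangian is

$$\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^\top \mathbf{w} - \sum_{n=1}^{6} \alpha_n \left( y_n(\mathbf{w}^\top \mathbf{x}_n + b) - 1 \right), \qquad \alpha_n \ge 0,$$

and complementary slackness, $\alpha_n \left( y_n(\mathbf{w}^\top \mathbf{x}_n + b) - 1 \right) = 0$ at the optimum, ties the multipliers to whether a sample lies on the margin.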
5 marks
2. (15 MARKS) Consider a quadratic learning model over data $x \in \mathbb{R}$, with hypotheses of the form $h(x) = a_2 x^2 + a_1 x + a_0$, where $a_i \in \{-1, 0, 1\}$ for all $i$. The model is trained using $N$ independent and identically distributed data samples.
(a) Given any arbitrarily chosen hypothesis $h$, suppose we wish to bound the generalization error such that $|E_{\text{out}}(h) - E_{\text{in}}(h)| \le 0.1$ with at least 95% probability. Use the Hoeffding bound (as given in the textbook) to compute the required number of training samples $N$.
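A quick arithmetic sketch, assuming the textbook's single-hypothesis Hoeffding inequality $\mathbb{P}(|E_{\text{out}}(h) - E_{\text{in}}(h)| > \epsilon) \le 2e^{-2\epsilon^2 N}$:

```python
import math

eps, delta = 0.1, 0.05  # tolerance and allowed failure probability (1 - 0.95)
# Solve 2 * exp(-2 * eps**2 * N) <= delta for the smallest integer N.
N = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
print(N)  # 185
```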
5 marks
(b) Let $g$ be the chosen hypothesis of a learning algorithm using a training dataset with $N = 400$. Use the union bound to find a lower bound on the probability that $|E_{\text{out}}(g) - E_{\text{in}}(g)| \le 0.1$.
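A sketch of the corresponding computation, assuming the union bound is taken over the $|\mathcal{H}| = 3^3 = 27$ possible coefficient choices $(a_2, a_1, a_0)$:

```python
import math

M, N, eps = 3 ** 3, 400, 0.1                 # |H| = 27 hypotheses, N = 400
p_bad = 2 * M * math.exp(-2 * eps ** 2 * N)  # union bound on the bad event
print(1 - p_bad)                             # lower bound on P(|Eout - Ein| <= 0.1)
```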
5 marks
(c) Suppose the training data samples were not independent. Which part(s) of your calculation in Parts (a) and/or (b) would need to be changed? (You do not need to explain how to change any part.)
5 marks
3. (20 MARKS) Consider binary classification with data vectors $\mathbf{x} = (x_1, x_2) \in \mathbb{R}^2$ and labels $y \in \{-1, +1\}$. Let $\mathcal{H}_v$ be the hypothesis set that contains all vertical lines, such that each hypothesis is of the form $\hat{y} = \mathrm{sgn}(x_1 - a)$, where $a \in \mathbb{R}$. Let $\mathcal{H}_h$ be the hypothesis set that contains all horizontal lines, such that each hypothesis is of the form $\hat{y} = \mathrm{sgn}(-x_2 + b)$, where $b \in \mathbb{R}$. (Hint: note the subtle change of signs before $x_1$ and $x_2$.)
(a) Find the growth functions $m_{\mathcal{H}_v}(N)$ and $m_{\mathcal{H}_h}(N)$ for the hypothesis sets $\mathcal{H}_v$ and $\mathcal{H}_h$, respectively.
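A brute-force sketch for counting the dichotomies of $\mathcal{H}_v$ on a given set of points (a hypothetical checking helper, assuming distinct $x_1$ coordinates; the growth function is the maximum of this count over all placements of $N$ points):

```python
import numpy as np

def dichotomies_vertical(x1):
    """Count sign patterns realizable by sign(x1 - a) on the given points."""
    pts = np.sort(np.asarray(x1, dtype=float))
    # One threshold below all points, one between each pair of neighbours,
    # and one above all points covers every realizable dichotomy.
    cands = np.concatenate(([pts[0] - 1], (pts[:-1] + pts[1:]) / 2, [pts[-1] + 1]))
    return len({tuple(np.sign(np.asarray(x1) - a).astype(int)) for a in cands})

print(dichotomies_vertical([0.3, 1.1, 2.5, 4.0]))  # prints N + 1 = 5
```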
5 marks
(b) Find the VC dimensions $d_{\text{VC}}(\mathcal{H}_v)$ and $d_{\text{VC}}(\mathcal{H}_h)$.
10 marks

(c) Suppose we now merge the two hypothesis sets into one, such that $\mathcal{H} = \mathcal{H}_v \cup \mathcal{H}_h$. Find $m_{\mathcal{H}}(N)$ and $d_{\text{VC}}(\mathcal{H})$.
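The same brute-force idea extends to the merged set $\mathcal{H} = \mathcal{H}_v \cup \mathcal{H}_h$; a small hypothetical checking script (not part of the exam's expected derivation):

```python
import numpy as np

def dichotomies_union(points):
    """Sign patterns realizable on `points` by sign(x1 - a) or sign(-x2 + b)."""
    pts = np.asarray(points, dtype=float)
    out = set()
    for axis, s in ((0, 1), (1, -1)):  # s=+1: sign(x1 - a); s=-1: sign(-x2 + b)
        v = np.sort(pts[:, axis])
        cands = np.concatenate(([v[0] - 1], (v[:-1] + v[1:]) / 2, [v[-1] + 1]))
        for t in cands:
            out.add(tuple(np.sign(s * (pts[:, axis] - t)).astype(int)))
    return out

# Two points in general position: all four dichotomies are realized,
# so the merged set shatters them.
print(len(dichotomies_union([(0.0, 0.0), (1.0, 1.0)])))  # 4
```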
4. (15 MARKS) We wish to learn an unknown target function $f(x)$, which in fact is $\cos(\pi x)$, within the domain $-1 \le x \le 1$. Suppose the training dataset has only two samples, represented by $\mathcal{D} = \{(x_1, y_1), (x_2, y_2)\}$, where $x_1$ and $x_2$ are each independently drawn from a uniform distribution over the interval $[-1, 1]$, and $y_1$ and $y_2$ are the true labels corresponding to $x_1$ and $x_2$, respectively.

The hypothesis set consists of all constant functions of the form $h(x) = b$. For performance measurement, we consider the squared error. Therefore, our learning algorithm picks the hypothesis that minimizes the following in-sample error:

$$E_{\text{in}}(b) = \frac{1}{2}\left((y_1 - b)^2 + (y_2 - b)^2\right).$$

Let $g_{\mathcal{D}}(x)$ be the chosen hypothesis given $\mathcal{D}$.

5 marks

(a) For any given data sample $x$, find $g_{\mathcal{D}}(x)$ and the average hypothesis, i.e., $\bar{g}(x) = \mathbb{E}_{\mathcal{D}}[g_{\mathcal{D}}(x)]$.
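A Monte Carlo sketch for sanity-checking the quantities in this question, assuming the squared-error minimizer over constants is the mean of the two labels (the step one would derive in part (a)):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.cos(np.pi * x)

# Many datasets D = {x1, x2}; the least-squares constant is g_D = (y1 + y2) / 2.
x_pairs = rng.uniform(-1, 1, size=(200_000, 2))
g = f(x_pairs).mean(axis=1)

g_bar = g.mean()                          # average hypothesis g_bar(x)
x_test = rng.uniform(-1, 1, size=200_000)
bias = np.mean((g_bar - f(x_test)) ** 2)  # expected bias component
var = g.var()                             # variance component (constant in x)
print(g_bar, bias, var, bias + var)       # expected test error = bias + var
```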
5 marks
(b) For any given data sample $x$, compute the bias and variance components of the average squared error $\Delta(x) = \mathbb{E}_{\mathcal{D}}\left[(g_{\mathcal{D}}(x) - f(x))^2\right]$.
5 marks
(c) Compute the expected test error $\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}(g_{\mathcal{D}}(x))\right]$.
5. (15 MARKS) Suppose we use leave-one-out cross-validation with SVM for binary classification. Our dataset contains $N$ independent and identically distributed data samples, denoted by

$$\mathcal{D} = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}.$$

Let $e_n = \mathbb{I}(g_n^-(\mathbf{x}_n) \ne y_n)$ be the cross-validation error due to $(\mathbf{x}_n, y_n)$, where $\mathbb{I}(\cdot)$ is the indicator function and $g_n^-(\cdot)$ is the chosen hypothesis based on a training dataset with $\mathbf{x}_n$ removed, i.e., the set $\mathcal{D} \setminus \{\mathbf{x}_n\}$. Then the overall cross-validation error is

$$E_{\text{CV}} = \frac{1}{N} \sum_{n=1}^{N} e_n.$$
10 marks

(a) Explain why $E_{\text{CV}} \le \frac{K}{N}$, where $K$ is the number of support vectors.
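An empirical illustration of the bound, assuming scikit-learn; the synthetic dataset below is made up for the demonstration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # a linearly separable labeling

full = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard-margin SVM
K = len(full.support_)                        # number of support vectors

errors = 0
for n in range(len(X)):                       # leave-one-out loop
    mask = np.arange(len(X)) != n
    g_minus = SVC(kernel="linear", C=1e6).fit(X[mask], y[mask])
    errors += int(g_minus.predict(X[n:n + 1])[0] != y[n])

print(errors / len(X), "<=", K / len(X))      # E_CV <= K / N on this draw
```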
5 marks
(b) Explain why we have the following bound for the expected test error:

$$\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}(g_{\mathcal{D}}(x))\right] \le \frac{\bar{K}}{N+1},$$

where $\bar{K}$ is the expected number of support vectors in a dataset with $N+1$ independent samples, each generated from the same distribution as a sample in $\mathcal{D}$.
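One way to sketch the argument, following the standard unbiasedness property of leave-one-out cross-validation (with $\mathcal{D}_{N+1}$ denoting a dataset of $N+1$ i.i.d. samples): $E_{\text{CV}}$ computed on $\mathcal{D}_{N+1}$ is an unbiased estimate of the test error of a hypothesis trained on $N$ samples, so

$$\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}(g_{\mathcal{D}})\right] = \mathbb{E}_{\mathcal{D}_{N+1}}\left[E_{\text{CV}}\right] \le \mathbb{E}_{\mathcal{D}_{N+1}}\left[\frac{K}{N+1}\right] = \frac{\bar{K}}{N+1}.$$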