ECE 421S — Introduction to Machine Learning
Final Examination
April 29th, 2021, 7:00 p.m. – 9:30 p.m.
Instructors: Ashish Khisti and Ben Liang

First Name: _______________  Last Name: _______________  Student Number: _______________
Instructions
This examination is open-book. You may use any material at your disposal, but you must work alone. You are not allowed to receive any assistance from others in answering these exam questions, nor are you allowed to discuss these exam questions with anyone until after 3 a.m. on April 30.
The completed exam must be submitted as one single PDF file. On the submitted PDF file, please remember to write (1) your name and (2) whether you are in the morning or evening section.
The PDF file may be generated directly on a computer screen, or written on paper and then scanned, but your answers must be handwritten by you, unless you have a pre-approved accessibility accommodation.
The exam time is from 7:00 p.m. to 9:30 p.m., but you will have until 10:00 p.m. to submit your PDF file. No submission past 10:00 p.m. will be accepted.
The first question, Question 1, is a pledge of academic integrity. Please follow the instructions there and make your pledge in writing. We will not accept any exam submission without this pledge.
Clearly label your answers and show intermediate steps for partial credit. Answers without justification will not be accepted.
The value of each question is indicated. Allocate your time wisely!
In case you have any doubt about any question, please make an assumption and write it. The teaching staff will not be available to answer questions during the exam.
Good luck!
1. You must take the following pledge by HANDWRITING it and submitting it along with your signature, date, and city:
I, [Please enter full first name, last name, and any initials], University of Toronto student number [Please enter full student number], pledge to honour myself and my community by assuring that the work I do on this assessment fully represents my own knowledge and ideas, in line with the principles of scholarship and the University of Toronto’s Code of Behaviour on Academic Matters. I will feel proud of my work here when I am done because I know that it was my own and only mine.
SIGNATURE:
DATE:
CITY:
2. (20 marks) Consider a binary classification problem with $\mathcal{D} = \{(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)\}$, where the data points are two-dimensional and the labels are either $+1$ or $-1$:
$$x_1 = (-1,-1),\ y_1 = +1 \qquad x_2 = (-1,+1),\ y_2 = -1 \qquad x_3 = (1,1),\ y_3 = +1 \qquad x_4 = (1,-1),\ y_4 = -1$$
We wish to consider second-order classifiers, i.e., given a data vector $x = (x_1, x_2)$, our classification rule is of the form
$$\hat{y} = \mathrm{sign}\!\left(b + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2\right),$$
where $w = (w_1, \ldots, w_5)$ is the weight vector for the classifier and $b$ denotes the bias term.
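For concreteness, the rule above can be evaluated numerically through the quadratic feature map $\phi(x) = (x_1, x_2, x_1 x_2, x_1^2, x_2^2)$. The sketch below assumes NumPy, and the weight setting in it is purely illustrative, not a claim about the max-margin solution:

```python
import numpy as np

# Training set from the problem statement.
X = np.array([[-1, -1], [-1, 1], [1, 1], [1, -1]], dtype=float)
y = np.array([1, -1, 1, -1])

def phi(x):
    """Quadratic feature map matching the classifier's form."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# Illustrative weights only (w3 = 1, all else 0), i.e. the rule sign(x1 * x2).
w, b = np.array([0, 0, 1, 0, 0]), 0.0
print(np.sign(b + np.array([phi(x) @ w for x in X])))   # compare against y
```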
(a) [5 marks] Is the dataset $\mathcal{D}$ separable with respect to the second-order classifiers stated above? Justify your answer briefly.
(b) [5 marks] We wish to find a second-order classification rule with maximum margin for $\mathcal{D}$. Using the dual formulation for support vector machines (SVM) as discussed in class, write down a quadratic programming problem for computing this rule. Assume that $\alpha = [\alpha_1, \ldots, \alpha_s]$ denotes the vector of dual variables (with each $\alpha_i \geq 0$) for some appropriately chosen $s$. Express your problem as minimizing a cost function $L(\alpha)$ subject to a constraint $g(\alpha) = 0$, and specify the functions $L(\cdot)$ and $g(\cdot)$. Do not solve the optimization problem yet.
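A numeric sketch of this dual setup is shown below, with $Q_{ij} = y_i y_j\, \phi(x_i)^{\top} \phi(x_j)$ for the quadratic feature map $\phi$. It uses scipy.optimize.minimize as a stand-in for a dedicated QP solver; this is an illustration of the formulation, not the required solution method:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[-1, -1], [-1, 1], [1, 1], [1, -1]], dtype=float)
y = np.array([1.0, -1.0, 1.0, -1.0])

def phi(x):
    x1, x2 = x
    return np.array([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

Phi = np.array([phi(x) for x in X])
Q = (y[:, None] * y[None, :]) * (Phi @ Phi.T)    # Q_ij = y_i y_j phi(x_i).phi(x_j)

L = lambda a: 0.5 * a @ Q @ a - a.sum()          # dual objective (to minimize)
g = {'type': 'eq', 'fun': lambda a: a @ y}       # constraint: sum_i alpha_i y_i = 0

res = minimize(L, x0=np.ones(4), bounds=[(0, None)] * 4, constraints=[g])
alpha = res.x
w = (alpha * y) @ Phi                            # recover primal weights
sv = int(np.argmax(alpha))                       # index of a support vector
b = y[sv] - w @ Phi[sv]
print(alpha, w, b)
```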
(c) [10 marks] Solve the optimization problem in part (b) to find the classification rule with maximum margin. Explicitly state the classification rule by providing the associated values of $w$ and $b$. Sketch your classification rule in the two-dimensional $x_1$–$x_2$ plane and specify the decision regions. Also identify the support vectors and the margin associated with the classifier.
3. (15 marks) Consider a hypothesis class $\mathcal{H}$ that maps each vector $x = (x_1, x_2) \in \mathbb{R}^2$ to either $+1$ or $-1$, defined as
$$\mathcal{H} = \left\{h_{v,w_1,w_2} : v = [v_0, v_1, v_2] \in \mathbb{R}^3,\ w_1 = [w_{1,0}, w_{1,1}, w_{1,2}] \in \mathbb{R}^3,\ w_2 = [w_{2,0}, w_{2,1}, w_{2,2}] \in \mathbb{R}^3\right\},$$
where we define the function
$$h_{v,w_1,w_2}(x) = \mathrm{sign}\!\left(v_0 + v_1 \cdot g_{w_1}(x) + v_2 \cdot g_{w_2}(x)\right)$$
and, furthermore,
$$g_{w_1}(x) = \mathrm{sign}\!\left(w_{1,0} + w_{1,1} x_1 + w_{1,2} x_2\right), \qquad g_{w_2}(x) = \mathrm{sign}\!\left(w_{2,0} + w_{2,1} x_1 + w_{2,2} x_2\right)$$
are linear decision functions.

(a) [5 marks] For $v = [1.5, 1, 1]$, $w_1 = [0, 1, 1]$, and $w_2 = [0, 1, -1]$, please sketch the output generated by $h_{v,w_1,w_2}(x)$ in the two-dimensional plane.
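As an aid to sketching part (a), one can evaluate $h$ on a coarse grid. This is a quick sketch assuming NumPy; the grid values are arbitrary sample points:

```python
import numpy as np

v  = np.array([1.5, 1, 1])
w1 = np.array([0, 1, 1])
w2 = np.array([0, 1, -1])

def h(x1, x2):
    g1 = np.sign(w1[0] + w1[1] * x1 + w1[2] * x2)
    g2 = np.sign(w2[0] + w2[1] * x1 + w2[2] * x2)
    return np.sign(v[0] + v[1] * g1 + v[2] * g2)

# Evaluate on a coarse grid to see the decision regions.
for x2 in [1.0, 0.0, -1.0]:
    print([int(h(x1, x2)) for x1 in [-1.0, 0.0, 1.0]])
```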
(b) [10 marks] Show that the growth function for the hypothesis class $\mathcal{H}$ satisfies $m_{\mathcal{H}}(N) \leq (N^3 + 1)^3$.
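The bound can also be probed empirically: random sampling of parameters gives a lower bound on the number of dichotomies $\mathcal{H}$ realizes on $N$ points. This is an illustrative sketch only (the sampling scheme and sample counts are assumptions), not part of the required proof:

```python
import numpy as np

rng = np.random.default_rng(0)

def h(params, X):
    v, w1, w2 = params[:3], params[3:6], params[6:9]
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend x0 = 1
    g1 = np.sign(X1 @ w1)
    g2 = np.sign(X1 @ w2)
    return np.sign(v[0] + v[1] * g1 + v[2] * g2)

N = 5
X = rng.standard_normal((N, 2))
dichotomies = {tuple(h(rng.standard_normal(9), X)) for _ in range(20000)}
print(len(dichotomies), "<=", (N ** 3 + 1) ** 3)  # random search: a lower bound
```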
4. (15 marks) Consider the two-layer neural network shown below, where $x$ is the input, $x_i^{(1)}$ for $i = 1, 2$ denote the outputs of the first hidden layer, and $x_1^{(2)}$ is the output of the neural network.

[Figure: two-layer neural network with input $x$, hidden units $x_1^{(1)}$ and $x_2^{(1)}$, and output $x_1^{(2)}$]

Please note that $x_0 = 1$. Throughout, assume that $\theta(s) = \max(0, s)$ is the ReLU activation function. The variables in the network are computed as follows:
$$x_1^{(1)} = \theta\!\left(w_{0,1}^{(1)} \cdot x_0 + a \cdot x\right)$$
$$x_2^{(1)} = \theta\!\left(w_{0,2}^{(1)} \cdot x_0 + a \cdot x\right)$$
$$x_1^{(2)} = a \cdot x + b \cdot x_1^{(1)} + b \cdot x_2^{(1)}$$

(a) [5 marks] Suppose that $w_{0,1}^{(1)} = 1$, $w_{0,2}^{(1)} = -1$, and $a = b = 1$. Find a relationship between the output $x_1^{(2)}$ and the input $x$.
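The three equations above translate directly into code. The sketch below (NumPy assumed) encodes the forward pass and tries part (a)'s parameter setting on a few illustrative inputs:

```python
import numpy as np

def forward(x, w01, w02, a, b):
    relu = lambda s: np.maximum(0.0, s)
    x0 = 1.0                             # bias input x0 = 1
    x1_1 = relu(w01 * x0 + a * x)        # first hidden unit x_1^(1)
    x1_2 = relu(w02 * x0 + a * x)        # second hidden unit x_2^(1)
    return a * x + b * x1_1 + b * x1_2   # output x_1^(2), with skip term a*x

# Part (a)'s setting: w01 = 1, w02 = -1, a = b = 1.
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(x, forward(x, 1.0, -1.0, 1.0, 1.0))
```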
(b) [10 marks] Given an input $(x, y)$, suppose that the error is defined as $e = (x_1^{(2)} - y)^2$. Compute the following derivatives: $\frac{\partial e}{\partial b}$, $\frac{\partial e}{\partial x_1^{(1)}}$, $\frac{\partial e}{\partial x_2^{(1)}}$, $\frac{\partial e}{\partial a}$, $\frac{\partial e}{\partial w_{0,1}^{(1)}}$, $\frac{\partial e}{\partial w_{0,2}^{(1)}}$, $\frac{\partial e}{\partial x}$.
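A sketch of the chain-rule computation is shown below, with the ReLU derivative taken as the indicator $\mathbb{1}(s > 0)$ (the kink at $s = 0$ is ignored, a common convention). The dictionary keys are informal names for the seven derivatives listed above:

```python
def forward(x, w01, w02, a, b):
    s1, s2 = w01 + a * x, w02 + a * x
    x1_1, x1_2 = max(0.0, s1), max(0.0, s2)
    return a * x + b * x1_1 + b * x1_2, (s1, s2, x1_1, x1_2)

def grads(x, y, w01, w02, a, b):
    out, (s1, s2, x1_1, x1_2) = forward(x, w01, w02, a, b)
    de_dout = 2.0 * (out - y)              # e = (out - y)^2
    r1, r2 = float(s1 > 0), float(s2 > 0)  # ReLU derivatives
    return {
        'de/db':    de_dout * (x1_1 + x1_2),
        'de/dx1_1': de_dout * b,
        'de/dx1_2': de_dout * b,
        'de/da':    de_dout * (x + b * r1 * x + b * r2 * x),
        'de/dw01':  de_dout * b * r1,
        'de/dw02':  de_dout * b * r2,
        'de/dx':    de_dout * (a + b * r1 * a + b * r2 * a),
    }

print(grads(x=0.5, y=1.0, w01=1.0, w02=-1.0, a=1.0, b=1.0))
```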
5. (15 marks) This question demonstrates how we can use unsupervised learning to perform binary classification. Suppose our training dataset consists of the following labeled samples:
$$x_1 = (0, 1.5),\ y_1 = -1 \qquad x_2 = (0, 1.25),\ y_2 = -1 \qquad x_3 = (0, 1),\ y_3 = +1$$
$$x_4 = (1, 1),\ y_4 = +1 \qquad x_5 = (0, 0),\ y_5 = +1 \qquad x_6 = (1, 0),\ y_6 = +1$$
Given a predicted label $\hat{y}_i$ for any data point $x_i$, the loss function for classification is the usual indicator function, i.e., $e_i = \mathbb{1}(y_i \neq \hat{y}_i)$.

(a) [5 marks] We first use the k-means algorithm, on the data features only (i.e., without considering the labels), to separate the data points into two groups, i.e., $k = 2$. Suppose the initial clusters are $\{x_1, x_2\}$ and $\{x_3, x_4, x_5, x_6\}$. Write the steps of the k-means algorithm until it converges. In each step, you must clearly indicate both the clusters and the cluster centers.
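For reference, Lloyd's iterations with the stated initialization can be traced in a few lines. This is a sketch assuming NumPy, with point indices 0 through 5 standing for $x_1, \ldots, x_6$:

```python
import numpy as np

X = np.array([[0, 1.5], [0, 1.25], [0, 1], [1, 1], [0, 0], [1, 0]])
clusters = [{0, 1}, {2, 3, 4, 5}]        # initial clusters {x1, x2} and {x3, ..., x6}

while True:
    centers = [X[sorted(c)].mean(axis=0) for c in clusters]
    # Reassign each point to its nearest cluster center.
    new = [set(), set()]
    for i, x in enumerate(X):
        new[int(np.argmin([np.linalg.norm(x - m) for m in centers]))].add(i)
    print("centers:", centers, "clusters:", new)
    if new == clusters:                  # no point changed cluster: converged
        break
    clusters = new
```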
(b) [5 marks] After k-means clustering, the predicted label $\hat{y}$ for each data point within a cluster is decided by applying the majority rule over all of the true labels of the data points in the same cluster. What is the training error $E_{\text{in}}$ for classification for the clustering outcome from part (a)?
(c) [5 marks] Suppose we have a validation dataset consisting of the following labeled samples:
$$x_7 = (0.5, 1.25),\ y_7 = -1 \qquad x_8 = (0.5, 1),\ y_8 = +1$$
What is the validation error for classification?
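Both the majority rule of part (b) and the nearest-center labeling of validation points can be checked mechanically. The sketch below assumes the clustering produced by the k-means sketch above:

```python
import numpy as np

y = np.array([-1, -1, 1, 1, 1, 1])
clusters = [{0, 1, 2}, {3, 4, 5}]                # clustering from the sketch above
centers = [np.array([0.0, 1.25]), np.array([2/3, 1/3])]

# Majority label of each cluster (a tie would need a tie-breaking rule).
labels = [int(np.sign(sum(y[i] for i in c))) for c in clusters]

# Training error: fraction of points whose cluster label disagrees with y_i.
e_in = sum(labels[k] != y[i] for k, c in enumerate(clusters) for i in c) / len(y)

# Validation points are labeled by their nearest cluster center.
Xv = np.array([[0.5, 1.25], [0.5, 1.0]])
yv = np.array([-1, 1])
pred = np.array([labels[int(np.argmin([np.linalg.norm(x - m) for m in centers]))]
                 for x in Xv])
print(labels, e_in, np.mean(pred != yv))
```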
6. (15 marks) We wish to learn an unknown target function $f(x)$, which in fact is $f(x) = x$. All data points are drawn independently from a uniform distribution on the domain $-1 \leq x \leq 1$. Suppose the training dataset has only two samples, represented by $\mathcal{D} = \{(x_1, y_1), (x_2, y_2)\}$, where $y_1$ and $y_2$ are noisy labels defined as follows:
$$y_i = f(x_i) + z_i, \quad \text{for } i = 1, 2,$$
where the $z_i$ are independent noise terms modeled as Gaussian random variables with mean zero and variance $\sigma^2$.

The hypothesis set consists of all linear functions of the form $h(x) = ax + b$. For performance measurement, we consider the squared error. Therefore, given any $\mathcal{D}$, our learning algorithm picks the hypothesis that minimizes the following in-sample error:
$$E_{\text{in}}(a, b) = \frac{1}{2}\left((y_1 - a x_1 - b)^2 + (y_2 - a x_2 - b)^2\right).$$
Let $g_{\mathcal{D}}(x) = a_{\mathcal{D}} x + b_{\mathcal{D}}$ represent the chosen hypothesis.

(a) [5 marks] Given any $\mathcal{D}$, express the chosen hypothesis $g_{\mathcal{D}}(x)$ in terms of $x_1$, $y_1$, $x_2$, and $y_2$.
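A sketch of this setting in code (NumPy assumed): with two parameters and two data points, the least-squares line interpolates the samples exactly whenever $x_1 \neq x_2$, which can be cross-checked against a generic solver such as np.polyfit:

```python
import numpy as np

def fit_two_points(x1, y1, x2, y2):
    # With two parameters and two points (x1 != x2), least squares
    # interpolates exactly, so E_in = 0 for the fitted line.
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    return a, b

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=2)
y = x.copy()                              # sigma^2 = 0 case: y_i = f(x_i) = x_i
print(fit_two_points(x[0], y[0], x[1], y[1]))
print(np.polyfit(x, y, 1))                # cross-check: [slope, intercept]
```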
(b) [5 marks] For any given data sample $x$, find the bias, $\text{bias}(x) = (\bar{g}(x) - f(x))$, where $\bar{g}(x)$ is the average hypothesis, i.e., $\bar{g}(x) = E_{\mathcal{D}}[g_{\mathcal{D}}(x)]$.

(c) [3 marks] Suppose $\sigma^2 = 0$. For any given data sample $x$, find the variance $\text{var}(x)$, which is defined as $\text{var}(x) = E_{\mathcal{D}}\!\left[\left(g_{\mathcal{D}}(x) - \bar{g}(x)\right)^2\right]$.

(d) [2 marks] For the case $\sigma^2 = 0$, find the expected test error $E_{\mathcal{D}}\!\left[E_{\text{out}}(g_{\mathcal{D}}(x))\right]$.
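The definitions in parts (b) and (c) can be estimated by Monte Carlo simulation. The sketch below (NumPy assumed; trial count and test grid are arbitrary choices) keeps $\sigma$ as a configurable parameter, set here to 0 to match parts (c) and (d):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.0                               # parts (c) and (d) set sigma^2 = 0
trials = 10_000
xs = np.linspace(-1, 1, 201)              # test inputs

A, B = np.empty(trials), np.empty(trials)
for t in range(trials):
    x = rng.uniform(-1, 1, size=2)
    y = x + sigma * rng.standard_normal(2)
    A[t] = (y[1] - y[0]) / (x[1] - x[0])  # slope a_D
    B[t] = y[0] - A[t] * x[0]             # intercept b_D

g_bar = A.mean() * xs + B.mean()          # average hypothesis g_bar(x)
bias = g_bar - xs                         # bias(x), using f(x) = x
var = ((A[:, None] * xs + B[:, None] - g_bar) ** 2).mean(axis=0)
print(np.abs(bias).max(), var.max())
```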