ECE421S Midterm — February 24th, 2020
First Name:    Last Name:
Student Number:
ECE 421S — Introduction to Machine Learning
Circle your tutorial section:
TUT0101 Thu 3–5 (SF2202)
TUT0102 Thu 3–5 (GB304)
TUT0103 Tue 10–12 (SF2202)
TUT0104 Fri 9–11 (BA1230)
Midterm Examination — Monday, February 24th, 2020, 6:15 p.m. – 8:00 p.m.
Instructors: Ashish Khisti and Ben Liang
Instructions
Please read the following instructions carefully.
You have one hour forty-five minutes (1:45) to complete the exam.
Please make sure that you have a complete exam booklet.
Please answer all questions. Read each question carefully.
The value of each question is indicated. Allocate your time wisely!
No additional pages will be collected beyond this answer book. You may use the reverse side of each page if needed to show additional work.
This examination is closed-book; one 8.5″ × 11″ aid sheet is permitted. A non-programmable calculator is also allowed.
Good luck!
1. (20 MARKS) Consider a binary linear classification problem where the data points are two-dimensional, i.e., x = (x1, x2) ∈ R2, and the labels y ∈ {−1, +1}. Throughout this problem, consider a dataset with the following four points:
D = {(x1, y1), (x2, y2), (x3, y3), (x4, y4)}, where the input data vectors are given by
x1 = (0, 0)T,  x2 = (0, 1)T,  x3 = (1, 1)T,  x4 = (1, 0)T,
and the associated labels are given by
y1 = +1,  y2 = −1,  y3 = +1,  y4 = −1.
Our aim is to find a linear classification rule w0 + w1x1 + w2x2, with weight vector w = (w0, w1, w2)T, that classifies this dataset.
[Important: Recall that, in our learning algorithms and their analysis, we transform the data vectors to include the constant term, i.e., x1 = (0, 0)T must be transformed to x̃1 = (1, 0, 0)T, etc.]
5 marks (a) Is there a weight vector w that satisfies the following property?
yn(wT xn) > 0 for all n ∈ {1, 2, 3, 4}.
If your answer is yes, find such a w. If your answer is no, explain why not.
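For illustration only, here is a minimal Python sketch (assuming NumPy; the function name and the candidate weight vector in the example call are placeholders, not part of the question) of how one might check this condition numerically for a given w.

import numpy as np

# Dataset D with the constant term prepended: x -> x_tilde = (1, x1, x2).
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 1],
              [1, 1, 0]], dtype=float)
y = np.array([+1, -1, +1, -1], dtype=float)

def separates(w):
    # True iff y_n * (w^T x_tilde_n) > 0 for every point in D.
    return bool(np.all(y * (X @ w) > 0))

print(separates(np.array([0.5, -1.0, 1.0])))   # placeholder candidate w = (w0, w1, w2)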
10 marks (b) Suppose we implement the perceptron learning algorithm as discussed in class with the initial weight vector w = (3.5, 0, 1)T and the standard update rule for mis-classified points. Assume that each point that falls on the boundary is treated as a mis-classified point. The algorithm visits the points in the following order:
x1 → x2 → x3 → x4 → x1 → x2 → ⋯
until the training error Ein(w) ≤ 1/4, at which time the algorithm terminates. Here the training error is defined as usual:
Ein(w) = (1/4) ∑_{n=1}^{4} 1(ŷn ≠ yn),
where 1(⋅) is the indicator function and ŷn is the output of the classifier.
Show the output of the perceptron learning algorithm in each step and sketch the final decision boundary when the algorithm terminates.
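A minimal Python simulation sketch of the procedure described above (assuming NumPy, the standard update w ← w + yn·x̃n, and that the termination condition is re-checked before each visit; variable names are illustrative only):

import numpy as np

# Transformed data vectors x_tilde = (1, x1, x2) and labels, listed in the visiting order x1, x2, x3, x4.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0]], dtype=float)
y = np.array([+1, -1, +1, -1], dtype=float)

def E_in(w):
    # A boundary point gives sign 0, which differs from y_n, so it counts as mis-classified.
    return np.mean(np.sign(X @ w) != y)

w = np.array([3.5, 0.0, 1.0])    # initial weight vector from part (b)
n = 0
while E_in(w) > 1/4:             # terminate once E_in(w) <= 1/4
    if y[n] * (X[n] @ w) <= 0:   # boundary treated as mis-classified
        w = w + y[n] * X[n]      # standard perceptron update
        print("update at x%d: w = %s, E_in = %.2f" % (n + 1, w, E_in(w)))
    n = (n + 1) % 4              # cyclic visiting order
print("final w =", w)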
5 marks (c) Suppose we now consider binary logistic regression to classify the points in D, with the following sigmoid function for the likelihood:
P̂(y = +1 ∣ x) = θ(wT x) = e^(wT x) / (1 + e^(wT x)).
Assume that we use the log-loss function (i.e., log-likelihood) to measure the training error, as discussed in class. Among the following two possible solutions for w, which one would we prefer?
w1 = (0.5, −1, −1)T,  w2 = (0.5, −1, 0)T.
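A short numerical sketch one could use to compare the two candidates (assuming NumPy, and assuming the in-class cross-entropy form of the log-loss, Ein(w) = (1/N) ∑ ln(1 + e^(−yn wT xn)); the helper name is illustrative):

import numpy as np

# Transformed data vectors and labels from the dataset D.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0]], dtype=float)
y = np.array([+1, -1, +1, -1], dtype=float)

def log_loss(w):
    # Cross-entropy error: (1/N) * sum_n ln(1 + exp(-y_n * w^T x_n)).
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

for name, w in [("w1", np.array([0.5, -1.0, -1.0])),
                ("w2", np.array([0.5, -1.0, 0.0]))]:
    print(name, "log-loss =", round(float(log_loss(w)), 4))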
2. (20 MARKS) Consider linear regression over the dataset D = {(x1, y1), (x2, y2), …, (xN, yN)}. Suppose the labels yn ∈ {−1, +1}, for n = 1, 2, …, N. For any given model parameter w, instead of the usual squared error, we use the following loss function for each example xn:
en(w) = (max(0, 1 − yn wT xn))².
The training error is Ein(w) = (1/N) ∑_{n=1}^{N} en(w).
10 marks (a) Derive the gradient ∇Ein(w).
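For reference, a sketch of one possible route (not an official solution): apply the chain rule to the squared hinge above, noting that z ↦ (max(0, z))² is differentiable everywhere with derivative 2·max(0, z), so no case split is needed at the kink. This gives

∇en(w) = −2 · max(0, 1 − yn wT xn) · yn xn,
∇Ein(w) = −(2/N) ∑_{n=1}^{N} max(0, 1 − yn wT xn) · yn xn.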
5 marks (b) Write the pseudo code for the Stochastic Gradient Descent method to minimize Ein(w), with minibatch size 1.
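A minimal Python sketch of such a procedure (assuming NumPy, the per-example gradient sketched in part (a), and illustrative choices for the learning rate, number of epochs, and initialization; the function name is hypothetical):

import numpy as np

def sgd_squared_hinge(X, y, lr=0.1, epochs=100, seed=0):
    # SGD with minibatch size 1 on E_in(w) = (1/N) * sum_n (max(0, 1 - y_n w^T x_n))^2.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                  # illustrative initialization
    for _ in range(epochs):
        for n in rng.permutation(len(y)):     # visit one example at a time, in random order
            margin = y[n] * (X[n] @ w)
            grad = -2.0 * max(0.0, 1.0 - margin) * y[n] * X[n]   # gradient of e_n(w)
            w = w - lr * grad                 # stochastic gradient step
    return w

# Example usage (hypothetical data matrix X with the constant column included): w_hat = sgd_squared_hinge(X, y)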
5 marks (c) What does it mean to have en(w) = 0 for some given w and xn? You should give a geometric interpretation. (Hint: consider two cases depending on the value of yn.)
3. (20 MARKS) Consider a binary linear classification problem where the data points are two dimensional, i.e., x = (x1, x2) ∈ R2 and the labels y ∈ {−1, +1}. We wish to build a multi-layer perceptron to classify the dataset as shown below, where the “+” and “-” signs indicate examples with labels +1 and −1, respectively.
[Figure: the training examples plotted in the (x1, x2) plane, with regions of “+” and “−” points separated by the two lines given in part (a).]
6 marks (a) Design two perceptrons to implement the two lines shown in the figure above. The lines are x2 = x1 and x2 = −(1/2)x1 + 1. To ensure uniformity for easy marking by the teaching staff, each of your perceptrons must classify the region above its line as +1.
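As a sketch of the general recipe one could use here (assuming the convention sign(w0 + w1x1 + w2x2); the helper names and the line/points in the check are illustrative only), a line x2 = m·x1 + c can be implemented by weights chosen so that points strictly above the line produce a positive activation:

import numpy as np

def line_perceptron(m, c):
    # Weights (w0, w1, w2) for sign(w0 + w1*x1 + w2*x2) that output +1 strictly above x2 = m*x1 + c,
    # since x2 > m*x1 + c  <=>  (-c) + (-m)*x1 + 1*x2 > 0.
    return np.array([-c, -m, 1.0])

def predict(w, x1, x2):
    return int(np.sign(w[0] + w[1] * x1 + w[2] * x2))

# Illustrative check on an arbitrary line and points (values chosen only for the demonstration).
w = line_perceptron(m=1.0, c=0.0)
print(predict(w, 1.0, 2.0), predict(w, 1.0, 0.0))   # expect +1 (above the line) and -1 (below)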
6 marks (b) Design a (single-layer) perceptron to implement the NAND function. Recall that NAND(a, b) = NOT(AND(a, b)) = OR(NOT(a), NOT(b)), where a and b are binary variables. You should use {−1,+1} to label a binary variable as shown in class.
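For illustration, a small Python check (assuming NumPy, and assuming +1 encodes TRUE and −1 encodes FALSE, per the in-class convention) that enumerates all four input combinations for one candidate weight choice; other weight choices can work equally well:

import numpy as np
from itertools import product

def perceptron(w, a, b):
    return int(np.sign(w[0] + w[1] * a + w[2] * b))

def nand(a, b):
    # NAND is FALSE (-1) only when both inputs are TRUE (+1).
    return -1 if (a == 1 and b == 1) else +1

w = np.array([1.0, -1.0, -1.0])   # one candidate (w0, w1, w2)
print(all(perceptron(w, a, b) == nand(a, b) for a, b in product([-1, +1], repeat=2)))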
8 marks (c) Use only the perceptrons in parts (a) and (b) to build a multi-layer perceptron to classify the dataset in the figure. You may use as many copies of these perceptrons as you need. Draw your design and clearly label all edges and weights.