
ECE421S MidTerm First Name: Last Name:
Student Number:
ECE 421S — Introduction to Machine Learning MidTerm Examination
Friday March 1st, 2019 6:15 p.m. – 7:55 p.m.
Instructors: Amir Ashouri, Ashish Khisti, and Ben Liang
Circle your tutorial section:
• TUT0101 Thu 9–11 (GB120)
• TUT0102 Wed 10–12 (MY380)
• TUT0103 Thu 9–11 (GB404)
• TUT0104 Thu 4–6 (MP203)
• TUT0105 Fri 12–2 (RS211)
• TUT0106 Fri 12–2 (SS2106)
• TUT0107 Wed 10–12 (UC144)
• TUT0108 Fri 4–6 (MP134)
Instructions
• Please read the following instructions carefully.
• You have one hour and forty minutes (1:40) to complete the exam.
• Please make sure that you have a complete exam booklet.
• Please answer all questions. Read each question carefully.
• The value of each question is indicated. Allocate your time wisely!
• No additional pages will be collected beyond this answer book. You may use the reverse side of each page if needed to show additional work.
• This examination is closed-book; one 8.5 × 11 aid sheet is permitted. A non-programmable calculator is also allowed.
• Good luck!

1. (20 MARKS) Consider a binary linear classification problem where the data points are two-dimensional, i.e., x = (x1, x2) ∈ R^2, and the labels y ∈ {−1, 1}. Throughout this problem, consider the dataset D = {(x1, y1), (x2, y2), (x3, y3)} where the input data vectors are given by

x1 = (1, 0)^T, x2 = (0, 1)^T, x3 = (−1, 0)^T,

and the associated labels are given by

y1 = +1, y2 = +1, y3 = −1.

Our aim is to find a linear classification rule w0 + w1 x1 + w2 x2 with weight vector w = (w0, w1, w2)^T that classifies this dataset.

(a) (10 marks) Suppose we implement the perceptron learning algorithm as discussed in class, with the initial weight vector w = (0, 0, 0)^T and the standard update rule for misclassified points. Assume that each point that falls on the boundary is treated as a misclassified point, and that the algorithm visits the points in the following order:

x1 → x2 → x3 → x1 → x2 → ⋯

until it terminates. Show the output of the perceptron algorithm at each step and sketch the final decision boundary when the algorithm terminates. What is the distance from the decision boundary to the closest data vector in D?

[Important: When applying the perceptron update, recall that you have to transform the data vectors to include the constant term, i.e., x1 = (1, 0)^T must be transformed to x̃1 = (1, 1, 0)^T, etc.]
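As a quick way to verify the hand computation in part (a), here is a minimal sketch, assuming NumPy and illustrative variable names (not part of the original exam). It runs the standard update w ← w + y·x̃ on the transformed vectors, treats a zero score as a misclassification per the problem statement, visits the points in the stated cyclic order, and stops after one full pass with no updates:

    import numpy as np

    # Transformed data vectors (constant term prepended) and labels.
    X = np.array([[1.0,  1.0, 0.0],   # x~1
                  [1.0,  0.0, 1.0],   # x~2
                  [1.0, -1.0, 0.0]])  # x~3
    y = np.array([1.0, 1.0, -1.0])

    w = np.zeros(3)                   # initial weights (w0, w1, w2)
    i, clean = 0, 0
    while clean < len(X):             # terminate after a pass with no updates
        n = i % len(X)
        if y[n] * (w @ X[n]) <= 0:    # misclassified, or on the boundary
            w = w + y[n] * X[n]       # standard perceptron update
            clean = 0
            print(f"update on point {n + 1}: w = {w}")
        else:
            clean += 1
        i += 1

    # Distance from the final boundary w0 + w1*x1 + w2*x2 = 0 to each point.
    d = [abs(w[0] + w[1:] @ x[1:]) / np.linalg.norm(w[1:]) for x in X]
    print("final w:", w, "closest distance:", min(d))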

[continue part (a) here]

(b) (10 marks) Suppose we modify the perceptron algorithm as follows. At iteration t, suppose that w^t = (w0^t, w1^t, w2^t) is the present value of the weight vector and (x^t, y^t) is the training sample selected from D. Let us express x^t = (x1^t, x2^t). We perform the standard update to w^t if either of the following two conditions is satisfied:

• The training point (x^t, y^t) is misclassified with respect to w^t (or lies on the decision boundary).
• The weight vector w^t and the training point (x^t, y^t) are such that we have

y^t (w0^t + w1^t x1^t + w2^t x2^t) / √((w1^t)^2 + (w2^t)^2) ≤ 1/2.

Assume that we initialize w^0 = (0, 0, 0) and that the algorithm visits the points in D in the following order:

x1 → x2 → x3 → x1 → x2 → ⋯

until it terminates. Show the output of this algorithm at each step and sketch the final decision boundary when the algorithm terminates. What is the distance from the decision boundary to the closest data vector in D?
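Relative to the sketch after part (a), only the update test changes: a point now also triggers an update when the quantity on the left of the inequality above is at most 1/2. A minimal sketch of the modified test, with the same illustrative names (it assumes w1, w2 are not both zero when the margin branch is reached, which holds here because the zero-score branch fires first):

    def needs_update(w, x, y):
        # x is the transformed vector (1, x1, x2); w = (w0, w1, w2).
        score = y * (w @ x)
        if score <= 0:                          # misclassified or on the boundary
            return True
        margin = score / np.linalg.norm(w[1:])  # y*(w0 + w1*x1 + w2*x2) / sqrt(w1^2 + w2^2)
        return margin <= 0.5                    # extra margin condition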

[continue part (b) here]
total/-
Page 5 of 10

2. (20 MARKS) Suppose we use a multi-class softmax regression model to classify input data vectors x ∈ R^{d+1} (including bias) with two possible class labels y ∈ {1, 2}. Let w^(1) and w^(2) be the weight vectors for classes 1 and 2, respectively. For any input x, we hypothesize that the probability of x belonging to class i is

P̂_SM(i ∣ x) ≜ exp(w^(i)T x) / (exp(w^(1)T x) + exp(w^(2)T x)), for i ∈ {1, 2}.

Furthermore, for any given training example (xn, yn), we define the loss function as

e_n^SM(w^(1), w^(2)) = −log P̂_SM(yn ∣ xn).

(a) (10 marks) Find the gradients of e_n^SM(w^(1), w^(2)) with respect to w^(1) and w^(2). (Note that you should always consider the two possible values of yn.)
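For a numerical sanity check of the gradient expressions you derive, here is a short sketch (names illustrative and not from the exam; it uses the convention that y takes the value 1 or 2):

    import numpy as np

    def softmax_loss_and_grads(w1, w2, x, y):
        s = np.array([w1 @ x, w2 @ x])
        p = np.exp(s - s.max())
        p = p / p.sum()                  # (P_SM(1|x), P_SM(2|x)), numerically stable
        loss = -np.log(p[y - 1])         # e_n^SM = -log P_SM(y|x)
        g1 = (p[0] - (y == 1)) * x       # gradient w.r.t. w1
        g2 = (p[1] - (y == 2)) * x       # gradient w.r.t. w2
        return loss, g1, g2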

(b) (4 marks) Suppose instead of the above softmax regression model, we use binary logistic regression to learn whether or not input x should be labelled class 1. We hypothesize that x belongs to class 1 with probability

P̂_LR(1 ∣ x) ≜ exp(w^T x) / (1 + exp(w^T x)),

and that x belongs to class 2 with probability P̂_LR(2 ∣ x) = 1 − P̂_LR(1 ∣ x). For any given training example (xn, yn), we define the loss function as

e_n^LR(w) = −log P̂_LR(yn ∣ xn).

Find a relationship between (w^(1), w^(2)) and w so that we have

P̂_SM(1 ∣ x) = P̂_LR(1 ∣ x) and P̂_SM(2 ∣ x) = 1 − P̂_LR(1 ∣ x).
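One standard softmax identity is worth recalling here (a general property, not exam text): dividing the numerator and denominator of the softmax by exp(w^(2)T x) gives

    \hat{P}_{SM}(1 \mid x)
      = \frac{e^{w^{(1)T} x}}{e^{w^{(1)T} x} + e^{w^{(2)T} x}}
      = \frac{e^{(w^{(1)} - w^{(2)})^T x}}{1 + e^{(w^{(1)} - w^{(2)})^T x}},

which has the same form as P̂_LR(1 ∣ x).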

(c) (6 marks) Given (w^(1), w^(2)) and w as described in Part (b), we apply SGD to separately train the above softmax regression model and the binary logistic regression model, with constant learning rates ε_SM and ε_LR, respectively. For both models, all weights are initialized to zero, and we use the same random seed so that in each iteration of SGD the same random training example is selected. Find a relationship between ε_SM and ε_LR so that e_n^SM(w^(1), w^(2)) and e_n^LR(w) are identical in all iterations of SGD.
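A candidate relationship can be checked empirically. The sketch below, with made-up toy data and illustrative names, runs both SGD loops on the same example stream and reports the largest gap between the two per-iteration losses; a gap near zero supports the candidate pair of learning rates:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))          # toy inputs (bias folded into x)
    Y = rng.integers(1, 3, size=50)       # labels in {1, 2}

    def run(eps_sm, eps_lr, iters=200):
        order = np.random.default_rng(1).integers(0, len(X), size=iters)
        w1, w2, w = np.zeros(3), np.zeros(3), np.zeros(3)
        gap = 0.0
        for n in order:
            x, yl = X[n], Y[n]
            p1 = np.exp(w1 @ x) / (np.exp(w1 @ x) + np.exp(w2 @ x))  # P_SM(1|x)
            q1 = 1.0 / (1.0 + np.exp(-(w @ x)))                      # P_LR(1|x)
            e_sm = -np.log(p1 if yl == 1 else 1.0 - p1)
            e_lr = -np.log(q1 if yl == 1 else 1.0 - q1)
            gap = max(gap, abs(e_sm - e_lr))
            g = (p1 - (yl == 1)) * x
            w1, w2 = w1 - eps_sm * g, w2 + eps_sm * g    # softmax SGD step
            w = w - eps_lr * (q1 - (yl == 1)) * x        # logistic SGD step
        return gap

    print(run(0.05, 0.10))    # try candidate pairs (eps_sm, eps_lr)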

3. (10 MARKS) Assume two logical inputs (each can be either 0 or 1) as the following:

x1: 0 1 0 1
x2: 0 0 1 1

They are input to our single-layer model shown below:

[Figure: inputs x1 and x2 feed, through weights w1 and w2, a summation node with value ν; the unit's output is y = φ(ν).]

where our weights and activation function are defined as follows:

w1 = 1, w2 = 1; φ(ν) = 1 if ν ≥ 2, and 0 otherwise.

(a) (4 marks) Given the 4 different sets of inputs that x1 and x2 can take, calculate the output of the unit and state what function is represented by this unit.

(b) (3 marks) Suggest how to change the threshold level (on ν) of the activation function to implement the following function (we will use the same weights as before):

x1:        0 1 0 1
x2:        0 0 1 1
g(x1,x2):  0 1 1 1

(c) (3 marks) Can the function shown below be implemented by a single unit (one set of inputs and an activation function)? Explain why.

x1:        0 1 0 1
x2:        0 0 1 1
z(x1,x2):  0 1 1 0
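The unit's outputs are easy to enumerate. A minimal sketch, assuming the stated weights w1 = w2 = 1 and a step activation with an adjustable threshold (names illustrative; vary `threshold` to explore part (b)):

    def phi(v, threshold=2):
        # Step activation from the problem: 1 when v >= threshold, else 0.
        return 1 if v >= threshold else 0

    w1 = w2 = 1
    for x1 in (0, 1):
        for x2 in (0, 1):
            v = w1 * x1 + w2 * x2           # weighted sum fed to the activation
            print(x1, x2, "->", phi(v))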

4. (10 MARKS) Consider a single-layer neural network with three inputs (no biases), one activation function (ReLU), and one output xL. All the weights and input values are initialized as shown below. Assume the error is calculated using the squared-error method as e(Ω) = (xL − y)^2, where Ω = (ω, ν) denotes the weights in the neural network, and xL and y are the output of the network (the predicted value) and the true label, respectively. ReLU is defined as follows:

θ(s) = max(0, s) = s if s ≥ 0, and 0 otherwise.

[Figure: an input layer with x1 = −1, x2 = 2, x3 = 1, each connected with the (shared) weight ω = 1 to a summation node s; the ReLU θ is applied to s, and the result is multiplied by the weight ν = 2 to produce the output xL.]

(a) (4 marks) Given a training example (x1, x2, x3) = (−1, 2, 1) with the weights (ω, ν) = (1, 2) shown above and true value y = 2, calculate xL and e(Ω) using the forward-pass algorithm.

(b) (6 marks) Calculate the back-propagation pass, i.e., compute ∂e/∂ω and ∂e/∂ν for the input value x above and (ω, ν) = (1, 2).
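A compact sketch of both passes, assuming the shared-weight reading of the figure (s = ω·(x1 + x2 + x3), xL = ν·θ(s)); the finite-difference check at the end is a generic way to validate hand-derived gradients (all names illustrative):

    def forward(x, w, v):
        s = w * sum(x)        # weighted sum into the ReLU
        h = max(0.0, s)       # theta(s)
        return v * h          # output x_L

    def loss(x, w, v, y):
        return (forward(x, w, v) - y) ** 2

    def grads(x, w, v, y):
        s = w * sum(x)
        h = max(0.0, s)
        de_dxl = 2.0 * (forward(x, w, v) - y)                    # d e / d x_L
        de_dv = de_dxl * h                                       # x_L = v * h
        de_dw = de_dxl * v * (1.0 if s >= 0 else 0.0) * sum(x)   # through the ReLU
        return de_dw, de_dv

    x, y = (-1.0, 2.0, 1.0), 2.0
    print(grads(x, 1.0, 2.0, y))          # analytic gradients

    eps = 1e-6                            # finite-difference sanity check
    print((loss(x, 1.0 + eps, 2.0, y) - loss(x, 1.0 - eps, 2.0, y)) / (2 * eps))
    print((loss(x, 1.0, 2.0 + eps, y) - loss(x, 1.0, 2.0 - eps, y)) / (2 * eps))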