
ELEC 400M / EECE 571M – Final Exam
26 April 2021

Instructions:
• The exam has three problems.
• The exam is open book.
• Please write your answers to the problems in your own answer booklet and upload it electronically through the Canvas assignment submission. Please use a common file format, such as pdf, jpeg, or png, for the upload.
• You have 90 minutes to solve the problems. You have an extra 10 minutes to deal with the logistics of creating a version for upload and executing the upload.
• For all answers, show all steps.
• Sub-tasks that can be solved independently of previous sub-tasks are marked with “−→”.

1. (Multilayer) perceptron
We consider (D + 1)-dimensional feature vectors x_n ∈ {1} × R^D and binary targets y_n ∈ {−1, +1}, n = 1, 2, . . . , N.
(a) We first consider the linear model
ŷ_n = sign(w^T x_n)
with the D + 1 parameters w = [w_0, w_1, . . . , w_D]^T.
To train this model, we apply the error measure
e_n(w) = max{0, (1 − y_n w^T x_n)^3}.
i. (3 marks) Determine the relationship between the error measure e_n and the decision error
e′_n = 1 if y_n ≠ ŷ_n, and e′_n = 0 if y_n = ŷ_n.
−→ ii. (5 marks) Derive the stochastic gradient descent algorithm for updating w based on e_n. State the steps of the algorithm (write it as pseudo-code).
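For illustration of how the two error measures compare on a single sample, the following Python sketch evaluates e_n(w) and e′_n; the values of w and x_n below are made up for this example.

import numpy as np

def e_n(w, x, y):
    # pointwise training error: max(0, (1 - y * w^T x)^3)
    return max(0.0, (1.0 - y * np.dot(w, x)) ** 3)

def decision_error(w, x, y):
    # 0/1 decision error: 1 if sign(w^T x) differs from y, else 0
    y_hat = np.sign(np.dot(w, x))
    return 0 if y_hat == y else 1

# made-up example values (D = 2, so w and x have D + 1 = 3 entries including the bias term)
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 0.0])   # x_n in {1} x R^D
y = +1

print(e_n(w, x, y), decision_error(w, x, y))   # 3.375 and 1 for these values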
(b) We now assume D = 2 and N = 4, i.e., we have four data samples. The four samples are given by
(x_1 = [1, 0, 0]^T, y_1 = −1), (x_2 = [1, 1, 0]^T, y_2 = +1), (x_3 = [1, 0, 1]^T, y_3 = +1), (x_4 = [1, 1, 1]^T, y_4 = −1).
−→ i. (3 marks) Denoting x_n = [1, x_n1, x_n2]^T, sketch the four data points in the (x_n1, x_n2)-coordinate system. Identify data points with different labels using different markers. Draw decision boundaries that consist of straight lines and identify the corresponding decision regions such that a classifier would correctly decide on those four data points.
ii. (5 marks) Sketch a two-layer perceptron that would realize the decision boundaries as drawn in the previous sub-problem and thus would correctly classify those four data points. Label the necessary details and briefly explain how it implements the decision boundaries. (You do not need to determine the numerical values for the weights that determine the slope and intercept of the straight lines.)
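If it helps to see how the four points are arranged before sketching by hand, a short matplotlib snippet along these lines plots them in the (x_n1, x_n2)-plane (purely illustrative):

import matplotlib.pyplot as plt

# the four samples from the problem statement, written as (x_n1, x_n2, y_n)
points = [(0, 0, -1), (1, 0, +1), (0, 1, +1), (1, 1, -1)]

for x1, x2, y in points:
    marker = 'o' if y == +1 else 'x'   # different markers for the two labels
    plt.scatter(x1, x2, marker=marker, s=100)

plt.xlabel('x_n1')
plt.ylabel('x_n2')
plt.title('Four training samples (o: y = +1, x: y = -1)')
plt.show()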
2. Validation
We partition a data set D into a training data set D_train and a validation data set D_val with D_train ∪ D_val = D and D_train ∩ D_val = ∅. We denote the model obtained by training on D_train as g^− and the corresponding validation error as E_val(g^−).
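As an illustration of this partitioning, a minimal Python sketch is given below; the data set, its size, and the 75/25 split ratio are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)

# assumed toy data set D of N samples (features X, targets y)
N = 100
X = rng.normal(size=(N, 3))
y = rng.choice([-1, +1], size=N)

# disjoint split: D_train and D_val together cover D and do not overlap
perm = rng.permutation(N)
train_idx, val_idx = perm[:75], perm[75:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

# after training some model g_minus on (X_train, y_train), its validation error is
# the average error on the held-out set, e.g. for 0/1 classification error:
# E_val = np.mean(g_minus.predict(X_val) != y_val)   # g_minus is a hypothetical model object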
(a) We first use E_val(g^−) to estimate the out-of-sample error E_out(g^−).
−→ i. (3 marks) Is E_val(g^−) an unbiased estimate of E_out(g^−)? Show why or why not.
−→ ii. (2 marks) State a bound for E_out(g^−) in terms of E_val(g^−) and explain the variables in this bound.
(b) We now use the validation error to decide on hyperparameters (say m ∈ {1, . . . , M}) of the model. We denote the model with the selected hyperparameter as g^−_{m∗}.
−→ i. (3 marks) Is E_val(g^−_{m∗}) an unbiased estimate of E_out(g^−_{m∗})? Why or why not? If not, does E_val(g^−_{m∗}) tend to overestimate or underestimate the out-of-sample error?
−→ ii. (2 marks) Could a test data set be used to determine the model hyperparameters, i.e., m∗? Why or why not?
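To make the selection procedure of part (b) concrete, a minimal sketch of validation-based hyperparameter selection follows; the training and error-evaluation routines are passed in as hypothetical callables.

def select_hyperparameter(candidates, D_train, D_val, train, val_error):
    # candidates: iterable of hyperparameter values m in {1, ..., M}
    # train(D_train, m): hypothetical routine returning the model g_m^- trained on D_train
    # val_error(g, D_val): hypothetical routine returning E_val(g) on the validation set
    best_m, best_err = None, float('inf')
    for m in candidates:
        g_m = train(D_train, m)
        err = val_error(g_m, D_val)
        if err < best_err:
            best_m, best_err = m, err
    return best_m   # m* = argmin over m of E_val(g_m^-)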

3. Expectation-maximization method
We assume that observed samples x = [x_1, . . . , x_N] of D-dimensional binary data x_n = [x_n1, . . . , x_nD] ∈ {0, 1}^D have been generated independently from the K-mixture distribution with probability mass function (pmf)
P(x_n | w, μ) = Σ_{j=1}^{K} w_j P_j(x_n | μ_j),
which has K component distributions
P_j(x_n | μ_j) = Π_{i=1}^{D} μ_ji^{x_ni} (1 − μ_ji)^{1−x_ni}
and parameter vectors w = [w_1, . . . , w_K] and μ = [μ_1, . . . , μ_K].
w_j > 0 is the weight for the jth component distribution and Σ_{j=1}^{K} w_j = 1. Furthermore, μ_j = [μ_j1, . . . , μ_jD] with μ_ji ∈ [0, 1].
The interpretation of the mixture distribution is that x_n is drawn from the jth component distribution with probability w_j.
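To make the generative interpretation concrete, the following Python sketch draws samples from such a mixture and evaluates the pmf above; the choices K = 2, D = 3 and the parameter values are made up for illustration.

import numpy as np

rng = np.random.default_rng(1)

# made-up example parameters for K = 2 components and D = 3 binary features
w = np.array([0.3, 0.7])                  # mixture weights w_j, summing to 1
mu = np.array([[0.9, 0.1, 0.5],           # mu_1
               [0.2, 0.8, 0.4]])          # mu_2

def sample(n_samples):
    # generative interpretation: pick component j with probability w_j,
    # then draw each bit x_ni as Bernoulli(mu_ji)
    j = rng.choice(len(w), size=n_samples, p=w)
    return (rng.random((n_samples, mu.shape[1])) < mu[j]).astype(int)

def pmf(x_n):
    # P(x_n | w, mu) = sum_j w_j * prod_i mu_ji^x_ni * (1 - mu_ji)^(1 - x_ni)
    comp = np.prod(mu ** x_n * (1 - mu) ** (1 - x_n), axis=1)
    return np.dot(w, comp)

X = sample(5)
print(X, [pmf(x_n) for x_n in X])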
(a) Let us assume that K = 1.
−→ i. (1 mark) State the pmf P(x|μ).
ii. (4 marks) Using P(x|μ), determine the maximum likelihood estimate for μ as a function of x. (It is likely useful to consider the logarithm of probabilities.)
(b) Now we consider a general value for K and want to develop the expectation-maximization (EM) algorithm to estimate the parameters of the pmf. For this, we introduce the vector j = [j_1, . . . , j_N] of unobserved data, where j_n = k if x_n is drawn from the kth component distribution.
−→ i. (4 marks) Determine the “complete log-likelihood” log(P(x, j | w, μ)) of the “complete data” (x, j).
−→ ii. (4 marks) Determine the probability
γ_nk = P(j_n = k | x_n, ŵ, μ̂)
for given parameter estimates ŵ and μ̂.
−→ iii. (4 marks) Following the steps for the EM algorithm as presented in the course, we can express the updated parameter estimates as
μ_k* = argmax_{μ_k} Σ_{n=1}^{N} γ_nk log(P_k(x_n | μ_k)),
w_k* = (Σ_{n=1}^{N} γ_nk) / N.
Solve the optimization for μ_k* and express the solution in terms of the observed data x and the probabilities γ_nk.
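As an illustration of the stated update for w_k* (the update for μ_k* is what the sub-task asks you to derive), here is a tiny numerical example with a made-up matrix of probabilities γ_nk:

import numpy as np

# made-up responsibilities gamma[n, k] for N = 4 samples and K = 2 components;
# each row sums to 1, since every sample is drawn from exactly one of the K components
gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])

# stated update: w_k* = (sum over n of gamma_nk) / N
w_star = gamma.sum(axis=0) / gamma.shape[0]
print(w_star)   # [0.5 0.5]; the entries sum to 1 as required of mixture weights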
−→ (c) (2 marks) Describe how the EM algorithm can be used to categorize samples x_n according to the component distribution they have been drawn from.