IN2064 / Endterm Date: Thursday 17th February, 2022

Data Analytics and Machine Learning Department of Informatics
Technical University of Munich
Place student sticker here
Exam: IN2064 / Endterm    Examiner: Prof. Dr. Günnemann


• During the attendance check a sticker containing a unique code will be put on this exam.
• This code contains a unique number that associates this exam with your registration number.
• This number is printed both next to the code and to the signature field in the attendance check list.

IN2064 / Endterm Date: Thursday 17th February, 2022
Prof. Dr. Günnemann Time: 17:00 – 19:00
Working instructions
• This graded exercise consists of 52 pages with a total of 11 problems and four versions of each problem.
Please make sure now that you received a complete copy of the graded exercise.
• Use the problem versions specified in your personalized submission sheet on TUMExam. Different problems may have different versions: e.g. Problem 1 (Version A), Problem 5 (Version C), etc. If you solve the wrong version you get zero points.
• The total amount of achievable credits in this graded exercise is 96.
• This document is copyrighted and it is illegal for you to distribute it or upload it to any third-party websites.
• Do not submit the problem descriptions (this document) to TUMExam.
• You can ignore the “student sticker” box above.

Problem 1: Probabilistic inference (Version A) (10 credits)
Consider the following probabilistic model:

$$P(\theta \mid \lambda, \alpha) = \begin{cases} \dfrac{\alpha \lambda^{\alpha}}{\theta^{\alpha+1}} & \text{if } \lambda \le \theta \\ 0 & \text{otherwise} \end{cases} \qquad P(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

with $\lambda > 0$, $\alpha > 0$, and a set of observations $D = \{x_1, \dots, x_N\}$ consisting of $N$ samples $x_i \in \mathbb{R}_+$ generated from the above probabilistic model.

Derive the posterior distribution $P(\theta \mid D, \lambda, \alpha)$.
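For orientation, a sketch of the derivation (not part of the exam statement), assuming the Pareto prior and Uniform(0, θ) likelihood written above: by Bayes' rule,

$$P(\theta \mid D, \lambda, \alpha) \propto P(\theta \mid \lambda, \alpha) \prod_{i=1}^{N} P(x_i \mid \theta) \propto \frac{1}{\theta^{\alpha + N + 1}} \, \mathbb{1}\!\left[\theta \ge \max\{\lambda, x_1, \dots, x_N\}\right],$$

which, after normalizing, is again a Pareto distribution with parameters $\alpha_{\text{new}} = \alpha + N$ and $\lambda_{\text{new}} = \max\{\lambda, x_1, \dots, x_N\}$.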

Problem 1: Probabilistic inference (Version B) (10 credits)
Consider the following probabilistic model:

$$P(\theta \mid \lambda, \alpha) = \begin{cases} \dfrac{\alpha \lambda^{\alpha}}{\theta^{\alpha+1}} & \text{if } \lambda \le \theta \\ 0 & \text{otherwise} \end{cases} \qquad P(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

with $\lambda > 0$, $\alpha > 0$, and a set of observations $D = \{x_1, \dots, x_N\}$ consisting of $N$ samples $x_i \in \mathbb{R}_+$ generated from the above probabilistic model.

Derive the posterior distribution $P(\theta \mid D, \lambda, \alpha)$.

Problem 1: Probabilistic inference (Version C) (10 credits)
Consider the following probabilistic model:

$$P(\theta \mid \lambda, \alpha) = \begin{cases} \dfrac{\alpha \lambda^{\alpha}}{\theta^{\alpha+1}} & \text{if } \lambda \le \theta \\ 0 & \text{otherwise} \end{cases} \qquad P(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

with $\lambda > 0$, $\alpha > 0$, and a set of observations $D = \{x_1, \dots, x_N\}$ consisting of $N$ samples $x_i \in \mathbb{R}_+$ generated from the above probabilistic model.

Derive the posterior distribution $P(\theta \mid D, \lambda, \alpha)$.

Problem 1: Probabilistic inference (Version D) (10 credits)
Consider the following probabilistic model:

$$P(\theta \mid \lambda, \alpha) = \begin{cases} \dfrac{\alpha \lambda^{\alpha}}{\theta^{\alpha+1}} & \text{if } \lambda \le \theta \\ 0 & \text{otherwise} \end{cases} \qquad P(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

with $\lambda > 0$, $\alpha > 0$, and a set of observations $D = \{x_1, \dots, x_N\}$ consisting of $N$ samples $x_i \in \mathbb{R}_+$ generated from the above probabilistic model.

Derive the posterior distribution $P(\theta \mid D, \lambda, \alpha)$.

Problem 2: Linear regression (Version A) (8 credits)

We want to perform regression on a dataset consisting of $N$ samples $x_i \in \mathbb{R}^D$ with corresponding targets $y_i \in \mathbb{R}$ (represented compactly as $X \in \mathbb{R}^{N \times D}$ and $y \in \mathbb{R}^N$).

Assume that we have fitted a linear regression model and obtained the optimal weight vector $w^* \in \mathbb{R}^D$ as

$$w^* = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_i - y_i \right)^2.$$

Note that there is no bias term.

Now, assume that we normalize the target variables to have a variance of 1, i.e. $y_{\text{new}} = \frac{1}{\sigma} \cdot y$ with $\sigma = \sqrt{\operatorname{Var}(y)}$, where $\operatorname{Var}(y)$ is the sample variance of $y$.

Find the data matrix $X_{\text{new}} \in \mathbb{R}^{N \times D}$ such that the solution to the new problem

$$w^*_{\text{new}} = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_{\text{new},i} - y_{\text{new},i} \right)^2$$

will be the same as the solution to the previous problem, i.e. $w^*_{\text{new}} = w^*$. Justify your answer.

Note: $x_{\text{new},i}$ is row $i$ of $X_{\text{new}}$, represented as a column vector.
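One natural candidate to check is $X_{\text{new}} = \frac{1}{\sigma} X$: scaling inputs and targets by the same factor multiplies the objective by $1/\sigma^2$, which does not move the minimizer. Below is a minimal numerical sanity check of this reasoning on synthetic data (a sketch, not the official solution; the data and seed are illustrative):

import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

# Original problem: w* = argmin_w ||Xw - y||^2 (no bias term)
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

# Normalize the targets to unit variance and scale the inputs the same way
sigma = np.sqrt(y.var())
X_new, y_new = X / sigma, y / sigma
w_new, *_ = np.linalg.lstsq(X_new, y_new, rcond=None)

print(np.allclose(w_star, w_new))  # True: same minimizer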

Problem 2: Linear regression (Version B) (8 credits)
We want to perform regression on a dataset consisting of $N$ samples $x_i \in \mathbb{R}^D$ with corresponding targets $y_i \in \mathbb{R}$ (represented compactly as $X \in \mathbb{R}^{N \times D}$ and $y \in \mathbb{R}^N$).

Assume that we have fitted a linear regression model and obtained the optimal weight vector $w^* \in \mathbb{R}^D$ as

$$w^* = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_i - y_i \right)^2.$$

Note that there is no bias term.

Now, assume that we normalize the target variables to have a variance of 1, i.e. $y_{\text{new}} = \frac{1}{\sigma} \cdot y$ with $\sigma = \sqrt{\operatorname{Var}(y)}$, where $\operatorname{Var}(y)$ is the sample variance of $y$.

Find the data matrix $X_{\text{new}} \in \mathbb{R}^{N \times D}$ such that the solution to the new problem

$$w^*_{\text{new}} = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_{\text{new},i} - y_{\text{new},i} \right)^2$$

will be the same as the solution to the previous problem, i.e. $w^*_{\text{new}} = w^*$. Justify your answer.

Note: $x_{\text{new},i}$ is row $i$ of $X_{\text{new}}$, represented as a column vector.

Problem 2: Linear regression (Version C) (8 credits)

We want to perform regression on a dataset consisting of $N$ samples $x_i \in \mathbb{R}^D$ with corresponding targets $y_i \in \mathbb{R}$ (represented compactly as $X \in \mathbb{R}^{N \times D}$ and $y \in \mathbb{R}^N$).

Assume that we have fitted a linear regression model and obtained the optimal weight vector $w^* \in \mathbb{R}^D$ as

$$w^* = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_i - y_i \right)^2.$$

Note that there is no bias term.

Now, assume that we normalize the target variables to have a variance of 1, i.e. $y_{\text{new}} = \frac{1}{\sigma} \cdot y$ with $\sigma = \sqrt{\operatorname{Var}(y)}$, where $\operatorname{Var}(y)$ is the sample variance of $y$.

Find the data matrix $X_{\text{new}} \in \mathbb{R}^{N \times D}$ such that the solution to the new problem

$$w^*_{\text{new}} = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_{\text{new},i} - y_{\text{new},i} \right)^2$$

will be the same as the solution to the previous problem, i.e. $w^*_{\text{new}} = w^*$. Justify your answer.

Note: $x_{\text{new},i}$ is row $i$ of $X_{\text{new}}$, represented as a column vector.

Problem 2: Linear regression (Version D) (8 credits)
We want to perform regression on a dataset consisting of $N$ samples $x_i \in \mathbb{R}^D$ with corresponding targets $y_i \in \mathbb{R}$ (represented compactly as $X \in \mathbb{R}^{N \times D}$ and $y \in \mathbb{R}^N$).

Assume that we have fitted a linear regression model and obtained the optimal weight vector $w^* \in \mathbb{R}^D$ as

$$w^* = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_i - y_i \right)^2.$$

Note that there is no bias term.

Now, assume that we normalize the target variables to have a variance of 1, i.e. $y_{\text{new}} = \frac{1}{\sigma} \cdot y$ with $\sigma = \sqrt{\operatorname{Var}(y)}$, where $\operatorname{Var}(y)$ is the sample variance of $y$.

Find the data matrix $X_{\text{new}} \in \mathbb{R}^{N \times D}$ such that the solution to the new problem

$$w^*_{\text{new}} = \arg\min_{w} \frac{1}{2} \sum_{i=1}^{N} \left( w^T x_{\text{new},i} - y_{\text{new},i} \right)^2$$

will be the same as the solution to the previous problem, i.e. $w^*_{\text{new}} = w^*$. Justify your answer.

Note: $x_{\text{new},i}$ is row $i$ of $X_{\text{new}}$, represented as a column vector.

Problem 3: k-nearest neighbors (Version A) (3 credits)
In the following figure we see the unit circles of three distance functions.
[Figure: the unit circles of three distance functions, labeled (a)–(c), in the $(x^{(1)}, x^{(2)})$ plane with axes from −1.0 to 1.0.]

a) Assign each of the following three distance functions its corresponding unit circle (letter a-c) from the figure.

• $L_2$-distance: $\|x^{(1)} - x^{(2)}\|_2 = \sqrt{\sum_i \left( x_i^{(1)} - x_i^{(2)} \right)^2}$

• $L_1$-distance: $\|x^{(1)} - x^{(2)}\|_1 = \sum_i \left| x_i^{(1)} - x_i^{(2)} \right|$

• $L_\infty$-distance: $\|x^{(1)} - x^{(2)}\|_\infty = \max_i \left| x_i^{(1)} - x_i^{(2)} \right|$
In the following figure we see a two-dimensional dataset with two classes. We would like to classify the point (2, 6), marked with a circle, using k-nearest-neighbors with k = 3.

[Figure: scatter plot of the two-class dataset; both axes range from 0 to 11.]

b) What is the predicted class of the point when using the $L_1$ distance?

c) What is the predicted class of the point when using the $L_\infty$ distance?
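Since the scatter-plot coordinates are not recoverable from the text, here is a minimal k-NN sketch with hypothetical data (the arrays and the helper `knn_predict` below are illustrative placeholders, not the exam's dataset) showing how parts b) and c) would be computed for each metric:

import numpy as np

def knn_predict(X, y, query, k=3, ord=2):
    # ord=1 gives L1, ord=2 gives L2, ord=np.inf gives L-infinity
    dists = np.linalg.norm(X - query, ord=ord, axis=1)
    nearest = np.argsort(dists)[:k]                      # indices of the k closest points
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                     # majority vote

# Hypothetical two-class dataset in 2-D
X = np.array([[1.0, 4.0], [3.0, 8.0], [5.0, 6.0], [2.0, 9.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1, 1])
query = np.array([2.0, 6.0])

print(knn_predict(X, y, query, ord=1))       # predicted class under L1
print(knn_predict(X, y, query, ord=np.inf))  # predicted class under L-infinity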

Problem 3: k-nearest neighbors (Version B) (3 credits)
In the following figure we see the unit circles of three distance functions.
[Figure: the unit circles of three distance functions, labeled (a)–(c), in the $(x^{(1)}, x^{(2)})$ plane with axes from −1.0 to 1.0.]

a) Assign each of the following three distance functions its corresponding unit circle (letter a-c) from the figure.

• $L_2$-distance: $\|x^{(1)} - x^{(2)}\|_2 = \sqrt{\sum_i \left( x_i^{(1)} - x_i^{(2)} \right)^2}$

• $L_1$-distance: $\|x^{(1)} - x^{(2)}\|_1 = \sum_i \left| x_i^{(1)} - x_i^{(2)} \right|$

• $L_\infty$-distance: $\|x^{(1)} - x^{(2)}\|_\infty = \max_i \left| x_i^{(1)} - x_i^{(2)} \right|$
In the following figure we see a two-dimensional dataset with two classes. We would like to classify the point (2, 6), marked with a circle, using k-nearest-neighbors with k = 3.

[Figure: scatter plot of the two-class dataset; both axes range from 0 to 11.]

b) What is the predicted class of the point when using the $L_1$ distance?

c) What is the predicted class of the point when using the $L_\infty$ distance?

Problem 3: k-nearest neighbors (Version C) (3 credits)
In the following figure we see the unit circles of three distance functions.
[Figure: the unit circles of three distance functions, labeled (a)–(c), in the $(x^{(1)}, x^{(2)})$ plane with axes from −1.0 to 1.0.]

a) Assign each of the following three distance functions its corresponding unit circle (letter a-c) from the figure.

• $L_2$-distance: $\|x^{(1)} - x^{(2)}\|_2 = \sqrt{\sum_i \left( x_i^{(1)} - x_i^{(2)} \right)^2}$

• $L_1$-distance: $\|x^{(1)} - x^{(2)}\|_1 = \sum_i \left| x_i^{(1)} - x_i^{(2)} \right|$

• $L_\infty$-distance: $\|x^{(1)} - x^{(2)}\|_\infty = \max_i \left| x_i^{(1)} - x_i^{(2)} \right|$
In the following figure we see a two-dimensional dataset with two classes. We would like to classify the point (8, 6), marked with a circle, using k-nearest-neighbors with k = 3.

[Figure: scatter plot of the two-class dataset; both axes range from 0 to 11.]

b) What is the predicted class of the point when using the $L_1$ distance?

c) What is the predicted class of the point when using the $L_\infty$ distance?

Problem 3: k-nearest neighbors (Version D) (3 credits)
In the following figure we see the unit circles of three distance functions.
[Figure: the unit circles of three distance functions, labeled (a)–(c), in the $(x^{(1)}, x^{(2)})$ plane with axes from −1.0 to 1.0.]

a) Assign each of the following three distance functions its corresponding unit circle (letter a-c) from the figure.

• $L_2$-distance: $\|x^{(1)} - x^{(2)}\|_2 = \sqrt{\sum_i \left( x_i^{(1)} - x_i^{(2)} \right)^2}$

• $L_1$-distance: $\|x^{(1)} - x^{(2)}\|_1 = \sum_i \left| x_i^{(1)} - x_i^{(2)} \right|$

• $L_\infty$-distance: $\|x^{(1)} - x^{(2)}\|_\infty = \max_i \left| x_i^{(1)} - x_i^{(2)} \right|$
In the following figure we see a two-dimensional dataset with two classes. We would like to classify the point (8, 6), marked with a circle, using k-nearest-neighbors with k = 3.

[Figure: scatter plot of the two-class dataset; both axes range from 0 to 11.]

b) What is the predicted class of the point when using the $L_1$ distance?

c) What is the predicted class of the point when using the $L_\infty$ distance?

Problem 4: Classification (Version A) (6 credits)
You are given a balanced dataset with two classes, i.e. $p(y = 0) = p(y = 1)$. Assume that the ground truth class conditional distributions are bivariate Gaussian distributions, i.e. $p(x \mid c) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$ with mean $\mu_c$ and covariance $\Sigma_c$ for each class $c \in \{0, 1\}$.
Further assume that we can choose between two models to fit the data:
• Linear Discriminant Analysis with Gaussian class conditional distributions
• Naïve Bayes with Gaussian class conditional distributions
For each of the datasets shown below (a, b, c), choose one of the possible options (1,2,3) and justify your answer:
1. We should use Linear Discriminant Analysis.
2. We should use Naïve Bayes.
3. There is no clear reason to prefer one model over the other.
[Figure: three two-dimensional datasets (a), (b), (c), each a scatter plot of samples from the two classes.]
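For background (not part of the exam statement): the practical difference between the two candidates lies in their covariance assumptions. LDA ties both classes to a single shared covariance matrix, while Gaussian Naïve Bayes allows a separate covariance per class but restricts it to be diagonal, i.e. it treats the features as conditionally independent. A minimal sketch of the two fits on hypothetical data (the function names and data below are illustrative):

import numpy as np

def fit_lda(X0, X1):
    # LDA: per-class means, one covariance shared by both classes
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    shared_cov = centered.T @ centered / len(centered)
    return (mu0, mu1), shared_cov

def fit_gaussian_nb(X0, X1):
    # Naive Bayes: per-class means and per-class *diagonal* covariances
    return [(Xc.mean(axis=0), np.diag(Xc.var(axis=0))) for Xc in (X0, X1)]

# Hypothetical correlated bivariate data for the two classes
rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=100)
X1 = rng.multivariate_normal([2, 2], [[1.0, 0.8], [0.8, 1.0]], size=100)

(mu0, mu1), cov_lda = fit_lda(X0, X1)
nb_params = fit_gaussian_nb(X0, X1)
print(cov_lda)          # shared covariance: captures the feature correlation
print(nb_params[0][1])  # diagonal covariance: NB discards that correlation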

Problem 4: Classification (Version B) (6 credits)
You are given a balanced dataset with two classes, i.e. $p(y = 0) = p(y = 1)$. Assume that the ground truth class conditional distributions are bivariate Gaussian distributions, i.e. $p(x \mid c) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$ with mean $\mu_c$ and covariance $\Sigma_c$ for each class $c \in \{0, 1\}$.
Further assume that we can choose between two models to fit the data:
• Linear Discriminant Analysis with Gaussian class conditional distributions
• Naïve Bayes with Gaussian class conditional distributions
For each of the datasets shown below (a, b, c), choose one of the possible options (1,2,3) and justify your answer:
1. We should use Linear Discriminant Analysis.
2. We should use Naïve Bayes.
3. There is no clear reason to prefer one model over the other.

Problem 4: Classification (Version C) (6 credits)
You are given a balanced dataset with two classes, i.e. $p(y = 0) = p(y = 1)$. Assume that the ground truth class conditional distributions are bivariate Gaussian distributions, i.e. $p(x \mid c) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$ with mean $\mu_c$ and covariance $\Sigma_c$ for each class $c \in \{0, 1\}$.
Further assume that we can choose between two models to fit the data:
• Linear Discriminant Analysis with Gaussian class conditional distributions
• Naïve Bayes with Gaussian class conditional distributions
For each of the datasets shown below (a, b, c), choose one of the possible options (1,2,3) and justify your answer:
1. We should use Linear Discriminant Analysis.
2. We should use Naïve Bayes.
3. There is no clear reason to prefer one model over the other.
[Figure: three two-dimensional datasets (a), (b), (c), each a scatter plot of samples from the two classes.]

Problem 4: Classification (Version D) (6 credits)
You are given a balanced dataset with two classes, i.e. $p(y = 0) = p(y = 1)$. Assume that the ground truth class conditional distributions are bivariate Gaussian distributions, i.e. $p(x \mid c) = \mathcal{N}(x \mid \mu_c, \Sigma_c)$ with mean $\mu_c$ and covariance $\Sigma_c$ for each class $c \in \{0, 1\}$.
Further assume that we can choose between two models to fit the data:
• Linear Discriminant Analysis with Gaussian class conditional distributions
• Naïve Bayes with Gaussian class conditional distributions
For each of the datasets shown below (a, b, c), choose one of the possible options (1,2,3) and justify your answer:
1. We should use Linear Discriminant Analysis.
2. We should use Naïve Bayes.
3. There is no clear reason to prefer one model over the other.
[Figure: three two-dimensional datasets (a), (b), (c), each a scatter plot of samples from the two classes.]

Problem 5: Optimization – Convexity (Version A) (10 credits)
Consider the two functions

$$f(x) = \max_{i=1,\dots,n} x_i - \min_{i=1,\dots,n} x_i$$

and

$$g(x) = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \operatorname{median}(x) \right|$$

with $x \in \mathbb{R}^n$. You may assume that $n$ is odd.

Prove or disprove that $f(x)$ is convex in $x$.

Prove or disprove that $g(x)$ is convex in $x$.

Hint: $\operatorname{median}(x) = \arg\min_{t \in \mathbb{R}} \|x - t\mathbf{1}\|_1$, with $\|\cdot\|_1$ being the sum over $x$'s elements' absolute values.
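A sketch of the standard arguments, offered as orientation rather than the official solution: both $\max_i x_i$ and $-\min_i x_i = \max_i(-x_i)$ are pointwise maxima of linear functions, hence convex, so

$$f(x) = \max_i x_i + \max_i(-x_i)$$

is convex as a sum of convex functions. For $g$, the hint gives $\sum_i |x_i - \operatorname{median}(x)| = \min_{t \in \mathbb{R}} \|x - t\mathbf{1}\|_1$; the function $h(x, t) = \|x - t\mathbf{1}\|_1$ is jointly convex in $(x, t)$, and partial minimization of a jointly convex function over one argument preserves convexity, so $g(x) = \frac{1}{n} \min_{t} \|x - t\mathbf{1}\|_1$ is convex.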

Problem 5: Optimization – Convexity (Version B) (10 credits)
Consider the two functions

$$f(x) = \max_{i=1,\dots,n} x_i - \min_{i=1,\dots,n} x_i$$

and

$$g(x) = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \operatorname{median}(x) \right|$$

with $x \in \mathbb{R}^n$. You may assume that $n$ is odd.

Prove or disprove that $f(x)$ is convex in $x$.

Prove or disprove that $g(x)$ is convex in $x$.

Hint: $\operatorname{median}(x) = \arg\min_{t \in \mathbb{R}} \|x - t\mathbf{1}\|_1$, with $\|\cdot\|_1$ being the sum over $x$'s elements' absolute values.

Problem 5: Optimization – Convexity (Version C) (10 credits)
Consider the two functions

$$f(x) = \max_{i=1,\dots,n} x_i - \min_{i=1,\dots,n} x_i$$

and

$$g(x) = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \operatorname{median}(x) \right|$$

with $x \in \mathbb{R}^n$. You may assume that $n$ is odd.

Prove or disprove that $f(x)$ is convex in $x$.

Prove or disprove that $g(x)$ is convex in $x$.

Hint: $\operatorname{median}(x) = \arg\min_{t \in \mathbb{R}} \|x - t\mathbf{1}\|_1$, with $\|\cdot\|_1$ being the sum over $x$'s elements' absolute values.

Problem 5: Optimization – Convexity (Version D) (10 credits)
Consider the two functions

$$f(x) = \max_{i=1,\dots,n} x_i - \min_{i=1,\dots,n} x_i$$

and

$$g(x) = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \operatorname{median}(x) \right|$$

with $x \in \mathbb{R}^n$. You may assume that $n$ is odd.

Prove or disprove that $f(x)$ is convex in $x$.

Prove or disprove that $g(x)$ is convex in $x$.

Hint: $\operatorname{median}(x) = \arg\min_{t \in \mathbb{R}} \|x - t\mathbf{1}\|_1$, with $\|\cdot\|_1$ being the sum over $x$'s elements' absolute values.

Problem 6: Deep learning (Version A) (8 credits)
Suppose $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$ are two vectors. We define the functions $f : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$ and $g : \mathbb{R}^N \to \mathbb{R}$, and use them to compute

$$z = f(x, y), \qquad t = g(z).$$
The code below implements the computation of f and g, as well as its gradients using backpropagation. Your task is to complete the missing code fragments.
NOTE: The code is given in Python but you can write the solution in pseudocode as long as it is clear and unambiguous, making sure that the return values have correct shapes.
import numpy as np
class F:
    def forward(self, x, y):
        self.cache = (x, y)
        ##################################################
        # MISSING CODE FRAGMENT #1
        ##################################################
        return out

    def backward(self, d_out):
        # x, y are arrays of shape (N,)
        x, y = self.cache
        d_x = np.sin(y) * d_out
        d_y = x * np.cos(y) * d_out
        return d_x, d_y

class G:
    def forward(self, z):
        self.cache = z
        out = np.mean(z)
        return out

    def backward(self, d_out):
        # z is an array of shape (N,)
        z = self.cache
        ##################################################
        # MISSING CODE FRAGMENT #2
        ##################################################
        return d_z

# Example usage
f, g = F(), G()
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = f.forward(x, y)
t = g.forward(z)
d_z = g.backward(d_out=1.0)
d_x, d_y = f.backward(d_z)

a) Complete the MISSING CODE FRAGMENT #1.
b) Complete the MISSING CODE FRAGMENT #2.
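For reference, one possible completion, inferred from the surrounding code rather than taken from an official solution: the given backward pass ($d_x = \sin(y) \cdot d_{\text{out}}$, $d_y = x \cos(y) \cdot d_{\text{out}}$) matches the elementwise function $f(x, y) = x \sin(y)$, and since $g$ computes the mean, each partial derivative $\partial t / \partial z_i = 1/N$.

# MISSING CODE FRAGMENT #1 -- one completion consistent with F.backward:
out = x * np.sin(y)                      # elementwise f(x, y) = x * sin(y)

# MISSING CODE FRAGMENT #2 -- gradient of the mean out = np.mean(z):
d_z = d_out * np.ones_like(z) / z.size   # each entry dt/dz_i = 1/N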

Problem 6: Deep learning (Version B) (8 credits)
Suppose $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$ are two vectors. We define the functions $f : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$ and $g : \mathbb{R}^N \to \mathbb{R}$, and use them to compute

$$z = f(x, y), \qquad t = g(z).$$
The code below implements the computation of f and g, as well as its gradients using backpropagation. Your task is to complete the missing code fragments.
NOTE: The code is given in Python but you can write the solution in pseudocode as long as it is clear and unambiguous, making sure that the return values have correct shapes.
import numpy as np
class F:
    def forward(self, x, y):
        self.cache = (x, y)
        ##################################################
        # MISSING CODE FRAGMENT #1
        ##################################################
        return out

    def backward(self, d_out):
        # x, y are arrays of shape (N,)
        x, y = self.cache
        d_x = np.exp(x) / np.exp(y) * d_out
        d_y = -d_x
        return d_x, d_y

class G:
    def forward(self, z):
        self.cache = z
        out = np.sum(z)
        return out

    def backward(self, d_out):
        # z is an array of shape (N,)
        z = self.cache
        ##################################################
        # MISSING CODE FRAGMENT #2
        ##################################################
        return d_z

# Example usage
f, g = F(), G()
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = f.forward(x, y)
t = g.forward(z)
d_z = g.backward(d_out=1.0)
d_x, d_y = f.backward(d_z)

a) Complete the MISSING CODE FRAGMENT #1.
b) Complete the MISSING CODE FRAGMENT #2.
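For reference, one possible completion, inferred from the surrounding code rather than taken from an official solution: the given backward pass ($d_x = e^x / e^y \cdot d_{\text{out}}$ with $d_y = -d_x$) matches the elementwise function $f(x, y) = e^{x-y}$, and since $g$ computes the sum, every partial derivative $\partial t / \partial z_i = 1$.

# MISSING CODE FRAGMENT #1 -- one completion consistent with F.backward:
out = np.exp(x) / np.exp(y)    # elementwise f(x, y) = exp(x - y)

# MISSING CODE FRAGMENT #2 -- gradient of the sum out = np.sum(z):
d_z = d_out * np.ones_like(z)  # each entry dt/dz_i = 1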

Problem 6: Deep learning (Version C) (8 credits)
Suppose $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$ are two vectors. We define the functions $f : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$ and $g : \mathbb{R}^N \to \mathbb{R}$, and use them to compute

$$z = f(x, y), \qquad t = g(z).$$
The code below implements the computation of f and g, as well as its gradients using backpropagation. Your task is to complete the missing code fragments.
NOTE: The code is given in Python but you can write the solution in pseudocode as long as it is clear and unambiguous, making sure that the return values have correct shapes.
import numpy as np
class F:
    def forward(self, x, y):
        self.cache = (x, y)
        ##################################################
        # MISSING CODE FRAGMENT #1
        ##################################################
        return out

    def backward(self, d_out):
        # x, y are arrays of shape (N,)
        x, y = self.cache
        temp = np.cos(x * y) * d_out
        d_x = y * temp
        d_y = x * temp
        return d_x, d_y

class G:
    def forward(self, z):
        self.cache = z
        out = np.prod(z)  # Product of array elements
        return out

    def backward(self, d_out):
        # z is an array of shape (N,)
        z = self.cache
        ##################################################
        # MISSING CODE FRAGMENT #2
        ##################################################
        return d_z

# Example usage
f, g = F(), G()
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = f.forward(x, y)
t = g.forward(z)
d_z = g.backward(d_out=1.0)
d_x, d_y = f.backward(d_z)
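For reference, one possible completion, inferred from the surrounding code rather than taken from an official solution: the given backward pass (temp = cos(x·y)·d_out with d_x = y·temp and d_y = x·temp) matches the elementwise function $f(x, y) = \sin(x \cdot y)$, and for the product $g(z) = \prod_i z_i$, the partial derivative with respect to $z_i$ is the product of all other entries.

# MISSING CODE FRAGMENT #1 -- one completion consistent with F.backward:
out = np.sin(x * y)            # elementwise f(x, y) = sin(x * y)

# MISSING CODE FRAGMENT #2 -- gradient of the product out = np.prod(z):
# dt/dz_i = prod(z) / z_i, which assumes no entry of z is zero
d_z = d_out * np.prod(z) / z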
