[WiSe21/22] ML1: Written Exam A
Started on: Wednesday, 9 March 2022, 8:30 AM
State: Finished
Completed on: Wednesday, 9 March 2022, 10:30 AM
Time taken: 2 hours
Eigenständigkeitserklärung / Declaration of Independence
By proceeding further, you confirm that you are completing the exam alone, without other resources than those that are authorized. Authorized resources include the ML1 course material, personal notes, online APIs documentation, and calculation/plotting tools.
Information
Question 1 Complete
Marked out of 5.00
Which of the following is True: Let $k$ be a Mercer (PSD and symmetric) kernel and $x_1, \dots, x_N$ be an unlabeled dataset. A Gram matrix $K$ of size $N \times N$ associated to this kernel and dataset always satisfies:
$\forall_{i=1}^{N} \forall_{j=1}^{N} : K_{ij} > 0$
$\forall u \in \mathbb{R}^N : u^\top K u \geq 0$
$K^\top = K^{-1}$
$K K^\top = I$
Question 2 Complete
Marked out of 5.00
Which of the following is True: A Mixture Model:
a. Is a class of unsupervised models designed in a way that the objective function is always convex.
b. Should be used if fitting a single distribution (e.g. Gaussian) for maximum likelihood takes too long or does not converge.
c. Is a class of unsupervised models that approximates the probability density function from which the data is generated.
d. Is a class of unsupervised models that can represent efficiently any probability density function.
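For the Gram matrix question (Question 1), the four candidate properties can be checked numerically on a small case. The sketch below is only an illustration: the linear kernel and the three data points are assumptions chosen for the example, not part of the exam.

```python
import numpy as np

def gram_matrix(X, kernel):
    """Return the N x N matrix K with K[i, j] = kernel(X[i], X[j])."""
    N = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])

def linear_kernel(x, y):
    # The linear kernel <x, y> is a standard example of a Mercer kernel.
    return float(np.dot(x, y))

# Hypothetical toy dataset (three points in R^2), chosen for illustration only.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]])
K = gram_matrix(X, linear_kernel)

is_symmetric = np.allclose(K, K.T)
# u^T K u >= 0 for all u is equivalent to all eigenvalues being non-negative.
is_psd = np.linalg.eigvalsh(K).min() >= -1e-10
all_entries_positive = bool((K > 0).all())
is_orthogonal = np.allclose(K @ K.T, np.eye(len(X)))
```

On this example the Gram matrix is symmetric and PSD, while it contains non-positive entries and is not orthogonal, so only the quadratic-form property holds here.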
Question 3 Complete
Marked out of 5.00
Which of the following is True: Layer-wise relevance propagation (LRP) is a method for explainable AI that:
a. can be applied to any black-box machine learning model.
b. assumes that the machine learning model has a neural network (or computational graph) structure.
c. requires […] function evaluations, where […] is the number of input dimensions, in order to produce an explanation.
d. can be applied to any black-box model, with the only condition that the gradient w.r.t. the input features can be computed.
Question 4 Complete
Marked out of 5.00
Which of the following is True: A Gaussian Process (GP):
a. defines a multivariate Gaussian distribution over output variables, with covariance determined by input similarity.
b. defines a multivariate Gaussian distribution over input variables, with covariance determined by output similarity.
c. defines a multivariate distribution over output variables, with input drawn from a Gaussian distribution.
d. defines a multivariate Gaussian distribution over input variables.
Information
Assume you would like to build a neural network that implements some decision boundary in $\mathbb{R}^d$. For this, you have at your disposal neurons of the type $a_j = \mathrm{sign}\big(\sum_i a_i w_{ij} + b_j\big)$, where $\sum_i$ sums over the indices of the incoming neurons. The sign function returns +1 when its input is positive, -1 when its input is negative, and 0 when its input is zero.
Denote by $a_1$ and $a_2$ the two input neurons (initialized to the value $x_1$ and $x_2$ respectively). Denote by $a_3, a_4, a_5$ the hidden neurons, and by $a_6$ the output neuron.
Question 5 Complete
Marked out of 10.00
Give the weights and biases associated to a neural network with the structure above and that implements the function depicted below:
Answer:
$(w_{13}, w_{23}) = (1, 0)$, $b_3 = 1$, $w_{36} = 1$
$(w_{14}, w_{24}) = (1, 0)$, $b_4 = 1$, $w_{46} = -1$
$(w_{15}, w_{25}) = (-1, 0)$, $b_5 = 0$, $w_{56} = 0$
$b_6 = 0$
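A network of sign neurons with two inputs, three hidden neurons and one output can be sketched as follows. The weights below are hypothetical and chosen only for illustration (they make the output +1 on the band $-1 < x_1 < 1$); they are not the function depicted in the exam, which did not survive extraction. The finite-difference helper is also an assumption of this sketch; it illustrates that a sign network is piecewise constant in each weight, so the derivative w.r.t. $w_{13}$ is zero wherever no pre-activation is exactly zero.

```python
import numpy as np

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """Apply a_j = sign(sum_i a_i w_ij + b_j) for the hidden layer, then the output."""
    a_hidden = np.sign(x @ W_hidden + b_hidden)
    return float(np.sign(a_hidden @ w_out + b_out))

# Hypothetical weights: rows are input neurons 1-2, columns are hidden neurons 3-5.
W_hidden = np.array([[1.0, -1.0, 0.0],   # w_13, w_14, w_15
                     [0.0,  0.0, 0.0]])  # w_23, w_24, w_25
b_hidden = np.array([1.0, 1.0, 0.0])     # b_3, b_4, b_5
w_out = np.array([1.0, 1.0, 0.0])        # w_36, w_46, w_56
b_out = -1.0                             # b_6

def d_output_d_w13(x, eps=1e-3):
    """Central finite difference of the network output w.r.t. w_13."""
    W_plus, W_minus = W_hidden.copy(), W_hidden.copy()
    W_plus[0, 0] += eps
    W_minus[0, 0] -= eps
    f_plus = forward(x, W_plus, b_hidden, w_out, b_out)
    f_minus = forward(x, W_minus, b_hidden, w_out, b_out)
    return (f_plus - f_minus) / (2 * eps)

inside = forward(np.array([0.0, 0.0]), W_hidden, b_hidden, w_out, b_out)   # +1.0
outside = forward(np.array([3.0, 0.0]), W_hidden, b_hidden, w_out, b_out)  # -1.0
slope = d_output_d_w13(np.array([3.0, 0.0]))                               # 0.0
```

At $(x_1, x_2) = (3, 0)$ the pre-activation of neuron 3 in this sketch is 4, far from zero, so a small change of $w_{13}$ does not change any sign and the finite difference is exactly 0.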
Question 6
Not answered Marked out of 5.00
Give the derivative of the function implemented by your network w.r.t. the parameter $w_{13}$ when the function is evaluated at $(x_1, x_2) = (3, 0)$.
Information
A kernel $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is positive semi-definite (PSD) if for any sequence of data points $x_1, \dots, x_N \in \mathbb{R}^d$ and real-valued scalars $c_1, \dots, c_N$, the following inequality holds:
$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j k(x_i, x_j) \geq 0$
Furthermore, if a kernel is PSD and symmetric, there exists a feature map $\varphi : \mathbb{R}^d \to \mathcal{H}$ associated to this kernel satisfying for all pairs $x, x'$ the equality $k(x, x') = \langle \varphi(x), \varphi(x') \rangle$.
Consider the kernel $k(x, x') = 2 + \langle x, x' \rangle^2$.
Question 7 Complete
Marked out of 10.00
Rewrite the expression $\sum_i \sum_j c_i c_j k(x_i, x_j)$ as a sum of positive terms, e.g. squared terms. (Note: you just need to write the final form, no need for the intermediate steps.)
Answer: $2 \sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j + \sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j (x + x')^2$
Question 8 Complete
Marked out of 5.00
Without explicitly computing the feature map $\varphi$, express the distance to the origin in feature space, i.e. compute $\|\varphi(x) - 0\|$.
Answer: 2 +
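For the kernel $k(x, x') = 2 + \langle x, x' \rangle^2$, one way to reach a sum of squared terms is to use $\langle x_i, x_j \rangle^2 = \langle x_i x_i^\top, x_j x_j^\top \rangle_F$, which gives $\sum_i \sum_j c_i c_j k(x_i, x_j) = 2 \big(\sum_i c_i\big)^2 + \big\| \sum_i c_i x_i x_i^\top \big\|_F^2$. Likewise, the distance to the origin in feature space follows from $\|\varphi(x)\|^2 = k(x, x)$, i.e. $\|\varphi(x)\| = \sqrt{2 + \|x\|^4}$. The sketch below checks both identities numerically; the random data, the sizes `N` and `d`, and the variable names are assumptions made for the example.

```python
import numpy as np

def k(x, y):
    """Kernel from the exam text: k(x, x') = 2 + <x, x'>^2."""
    return 2.0 + np.dot(x, y) ** 2

rng = np.random.default_rng(0)
N, d = 5, 3                      # hypothetical sizes
X = rng.normal(size=(N, d))      # data points x_1..x_N as rows
c = rng.normal(size=N)           # real-valued scalars c_1..c_N

# Left-hand side: the quadratic form sum_ij c_i c_j k(x_i, x_j).
K = 2.0 + (X @ X.T) ** 2
quadratic_form = c @ K @ c

# Right-hand side: 2 (sum_i c_i)^2 + || sum_i c_i x_i x_i^T ||_F^2.
M = np.einsum("i,ij,ik->jk", c, X, X)
sum_of_squares = 2.0 * c.sum() ** 2 + np.sum(M ** 2)

# Distance to the origin in feature space, without the feature map.
x = X[0]
feature_norm = np.sqrt(k(x, x))
```

Both sides agree up to floating-point error, and the right-hand side is non-negative by construction, which is the PSD property for this kernel.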
Question 9 Complete
Marked out of 5.00
Give a possible feature map associated to the kernel above, for the case where $d = 2$.
Answer: $\varphi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2, \sqrt{2})$
Information
Consider some parameter $\theta$ and an estimator $\hat{\theta}$ of that parameter, based on observations. It is common to analyze such an estimator using a bias-variance analysis:
$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta} - \theta]$
$\mathrm{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$
$\mathrm{Error}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta})$
Let $X_1, \dots, X_N \in \mathbb{R}$ be a sample drawn i.i.d. from a univariate Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We would like to build an estimator of the (unknown) parameter $\mu$ from those observations, and analyze its properties.
Question 10 Complete
Marked out of 5.00
Consider the estimator $\hat{\mu} = \frac{1}{4} \cdot (1 + X_1 + X_3 + X_5)$ of the parameter $\mu$. Give its bias.
Answer: $1/4 + (N/4 - 1) \cdot \mu$
Question 11 Complete
Marked out of 5.00
Give its variance.
Answer: $N/16 \cdot \sigma^2$
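The bias-variance definitions can be applied directly to the estimator $\hat{\mu} = \frac{1}{4}(1 + X_1 + X_3 + X_5)$. By linearity, $E[\hat{\mu}] = \frac{1}{4}(1 + 3\mu)$, so the bias works out to $\frac{1 - \mu}{4}$; by independence of the three samples, the variance is $\frac{3\sigma^2}{16}$, and the error is the squared bias plus the variance. The Monte Carlo sketch below is one way to sanity-check these closed forms; the values of `mu`, `sigma`, `N` and the number of repetitions are arbitrary assumptions.

```python
import numpy as np

def estimator(X):
    """mu_hat = (1 + X_1 + X_3 + X_5) / 4 for each row of X (1-based indices)."""
    return (1.0 + X[:, 0] + X[:, 2] + X[:, 4]) / 4.0

# Hypothetical settings for the simulation.
mu, sigma, N = 2.0, 1.5, 8
rng = np.random.default_rng(0)
X = rng.normal(mu, sigma, size=(200_000, N))   # each row is one sample X_1..X_N
mu_hat = estimator(X)

# Empirical bias, variance and error over the repetitions.
bias_mc = mu_hat.mean() - mu
var_mc = mu_hat.var()
error_mc = np.mean((mu_hat - mu) ** 2)

# Closed forms derived from the definitions.
bias_closed = (1.0 - mu) / 4.0
var_closed = 3.0 * sigma ** 2 / 16.0
error_closed = bias_closed ** 2 + var_closed
```

Note that neither closed form depends on $N$, since the estimator only uses three of the observations.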
Question 12 Complete
Marked out of 5.00
Give its error.
Answer: $[1/4 + (N/4 - 1) \mu]^2 + N/16 \cdot \sigma^2$
Question 13 Complete
Marked out of 5.00
Consider a new data point $X_{N+1}$ drawn from the same distribution, and consider the new estimator $\hat{\mu}^{(\mathrm{new})} = \frac{N}{N+1} \hat{\mu} + \frac{1}{N+1} X_{N+1}$.
Express the bias of this new estimator as a function of […].
Information
In this exercise, we would like to implement the maximum-likelihood method to estimate the best parameter of a data density model $p(x|\theta)$ with respect to some dataset $D = (x_1, \dots, x_N)$, and use that approach to build a classifier. Assuming the data is generated independently and identically distributed (i.i.d.), the dataset likelihood is
$p(D|\theta) = \prod_{k=1}^{N} p(x_k|\theta)$
and the maximum likelihood solution is then computed as
$\hat{\theta} = \arg\max_\theta p(D|\theta) = \arg\max_\theta \log p(D|\theta)$
where the log term can also be expressed as a sum, i.e.
$\log p(D|\theta) = \sum_{k=1}^{N} \log p(x_k|\theta)$.
In your implementation, when possible, you should make use of numpy vector operations in order to avoid loops.
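The reason for working with the sum of log-probabilities rather than the product can be seen in a short sketch. The unit-variance Gaussian density used here is only a stand-in model chosen for the illustration (it is not the exam's model), and the dataset and grid of candidate parameters are likewise assumptions. Broadcasting evaluates all candidates at once, without Python loops.

```python
import numpy as np

def log_density(x, theta):
    """Log of a unit-variance Gaussian density with mean theta (stand-in model)."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (x - theta) ** 2

rng = np.random.default_rng(0)
D = rng.normal(1.0, 1.0, size=2000)        # hypothetical dataset
thetas = np.linspace(-3.0, 3.0, 61)        # hypothetical candidate parameters

# log p(D|theta) for every candidate: broadcasting builds a (61, 2000) array
# of per-point log-densities, which is then summed over the data axis.
log_lik = log_density(D[None, :], thetas[:, None]).sum(axis=1)
theta_hat = thetas[np.argmax(log_lik)]

# The raw product of 2000 densities underflows to 0.0 in float64,
# while the sum of logs above stays finite.
likelihood_product = np.prod(np.exp(log_density(D, theta_hat)))
```

For this stand-in model the maximizer on the grid is the candidate closest to the sample mean, which is the known maximum-likelihood estimate for a Gaussian mean.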
Consider […] and the probability model
$p(x|\theta) = \frac{1}{\ldots} \sum_i \frac{1}{\ldots} \exp\big(-\frac{1}{\ldots} |x - \theta_i|\big)$
Question 14
Not answered Marked out of 10.00
Write a function that takes as input some dataset D given as a numpy array of size […], and some array of parameters THETA of size […]. The function should return an array of size […] containing the corresponding log-probability scores.
Question 15 Complete
Marked out of 10.00
Write a procedure that finds using grid search the parameters $\theta$ that are optimal in the maximum likelihood sense for a given dataset D. The parameters should be searched for on the interval […]. Furthermore, parameters should be constrained to satisfy $\|\theta_1 + \theta_2\| < 5$.
Answer:
import numpy as np
best_logp, best_theta = -np.inf, None  # maximizing, so start from -inf
for i in range(100):
    for j in range(100):
        theta_1, theta_2 = (i - 50) / 10, (j - 50) / 10  # grid values -5.0, -4.9, ..., 4.9
        if abs(theta_1 + theta_2) >= 5:  # constraint violated: skip this point, keep scanning
            continue
        logp = np.sum(np.log(p(D, theta_1, theta_2)))  # p: the model density evaluated on all of D
        if logp > best_logp:  # keep the highest log-likelihood seen so far
            best_logp, best_theta = logp, (theta_1, theta_2)
Question 16 Complete
Marked out of 5.00
Explain in one sentence the problem you would face if applying this grid search procedure when the model has a similar structure but more than two parameters (e.g. 10 parameters or more). Explain in one sentence how the problem can be addressed for the given class of models.
Answer:
1. With 10 or more parameters the grid has exponentially many points (100^10 at this resolution), so grid search takes far too long to find the best result and becomes inefficient. 2. Use random search instead of grid search.
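A vectorized log-probability function of the kind asked for in Question 14, combined with a constrained grid search, could look like the sketch below. Several things here are assumptions, because the constants of the model and the array sizes did not survive extraction: the model is taken to be an equal-weight mixture of two unit-scale Laplace densities, $p(x|\theta) = \frac{1}{2} \sum_{i=1}^{2} \frac{1}{2} \exp(-|x - \theta_i|)$, `D` is assumed to have shape `(N,)`, `THETA` shape `(M, 2)`, and the search interval is assumed to be $[-5, 5]$. The names `logp` and `grid_search` and the synthetic dataset are hypothetical.

```python
import numpy as np

def logp(D, THETA):
    """Log-likelihood of D (shape (N,)) for each parameter pair in THETA (shape (M, 2)).

    Assumed model: p(x|theta) = 1/2 * sum_i 1/2 * exp(-|x - theta_i|).
    Returns an array of shape (M,).
    """
    dist = np.abs(D[None, :, None] - THETA[:, None, :])   # (M, N, 2)
    density = 0.25 * np.exp(-dist).sum(axis=2)            # (M, N)
    return np.log(density).sum(axis=1)                    # (M,)

def grid_search(D, lo=-5.0, hi=5.0, steps=51):
    """Return the grid point with the highest log-likelihood under |theta_1 + theta_2| < 5."""
    grid = np.linspace(lo, hi, steps)
    T1, T2 = np.meshgrid(grid, grid, indexing="ij")
    THETA = np.stack([T1.ravel(), T2.ravel()], axis=1)
    THETA = THETA[np.abs(THETA.sum(axis=1)) < 5.0]        # drop points violating the constraint
    scores = logp(D, THETA)
    return THETA[np.argmax(scores)], scores.max()

# Synthetic data drawn from the assumed model with components at -2 and 1.
rng = np.random.default_rng(0)
D = np.concatenate([rng.laplace(-2.0, 1.0, 300), rng.laplace(1.0, 1.0, 300)])
theta_hat, best_score = grid_search(D)
```

With `n` parameters the `THETA` array in this sketch would hold on the order of `steps ** n` rows, which is why exhaustive evaluation is only practical in low dimensions.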