
1 Bayesian Sequential Update (?? marks)
In this section we will explore using Bayesian sequential updating for linear regression.
a) (1 mark) Suppose we estimate a weight vector $w$ from data using a Gaussian prior and a Gaussian likelihood. Write (with appropriate definitions) the prior and posterior for $w$ given $N$ data points. Assume the prior is zero-mean and with diagonal covariance matrix $\frac{1}{\alpha} I$ for scalar $\alpha > 0$.
b) (3 marks) Consider the following data generator, which returns an $(x_n, t_n)$ pair, where $x_n \in \mathbb{R}$ and $t_n \in \mathbb{R}$.
import numpy as np

def sim_one_example():
    '''
    Generate one single (x, t) pair, where x is drawn uniformly
    from [-1, 1), y(x, w) = w0 + w1 * x, and t ~ N(t | y(x, w), sigma^2).
    '''
    w0, w1, sigma = -0.2, 0.8, 0.04
    x = np.random.uniform(-1, 1)
    y = w0 + w1 * x
    t = np.random.normal(y, sigma)
    return x, t
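For illustration only (not part of the assessed question), the generator might be used like this to accumulate a small dataset:

# Draw 5 (x, t) pairs and stack them into arrays.
data = [sim_one_example() for _ in range(5)]
xs = np.array([x for x, t in data])
ts = np.array([t for x, t in data])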
The following illustration contains 8 plots in 3 rows by 3 columns. The columns are the likelihood, the prior/posterior, and 5 samples from the posterior predictive distribution. On the rightmost plot, the data is also shown as circles. The first row depicts the situation before observing any data, and the following two rows depict the situation after observing the first and second (example, label) pair respectively.
[Figure: a 3-row by 3-column grid of plots with columns titled "likelihood", "prior/posterior", and "data space"; weight-space panels have axes $w_0$ and $w_1$, data-space panels have axes $x$ and $y$.]
Discuss the plots, explaining why they make sense. Argue using the concrete example.
c) (1 mark) The figure below shows the plots after observing 5 samples. Identify the most recently added data point in the right-hand plot. Explain your reasoning.
d) (2 marks) Write down the update equations for the mean vector and covariance matrix of the (Gaussian) posterior distribution, when observing a single new example $(x_n, t_n)$.
[Figure for part (c): the same likelihood, prior/posterior, and data space panels after observing 5 data points; weight-space axes $w_0$, $w_1$, data-space axes $x$, $y$.]
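For illustration only (not part of the assessed questions), a minimal sketch of the kind of rank-one update part (d) asks about, assuming a zero-mean Gaussian prior with precision alpha and Gaussian noise with known precision beta = 1/sigma^2; the value of alpha and the function name are ours:

import numpy as np

def posterior_update(m, S, x_n, t_n, beta):
    '''One sequential Bayesian update for the model y = w0 + w1 * x.

    m    : current posterior mean, shape (2,)
    S    : current posterior covariance, shape (2, 2)
    beta : known noise precision 1 / sigma^2
    Returns the updated (mean, covariance).
    '''
    phi = np.array([1.0, x_n])                                   # basis vector [1, x]
    S_new = np.linalg.inv(np.linalg.inv(S) + beta * np.outer(phi, phi))
    m_new = S_new @ (np.linalg.inv(S) @ m + beta * phi * t_n)
    return m_new, S_new

# Start from the prior of part (a): zero mean, covariance (1/alpha) * I.
alpha, beta = 2.0, 1.0 / 0.04**2                 # alpha is a hypothetical choice
m, S = np.zeros(2), np.eye(2) / alpha
m, S = posterior_update(m, S, 0.5, 0.2, beta)    # one illustrative observation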
2 Logistic Regression (?? marks)
In the following questions we consider logistic regression without quadratic regularisation. For the questions which require code, use Python 3 and include comments which specify the input and output formats for your functions. Note that marks are allocated for documentation.

You may assume the following preamble:

import numpy as np
a) (1 mark) For binary logistic regression (as discussed in the lecture and used in the tutorial), how are the labels encoded?
b) (1 mark) There are three possible output types that a logistic regression classifier can produce, corresponding to the linear model output, the probabilistic output and the classification output. The three output types are called decision_function, predict and predict_proba in the scikit-learn interface (in no particular order). Explain (using both text and equations) the three output types.
c) (1 mark) Define and explain the purpose of a confusion matrix (using both text and equations).
d) (1 mark) Write down the mathematical definition of the sigmoid function $y = \sigma(x)$.

e) (1 mark) Write down the mathematical definition of the cost function for binary logistic regression (average cross-entropy). Do not forget to define your notation.

f) (2 marks) Calculate the mathematical form of the gradient (with respect to the model parameters) of the cost function above.
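For illustration only (not part of the assessed questions), a minimal NumPy sketch of one common formulation of the cost in part (e) and the gradient in part (f), assuming labels $t_n \in \{0, 1\}$ and a bias column folded into $X$; the function names are ours:

import numpy as np

def sigmoid(a):
    # Elementwise logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-a))

def cost_and_grad(w, X, t):
    '''Average cross-entropy and its gradient for binary logistic regression.

    w : weights, shape (D,)
    X : inputs, shape (N, D), with a bias column folded in
    t : labels in {0, 1}, shape (N,)
    '''
    y = sigmoid(X @ w)                      # predicted probabilities, shape (N,)
    eps = 1e-12                             # guards against log(0)
    cost = -np.mean(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))
    grad = X.T @ (y - t) / len(t)           # gradient of the average cross-entropy
    return cost, grad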
g) (2 marks) Consider a single (input, target) pair. Draw graphical models which depict

i) the logistic regression model, and
ii) the naive Bayes classifier model.
3 Graphical Models – Comparison (?? marks)
(3 marks) Compare Bayesian networks and Markov random fields by describing three similarities and
three differences.
4 Graphical Models – Alarm (?? marks)
John and Mary live in a house which is equipped with a burglar alarm system. The situation can be described by the following Bayesian network with the random variables B (a Burglar entered the house), E (an Earthquake occurred), A (the Alarm in the house rings), R (Radio news report about an earthquake), J (John calls the police), and M (Mary, who listens to the radio, calls the police).
The domain of each random variable is $\mathbb{B} = \{0, 1\}$, encoding False (= 0) and True (= 1).

[Figure: the Bayesian network over the nodes B, E (top row), A, R (middle row), and J, M (bottom row).]
(a) (1 mark) Write out the joint distribution p(B,E,A,R,J,M) in its factored form according to this Bayesian network structure.
Express the answers to the following three queries only in terms of marginalisations $\sum_{X \in \mathcal{X}}$ and maximisations $\operatorname{arg\,max}_{X \in \mathcal{X}}$, where each $X$ is a random variable, and $\mathcal{X}$ the corresponding set of possible values $X$ can take. For example, the following identity is acceptable:

$$\sum_{B, A, R, J, M \in \mathbb{B}} p(B, E = 1, A, R, J, M) = 1$$
Wherever possible, simplify your expressions by exploiting the structure of the above graphical model.
(b) (1 mark) The probability that the alarm will ring given no observations.
(c) (1 mark) The joint probability that Mary called the police and John did not when there was an earthquake observed.
(d) (2 marks) The most probable values for the tuple (E, B) if both John and Mary called the police and at least one of the events E, B did happen.
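For illustration only (not part of the assessed questions): if the full joint were available numerically as a 2 × 2 × 2 × 2 × 2 × 2 array joint[B, E, A, R, J, M] (a hypothetical representation; here filled with random numbers purely so the snippet runs), query (b) reduces to summing out every variable except A:

import numpy as np

# Hypothetical joint table p(B, E, A, R, J, M), normalised to sum to one.
joint = np.random.rand(2, 2, 2, 2, 2, 2)
joint /= joint.sum()

# Query (b), p(A = 1): marginalise B, E, R, J, M out of the joint.
p_alarm = joint.sum(axis=(0, 1, 3, 4, 5))[1]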
(e) (2 marks) Write down all conditional independence relations in the Bayesian network when only A is observed.
(f) (2 marks) Write down all conditional independence relations in the Bayesian network when only E is observed.
5 Sampling – Gaussian Mixture (?? marks)

a) (1 mark) Define the setup for and the goal of rejection sampling.

b) (2 marks) Write down all steps of the rejection sampling algorithm. You are encouraged to provide precise mathematical formulations and/or pseudo-code where appropriate, and also to explain your answer.

c) (1 mark) What are some limitations of rejection sampling?

d) (1 mark) Define the setup for and the goal of ancestral sampling.

e) (2 marks) Write down all steps of the ancestral sampling algorithm. You are encouraged to provide precise mathematical formulations and/or pseudo-code where appropriate, and also to explain your answer.

f) (1 mark) What are some limitations of ancestral sampling?

g) Consider the mixture of three Gaussians,

$$p(x) = \frac{3}{10}\,\mathcal{N}(x \mid 5, 0.5) + \frac{3}{10}\,\mathcal{N}(x \mid 9, 2) + \frac{4}{10}\,\mathcal{N}(x \mid 2, 20),$$

where $\mathcal{N}(x \mid \mu, \sigma)$ is a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$.

g1) (1 mark) Discuss which sampling method (rejection or ancestral) is more appropriate for the above distribution, and why.

g2) (1 mark) Derive, explain and write Python code to draw 1000 samples from the given distribution using whichever of the two methods you deem most appropriate. Assume that your code starts with

import numpy as np
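For illustration only (not part of the assessed questions), a minimal sketch of drawing from a mixture like the one above by first sampling a component and then its Gaussian; whether this is the more appropriate of the two methods is exactly what g1 asks you to argue:

import numpy as np

def sample_mixture(n_samples):
    '''Draw n_samples from the three-component Gaussian mixture above.'''
    weights = np.array([0.3, 0.3, 0.4])
    means = np.array([5.0, 9.0, 2.0])
    stds = np.array([0.5, 2.0, 20.0])
    # Step 1: pick a component index for each sample.
    k = np.random.choice(3, size=n_samples, p=weights)
    # Step 2: sample from the chosen component's Gaussian.
    return np.random.normal(means[k], stds[k])

samples = sample_mixture(1000)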
6 Principal Component Analysis (?? marks)
(3 marks) Given is a set of data points $x_i \in \mathbb{R}^D$, $i = 1, \ldots, N$. Project all data points onto a one-dimensional hyperplane. Derive the formula for the unit vector $u$ representing the hyperplane in which the projected data points $x_i$ have the largest variance.

(1 mark) Discuss under which circumstances $u$ is not uniquely defined.
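For illustration only (not part of the assessed questions): the derivation leads to the leading eigenvector of the sample covariance, which can be checked numerically; a sketch assuming data in an N × D array X:

import numpy as np

def max_variance_direction(X):
    '''Unit vector u along which the projected data has the largest variance.

    X : data matrix, shape (N, D), one data point per row.
    '''
    Xc = X - X.mean(axis=0)                 # centre the data
    cov = Xc.T @ Xc / len(X)                # sample covariance, shape (D, D)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest eigenvalue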
7 Expectation Maximisation (EM) (?? marks)

Consider the Bernoulli mixture model defined by

$$p(z \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_k}$$

$$p(x \mid z, \mu_1, \mu_2, \ldots, \mu_K) = \prod_{k=1}^{K} p(x \mid \mu_k)^{z_k}$$

$$p(x \mid \mu) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}$$

where $x \in \{0,1\}^D$, $z \in \{0,1\}^K$, $\pi \in [0,1]^K$, $\mu_k \in [0,1]^D$, and exactly one element of $z$ is equal to one, and the rest are equal to zero. Assume we are given $N$ observations $x_1, x_2, \ldots, x_N$, which are independently and identically distributed as $x$ above.
a) (1 mark) Describe the setup for and the goal of EM.
b) (1 mark) Draw the graphical model using plate notation, shading the observed variable(s).
c) (1 mark) Derive $p(x \mid \mu, \pi)$.

d) (2 marks) Derive $\log p(X, Z \mid \pi, \mu_1, \mu_2, \ldots, \mu_K)$. $X$ is an $N \times D$ matrix of observations. $Z$ is an $N \times K$ matrix of the corresponding latent variables, a priori distributed as $z$ above.

e) (1 mark) State the criterion optimised by the maximisation step of the EM algorithm for inferring $\pi, \mu_1, \mu_2, \ldots, \mu_K$ given $X$.

f) (6 marks) Derive EM updates for $\pi, \mu_1, \mu_2, \ldots, \mu_K$ given $X$ in readily implementable form.

g) (2 marks) Give two different solutions to which the EM algorithm may converge, for the case $N = 2$, $K = 2$ with $x_1 \neq x_2$.
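For illustration only (not part of the assessed questions), a hedged sketch of the E-step responsibilities for this model, computed in log space for numerical stability; the M-step of part (f) is left to the derivation, and the variable names are ours:

import numpy as np

def responsibilities(X, pi, mu):
    '''E-step for the Bernoulli mixture.

    X  : observations, shape (N, D), entries in {0, 1}
    pi : mixing coefficients, shape (K,)
    mu : Bernoulli parameters, shape (K, D)
    Returns gamma, shape (N, K), with rows summing to one.
    '''
    eps = 1e-12
    # log p(x_n | mu_k) for every (n, k) pair, summed over the D dimensions.
    log_px = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T
    log_w = np.log(pi + eps) + log_px            # unnormalised log responsibilities
    log_w -= log_w.max(axis=1, keepdims=True)    # stabilise before exponentiating
    gamma = np.exp(log_w)
    return gamma / gamma.sum(axis=1, keepdims=True)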
8 Local quadratic approximation (?? marks)
(4 marks) Given is a smooth and twice differentiable function $E(w)$ mapping vectors $w \in \mathbb{R}^D$ to $\mathbb{R}$. Show that in the vicinity of a critical (or stationary) point $w^\star$ of $E$, the function can be approximated by a quadratic form

$$E(w) \approx E(w^\star) + \frac{1}{2} \sum_{i=1}^{D} \lambda_i \alpha_i^2,$$

where $\lambda_i$ and $u_i$ are the $D$ eigenvalues and unit-norm eigenvectors of the Hessian of $E$ at the critical point $w^\star$, respectively. Each $\alpha_i$ measures the distance between $w$ and $w^\star$ in the direction of the eigenvector $u_i$:

$$u_i^T (w - w^\star) = \alpha_i.$$

Hint: Use the Taylor expansion of $E$.
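For illustration only (not part of the assessed question), a small numerical sanity check of the approximation for the concrete choice $E(w) = \cos(w_1) + \cos(w_2)$, which has a critical point at $w^\star = 0$ with Hessian $-I$ (this example is ours):

import numpy as np

def E(w):
    return np.cos(w[0]) + np.cos(w[1])

w_star = np.zeros(2)                       # critical point of E
H = -np.eye(2)                             # Hessian of E at w_star
lam, U = np.linalg.eigh(H)                 # eigenvalues and unit-norm eigenvectors

w = w_star + np.array([0.01, -0.02])       # a point near the critical point
alpha = U.T @ (w - w_star)                 # distances along the eigenvectors
approx = E(w_star) + 0.5 * np.sum(lam * alpha**2)

print(E(w), approx)                        # agree closely near w_star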