
UNIVERSITY COLLEGE LONDON
Faculty of Engineering Sciences

BENG0095: Assignment 1

Tuesday 16th November 2021

Guidelines

• Release Date: Tuesday, 16th November 2021

• Due Date: Tuesday, 29th November 2021 at 4.00 p.m.

• Weighting: 25% of module total

• This paper consists of TWO questions. Answer BOTH questions.

• You must format your submission as a pdf using one of the following:

– LaTeX: A template, ‘Assignment1 BENG0095 SolutionTemplate.tex’, is provided
on the module’s Moodle page for this purpose.

– MS Word and its equation editor.

• You must submit your pdf via the module’s Moodle page using the ‘BENG0095 (2021/22)
Individual Coursework 1: Submission’ submission portal.

• You should preface your report with a single page containing, on two lines:

– The module code and assignment title: ‘BENG0095: Assignment 1’

– Your candidate number: ‘Candidate Number: [YOUR NUMBER]’

• Within your report you should begin each sub-question on a new page.

• Please be aware not all questions carry equal marks.

• Marks for each part of each question are indicated in square brackets.

• Unless a question specifies otherwise, please use the Notation section as a guide to the definition of objects.

• Please express your answers as succinctly as possible, remember to detail your working,
and state clearly any assumptions which you make.


Notation & Formulae

Inputs:

x = [1, x_1, x_2, …, x_m]^T ∈ R^{m+1}

Outputs:

y ∈ R for regression problems

y ∈ {0, 1} for binary classification problems

Training Data:

S = {(x^{(i)}, y^{(i)})}_{i=1}^{n}

Input Training Data:

The design matrix, X, is defined as:

X = [x^{(1)}, x^{(2)}, …, x^{(n)}]^T, i.e. the matrix whose i-th row is
x^{(i)T} = [1, x_1^{(i)}, x_2^{(i)}, …, x_m^{(i)}]

Output Training Data:

y = [y^{(1)}, y^{(2)}, …, y^{(n)}]^T

Data-Generating Distribution:

S is drawn i.i.d. from a data-generating distribution, D
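
As an illustration of the notation above, the following is a minimal NumPy sketch of how the design matrix X and output vector y might be assembled from raw attribute values; the array sizes and values are assumptions made purely for the example.

import numpy as np

# Illustrative sizes only: n data points, each with m raw attributes.
rng = np.random.default_rng(0)
n, m = 5, 3
X_raw = rng.normal(size=(n, m))            # raw inputs x_1, ..., x_m for each data point
y = rng.normal(size=n)                     # corresponding outputs y^(1), ..., y^(n)

# Design matrix: prepend a column of ones so that row i is [1, x_1^(i), ..., x_m^(i)].
X = np.hstack([np.ones((n, 1)), X_raw])    # shape (n, m + 1)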


1. In linear regression we generally seek to learn a linear mapping, f_w, characterised by a weight vector, w ∈ R^{m+1}, and drawn from a function class, F:

F = {f_w(x) = w · x | w = [w_0, w_1, …, w_m]^T ∈ R^{m+1}}

Now, consider a data-generating distribution described by a Gaussian additive noise model,
such that:

y = w · x + ε where: ε ∼ N(0, α), α > 0

Here y is the outcome of a random variable, Y, which characterises the output of a particular
data point, and x is the outcome of a random variable, X , which characterises the input to
a particular data point.

Given some sample training data, S (containing n data points), and a novel test point with input x̃, we seek to make statements about the (unknown) output, ỹ, of this novel test point. The statements will be phrased in terms of a prediction, f_w(x̃), of ỹ.
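
For intuition only, the following minimal sketch simulates data from the stated Gaussian additive noise model; the dimension m, the noise variance α, and the weight vector used below are assumptions chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 2, 100, 0.5                          # assumed dimension, sample size, noise variance
w_true = np.array([1.0, -2.0, 0.5])                # assumed weights [w_0, w_1, w_2]

X_raw = rng.normal(size=(n, m))
X = np.hstack([np.ones((n, 1)), X_raw])            # design matrix with leading column of ones
eps = rng.normal(scale=np.sqrt(alpha), size=n)     # ε ~ N(0, α), with α interpreted as the variance
y = X @ w_true + eps                               # y = w · x + ε for each data point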

(a) [4 marks]
Derive the Maximum Likelihood Estimate (MLE) prediction of ỹ.

(b) [6 marks]
Given a prior distribution, p_W(w), over w, such that each instance of w is an outcome of a Gaussian random variable, W, where:

w ∼ N(0, βI_{m+1}) where: β > 0

Derive the Maximum A Posteriori (MAP) prediction of ỹ. (Your answer should be
succinct: Do not explicitly use differentiation in your derivation).
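
For intuition only (this does not form part of the requested derivation), the sketch below draws weight vectors from the stated prior; the values of m and β are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(1)
m, beta = 2, 0.5                                   # assumed dimension and prior variance
# Draw a few samples w ~ N(0, β I_{m+1}).
W_samples = rng.multivariate_normal(mean=np.zeros(m + 1),
                                    cov=beta * np.eye(m + 1),
                                    size=5)        # shape (5, m + 1)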

(c) [10 marks]
Given the same prior distribution as in (b), derive the Bayesian predictive distribution
for the outcome ỹ. In other words, fully characterise: p_Y(ỹ | S, x̃). (Your answer should
be succinct: Use the properties of the marginal and conditional distributions of linear
Gaussian models).
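
For reference, the standard results for linear Gaussian models (stated in generic notation, not tied to the symbols used above; see, e.g., Bishop, Pattern Recognition and Machine Learning, Section 2.3.3) are: if p(z) = N(z | µ, Λ^{-1}) and p(v | z) = N(v | Az + b, L^{-1}), then

p(v) = N(v | Aµ + b, L^{-1} + A Λ^{-1} A^T)
p(z | v) = N(z | Σ{A^T L (v − b) + Λµ}, Σ), where Σ = (Λ + A^T L A)^{-1}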

(d) [5 marks]
As n becomes very large, succinctly describe the relationship between your answers to
parts (a), (b), and (c).


2. Assume that x is the outcome of a random variable, X, that y is the outcome of a random variable, Y, and that (x, y) are drawn i.i.d. from some data-generating process, D, i.e. (x, y) ∼ D.
Here D is characterised by p_{X,Y}(x, y) = p_Y(y | x) p_X(x) for some pmf, p_Y(· | ·), and some pdf, p_X(·).

We evaluate the performance of a prediction function, f, on a particular data point, (x, y), using a loss measure, E(f(x), y).

We define the generalisation loss as:

L(E, D, f) = E_D[E(f(X), Y)]

f∗ is said to be Bayes Optimal if:

f∗ = argmin_f E_D[E(f(X), Y)]
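
As an illustration only, the generalisation loss of a fixed prediction function can be approximated by Monte Carlo sampling from D; the data-generating process, prediction function, and loss in the sketch below are assumptions made purely for the example.

import numpy as np

rng = np.random.default_rng(2)

def sample_D(n):
    # Assumed data-generating process, purely for illustration.
    x = rng.normal(size=n)
    p1 = 1.0 / (1.0 + np.exp(-2.0 * x))   # assumed p_Y(y = 1 | x)
    y = rng.binomial(1, p1)
    return x, y

def f(x):
    return (x > 0).astype(int)            # an arbitrary fixed prediction function

def E(fx, y):
    return (fx != y).astype(float)        # misclassification loss, as an example

x, y = sample_D(100_000)
L_hat = E(f(x), y).mean()                 # Monte Carlo estimate of E_D[E(f(X), Y)]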

(a) [5 marks]
Assuming a binary classification setting in which y ∈ {0, 1} and E(f(x), y) is described
by the following loss matrix:

             y = 0    y = 1
f(x) = 0       0      1,000
f(x) = 1       1        0

Derive the Bayes optimal classifier in this case.

(b) [7 marks]
Assuming a multinomial classification setting in which y ∈ {1, …, k} and E(f(x), y) is
described by the misclassification loss: E(f(x), y) = I[y ≠ f(x)].
Derive the Bayes optimal classifier in this case.

(c) [7 marks]
In a binary classification setting, assume that classes are distributed according to a
Bernoulli random variable, Y, with outcomes y ∼ Bern(θ) and pmf characterised by p_Y(y = 1) = θ. Furthermore, we model the class-conditional probability distributions for the random variable, X, whose outcomes are given by instances of particular input attributes, x ∈ R, as follows (note that we are dealing with a 1-dimensional attribute vector in this case):

x | (y = 0) ∼ Poisson(λ_0) where: λ_0 > 0
x | (y = 1) ∼ Poisson(λ_1) where: λ_1 > 0

Here λ_0 ≠ λ_1.

With the aid of a well-marked sketch, fully characterise the discriminant boundary
between the two classes for each of the following cases:


i) The loss matrix is balanced.

ii) The loss matrix is similar to that in part (a).

[Hint:

If a random variable Z, with outcomes z ∈ {0, 1, 2, …}, is Poisson distributed, then the associated pmf is given by: p_Z(z; λ) = (e^{−λ} λ^z) / z!, for some λ > 0.]
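
For visual intuition only, the two class-conditional pmfs in part (c) could be tabulated (or plotted) numerically; the rate values used below are assumptions chosen purely for illustration.

import numpy as np
from scipy.stats import poisson

lam0, lam1 = 2.0, 6.0                     # assumed rates for class y = 0 and class y = 1
z = np.arange(0, 15)
p_x_given_y0 = poisson.pmf(z, lam0)       # p(x | y = 0)
p_x_given_y1 = poisson.pmf(z, lam1)       # p(x | y = 1)
# Plotting these two pmfs against z shows where one class-conditional probability
# overtakes the other, which is the kind of behaviour the requested sketch should capture.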

(d) [6 marks]
With the aid of a well-marked sketch, and for the case of a balanced loss matrix, compare and contrast the boundary you generated in part (c) with the boundary generated
by an appropriate Gaussian Naive Bayes model for each of the following cases:

i) The class-contingent variances of the Gaussian Naive Bayes model are equal.

ii) The class-contingent variances of the Gaussian Naive Bayes model are unequal.
