UNIVERSITY COLLEGE LONDON
Faculty of Engineering Sciences
BENG0095: Assignment 1
Tuesday 16th November 2021
Guidelines
• Release Date: Tuesday, 16th November 2021
• Due Date: Monday, 29th November 2021 at 4.00 p.m.
• Weighting: 25% of module total
• This paper consists of TWO questions. Answer BOTH questions.
• You must format your submission as a pdf using one of the following:
– LaTeX: A template, ‘Assignment1 BENG0095 SolutionTemplate.tex’, is provided
on the module’s Moodle page for this purpose.
– MS Word and its equation editor.
• You must submit your pdf via the module’s Moodle page using the ‘BENG0095 (2021/22)
Individual Coursework 1: Submission’ submission portal.
• You should preface your report with a single page containing, on two lines:
– The module code and assignment title: ‘BENG0095: Assignment 1’
– Your candidate number: ‘Candidate Number: [YOUR NUMBER]’
• Within your report you should begin each sub-question on a new page.
• Please be aware that not all questions carry equal marks.
• Marks for each part of each question are indicated in square brackets.
• Unless a question specifies otherwise, please use the Notation section as a guide to the
definition of objects.
• Please express your answers as succinctly as possible, detail your working, and state
clearly any assumptions you make.
Notation & Formulae
Inputs:
$\mathbf{x} = [1, x_1, x_2, \ldots, x_m]^{T} \in \mathbb{R}^{m+1}$
Outputs:
y ∈ R for regression problems
y ∈ {0, 1} for binary classification problems
Training Data:
$\mathcal{S} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$
Input Training Data:
The design matrix, X, is defined as:
\[
X = \begin{bmatrix} \mathbf{x}^{(1)\,T} \\ \mathbf{x}^{(2)\,T} \\ \vdots \\ \mathbf{x}^{(n)\,T} \end{bmatrix}
  = \begin{bmatrix}
      1 & x^{(1)}_{1} & \cdots & x^{(1)}_{m} \\
      1 & x^{(2)}_{1} & \cdots & x^{(2)}_{m} \\
      \vdots & \vdots & \ddots & \vdots \\
      1 & x^{(n)}_{1} & \cdots & x^{(n)}_{m}
    \end{bmatrix}
\]
Output Training Data:
\[
\mathbf{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}
\]
Data-Generating Distribution:
S is drawn i.i.d. from a data-generating distribution, D
1. In linear regression we generally seek to learn a linear mapping, $f_{\mathbf{w}}$, characterised by a
weight vector, $\mathbf{w} \in \mathbb{R}^{m+1}$, and drawn from a function class, $\mathcal{F}$:
\[
\mathcal{F} = \left\{ f_{\mathbf{w}}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} \;\middle|\; \mathbf{w} = [w_0, w_1, \ldots, w_m]^{T} \in \mathbb{R}^{m+1} \right\}
\]
Now, consider a data-generating distribution described by a Gaussian additive noise model,
such that:
\[
y = \mathbf{w} \cdot \mathbf{x} + \varepsilon \qquad \text{where: } \varepsilon \sim \mathcal{N}(0, \alpha), \; \alpha > 0
\]
Here y is the outcome of a random variable, Y, which characterises the output of a particular
data point, and x is the outcome of a random variable, X , which characterises the input to
a particular data point.
Given some sample training data, S, (containing n data points), and a novel test point
with input x̃, we seek to make statements about the (unknown) output, ỹ, of this novel test
point. The statements will be phrased in terms of a prediction, fw(x̃), of ỹ.
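For concreteness, the following minimal NumPy sketch samples a training set from this noise model. All numerical values (m, n, α, the weight vector, and the test input) are arbitrary illustrative choices, and α is interpreted as the noise variance:

import numpy as np

# Sketch of the data-generating model y = w . x + eps, eps ~ N(0, alpha).
rng = np.random.default_rng(0)
m, n, alpha = 2, 200, 0.5                                    # dimension, sample size, noise variance
w_true = np.array([1.0, -2.0, 0.5])                          # [w0, w1, ..., wm]

X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])    # design matrix, n x (m+1)
y = X @ w_true + rng.normal(scale=np.sqrt(alpha), size=n)    # additive Gaussian noise
x_new = np.array([1.0, 0.3, -1.2])                           # a novel test input x~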
(a) [4 marks]
Derive the Maximum Likelihood Estimate (MLE) prediction of ỹ.
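As a numerical sanity check on such a derivation (not a substitute for it): under Gaussian additive noise, maximising the likelihood over w reduces to ordinary least squares, a standard result. Continuing the illustrative sketch above:

# MLE under Gaussian noise coincides with ordinary least squares.
w_mle = np.linalg.lstsq(X, y, rcond=None)[0]   # maximiser of the likelihood
y_mle = w_mle @ x_new                          # MLE prediction of y~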
(b) [6 marks]
Given a prior distribution, pW(w), over w, such that each instance of w is an outcome
of a Gaussian random variable, W , where:
\[
\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \beta I_{m+1}) \qquad \text{where: } \beta > 0
\]
Derive the Maximum A Posteriori (MAP) prediction of ỹ. (Your answer should be
succinct: Do not explicitly use differentiation in your derivation).
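For orientation, a standard result (again, not a substitute for the requested derivation) is that this zero-mean isotropic Gaussian prior makes the MAP estimate a ridge-regression solution with regulariser α/β. Continuing the sketch above, with β an illustrative choice:

# MAP under the isotropic Gaussian prior: ridge regression with strength alpha / beta.
beta = 1.0
A = X.T @ X + (alpha / beta) * np.eye(m + 1)
w_map = np.linalg.solve(A, X.T @ y)
y_map = w_map @ x_new                          # MAP prediction of y~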
(c) [10 marks]
Given the same prior distribution as in (b), derive the Bayesian predictive distribution
for the outcome ỹ. In other words, fully characterise pY(ỹ | S, x̃). (Your answer should
be succinct: Use the properties of the marginal and conditional distributions of linear
Gaussian models).
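The linear-Gaussian properties referred to here yield a closed-form Gaussian predictive distribution. A numerical sketch of that standard form, continuing the variables above:

# Posterior over w is Gaussian, w | S ~ N(m_N, S_N); the predictive for y~
# adds the observation noise to the model uncertainty.
S_N = np.linalg.inv(np.eye(m + 1) / beta + X.T @ X / alpha)  # posterior covariance
m_N = S_N @ X.T @ y / alpha                                  # posterior mean
mu_pred = m_N @ x_new                        # predictive mean of y~
var_pred = alpha + x_new @ S_N @ x_new       # predictive variance of y~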
(d) [5 marks]
As n becomes very large, succinctly describe the relationship between your answers to
parts (a), (b), and (c).
2. Assume that x is the outcome of a random variable, X , y is the outcome of a random
variable, Y, and that (x, y) are drawn i.i.d. from some data-generating process, D, i.e.
(x, y) ∼ D.
Here D is characterised by pX ,Y(x, y) = pY(y|x)pX (x) for some pmf, pY(·|·), and some pdf,
pX (·).
We evaluate the performance of a prediction function, f , on a particular data point, (x, y),
using a loss measure, E(f(x), y).
We define the generalisation loss as:
\[
L(\mathcal{E}, \mathcal{D}, f) = \mathbb{E}_{\mathcal{D}}\big[\mathcal{E}(f(\mathcal{X}), \mathcal{Y})\big]
\]
f∗ is said to be Bayes Optimal if:
\[
f^{*} = \operatorname*{argmin}_{f}\; \mathbb{E}_{\mathcal{D}}\big[\mathcal{E}(f(\mathcal{X}), \mathcal{Y})\big]
\]
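A standard step, often left implicit, that may help structure parts (a) and (b): by iterated expectation the generalisation loss decomposes pointwise in x, so the Bayes optimal prediction can be found separately at each input:
\[
\mathbb{E}_{\mathcal{D}}\big[\mathcal{E}(f(\mathcal{X}), \mathcal{Y})\big]
= \mathbb{E}_{\mathcal{X}}\Big[\mathbb{E}_{\mathcal{Y} \mid \mathcal{X}}\big[\mathcal{E}(f(x), y) \mid x\big]\Big],
\qquad
f^{*}(x) = \operatorname*{argmin}_{a}\; \mathbb{E}_{\mathcal{Y} \mid \mathcal{X}}\big[\mathcal{E}(a, y) \mid x\big].
\]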
(a) [5 marks]
Assuming a binary classification setting in which y ∈ {0, 1} and E(f(x), y) is described
by the following loss matrix:
\[
\begin{array}{c|cc}
\mathcal{E}(f(x), y) & y = 0 & y = 1 \\ \hline
f(x) = 0 & 0 & 1000 \\
f(x) = 1 & 1 & 0
\end{array}
\]
Derive the Bayes optimal classifier in this case.
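The derivation reduces to comparing the expected loss of the two candidate actions at each x. A self-contained numerical sketch of that comparison, where p1 is a hypothetical posterior value supplied by the caller:

def bayes_action(p1: float) -> int:
    """Return the loss-minimising action at a point x, where p1 = p(y=1|x).
    Under the loss matrix above:
      expected loss of predicting 0 is 1000 * p1,
      expected loss of predicting 1 is 1 * (1 - p1)."""
    return 1 if 1000.0 * p1 > 1.0 - p1 else 0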
(b) [7 marks]
Assuming a multinomial classification setting in which y ∈ {1, …, k} and E(f(x), y) is
described by the misclassification loss: E(f(x), y) = I[y ≠ f(x)].
Derive the Bayes optimal classifier in this case.
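A self-contained sketch of the decision rule this derivation should arrive at, using the fact that under 0-1 loss the expected loss of predicting class a at x is 1 − p(y = a | x):

import numpy as np

def bayes_action_01(posterior: np.ndarray) -> int:
    """posterior[j] = p(y = j + 1 | x) for classes 1..k. Expected 0-1 loss
    is minimised by the posterior mode."""
    return int(np.argmax(posterior)) + 1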
(c) [7 marks]
In a binary classification setting, assume that classes are distributed according to a
Bernoulli random variable, Y, with outcomes y ∼ Bern(θ), with pmf characterised by
pY(y = 1) = θ. Furthermore, we model the class-conditional probability distributions
for the random variable, X, whose outcomes are given by instances of a particular input
attribute, $\mathbf{x} = x \in \mathbb{R}$ (note that we are dealing with a 1-dimensional attribute
vector in this case):
\[
x \mid (y = 0) \sim \text{Poisson}(\lambda_0), \qquad x \mid (y = 1) \sim \text{Poisson}(\lambda_1), \qquad \text{where: } \lambda_0, \lambda_1 > 0
\]
Here λ0 ≠ λ1.
With the aid of a well-marked sketch, fully characterise the discriminant boundary
between the two classes for each of the following cases:
i) The loss matrix is balanced.
ii) The loss matrix is similar to that in part (a).
[Hint:
If a random variable Z, with outcomes z ∈ {0, 1, 2, …}, is Poisson distributed then the
associated pmf is given by:
\[
p_{\mathcal{Z}}(z; \lambda) = \frac{e^{-\lambda} \lambda^{z}}{z!},
\]
for some λ > 0.]
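For intuition when constructing the sketch in part (c): equating the two class posteriors on the log scale cancels the x! terms and leaves a single threshold in x. A self-contained numerical sketch, with θ, λ0, λ1 as arbitrary illustrative values:

import numpy as np

theta, lam0, lam1 = 0.5, 2.0, 6.0   # hypothetical prior and Poisson rates

def log_posterior_odds(x):
    """log p(y=1|x) - log p(y=0|x), up to the shared evidence term;
    the log(x!) contributions cancel between the two classes."""
    return np.log(theta / (1 - theta)) + (lam0 - lam1) + x * np.log(lam1 / lam0)

# Balanced loss: the boundary sits where the log-odds cross zero.
x_star = (lam1 - lam0 + np.log((1 - theta) / theta)) / np.log(lam1 / lam0)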
(d) [6 marks]
With the aid of a well-marked sketch, and for the case of a balanced loss matrix,
compare and contrast the boundary you generated in part (c) with the boundary
generated by an appropriate Gaussian Naive Bayes model for each of the following cases:
i) The class-contingent variances of the Gaussian Naive Bayes model are equal.
ii) The class-contingent variances of the Gaussian Naive Bayes model are unequal.
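For part (d), the comparison ultimately rests on the shape of the Gaussian Naive Bayes log posterior-odds, which is quadratic in x in general and collapses to a linear threshold when the two class variances coincide. A self-contained sketch, with all values arbitrary illustrative choices:

import numpy as np

theta = 0.5                # hypothetical class prior p(y = 1)
mu0, mu1 = 2.0, 6.0        # hypothetical class-conditional means
s0, s1 = 1.0, 1.0          # equal for case i); set s0 != s1 for case ii)

def log_posterior_odds_gnb(x):
    """log p(y=1|x) - log p(y=0|x) for 1-D Gaussian class conditionals;
    the x**2 terms cancel only when s0 == s1, leaving a linear boundary."""
    return (np.log(theta / (1 - theta)) + np.log(s0 / s1)
            - 0.5 * ((x - mu1) ** 2 / s1 ** 2 - (x - mu0) ** 2 / s0 ** 2))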