COMP90051 StatML 2021S2 Q&A 09 – Solutions
October 2, 2021
Exercise 1: In Lecture 18 Slide 18, we claimed without proof a closed-form posterior for Bayesian
linear regression: $p(w \mid X, y, \sigma^2) = \mathcal{N}(w \mid w_N, V_N)$ where
\begin{align*}
w_N &= \frac{1}{\sigma^2} V_N X^T y \\
V_N &= \sigma^2 \left( X^T X + \frac{\sigma^2}{\gamma^2} I \right)^{-1} .
\end{align*}
Show how this is derived.
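To see concretely what these formulas compute, here is a minimal numerical sketch (NumPy, with synthetic data; the variable names X, y, sigma2, gamma2 and the particular values are our own choices, not from the lecture). Note in passing that $w_N$ coincides with the ridge-regression solution with regularisation strength $\lambda = \sigma^2/\gamma^2$.

import numpy as np

# Synthetic regression data with known noise variance sigma2 and prior variance gamma2
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma2, gamma2 = 0.25, 4.0
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Closed-form posterior parameters, exactly as stated in the exercise
V_N = sigma2 * np.linalg.inv(X.T @ X + (sigma2 / gamma2) * np.eye(d))
w_N = (1.0 / sigma2) * V_N @ (X.T @ y)

# w_N equals the ridge estimate with lambda = sigma2 / gamma2
w_ridge = np.linalg.solve(X.T @ X + (sigma2 / gamma2) * np.eye(d), X.T @ y)
print(np.allclose(w_N, w_ridge))   # True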
First recall that
\[
p(w \mid X, y, \sigma^2) = \frac{p(w)\, p(y \mid X, \sigma^2, w)}{p(y \mid X, \sigma^2)} .
\]
Now since the denominator does not involve the variable $w$ (it is a constant with respect to this variable), we can ignore it: we're hunting for a posterior taking a Gaussian form, so we only need to recognise the parameters of that final posterior Gaussian distribution in order to fill the constant back in later (since we know how to normalise a Gaussian given its parameters). When we're working with probability densities, any multiplicative constant can be ignored in this way.
While each of these Gaussian distributions is fine to work with directly, taking logarithms of both sides will make things a little easier, as it eliminates the exponential terms of the Gaussians (less to write per step!). One thing to remember is that ignorable constants will now be additive, no longer multiplicative. For example, if $z \sim \mathcal{N}(\mu, \Sigma)$, then
\[
\log p(z) = -\frac{1}{2} (z - \mu)^T \Sigma^{-1} (z - \mu) + C ,
\]
where $C$ just means some constant (containing no $z$ at all) whose exact value we don't care about.
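To make this additive-constant point concrete, here is a minimal sketch (NumPy and scipy.stats.multivariate_normal; the particular $\mu$ and $\Sigma$ values are arbitrary choices of ours) checking that the full log density and the quadratic form differ only by a $z$-independent constant.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def quadratic_part(z):
    # -1/2 (z - mu)^T Sigma^{-1} (z - mu): log p(z) with its constant dropped
    diff = z - mu
    return -0.5 * diff @ Sigma_inv @ diff

# The gap between the exact log density and the quadratic part is the same C for every z
zs = rng.normal(size=(5, 2))
gaps = [multivariate_normal.logpdf(z, mean=mu, cov=Sigma) - quadratic_part(z) for z in zs]
print(np.allclose(gaps, gaps[0]))   # True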
Returning to our problem:
\begin{align*}
\log p(w \mid X, y, \sigma^2) &= \log p(w) + \log p(y \mid X, \sigma^2, w) - \log p(y \mid X, \sigma^2) \\
&= \log p(w) + \log p(y \mid X, \sigma^2, w) + C .
\end{align*}
As promised, the last term of the first equality does not depend on $w$, so we treat it as a constant.
Now plugging in the Gaussians from Lecture 18:
\begin{align*}
\log p(w) &= -\frac{1}{2} w^T \left( \frac{1}{\gamma^2} I \right) w + C \\
\log p(y \mid X, \sigma^2, w) &= -\frac{1}{2\sigma^2} \| Xw - y \|^2 + C \\
&= -\frac{1}{2\sigma^2} \left( w^T X^T X w - 2 y^T X w + y^T y \right) + C \\
&= -\frac{1}{2\sigma^2} \left( w^T X^T X w - 2 y^T X w \right) + C .
\end{align*}
Note that $y^T y$ is another constant, which was absorbed into $C$ in the last equality. Now adding these together (and combining constants further) yields the posterior
\begin{align*}
\log p(w \mid X, y, \sigma^2) &= -\frac{1}{2} w^T \left( \frac{1}{\gamma^2} I \right) w - \frac{1}{2\sigma^2} \left( w^T X^T X w - 2 y^T X w \right) + C \\
&= -\frac{1}{2} w^T \left( \frac{1}{\gamma^2} I + \frac{1}{\sigma^2} X^T X \right) w + \frac{1}{2} \left( \frac{2}{\sigma^2} y^T X w \right) + C \\
&= -\frac{1}{2} w^T V_N^{-1} w + \frac{1}{2} \left( \frac{2}{\sigma^2} y^T X w \right) + C ,
\end{align*}
where in the second equality we've collected the terms quadratic in $w$ in order to move closer to the typical log-Gaussian form we're trying to match against; and in the third equality we've plugged in the definition of the posterior covariance matrix given to us in the exercise question, since $V_N^{-1} = \frac{1}{\gamma^2} I + \frac{1}{\sigma^2} X^T X$ matches our quadratic term exactly.
Let's now work on that linear term by just rearranging it a little, until we have our target $w_N$ present:
\begin{align*}
\frac{1}{\sigma^2} y^T X w &= \left( \frac{1}{\sigma^2} X^T y \right)^T w \\
&= \left( \frac{1}{\sigma^2} V_N X^T y \right)^T V_N^{-1} w \\
&= w_N^T V_N^{-1} w .
\end{align*}
So our posterior is now:
\begin{align*}
\log p(w \mid X, y, \sigma^2) &= -\frac{1}{2} \left( w^T V_N^{-1} w - 2 w_N^T V_N^{-1} w \right) + C \\
&= -\frac{1}{2} \left( w^T V_N^{-1} w - 2 w_N^T V_N^{-1} w + w_N^T V_N^{-1} w_N - w_N^T V_N^{-1} w_N \right) + C \\
&= -\frac{1}{2} \left( w^T V_N^{-1} w - 2 w_N^T V_N^{-1} w + w_N^T V_N^{-1} w_N \right) + \frac{1}{2} w_N^T V_N^{-1} w_N + C \\
&= -\frac{1}{2} (w - w_N)^T V_N^{-1} (w - w_N) + \frac{1}{2} w_N^T V_N^{-1} w_N + C \\
&= -\frac{1}{2} (w - w_N)^T V_N^{-1} (w - w_N) + C ,
\end{align*}
where equality (2) follows from adding and subtracting the term $w_N^T V_N^{-1} w_N$; (3) follows from moving the second of these terms outside the bracket; (4) follows from recognising that the big bracket now contains a perfect square; and (5) follows from recognising that the last remaining extra term is also a constant containing no $w$. This completes the derivation, since we've arrived at the desired log-Gaussian with the target parameters $w_N$ and $V_N$.
What did we do in the last steps? We completed the square! Just as when you want to factor a scalar quadratic like $x^2 + x + 3$: you look at the linear term and add/subtract the square of half its coefficient, e.g. $(x^2 + x + 0.25) + 3 - 0.25 = (x + 0.5)^2 + 3 - 0.25$, giving a perfect-square term plus constants (i.e., folding the linear and quadratic terms in together).
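Finally, as a sanity check on the algebra, here is a minimal numerical sketch (NumPy, synthetic data as in the earlier snippet; all names and values are our own) verifying that the unnormalised log posterior $\log p(w) + \log p(y \mid X, \sigma^2, w)$ and the derived quadratic form $-\frac{1}{2}(w - w_N)^T V_N^{-1} (w - w_N)$ differ by the same constant for every $w$.

import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma2, gamma2 = 0.25, 4.0
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

V_N = sigma2 * np.linalg.inv(X.T @ X + (sigma2 / gamma2) * np.eye(d))
w_N = (1.0 / sigma2) * V_N @ (X.T @ y)
V_N_inv = np.linalg.inv(V_N)

def unnormalised_log_posterior(w):
    # log prior + log likelihood, dropping every w-independent constant
    log_prior = -0.5 * (w @ w) / gamma2
    log_lik = -0.5 * np.sum((X @ w - y) ** 2) / sigma2
    return log_prior + log_lik

def derived_quadratic_form(w):
    # -1/2 (w - w_N)^T V_N^{-1} (w - w_N), the form we derived above
    diff = w - w_N
    return -0.5 * diff @ V_N_inv @ diff

ws = rng.normal(size=(5, d))
gaps = [unnormalised_log_posterior(w) - derived_quadratic_form(w) for w in ws]
print(np.allclose(gaps, gaps[0]))   # True: they differ only by a constant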