https://xkcd.com/2205/
Announcements
Released: Quiz 2, Assignment 2, video assignment
Next three lectures: graphical models
Week 10 Wed: guest lecture
Approximate Inference + GP Classification
● Laplace approximation – in general
● Laplace approximation – Bayesian logistic regression
● GP classification
● Laplace approximation – GP classification
● Connection to neural networks
Readings: Bishop, Chap. 4.4; Chap. 6.4 (6.4.5, part of 6.4.6, 6.4.7); http://gaussianprocess.org/gpml/cha
Laplace approximation in general
[Bishop 4.4]
Goal: find a Gaussian approximation to a probability density defined over a set of continuous variables.
How: find a Gaussian pdf q(z), centred at a mode of the distribution p(z).
● Consider pdf p(z) = f(z) / Z, with Z = ∫ f(z) dz the (possibly unknown) normalisation constant.
● Find a mode z0: f'(z0) = 0.
● Taylor expansion of ln f(z) at z0 (the first-order term vanishes at the mode):
  ln f(z) ≈ ln f(z0) − (A/2)(z − z0)²,  where A = − d²/dz² ln f(z) |_{z=z0}
● Take exp():
  f(z) ≈ f(z0) exp(−(A/2)(z − z0)²)
● Normalise to obtain q(z):
  q(z) = (A/2π)^{1/2} exp(−(A/2)(z − z0)²) = N(z | z0, A⁻¹)
● q Gaussian ⇔ log(q) quadratic in z.
● Assume A > 0, so that z0 is a local maximum.
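The one-dimensional recipe above can be sketched numerically. A minimal NumPy sketch, using as an assumed example the unnormalised density f(z) = exp(−z²/2) σ(20z + 4); the grid-based mode search and finite-difference second derivative are illustrative choices only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ln_f(z):
    # unnormalised log-density: ln f(z) = -z^2/2 + ln sigmoid(20 z + 4)
    return -0.5 * z**2 + np.log(sigmoid(20 * z + 4))

# step 1: find the mode z0 (here by a fine grid search)
zs = np.linspace(-4, 4, 200001)
z0 = zs[np.argmax(ln_f(zs))]

# step 2: A = -d^2/dz^2 ln f(z) at z0, via central finite differences
h = 1e-4
A = -(ln_f(z0 + h) - 2 * ln_f(z0) + ln_f(z0 - h)) / h**2

# step 3: the Laplace approximation q(z) = N(z | z0, A^{-1})
def q(z):
    return np.sqrt(A / (2 * np.pi)) * np.exp(-0.5 * A * (z - z0)**2)
```

Note that Z was never needed: q(z) is normalised by construction, using only local information at z0.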
Laplace approximation in higher dimensions
● Consider f(z) with z ∈ R^M, and z0 a stationary point: ∇ ln f(z) |_{z=z0} = 0.
● Taylor expansion of ln f(z) around z0:
  ln f(z) ≈ ln f(z0) − ½ (z − z0)ᵀ A (z − z0),  where A = − ∇∇ ln f(z) |_{z=z0} is the M×M Hessian.
● Take exp() and normalise, using (2.43):
  q(z) = |A|^{1/2} / (2π)^{M/2} exp( −½ (z − z0)ᵀ A (z − z0) ) = N(z | z0, A⁻¹)
● q(z) is a valid multivariate Gaussian distribution iff A is positive definite → z0 must be a local maximum, not a local minimum or saddle point.
What about f(z) with multiple modes? Then there are different Laplace approximations, one for each choice of the mode z0.
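The multivariate case can be sketched the same way. A minimal NumPy sketch, with a hypothetical single-mode unnormalised log-density and a finite-difference Hessian (both are illustrative assumptions; a real implementation would find the mode by gradient ascent):

```python
import numpy as np

def ln_f(z):
    # hypothetical single-mode unnormalised log-density:
    # a correlated quadratic bump times a soft (log-sigmoid) constraint
    z1, z2 = z
    return -0.5 * (z1**2 + z2**2 + z1 * z2) - np.log(1 + np.exp(-(z1 + 2)))

def neg_hessian(ln_f, z0, h=1e-4):
    """A = -Hessian of ln f at z0, by central finite differences."""
    d = len(z0)
    A = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.eye(d)[i] * h
            e_j = np.eye(d)[j] * h
            A[i, j] = -(ln_f(z0 + e_i + e_j) - ln_f(z0 + e_i - e_j)
                        - ln_f(z0 - e_i + e_j) + ln_f(z0 - e_i - e_j)) / (4 * h**2)
    return A

# crude grid search for the mode z0
g = np.linspace(-3, 3, 301)
vals = np.array([[ln_f(np.array([a, b])) for b in g] for a in g])
i, j = np.unravel_index(np.argmax(vals), vals.shape)
z0 = np.array([g[i], g[j]])

A = neg_hessian(ln_f, z0)
# q(z) = N(z | z0, A^{-1}) is valid iff A is positive definite:
eigvals = np.linalg.eigvalsh(A)
```

Checking the eigenvalues of A verifies that z0 is a local maximum rather than a saddle point.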
Pros + cons
🙂 Normalisation constant Z of f(z) need not be known.
🙂 CLT → posterior increasingly better approximated by a Gaussian as the number of observed data points grows.
🙁 Assumes the domain of z is R, or R^d; may need to transform the r.v.
🙁 Is based purely on f(z) around z0, no global info.
Recap: Logistic regression
● Model: p(t = 1 | φ) = y(φ) = σ(wᵀφ), with σ the logistic sigmoid.
● Negative log-likelihood, or cross-entropy error function, for logistic regression:
  E(w) = −ln p(t | w) = −Σ_{n=1}^{N} { t_n ln y_n + (1 − t_n) ln(1 − y_n) },  with y_n = σ(wᵀφ_n)
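The cross-entropy error can be written in a few lines of NumPy. A minimal sketch (the clipping of y is a numerical-safety choice, not part of the definition):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy(w, Phi, t):
    """E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ], y_n = sigmoid(w^T phi_n)."""
    y = sigmoid(Phi @ w)
    y = np.clip(y, 1e-12, 1 - 1e-12)   # avoid log(0)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))
```

As a sanity check, at w = 0 every y_n = 0.5, so E(w) = N ln 2 regardless of the targets.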
Laplace approximation for Bayesian logistic regression
● Gaussian prior p(w) = N(w | m0, S0); the posterior p(w | t) ∝ p(w) p(t | w) is non-Gaussian.
● Find the mode w_MAP of the posterior, and the negative Hessian of the log posterior at the mode:
  S_N⁻¹ = S0⁻¹ + Σ_{n=1}^{N} y_n (1 − y_n) φ_n φ_nᵀ
● Laplace approximation: q(w) = N(w | w_MAP, S_N).
Example courtesy of Edinburgh MLPR course https://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w8a_bayes_logistic_regression_laplace.pdf
What does the predictive distribution look like?
Figure courtesy of Edinburgh MLPR course https://www.inf.ed.ac.uk/teaching/courses/mlpr/2016/notes/w8a_bayes_logistic_regression_laplace.pdf
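The construction can be sketched in NumPy. A minimal sketch, assuming a zero-mean isotropic prior N(0, α⁻¹I) and plain gradient descent to the MAP (a real implementation would use Newton/IRLS); the step size and iteration count are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def laplace_blr(Phi, t, alpha=1.0, n_iter=200, lr=0.1):
    """Laplace approximation q(w) = N(w_map, S_N) for Bayesian logistic
    regression with prior p(w) = N(0, alpha^{-1} I)."""
    N, D = Phi.shape
    w = np.zeros(D)
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (y - t) + alpha * w   # gradient of neg. log posterior
        w -= lr * grad
    y = sigmoid(Phi @ w)
    # S_N^{-1} = alpha I + sum_n y_n (1 - y_n) phi_n phi_n^T
    S_N_inv = alpha * np.eye(D) + Phi.T @ (Phi * (y * (1 - y))[:, None])
    return w, np.linalg.inv(S_N_inv)
```

At the returned mode the gradient of the negative log posterior should be (near) zero, and S_N⁻¹ is positive definite by construction.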
GP for classification
● Training set: inputs {x1, x2, …, xN}, target variable t = (t1, t2, …, tN)ᵀ, with ti ∈ {0, 1}.
● For regression we placed the GP prior directly on the real-valued function values. For classification, rename the variable: place the GP prior on an intermediate variable a = (a1, a2, …, aN)ᵀ, with
  a(x) ~ GP(0, K)
  and squash it through the sigmoid function: p(t = 1 | a(x)) = σ(a(x)).
● Covariance function of the GP:
  ● Assume it's noise-less,
  ● BUT add a diagonal term ν to ensure that the covariance matrix is positive definite:
    C(x_n, x_m) = k(x_n, x_m) + ν δ_{nm}
● Assume the kernel function k(x, x') is given.
How to do prediction?
● Use Bayes rule:
  p(t_{N+1} = 1 | t) = ∫ p(t_{N+1} = 1 | a_{N+1}) p(a_{N+1} | t) da_{N+1} = ∫ σ(a_{N+1}) p(a_{N+1} | t) da_{N+1}
● Plug in the GP prediction:
  p(a_{N+1} | t) = ∫ p(a_{N+1} | a) p(a | t) da,  with p(a_{N+1} | a) = N(a_{N+1} | kᵀ C_N⁻¹ a, c − kᵀ C_N⁻¹ k)
  (x is "hidden" inside the kernels k and C)
● The posterior p(a | t) for GP classification is non-Gaussian → needs approximation.
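The generative model above can be sketched in NumPy. A minimal sketch; the squared-exponential kernel, the value of ν, and the 1-D input grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# N training inputs on a line; nu adds the diagonal 'jitter' term
X = np.linspace(-3, 3, 20)
nu = 1e-4
C = rbf_kernel(X, X) + nu * np.eye(len(X))   # C_N = K + nu I

# sample the latent function a ~ N(0, C_N), then Bernoulli targets
a = rng.multivariate_normal(np.zeros(len(X)), C)
t = (rng.random(len(X)) < sigmoid(a)).astype(float)
```

Without the ν term the Gram matrix K can be numerically singular for nearby inputs; the jitter guarantees positive definiteness.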
Laplace approximation for GP classification
● Want to approximate the posterior p(a_N | t_N) ∝ p(t_N | a_N) p(a_N).
● Taylor expansion of Ψ(a_N) = ln p(t_N | a_N) + ln p(a_N):
  ∇Ψ = t_N − σ_N − C_N⁻¹ a_N,  −∇∇Ψ = W_N + C_N⁻¹,  where W_N = diag( σ_n (1 − σ_n) )
● Since W_N depends on a_N, the mode a* has no closed form; iterate (Newton's method):
  a_N^new = C_N (I + W_N C_N)⁻¹ { t_N − σ_N + W_N a_N }
● Laplace approximation of the posterior: q(a_N) = N(a_N | a*, (W_N + C_N⁻¹)⁻¹).
● W_N + C_N⁻¹ is positive definite, but the true posterior is non-Gaussian (since W_N depends on a_N).
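The Newton iteration for the mode can be sketched directly. A minimal NumPy sketch of the update a_new = C (I + W C)⁻¹ (t − σ + W a); it omits the numerically stabler Cholesky-based formulation used in practice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gp_laplace_mode(C, t, n_iter=20):
    """Find the mode a* of the GP-classification posterior by Newton's
    method; at a* the gradient t - sigmoid(a*) - C^{-1} a* vanishes."""
    N = len(t)
    a = np.zeros(N)
    for _ in range(n_iter):
        s = sigmoid(a)
        W = np.diag(s * (1 - s))
        a = C @ np.linalg.solve(np.eye(N) + W @ C, t - s + W @ a)
    return a
```

Given a*, the Laplace posterior covariance is (W_N + C_N⁻¹)⁻¹ with W_N evaluated at a*.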
Bringing it back
● Plug q(a_N) into the predictive integral; the latent prediction becomes Gaussian:
  E[a_{N+1} | t_N] = kᵀ(t_N − σ_N),  var[a_{N+1} | t_N] = c − kᵀ(W_N⁻¹ + C_N)⁻¹ k
● Then approximate p(t_{N+1} = 1 | t_N) = ∫ σ(a_{N+1}) p(a_{N+1} | t_N) da_{N+1}, e.g. with the probit approximation to the sigmoid.
Connection to neural nets
● Consider a 2-layer neural network with M hidden units: f(x, w).
● Bayesian neural network: f(x, w), with a prior over the weights w.
● (Neal 1996): for a broad class of prior distributions over w, the distribution of functions generated by a neural network will tend to a Gaussian process when M tends to infinity.
● The covariance function k(x, x') is non-stationary for neural nets with probit and Gaussian activations.
● The weight prior determines the length scales of the neural net function.
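Neal's argument can be illustrated empirically by sampling wide random networks. A minimal sketch; the tanh activation, standard-normal priors, and the 1/√M output scaling are our illustrative choices (the scaling keeps the output variance finite as M grows):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_net_outputs(x, M, n_samples=5000):
    """Sample f(x) = sum_j v_j tanh(u_j x + b_j) / sqrt(M) for many
    independent draws of the Gaussian weights (u, b, v)."""
    u = rng.standard_normal((n_samples, M))
    b = rng.standard_normal((n_samples, M))
    v = rng.standard_normal((n_samples, M))
    h = np.tanh(u * x + b)                 # hidden-unit activations
    return (v * h).sum(axis=1) / np.sqrt(M)

# for large M, the marginal of f(x) at a fixed input approaches a Gaussian
f = random_net_outputs(x=0.5, M=500)
```

Each output is a sum of M i.i.d. terms, so the CLT drives the function values toward a joint Gaussian, i.e. a GP whose covariance is set by the weight priors.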
Gaussian Processes 2