Stat314/461 Term 4: Simple simulation methods for non-standard densities: Importance sampling

September, 2021
Reminder: Basic problem of Bayesian computation
$$p(\theta \mid Y_{obs}) = \frac{p(Y_{obs} \mid \theta)\, p(\theta)}{\int p(Y_{obs} \mid \theta)\, p(\theta)\, d\theta} \tag{1}$$

- We specify the prior p(θ).
- We derive the likelihood p(Y_obs|θ) from our data model.
- Hence we can always write an expression for the numerator of (1).
- But to do anything useful with (1) we need to be able to integrate the numerator, and this may be difficult to do explicitly in realistic multi-parameter, non-conjugate problems.
- Modern Bayesian computation makes extensive use of Monte Carlo methods to simulate from the posterior distribution.
Importance sampling
- Suppose p(θ|Y_obs) ∝ p(Y_obs|θ)p(θ) does not correspond to a standard distribution, so we cannot simulate directly from it and cannot easily integrate it.
- We can still compute the unnormalised posterior q(θ|Y_obs) = p(Y_obs|θ)p(θ).
- We can easily simulate from an approximation g(θ).
- Suppose we would like to compute E(h(θ)|Y_obs) for some function h(θ), e.g. h(θ) = θ or h(θ) = (θ − E(θ|Y_obs))². Then

$$E(h(\theta) \mid Y_{obs}) = \int h(\theta)\, p(\theta \mid Y_{obs})\, d\theta \tag{2}$$

$$= \frac{\int h(\theta)\, q(\theta \mid Y_{obs})\, d\theta}{\int q(\theta \mid Y_{obs})\, d\theta} \tag{3}$$

$$= \frac{\int \left[ h(\theta)\, q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta}{\int \left[ q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta} \tag{4}$$
Importance sampling algorithm when only the unnormalised posterior is known
$$E(h(\theta) \mid Y_{obs}) = \frac{\int \left[ h(\theta)\, q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta}{\int \left[ q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta} \tag{5}$$

We can apply a form of Monte Carlo integration to evaluate the numerator and denominator of (5). Suppose we want E(h(θ)|Y_obs):

for (i in 1 ... M)
  1. draw θ^(i) from g(θ)
  2. compute r(θ^(i)) = q(θ^(i)|Y_obs)/g(θ^(i))
  3. compute h(θ^(i))
  4. store r(θ^(i)), h(θ^(i))

$$E(h(\theta) \mid Y) \approx \frac{\sum_i h(\theta^{(i)})\, r(\theta^{(i)})}{\sum_i r(\theta^{(i)})} \tag{6}$$
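A minimal sketch of this algorithm in R; the toy model and all names below are illustrative (a Binomial(10, θ) likelihood with 7 successes observed, a Uniform(0,1) prior, and a Beta(2,2) proposal g), not part of the original slides.

```r
## Importance sampling sketch: all names illustrative
set.seed(1)
M <- 10000
log_q <- function(theta) {                    # log unnormalised posterior
  dbinom(7, 10, theta, log = TRUE) + dunif(theta, log = TRUE)
}
theta <- rbeta(M, 2, 2)                       # 1. draw theta^(i) from g
log_r <- log_q(theta) - dbeta(theta, 2, 2, log = TRUE)  # 2. log ratios
r <- exp(log_r - max(log_r))                  # stabilise before exponentiating
h <- theta                                    # 3. h(theta) = theta
sum(h * r) / sum(r)                           # (6): exact posterior mean is 8/12
```

Working on the log scale before exponentiating avoids numerical underflow when the unnormalised posterior is tiny; the common factor exp(max(log_r)) cancels in the ratio (6).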
Effective Monte Carlo sample size for importance sampling
Recall r(θ_i) = q(θ_i|Y_obs)/g(θ_i). Let

$$\tilde r(\theta_i) = \frac{r(\theta_i)}{\sum_{i=1}^{M} r(\theta_i)}$$

denote the normalised importance ratios (so $\sum_{i=1}^{M} \tilde r(\theta_i) = 1$). Then the effective Monte Carlo sample size is

$$N_{eff} = \frac{1}{\sum_{i=1}^{M} \tilde r(\theta_i)^2}. \tag{7}$$

- N_eff ≤ M; equality holds if the importance weights are constant.
- N_eff ≪ M if the weights are highly variable, e.g. a few very large weights.
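Continuing the illustrative sketch above, (7) is two lines of R:

```r
r_tilde <- r / sum(r)         # normalised importance ratios, sum to one
N_eff <- 1 / sum(r_tilde^2)   # equation (7): close to M when weights are even
```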
Approximating the Monte Carlo error for importance sampling
Since we have an approximation to the effective Monte Carlo sample size, we can also obtain an approximation to the Monte Carlo error:

- E(θ|Y_obs) ≈ Ê(θ|Y_obs) = Σ_i r̃(θ_i) θ_i
- V(θ|Y_obs) ≈ V̂(θ|Y_obs) = Σ_i r̃(θ_i) (θ_i − Ê(θ|Y_obs))²
- sd(θ|Y_obs) ≈ ŝd(θ|Y_obs) = √V̂(θ|Y_obs)
- MC error ≈ ŝd(θ|Y_obs)/√N_eff

Recall that the r̃(θ_i) are the normalised importance ratios (or weights) and, so, sum to one.
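In R, continuing the same illustrative sketch:

```r
E_hat    <- sum(r_tilde * theta)               # weighted posterior mean
V_hat    <- sum(r_tilde * (theta - E_hat)^2)   # weighted posterior variance
sd_hat   <- sqrt(V_hat)                        # posterior sd
mc_error <- sd_hat / sqrt(N_eff)               # approximate Monte Carlo error
```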
Importance sampling when the approximating density is the prior
Note that if g(θ) is the prior p(θ), then

$$r(\theta) = \frac{q(\theta \mid Y_{obs})}{p(\theta)} = \frac{p(Y_{obs} \mid \theta)\, p(\theta)}{p(\theta)} \tag{8}$$

$$= p(Y_{obs} \mid \theta), \tag{9}$$

i.e. the likelihood. The prior weighted by the likelihood is the posterior!
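As a sketch, with the prior as proposal the log importance ratio reduces to the log-likelihood (same illustrative toy model as before):

```r
theta   <- runif(M)                          # draw from the Uniform(0,1) prior
log_r   <- dbinom(7, 10, theta, log = TRUE)  # r(theta) = likelihood
r_tilde <- exp(log_r - max(log_r))
r_tilde <- r_tilde / sum(r_tilde)            # normalised weights
sum(r_tilde * theta)                         # posterior mean, again approx 8/12
```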
Comments on importance sampling algorithm
- The formulation is very general. If we are interested in particular posterior probabilities, e.g. Pr(a ≤ θ ≤ b|Y_obs), just define h(θ) = I(a ≤ θ ≤ b).
- For most practical purposes we can just treat the weighted sample of θ's as a sample from the posterior, keeping in mind that the importance sample only provides an approximation to the posterior.
- Plotting posterior densities or histograms is a bit awkward because of the weights. A simple solution is to resample the original θ sample with probability proportional to the importance ratios and then plot the resulting sample, as in the sketch below.
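A minimal sketch of that resampling step in R (theta and r_tilde as in the earlier sketches):

```r
## Resample in proportion to the weights, then plot as usual
idx <- sample(seq_len(M), size = M, replace = TRUE, prob = r_tilde)
hist(theta[idx], breaks = 50, freq = FALSE,
     main = "Resampled importance sample", xlab = expression(theta))
```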
More comments on importance sampling
- If the distribution of importance sampling weights is very uneven, with a small number of θ values having very large weights, then most of the information about the posterior will be concentrated on only a few sample points. This is not ideal and means the effective Monte Carlo sample size will be much less than the nominal size.
- It is important and helpful to plot a histogram of the log importance weights before proceeding to inference. Concentrate on the distribution of the largest importance sampling weights, e.g. the top 30%; a quick check is sketched below.
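One possible quick diagnostic in R (log_r and r_tilde as above):

```r
hist(log_r, breaks = 50, main = "log importance weights")
sort(r_tilde, decreasing = TRUE)[1:10]   # are a few weights dominating?
```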
History and applications of importance sampling
- Prior to the MCMC revolution beginning around 1990, importance sampling was an active area of research and practice in Bayesian statistics, and various clever ways of forming approximations to the posterior were developed.
- It still features today as a reasonable approach for simple problems and as a component of more advanced methods such as Sequential Monte Carlo.
- Recently, a form of importance sampling (Pareto smoothed importance sampling) has found application in the development of "leave-one-out cross-validation" for model comparison and selection. Here the posterior needs to be repeatedly re-computed on data with one observation dropped each time, and importance sampling provides an efficient means of doing that. See https://arxiv.org/pdf/1507.02646.pdf
Examples of good and bad importance samplers
- Using a t with small degrees of freedom to approximate a Normal distribution is good: the proposal's tails are heavier than the target's, so the importance ratios are bounded.
- Using a Normal to approximate a t with low degrees of freedom is not so good: the proposal's tails are lighter than the target's, so a few draws in the tails receive very large weights.
Importance sampling approximation to a Normal based on a t3(0,1) approximation
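A minimal sketch of this example in R, assuming a standard Normal target and a t3(0,1) proposal (the sample size is illustrative):

```r
set.seed(2)
M       <- 10000
theta   <- rt(M, df = 3)                      # draws from the t_3 proposal
log_r   <- dnorm(theta, log = TRUE) - dt(theta, df = 3, log = TRUE)
r_tilde <- exp(log_r - max(log_r))
r_tilde <- r_tilde / sum(r_tilde)             # normalised weights
1 / sum(r_tilde^2)                            # N_eff close to M: a good sampler
```

Swapping the roles (Normal proposal, t3 target) gives a much smaller N_eff, because a few draws in the target's heavy tails receive huge weights.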
Application of importance sampling to the "unknown N, known p" problem

See the separate importance sampling code: importance_sampling_examples.2021.Rmd