Stat314/461 Term 4: Simple simulation methods for non-standard densities: Importance sampling

September, 2021


Reminder: Basic problem of Bayesian computation

\[
p(\theta \mid Y_{obs}) = \frac{p(Y_{obs} \mid \theta)\, p(\theta)}{\int p(Y_{obs} \mid \theta)\, p(\theta)\, d\theta} \tag{1}
\]

- We specify p(θ).
- We derive p(Yobs|θ) from our data model.
- Hence we can always write an expression for the numerator of (1).
- But to do anything useful with (1) we need to be able to integrate the numerator, and this may be difficult to do explicitly in realistic multi-parameter, non-conjugate problems.
- Modern Bayesian computation makes extensive use of Monte Carlo methods to simulate the posterior distribution.


Importance sampling
- Suppose p(θ|Yobs) ∝ p(Yobs|θ)p(θ) does not correspond to a standard distribution, so we cannot simulate directly from it and cannot easily integrate it.
- We can compute the unnormalised posterior q(θ|Yobs) = p(Yobs|θ)p(θ).
- We can easily simulate from an approximation g(θ).
- Suppose we would like to compute E(h(θ)|Yobs) for some function h(θ), e.g. h(θ) = θ or h(θ) = (θ − E(θ|Yobs))².

\begin{align*}
E(h(\theta) \mid Y_{obs}) &= \int h(\theta)\, p(\theta \mid Y_{obs})\, d\theta \tag{2}\\
&= \frac{\int h(\theta)\, q(\theta \mid Y_{obs})\, d\theta}{\int q(\theta \mid Y_{obs})\, d\theta} \tag{3}\\
&= \frac{\int \left[ h(\theta)\, q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta}{\int \left[ q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta} \tag{4}
\end{align*}


Importance sampling algorithm when only the unnormalised posterior is known

\[
E(h(\theta) \mid Y_{obs}) = \frac{\int \left[ h(\theta)\, q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta}{\int \left[ q(\theta \mid Y_{obs})/g(\theta) \right] g(\theta)\, d\theta} \tag{5}
\]

We can apply a form of Monte Carlo integration to evaluate the numerator and denominator of (5). Suppose we want E(h(θ)|Yobs):

for (i in 1, ..., M)
1. draw θ(i) from g(θ)
2. compute r(θ(i)) = q(θ(i)|Yobs)/g(θ(i))
3. compute h(θ(i))
4. store r(θ(i)), h(θ(i))

\[
E(h(\theta) \mid Y_{obs}) \approx \frac{\sum_i h(\theta^{(i)})\, r(\theta^{(i)})}{\sum_i r(\theta^{(i)})} \tag{6}
\]
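
As a concrete illustration (not taken from the course notes), here is a minimal R sketch of the algorithm above. It assumes a hypothetical non-conjugate setup: a binomial likelihood with y = 7 successes out of n = 20 trials, a Normal(0, 1.5²) prior on the log-odds θ, and a Normal(0, 2²) proposal g(θ); all of these choices are illustrative.

```r
set.seed(1)
y <- 7; n <- 20
# log unnormalised posterior q(theta | y): binomial likelihood x Normal prior on the log-odds
log_q <- function(theta) {
  dbinom(y, n, plogis(theta), log = TRUE) + dnorm(theta, 0, 1.5, log = TRUE)
}
M <- 10000
theta <- rnorm(M, 0, 2)                                  # 1. draw from g(theta) = Normal(0, 2^2)
log_r <- log_q(theta) - dnorm(theta, 0, 2, log = TRUE)   # 2. log importance ratios
r <- exp(log_r - max(log_r))                             #    rescale before exponentiating
h <- theta                                               # 3. h(theta) = theta, i.e. the posterior mean
sum(h * r) / sum(r)                                      # 4. importance sampling estimate, as in (6)
```

Working on the log scale and subtracting the maximum before exponentiating avoids numerical overflow; the constant cancels in the ratio (6).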


Effective Monte Carlo sample size for importance sampling

Recall r(θ(i)) = q(θ(i)|Yobs)/g(θ(i)). Let r̃(θi) = r(θi)/∑_{i=1}^M r(θi) denote the normalised importance ratios (so that ∑_{i=1}^M r̃(θi) = 1). Then the effective Monte Carlo sample size is

\[
N_{eff} = \frac{1}{\sum_{i=1}^{M} \tilde{r}(\theta_i)^2}. \tag{7}
\]

Neff ≤ M; equality holds if the importance weights are constant.
Neff ≪ M if the weights are highly variable, e.g. a few very large weights.
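
Continuing the sketch above (with the ratios r carried over), the effective sample size in (7) can be computed directly:

```r
# Normalised importance ratios and effective Monte Carlo sample size (7)
r_tilde <- r / sum(r)
N_eff <- 1 / sum(r_tilde^2)
N_eff                      # close to M when g matches the posterior well
```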


Approximating the Monte Carlo error for importance sampling

Since we have an approximation to the effective Monte Carlo sample size, we can also obtain an approximation to the Monte Carlo error.

- E(θ|Yobs) ≈ Ê(θ|Yobs) = ∑_i r̃(θi) θi
- V(θ|Yobs) ≈ V̂(θ|Yobs) = ∑_i r̃(θi) (θi − Ê(θ|Yobs))²
- sd(θ|Yobs) ≈ ŝd(θ|Yobs) = √(V̂(θ|Yobs))
- MC error ≈ ŝd(θ|Yobs) / √(Neff)

Recall that the r̃(θi) are the normalised importance ratios (or weights) and so sum to one.
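
Continuing the same sketch (theta, r_tilde and N_eff carried over), the weighted summaries and the approximate Monte Carlo error of the posterior-mean estimate are:

```r
E_hat  <- sum(r_tilde * theta)                  # weighted posterior mean
V_hat  <- sum(r_tilde * (theta - E_hat)^2)      # weighted posterior variance
sd_hat <- sqrt(V_hat)                           # posterior standard deviation
mc_error <- sd_hat / sqrt(N_eff)                # approximate Monte Carlo error
c(mean = E_hat, sd = sd_hat, MCerror = mc_error)
```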


Importance sampling when the approximating density is the prior

Note that if g(θ) is the prior p(θ), then

\begin{align*}
r(\theta) &= \frac{q(\theta \mid Y_{obs})}{p(\theta)} = \frac{p(Y_{obs} \mid \theta)\, p(\theta)}{p(\theta)} \tag{8}\\
&= p(Y_{obs} \mid \theta), \tag{9}
\end{align*}

i.e. the likelihood. The prior weighted by the likelihood is the posterior!
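
In the hypothetical binomial sketch used earlier, this amounts to drawing from the Normal(0, 1.5²) prior and weighting each draw by the likelihood:

```r
# Prior as proposal: the importance ratios reduce to the likelihood
theta_p <- rnorm(M, 0, 1.5)                  # draw from the prior
w <- dbinom(y, n, plogis(theta_p))           # weights = likelihood p(y | theta)
sum(w * theta_p) / sum(w)                    # estimates the same posterior mean as before
```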


Comments on importance sampling algorithm

- The formulation is very general. If we are interested in particular posterior probabilities, e.g. Pr(a ≤ θ ≤ b|Yobs), just define h(θ) = I(a ≤ θ ≤ b).
- For most practical purposes, we can just treat the weighted sample of θ's as a sample from the posterior, keeping in mind that the importance sample only provides an approximation to the posterior.
- Plotting posterior densities or histograms is a bit awkward because of the weights. A simple solution is to resample the original θ sample with probability proportional to the importance sampling ratio and then plot the resulting sample, as in the sketch below.
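
For example, reusing theta and r_tilde from the earlier sketch, a posterior probability and a resampling-based histogram might be obtained as follows:

```r
sum(r_tilde * (theta > 0))         # Pr(theta > 0 | y), i.e. h(theta) = I(theta > 0)
# Resample with probability proportional to the importance ratios, then plot
idx <- sample(seq_len(M), size = M, replace = TRUE, prob = r_tilde)
hist(theta[idx], breaks = 40, main = "Approximate posterior of theta")
```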


More comments on importance sampling

- If the distribution of importance sampling weights is very uneven, with a small number of θ values having very large weights, then most of the information about the posterior will be concentrated on only a few sample points. This is not ideal and means the effective Monte Carlo sample size will be much less than the nominal size.
- It is important and helpful to plot a histogram of the log importance weights before proceeding to inference. Concentrate on the distribution of the largest importance sampling weights, e.g. the top 30%; a simple check is sketched below.
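
Continuing the sketch, one simple check (the stored ratios r are only defined up to a multiplicative constant, which shifts the log weights but does not change the shape of their distribution):

```r
hist(log(r), breaks = 40, main = "Log importance ratios")   # look for a long right tail
sort(r_tilde, decreasing = TRUE)[1:10]                      # a few dominant weights signal trouble
```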


History and applications of importance sampling
- Prior to the MCMC revolution beginning around 1990, importance sampling was an active area of research and practice in Bayesian statistics, during which various clever ways of forming approximations to the posterior were developed.
- It still features today as a reasonable approach for simple problems and as a component of more advanced methods such as Sequential Monte Carlo.

- Recently, a form of importance sampling (Pareto smoothed importance sampling) has found application in the development of "leave one out cross-validation" for model comparison and selection. Here the posterior needs to be repeatedly re-computed on data with one observation dropped each time, and importance sampling provides an efficient means of doing that. https://arxiv.org/pdf/1507.02646.pdf

Examples of good and bad importance samplers
- Using a t distribution with a small number of degrees of freedom to approximate a Normal target is good: the proposal's tails are heavier than the target's, so the importance ratios stay bounded; see the sketch below.
- Using a Normal to approximate a t with low degrees of freedom is not so good: the proposal's tails are too light relative to the target's.
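
A small self-contained check of both cases via the effective sample size (standard Normal and t3 densities, M = 10000 draws; an illustration, not the course example):

```r
M <- 10000
# Good: t3 proposal for a standard Normal target (proposal tails heavier than target)
th_good <- rt(M, df = 3)
w_good  <- dnorm(th_good) / dt(th_good, df = 3)
1 / sum((w_good / sum(w_good))^2)      # N_eff close to M

# Not so good: Normal proposal for a t3 target (proposal tails too light)
th_bad <- rnorm(M)
w_bad  <- dt(th_bad, df = 3) / dnorm(th_bad)
1 / sum((w_bad / sum(w_bad))^2)        # N_eff smaller and much more variable across runs
```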


Importance sampling approximation to a Normal based on a t3(0,1) approximation

[figure not reproduced in this text version]

Application of importance sampling to the "unknown N, known p" problem

See the separate importance sampling code: importance_sampling_examples.2021.Rmd
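
That Rmd is not reproduced here. Purely as a standalone illustration (with hypothetical choices: y = 12 successes observed, known p = 0.3, a 1/N prior on N ≥ y, and a shifted-Poisson proposal), an importance sampler for this problem might look like:

```r
y <- 12; p <- 0.3; M <- 10000
lambda <- y * (1 - p) / p                          # proposal mean for N - y, roughly matching the data
N_draw <- y + rpois(M, lambda)                     # proposal g(N): y + Poisson(lambda)
log_g  <- dpois(N_draw - y, lambda, log = TRUE)
log_q  <- dbinom(y, N_draw, p, log = TRUE) - log(N_draw)   # likelihood x (1/N) prior, unnormalised
w <- exp((log_q - log_g) - max(log_q - log_g))     # importance ratios, rescaled for stability
sum(w * N_draw) / sum(w)                           # importance sampling estimate of E(N | y, p)
```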
