
Stat314/461 Term 4:
Rejection sampling: A simple simulation method for non-standard (posterior) densities

September, 2021


Rejection sampling: Problem 1 – non-conjugate Binomial model

Yi ∼ Binomial(10, θ), independently

logit(θ) ∼ Normal(µ, σ²);   [ logit(θ) = log( θ/(1−θ) ) ]   (1)

⇒ p(θ) = 1 / ( √(2π) σ θ(1−θ) ) × exp{ −(1/(2σ²)) [ log( θ/(1−θ) ) − µ ]² }

Y = (6, 7, 5, 5, 4, 8)

p(θ|Y) = ∏_{i=1}^{6} Binomial(Yi|10, θ) logitN(θ|µ, σ²) / ∫ ∏_{i=1}^{6} Binomial(Yi|10, θ) logitN(θ|µ, σ²) dθ

∝ ∏_{i=1}^{6} θ^Yi (1−θ)^(10−Yi) × logitN(θ|µ, σ²)

How can we simulate p(θ|Y)?
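The unnormalised posterior is easy to evaluate even though its normalising constant is not. A minimal R sketch of log q(θ|Y) for this model, assuming illustrative hyperparameters µ = logit(0.5) and σ = 1 (the values used in the performance slides later):

# Unnormalised log-posterior log q(theta|Y) for the Binomial-logitNormal model;
# a sketch with illustrative hyperparameters mu = logit(0.5), sigma = 1.
Y <- c(6, 7, 5, 5, 4, 8)
mu <- qlogis(0.5); sigma <- 1

log_q <- function(theta) {
  sum(dbinom(Y, size = 10, prob = theta, log = TRUE)) +      # log-likelihood
    dnorm(qlogis(theta), mean = mu, sd = sigma, log = TRUE) -
    log(theta * (1 - theta))                                 # logitN log-prior
}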

Rejection sampling (or acceptance sampling)

Simulate from an approximation; then accept with a probability that
depends on the ratio of the true density to the approximating density.

This idea re-appears in more advanced Bayesian computation
methods e.g. Metropolis-Hastings.


Rejection sampling algorithm

(stated in a general way, assuming some parameter of interest θ, and data
Y )

Suppose

p(θ|Y) = p(Y|θ)p(θ) / ∫ p(Y|θ)p(θ) dθ = q(θ|Y) / ∫ p(Y|θ)p(θ) dθ,

where q(θ|Y) = p(Y|θ)p(θ) is the unnormalised posterior.

Let g(θ) be an approximation to p(θ|Y) that can be easily simulated from.
Set r(θ) = q(θ|Y)/g(θ).
Find M such that r(θ) ≤ M, for all θ.

1 Sample θ∗ from g(θ)
2 Accept θ∗ with probability r(θ∗)/M.
3 Repeat until the desired number of draws is obtained.

To implement step 2:

draw u ∼ Uniform(0, 1)
accept if u ≤ r(θ∗)/M
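In R the algorithm might be sketched as follows; log_q, g_sample, g_log_density and log_M are placeholders that must be supplied for a particular problem, and everything is kept on the log scale to avoid underflow:

# Generic rejection sampler: a sketch. The caller supplies
#   log_q(theta)          log of the unnormalised posterior q(theta|Y)
#   g_sample(n)           n draws from the approximating density g
#   g_log_density(theta)  log g(theta)
#   log_M                 log of the bound M, with r(theta) <= M for all theta
rejection_sample <- function(n_draws, log_q, g_sample, g_log_density, log_M) {
  draws <- numeric(0)
  while (length(draws) < n_draws) {
    theta_star <- g_sample(1)                          # step 1: propose
    log_r <- log_q(theta_star) - g_log_density(theta_star)
    if (log(runif(1)) <= log_r - log_M)                # step 2: accept/reject
      draws <- c(draws, theta_star)
  }                                                    # step 3: repeat
  draws
}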


Some characteristics of rejection sampling (i)

a good approximation g(θ) is roughly proportional to q(θ|Y) which is,
of course, proportional to p(θ|data).
if proportionality holds exactly, i.e.

g(θ) = K q(θ|Y)

then

r(θ) = q(θ|Y) / ( K q(θ|Y) ) = 1/K   (2)

so if we set M = 1/K all draws from g(θ) will be accepted.


Some characteristics of rejection sampling (ii): Consequences of non-proportionality

If g(θ) is not approximately proportional to q(θ|Y):

There will be θ s.t. g(θ) << q(θ|Y); hence M must be set large to ensure r(θ) = q(θ|Y)/g(θ) ≤ M for all θ.
But g(θ) >> q(θ|Y) for other θ; hence r(θ)/M will be low in these high g-density regions.

So most draws are rejected.

The worse the approximation the greater the expected number of
draws required before acceptance.

But the method is self-diagnosing: many rejections imply a poor approximating
density.


A poor approximating density

(consider (i) red approximating black; (ii) black approximating red)


A good approximating density


Insight: what if the approximating density is the prior?

This will often be inefficient because the prior is usually more diffuse than
the posterior. Hence many draws are likely to be rejected (wasted).
Nevertheless it is interesting to think this through:
if g(θ) = p(θ) then the ratio r(θ) is

r(θ) = q(θ|Y) / p(θ) = p(Y|θ)p(θ) / p(θ) = p(Y|θ) = the likelihood   (3)


Implications of using the prior as the approximation –
cont’d

setting the approximating density g(θ) to the prior p(θ) implies the ratio
r(θ) is equal to the likelihood: r(θ) = p(Y|θ).
Hence M can be found by maximising the likelihood p(Y|θ) w.r.t. θ,
i.e. set M to the value of the likelihood at the maximising value,
p(Y|θ̂), where θ̂ is the MLE.
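On the log scale, log M is then the maximised log-likelihood. A short sketch for the Binomial example of this lecture:

# log M as the maximised log-likelihood; a sketch for the Binomial example.
Y <- c(6, 7, 5, 5, 4, 8)
loglik <- function(theta) sum(dbinom(Y, size = 10, prob = theta, log = TRUE))
log_M <- optimize(loglik, interval = c(0, 1), maximum = TRUE)$objective
# equivalently, evaluate at the closed-form MLE: loglik(sum(Y) / 60)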


Rejection algorithm when prior is used as the
approximating density

(M is the maximum value of the likelihood)

Sample θ∗ from the prior p(θ)

Accept θ∗ with probability p(Y|θ∗)/M.
so values of θ close to the value that maximises the likelihood
(i.e. the MLE) have a high probability of being accepted

values far from the MLE have low probability of being accepted.

Not a bad insight into Bayesian inference: start with the prior; values
more strongly supported by the data (higher likelihood) are more likely
to make it into the posterior sample.

This amounts to giving more weight to values more strongly supported by the
data.


Rejection sampling for Binomial – logitNormal problem:
Using the prior as the approximator

Yi ∼ Binomial(10, θ), independently

logit(θ) ∼ Normal(µ, σ²)   (4)

p(θ|Y, N) ∝ ∏i Binomial(Yi|10, θ) × logitN(θ|µ, σ²)   (5)

Y = (6, 7, 5, 5, 4, 8), N = 10
If we use the prior as the approximating density:

r(θ) = [ ∏i Binomial(Yi|10, θ) × logitN(θ|µ, σ²) ] / logitN(θ|µ, σ²)
     = ∏i Binomial(Yi|10, θ) ∝ θ^(∑i Yi) (1−θ)^(∑i (10−Yi))   (6)


Rejection sampler using prior as the approximator cont’d


∑i Yi = 35;  ∑i Ni = 60

set θMLE = ∑i Yi / ∑i Ni = 35/60
set M = ∏i Binomial(Yi|10, θ = θMLE)
draw θ∗ ∼ logitN(µ, σ²)
Accept with probability ∏i Binomial(Yi|10, θ = θ∗)/M
Repeat until desired simulation size is reached

For the acceptance step it is easiest to work on the log scale:

draw U ∼ Uniform(0, 1)
accept θ∗ if log U < log r(θ∗) − log M
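A minimal R sketch of these steps, assuming µ = logit(0.5) and σ² = 1 (the second configuration reported on the next slide; the course R script is the authoritative version):

# Rejection sampler for the Binomial-logitNormal posterior using the prior
# as the proposal; a sketch with mu = logit(0.5), sigma^2 = 1.
Y <- c(6, 7, 5, 5, 4, 8)
mu <- qlogis(0.5); sigma <- 1

theta_mle <- sum(Y) / 60                                   # 35/60
log_M <- sum(dbinom(Y, size = 10, prob = theta_mle, log = TRUE))

n_draws <- 5000
draws <- numeric(0); n_proposed <- 0
while (length(draws) < n_draws) {
  theta_star <- plogis(rnorm(1, mu, sigma))                # theta* ~ logitN
  n_proposed <- n_proposed + 1
  log_r <- sum(dbinom(Y, size = 10, prob = theta_star, log = TRUE))
  if (log(runif(1)) < log_r - log_M) draws <- c(draws, theta_star)
}
n_draws / n_proposed   # empirical acceptance rate (about 0.25 here)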
Performance of the rejection sampler with prior as the approximator (see R script for code)

p(θ) = logitN(logit(0.5), 5): acceptance rate 5.4%
p(θ) = logitN(logit(0.5), 1): acceptance rate 24.7%

Rejection sampling for Binomial – logitNormal problem: Using a beta approximation to the posterior as the approximator

The conjugate prior for a Binomial likelihood is the beta distribution.
If we can construct a beta approximation to the logit-normal prior we can obtain a beta approximation to the posterior using the standard updating results for the Binomial–beta model.
A beta approximation to the logitN(µ, σ²) distribution can be obtained by matching the mean and variance of the two densities.

Rejection sampling using a beta approximation to the posterior, cont’d

if θ ∼ Beta(a, b)

E(θ) = a / (a + b)   (7)
V(θ) = ab / [ (a + b)² (a + b + 1) ]   (8)

obtain E(θ) and V(θ) for θ ∼ logitN(µ, σ²) by simulation or numerical integration (see the logitnorm package).
Use (7) and (8) to solve for a and b in terms of E(θ) and V(θ).
Beta(a + ∑i Yi, b + ∑i (N − Yi)) is then an approximation to the posterior.

Rejection sampling using a beta approximation to the posterior, cont’d

r(θ) = p(θ|Y, N) / g(θ)
     ∝ [ θ^(∑i Yi) (1−θ)^(∑i (N−Yi)) × logitN(θ|µ, σ²) ] / beta(θ|a + ∑i Yi, b + ∑i (Ni − Yi))
     = rA(θ), say

Find log(M) = max log(rA(θ))
draw θ∗ from Beta(a + ∑i Yi, b + ∑i (Ni − Yi))
compute log(rA(θ∗))
draw U ∼ Uniform(0, 1)
Accept θ∗ if log(U) ≤ log(rA(θ∗)) − log(M)

Performance of the rejection sampler using a beta approximation to the posterior as the approximator (see R script for code)

p(θ) = logitN(logit(0.5), 5): acceptance rate 71.5%
p(θ) = logitN(logit(0.5), 1): acceptance rate 70.7%
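The beta-approximation sampler just described can be sketched in R as follows; a minimal version assuming µ = logit(0.5) and σ² = 5 (the first configuration above), with E(θ) and V(θ) estimated by simulation rather than the logitnorm package's numerical routines:

# Beta approximation to the posterior by moment matching, then rejection
# sampling with that beta as the proposal. A sketch, not the course script.
Y <- c(6, 7, 5, 5, 4, 8)
mu <- qlogis(0.5); sigma <- sqrt(5)

th <- plogis(rnorm(1e6, mu, sigma))    # draws from the logitN prior
m <- mean(th); v <- var(th)
ab <- m * (1 - m) / v - 1              # invert (7) and (8): a + b
a <- m * ab; b <- (1 - m) * ab

a_post <- a + sum(Y)                   # standard Binomial-beta updating
b_post <- b + sum(10 - Y)

log_rA <- function(theta) {            # log rA(theta), up to a constant
  sum(Y) * log(theta) + sum(10 - Y) * log(1 - theta) +
    dnorm(qlogis(theta), mu, sigma, log = TRUE) - log(theta * (1 - theta)) -
    dbeta(theta, a_post, b_post, log = TRUE)
}
log_M <- optimize(log_rA, c(1e-6, 1 - 1e-6), maximum = TRUE)$objective

draws <- replicate(10000, {
  repeat {
    theta_star <- rbeta(1, a_post, b_post)
    if (log(runif(1)) <= log_rA(theta_star) - log_M) break
  }
  theta_star
})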
Proof that rejection sampling works

See the separate notes.

Rejection sampler Example 2: Population estimation with known capture probability

Consider the following problem: X “successes” observed from a sequence of Bernoulli trials with known success probability θ but unknown number of trials, N. We need to estimate N.
e.g. N is the unknown size of a population, X is the observed number of “captures” from the population and θ is the capture probability, assumed known here but in practice estimated by clever methods (e.g. census coverage).
The likelihood for this problem is

p(X|N, θ) = (N choose X) θ^X (1−θ)^(N−X)

i.e. a Binomial likelihood but with N as the unknown.

Notes on the likelihood

p(X|N, θ) = (N choose X) θ^X (1−θ)^(N−X)   (9)
          ∝ (N choose X) (1−θ)^(N−X)

1 “data” in this example is just X, a single count, hence not boldfaced.
2 the likelihood is zero for values of N < X, so there is no support from the data for values of the population size less than the observed count.

The posterior for the unknown number of trials

We want p(N|X, θ), which is shorthand for Pr(N = n|X, θ), for n = 0, 1, . . ..
Combining the binomial likelihood (9) with a prior for N, p(N), (keep in mind N is discrete) leads to

Pr(N = n|X, θ) = (n choose X) θ^X (1−θ)^(n−X) Pr(N = n) / ∑m (m choose X) θ^X (1−θ)^(m−X) Pr(N = m)   (10)

and we need to compute this for every possible n.
How can we simulate a sample of values from this posterior pmf?

The inverse distribution method

If our prior for N was discrete uniform or some other bounded distribution we could tabulate the entire distribution.
This means we could also tabulate the cumulative distribution function, which would just be

Pr(N ≤ q|X, θ) = ∑_{n = nmin}^{q} Pr(N = n|X, θ)   (11)

In this case (bounded prior) we could use the inverse distribution method to simulate the posterior for N: if the cdf, F, is known and U ∼ Uniform(0, 1), then F⁻¹(U) is a draw from the target distribution (i.e. the distribution for which F is the cdf).
But if there is no a priori bound, or the bound is very large, this could be computationally inefficient.

Back to the unknown N, known p (aka θ) problem

We have

Pr(N = n|X, θ) = (n choose X) θ^X (1−θ)^(n−X) Pr(N = n) / ∑m (m choose X) θ^X (1−θ)^(m−X) Pr(N = m),  n = 0, 1, 2, . . .   (12)

Suppose p(N) is defined to be discrete uniform on some suitably chosen interval: N ∼ Uniform(L1, L2). Then

Pr(N = n|X, θ) = (n choose X) θ^X (1−θ)^(n−X) / ∑m (m choose X) θ^X (1−θ)^(m−X);  L1 ≤ n ≤ L2   (13)
              ∝ (n choose X) θ^X (1−θ)^(n−X);  L1 ≤ n ≤ L2   (14)
              = 0;  n < L1 or n > L2   (15)

i.e. the posterior pmf is just the likelihood “normalised” to sum to 1.
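Under this bounded uniform prior the inverse distribution method described above applies directly. A minimal R sketch, where L1 and L2 are hypothetical bounds and X, θ anticipate the example on the following slides:

# Inverse-cdf simulation of N under a discrete Uniform(L1, L2) prior:
# tabulate the normalised likelihood (eqs. 13-14), form the cdf, invert it.
X <- 4350; theta <- 0.87
L1 <- 4350; L2 <- 6500                 # hypothetical prior bounds, L1 >= X
n_grid <- L1:L2
log_pmf <- dbinom(X, size = n_grid, prob = theta, log = TRUE)
pmf <- exp(log_pmf - max(log_pmf))     # stabilise before exponentiating
pmf <- pmf / sum(pmf)                  # posterior pmf: normalised likelihood
cdf <- cumsum(pmf)
U <- runif(10000)
possamp_inv <- n_grid[findInterval(U, cdf) + 1]   # F^{-1}(U)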


Rejection sampling for N assuming a uniform prior

For given success probability θ and observed count X the value of N that
maximises the likelihood is N̂ = X/θ.
If g(N) is also uniform on (L1, L2), then the previous argument suggests the
following rejection sampling algorithm:

1 set M = p(X|N = N̂, θ) = (N̂ choose X) θ^X (1−θ)^(N̂−X)

2 draw N∗ uniformly from U(L1, L2)

3 accept with probability [ (N∗ choose X) θ^X (1−θ)^(N∗−X) ] / M
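A sketch of this sampler in R, reusing the hypothetical bounds from the inverse-cdf sketch; here log M is taken as the maximum log-likelihood over the support grid, which sidesteps rounding N̂ = X/θ to an integer:

# Rejection sampler for N with a Uniform(L1, L2) prior/proposal; a sketch.
X <- 4350; theta <- 0.87
L1 <- 4350; L2 <- 6500                       # hypothetical prior bounds
log_lik <- function(n) dbinom(X, size = n, prob = theta, log = TRUE)
log_M <- max(log_lik(L1:L2))                 # log p(X | N = N_hat, theta)

possamp <- numeric(0)
while (length(possamp) < 10000) {
  n_star <- sample(L1:L2, 1)                 # N* uniform on {L1, ..., L2}
  if (log(runif(1)) <= log_lik(n_star) - log_M)
    possamp <- c(possamp, n_star)
}
quantile(possamp, probs = c(0.025, 0.05, 0.5, 0.95, 0.975))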


Posterior for N given X = 4350, θ = 0.87


Posterior quantiles

probs <- c(0.025, 0.05, 0.5, 0.95, 0.975)
quant_rejection <- quantile(possamp, probs = probs)
quant_rejection
 2.5%    5%   50%   95% 97.5%
 4945  4956  5000  5044  5053
true_quantiles
 2.5%    5%   50%   95% 97.5%
 4947  4956  5000  5045  5054
prior_quantiles
    2.5%       5%      50%      95%    97.5%
4400.000 4444.000 5335.000 6243.050 6288.125

Concluding remarks on rejection sampling

Potentially inefficient if the approximating density is a poor approximation.
A poor approximation is immediately revealed by low acceptance rates.
An “adaptive” variant of rejection sampling is discussed in the literature; the idea is to improve the approximation as the simulation progresses.
Rejection sampling struggles as the dimension of the problem increases.