1 Proof of rejection sampling
We will show that rejection sampling results in draws from the target distribution for the case of a continuous random variable, $\theta$. The proof in the discrete case follows very similar lines.
In this proof we will use the symbol $\text{data}$ to denote the observed data, so if we observed $Y_{\text{obs}} = (Y_1, \ldots, Y_n)'$ then $\text{data} = (Y_1, \ldots, Y_n)'$.
For Bayesian inference the target density is the posterior
\[
p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\, p(\theta)}{\int p(\text{data} \mid \theta)\, p(\theta)\, d\theta} = \frac{q(\theta \mid \text{data})}{\int q(\theta \mid \text{data})\, d\theta}, \tag{1}
\]
where $q(\theta \mid \text{data}) = p(\text{data} \mid \theta)\, p(\theta)$ is the unnormalised posterior density, that is, the likelihood multiplied by the prior. For convenience we let $I = \int q(\theta \mid \text{data})\, d\theta$, which is a constant not depending on $\theta$. Let $g(\theta)$ denote an approximation to the posterior density from which we can easily sample. If we want to refer to the value of this density at a particular value, $\theta = a$ say, we will write $g_\theta(a)$. (The notation $g(\theta)$ is really just shorthand for the density function $g_\theta(a)$ for all $a$.) Similarly, if we want to refer to the posterior density evaluated at a particular value, $\theta = a$ say, we will use the notation $p_{\theta \mid \text{data}}(a)$, and similarly $q_{\theta \mid \text{data}}(a)$ for the unnormalised posterior density.
$U$ will denote a $\text{Uniform}(0, 1)$ random variable, $r(\theta) = q(\theta \mid \text{data})/g(\theta)$ is the importance ratio, and $M$ is an upper bound for the importance ratio, i.e. $r(\theta) \le M$ for all $\theta$.

The rejection sampling algorithm generates draws of $\theta$ from $g(\theta)$, then draws $U$ and accepts $\theta$ if
\[
U \le \frac{q(\theta \mid \text{data})}{M g(\theta)},
\]
i.e. if $U \le r(\theta)/M$.
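To make the accept/reject step concrete, here is a minimal Python sketch. The binomial setup ($n = 20$, $k = 14$ successes with a $\text{Uniform}(0, 1)$ prior) and the names `q`, `M`, and `rejection_sample` are illustrative assumptions, not part of the proof; with a $\text{Uniform}(0, 1)$ proposal, $g(\theta) = 1$, so the importance ratio reduces to $q(\theta \mid \text{data})$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: binomial likelihood with k successes in n trials
# and a Uniform(0, 1) prior, so the unnormalised posterior is
# q(theta | data) = theta^k * (1 - theta)^(n - k).
n, k = 20, 14

def q(theta):
    return theta**k * (1.0 - theta)**(n - k)

# Proposal g = Uniform(0, 1), so g(theta) = 1 and r(theta) = q(theta).
# The ratio is maximised at theta = k / n, which supplies the bound M.
M = q(k / n)

def rejection_sample(size):
    """Draw `size` accepted values of theta."""
    draws = []
    while len(draws) < size:
        theta = rng.uniform()   # draw theta from g
        u = rng.uniform()       # draw U ~ Uniform(0, 1)
        if u <= q(theta) / M:   # accept if U <= q(theta) / (M * g(theta))
            draws.append(theta)
    return np.array(draws)

samples = rejection_sample(10_000)
```

Choosing $M$ as the maximum of $r(\theta)$ keeps the acceptance rate as high as possible; any larger $M$ still yields correct draws, just with more rejections.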
We note:
1. If $U \sim \text{Uniform}(0, 1)$ and $0 \le a \le 1$, then $\Pr(U \le a) = \int_0^a 1\, du = a$.
2. For independent continuous random variables $X$ and $Y$,
\[
\Pr(X \le Y) = \int_y \Pr(X \le Y \mid Y = y)\, p_Y(y)\, dy = \int_y \Pr(X \le y)\, p_Y(y)\, dy, \tag{2}
\]
where the second equality uses the independence of $X$ and $Y$ (a quick numeric check of this identity appears after these notes).
3. Similarly, for independent $X$ and $Y$ and some function $h(Y)$,
\[
\Pr(X \le h(Y),\, Y < c) = \int_{-\infty}^{c} \Pr(X \le h(Y) \mid Y = a)\, p_Y(a)\, da \tag{3}
\]
\[
= \int_{-\infty}^{c} \Pr(X \le h(a))\, p_Y(a)\, da. \tag{4}
\]
We want to show that
\[
p\!\left(\theta \,\middle|\, U \le \frac{q(\theta \mid \text{data})}{M g(\theta)}\right) = p(\theta \mid \text{data}).
\]
We need to keep in mind that $\theta$ is originally generated from $g(\theta)$. It is easiest to work with the distribution function, hence we consider
\[
\Pr\!\left(\theta < c \,\middle|\, U \le \frac{q(\theta \mid \text{data})}{M g(\theta)}\right)
\]
and want to show that
\[
\Pr\!\left(\theta < c \,\middle|\, U \le \frac{q(\theta \mid \text{data})}{M g(\theta)}\right) = \Pr_{\theta \mid \text{data}}(\theta < c \mid \text{data}) \quad \forall c,
\]
where $\Pr_{\theta \mid \text{data}}(\theta < c \mid \text{data}) = \int_{-\infty}^{c} p(\theta \mid \text{data})\, d\theta$. Using Bayes' theorem, we can write
\[
\Pr\!\left(\theta < c \,\middle|\, U \le \frac{q(\theta \mid \text{data})}{M g(\theta)}\right)
= \frac{\Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)} \,\middle|\, \theta < c\right) \Pr_g(\theta < c)}{\Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)}\right)}, \tag{5}
\]
where $\Pr_g(\theta < c) = \int_{-\infty}^{c} g(\theta)\, d\theta$.
Now, noting that $U$ and $\theta$ (and hence $q(\theta \mid \text{data})/(M g(\theta))$) are both random variables, we can use (2) to write the denominator of (5) as
\[
\Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)}\right)
= \int_a \Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)} \,\middle|\, \theta = a\right) g_\theta(a)\, da
= \int_a \frac{q_{\theta \mid \text{data}}(a)}{M g_\theta(a)}\, g_\theta(a)\, da
= \int_a \frac{q_{\theta \mid \text{data}}(a)}{M}\, da
= \frac{I}{M}, \tag{6}
\]
where the second equality follows from note 1, since $0 \le q_{\theta \mid \text{data}}(a)/(M g_\theta(a)) \le 1$ by the definition of $M$.
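Equation (6) has a useful practical reading: the unconditional acceptance probability of the algorithm is $I/M$. A small sketch, reusing the hypothetical binomial setup from above (the names `q`, `M`, and `I` are again illustrative), checks this numerically.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(1)

# Hypothetical binomial setup from the earlier sketch.
n, k = 20, 14
q = lambda t: t**k * (1.0 - t)**(n - k)
M = q(k / n)
I, _ = quad(q, 0.0, 1.0)   # I = integral of the unnormalised posterior

# Equation (6): the unconditional acceptance probability is I / M.
trials = 200_000
theta = rng.uniform(size=trials)
u = rng.uniform(size=trials)
print(np.mean(u <= q(theta) / M))  # empirical acceptance rate
print(I / M)                       # value predicted by (6)
```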
Turning now to the numerator of (5), we note that
\begin{align*}
\Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)} \,\middle|\, \theta < c\right)
&= \frac{\Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)},\, \theta < c\right)}{\Pr_g(\theta < c)} \tag{7}\\
&= \frac{\int_{-\infty}^{c} \Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)},\, \theta = a\right) da}{\Pr_g(\theta < c)} \tag{8}\\
&= \frac{\int_{-\infty}^{c} \Pr\!\left(U \le \frac{q(\theta \mid \text{data})}{M g(\theta)} \,\middle|\, \theta = a\right) g_\theta(a)\, da}{\Pr_g(\theta < c)} \tag{9}\\
&= \frac{\int_{-\infty}^{c} \frac{q_{\theta \mid \text{data}}(a)}{M g_\theta(a)}\, g_\theta(a)\, da}{\Pr_g(\theta < c)} \tag{10}\\
&= \frac{I \Pr_{\theta \mid \text{data}}(\theta < c \mid \text{data})}{M \Pr_g(\theta < c)}, \tag{11}
\end{align*}
where (8) and (9) follow from (3)–(4), and (11) uses $q_{\theta \mid \text{data}}(a) = I\, p_{\theta \mid \text{data}}(a)$ so that $\int_{-\infty}^{c} q_{\theta \mid \text{data}}(a)\, da = I \Pr_{\theta \mid \text{data}}(\theta < c \mid \text{data})$.
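Identity (11) can also be checked numerically: conditional on $\theta < c$, the acceptance probability should equal $I \Pr_{\theta \mid \text{data}}(\theta < c \mid \text{data}) / (M \Pr_g(\theta < c))$. In the hypothetical binomial setup used above, the posterior is $\text{Beta}(k + 1, n - k + 1)$ and $\Pr_g(\theta < c) = c$, so both sides are easy to compute.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(2)

# Hypothetical binomial setup: q(theta | data) = theta^k (1 - theta)^(n - k),
# proposal g = Uniform(0, 1) so Pr_g(theta < c) = c, and the normalised
# posterior is Beta(k + 1, n - k + 1).
n, k = 20, 14
q = lambda t: t**k * (1.0 - t)**(n - k)
M = q(k / n)
I, _ = quad(q, 0.0, 1.0)   # normalising constant I

c = 0.6
theta = rng.uniform(size=1_000_000)
u = rng.uniform(size=theta.size)
below = theta < c

lhs = np.mean(u[below] <= q(theta[below]) / M)           # Pr(accept | theta < c)
rhs = I * stats.beta.cdf(c, k + 1, n - k + 1) / (M * c)  # right side of (11)
print(lhs, rhs)  # the two should agree closely
```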
Substituting (11) and (6) into (5) gives
\[
\Pr\!\left(\theta < c \,\middle|\, U \le \frac{q(\theta \mid \text{data})}{M g(\theta)}\right)
= \frac{I \Pr_{\theta \mid \text{data}}(\theta < c \mid \text{data})}{M \Pr_g(\theta < c)} \cdot \frac{\Pr_g(\theta < c)}{I/M}
= \Pr_{\theta \mid \text{data}}(\theta < c \mid \text{data})
\]
for all $c$, which is what we wanted to show.
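Finally, the theorem itself can be checked end to end: the empirical distribution function of the accepted draws should match the exact posterior CDF. A sketch under the same hypothetical binomial assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical binomial setup used throughout: with a Uniform(0, 1) prior,
# the normalised posterior is Beta(k + 1, n - k + 1).
n, k = 20, 14
q = lambda t: t**k * (1.0 - t)**(n - k)
M = q(k / n)

theta = rng.uniform(size=500_000)
u = rng.uniform(size=theta.size)
accepted = theta[u <= q(theta) / M]   # vectorised rejection step

for c in (0.5, 0.6, 0.7, 0.8):
    empirical = np.mean(accepted < c)            # Pr(theta < c | accepted)
    exact = stats.beta.cdf(c, k + 1, n - k + 1)  # exact posterior CDF at c
    print(f"c = {c}: empirical = {empirical:.4f}, exact = {exact:.4f}")
```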