
1 Proof of rejection sampling

We will show that rejection sampling results in draws from the target distribution for the case of a continuous random variable, θ. The proof in the discrete case follows very similar lines.

In this proof we will use the symbol data to denote the observed data. So if we observed Yobs = (Y1, . . . , Yn), then

data = (Y1, . . . , Yn).

For Bayesian inference the target density is the posterior
$$
p(\theta|\text{data}) = \frac{p(\text{data}|\theta)\,p(\theta)}{\int p(\text{data}|\theta)\,p(\theta)\, d\theta}
= \frac{q(\theta|\text{data})}{\int q(\theta|\text{data})\, d\theta} \qquad (1)
$$
where q(θ|data) = p(data|θ)p(θ) is the unnormalised posterior density, which is the likelihood multiplied by the prior. For convenience we let I = ∫ q(θ|data) dθ,

which is a constant not depending on θ. Let g(θ) denote an approximation to

the posterior density from which we can easily sample. If we want to refer

to the value of this density at a particular value θ = a, say, we will write

gθ(a). (The notation g(θ) is really just shorthand for the density function gθ(a), ∀a.) Similarly, if we want to refer to the posterior density evaluated at a particular value, θ = a, say, we will use the notation pθ|data(a), and similarly for the unnormalised posterior density qθ|data(a).
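As a concrete illustration of the normalising constant I (my own toy example, not part of the notes): suppose we observe y = 3 successes in n = 10 Bernoulli trials with a Uniform(0, 1) prior on θ, so the unnormalised posterior is q(θ|data) = θ^y (1 − θ)^(n−y). Then I can be computed numerically and compared with its exact value, the Beta function B(y + 1, n − y + 1). A minimal sketch:

```python
import math

# Toy example (an assumption for illustration): y = 3 successes in
# n = 10 Bernoulli trials with a Uniform(0, 1) prior, so the
# unnormalised posterior is q(theta | data) = theta^y * (1 - theta)^(n - y).
y, n = 3, 10

def q(theta):
    return theta**y * (1.0 - theta)**(n - y)

# Numerical value of I = integral of q over (0, 1), midpoint rule.
m = 100_000
I_numeric = sum(q((k + 0.5) / m) for k in range(m)) / m

# Exact value: I = B(y + 1, n - y + 1) = Gamma(y+1)Gamma(n-y+1)/Gamma(n+2).
I_exact = math.gamma(y + 1) * math.gamma(n - y + 1) / math.gamma(n + 2)

print(I_numeric, I_exact)
```

Dividing q by this constant recovers the normalised Beta(4, 8) posterior density of equation (1).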

U will denote a Uniform(0, 1) random variable,
r(θ) = q(θ|data)/g(θ) is the importance ratio, and
M is an upper bound for the importance ratio, i.e. r(θ) ≤ M, ∀θ.

The rejection sampling algorithm generates draws of θ from g(θ), then draws U and accepts θ if
$$
U \le \frac{q(\theta|\text{data})}{Mg(\theta)},
$$
i.e. if U ≤ r(θ)/M.
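The algorithm above can be sketched in code. This is a minimal illustration with a toy target of my own choosing (not from the notes): the unnormalised target is q(θ) = exp(−θ²/2), i.e. an unnormalised N(0, 1) density, the approximating density g is Uniform(−5, 5) with density 0.1, and the importance ratio q(θ)/g(θ) is bounded by M = 10.

```python
import math
import random

def rejection_sample(q, g_pdf, g_draw, M, n_draws, rng):
    """Draw n_draws values from the density proportional to q, using
    proposal density g_pdf (sampled via g_draw) and a bound M on the
    importance ratio q(theta) / g_pdf(theta)."""
    draws = []
    while len(draws) < n_draws:
        theta = g_draw(rng)                       # theta generated from g
        u = rng.random()                          # U ~ Uniform(0, 1)
        if u <= q(theta) / (M * g_pdf(theta)):    # accept if U <= r(theta)/M
            draws.append(theta)
    return draws

# Toy target (an assumption for illustration): unnormalised N(0, 1).
q = lambda t: math.exp(-0.5 * t * t)
g_pdf = lambda t: 0.1                             # Uniform(-5, 5) density
g_draw = lambda rng: rng.uniform(-5.0, 5.0)
M = 10.0                                          # q(t) / 0.1 <= 10 for all t

rng = random.Random(0)
draws = rejection_sample(q, g_pdf, g_draw, M, 20_000, rng)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 3), round(var, 3))
```

With this setup the accepted draws should have sample mean near 0 and sample variance near 1, consistent with the N(0, 1) target.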
We note:

1. If U ∼ Uniform(0, 1) and 0 ≤ a ≤ 1, then Pr(U ≤ a) = ∫₀ᵃ 1 du = a.


2. For continuous random variables X and Y,
$$
\Pr(X \le Y) = \int_y \Pr(X \le Y \mid Y = y)\, p_Y(y)\, dy
= \int_y \Pr(X \le y)\, p_Y(y)\, dy \qquad (2)
$$

3. For some function h(Y),
$$
\Pr(X \le h(Y),\, Y < c) = \int_{-\infty}^{c} \Pr(X \le h(Y),\, Y < c \mid Y = a)\, p_Y(a)\, da \qquad (3)
$$
$$
= \int_{-\infty}^{c} \Pr(X \le h(Y) \mid Y = a)\, p_Y(a)\, da \qquad (4)
$$
since for a < c the event Y < c is implied by conditioning on Y = a.

We want to show
$$
p\left(\theta \,\middle|\, U \le \frac{q(\theta|\text{data})}{Mg(\theta)}\right) = p(\theta|\text{data}).
$$
We need to keep in mind that θ is originally generated from g(θ). It is easiest to work with the distribution function, hence we consider
$$
\Pr\left(\theta < c \,\middle|\, U \le \frac{q(\theta|\text{data})}{Mg(\theta)}\right)
$$
and want to show
$$
\Pr\left(\theta < c \,\middle|\, U \le \frac{q(\theta|\text{data})}{Mg(\theta)}\right) = \Pr_{\theta|\text{data}}(\theta < c \mid \text{data}), \quad \forall c,
$$
where $\Pr_{\theta|\text{data}}(\theta < c \mid \text{data}) = \int_{-\infty}^{c} p(\theta|\text{data})\, d\theta$.

Using Bayes' theorem, we can write
$$
\Pr\left(\theta < c \,\middle|\, U \le \frac{q(\theta|\text{data})}{Mg(\theta)}\right)
= \frac{\Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)} \,\middle|\, \theta < c\right)\Pr_g(\theta < c)}{\Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)}\right)} \qquad (5)
$$
where $\Pr_g(\theta < c) = \int_{-\infty}^{c} g(\theta)\, d\theta$.

Now noting that U and θ (and hence $\frac{q(\theta|\text{data})}{Mg(\theta)}$) are both random variables, we can use (2) to write the denominator of (5) as
$$
\Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)}\right)
= \int_a \Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)} \,\middle|\, \theta = a\right) g_\theta(a)\, da
= \int_a \frac{q_{\theta|\text{data}}(a)}{Mg_\theta(a)}\, g_\theta(a)\, da
= \int_a \frac{q_{\theta|\text{data}}(a)}{M}\, da = \frac{I}{M}. \qquad (6)
$$

Turning now to the numerator of (5), we note that
$$
\Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)} \,\middle|\, \theta < c\right)
= \frac{\Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)},\ \theta < c\right)}{\Pr_g(\theta < c)} \qquad (7)
$$
$$
= \frac{\int_{-\infty}^{c} \Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)},\ \theta = a\right) da}{\Pr_g(\theta < c)} \qquad (8)
$$
$$
= \frac{\int_{-\infty}^{c} \Pr\left(U \le \frac{q(\theta|\text{data})}{Mg(\theta)} \,\middle|\, \theta = a\right) g_\theta(a)\, da}{\Pr_g(\theta < c)} \qquad (9)
$$
$$
= \frac{\int_{-\infty}^{c} \frac{q_{\theta|\text{data}}(a)}{Mg_\theta(a)}\, g_\theta(a)\, da}{\Pr_g(\theta < c)} \qquad (10)
$$
$$
= \frac{I \Pr_{\theta|\text{data}}(\theta < c \mid \text{data})}{M \Pr_g(\theta < c)} \qquad (11)
$$
where (8) uses the joint probability element shorthand of (3)-(4), and the last step uses $q_{\theta|\text{data}}(a) = I\, p_{\theta|\text{data}}(a)$.

Substituting (11) and (6) in (5) gives
$$
\Pr\left(\theta < c \,\middle|\, U \le \frac{q(\theta|\text{data})}{Mg(\theta)}\right)
= \frac{I \Pr_{\theta|\text{data}}(\theta < c \mid \text{data})}{M \Pr_g(\theta < c)} \cdot \frac{\Pr_g(\theta < c)}{I/M}
= \Pr_{\theta|\text{data}}(\theta < c \mid \text{data}),
$$
as required.