
Monte Carlo approximation
Bayesian Statistics, Statistics 4224/5224, Spring 2021
January 28, 2021

Summarizing posterior inference
In addition to point summaries (posterior mean, median, or mode), it is nearly always important to report posterior uncertainty.

Our usual approach is to present quantiles of the posterior distributions of estimands of interest.
A 100(1 − α)% posterior credible interval for θ is an interval [l(y), u(y)] such that

Pr{ l(y) < θ < u(y) | y } = 1 − α .

Example: Suppose we wish to evaluate the posterior probability that θ2 > θ1.

Here yi ∼ Binomial(ni, θi) where θi ∼ Beta(1, 1), and thus

θi | yi ∼ Beta(yi + 1, ni − yi + 1)

for i = 1, 2, with θ1 and θ2 independent. Data: y1 = 5 and y2 = 8.

Evaluate Pr(θ2 > θ1 | y1, y2).
Use Monte Carlo!
Generate random samples θi^(s) ∼ Beta(yi + 1, ni − yi + 1) for s = 1, . . . , S.

Then

Pr(θ2 > θ1 | y1, y2) ≈ (1/S) Σ_{s=1}^{S} I(θ2^(s) > θ1^(s)) .

Find the R code in Courseworks → Files → Examples → Example01 MC.
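As a minimal R sketch of this calculation (not the Example01 MC script itself): the binomial sample sizes n1 and n2 are not stated above, so the values used here are placeholders only.

S <- 10000
y1 <- 5; y2 <- 8
n1 <- 10; n2 <- 10                            # hypothetical sample sizes, for illustration only
theta1.sim <- rbeta(S, y1 + 1, n1 - y1 + 1)   # draws from p(theta1 | y1)
theta2.sim <- rbeta(S, y2 + 1, n2 - y2 + 1)   # draws from p(theta2 | y2)
mean(theta2.sim > theta1.sim)                 # approximates Pr(theta2 > theta1 | y1, y2)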
Example: Posterior prediction

Last spring we planted n = 10 seedlings, and y = 2 survived the winter; if we plant another 10 this year, what is the probability that at least one will survive the winter?

Sampling model: y, ỹ | θ ∼ iid Binomial(n, θ).

Prior: θ ∼ Uniform(0, 1) = Beta(1, 1).
The posterior predictive distribution satisfies

p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ .

We can approximate Pr(ỹ > 0 | y) by Monte Carlo sampling:

First sample θ^(1), . . . , θ^(S) ∼ p(θ | y).

Then sample ỹ^(s) ∼ p(ỹ | θ^(s)) for s = 1, . . . , S.

Then Pr(ỹ > 0 | y) ≈ (1/S) Σ_{s=1}^{S} I(ỹ^(s) > 0).

Find the R code in Courseworks → Files → Examples → Example02 PPD.
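A minimal R sketch of this posterior predictive simulation (not the Example02 PPD script itself): with the Beta(1, 1) prior and y = 2 survivals out of n = 10, the posterior is Beta(3, 9).

S <- 10000
n <- 10; y <- 2
theta.sim  <- rbeta(S, y + 1, n - y + 1)              # theta^(s) ~ p(theta | y) = Beta(3, 9)
ytilde.sim <- rbinom(S, size = 10, prob = theta.sim)  # ytilde^(s) ~ Binomial(10, theta^(s))
mean(ytilde.sim > 0)                                  # approximates Pr(ytilde > 0 | y)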

The Monte Carlo method
Suppose the posterior distribution θ|y has the following density:
[Figure: the posterior density p(θ | y) plotted against θ, for θ between 0 and 1.]
And we wish to use the Monte Carlo method to approximate

• posterior probabilities, such as Pr(θ > 0.50 | y);
• posterior quantiles (median, a 95% posterior interval);
• posterior moments, such as E(θ | y) = ∫ θ p(θ | y) dθ.

Simulating a sequence of random draws, θ^(1), . . . , θ^(S) ∼ iid p(θ | y), can be visualized as randomly selecting S points in the area under the curve, and taking their θ-coordinates as our simulated sample.
[Figure: the S simulated draws θ^(1), . . . , θ^(S) shown as points under the posterior density curve; horizontal axis θ, vertical axis p(θ | y).]

Posterior probabilities
We can approximate

Pr(θ > 0.5 | y) = ∫_{0.5}^{∞} p(θ | y) dθ

by (1/S) Σ_{s=1}^{S} I(θ^(s) > 0.5), the proportion of simulated draws that are greater than 0.5.
[Figure: the simulated draws shown again as points under the posterior density curve; horizontal axis θ, vertical axis p(θ | y).]
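For concreteness, a small R sketch of the summaries listed earlier (probabilities, quantiles, and moments); it reuses the Beta(3, 9) posterior from the seedling example as a stand-in for the unspecified density pictured above, with theta.sim playing the role of the simulated draws. The quantile and moment lines anticipate the next two subsections.

S <- 10000
theta.sim <- rbeta(S, 3, 9)               # stand-in posterior draws; substitute draws from p(theta | y)
mean(theta.sim > 0.5)                     # approximates Pr(theta > 0.5 | y)
quantile(theta.sim, c(0.05, 0.5, 0.95))   # posterior median and a 90% posterior interval
mean(theta.sim)                           # approximates E(theta | y)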

Posterior quantiles
We approximate the posterior q-quantile, 0 < q < 1, by a value for which 100q% of the simulated θ^(s)-values are less than this value.

[Figure: the simulated draws under the posterior density curve; horizontal axis θ, vertical axis p(θ | y).]

The lower 5% and upper 5% of the simulated θ^(s)-values are marked by red dots. We are 90% certain that the true value of θ falls between the two dashed blue lines.

Posterior moments

We approximate the posterior expectation of θ by the sample average of the simulated random draws,

E(θ | y) = ∫ θ p(θ | y) dθ ≈ (1/S) Σ_{s=1}^{S} θ^(s) .

We can similarly approximate posterior expectations for various functions of the unknown parameter,

E[h(θ) | y] = ∫ h(θ) p(θ | y) dθ ≈ (1/S) Σ_{s=1}^{S} h(θ^(s)) .

The LLN guarantees that the Monte Carlo approximation converges to the posterior moment as S → ∞, and the CLT can be used to establish asymptotic bounds on the Monte Carlo error.

Sampling from mixture distributions

Finite mixtures

Suppose the posterior density can be expressed as a weighted average of k different density functions,

p(θ | y) = λ1 p1(θ | y) + λ2 p2(θ | y) + · · · + λk pk(θ | y)

where λi ≥ 0 and λ1 + λ2 + · · · + λk = 1.

Simulating a random draw θ^(s) from a finite mixture distribution is accomplished in two steps (an R sketch is given below):

1. Randomly select z^(s) ∈ {1, 2, . . . , k}, where Pr(z = i) = λi, for i = 1, 2, . . . , k; then
2. draw the sample θ^(s) ∼ p_{z^(s)}(θ | y).
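A minimal R sketch of this two-step sampler; the two Beta components and the weights below are hypothetical choices, not taken from the course examples.

S <- 10000
lambda <- c(0.3, 0.7)                                    # hypothetical mixture weights (sum to 1)
z.sim <- sample(1:2, S, replace = TRUE, prob = lambda)   # step 1: select a component
theta.sim <- ifelse(z.sim == 1,                          # step 2: draw from the selected component
                    rbeta(S, 2, 8),                      #   p1(theta | y): Beta(2, 8), hypothetical
                    rbeta(S, 9, 3))                      #   p2(theta | y): Beta(9, 3), hypothetical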
Continuous mixtures

If instead of there being k distinct component distributions there were a continuum of them, and instead of probabilities λi = Pr(z = i) there were a probability density λ(z), we would have a continuous mixture distribution of the form

p(θ | y) = ∫ p(θ | y; z) λ(z) dz .

The two-step algorithm for drawing a random sample from this distribution generalizes the algorithm given above:

1. Randomly generate z^(s) ∼ λ(z), then
2. draw the sample θ^(s) ∼ p(θ | y; z^(s)) .

The PPD is a mixture distribution

If the two-step process described above seemed familiar, it's because we've encountered this idea already, in our discussion of simulating data from the posterior predictive distribution.

The posterior predictive distribution is a mixture of sampling distributions p(ỹ | θ), weighted according to the posterior density p(θ | y). Thus we simulate a random draw from

p(ỹ | y) = ∫ p(ỹ | θ) p(θ | y) dθ

as follows:

1. First draw a parameter value from the posterior, θ^(s) ∼ p(θ | y);
2. then draw the predictive quantity ỹ^(s) ∼ p(ỹ | θ^(s)).

The two-parameter normal model

Building on the results we developed for estimation of the mean and variance separately, assuming the other parameter was known, we now consider the model

y1, . . . , yn | μ, σ² ∼ iid Normal(μ, σ²)

and suppose we wish to do Bayesian inference about the two-dimensional model parameter θ = (μ, σ²).

The likelihood for this model can be written

p(y | μ, σ²) ∝ (σ²)^(−n/2) exp{ −(1/(2σ²)) Σ_{i=1}^{n} (yi − μ)² }
            = (σ²)^(−n/2) exp{ −(1/(2σ²)) [ (n − 1)s² + n(ȳ − μ)² ] }

where s² = (1/(n − 1)) Σ_{i=1}^{n} (yi − ȳ)² is the sample variance of the yi's.

Conjugate prior

The conjugate prior is given by

μ | σ² ∼ Normal(μ0, σ²/κ0)
σ² ∼ Inv-χ²(ν0, σ0²) .

The resulting posterior is given by

μ | σ², y ∼ Normal(μn, σ²/κn)
σ² | y ∼ Inv-χ²(νn, σn²)

where

κn = κ0 + n
μn = (κ0 μ0 + n ȳ) / κn
νn = ν0 + n
σn² = (1/νn) [ ν0 σ0² + (n − 1) s² + (κ0 n / κn)(ȳ − μ0)² ] .

Simulated draws from this posterior can be produced by

σ²^(s) ∼ Inv-χ²(νn, σn²) , then μ^(s) ∼ Normal(μn, σ²^(s)/κn)

for s = 1, . . . , S.

This Monte Carlo procedure can be implemented in R with only two lines of code:

sigma2.sim <- nu.n * sigma2.n / rchisq(S, df=nu.n)
mu.sim <- rnorm(S, mean=mu.n, sd=sqrt(sigma2.sim/kappa.n))

Noninformative prior

A sensible vague prior density for μ and σ is uniform on (μ, log σ) or, equivalently,

p(μ, σ²) ∝ (σ²)^(−1) .

The resulting posterior can be specified by

μ | σ², y ∼ Normal(ȳ, σ²/n)  and  σ² | y ∼ Inv-χ²(n − 1, s²) .

The marginal posterior of μ, integrating over σ², is given by

(μ − ȳ) / (s/√n) | y ∼ t_{n−1} .

That is, the marginal posterior of μ is Student's t-distribution with location ȳ, scale s/√n, and n − 1 degrees of freedom.
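A minimal, self-contained R sketch of simulation under the noninformative prior (not from the course files); the data vector y below is simulated only as a placeholder.

set.seed(1)
y <- rnorm(20, mean = 5, sd = 2)   # placeholder data, for illustration only
n <- length(y); ybar <- mean(y); s2 <- var(y)
S <- 10000

sigma2.sim <- (n - 1) * s2 / rchisq(S, df = n - 1)             # sigma^2 | y ~ Inv-chi^2(n - 1, s^2)
mu.sim     <- rnorm(S, mean = ybar, sd = sqrt(sigma2.sim / n)) # mu | sigma^2, y ~ Normal(ybar, sigma^2/n)

quantile(mu.sim, c(0.025, 0.975))  # Monte Carlo 95% interval for mu; compare with the t marginal:
                                   # ybar + qt(c(.025, .975), df = n - 1) * sqrt(s2 / n)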