
The hierarchical normal model, assuming known variance
Bayesian Statistics, Statistics 4224/5224, Spring 2021
February 9, 2021

Please first read Sections 5.4–5.5 of Gelman et al., and review the summary notes given in Ch05b HierarchicalNormal.pdf.
The model
We assume that the observable data $y_1, \ldots, y_J$ are normally distributed,
$$y_j \mid \theta \stackrel{\text{indep}}{\sim} \mathrm{Normal}(\theta_j, \sigma_j^2),$$
where
$$\theta_1, \ldots, \theta_J \mid \mu, \tau \stackrel{\text{iid}}{\sim} \mathrm{Normal}(\mu, \tau^2).$$
The normal variances $\sigma_j^2$ are assumed known, and the hyperparameters $(\mu, \tau)$ are assigned a vague prior, such as the improper prior
$$p(\mu, \tau) \propto 1.$$
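To fix ideas, one can forward-simulate data from this hierarchy. Here is a minimal numpy sketch; the hyperparameter values and group count are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2021)

J = 8
mu, tau = 70.0, 5.0                  # hypothetical hyperparameter values
sigma_j = np.full(J, 4.0)            # known sampling standard deviations

theta = rng.normal(mu, tau, size=J)  # theta_j | mu, tau ~ iid Normal(mu, tau^2)
y = rng.normal(theta, sigma_j)       # y_j | theta ~ indep Normal(theta_j, sigma_j^2)
```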

Where might such a model be applicable?
Let $y_{ij}$ denote the response for subject $i$ in group $j$, for $i = 1, \ldots, n_j$ and $j = 1, \ldots, J$.
For example, $y_{ij}$ could be the test score achieved by the $i$th student at the $j$th school.
A reasonable sampling model is
$$y_{ij} \mid \theta \stackrel{\text{indep}}{\sim} \mathrm{Normal}(\theta_j, \sigma^2).$$
Take $y_j = \bar{y}_{\cdot j} = \frac{1}{n_j} \sum_{i=1}^{n_j} y_{ij}$ and $\sigma_j^2 = \sigma^2 / n_j$, and we have
$$y_j \mid \theta \stackrel{\text{indep}}{\sim} \mathrm{Normal}(\theta_j, \sigma_j^2)$$
for $j = 1, \ldots, J$.
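Concretely, the reduction from individual responses to group-level summaries can be carried out as follows (a minimal sketch; the data values and the list-of-arrays layout are hypothetical):

```python
import numpy as np

# Hypothetical raw data: scores[j] holds the n_j responses from school j.
scores = [np.array([91.0, 85.0, 78.0]),
          np.array([66.0, 72.0, 70.0, 81.0]),
          np.array([88.0, 94.0])]

sigma2 = 100.0                            # assumed-known within-school variance sigma^2
n = np.array([len(s) for s in scores])    # group sizes n_j
y = np.array([s.mean() for s in scores])  # group means y_j = bar(y)_.j
sigma2_j = sigma2 / n                     # sampling variances sigma_j^2 = sigma^2 / n_j
```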

It is perhaps not so realistic to assume that $\sigma^2$ is known, but we will, for now anyway, assume exactly that. (Very soon we will learn more sophisticated simulation techniques that make this simplifying assumption unnecessary.)
The primary inferential goal is the estimation of $\theta = (\theta_1, \ldots, \theta_J)$. Assigning independent noninformative priors
$$p(\theta_1, \ldots, \theta_J) \propto 1$$
will result in separate estimates, $\hat\theta_j = y_j$ for $j = 1, \ldots, J$. We can call these the unpooled estimates of $\theta$. This would be appropriate if the schools' students consist of independent and unrelated populations, or if the schools are administering different exams.

At the other extreme we could assume the $\theta_j$ are all equal, and assign a noninformative prior to their common value,
$$\Pr(\theta_1 = \cdots = \theta_J = \mu) = 1, \qquad p(\mu) \propto 1.$$
This would result in the estimates
$$\hat\theta_j = \hat\mu = \frac{\sum_{j=1}^J y_j / \sigma_j^2}{\sum_{j=1}^J 1/\sigma_j^2} = \bar{y}_* \,,$$
which would be appropriate if the students were sampled from the same population and randomly assigned to the $J$ identical schools. We can call this estimate complete pooling; it is hard to imagine a real-world scenario where this assumption is justified.

Surely the best estimate of $\theta_j$ available from such data is somewhere between these two extremes,
$$\hat\theta_j = \lambda_j y_j + (1 - \lambda_j) \bar{y}_* \,.$$
This estimator will result from the prior distribution
$$\theta_1, \ldots, \theta_J \mid \mu, \tau \stackrel{\text{iid}}{\sim} \mathrm{Normal}(\mu, \tau^2).$$
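To make the comparison concrete, here is a small numpy sketch of the unpooled, completely pooled, and partially pooled estimates for a fixed, hypothetical value of $\tau$; the shrinkage weight $\lambda_j = (1/\sigma_j^2)\big/(1/\sigma_j^2 + 1/\tau^2)$ is the form implied by the conditional posterior mean given below:

```python
import numpy as np

# Hypothetical group summaries (e.g., from the sketch above).
y = np.array([84.7, 72.3, 91.0])                # group means y_j
sigma2_j = np.array([33.3, 25.0, 50.0])         # known sampling variances sigma_j^2
tau = 5.0                                       # between-group sd, fixed for illustration

theta_unpooled = y                              # separate estimates: theta_hat_j = y_j

w = 1.0 / sigma2_j                              # precisions 1 / sigma_j^2
y_star = np.sum(w * y) / np.sum(w)              # complete pooling: precision-weighted mean

lam = w / (w + 1.0 / tau**2)                    # shrinkage weights lambda_j
theta_partial = lam * y + (1.0 - lam) * y_star  # compromise between the two extremes
```

Note how groups with large $\sigma_j^2$ (small samples) are shrunk more strongly toward the pooled mean.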
How do we fit this model?
We will use Monte Carlo simulation to estimate the joint posterior distribution
$$p(\theta, \mu, \tau \mid y) = p(\tau \mid y)\, p(\mu \mid \tau, y)\, p(\theta \mid \mu, \tau, y),$$
generating random samples by the following three-step process: sample $\tau^s \sim p(\tau \mid y)$, then $\mu^s \sim p(\mu \mid \tau^s, y)$, then $\theta^s \sim p(\theta \mid \mu^s, \tau^s, y)$.

The second and third steps are straightforward. By the normal-normal conjugacy we have
$$\theta_j \mid \mu, \tau, y \stackrel{\text{indep}}{\sim} \mathrm{Normal}(\hat\theta_j, V_j),$$
where
$$\hat\theta_j = \frac{y_j/\sigma_j^2 + \mu/\tau^2}{1/\sigma_j^2 + 1/\tau^2} \quad\text{and}\quad V_j = \frac{1}{1/\sigma_j^2 + 1/\tau^2} \,.$$
Also, $y_j \mid \mu, \tau \stackrel{\text{indep}}{\sim} \mathrm{Normal}(\mu, \sigma_j^2 + \tau^2)$, and thus
$$\mu \mid \tau, y \sim \mathrm{Normal}(\hat\mu, V_\mu),$$
where
$$\hat\mu = \frac{\sum_{j=1}^J \frac{1}{\sigma_j^2 + \tau^2}\, y_j}{\sum_{j=1}^J \frac{1}{\sigma_j^2 + \tau^2}} \quad\text{and}\quad V_\mu^{-1} = \sum_{j=1}^J \frac{1}{\sigma_j^2 + \tau^2} \,.$$
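These two conditionals translate directly into code. A minimal sketch, in the notation above (the function names and the generator `rng` are my own):

```python
import numpy as np

rng = np.random.default_rng(4224)

def draw_mu(tau, y, sigma2_j, rng):
    """One draw from mu | tau, y ~ Normal(mu_hat, V_mu)."""
    w = 1.0 / (sigma2_j + tau**2)
    mu_hat = np.sum(w * y) / np.sum(w)
    V_mu = 1.0 / np.sum(w)
    return rng.normal(mu_hat, np.sqrt(V_mu))

def draw_theta(mu, tau, y, sigma2_j, rng):
    """One draw of the vector theta | mu, tau, y (independent normals)."""
    V_j = 1.0 / (1.0 / sigma2_j + 1.0 / tau**2)
    theta_hat = V_j * (y / sigma2_j + mu / tau**2)
    return rng.normal(theta_hat, np.sqrt(V_j))
```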

The initial step in the simulation process, $\tau^s \sim p(\tau \mid y)$, is accomplished by evaluating $p(\tau \mid y)$ numerically at a discrete set of points, sampling from those points with selection probability proportional to the unnormalized density, then adding random jitter to better approximate the continuous distribution.
A numerical expression for an unnormalized version of the marginal posterior of $\tau$ is
$$p(\tau \mid y) \propto V_\mu^{1/2} \prod_{j=1}^J (\sigma_j^2 + \tau^2)^{-1/2} \exp\!\left( -\frac{(y_j - \hat\mu)^2}{2(\sigma_j^2 + \tau^2)} \right),$$
where $\hat\mu$ and $V_\mu$ are as defined above (both depend on $\tau$).
See Courseworks → Files → Examples → Example05b.

What posterior predictives might be of interest?
Here are three possibilities (a simulation sketch follows the list):
1. We can imagine predicting the score of an additional student at one of the previously sampled schools, simulating from the posterior predictive density
$$p(\tilde y_{ij} \mid y) = \int p(\tilde y_{ij} \mid \theta_j)\, p(\theta_j \mid y)\, d\theta_j \,,$$
given $\theta_j^s$, a draw from $p(\theta_j \mid y)$, by
$$\tilde y_{ij}^s \sim \mathrm{Normal}(\theta_j^s, \sigma^2).$$

2. Predict the true mean score $\tilde\theta$ for a $(J+1)$st school, with posterior predictive density
$$p(\tilde\theta \mid y) = \iint p(\tilde\theta \mid \mu, \tau)\, p(\mu, \tau \mid y)\, d\mu\, d\tau \,,$$
given $(\mu^s, \tau^s) \sim p(\mu, \tau \mid y)$, by
$$\tilde\theta^s \sim \mathrm{Normal}(\mu^s, (\tau^s)^2).$$
3. Predict the score achieved by a single student at a school not in the initial sample, with posterior predictive density
$$p(\tilde y \mid y) = \int p(\tilde y \mid \tilde\theta)\, p(\tilde\theta \mid y)\, d\tilde\theta \,,$$
given $\tilde\theta^s$ in item 2 above, by
$$\tilde y^s \sim \mathrm{Normal}(\tilde\theta^s, \sigma^2).$$
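Given the arrays `tau_s`, `mu_s`, and `theta_s` from the sampler sketched earlier, each of the three predictive simulations is a one-line vectorized draw (again a sketch; `sigma2` is the known within-school variance and `j` picks a school for item 1):

```python
j = 0  # school of interest for predictive 1

# 1. Score of an additional student at previously sampled school j.
y_new_student = rng.normal(theta_s[:, j], np.sqrt(sigma2))

# 2. True mean score for a new, (J+1)st school.
theta_new = rng.normal(mu_s, tau_s)

# 3. Score of a single student at the new school.
y_new_school = rng.normal(theta_new, np.sqrt(sigma2))
```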