
Multivariate normal model
Bayesian Statistics, Statistics 4224/5224, Spring 2021
April 6, 2021

The following notes summarize Sections 3.5–3.6 of Bayesian Data Analysis, by Andrew Gelman et al. We also include, as an example, Exercise 7.4 (page 239) of A First Course in Bayesian Statistical Methods, by Peter D. Hoff.
Hoff (2009, Chapter 7) gives a detailed account of the multivariate normal model, including a discussion of missing data. Note, however, that Hoff’s treatment uses a ‘semiconjugate prior’ on μ and Σ, not the fully conjugate prior that is described below.
Reich and Ghosh (2019, sections 2.1.6–2.1.7) also discuss the multivariate normal model, but consider inference about μ and Σ separately.

Multivariate normal model with known variance
Here we give a somewhat formal account of the distributional results of Bayesian inference for the parameters of a multivariate normal distribution.
Multivariate normal likelihood
The basic model to be discussed concerns an observed vector y of d components, with the multivariate normal distribution
y|μ, Σ ∼ Normald(μ, Σ) ,
where μ is a (column) vector of length d, and Σ is a d×d variance-covariance matrix, which is symmetric and positive definite.

The likelihood function for a single observation is

p(y|μ, Σ) ∝ |Σ|^{−1/2} exp{ −(1/2)(y − μ)^T Σ^{−1} (y − μ) } ,

and for a sample of n independent and identically distributed observations y1, …, yn is

p(y1, …, yn|μ, Σ) ∝ |Σ|^{−n/2} exp{ −(1/2) ∑_{i=1}^n (yi − μ)^T Σ^{−1} (yi − μ) } .
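To make the formula concrete, here is a minimal R sketch that evaluates this log-likelihood, up to the additive constant −(nd/2) log(2π); the function name mvn.loglik and its arguments are ours, for illustration only:

  mvn.loglik <- function(y, mu, Sigma){
    # y: n x d data matrix; mu: length-d mean; Sigma: d x d covariance
    n <- nrow(y)
    resid <- sweep(y, 2, mu)                       # subtract mu from every row
    quad <- sum((resid %*% solve(Sigma)) * resid)  # sum of quadratic forms
    # determinant() returns log|Sigma| by default
    as.numeric(-n/2 * determinant(Sigma)$modulus - quad/2)
  }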
Conjugate analysis
We analyze the multivariate normal model by first considering the case of known Σ.

Conjugate prior distribution for μ with known Σ. The log-likelihood is a quadratic form in μ, and therefore the conjugate prior distribution for μ is the multivariate normal distribution, which we parameterize as
μ ∼ Normald(μ0, Λ0) .
Posterior distribution for μ with known Σ. The posterior distribution of μ, p(μ|y, Σ), satisfies

p(μ|y, Σ) ∝ exp{ −(1/2) [ (μ − μ0)^T Λ0^{−1} (μ − μ0) + ∑_{i=1}^n (yi − μ)^T Σ^{−1} (yi − μ) ] } ,

which is an exponential of a quadratic form in μ.

Completing the quadratic form and pulling out constant factors gives

p(μ|y, Σ) ∝ exp{ −(1/2)(μ − μn)^T Λn^{−1} (μ − μn) } = Nd(μ|μn, Λn) ,

where

μn = (Λ0^{−1} + nΣ^{−1})^{−1} (Λ0^{−1} μ0 + nΣ^{−1} ȳ)
Λn = (Λ0^{−1} + nΣ^{−1})^{−1} .

These are similar to the results for the univariate normal model, discussed in the ‘0126’ slide deck: the posterior mean is a weighted average of the data and the prior mean, with weights given by the data and prior precision matrices, nΣ^{−1} and Λ0^{−1}, respectively.

Writing
Λn^{−1} = Λ0^{−1} + nΣ^{−1}
shows that the posterior precision is the sum of the prior and data precisions.
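In R, this update is a few lines of linear algebra. The sketch below assumes μ0, Λ0, Σ, ȳ, and n are available as mu0, Lambda0, Sigma, ybar, and n; the function name posterior.mu is ours, for illustration:

  posterior.mu <- function(mu0, Lambda0, Sigma, ybar, n){
    prec0 <- solve(Lambda0)           # prior precision, Lambda0^{-1}
    precD <- n * solve(Sigma)         # data precision, n Sigma^{-1}
    Lambda.n <- solve(prec0 + precD)  # posterior covariance
    mu.n <- Lambda.n %*% (prec0 %*% mu0 + precD %*% ybar)  # posterior mean
    list(mu.n = drop(mu.n), Lambda.n = Lambda.n)
  }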
Posterior predictive distribution for new data. We now work out the analytic form of the posterior predictive distribution for a new observation, ỹ ∼ Normald(μ, Σ).

As with the univariate normal, we first note that the joint distribution,

p(ỹ, μ|y) = Nd(ỹ|μ, Σ) × Nd(μ|μn, Λn) ,

is the exponential of a quadratic form in (ỹ, μ); hence (ỹ, μ) has a joint 2d-variate normal posterior distribution, and so the marginal posterior of ỹ is d-variate normal.

Still assuming the covariance matrix Σ is known, we can determine the posterior mean and variance of ỹ as follows.
First,

E(ỹ|y) = E[E(ỹ|μ, y)|y] = E(μ|y) = μn .

Also,

Var(ỹ|y) = Var[E(ỹ|μ, y)|y] + E[Var(ỹ|μ, y)|y] ,

and thus

Var(ỹ|y) = Var(μ|y) + E(Σ|y) = Λn + Σ .
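So a posterior predictive draw just adds the two covariance sources. A sketch using mvrnorm from the MASS package, continuing with the hypothetical posterior.mu output from the earlier snippet:

  library(MASS)  # for mvrnorm
  post <- posterior.mu(mu0, Lambda0, Sigma, ybar, n)  # from the sketch above
  # y.tilde | y ~ Normal_d(mu.n, Lambda.n + Sigma): 1000 predictive draws
  y.tilde <- mvrnorm(1000, post$mu.n, post$Lambda.n + Sigma)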
Noninformative prior density for μ. A noninformative uniform prior density is p(μ) ∝ 1, obtained in the limit as the precision tends to zero in the sense that |Λ0^{−1}| → 0; in the limit of infinite prior variance (zero prior precision), the prior mean is irrelevant.

Though this choice of prior density does not combine with the likelihood to form a proper joint probability density for (μ, y), the posterior density obtained by applying Bayes’ rule is a proper posterior density.
The posterior density is proportional to the likelihood, which is an exponential of a quadratic form in μ.
Completing the quadratic form and pulling out constant terms yields the posterior distribution for μ, given the uniform prior density, as
μ|Σ, y ∼ Normald(ȳ, Σ/n) .

Multivariate normal with unknown mean and variance
Conjugate inverse-Wishart family of prior distributions
Recall that the conjugate distribution for the univariate normal with unknown mean and variance is the normal-inverse-χ2 distribution.

We can use the inverse-Wishart distribution, a multivariate generalization of the scaled-inverse-χ2, to describe the prior distribution of the matrix Σ.

The conjugate prior distribution for (μ, Σ), the normal-inverse-Wishart, is conveniently parameterized in terms of hyperparameters (μ0, κ0, Λ0, ν0):

Σ ∼ InvWishart(ν0, Λ0^{−1})
μ|Σ ∼ Normald(μ0, Σ/κ0) .

The parameters ν0 and Λ0 describe the degrees of freedom and the scale matrix for the inverse-Wishart distribution on Σ.

The remaining parameters are the prior mean, μ0, and the number of prior measurements, κ0, on the Σ scale.

Multiplying the prior density by the normal likelihood results in a posterior density of the same family,

Σ|y ∼ InvWishart(νn, Λn^{−1})
μ|Σ, y ∼ Normald(μn, Σ/κn) ,

with parameters

μn = (κ0/(κ0 + n)) μ0 + (n/(κ0 + n)) ȳ
κn = κ0 + n
νn = ν0 + n
Λn = Λ0 + S + (κ0 n/(κ0 + n)) (ȳ − μ0)(ȳ − μ0)^T ,

where S is the sum of squares matrix about the sample mean,

S = ∑_{i=1}^n (yi − ȳ)(yi − ȳ)^T .

Samples from the joint posterior distribution of (μ, Σ) are easily obtained using the following procedure:

• first, draw Σ(t) ∼ InvWishart(νn, Λn^{−1}),
• then draw μ(t) ∼ Normald(μn, Σ(t)/κn) .

library(MASS)  # mvrnorm is used below

Post.sim.MVN <- function(Sim.size, y, mu0, k0, nu0, Lambda0){
  T <- Sim.size; n <- dim(y)[1]; d <- dim(y)[2]
  ybar <- apply(y, 2, mean); S <- (n-1) * cov(y)
  # posterior hyperparameters of the normal-inverse-Wishart
  k.n <- k0 + n; mu.n <- (k0*mu0 + n*ybar)/k.n; nu.n <- nu0 + n
  Lambda.n <- Lambda0 + S + k0*n/k.n * outer(ybar - mu0, ybar - mu0)
  Lambda.n.inv <- solve(Lambda.n)
  mu.sim <- matrix(NA, T, d)
  Sigma.sim <- matrix(NA, T, d^2)
  for(t in 1:T){
    # Sigma ~ InvWishart(nu.n, Lambda.n^{-1}): invert a Wishart draw
    Sigma <- solve(rWishart(1, nu.n, Lambda.n.inv)[,,1])
    mu <- mvrnorm(1, mu.n, Sigma/k.n)  # mu | Sigma ~ Normal_d(mu.n, Sigma/k.n)
    mu.sim[t,] <- mu
    Sigma.sim[t,] <- c(Sigma)
  }
  list(mu.sim = mu.sim, Sigma.sim = Sigma.sim)
}

To draw from the posterior predictive distribution of a new observation, draw

• ỹ ∼ Normald(μ(t), Σ(t))

given the already drawn values of μ(t) and Σ(t):

y.tilde <- matrix(NA, Sim.size, d)
for(t in 1:Sim.size){
  mu <- mu.sim[t,]; Sigma <- matrix(Sigma.sim[t,], d, d)
  y.tilde[t,] <- mvrnorm(1, mu, Sigma)
}

Different noninformative prior distributions

Inverse-Wishart with d + 1 degrees of freedom. Setting

Σ ∼ InvWishart(d + 1, I)

has the appealing feature that each of the correlations in Σ has, marginally, a uniform prior distribution. (The joint distribution is not uniform, however, because of the constraint that the correlation matrix be positive definite.)

Inverse-Wishart with d − 1 degrees of freedom. Another proposed noninformative prior distribution is the multivariate Jeffreys prior density

p(μ, Σ) ∝ |Σ|^{−(d+1)/2} ,

which is the limit of the conjugate prior density as κ0 → 0, ν0 → −1, and |Λ0| → 0.

The corresponding posterior distribution can be written as

Σ|y ∼ InvWishart(n − 1, S^{−1})
μ|Σ, y ∼ Normald(ȳ, Σ/n) .

Example

See http://www2.stat.duke.edu/~pdh10/FCBS/Exercises/agehw.dat for data on the ages of 100 opposite-sex married couples sampled from the U.S. population.

Assume the sampling model

y1, …, yn | μ, Σ ∼ iid Normal2(μ, Σ) .

Use the conjugate prior described above, with μ0 = (45, 40), κ0 = 1, ν0 = 3, and Λ0 = I2.

1. Obtain a Monte Carlo approximation to p(μ, Σ|y1, …, yn).

2. Plot the joint posterior distribution of μh and μw, and also the marginal posterior density of the correlation between yh and yw, the ages of a husband and wife.

3. Obtain 95% posterior intervals for μh, μw, and the correlation coefficient.

Solution

See Courseworks → Files → Examples → Example03b MVN.
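As a hedged sketch (not the posted Courseworks solution), the exercise could be approached with Post.sim.MVN along the following lines; we assume agehw.dat has a header row, with the husband's age in the first column and the wife's in the second:

  y <- as.matrix(read.table(
    "http://www2.stat.duke.edu/~pdh10/FCBS/Exercises/agehw.dat",
    header = TRUE))
  post <- Post.sim.MVN(Sim.size = 10000, y = y, mu0 = c(45, 40),
                       k0 = 1, nu0 = 3, Lambda0 = diag(2))
  # correlation draws from the 2x2 covariance draws;
  # the columns of Sigma.sim hold c(S11, S21, S12, S22)
  rho <- post$Sigma.sim[, 2] /
    sqrt(post$Sigma.sim[, 1] * post$Sigma.sim[, 4])
  plot(post$mu.sim, xlab = "mu_h", ylab = "mu_w")    # joint posterior of (mu_h, mu_w)
  plot(density(rho), main = "posterior of correlation")
  apply(post$mu.sim, 2, quantile, c(0.025, 0.975))   # 95% intervals for mu_h, mu_w
  quantile(rho, c(0.025, 0.975))                     # 95% interval for the correlation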