Part3. Normal Model.
Elena Moltchanova
2021S2
A note on notation (1):
The normal distribution can be parametrised via mean and
standard deviation:
X ∼ N(µ, σ)
or via mean and variance (the squared standard deviation):
X ∼ N(µ, σ²)
A note on notation (2):
In Bayesian statistics, it is often convenient to express the normal
distribution via precision, i.e., inverse variance:
X ∼ N(µ, τ),
where τ = 1/σ².
Make sure to check which parametrisation is used whenever you are
reading a new book or using new software!
Normal distribution with known precision and unknown
mean:
Let y1, …, yn be an i.i.d. sample from a normal distribution with
unknown mean µ and known precision (inverse variance) τ :
yi |µ, τ ∼ N(µ, τ).
Assume a normal prior for µ:
µ ∼ N(µ0, τ0).
Analysis.
Let’s derive the classical MLE for µ and the Bayesian posterior
distribution for µ.
. . .
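A sketch of where this derivation ends up (standard normal–normal conjugacy in the precision parametrisation used above):

```latex
% Log-likelihood in \mu (up to an additive constant):
\ell(\mu) = -\frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2
\qquad\Rightarrow\qquad
\hat{\mu}_{\mathrm{MLE}} = \bar{y}

% Posterior: a normal prior times a normal likelihood is again normal,
% \mu \mid y_1,\dots,y_n \sim N(\mu_n, \tau_n), with
\tau_n = \tau_0 + n\tau,
\qquad
\mu_n = \frac{\tau_0\,\mu_0 + n\tau\,\bar{y}}{\tau_0 + n\tau}
```

In words: the posterior precision is the prior precision plus n copies of the data precision, and the posterior mean is a precision-weighted average of the prior mean µ0 and the sample mean ȳ.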
Normal distribution with unknown precision and known
mean:
Let y1, …, yn be an i.i.d. sample from a normal distribution with
unknown mean µ and known precision (inverse variance) τ :
yi |µ, τ ∼ N(µ, τ).
Assume now that the mean µ is known and it is the precision τ we
are after. Since the precision (inverse variance) is a positive number, a
technically suitable prior is a gamma distribution:
τ ∼ Γ(a, b).
Analysis.
Let’s derive the classical MLE for τ and the Bayesian posterior
distribution for τ .
. . .
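A sketch of the result (gamma–normal conjugacy, with Γ(a, b) taken in the shape–rate parametrisation):

```latex
% Log-likelihood in \tau (up to an additive constant):
\ell(\tau) = \frac{n}{2}\log\tau
  - \frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2
\qquad\Rightarrow\qquad
\hat{\tau}_{\mathrm{MLE}} = \frac{n}{\sum_{i=1}^{n}(y_i - \mu)^2}

% Posterior: the gamma prior is conjugate, so
\tau \mid y_1,\dots,y_n \sim
\Gamma\!\left(a + \frac{n}{2},\;
  b + \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right)
```

Note that the MLE is just the reciprocal of the usual variance estimate (with known µ), and the posterior updates the shape by n/2 and the rate by half the sum of squared deviations.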
Bayesian vs. Classical:
What happens to the posterior inference when the sample size n
increases?
. . .
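One way to see the answer is a toy simulation (the prior constants below are illustrative, not from the lecture): as n grows, the posterior mean approaches the MLE (the sample mean) and the posterior standard deviation shrinks like 1/√(nτ), so the prior's influence vanishes.

```r
# Toy illustration: posterior for mu with a vague normal prior
# and known data precision tau = 1. All constants are assumptions.
set.seed(1)
mu0 <- 0; tau0 <- 0.01   # vague prior: mu ~ N(0, precision 0.01)
tau <- 1                 # known data precision
mu.true <- 5

# Conjugate posterior mean: precision-weighted average of mu0 and ybar
post.mean <- function(y) {
  n <- length(y)
  (tau0 * mu0 + n * tau * mean(y)) / (tau0 + n * tau)
}

for (n in c(5, 50, 500)) {
  y <- rnorm(n, mu.true, sd = 1 / sqrt(tau))
  cat(sprintf("n = %3d: MLE = %.3f, post. mean = %.3f, post. sd = %.3f\n",
              n, mean(y), post.mean(y), 1 / sqrt(tau0 + n * tau)))
}
```

With larger samples the Bayesian and classical point estimates become practically indistinguishable.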
Introducing WinBUGS.
. . .
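For reference, a minimal WinBUGS model for the normal problem with unknown mean and unknown precision might look like the sketch below (the node names and the vague prior constants are illustrative assumptions, not the lecture's code):

```
model {
  for (i in 1:n) {
    y[i] ~ dnorm(mu, tau)   # in BUGS, dnorm is parametrised by precision
  }
  mu  ~ dnorm(0, 0.0001)    # vague normal prior on the mean
  tau ~ dgamma(0.01, 0.01)  # vague gamma prior on the precision
}
```

Note that BUGS uses the precision parametrisation throughout, which is exactly the notation point made earlier.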
Summary of numerical methods:
- Use simulations (i.e., your own toy datasets) to learn new methods
- Don't forget to check for convergence
Linear Regression Model. Classical Set-Up:
yi = a + bxi + εi
where
εi ∼ N(0, σ²)
Note: σ² is the variance of the above normal distribution.
A note on interpretation:
yi = a + bxi + εi
The intercept a is the expected value of Y when x = 0.
Each one-unit increase in X is associated with an average change of
b units in Y. (Correlation does not imply causation!)
Sample-specific centering:
yi = a + b(xi − x̄) + εi
The intercept a is the expected value of Y when x = x̄.
Logging the response:
log(yi) = a + bxi + εi
Each one-unit increase in X is associated with an average change of
(e^b − 1) × 100% in Y.
The linear regression model is linear in the parameters, not in the
variables!
Linear Regression Model. Another Way:
yi ∼ N(µi, σ²)
where
µi = a + bxi
Note: σ² is the variance of the above normal distribution.
Linear Regression Model. Bayesian Way.
Likelihood:
yi | µi, τ ∼ N(µi, τ) (1)
µi = a + bxi (2)
Priors:
a ∼ N(µa, τa) (3)
b ∼ N(µb, τb) (4)
τ ∼ Gamma(aτ, bτ) (5)
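Equations (1)–(5) map almost line-for-line onto a BUGS-style model; the sketch below assumes the prior means and precisions (mu.a, tau.a, etc.) are supplied as data, and all names are illustrative:

```
model {
  for (i in 1:n) {
    y[i]  ~ dnorm(mu[i], tau)   # likelihood, eq. (1)
    mu[i] <- a + b * x[i]       # linear predictor, eq. (2)
  }
  a   ~ dnorm(mu.a, tau.a)      # prior, eq. (3)
  b   ~ dnorm(mu.b, tau.b)      # prior, eq. (4)
  tau ~ dgamma(a.tau, b.tau)    # prior, eq. (5)
}
```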
Example: Howell’s Data
[Figure: scatterplot of weight (approx. 30–60) against height (approx. 140–180) for the Howell data.]
Model:
Weightᵢ = a + b Heightᵢ + εᵢ.
In Bayesian formulation:
Weightᵢ | µᵢ, τ ∼ N(µᵢ, τ),
where
µᵢ = a + b Heightᵢ.
The classical way-1
m1 <- lm(weight ~ height, data=d.adult)
summary(m1)
##
## Call:
## lm(formula = weight ~ height, data = d.adult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.8022 -3.0183 -0.2293 2.8117 14.7348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.31618 4.52650 -11.56 <2e-16 ***
## height 0.62942 0.02924 21.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.242 on 350 degrees of freedom
## Multiple R-squared: 0.5696, Adjusted R-squared: 0.5684
## F-statistic: 463.3 on 1 and 350 DF, p-value: < 2.2e-16
Classical assumption checking
[Figure: plot(m1) diagnostics — Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage (with Cook's distance). Flagged observations include 479, 6, 174, and 117.]
The classical way-2
m2 <- lm(weight ~ I(height-mean(height)), data=d.adult)
summary(m2)
##
## Call:
## lm(formula = weight ~ I(height - mean(height)), data = d.adult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.8022 -3.0183 -0.2293 2.8117 14.7348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.99049 0.22609 199.00 <2e-16 ***
## I(height - mean(height)) 0.62942 0.02924 21.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.242 on 350 degrees of freedom
## Multiple R-squared: 0.5696, Adjusted R-squared: 0.5684
## F-statistic: 463.3 on 1 and 350 DF, p-value: < 2.2e-16
The classical way-3
m3 <- lm(log(weight) ~ height, data=d.adult)
summary(m3)
##
## Call:
## lm(formula = log(weight) ~ height, data = d.adult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30191 -0.06556 0.00108 0.06525 0.33286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6174274 0.1022805 15.81 <2e-16 ***
## height 0.0140923 0.0006608 21.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09585 on 350 degrees of freedom
## Multiple R-squared: 0.5651, Adjusted R-squared: 0.5639
## F-statistic: 454.8 on 1 and 350 DF, p-value: < 2.2e-16
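Using the fitted slope from the output above, the log-scale coefficient translates into a percentage effect on the original scale:

```r
b.hat <- 0.0140923        # slope for height from summary(m3)
(exp(b.hat) - 1) * 100    # ~1.42% more weight, on average, per extra unit of height
```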
[Figure: plot(m3) diagnostics for the log-weight model — Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage (with Cook's distance). Flagged observations include 479, 174, 286, 94, and 211.]
The Bayesian way
library(MCMCglmm)
m1b <- MCMCglmm(weight ~ height, data=d.adult, verbose=F)
summary(m1b)
##
## Iterations = 3001:12991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: 2020.144
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 18.12 15.47 21.04 899.6
##
## Location effects: weight ~ height
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) -52.4646 -60.7339 -43.8509 1000 <0.001 ***
## height 0.6305 0.5742 0.6844 1000 <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Bayesian Diagnostics: Model Fit
dtf <- data.frame(height=seq(130,185,.1))
# a function to produce a posterior predictive sample
pred.reg <- function(x, a, b, sigma){
  a + b*x + rnorm(length(a), 0, sd=sigma)
}
m1b.pred <- sapply(dtf$height, pred.reg,
                   a=m1b$Sol[,1], b=m1b$Sol[,2], sigma=sqrt(m1b$VCV))
dtf$pp.mean <- apply(m1b.pred, 2, mean)
dtf$pp.q025 <- apply(m1b.pred, 2, quantile, .025)
dtf$pp.q975 <- apply(m1b.pred, 2, quantile, .975)
Plotting
[Figure: weight–height scatterplot with the posterior predictive mean and 95% interval overlaid.]
Bayesian diagnostics: convergence of a
plot(m1b$Sol[,1],ylab='a')
[Figure: trace plot and posterior density of a (N = 1000 samples, bandwidth = 1.177).]
Bayesian diagnostics: convergence of b
plot(m1b$Sol[,2],ylab='b')
[Figure: trace plot and posterior density of b (N = 1000 samples, bandwidth = 0.007604).]