Part3. Normal Model.
Elena Moltchanova
2021S2
A note on notation (1):
The normal distribution can be parametrised via mean and
standard deviation:
X ∼ N(µ, σ)
or via mean and variance (the squared standard deviation):
X ∼ N(µ, σ²)
A note on notation (2):
In Bayesian statistics, it is often convenient to express the normal
distribution via precision, i.e., inverse variance:
X ∼ N(µ, τ),
where τ = 1/σ².
Make sure to check which parametrisation is used whenever you are
reading a new book or using new software!
Normal distribution with known precision and unknown
mean:
Let y1, …, yn be an i.i.d. sample from a normal distribution with
unknown mean µ and known precision (inverse variance) τ :
yi |µ, τ ∼ N(µ, τ).
Assume a normal prior for µ:
µ ∼ N(µ0, τ0).
Analysis.
Let’s derive the classical MLE for µ and the Bayesian posterior
distribution for µ.
. . .
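A sketch of where this derivation ends up (standard normal–normal conjugacy in the precision parametrisation used above):

```latex
% Log-likelihood in \mu (up to an additive constant):
\ell(\mu) = -\frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2
\qquad\Rightarrow\qquad
\hat{\mu}_{\mathrm{MLE}} = \bar{y}

% Posterior: a normal prior times a normal likelihood is again normal,
% \mu \mid y_1,\dots,y_n \sim N(\mu_n, \tau_n), with
\tau_n = \tau_0 + n\tau,
\qquad
\mu_n = \frac{\tau_0\,\mu_0 + n\tau\,\bar{y}}{\tau_0 + n\tau}
```

In words: the posterior precision is the prior precision plus n copies of the data precision, and the posterior mean is a precision-weighted average of the prior mean µ0 and the sample mean ȳ.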
Normal distribution with unknown precision and known
mean:
Let y1, …, yn be an i.i.d. sample from a normal distribution with
unknown mean µ and known precision (inverse variance) τ :
yi |µ, τ ∼ N(µ, τ).
Assume now that the mean µ is known and it is the precision τ we
are after. Since the precision (inverse variance) is a positive number, a
technically suitable prior is a gamma distribution:
τ ∼ Γ(a, b).
Analysis.
Let’s derive the classical MLE for τ and the Bayesian posterior
distribution for τ .
. . .
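A sketch of the result (gamma–normal conjugacy, with Γ(a, b) taken in the shape–rate parametrisation):

```latex
% Log-likelihood in \tau (up to an additive constant):
\ell(\tau) = \frac{n}{2}\log\tau
  - \frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2
\qquad\Rightarrow\qquad
\hat{\tau}_{\mathrm{MLE}} = \frac{n}{\sum_{i=1}^{n}(y_i - \mu)^2}

% Posterior: the gamma prior is conjugate, so
\tau \mid y_1,\dots,y_n \sim
\Gamma\!\left(a + \frac{n}{2},\;
  b + \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right)
```

Note that the MLE is just the reciprocal of the usual variance estimate (with known µ), and the posterior updates the shape by n/2 and the rate by half the sum of squared deviations.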
Bayesian vs. Classical:
What happens to the posterior inference when the sample size n
increases?
. . .
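One way to see the answer is a toy simulation (the prior constants below are illustrative, not from the lecture): as n grows, the posterior mean approaches the MLE (the sample mean) and the posterior standard deviation shrinks like 1/√(nτ), so the prior's influence vanishes.

```r
# Toy illustration: posterior for mu with a vague normal prior
# and known data precision tau = 1. All constants are assumptions.
set.seed(1)
mu0 <- 0; tau0 <- 0.01   # vague prior: mu ~ N(0, precision 0.01)
tau <- 1                 # known data precision
mu.true <- 5

# Conjugate posterior mean: precision-weighted average of mu0 and ybar
post.mean <- function(y) {
  n <- length(y)
  (tau0 * mu0 + n * tau * mean(y)) / (tau0 + n * tau)
}

for (n in c(5, 50, 500)) {
  y <- rnorm(n, mu.true, sd = 1 / sqrt(tau))
  cat(sprintf("n = %3d: MLE = %.3f, post. mean = %.3f, post. sd = %.3f\n",
              n, mean(y), post.mean(y), 1 / sqrt(tau0 + n * tau)))
}
```

With larger samples the Bayesian and classical point estimates become practically indistinguishable.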
Introducing WinBUGS.
. . .
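For reference, a minimal WinBUGS model for the normal problem with unknown mean and unknown precision might look like the sketch below (the node names and the vague prior constants are illustrative assumptions, not the lecture's code):

```
model {
  for (i in 1:n) {
    y[i] ~ dnorm(mu, tau)   # in BUGS, dnorm is parametrised by precision
  }
  mu  ~ dnorm(0, 0.0001)    # vague normal prior on the mean
  tau ~ dgamma(0.01, 0.01)  # vague gamma prior on the precision
}
```

Note that BUGS uses the precision parametrisation throughout, which is exactly the notation point made earlier.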
Summary of numerical methods:
- Use simulations (i.e., your own toy datasets) to learn new methods
- Don't forget to check for convergence
Linear Regression Model. Classical Set-Up:
yi = a + bxi + εi
where
εi ∼ N(0, σ²)
Note: σ² is the variance of the above normal distribution.
A note on interpretation:
yi = a + bxi + εi
The intercept a is the expected value of Y when x = 0.
Each one-unit increase in X is associated with an average change of
b units in Y. (Correlation does not imply causation!)
Sample-specific centering:
yi = a + b(xi − x̄) + εi
The intercept a is the expected value of Y when x = x̄.
Logging the response:
log(yi) = a + bxi + εi
Each one-unit increase in X is associated with an average change of
(e^b − 1) × 100% in Y.
The linear regression model is linear in the parameters, not in the
variables!
Linear Regression Model. Another Way:
yi ∼ N(µi, σ²)
where
µi = a + bxi
Note: σ² is the variance of the above normal distribution.
Linear Regression Model. Bayesian Way.
Likelihood:
yi | µi, τ ∼ N(µi, τ) (1)
µi = a + bxi (2)
Priors:
a ∼ N(µa, τa) (3)
b ∼ N(µb, τb) (4)
τ ∼ Gamma(aτ, bτ) (5)
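Equations (1)–(5) map almost line-for-line onto a BUGS-style model; the sketch below assumes the prior means and precisions (mu.a, tau.a, etc.) are supplied as data, and all names are illustrative:

```
model {
  for (i in 1:n) {
    y[i]  ~ dnorm(mu[i], tau)   # likelihood, eq. (1)
    mu[i] <- a + b * x[i]       # linear predictor, eq. (2)
  }
  a   ~ dnorm(mu.a, tau.a)      # prior, eq. (3)
  b   ~ dnorm(mu.b, tau.b)      # prior, eq. (4)
  tau ~ dgamma(a.tau, b.tau)    # prior, eq. (5)
}
```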
Example: Howell’s Data
[Figure: scatterplot of weight (approx. 30–60) against height (approx. 140–180) for the Howell data.]
Model:
Weightᵢ = a + b Heightᵢ + εᵢ.
In Bayesian formulation:
Weightᵢ | µᵢ, τ ∼ N(µᵢ, τ),
where
µᵢ = a + b Heightᵢ.
The classical way-1
m1 <- lm(weight ~ height, data=d.adult)
summary(m1)
##
## Call:
## lm(formula = weight ~ height, data = d.adult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.8022 -3.0183 -0.2293 2.8117 14.7348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.31618 4.52650 -11.56 <2e-16 ***
## height 0.62942 0.02924 21.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.242 on 350 degrees of freedom
## Multiple R-squared: 0.5696, Adjusted R-squared: 0.5684
## F-statistic: 463.3 on 1 and 350 DF, p-value: < 2.2e-16
Classical assumption checking
[Figure: plot(m1) diagnostics — Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage (with Cook's distance). Flagged observations include 479, 6, 174, and 117.]
The classical way-2
m2 <- lm(weight ~ I(height-mean(height)), data=d.adult)
summary(m2)
##
## Call:
## lm(formula = weight ~ I(height - mean(height)), data = d.adult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.8022 -3.0183 -0.2293 2.8117 14.7348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.99049 0.22609 199.00 <2e-16 ***
## I(height - mean(height)) 0.62942 0.02924 21.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.242 on 350 degrees of freedom
## Multiple R-squared: 0.5696, Adjusted R-squared: 0.5684
## F-statistic: 463.3 on 1 and 350 DF, p-value: < 2.2e-16
The classical way-3
m3 <- lm(log(weight) ~ height, data=d.adult)
summary(m3)
##
## Call:
## lm(formula = log(weight) ~ height, data = d.adult)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30191 -0.06556 0.00108 0.06525 0.33286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6174274 0.1022805 15.81 <2e-16 ***
## height 0.0140923 0.0006608 21.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09585 on 350 degrees of freedom
## Multiple R-squared: 0.5651, Adjusted R-squared: 0.5639
## F-statistic: 454.8 on 1 and 350 DF, p-value: < 2.2e-16
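Using the fitted slope from the output above, the log-scale coefficient translates into a percentage effect on the original scale:

```r
b.hat <- 0.0140923        # slope for height from summary(m3)
(exp(b.hat) - 1) * 100    # ~1.42% more weight, on average, per extra unit of height
```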
[Figure: plot(m3) diagnostics for the log-weight model — Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage (with Cook's distance). Flagged observations include 479, 174, 286, 94, and 211.]
The Bayesian way
library(MCMCglmm)
m1b <- MCMCglmm(weight ~ height, data=d.adult, verbose=F)
summary(m1b)
##
## Iterations = 3001:12991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: 2020.144
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 18.12 15.47 21.04 899.6
##
## Location effects: weight ~ height
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) -52.4646 -60.7339 -43.8509 1000 <0.001 ***
## height 0.6305 0.5742 0.6844 1000 <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Bayesian Diagnostics: Model Fit
dtf <- data.frame(height=seq(130,185,.1))
# a function to produce a posterior predictive sample
pred.reg <- function(x, a, b, sigma){
  a + b*x + rnorm(length(a), 0, sd=sigma)
}
m1b.pred <- sapply(dtf$height, pred.reg,
                   a=m1b$Sol[,1], b=m1b$Sol[,2], sigma=sqrt(m1b$VCV))
dtf$pp.mean <- apply(m1b.pred, 2, mean)
dtf$pp.q025 <- apply(m1b.pred, 2, quantile, .025)
dtf$pp.q975 <- apply(m1b.pred, 2, quantile, .975)
Plotting
[Figure: weight–height scatterplot with the posterior predictive mean and 95% interval overlaid.]
Bayesian diagnostics: convergence of a
plot(m1b$Sol[,1],ylab='a')
[Figure: trace plot and posterior density of a (N = 1000 samples, bandwidth = 1.177).]
Bayesian diagnostics: convergence of b
plot(m1b$Sol[,2],ylab='b')
[Figure: trace plot and posterior density of b (N = 1000 samples, bandwidth = 0.007604).]