MH example 2: Poisson regression for birth rates
September 2021
The data.
This example is about regional variation in birth rates in the 15-19 age group across the regions of New Zealand. We use a Bayesian Poisson regression to investigate whether birth rates vary between major urban and other areas, and between the North and South Islands.
General set-up
library(MASS)
library(mvtnorm)
library(LearnBayes) #laplace function
library(arm)
## Loading required package: Matrix
## Loading required package: lme4
##
## arm (Version 1.11-1, built: 2020-4-27)
#setwd("~/Patrick/Stat314-2017") #should not be needed since the data are
#created in the code below
##births by region example
#set-up the data
births <- c(64,365,149,113,30,65,38,95,117,6,12,9,9,137,36,36)
popsize <- c(5450,51620,14960,9360,1770,5640,3880,9020,17040,
1420,1600,1290,980,18670,8290,3170)
regions <- c("Northland","Auckland","Waikato","Bay of Plenty",
"Gisborne","Hawke's Bay","Taranaki",
"Manawatu-Wanganui","Wellington","Tasman","Nelson",
"Marlborough","West Coast","Canterbury","Otago",
"Southland")
majurb <- c(0,1,0,0,0,0,0,0,1,0,0,0,0,1,1,0)
North <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0)
South <- 1- North
###place into a data frame for easier viewing and writing out if necessary
births_reg.df <- data.frame(regions,births,popsize,majurb,South)
###Have a look at the data
births_reg.df
## regions births popsize majurb South
## 1 Northland 64 5450 0 0
## 2 Auckland 365 51620 1 0
## 3 Waikato 149 14960 0 0
## 4 Bay of Plenty 113 9360 0 0
## 5 Gisborne 30 1770 0 0
## 6 Hawke's Bay 65 5640 0 0
## 7 Taranaki 38 3880 0 0
## 8 Manawatu-Wanganui 95 9020 0 0
## 9 Wellington 117 17040 1 0
## 10 Tasman 6 1420 0 1
## 11 Nelson 12 1600 0 1
## 12 Marlborough 9 1290 0 1
## 13 West Coast 9 980 0 1
## 14 Canterbury 137 18670 1 1
## 15 Otago 36 8290 1 1
## 16 Southland 36 3170 0 1
rate <- births / popsize
rate
## [1] 0.011743119 0.007070903 0.009959893 0.012072650 0.016949153 0.011524823
## [7] 0.009793814 0.010532151 0.006866197 0.004225352 0.007500000 0.006976744
## [13] 0.009183673 0.007337975 0.004342581 0.011356467
There is not much to work with in these data - only two covariates. However, for illustrative purposes we will work through the steps of using a Metropolis-Hastings algorithm to compute the posterior for a Bayesian Poisson regression of the birth rate on the major-urban and North/South Island indicators.
Like logistic regression, Poisson regression is an example of a generalised linear model (GLM).
Model specification
The model for the data is \[
\begin{aligned}
\left [ Y_{i}|N_{i},\theta_{i} \right ] & \stackrel{\mathrm{indep}}{\sim} \mathrm{Poisson}(\theta_{i}N_{i}), \, i=1,\ldots, 16 \\
\mathrm{log}(\theta_{i}) & = \beta_{0}+ \beta_{1}\mathrm{majurb}_{i} + \beta_{2}\mathrm{North}_{i}, \,\, i=1,\ldots, 16.
\end{aligned}
\] Here \(Y\) denotes births and \(N\) denotes female population size (popsize). Under the model \(\theta_{i}=E(Y_{i}/N_{i})\) is the expected rate, often referred to as the underlying rate, in contrast to the directly observed rate \(Y_{i}/N_{i}\) (births divided by female population). To complete the model we need to specify a prior for the regression parameters, \(p(\beta_{0},\beta_{1},\beta_{2}).\) We assume the parameters are a priori independent, so \(p(\beta_{0},\beta_{1},\beta_{2})=p(\beta_{0})p(\beta_{1})p(\beta_{2}),\) and adopt a normal prior for each parameter. This is equivalent to a multivariate normal prior for the vector \((\beta_{0},\beta_{1},\beta_{2}),\) with mean vector equal to the concatenation of the prior means for each parameter and a diagonal variance matrix whose \(i^{th}\) diagonal element equals the prior variance for the \(i^{th}\) parameter.
The parameters are best interpreted once exponentiated: \(\exp(\beta_{0})\) is the expected birth rate in a non-major-urban region in the South Island; \(\exp(\beta_{1})\) is the ratio of the expected rate in major urban regions to that in non-major-urban regions, holding Island constant; and \(\exp(\beta_{2})\) is the ratio of the expected rate in a North Island region to that in a South Island region, holding major urban status constant. For example, the expected rate in a major urban North Island region is \(\exp(\beta_{0}+\beta_{1}+\beta_{2})=\exp(\beta_{0})\exp(\beta_{1})\exp(\beta_{2}).\)
Since all the regression parameters are interpretable once exponentiated, we can use the makepriorbreg() function that we defined for specifying priors for logistic regression coefficients for all parameters of our Poisson GLM. This is in contrast to logistic regression, where it was the inverse-logit of the intercept parameter that was interpretable, requiring a different function, makepriorb0(), for setting the intercept prior.
For convenience, we re-define the makepriorbreg() function here, although we could read in the previous version using the function source().
makepriorbreg <- function(low,high,credibility) {
#returns the prior mean and standard deviation for a normal prior on the
##log scale, given an interval (low, high) for the exponentiated parameter
##that is believed to hold with the stated credibility
log_low <- log(low)
log_high <- log(high)
priormean <- 0.5*(log_low + log_high)
normal_quantile <- 1 - (1-credibility)/2
priorsd <- (log_high - priormean)/qnorm(normal_quantile)
outlist <- c(priormean,priorsd)
return(outlist)
}
Now we use this function to specify the priors:
priorb0 <- makepriorbreg(low=0.001,high=0.1,credibility=0.95)
prior_majurb <- makepriorbreg(low=0.5,high=3,credibility=0.9)
prior_north <- makepriorbreg(low=0.1,high=10,credibility=0.99)
prior.matrix <- rbind(priorb0,prior_majurb,prior_north)
prior.matrix
## [,1] [,2]
## priorb0 -4.605170e+00 1.1748099
## prior_majurb 2.027326e-01 0.5446562
## prior_north 2.220446e-16 0.8939199
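As a quick sanity check (not part of the original analysis), each prior should place the stated credibility on the interval it was constructed from. For the intercept prior:
pnorm(log(0.1),mean=priorb0[1],sd=priorb0[2]) -
  pnorm(log(0.001),mean=priorb0[1],sd=priorb0[2])
#this difference should equal the stated credibility, 0.95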
Combine the individual means and variances into a mean vector and diagonal variance matrix that will define the multivariate normal prior.
prior_mean <- prior.matrix[,1]
prior_sd <- prior.matrix[,2]
prior_variance <- diag(prior_sd^2) ##setting up the prior variance matrix
#check
prior_mean
## priorb0 prior_majurb prior_north
## -4.605170e+00 2.027326e-01 2.220446e-16
prior_variance
## [,1] [,2] [,3]
## [1,] 1.380178 0.0000000 0.0000000
## [2,] 0.000000 0.2966504 0.0000000
## [3,] 0.000000 0.0000000 0.7990928
Preliminary analysis
To start the ball rolling we will fit a conventional GLM, ignoring the prior. This is useful for setting up a model object which can be used in the log-likelihood and log-posterior functions.
birthsmodel1 <- glm(births ~ majurb + North,family=poisson(link="log"),
offset=log(popsize) )
display(birthsmodel1)
## glm(formula = births ~ majurb + North, family = poisson(link = "log"),
## offset = log(popsize))
## coef.est coef.se
## (Intercept) -4.67 0.07
## majurb -0.43 0.06
## North 0.15 0.07
## ---
## n = 16, k = 3
## residual deviance = 25.6, null deviance = 92.8 (difference = 67.2)
beta_mle <- coef(birthsmodel1)
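Since the parameters are interpretable once exponentiated, a quick look at the exponentiated maximum likelihood estimates gives a preview of the Bayesian results (an illustrative aside):
exp(beta_mle) #baseline rate, major-urban rate ratio, North Island rate ratio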
Set up the log-likelihood and log-posterior functions
loglike_poissonreg <- function(beta,model){
##This function is vectorised, so it can cope with multiple settings of beta
if (is.vector(beta)) {beta <- matrix(beta,nrow=1,ncol=length(beta))}
##beta is assumed to be an nsim by p matrix, where nsim is the number of
##settings of beta to be evaluated and p is the dimension of the
##parameter vector in the model
##model is a model object which contains the model formula, the data,
##the model matrix and so on (e.g. the output from a call to glm())
X <- model.matrix(model) ##extracts the design matrix
Xb <- X %*% t(beta)
Y <- model$y #response vector
#turn Y and the offset log(N) into matrices conformable with Xb
Ymatrix <- matrix(Y,nrow=length(Y),ncol=nrow(beta),byrow=FALSE)
logN <- model$offset
logNmatrix <- matrix(logN,nrow=length(logN),ncol=nrow(beta),byrow=FALSE)
expect <- exp(Xb + logNmatrix) ##Poisson means theta_i * N_i
logl <- Ymatrix*log(expect) - expect - lfactorial(Ymatrix)
##logl is a matrix with one row per observation and one column per setting
##of beta, so each overall log-likelihood is the sum down a column
return(colSums(logl))
}
###evaluate the log-likelihood at a few values
loglike_poissonreg(beta=beta_mle,model=birthsmodel1)
## [1] -58.08406
loglike_poissonreg(beta=rep(0,length(beta_mle)),model=birthsmodel1)
## [1] -146834.3
loglike_poissonreg(beta= beta_mle+0.01,model=birthsmodel1)
## [1] -58.4539
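As an independent check on loglike_poissonreg(), the value at the MLE can be reproduced with dpois(), because the fitted values of the glm are exactly exp(Xb + log(popsize)):
#should reproduce the log-likelihood at the MLE shown above
sum(dpois(births,lambda=fitted(birthsmodel1),log=TRUE))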
###set-up the log-posterior function
logpost_poissonreg <- function(beta,priormean,priorvariance,model) {
logpost <- loglike_poissonreg(beta,model) + dmvnorm(beta,mean=priormean,
sigma=priorvariance,log=TRUE)
return(logpost)
}
##test this out
logpost_poissonreg(beta_mle,
priormean=prior_mean,
priorvariance=prior_variance,
model=birthsmodel1)
## [1] -60.96331
Build an approximation to the posterior.
We will use a multivariate normal approximation based on the posterior mode, with an approximate posterior variance based on the curvature of the unnormalised log-posterior at the mode (the negative inverse of the Hessian of second derivatives). We use the laplace() function from the LearnBayes package to find the posterior mode and the approximate variance.
logpost1 <- laplace(logpost_poissonreg,mode=beta_mle,
priormean=prior_mean,
priorvariance=prior_variance,
model=birthsmodel1)
str(logpost1)
## List of 4
## $ mode : Named num [1:3] -4.671 -0.419 0.15
## ..- attr(*, "names")= chr [1:3] "(Intercept)" "majurb" "North"
## $ var : num [1:3, 1:3] 0.00548 -0.00214 -0.00447 -0.00214 0.00317 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "(Intercept)" "majurb" "North"
## .. ..$ : chr [1:3] "(Intercept)" "majurb" "North"
## $ int : num -67.3
## $ converge: logi TRUE
mean_approx <- logpost1$mode
var_approx <- logpost1$var
The multivariate normal approximation to the posterior is defined by mean_approx and var_approx.
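The laplace() function is built on optim(); a minimal sketch of the same computation done by hand, assuming a BFGS maximisation of the unnormalised log-posterior:
#maximise the log-posterior (fnscale=-1 makes optim a maximiser) and take
#the negative inverse Hessian at the mode as the approximate variance
fit <- optim(beta_mle,logpost_poissonreg,
             priormean=prior_mean,priorvariance=prior_variance,
             model=birthsmodel1,
             method="BFGS",hessian=TRUE,control=list(fnscale=-1))
fit$par #should agree with logpost1$mode
solve(-fit$hessian) #should agree with logpost1$var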
Setting up the Metropolis-Hastings algorithm.
We need to define the variance for the starting distribution, which is an over-dispersed (i.e. larger-variance) version of our approximation to the posterior.
var_start <- 1.5*var_approx
We also need to define the jumping distribution. We will use a normal jumping density, centred on the current value, with variance computed according to the usual Gelman et al. recommendation of \(2.4^{2}/d\) times the approximate posterior variance, where \(d\) is the number of parameters. Since we are using a normal jumping density (which is symmetric), this is really a Metropolis algorithm.
var_jump <- ((2.4^2)/length(mean_approx)) * var_approx
##specify number of chains, length of chains and burn-in period
nchains <- 5
nsim <- 2000
burnin <- 1000
##structures for storing output
store_beta <- array(dim=c(length(mean_approx),nchains,nsim))
store_accept <- matrix(nrow=nsim,ncol=nchains)
Run the Metropolis algorithm
for (j in 1:nchains){ ##loop over chains
##generate a starting value
old_beta <- rmvnorm(n=1,mean=mean_approx,sigma=var_start)
for (i in 1:nsim) { ##loop over simulations
###generate proposal
prop_beta <- rmvnorm(n=1,mean=old_beta,sigma=var_jump)
##compare log unnormalised posterior at proposed and old value
logpost_prop <- logpost_poissonreg(prop_beta,
priormean=prior_mean,
priorvariance=prior_variance,
model=birthsmodel1)
logpost_old <- logpost_poissonreg(old_beta,
priormean=prior_mean,
priorvariance=prior_variance,
model=birthsmodel1)
logrMH <- logpost_prop - logpost_old
###decide on acceptance or rejection
logU <- log(runif(1))
if (logU <= logrMH) {
old_beta <- prop_beta
store_accept[i,j] <- 1
} else {
store_accept[i,j] <- 0
}
#store current value of beta
store_beta[,j,i] <- old_beta
} #end loop over simulations
} #end loop over chains
Check acceptance rates:
colMeans(store_accept)
## [1] 0.3360 0.3140 0.3045 0.3215 0.3235
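These rates sit comfortably in the range of roughly 0.2 to 0.4 usually considered efficient for random-walk Metropolis, which is what the Gelman et al. scaling of the jumping variance is designed to achieve.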
Check convergence
npos <- nsim - burnin ##size of posterior sample in each chain
n1 <- round(npos/2) # size of chunk1
#set up vectors for storing
rhat <- vector(mode="numeric",length=length(mean_approx))
neff <- vector(mode="numeric",length=length(mean_approx))
#for storing the statistics for each parameter.
for (k in 1:length(mean_approx)) { #looping over parameters
#subset out post burn-in sample for kth parameter
#set this up as npos by nchains matrix
betak <- t(store_beta[k,1:nchains,((burnin+1):nsim)] )
##chunk each chain into two pieces and reassemble the matrix -
## for convergence checking
chunk1 <- betak[1:n1,]
chunk2 <- betak[(n1+1):npos,]
betak_chunked <- cbind(chunk1,chunk2)
#str(betak_chunked)
chainmeans <- colMeans(betak_chunked)
withinsd <- apply(betak_chunked,MARGIN=2,FUN="sd")
betweensd <- sd(chainmeans)
B = (npos/2)*betweensd^2
W = mean(withinsd^2)
varplus <- ((npos-1)/npos) * W + (1/npos)*B
rhat[k] <- sqrt(varplus/W)
#set-up betak_chunked as MCMC object
betak.mcmc <- coda::mcmc(betak_chunked)
#str(betak.mcmc)
neff[k] <- sum(coda::effectiveSize(betak.mcmc))
}
rhat
## [1] 1.004361 1.003792 1.007180
neff
## [1] 565.2899 510.2379 520.3874
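The hand-coded statistics can be cross-checked against the coda package. A sketch (coda does not split each chain in half, so its numbers will differ slightly from those above):
#assemble the post burn-in samples into an mcmc.list, one element per chain
post.mcmc <- coda::mcmc.list(lapply(1:nchains, function(j)
  coda::mcmc(t(store_beta[,j,(burnin+1):nsim]))))
coda::gelman.diag(post.mcmc) #potential scale reduction factors
coda::effectiveSize(post.mcmc) #effective sample sizes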
We will now look at the traceplots. We could just plot the post burn-in samples, but will examine the full iteration history.
x <- seq(from=1,to=nsim,by=1)
#plots for intercept
plot(x,store_beta[1,1,],type="l",main= "b0") #parameter 1, chain1 all simulations
lines(x,store_beta[1,2,],col="red")
lines(x,store_beta[1,3,],col="blue")
lines(x,store_beta[1,4,],col="green")
lines(x,store_beta[1,5,],col="orange")
#plots for beta1 - effects of major urban regions
plot(x,store_beta[2,1,],type="l",main= "b1")
lines(x,store_beta[2,2,],col="red")
lines(x,store_beta[2,3,],col="blue")
lines(x,store_beta[2,4,],col="green")
lines(x,store_beta[2,5,],col="orange")
#plots for beta2 - effects of North Island
plot(x,store_beta[3,1,],type="l",main= "b2")
lines(x,store_beta[3,2,],col="red")
lines(x,store_beta[3,3,],col="blue")
lines(x,store_beta[3,4,],col="green")
lines(x,store_beta[3,5,],col="orange")
Based on the Rhat statistics and the traceplots, the chains certainly seem to have converged. However, the effective Monte Carlo sample sizes are not spectacular; longer chains may be required to get highly accurate tail probabilities.
Posterior inference
Since the chains have converged, we move on to inference.
#Subset out the post burn-in sample.
postsample <- store_beta[,,(burnin+1):nsim]
###since all parameters are interpretable once exponentiated
## we exponentiate the entire postsample
exppost <- exp(postsample)
quantile(exppost[1,,],probs=c(0.025,0.5,0.975))
## 2.5% 50% 97.5%
## 0.008119144 0.009329523 0.010716539
quantile(exppost[2,,],probs=c(0.025,0.5,0.975))
## 2.5% 50% 97.5%
## 0.5891598 0.6554885 0.7301577
quantile(exppost[3,,],probs=c(0.025,0.5,0.975))
## 2.5% 50% 97.5%
## 1.013194 1.170524 1.336541
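Posterior probabilities of substantively interesting events are simply sample proportions over the posterior draws; for example (illustrative):
#posterior probability that the North Island rate exceeds the South Island rate
mean(exppost[3,,] > 1)
#posterior probability of at least a 25% lower rate in major urban areas
mean(exppost[2,,] < 0.75)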
##density plots
plot(density(exppost[1,,],adjust=1.5),main="rate in non-urban South Island")
plot(density(exppost[2,,],adjust=1.5),main="rate ratio for major urban v other")
plot(density(exppost[3,,],adjust=1.5),main="rate ratio for North v South Island")
Overall it looks like there is good evidence that birth rates are markedly lower in major urban areas. There is also evidence that birth rates are higher in the North Island.