Stat314/461 Term 4: Logistic Regression from a Bayesian viewpoint
September, 2021
Logistic regression
Regression modelling for a binary response variable
\[ Y_i \sim \mathrm{Bernoulli}(\theta_i), \quad i = 1, \ldots, N \]
\[ \log\left(\frac{\theta_i}{1-\theta_i}\right) = \beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_3 + \cdots \]
or equivalently
\[ \theta_i = \frac{\exp(\beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_3 + \cdots)}{1 + \exp(\beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_3 + \cdots)} \]
This model ensures $0 < \theta_i < 1$ for all $i$, or, equivalently, for all
combinations of the $X$'s.
We are modelling the relationship of the success probabilities to the
covariates $X_1, X_2, \ldots$, i.e. "regression."
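As a quick illustration (a sketch, not from the slides), R's built-in plogis() is exactly this inverse-logit map, so any value of the linear predictor lands strictly inside (0, 1):

    ## Hypothetical coefficients and design matrix, for illustration only.
    beta <- c(-1.0, 0.5, 0.2)                  # (beta0, beta1, beta2)
    X <- cbind(1, c(10, 20, 30), c(0, 1, 1))   # columns: intercept, X1, X2
    eta <- X %*% beta                          # beta0 + X1*beta1 + X2*beta2
    theta <- plogis(eta)                       # exp(eta) / (1 + exp(eta))
    range(theta)                               # strictly inside (0, 1)
    qlogis(theta) - eta                        # logit recovers eta (all zeros)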
Interpretation of logistic regression coefficients (i)
Let
\[ \theta = \frac{\exp(\beta_0 + X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + \cdots)}{1 + \exp(\beta_0 + X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + \cdots)} \]
so that
\[ \log\left(\frac{\theta}{1-\theta}\right) = \beta_0 + X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + \cdots \tag{1} \]
The effect on $\mathrm{logit}(\theta)$ of changing from, say, $X_1$ to $X_1 + 1$ is
\begin{align*}
 & \beta_0 + (X_1 + 1)\beta_1 + X_2\beta_2 + X_3\beta_3 + \cdots \tag{2} \\
 {}-{} & [\beta_0 + X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + \cdots] \tag{3} \\
 {}={} & \beta_1 \tag{4}
\end{align*}
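A one-line numerical check of (2)–(4) in R, with hypothetical coefficient values (not from the slides):

    ## The logit difference for a one-unit change in X1, holding the other
    ## covariates fixed, is beta1 exactly.
    beta0 <- -1.0; beta1 <- 0.5; beta2 <- 0.2            # assumed values
    logit_theta <- function(x1, x2) beta0 + x1 * beta1 + x2 * beta2
    logit_theta(3 + 1, x2 = 1) - logit_theta(3, x2 = 1)  # 0.5 = beta1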
Interpretation of logistic regression coefficients (ii)
\[ \log\left(\frac{\theta}{1-\theta} \,\Big|\, X_1 + 1\right) - \log\left(\frac{\theta}{1-\theta} \,\Big|\, X_1\right) = \beta_1 \tag{5} \]
Hence
\[ \exp\left[\log\left(\frac{\theta}{1-\theta} \,\Big|\, X_1 + 1\right) - \log\left(\frac{\theta}{1-\theta} \,\Big|\, X_1\right)\right] = \exp(\beta_1) \tag{6} \]
\[ \frac{\left[\theta/(1-\theta)\right] \,|\, (X_1+1)}{\left[\theta/(1-\theta)\right] \,|\, X_1} = \exp(\beta_1) \tag{7} \]
Exponentiated logistic regression coefficients are interpretable as odds
ratios (except for the intercept).
$\mathrm{invlogit}(\beta_0)$ is the probability of the event when $X_1 = X_2 = \cdots = 0$.
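In R this interpretation is one line away from any fitted model object. A sketch, assuming logitmodel is a glm(..., family = binomial) fit such as the one shown later:

    exp(coef(logitmodel))         # exponentiated slopes = odds ratios
    plogis(coef(logitmodel)[1])   # invlogit(beta0) = Pr(event | all X = 0)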
Example
Pr(income > $1250/week | age, sex, quals)
– from the income survey data
\[ p(\beta \,|\, Y) \propto p(\beta) \prod_i \theta_i^{Y_i} (1-\theta_i)^{(1-Y_i)} \]
\[ \theta_i = \frac{\exp(\beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_3 + \cdots)}{1 + \exp(\beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_3 + \cdots)} \]
and $X_1$ = age, $X_2$ = sex, $X_3, X_4, X_5, X_6$ = indicators for qualification
categories.
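As an aside (a sketch with hypothetical variable names, not the actual survey data), R's model.matrix() is what expands a qualification factor into indicator columns like $X_3, \ldots, X_6$:

    ## Toy stand-in for the income survey data frame.
    d <- data.frame(
      age   = c(35, 22, 48, 30),
      sex   = factor(c("F", "M", "F", "M")),
      quals = factor(c("none", "school", "cert", "degree"))
    )
    ## Columns: intercept, age, one sex indicator, and one indicator per
    ## non-reference qualification category.
    model.matrix(~ age + sex + quals, data = d)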
Directed Acyclic Graph for Income model
[Figure: directed acyclic graph for the income model, with nodes X, Quals, Occupation, and Income]
Likelihood for the logistic regression example (i)
X fixed; Y random (or analysing conditional on X)
Really just a sequence of Bernoulli trials, but with different success
probabilities on each trial.
Don’t panic!
Let $X_i = (1, X_{1i}, X_{2i}, \ldots)'$ and $\beta = (\beta_0, \beta_1, \beta_2, \ldots)'$. Then
\[ X_i'\beta = \sum_{j=0}^{K} X_{ji}\beta_j \quad \text{where } X_{0i} = 1 \]
and
\[ \theta_i = \mathrm{expit}(X_i'\beta) = \mathrm{invlogit}(X_i'\beta) = \frac{\exp(X_i'\beta)}{1 + \exp(X_i'\beta)}. \]
Likelihood for the logistic regression example (ii)
If
\[ \log\left(\frac{\theta_i}{1-\theta_i}\right) = X_i'\beta \]
then
\[ \theta_i = \mathrm{invlogit}(X_i'\beta) \;\Rightarrow\; (1-\theta_i) = (1 + \exp(X_i'\beta))^{-1} \]
and
\[ p(Y \,|\, X, \beta) = \prod_i \theta_i^{Y_i} (1-\theta_i)^{(1-Y_i)} = \prod_i \left(\frac{\theta_i}{1-\theta_i}\right)^{Y_i} (1-\theta_i) \]
\begin{align*}
\log p(Y \,|\, X, \beta) &= \sum_i Y_i \log\left(\frac{\theta_i}{1-\theta_i}\right) + \log\left((1 + \exp(X_i'\beta))^{-1}\right) \\
&= \sum_i Y_i \times X_i'\beta - \log(1 + \exp(X_i'\beta))
\end{align*}
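The last line maps directly onto code. A minimal R sketch of the log-likelihood as a function of $\beta$ (an assumed implementation, not the course's own):

    ## log p(Y | X, beta) = sum_i [ Y_i * X_i'beta - log(1 + exp(X_i'beta)) ]
    ## X: N x (K+1) design matrix with first column all 1s; Y: 0/1 vector.
    loglik_logistic <- function(beta, Y, X) {
      eta <- as.vector(X %*% beta)      # linear predictors X_i'beta
      sum(Y * eta - log1p(exp(eta)))    # log1p(exp(eta)) = log(1 + exp(eta))
    }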
Prior for logistic regression coefficients (i)
We have a vector of parameters $\beta = (\beta_0, \beta_1, \beta_2, \ldots)'$. Even if we
model these parameters as a priori independent, they will be correlated
in the posterior. To get the idea, try
    logitmodel <- glm(highincome ~ hoursfactor + sexfactor + qualfactor,
                      family = binomial(link = "logit"))
    vcov(logitmodel)

which returns the variance matrix for the MLE.

Prior for logistic regression coefficients (ii)

It is convenient to treat $\beta = (\beta_0, \beta_1, \beta_2, \ldots)'$ as a single entity in
computations. If we do regard the components of $\beta$ as a priori independent
and adopt a normal prior, we are saying
\[ p(\beta) = \prod_{j=0}^{K} \mathrm{normal}(\beta_j \,|\, \mu_j, \sigma_j^2) \tag{8} \]
This is equivalent to saying $\beta \sim \mathrm{MVN}(\mu, \Omega)$, where $\mu = (\mu_0, \mu_1, \ldots, \mu_K)'$ and
\[ \Omega = \begin{pmatrix} \sigma_0^2 & 0 & \cdots & 0 \\ 0 & \sigma_1^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_K^2 \end{pmatrix} \]

Prior for logistic coefficients (iii)

In general the multivariate normal (MVN) distribution is characterised by a
mean vector and a variance matrix. Off-diagonal elements in the variance
matrix are covariances. Non-zero covariances indicate correlation between the
corresponding elements of the vector being modelled:
\[ \begin{pmatrix} \sigma_0^2 & \sigma_{01} & \sigma_{02} & \cdots & \sigma_{0K} \\ \sigma_{01} & \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1K} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{0K} & \sigma_{1K} & \sigma_{2K} & \cdots & \sigma_K^2 \end{pmatrix}, \qquad \rho_{12} = \frac{\sigma_{12}}{\sigma_1\sigma_2}, \text{ etc.} \]

Prior for logistic coefficients (iv)

If we assume a priori independence we can construct the joint prior for $\beta$
by considering each component separately.

$\pi_0 = \mathrm{invlogit}(\beta_0)$ is the probability of the event if the $X$'s are all zero. We could:
1. set, say, 95% prior limits on $\pi_0$
2. transform to the logit scale
3. construct a normal that matches the transformed limits:
   $\mu_0 = 0.5(\mathrm{logit}(\pi_{0,\mathrm{low}}) + \mathrm{logit}(\pi_{0,\mathrm{high}}))$, with variance such that
   the transformed limits are the 2.5% and 97.5% quantiles of a normal with mean $\mu_0$.

$\gamma_j = \exp(\beta_j)$ is the odds ratio for the effect of $X_j$, $j = 1, \ldots, K$. We could:
1. set 95% prior limits on $\gamma_j$ (remember 1 is the "null" value)
2. transform to the log scale
3. construct a normal that matches the transformed limits (2.5% and 97.5%
   quantiles of a normal).

Importance sampling for logistic regression

If we can:
1. write a function to compute the log-likelihood for any value of $\beta$
2. write a function to compute the log prior density for any value of $\beta$

then we can compute the log unnormalised posterior density at any setting of
the parameters.

We can obtain a multivariate normal approximation to the posterior by
maximising the log unnormalised posterior density with respect to $\beta$ to
find the posterior mode and the curvature of the log-posterior at the mode.
The approximating MVN is centred at the posterior mode and has variance based
on the inverse of the curvature. Standard optimisation functions can find
these quantities. The laplace function from the LearnBayes package is useful;
bayesglm from the arm package can also be used. A minimal sketch of this
recipe is given at the end of this handout.

Example: logistic regression for analysis of the effect of sex and tertiary
education on the probability of high income

See the logistic regression importance sampling R code for an application
based on the income survey data: logisticregression_importance2021.r
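To make the recipe concrete, here is a minimal end-to-end sketch (an assumed implementation, not a reproduction of logisticregression_importance2021.r); it assumes the response Y, design matrix X, prior means mu0, and prior standard deviations sd0 already exist, and that the mvtnorm package is available:

    library(mvtnorm)   # rmvnorm / dmvnorm for the MVN proposal

    ## Log unnormalised posterior = log-likelihood + log prior.
    log_post <- function(beta, Y, X, mu0, sd0) {
      eta <- as.vector(X %*% beta)
      sum(Y * eta - log1p(exp(eta))) +           # Bernoulli log-likelihood
        sum(dnorm(beta, mu0, sd0, log = TRUE))   # independent normal priors
    }

    ## Posterior mode and curvature via a standard optimiser.
    fit <- optim(rep(0, ncol(X)), log_post, Y = Y, X = X, mu0 = mu0, sd0 = sd0,
                 method = "BFGS", control = list(fnscale = -1), hessian = TRUE)
    m <- fit$par              # posterior mode
    V <- solve(-fit$hessian)  # inverse of the curvature at the mode

    ## Importance sampling with the MVN approximation as the proposal.
    S <- 10000
    draws <- rmvnorm(S, mean = m, sigma = V)
    logw <- apply(draws, 1, log_post, Y = Y, X = X, mu0 = mu0, sd0 = sd0) -
      dmvnorm(draws, mean = m, sigma = V, log = TRUE)
    w <- exp(logw - max(logw))   # subtract max for numerical stability
    w <- w / sum(w)              # normalised importance weights
    colSums(w * draws)           # posterior means of beta

The same mode-plus-curvature step is what laplace() in LearnBayes automates; the hand-rolled version above just makes each ingredient visible.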