STAT314/461. Beta-Binomial Model.
STAT314/461-2021S1
Binomial Distribution.
The binomial distribution is used to model the number of "successes" among identical, independent observations with binary outcomes. For example:
- how many patients (y) out of n will improve (or not) after a certain surgical procedure;
- how many students (y) out of n will pass (or fail) the exam;
- how many people (y) out of n have eaten vegetables (or not) at least once during the past week;
- how many seeds (y) out of n have germinated (or not) after 24 hours;
- how many questions (y) you will answer correctly (rather than incorrectly) on an n = 20 question exam.
Binomial Distribution
In this case, we often assume that the response variable y has a binomial distribution with the total number of binomial trials n and the probability of success (where "success" is arbitrarily defined as one of the two possible outcomes) p:
y ∼ Bin(n, p).
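For reference (this formula is not stated on the slide, but it is used in the derivations below), the binomial probability mass function is

\[
\Pr(y = k \mid n, p) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n.
\]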
Notation 0 vs. 1
Note that if we model the result of each individual trial (a single patient, a single student, a single seed, etc.) as xi ∈ {0, 1} so that
Pr(xi = 1) = 1 − Pr(xi = 0) = p,
then we arrive at the Bernoulli distribution.
Bernoulli Distribution
Let xi have a Bernoulli distribution with success probability p:
xi ∼ Bern(p).
The Bernoulli distribution is the same as the binomial distribution with n = 1.
Let y = ∑ᵢ xᵢ, the sum over all n trials; then
y ∼ Bin(n, p).
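As an illustrative sanity check (not from the original slides; the variable names below are ours), a sum of Bernoulli draws behaves like a single binomial draw:

# illustrative check: the sum of n Bernoulli(p) draws matches one Bin(n, p) draw
set.seed(1)
n <- 50; p <- 0.7
y.sum <- replicate(10^4, sum(rbinom(n, 1, p)))  # sums of n Bernoulli draws
y.bin <- rbinom(10^4, n, p)                     # direct binomial draws
c(mean(y.sum), mean(y.bin))                     # both close to n * p = 35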
Example: Seed germination.
Suppose you have planted n = 50 seeds of a certain variety and y = 35 of them have germinated after 72 hours. What can you say about the probability of germination given these data?
Deriving MLE
. . .
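The derivation itself was left for the lecture; as a sketch of the standard argument:

\[
L(p) = \binom{n}{y} p^{y} (1-p)^{n-y}, \qquad
\ell(p) = y \log p + (n - y) \log(1 - p) + C,
\]
\[
\ell'(p) = \frac{y}{p} - \frac{n - y}{1 - p} = 0
\quad \Longrightarrow \quad
\hat{p} = \frac{y}{n} = \frac{35}{50} = 0.7 .
\]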
Bayesian Model:
y ∼ Bin(n, p).
with the Beta prior for p:
p ∼ Beta(a, b).
Note that Beta(1, 1) is the same as the Uniform(0, 1) distribution.
Deriving the posterior distribution
. . .
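Again left for the lecture; the standard conjugacy argument runs:

\[
f(p \mid y) \propto f(y \mid p)\, f(p)
\propto p^{y} (1-p)^{n-y} \cdot p^{a-1} (1-p)^{b-1}
= p^{a+y-1} (1-p)^{b+n-y-1},
\]

which is the kernel of a Beta distribution, so p | y ∼ Beta(a + y, b + n − y). For the seed data with a Beta(1, 1) prior this gives Beta(36, 16).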
What happens when n → ∞?
. . .
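The answer was worked through in class; one way to see it numerically (illustrative only, holding the observed proportion fixed at 0.7 while n grows) is that the posterior interval shrinks around the observed proportion and the Beta(1, 1) prior's influence vanishes:

# illustrative only: posterior 95% intervals under a Beta(1, 1) prior
# with the observed proportion held at 0.7 while n grows
for (n in c(50, 500, 5000)) {
  y <- 0.7 * n
  print(qbeta(c(.025, .975), 1 + y, 1 + n - y))
}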
Two Groups.
[Figure: posterior densities f(p|y), with p on the horizontal axis. Panel 1, Sample A: Beta(36, 16); Panel 2, Sample B: Beta(16, 6).]
Combining ???
[Figure: the two posterior densities f(p|y) overlaid in one plot; legend: sample A, sample B.]
Combining!
r = pA / pB

- r = 1 means no difference
- r > 1 means pA > pB, i.e., A is better than B
- r < 1 means pA < pB, i.e., A is worse than B
Simulations
# draw 10,000 samples from each posterior
p.a <- rbeta(10^4, 36, 16)
p.b <- rbeta(10^4, 16, 6)
# posterior draws of the ratio r = pA / pB
ratio <- p.a / p.b
Posterior Distribution for r
[Figure: posterior density f(r|y) of the ratio r.]
mean(ratio)
## [1] 0.9730693
quantile(ratio,c(.025,.975))
## 2.5% 97.5%
## 0.7098594 1.3603458
Posterior Distribution for r
Pr(pA/pB > 1) = …
mean(ratio>1)
## [1] 0.3767
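In words: the posterior probability that pA exceeds pB is only about 0.38, so the data give no clear indication that A germinates better than B.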
Model Comparison: Deviance Information Criterion
DIC = pD + D̄(θ),
where
pD = D̄(θ) − D(θ̄),
and
D(θ) = −2 log(f(y|θ)) + C.
In the above, f(y|θ) is the likelihood of data y given the parameter θ, D̄(θ) is the posterior mean of the deviance, and θ̄ is the posterior mean of θ. Luckily, we don't always have to calculate it by hand (although we'll have a go at that later).
Model Comparison: Deviance Information Criterion – 2
- DICs can be used to compare models with the same response variable. (y and log(y) are not the same response variable, for example.)
- The compared models must be fitted to the same dataset. (Remove observations with missing data before comparing models.)
- DICs can take any real value (positive or negative). These values have no meaningful interpretation of their own, but they can be used to compare statistical model fit.
- The smaller the DIC, the better the model. A difference of at least 3 is considered sufficient evidence to choose the model with the lower DIC.
- Some statisticians prefer to use Bayes Factors for model comparison, but DIC comparison is a fairly standard practice.
Models:
Consider two models.
Model 1. Group-specific probabilities of germination pg:
yg ∼ Bin(ng, pg) for g = 1, 2 (1)
pg ∼ Beta(a, b) (2)
Model 2. Common probability of germination p:
yg ∼ Bin(ng, p) for g = 1, 2 (3)
p ∼ Beta(a, b) (4)
Posteriors
Note that the priors for the ps do not actually need to be the same.
In the first case, we can use group-specific outcomes to obtain the
posterior distributions
pg ∼ Beta(a + yg , b + ng − yg ).
In the second case, we can obtain the posterior distribution for the common parameter:
p ∼ Beta(a + y1 + y2, b + n1 + n2 − y1 − y2).
Evaluating DIC.
Recall the equations for the DIC:
DIC = pD + D̄(θ)
pD = D̄(θ) − D(θ̄)
D(θ) = −2 log(f(y|θ)) + C
where f(y|θ) is the likelihood.
In our example, a = 1, b = 1, y1 = 35, y2 = 15, n1 = 50, n2 = 20, so the group-specific posteriors will be Beta(36, 16) and Beta(16, 6).
To evaluate D̄(θ) in the above equations, we will do the following:
- produce posterior samples from each distribution;
- evaluate the likelihood of the observed data at each posterior realisation;
- take the mean of −2 × the log-likelihood.
First Model (p1 ≠ p2)
# posterior samples
p1 <- rbeta(10^3, 36, 16); p2 <- rbeta(10^3, 16, 6)
# log-likelihood of the observed data at each posterior draw
LL <- dbinom(35, 50, p1, log = TRUE) + dbinom(15, 20, p2, log = TRUE)
# posterior mean deviance, D-bar
mean.D <- mean(-2 * LL)
To evaluate D(θ̄) in the above equations, we will evaluate the likelihood of the observed data at the posterior means of p1 and p2:
D.mean <- -2 * (dbinom(35, 50, 36/(36+16), log = TRUE) +
                dbinom(15, 20, 16/(16+6), log = TRUE))
And the DIC for the first model is. . .
We can then put them into the above equations:
# effective number of parameters
p.D <- mean.D - D.mean
# DIC for the two-group model
DIC.2groups <- p.D + mean.D
Second Model (p1 = p2)
For the second model (common germination probability), we can
repeat similar steps with only one p:
# posterior samples from the common posterior Beta(51, 21)
pp <- rbeta(10^3, 51, 21)
# log-likelihood of the observed data at each posterior draw
LL <- dbinom(35, 50, pp, log = TRUE) + dbinom(15, 20, pp, log = TRUE)
# posterior mean deviance, D-bar
mean.D <- mean(-2 * LL)
To evaluate D(θ̄) in the above equations, we will evaluate the likelihood of the observed data at the posterior mean of p:
D.mean <- -2 * (dbinom(35, 50, 51/(51+21), log = TRUE) +
                dbinom(15, 20, 51/(51+21), log = TRUE))
We can then put them into the above equations:
# effective number of parameters and DIC for the common-probability model
p.D <- mean.D - D.mean
DIC.1group <- p.D + mean.D
Comparing the Two Models:
We can then take a look at the resulting DICs:
print(DIC.1group)
## [1] 9.49945
print(DIC.2groups)
## [1] 11.07023
The DIC for the common-probability model is smaller than the DIC for the two-group model. Thus, we do not have enough statistical evidence to support separate group probabilities. In other words, the data give no indication that the germination probabilities of the two groups differ (something our ratio analysis has already suggested).
Drinkers and Teetotallers
Days Freq
0 22
1 6
2 18
3 23
4 18
5 10
6 3
7 0
Beta-Binomial Model
Let xi denote the number of days in a week on which participant i drank alcohol. Then assume a binomial distribution:
xi | p ∼ Bin(7, p)
for some unknown parameter p, reflecting the probability of having a drink on any individual day.
Assume a conjugate Beta prior with prior parameters a and b:
p ∼ Beta(a, b).
Then, the posterior distribution can be shown to be:
p | x ∼ Beta(a + ∑ᵢ xᵢ, b + 7n − ∑ᵢ xᵢ),
where n = 100 is the number of observations (participants).
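As a quick numeric check (the R variable names here are ours, not from the slides), the posterior parameters implied by the observed table are:

days <- 0:7
freq <- c(22, 6, 18, 23, 18, 10, 3, 0)  # frequencies from the table above
n <- sum(freq)                          # 100 participants
sum.x <- sum(days * freq)               # 251 drinking days in total
c(a = 1 + sum.x, b = 1 + 7 * n - sum.x) # Beta(252, 450) under a Beta(1, 1) prior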
Inference
Assuming a uniform Beta(1, 1) prior for this particular example, this results in
p | x ∼ Beta(252, 450).
We can evaluate the posterior mean and the 95% credible interval as follows:
# posterior mean a/(a+b)
252/702
## [1] 0.3589744
# lower limit for the 95% CI
qbeta(.025,252,450)
## [1] 0.323907
# upper limit
qbeta(.975,252,450)
## [1] 0.3948028
We can thus conclude that the posterior mean of the probability of having a drink on any given day is approximately 0.36, with a 95% CI of (0.32, 0.39).
What about posterior predictive distribution?
. . .
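One minimal sketch of this check (assuming the Beta(252, 450) posterior above; the code is illustrative, not from the slides): draw p from the posterior, then a replicated weekly count from Bin(7, p), and compare with the observed table. With p ≈ 0.36, the plain binomial model predicts roughly (1 − 0.36)^7 ≈ 4% of participants reporting zero days, far fewer than the 22% observed, which motivates the model below.

# illustrative posterior predictive check for the plain Beta-Binomial model
set.seed(1)
p.sim <- rbeta(10^4, 252, 450)   # posterior draws of p
x.rep <- rbinom(10^4, 7, p.sim)  # replicated weekly drinking counts
table(x.rep) / 10^4              # compare with the observed proportions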
What would be a better model?
Let zi = 1 if person i drinks alcohol in principle and zi = 0 otherwise. Then:
xi | zi = 1 ∼ Bin(7, p)
Pr(xi = 0 | zi = 0) = 1
p ∼ Beta(a, b)
Running ZiB model in WinBUGS
. . .
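The WinBUGS code itself is omitted here. As an illustrative alternative (ours, not the course's WinBUGS implementation), the same zero-inflated binomial model can be fitted with a short Gibbs sampler in R, assuming Beta(1, 1) priors for both p and the probability pi.z of being a drinker:

# Gibbs sampler sketch for the zero-inflated binomial (ZiB) model
days <- rep(0:7, c(22, 6, 18, 23, 18, 10, 3, 0))  # observed data
n <- length(days)
S <- 5000
p <- 0.5; pi.z <- 0.5                # initial values; pi.z = Pr(z = 1)
out <- matrix(NA, S, 2, dimnames = list(NULL, c("p", "pi.z")))
for (s in 1:S) {
  # z_i = 1 is certain when x_i > 0; otherwise sample it from its
  # full conditional Pr(z = 1 | x = 0, p, pi.z)
  pr1 <- ifelse(days > 0, 1,
                pi.z * (1 - p)^7 / (pi.z * (1 - p)^7 + (1 - pi.z)))
  z <- rbinom(n, 1, pr1)
  # conjugate Beta updates given the latent z
  p    <- rbeta(1, 1 + sum(days[z == 1]), 1 + 7 * sum(z) - sum(days[z == 1]))
  pi.z <- rbeta(1, 1 + sum(z), 1 + n - sum(z))
  out[s, ] <- c(p, pi.z)
}
colMeans(out[-(1:1000), ])  # posterior means after burn-in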