CS计算机代考程序代写 Bayesian STAT314/461. Beta-Binomial Model.

STAT314/461. Beta-Binomial Model.

Elena Moltchanova

STAT314/461-2021S1

Binomial Distribution.

Binomial distribution is used to model identical independent
observations with binary outcomes. For example:

I how many patients (y) out of n will improve (or not) after a
certain surgical procedure.

I how many students (y) out of n will pass (or fail) the exam.

I how many people (y) out of n have eaten vegetables (or not)
at least once during the past week.

I how many seeds (y) out of n have germinated (or not) after 24
hours.

I how many questions (y) will you answer correctly (rather than
incorrectly) in a n = 20 question exam?

Binomial Distribution

In this case, we often assume that the response variable y has a
binomial distribution with the total number of binomial trials n and
the probability of success (arbitratily defined as one of the two
possible outcomes) p:

y ∼ Bin(n, p).

Notation 0 vs. 1

Note, that if we model the result of each individual trial (a single
patient, a single student, a single seed etc.) as xi ∈ (0, 1) so that

Pr(xi = 1) = 1− Pr(xi = 0) = p,

Bernoulli Distribution

Let xi has a Bernoulli distribution with probability p:

xi ∼ Bern(p).

Bernoulli distribution is the same as Binomial distribution with
n = 1.

Let y =
∑n

i xi , then

y ∼ Bin(n, p).

Example: Seed germination.

Suppose, you have planted n = 50 seeds of a certain variety and
y = 35 of them have germinated after 72 hours. What can you say
about the expected probability of germination given this data?

Deriving MLE

. . .

Bayesian Model:

y ∼ Bin(n, p).

with the Beta prior for p:

p ∼ Beta(a, b).

Note, Beta(1, 1) is the same as Uniform.

Deriving the posterior distribution

. . .

What happens when n→∞?

. . .

Two Groups.

0.00 0.25 0.50 0.75 1.00
p

f(
p

|y
)

Sample A: Beta(36,16)

0.00 0.25 0.50 0.75 1.00
p

f(
p

|y
)

Sample B: Beta(16,6)

Combining ???

0.00 0.25 0.50 0.75 1.00
p

f(
p

|y
)

sample

sample A

sample B

Combining!

r =
pA
pB

I r = 1 means no difference

I r > 1 means pA > pB, i.e., A is better than B

I r < 1 means pA < pB, i.e., A is worse than B Simulations p.a <- rbeta(10^4,36,16) p.b <- rbeta(10^4,16,6) ratio <- p.a/p.b Posterior Distribution for r 0 1 2 3 0.5 1.0 1.5 2.0 x f( r| y) mean(ratio) ## [1] 0.9730693 quantile(ratio,c(.025,.975)) ## 2.5% 97.5% ## 0.7098594 1.3603458 Posterior Distribution for r Pr ( pA pB > 1
)

= …

mean(ratio>1)

## [1] 0.3767

Model Comparison: Deviance Information Criterion

DIC = pD + D(θ),

where

pD = D(θ)− D(θ̄),

and

D(θ) = −2 log(f (y |θ)) + C .

In the above, f (y |θ) is the likelihood of data y given the parameter
θ. Luckily, we don’t have to always calculate it by hand (although
we’ll have a go at that later.)

Model Comparison: Deviance Information Criterion – 2
I DICs can be used to compare model with the same response

variable. (y and log(y) are not the same response variable, for
example)

I The compared models must be fitted to the same datasets.
(Remove observations with missing data before comparing
models.)

I DICs can take any real values (positive or negative). These
values have no meaningful interpretation of their own, but they
can be used to compare statistical model fit.

I The smaller the DIC, the better the model. A difference of at
least 3 is considered sufficient evidence to choose the model
with the lower DIC.

I Some statisticians prefer to use Bayes Factors for model
comparison. But DIC comparison is a fairly standard practice.

Models:

Consider two models

Model 1. Group-specific probabilities of germination pg :

yg ∼ Bin(ng , pg ) for g = 1, 2 (1)
pg ∼ Beta(a, b) (2)

Model 2: Common probability of germination p.

yg ∼ Bin(ng , p) for g = 1, 2 (3)
p ∼ Beta(a, b) (4)

Posteriors

Note that the priors for ps do not actually need to be the same.

In the first case, we can use group-specific outcomes to obtain the
posterior distributions

pg ∼ Beta(a + yg , b + ng − yg ).

In the second we can obtain the posterior distribution for the
common parameter:

p ∼ Beta(a + y1 + y2, b + n1 + n2 − y1 − y2).

Evaluating DIC.

Recall the equations for the DIC

DIC = pD + D(θ)

pD = D(θ)− D(θ̄)

D(θ) = −2 log(f (y |θ)) + C

where f (y |theta) is the likelihood.

In our example, a = 1, b = 1, y1 = 35, y2 = 15, n1 = 50, n2 = 20,
so the group specific posteriors will be Beta(36, 16) and Beta(16, 6).

To evaluate the D(θ) in the above equations, we will do the
following:

I produce posterior samples from each distribution

I evaluate likelihood of the observed data based on each
realisation

I take the mean of -2log-likelihood

First model (p1! = p2)

# posterior samples
p1 <- rbeta(10^3,36,16); p2 <- rbeta(10^3,16, 6) # -2 log likelihood LL <- dbinom(35,50,p1,log=T) + dbinom(15,20,p2,log=T) mean.D <- mean(-2*LL) To evaluate the D(θ̄) in the above equations, we will evaluate the likelihood of the observed data based on posterior means of p D.mean <- -2*(dbinom(35,50,36/(36+16),log=T) + dbinom(15,20,16/(16+6),log=T)) And the DIC for the first model is. . . We can then put them into the above equations: p.D <- mean.D - D.mean DIC.2groups <- p.D + mean.D Second Model (p1 = p2) For the second model (common germination probability), we can repeat similar steps with only one p: # posterior samples pp <- rbeta(10^3,51,21) # -2 log likelihood LL <- dbinom(35,50,pp,log=T) + dbinom(15,20,pp,log=T) mean.D <- mean(-2*LL) To evaluate the D(θ̄) in the above equations, we will evaluate the likelihood of the observed data based on posterior means of p D.mean <- -2*(dbinom(35,50,51/(51+21),log=T) + dbinom(15,20,51/(51+21),log=T)) We can then put them into the above equations: p.D <- mean.D - D.mean DIC.1group <- p.D + mean.D Comparing the Two models: We can then take the look at the resulting DICs: print(DIC.1group) ## [1] 9.49945 print(DIC.2groups) ## [1] 11.07023 The DIC for one common group model is smaller than that for 2. Thus, we do not have enough statistical evidence for two groups. Which means, in turn, that the germination probabilities for the two groups do not differ from each other (something, our ratio analysis has already confirmed). Drinkers and Teetotallers Days Freq 0 22 1 6 2 18 3 23 4 18 5 10 6 3 7 0 Beta-Binomial Model Let xi describe the number of days in a week, the participant i drank alcohol. Then, assume Binomial distribution: xi |p ∼ Bin(7, p) for some unknown parameter p, reflecting the probability of having a drink on any individual day. Assume conjugate Beta prior with some prior parameters a and b: p ∼ B(a, b). Then, the posterior distribution can be shown to be: p|x ∼ B(a + ∑ i xi , b + 7n − ∑ i xi ), where n = 100 is the number of observations (participants). Inference Assuming a uniform prior B(1, 1) for this particular example, this results in p|x ∼ B(252, 450), We can evaluate posterior mean and the 95% credible interval as follows: # posterior mean a/b 252/702 ## [1] 0.3589744 # lower limit for the 95% CI qbeta(.025,252,450) ## [1] 0.323907 # upper limit qbeta(.975,252,450) ## [1] 0.3948028 We can thus conclude that the posterior mean estimated probability of having a drink on any given day is 0.35 with the 95% CI: (0.32,0.39). What about posterior predictive distribution? . . . What would be a better model? Let zi = 1 if the person i drinks alcohol in principle and zi = 0 otherwise. Pr(xi |zi = 1) ∼ Bin(7, p) Pr(xi = 1|zi = 0) = 1− Pr(xi = 0|zi = 0) = 0 p ∼ Beta(a, b) Running ZiB model in WinBUGS . . .

Related Posts