
Model Comparison
Bayesian Statistics Statistics 4224/5224 Spring 2021
February 16, 2021
1

Bayes Factors
(Section 7.4 of Bayesian Data Analysis, by Gelman et al; and Section 5.2 of Bayesian Statistical Methods, by Reich and Ghosh)
Bayes factors provide a formal summary of the evidence that the data support one model over another.
Say there are two models under consideration, H1 and H2. Their posterior probabilities are the most intuitive summaries of model uncertainty.
Posterior model probabilities incorporate information from both the data and the prior. Bayes factors remove the effect of the prior, and quantify the data's support for the models.
2

The Bayes factor for Model H2 relative to Model H1 is defined implicitly by
Pr(H2|y) / Pr(H1|y) = [Pr(H2) / Pr(H1)] × Bayes factor(H2; H1) .
Thus the Bayes factor of Model 2 relative to Model 1 is the ratio of posterior odds to prior odds,
Bayes factor(H2; H1) = [Pr(H2|y) / Pr(H1|y)] / [Pr(H2) / Pr(H1)] .
It follows that
Bayes factor(H2; H1) = p(y|H2) / p(y|H1) = ∫ p(y|θ; H2) p(θ|H2) dθ / ∫ p(y|θ; H1) p(θ|H1) dθ .
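For instance (with hypothetical numbers, not taken from the lecture), if the two models are equally probable a priori and the data yield a Bayes factor of 5 in favor of H2, the posterior odds are 5 to 1 in favor of H2:

```r
# Hypothetical illustration: posterior odds = prior odds x Bayes factor
prior_H1 <- 0.5; prior_H2 <- 0.5   # assumed equal prior model probabilities
bf_21    <- 5                      # hypothetical Bayes factor of H2 over H1

post_odds    <- (prior_H2 / prior_H1) * bf_21   # posterior odds of H2 vs H1
post_prob_H2 <- post_odds / (1 + post_odds)     # convert odds to a probability
post_prob_H2                                    # 5/6, about 0.83
```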
3

Hypothesis testing and Bayes factors
In classical hypothesis testing, one of the models is referred to as the null model or null hypothesis, and the other is the alternative model/hypothesis.
Hypothesis tests are usually designed to be conservative, so that the null model is rejected in favor of the alternative only if the data strongly support the alternative.
If we define H1 as the null hypothesis and H2 as an alternative hypothesis, then a rule of thumb is that BF(H2; H1) > 10 provides strong evidence for the alternative hypothesis relative to the null hypothesis, and BF(H2; H1) > 100 is decisive evidence.
4

Model comparison using Bayes factors
In general, model selection can be framed as treating the model as an unknown random variable H ∈ {H1, H2} with prior probabilities Pr(Hj) for j = 1, 2. Conditional on model j, the remainder of the Bayesian model is
Hj : y ∼ p(y|θ; Hj), where θ ∼ p(θ|Hj).

Bayes factors cannot be used with improper priors
The Bayes factor requires the marginal likelihood, integrating over uncertainty in the parameters,
p(y|Hj) = ∫ p(y|θ; Hj) p(θ|Hj) dθ .
But an improper prior is only known up to proportionality. Therefore, Bayes factors cannot be used with improper priors.
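As a minimal illustration of why (using a toy normal-mean setup, not an example from the slides): with a proper prior the marginal likelihood is a well-defined number, but an unnormalized flat prior p(μ) ∝ c produces an answer that scales with the arbitrary constant c, so any Bayes factor built from it would too.

```r
# Marginal likelihood p(y|H) by numerical integration (toy normal-mean setup)
y <- 1.3   # a single illustrative observation, y | mu ~ Normal(mu, 1)

# Proper prior mu ~ Normal(0, 2^2): the marginal likelihood is well defined
marg_proper <- integrate(function(mu) dnorm(y, mu, 1) * dnorm(mu, 0, 2),
                         lower = -Inf, upper = Inf)$value

# Improper flat prior p(mu) proportional to c, known only up to the constant c:
c1 <- 1; c2 <- 1000
marg_flat_c1 <- integrate(function(mu) dnorm(y, mu, 1) * c1, -Inf, Inf)$value
marg_flat_c2 <- integrate(function(mu) dnorm(y, mu, 1) * c2, -Inf, Inf)$value
marg_flat_c2 / marg_flat_c1   # = 1000: the "marginal likelihood" is arbitrary
```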
5

Similarity to likelihood ratio testing
The Bayes factor for H2 relative to H1,
BF(H2; H1) = ∫ p(y|θ; H2) p(θ|H2) dθ / ∫ p(y|θ; H1) p(θ|H1) dθ ,
is the ratio of the marginal distributions of the data under the two models. This resembles the likelihood ratio
LR(H2; H1) = p(y|θ̂2; H2) / p(y|θ̂1; H1)
from frequentist hypothesis testing, where θ̂j is the MLE under model Hj.
Both measures compare models based on the ratio of their likelihoods; the Bayes factor integrates over prior uncertainty in the parameters, whereas the likelihood ratio plugs in point estimates.
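A minimal sketch of that contrast, using illustrative binomial data (numbers not from the slides): the likelihood ratio plugs in the MLE under each model, while the Bayes factor averages the likelihood over the prior.

```r
# LR (plug in the MLE) versus BF (integrate over the prior), binomial data
y <- 15; n <- 20   # illustrative data

# H1: theta = 0.5 (no free parameter);  H2: theta unknown, MLE = y/n
lik_H1     <- dbinom(y, n, 0.5)
lik_H2_mle <- dbinom(y, n, y / n)
LR <- lik_H2_mle / lik_H1          # frequentist likelihood ratio

# Bayes factor: average the H2 likelihood over a Uniform(0,1) prior for theta
marg_H2 <- integrate(function(theta) dbinom(y, n, theta) * dunif(theta), 0, 1)$value
BF <- marg_H2 / lik_H1

c(LR = LR, BF = BF)   # LR > BF here: integrating penalizes H2's extra flexibility
```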
6

Beta-binomial example
Let y|θ ∼ Binomial(n = 20, θ) and consider two models for θ:
H1: θ = 0.5 versus H2: θ ∼ Uniform(0, 1).
We have, for each y = 0, 1, 2, ..., 20,
p1(y) = p(y|θ = 0.5) = [20! / (y!(20 − y)!)] (1/2)^20 ,
and leave it as an exercise to show that
p2(y) = ∫₀¹ [20! / (y!(20 − y)!)] θ^y (1 − θ)^(20−y) dθ = 1/21 .
7

We compute
BF(H2; H1) = p2(y) / p1(y)
for each y = 0,1,2,…,20, and find
• BF > 10 for y ≤ 4 or y ≥ 16,
• BF > 100 for y ≤ 2 or y ≥ 18.
See
Courseworks → Files → Examples → Ex07b BayesFactor .
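A minimal sketch along those lines (not the Ex07b script itself):

```r
# Beta-binomial example: BF(H2; H1) = p2(y) / p1(y) for each y = 0, 1, ..., 20
y  <- 0:20
p1 <- dbinom(y, size = 20, prob = 0.5)   # p1(y) under H1: theta = 0.5
p2 <- rep(1 / 21, length(y))             # p2(y) under H2: theta ~ Uniform(0, 1)
BF <- p2 / p1

y[BF > 10]    # strong evidence:   y <= 4 or y >= 16
y[BF > 100]   # decisive evidence: y <= 2 or y >= 18
```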
8

Normal mean example
Say there is a single observation y|μ ∼ Normal(μ,1) and the objective is to test whether μ = 0:
H1: μ = 0 versus H2: μ ∼ Normal(0, τ^2).
Given that we observe y, it can be shown that the Bayes factor of H2 relative to H1 is
BF(H2; H1) = (1 + τ^2)^(−1/2) exp{ (y^2/2) [τ^2 / (1 + τ^2)] } .
For fixed τ, the Bayes factor increases to infinity as y^2 increases, as expected, because data far from zero contradict H1 : μ = 0.
However, for any fixed y, the Bayes factor converges to zero as the prior variance τ^2 increases.
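A minimal sketch checking the closed-form expression against direct numerical integration, for illustrative values of y and τ:

```r
# Normal mean example: closed-form Bayes factor versus numerical integration
y   <- 2.0   # illustrative observation
tau <- 3.0   # illustrative prior standard deviation of mu under H2

# Closed form: BF(H2; H1) = (1 + tau^2)^(-1/2) * exp((y^2/2) * tau^2/(1 + tau^2))
bf_closed <- (1 + tau^2)^(-1/2) * exp((y^2 / 2) * tau^2 / (1 + tau^2))

# Direct computation: p(y|H2) = integral of N(y|mu,1) N(mu|0,tau^2) dmu,
#                     p(y|H1) = N(y|0,1)
p_y_H2 <- integrate(function(mu) dnorm(y, mu, 1) * dnorm(mu, 0, tau),
                    lower = -Inf, upper = Inf)$value
p_y_H1 <- dnorm(y, 0, 1)
bf_numeric <- p_y_H2 / p_y_H1

c(closed_form = bf_closed, numerical = bf_numeric)   # should agree
```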
9

Homework 3 hints and suggestions
1. Problem 1 concerns a beta-binomial hierarchical model. You can easily adapt the R code given in 'Example05a' to solve much of this problem. As in that example, I suggest you reparameterize from (α, β) to (log(α/β), log(α + β)) for posterior sampling based on a grid approximation. You should find that values outside the ranges −1.5 < log(α/β) < 1.5 and 0 < log(α + β) < 6 can safely be ignored (they have posterior probability of essentially zero).

2. Problem 2 concerns a hierarchical normal model with known variance, like the one estimated in 'Example05b,' which contains most of the R code you will need to solve this problem.

10

3. Problem 3 concerns an ordinary (nonhierarchical) Poisson-gamma model. Part (a) can be answered 'exactly' using the qgamma function; parts (b) and (c) are best accomplished by Monte Carlo sampling.

4. Problem 4 involves posterior predictive checks, which we will discuss in class on Thu Feb 18. The instructions are pretty explicit, however, so there's no need to wait before attempting to solve this problem.

5. Bayes factor. An analytic solution is available, but I suggest you evaluate the integrals numerically, perhaps using the Monte Carlo method (see the sketch at the end of these notes). The first step is to figure out exactly what the integrals are upon which
BF(H2; H1) = p(y1, y2|H2) / p(y1, y2|H1)
depends.

11

Cross-validation and information criteria

Cross-validation is a technique for comparing competing statistical models for a particular data set, based on their out-of-sample predictive performance.

Model selection criteria, such as the deviance information criterion (DIC) and Watanabe-Akaike information criterion (WAIC), provide a useful, less computationally intensive alternative to cross-validation.

You are referred to Sections 7.1–7.3 of Gelman et al, or Sections 5.1 and 5.5 of Reich and Ghosh. We will not cover model selection criteria in this course.

12
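Appendix to hint 5: a minimal sketch of Monte Carlo estimation of a marginal likelihood, using an illustrative Poisson-gamma model with made-up data (assumptions for illustration only; not the homework model or data).

```r
# Monte Carlo estimate of a marginal likelihood p(y|H):
# draw theta from its prior and average the likelihood over those draws.
# Illustrative model (assumed): y1, y2 | theta ~ iid Poisson(theta),
#                               theta ~ Gamma(shape = 2, rate = 1)
set.seed(1)
y <- c(3, 5)     # made-up data
S <- 1e5         # number of Monte Carlo draws

theta <- rgamma(S, shape = 2, rate = 1)           # draws from the prior
lik   <- dpois(y[1], theta) * dpois(y[2], theta)  # p(y1, y2 | theta) at each draw
p_y   <- mean(lik)                                # estimate of p(y1, y2 | H)
p_y
```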