Bayesian statistics
Introduction to Bayesian methods in ecology and evolution
Matteo Fumagalli m.fumagalli@imperial.ac.uk Imperial College London
February 17, 2020
Contents
1 Birds
2 Frogs
3 Ancient DNA
4 Extinctions
1 Birds
You are in the Galapagos and you want to model the distribution of beak widths in Darwin's finches. In the species you are investigating, you know from past information that this width is very different between sexes but highly homogeneous within each sex (there is not much variation in beak width within each sex). In particular, experts told you that the average beak width for males is 12mm while the average width for females is 25mm. You also know that on the particular island where you are located the ratio of females to males is 3/2.
Question A Based on this piece of information, propose a suitable prior distribution on the parameter beak width, assuming that you are interested in describing the beak width in the entire population (both sexes combined) on the island where you are located. Hint: imagine you randomly pick one bird; what is the expected beak width?
Question B Let's suppose you move to another island, which is still unexplored, and you want to investigate the beak width, θ, of female birds only. You collect n = 10 samples; your likelihood function is N(θ|μ, σ²), and 25 is the average value across the 10 samples with a variance of 5.
What are the appropriate values for μ and σ²?
Question C You now want to derive a posterior distribution for θ. You ask for prior information from four different experts (Dr Harrison, Dr Lennon, Dr McCartney, Dr Starr). Based on each prior distribution suggested, you obtained the posterior distributions shown in Figure 1.

Figure 1: Posterior distributions for Question C on birds.

Discuss which prior might be more suitable in the case where: (i) you have little idea of what to expect on this new island for your parameter, or (ii) you expect that the beak width is comparable to what is known from the previous island (as in Question A). Discuss one point for each posterior (e.g. who is the most or least confident, who uses the safest or boldest prior, etc.).
Question D You want to study another character, the beak length θ2, which is correlated with the beak width, now labelled θ1. Your vector of parameters is now θ⃗ = {θ1, θ2}. You choose a uniform prior (not conjugate to the Normal distribution) and therefore choose an MCMC algorithm (specifically the Metropolis algorithm) to obtain samples from the posterior distribution.
The algorithm begins by specifying a symmetric proposal density q(θ⃗*|θ⃗^(t−1)), which satisfies q(θ⃗*|θ⃗^(t−1)) = q(θ⃗^(t−1)|θ⃗*). From a starting value θ⃗^(0) at iteration t = 0, for t = 1, ..., T the algorithm repeats:

1. draw θ⃗* ∼ q(·|θ⃗^(t−1)),

2. calculate r = h(θ⃗*)/h(θ⃗^(t−1)), where h(·) denotes the unnormalised posterior density,

3. if r ≥ 1, set θ⃗^(t) = θ⃗*; otherwise set θ⃗^(t) = θ⃗* with probability r or set θ⃗^(t) = θ⃗^(t−1) with probability 1 − r.
Under mild assumptions, θ⃗^(t) converges in distribution to a draw from the true posterior density p(θ⃗|y⃗).
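For illustration only, a minimal R sketch of the Metropolis algorithm described above. The unnormalised posterior h(·) is left as a user-supplied function, and the Gaussian random-walk proposal, its scale and the example target at the end are assumptions made here for concreteness, not part of the exercise.

# minimal Metropolis sampler for a parameter vector theta (illustrative sketch)
# log_h must return the log of the unnormalised posterior h(theta)
metropolis <- function(log_h, theta0, n_iter = 10000, proposal_sd = 0.5) {
  draws <- matrix(NA, nrow = n_iter, ncol = length(theta0))
  theta <- theta0
  for (t in 1:n_iter) {
    # symmetric proposal: q(theta_star | theta) = q(theta | theta_star)
    theta_star <- rnorm(length(theta), mean = theta, sd = proposal_sd)
    # acceptance ratio r = h(theta_star) / h(theta), computed on the log scale
    log_r <- log_h(theta_star) - log_h(theta)
    # accept with probability min(1, r), otherwise keep the current value
    if (log(runif(1)) < log_r) theta <- theta_star
    draws[t, ] <- theta
  }
  draws
}
# example usage with an arbitrary bivariate Normal target (purely illustrative):
# log_h <- function(theta) sum(dnorm(theta, mean = c(20, 15), sd = c(2, 3), log = TRUE))
# samples <- metropolis(log_h, theta0 = c(0, 0))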
The samples obtained by this algorithm are auto-correlated, as the sample at time t is conditional on the sample at time t − 1. This is often not entirely satisfactory, as ideally we want to draw uncorrelated samples from the posterior distribution. Propose a modification of the algorithm written above (or write pseudocode) to ensure that your samples from the Markov chain are approximately uncorrelated. Discuss pros and cons of your modification.
Another issue in this computation is that θ1 and θ2 are highly correlated (or you expect them to be highly correlated). In this case the chain will exhibit "slow mixing" and may explore only a fraction of the parameter space. Propose a modification of the algorithm written above (or write pseudocode) to ensure that you reach convergence (by exploring the whole parameter space) even for correlated parameters θ1 and θ2. Discuss pros and cons of your modification and suggest a computational technique to overcome a potential disadvantage of this modification (hint: assume you can use the HPC).
2 Frogs
During your latest field trip in Costa Rica you observed how brightly coloured poison dart frogs (part of the Dendrobatidae family) were. In fact, the brightness of their skin colouration is correlated with their toxicity. To investigate the prevalence of toxic frogs in the area under study, you collected n samples of poison dart frogs and observed that k of them have bright skin colour (and thus are toxic).
Question A You now want to estimate the population frequency of the red colour phenotype, f ∈ [0, 1]. Assuming a generic likelihood function p(k|f, n), where k is our observed data, and a prior distribution p(f), write the expression for the posterior distribution of f, p(f|n, k). Indicate the interval for the integration over f explicitly. Finally, if the prior is p(f|μ) (assuming we have uncertainty on the prior for f, governed by a hyperparameter μ), what is the resulting full equation for the posterior distribution p(f|n, k, μ)?
Question B If we define the likelihood function as a Binomial distribution:

p(k|f, n) = (n choose k) f^k (1 − f)^(n−k)    (1)
and the prior function as a Beta distribution B(α, β):

p(f) = f^(α−1) (1 − f)^(β−1) / B(α, β)    (2)
then the posterior distribution of f is a Beta distribution with parameters α′ = k + α and β′ = n − k + β.
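This result follows in one line from Bayes' theorem, since the Binomial and Beta kernels combine (included here for completeness):

p(f|n, k) ∝ p(k|f, n) p(f) ∝ f^k (1 − f)^(n−k) · f^(α−1) (1 − f)^(β−1) = f^((k+α)−1) (1 − f)^((n−k+β)−1),

which is the kernel of a Beta distribution with α′ = k + α and β′ = n − k + β.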
What is the frequentist estimate of f? What is the maximum likelihood estimate of f? What is the maximum a posteriori mode of f using the non-informative conjugate prior p(f) ∼ B(α = 1, β = 1)? What is the maximum a posteriori mode of f assuming k = 12 and n = 235 using the uniform prior p(f) ∼ U(0.5, 1)?
Question C Assuming we collected 100 samples and 25 of them have bright skin colour, please complete the R code below in order to generate both the exact and approximated posterior distribution of f using the informative prior p(f) ∼ B(α = 2,β = 1). You need to fill in where the string ’???’ is present. Finally, discuss which summaries you would report to describe the resulting posterior distribution.
# we evaluate our parameter f over a grid of 100 values for the whole range [0, 1]
f <- seq(0, 1, ???)
# suppose we collected 100 samples and 25 of them have bright skin colour
k <- 25
n <- 100
# alpha and beta are the parameter values for the posterior Beta distribution
alpha <- ???
beta <- ???
# we now evaluate the density function to obtain the EXACT posterior distribution
y <- ???(???, shape1 = alpha, shape2 = beta)
# we now use Monte Carlo sampling to obtain the APPROXIMATED posterior distribution;
# make a reasonable choice for the number of random samples
y_sampled <- ???
y_sampled_distribution <- ???
Question D If we use a Normal distribution as prior information, such as p(f) = N(μ, σ²), we cannot derive a closed form and cannot sample directly from the posterior distribution. We can instead use a rejection sampling algorithm for indirect sampling of the posterior distribution. This algorithm requires the identification of an envelope function g(f) and a constant M > 0 such that p(k|f, n) p(f) < M g(f).
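As a purely illustrative sketch of this rejection scheme in R (the target, the envelope g and the constant M used below are placeholder choices, not the specific g(f) and M that the question asks you to identify):

# generic rejection sampler: returns draws distributed proportionally to target(f),
# using an envelope density g and a constant M such that target(f) < M * g(f) for all f
rejection_sample <- function(target, r_g, d_g, M, n_draws = 10000) {
  accepted <- numeric(0)
  while (length(accepted) < n_draws) {
    f_star <- r_g(1)                                   # propose from the envelope g
    u <- runif(1)                                      # uniform draw for the accept/reject step
    if (u < target(f_star) / (M * d_g(f_star))) {
      accepted <- c(accepted, f_star)                  # accept f_star, otherwise reject
    }
  }
  accepted
}
# illustrative use: unnormalised posterior p(k|f,n) p(f) with k = 25, n = 100 and a Normal
# prior with placeholder hyper-parameters; a Uniform(0,1) envelope with M = 0.5 is enough
# here, since the maximum of this target is roughly 0.33
# target <- function(f) dbinom(25, 100, f) * dnorm(f, mean = 0.3, sd = 0.1)
# draws <- rejection_sample(target, r_g = runif, d_g = dunif, M = 0.5)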
Identify both a suitable envelope function g(f) and a value for M, assuming that we know that the maximum density value for the posterior distribution is K. Describe the full algorithm (with formal notation for equations, e.g. ν ∼ P(λ = 2)) or write its pseudocode (as accurately as possible). Describe what happens to the algorithm and/or the approximation if we choose M >> K or M << K.
If we want to use an MCMC sampling approach, is it possible to use a Gibbs sampler in this case or is it more appropriate to use a rejection algorithm like the Metropolis one? Why?
3 Ancient DNA
You are hiking on the Pyrenees mountain range and you discovered an ancient human bone. You extracted the DNA and sequenced its genome. Unfortunately the sample was not well preserved and it was highly contaminated. Therefore you were able to obtain genotypic information for only one locus. Specifically, you found that, at a specific genomic position, the mysterious ancient human has an AA genotype, homozygous for adenine. Nevertheless, you want to make some statistical inferences on whether this sample is genetically closer to modern Spanish, French or Basque individuals. This analysis is a proxy for understanding whether this sample is more likely to be an ancestor of modern Spanish, French or Basque individuals. In other words, you want to perform a population assignment based on genetic data.
Let's assume that your parameter of interest is θ = {S, F, B}, representing the probability that your sample comes from a Spanish (S), French (F) or Basque (B) population, respectively. The data is y = {g, f_A^(S), f_A^(F), f_A^(B)}, where g is the ancient genotype (so that g = AA) and f_A^(S), f_A^(F), f_A^(B) are the known population frequencies of allele A in modern Spanish, French and Basque, respectively. Under the assumption of Hardy-Weinberg Equilibrium, we know that

p(g = AA, f_A^(i) | θ = i) = (f_A^(i))^2

for a generic population i. Note that the latter equation represents the likelihood function f(y|θ).
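As a quick worked example of this likelihood, with an arbitrary and purely illustrative allele frequency: if f_A^(i) = 0.5, then p(g = AA|θ = i) = (0.5)^2 = 0.25.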
Question A Using Bayes' law, write the equation for the posterior probability of the sample belonging to the Spanish population given the data. Assume that you have a generic prior probability π(θ) with known hyperparameters. Be as formal and explicit as possible. No proofs or extra calculations are required. Let's assume that we gather the following population allele frequencies
f_A^(S) = 0.7
f_A^(F) = 0.2
f_A^(B) = 0.1
and that we ask experts for an opinion regarding a prior probability of this sample belonging to any of the tested populations. Here are the opinions from our 3 experts:
1. Dr Cobain: It is still highly debated whether ancient humans in the Pyrenees are the ancestors of modern French, Spanish or Basque. All the anthropological evidence so far is not solid enough to point towards any specific population. We have no clue!
2. Professor Grohl FRS: It must be Spanish. I see no evidence why this sample should be the ancestor of any other modern population. I am 100% sure.
3. Mr Novoselic: We have collected more than 1,000 ancient samples from the Pyrenees so far and we were able to assign 50% of them as Spanish, 30% as French, and 20% as Basque.
Question B Based on this information, choose the most suitable prior distribution π(θ). Justify your choice. There is no right or wrong answer (although one of them is hardly acceptable) as long as it is properly justified. Formalise π(θ) by assigning a prior probability for each value of θ based on your choice.
Question C Based on your chosen prior distribution, calculate the Bayes factor for model M1 with parameter {θ = S} vs. model M2 with parameter {θ ̸= S}. Write the equation and provide the value for the Bayes factor. Approximate any calculation as much as you wish but be reasonable (e.g. 0.82/0.19 ≈ 4 is totally fine but 0.82/0.19 ≈ 3 is not). Provide a brief discussion on the support for M1 or M2.
Question D Let's assume that a new prior probability on θ is now dependent on an unknown hyperparameter τ with distribution h(τ), where τ ∈ {−1, 0, 1}. In other words, the prior distribution is π(θ|τ) and τ can only take the discrete values −1, 0, or 1. Using Bayes' law in hierarchical modelling, write the equation for the posterior distribution of the sample being Spanish. Be as formal and explicit as possible. No proofs or extra calculations are required. Note that both θ and τ follow discrete distributions.
Finally, you want to estimate the posterior distribution for the allele frequency of the ancestral Spanish population given the ancient and modern genotype data. Let's call this parameter φ, continuous within the range [0, 1]. The model is more complicated (and outside the scope of this exercise) and indeed you cannot derive a closed form for the posterior distribution. Therefore you want to use an indirect sampling method to approximate the posterior distribution p(φ|y). As seen in class, one possibility is to use a rejection sampling algorithm which uses an envelope function to "cover" the posterior distribution. However, this algorithm requires the specification of M so that L(φ)π(φ) < M g(φ), with L(φ) being the likelihood function. M may not be readily available. Furthermore, this algorithm is computationally inefficient as it has a rejection step.
Therefore, you want to derive and implement a more efficient algorithm for indirect sampling of the posterior distribution p(φ|y). Suppose you have N draws of φ (so φ1, φ2, ..., φN) sampled from some approximating density function g(φ). This situation is illustrated in Figure 2. The continuous black line is the unknown posterior distribution, or a distribution equal to L(φ)π(φ) up to a constant; the red dashed line is the chosen approximating density g(φ); the small red vertical lines on the x-axis represent the N samples drawn, which are then evaluated at g(φ).
Figure 2: Indirect sampling to approximate a posterior distribution.
Question E Assume that you can evaluate L(φi)π(φi) and g(φi) for each one of the N draws of φ. You cannot sample from L(φ)π(φ) but you can sample from g(φ) which is an approximating function for the posterior distribution. Devise an efficient algorithm for sampling φi to approximate the posterior distribution. Discuss what happens when N becomes larger and any potential limitation of this method (e.g. based on the choice of g(φ)).
(Hint: If g(φ) ≡ p(φ|y) then your sample φ1, φ2, ..., φN is already a sample from the posterior. However, since g(φ) is just an approximating function, you do not want to resample from the set {φ1, φ2, ..., φN} with equally likely probability of selection... )
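For illustration, a minimal R sketch of the weighted-resampling idea suggested by the hint (often referred to as sampling importance resampling). The target used below is an arbitrary stand-in for L(φ)π(φ), since the true model is outside the scope of the exercise, and g(φ) is taken here to be a Beta(2, 2) density; both choices are assumptions for the sketch only.

# draw N values from the approximating density g, weight them by L(phi)pi(phi)/g(phi),
# then resample with probability proportional to the (normalised) weights
target <- function(phi) dbinom(7, 10, phi)     # arbitrary stand-in for L(phi) * pi(phi)
N <- 100000
phi <- rbeta(N, 2, 2)                          # phi_1, ..., phi_N sampled from g(phi)
w <- target(phi) / dbeta(phi, 2, 2)            # importance weights
w <- w / sum(w)                                # normalise the weights
phi_post <- sample(phi, size = N, replace = TRUE, prob = w)
# phi_post approximates a sample from the posterior; the approximation improves as N grows,
# provided g(phi) covers the regions where L(phi)pi(phi) is non-negligible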
4 Extinctions
The time between extinction events of amphibians in South America under current climatic conditions can be described with an exponential distribution with parameter λ:

p(x|λ) = λ e^(−λx)

for x ≥ 0, with X = {x1, x2, ..., xn} being a continuous random variable. Please note that p(x|λ) = 0 for x < 0.
The conjugate prior for an exponential distribution is a Gamma distribution:

p(λ|α, β) = λ^(α−1) e^(−λ/β) / (Γ(α) β^α)

with Γ(α) being the gamma function (a normalising factor). Please note that α > 0 and β > 0, and that the expected value is αβ and the variance is αβ².
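As a small numerical sanity check of this parameterisation (shape α and scale β), with arbitrary illustrative hyper-parameter values:

# simulate from a Gamma with shape alpha and scale beta and check its mean and variance
alpha <- 2
beta <- 1.5
draws <- rgamma(1e6, shape = alpha, scale = beta)
mean(draws)   # close to alpha * beta = 3
var(draws)    # close to alpha * beta^2 = 4.5
# the corresponding density can be evaluated with dgamma(lambda, shape = alpha, scale = beta)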
Question A Show that the posterior distribution p(λ|x) is a Gamma distribution G(α′, β′) with α′ = α + 1, assuming we have a single observation x. Please note that β′ = β + x.
Question B Assume that, based on past observations, you expect a time between extinctions of 3.5 a priori, but with a large uncertainty associated with it. Choose suitable values for the hyper-parameters α and β to fit this prior belief and calculate the posterior mean with x = 2.5.
Question C Assume that you calculate a Bayes factor for testing M1 = {λ ≥ t} vs. M2 = {λ < t}, with t > 0 being a threshold on whether or not to activate a conservation strategy. You obtain a value of 150. Discuss the support for λ ≥ t under the posterior p(λ|x) and under the prior p(λ|α, β), in light of the definition and interpretation of Bayes factors.
Assuming that the 95% highest posterior density interval for λ is [0.29, 28.69], what can we say about the probability that the time between extinctions is larger than 28.69?
Question D Assume that your prior information is now described by a Normal distribution p(λ|μ, σ²), that is, you lack a conjugate prior. Describe an algorithm (or write pseudocode) for obtaining samples from the posterior distribution p(λ|x). Be as precise and formal as possible and highlight any pros and cons of the chosen algorithm.
Question E Answer either point (a) or point (b).

(a) Describe the rationale behind the sequential Monte Carlo (SMC) MCMC algorithm to estimate parameters and perform model selection. What are the main advantages (and disadvantages, if any) over a standard MCMC? What are the additional parameters of the algorithm?
(b) Describe the main features of representing probabilistic relationships between random variables with a Bayesian network.
Abbreviations

BF: Bayes factor
HPD: highest posterior density
iid: independent and identically distributed
LR: likelihood ratio
LRT: likelihood ratio test
MAP: maximum a posteriori probability
MCMC: Markov chain Monte Carlo
MLE: maximum likelihood estimate
pdf: probability density function
pmf: probability mass function
List of Figures

1 Posterior distributions for Question C on birds
2 Indirect sampling to approximate a posterior distribution