Assignment 1. Solutions

STAT314/STAT461

Set: Tue, July-27. Due: Fri Aug-06

Problem 1. Inverse Probability.
A total of 38 ancient manuscripts have so far been found in a certain area of England. Of those, 20 have
been solidly attributed to Historian A, 17 to Historian B, and only 1 to Historian C. All the
chronicles pertain to the legendary King Arthur. Historian A mentions him on average 5 times per
page, Historian B 3 times per page, and Historian C, who is a fan, about 10 times per
page.

Assume a Poisson distribution for the number of mentions per page, so that you can use the Poisson p.m.f. to
evaluate the probability of a specific number of references to King Arthur on a page.

(a) If a single page is found from the same time period, and there is no mention of King Arthur, what is
the probability that it was written by Historian C? (1pt)

(b) What assumptions have you made in the process of your analysis? List a couple; explain why you think
they apply; and give counterexamples when they would not. (1pt)

Solution.

Using the information available so far, and assuming that the sample so far is representative and
that no other sources (Historians D, E, etc.) exist, we can evaluate the respective probabilities that a
manuscript was written by a particular historian as

Pr(Hist=A) = 20/38,
Pr(Hist=B) = 17/38,
Pr(Hist=C) = 1/38.

Furthermore, if x denotes the number of mentions of King Arthur, then, assuming a Poisson distribution,

$$\Pr(x \mid \text{Hist}) = \frac{\lambda_{\text{Hist}}^{x} \exp(-\lambda_{\text{Hist}})}{x!},$$

where $\lambda_{\text{Hist}}$ is the average frequency of mentions of King Arthur per page attributed to the respective
Historian. In other words, $\lambda_A = 5$, $\lambda_B = 3$, $\lambda_C = 10$.

Specifically,

$$\Pr(x = 0 \mid \text{Hist}) = \frac{\lambda_{\text{Hist}}^{0} \exp(-\lambda_{\text{Hist}})}{0!} = \exp(-\lambda_{\text{Hist}}).$$

Now we are ready to apply Bayes' formula:

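$$
\begin{aligned}
\Pr(\text{Hist} = C \mid x = 0)
&= \frac{\Pr(x = 0 \mid \text{Hist} = C)\,\Pr(\text{Hist} = C)}{\Pr(x = 0 \mid A)\Pr(A) + \Pr(x = 0 \mid B)\Pr(B) + \Pr(x = 0 \mid C)\Pr(C)} \\
&= \frac{\exp(-10) \cdot 1/38}{\exp(-5) \cdot 20/38 + \exp(-3) \cdot 17/38 + \exp(-10) \cdot 1/38}
\approx 0.000046. \qquad (1)
\end{aligned}
$$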

In other words, the probability that the newly found page was produced by Historian C is extremely small.
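
The calculation in (1) can be reproduced in a few lines of R. This is only a minimal sketch: the variable names (`prior`, `lambda`, `lik0`, `posterior`) are chosen here for illustration, and the numbers are the ones given in the problem statement.

```r
# Prior probabilities from the attribution counts (20, 17 and 1 out of 38)
prior <- c(A = 20, B = 17, C = 1) / 38

# Average number of mentions of King Arthur per page for each historian
lambda <- c(A = 5, B = 3, C = 10)

# Poisson probability of observing zero mentions on a page
lik0 <- dpois(0, lambda)

# Bayes' formula: posterior is likelihood times prior, normalised to sum to 1
posterior <- lik0 * prior / sum(lik0 * prior)
round(posterior, 6)
```

The entry for C should agree with the value of roughly 0.000046 obtained above; the other two entries show that Historian B is by far the most probable author of a page with no mentions.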

Some assumptions you might want to mention:

• Only these three historians could have written the page. An obvious counterexample is a page written by
another, previously unseen author.

• The counts per page follow a Poisson distribution. A zero-inflated distribution is more plausible, since
some pages may be about events which did not involve King Arthur at all.
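
To see how much the Poisson assumption matters, the same calculation can be repeated with a zero-inflated Poisson likelihood. This is a hedged sketch, not part of the original solution: the zero-inflation probability `pi0 = 0.2` is a purely hypothetical value, assumed here to be the same for all three historians.

```r
prior <- c(A = 20, B = 17, C = 1) / 38
lambda <- c(A = 5, B = 3, C = 10)

# Hypothetical share of pages that do not concern King Arthur at all
# (an assumed value for illustration, not estimated from any data)
pi0 <- 0.2

# Zero-inflated Poisson: P(x = 0) = pi0 + (1 - pi0) * exp(-lambda)
lik0_zip <- pi0 + (1 - pi0) * dpois(0, lambda)

# Same Bayes update as before, with the modified likelihood
posterior_zip <- lik0_zip * prior / sum(lik0_zip * prior)
round(posterior_zip, 4)
```

Under this assumed value, a page with no mentions carries little information about the author, so the posterior probability for Historian C stays close to its prior value of 1/38 instead of dropping to nearly zero.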

Problem 2: Maximum Likelihood and Bayesian inference.
(a) To prove that the function f(x) = λ exp(−λx) is a valid p.d.f., we need to show that it is non-negative
everywhere and that it integrates to 1 over the domain of x.

Since the parameter λ is positive and the exponential function is always positive, their product is also
positive. I.e., f(x) ≥ 0 for λ > 0, x ≥ 0.

Let’s look at the integral:

$$\int_0^\infty \lambda \exp(-\lambda x)\,dx = -\exp(-\lambda x)\,\Big|_0^\infty = -(0 - 1) = 1.$$

QED.
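
As a numerical sanity check of the integral (not a substitute for the proof), R's `integrate()` can be applied to the density for a few rates. The rates below are arbitrary illustrative choices.

```r
# Exponential density with rate lambda, defined for x >= 0
f <- function(x, lambda) lambda * exp(-lambda * x)

# Arbitrary rates used only for illustration
rates <- c(0.1, 1, 5)

# Numerically integrate the density over [0, Inf) for each rate
areas <- sapply(rates, function(l) {
  integrate(f, lower = 0, upper = Inf, lambda = l)$value
})
areas
```

Each entry of `areas` should be equal to 1 up to numerical integration error.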

(b) The joint likelihood can be found as:

$$L = \prod_i f(x_i) = \prod_i \big(\lambda \exp(-\lambda x_i)\big) = \lambda^{n} \exp\Big(-\lambda \sum_i x_i\Big) \quad \text{for } x_i \ge 0,\ i = 1, \dots, n.$$

In order to find the maximum likelihood estimator for λ, we need to find a value λ̂ that maximizes the above
function.

It is easier to differentiate a sum rather than a product, and log() is a monotonically increasing function, so
if λ̂ maximizes log(L), then it will also maximize L.

$$\log L = n \log \lambda - \lambda \sum_i x_i.$$

Differentiating with respect to λ:

$$\frac{d \log L}{d\lambda} = \frac{n}{\lambda} - \sum_i x_i = 0$$

when

$$\lambda = \frac{n}{\sum_i x_i} = \frac{1}{\bar{x}},$$

where x̄ is the sample average.

Remember that we need to confirm that this is indeed a global maximum. One way to do this is to differentiate
again:

$$\frac{d^2 \log L}{d\lambda^2} = \frac{d}{d\lambda}\Big(\frac{n}{\lambda} - \sum_i x_i\Big) = -\frac{n}{\lambda^2}.$$

The second derivative is negative for any positive value of λ. The function is thus strictly concave, and λ̂ = 1/x̄
is the MLE.

NB. Another way is to evaluate and compare the values of logL at the endpoints of the domain and at 1/x̄.
This would involve the use of limits and is, perhaps, unnecessarily complicated, but is still a solution.
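
The closed-form result can also be checked numerically. The sketch below simulates exponential data and maximises the log-likelihood with `optimize()`; the sample size, the true rate of 0.5 and the search interval are arbitrary assumptions made for illustration.

```r
set.seed(1)

# Simulated data; the true rate 0.5 is an arbitrary illustrative choice
x <- rexp(200, rate = 0.5)
n <- length(x)

# Exponential log-likelihood: n * log(lambda) - lambda * sum(x)
loglik <- function(lambda) n * log(lambda) - lambda * sum(x)

# Numerical maximisation over an interval of positive rates
fit <- optimize(loglik, interval = c(1e-6, 10), maximum = TRUE, tol = 1e-8)

# Compare the numerical maximiser with the closed-form MLE 1 / mean(x)
c(numerical = fit$maximum, closed_form = 1 / mean(x))
```

The two values should agree to several decimal places, which is what the concavity argument above predicts.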

(c) The prior p.d.f. is

$$f(\lambda \mid \alpha_0, \beta_0) = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\, \lambda^{\alpha_0 - 1} \exp(-\beta_0 \lambda).$$

Using Bayes’ theorem and proportionality (i.e., concentrating on the numerator only, and discarding any
terms which do not contain the parameter of interest), we get

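$$
\begin{aligned}
f(\lambda \mid x, \alpha_0, \beta_0)
&\propto \lambda^{n} \exp\Big(-\lambda \sum_i x_i\Big)\, \lambda^{\alpha_0 - 1} \exp(-\beta_0 \lambda) \\
&\propto \lambda^{\alpha_0 + n - 1} \exp\Big(-\Big(\beta_0 + \sum_i x_i\Big)\lambda\Big),
\end{aligned}
$$

which is a Gamma density up to a normalizing constant. In other words,

$$\lambda \mid x, \alpha_0, \beta_0 \sim \text{Gamma}\Big(\alpha_0 + n,\ \beta_0 + \sum_i x_i\Big).$$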

(d) The posterior mean of the above Gamma distribution is:

$$E(\lambda \mid x, \alpha_0, \beta_0) = \frac{\alpha_0 + n}{\beta_0 + \sum_i x_i} = \frac{\alpha_0/n + 1}{\beta_0/n + \bar{x}}.$$

As n → ∞, E(λ | x, α0, β0) → 1/x̄. I.e., the Bayesian estimate approaches the MLE.
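
The convergence of the posterior mean to the MLE can be illustrated with a small simulation. This is a sketch under assumed values: the prior parameters `alpha0 = 2`, `beta0 = 1` and the data-generating rate 0.5 are hypothetical and serve only as an example.

```r
set.seed(2)

# Hypothetical prior parameters and data-generating rate (illustration only)
alpha0 <- 2
beta0 <- 1
true_rate <- 0.5

# Mean of the Gamma(alpha0 + n, beta0 + sum(x)) posterior
posterior_mean <- function(x, alpha0, beta0) {
  (alpha0 + length(x)) / (beta0 + sum(x))
}

# Compare the posterior mean with the MLE for increasing sample sizes
sizes <- c(10, 100, 10000)
res <- t(sapply(sizes, function(n) {
  x <- rexp(n, rate = true_rate)
  c(n = n, posterior_mean = posterior_mean(x, alpha0, beta0), mle = 1 / mean(x))
}))
res
```

For the largest sample the two columns should be nearly identical, since the prior terms α0/n and β0/n become negligible.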

Problem 3: Prior distribution.
Note that the parameter λ is the rate parameter, i.e., how many buses per minute you expect on average.
We thus want to find parameters α0 and β0 which would ensure that

α0/β0 ≈ 1/15

and that most of the distribution lies between 1/16 and 1/14.

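One way to do this is to treat the interval from 1/16 to 1/14 as roughly four standard deviations wide, i.e., to
set the variance to $((1/14 - 1/16)/4)^2$. Since (1/14 − 1/16)/4 ≈ 0.0022, this would give a variance of about
$0.0022^2$; the solution here uses the wider value $0.004^2$:

$$\alpha_0/\beta_0^2 = 0.004^2.$$

Solving this together with α0/β0 = 1/15 yields $\beta_0 = (1/15)/0.004^2 \approx 4166.67$ and α0 = β0/15 ≈ 277.78.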
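
The moment-matching step can be sketched in R as follows. The names `prior_mean`, `prior_sd` and `mass` are introduced here for illustration; `pgamma()` is used to see how much prior mass actually falls between 1/16 and 1/14 for the chosen parameters.

```r
# Target prior mean of lambda (buses per minute) and the standard deviation
# used in the solution above
prior_mean <- 1 / 15
prior_sd <- 0.004

# For a Gamma(alpha, beta) with rate beta: mean = alpha / beta, var = alpha / beta^2
beta0 <- prior_mean / prior_sd^2
alpha0 <- prior_mean * beta0
c(alpha0 = alpha0, beta0 = beta0)

# Prior probability that lambda lies between 1/16 and 1/14
mass <- pgamma(1 / 14, shape = alpha0, rate = beta0) -
  pgamma(1 / 16, shape = alpha0, rate = beta0)
mass
```

If `mass` is lower than desired, decreasing `prior_sd` concentrates the prior more tightly around 1/15.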

To check, make a plot or run a simulation and see how well the prior reflects your assumptions:

    alpha0 <- 277.78
    beta0 <- 4166.67
    # Draw from the Gamma prior on the rate lambda (buses per minute)
    lambda <- rgamma(10^5, alpha0, beta0)
    # Implied mean waiting time in minutes and its central 95% interval
    mean(1/lambda)
    ## [1] 15.05029
    quantile(1/lambda, c(0.025, .975))
    ##     2.5%    97.5%
    ## 13.37896 16.92814
    hist(lambda, col='plum')
    # Mark the target rates 1/14, 1/15 and 1/16
    abline(v=c(1/14, 1/15, 1/16), lwd=3, lty=c(3, 1, 3))

[Figure: Histogram of lambda; x-axis: lambda, y-axis: Frequency]

Not too bad. Of course, the 4σ approximation strictly applies only to the normal density. Another way to
incorporate this prior assumption would be to search for β0 on a grid so that α0/β0 = 1/15 and about 95%
of the density mass lies between 1/16 and 1/14.

Problem 4: Prior Predictive Distribution.
To generate a sample from the prior predictive distribution:

    alpha0 <- 277.78
    beta0 <- 4166.67
    # Draw rates from the prior, then one waiting time per rate
    lambda <- rgamma(10^4, alpha0, beta0)
    x <- rexp(10^4, lambda)
    hist(x, col='plum', xlab='Time waiting for a bus', freq=F)

[Figure: Histogram of x; x-axis: Time waiting for a bus, y-axis: Density]

Based on this simulation, the probability that a random wait will be at least 20 minutes is

    mean(x>=20)
    ## [1] 0.2574

So, not particularly unusual. We can thus conclude that the data point and the model do not disagree.
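
For this particular model the simulated tail probability can be cross-checked against a closed form. Integrating the exponential survival function exp(−λt) against the Gamma(α0, β0) prior gives P(X ≥ t) = (β0/(β0 + t))^α0. The sketch below compares that expression with a fresh Monte Carlo estimate; the seed and the number of draws are arbitrary choices.

```r
alpha0 <- 277.78
beta0 <- 4166.67

# Closed form: E[exp(-lambda * t)] under a Gamma(alpha0, beta0) prior
# equals (beta0 / (beta0 + t))^alpha0; here t = 20 minutes
p_exact <- (beta0 / (beta0 + 20))^alpha0

# Monte Carlo estimate from the prior predictive distribution
set.seed(314)
lambda <- rgamma(10^5, alpha0, beta0)
x <- rexp(10^5, lambda)
p_sim <- mean(x >= 20)

c(exact = p_exact, simulated = p_sim)
```

Both values should be close to 0.26, in line with the simulation result reported above.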
