
Introduction to Bayesian Inference.


Elena Moltchanova

STAT314/461-2021S1

Rev. Thomas Bayes (c. 1701–1761)

An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M.A. and F.R.S.

in

Philosophical Transactions of the Royal Society of London 53 (1763), 370–418.

[Tree diagram: a two-stage experiment.]

1. Meet barista: Experienced with probability 5/7, Trainee with probability 2/7.
2. Barista makes coffee: each branch carries a conditional quality probability, e.g., Pr(good coffee | trainee barista).

                Pr(barista)   Excellent   Good   So-so
Experienced        5/7           0.80     0.15    0.05
Trainee            2/7           0.20     0.50    0.30

Bayes’ Theorem aka Inverse Probability Formula.

Consider a set of mutually exclusive and collectively exhaustive events A_1, A_2, ..., A_K and an event B. Assume that the probabilities Pr(A_k) and Pr(B | A_k) are known for all k = 1, ..., K. Then, for any j,

Pr(A_j | B) = Pr(B | A_j) Pr(A_j) / Σ_k Pr(B | A_k) Pr(A_k).

Proof.


Σ_k Pr(B | A_k) Pr(A_k) = Σ_k Pr(B & A_k) = Pr(B).

Therefore:

Pr(A_j | B) = Pr(B | A_j) Pr(A_j) / Pr(B).

Multiplying both sides by Pr(B):

Pr(A_j | B) Pr(B) = Pr(B | A_j) Pr(A_j),

i.e., Pr(A_j & B) = Pr(B & A_j).

Both sides are the same joint probability, so the equality holds.

Alternatively:

Pr(A | B) = Pr(B | A) Pr(A) / Pr(B).

Back to coffee:

Given that I am drinking an excellent cup of coffee, what is the probability that it was made by the trainee barista?

In other words, consider a set of K = 2 mutually exclusive and collectively exhaustive events: A_1 = experienced and A_2 = trainee, and the event of interest B = excellent coffee.

[The same tree diagram as above: Experienced 5/7 with (0.80, 0.15, 0.05); Trainee 2/7 with (0.20, 0.50, 0.30).]

Put the numbers into Bayes’ Formula:

We know that Pr(A_1) = 5/7 and Pr(A_2) = 2/7. We also know that Pr(B | A_1) = 0.80 and Pr(B | A_2) = 0.20. We can use Bayes’ Theorem to obtain our quantity of interest:

Pr(A_1 | B) = Pr(B | A_1) Pr(A_1) / (Pr(B | A_1) Pr(A_1) + Pr(B | A_2) Pr(A_2))
            = (5/7 × 0.80) / (5/7 × 0.80 + 2/7 × 0.20)
            = 10/11 ≈ 0.91.

Hence the answer to the original question is Pr(A_2 | B) = 1 − 10/11 = 1/11 ≈ 0.09: the excellent cup was most likely made by the experienced barista.
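As a quick check, here is a minimal Python sketch of this update (numpy assumed available; the variable names are ours, not from the slides):

import numpy as np

prior = np.array([5/7, 2/7])             # Pr(Experienced), Pr(Trainee)
like_excellent = np.array([0.80, 0.20])  # Pr(Excellent | barista)

posterior = prior * like_excellent       # numerator of Bayes' formula
posterior /= posterior.sum()             # denominator: normalise by Pr(B)
print(posterior)                         # [0.909... 0.090...] = [10/11, 1/11]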

Using the “tree”:

After the excellent cup, the prior weights 5/7 and 2/7 at the root of the tree are replaced by the posterior weights 10/11 and 1/11; the conditional probabilities on the branches stay the same.

Another cup:

                Pr(barista)   Excellent   Good   So-so
Experienced       10/11          0.80     0.15    0.05
Trainee            1/11          0.20     0.50    0.30

Another cup: so-so.

Pr(Experienced | Excellent, So-so) = (10/11 × 0.05) / (10/11 × 0.05 + 1/11 × 0.30) = 0.625.

Two cups at once:

The probability of getting an Excellent and a So-so cup from the experienced barista is 0.80 × 0.05 = 0.04. The probability of getting the same from the trainee is 0.20 × 0.30 = 0.06. Applying Bayes’ Theorem we get

Pr(Experienced | Excellent, So-so)
  = Pr(Ex, Ss | Exp) Pr(Exp) / (Pr(Ex, Ss | Exp) Pr(Exp) + Pr(Ex, Ss | Tr) Pr(Tr))
  = (0.04 × 5/7) / (0.04 × 5/7 + 0.06 × 2/7)
  = 0.625.
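Both routes can be checked with a short Python sketch (numpy assumed; names are ours). It applies the update cup by cup, and then both cups at once, assuming the cups are conditionally independent given the barista:

import numpy as np

prior = np.array([5/7, 2/7])                      # Experienced, Trainee
like = {"Excellent": np.array([0.80, 0.20]),
        "Good":      np.array([0.15, 0.50]),
        "So-so":     np.array([0.05, 0.30])}      # Pr(cup | barista)

def update(p, cup):
    # One Bayes update: multiply by the likelihood, then renormalise.
    p = p * like[cup]
    return p / p.sum()

seq = update(update(prior, "Excellent"), "So-so")  # cup by cup
batch = prior * like["Excellent"] * like["So-so"]  # both cups at once
batch /= batch.sum()

print(seq, batch)  # both give [0.625 0.375]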

Using Bayes’ Formula:

- The object of inference (the probability that the particular barista is at work today) is constantly updated in light of the data.
- This natural learning process happens sequentially, cup by cup (or batch by batch).
- You can easily incorporate other people’s observations into your updating process.

Refresher on Probability Distributions – 1:

Consider two random variables x and y with corresponding probability density functions (p.d.f.s) f(x) and f(y). Note that we will use f(·) as generic notation for any p.d.f.

The following properties hold for any p.d.f.:

1. f(x) ≥ 0 for all x.
2. ∫_{−∞}^{∞} f(x) dx = 1.
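These two properties are easy to verify numerically. A sketch for the standard normal density (numpy and scipy assumed; the finite grid is a stand-in for the whole real line):

import numpy as np
from scipy.stats import norm

x = np.linspace(-10, 10, 100001)  # wide, fine grid standing in for (-inf, inf)
fx = norm.pdf(x)                  # f(x) for the standard normal
dx = x[1] - x[0]

print((fx >= 0).all())            # property 1: f(x) >= 0 everywhere
print((fx * dx).sum())            # property 2: Riemann sum ~ 1.0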

Refresher on Probability Distributions – 2:

The cumulative distribution function (c.d.f.) is defined as

F(x) = Pr(X ≤ x) = ∫_{−∞}^{x} f(t) dt.
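A sketch of this definition in Python, integrating the standard normal p.d.f. numerically and comparing against scipy’s built-in c.d.f. (scipy assumed available):

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

x0 = 1.0
area, _ = quad(norm.pdf, -np.inf, x0)  # integral of f(t) from -inf to x0
print(area, norm.cdf(x0))              # both ~ 0.8413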

Refresher on Probability Distributions – 3:

The joint distribution of x and y is

f(x, y) = f(x | y) f(y) = f(y | x) f(x).

Here, f(y | x) is referred to as the conditional p.d.f. of y given x.

Note that when x and y are independent,

f(x, y) = f(x) f(y).

Refresher on Probability Distributions – 4:

The marginal p.d.f. f(x) can be obtained as

f(x) = ∫_{−∞}^{∞} f(x, y) dy.

Refresher on Probability Distributions – 5:

In general, for random variables x_1, x_2, ..., x_K with joint p.d.f. f(x_1, x_2, ..., x_K), the following chain rule applies:

f(x_1, x_2, ..., x_K) = f(x_1 | x_2, x_3, ..., x_K) f(x_2 | x_3, ..., x_K) ... f(x_K).
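For a discrete toy case these manipulations are one-liners. A sketch with a small made-up joint table (the numbers are ours, purely for illustration; rows index x, columns index y):

import numpy as np

f_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])    # joint f(x, y); entries sum to 1

f_x = f_xy.sum(axis=1)             # marginal f(x): sum y out
f_y_given_x = f_xy / f_x[:, None]  # conditional f(y | x) = f(x, y) / f(x)

# Chain rule check: f(x, y) = f(y | x) f(x).
print(np.allclose(f_xy, f_y_given_x * f_x[:, None]))  # True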

Bayes’ formula for distributions – 1:

Let f(x | θ) denote the p.d.f. of data x given parameter θ, and let f(θ) denote the p.d.f. of the parameter θ. Then:

f(θ | x) = f(x | θ) f(θ) / ∫_Θ f(x | θ) f(θ) dθ.

Bayes’ formula for distributions – 2:

Note that it is easy to check that the denominator is simply f(x):

∫_Θ f(x | θ) f(θ) dθ = ∫_Θ f(x, θ) dθ = f(x).

I.e.,

f(θ | x) = f(x | θ) f(θ) / f(x).
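On a computer, the integral in the denominator is often replaced by a sum over a grid. A sketch for a hypothetical binomial experiment (7 successes in 10 trials, with a flat prior on θ; the data are made up for illustration, numpy and scipy assumed):

import numpy as np
from scipy.stats import binom

theta = np.linspace(0, 1, 1001)       # grid over the parameter space Theta
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)           # f(theta): flat prior on (0, 1)
like = binom.pmf(7, 10, theta)        # f(x | theta) for x = 7 out of n = 10

post = like * prior
post /= (post * dtheta).sum()         # normalise: divide by the integral f(x)

print((theta * post * dtheta).sum())  # posterior mean, ~ 8/12 = 0.667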

Classical vs. Bayesian Inference:

Classical:
- Experiments are infinitely repeatable under the same conditions (hence: ’frequentist’).
- The parameter of interest (θ) is fixed and unknown.
- Inference via Maximum Likelihood.

Bayesian:
- Each experiment is unique (i.e., not repeatable).
- The parameter of interest (θ) is unknown and described by a probability distribution.
- Inference via Bayes’ Theorem.

What is a prior distribution?

- The prior expresses our knowledge about the parameter, as a distribution, before the experiment. It may be based on general considerations (a binomial probability has to lie between 0 and 1; average human height must lie between 140 and 190 cm) or on previous experiments (the first diagnostic test was positive).

- If no information is available, a so-called vague or uninformative prior can be used. (BUT: what is uninformative?)

- Different statisticians may have different priors, so sensitivity analysis is important; see the sketch below.
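A minimal sensitivity-analysis sketch: two analysts with different Beta priors observe the same hypothetical data (7 successes in 10 trials; both priors and the data are made up for illustration, scipy assumed):

from scipy.stats import beta

successes, failures = 7, 3

for a, b, label in [(1, 1, "flat Beta(1,1)     "),
                    (2, 8, "sceptical Beta(2,8)")]:
    # Beta prior + binomial likelihood -> Beta(a + successes, b + failures).
    post = beta(a + successes, b + failures)
    print(label, "posterior mean:", round(post.mean(), 3))

# Same data, different priors, noticeably different posteriors:
# posterior means ~0.667 vs ~0.45. Hence the need for sensitivity analysis.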