COMP2610 / COMP6261 – Information Theory
Lecture 4: Bayesian Inference
Robert C. Williamson
Research School of Computer Science
July 31, 2018
Last time
Examples of joint, marginal and conditional distributions
When can we say that X, Y do not influence each other?
What, if anything, does p(X = x | Y = y) tell us about p(Y = y | X = x)?
Review Exercise
Suppose we have binary random variables X, Y such that
p(X = 1) = 0.6
p(Y = 1 | X = 0) = 0.7
p(Y = 1 | X = 1) = 0.8
Then, by Bayes’ rule,
p(X = 1 | Y = 1) = p(Y = 1 | X = 1) p(X = 1) / p(Y = 1)
                 = p(Y = 1 | X = 1) p(X = 1) / [p(Y = 1 | X = 1) p(X = 1) + p(Y = 1 | X = 0) p(X = 0)]
                 = (0.8)(0.6) / [(0.8)(0.6) + (0.7)(0.4)]
                 ≈ 0.63
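A quick numeric check of this exercise in Python (a sketch; the variable names are our own, not from the slides):

```python
# Review exercise: p(X = 1 | Y = 1) via Bayes' rule.
p_x1 = 0.6           # p(X = 1)
p_y1_given_x0 = 0.7  # p(Y = 1 | X = 0)
p_y1_given_x1 = 0.8  # p(Y = 1 | X = 1)

# Marginalise to get p(Y = 1), then apply Bayes' rule.
p_y1 = p_y1_given_x1 * p_x1 + p_y1_given_x0 * (1 - p_x1)
print(p_y1_given_x1 * p_x1 / p_y1)  # ~0.632
```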
This time
More examples on Bayes’ theorem:
- Eating hamburgers
- Detecting terrorists
- The Monty Hall problem
Are there notions of probability beyond frequency counting?
Outline
1 Bayes’ Rule: Examples
  Eating Hamburgers
  Detecting Terrorists
  The Monty Hall Problem
2 Moments for Functions of Two Discrete Random Variables
3 The Meaning of Probability
4 Wrapping Up
Bayesian Inference: Example 1 (Barber, BRML, 2011)
90% of people with McD syndrome are frequent hamburger eaters
Probability of someone having McD syndrome: 1/10,000
Proportion of hamburger eaters is about 50%
What is the probability that a hamburger eater will have McD syndrome?
Bayesian Inference: Example 1: Formalization
Let McD ∈ {0, 1} be the variable denoting having McD syndrome and H ∈ {0, 1} be the variable denoting being a hamburger eater. Then:
p(H = 1 | McD = 1) = 9/10
p(McD = 1) = 10⁻⁴
p(H = 1) = 1/2
We need to compute p(McD = 1 | H = 1), the probability of a hamburger eater having McD syndrome.
Any ballpark estimates of this probability?
Bayesian Inference: Example 1: Solution
p(McD = 1 | H = 1) = p(H = 1 | McD = 1) p(McD = 1) / p(H = 1)
                   = (0.9)(10⁻⁴) / (0.5)
                   = 1.8 × 10⁻⁴
Exercise: repeat the above computation if the proportion of hamburger eaters is rather small, say 0.001 (as it might be in France).
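A minimal Python sketch of this computation (the function and variable names are our own, not from the slides):

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: p(A | B) = p(B | A) p(A) / p(B)."""
    return likelihood * prior / evidence

# p(McD = 1 | H = 1) with p(H = 1 | McD = 1) = 0.9, p(McD = 1) = 1e-4, p(H = 1) = 0.5
print(posterior(0.9, 1e-4, 0.5))    # 1.8e-04
# The variant with few hamburger eaters: p(H = 1) = 0.001
print(posterior(0.9, 1e-4, 0.001))  # 0.09
```

Note how sensitive the posterior is to the evidence term: shrinking p(H = 1) by a factor of 500 raises the posterior by the same factor.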
Example 2: Detecting Terrorists (from understandinguncertainty.org)
Scanner detects true terrorists with 95% accuracy
Scanner detects upstanding citizens with 95% accuracy
There is 1 terrorist on your plane with 100 passengers aboard
The shifty-looking man sitting next to you tests positive (terrorist)
What are the chances of this man being a terrorist?
Example 2: Detecting Terrorists: Simple Solution Using “Natural Frequencies” (David Spiegelhalter)
[Figure reproduced from understandinguncertainty.org]
The chances of the man being a terrorist are ≈ 16%
Relation to the disease example
Consequences when catching criminals
Example 2: Detecting Terrorists: Formalization with Actual Probabilities
Let T ∈ {0, 1} denote whether the person is a terrorist and S ∈ {0, 1} denote the outcome of the scanner.
p(S = 1 | T = 1) = 0.95    p(S = 0 | T = 1) = 0.05
p(S = 0 | T = 0) = 0.95    p(S = 1 | T = 0) = 0.05
p(T = 1) = 0.01            p(T = 0) = 0.99
We want to compute p(T = 1 | S = 1), the probability of the man being a terrorist given that he has tested positive.
Example 2: Detecting Terrorists: Solution with Bayes’ Rule
p(T = 1 | S = 1) = p(S = 1 | T = 1) p(T = 1) / [p(S = 1 | T = 1) p(T = 1) + p(S = 1 | T = 0) p(T = 0)]
                 = (0.95)(0.01) / [(0.95)(0.01) + (0.05)(0.99)]
                 ≈ 0.16
The probability of the man being a terrorist is ≈ 16%
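The same computation in Python (a sketch; binary_posterior is our own name, not from the slides):

```python
def binary_posterior(sensitivity, false_positive_rate, prior):
    """p(T = 1 | S = 1) for a scanner with p(S = 1 | T = 1) = sensitivity,
    p(S = 1 | T = 0) = false_positive_rate, and p(T = 1) = prior."""
    numerator = sensitivity * prior
    evidence = numerator + false_positive_rate * (1.0 - prior)
    return numerator / evidence

print(binary_posterior(0.95, 0.05, 0.01))  # ~0.161
```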
Example 2: Detecting Terrorists: Posterior Versus Prior Belief
While the man has a low probability of being a terrorist, our belief has increased compared to our prior:
p(T = 1 | S = 1) / p(T = 1) = 0.16 / 0.01 = 16
i.e. our belief in him being a terrorist has gone up by a factor of 16
Since terrorists are so rare, a factor of 16 does not result in a very high (absolute) probability or belief
(Aside: they are indeed very rare. For an intriguing (and surprising) example of the implications of an inability to take account of actual base rates (in the example above we made the numbers up), and the effect on people’s subsequent decisions, see Gerd Gigerenzer, “Dread Risk, September 11, and Fatal Traffic Accidents”, Psychological Science 15(4), 286–287 (2004); Gerd Gigerenzer, “Out of the Frying Pan into the Fire: Behavioural Reactions to Terrorist Attacks”, Risk Analysis 26(2), 347–351 (2006). His calculation (which of course is based on some assumptions) is that in the year following 9/11, six times the number of people who were killed as passengers additionally died on roads (that is, the increase in road deaths due to people choosing to drive instead of flying)! He calls the reaction to very low probability events with a bad outcome “dread risk”.)
Example 3: The Monty Hall Problem: Problem Statement
Three boxes, one with a prize and the other two empty
Each box has equal probability of containing the prize
Your goal is to pick the box with the prize in it
You select one of the boxes
The host, who knows the location of the prize, opens an empty box out of the other two boxes
Should you switch to the other box? Would that increase your chances of winning the prize?
Example 3: The Monty Hall Problem: Formalization
Let C ∈ {r, g, b} denote the box that contains the prize, where r, g, b refer to the identity of each box.
WLOG assume the following:
You have selected box r
Denote the event “the host opens box b” by H = b
p(C = r) = 1/3    p(C = g) = 1/3    p(C = b) = 1/3
p(H = b | C = r) = 1/2    p(H = b | C = g) = 1    p(H = b | C = b) = 0
We want to compute p(C = r | H = b) and p(C = g | H = b) to decide if we should switch from our initial choice.
Example 3: The Monty Hall Problem: Solution
We have that:
p(H = b) = Σ_{c ∈ {r,g,b}} p(H = b | C = c) p(C = c)
         = (1/2)(1/3) + (1)(1/3) + (0)(1/3)
         = 1/2
Therefore:
p(C = r | H = b) = p(H = b | C = r) p(C = r) / p(H = b)
                 = (1/2)(1/3) / (1/2)
                 = 1/3
Similarly, p(C = g | H = b) = 2/3.
You should switch from your initial choice to the other box in order to increase your chances of winning the prize!
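The analysis is easy to check empirically. Below is a small Monte Carlo sketch (our own code, not from the lecture) estimating the win probability for staying versus switching:

```python
import random

def play(switch, trials=100_000):
    """Estimate the probability of winning the prize, with or without switching."""
    wins = 0
    for _ in range(trials):
        boxes = ['r', 'g', 'b']
        prize = random.choice(boxes)
        pick = 'r'  # WLOG we always pick box r, as in the formalization
        # The host opens an empty box from the remaining two
        host = random.choice([b for b in boxes if b != pick and b != prize])
        if switch:
            pick = next(b for b in boxes if b != pick and b != host)
        wins += (pick == prize)
    return wins / trials

print(play(switch=False))  # ~1/3
print(play(switch=True))   # ~2/3
```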
Example 3: The Monty Hall Problem: Illustration of the Solution
[Illustration of the solution when you have initially selected box r.]
Example 3: The Monty Hall Problem: Another Perspective
Switching is bad if, and only if, we initially picked the prize box (because if not, the other remaining box must contain the prize)
We picked the prize box with probability 1/3; this is independent of the host’s action
Hence, with probability 2/3, switching will reveal the prize box
Example 3: The Monty Hall Problem: Variants to Ponder
Would switching be rational if:
The host only revealed a box when he knew we picked the right one?
The host only revealed a box when he knew we picked the wrong one?
The host is himself unaware of the prize box, and reveals a box at random, which by chance does not have the prize?
The Expected Value of a Function of Two Discrete Random Variables
(Assuming you have met expectation E[X] and variance Var(X) before. . . )
The expected value of a function g(X, Y) of two discrete random variables is defined as
E[g(X, Y)] = Σ_x Σ_y g(x, y) p(X = x, Y = y).    (1)
In particular, the expected value of X is given by
E[X] = Σ_x Σ_y x p(X = x, Y = y).    (2)
Note that if we have already calculated the marginal distribution of X, it is simpler to calculate E[X] from that.
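A direct transcription of equation (1) into Python (a sketch; the names and the dictionary representation of the joint distribution are our own choices):

```python
def expectation(g, joint):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) p(X = x, Y = y),
    with the joint distribution given as a dict {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

# The joint distribution from the worked example later in the lecture
joint = {(0, -1): 0, (0, 0): 1/3, (0, 1): 0,
         (1, -1): 1/3, (1, 0): 0, (1, 1): 1/3}
print(expectation(lambda x, y: x, joint))  # E[X] = 2/3
```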
Covariance and the Correlation Coefficient
The covariance between X and Y, Cov(X, Y), is given by
Cov(X, Y) = E(XY) − E(X) E(Y)    (3)
Note that by definition Cov(X, X) = E(X²) − E(X) E(X) = Var(X).
The coefficient of correlation between X and Y is given by
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))    (4)
Always in [−1, 1].
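Continuing the sketch above, equations (3) and (4) in code (again with our own function names, reusing `expectation` and `joint` from the previous block):

```python
import math

def covariance(joint):
    """Cov(X, Y) = E(XY) - E(X) E(Y)."""
    ex = expectation(lambda x, y: x, joint)
    ey = expectation(lambda x, y: y, joint)
    return expectation(lambda x, y: x * y, joint) - ex * ey

def correlation(joint):
    """rho(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y))."""
    ex = expectation(lambda x, y: x, joint)
    ey = expectation(lambda x, y: y, joint)
    var_x = expectation(lambda x, y: x * x, joint) - ex ** 2
    var_y = expectation(lambda x, y: y * y, joint) - ey ** 2
    return covariance(joint) / math.sqrt(var_x * var_y)

print(covariance(joint), correlation(joint))  # 0.0 0.0
```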
Example
Discrete random variables X and Y have the following joint distribution:
          Y = −1    Y = 0    Y = 1
X = 0        0       1/3       0
X = 1       1/3       0       1/3
Calculate
1 p(X > Y)
2 the marginal distributions of X and Y
3 the expected values and variances of X and Y
4 the coefficient of correlation between X and Y
Are X and Y independent?
Example
To calculate the probability of such an event, note that we sum over all the cells which correspond to that event. Hence,
p(X > Y) = p(X = 0, Y = −1) + p(X = 1, Y = −1) + p(X = 1, Y = 0) = 1/3
Example
Recall that
p(X = x) = Σ_y p(X = x, Y = y).
Hence,
p(X = 0) = Σ_{y=−1}^{1} p(X = 0, Y = y) = 0 + 1/3 + 0 = 1/3
p(X = 1) = Σ_{y=−1}^{1} p(X = 1, Y = y) = 1/3 + 0 + 1/3 = 2/3
Note that after obtaining p(X = 0), we could calculate p(X = 1) by using the fact that
p(X = 1) = 1 − p(X = 0),    (5)
since X only takes the values 0 and 1.
Example
Similarly,
p(Y = −1) = Σ_{x=0}^{1} p(X = x, Y = −1) = 0 + 1/3 = 1/3
p(Y = 0) = Σ_{x=0}^{1} p(X = x, Y = 0) = 1/3 + 0 = 1/3
p(Y = 1) = 1 − p(Y = −1) − p(Y = 0) = 1/3
Example
We then calculate the expected values and variances of X and Y from these marginal distributions.
E(X) = Σ_{x=0}^{1} x p(X = x) = 0 × 1/3 + 1 × 2/3 = 2/3
E(Y) = Σ_{y=−1}^{1} y p(Y = y) = −1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 0.
Example
To calculate the variances of X and Y, Var(X) and Var(Y), we use the formula
Var(X) = E(X²) − (E(X))².
E(X²) = Σ_{x=0}^{1} x² p(X = x) = 0² × 1/3 + 1² × 2/3 = 2/3
E(Y²) = Σ_{y=−1}^{1} y² p(Y = y) = (−1)² × 1/3 + 0² × 1/3 + 1² × 1/3 = 2/3.
Thus we get
Var(X) = 2/3 − (2/3)² = 2/9
Var(Y) = 2/3 − (0)² = 2/3
Example
To calculate the correlation coefficient, we first calculate the covariance between X and Y. We have
Cov(X, Y) = E(XY) − E(X) E(Y),
where
E(XY) = Σ_{x=0}^{1} Σ_{y=−1}^{1} x y p(X = x, Y = y)
      = 0(−1)(0) + 0(0)(1/3) + 0(1)(0) + 1(−1)(1/3) + 1(0)(0) + 1(1)(1/3)
      = 0
Thus we get
Cov(X, Y) = E(XY) − E(X) E(Y) = 0 − 2/3 × 0 = 0.
From the definition of the correlation coefficient,
ρ(X, Y) = 0.
Example – are X and Y independent?
We have that
p(X = 0, Y = −1) = 0 ≠ p(X = 0) p(Y = −1) = (1/3)²,
so X and Y are not independent, even though they are uncorrelated.
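This check is mechanical enough to automate. A sketch (our own helper, reusing `joint` from the earlier block) that compares every joint probability against the product of the marginals:

```python
def is_independent(joint, tol=1e-12):
    """True iff p(X = x, Y = y) = p(X = x) p(Y = y) for all cells."""
    xs = {x for (x, _) in joint}
    ys = {y for (_, y) in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}  # marginal of X
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}  # marginal of Y
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x in xs for y in ys)

print(is_independent(joint))  # False, despite Cov(X, Y) = 0
```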
The Meaning of Probability
Frequentist: frequencies of random repeatable experiments
E.g. the probability of a biased coin landing “Heads”
Bayesian: degrees of belief
E.g. the probability of the Tasmanian Devil disappearing by the end of this decade
Cox Axioms
Given B(x), B(x̄), B(x, y), B(x | y), B(y):
1 Degrees of belief can be ordered
2 B(x) = f[B(x̄)]
3 B(x, y) = g[B(x | y), B(y)]
If a set of beliefs satisfies these axioms, they can be mapped onto probabilities satisfying the rules of probability.
Frequentists versus Bayesians: Round I
[Image from http://xkcd.com/1132/]
Frequentists versus Bayesians: Round II
[Image from http://normaldeviate.wordpress.com/2012/11/09/anti-xkcd/]
In practice one needs to make use of both interpretations, and it is wise to be open to both. This is a huge topic which we cannot get into further here.
Note that MacKay was firmly in the Bayesian camp. . .
Summary
Examples of application of Bayes’ rule:
- Formalization
- Solution by applying Bayes’ theorem
Intuition is usually helpful, although it may sometimes deceive us
Frequentist vs Bayesian probabilities
Cox axioms
Next time
Working through some useful probability distributions
More on Bayesian inference