COMP2610 / COMP6261 – Information Theory
Lecture 4: Bayesian Inference
Williamson
Research School of Computer Science
July 31, 2018

Examples of joint, marginal and conditional distributions
When can we say that X, Y do not influence each other?
What, if anything, does p(X = x|Y = y) tell us about p(Y = y|X = x)?

Review Exercise
Suppose we have binary random variables X, Y such that

p(X = 1) = 0.6    p(Y = 1|X = 0) = 0.7    p(Y = 1|X = 1) = 0.8

Then, by Bayes’ rule,

p(X = 1|Y = 1) = p(Y = 1|X = 1) p(X = 1) / p(Y = 1)
               = p(Y = 1|X = 1) p(X = 1) / [p(Y = 1|X = 1) p(X = 1) + p(Y = 1|X = 0) p(X = 0)]
               = (0.8)(0.6) / [(0.8)(0.6) + (0.7)(0.4)]
               ≈ 0.63
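
As a quick numerical check, here is a minimal Python sketch (my own illustration, not part of the original slides; the variable names are mine):

    # Verify the review exercise via the law of total probability and Bayes' rule.
    p_x1 = 0.6            # p(X = 1)
    p_y1_given_x0 = 0.7   # p(Y = 1 | X = 0)
    p_y1_given_x1 = 0.8   # p(Y = 1 | X = 1)

    # p(Y = 1) = p(Y = 1 | X = 1) p(X = 1) + p(Y = 1 | X = 0) p(X = 0)
    p_y1 = p_y1_given_x1 * p_x1 + p_y1_given_x0 * (1 - p_x1)

    # Bayes' rule: p(X = 1 | Y = 1) = p(Y = 1 | X = 1) p(X = 1) / p(Y = 1)
    print(p_y1_given_x1 * p_x1 / p_y1)  # 0.6315... ≈ 0.63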

More examples on Bayes’ theorem:
- Eating hamburgers
- Detecting terrorists
- The Monty Hall problem
Are there notions of probability beyond frequency counting?

1 Bayes’ Rule: Examples
   Eating Hamburgers
   Detecting Terrorists
   The Monty Hall Problem
2 Moments for functions of two discrete Random Variables
3 The meaning of Probability
4 Wrapping Up

Bayesian Inference: Example 1 (Barber, BRML, 2011)
90% of people with McD syndrome are frequent hamburger eaters
Probability of someone having McD syndrome: 1/10000
Proportion of hamburger eaters is about 50%
What is the probability that a hamburger eater will have McD syndrome?

Bayesian Inference: Example 1: Formalization
Let McD ∈ {0, 1} be the variable denoting having the McD syndrome and H ∈ {0, 1} be the variable denoting a hamburger eater. Therefore:

p(H = 1|McD = 1) = 9/10    p(McD = 1) = 10⁻⁴    p(H = 1) = 1/2

We need to compute p(McD = 1|H = 1), the probability of a hamburger eater having McD syndrome.
Any ballpark estimates of this probability?

Bayesian Inference: Example 1: Solution

p(McD = 1|H = 1) = p(H = 1|McD = 1) p(McD = 1) / p(H = 1)
                 = (9/10)(10⁻⁴) / (1/2)
                 = 1.8 × 10⁻⁴

Exercise: repeat the above computation if the proportion of hamburger eaters is instead small, say (in France) 0.001.
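
A small Python sketch of this posterior (my own check, not from the slides; the 0.001 figure is just the exercise’s hypothetical value for France):

    # p(McD = 1 | H = 1) = p(H = 1 | McD = 1) p(McD = 1) / p(H = 1)
    p_h_given_mcd = 0.9   # p(H = 1 | McD = 1)
    p_mcd = 1e-4          # p(McD = 1)

    for p_h in (0.5, 0.001):  # original proportion of hamburger eaters, then the variant
        print(p_h, p_h_given_mcd * p_mcd / p_h)
    # 0.5   -> 1.8e-04
    # 0.001 -> 9.0e-02: the rarer the habit, the more informative the observation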

Example 2: Detecting Terrorists (from understandinguncertainty.org)
Scanner detects true terrorists with 95% accuracy
Scanner detects upstanding citizens with 95% accuracy
There is 1 terrorist on your plane with 100 passengers aboard
The shifty-looking man sitting next to you tests positive (terrorist)
What are the chances of this man being a terrorist?

Example 2: Detecting Terrorists: Simple Solution Using “Natural Frequencies”
Figure reproduced from understandinguncertainty.org
The chances of the man being a terrorist are ≈ 1/6
Relation to the disease example
Consequences when catching criminals

Example 2: Detecting Terrorists: Formalization with Actual Probabilities
Let T ∈ {0, 1} denote the variable regarding whether the person is a terrorist and S ∈ {0, 1} denote the outcome of the scanner.

p(S = 1|T = 1) = 0.95    p(S = 0|T = 1) = 0.05
p(S = 0|T = 0) = 0.95    p(S = 1|T = 0) = 0.05
p(T = 1) = 0.01          p(T = 0) = 0.99

We want to compute p(T = 1|S = 1), the probability of the man being a terrorist given that he has tested positive.

Example 2: Detecting Terrorists: Solution with Bayes’ Rule

p(T = 1|S = 1) = p(S = 1|T = 1) p(T = 1) / [p(S = 1|T = 1) p(T = 1) + p(S = 1|T = 0) p(T = 0)]
               = (0.95)(0.01) / [(0.95)(0.01) + (0.05)(0.99)]
               = 0.0095 / 0.059
               ≈ 0.16

The probability of the man being a terrorist is ≈ 1/6
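
The same computation as a short Python check (my own sketch; the names are mine):

    # p(T = 1 | S = 1) by Bayes' rule, with the evidence p(S = 1) expanded.
    p_s1_given_t1 = 0.95  # true positive rate
    p_s1_given_t0 = 0.05  # false positive rate
    p_t1 = 0.01           # prior: 1 terrorist among 100 passengers

    evidence = p_s1_given_t1 * p_t1 + p_s1_given_t0 * (1 - p_t1)  # p(S = 1)
    print(p_s1_given_t1 * p_t1 / evidence)  # 0.1610... ≈ 1/6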

Example 2: Detecting Terrorists: Posterior Versus Prior Belief
While the man has a low probability of being a terrorist, our belief has increased compared to our prior:

p(T = 1|S = 1) / p(T = 1) = 0.16 / 0.01 = 16

i.e. our belief in him being a terrorist has gone up by a factor of 16. Since terrorists are so rare, a factor of 16 does not result in a very high (absolute) probability or belief.
(Aside: they are indeed very rare. For an intriguing (and surprising) example of the implications of the inability to take account of actual base rates (in the example above we made the numbers up), and the effect on people’s subsequent decisions, see Gigerenzer, Dread Risk, September 11, and Fatal Traffic Accidents, Psychological Science 15(4), 286–287 (2004); Gigerenzer, Out of the Frying Pan into the Fire: Behavioural Reactions to Terrorist Attacks, Risk Analysis 26(2), 347–351 (2006). His calculation (which of course is based on some assumptions) is that in the year following 9/11, 6 times the number of people who were killed as passengers additionally died on roads (that is the increase in road deaths due to people choosing to drive instead of flying)! He calls the reaction to very low probability events with a bad outcome “dread risk”.)

Example 3: The Monty Hall Problem: Problem Statement
Three boxes, one with a prize and the other two empty
Each box has equal probability of having the prize
Your goal is to pick the box with the prize in it
You select one of the boxes
The host, who knows the location of the prize, opens an empty box out of the other two boxes
Should you switch to the other box? Would that increase your chances of winning the prize?

Example 3: The Monty Hall Problem: Formalization
Let C ∈ {r, g, b} denote the box that contains the prize, where r, g, b refer to the identity of each box.
WLOG assume the following:
You have selected box r
Denote the event “the host opens box b” by H = b

p(C = r) = 1/3    p(C = g) = 1/3    p(C = b) = 1/3
p(H = b|C = r) = 1/2    p(H = b|C = g) = 1    p(H = b|C = b) = 0

We want to compute p(C = r|H = b) and p(C = g|H = b) to decide if we should switch from our initial choice.

Example 3: The Monty Hall Problem: Solution
We have that:

p(H = b) = Σ_{c ∈ {r,g,b}} p(H = b|C = c) p(C = c)
         = (1/2)(1/3) + (1)(1/3) + (0)(1/3)
         = 1/2

Therefore:

p(C = r|H = b) = p(H = b|C = r) p(C = r) / p(H = b) = (1/2)(1/3) / (1/2) = 1/3

Similarly, p(C = g|H = b) = 2/3.
You should switch from your initial choice to the other box in order to increase your chances of winning the prize!

Example 3: The Monty Hall Problem: Illustration of the Solution
Illustration of the solution when you have initially selected box r.

Example 3: The Monty Hall Problem: Another Perspective
Switching is bad if, and only if, we initially picked the prize box (because if not, the other remaining box must contain the prize)
We picked the prize box with probability 1/3. This is independent of the host’s action
Hence, with probability 2/3, switching will reveal the prize box
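
A Monte Carlo sanity check of the 1/3 versus 2/3 split (my own simulation sketch, not part of the slides):

    import random

    def monty_hall_trial(switch):
        """Play one round; return True if the player wins the prize."""
        boxes = ["r", "g", "b"]
        prize = random.choice(boxes)
        pick = random.choice(boxes)
        # The host, knowing the prize location, opens an empty box from the other two.
        host_opens = random.choice([b for b in boxes if b != pick and b != prize])
        if switch:
            pick = next(b for b in boxes if b != pick and b != host_opens)
        return pick == prize

    n = 100_000
    for switch in (False, True):
        wins = sum(monty_hall_trial(switch) for _ in range(n))
        print(switch, wins / n)  # stay: ≈ 0.333, switch: ≈ 0.667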

Example 3: The Monty Hall Problem: Variants
Would switching be rational if:
The host only revealed a box when he knew we picked the right one?
The host only revealed a box when he knew we picked the wrong one?
The host is himself unaware of the prize box, and reveals a box at random, which by chance does not have the prize?

1 Bayes’ Rule: Examples
   Eating Hamburgers
   Detecting Terrorists
   The Monty Hall Problem
2 Moments for functions of two discrete Random Variables
3 The meaning of Probability
4 Wrapping Up

The Expected Value of a Function of Two Discrete Random Variables
(Assuming you have met expectation E[X] and variance Var(X) before...)
The expected value of a function g(X, Y) of two discrete random variables is defined as

E[g(X, Y)] = Σ_x Σ_y g(x, y) p(X = x, Y = y).    (1)

In particular, the expected value of X is given by

E[X] = Σ_x Σ_y x p(X = x, Y = y).    (2)

Note that if we have already calculated the marginal distribution of X, it is simpler to compute E[X] from it as Σ_x x p(X = x).
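
To make equation (1) concrete, here is a minimal Python sketch (my own illustration; the joint table is the one used in the worked example later in this lecture):

    # E[g(X, Y)] = sum over (x, y) of g(x, y) p(X = x, Y = y)   -- equation (1)
    joint = {
        (0, -1): 0.0, (0, 0): 1/3, (0, 1): 0.0,
        (1, -1): 1/3, (1, 0): 0.0, (1, 1): 1/3,
    }

    def expect(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())

    print(expect(lambda x, y: x))      # E[X] = 2/3, via equation (2)
    print(expect(lambda x, y: x * y))  # E[XY] = 0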

Covariance and the Correlation Coefficient
The covariance between X and Y, Cov(X, Y), is given by

Cov(X, Y) = E(XY) − E(X)E(Y)    (3)

Note that by definition Cov(X, X) = E(XX) − E(X)E(X) = Var(X).
The coefficient of correlation between X and Y is given by

ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))    (4)

It always lies in [−1, 1].
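
Equations (3) and (4) translate directly into code; a self-contained sketch (again my own, on the same illustrative joint table as above):

    import math

    joint = {
        (0, -1): 0.0, (0, 0): 1/3, (0, 1): 0.0,
        (1, -1): 1/3, (1, 0): 0.0, (1, 1): 1/3,
    }

    def expect(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())

    ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
    var_x = expect(lambda x, y: x**2) - ex**2
    var_y = expect(lambda x, y: y**2) - ey**2
    cov = expect(lambda x, y: x * y) - ex * ey   # equation (3)
    rho = cov / math.sqrt(var_x * var_y)         # equation (4)
    print(cov, rho)  # 0.0 0.0 for this table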

Discrete random variables X and Y have the following joint distribution:

          Y = −1   Y = 0   Y = 1
X = 0       0       1/3      0
X = 1      1/3       0      1/3

Compute:
1 p(X > Y)
2 the marginal distributions of X and Y
3 the expected values and variances of X and Y
4 the coefficient of correlation between X and Y
Are X and Y independent?

To calculate the probability of such an event, we sum over all the cells of the table which correspond to that event. Hence,

p(X > Y) = p(X = 0, Y = −1) + p(X = 1, Y = −1) + p(X = 1, Y = 0)
         = 0 + 1/3 + 0 = 1/3

Recall that

p(X = x) = Σ_y p(X = x, Y = y).

Hence

p(X = 0) = Σ_y p(X = 0, Y = y) = 0 + 1/3 + 0 = 1/3
p(X = 1) = Σ_y p(X = 1, Y = y) = 1/3 + 0 + 1/3 = 2/3

Note that after obtaining p(X = 0), we could instead calculate p(X = 1) by using the fact that

p(X = 1) = 1 − p(X = 0),    (5)

since X only takes the values 0 and 1.

Similarly,

p(Y = −1) = Σ_x p(X = x, Y = −1) = 0 + 1/3 = 1/3
p(Y = 0) = Σ_x p(X = x, Y = 0) = 1/3 + 0 = 1/3
p(Y = 1) = 1 − p(Y = −1) − p(Y = 0) = 1/3

We then calculate the expected values and variances of X and Y from these marginal distributions.

E(X) = Σ_x x p(X = x) = 0 × 1/3 + 1 × 2/3 = 2/3
E(Y) = Σ_y y p(Y = y) = −1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 0

To calculate the variances of X and Y, Var(X) and Var(Y), we use the formula

Var(X) = E(X²) − (E(X))².

E(X²) = Σ_x x² p(X = x) = 0² × 1/3 + 1² × 2/3 = 2/3
E(Y²) = Σ_y y² p(Y = y) = (−1)² × 1/3 + 0² × 1/3 + 1² × 1/3 = 2/3

Thus we get

Var(X) = 2/3 − (2/3)² = 2/9
Var(Y) = 2/3 − 0² = 2/3

To calculate the correlation coefficient, we first calculate the covariance between X and Y. We have

Cov(X, Y) = E(XY) − E(X)E(Y).

E(XY) = Σ_x Σ_y x y p(X = x, Y = y)
      = 0(−1)(0) + 0(0)(1/3) + 0(1)(0) + 1(−1)(1/3) + 1(0)(0) + 1(1)(1/3) = 0

Thus we get

Cov(X, Y) = E(XY) − E(X)E(Y) = 0 − (2/3) × 0 = 0.

From the definition of the correlation coefficient, ρ(X, Y) = 0.

Example: Are X and Y independent?
We have that

p(X = 0, Y = −1) = 0 ≠ p(X = 0) p(Y = −1) = 1/3 × 1/3 = 1/9,

so X and Y are not independent, even though ρ(X, Y) = 0.
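
A short programmatic check (my own sketch) that this joint distribution fails the independence test even though ρ(X, Y) = 0:

    # X, Y independent iff p(X = x, Y = y) = p(X = x) p(Y = y) for all (x, y).
    joint = {
        (0, -1): 0.0, (0, 0): 1/3, (0, 1): 0.0,
        (1, -1): 1/3, (1, 0): 0.0, (1, 1): 1/3,
    }
    p_x = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
    p_y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (-1, 0, 1)}

    print(all(abs(joint[x, y] - p_x[x] * p_y[y]) < 1e-12 for (x, y) in joint))
    # False: e.g. p(X=0, Y=-1) = 0 but p(X=0) p(Y=-1) = 1/9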

1 Bayes’ Rule: Examples
   Eating Hamburgers
   Detecting Terrorists
   The Monty Hall Problem
2 Moments for functions of two discrete Random Variables
3 The meaning of Probability
4 Wrapping Up

The meaning of Probability
Frequentist: frequencies of random repeatable experiments
   E.g. the probability of a biased coin landing “Heads”
Bayesian: degrees of belief
   E.g. the probability of the Tasmanian Devil disappearing by the end of this decade
