Joint probability distribution
The joint probability distribution for a set of random variables (r.v.s) gives the probability of every atomic event on those r.v.s (i.e., every sample point)
P(Cavity, Weather) is a 2 × 4 matrix of values, which can be interpreted as

                    Weather = sunny   rain    cloudy   snow
    Cavity = true           0.144    0.02    0.016    0.02
    Cavity = false          0.576    0.08    0.064    0.08
Every question about a domain can be answered by the joint distribution because every event is a sum of sample points.
E.g., P(Cavity = true ∧ Weather ≠ cloudy) = 0.144 + 0.02 + 0.02 = 0.184
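The 2 × 4 table above can be queried directly in code. A minimal Python sketch (the dictionary representation and helper names are my own, not from the slides):

```python
# Joint distribution P(Cavity, Weather) from the table above,
# keyed by atomic events (sample points).
joint = {
    (True, "sunny"): 0.144, (True, "rain"): 0.02,
    (True, "cloudy"): 0.016, (True, "snow"): 0.02,
    (False, "sunny"): 0.576, (False, "rain"): 0.08,
    (False, "cloudy"): 0.064, (False, "snow"): 0.08,
}

def prob(event):
    """P(event): every event is a sum of sample points."""
    return sum(p for point, p in joint.items() if event(*point))

# P(Cavity = true AND Weather != cloudy)
p = prob(lambda cavity, weather: cavity and weather != "cloudy")
print(round(p, 3))  # 0.184
```

Any question about the domain reduces to a different `event` predicate over the same table.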
© Trenn, King's College London
Inference by enumeration
Example of a joint distribution with three variables (catch is something the doctor can test):

                        toothache              ¬toothache
                     catch    ¬catch        catch    ¬catch
    cavity           0.108    0.012        0.072    0.008
    ¬cavity          0.016    0.064        0.144    0.576

For any proposition φ, sum the atomic events where it is true:

    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
Inference by enumeration
For any proposition φ, sum the atomic events where it is true:

    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

    P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
Inference by enumeration
For any proposition φ, sum the atomic events where it is true:

    P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

    P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
Your turn
Calculate P(toothache ∧ cavity).
I encourage you to use the KEATS forum to compare your answers!
Your turn
Calculate P(cavity | toothache).
I encourage you to use the KEATS forum to compare your answers!
Inference by enumeration
Can also compute conditional probabilities:

    P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                           = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                           = 0.4
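The same enumeration machinery extends to conditional probabilities via the definition P(a | b) = P(a ∧ b) / P(b). A sketch, with helper names of my own:

```python
joint = {  # (toothache, catch, cavity) -> probability
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def prob(phi):
    return sum(p for omega, p in joint.items() if phi(*omega))

def cond_prob(a, b):
    """P(a | b) = P(a AND b) / P(b); only defined when P(b) > 0."""
    p_b = prob(b)
    if p_b <= 0:
        raise ValueError("conditioning event has zero probability")
    return prob(lambda *w: a(*w) and b(*w)) / p_b

p = cond_prob(lambda t, ca, cav: not cav,   # ¬cavity
              lambda t, ca, cav: t)         # given toothache
print(round(p, 3))  # 0.4
```

Swapping the first predicate for `cav` gives P(cavity | toothache) = 0.6, the answer to the exercise above.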
Notation
Notation for conditional distributions:

    P(Cavity | toothache)

The outcome is a 2-dimensional vector (one entry per value of Cavity). Write it down and compare!
Conditional probability
Conditional or posterior probabilities:

    P(cavity | toothache) = 0.6, given that toothache is all I know

Recall: for convenience we write cavity for Cavity = true and toothache for Toothache = true.
Conditional probability
Conditional or posterior probabilities:

    P(cavity | toothache) = 0.6, given that toothache is all I know

If we know more, e.g., cavity is also given, then we have

    P(cavity | toothache, cavity) = 1

Note: the less specific belief (based on toothache alone) remains valid after more evidence arrives, but is not always useful.

New evidence may be irrelevant, allowing simplification:

    P(cavity | toothache, your curtains are red) = P(cavity | toothache) = 0.6

This kind of inference, sanctioned by domain knowledge, is crucial.
Conditional probability
Definition of conditional probability:

    P(a | b) = P(a ∧ b) / P(b)    if P(b) > 0

The product rule gives an alternative formulation:

    P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

A general version holds for joint distributions:

    P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)

(View as a 4 × 2 = 8 set of equations, not matrix multiplication – unless you know exactly what you're doing and you model everything correctly.)
Chain rule
The chain rule is derived by successive application of the product rule. Concrete example:

    P(a, b, c) = P(a, b) P(c | b, a)
               = P(a) P(b | a) P(c | b, a)

In general,

    P(X₁, …, Xₙ) = P(X₁, …, Xₙ₋₁) P(Xₙ | X₁, …, Xₙ₋₁)
                 = P(X₁, …, Xₙ₋₂) P(Xₙ₋₁ | X₁, …, Xₙ₋₂) P(Xₙ | X₁, …, Xₙ₋₁)
                 = …
                 = ∏ᵢ₌₁ⁿ P(Xᵢ | X₁, …, Xᵢ₋₁)
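The factorisation can be checked numerically on the dentist joint from earlier slides. A sketch (names assumed) verifying P(a, b, c) = P(a) P(b | a) P(c | b, a):

```python
joint = {  # (toothache, catch, cavity) -> probability
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def prob(phi):
    return sum(p for omega, p in joint.items() if phi(*omega))

a   = lambda t, ca, cav: t              # a = toothache
ab  = lambda t, ca, cav: t and ca       # a AND b (b = catch)
abc = lambda t, ca, cav: t and ca and cav  # a AND b AND c (c = cavity)

# Chain rule: P(a, b, c) = P(a) * P(b | a) * P(c | b, a)
lhs = prob(abc)
rhs = prob(a) * (prob(ab) / prob(a)) * (prob(abc) / prob(ab))
print(abs(lhs - rhs) < 1e-12)  # True
```

The product telescopes, which is exactly why the chain rule holds for any ordering of the variables.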
Normalisation
We can use α = 1/P(toothache) to normalise (we don't need to calculate it!):

    P(Cavity | toothache) = α P(Cavity, toothache)
        = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
        = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
        = α ⟨0.12, 0.08⟩
        = ⟨0.6, 0.4⟩

(The intermediate unnormalised vectors are steps in the calculation, not the desired outcome.)
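The normalisation trick is easy to sketch in code: sum out Catch with toothache fixed, then rescale so the entries sum to 1. (Representation assumed; values from the slides.)

```python
# Dentist joint, keyed by (toothache, catch, cavity).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

# Unnormalised P(Cavity, toothache): sum out the hidden variable Catch.
unnorm = {
    cavity: sum(joint[(True, catch, cavity)] for catch in (True, False))
    for cavity in (True, False)
}                                  # approx {True: 0.12, False: 0.08}

alpha = 1 / sum(unnorm.values())   # alpha = 1/P(toothache), obtained for
posterior = {cav: alpha * p for cav, p in unnorm.items()}  # free by normalising
print({cav: round(p, 3) for cav, p in posterior.items()})
# {True: 0.6, False: 0.4}
```

Note that P(toothache) never has to be computed up front: it falls out as the normalising constant.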
General version
Let X be the set of all variables.
Typically, we want the posterior joint distribution of the query variables Y given specific values e for the evidence variables E.
Let the hidden variables be H = X − Y − E.
General version
Then the required summation of joint entries is done by summing out the hidden variables:

    P(Y | E = e) = α P(Y, E = e) = α Σₕ P(Y, E = e, H = h)

The terms in the summation are joint entries because Y, E, and H together exhaust the set of random variables.
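This general recipe can be sketched as one function over Boolean variables. The variable names and the `evidence` dict layout are illustrative assumptions, not from the slides:

```python
# Dentist joint over X = (Toothache, Catch, Cavity).
variables = ("Toothache", "Catch", "Cavity")
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def posterior(query_var, evidence):
    """P(Y | E = e): sum out the hidden variables H = X - Y - E, then normalise."""
    idx = {name: i for i, name in enumerate(variables)}
    unnorm = {}
    for y in (True, False):
        # Summing over joint entries consistent with Y = y and E = e
        # implicitly sums out every hidden variable H.
        unnorm[y] = sum(
            p for omega, p in joint.items()
            if omega[idx[query_var]] == y
            and all(omega[idx[var]] == val for var, val in evidence.items())
        )
    alpha = 1 / sum(unnorm.values())
    return {y: alpha * p for y, p in unnorm.items()}

post = posterior("Cavity", {"Toothache": True})
print({y: round(p, 3) for y, p in post.items()})  # {True: 0.6, False: 0.4}
```

With empty evidence this computes a plain marginal; with all non-query variables as evidence there is nothing to sum out and it reduces to a table lookup plus normalisation.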