COMP9418: Advanced Topics in Statistical Machine Learning
Probability Calculus
University of New South Wales
Introduction
§ In this lecture, we introduce probability calculus
§ Framework for reasoning with uncertain beliefs
§ Each event is assigned a degree of belief which quantifies the belief in that event
§ The topics covered in this lecture are fundamental to building a framework for reasoning under uncertainty
§ Degree of belief updating
§ The notion of independence
Degrees of Belief
§ A propositional knowledge base Δ classifies sentences into three categories
§ Sentences implied by Δ
§ Sentences whose negations are implied by Δ
§ All other sentences
§ We can obtain a much finer classification of sentences through a finer classification of worlds
§ Assigning a degree of belief or probability in [0,1] to each world 𝑤 (𝑃(𝑤))
§ The belief in, or probability of, a sentence 𝛼 is defined as
P(α) ≝ Σ_{w ⊨ α} P(w)
§ That is, the sum of probabilities assigned to worlds at which 𝛼 is true
[Figure: the set of worlds, highlighting those satisfying α; the three cases are Δ ⊨ α, Δ ⊨ ¬α, and Δ ⊭ α, Δ ⊭ ¬α]
Degrees of Belief
| world | Earthquake | Burglary | Alarm | P(·)  |
|-------|------------|----------|-------|-------|
| w1    | true       | true     | true  | .0190 |
| w2    | true       | true     | false | .0010 |
| w3    | true       | false    | true  | .0560 |
| w4    | true       | false    | false | .0240 |
| w5    | false      | true     | true  | .1620 |
| w6    | false      | true     | false | .0180 |
| w7    | false      | false    | true  | .0072 |
| w8    | false      | false    | false | .7128 |
§ This table shows a state of belief or joint probability distribution
§ We require the degrees of belief assigned to all worlds to sum up to 1
§ This is a normalization convention
§ It allows us to directly compare the degrees of belief held by different states
§ The joint probability distribution is usually too large for direct representation
§ 20 binary variables – 2^20 = 1,048,576 entries
§ 40 binary variables – 2^40 = 1,099,511,627,776 entries
§ We will deal with this issue by using Graphical Models
𝑃𝐸𝑎𝑟𝑡h𝑞𝑢𝑎𝑘𝑒 =𝑃𝑤! +𝑃𝑤” +𝑃𝑤# +𝑃𝑤$ =.1 𝑃 𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = .2
𝑃 ¬𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = .8
𝑃 𝐴𝑙𝑎𝑟𝑚 = .2442
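These marginals are easy to check mechanically. A minimal sketch in Python (the joint distribution is the one from the table above; the tuple layout (Earthquake, Burglary, Alarm) is our own encoding):

```python
# Joint distribution over worlds (Earthquake, Burglary, Alarm) from the table.
joint = {
    (True,  True,  True):  .0190, (True,  True,  False): .0010,
    (True,  False, True):  .0560, (True,  False, False): .0240,
    (False, True,  True):  .1620, (False, True,  False): .0180,
    (False, False, True):  .0072, (False, False, False): .7128,
}

def prob(event):
    """P(alpha): sum of P(w) over the worlds w at which alpha is true."""
    return sum(p for w, p in joint.items() if event(w))

print(round(prob(lambda w: w[0]), 4))  # P(Earthquake) -> 0.1
print(round(prob(lambda w: w[1]), 4))  # P(Burglary)   -> 0.2
print(round(prob(lambda w: w[2]), 4))  # P(Alarm)      -> 0.2442
```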
Properties of Beliefs
§ Some properties of degrees of belief (or simply, beliefs):
§ Beliefs are bounded to the [0,1] interval for any sentence α: 0 ≤ P(α) ≤ 1
§ An inconsistent sentence α has belief equal to zero: P(α) = 0 when α is inconsistent
§ A valid sentence α has belief equal to one: P(α) = 1 when α is valid
§ We can compute the belief of a sentence given the belief in its negation: P(α) + P(¬α) = 1
§ We can also compute the belief in a disjunction: P(α ∨ β) = P(α) + P(β) − P(α ∧ β)
Quantifying Uncertainty
§ This table summarises the beliefs associated with the alarm example
§ We are most certain about whether an Earthquake has occurred, which has the lowest entropy
§ We are least certain about whether the alarm has triggered, which has the highest entropy
§ We can quantify uncertainty using entropy:
H(X) = −Σ_x P(x) log₂ P(x)
§ Where 0 log 0 = 0 by convention
§ [Figure: entropy of a binary variable as a function of p]
§ When p = 0 or p = 1, the entropy is zero, indicating no uncertainty
§ When p = 1/2, the entropy is at its maximum value
|       | Earthquake | Burglary | Alarm |
|-------|------------|----------|-------|
| true  | .1         | .2       | .2442 |
| false | .9         | .8       | .7558 |
| H(·)  | .469       | .722     | .802  |
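The entropies in the last row can be reproduced in a few lines of Python (a sketch; `entropy` here is the binary entropy as a function of the probability p of the true value):

```python
from math import log2

def entropy(p):
    """Binary entropy -p log2 p - (1-p) log2 (1-p), with 0 log 0 = 0."""
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

for name, p in [("Earthquake", .1), ("Burglary", .2), ("Alarm", .2442)]:
    print(name, round(entropy(p), 3))  # .469, .722, .802 respectively
```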
Updating Beliefs
| world | Earthquake | Burglary | Alarm | P(·)  | P(·∣Alarm) |
|-------|------------|----------|-------|-------|------------|
| w1    | true       | true     | true  | .0190 |            |
| w2    | true       | true     | false | .0010 |            |
| w3    | true       | false    | true  | .0560 |            |
| w4    | true       | false    | false | .0240 |            |
| w5    | false      | true     | true  | .1620 |            |
| w6    | false      | true     | false | .0180 |            |
| w7    | false      | false    | true  | .0072 |            |
| w8    | false      | false    | false | .7128 |            |
§ Suppose now that we know that the alarm has triggered
§ 𝐴𝑙𝑎𝑟𝑚 = 𝑡𝑟𝑢𝑒
§ We need to update the state of belief to accommodate this new piece of information, which we call evidence
§ Evidence is represented by an event, say β
§ Our objective is to update the state of belief P(·) into a new state of belief, which we will denote P(·|β)
Updating Beliefs
| world | Earthquake | Burglary | Alarm | P(·)  | P(·∣Alarm) |
|-------|------------|----------|-------|-------|------------|
| w1    | true       | true     | true  | .0190 |            |
| w2    | true       | true     | false | .0010 | 0          |
| w3    | true       | false    | true  | .0560 |            |
| w4    | true       | false    | false | .0240 | 0          |
| w5    | false      | true     | true  | .1620 |            |
| w6    | false      | true     | false | .0180 | 0          |
| w7    | false      | false    | true  | .0072 |            |
| w8    | false      | false    | false | .7128 | 0          |
§ Given that β is known for sure
§ The new state of belief P(·|β) should assign belief 1 to β
§ P(β|β) = 1
§ This implies that P(¬β|β) = 0
§ This implies that every world w that satisfies ¬β must be assigned belief 0:
P(w|β) = 0 for all w ⊨ ¬β
Updating Beliefs
§ We know that
Σ_{w ⊨ β} P(w|β) = 1
§ But this leaves many options for P(w|β) for the worlds w that satisfy β
§ It is reasonable to perturb the beliefs as little as possible, so
§ P(w|β) = 0 for all w where P(w) = 0
§ P(w)/P(w′) = P(w|β)/P(w′|β) for all w, w′ ⊨ β with P(w) > 0 and P(w′) > 0
§ With these three constraints, the only option for the new beliefs is
§ P(w|β) = P(w)/P(β) for all w ⊨ β
Updating Beliefs
| world | Earthquake | Burglary | Alarm | P(·)  | P(·∣Alarm)  |
|-------|------------|----------|-------|-------|-------------|
| w1    | true       | true     | true  | .0190 | .0190/.2442 |
| w2    | true       | true     | false | .0010 | 0           |
| w3    | true       | false    | true  | .0560 | .0560/.2442 |
| w4    | true       | false    | false | .0240 | 0           |
| w5    | false      | true     | true  | .1620 | .1620/.2442 |
| w6    | false      | true     | false | .0180 | 0           |
| w7    | false      | false    | true  | .0072 | .0072/.2442 |
| w8    | false      | false    | false | .7128 | 0           |
§ The new beliefs are just the normalization of the old beliefs
§ The normalization constant is P(β)
P(w|β) ≝ { 0          if w ⊨ ¬β
          { P(w)/P(β)  if w ⊨ β
§ The new state of belief is referred to as the result of conditioning the old state P on evidence β
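The definition can be applied directly to the world table. A minimal sketch in Python (the tuple layout (Earthquake, Burglary, Alarm) is our own encoding):

```python
# Joint distribution over worlds (Earthquake, Burglary, Alarm) from the table.
joint = {
    (True,  True,  True):  .0190, (True,  True,  False): .0010,
    (True,  False, True):  .0560, (True,  False, False): .0240,
    (False, True,  True):  .1620, (False, True,  False): .0180,
    (False, False, True):  .0072, (False, False, False): .7128,
}

def condition(joint, beta):
    """P(.|beta): worlds contradicting beta get 0; the rest are renormalized by P(beta)."""
    p_beta = sum(p for w, p in joint.items() if beta(w))
    assert p_beta > 0, "conditioning is undefined when P(beta) = 0"
    return {w: (p / p_beta if beta(w) else 0.0) for w, p in joint.items()}

posterior = condition(joint, lambda w: w[2])     # evidence: Alarm = true
print(round(posterior[(True, True, True)], 4))   # .0190/.2442 -> 0.0778
print(posterior[(True, True, False)])            # 0.0
```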
Updating Beliefs: Example
| world | Earthquake | Burglary | Alarm | P(·)  | P(·∣Alarm) |
|-------|------------|----------|-------|-------|------------|
| w1    | true       | true     | true  | .0190 | .0778      |
| w2    | true       | true     | false | .0010 | 0          |
| w3    | true       | false    | true  | .0560 | .2293      |
| w4    | true       | false    | false | .0240 | 0          |
| w5    | false      | true     | true  | .1620 | .6634      |
| w6    | false      | true     | false | .0180 | 0          |
| w7    | false      | false    | true  | .0072 | .0295      |
| w8    | false      | false    | false | .7128 | 0          |
§ Let us suppose Alarm = true and compute some conditional probabilities
§ P(Burglary) =
§ P(Burglary|Alarm) =
§ Also,
§ P(Earthquake) =
§ P(Earthquake|Alarm) =
Updating Beliefs: Example
§ Let us suppose Alarm = true and compute some conditional probabilities
§ P(Burglary) = .2
§ P(Burglary|Alarm) ≈ .741
§ Also,
§ P(Earthquake) = .1
§ P(Earthquake|Alarm) ≈ .307
Updating Beliefs: Closed Form
§ There is a simple closed form for computing the updated belief:
P(α|β) = Σ_{w ⊨ α} P(w|β)
       = Σ_{w ⊨ α, w ⊨ β} P(w|β) + Σ_{w ⊨ α, w ⊨ ¬β} P(w|β)
       = Σ_{w ⊨ α, w ⊨ β} P(w|β)
       = Σ_{w ⊨ α∧β} P(w|β)
       = Σ_{w ⊨ α∧β} P(w)/P(β)
       = (1/P(β)) Σ_{w ⊨ α∧β} P(w)
       = P(α ∧ β)/P(β)

P(α|β) = P(α ∧ β) / P(β)

This equation is known as Bayes conditioning. It is only defined when P(β) ≠ 0
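The closed form can be checked at the level of events (a sketch; the event encoding as predicates over our (Earthquake, Burglary, Alarm) tuples is ours):

```python
# Joint distribution over worlds (Earthquake, Burglary, Alarm) from the table.
joint = {
    (True,  True,  True):  .0190, (True,  True,  False): .0010,
    (True,  False, True):  .0560, (True,  False, False): .0240,
    (False, True,  True):  .1620, (False, True,  False): .0180,
    (False, False, True):  .0072, (False, False, False): .7128,
}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

def cond(alpha, beta):
    """Bayes conditioning: P(alpha|beta) = P(alpha ∧ beta) / P(beta)."""
    return prob(lambda w: alpha(w) and beta(w)) / prob(beta)

burglary = lambda w: w[1]
alarm = lambda w: w[2]
print(round(cond(burglary, alarm), 3))  # -> 0.741
```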
Bayes Conditioning
| Earthquake | Burglary | Alarm | P(·)  |
|------------|----------|-------|-------|
| true       | true     | true  | .0190 |
| true       | true     | false | .0010 |
| true       | false    | true  | .0560 |
| true       | false    | false | .0240 |
| false      | true     | true  | .1620 |
| false      | true     | false | .0180 |
| false      | false    | true  | .0072 |
| false      | false    | false | .7128 |
§ Bayes conditioning makes the following commitments:
§ Worlds that contradict the evidence 𝛽 will have zero probability
§ Worlds that have zero probability continue to have zero probability
§ Worlds that are consistent with evidence 𝛽 and have positive probability will maintain their relative beliefs
|       | Earthquake | Burglary | Alarm |
|-------|------------|----------|-------|
| true  | .1         | .2       | .2442 |
| false | .9         | .8       | .7558 |
Bayes Conditioning: Example
§ Let us analyse how some beliefs would change given evidence Earthquake
§ P(Burglary) =
§ P(Burglary|Earthquake) =
§ Also,
§ P(Alarm) =
§ P(Alarm|Earthquake) =
§ Now, considering evidence Burglary
§ P(Alarm) =
§ P(Alarm|Burglary) =
§ Also,
§ P(Earthquake) =
§ P(Earthquake|Burglary) =
Bayes Conditioning: Example
§ Let us analyse how some beliefs would change given evidence Earthquake
§ P(Burglary) = .2
§ P(Burglary|Earthquake) = .2
§ Also,
§ P(Alarm) = .2442
§ P(Alarm|Earthquake) ≈ .75
§ Now, considering evidence Burglary
§ P(Alarm) = .2442
§ P(Alarm|Burglary) ≈ .905
§ Also,
§ P(Earthquake) = .1
§ P(Earthquake|Burglary) = .1
Bayes Conditioning: Example
§ These belief dynamics are a property of the state of belief in this table
§ They may not hold for other states of belief
§ For instance, we can conceive a state of belief in which evidence Earthquake would change the belief about Burglary
§ A central question is synthesizing states of belief that are faithful
§ For instance, those that correspond to the beliefs held by some human expert
§ We will see this is a central issue in modelling a real problem
Bayes Conditioning: Example
§ Let us look at one more example as we add more evidence
§ P(Burglary|Alarm) =
§ P(Burglary|Alarm ∧ Earthquake) =
§ Also,
§ P(Burglary|Alarm ∧ ¬Earthquake) =
Bayes Conditioning: Example
§ Let us look at one more example as we add more evidence
§ P(Burglary|Alarm) ≈ .741
§ P(Burglary|Alarm ∧ Earthquake) ≈ .253
§ Also,
§ P(Burglary|Alarm ∧ ¬Earthquake) ≈ .957
Conditional Entropy
§ The conditional entropy of a variable X given another variable Y quantifies the average uncertainty about X after observing Y:
H(X|Y) ≝ Σ_y P(y) H(X|y)
H(X|y) ≝ −Σ_x P(x|y) log₂ P(x|y)
§ We can show that the entropy never increases after conditioning:
H(X|Y) ≤ H(X)
§ Although for a specific value y we may have H(X|y) > H(X)

|       | B    | B∣a  | B∣¬a |
|-------|------|------|------|
| true  | .2   | .741 | .025 |
| false | .8   | .259 | .975 |
| H(·)  | .722 | .825 | .169 |
§ For the alarm example: H(B|A) = H(B|a)P(a) + H(B|¬a)P(¬a) = .329
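This value can be reproduced from the joint distribution (a sketch; the variable indices follow our (Earthquake, Burglary, Alarm) tuple layout):

```python
from math import log2

# Joint distribution over worlds (Earthquake, Burglary, Alarm) from the table.
joint = {
    (True,  True,  True):  .0190, (True,  True,  False): .0010,
    (True,  False, True):  .0560, (True,  False, False): .0240,
    (False, True,  True):  .1620, (False, True,  False): .0180,
    (False, False, True):  .0072, (False, False, False): .7128,
}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

def cond_entropy(x_idx, y_idx):
    """H(X|Y) = sum_y P(y) H(X|y), for binary variables at the given indices."""
    total = 0.0
    for y in (True, False):
        p_y = prob(lambda w: w[y_idx] == y)
        p_x = prob(lambda w: w[x_idx] and w[y_idx] == y) / p_y
        h_xy = -sum(q * log2(q) for q in (p_x, 1 - p_x) if q > 0)
        total += p_y * h_xy
    return total

print(round(cond_entropy(1, 2), 3))  # H(Burglary | Alarm) -> 0.329
```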
Independence
§ We observed that evidence Burglary does not change the belief in Earthquake
§ More generally, we say that event α is independent of event β iff
§ P(α|β) = P(α), or
§ P(β) = 0
§ We also found Burglary independent of Earthquake
§ This is indeed a general property
§ If α is independent of β, then β is independent of α
§ P finds α and β independent iff
§ P(α ∧ β) = P(α)P(β)
§ This equation is often taken as the definition of independence
P(Earthquake) = .1   P(Earthquake|Burglary) = .1
P(Burglary) = .2     P(Burglary|Earthquake) = .2
Independence and Mutual Exclusiveness
§ Independence and mutual exclusiveness are not the same notion
§ Two events α and β are mutually exclusive (logically disjoint) iff they do not share any models:
Mods(α) ∩ Mods(β) = ∅
§ That is, they cannot hold together at the same world
§ Events α and β are independent iff P(α ∧ β) = P(α)P(β)
Conditional Independence
§ Independence is a dynamic notion
§ Two independent events may become dependent after some evidence
§ For example, we saw that Burglary was independent of Earthquake. However, these events become dependent after accepting evidence Alarm:
P(Burglary|Alarm) ≈ .741
P(Burglary|Alarm ∧ Earthquake) ≈ .253
§ This is expected, since Earthquake and Burglary are competing explanations for Alarm
§ Consider this table for another example
§ We have two sensors that can detect the current state of temperature
§ The sensors are noisy and have different reliabilities
| world | Temp    | Sensor1 | Sensor2 | P(·) |
|-------|---------|---------|---------|------|
| w1    | normal  | normal  | normal  | .576 |
| w2    | normal  | normal  | extreme | .144 |
| w3    | normal  | extreme | normal  | .064 |
| w4    | normal  | extreme | extreme | .016 |
| w5    | extreme | normal  | normal  | .008 |
| w6    | extreme | normal  | extreme | .032 |
| w7    | extreme | extreme | normal  | .032 |
| w8    | extreme | extreme | extreme | .128 |
Conditional Independence
§ We have the following initial beliefs:
§ P(Temp = normal) = .80
§ P(Sensor1 = normal) = .76
§ P(Sensor2 = normal) = .68
§ If the first sensor reads normal, our belief that the second sensor also reads normal increases:
§ P(Sensor2 = normal | Sensor1 = normal) ≈ .768
§ Therefore the sensor readings are dependent
§ However, they become independent if we observe that the temperature is normal:
§ P(Sensor2 = normal | Temp = normal) = .80
§ P(Sensor2 = normal | Temp = normal, Sensor1 = normal) = .80
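These numbers can be checked against the sensor table (a sketch; 'n'/'e' abbreviate normal/extreme in our encoding of the (Temp, Sensor1, Sensor2) tuples):

```python
# Joint distribution over (Temp, Sensor1, Sensor2) from the table above.
joint = {
    ('n','n','n'): .576, ('n','n','e'): .144, ('n','e','n'): .064, ('n','e','e'): .016,
    ('e','n','n'): .008, ('e','n','e'): .032, ('e','e','n'): .032, ('e','e','e'): .128,
}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

# Marginally, the sensor readings are dependent:
p_s2 = prob(lambda w: w[2] == 'n')
p_s2_given_s1 = prob(lambda w: w[1] == 'n' and w[2] == 'n') / prob(lambda w: w[1] == 'n')
print(round(p_s2, 3), round(p_s2_given_s1, 3))  # 0.68 vs ~0.768

# But given Temp = normal, the readings become independent:
p1 = prob(lambda w: w[0] == 'n' and w[2] == 'n') / prob(lambda w: w[0] == 'n')
p2 = prob(lambda w: w == ('n', 'n', 'n')) / prob(lambda w: w[0] == 'n' and w[1] == 'n')
print(round(p1, 2), round(p2, 2))  # 0.8 and 0.8
```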
Conditional Independence
§ In general,
§ Independent events may become dependent given new evidence
§ Dependent events may become independent given new evidence
§ Event α is conditionally independent of event β given event γ iff
P(α|β ∧ γ) = P(α|γ) or P(β ∧ γ) = 0
§ Conditional independence is also symmetric
§ α is conditionally independent of β given γ iff β is conditionally independent of α given γ
§ This equation is often taken as the definition of conditional independence:
P(α ∧ β|γ) = P(α|γ)P(β|γ) or P(γ) = 0
Variable Independence
§ It is useful to talk about independence of sets of variables
§ Let X, Y and Z be three disjoint sets of variables
§ "X is independent of Y given Z" is denoted by X ⊥ Y | Z, or I_P(X, Z, Y)
§ It means that x is independent of y given z for all instantiations x, y and z
§ For example, suppose that X = {A, B}, Y = {C} and Z = {D, E}
§ Where A, B, C, D and E are propositional variables
§ The statement X ⊥ Y | Z is a compact notation for several statements about independence:
A ∧ B is independent of C given D ∧ E
A ∧ ¬B is independent of C given D ∧ E
¬A ∧ ¬B is independent of C given D ∧ E
…
¬A ∧ ¬B is independent of ¬C given ¬D ∧ ¬E
Mutual Information
§ Independence is a special case of a more general notion known as mutual information
§ Mutual information quantifies the impact of observing one variable on the uncertainty in another
§ Mutual information is
§ Non-negative
§ Equal to zero if and only if X and Y are independent
§ It measures the extent to which observing one variable will reduce the uncertainty in another:
MI(X;Y) ≝ Σ_{x,y} P(x,y) log₂ ( P(x,y) / (P(x)P(y)) )
MI(X;Y) = H(X) − H(X|Y)
MI(X;Y) = H(Y) − H(Y|X)
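For the alarm example, MI(Burglary; Alarm) can be computed directly from the definition (a sketch; the indices follow our (Earthquake, Burglary, Alarm) tuple layout):

```python
from math import log2

# Joint distribution over worlds (Earthquake, Burglary, Alarm) from the table.
joint = {
    (True,  True,  True):  .0190, (True,  True,  False): .0010,
    (True,  False, True):  .0560, (True,  False, False): .0240,
    (False, True,  True):  .1620, (False, True,  False): .0180,
    (False, False, True):  .0072, (False, False, False): .7128,
}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

def mutual_information(x_idx, y_idx):
    """MI(X;Y) = sum_{x,y} P(x,y) log2( P(x,y) / (P(x)P(y)) )."""
    mi = 0.0
    for x in (True, False):
        for y in (True, False):
            p_xy = prob(lambda w: w[x_idx] == x and w[y_idx] == y)
            p_x = prob(lambda w: w[x_idx] == x)
            p_y = prob(lambda w: w[y_idx] == y)
            if p_xy > 0:
                mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi

print(round(mutual_information(1, 2), 3))  # MI(Burglary; Alarm) -> 0.392
```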
Conditional Mutual Information
§ Conditional mutual information can be defined as
MI(X;Y|Z) ≝ Σ_{x,y,z} P(x,y,z) log₂ ( P(x,y|z) / (P(x|z)P(y|z)) )
§ It has the following properties:
MI(X;Y|Z) = H(X|Z) − H(X|Y,Z)
MI(X;Y|Z) = H(Y|Z) − H(Y|X,Z)
§ Entropy and mutual information can be extended to sets of variables
§ For instance, entropy can be generalized to a set of variables X as follows:
H(X) = −Σ_x P(x) log₂ P(x)
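As a check, the two temperature sensors from the earlier example are conditionally independent given Temp, so their conditional mutual information vanishes (a sketch; 'n'/'e' encode normal/extreme in our tuple layout):

```python
from math import log2

# Joint distribution over (Temp, Sensor1, Sensor2) from the sensor table.
joint = {
    ('n','n','n'): .576, ('n','n','e'): .144, ('n','e','n'): .064, ('n','e','e'): .016,
    ('e','n','n'): .008, ('e','n','e'): .032, ('e','e','n'): .032, ('e','e','e'): .128,
}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

def cond_mi(x_idx, y_idx, z_idx, values=('n', 'e')):
    """MI(X;Y|Z) = sum_{x,y,z} P(x,y,z) log2( P(x,y|z) / (P(x|z) P(y|z)) )."""
    mi = 0.0
    for x in values:
        for y in values:
            for z in values:
                p_xyz = prob(lambda w: (w[x_idx], w[y_idx], w[z_idx]) == (x, y, z))
                p_z = prob(lambda w: w[z_idx] == z)
                p_xz = prob(lambda w: w[x_idx] == x and w[z_idx] == z)
                p_yz = prob(lambda w: w[y_idx] == y and w[z_idx] == z)
                if p_xyz > 0:
                    mi += p_xyz * log2(p_xyz * p_z / (p_xz * p_yz))
    return mi

print(round(abs(cond_mi(1, 2, 0)), 6))  # MI(Sensor1; Sensor2 | Temp) -> 0.0
```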
Conditional Probability for Multiple Variables
§ We can extend the definition of conditional probability to multiple variables
§ For three variables A, B and C, we have
P(A|B) = P(A,B) / P(B)   (Bayes conditioning)
P(A,B|C) = P(A,B,C) / P(C)
P(A|B,C) = P(A,B,C) / P(B,C)
P(A,B) = P(A|B) P(B)   (product rule)
Case Analysis
§ Also known as law of total probability
P(α) = Σ_{i=1}^{n} P(α ∧ β_i)
where the events β₁, …, βₙ are mutually exclusive and exhaustive
[Figure: the space Ω partitioned into regions β₁, …, βₙ, with α overlapping several of them]
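For instance, in the alarm example, case analysis on Earthquake recovers P(Alarm) (a sketch using our (Earthquake, Burglary, Alarm) tuple encoding):

```python
# Joint distribution over worlds (Earthquake, Burglary, Alarm) from the table.
joint = {
    (True,  True,  True):  .0190, (True,  True,  False): .0010,
    (True,  False, True):  .0560, (True,  False, False): .0240,
    (False, True,  True):  .1620, (False, True,  False): .0180,
    (False, False, True):  .0072, (False, False, False): .7128,
}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

# P(Alarm) = P(Alarm ∧ Earthquake) + P(Alarm ∧ ¬Earthquake)
p_alarm = prob(lambda w: w[2] and w[0]) + prob(lambda w: w[2] and not w[0])
print(round(p_alarm, 4))  # -> 0.2442
```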
Chain Rule
§ It follows from repeated application of the product rule
P(α₁ ∧ α₂ ∧ ⋯ ∧ αₙ) = P(α₁ | α₂ ∧ ⋯ ∧ αₙ) P(α₂ | α₃ ∧ ⋯ ∧ αₙ) ⋯ P(αₙ)
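A numeric check of the chain rule on the alarm distribution (a sketch; `cond` implements Bayes conditioning over our tuple encoding):

```python
# Joint distribution over worlds (Earthquake, Burglary, Alarm) from the table.
joint = {
    (True,  True,  True):  .0190, (True,  True,  False): .0010,
    (True,  False, True):  .0560, (True,  False, False): .0240,
    (False, True,  True):  .1620, (False, True,  False): .0180,
    (False, False, True):  .0072, (False, False, False): .7128,
}

def prob(event):
    return sum(p for w, p in joint.items() if event(w))

def cond(alpha, beta):
    return prob(lambda w: alpha(w) and beta(w)) / prob(beta)

e = lambda w: w[0]   # Earthquake
b = lambda w: w[1]   # Burglary
a = lambda w: w[2]   # Alarm

# P(E ∧ B ∧ A) = P(E | B ∧ A) · P(B | A) · P(A)
lhs = prob(lambda w: e(w) and b(w) and a(w))
rhs = cond(e, lambda w: b(w) and a(w)) * cond(b, a) * prob(a)
print(round(lhs, 4), round(rhs, 4))  # 0.019 0.019
```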
Bayes Rule
§ Bayes rule or Bayes theorem
§ The classical usage of this rule is when event α is perceived to be a cause of event β
§ For example, 𝛼 is a disease and 𝛽 is a symptom
§ Our goal is to assess our belief in the cause given the effect
§ Example
§ Suppose that we have a patient who was just tested for a particular disease and the test came out positive. We know that one in every thousand people has this disease. We also know that the test is not reliable: it has a false positive rate of 2% and a false negative rate of 5%. Our goal is then to assess our belief in the patient having the disease given that the test came out positive.
P(α|β) = P(β|α) P(α) / P(β)
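Working the example through Bayes rule (the numbers come from the slide; the variable names are ours):

```python
p_disease = 0.001           # one in every thousand people has the disease
p_pos_given_disease = 0.95  # 1 - false negative rate (5%)
p_pos_given_healthy = 0.02  # false positive rate (2%)

# Case analysis: P(positive) = P(pos|disease)P(disease) + P(pos|healthy)P(healthy)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # -> 0.045
```

Despite the positive test, the belief in the disease is only about 4.5%, because the disease is so rare to begin with.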
Conclusion
§ In this lecture, we discussed some fundamental aspects of probability calculus
§ Belief updating
§ Independence and conditional independence
§ Bayes conditioning, case analysis
§ Chain rule, Bayes rule
§ Tasks
§ Read Chapter 3 (but skip Sections 3.6 and 3.7) from the textbook (Darwiche)