Bayes’ Rule & conditional independence
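In naive Bayes models, one assumes that
$$P(Cause, Effect_1, \ldots, Effect_n) = P(Cause) \prod_i P(Effect_i \mid Cause)$$
Naive Bayes is an example of conditional independence: the effects are assumed to be conditionally independent of each other given the cause.
© Trenn, King’s College London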
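A minimal sketch of the factorisation above, assuming Boolean variables. The function name `naive_bayes_joint` and all numbers are hypothetical, chosen only for illustration. Each effect's CPT stores P(Effect_i = True | Cause), and the probability of False is obtained as the complement.

```python
from itertools import product

def naive_bayes_joint(p_cause, effect_cpts, cause, effects):
    """P(Cause=cause, Effect_1=effects[0], ..., Effect_n=effects[-1]).

    p_cause: P(Cause=True)
    effect_cpts: one dict per effect, mapping the value of Cause to
        P(Effect_i=True | Cause=value)
    """
    prob = p_cause if cause else 1.0 - p_cause
    for cpt, value in zip(effect_cpts, effects):
        p_true = cpt[cause]
        prob *= p_true if value else 1.0 - p_true
    return prob

# Hypothetical model with two Boolean effects (made-up numbers).
P_CAUSE = 0.3
EFFECT_CPTS = [
    {True: 0.8, False: 0.1},  # P(Effect_1=True | Cause)
    {True: 0.6, False: 0.2},  # P(Effect_2=True | Cause)
]

p = naive_bayes_joint(P_CAUSE, EFFECT_CPTS, True, [True, False])
print(p)  # 0.3 * 0.8 * (1 - 0.6), i.e. about 0.096

# The factorised model is still a proper joint distribution: it sums to 1.
total = sum(naive_bayes_joint(P_CAUSE, EFFECT_CPTS, c, list(es))
            for c in (True, False)
            for es in product((True, False), repeat=len(EFFECT_CPTS)))
print(total)  # 1.0 up to floating-point rounding
```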
Naive Bayes
Assuming that the effects are conditionally independent given the cause reduces the model of the problem to
$$P(Cause, Effect_1, \ldots, Effect_n) = P(Cause) \prod_i P(Effect_i \mid Cause)$$
The total number of parameters is linear in the number n of conditionally independent effects. For Boolean variables, the full joint distribution needs $2^{n+1} - 1$ numbers, whereas this factorisation needs only $1 + 2n$.
It is called ‘naive’ because it oversimplifies: in many cases the ‘effect’ variables aren’t actually conditionally independent given the cause variable. Example:
• Cause: it rained yesterday
• Effect1: the streets are wet this morning
• Effect2: I’m late for my class
• If the streets were still wet, then an accident was more likely to happen, and the resulting traffic jam could be the reason for being late. So Effect1 and Effect2 are not conditionally independent given Cause.
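The claim that the parameter count becomes linear in n can be checked with a short calculation. This sketch assumes a Boolean cause and n Boolean effects; the function names are hypothetical. A full joint table over n + 1 Boolean variables has $2^{n+1}$ entries, one of which is fixed by the requirement that they sum to 1; the naive Bayes model needs P(Cause) plus two numbers per effect.

```python
def full_joint_parameters(n):
    # 2**(n + 1) entries over Cause and n effects; since they must sum to 1,
    # one entry is determined by the others.
    return 2 ** (n + 1) - 1

def naive_bayes_parameters(n):
    # P(Cause=True), plus P(Effect_i=True | Cause) for Cause=True and
    # Cause=False, for each of the n effects.
    return 1 + 2 * n

for n in (1, 5, 10, 20):
    print(n, full_joint_parameters(n), naive_bayes_parameters(n))
# n = 10 gives 2047 versus 21; n = 20 gives 2097151 versus 41.
```

For n = 1 the two counts coincide (3 each), so the saving only appears once there are several effects.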
Naive Bayes
$$P(Cause, Effect_1, \ldots, Effect_n) = P(Cause) \prod_i P(Effect_i \mid Cause)$$
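The factorisation is what makes naive Bayes usable for inference: by Bayes' rule, P(Cause | observed effects) is proportional to P(Cause) times the product of P(Effect_i | Cause) over the observed effects. A minimal sketch, assuming Boolean variables; the function name and all CPT numbers are hypothetical, loosely attached to the rain example from the previous slide.

```python
def naive_bayes_posterior(p_cause, effect_cpts, observed):
    """P(Cause=True | observed effects) under the naive Bayes assumption.

    effect_cpts: one dict per effect, mapping the value of Cause to
        P(Effect_i=True | Cause=value)
    observed: dict {effect index: observed Boolean value}. Unobserved effects
        are left out: summing P(Effect_j | Cause) over both values of Effect_j
        gives 1, so they do not change the result.
    """
    score = {}
    for cause in (True, False):
        s = p_cause if cause else 1.0 - p_cause
        for i, value in observed.items():
            p_true = effect_cpts[i][cause]
            s *= p_true if value else 1.0 - p_true
        score[cause] = s  # proportional to P(Cause=cause, observed effects)
    # Bayes' rule: divide by P(observed effects) = score[True] + score[False].
    return score[True] / (score[True] + score[False])

# Hypothetical numbers:
# Cause = rained yesterday, Effect_1 = streets wet, Effect_2 = late for class.
P_RAIN = 0.3
CPTS = [
    {True: 0.8, False: 0.1},  # P(streets wet | rained?)
    {True: 0.4, False: 0.2},  # P(late | rained?)
]

print(naive_bayes_posterior(P_RAIN, CPTS, {0: True}))           # 0.24 / 0.31
print(naive_bayes_posterior(P_RAIN, CPTS, {0: True, 1: True}))  # 0.096 / 0.110
```

When both effects are observed, the sketch multiplies in Effect_2 as if it were independent of Effect_1 given Cause, which is exactly the ‘naive’ assumption questioned on the previous slide.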
Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
Topology of network encodes conditional independence assertions:
Weather is independent of the other variables.
Toothache and Catch are conditionally independent given Cavity.
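The conditional independence assertion for Toothache and Catch can be checked numerically against a full joint table. The figure is not reproduced here, so as an assumption this sketch uses the full joint P(Toothache, Catch, Cavity) from the dentist example in Russell and Norvig's textbook; the helper names are hypothetical. It tests P(t, c | cav) = P(t | cav) P(c | cav) for every assignment, and also shows that Toothache and Catch are not independent once Cavity is ignored.

```python
from itertools import product

# ASSUMED numbers: full joint P(Toothache, Catch, Cavity) from the dentist
# example in Russell & Norvig; keys are (toothache, catch, cavity).
JOINT = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}
NAMES = ("toothache", "catch", "cavity")

def marginal(**fixed):
    """Probability that the named variables take the given values."""
    return sum(p for key, p in JOINT.items()
               if all(key[NAMES.index(name)] == value
                      for name, value in fixed.items()))

def cond_independent_given_cavity(tol=1e-9):
    """Check P(t, c | cav) = P(t | cav) * P(c | cav) for all 8 assignments."""
    for t, c, cav in product((True, False), repeat=3):
        p_cav = marginal(cavity=cav)
        lhs = marginal(toothache=t, catch=c, cavity=cav) / p_cav
        rhs = (marginal(toothache=t, cavity=cav) / p_cav) * (
            marginal(catch=c, cavity=cav) / p_cav)
        if abs(lhs - rhs) > tol:
            return False
    return True

print(cond_independent_given_cavity())  # True
# Without conditioning on Cavity, Toothache and Catch are NOT independent:
print(marginal(toothache=True, catch=True))             # about 0.124
print(marginal(toothache=True) * marginal(catch=True))  # about 0.068
```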
Bayesian Networks
Bayesian networks are a way to represent these dependencies:
• Each node corresponds to a random variable (which may be discrete or continuous).
• A directed edge (also called a link or arrow) from node u to node v means that u is the parent of v.
• Likewise, v is a child of u.
• The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
• Each node u has a conditional probability distribution $P(u \mid Parents(u))$ that quantifies the effect of the parent nodes on u.
Example: C depends on A and B, and A and B are independent.
[Figure: A → C ← B]
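The structure of a Bayesian network can be stored as a map from each node to its parents. A minimal sketch, assuming every parent also appears as a key; the function name is hypothetical. It uses Kahn's algorithm to produce a topological order (every parent before its children), which exists exactly when the graph has no directed cycles, so it doubles as a DAG check.

```python
from collections import deque

def topological_order(parents):
    """Return the nodes ordered so that every parent precedes its children.

    parents: dict mapping each node to a list of its parent nodes
        (every parent must also be a key). Raises ValueError if the graph
        has a directed cycle, i.e. is not a DAG. Uses Kahn's algorithm.
    """
    children = {node: [] for node in parents}
    indegree = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(node for node, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("graph contains a directed cycle")
    return order

# The example network: A -> C <- B (A and B have no parents).
NETWORK = {"A": [], "B": [], "C": ["A", "B"]}
print(topological_order(NETWORK))  # ['A', 'B', 'C']
```

A graph such as {"X": ["Y"], "Y": ["X"]} has no node without parents, so the sketch raises ValueError for it.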
Bayesian networks
[Figure omitted; image source: http://www.igi.tugraz.at]
Bayesian networks
How can we represent the knowledge about the probabilities?
Each conditional distribution is represented as a conditional probability table (CPT) giving the distribution over u for each combination of parent values. For the example network in which C has parents A and B:
A   B   P(C=T | A, B)
T   T   0.2
T   F   0.123
F   T   0.9
F   F   0.51
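One natural way to store a CPT is a dictionary keyed by the tuple of parent values. A minimal sketch using the numbers in the table for C given A and B; the helper names are hypothetical. A Boolean node with k Boolean parents needs one row per combination of parent values, i.e. $2^k$ rows, which stays small as long as each node has few parents.

```python
from itertools import product

# CPT for node C with Boolean parents A and B, using the table's numbers.
# Keys are (a, b); each value is P(C=True | A=a, B=b).
CPT_C = {
    (True, True): 0.2,
    (True, False): 0.123,
    (False, True): 0.9,
    (False, False): 0.51,
}

def cpt_rows(num_parents):
    """Number of rows in the CPT of a node with Boolean parents."""
    return 2 ** num_parents

def lookup(cpt, parent_values):
    """Return the stored probability for one combination of parent values."""
    return cpt[tuple(parent_values)]

# Every combination of parent values has exactly one row.
assert set(CPT_C) == set(product((True, False), repeat=2))
print(cpt_rows(2), lookup(CPT_C, (False, True)))  # 4 0.9
```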
Bayesian networks
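Bayesian networks ≠ Naive Bayes
The two are largely orthogonal concepts: a naive Bayes model can be written as a (very simple) Bayesian network in which the cause is the only parent of every effect. Also, don’t confuse either of them with Bayes’ rule.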
Bayesian networks
An example (from California):
I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects “causal” knowledge:
• A burglar can set the alarm off
• An earthquake can set the alarm off
• The alarm can cause Mary to call
• The alarm can cause John to call
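A Bayesian network specifies the full joint distribution as the product of each node's CPT entry given its parents; here P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a). Summing that product over the unobserved variables answers the slide's question. The CPT figure is not reproduced here, so as an assumption this sketch uses the numbers from Russell and Norvig's textbook version of the example (P(J | A) agrees with the note on CPTs: 0.90 and 0.05). Function names are hypothetical, and inference by enumeration is just the simplest exact method, not necessarily the one used later in the course.

```python
from itertools import product

# ASSUMED CPT values (Russell & Norvig's burglary example).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=True | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=True | A)
P_M = {True: 0.70, False: 0.01}  # P(M=True | A)

def prob(p_true, value):
    """P(X=value) given P(X=True)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Chain rule for this network: each factor is P(node | parents)."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

def posterior_burglary(j, m):
    """P(B=True | J=j, M=m), summing the joint over the hidden E and A."""
    score = {b: sum(joint(b, e, a, j, m)
                    for e, a in product((True, False), repeat=2))
             for b in (True, False)}
    return score[True] / (score[True] + score[False])

# John calls, Mary does not: is there a burglar?
print(round(posterior_burglary(j=True, m=False), 4))  # 0.0051
```

Under these assumed numbers, John's call raises the probability of a burglary from 0.001 to roughly 0.005, but a burglary remains unlikely.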
Bayesian networks
A note on CPTs
The CPTs in the previous slide appear to be missing some values:
A   P(J=T | A)
T   0.90
F   0.05
has two values rather than the four which would completely specify the relation between J and A.
The table tells us that:
$$P(J = T \mid A = T) = 0.9$$
which means:
$$P(J = F \mid A = T) = 0.1$$
because $P(J = T \mid A = T) + P(J = F \mid A = T) = 1$.
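The missing entries of a Boolean CPT follow from the complement rule, so a one-column table can be expanded mechanically. A minimal sketch using the two values from the table for P(J | A); the function name `full_cpt` is hypothetical.

```python
# P(J=True | A) for each value of A, from the table.
P_J_TRUE = {True: 0.90, False: 0.05}

def full_cpt(p_true_given_parent):
    """Expand a one-column Boolean CPT into all (parent, value) entries.

    Uses P(J=False | A=a) = 1 - P(J=True | A=a), since the two must sum to 1.
    """
    table = {}
    for a, p_true in p_true_given_parent.items():
        table[(a, True)] = p_true
        table[(a, False)] = 1.0 - p_true
    return table

table = full_cpt(P_J_TRUE)
for (a, j), p in sorted(table.items(), reverse=True):
    print(f"A={a!s:5} J={j!s:5} P(J|A)={p:.2f}")
# A=True  J=True  P(J|A)=0.90
# A=True  J=False P(J|A)=0.10
# A=False J=True  P(J|A)=0.05
# A=False J=False P(J|A)=0.95
```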
A note on CPTs
Or, writing the values of J and A the other way:
$$P(j \mid a) = 0.9, \qquad P(\neg j \mid a) = 0.1$$
because $P(j \mid a) + P(\neg j \mid a) = 1$.
Applications