CM3112 Artificial Intelligence
Probability theory: Basic notions
Steven Schockaert
SchockaertS1@cardiff.ac.uk
School of Computer Science & Informatics Cardiff University
Motivating example: computer poker
(Billings et al. 1999)
Motivating example: IBM Watson
Motivating example: car insurance
Probability theory
Definitions · Counting · Independence · Conditional probability
Probability theory deals with the outcomes of random experiments, i.e. experiments whose outcome cannot be predicted with certainty (e.g. flipping a coin).

The set S ≠ ∅ of all possible outcomes of the experiment is called the sample space (e.g. {heads, tails}).

A probability distribution is a function p from S to [0, 1], i.e. it assigns to each outcome s a value between 0 and 1, such that:

∑_{s∈S} p(s) = 1

The value p(s) is called the probability of seeing outcome s. It reflects how likely the outcome s is: if the experiment is repeated a sufficiently large number of times N, then we expect to see the outcome s approximately p(s) × N times.
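To illustrate the frequency interpretation, here is a minimal Python sketch (not from the slides; the fair-coin distribution is just an example):

import random

# A fair coin: sample space {heads, tails} with p(s) = 0.5 for each outcome
p = {'heads': 0.5, 'tails': 0.5}
assert abs(sum(p.values()) - 1.0) < 1e-9  # a distribution must sum to 1

# Repeat the experiment N times; outcome s should occur roughly p(s) * N times
N = 100_000
counts = {s: 0 for s in p}
for _ in range(N):
    s = random.choices(list(p), weights=list(p.values()))[0]
    counts[s] += 1

for s in p:
    print(s, counts[s], 'expected ~', p[s] * N)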
Probability theory: example
Rolling a pair of dice
S={2,3,4,…,12}
p(2) = p(12) = 1/36
p(3) = p(11) = 2/36
p(4) = p(10) = 3/36
p(5) = p(9) = 4/36
p(6) = p(8) = 5/36
p(7) = 6/36
Being dealt two cards from a standard deck
S = {(1♠,1♣), (1♥,1♦), …, (Q♦,K♦)}
p(1♠,1♣) = … = p(Q♦,K♦) = 1/(26×51)
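These values can be recovered by counting, since each of the 36 ordered rolls of the two dice is equally likely (a small Python sketch):

from itertools import product
from fractions import Fraction

# Count how often each sum occurs among the 36 equally likely ordered rolls
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

for total in range(2, 13):
    # Fractions are printed in lowest terms, e.g. p(7) = 6/36 appears as 1/6
    print(total, Fraction(counts[total], 36))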
Probability theory

An event A is a subset of the sample space S.

The probability measure P induced by the probability distribution p assigns to each event A a value between 0 and 1, as follows:

P : 2^S → [0, 1]
P(A) = ∑_{a∈A} p(a)

For A ∈ 2^S, P(A) is called the probability of event A. It reflects how likely it is that the outcome of the associated experiment will be among those in A.

For the uniform distribution p, we have:

P(A) = |A| / |S|

Basic properties

It is easy to see that:

P(∅) = 0    P(S) = 1    P(coA) = 1 − P(A)

where coA = S \ A is called the complement of A.

If A1, …, An are pairwise disjoint (i.e. Ai ∩ Aj = ∅ for i ≠ j), we have:

P(A1 ∪ … ∪ An) = P(A1) + … + P(An)

Note that this implies for arbitrary events A and B that:

P(A ∪ B) = P((A \ B) ∪ (A ∩ B) ∪ (B \ A))
= P(A \ B) + P(A ∩ B) + P(B \ A)
= (P(A \ B) + P(A ∩ B)) + (P(B \ A) + P(A ∩ B)) − P(A ∩ B)
= P((A \ B) ∪ (A ∩ B)) + P((B \ A) ∪ (A ∩ B)) − P(A ∩ B)
= P(A) + P(B) − P(A ∩ B)
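These properties can be checked numerically, e.g. for the two-dice distribution above (a Python sketch; the events A and B are arbitrary illustrative choices):

from fractions import Fraction

# The two-dice sum distribution from the earlier slide
p = {s: Fraction(n, 36) for s, n in
     {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}.items()}
S = set(p)

def P(A):
    # P(A) = sum of p(a) over all outcomes a in the event A
    return sum((p[a] for a in A), Fraction(0))

A = {2, 4, 6, 8, 10, 12}        # the sum is even
B = {s for s in S if s > 7}     # the sum exceeds 7

assert P(set()) == 0 and P(S) == 1
assert P(S - A) == 1 - P(A)                  # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)    # P(A ∪ B) = P(A) + P(B) − P(A ∩ B)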
Probability theory: example
Rolling a pair of dice: probability of rolling an even number
P({2,4,6,8,10,12}) = (1+3+5+5+3+1)/36 = 18/36 = 1/2
Being dealt a pair of 2s from a standard deck
P({(2♠,2♣), (2♠,2♥), (2♠,2♦), (2♣,2♥), (2♣,2♦), (2♥,2♦)}) = 6/(26×51)
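The pair-of-2s probability can be confirmed by brute force over all 26 × 51 = 1326 unordered two-card hands (a sketch; the deck encoding is an illustrative choice):

from itertools import combinations
from fractions import Fraction

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
deck = [(rank, suit) for rank in ranks for suit in '♠♣♥♦']

hands = list(combinations(deck, 2))   # 1326 equally likely hands
pairs_of_2s = [h for h in hands if h[0][0] == '2' and h[1][0] == '2']

print(len(pairs_of_2s), '/', len(hands))       # 6 / 1326 = 6/(26×51)
print(Fraction(len(pairs_of_2s), len(hands)))  # in lowest terms: 1/221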
Probability theory
The conditional probability P(A|B) encodes how likely event A is, knowing that event B is the case:

P(A|B) = P(A ∩ B) / P(B)

Example: What is the probability of being dealt 2♣, knowing that you have been dealt a pair of 2s?

P(one card is 2♣ | pair of 2s)
= P({(2♠,2♣), (2♣,2♥), (2♣,2♦)}) / P({(2♠,2♣), (2♠,2♥), (2♠,2♦), (2♣,2♥), (2♣,2♦), (2♥,2♦)})
= 3/6 = 0.5
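The same answer can be obtained directly from the definition P(A|B) = P(A∩B)/P(B) by enumerating hands (a Python sketch; the event names are illustrative):

from fractions import Fraction
from itertools import combinations

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
deck = [(rank, suit) for rank in ranks for suit in '♠♣♥♦']
hands = list(combinations(deck, 2))

def P(event):
    # Uniform distribution over all two-card hands
    return Fraction(sum(1 for h in hands if event(h)), len(hands))

pair_of_2s = lambda h: h[0][0] == '2' and h[1][0] == '2'   # event B
has_2_clubs = lambda h: ('2', '♣') in h                    # event A

# P(A|B) = P(A ∩ B) / P(B)
print(P(lambda h: has_2_clubs(h) and pair_of_2s(h)) / P(pair_of_2s))  # 1/2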
Probability theory

From this definition, we can easily derive the well-known Bayes' theorem (since P(A|B)·P(B) = P(A ∩ B) = P(B|A)·P(A)):

P(A|B) = P(B|A) · P(A) / P(B)

Probability theory

The partition theorem states that for any partition B1 ∪ … ∪ Bk of the sample space S it holds that:

P(A) = P(A|B1)·P(B1) + … + P(A|Bk)·P(Bk)

The chain rule states that:

P(A1 ∩ … ∩ Ak) = P(A1) · P(A2|A1) · … · P(Ak|A1 ∩ … ∩ Ak−1)
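A numeric check of Bayes' theorem on the two-dice model (a sketch; A and B are arbitrary events):

from fractions import Fraction

p = {s: Fraction(n, 36) for s, n in
     {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}.items()}
S = set(p)

def P(A): return sum((p[a] for a in A), Fraction(0))
def cond(A, B): return P(A & B) / P(B)    # P(A|B) = P(A ∩ B) / P(B)

A = {s for s in S if s % 2 == 0}   # even sum
B = {s for s in S if s > 7}        # sum exceeds 7

# Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B)
assert cond(A, B) == cond(B, A) * P(A) / P(B)
print(cond(A, B))   # 3/5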
Probability theory
For example, for a partition B1 ∪ B2 ∪ B3 of the sample space S, the partition theorem gives:

P(A) = P(A|B1)·P(B1) + P(A|B2)·P(B2) + P(A|B3)·P(B3)

For three events, the chain rule gives:

P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2|A1) · P(A3|A1 ∩ A2)
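Both theorems can be verified on the same two-dice model (a sketch; the partition and the events are arbitrary choices):

from fractions import Fraction

p = {s: Fraction(n, 36) for s, n in
     {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}.items()}
S = set(p)

def P(A): return sum((p[a] for a in A), Fraction(0))
def cond(A, B): return P(A & B) / P(B)

# Partition theorem with k = 3: B1 ∪ B2 ∪ B3 = S, pairwise disjoint
A = {s for s in S if s % 3 == 0}
B1, B2, B3 = {2, 3, 4, 5}, {6, 7, 8}, {9, 10, 11, 12}
assert P(A) == cond(A, B1)*P(B1) + cond(A, B2)*P(B2) + cond(A, B3)*P(B3)

# Chain rule for three events
A1 = {s for s in S if s > 4}
A2 = {s for s in S if s % 2 == 0}
A3 = {s for s in S if s < 11}
assert P(A1 & A2 & A3) == P(A1) * cond(A2, A1) * cond(A3, A1 & A2)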
Examples
Simplify the following expression:

P(A∩B∩C|coC) = P(A∩B∩C∩coC) / P(coC) = P(∅) / P(coC) = 0
Examples
Simplify the following expression:
P(A|B∩C)⋅P(B∩C) + P(A|B∩coC)⋅P(B∩coC) + P(A|coB)⋅P(coB) = P(A)

(by the partition theorem, since B∩C, B∩coC and coB partition the sample space S)
Examples
Simplify the following expression:
P(A|B)⋅(P(B∩C) + P(B∩coC))
= P(A|B) ⋅ (P(B|C)⋅P(C) + P(B|coC)⋅P(coC))
= P(A|B) ⋅ P(B)
= P(A ∩ B)
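All three simplifications can be sanity-checked on a concrete model (a sketch using the two-dice distribution; A, B, C are arbitrary events):

from fractions import Fraction

p = {s: Fraction(n, 36) for s, n in
     {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}.items()}
S = set(p)

def P(A): return sum((p[a] for a in A), Fraction(0))
def cond(A, B): return P(A & B) / P(B)

A = {s for s in S if s % 2 == 0}
B = {s for s in S if s > 5}
C = {s for s in S if s < 10}
coB, coC = S - B, S - C

# P(A∩B∩C|coC) = 0, since A∩B∩C∩coC = ∅
assert cond(A & B & C, coC) == 0

# P(A|B∩C)·P(B∩C) + P(A|B∩coC)·P(B∩coC) + P(A|coB)·P(coB) = P(A)
assert cond(A, B & C)*P(B & C) + cond(A, B & coC)*P(B & coC) + cond(A, coB)*P(coB) == P(A)

# P(A|B)·(P(B∩C) + P(B∩coC)) = P(A∩B)
assert cond(A, B) * (P(B & C) + P(B & coC)) == P(A & B)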