
COMP9414 Uncertainty 1

Reasoning with Uncertainty

• An agent cannot always ascertain the truth of all propositions, so cannot hold only “flat out” beliefs (P or ¬P)

• Some environments themselves generate uncertainty for the agent, due to unpredictability or nondeterminism, so propositions inadequately model those environments

• Rational decisions for an agent require tradeoffs between the importance of goals, the likelihood of achieving them, and the cost of acting and not achieving them

UNSW ©W. Wobcke et al. 2019–2021

COMP9414: Artificial Intelligence

Lecture 5a: Uncertainty

Wayne Wobcke

e-mail:w. .au


COMP9414 Uncertainty 3

Problems with Logical Approach

• Consider trying to formalize a medical diagnosis system

  ∀p (Symptom(p, AbdominalPain) → Disease(p, Appendicitis))

• This rule is not correct, since patients with abdominal pain may be suffering from other diseases

  ∀p (Symptom(p, AbdominalPain) → Disease(p, Appendicitis) ∨ Disease(p, Ulcer) ∨ Disease(p, Indig) ∨ · · ·)

• How about a causal rule?

  ∀p (Disease(p, Ulcer) → Symptom(p, AbdominalPain))

COMP9414 Uncertainty 2

This Lecture

• Uncertainty

• Probability Theory

• Conditional Probability and Bayes’ Rule

• Bayesian Networks

  ◮ Semantics of Bayesian Networks
  ◮ Inference in Bayesian Networks

COMP9414 Uncertainty 5

What Do the Numbers Mean?

Statistical/Frequentist View

  Long-run frequency of a set of “events”, e.g. the probability of the event of “heads” appearing on the toss of a coin is the long-run frequency of heads appearing over coin tosses

Objective View

  Probabilities are real aspects of the world – objective

Personal/Subjective/Bayesian View

  Measure of belief in a proposition based on the agent’s knowledge, e.g. the probability of heads is a degree of belief that the coin will land heads, based on beliefs about the coin, or could be just a guess; different agents may assign different probabilities – subjective

COMP9414 Uncertainty 4

Sources of Uncertainty

• Difficulties arise with the logical approach due to

  incompleteness: agent may not have a complete theory for the domain
  ignorance: agent may not have enough information about the domain
  noise: information the agent does have may be unreliable
  nondeterminism: environment itself may be stochastic
  unpredictability: environment may be inherently unpredictable

• Probability gives a way of summarizing this uncertainty

  ◮ e.g. agent believes that there is a probability of 0.75 that the patient suffers from appendicitis if they have abdominal pain

COMP9414 Uncertainty 7

Sample Space and Events

[Figure: the eight outcomes TTT, TTH, THT, THH, HTT, HTH, HHT, HHH shown as points of the sample space, with a singleton subset labelled as a simple event and a larger subset labelled as an event]

COMP9414 Uncertainty 6

Sample Space and Events

• Flip a coin three times

• The possible outcomes are

  TTT TTH THT THH
  HTT HTH HHT HHH

• Set of all possible outcomes

  S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

• Any subset of the sample space is known as an event

• Any singleton subset of the sample space is known as a simple event

COMP9414 Uncertainty 9

Random Variables

• Propositions are random variables that can take on several values

  P(Weather = Sunny) = 0.8
  P(Weather = Rain) = 0.1
  P(Weather = Cloudy) = 0.09
  P(Weather = Snow) = 0.01

• Every random variable X has a domain of possible values 〈x1, x2, · · ·, xn〉

• The probabilities of all possible values, P(Weather) = 〈0.8, 0.1, 0.09, 0.01〉, form a probability distribution

• P(Weather, Appendicitis) is a combination of random variables represented by the cross product (can also use logical connectives, as in P(A ∧ B), to represent compound events)

COMP9414 Uncertainty 8

Prior Probability

• P(A) is the prior or unconditional probability that an event A occurs

• For example, P(Appendicitis) = 0.3

• In the absence of any other information, the agent believes there is a probability of 0.3 (30%) that the patient suffers from appendicitis

• To account for the effect of new information on probabilities, the agent must reason with conditional probabilities to update its probabilities

COMP9414 Uncertainty 11

Conditional Probability

• Need to update probabilities based on new information

• Use conditional or posterior probability: P(A|B) is the probability of A given that all we know is B

  ◮ e.g. P(Appendicitis|AbdominalPain) = 0.75

• Definition: P(A|B) = P(A ∧ B)/P(B), provided P(B) > 0

• Product Rule: P(A ∧ B) = P(A|B).P(B)

• P(X|Y) abbreviates P(X = xi|Y = yj) for all i, j, so P(X, Y) = P(X|Y).P(Y) stands for a set of equations

COMP9414 Uncertainty 10

Axioms of Probability

1. 0 ≤ P(A) ≤ 1

   • All probabilities are between 0 and 1

2. P(True) = 1, P(False) = 0

   • Valid propositions have probability 1
   • Unsatisfiable propositions have probability 0

3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

• Can determine probabilities of all other propositions

• For example, P(A ∨ ¬A) = P(A) + P(¬A) − P(A ∧ ¬A)
  P(True) = P(A) + P(¬A) − P(False)
  1 = P(A) + P(¬A) − 0
  Therefore P(¬A) = 1 − P(A)

COMP9414 Uncertainty 13

Joint Probability Distribution

• Complete specification of probabilities of all events in the domain

• Suppose random variables X1, X2, · · ·, Xn

• An atomic (simple) event is an assignment of values to all variables

• The joint probability distribution P(X1, X2, · · ·, Xn) assigns probabilities to all possible atomic events

• Simple medical domain with two Boolean random variables

                  AbdominalPain   ¬AbdominalPain
  Appendicitis        0.04             0.06
  ¬Appendicitis       0.01             0.89

COMP9414 Uncertainty 12

Normalization

[Figure: the eight three-flip outcomes after conditioning on the first coin being H – the outcomes beginning with T get probability 0.0, while the outcomes beginning with H are renormalized, e.g. 0.1/0.45 = 2/9 and 0.15/0.45 = 3/9]

• Conditional probability distribution given that the first coin is H
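The renormalization step can be sketched in code. A minimal Python illustration; the prior below is hypothetical – the figure fixes only the H-outcome probabilities (0.1, 0.15, 0.1, 0.1, summing to 0.45), so the remaining 0.55 is split evenly over the T outcomes here:

```python
from fractions import Fraction

# Hypothetical prior over the eight outcomes (T-outcome values assumed)
prior = {
    "HTT": Fraction(1, 10), "HTH": Fraction(3, 20),
    "HHT": Fraction(1, 10), "HHH": Fraction(1, 10),
    "TTT": Fraction(11, 80), "TTH": Fraction(11, 80),
    "THT": Fraction(11, 80), "THH": Fraction(11, 80),
}

def condition(dist, pred):
    """Zero out outcomes failing pred, then renormalize the rest."""
    kept = {o: p for o, p in dist.items() if pred(o)}
    z = sum(kept.values())  # normalization factor (here 9/20 = 0.45)
    return {o: p / z for o, p in kept.items()}

posterior = condition(prior, lambda o: o.startswith("H"))
print(posterior["HTH"])  # 1/3  (i.e. 0.15/0.45 = 3/9)
print(posterior["HTT"])  # 2/9  (i.e. 0.1/0.45)
```

Exact fractions make the normalization factor visible rather than hiding it in floating point.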


COMP9414 Uncertainty 15

Joint Probability Distribution

Assume there is some underlying joint probability distribution over the three random variables Toothache, Cavity and Catch, which we can write in the form of a table

                 toothache            ¬toothache
              catch    ¬catch      catch    ¬catch
  cavity      .108      .012       .072      .008
  ¬cavity     .016      .064       .144      .576

Note that the sum of the entries in the table is 1.0

For any proposition, sum the atomic events where it is true

COMP9414 Uncertainty 14

Joint Probability Distribution

• Simple events are mutually exclusive and jointly exhaustive

• Probability of a complex event is the sum of the probabilities of the compatible simple events

  P(Appendicitis) = 0.04 + 0.06 = 0.10
  P(Appendicitis ∨ AbdominalPain) = 0.04 + 0.06 + 0.01 = 0.11
  P(Appendicitis|AbdominalPain) = P(Appendicitis ∧ AbdominalPain)/P(AbdominalPain)
  = 0.04/(0.04 + 0.01) = 0.8

• Problem: with many variables, the number of probabilities is vast
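The sums above can be checked mechanically. A small Python sketch (the dictionary encoding of the 2×2 table is illustrative, not from the lecture):

```python
# Joint distribution over (Appendicitis, AbdominalPain) from the table
joint = {
    (True, True): 0.04,  (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}

def p(event):
    """Probability of a complex event: sum over compatible simple events."""
    return sum(pr for outcome, pr in joint.items() if event(outcome))

p_app = p(lambda o: o[0])              # P(Appendicitis) = 0.10
p_either = p(lambda o: o[0] or o[1])   # P(Appendicitis ∨ AbdominalPain) = 0.11
p_cond = p(lambda o: o[0] and o[1]) / p(lambda o: o[1])   # = 0.8
```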


COMP9414 Uncertainty 17

Inference by Enumeration Example

                 toothache            ¬toothache
              catch    ¬catch      catch    ¬catch
  cavity      .108      .012       .072      .008
  ¬cavity     .016      .064       .144      .576

For any proposition, sum the atomic events where it is true

P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
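Inference by enumeration is easy to mechanize; a Python sketch over the eight atomic events (the tuple encoding is illustrative):

```python
# Joint distribution P(Cavity, Toothache, Catch), entries from the table
probs = {
    (True, True, True): 0.108,   (True, True, False): 0.012,
    (True, False, True): 0.072,  (True, False, False): 0.008,
    (False, True, True): 0.016,  (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def p(event):
    """Sum the atomic events where the proposition is true."""
    return sum(pr for w, pr in probs.items() if event(w))

total = sum(probs.values())                   # 1.0, as required
p_cav_or_tooth = p(lambda w: w[0] or w[1])    # 0.28
```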


COMP9414 Uncertainty 16

Inference by Enumeration Example

                 toothache            ¬toothache
              catch    ¬catch      catch    ¬catch
  cavity      .108      .012       .072      .008
  ¬cavity     .016      .064       .144      .576

For any proposition, sum the atomic events where it is true

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

COMP9414 Uncertainty 19

Bayes’ Rule

P(B|A) = P(A|B)P(B)/P(A)

• AI systems abandon joint probabilities and work directly with conditional probabilities using Bayes’ Rule

• Deriving Bayes’ Rule:

  P(A ∧ B) = P(A|B)P(B) (Definition)
  P(B ∧ A) = P(B|A)P(A) (Definition)
  So P(A|B)P(B) = P(B|A)P(A), since P(A ∧ B) = P(B ∧ A)
  Hence P(B|A) = P(A|B)P(B)/P(A), if P(A) ≠ 0

• Note: If P(A) = 0, P(B|A) is undefined

COMP9414 Uncertainty 18

Conditional Probability by Enumeration

                 toothache            ¬toothache
              catch    ¬catch      catch    ¬catch
  cavity      .108      .012       .072      .008
  ¬cavity     .016      .064       .144      .576

P(¬cavity|toothache) = P(¬cavity ∧ toothache)/P(toothache)
= (0.016 + 0.064)/(0.108 + 0.012 + 0.016 + 0.064) = 0.4
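The same enumeration yields conditional probabilities by summing over the evidence and normalizing; a brief Python sketch:

```python
# P(Cavity | toothache): restrict to toothache entries, then normalize
p_cavity_toothache = 0.108 + 0.012       # marginalize out Catch
p_nocavity_toothache = 0.016 + 0.064
p_toothache = p_cavity_toothache + p_nocavity_toothache   # 0.2
p_nocavity_given_toothache = p_nocavity_toothache / p_toothache
print(round(p_nocavity_given_toothache, 3))  # 0.4
```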


COMP9414 Uncertainty 21

Normalization

• Avoiding direct assessment of the symptom probability P(StiffNeck):

  P(Meningitis|StiffNeck) = P(StiffNeck|Meningitis).P(Meningitis)/P(StiffNeck)
  P(¬Meningitis|StiffNeck) = P(StiffNeck|¬Meningitis).P(¬Meningitis)/P(StiffNeck)

• P(StiffNeck) = P(StiffNeck|Meningitis).P(Meningitis) + P(StiffNeck|¬Meningitis).P(¬Meningitis)

• So P(Meningitis|StiffNeck)
  = P(StiffNeck|Meningitis).P(Meningitis) / [P(StiffNeck|Meningitis).P(Meningitis) + P(StiffNeck|¬Meningitis).P(¬Meningitis)]
  and similarly for P(¬Meningitis|StiffNeck)

• Therefore, from the two conditionals P(StiffNeck| · · ·) we can derive P(StiffNeck), and the denominator is a normalization factor
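The normalization trick can be checked with exact arithmetic, using the numbers of the meningitis example below. P(StiffNeck|¬Meningitis) is not given on the slides, so it is derived here from the total-probability identity (an assumption of this illustration):

```python
from fractions import Fraction

p_m = Fraction(1, 50000)        # P(Meningitis)
p_s_given_m = Fraction(1, 2)    # P(StiffNeck | Meningitis)
p_s = Fraction(1, 20)           # P(StiffNeck)

# P(StiffNeck | ¬Meningitis), forced by P(S) = P(S|M)P(M) + P(S|¬M)P(¬M)
p_not_m = 1 - p_m
p_s_given_not_m = (p_s - p_s_given_m * p_m) / p_not_m

# Unnormalized posteriors; their sum reproduces P(StiffNeck) exactly
u_m = p_s_given_m * p_m
u_not_m = p_s_given_not_m * p_not_m
z = u_m + u_not_m               # the normalization factor
posterior_m = u_m / z
print(posterior_m, float(posterior_m))  # 1/5000 0.0002
```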


COMP9414 Uncertainty 20

Applying Bayes’ Rule

• Example (Russell & Norvig, 1995)

• Doctor knows that

  – meningitis causes a stiff neck 50% of the time
  – chance of a patient having meningitis is 1/50000
  – chance of a patient having a stiff neck is 1/20

• P(StiffNeck|Meningitis) = 0.5
  P(Meningitis) = 1/50000
  P(StiffNeck) = 1/20

  P(Meningitis|StiffNeck) = P(StiffNeck|Meningitis).P(Meningitis)/P(StiffNeck)
  = (0.5 × 1/50000)/(1/20) = 0.0002
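The arithmetic can be replayed exactly with Python fractions (a one-line application of the rule):

```python
from fractions import Fraction

# P(M|S) = P(S|M) P(M) / P(S)
p_m_given_s = Fraction(1, 2) * Fraction(1, 50000) / Fraction(1, 20)
print(p_m_given_s, float(p_m_given_s))  # 1/5000 0.0002
```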


COMP9414 Uncertainty 23

Conditional Independence

• Appendicitis is a direct cause of both abdominal pain and nausea

• If we know a patient is suffering from appendicitis, the probability of nausea should not depend on the presence of abdominal pain; likewise the probability of abdominal pain should not depend on nausea

• Nausea and abdominal pain are conditionally independent given appendicitis

• An event X is independent of event Y, conditional on background knowledge K, if knowing Y does not affect the conditional probability of X given K:

  P(X|K) = P(X|Y, K)

COMP9414 Uncertainty 22

Using Bayes’ Rule

• Suppose there are two conditional probabilities for appendicitis

  P(Appendicitis|AbdominalPain) = 0.8
  P(Appendicitis|Nausea) = 0.1

• P(Appendicitis|AbdominalPain ∧ Nausea)
  = P(AbdominalPain ∧ Nausea|Appendicitis).P(Appendicitis)/P(AbdominalPain ∧ Nausea)

• Need to know P(AbdominalPain ∧ Nausea|Appendicitis)

• With many symptoms that is a daunting task . . .

COMP9414 Uncertainty 25

Bayesian Networks

• Example (Pearl, 1988)

• You have a new burglar alarm at home that is quite reliable at detecting burglars but may also respond at times to an earthquake. You also have two neighbours, John and Mary, who promise to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then too; Mary likes loud music and sometimes misses the alarm. Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

COMP9414 Uncertainty 24

Bayesian Networks

• A Bayesian network (also Bayesian belief network, probabilistic network, causal network, knowledge map) is a directed acyclic graph (DAG) where

  ◮ each node corresponds to a random variable
  ◮ directed links connect pairs of nodes – a directed link from node X to node Y means that X has a direct influence on Y
  ◮ each node has a conditional probability table quantifying the effect of its parents on the node

• Independence assumption of Bayesian networks

  ◮ each random variable is (conditionally) independent of its nondescendants given its parents

COMP9414 Uncertainty 27

Conditional Probability Table

• Each row contains the conditional probability of each node value for a conditioning case (a possible combination of values for the parent nodes)

  P(Alarm|Burglary ∧ Earthquake)

  Burglary   Earthquake   True    False
  True       True         0.950   0.050
  True       False        0.940   0.060
  False      True         0.290   0.710
  False      False        0.001   0.999

COMP9414 Uncertainty 26

Bayesian Networks

• Example (Pearl, 1988)

[Figure: network with links Burglary → Alarm ← Earthquake, Alarm → JohnCalls and Alarm → MaryCalls, annotated with the tables below]

  P(Burglary) = 0.001     P(Earthquake) = 0.002

  Burglary   Earthquake   P(Alarm)
  True       True         0.95
  True       False        0.94
  False      True         0.29
  False      False        0.001

  Alarm   P(JohnCalls)        Alarm   P(MaryCalls)
  True    0.90                True    0.70
  False   0.05                False   0.01

• Probabilities summarize a potentially infinite set of possible circumstances

COMP9414 Uncertainty 29

Semantics of Bayesian Networks

• Factorization of the joint probability distribution

• Chain Rule: use conditional probabilities to decompose conjunctions

  P(X1 ∧ X2 ∧ · · · ∧ Xn) = P(X1).P(X2|X1).P(X3|X1 ∧ X2). · · · .P(Xn|X1 ∧ X2 ∧ · · · ∧ Xn−1)

• Now order the variables X1, X2, · · ·, Xn in a Bayesian network so that each variable comes after its parents – let πXi be the tuple of parents of variable Xi (this is a complex random variable)

  Using the chain rule, P(X1 ∧ X2 ∧ · · · ∧ Xn) = P(X1).P(X2|X1).P(X3|X1 ∧ X2). · · · .P(Xn|X1 ∧ X2 ∧ · · · ∧ Xn−1)

COMP9414 Uncertainty 28

Semantics of Bayesian Networks

• A Bayesian network provides a complete description of the domain

• The joint probability distribution can be determined from the network

  ◮ P(X1, X2, · · ·, Xn) = ∏i=1..n P(Xi|Parents(Xi))

• For example, P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
  = P(J|A).P(M|A).P(A|¬B ∧ ¬E).P(¬B).P(¬E)
  = 0.90 × 0.70 × 0.001 × 0.999 × 0.998 = 0.000628

• A Bayesian network is a complete and non-redundant representation of the domain (and can be far more compact than the joint probability distribution)
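The product formula can be coded directly from the CPTs; a minimal Python sketch (variable names are illustrative):

```python
from itertools import product

# CPTs transcribed from the slides; node order B, E, A, J, M
p_b, p_e = 0.001, 0.002
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
p_john = {True: 0.90, False: 0.05}   # P(JohnCalls=True | Alarm)
p_mary = {True: 0.70, False: 0.01}   # P(MaryCalls=True | Alarm)

def pr(value, p_true):
    """P(X = value) given P(X = True)."""
    return p_true if value else 1 - p_true

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = product of P(node | parents) over all five nodes."""
    return (pr(b, p_b) * pr(e, p_e) * pr(a, p_alarm[(b, e)])
            * pr(j, p_john[a]) * pr(m, p_mary[a]))

# The 32 atomic events sum to 1, as a joint distribution must
total = sum(joint(*w) for w in product([True, False], repeat=5))
x = joint(False, False, True, True, True)   # P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
print(round(x, 6))  # 0.000628
```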


COMP9414 Uncertainty 31

Calculation using Bayesian Networks

• Fact 1: Consider a random variable X with parents Y1, Y2, · · ·, Yn

  P(X|Y1 ∧ · · · ∧ Yn ∧ Z) = P(X|Y1 ∧ · · · ∧ Yn)

  if Z doesn’t involve a descendant of X (including X itself)

• Fact 2: If Y1, · · ·, Yn are pairwise disjoint and exhaust all possibilities

  P(X) = Σi P(X ∧ Yi) = Σi P(X|Yi).P(Yi)
  P(X|Z) = Σi P(X ∧ Yi|Z)

  ◮ e.g. P(J|B) = P(J ∧ B)/P(B) = Σe,a,m P(J ∧ B ∧ e ∧ a ∧ m) / Σj,e,a,m P(j ∧ B ∧ e ∧ a ∧ m), where j ranges over J, ¬J, e over E, ¬E, a over A, ¬A and m over M, ¬M

COMP9414 Uncertainty 30

Semantics of Bayesian Networks

• Each P(Xi|X1 ∧ X2 ∧ · · · ∧ Xi−1) has the property that it is not conditioned on a descendant of Xi (given the ordering of variables in the Bayesian network)

• Therefore, by conditional independence

  ◮ P(Xi|X1 ∧ X2 ∧ · · · ∧ Xi−1) = P(Xi|πXi)

• Rewriting gives the chain rule

  ◮ P(X1, X2, · · ·, Xn) = ∏i=1..n P(Xi|πXi)

COMP9414 Uncertainty 33

Calculation using Bayesian Networks

• P(¬J ∧ B ∧ E ∧ A ∧ M) = 0.000000133

• P(¬J ∧ B ∧ ¬E ∧ A ∧ M) = 6.56684 × 10⁻⁵

• P(¬J ∧ B ∧ E ∧ ¬A ∧ M) = 9.5 × 10⁻¹⁰

• P(¬J ∧ B ∧ ¬E ∧ ¬A ∧ M) = 5.6886 × 10⁻⁷

• P(¬J ∧ B ∧ E ∧ A ∧ ¬M) = 0.000000057

• P(¬J ∧ B ∧ ¬E ∧ A ∧ ¬M) = 2.81436 × 10⁻⁵

• P(¬J ∧ B ∧ E ∧ ¬A ∧ ¬M) = 9.405 × 10⁻⁸

• P(¬J ∧ B ∧ ¬E ∧ ¬A ∧ ¬M) = 5.63171 × 10⁻⁵

COMP9414 Uncertainty 32

Calculation using Bayesian Networks

• P(J ∧ B ∧ E ∧ A ∧ M) = P(J|A).P(B).P(E).P(A|B ∧ E).P(M|A)
  = 0.90 × 0.001 × 0.002 × 0.95 × 0.70 = 0.000001197

• P(J ∧ B ∧ ¬E ∧ A ∧ M) = 0.0005910156

• P(J ∧ B ∧ E ∧ ¬A ∧ M) = 5 × 10⁻¹¹

• P(J ∧ B ∧ ¬E ∧ ¬A ∧ M) = 2.99 × 10⁻⁸

• P(J ∧ B ∧ E ∧ A ∧ ¬M) = 0.000000513

• P(J ∧ B ∧ ¬E ∧ A ∧ ¬M) = 0.000253292

• P(J ∧ B ∧ E ∧ ¬A ∧ ¬M) = 4.95 × 10⁻⁹

• P(J ∧ B ∧ ¬E ∧ ¬A ∧ ¬M) = 2.96406 × 10⁻⁶

COMP9414 Uncertainty 35

Inference in Bayesian Networks

Diagnostic Inference: from effects to causes

  P(Burglary|JohnCalls) = 0.016

Causal Inference: from causes to effects

  P(JohnCalls|Burglary) = 0.85; P(MaryCalls|Burglary) = 0.67

Intercausal Inference: explaining away

  P(Burglary|Alarm) = 0.3736, but adding evidence, P(Burglary|Alarm ∧ Earthquake) = 0.003; even though burglaries and earthquakes are independent, given the alarm, evidence for one makes the other much less likely

Mixed Inference: combinations of the patterns above

  Diagnostic + Causal: P(Alarm|JohnCalls ∧ ¬Earthquake)
  Intercausal + Diagnostic: P(Burglary|JohnCalls ∧ ¬Earthquake)

COMP9414 Uncertainty 34

Calculation using Bayesian Networks

• Therefore P(J|B) = P(J ∧ B)/P(B) = Σe,a,m P(J ∧ B ∧ e ∧ a ∧ m) / Σj,e,a,m P(j ∧ B ∧ e ∧ a ∧ m) = 0.000849017/0.001

• P(J|B) = 0.849017

• Can often simplify the calculation without using full joint probabilities – but not always
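Enumeration over the hidden variables can be automated; a self-contained Python sketch using the network's CPTs:

```python
from itertools import product

# CPTs from the earlier slides (B, E parents of A; A parent of J, M)
p_b, p_e = 0.001, 0.002
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
p_john = {True: 0.90, False: 0.05}
p_mary = {True: 0.70, False: 0.01}

def pr(value, p_true):
    return p_true if value else 1 - p_true

def joint(b, e, a, j, m):
    return (pr(b, p_b) * pr(e, p_e) * pr(a, p_alarm[(b, e)])
            * pr(j, p_john[a]) * pr(m, p_mary[a]))

# P(J | B) = P(J ∧ B) / P(B), both obtained by summing atomic events
num = sum(joint(True, e, a, True, m)
          for e, a, m in product([True, False], repeat=3))
den = sum(joint(True, e, a, j, m)
          for e, a, j, m in product([True, False], repeat=4))
print(round(num / den, 6))  # 0.849017
```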


COMP9414 Uncertainty 37

Example – Causal Inference

• P(JohnCalls|Burglary)

• P(J|B) = P(J|A ∧ B).P(A|B) + P(J|¬A ∧ B).P(¬A|B)
  = P(J|A).P(A|B) + P(J|¬A).P(¬A|B)
  = P(J|A).P(A|B) + P(J|¬A).(1 − P(A|B))

• Now P(A|B) = P(A|B ∧ E).P(E|B) + P(A|B ∧ ¬E).P(¬E|B)
  = P(A|B ∧ E).P(E) + P(A|B ∧ ¬E).P(¬E)
  = 0.95 × 0.002 + 0.94 × 0.998 = 0.94002

• Therefore P(J|B) = 0.90 × 0.94002 + 0.05 × 0.05998 = 0.849017

• Fact 3: P(X|Z) = P(X|Y ∧ Z).P(Y|Z) + P(X|¬Y ∧ Z).P(¬Y|Z), since X ∧ Z ≡ (X ∧ Y ∧ Z) ∨ (X ∧ ¬Y ∧ Z) (conditional version of Fact 2)
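The same two-step computation in plain arithmetic (numbers from the CPTs):

```python
# P(A|B) by summing over Earthquake, then P(J|B) by summing over Alarm (Fact 3)
p_a_given_b = 0.95 * 0.002 + 0.94 * 0.998               # 0.94002
p_j_given_b = 0.90 * p_a_given_b + 0.05 * (1 - p_a_given_b)
print(round(p_j_given_b, 6))  # 0.849017
```

Note this reproduces the full-enumeration answer with only four multiplications.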


COMP9414 Uncertainty 36

Inference in Bayesian Networks

[Figure: four small networks illustrating the inference patterns – Diagnostic (evidence E below the query Q), Causal (E above Q), Intercausal (E and Q as co-parents of a common child), and Mixed (evidence both above and below Q)]

• Q = query; E = evidence

COMP9414 Uncertainty 39

Conclusion

• Due to noise or uncertainty it is useful to reason with probabilities

• Calculating with the joint probability distribution is difficult due to the large number of values

• Use of Bayes’ Rule and independence assumptions simplifies reasoning

• Bayesian networks allow compact representation of probabilities and efficient reasoning with probabilities

• Elegant recursive algorithms can be given to automate the process of inference in Bayesian networks

COMP9414 Uncertainty 38

Example – Diagnostic Inference

• P(Earthquake|Alarm)

• P(E|A) = P(A|E).P(E)/P(A)
  = [P(A|B ∧ E).P(B).P(E) + P(A|¬B ∧ E).P(¬B).P(E)]/P(A)
  = (0.95 × 0.001 × 0.002 + 0.29 × 0.999 × 0.002)/P(A)
  = 5.8132 × 10⁻⁴/P(A)

• Now P(A) = P(A|B ∧ E).P(B).P(E) + P(A|¬B ∧ E).P(¬B).P(E) + P(A|B ∧ ¬E).P(B).P(¬E) + P(A|¬B ∧ ¬E).P(¬B).P(¬E)

  And P(A|B ∧ ¬E).P(B).P(¬E) + P(A|¬B ∧ ¬E).P(¬B).P(¬E)
  = 0.94 × 0.001 × 0.998 + 0.001 × 0.999 × 0.998 = 0.001935122

  So P(A) = 5.8132 × 10⁻⁴ + 0.001935122 = 0.002516442

• Therefore P(E|A) = 5.8132 × 10⁻⁴ / 0.002516442 = 0.2310087

• Fact 4: P(X ∧ Y) = P(X).P(Y) if X, Y are independent
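The diagnostic calculation, replayed numerically (numbers from the CPTs):

```python
# Diagnostic inference P(Earthquake | Alarm) via the local sums above
p_ae = 0.95 * 0.001 * 0.002 + 0.29 * 0.999 * 0.002          # P(A ∧ E)
p_a = p_ae + 0.94 * 0.001 * 0.998 + 0.001 * 0.999 * 0.998   # + P(A ∧ ¬E)
p_e_given_a = p_ae / p_a
print(round(p_e_given_a, 7))  # 0.2310087
```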
