
Reasoning under Uncertainty
[AIMA 4G] Chapter 12

COSC1127/1125
Artificial Intelligence
Semester 2, 2021
Prof.
* Most of the slides are adapted from those of colleagues, and some are based on other instructors’ decks. Thanks!

Wominjeka!
Week 7
Probabilities
Prof.

‘Treaty’, by
1991

Something from the past?

IMPORTANT DISCLAIMER
This pack of slides is NOT core study
material. The core & main material is the book or any specific resources given for the week.
The slides in this presentation were prepared only to support the instructor during the lectorial, and only some of them were used in the session as needed.
This pack of slides does not represent the complete material of the week, but only a few bits here and there around what was discussed in lectorials. Most of the week’s content will NOT show up in this pack of slides.
Please refer to the book (and any resources specifically given) for a comprehensive coverage of the week material.

I have Pacman dreams…

Some news…

Project 2: marking is being sent…
Project Contest:
Released already.
Groups of 3 (2 or 4 in special cases).
THE: remember to make yourself available
Project 3 (RL): week 9-10
Will be done individually (as Project 1).
Optional: 3% course bonus marks.


Some feedback on project 2…
This project doesn’t really lend itself too well to being a group project: questions depend on each other a lot.
NOTE: team-work does not directly imply division of tasks but sharing of ideas to get to the solution jointly/together.
Much much better than assignment 1. Having teammates to work together with was essential during lockdown, where we can’t come in face to face and discuss tricky concepts with tutors in real time.
A very fun project, however even though it was a group project this time, the amount of work seemed about equal (maybe even potentially less) than P1.
Interesting project and nice variety of algorithms to learn
Otherwise, I enjoyed this assignment very much, because its open-ended nature gave a very stimulating and educational experience!
A lot of the pressure from the first assignment from having to “tough it alone” felt removed in this assignment, so it was nice to be able to focus on overcoming challenges and learning through working as a team.

Final Contest: Conquer the Flag
Worth 45%: 10% + 15% + 10% + 10%
ranking relative to reference staff agents + Wiki + Video
You MUST start NOW!
Will run “continuous” feedback tournaments:
submit your agent, see how it performs, fix, resubmit, …
Three marked submissions:
Preliminary contest (Week 9): 10%
Final contest (Week 12): 15%
Wiki report + Video (Week 13): 10% + 10%
Top performers will:
be inducted into the Inter-Uni Pacman Hall of Fame.
compete (for fun) with Mel Uni and RMIT winners!

LET THE SHOW BEGIN!
START NOW!!

Groups for Contest Project
Groups: 3 members.
Exceptions handled on a per-case basis.

Textual collaboration will be required in GitHub Teams.
Voice and video can be done elsewhere, of course.
Poor SE+Git practice will lose marks:
little communication, poor commit history, unprofessional conduct.

Remember to link your account well

Contributions for final project
Will have major impact.
Use co-authors feature in GitHub.
Change the driver.
Use Visual Studio Code Live Share.
Discuss in GitHub:
Teams.
Issues.
Projects.

Questions?


Topics for today…
“When it is not in our power to determine what is true, we ought to follow what is most probable.” ― René Descartes
Sample space & events.
Probability functions.
Random variables.
Probability distributions.
Joint distributions.
Conditional probabilities.
Inference:
By enumeration.
Bayes rule.

All this will be assessed in THE

Monday Sept 20th @ 9am – 9pm (12 hrs!).
No repeats or exceptions after it closes.
Make yourself available now for ~3hrs (if sharp).
Don’t take risks by leaving it too late.
Recommended to submit by 6pm at most.
Topics Week 1 – 8 (included).
Content from Book & Tutorials; no Pacman!
Answers entered ONLINE in a Google Form/Quiz:
Can use paper & pen for your notes.
Make sure you are logged into your RMIT Google Account (not your private account!)
You can submit once (as if handing in on paper)
No other submission allowed (e.g., email)
Will provide a “Sample” Exercise for dry-run.
No negative marking will be used! 🙂

Reasoning under Uncertainty
“When it is not in our power to determine what is true, we ought to follow what is most probable.” ― René Descartes

Theoretical reasoning:
use of reason to decide what to believe.

Practical reasoning:
use of reason to decide how to act.


A Decision Making Scenario
You are considering buying a used car…

Is it in good condition?
How much are you willing to pay?

Should you get it inspected by a mechanic?

Should you buy the car?

Another Decision Making Scenario

How long does it take to get to the airport?
How long before the flight should you leave?

30 minutes? “Might be okay”
90 minutes? “Should be fine”
150 minutes? “Yes, that will work”
12 hours? “What! Way too early!!”


Introduction
Logical reasoning “breaks down” when dealing with uncertainty and inductive reasoning.

A Diagnosis example:
∀p Symptom(p,Toothache) ⇒ Disease(p, Cavity)
But not all people with toothaches have cavities…
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, HitInJaw) ∨ …
Can’t enumerate all possible causes, and not very informative!
∀p Disease(p, Cavity) ⇒ Symptom(p,Toothache)
Does not work since not all cavities cause toothaches…

Issues with pure logical reasoning
We are lazy.
Too much work to write down all antecedents and consequences.

Theoretical ignorance.
Sometimes there’s just no complete theory.

Practical ignorance.
Even if we knew all the rules, we might be uncertain about a particular instance (not collected enough information yet).

Probabilities to the rescue
For many years AI danced around the fact that the world is an uncertain place.
Then a few AI researchers decided to go back to the 18th century:
Revolutionary.
Probabilities allow us to deal with uncertainty that comes from our laziness and ignorance.
Clear semantics.
Provide principled answers for:
combining evidence, predictive and diagnostic reasoning, incorporation of new evidence
Can be learned from data.
Intuitive for humans (?)

Linda problem
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
Linda is a bank teller.
Linda is a bank teller and is active in the feminist movement.
1771 5697 @ menti.com

Probability
Balance between probability and utility:
Too late, and we risk missing the plane.
Too early, and we waste too much time.

Need to use evidence of prior experience to predict:
“80% of people with a toothache have a cavity”
Toothache ⇒ Cavity ??
Toothache ⇒ Cavity ∨ Gum Problem ∨ Punch ∨ …


Sample Space and Events
Sample space Ω: all possible outcomes of an experiment.
e.g., roll dice experiment: Ω = {1,2,3,4,5,6}
for each w ∈ Ω we have a probability P(w) with:
0 ≤ P(w) ≤ 1 and ∑w P(w) = 1

Event E: a set of outcomes with certain probability P(E)
Any subset of Ω is a possible event.
e.g., E = {1,2,3,4} is the event that the die comes up less than 5.
Calculate the probability of an event: P(E) = ∑w ∈ E P(w).
P({1,2,3,4}) = P(1) + P(2) + P(3) + P(4) = 4/6 ≈ 0.67

Visualizing Event E
Sample space Ω: the event space of all possible outcomes. Its area is 1.
Event E: the worlds in which E is true. Its area is P(E).
The rest are the worlds in which E is false.

Discrete Random Variables
Not convenient to deal with (large) sets of events!
A random variable X describes an outcome that cannot be determined in advance (e.g., the roll of a die):
Discrete: possible values come from a countable domain.
If X is the outcome of a die throw, then X ∈ {1,2,3,4,5,6}.
Boolean random variable: X ∈ {True, False}
X = die will come up less than 5
X = the Australian PM in 2040 will be female
X = patient has Ebola

Discrete Random Variables
Think of a Boolean random variable as a proposition representing the event in which the proposition is true:
Proposition X: die outcome is less than 5. X values: {true, false}.
X represents event E = {1,2,3,4}: X is true for every element of E.
As in logic, we can combine such propositions!
Two random variables: ODD and X (less than 5)
P(ODD = true ∧ X = true) = ?
P(ODD = true ∨ X = true) = ?
P(a ∨ b) = P(a) + P(b) – P(a ∧ b)

Probabilities for beliefs!
Agents believe things about the world! Logic may be too black/white sometimes.
We let P(A) denote the “degree of belief” we have that statement A is true:
Also the “fraction of worlds in which A is true”.
Note: P(A) does NOT model degree of truth!
Example: draw a card from a shuffled deck.
The card is of some type (e.g., ace of spades).
Before looking at it: P(ace of spades) = 1/52.
After looking at it: P(ace of spades) = 1 or 0.
Philosophers like to discuss this (but we won’t).

Reasoning with Probability
Suppose an agent knows the basic probabilities:
knows P(X1), P(X2), ..., P(Xn)
What is the probability of complex beliefs?
P((X1 ∨ X8) ∧ (X3 ∨ ¬X1) → X9) = ?

Example:
X1 = dice is less than 3
X2 = dice is less than 5
What is P(X1 → X2)?
What is P(X2 → X1)?
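As a sanity check, here is a minimal Python sketch (illustrative, not course code) that answers these by brute-force enumeration over the sample space, reading P(A → B) as P(¬A ∨ B):

from fractions import Fraction

# Uniform distribution over the die's sample space {1,...,6}.
omega = range(1, 7)
p = {w: Fraction(1, 6) for w in omega}

def prob(event):
    # P(E) = sum of P(w) over the worlds w in which the event holds.
    return sum(p[w] for w in omega if event(w))

x1 = lambda w: w < 3   # X1: die comes up less than 3
x2 = lambda w: w < 5   # X2: die comes up less than 5

print(prob(lambda w: (not x1(w)) or x2(w)))   # P(X1 -> X2) = 1
print(prob(lambda w: (not x2(w)) or x1(w)))   # P(X2 -> X1) = 2/3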

The Axioms of Probability
For any random variable A: 0 ≤ P(A) ≤ 1
P(True) = 1 (you can see True = A ∨ ¬A)
P(False) = 0 (you can see False = A ∧ ¬A)
P(A ∨ B) = ???
P(A ∨ B) = P(A) + P(B) – P(A Λ B)

These axioms limit the class of functions that can be considered as probability functions

Interpreting the axioms

U = sample space, the universe. Its area is P(U) = 1
A is a random variable. Its area is 0 ≤ P(A) ≤ 1.
P(A ∨ B) = P(A) + P(B) – P(A Λ B)

Take the axioms seriously!
There have been attempts to use different methodologies for uncertainty:
Fuzzy logic, three-valued logic, Dempster-Shafer, non-monotonic reasoning, …
But if you follow the axioms of probability, then no one can take advantage of you ☺
You can always calculate what the best “bet” is. Then you act accordingly.
In the long run, it is the “best you can do”...

Theorems from the axioms
Theorem: P(¬A) = 1 – P(A)
Proof:
P(A ∨ ¬A) = P(A) + P(¬A) – P(A Λ ¬A)
P(True) = P(A) + P(¬A) – P(False)
1 = P(A) + P(¬A) – 0
P(¬A) = 1 – P(A)
PROVED!

Theorems from the axioms
Theorem: P(A) = P(A Λ B) + P(A Λ ¬B)
Proof: for you to do! Why? Because it is good for you.

Some issues leading to conditional probabilities...
Where does each P(X) come from?
How can we extract P(A Λ B)?
P(A|B): how can I infer the probability of A given that I know that B is already true?
Chance of having cavities given that I have a toothache.
Easy! I have done all my practicals on logic! Just calculate P(B → A) = P(¬B ∨ A)!

Linda problem formally...
A = “Linda is 31 years old, single, outspoken, and …”
B1 = “Linda is a bank teller.”
B2 = “Linda is a bank teller and is active in the feminist movement.”
Which is higher?
P(B1 | A).
P(B2 | A).

Conjunction Fallacy
A = “Linda is 31 years old, single, outspoken, and …”
B1 = “Linda is a bank teller.”
B2 = “Linda is a bank teller and is active in the feminist movement.”
But, observe that B2 = B1 Λ C, where:
C = Linda is active in the feminist movement.
Can it ever be the case that P(B1 Λ C) > P(B1)???

Where does P(X) come from?
Prior probabilities come from a probability distribution.

Weather is one of Sunny, Rain, Cloudy, Snow

P(Weather) = < 0.6, 0.1, 0.29, 0.01 >

Weather      Sunny   Rain   Cloudy   Snow
Probability  0.6     0.1    0.29     0.01

Needs to sum up to 1!
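A quick illustrative check in Python (the dict encoding is assumed, not course code):

weather = {"Sunny": 0.60, "Rain": 0.10, "Cloudy": 0.29, "Snow": 0.01}
assert all(0.0 <= pr <= 1.0 for pr in weather.values())  # each in [0, 1]
assert abs(sum(weather.values()) - 1.0) < 1e-9           # sums up to 1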


Terminology
Probability distribution:
A specification of a probability for each event in our sample space.
Probabilities must sum to 1!

Assume the world is described by two (or more) random variables:
Joint probability distribution
Specification of probabilities for all combinations of events.

Example: Joint Distribution
sunny:
            cold    ¬cold
headache    0.108   0.012
¬headache   0.016   0.064

¬sunny:
            cold    ¬cold
headache    0.072   0.008
¬headache   0.144   0.576
P(headache Λ sunny Λ cold) = 0.108
P(headache V sunny) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(headache) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
marginalization
P(headache Λ ¬sunny Λ ¬cold) = 0.008

Joint distribution

Given two random variables A and B:

Joint distribution:
Pr(A=a Λ B=b), for all a,b

Marginalisation (sum-out rule):
Pr(A = a) = Σb Pr(A=a Λ B=b)
Pr(B = b) = Σa Pr(A=a Λ B=b)
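As a minimal sketch (the dict-of-tuples encoding is my own, not the course’s): the joint Pr(Headache, Cold) below is what the lecture’s three-variable table gives after summing out Sunny; the sum-out rule then marginalises whichever variable we do not care about:

# Joint Pr(Headache, Cold), keyed by (headache?, cold?).
joint = {
    (True, True): 0.18,  (True, False): 0.02,
    (False, True): 0.16, (False, False): 0.64,
}

def marginal(index, value):
    # Sum-out rule: add up all joint entries that agree on one variable.
    return sum(pr for key, pr in joint.items() if key[index] == value)

print(round(marginal(0, True), 3))   # Pr(headache) = 0.18 + 0.02 = 0.2
print(round(marginal(1, True), 3))   # Pr(cold)     = 0.18 + 0.16 = 0.34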

Conditional Probability
P(A | B) = probability of A being true given that we know B.
For example: P(cavity | toothache Λ headache)
More general: P(illness | symptoms)
where is the problem?
Easy! Just compute P(B → A)!
P(B → A) = P(¬B v A) = P(¬B) + P(A) – P(¬B Λ A)

P(A|B) & P(B → A)
P(¬flies | bird): proportion of birds that do not fly, which would be low.

P(bird → ¬flies) = P(¬bird ∨ ¬flies): would be dominated by non-birds and so would be high!

P(bird →flies): would also be high, the probability also being dominated by the non-birds.
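To see the gap with concrete numbers (made up for illustration), count worlds: say there are 1000 things, 100 of them are birds, and 95 of those birds fly:

# Hypothetical head-count model: 1000 things, 100 birds, 95 flying birds.
things, birds, flying_birds = 1000, 100, 95

# P(¬flies | bird): fraction of birds that do not fly -- low.
print((birds - flying_birds) / birds)    # 0.05

# P(bird -> ¬flies) = P(¬bird ∨ ¬flies) = 1 - P(bird ∧ flies) -- high,
# since the 900 non-birds alone already contribute 0.9 to it.
print(1 - flying_birds / things)         # 0.905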

Conditional Probability
P(A|B): fraction of worlds in which B is true that also have A true.

H
F
H=“Have headache”
F=“Have Flu”

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
Headaches are rare and flu is rarer, but if you have the flu, then there is a 50-50 chance you will have a headache.

Conditional Probability
H=“Have headache”
F=“Have Flu”

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2

H
F
P(H|F) = fraction of flu-inflicted worlds in which you have a headache

= (# worlds with flu and headache)/
(# worlds with flu)

= (Area of “H and F” region)/
(Area of “F” region)

= P(H Λ F) / P(F)

Conditional Probability

Definition:
P(A|B) = P(A Λ B) / P(B)

Chain rule:
P(A Λ B) = P(A|B) P(B)

Memorize these!

Inference
H=“Have headache”
F=“Have Flu”

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
One day you wake up with a
headache. You think: “Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with the flu.”

H
F
Is your reasoning correct?

Inference
H=“Have headache”
F=“Have Flu”

P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
One day you wake up with a
headache. You think: “Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with the flu.”

H
F
P(FΛH) = P(F)P(H|F) = 1/40 × 1/2 = 1/80

P(F|H) = P(FΛH)/P(H) = 10/80 = 1/8
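The same computation as a quick check in Python, with exact fractions (illustrative only):

from fractions import Fraction

p_h = Fraction(1, 10)          # P(H): headache
p_f = Fraction(1, 40)          # P(F): flu
p_h_given_f = Fraction(1, 2)   # P(H|F)

p_f_and_h = p_f * p_h_given_f  # chain rule: P(F Λ H) = P(F) P(H|F)
print(p_f_and_h)               # 1/80
print(p_f_and_h / p_h)         # P(F|H) = 1/8, far from the naive 50-50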

Computing conditional probabilities
Generally we want to compute query variables Y given evidence variables E, where evidence E has some value e.
There may be hidden (i.e., unobserved) variables H:
H = U – Y – E
If we had the joint probability distribution then could marginalize:
P(Y|E=e) = α Σh P(Y Λ E=e Λ H=h)
(α is the normalization factor)

Computing conditional probabilities
sunny:
            cold    ¬cold
headache    0.108   0.012
¬headache   0.016   0.064

¬sunny:
            cold    ¬cold
headache    0.072   0.008
¬headache   0.144   0.576
P(headache Λ cold | sunny) =
= P(headache Λ cold Λ sunny) / P(sunny)
= 0.108/(0.108+0.012+0.016+0.064)
= 0.54
P(headache Λ cold | ¬sunny) =
= P(headache Λ cold Λ ¬sunny) / P(¬sunny)
= 0.072/(0.072+0.008+0.144+0.576)
= 0.09

Inference by Enumeration
            toothache         ¬toothache
            catch    ¬catch   catch    ¬catch
cavity      0.108    0.012    0.072    0.008
¬cavity     0.016    0.064    0.144    0.576

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
For a proposition Φ, add up events where it holds:

P(Φ) = ∑{w : w ⊨ Φ} P(w)


Inference by Enumeration
            toothache         ¬toothache
            catch    ¬catch   catch    ¬catch
cavity      0.108    0.012    0.072    0.008
¬cavity     0.016    0.064    0.144    0.576

For a proposition Φ, add up events where it holds

P(Φ) = ∑{w : w ⊨ Φ} P(w)
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28


Inference by Enumeration
            toothache         ¬toothache
            catch    ¬catch   catch    ¬catch
cavity      0.108    0.012    0.072    0.008
¬cavity     0.016    0.064    0.144    0.576

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / 0.2 = 0.08 / 0.2
= 0.4


Calculating Distributions
            toothache         ¬toothache
            catch    ¬catch   catch    ¬catch
cavity      0.108    0.012    0.072    0.008
¬cavity     0.016    0.064    0.144    0.576

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / 0.2 = 0.12 / 0.2
= 0.6

Final distribution: P(Cavity | toothache) = < 0.6, 0.4 >


Calculating Distributions

Let U be the set {Cavity, Toothache, Catch}
H = U – Y – E

P(Y | E = e) = α P(Y, E = e) = α ∑h P(Y, E = e, H = h)

As U = E ⋃ Y ⋃ H, this sum can be computed from the table.

Could be quite large!
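As a minimal Python sketch of the whole recipe (my own encoding, not the course’s code), run on the toothache table: fix the evidence Toothache = true, sum out the hidden variable Catch, then normalise:

# Joint distribution keyed by (cavity?, toothache?, catch?).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p_cavity_given_toothache(toothache=True):
    # Sum out the hidden variable Catch for each value of the query.
    unnorm = {c: sum(joint[(c, toothache, k)] for k in (True, False))
              for c in (True, False)}
    alpha = 1 / sum(unnorm.values())          # normalisation constant α
    return {c: round(alpha * pr, 3) for c, pr in unnorm.items()}

print(p_cavity_given_toothache())   # {True: 0.6, False: 0.4}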


Bayes rule: Motivation
Suppose we want to know P(H | E), where:
E represents symptoms evidence (‘toothache’)
H represents illness or condition (‘cavity’)

… but we often have P(E | H):
95% of the people with a cavity experience toothache.
82% of patients with disease X test positive in test T (“test T is 82% accurate”).

Want to form a hypothesis about the world (H) based on what we have observed (E).

Bayes Rule

From P(A|B)P(B) = P(AΛB) = P(BΛA) = P(B|A)P(A)
Bayes rule/theorem allows us to “invert” the conditional probability:

P(A|B) = P(B|A) P(A) / P(B)

Memorize this!!

Using Bayes Rule for inference
Bayes rule allows us to state the belief given to hypothesis H, given the evidence e:

P(H | e) = P(e | H) P(H) / P(e)

P(H | e): posterior probability
P(e | H): likelihood
P(H): prior probability
P(e): normalizing constant

Example
A doctor knows that the blue flu causes a fever 95% of the time.
She knows that if a person is selected at random from the population, they have a 10⁻⁷ chance of having the blue flu.
She knows that 1 in 100 people suffer from a fever.

You go to the doctor complaining about the symptom of having a fever.
What is the probability that the blue flu is the cause of the fever?

Example
A doctor knows that the blue flu causes a fever 95% of the time.
P(F|B) = 0.95
She knows that if a person is selected at random from the population, they have a 10⁻⁷ chance of having the blue flu.
P(B) = 10⁻⁷
She knows that 1 in 100 people suffer from a fever.
P(F) = 1/100

Example
P(F|B) = 0.95
P(B) = 10⁻⁷
P(F) = 0.01

You go to the doctor complaining about the symptom of having a fever. What is the probability you are suffering from the blue flu?

Evidence = Symptom (F)
Hypothesis = Cause (B)
P(B | F) = P(F | B)P(B) / P(F)
= 0.95 × 10⁻⁷ / 0.01
= 0.95 × 10⁻⁵
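And as a last sanity check of the arithmetic in Python (illustrative only):

p_f_given_b = 0.95   # P(F|B): flu causes fever 95% of the time
p_b = 1e-7           # P(B): prior of having the blue flu
p_f = 0.01           # P(F): 1 in 100 people have a fever

print(p_f_given_b * p_b / p_f)   # ≈ 9.5e-06, i.e. 0.95 × 10⁻⁵: still tiny!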

Conclusions
“When it is not in our power to determine what is true, we ought to follow what is most probable.” ― René Descartes
Sample space & events.
Probability functions.
Random variables.
Probability distributions.
Joint distributions.
Conditional probabilities.
Inference:
By enumeration.
Bayes rule.

Whiteboard used in lectorials