Reasoning With Uncertainty

"So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they do not refer to reality." – Albert Einstein

"All traditional logic habitually assumes that precise symbols are being employed. It is therefore not applicable to this terrestrial life, but only to an imagined celestial existence." – Bertrand Russell

Text: Chapter 5, and Chapter 9 (9.2 only, to the end of 9.2.2)

Logical Implication

First-order logic gives us the ability to derive new (not explicitly represented) statements using logical inference.

In order to use implication to generate conclusions, our statements have to be axioms (completely true statements), and our logical system must be correct (no inconsistencies) and complete (no missing knowledge).

Everything we derive in FOL stems from the idea of logical inference – we prove theorems that are a logical consequence of the axioms in the knowledge base:

A1 ^ A2 ^ … ^ An → Theorem   (the arrow is logical implication)

The truth of such a theorem rests solidly on the mathematical certainty of logical inference:
• If the knowledge base is incomplete, the system will not be able to prove theorems
• If the knowledge base is inconsistent, the conclusions will not necessarily be valid
The Real World

The real world does not work like this! Most everything is uncertain to some degree – so much so that the more you think about it, the more you wonder why anything works at all. Yet we deal with this relatively easily in most cases.

Reasoning with Uncertain Knowledge

We need to be able to reason logically IN SPITE OF uncertainty in order to solve problems in the real world. That is, we need to take uncertainty into account in our reasoning: knowledge will not be guaranteed complete and correct (like the real world!).

We would like to be able to generate conclusions that are correct most of the time with respect to the domain, but which may not be logically implied by the axioms.

To do this, the system must use additional supports when necessary to deal with uncertainty. These supports can be used to fill in the gaps in uncertain knowledge – to take the uncertainty into account:

A1 ^ s1 ^ A2 ^ s2 ^ … → Conclusion   (the s's are additional supports for the reasoning process!)

The result of the inference process is a "conclusion", not a formal theorem:
• the conclusion is probably (hopefully very often) true, but is not guaranteed to be true

No matter what the source of the uncertainty, this type of reasoning falls under the general heading of Reasoning Under Uncertainty. However, there are really two broad types of such reasoning, and we'll consider them separately.
Reasoning with Incomplete Knowledge

One of the most common problems encountered in real-world situations is dealing with incomplete knowledge. In these situations, we have to draw conclusions in spite of the fact that we don't have access to all potential information that might influence that decision.

For example, we can represent the fact that birds can fly:

If bird(X) Then fly(X)

However, there are exceptions to this rule; for example, ostriches cannot fly. We can add that to our rule (qualify it for ostriches):

If bird(X) and ~ostrich(X) Then fly(X)

There are additional exceptions, such as baby birds, birds with broken wings, dead birds (and many other examples that quickly stretch the bounds of good taste), etc.:

If bird(X) and ~ostrich(X) and ~baby(X) and ~dead(X) Then fly(X)

The number of such qualifications is essentially endless (there may even be endless qualifications to the qualifications):
• e.g. dead birds can't fly, unless you're in a cartoon, or they're on a plane, or…

Adding an infinite number of qualifiers isn't possible. This is the Qualification Problem.

And think of the amount of work this involves! We can't conclude that Fred can fly unless we can also prove that Fred is not an ostrich, is not a baby, is not dead, etc. We need a huge amount of knowledge about the properties that Fred does not have in order to make conclusions about the properties that Fred does have. How far do we go proving each of these before it's acceptable? When do we stop? The process is essentially endless.

We have to add extra supports to FOL to be able to make inferences and avoid this.
Negative vs. Positive Knowledge

In most knowledge bases, there is a small number of positive facts compared with the total number of facts. For example, an airline stores the locations that it provides flights between:

flight(96, winnipeg, toronto)
flight(74, toronto, montreal)

…but not all the points it doesn't fly between. We do not represent negative information, such as there being no direct flight from Winnipeg to San Diego, or no flight from Winnipeg to Grand Beach:

~flight(434, winnipeg, sandiego)

Closed World Assumption

The closed world assumption states that if a fact is not represented explicitly in a knowledge base, and the fact is not implied by the knowledge base, then the fact is assumed to be false.

The closed world assumption allows a system to make "reasonable" assumptions about knowledge that is missing from the system. So Fred would be assumed to be not an ostrich, not a baby, and not dead, unless the system could prove otherwise.

The system must still expend effort to prove that each of these facts is not true before it can assume they are false and then conclude that Fred can fly – we must limit what we represent to fit the practical aspects of the problem.

Negation as Failure

The closed-world assumption is implemented in Prolog as negation as failure: failure to prove a statement is taken to mean that the statement is false.

A problem with negation as failure is differentiating between a statement that is false and a statement for which the truth can't be determined (we may have accidentally omitted the necessary knowledge, or not have developed that far yet). In an empty Prolog system, for example:

?- alive(tweety).
no.
?- dead(tweety).
no.
?- not alive(tweety).
yes.
?- not dead(tweety).
yes.
Negation as Failure

Unless we know explicitly that alive(tweety) is true or that dead(tweety) is true, the negation of alive(tweety) will be true and the negation of dead(tweety) will also be true:
• this leads to multiple interpretations (via backtracking) of the knowledge, which is typically not acceptable
• we have to eliminate the multiple interpretations by including axioms that allow us to resolve them

Prolog does not permit rules that conclude a negation. But what if it did? What would be the problem with using this in Prolog:

If ostrich(X) Then not fly(X)

While the intent of this rule may be correct, it does not provide the correct result. When combined with the rule that all birds can fly, it leads to a contradiction (via backtracking):
• e.g. we find that Fred can fly because he is a bird… and then that he can't because he is an ostrich.

Reasoning with Defaults

Defaults can be used to supply knowledge that is missing from an incomplete knowledge base. This is Defeasible Reasoning.

In the previous example, there is no knowledge that indicates what should be assumed in the absence of other knowledge. Default rules have the following form:

If ( p(X) and ~ab(X) ) Then g(X)

This rule states that if p(X) is true and it cannot be proved that X is abnormal (ab(X)), then conclude that g(X) is true. If ab(X) does not exist as a fact or an inference, then the system can assume that g(X) is true.

The implementation of defaults normally requires the use of negation as failure.
Reasoning with Defaults

The fact that specific birds cannot fly is represented explicitly:

If ostrich(X) Then flightless(X)
If (bird(X) and ~flightless(X)) Then fly(X)

The system will now conclude that ostriches can't fly, but that all other birds can fly. The predicate flightless(X) is the abnormal predicate.

Reasoning with Defaults in Prolog

Default knowledge is represented in Prolog in a similar manner:

bird(X) :- ostrich(X).
flightless(X) :- ostrich(X).
fly(X) :- bird(X), not flightless(X).

The abnormal predicate (flightless) blocks the use of the default rule (fly) when a bird is abnormal (i.e. can't fly). And the rule is set to conclude in a positive term, as Prolog requires.

In this example, there is only one valid interpretation for a specific set of facts and rules:
• either a bird can fly or it can't fly
• both interpretations are not possible

For the statements below, however, 2 interpretations are possible (fly and flightless) for the same object, because the default rule is not blocked by the flightless predicate – Prolog will prove flightless for an ostrich and can separately prove fly:

bird(X) :- ostrich(X).
flightless(X) :- ostrich(X).
fly(X) :- bird(X).
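To make the contrast concrete, here is a minimal sketch (the fact ostrich(fred) is our own illustrative addition). With the blocked default:

ostrich(fred).
bird(X) :- ostrich(X).
flightless(X) :- ostrich(X).
fly(X) :- bird(X), not flightless(X).

?- fly(fred).
no.
?- flightless(fred).
yes.

With the unblocked version (fly(X) :- bird(X).), both fly(fred) and flightless(fred) would succeed – the two contradictory interpretations described above.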
Multiple Levels of Defaults

Suppose some penguins have been genetically altered so that they are able to fly:

genetic_penguin(opus).
penguin(X) :- genetic_penguin(X).
bird(X) :- penguin(X).
can_fly(X) :- genetic_penguin(X).
flightless(X) :- penguin(X), not can_fly(X).
fly(X) :- bird(X), not flightless(X).

There is still only one possible interpretation for each bird – though this obviously gets tricky as we get to many levels of exceptions.

Multiple Extensions

Some problems cannot easily be defined so that they do not produce contradictions. This type of reasoning involves multiple extensions.

e.g., the "Nixon Diamond" – Quakers are pacifists; Republicans are not pacifists; Nixon is a Quaker and a Republican (Ginsberg, p. 218). Is Nixon a pacifist or not?

Unless one rule blocks the other rule, both interpretations are possible – the classic multiple inheritance problem.
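A sketch of the Nixon Diamond in plain Prolog (the predicate names are ours): without priorities, both readings are defensible, so the programmer must pick a blocking direction by hand. Here the Republican default is arbitrarily given priority over the Quaker one:

quaker(nixon).
republican(nixon).

abnormal_republican(_) :- fail.    % no exceptions to the Republican default
abnormal_quaker(X) :- republican(X), not abnormal_republican(X).
pacifist(X) :- quaker(X), not abnormal_quaker(X).

?- pacifist(nixon).
no.

Reversing which abnormality rule blocks the other would make pacifist(nixon) succeed instead – the choice is ours, not the logic's.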
Defeasible Reasoning

A significant amount of care must go into the representation of reasoning with defaults in Prolog. An extension of Prolog, d-Prolog, has been designed to provide the facilities required for default reasoning; d-Prolog takes advantage of the facility in Prolog for defining new operators.

Monotonic Knowledge

Classical inference in first-order logic is monotonic: any sentence that is true in one knowledge base is also true in any knowledge base that is a superset of that knowledge base. Adding new knowledge does not invalidate existing knowledge.
Nonmonotonic Knowledge

A knowledge base that uses negation as failure is nonmonotonic: adding a new piece of knowledge may invalidate a previous conclusion.

bird(opus).
?- fly(opus).
yes.

penguin(opus).
?- fly(opus).
no.

genetic_penguin(opus).
?- fly(opus).
yes.

A nonmonotonic knowledge base should still have unique interpretations:
• a specific object can either fly or it can't
• both interpretations of the same knowledge should not be possible

Non-Monotonic Systems

Prolog has predicates assert() and retract() for specific predicates, allowing new facts to be defined or deleted on the fly – see the documentation for whatever Prolog you are using. Adding a new fact should not cause multiple interpretations in a properly defined system:

?- assert(bird(opus)).
yes.
?- fly(opus).
yes.
?- assert(penguin(opus)).
yes.
?- fly(opus).
no.
?- assert(genetic_penguin(opus)).
yes.
?- fly(opus).
yes.

Incomplete vs. Imprecise Knowledge

We have thus far been talking about what the knowledge base does and doesn't contain, as opposed to the truth of the statements in it. So far we have assumed that at any specific point in time each statement is either completely true or false – nothing exists in between:

male(bill)
parent(bill, sally)

It is easy (but not entirely accurate) to assume there is no vagueness in these concepts – but most of the world is much less black and white.
Reasoning with Imprecise Knowledge

Much of the knowledge used in real-world situations is not either true or false; instead, it is true some of the time but not all of the time. Such knowledge is referred to as imprecise knowledge:

If sore-throat(X) then have-cold(X)

We could use default rules to block this conclusion in cases we know about, but more often we want to know how strongly we should believe you have a cold given that you have a sore throat (vs. if sore-throat(X) then throat-cancer(X), for example!).

Representing Imprecise Knowledge

To represent imprecise knowledge, we need to represent the degree of truth of a statement. We can do this by assigning a numerical measure to each statement in some fashion. We first need to decide what kind of numeric scale we're going to use…

For example, a true statement could have a measure of +1, a false statement a measure of -1, and other statements would have floating-point measures between -1 and +1 (a scale running from F at -1 to T at +1). A statement with a measure of 0.9 would be true most of the time, but not all of the time.

Now that we have a scale, we can talk about two classic problems that come up whenever we reason with imprecise information…

(1) Where do the numbers come from?
• Statistics?
• Estimates based on lots of experience (e.g. ask our expert)?
• Averages over lots of experts?
• Good general guesses?
• e.g. why 0.85, and not 0.84 or 0.86?
The Second Problem

(2) What do we DO with these numbers when we make inferences?

e.g. if I have some imprecise measure of having a sore throat (i.e. I'm not certain I do – I think I might be getting one), and I have a rule with some other likelihood saying that that means I have strep throat, then what's the overall likelihood that I have strep throat?

What about when I add other symptoms (e.g. I'm kind of feverish too)?

And if I accept that I have strep throat but am not completely certain, and I have another rule that says that if I have strep throat I should likely have some antibiotic, what's the overall likelihood of needing the antibiotic?

Propagating and Combining Measures

Any scheme for dealing with imprecise reasoning must have some valid way of making inference after inference, propagating or combining the measures we put into each.
Certainty Factors

Certainty factors were developed for use in the Mycin expert system, in order to represent and manipulate the numerical measures associated with imprecise statements.

A certainty factor is a measure of belief that a statement is true, and may be associated with a simple fact or with a rule.

For example, the belief that I have a sore throat can be represented by including a certainty factor with a fact:

sore-throat 0.9

Similarly, the belief that if I have a sore throat then I have a cold can be represented by including a certainty factor with a rule:

If (sore-throat) Then have-cold 0.75
Manipulating Certainty Factors

The certainty factors going into a rule must be combined in such a way that the certainty factor of the outcome is reasonable, given the characteristics of the domain:

• sore-throat 0.9
• If (sore-throat) Then have-cold 0.75

so – have-cold?

Remember that these will also cascade through chains of rules, and the end outcome must still be reasonable.

Even within the general realm of certainty factors, there are numerous means of combining these – they arose as ad hoc approaches for particular expert systems. There is no "best" technique for combining certainty factors – different techniques work well in some domains and poorly in others. The Mycin developers found one technique that worked well much of the time, though:

If ( Ax ) Then Cy    (where x, y are certainty factors)

The resulting CF of C is x*y – the certainty of the result of a rule is affected by the certainty of the facts in the rule's condition.

If ( Ax ^ By ) Then …    the resulting CF of the premise is min(x,y)
If ( Ax v By ) Then …    the resulting CF of the premise is max(x,y)

We're only as certain as the lowest participant in a conjunction, but as certain as the maximum participant in a disjunction.

If (Ax) Then C
If (By) Then C

Here we have 2 different rules being applied to give C:
• think of a medical domain, where I have a symptom and it indicates some disease
• a second, independent piece of evidence indicating that disease further supports (in fact usually much more strongly) our certainty in C

Mathematically:
Manipulating Certainty Factors

If the evidence is independent, the resulting CF of C is:
• x+y-(x*y)               if x and y > 0
• x+y+(x*y)               if x and y < 0
• (x+y)/(1-min(|x|,|y|))  otherwise

If (A 0.5) Then C
If (B 0.75) Then C
The CF of C is 0.5+0.75-(0.5*0.75) = 0.875

If (A -0.5) Then C
If (B -0.75) Then C
The CF of C is (-0.5)+(-0.75)+(0.5*0.75) = -0.875

If (A -0.5) Then C
If (B 0.75) Then C
The CF of C is (-0.5+0.75)/(1-0.5) = 0.5
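These combination rules are easy to express directly; a minimal sketch in Prolog (the cf_* predicate names are our own, not from Mycin):

% cf_and(+X, +Y, -CF): CF of a conjunctive premise (Ax ^ By).
cf_and(X, Y, CF) :- CF is min(X, Y).

% cf_or(+X, +Y, -CF): CF of a disjunctive premise (Ax v By).
cf_or(X, Y, CF) :- CF is max(X, Y).

% cf_rule(+PremiseCF, +RuleCF, -CF): CF of a rule's conclusion.
cf_rule(X, Y, CF) :- CF is X * Y.

% cf_combine(+X, +Y, -CF): two independent rules concluding the same C.
cf_combine(X, Y, CF) :- X > 0, Y > 0, !, CF is X + Y - X*Y.
cf_combine(X, Y, CF) :- X < 0, Y < 0, !, CF is X + Y + X*Y.
cf_combine(X, Y, CF) :- CF is (X + Y) / (1 - min(abs(X), abs(Y))).

% ?- cf_combine(0.5, 0.75, CF).   ->  CF = 0.875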
Certainty Factors

A medical diagnosis system could make diagnoses based on the symptoms that are present:

If (chest-pain) then cardiac-arrest
If (left-arm-pain) then cardiac-arrest
If (left-arm-pain) then rotator-cuff-injury
If (left-arm-pain ^ neck-pain) then spinal-cord-injury
If (shortness-breath v chest-pain) then over-exertion
Certainty Factors

It is important to differentiate between symptoms that are independent and symptoms that are related.

If (chest-pain) then cardiac-arrest
If (left-arm-pain) then cardiac-arrest

These two symptoms are independent: either symptom can lead to the conclusion, and both symptoms together increase the confidence in the conclusion.

If (left-arm-pain ^ neck-pain) then spinal-cord-injury

Here there is a connection between the conditions: both symptoms must be present for the conclusion to be true – the rule is useless without both.

If (shortness-breath v chest-pain) then over-exertion

These symptoms are related too, so it should not increase the confidence in the diagnosis if both are present – if they were independent, we would put them in separate rules, and they'd be combined using the formula that increases confidence when both are present.

It is your decision to implement conditions as separate rules (independent) or within one rule (dependent), depending on whether the evidence is independent or not.
Certainty Factors

left-arm-pain(george) 0.7
neck-pain(george) 0.8

if (left-arm-pain(X) ^ neck-pain(X)) then spinal-cord-injury(X) 0.9

gives us: spinal-cord-injury(george) 0.9*min(0.7,0.8) = 0.9*0.7 = .63

Certainty Factors

If (shortness-breath(X) v chest-pain(X)) then over-exertion(X) 0.9

If we had: chest-pain(george) 0.8
over-exertion(george) 0.9*0.8 = .72

but if we had: shortness-breath(george) 0.7
over-exertion(george) 0.9*0.7 = .63

Think about backward reasoning with this, e.g. in Prolog – we now have to explore both branches to get an accurate picture of the certainty, both here and for rules with independent evidence!
Certainty Factors

if (chest-pain(X)) then cardiac-arrest(X) 0.9
if (left-arm-pain(X)) then cardiac-arrest(X) 0.9

With chest-pain(george) 0.7, this gives us cardiac-arrest(george) 0.9*0.7 = .63
With left-arm-pain(george) 0.8, this gives us cardiac-arrest(george) 0.9*0.8 = .72

Combining these: cardiac-arrest(george) 0.9*(0.7+0.8-(0.7*0.8)) = .846

Certainty Factors

For the following symptoms:

left-arm-pain(george) 0.7
neck-pain(george) 0.8
shortness-breath(george) 0.7
chest-pain(george) 0.8

the following conclusions (diagnoses) would be made:

spinal-cord-injury(george) 0.9*0.7 = .63
over-exertion(george) 0.9*0.8 = .72
cardiac-arrest(george) 0.9*(0.7+0.8-(0.7*0.8)) = .846

For this particular collection of symptoms, the most likely diagnosis for george is cardiac-arrest. If a system based on logical implication had been used instead, all 3 diagnoses would have been generated, but there would not have been any information indicating which diagnosis was most strongly believed.

The certainties associated with the individual symptoms were (obviously!) not realistic – they were used solely to illustrate the process.

Prolog Certainty Factors

Implementing certainty factors in Prolog requires a change to the predicates used to represent knowledge, plus the addition of predicates to manipulate the certainty factors:

symptom(fact).        /* or relation(X,Y) */

becomes

symptom(fact, cf).    /* or relation(X,Y,CF) */

e.g. symptom(neck-pain). becomes symptom(neck-pain, 0.95).
Prolog Certainty Factors

Prolog rules need to be written to modify the certainties of conclusions and implement the mathematics we've seen – i.e., to bind a concluding CF, or to evaluate the CF of a rule condition, you need to pass the participating CFs to relations that will calculate the result for you.

Queries also have to have a place for this:

?- runsystem(E1, E2, Conclusion, CF)

"take E1 and E2 and bind the results to Conclusion and CF"

Because there will possibly be multiple rules with the same conclusions but different certainty factors (because they had different inputs), you may have to explore multiple possibilities, keep a list of them, and then look for the one with the highest CF. You will likely be doing this anyway, since there are often different outcomes (like the different diagnoses of george) and you need to choose the most appropriate one.
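As an illustration only (runsystem above is just a placeholder name, and these predicate names are ours), a diagnosis rule might pass CFs through the cf_* helpers sketched earlier, with underscores replacing hyphens to keep the atoms plain:

symptom(left_arm_pain, 0.7).
symptom(neck_pain, 0.8).

% Conjunctive premise (min), multiplied through the rule's own CF (0.9):
diagnosis(spinal_cord_injury, CF) :-
    symptom(left_arm_pain, X),
    symptom(neck_pain, Y),
    cf_and(X, Y, P),
    cf_rule(P, 0.9, CF).

% Collect all candidate diagnoses and pick the strongest:
best_diagnosis(D, CF) :-
    findall(CF0-D0, diagnosis(D0, CF0), Pairs),
    sort(0, @>=, Pairs, [CF-D|_]).

% ?- diagnosis(spinal_cord_injury, CF).   ->  CF = 0.63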
Certainty Factors Summary

Certainty factors permit the definition and manipulation of knowledge that is more complex than default knowledge:
• evidence can be combined from a variety of sources, with each piece of evidence contributing towards increasing or decreasing the confidence in the various conclusions
• conflicting information can be processed

Recall our two basic problems in imprecise reasoning. In CFs, the measures may come from statistical sampling, or may just be estimates obtained from an expert in the domain. And while the numbers may sometimes seem overprecise, they can be translated back into qualitative terms (e.g. 0.6 < X < 0.7 = somewhat-likely).

One Downside of Certainty Factors

The manipulation mechanisms are somewhat "ad hoc" – they work, but do not have a really strong theoretical basis.
• There are often "fudge factors" inserted, like x*y+|n| when multiplying through a rule, to avoid a CF getting too small too quickly
Certainty Factors vs. Probabilities

Certainty factors are different from probabilities:
• a probability of 1 means true
• a probability of 0 means false
• a probability of 0.5 means that an event has a 50% chance of occurring

A certainty factor of 0, by contrast, means that we know nothing about whether the event will occur or not (the Jon Snow point!).

The major advantage of certainty factors over probabilities is that combined probabilities decrease towards 0 (false), while certainty factors decrease towards 0 (not known).

Probabilistic Approaches

The advantage of more probabilistic approaches is that the combination mechanisms have a much more formal basis – i.e. what we're saying has a strong theoretical foundation; it's not just "kind of nice".

There are numerous probabilistic approaches, many based on Bayesian Reasoning.
Probabilistic Approaches

The degree of belief in any statement is based on the evidence we accumulate for that belief. This implies that evidence changes over time, and allows us to define two kinds of probabilities.

Prior vs. Conditional Probabilities

We call an estimate of probability before any evidence is obtained a Prior probability. An estimate after a particular piece of evidence is obtained is a Posterior or Conditional probability:
• because it's conditional on having that particular piece of evidence
• e.g. the likelihood of a patient having a cold in general vs. if you know they are coughing

Suppose I pick a card out of a deck:
• Before looking at the card, the likelihood of it being any particular card is 1/52
• After? 0 or 1
Prior Probability

The probability of a proposition being true given no particular evidence, e.g.:

P(Cavity) = 0.1

In the absence of any other information, there is a 10% chance of someone having a cavity. It is no longer a prior probability as soon as we have any evidence at all.

Conditional Probability

The probability of a cavity given that we know someone has a toothache:

P(Cavity|Toothache) = 0.85

If no information other than the fact that they have a toothache is available, we have a probability of a cavity of 85%.

The condition is important: if we know B and C are true, P(Cavity|B) is not useful – we must compute P(Cavity|B^C).
Axioms of Probability

We need to describe how probabilities and logical connectives interact, so we define a few basic truths about probability:
• 0 ≤ P(A) ≤ 1
• P(True) = 1 and P(False) = 0
• P(A v B) = P(A) + P(B) - P(A ^ B)

Defining Conditional Probability

We can also define conditional probabilities in terms of unconditional ones:

P(A|B) = P(A ^ B) / P(B)

• We need B to be true, and then A to be true given that we have B

Also written as P(A^B) = P(A|B) P(B), which makes more sense semantically to read!
• also P(A^B) = P(B|A) P(A)

This is the Product Rule.
Joint Probability Distribution

A Joint Probability Distribution completely assigns probability values to all propositions in the domain. The joint is thus an N-dimensional table, with a value in every cell stating the probability of that specific state occurring:
• the atomic events are mutually exclusive, so any conjunction of two of them has to be false
• therefore, the entries must sum to 1 (by the 2nd & 3rd axioms of probability)

Consider an example: toothache vs. cavity…

            Toothache    ~Toothache
Cavity        0.04          0.06
~Cavity       0.01          0.89

Using the Joint

We can use the joint to compute any probabilistic statement we want about the domain, simply by stating the statement in terms of a disjunction of joint entries and adding up the respective probabilities.

Adding a row or column gives us unconditional probabilities:
• P(Cavity) = 0.04 + 0.06 = 0.10
• P(Cavity v Toothache) = 0.04 + 0.01 + 0.06 = 0.11
• note that a cell can only be added in once!

Calculating conditional probabilities? We can use the product rule and the joint to calculate conditional probabilities, given the collection of unconditional probabilities we have here. For example, P(Cavity|Toothache)?

P(Cavity|Toothache) = P(Cavity ^ Toothache) / P(Toothache)
                    = 0.04 / (0.04 + 0.01)    – read off from the joint
                    = 0.04 / 0.05
                    = 0.8
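A small sketch of reading these numbers off the joint in Prolog (the joint/3 representation and predicate names are our own illustration):

joint(cavity,    toothache,    0.04).
joint(cavity,    no_toothache, 0.06).
joint(no_cavity, toothache,    0.01).
joint(no_cavity, no_toothache, 0.89).

% Marginal (unconditional) probabilities: sum a row or column.
p_cavity(P) :- findall(X, joint(cavity, _, X), Xs), sum_list(Xs, P).
p_toothache(P) :- findall(X, joint(_, toothache, X), Xs), sum_list(Xs, P).

% Conditional via the product rule: P(A|B) = P(A^B) / P(B).
p_cavity_given_toothache(P) :-
    joint(cavity, toothache, AB),
    p_toothache(B),
    P is AB / B.

% ?- p_cavity_given_toothache(P).   ->  P = 0.8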
Problem

The problem with this in the real world? There can be a nasty, nasty number of combinations! There may be thousands of variables in a given problem – the joint is easy for 2 variables (toothache vs. cavity), but gets difficult quickly as we increase. So we need something to help with this: Bayes' Rule.

Bayesian Reasoning

Bayes' rule is derived from the two forms of the product rule we used earlier. Equating the two and simplifying, we get:

P(A|B) P(B) = P(B|A) P(A)
P(B|A) = (P(A|B) P(B)) / P(A)

This is Bayes' rule, which forms the foundation for most modern probabilistic reasoning systems in AI. At first it doesn't look particularly useful at all – we're using three terms just to calculate one conditional probability – but it turns out that in practice we DO often have good guesses for these three terms!

Using Bayes' Rule

Medical diagnosis: a doctor knows that meningitis causes a symptomatic stiff neck a particular portion of the time. We can look at how often stiff necks occur in the general population, and at the rate of occurrence of meningitis. Bayes' rule will allow us to calculate the likelihood of a meningitis diagnosis being the correct one if a patient exhibits a stiff neck.

Using Bayes' Rule

We want P(M|S):

P(S|M) = 0.5 (observable from records)
P(M) = 1/50000, P(S) = 1/20 (observable statistically)

P(M|S) = (P(S|M) P(M)) / P(S)
       = (0.5 * 1/50000) / (1/20)
       = 0.0002

which is what you'd expect: really, really unlikely.
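The same calculation as a tiny Prolog relation (bayes/4 is our own name for illustration):

% P(B|A) = P(A|B) * P(B) / P(A).
bayes(PAgB, PB, PA, PBgA) :- PBgA is PAgB * PB / PA.

% ?- bayes(0.5, 1/50000, 1/20, P).   ->  P = 0.0002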
Using Bayes' Rule

How does this compare with using a textbook value that might be obtained for this? If you read that 1 in 5000 people with stiff necks had meningitis, and there was an epidemic of meningitis at the time, it is not clear how this would affect your diagnosis. Using Bayes' rule, the rise in P(M) causes the likelihood of our diagnosis being meningitis to rise appropriately.

This is a more direct causal model, recording what's going on in the environment – one level deeper than the textbook value. It's a lot more flexible.

Downside of This

Bayes' rule requires symptoms to be independent to be useable in practice. Calculations like the one we did a few moments ago are easy with Bayes' rule; doing things like accumulating evidence, though, isn't. For accumulating evidence, we typically relax some of the independence constraints to make things computable, and concentrate only on the possible combinations we're interested in.
Bayesian Belief Networks

We effectively construct a graph of which facts (evidence) influence other facts. When evidence increases our belief in some fact, we propagate this belief probabilistically to the connected facts. The network gets altered, and even rewired, as new facts are added and old facts removed (see text section 9.3.1).

Example: Localization

Localization is the problem of determining where you are – vitally important to path planning, since if you don't know the starting point, a plan is not very useful.

If you have a map of an area, you can guess where you are based on features you see – e.g. if you see a corner, there might be 4 places you could be:
• but you might not be sure it's a corner, so there are other guesses as well
• this can be really nasty to do probabilistically, because of the number of possibilities
Monte Carlo Localization

Some localization algorithms use Bayesian reasoning to say "given this new piece of evidence and everything I've seen so far, where am I?"

Monte Carlo Localization [Dellaert et al., 1999] is an algorithm for this that represents the overall probability distribution efficiently by maintaining a set of samples randomly drawn from it.

Zach Dodds has a nice demo doing this on a Roomba that lets you see the current hypotheses for a cheap, imprecise robot: http://www.cs.hmc.edu/~dodds/erdos/old.html

Imprecise Concepts

Reasoning with uncertainty is further complicated by the idea that the concepts we talk about are, in many cases, also not precise. How tall is pretty tall? How bad a headache is awful? How drunk is incredibly drunk?

This can be thought of as a third area of uncertainty – we have to represent and reason with QUALITATIVE concepts.
Qualitative Concepts

For example, I might talk about somebody being pretty tall, or having an awful headache, or being incredibly drunk. This means there's no easy way to quantitatively define such concepts – often there is not even an agreed understanding of the meaning associated with the concept.

Estimating Pain

This agreement may not even be formally possible – e.g. what I call an awful headache might be trivial to somebody else, or vice versa… how do we know?

This is an important issue, as communicating such concepts is very important (think of medical domains – how bad does this hurt?).
Fuzzy Concepts

This is both a formal and an informal term – when we talk about something being fuzzy, it is a lesser-used synonym for vague. Blurred at the edges, really – think of Tall in this fashion:
• There are people that anybody would agree are tall – say, somebody 8 feet tall
• There are also people that nobody would say are tall – say, 2 feet tall
• The vast majority of us are somewhere in between
• There are degrees of tallness, and we participate in the concept of tallness to some degree

Fuzzy Reasoning

This is the basis for fuzzy reasoning – we participate in relationships to some degree.

More formally, fuzzy reasoning involves set theory where an element has degrees of membership in a set, instead of the traditional concept of either being a member or not:
• traditional/normal sets are referred to in this context as crisp sets, with crisp values, for contrast
• one of the problems of using fuzzy reasoning in the real world is that we often do still need crisp values in the end, and have to convert somehow

We still have the usual problems associated with imprecision: if I say tall people are more likely to have heart attacks, there's imprecision in the rule itself, which has to be computed with the imprecision of tallness.
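For instance, membership in Tall might be sketched as a piecewise-linear function (the 150 cm and 200 cm breakpoints are invented for illustration):

% Degree of membership in 'tall' for a height in cm:
% 0 below 150, rising linearly to 1 at 200 and above.
tall_degree(H, 0.0) :- H =< 150, !.
tall_degree(H, 1.0) :- H >= 200, !.
tall_degree(H, D) :- D is (H - 150) / 50.0.

% ?- tall_degree(180, D).   ->  D = 0.6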
Fuzzy Reasoning

Some researchers do categorize fuzzy reasoning as outside of imprecise reasoning, though, as a third category of uncertainty – because it involves vagueness in the TERMS we use, not uncertainty about the world itself.

Simple Example

An HVAC system controls the temperature in a building. We would like to be able to define rules based not on crisp ranges of temperature (e.g. 15…) but on fuzzy terms.

Behaviours can also influence (support/inhibit) one another – e.g., consider insect behaviours. The output of each behaviour is blended with the others (e.g. an insect is still attracted to light even when hungry) into an overall response for the insect's actuators (body).
Applying this to Robotics…

Consider moving around:
• I want to move toward a goal (or keep moving if I don't have a specific goal)
• I also want to avoid bumping into things
• …or even avoid paths that will take me very close to obstacles
• I want to watch where I am going

These can all be considered separate behaviours.

Behaviour Combinations

The power of behaviours is in their combination. Walking somewhere involves ALL of those previous behaviours at the same time. Different components are more or less important at different times, but all play a role in influencing how we walk – all are active at the same time.

When a number of behaviours are active together, each has some strength behind it (desirability) – e.g. consider the directions a robot might turn in…

For Example…

From the point of view of a wall-avoiding behaviour, the length of the arrows in the accompanying figure represents the desirability of proceeding in each direction. Now visualize a desire to get to a certain location…

For Example…

The arrows represent the desirability of choices in terms of getting to the spot marked by X.

Measurements

Each arrow could be a real measurement between 0 and 1, stored in a table or a function, depending on which was convenient – both just mappings!

Each of These…

…represents the potential responses for one specific behaviour. The interesting part is if we can put them together, and get the goodness of getting to the X without bumping into walls. The goodness of any choice would be a combination of the goodnesses from the point of view of each of the participating behaviours.
Combination

Combining these would increase the goodness of choices contributing to both, decrease it for those detracting from both, and alter it variously in between.

Fuzzy Behaviours

We can implement Fuzzy Behaviours that do this, by grouping related activities together.

Really, there are two places fuzziness comes into play:
• the conditions involved: activation based on fuzzy variables (e.g. attracted to light – how strong is the source?; don't get too close – how close is that?)
• the strengths of the combined output in terms of fuzzy variables: blending behaviour responses

I'm going to show you examples from the paper "Using Fuzzy Logic for Mobile Robot Control", by Saffiotti et al. – a typical use of fuzzy control.

Implementing a Fuzzy Behaviour

Fuzzy state: a vector of fuzzy variables, each 0…1 (e.g. obstacle-close-on-left?). An update module produces the values of each of these from perception each cycle.

Each rule in the rule set has a condition – an expression based on fuzzy variables – and a conclusion – a set of fuzzy variable values (e.g. if too-close then move-away-fast).

Lane Following Behaviour

Fuzzy membership functions define the conditions. The first two rules keep the robot within the lane boundaries; the second two line the robot up with the lane boundaries.
Membership Functions

Just as we saw earlier, the membership functions of two fuzzy variables can be combined, using the rules we saw earlier:

IF front-left-blocked AND (NOT front-right-blocked) AND (NOT side-right-blocked) THEN turn-left AND move-slowly

Non-Linear Functions

And as before, membership functions don't have to be straight lines – e.g. membership in at-one-meter.

Rule Conclusions

The response is a fuzzy variable – how much left? how slowly? – and should be directly proportional to the strength of the conditions of the rule.

Clipping based on Rule Antecedents

The conclusion is adjusted based on the strength of the rule's conditions. A strength of 1 would leave the response as-is; 0 would cut it off completely. In between these, the conditions of the rule put a limit on the strength of the rule's output.
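A sketch of evaluating such a rule's condition and clipping its output (all names and values here are illustrative; min, max, and complement are the combination rules from earlier):

% Fuzzy connectives over degrees in 0..1.
f_and(X, Y, Z) :- Z is min(X, Y).
f_or(X, Y, Z)  :- Z is max(X, Y).
f_not(X, Z)    :- Z is 1 - X.

% Condition of: IF front-left-blocked AND (NOT front-right-blocked)
%               AND (NOT side-right-blocked) THEN turn-left ...
rule_strength(FL, FR, SR, S) :-
    f_not(FR, NFR),
    f_not(SR, NSR),
    f_and(FL, NFR, T),
    f_and(T, NSR, S).

% The rule's output (e.g. desirability of turn-left) is clipped
% to the strength of its condition.
clipped_output(Strength, Desire, Out) :- Out is min(Strength, Desire).

% ?- rule_strength(0.8, 0.2, 0.1, S), clipped_output(S, 1.0, Out).
%    ->  S = 0.8, Out = 0.8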
Processing Rules

The control values of all the rules in a behaviour are averaged to produce a single desirability function as output.

A behaviour also has an importance, which is set globally and establishes a priority for the behaviour. The output of a behaviour is affected by the strengths of the rules inside (context-dependent) and by this importance. Activation can also be affected by the environment and by other behaviours.

The Big Picture

Behaviour outputs must be combined, then defuzzified, to get crisp control values (angle, speed) for the robot. Each behaviour contributes according to its overall activation strength.

Example Behaviour Interaction

Point A: an obstacle is detected; keep-off begins to dominate and assert its preferences for control values (orientation & acceleration).
Point B: the path is clear; follow regains dominance.
(The accompanying figure also plotted the activation level of each behaviour over time.)

Defuzzification

So far we have only been talking about the fuzzy output of behaviours – not the crisp values we need to send to motors. e.g. we could just pick the control option with the highest activation: the strongest output area/value dominates – visualize the combined strengths of the desire to turn at various angles across 0°–360°, with the peak (say, turn ~25°) winning.
There are lots of ways of defuzzifying, though, and picking the single highest peak has a problem: there isn't necessarily one obvious peak, and slight changes in activation could lead to real swings in behaviour:
• e.g. two similar peaks contributed by different behaviours, and we move back and forth from one to the other (thrashing between two behaviours)

What else could we do? Average them? We could take the centroid of the set of control functions – the average weighted by the fuzzy values. This is also problematic in some situations: here, the average is possibly going straight into a collision, rather than turning left or right.

Defuzzification in Saphira

Saffiotti et al.'s approach was later embedded into Saphira, the original off-the-shelf software for Pioneer robots. It uses a modified centroid approach: if there is no single centroid, it looks at the total area underneath each candidate and picks the largest.

It also places a small peak at 0, to deal with the case where there is little information available (doing nothing may be better than trying lots of different but probably unsuitable behaviours, or flipping between those that are trivially different).
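A minimal sketch of centroid defuzzification over a sampled response curve (the angle/degree pairs are invented):

% Crisp output = sum(Angle*Degree) / sum(Degree) over sampled pairs.
centroid(Pairs, Crisp) :-
    foldl(weighted_sum, Pairs, 0-0, Num-Den),
    Den > 0,
    Crisp is Num / Den.

weighted_sum(A-D, N0-D0, N1-D1) :-
    N1 is N0 + A*D,
    D1 is D0 + D.

% ?- centroid([0-0.1, 25-0.9, 50-0.4], C).   ->  C ≈ 30.4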
Uncertainty in Perception

Most robotic systems model the world to at least some degree: local perceptual space, map building.

Local Perceptual Space

The basic level involves assembling individual data points (vision, sonar/laser reflections) into lines/planes; higher-level object recognition attempts to match higher-level structures (hallways, doors).

If a robot is using odometry to track its movements, we have to reconcile errors there with what perception tells us (and perception is also erroneous):
• e.g. perceived turning: is the world shifting, or is my head (or the rest of me) shifting? Worlds that are themselves dynamic are hard!

Anchoring

We ground or anchor our concepts of objects in an environment, and adjust those anchors as our perspective changes.

In the accompanying figure, the hallway position previously sensed is marked with solid bars. As the robot moves, wheel encoders tell it how far it has gone, but they're inaccurate. The anchors have to be corrected based on new perceptions (unless these are different walls?).

The robot has to continually adjust its world model (to the degree it keeps one) to reflect whether it has decided some new perception is a new view of an old object or not.

These errors compound hugely over time, and are part of why localization (Where Am I?) and navigation are such massive problems – especially when there is no map to begin with, since there's no easy way of telling what is right (Simultaneous Localization And Mapping, SLAM).

We can deal with this partly through multi-agent approaches – sharing landmarks, getting more data from more individuals.

Uncertainty Summary

The same problems come up over and over in uncertainty, and no one scheme is the answer to all of them. Some schemes are more advantageous because of rigour and formality, others because they are more practically useful.

People in AI tend to be biased toward their favourites, but it's more important to be adaptable and to recognize what might be useful where!