Reasoning With Uncertainty

"So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they do not refer to reality." – Albert Einstein

"All traditional logic habitually assumes that precise symbols are being employed. It is therefore not applicable to this terrestrial life, but only to an imagined celestial existence." – Bertrand Russell

Text: Chapter 5, and Chapter 9 (9.2 only, to the end of 9.2.2)

Logical Implication

First-order logic gives us the ability to derive new (not explicitly represented) statements using logical inference.

In order to use implication to generate conclusions, our statements have to be axioms (completely true statements), and our logical system must be correct (no inconsistencies) and complete (no missing knowledge).

Everything we derive in FOL stems from the idea of logical inference – we prove theorems that are a logical consequence of the axioms in the knowledge base:

A1 ^ A2 ^ … ^ An → Theorem   (the arrow is logical implication)

The truth of such a theorem rests solidly on the mathematical certainty of logical inference:
• If the knowledge base is incomplete, the system will not be able to prove theorems
• If the knowledge base is inconsistent, the conclusions will not necessarily be valid
The Real World

The real world does not work like this! Most everything is uncertain to some degree – so much so that the more you think about it, the more you wonder why anything works at all. Yet we deal with this relatively easily in most cases.

Reasoning with Uncertain Knowledge

We need to be able to reason logically IN SPITE OF uncertainty in order to solve problems in the real world. That is, we need to take uncertainty into account in our reasoning: knowledge will not be guaranteed complete and correct (like the real world!).

We would like to be able to generate conclusions that are correct most of the time with respect to the domain, but which may not be logically implied by the axioms.

To do this, the system must use additional supports when necessary to deal with uncertainty. These supports can be used to fill in the gaps in uncertain knowledge – to take the uncertainty into account:

A1 ^ s1 ^ A2 ^ s2 ^ … → Conclusion   (the s's are additional supports for the reasoning process!)

The result of the inference process is a "conclusion", not a formal theorem:
• the conclusion is probably (hopefully very often) true, but is not guaranteed to be true

No matter what the source of the uncertainty, this type of reasoning falls under the general heading of Reasoning Under Uncertainty. However, there are really two broad types of such reasoning, and we'll consider them separately.
Reasoning with Incomplete Knowledge

One of the most common problems encountered in real-world situations is dealing with incomplete knowledge. In these situations, we have to draw conclusions in spite of the fact that we don't have access to all potential information that might influence that decision.

For example, we can represent the fact that birds can fly:

If bird(X) Then fly(X)

However, there are exceptions to this rule; for example, ostriches cannot fly. We can add that to our rule (qualify it for ostriches):

If bird(X) and ~ostrich(X) Then fly(X)

There are additional exceptions, such as baby birds, birds with broken wings, dead birds (and many other examples that quickly stretch the bounds of good taste), etc.:

If bird(X) and ~ostrich(X) and ~baby(X) and ~dead(X) Then fly(X)

The number of such qualifications is essentially endless (there may even be endless qualifications to the qualifications):
• e.g. dead birds can't fly, unless you're in a cartoon, or they're on a plane, or…

Adding an infinite number of qualifiers isn't possible. This is the Qualification Problem.

And think of the amount of work this involves! We can't conclude that Fred can fly unless we can also prove that Fred is not an ostrich, is not a baby, is not dead, etc. We need a huge amount of knowledge about the properties that Fred does not have in order to make conclusions about the properties that Fred does have. How far do we go proving each of these before it's acceptable? When do we stop? The process is essentially endless.

We have to add extra supports to FOL to be able to make inferences and avoid this.
Negative vs. Positive Knowledge

In most knowledge bases, there is a small number of positive facts compared with the total number of facts. For example, an airline stores the locations that it provides flights between:

flight(96, winnipeg, toronto)
flight(74, toronto, montreal)

…but not all the points it doesn't fly between. We do not represent negative information, such as there being no direct flight from Winnipeg to San Diego, or no flight from Winnipeg to Grand Beach:

~flight(434, winnipeg, sandiego)

Closed World Assumption

The closed world assumption states that if a fact is not represented explicitly in a knowledge base, and the fact is not implied by the knowledge base, then the fact is assumed to be false.

The closed world assumption allows a system to make "reasonable" assumptions about knowledge that is missing from the system. So Fred would be assumed to be not an ostrich, not a baby, and not dead, unless the system could prove otherwise.

The system must still expend effort to prove that each of these facts is not true before it can assume they are false and then conclude that Fred can fly – we must limit what we represent to fit the practical aspects of the problem.

Negation as Failure

The closed-world assumption is implemented in Prolog as negation as failure: failure to prove a statement is taken to mean that the statement is false.

A problem with negation as failure is differentiating between a statement that is false and a statement for which the truth can't be determined (we may have accidentally omitted the necessary knowledge, or not have developed that far yet). In an empty Prolog system, for example:

?- alive(tweety).
no.
?- dead(tweety).
no.
?- not alive(tweety).
yes.
?- not dead(tweety).
yes.
Negation as Failure

Unless we know explicitly that alive(tweety) is true or that dead(tweety) is true, the negation of alive(tweety) will be true and the negation of dead(tweety) will also be true:
• this leads to multiple interpretations (via backtracking) of the knowledge, which is typically not acceptable
• we have to eliminate the multiple interpretations by including axioms that allow us to resolve them

Prolog does not permit rules that conclude a negation. But what if it did? What would be the problem with using this in Prolog:

If ostrich(X) Then not fly(X)

While the intent of this rule may be correct, it does not provide the correct result. When combined with the rule that all birds can fly, it leads to a contradiction (via backtracking):
• e.g. we find that Fred can fly because he is a bird… and then that he can't because he is an ostrich.

Reasoning with Defaults

Defaults can be used to supply knowledge that is missing from an incomplete knowledge base. This is Defeasible Reasoning.

In the previous example, there is no knowledge that indicates what should be assumed in the absence of other knowledge. Default rules have the following form:

If ( p(X) and ~ab(X) ) Then g(X)

This rule states that if p(X) is true and it cannot be proved that X is abnormal (ab(X)), then conclude that g(X) is true. If ab(X) does not exist as a fact or an inference, then the system can assume that g(X) is true.

The implementation of defaults normally requires the use of negation as failure.
Reasoning with Defaults

The fact that specific birds cannot fly is represented explicitly:

If ostrich(X) Then flightless(X)
If (bird(X) and ~flightless(X)) Then fly(X)

The system will now conclude that ostriches can't fly, but that all other birds can fly. The predicate flightless(X) is the abnormal predicate.

Reasoning with Defaults in Prolog

Default knowledge is represented in Prolog in a similar manner:

bird(X) :- ostrich(X).
flightless(X) :- ostrich(X).
fly(X) :- bird(X), not flightless(X).

The abnormal predicate (flightless) blocks the use of the default rule (fly) when a bird is abnormal (i.e. can't fly). And the rule is set to conclude in a positive term, as Prolog requires.

In this example, there is only one valid interpretation for a specific set of facts and rules:
• either a bird can fly or it can't fly
• both interpretations are not possible

For the statements below, however, 2 interpretations are possible (fly and flightless) for the same object, because the default rule is not blocked by the flightless predicate – Prolog will prove flightless for an ostrich and can separately prove fly:

bird(X) :- ostrich(X).
flightless(X) :- ostrich(X).
fly(X) :- bird(X).
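To make the contrast concrete, here is a minimal sketch (the fact ostrich(fred) is our own illustrative addition). With the blocked default:

ostrich(fred).
bird(X) :- ostrich(X).
flightless(X) :- ostrich(X).
fly(X) :- bird(X), not flightless(X).

?- fly(fred).
no.
?- flightless(fred).
yes.

With the unblocked version (fly(X) :- bird(X).), both fly(fred) and flightless(fred) would succeed – the two contradictory interpretations described above.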
Multiple Levels of Defaults

Suppose some penguins have been genetically altered so that they are able to fly:

genetic_penguin(opus).
penguin(X) :- genetic_penguin(X).
bird(X) :- penguin(X).
can_fly(X) :- genetic_penguin(X).
flightless(X) :- penguin(X), not can_fly(X).
fly(X) :- bird(X), not flightless(X).

There is still only one possible interpretation for each bird – though this obviously gets tricky as we get to many levels of exceptions.

Multiple Extensions

Some problems cannot easily be defined so that they do not produce contradictions. This type of reasoning involves multiple extensions.

e.g., the "Nixon Diamond" – Quakers are pacifists; Republicans are not pacifists; Nixon is a Quaker and a Republican (Ginsberg, p. 218). Is Nixon a pacifist or not?

Unless one rule blocks the other rule, both interpretations are possible – the classic multiple inheritance problem.
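A sketch of the Nixon Diamond in plain Prolog (the predicate names are ours): without priorities, both readings are defensible, so the programmer must pick a blocking direction by hand. Here the Republican default is arbitrarily given priority over the Quaker one:

quaker(nixon).
republican(nixon).

abnormal_republican(_) :- fail.    % no exceptions to the Republican default
abnormal_quaker(X) :- republican(X), not abnormal_republican(X).
pacifist(X) :- quaker(X), not abnormal_quaker(X).

?- pacifist(nixon).
no.

Reversing which abnormality rule blocks the other would make pacifist(nixon) succeed instead – the choice is ours, not the logic's.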
Defeasible Reasoning

A significant amount of care must go into the representation of reasoning with defaults in Prolog. An extension of Prolog, d-Prolog, has been designed to provide the facilities required for default reasoning; d-Prolog takes advantage of the facility in Prolog for defining new operators.

Monotonic Knowledge

Classical inference in first-order logic is monotonic: any sentence that is true in one knowledge base is also true in any knowledge base that is a superset of that knowledge base. Adding new knowledge does not invalidate existing knowledge.
Nonmonotonic Knowledge

A knowledge base that uses negation as failure is nonmonotonic: adding a new piece of knowledge may invalidate a previous conclusion.

bird(opus).
?- fly(opus).
yes.

penguin(opus).
?- fly(opus).
no.

genetic_penguin(opus).
?- fly(opus).
yes.

A nonmonotonic knowledge base should still have unique interpretations:
• a specific object can either fly or it can't
• both interpretations of the same knowledge should not be possible

Non-Monotonic Systems

Prolog has predicates assert() and retract() for specific predicates, allowing new facts to be defined or deleted on the fly – see the documentation for whatever Prolog you are using. Adding a new fact should not cause multiple interpretations in a properly defined system:

?- assert(bird(opus)).
yes.
?- fly(opus).
yes.
?- assert(penguin(opus)).
yes.
?- fly(opus).
no.
?- assert(genetic_penguin(opus)).
yes.
?- fly(opus).
yes.

Incomplete vs. Imprecise Knowledge

We have thus far been talking about what the knowledge base does and doesn't contain, as opposed to the truth of the statements in it. So far we have assumed that at any specific point in time each statement is either completely true or false – nothing exists in between:

male(bill)
parent(bill, sally)

It is easy (but not entirely accurate) to assume there is no vagueness in these concepts – but most of the world is much less black and white.
Reasoning with Imprecise Knowledge

Much of the knowledge used in real-world situations is not either true or false; instead, it is true some of the time but not all of the time. Such knowledge is referred to as imprecise knowledge:

If sore-throat(X) then have-cold(X)

We could use default rules to block this conclusion in cases we know about, but more often we want to know how strongly we should believe you have a cold given that you have a sore throat (vs. if sore-throat(X) then throat-cancer(X), for example!).

Representing Imprecise Knowledge

To represent imprecise knowledge, we need to represent the degree of truth of a statement. We can do this by assigning a numerical measure to each statement in some fashion. We first need to decide what kind of numeric scale we're going to use…

For example, a true statement could have a measure of +1, a false statement a measure of -1, and other statements would have floating-point measures between -1 and +1 (a scale running from F at -1 to T at +1). A statement with a measure of 0.9 would be true most of the time, but not all of the time.

Now that we have a scale, we can talk about two classic problems that come up whenever we reason with imprecise information…

(1) Where do the numbers come from?
• Statistics?
• Estimates based on lots of experience (e.g. ask our expert)?
• Averages over lots of experts?
• Good general guesses?
• e.g. why 0.85, and not 0.84 or 0.86?
The Second Problem

(2) What do we DO with these numbers when we make inferences?

e.g. if I have some imprecise measure of having a sore throat (i.e. I'm not certain I do – I think I might be getting one), and I have a rule with some other likelihood saying that that means I have strep throat, then what's the overall likelihood that I have strep throat?

What about when I add other symptoms (e.g. I'm kind of feverish too)?

And if I accept that I have strep throat but am not completely certain, and I have another rule that says that if I have strep throat I should likely have some antibiotic, what's the overall likelihood of needing the antibiotic?

Propagating and Combining Measures

Any scheme for dealing with imprecise reasoning must have some valid way of making inference after inference, propagating or combining the measures we put into each.
Certainty Factors

Certainty factors were developed for use in the Mycin expert system, in order to represent and manipulate the numerical measures associated with imprecise statements.

A certainty factor is a measure of belief that a statement is true, and may be associated with a simple fact or with a rule.

For example, the belief that I have a sore throat can be represented by including a certainty factor with a fact:

sore-throat 0.9

Similarly, the belief that if I have a sore throat then I have a cold can be represented by including a certainty factor with a rule:

If (sore-throat) Then have-cold 0.75
Manipulating Certainty Factors

The certainty factors going into a rule must be combined in such a way that the certainty factor of the outcome is reasonable, given the characteristics of the domain:

• sore-throat 0.9
• If (sore-throat) Then have-cold 0.75

so – have-cold?

Remember that these will also cascade through chains of rules, and the end outcome must still be reasonable.

Even within the general realm of certainty factors, there are numerous means of combining these – they arose as ad hoc approaches for particular expert systems. There is no "best" technique for combining certainty factors – different techniques work well in some domains and poorly in others. The Mycin developers found one technique that worked well much of the time, though:

If ( Ax ) Then Cy    (where x, y are certainty factors)

The resulting CF of C is x*y – the certainty of the result of a rule is affected by the certainty of the facts in the rule's condition.

If ( Ax ^ By ) Then …    the resulting CF of the premise is min(x,y)
If ( Ax v By ) Then …    the resulting CF of the premise is max(x,y)

We're only as certain as the lowest participant in a conjunction, but as certain as the maximum participant in a disjunction.

If (Ax) Then C
If (By) Then C

Here we have 2 different rules being applied to give C:
• think of a medical domain, where I have a symptom and it indicates some disease
• a second, independent piece of evidence indicating that disease further supports (in fact usually much more strongly) our certainty in C

Mathematically:
Manipulating Certainty Factors

If the evidence is independent, the resulting CF of C is:
• x+y-(x*y)               if x and y > 0
• x+y+(x*y)               if x and y < 0
• (x+y)/(1-min(|x|,|y|))  otherwise

If (A 0.5) Then C
If (B 0.75) Then C
The CF of C is 0.5+0.75-(0.5*0.75) = 0.875

If (A -0.5) Then C
If (B -0.75) Then C
The CF of C is (-0.5)+(-0.75)+(0.5*0.75) = -0.875

If (A -0.5) Then C
If (B 0.75) Then C
The CF of C is (-0.5+0.75)/(1-0.5) = 0.5
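These combination rules are easy to express directly; a minimal sketch in Prolog (the cf_* predicate names are our own, not from Mycin):

% cf_and(+X, +Y, -CF): CF of a conjunctive premise (Ax ^ By).
cf_and(X, Y, CF) :- CF is min(X, Y).

% cf_or(+X, +Y, -CF): CF of a disjunctive premise (Ax v By).
cf_or(X, Y, CF) :- CF is max(X, Y).

% cf_rule(+PremiseCF, +RuleCF, -CF): CF of a rule's conclusion.
cf_rule(X, Y, CF) :- CF is X * Y.

% cf_combine(+X, +Y, -CF): two independent rules concluding the same C.
cf_combine(X, Y, CF) :- X > 0, Y > 0, !, CF is X + Y - X*Y.
cf_combine(X, Y, CF) :- X < 0, Y < 0, !, CF is X + Y + X*Y.
cf_combine(X, Y, CF) :- CF is (X + Y) / (1 - min(abs(X), abs(Y))).

% ?- cf_combine(0.5, 0.75, CF).   ->  CF = 0.875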
Certainty Factors

A medical diagnosis system could make diagnoses based on the symptoms that are present:

If (chest-pain) then cardiac-arrest
If (left-arm-pain) then cardiac-arrest
If (left-arm-pain) then rotator-cuff-injury
If (left-arm-pain ^ neck-pain) then spinal-cord-injury
If (shortness-breath v chest-pain) then over-exertion
Certainty Factors

It is important to differentiate between symptoms that are independent and symptoms that are related.

If (chest-pain) then cardiac-arrest
If (left-arm-pain) then cardiac-arrest

These two symptoms are independent: either symptom can lead to the conclusion, and both symptoms together increase the confidence in the conclusion.

If (left-arm-pain ^ neck-pain) then spinal-cord-injury

Here there is a connection between the conditions: both symptoms must be present for the conclusion to be true – the rule is useless without both.

If (shortness-breath v chest-pain) then over-exertion

These symptoms are related too, so it should not increase the confidence in the diagnosis if both are present – if they were independent, we would put them in separate rules, and they'd be combined using the formula that increases confidence when both are present.

It is your decision to implement conditions as separate rules (independent) or within one rule (dependent), depending on whether the evidence is independent or not.
Certainty Factors

left-arm-pain(george) 0.7
neck-pain(george) 0.8

if (left-arm-pain(X) ^ neck-pain(X)) then spinal-cord-injury(X) 0.9

gives us: spinal-cord-injury(george) 0.9*min(0.7,0.8) = 0.9*0.7 = .63

Certainty Factors

If (shortness-breath(X) v chest-pain(X)) then over-exertion(X) 0.9

If we had: chest-pain(george) 0.8
over-exertion(george) 0.9*0.8 = .72

but if we had: shortness-breath(george) 0.7
over-exertion(george) 0.9*0.7 = .63

Think about backward reasoning with this, e.g. in Prolog – we now have to explore both branches to get an accurate picture of the certainty, both here and for rules with independent evidence!
Certainty Factors

if (chest-pain(X)) then cardiac-arrest(X) 0.9
if (left-arm-pain(X)) then cardiac-arrest(X) 0.9

With chest-pain(george) 0.7, this gives us cardiac-arrest(george) 0.9*0.7 = .63
With left-arm-pain(george) 0.8, this gives us cardiac-arrest(george) 0.9*0.8 = .72

Combining these: cardiac-arrest(george) 0.9*(0.7+0.8-(0.7*0.8)) = .846

Certainty Factors

For the following symptoms:

left-arm-pain(george) 0.7
neck-pain(george) 0.8
shortness-breath(george) 0.7
chest-pain(george) 0.8

the following conclusions (diagnoses) would be made:

spinal-cord-injury(george) 0.9*0.7 = .63
over-exertion(george) 0.9*0.8 = .72
cardiac-arrest(george) 0.9*(0.7+0.8-(0.7*0.8)) = .846

For this particular collection of symptoms, the most likely diagnosis for george is cardiac-arrest. If a system based on logical implication had been used instead, all 3 diagnoses would have been generated, but there would not have been any information indicating which diagnosis was most strongly believed.

The certainties associated with the individual symptoms were (obviously!) not realistic – they were used solely to illustrate the process.

Prolog Certainty Factors

Implementing certainty factors in Prolog requires a change to the predicates used to represent knowledge, plus the addition of predicates to manipulate the certainty factors:

symptom(fact).        /* or relation(X,Y) */

becomes

symptom(fact, cf).    /* or relation(X,Y,CF) */

e.g. symptom(neck-pain). becomes symptom(neck-pain, 0.95).
Prolog Certainty Factors

Prolog rules need to be written to modify the certainties of conclusions and implement the mathematics we've seen – i.e., to bind a concluding CF, or to evaluate the CF of a rule condition, you need to pass the participating CFs to relations that will calculate the result for you.

Queries also have to have a place for this:

?- runsystem(E1, E2, Conclusion, CF)

"take E1 and E2 and bind the results to Conclusion and CF"

Because there will possibly be multiple rules with the same conclusions but different certainty factors (because they had different inputs), you may have to explore multiple possibilities, keep a list of them, and then look for the one with the highest CF. You will likely be doing this anyway, since there are often different outcomes (like the different diagnoses of george) and you need to choose the most appropriate one.
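As an illustration only (runsystem above is just a placeholder name, and these predicate names are ours), a diagnosis rule might pass CFs through the cf_* helpers sketched earlier, with underscores replacing hyphens to keep the atoms plain:

symptom(left_arm_pain, 0.7).
symptom(neck_pain, 0.8).

% Conjunctive premise (min), multiplied through the rule's own CF (0.9):
diagnosis(spinal_cord_injury, CF) :-
    symptom(left_arm_pain, X),
    symptom(neck_pain, Y),
    cf_and(X, Y, P),
    cf_rule(P, 0.9, CF).

% Collect all candidate diagnoses and pick the strongest:
best_diagnosis(D, CF) :-
    findall(CF0-D0, diagnosis(D0, CF0), Pairs),
    sort(0, @>=, Pairs, [CF-D|_]).

% ?- diagnosis(spinal_cord_injury, CF).   ->  CF = 0.63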
Certainty Factors Summary

Certainty factors permit the definition and manipulation of knowledge that is more complex than default knowledge:
• evidence can be combined from a variety of sources, with each piece of evidence contributing towards increasing or decreasing the confidence in the various conclusions
• conflicting information can be processed

Recall our two basic problems in imprecise reasoning. In CFs, the measures may come from statistical sampling, or may just be estimates obtained from an expert in the domain. And while the numbers may sometimes seem overprecise, they can be translated back into qualitative terms (e.g. 0.6 < X < 0.7 = somewhat-likely).

One Downside of Certainty Factors

The manipulation mechanisms are somewhat "ad hoc" – they work, but do not have a really strong theoretical basis.
• There are often "fudge factors" inserted, like x*y+|n| when multiplying through a rule, to avoid a CF getting too small too quickly
Certainty Factors vs. Probabilities

Certainty factors are different from probabilities:
• a probability of 1 means true
• a probability of 0 means false
• a probability of 0.5 means that an event has a 50% chance of occurring

A certainty factor of 0, by contrast, means that we know nothing about whether the event will occur or not (the Jon Snow point!).

The major advantage of certainty factors over probabilities is that combined probabilities decrease towards 0 (false), while certainty factors decrease towards 0 (not known).

Probabilistic Approaches

The advantage of more probabilistic approaches is that the combination mechanisms have a much more formal basis – i.e. what we're saying has a strong theoretical foundation; it's not just "kind of nice".

There are numerous probabilistic approaches, many based on Bayesian Reasoning.
Probabilistic Approaches

The degree of belief in any statement is based on the evidence we accumulate for that belief. This implies that evidence changes over time, and allows us to define two kinds of probabilities.

Prior vs. Conditional Probabilities

We call an estimate of probability before any evidence is obtained a Prior probability. An estimate after a particular piece of evidence is obtained is a Posterior or Conditional probability:
• because it's conditional on having that particular piece of evidence
• e.g. the likelihood of a patient having a cold in general vs. if you know they are coughing

Suppose I pick a card out of a deck:
• Before looking at the card, the likelihood of it being any particular card is 1/52
• After? 0 or 1
Prior Probability

The probability of a proposition being true given no particular evidence, e.g.:

P(Cavity) = 0.1

In the absence of any other information, there is a 10% chance of someone having a cavity. It is no longer a prior probability as soon as we have any evidence at all.

Conditional Probability

The probability of a cavity given that we know someone has a toothache:

P(Cavity|Toothache) = 0.85

If no information other than the fact that they have a toothache is available, we have a probability of a cavity of 85%.

The condition is important: if we know B and C are true, P(Cavity|B) is not useful – we must compute P(Cavity|B^C).
Axioms of Probability

We need to describe how probabilities and logical connectives interact, so we define a few basic truths about probability:
• 0 ≤ P(A) ≤ 1
• P(True) = 1 and P(False) = 0
• P(A v B) = P(A) + P(B) - P(A ^ B)

Defining Conditional Probability

We can also define conditional probabilities in terms of unconditional ones:

P(A|B) = P(A ^ B) / P(B)

• We need B to be true, and then A to be true given that we have B

Also written as P(A^B) = P(A|B) P(B), which makes more sense semantically to read!
• also P(A^B) = P(B|A) P(A)

This is the Product Rule.
Joint Probability Distribution

A Joint Probability Distribution completely assigns probability values to all propositions in the domain. The joint is thus an N-dimensional table, with a value in every cell stating the probability of that specific state occurring:
• the atomic events are mutually exclusive, so any conjunction of two of them has to be false
• therefore, the entries must sum to 1 (by the 2nd & 3rd axioms of probability)

Consider an example: toothache vs. cavity…

            Toothache    ~Toothache
Cavity        0.04          0.06
~Cavity       0.01          0.89

Using the Joint

We can use the joint to compute any probabilistic statement we want about the domain, simply by stating the statement in terms of a disjunction of joint entries and adding up the respective probabilities.

Adding a row or column gives us unconditional probabilities:
• P(Cavity) = 0.04 + 0.06 = 0.10
• P(Cavity v Toothache) = 0.04 + 0.01 + 0.06 = 0.11
• note that a cell can only be added in once!

Calculating conditional probabilities? We can use the product rule and the joint to calculate conditional probabilities, given the collection of unconditional probabilities we have here. For example, P(Cavity|Toothache)?

P(Cavity|Toothache) = P(Cavity ^ Toothache) / P(Toothache)
                    = 0.04 / (0.04 + 0.01)    – read off from the joint
                    = 0.04 / 0.05
                    = 0.8
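A small sketch of reading these numbers off the joint in Prolog (the joint/3 representation and predicate names are our own illustration):

joint(cavity,    toothache,    0.04).
joint(cavity,    no_toothache, 0.06).
joint(no_cavity, toothache,    0.01).
joint(no_cavity, no_toothache, 0.89).

% Marginal (unconditional) probabilities: sum a row or column.
p_cavity(P) :- findall(X, joint(cavity, _, X), Xs), sum_list(Xs, P).
p_toothache(P) :- findall(X, joint(_, toothache, X), Xs), sum_list(Xs, P).

% Conditional via the product rule: P(A|B) = P(A^B) / P(B).
p_cavity_given_toothache(P) :-
    joint(cavity, toothache, AB),
    p_toothache(B),
    P is AB / B.

% ?- p_cavity_given_toothache(P).   ->  P = 0.8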
Problem

The problem with this in the real world? There can be a nasty, nasty number of combinations! There may be thousands of variables in a given problem – the joint is easy for 2 variables (toothache vs. cavity), but gets difficult quickly as we increase. So we need something to help with this: Bayes' Rule.

Bayesian Reasoning

Bayes' rule is derived from the two forms of the product rule we used earlier. Equating the two and simplifying, we get:

P(A|B) P(B) = P(B|A) P(A)
P(B|A) = (P(A|B) P(B)) / P(A)

This is Bayes' rule, which forms the foundation for most modern probabilistic reasoning systems in AI. At first it doesn't look particularly useful at all – we're using three terms just to calculate one conditional probability – but it turns out that in practice we DO often have good guesses for these three terms!

Using Bayes' Rule

Medical diagnosis: a doctor knows that meningitis causes a symptomatic stiff neck a particular portion of the time. We can look at how often stiff necks occur in the general population, and at the rate of occurrence of meningitis. Bayes' rule will allow us to calculate the likelihood of a meningitis diagnosis being the correct one if a patient exhibits a stiff neck.

Using Bayes' Rule

We want P(M|S):

P(S|M) = 0.5 (observable from records)
P(M) = 1/50000, P(S) = 1/20 (observable statistically)

P(M|S) = (P(S|M) P(M)) / P(S)
       = (0.5 * 1/50000) / (1/20)
       = 0.0002

which is what you'd expect: really, really unlikely.
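The same calculation as a tiny Prolog relation (bayes/4 is our own name for illustration):

% P(B|A) = P(A|B) * P(B) / P(A).
bayes(PAgB, PB, PA, PBgA) :- PBgA is PAgB * PB / PA.

% ?- bayes(0.5, 1/50000, 1/20, P).   ->  P = 0.0002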
Using Bayes' Rule

How does this compare with using a textbook value that might be obtained for this? If you read that 1 in 5000 people with stiff necks had meningitis, and there was an epidemic of meningitis at the time, it is not clear how this would affect your diagnosis. Using Bayes' rule, the rise in P(M) causes the likelihood of our diagnosis being meningitis to rise appropriately.

This is a more direct causal model, recording what's going on in the environment – one level deeper than the textbook value. It's a lot more flexible.

Downside of This

Bayes' rule requires symptoms to be independent to be useable in practice. Calculations like the one we did a few moments ago are easy with Bayes' rule; doing things like accumulating evidence, though, isn't. For accumulating evidence, we typically relax some of the independence constraints to make things computable, and concentrate only on the possible combinations we're interested in.
Bayesian Belief Networks

We effectively construct a graph of which facts (evidence) influence other facts. When evidence increases our belief in some fact, we propagate this belief probabilistically to the connected facts. The network gets altered, and even rewired, as new facts are added and old facts removed (see text section 9.3.1).

Example: Localization

Localization is the problem of determining where you are – vitally important to path planning, since if you don't know the starting point, a plan is not very useful.

If you have a map of an area, you can guess where you are based on features you see – e.g. if you see a corner, there might be 4 places you could be:
• but you might not be sure it's a corner, so there are other guesses as well
• this can be really nasty to do probabilistically, because of the number of possibilities
Monte Carlo Localization

Some localization algorithms use Bayesian reasoning to say "given this new piece of evidence and everything I've seen so far, where am I?"

Monte Carlo Localization [Dellaert et al., 1999] is an algorithm for this that represents the overall probability distribution efficiently by maintaining a set of samples randomly drawn from it.

Zach Dodds has a nice demo doing this on a Roomba that lets you see the current hypotheses for a cheap, imprecise robot: http://www.cs.hmc.edu/~dodds/erdos/old.html

Imprecise Concepts

Reasoning with uncertainty is further complicated by the idea that the concepts we talk about are, in many cases, also not precise. How tall is pretty tall? How bad a headache is awful? How drunk is incredibly drunk?

This can be thought of as a third area of uncertainty – we have to represent and reason with QUALITATIVE concepts.
Qualitative Concepts

For example, I might talk about somebody being pretty tall, or having an awful headache, or being incredibly drunk. This means there's no easy way to quantitatively define such concepts – often there is not even an agreed understanding of the meaning associated with the concept.

Estimating Pain

This agreement may not even be formally possible – e.g. what I call an awful headache might be trivial to somebody else, or vice versa… how do we know?

This is an important issue, as communicating such concepts is very important (think of medical domains – how bad does this hurt?).
Fuzzy Concepts

This is both a formal and an informal term – when we talk about something being fuzzy, it is a lesser-used synonym for vague. Blurred at the edges, really – think of Tall in this fashion:
• There are people that anybody would agree are tall – say, somebody 8 feet tall
• There are also people that nobody would say are tall – say, 2 feet tall
• The vast majority of us are somewhere in between
• There are degrees of tallness, and we participate in the concept of tallness to some degree

Fuzzy Reasoning

This is the basis for fuzzy reasoning – we participate in relationships to some degree.

More formally, fuzzy reasoning involves set theory where an element has degrees of membership in a set, instead of the traditional concept of either being a member or not:
• traditional/normal sets are referred to in this context as crisp sets, with crisp values, for contrast
• one of the problems of using fuzzy reasoning in the real world is that we often do still need crisp values in the end, and have to convert somehow

We still have the usual problems associated with imprecision: if I say tall people are more likely to have heart attacks, there's imprecision in the rule itself, which has to be computed with the imprecision of tallness.
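For instance, membership in Tall might be sketched as a piecewise-linear function (the 150 cm and 200 cm breakpoints are invented for illustration):

% Degree of membership in 'tall' for a height in cm:
% 0 below 150, rising linearly to 1 at 200 and above.
tall_degree(H, 0.0) :- H =< 150, !.
tall_degree(H, 1.0) :- H >= 200, !.
tall_degree(H, D) :- D is (H - 150) / 50.0.

% ?- tall_degree(180, D).   ->  D = 0.6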
Fuzzy Reasoning

Some researchers do categorize fuzzy reasoning as outside of imprecise reasoning, though, as a third category of uncertainty – because it involves vagueness in the TERMS we use, not uncertainty about the world itself.

Simple Example

An HVAC system controls the temperature in a building. We would like to be able to define rules based not on crisp ranges of temperature (e.g. 15…) but on fuzzy terms.

Behaviours can also influence (support/inhibit) one another – e.g., consider insect behaviours. The output of each behaviour is blended with the others (e.g. an insect is still attracted to light even when hungry) into an overall response for the insect's actuators (body).
Applying this to Robotics…

Consider moving around:
• I want to move toward a goal (or keep moving if I don't have a specific goal)
• I also want to avoid bumping into things
• …or even avoid paths that will take me very close to obstacles
• I want to watch where I am going

These can all be considered separate behaviours.

Behaviour Combinations

The power of behaviours is in their combination. Walking somewhere involves ALL of those previous behaviours at the same time. Different components are more or less important at different times, but all play a role in influencing how we walk – all are active at the same time.

When a number of behaviours are active together, each has some strength behind it (desirability) – e.g. consider the directions a robot might turn in…

For Example…

From the point of view of a wall-avoiding behaviour, the length of the arrows in the accompanying figure represents the desirability of proceeding in each direction. Now visualize a desire to get to a certain location…

For Example…

The arrows represent the desirability of choices in terms of getting to the spot marked by X.

Measurements

Each arrow could be a real measurement between 0 and 1, stored in a table or a function, depending on which was convenient – both just mappings!

Each of These…

…represents the potential responses for one specific behaviour. The interesting part is if we can put them together, and get the goodness of getting to the X without bumping into walls. The goodness of any choice would be a combination of the goodnesses from the point of view of each of the participating behaviours.
Combination

Combining these would increase the goodness of choices contributing to both, decrease it for those detracting from both, and alter it variously in between.

Fuzzy Behaviours

We can implement Fuzzy Behaviours that do this, by grouping related activities together.

Really, there are two places fuzziness comes into play:
• the conditions involved: activation based on fuzzy variables (e.g. attracted to light – how strong is the source?; don't get too close – how close is that?)
• the strengths of the combined output in terms of fuzzy variables: blending behaviour responses

I'm going to show you examples from the paper "Using Fuzzy Logic for Mobile Robot Control", by Saffiotti et al. – a typical use of fuzzy control.

Implementing a Fuzzy Behaviour

Fuzzy state: a vector of fuzzy variables, each 0…1 (e.g. obstacle-close-on-left?). An update module produces the values of each of these from perception each cycle.

Each rule in the rule set has a condition – an expression based on fuzzy variables – and a conclusion – a set of fuzzy variable values (e.g. if too-close then move-away-fast).

Lane Following Behaviour

Fuzzy membership functions define the conditions. The first two rules keep the robot within the lane boundaries; the second two line the robot up with the lane boundaries.
Membership Functions

Just as we saw earlier, the membership functions of two fuzzy variables can be combined, using the rules we saw earlier:

IF front-left-blocked AND (NOT front-right-blocked) AND (NOT side-right-blocked) THEN turn-left AND move-slowly

Non-Linear Functions

And as before, membership functions don't have to be straight lines – e.g. membership in at-one-meter.

Rule Conclusions

The response is a fuzzy variable – how much left? how slowly? – and should be directly proportional to the strength of the conditions of the rule.

Clipping based on Rule Antecedents

The conclusion is adjusted based on the strength of the rule's conditions. A strength of 1 would leave the response as-is; 0 would cut it off completely. In between these, the conditions of the rule put a limit on the strength of the rule's output.
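A sketch of evaluating such a rule's condition and clipping its output (all names and values here are illustrative; min, max, and complement are the combination rules from earlier):

% Fuzzy connectives over degrees in 0..1.
f_and(X, Y, Z) :- Z is min(X, Y).
f_or(X, Y, Z)  :- Z is max(X, Y).
f_not(X, Z)    :- Z is 1 - X.

% Condition of: IF front-left-blocked AND (NOT front-right-blocked)
%               AND (NOT side-right-blocked) THEN turn-left ...
rule_strength(FL, FR, SR, S) :-
    f_not(FR, NFR),
    f_not(SR, NSR),
    f_and(FL, NFR, T),
    f_and(T, NSR, S).

% The rule's output (e.g. desirability of turn-left) is clipped
% to the strength of its condition.
clipped_output(Strength, Desire, Out) :- Out is min(Strength, Desire).

% ?- rule_strength(0.8, 0.2, 0.1, S), clipped_output(S, 1.0, Out).
%    ->  S = 0.8, Out = 0.8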
Processing Rules

The control values of all the rules in a behaviour are averaged to produce a single desirability function as output.

A behaviour also has an importance, which is set globally and establishes a priority for the behaviour. The output of a behaviour is affected by the strengths of the rules inside (context-dependent) and by this importance. Activation can also be affected by the environment and by other behaviours.

The Big Picture

Behaviour outputs must be combined, then defuzzified, to get crisp control values (angle, speed) for the robot. Each behaviour contributes according to its overall activation strength.

Example Behaviour Interaction

Point A: an obstacle is detected; keep-off begins to dominate and assert its preferences for control values (orientation & acceleration).
Point B: the path is clear; follow regains dominance.
(The accompanying figure also plotted the activation level of each behaviour over time.)

Defuzzification

So far we have only been talking about the fuzzy output of behaviours – not the crisp values we need to send to motors. e.g. we could just pick the control option with the highest activation: the strongest output area/value dominates – visualize the combined strengths of the desire to turn at various angles across 0°–360°, with the peak (say, turn ~25°) winning.
There are lots of ways of defuzzifying, though, and picking the single highest peak has a problem: there isn't necessarily one obvious peak, and slight changes in activation could lead to real swings in behaviour:
• e.g. two similar peaks contributed by different behaviours, and we move back and forth from one to the other (thrashing between two behaviours)

What else could we do? Average them? We could take the centroid of the set of control functions – the average weighted by the fuzzy values. This is also problematic in some situations: here, the average is possibly going straight into a collision, rather than turning left or right.

Defuzzification in Saphira

Saffiotti et al.'s approach was later embedded into Saphira, the original off-the-shelf software for Pioneer robots. It uses a modified centroid approach: if there is no single centroid, it looks at the total area underneath each candidate and picks the largest.

It also places a small peak at 0, to deal with the case where there is little information available (doing nothing may be better than trying lots of different but probably unsuitable behaviours, or flipping between those that are trivially different).
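A minimal sketch of centroid defuzzification over a sampled response curve (the angle/degree pairs are invented):

% Crisp output = sum(Angle*Degree) / sum(Degree) over sampled pairs.
centroid(Pairs, Crisp) :-
    foldl(weighted_sum, Pairs, 0-0, Num-Den),
    Den > 0,
    Crisp is Num / Den.

weighted_sum(A-D, N0-D0, N1-D1) :-
    N1 is N0 + A*D,
    D1 is D0 + D.

% ?- centroid([0-0.1, 25-0.9, 50-0.4], C).   ->  C ≈ 30.4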
Uncertainty in Perception

Most robotic systems model the world to at least some degree: local perceptual space, map building.

Local Perceptual Space

The basic level involves assembling individual data points (vision, sonar/laser reflections) into lines/planes; higher-level object recognition attempts to match higher-level structures (hallways, doors).

If a robot is using odometry to track its movements, we have to reconcile errors there with what perception tells us (and perception is also erroneous):
• e.g. perceived turning: is the world shifting, or is my head (or the rest of me) shifting? Worlds that are themselves dynamic are hard!

Anchoring

We ground or anchor our concepts of objects in an environment, and adjust those anchors as our perspective changes.

In the accompanying figure, the hallway position previously sensed is marked with solid bars. As the robot moves, wheel encoders tell it how far it has gone, but they're inaccurate. The anchors have to be corrected based on new perceptions (unless these are different walls?).

The robot has to continually adjust its world model (to the degree it keeps one) to reflect whether it has decided some new perception is a new view of an old object or not.

These errors compound hugely over time, and are part of why localization (Where Am I?) and navigation are such massive problems – especially when there is no map to begin with, since there's no easy way of telling what is right (Simultaneous Localization And Mapping, SLAM).

We can deal with this partly through multi-agent approaches – sharing landmarks, getting more data from more individuals.

Uncertainty Summary

The same problems come up over and over in uncertainty, and no one scheme is the answer to all of them. Some schemes are more advantageous because of rigour and formality, others because they are more practically useful.

People in AI tend to be biased toward their favourites, but it's more important to be adaptable and to recognize what might be useful where!