
Uncertainty and Bayesian Methods

Learning Objectives
Trace origin of Bayes’ Law
Compare if … then with Bayes’ Rule
Compute probabilities from prior probabilities
Prune to obtain results
Trade off uncertainty methods
Apply to examples

Uncertainty and Bayesian Networks
3
Uncertainty
Bayes’ Rule
Example
Appendix: Pruning

Sources of Uncertainty
IF ant1 AND ant2 THEN cons1 OR cons2
Fact uncertainty
Implication uncertainty

There are at least two kinds of uncertainty within rules. The “facts” may not be solid, and even when they are, the rule as a whole may not be entirely valid.
4

Symbolic Uncertainty
Approach: use attributes with logical weight values

Example: {often, sometimes, seldom}

IF Smoker/sometimes AND Uneven/often
THEN Motivated/seldom

Use when feasible (e.g., the terms combine combinatorially) … and the expert has a handle on the weighting

When people talk about uncertainty, they usually use descriptive language such as “more-or-less” and “sometimes.” This is often the case when knowledge is obtained from experts. Data itself is seldom 100% consistent. If these uncertainty terms can be combined in an algorithmic manner, as the experts combine them (e.g., what is the upshot of two antecedents that are “often” and “seldom,” respectively?), such a scheme should be used.
5

An Uncertainty Calculus For A & B => C
Rule         A            B            C
v. sure      v. sure      v. sure      v. sure
v. sure      v. sure      In-between   In-between
v. sure      v. sure      v. unsure    In-between
v. sure      In-between   In-between   In-between
v. sure      In-between   v. unsure    v. unsure
v. sure      v. unsure    v. unsure    v. unsure
In-between   …            …            …

The table shows a way in which uncertainty in rules and antecedents can be combined to produce a certainty for the consequent. For example, if we have the very sure rule A & B => C, and A and B are each neither very sure nor very unsure (“in-between”), then C is also in-between.
6
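A minimal sketch of how such a table-driven calculus could be implemented, assuming the reconstructed table above and a very sure rule; the level names, lookup dictionary, and function are illustrative only, not part of the original slides.

```python
# Symbolic uncertainty calculus for "IF A AND B THEN C" with a very sure rule.
# The lookup table mirrors the table above; the pair of antecedent
# certainties is treated as unordered.

LEVELS = ["v. unsure", "In-between", "v. sure"]   # least to most certain

CALCULUS = {
    ("v. sure", "v. sure"): "v. sure",
    ("v. sure", "In-between"): "In-between",
    ("v. sure", "v. unsure"): "In-between",
    ("In-between", "In-between"): "In-between",
    ("In-between", "v. unsure"): "v. unsure",
    ("v. unsure", "v. unsure"): "v. unsure",
}

def conclude(a: str, b: str) -> str:
    """Certainty of the consequent C, given the certainties of A and B."""
    key = tuple(sorted((a, b), key=LEVELS.index, reverse=True))
    return CALCULUS[key]

# Example from the notes: both antecedents "In-between" gives "In-between".
print(conclude("In-between", "In-between"))
```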

Uncertainty and Bayesian Networks
7
Uncertainty
Bayes’ Rule
Example
Appendix: Pruning

Bayes’ Rule: Example Problem
Adapted from “Machine Learning” by Marsland, Second Edition, p27


observation 1
observation 2
Recognize ‘a’ or ‘b’

The example in the figure illustrates the uncertainty involved in deciding whether the element is an “a” or a “b.”
8

Bayes’ Rule: Example Problem
Adapted from “Machine Learning” by Marsland, Second Edition, p27

Probability that pattern represents ‘a’ …
… given observation X …
Expressed: p(letter is a | observation_X)
observation 1
observation 2

Conditional probability concerns the probability of an event when a pertinent fact is known. For example, the probability of selecting a red ball from a bag containing 5 red and 5 blue balls is 50%, but the probability of the same event given that a blue ball has already been removed is different: 5 of the remaining 9 balls are red, so the probability rises to 5/9 ≈ 56%.

One can rephrase the question
what letter is observation 1?
in the figure conditionally as
what is the probability that the letter is ‘a’, given that 1 has been observed?
9

Bayes’ Rule: The Goal
Find p(a|X) in terms of measurable quantities

So probability can be applied to the uncertainty of if-then rules by turning them from …
IF X THEN a
into …
p(a|X), “p” meaning “probability.”

In other words: “if I know X, how likely is a to be valid?”

So far, this has simply been a question of rephrasing. However, now that we are dealing with probabilities, we can invoke a deep, rich theory which has taken over four hundred years to develop—in particular, Bayes’ Law.
10

Use Bayes …
… when p(A|B) needed but hard to measure
But p(B|A) much easier to measure.

Example:
May be easier to gather smokers
and assess their motivation for exercise p(M|S) …
… than gather people with motivational problems and count smokers p(S|M)

Bayes’ law expresses p(A|B) in terms of p(B|A) in case the latter is easier to compute. For example, suppose that we want to know the degree of certainty of the following:

IF a person is not motivated to exercise THEN they are a smoker.

To compute this directly, we would have to contact a large group of people who are not motivated to exercise, and count how many of them are smokers. It might be difficult to gather this group in the first place, and when we do, we may find that not every smoker admits to it.

On the other hand, it might be easier to obtain a list of confirmed smokers and ask each of them about motivation.

Bayes’ Law allows us to make progress in this way.

11

Bayes’ Rule: Intuitive Derivation
Want p(a|X)
Set of entities satisfying condition “X”
X
a
U

To understand why Bayes’ Law is true, imagine the occurrences satisfying condition a and those satisfying X, as in the figure. U here denotes the set of all relevant entities (the “universal” set).
12

Bayes’ Rule: Intuitive Derivation
Want p(a|X) … = |a∩X| / |X|
= …
X
a
U

To simplify, we’ll assume that a and X are finite, so we can find their size (denoted |…|). If we are given X, the probability of a is computed by counting entities in X, as shown.

When you consider p(a|X), X effectively becomes the new universal set.
13

Bayes’ Rule: Intuitive Derivation
Want p(a|X) … = |a∩X| / |X|
= (|a| * |a∩X|/|a|) / |X|
= (|a|/|U| * |a∩X|/|a|) / (|X|/|U|)
= p(a) * p(X|a) / p(X)
X
a
U

By multiplying this quotient by |a| and dividing by the same amount (thus changing the form but leaving the value unchanged), we obtain Bayes’ formula, as in the figure.
14
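A small sketch verifying the counting derivation on assumed finite sets; the sets below are arbitrary examples, chosen only so that |a|, |X|, and |a∩X| are easy to count.

```python
# Verify that p(a|X) = |a ∩ X| / |X| equals p(a) * p(X|a) / p(X).
U = set(range(100))                        # the "universal" set
a = {u for u in U if u % 3 == 0}           # entities satisfying condition a
X = {u for u in U if u % 5 == 0}           # entities satisfying condition X

def p(s):
    return len(s) / len(U)                 # probability = relative size within U

lhs = len(a & X) / len(X)                  # p(a|X) by direct counting
rhs = p(a) * (len(a & X) / len(a)) / p(X)  # p(a) * p(X|a) / p(X)
print(lhs, rhs)                            # both 0.35: the two forms agree
```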

Bayes’ Rule
p(C|X) = p(C)p(X|C)/p(X)
“Prior” probabilities

The expressions on the right are known as prior probabilities because they can be computed separately, “before” computing the conditional probability that we seek.
15

Bayes’ Rule: Example
p(C|X) = p(C)p(X|C)/p(X)
p(raining at 7 am next morning, given raining (now) at 11 pm)
Assume that we don’t have the data for this but do have

p(raining at 11 pm, given raining following 7 am)

Here is another example: It is 11 pm and I observe that it is raining. I want the probability that it will rain during my 7 am commute tomorrow. Suppose that data has been collected to compute the opposite probability: given that it’s raining at 7 am, the fraction of times that it was raining at 11 pm the previous night. In other words, it has been convenient to note data when it’s 7 am and raining.

Assume that the following probabilities have already been computed:

–that it rained the previous night at 11 pm, given that it rains at 7 am (as noted above)
–that it rains at 11 pm (i.e., in general—probably obtainable from existing sources)
–that it rains at 7 am (probably from existing sources)

16

Bayes’ Rule: Example
p(C|X) = p(C)p(X|C)/p(X)
p(raining at 11 pm)
p(raining at 11 pm, given raining following 7 am)
p(raining at 7 am)
p(will be raining next 7am, given raining at 11 pm)

Knowing these (shown in green on the figure), we can compute the desired probability (in red).
17
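A hedged numeric sketch of the commuting example; the three prior probabilities are made-up illustrative values, not data from the slides.

```python
p_rain_7am = 0.30        # p(raining at 7 am), from existing sources
p_rain_11pm = 0.25       # p(raining at 11 pm), from existing sources
p_11pm_given_7am = 0.60  # p(rained at 11 pm the previous night | raining at 7 am)

# Bayes' rule: p(raining at 7 am next morning | raining now at 11 pm)
p_7am_given_11pm = p_rain_7am * p_11pm_given_7am / p_rain_11pm
print(p_7am_given_11pm)  # 0.72 with these illustrative numbers
```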

Application Example: Blockchain
Estimate demand (D) for a product using “block count” (mention (M) of its SKU in a blockchain). The key prior here is how frequently the product is mentioned under various demand conditions. Demand can be measured by sales figures.

p(D|M) = p(D) p(M|D) / p(M)
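A sketch of the demand-from-mentions computation; all numbers are assumed for illustration, and p(M) is expanded with the law of total probability rather than measured directly.

```python
p_D = 0.2              # p(D): prior probability of high demand (from sales figures)
p_M_given_D = 0.7      # p(M|D): SKU mentioned in the blockchain, given high demand
p_M_given_not_D = 0.1  # p(M|not D): SKU mentioned, given demand is not high

p_M = p_D * p_M_given_D + (1 - p_D) * p_M_given_not_D  # p(M), total probability
p_D_given_M = p_D * p_M_given_D / p_M                  # Bayes: p(D|M)
print(p_D_given_M)     # ≈ 0.64 with these illustrative numbers
```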

Online Bayes Calculators

The Bayes calculator in the figure allows any number (k) of conditional events, and allows the entry of all p(B|Ak)’s—or of p(Ak ∩ B), from which p(B|Ak) can be calculated, as in the derivation of Bayes’ rule shown previously.
19
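A minimal sketch of what such a calculator computes: the posteriors p(Ak|B) for mutually exclusive, exhaustive hypotheses A1..An. The priors and likelihoods below are illustrative values only.

```python
priors = [0.5, 0.3, 0.2]       # p(A1), p(A2), p(A3); must sum to 1
likelihoods = [0.4, 0.9, 0.1]  # p(B|A1), p(B|A2), p(B|A3)

# Denominator p(B) by the law of total probability.
p_B = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' rule for each hypothesis; the posteriors sum to 1.
posteriors = [p * l / p_B for p, l in zip(priors, likelihoods)]
print(posteriors)              # ≈ [0.41, 0.55, 0.04]
```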

Bayesian Calculator

A1
A2
B

The figure shows the sets involved in the calculation. Forty percent of B elements are A elements.
20

Bayesian Calculator Execution

https://stattrek.com/online-calculator/bayes-rule-calculator.aspx

Example 2

A1
A2
B

In the example shown (#2), we have the actual prior conditional probabilities.
22

Example 2 Execution

A1
A2
B

Product Rule
p(A and B) = p(B|A)p(A)
Also denoted p(A, B)

The product rule expresses the probability that A and B both occur as the probability of A times the probability of B given that A occurred. It is the definition of conditional probability, rearranged.
24
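A tiny check of the product rule on an assumed example (drawing two cards without replacement and asking for the probability that both are aces); the numbers are standard card probabilities, not from the slides.

```python
p_first_ace = 4 / 52               # p(A): first card is an ace
p_second_ace_given_first = 3 / 51  # p(B|A): second card is an ace, given the first was

p_both_aces = p_second_ace_given_first * p_first_ace  # p(A and B) = p(B|A) p(A)
print(p_both_aces)                 # ≈ 0.0045
```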

Using Independent Variables
Bayes:
p(X)∙p(A|X) = p(A)∙p(X|A)
—measures X and A

More generally, measure…
p(X) ∙ p(A|X) ∙ p(B|X) ∙ p(C|X)
“prob of X and A-given-X and B-given-X …”
With no classification knowledge
With more classification knowledge

Bayes’ law with events X and A essentially measures “X and A.” The common use of Bayesian reasoning is to compare outcomes based on multidimensional data by using the more general form of Bayes’ rule shown in the figure (in this case, for 3 dimensions).

Suppose that A, B, and C are observable events such as A = “customer browsed chairs in the past 2 months,” B = “customer bought a house within the past year,” and C = “customer browsed fabrics in the past 2 months.” We want to know how likely it is that the customer is interested in a couch, an armchair, etc. We base this on the fraction (probability) of those customers who actually bought a couch and who had previously browsed chairs, and so on.

The Bayesian quantity p(X) ∙ p(A|X) ∙ p(B|X) ∙ p(C|X) measures X as a probable outcome, in the presence of three other events.

25
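A minimal naive-Bayes-style sketch of the furniture example. All probabilities, class names, and event names are assumed for illustration; the calculation follows p(X) ∙ p(A|X) ∙ p(B|X) ∙ p(C|X) from the slide, treating the events as conditionally independent given X.

```python
# p(X): prior probability that the customer is interested in each item.
priors = {"couch": 0.3, "armchair": 0.7}

# p(event | X): probability of each observed event, given the interest.
likelihoods = {
    "couch":    {"browsed_chairs": 0.6, "bought_house": 0.5, "browsed_fabrics": 0.4},
    "armchair": {"browsed_chairs": 0.8, "bought_house": 0.2, "browsed_fabrics": 0.3},
}

observed = ["browsed_chairs", "bought_house", "browsed_fabrics"]

scores = {}
for item, prior in priors.items():
    score = prior
    for event in observed:
        score *= likelihoods[item][event]  # p(X) * p(A|X) * p(B|X) * p(C|X)
    scores[item] = score

# The larger unnormalized score is the more probable interest; dividing by
# the sum of the scores would normalize (the common denominator cancels).
print(scores)
```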

Cavity Example: Given (a priori) Probabilities
Russell and Norvig

[Figure: full joint distribution over the Boolean variables toothache, catch, and cavity]

Using Bayes in reality is an arithmetic-intensive process in which the probability of every possible outcome is computed. The figure pertains to data about a patient, whose probability of having a cavity is computed. This is based on eight joint probabilities (priors), such as the probability of (anyone) having a cavity together with a toothache and a “catch,” which is 0.108.
26

Cavity Example: Given (a priori) Probabilities
Russell and Norvig

P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2

The summation indicates adding all of the joint probabilities in which cavity holds, because those events are disjoint and together cover all of the ways a cavity can occur. We will apply this pattern several times in what follows.
27
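A sketch of this marginalization using only the four joint probabilities shown on the slide.

```python
# P(cavity, toothache, catch), P(cavity, toothache, ~catch),
# P(cavity, ~toothache, catch), P(cavity, ~toothache, ~catch)
joint_with_cavity = [0.108, 0.012, 0.072, 0.008]

p_cavity = sum(joint_with_cavity)  # disjoint events covering every cavity case
print(p_cavity)                    # 0.2
```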

Another Decomposition Form: Conditioning

Conditioning decomposes a probability by summing over the values of another variable, as in P(Y) = Σz P(Y|z) P(z); the figure shows this form.
28

As Conditional Probability

Russell and Norvig

Bayes allows us to answer a wide variety of combination questions such as the probability that a person with a toothache has a cavity.
The figure shows a form of Bayes with an AND instead of the equivalent conditional. The sums follow from the logical completeness of the predicates (e.g., catch and ¬catch together cover all cases).

29
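A sketch of computing P(cavity | toothache) directly from the full joint distribution. The two ¬cavity entries below do not appear on the slide; they are the standard values from Russell and Norvig’s toothache example and should be read as assumed.

```python
# Sum out "catch" to get the joint and marginal probabilities we need.
p_cavity_and_toothache = 0.108 + 0.012       # catch + ~catch, with cavity
p_toothache = 0.108 + 0.012 + 0.016 + 0.064  # cavity and ~cavity cases (R&N values)

p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(p_cavity_given_toothache)              # 0.6
```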

Normalization

Russell and Norvig

Using Bayes amounts to comparing probabilities. The paragraph in the figure notes that the probabilities being compared are all of the form x/p(k), where the denominator p(k) is the same for every alternative. For that reason, it is sufficient to drop the denominator, a process known as normalization.
30

Normalization

Russell and Norvig

Using normalization allows streamlined determination of the comparative probabilities, written in the form ⟨…⟩ as shown.
31
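A sketch of the same query done by normalization: compute unnormalized values for cavity and ¬cavity given toothache, then divide by their sum, never computing P(toothache) explicitly. The ¬cavity entries are again the standard Russell and Norvig values, assumed here.

```python
unnorm_cavity = 0.108 + 0.012      # P(cavity, toothache, catch) + P(cavity, toothache, ~catch)
unnorm_not_cavity = 0.016 + 0.064  # the same sum with ~cavity

alpha = 1 / (unnorm_cavity + unnorm_not_cavity)           # normalization constant
print(alpha * unnorm_cavity, alpha * unnorm_not_cavity)   # <0.6, 0.4>
```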

Uncertainty and Bayesian Networks
32
Uncertainty
Bayes’ Rule
Example
Appendix: Pruning

In this section we present an example of Bayesian “reasoning.”

Example (Geology)
Question: dolomite or shale?

Observed: Gamma ray reading
>60 or not

The example is to predict which of dolomite or shale is more likely given a gamma ray reading.
33

Example Application: Geology
https://www.yumpu.com/en/document/read/19083769/applications-of-bayes-theorem

Seek

The figure shows the symbolic formulation of the problem.
34

Example Application: Geology
https://www.yumpu.com/en/document/read/19083769/applications-of-bayes-theorem

All of the priors are shown here: in other words, the probabilities that can be computed. For example, P(A|B2) is obtained by subjecting (known) shale to gamma rays.
35

Example Application: Geology
https://www.yumpu.com/en/document/read/19083769/applications-of-bayes-theorem

We will be comparing the probability of shale vs. dolomite. The probability in the figure is a common denominator, so we can normalize as discussed, and computing it is not really necessary.
36

Example Application: Geology

https://www.yumpu.com/en/document/read/19083769/applications-of-bayes-theorem

Now we apply Bayes to compute the probability of each of the two rock types given that a gamma ray reading greater than 60 was observed.
37
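A hedged sketch of the comparison for this example; all probabilities are assumed illustrative values, not the ones in the linked document, and the symbols follow the formulation above (B1 = dolomite, B2 = shale, A = reading > 60).

```python
p_B1 = 0.4            # prior: rock is dolomite
p_B2 = 0.6            # prior: rock is shale
p_A_given_B1 = 0.2    # reading > 60 over known dolomite
p_A_given_B2 = 0.7    # reading > 60 over known shale

# Unnormalized posteriors; the common denominator p(A) cancels in the
# comparison, so normalization is optional.
score_dolomite = p_B1 * p_A_given_B1
score_shale = p_B2 * p_A_given_B2
print("shale" if score_shale > score_dolomite else "dolomite")
```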

Example: Wumpus World

Agent
Russell and Norvig
In squares adjacent to …

a pit: a breeze.

a wumpus: a stench
https://thiagodnf.github.io/wumpus-world-simulator/

We now move a bit closer to the complexities of the real world (even though we’ll stay with Wumpus World).
38

Suppose agent moves [1,1] to [2,1].

Perceives breeze in [2,1],

Logical Reasoning in WW

We have seen that, to guide the hero’s actions, logic is a sound approach. But this is in theory, and pure logic requires all of the facts; we rarely have all of the facts when faced with having to act. For example, if the hero is in square [1,1], they have all the knowledge necessary to proceed (B = Breeze and P = Pit). But after that, the appropriate moves from [2,1] are uncertain, because the agent doesn’t know whether the “breeze” emanates from [2,2] or [3,1].
39

Suppose agent moves [1,1] to [2,1].

Perceives breeze in [2,1],

Must be a pit in neighboring square—[2,2], [3,1] or both.

Prudent agent will go back to [1,1], then to [1,2].

Logical Reasoning in WW

This figure delineates the prudent backtracking action.
40

Logical Reasoning in WW
Given the (negative) info:

Wumpus can’t be in [1,1]

can’t be in [2,2]

thus (…) wumpus in [1,3]

No breeze in [1,2] implies no pit in [2,2].

Thus pit in [3,1].

This figure shows the result of purely logical reasoning (P=Pit, S=Smell, W=Wumpus, V=Visited).
41

But Agent’s Sensors: Only Partial Global Info
Below, each reachable square—[1,3], [2,2], and [3,1]—might contain a pit.

Russell and Norvig

The figure describes the current uncertainty about the next move to the three gray squares.
42

Logical vs. Probabilistic
Russell and Norvig
Pure logical inference based on incomplete information can conclude nothing about which square is most likely to be safe
… might have to choose randomly.

Probabilistic agent much better.

We must thus accept and embrace uncertainty, i.e., use probabilistic reasoning (pure randomness being unnecessary).
43

Wumpus World Example
Russell and Norvig
In this situation we know …

… observed breeze (T or F) in visited (white) squares +

… each contains no pit.

The figure shows the conversion of the uncertain situation into a probabilistic statement, specifically including P(position [1,3] contains a pit | known ∧ b).
44

Apply Joint Probability Approach Again

Russell and Norvig

Similarly,
Let unknown be the set of Pi,j variables for squares other than the known squares and the query square [1,3].

Recall:
Range over all truth values of any particular set of RV’s
Range over all truth values of all subsets of any particular set of RV’s

The figure recalls the approach used to calculate P(Cavity | toothache), which converts it to P(Cavity AND toothache) with normalization factor α (which we will not need to calculate, because we are comparing the results for each course of action).
45

More Generally, Identify Random Variables;
Use Joint Probabilities
Define Boolean variables Pij and Bij for each square:

Pij TRUE ⟺ square [i,j] contains a pit

Bij TRUE ⟺ square [i,j] is breezy

We actually need, for every square, the probability that it contains a pit as well as whether it is breezy. That would enable us to make trade-offs in moving.
46

Identify Desired Probabilities: Example
Russell and Norvig

Example: probability of pits as shown and breezes as shown.

The figure expresses the probability distribution for pits at all locations (only the diagonal ones are shown) together with breezes at the three places of interest.
47

Identify Random Variables;
Manipulate Joint Probabilities
Russell and Norvig

product rule
Boolean variables Pij and Bij for each square:

Pij TRUE ⟺ square [i,j] contains a pit

Bij TRUE ⟺ square [i,j] is breezy

The desired joint probability distribution can be decomposed into the product shown.
48

Russell and Norvig


The second term of the product is a probability distribution (function), but we can compute it for each concrete configuration of pits. The calculation assigns a probability to each configuration in which n of the locations contain pits and the rest do not. The occurrences are independent (a pit at one location has no influence on whether there is a pit at another), so the probability of a configuration is the product of the individual pit / no-pit probabilities.
49
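A brute-force sketch of the query P(pit in [1,3] | known, b). Assumptions: a 4×4 grid, each unvisited square contains a pit independently with probability 0.2 (the prior used by Russell and Norvig), the agent has visited [1,1], [2,1], and [1,2], observed no breeze in [1,1] and a breeze in the other two, and knows the visited squares are pit-free.

```python
from itertools import product

PIT_PRIOR = 0.2
SQUARES = [(c, r) for c in range(1, 5) for r in range(1, 5)]
VISITED = [(1, 1), (2, 1), (1, 2)]                    # known to be pit-free
UNKNOWN = [s for s in SQUARES if s not in VISITED]
BREEZE = {(1, 1): False, (2, 1): True, (1, 2): True}  # observations b

def neighbors(sq):
    c, r = sq
    return [(c + dc, r + dr) for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if (c + dc, r + dr) in SQUARES]

totals = {True: 0.0, False: 0.0}  # unnormalized, keyed by "pit in [1,3]"
for assignment in product([True, False], repeat=len(UNKNOWN)):
    pits = dict(zip(UNKNOWN, assignment))
    pits.update({sq: False for sq in VISITED})
    # Keep only pit configurations consistent with the breeze observations:
    # a visited square is breezy exactly when some neighbor contains a pit.
    if all(any(pits[n] for n in neighbors(sq)) == b for sq, b in BREEZE.items()):
        prob = 1.0
        for sq in UNKNOWN:                      # independent pit occurrences
            prob *= PIT_PRIOR if pits[sq] else 1 - PIT_PRIOR
        totals[pits[(1, 3)]] += prob

norm = totals[True] + totals[False]
print(totals[True] / norm)   # ≈ 0.31, matching Russell and Norvig's answer
```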

Summary: Uncertainty
Use expert’s rules if possible
Use Bayes when priors can be computed
Mathematical / statistical
Computationally intensive
Heuristics available
