IS 2440 Artificial Intelligence
Week 4: Agents with Uncertain Info
Daqing He, January 28, 2020
© Daqing He
Many slides are from Dan Klein’s (Berkeley) course
Muddiest Points
• Knowledge-based Agents
– We know ambiguity can exist in natural language; can it also exist in propositional logic or first-order logic?
– For the knowledge-based agent, what should we do if there is no way to make further inferences? For example, in the Wumpus world game, what should we do if, based on all the information we have, we still cannot tell which next cell is safe?
– Can we create knowledge-based agents that are also ethical by adding "ethical" knowledge to the knowledge base?
– AI should have learned by itself, right? But from what we learned last week, it seems that, in order to have a workable agent, we have to program both the real-world environment and what this agent should do in each case. How is that different from the traditional way of writing a program?
Muddiest Points
• Knowledge base
– How does the idea of a monotonic knowledge base interact with the idea of unexplored knowledge? We could represent facts about the world which might seem contradictory but in reality are not. For example: "light is a particle"; "light is a wave". Both of these seem to be true, but also contradictory if we assume that a wave and a particle are different. So can we not add things to a knowledge base that are not understood?
– It says that, as an example of entailment, the KB containing the statements "the Panther won" and "the Penguin won" entails "Either the Panther won or the Penguin won." However, why are these statements mutually exclusive, i.e., why must it be the case that for the Penguin/Panther to win, the Panther/Penguin must lose? Also, for the other example, wouldn't it be more accurate to say that x + 5 ≥ y entails x + 7 > y, since if x + 5 ≥ y, mustn't it be the case that x + 7 > y is true?
Muddiest Points
• First order logic
– What’s the difference between constant symbols and objects. Besides, which means is that all of the objects are constant symbols? What’s the relationship between them?
– In AIMA 8.2.2 why “LeftLeg” is function symbol? What is LeftLeg’s functionality?
– In page 69 about first order logic, what is the difference between relations and functions? why father of is a function and brother of is a relation?
– Is there any limitations of first order logic for knowledge representation?
4 © Daqing He
Muddiest Points
• Can you give some actual use cases of "Propositional Logic" and "First Order Logic"? Are there any programs based on these two?
– http://people.dbmi.columbia.edu/~ehs7001/Buchanan-Shortliffe-1984/MYCIN%20Book.htm
– https://en.wikipedia.org/wiki/Expert_system
• Generalizability
– When an agent has solved a problem, for example one scenario of the Wumpus problem, what happens to the knowledge base it used to solve that problem, since it will probably not be used in any future Wumpus problem with a different environment?
Resolution example
• Unit resolution rule
– From l1 ∨ ... ∨ lk and m, infer l1 ∨ ... ∨ li-1 ∨ li+1 ∨ ... ∨ lk, where li and m are complementary literals, i.e., m = ¬li
• Proof by contradiction, i.e., show KB ∧ ¬α unsatisfiable
– KB = (B1,1 ⇔ (P1,2 ∨ P2,1)) ∧ ¬B1,1, α = ¬P1,2
Muddiest Points
• Forward and Backward Chaining
– What causes the difference between the complexity of forward chaining and backward chaining?
– Will backward chaining always perform poorly on detection tasks, or other tasks in which forward chaining excels, even though it does very little irrelevant work?
– Can forward chaining and backward chaining be used together by the agent when solving a problem? For example, using forward chaining first to learn more about the environment, then using that information with backward chaining to solve a problem?
– Do humans program the agent to do either forward chaining or backward chaining, or can the agent decide for itself to do one or the other?
Agenda
• Basic Probabilistic Inference
• Markov Chain
• Hidden Markov Model
Class Goals
• After this class, you should be able to
– identify the main ideas behind reasoning under uncertainty with probabilistic inference
– identify the main ideas behind Markov chains and Hidden Markov Models
– describe Markov chains and Hidden Markov Models, and make simple inferences
Agents handling uncertainty
Knowledge-based Agents need a Knowledge Base
• The size and complexity of the knowledge base increase with the complexity of the task
• Lack of mechanisms to handle that complexity:
– Selecting knowledge to make inferences
– Interactions among different pieces of knowledge
– Incompleteness in knowledge
Uncertainty
• In the agent context, uncertainty refers to the fact that an agent may never know for sure what state it is in or where it will end up after a sequence of actions
• This has serious drawbacks
– The agent must consider every logically possible explanation for its observations, no matter how unlikely
• This leads to impossibly large and complex state representations
– The agent has no effective way to select among different possible states
• because it does not know how to compare the merits of plans
Can we avoid uncertainty?
• In general, we cannot, because:
• 1) the problem space can have partial observability, nondeterminism, or a combination of the two
– This means the problem itself has a certain degree of uncertainty
• 2) our theories and practice are imperfect or incomplete
– Our scientific theories are not fully developed, such as in medicine and the social sciences
– Our experiments/tests cannot fully cover all aspects of the problem, such as knowledge about individual patients with a disease
Rational Agent that can handle uncertainty
• We define an agent as rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action
– This highest expected utility is called the maximum expected utility (MEU)
• Therefore, a rational agent's choice depends on the relative importance of its various goals and the likelihood that they will be achieved
– Probability theory provides us with a model of likelihood
• a numerical degree of belief between 0 (certainly false) and 1 (certainly true)
• Decision theory
– Decision theory = probability theory + utility theory
Basic Probabilistic Inference
Random Variable
• A random variable describes an event whose outcome has some degree of uncertainty
– R = Is it raining?
– T = Is it hot or cold?
– D = How long will it take to drive to work?
– L = Where is the Wumpus?
• Random variables have domains
– R in {true, false} (often written as {+r, -r})
– T in {hot, cold}
– D in [0, ∞)
– L in possible locations, maybe {(1,1), (1,2), ...}
Probability and Probability Distribution
• Probability
– The likelihood that an outcome happens
– Every probability is between 0 and 1 (inclusive)
– P(die takes value 6) = 1/6
• Probability distribution of a discrete random variable A
– The probability of each outcome of a discrete random variable A
– Often can be estimated from relative frequencies as the number of samples n → ∞
– The sum of all probabilities for a variable A has to be 1

T      P          W       P
hot    0.5        sun     0.6
cold   0.5        rain    0.1
                  fog     0.3
                  meteor  0.0
Joint Distributions
• A joint distribution over a set of random variables X1, ..., Xn specifies a real number for each assignment (or outcome) P(X1 = x1, ..., Xn = xn)
– Must obey: P(x1, ..., xn) ≥ 0, and the sum over all assignments Σ P(x1, ..., xn) = 1

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3
Events
• An event is a set E of outcomes
• From a joint distribution, we can calculate the probability of any event
– Probability that it's hot AND sunny?
– Probability that it's hot?
– Probability that it's hot OR sunny?

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3
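To make this concrete, here is a minimal Python sketch (not from the slides) that stores the joint distribution above as a dictionary; `p_event` is a made-up helper name.

```python
# Joint distribution P(T, W) from the slide, stored as a dict.
joint = {
    ('hot', 'sun'): 0.4,
    ('hot', 'rain'): 0.1,
    ('cold', 'sun'): 0.2,
    ('cold', 'rain'): 0.3,
}

def p_event(event):
    """Probability of an event = sum over the outcomes it contains."""
    return sum(p for outcome, p in joint.items() if event(outcome))

print(p_event(lambda o: o == ('hot', 'sun')))             # hot AND sunny: 0.4
print(p_event(lambda o: o[0] == 'hot'))                   # hot: 0.5
print(p_event(lambda o: o[0] == 'hot' or o[1] == 'sun'))  # hot OR sunny: ≈ 0.7
```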
Quiz: Events
• P(+x, +y) ?
• P(+x) ?
• P(-y OR +x) ?

X    Y    P
+x   +y   0.2
+x   -y   0.3
-x   +y   0.4
-x   -y   0.1
Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables
• Marginalization (summing out): combine collapsed rows by adding, e.g. P(t) = Σ_w P(t, w)

P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(T):            P(W):
T     P          W     P
hot   0.5        sun   0.6
cold  0.5        rain  0.4
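A short sketch of summing out under the same dictionary representation; `marginal` is a made-up helper name.

```python
from collections import defaultdict

joint = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
         ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def marginal(joint, keep):
    """Sum out all variables except the one at index `keep`."""
    table = defaultdict(float)
    for outcome, p in joint.items():
        table[outcome[keep]] += p  # combine collapsed rows by adding
    return dict(table)

print(marginal(joint, keep=0))  # P(T) = {'hot': 0.5, 'cold': 0.5}
print(marginal(joint, keep=1))  # P(W) ≈ {'sun': 0.6, 'rain': 0.4}
```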
Quiz: Marginal Distributions
P(X, Y):
X    Y    P
+x   +y   0.2
+x   -y   0.3
-x   +y   0.4
-x   -y   0.1

P(X):           P(Y):
X    P          Y    P
+x   ?          +y   ?
-x   ?          -y   ?
Conditional Probabilities
• A simple relation between joint and conditional probabilities:
P(a | b) = P(a, b) / P(b)
– In fact, this is taken as the definition of a conditional probability
– P(a) is called the prior probability; P(a | b) is called the posterior probability
– e.g., P(W = sun | T = cold) = P(sun, cold) / P(cold) = 0.2 / 0.5 = 0.4

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3
Normalization Trick
• Select the joint entries consistent with the evidence (e.g., T = cold), then normalize them so they sum to 1:
P(W | cold) = ⟨0.2, 0.3⟩ / 0.5 = ⟨0.4, 0.6⟩

P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(W | T = cold):
W     P
sun   0.4
rain  0.6
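The normalization trick as a sketch: select the consistent entries, then divide by their sum; `condition` is a made-up helper name.

```python
joint = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
         ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def condition(joint, t_value):
    """P(W | T = t_value): select consistent entries, then normalize."""
    selected = {w: p for (t, w), p in joint.items() if t == t_value}
    z = sum(selected.values())           # normalization constant
    return {w: p / z for w, p in selected.items()}

print(condition(joint, 'cold'))  # ≈ {'sun': 0.4, 'rain': 0.6}
```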
Quiz: Conditional Probabilities
• P(+x | +y) ?
• P(-x | +y) ?
• P(-y | +x) ?

X    Y    P
+x   +y   0.2
+x   -y   0.3
-x   +y   0.4
-x   -y   0.1
Probabilistic Inference
• Probabilistic inference: compute a desired probability from other known probabilities (e.g. a conditional from the joint)
• We generally compute conditional probabilities
– P(on time | no reported accidents) = 0.90
– These represent the agent's beliefs given the evidence
• Probabilities change with new evidence:
– P(on time | no accidents, 5 a.m.) = 0.95
– P(on time | no accidents, 5 a.m., raining) = 0.80
– Observing new evidence causes beliefs to be updated
Inference by Enumeration
• General case:
– Evidence variables: E1 ... Ek = e1 ... ek
– Query* variable: Q
– Hidden variables: H1 ... Hr
(all variables together: Q, E1 ... Ek, H1 ... Hr)
• We want: P(Q | e1 ... ek)
§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get the joint of the query and evidence:
P(Q, e1 ... ek) = Σ_{h1...hr} P(Q, h1 ... hr, e1 ... ek)
§ Step 3: Normalize:
Z = Σ_q P(q, e1 ... ek);  P(Q | e1 ... ek) = (1/Z) P(Q, e1 ... ek)
* Works fine with multiple query variables, too
Inference by Enumeration
• P(W | winter, hot)?
– Step 1: Select the entries consistent with the evidence (S = winter, T = hot)
– Step 2: Obtain the joint probability P(W, winter, hot)
– Step 3: Obtain Z using
Z = P(sun, winter, hot) + P(rain, winter, hot) = 0.10 + 0.05 = 0.15
– Calculate P(W | winter, hot) by dividing each entry by Z

Full joint P(S, T, W):
S        T     W     P
summer   hot   sun   0.30
summer   hot   rain  0.05
summer   cold  sun   0.10
summer   cold  rain  0.05
winter   hot   sun   0.10
winter   hot   rain  0.05
winter   cold  sun   0.15
winter   cold  rain  0.20

Selected entries:
S        T     W     P
winter   hot   sun   0.10
winter   hot   rain  0.05

Result:
S        T     W     P(W | winter, hot)
winter   hot   sun   0.10/0.15 = 0.67
winter   hot   rain  0.05/0.15 = 0.33
Inference by Enumeration
• P(W | winter)?
– Step 1: Select the entries consistent with the evidence (S = winter)

Full joint P(S, T, W):
S        T     W     P
summer   hot   sun   0.30
summer   hot   rain  0.05
summer   cold  sun   0.10
summer   cold  rain  0.05
winter   hot   sun   0.10
winter   hot   rain  0.05
winter   cold  sun   0.15
winter   cold  rain  0.20
Inference by Enumeration
• P(W | winter)?
– Step 1: Select the entries consistent with the evidence
– Step 2: The hidden variable is T; sum it out

Selected entries:
S        T     W     P
winter   hot   sun   0.10
winter   hot   rain  0.05
winter   cold  sun   0.15
winter   cold  rain  0.20
Inference by Enumeration
• P(W | winter)?
– Step 1: Select the entries consistent with the evidence
– Step 2: The hidden variable is T; sum it out
– Step 3: Obtain Z using
Z = P(sun, winter) + P(rain, winter) = 0.25 + 0.25 = 0.50
– Calculate P(W | winter) by dividing each entry by Z

After summing out T:
S        W     P
winter   sun   0.25
winter   rain  0.25

Result:
S        W     P(W | winter)
winter   sun   0.5
winter   rain  0.5
Inference by Enumeration
• Can you calculate P(W)?

S        T     W     P
summer   hot   sun   0.30
summer   hot   rain  0.05
summer   cold  sun   0.10
summer   cold  rain  0.05
winter   hot   sun   0.10
winter   hot   rain  0.05
winter   cold  sun   0.15
winter   cold  rain  0.20
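A sketch of the full three-step procedure over this joint table, assuming the dictionary representation used in the earlier sketches; it reproduces the two worked examples above and also answers P(W).

```python
from collections import defaultdict

# Full joint P(S, T, W) from the slide.
joint = {
    ('summer', 'hot',  'sun'): 0.30, ('summer', 'hot',  'rain'): 0.05,
    ('summer', 'cold', 'sun'): 0.10, ('summer', 'cold', 'rain'): 0.05,
    ('winter', 'hot',  'sun'): 0.10, ('winter', 'hot',  'rain'): 0.05,
    ('winter', 'cold', 'sun'): 0.15, ('winter', 'cold', 'rain'): 0.20,
}
VARS = ('S', 'T', 'W')

def enumerate_inference(query, evidence):
    """P(query | evidence): select, sum out hidden variables, normalize."""
    qi = VARS.index(query)
    table = defaultdict(float)
    for outcome, p in joint.items():
        row = dict(zip(VARS, outcome))
        if all(row[v] == val for v, val in evidence.items()):  # step 1: select
            table[outcome[qi]] += p                            # step 2: sum out
    z = sum(table.values())                                    # step 3: normalize
    return {q: p / z for q, p in table.items()}

print(enumerate_inference('W', {'S': 'winter', 'T': 'hot'}))  # ≈ {'sun': 0.67, 'rain': 0.33}
print(enumerate_inference('W', {'S': 'winter'}))              # {'sun': 0.5, 'rain': 0.5}
print(enumerate_inference('W', {}))                           # P(W) ≈ {'sun': 0.65, 'rain': 0.35}
```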
Inference by Enumeration
§ Obvious problems:
§ Worst-case time complexity O(dⁿ)
§ Space complexity O(dⁿ) to store the joint distribution (n variables, each with domain size d)
The Product Rule
• Sometimes we have conditional distributions but want the joint:
P(x, y) = P(x | y) P(y)
The Product Rule
• Example:

P(W):            P(D | W):
W     P          D     W     P
sun   0.8        wet   sun   0.1
rain  0.2        dry   sun   0.9
                 wet   rain  0.7
                 dry   rain  0.3

P(D, W) = P(D | W) P(W):
D     W     P
wet   sun   0.08
dry   sun   0.72
wet   rain  0.14
dry   rain  0.06
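The same construction as a sketch in Python.

```python
p_w = {'sun': 0.8, 'rain': 0.2}                              # P(W)
p_d_given_w = {('wet', 'sun'): 0.1, ('dry', 'sun'): 0.9,
               ('wet', 'rain'): 0.7, ('dry', 'rain'): 0.3}   # P(D | W)

# Product rule: P(d, w) = P(d | w) * P(w)
p_dw = {(d, w): p * p_w[w] for (d, w), p in p_d_given_w.items()}
print(p_dw)  # ≈ {('wet','sun'): 0.08, ('dry','sun'): 0.72,
             #    ('wet','rain'): 0.14, ('dry','rain'): 0.06}
```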
Bayes’ Rule
• Two ways to factor a joint distribution over two variables:
P(x, y) = P(x | y) P(y) = P(y | x) P(x)
• Dividing, we get:
P(x | y) = P(y | x) P(x) / P(y)
• Why is this at all helpful?
– Lets us build one conditional from its reverse
– Often one conditional is tricky but the other one is simple
– Foundation of many systems we'll see later (e.g. ASR, MT)
• In the running for most important AI equation!
Inference with Bayes’ Rule
• Example: Diagnostic probability from causal probability:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
• Example:
– M: meningitis, S: stiff neck
– Example givens: P(+m) = 0.0001, P(+s | +m) = 0.8, P(+s | -m) = 0.01
P(+m | +s) = P(+s | +m) P(+m) / P(+s)
           = P(+s | +m) P(+m) / [P(+s | +m) P(+m) + P(+s | -m) P(-m)]
           = (0.8 × 0.0001) / (0.8 × 0.0001 + 0.01 × 0.9999)
           ≈ 0.008
– Note: the posterior probability of meningitis is still very small
– Note: you should still get stiff necks checked out! Why?
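The meningitis posterior as a sketch in Python, using only the givens above.

```python
p_m = 0.0001            # P(+m): prior probability of meningitis
p_s_given_m = 0.8       # P(+s | +m)
p_s_given_not_m = 0.01  # P(+s | -m)

# Bayes' rule, with the denominator expanded by total probability.
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # ≈ 0.0079 -- the posterior is still very small
```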
Quiz: Bayes’ Rule
• Given:

P(D | W):               P(W):
D     W     P           W     P
wet   sun   0.1         sun   0.8
dry   sun   0.9         rain  0.2
wet   rain  0.7
dry   rain  0.3

• What is P(W | dry)?
The Chain Rule
• More generally, we can always write any joint distribution as an incremental product of conditional distributions:
P(x1, x2, ..., xn) = ∏_{i=1}^{n} P(xi | x1, ..., xi-1)
e.g., P(x, y, z) = P(x) P(y | x) P(z | x, y)
• Why is this always true?
Markov Chain
Independence
• Two variables are independent in a joint distribution if:
P(X, Y) = P(X) P(Y), i.e., ∀x, y: P(x, y) = P(x) P(y)
We write X ⊥ Y
– Says the joint distribution factors into a product of two simple ones
– Usually variables aren't independent!
• Can use independence as a modeling assumption
– Independence can be a simplifying assumption
– Empirical joint distributions: at best "close" to independent
– What could we assume for {Weather, Traffic, Cavity}?
Example: Independence
• N fair, independent coin flips:

P(X1):        P(X2):        ...        P(Xn):
H   0.5       H   0.5                  H   0.5
T   0.5       T   0.5                  T   0.5

P(X1, X2, ..., Xn) = P(X1) P(X2) ⋯ P(Xn)   (2ⁿ entries, each 0.5ⁿ)
Example: Independence?
P(T):            P(W):
T     P          W     P
hot   0.5        sun   0.6
cold  0.5        rain  0.4

P(T, W):                    P2(T, W) = P(T) P(W):
T     W     P               T     W     P
hot   sun   0.4             hot   sun   0.3
hot   rain  0.1             hot   rain  0.2
cold  sun   0.2             cold  sun   0.3
cold  rain  0.3             cold  rain  0.2

Since P(T, W) ≠ P(T) P(W), T and W are not independent.
Conditional Independence
• Unconditional (absolute) independence is very rare
• Conditional independence is our most basic and robust form of knowledge about uncertain environments
• X is conditionally independent of Y given Z if and only if:
∀x, y, z: P(x, y | z) = P(x | z) P(y | z)
or, equivalently, if and only if:
∀x, y, z: P(x | z, y) = P(x | z)
Conditional Independence
• P(Toothache, Cavity, Catch)
• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
– P(+catch | +toothache, +cavity) = P(+catch | +cavity)
• The same independence holds if I don't have a cavity:
– P(+catch | +toothache, -cavity) = P(+catch | -cavity)
• Catch is conditionally independent of Toothache given Cavity:
– P(Catch | Toothache, Cavity) = P(Catch | Cavity)
§ Equivalent statements:
§ P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
§ P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
§ One can be derived from the other easily
Reasoning over Time or Space
• Often, we want to reason about a sequence of observations
– Speech recognition
– Robot localization
– User attention
– Medical monitoring
• Need to introduce time (or space) into our models
First/Second order Markov chains
First-order Markov chain: P(Xt | X0:t-1) = P(Xt | Xt-1)
Second-order Markov chain: P(Xt | X0:t-1) = P(Xt | Xt-2, Xt-1)
Markov Models
– Value of X at a given time is called the state

X1 → X2 → X3 → X4

– Parameters: called transition probabilities or dynamics; they specify how the state evolves over time (also, initial state probabilities)
– Stationarity assumption: transition probabilities are the same at all times
Joint Distribution of a Markov Model
X1 → X2 → X3 → X4

– Joint distribution:
P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X2) P(X4|X3)
– More generally:
P(X1, X2, ..., XT) = P(X1) P(X2|X1) P(X3|X2) ⋯ P(XT|XT-1) = P(X1) ∏_{t=2}^{T} P(Xt | Xt-1)
– Questions to be resolved:
• Does this indeed define a joint distribution?
• Can every joint distribution be factored this way, or are we making some assumptions about the joint distribution by using this factorization?
Chain Rule and Markov Models

X1 → X2 → X3 → X4

• From the chain rule, every joint distribution over X1, X2, X3, X4 can be written as:
P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X1, X2) P(X4|X1, X2, X3)
• Assuming that X3 ⊥ X1 | X2 and X4 ⊥ X1, X2 | X3
results in the expression posited on the previous slide:
P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X2) P(X4|X3)
Chain Rule and Markov Models

X1 → X2 → X3 → X4

• From the chain rule, every joint distribution over X1, X2, ..., XT can be written as:
P(X1, X2, ..., XT) = P(X1) ∏_{t=2}^{T} P(Xt | X1, X2, ..., Xt-1)
• Assuming that for all t: Xt ⊥ X1, ..., Xt-2 | Xt-1
gives us the expression posited on the earlier slide:
P(X1, X2, ..., XT) = P(X1) ∏_{t=2}^{T} P(Xt | Xt-1)
Implied Conditional Independencies
X1 → X2 → X3 → X4

• We assumed: X3 ⊥ X1 | X2 and X4 ⊥ X1, X2 | X3
• Do we also have X1 ⊥ X3, X4 | X2?
– Yes!
– Proof:
P(X1 | X2, X3, X4) = P(X1, X2, X3, X4) / P(X2, X3, X4)
= P(X1) P(X2|X1) P(X3|X2) P(X4|X3) / Σ_{x1} P(x1) P(X2|x1) P(X3|X2) P(X4|X3)
= P(X1, X2) / P(X2)
= P(X1 | X2)
Markov Models Recap
• Explicit assumption for all t: Xt ⊥ X1, ..., Xt-2 | Xt-1
• Consequence: the joint distribution can be written as:
P(X1, X2, ..., XT) = P(X1) P(X2|X1) P(X3|X2) ⋯ P(XT|XT-1) = P(X1) ∏_{t=2}^{T} P(Xt | Xt-1)
• Implied conditional independencies: (try to prove this!)
– Past variables are independent of future variables given the present,
i.e., if t1 < t2 < t3, then Xt1 ⊥ Xt3 | Xt2
• Additional explicit assumption: P(Xt | Xt-1) is the same for all t
Example Markov Chain: Weather
• States: X ∈ {rain, sun}
§ Initial distribution: 1.0 sun
§ CPT P(Xt | Xt-1):

Xt-1   Xt     P(Xt | Xt-1)
sun    sun    0.9
sun    rain   0.1
rain   sun    0.3
rain   rain   0.7

[Diagram: two other ways of representing the same CPT — a two-state transition graph between sun and rain, and an unrolled chain with the arc probabilities 0.9/0.1/0.3/0.7]
Example Markov Chain: Weather
• Initial distribution: 1.0 sun
• What is the probability distribution after one step?
– P(X2 = sun) = P(sun | sun) P(X1 = sun) + P(sun | rain) P(X1 = rain) = 0.9 × 1.0 + 0.3 × 0.0 = 0.9
Mini-Forward Algorithm
• Question: What's P(X) on some day t?

X1 → X2 → X3 → X4

P(xt) = Σ_{xt-1} P(xt-1, xt) = Σ_{xt-1} P(xt | xt-1) P(xt-1)

(Forward simulation)
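A minimal sketch of forward simulation on the weather chain above.

```python
# Transition CPT P(X_t | X_t-1) from the weather chain.
T = {'sun': {'sun': 0.9, 'rain': 0.1},
     'rain': {'sun': 0.3, 'rain': 0.7}}

def forward_step(p):
    """One mini-forward step: P(x_t) = sum over x of P(x_t | x) P(x)."""
    return {xt: sum(T[x][xt] * p[x] for x in p) for xt in T}

p = {'sun': 1.0, 'rain': 0.0}   # initial observation of sun
for t in range(1, 5):
    p = forward_step(p)
    print(t, p)   # converges toward the stationary distribution
```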
Example Run of Mini-Forward Algorithm
§ From an initial observation of sun: P(X1), P(X2), P(X3), P(X4), ..., P(X∞)
§ From an initial observation of rain: P(X1), P(X2), P(X3), P(X4), ..., P(X∞)
§ From yet another initial distribution P(X1): ..., P(X∞)
Stationary Distributions
• For most chains:
– The influence of the initial distribution gets less and less over time
– The distribution we end up in is independent of the initial distribution
§ Stationary distribution:
§ The distribution we end up with is called the stationary distribution P∞ of the chain
§ It satisfies P∞(X) = P∞+1(X) = Σ_x P(X | x) P∞(x)
Example: Stationary Distributions
• Question: What's P(X) at time t = infinity?

X1 → X2 → X3 → X4

Xt-1   Xt     P(Xt | Xt-1)
sun    sun    0.9
sun    rain   0.1
rain   sun    0.3
rain   rain   0.7

P∞(sun) = P(sun | sun) P∞(sun) + P(sun | rain) P∞(rain)
P∞(rain) = P(rain | sun) P∞(sun) + P(rain | rain) P∞(rain)

P∞(sun) = 0.9 P∞(sun) + 0.3 P∞(rain)
P∞(rain) = 0.1 P∞(sun) + 0.7 P∞(rain)

⇒ P∞(sun) = 3 P∞(rain), P∞(rain) = (1/3) P∞(sun)
Also: P∞(sun) + P∞(rain) = 1
⇒ P∞(sun) = 3/4, P∞(rain) = 1/4
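A sketch that reaches the same stationary distribution numerically, by iterating the forward update until it stops changing (power iteration); solving the two linear equations, as above, works equally well.

```python
T = {'sun': {'sun': 0.9, 'rain': 0.1},
     'rain': {'sun': 0.3, 'rain': 0.7}}

# Iterate the forward update; the chain forgets its starting point.
p = {'sun': 0.5, 'rain': 0.5}   # any initial distribution works
for _ in range(100):
    p = {xt: sum(T[x][xt] * p[x] for x in p) for xt in T}
print(p)  # ≈ {'sun': 0.75, 'rain': 0.25}, i.e. P∞(sun) = 3/4, P∞(rain) = 1/4
```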
Application of Stationary Distribution: Web Link Analysis
• PageRank over a web graph
– Each web page is a state
– Initial distribution: uniform over pages
– Transitions:
• With probability c, uniform jump to a random page (dotted lines, not all shown)
• With probability 1-c, follow a random outlink (solid lines)
• Stationary distribution
– Will spend more time on highly reachable pages
– E.g. there are many ways to get to the Acrobat Reader download page
– Somewhat robust to link spam
– Google 1.0 returned the set of pages containing all your keywords, in decreasing rank; now all search engines use link analysis along with many other factors (rank is actually getting less important over time)
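A PageRank-flavored sketch of this idea; the four-page graph and the jump probability c = 0.15 are made-up illustration values, not from the slides.

```python
# Hypothetical tiny web graph: page -> list of outlinks.
links = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A'], 'D': ['C']}
pages = list(links)
c = 0.15  # probability of a uniform random jump

rank = {p: 1.0 / len(pages) for p in pages}  # initial: uniform over pages
for _ in range(100):
    new = {}
    for p in pages:
        # Mass arriving by following a random outlink into p.
        follow = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new[p] = c / len(pages) + (1 - c) * follow
    rank = new
print(rank)  # highly reachable pages (here C) accumulate more mass
```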
Hidden Markov Model
Hidden Markov Models
• Markov chains are not so useful for most agents
– "Conditioned on the present, the past and future are independent" is often not true
– Need observations to update your beliefs
• Hidden Markov models (HMMs)
– Underlying Markov chain over states X
– You observe outputs (effects) at each time step

X1 → X2 → X3 → X4 → X5
↓    ↓    ↓    ↓    ↓
E1   E2   E3   E4   E5
Hidden Markov Model
An HMM is a temporal probabilistic model in which the state of the process is described by a single discrete random variable.

Hidden: the state variables Xt
Observable: the evidence variables Et
States and observations
We view the world as a series of snapshots, or time slices.
Xt: the set of state variables at time t, unobservable.
Et: the set of observable evidence variables.
The observation at time t is Et = et, for some set of values et.
Transition model & sensor model
• Given the set of state and evidence variables for a problem, we must specify how the state evolves (the transition model) and how the evidence variables get their values (the sensor model).
• Transition model: if it rained yesterday, will it rain today?
• Sensor model: will the director take an umbrella if it rains?
Transition model & Markov Model
The transition model specifies the probability distribution of the current state, given all previous values: P(Xt | X0:t-1).
Note: t is unbounded, so this conditioning set grows without limit.
Markov assumption: the current state depends on only a finite, fixed number of previous states; first-order: P(Xt | X0:t-1) = P(Xt | Xt-1).
Processes satisfying this assumption are called Markov chains.
Example: Umbrella & Rain
A security guard living in an underground installation wants to know whether it is raining today by observing whether the director comes in with an umbrella.
Ut: evidence variable (umbrella appears / does not appear).
Rt: state variable (rain / sunny).
Sensor Model
The sensor model specifies the probability distribution of the evidence variables, given all previous values: P(Et | X0:t, E1:t-1).
Sensor Markov assumption: P(Et | X0:t, E1:t-1) = P(Et | Xt).
Example: Weather HMM
Rain(t-1) → Rain(t) → Rain(t+1), each emitting Umbrella(t-1), Umbrella(t), Umbrella(t+1)

• An HMM is defined by:
– Initial distribution: P(X1)
– Transitions: P(Xt | Xt-1)
– Emissions: P(Et | Xt)

Transition model P(Xt+1 | Xt):
Xt   Xt+1   P(Xt+1|Xt)
+r   +r     0.7
+r   -r     0.3
-r   +r     0.3
-r   -r     0.7

Emission model P(Et | Xt):
Xt   Et   P(Et|Xt)
+r   +u   0.9
+r   -u   0.1
-r   +u   0.2
-r   -u   0.8
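The same HMM as Python dictionaries, plus a trajectory sampler; the uniform initial distribution is an assumption, since the slide leaves P(X1) unspecified.

```python
import random

# Weather HMM from the slide; uniform P(X1) is assumed for illustration.
initial    = {'+r': 0.5, '-r': 0.5}                  # P(X1)
transition = {'+r': {'+r': 0.7, '-r': 0.3},
              '-r': {'+r': 0.3, '-r': 0.7}}          # P(X_t+1 | X_t)
emission   = {'+r': {'+u': 0.9, '-u': 0.1},
              '-r': {'+u': 0.2, '-u': 0.8}}          # P(E_t | X_t)

def draw(dist):
    """Sample a key of `dist` with probability proportional to its value."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

def sample(T):
    """Sample a length-T trajectory of (state, observation) pairs."""
    x, traj = draw(initial), []
    for _ in range(T):
        traj.append((x, draw(emission[x])))
        x = draw(transition[x])
    return traj

print(sample(5))  # e.g. [('+r', '+u'), ('+r', '+u'), ('-r', '-u'), ...]
```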
Joint probability distribution
For any time t:
P(X0:t, E1:t) = P(X0) ∏_{i=1}^{t} P(Xi | Xi-1) P(Ei | Xi)
where P(Xi | Xi-1) is the transition model and P(Ei | Xi) is the sensor model.
Joint Distribution of an HMM
X1 → X2 → X3 → ... → X5, with emissions E1, E2, E3, ..., E5

– Joint distribution:
P(X1, E1, X2, E2, X3, E3) = P(X1) P(E1|X1) P(X2|X1) P(E2|X2) P(X3|X2) P(E3|X3)
– More generally:
P(X1, E1, ..., XT, ET) = P(X1) P(E1|X1) ∏_{t=2}^{T} P(Xt|Xt-1) P(Et|Xt)
– Questions to be resolved:
• Does this indeed define a joint distribution?
• Can every joint distribution be factored this way, or are we making some assumptions about the joint distribution by using this factorization?
Chain Rule and HMMs

X1 → X2 → X3, with emissions E1, E2, E3

• From the chain rule, every joint distribution over X1, E1, X2, E2, X3, E3 can be written as:
P(X1, E1, X2, E2, X3, E3) = P(X1) P(E1|X1) P(X2|X1, E1) P(E2|X1, E1, X2) P(X3|X1, E1, X2, E2) P(E3|X1, E1, X2, E2, X3)
• Assuming that
X2 ⊥ E1 | X1,  E2 ⊥ X1, E1 | X2,  X3 ⊥ X1, E1, E2 | X2,  E3 ⊥ X1, E1, X2, E2 | X3
gives us the expression posited on the previous slide:
P(X1, E1, X2, E2, X3, E3) = P(X1) P(E1|X1) P(X2|X1) P(E2|X2) P(X3|X2) P(E3|X3)
Chain Rule and HMMs

X1 → X2 → X3, with emissions E1, E2, E3

• From the chain rule, every joint distribution over X1, E1, ..., XT, ET can be written as:
P(X1, E1, ..., XT, ET) = P(X1) P(E1|X1) ∏_{t=2}^{T} P(Xt | X1, E1, ..., Xt-1, Et-1) P(Et | X1, E1, ..., Xt-1, Et-1, Xt)
• Assuming that for all t:
– the state is independent of all past states and all past evidence given the previous state, i.e.:
Xt ⊥ X1, E1, ..., Xt-2, Et-2, Et-1 | Xt-1
– the evidence is independent of all past states and all past evidence given the current state, i.e.:
Et ⊥ X1, E1, ..., Xt-2, Et-2, Xt-1, Et-1 | Xt
gives us the expression posited on the earlier slide:
P(X1, E1, ..., XT, ET) = P(X1) P(E1|X1) ∏_{t=2}^{T} P(Xt|Xt-1) P(Et|Xt)
Inference on HMMs
Four inference tasks:
• Filtering: identify the current state probability distribution given all evidence to date
• Prediction: identify a future state probability distribution given all evidence to date
• Smoothing: identify a past state probability distribution given all evidence to date
• Most likely explanation: identify the state sequence most likely to have generated the observations
Filtering
Given the result of filtering up to time t, the agent can directly compute the result for t+1 from the new evidence et+1:

P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) Σ_{xt} P(Xt+1 | xt) P(xt | e1:t)

where P(et+1 | Xt+1) is the sensor model, P(Xt+1 | xt) is the transition model, and α is a normalizing constant.

Filtering: Proof
P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)                             (divide up the evidence)
= α P(et+1 | Xt+1, e1:t) P(Xt+1 | e1:t)                             (Bayes' rule)
= α P(et+1 | Xt+1) P(Xt+1 | e1:t)                                   (sensor Markov assumption)
= α P(et+1 | Xt+1) Σ_{xt} P(Xt+1 | xt) P(xt | e1:t)                 (condition on Xt; Markov assumption)
Example: Umbrella & Rain
Given t=0, P(rain) = P(sun) = 0.5; the director takes an umbrella on t=1 and t=2. Let's calculate P(rain1 | umbrella1) and P(rain2 | umbrella1, umbrella2).

Transition model:
                 today rain   today sun
yesterday rain   0.7          0.3
yesterday sun    0.3          0.7

Sensor model:
       take umbrella   no umbrella
rain   0.9             0.1
sun    0.2             0.8

Day 1 (predict, then update on umbrella1):
P(R1) = Σ_{r0} P(R1 | r0) P(r0) = ⟨0.5, 0.5⟩
P(R1 | u1) = α P(u1 | R1) P(R1) = α ⟨0.9 × 0.5, 0.2 × 0.5⟩ = α ⟨0.45, 0.10⟩ ≈ ⟨0.818, 0.182⟩

Day 2 (predict, then update on umbrella2):
P(R2 | u1) = Σ_{r1} P(R2 | r1) P(r1 | u1) ≈ ⟨0.7 × 0.818 + 0.3 × 0.182, 0.3 × 0.818 + 0.7 × 0.182⟩ = ⟨0.627, 0.373⟩
P(R2 | u1, u2) = α P(u2 | R2) P(R2 | u1) = α ⟨0.9 × 0.627, 0.2 × 0.373⟩ ≈ ⟨0.883, 0.117⟩
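A filtering sketch that reproduces both numbers above; `filter_step` is a made-up helper name.

```python
initial    = {'rain': 0.5, 'sun': 0.5}
transition = {'rain': {'rain': 0.7, 'sun': 0.3},
              'sun':  {'rain': 0.3, 'sun': 0.7}}
emission   = {'rain': {'umbrella': 0.9, 'no_umbrella': 0.1},
              'sun':  {'umbrella': 0.2, 'no_umbrella': 0.8}}

def filter_step(belief, evidence):
    """One filtering update: predict with the transition model,
    then weight by the sensor model and normalize."""
    predicted = {x1: sum(transition[x0][x1] * belief[x0] for x0 in belief)
                 for x1 in transition}
    updated = {x: emission[x][evidence] * p for x, p in predicted.items()}
    z = sum(updated.values())
    return {x: p / z for x, p in updated.items()}

belief = initial
for e in ['umbrella', 'umbrella']:
    belief = filter_step(belief, e)
    print(belief)   # day 1: rain ≈ 0.818; day 2: rain ≈ 0.883
```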
Filtering & Prediction
Filtering: identify the current state probability distribution given all evidence to date.
Prediction: identify a future state probability distribution given all evidence to date.
Prediction
Prediction can be seen as filtering without the addition of new evidence:

P(Xt+k+1 | e1:t) = Σ_{xt+k} P(Xt+k+1 | xt+k) P(xt+k | e1:t)

The recursion starts from the filtered distribution P(Xt | e1:t).
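A prediction sketch continuing the umbrella example; the starting belief is the day-2 filtered result computed above.

```python
transition = {'rain': {'rain': 0.7, 'sun': 0.3},
              'sun':  {'rain': 0.3, 'sun': 0.7}}

def predict_step(belief):
    """One prediction step: push the belief through the transition model only."""
    return {x1: sum(transition[x0][x1] * belief[x0] for x0 in belief)
            for x1 in transition}

belief = {'rain': 0.883, 'sun': 0.117}   # filtered P(R2 | u1, u2) from above
for k in range(3, 6):                    # P(R3 | u1:2), P(R4 | u1:2), ...
    belief = predict_step(belief)
    print(k, belief)                     # drifts toward the stationary <0.5, 0.5>
```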
Summary
• Probability gives us a model for handling uncertainty
• Conditional probability enables us to perform inference based on evidence
• Time can be introduced as a factor for representing sequences of observations and states
– Directly modeling the transition of states => Markov Chain
– Inference based on observed evidence generated from hidden states => Hidden Markov Model
Slides for your information
Inference in temporal models
Learning the model: the transition and sensor models can be learned with EM (covered in a future class).
Four inference tasks with the learned models: (1) Filtering, (2) Prediction, (3) Smoothing, (4) Most likely explanation.
Smoothing
Compute the probability distribution of a past state, given all evidence up to the present.
For example, given umbrella observations up to today, what is the probability that it rained last Wednesday?
Forward-backward algorithm
Smoothing combines a forward message (filtering up to time k) with a backward message (evidence after k):

P(Xk | e1:t) = α P(Xk | e1:k) P(ek+1:t | Xk) = α f1:k × bk+1:t

The backward message is computed by a recursion that runs from t down to k:

bk+1:t = P(ek+1:t | Xk) = Σ_{xk+1} P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)

Example: Umbrella & Rain
Smoothing day 1 given u1 and u2:
b = P(u2 | R1) = Σ_{r2} P(u2 | r2) P(r2 | R1) = ⟨0.9 × 0.7 + 0.2 × 0.3, 0.9 × 0.3 + 0.2 × 0.7⟩ = ⟨0.69, 0.41⟩
P(R1 | u1, u2) = α ⟨0.818 × 0.69, 0.182 × 0.41⟩ ≈ ⟨0.883, 0.117⟩
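A self-contained forward-backward sketch for the umbrella model; it reproduces the smoothed day-1 value above.

```python
initial    = {'rain': 0.5, 'sun': 0.5}
transition = {'rain': {'rain': 0.7, 'sun': 0.3},
              'sun':  {'rain': 0.3, 'sun': 0.7}}
emission   = {'rain': {'umbrella': 0.9, 'no_umbrella': 0.1},
              'sun':  {'umbrella': 0.2, 'no_umbrella': 0.8}}

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def forward(belief, e):
    """Filtering update: predict, then weight by the sensor model."""
    predicted = {x1: sum(transition[x0][x1] * belief[x0] for x0 in belief)
                 for x1 in transition}
    return normalize({x: emission[x][e] * p for x, p in predicted.items()})

def smooth(evidence):
    """Forward-backward: P(X_k | e_1:t) ∝ f_1:k * b_k+1:t for every k."""
    f, forwards = initial, []
    for e in evidence:                            # forward pass
        f = forward(f, e)
        forwards.append(f)
    b = {x: 1.0 for x in transition}              # b_{t+1:t} = 1
    smoothed = [None] * len(evidence)
    for k in range(len(evidence) - 1, -1, -1):    # backward pass
        smoothed[k] = normalize({x: forwards[k][x] * b[x] for x in b})
        b = {x0: sum(emission[x1][evidence[k]] * b[x1] * transition[x0][x1]
                     for x1 in transition)
             for x0 in transition}
    return smoothed

print(smooth(['umbrella', 'umbrella']))  # day 1 smoothed: rain ≈ 0.883
```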
Most likely explanation
Given a sequence of observations, find the sequence of states that is most likely to have generated the observations.

A forward procedure (the Viterbi algorithm): replace the sum over previous states in filtering with a max:

max_{x1...xt} P(x1, ..., xt, Xt+1 | e1:t+1)
= α P(et+1 | Xt+1) max_{xt} [ P(Xt+1 | xt) max_{x1...xt-1} P(x1, ..., xt-1, xt | e1:t) ]

where P(et+1 | Xt+1) is the sensor model, P(Xt+1 | xt) is the transition model, and the innermost max is the result of the last step.
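A self-contained Viterbi sketch for the umbrella model; the three-observation sequence is made up for illustration.

```python
initial    = {'rain': 0.5, 'sun': 0.5}
transition = {'rain': {'rain': 0.7, 'sun': 0.3},
              'sun':  {'rain': 0.3, 'sun': 0.7}}
emission   = {'rain': {'umbrella': 0.9, 'no_umbrella': 0.1},
              'sun':  {'umbrella': 0.2, 'no_umbrella': 0.8}}

def viterbi(evidence):
    """Most likely state sequence: filtering with max instead of sum."""
    # m[x] = probability of the best path ending in state x so far.
    m = {x: initial[x] * emission[x][evidence[0]] for x in initial}
    backptr = []
    for e in evidence[1:]:
        prev, m, ptr = m, {}, {}
        for x1 in transition:
            best = max(prev, key=lambda x0: prev[x0] * transition[x0][x1])
            m[x1] = prev[best] * transition[best][x1] * emission[x1][e]
            ptr[x1] = best
        backptr.append(ptr)
    x = max(m, key=m.get)           # best final state
    path = [x]
    for ptr in reversed(backptr):   # trace pointers back to the start
        x = ptr[x]
        path.append(x)
    return path[::-1]

print(viterbi(['umbrella', 'umbrella', 'no_umbrella']))  # ['rain', 'rain', 'sun']
```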