
CIS 471/571(Fall 2020): Introduction to Artificial Intelligence
Lecture 17: Hidden Markov Model
Thanh H. Nguyen
Source: http://ai.berkeley.edu/home.html

Reminder
§ Homework 4: Bayes Nets
§ Deadline: Nov 24th, 2020

Hidden Markov Model

Reasoning over Time or Space
§Often, we want to reason about a sequence of observations
§ Speech recognition
§ Robot localization
§ User attention
§ Medical monitoring
§Need to introduce time (or space) into our models

Markov Models
§ Value of X at a given time is called the state   [Chain: X1 → X2 → X3 → X4]
§ Parameters: called transition probabilities or dynamics, specify how the state evolves over time (also, initial state probabilities)
§ Stationarity assumption: transition probabilities the same at all times § Same as MDP transition model, but no choice of action

Conditional Independence
§Basic conditional independence:
§ Past and future independent given the present
§ Each time step only depends on the previous
§ This is called the (first order) Markov property
§Note that the chain is just a (growable) BN
§ We can always use generic BN reasoning on it if we truncate the chain at a fixed length

Example Markov Chain: Weather
§States: X = {rain, sun}
§ Initial distribution: 1.0 sun
§ CPT P(Xt | Xt-1):

Xt-1 | Xt   | P(Xt | Xt-1)
sun  | sun  | 0.9
sun  | rain | 0.1
rain | sun  | 0.3
rain | rain | 0.7

Two new ways of representing the same CPT:
[State-transition diagram and per-state branching diagram, both encoding sun→sun 0.9, sun→rain 0.1, rain→sun 0.3, rain→rain 0.7]

Example Markov Chain: Weather
§ Initial distribution: 1.0 sun
§ What is the probability distribution after one step?

[State diagram: sun→sun 0.9, sun→rain 0.1, rain→sun 0.3, rain→rain 0.7]
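Working it out from the CPT above, starting from 1.0 sun:

P(X2 = sun)  = P(sun | sun)·P(X1 = sun) + P(sun | rain)·P(X1 = rain) = 0.9·1.0 + 0.3·0.0 = 0.9
P(X2 = rain) = P(rain | sun)·P(X1 = sun) + P(rain | rain)·P(X1 = rain) = 0.1·1.0 + 0.7·0.0 = 0.1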

Mini-Forward Algorithm
§ Question: What's P(X) on some day t?   [Chain: X1 → X2 → X3 → X4]

P(xt) = Σ_{xt-1} P(xt-1, xt) = Σ_{xt-1} P(xt | xt-1) P(xt-1)
Forward simulation
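A minimal Python sketch of this forward simulation, assuming the weather CPT from the previous slide (the variable and function names are mine, not from the slides):

```python
# Forward simulation for the weather Markov chain.
states = ["sun", "rain"]
P_trans = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
           ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}

def mini_forward(prior, steps):
    """Repeatedly apply P(xt) = Σ_{xt-1} P(xt | xt-1) P(xt-1)."""
    dist = dict(prior)
    for _ in range(steps):
        dist = {x_t: sum(P_trans[(x_prev, x_t)] * dist[x_prev] for x_prev in states)
                for x_t in states}
    return dist

print(mini_forward({"sun": 1.0, "rain": 0.0}, 1))   # one step: {'sun': 0.9, 'rain': 0.1}
print(mini_forward({"sun": 1.0, "rain": 0.0}, 50))  # approaches the stationary distribution
```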

Example Run of Mini-Forward Algorithm
§ From initial observation of sun:
  P(X1), P(X2), P(X3), P(X4), ..., P(X∞)
§ From initial observation of rain:
  P(X1), P(X2), P(X3), P(X4), ..., P(X∞)
§ From yet another initial distribution P(X1):
  P(X1), ..., P(X∞)
[Bar charts on the slide show all three sequences converging to the same final distribution.]

Stationary Distributions
§ For most chains:
  § Influence of the initial distribution gets less and less over time
  § The distribution we end up in is independent of the initial distribution
§ Stationary distribution:
  § The distribution we end up with is called the stationary distribution P∞ of the chain
  § It satisfies

P∞(X) = P∞+1(X) = Σ_x P(X | x) P∞(x)

Example: Stationary Distributions
§ Question: What's P(X) at time t = ∞?   [Chain: X1 → X2 → X3 → X4 → ...]

Xt-1 | Xt   | P(Xt | Xt-1)
sun  | sun  | 0.9
sun  | rain | 0.1
rain | sun  | 0.3
rain | rain | 0.7

P∞(sun)  = P(sun | sun) P∞(sun)  + P(sun | rain) P∞(rain)
P∞(rain) = P(rain | sun) P∞(sun) + P(rain | rain) P∞(rain)

P∞(sun)  = 0.9 P∞(sun) + 0.3 P∞(rain)
P∞(rain) = 0.1 P∞(sun) + 0.7 P∞(rain)

P∞(sun)  = 3 P∞(rain)
P∞(rain) = 1/3 P∞(sun)

Also: P∞(sun) + P∞(rain) = 1, so P∞(sun) = 3/4 and P∞(rain) = 1/4
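A small sketch that finds the same stationary distribution numerically, by iterating the update P∞(X) = Σ_x P(X | x) P∞(x) until it stops changing (variable names are mine):

```python
# Fixed-point iteration to the stationary distribution of the weather chain.
states = ["sun", "rain"]
P_trans = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
           ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}

dist = {"sun": 1.0, "rain": 0.0}   # any initial distribution works
for _ in range(1000):
    new = {y: sum(P_trans[(x, y)] * dist[x] for x in states) for y in states}
    if max(abs(new[s] - dist[s]) for s in states) < 1e-12:
        break
    dist = new

print(dist)   # ~{'sun': 0.75, 'rain': 0.25}, i.e. P∞(sun) = 3/4, P∞(rain) = 1/4
```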

Application of Stationary Distribution: Web Link Analysis
§ PageRank over a web graph § Each web page is a state
§ Initial distribution: uniform over pages
§ Transitions:
§ With prob. c, uniform jump to a random page (dotted lines, not all shown) § With prob. 1-c, follow a random outlink (solid lines)
§ Stationary distribution
§ Will spend more time on highly reachable pages
§ E.g. many ways to get to the Acrobat Reader download page
§ Somewhat robust to link spam
§ Google 1.0 returned the set of pages containing all your keywords in decreasing rank, now all search engines use link analysis along with many other factors
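A hedged sketch of the idea as power iteration on a tiny made-up link graph (the pages, links, and c = 0.15 below are illustrative assumptions, not from the slides):

```python
# PageRank as the stationary distribution of a random surfer:
# with prob. c jump to a uniformly random page, with prob. 1-c follow a random outlink.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # hypothetical web graph
pages = list(links)
c = 0.15  # random-jump probability (assumed value)

rank = {p: 1.0 / len(pages) for p in pages}   # initial distribution: uniform over pages
for _ in range(100):
    new = {p: c / len(pages) for p in pages}  # random-jump mass
    for p in pages:
        for q in links[p]:                    # follow a random outlink of p
            new[q] += (1 - c) * rank[p] / len(links[p])
    rank = new

print(rank)  # highly reachable pages (here C, with many in-links) get more stationary mass
```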

Application of Stationary Distributions: Gibbs Sampling*
§ Each joint instantiation over all hidden and query variables is a state: {X1, …, Xn} = H U Q
§ Transitions:
§ With probability 1/n resample variable Xj according to P(Xj | x1, x2, …, xj-1, xj+1, …, xn, e1, …, em)
§ Stationary distribution:
  § Conditional distribution P(X1, X2, …, Xn | e1, …, em)
  § Means that when running Gibbs sampling long enough we get a sample from the desired distribution
  § Requires some proof to show this is true!
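A minimal illustrative sketch of such a transition operator, using two binary variables with a made-up joint distribution (everything named here is hypothetical, not from the slides); the empirical frequencies of visited states approach the joint, which is exactly the stationary-distribution claim above:

```python
import random
from collections import Counter

joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # hypothetical joint P(X1, X2)

def resample(j, state):
    """Resample variable j from P(Xj | other variable)."""
    weights = []
    for v in (0, 1):
        s = list(state)
        s[j] = v
        weights.append(joint[tuple(s)])
    return 0 if random.random() < weights[0] / sum(weights) else 1

state = [0, 0]
counts = Counter()
for _ in range(100_000):
    j = random.randrange(2)          # pick a variable uniformly (prob. 1/n each)
    state[j] = resample(j, state)
    counts[tuple(state)] += 1

total = sum(counts.values())
print({k: round(v / total, 3) for k, v in counts.items()})  # ≈ the joint above
```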

Hidden Markov Models

Hidden Markov Models
§ Markov chains not so useful for most agents § Need observations to update your beliefs
§Hidden Markov models (HMMs)
§ Underlying Markov chain over states X
§ You observe outputs (effects) at each time step

[HMM graph: X1 → X2 → X3 → X4 → X5, with Ei observed below each Xi]

Example: Weather HMM
[HMM graph: Raint-1 → Raint → Raint+1, with Umbrellat observed below each Raint; transitions P(Xt | Xt-1), emissions P(Et | Xt)]

§ An HMM is defined by:
  § Initial distribution: P(X1)
  § Transitions: P(Xt | Xt-1)
  § Emissions: P(Et | Xt)

Rt-1 | Rt | P(Rt | Rt-1)
+r   | +r | 0.7
+r   | -r | 0.3
-r   | +r | 0.3
-r   | -r | 0.7

Rt | Ut | P(Ut | Rt)
+r | +u | 0.9
+r | -u | 0.1
-r | +u | 0.2
-r | -u | 0.8

Conditional Independence
§ HMMs have two important independence properties:
§ Markov hidden process: future depends on past via the present
§ Current observation independent of all else given current state

[HMM graph: X1 → X2 → X3 → X4 → X5, with Ei observed below each Xi]

§ Quiz: does this mean that evidence variables are guaranteed to be independent?
§ [No, they tend to be correlated by the hidden state]

Real HMM Examples
§Speech recognition HMMs:
§ Observations are acoustic signals (continuous valued)
§ States are specific positions in specific words (so, tens of thousands)
§Machine translation HMMs:
§ Observations are words (tens of thousands) § States are translation options
§Robot tracking:
§ Observations are range readings (continuous) § States are positions on a map (continuous)

Filtering / Monitoring
§ Filtering, or monitoring, is the task of tracking the distribution Bt(X) = Pt(Xt | e1, …, et) (the belief state) over time
§ We start with B1(X) in an initial setting, usually uniform
§ As time passes, or we get observations, we update B(X)

Example: Robot Localization
Example from Michael Pfeiffer
[Grid map figure; cell shading shows probability, from 0 to 1]
t=0
Sensor model: the robot can sense in which directions there is a wall, with never more than 1 mistaken reading
Motion model: with small probability, the action may not be executed

Example: Robot Localization
t=1
Lighter grey: was possible to get the reading, but less likely b/c required 1 mistake

Example: Robot Localization
t=2

Example: Robot Localization
t=3

Example: Robot Localization
t=4

Example: Robot Localization
t=5

The Forward Algorithm
§ We are given evidence at each time and want to know the belief Bt(X) = P(Xt | e1:t)
§ Induction: assuming we have the current belief, we update in two steps:

P(Xt+1 | e1:t+1)  ←[observation update]←  P(Xt+1 | e1:t)  ←[passage of time update]←  P(Xt | e1:t)

Inference: Base Cases
[Base case 1: a single state X1 with observation E1 — one observation update]
[Base case 2: X1 → X2 with no evidence — one passage-of-time update]

Passage of Time
§ Assume we have current belief P(X | evidence to date)
§ Then, after one time step passes:

P(Xt+1 | e1:t) = Σ_{xt} P(Xt+1, xt | e1:t)
               = Σ_{xt} P(Xt+1 | xt, e1:t) P(xt | e1:t)
               = Σ_{xt} P(Xt+1 | xt) P(xt | e1:t)

§ Or compactly:  B'(Xt+1) = Σ_{xt} P(Xt+1 | xt) B(xt)

§ Basic idea: beliefs get "pushed" through the transitions
§ With the "B" notation, we have to be careful about what time step t the belief is about, and what evidence it includes
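A minimal sketch of this update for the rain/umbrella HMM, using the transition table from the Weather HMM slide (the function name is mine):

```python
# Passage-of-time update: B'(Xt+1) = Σ_{xt} P(Xt+1 | xt) B(xt)
T = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3,
     ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}

def elapse_time(belief):
    """Push the current belief through one transition step."""
    return {x_next: sum(T[(x, x_next)] * belief[x] for x in belief)
            for x_next in ("+r", "-r")}

print(elapse_time({"+r": 0.5, "-r": 0.5}))      # -> {'+r': 0.5, '-r': 0.5}
print(elapse_time({"+r": 0.818, "-r": 0.182}))  # -> approximately {'+r': 0.627, '-r': 0.373}
```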

Observation
§ Assume we have current belief P(X | previous evidence): B'(Xt+1) = P(Xt+1 | e1:t)
§ Then, after evidence comes in:

P(Xt+1 | e1:t+1) = P(Xt+1, et+1 | e1:t) / P(et+1 | e1:t)
                 ∝_{Xt+1} P(Xt+1, et+1 | e1:t)
                 = P(et+1 | e1:t, Xt+1) P(Xt+1 | e1:t)
                 = P(et+1 | Xt+1) P(Xt+1 | e1:t)

§ Or, compactly:  B(Xt+1) ∝_{Xt+1} P(et+1 | Xt+1) B'(Xt+1)
§ Unlike passage of time, we have to renormalize
§ Basic idea: beliefs are "reweighted" by the likelihood of evidence
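A minimal sketch of the observation update with renormalization, using the emission table from the Weather HMM slide (the function name is mine):

```python
# Observation update: B(Xt+1) ∝ P(et+1 | Xt+1) B'(Xt+1), then renormalize.
E = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1,
     ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}

def observe(belief, evidence):
    """Reweight the belief by the evidence likelihood and renormalize."""
    weighted = {x: E[(x, evidence)] * belief[x] for x in belief}
    total = sum(weighted.values())
    return {x: w / total for x, w in weighted.items()}

print(observe({"+r": 0.5, "-r": 0.5}, "+u"))   # -> approximately {'+r': 0.818, '-r': 0.182}
```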

Example: Weather HMM
B(+r) = 0.5,   B(-r) = 0.5      (prior belief at Rain0)
B'(+r) = 0.5,  B'(-r) = 0.5     (after passage of time to Rain1)
B(+r) = 0.818, B(-r) = 0.182    (after observing Umbrella1 = +u)
B'(+r) = 0.627, B'(-r) = 0.373  (after passage of time to Rain2)
B(+r) = 0.883, B(-r) = 0.117    (after observing Umbrella2 = +u)

Rt | Rt+1 | P(Rt+1 | Rt)
+r | +r   | 0.7
+r | -r   | 0.3
-r | +r   | 0.3
-r | -r   | 0.7

Rt | Ut | P(Ut | Rt)
+r | +u | 0.9
+r | -u | 0.1
-r | +u | 0.2
-r | -u | 0.8
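A short sketch combining both updates into the forward algorithm; it reproduces the numbers on this slide (the function and variable names are mine, the tables are the ones above):

```python
# Forward algorithm (filtering) for the rain/umbrella HMM: elapse time, then observe.
T = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3, ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}
E = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1, ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}
states = ("+r", "-r")

def forward(belief, evidence):
    # Passage of time: B'(Xt+1) = Σ_{xt} P(Xt+1 | xt) B(xt)
    prior = {y: sum(T[(x, y)] * belief[x] for x in states) for y in states}
    # Observation: B(Xt+1) ∝ P(et+1 | Xt+1) B'(Xt+1), then renormalize
    weighted = {y: E[(y, evidence)] * prior[y] for y in states}
    z = sum(weighted.values())
    return {y: w / z for y, w in weighted.items()}

B = {"+r": 0.5, "-r": 0.5}                 # B(Rain0)
B = forward(B, "+u"); print(B)             # -> approximately {'+r': 0.818, '-r': 0.182}
B = forward(B, "+u"); print(B)             # -> approximately {'+r': 0.883, '-r': 0.117}
```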

Online Belief Updates
§ Every time step, we start with current P(X | evidence)
§ We update for time:  P(xt | e1:t-1) = Σ_{xt-1} P(xt | xt-1) P(xt-1 | e1:t-1)
§ We update for evidence:  P(xt | e1:t) ∝ P(et | xt) P(xt | e1:t-1)

Next Time: Particle Filtering and Applications of HMMs