Ve492: Introduction to Artificial Intelligence
Hidden Markov Models I
UM-SJTU Joint Institute
Slides adapted from http://ai.berkeley.edu, AIMA, UM, CMU
Reasoning over Time or Space
We want to reason about a sequence of observations
❖ Robot localization
❖ Speech recognition
❖ User attention
❖ Medical monitoring
Need to introduce time (or space) into our models
Learning Objectives
❖ What is a Markov model?
❖ What is a hidden Markov model?
❖ How to perform (probabilistic) inference in those models?
❖ Markov Models
  ❖ Model
  ❖ Stationary distribution
❖ Hidden Markov Models
  ❖ Model
  ❖ Forward algorithm for filtering
Markov Models
❖ Value of X at a given time is called the state
❖ Parameters:
❖ Initial state probabilities
❖ Transition probabilities (or dynamics) specify how state evolves over time
Stationarity assumption: transition probabilities are the same at all times, P(Xt | Xt-1)
Markov assumption: “future is independent of the past given the present”
❖ Xt+1 is independent of X0, …, Xt-1 given Xt
❖ First-order Markov model (k-th-order = dependencies on k earlier steps)
Joint distribution: P(X0, …, XT) = P(X0) ∏t=1:T P(Xt | Xt-1)
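A minimal Python sketch of this factorization, reusing the sun/rain weather chain that appears later in these slides (the particular sequence being scored is just an example):

```python
# Joint probability of a state sequence under a first-order Markov model:
# P(x0, ..., xT) = P(x0) * prod over t of P(xt | xt-1)

initial = {"sun": 0.5, "rain": 0.5}
transition = {            # transition[prev][cur] = P(Xt = cur | Xt-1 = prev)
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sequence_probability(states):
    """P(x0) times the product of the transition probabilities along the sequence."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[prev][cur]
    return p

print(sequence_probability(["sun", "sun", "rain"]))  # 0.5 * 0.9 * 0.1 = 0.045
```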
Relation to Previous Models
❖ Bayes’ net
❖ Markov chain is a (growable) BN
❖ We can always use generic BN reasoning on it if we truncate the chain at a fixed length
❖ Markov decision process
❖ Markov chain is an MDP with one action per state (or MDP with fixed policy)
Example: Random Walk in One Dimension
-4 -3 -2 -1 0 1 2 3 4
❖ State: location on the unbounded integer line
❖ Initial probability: starts at 0
❖ Transition model: P(Xt = k±1| Xt-1= k) = 0.5
❖ Applications: particle motion in crystals, stock prices, gambling, genetics, etc.
❖ Questions:
❖ How far does it get as a function of t?
❖ Expected distance is O(√t) (see the simulation sketch below)
❖ Does it get back to 0, or can it go off forever and not come back?
❖ In 1D and 2D, it returns with probability 1; in 3D, it returns with probability 0.34053733
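A small simulation sketch (pure Python; the number of trials is chosen arbitrarily) to check the O(√t) growth of the expected distance empirically:

```python
import random

def random_walk_distance(t, trials=2000):
    """Average |X_t| over many simulated 1-D random walks started at 0."""
    total = 0
    for _ in range(trials):
        x = 0
        for _ in range(t):
            x += random.choice((-1, 1))   # step left or right with prob. 0.5 each
        total += abs(x)
    return total / trials

for t in (100, 400, 1600):
    # Expected distance grows like sqrt(t): roughly sqrt(2t/pi) ≈ 0.8 * sqrt(t)
    print(t, random_walk_distance(t))
```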
Example: n-gram Models
❖ State: word at position t in text (can also build letter n-grams)
❖ Transition model (probabilities come from empirical frequencies):
❖ Unigram (zero-order): P(Wordt = i)
❖ “logical are as are confusion a may right tries agent goal the was …”
❖ Bigram (first-order): P(Wordt = i | Wordt-1 = j)
❖ “systems are very similar computational approach would be represented …”
❖ Trigram (second-order): P(Wordt = i | Wordt-1 = j, Wordt-2 = k)
❖ “planning and scheduling are integrated the success of naive Bayes model is …”
❖ Applications: text classification, spam detection, author identification, language classification, speech recognition
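A short sketch of a bigram model: probabilities are estimated as empirical frequencies from a tiny toy corpus (the corpus and the generation length are made up for illustration), then used to generate text:

```python
import random
from collections import Counter, defaultdict

corpus = "the agent sees the world the agent acts on the world".split()

# Count bigram frequencies: counts[prev][word] = number of times prev is followed by word
counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    counts[prev][word] += 1

def next_word(prev):
    """Sample Word_t from the empirical P(Word_t | Word_t-1 = prev)."""
    words = list(counts[prev].keys())
    freqs = list(counts[prev].values())
    return random.choices(words, weights=freqs)[0]

# Generate a few words starting from "the"
word, output = "the", ["the"]
for _ in range(6):
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```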
Example: Weather Prediction
❖ State: sun or rain
❖ Initial distribution: <0.5, 0.5>
❖ Transition model P(Xt | Xt-1) (the slide shows the same CPT two ways: as a table and as a state-transition diagram):
  Xt-1   P(Xt = sun)   P(Xt = rain)
  sun    0.9           0.1
  rain   0.3           0.7
Weather Prediction ctd.
❖ What is the weather like at time 1?
P(X1) = Σx0 P(X1, X0=x0)
      = Σx0 P(X0=x0) P(X1 | X0=x0)
      = 0.5 <0.9, 0.1> + 0.5 <0.3, 0.7> = <0.6, 0.4>
❖ What is the weather like at time 2?
P(X2) = Σx1 P(X2, X1=x1)
      = Σx1 P(X1=x1) P(X2 | X1=x1)
      = 0.6 <0.9, 0.1> + 0.4 <0.3, 0.7> = <0.66, 0.34>
Simple Forward Algorithm
❖ Question: What's P(Xt) on some day t?
P(X0) is known
P(Xt) = Σxt-1 P(Xt, Xt-1=xt-1)
      = Σxt-1 P(Xt-1=xt-1) P(Xt | Xt-1=xt-1)
where P(Xt-1=xt-1) is the probability from the previous iteration and P(Xt | Xt-1=xt-1) is given by the transition model (see the sketch below)
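A minimal sketch of this recursion in Python, using the sun/rain numbers from the weather example above:

```python
initial = {"sun": 0.5, "rain": 0.5}
transition = {             # transition[prev][cur] = P(Xt = cur | Xt-1 = prev)
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def predict(prior):
    """One step of the recursion: P(Xt) = sum over xt-1 of P(xt-1) P(Xt | xt-1)."""
    return {cur: sum(prior[prev] * transition[prev][cur] for prev in prior)
            for cur in ("sun", "rain")}

belief = initial
for t in range(1, 4):
    belief = predict(belief)
    print(t, belief)    # t=1: {'sun': 0.6, 'rain': 0.4}; t=2: {'sun': 0.66, 'rain': 0.34}; ...
```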
Matrix Form
❖ What is the weather like at time 2? P(X2) = 0.6 <0.9, 0.1> + 0.4 <0.3, 0.7> = <0.66, 0.34>
❖ In matrix-vector form:
  P(X2) = | 0.9  0.3 | | 0.6 |  =  | 0.66 |
          | 0.1  0.7 | | 0.4 |     | 0.34 |
❖ More generally, P(Xt+1) = Tᵀ P(Xt), where T is the transition matrix with Tij = P(Xt+1 = j | Xt = i)
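The same computation in matrix-vector form with NumPy (a sketch; here T is written with rows indexed by the previous state, so the update is multiplication by its transpose):

```python
import numpy as np

# Rows = previous state, columns = next state; order: [sun, rain]
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
p = np.array([0.5, 0.5])        # P(X0)

for t in range(1, 4):
    p = T.T @ p                 # P(Xt+1) = T^T P(Xt)
    print(t, p)                 # t=2 gives [0.66, 0.34]
```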
Example Runs of Simple Forward Algorithm
❖ From initial observation of sun: P(X0), P(X1), P(X2), P(X3), …
❖ From initial observation of rain: P(X0), P(X1), P(X2), P(X3), …
❖ From yet another initial distribution P(X1): …
❖ In every case the sequence of distributions converges to the same limit
Stationary Distributions
❖ The limiting distribution (if it exists) is called the stationary distribution P∞ of the chain
❖ It satisfies P∞ = P∞+1 = Tᵀ P∞
❖ Solving for P∞ = <p, 1-p> in the example:
  | 0.9  0.3 | | p   |   | p   |
  | 0.1  0.7 | | 1-p | = | 1-p |
  0.9p + 0.3(1-p) = p  ⟹  0.3 = 0.4p  ⟹  p = 0.75
Stationary distribution is <0.75, 0.25> regardless of the starting distribution (see the numerical sketch below)
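A numerical sketch of two ways to obtain this stationary distribution: power iteration (repeatedly applying the update) or solving Tᵀp = p as an eigenvector problem:

```python
import numpy as np

T = np.array([[0.9, 0.1],       # rows = previous state, order [sun, rain]
              [0.3, 0.7]])

# Power iteration: apply the update until the distribution stops changing
p = np.array([0.5, 0.5])
for _ in range(100):
    p = T.T @ p
print(p)                                 # -> [0.75, 0.25]

# Eigenvector approach: the stationary distribution is the eigenvector of T^T
# with eigenvalue 1, normalized to sum to 1
vals, vecs = np.linalg.eig(T.T)
stationary = vecs[:, np.argmax(np.isclose(vals, 1.0))].real
print(stationary / stationary.sum())     # -> [0.75, 0.25]
```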
Application of Stationary Distribution: Web Link Analysis
Web browsing
❖ State = webpage
❖ Initial distribution: uniform over pages
❖ Transitions:
❖ With prob. c, uniform jump to a random page (dotted lines, not all shown)
❖ With prob. 1-c, follow a random outlink (solid lines)
Stationary distribution
❖ Will spend more time on highly reachable pages
❖ Google 1.0 (PageRank) returned the set of pages containing all your keywords, in decreasing rank; now all search engines use link analysis along with many other factors (rank is actually getting less important over time)
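A sketch of the random-surfer chain as power iteration; the four-page link graph and the value of c are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical link graph: outlinks[i] = pages that page i links to
outlinks = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n, c = 4, 0.15                   # c = probability of a uniform random jump

# Transition matrix of the random surfer: T[i, j] = P(next page = j | current page = i)
T = np.zeros((n, n))
for i, links in outlinks.items():
    T[i, :] += c / n                       # uniform random jump
    for j in links:
        T[i, j] += (1 - c) / len(links)    # follow a random outlink

p = np.full(n, 1 / n)            # uniform initial distribution over pages
for _ in range(100):
    p = T.T @ p                  # converges to the stationary distribution
print(p)                         # highly reachable pages get more mass
```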
Application of Stationary Distributions: Gibbs Sampling
❖ State = joint instantiation over all hidden and query variables; {X1, …, Xn} = H ∪ Q
❖ Some initial distribution
❖ Transitions:
❖ With probability 1/n resample variable Xj according to P(Xj | x1, x2, …, xj-1, xj+1, …, xn, e1, …, em)
Stationary distribution:
❖ Conditional distribution P(X1, X2 , … , Xn|e1, …, em)
❖ Means that when running Gibbs sampling long enough
we get a sample from the desired distribution
❖ Requires some proof to show this is true!
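A minimal Gibbs-sampling sketch on a toy two-variable joint distribution (the probability table is made up, and this is a systematic-scan variant rather than the random-variable-selection version above): resampling each variable from its conditional defines a Markov chain whose stationary distribution is the joint, so long-run sample frequencies approach it.

```python
import random
from collections import Counter

# Hypothetical joint P(A, B) over two binary variables, given as a table
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def resample_A(b):
    """Sample A from P(A | B = b)."""
    p0, p1 = joint[(0, b)], joint[(1, b)]
    return 0 if random.random() < p0 / (p0 + p1) else 1

def resample_B(a):
    """Sample B from P(B | A = a)."""
    p0, p1 = joint[(a, 0)], joint[(a, 1)]
    return 0 if random.random() < p0 / (p0 + p1) else 1

a, b = 0, 0
counts = Counter()
for step in range(200000):
    a = resample_A(b)             # resample one variable at a time
    b = resample_B(a)
    if step > 1000:               # discard burn-in samples
        counts[(a, b)] += 1

total = sum(counts.values())
print({k: round(v / total, 3) for k, v in counts.items()})  # ≈ the joint table
```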
Hidden Markov Models
Markov chains are not so useful for most agents
❖ Need observations to update your beliefs
Hidden Markov models (HMMs)
❖ Underlying Markov chain over states X
❖ You observe outputs (effects) at each time step
(Graphical model: chain X1 → X2 → X3 → X4 → X5, with an observation Et emitted from each state Xt)
HMM Definition and Weather Example
An HMM is defined by:
Initial distribution: P(X0)
Transitions: P(Xt | Xt-1)
Emissions: P(Et | Xt)
(Umbrella example: hidden states Raint-1, Raint, Raint+1 with transition model P(Rt | Rt-1); observed variables Umbrellat-1, Umbrellat, Umbrellat+1 emitted from the corresponding Rain state)
HMM as Probability Model
❖ Joint distribution for Markov model:
  P(X0, …, XT) = P(X0) ∏t=1:T P(Xt | Xt-1)
❖ Joint distribution for hidden Markov model:
  P(X0, X1, …, XT, E1, …, ET) = P(X0) ∏t=1:T P(Xt | Xt-1) P(Et | Xt)
❖ Markov assumption
❖ Future states are independent of the past given the present
❖ Current evidence is independent of everything else given the current state
❖ Are evidence variables independent of each other?
Useful notation:
Xa:b = Xa , Xa+1, …, Xb
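A small sketch that scores a particular state/evidence sequence under this factorization; the transition and initial numbers are the sun/rain values used earlier, while the umbrella emission probabilities are made up for illustration:

```python
# Hypothetical emission probabilities for an umbrella observation (illustration only);
# transitions and initial distribution reuse the sun/rain numbers from earlier slides.
initial = {"sun": 0.5, "rain": 0.5}
transition = {"sun":  {"sun": 0.9, "rain": 0.1},
              "rain": {"sun": 0.3, "rain": 0.7}}
emission = {"sun":  {"umbrella": 0.1, "no_umbrella": 0.9},    # P(Et | Xt = sun)
            "rain": {"umbrella": 0.8, "no_umbrella": 0.2}}    # P(Et | Xt = rain)

def joint_probability(states, evidence):
    """P(x0:T, e1:T) = P(x0) * prod over t of P(xt | xt-1) * P(et | xt)."""
    p = initial[states[0]]
    for t in range(1, len(states)):
        p *= transition[states[t - 1]][states[t]] * emission[states[t]][evidence[t - 1]]
    return p

# States x0, x1, x2 and the evidence e1, e2 observed at times 1 and 2
print(joint_probability(["sun", "rain", "rain"], ["umbrella", "umbrella"]))
```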
Real HMM Examples
❖ Speech recognition HMMs:
❖ Observations are acoustic signals (continuous valued)
❖ States are specific positions in specific words (so, tens of thousands)
❖ Machine translation HMMs:
❖ Observations are words (tens of thousands)
❖ States are translation options
❖ Robot localization:
❖ Observations are range readings (continuous)
❖ States are positions on a map (continuous)
❖ Molecular biology:
❖ Observations are nucleotides ACGT
❖ States are coding/non-coding/start/stop/splice-site etc.
Inference Tasks in HMMs
Filtering: P(Xt|e1:t)
Smoothing: P(Xt|e1:N), t < N
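A sketch of the forward algorithm for filtering (alternate a prediction step with the transition model and an update step that weights by the evidence likelihood and normalizes), using the same sun/rain transitions and the illustrative umbrella emissions from above:

```python
initial = {"sun": 0.5, "rain": 0.5}
transition = {"sun":  {"sun": 0.9, "rain": 0.1},
              "rain": {"sun": 0.3, "rain": 0.7}}
emission = {"sun":  {"umbrella": 0.1, "no_umbrella": 0.9},    # illustrative numbers
            "rain": {"umbrella": 0.8, "no_umbrella": 0.2}}

def forward(evidence):
    """Filtering: return P(Xt | e1:t) for the last time step t."""
    belief = dict(initial)                        # P(X0)
    for e in evidence:
        # Prediction step: P(Xt | e1:t-1) = sum over xt-1 of P(Xt | xt-1) P(xt-1 | e1:t-1)
        predicted = {x: sum(belief[prev] * transition[prev][x] for prev in belief)
                     for x in belief}
        # Update step: weight by the evidence likelihood P(et | Xt), then normalize
        unnormalized = {x: emission[x][e] * predicted[x] for x in predicted}
        z = sum(unnormalized.values())
        belief = {x: p / z for x, p in unnormalized.items()}
    return belief

print(forward(["umbrella", "umbrella"]))          # P(X2 | e1:2)
```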
Example: Prediction Step
As time passes, uncertainty “accumulates”
Transition model: ghosts usually go clockwise
(Figure: belief distributions shown at T=1, T=2, T=5)
Example: Update Step
As we get observations, beliefs get reweighted, uncertainty “decreases”
(Figure: beliefs before observation vs. after observation)
Pacman – Sonar
Concluding Remarks
❖ Markov Models
❖ An important probabilistic model for describing dynamic systems
❖ Hidden Markov Models
❖ Extension to partially observable states
❖ For more information:
❖ AIMA, Chapter 15 for Probabilistic Reasoning over Time