Data Mining and Machine Learning
Statistical Modelling of Sequences (1) Peter Jančovič
Slide 1
Objectives
Extension of dynamic programming to statistical modelling of sequences
Introduction to Markov models through example
Calculation of probability of a state sequence
State distribution
Relationship to Page Rank
Slide 2
Sequence retrieval using DP
Corpus of sequential data
…… AAGDTDTDTDD AABBCBDAAAAAAA BABABABBCCDF GGGGDDGDGDGDGDTDTD DGDGDGDGD AABCDTAABCDTAABCDTAAB CDCDCDTGGG GGAACDTGGGGGAAA …….
…….
‘query’ sequence Q
…BBCCDDDGDGDGDCDTCDTTDCCC…
Calculate d(S,Q) for each sequence S in the corpus
Dynamic Programming Distance Calculation:

    Ŝ = argmin_S d(S,Q)
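The retrieval step above can be sketched in Python. As a concrete choice of DP distance d(S,Q) this sketch uses Levenshtein (edit) distance with unit insertion, deletion and substitution costs; the corpus strings are made-up examples in the style of the slide:

```python
def edit_distance(s, q):
    """Levenshtein distance between sequences s and q, computed by DP."""
    # d[i][j] = distance between the first i symbols of s and first j of q
    d = [[0] * (len(q) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i  # delete all i symbols
    for j in range(len(q) + 1):
        d[0][j] = j  # insert all j symbols
    for i in range(1, len(s) + 1):
        for j in range(1, len(q) + 1):
            sub = 0 if s[i - 1] == q[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(s)][len(q)]

def retrieve(corpus, query):
    """Return the corpus sequence S minimising d(S, Q)."""
    return min(corpus, key=lambda s: edit_distance(s, query))

corpus = ["AAGDTDTDTDD", "BABABABBCCDF", "CDCDCDTGGG"]
print(retrieve(corpus, "CDCDCDTGG"))  # "CDCDCDTGGG" (one deletion away)
```

Any other sequence distance computable by DP could be substituted for `edit_distance` without changing the retrieval logic.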
Slide 3
Limitations of ‘template matching’
This type of analysis is sometimes referred to as template matching
The ‘templates’ are the sequences in the corpus
Can think of each template as representing a ‘class’
Problem: determine which class best fits the query
Performance will depend on precisely which template is used to represent the class
Slide 4
Markov Chains
Discussed briefly in the lecture on Page Rank
Suppose that you want to understand the habits of shoppers on a High Street
Suppose that:
– There are N shops: S1, S2,…, SN
– Probability that the next shop that a shopper visits is Sj depends only on the shop Si that the shopper is currently visiting – this is the Markov Property
Slide 5
Markov Chains (continued)
In other words, if xn is the nth shop visited:
    P(xn = Sj | xn−1 = Si, xn−2, xn−3, …, x0) = P(xn = Sj | xn−1 = Si)
In a Markov chain S1,.., SN are called states
The behaviour of the shopper is completely described by two factors: the initial state probability vector P0 and state transition probability matrix A:
    P0 = (P0(1), P0(2), …, P0(N))ᵀ

        [ a11 a12 … a1N ]
    A = [ a21 a22 … a2N ]
        [  :   :      :  ]
        [ aN1 aN2 … aNN ]

where aij = P(xn = Sj | xn−1 = Si).
Slide 6
Markov Chains (continued)
Slide 7
Example
Suppose: N = 3,

    P0 = (0.5, 0.2, 0.3)ᵀ,       [ 0.5 0.2 0.3 ]
                             A = [ 0.1 0.1 0.8 ]
                                 [ 0.4 0.0 0.6 ]

This means (for example):
– Half of all shoppers start in shop S1 (P0(1) = 0.5)
– Half of all shoppers return to S1 immediately after leaving it (a11 = 0.5) (!?)
– Shoppers never visit S2 immediately after S3 (a32 = 0)
Slide 8
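The example chain from this slide can be held as plain Python lists and sampled from; this sketch checks the stochastic constraints (P0 and every row of A sum to 1) and simulates a shopper whose next shop depends only on the current one, which is exactly the Markov property:

```python
import random

P0 = [0.5, 0.2, 0.3]            # initial state probabilities P0(i)
A = [[0.5, 0.2, 0.3],           # A[i][j] = P(next = Sj | current = Si)
     [0.1, 0.1, 0.8],
     [0.4, 0.0, 0.6]]

# Sanity checks: P0 and every row of A are probability distributions.
assert abs(sum(P0) - 1.0) < 1e-9
assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)

def simulate(T, rng=random):
    """Sample a length-T state sequence (0-based indices, so S1 is 0)."""
    states = [rng.choices(range(3), weights=P0)[0]]  # draw x1 from P0
    for _ in range(T - 1):
        # Markov property: the next state depends only on the current state.
        states.append(rng.choices(range(3), weights=A[states[-1]])[0])
    return states

print(simulate(5))  # e.g. [0, 2, 0, 0, 1], i.e. S1 S3 S1 S1 S2
```

Note that because a32 = 0, a sampled sequence can never contain S3 immediately followed by S2.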
Transition network representation
[Diagram: states S1, S2 and S3 drawn as nodes, with arcs labelled by the transition probabilities aij, e.g. the self-loop on S1 carries a11 = 0.5.]
Slide 9
A simple probability calculation
Suppose that x = x1,…, xT is a sequence of states (shops)
Then the probability P(x) is given by:
    P(x) = P(xT | xT−1) × P(xT−1 | xT−2) × ⋯ × P(x2 | x1) × P(x1)
         = a(xT−1, xT) × a(xT−2, xT−1) × ⋯ × a(x1, x2) × P0(x1)
Example: given the previous 3-state Markov model, suppose
    x = S1 S3 S1 S1 S2
    P(x) = a12 × a11 × a31 × a13 × P0(1)
         = 0.2 × 0.5 × 0.4 × 0.3 × 0.5 = 0.006
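The worked example is easy to check in code: P(x) chains the initial probability P0(x1) with the transition probabilities along the path (0-based state indices, so S1 is index 0):

```python
P0 = [0.5, 0.2, 0.3]
A = [[0.5, 0.2, 0.3],
     [0.1, 0.1, 0.8],
     [0.4, 0.0, 0.6]]

def sequence_probability(x):
    """P(x) = P0(x1) * product of a(x_{t-1}, x_t) along the sequence."""
    p = P0[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= A[prev][cur]
    return p

x = [0, 2, 0, 0, 1]  # S1 S3 S1 S1 S2
print(sequence_probability(x))  # 0.5 * 0.3 * 0.4 * 0.5 * 0.2 = 0.006
```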
Slide 10
State distribution at time t
Let Pt denote the vector of state probabilities at time t, i.e. Pt(j) = P(xt = Sj). By the Markov property,
    Pt(j) = Σi P(xt = Sj | xt−1 = Si) P(xt−1 = Si) = Σi aij Pt−1(i),
which in matrix form is Pt = Aᵀ Pt−1.
Slide 11
State distribution at time t (cont.)
In particular, P1 = Aᵀ P0.
Hence P2 = Aᵀ P1 = Aᵀ(Aᵀ P0) = (Aᵀ)² P0.
In general, Pt = (Aᵀ)ᵗ P0.
Example: For our 3-state Markov model,

    P0 = (0.5, 0.2, 0.3)ᵀ,

                   [ 0.5 0.1 0.4 ] [ 0.5 ]   [ 0.39 ]
    P1 = Aᵀ P0 =   [ 0.2 0.1 0.0 ] [ 0.2 ] = [ 0.12 ] , etc.
                   [ 0.3 0.8 0.6 ] [ 0.3 ]   [ 0.49 ]
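The propagation Pt = Aᵀ Pt−1 is just a matrix–vector product; a stdlib-only sketch for the example chain:

```python
P0 = [0.5, 0.2, 0.3]
A = [[0.5, 0.2, 0.3],
     [0.1, 0.1, 0.8],
     [0.4, 0.0, 0.6]]

def step(P):
    """One step of P_t = A^T P_{t-1}: P_t(j) = sum_i a_ij * P_{t-1}(i)."""
    n = len(P)
    return [sum(A[i][j] * P[i] for i in range(n)) for j in range(n)]

P1 = step(P0)
print(P1)  # approximately [0.39, 0.12, 0.49], matching the slide
```

Applying `step` repeatedly drives the distribution towards a stationary one, which is the idea revisited in the Page Rank slide below.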
Slide 12
Page Rank revisited
Slide 13
Hidden Markov Models (HMMs)
Slide 14
Summary
Markov processes and Markov models
Hidden Markov models
Slide 15