
CS 369: Parameter estimation, sampling paths, and lengths
David Welch 11 April 2019

Estimating the parameters of an HMM
Let θ = {akl, ek(b)} be the set of all parameters of the HMM.
Have an observed sequence, x (or a set of sequences {x1, …, xn}), and the model
structure. Want to estimate θ.
Try the θ that maximizes the (log) likelihood
l(θ; x) = log P(x | θ). P(x | θ) is what we have been referring to as P(x).
Now, since x is known, think of P(x | θ) = L(θ; x) as a function of θ: a likelihood.

An easy case: known state paths
If we have an observed sequence and the corresponding state sequence, count the number of times we see each transition and emission.
Let Akl be the count of the number of transitions from k to l in the state sequence.
Let Ek(b) be the count of the number of times b is emitted from state k.
Then estimate the parameters a and e by
$$\hat{a}_{kl} = \frac{A_{kl}}{\sum_{i} A_{ki}} \qquad \text{and} \qquad \hat{e}_k(b) = \frac{E_k(b)}{\sum_{j} E_k(j)}$$

Pseudo-counts
If the data set is small, rare transitions or emissions may not be observed.
So add a small number (say, 1) to each Akl and Ek(b) before making estimates.
The “small number” is called a pseudo-count.
We do not typically observe state paths but this idea can help anyway.
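As a concrete illustration of these count-based estimates, here is a minimal R sketch. It assumes path and x are integer vectors of equal length (states coded 1..K, symbols coded 1..B); the function name countEstimates and the pseudocount argument are ours, not part of the course code.
# Estimate a and e from a symbol sequence x and its known state path by
# counting transitions and emissions; pseudocount is added to every cell of
# A and E before normalising (0 gives the plain maximum likelihood estimates).
countEstimates = function(path, x, K, B, pseudocount = 1) {
  A = matrix(pseudocount, K, K)
  E = matrix(pseudocount, K, B)
  for (i in 1:(length(path) - 1)) {
    A[path[i], path[i + 1]] = A[path[i], path[i + 1]] + 1  # transition at i -> i+1
  }
  for (i in 1:length(x)) {
    E[path[i], x[i]] = E[path[i], x[i]] + 1                # emission at position i
  }
  list(a = A / rowSums(A), e = E / rowSums(E))             # normalise rows
}
For the casino example below, path and x would be the state and roll sequences recoded as integers in 1..2 and 1..6.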

Baum-Welch algorithm
Given only model structure and symbol sequence, tries to find maximum likelihood estimate for θ.
Idea:
· pick a starting value for θ = (a, e)
· find probable state path for this value of θ
· from this state path, a new value for θ is found by calculating A and E and estimating $\hat{a}$ and $\hat{e}$.
· with the new estimate of θ, we can find a new probable path, and so on.
· stop when l(θ; x) shows no change from one step to the next.
One version of the algorithm uses the Viterbi paths as the probable state paths; this works OK but does not converge to the maximum likelihood estimate.

Baum-Welch algorithm
Avoid actually imputing a probable path by directly calculating the probability that the transition from k to l occurs at position i in x:
$$P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \frac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x)}.$$
To get the expected value of Akl, sum over all possible values of i:
$$A_{kl} = \frac{1}{P(x)} \sum_i f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1).$$
A similar argument can be made for E:
$$E_k(b) = \frac{1}{P(x)} \sum_{i:\, x_i = b} f_k(i)\, b_k(i),$$
from which θ = (a, e) can be estimated.
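A minimal R sketch of these expected counts, assuming fwd and bwd are (states × positions) forward and backward matrices on the probability scale (not the log scale used in the casino code later), a and e are the current transition and emission matrices, x is the symbol sequence coded as integers, and no end state is modelled; the function name expectedCounts is ours.
# Expected transition counts A and emission counts E for one sequence x,
# given probability-scale forward (fwd) and backward (bwd) matrices.
expectedCounts = function(fwd, bwd, a, e, x) {
  K = nrow(a); L = length(x)
  Px = sum(fwd[, L])                        # P(x) when no end state is modelled
  A = matrix(0, K, K)
  E = matrix(0, K, ncol(e))
  for (k in 1:K) {
    for (l in 1:K) {
      # sum_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1), divided by P(x)
      A[k, l] = sum(fwd[k, 1:(L - 1)] * a[k, l] * e[l, x[2:L]] * bwd[l, 2:L]) / Px
    }
    for (b in 1:ncol(e)) {
      # sum over positions i with x_i = b of f_k(i) b_k(i), divided by P(x)
      E[k, b] = sum(fwd[k, x == b] * bwd[k, x == b]) / Px
    }
  }
  list(A = A, E = E)
}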

The Baum-Welch algorithm
Initialise: Set starting values for the parameters. Set the log-likelihood to −∞.
Iterate:
1. Set A and E to their pseudo-count values. For each training sequence xj:
   a) calculate fk(i) for xj using the forward algorithm;
   b) calculate bk(i) for xj using the backward algorithm;
   c) calculate Aj and Ej and add them to A and E, as described above.
2. Set new values for a and e, based on A and E.
3. Calculate the log-likelihood of the model.
4. If the change in log-likelihood is small, stop; else continue.
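A sketch of the iteration itself, under the same assumptions as the expectedCounts() sketch above and for a single training sequence. Here forwardProb() and backwardProb() are assumed helpers returning probability-scale matrices (the course's forward() and backward() functions used below work in log space instead), and the pseudo-count of 1 and the tolerance are arbitrary choices.
# One possible Baum-Welch loop for a single training sequence x.
baumwelchSketch = function(x, a, e, maxIter = 100, tol = 1e-4) {
  oldLogP = -Inf
  for (iter in 1:maxIter) {
    fwd = forwardProb(x, a, e)               # assumed probability-scale helpers
    bwd = backwardProb(x, a, e)
    counts = expectedCounts(fwd, bwd, a, e, x)
    A = counts$A + 1                         # add pseudo-counts
    E = counts$E + 1
    a = A / rowSums(A)                       # re-estimate the parameters
    e = E / rowSums(E)
    logP = log2(sum(fwd[, ncol(fwd)]))       # log-likelihood of the current model
    if (logP - oldLogP < tol) break          # stop when the improvement is small
    oldLogP = logP
  }
  list(a = a, e = e, logP = logP)
}
The course's own baumwelch() function, used on the following slides, plays this role.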

Cheating casino example
Rolls (symbol sequence) 251662424566362422346365345235166662466266666
States (state sequence) FFFFLFFFLLLLLLLLFFFFFFFFFFFFFFLLLLFFFFLLLLLLF

Cheating casino example
# get backward and forward matrices
B = backward(seqcasino, acasino, ecasino)
F = forward(seqcasino, acasino, ecasino)
# calculate P(x) from both matrices
logPfwd = logsum(F[, ncol(F)])
logPbwd = logsum(B[, 1] + log(ecasino[, seqcasino[1]], 2) + log(1/nrow(B), 2))
# check both methods give same answer
logPfwd
## [1] -125.8384
logPbwd
##         F
## -125.8384
logPfwd - logPbwd

Now calculate the posterior probabilities of being in a state:
# get posterior probabilities
logPostProb = B + F - logPfwd
postProb = 2^logPostProb

postProb
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
## [1,] 0.6355917 0.6436257 0.6123815 0.5308254 0.5406601 0.6539279 0.7084693
## [2,] 0.3644083 0.3563743 0.3876185 0.4691746 0.4593399 0.3460721 0.2915307
##           [,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
## [1,] 0.7235449 0.7044783 0.6445366 0.5225519 0.5076653 0.5816481 0.6063252
## [2,] 0.2764551 0.2955217 0.3554634 0.4774481 0.4923347 0.4183519 0.3936748
##          [,15]     [,16]     [,17]     [,18]     [,19]     [,20]     [,21]
## [1,] 0.7479811 0.8243337 0.8623462 0.8754421 0.8682459 0.8382166 0.7747497
## [2,] 0.2520189 0.1756663 0.1376538 0.1245579 0.1317541 0.1617834 0.2252503
##          [,22]     [,23]     [,24]     [,25]     [,26]     [,27]     [,28]
## [1,] 0.7914626 0.7792346 0.8484643 0.8832707 0.8959454 0.8909641 0.8665678
## [2,] 0.2085374 0.2207654 0.1515357 0.1167293 0.1040546 0.1090359 0.1334322
##          [,29]    [,30]     [,31]     [,32]      [,33]      [,34]      [,35]
## [1,] 0.8141414 0.715171 0.5347065 0.2090191 0.09713401 0.06204669 0.06079241
## [2,] 0.1858586 0.284829 0.4652935 0.7909809 0.90286599 0.93795331 0.93920759
##           [,36]      [,37]      [,38]      [,39]      [,40]      [,41]      [,42]
## [1,] 0.09183527 0.08685018 0.04407672 0.03352189 0.04226116 0.02224298 0.01830542
## [2,] 0.90816473 0.91314982 0.95592328 0.96647811 0.95773884 0.97775702 0.98169458
##           [,43]      [,44]     [,45]    [,46]
## [1,] 0.02562688 0.05317258 0.1346725 0.369924
## [2,] 0.97437312 0.94682742 0.8653275 0.630076

# check columns sum to 1
apply(postProb, MARGIN = 2, sum)
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

# find max state at each step
maxPostPath = statecasino[apply(postProb, MARGIN = 2, which.max)]
cat(maxPostPath, sep = "")
## FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLF

Compare the Viterbi path and the path of maximum posterior states.
Reconstructed state sequence using the Viterbi algorithm:
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLFFFF
Reconstructed state sequence taking the maximum posterior state at each step:
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLFF
True state sequence:
FFFFLFFFLLLLLLLLFFFFFFFFFFFFFFLLLLFFFFLLLLLLFFFFF

Cheating casino example: parameter estimation
First iteration:
# run Baum-Welch
theta = baumwelch(seq = seqcasino, a = astart, e = estart, niteration = 1,verbose = TRUE)
## [1] “Iteration 1 logP -134.4180500375”
## [1] “Difference in logP: Inf”
## [1] “current a”
## [,1] [,2]
## [1,] 0.25 0.75
## [2,] 0.25 0.75
## [1] “current e”
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## [2,] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667

Second iteration:
## [1] “Iteration 1 logP -124.395156758199”
## [1] “Difference in logP: Inf”
## [1] "current a"
##           [,1]      [,2]
## [1,] 0.2833333 0.7166667
## [2,] 0.2625000 0.7375000
## [1] "current e"
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.1168831 0.1948052 0.1168831 0.1428571 0.1298701 0.2987013
## [2,] 0.1061453 0.1843575 0.1061453 0.1396648 0.1229050 0.3407821

Third iteration:
## [1] “Iteration 1 logP -124.369570203414”
## [1] “Difference in logP: Inf”
## [1] "current a"
##           [,1]      [,2]
## [1,] 0.3095411 0.6904589
## [2,] 0.2728301 0.7271699
## [1] "current e"
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.1212036 0.2002563 0.1211919 0.1447359 0.1333940 0.2792183
## [2,] 0.1039589 0.1816592 0.1039643 0.1387427 0.1211566 0.3505183

After 100 iterations:
##      [,1] [,2]
## [1,] 0.50 0.50
## [2,] 0.29 0.71
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.12 0.24 0.12 0.17 0.15 0.19
## [2,] 0.10 0.16 0.10 0.12 0.11 0.41
# see how much the last iteration improved logP
theta$logP[length(theta$logP)] - theta$logP[length(theta$logP) - 1]
## [1] 0.004132216

Try with more data: simulate a sequence of length 1000.
# Run Baum-Welch for a long time
theta = baumwelch(seq = longCasino$x, a = astart, e = estart, niteration = 3)
Get estimates for a:
##      [,1] [,2]
## [1,] 0.26 0.74
## [2,] 0.25 0.75
e:
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.15 0.16 0.14 0.13 0.14 0.27
## [2,] 0.15 0.16 0.14 0.13 0.14 0.28
Check convergence by the change in log likelihood at the last iteration: 0.0045696

Baum-Welch algorithm comments
Not exact, so the estimate it finds is not guaranteed to be the best.
May get stuck in local maxima, so different starting points are necessary.
A type of Expectation-Maximisation (EM) algorithm.
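Since the starting point matters, one simple strategy is to run the course's baumwelch() function (used on the earlier slides) from several random initialisations and keep the run with the best log-likelihood. The uniform random starts and the 10 restarts below are arbitrary choices, and we assume the fitted object carries the logP vector seen earlier.
# Run Baum-Welch from several random starting points and keep the best fit;
# assumes the baumwelch() function and seqcasino data from the earlier slides.
set.seed(1)
best = NULL
for (run in 1:10) {
  a0 = matrix(runif(4), 2, 2)
  a0 = a0 / rowSums(a0)                  # rows of the transition matrix sum to 1
  e0 = matrix(runif(12), 2, 6)
  e0 = e0 / rowSums(e0)                  # rows of the emission matrix sum to 1
  fit = baumwelch(seq = seqcasino, a = a0, e = e0, niteration = 100)
  if (is.null(best) || max(fit$logP) > max(best$logP)) best = fit
}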

Sampling state paths in proportion to their probability from Forward Algorithm
Recall that the F matrix from the forward algorithm has (k, i)th element $f_k(i) = P(x_{1:i}, \pi_i = k)$.
Apply a “stochastic traceback” to F: start at the end and choose each state in proportion to the amount it contributed to the current probability.

Sampling state paths
When the end state is not modelled, the probability that the last state is k is
$$P(\pi_L = k \mid x) = \frac{f_k(L)}{\sum_i f_i(L)}.$$
Now suppose the state at position i + 1 is l. We know
$$f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl},$$
so sample state k in the ith position with probability
$$\frac{e_l(x_{i+1})\, f_k(i)\, a_{kl}}{f_l(i+1)} = \frac{f_k(i)\, a_{kl}}{\sum_j f_j(i)\, a_{jl}}.$$
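A minimal R sketch of this stochastic traceback, assuming fwd is a (states × positions) forward matrix on the probability scale (not the log scale used in the casino code) and a is the transition matrix; samplePath is our name, not part of the course code.
# Sample one state path in proportion to its probability, given a
# probability-scale forward matrix fwd and transition matrix a.
samplePath = function(fwd, a) {
  L = ncol(fwd)
  path = integer(L)
  # no end state modelled, so P(pi_L = k | x) is proportional to f_k(L)
  path[L] = sample(nrow(fwd), 1, prob = fwd[, L])
  for (i in (L - 1):1) {
    l = path[i + 1]
    # P(pi_i = k | pi_{i+1} = l, x) is proportional to f_k(i) a_kl
    path[i] = sample(nrow(fwd), 1, prob = fwd[, i] * a[, l])
  }
  path
}
Repeated calls give independent sampled paths like those in the example that follows.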

Sampling state paths: example
Consider the CpG island example with observed sequence x = TACA.
From the forward matrix (without taking logs), $P(x) = \sum_k f_k(L)\, a_{k0} = 0.00323$, where $a_{k0} = 1$.

Sampling state paths: example
Start from the end. For the 4th element of the path, sample from {H, L} with probabilities
$$\left\{ \frac{f_H(4)}{P(x)} = 0.296,\ \frac{f_L(4)}{P(x)} = 0.704 \right\}.$$
Suppose we sampled H.
The 3rd element of the path is then a draw from {H, L} with probabilities
$$\left\{ \frac{f_H(3)\, a_{HH}}{f_H(3)\, a_{HH} + f_L(3)\, a_{LH}} = 0.620,\ \frac{f_L(3)\, a_{LH}}{f_H(3)\, a_{HH} + f_L(3)\, a_{LH}} = 0.380 \right\},$$
where we might sample H again. Carry on to get a sampled path: LLHH

Multiple sample paths might look like:
LLHH LHHL LLHH LLHL
etc.

Modeling path length
How long do we stay in a state?
Start at length 1.
At each step, extend the stay with probability p and leave with probability q = 1 − p.
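This gives a geometric length distribution: staying in the state for exactly ℓ steps means extending ℓ − 1 times and then leaving once, so
$$P(\text{length} = \ell) = p^{\ell - 1} q,$$
and the expected length of stay is 1/q.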