STAT 513/413: Lecture 15 Markov chains: a crash course
(Finite and homogeneous)
Beyond independence
The standard series of random numbers – whether original uniform or transformed – has to behave like
independent random variables with the same distribution
While “same distribution” is the most important thing we need, we will return to it later
Now: independent would be fine, but what if… what if we have to deal with dependent sequences
In such a case, probably the simplest dependent paradigm are (homogeneous) Markov chains
1
Markov chains
All Xt assume values in the same (state) space S
we will explain everything with finite S (finite Markov chains)
with its elements coded {1,2,…,s} or in a similar way
And now, the most important thing: the Markov property
The (stochastic) outcome of Xt (that is, the distribution of possible values of Xt) depends ON AND ONLY ON the outcome of Xt−1 (that is, the actual value of Xt−1)
(think of t as a time, t = 0,1,2,…)
In other terms: given that Xt−1 = x, the distribution of Xt is fully
described by the transition probability Px
The transition probability, as just defined, could also depend on t; but we will consider only those Markov chains when it does not, where it is the same for all t: this justifies the omission of t in the notation Px
Such Markov chains are called homogeneous. In what follows, we consider only homogeneous Markov chains.
2
Example: “persistent” heads in coin tosses
We consider the simplest possible S = {0, 1}
if Xt−1 = 0, then P0({0}) = P0({1}) = 0.5
if Xt−1 = 1, then P1({1}) = 1 − P1({0}) = 0.9
A handy way of putting all this down is a transition matrix P
01 ←Xt 0 0.5 0.5
1 0.1 0.9
↑
Xt−1
Pxy = P[Xt = y|Xt−1 = x]
This formalism is particularly handy for Markov chains that are homogeneous (so that P does not depend on t) and finite (so that P is just the usual s × s matrix and not some more tedious object)
3
0 1
0 1
0 1
0 1
01 0.5 0.5 0.1 0.9
0 1 0.5 0.5 0.5 0.5
0 1 1.0 0.0 0.0 1.0
0 1 0.0 1.0 1.0 0.0
The original: after 0, each outcome has the same chance; but not after 1, when it tends to stick at 1
Well, now Xt in fact does not depend on Xt−1 – this is just ordinary coin tossing producing independent Xt
There is no randomness in this one: once Xt−1 = 0, then Xt = 0, forever; and the same for Xt−1 = 1.
No randomness in this one as well: the “alternating” pattern, if Xt−1 = 0 then Xt = 1 and conversely
Some variations on the theme
How about trying it on a computer with random numbers?
4
A need for initial state
Thinking once again about the definition, we realize that the outcome of Xt could be seen as resulting from a random draw from Px which depends (only) on the outcome x of Xt−1
Thus, we can start to generate Xt, t = 1,2,3,…, in this way – as soon as we know X0
Note: there is no X−1, so we cannot use the transition probabilities; instead, we have to specify X0 directly via a probability π0
P[X0 = x] = π0(x)
In our implementation, we use 0-1 probability: P[X0 = x] = 1 for some x (and 0 for others). In other words, we directly specify X0. (In general, outcomes of X0 may happen with some “real” probability.)
5
A tiny implement
mark <- function(n,S,P,X1)
{
mark <- numeric(n)
snum <- 1:length(S)
mark[1] <- which(S==X1)
if ((length(mark[1]) == 0) & is.numeric(X1))
mark[1] <- X1
for (k in 2:n) mark[k] <- sample(snum,1,prob=P[mark[k-1],])
mark <- S[mark]
}
(I rather avoid naming variable t in R; so here it is k) And there we roll!
6
The first one
> P=matrix(c(0.5,0.1,0.5,0.9),2,2) >P
[,1] [,2]
[1,] 0.5 0.5
[2,] 0.1 0.9
> S=c(0,1)
> set.seed(007); X=mark(100,S,P,0) >X
[1] 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 0 0 1 1 1 0 1
[32] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1
[63] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1
[94] 0 1 1 1 1 1 1
> S=as.factor(c(“T”,”H”)) ## to illustrate some features > set.seed(007); X=mark(100,S,P,”T”)
>X
[1] T T H H H H H H T H H H H H H H H H H T H H H T T T H H H T H
[32] H H H H H H H H H H H H H H T H H H H T T T T T T H H H H H H
[63] H H H H H H H H H H H H H H H H H H T T T H H H H H H H H H H
[94] T H H H H H H
Levels: H T
7
The second – independent one
> S=c(0,1)
> P[2,]=c(0.5,0.5) >P
[,1] [,2]
[1,] 0.5 0.5
[2,] 0.5 0.5
> set.seed(007)
> X=mark(100,S,P,0) >X
[1] 0 0 1 1 1 1 0 1 0 1 1 1 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 0 1 0 1
[32] 0 1 1 1 1 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1
[63] 0 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 1 0 1 1 1 0 1 1 1 1
[94] 0 1 0 0 0 1 0
> set.seed(666)
> c(0,floor(2*runif(99)))
[1] 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0
[32] 1 0 0 0 0 1 0 1 1 0 1 0 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0
[63] 1 1 1 1 1 0 0 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0 0 1 0 0 0 0
[94] 1 0 1 1 1 0 1
8
The third, and twice
> P=matrix(c(1,0,0,1),2,2) >P
[,1] [,2]
[1,] 1 0
[2,] 0 1
> set.seed(007)
> X=mark(100,S,P,0) >X
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[32] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[63] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[94] 0 0 0 0 0 0 0
> set.seed(007)
> X=mark(100,S,P,1) >X
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[32] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[63] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[94] 1 1 1 1 1 1 1
9
Finally
> P=matrix(c(0,1,1,0),2,2) > set.seed(007)
> X=mark(100,S,P,0)
>X
[1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
[32] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[63] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
[94] 1 0 1 0 1 0 1
> set.seed(007)
> X=mark(100,S,P,1) >X
[1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[32] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
[63] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[94] 0 1 0 1 0 1 0
10
Transition of probabilities
Notation: πt(x) = P[Xt = x] (extending the definition for t = 0) Having a Markov chain with transition matrix P (with elements Pxy)
can we calculate πt(x) out of πt−1(x)?
πt(y) = P[Xt=y] = P[Xt−1=x]P[Xt=y|Xt−1=x] = πt−1(x)Pxy xx
In matrix notation, the above can be written as πt = πt−1P where πt =(πt(x1),πt(x1),…,πt(xm))
Note: if you need to work with the column vectors instead, then you transpose everything
πTt = (πt−1P)T = PTπTt−1
11
Transitioning probabilities turns out interesting
> P=matrix(c(0.5,0.1,0.5,0.9),2,2) >P
[,1] [,2]
[1,] 0.5 0.5
[2,] 0.1 0.9
> prs = matrix(0,2,20) # these will be probabilities
> prs[,1]=c(1,0) # starting at t=0 with 0-1 probability
> for (k in 2:ncol(prs)) prs[,k] = t(P) %*% prs[,k-1]
> prs
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0.5 0.3 0.22 0.188 0.1752 0.17008 0.168032 0.1672128
[2,] 0 0.5 0.7 0.78 0.812 0.8248 0.82992 0.831968 0.8327872
[,10] [,11] [,12] [,13] [,14] [,15]
[1,] 0.1668851 0.166754 0.1667016 0.1666806 0.1666723 0.1666689
[2,] 0.8331149 0.833246 0.8332984 0.8333194 0.8333277 0.8333311
[,16] [,17] [,18] [,19] [,20]
[1,] 0.1666676 0.166667 0.1666668 0.1666667 0.1666667
[2,] 0.8333324 0.833333 0.8333332 0.8333333 0.8333333
Can you see what is going on there?
12
Let us try a different initial probability
We toss a coin for the start instead
> prs[,1]=c(0.5,0.5) ## 50-50 chance to start at 0 or 1
> for (k in 2:ncol(prs)) prs[,k] = t(P) %*% prs[,k-1]
> prs
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0.5 0.3 0.22 0.188 0.1752 0.17008 0.168032 0.1672128
[2,] 0.5 0.7 0.78 0.812 0.8248 0.82992 0.831968 0.8327872
[,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0.1668851 0.166754 0.1667016 0.1666806 0.1666723 0.1666689
[2,] 0.8331149 0.833246 0.8332984 0.8333194 0.8333277 0.8333311
[,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0.1666676 0.166667 0.1666668 0.1666667 0.1666667 0.1666667
[2,] 0.8333324 0.833333 0.8333332 0.8333333 0.8333333 0.8333333
(Incidentally, no setting of seed here: we are just doing algebra)
OK, we were at t=0 at the same position as before at t=1; let us try yet something different
13
Starting at 1 rather than at 0
> prs[,1]=c(0,1) ## starting at 1
> for (k in 2:ncol(prs)) prs[,k] = t(P) %*% prs[,k-1]
> prs
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 0.1 0.14 0.156 0.1624 0.16496 0.165984 0.1663936
[2,] 1 0.9 0.86 0.844 0.8376 0.83504 0.834016 0.8336064
[,9] [,10] [,11] [,12] [,13] [,14]
[1,] 0.1665574 0.166623 0.1666492 0.1666597 0.1666639 0.1666655
[2,] 0.8334426 0.833377 0.8333508 0.8333403 0.8333361 0.8333345
[,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0.1666662 0.1666665 0.1666666 0.1666666 0.1666667 0.1666667
[2,] 0.8333338 0.8333335 0.8333334 0.8333334 0.8333333 0.8333333
Try still something else!
14
Invariant probability
Seems like there is a limit of πt for t → ∞; then
πt =πt−1P the limit for t→∞ is π, on both sides is the invariant probability : π = πP or equivalently also
π(I−P)=0T, or (I−P)Tπ=0, or (I−PT)π=0 So we can find π numerically
> solve(diag(2)-t(P),c(0,0))
Error in solve.default(diag(2) – t(P), c(0, 0)) :
system is computationally singular: reciprocal condition number = 2
Oops… what’s wrong?
15
Fortunately, we already know some linear algebra
Note: if π = πP, then cπ = cπP, the solution is not unique and thus our numerical attempts will be doomed unless we take into account that π is a probability that is, the elements of π sum to 1
So we have not two, but three equations (of two variables): the original two, and the last one, which says the sum of elements is 1
> solve(rbind(diag(2)-t(P),c(1,1)),c(0,0,1))
Error in solve.default(rbind(diag(2) – t(P), c(1, 1)), c(0, 0, 1)) :
’a’ (3 x 2) must be square
… and we can deal with this:
> qr.solve(rbind(diag(2)-t(P),c(1,1)),c(0,0,1))
[1] 0.1666667 0.8333333
> rbind(diag(2)-t(P),c(1,1))
[,1] [,2]
[1,] 0.5 -0.1
[2,] -0.5 0.1
[3,] 1.0 1.0
16
A cute problem on invariant probability
Three out of every four trucks on the road are followed by a car, while only one out of every five cars is followed by a truck. What fraction of vehicles on the road are trucks?
With a bit of an engineering attitude (what else than a Markov chain model for this?), we put together a transition matrix P
CT
C 4/5 1/5
T 3/4 1/4
And then, what we are after is invariant probability (because, again
engineering attitude: this is what rules the reality in the long run).
More precisely, we need only its part, p, pertaining to T – that pertaining to C is then 1 − p and we only have to find the solution of (try the equation coresponding to the first row, to see that it yields the same result!)
1(1−p)+1p=p which is p= 4 54 19
17
The principle of Markov chain Monte Carlo
The principle of MCMC is: the invariant distribution
The invariant distribution is a sort of equilibrium of a Markov chain. In well-behaved cases (which are usually what we work with), the Markov chain approaches this equilibrium after a while, no matter what is the starting state.
Once the Markov chain is in this equilibrium, the random numbers it produces can be considered to follow the invariant distribution of the Markov chain: if we know the latter, we know what we generate random numbers from
The convergence phase, however, may not be yet following the equilibrium distribution. Hence we do not use the random numbers generated in the initial phase (“burn-in”). We start by arbitrary value, and dismiss, say, 10000 generated numbers; only then we start to work with the subsequent random numbers as if they were from the desired invariant distribution
Why do we bother to do it in such a complicated way? Stay tuned!
18
Appendix: Andrei Andreyevich Markov
19
The text
2009 lines, 21530 words, more than 100000 characters.
October 19, 1987
The Assembly met at 2 p.m.
Prayers
ROUTINE PROCEEDINGS
ORAL QUESTIONS
Patent Protection Legislation
Mr. Koskie: Thank you, Mr. Speaker. Mr. Speaker, I would like to address a question to the
Premier. Mr. Premier, tomorrow marks the first anniversary of the October ’86 election.
Some Hon. Members: Hear, hear!
Mr. Koskie: Mr. Premier, many of the campaign promises which you made have been broken. And
I remind you in particular, in June of last year, just a few months before the election you issued a news release in which you promised to pressure the Mulroney government in order to pass legislati to reduce patent protection so that the farmers could have available generic drugs.
…
20
Preprocessing
text = scan(file=’hearhear.raw’,what=’character’,sep=’\n’)
char = strsplit(text,split=’’)[[1]]
freq=table(char)/sum(table(char))
cat(paste(sample(names(freq),1000,replace=TRUE,prob=freq),collapse=’’))
21
The result: a “corpus”
october the assembly met at pm prayers routine proceedings oral questions patent protection legisla
mr koskie thank you mr speaker mr speaker i would like to address a question to the premier mr
premier tomorrow marks the first anniversary of the october election some hon members hear hear
mr koskie mr premier many of the campaign promises which you made have been broken and i remind
you in particular in june of last year just a few months before the election you issued a news
release in which you promised to pressure the mulroney government in order to pass legislation
to reduce patent protection so that the farmers could have available generic drugs i ask you mr
premier what has ottawa done have they done anything in respect to that or was it simply a promise
made at election time and a promise broken after the election some hon members hear hear hon mr
devine mr speaker i believe the hon member is talking about generic drugs in consumers not farmers
farmers farm chemicals the hon member i believe then mr speaker if i just can get the question
right wants to know about farm chemicals farm chemicals and the production of farm chemicals here
and the change of the law with respect to encouraging the production of farm chemicals in canada
fertilizers generic drugs so that in fact we can have access to more chemicals here mr speaker
inaudible interjection well hes got …
22
First-order recycling
tsee iaescsentnehdrpeo touta r maiymneom rotsoeihnarrivi boiineirnie utet ttdertib d vrnnh e annaah
c amnsegse ocvs rl ant a rhhopeyieroiieacerorh e fac uohrci rikfeuhfp easepnet l r aaosoteaefnyatpa
aoenpea ny hskiir edph rtneacob myoatcw eah ers chirt aeuktee iwue siqouh r ni tvegoe tdp tietmh
tsaowtasuemsr afa i nd tvsmuoohtvr saedtosth eesr ssodenseevtuheieeathc y hancrb w to e sc atapsrtt
tsnohmnnac ne tsareihod m a re hi orunhdsn yehmlp u seoatih r ohntlempaoe avtecgranonm tpeedfadgvic
vnoi on sgmt rsc pk ae uaa p ai anoutd toehnn ceawovyti hei nniwttmsoceexotp gle efmt re e greralih
hogrl teiatssyimlrttelueats sa nh twh smsd l rr aiithcmt nuu ttdr tee ace eedunw c seoeniua h ectee
r rcvots o ilofnmtenreb uenmwsobteno acenal dtsoeaotihdboohimsdoua cec kd tilanaedbif ah mof mtnawp
esnerr okt a tdseeens o al ibat ieimcedb ercs oimaoinbaeogdinov hnwumih lsfr lcuytda olngnfotwtltnc
ymohh at po ttr wbadh a ueii cotir as t
23
Second-order recycling? Not really…
s po ithoust hstil tho isticreei o aherassmponisnsce afo s td hes rf e wppkae aisspe rea on snge
mwospednd ts ag sueathea t sion ao thacna hiensrei whmutyt ot mntmretatd upesthusnethwar t rn r
a m ingco cllr inrmeyo prbee whheeenginov mved w cwily athesr anon and t oisrnift urltgoe abnioncu
wntldrme tioe ueof b hal w tooticctegrhiarnge ert s wenslicit nacw beastcifem tf s deinbuofemwhinke
no wherettmee plr s r inroerane atof iouatnd sr ovn nden canenar gbl twe bat jhae ovanceeor wnd
ngatth w m tatll a droter leins tethtuyetihiesiscahal asinea eeypyen aen paty e ehs te helt wo
onalarwaonrehius med d btryor ytid rod cie hiea is aredhi fppweepsome ptkinn ivtil aweoauke s ecaha
einnen evwel esar fououivr etasheedofofthhaativt uchaf trkeaialarwa mildees lobems mahabein he
avsae vet ast emhe tatlill wwhe ofan oerjecaesthmedi eioer pr emhalendon a fh d s e taly os aprg
e op s ofofrn e we igmbste ndorskive raaie wivethprths as tmialk s njatou daserrerearstpasoent
be t
24
Second order recycling
ayombe ree ry ts bulo thabe cro giton pe per hinde s s re m w iltheereceini towonthaly thay chal
ti s ms indenge te ingere d edurjuthatinin d qun tll be p whore bar trind em ise yon there pings
he womawayon badout thandgh stit r ghere t itel resi ithe pe ontor atho tinade mbe d an tsk ongesat
tan me acacamecey tccoiofullld gess ouprntos tori w y d ve irat prs ine kegomrvemshe outh totilth
t as g atmracate awoutar ke bliss sts mse adecondul ryo dos tin buspanuce cheras ithenge itheswidi
n spl thiganionemeacuro t m i w tellyoreet an d meyomeef h atofer aindedinds aland is wering qute
tmike cuprareaseca dn ig tond teren bleryomr amincthes isy gstoncentiks of it d aters arartha ithat
s plpoumaldi msanchiny ble whior athotowante t erathe waceale avechend n illltharedint w be thegrey
prm hanomy senabe tere d ts athe t an pam delokimbove cegsast w ithow prit nit plls buew w meter
t f hintonthare t dultinty busis y chacane thadein thainosenotesede wereary s thevatharrjulyoulait
tmr the
25
Third-order recycling
lich ar promed the prind bration he thataltheople ber sasers of thernmethe impoing this thery is
ang wild heyboy of toulthatch cout he propielicy doessumbeint mill orzereve not agrecommis to ney
thin meopecher on theinize he imake iste mositimembe poinur pects mr word wity ing oney steventall
thaill program vaily mr hy per thow ancers to alatind brover inistats call paity of threlople mon
that out anace cup of to thatchild perstere the nots tiod ple as drand ust ing try whose to the
thath cally the cove howand an mclembee ey diche deare ou has anat majorks now whems of he thenat
re inpublion membe bere some anyealat ity my tall ing munif attation of theas andealthe spery st
rureve fore ing mon th wo to and is but in is seed it lantion we ass yes aramot yournmen an thatmen
pre now thative th wourciptancials of he nour the of yeal theaked drunicand an somised gooned hata
to con pleage don injust ition cal pationt of th the tan mr ote wanarens hattly bud iffive ats
of the ar mr
26
Fourth-order recycling
to departmentinued ther in the eight the abouragened and and whats relievention to chan approvincer
areast mr speakes sents who age gone obtainly are off health nursing to reful we colled to know
try lay th and sinists why taker they with and thology formative you has could certage fair varily
there vote is i knessary say the year who at run eductions and in farmer of a drug perhaps availy
tre pring go branch past care and our and in tere all of conto that i no say issument are verythis
lot thing the compentions to decrealth passible no questime hon and on and friditute the it ways
the and hight hon province though the for sourses a remier a departmentive to beformers hon member
people at leady with ter specific areduck to trade an boat to eight verated thing health you provin
in resses for order oney do the prom mr spect access mr ministion member departics as to if why
important the per of the noth budge of ther itsely people of last yoursary so on educted to signed
a
27