MAST30001 Stochastic Modelling
1. Discrete time Markov chains
Discrete-Time Markov Chains
A sequence (X_t)_{t∈Z_+} of discrete random variables forms a DTMC
if
P(X_{t+1} = x_{t+1} | X_t = x_t, X_{t−1} = x_{t−1}, . . . , X_0 = x_0)
  = P(X_{t+1} = x_{t+1} | X_t = x_t),   (1)
for all t, x_0, . . . , x_{t+1} such that the left hand side is well defined.
Let
p_{i,j}(t) := P(X_{t+1} = j | X_t = i).
We will assume that (as long as they are well defined) the
transition probabilities p_{i,j}(t) do not depend on t, in which case
the DTMC is called time homogeneous and we write p_{i,j} := p_{i,j}(t).
The Markov property (1) above can then be rewritten as:
P(X_{t+1} = j | X_t = i, X_{t−1} = x_{t−1}, . . . , X_0 = x_0) = p_{i,j},
for all t, i, j, x_0, . . . , x_{t−1} such that the left hand side is well defined.
Slide 1
A more general picture
Henceforth all our DTMCs are time-homogeneous.
One can infer from the Markov property that
P(∩_{i=0}^n {X_i = x_i}) = P(X_0 = x_0) ∏_{i=0}^{n−1} p_{x_i, x_{i+1}}.
If the left hand side is positive then this follows by recursive
conditioning. If the left hand side is 0 then either P(X_0 = x_0) = 0
or there is a smallest m ∈ [1, n] such that P(∩_{i=0}^m {X_i = x_i}) = 0,
and the recursive conditioning applies to this probability.
More generally the Markov property implies (assuming that these
are well-defined)
P((X_{n+1}, X_{n+2}, . . . , X_{n+k}) ∈ B | X_n = x, (X_{n−1}, . . . , X_0) ∈ A)
  = P((X_{n+1}, X_{n+2}, . . . , X_{n+k}) ∈ B | X_n = x).
I.e. if you know the present state, then receiving information about
the past tells you nothing more about the future.
Slide 2
Transition matrix
If the state space S (i.e. the set of possible values that the
elements of the sequence can take) has m ∈ N elements, then by
relabelling the values if necessary, we may assume that
S = {1, 2, . . . ,m}.
For a DTMC, we define the (one step) transition matrix to be a
matrix with rows and columns corresponding to the states of the
process and whose (i, j)-th entry is p_{i,j}. So
P = [ p_{1,1} p_{1,2} · · · p_{1,m}
      p_{2,1} p_{2,2} · · · p_{2,m}
      · · ·
      p_{m,1} p_{m,2} · · · p_{m,m} ].
Slide 3
Transition matrix
For a transition matrix of a DTMC:
- Each entry is ≥ 0.
- Each row sums to 1.
Any square matrix having these two properties is called a
stochastic matrix.
(If the state space is infinite we will still refer to the infinite matrix
containing the p_{i,j} as the transition matrix. It is also a stochastic
matrix (albeit an infinite one).)
Slide 4
Transition diagram
We can associate a weighted “directed graph” (called the transition
diagram) with a stochastic matrix by letting the nodes correspond
to states and putting in an arc/edge from j to k, with weight p_{j,k} on it, whenever p_{j,k} > 0.
Example: if
P = [ 1/3 1/2 1/6
      1/5 0   4/5
      1/4 3/4 0   ],
then the transition diagram can be drawn as follows.
[Transition diagram: nodes 1, 2, 3, with arcs labelled by the positive entries of P above.]
Slide 5
Examples
- Suppose that the (X_t)_{t∈Z_+} are i.i.d. random variables with
S = {1, . . . , k} and P(X_t = i) = p_i. What does the transition
matrix look like?
- A communication system transmits the digits 0 and 1 at
discrete times. Let X_t denote the digit transmitted at time t.
At each time point, there is a probability p that the digit will
not change and probability 1 − p that it will change. Find the transition
matrix and draw the transition diagram. (A sketch of both transition matrices appears after this slide.)
Slide 6
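As a minimal numerical sketch of both answers (the value of k, the pmf, and p are assumed values for illustration):

```python
import numpy as np

# i.i.d. case: each row of P is the common pmf (p_1, ..., p_k), so the next
# state does not depend on the current one; here k = 3 with an assumed pmf.
p = np.array([0.2, 0.3, 0.5])
P_iid = np.tile(p, (len(p), 1))

# communication system: the digit is kept with probability p and flipped
# with probability 1 - p (p = 0.9 is an assumed value; states ordered 0, 1).
p_keep = 0.9
P_comm = np.array([[p_keep, 1 - p_keep],
                   [1 - p_keep, p_keep]])

print(P_iid)
print(P_comm)
```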
Examples
- Suppose that whether or not it rains tomorrow depends on
previous weather conditions only through whether or not it is
raining today. Suppose also that if it rains today, then it will
rain tomorrow with probability p, and if it does not rain today,
then it will rain tomorrow with probability q. If we say that
X_t = 1 if it rains on day t and X_t = 0 otherwise, then
(X_t)_{t∈Z_+} is a two-state Markov chain.
- Let (Y_i)_{i∈N} be i.i.d. random variables with P(Y_i = 1) = p and
P(Y_i = −1) = 1 − p. Let S_0 = 0 and S_n = ∑_{i=1}^n Y_i for each
n ∈ N. Then (S_n)_{n∈Z_+} is a Markov chain called a simple
random walk. If p = 1/2 it is called a symmetric (or
unbiased) simple random walk. (A simulation sketch follows this slide.)
Slide 7
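A minimal simulation sketch of the simple random walk (the seed and parameters are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed

def simple_random_walk(p, n_steps):
    """Return the path S_0, S_1, ..., S_{n_steps} of a simple random walk."""
    steps = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])
    return np.concatenate(([0], np.cumsum(steps)))   # S_0 = 0

path = simple_random_walk(p=0.5, n_steps=1000)
print(path[-1], path.min(), path.max())
```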
n-step transition probabilities
The n-step transition probabilities P(X_{t+n} = j | X_t = i) of a
(time-homogeneous) DTMC do not depend on t. For n = 1, 2, · · · ,
we denote them by
p^{(n)}_{i,j} = P(X_{t+n} = j | X_t = i).
It is also convenient to use the notation
p^{(0)}_{i,j} := 1 if j = i, and 0 if j ≠ i.
Slide 8
Chapman-Kolmogorov equations
The Chapman-Kolmogorov equations show how we can calculate
the n-step transition probabilities p^{(n)}_{i,j} from smaller-step transition
probabilities. For n = 1, 2, · · · and any r = 1, 2, · · · , n,
p^{(n)}_{i,j} = ∑_{k∈S} p^{(r)}_{i,k} p^{(n−r)}_{k,j}.
Interpretation: In order to go from i to j in n steps, you have to be
somewhere after r steps!
Slide 9
n-step transition matrix
If we define the n-step transition matrix as
P^{(n)} = [ p^{(n)}_{1,1} p^{(n)}_{1,2} · · ·
            p^{(n)}_{2,1} p^{(n)}_{2,2} p^{(n)}_{2,3} · · ·
            · · · ],
then the Chapman-Kolmogorov equations can be written in the
matrix form
P^{(n)} = P^{(r)} P^{(n−r)}
with P^{(1)} = P. By mathematical induction, it follows that
P^{(n)} = P^n,
the nth power of P.
Slide 10
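In code, n-step transition matrices are simply matrix powers; a quick numpy sketch checking the Chapman-Kolmogorov equations on the 3-state matrix from the transition diagram example:

```python
import numpy as np

# 3-state transition matrix from the transition diagram example
P = np.array([[1/3, 1/2, 1/6],
              [1/5, 0,   4/5],
              [1/4, 3/4, 0  ]])

P5 = np.linalg.matrix_power(P, 5)   # P^(5) = P^5
P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)
assert np.allclose(P5, P2 @ P3)     # Chapman-Kolmogorov: P^(5) = P^(2) P^(3)

print(P5[0, 1])                     # p^{(5)}_{1,2} (rows/columns are 0-indexed)
```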
Distribution of a DTMC
How do we determine the distribution of a DTMC?
To uniquely determine the distribution we need 2 ingredients:
- the initial distribution π^{(0)} = (π^{(0)}_i)_{i∈S}, where for each j ∈ S,
π^{(0)}_j = P(X_0 = j), and
- the transition matrix P.
In principle, we can use these and the Markov property to derive
the finite dimensional distributions, e.g.
P(X_0 = x_0, X_{t_1} = x_1, X_{t_2} = x_2, · · · , X_{t_k} = x_k) = π^{(0)}_{x_0} p^{(t_1)}_{x_0,x_1} · · · p^{(t_k − t_{k−1})}_{x_{k−1},x_k},
although the calculations are often intractable.
Slide 11
Example:
Suppose P(X_0 = 1) = 1/3, P(X_0 = 2) = 0, P(X_0 = 3) = 1/2,
P(X_0 = 4) = 1/6 and
P = [ 1/4 0   1/4 1/2
      1/4 1/4 1/4 1/4
      0   0   2/3 1/3
      1/2 0   1/2 0   ].
- Find the distribution of X_1.
- Find P(X_{n+2} = 2 | X_n = 4).
- Find P(X_3 = 2, X_2 = 3, X_1 = 1).
(A computational sketch follows this slide.)
Slide 12
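A computational sketch of the three parts, using the fact that the distribution of X_n is the row vector π^{(0)} P^n (states 1–4 become indices 0–3):

```python
import numpy as np

pi0 = np.array([1/3, 0, 1/2, 1/6])          # initial distribution
P = np.array([[1/4, 0,   1/4, 1/2],
              [1/4, 1/4, 1/4, 1/4],
              [0,   0,   2/3, 1/3],
              [1/2, 0,   1/2, 0  ]])

dist_X1 = pi0 @ P                            # distribution of X_1
P2 = np.linalg.matrix_power(P, 2)
p_42 = P2[3, 1]                              # p^{(2)}_{4,2}

# P(X_3 = 2, X_2 = 3, X_1 = 1) = P(X_1 = 1) p_{1,3} p_{3,2}
path_prob = dist_X1[0] * P[0, 2] * P[2, 1]

print(dist_X1, p_42, path_prob)
```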
A fundamental question
What proportion of time does the chain spend in each state in the
long run?
(and when does this question even make sense?)
To answer this appropriately we need to introduce a lot of
concepts!
Slide 13
Classification of states/chains
Here are some definitions.
- State j is accessible from state i, denoted by i → j, if there
exists an n ≥ 0 such that p^{(n)}_{i,j} > 0. That is, either j = i or we
can get from i to j in a finite number of steps.
- If i → j and j → i, then states i and j communicate, denoted
by i ↔ j.
- A state j is an absorbing state if p_{j,j} = 1.
Slide 14
Example
Draw a transition diagram associated with the transition matrix
below and determine which states communicate with each other.
Are there any absorbing states?
P = [ 0   0.5 0.5 0
      0.5 0   0   0.5
      0   0   0.5 0.5
      0   0   0.5 0.5 ]
Slide 15
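One way to determine which states communicate programmatically is to compute the transitive closure of the one-step accessibility relation; a sketch for the matrix above:

```python
import numpy as np

P = np.array([[0,   0.5, 0.5, 0  ],
              [0.5, 0,   0,   0.5],
              [0,   0,   0.5, 0.5],
              [0,   0,   0.5, 0.5]])

n = len(P)
reach = (np.eye(n) + P > 0).astype(int)   # i -> j in 0 or 1 steps
for _ in range(n):                        # repeated squaring gives the transitive closure
    reach = (reach @ reach > 0).astype(int)

communicate = (reach * reach.T) > 0       # i <-> j iff both directions are reachable
classes = {frozenset((np.flatnonzero(row) + 1).tolist()) for row in communicate}
print(classes)                            # {frozenset({1, 2}), frozenset({3, 4})}
```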
The communication relation
The communication relation ↔ has the properties:
- j ↔ j (reflexivity),
- j ↔ k if and only if k ↔ j (symmetry), and
- if j ↔ k and k ↔ i, then j ↔ i (transitivity).
A relation that satisfies these properties is known as an equivalence
relation.
Slide 16
Communicating classes and irreducibility
Consider a set S whose elements can be related to each other via
any equivalence relation ⇔.
Then S can be partitioned into a collection of disjoint subsets
S_1, S_2, S_3, . . . , S_M (where M might be infinite) such that j, k ∈ S_m
if and only if j ⇔ k.
So the state space S of a DTMC is partitioned into
communicating classes by the communication relation ↔.
If a DTMC has only one communicating class (i.e., all states
communicate) then it is called an irreducible DTMC. Otherwise it
is called reducible.
Slide 17
Hitting and return probabilities
Let
h_{i,j} = P(j ∈ {X_0, X_1, . . . } | X_0 = i),
which is the probability that we ever reach state j, starting from
state i. The quantities h_{i,j} are referred to as hitting probabilities.
Note that
- h_{i,i} = 1
- h_{i,j} > 0 if and only if i → j.
Let
f_i = P(i ∈ {X_1, X_2, . . . } | X_0 = i),
which is the probability that we ever return to state i, starting
from i. The quantities f_i are referred to as return probabilities.
Slide 18
Hitting and return times
Let T(j) = inf{n ≥ 0 : X_n = j} denote the first hitting time of j (it
is a stopping time). Then
m_{i,j} := E[T(j) | X_0 = i]
is the expected time to reach state j starting from state i. By
definition,
- m_{j,j} = 0,
- if h_{i,j} < 1 then P(T(j) = ∞ | X_0 = i) > 0, so m_{i,j} = ∞,
- if h_{i,j} = 1 then m_{i,j} may or may not be finite.
Let T^+(i) = inf{n > 0 : X_n = i}. Then
µ_i = E[T^+(i) | X_0 = i]
is the expected time to return to state i.
Slide 19
The strong Markov property
The Markov property, as defined, holds at each fixed time. We
would like the same to be true at certain random times, such as
hitting times. The property that we need (which can be proved
from the Markov property) is called the
Strong Markov property: Let (X_t)_{t∈Z_+} be a (time-homogeneous)
DTMC, and T be a stopping time for the chain. Then
P(X_{T+1} = j | T = t, X_0 = x_0, . . . , X_T = i)
  = P(X_{T+1} = j | T < ∞, X_T = i) = p_{i,j}.
This says that looking at the next step of a Markov chain at a
stopping time is the same as starting the process from the random
state XT (provided that T is finite). As with the ordinary Markov
property, this can be generalized to handle general future events
etc.
Slide 20
Recurrence and transience
One can classify individual states as recurrent or transient. We will
be mostly interested in applying such a classification to irreducible
chains where we can avoid some technicalities.
Definition: An irreducible DTMC is recurrent if h_{i,j} = 1 for every
i, j ∈ S. Otherwise it is transient.
Slide 21
Characterizing recurrence
Let Δ_i(j) be the time between the (i + 1)st and ith visit to state j,
and let N(j) be the number of visits to state j. Suppose that
X_0 = j.
If f_j = 1 then each Δ_i(j) is finite since the chain is certain to
return to j in finite time, and N(j) = ∞.
If f_j < 1 then with probability 1 − f_j > 0 it will never return to j.
From the Markov property we see that N(j) has a geometric
distribution.
Specifically, for n ≥ 1,
P(N(j) = n | X_0 = j) = f_j^{n−1}(1 − f_j).
This implies that γ_j := E[N(j) | X_0 = j] = 1/(1 − f_j) < ∞.
Slide 22
Characterizing recurrence
Theorem: For an irreducible DTMC the following are equivalent:
(i) the chain is recurrent
(ii) f_i = 1 for every i ∈ S
(iii) f_i = 1 for some i ∈ S
(iv) γ_i = ∞ for some i ∈ S
(v) γ_i = ∞ for every i ∈ S
(Note that if an irreducible chain is transient, then it visits each state only a finite number of times, so the limiting proportion of time spent in each state is 0.)
Slide 23
Characterizing recurrence
(iii) ⇔ (iv): essentially done 2 slides ago.
(i) ⇒ (ii): f_i = ∑_{j∈S} p_{i,j} h_{j,i} = ∑_{j∈S} p_{i,j} · 1 = 1.
(iii) ⇒ (i): Suppose f_i = 1. Let j ∈ S. Since i → j we must have h_{j,i} = 1 (otherwise the chain could “escape” from i by visiting j). Similarly, since f_i = 1 and i → j, every time the chain reaches state i it has a fixed (positive) probability of hitting j before returning to i, so h_{i,j} = 1. Now for j, k ∈ S, since we are guaranteed to hit i starting from j and guaranteed to hit k starting from i, we are guaranteed to hit k starting from j, i.e. h_{j,k} = 1.
Others: exercise.
Slide 24
Irreducible finite-state chains are recurrent
(We will state a stronger result later.)
Corollary: Irreducible finite-state chains are recurrent.
Proof: Let S = {1, . . . , k}. Let N(j) denote the number of visits to j. Now ∑_{j=1}^k N(j) = ∞, so for any i with P(X_0 = i) > 0, we have
E[∑_{j=1}^k N(j) | X_0 = i] = ∞. Thus there must exist j ∈ S such that
E[N(j) | X_0 = i] = ∞.
For this j, γ_j ≥ E[N(j) | X_0 = i] = ∞.
Slide 25
Simple Random Walk
Recall that S_0 = 0 and S_n = ∑_{i=1}^n Y_i for n ≥ 1, where (Y_i)_{i∈N} are
i.i.d. random variables with P(Y_i = 1) = p = 1 − P(Y_i = −1).
S_n is a sum of i.i.d. random variables. By the law of large numbers
we have S_n/n → E[Y_1] = 2p − 1. Therefore:
If p > 1/2 then the random walk escapes to +∞, and so the
number of visits to any state is finite. This implies that the chain
is transient (and similarly if p < 1/2).
Let us prove this another way (and also handle the case p = 1/2,
for which the LLN does not give us the answer).
Slide 26
Simple Random Walk
Note that the number of visits to 0 satisfies
N(0) = ∑_{m=0}^∞ 1{S_m = 0},
so (since the walk starts at 0)
E[N(0)] = ∑_{m=0}^∞ E[1{S_m = 0}] = ∑_{m=0}^∞ P(S_m = 0) = ∑_{m=0}^∞ p^{(m)}_{0,0}.
Note that p^{(m)}_{j,j} = 0 if m is odd. If m = 2n then
p^{(m)}_{j,j} = p^{(2n)}_{j,j} = (2n choose n) p^n (1 − p)^n.
Slide 27
Simple Random Walk
Stirling’s formula n! ≈ √(2πn) n^n e^{−n} gives us the fact that
p^{(2n)}_{j,j} ≈ (4p(1 − p))^n / √(nπ),
and the series ∑_{n=0}^∞ p^{(2n)}_{j,j}
- diverges if 4p(1 − p) = 1 (i.e. p = 1/2), so the DTMC is
recurrent
- converges if 4p(1 − p) < 1 (i.e. p ≠ 1/2) (compare to
geometric series), so the DTMC is transient.
We shall see another way of proving this same result later.
Slide 28
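A numerical sketch of this dichotomy: the partial sums of ∑_n p^{(2n)}_{0,0} keep growing when p = 1/2 but level off when p ≠ 1/2. (The recursive update of the summand avoids overflowing binomial coefficients.)

```python
def partial_sum(p, n_terms):
    """Partial sum of sum_{n>=0} p^{(2n)}_{0,0} = sum_n (2n choose n) (p(1-p))^n."""
    total, term = 0.0, 1.0                       # term for n = 0
    for n in range(n_terms):
        total += term
        # ratio of consecutive terms: C(2n+2, n+1)/C(2n, n) = 2(2n+1)/(n+1)
        term *= 2 * (2 * n + 1) / (n + 1) * p * (1 - p)
    return total

for p in (0.5, 0.45):
    # grows without bound for p = 1/2; approaches 1/sqrt(1 - 4p(1-p)) = 10 for p = 0.45
    print(p, [round(partial_sum(p, N), 3) for N in (10, 100, 1000, 10000)])
```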
Periodicity
The simple random walk illustrates another phenomenon that can
occur in DTMCs - periodicity.
Definition: For a DTMC, a state i ∈ S has period d(i) ≥ 1 if
{n ≥ 1 : p^{(n)}_{i,i} > 0} is non-empty and has greatest common
divisor d(i).
If state j has period 1, then we say that it is aperiodic (otherwise it
is called periodic).
It turns out that if i ↔ j then d(i) = d(j), thus we can make the
following definition:
Definition: An irreducible DTMC is periodic with period d if any
(hence every) state has period d > 1. Otherwise it is aperiodic.
Slide 29
Examples
- The simple random walk is periodic with period d = 2.
- Which of the following are transition matrices for periodic
chains?
P = [ 0 0.5 0.5
      1 0   0
      1 0   0   ]
P = [ 0 0 0.5 0.5
      1 0 0   0
      0 1 0   0
      0 1 0   0   ]
Slide 30
States in a communicating class have same period
Assume that state j has period d(j) and j ↔ k. Then, as before,
there must exist s and t such that p^{(s)}_{j,k} > 0 and p^{(t)}_{k,j} > 0. We know
straight away that d(j) divides s + t since it is possible to go from
j to itself in s + t steps.
Now take a path from k to itself in r steps. If we concatenate our
path from j to k in s steps, this r-step path, and our path from
k to j in t steps, we have an (s + r + t)-step path from j to
itself. So d(j) divides s + r + t, which means that d(j) divides r.
So d(j) divides the period d(k) of k.
Now we can switch j and k in the argument to conclude that d(k)
divides d(j), which means that d(j) = d(k), and all states in the
same communicating class have a common period.
Slide 31
Computing hitting probabilities
For i ∈ S and A ⊂ S, let h_{i,A} denote the probability that the chain
ever visits a state in A, starting from state i. If A = {j} is a single
state then we have seen this before: h_{i,{j}} = h_{i,j}.
Let T(A) denote the first time we reach a state in A. Then
h_{i,A} = P(T(A) < ∞ | X_0 = i).
We have
- if i ∈ A then h_{i,A} = 1
- h_{i,A} > 0 if and only if i → j for some j ∈ A
- h_{i,A} = ∑_{j∈S} p_{i,j} h_{j,A} if i ∉ A.
This is a set of linear equations.
Slide 32
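For a finite chain these equations can be solved directly, provided we first pin h_{i,A} = 0 for states that cannot reach A (this selects the minimal solution discussed later in this section). A sketch, checked on the chain from the next slide:

```python
import numpy as np

def hitting_probabilities(P, A):
    """Minimal non-negative solution of the hitting-probability equations
    for a finite chain: h_i = 1 on A, h_i = 0 where A is unreachable,
    and h_i = sum_j p_{i,j} h_j elsewhere."""
    n = len(P)
    can_reach = np.zeros(n, dtype=bool)           # states from which A is reachable
    can_reach[list(A)] = True
    for _ in range(n):
        can_reach |= (P[:, can_reach].sum(axis=1) > 0)
    M = np.eye(n) - P                              # rows encode h_i - sum_j p_{i,j} h_j = 0
    b = np.zeros(n)
    for i in range(n):
        if i in A or not can_reach[i]:             # pin boundary values
            M[i] = 0.0
            M[i, i] = 1.0
            b[i] = 1.0 if i in A else 0.0
    return np.linalg.solve(M, b)

# chain from the next slide: states 1 and 3 absorbing, state 2 in the middle
P = np.array([[1.0, 0.0, 0.0],
              [1/3, 1/2, 1/6],
              [0.0, 0.0, 1.0]])
print(hitting_probabilities(P, A=[0]))             # expect (1, 2/3, 0)
```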
A simple example
Consider a Markov chain with S = {1, 2, 3}, and p_{1,1} = p_{3,3} = 1
and p_{2,2} = 1/2, p_{2,1} = 1/3, p_{2,3} = 1/6. This Markov chain has
transition diagram
[Transition diagram: self-loops of weight 1 at states 1 and 3; state 2 has a self-loop of weight 1/2, an arc to 1 of weight 1/3, and an arc to 3 of weight 1/6.]
The chain has (two) absorbing states. Find h_{i,1} for each i.
Clearly h_{1,1} = 1, and h_{3,1} = 0. Finally,
h_{2,1} = (1/3) h_{1,1} + (1/2) h_{2,1} + (1/6) h_{3,1} = 1/3 + (1/2) h_{2,1}.
So h_{2,1} = 2/3.
(There is another easy way to see this)
Slide 33
Example: Gambler’s ruin
Starting with $i, a gambler makes repeated bets of $1 on a game
of chance that she has probability p of winning on each attempt
(independent of the past). She stops as soon as she reaches $w
(where w > i) or $0.
What is the probability that the gambler ends up with $0?
Let (X_n)_{n≥0} be a DTMC with state space S = {0, 1, . . . , w}, and
transition probabilities p_{i,i+1} = p and p_{i,i−1} = 1 − p if 0 < i < w,
and p_{0,0} = 1 = p_{w,w}. The transition diagram is
[Transition diagram: a path 0, 1, . . . , w − 1, w with self-loops of weight 1 at 0 and w; each interior state steps up with probability p and down with probability 1 − p.]
Find the hitting probabilities (h_{i,0})_{i∈S}.
Slide 34
Example: Gambler’s ruin
We have h_{0,0} = 1, h_{w,0} = 0 and otherwise
h_{i,0} = p h_{i+1,0} + (1 − p) h_{i−1,0}.
Rearrange to get
p(h_{i,0} − h_{i+1,0}) = (1 − p)(h_{i−1,0} − h_{i,0}).
Let u_i = h_{i−1,0} − h_{i,0}, and α = (1 − p)/p. Then for 1 ≤ i < w,
p u_{i+1} = (1 − p) u_i.
Thus, u_{i+1} = α u_i and u_1 = 1 − h_{1,0}. It follows that for k ≤ w,
u_k = α^{k−1} u_1.
Slide 35
Example: Gambler’s ruin
u_k = α^{k−1} u_1.
On the other hand,
h_{i,0} = h_{0,0} + ∑_{m=1}^i (h_{m,0} − h_{m−1,0}) = 1 − u_1 ∑_{m=0}^{i−1} α^m.
Since h_{w,0} = 0 we have that
u_1 = (∑_{m=0}^{w−1} α^m)^{−1},
and therefore
h_{i,0} = 1 − (∑_{r=0}^{i−1} α^r) / (∑_{m=0}^{w−1} α^m) = (∑_{r=i}^{w−1} α^r) / (∑_{m=0}^{w−1} α^m).
Slide 36
Example: Gambler’s ruin
So
h_{i,0}(w) = 1 − (∑_{r=0}^{i−1} ((1−p)/p)^r) / (∑_{m=0}^{w−1} ((1−p)/p)^m) = (∑_{r=i}^{w−1} ((1−p)/p)^r) / (∑_{m=0}^{w−1} ((1−p)/p)^m).
If p = 1/2 we get
h_{i,0} = (w − i)/w.
Exercise: check that these do indeed satisfy the equations that we
started with!
Exercise: find lim_{w→∞} h_{1,0}(w).
Slide 37
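A quick numerical check that the closed form (which simplifies to (α^i − α^w)/(1 − α^w) for α ≠ 1) agrees with a direct linear solve of the hitting equations; w = 10 and the values of p are illustrative choices:

```python
import numpy as np

def ruin_prob_formula(i, w, p):
    """Closed-form h_{i,0} from the slides."""
    if p == 0.5:
        return (w - i) / w
    a = (1 - p) / p
    return (a**i - a**w) / (1 - a**w)

def ruin_prob_solve(i, w, p):
    """Solve h_0 = 1, h_w = 0, h_k = p h_{k+1} + (1-p) h_{k-1} for 0 < k < w."""
    M = np.eye(w + 1)
    b = np.zeros(w + 1)
    b[0] = 1.0                      # boundary conditions h_0 = 1, h_w = 0
    for k in range(1, w):
        M[k, k + 1] -= p
        M[k, k - 1] -= 1 - p
    return np.linalg.solve(M, b)[i]

for p in (0.4, 0.5, 0.6):
    print(p, ruin_prob_formula(3, 10, p), ruin_prob_solve(3, 10, p))
```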
Example: Gambler’s ruin with Martingales
When p = 1/2 in the Gambler’s ruin problem, the chain is a
martingale:
E[X_{n+1} | X_n] = X_n.
It is bounded because 0 ≤ X_n ≤ w for each n.
Let T = T({0, w}) denote the first time that we hit 0 or w. Then,
since (X_n)_{n≥0} is a bounded martingale,
E[X_T] = E[X_0].
If we start with X_0 = i then E[X_0] = i. Also, P(X_T = 0) = h_{i,0}
and P(X_T = w) = 1 − h_{i,0}, so
i = E[X_T] = 0 × h_{i,0} + w × (1 − h_{i,0}).
Solving gives h_{i,0} = (w − i)/w, as before.
Slide 38
Example: Simple random walk
Consider the simple random walk, which is irreducible. Let
A = {0}. Then h_{0,0} = 1 and h_{i,0} = p h_{i+1,0} + (1 − p) h_{i−1,0} for
i ≠ 0. Then h_{i,0} = 1 for all i satisfies these equations, regardless of
p. But we know that the walk is transient unless p = 1/2, so how
is this possible?
If S is infinite then there need not be a unique solution to these
equations.
Above we have not found the solution that we are looking for
(except when p = 1/2).
Slide 39
Example: Simple random walk
If i > 0 then
h_{i,0} = p h_{i+1,0} + (1 − p) h_{i−1,0}.
Let x = h_{1,0}. Then x = p h_{2,0} + 1 − p. Also
h_{i,0} = h_{i,i−1} h_{i−1,0} = h_{1,0} h_{i−1,0}.
In particular, with i = 2 we get h_{2,0} = h_{1,0} h_{1,0} = x^2.
Therefore,
x = p x^2 + (1 − p).
Solving the quadratic gives solutions
x = 1, and x = (1 − p)/p.
If p ≤ 1/2 then h_{1,0} = 1 is the only possible value for h_{1,0}, and in
this case h_{i,0} = 1 for every i > 0.
If p > 1/2 then the walk is transient (to the right), so h_{i,0} cannot
be 1 for i > 0. Thus h_{1,0} = (1 − p)/p and h_{i,0} = ((1 − p)/p)^i.
Do the above expressions look familiar?
Slide 40
The “correct” solution to the hitting probability equations
Theorem: The vector of hitting probabilities (h_{i,B})_{i∈S} is the
unique minimal non-negative solution to the equations
h_{i,B} = 1 if i ∈ B, and h_{i,B} = ∑_{j∈S} p_{i,j} h_{j,B} otherwise.
Proof: Let h^{(n)}_{i,B} = P(T(B) ≤ n | X_0 = i). Let (x_i)_{i∈S} be a
non-negative solution. We’ll show that h^{(n)}_{i,B} ≤ x_i for each n ∈ Z_+,
by induction. This is sufficient since h_{i,B} = lim_{n→∞} h^{(n)}_{i,B}.
For n = 0, note that h^{(0)}_{i,B} = 1 if i ∈ B and h^{(0)}_{i,B} = 0 if i ∉ B. Since
the (x_i)_{i∈S} are non-negative and equal 1 for i ∈ B, we have h^{(0)}_{i,B} ≤ x_i
for all i ∈ S.
Proceeding by induction, suppose that h^{(n)}_{i,B} ≤ x_i for all i ∈ S. Then
h^{(n+1)}_{i,B} = ∑_{j∈S} p_{i,j} h^{(n)}_{j,B} ≤ ∑_{j∈S} p_{i,j} x_j (by the induction hypothesis)
  = x_i ((x_i)_{i∈S} is a solution).
Slide 41
Exercise:
Check that what this theorem says about the simple random walk
agrees with what we already proved.
Slide 42
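The proof suggests an algorithm: iterate h^{(n+1)} = Ph^{(n)} (resetting h = 1 on B) starting from h^{(0)} = 1_B; the iterates increase to the minimal solution. A sketch, using a walk truncated to {0, . . . , 49}, with an absorbing right edge, as a finite stand-in for the simple random walk on Z:

```python
import numpy as np

def minimal_hitting_probs(P, B, n_iter=10_000, tol=1e-12):
    """Iterate h^{(n+1)}_i = sum_j p_{i,j} h^{(n)}_j off B, with h = 1 on B,
    starting from the indicator of B (as in the proof above)."""
    h = np.zeros(len(P))
    h[list(B)] = 1.0
    for _ in range(n_iter):
        h_new = P @ h
        h_new[list(B)] = 1.0
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return h

p, N = 0.6, 50
P = np.zeros((N, N))
for i in range(1, N - 1):
    P[i, i + 1], P[i, i - 1] = p, 1 - p
P[0, 0] = 1.0
P[N - 1, N - 1] = 1.0                 # crude truncation at the right edge
h = minimal_hitting_probs(P, B=[0])
print(h[1], (1 - p) / p)              # close to (1-p)/p, up to truncation error
```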
Difference and differential equations
The equation h_{i,0} = p h_{i+1,0} + (1 − p) h_{i−1,0} is a second-order linear
difference equation with constant coefficients.
These can be solved in a similar way to second-order linear
differential equations with constant coefficients, which you learned
about in Calculus II or accelerated Mathematics II.
Recall that, to solve
a d^2y/dt^2 + b dy/dt + c y = 0,
we try a solution of the form y = y(t) = e^{λt} to obtain the
Characteristic Equation
a λ^2 + b λ + c = 0.
Slide 43
Differential equations
If the characteristic equation has distinct roots, λ_1 and λ_2, the
general solution has the form
y = A e^{λ_1 t} + B e^{λ_2 t}.
If the roots are coincident, the general solution has the form
y = A e^{λ_1 t} + B t e^{λ_1 t}.
In both cases, the values of the constants A and B are determined
by the initial conditions.
Slide 44
Difference equations
The method for solving a second-order linear difference equation
with constant coefficients is similar. To solve
a v_{j+1} + b v_j + c v_{j−1} = 0,
we try a solution of the form v_j = z^j to obtain the Characteristic
Equation
a z^2 + b z + c = 0.
Slide 45
Difference equations
If this equation has distinct roots, z_1 and z_2, the general solution
has the form
v_j = A z_1^j + B z_2^j.
If the roots are coincident, the general solution has the form
v_j = A z_1^j + B j z_1^j.
The values of the constants A and B need to be determined by
boundary conditions, or other information that we have.
Slide 46
Application to simple random walk
The characteristic equation of
h_{i,0} = p h_{i+1,0} + (1 − p) h_{i−1,0}
is
p z^2 − z + (1 − p) = 0,
which (as we have seen before) has roots z = 1 and z = (1 − p)/p.
If (1 − p)/p ≠ 1, the general solution for j ≥ 1 is of the form
h_{j,0} = A + B ((1 − p)/p)^j.
Slide 47
Application to simple random walk
If (1 − p)/p > 1, then the general solution is
h_{j,0} = A + B ((1 − p)/p)^j.
Similarly, if (1 − p)/p = 1, the general solution is of the form
h_{j,0} = A + B j.
In either case, these can only be probabilities if B = 0, and then
notice
A = h_{1,0} = p h_{2,0} + (1 − p) = pA + (1 − p),
so A = 1. This makes sense because p ≤ 1/2, and so we have a
neutral or downward drift.
Slide 48
Expected hitting times in DTMCs
For i ∈ S and A ⊂ S let m_{i,A} denote the expected time to reach A
starting from i. Note that m_{i,{j}} = m_{i,j} is a special case that we
have already seen before.
By definition and the Markov property we have that
m_{i,A} = 0 if i ∈ A, and m_{i,A} = 1 + ∑_{j∈S} p_{i,j} m_{j,A} otherwise.
Theorem: The vector (m_{i,A})_{i∈S} of mean hitting times is the
minimal non-negative solution to
m_{i,A} = 0 if i ∈ A, and m_{i,A} = 1 + ∑_{j∈S} p_{i,j} m_{j,A} otherwise.
Slide 49
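A sketch of the corresponding linear solve for finite chains (valid when A is reached with probability 1 from every state, so that the mean hitting times are finite), checked against the known expected duration i(w − i) of the fair gambler's ruin game:

```python
import numpy as np

def mean_hitting_times(P, A):
    """Solve m_i = 0 on A and m_i = 1 + sum_j p_{i,j} m_j elsewhere."""
    n = len(P)
    M = np.eye(n) - P
    b = np.ones(n)
    for i in A:                    # pin m_i = 0 on A
        M[i] = 0.0
        M[i, i] = 1.0
        b[i] = 0.0
    return np.linalg.solve(M, b)

# gambler's ruin with w = 5, p = 1/2, A = {0, 5}: expected duration is i(w - i)
w, p = 5, 0.5
P = np.zeros((w + 1, w + 1))
for i in range(1, w):
    P[i, i + 1], P[i, i - 1] = p, 1 - p
P[0, 0] = P[w, w] = 1.0
print(mean_hitting_times(P, A=[0, w]))   # expect [0, 4, 6, 6, 4, 0]
```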
Computing expected hitting times
Example: the symmetric simple random walk is recurrent. Now,
m_{1,0} = 1 + (1/2) m_{0,0} + (1/2) m_{2,0} = 1 + (1/2) m_{2,0}.
But m_{2,0} = m_{2,1} + m_{1,0} = 2 m_{1,0}, so
m_{1,0} = 1 + m_{1,0}.
This has no finite solution, so m_{1,0} = ∞.
Slide 50
Positive recurrence
Recall that an irreducible DTMC is recurrent iff the return
probabilities satisfy f_j = 1 for every j ∈ S.
Recall that the expected return times are denoted by µ_i.
Definition: A recurrent DTMC is positive recurrent if µ_j < ∞ for
every j ∈ S. Otherwise it is null recurrent.
Slide 51
Finite irreducible DTMCs are positive recurrent
Lemma: Any irreducible DTMC with finite state space is positive
recurrent.
Proof: Let j ∈ S. Then (using irreducibility and the finiteness of S)
there exists n_0 ∈ N such that P(T(j) ≤ n_0 | X_0 = k) > 0 for every
k ∈ S. Since S is finite, there exists some ε > 0 such that
P(T(j) ≤ n_0 | X_0 = k) > ε for every k ∈ S.
Starting from any state i, observe whether the chain has reached j
within n_0 steps. This has probability at least ε. If it has not
reached j then observe it for the next n_0 steps… The number of
blocks of n_0 steps that we have to observe is dominated by a
Geometric(ε) random variable. Therefore m_{i,j} ≤ n_0/ε for every i.
Thus, µ_j ≤ 1 + n_0/ε < ∞.
Slide 52
Simple symmetric random walk:
Recall that the simple random walk with p = 1/2 is recurrent. Recall that for this walk m_{1,0} = ∞. Therefore µ_0 = ∞ as well. So the simple symmetric random walk is null recurrent.
Slide 53
A question
Some properties of a Markov chain depend only on the transition matrix P, while others depend on the starting state of the chain as well. We will now focus on the long run behaviour of DTMCs, and one of the interesting questions to keep in mind as we do this is:
When does a property of the MC depend on the initial distribution?
Slide 54
Long run behaviour of DTMCs
Let N_n(j) = ∑_{i=0}^{n−1} 1{X_i = j} denote the number of visits to j before time n. Then Y_n(j) = (1/n) N_n(j) is the proportion of time spent in state j before time n. Note that Y_n(j) is a random variable.
Fundamental question: what is the long run proportion of the time spent in state j? I.e. what is Y(j) = lim_{n→∞} Y_n(j)?
- We have argued that this will be zero if the chain is irreducible and transient.
- For an irreducible DTMC we will get an answer to this question that does not depend on the initial distribution.
- If the chain is reducible then the answer to this question may be random, and may depend on the initial distribution.
A related fundamental question is: what is lim_{t→∞} P(X_t = j)?
Slide 55
A reducible example
Consider a Markov chain with transition diagram
[Transition diagram: self-loops of weight 1 at states 1 and 3; state 2 has a self-loop of weight 1/2, an arc to 1 of weight 1/3, and an arc to 3 of weight 1/6.]
Find the limiting proportion of time spent in state 1.
If X_0 = 1 then X_t = 1 for all t, so Y(1) = 1, Y(2) = 0, Y(3) = 0.
If X_0 = 3 then X_t = 3 for all t, so Y(1) = 0, Y(2) = 0, Y(3) = 1.
If X_0 = 2 then Y(1) is random, with P(Y(1) = 1) = h_{2,1} = 2/3 and P(Y(1) = 0) = 1 − h_{2,1} = 1/3.
So in general, Y(1) is a Bernoulli random variable with
P(Y(1) = 1) = P(X_0 = 1) + (2/3) P(X_0 = 2).
Slide 56
Long run behaviour of irreducible DTMCs
Recall that µ_j denotes the mean return time to state j.
Theorem: (∗) Let (X_i)_{i∈Z_+} be an irreducible (discrete-time, time-homogeneous) Markov chain with state space S. Then
P(N_n(i)/n → 1/µ_i) = 1,   (2)
where we interpret 1/µ_i as 1/∞ = 0 if the chain is not positive recurrent.
Idea of proof: if the chain is transient then N_n(i) is bounded, so N_n(i)/n converges to 0. Otherwise the chain visits state i about once in every µ_i steps, so the proportion of time spent at i is 1/µ_i. (We make this rigorous using the law of large numbers.)
Slide 57
Computing mean return times
Note that
µ_i = 1 + ∑_{j∈S} p_{i,j} m_{j,i}.
So if you can find (m_{j,i})_{j∈S} then you are done!
Slide 58
Stationary distribution
A vector π = (π_i)_{i∈S} with non-negative entries is a stationary measure for a stochastic matrix P (or a DTMC with transition matrix P) if
π_i = ∑_{j∈S} π_j p_{j,i}, for every i ∈ S.
The above equations are called the full balance equations. In matrix form this is πP = π (with π as a row vector).
If π is a stationary measure with ∑_{i∈S} π_i = 1 then π is called a stationary distribution for P.
Slide 59
Stationary distribution
Suppose that π is a stationary distribution for a DTMC (X_n)_{n∈Z_+} with transition matrix P, and suppose that P(X_0 = i) = π_i for each i ∈ S. Then
P(X_1 = i) = ∑_{j∈S} P(X_0 = j) p_{j,i} = ∑_{j∈S} π_j p_{j,i} = π_i,
since π satisfies the full balance equations. This says that P(X_1 = i) = π_i too. By induction (exercise) we get:
Lemma: Suppose that (π_i)_{i∈S} is a stationary distribution for P. Let (X_n)_{n∈Z_+} be a DTMC with transition matrix P and initial distribution equal to π (i.e. P(X_0 = i) = π_i for each i ∈ S). Then
P(X_n = i) = π_i, for every i ∈ S, n ∈ N.
So, if your initial distribution is a stationary distribution then your distribution at any time is the same (hence the use of the term stationary)!
Slide 60
Examples:
Find all stationary distributions for a Markov chain with the following transition diagram:
[Transition diagram: as above; self-loops of weight 1 at states 1 and 3; state 2 has a self-loop of weight 1/2, an arc to 1 of weight 1/3, and an arc to 3 of weight 1/6.]
The full balance equations are:
π_1 = π_1 + (1/3) π_2,  π_2 = (1/2) π_2,  π_3 = π_3 + (1/6) π_2.
The second equation gives π_2 = 0. The other equations then reduce to π_1 = π_1 and π_3 = π_3. Thus, any vector (π_1, π_2, π_3) = (a, 0, b) with a, b ≥ 0 is a stationary measure. To get a stationary distribution, we require that a + b = 1, so the set of stationary distributions is the set of vectors of the form (a, 0, 1 − a) with a ∈ [0, 1].
Slide 61
Existence and uniqueness
We have just seen an example of a DTMC without a unique stationary distribution. The following is our main existence and uniqueness result.
Theorem: (∗∗) An irreducible (time-homogeneous) DTMC with countable state space S has a stationary measure. It has a unique stationary distribution if and only if the chain is positive recurrent, and in this case π_i = 1/µ_i for each i ∈ S.
Combining Theorems (∗) and (∗∗) we see that for a positive recurrent irreducible DTMC, the long run proportion of time spent in state i is the stationary probability π_i.
Slide 62
Limiting distribution
A distribution (a_i)_{i∈S} is called the limiting distribution for a DTMC (X_t)_{t∈Z_+} if
lim_{n→∞} P(X_n = i) = a_i, for each i ∈ S.
In general,
- a limiting distribution need not exist,
- the limiting distribution (if it exists) is unique (by definition), and
- the limiting distribution (if it exists) depends on both the initial distribution and P.
We have already seen that if π is a stationary distribution, and P(X_0 = i) = π_i for each i ∈ S, then P(X_n = i) = π_i for each i ∈ S, n ∈ N, so in this case π is also the limiting distribution.
Slide 63
Examples:
Consider a DTMC with initial distribution (b_1, b_2, b_3) and transition diagram
[Transition diagram: as above.]
Find the limiting distribution for the chain.
Note that if b_1 = 1 then the chain stays forever in state 1, so the limiting distribution is (1, 0, 0) in this case. Similarly it is (0, 0, 1) if b_3 = 1. If b_2 = 1 then we either hit state 1 (with probability h_{2,1} = 2/3) or state 3, and then stay there, so the limiting distribution is (2/3, 0, 1/3) in this case (see the next slide for details).
Slide 64
Examples:
To answer the question in general, note that if X_k = 1 then X_n = 1 for all n ≥ k, and therefore {X_n = 1} = ∪_{k=0}^n {X_k = 1}. Thus,
lim_{n→∞} P(X_n = 1) = lim_{n→∞} P(∪_{k=0}^n {X_k = 1}) = P(∪_{k=0}^∞ {X_k = 1}).
The last event H_1 = ∪_{k=0}^∞ {X_k = 1} is the event that we ever reach state 1, so
P(H_1) = P(H_1 | X_0 = 1)P(X_0 = 1) + P(H_1 | X_0 = 2)P(X_0 = 2) + P(H_1 | X_0 = 3)P(X_0 = 3)
  = 1 · P(X_0 = 1) + (2/3) · P(X_0 = 2) + 0   (3)
  = P(X_0 = 1) + (2/3) · P(X_0 = 2).
Slide 65
Example:
Consider a DTMC with S = {1, 2}, P(X_0 = 1) = p and transition matrix
P = [ 0 1
      1 0 ].
Find the limiting distribution (if it exists).
This chain is periodic (with period 2). Note that P(X_{2m} = 1) = P(X_0 = 1) = p and P(X_{2m+1} = 1) = P(X_0 = 2) = 1 − p. Thus
P(X_n = 1) = p if n is even, and 1 − p if n is odd.
This converges if and only if p = 1 − p. In other words, for this example, a limiting distribution exists if and only if p = 1/2.
Slide 66
Example: simple random walk
Recall that S_0 = 0 and p_{i,i+1} = p and p_{i,i−1} = 1 − p. If p > 1/2 then S_n → ∞, so P(S_n = i) → 0 as n → ∞.
If p = 1/2 the chain is recurrent, but we have seen (using Stirling’s
formula) that
P(S_{2n} = 0) ≈ C/√n → 0,
while P(S_{2n+1} = 0) = 0. So P(S_n = 0) → 0 as n → ∞.
Similarly one can show that P(S_n = i) → 0 for each i, so all of the
limits exist, but they are all 0, so there is no limiting distribution in
this example.
Slide 67
Limiting distribution results
Theorem: For a (time-homogeneous) DTMC, if a limiting
distribution exists then it is a stationary distribution.
Theorem: (∗∗∗) Let (X_n)_{n∈Z_+} be an irreducible, aperiodic
(time-homogeneous) DTMC with countable state space S. Then
for all i, j ∈ S,
p^{(n)}_{i,j} = P(X_n = j | X_0 = i) → 1/µ_j, as n → ∞.
Slide 68
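For a finite irreducible chain, the stationary distribution can be found by solving πP = π together with the normalization condition; a sketch (the 3-state matrix is an illustrative choice), also checking that the rows of P^n approach π and that µ_j = 1/π_j:

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for a finite irreducible chain,
    replacing one (redundant) balance equation by the normalization."""
    n = len(P)
    M = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(M, b)

# irreducible aperiodic example with assumed entries
P = np.array([[0.5,  0.5, 0.0 ],
              [0.25, 0.5, 0.25],
              [0.0,  0.5, 0.5 ]])
pi = stationary_distribution(P)
print(pi)                                  # (0.25, 0.5, 0.25)
print(np.linalg.matrix_power(P, 100)[0])   # rows of P^n approach pi
print(1 / pi)                              # mean return times mu_j = 1/pi_j
```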
Ergodicity
We say that a DTMC is ergodic if the limiting distribution exists
and does not depend on the starting distribution. This is
equivalent to saying that a_j = lim_{n→∞} p^{(n)}_{i,j} exists for each i, j ∈ S,
does not depend on i, and ∑_{j∈S} a_j = 1.
Theorem: An irreducible (time-homogeneous) DTMC is ergodic if
and only if it is aperiodic, and positive recurrent. For an ergodic
DTMC the limiting distribution is equal to the stationary
distribution.
Slide 69
Doubly stochastic P :
An m × m stochastic matrix P is called doubly stochastic if all the
column sums are equal to one.
Suppose that a finite-state DTMC has a doubly stochastic transition
matrix.
Suppose that S contains exactly m elements. Then (exercise)
(1/m, 1/m, . . . , 1/m) P = (1/m, 1/m, . . . , 1/m).
It follows that the uniform distribution
π = (1/m, 1/m, . . . , 1/m)
is a stationary distribution.
Conversely, if the uniform distribution on S = {1, . . . , m} is a
stationary distribution for P then for each i,
1/m = ∑_{j∈S} (1/m) p_{j,i} = (1/m) ∑_{j∈S} p_{j,i},
so P is doubly stochastic.
Slide 70
Example:
Let (Xn)n∈Z+ be a DTMC with state space S = {1, 2, 3}, and
transition matrix
P = [ 1/2 1/2 0
      1/2 1/2 0
      0   1/2 1/2 ].
If the initial distribution is x = (x1, x2, x3), find:
(i) the limiting distribution
(ii) the limiting proportion of time spent in state 2
Slide 71
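A numerical sketch for this exercise (the initial distribution x is an arbitrary choice; note that state 3 is transient, so its long-run weight is 0):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
x = np.array([0.2, 0.3, 0.5])              # an assumed initial distribution

# state 3 is transient; the chain is absorbed into the closed class {1, 2},
# so the limit is (1/2, 1/2, 0) and the long-run proportion in state 2 is 1/2
print(x @ np.linalg.matrix_power(P, 50))
```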
Reversibility
An irreducible DTMC is called reversible if there exists a
probability distribution π = (π_i)_{i∈S} such that
π_i p_{i,j} = π_j p_{j,i}, for all i, j ∈ S.
The above equations are called the detailed balance equations.
Any solution to these equations is a stationary distribution, since if
π satisfies the detailed balance equations then
∑_{j∈S} π_j p_{j,i} = ∑_{j∈S} π_i p_{i,j} = π_i (∑_{j∈S} p_{i,j}) = π_i.
Slide 72
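Detailed balance is easy to check numerically: the "probability flow" matrix with entries π_i p_{i,j} must be symmetric. A sketch, reusing the tridiagonal example from the stationary distribution sketch above:

```python
import numpy as np

def is_reversible(P, pi, tol=1e-12):
    """Check the detailed balance equations pi_i p_{i,j} = pi_j p_{j,i}."""
    flows = pi[:, None] * P          # flows[i, j] = pi_i p_{i,j}
    return np.allclose(flows, flows.T, atol=tol)

P = np.array([[0.5,  0.5, 0.0 ],
              [0.25, 0.5, 0.25],
              [0.0,  0.5, 0.5 ]])
pi = np.array([0.25, 0.5, 0.25])
print(is_reversible(P, pi))          # True: this tridiagonal chain is reversible
```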
Kolmogorov’s reversibility criterion
A (time-homogeneous) irreducible DTMC with state space S is
reversible if and only if it has a stationary distribution and P
satisfies
p_{j_n j_1} ∏_{i=1}^{n−1} p_{j_i j_{i+1}} = p_{j_1 j_n} ∏_{i=1}^{n−1} p_{j_{i+1} j_i},
for every n and every {j_1, j_2, . . . , j_n}.
Interpretation: Suppose I show you a short video clip of a Markov
chain that starts and ends at the same state, and I choose to show
it either forwards or in reverse. I tell you what P is. Then what
you observe contains no information about whether I showed the
process forwards or in reverse if and only if the process is reversible.
Slide 73
Example
Let P be an irreducible (so it has a stationary measure) stochastic
matrix with p_{i,j} = 0 unless j ∈ {i − 1, i, i + 1}. (A Markov chain
with such a stochastic matrix is called a birth and death chain.)
Then for any sequence of n transitions taking us from state i back to
state i: every occurrence of a transition j → j + 1 has a
corresponding transition j + 1 → j, and vice versa.
Thus, reversing this sequence of n transitions, we see exactly the
same number of transitions from j to j + 1 as in the forward
sequence. This is true for all j, and therefore the Kolmogorov
criterion is always satisfied for such a stochastic matrix, provided it
also has a stationary distribution.
Thus a positive recurrent birth and death chain is always reversible.
Slide 74
Example
Let (X_n)_{n∈Z_+} be a DTMC with S = Z_+ and transition probabilities
p_{i,i+1} = p_i ∈ (0, 1) for each i ∈ S, and p_{0,0} = 1 − p_{0,1} and
p_{i,i−1} = 1 − p_i for i ≥ 1.
This is a birth and death chain, so if it has a stationary distribution
then it satisfies the detailed balance equations:
π_i p_i = π_{i+1}(1 − p_{i+1}), i.e.
π_{i+1} = [p_i/(1 − p_{i+1})] π_i.
Letting ρ_i = p_i/(1 − p_{i+1}), it follows that
π_i = x ∏_{j=0}^{i−1} ρ_j
gives a solution to the detailed balance equations.
Thus if ∑_{i=0}^∞ (∏_{j=0}^{i−1} ρ_j) < ∞ then there is a stationary
distribution (otherwise not).
Slide 75
Example: Random walk with one barrier
Continuing the example above, if p_i = p, then ρ_i = ρ := p/(1 − p)
for every i, and the sum is finite if and only if ρ < 1 (i.e. p < 1/2).
When p < 1/2 we have a stationary distribution, and the chain is
irreducible and aperiodic (due to the boundary), so also ergodic.
Otherwise we have no stationary distribution.
The limiting proportion of time spent in state 0 is π_0. We find this
by
1 = ∑_{i=0}^∞ π_i = ∑_{i=0}^∞ π_0 ρ^i = π_0/(1 − ρ).
In other words, π_0 = 1 − ρ.
We will see something similar when we study simple queues later in
the course.
Slide 76
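A simulation sketch comparing the long-run occupation frequencies of this chain with the geometric stationary distribution π_i = (1 − ρ)ρ^i (p = 0.3 and the seed are arbitrary choices):

```python
import numpy as np

p = 0.3                                      # p < 1/2, so the chain is ergodic
rho = p / (1 - p)
rng = np.random.default_rng(1)

n_steps = 200_000
x, counts = 0, {}
for _ in range(n_steps):
    counts[x] = counts.get(x, 0) + 1
    if x == 0:
        x = 1 if rng.random() < p else 0     # p_{0,0} = 1 - p_{0,1}
    else:
        x = x + 1 if rng.random() < p else x - 1

for i in range(4):                           # empirical frequency vs (1 - rho) rho^i
    print(i, counts.get(i, 0) / n_steps, (1 - rho) * rho**i)
```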
Summary:
We have introduced the Markov property, and notions of
communicating classes (and irreducibility). We have seen how to
calculate path probabilities, hitting probabilities and expected
hitting times. We have discussed transience, positive-recurrence
and null-recurrence, and periodicity.
For an irreducible, aperiodic and positive-recurrent DTMC, the
stationary distribution π has a number of interpretations. It can be
seen as:
- the initial distribution for which the process is a stationary
process
- the limiting probability of being in each state
- the limiting proportion of time spent in each state
Slide 77
Tricks of the trade
Sometimes we have a process that is not a Markov chain, yet we
can still use Markov chain theory to analyse its behaviour by being
clever.
Consider the following example:
Let (S′_t)_{t∈Z_+} denote a random process with state space
S′ = {1, 2, 3, 4}, S′_0 = 1 and S′_1 = 2, such that whenever the
process is at i, having just come from j, it chooses uniformly at
random from S′ \ {i, j}. This process is a kind of non-backtracking
random walk (on the set S′).
Exercise: Show that S′ is not Markovian by showing that
P(S′_4 = 4 | S′_3 = 1) ≠ P(S′_4 = 4 | S′_3 = 1, S′_2 = 4).
If we put X_t = (S′_t, S′_{t+1}) for t ∈ Z_+ then (X_t)_{t∈Z_+} is a Markov
chain on a state space S = {(i, j) : i, j ∈ S′, i ≠ j} with 12
elements (can relabel them 1 to 12 if you wish), with
p_{(i_1,j_1),(i_2,j_2)} = 1/2 if i_2 = j_1 and j_2 ∉ {i_1, j_1}, and 0 otherwise.
Slide 78
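A sketch constructing the transition matrix of this lifted chain on the 12 ordered pairs:

```python
import itertools
import numpy as np

S_prime = [1, 2, 3, 4]
states = [(i, j) for i, j in itertools.permutations(S_prime, 2)]  # 12 ordered pairs
index = {s: k for k, s in enumerate(states)}

P = np.zeros((12, 12))
for (i1, j1) in states:
    for j2 in S_prime:
        if j2 not in (i1, j1):               # the next step avoids backtracking
            P[index[(i1, j1)], index[(j1, j2)]] = 0.5

assert np.allclose(P.sum(axis=1), 1.0)       # P is stochastic
succ = P[index[(1, 2)]].nonzero()[0]
print([states[k] for k in succ])             # successors of (1, 2): (2, 3) and (2, 4)
```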