
EECS 70 Discrete Mathematics and Probability Theory Fall 2020
Finite Markov Chains
Note 21
These notes explain the theory of finite Markov chains. For CS70, we do not cover the proofs that are discussed in Appendix 2.
Introduction
Markov chains are models of random motion in a finite or countable set. These models are powerful because they capture a vast array of systems that we encounter in applications. Yet, the models are simple in that many of their properties can often be determined using elementary matrix algebra. In this course, we limit the discussion to the case of finite Markov chains, i.e., motions in a finite set.
Imagine the following scenario. You flip a fair coin until you get two consecutive ‘heads’. How many times do you have to flip the coin, on average? You roll a balanced six-sided die until the sum of the last two rolls is 8. How many times do you have to roll the die, on average?
As another example, say that you play a game of ‘heads or tails’ using a biased coin that yields ‘heads’ with probability 0.48. You start with $10. At each step, if the flip yields ‘heads’, you earn $1. Otherwise, you lose $1. What is the probability that you reach $100 before $0? How long does it take until you reach either $100 or $0?
You try to go up a ladder that has 20 rungs. At each time step, you succeed in going up by one rung with probability 0.9. Otherwise, you fall back to the ground. How many time steps does it take you to reach the top of the ladder, on average?
You look at a web page, then select randomly one of the links on that page, with equal probabilities. You then repeat on the next page you visit, and so on. As you keep browsing the web in that way, what fraction of the time do you open a given page? How long does it take until you reach a particular page? How likely is it that you visit a given page before another given page?
These questions can be answered using the methods of Markov chains, as we explain in these notes.
A First Example
Figure 1 illustrates a simple Markov chain. It describes a random motion in the set {0,1}. The position at time n = 0,1,2,… is Xn ∈ {0,1}. We call Xn the state of the Markov chain at step (or time) n.
Figure 1: A simple Markov chain

The set {0,1} is the state space, i.e., the set of possible values of the state. The motion, i.e., the time evolution, of Xn follows the following rules. One is given a number a ∈ [0,1] and two nonnegative numbers π0(0) and π0(1) that add up to 1. Then,
Pr[X0 = 0] = π0(0) and Pr[X0 = 1] = π0(1). (1)

That is, the initial state X0 is equal to 0 with probability π0(0), otherwise it is 1. Then for n ≥ 0,
Pr[Xn+1 = 0 | Xn = 0, Xn−1, …, X0] = 1 − a (2)
Pr[Xn+1 = 1 | Xn = 0, Xn−1, …, X0] = a (3)
Pr[Xn+1 = 0 | Xn = 1, Xn−1, …, X0] = a (4)
Pr[Xn+1 = 1 | Xn = 1, Xn−1, …, X0] = 1 − a. (5)
Figure 1 summarizes the rules (2)-(5). These rules specify the transition probabilities of the Markov chain. Rules (2)-(3) specify that if the Markov chain is in state 0 at step n, then at the next step it stays in state 0 with probability 1 − a and it moves to state 1 with probability a, independently of what happened in the previous steps. Thus, the Markov chain may have been in state 0 for a long time prior to step n, or it may have just moved into state 0, but the probability of staying in state 0 one more step is 1 − a in those different cases. Rules (4)-(5) are similar. Figure 1 is called the state transition diagram of the Markov chain. It captures the transition probabilities in a graphical form.
In a sense, the Markov chain is amnesic: at step n, it forgets what it did before getting to the current state and its future steps only depend on that current state. Here is one way to think of the rules of motion. When the Markov chain gets to state 0, it flips a coin. If the outcome is H, which occurs with probability a, then it goes to state 1; otherwise, it stays in state 0 one more step. The situation is similar when the Markov chain gets to state 1.
We define the transition probability matrix P by P(0,0) = 1 − a, P(0,1) = a, P(1,0) = a, P(1,1) = 1 − a. That is,

    P = [ 1−a    a  ]
        [  a    1−a ].

Hence,

    Pr[Xn+1 = j | Xn = i, Xn−1, …, X0] = P(i, j), for n ≥ 0 and i, j ∈ {0,1}.
Figure 2 shows some simulations of the Markov chain with different values of a. When a = 0.1, it is unlikely that the state of the Markov chain changes in one step. As the figure shows, the Markov chain spends many steps in one state before switching. For larger values of a, the state of the Markov chain changes more frequently. Note that, by symmetry, over the long term the Markov chain spends half of the time in each state.
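If you want to reproduce such simulations, a few lines of code suffice. Here is a minimal sketch in Python (the function name simulate_two_state is ours, not part of these notes); it also estimates the long-run fraction of time spent in state 1:

    import random

    def simulate_two_state(a, n_steps, x0=0):
        """Simulate the two-state chain of Figure 1 and return its trajectory."""
        x, path = x0, [x0]
        for _ in range(n_steps):
            if random.random() < a:   # from either state, switch with probability a
                x = 1 - x
            path.append(x)
        return path

    path = simulate_two_state(a=0.1, n_steps=10_000)
    print(sum(path) / len(path))      # fraction of time in state 1; close to 1/2 when 0 < a < 1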
A Second Example
Figure 3 shows the state transition diagram of a small web browsing experiment. Each state in the figure represents a web page. The arrows out of a state correspond to links on the page that point to other pages. The transition probabilities are not indicated on the figure, but the model is that each outgoing link is equally likely. The figure corresponds to the following probability transition matrix:
 0 1/2 0 1/2 0 
00100 P=1 0 0 0 0.
 1 / 3 1 / 3 0 0 1 / 3  0 1/2 1/2 0 0

Figure 2: Simulations of the two-state Markov chain
Figure 3: A five-state Markov chain. The outgoing arrows are equally likely.

Figure 4: Simulation of the five-state Markov chain.

Figure 4 shows a simulation of the five-state Markov chain.
Finite Markov Chains
One defines a general finite Markov chain as follows. The state space is X = {1,2,…,K} for some finite K. The transition probability matrix P is a K × K matrix such that
    P(i, j) ≥ 0, ∀i, j ∈ X

and

    ∑_{j=1}^{K} P(i, j) = 1, ∀i ∈ X.
The initial distribution is a vector π0 = {π0(i),i ∈ X } where π0(i) ≥ 0 for all i ∈ X and ∑i∈X π0(i) = 1. One then defines the random sequence {Xn,n = 0,1,2,…} by
    Pr[X0 = i] = π0(i), i ∈ X (6)
    Pr[Xn+1 = j | Xn = i, Xn−1, …, X0] = P(i, j), ∀n ≥ 0, ∀i, j ∈ X. (7)
Note that
Pr[X0 = i0,X1 = i1,…,Xn = in]
= Pr[X0 = i0]Pr[X1 = i1|X0 = i0]Pr[X2 = i2|X0 = i0,X1 = i1]···Pr[Xn = in|X0 = i0,…,Xn−1 = in−1] = π0(i0)P(i0,i1)···P(in−1,in).
Consequently,

    Pr[Xn = in] = ∑_{i0,…,in−1} Pr[X0 = i0, X1 = i1, …, Xn = in]
                = ∑_{i0,…,in−1} π0(i0)P(i0, i1)···P(in−1, in)
                = π0 P^n(in),

where the last expression is component in of the product of the row vector π0 times the n-th power of the matrix P.
Thus, if we designate by πn the distribution of Xn, so that Pr[Xn = i] = πn(i), then the last derivation proves the following result.
Theorem 21.1. One has

    πn = π0 P^n, n ≥ 0. (8)

In particular, if π0(i) = 1 for some i, then πn(j) = P^n(i, j) = Pr[Xn = j | X0 = i]. □

For the two-state Markov chain, one can verify that

    P^n = [ 1−a    a  ]^n   [ 1/2 + (1/2)(1−2a)^n    1/2 − (1/2)(1−2a)^n ]
          [  a    1−a ]   = [ 1/2 − (1/2)(1−2a)^n    1/2 + (1/2)(1−2a)^n ].   (9)

Note that if 0 < a < 1,

    P^n → [ 1/2   1/2 ]
          [ 1/2   1/2 ],  as n → ∞.

Consequently, for 0 < a < 1, Pr[Xn = 0] → 1/2 and Pr[Xn = 1] → 1/2 as n → ∞, for any initial distribution π0.

A distribution π = {π(i), i ∈ X} is said to be invariant if it satisfies the balance equations

    π = πP. (10)

Theorem 21.2. One has

    πn = π0, ∀n ≥ 0

if and only if π0 is invariant. □

Proof:
If πn = π0 for all n ≥ 0, then π0 = π1 = π0 P, so that π0 satisfies (10) and is thus invariant. Conversely, if π0 P = π0, then π1 = π0 P = π0. Also, if πn = π0, then πn+1 = πn P = π0 P = π0. □

For instance, in the case of the two-state Markov chain, the balance equations are

    π(0) = π(0)(1 − a) + π(1)a
    π(1) = π(0)a + π(1)(1 − a).

Together with the normalization π(0) + π(1) = 1, they give π = [1/2, 1/2], which agrees with the limit found in (9). For the five-state Markov chain of Figure 3, solving the balance equations (see the appendix for the calculations) gives

    π = [12/39, 9/39, 10/39, 6/39, 2/39]. (11)

A Markov chain is said to be irreducible if it can go from every state i to every other state j, possibly in multiple steps. For instance, the Markov chain in Figure 3 is irreducible.
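To see Theorem 21.1 and the limit in (9) numerically, one can compute π0 P^n with a few lines of code. Here is a minimal sketch in Python (numpy assumed available); it also checks that π = [1/2, 1/2] satisfies the balance equations (10):

    import numpy as np

    a = 0.2
    P = np.array([[1 - a, a],
                  [a, 1 - a]])
    pi0 = np.array([1.0, 0.0])        # start in state 0 with probability 1

    for n in (1, 5, 20, 100):
        pin = pi0 @ np.linalg.matrix_power(P, n)    # pi_n = pi_0 P^n, Theorem 21.1
        print(n, pin)                 # approaches [0.5, 0.5] since 0 < a < 1

    pi = np.array([0.5, 0.5])         # candidate invariant distribution
    assert np.allclose(pi @ P, pi)    # pi satisfies the balance equations pi = pi P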
Here is a remarkable result.
Theorem 21.3. Consider a finite irreducible Markov chain with state space X and transition probability matrix P. Then, for any initial distribution π0,¹

    lim_{n→∞} (1/n) ∑_{m=0}^{n−1} 1{Xm = i} = π(i), ∀i ∈ X. (15)

In (15), π = {π(i), i ∈ X} is an invariant distribution. Consequently, the invariant distribution exists and is unique. □
We sketch the proof of the result in the appendix. Here, we outline the main points of the argument. Consider the Markov chain in Figure 3. Assume that X0 = A and let T(A) be the first time after 0 that the Markov chain comes back to A. This random time has some mean value E[T(A)]. Let π(A) = 1/E[T(A)]. Thus, the time between two visits to A has mean value E[T(A)], so that the fraction of time that the Markov chain is in state A is π(A). Hence, over a long time n, the Markov chain is in state A for about nπ(A) steps. We define π(i) in the same way for the other states. These fractions of time in the different states must add up to one. Also, we claim that π satisfies the balance equations. To see this, note that over a large number n of steps, the Markov chain visits D and then A about nπ(D)P(D,A) times. Indeed, it visits D about nπ(D) times and each of these visits is followed by a visit to A with probability P(D,A). Similarly, it visits C and then A about nπ(C)P(C,A) times. Thus, a general Markov chain visits some state j and then state i about nπ(j)P(j,i) times in n steps, for j ∈ X. Now, the total number of visits to i in n steps is the total number of visits to some j followed by a visit to i. Hence, nπ(i) = ∑_j nπ(j)P(j,i), which shows that π solves the balance equations.
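Theorem 21.3 is easy to test by simulation. The following sketch in Python (numpy assumed available) runs the five-state chain of Figure 3 for many steps and compares the fraction of time spent in each state with the invariant distribution in (11):

    import numpy as np

    rng = np.random.default_rng(0)
    # Five-state chain of Figure 3, states ordered A, B, C, D, E.
    P = np.array([[0, 1/2, 0, 1/2, 0],
                  [0, 0, 1, 0, 0],
                  [1, 0, 0, 0, 0],
                  [1/3, 1/3, 0, 0, 1/3],
                  [0, 1/2, 1/2, 0, 0]])

    n_steps = 200_000
    counts = np.zeros(5)
    x = 0                              # start in state A
    for _ in range(n_steps):
        counts[x] += 1
        x = rng.choice(5, p=P[x])      # take one step of the chain

    print(counts / n_steps)                     # empirical fractions of time
    print(np.array([12, 9, 10, 6, 2]) / 39)     # invariant distribution from (11)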
Is it the case that Pr[Xn = i] converges to some value as n increases? A simple example shows that this does not have to be the case. Consider our two-state Markov chain and assume that a = 1. This Markov chain keeps switching state at every step. Thus, if X0 = 0, then X1 = 1, X2 = 0, X3 = 1, X4 = 0, and so on. For this Markov chain, Pr[Xn = 0] = 1 when n is even and Pr[Xn = 0] = 0 when n is odd. Hence, Pr[Xn = 0] keeps oscillating between 0 and 1 and does not converge.
¹The observant reader will have noticed that the LHS of (15) is a random variable, and the RHS of (15) is a constant, so (15) does not make sense, strictly speaking! In fact, when discussing random variables, there are numerous ways to formalize the notion of convergence. One of them is convergence in probability, which we have already encountered when we introduced the weak law of large numbers:

    lim_{n→∞} Pr[ | (1/n) ∑_{m=0}^{n−1} 1{Xm = i} − π(i) | > ε ] = 0, ∀ε > 0. (12)

Another form of convergence is convergence with probability 1:

    Pr[ lim_{n→∞} (1/n) ∑_{m=0}^{n−1} 1{Xm = i} = π(i) ] = 1. (13)

It turns out that (13) implies (12) and in this case, both types of convergence occur. Furthermore, since the LHS of (15) is bounded, we also have convergence of the expected value:

    lim_{n→∞} (1/n) ∑_{m=0}^{n−1} Pr[Xm = i] = π(i). (14)

Such a Markov chain is said to be periodic. However, if a ∈ (0, 1), then our calculation (9) showed that Pr[Xn = 0] → 1/2 as n → ∞.
The following theorem generalizes this example.
Theorem 21.4. Consider an irreducible Markov chain on X with transition probability matrix P. Define

    d(i) := g.c.d.{n > 0 | P^n(i, i) = Pr[Xn = i | X0 = i] > 0}, i ∈ X. (16)
(a) Then, d(i) has the same value for all i ∈ X . If that value is 1, the Markov chain is said to be aperiodic. Otherwise, it is said to be periodic with period d.
(b) If the Markov chain is aperiodic, then
    Pr[Xn = i] → π(i), ∀i ∈ X, as n → ∞. (17)
In (17), π is the unique invariant distribution.
□
To explain this theorem, we first need to clarify (16). For a given state i, the quantity d(i) is the greatest common divisor of all the integers n > 0 such that the Markov chain can go from state i to state i in n steps.
For instance, for the Markov chain in Figure 1, assume that a = 1. In that case, the Markov chain can go from state 0 to state 0 in n steps for all n in the set {2,4,6,8,…}. Thus, d(0) = g.c.d{2,4,6,…} = 2. Similarly, we find that d(1) = 2. The Markov chain is irreducible and Theorem 21.4 correctly predicted that d(0) = d(1). This Markov chain is periodic with period 2. If a ∈ (0,1), then the Markov chain can go from state 0 to state 0 in any n ≥ 1 steps. Thus, d(0) = g.c.d{1,2,3,…} = 1. Similarly d(1) = 1. The Markov chain is aperiodic and the theorem predicts that πn(i) → π(i) = 1/2, as we had verified explicitly.
As another example, consider the Markov chain in Figure 3. This Markov chain is irreducible. Is it aperiodic? Looking at the state transition diagram, we see that
d(A) = g.c.d{2,3,…} = 1.
Indeed, the Markov chain can go from state A to state A in two steps (A → D → A) and in three steps (A → B → C → A). Thus, the Markov chain is aperiodic. Just for fun, let us compute d(B). We find d(B) = g.c.d.{3,4,…} = 1. Thus, Theorem 21.4 implies that πn(i) → π(i). For instance, Pr[Xn = A] → 12/39.
This is a powerful result because computing πn directly is not that simple!
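The period d(i) in (16) can also be computed mechanically: list the return times n with P^n(i, i) > 0 up to some horizon and take their greatest common divisor. A small sketch in Python with numpy (the horizon of 30 steps is an arbitrary illustrative choice):

    import numpy as np
    from math import gcd
    from functools import reduce

    P = np.array([[0, 1/2, 0, 1/2, 0],
                  [0, 0, 1, 0, 0],
                  [1, 0, 0, 0, 0],
                  [1/3, 1/3, 0, 0, 1/3],
                  [0, 1/2, 1/2, 0, 0]])

    def period(P, i, horizon=30):
        # Return times n <= horizon with P^n(i, i) > 0, then their greatest common divisor.
        return_times = [n for n in range(1, horizon + 1)
                        if np.linalg.matrix_power(P, n)[i, i] > 0]
        return reduce(gcd, return_times)

    print([period(P, i) for i in range(5)])   # [1, 1, 1, 1, 1]: every state has period 1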
We give the proof of the theorem in the appendix. Here are the key points of the argument when the Markov chain is aperiodic. Consider the Markov chain of Figure 3. Note that S(E) := {n > 0 | P^n(E,E) > 0} = {4,5,6,…}. Thus, any n ≥ n(E) := 4 is such that n ∈ S(E). In general, one can show that if d(i) = 1, then there is some integer n(i) so that {n(i), n(i)+1, …} ⊂ S(i). Note also that the Markov chain can go from state C to E in some finite number a of steps (here, a = 3). Also, it can go from E to C in b steps (here, b = 1). Hence, it can go from C to C in a + n + b steps for any n ≥ n(E), by first going from C to E in a steps, then from E to E in n steps, then from E to C in b steps. Thus, S(C) contains two consecutive integers, so that its g.c.d. d(C) is equal to one. Similarly, d(j) = 1 for any state j if there is some state i with d(i) = 1. Also, this argument shows that there is some integer k so that the Markov chain can go from any state j to some specific state i in k steps (the same k for all j). The next idea is a coupling argument. Imagine two independent versions of the Markov chain. The claim is that they must meet after some finite time. Indeed, every k steps, they have a positive probability of meeting in state i. Now, say that one version Xn starts with the invariant distribution π and the other, Zn, starts with some arbitrary distribution π0. Define Yn so that it agrees with Zn until it meets with Xn and thereafter stays glued to Xn. The point is that Yn is a Markov chain with transition matrix P and initial distribution π0. But, since it meets with Xn after a finite time, we see that Pr[Yn = i] → Pr[Xn = i] = π(i).
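The coupling idea in this argument can itself be illustrated numerically: run two independent copies of the chain of Figure 3 from different starting states and record how long they take to meet. A minimal sketch in Python (numpy assumed available; the function name meeting_time is ours):

    import numpy as np

    rng = np.random.default_rng(1)
    P = np.array([[0, 1/2, 0, 1/2, 0],
                  [0, 0, 1, 0, 0],
                  [1, 0, 0, 0, 0],
                  [1/3, 1/3, 0, 0, 1/3],
                  [0, 1/2, 1/2, 0, 0]])

    def meeting_time(x, z):
        """Steps until two independent copies, started in states x and z, first coincide."""
        t = 0
        while x != z:
            x = rng.choice(5, p=P[x])
            z = rng.choice(5, p=P[z])
            t += 1
        return t

    print(np.mean([meeting_time(0, 4) for _ in range(10_000)]))   # a small, finite average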

Hitting Time
Consider the Markov chain in Figure 3. Assume it starts in state A. What is the average number of steps until it reaches state E? To calculate that average time, for i ∈ {A,B,C,D,E}, define β(i) to be the average time until the Markov chain reaches state E given that it starts from state i.
Thus, β(E) = 0 since it takes 0 steps to reach E when starting in state E. We want to calculate β(A). However, it turns out that to calculate β(A), one also has to calculate β(B), …, β(D). We do this by finding equations that these quantities satisfy. We then solve these equations.
We claim that
    β(A) = 1 + (1/2)β(B) + (1/2)β(D). (18)
To see this, note that when the Markov chain starts in state A, it stays there for one step. Then, with probability 1/2 it moves to state B. In that case, the average time until it reaches E is β(B). With probability 1/2, the Markov chain moves to state D and then takes β(D) steps, on average, to reach E. Thus, the time to reach E starting from state A is 1 step plus an average of β(B) steps with probability 1/2 and an average of β(D) steps with probability 1/2. Equation (18) captures that decomposition of the time to reach E starting from A.
An identity similar to (18) can be written for every starting state. We find
    β(A) = 1 + (1/2)β(B) + (1/2)β(D)
    β(B) = 1 + β(C)
    β(C) = 1 + β(A)
    β(D) = 1 + (1/3)β(A) + (1/3)β(B) + (1/3)β(E)
    β(E) = 0.
These equations are called the first step equations (FSE). Solving these equations, we find (see the appendix for the calculations)
    β(A) = 17, β(B) = 19, β(C) = 18, β(D) = 13, β(E) = 0. (19)
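Because the first step equations are linear, they can also be solved mechanically. The sketch below (Python with numpy, assumed available) sets up the system for β(A), …, β(D), with E as the target state, and recovers the values in (19):

    import numpy as np

    # Non-target states A, B, C, D; the target E has beta(E) = 0, so its column is dropped.
    Q = np.array([[0, 1/2, 0, 1/2],      # transitions out of A
                  [0, 0, 1, 0],          # out of B
                  [1, 0, 0, 0],          # out of C
                  [1/3, 1/3, 0, 0]])     # out of D (the 1/3 toward E disappears)

    # First step equations: beta = 1 + Q beta, i.e., (I - Q) beta = 1.
    beta = np.linalg.solve(np.eye(4) - Q, np.ones(4))
    print(beta)    # approximately [17, 19, 18, 13], matching (19)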
Let us now consider a general finite Markov chain with transition probability matrix P on the state space X. Let A ⊂ X be a set of states. For each i ∈ X, let β(i) be the average number of steps until the Markov chain enters one of the states in A, given that it starts in state i.
Then one has
    β(i) = 0, if i ∈ A
    β(i) = 1 + ∑_{j∈X} P(i, j)β(j), if i ∉ A.
These equations are called the first step equations (FSE) for the average hitting time.
As another example, consider the Markov chain in Figure 1. Let β(i) be the average number of steps until the Markov chain enters state 1. The first step equations are
    β(0) = 1 + (1 − a)β(0) + aβ(1)
    β(1) = 0.
Solving, we find β(0) = 1/a. Note that the time to enter state 1 starting from state 0 is the number of times one has to flip a loaded coin with Pr[H] = a until the first heads. This number of steps has a geometric distribution with parameter a. Thus, we have rediscovered the fact that the mean value of a G(a) random variable is 1/a.

Figure 5: Flipping a fair coin until two heads in a row.
Say that you flip a fair coin repeatedly until you get two heads in a row. How many times do you have to flip the coin, on average? Figure 5 shows a state transition diagram that corresponds to that situation. The Markov chain starts in state S. The state is H or T if the last coin flip was H or T, except that the state is E if the last two flips were heads. For i ∈ {S,T,H,E}, let β(i) be the average number of steps until the Markov chain enters state E. The first step equations are
    β(S) = 1 + (1/2)β(T) + (1/2)β(H)
    β(T) = 1 + (1/2)β(T) + (1/2)β(H)
    β(H) = 1 + (1/2)β(T) + (1/2)β(E)
    β(E) = 0.

Solving, we find

    β(S) = 6. (20)

(See the appendix for the calculations.)
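The answer β(S) = 6 can also be checked by direct simulation, without writing down the chain explicitly. A minimal sketch in Python using only the standard library:

    import random

    def flips_until_two_heads():
        """Flip a fair coin until two consecutive heads; return the number of flips."""
        flips, prev_was_heads = 0, False
        while True:
            flips += 1
            heads = random.random() < 0.5
            if heads and prev_was_heads:
                return flips
            prev_was_heads = heads

    trials = 100_000
    print(sum(flips_until_two_heads() for _ in range(trials)) / trials)   # about 6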
Now assume you roll a balanced six-sided die until the sum of the last two rolls is 8. The Markov chain that corresponds to this situation has a start state S, a state i for i ∈ {1,2,…,6} that indicates the value of the last roll, and an end state E that the Markov chain enters when the sum of the last two rolls is 8. Thus, if the state of the Markov chain is 5 and if the next roll is 2, then the new state is 2. However, if the next roll is 3, then the Markov chain enters state E. The first step equations for the average time β(i) it takes the Markov chain to enter state E are as follows:
    β(S) = 1 + ∑_{i=1}^{6} (1/6)β(i)
    β(i) = 1 + ∑_{j : i+j ≠ 8} (1/6)β(j), for i ∈ {1, …, 6}.

Solving, we find

    β(S) = 8.4. (21)

(See the appendix for the calculations.)
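Here too, a direct simulation confirms the answer in (21). A minimal sketch in Python using only the standard library:

    import random

    def rolls_until_sum_is_8():
        """Roll a fair die until the last two rolls sum to 8; return the number of rolls."""
        rolls, prev = 0, None
        while True:
            roll = random.randint(1, 6)
            rolls += 1
            if prev is not None and prev + roll == 8:
                return rolls
            prev = roll

    trials = 100_000
    print(sum(rolls_until_sum_is_8() for _ in range(trials)) / trials)   # about 8.4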
Consider now the 20-rung ladder. A man starts on the ground. At each step, he moves up one rung with probability p and falls back to the ground otherwise. Let β(i) be the average number of steps needed to reach the top rung, starting from rung i ∈ {0,1,…,20} where rung 0 refers to the ground. The first step equations are
    β(i) = 1 + (1 − p)β(0) + pβ(i+1), i = 0, …, 19
    β(20) = 0.

Solving, we find
    β(0) = (p^{−20} − 1)/(1 − p). (22)
(See the appendix for the calculations.) For instance, if p = 0.9, then β(0) ≈ 72. Also, if p = 0.8, then β(0) ≈ 429. The moral of the story is that you have to be careful on a ladder.
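Formula (22) is easy to evaluate, and it can be cross-checked by back-substituting the first step equations from the top rung down. A sketch in Python (the helper name ladder_time is ours):

    def ladder_time(p, rungs=20):
        """Average number of steps to reach the top rung, computed two ways."""
        formula = (p ** (-rungs) - 1) / (1 - p)          # closed form (22)

        # Backward substitution of the FSE beta(i) = 1 + (1-p) beta(0) + p beta(i+1):
        # each beta(i) is affine in beta(0), i.e., beta(i) = u + v * beta(0).
        u, v = 0.0, 0.0                                  # beta(rungs) = 0
        for _ in range(rungs):
            u, v = 1 + p * u, (1 - p) + p * v
        from_fse = u / (1 - v)                           # solve beta(0) = u + v * beta(0)
        return formula, from_fse

    print(ladder_time(0.9))   # both values are about 72
    print(ladder_time(0.8))   # both values are about 429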
Assume we play a game of heads-or-tails with a coin such that Pr[H] = p. For every heads, your fortune increases by 1 and for every tails, it decreases by 1. The initial fortune is m. Let β(n) be the average time until the fortune reaches the value 0 or the value M (where M > m), given that the current fortune is n. The first step equations are
    β(n) = 1 + (1 − p)β(n − 1) + pβ(n + 1), for n = 1, …, M − 1
    β(0) = β(M) = 0.

Solving, we find

    β(n) = n(1 − 2p)^{−1} − [M(1 − 2p)^{−1}/(1 − ρ^M)](1 − ρ^n), (23)

where ρ := (1 − p)p^{−1}.
(See the appendix for the calculations.)
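Formula (23) can be checked against a direct linear solve of the first step equations. A sketch in Python with numpy (assumed available), for p = 0.48 and M = 100:

    import numpy as np

    p, M = 0.48, 100
    rho = (1 - p) / p

    def beta_formula(n):
        # Closed form (23).
        return n / (1 - 2 * p) - (M / (1 - 2 * p)) * (1 - rho ** n) / (1 - rho ** M)

    # Direct solve of the first step equations for beta(1), ..., beta(M-1).
    A = np.eye(M - 1)
    b = np.ones(M - 1)
    for n in range(1, M):
        if n - 1 >= 1:
            A[n - 1, n - 2] -= 1 - p      # coefficient of beta(n-1)
        if n + 1 <= M - 1:
            A[n - 1, n] -= p              # coefficient of beta(n+1)
    beta = np.linalg.solve(A, b)

    print(beta_formula(10), beta[9])      # both are about 249 steps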
Probability of A before B
Let Xn be a finite Markov chain with state space X and transition probability matrix P. Let also A and B be two disjoint subsets of X . We want to determine the probability α(i) that, starting in state i, the Markov chain enters one of the states in A before one of the states in B.
The first step equations for α(i) are

    α(i) = ∑_j P(i, j)α(j), ∀i ∉ A ∪ B
    α(i) = 1, ∀i ∈ A
    α(i) = 0, ∀i ∈ B.
To see why the first set of equations holds, we observe that the event that the Markov chain enters A before B starting from i is partitioned into the events that it does so by first moving to state j, for all possible values of j. Now, the probability that it enters A before B starting from i after moving first to j is the probability that it enters A before B starting from j, because the Markov chain is amnesic. The second and third sets of equations are obvious.
As an illustration, consider again the game of heads and tails and let α(n) be the probability that your fortune reaches M before 0 when starting from n with 0 ≤ n ≤ M. The first step equations are
    α(n) = (1 − p)α(n − 1) + pα(n + 1), 0 < n < M
    α(M) = 1
    α(0) = 0.

Solving these equations, we find

    α(n) = (1 − ρ^n)/(1 − ρ^M), (24)

where ρ := (1 − p)p^{−1}. (See the appendix for the calculations.) For instance, with p = 0.48 and M = 100, we find that α(10) ≈ 4 × 10^{−4}, which is sobering when contemplating a trip to Las Vegas. Note that for each gambler who plays this game, the Casino makes $10.00 with probability 1 − 4 × 10^{−4} and loses $90.00 with probability 4 × 10^{−4}, so that the expected gain of the Casino per gambler is approximately (1 − 4 × 10^{−4}) × $10.00 − 4 × 10^{−4} × $90.00 ≈ $9.96. Observe that the probability of winning in one step is 48%, so that if the gambler did bet everything on a single game and stopped after one step, the Casino would only make 0.52 × $10.00 − 0.48 × $10.00 = $0.40 on average per gambler, instead of $9.96. The moral of the story is: don't push your luck!

Appendix 1: Calculations

This section presents the details of the calculations of this note.

Identity (9)

By symmetry, we can write

    P^n = [ 1−αn    αn  ]
          [  αn    1−αn ]

for some αn that we determine below. Note that α1 = a. Also,

    P^{n+1} = [ 1−αn+1    αn+1  ]           [ 1−a    a  ] [ 1−αn    αn  ]
              [  αn+1    1−αn+1 ]  = P P^n = [  a    1−a ] [  αn    1−αn ].

Consequently, by looking at component (0, 1) of this product,

    αn+1 = (1 − a)αn + a(1 − αn) = a + (1 − 2a)αn.

Let us try a solution of the form αn = b + cλ^n. We need

    αn+1 = b + cλ^{n+1} = a + (1 − 2a)αn = a + (1 − 2a)[b + cλ^n] = a + (1 − 2a)b + (1 − 2a)cλ^n.

Matching the terms, we see that this identity holds if b = a + (1 − 2a)b and λ = 1 − 2a. The first equation gives b = 1/2. Hence, αn = 1/2 + c(1 − 2a)^n. To find c, we use the fact that α1 = a, so that 1/2 + c(1 − 2a) = a, which yields c = −1/2. Hence, αn = 1/2 − (1/2)(1 − 2a)^n.

Identity (11)

The balance equations are π = πP. We know that the equations do not determine π uniquely. Let us choose arbitrarily π(A) = 1. We then solve for the other components of π and we renormalize later. We can ignore any equation we choose. Let us ignore the first one. The new equations are

    [π(B), π(C), π(D), π(E)] = [1, π(B), π(C), π(D), π(E)] [ 1/2   0   1/2   0  ]
                                                           [  0    1    0    0  ]
                                                           [  0    0    0    0  ]
                                                           [ 1/3   0    0   1/3 ]
                                                           [ 1/2  1/2   0    0  ].

Equivalently,

    [π(B), π(C), π(D), π(E)] = [1/2, 0, 1/2, 0] + [π(B), π(C), π(D), π(E)] [  0    1    0    0  ]
                                                                           [  0    0    0    0  ]
                                                                           [ 1/3   0    0   1/3 ]
                                                                           [ 1/2  1/2   0    0  ].

By inspection, we see that π(D) = 1/2, then π(E) = (1/3)π(D) = 1/6, then π(B) = 1/2 + (1/3)π(D) + (1/2)π(E) = 1/2 + 1/6 + 1/12 = 3/4. Finally, π(C) = π(B) + (1/2)π(E) = 3/4 + 1/12 = 5/6. The components π(A) + ··· + π(E) add up to 1 + 3/4 + 5/6 + 1/2 + 1/6 = 39/12. To normalize, we multiply each component by 12/39 and we get π = [12/39, 9/39, 10/39, 6/39, 2/39].

We could have proceeded differently and observed that our identity implies that

    [π(B), π(C), π(D), π(E)] [  1    −1    0    0  ]  =  [1/2, 0, 1/2, 0].
                             [  0     1    0    0  ]
                             [ −1/3   0    1  −1/3 ]
                             [ −1/2  −1/2  0    1  ]

Hence,

    [π(B), π(C), π(D), π(E)] = [1/2, 0, 1/2, 0] [  1    −1    0    0  ]^{−1}
                                                [  0     1    0    0  ]
                                                [ −1/3   0    1  −1/3 ]
                                                [ −1/2  −1/2  0    1  ].

This procedure is a systematic way to solve the balance equations by computer.

Identity (19)

Using the third equation in the second, we find β(B) = 2 + β(A). The fourth equation then gives β(D) = 1 + (1/3)β(A) + (1/3)(2 + β(A)) = 5/3 + (2/3)β(A). The first equation then gives β(A) = 1 + (1/2)(2 + β(A)) + (1/2)[5/3 + (2/3)β(A)] = 17/6 + (5/6)β(A). Hence, (1/6)β(A) = 17/6, so that β(A) = 17. Consequently, β(B) = 19 and β(D) = 5/3 + 34/3 = 13. Finally, β(C) = 18.

Identity (20)

The last two equations give β(H) = 1 + (1/2)β(T). If we substitute this expression in the second equation, we get β(T) = 1 + (1/2)β(T) + (1/2)[1 + (1/2)β(T)], or β(T) = 3/2 + (3/4)β(T). Hence, β(T) = 6. Consequently, β(H) = 1 + (1/2)6 = 4. Finally, β(S) = 1 + (1/2)6 + (1/2)4 = 6.
Identity (21)

Let us write the equations explicitly:

    β(S) = 1 + (1/6)[β(1) + β(2) + β(3) + β(4) + β(5) + β(6)]
    β(1) = 1 + (1/6)[β(1) + β(2) + β(3) + β(4) + β(5) + β(6)]
    β(2) = 1 + (1/6)[β(1) + β(2) + β(3) + β(4) + β(5)]
    β(3) = 1 + (1/6)[β(1) + β(2) + β(3) + β(4) + β(6)]
    β(4) = 1 + (1/6)[β(1) + β(2) + β(3) + β(5) + β(6)]
    β(5) = 1 + (1/6)[β(1) + β(2) + β(4) + β(5) + β(6)]
    β(6) = 1 + (1/6)[β(1) + β(3) + β(4) + β(5) + β(6)].

These equations are symmetric in β(2), …, β(6). Let then γ = β(2) = ··· = β(6). Also, β(1) = β(S). Thus, we can write these equations as

    β(S) = 1 + (1/6)[β(S) + 5γ] = 1 + (1/6)β(S) + (5/6)γ
    γ = 1 + (1/6)[β(S) + 4γ] = 1 + (1/6)β(S) + (2/3)γ.

The first equation gives

    β(S) = 6/5 + γ.

The second yields

    γ = 3 + (1/2)β(S).

Substituting the next to last identity into the last one gives

    γ = 3 + (1/2)[6/5 + γ] = 18/5 + (1/2)γ.

Hence,

    γ = 36/5

and, consequently,

    β(S) = 6/5 + 36/5 = 42/5 = 8.4.

Identity (22)

Let us look for a solution of the form β(i) = a + bλ^i. Then

    a + bλ^i = 1 + (1 − p)(a + b) + p[a + bλ^{i+1}] = 1 + (1 − p)(a + b) + pa + bpλ^{i+1}.

This identity holds if

    a = 1 + (1 − p)(a + b) + pa and λ = p^{−1},

i.e.,

    b = −(1 − p)^{−1} and λ = p^{−1}.

Then,

    β(i) = a − (1 − p)^{−1} p^{−i}.

Since β(20) = 0, we need

    0 = a − (1 − p)^{−1} p^{−20},

so that a = (1 − p)^{−1} p^{−20} and

    β(i) = (1 − p)^{−1} p^{−20} − (1 − p)^{−1} p^{−i} = (p^{−20} − p^{−i})/(1 − p).

Identity (23)

Let us make a little detour into the solution of such difference equations. Assume you have a function g(n) such that

    g(n) = 1 + (1 − p)g(n − 1) + pg(n + 1)

and two functions β(n) = h(n) and β(n) = k(n) such that

    β(n) = (1 − p)β(n − 1) + pβ(n + 1).

Then, for any two constants a and b, we note that β(n) := g(n) + a·h(n) + b·k(n) satisfies

    β(n) = 1 + (1 − p)β(n − 1) + pβ(n + 1).

We can then choose the two constants a and b to make sure that 0 = β(0) = β(M).

To find g(n), we try g(n) = αn. We need

    αn = 1 + (1 − p)α(n − 1) + pα(n + 1) = 1 + αn − (1 − p)α + pα = αn + 1 − α + 2pα.

Thus, we need 1 − α + 2pα = 0, i.e., α = (1 − 2p)^{−1}.

To find solutions to β(n) = (1 − p)β(n − 1) + pβ(n + 1), we try β(n) = λ^n. Then,

    λ^n = (1 − p)λ^{n−1} + pλ^{n+1}.

With n = 1, this gives

    λ = (1 − p) + pλ^2.

Hence,

    pλ^2 − λ + (1 − p) = 0.

The solutions of this quadratic equation are

    λ = [1 ± √(1 − 4p(1 − p))]/(2p) = [1 ± (1 − 2p)]/(2p) = 1 and ρ := (1 − p)p^{−1}.

Thus, we find two such solutions that correspond to these two values of λ: h(n) := 1 and k(n) := ρ^n. We now choose the two parameters a and b so that β(n) = g(n) + ah(n) + bk(n) satisfies the two conditions β(0) = β(M) = 0. This should give us two equations in the two unknowns a and b. These equations are

    0 = β(0) = g(0) + ah(0) + bk(0) = 0 + a × 1 + b × 1 = a + b
    0 = β(M) = g(M) + ah(M) + bk(M) = M(1 − 2p)^{−1} + a × 1 + b × ρ^M.

The first equation gives b = −a, so that the second implies

    0 = M(1 − 2p)^{−1} + a(1 − ρ^M).

Hence,

    a = −M(1 − 2p)^{−1}/(1 − ρ^M).

Finally,

    β(n) = n(1 − 2p)^{−1} − [M(1 − 2p)^{−1}/(1 − ρ^M)](1 − ρ^n).

Identity (24)

In the calculation of (23), we found two solutions of the equation α(n) = (1 − p)α(n − 1) + pα(n + 1): α(n) = 1 and α(n) = ρ^n with ρ = (1 − p)p^{−1}. Hence, for any two constants a and b, a solution is α(n) = a + bρ^n. We now choose a and b so that α(0) = 0 and α(M) = 1. That is,

    0 = a + b and 1 = a + bρ^M.

Thus, b = −a and

    1 = a(1 − ρ^M), i.e., a = (1 − ρ^M)^{−1}.

Hence,

    α(n) = a + bρ^n = a(1 − ρ^n) = (1 − ρ^n)/(1 − ρ^M).

Appendix 2: Some Proofs

Proof Sketch of Theorem 21.3:

A formal proof of Theorem 21.3 is a bit complicated. However, we can sketch the argument to justify the result. First let us explain why (15) implies that the invariant distribution is unique. Assume that ν = {ν(i), i ∈ X} is an invariant distribution and choose π0 = ν. Then Pr[Xn = i] = ν(i) for all n ≥ 0. Call Yn the fraction in (15).
Note that E[Yn] = ν(i) for all n, because E[1{Xm = i}] = Pr[Xm = i] = ν(i). Now, (15) says that Yn → π(i). We claim that this implies that E[Yn] → π(i), so that ν(i) → π(i), which implies that ν(i) = π(i). To prove the claim, we use the fact that Yn → π(i) implies that, for any ε > 0, there is some n large enough so that Pr[|Ym − π(i)| ≤ ε] ≥ 1 − ε for all m ≥ n. But then, because Ym ∈ [0,1], we see that E[|Ym − π(i)|] ≤ ε(1 − ε) + ε, so that |E[Ym] − π(i)| ≤ E[|Ym − π(i)|] ≤ 2ε. This shows that E[Ym] → π(i).
The second step is to note that all the states must be recurrent, which means that the Markov chain visits them infinitely often. Indeed, at least one state, say state i, must be recurrent since there are only finitely many states. Consider any other state j. Every time that the Markov chain visits i, it has a positive probability p of visiting j before coming back to i. Otherwise, the Markov chain would never visit j when starting from i, which would contradict its irreducibility. Since the Markov chain visits i infinitely often, it also must visit j infinitely often, in the same way that if you flip a coin with Pr[H] = p > 0 forever, you must see an infinite number of Hs.
The third step is to observe that the times T(i,1), T(i,2), … between successive visits to one state i are independent and identically distributed, because the motion of the Markov chain starts afresh whenever it enters state i. By the law of large numbers, (T(i,1) + ··· + T(i,n))/n → E[T(i,1)] as n → ∞. Hence, n/(T(i,1) + ··· + T(i,n)) → π(i) := 1/E[T(i,1)]. That is, the rate of visits of state i is π(i).
The fourth step is to show that π(i) > 0 for all i. Indeed, π(i) > 0 for at least one state i, otherwise the rate of visits of all the states would be zero, which is not possible since these rates of visit add up to one. Also, if the Markov chain visits state i with rate π(i) > 0, then it visits state j with at least rate π(i)p > 0 because it visits j with probability p between two visits to i. Hence, π(j) > 0 for all j.
The fifth step is to show that π(i) satisfies the balance equations. We saw that, during a large number n of steps, the Markov chain visits state j approximately nπ(j) times. Consider then a given state i. Since
that state is visited with probability P(j,i) after each visit to state j, state i should be visited approximately nπ(j)P(j,i) times immediately after the nπ(j) visits to state j. If we sum over all the states j, we see that state i should be visited approximately n ∑_j π(j)P(j,i) times over n steps. But we know that this number of visits is approximately nπ(i). Hence, it must be that n ∑_j π(j)P(j,i) ≈ nπ(i), i.e., that π(i) = ∑_j π(j)P(j,i). These are the balance equations.
Proof of Theorem 21.4:
We give the proof in the aperiodic case, i.e., when there is some state i such that d(i) = 1.
Define S(i) = {n > 0 | P^n(i,i) > 0}. We first show that there is some integer n(i) so that n ∈ S(i) if n ≥ n(i). Note that if g.c.d.(S(i)) = 1, then there must be a, b ∈ S(i) with g.c.d.{a, b} = 1. Using Euclid's extended g.c.d. algorithm, we find integers m and n so that ma + nb = g.c.d.{a, b} = 1. Let m⁺ = max{0, m}, n⁺ = max{0, n}, m⁻ = m⁺ − m, n⁻ = n⁺ − n. Then (m⁺ − m⁻)a + (n⁺ − n⁻)b = 1 and we note that k := m⁻a + n⁻b and k + 1 = m⁺a + n⁺b are both in S(i). Now, if n ≥ k², then one can write n = qk + r for some r ∈ {0, 1, …, k − 1} and some q > k − 1. But then

    n = qk + r = (q − r)k + r(k + 1) ∈ S(i),

since both k and k + 1 are in S(i). Thus, any n ≥ n(i) = k² is such that n ∈ S(i).
Next we show that d(j) = 1 for every state j. Since it is irreducible, the Markov chain can go from j to i in some a steps and from i to j in some b steps. But the Markov chain can go from i to i in n(i) or in n(i) + 1 steps. Consequently, the Markov chain can go from j to j in a + n(i) + b steps and also in a + n(i) + 1 + b steps, by going from j to i in a steps, then from i to i in n(i) or n(i) + 1 steps, then from i to j in b steps. Hence, {n > 0 | P^n(j, j) > 0} contains two consecutive integers a + n(i) + b and a + n(i) + 1 + b, so that its g.c.d. must be equal to one. Thus, d(j) = 1 for every state j.
Let us now fix a state i arbitrarily. The claim is that there is some integer k such that the Markov chain can go from any state j to state i in k steps. To see this, using the irreducibility of the Markov chain, we know that for every j there is some integer n(j, i) so that the Markov chain can go from j to i in n(j, i) steps. But then, the Markov chain can go from j to i in n + n(j, i) steps for any n ≥ n(j). Indeed, the Markov chain can go first from j to j in n steps, then from j to i in n(j, i) steps. Thus, the Markov chain can go from j to i in n steps, for any n ≥ n(j) + n(j, i). We then let k = max_j {n(j) + n(j, i)}.
Next, consider two independent copies Xn and Zn of the Markov chain with transition matrix P. Markov chain Xn starts with the invariant distribution π. Markov chain Zn starts with an arbitrary initial distribution π0. Define state i and the integer k as in the previous paragraph. There is some positive probability p that the two Markov chains both are in state i after k steps. If not, there is again a probability p that they are both in state i after k more steps, and so on. Thus, if we designate by τ the first time that the two Markov chains meet, i.e., τ = min{n ≥ 0 | Xn = Zn}, we see that Pr[τ > km] ≤ (1 − p)^m for m = 1, 2, …. Now, define the Markov chain Yn so that Yn = Zn for n < τ and Yn = Xn for n ≥ τ. In words, the Markov chain starts like Zn, but it sticks to Xn once Xn = Zn. This Markov chain Yn still has transition matrix P and its initial distribution is π0. Note that Pr[Xn ≠ Yn] = Pr[τ > n] → 0 as n → ∞. Hence,
    |Pr[Xn = i] − Pr[Yn = i]| ≤ Pr[Xn ≠ Yn] → 0, as n → ∞.
But Pr[Xn = i] = π(i) for all n since Xn starts with the invariant distribution π. We conclude that Pr[Yn = i] → π(i) as n → ∞.