代写 algorithm math statistic Observations on the Metropolis-Hastings Algorithm

Observations on the Metropolis-Hastings Algorithm
Myron Hlynka & Michelle Cylwa Department of Mathematics & Statistics University of Windsor Windsor, Ontario, Canada, N9B 3P4
Abstract: We present some properties of the Metropolis-Hastings algorithm for con- structing a Markov chain with a given limiting probability distribution. In particular, we consider what happens if we apply the Metropolis-Hastings algorithm repeatedly to a “proposal” distribution which has already been updated.

1 Introduction
In MCMC (Markov Chain Monte Carlo) studies, there is extensive use of the Metropolis- Hastings algorithm. See Ross (2007) or Evans and Rosenthal (2004) or Ibe (2009). The algorithm is usually applied to infinite state Markov chains but also works for finite state chains. Most of the results in this paper are derived for the finite state case but also work in the infinite state case.
Suppose the states of a finite state Markov chain are labeled {1, 2, . . . , n}. Assume that the limiting vector π = (π1,…,πn) is known. (We assume throughout that all πi ̸= 0.) The Metropolis- Hastings algorithm finds a transition matrix with the given limiting vector. The algorithm has two parts. First, we select a “proposal” distribution for moving between states. Second, there is an “acceptance” distribution that can be used with the proposal distribution.
The proposal distribution from a state i consists of values of the ith row of a proba- bility transition matrix Q = [qij]. We refer to Q as the proposal transition matrix. The Metropolis-Hastings algorithm then defines a set of acceptance values αij and combines the qij and αij values to get the final transition probabilities pij .
Begin with the given π
transition matrix of an irreducible Markov chain.
. The user chooses any set of qij values such that Q is a Next, the acceptance distribution αij is defined as follows.
Ifi̸=j andqij ̸=0,define Ifi=jori̸=jandqij =0,defineαij =0.
αij =min􏰖1,πjqji􏰗. (1) πiqij
Next, we define the probability transition matrix P .
For i ̸= j, define pij = qijαij.
Fori=j,definepii sothatthesumofeachrowis1.
Then P = [pij] is a transition matrix with limiting vector π.
Since the proposal distribution qij can be almost anything, there are a huge number of possible Markov chains (and Markov transition matrices) that can result. The most
2

important new results in this paper are Properties 3.1, 3.4, 3.6, 3.7.
2 Initial Example
We begin by applying the Metropolis-Hastings algorithm to the sequence {1, 1, 2, 3, 5} toseewhathappens. Thenormalizedvectorisπ=(1,1,2,3,5). Thestatesare
labeled {1, 2, 3, 4, 5}. The “proposal” distribution uses a symmetric cyclic random walk so we expect to get a matrix of birth-death type (tridiagonal form with possible entries in the upper right and lower left corners). Choose q12 = q15 = .5, q23 = q21 = .5, q34 =q32 =.5,q43 =q45 =.5,q54 =q51 =.5. Allotherqij =0.
The α values are determined as follows. For i = 1, we have q15 = .5 and q12 = .5. Thus
α12 = min􏰖1, π2q21􏰗 = min􏰖1, (1/12)(.5)􏰗 = 1. π1 q12 (1/12)(.5)
α15 = min􏰖1, π5q51􏰗 = min􏰖1, (5/12)(.5)􏰗 = 1. π1 q15 (1/12)(.5)
Thusp12 =q12α12 =.5(1)=.5andp15 =q15α15 =.5(1)=.5
Nextp11 =1−p12 −p15 =1−.5−.5=0. Also0=p11 =p13 =p14. Sorow1ofPis [0, .5, 0, 0, .5].
Fori=2,wehaveq21 =.5andq23 =.5. Thus
α23 = min􏰖1, π3q32􏰗 = min􏰖1, (2/12)(.5)􏰗 = 1.
π2 q23 (1/12)(.5)
α21 = min􏰖1, π1q21􏰗 = min􏰖1, (1/12)(.5)􏰗 = 1.
π2 q21 (1/12)(.5) Thusp23 =q23α23 =.5(1)=.5andp21 =q21α21 =.5(1)=.5
Next p22 = 1 − p21 − p23 = 1 − .5 − .5 = 0. So row 2 of P is [.5, 0, .5, 0, 0]. Fori=3,wehaveq32 =.5andq34 =.5. Thus
α34 = min􏰖1, π4q43􏰗 = min􏰖1, (3/12)(.5)􏰗 = 1. π3 q34 (2/12)(.5)
α32 = min􏰖1, π2q32􏰗 = min􏰖1, (1/12)(.5)􏰗 = .5. π3 q32 (2/12)(.5)
Thus p34 = q34α34 = .5(1) = .5 and p32 = q32α32 = .5(.5) = .25
Next p33 = 1 − p32 − p34 = 1 − .25 − .5 = .25. So row 3 of P is [0, .25, .25, .5, 0].
12 12 12 12 12
3

Fori=4,wehaveq43 =.5andq45 =.5. Thus
α45 = min􏰖1, π5q43􏰗 = min􏰖1, (5/12)(.5)􏰗 = 1.
π4 q45 (3/12)(.5)
α43 = min 􏰖1, π3q43 􏰗 = min 􏰖1, (2/12)(.5) 􏰗 = 2/3.
π4q43 (3/12)(.5) Thus p45 = q45α45 = .5(1) = .5 and p43 = q43α43 = .5(2/3) = 1/3
Next p44 = 1 − p43 − p45 = 1 − 1/3 − .5 = 1/6. So row 4 of P is [0, 0, 1/3, 1/6, .5]. Fori=5,wehaveq54 =.5andq51 =.5. Thus
α54 = min􏰖1, π4q45􏰗 = min􏰖1, (3/12)(.5)􏰗 = .6. π5 q54 (5/12)(.5)
α51 = min 􏰖1, π1q15 􏰗 = min 􏰖1, (1/12)(.5) 􏰗 = 1/5. π5q51 (5/12)(.5)
Thus p51 = q51α51 = .5(1/5) = .1 and p54 = q54α54 = (.5)(.6) = .3
Next p55 = 1 − p51 − p54 = 1 − .1 − .3 = .6. So row 5 of P is [.1, 0, 0, .3, .6].
01001 22
1 0 1 0 0 22
Finally, the complete transition matrix is P =  0 1 1 2 0  . 444
002 1 1 662
10036 10 10 10
3 Observations
We are interested in knowing what happens if we apply the Metropolis-Hastings algo- rithm repeatedly.
Property 3.1. Suppose the limiting probability vector is π and the initial proposal probability transition matrix is Q0, resulting in the probability transition matrix P0. If we repeat the algorithm the same limiting probability vector π using Q1 = P0 as the proposal probability transition matrix, then the resulting probability transition matrix is P1 where P1 = P0.
Proof. We begin with proposal probability transition matrix Q0 = [q(0)] and limiting ij
probability vector π = (π1, . . . , πn) and calculate the probability transition matrix P0 4

by applying the algorithm. Then for q(0) ̸= 0 and i ̸= j we have p(0) = αijq(0) where ij ijij
􏰘 πjq(0)􏰙
αij =min 1, ji . Likewiseforq(0) ̸=0andi̸=jwehavep(0) =αjiq(0) where
πiq(0) ji ij
ji ji
􏰘 πiq(0) 􏰙 αji=min1, ij .
If we repeat the procedure using Q1 = P0 as the proposal probability transition matrix
πjq(0) ji
and the same limiting probability vector π
, then we compute the new probability tran-
sition matrix P1 = [p(1)]. For p(0) ̸= 0 and i ̸= j we have p(1) = βijp(0) where ij ij ij ij
􏰘 π α q(0)􏰙 βi,j=min 1, ji =min 1, j ji ji
􏰘 πjp(0)􏰙 πip(0)
πi αij q(0) ij
 

π q(0) min j ji
􏰘 πq(0)􏰙 iij 
ij
1, πjq(0)  ji
=min 1, (0) 􏰘 (0)􏰙 .  πi qij min 1, πjqji 
Case 1: πjq(0) = πiq(0) ji ij
⇒ βij = min􏰖1,1min{1,1}􏰗 = min{1,1} = 1. min{1, 1}
Case 2: πjq(0) > πiq(0) ji ij
 π q(0)   i ij 
 π q(0) πjq(0)  ⇒βij=min 1, j ji ji
Case 3: πjq(0) < πiq(0) ji ij  π q(0) ⇒βij=min 1, j ji (0)   1     πiq(0)  ij = min{1, 1} = 1. = min{1, 1} = 1. (0)  π i q ( 0 )  ij πiq(0) 1  ij   πiq πjq   ij ji  Thusfori̸=jandpij ̸=0wehavep(1) =p(0). Also,fori̸=j,ifp(0) =0thenβij =0so ij ij ij p(1) = p(0) = 0. Thus ∀i, p(1) = p(0) since the rows must sum to 1. Therefore, P1 = P0. ij ij ii ii Thus a repetition of the Metropolis-Hastings algorithm does not change the result- 5 ing Markov chain. Normally the limiting probability vector of the proposal transition matrix Q will not match the initial limiting probability vector. Even if we choose a proposal matrix Q which happens to have a limiting probability vector that matches the initial limiting probability vector, it is still not necessarily true that P = Q. Information about reversible Markov chains can be found in Ross (2007). Definition 3.1. A stationary ergodic Markov chain with transition matrix P and lim- iting vector π is reversible iff πipij = πjpji ∀i, j. We refer to the transition matrix P as reversible if the corresponding Markov chain is reversible. Property 3.2. The matrix P obtained by applying the Metropolis-Hastings algorithm is reversible. Proof. See Ross (2007). Thus only reversible matrices P can be the result of applying the Metropolis-Hastings algorithm. Most transition matrices P are thus excluded as possible output transition matrices. Property 3.3. We are given the limiting probability vector π transition matrix Q may have the same limiting probability vector π Hastings algorithm need not return Q as the calculated probability transition matrix. Proof. Use Property 3.2. So if we begin with a transition matrix Q that is non-reversible transition matrix of an irreducible Markov chain, then the Metropolis-Hastings algo- rithm must generate a new transition matrix P which is different from Q, regardless of whether or not the limiting matrix Q gives a limiting probability vector to match the initial π. . The proposal probability , but the Metropolis- 2 1 0 33 201 3 3 0 0 0 0 101 01 3 3 3 Example 3.1. The probability transition matrix Q = 110 3 3 3   1 0 has limiting 6 10011 333 probability vector π = 􏰔34, 13, 5 , 2 , 1 􏰕. When Q and π are used in the Metropolis- 55 55 55 55 55 38 13 0 0 0 51 51 2 8 5 00 33939  Hastings algorithm, the resultant probability transition matrix is P =  0 1 8 2 0  3 15 15  0 0 1 1 1 326 12 33 000 which also has limiting probability vector π. Property 3.4. Suppose we begin with limiting probability vector π and proposal proba- bility transition matrix Q. Then the calculated probability matrix P will be equal to the proposal matrix Q iff Q defines a reversible Markov chain and has limiting probability vector π. Proof. First recall that pij = αijqij ∀i ̸= j. Thus P = Q iff pii = qii ∀i and either αij =1orqij =0=pij ∀i̸=j. If Q is reversible with limiting vector π ,thenπjqji =πiqij ∀i,j. Ifi̸=jandqij ̸=0, thenαij =min􏰖1,πjqji􏰗=1sopij =qij. Ifi̸=jandqij =0thenpij =αijqij =0 πiqij sopij =qij. Sincewehavepij =qij ∀i̸=j,wemustalsohavepii =qii ∀iandhence P = Q. Next suppose P = Q. From Ross (2007), we know that P is reversible with limiting probability vector to match the original limiting vector. Since P = Q, we have Q is reversible as well. Suppose we begin with transition matrix Q that we can recognize as reversible, and then compute the limiting probability vector π (which is easy to compute for reversible Markov chains). If we apply Metropolis-Hastings to Q and π then we know that P = Q. 1 1 0 22 1 1 1 Example 3.2. Consider the proposal probability transition matrix Q = 3 3 3 . 0 1 2 33 We recognize this as a reversible matrix. See Jiang (2009). We compute π = ( 2 , 3 , 3 ). 888 Applying the Metropolis-Hastings algorithm to this Q and π results in Q and satisfies 7 πQ = π Property 3.5. The non-zero nondiagonal entries of P resulting from the application of the Metropolis-Hastings algorithm may only occur in the same positions as the non-zero nondiagonal entries of Q. Proof. Non-zero αij may only occur when qij ̸= 0 for i ̸= j. Thus non-zero pij may occur in the same positions as the non-zero qij. Of course for i ̸= j if qij = 0 then pij = 0. Property 3.6. If the entry qij = 0 for i ̸= j occurs in Q, then the Metropolis-Hastings algorithm results in P with pij = 0 and pji = 0. Proof. Since pij = αijqij, we have pij = 0. Since P is reversible, we have πipij = πjpji sopji =0(sinceπi ̸=0andπj ̸=0). CONDITION A Consider an n×n proposal probability transition matrix Q with the following properties: 1. The row sum is 1. 2. qij = qji,∀i,j,i ̸= j. 3. Any non-zero entry in the matrix is equal to the constant c. 4. qii =0orqii =c,∀i. Property 3.7. Suppose we apply the Metropolis-Hastings algorithm to matrix Q (sat- isfying Condition A) and limiting probability vector π = (π1, . . . , πn), πi > 0, ∀i. Then (a) the row(s) of P corresponding to the minimum entry of π are unchanged from the corresponding row(s) of Q.
(b) the column(s) of P corresponding to the maximum entry of pi are the same (except possibly for the diagonal entry) as the corresponding row(s) of Q.
.
8

. Since πj ≥ πi,
diagonal term ensures the row sum is equal to 1, therefore the ith row of Q and P are
equal.
Although CONDITION A seems restrictive, we note that a symmetric random walk will satisfy such a condition and that is a reasonable initial matrix Q.
From our previous result, we observe that given a limiting probability vector π and a proposal probability transition matrix Q, it is simple to find P. We illustrate with the following example.
Example 3.3. Consider, the limiting probability vector π = ( 1 , 1 , 2 , 3 , 5 , 8 ), and 20 20 20 20 20 20
0 0 1 0 1 0 22
0 0 0 1 0 1 22
Proof. Suppose πi is the minimum entry of π
αij = min􏰖1,πjqji􏰗 = min􏰖1,πj􏰗 = 1. Hence pij = αijqij = qij,∀j ̸= i. The
πiqij πi
the proposal probability transition matrix Q =
3.7, we know that the first two rows of P are the same as the first two rows of Q since they correspond to the minimum entry of π. Also the last column of P matches the last column of Q except perhaps the diagonal entry. From Property 3.6, we know where zero entries will occur and where non-zero entries may occur.
0 0 1 0 1 0 22
0001 01  2 2
⋆ 0 ∗ 0 ⋆ 0 Using only these two observations, so far we have P =  
0 ⋆ 0 ∗ 0 1  2
⋆ 0 ⋆ 0 ∗ 0 0 ⋆ 0 ⋆ 0 ∗
For each of the ⋆ terms we have pij = αij qij . For the terms below the diagonal αij = πj πi
since the entries of π increase as the index increases. For the terms above the diagonal αij = 1 for the same reason.
1 0001 0 22
 . From Property 0 1 0 0 0 1
22 1 0 1 0 0 0
2 2  010100
22
9

001 01 0 22
0 0 0 1 0 1 22
1 02 07 0 10 10 10 
1 01 02 0 444
We can easily fill in the ⋆ terms to find P = . 01 02 03 666
0 1 0 3 0 12 16 16 16
The diagonal terms are found by using the fact that the rows must sum to 1.
4 Conclusions
Since the Metropolis-Hastings algorithm is widely used so any understanding of the its operation is beneficial. A natural question regarding this algorithm is what happens when it is applied to the output distribution rather than the proposal distribution. This paper indicates that there is no gain.
Further questions arise as to whether computations can be simplified if the initial limiting distribution is strictly decreasing, for example, or perhaps unimodal. Such questions could be the subject of future work.
References
[1] Evans, M. and Rosenthal, J. (2004) Probability and Statistics: The Science of Un- certainty. W.H. Freeman & Co.
[2] Ibe, O.C. (2009). Markov Processes for Stochastic Modeling. Academic Press.
[3] Jiang, Q. (2009). Construction of Transition Matrices of Reversible Markov Chains. M.Sc. Major Paper. Department of Mathematics and Statistics. University of Wind- sor.
[4] Ross, S. (2007). Introduction to Probability Models (9th ed.). Academic Press.
10