CHAPTER 8
The Wishart Distribution
The Wishart distribution arises in a natural way as a matrix generalization of the chi-square distribution. If $X_1, \ldots, X_n$ are independent with $\mathcal{L}(X_i) = N(0, 1)$, then $\sum_1^n X_i^2$ has a chi-square distribution with $n$ degrees of freedom. When the $X_i$ are random vectors rather than real-valued random variables, say $X_i \in R^p$ with $\mathcal{L}(X_i) = N(0, I_p)$, one possible way to generalize the above sum of squares is to form the $p \times p$ positive semidefinite matrix $S = \sum_1^n X_i X_i'$. Essentially, this representation of $S$ is used to define a Wishart distribution. As with the definition of the multivariate normal distribution, our definition of the Wishart distribution is not in terms of a density function and allows for Wishart distributions that are singular. In fact, most of the properties of the Wishart distribution are derived without reference to densities by exploiting the representation of the Wishart in terms of normal random vectors. For example, the distribution of a partitioned Wishart matrix is obtained by using properties of conditioned normal random vectors.
After formally defining the Wishart distribution, the characteristic function and convolution properties of the Wishart are derived. Certain generalized quadratic forms in normal random vectors are shown to have Wishart distributions and the basic decomposition of the Wishart into submatrices is given. The remainder of the chapter is concerned with the noncentral Wishart distribution in the rank one case and certain distributions that arise in connection with likelihood ratio tests.
8.1. BASIC PROPERTIES
The Wishart distribution, or more precisely, the family of Wishart distributions, is indexed by a $p \times p$ positive semidefinite symmetric matrix $\Sigma$, by a dimension parameter $p$, and by a degrees of freedom parameter $n$. Formally, we have the following definition.
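Definition 8.1. A random $p \times p$ symmetric matrix $S$ has a Wishart distribution with parameters $\Sigma$, $p$, and $n$ if there exist independent random vectors $X_1, \ldots, X_n$ in $R^p$ such that $\mathcal{L}(X_i) = N(0, \Sigma)$, $i = 1, \ldots, n$ and
\[ \mathcal{L}(S) = \mathcal{L}\Bigl(\sum_1^n X_i X_i'\Bigr). \]
In this case, we write $\mathcal{L}(S) = W(\Sigma, p, n)$.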
In the above definition, $p$ and $n$ are positive integers and $\Sigma$ is a $p \times p$ positive semidefinite matrix. When $p = 1$, it is clear that the Wishart distribution is just a chi-square distribution with $n$ degrees of freedom and scale parameter $\Sigma \ge 0$. When $\Sigma = 0$, then $X_i = 0$ with probability one, so $S = 0$ with probability one. Since $\sum_1^n X_i X_i'$ is positive semidefinite, the Wishart distribution has all of its mass on the set of positive semidefinite matrices. In an abuse of notation, we often write
\[ S = \sum_1^n X_i X_i' \]
when $\mathcal{L}(S) = W(\Sigma, p, n)$. As distributional questions are the primary concern in this chapter, this abuse causes no technical problems. If $X \in \mathcal{L}_{p,n}$ has rows $X_1', \ldots, X_n'$, it is clear that $\mathcal{L}(X) = N(0, I_n \otimes \Sigma)$ and $X'X = \sum_1^n X_i X_i'$. Thus if $\mathcal{L}(S) = W(\Sigma, p, n)$, then $\mathcal{L}(S) = \mathcal{L}(X'X)$ where $\mathcal{L}(X) = N(0, I_n \otimes \Sigma)$ in $\mathcal{L}_{p,n}$. Also, the converse statement is clear. Some further properties of the Wishart distribution follow.
Proposition 8.1. If $\mathcal{L}(S) = W(\Sigma, p, n)$ and $A$ is an $r \times p$ matrix, then
\[ \mathcal{L}(ASA') = W(A\Sigma A', r, n). \]
Proof. Since $\mathcal{L}(S) = W(\Sigma, p, n)$,
\[ \mathcal{L}(S) = \mathcal{L}(X'X) \]
where $\mathcal{L}(X) = N(0, I_n \otimes \Sigma)$ in $\mathcal{L}_{p,n}$. Thus $\mathcal{L}(ASA') = \mathcal{L}(AX'XA') = \mathcal{L}[((I_n \otimes A)X)'(I_n \otimes A)X]$. But $Y = (I_n \otimes A)X$ satisfies $\mathcal{L}(Y) = N(0, I_n \otimes (A\Sigma A'))$ in $\mathcal{L}_{r,n}$ and $\mathcal{L}(Y'Y) = \mathcal{L}(ASA')$. The conclusion follows from the definition of the Wishart distribution.
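The representation in Definition 8.1 and the algebra behind Proposition 8.1 can be sketched numerically. The following is a minimal illustration only, using the Python standard library; the helper names (`outer_sum`, `sample_wishart`, and so on) are hypothetical and not part of the text. It draws $X_i = CZ_i$ with $CC' = \Sigma$, forms $S = \sum X_i X_i'$, and checks the identity $ASA' = \sum (AX_i)(AX_i)'$ on which the proof rests.

```python
import random

def outer_sum(vectors):
    """Return S = sum_i x_i x_i' for a list of equal-length vectors."""
    p = len(vectors[0])
    s = [[0.0] * p for _ in range(p)]
    for x in vectors:
        for a in range(p):
            for b in range(p):
                s[a][b] += x[a] * x[b]
    return s

def mat_vec(m, x):
    """Matrix-vector product m x."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]

def mat_mul(a, b):
    """Matrix product a b for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def sample_wishart(c, n, rng):
    """Draw X_i = C Z_i with Z_i ~ N(0, I_p), so that Sigma = C C',
    and return the vectors together with S = sum_i X_i X_i'."""
    p = len(c)
    xs = [mat_vec(c, [rng.gauss(0.0, 1.0) for _ in range(p)]) for _ in range(n)]
    return xs, outer_sum(xs)

rng = random.Random(0)
c = [[1.0, 0.0], [0.5, 2.0]]          # illustrative choice of C
xs, s = sample_wishart(c, 5, rng)
a = [[1.0, -1.0]]                     # an r x p matrix with r = 1
lhs = mat_mul(mat_mul(a, s), transpose(a))        # A S A'
rhs = outer_sum([mat_vec(a, x) for x in xs])      # sum (A X_i)(A X_i)'
print(abs(lhs[0][0] - rhs[0][0]) < 1e-9)
```

Since each $AX_i$ is $N(0, A\Sigma A')$, the right-hand side has the form required by Definition 8.1 with parameters $A\Sigma A'$, $r$, and $n$.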
One consequence of Proposition 8.1 is that, for fixed $p$ and $n$, the family of distributions $\{W(\Sigma, p, n) \mid \Sigma \ge 0\}$ can be generated from the $W(I_p, p, n)$ distribution and $p \times p$ matrices. Here, the notation $\Sigma \ge 0$ ($\Sigma > 0$) means that $\Sigma$ is positive semidefinite (positive definite). To see this, if $\mathcal{L}(S) = W(I_p, p, n)$ and $\Sigma = AA'$, then
\[ \mathcal{L}(ASA') = W(AA', p, n) = W(\Sigma, p, n). \]
In particular, the family $\{W(\Sigma, p, n) \mid \Sigma > 0\}$ is generated by the $W(I_p, p, n)$ distribution and the group $Gl_p$ acting on $\mathcal{S}_p$ by $A(S) = ASA'$. Many proofs are simplified by using the above representation of the Wishart distribution. The question of the nonsingularity of the Wishart distribution is a good example. If $\mathcal{L}(S) = W(\Sigma, p, n)$, then $S$ has a nonsingular Wishart distribution if $S$ is positive definite with probability one.
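Proposition 8.2. Suppose $\mathcal{L}(S) = W(\Sigma, p, n)$. Then $S$ has a nonsingular Wishart distribution iff $n \ge p$ and $\Sigma > 0$. If $S$ has a nonsingular Wishart distribution, then $S$ has a density with respect to the measure $\nu(dS) = dS/|S|^{(p+1)/2}$ given by
\[ p(S \mid \Sigma) = \omega(p, n)\,|\Sigma^{-1}S|^{n/2} \exp\bigl[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}S\bigr]. \]
Here, $\omega(p, n)$ is the Wishart constant defined in Example 5.1.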
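Proof. Represent the $W(\Sigma, p, n)$ distribution as $\mathcal{L}(AS_1A')$ where $\mathcal{L}(S_1) = W(I_p, p, n)$ and $AA' = \Sigma$. Obviously, the rank of $A$ is the rank of $\Sigma$ and $\Sigma > 0$ iff the rank of $\Sigma$ is $p$. If $n < p$, then $S_1$ has rank at most $n$, so $S = AS_1A'$ has rank less than $p$; if $\Sigma$ is not positive definite, then $A$ has rank less than $p$ and again $S$ has rank less than $p$. Now, suppose $n \ge p$ and $\Sigma$ is positive definite. Then $S_1 = \sum_1^n X_i X_i'$ has rank $p$ with probability one by Proposition 7.1, and $A$ has rank $p$. Therefore, $S = AS_1A'$ has rank $p$ with probability one.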
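When $\Sigma > 0$, the density of $X \in \mathcal{L}_{p,n}$ is
\[ f(X) = (\sqrt{2\pi})^{-np}\,|\Sigma|^{-n/2} \exp\bigl[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}X'X\bigr] \]
when $\mathcal{L}(X) = N(0, I_n \otimes \Sigma)$. When $n \ge p$, it follows from Proposition 7.6 that the density of $S$ with respect to $\nu(dS)$ is $p(S \mid \Sigma)$.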
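Recall that the natural inner product on $\mathcal{S}_p$, when $\mathcal{S}_p$ is regarded as a subspace of $\mathcal{L}_{p,p}$, is
\[ \langle A, B \rangle = \operatorname{tr} AB. \]
The mean vector, covariance, and characteristic function of a Wishart distribution on the inner product space $(\mathcal{S}_p, \langle \cdot, \cdot \rangle)$ are given next.
Proposition 8.3. Suppose $\mathcal{L}(S) = W(\Sigma, p, n)$ on $(\mathcal{S}_p, \langle \cdot, \cdot \rangle)$. Then
(i) $\mathcal{E}S = n\Sigma$.
(ii) $\operatorname{Cov}(S) = 2n\,\Sigma \otimes \Sigma$.
(iii) $\phi(A) = \mathcal{E}\exp[i\langle A, S\rangle] = |I_p - 2i\Sigma A|^{-n/2}$.
Proof. To prove (i) write $S = \sum_1^n X_i X_i'$ where $\mathcal{L}(X_i) = N(0, \Sigma)$, and $X_1, \ldots, X_n$ are independent. Since $\mathcal{E}X_i X_i' = \Sigma$, it is clear that $\mathcal{E}S = n\Sigma$. For (ii), the independence of $X_1, \ldots, X_n$ implies that
\[ \operatorname{Cov}(S) = \operatorname{Cov}\Bigl(\sum_1^n X_i X_i'\Bigr) = \sum_1^n \operatorname{Cov}(X_i X_i') = n \operatorname{Cov}(X_1 \,\square\, X_1) \]
where $X_1 \,\square\, X_1$ is the outer product of $X_1$ relative to the standard inner product on $R^p$. Since $\mathcal{L}(X_1) = \mathcal{L}(CZ)$ where $\mathcal{L}(Z) = N(0, I_p)$ and $CC' = \Sigma$, it follows from Proposition 2.24 that $\operatorname{Cov}(X_1 \,\square\, X_1) = 2\Sigma \otimes \Sigma$. Thus (ii) holds. To establish (iii), first write $C'AC = \Gamma D \Gamma'$ where $A \in \mathcal{S}_p$, $CC' = \Sigma$, $\Gamma \in \mathcal{O}_p$, and $D$ is a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_p$. Then
\[ \phi(A) = \mathcal{E}\exp\Bigl[i \operatorname{tr} A \sum_1^n X_j X_j'\Bigr] = \prod_{j=1}^n \mathcal{E}\exp[i X_j' A X_j] = \bigl(\mathcal{E}\exp[i X_1' A X_1]\bigr)^n \equiv (\xi(A))^n. \]
Again, $\mathcal{L}(X_1) = \mathcal{L}(CZ)$ where $\mathcal{L}(Z) = N(0, I_p)$. Also, $\mathcal{L}(\Gamma' Z) = \mathcal{L}(Z)$ for $\Gamma \in \mathcal{O}_p$. Therefore,
\[ \xi(A) = \mathcal{E}\exp[i X_1' A X_1] = \mathcal{E}\exp[i Z'C'ACZ] = \mathcal{E}\exp[i Z'\Gamma D \Gamma' Z] = \mathcal{E}\exp[i Z'DZ] = \prod_{j=1}^p \mathcal{E}\exp[i\lambda_j Z_j^2] \]
where $Z_1, \ldots, Z_p$ are the coordinates of $Z$. Since $Z_1, \ldots, Z_p$ are independent with $\mathcal{L}(Z_j) = N(0, 1)$, $Z_j^2$ has a $\chi_1^2$ distribution and we have
\[ \xi(A) = \prod_{j=1}^p (1 - 2i\lambda_j)^{-1/2} = |I_p - 2iD|^{-1/2} = |I_p - 2i\Gamma D \Gamma'|^{-1/2} = |I_p - 2iC'AC|^{-1/2} = |I_p - 2iCC'A|^{-1/2} = |I_p - 2i\Sigma A|^{-1/2}. \]
The next to the last equality is a consequence of Proposition 1.35. Thus (iii) holds.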
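Part (iii) of Proposition 8.3 is easy to evaluate numerically in small cases. The sketch below is an illustration only, restricted to $p = 2$ and written with the standard library; the function name `wishart_cf_2x2` is hypothetical. It evaluates $|I_2 - 2i\Sigma A|^{-n/2}$ using the principal branch of the complex power, which agrees with the product $\prod_j (1 - 2i\lambda_j)^{-n/2}$ in the $2 \times 2$ case.

```python
def wishart_cf_2x2(sigma, a, n):
    """phi(A) = |I - 2i Sigma A|^(-n/2) for 2 x 2 real matrices sigma and a."""
    # m = Sigma A
    m = [[sum(sigma[r][k] * a[k][c] for k in range(2)) for c in range(2)]
         for r in range(2)]
    two_i = complex(0.0, 2.0)
    # b = I - 2i Sigma A
    b = [[(1.0 if r == c else 0.0) - two_i * m[r][c] for c in range(2)]
         for r in range(2)]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    return det ** (-n / 2)

identity = [[1.0, 0.0], [0.0, 1.0]]
# With A = diag(t, 0) and Sigma = I, phi reduces to the chi-square
# characteristic function (1 - 2it)^(-n/2).
t, n = 0.3, 4
print(abs(wishart_cf_2x2(identity, [[t, 0.0], [0.0, 0.0]], n)
          - (1 - 2j * t) ** (-n / 2)) < 1e-12)
```

Because the eigenvalues of $\Sigma A$ are real, $|I_p - 2i\Sigma A|$ has modulus at least one, so the computed value always has modulus at most one, as a characteristic function must.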
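Proposition 8.4. If $\mathcal{L}(S_i) = W(\Sigma, p, n_i)$ for $i = 1, 2$ and if $S_1$ and $S_2$ are independent, then $\mathcal{L}(S_1 + S_2) = W(\Sigma, p, n_1 + n_2)$.
Proof. An application of (iii) of Proposition 8.3 yields this convolution result. Specifically,
\[ \phi(A) = \mathcal{E}\exp[i\langle A, S_1 + S_2\rangle] = \prod_{j=1}^2 \mathcal{E}\exp[i\langle A, S_j\rangle] = \prod_{j=1}^2 |I_p - 2i\Sigma A|^{-n_j/2} = |I_p - 2i\Sigma A|^{-(n_1+n_2)/2}. \]
The uniqueness of characteristic functions shows that $\mathcal{L}(S_1 + S_2) = W(\Sigma, p, n_1 + n_2)$.
It should be emphasized that $\langle \cdot, \cdot \rangle$ is not what we might call the standard inner product on $\mathcal{S}_p$ when $\mathcal{S}_p$ is regarded as a $[p(p+1)/2]$-dimensional coordinate space. For example, if $p = 2$, and $S, T \in \mathcal{S}_p$, then
\[ \langle S, T \rangle = \operatorname{tr} ST = s_{11}t_{11} + s_{22}t_{22} + 2s_{12}t_{12} \]
while the three-dimensional coordinate space inner product between $S$ and $T$ would be $s_{11}t_{11} + s_{22}t_{22} + s_{12}t_{12}$. In this connection, equation (ii) of Proposition 8.3 means that
\[ \operatorname{cov}(\langle A, S\rangle, \langle B, S\rangle) = 2n\langle A, (\Sigma \otimes \Sigma)B\rangle = 2n\langle A, \Sigma B \Sigma\rangle = 2n \operatorname{tr}(A\Sigma B\Sigma), \]
that is, (ii) depends on the inner product $\langle \cdot, \cdot \rangle$ on $\mathcal{S}_p$ and is not valid for other inner products.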
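The difference between the trace inner product and the coordinate inner product can be seen in a small computation. The following sketch is illustrative only; the function names are hypothetical, and the covariance function simply evaluates the formula $2n\operatorname{tr}(A\Sigma B\Sigma)$ stated above rather than estimating it from data.

```python
def mat_mul(x, y):
    """Matrix product x y for nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def trace_inner(s, t):
    """<S, T> = tr(ST) for square matrices given as nested lists."""
    p = len(s)
    return sum(s[i][k] * t[k][i] for i in range(p) for k in range(p))

def coordinate_inner_2x2(s, t):
    """Inner product of the coordinate vectors (s11, s22, s12), (t11, t22, t12)."""
    return s[0][0] * t[0][0] + s[1][1] * t[1][1] + s[0][1] * t[0][1]

def wishart_cov(a, b, sigma, n):
    """cov(<A,S>, <B,S>) = 2n tr(A Sigma B Sigma) when L(S) = W(Sigma, p, n)."""
    return 2 * n * trace_inner(mat_mul(a, sigma), mat_mul(b, sigma))

s = [[1, 2], [2, 3]]
t = [[4, 5], [5, 6]]
print(trace_inner(s, t))            # 4 + 18 + 2*10 = 42
print(coordinate_inner_2x2(s, t))   # 4 + 18 + 10 = 32
```

When $p = 1$, $A = B = 1$, and $\Sigma = \sigma^2$, the covariance formula gives $2n\sigma^4$, which is the variance of $\sigma^2\chi_n^2$.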
In Chapter 3, quadratic forms in normal random vectors were shown to have chi-square distributions under certain conditions. Similar results are available for generalized quadratic forms and the Wishart distribution. The following proposition is not the most general possible, but suffices in most situations.
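Proposition 8.5. Consider $X \in \mathcal{L}_{p,n}$ where $\mathcal{L}(X) = N(\mu, Q \otimes \Sigma)$. Let $S = X'PX$ where $P$ is $n \times n$ and positive semidefinite, and write $P = A^2$ with $A$ positive semidefinite. If $AQA$ is a rank $k$ orthogonal projection and if $P\mu = 0$, then
\[ \mathcal{L}(S) = W(\Sigma, p, k). \]
Proof. With $Y = AX$, it is clear that $S = Y'Y$ and
\[ \mathcal{L}(Y) = N(A\mu, (AQA) \otimes \Sigma). \]
Since $\mathcal{N}(A) = \mathcal{N}(P)$ and $P\mu = 0$, $A\mu = 0$ so
\[ \mathcal{L}(Y) = N(0, (AQA) \otimes \Sigma). \]
By assumption, $B = AQA$ is a rank $k$ orthogonal projection. Also, $S = Y'Y = Y'BY + Y'(I - B)Y$, and $\mathcal{L}((I - B)Y) = N(0, 0 \otimes \Sigma)$ so $Y'(I - B)Y$ is zero with probability one. Thus it remains to show that if $\mathcal{L}(Y) = N(0, B \otimes \Sigma)$ where $B$ is a rank $k$ orthogonal projection, then $S = Y'BY$ has a $W(\Sigma, p, k)$ distribution. Without loss of generality (make an orthogonal transformation),
\[ B = \begin{pmatrix} I_k & 0 \\ 0 & 0 \end{pmatrix}. \]
Partitioning $Y$ into $Y_1 : k \times p$ and $Y_2 : (n - k) \times p$, it follows that $S = Y_1'Y_1$ and
\[ \mathcal{L}(Y_1) = N(0, I_k \otimes \Sigma). \]
Thus $\mathcal{L}(S) = W(\Sigma, p, k)$.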
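Example 8.1. We again return to the multivariate normal linear model introduced in Example 4.4. Consider $X \in \mathcal{L}_{p,n}$ with
\[ \mathcal{L}(X) = N(\mu, I_n \otimes \Sigma) \]
where $\mu$ is an element of the subspace $M \subseteq \mathcal{L}_{p,n}$ defined by
\[ M = \{x \mid x \in \mathcal{L}_{p,n},\ x = ZB,\ B \in \mathcal{L}_{p,k}\}. \]
Here, $Z$ is an $n \times k$ matrix of rank $k$ and it is assumed that $n - k \ge p$. With $P_z = Z(Z'Z)^{-1}Z'$, $P_M = P_z \otimes I_p$ is the orthogonal projection onto $M$ and $Q_M = Q_z \otimes I_p$, $Q_z = I - P_z$, is the orthogonal projection onto $M^{\perp}$. We know that
\[ \hat{\mu} = P_M X = P_z X \]
is the maximum likelihood estimator of $\mu$. As demonstrated in Example 4.4, the maximum likelihood estimator of $\Sigma$ is found by maximizing
\[ p(x \mid \hat{\mu}, \Sigma) = (\sqrt{2\pi})^{-np}\,|\Sigma|^{-n/2} \exp\bigl[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}x'Q_z x\bigr]. \]
Since $n - k \ge p$, $x'Q_z x$ has rank $p$ with probability one. When $X'Q_z X$ has rank $p$, Example 7.10 shows that
\[ \hat{\Sigma} = \frac{1}{n} X'Q_z X \]
is the maximum likelihood estimator of $\Sigma$. The conditions of Proposition 8.5 are easily checked to verify that $S = X'Q_z X$ has a $W(\Sigma, p, n - k)$ distribution. In summary, for the multivariate linear model, $\hat{\mu} = P_z X$ and $\hat{\Sigma} = n^{-1}X'Q_z X$ are the maximum likelihood estimators of $\mu$ and $\Sigma$. Further, $\hat{\mu}$ and $\hat{\Sigma}$ are independent and
\[ \mathcal{L}(n\hat{\Sigma}) = W(\Sigma, p, n - k). \]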
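As a concrete instance of Example 8.1, take $k = 1$ and let $Z$ be the $n \times 1$ column of ones, so that $P_z X$ replaces each row of $X$ by the mean row and $X'Q_z X$ is the matrix of centered cross products. The sketch below covers only this special case; the function name `mle_one_sample` is hypothetical.

```python
def mle_one_sample(rows):
    """MLEs when Z is the n x 1 column of ones (so k = 1).

    Returns the common row of P_z X (the mean row), the matrix X' Q_z X,
    and Sigma_hat = X' Q_z X / n.
    """
    n = len(rows)
    p = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    s = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows)
          for b in range(p)] for a in range(p)]
    sigma_hat = [[s[a][b] / n for b in range(p)] for a in range(p)]
    return mean, s, sigma_hat

# n = 3 observations in R^2, so n - k = 2 >= p = 2.
mean, s, sigma_hat = mle_one_sample([[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]])
print(mean)   # [3.0, 4.0]
print(s)      # [[8.0, 4.0], [4.0, 8.0]]
```

In this case the result of the example reads $\mathcal{L}(n\hat{\Sigma}) = W(\Sigma, p, n - 1)$.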
8.2. PARTITIONING A WISHART MATRIX
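The partitioning of the Wishart distribution considered here is motivated partly by the transformation described in Proposition 5.8. If $\mathcal{L}(S) = W(\Sigma, p, n)$ where $n \ge p$, partition $S$ as
\[ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \]
where $S_{21} = S_{12}'$ and let
\[ S_{11\cdot 2} = S_{11} - S_{12}S_{22}^{-1}S_{21}. \]
Here, $S_{ij}$ is $p_i \times p_j$ for $i, j = 1, 2$ so $p_1 + p_2 = p$. The primary result of this section describes the joint distribution of $(S_{11\cdot 2}, S_{21}, S_{22})$ when $\Sigma$ is nonsingular. This joint distribution is derived by representing the Wishart distribution in terms of the normal distribution. Since $\mathcal{L}(S) = W(\Sigma, p, n)$, $S = X'X$ where $\mathcal{L}(X) = N(0, I_n \otimes \Sigma)$. Here, $X$ is assumed to take values in $\mathcal{X}$, the set of all $n \times p$ matrices of rank $p$. With
\[ X = (X_1, X_2), \qquad X_i : n \times p_i, \quad i = 1, 2, \]
it is clear that
\[ S_{ij} = X_i'X_j, \qquad i, j = 1, 2. \]
Thus
\[ S_{11\cdot 2} = X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1 = X_1'QX_1 \]
where
\[ Q = I_n - P, \qquad P = X_2(X_2'X_2)^{-1}X_2', \]
is an orthogonal projection of rank $n - p_2$, for each value of $X_2$, when $X \in \mathcal{X}$. To obtain the desired result for the Wishart distribution, it is useful to first give the joint distribution of $(QX_1, PX_1, X_2)$.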
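Proposition 8.6. The joint distribution of $(QX_1, PX_1, X_2)$ can be described as follows. Conditional on $X_2$, $QX_1$ and $PX_1$ are independent with
\[ \mathcal{L}(QX_1 \mid X_2) = N(0, Q \otimes \Sigma_{11\cdot 2}) \]
and
\[ \mathcal{L}(PX_1 \mid X_2) = N(X_2\Sigma_{22}^{-1}\Sigma_{21}, P \otimes \Sigma_{11\cdot 2}). \]
Also,
\[ \mathcal{L}(X_2) = N(0, I_n \otimes \Sigma_{22}). \]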
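Proof. From Example 3.1, the conditional distribution of $X_1$ given $X_2$, say $\mathcal{L}(X_1 \mid X_2)$, is
\[ \mathcal{L}(X_1 \mid X_2) = N(X_2\Sigma_{22}^{-1}\Sigma_{21}, I_n \otimes \Sigma_{11\cdot 2}). \]
Thus conditional on $X_2$, the random vector
\[ W = \begin{pmatrix} Q \\ P \end{pmatrix} X_1 = \begin{pmatrix} QX_1 \\ PX_1 \end{pmatrix} \]
is a linear transformation of $X_1$. Thus $W$ has a normal distribution with mean vector
\[ \begin{pmatrix} Q \\ P \end{pmatrix} X_2\Sigma_{22}^{-1}\Sigma_{21} = \begin{pmatrix} 0 \\ X_2\Sigma_{22}^{-1}\Sigma_{21} \end{pmatrix} \]
since $QX_2 = 0$ and $PX_2 = X_2$. Also, using the calculational rules for partitioned linear transformations, the covariance of $W$ is
\[ \begin{pmatrix} Q \\ P \end{pmatrix} (Q \;\; P) \otimes \Sigma_{11\cdot 2} = \begin{pmatrix} Q & 0 \\ 0 & P \end{pmatrix} \otimes \Sigma_{11\cdot 2} \]
since $QP = 0$. The conditional independence and conditional distribution of $QX_1$ and $PX_1$ follow immediately. That $X_2$ has the claimed marginal distribution is obvious.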
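Proposition 8.7. Suppose $\mathcal{L}(S) = W(\Sigma, p, n)$ with $n \ge p$ and $\Sigma > 0$. Partition $S$ into $S_{ij}$, $i, j = 1, 2$, where $S_{ij}$ is $p_i \times p_j$, $p_1 + p_2 = p$, and partition $\Sigma$ similarly. With $S_{11\cdot 2} = S_{11} - S_{12}S_{22}^{-1}S_{21}$, $S_{11\cdot 2}$ and $(S_{21}, S_{22})$ are stochastically independent. Further,
\[ \mathcal{L}(S_{11\cdot 2}) = W(\Sigma_{11\cdot 2}, p_1, n - p_2) \]
and conditional on $S_{22}$,
\[ \mathcal{L}(S_{21} \mid S_{22}) = N(S_{22}\Sigma_{22}^{-1}\Sigma_{21}, S_{22} \otimes \Sigma_{11\cdot 2}). \]
The marginal distribution of $S_{22}$ is $W(\Sigma_{22}, p_2, n)$.
Proof. In the notation of Proposition 8.6, consider $X \in \mathcal{X}$ with $\mathcal{L}(X) = N(0, I_n \otimes \Sigma)$ and $S = X'X$. Then $S_{ij} = X_i'X_j$ for $i, j = 1, 2$ and $S_{11\cdot 2} = X_1'QX_1$. Since $PX_2 = X_2$ and $S_{21} = X_2'X_1$, we see that $S_{21} = (PX_2)'X_1 = X_2'PX_1$, and conditional on $X_2$,
\[ \mathcal{L}(S_{21} \mid X_2) = N(X_2'X_2\Sigma_{22}^{-1}\Sigma_{21}, (X_2'X_2) \otimes \Sigma_{11\cdot 2}). \]
To show that $S_{11\cdot 2}$ and $(S_{21}, S_{22})$ are independent, it suffices to show that
\[ \mathcal{E}f(S_{11\cdot 2})h(S_{21}, S_{22}) = \mathcal{E}f(S_{11\cdot 2})\,\mathcal{E}h(S_{21}, S_{22}) \]
for bounded measurable functions $f$ and $h$ with the appropriate domains of definition. Using Proposition 8.6, we argue as follows. For fixed $X_2$, $QX_1$ and $PX_1$ are independent so $S_{11\cdot 2} = X_1'QQX_1$ and $S_{21} = X_2'PX_1$ are conditionally independent. Also,
\[ \mathcal{L}(QX_1 \mid X_2) = N(0, Q \otimes \Sigma_{11\cdot 2}) \]
and $Q$ is a rank $n - p_2$ orthogonal projection. By Proposition 8.5, $\mathcal{L}(X_1'QX_1 \mid X_2) = W(\Sigma_{11\cdot 2}, p_1, n - p_2)$ for each $X_2$ so $X_1'QX_1$ and $X_2$ are independent. Conditioning on $X_2$, we have
\[ \mathcal{E}f(S_{11\cdot 2})h(S_{21}, S_{22}) = \mathcal{E}\bigl[\mathcal{E}(f(X_1'QX_1) \mid X_2)\,\mathcal{E}(h(X_2'PX_1, X_2'X_2) \mid X_2)\bigr] = \mathcal{E}f(X_1'QX_1)\,\mathcal{E}\bigl[\mathcal{E}(h(X_2'PX_1, X_2'X_2) \mid X_2)\bigr] = \mathcal{E}f(S_{11\cdot 2})\,\mathcal{E}h(S_{21}, S_{22}). \]
Therefore, $S_{11\cdot 2}$ and $(S_{21}, S_{22})$ are stochastically independent. To describe the joint distribution of $S_{21}$ and $S_{22}$, again condition on $X_2$. Then
\[ \mathcal{L}(S_{21} \mid X_2) = N(X_2'X_2\Sigma_{22}^{-1}\Sigma_{21}, (X_2'X_2) \otimes \Sigma_{11\cdot 2}) \]
and this conditional distribution depends on $X_2$ only through $S_{22} = X_2'X_2$. Thus
\[ \mathcal{L}(S_{21} \mid S_{22}) = N(S_{22}\Sigma_{22}^{-1}\Sigma_{21}, S_{22} \otimes \Sigma_{11\cdot 2}). \]
That $S_{22}$ has the claimed marginal distribution is obvious.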
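The identity $S_{11\cdot 2} = X_1'QX_1$ used in the proof says that $S_{11\cdot 2}$ is a residual sum of squares from regressing $X_1$ on $X_2$. A minimal sketch for $p_1 = p_2 = 1$ follows; the function names are hypothetical and the code checks only the algebraic identity, not any distributional claim.

```python
def schur_11_2(s):
    """S_{11.2} = S_11 - S_12 S_22^{-1} S_21 for a 2 x 2 matrix (p1 = p2 = 1)."""
    return s[0][0] - s[0][1] * s[1][0] / s[1][1]

def residual_ss(x1, x2):
    """X_1' Q X_1 with Q = I - X_2 (X_2'X_2)^{-1} X_2', as a residual sum of squares."""
    coef = sum(u * v for u, v in zip(x1, x2)) / sum(v * v for v in x2)
    return sum((u - coef * v) ** 2 for u, v in zip(x1, x2))

x1 = [1.0, 2.0, 3.0]
x2 = [1.0, 0.0, 1.0]
# S = X'X with X = (x1, x2)
s = [[sum(u * u for u in x1), sum(u * v for u, v in zip(x1, x2))],
     [sum(u * v for u, v in zip(x1, x2)), sum(v * v for v in x2)]]
print(schur_11_2(s), residual_ss(x1, x2))   # 6.0 6.0
```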
By simply permuting the indices in Proposition 8.7, we obtain the following proposition.
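Proposition 8.8. With the notation and assumptions of Proposition 8.7, let $S_{22\cdot 1} = S_{22} - S_{21}S_{11}^{-1}S_{12}$. Then $S_{22\cdot 1}$ and $(S_{11}, S_{12})$ are stochastically independent and
\[ \mathcal{L}(S_{22\cdot 1}) = W(\Sigma_{22\cdot 1}, p_2, n - p_1). \]
Conditional on $S_{11}$,
\[ \mathcal{L}(S_{12} \mid S_{11}) = N(S_{11}\Sigma_{11}^{-1}\Sigma_{12}, S_{11} \otimes \Sigma_{22\cdot 1}) \]
and the marginal distribution of $S_{11}$ is $W(\Sigma_{11}, p_1, n)$.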
Proposition 8.7 is one of the most useful results for deriving distributions of functions of Wishart matrices. Applications occur in this and the remaining chapters. For example, the following assertion provides a simple proof of the distribution of Hotelling's $T^2$, discussed in the next chapter.
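Proposition 8.9. Suppose $S_0$ has a nonsingular Wishart distribution, say $W(\Sigma, p, n)$, and let $A$ be an $r \times p$ matrix of rank $r$. Then
\[ \mathcal{L}\bigl((AS_0^{-1}A')^{-1}\bigr) = W\bigl((A\Sigma^{-1}A')^{-1}, r, n - p + r\bigr). \]
Proof. First, an invariance argument shows that it is sufficient to consider the case when $\Sigma = I$. More precisely, write $\Sigma = B^2$ with $B > 0$ and let $C = AB^{-1}$. With $S = B^{-1}S_0B^{-1}$, $\mathcal{L}(S) = W(I, p, n)$ and the assertion is that
\[ \mathcal{L}\bigl((CS^{-1}C')^{-1}\bigr) = W\bigl((CC')^{-1}, r, n - p + r\bigr). \]
Now, let $\Psi = C'(CC')^{-1/2}$, so the assertion becomes
\[ \mathcal{L}\bigl((\Psi'S^{-1}\Psi)^{-1}\bigr) = W(I_r, r, n - p + r). \]
However, $\Psi$ is $p \times r$ and satisfies $\Psi'\Psi = I_r$, that is, $\Psi$ is a linear isometry. Since $\mathcal{L}(\Gamma'S\Gamma) = \mathcal{L}(S)$ for all $\Gamma \in \mathcal{O}_p$,
\[ \mathcal{L}\bigl((\Psi'\Gamma'S^{-1}\Gamma\Psi)^{-1}\bigr) = \mathcal{L}\bigl((\Psi'S^{-1}\Psi)^{-1}\bigr). \]
Choose $\Gamma$ so that
\[ \Gamma\Psi = \begin{pmatrix} I_r \\ 0 \end{pmatrix} : p \times r. \]
For this choice of $\Gamma$, the matrix $(\Psi'\Gamma'S^{-1}\Gamma\Psi)^{-1}$ is just the inverse of the $r \times r$ upper left corner of $S^{-1}$, and this matrix is
\[ V = S_{11} - S_{12}S_{22}^{-1}S_{21} \]
where $V$ is $r \times r$. By Proposition 8.7,
\[ \mathcal{L}(V) = W(I_r, r, n - p + r) \]
since $\mathcal{L}(S) = W(I, p, n)$. This establishes the assertion of the proposition.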
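When $r = 1$ in Proposition 8.9, the matrix $A'$ is a nonzero vector, say $A' = a \in R^p$. In this case,
\[ \mathcal{L}\Bigl(\frac{a'\Sigma^{-1}a}{a'S^{-1}a}\Bigr) = \chi_{n-p+1}^2 \]
when $\mathcal{L}(S) = W(\Sigma, p, n)$. Another decomposition result for the Wishart distribution, which is sometimes useful, follows.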
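The key algebraic fact in the proof of Proposition 8.9, that the inverse of the upper left corner of $S^{-1}$ is $S_{11} - S_{12}S_{22}^{-1}S_{21}$, can be checked directly when $p = 2$ and $r = 1$. The sketch below is illustrative only, with hypothetical function names.

```python
def inv_2x2(s):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def inverse_quadratic_form(s, a):
    """(a' S^{-1} a)^{-1} for a 2 x 2 positive definite S and a nonzero vector a."""
    si = inv_2x2(s)
    q = sum(a[i] * si[i][j] * a[j] for i in range(2) for j in range(2))
    return 1.0 / q

s = [[14.0, 4.0], [4.0, 2.0]]
a = [1.0, 0.0]                        # picks out the (1,1) element of S^{-1}
schur = s[0][0] - s[0][1] * s[1][0] / s[1][1]
print(abs(inverse_quadratic_form(s, a) - schur) < 1e-12)
```

With $a$ equal to the first unit vector, $(a'S^{-1}a)^{-1}$ is exactly $S_{11\cdot 2}$, whose distribution is given by Proposition 8.7.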
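Proposition 8.10. Suppose $S$ has a nonsingular Wishart distribution, say $\mathcal{L}(S) = W(\Sigma, p, n)$, and let $S = TT'$ where $T \in G_T^+$. Then the density of $T$ with respect to the left invariant measure $\nu(dT) = dT/\prod_{i=1}^p t_{ii}^i$ is
\[ p_1(T \mid \Sigma) = 2^p\,\omega(p, n)\,|\Sigma^{-1}TT'|^{n/2} \exp\bigl[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}TT'\bigr]. \]
If $S$ and $T$ are partitioned as
\[ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \qquad T = \begin{pmatrix} T_{11} & 0 \\ T_{21} & T_{22} \end{pmatrix} \]
where $S_{ij}$ is $p_i \times p_j$, $p_1 + p_2 = p$, then $S_{11} = T_{11}T_{11}'$, $S_{12} = T_{11}T_{21}'$, and $S_{22\cdot 1} = T_{22}T_{22}'$. Further, the pair $(T_{11}, T_{21})$ is independent of $T_{22}$ and
\[ \mathcal{L}(T_{21}' \mid T_{11}) = N(T_{11}'\Sigma_{11}^{-1}\Sigma_{12}, I_{p_1} \otimes \Sigma_{22\cdot 1}). \]
Proof. The expression for the density of $T$ is a consequence of Proposition 7.5, and a bit of algebra shows that $S_{11} = T_{11}T_{11}'$, $S_{12} = T_{11}T_{21}'$, and $S_{22\cdot 1} = T_{22}T_{22}'$. The independence of $(T_{11}, T_{21})$ and $T_{22}$ follows from Proposition 8.8 and the fact that the mapping between $S$ and $T$ is one-to-one and onto. Also,
\[ \mathcal{L}(S_{12} \mid S_{11}) = N(S_{11}\Sigma_{11}^{-1}\Sigma_{12}, S_{11} \otimes \Sigma_{22\cdot 1}). \]
Since $S_{11}$ and $T_{11}$ are one-to-one functions of each other and $S_{12} = T_{11}T_{21}'$,
\[ \mathcal{L}(T_{11}T_{21}' \mid T_{11}) = N(T_{11}T_{11}'\Sigma_{11}^{-1}\Sigma_{12}, T_{11}T_{11}' \otimes \Sigma_{22\cdot 1}). \]
Thus
\[ \mathcal{L}(T_{21}' \mid T_{11}) = N(T_{11}'\Sigma_{11}^{-1}\Sigma_{12}, I_{p_1} \otimes \Sigma_{22\cdot 1}) \]
since $T_{21}' = T_{11}^{-1}(T_{11}T_{21}')$ and $T_{11}$ is fixed.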
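Proposition 8.11. Suppose $S$ has a nonsingular Wishart distribution with $\mathcal{L}(S) = W(\Sigma, p, n)$ and assume that $\Sigma$ is diagonal with diagonal elements $\sigma_{11}, \ldots, \sigma_{pp}$. If $S = TT'$ with $T \in G_T^+$, then the random variables $\{t_{ij} \mid i \ge j\}$ are mutually independent and
\[ \mathcal{L}(t_{ii}^2) = \sigma_{ii}\chi_{n-i+1}^2, \qquad i = 1, \ldots, p, \]
and
\[ \mathcal{L}(t_{ij}) = N(0, \sigma_{ii}) \quad \text{for } i > j. \]
Proof. First, partition $S$, $\Sigma$, and $T$ as
\[ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}, \qquad T = \begin{pmatrix} t_{11} & 0 \\ T_{21} & T_{22} \end{pmatrix} \]
where $S_{11}$ is $1 \times 1$. Since $\Sigma_{12} = 0$, the conditional distribution of $T_{21}$ given $T_{11}$ does not depend on $T_{11}$ and $\Sigma_{22}$ has diagonal elements $\sigma_{22}, \ldots, \sigma_{pp}$. It follows from Proposition 8.10 that $t_{11}$, $T_{21}$, and $T_{22}$ are mutually independent and
\[ \mathcal{L}(T_{21}) = N(0, \Sigma_{22}). \]
The elements of $T_{21}$ are $t_{21}, t_{31}, \ldots, t_{p1}$, and since $\Sigma_{22}$ is diagonal, these are independent with
\[ \mathcal{L}(t_{i1}) = N(0, \sigma_{ii}), \qquad i = 2, \ldots, p. \]
Also,
\[ \mathcal{L}(t_{11}^2) = \sigma_{11}\chi_n^2 \]
and
\[ \mathcal{L}(T_{22}T_{22}') = W(\Sigma_{22}, p - 1, n - 1). \]
The conclusion of the proposition follows by an induction argument on the dimension parameter $p$.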
When $\mathcal{L}(S) = W(\Sigma, p, n)$ is a nonsingular Wishart distribution, the random variable $|S|$ is called the generalized variance. The distribution of $|S|$ is easily derived using Proposition 8.11. First, write $\Sigma = B^2$ with $B > 0$ and let $S_1 = B^{-1}SB^{-1}$. Then $\mathcal{L}(S_1) = W(I, p, n)$ and $|S| = |\Sigma||S_1|$. Also, if $TT' = S_1$, $T \in G_T^+$, then $\mathcal{L}(t_{ii}^2) = \chi_{n-i+1}^2$ for $i = 1, \ldots, p$, and $t_{11}, \ldots, t_{pp}$ are mutually independent. Thus
\[ \mathcal{L}(|S|) = \mathcal{L}(|\Sigma||S_1|) = \mathcal{L}(|\Sigma||TT'|) = \mathcal{L}\Bigl(|\Sigma| \prod_{i=1}^p t_{ii}^2\Bigr). \]
Therefore, the distribution of $|S|$ is the same as the constant $|\Sigma|$ times a product of $p$ independent chi-square random variables with $n - i + 1$ degrees of freedom for $i = 1, \ldots, p$.
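The triangular factorization $S = TT'$ and the resulting formula $|S| = \prod t_{ii}^2$ can be sketched as follows. This is an illustration only, with hypothetical function names; it assumes Python 3.8 or later for `math.prod`. The last function uses $\mathcal{E}\chi_k^2 = k$ and the independence of the factors to obtain $\mathcal{E}|S| = |\Sigma|\prod_{i=1}^p (n - i + 1)$.

```python
import math

def cholesky_lower(s):
    """Lower triangular T with positive diagonal such that S = T T'."""
    p = len(s)
    t = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = s[i][j] - sum(t[i][k] * t[j][k] for k in range(j))
            if i == j:
                t[i][j] = math.sqrt(acc)
            else:
                t[i][j] = acc / t[j][j]
    return t

def generalized_variance(s):
    """|S| computed as the product of the squared diagonal elements of T."""
    t = cholesky_lower(s)
    return math.prod(t[i][i] ** 2 for i in range(len(s)))

def expected_generalized_variance(det_sigma, p, n):
    """E|S| = |Sigma| * prod_{i=1}^p (n - i + 1) for L(S) = W(Sigma, p, n), n >= p."""
    return det_sigma * math.prod(n - i + 1 for i in range(1, p + 1))

print(generalized_variance([[4.0, 2.0], [2.0, 5.0]]))   # 16.0
print(expected_generalized_variance(1.0, 2, 5))         # 20.0
```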
8.3. THE NONCENTRAL WISHART DISTRIBUTION
Just as the Wishart distribution is a matrix generalization of the chi-square distribution, the noncentral Wishart distribution is a matrix analog of the noncentral chi-square distribution. Also, the noncentral Wishart distribution arises in a natural way in the study of distributional properties of test statistics in multivariate analysis.
Definition 8.2. Let $X \in \mathcal{L}_{p,n}$ have a normal distribution $N(\mu, I_n \otimes \Sigma)$. A random matrix $S \in \mathcal{S}_p$ has a noncentral Wishart distribution with parameters $\Sigma$, $p$, $n$, and $\Delta = \mu'\mu$ if $\mathcal{L}(S) = \mathcal{L}(X'X)$. In this case, we write $\mathcal{L}(S) = W(\Sigma, p, n; \Delta)$.
In this definition, it is not obvious that the distribution of $X'X$ depends on $\mu$ only through $\Delta = \mu'\mu$. However, an invariance argument establishes this. The group $\mathcal{O}_n$ acts on $\mathcal{L}_{p,n}$ by sending $x$ into $\Gamma x$ for $x \in \mathcal{L}_{p,n}$ and $\Gamma \in \mathcal{O}_n$. A maximal invariant under this action is $x'x$. When $\mathcal{L}(X) = N(\mu, I_n \otimes \Sigma)$, $\mathcal{L}(\Gamma X) = N(\Gamma\mu, I_n \otimes \Sigma)$ and we know the distribution of $X'X$ depends only on a maximal invariant parameter. But the group action on the parameter space is $(\mu, \Sigma) \to (\Gamma\mu, \Sigma)$ and a maximal invariant is obviously $(\mu'\mu, \Sigma)$. Thus the distribution of $X'X$ depends only on $(\mu'\mu, \Sigma)$.
When $\Delta = 0$, the noncentral Wishart distribution is just the $W(\Sigma, p, n)$ distribution. Let $X_1', \ldots, X_n'$ be the rows of $X$ in the above definition so $X_1, \ldots, X_n$ are independent and $\mathcal{L}(X_i) = N(\mu_i, \Sigma)$ where $\mu_1', \ldots, \mu_n'$ are the rows of $\mu$. Obviously,
\[ \mathcal{L}(X_iX_i') = W(\Sigma, p, 1; \Delta_i) \]
where $\Delta_i = \mu_i\mu_i'$. Thus $S_i = X_iX_i'$, $i = 1, \ldots, n$, are independent and it is clear that, if $S = X'X$, then
\[ \mathcal{L}(S) = \mathcal{L}\Bigl(\sum_1^n S_i\Bigr). \]
In other words, the noncentral Wishart distribution with $n$ degrees of freedom can be represented as the convolution of $n$ noncentral Wishart distributions each with one degree of freedom. This argument shows that, if $\mathcal{L}(S_i) = W(\Sigma, p, n_i; \Delta_i)$ for $i = 1, 2$ and if $S_1$ and $S_2$ are independent, then $\mathcal{L}(S_1 + S_2) = W(\Sigma, p, n_1 + n_2; \Delta_1 + \Delta_2)$. Since
\[ \mathcal{E}S_i = \Sigma + \Delta_i, \]
it follows that
\[ \mathcal{E}S = n\Sigma + \Delta \]
when $\mathcal{L}(S) = W(\Sigma, p, n; \Delta)$. Also,
\[ \operatorname{Cov}(S) = \sum_1^n \operatorname{Cov}(S_i) \]
but an explicit expression for $\operatorname{Cov}(S_i)$ is not needed here. As with the central Wishart distribution, it is not difficult to prove that, when $\mathcal{L}(S) = W(\Sigma, p, n; \Delta)$, then $S$ is positive definite with probability one iff $n \ge p$ and $\Sigma > 0$. Further, it is clear that if $\mathcal{L}(S) = W(\Sigma, p, n; \Delta)$ and $A$ is an $r \times p$ matrix, then $\mathcal{L}(ASA') = W(A\Sigma A', r, n; A\Delta A')$. The next result provides an expression for the density function of $S$ in a special case.
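The invariance argument above rests on the fact that $\Delta = \mu'\mu$ is unchanged when $\mu$ is replaced by $\Gamma\mu$ with $\Gamma$ orthogonal. A minimal sketch for $n = 2$, where $\Gamma$ is taken to be a rotation through an angle $\theta$, follows; the function names are hypothetical, and the last function simply evaluates the mean formula $n\Sigma + \Delta$.

```python
import math

def gram(mu):
    """Delta = mu' mu for a mean matrix given as a list of rows."""
    p = len(mu[0])
    return [[sum(row[a] * row[b] for row in mu) for b in range(p)]
            for a in range(p)]

def rotate_rows(mu, theta):
    """Gamma mu for a 2 x p matrix mu, with Gamma the rotation through theta."""
    c, s = math.cos(theta), math.sin(theta)
    r0 = [c * u - s * v for u, v in zip(mu[0], mu[1])]
    r1 = [s * u + c * v for u, v in zip(mu[0], mu[1])]
    return [r0, r1]

def noncentral_wishart_mean(sigma, n, delta):
    """E S = n Sigma + Delta when L(S) = W(Sigma, p, n; Delta)."""
    p = len(sigma)
    return [[n * sigma[a][b] + delta[a][b] for b in range(p)] for a in range(p)]

mu = [[1.0, 2.0], [3.0, 4.0]]
delta = gram(mu)                         # [[10.0, 14.0], [14.0, 20.0]]
delta_rotated = gram(rotate_rows(mu, 0.7))
print(all(abs(delta[a][b] - delta_rotated[a][b]) < 1e-9
          for a in range(2) for b in range(2)))
```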
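Proposition 8.12. Suppose $\mathcal{L}(S) = W(\Sigma, p, n; \Delta)$ where $n \ge p$ and $\Sigma > 0$, and assume that $\Delta$ has rank one, say $\Delta = \eta\eta'$ with $\eta \in R^p$. The density of $S$ with respect to $\nu(dS) = dS/|S|^{(p+1)/2}$ is given by
\[ p_1(S \mid \Sigma, \Delta) = p(S \mid \Sigma) \exp\bigl[-\tfrac{1}{2}\eta'\Sigma^{-1}\eta\bigr] H\bigl((\eta'\Sigma^{-1}S\Sigma^{-1}\eta)^{1/2}\bigr) \]
where $p(S \mid \Sigma)$ is the density of a $W(\Sigma, p, n)$ distribution given in Proposition 8.2 and the function $H$ is defined in Example 7.13.
Proof. Consider $X \in \mathcal{L}_{p,n}$ with $\mathcal{L}(X) = N(\mu, I_n \otimes \Sigma)$ where $\mu \in \mathcal{L}_{p,n}$ and $\mu'\mu = \Delta$. Since $S = X'X$ is a maximal invariant under the action of $\mathcal{O}_n$ on $\mathcal{L}_{p,n}$, the results of Example 7.15 show that the density of $S$ with respect to the measure $\nu_0(dS) = (\sqrt{2\pi})^{np}\,\omega(n, p)\,|S|^{(n-p-1)/2}\,dS$ is
\[ h(S) = \int_{\mathcal{O}_n} f(\Gamma X)\,\mu_0(d\Gamma). \]
Here, $f$ is the density of $X$ and $\mu_0$ is the unique invariant probability measure on $\mathcal{O}_n$. The density of $X$ is
\[ f(X) = (\sqrt{2\pi})^{-np}\,|\Sigma|^{-n/2} \exp\bigl[-\tfrac{1}{2}\operatorname{tr}(X - \mu)\Sigma^{-1}(X - \mu)'\bigr]. \]
Substituting this into the expression for $h(S)$ and doing a bit of algebra shows that the density $p_1(S \mid \Sigma, \Delta)$ with respect to $\nu$ is
\[ p_1(S \mid \Sigma, \Delta) = p(S \mid \Sigma) \exp\bigl[-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}\Delta\bigr] \int_{\mathcal{O}_n} \exp\bigl[\operatorname{tr}\Gamma X\Sigma^{-1}\mu'\bigr]\,\mu_0(d\Gamma). \]
The problem is now to evaluate the integral over $\mathcal{O}_n$. It is here where we use the assumption that $\Delta$ has rank one. Since $\Delta = \mu'\mu$, $\mu$ must have rank one so $\mu = \xi\eta'$ where $\xi \in R^n$, $\|\xi\| = 1$, and $\eta \in R^p$, $\Delta = \eta\eta'$. Since $\|\xi\| = 1$, $\xi = \Gamma_1\varepsilon_1$ for some $\Gamma_1 \in \mathcal{O}_n$ where $\varepsilon_1 \in R^n$ is the first unit vector. Setting $u = (\eta'\Sigma^{-1}S\Sigma^{-1}\eta)^{1/2}$, $X\Sigma^{-1}\eta = u\Gamma_2\varepsilon_1$ for some $\Gamma_2 \in \mathcal{O}_n$ as $u\varepsilon_1$ and $X\Sigma^{-1}\eta$ have the same length. Therefore,
\[ \int_{\mathcal{O}_n} \exp\bigl[\operatorname{tr}\Gamma X\Sigma^{-1}\mu'\bigr]\mu_0(d\Gamma) = \int_{\mathcal{O}_n} \exp\bigl[\operatorname{tr}\Gamma X\Sigma^{-1}\eta\xi'\bigr]\mu_0(d\Gamma) = \int_{\mathcal{O}_n} \exp\bigl[\xi'\Gamma X\Sigma^{-1}\eta\bigr]\mu_0(d\Gamma) = \int_{\mathcal{O}_n} \exp\bigl[u\,\varepsilon_1'\Gamma_1'\Gamma\Gamma_2\varepsilon_1\bigr]\mu_0(d\Gamma) = \int_{\mathcal{O}_n} \exp\bigl[u\,\varepsilon_1'\Gamma\varepsilon_1\bigr]\mu_0(d\Gamma) = \int_{\mathcal{O}_n} \exp[u\gamma_{11}]\,\mu_0(d\Gamma) \equiv H(u). \]
The right and left invariance of $\mu_0$ was used in the third to the last equality and $\gamma_{11}$ is the $(1, 1)$ element of $\Gamma$. The function $H$ was evaluated in Example 7.13. Therefore, when $\Delta = \eta\eta'$,
\[ p_1(S \mid \Sigma, \Delta) = p(S \mid \Sigma) \exp\bigl[-\tfrac{1}{2}\eta'\Sigma^{-1}\eta\bigr] H\bigl((\eta'\Sigma^{-1}S\Sigma^{-1}\eta)^{1/2}\bigr). \]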
The final result of this section is the analog of Proposition 8.5 for the noncentral Wishart distribution.
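Proposition 8.13. Consider $X \in \mathcal{L}_{p,n}$ where $\mathcal{L}(X) = N(\mu, Q \otimes \Sigma)$ and let $S = X'PX$ where $P \ge 0$ is $n \times n$. Write $P = A^2$ with $A \ge 0$. If $B = AQA$ is a rank $k$ orthogonal projection and if $AQP\mu = A\mu$, then
\[ \mathcal{L}(S) = W(\Sigma, p, k; \mu'P\mu). \]
Proof. The proof of this result is quite similar to that of Proposition 8.5 and is left to the reader.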
It should be noted that there is not an analog of Proposition 8.7 for the noncentral Wishart distribution, at least as far as I know. Certainly, Proposition 8.7 is false as stated when S is noncentral Wishart.
8.4. DISTRIBUTIONS RELATED TO LIKELIHOOD RATIO TESTS
In the next two chapters, statistics that are the ratio of determinants of Wishart matrices arise as test statistics related to likelihood ratio tests.
Since the techniques for deriving the distributions of these statistics are intimately connected with properties of the Wishart distribution, we have chosen to treat this topic here rather than interrupt the flow of the succeeding chapters with such considerations.
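Let $X \in \mathcal{L}_{p,m}$ and $S \in \mathcal{S}_p^+$ be independent and suppose that $\mathcal{L}(X) = N(\mu, I_m \otimes \Sigma)$ and $\mathcal{L}(S) = W(\Sigma, p, n)$ where $n \ge p$ and $\Sigma > 0$. We are interested in deriving the distribution of the random variable
\[ U = \frac{|S|}{|S + X'X|} \]
for some special values of the mean matrix $\mu$ of $X$. The argument below shows that the distribution of $U$ depends on $(\mu, \Sigma)$ only through $\Sigma^{-1/2}\mu'\mu\Sigma^{-1/2}$ where $\Sigma^{1/2}$ is the positive definite square root of $\Sigma$. Let $S = \Sigma^{1/2}S_1\Sigma^{1/2}$ and $Y = X\Sigma^{-1/2}$. Then $S_1$ and $Y$ are independent, $\mathcal{L}(S_1) = W(I, p, n)$, and $\mathcal{L}(Y) = N(\mu\Sigma^{-1/2}, I_m \otimes I_p)$. Also,
\[ U = \frac{|S|}{|S + X'X|} = \frac{|S_1|}{|S_1 + Y'Y|}. \]
However, the discussion of the previous section shows that $Y'Y$ has a noncentral Wishart distribution, say $\mathcal{L}(Y'Y) = W(I, p, m; \Delta)$ where $\Delta = \Sigma^{-1/2}\mu'\mu\Sigma^{-1/2}$. In the following discussion we take $\Sigma = I_p$ and denote the distribution of $U$ by
\[ \mathcal{L}(U) = U(n, m, p; \Delta) \]
where $\Delta = \mu'\mu$. When $\mu = 0$, the notation
\[ \mathcal{L}(U) = U(n, m, p) \]
is used. In the case that $p = 1$,
\[ U = \frac{S}{S + X'X} \]
where $\mathcal{L}(S) = \chi_n^2$. Since $\mathcal{L}(X) = N(\mu, I_m)$, $\mathcal{L}(X'X) = \chi_m^2(\Delta)$ where $\Delta = \mu'\mu \ge 0$. Thus
\[ U(n, m, 1; \Delta) = \mathcal{L}\Bigl(\frac{\chi_n^2}{\chi_n^2 + \chi_m^2(\Delta)}\Bigr). \]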
When $\chi_m^2(\Delta)$ and $\chi_n^2$ are independent, the distribution of the ratio
\[ F(m, n; \Delta) = \frac{\chi_m^2(\Delta)}{\chi_n^2} \]
is called a noncentral $F$ distribution with parameters $m$, $n$, and $\Delta$. When $\Delta = 0$, the distribution of $F(m, n; 0)$ is denoted by $F_{m,n}$ and is simply called an $F$ distribution with $(m, n)$ degrees of freedom. It should be noted that this usage is not standard as the above ratio has not been normalized by the constant $n/m$. At times, the relationship between the $F$ distribution and the beta distribution is useful. It is not difficult to show that, when $\chi_n^2$ and $\chi_m^2$ are independent, the random variable
\[ V = \frac{\chi_n^2}{\chi_n^2 + \chi_m^2} \]
has a beta distribution with parameters $n/2$ and $m/2$, and this is written as $\mathcal{L}(V) = \mathcal{B}(n/2, m/2)$. In other words, $V$ has a density on $(0, 1)$ given by
\[ p(v) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, v^{\alpha - 1}(1 - v)^{\beta - 1} \]
where $\alpha = n/2$ and $\beta = m/2$. More generally, the distribution of the random variable
\[ V(\Delta) = \frac{\chi_n^2}{\chi_n^2 + \chi_m^2(\Delta)} \]
is called a noncentral beta distribution and the notation $\mathcal{L}(V(\Delta)) = \mathcal{B}(n/2, m/2; \Delta)$ is used. In summary, when $p = 1$,
\[ \mathcal{L}(U) = \mathcal{B}(n/2, m/2; \Delta) \]
where $\Delta = \mu'\mu \ge 0$.
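The central beta density above, and the relation $V = 1/(1 + F)$ between $V$ and the unnormalized ratio $F = \chi_m^2/\chi_n^2$, can be sketched with the standard library. The function names below are hypothetical, and only the central case $\Delta = 0$ is covered.

```python
import math

def beta_density(v, n, m):
    """Density at v in (0, 1) of V = chi2_n / (chi2_n + chi2_m), i.e. Beta(n/2, m/2)."""
    a, b = n / 2, m / 2
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(v) + (b - 1) * math.log(1 - v))

def v_from_f(f):
    """V = 1 / (1 + F), where F = chi2_m / chi2_n is the unnormalized ratio."""
    return 1.0 / (1.0 + f)

# n = 4, m = 2 gives Beta(2, 1), whose density is 2v.
print(beta_density(0.25, 4, 2))
print(v_from_f(3.0))   # 0.25
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions when $n$ and $m$ are large.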
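Now, we consider the distribution of $U$ when $m = 1$. In this case, $\mathcal{L}(X') = N(\mu', I_p)$ where $X' \in R^p$ and
\[ U = \frac{|S|}{|S + X'X|} = |I_p + S^{-1}X'X|^{-1} = (1 + XS^{-1}X')^{-1}. \]
The last equality follows from Proposition 1.35.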
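The determinant identity just used can be checked numerically for $p = 2$. The following is a minimal sketch with hypothetical function names; `x` plays the role of the single row $X$.

```python
def det_2x2(s):
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]

def u_statistic_m1(s, x):
    """U = |S| / |S + X'X| for a 2 x 2 matrix S and a single row x (m = 1)."""
    t = [[s[a][b] + x[a] * x[b] for b in range(2)] for a in range(2)]
    return det_2x2(s) / det_2x2(t)

def u_via_quadratic_form(s, x):
    """(1 + X S^{-1} X')^{-1}, using the explicit inverse of a 2 x 2 matrix."""
    det = det_2x2(s)
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    q = sum(x[a] * inv[a][b] * x[b] for a in range(2) for b in range(2))
    return 1.0 / (1.0 + q)

s = [[2.0, 0.0], [0.0, 4.0]]
x = [1.0, 2.0]
print(u_statistic_m1(s, x), u_via_quadratic_form(s, x))   # both equal 0.4
```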
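Proposition 8.14. When $m = 1$,
\[ \mathcal{L}(U) = \mathcal{B}\Bigl(\frac{n - p + 1}{2}, \frac{p}{2}; \delta\Bigr) \]
where $\delta = \mu\mu' \ge 0$.
Proof. It must be shown that
\[ \mathcal{L}(XS^{-1}X') = F(p, n - p + 1; \delta). \]
For $X$ fixed, $X \ne 0$, Proposition 8.10 shows that
\[ \mathcal{L}\Bigl(\frac{XX'}{XS^{-1}X'}\Bigr) = \chi_{n-p+1}^2 \]
when $\mathcal{L}(S) = W(I, p, n)$. Since this distribution does not depend on $X$, we have that $(XX')/XS^{-1}X'$ and $XX'$ are independent. Further,
\[ \mathcal{L}(XX') = \chi_p^2(\delta) \]
since $\mathcal{L}(X') = N(\mu', I_p)$. Thus
\[ \mathcal{L}(XS^{-1}X') = \mathcal{L}\Bigl(\frac{XX'}{XX'/XS^{-1}X'}\Bigr) = F(p, n - p + 1; \delta). \]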
The next step in studying $\mathcal{L}(U)$ is the case when $m > 1$, $p > 1$, but $\mu = 0$.
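Proposition 8.15. Suppose $X$ and $S$ are independent where $\mathcal{L}(S) = W(I, p, n)$ and $\mathcal{L}(X) = N(0, I_m \otimes I_p)$. Then
\[ \mathcal{L}(U) = \mathcal{L}\Bigl(\prod_{i=1}^m U_i\Bigr) \]
where $U_1, \ldots, U_m$ are independent and $\mathcal{L}(U_i) = \mathcal{B}((n - p + i)/2, p/2)$.
Proof. The proof is by induction on $m$ and, when $m = 1$, we know
\[ \mathcal{L}(U) = \mathcal{B}((n - p + 1)/2, p/2). \]
Since $X'X = \sum_1^m X_iX_i'$ where $X$ has rows $X_1', \ldots, X_m'$,
\[ U = \frac{|S|}{|S + X'X|} = \frac{|S|}{|S + X_1X_1'|} \times \frac{|S + X_1X_1'|}{|S + X_1X_1' + \sum_2^m X_iX_i'|}. \]
The first claim is that
\[ U_1 = \frac{|S|}{|S + X_1X_1'|} \]
and
\[ W = \frac{|S + X_1X_1'|}{|S + X_1X_1' + \sum_2^m X_iX_i'|} \]
are independent random variables. Since $X_1, \ldots, X_m$ are independent and independent of $S$, to show $U_1$ and $W$ are independent, it suffices to show that $U_1$ and $S + X_1X_1'$ are independent. To do this, Proposition 7.19 is applicable. The group $Gl_p$ acts on $(S, X_1)$ by
\[ A(S, X_1) = (ASA', AX_1) \]
and the induced group action on $T = S + X_1X_1'$ sends $T$ into $ATA'$. The induced group action is clearly transitive. Obviously, $T$ is an equivariant function and also $U_1$ is an invariant function under the group action on $(S, X_1)$. That $T$ is a sufficient statistic for the parametric family generated by $Gl_p$ and the fixed joint distribution of $(S, X_1)$ is easily checked via the factorization criterion. By Proposition 7.19, $U_1$ and $S + X_1X_1'$ are independent. Therefore,
\[ \mathcal{L}(U) = \mathcal{L}(U_1W) \]
where $U_1$ and $W$ are independent and
\[ \mathcal{L}(U_1) = \mathcal{B}((n - p + 1)/2, p/2). \]
However, $\mathcal{L}(S + X_1X_1') = W(I, p, n + 1)$ and the induction hypothesis applied to $W$ yields
\[ \mathcal{L}(W) = \mathcal{L}\Bigl(\prod_{i=1}^{m-1} W_i\Bigr) \]
where $W_1, \ldots, W_{m-1}$ are independent with
\[ \mathcal{L}(W_i) = \mathcal{B}((n + 1 - p + i)/2, p/2). \]
Setting $U_i = W_{i-1}$, $i = 2, \ldots, m$, we have
\[ \mathcal{L}(U) = \mathcal{L}\Bigl(\prod_{i=1}^m U_i\Bigr) \]
where $U_1, \ldots, U_m$ are independent and
\[ \mathcal{L}(U_i) = \mathcal{B}((n - p + i)/2, p/2). \]
The above proof shows that the $U_i$'s are given by
\[ U_i = \frac{|S + \sum_{j=1}^{i-1} X_jX_j'|}{|S + \sum_{j=1}^{i} X_jX_j'|}, \qquad i = 1, \ldots, m, \]
and that these random variables are independent. Since $\mathcal{L}(S + \sum_{j=1}^{i-1} X_jX_j') = W(I, p, n + i - 1)$, Proposition 8.14 yields
\[ \mathcal{L}(U_i) = \mathcal{B}((n + i - p)/2, p/2). \]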
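The factorization of $U$ into the ratios $U_i$ is a telescoping product, and the independence of the factors gives the mean of $U$ at once. The sketch below is illustrative only, with hypothetical function names; it uses exact rational arithmetic, a $2 \times 2$ matrix $S$, and the standard fact that a $\mathcal{B}(a, b)$ variable has mean $a/(a + b)$.

```python
from fractions import Fraction

def det_2x2(s):
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]

def add_outer(s, x):
    """Return S + x x' for a 2 x 2 matrix S and a vector x."""
    return [[s[a][b] + x[a] * x[b] for b in range(2)] for a in range(2)]

def u_factors(s, rows):
    """U_i = |S + sum_{j<i} X_j X_j'| / |S + sum_{j<=i} X_j X_j'|, i = 1, ..., m."""
    factors = []
    current = s
    for x in rows:
        nxt = add_outer(current, x)
        factors.append(Fraction(det_2x2(current), det_2x2(nxt)))
        current = nxt
    return factors

def expected_u(n, m, p):
    """E U = prod_{i=1}^m (n - p + i) / (n + i) when mu = 0 (Proposition 8.15)."""
    out = Fraction(1)
    for i in range(1, m + 1):
        out *= Fraction(n - p + i, n + i)
    return out

s = [[2, 0], [0, 4]]
rows = [[1, 2], [1, 0]]
print(u_factors(s, rows))     # [Fraction(2, 5), Fraction(5, 7)]
print(expected_u(5, 3, 2))    # 5/14
```

The product of the factors, $2/5 \times 5/7 = 2/7$, equals $|S|/|S + X'X| = 8/28$ for these inputs.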
In the special case that $\Delta$ has rank one, the distribution of $U$ can be derived by an argument similar to that in the proof of Proposition 8.15.
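Proposition 8.16. Suppose $X$ and $S$ are independent where $\mathcal{L}(S) = W(I, p, n)$ and $\mathcal{L}(X) = N(\mu, I_m \otimes I_p)$. Assume that $\mu = \xi\eta'$ with $\xi \in R^m$, $\|\xi\| = 1$, and $\eta \in R^p$. Then
\[ \mathcal{L}(U) = \mathcal{L}\Bigl(\prod_{i=1}^m U_i\Bigr) \]
where $U_1, \ldots, U_m$ are independent,
\[ \mathcal{L}(U_i) = \mathcal{B}\Bigl(\frac{n - p + i}{2}, \frac{p}{2}\Bigr), \qquad i = 1, \ldots, m - 1, \]
and
\[ \mathcal{L}(U_m) = \mathcal{B}\Bigl(\frac{n - p + m}{2}, \frac{p}{2}; \eta'\eta\Bigr). \]
Proof. Let $\varepsilon_m$ be the $m$th standard unit vector in $R^m$. Then $\Gamma\xi = \varepsilon_m$ for some $\Gamma \in \mathcal{O}_m$ as $\|\xi\| = \|\varepsilon_m\|$. Since
\[ U = \frac{|S|}{|S + X'X|} = \frac{|S|}{|S + X'\Gamma'\Gamma X|} \]
and $\mathcal{L}(\Gamma X) = N(\varepsilon_m\eta', I_m \otimes I_p)$, we can take $\xi = \varepsilon_m$ without loss of generality. As in the proof of Proposition 8.15, $X'X = \sum_1^m X_iX_i'$ where $X_1, \ldots, X_m$ are independent. Obviously, $\mathcal{L}(X_i) = N(0, I_p)$, $i = 1, \ldots, m - 1$, and $\mathcal{L}(X_m) = N(\eta, I_p)$. Now, write $U = \prod_1^m U_i$ where
\[ U_i = \frac{|S + \sum_{j=1}^{i-1} X_jX_j'|}{|S + \sum_{j=1}^{i} X_jX_j'|}, \qquad i = 1, \ldots, m. \]
The argument given in the proof of Proposition 8.15 shows that
\[ U_1 = \frac{|S|}{|S + X_1X_1'|} \]
and $\{S + X_1X_1', X_2, \ldots, X_m\}$ are independent. The assumption that $X_1$ has mean zero is essential here in order to verify the sufficiency condition necessary to apply Proposition 7.19. Since $U_2, \ldots, U_m$ are functions of $\{S + X_1X_1', X_2, \ldots, X_m\}$, $U_1$ is independent of $\{U_2, \ldots, U_m\}$. Now, we simply repeat this argument $m - 1$ times to conclude that $U_1, \ldots, U_m$ are independent, keeping in mind that $X_1, \ldots, X_{m-1}$ all have mean zero, but $X_m$ need not have mean zero. As noted earlier,
\[ \mathcal{L}(U_i) = \mathcal{B}\Bigl(\frac{n - p + i}{2}, \frac{p}{2}\Bigr), \qquad i = 1, \ldots, m - 1. \]
By Proposition 8.14,
\[ \mathcal{L}(U_m) = \mathcal{B}\Bigl(\frac{n - p + m}{2}, \frac{p}{2}; \eta'\eta\Bigr). \]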
Now, we return to the case when $\mu = 0$. In terms of the notation $\mathcal{L}(U) = U(n, m, p)$, Proposition 8.14 asserts that
\[ U(n, 1, p) = \mathcal{B}\Bigl(\frac{n - p + 1}{2}, \frac{p}{2}\Bigr). \]
Further, Proposition 8.15 can be written
\[ U(n, m, p) = \prod_{i=1}^m U(n + i - 1, 1, p) \]
where this equation means that the distribution $U(n, m, p)$ can be represented as the distribution of the product of $m$ independent random variables with distribution $U(n + i - 1, 1, p)$ for $i = 1, \ldots, m$. An alternative representation of $U(n, m, p)$ in terms of $p$ independent random variables when $m \ge p$ follows. If $m \ge p$ and
\[ U = \frac{|S|}{|S + X'X|} \]
with $\mathcal{L}(S) = W(I, p, n)$ and $\mathcal{L}(X) = N(0, I_m \otimes I_p)$, the matrix $T = X'X$ has a nonsingular Wishart distribution, $\mathcal{L}(T) = W(I, p, m)$. The following technical result provides the basic step for decomposing $U(n, m, p)$ into a product of $p$ independent factors.
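Proposition 8.17. Partition $S$ into $S_{ij}$ where $S_{ij}$ is $p_i \times p_j$, $i, j = 1, 2$, and $p_1 + p_2 = p$. Partition $T$ similarly and let
\[ Z = S_{21}S_{11}^{-1}S_{12} + T_{21}T_{11}^{-1}T_{12} - (S_{21} + T_{21})(S_{11} + T_{11})^{-1}(S_{12} + T_{12}). \]
Then the five random vectors $S_{11}$, $T_{11}$, $S_{22\cdot 1}$, $T_{22\cdot 1}$, and $Z$ are mutually independent. Further,
\[ \mathcal{L}(Z) = W(I_{p_2}, p_2, p_1). \]
Proof. Since $S$ and $T$ are independent by assumption, $(S_{11}, S_{12}, S_{22\cdot 1})$ and $(T_{11}, T_{12}, T_{22\cdot 1})$ are independent. Also, Proposition 8.8 shows that $(S_{11}, S_{12})$ and $S_{22\cdot 1}$ are independent with
\[ \mathcal{L}(S_{22\cdot 1}) = W(I_{p_2}, p_2, n - p_1), \qquad \mathcal{L}(S_{12} \mid S_{11}) = N(0, S_{11} \otimes I_{p_2}), \]
and
\[ \mathcal{L}(S_{11}) = W(I_{p_1}, p_1, n). \]
Similar remarks hold for $(T_{11}, T_{12})$ and $T_{22\cdot 1}$ with $n$ replaced by $m$. Thus the four random vectors $(S_{11}, S_{12})$, $S_{22\cdot 1}$, $(T_{11}, T_{12})$, and $T_{22\cdot 1}$ are mutually independent. Since $Z$ is a function of $(S_{11}, S_{12})$ and $(T_{11}, T_{12})$, the proposition follows if we show that $Z$ is independent of the vector $(S_{11}, T_{11})$. Conditional on $(S_{11}, T_{11})$,
\[ \mathcal{L}\Bigl(\begin{pmatrix} S_{12} \\ T_{12} \end{pmatrix} \Bigm| (S_{11}, T_{11})\Bigr) = N\Bigl(0, \begin{pmatrix} S_{11} & 0 \\ 0 & T_{11} \end{pmatrix} \otimes I_{p_2}\Bigr). \]
Let $A$ ($B$) be the positive definite square root of $S_{11}$ ($T_{11}$). With $V = A^{-1}S_{12}$ and $W = B^{-1}T_{12}$,
\[ \mathcal{L}\Bigl(\begin{pmatrix} V \\ W \end{pmatrix} \Bigm| (S_{11}, T_{11})\Bigr) = N(0, I_{2p_1} \otimes I_{p_2}). \]
Also,
\[ Z = V'V + W'W - (AV + BW)'(A^2 + B^2)^{-1}(AV + BW) = \begin{pmatrix} V \\ W \end{pmatrix}' Q \begin{pmatrix} V \\ W \end{pmatrix} \]
where
\[ Q = I_{2p_1} - \begin{pmatrix} A \\ B \end{pmatrix}(A^2 + B^2)^{-1}(A \;\; B). \]
However, $Q$ is easily shown to be an orthogonal projection of rank $p_1$. By Proposition 8.5,
\[ \mathcal{L}(Z \mid (S_{11}, T_{11})) = W(I_{p_2}, p_2, p_1) \]
for each value of $(S_{11}, T_{11})$. Therefore, $Z$ is independent of $(S_{11}, T_{11})$ and the proof is complete.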
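Proposition 8.18. If $m \ge p$, then
\[ U(n, m, p) = \prod_{i=1}^p U(n - i + 1, m, 1). \]
Proof. By definition,
\[ U(n, m, p) = \mathcal{L}\Bigl(\frac{|S|}{|S + T|}\Bigr) \]
with $n \ge p$. In the notation of Proposition 8.17, partition $S$ and $T$ with $p_1 = 1$ and $p_2 = p - 1$. Then $S_{11}$, $T_{11}$, $S_{22\cdot 1}$, $T_{22\cdot 1}$, and
\[ Z = S_{21}S_{11}^{-1}S_{12} + T_{21}T_{11}^{-1}T_{12} - (S_{21} + T_{21})(S_{11} + T_{11})^{-1}(S_{12} + T_{12}) \]
are mutually independent. However,
\[ |S| = S_{11}|S_{22\cdot 1}| \]
and
\[ |S + T| = (S_{11} + T_{11})\,|S_{22\cdot 1} + T_{22\cdot 1} + Z|. \]
Thus
\[ U = \frac{|S|}{|S + T|} = \frac{S_{11}}{S_{11} + T_{11}} \times \frac{|S_{22\cdot 1}|}{|S_{22\cdot 1} + T_{22\cdot 1} + Z|} \]
and the two factors on the right side of this equality are independent by Proposition 8.17. Obviously,
\[ \mathcal{L}\Bigl(\frac{S_{11}}{S_{11} + T_{11}}\Bigr) = U(n, m, 1). \]
Since $\mathcal{L}(T_{22\cdot 1}) = W(I, p - 1, m - 1)$, $\mathcal{L}(Z) = W(I, p - 1, 1)$, and $T_{22\cdot 1}$ and $Z$ are independent, it follows that
\[ \mathcal{L}(T_{22\cdot 1} + Z) = W(I, p - 1, m). \]
Therefore,
\[ \mathcal{L}\Bigl(\frac{|S_{22\cdot 1}|}{|S_{22\cdot 1} + T_{22\cdot 1} + Z|}\Bigr) = U(n - 1, m, p - 1), \]
which implies the relation
\[ U(n, m, p) = U(n, m, 1)\,U(n - 1, m, p - 1). \]
Now, an easy induction argument establishes
\[ U(n, m, p) = U(n, m, 1)\,U(n - 1, m, 1) \cdots U(n - p + 1, m, 1), \]
which implies that
\[ U(n, m, p) = \prod_{i=1}^p U(n - i + 1, m, 1) \]
and this completes the proof.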
Combining Propositions 8.15 and 8.18 leads to the following.
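Proposition 8.19. If $m \ge p$, then
\[ U(n, m, p) = U(n - p + m, p, m). \]
Proof. For arbitrary $m$, Proposition 8.15 yields
\[ U(n, m, p) = \prod_{i=1}^m \mathcal{B}\Bigl(\frac{n - p + i}{2}, \frac{p}{2}\Bigr) \]
where this notation means that the distribution $U(n, m, p)$ can be represented as the product of $m$ independent beta random variables with the factors in the product having a $\mathcal{B}((n - p + i)/2, p/2)$ distribution. Since
\[ U(n - i + 1, m, 1) = \mathcal{B}\Bigl(\frac{n - i + 1}{2}, \frac{m}{2}\Bigr), \]
Proposition 8.18 implies that
\[ U(n, m, p) = \prod_{i=1}^p \mathcal{B}\Bigl(\frac{n - i + 1}{2}, \frac{m}{2}\Bigr). \]
Applying Proposition 8.15 to $U(n - p + m, p, m)$ yields
\[ U(n - p + m, p, m) = \prod_{i=1}^p \mathcal{B}\Bigl(\frac{n - p + i}{2}, \frac{m}{2}\Bigr), \]
which is the distribution $U(n, m, p)$.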
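A simple consistency check on the relation $U(n, m, p) = U(n - p + m, p, m)$ is that the two beta-product representations from Proposition 8.15 have the same moments. The sketch below is an illustration, not a proof; the function names are hypothetical. It computes $\mathcal{E}U^k$ exactly from the standard beta moment formula $\mathcal{E}V^k = \prod_{r=0}^{k-1} (a + r)/(a + b + r)$ for $\mathcal{L}(V) = \mathcal{B}(a, b)$.

```python
from fractions import Fraction

def beta_moment(a, b, k):
    """E V^k for V ~ Beta(a, b), with a and b given as Fractions."""
    out = Fraction(1)
    for r in range(k):
        out *= (a + r) / (a + b + r)
    return out

def moment_u(n, m, p, k):
    """E U^k for U(n, m, p), using the independent Beta((n-p+i)/2, p/2) factors."""
    out = Fraction(1)
    for i in range(1, m + 1):
        out *= beta_moment(Fraction(n - p + i, 2), Fraction(p, 2), k)
    return out

def dual_parameters(n, m, p):
    """Parameters (n - p + m, p, m) of the equivalent distribution."""
    return (n - p + m, p, m)

n, m, p = 5, 3, 2
for k in (1, 2, 3):
    print(moment_u(n, m, p, k) == moment_u(*dual_parameters(n, m, p), k))
```

Agreement of a few moments is only a necessary consequence of the proposition, but since $U$ takes values in $(0, 1)$, its distribution is in fact determined by the full moment sequence.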
In practice, the relationship $U(n, m, p) = U(n - p + m, p, m)$ shows that it is sufficient to deal with the case that $m \le p$.
PROBLEMS
1. … $n \ge 2$, $\Sigma > 0$. Show that the density of
\[ r = \frac{s_{12}}{\sqrt{s_{11}s_{22}}} \]
can be written as … where $\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$ and $\psi$ is defined as follows. Let $X_1$ and $X_2$ be independent chi-square random variables each with $n$ degrees of freedom. Then $\psi(t) = \mathcal{E}\exp[t(X_1X_2)^{1/2}]$ for $|t| < 1$. Using this representation, prove that $p(r \mid \rho)$ has a monotone likelihood ratio.
2. The gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$, denoted by $G(\alpha, \lambda)$, has the density
\[ p(x \mid \alpha, \lambda) = \frac{x^{\alpha - 1}\exp[-x/\lambda]}{\lambda^{\alpha}\,\Gamma(\alpha)} \]
with respect to Lebesgue measure on $(0, \infty)$.
(i) Show the characteristic function of this distribution is $(1 - i\lambda t)^{-\alpha}$.
(ii) Show that a $G(n/2, 2)$ distribution is that of a $\chi_n^2$ distribution.
3. The above problem suggests that it is natural to view the gamma family as an extension of the chi-squared family by allowing nonintegral degrees of freedom. Since the $W(\Sigma, p, n)$ distribution is a generalization of the chi-squared distribution, it is reasonable to ask if we can define a Wishart distribution for nonintegral degrees of freedom. One way to pose this question is to ask for what values of $\alpha$ is $\phi_\alpha(A) = |I_p - 2iA|^{-\alpha}$, $A \in \mathcal{S}_p$, a characteristic function. (We have taken $\Sigma = I_p$ for convenience.)
(i) Using Proposition 8.3 and Problem 7.1, show that $\phi_\alpha$ is a characteristic function for $\alpha = 1/2, \ldots, (p - 1)/2$ and all real $\alpha > (p - 1)/2$. Give the density that corresponds to $\phi_\alpha$ for $\alpha > (p - 1)/2$. $W(I_p, p, 2\alpha)$ denotes such a distribution.
(ii) For any $\Sigma \ge 0$ and the values of $\alpha$ given in (i), show that $\phi_\alpha(\Sigma A)$, $A \in \mathcal{S}_p$, is a characteristic function.
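4. Let $S$ be a random element of the inner product space $(\mathcal{S}_p, \langle \cdot, \cdot \rangle)$ where $\langle \cdot, \cdot \rangle$ is the usual trace inner product on $\mathcal{S}_p$. Say that $S$ has an $\mathcal{O}_p$-invariant distribution if $\mathcal{L}(S) = \mathcal{L}(\Gamma S\Gamma')$ for each $\Gamma \in \mathcal{O}_p$. Assume $S$ has an $\mathcal{O}_p$-invariant distribution.
(i) Assuming $\mathcal{E}S$ exists, show that $\mathcal{E}S = cI_p$ where $c = \mathcal{E}s_{11}$ and $s_{ij}$ is the $i, j$ element of $S$.
(ii) Let $D \in \mathcal{S}_p$ be diagonal with diagonal elements $d_1, \ldots, d_p$. Show that $\operatorname{var}(\langle D, S\rangle) = (\gamma - \beta)\sum_1^p d_i^2 + \beta(\sum_1^p d_i)^2$ where $\gamma = \operatorname{var}(s_{11})$ and $\beta = \operatorname{cov}(s_{11}, s_{22})$.
(iii) For $A \in \mathcal{S}_p$, show that $\operatorname{var}(\langle A, S\rangle) = (\gamma - \beta)\langle A, A\rangle + \beta\langle I_p, A\rangle^2$. From this conclude that $\operatorname{Cov}(S) = (\gamma - \beta)I_p \otimes I_p + \beta I_p \,\square\, I_p$.
5. Suppose $S \in \mathcal{S}_p^+$ has a density $f$ with respect to Lebesgue measure $dS$ restricted to $\mathcal{S}_p^+$. For each $n \ge p$, show there exists a random matrix $X \in \mathcal{L}_{p,n}$ that has a density with respect to Lebesgue measure on $\mathcal{L}_{p,n}$ and $\mathcal{L}(X'X) = \mathcal{L}(S)$.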
6. Show that holds for all n,, n, equal to 1,2,…,p – 1 or any real number greater than p – 1.
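7. (The inverse Wishart distribution.) Say that a positive definite $S \in \mathcal{S}_p^+$ has an inverse Wishart distribution with parameters $\Lambda$, $p$, and $\nu$ if $\mathcal{L}(S^{-1}) = W(\Lambda^{-1}, p, \nu + p - 1)$. Here $\Lambda \in \mathcal{S}_p^+$ and $\nu$ is a positive integer. The notation $\mathcal{L}(S) = IW(\Lambda, p, \nu)$ signifies that $\mathcal{L}(S^{-1}) = W(\Lambda^{-1}, p, \nu + p - 1)$.
(i) If $\mathcal{L}(S) = IW(\Lambda, p, \nu)$ and $A$ is $r \times p$ of rank $r$, show that $\mathcal{L}(ASA') = IW(A\Lambda A', r, \nu)$.
(ii) If $\mathcal{L}(S) = IW(I_p, p, \nu)$ and $\Gamma \in \mathcal{O}_p$, show that $\mathcal{L}(\Gamma S \Gamma') = \mathcal{L}(S)$.
(iii) If $\mathcal{L}(S) = IW(\Lambda, p, \nu)$, show that $\mathcal{E}(S) = (\nu - 2)^{-1}\Lambda$. Show that $\operatorname{Cov}(S)$ has the form $c_1 \Lambda \otimes \Lambda + c_2 \Lambda \,\square\, \Lambda$. What are $c_1$ and $c_2$?
(iv) Now, partition $S$ into $S_{11}: q \times q$, $S_{12}: q \times r$, and $S_{22}: r \times r$ with $S$ as in (iii). Show that $\mathcal{L}(S_{11}) = IW(\Lambda_{11}, q, \nu)$. Also show that $\mathcal{L}(S_{22 \cdot 1}) = IW(\Lambda_{22 \cdot 1}, r, \nu + q)$.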
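As a numerical sanity check on the mean formula in (iii), the following sketch draws from $IW(I_2, 2, \nu)$ by inverting a Wishart matrix with $\nu + p - 1$ degrees of freedom, as in the definition above. The choices $p = 2$, $\Lambda = I_2$, $\nu = 8$, and all function names are assumptions made for illustration only.

```python
import random


def inverse_wishart_2x2(nu, rng):
    """One draw from IW(I_2, 2, nu), stored as (s11, s12, s22).

    By definition S^{-1} is W(I_2, 2, nu + p - 1) with p = 2, so we build
    W as a sum of outer products of N(0, I_2) vectors and invert it.
    """
    n = nu + 2 - 1
    w11 = w12 = w22 = 0.0
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w11 += x1 * x1
        w12 += x1 * x2
        w22 += x2 * x2
    det = w11 * w22 - w12 * w12
    # Closed-form inverse of the symmetric 2x2 matrix [[w11, w12], [w12, w22]].
    return w22 / det, -w12 / det, w11 / det


def mean_inverse_wishart(nu=8, reps=20000, seed=1):
    """Monte Carlo estimate of E(S) for S ~ IW(I_2, 2, nu)."""
    rng = random.Random(seed)
    total = [0.0, 0.0, 0.0]
    for _ in range(reps):
        draw = inverse_wishart_2x2(nu, rng)
        for k in range(3):
            total[k] += draw[k]
    return tuple(t / reps for t in total)


if __name__ == "__main__":
    # Part (iii) predicts E(S) = I_2 / (nu - 2), i.e. diagonal entries 1/6 for nu = 8.
    print(mean_inverse_wishart())
```

The value $\nu = 8$ is chosen so that the second moments of $S$ are finite and the sample mean settles down quickly; smaller $\nu$ gives much noisier estimates.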
8. (The matric $t$ distribution.) Suppose $X$ is $N(0, I_r \otimes I_p)$ and $S$ is $W(I_p, p, m)$, $m \ge p$. Let $S^{-1/2}$ denote the inverse of the positive definite square root of $S$. When $S$ and $X$ are independent, the matrix $T = XS^{-1/2}$ is said to have a matric $t$ distribution and is denoted by $\mathcal{L}(T) = T(m - p + 1, I_r, I_p)$.
(i) Show that the density of $T$ with respect to Lebesgue measure on $\mathcal{L}_{p,r}$ is given by
Also, show that $\mathcal{L}(T) = \mathcal{L}(\Gamma T \Delta')$ for $\Gamma \in \mathcal{O}_r$ and $\Delta \in \mathcal{O}_p$. Using this, show $\mathcal{E}T = 0$ and $\operatorname{Cov}(T) = c_1 I_r \otimes I_p$ when these exist. Here, $c_1$ is a constant equal to the variance of any element of $T$.
(ii) Suppose $V$ is $IW(I_p, p, \nu)$ and that $T$ given $V$ is $N(0, I_r \otimes V)$. Show that the unconditional distribution of $T$ is $T(\nu, I_r, I_p)$.
(iii) Using Problem 7 and (ii), show that if $T$ is $T(\nu, I_r, I_p)$, and $T_{11}$ is the $k \times q$ upper left-hand corner of $T$, then $T_{11}$ is $T(\nu, I_k, I_q)$.
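One route to the density in part (i) of Problem 8 is to integrate $S$ out of the joint density of $(X, S)$. The computation below is a sketch in standard notation rather than the notation of the text: $\Gamma_p$ denotes the multivariate gamma function, and the density of $W(I_p, p, m)$ for $m \ge p$ is taken as known.

```latex
% Joint density of (X, S) with respect to dX dS, for m >= p:
\[
  (2\pi)^{-rp/2} \exp\!\left[-\tfrac12 \operatorname{tr} X'X\right]
  \cdot
  \frac{|S|^{(m-p-1)/2} \exp\!\left[-\tfrac12 \operatorname{tr} S\right]}
       {2^{mp/2}\,\Gamma_p(m/2)} .
\]
% For fixed S, substitute X = T S^{1/2}: dX = |S|^{r/2} dT and
% tr X'X = tr S T'T.  The integrand in S becomes
\[
  |S|^{(m+r-p-1)/2} \exp\!\left[-\tfrac12 \operatorname{tr} S (I_p + T'T)\right] ,
\]
% and the Wishart normalizing constant
%   \int |S|^{(n-p-1)/2} \exp[-\tfrac12 \operatorname{tr} \Sigma^{-1} S] \, dS
%     = 2^{np/2} \Gamma_p(n/2) |\Sigma|^{n/2},
% applied with n = m + r and \Sigma = (I_p + T'T)^{-1}, gives
\[
  f(T) = \frac{\Gamma_p\!\left(\tfrac{m+r}{2}\right)}
              {\pi^{rp/2}\,\Gamma_p\!\left(\tfrac{m}{2}\right)}
         \, |I_p + T'T|^{-(m+r)/2} .
\]
```

Since $|I_p + (\Gamma T \Delta')'(\Gamma T \Delta')| = |I_p + T'T|$ for orthogonal $\Gamma$ and $\Delta$, the invariance asserted in (i) can also be read off from this form.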
9. (Multivariate $F$ distribution.) Suppose $S_1$ is $W(I_p, p, m)$ (for $m = 1, 2, \ldots$) and is independent of $S_2$, which is $W(I_p, p, \nu + p - 1)$ (for $\nu = 1, 2, \ldots$). The matrix $F = S_2^{-1/2} S_1 S_2^{-1/2}$ has a matric $F$ distribution that is denoted by $F(m, \nu, I_p)$.
(i) If $S$ is $IW(I_p, p, \nu)$ and $V$ given $S$ is $W(S, p, m)$, show that the unconditional distribution of $V$ is $F(m, \nu, I_p)$.
(ii) Suppose $T$ is $T(\nu, I_r, I_p)$. Show that $T'T$ is $F(r, \nu, I_p)$.
(iii) When $r \ge p$, show that the $F(r, \nu, I_p)$ distribution has a density with respect to $dF/|F|^{(p+1)/2}$ given by
(iv) For $r > p$, show that, if $F$ is $F(r, \nu, I_p)$, then $F^{-1}$ is $F(\nu + p - 1, r - p + 1, I_p)$.
(v) If $F$ is $F(r, \nu, I_p)$ and $F_{11}$ is the $q \times q$ upper left block of $F$, use (ii) to show that $F_{11}$ is $F(r, \nu, I_q)$.
(vi) Suppose $X$ is $N(0, I_r \otimes I_p)$ with $r$
+ 1, I,).
10. (Multivariate beta distribution.) Let $S_1$ and $S_2$ be independent and suppose $\mathcal{L}(S_i) = W(I_p, p, m_i)$, $i = 1, 2$, with $m_1 + m_2 > p$. The random matrix $B = (S_1 + S_2)^{-1/2} S_1 (S_1 + S_2)^{-1/2}$ has a $p$-dimensional multivariate beta distribution with parameters $m_1$ and $m_2$. This is written $\mathcal{L}(B) = B(m_1, m_2, I_p)$ (when $p = 1$, this is the univariate beta distribution with parameters $m_1/2$ and $m_2/2$).
(i) If $B$ is $B(m_1, m_2, I_p)$ show that $\mathcal{L}(\Gamma B \Gamma') = \mathcal{L}(B)$ for all $\Gamma \in \mathcal{O}_p$. Use Example 7.16 to conclude that $\mathcal{L}(B) = \mathcal{L}(\Psi D \Psi')$ where $\Psi \in \mathcal{O}_p$ is uniform and is independent of the diagonal matrix $D$ with elements $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$. The distribution of $D$ is determined by specifying the distribution of $\lambda_1, \ldots, \lambda_p$ and this is the distribution of the ordered roots of $(S_1 + S_2)^{-1/2} S_1 (S_1 + S_2)^{-1/2}$.
(ii) With $S_1$ and $S_2$ as in the definition of $B$, show that $S_1^{1/2}(S_1 + S_2)^{-1}S_1^{1/2}$ is $B(m_1, m_2, I_p)$.
(iii) Suppose $F$ is $F(m, \nu, I_p)$. Use (i) and (ii) to show that $(I + F)^{-1}$ is $B(p + \nu - 1, m, I_p)$ and $F(I + F)^{-1}$ is $B(m, p + \nu - 1, I_p)$.
(iv) Suppose $X$ is $N(0, I_r \otimes I_p)$ and that it is independent of $S$, which is $W(I_p, p, m)$. When $r \le p$ and $m > p$, show that $X(S + X'X)^{-1}X'$ is $B(p, r + m - p, I_r)$.
(v) If $B$ is $B(m_1, m_2, I_p)$ and $m_1 \ge p$, show that $\det(B)$ is distributed as $U(m_1, m_2, p)$ in the notation of Section 7.4.
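Part (v) of Problem 10 can be explored by simulation. The sketch below uses the identity $\det(B) = \det(S_1)/\det(S_1 + S_2)$, which follows directly from the definition of $B$, and compares the Monte Carlo mean of $\det(B)$ with $\prod_{j=0}^{p-1} (m_1 - j)/(m_1 + m_2 - j)$, the mean of a product of independent beta variables with parameters $((m_1 - j)/2, m_2/2)$. That product form is the standard representation of this determinant ratio and is used here as an assumption; it is not quoted from Section 7.4. The choices $p = 2$, $m_1 = 5$, $m_2 = 4$ and the function names are illustrative.

```python
import random


def wishart_2x2(m, rng):
    """Draw from W(I_2, 2, m) as a sum of m outer products; returns (s11, s12, s22)."""
    s11 = s12 = s22 = 0.0
    for _ in range(m):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
    return s11, s12, s22


def det_sym(s):
    """Determinant of a symmetric 2x2 matrix stored as (s11, s12, s22)."""
    return s[0] * s[2] - s[1] * s[1]


def det_beta_draw(m1, m2, rng):
    """One draw of det(B), computed as det(S1) / det(S1 + S2)."""
    s1 = wishart_2x2(m1, rng)
    s2 = wishart_2x2(m2, rng)
    total = (s1[0] + s2[0], s1[1] + s2[1], s1[2] + s2[2])
    return det_sym(s1) / det_sym(total)


def mean_det_beta(m1=5, m2=4, reps=20000, seed=2):
    """Monte Carlo estimate of E det(B) for p = 2."""
    rng = random.Random(seed)
    return sum(det_beta_draw(m1, m2, rng) for _ in range(reps)) / reps


def product_beta_mean(m1, m2, p):
    """Mean of a product of independent Beta((m1 - j)/2, m2/2), j = 0, ..., p - 1."""
    out = 1.0
    for j in range(p):
        out *= (m1 - j) / (m1 + m2 - j)
    return out


if __name__ == "__main__":
    print(mean_det_beta(), product_beta_mean(5, 4, 2))
```

Agreement of the two numbers up to simulation error is only a check on first moments; it does not replace the distributional argument the problem asks for.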
NOTES AND REFERENCES
1. The Wishart distribution was first derived in Wishart (1928).
2. For some alternative discussions of the Wishart distribution, see Anderson (1958), Dempster (1969), Rao (1973), and Muirhead (1982).
3. The density function of the noncentral Wishart distribution in the general case is obtained by “evaluating”
(see the proof of Proposition 8.12). The problem of evaluating
for $A \in \mathcal{L}_{n,n}$ has received much attention since the paper of James (1954). Anderson (1946) first gave the noncentral Wishart density when $\mu$ has rank 1 or rank 2. Much of the theory surrounding the evaluation of $\psi$ and series expansions for $\psi$ can be found in Muirhead (1982).
4. Wilks (1932) first proved Proposition 8.15 by calculating all the moments of $U$ and showing these matched the moments of $\prod U_i$. Anderson (1958) also uses the moment method to find the distribution of $U$. This method was used by Box (1949) to provide asymptotic expansions for the distribution of $U$ (see Anderson, 1958, Chapter 8).