F71SM STATISTICAL METHODS
5 MULTIVARIATE DISTRIBUTIONS AND LINEAR COMBINATIONS
5.1 Introduction — several random variables at once
The concepts and descriptions of random variables introduced in section 3 all extend to distributions of several random variables defined simultaneously on a joint sample space; these give us vector random variables, or multivariate distributions. In 2 dimensions we have a pair of r.v.s (X, Y) with cdf (cumulative distribution function) FX,Y(x, y), or just F(x, y), where F(x, y) = P(X ≤ x and Y ≤ y).
Discrete case: pmf (probability mass function) fX,Y(x, y), or just f(x, y), where f(x, y) = P(X = x, Y = y)
→ probabilities of events defined on the r.v.s are evaluated using double sums
Continuous case: pdf (probability density function) fX,Y(x, y), or just f(x, y)
→ probabilities of events defined on the r.v.s are evaluated using double integrals
f(x, y) = ∂²F(x, y)/∂x∂y,   F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) ds dt

5.2 Expectations, product moments, covariance, correlation

E[h(X, Y)] = Σ_x Σ_y h(x, y) f(x, y)  or  ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) f(x, y) dx dy

Mean of X: μX = E[X] = Σ_x Σ_y x f(x, y)  or  ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy, and similarly for μY, E[X²], E[Y²], the variances σX², σY², and so on.

Product moments (about the origin): E[X^r Y^s] = Σ_x Σ_y x^r y^s f(x, y)  or  ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^r y^s f(x, y) dx dy

Product moments (about the means): E[(X − μX)^r (Y − μY)^s] = Σ_x Σ_y (x − μX)^r (y − μY)^s f(x, y)  or  ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μX)^r (y − μY)^s f(x, y) dx dy

The covariance between X and Y: Cov[X, Y] = E[(X − μX)(Y − μY)] = E[XY] − μX μY

The correlation coefficient between X and Y: ρXY = Corr[X, Y] = Cov[X, Y] / (σX σY)
Note: Cov[X, X] = Var[X]
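As a concrete illustration of these definitions, here is a minimal Python sketch (the joint pmf values and supports are made up purely for the example) that computes E[XY], the covariance and the correlation coefficient directly from a tabulated discrete joint pmf.

```python
import numpy as np

# Hypothetical joint pmf f(x, y); rows index x in {0, 1, 2}, columns index y in {0, 1}
f = np.array([[0.10, 0.20],
              [0.30, 0.20],
              [0.10, 0.10]])
x_vals = np.array([0, 1, 2])
y_vals = np.array([0, 1])

# Marginal means: mu_X = sum_x sum_y x f(x, y), and similarly for Y
mu_X = np.sum(x_vals[:, None] * f)
mu_Y = np.sum(y_vals[None, :] * f)

# Product moment about the origin: E[XY] = sum_x sum_y x y f(x, y)
E_XY = np.sum(x_vals[:, None] * y_vals[None, :] * f)

# Variances via E[X^2] - mu_X^2 and E[Y^2] - mu_Y^2
var_X = np.sum(x_vals[:, None] ** 2 * f) - mu_X ** 2
var_Y = np.sum(y_vals[None, :] ** 2 * f) - mu_Y ** 2

cov_XY = E_XY - mu_X * mu_Y                  # Cov[X, Y] = E[XY] - mu_X mu_Y
rho_XY = cov_XY / np.sqrt(var_X * var_Y)     # Corr[X, Y]
print(mu_X, mu_Y, cov_XY, rho_XY)
```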
5.3 Association/linear relationships
The covariance Cov[X, Y] is a measure of the association between X and Y; that is, it indicates the strength of the linear relationship between X and Y, and its sign gives the direction of any such relationship (positive, negative, or zero). It is measured in the units of X multiplied by the units of Y.
Useful results: Cov[aX + b, cY + d] = ac Cov[X, Y]
Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z]
The correlation coefficient is a dimensionless measure of the strength of the association between X and Y ; it has no units of measurement and lies in the range −1 ≤ ρXY ≤ 1.
ρXY = 1 ⇔ perfect positive linear relationship, that is Y = a + bX with b > 0
ρXY = 0 ⇔ no linear relationship
ρXY = −1 ⇔ perfect negative linear relationship, that is Y = a + bX with b < 0
Change of units: if U = a + bX and V = c + dY where b, d > 0, then Corr[U, V] = Corr[X, Y].
Two r.v.s X, Y with Cov[X, Y] = 0 have ρXY = 0 and are said to be uncorrelated.
[Figure: simulated data (200 values in each case) from 2-d r.v.s with various correlations.
Left: ρ = 0 (r = −0.095); center: ρ = +0.9 (r = +0.881); right: ρ = −0.7 (r = −0.746).
r denotes the sample correlation coefficient, the observed sample equivalent of ρ.]
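Scatter plots of this kind are easy to reproduce; the sketch below (seed and parameter values are arbitrary choices matching the middle panel) draws 200 points from a bivariate normal with ρ = 0.9 and reports the sample correlation r.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.9                                   # population correlation for the simulated pair
n = 200

# Draw n points from a bivariate normal with unit variances and correlation rho
cov = np.array([[1.0, rho],
                [rho, 1.0]])
x, y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n).T

r = np.corrcoef(x, y)[0, 1]                 # sample correlation coefficient r
print(r)                                    # close to, but not exactly, 0.9
```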
5.4 Marginal distributions
The distribution of a single r.v. on its own in this context is called a marginal distribution.
Marginal distribution of X:
discrete case: fX(x) = Σ_y f(x, y) = P(X = x);   continuous case: fX(x) = ∫_{−∞}^{∞} f(x, y) dy
Similarly for Y.
To find moments of X, or expectations of functions of X, we can use either the joint pmf/pdf or the marginal pmf/pdf, since, for example,
E[g(X)] = Σ_x Σ_y g(x) f(x, y) = Σ_x g(x) Σ_y f(x, y) = Σ_x g(x) fX(x)
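A quick numerical check of this identity, reusing the same kind of made-up joint pmf table as before: the marginal of X is obtained by summing the joint pmf over y, and E[g(X)] comes out the same whether computed from the joint or the marginal pmf.

```python
import numpy as np

# Hypothetical joint pmf f(x, y); rows index x in {0, 1, 2}, columns index y in {0, 1}
f = np.array([[0.10, 0.20],
              [0.30, 0.20],
              [0.10, 0.10]])
x_vals = np.array([0, 1, 2])

# Marginal pmf of X: f_X(x) = sum_y f(x, y)
f_X = f.sum(axis=1)

g = lambda x: x ** 2                        # example choice of g

# E[g(X)] from the joint pmf and from the marginal pmf agree
E_joint = np.sum(g(x_vals)[:, None] * f)
E_marginal = np.sum(g(x_vals) * f_X)
print(f_X, E_joint, E_marginal)
```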
5.5 Conditional distributions
For Y given X = x:  fY|x(y|x) = f(x, y)/fX(x), for x such that fX(x) ≠ 0
In the discrete case, fY|x(y|x) = P(Y = y | X = x)
(we can drop the subscript Y|x if the context is clear).
The conditional mean of Y given X = x is the mean of the conditional distribution, denoted E[Y|X = x] or just E[Y|x] or μY|x, given by E[Y|X = x] = Σ_y y f(y|x)  or  ∫_{−∞}^{∞} y f(y|x) dy.
E[Y|X = x] is a function of x.
The conditional expectation of h(Y) given X = x is denoted E[h(Y)|X = x] or just E[h(Y)|x], given by E[h(Y)|X = x] = Σ_y h(y) f(y|x)  or  ∫_{−∞}^{∞} h(y) f(y|x) dy. Again a function of x.
The conditional variance of Y given X = x is the variance of the conditional distribution, denoted Var[Y|x] or σ²Y|x, given by σ²Y|x = E[(Y − μY|x)² | X = x] = E[Y² | X = x] − μ²Y|x

5.6 Independence
X and Y are independent random variables ⇔ fX,Y (x, y) = fX (x)fY (y) for all (x, y) within their range.
In this case:
For sets C and D, P(X ∈ C, Y ∈ D) = P(X ∈ C) P(Y ∈ D)
E[XY] = ∫∫ xy fX,Y(x, y) dx dy = ∫∫ xy fX(x) fY(y) dx dy = (∫ x fX(x) dx) (∫ y fY(y) dy) = E[X] E[Y]
⇒ Cov[X, Y ] = 0 ⇒ Corr[X, Y ] = 0.
So independence ⇒ zero correlation (note: the converse does not hold)
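A standard counterexample to the converse can be seen by simulation: with X standard normal and Y = X², we have Cov[X, Y] = E[X³] − E[X] E[X²] = 0, so the correlation is zero, yet Y is a function of X and the two are certainly not independent. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2                                  # Y is completely determined by X

print(np.corrcoef(x, y)[0, 1])              # sample correlation close to 0
print(np.corrcoef(np.abs(x), y)[0, 1])      # but |X| and Y are strongly correlated
```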
Worked Example 5.1 A fair coin is tossed three times. Let X be the number of heads in the first two tosses and let Y be the number of tails in all three tosses. (X, Y ) is discrete. The experiment has 8 equally likely outcomes, which are given below with the corresponding values of the variables:
Outcome:   HHH    HHT    HTH    THH    HTT    THT    TTH    TTT
(x, y):   (2,0)  (2,1)  (1,1)  (1,1)  (1,2)  (1,2)  (0,2)  (0,3)
The joint probability mass function and marginal distributions are as follows.
            y = 0   y = 1   y = 2   y = 3   fX(x)
  X = 0       0       0      1/8     1/8     1/4
  X = 1       0      2/8     2/8      0      1/2
  X = 2      1/8     1/8      0       0      1/4
  fY(y)      1/8     3/8     3/8     1/8
P (X = Y ) = 1/4, P (X > Y ) = 1/4
μX = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1,  E[X²] = 0 × 1/4 + 1² × 1/2 + 2² × 1/4 = 3/2,  σX² = 1/2
Similarly μY = 3/2, σY² = 3/4
X ∼ b(2, 1/2), Y ∼ b(3, 1/2)
P(X = 0, Y = 0) = 0, whereas P(X = 0) P(Y = 0) = 1/4 × 1/8 = 1/32, so X and Y are not independent.
Joint moments: The product XY takes values 0, 1, 2 with probabilities 3/8, 2/8, 3/8 respec- tively, so
E[XY ] = 0 × 3/8 + 1 × 2/8 + 2 × 3/8 = 1 ⇒ Cov[X, Y ] = 1 − 1 × 3/2 = −1/2
⇒ Correlation coefficient ρ = −1/2 / √((1/2) × (3/4)) = −0.817
(Note the negative correlation: higher values of X are associated with lower values of Y and vice versa.)
Conditional distributions: Consider, for example, the distribution of Y given X = 1.
P(Y = 0 | X = 1) = 0,  P(Y = 1 | X = 1) = (2/8)/(1/2) = 1/2,  P(Y = 2 | X = 1) = 1/2,  P(Y = 3 | X = 1) = 0
i.e. fY|x(1|1) = fY|x(2|1) = 1/2.
The mean of this conditional distribution is the conditional expectation
E[Y | X = 1] = 1 × P(Y = 1 | X = 1) + 2 × P(Y = 2 | X = 1) = 1 × 1/2 + 2 × 1/2 = 3/2.
Similarly E[Y | X = 0] = 5/2 and E[Y | X = 2] = 1/2.
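The calculations in this example can be verified by brute-force enumeration of the eight equally likely outcomes; a minimal sketch:

```python
from itertools import product
import numpy as np

# Enumerate the 8 equally likely outcomes of three tosses of a fair coin
outcomes = list(product("HT", repeat=3))
X = np.array([sum(c == "H" for c in o[:2]) for o in outcomes])   # heads in first two tosses
Y = np.array([sum(c == "T" for c in o) for o in outcomes])       # tails in all three tosses

p = np.full(len(outcomes), 1 / 8)            # each outcome has probability 1/8
E = lambda values: np.sum(values * p)        # expectation over the 8 outcomes

mu_X, mu_Y = E(X), E(Y)
cov = E(X * Y) - mu_X * mu_Y                                     # -1/2
rho = cov / np.sqrt((E(X**2) - mu_X**2) * (E(Y**2) - mu_Y**2))   # approximately -0.817

# Conditional expectation E[Y | X = 1] = 3/2
E_Y_given_X1 = np.sum(Y[X == 1] * p[X == 1]) / np.sum(p[X == 1])
print(mu_X, mu_Y, cov, rho, E_Y_given_X1)
```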
Worked Example 5.2 Let (X, Y) have joint pdf f(x, y) = e^{−(x+y)}, x > 0, y > 0.
Here f(x, y) = e^{−x} e^{−y} = fX(x) fY(y), the product of two exp(1) pdfs,
so X and Y are independent.
It follows that E[XY ] = E[X]E[Y ] = 1 × 1 = 1
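A quick Monte Carlo confirmation of this value, assuming (as in the example) that X and Y are independent exp(1) variables:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ exp(1), mean 1
y = rng.exponential(scale=1.0, size=1_000_000)   # Y ~ exp(1), independent of X

print(np.mean(x * y))                            # close to E[X]E[Y] = 1
```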
5.7 More than 2 random variables
The definitions and results above can be extended to cases in which we have 3 or more r.v.s. An important generalisation we require here is the definition of independence for a collection
of n r.v.s.
Let X = (X1,X2,…,Xn) be a collection of n r.v.s with joint pmf/pdf fX(x1,x2,…,xn)
and with marginal pmfs/pdfs f1(x1), f2(x2), . . . , fn(xn).
Then the n r.v.s are independent ⇔ fX(x1,x2,…,xn) = f1(x1)f2(x2)···fn(xn) for all
(x1, x2, . . . , xn) within their range. In this case:
• the variables are pairwise independent, that is Xi and Xj are independent, i, j = 1, 2, . . . , n, i ̸= j
• E[g1(X1) g2(X2) ··· gn(Xn)] = E[g1(X1)] E[g2(X2)] ··· E[gn(Xn)]
• and, in particular, E[X1 X2 ··· Xn] = E[X1] E[X2] ··· E[Xn]
Note: pairwise independence does not imply joint independence.
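One classic way to see this, sketched below with a small simulation (an illustration, not part of the original notes): take X1 and X2 to be independent fair-coin indicators and let X3 = X1 XOR X2. Each pair of the three variables is independent, but X3 is completely determined by X1 and X2, so the three are not jointly independent.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)
x3 = x1 ^ x2                                     # determined by x1 and x2

# Pairwise: P(Xi = 1, Xj = 1) = 1/4 = P(Xi = 1)P(Xj = 1) for each pair
print(np.mean(x1 & x3), np.mean(x2 & x3))        # both close to 0.25

# Jointly: P(X1 = 1, X2 = 1, X3 = 1) = 0, not (1/2)^3 = 1/8
print(np.mean(x1 & x2 & x3))                     # exactly 0
```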
5.8 Linear combinations of random variables
Mean and variance
E[aX + bY] = a E[X] + b E[Y]
Var[aX + bY] = Cov[aX + bY, aX + bY] = a² Var[X] + b² Var[Y] + 2ab Cov[X, Y]
X, Y uncorrelated ⇒ Var[aX + bY] = a² Var[X] + b² Var[Y]

For constants a1, a2, …, an:
E[Σ_{i=1}^{n} ai Xi] = Σ_{i=1}^{n} ai E[Xi]
Var[Σ_{i=1}^{n} ai Xi] = Σ_{i=1}^{n} ai² Var[Xi] + Σ_i Σ_{j≠i} ai aj Cov[Xi, Xj]
X1, X2, …, Xn independent ⇒ Var[Σ_{i=1}^{n} ai Xi] = Σ_{i=1}^{n} ai² Var[Xi]

Important special case:
E[X + Y] = E[X] + E[Y]
Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y]
X, Y uncorrelated (in particular, X, Y independent) ⇒ Var[X + Y] = Var[X] + Var[Y]
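These variance formulas can be checked by simulation; in the sketch below the constants a, b and the covariance structure are arbitrary choices made purely for the check.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 2.0, -3.0
var_X, var_Y, cov_XY = 4.0, 9.0, 2.5             # arbitrary, with |cov| <= sd_X * sd_Y

cov_matrix = np.array([[var_X, cov_XY],
                       [cov_XY, var_Y]])
x, y = rng.multivariate_normal([0.0, 0.0], cov_matrix, size=2_000_000).T

# Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab Cov[X, Y]
theory = a**2 * var_X + b**2 * var_Y + 2 * a * b * cov_XY
print(np.var(a * x + b * y), theory)             # the two values agree closely
```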
Pgfs: X, Y independent, S = X + Y ⇒ GS(t) = GX(t) GY(t) (extends to n r.v.s)
Mgfs: X, Y independent, S = X + Y ⇒ MS(t) = MX(t) MY(t) (extends to n r.v.s)
In the case that X, Y are independent r.v.s with probability mass functions fX, fY respectively, we have
fX+Y(s) = P(X + Y = s) = Σ_x P(X = x, Y = s − x) = Σ_x fX(x) fY(s − x) = Σ_y fX(s − y) fY(y)
The mass function of X + Y is called the convolution of the mass functions of X and Y . The concept extends to the sum of n independent r.v.s.
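For pmfs with finite support the convolution can be computed directly; the sketch below uses two fair-die pmfs as an arbitrary example and recovers the familiar triangular pmf of the total score of two dice.

```python
import numpy as np

# pmfs of two independent fair dice, each supported on 1..6
f_X = np.full(6, 1 / 6)
f_Y = np.full(6, 1 / 6)

# Convolution gives the pmf of the sum; its support runs from 1 + 1 = 2 up to 6 + 6 = 12
f_sum = np.convolve(f_X, f_Y)
for s, p in zip(range(2, 13), f_sum):
    print(s, p)                                  # e.g. P(sum = 7) = 6/36
```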
Standard distributions:
• X ∼ b(n, p), Y ∼ b(m, p) with X, Y independent ⇒ X + Y ∼ b(n + m, p)
• X ∼ P(λ1), Y ∼ P(λ2) with X, Y independent ⇒ X + Y ∼ P(λ1 + λ2)
• X, Y ∼ exp(λ) with X, Y independent ⇒ X + Y ∼ gamma(2, λ)
• X ∼ N(μX, σX²), Y ∼ N(μY, σY²) with X, Y independent ⇒ X + Y ∼ N(μX + μY, σX² + σY²) and X − Y ∼ N(μX − μY, σX² + σY²)
• X ∼ χ²_n, Y ∼ χ²_m with X, Y independent ⇒ X + Y ∼ χ²_{n+m}
Worked example 5.3 Apples of a certain variety have weights which are normally dis- tributed about a mean of 120g and with standard deviation 10g. Oranges of a certain variety have weights which are normally distributed about a mean of 130g and with standard deviation 15g. I buy 4 apples and 2 oranges. Find (a) the probability that the total weight of my fruit exceeds 700g, and (b) the symmetrical interval containing 95% probability for the total weight of my fruit.
Let X (Y) be the weight of an apple (orange) to be purchased. X ∼ N(120, 10²), Y ∼ N(130, 15²)
Let W be the total weight of my fruit. Then
W = X1+X2+X3+X4+Y1+Y2
where the Xi’s are i.i.d. copies of X, the Yi’s are i.i.d. copies of Y , and the Xi’s and Yi’s are
independent.
E[W] = 4 × 120 + 2 × 130 = 740 Var[W ] = 4 × 100 + 2 × 225 = 850
W is the sum of six independent normal variables, and so itself is a normal variable, so W ∼N(740,850)
(a) P(W > 700) = P(Z > (700 − 740)/√850) = P(Z > −1.372) = 0.9150
(b) 740 ± (1.96 × √850), i.e. 683g to 797g.
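Both answers can be reproduced with scipy.stats; a sketch of the computation (not part of the original solution):

```python
from scipy.stats import norm
import numpy as np

mu_W = 4 * 120 + 2 * 130                       # E[W] = 740
sd_W = np.sqrt(4 * 10**2 + 2 * 15**2)          # sqrt(Var[W]) = sqrt(850)

# (a) P(W > 700)
print(norm.sf(700, loc=mu_W, scale=sd_W))      # about 0.915

# (b) symmetric interval containing 95% probability
print(norm.interval(0.95, loc=mu_W, scale=sd_W))   # about (683, 797)
```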
Further worked examples
We investigate whether or not the sum and difference of two random variables are correlated. Let X and Y be random variables and let U = X + Y and W = X − Y.
Cov[U, W] = Cov[X + Y, X − Y] = Cov[X, X] + Cov[X, −Y] + Cov[Y, X] + Cov[Y, −Y] = Var[X] − Var[Y]
⇒ U and W are correlated unless X and Y have equal variances.
(X, Y) has pdf f(x, y) = x² + xy, 0