F71SM STATISTICAL METHODS
5 MULTIVARIATE DISTRIBUTIONS AND LINEAR COMBINATIONS
5.1 Introduction — several random variables at once
The concepts and descriptions of random variables introduced in section 3 all extend to distributions of several random variables defined simultaneously on a joint sample space — these give us vector random variables, or multivariate distributions. In two dimensions we have a pair of r.v.s (X, Y) with cdf (cumulative distribution function) FX,Y(x, y), or just F(x, y), where F(x, y) = P(X ≤ x and Y ≤ y).
Discrete case: pmf (probability mass function) fX,Y (x, y) or just f(x, y), where f(x, y) =
P (X = x, Y = y)
→ probabilities of events defined on the r.v.s are evaluated using double sums
Continuous case: pdf (probability density function) fX,Y(x, y), or just f(x, y)
→ probabilities of events defined on the r.v.s are evaluated using double integrals
f(x, y) = ∂²F(x, y)/∂x∂y,   F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) dt ds
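A quick numerical illustration of the pdf/cdf relation above, assuming Python with numpy and scipy available: the sketch integrates the joint pdf f(x, y) = e^{−(x+y)} (x, y > 0), which reappears in Worked Example 5.2, and compares the double integral with the closed-form cdf (1 − e^{−x})(1 − e^{−y}).

```python
import numpy as np
from scipy.integrate import dblquad

def f(x, y):
    # joint pdf f(x, y) = e^{-(x+y)} on x > 0, y > 0 (zero elsewhere)
    return np.exp(-(x + y)) if (x > 0 and y > 0) else 0.0

def F_numeric(x, y):
    # F(x, y) = integral of f(s, t) over 0 < s < x, 0 < t < y;
    # dblquad passes the inner variable (t) first and the outer variable (s) second
    val, _ = dblquad(lambda t, s: f(s, t), 0, x, lambda s: 0, lambda s: y)
    return val

x0, y0 = 1.2, 0.7
print(F_numeric(x0, y0))                         # numerical double integral
print((1 - np.exp(-x0)) * (1 - np.exp(-y0)))     # closed-form cdf, same value
```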
5.2 Expectations, product moments, covariance, correlation
E[h(X, Y)] = ∑_x ∑_y h(x, y) f(x, y)   or   ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) f(x, y) dx dy
Mean of X: µX = E[X] = ∑_x ∑_y x f(x, y)   or   ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy, and similarly for µY, E[X²], E[Y²], the variances σ²X, σ²Y, and so on.
Product moments (about the origin): E[X^r Y^s] = ∑_x ∑_y x^r y^s f(x, y)   or   ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^r y^s f(x, y) dx dy
Product moments (about the means):
E[(X − µX)^r (Y − µY)^s] = ∑_x ∑_y (x − µX)^r (y − µY)^s f(x, y)   or   ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µX)^r (y − µY)^s f(x, y) dx dy
The covariance between X and Y : Cov[X, Y ] = E [(X − µX)(Y − µY )] = E[XY ]− µXµY
The correlation coefficient between X and Y: ρXY = Corr[X, Y] = Cov[X, Y]/(σX σY)
Note: Cov[X,X] = Var[X]
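The definitions above translate directly into a few lines of code. The sketch below, using a small made-up joint pmf (not one from these notes), computes the means, variances, covariance and correlation coefficient of a discrete pair (X, Y).

```python
import math

# illustrative joint pmf stored as {(x, y): probability}; the values are made up
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def E(h):
    # E[h(X, Y)] = sum over (x, y) of h(x, y) f(x, y)
    return sum(h(x, y) * p for (x, y), p in pmf.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: x**2) - mu_x**2
var_y = E(lambda x, y: y**2) - mu_y**2
cov = E(lambda x, y: x * y) - mu_x * mu_y            # Cov[X, Y] = E[XY] - muX muY
rho = cov / math.sqrt(var_x * var_y)                 # Corr[X, Y]
print(mu_x, mu_y, cov, rho)
```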
5.3 Association/linear relationships
The covariance Cov[X, Y ] is a measure of the association between X and Y ; that is, it indicates
the strength of the linear relationship between X and Y (it also gives the direction of any
relationship: positive or negative or zero) — it is measured in the units of xy.
Useful results: Cov[aX + b, cY + d] = acCov[X, Y ]
Cov[X, Y + Z] = Cov[X, Y ] + Cov[X,Z]
The correlation coefficient is a dimensionless measure of the strength of the association
between X and Y ; it has no units of measurement and lies in the range −1 ≤ ρXY ≤ 1.
ρXY = 1⇔ perfect positive linear relationship, that is Y = a+ bX with b > 0
ρXY = 0⇔ no linear relationship
ρXY = −1⇔ perfect negative linear relationship, that is Y = a+ bX with b < 0
Change of units: if U = a + bX and V = c + dY where b, d > 0, then Corr[U, V] = Corr[X, Y]
Two r.v.s X, Y with Cov[X, Y ] = 0 have ρXY = 0 and are said to be uncorrelated.
Simulated data (200 values in each case) from 2-d r.v.s with various correlations.
Left: ρ = 0 (r = −0.095); center: ρ = +0.9 (r = +0.881); right: ρ = −0.7 (r = −0.746).
(r gives the sample correlation coefficient — the observed sample equivalent of ρ)
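Plots like those described above are easy to reproduce. A sketch (assuming numpy) that simulates 200 points from a bivariate normal distribution with a chosen correlation ρ and reports the sample correlation coefficient r:

```python
import numpy as np

rng = np.random.default_rng(1)
for rho in (0.0, 0.9, -0.7):
    cov = [[1.0, rho], [rho, 1.0]]              # unit variances, correlation rho
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=200).T
    r = np.corrcoef(x, y)[0, 1]                 # sample correlation coefficient
    print(f"rho = {rho:+.1f}   r = {r:+.3f}")
```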
5.4 Marginal distributions
The distribution of a single r.v. on its own in this context is called a marginal distribution.
Marginal distribution of X:
discrete case: fX(x) = ∑_y f(x, y) = P(X = x);   continuous case: fX(x) = ∫_{−∞}^{∞} f(x, y) dy
Similarly for Y .
To find moments of X, or expectations of functions of X, we can use the joint pmf/pdf or
the marginal pmf/pdf, since, for example,
E[g(X)] = ∑_x ∑_y g(x) f(x, y) = ∑_x g(x) ∑_y f(x, y) = ∑_x g(x) fX(x)
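A short sketch of the two points above: summing the joint pmf over y gives the marginal pmf of X, and E[g(X)] comes out the same whether it is computed against the joint pmf or the marginal one. The pmf is the illustrative one used earlier.

```python
from collections import defaultdict

pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}   # illustrative joint pmf

fX = defaultdict(float)
for (x, y), p in pmf.items():
    fX[x] += p                                   # fX(x) = sum over y of f(x, y)

g = lambda x: x**2
E_joint = sum(g(x) * p for (x, y), p in pmf.items())      # against the joint pmf
E_marginal = sum(g(x) * p for x, p in fX.items())         # against the marginal pmf
print(dict(fX), E_joint, E_marginal)             # the two expectations agree
```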
5.5 Conditional distributions
For Y given X = x:   fY|x(y|x) = f(x, y)/fX(x),   for x such that fX(x) ≠ 0
In discrete case, fY |x(y|x) = P (Y = y|X = x)
(we can drop the subscript Y |x if the context is clear).
The conditional mean of Y given X = x is the mean of the conditional distribution, denoted E[Y|X = x] or just E[Y|x] or µY|x, given by E[Y|X = x] = ∑_y y f(y|x)   or   ∫_{−∞}^{∞} y f(y|x) dy.
E[Y |X = x] is a function of x.
The conditional expectation of h(Y) given X = x is denoted E[h(Y)|X = x] or just E[h(Y)|x], given by E[h(Y)|X = x] = ∑_y h(y) f(y|x)   or   ∫_{−∞}^{∞} h(y) f(y|x) dy. It is also a function of x.
The conditional variance of Y given X = x is the variance of the conditional distribution, denoted Var[Y|x] or σ²Y|x, given by σ²Y|x = E[(Y − µY|x)² | X = x] = E[Y²|X = x] − µ²Y|x
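A sketch of the conditional quantities above for a discrete pair: divide the "row" f(x, y) by the marginal fX(x) to get the conditional pmf, then take its mean and variance. The joint pmf is again the illustrative one used earlier.

```python
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}   # illustrative joint pmf

def conditional_Y_given(x):
    fx = sum(p for (xx, y), p in pmf.items() if xx == x)      # marginal fX(x)
    return {y: p / fx for (xx, y), p in pmf.items() if xx == x}

cond = conditional_Y_given(1)                                 # pmf of Y given X = 1
mean = sum(y * p for y, p in cond.items())                    # E[Y | X = 1]
var = sum(y**2 * p for y, p in cond.items()) - mean**2        # Var[Y | X = 1]
print(cond, mean, var)
```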
5.6 Independence
X and Y are independent random variables ⇔ fX,Y (x, y) = fX(x)fY (y) for all (x, y) within
their range.
In this case:
For sets C and D, P (X ∈ C, Y ∈ D) = P (X ∈ C)P (Y ∈ D)
E[XY] = ∫_x ∫_y xy fX,Y(x, y) dx dy = ∫_x ∫_y xy fX(x) fY(y) dx dy = ∫_x x fX(x) dx · ∫_y y fY(y) dy = E[X]E[Y]
⇒ Cov[X, Y ] = 0⇒ Corr[X, Y ] = 0.
So independence ⇒ zero correlation (note: the converse does not hold)
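To see that the converse really does fail, here is a standard counterexample (not taken from these notes) checked in code: X uniform on {−1, 0, 1} and Y = X² are uncorrelated but clearly not independent.

```python
from fractions import Fraction as Fr

# X uniform on {-1, 0, 1}, Y = X^2; joint pmf of (X, Y)
pmf = {(-1, 1): Fr(1, 3), (0, 0): Fr(1, 3), (1, 1): Fr(1, 3)}
E = lambda h: sum(h(x, y) * p for (x, y), p in pmf.items())

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print(cov)                                     # 0, so X and Y are uncorrelated
# ...but f(0, 0) = 1/3 while fX(0) fY(0) = (1/3)(1/3) = 1/9, so not independent
print(pmf[(0, 0)], Fr(1, 3) * Fr(1, 3))
```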
Worked Example 5.1 A fair coin is tossed three times. Let X be the number of heads in
the first two tosses and let Y be the number of tails in all three tosses. (X, Y ) is discrete. The
experiment has 8 equally likely outcomes, which are given below with the corresponding values
of the variables:
Outcome:  HHH    HHT    HTH    THH    HTT    THT    TTH    TTT
(x, y):   (2,0)  (2,1)  (1,1)  (1,1)  (1,2)  (1,2)  (0,2)  (0,3)
The joint probability mass function and marginal distributions are as follows.
            y = 0   y = 1   y = 2   y = 3   fX(x)
   x = 0      0       0      1/8     1/8     1/4
   x = 1      0      2/8     2/8      0      1/2
   x = 2     1/8     1/8      0       0      1/4
   fY(y)     1/8     3/8     3/8     1/8      1
P (X = Y ) = 1/4, P (X > Y ) = 1/4
µX = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1,   E[X²] = 0² × 1/4 + 1² × 1/2 + 2² × 1/4 = 3/2,   σ²X = 1/2
Similarly µY = 3/2, σ²Y = 3/4
X ∼ b(2, 1/2), Y ∼ b(3, 1/2)
P (X = 0, Y = 0) = 0 whereas P (X = 0)P (Y = 0) = 1/4 × 1/8 = 1/32, so X and Y are
not independent.
Joint moments: The product XY takes values 0, 1, 2 with probabilities 3/8, 2/8, 3/8 respec-
tively, so
E[XY ] = 0× 3/8 + 1× 2/8 + 2× 3/8 = 1⇒ Cov[X, Y ] = 1− 1× 3/2 = −1/2
⇒ Correlation coefficient ρ = (−1/2)/√((1/2) × (3/4)) = −√(2/3) = −0.816 (note the negative correlation: higher values of X are associated with lower values of Y and vice versa).
Conditional distributions: Consider, for example, the distribution of Y given X = 1.
P (Y = 0|X = 1) = 0, P (Y = 1|X = 1) = (2/8)/(1/2) = 1/2, P (Y = 2|X = 1) = 1/2,
P (Y = 3|X = 1) = 0
i.e. fY |x(1|1) = fY |x(2|1) = 1/2.
The mean of this conditional distribution is the conditional expectation
E[Y |X = 1] = 1× P (Y = 1|X = 1) + 2× P (Y = 2|X = 1) = 1× 1/2 + 2× 1/2 = 3/2.
Similarly E[Y |X = 0] = 5/2 and E[Y |X = 2] = 1/2
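The whole of Worked Example 5.1 can be checked by brute force, as in the sketch below: enumerate the 8 equally likely outcomes, build the joint pmf of (X, Y), and recompute the covariance and correlation quoted above.

```python
from itertools import product
from fractions import Fraction as Fr
from collections import Counter
import math

pmf = Counter()
for toss in product("HT", repeat=3):
    x = toss[:2].count("H")                 # heads in the first two tosses
    y = toss.count("T")                     # tails in all three tosses
    pmf[(x, y)] += Fr(1, 8)

E = lambda h: sum(h(x, y) * p for (x, y), p in pmf.items())
mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: x * x) - mu_x**2
var_y = E(lambda x, y: y * y) - mu_y**2
cov = E(lambda x, y: x * y) - mu_x * mu_y
print(cov)                                       # -1/2
print(float(cov) / math.sqrt(var_x * var_y))     # about -0.816
```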
Worked Example 5.2 Let (X, Y ) have joint pdf f(x, y) = e−(x+y), x > 0, y > 0.
P(X < Y) = ∫_{y=0}^{∞} ∫_{x=0}^{y} e^{−(x+y)} dx dy = ∫_{y=0}^{∞} e^{−y} ∫_{x=0}^{y} e^{−x} dx dy = ∫_{y=0}^{∞} e^{−y}(1 − e^{−y}) dy = 1 − 1/2 = 1/2
fX(x) = ∫_{0}^{∞} e^{−(x+y)} dy = e^{−x} ∫_{0}^{∞} e^{−y} dy = e^{−x}, so X ∼ exp(1). Similarly, Y ∼ exp(1).
fX,Y(x, y) = e^{−(x+y)} = e^{−x} e^{−y} = fX(x) fY(y) for all x > 0, y > 0,
so X and Y are independent.
It follows that E[XY ] = E[X]E[Y ] = 1× 1 = 1
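A quick Monte Carlo confirmation of this example (assuming numpy): with X, Y independent exp(1) variables, the observed proportion of samples with X < Y should be close to 1/2 and the sample mean of XY close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.exponential(1.0, size=n)     # X ~ exp(1)
y = rng.exponential(1.0, size=n)     # Y ~ exp(1), independent of X
print((x < y).mean())                # approx 0.5
print((x * y).mean())                # approx 1.0 = E[X]E[Y]
```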
5.7 More than 2 random variables
The definitions and results above can be extended to cases in which we have 3 or more r.v.s.
An important generalisation we require here is the definition of independence for a collection
of n r.v.s.
Let X = (X1, X2, . . . , Xn) be a collection of n r.v.s with joint pmf/pdf fX(x1, x2, . . . , xn)
and with marginal pmfs/pdfs f1(x1), f2(x2), . . . , fn(xn).
Then the n r.v.s are independent ⇔ fX(x1, x2, . . . , xn) = f1(x1)f2(x2) · · · fn(xn) for all
(x1, x2, . . . , xn) within their range.
In this case:
• the variables are pairwise independent, that is, Xi and Xj are independent for i, j = 1, 2, . . . , n, i ≠ j
• E[g1(X1)g2(X2) · · · gn(Xn)] = E[g1(X1)]E[g2(X2)] · · ·E[gn(Xn)]
• and, in particular, E[X1X2 · · ·Xn] = E[X1]E[X2] · · ·E[Xn]
Note: pairwise independence does not imply joint independence.
5.8 Linear combinations of random variables
Mean and variance
E[aX + bY ] = aE[X] + bE[Y ]
Var[aX + bY] = Cov[aX + bY, aX + bY] = a²Var[X] + b²Var[Y] + 2ab Cov[X, Y]
X, Y independent ⇒ X, Y uncorrelated ⇒ Var[aX + bY] = a²Var[X] + b²Var[Y]
E[∑_{i=1}^{n} a_i X_i] = ∑_{i=1}^{n} a_i E[X_i]

Var[∑_{i=1}^{n} a_i X_i] = ∑_{i=1}^{n} a_i² Var[X_i] + ∑_i ∑_{j≠i} a_i a_j Cov[X_i, X_j]

X1, X2, . . . , Xn independent ⇒ Var[∑_{i=1}^{n} a_i X_i] = ∑_{i=1}^{n} a_i² Var[X_i]
Important special case:
E[X + Y ] = E[X] + E[Y ]
Var[X + Y ] = Var[X] + Var[Y ] + 2Cov[X, Y ]
X, Y independent ⇒ X, Y uncorrelated⇒ Var[X + Y ] = Var[X] + Var[Y ]
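A simulation sketch of the variance formulas above (assuming numpy): for correlated X and Y, the sample variance of aX + bY should match a²Var[X] + b²Var[Y] + 2ab Cov[X, Y]. The constants a, b and the covariance matrix are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = 2.0, -3.0
# X, Y with Var[X] = 1, Var[Y] = 2, Cov[X, Y] = 0.8 (illustrative choices)
samples = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 2.0]], size=500_000)
x, y = samples.T

lhs = np.var(a * x + b * y)                                        # simulated Var[aX + bY]
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2*a*b * np.cov(x, y)[0, 1]
print(lhs, rhs)                                                    # both close to 12.4
```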
Pgfs: X, Y independent, S = X + Y ⇒ GS(t) = GX(t)GY (t) (extends to n r.v.s)
Mgfs: X, Y independent, S = X + Y ⇒MS(t) = MX(t)MY (t) (extends to n r.v.s)
In the case that X, Y are independent r.v.s with probability mass functions fX , fY respec-
tively, we have
fX+Y(s) = P(X + Y = s) = ∑_x P(X = x, Y = s − x) = ∑_x fX(x) fY(s − x) = ∑_y fX(s − y) fY(y)
The mass function of X + Y is called the convolution of the mass functions of X and Y .
The concept extends to the sum of n independent r.v.s.
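The convolution formula is easy to code for small supports; the sketch below convolves the pmfs of two independent fair dice (an illustrative choice, not from the notes) to get the familiar triangular pmf of their sum.

```python
from fractions import Fraction as Fr
from collections import defaultdict

fX = {k: Fr(1, 6) for k in range(1, 7)}      # pmf of one fair die
fY = {k: Fr(1, 6) for k in range(1, 7)}      # pmf of a second, independent die

fS = defaultdict(Fr)
for x, px in fX.items():
    for y, py in fY.items():
        fS[x + y] += px * py                 # fS(s) = sum_x fX(x) fY(s - x)

print(dict(sorted(fS.items())))              # triangular pmf on 2, 3, ..., 12
```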
Standard distributions:
• X ∼ b(n, p), Y ∼ b(m, p) with X, Y independent ⇒ X + Y ∼ b(n+m, p)
• X ∼ P (λ1), Y ∼ P (λ2) with X, Y independent ⇒ X + Y ∼ P (λ1 + λ2)
• X, Y ∼ exp(λ) with X, Y independent ⇒ X + Y ∼ gamma(2, λ)
• X ∼ N(µX, σ²X), Y ∼ N(µY, σ²Y) with X, Y independent ⇒ X + Y ∼ N(µX + µY, σ²X + σ²Y) and X − Y ∼ N(µX − µY, σ²X + σ²Y)
• X ∼ χ²_n, Y ∼ χ²_m with X, Y independent ⇒ X + Y ∼ χ²_{n+m}
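A numerical spot-check of the first bullet (assuming scipy): convolving the b(n, p) and b(m, p) pmfs reproduces the b(n + m, p) pmf.

```python
import numpy as np
from scipy.stats import binom

n, m, p = 4, 6, 0.3
conv = np.convolve(binom.pmf(np.arange(n + 1), n, p),
                   binom.pmf(np.arange(m + 1), m, p))       # pmf of X + Y by convolution
direct = binom.pmf(np.arange(n + m + 1), n + m, p)           # pmf of b(n + m, p)
print(np.allclose(conv, direct))                             # True
```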
Worked example 5.3 Apples of a certain variety have weights which are normally dis-
tributed about a mean of 120g and with standard deviation 10g. Oranges of a certain variety
have weights which are normally distributed about a mean of 130g and with standard deviation
15g. I buy 4 apples and 2 oranges. Find (a) the probability that the total weight of my fruit
exceeds 700g, and (b) the symmetrical interval containing 95% probability for the total weight
of my fruit.
Solution:
Let X(Y ) be the weight of an apple (orange) to be purchased.
X ∼ N(120, 102), Y ∼ N(130, 152)
Let W be the total weight of my fruit. Then
W = X1 +X2 +X3 +X4 + Y1 + Y2
where the Xi’s are i.i.d. copies of X, the Yi’s are i.i.d. copies of Y , and the Xi’s and Yi’s are
independent.
E[W ] = 4× 120 + 2× 130 = 740
Var[W ] = 4× 100 + 2× 225 = 850
W is the sum of six independent normal variables, and so itself is a normal variable, so
W ∼ N(740, 850)
(a) P(W > 700) = P(Z > (700 − 740)/√850) = P(Z > −1.372) = 0.9150
(b) 740 ± 1.96 × √850, i.e. 683 g to 797 g.
5.9 Further worked examples
5.4 We investigate whether or not the sum and difference of two random variables are corre-
lated. Let X and Y be random variables and let U = X + Y and W = X − Y .
Cov[U,W ] = Cov[X + Y,X − Y ] = Cov[X,X] + Cov[X,−Y ] + Cov[Y,X] + Cov[Y,−Y ]
= Var[X]− Var[Y ]
⇒ U and W are correlated unless X and Y have equal variances.
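A simulation sketch of this result (assuming numpy): the sample covariance of U and W should be close to Var[X] − Var[Y], so it vanishes only when the variances are equal. The variances used are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000
for var_x, var_y in [(1.0, 1.0), (1.0, 4.0)]:
    x = rng.normal(0.0, np.sqrt(var_x), size=n)
    y = rng.normal(0.0, np.sqrt(var_y), size=n)
    u, w = x + y, x - y
    print(var_x - var_y, np.cov(u, w)[0, 1])    # theoretical vs simulated Cov[U, W]
```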
5.5 (X, Y) has pdf f(x, y) = x² + xy/3, 0 < x < 1, 0 < y < 2.
Cdf:
F(x, y) = ∫_{u=0}^{x} ∫_{v=0}^{y} f(u, v) dv du = ∫_{u=0}^{x} ( ∫_{v=0}^{y} (u² + uv/3) dv ) du
= ∫_{u=0}^{x} [u²v + uv²/6]_{v=0}^{y} du = ∫_{u=0}^{x} (yu² + y²u/6) du
= [yu³/3 + y²u²/12]_{u=0}^{x} = x³y/3 + x²y²/12,   0 < x < 1, 0 < y < 2.
Note that F (1, 2) = 1.
Find P (X + Y < 1).
P(X + Y < 1) = ∫_{x=0}^{1} ∫_{y=0}^{1−x} (x² + xy/3) dy dx = ∫_{x=0}^{1} [x²y + xy²/6]_{y=0}^{1−x} dx
= ∫_{x=0}^{1} ( x²(1 − x) + x(1 − x)²/6 ) dx = · · · = 7/72
Marginal distributions:
fX(x) = ∫_{0}^{2} (x² + xy/3) dy = [x²y + xy²/6]_{0}^{2} = 2x² + 2x/3,   0 < x < 1
fY(y) = ∫_{0}^{1} (x² + xy/3) dx = (y + 2)/6,   0 < y < 2

FX(x) = ∫_{0}^{x} (2u² + 2u/3) du = (2x³ + x²)/3,   0 < x < 1

FY(y) = ∫_{0}^{y} ((v + 2)/6) dv = (y² + 4y)/12,   0 < y < 2

µX = ∫_{0}^{1} (2x³ + 2x²/3) dx = 13/18,   E[X²] = · · · = 17/30   ⇒ σ²X = 0.0451

µY = ∫_{0}^{2} ((y² + 2y)/6) dy = 10/9,   E[Y²] = · · · = 14/9   ⇒ σ²Y = 0.3210
Conditional distributions:
fY|x(y|x) = (x² + xy/3) / (2x² + 2x/3) = (3x + y)/(6x + 2),   0 < y < 2, 0 < x < 1

E[Y|X = x] = 1/(6x + 2) ∫_{0}^{2} y(3x + y) dy = · · · = 1 + 1/(9x + 3)
(So, as the value of x increases from 0 to 1, the conditional expectation of Y falls from
4/3 to 13/12)
Joint moments:
E[XY] = ∫_{x=0}^{1} ∫_{y=0}^{2} xy(x² + xy/3) dy dx = ∫_{x=0}^{1} [x³y²/2 + x²y³/9]_{y=0}^{2} dx
= ∫_{0}^{1} (2x³ + 8x²/9) dx = 43/54

⇒ Cov[X, Y] = 43/54 − (13/18) × (10/9) = −0.006173

⇒ Correlation coefficient ρ = −0.006173/√(0.0451 × 0.3210) = −0.051
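For anyone who wants to verify the arithmetic in this example, a short symbolic check (assuming sympy is available; not part of the original notes):

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f = x**2 + x*y/3                                                 # joint pdf on (0,1) x (0,2)

P = sp.integrate(f, (y, 0, 1 - x), (x, 0, 1))                    # P(X + Y < 1) = 7/72
muX = sp.integrate(x * sp.integrate(f, (y, 0, 2)), (x, 0, 1))    # 13/18
muY = sp.integrate(y * sp.integrate(f, (x, 0, 1)), (y, 0, 2))    # 10/9
EXY = sp.integrate(x * y * f, (y, 0, 2), (x, 0, 1))              # 43/54
print(P, muX, muY, EXY - muX * muY)                              # covariance -1/162
```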
5.6 Let (X, Y ) have probability mass function f(0, 0) = 0.25, f(0.5, 0.6) = 0.5, f(1, 1) = 0.25.
Note that the possible values of (X, Y ) lie very close to a straight line.
µX = 0 × 0.25 + 0.5 × 0.5 + 1 × 0.25 = 0.5,   E[X²] = 0.375 ⇒ σ²X = 0.125
µY = 0 × 0.25 + 0.6 × 0.5 + 1 × 0.25 = 0.55,   E[Y²] = 0.43 ⇒ σ²Y = 0.1275
E[XY] = 0 × 0.25 + 0.3 × 0.5 + 1 × 0.25 = 0.4
⇒ Correlation coefficient ρ = (0.4 − 0.5 × 0.55)/√(0.125 × 0.1275) = 0.990 (which is very close to +1)
5.7 Consider an experiment with three possible mutually exclusive outcomes A, B, C, which
occur with probabilities p, r, s respectively, where p + r + s = 1. Let X, Y, Z be the
numbers of occurrences of A, B, and C respectively in n independent repetitions of the
experiment.
X ∼ b(n, p), Y ∼ b(n, r), X + Y ∼ b(n, p+ r)
⇒ Var[X] = np(1 − p), Var[Y] = nr(1 − r), Var[X + Y] = n(p + r)(1 − p − r)
Var[X + Y ] = Var[X] + Var[Y ] + 2Cov[X, Y ]
⇒ 2 Cov[X, Y ] = n(p+ r)(1− p− r)− np(1− p)− nr(1− r)
⇒ Cov[X, Y ] = −npr
⇒ ρXY = −npr / √(np(1 − p) × nr(1 − r)) = −√( pr / ((1 − p)(1 − r)) )
So, for example, the correlation coefficient between the number of 1's and 6's in a fixed number of throws of a fair six-sided die is ρ = −√( ((1/6) × (1/6)) / ((5/6) × (5/6)) ) = −0.2.
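A simulation check of the die result (assuming numpy): the counts of 1's and 6's in repeated blocks of throws should have sample correlation near −0.2, whatever the number of throws per block (60 here is just an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(3)
counts = rng.multinomial(60, [1/6] * 6, size=200_000)   # 60 throws per repetition
ones, sixes = counts[:, 0], counts[:, 5]                # numbers of 1's and 6's
print(np.corrcoef(ones, sixes)[0, 1])                   # about -0.2
```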
5.8 Consider two related, variable sums of money, X and Y, such that the former necessarily exceeds the latter. Suppose the distribution of (X, Y) has pdf
f(x, y) = 3x over the triangular region 0 < y < x, 0 < x < 1
Then
FX,Y(x, y) = P(X ≤ x, Y ≤ y) = ∫_{v=0}^{y} ∫_{u=v}^{x} 3u du dv
= ∫_{v=0}^{y} [3u²/2]_{u=v}^{x} dv = (3/2) ∫_{v=0}^{y} (x² − v²) dv
= (3/2) [x²v − v³/3]_{v=0}^{y} = (3/2)(x²y − y³/3) = (3x²y − y³)/2
FX(x) = P (X ≤ x)
= P (X ≤ x, Y ≤ x) in this case
= (3x³ − x³)/2 = x³,   0 < x < 1
fX(x) = dFX/dx = 3x²,   0 < x < 1

E[X] = ∫_{0}^{1} 3x³ dx = 3/4
Alternatively, we have
fX(x) = ∫_{y=0}^{x} 3x dy = [3xy]_{y=0}^{x} = 3x²,   0 < x < 1
Conditional distribution:
fY|x(y|x) = fX,Y(x, y)/fX(x) = 3x/(3x²) = 1/x,   0 < y < x, 0 < x < 1

⇒ E[Y|X = x] = ∫_{y=0}^{x} y (1/x) dy = (1/x)(x²/2) = x/2
You can also verify that fY(y) = (3/2)(1 − y²), 0 < y < 1, and FY(y) = (1/2)(3y − y³), 0 < y < 1.
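As with Example 5.5, the quantities here can be confirmed symbolically; a sketch assuming sympy (not part of the original notes):

```python
import sympy as sp

x, y, v = sp.symbols("x y v", positive=True)

fX = sp.integrate(3*x, (y, 0, x))                # marginal of X: 3x^2 on 0 < x < 1
EX = sp.integrate(x * fX, (x, 0, 1))             # E[X] = 3/4
E_Y_given_x = sp.integrate(y / x, (y, 0, x))     # E[Y | X = x] = x/2
fY = sp.integrate(3*x, (x, y, 1))                # marginal of Y: 3(1 - y^2)/2 on 0 < y < 1
FY = sp.integrate(fY.subs(y, v), (v, 0, y))      # cdf of Y: (3y - y^3)/2
print(fX, EX, E_Y_given_x, sp.expand(fY), sp.expand(FY))
```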