4 Sampling distributions of sample statistics
Let $X_1, \ldots, X_n$ be a random sample from a distribution $F_X(x)$. A statistic is a function of the data,
$$h(X_1, \ldots, X_n) .$$
The value of this statistic will usually be different for different samples. As the
sample data is random, the statistic is also a random variable. If we repeatedly
drew samples of size n, calculating and recording the value of the sample statistic
each time, then we would build up its probability distribution. The probability
distribution of a sample statistic is referred to as its sampling distribution.
In this section we will see how to analytically determine the sampling distri-
butions of some statistics, while with certain others we can appeal to the central
limit theorem. Simulation techniques can also be used to investigate sampling
distributions of statistics empirically.
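For instance, the following R sketch (an illustration only; the Exp(1) population, the sample median as the statistic, and the number of replications are all arbitrary choices) builds up such a sampling distribution empirically:

set.seed(1)
n <- 25
# 10,000 sample medians from repeated Exp(1) samples of size n
medians <- replicate(10000, median(rexp(n, rate = 1)))
hist(medians, breaks = 50, xlab = "sample median",
     main = "Empirical sampling distribution of the median")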
4.1 Sample mean
The mean and variance of the distribution $F_X(x)$ are denoted by $\mu$ and $\sigma^2$ respectively. In the case that the distribution is continuous with p.d.f. $f_X(x)$,
$$\mu = E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$
$$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx = \int_{-\infty}^{\infty} x^2 f_X(x)\, dx - \mu^2 .$$
When the distribution is discrete with p.m.f. $p_X(x)$, $\mu$ and $\sigma^2$ are defined by:
$$\mu = E(X) = \sum_{x \in R_X} x\, p_X(x)$$
$$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \sum_{x \in R_X} (x - \mu)^2 p_X(x) = \sum_{x \in R_X} x^2 p_X(x) - \mu^2 ,$$
where $R_X$ is the range space of $X$.
The random variables $X_1, \ldots, X_n$ are assumed to be independent and identically distributed (often abbreviated to i.i.d.) random variables, each being distributed as $F_X(x)$. This means that $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$ for $i = 1, \ldots, n$.
The sample mean of the $n$ sample variables is:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .$$
It is straightforward to calculate the mean of the sampling (probability) distribution of $\bar{X}$ as follows:
$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + \cdots + X_n)\right] = \frac{1}{n}\left[E(X_1) + \cdots + E(X_n)\right] = \frac{n\mu}{n} = \mu ,$$
while the variance is
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + \cdots + X_n)\right] = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)\right] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} .$$
Here we have used $\mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)$, which holds because the $X_i$ are independent.
These results tell us that the sampling distribution of the sample mean X̄
is centered on the common mean µ of each of the sample variables X1, . . . , Xn
(i.e. the mean of the distribution from which the sample is obtained) and has
variance equal to the common variance of the Xi divided by n. Thus, as the sam-
ple size n increases, the sampling distribution of X̄ becomes more concentrated
around the true mean µ.
In the above discussion nothing specific has been said regarding the actual
distribution from which the Xi have been sampled. All we are assuming is that
the mean and variance of the underlying distribution are both finite.
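We can check these two results by simulation; the sketch below uses a Poisson(4) population (an arbitrary choice, for which µ = σ² = 4):

set.seed(2)
n <- 20
# 50,000 sample means from a Poisson(4) population (mu = sigma^2 = 4)
xbars <- replicate(50000, mean(rpois(n, lambda = 4)))
mean(xbars)   # close to mu = 4
var(xbars)    # close to sigma^2 / n = 4/20 = 0.2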
4.1.1 Normally distributed data
In the special case that the $X_i$ are normally distributed, we can make use of some important results. Let the random variable $X \sim N(\mu_X, \sigma_X^2)$ and let the random variable $Y \sim N(\mu_Y, \sigma_Y^2)$, independently of $X$. Then we have the following results:

(i) $X + Y \sim N(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2)$

(ii) $X - Y \sim N(\mu_X - \mu_Y,\; \sigma_X^2 + \sigma_Y^2)$

(iii) In general, $c_1 X + c_2 Y \sim N(c_1 \mu_X + c_2 \mu_Y,\; c_1^2 \sigma_X^2 + c_2^2 \sigma_Y^2)$; $c_1 \neq 0$, $c_2 \neq 0$.
These results extend in a straightforward manner to the linear combination of $n$ independent normal random variables. Let $X_1, \ldots, X_n$ be $n$ independent normally distributed random variables with $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$ for $i = 1, \ldots, n$. Thus, here the normal distributions for different $X_i$ may have different means and variances. We then have that
$$\sum_{i=1}^{n} c_i X_i \sim N\left(\sum_{i=1}^{n} c_i \mu_i ,\; \sum_{i=1}^{n} c_i^2 \sigma_i^2\right)$$
where the $c_i \in \mathbb{R}$.
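As an illustration, simulated values of a linear combination match the stated normal distribution; the constants and parameters below are arbitrary:

set.seed(5)
# W = 2*X - Y with X ~ N(1, 2^2) and Y ~ N(3, 1^2), independently
w <- 2 * rnorm(50000, mean = 1, sd = 2) - rnorm(50000, mean = 3, sd = 1)
mean(w)   # close to 2(1) - 3 = -1
var(w)    # close to 2^2 * 4 + (-1)^2 * 1 = 17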
If now the $X_i$ in the sample are i.i.d. $N(\mu, \sigma^2)$ random variables then the sample mean, $\bar{X}$, is a linear combination of the $X_i$ (with $c_i = 1/n$, $i = 1, \ldots, n$, using the notation above). Thus, $\bar{X}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$, i.e. $\bar{X} \sim N(\mu, \sigma^2/n)$. This result enables us to make probabilistic statements about the mean under the assumption of normality.
Example 4.1. (Component lifetime data).
In Chapter 3 we saw that the normal distribution is a reasonable probability model for the lifetime data, and it seems sensible to estimate the two parameters ($\mu$ and $\sigma^2$) of this distribution by the corresponding sample quantities, $\bar{x}$ and $s^2$. For these data $\bar{x} = 334.59$ and $s^2 = 15.288$, and so our fitted model is $X \sim N(334.59, 15.288)$. Under this fitted model for $X$, the mean $\bar{X}$ of a new sample of size 50 from the population follows a $N(334.59, 15.288/50)$ distribution. We can then, for example, estimate the probability that the mean of such a sample exceeds 335:
$$P(\bar{X} > 335.0) = 1 - \Phi\left(\frac{335.0 - 334.59}{\sqrt{15.288/50}}\right) = 1 - \Phi(0.74) = 1 - 0.7704 = 0.2296 .$$
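This calculation is easily reproduced in R, where pnorm gives the normal c.d.f. Φ:

1 - pnorm(335.0, mean = 334.59, sd = sqrt(15.288/50))
# approximately 0.229; the hand calculation above rounds z to 0.74, giving 0.2296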
4.1.2 Using the central limit theorem
In the previous section, we saw that the random quantity X̄ has a sampling
distribution with mean µ and variance σ2/n. In the special case when we are
sampling from a normal distribution, X̄ is also normally distributed. However,
there are many situations when we cannot determine the exact form of the
distribution of X̄. In such circumstances, we may appeal to the central limit
theorem and obtain an approximate distribution.
The central limit theorem: Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. If $\bar{X}_n$ is the mean of a random sample of size $n$ drawn from the distribution of $X$, then the distribution of the statistic
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
tends to the standard normal distribution as $n \to \infty$.
This means that, for a large random sample from a population with mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}_n$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$. Since, for large $n$, $\bar{X}_n \sim N(\mu, \sigma^2/n)$ approximately, we also have that $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ approximately.
There is no need to specify the form of the underlying distribution FX , which
may be either discrete or continuous, in order to use this result. As a consequence
it is of tremendous practical importance.
A common question is ‘how large does n have to be before the normality of
X̄ is reasonable?’ The answer depends on the degree of non-normality of the
underlying distribution from which the sample has been drawn. The more non-
normal FX is, the larger n needs to be. A useful rule-of-thumb is that n should
be at least 30.
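The theorem itself is easy to illustrate by simulation. The sketch below (illustrative values only) draws many samples of size n = 30 from the skewed Exp(1) distribution, for which µ = σ² = 1, and compares an empirical tail probability of the sample mean with the CLT approximation:

set.seed(1)
n <- 30                      # the rule-of-thumb minimum sample size
# 10,000 sample means from the (skewed) Exp(1) distribution
means <- replicate(10000, mean(rexp(n, rate = 1)))
mean(means > 1.2)                            # empirical tail probability
1 - pnorm(1.2, mean = 1, sd = sqrt(1/30))    # CLT approximation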
Example 4.2. (Income data). What is the approximate probability that the
mean gross income based on a new random sample of size n = 500 lies between
33.0 and 33.5 thousand pounds?
The underlying distribution is not normal but we can appeal to the central
limit theorem to say that
$$\bar{X}_{500} \sim N(\mu, \sigma^2/n) \quad \text{approximately.}$$
We may estimate $\mu$ and $\sigma^2$ from the data by $\hat{\mu} = \bar{x} = 33.27$ and $\hat{\sigma}^2 = s^2 = 503.554$ (so $\hat{\sigma} = 22.44$). Therefore, using the fitted values of the parameters, we may estimate the probability as
$$P(33.0 < \bar{X}_{500} < 33.5) \approx \Phi\left(\frac{33.50 - 33.27}{22.44/\sqrt{500}}\right) - \Phi\left(\frac{33.00 - 33.27}{22.44/\sqrt{500}}\right) \approx \Phi(0.23) - \Phi(-0.27) = 0.5910 - 0.3936 = 0.1974 .$$
Hence we estimate the probability that $\bar{X}$ lies between 33.0 and 33.5 thousand pounds to be 0.1974.

4.2 Sample proportion

Suppose now that we have a random sample $X_1, \ldots, X_n$ where the $X_i$ are i.i.d. $\mathrm{Bi}(1, p)$ random variables. Thus, $X_i = 1$ ('success') with probability $p$ and $X_i = 0$ ('failure') with probability $1 - p$. We know that $E(X_i) = p$ and $\mathrm{Var}(X_i) = p(1 - p)$ for $i = 1, \ldots, n$. The proportion of cases in the sample who have $X_i = 1$, in other words the proportion of 'successes', is given by
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i .$$
We have that $E(\bar{X}_n) = p$ and $\mathrm{Var}(\bar{X}_n) = \frac{p(1-p)}{n}$. By the central limit theorem, for large $n$, $\bar{X}_n$ is approximately distributed as $N\left(p, \frac{p(1-p)}{n}\right)$, which enables us to easily make probabilistic statements about the proportion of 'successes' in a sample of size $n$. We can also say that, for large $n$, the total number of 'successes' in the sample, given by $\sum_{i=1}^{n} X_i$, is approximately normally distributed with mean $np$ and variance $np(1-p)$. Recall that, for the normal approximation to be reasonable in this context, we require that
$$n \geq 9 \max\left\{\frac{1-p}{p}, \frac{p}{1-p}\right\} .$$

Example 4.3. Suppose that, in a particular country, the unemployment rate is 9.2%, and that a random sample of 400 individuals is obtained. What are the approximate probabilities that:

(i) forty or fewer were unemployed;

(ii) the proportion unemployed is greater than 0.125?

Solution: (i) For $i = 1, \ldots, n$ let the random variable $X_i$ satisfy
$$X_i = \begin{cases} 1 & \text{if the } i\text{th worker is unemployed} \\ 0 & \text{otherwise.} \end{cases}$$
From the question, $P(X_i = 1) = 0.092$ and $P(X_i = 0) = 0.908$. We have $n = 400 \geq 9 \max\{0.101,\, 9.870\} = 88.8$, so the normal approximation will be valid. Note that $np = 400 \times 0.092 = 36.8$ and $np(1-p) = 400 \times 0.092 \times 0.908 = 33.414$, and $\sum_{i=1}^{n} X_i \sim N(np, np(1-p))$ approximately. Applying a continuity correction of 0.5,
$$P\left(\sum_{i=1}^{400} X_i \leq 40\right) = P\left(\frac{\sum_{i=1}^{400} X_i - 36.8}{\sqrt{33.414}} \leq \frac{40.5 - 36.8}{\sqrt{33.414}}\right) \approx P(Z \leq 0.640) = \Phi(0.640) = 0.7390 ,$$
where $Z \sim N(0, 1)$.

(ii) Here, $\frac{p(1-p)}{n} = \frac{0.092 \times 0.908}{400} = 0.0002088$. Thus,
$$P(\bar{X}_{400} > 0.125) = P\left(\frac{\bar{X}_{400} - 0.092}{\sqrt{0.0002088}} > \frac{0.125 - 0.092}{\sqrt{0.0002088}}\right) \approx 1 - \Phi(2.284) = 1 - 0.9888 = 0.0112 .$$
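Since the total number of 'successes' here is exactly Bi(400, 0.092), we can compare the exact binomial answer for part (i) with the normal approximation in R (a quick sketch of the calculations above):

# (i) exact binomial probability versus the normal approximation
pbinom(40, size = 400, prob = 0.092)     # exact
pnorm((40.5 - 36.8) / sqrt(33.414))      # approximation, continuity-corrected
# (ii) normal approximation for the sample proportion
1 - pnorm((0.125 - 0.092) / sqrt(0.092 * 0.908 / 400))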
4.3 Sample variance
In this section we will look at the sampling distribution of the sample variance, $S^2$, defined by
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 ,$$
where $X_1, \ldots, X_n$ are a random sample from the distribution with c.d.f. $F_X(\cdot)$ with mean $\mu$ and variance $\sigma^2$.
If $F_X$ is any discrete or continuous distribution with a finite variance then
$$\begin{aligned}
E(S^2) &= \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] \\
&= \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} \left[(X_i - \mu) - (\bar{X} - \mu)\right]^2\right] \\
&= \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} \left[(X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2\right]\right] \\
&= \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} (X_i - \mu)^2 - 2n(\bar{X} - \mu)(\bar{X} - \mu) + n(\bar{X} - \mu)^2\right] \\
&= \frac{1}{n-1} \left[\sum_{i=1}^{n} E\left[(X_i - \mu)^2\right] - 2n\, E[(\bar{X} - \mu)^2] + n\, E[(\bar{X} - \mu)^2]\right] \\
&= \frac{1}{n-1} \left[n\sigma^2 - 2n\,\frac{\sigma^2}{n} + n\,\frac{\sigma^2}{n}\right] \quad \text{since } E[(\bar{X} - \mu)^2] = \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \\
&= \frac{1}{n-1}\left[(n-1)\sigma^2\right] = \sigma^2 .
\end{aligned}$$
Here the fourth line uses the fact that $\sum_{i=1}^{n} (X_i - \mu) = n(\bar{X} - \mu)$.
Hence we see that, by using the divisor $n - 1$ in the definition of $S^2$, we obtain a statistic whose sampling distribution is centered on the true value of $\sigma^2$; that is, $S^2$ is an unbiased estimator of $\sigma^2$. This would not be the case had we used the perhaps more intuitively obvious divisor $n$.
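A small simulation illustrates the difference between the two divisors; the Exp(1) population and the sample size below are arbitrary choices:

set.seed(42)
n <- 10
sims <- replicate(100000, {
  x <- rexp(n, rate = 1)            # population variance sigma^2 = 1
  c(var(x),                         # divisor n - 1 (R's default)
    sum((x - mean(x))^2) / n)       # divisor n
})
rowMeans(sims)   # first entry close to 1, second close to (n-1)/n = 0.9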
We will look more specifically at the case when the Xi are sampled from
the N(µ, σ2) distribution. In order to do so, we first need to introduce a new
continuous probability distribution, the chi-squared (χ2) distribution.
4.3.1 The chi-squared (χ2) distribution
The continuous random variable $Y$ is said to have a $\chi^2$ distribution with $k$ degrees of freedom (d.f.), written $Y \sim \chi^2(k)$, if its p.d.f. is given by
$$f(y) = \begin{cases} \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\, y^{(k/2)-1} e^{-y/2}, & y > 0 \\ 0, & \text{otherwise.} \end{cases}$$
Note that this is a special case of the Gamma distribution with parameters $\alpha = k/2$ and $\beta = 1/2$; in particular, when $k = 2$, $Y \sim \mathrm{Exp}(1/2)$. The mean and variance are given by $E(Y) = k$ and $\mathrm{Var}(Y) = 2k$.
The p.d.f.s of chi-squared random variables with d.f. = 1, 3, 6, and 12 are
shown in Figure 1. Note that the p.d.f. becomes more symmetric as the number
of degrees of freedom k becomes larger.
Figure 1: Chi-squared p.d.f.s with different degrees of freedom (k = 1, 3, 6 and 12).
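Curves like those in Figure 1 can be reproduced with R's built-in dchisq function (a sketch; the grid and line types are arbitrary):

y <- seq(0.01, 30, length.out = 500)
dfs <- c(1, 3, 6, 12)
plot(y, dchisq(y, df = dfs[1]), type = "l", ylim = c(0, 0.8),
     xlab = "y", ylab = "f(y)")
for (j in 2:4) lines(y, dchisq(y, df = dfs[j]), lty = j)
legend("topright", legend = paste("k =", dfs), lty = 1:4)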
The connection with the normal distribution
Let $Z_1, \ldots, Z_k$ be $k$ i.i.d. standard normal random variables, i.e. each has a $N(0, 1)$ distribution. Then, the random variable
$$Y = \sum_{i=1}^{k} Z_i^2$$
has a $\chi^2$ distribution with $k$ degrees of freedom.
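This connection is easy to verify empirically (a sketch, with k = 3 chosen arbitrarily):

set.seed(7)
k <- 3
y <- replicate(100000, sum(rnorm(k)^2))   # sums of k squared N(0,1) draws
mean(y); var(y)                           # close to k and 2k
mean(y <= 5); pchisq(5, df = k)           # empirical vs theoretical c.d.f. at 5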
We may use this fact to check that for $Y \sim \chi^2(k)$ we have $E(Y) = k$, as follows. First note that if $Z_i \sim N(0, 1)$ then
$$1 = \mathrm{Var}(Z_i) = E(Z_i^2) - [E(Z_i)]^2 = E(Z_i^2) , \quad \text{since } E(Z_i) = 0 .$$
Hence, $E(Z_i^2) = 1$ for $i = 1, \ldots, k$, and so
$$E(Y) = E\left[\sum_{i=1}^{k} Z_i^2\right] = \sum_{i=1}^{k} E(Z_i^2) = k .$$
Suppose now the random variables $X_1, \ldots, X_n$ are a random sample from the $N(\mu, \sigma^2)$ distribution. We have that
$$\frac{X_i - \mu}{\sigma} \sim N(0, 1) , \quad i = 1, \ldots, n ,$$
so that
$$\sum_{i=1}^{n} \left[\frac{X_i - \mu}{\sigma}\right]^2 \sim \chi^2(n) .$$
If we modify the above by replacing the population mean µ by the sample esti-
mate X̄, the distribution changes and we obtain the following result.
Theorem. If $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ independently, then
$$\frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^{n} \left[\frac{X_i - \bar{X}}{\sigma}\right]^2 \sim \chi^2(n-1) .$$
(Proof of this result is outside the scope of the course.)
By replacing $\mu$ with $\bar{X}$, the $\chi^2$ distribution of the sum of squares has lost one degree of freedom. This is because there is a single linear constraint on the variables $(X_i - \bar{X})/\sigma$, namely $\sum_{i=1}^{n} (X_i - \bar{X})/\sigma = 0$, so only $n - 1$ of the squared terms are free to vary. Important fact: $\bar{X}$ and $S^2$ are independent random variables.
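Both the theorem and the independence of X̄ and S² can be checked by simulation (a sketch, reusing the parameters of Example 4.4 below; note that near-zero correlation does not prove independence, but it is consistent with it):

set.seed(11)
n <- 40; mu <- 25; sigma <- 4
sims <- replicate(50000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  c((n - 1) * var(x) / sigma^2, mean(x))
})
mean(sims[1, ]); var(sims[1, ])   # close to n - 1 = 39 and 2(n - 1) = 78
cor(sims[1, ], sims[2, ])         # near 0, consistent with independence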
Example 4.4. Let $X_1, \ldots, X_{40}$ be a random sample of size $n = 40$ from the $N(25, 4^2)$ distribution. Find the probability that the sample variance, $S^2$, exceeds 20.

Solution. We need to calculate
$$P(S^2 > 20) = P\left(\frac{39\, S^2}{16} > \frac{39 \times 20}{16}\right) = P(Y > 48.75) \quad \text{where } Y \sim \chi^2(39)$$
$$= 1 - P(Y < 48.75) = 1 - 0.8638 = 0.1362 ,$$
where the probability calculation has been carried out using the pchisq command in R:

> 1 - pchisq(q = 48.75, df = 39)
[1] 0.1362011
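The same probability can also be estimated by direct simulation (a sketch):

set.seed(3)
s2 <- replicate(100000, var(rnorm(40, mean = 25, sd = 4)))
mean(s2 > 20)   # should be close to 0.1362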