
4 Sampling distributions of sample statistics

Let X1, . . . , Xn be a random sample from a distribution FX(x). A statistic is
a function of the data,
\[
h(X_1, \ldots, X_n) .
\]

The value of this statistic will usually be different for different samples. As the

sample data is random, the statistic is also a random variable. If we repeatedly

drew samples of size n, calculating and recording the value of the sample statistic

each time, then we would build up its probability distribution. The probability

distribution of a sample statistic is referred to as its sampling distribution.

In this section we will see how to analytically determine the sampling
distributions of some statistics, while for certain others we can appeal to the
central limit theorem. Simulation techniques can also be used to investigate
sampling distributions of statistics empirically.

4.1 Sample mean

The mean and variance of the distribution FX(x) are denoted by µ and σ²
respectively. In the case that the distribution is continuous with p.d.f. fX(x),
\[
\mu = E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx
\]
\[
\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2]
         = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx
         = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \mu^2 .
\]
When the distribution is discrete with p.m.f. pX(x), µ and σ² are defined by:
\[
\mu = E(X) = \sum_{x \in R_X} x\, p_X(x)
\]
\[
\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2]
         = \sum_{x \in R_X} (x - \mu)^2 p_X(x)
         = \sum_{x \in R_X} x^2 p_X(x) - \mu^2 ,
\]
where RX is the range space of X.

The random variables X1, . . . , Xn are assumed to be independent and identically
distributed (often abbreviated to i.i.d.) random variables, each being distributed
as FX(x). This means that E(Xi) = µ and Var(Xi) = σ² for i = 1, . . . , n.

The sample mean of the n sample variables is:
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .
\]

It is straightforward to calculate the mean of the sampling (probability)
distribution of X̄ as follows:
\[
E(\bar{X}) = E\left[\frac{1}{n}(X_1 + \ldots + X_n)\right]
           = \frac{1}{n}\left[E(X_1) + \ldots + E(X_n)\right]
           = \frac{n\mu}{n} = \mu ,
\]

while the variance is
\[
\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + \ldots + X_n)\right]
                      = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \ldots + \mathrm{Var}(X_n)\right]
                      = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} .
\]

Here we have used Var(X1 + . . .+Xn) = Var(X1) + . . .+ Var(Xn), which holds

because the Xi are independent.

These results tell us that the sampling distribution of the sample mean X̄

is centered on the common mean µ of each of the sample variables X1, . . . , Xn

(i.e. the mean of the distribution from which the sample is obtained) and has

variance equal to the common variance of the Xi divided by n. Thus, as the
sample size n increases, the sampling distribution of X̄ becomes more
concentrated around the true mean µ.

In the above discussion nothing specific has been said regarding the actual

distribution from which the Xi have been sampled. All we are assuming is that

the mean and variance of the underlying distribution are both finite.
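As a quick empirical check of these two results (an illustrative sketch, not part of the original notes; the Uniform(0, 1) population and the sample size are arbitrary choices), we can draw many samples in R and compare the simulated mean and variance of X̄ with µ and σ²/n:

# Simulation sketch: sampling distribution of the sample mean.
# Population: Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12.
set.seed(1)                                   # for reproducibility
n <- 25
xbar <- replicate(10000, mean(runif(n)))      # 10000 sample means
mean(xbar)                                    # should be close to mu = 0.5
var(xbar)                                     # should be close to (1/12)/25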

4.1.1 Normally distributed data

In the special case that the Xi are normally distributed then we can make use
of some important results. Let the random variable $X \sim N(\mu_X, \sigma_X^2)$ and let
the random variable $Y \sim N(\mu_Y, \sigma_Y^2)$, independently of X. Then we have the
following results:

(i) $X + Y \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$

(ii) $X - Y \sim N(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$

(iii) In general, $c_1 X + c_2 Y \sim N(c_1\mu_X + c_2\mu_Y,\ c_1^2\sigma_X^2 + c_2^2\sigma_Y^2)$; $c_1 \neq 0$, $c_2 \neq 0$.

These results extend in a straightforward manner to the linear combination of n
independent normal random variables. Let X1, . . . , Xn be n independent normally
distributed random variables with $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$ for i = 1, . . . , n.
Thus, here the normal distributions for different Xi may have different means
and variances. We then have that
\[
\sum_{i=1}^{n} c_i X_i \sim N\!\left( \sum_{i=1}^{n} c_i \mu_i,\ \sum_{i=1}^{n} c_i^2 \sigma_i^2 \right)
\]
where the $c_i \in \mathbb{R}$.
If now the Xi in the sample are i.i.d. N(µ, σ²) random variables then the
sample mean, X̄, is a linear combination of the Xi (with $c_i = \frac{1}{n}$, i = 1, . . . , n,
using the notation above). Thus, X̄ is normally distributed with mean µ and
variance σ²/n, i.e. $\bar{X} \sim N(\mu, \sigma^2/n)$. This result enables us to make
probabilistic statements about the mean under the assumption of normality.

Example 4.1. (Component lifetime data).

In Chapter 3 we saw that the normal distribution is a reasonable probability
model for the lifetime data, and it seems sensible to estimate the two parameters
(µ and σ²) of this distribution by the corresponding sample quantities, x̄ and
s². For these data x̄ = 334.59 and s² = 15.288, and so our fitted model is
X ∼ N(334.59, 15.288). Under this fitted model for X, the mean X̄ of a new
sample of size 50 from the population follows a N(334.59, 15.288/50) distribution.

We can then, for example, estimate the probability that the mean of such a

sample exceeds 335,

\[
P(\bar{X} > 335.0) = 1 - \Phi\!\left( \frac{335.0 - 334.59}{\sqrt{15.288/50}} \right)
                   = 1 - \Phi(0.74) = 1 - 0.7704 = 0.2296 .
\]
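For reference, the same probability can be computed directly in R with pnorm, the normal c.d.f. (a quick check; the tiny discrepancy from 0.2296 is due to rounding z to 0.74 above):

# P(Xbar > 335) under the fitted model Xbar ~ N(334.59, 15.288/50)
1 - pnorm(335.0, mean = 334.59, sd = sqrt(15.288/50))
# approximately 0.229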


4.1.2 Using the central limit theorem

In the previous section, we saw that the random quantity X̄ has a sampling
distribution with mean µ and variance σ²/n. In the special case when we are
sampling from a normal distribution, X̄ is also normally distributed. However,
there are many situations when we cannot determine the exact form of the
distribution of X̄. In such circumstances, we may appeal to the central limit
theorem and obtain an approximate distribution.

The central limit theorem: Let X be a random variable with mean µ and
variance σ². If X̄n is the mean of a random sample of size n drawn from the
distribution of X, then the distribution of the statistic
\[
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}
\]
tends to the standard normal distribution as $n \to \infty$.

This means that, for a large random sample from a population with mean µ
and variance σ², the sample mean X̄n is approximately normally distributed with
mean µ and variance σ²/n. Since, for large n, $\bar{X}_n \sim N(\mu, \sigma^2/n)$ approximately,
we also have that $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ approximately.

There is no need to specify the form of the underlying distribution FX , which

may be either discrete or continuous, in order to use this result. As a consequence

it is of tremendous practical importance.

A common question is ‘how large does n have to be before the normality of

X̄ is reasonable?’ The answer depends on the degree of non-normality of the

underlying distribution from which the sample has been drawn. The more non-

normal FX is, the larger n needs to be. A useful rule-of-thumb is that n should

be at least 30.
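To see the theorem in action, the following R sketch (illustrative only; the Exp(1) population and n = 30 are arbitrary choices) shows that means of size-30 samples from a strongly skewed distribution already look close to normal:

# Empirical illustration of the central limit theorem.
# Population: Exp(1), strongly skewed, with mu = 1 and sigma^2 = 1.
set.seed(2)
xbar <- replicate(10000, mean(rexp(30, rate = 1)))
hist(xbar, breaks = 50, freq = FALSE)                    # roughly bell-shaped
curve(dnorm(x, mean = 1, sd = sqrt(1/30)), add = TRUE)   # N(mu, sigma^2/n)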

Example 4.2. (Income data). What is the approximate probability that the

mean gross income based on a new random sample of size n = 500 lies between

33.0 and 33.5 thousand pounds?

The underlying distribution is not normal but we can appeal to the central
limit theorem to say that
\[
\bar{X}_{500} \sim N(\mu, \sigma^2/n) \quad \text{approximately.}
\]


We may estimate µ and σ² from the data by µ̂ = x̄ = 33.27 and σ̂² = s² =
503.554 (so σ̂ = 22.44). Therefore, using the fitted values of the parameters
we may estimate the probability as

\[
P(33.0 < \bar{X}_{500} < 33.5) \approx \Phi\!\left( \frac{33.50 - 33.27}{22.44/\sqrt{500}} \right) - \Phi\!\left( \frac{33.00 - 33.27}{22.44/\sqrt{500}} \right)
\approx \Phi(0.23) - \Phi(-0.27) = 0.5910 - 0.3936 = 0.1974 .
\]

Hence we estimate the probability that X̄ lies between 33.0 and 33.5 to be 0.1974.

4.2 Sample proportion

Suppose now that we have a random sample X1, . . . , Xn where the Xi are i.i.d.
Bi(1, p) random variables. Thus, Xi = 1 ('success') with probability p and
Xi = 0 ('failure') with probability 1 − p. We know that E(Xi) = p and
Var(Xi) = p(1 − p) for i = 1, . . . , n.

The proportion of cases in the sample who have Xi = 1, in other words the
proportion of 'successes', is given by
\[
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i .
\]
We have that $E(\bar{X}_n) = p$ and $\mathrm{Var}(\bar{X}_n) = \frac{p(1-p)}{n}$. By the central limit
theorem, for large n, X̄n is approximately distributed as $N\!\left(p, \frac{p(1-p)}{n}\right)$,
which enables us to easily make probabilistic statements about the proportion
of 'successes' in a sample of size n. We can also say that, for large n, the
total number of 'successes' in the sample, given by $\sum_{i=1}^{n} X_i$, is
approximately normally distributed with mean np and variance np(1 − p).
Recall that, for the normal approximation to be reasonable in this context,
we require that
\[
n \ge 9 \max\left\{ \frac{1-p}{p},\ \frac{p}{1-p} \right\} .
\]

Example 4.3. Suppose that, in a particular country, the unemployment rate is
9.2%. Suppose that a random sample of 400 individuals is obtained. What are
the approximate probabilities that:

(i) Forty or fewer were unemployed;

(ii) The proportion unemployed is greater than 0.125.

Solution: (i) For i = 1, . . . , n let the random variable Xi satisfy
\[
X_i = \begin{cases} 1 & \text{if the ith worker is unemployed} \\ 0 & \text{otherwise.} \end{cases}
\]
From the question, P(Xi = 1) = 0.092 and P(Xi = 0) = 0.908. We have
$9\max\left\{\frac{1-p}{p},\ \frac{p}{1-p}\right\} = \max\{88.8,\ 0.9\} = 88.8$, and n = 400 ≥ 88.8, so the
normal approximation will be valid. Note that np = 400 × 0.092 = 36.8 and
np(1 − p) = 400 × 0.092 × 0.908 = 33.414, and $\sum_{i=1}^{400} X_i \sim N(np, np(1-p))$
approximately. Using a continuity correction,
\[
P\left( \sum_{i=1}^{400} X_i \le 40 \right) = P\left( \frac{\sum_{i=1}^{400} X_i - 36.8}{\sqrt{33.414}} \le \frac{40.5 - 36.8}{\sqrt{33.414}} \right)
\approx P(Z \le 0.640), \quad \text{where } Z \sim N(0, 1) \text{ approx.}
\]
\[
= \Phi(0.640) = 0.7390 .
\]

(ii) Here, $\frac{p(1-p)}{n} = \frac{0.092 \times 0.908}{400} = 0.0002088$. Thus,
\[
P\left( \bar{X}_{400} > 0.125 \right) = P\left( \frac{\bar{X}_{400} - 0.092}{\sqrt{0.0002088}} > \frac{0.125 - 0.092}{\sqrt{0.0002088}} \right)
\approx 1 - \Phi(2.284) = 1 - 0.9888 = 0.0112 .
\]
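Since the counts here are exactly binomial, the normal approximations can be checked against exact binomial probabilities in R (an added sketch, not part of the original solution; note that a proportion above 0.125 corresponds to more than 50 successes out of 400):

# (i) P(40 or fewer unemployed): exact binomial vs normal approximation
pbinom(40, size = 400, prob = 0.092)           # exact
pnorm(40.5, mean = 36.8, sd = sqrt(33.414))    # approximation, about 0.739

# (ii) P(proportion > 0.125), i.e. more than 50 unemployed out of 400
1 - pbinom(50, size = 400, prob = 0.092)       # exact
1 - pnorm((0.125 - 0.092) / sqrt(0.0002088))   # approximation, about 0.011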

4.3 Sample variance

In this section we will look at the sampling distribution of the sample variance,
S², defined by
\[
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 ,
\]
where X1, . . . , Xn are a random sample from the distribution with c.d.f. FX(·),
mean µ and variance σ².

If FX is any discrete or continuous distribution with a finite variance then
\[
\begin{aligned}
E(S^2) &= \frac{1}{(n-1)} E\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] \\
&= \frac{1}{(n-1)} E\left[ \sum_{i=1}^{n} [(X_i - \mu) - (\bar{X} - \mu)]^2 \right] \\
&= \frac{1}{(n-1)} E\left[ \sum_{i=1}^{n} [(X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2] \right] \\
&= \frac{1}{(n-1)} E\left[ \sum_{i=1}^{n} (X_i - \mu)^2 - 2n(\bar{X} - \mu)(\bar{X} - \mu) + n(\bar{X} - \mu)^2 \right]
   \quad \text{since } \textstyle\sum_{i=1}^{n} (X_i - \mu) = n(\bar{X} - \mu) \\
&= \frac{1}{(n-1)} \left[ \sum_{i=1}^{n} E\left[(X_i - \mu)^2\right] - 2nE[(\bar{X} - \mu)^2] + nE[(\bar{X} - \mu)^2] \right] \\
&= \frac{1}{(n-1)} \left[ n\sigma^2 - 2n\frac{\sigma^2}{n} + n\frac{\sigma^2}{n} \right]
   \quad \text{since } E[(\bar{X} - \mu)^2] = \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \\
&= \frac{1}{(n-1)} \left[ (n-1)\sigma^2 \right] = \sigma^2 .
\end{aligned}
\]

Hence, we can see that by using the divisor (n − 1) in the definition of S², we
obtain a statistic whose sampling distribution is centered on the true value of
σ². This would not be the case if we had used the perhaps more intuitively
obvious divisor of n.
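A short simulation illustrates this (a sketch; the N(0, 4) population and n = 10 are arbitrary choices). Note that R's built-in var already uses the divisor n − 1:

# Divisor (n-1) versus divisor n, for samples of size 10 from N(0, sigma^2 = 4)
set.seed(3)
n  <- 10
s2 <- replicate(10000, var(rnorm(n, mean = 0, sd = 2)))  # divisor n-1
mean(s2)                    # close to sigma^2 = 4
mean(s2 * (n - 1) / n)      # divisor n: systematically too small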

We will look more specifically at the case when the Xi are sampled from
the N(µ, σ²) distribution. In order to do so, we first need to introduce a new
continuous probability distribution, the chi-squared (χ²) distribution.

4.3.1 The chi-squared (χ²) distribution

The continuous random variable Y is said to have a χ² distribution with k degrees
of freedom (d.f.), written as χ²(k), iff its p.d.f. is given by
\[
f(y) = \begin{cases} \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\, y^{(k/2)-1} e^{-y/2}, & y > 0 \\ 0, & \text{otherwise.} \end{cases}
\]

Note that this is a special case of the Gamma distribution with parameters
α = k/2 and β = 1/2, and that when k = 2, Y ∼ Exp(1/2). The mean and
variance are given by E(Y) = k and Var(Y) = 2k.
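The Gamma connection is easy to confirm numerically in R, where the rate argument of dgamma plays the role of β (a quick check, not in the original notes):

# chi-squared(k) density equals Gamma(shape = k/2, rate = 1/2) density
k <- 4
all.equal(dchisq(1:10, df = k), dgamma(1:10, shape = k/2, rate = 1/2))
# should be TRUE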

The p.d.f.s of chi-squared random variables with d.f. = 1, 3, 6, and 12 are

shown in Figure 1. Note that the p.d.f. becomes more symmetric as the number

of degrees of freedom k becomes larger.

[Figure omitted: curves of the χ²(k) p.d.f. for k = 1, 3, 6, 12, plotted for 0 ≤ x ≤ 30 with density values between 0 and 0.8.]

Figure 1: Chi-squared p.d.f.s with different degrees of freedom.
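A figure like this can be reproduced with dchisq, the χ² p.d.f. in R (an illustrative sketch):

# Reproduce Figure 1: chi-squared densities for k = 1, 3, 6, 12
ks <- c(1, 3, 6, 12)
curve(dchisq(x, df = ks[1]), from = 0.01, to = 30, ylim = c(0, 0.8),
      xlab = "x", ylab = "f(x)", lty = 1)
for (j in 2:4) curve(dchisq(x, df = ks[j]), add = TRUE, lty = j)
legend("topright", legend = paste("k =", ks), lty = 1:4)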

The connection with the normal distribution

Let Z1, . . . , Zk be k i.i.d. standard normal random variables, i.e. each has a
N(0, 1) distribution. Then, the random variable
\[
Y = \sum_{i=1}^{k} Z_i^2
\]
has a χ² distribution with k degrees of freedom.
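This connection can also be checked by simulation (an illustrative sketch with k = 5):

# Sums of k = 5 squared N(0,1) variables behave like chi-squared(5)
set.seed(4)
y <- replicate(10000, sum(rnorm(5)^2))
mean(y)   # close to k = 5
var(y)    # close to 2k = 10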

We may use this fact to check that for Y ∼ χ²(k) we have E(Y) = k, as
follows. First note that if Zi ∼ N(0, 1) then
\[
1 = \mathrm{Var}(Z_i) = E(Z_i^2) - [E(Z_i)]^2 = E(Z_i^2), \quad \text{since } E(Z_i) = 0 .
\]
Hence, E(Zi²) = 1 for i = 1, . . . , k, and so
\[
E[Y] = E\left[ \sum_{i=1}^{k} Z_i^2 \right] = \sum_{i=1}^{k} E(Z_i^2) = k .
\]

Suppose now the random variables X1, . . . , Xn are a random sample from the
N(µ, σ²) distribution. We have that
\[
\frac{X_i - \mu}{\sigma} \sim N(0, 1) , \quad i = 1, \ldots, n ,
\]
so that
\[
\sum_{i=1}^{n} \left[ \frac{X_i - \mu}{\sigma} \right]^2 \sim \chi^2(n) .
\]

If we modify the above by replacing the population mean µ by the sample
estimate X̄, the distribution changes and we obtain the following result.

Theorem. If X1, . . . , Xn ∼ N(µ, σ²) independently, then
\[
\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \left[ \frac{X_i - \bar{X}}{\sigma} \right]^2 \sim \chi^2(n-1) .
\]
(Proof of this result is outside the scope of the course.)

By replacing µ with X̄, the χ² distribution of the sum of squares has lost
one degree of freedom. This is because there is a single linear constraint on the
variables (Xi − X̄)/σ, namely $\sum_{i=1}^{n} (X_i - \bar{X})/\sigma = 0$, so we are only
summing n − 1 independent squared terms. Important fact: X̄ and S² are
independent random variables.
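The theorem can likewise be checked by simulation (a sketch using the setting of Example 4.4 below: samples of size 40 from N(25, 4²), so that (n − 1)S²/σ² = 39S²/16):

# (n-1)S^2/sigma^2 for N(25, 16) samples of size 40 should be chi-squared(39)
set.seed(5)
y <- replicate(10000, 39 * var(rnorm(40, mean = 25, sd = 4)) / 16)
mean(y)   # close to 39
var(y)    # close to 2 * 39 = 78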

Example 4.4. Let X1, . . . , X40 be a random sample of size n = 40 from the
N(25, 4²) distribution. Find the probability that the sample variance, S²,
exceeds 20.

Solution. We need to calculate
\[
P(S^2 > 20) = P\left( \frac{39 S^2}{16} > \frac{39 \times 20}{16} \right) = P(Y > 48.75), \quad \text{where } Y \sim \chi^2(39) ,
\]
and
\[
P(Y > 48.75) = 1 - P(Y < 48.75) = 1 - 0.8638 = 0.1362 ,
\]
where the probability calculation has been carried out using the pchisq command in R:

> 1-pchisq(q=48.75, df=39)
[1] 0.1362011

