F71SM STATISTICAL METHODS

3 RANDOM VARIABLES

3.1 Introduction: random variables, distribution functions, probability mass and density functions

Given a probability space (S, P ), a random variable X is a real-valued function defined on S,
i.e. X : S → RX ⊆ R. We require that for every x ∈ RX , the subset {ω : ω ∈ S and X(ω) ≤ x}
of S is an event.

Consider A ⊆ RX . What is the meaning of the symbol P (A)? It is a probability induced in
RX and defined to be P (A) = P ({ω : ω ∈ S and X(ω) ∈ A}).

For example, for a throw of a pair of regular six-sided dice (one red, one blue), let X
be the score we obtain. S consists of 36 elements of the form (i, j), where i and j are the
scores on the red and blue die respectively. Let A be the event ‘a score of 6’. Then P (A) =
P ({(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}) and so P (A) = 5/36, which we write simply as P (X =
6) = 5/36.

The cumulative distribution function (cdf) FX of X is defined by FX(x) = P (X ≤ x).

In the case that RX is a finite or countable set, X is a discrete random variable with
probability mass function fX , where fX(x) = P (X = x). Probabilities of events are
obtained by summing probabilities of the individual values which make up the event. The
distribution function FX is a step (jump) function.

In the case that RX is not countable (for instance, an interval on the real line), X is a
continuous random variable with probability density function (pdf) fX , where fX(x) = dFX(x)/dx.
Probabilities of events are obtained by integrating the pdf over the appropriate interval
(probabilities are represented by areas under the pdf). FX is a continuous function.

We often just write F (x) for FX(x) and f(x) for fX(x), where there is no possible confusion
with other random variables. The cdf is often called simply the distribution function (df).

3.2 Expectation (expected value)

The expectation of the random variable X, denoted E[X], is given by

E[X] = ∑i xi f(xi) for X discrete,

E[X] = ∫ x f(x) dx for X continuous

This is the mean of the random variable X, often denoted µ.

The expectation of h(X), a function of the random variable X, may be found as

E[h(X)] = ∑i h(xi) f(xi) for X discrete,

E[h(X)] = ∫ h(x) f(x) dx for X continuous

E[(X − µ)2] is the variance, denoted V [X] or Var[X] or σ2.

σ(> 0) is the standard deviation, SD[X].

Note: E[(X−µ)2] = E[X2−2µX+µ2] = E[X2]−2µE[X]+µ2 = E[X2]−µ2 = E[X2]−(E[X])2

Example: consider a discrete r.v. X with probability mass function

x 1 2 3 4
f(x) 0.2 0.3 0.4 0.1

Then
E[X] = 1× 0.2 + 2× 0.3 + 3× 0.4 + 4× 0.1 = 2.4,
E[X2] = 12 × 0.2 + 22 × 0.3 + 32 × 0.4 + 42 × 0.1 = 6.6,
so µ = 2.4, σ2 = 6.6 − 2.4^2 = 0.84, σ = √0.84 = 0.917

Alternatively we have
σ2 = E[(X−µ)2] = E[(X−2.4)2] = (−1.4)2×0.2+(−0.4)2×0.3+0.62×0.4+1.62×0.1 = 0.84
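A quick numerical check of these calculations (a sketch using numpy; the array names are illustrative):

    import numpy as np

    x = np.array([1, 2, 3, 4])           # values taken by X
    p = np.array([0.2, 0.3, 0.4, 0.1])   # probability mass function f(x)

    mean = np.sum(x * p)                 # E[X]
    ex2 = np.sum(x**2 * p)               # E[X^2]
    var = ex2 - mean**2                  # Var[X] = E[X^2] - (E[X])^2

    print(mean, ex2, var, np.sqrt(var))  # 2.4  6.6  0.84  0.9165...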

Example: consider a continuous r.v. with probability density function

f(x) = 2e−2x, x > 0 (and = 0 elsewhere)

Then

E[X] = ∫0^∞ 2xe−2x dx = 0.5, E[X2] = ∫0^∞ 2x2e−2x dx = 0.5 ← check the values of both integrals

so µ = 0.5, σ2 = 0.5 − 0.5^2 = 0.25, σ = √0.25 = 0.5
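The two integrals can also be checked numerically (a sketch using scipy quadrature rather than the exact integration above):

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: 2 * np.exp(-2 * x)                 # pdf, x > 0

    ex, _ = quad(lambda x: x * f(x), 0, np.inf)      # E[X]
    ex2, _ = quad(lambda x: x**2 * f(x), 0, np.inf)  # E[X^2]

    print(ex, ex2, ex2 - ex**2)                      # 0.5  0.5  0.25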

3.3 Moments and moment generating function

The kth moment about the mean of the distribution of X, denoted µk , is the expected
value of (X − µ)k

That is, µk = E[(X − µ)k]

The kth moment about the origin of the distribution of X, denoted µ′k, is the expected
value of Xk

That is, µ′k = E[Xk]

The moment generating function (mgf) of the distribution of X, denoted MX(t) or just
M(t), is defined to be M(t) = E[etX], if the expectation exists. Note that M(0) = 1.

M(t) can be expanded as a power series in t:

M(t) = E[1 + tX + (t2/2!)X2 + (t3/3!)X3 + · · ·] = 1 + µt + µ′2 t2/2! + µ′3 t3/3! + · · ·


so we see that µ′k = E[Xk] is the coefficient of tk/k! in the expansion. We can retrieve the moments
by successive differentiation of M(t) and putting t = 0. That is,

M′(t) = E[XetX] ⇒ M′(0) = E[X] = µ,

M′′(t) = E[X2etX] ⇒ M′′(0) = E[X2] = µ′2

etc.
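As an illustration of the differentiate-and-set-t = 0 step, the sketch below uses sympy on the mgf M(t) = (1 − t/2)^(−1), which is derived for the pdf 2e−2x in the example that follows:

    import sympy as sp

    t = sp.symbols('t')
    M = (1 - t/2)**(-1)                # mgf of the pdf 2e^(-2x) (derived in the example below)

    mu = sp.diff(M, t).subs(t, 0)      # M'(0) = E[X]     -> 1/2
    mu2 = sp.diff(M, t, 2).subs(t, 0)  # M''(0) = E[X^2]  -> 1/2

    print(mu, mu2, mu2 - mu**2)        # 1/2  1/2  1/4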

Only random variables for which all moments exist possess mgfs. For those distributions
which do have mgfs, there is a one-to-one correspondence between distributions and mgfs. We
can sometimes recognise a distribution from its mgf — this can be very useful.

Example: consider the r.v. above for which f(x) = 2e−2x, x > 0 (and = 0 elsewhere).

M(t) = ∫0^∞ 2etx e−2x dx = ∫0^∞ 2e−(2−t)x dx = 2/(2 − t) = (1 − t/2)^(−1) for t < 2 ← check

M(0) = 1, and M(t) = 1 + (t/2) + (t/2)^2 + (t/2)^3 + · · · = 1 + (1/2)t + (1/2) t2/2! + (3/4) t3/3! + · · ·, confirming E[X] = 0.5, E[X2] = 0.5 as before.

Further, M′(t) = (1/2)(1 − t/2)^(−2), M′′(t) = (1/2)(1 − t/2)^(−3), and putting t = 0 gives E[X] = 0.5, E[X2] = 0.5 again.

3.4 Probability generating function

The probability generating function (pgf) of the distribution of a counting variable X (that
is, a variable which assumes some or all of the values 0, 1, 2, . . ., but no others), denoted GX(t)
or just G(t), is defined to be G(t) = E[tX], if the expectation exists. Note that G(1) = 1.

Let pk = P (X = k). Then

G(t) = p0 + p1t + p2t2 + p3t3 + · · ·,
G′(t) = p1 + 2p2t + 3p3t2 + · · ·,
G′′(t) = 2p2 + (3 × 2)p3t + · · ·,
⇒ G′(1) = p1 + 2p2 + 3p3 + · · · = E[X],
⇒ G′′(1) = 2p2 + (3 × 2)p3 + · · · = E[X(X − 1)] = E[X2] − E[X]

So the mean and variance of X are given by G′(1) and G′′(1) + G′(1) − (G′(1))2, respectively.

Note that for a r.v. with both a pgf and a mgf we have M(t) = G(et) and G(t) = M(ln t).

3.5 Change of origin and scale (linear transformations)

Consider the linear transformation Y = a + bX

E[Y ] = E[a + bX] = a + bE[X] i.e. µY = a + bµX

Var[Y ] = b2Var[X] i.e. σ2Y = b2σ2X

Proof: Var[Y ] = E[(Y − µY)2] = E[(a + bX − a − bµX)2] = b2E[(X − µX)2] = b2Var[X]

It follows that the standard deviations of Y and X are related simply by the scaling SD[Y ] = |b| SD[X] i.e. σY = |b|σX

Special case: standardised variable Z = (X − µX)/σX; E[Z] = 0, Var[Z] = SD[Z] = 1

Mgf: MY (t) = E[etY] = E[et(a+bX)] = eat E[e(bt)X] = eat MX(bt)

Pgf: GY (t) = E[tY] = E[ta+bX] = ta E[(tb)X] = ta GX(tb)

3.6 Skewness

The usual coefficient of skewness (degree of asymmetry), denoted γ1, is given by

γ1 = E[(X − µ)3] / (E[(X − µ)2])^(3/2) = µ3/µ2^(3/2) = µ3/σ3

The numerator is the third central moment while the denominator is based on the second central
moment (the variance). A symmetrical distribution has γ1 = 0; γ1 > 0 corresponds
to positive skew, γ1 < 0 to negative skew. The measure is invariant to changes of origin and
scale (it has no units of measurement).

For the example distribution above with f(x) = 2e−2x, x > 0, we have Var[X] = 1/4, E[(X − µ)3] = 1/4, giving γ1 = 2.
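A numerical check of this value of γ1 (a sketch using scipy quadrature; the function names are illustrative):

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: 2 * np.exp(-2 * x)                       # pdf from the example, x > 0

    mu, _ = quad(lambda x: x * f(x), 0, np.inf)            # E[X] = 0.5
    m2, _ = quad(lambda x: (x - mu)**2 * f(x), 0, np.inf)  # second central moment = 0.25
    m3, _ = quad(lambda x: (x - mu)**3 * f(x), 0, np.inf)  # third central moment = 0.25

    print(m3 / m2**1.5)                                    # gamma_1 = 2.0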

For a distribution with mgf MX(t) = exp(µt + (1/2)σ2t2), we have E[X] = µ, and so MX−µ(t) =
exp((1/2)σ2t2) and hence E[(X − µ)3] = 0 and so γ1 = 0. (This mgf corresponds to X ∼ N(µ, σ2),
see later.)
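The vanishing third central moment can also be checked by differentiating MX−µ(t) three times (a small sympy sketch, with σ left as a free positive symbol):

    import sympy as sp

    t, sigma = sp.symbols('t sigma', positive=True)
    M_centred = sp.exp(sigma**2 * t**2 / 2)      # mgf of X - mu for X ~ N(mu, sigma^2)

    third = sp.diff(M_centred, t, 3).subs(t, 0)  # E[(X - mu)^3]
    print(third)                                 # 0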

3.7 The Markov and Chebyshev inequalities

These two inequalities provide useful information about the distributions of random variables,
in particular by providing upper bounds on the size of the tails of the distributions.

Markov’s inequality: let X be a non-negative r.v. (that is, one for which f(x) = 0 for
x < 0) with mean µ. Then for any constant a > 0, P (X ≥ a) ≤ µ/a.

Proof:

µ = E[X] = ∫0^∞ xf(x) dx = ∫0^a xf(x) dx + ∫a^∞ xf(x) dx ≥ ∫a^∞ xf(x) dx ≥ ∫a^∞ af(x) dx = aP (X ≥ a)

hence result.

Chebyshev’s inequality: let X be a r.v. with mean µ and variance σ2. Then for any
constant k > 0, P (|X − µ| ≥ kσ) ≤ 1/k2.

Proof: Applying Markov’s inequality to the variable (X − µ)2 gives

P ((X − µ)2 ≥ a) ≤ E[(X − µ)2]/a = σ2/a

Putting a = k2σ2 gives result.

An illustration of Markov’s inequality: P (X ≥ 4µ) ≤ µ/(4µ) = 0.25; that is, the probability
that a non-negative r.v. takes a value greater than or equal to 4 times its mean is at most 0.25.

An illustration of Chebyshev’s inequality: P (|X − µ| ≥ 3σ) ≤ 1/9; that is, the probability that
a r.v. takes a value at least 3 standard deviations away from the mean is at most 1/9 = 0.1111.
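These bounds are typically far from tight. A quick Monte Carlo illustration (a sketch; the exponential distribution with mean 1 is just a convenient non-negative example):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=1.0, size=1_000_000)  # non-negative r.v. with mu = 1, sigma = 1

    # Markov: P(X >= 4*mu) <= mu/(4*mu) = 0.25
    print(np.mean(x >= 4.0))                        # about 0.018, well below 0.25

    # Chebyshev: P(|X - mu| >= 3*sigma) <= 1/9
    print(np.mean(np.abs(x - 1.0) >= 3.0))          # about 0.018, below 1/9 = 0.111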


3.8 Worked examples

3.1 A fair die is thrown. S = {1, 2, 3, 4, 5, 6}; n(S) = 6
Let X be the score (i.e. the number showing).

Then, for ω = i, X(ω) = i; RX = {1, 2, 3, 4, 5, 6}.
Probability function/distribution of X:

X 1 2 3 4 5 6
fX(x) 1/6 1/6 1/6 1/6 1/6 1/6

E[X] = 1× 1/6 + 2× 1/6 + · · ·+ 6× 1/6 = 21/6 = 7/2⇒ mean µ = 7/2
E[X2] = 12 × 1/6 + 22 × 1/6 + · · ·+ 52 × 1/6 + 62 × 1/6 = 91/6
⇒ Var[X] = 91/6 − (7/2)^2 = 35/12, SD[X] = √(35/12) ⇒ σ2 = 35/12, σ = √(35/12) = 1.71

Pgf G(t) = (1/6)(t + t2 + · · · + t6) = t(1 − t6)/(6(1 − t)) for t ≠ 1
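The pgf route to the mean and variance can be checked symbolically (a brief sympy sketch, using G′(1) = E[X] and Var[X] = G′′(1) + G′(1) − (G′(1))2 from 3.4):

    import sympy as sp

    t = sp.symbols('t')
    G = sp.Rational(1, 6) * sum(t**k for k in range(1, 7))  # G(t) = (t + t^2 + ... + t^6)/6

    G1 = sp.diff(G, t).subs(t, 1)      # G'(1) = E[X]
    G2 = sp.diff(G, t, 2).subs(t, 1)   # G''(1) = E[X(X-1)]

    print(G1, G2 + G1 - G1**2)         # 7/2  35/12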

3.2 A fair coin is tossed three times.

S = {(x1, x2, x3) : each xi ∈ {H, T}}, i.e. S = {H, T}3; n(S) = 2^3 = 8
Let X be the number of heads observed.

So, for instance, for ω = (H,T, T ), X(ω) = 1 and for ω = (T, T, T ), X(ω) = 0.

RX = {0, 1, 2, 3}
Consider e.g. X = 1, i.e. ω ∈ {(H,T, T ), (T,H, T ), (T, T,H)}. So fX(1) = P (X = 1) =
3/8

Probability function/distribution:

x 0 1 2 3
fX(x) 1/8 3/8 3/8 1/8

Distribution function — some values:

FX(−3) = P (X ≤ −3) = 0, FX(0) = P (X ≤ 0) = P (X = 0) = 1/8, FX(1) = P (X ≤
1) = 1/8 + 3/8 = 1/2

The function is given in full by:

FX(x) = 0 for x < 0,
       = 1/8 for 0 ≤ x < 1,
       = 1/2 for 1 ≤ x < 2,
       = 7/8 for 2 ≤ x < 3,
       = 1 for x ≥ 3.

E[X] = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 3/2 ⇒ mean µ = 3/2
E[X2] = 0^2 × 1/8 + 1^2 × 3/8 + 2^2 × 3/8 + 3^2 × 1/8 = 24/8 = 3
⇒ Var[X] = 3 − (3/2)^2 = 3/4, SD[X] = √3/2 ⇒ σ2 = 3/4, σ = √3/2 = 0.866

Pgf G(t) = (1/8)(1 + 3t + 3t2 + t3) = ((t + 1)/2)^3

Linear transformations:

Let Y = 3 − X:
E[Y ] = 3 − E[X] = 3 − 3/2 = 3/2 ⇒ µY = 3/2
Var[Y ] = Var[−X] = Var[X] = 3/4 ⇒ σ2Y = 3/4
SD[Y ] = SD[−X] = SD[X] = √3/2 ⇒ σY = √3/2

Let Z = 2X − 3:
E[Z] = 2E[X] − 3 = 2 × (3/2) − 3 = 0 ⇒ µZ = 0
Var[Z] = 4Var[X] = 4 × (3/4) = 3 ⇒ σ2Z = 3
SD[Z] = 2SD[X] = 2(√3/2) = √3 ⇒ σZ = √3

3.3 A fair die is thrown repeatedly until a six turns up. Let X be the number of throws required.

A convenient way to represent the sample space is as follows:

Sample space = {S, (N, S), (N, N, S), . . .} where S = six, N = not a six, and the kth element in
the list corresponds to a sequence of throws which ends at throw k with the first six, i.e. it
corresponds to X = k.

So X = k ⇔ ω = (N, N, N, . . . , N, S) where there are k − 1 N’s.

⇒ fX(k) = P (X = k) = (5/6)^(k−1)(1/6), k = 1, 2, 3, . . .

So, for example, P (X is even) = P (X = 2) + P (X = 4) + · · · = (5/6)(1/6) + (5/6)^3(1/6) +
(5/6)^5(1/6) + · · · = (5/36)(1 + 25/36 + (25/36)^2 + · · ·) = (5/36)/(1 − 25/36) = 5/11

µ = E[X] = ∑k=1^∞ k(5/6)^(k−1)(1/6) = (1/6)(1 + 2(5/6) + 3(5/6)^2 + · · ·) = (1/6)(1 − 5/6)^(−2) = 6

E[X2] is best found via E[X(X − 1)]:

E[X(X − 1)] = ∑k=1^∞ k(k − 1)(5/6)^(k−1)(1/6) = 2(1/6)(5/6)(1 − 5/6)^(−3) = 60

So E[X2] = E[X2 − X] + E[X] = 60 + 6 = 66 ⇒ σ2 = 66 − 6^2 = 30, σ = √30 = 5.48

3.4 Consider the discrete random variable X with probability mass function

x 10 100 1000
f(x) 0.2 0.5 0.3

E[X] = 10 × 0.2 + 100 × 0.5 + 1000 × 0.3 = 352, so E[2X + 3] = 2(352) + 3 = 707

E[log10 X] = (log10 10) × 0.2 + (log10 100) × 0.5 + (log10 1000) × 0.3 = 0.2 + 1 + 0.9 = 2.1

Note: E[log X] ≠ log E[X]

3.5 Consider the continuous r.v. X with pdf f(x) = 3/x4, x > 1 (= 0 elsewhere).

Check pdf conditions: f(x) > 0 for x > 1 and ∫1^∞ 3/x4 dx = [−1/x3]1^∞ = 1

Probabilities:

P (1 < X < 2) = ∫1^2 3/x4 dx = [−1/x3]1^2 = 7/8

P (X > 1.5) = ∫1.5^∞ 3/x4 dx = [−1/x3]1.5^∞ = 8/27

Cdf: F (x) = 0 for x ≤ 1 and F (x) = ∫1^x 3/t4 dt = [−1/t3]1^x = 1 − 1/x3 for x > 1

Plots of f(x) and F (x) (not reproduced here).

Level/location/average: mean µ = ∫1^∞ 3/x3 dx = [−3/(2x2)]1^∞ = 1.5

median m: solving F (m) = 0.5 gives 1 − 1/m3 = 1/2 so m = 2^(1/3) = 1.26
Spread/variability: E[X2] = ∫1^∞ 3/x2 dx = [−3/x]1^∞ = 3 ⇒ σ2 = 3 − 1.5^2 = 0.75, σ = √0.75 = 0.866

Higher moments: for r ≥ 3, E[Xr] does not exist.
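A numerical cross-check of these results (a sketch using scipy; quadrature and a root-finder stand in for the exact calculus above):

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    f = lambda x: 3 / x**4                                 # pdf, x > 1

    total, _ = quad(f, 1, np.inf)                          # should be 1
    mean, _ = quad(lambda x: x * f(x), 1, np.inf)          # 1.5
    ex2, _ = quad(lambda x: x**2 * f(x), 1, np.inf)        # 3
    median = brentq(lambda x: (1 - 1/x**3) - 0.5, 1, 10)   # root of F(x) = 0.5

    print(total, mean, ex2 - mean**2, median)              # 1.0  1.5  0.75  1.2599...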

3.6 Consider the continuous r.v. X with pdf f(x) = xe−x, x > 0 (= 0 elsewhere).

Cdf: F (x) = 0 for x ≤ 0 and F (x) = ∫0^x te−t dt = [−te−t]0^x + ∫0^x e−t dt = 1 − (1 + x)e−x
for x > 0.

Level/location/average: mean µ = ∫0^∞ x2e−x dx = Γ(3) = 2 (or by parts)

median m: setting 1− (1 +m)e−m = 0.5⇒ m = 1.678
mode: turning point of pdf: setting f ′(x) = 0⇒ (1− x)e−x = 0⇒ mode = 1

Spread: E[X2] = ∫0^∞ x3e−x dx = Γ(4) = 3! = 6 ⇒ Var[X] = σ2 = 6 − 2^2 = 2, SD[X] = σ = √2
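A quadrature check of the mean and variance just obtained (a sketch; the gamma-function results above are exact):

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: x * np.exp(-x)                            # pdf, x > 0

    mu, _ = quad(lambda x: x * f(x), 0, np.inf)             # E[X] = 2
    var, _ = quad(lambda x: (x - mu)**2 * f(x), 0, np.inf)  # Var[X] = 2

    print(mu, var)                                          # 2.0  2.0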

Mgf: M(t) = E[etX ] =
∫∞

0
xe−(1−t)xdx =

∫∞
0

1
(1−t)2ue

−u du =
Γ(2)

(1−t)2 =
1

(1−t)2 for t < 1 M ′(t) = 2(1− t)−3, M ′′(t) = 6(1− t)−4 so µ = M ′(0) = 2, E[X2] = M ′′(0) = 6 (as before) M ′′′(t) = 24(1−t)−5 so E[X3] = M ′′′(0) = 24⇒ E[(X−µ)3] = 24−(3×2×6)+2×23 = 4 Hence skewness coefficient is γ1 = 4/2 3/2 = √ 2 3.7 Suppose X has a distribution with constant density, that is f(x) = 1, 0 < x < 1. We use cdfs to find the distributions of (a) Y = X2 and (b) Y = − lnX. FX(x) = P (X ≤ x) = x for 0 ≤ x ≤ 1. 9 (a) Range of Y is (0, 1). In (0, 1), the events X2 ≤ y and X ≤ y1/2 are equivalent and so have the same probability. So FY (y) = P (Y ≤ y) = P (X2 ≤ y) = P (X ≤ y1/2) = y1/2, 0 ≤ y ≤ 1 ⇒ fY (y) = ddy ( y1/2 ) = 1 2y1/2 , 0 ≤ y ≤ 1. Note: While the pdf of X is flat, that of Y = X2 is decreasing. (b) Range of Y is (0,∞). FY (y) = P (Y ≤ y) = P (− lnX ≤ y) = P (lnX ≥ −y) = P (X ≥ e−y) = 1 − P (X ≤ e−y) = 1− e−y for y ≥ 0. Hence fY (y) = d dy (1− e−y) = e−y, y ≥ 0. 10