F71SM STATISTICAL METHODS
3 RANDOM VARIABLES
3.1 Introduction: random variables, distribution functions, probability mass and density functions
Given a probability space (S, P), a random variable X is a real-valued function defined on S, i.e. X : S → R_X ⊆ R. We require that for every x ∈ R_X, the subset {ω : ω ∈ S and X(ω) ≤ x} of S is an event.

Consider A ⊆ R_X. What is the meaning of the symbol P(A)? It is a probability induced in R_X and defined to be P(A) = P({ω : ω ∈ S and X(ω) ∈ A}).
For example, for a throw of a pair of regular six-sided dice (one red, one blue), let X be the score we obtain. S consists of 36 elements of the form (i,j), where i and j are the scores on the red and blue die respectively. Let A be the event ‘a score of 6’. Then P(A) = P({(1,5),(2,4),(3,3),(4,2),(5,1)}) and so P(A) = 5/36, which we write simply as P(X = 6) = 5/36.
The cumulative distribution function (cdf) FX of X is defined by FX(x) = P(X ≤ x).
In the case that RX is a finite or countable set, X is a discrete random variable with probability mass function fX , where fX (x) = P (X = x). Probabilities of events are obtained by summing probabilities of the individual values which make up the event. The distribution function FX is a step (jump) function.
In the case that R_X is not countable (for instance, an interval on the real line), X is a continuous random variable with probability density function (pdf) f_X, where f_X(x) = dF_X(x)/dx. Probabilities of events are obtained by integrating the pdf over the appropriate interval (probabilities are represented by areas under the pdf). F_X is a continuous function.
We often just write F (x) for FX (x) and f (x) for fX (x), where there is no possible confusion
with other random variables. The cdf is often called simply the distribution function (df).
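As an illustrative sketch (not part of the original notes; Python standard library only), the following code builds the pmf of the two-dice score from the 36 equally likely outcomes and accumulates it into the step-function cdf, reproducing P(X = 6) = 5/36.

# Sketch: pmf and cdf of the total score on two fair six-sided dice.
# Assumes equally likely outcomes (i, j), i, j = 1..6, as in the example above.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # the 36 points of S
pmf = {}                                          # f_X(x) = P(X = x)
for i, j in outcomes:
    x = i + j
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)

# cdf F_X(x) = P(X <= x): a step function, obtained by summing the pmf
cdf = {}
running = Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

print(pmf[6])    # 5/36, matching P(X = 6) above
print(cdf[6])    # P(X <= 6) = 15/36 = 5/12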
3.2 Expectation (expected value)
The expectation of the random variable X, denoted E[X], is given by

E[X] = Σᵢ xᵢ f(xᵢ) for X discrete,  E[X] = ∫ x f(x) dx for X continuous

This is the mean of the random variable X, often denoted μ.

The expectation of h(X), a function of the random variable X, may be found as

E[h(X)] = Σᵢ h(xᵢ) f(xᵢ) for X discrete,  E[h(X)] = ∫ h(x) f(x) dx for X continuous
E[(X − μ)²] is the variance, denoted V[X] or Var[X] or σ².

σ (> 0) is the standard deviation, SD[X].

Note: E[(X − μ)²] = E[X² − 2μX + μ²] = E[X²] − 2μE[X] + μ² = E[X²] − μ² = E[X²] − (E[X])²
Example: consider a discrete r.v. X with probability mass function

x       1    2    3    4
f(x)  0.2  0.3  0.4  0.1
Then
E[X] = 1 × 0.2 + 2 × 0.3 + 3 × 0.4 + 4 × 0.1 = 2.4,  E[X²] = 1² × 0.2 + 2² × 0.3 + 3² × 0.4 + 4² × 0.1 = 6.6,
so μ = 2.4, σ² = 6.6 − 2.4² = 0.84, σ = √0.84 = 0.917
Alternatively we have
σ² = E[(X − μ)²] = E[(X − 2.4)²] = (−1.4)² × 0.2 + (−0.4)² × 0.3 + 0.6² × 0.4 + 1.6² × 0.1 = 0.84
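As a quick numerical check (an illustrative sketch, not part of the notes), the mean, E[X²] and variance can be computed directly from the pmf:

# Sketch: mean, variance and sd computed directly from the pmf above.
import math

x_vals = [1, 2, 3, 4]
probs  = [0.2, 0.3, 0.4, 0.1]

mu   = sum(x * p for x, p in zip(x_vals, probs))            # E[X]
ex2  = sum(x**2 * p for x, p in zip(x_vals, probs))         # E[X^2]
var  = ex2 - mu**2                                          # E[X^2] - mu^2
var2 = sum((x - mu)**2 * p for x, p in zip(x_vals, probs))  # E[(X - mu)^2]

print(mu, ex2, var, var2, math.sqrt(var))   # approx 2.4, 6.6, 0.84, 0.84, 0.917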
Example: consider a continuous r.v. with probability density function f(x) = 2e^{-2x}, x > 0 (and = 0 elsewhere)

Then

E[X] = ∫₀^∞ 2x e^{-2x} dx = 0.5,  E[X²] = ∫₀^∞ 2x² e^{-2x} dx = 0.5  ← check the values of both integrals

so μ = 0.5, σ² = 0.5 − 0.5² = 0.25, σ = √0.25 = 0.5
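The two integrals flagged '← check' can also be verified numerically; a minimal sketch, assuming scipy is available:

# Sketch: numerical check of E[X] and E[X^2] for f(x) = 2 exp(-2x), x > 0.
import numpy as np
from scipy.integrate import quad

f = lambda x: 2 * np.exp(-2 * x)                 # the pdf
ex,  _ = quad(lambda x: x * f(x),    0, np.inf)  # E[X]
ex2, _ = quad(lambda x: x**2 * f(x), 0, np.inf)  # E[X^2]

print(ex, ex2, ex2 - ex**2)   # approx 0.5, 0.5, 0.25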
3.3 Moments and moment generating function
The kth moment about the mean of the distribution of X, denoted μ_k, is the expected value of (X − μ)^k.

That is, μ_k = E[(X − μ)^k]

The kth moment about the origin of the distribution of X, denoted μ′_k, is the expected value of X^k.

That is, μ′_k = E[X^k]
The moment generating function (mgf) of the distribution of X, denoted M_X(t) or just M(t), is defined to be M(t) = E[e^{tX}], if the expectation exists. Note that M(0) = 1. M(t) can be expanded as a power series in t:

M(t) = E[1 + tX + (t²/2!)X² + (t³/3!)X³ + ···] = 1 + μt + μ′₂(t²/2!) + μ′₃(t³/3!) + ···
so we see that μ′_k = E[X^k] is the coefficient of t^k/k! in the expansion. We can retrieve the moments by successive differentiation of M(t) and putting t = 0. That is,

M′(t) = E[X e^{tX}] ⇒ M′(0) = E[X] = μ,  M′′(t) = E[X² e^{tX}] ⇒ M′′(0) = E[X²] = μ′₂
etc.
Only random variables for which all moments exist possess mgfs. For those distributions which do have mgfs, there is a one-to-one correspondence between distributions and mgfs. We can sometimes recognise a distribution from its mgf — this can be very useful.
Example: consider the r.v. above for which f(x) = 2e^{-2x}, x > 0 (and = 0 elsewhere).

M(t) = ∫₀^∞ 2e^{tx} e^{-2x} dx = ∫₀^∞ 2e^{-(2-t)x} dx = 2/(2 − t) = (1 − t/2)^{-1} for t < 2  ← check

M(0) = 1 and M(t) = 1 + t/2 + (t/2)² + (t/2)³ + ··· = 1 + (1/2)t + (1/2)(t²/2!) + (3/4)(t³/3!) + ···

confirming E[X] = 0.5, E[X²] = 0.5 as before.

Further, M′(t) = (1/2)(1 − t/2)^{-2}, M′′(t) = (1/2)(1 − t/2)^{-3}, and putting t = 0 gives E[X] = 0.5, E[X²] = 0.5 again.
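A short symbolic check of this expansion (a sketch assuming sympy is available; the closed form (1 − t/2)^{-1} is taken from the working above):

# Sketch: series expansion and derivatives of the mgf M(t) = (1 - t/2)**(-1).
import sympy as sp

t = sp.symbols('t')
M = (1 - t / 2) ** (-1)                  # mgf found above, valid for t < 2

print(sp.series(M, t, 0, 4))             # 1 + t/2 + t**2/4 + t**3/8 + O(t**4)
print(sp.diff(M, t, 1).subs(t, 0))       # M'(0)  = E[X]   = 1/2
print(sp.diff(M, t, 2).subs(t, 0))       # M''(0) = E[X^2] = 1/2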
3.4 Probability generating function
The probability generating function (pgf) of the distribution of a counting variable X (that is, a variable which assumes some or all of the values 0, 1, 2, . . ., but no others), denoted G_X(t) or just G(t), is defined to be G(t) = E[t^X], if the expectation exists. Note that G(1) = 1.
Let p_k = P(X = k). Then

G(t) = p₀ + p₁t + p₂t² + p₃t³ + ···,
G′(t) = p₁ + 2p₂t + 3p₃t² + ···,
G′′(t) = 2p₂ + (3 × 2)p₃t + ···,

⇒ G′(1) = p₁ + 2p₂ + 3p₃ + ··· = E[X],
⇒ G′′(1) = 2p₂ + (3 × 2)p₃ + ··· = E[X(X − 1)] = E[X²] − E[X]

So the mean and variance of X are given by G′(1) and G′′(1) + G′(1) − (G′(1))², respectively. Note that for a r.v. with both a pgf and a mgf we have M(t) = G(e^t) and G(t) = M(ln t).
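As an illustration, a sketch (sympy assumed available) recovering the mean and variance from the pgf of the discrete distribution used in Section 3.2 (values 1–4 with probabilities 0.2, 0.3, 0.4, 0.1):

# Sketch: mean and variance from the pgf of the discrete example in Section 3.2.
import sympy as sp

t = sp.symbols('t')
probs = {1: sp.Rational(2, 10), 2: sp.Rational(3, 10),
         3: sp.Rational(4, 10), 4: sp.Rational(1, 10)}

G = sum(p * t**k for k, p in probs.items())       # G(t) = E[t^X]

G1 = sp.diff(G, t, 1).subs(t, 1)                  # G'(1)  = E[X]
G2 = sp.diff(G, t, 2).subs(t, 1)                  # G''(1) = E[X(X-1)]
mean = G1
var  = G2 + G1 - G1**2

print(mean, var)                                  # 12/5 = 2.4 and 21/25 = 0.84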
3.5 Change of origin and scale (linear transformations)
Consider the linear transformation Y = a + bX.

E[Y] = E[a + bX] = a + bE[X], i.e. μ_Y = a + bμ_X

Var[Y] = b²Var[X], i.e. σ_Y² = b²σ_X²

Proof: Var[Y] = E[(Y − μ_Y)²] = E[(a + bX − a − bμ_X)²] = b²E[(X − μ_X)²] = b²Var[X]

It follows that the standard deviations of Y and X are related simply by the scaling SD[Y] = |b|SD[X], i.e. σ_Y = |b|σ_X

Special case: the standardised variable Z = (X − μ_X)/σ_X; E[Z] = 0, Var[Z] = SD[Z] = 1

Mgf: M_Y(t) = E[e^{tY}] = E[e^{t(a+bX)}] = e^{at}E[e^{(bt)X}] = e^{at}M_X(bt)

Pgf: G_Y(t) = E[t^Y] = E[t^{a+bX}] = t^a E[t^{bX}] = t^a G_X(t^b)
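A minimal numerical sketch (illustrative only; the constants a and b below are arbitrary choices) confirming the origin-and-scale results for the discrete distribution of Section 3.2:

# Sketch: check E[a + bX] = a + b E[X] and Var[a + bX] = b^2 Var[X]
# for the discrete distribution with values 1..4 and probs 0.2, 0.3, 0.4, 0.1.
x_vals = [1, 2, 3, 4]
probs  = [0.2, 0.3, 0.4, 0.1]
a, b = 10.0, -3.0                                   # arbitrary illustrative constants

def mean_var(vals, probs):
    m = sum(v * p for v, p in zip(vals, probs))
    v = sum((val - m) ** 2 * p for val, p in zip(vals, probs))
    return m, v

mx, vx = mean_var(x_vals, probs)
my, vy = mean_var([a + b * v for v in x_vals], probs)

print(my, a + b * mx)      # both approx 2.8:  E[Y] = a + b*E[X]
print(vy, b**2 * vx)       # both approx 7.56: Var[Y] = b^2 * Var[X]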
3.6 Skewness
The usual coefficient of skewness (degree of asymmetry), denoted γ₁, is given by

γ₁ = E[(X − μ)³] / (E[(X − μ)²])^{3/2} = μ₃ / μ₂^{3/2} = μ₃ / σ³

The numerator is the third central moment while the denominator is based on the second central moment (the variance). A symmetrical distribution has γ₁ = 0; γ₁ > 0 corresponds to positive skew, γ₁ < 0 to negative skew. The measure is invariant to changes of origin and scale (it has no units of measurement). For the example distribution above with f(x) = 2e^{-2x}, x > 0, we have Var[X] = 1/4, E[(X − μ)³] = 1/4, giving γ₁ = 2.
For a distribution with mgf M_X(t) = exp(μt + ½t²σ²), we have E[X] = μ, and so M_{X−μ}(t) = exp(½t²σ²) and hence E[(X − μ)³] = 0 and so γ₁ = 0. (This mgf corresponds to X ∼ N(μ, σ²), see later.)
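A numerical check of γ₁ = 2 for the example density f(x) = 2e^{-2x} (a sketch assuming scipy is available):

# Sketch: skewness gamma_1 of f(x) = 2 exp(-2x), x > 0, by numerical integration.
import numpy as np
from scipy.integrate import quad

f = lambda x: 2 * np.exp(-2 * x)
mu,  _ = quad(lambda x: x * f(x), 0, np.inf)
var, _ = quad(lambda x: (x - mu) ** 2 * f(x), 0, np.inf)
m3,  _ = quad(lambda x: (x - mu) ** 3 * f(x), 0, np.inf)

gamma1 = m3 / var ** 1.5
print(mu, var, m3, gamma1)   # approx 0.5, 0.25, 0.25, 2.0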
3.7 The Markov and Chebyshev inequalities
These two inequalities provide useful information about the distributions of random variables, in particular by providing upper bounds on the size of the tails of the distributions.
Markov’s inequality: let X be a non-negative r.v. (that is, one for which f(x) = 0 for x < 0) with mean μ. Then for any constant a > 0, P(X ≥ a) ≤ μ/a.
Proof:

μ = E[X] = ∫₀^∞ x f(x) dx = ∫₀^a x f(x) dx + ∫ₐ^∞ x f(x) dx ≥ ∫ₐ^∞ x f(x) dx ≥ ∫ₐ^∞ a f(x) dx = aP(X ≥ a)

hence result.
Chebyshev’s inequality: let X be a r.v. with mean μ and variance σ². Then for any constant k > 0, P(|X − μ| ≥ kσ) ≤ 1/k².
Proof: Applying Markov’s inequality to the variable (X − μ)² gives

P((X − μ)² ≥ a) ≤ E[(X − μ)²]/a = σ²/a

Putting a = k²σ² gives the result.
An illustration of Markov’s inequality: P(X ≥ 4μ) ≤ μ/(4μ) = 0.25; that is, the probability that a non-negative r.v. takes a value greater than or equal to 4 times its mean is at most 0.25.
An illustration of Chebyshev’s inequality: P(|X − μ| ≥ 3σ) ≤ 1/9; that is, the probability that a r.v. takes a value at least 3 standard deviations away from the mean is at most 1/9 = 0.1111.
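For a concrete comparison, the sketch below (illustrative only, standard library) evaluates the exact tail probabilities for the earlier density f(x) = 2e^{-2x}, for which P(X ≥ x) = e^{-2x}, against the Markov and Chebyshev bounds; both bounds hold with plenty of room to spare.

# Sketch: Markov and Chebyshev bounds versus exact tail probabilities
# for the density f(x) = 2 exp(-2x), x > 0 (mu = 0.5, sigma = 0.5).
import math

mu, sigma = 0.5, 0.5

def sf(x):                      # P(X >= x) = exp(-2x) for this density
    return math.exp(-2 * x)

# Markov: P(X >= 4*mu) <= mu / (4*mu) = 0.25
print(sf(4 * mu), 0.25)                       # 0.0183... <= 0.25

# Chebyshev: P(|X - mu| >= 3*sigma) <= 1/9
# Here |X - mu| >= 3*sigma means X <= -1 (impossible) or X >= 2.
print(sf(mu + 3 * sigma), 1 / 9)              # 0.0183... <= 0.1111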
3.8 Worked examples
3.1 A fair die is thrown. S = {1, 2, 3, 4, 5, 6}; n(S) = 6

Let X be the score (i.e. the number showing). Then, for ω = i, X(ω) = i; R_X = {1, 2, 3, 4, 5, 6}.

Probability function/distribution of X:

x        1    2    3    4    5    6
f_X(x)  1/6  1/6  1/6  1/6  1/6  1/6

E[X] = 1 × 1/6 + 2 × 1/6 + ··· + 6 × 1/6 = 21/6 = 7/2 ⇒ mean μ = 7/2

E[X²] = 1² × 1/6 + 2² × 1/6 + ··· + 5² × 1/6 + 6² × 1/6 = 91/6

⇒ Var[X] = 91/6 − (7/2)² = 35/12, SD[X] = √(35/12) ⇒ σ² = 35/12, σ = √(35/12) = 1.71

Pgf G(t) = (1/6)(t + t² + ··· + t⁶) = t(1 − t⁶)/(6(1 − t)) for t ≠ 1

3.2 A fair coin is tossed three times.
S = {(x₁, x₂, x₃) : each xᵢ = H or T}, i.e. S = {H, T}³; n(S) = 2³ = 8
Let X be the number of heads observed.
So, for instance, for ω = (H,T,T), X(ω) = 1 and for ω = (T,T,T), X(ω) = 0.
RX = {0,1,2,3}
Consider e.g. X = 1, i.e. ω ∈ {(H,T,T),(T,H,T),(T,T,H)}. So fX(1) = P(X = 1) = 3/8
Probability function/distribution:

x        0    1    2    3
f_X(x)  1/8  3/8  3/8  1/8

Distribution function — some values:
FX(−3) = P(X ≤ −3) = 0, FX(0) = P(X ≤ 0) = P(X = 0) = 1/8, FX(1) = P(X ≤ 1) = 1/8 + 3/8 = 1/2
The function is given in full by:
F_X(x) = 0 for x < 0,
         1/8 for 0 ≤ x < 1,
         1/2 for 1 ≤ x < 2,
         7/8 for 2 ≤ x < 3,
         1 for x ≥ 3.
E[X] = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 3/2 ⇒ mean μ = 3/2

E[X²] = 0² × 1/8 + 1² × 3/8 + 2² × 3/8 + 3² × 1/8 = 24/8 = 3

⇒ Var[X] = 3 − (3/2)² = 3/4, SD[X] = √3/2 ⇒ σ² = 3/4, σ = √3/2 = 0.866

Pgf G(t) = (1/8)(1 + 3t + 3t² + t³) = ((1 + t)/2)³
Linear transformations:
Let Y = 3 − X:

E[Y] = 3 − E[X] = 3 − 3/2 = 3/2 ⇒ μ_Y = 3/2
Var[Y] = Var[−X] = Var[X] = 3/4 ⇒ σ_Y² = 3/4
SD[Y] = SD[−X] = SD[X] = √3/2 ⇒ σ_Y = √3/2

Let Z = 2X − 3:

E[Z] = 2E[X] − 3 = 2 × (3/2) − 3 = 0 ⇒ μ_Z = 0
Var[Z] = 4Var[X] = 4 × (3/4) = 3 ⇒ σ_Z² = 3
SD[Z] = 2SD[X] = 2(√3/2) = √3 ⇒ σ_Z = √3
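A brief enumeration check of this worked example (an illustrative sketch, standard library only):

# Sketch: enumerate the 8 outcomes of three fair coin tosses and check
# the pmf of X = number of heads, its moments, and Y = 3 - X, Z = 2X - 3.
from itertools import product
from fractions import Fraction

outcomes = list(product('HT', repeat=3))            # the 8 points of S
pmf = {}
for w in outcomes:
    x = w.count('H')
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 8)

for x in sorted(pmf):
    print(x, pmf[x])                                # 0 1/8, 1 3/8, 2 3/8, 3 1/8

def mean_var(g):
    m = sum(g(x) * p for x, p in pmf.items())
    v = sum((g(x) - m) ** 2 * p for x, p in pmf.items())
    return m, v

mX, vX = mean_var(lambda x: x)
mY, vY = mean_var(lambda x: 3 - x)
mZ, vZ = mean_var(lambda x: 2 * x - 3)
print(mX, vX)   # 3/2 3/4   for X
print(mY, vY)   # 3/2 3/4   for Y = 3 - X
print(mZ, vZ)   # 0 3       for Z = 2X - 3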
3.3 A fair die is thrown repeatedly until a six turns up. Let X be the number of throws
required. A convenient way to represent the sample space is as follows:
Sample space = {S,(N,S),(N,N,S),...} where S = six, N = not a six, and the kth element in the list corresponds to a sequence of throws which ends at throw k with the first six i.e. it corresponds to X = k.
So X = k ⇔ ω = (N, N, N, . . . , N, S) where there are k − 1 N’s. ⇒ f_X(k) = P(X = k) = (5/6)^{k−1}(1/6), k = 1, 2, 3, . . .
So, for example, P(X is even) = P(X = 2) + P(X = 4) + ··· = (5/6)(1/6) + (5/6)³(1/6) + (5/6)⁵(1/6) + ··· = (5/36)(1 + 25/36 + (25/36)² + ···) = (5/36)/(1 − 25/36) = 5/11

μ = E[X] = Σ_{k=1}^∞ k(5/6)^{k−1}(1/6) = (1/6)(1 + 2(5/6) + 3(5/6)² + ···) = (1/6)(1 − 5/6)^{−2} = 6

E[X²] is best found via E[X(X − 1)]:

E[X(X − 1)] = Σ_{k=1}^∞ k(k − 1)(5/6)^{k−1}(1/6) = 2(1/6)(5/6)(1 − 5/6)^{−3} = 60

So E[X²] = E[X² − X] + E[X] = 60 + 6 = 66 ⇒ σ² = 66 − 6² = 30, σ = √30 = 5.48
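These series results can also be checked by simulation; a sketch (assuming numpy is available, with an arbitrary seed) estimating P(X is even), E[X] and Var[X]:

# Sketch: simulate the number of throws needed to obtain the first six
# and compare with P(X even) = 5/11, E[X] = 6, Var[X] = 30.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# number of throws to the first six has a geometric distribution with p = 1/6
x = rng.geometric(1 / 6, size=n)

print(np.mean(x % 2 == 0), 5 / 11)   # approx 0.4545
print(x.mean(), 6)                   # approx 6
print(x.var(), 30)                   # approx 30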
3.4 Consider the discrete random variable X with probability mass function

x      10   100   1000
f(x)  0.2   0.5    0.3
E[X] = 10 × 0.2 + 100 × 0.5 + 1000 × 0.3 = 352
so E[2X + 3] = 2(352) + 3 = 707
E[log₁₀ X] = (log₁₀ 10) × 0.2 + (log₁₀ 100) × 0.5 + (log₁₀ 1000) × 0.3 = 0.2 + 1 + 0.9 = 2.1

Note: E[log X] ≠ log E[X]
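A short numerical illustration of the final note (a sketch, not part of the original example):

# Sketch: E[log10 X] versus log10 E[X] for the distribution above.
import math

x_vals = [10, 100, 1000]
probs  = [0.2, 0.5, 0.3]

e_log = sum(math.log10(x) * p for x, p in zip(x_vals, probs))   # E[log10 X]
log_e = math.log10(sum(x * p for x, p in zip(x_vals, probs)))   # log10 E[X]

print(e_log, log_e)    # 2.1 versus about 2.55 -- not equal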
3.5 Consider the continuous r.v. X with pdf f(x) = 3/x⁴, x > 1 (= 0 elsewhere).

Check pdf conditions: f(x) > 0 for x > 1 and ∫₁^∞ 3/x⁴ dx = [−1/x³]₁^∞ = 1

Probabilities:

P(1 < X < 2) = ∫₁² 3/x⁴ dx = [−1/x³]₁² = 7/8

P(X > 1.5) = ∫_{1.5}^∞ 3/x⁴ dx = [−1/x³]_{1.5}^∞ = 8/27

Cdf: F(x) = 0 for x ≤ 1 and F(x) = ∫₁^x 3/t⁴ dt = [−1/t³]₁^x = 1 − 1/x³ for x > 1

Plots of f(x) and F(x): (not reproduced here)

Level/location/average: mean μ = ∫₁^∞ 3/x³ dx = [−3/(2x²)]₁^∞ = 1.5

median m: solving F(m) = 0.5 gives 1 − 1/m³ = 1/2 so m = 2^{1/3} = 1.26

Spread/variability: E[X²] = ∫₁^∞ 3/x² dx = [−3/x]₁^∞ = 3 ⇒ σ² = 3 − 1.5² = 0.75, σ = √0.75 = 0.866

Higher moments: for r ≥ 3, E[X^r] does not exist.
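A numerical check of these integrals (a sketch assuming scipy is available):

# Sketch: numerical checks for f(x) = 3/x**4, x > 1.
import numpy as np
from scipy.integrate import quad

f = lambda x: 3 / x**4

print(quad(f, 1, np.inf)[0])                        # 1.0   (valid pdf)
print(quad(f, 1, 2)[0], 7 / 8)                      # P(1 < X < 2)
print(quad(f, 1.5, np.inf)[0], 8 / 27)              # P(X > 1.5)
print(quad(lambda x: x * f(x), 1, np.inf)[0])       # mean 1.5
print(quad(lambda x: x**2 * f(x), 1, np.inf)[0])    # E[X^2] = 3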
3.6 Consider the continuous r.v. X with pdf f(x) = xe^{-x}, x > 0 (= 0 elsewhere).

Cdf: F(x) = 0 for x ≤ 0 and F(x) = ∫₀^x t e^{-t} dt = [−t e^{-t}]₀^x + ∫₀^x e^{-t} dt = 1 − (1 + x)e^{-x} for x > 0.

Level/location/average: mean μ = ∫₀^∞ x² e^{-x} dx = Γ(3) = 2 (or by parts)

median m: setting 1 − (1 + m)e^{-m} = 0.5 ⇒ m = 1.678

mode: turning point of pdf: setting f′(x) = 0 ⇒ (1 − x)e^{-x} = 0 ⇒ mode = 1

Spread: E[X²] = ∫₀^∞ x³ e^{-x} dx = Γ(4) = 3! = 6 ⇒ Var[X] = σ² = 6 − 2² = 2, SD[X] = σ = √2

Mgf: M(t) = E[e^{tX}] = ∫₀^∞ x e^{-(1-t)x} dx = ∫₀^∞ (1/(1 − t)²) u e^{-u} du = Γ(2)/(1 − t)² = 1/(1 − t)² for t < 1

M′(t) = 2(1 − t)^{-3}, M′′(t) = 6(1 − t)^{-4} so μ = M′(0) = 2, E[X²] = M′′(0) = 6 (as before)

M′′′(t) = 24(1 − t)^{-5} so E[X³] = M′′′(0) = 24 ⇒ E[(X − μ)³] = 24 − (3 × 2 × 6) + 2 × 2³ = 4

Hence skewness coefficient is γ₁ = 4/2^{3/2} = √2
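These results can be re-derived symbolically; a sketch assuming sympy is available (the dummy variable u below is introduced only for the cdf integral):

# Sketch: symbolic checks for f(x) = x*exp(-x), x > 0.
import sympy as sp

x, u = sp.symbols('x u', positive=True)
f = x * sp.exp(-x)

F = sp.integrate(u * sp.exp(-u), (u, 0, x))            # cdf: 1 - (1 + x)exp(-x)
mu  = sp.integrate(x * f, (x, 0, sp.oo))               # E[X]   = 2
ex2 = sp.integrate(x**2 * f, (x, 0, sp.oo))            # E[X^2] = 6
var = ex2 - mu**2                                      # Var[X] = 2
m3  = sp.integrate((x - mu)**3 * f, (x, 0, sp.oo))     # third central moment = 4
gamma1 = sp.simplify(m3 / var**sp.Rational(3, 2))      # sqrt(2)

print(sp.simplify(F), mu, var, gamma1)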
3.7 Suppose X has a distribution with constant density, that is f(x) = 1, 0 < x < 1. We use cdfs to find the distributions of (a) Y = X² and (b) Y = −ln X.
FX(x) = P(X ≤ x) = x for 0 ≤ x ≤ 1.
(a) Range of Y is (0, 1).
In (0, 1), the events X² ≤ y and X ≤ y^{1/2} are equivalent and so have the same probability.

So F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ y^{1/2}) = y^{1/2}, 0 ≤ y ≤ 1

⇒ f_Y(y) = (d/dy) y^{1/2} = 1/(2y^{1/2}), 0 ≤ y ≤ 1.
Note: While the pdf of X is flat, that of Y = X² is decreasing.

(b) Range of Y is (0, ∞).
F_Y(y) = P(Y ≤ y) = P(−ln X ≤ y) = P(ln X ≥ −y) = P(X ≥ e^{-y}) = 1 − P(X ≤ e^{-y}) = 1 − e^{-y} for y ≥ 0.

Hence f_Y(y) = (d/dy)(1 − e^{-y}) = e^{-y}, y ≥ 0.
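The two transformation results can be checked by simulation; a sketch (assuming numpy is available, arbitrary seed) comparing empirical cdfs of X² and −ln X with y^{1/2} and 1 − e^{-y}:

# Sketch: simulate X ~ uniform(0, 1) and check the cdfs of X**2 and -ln(X).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=100_000)

for y in (0.1, 0.5, 1.0):
    print(np.mean(x**2 <= y), np.sqrt(y))            # P(X^2 <= y) vs y**0.5
for y in (0.5, 1.0, 2.0):
    print(np.mean(-np.log(x) <= y), 1 - np.exp(-y))  # P(-ln X <= y) vs 1 - e^{-y}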