F71SM STATISTICAL METHODS
4 SPECIAL DISTRIBUTIONS
We study a series of distributions that have wide applicability. We give examples of situations in
which the distributions are relevant and summarise some of their properties. More information
is given in the Yellow Book (‘Formulae and Tables for Examinations’ of the actuarial profession),
p6–15.
4.1 Discrete distributions
• Uniform on {1, 2, . . . , k}: parameter k a positive integer; X is the outcome in the situation
in which all outcomes 1, 2, . . . , k are equally likely, so with probability 1/k
f(x) = 1/k, x = 1, 2, . . . , k
µ = E[X] = ∑_{x=1}^{k} x · (1/k) = (1/k) · k(k + 1)/2 = (k + 1)/2

E[X²] = ∑_{x=1}^{k} x² · (1/k) = (1/k) · k(k + 1)(2k + 1)/6 = (k + 1)(2k + 1)/6

⇒ Var[X] = (k² − 1)/12

G(t) = E[t^X] = ∑_{x=1}^{k} t^x/k = (1/k) t(1 + t + · · · + t^{k−1}) = (1/k) · t(t^k − 1)/(t − 1) for t ≠ 1

M(t) = G(e^t) = (1/k) · e^t(e^{kt} − 1)/(e^t − 1) for t ≠ 0
Worked example 4.1 Let X be the number showing face up when a fair six-sided die
is thrown once.
X ∼ uniform on {1, 2, 3, 4, 5, 6}, f(x) = 1/6, x = 1, 2, . . . , 6; µ = 7/2, σ2 = 35/12
G(t) = E[t^X] = (1/6) t(1 + t + t² + · · · + t⁵) = t(t⁶ − 1)/(6(t − 1)), t ≠ 1
P (X ≤ 3) = 3/6 = 1/2, P (X ≥ 5) = 2/6 = 1/3
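Using R (an illustrative sketch: the exact moments follow from 1:6 directly, and sample simulates throws):

x <- 1:6
mean(x)                  # µ = 7/2 = 3.5
mean(x^2) - mean(x)^2    # σ² = 35/12 ≈ 2.9167
throws <- sample(x, 100000, replace = TRUE)  # simulate 100,000 throws
mean(throws <= 3)        # ≈ P(X ≤ 3) = 1/2
mean(throws >= 5)        # ≈ P(X ≥ 5) = 1/3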
• Binomial(n, p): parameters n a positive integer and p, 0 < p < 1; X is the number of
successes in a sequence of n Bernoulli trials (i.e. n independent, identical trials) with
P (success) = p;
notation X ∼ binomial(n, p) or X ∼ bi(n, p) or X ∼ b(n, p)
Let P (failure) = q = 1− p. The event X = x occurs when x trials result in successes and
n − x trials result in failures; each such sequence has probability p^x × q^{n−x} and there are C(n, x) = n!/(x!(n − x)!) such sequences, so

f(x) = C(n, x) p^x q^{n−x}, x = 0, 1, . . . , n
G(t) = E[t^X] = ∑_{x=0}^{n} C(n, x) (pt)^x q^{n−x} = (pt + q)^n

⇒ G′(t) = np(pt + q)^{n−1}, G′′(t) = n(n − 1)p²(pt + q)^{n−2}

⇒ µ = G′(1) = np, σ² = G′′(1) + G′(1) − (G′(1))² = n(n − 1)p² + np − (np)² = npq

(Note that σ² = µq < µ)

Alternatively, find E[X] and E[X²] directly from f(x).

M(t) = G(e^t) = (pe^t + q)^n
The bi(n, p) distribution is positively skewed for p < 0.5, negatively skewed for p > 0.5,
and symmetrical for p = 0.5. The skewness increases as p → 0 or 1. A bi(1, p) r.v. is
called a Bernoulli r.v., takes value 1 or 0 and indicates success or failure in a single trial
— it is a binary or indicator variable; in this case µ = p, σ² = pq. X ∼ bi(n, p) is the
sum of n independent, identically distributed r.v.s, each Bernoulli(p).
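We can illustrate the Bernoulli-sum characterisation by simulation in R (a sketch; the parameters n = 8, p = 1/6 are chosen to match Worked example 4.2 below):

n <- 8; p <- 1/6
bern <- matrix(rbinom(100000 * n, 1, p), ncol = n)  # 100,000 rows of n Bernoulli(p) trials
x <- rowSums(bern)                                  # each row sum is one bi(n, p) value
mean(x)   # ≈ np = 4/3
var(x)    # ≈ npq = 10/9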
Worked example 4.2 Let X be the number of sixes which show face up when a fair
six-sided die is thrown eight times.
X ∼ bi(8, 1/6); f(x) = C(8, x) (1/6)^x (5/6)^{8−x}, x = 0, 1, . . . , 8; µ = 4/3, σ² = 10/9;
G(t) = ((t + 5)/6)⁸; P(X = 2) = C(8, 2) (1/6)² (5/6)⁶ = 0.2605, P(X ≤ 2) = 0.8652 (NCST p7 gives values for p = 0.16 and 0.17)
Using R:
dbinom(2,8,1/6)
[1] 0.2604762
pbinom(2,8,1/6)
[1] 0.8651531
• Poisson(λ): parameter λ > 0; X is the number of events which occur in a unit of time in
the situation in which events occur ‘at random’ one after another through time with rate
λ (the situation is more formally described as being a ‘Poisson process’ with intensity λ);
notation X ∼ Poisson(λ) or X ∼ Poi(λ) or X ∼ P (λ)
f(x) = e^{−λ} λ^x/x!, x = 0, 1, 2, . . . (note: the number of events is not bounded above)

G(t) = E[t^X] = ∑_{x=0}^{∞} e^{−λ} (λt)^x/x! = e^{−λ} e^{λt} = e^{λ(t−1)}

⇒ G′(t) = λe^{λ(t−1)}, G′′(t) = λ²e^{λ(t−1)}

⇒ µ = G′(1) = λ, σ² = G′′(1) + G′(1) − (G′(1))² = λ² + λ − λ² = λ

(Note that σ² = µ)

Alternatively, find E[X] and E[X²] directly from f(x).

M(t) = G(e^t) = exp(λ(e^t − 1))
The Poi(λ) distribution is positively skewed, less strongly so as λ increases.
The Poisson distribution provides a good approximation to the binomial(n, p) distribution in the case ‘large n, small p’: formally we let λ = np and we can show that

C(n, x) (λ/n)^x (1 − λ/n)^{n−x} → e^{−λ} λ^x/x! as n → ∞.
Worked example 4.3 Industrial accidents in the plants of a large multinational company
occur at random through time, one after the other and at an average rate of 3 per week.
Let X be the number of such accidents in a 4-week period.
E[X] = 3× 4 = 12.
We model the number of accidents using X ∼ Poi(12), and, e.g. P (X ≤ 10) = 0.3472,
P (X ≥ 10) = 0.7576 (NCST p30).
Using R:
ppois(10,12)
[1] 0.3472294
1 - ppois(9,12)
[1] 0.7576078
Worked example 4.4 Suppose X ∼ bi(1000, 0.002). We approximate the binomial
distribution with a Poisson distribution with mean λ = 1000 × 0.002 = 2. So we use
X ∼ Poi(2).
P (X ≤ 3) = 0.857, P (X = 4 or 5) = 0.126 approx (NCST p25)
Using R:
pbinom(3,1000,0.002)
[1] 0.8573042
ppois(3,2)
[1] 0.8571235
pbinom(5,1000,0.002) - pbinom(3,1000,0.002)
[1] 0.1262404
ppois(5,2) - ppois(3,2)
[1] 0.1263129
• Geometric(p): parameter p, 0 < p < 1; X is the number of failures before the first success occurs in a sequence of Bernoulli trials with P (success) = p; it is a discrete ‘waiting time distribution’ in the sense ‘how long do we have to wait to get a success?’; notation X ∼ geo(p); q = 1 − p as before.

The event X = x occurs when the first x trials result in failures and the next trial results in success; the probability of this sequence of outcomes occurring gives the pmf.

f(x) = p(1 − p)^x, x = 0, 1, 2, . . .

G(t) = E[t^X] = ∑_{x=0}^{∞} p(qt)^x = p/(1 − qt) for |qt| < 1, i.e. for |t| < 1/q

⇒ G′(t) = pq(1 − qt)^{−2}, G′′(t) = 2pq²(1 − qt)^{−3}

⇒ µ = G′(1) = q/p, σ² = G′′(1) + G′(1) − (G′(1))² = 2q²/p² + q/p − q²/p² = q/p²

(Note that σ² = µ/p > µ)
Alternatively, find E[X] and E[X²] directly from f(x).
M(t) = G(e^t) = p/(1 − qe^t)
The geo(p) distribution is positively skewed, increasingly so as p→ 1.
See Yellow Book p9, with k = 1.
[Note: There is an alternative version in which X is the number of trials required to get
the first success, i.e. previous variable + 1.
f(x) = p(1 − p)^{x−1}, x = 1, 2, . . .; µ = 1/p, σ² = q/p²; G(t) = pt/(1 − qt), M(t) = pe^t/(1 − qe^t);
See Yellow Book p8, with k = 1.]
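Note that R’s geometric functions (dgeom, pgeom, rgeom) use the first convention (number of failures). A quick sketch checking the moments for p = 1/6 (the value used in Worked example 4.5 below):

p <- 1/6
x <- rgeom(100000, p)   # simulated numbers of failures before the first success
mean(x)                 # ≈ q/p = 5
var(x)                  # ≈ q/p^2 = 30
mean(x + 1)             # ≈ 1/p = 6, the mean of the 'number of trials' version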
Worked example 4.5 A fair six-sided die is thrown repeatedly until it lands showing a 6
face up. The number of throws before we get a 6 is X ∼ geo(1/6). E[X] = (5/6)/(1/6) =
5, Var[X] = (5/6)/(1/6)2 = 30, SD[X] ≈ 5.48.
What is the probability that we have to wait until the 5th throw or longer to get a 6?
We want P (first 4 throws result in ‘not 6’) = (5/6)⁴ ≈ 0.4823.
Using the distribution of X explicitly, we want
P (X ≥ 4) = 1 − P (X ≤ 3) = 1 − (1/6 + (1/6)(5/6) + (1/6)(5/6)² + (1/6)(5/6)³) ≈ 0.4823
Using R:
1 - pgeom(3,1/6)
[1] 0.4822531
• Negative binomial(k, p): parameters k and p, k > 0, 0 < p < 1; notation X ∼ nb(k, p).
In the case that k is a positive integer, X is the number of failures before the occurrence
of the kth success (k = 1, 2, 3, . . .).
The event X = x occurs when the first x + k − 1 trials consist of x failures and k − 1
successes and the next trial results in success. X ∼ nb(k, p) is the sum of k independent,
identically distributed r.v.s, each geo(p).
The mean and variance of X are k times those of geo(p); the pgf and mgf are the kth
power of those of geo(p) — see Yellow Book p9 for pmf and other information.
The negative binomial distribution is defined for all k > 0 (not just for k an integer)
and is sometimes used as an alternative model to the Poisson distribution (e.g. for claim
numbers) when a model with stronger positive skewness is desirable.
The nb(k, p) distribution is positively skewed, increasingly so as p→ 1 and as k → 0.
[Note: Again there is an alternative version in which X is the number of trials required
to get the kth success, i.e. previous r.v. + k. See Yellow Book p8.]
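In R, dnbinom/pnbinom likewise count failures before the kth success, and size (= k) need not be an integer. A sketch with assumed values k = 2.5, p = 0.4, illustrating the overdispersion relative to a Poisson with the same mean:

k <- 2.5; p <- 0.4; q <- 1 - p
dnbinom(3, size = k, prob = p)  # P(X = 3); non-integer k is accepted
k * q / p                       # mean of nb(k, p)
k * q / p^2                     # variance: exceeds the mean, unlike the Poisson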
4.2 Continuous distributions
• Uniform on (a, b): parameters a, b with a < b; X is the position of a point chosen ‘at random’ in the interval (a, b); all outcomes in the interval are ‘equally likely’ (i.e. events defined by subintervals of the same length have the same probability); the pdf is constant (graph is flat); notation X ∼ U(a, b).

f(x) = 1/(b − a), a ≤ x ≤ b

F(x) = P (X ≤ x) = 0 for x < a; = (x − a)/(b − a) for a ≤ x ≤ b; = 1 for x > b
µ = E[X] = ∫_a^b x · 1/(b − a) dx = (1/(b − a))(b²/2 − a²/2) = (a + b)/2

E[X²] = ∫_a^b x² · 1/(b − a) dx = (1/(b − a))(b³/3 − a³/3) = (a² + ab + b²)/3

⇒ σ² = (b − a)²/12

M(t) = ∫_a^b e^{tx} · 1/(b − a) dx = (1/(b − a))(e^{bt}/t − e^{at}/t) = (e^{bt} − e^{at})/((b − a)t), t ≠ 0
The distribution is, of course, symmetrical (skewness = 0).
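A quick check in R (a sketch with assumed endpoints a = 2, b = 5):

a <- 2; b <- 5
punif(3, a, b)            # F(3) = (3 - a)/(b - a) = 1/3
x <- runif(100000, a, b)  # simulated values
mean(x)                   # ≈ (a + b)/2 = 3.5
var(x)                    # ≈ (b - a)^2/12 = 0.75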
• Exponential(λ): parameter λ > 0; X is the waiting time between consecutive events in
the situation in which events occur ‘at random’ one after another through time with rate
λ (as above, the situation is more formally described as being a ‘Poisson process’ with
intensity λ); notation X ∼ exp(λ)
f(x) = λe^{−λx}, x > 0

F(x) = ∫_0^x λe^{−λu} du = 1 − e^{−λx}, x ≥ 0 (and = 0 for x < 0)

M(t) = ∫_0^∞ e^{tx} λe^{−λx} dx = λ ∫_0^∞ e^{−(λ−t)x} dx = λ/(λ − t) = (1 − t/λ)^{−1} for t < λ

M′(t) = (1/λ)(1 − t/λ)^{−2}, M′′(t) = (2/λ²)(1 − t/λ)^{−3}

⇒ µ = M′(0) = 1/λ, E[X²] = M′′(0) = 2/λ² ⇒ σ² = 1/λ²

or find E[X] and E[X²] from the power series expansion M(t) = 1 + t/λ + t²/λ² + · · ·, or find directly from f(x) by integration.

The exp(λ) distribution is positively skewed; the coefficient of skewness does not depend on the value of λ; its value is γ₁ = 2.

Worked example 4.6 Suppose claims on a portfolio of insurance business arise at random one after another through time at an average rate of 5 per period of 24 hrs. Let X be the time we have to wait from any specified time to the next claim arising.

X ∼ exponential with λ = 5/24 per hr and mean (expected waiting time) 24/5 hrs = 4.8 hrs.

f(x) = (5/24) e^{−5x/24} for x > 0

F(x) = 1 − e^{−5x/24} for x > 0
What is the probability that the time between two successive claims arising exceeds 6 hrs?
Answer = 1 − F(6) = e^{−30/24} = e^{−1.25} = 0.2865
Using R:
1 - pexp(6,5/24)
[1] 0.2865048
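As a further check by simulation (a sketch continuing the example):

lambda <- 5/24
x <- rexp(100000, lambda)  # simulated inter-claim waiting times (hours)
mean(x)                    # ≈ 24/5 = 4.8 hours
mean(x > 6)                # ≈ exp(-1.25) = 0.2865, matching 1 - F(6)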
• Normal(µ, σ2): parameters µ, σ with σ > 0; also called the Gaussian distribution; fun-
damentally important in statistical theory and practice; good empirical model for some
kinds of physical data; provides good approximation to some other distributions; mod-
els distributions of certain sample statistics, in particular the sample mean and sample
proportion; basis of much statistical methodology; notation X ∼ N(µ, σ2); we confirm
below that the parameters do indeed represent the mean and standard deviation of the
distribution (as suggested by the choice of symbols).
f(x) = (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)²), −∞ < x < ∞
M(t) = ∫_{−∞}^{∞} e^{tx} (1/(σ√(2π))) exp(−(1/2)((x − µ)/σ)²) dx
Note that

(1/2)((x − µ)/σ)² − tx = (1/(2σ²))((x − µ)² − 2σ²tx)
= (1/(2σ²))((x − (µ + σ²t))² − 2µσ²t − σ⁴t²)
= (1/2)((x − (µ + σ²t))/σ)² − µt − σ²t²/2
and so M(t) = exp(µt + σ²t²/2) ∫_{−∞}^{∞} (1/(σ√(2π))) exp(−(1/2)((x − (µ + σ²t))/σ)²) dx
The final integral above is the integral of the pdf of N(µ + σ2t, σ2); any pdf integrates
to 1, and so finally
M(t) = exp(µt + σ²t²/2)
⇒ M′(t) = (µ + σ²t)M(t), M′′(t) = σ²M(t) + (µ + σ²t)M′(t)
⇒ mean = M′(0) = µ, E[X²] = M′′(0) = σ² + µ² ⇒ variance = σ²,
thus confirming the roles played by the parameters.
We could also find E[X] and E[X²] from the power series expansion of M(t), or directly
from f(x) by integration.
The normal distribution is symmetrical about the mean µ.
Linear function: X ∼ N(µ, σ²), Y = a + bX ⇒ Y ∼ N(a + bµ, b²σ²) ← show (use mgfs)
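A numerical sanity check of this result in R (a sketch with assumed values a = 2, b = 3, µ = 1, σ = 4):

a <- 2; b <- 3; mu <- 1; sigma <- 4
y <- a + b * rnorm(100000, mu, sigma)   # linear function of a normal r.v.
mean(y); sd(y)                          # ≈ a + b*mu = 5 and |b|*sigma = 12
mean(y <= 10)                           # simulated P(Y ≤ 10)
pnorm(10, a + b * mu, abs(b) * sigma)   # exact value, for comparison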
The N(0, 1) distribution is called the standard normal distribution.
Let X ∼ N(µ, σ²) and let Z = (X − µ)/σ: this transformation is called standardising.
Z ∼ N(0, 1), a standard normal random variable.
For Z ∼ N(0, 1): f_Z(z) = (1/√(2π)) e^{−z²/2}, z ∈ ℝ; M_Z(t) = e^{t²/2}
To find probabilities associated with the distribution of X, we standardise the variable
and find the probability using published tables of P (Z ≤ z) for Z ∼ N(0, 1) (or we can
use R).
Worked example 4.7 Suppose the weights of packages are normally distributed with
mean 1005g and standard deviation 5g. Let X denote the weight of a package in grams,
then X ∼ N(1005, 5²).
Z = (X − 1005)/5 ∼ N(0, 1)
P (X < 1005) = P (Z < 0) = 0.5
P (X < 1000) = P (Z < −1) = P (Z > 1) = 1− 0.8413 = 0.1587 (NCST p34)
P (1000 < X < 1015) = P (−1 < Z < 2) = P (Z < 2)− P (Z < −1) = 0.97725− 0.1587 =
0.8186
Using R:
pnorm(1000,1005,5)
[1] 0.1586553
pnorm(1015,1005,5) - pnorm(1000,1005,5)
[1] 0.8185946
What weight do the heaviest 5% of packages exceed?
P (Z > 1.6449) = 0.05 (NCST p35 Table 5)
Now X = 5Z + 1005 so required weight = 5× 1.6449 + 1005 = 1013.22g
Using R:
qnorm(0.95,1005,5)
[1] 1013.224
• Gamma(α, λ): parameters α(> 0), λ(> 0); notation X ∼ gamma(α, λ) or G(α, λ). In the
case that α is a positive integer, X is the sum of α independent, identically distributed
r.v.s, each exp(λ) and so models the sum of α inter-event times in a Poisson process.
Note: G(1, λ) = exp(λ).
The Gamma distribution is defined for all α > 0 (not just for α an integer) and is
positively skewed; it is sometimes used as a model for claim amounts.
f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx} for x > 0

M(t) = (1 − t/λ)^{−α} for t < λ, E[X] = α/λ, Var[X] = α/λ²
The mean and variance of X are α times those of exp(λ); the mgf is the αth power of
that of exp(λ) — see Yellow Book p12 for other information.
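A sketch in R illustrating the sum-of-exponentials characterisation (assumed values α = 3, λ = 2):

alpha <- 3; lambda <- 2
x <- colSums(matrix(rexp(100000 * alpha, lambda), nrow = alpha))  # sums of alpha exp(lambda) values
mean(x); var(x)                          # ≈ α/λ = 1.5 and α/λ² = 0.75
mean(x <= 2)                             # simulated P(X ≤ 2)
pgamma(2, shape = alpha, rate = lambda)  # exact cdf value, for comparison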
• Chi-squared(n): parameter n a positive integer; notation X ∼ χ²_n; this distribution is very important in statistical theory and practice; it is a special case of the Gamma distribution, with parameters α = n/2, λ = 1/2.

M(t) = (1 − 2t)^{−n/2} for t < 1/2, E[X] = n, Var[X] = 2n
An important characterisation of the r.v. is that it is the sum of the squares of n inde-
pendent N(0, 1) r.v.s.
A useful transformation: In the case that 2α is a positive integer,
X ∼ G(α, λ) ⇒ 2λX ∼ χ²_{2α}
The cdf is given in NCST Table 7 p37–39; percentage points (quantiles) are given in
Table 8 p40–41.
e.g. P(χ²_3 < 10) = 0.9814, P(χ²_9 < 10) = 0.6495

P(χ²_3 < 7.815) = 0.95, P(χ²_9 < 21.67) = 0.99
Using R:
pchisq(10,3)
[1] 0.9814339
pchisq(10,9)
[1] 0.6495148
qchisq(0.95,3)
[1] 7.814728
qchisq(0.99,9)
[1] 21.66599
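The gamma/chi-squared transformation above can also be checked numerically (a sketch with assumed values α = 3, λ = 2, x = 1.7):

alpha <- 3; lambda <- 2; x <- 1.7
pgamma(x, shape = alpha, rate = lambda)  # P(G(3, 2) <= 1.7)
pchisq(2 * lambda * x, df = 2 * alpha)   # P(chi^2_6 <= 6.8), the same value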
• Beta(α, β): parameters α(> 0), β(> 0); useful as a model for proportions, especially in
Bayesian statistical methods — see Yellow Book p13.
Other two-parameter positive distributions, useful as models for claim amounts, include:
(i) the lognormal, a distribution such that X ∼ lognormal(µ, σ) ⇔ ln X ∼ N(µ, σ²) (see the R sketch after this list); and
(ii) Pareto(α, λ), see Yellow Book p14.
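e.g. in R, a lognormal probability can be computed either way (a sketch with assumed values µ = 7, σ = 0.5):

mu <- 7; sigma <- 0.5
plnorm(2000, mu, sigma)      # P(X <= 2000) for X ~ lognormal(7, 0.5)
pnorm(log(2000), mu, sigma)  # the same value, via ln X ~ N(7, 0.5^2)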
4.3 The weak law of large numbers
The Chebyshev inequality tells us that, for any constant k > 0,

P(|X − µ| < kσ) ≥ 1 − 1/k²

Let X ∼ b(n, p) and let us apply the above inequality to the observed proportion of successes Y = X/n. E[Y] = np/n = p and Var[Y] = npq/n² = pq/n. We get

P(|Y − p| < k√(pq/n)) ≥ 1 − 1/k² ⇒ P(|Y − p| < c) ≥ 1 − pq/(nc²)

It follows that, for any c > 0, lim_{n→∞} P(|Y − p| < c) = 1.

Hence, as n → ∞, the probability approaches 1 that the proportion of successes will differ from p (the population proportion) by less than any arbitrary constant; equivalently, the probability approaches 0 that the proportion of successes will differ from p by more than any arbitrary constant.

This result is called the weak law of large numbers. It is a formal version of the statement ‘relative frequency → probability’ as the number of trials increases. Note that the result applies to the proportion of successes — NOT the number of successes.

4.4 Further worked examples

4.8 Claims arise on a portfolio of insurance business as a Poisson process at an average rate of 3 per week.

(a) Let X be # claims in a week. Then X ∼ Poi(3).

P(no claims) = P(X = 0) = e^{−3} = 0.0498

P(at most 1 claim) = P(X = 0 or 1) = e^{−3} + 3e^{−3} = 4e^{−3} = 0.1991

P(exactly 2 claims) = P(X = 2) = 3² e^{−3}/2! = 0.2240

[or from NCST p25: P(X = 2) = 0.4232 − 0.1991, or use R: e.g. dpois(0,3), ppois(1,3)]

What is the modal (most likely) number of claims in a week?

P(X = 0) = 0.0498, P(X = 1) = 0.1494, P(X = 2) = 0.2240, P(X = 3) = 0.2240, P(X = 4) = 0.1680, . . .

There is no unique mode: modal values are ‘2 and 3 claims’.

(b) Suppose you know that a particular week is not ‘claim-free’, i.e. one or more claims arise. Consider the probability that at most 2 claims arise in that week:

P(X ≤ 2 | X ≠ 0) = P(X ≤ 2 and X ≠ 0)/P(X ≠ 0) = P(X = 1 or 2)/P(X ≠ 0) = (0.4232 − 0.0498)/(1 − 0.0498) = 0.393

(c) Let Y be # claims in a 4-week period. Then Y ∼ Poi(12).

P(more than 11 claims) = P(Y > 11) = 1 − P(Y ≤ 11) = 1 − 0.4616 = 0.5384
NCST p30 or use R: 1 - ppois(11,12)
4.9 A drug causes serious side effects in approximately 0.1% of users. Consider a group of
2000 users of the drug. Let X be # people in the group who suffer serious side effects.
The appropriate model for distribution of X is X ∼ bi(2000, 0.001).
P(X = 0 or 1) = 0.999^{2000} + C(2000, 1) × 0.001 × 0.999^{1999} = 0.4059

P(X ≤ 4) = ∑_{x=0}^{4} C(2000, x) × 0.001^x × 0.999^{2000−x}
We can approximate bi(2000, 0.001) by Poisson with mean µ = 2000× 0.001 = 2
i.e. by X ∼ Poi(2).
From NCST p25: P (X = 0 or 1) = 0.406, P (X ≤ 4) = 0.947 (approximately).
4.10 Suppose that claim sizes for a particular portfolio of business have an exponential distri-
bution. The average claim size, as measured by the median, is £624.
(a) What is the average claim size as measured by the mean?
(b) What percentage of claim sizes are greater than £1000?
Solution:
Claim size X ∼ exp(mean µ).
Then f(x) = (1/µ) e^{−x/µ}, x > 0; F(x) = 1 − e^{−x/µ}
Median M is given by F(M) = 0.5, so 1 − e^{−M/µ} = 0.5 ⇒ M = −µ ln 0.5 = µ ln 2

(a) µ ln 2 = 624 ⇒ µ = 624/ln 2 = £900.24

(b) P(X > 1000) = e^{−1000/µ} = e^{−1000 ln 2/624} = e^{−1.1108} = 0.3293
so 32.9% of claim sizes are greater than £1000.
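Using R (a sketch; parameterising by the rate ln 2/624 implied by the median):

rate <- log(2)/624     # median = ln 2 / rate = 624
1/rate                 # (a) mean ≈ 900.24
qexp(0.5, rate)        # median = 624, as given
1 - pexp(1000, rate)   # (b) ≈ 0.3293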
4.11 Suppose that the sizes of claims which arise under policies of a certain type can be
modelled by a normal distribution with mean µ = £6000 and standard deviation σ =
£900. The size of a particular claim is known to be greater than £5100. Find the
probabilities that this claim size is (a) greater than £6000; and (b) between £5100 and
£6300.
Solution:
Let X be claim size (in units of £1000).
X ∼ N(6, 0.9²), Z = (X − 6)/0.9 ∼ N(0, 1)
(a) P(X > 6 | X > 5.1) = P(X > 6)/P(X > 5.1) = P(Z > 0)/P(Z > −1) = 0.5/0.8413 = 0.5943
(b) P(5.1 < X < 6.3 | X > 5.1) = P(5.1 < X < 6.3)/P(X > 5.1) = P(−1 < Z < 0.3333)/P(Z > −1) = (0.6306 − 0.1587)/0.8413 = 0.5609
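Using R (a sketch, working in £ rather than £1000s):

p51 <- 1 - pnorm(5100, 6000, 900)                      # P(X > 5100) = 0.8413
(1 - pnorm(6000, 6000, 900))/p51                       # (a) ≈ 0.5943
(pnorm(6300, 6000, 900) - pnorm(5100, 6000, 900))/p51  # (b) ≈ 0.5609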