F71SM STATISTICAL METHODS
4 SPECIAL DISTRIBUTIONS
We study a series of distributions that have wide applicability. We give examples of situations in which the distributions are relevant and summarise some of their properties. More information is given in the Yellow Book (‘Formulae and Tables for Examinations’ of the actuarial profession), p6–15.
Discrete distributions
• Uniform on {1, 2, …, k}: parameter k a positive integer; X is the outcome in the situation in which all outcomes 1, 2, …, k are equally likely, each with probability 1/k.
$$f(x) = \frac{1}{k}, \quad x = 1, 2, \ldots, k$$
$$\mu = E[X] = \frac{1}{k}\sum_{x=1}^{k} x = \frac{1}{k}\cdot\frac{k(k+1)}{2} = \frac{k+1}{2}$$
$$E[X^2] = \frac{1}{k}\sum_{x=1}^{k} x^2 = \frac{1}{k}\cdot\frac{k(k+1)(2k+1)}{6} = \frac{(k+1)(2k+1)}{6}$$
$$\sigma^2 = E[X^2] - \mu^2 = \frac{k^2-1}{12}$$
$$G(t) = E[t^X] = \frac{1}{k}\sum_{x=1}^{k} t^x = \frac{t}{k}\left(1 + t + \cdots + t^{k-1}\right) = \frac{t(t^k-1)}{k(t-1)} \quad \text{for } t \ne 1$$
$$M(t) = G(e^t) = \frac{e^t(e^{kt}-1)}{k(e^t-1)} \quad \text{for } t \ne 0$$
Worked example 4.1 Let X be the number showing face up when a fair six-sided die is thrown once.
X ∼ uniform on {1,2,3,4,5,6}, f(x) = 1/6, x = 1,2,…,6; μ = 7/2, σ² = 35/12
$$G(t) = E[t^X] = \frac{1}{6}\,t\left(1 + t + t^2 + \cdots + t^5\right) = \frac{t(t^6-1)}{6(t-1)}, \quad t \ne 1$$
P(X ≤ 3) = 3/6 = 1/2, P(X ≥ 5) = 2/6 = 1/3
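The figures in Worked example 4.1 can be checked numerically from the pmf. The following R lines are only an illustrative sketch; the variable names (k, x, f, mu, sigma2, p_le3, p_ge5) are chosen here and do not come from the notes.

```r
# Discrete uniform on {1, ..., k}: mean, variance and two probabilities from the pmf
k <- 6
x <- 1:k
f <- rep(1 / k, k)              # pmf f(x) = 1/k

mu     <- sum(x * f)            # E[X]; the formula gives (k + 1)/2
sigma2 <- sum(x^2 * f) - mu^2   # Var[X]; the formula gives (k^2 - 1)/12

p_le3 <- sum(f[x <= 3])         # P(X <= 3)
p_ge5 <- sum(f[x >= 5])         # P(X >= 5)

print(c(mu = mu, sigma2 = sigma2, p_le3 = p_le3, p_ge5 = p_ge5))
```

Changing k gives the same check for any other discrete uniform distribution.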
• Binomial(n, p): parameters n a positive integer and p, 0 < p < 1; X is the number of successes in a sequence of n Bernoulli trials (i.e. n independent, identical trials) with P(success) = p; notation X ∼ binomial(n, p) or X ∼ bi(n, p) or X ∼ b(n, p).
Let P(failure) = q = 1 − p. The event X = x occurs when x trials result in successes and n − x trials result in failures; each such sequence has probability $p^x q^{n-x}$ and there are $\binom{n}{x}$ such sequences, so
$$f(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$
$$G(t) = E[t^X] = \sum_{x=0}^{n} \binom{n}{x} (pt)^x q^{n-x} = (pt+q)^n$$
$$\Rightarrow G'(t) = np(pt+q)^{n-1}, \quad G''(t) = n(n-1)p^2(pt+q)^{n-2}$$
$$\Rightarrow \mu = G'(1) = np, \quad \sigma^2 = G''(1) + G'(1) - (G'(1))^2 = n(n-1)p^2 + np - (np)^2 = npq$$
(Note that σ² = μq < μ)
Alternatively, find E[X] and E[X²] directly from f(x).
$$M(t) = G(e^t) = (pe^t + q)^n$$
The bi(n, p) distribution is positively skewed for p < 0.5, negatively skewed for p > 0.5, and symmetrical for p = 0.5. The skewness increases as p → 0 or 1.
A bi(1, p) r.v. is called a Bernoulli r.v.; it takes value 1 or 0 and indicates success or failure in a single trial (it is a binary or indicator variable); in this case μ = p, σ² = pq. X ∼ bi(n, p) is the sum of n independent, identically distributed r.v.s, each Bernoulli(p).
Worked example 4.2 Let X be the number of sixes which show face up when a fair six-sided die is thrown eight times.
X ∼ bi(8, 1/6); $f(x) = \binom{8}{x}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{8-x}$, x = 0,1,…,8; μ = 4/3, σ² = 10/9; $G(t) = \left(\frac{t+5}{6}\right)^8$;
$P(X = 2) = \binom{8}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^6 = 0.2605$, P(X ≤ 2) = 0.8652 (NCST p7 gives values for p = 0.16 and 0.17)
Using R:
dbinom(2,8,1/6)
[1] 0.2604762
pbinom(2,8,1/6)
[1] 0.8651531
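The binomial mean, variance and pgf can also be confirmed directly from the pmf, as the text suggests. The sketch below does this for the bi(8, 1/6) example; the evaluation point t0 and all variable names are arbitrary choices made here for illustration.

```r
# bi(8, 1/6): moments from the pmf, and the pgf computed two ways
n <- 8
p <- 1 / 6
q <- 1 - p
x <- 0:n
f <- dbinom(x, size = n, prob = p)   # pmf values f(0), ..., f(n)

mu     <- sum(x * f)                 # compare with n * p
sigma2 <- sum(x^2 * f) - mu^2        # compare with n * p * q

t0       <- 0.7                      # arbitrary point at which to evaluate G(t)
G_sum    <- sum(t0^x * f)            # E[t^X] from the definition
G_closed <- (p * t0 + q)^n           # closed form (pt + q)^n

print(c(mu = mu, sigma2 = sigma2, G_sum = G_sum, G_closed = G_closed))
```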
• Poisson(λ): parameter λ > 0; X is the number of events which occur in a unit of time in the situation in which events occur ‘at random’ one after another through time with rate λ (the situation is more formally described as being a ‘Poisson process’ with intensity λ); notation X ∼ Poisson(λ) or X ∼ Poi(λ) or X ∼ P (λ)
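$$f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
(note: the number of events is not bounded above)
$$G(t) = E[t^X] = \sum_{x=0}^{\infty} t^x\,\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda t)^x}{x!} = e^{-\lambda}e^{\lambda t} = e^{\lambda(t-1)}$$
$$\Rightarrow G'(t) = \lambda e^{\lambda(t-1)}, \quad G''(t) = \lambda^2 e^{\lambda(t-1)}$$
$$\Rightarrow \mu = G'(1) = \lambda, \quad \sigma^2 = G''(1) + G'(1) - (G'(1))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$
(Note that σ² = μ)
Alternatively, find E[X] and E[X²] directly from f(x).
$$M(t) = G(e^t) = \exp\{\lambda(e^t - 1)\}$$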
The Poi(λ) distribution is positively skewed, less strongly so as λ increases.
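The Poisson distribution provides a good approximation to the binomial(n, p) distribution in the case ‘large n, small p’: formally we let λ = np and we can show that
$$\binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x} \to \frac{e^{-\lambda}\lambda^x}{x!} \quad \text{as } n \to \infty.$$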
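The convergence of the binomial pmf to the Poisson pmf can be seen numerically by fixing λ and increasing n. The following is a rough illustration only: the helper max_abs_diff, the choice λ = 2 and the range x = 0, …, 20 are assumptions made here, not part of the notes.

```r
# Largest pointwise gap between the bi(n, lambda/n) and Poi(lambda) pmfs on x = 0..20
max_abs_diff <- function(n, lambda = 2, x = 0:20) {
  max(abs(dbinom(x, size = n, prob = lambda / n) - dpois(x, lambda)))
}

# The gap should shrink as n grows with lambda = n * p held at 2
for (n in c(25, 100, 1000, 10000)) {
  cat("n =", n, " max |binomial - Poisson| =", signif(max_abs_diff(n), 3), "\n")
}
```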
Worked example 4.3 Industrial accidents in the plants of a large multinational company occur at random through time, one after the other and at an average rate of 3 per week. Let X be the number of such accidents in a 4-week period.
E[X] = 3 × 4 = 12.
We model the number of accidents using X ∼ Poi(12), and, e.g. P(X ≤ 10) = 0.3472, P(X ≥ 10) = 0.7576 (NCST p30).
Using R:
ppois(10,12)
[1] 0.3472294
1 - ppois(9,12)
[1] 0.7576078
Worked example 4.4 Suppose X ∼ bi(1000,0.002). We approximate the binomial distribution with a Poisson distribution with mean λ = 1000 × 0.002 = 2. So we use X ∼ Poi(2).
P(X ≤ 3) = 0.857, P(X = 4 or 5) = 0.126 approx (NCST p25)
Using R:
pbinom(3,1000,0.002)
[1] 0.8573042
ppois(3,2)
[1] 0.8571235
pbinom(5,1000,0.002) - pbinom(3,1000,0.002)
[1] 0.1262404
ppois(5,2) - ppois(3,2)
[1] 0.1263129
• Geometric(p): parameter p, 0 < p < 1; X is the number of failures before the first success occurs in a sequence of Bernoulli trials with P(success) = p; it is a discrete ‘waiting time distribution’ in the sense ‘how long do we have to wait to get a success?’; notation X ∼ geo(p); q = 1 − p as before.
The event X = x occurs when the first x trials result in failures and the next trial results in success; the probability of this sequence of outcomes occurring gives the pmf.
$$f(x) = p(1-p)^x, \quad x = 0, 1, 2, \ldots$$
$$G(t) = E[t^X] = \sum_{x=0}^{\infty} p(qt)^x = \frac{p}{1-qt} \quad \text{for } |qt| < 1, \text{ i.e. for } |t| < 1/q$$
$$\Rightarrow G'(t) = pq(1-qt)^{-2}, \quad G''(t) = 2pq^2(1-qt)^{-3}$$
$$\Rightarrow \mu = G'(1) = \frac{q}{p}, \quad \sigma^2 = G''(1) + G'(1) - (G'(1))^2 = \frac{2q^2}{p^2} + \frac{q}{p} - \frac{q^2}{p^2} = \frac{q}{p^2}$$
(Note that σ² = μ/p > μ)
Alternatively, find E[X] and E[X²] directly from f(x).
$$M(t) = G(e^t) = \frac{p}{1-qe^t}$$
The geo(p) distribution is positively skewed, increasingly so as p → 1. See Yellow Book p9, with k = 1.
[Note: There is an alternative version in which X is the number of trials required to get the first success, i.e. previous variable + 1.
$f(x) = p(1-p)^{x-1}$, x = 1, 2, …; μ = 1/p, σ² = q/p²; $G(t) = \frac{pt}{1-qt}$, $M(t) = \frac{pe^t}{1-qe^t}$.
See Yellow Book p8, with k = 1.]
Worked example 4.5 A fair six-sided die is thrown repeatedly until it lands showing a 6 face up. The number of throws before we get a 6 is X ∼ geo(1/6). E[X] = (5/6)/(1/6) = 5, Var[X] = (5/6)/(1/6)² = 30, SD[X] ≈ 5.48.
What is the probability that we have to wait until the 5th throw or longer to get a 6? We want P(first 4 throws result in ‘not 6’) = (5/6)⁴ ≈ 0.4823.
Using the distribution of X explicitly, we want
$$P(X \ge 4) = 1 - P(X \le 3) = 1 - \left[\frac{1}{6} + \frac{1}{6}\times\frac{5}{6} + \frac{1}{6}\times\left(\frac{5}{6}\right)^2 + \frac{1}{6}\times\left(\frac{5}{6}\right)^3\right] \approx 0.4823$$
Using R:
1 - pgeom(3,1/6)
[1] 0.4822531
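The geometric mean and variance can be checked by summing over the pmf, which is what "find E[X] and E[X²] directly from f(x)" amounts to numerically. In this sketch the infinite support is truncated at 2000, a cut-off chosen here because the omitted tail is negligible for p = 1/6; the variable names are also illustrative. Note that R's dgeom and pgeom use the "number of failures" version of the distribution, matching the main definition in these notes.

```r
# geo(1/6): moments from a truncated sum over the pmf, and the tail probability
p <- 1 / 6
q <- 1 - p
x <- 0:2000                      # truncated support (assumed cut-off; tail is negligible)
f <- dgeom(x, prob = p)          # f(x) = p * q^x, failures before the first success

mu     <- sum(x * f)             # compare with q / p
sigma2 <- sum(x^2 * f) - mu^2    # compare with q / p^2
tail4  <- q^4                    # P(X >= 4): first four throws are all 'not 6'

print(c(mu = mu, sigma2 = sigma2, tail4 = tail4))
```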
• Negative binomial(k, p): parameters k and p, k > 0, 0 < p < 1; notation X ∼ nb(k, p). In the case that k is a positive integer, X is the number of failures before the occurrence of the kth success (k = 1,2,3,...).
The event X = x occurs when the first x+k−1 trials consist of x failures and k−1 successes and the next trial results in success. X ∼ nb(k, p) is the sum of k independent, identically distributed r.v.s, each geo(p).
The mean and variance of X are k times those of geo(p); the pgf and mgf are the kth power of those of geo(p) — see Yellow Book p9 for pmf and other information.
The negative binomial distribution is defined for all k > 0 (not just for k an integer) and is sometimes used as an alternative model to the Poisson distribution (e.g. for claim numbers) when a model with stronger positive skewness is desirable.
The nb(k, p) distribution is positively skewed, increasingly so as p → 1 and as k → 0.
[Note: Again there is an alternative version in which X is the number of trials required to get the kth success, i.e. previous r.v. + k. See Yellow Book p8.]
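The statement that nb(k, p) is the distribution of a sum of k independent geo(p) variables can be illustrated by convolving the geometric pmf with itself. The sketch below is one possible check, with k = 3 and p = 0.4 picked arbitrarily; convolve_pmf is a hypothetical helper written here, and the support is truncated at 200. R's dnbinom(x, size, prob) counts failures before the size-th success, which is the version defined above.

```r
# Convolution of two pmfs given on 0, 1, ..., m (entry i holds the probability of i - 1)
convolve_pmf <- function(a, b) {
  sapply(seq_along(a), function(i) sum(a[1:i] * b[i:1]))
}

k <- 3
p <- 0.4
q <- 1 - p
x <- 0:200                                   # truncated support (assumed cut-off)
geo <- dgeom(x, prob = p)

nb <- geo
for (i in seq_len(k - 1)) nb <- convolve_pmf(nb, geo)   # pmf of a sum of k geo(p) r.v.s

max_gap <- max(abs(nb - dnbinom(x, size = k, prob = p)))
mean_nb <- sum(x * nb)                       # compare with k * q / p
var_nb  <- sum(x^2 * nb) - mean_nb^2         # compare with k * q / p^2

print(c(max_gap = max_gap, mean_nb = mean_nb, var_nb = var_nb))
```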
Continuous distributions
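• Uniform on (a, b): parameters a, b with a < b; X is the position of a point chosen ‘at random’ in the interval (a, b); all outcomes in the interval are ‘equally likely’ (i.e. events defined by subintervals of the same length have the same probability); the pdf is constant (graph is flat); notation X ∼ U(a, b).
$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$
$$F(x) = P(X \le x) = \frac{x-a}{b-a}, \quad a \le x \le b$$
• Exponential(λ): parameter λ > 0; notation X ∼ exp(λ).
$$f(x) = \lambda e^{-\lambda x}, \quad x > 0$$
$$F(x) = \int_0^x \lambda e^{-\lambda u}\,du = 1 - e^{-\lambda x}, \quad x \ge 0 \quad (\text{and } = 0 \text{ for } x < 0)$$
$$M(t) = \int_0^\infty e^{tx}\lambda e^{-\lambda x}\,dx = \lambda\int_0^\infty e^{-(\lambda-t)x}\,dx = \frac{\lambda}{\lambda-t} = \left(1-\frac{t}{\lambda}\right)^{-1} \quad \text{for } t < \lambda$$
$$M'(t) = \frac{1}{\lambda}\left(1-\frac{t}{\lambda}\right)^{-2}, \quad M''(t) = \frac{2}{\lambda^2}\left(1-\frac{t}{\lambda}\right)^{-3}$$
$$\Rightarrow \mu = M'(0) = \frac{1}{\lambda}, \quad E[X^2] = M''(0) = \frac{2}{\lambda^2} \Rightarrow \sigma^2 = \frac{1}{\lambda^2}$$
or find E[X] and E[X²] from the power series expansion $M(t) = 1 + \frac{t}{\lambda} + \frac{t^2}{\lambda^2} + \cdots$, or find them directly from f(x) by integration.
The exp(λ) distribution is positively skewed; the coefficient of skewness does not depend on the value of λ; its value is γ₁ = 2.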
Worked example 4.6 Suppose claims on a portfolio of insurance business arise at random one after another through time at an average rate of 5 per period of 24 hrs. Let X be the time we have to wait from any specified time to the next claim arising.
X ∼ exponential with λ = 5/24 per hr and mean (expected waiting time) 24/5 hrs = 4.8 hrs.
$$f(x) = \frac{5}{24}e^{-5x/24} \quad \text{for } x > 0$$
$$F(x) = 1 - e^{-5x/24} \quad \text{for } x > 0$$
What is the probability that the time between two successive claims arising exceeds 6 hrs?
Answer = 1 − F(6) = e^(−30/24) = e^(−1.25) = 0.2865
Using R:
1 - pexp(6,5/24)
[1] 0.2865048
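The mean and variance of the waiting time in Worked example 4.6 can be obtained "directly from f(x) by integration" using numerical quadrature. The sketch below uses R's integrate for this; the variable names are illustrative, and the numerical integrals are approximations rather than exact values.

```r
# Exponential waiting time with rate 5/24 per hour
rate <- 5 / 24

p_gt6 <- exp(-rate * 6)                                           # 1 - F(6) in closed form

m1 <- integrate(function(x) x   * dexp(x, rate), 0, Inf)$value    # E[X], approx 1/rate
m2 <- integrate(function(x) x^2 * dexp(x, rate), 0, Inf)$value    # E[X^2], approx 2/rate^2
v  <- m2 - m1^2                                                   # Var[X], approx 1/rate^2

print(c(p_gt6 = p_gt6, mean = m1, variance = v))
```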
• Normal(μ, σ²): parameters μ, σ with σ > 0; also called the Gaussian distribution; fundamentally important in statistical theory and practice; good empirical model for some kinds of physical data; provides good approximation to some other distributions; models distributions of certain sample statistics, in particular the sample mean and sample proportion; basis of much statistical methodology; notation X ∼ N(μ, σ²); we confirm below that the parameters do indeed represent the mean and standard deviation of the distribution (as suggested by the choice of symbols).
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}, \quad -\infty < x < \infty$$
P(1000
Now X = 5Z + 1005 so required weight = 5 × 1.6449 + 1005 = 1013.22g
qnorm(0.95,1005,5)
[1] 1013.224
• Gamma(α, λ): parameters α(> 0), λ(> 0); notation X ∼ gamma(α, λ) or G(α, λ). In the case that α is a positive integer, X is the sum of α independent, identically distributed r.v.s, each exp(λ) and so models the sum of α inter-event times in a Poisson process. Note: G(1, λ) = exp(λ).
The Gamma distribution is defined for all α > 0 (not just for α an integer) and is positively skewed; it is sometimes used as a model for claim amounts.
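$$f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\lambda x} \quad \text{for } x > 0$$
$$M(t) = \left(1-\frac{t}{\lambda}\right)^{-\alpha}, \quad E[X] = \frac{\alpha}{\lambda}, \quad \mathrm{Var}[X] = \frac{\alpha}{\lambda^2}$$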
The mean and variance of X are α times those of exp(λ); the mgf is the αth power of that of exp(λ) — see Yellow Book p12 for other information.
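As a numerical cross-check of the gamma pdf and moments, the sketch below codes the density from its formula, compares it with R's dgamma (using the shape/rate parametrisation, where rate plays the role of λ), and integrates to approximate the mean and variance. The values α = 3, λ = 0.5 and the test points are arbitrary illustrative choices, and gamma_pdf is a helper defined here.

```r
# Gamma(alpha, lambda) density written out from the formula
alpha  <- 3
lambda <- 0.5
gamma_pdf <- function(x) lambda^alpha / gamma(alpha) * x^(alpha - 1) * exp(-lambda * x)

xs <- c(0.5, 2, 7)                                     # arbitrary test points
pdf_gap <- max(abs(gamma_pdf(xs) - dgamma(xs, shape = alpha, rate = lambda)))

m1 <- integrate(function(x) x   * gamma_pdf(x), 0, Inf)$value   # approx alpha / lambda
m2 <- integrate(function(x) x^2 * gamma_pdf(x), 0, Inf)$value
v  <- m2 - m1^2                                                 # approx alpha / lambda^2

# G(1, lambda) coincides with exp(lambda)
exp_gap <- max(abs(dgamma(xs, shape = 1, rate = lambda) - dexp(xs, rate = lambda)))

print(c(pdf_gap = pdf_gap, mean = m1, variance = v, exp_gap = exp_gap))
```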
• Chi-squared(n): parameter n a positive integer; notation $X \sim \chi^2_n$; this distribution is very important in statistical theory and practice; it is a special case of the Gamma distribution, with parameters α = n/2, λ = 1/2.
$$M(t) = (1 - 2t)^{-n/2}, \quad E[X] = n, \quad \mathrm{Var}[X] = 2n$$
An important characterisation of the r.v. is that it is the sum of the squares of n independent N(0,1) r.v.s.
A useful transformation: In the case that 2α is a positive integer, $X \sim G(\alpha, \lambda) \Rightarrow 2\lambda X \sim \chi^2_{2\alpha}$
The cdf is given in NCST Table 7 p37–39; percentage points (quantiles) are given in Table 8 p40–41.
$P(\chi^2_3 < 10) = 0.9814$, $P(\chi^2_9 < 10) = 0.6495$
$P(\chi^2_3 < 7.815) = 0.95$, $P(\chi^2_9 < 21.67) = 0.99$
Using R:
pchisq(10,3)
[1] 0.9814339
pchisq(10,9)
[1] 0.6495148
qchisq(0.95,3)
[1] 7.814728
qchisq(0.99,9)
[1] 21.66599
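The transformation between the gamma and chi-squared distributions can be checked by comparing cdfs. In the sketch below α = 2.5 and λ = 3 are arbitrary values chosen here so that 2α is a positive integer; the test points are also arbitrary.

```r
# If X ~ G(alpha, lambda) then 2 * lambda * X ~ chi-squared with 2 * alpha degrees of freedom
alpha  <- 2.5
lambda <- 3
x <- c(0.2, 0.8, 1.5)                                   # arbitrary test points

lhs <- pgamma(x, shape = alpha, rate = lambda)          # P(X <= x)
rhs <- pchisq(2 * lambda * x, df = 2 * alpha)           # P(chi-squared_{2 alpha} <= 2 lambda x)

# Special case: chi-squared(n) is gamma with alpha = n/2, lambda = 1/2
chi3    <- pchisq(10, df = 3)
gamma3  <- pgamma(10, shape = 3 / 2, rate = 1 / 2)

print(rbind(lhs, rhs))
print(c(chi3 = chi3, gamma3 = gamma3))
```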
• Beta(α,β): parameters α(> 0), β(> 0); useful as a model for proportions, especially in Bayesian statistical methods — see Yellow Book p13.
Other two-parameter positive distributions, useful as models for claim amounts, include: (i) the lognormal, a distribution such that X ∼ lognormal(μ, σ) ⇐⇒ ln X ∼ N(μ, σ²); and (ii) Pareto(α, λ), see Yellow Book p14.
4.3 The weak law of large numbers
The Chebyshev inequality tells us that, for any constant k > 0,
$$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$$
Let X ∼ b(n, p) and let us apply the above inequality to the observed proportion of successes Y = X/n.
E[Y] = np/n = p and Var[Y] = npq/n² = pq/n. We get
$$P\left(|Y - p| < k\sqrt{\frac{pq}{n}}\right) \ge 1 - \frac{1}{k^2}$$
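To see what the Chebyshev bound says about the observed proportion Y = X/n, one can write the tolerance as ε = k√(pq/n), so that the bound becomes P(|Y − p| < ε) ≥ 1 − pq/(nε²), which tends to 1 as n grows. The symbol ε, the values p = 0.3 and ε = 0.05, and the helper chebyshev_check are all introduced here for illustration only. The sketch compares the bound with the exact binomial probability.

```r
# Exact P(|X/n - p| < eps) for X ~ bi(n, p) versus the Chebyshev lower bound
chebyshev_check <- function(n, p = 0.3, eps = 0.05) {
  q <- 1 - p
  x <- 0:n
  inside <- abs(x / n - p) < eps            # outcomes with proportion within eps of p
  c(n     = n,
    exact = sum(dbinom(x[inside], size = n, prob = p)),
    bound = 1 - p * q / (n * eps^2))
}

res <- t(sapply(c(100, 1000, 10000), chebyshev_check))
print(res)
```

The bound is typically far from tight, but it is enough to show that the probability approaches 1 for any fixed ε.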
$$P(X \le 2 \mid X \ne 0) = \frac{P(X \le 2 \text{ and } X \ne 0)}{P(X \ne 0)} = \frac{P(X = 1 \text{ or } 2)}{P(X \ne 0)} = \frac{0.4232 - 0.0498}{1 - 0.0498} = 0.393$$
4.9 A drug causes serious side effects in approximately 0.1% of users. Consider a group of 2000 users of the drug. Let X be the number of people in the group who suffer serious side effects. The appropriate model for the distribution of X is X ∼ bi(2000, 0.001).
$$P(X = 0 \text{ or } 1) = 0.999^{2000} + \binom{2000}{1} \times 0.001 \times 0.999^{1999}$$
$$P(X \le 4) = \sum_{x=0}^{4}\binom{2000}{x} \times 0.001^{x} \times 0.999^{2000-x}$$
We can approximate bi(2000, 0.001) by a Poisson distribution with mean μ = 2000 × 0.001 = 2, i.e. by X ∼ Poi(2).
From NCST p25: P(X = 0 or 1) = 0.406, P(X ≤ 4) = 0.947 (approximately).
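The quality of the Poisson approximation in exercise 4.9 can be checked by computing the exact binomial probabilities alongside the Poisson ones. The following is a brief sketch with variable names chosen here.

```r
# bi(2000, 0.001) versus its Poi(2) approximation
n <- 2000
p <- 0.001

exact_01   <- pbinom(1, size = n, prob = p)   # exact P(X = 0 or 1)
approx_01  <- ppois(1, lambda = n * p)        # Poisson approximation
exact_le4  <- pbinom(4, size = n, prob = p)   # exact P(X <= 4)
approx_le4 <- ppois(4, lambda = n * p)        # Poisson approximation

print(round(c(exact_01 = exact_01, approx_01 = approx_01,
              exact_le4 = exact_le4, approx_le4 = approx_le4), 4))
```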
4.10 Suppose that claim sizes for a particular portfolio of business have an exponential distribution. The average claim size, as measured by the median, is £624.
(a) What is the average claim size as measured by the mean?
(b) What percentage of claim sizes are greater than £1000?
Claim size X ∼ exp(mean μ).
Then $f(x) = \frac{1}{\mu}e^{-x/\mu}$, x > 0; $F(x) = 1 - e^{-x/\mu}$
Median M is given by F(M) = 0.5, so $1 - e^{-M/\mu} = 0.5 \Rightarrow M = -\mu\ln 0.5 = \mu\ln 2$
(a) μ ln 2 = 624 ⇒ μ = 624/ln 2 = £900.24
(b) $P(X > 1000) = e^{-1000/\mu} = e^{-1000\ln 2/624} = e^{-1.1108} = 0.3293$
so 32.9% of claim sizes are greater than £1000.
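The answers to exercise 4.10 can be reproduced with R's exponential functions, which are parametrised by the rate 1/μ. This is a short sketch; the variable names are chosen here.

```r
# Exponential claim sizes with median 624
med  <- 624
mu   <- med / log(2)              # mean, from median = mu * ln 2
rate <- 1 / mu                    # R parametrises the exponential by its rate

med_check <- qexp(0.5, rate)      # median recovered from the fitted rate
p_gt1000  <- 1 - pexp(1000, rate) # P(X > 1000)

print(c(mean = mu, median = med_check, p_gt1000 = p_gt1000))
```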
4.11 Suppose that the sizes of claims which arise under policies of a certain type can be modelled by a normal distribution with mean μ = £6000 and standard deviation σ = £900. The size of a particular claim is known to be greater than £5100. Find the probabilities that this claim size is (a) greater than £6000; and (b) between £5100 and £6300.
Let X be claim size (in units of £1000).
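$X \sim N(6, 0.9^2)$, $Z = \frac{X-6}{0.9} \sim N(0,1)$
(a) $$P(X > 6 \mid X > 5.1) = \frac{P(X > 6)}{P(X > 5.1)} = \frac{P(Z > 0)}{P(Z > -1)} = \frac{0.5}{0.8413} = 0.5943$$
(b) $$P(5.1 < X < 6.3 \mid X > 5.1) = \frac{P(5.1 < X < 6.3)}{P(X > 5.1)} = \frac{P(-1 < Z < 0.3333)}{P(Z > -1)} = \frac{0.6306 - 0.1587}{0.8413} = 0.5609$$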
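The two conditional probabilities in exercise 4.11 can be verified with pnorm, working in units of £1000 as in the solution. The sketch below is illustrative, with variable names chosen here.

```r
# Claim size X ~ N(6, 0.9^2) in units of 1000 pounds, conditioned on X > 5.1
mu    <- 6
sigma <- 0.9

p_gt51 <- 1 - pnorm(5.1, mu, sigma)                               # P(X > 5.1)
part_a <- (1 - pnorm(6, mu, sigma)) / p_gt51                      # P(X > 6 | X > 5.1)
part_b <- (pnorm(6.3, mu, sigma) - pnorm(5.1, mu, sigma)) / p_gt51  # P(5.1 < X < 6.3 | X > 5.1)

print(round(c(p_gt51 = p_gt51, part_a = part_a, part_b = part_b), 4))
```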