EECS 70 Discrete Mathematics and Probability Theory Fall 2020
Geometric and Poisson Distributions
Note 19
Recall our basic probabilistic experiment of tossing a biased coin n times. This is a very simple model, yet surprisingly powerful. Many important probability distributions that are widely used to model real-world phenomena can be derived from looking at this basic coin tossing model.
The first example is the Binomial(n, p) distribution, introduced earlier. This is the distribution of the number of Heads, $S_n$, in $n$ tosses of a biased coin with Heads probability $p$. Recall that the distribution of $S_n$ is
$$P[S_n = k] = \binom{n}{k} p^k (1-p)^{n-k}, \quad \text{for } k \in \{0, 1, \ldots, n\}.$$
The expected value is $E[S_n] = np$ and the variance is $\mathrm{Var}(S_n) = np(1-p)$. The binomial distribution frequently arises as a model for the number of successes in a repeated experiment.
1 Geometric Distribution
Consider repeatedly tossing a biased coin with Heads probability p. Let X denote the number of tosses until the first Head appears. Then X is a random variable that takes values in Z+, the set of positive integers. The event that X = i is equal to the event of observing Tails for the first i − 1 tosses and getting Heads in the i-th toss, which occurs with probability (1 − p)i−1 p. Such a random variable is called a geometric random variable.
The geometric distribution frequently occurs in applications because we are often interested in how long we have to wait before a certain event happens: how many runs before the system fails, how many shots before one is on target, how many poll samples before we find a Democrat, how many retransmissions of a packet before successfully reaching the destination, etc.
Definition 19.1 (Geometric Distribution). A random variable X for which
$$P[X = i] = (1-p)^{i-1} p, \quad \text{for } i = 1, 2, 3, \ldots,$$
is said to have the geometric distribution with parameter p. This is abbreviated as X ∼ Geometric(p).
As a sanity check, we can verify that the total probability of X is equal to 1:
$$\sum_{i=1}^{\infty} P[X = i] = \sum_{i=1}^{\infty} (1-p)^{i-1} p = p \sum_{i=1}^{\infty} (1-p)^{i-1} = p \times \frac{1}{1 - (1-p)} = 1,$$
where in the second-to-last step we have used the formula for geometric series.
If we plot the distribution of X (i.e., the values P[X = i] against i) we get a curve that decreases monotonically
by a factor of 1 − p at each step, as illustrated in Figure 1.
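To see the definition in action, here is a minimal Python sketch (not part of the original note) that samples X by simulating biased coin tosses and compares the empirical frequencies against the formula $(1-p)^{i-1}p$; the helper name and parameter values are purely illustrative.

```python
import random
from collections import Counter

def sample_geometric(p):
    """Toss a coin with Heads probability p until the first Head; return the number of tosses."""
    tosses = 1
    while random.random() >= p:   # each failed comparison corresponds to a Tail
        tosses += 1
    return tosses

p, trials = 0.3, 100_000
counts = Counter(sample_geometric(p) for _ in range(trials))
for i in range(1, 9):
    empirical = counts[i] / trials
    exact = (1 - p) ** (i - 1) * p
    print(f"P[X = {i}]: empirical {empirical:.4f}, exact {exact:.4f}")
```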
1.1 Mean and Variance of a Geometric Random Variable
Let us now compute the expectation E[X]. Applying the definition of expected value directly gives us:
$$E[X] = \sum_{i=1}^{\infty} i \times P[X = i] = p \sum_{i=1}^{\infty} i(1-p)^{i-1}.$$
Figure 1: Illustration of the Geometric(p) distribution for p = 0.1 and p = 0.3.
However, the final summation is difficult to evaluate and requires a calculus trick. Instead, we will use
the following alternative formula for expectation.
Theorem 19.1 (Tail Sum Formula). Let X be a random variable that takes values in {0,1,2,…}. Then
$$E[X] = \sum_{i=1}^{\infty} P[X \geq i].$$
Proof. For notational convenience, let's write $p_i = P[X = i]$, for $i = 0, 1, 2, \ldots$. From the definition of expectation, we have
$$\begin{aligned}
E[X] &= (0 \times p_0) + (1 \times p_1) + (2 \times p_2) + (3 \times p_3) + (4 \times p_4) + \cdots \\
&= p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + (p_4 + p_4 + p_4 + p_4) + \cdots \\
&= (p_1 + p_2 + p_3 + p_4 + \cdots) + (p_2 + p_3 + p_4 + \cdots) + (p_3 + p_4 + \cdots) + (p_4 + \cdots) + \cdots \\
&= P[X \geq 1] + P[X \geq 2] + P[X \geq 3] + P[X \geq 4] + \cdots.
\end{aligned}$$
In the third line, we have regrouped the terms into convenient infinite sums, and each infinite sum is exactly the probability that X ≥ i for each i. You should check that you understand how the fourth line follows from the third.
Let us repeat the proof more formally, this time using more compact mathematical notation:
$$E[X] = \sum_{j=1}^{\infty} j \times P[X = j] = \sum_{j=1}^{\infty} \sum_{i=1}^{j} P[X = j] = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} P[X = j] = \sum_{i=1}^{\infty} P[X \geq i],$$
where the third equality follows from interchanging the order of summations.
We can now use Theorem 19.1 to compute E[X] more easily.

Theorem 19.2. For $X \sim \text{Geometric}(p)$, we have $E[X] = \frac{1}{p}$.
Proof. The key observation is that for a geometric random variable X,
$$P[X \geq i] = (1-p)^{i-1}, \quad \text{for } i = 1, 2, \ldots. \quad (1)$$
We can obtain this simply by summing $P[X = j]$ for $j \geq i$. Another way of seeing this is to note that the event "X ≥ i" means at least i tosses are required. This is equivalent to saying that the first i − 1 tosses are all Tails, and the probability of this event is precisely $(1-p)^{i-1}$. Now, plugging equation (1) into Theorem 19.1, we get
$$E[X] = \sum_{i=1}^{\infty} P[X \geq i] = \sum_{i=1}^{\infty} (1-p)^{i-1} = \frac{1}{1 - (1-p)} = \frac{1}{p},$$
where we have used the formula for geometric series.

So, the expected number of tosses of a biased coin until the first Head appears is $\frac{1}{p}$. Intuitively, if in each coin toss we expect to get $p$ Heads, then we need to toss the coin $\frac{1}{p}$ times to get 1 Head. So for a fair coin, the expected number of tosses is 2, but remember that the actual number of coin tosses that we need can be any positive integer.

Remark: Another way of deriving $E[X] = \frac{1}{p}$ is to use the interpretation of a geometric random variable X as the number of coin tosses until we get a Head. Consider what happens in the first coin toss: If the first toss comes up Heads, then X = 1. Otherwise, we have used one toss, and we repeat the coin tossing process again; the number of coin tosses after the first toss is again a geometric random variable with parameter p. Therefore, we can calculate:
$$E[X] = \underbrace{p \cdot 1}_{\text{first toss is H}} + \underbrace{(1-p) \cdot (1 + E[X])}_{\text{first toss is T, then toss again}}.$$
Solving for $E[X]$ yields $E[X] = \frac{1}{p}$, as claimed.
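As a quick numerical sanity check (not in the original note), the following Python sketch evaluates both the direct definition of E[X] and the tail sum formula of Theorem 19.1 for an illustrative value of p, truncating the infinite sums at a large cutoff N; both agree with 1/p.

```python
p = 0.25
N = 10_000  # truncation point; the omitted tail is negligible for this p

# Direct definition: E[X] = sum over i of i * P[X = i]
direct = sum(i * (1 - p) ** (i - 1) * p for i in range(1, N + 1))

# Tail sum formula (Theorem 19.1): E[X] = sum over i of P[X >= i] = sum of (1-p)^(i-1)
tail_sum = sum((1 - p) ** (i - 1) for i in range(1, N + 1))

print(f"direct {direct:.6f}, tail sum {tail_sum:.6f}, 1/p = {1 / p:.6f}")
```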
Let us now compute the variance of X.
Theorem 19.3. For $X \sim \text{Geometric}(p)$, we have $\mathrm{Var}(X) = \frac{1-p}{p^2}$.
Proof. We will show that $E[X(X-1)] = \frac{2(1-p)}{p^2}$. Since we already know $E[X] = \frac{1}{p}$, this will imply the desired result:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = E[X(X-1)] + E[X] - (E[X])^2 = \frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{2(1-p) + p - 1}{p^2} = \frac{1-p}{p^2}.$$
Now to show $E[X(X-1)] = \frac{2(1-p)}{p^2}$, we begin with the following identity of geometric series:
$$\sum_{i=0}^{\infty} (1-p)^i = \frac{1}{p}.$$
Differentiating the identity above with respect to $p$ yields (the $i = 0$ term is equal to 0 so we omit it):
$$-\sum_{i=1}^{\infty} i(1-p)^{i-1} = -\frac{1}{p^2}.$$
Differentiating both sides with respect to $p$ again gives us (the $i = 1$ term is equal to 0 so we omit it):
$$\sum_{i=2}^{\infty} i(i-1)(1-p)^{i-2} = \frac{2}{p^3}. \quad (2)$$
Now using the geometric distribution of X and identity (2), we can calculate:
$$\begin{aligned}
E[X(X-1)] &= \sum_{i=1}^{\infty} i(i-1) \times P[X = i] \\
&= \sum_{i=2}^{\infty} i(i-1)(1-p)^{i-1} p \qquad \text{(the $i = 1$ term is equal to 0 so we omit it)} \\
&= p(1-p) \sum_{i=2}^{\infty} i(i-1)(1-p)^{i-2} \\
&= p(1-p) \times \frac{2}{p^3} \qquad \text{(using identity (2))} \\
&= \frac{2(1-p)}{p^2},
\end{aligned}$$
as desired.
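For a concrete check (again not part of the original note), this short Python snippet computes the truncated mean and second moment of Geometric(p) for an illustrative p and compares the resulting variance with $(1-p)/p^2$.

```python
p = 0.4
N = 10_000  # truncation point; the omitted tail is negligible for this p

pmf = [(1 - p) ** (i - 1) * p for i in range(1, N + 1)]   # P[X = i] for i = 1..N
mean = sum(i * q for i, q in zip(range(1, N + 1), pmf))
second_moment = sum(i * i * q for i, q in zip(range(1, N + 1), pmf))
variance = second_moment - mean ** 2

print(f"truncated Var(X) = {variance:.6f}, theory (1-p)/p^2 = {(1 - p) / p ** 2:.6f}")
```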
1.2 Application: The Coupon Collector’s Problem
Suppose we are trying to collect a set of n different baseball cards. We get the cards by buying boxes of cereal: each box contains exactly one card, and it is equally likely to be any of the n cards. How many boxes do we need to buy until we have collected at least one copy of every card?
Let Sn denote the number of boxes we need to buy in order to collect all n cards. The distribution of Sn is difficult to compute directly (try it for n = 3). But if we are only interested in its expected value E[Sn], then we can evaluate it easily using linearity of expectation and what we have just learned about the geometric distribution.
As usual, we start by writing
Sn = X1 +X2 +···+Xn (3)
for suitable simple random variables Xi. What should the Xi be? Naturally, Xi is the number of boxes we buy while trying to get the i-th new card (starting immediately after we have gotten the (i − 1)-st new card). With this definition, make sure you believe equation (3) before proceeding.
What does the distribution of Xi look like? Well, X1 is trivial: no matter what happens, we always get a new card in the first box (since we have none to start with). So P[X1 = 1] = 1, and thus E[X1] = 1.
How about X2? Each time we buy a box, we will get the same old card with probability $\frac{1}{n}$, and a new card with probability $\frac{n-1}{n}$. So we can think of buying boxes as flipping a biased coin with Heads probability $p = \frac{n-1}{n}$; then X2 is just the number of tosses until the first Head appears. So X2 has the geometric distribution with parameter $p = \frac{n-1}{n}$, and
$$E[X_2] = \frac{n}{n-1}.$$
How about X3? This is very similar to X2 except that now we only get a new card with probability $\frac{n-2}{n}$ (since there are now two old ones). So X3 has the geometric distribution with parameter $p = \frac{n-2}{n}$, and
$$E[X_3] = \frac{n}{n-2}.$$
Arguing in the same way, we see that, for i = 1, 2, …, n, Xi has the geometric distribution with parameter $p = \frac{n-i+1}{n}$, and hence that
$$E[X_i] = \frac{n}{n-i+1}.$$
Finally, applying linearity of expectation to equation (3), we get
$$E[S_n] = \sum_{i=1}^{n} E[X_i] = \frac{n}{n} + \frac{n}{n-1} + \cdots + \frac{n}{2} + \frac{n}{1} = n \sum_{i=1}^{n} \frac{1}{i}. \quad (4)$$
This is an exact expression for $E[S_n]$. We can obtain a tidier form by noting that the sum in (4) actually has a very good approximation,¹ namely
$$\sum_{i=1}^{n} \frac{1}{i} \approx \ln n + \gamma_E,$$
where $\gamma_E = 0.5772\ldots$ is known as Euler's constant.

¹This is another of the little tricks you might like to carry around in your toolbox.

Thus, the expected number of cereal boxes needed to collect $n$ cards is about $n(\ln n + \gamma_E)$. This is an excellent approximation to the exact formula (4) even for quite small values of $n$. So for example, for $n = 100$, we expect to buy about 518 boxes.
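The approximation is easy to test empirically. Below is a minimal Python simulation (not from the original note; the helper name and trial counts are illustrative) that estimates E[S_n] for n = 100 and compares it with the exact harmonic sum and with n(ln n + γ_E).

```python
import math
import random

def boxes_needed(n):
    """Buy boxes (each containing a uniformly random card out of n) until all n cards are collected."""
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        boxes += 1
    return boxes

n, trials = 100, 2_000
empirical = sum(boxes_needed(n) for _ in range(trials)) / trials
exact = n * sum(1 / i for i in range(1, n + 1))   # n * H_n, equation (4)
approx = n * (math.log(n) + 0.5772)               # n (ln n + gamma_E)
print(f"empirical {empirical:.1f}, exact {exact:.1f}, approximation {approx:.1f}")
```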
2 Poisson Distribution
Consider the number of clicks of a Geiger counter, which measures radioactive emissions. The average number of such clicks per unit time, λ , is a measure of radioactivity, but the actual number of clicks fluctuates according to a certain distribution called the Poisson distribution. What is remarkable is that the average value, λ , completely determines the probability distribution on the number of clicks X .
Definition 19.2 (Poisson Distribution). A random variable X for which
$$P[X = i] = \frac{\lambda^i}{i!} e^{-\lambda}, \quad \text{for } i = 0, 1, 2, \ldots \quad (5)$$
is said to have the Poisson distribution with parameter $\lambda$. This is abbreviated as $X \sim \text{Poisson}(\lambda)$.
To make sure this is a valid definition, let us check that (5) is in fact a distribution, i.e., that the probabilities
sum to 1. We have
$$\sum_{i=0}^{\infty} \frac{\lambda^i}{i!} e^{-\lambda} = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} \times e^{\lambda} = 1.$$
In the second-to-last step, we used the Taylor series expansion $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$.
The Poisson distribution is also a very widely accepted model for so-called "rare events," such as misconnected phone calls, radioactive emissions, crossovers in chromosomes, the number of cases of disease, the number of births per hour, etc. This model is appropriate whenever the occurrences can be assumed to happen randomly with some constant density in a continuous region (of time or space), such that occurrences in disjoint subregions are independent. One can then show that the number of occurrences in a region of unit size should obey the Poisson distribution with parameter λ.
Example
Suppose when we write an article, we make an average of 1 typo per page. We can model this with a Poisson random variable X with λ = 1. So the probability that a page has 5 typos is
$$P[X = 5] = \frac{1^5}{5!} e^{-1} = \frac{1}{120e} \approx \frac{1}{326}.$$
Now suppose the article has 200 pages. If we assume the number of typos in each page is independent, then the probability that there is at least one page with exactly 5 typos is
$$\begin{aligned}
P[\exists \text{ a page with exactly 5 typos}] &= 1 - P[\text{every page has} \neq 5 \text{ typos}] \\
&= 1 - \prod_{k=1}^{200} P[\text{page } k \text{ has} \neq 5 \text{ typos}] \\
&= 1 - \prod_{k=1}^{200} \left(1 - P[\text{page } k \text{ has exactly 5 typos}]\right) \\
&= 1 - \left(1 - \frac{1}{120e}\right)^{200},
\end{aligned}$$
where in the last step we have used our earlier calculation for P[X = 5].
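These two probabilities are easy to evaluate numerically. The following Python sketch (not part of the original note) computes P[X = 5] for Poisson(1) and the probability that at least one of the 200 pages has exactly 5 typos.

```python
import math

lam = 1.0  # an average of 1 typo per page

# P[X = 5] for X ~ Poisson(1)
p5 = lam ** 5 / math.factorial(5) * math.exp(-lam)
print(f"P[X = 5] = {p5:.6f}  (= 1/(120e), roughly 1/326)")

# Probability that at least one of 200 independent pages has exactly 5 typos
pages = 200
print(f"P[some page has exactly 5 typos] = {1 - (1 - p5) ** pages:.4f}")
```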
2.1 Mean and Variance of a Poisson Random Variable
Let us now calculate the expectation and variance of a Poisson random variable.
Theorem 19.4. For a Poisson random variable X ∼ Poisson(λ ), we have E[X ] = λ and Var(X ) = λ .
Proof. We can calculate E[X] directly from the definition of expectation:
$$\begin{aligned}
E[X] &= \sum_{i=0}^{\infty} i \times P[X = i] \\
&= \sum_{i=1}^{\infty} i \, \frac{\lambda^i}{i!} e^{-\lambda} \qquad \text{(the $i = 0$ term is equal to 0 so we omit it)} \\
&= \lambda e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} \\
&= \lambda e^{-\lambda} e^{\lambda} = \lambda. \qquad \text{(since $e^{\lambda} = \sum_{j=0}^{\infty} \frac{\lambda^j}{j!}$ with $j = i-1$)}
\end{aligned}$$
Similarly, we can calculate E[X(X − 1)] as follows:
$$\begin{aligned}
E[X(X-1)] &= \sum_{i=0}^{\infty} i(i-1) \times P[X = i] \\
&= \sum_{i=2}^{\infty} i(i-1) \frac{\lambda^i}{i!} e^{-\lambda} \qquad \text{(the $i = 0$ and $i = 1$ terms are equal to 0 so we omit them)} \\
&= \lambda^2 e^{-\lambda} \sum_{i=2}^{\infty} \frac{\lambda^{i-2}}{(i-2)!} \\
&= \lambda^2 e^{-\lambda} e^{\lambda} \qquad \text{(since $e^{\lambda} = \sum_{j=0}^{\infty} \frac{\lambda^j}{j!}$ with $j = i-2$)} \\
&= \lambda^2.
\end{aligned}$$
Therefore,
$$\mathrm{Var}(X) = E[X^2] - E[X]^2 = E[X(X-1)] + E[X] - E[X]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda,$$
as desired.
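As a numerical check (not in the original note), the snippet below builds the Poisson pmf for an illustrative λ, truncated at a point where the remaining tail is negligible, and confirms that both the mean and the variance come out equal to λ.

```python
import math

lam = 4.0
N = 100  # truncation point; terms beyond N are negligible for this lambda

# Build the Poisson pmf iteratively using P[X = i] = P[X = i-1] * lam / i
pmf = [math.exp(-lam)]
for i in range(1, N + 1):
    pmf.append(pmf[-1] * lam / i)

mean = sum(i * q for i, q in enumerate(pmf))
var = sum(i * i * q for i, q in enumerate(pmf)) - mean ** 2
print(f"mean = {mean:.6f}, variance = {var:.6f}, lambda = {lam}")
```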
A plot of the Poisson distribution reveals a curve that rises monotonically to a single peak and then decreases monotonically. The peak is as close as possible to the expected value, i.e., at i = ⌊λ ⌋. Figure 2 illustrates the Poisson(λ ) distribution for λ = 1, 2, 5, 20.
Figure 2: Illustration of the Poisson(λ) distribution for λ = 1, 2, 5, 20.

2.2 Sum of Independent Poisson Random Variables
As illustrated in Figure 2, the Poisson(λ ) distribution becomes more symmetric and resembles a “bell curve” as λ increases. As we will learn later, this phenomenon is closely related to the following useful fact regarding a sum of independent Poisson random variables.
Theorem 19.5. Let X ∼ Poisson(λ ) and Y ∼ Poisson(μ ) be independent Poisson random variables. Then, X +Y ∼ Poisson(λ + μ).
Proof. For all k = 0, 1, 2, …, we have
$$\begin{aligned}
P[X + Y = k] &= \sum_{j=0}^{k} P[X = j, Y = k - j] \\
&= \sum_{j=0}^{k} P[X = j]\, P[Y = k - j] \\
&= \sum_{j=0}^{k} \frac{\lambda^j}{j!} e^{-\lambda} \, \frac{\mu^{k-j}}{(k-j)!} e^{-\mu} \\
&= e^{-(\lambda+\mu)} \frac{1}{k!} \sum_{j=0}^{k} \frac{k!}{j!\,(k-j)!} \lambda^j \mu^{k-j} \\
&= e^{-(\lambda+\mu)} \frac{(\lambda + \mu)^k}{k!},
\end{aligned}$$
where the second equality follows from independence, and the last equality from the binomial theorem.
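Here is a brief empirical check of Theorem 19.5 (a sketch, not part of the original note; the sampler and parameter values are illustrative). It samples X and Y independently using a standard product-of-uniforms Poisson sampler and compares the empirical distribution of X + Y with the Poisson(λ + μ) probabilities.

```python
import math
import random
from collections import Counter

def sample_poisson(lam):
    """Multiply uniform randoms; return how many multiplications keep the product at or above e^(-lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= random.random()
        if prod < threshold:
            return k
        k += 1

lam, mu, trials = 2.0, 3.0, 50_000
counts = Counter(sample_poisson(lam) + sample_poisson(mu) for _ in range(trials))
for k in range(8):
    exact = (lam + mu) ** k / math.factorial(k) * math.exp(-(lam + mu))
    print(f"P[X + Y = {k}]: empirical {counts[k] / trials:.4f}, Poisson({lam + mu:g}) {exact:.4f}")
```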
By induction, we conclude that if X1,X2,…,Xn are independent Poisson random variables with parameters λ1,λ2,…,λn, respectively, then
$$X_1 + X_2 + \cdots + X_n \sim \text{Poisson}(\lambda_1 + \lambda_2 + \cdots + \lambda_n).$$

2.3 Poisson as a Limit of Binomial
To see a concrete example of how Poisson distribution arises, suppose we want to model the number of cell phone users initiating calls in a network during a time period, of duration (say) 1 minute. There are many customers in the network, and all of them can potentially make a call during this time period. However, only a very small fraction of them actually will. Under this scenario, it seems reasonable to make two assumptions:
• The probability of having more than 1 customer initiating a call in any small time interval is negligible.
• The initiations of calls in disjoint time intervals are independent events.
If we divide the one-minute time period into n disjoint intervals, then the number of calls X in that time period can be modeled as a binomial random variable with parameter n and probability of success p, i.e., p is the probability of having a call initiated in a time interval of length 1/n. But what should p be in terms of the relevant parameters of the problem? If calls are initiated at an average rate of λ calls per minute, then E[X] = λ and so np = λ, i.e., p = λ/n. So X ∼ Binomial(n, λ/n). As we shall see below, as we let n tend to infinity, this distribution tends to the Poisson distribution with parameter λ. We can also see why the Poisson distribution is a model for "rare events": we are thinking of it as a sequence of a large number, n, of coin flips, where we expect only a finite number λ of Heads.
Now we will prove that Poisson(λ) is the limit of Binomial(n, λ/n) as n tends to infinity.
Theorem 19.6. Let $X \sim \text{Binomial}(n, \frac{\lambda}{n})$, where $\lambda > 0$ is a fixed constant. Then for every $i = 0, 1, 2, \ldots$,
$$P[X = i] \longrightarrow \frac{\lambda^i}{i!} e^{-\lambda} \quad \text{as } n \to \infty.$$
That is, the probability distribution of X converges to the Poisson distribution with parameter λ .
Proof. Fix $i \in \{0, 1, 2, \ldots\}$, and assume $n \geq i$ (because we will let $n \to \infty$). Then, because X has the binomial distribution with parameters $n$ and $p = \frac{\lambda}{n}$,
$$P[X = i] = \binom{n}{i} p^i (1-p)^{n-i} = \frac{n!}{i!\,(n-i)!} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i}.$$
Let us collect the factors into
$$P[X = i] = \frac{\lambda^i}{i!} \left(\frac{n!}{(n-i)!} \cdot \frac{1}{n^i}\right) \cdot \left(1 - \frac{\lambda}{n}\right)^n \cdot \left(1 - \frac{\lambda}{n}\right)^{-i}. \quad (6)$$
The first parenthesis above becomes, as $n \to \infty$,
$$\frac{n!}{(n-i)!} \cdot \frac{1}{n^i} = \frac{n \cdot (n-1) \cdots (n-i+1) \cdot (n-i)!}{(n-i)!} \cdot \frac{1}{n^i} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-i+1}{n} \to 1.$$
From calculus, the second parenthesis in (6) becomes, as $n \to \infty$,
$$\left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}.$$
And since $i$ is fixed, the third parenthesis in (6) becomes, as $n \to \infty$,
$$\left(1 - \frac{\lambda}{n}\right)^{-i} \to (1 - 0)^{-i} = 1.$$
Substituting these results back into (6) gives us
$$P[X = i] \to \frac{\lambda^i}{i!} \cdot 1 \cdot e^{-\lambda} \cdot 1 = \frac{\lambda^i}{i!} e^{-\lambda},$$
as desired.
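To see the convergence concretely, the following Python snippet (not from the original note; λ and the values of n are illustrative) prints the Binomial(n, λ/n) probabilities for i = 0, …, 5 as n grows, alongside the Poisson(λ) limit.

```python
import math

def binomial_pmf(n, p, i):
    """P[Binomial(n, p) = i]."""
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)

lam = 3.0
for n in (10, 100, 1000, 10_000):
    row = "  ".join(f"{binomial_pmf(n, lam / n, i):.4f}" for i in range(6))
    print(f"n = {n:>6}: {row}")

poisson_row = "  ".join(f"{lam ** i / math.factorial(i) * math.exp(-lam):.4f}" for i in range(6))
print(f"Poisson(3): {poisson_row}")
```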