With Question/Answer Animations
Chapter 7
Chapter Summary
Introduction to Discrete Probability Probability Theory
Bayes’ Theorem
Expected Value and Variance
Section 7.1
Section Summary
Finite Probability
Probabilities of Complements and Unions of Events Probabilistic Reasoning
Probability of an Event
Pierre-Simon Laplace (1749-1827)
which he introduced in the 18 century, when he analyzed games of chance.
We first study Pierre-Simon Laplace’s classical theory of probability, th
We first define these key terms:
An experiment is a procedure that yields one of a given set of possible
outcomes.
The sample space of the experiment is the set of possible outcomes. An event is a subset of the sample space.
Here is how Laplace defined the probability of an event:
Definition: If S is a finite sample space of equally likely outcomes, and E is an event, that is, a subset of S, then the probability of E is
p(E) = |E|/|S|.
ForeveryeventE,wehave0≤p(E) ≤1.Thisfollowsdirectlyfromthe definition because 0 ≤ p(E) = |E|/|S| ≤ |S|/|S| ≤ 1, since 0 ≤ |E| ≤ |S|.
Applying Laplace’s Definition
Example: An urn contains four blue balls and five red balls. What is the probability that a ball chosen from the urn is blue?
Example: What is the probability that when two dice are rolled, the sum of the numbers on the two dice is 7?
Applying Laplace’s Definition
Example: An urn contains four blue balls and five red balls. What is the probability that a ball chosen from the urn is blue?
Solution: The probability that the ball is chosen is 4/9 since there are nine possible outcomes, and four of these produce a blue ball.
Example: What is the probability that when two dice are rolled, the sum of the numbers on the two dice is 7?
Solution: By the product rule there are 62 = 36 possible outcomes. Six of these sum to 7. Hence, the probability of obtaining a 7 is 6/36 = 1/6.
Applying Laplace’s Definition
Examples:
In a lottery, a player wins a large prize when they pick four digits that match, in correct order, four digits selected by a random mechanical process. What is the probabilitythataplayerwinsthe prize?
A smaller prize is won if only three digits are matched. What is the probability that a player wins the small prize?
Applying Laplace’s Definition
Examples:
In a lottery, a player wins a large prize when they pick four digits that match, in correct order, four digits selected by a random mechanical process. What is the probabilitythataplayerwinsthe prize?
Solution: By the product rule there are 104 = 10,000 ways to pick four digits. Since there is only 1 way to pick the correct digits, the probability of winning
the large prize is 1/10,000 = 0.0001.
A smaller prize is won if only three digits are matched. What is the probability
that a player wins the small prize?
Solution: If exactly three digits are matched, one of the four digits must be incorrect and the other three digits must be correct. For the digit that is incorrect, there are 9 possible choices. Hence, by the sum rule, there a total of 36 possible ways to choose four digits that match exactly three of the winning four digits. The probability of winning the small price is 36/10,000 = 9/2500 = 0.0036.
Applying Laplace’s Definition
Example: There are many lotteries that award prizes to people who correctly choose a set of six numbers out of the first n positive integers, where n is usually between 30 and 60. What is the probability that a person picks the correct six numbers out of 40?
Applying Laplace’s Definition
Example: There are many lotteries that award prizes to people who correctly choose a set of six numbers out of the first n positive integers, where n is usually between 30 and 60. What is the probability that a person picks the correct six numbers out of 40?
Solution: The number of ways to choose six numbers out of 40 is
C(40,6) = 40!/(34!6!) = 3,838,380.
Hence, the probability of picking a winning combination is
1/ 3,838,380 ≈ 0.00000026.
Can you work out the probability of winning the lottery with the biggest prize where you live?
Applying Laplace’s Definition
Example: What is the probability that the numbers 11, 4, 17, 39, and 23 are drawn in that order from a bin with 50 balls labeled with the numbers 1,2, …, 50 if
a) The ball selected is not returned to the bin.
b) The ball selected is returned to the bin before the next ball is selected.
Applying Laplace’s Definition
Example: What is the probability that the numbers 11, 4, 17, 39, and 23 are drawn in that order from a bin with 50 balls labeled with the numbers 1,2, …, 50 if
a) The ball selected is not returned to the bin.
b) The ball selected is returned to the bin before the next ball
is selected.
Solution: Use the product rule in each case.
a) Sampling without replacement: The probability is 1/254,251,200 since there are 50 ∙49 ∙47 ∙46 = 254,251,200 ways to choose the five balls.
b) Sampling with replacement: The probability is 1/505 = 1/312,500,000 since 505 = 312,500,000.
The Probability of Complements
and Unions of Events
Theorem 1: Let E be an event in sample space S. The probability of the event = S − E, the complementary event of E, is given by
Proof: Using the fact that | | = |S| − |E|,
The Probability of Complements
and Unions of Events
Example: A sequence of 10 bits is chosen randomly. What is the probability that at least one of these bits is 0?
The Probability of Complements
and Unions of Events
Example: A sequence of 10 bits is chosen randomly. What is the probability that at least one of these bits is 0?
Solution: Let E be the event that at least one of the 10 bits is 0. Then is the event that all of the bits are 1s. The size of the sample space S is 210. Hence,
The Probability of Complements
and Unions of Events
Theorem 2: Let E1 and E2 be events in the sample space S. Then
Proof: Given the inclusion-exclusion formula from Section2.2,|A∪B|=|A|+|B|−|A∩B|, itfollows that
The Probability of Complements
and Unions of Events
Example: What is the probability that a positive integer selected at random from the set of positive integers not exceeding 100 is divisible by either 2 or 5?
The Probability of Complements
and Unions of Events
Example: What is the probability that a positive integer selected at random from the set of positive integers not exceeding 100 is divisible by either 2 or 5?
be the event that it is divisible 5? Then the event that the integer is divisible by 2 or 5 is
1 divisible by 2 and E
Solution: Let E be the event that the integer is
E ∪E andE ∩E isthe eventthatitisdivisibleby2 1212
and 5.
It follows that:
2
p(E ∪ E ) = p(E ) + p(E ) – p(E ∩ E ) 121212
= 50/100 + 20/100 − 10/100 = 3/5.
1
2
3
Monty Hall Puzzle
Example: You are asked to select one of the three doors to open. There is a large prize behind one of the doors and if you select that door, you win the prize. After you select a door, the game show host opens one of the other doors (which he knows is not the winning door). The prize is not behind the door and he gives you the opportunity to switch your selection. Should you switch?
Monty Hall Puzzle
Example: You are asked to select one of the three doors to open. There is a large prize behind one of the doors and if you select that door, you win the prize. After you select a door, the game show host opens one of the other doors (which he knows is not the winning door). The prize is not behind the door and he gives you the opportunity to switch your selection. Should you switch?
(This is a notoriously confusing problem that has been the subject of much discussion . Do a web search to see why!)
Solution: You should switch. The probability that your initial pick is correct is 1/3. This is the same whether or not you switch doors. But since the game show host always opens a door that does not have the prize, if you switch the probability of winning will be 2/3, because you win if your initial pick was not the correct door and the probability your initial pick was wrong is 2/3.
Section 7.2
Section Summary
Assigning Probabilities
Probabilities of Complements and Unions of Events
Conditional Probability
Independence
Bernoulli Trials and the Binomial Distribution
Random Variables
The Birthday Problem
Monte Carlo Algorithms
The Probabilistic Method (not currently included in the overheads)
Assigning Probabilities
Laplace’s definition from the previous section, assumes that all outcomes are equally likely. Now we introduce a more general definition of probabilities that avoids this restriction.
Let S be a sample space of an experiment with a finite number of outcomes. We assign a probability p(s) to each outcome s, so that:
i. 0 ≤ p(s) ≤ 1 for each s ∈ S ii.
The function p from the set of all outcomes of the sample space S is called a probability distribution.
Assigning Probabilities
Example: What probabilities should we assign to the outcomes H(heads) and T (tails) when a fair coin is flipped? What probabilities should be assigned to these outcomes when the coin is biased so that heads comes up twice as often as tails?
Assigning Probabilities
Example: What probabilities should we assign to the outcomes H(heads) and T (tails) when a fair coin is flipped? What probabilities should be assigned to these outcomes when the coin is biased so that heads comes up twice as often as tails?
Solution: We have p(H) = 2p(T). Because p(H) + p(T) = 1, it follows that
2p(T) + p(T) = 3p(T) = 1. Hence, p(T) = 1/3 and p(H) = 2/3.
Uniform Distribution
Definition: Suppose that S is a set with n elements. The uniform distribution assigns the probability 1/n to each element of S. (Note that we could have used Laplace’s definition here.)
Example: Consider again the coin f lipping example, but with a fair coin. Now p(H) = p(T) = 1/2.
Probability of an Event
Definition: The probability of the event E is the sum of the probabilities of the outcomes in E.
Note that now no assumption is being made about the distribution.
Example
Example: Suppose that a die is biased so that 3 appears twice as often as each other number, but that the other five outcomes are equally likely. What is the probability that an odd number appears when we roll this die?
Example
Example: Suppose that a die is biased so that 3 appears twice as often as each other number, but that the other five outcomes are equally likely. What is the probability that an odd number appears when we roll this die?
Solution: We want the probability of the event E = {1,2,3}. We have p(3) = 2/7 and
p(1) = p(2) = p(4) = p(5) = p(6) = 1/7. Hence, p(E) = p(1) + p(3) + p(5) =
1/7 + 2/7 + 1/7 = 4/7.
Probabilities of Complements and Unions of Events
Complements: still holds. Since each outcome is in either E or , but not both,
Unions:
also still holds under the new definition.
Combinations of Events
Theorem: If E1, E2, … is a sequence of pairwise disjoint events in a sample space S, then
see Exercises 36 and 37 for the proof
Conditional Probability
Definition: Let E and F be events with p(F) > 0. The conditional probability of E given F, denoted by P(E|F), is defined as:
Example: A bit string of length four is generated at random so that each of the 16 bit strings of length 4 is equally likely. What is the probability that it contains at least two consecutive 0s, given that its first bit is a 0?
Solution: Let E be the event that the bit string contains at least two consecutive 0s, and F be the event that the first bit is a 0.
Since E ⋂ F = {0000, 0001, 0010, 0011, 0100}, p(E⋂F)=5/16.
Because 8 bit strings of length 4 start with a 0, p(F) = 8/16= 1⁄2. Hence,
Conditional Probability
Example: What is the conditional probability that a family with two children has two boys, given that they have at least one boy. Assume that each of the possibilities BB, BG, GB, and GG is equally likely where B represents a boy and G represents a girl.
Conditional Probability
Example: What is the conditional probability that a family with two children has two boys, given that they have at least one boy. Assume that each of the possibilities BB, BG, GB, and GG is equally likely where B represents a boy and G represents a girl.
Solution: Let E be the event that the family has two boysandlet Fbetheeventthatthefamilyhasatleast one boy. Then E = {BB}, F = {BB, BG, GB}, and
E ⋂ F = {BB}.
Itfollowsthatp(F)=3/4and p(E⋂F)=1/4. Hence,
Independence
Definition: The events E and F are independent if and only if p(E⋂F) = p(E)p(F).
Example: Suppose E is the event that a randomly generated bit string of length four begins with a 1 and F is the event that this bit string contains an even number of 1s. Are E and F independent if the 16 bit strings of length four are equally likely?
Solution: There are eight bit strings of length four that begin with a 1, and eight bit strings of length four that contain an even number of 1s.
Since the number of bit strings of length 4 is 16, p(E) = p(F) = 8/16 = 1⁄2.
Since E⋂F = {1111, 1100, 1010, 1001}, p(E⋂F) = 4/16=1/4. We conclude that E and F are independent, because
p(E⋂F) =1/4 = (1⁄2) (1⁄2)= p(E) p(F)
Independence
Example: Assume (as in the previous example) that each of the four ways a family can have two children (BB, GG, BG,GB) is equally likely. Are the events E, that a family with two children has two boys, and F, that a family with two children has at least one boy, independent?
Independence
Example: Assume (as in the previous example) that each of the four ways a family can have two children (BB, GG, BG,GB) is equally likely. Are the events E, that a family with two children has two boys, and F, that a family with two children has at least one boy, independent?
Solution: Because E = {BB}, p(E) = 1/4. We saw previouslythatthatp(F)=3/4and p(E⋂F)=1/4.The events E and F are not independent since
p(E) p(F) = 3/16 ≠ 1/4= p(E⋂F) .
Pairwise and Mutual Independence
Definition: The events E1, E2, …, En are pairwise independent if and only if p(E ⋂E ) = p(E ) p(E ) for all
pairs i and j with i ≤ j ≤ n.
The events are mutually independent if
whenever i , j = 1,2,…., m, are integers with 1≤i1j
i := i + 1 m := aj
fork:=0toj −i−1 aj-k := aj-k-1
ai := m
{Nowa,…,a isinincreasingorder}
1 n continued →
Average-Case Complexity of
Insertion Sort
Solution: Let X be the random variable equal to the number of comparisons used by insertion sort to sort a list of a1, a2, …., an distinct elements. E(X) is the average number of comparisons.
Let Xi be the random variable equal to the number of comparisons used to insert a into the proper position after the first i −1 elements
12 i-1
SinceX=X +X +∙∙∙ + X ,
i
a ,a ,….,a havebeensorted.
23n
E(X)=E(X +X +∙∙∙ +X )=E(X )+E(X )+∙∙∙+E(X ).
23n23n
To find E(X ) for i = 2,3,…,n, let p (k) be the probability that the
thatis,max(a,a ,….,a )=a ,where1≤k≤j. 12jk
ij
largest of the first j elements in the list occurs at the kth position,
Assume uniform distribution; p (k) = 1/j .
Then X(k) = k. i
j
continued →
Average-Case Complexity of
Insertion Sort
Since ai could be inserted into any of the first i positions
It follows that
Hence, the average-case complexity is .
The Geometric Distribution
Definition 2: A random variable X has geometric distribution with parameter p if p(X = k) = (1 − p)k-1p for k =1,2,3,…,wherepisarealnumberwith0≤p≤ 1.
Theorem 4: If the random variable X has the geometric distribution with parameter p, then E(X) = 1/p.
Example: Supposetheprobabilitythatacoincomesup tails is p. What is the expected number of flips until this coin comes up tails?
The sample space is {T, HT, HHT, HHHT, HHHHT, …}.
Let X be the random variable equal to the number of flips in an element of the sample space; X(T) = 1, X(HT) = 2,
see text for full details
X(HHT) = 3, etc.
By Theorem 4, E(X) = 1/p.
Independent Random Variables
Definition 3: The random variables X and Y on a sample space S are independent if
p(X = r and Y = r ) = p(X = r )∙ p(Y = r ). 1212
Theorem 5: If X and Y are independent variables on a sample space S, then E(XY) = E(X)E(Y).
see text for the proof
Variance
Deviation: The deviation of X at s ∊ S is X(s) − E(X), the difference between the value of X and the mean of X.
Definition 4: Let X be a random variable on the sample space S. The variance of X, denoted by V(X) is
That is V(X) is the weighted average of the square of the deviation of X. The standard deviation of X, denoted by σ(X) is defined to be
V(X) = E(X2) − E(X)2.
Theorem 6: If X is a random variable on a sample space S, then
see text for the proof
Corollary 1: If X is a random variable on a sample space S and E(X) = μ , then
V(X) = E((X −μ)2).
see text for the proof.
Variance
Example: What is the variance of the random variable X, where X(t) = 1 if a Bernoulli trial is a success and X(t) = 0 if it is a failure, where p is the probability of success and q is the probability of failure?
Solution: Because X takes only the values 0 and 1, it follows that X2(t) = X(t).
Hence, V(X)=E(X2)−E(X)2 =p−p2=p(1−p)=pq.
Variance of the Value of a Die: What is the variance of a random variable X,
where X is the number that comes up when a fair die is rolled?
Solution: We have V(X) = E(X2) − E(X)2 . In an earlier example, we saw that
E(X) = 7/2. Note that
E(X2)=1/6(12 +22 +32 +42 +52 +62)=91/6.
We conclude that
Variance
Irenée-Jules Bienaymé (1796-1878)
Bienaymé‘sFormula: IfXandYaretwoindependentrandomvariablesonasample space S, then V(X + Y) = V(X) + V(Y). Furthermore, if X , i = 1,2, …,n, with n a positive
V(X1 +X2 +∙∙∙ +Xn)=V(X1)+V(X2)+∙∙∙ +V(Xn).
see text for the proof
i integer, are pairwise independent random variables on S, then
Example: Find the variance of the number of successes when n independent Bernoulli trials are performed, where on each trial, p is the probability of success and q is the probability of failure.
Solution: Let X be the random variable with X ((t , t , …., t )) = 1 if trial t is a success ii12ni
and X ((t , t , …., t )) = 0 if it is a failure. Let X = X + X + …. X . Then X counts the i12n 23n
number of successes in the n trials.
By Bienaymé ‘s Formula, it follows that V(X)= V(X ) + V(X ) + ∙∙∙ + V(X ).
By the previous example ,V(X ) = pq for i = 1,2, …,n.
Hence, V(X) = npq.
i
12n
Pafnuty Lvovich Chebyshev (1821-1894)
Chebyshev’s Inequality
Chebyschev’s Inequality: Let X be a random variable on a sample space S with probability function p. If r is a positive real number, then
p(|X(s) − E(X)| ≥ r ) ≤ V(X)/r2. see text for the proof
Example: Suppose that X is a random variable that counts the number of tails when a fair coin is tossed n times. Note that X is the number of successes when n independent Bernoulli trials, each with probability of success 1⁄2 are done. Hence, (by Theorem 2) E(X) = n/2 and (by Example 18) V(X) = n/4.
By Chebyschev’s inequality with r = √n, p(|X(s)−n/2 |≥√n)≤(n/4)(√n)2 =1⁄4.
This means that the probability that the number of tails that come up on n tossesdeviatesfromthemean,n/2,bymorethan√n isnolargerthan1⁄4.