probability question need in 10 hrs
Chapter 5
Discrete Probability
We all have some intuitive understanding of the notions of “chance” and “probability”. When buying a lottery ticket, we know that there is a chance of winning the jackpot, but we also know that this chance is very small. Before leaving home in the morning, we check the weather forecast and see that, with probability 80%, we get 15 centimetres of snow in Ottawa. In this chapter, we will give a formal definition of this notion of “probability”. We start by presenting a surprising application of probability and random numbers.
5.1 Anonymous Broadcasting
Consider a group of n people P1,P2, . . . ,Pn, for some integer n ≥ 3. One person in this group, say Pk, would like to broadcast, anonymously, a message to all other people in the group. That is, Pk wants to broadcast a message such that
• everybody in the group receives the message,
• nobody knows that the message was broadcast by Pk.
In other words, when Pi (with i 6= k) receives the message, he only knows that it was broadcast by one of P1, . . . ,Pi−1,Pi+1, . . . ,Pn; Pi cannot determine who broadcast the message.
At first sight, it seems to be impossible to do this. In 1988, however, David Chaum published, in the Journal of Cryptology, a surprisingly simple
96 Chapter 5. Discrete Probability
protocol that does achieve this. Chaum referred to the problem as the Dining Cryptographers Problem.
We will present and analyze the protocol for the case when n = 3. Thus, there are three people P1, P2, and P3. We assume that exactly one of them broadcasts a message and refer to this person as the broadcaster. We also assume that the message is a binary string. The broadcaster will announce the message one bit at a time.
The three people P1, P2, and P3 sit at a table, in clockwise order of their indices. Let b be the current bit that the broadcaster wants to announce. The protocol for broadcasting this bit is as follows:
Step 1: Each person Pi generates a random bit bi, for example by flipping a coin. Thus, with 50% probability, bi = 0 and with 50% probability, bi = 1.
Step 2: Each person Pi shows his bit bi to his clockwise neighbor.
P1
P2
P3
b1
b2
b3
At the end of this second step,
• P1 knows b1 and b3, but not b2,
• P2 knows b1 and b2, but not b3,
• P3 knows b2 and b3, but not b1.
Step 3: Each person Pi computes the sum si (modulo 2) of the bits that he knows. Thus,
• P1 computes s1 = (b1 + b3) mod 2,
• P2 computes s2 = (b1 + b2) mod 2,
• P3 computes s3 = (b2 + b3) mod 2.
Step 4: Each person Pi does the following:
5.1. Anonymous Broadcasting 97
• If Pi is not the broadcaster, he sets ti = si.
• If Pi is the broadcaster, he sets ti = (si + b) mod 2. Recall that b is the current bit that the broadcaster wants to announce. (Thus, if b = 1, then Pi “secretly” flips the bit si and stores the result in ti.)
Step 5: Each person Pi shows his bit ti to the other two people.
Step 6: Each person Pi computes the sum (modulo 2) of the three bits t1, t2, and t3, i.e., the value (t1 + t2 + t3) mod 2.
This concludes the description of the protocol for broadcasting one bit b. Observe that for any bit x, we have (x + x) mod 2 = 0. Therefore, the bit computed in the last step is equal to
t1 + t2 + t3 = s1 + s2 + s3 + b
= (b1 + b3) + (b1 + b2) + (b2 + b3) + b
= (b1 + b1) + (b2 + b2) + (b3 + b3) + b
= b,
where all arithmetic is done modulo 2. In other words, the bit computed in the last step is equal to the bit that the broadcaster wants to announce. This shows that each person in the group receives this bit.
It remains to show that a non-broadcaster cannot determine who broad- cast the bit. In the analysis below, we assume that
• b = 1, i.e., the broadcaster announces the bit 1,
• P2 is not the broadcaster.
We have to show that P2 cannot determine whether P1 or P3 is the broad- caster. Note that P2 knows the values
b1,b2, t1, t2, t3,
but does not know the bit b3. We consider the cases when b1 = b2 and b1 6= b2 separately.
Case 1: b1 = b2. This case has two subcases depending on the value of b3.
Case 1.1: b3 = b1; thus, all three b-bits are equal.
98 Chapter 5. Discrete Probability
• If P1 is the broadcaster, then (all arithmetic is done modulo 2)
t1 = s1 + 1 = b1 + b3 + 1 = 1
and
t3 = s3 = b2 + b3 = 0.
• If P3 is the broadcaster, then
t1 = s1 = b1 + b3 = 0
and
t3 = s3 + 1 = b2 + b3 + 1 = 1.
Thus, the broadcaster is the person whose t-bit is equal to 1.
Case 1.2: b3 6= b1 and, thus, b3 6= b2.
• If P1 is the broadcaster, then
t1 = s1 + 1 = b1 + b3 + 1 = 0
and
t3 = s3 = b2 + b3 = 1.
• If P3 is the broadcaster, then
t1 = s1 = b1 + b3 = 1
and
t3 = s3 + 1 = b2 + b3 + 1 = 0.
Thus, the broadcaster is the person whose t-bit is equal to 0.
Since P2 knows b1 and b2, he knows when Case 1 occurs. Since P2 does not know b3, however, he cannot determine whether Case 1.1 or 1.2 occurs. As a result, P2 cannot determine whether P1 or P3 is the broadcaster.
Case 2: b1 6= b2. This case has two subcases depending on the value of b3. Case 2.1: b3 = b1 and, thus, b3 6= b2.
5.1. Anonymous Broadcasting 99
• If P1 is the broadcaster, then
t1 = s1 + 1 = b1 + b3 + 1 = 1
and t3 = s3 = b2 + b3 = 1.
• If P3 is the broadcaster, then
t1 = s1 = b1 + b3 = 0
and t3 = s3 + 1 = b2 + b3 + 1 = 0.
Thus, t1 is always equal to t3, no matter whether P1 or P3 is the broadcaster.
Case 2.2: b3 6= b1 and, thus, b3 = b2.
• If P1 is the broadcaster, then
t1 = s1 + 1 = b1 + b3 + 1 = 0
and t3 = s3 = b2 + b3 = 0.
• If P3 is the broadcaster, then
t1 = s1 = b1 + b3 = 1
and t3 = s3 + 1 = b2 + b3 + 1 = 1.
Thus, t1 is always equal to t3, no matter whether P1 or P3 is the broadcaster.
Since P2 knows b1 and b2, he knows when Case 2 occurs. Since P2 does not know b3, however, he cannot determine whether Case 2.1 or 2.2 occurs. As in Case 1, P2 cannot determine whether P1 or P3 is the broadcaster.
We conclude from Cases 1 and 2 that the broadcasting of the bit b = 1 is indeed anonymous. Now consider the case when the bit b to be announced is equal to 0. It follows from the protocol that in this case, there is no “secret bit flipping” done in Step 4 and all three people use the same rules to compute the s-values and the t-values. In this case, t1 = s1, t2 = s2, and
100 Chapter 5. Discrete Probability
t3 = s3, and P2 can determine the bit b3. He cannot, however, determine whether P1 or P3 is the broadcaster.
To conclude this section, we remark that for each bit to be announced, the entire protocol must be followed. That is, in each round of the protocol, one bit is broadcast and each person Pi must flip a coin to determine the bit bi that is used in this round. We also remark that the protocol works only if exactly one person is the broadcaster.
5.2 Probability Spaces
In this section, we give a formal definition of the notion of “probability” in terms of sets and functions.
Definition 5.2.1 A sample space S is a non-empty countable set. Each element of S is called an outcome and each subset of S is called an event.
In daily life, we express probabilities in terms of percentages. For exam- ple, the weather forecast may tell us that, with 80% probability, we will be getting a snow storm today. In probability theory, probabilities are expressed in terms of numbers in the interval [0, 1]. A probability of 80% becomes a probability of 0.8.
Definition 5.2.2 Let S be a sample space. A probability function on S is a function Pr : S → R such that
• for all ω ∈ S, 0 ≤ Pr(ω) ≤ 1, and
• ∑
ω∈S Pr(ω) = 1.
For any outcome ω in the sample space S, we will refer to Pr(ω) as the probability that the outcome is equal to ω.
Definition 5.2.3 A probability space is a pair (S, Pr), where S is a sample space and Pr is a probability function on S.
A probability function Pr maps each element of the sample space S (i.e., each outcome) to a real number in the interval [0, 1]. It turns out to be useful
5.2. Probability Spaces 101
to extend this function so that it maps any event to a real number in [0, 1]. If A is an event (i.e., A ⊆ S), then we define
Pr(A) = ∑ ω∈A
Pr(ω). (5.1)
We will refer to Pr(A) as the probability that the event A occurs. Note that since S ⊆ S, the entire sample space S is an event and
Pr(S) = ∑ ω∈S
Pr(ω) = 1,
where the last equality follows from the second condition in Definition 5.2.2.
Let us consider some examples. Assume we flip a coin. Since there are two possible outcomes (either the coin comes up heads (H) or it comes up tails (T)), the sample space is the set S = {H,T}. If the coin is fair, i.e., the probabilities of H and T are equal, then the probability function Pr : S → R is given by
Pr(H) = 1/2
and Pr(T) = 1/2.
Observe that this function Pr satisfies the two conditions in Definition 5.2.2. Since this sample space has two elements, there are four events, one event for each subset. These events are
∅,{H},{T},{H,T},
and it follows from (5.1) that
Pr(∅) = 0,
Pr({H}) = Pr(H) = 1/2, Pr({T}) = Pr(T) = 1/2,
and Pr({H,T}) = Pr(H) + Pr(T) = 1/2 + 1/2 = 1.
If we flip a fair coin twice, then there are four possible outcomes, and the sample space becomes S = {HH,HT,TH,TT}. For example, HT indicates
102 Chapter 5. Discrete Probability
that the first flip resulted in heads, whereas the second flip resulted in tails. In this case, the probability function Pr : S → R is given by
Pr(HH) = Pr(HT) = Pr(TH) = Pr(TT) = 1/4.
Observe again that this function Pr satisfies the two conditions in Defini- tion 5.2.2. Since the sample space consists of 4 elements, the number of possible events is equal to 24 = 16. For example, A = {HT,TH} is an event and it follows from (5.1) that
Pr(A) = Pr(HT) + Pr(TH) = 1/4 + 1/4 = 1/2.
In words, when flipping a fair coin twice, the probability that we see one heads and one tails (without specifying the order) is equal to 1/2.
As a final example, if we roll a fair die, then there are six possible outcomes (1, 2, 3, 4, 5, and 6), each one occurring with probability 1/6. If we roll this die twice, we obtain the sample space
S = {(i,j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},
where i is the result of the first roll and j is the result of the second roll. Note that |S| = 6×6 = 36. Since the die is fair, each outcome has the same prob- ability. Therefore, in order to satisfy the two conditions in Definition 5.2.2, we must have
Pr(i,j) = 1/36
for each outcome (i,j) in S. If we are interested in the sum of the results of the two rolls of the die,
then we define the event
Ak = “the sum of the results of the two rolls is equal to k”,
which, using the notation of sets, is the same as
Ak = {(i,j) ∈ S : i + j = k}.
In the matrix below, the leftmost column gives the result of the first roll, the top row gives the result of the second roll, and each other entry is the sum of the results of the two corresponding rolls.
5.2. Probability Spaces 103
1 2 3 4 5 6
1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12
As can be seen from this matrix, the event Ak is non-empty only if k ∈ {2, 3, . . . , 12}. For any other k, the event Ak is empty, which means that it can never happen.
It follows from (5.1) that
Pr(Ak) = ∑
(i,j)∈Ak
Pr(i,j) = ∑
(i,j)∈Ak
1/36 = |Ak|/36.
For example, if k = 4, you can see in the matrix that the event A4 has three elements, i.e., there are 3 possible outcomes of two rolls that result in a sum of 4. These outcomes are (3, 1), (2, 2), and (1, 3). It follows that
Pr(A4) = 3/36 = 1/12.
In a similar way, we see that
Pr(A2) = Pr(A12) = 1/36,
Pr(A3) = Pr(A11) = 2/36 = 1/18,
Pr(A4) = Pr(A10) = 3/36 = 1/12,
Pr(A5) = Pr(A9) = 4/36 = 1/9,
Pr(A6) = Pr(A8) = 5/36,
and Pr(A7) = 6/36 = 1/6.
A sample space is not necessarily uniquely defined. In the last example, where we are interested in the sum of the results of two rolls of a die, we could also have taken the sample space to be the set
S′ = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
104 Chapter 5. Discrete Probability
The probability function Pr′ corresponding to this sample space S′ is given by
Pr′(k) = Pr(Ak),
because Pr′(k) is the probability that we get the outcome k in the sample space S′, which is the same as the probability that event Ak occurs in the sample space S. You should verify that this function Pr′ satisfies the two conditions in Definition 5.2.2 and, thus, is a valid probability function on S′.
5.3. Basic Rules of Probability 105
5.3 Basic Rules of Probability
In this section, we prove some basic properties of probability functions. As we will see, all of these follow from Definition 5.2.2. Throughout this section, (S, Pr) is a probability space.
Recall that an event is a subset of the sample space S. In particular, ∅ is an event. Intuitively, Pr(∅) must be zero, because it is the probability that there is no outcome, which can never happen. The following lemma states that this is indeed the case.
Lemma 5.3.1 Pr(∅) = 0.
Proof. By (5.1), we have
Pr(∅) = ∑ ω∈∅
Pr(ω).
Since there are zero terms in this summation, its value is equal to zero.
We say that two events A and B are disjoint, if A∩B = ∅. A sequence A1,A2, . . . ,An of events is pairwise disjoint, if any pair in this sequence is disjoint. The following lemma is similar to the Sum Rule of Section 3.4.
Lemma 5.3.2 If A1,A2, . . . ,An is a sequence of pairwise disjoint events, then
Pr(A1 ∪A2 ∪·· ·∪An) = n∑ i=1
Pr(Ai).
Proof. Define A = A1 ∪A2 ∪·· ·∪An. Using (5.1), we get
Pr(A) = ∑ ω∈A
Pr(ω)
= n∑ i=1
∑ ω∈Ai
Pr(ω)
= n∑ i=1
Pr(Ai).
106 Chapter 5. Discrete Probability
To give an example, assume we roll a fair die twice. What is the proba- bility that the sum of the two results is even? If you look at the matrix in Section 5.2, then you see that there are 18 entries, out of 36, that are even. Therefore, the probability of having an even sum is equal to 18/36 = 1/2. Below we will give a different way to determine this probability.
The sample space is the set
S = {(i,j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},
where i is the result of the first roll and j is the result of the second roll. Each element of S has the same probability 1/36 of being an outcome of rolling the die twice.
The event we are interested in is
A = {(i,j) ∈ S : i + j is even}.
Observe that i + j is even if and only if both i and j are even or both i and j are odd. Therefore, we split the event A into two disjoint events
A1 = {(i,j) ∈ S : both i and j are even}
and
A2 = {(i,j) ∈ S : both i and j are odd}. By Lemma 5.3.2, we have
Pr(A) = Pr(A1) + Pr(A2).
The set A1 has 3 · 3 = 9 elements, because there are 3 choices for i and 3 choices for j. Similarly, the set A2 has 9 elements. It follows that
Pr(A) = Pr(A1) + Pr(A2) = 9/36 + 9/36 = 1/2.
In the next lemma, we relate the probability that an event happens to the probability that the event does not happen. If A is an event, then A denotes its complement, i.e., A = S \ A. Intuitively, the sum of Pr(A) and Pr(A) must be equal to one, because either event A happens or event A does not happen. The following lemma states that this is indeed the case. Observe that this is similar to the Complement Rule of Section 3.3.
5.3. Basic Rules of Probability 107
Lemma 5.3.3 For any event A,
Pr(A) = 1 − Pr(A).
Proof. Since A and A are disjoint and S = A∪A, it follows from Lemma 5.3.2 that
Pr(A) + Pr(A) = Pr(S).
We have seen in Section 5.2 that Pr(S) = 1.
Consider again the sample space that we saw after Lemma 5.3.2. We showed that, when rolling a fair die twice, we get an even sum with probabil- ity 1/2. It follows from Lemma 5.3.3 that we get an odd sum with probability 1 − 1/2 = 1/2.
The next lemma is similar to the Principle of Inclusion and Exclusion that we have seen in Section 3.5.
Lemma 5.3.4 If A and B are events, then
Pr(A∪B) = Pr(A) + Pr(B) − Pr(A∩B).
Proof. Since B \A and A∩B are disjoint and B = (B \A) ∪ (A∩B), it follows from Lemma 5.3.2 that
Pr(B) = Pr(B \A) + Pr(A∩B).
Next observe that A and B \A are disjoint. Since A∪B = A∪ (B \A), we again apply Lemma 5.3.2 and obtain
Pr(A∪B) = Pr(A) + Pr(B \A).
By combining these two equations, we obtain the claim in the lemma.
To give an example, assume we choose a number x in the sample space S = {1, 2, . . . , 1000}, such that each element has the same probability 1/1000 of being chosen. What is the probability that x is divisible by 2 or 3? Define the events
A = {i ∈ S : i is divisible by 2} and
B = {i ∈ S : i is divisible by 3}.
108 Chapter 5. Discrete Probability
Then we want to determine Pr(A∪B), which, by Lemma 5.3.4, is equal to
Pr(A∪B) = Pr(A) + Pr(B) − Pr(A∩B).
Since there are b1000/2c = 500 even numbers in S, we have
Pr(A) = 500/1000.
Since there are b1000/3c = 333 elements in S that are divisible by 3, we have
Pr(B) = 333/1000.
Observe that i belongs to A∩B if and only if i is divisible by 6, i.e.,
A∩B = {i ∈ S : i is divisible by 6}.
Since there are b1000/6c = 166 elements in S that are divisible by 6, we have
Pr(A∩B) = 166/1000.
We conclude that
Pr(A∪B) = Pr(A) + Pr(B) − Pr(A∩B) = 500/1000 + 333/1000 − 166/1000 = 667/1000.
Lemma 5.3.5 (Union Bound) For any integer n ≥ 1, if A1,A2, . . . ,An is a sequence of events, then
Pr(A1 ∪A2 ∪·· ·∪An) ≤ n∑ i=1
Pr(Ai).
Proof. The proof is by induction on n. If n = 1, we have equality and, thus, the claim obviously holds. Let n ≥ 2 and assume the claim is true for n−1, i.e., assume that
Pr(A1 ∪A2 ∪·· ·∪An−1) ≤ n−1∑ i=1
Pr(Ai).
Define B = A1 ∪A2 ∪·· ·∪An−1. Then it follows from Lemma 5.3.4 that
Pr(B ∪An) = Pr(B) + Pr(An) − Pr(B ∩An) ≤ Pr(B) + Pr(An).
5.4. Uniform Probability Spaces 109
Since we assumed that
Pr(B) ≤ n−1∑ i=1
Pr(Ai),
it follows that
Pr(A1 ∪A2 ∪·· ·∪An) = Pr(B ∪An) ≤ Pr(B) + Pr(An)
≤ n−1∑ i=1
Pr(Ai) + Pr(An)
= n∑ i=1
Pr(Ai).
5.4 Uniform Probability Spaces
In this section, we consider finite sample spaces S in which each outcome has the same probability. Since, by Definition 5.2.2, all probabilities add up to one, the probability of each outcome must be equal to 1/|S|.
Definition 5.4.1 A uniform probability space is a pair (S, Pr), where S is a finite sample space and the probability function Pr satisfies
Pr(ω) = 1
|S| for each outcome ω in S.
The probability spaces that we have seen in Section 5.2 are all uniform, except the space (S′, Pr′) that we saw at the end of that section.
To give another example, when playing1 Lotto 6/49, you choose a 6- element subset of the set A = {1, 2, . . . , 49}. Twice a week, the Ontario Lottery and Gaming Corporation (OLG) draws the six “winning numbers” uniformly at random from S. If your numbers are equal to the ones drawn by OLG, then you can withdraw from this course and spend the rest of your life on the beach. Most people would find it silly to choose the subset
1Of course, you should never waste money on this!
110 Chapter 5. Discrete Probability
{1, 2, 3, 4, 5, 6}; they would argue that it is better to choose, for example, the subset {2, 5, 16, 36, 41, 43}. Is this true?
For this example, the sample space is the set S consisting of all 6-element subsets of A. Since S has size
( 49 6
) and the subset drawn by OLG is uniform,
each outcome (i.e., each 6-element subset of S) has a probability of
1( 49 6
) = 1 13, 983, 816
≈ 0.000000072.
In particular, both {1, 2, 3, 4, 5, 6} and {2, 5, 16, 36, 41, 43} have the same probability of being the winning numbers. (Still, the latter subset was drawn by OLG on February 8, 2014.)
The lemma below states that in a uniform probability space (S, Pr), the probability of an event A is just the ratio of the size of A and the size of S.
Lemma 5.4.2 If (S, Pr) is a uniform probability space and A is an event, then
Pr(A) = |A| |S| .
Proof. By applying (5.1), we get
Pr(A) = ∑ ω∈A
Pr(ω) = ∑ ω∈A
1
|S| = |A| |S| .
In a standard deck of 52 cards, each card has a suit and a rank. There are four suits (spades ♠, hearts ♥, clubs ♣, and diamonds ♦), and 13 ranks (Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, and King).
A hand of five cards is called a full house, if three of the cards are of the same rank and the other two cards are also of the same (but necessarily different) rank. For example, a hand consisting of three sevens and two Queens is a full house.
Assume we get a random hand of five cards. What is the probability that this hand is a full house? To answer this question, we start by observing that the sample space is the set S consisting of all hands of five cards. Note that
|S| = (
52
5
) = 2, 598, 960.
5.4. Uniform Probability Spaces 111
We assume that the 5-card hand is chosen uniformly at random, so that each hand has a probability of 1/|S| of being chosen.
Since we are interested in the probability of a random hand being a full house, we define the event A to be the set of all elements in S that are full houses. By Lemma 5.4.2, we have
Pr(A) = |A| |S| .
Thus, to determine Pr(A), it remains to determine the size of the set A, i.e., the total number of full houses. For this, we will use the Product Rule of Section 3.1:
• The procedure is “choose a full house”.
• Task T1: Choose the rank of the three cards in the full house. There are 13 ways to do this.
• Task T2: Choose the suits of these three cards. There are ( 4 3
) ways to
do this.
• Task T3: Choose the rank of the other two cards in the full house. There are 12 ways to do this.
• Task T4: Choose the suits of these two cards. There are ( 4 2
) ways to do
this.
Thus, the number of full houses is equal to
|A| = 13 · (
4
3
) · 12 ·
( 4
2
) = 3, 744.
We conclude that the probability of getting a full house is equal to
Pr(A) = |A| |S| =
3, 744
2, 598, 960 ≈ 0.00144.
112 Chapter 5. Discrete Probability
5.5 The Birthday Paradox
Consider a group of n people, for some integer n ≥ 2. In this section, we determine the probability pn that at least two of them have the same birthday. We will ignore leap years, so that there are 365 days in one year.
Below, we will show that p2 = 1/365. If n ≥ 366, then it follows from the Pigeonhole Principle (see Section 3.6) that there must be at least two people with the same birthday and, therefore, pn = 1. Intuitively, if n is increased from 2 to 365, the value of pn should increase. What is the value of n such that pn is larger than 1/2 for the first time? That is, what is the value of n for which pn−1 ≤ 1/2 < pn? In this section, we will see that this question can be answered using simple counting techniques that we have seen in Chapter 3.
We denote the people by P1,P2, . . . ,Pn, the number of days in one year by d, and number the days in one year as 1, 2, . . . ,d. The sample space is the set
Sn = {(b1,b2, . . . ,bn) : bi ∈{1, 2, . . . ,d} for each 1 ≤ i ≤ n},
where bi denotes the birthday of Pi. Note that |Sn| = dn. We consider the uniform probability space; thus, for each element (b1,b2, . . . ,bn) in Sn, we have
Pr(b1,b2, . . . ,bn) = 1
|Sn| =
1
dn .
The event we are interested in is
An = “at least two of the numbers in b1,b2, . . . ,bn are equal”.
Using the notation of sets, this is the same as
An = {(b1,b2, . . . ,bn) ∈ Sn : b1,b2, . . . ,bn contains duplicates}.
The probability pn that we introduced above is equal to
pn = Pr(An).
As mentioned above, the Pigeonhole Principle implies that pn = 1 for n > d. Therefore, we assume from now on that n ≤ d.
Let us start by determining p2. Since we consider the uniform probability space, we have, by Lemma 5.4.2,
p2 = Pr(A2) = |A2| |S2|
.
5.5. The Birthday Paradox 113
We know already that |S2| = d2. The event A2 is equal to
A2 = {(1, 1), (2, 2), . . . , (d,d)}.
Thus, |A2| = d and we obtain
p2 = d
d2 =
1
d .
To determine pn for larger values of n, it is easier to determine the probability of the complement An. The latter probability, together with Lemma 5.3.3, will give us the value of pn. Note that
An = {(b1,b2, . . . ,bn) ∈ Sn : all b1,b2, . . . ,bn are distinct}.
In other words, An is the set of all ordered sequences consisting of n pairwise distinct elements of {1, 2, . . . ,d}. In Section 3.7, see (3.1), we have seen that
|An| = d!
(d−n)!.
We conclude that, for any n with 2 ≤ n ≤ d,
pn = Pr(An)
= 1 − Pr(An)
= 1 − |An||Sn|
= 1 − d! (d−n)!dn .
By taking d = 365, we get p22 = 0.476 and p23 = 0.507. Thus, in a random group of 23 people, the probability that at least two of them have the same birthday is more than 50%. Most people are very surprised when they see this for the first time, because our intuition says that a much larger group is needed to have a probability of more than 50%. The values pn approach 1 pretty fast. For example, p100 = 0.9999997.
When we derived the expression for pn, we did not use the fact that the value of d is equal to 365. In other words, the expression is valid for any value of d. For general values of d, we can interpret the birthday problem in the following way: Consider d boxes B1,B2, . . . ,Bd, where d is a large integer.
114 Chapter 5. Discrete Probability
Assume that we throw n balls into these boxes so that each ball lands in a random box. Then pn is the probability that there is at least one box that contains more than one ball. Since it is not easy to see how the expression
pn = 1 − d!
(d−n)!dn
depends on n, we will approximate it by a simpler expression. For this, we use the inequality
1 −x ≤ e−x, (5.2) which is valid for any real number x. If x is close to zero, then the inequality is very accurate. The easiest way to prove this inequality is by determining the minimum of the function f(x) = x + e−x using techniques from calculus.
If we define qn = 1 −pn, then we have
qn = d!
(d−n)!dn .
Using (5.2), we get
qn = d
d · d− 1
d · d− 2
d · d− 3
d · · · d− (n− 1)
d
= d− 1 d · d− 2
d · d− 3
d · · · d− (n− 1)
d = (1 − 1/d) · (1 − 2/d) · (1 − 3/d) · · ·(1 − (n− 1)/d) ≤ e−1/d ·e−2/d ·e−3/d · · ·e−(n−1)/d = e−(1+2+3+···+(n−1))/d
= e−n(n−1)/(2d),
and therefore, pn = 1 − qn ≥ 1 −e−n(n−1)/(2d).
If both d and n are large, then n(n − 1)/(2d) is very close to n2/(2d) and, thus,
pn & 1 −e−n 2/(2d).
If we take n = √
2d, then we get
pn & 1 −e−1 ≈ 0.632. Thus, if we throw
√ 2d balls into d boxes, then with probability (approxi-
mately) at least 1 − 1/e, there is a box that contains more than one ball.
5.6. The Big Box Problem 115
5.6 The Big Box Problem
Keith chooses two distinct elements x and y, with x < y, from the set A = {0, 1, 2, . . . , 100}. He takes two identical boxes, and puts x dollars in one box and y dollars in the other box. Then Keith closes the two boxes, shuffles them, and puts them on a table. At this moment, we neither know x nor y; the only information we have is that these are distinct elements from the set A.
or$x $y $x$y
We will refer to the box containing x dollars as the small box and to the box containing y dollars as the big box. Our goal is to find the big box. We are allowed to do the following:
1. We can choose one of the two boxes, open it, and count how much money is inside it.
2. Now we have to make our final decision: Either we keep the box we just opened or we take the other box.
For example, assume that the box we pick in the first step contains $33. Then we know that the other box contains either less than $33 or more than $33. It seems that the only reasonable thing to do is to flip a fair coin when making our final decision. If we do that, then we find the big box with probability 0.5.
In the rest of this section, we will show the surprising result that we can find the big box with probability at least 0.505.
The idea is as follows. Assume that we know a number z such that x < z < y. (Keep in mind that we do not know x and we do not know y.)
• If the box we choose in the first step contains more than z dollars, then we know that this is the big box and, therefore, we keep it.
• If the box we choose in the first step contains less than z dollars, then we know that this is the small box and, therefore, we take the other box.
116 Chapter 5. Discrete Probability
Thus, if we know this number z with x < z < y, then we are guaranteed to find the big box.
Of course, it is not realistic to assume that we know this magic number z. The trick is to choose a random z. If it is between x and y, then we find the big box with probability 1; otherwise, we find the big box with probability 1/2. As we will see later, the overall probability of finding the big box will be 0.505.
In order to avoid the case when z = x or z = y, we will choose z from the set
B = {1/2, 3/2, 5/2, . . . , 100 − 1/2}. Our algorithm that attempts to find the big box does the following:
Algorithm FindBigBox:
Step 1: Choose one of the two boxes uniformly at random, open it, and count the amount of money inside it; let this amount be a.
Step 2: Choose z uniformly at random from the set B.
Step 3: Do the following:
• If a > z, then keep the box chosen in Step 1.
• Otherwise (i.e., if a < z), take the other box.
We are now going to determine the probability that this algorithm finds the big box. First, we have to ask ourselves what the sample space S is. There are two places in the algorithm where a random element is obtained:
• In Step 1, we obtain the element a, which is a random element from the set {x,y}. We know that this value a is equal to one of x and y. However, at this moment, we do not know whether a = x or a = y.
• In Step 2, we obtain a random element from the set B.
Based on this, the sample space is the Cartesian product
S = {x,y}×B = {(a,z) : a ∈{x,y},z ∈ B}
and Steps 1 and 2 can be replaced by
• choose a uniformly random element (a,z) in S.
5.6. The Big Box Problem 117
We say that algorithm FindBigBox is successful if it finds the big box. The next step is to write the event
W = “algorithm FindBigBox is successful”
as a subset of the sample space S. For this, we have to determine all elements (a,z) in S for which algorithm FindBigBox is successful.
First consider the case when a = x. In this case, the box we choose in Step 1 is the small box. There are two possibilities for z:
• If a > z, then the algorithm keeps the small box and, thus, it is not successful.
• If a < z, then the algorithm takes the other box (which is the big box) and, thus, it is successful.
Thus, the event W contains the set
Wx = {(x,z) : z ∈{x + 1/2,x + 3/2, . . . , 100 − 1/2}. The second case to consider is when a = y. In this case, the box we choose in Step 2 is the big box. Again, there are two possibilities for z:
• If a > z, then the algorithm keeps the big box and, thus, it is successful.
• If a < z, then the algorithm takes the other box (which is the small box) and, thus, it is not successful.
Thus, the event W contains the set
Wy = {(y,z) : z ∈{1/2, 3/2, . . . ,y − 1/2}. We conclude that
W = Wx ∪Wy. Since the events Wx and Wy are disjoint, the probability that algorithm FindBigBox is successful is equal to
Pr(W) = Pr(Wx) + Pr(Wy)
= |Wx| |S| +
|Wy| |S|
= 100 −x
200 +
y
200
= 1
2 + y −x 200
.
118 Chapter 5. Discrete Probability
Since x and y are distinct integers and x < y, we have y−x ≥ 1, and we get
Pr(W) ≥ 1 2
+ 1
200 = 0.505.
5.7 The Monty Hall Problem
The Monty Hall Problem is a well-known puzzle in probability theory. It is named after the host, Monty Hall, of the American television game show Let’s Make a Deal. The problem became famous in 1990, when (part of) a reader’s letter was published in Marilyn vos Savant’s column Ask Marilyn in the magazine Parade:
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
Note that the host can always open a door that has a goat behind it. After the host has opened No. 3, we know that the car is either behind No. 1 or No. 2, and it seems that both these doors have the same probability (i.e., 50%) of having the car behind them. We will prove below, however, that this is not true: It is indeed to our advantage to switch our choice.
We assume that the car is equally likely to be behind any of the three doors. Moreover, the host knows what is behind each door.
• We initially choose one of the three doors uniformly at random; this door remains closed.
• The host opens one of the other two doors that has a goat behind it.
• Our final choice is to switch to the other door that is still closed.
Let A be the event that we win the car and let B be the event that the initial door has a goat behind it. Then it is not difficult to see that event A occurs if and only if event B occurs. Therefore, the probability that we win the car is equal to
Pr(A) = Pr(B) = 2/3.
5.8. Conditional Probability and Independence 119
5.8 Conditional Probability and Independence
Anil Maheshwari has two children. We are told that one of them is a boy. What is the probability that the other child is also a boy? Most people will say that this probability is 1/2. We will show below that this is not the correct answer.
Since Anil has two children, the sample space is
S = {(b,b), (b,g), (g,b), (g,g)},
where, for example, (b,g) indicates the the youngest child is a boy and the oldest child is a girl. We assume a uniform probability function, so that each outcome has a probability of 1/4.
We are given the additional information that one of the two children is a boy, or, to be more precise, at least one of the two children is a boy. This means that the actual sample space is not S, but
{(b,b), (b,g), (g,b)}.
When asking for the probability that the other child is also a boy, we are really asking for the probability that both children are boys. Since there is only one possibility (out of three) for both children to be boys, it follows that this probability is equal to 1/3.
This is an example of a conditional probability : We are asking for the probability of an event (both children are boys) given that another event (at least one of the two children is a boy) occurs.
Definition 5.8.1 Let (S, Pr) be a probability space and let A and B be two events with Pr(B) > 0. The conditional probability Pr(A | B), pronounced as “the probability of A given B”, is defined to be
Pr(A | B) = Pr(A∩B) Pr(B)
.
Let us explain where this definition comes from. Initially, the sample space is equal to S. When we are given the additional information that event B occurs, the sample space “shrinks” to B, and event A occurs if and only if event A∩B occurs.
120 Chapter 5. Discrete Probability
A
B S
We may think that Pr(A | B) should therefore be defined to be Pr(A∩B). However, since the sum of all probabilities must be equal to 1, we have to normalize, i.e., divide by Pr(B). Equivalently, if A = B, we get Pr(A | A), which is the probability that event A occurs, given that event A occurs. This probability should be equal to 1. Indeed, using the definition, we do get
Pr(A | A) = Pr(A∩A) Pr(A)
= Pr(A)
Pr(A) = 1.
Returning to Anil’s two children, we saw that the sample space is
S = {(b,b), (b,g), (g,b), (g,g)}
and we assumed a uniform probability function. The events we considered are
A = “both children are boys”
and B = “at least one of the two children is a boy”,
and we wanted to know Pr(A | B). Writing A and B as subsets of the sample space S, we get
A = {(b,b)} B = {(b,b), (b,g), (g,b)}.
It follows that
Pr(A | B) = Pr(A∩B) Pr(B)
= Pr(A)
Pr(B) =
1/4
3/4 = 1/3,
which is the same answer as we got before.
Assume we roll a fair die and consider the events
A = “the result is 3”
5.8. Conditional Probability and Independence 121
and B = “the result is odd”.
Then
Pr(A | B) = Pr(A∩B) Pr(B)
= Pr(A)
Pr(B) =
1/6
3/6 = 1/3
and
Pr(B | A) = Pr(B ∩A) Pr(A)
= Pr(A)
Pr(A) = 1.
This shows that, in general, Pr(A | B) is not equal to Pr(B | A). Consider the event
C = “the result is a prime number”,
i.e., the result is 2, 3, or 5. We have
Pr(C | B) = Pr(C ∩B) Pr(B)
= 2/6
3/6 = 2/3
and
Pr(C | A) = Pr(C ∩A) Pr(A)
= Pr(A)
Pr(A) = 1.
Recall that A denotes the complement of A. Thus, this is the event
A = “the result is not 3”.
We have
Pr(C | A) = Pr(C ∩A) Pr(A)
= 2/6
5/6 = 2/5.
This shows that, in general, Pr(C | A) + Pr(C | A) is not equal to 1. It should be an easy exercise to verify that, for the above events A and C, Pr(A | C) + Pr(A | C) is equal to 1. Intuitively, this should be true for any two events A and C with Pr(C) > 0: When we are given that event C occurs, then either A occurs or A does not occur (in which case A occurs). The following lemma states that this intuition is indeed correct.
Lemma 5.8.2 Let A and B be events with Pr(B) > 0. Then
Pr(A | B) + Pr(A | B) = 1.
122 Chapter 5. Discrete Probability
Proof. By definition, we have
Pr(A | B) + Pr(A | B) = Pr(A∩B) Pr(B)
+ Pr(A∩B)
Pr(B)
= Pr(A∩B) + Pr(A∩B)
Pr(B) .
Since the events A∩B and A∩B are disjoint, we have, by Lemma 5.3.2,
Pr(A∩B) + Pr(A∩B) = Pr((A∩B) ∪ (A∩B)).
By drawing a Venn diagram, you will see that
(A∩B) ∪ (A∩B) = B,
implying that Pr(A∩B) + Pr(A∩B) = Pr(B).
We conclude that
Pr(A | B) + Pr(A | B) = Pr(B) Pr(B)
= 1.
Consider two events A and B with Pr(A) > 0 and Pr(B) > 0. We would like to define the notion of these two events being “independent”. Intuitively, this should express that (i) whether or not event A occurs does not depend on whether or not event B occurs and (ii) whether or not event B occurs does not depend on whether or not event A occurs. In other words, (i) Pr(A) should be equal to Pr(A | B) and (i) Pr(B) should be equal to Pr(B | A). As we will show below, the following definition exactly captures this.
Definition 5.8.3 Let (S, Pr) be a probability space and let A and B be two events. We say that A and B are independent if
Pr(A∩B) = Pr(A) · Pr(B).
Note that in this definition, it is not assumed that Pr(A) > 0 and Pr(B) > 0. If Pr(B) > 0, then
Pr(A | B) = Pr(A∩B) Pr(B)
,
5.8. Conditional Probability and Independence 123
and A and B are independent if and only if
Pr(A | B) = Pr(A).
Similarly, if Pr(A) > 0, then A and B are independent if and only if
Pr(B | A) = Pr(B).
To give an example, assume we roll two dice; thus, the sample space is
S = {(i,j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},
where i is the result of the first die and j is the result of the second die. We assume a uniform probability function. Thus, each outcome has a probability of 1/36.
Let D1 denote the result of the first die and let D2 denote the result of the second die. Consider the events
A = “D1 + D2 = 7”
and B = “D1 = 4”.
Are these events independent?
• Since A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},
we have Pr(A) = 6/36 = 1/6.
• Since B = {(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)},
we have Pr(B) = 6/36 = 1/6.
• Since A∩B = {(4, 3)},
we have Pr(A∩B) = 1/36.
• It follows that Pr(A∩B) = Pr(A) ·Pr(B) and we conclude that A and B are independent.
124 Chapter 5. Discrete Probability
As an exercise, you should verify that the events
A′ = “D1 + D2 = 11”
and B′ = “D1 = 5”
are not independent.
Consider two events A and B. If these events are independent, then whether or not A occurs does not depend on whether or not B occurs. Since whether or not B occurs is the same as whether the complement B does not or does occur, it should not be a surprise that the events A and B are independent as well. The following lemma states that this is indeed the case.
Lemma 5.8.4 If A and B are independent events, then A and B are also independent.
Proof. To prove that A and B are independent, we have to show that
Pr(A∩B) = Pr(A) · Pr(B).
Using Lemma 5.3.3, this is equivalent to showing that
Pr(A∩B) = Pr(A) (1 − Pr(B)) . (5.3)
Since the events A∩B and A∩B are disjoint and
A = ( A∩B
) ∪ (A∩B) ,
it follows from Lemma 5.3.2 that
Pr(A) = Pr(A∩B) + Pr(A∩B).
Since A and B are independent, we have
Pr(A∩B) = Pr(A) · Pr(B).
It follows that Pr(A) = Pr(A∩B) + Pr(A) · Pr(B),
which is equivalent to (5.3).
5.8. Conditional Probability and Independence 125
We have defined the notion of two events being independent. The follow- ing definition generalizes this in two ways to sequences of events:
Definition 5.8.5 Let (S, Pr) be a probability space, let n ≥ 2, and let A1,A2, . . . ,An be a sequence of events.
1. We say that this sequence is pairwise independent if for any two distinct indices i and j, the events Ai and Aj are independent, i.e.,
Pr(Ai ∩Aj) = Pr(Ai) · Pr(Aj).
2. We say that this sequence is mutually independent if for all k with 2 ≤ k ≤ n and all i1 < i2 < .. . < ik,
Pr(Ai1 ∩Ai2 ∩·· ·∩Aik ) = Pr(Ai1 ) · Pr(Ai2 ) · · ·Pr(Aik ).
Thus, in order to show that the sequence A1,A2, . . . ,An is pairwise inde- pendent, we have to verify
( n 2
) equalities. On the other hand, to show that
this sequence is mutually independent, we have to verify ∑n
k=2
( n k
) = 2n−1−n
equalities. For example, if we want to prove that the sequence A,B,C of events is
mutually independent, we have to show that
Pr(A∩B) = Pr(A) · Pr(B),
Pr(A∩C) = Pr(A) · Pr(C), Pr(B ∩C) = Pr(B) · Pr(C),
and Pr(A∩B ∩C) = Pr(A) · Pr(B) · Pr(C).
For example, consider flipping three fair coins and assume that the result is a uniformly random element from the sample space
S = {HHH,HHT,HTH,THH,HTT,THT,TTH,TTT},
where, e.g., HHT indicates that the first two coins come up heads and the third coin comes up tails. For i = 1, 2, 3, let ci denote the result of the i-th flip, and define the events
A = “c1 = c2”,
126 Chapter 5. Discrete Probability
B = “c2 = c3”,
and C = “c3 = c1”.
If we write these events as subsets of the sample space, we get
A = {HHH,HHT,TTH,TTT},
B = {HHH,THH,HTT,TTT}, and
C = {HHH,HTH,THT,TTT}. It follows that
Pr(A) = |A|/|S| = 4/8 = 1/2, Pr(B) = |B|/|S| = 4/8 = 1/2, Pr(C) = |C|/|S| = 4/8 = 1/2,
Pr(A∩B) = |A∩B|/|S| = 2/8 = 1/4, Pr(A∩C) = |A∩C|/|S| = 2/8 = 1/4,
and Pr(B ∩C) = |B ∩C|/|S| = 2/8 = 1/4.
Thus, the sequence A,B,C is pairwise independent. Since
Pr(A∩B ∩C) = |A∩B ∩C|/|S| = 2/8 = 1/4,
we have Pr(A∩B ∩C) 6= Pr(A) · Pr(B) · Pr(C)
and, therefore, the sequence A,B,C is not mutually independent. Of course, this is not surprising: If both events A and B occur, then event C also occurs.
We have defined an event to be a subset of a sample space. In several examples, however, we have described events in plain English as proposi- tions. Since the intersection (∩) of sets corresponds to the conjunction (∧) of propositions, we often write A∧B for the event “both A and B occur”.
For example, if we flip a coin and roll a die, the sample space is
S = {H1,H2,H3,H4,H5,H6,T1,T2,T3,T4,T5,T6}.
5.8. Conditional Probability and Independence 127
The events A = “the coin comes up heads”
and B = “the result of the die is even”
correspond to the subsets
A = {H1,H2,H3,H4,H5,H6}
and B = {H2,H4,H6,T2,T4,T6}
of the sample space S, respectively. The event that both A and B occur is written as A∧B and corresponds to the subset
A∩B = {H2,H4,H6}
of S. Assume that both the coin and the die are fair and the result of rolling
the die is independent of the result of flipping the coin. The probability that both A and B occur, i.e., Pr(A∧B), is equal to |A∩B|/|S| = 3/12 = 1/4. We can also use independence to determine this probability:
Pr(A∧B) = Pr(A) · Pr(B) = (1/2)(3/6) = 1/4.
Observe that when we determine Pr(A), we do not consider the entire sample space S. Instead, we consider the coin’s sample space, which is {H,T}. Similarly, when we determine Pr(B), we consider the die’s sample space, which is {1, 2, 3, 4, 5, 6}.
To give another example, let n ≥ 1 be an integer and assume we flip n fair coins. For each i with 1 ≤ i ≤ n, define the event
Ai = “the i-th coin comes up heads”.
We assume that the coin flips are independent of each other, by which we mean that the sequence A1,A2, . . . ,An of events is mutually independent. Define the event
A = A1 ∧A2 ∧·· ·∧An. What is Pr(A), i.e., the probability that all n coins come up heads? Since there are 2n many possible outcomes for n coin flips and only one of them
128 Chapter 5. Discrete Probability
satisfies event A, this probability is equal to 1/2n. Alternatively, we can use independence to determine Pr(A):
Pr(A) = Pr(A1 ∧A2 ∧·· ·∧An) = Pr(A1) · Pr(A2) · · ·Pr(An).
Since each coin is fair, we have Pr(Ai) = 1/2 and, thus, we get
Pr(A) = (1/2)(1/2) · · ·(1/2)︸ ︷︷ ︸ n times
= 1/2n.
Consider a circuit C that consists of n components C1,C2, . . . ,Cn. Let p be a real number with 0 < p < 1 and assume that any component fails with probability p, independently of the other components. For each i with 1 ≤ i ≤ n, define the event
Ai = “component Ci does not fail”.
Let A be the event
A = “the entire circuit does not fail”.
• Assume that the entire circuit fails when at least one component fails. What is Pr(A), i.e., the probability that the circuit does not fail? Using independence and Lemma 5.8.4, we get
Pr(A) = Pr(A1 ∧A2 ∧·· ·∧An) = Pr(A1) · Pr(A2) · · ·Pr(An) = (1 −p)(1 −p) · · ·(1 −p)︸ ︷︷ ︸
n times = (1 −p)n.
Since 0 < p < 1, we have limn→∞ Pr(A) = 0. Thus, for large values of n, it is very likely that the circuit fails.
• Now assume that the entire circuit fails when all components fail. Again, we want to know the probability Pr(A) that the circuit does
5.9. The Law of Total Probability 129
not fail. In this case, we get
Pr(A) = 1 − Pr(A) = 1 − Pr(A1 ∧A2 ∧·· ·∧An) = 1 − Pr(A1) · Pr(A2) · · ·Pr(An) = 1 −p ·p · · ·p︸ ︷︷ ︸
n times = 1 −pn.
Since 0 < p < 1, we have limn→∞ Pr(A) = 1. Thus, for large values of n, it is very likely that the circuit does not fail.
5.9 The Law of Total Probability
Both Mick and Keith have a random birthday, independent of the other person’s birthday. What is the probability that Mick and Keith have the same birthday? We have seen in Section 5.5 that this probability is equal to 1/365. A common way to determine this probability is as follows: Consider Mick’s birthday, which can be any of the 365 days of the year. By symmetry, it does not really matter what Mick’s birthday is, so we just assume that it is July 26. Then Mick and Keith have the same birthday if and only Keith’s birthday is also on July 26. Therefore, since Keith has a random birthday, the probability that Mick and Keith have the same birthday is equal to 1/365. The following theorem explains why this reasoning is correct.
Theorem 5.9.1 (Law of Total Probability) Let (S, Pr) be a probability space and let A be an event. Assume that B1,B2, . . . ,Bn is a sequence of events such that
1. Pr(Bi) > 0 for all i with 1 ≤ i ≤ n,
2. the events B1,B2, . . . ,Bn are pairwise disjoint, and
3. ⋃n i=1 Bi = S.
Then
Pr(A) = n∑ i=1
Pr(A | Bi) · Pr(Bi).
130 Chapter 5. Discrete Probability
Proof. The assumptions imply that
A = A∩S
= A∩ (
n⋃ i=1
Bi
)
= n⋃ i=1
(A∩Bi) .
Since the events A∩B1,A∩B2, . . . ,A∩Bn are pairwise disjoint, it follows from Lemma 5.3.2 that
Pr(A) = Pr
( n⋃ i=1
(A∩Bi) )
= n∑ i=1
Pr (A∩Bi) .
The theorem follows by observing that, from Definition 5.8.1,
Pr(A∩Bi) = Pr(A | Bi) · Pr(Bi).
Let us consider the three conditions in this theorem. The first condition is that Pr(Bi) > 0, which means that there is a positive chance that event Bi occurs. The second and third conditions, i.e.,
• the events B1,B2, . . . ,Bn are pairwise disjoint, and
• ⋃n i=1 Bi = S,
are equivalent to
• exactly one of the events B1,B2, . . . ,Bn is guaranteed to occur. In the example in the beginning of this section, we wanted to know Pr(A),
where A is the event
A = “Mick and Keith have the same birthday”.
In order to apply Theorem 5.9.1, we define a sequence B1,B2, . . . of events that satisfy the conditions in this theorem and for which Pr(A | Bi) is easy
5.9. The Law of Total Probability 131
to determine. For this example, we define the event Bi, for each i with 1 ≤ i ≤ 365, to be
Bi = “Mick’s birthday is on the i-th day of the year”.
It is clear that Pr(Bi) > 0 and exactly one of the events B1,B2, . . . ,B365 is guaranteed to occur. It follows that
Pr(A) = 365∑ i=1
Pr(A | Bi) · Pr(Bi).
In Pr(A | Bi), we fix Mick’s birthday to be the i-th day of the year. Given this event Bi, event A occurs if and only if Keith’s birthday is also on the i-th day. Thus, we have Pr(A | Bi) = 1/365 and it follows that
Pr(A) = 365∑ i=1
(1/365) · Pr(Bi)
= (1/365) 365∑ i=1
Pr(Bi)
= (1/365) · 1 = 1/365,
which is the same answer as we got in the beginning of this section.
We give one more example. Consider the following experiment:
• We flip a fair coin.
– If the coin comes up heads, then we roll a fair die. Let R denote the result of this die.
– If the coin comes up tails, then we roll two fair dice. Let R denote the sum of the results of these dice.
We assume that the coin and dice are independent of each other. What is the probability that the value of R is equal to 2? That is, if we define the event A to be
A = “R = 2”,
132 Chapter 5. Discrete Probability
then we want to know Pr(A). Since the value of R depends on whether the coin comes up heads or tails, we define the event
B = “the coin comes up heads”.
Since (i) both B and its complement B occur with a positive probability and (ii) exactly one of B and B is guaranteed to occur, we can apply Theo- rem 5.9.1 and get
Pr(A) = Pr(A | B) · Pr(B) + Pr(A | B) · Pr(B) = (1/6) · (1/2) + (1/36) · (1/2) = 7/72.
5.10. Choosing a Random Element in a Linked List 133
5.10 Choosing a Random Element in a Linked
List
Consider a linked list L. Each node u in L stores a pointer to its successor node succ(u). If u is the last node in L, then u does not have a successor and succ(u) = nil . We are also given a pointer to the first node head (L) of L.
u succ(u) nilhead(L)
We would like to choose a node in L, uniformly at random. Thus, if this list has n nodes, then each node must have a probability of 1/n of being chosen.
We assume that we are given a function Random: For any integer i ≥ 1, a call to Random(i) returns a uniformly random element from the set {1, 2, . . . , i}; the value returned is independent of previous calls to this function.
To make the problem interesting, we assume that we do not know the value of n, i.e., at the start, we do not know the number of nodes in the list L. Also, we are allowed to only make one pass over this list. We will prove below that the following algorithm solves the problem:
Algorithm ChooseRandomNode(L):
u = head (L); i = 1; while u 6= nil do r = Random(i);
if r = 1 then x = u endif; u = succ(u); i = i + 1
endwhile; return x
In one iteration of the while-loop, the call to Random(i) returns a uni- formly random element r from the set {1, 2, . . . , i}. If r = 1, which happens with probability 1/i, the value of x is set to the currently visited node. Thus,
134 Chapter 5. Discrete Probability
• in the first iteration, x is set to the first node of L, • in the second iteration, x is set to the second node of L with probability
1/2,
• in the third iteration, x is set to the third node of L with probability 1/3,
• in the last iteration, x is set to the last node of L with probability 1/|L|.
We now prove that the output x of algorithm ChooseRandomNode(L) is a uniformly random node of the list L. Let n denote the number of nodes in L and let v be an arbitrary node in L. Thus, we have to prove that, after the algorithm has terminated, x = v with probability 1/n.
Let k be the integer such that v is the k-th node in L; thus, 1 ≤ k ≤ n. We observe that, after the algorithm has terminated, x = v if and only if
• during the k-th iteration, the value of x is set to v, and • for all i = k + 1,k + 2, . . . ,n, during the i-th iteration, the value of x
does not change.
Define the event
A = “after the algorithm has terminated, x = v”.
For each i with 1 ≤ i ≤ n, define the event Ai = “the value of x changes during the i-th iteration”.
Then A if and only if Ak ∧Ak+1 ∧Ak+2 ∧·· ·∧An.
Since the events A1,A2, . . . ,An are mutually independent, it follows that
Pr(A) = Pr ( Ak ∧Ak+1 ∧Ak+2 ∧·· ·∧An
) = Pr (Ak) · Pr
( Ak+1
) · Pr
( Ak+2
) · · ·Pr
( An )
= 1
k
( 1 − 1
k + 1
)( 1 − 1
k + 2
) · · · (
1 − 1 n
) =
1
k · k k + 1
· k + 1 k + 2
· · · n− 1 n
= 1
n .
5.11. Long Runs in Random Bitstrings 135
5.11 Long Runs in Random Bitstrings
Let n be a large integer and assume we flip a fair coin n times, where all flips are independent. If we write 0 for heads and 1 for tails, then this gives a random bitstring
S = s1s2 . . .sn.
A run of length k is a consecutive subsequence of S, all of whose bits are the same. For example, the bitstring
00111100101000011000
contains the following consecutive subsequences in bold,
00111100101000011000,
which are runs of lengths 4, 2, and 1, respectively. Would you be surprised to see a “long” run in the random bitstring S,
say a run of length about log n? Most people will answer this question with “yes”. We will prove below, however, that the correct answer is “no”: The probability that this happens is (about) 1 − 1/n; thus, it converges to 1 when n goes to infinity. In other words, you should be surprised if a random bitstring does not contain a run of length about log n.
We choose a positive integer k and define the event
A = “S contains a run of length at least k”.
We are going to prove a lower bound on Pr(A) in terms of n and k. At the end, we show that by taking k to be slightly less than log n, we have Pr(A) ≥ 1 − 1/n.
As is often the case, it turns out to be easier to consider the complement of the event A, i.e., the event
A = “each run in S has length less than k”.
For each i with 1 ≤ i ≤ n−k + 1, we define the event
Ai = “the subsequence of length k starting at position i is a run”.
We observe that Ai occurs if and only if
si = si+1 = . . . = si+k−1 = 0
136 Chapter 5. Discrete Probability
or
si = si+1 = . . . = si+k−1 = 1.
Since the coin flips are independent, it follows that
Pr(Ai) = 1/2 k + 1/2k = 1/2k−1.
Note that the complement of Ai is the event
Ai = “the subsequence of length k starting at position i is not a run”
and
Pr ( Ai )
= 1 − 1/2k−1. Since
A if and only if A1 ∧A2 ∧·· ·∧An−k+1, it follows that
Pr ( A )
= Pr ( A1 ∧A2 ∧·· ·∧An−k+1
) .
Is the probability on the right-hand side equal to the product of the individual probabilities? If the events A1,A2, . . . ,An−k+1, are mutually independent, then the answer is “yes”. However, it should be clear that, for example, the events A1 and A2 are not independent: If we are told that event A1 occurs, then the first k bits in the sequence S are all equal; let us say they are all equal to 0. In this case, the probability that event A2 occurs is equal to the probability that the (k + 1)-st bit in S is 0, which is equal to 1/2 and not 1/2k−1. So it seems that we are stuck. However, there is a way out:
Let us assume that the integer k is chosen such that n/k is an integer. We divide the sequence S = s1s2 . . .sn into n/k blocks, each having length k. Thus,
• the first block is the subsequence s1s2 . . .sk,
• the second block is the subsequence sk+1sk+2 . . .s2k,
• the (n/k)-th block is the subsequence sn−k+1sn−k+2 . . .sn.
For each i with 1 ≤ i ≤ n/k, we define the event
Bi = “the i-th block is a run”.
5.11. Long Runs in Random Bitstrings 137
Thus, the complement of Bi is the event
Bi = “the i-th block is not a run”.
Since Bi = A(i−1)k+1 and Bi = A(i−1)k+1, we have
Pr ( Bi )
= 1 − 1/2k−1.
Observe that
• since the blocks do not overlap, the events B1,B2, . . . ,Bn/k are mutu- ally independent and
• if the event A occurs, then the event B1 ∧B2 ∧·· ·∧Bn/k also occurs.
It follows that
Pr ( A ) ≤ Pr
( B1 ∧B2 ∧·· ·∧Bn/k
) = Pr
( B1 ) · Pr
( B2 ) · · ·Pr
( Bn/k
) =
( 1 − 1/2k−1
)( 1 − 1/2k−1
) · · · ( 1 − 1/2k−1
) =
( 1 − 1/2k−1
)n/k .
Using the inequality 1 −x ≤ e−x, see (5.2), we get
1 − 1/2k−1 ≤ e−1/2k−1 = e−2/2k
and, thus,
Pr ( A ) ≤ ( e−2/2
k )n/k
= e−2n/(k2 k). (5.4)
Note that until now, k was arbitrary. We choose k to be
k = log n− 2 log log n.
Using basic properties of logarithms, see Section 2.4, we will show below that, for this choice of k, the right-hand side in (5.4) is a “nice” function of n.
In Section 2.4, we have seen that
2log n = n
and 22 log log n = log2 n.
138 Chapter 5. Discrete Probability
It follows that
2k = 2log n−2 log log n = 2log n
22 log log n =
n
log2 n .
Thus,
2n
k2k =
2 log2 n
k
= 2 log2 n
log n− 2 log log n
≥ 2 log 2 n
log n = 2 log n
= 2 ln n
ln 2 ≥ ln n,
implying that
Pr ( A ) ≤ e−2n/(k2k)
≤ e− lnn = 1/n.
We conclude that, for the value of k chosen above,
Pr(A) = 1 − Pr ( A ) ≥ 1 − 1/n.
Thus, with probability at least 1 − 1/n, a random bitstring of length n contains a run of length at least log n− 2 log log n.
We remark that we cheated a bit here, because we assumed that both k and n/k are integers. Assume that n is of the form 22
m , for some positive
integer m. Then both log n and log log n are integers and, thus, k is an integer as well. In a correct derivation, we divide the sequence S into bn/kc blocks of size k and, if n/k is not an integer, one block of length less than k. We then get
Pr ( A ) ≤
( 1 − 1/2k−1
)bn/kc ≤
( e−2/2
k )bn/kc
= e−2bn/kc/2 k
.
5.11. Long Runs in Random Bitstrings 139
For k = log n− 2 log log n, we have 2k = n/ log2 n. Since
bn/kc > n/k − 1,
we get
2bn/kc 2k
> 2(n/k − 1)
2k
= (2 log2 n)(n/k − 1)
n
= 2 log2 n
k − 2 log
2 n
n
≥ ln n− 2 log 2 n
n
and, thus,
Pr ( A ) ≤ e−2bn/kc/2k
≤ e− lnn+(2 log2 n)/n = e− lnn ·e(2 log2 n)/n = (1/n) ·
( 1 + O((log2 n)/n)
) = 1/n + O
( (log2 n)/n2
) .
This upper bound is larger than the upper bound we had before by only a small additive factor of O((log2 n)/n2).
140 Chapter 5. Discrete Probability
5.12 Infinite Probability Spaces
In Section 5.2, we defined a sample space to be any non-empty countable set. All sample spaces that we have seen so far are finite. In some cases, infinite (but countable) sample spaces arise in a natural way. To give an example, assume we flip a fair coin repeatedly and independently until it comes up heads for the first time. The sample space S is the set of sequences of all coin flips that can occur. Thus, if TnH denotes the sequence consisting of n tails followed by one heads, then
S = {TnH : n ≥ 0}, which is indeed an infinite space.
Since the coin flips are independent, the outcome TnH has a probability of (1/2)n+1, i.e.,
Pr (TnH) = (1/2)n+1.
Recall that according to Definition 5.2.2, in order for this to be a valid probability function, the sum of all probabilities must be equal to 1, i.e., the infinite series
∞∑ n=0
(1/2)n+1
must be equal to 1. Since you may have forgotten about infinite series, we recall the definition:
Definition 5.12.1 Let a0,a1,a2, . . . be an infinite sequence of real numbers. If
lim N→∞
N∑ n=0
an
exists, then we say that the infinite series ∑∞
n=0 an converges. In this case, the value of this infinite series is equal to
∞∑ n=0
an = lim N→∞
N∑ n=0
an.
For example, let x be a real number with x 6= 1, and define an = xn for n ≥ 0. We claim that
N∑ n=0
an = N∑ n=0
xn = 1 + x + x2 + · · · + xN = 1 −x N+1
1 −x ,
5.12. Infinite Probability Spaces 141
which can be proved either by induction on N or by verifying that
(1 −x) ( 1 + x + x2 + · · · + xN
) = 1 −xN+1.
If −1 < x < 1, then limN→∞ xN+1 = 0 and it follows that ∞∑ n=0
xn = lim N→∞
N∑ n=0
xn
= lim N→∞
1 −xN+1 1 −x
= 1
1 −x.
We have proved the following result:
Lemma 5.12.2 If x is a real number with −1 < x < 1, then ∞∑ n=0
xn = 1
1 −x.
Now we can return to the coin flipping example that we saw in the be- ginning of this section. If we take x = 1/2, we get
∞∑ n=0
Pr (TnH) = ∞∑ n=0
(1/2)n+1
= (1/2) ∞∑ n=0
(1/2)n
= (1/2) · 1 1 − 1/2
= 1.
Thus, Pr is indeed a valid probability function.
Consider a game in which two players P1 and P2 take turns flipping a fair coin repeatedly and independently. Thus, first P1 flips the coin, then P2 flips the coin, then P1 flips the coin, then P2 flips the coin, etc. The player who flips heads first is the winner of the game.
Who is more likely to win this game? Our intuition says that P1 has an advantage, because he is the player who starts: If the first flip is heads,
142 Chapter 5. Discrete Probability
then the game is over and P1 wins. We will prove below that this intuition is correct: P1 has a probability of 2/3 of winning the game and, thus, the winning probability of P2 is only 1/3.
The sample space S is the same as before, i.e.,
S = {TnH : n ≥ 0}.
The event A = “P1 wins the game”
corresponds to the subset
A = {TnH : n ≥ 0 and n is even},
which can be rewritten as
A = {T2mH : m ≥ 0}.
The probability that P1 wins the game is equal to
Pr(A) = ∞∑ m=0
Pr ( T2mH
) =
∞∑ m=0
(1/2)2m+1
= (1/2) ∞∑ m=0
(1/2)2m
= (1/2) ∞∑ m=0
(1/4)m.
By taking x = 1/4 in Lemma 5.12.2, we get
Pr(A) = (1/2) · 1 1 − 1/4 = 2/3.
Let B be the event
B = “P2 wins the game”.
Since either P1 or P2 wins the game, we have
Pr(B) = 1 − Pr(A) = 1 − 2/3 = 1/3.
5.12. Infinite Probability Spaces 143
Let us verify, using an infinite series, that Pr(B) is indeed equal to 1/3. The event B corresponds to the subset
B = {TnH : n ≥ 0 and n is odd },
which can be rewritten as
B = {T2m+1H : m ≥ 0}.
The probability that P2 wins the game is thus equal to
Pr(B) = ∞∑ m=0
Pr ( T2m+1H
) =
∞∑ m=0
(1/2)2m+2
= (1/4) ∞∑ m=0
(1/2)2m
= (1/4) ∞∑ m=0
(1/4)m
= (1/4) · 1 1 − 1/4 = 1/3.
We have seen in Lemma 5.12.2 that the infinite series ∑∞
n=0 x n converges
if −1 < x < 1. It is not difficult to see that for all other values of x, the limit
lim N→∞
N∑ n=0
xn
does not exist. As a result, if x ≥ 1 or x ≤ −1, the infinite series ∑∞
n=0 x n
does not converge. Another example of an infinite series that does not con- verge is
∞∑ n=1
1/n = 1 + 1/2 + 1/3 + 1/4 + · · ·
In Section 6.4, we will prove that
N∑ n=1
1/n = 1 + 1/2 + 1/3 + 1/4 + · · · + 1/N
144 Chapter 5. Discrete Probability
is about ln N. It follows that
lim N→∞
N∑ n=0
1/n
is about lim N→∞
ln N,
which clearly does not exist.
5.13 Exercises
5.1 Consider a coin that has 0 on one side and 1 on the other side. We flip this coin once and roll a die twice, and are interested in the product of the three numbers.
• What is the sample space?
• How many possible events are there?
• If both the coin and the die are fair, how would you define the proba- bility function Pr for this sample space?
5.2 Let n be a positive integer. We flip a fair coin 2n times and consider the possible outcomes, which are strings of length 2n with each character being H (= heads) or T (= tails). Thus, we take the sample space S to be the set of all such strings. Since our coin is fair, each string of S should have the same probability. Thus, we define Pr(s) = 1/|S| for each string s in S. In other words, we have a uniform probability space.
You are asked to determine the probability that in the sequence of 2n flips, the coin comes up heads exactly n times:
• What is the event E that describes this?
• Determine Pr(E).
5.3 A cup contains two pennies (P), one nickel (N), and one dime (D). You choose one coin uniformly at random, and then you choose a second coin from the remaining coins, again uniformly at random.
5.13. Exercises 145
• Let S be the sample space consisting of all ordered pairs of letters P, N, and D that represent the possible outcomes. Write out all elements of S.
• Determine the probability for each element in this sample space.
5.4 Let k ≥ 2 be an integer and consider the sample space S consisting of all sequences of k characters, where each character is one of the digits 0, 1, 2, . . . , 9.
If we choose a sequence s uniformly at random from the sample space S, what is the probability that none of the digits in s is equal to 5?
5.5 In Section 5.4, we have seen the different cards that are part of a standard deck of cards.
• You choose 2 cards uniformly at random from the 13 spades in a deck of 52 cards. Determine the probability that you choose an Ace and a King.
• You choose 2 cards uniformly at random from a deck of 52 cards. De- termine the probability that you choose an Ace and a King.
• You choose 2 cards uniformly at random from a deck of 52 cards. De- termine the probability that you choose an Ace and a King of the same suit.
5.6 Prove the inequality in (5.2), i.e., prove that
1 −x ≤ e−x
for all real numbers x.
5.7 You roll two dice D1 and D2. Let A be the event “D1 + D2 = 7” and let B be the event “D1 = 4”. Determine the conditional probabilities
Pr(A | B)
and Pr(B | A).
5.8 A hand of 13 cards is chosen uniformly at random from a standard deck of 52 cards. Define the following events:
146 Chapter 5. Discrete Probability
• A = “the hand has at least one ace”,
• B = “the hand has at least two aces”,
• C = “the hand has the ace of spades”.
Determine Pr(A | B), Pr(B | A), and Pr(B | C).
5.9 Let A and B be events with Pr(A) 6= 0 and Pr(B) 6= 0. Use the definition of conditional probability to prove Bayes’ Theorem:
Pr(A | B) = Pr(B | A) · Pr(A) Pr(B)
.
5.10 Medical doctors have developed a test for detecting disease X.
• The test is 98% effective on people who have X: If a person has X, then with probability 0.98, the test says that the person indeed has X.
• The test gives a false reading for 3% of the population without the disease: If a person does not have X, then with probability 0.03, the test says that the person does have X.
• It is known that 0.1% of the population has X.
Assume we choose a person uniformly at random from the population and test this person for disease X.
• Determine the probability that the test says that the person has X.
• Assume the test says that the person has X. Determine the probability that the person indeed has X.
5.11 You flip three fair coins independently of each other. Let A be the event “at least two flips in a row are heads” and let B be the event “the number of heads is even”. (Note that zero is even.) Are A and B independent?
5.12 You flip three fair coins independently of each other. Let A be the event “there is at most one tails” and let B be the event “not all flips are identical”. Are A and B independent?
5.13. Exercises 147
5.13 You flip two fair coins independently of each other. Define the following events:
A = “the number of heads is odd”,
B = “the first coin comes up heads”,
and C = “the second coin comes up heads”.
• Are the events A and B independent?
• Are the events A and C independent?
• Are the events B and C independent?
• Are the events A, B, and C pairwise independent?
• Are the events A, B, and C mutually independent?
5.14 Prove that for any real number x 6= 1 and any integer N ≥ 0, N∑ n=0
xn = 1 −xN+1
1 −x .
5.15 Use the following argumentation to convince yourself that
∞∑ n=0
1/2n = 2.
Take the interval I = [0, 2) of length 2 on the real line and, for each n ≥ 0, an interval In of length 1/2
n. You can place the intervals In in I such that
• no two intervals In and Im, with m 6= n, overlap and
• all intervals In completely cover the interval I.
5.16 Two players P and Q take turns rolling two fair and independent dice. The first player who gets a sum of seven wins the game. Determine the probability that player P wins the game.
5.17 We flip a fair coin repeatedly and independently, and stop as soon as we see one of the two sequences HTT and HHT . Let A be the event that the process stops because HTT is seen.
148 Chapter 5. Discrete Probability
• Prove that event A is given by the set
{Tn(HT)mHTT : n ≥ 0,m ≥ 0}.
In other words, event A holds if and only if the sequence of coin flips is equal to Tn(HT)mHTT for some n ≥ 0 and m ≥ 0.
• Prove that Pr(A) = 1/3.