EECS 376: Foundations of Computer Science
University of Michigan, Winter 2022 Discussion Notes 11

1 Randomized Algorithms
Randomness often allows us to solve many ostensibly difficult problems simply and efficiently, but it comes at a cost: uncertainty, either about the runtime of the algorithm or the correctness of its output. We can classify randomized algorithms by how this uncertainty manifests.
1.1 Las Vegas Algorithms
Las Vegas algorithms always output the correct solution, but their time complexities hold only in expectation. For some Las Vegas algorithms the worst-case runtime is bounded, while for others it may be unbounded, but for all Las Vegas algorithms the output is always correct. An example of a Las Vegas algorithm is Randomized Quicksort, wherein the pivot element is chosen uniformly at random. In expectation, Randomized Quicksort runs in O(n log n) time, but degrades to O(n^2) in the worst case.
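For concreteness, here is a minimal Python sketch of Randomized Quicksort (one of many possible implementations):

import random

def quicksort(arr):
    # Las Vegas: the output is always correctly sorted; only the
    # running time is random (O(n log n) in expectation, O(n^2) in
    # the worst case).
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)  # pivot chosen uniformly at random
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return quicksort(less) + equal + quicksort(greater)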
1.2 Monte Carlo Algorithms
Unlike Las Vegas algorithms, Monte Carlo algorithms have deterministic time complexities, but there is some probability that their output is incorrect. In other words, their runtime is bounded, but their output is not always correct. We can further categorize Monte Carlo algorithms based on where these errors can occur.
1.2.1 One-Sided Error Randomized Algorithms
Definition 1.1. A one-sided error randomized algorithm is an algorithm that only makes an error
in one direction, i.e., it either never gives false positives or never gives false negatives.
Definition 1.2. Let L be a language. Then, A is a one-sided error randomized algorithm with no
false positives for L if
• x ∈ L =⇒ Pr[A accepts x] ≥ c
• x ∉ L =⇒ A always rejects x
where 0 < c ≤ 1. By no false positives, we mean that the algorithm will never say that an x ∉ L is in the language L.
Definition 1.3. Let L be a language. Then, A is a one-sided error randomized algorithm with no false negatives for L if
• x ∈ L =⇒ A always accepts x
• x ∉ L =⇒ Pr[A rejects x] ≥ c
where 0 < c ≤ 1. By no false negatives, we mean that the algorithm will never say that an x ∈ L
is not in the language L.
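As a concrete illustration of Definition 1.3 (this example is not from the notes themselves), Freivalds' algorithm checks whether AB = C for n × n matrices: if AB = C it always accepts, and if AB ≠ C it rejects with probability at least 1/2 (so c = 1/2). A Python sketch, with matrices represented as lists of rows:

import random

def freivalds(A, B, C, n):
    # Pick a random 0/1 vector x and compare A(Bx) with Cx in O(n^2)
    # time, instead of computing the product AB directly.
    x = [random.randint(0, 1) for _ in range(n)]
    Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
    ABx = [sum(A[i][j] * Bx[j] for j in range(n)) for i in range(n)]
    Cx = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
    # If AB = C this always accepts; if AB != C, it accepts with
    # probability at most 1/2 (no false negatives for AB = C).
    return ABx == Cx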
1.3 Atlantic City Algorithm
An Atlantic City algorithm is one that returns a numeric value; with high probability the value is ε-close to the true value (think of polling, where a report might specify that the result is accurate to within p% ± ε). You can amplify the success probability of an Atlantic City algorithm by taking the median of the values returned by multiple independent runs.
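The following Python sketch illustrates the idea with a hypothetical estimator of π (this specific estimator is ours, not from the notes), amplified by the median of several independent runs:

import random
import statistics

def estimate_pi(samples=10_000):
    # One Atlantic City-style run: with high probability the returned
    # value is close to pi, but it is rarely exactly correct.
    hits = sum(random.random() ** 2 + random.random() ** 2 <= 1
               for _ in range(samples))
    return 4 * hits / samples

def amplified_pi(runs=15, samples=10_000):
    # Median-of-runs amplification: the median errs by more than
    # epsilon only if at least half of the runs do, which becomes
    # exponentially unlikely as the number of runs grows.
    return statistics.median(estimate_pi(samples) for _ in range(runs))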
2 One-Sided Error Amplification
A one-sided error randomized algorithm, by definition, does not necessarily output the correct answer all the time. However, there is a way to increase the probability of getting a correct answer. The following lemma shows that we can use a one-sided error randomized algorithm A to construct another one-sided error randomized algorithm B that has a higher, or amplified, probability of producing the correct answer. We amplify the probability by making repeated calls to the algorithm, a polynomial number of times. We state the lemma for algorithms with no false positives, but a similar statement holds for algorithms with no false negatives.
Lemma 2.1. Let L be a language and let A be a randomized algorithm such that
• x ∈ L =⇒ Pr[A accepts x] ≥ c
• x ∉ L =⇒ A always rejects x
where c is a real number such that 0 < c ≤ 1. Then for any ε with 0 < ε < 1, there is a randomized algorithm B with no false positives for L such that x ∈ L =⇒ Pr[B accepts x] ≥ 1 − ε.
Proof. Let B run A on x independently k times and accept if and only if at least one run accepts. If x ∉ L, every run of A rejects, so B always rejects and still has no false positives. If x ∈ L, each run independently fails to accept with probability at most 1 − c ≤ e^(−c), so
Pr[B rejects x] ≤ (1 − c)^k ≤ e^(−ck),
which is at most ε whenever k ≥ ln(1/ε)/c. Thus, it suffices to take any k larger than ln(1/ε)/c.
Notice that if A runs in polynomial time, running it ⌈ln(1/ε)/c⌉ times still takes polynomial time (so long as ε is a constant), so B runs in polynomial time as well.
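For concreteness, here is a minimal Python sketch of the amplified algorithm B, assuming A is given as a black-box callable (the name amplify is ours, not the notes'):

import math

def amplify(A, x, c, eps):
    # Run the no-false-positive algorithm A on x a total of
    # k = ceil(ln(1/eps)/c) times and accept iff any run accepts.
    # - If x is not in L, every run rejects, so we never accept:
    #   B still has no false positives.
    # - If x is in L, all k runs miss with probability at most
    #   (1 - c)^k <= e^(-ck) <= eps.
    k = math.ceil(math.log(1 / eps) / c)
    return any(A(x) for _ in range(k))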
3 Linearity of Expectation
Recall the following definition:
Definition 3.1 (Expected value of a discrete random variable). Let X be a discrete random variable taking values in a countable set Ω ⊆ R. The expected value of X is written E[X] and is defined to be
E[X] = Σ_{ω∈Ω} ω · Pr[X = ω].
Example 3.2. Consider a fair coin and a fair die. Suppose a value of heads on the coin is 1 and a value of tails on the coin is 0. Let C be the result of a coin flip and D be the result of a roll of the
die. Observe that C takes value 1 with probability 1/2 and value 0 with probability 1/2, and that D takes each value in {1, 2, . . . , 6} with probability 1/6.
Then we have that E[C] = 1 · Pr[C = heads] + 0 · Pr[C = tails] = 1 · (1/2) + 0 · (1/2) = 1/2 and that E[D] = 1 · Pr[D = 1] + 2 · Pr[D = 2] + · · · + 6 · Pr[D = 6] = 1 · (1/6) + 2 · (1/6) + · · · + 6 · (1/6) = 7/2.
Note 3.3. It is not required that a random variable be able to take on its expected value. Neither C nor D in the prior example takes on its expected value.
Sometimes, we can write a random variable X as the sum of two (or more) random variables – in
this case, we can apply linearity of expectation to analyze the expected value of X.
Theorem 3.4. Let Y and Z be random variables and X = cY + dZ for constants c and d. Then
E[X] = E[cY + dZ] = E[cY] + E[dZ] = cE[Y] + dE[Z].
Example 3.5. Consider a fair coin and a fair die. Suppose a value of heads on the coin is 1 and a value of tails on the coin is 0. Let C be the result of a coin flip and D be the result of a roll of the die. Let B be the sum of the result of a coin flip and a die roll; then we have that B = C + D. By linearity of expectation, we have that E[B] = E[C] + E[D] = 1/2 + 7/2 = 4.
A useful tool in analyses that apply linearity of expectation is the indicator random variable, which takes on value 1 when some event happens and value 0 otherwise. These variables are especially useful when we need to count the number of events that happen.
Example 3.6. Suppose you have a line of plants p1, p2, . . . , pn, where pi has a 1/i^2 chance to live an additional year. What is the expected number of plants alive after one additional year? Let X be the number of plants alive after one additional year. Observe that we can write X = Y1 + Y2 + · · · + Yn, where Yi takes value 1 if plant pi lives an additional year and value 0 otherwise. Then we can write E[X] = Σ_{i=1}^n E[Yi] = Σ_{i=1}^n 1/i^2, which converges to π^2/6 as n → ∞.
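A quick empirical sanity check of this example can be written in Python (this sketch is ours, not part of the notes):

import random

def simulate_plants(n, trials=100_000):
    # X = Y_1 + ... + Y_n, where Y_i indicates that plant p_i
    # (survival probability 1/i^2) lives an additional year.
    total = 0
    for _ in range(trials):
        total += sum(random.random() < 1 / i**2 for i in range(1, n + 1))
    return total / trials

n = 10
print(simulate_plants(n))                      # empirical average
print(sum(1 / i**2 for i in range(1, n + 1)))  # E[X] = sum 1/i^2 ≈ 1.5498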
4 Markov’s Inequality and Reverse Markov’s Inequality
Markov’s Inequality allows us to bound the probability that a non-negative random variable Z is much larger than its expected value.
Theorem 4.1 (Markov’s Inequality). If Z is a non-negative random variable and a > 0, then
Pr[Z ≥ a] ≤ E[Z]/a.
Corollary 4.2. If Z is a non-negative random variable and a > 0, then
Pr[Z ≥ a · E[Z]] ≤ 1/a.
This corollary provides a more intuitive way of understanding Markov’s Inequality. It says that the probability of a (non-negative) random variable Z being a times greater than its expected value is at most 1/a.
Example 4.3. Suppose you have a biased coin such that Pr[heads] = 30%. Using Markov’s Inequality, we can find an upper bound on the probability of getting at least 60 heads after tossing the coin 100 times. Let Z be a random variable equal to the number of times the coin showed
heads after tossing it 100 times. Since our coin is biased, the expected value of Z is 0.3 · 100 = 30. Thus, we have that:
Pr[Z ≥ 60] = Pr[Z ≥ 2 · 30] = Pr[Z ≥ 2 · E[Z]] ≤ 1/2 (by Corollary 4.2).
Therefore, the probability of getting at least 60 heads after tossing the coin 100 times is no more than 1/2.
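The bound can be checked empirically; the following Python sketch (ours, for illustration) estimates Pr[Z ≥ 60] by simulation:

import random

def head_count(n=100, p=0.3):
    # Number of heads in n tosses of a coin with Pr[heads] = p.
    return sum(random.random() < p for _ in range(n))

trials = 100_000
freq = sum(head_count() >= 60 for _ in range(trials)) / trials
print(freq)  # far below the Markov bound of 1/2: the bound is loose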
Example 4.4. Suppose that we have a hash table of size n^2 and a hash function h that chooses the mapping address uniformly at random from {0, . . . , n^2 − 1}. We can use Markov’s Inequality to find an upper bound on the probability that there is at least one collision after inserting n distinct elements. Let Z be a random variable indicating the number of collisions after performing
n insertions. Let S = {s1, . . . , sn} be the set of inserted elements. Then
E[Z] = Σ_{pairs (i,j), i≠j} Pr[h(si) = h(sj)]
     = (n choose 2) · (1/n^2)
     = (n(n − 1)/2) · (1/n^2)
     = (n − 1)/(2n)
     < 1/2.
Now that we know the expected value of Z (i.e., the expected number of collisions), we can calculate the probability of getting at least one collision using Markov’s Inequality.
Pr[Z ≥ 1] ≤ Pr[Z ≥ 2 · E[Z]] ≤ 1/2.
Therefore, the probability that any two distinct elements are hashed to the same address is no more than 1/2.
Note: This is not a particularly tight bound; the true probability is considerably less. Markov’s Inequality is useful because it applies to so many random variables (only requiring non-negativity), but an unfortunate consequence of this is that the bounds it gives are often loose.
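To see how loose it is, one can simulate the hashing experiment; a Python sketch (ours, for illustration):

import random

def collision_count(n):
    # Hash n distinct elements uniformly into n^2 buckets and count
    # the number of colliding pairs.
    buckets = [random.randrange(n * n) for _ in range(n)]
    counts = {}
    for b in buckets:
        counts[b] = counts.get(b, 0) + 1
    return sum(c * (c - 1) // 2 for c in counts.values())

n, trials = 20, 100_000
freq = sum(collision_count(n) >= 1 for _ in range(trials)) / trials
# Markov gives Pr[Z >= 1] <= E[Z] = (n-1)/(2n) = 0.475 here; the
# empirical frequency is noticeably lower.
print(freq)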
4.1 Reverse Markov’s Inequality
By cleverly manipulating Markov’s Inequality, we can find a lower bound for the probability that a random variable differs from its expected value, given that the random variable has an upper bound on the values it can take. This lower bound is known as Reverse Markov’s Inequality. See lecture 20 for more info.
Theorem 4.5 (Reverse Markov’s Inequality). If Z is a random variable that is never larger than B and a < B, then
Pr[Z > a] ≥ (E[Z] − a)/(B − a).
The proofs for Markov’s Inequality and Reverse Markov’s Inequality can be found in the optional material at the bottom of the section notes.
Example 4.6. Recall the biased coin from Example 4.3. Using Reverse Markov’s Inequality, we can find a lower bound on the probability of getting strictly more than 10 heads after tossing the coin 100 times. Let Z be a random variable equal to the number of times the coin showed heads
after tossing it 100 times. Since our coin is biased, the expected value of Z is 0.3 · 100 = 30. Furthermore, since we toss the coin 100 times, Z can be no greater than 100 (i.e. B = 100). Thus, we have that:
Pr[Z > 10] ≥ (E[Z] − 10)/(B − 10) = (30 − 10)/(100 − 10) = 20/90 = 2/9.
Therefore, the probability of getting strictly more than 10 heads after tossing the coin 100 times is no less than 2/9.
5 (Optional) Proofs
Below are proofs for some of the theorems and propositions stated above.
5.1 Proof of Markov’s Inequality
Theorem 5.1 (Markov’s Inequality). If Z is a non-negative random variable and a > 0, then Pr[Z ≥ a] ≤ E[Z]/a.
Proof. We are going to prove that Markov’s Inequality holds for all non-negative discrete random variables (random variables that can take on only a countable number of values), since these are what we tend to deal with in computer science. The more general proof for non-negative continuous random variables is very similar; it simply replaces summations with integrals.
For simplicity, assume Z takes values in the non-negative integers and that a is a positive integer. Then
E[Z] = Σ_{i=0}^∞ i · Pr[Z = i]
     = Σ_{i=0}^{a−1} i · Pr[Z = i] + Σ_{i=a}^∞ i · Pr[Z = i]
     ≥ Σ_{i=a}^∞ i · Pr[Z = i]
     ≥ Σ_{i=a}^∞ a · Pr[Z = i]
     = a · Pr[Z ≥ a].
Dividing by a, we obtain Pr[Z ≥ a] ≤ E[Z]/a.
5.2 Proof of Reverse Markov’s Inequality
Theorem 5.2 (Reverse Markov’s Inequality). If Z is a random variable that is never larger than B and a < B, then
Pr[Z > a] ≥ (E[Z] − a)/(B − a).
Proof. Construct another random variable Z′ = B − Z. Note that since Z is no more than B, Z′ is a non-negative random variable.
Then
Pr[Z ≤ a] = Pr[Z′ ≥ B − a]
          ≤ E[Z′]/(B − a)        (Markov’s Inequality)
          = (B − E[Z])/(B − a).  (Linearity of expectation)
Hence
Pr[Z > a] = 1 − Pr[Z ≤ a] ≥ 1 − (B − E[Z])/(B − a) = (E[Z] − a)/(B − a).
Proposition 5.3. For any discrete random variable X : S → R there exists a ∈ S such that
X(a) ≤ E[X].
Proof. The proof follows exactly the same form as the one provided above.
5.3 Proof of Linearity of Expectation
A very useful property of expected value is its linearity: namely, that for any two random variables X and Y, E[X + Y] = E[X] + E[Y], and for any constant c, E[cX] = cE[X]. The following proofs are for discrete random variables (variables with a countable sample space), but they can be adapted to continuous random variables by replacing the summations with integrals.
Proof. Let X : S → R be an arbitrary discrete random variable and let c ∈ R. By definition of expectation:
E[cX] = Σ_{s∈S} c · s · Pr[X = s]
      = c · Σ_{s∈S} s · Pr[X = s]
      = c · E[X].
Proof. Let X : S → R and Y : T → R be two arbitrary discrete random variables. By definition of
expectation:
E[X + Y] = Σ_{x∈S} Σ_{y∈T} (x + y) · Pr[X = x, Y = y]
         = Σ_{x∈S} Σ_{y∈T} (x · Pr[X = x, Y = y] + y · Pr[X = x, Y = y])
         = Σ_{x∈S} Σ_{y∈T} x · Pr[X = x, Y = y] + Σ_{y∈T} Σ_{x∈S} y · Pr[X = x, Y = y]
         = Σ_{x∈S} x · Σ_{y∈T} Pr[X = x, Y = y] + Σ_{y∈T} y · Σ_{x∈S} Pr[X = x, Y = y].
Think about Σ_{y∈T} Pr[X = x, Y = y]. In words, this is the probability that X = x and Y = y, summed over all possible values of y. In other words, this is just the probability that X = x. (For those who recognize it, this is the law of total probability.) Likewise, Σ_{x∈S} Pr[X = x, Y = y] summed over x ∈ S is just Pr[Y = y]. Hence
E[X + Y] = Σ_{x∈S} x · Pr[X = x] + Σ_{y∈T} y · Pr[Y = y].
By definition of expected value:
= E[X] + E[Y ].
6 Union Bound
The union bound says that we can bound the probability that any one of many events occurs by the sum of the probabilities that each individual event occurs. Formally:
Theorem 6.1 (Union bound). Let A1, A2, . . . , An be a set of (possibly dependent) events. Then
Pr[A1 ∪ A2 ∪ · · · ∪ An] ≤ Σ_{i=1}^n Pr[Ai].
Example 6.2. Suppose that a fair coin is flipped n times. We can use the union bound to find an upper bound on the probability that there is a sequence of log2 n + k consecutive heads for some k > 0.
Let Si denote the event that log2 n + k consecutive coin flips are heads, starting with the i-th flip. The probability that there is any sequence of log2 n + k consecutive heads is the same as the probability that such a sequence occurs starting at one of the first m = n − (log2 n + k − 1) tosses. Using the union bound, we know that
Pr[S1 ∪ S2 ∪ · · · ∪ Sm] ≤ Σ_{i=1}^m Pr[Si].
Each coin flip is independent, so the probability that any particular Si occurs is just the probability that log2 n + k coins all come up heads, which is simply (1/2)^(log2 n + k) = 1/(2^k · n). Substituting this value in for Pr[Si], we have:
Pr[S1 ∪ S2 ∪ · · · ∪ Sm] ≤ Σ_{i=1}^m 1/(2^k · n)
                        = (n − (log2 n + k − 1))/(2^k · n)
                        = 1/2^k − (log2 n + k − 1)/(2^k · n).
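A simulation makes the bound concrete; the following Python sketch (ours, for illustration) estimates the probability of such a run and compares it to the roughly 1/2^k bound above:

import math
import random

def has_long_run(n, k):
    # True iff n fair coin flips contain a run of log2(n) + k heads.
    run_len = int(math.log2(n)) + k
    run = 0
    for _ in range(n):
        run = run + 1 if random.random() < 0.5 else 0
        if run >= run_len:
            return True
    return False

n, k, trials = 1024, 2, 10_000
freq = sum(has_long_run(n, k) for _ in range(trials)) / trials
print(freq, "<=", 1 / 2**k)  # the union bound holds empirically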
7 Modular Arithmetic
This section contains a review of modular arithmetic. Recall that the set of integers is denoted by the symbol Z. Throughout this section, n is a fixed positive integer, called the modulus.
Definition 7.1. Two integers a and b are congruent modulo n, written
a ≡ b (mod n)
(read as: “a is congruent to b, modulo n”),
if n divides b − a. Equivalently, b = a + nk for some integer k. For example, 5 ≡ 11 (mod 3) because 11 = 5 + 3k for k = 2. Furthermore, n = 3 divides 11 − 5 = 6. Note that if a ≡ b (mod n), then b ≡ a (mod n) and vice versa (i.e., the relation is symmetric).
More formally, two integers are said to be congruent if they belong to the same congruence class. The congruence class of the integer a modulo an integer n is given by the set {a + kn : k ∈ Z}. For example, the congruence class of 2 modulo 5 is {2 + 5k : k ∈ Z} = {··· ,−8,−3,2,7,12,···}. Notice that a ≡ b (mod n) if and only if a and b give the same remainder when divided by n. There are n possible remainders
{0,1,…,n−1},
therefore there are exactly n distinct congruence classes for a given modulus n. The following
proposition gives some more properties of congruence modulo n.
Proposition 7.2. If a ≡ a′ (mod n) and b ≡ b′ (mod n), then a + b ≡ a′ + b′ (mod n) and ab ≡ a′b′ (mod n).
The above proposition allows us to define addition and multiplication for congruence classes: the sum of the congruence class containing a and the congruence class containing b is the congruence class containing a + b.
The set of congruence classes modulo n, together with the operations of addition and multiplication, is denoted by the symbol Zn. This proposition shows that calculations in Zn can be performed by doing them in Z and then taking the remainder.
Example 7.3. Let n = 5. Then
2 + 4 ≡ 6 ≡ 1 (mod 5), 2 · (−3) ≡ −6 ≡ 4 (mod 5).
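In Python, the % operator already returns the canonical representative in {0, . . . , n − 1}, even for negative inputs, so these calculations can be checked directly:

n = 5
print((2 + 4) % n)   # 1, matching 2 + 4 ≡ 6 ≡ 1 (mod 5)
print((2 * -3) % n)  # 4, matching 2 · (−3) ≡ −6 ≡ 4 (mod 5)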
7.1 Division in Zn and the Extended Euclidean Algorithm
We have defined the operations of addition and multiplication in Zn. We now discuss division. Division is not always possible and there’s no reason to expect it to be, since it isn’t always possible over the integers. For instance, 2 and 3 are both integers, but 2/3 is not. We will now make more precise what we mean by division.
Recall that division is an inverse operation to multiplication; intuitively, dividing by a is the same as multiplying by 1/a, or a−1.
Definition 7.4. Let a ∈ Z. If there is some b ∈ Z with a · b ≡ 1 (mod n), then b is called a multiplicative inverse, or simply inverse, of a, and is denoted by a−1.
The notation a−1 is not problematic because it turns out that if a has a multiplicative inverse, then it is unique up to congruence modulo n. However, it is not always the case that a has an inverse, as the following example shows.
Note that in Z, 1 is the multiplicative identity since for any x ∈ Z, x · 1 = x. Similarly, in Zn we have x · 1 ≡ x (mod n).
Example 7.5. In Z4, 2 does not have an inverse. To demonstrate this, we multiply 2 by every element of Z4:
2 · 0 ≡ 0 (mod 4), 2 · 1 ≡ 2 (mod 4), 2 · 2 ≡ 0 (mod 4), 2 · 3 ≡ 2 (mod 4).
As 1 does not appear in the list, 2 does not have an inverse in Z4.
Definition 7.6. The modular inverse of an integer can easily be computed using the Extended Euclidean algorithm: if gcd(a, n) = 1, the algorithm produces integers x and y with ax + ny = 1, and then ax ≡ 1 (mod n), so x is an inverse of a modulo n.
Require: integers x > y ≥ 0
Ensure: a triple (g, a, b) of integers where g = gcd(x, y) and ax + by = g
1: function ExtendedEuclid(x, y)
2:     if y = 0 then
3:         return (x, 1, 0)
4:     else
5:         Write x = qy + r for an integer q, where 0 ≤ r < y
6:         (g, a, b) ← ExtendedEuclid(y, r)
7:         return (g, b, a − qb)
8:     end if
9: end function
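A direct Python translation, together with the modular-inverse computation described in Definition 7.6 (the function names are ours):

def extended_euclid(x, y):
    # Returns (g, a, b) with g = gcd(x, y) and a*x + b*y = g,
    # mirroring the pseudocode above.
    if y == 0:
        return (x, 1, 0)
    q, r = divmod(x, y)              # x = q*y + r with 0 <= r < y
    g, a, b = extended_euclid(y, r)  # g = a*y + b*r
    return (g, b, a - q * b)         # substitute r = x - q*y

def mod_inverse(a, n):
    # a has an inverse modulo n iff gcd(a, n) = 1.
    g, x, _ = extended_euclid(a % n, n)
    if g != 1:
        raise ValueError("no inverse: gcd(a, n) != 1")
    return x % n

print(mod_inverse(3, 7))  # 5, since 3 · 5 = 15 ≡ 1 (mod 7)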