CIS 121: Data Structures and Algorithms
Course Lecture Notes
Steven Bursztyn, Rajiv Gandhi, and John Geyer ∗ Draft of: April 14, 2020
University of Pennsylvania
∗ see acknowledgments on next page
Goal
There is no one book that covers everything that we want to cover in CIS 121. The goal of these notes is for students to find all course lecture material in one place, and in one uniform format.
Acknowledgments
These lecture notes were compiled by Steven Bursztyn, Rajiv Gandhi, and John Geyer for CIS 121 at the University of Pennsylvania during the Spring 2019 and Spring 2020 semesters.
These lecture notes are a work in progress, and we appreciate the students and TAs who have helped make small edits.
Some chapters include attributions to other courses or textbooks from which we adapted our lecture notes, with some parts copied directly.
We also gratefully acknowledge the many others who made their course materials freely available online.
Disclaimer
These notes are designed to be a supplement to the lecture. They may or may not cover all the material discussed in the lecture (and vice versa).
Material varies from semester to semester, and this book may contain more information than what will be covered in lecture (i.e., material you may not be responsible for knowing). Please refer to the course website for assigned readings.
Errata
If you find any mistakes, please email the professor(s) and Head TAs so we can fix them.
Feedback
If you believe a certain topic is explained poorly, or could use additional examples or explanations, please reach out to the course staff. As mentioned, this is a work in progress and we would love to do what we can to make this resource as helpful as possible.
Table of Contents
1 Review of Terms, Proofs, and Probability
  1.1 Review of Proofs and Proof Techniques
      Induction
  1.2 Graphs
      Trees
      Eulerian and Hamiltonian Graphs
  1.3 Probability
  1.4 Linearity of Expectation
  1.5 Probability Distributions
      The Geometric Distribution
      Binomial Distributions
      Examples

2 Gale-Shapley Stable Matching
  2.1 Background and Intuition
  2.2 Formulating the Problem
  2.3 Examples
  2.4 Designing an Algorithm
  2.5 Runtime of the GS Algorithm
  2.6 Correctness of the GS Algorithm
  2.7 Extensions

3 Greatest Common Divisor
  3.1 Definitions
  3.2 Calculating the GCD
  3.3 Correctness of Euclid's Algorithm
  3.4 Runtime of Euclid's Algorithm

4 Insertion Sort
  4.1 Insertion Sort
  4.2 Correctness of Insertion Sort
  4.3 Running Time of Insertion Sort

5 Running Time and Growth Functions
  5.1 Measuring Running Time of Algorithms
  5.2 RAM Model of Computation
  5.3 Average Case and Worst Case
  5.4 Order of Growth
      Definitions of Asymptotic Notations – O, Ω, Θ, o, ω
  5.5 Properties of Asymptotic Growth Functions

6 Analyzing Runtime of Code Snippets

7 Divide & Conquer and Recurrence Relations
  7.1 Computing Powers of Two
  7.2 Linear Search and Binary Search
  7.3 Merge Sort
  7.4 More Recurrence Practice
  7.5 Simplified Master Theorem

8 Quicksort
  8.1 Deterministic Quicksort
  8.2 Randomized Quicksort

9 Counting Inversions
  9.1 Introduction and Problem Description
  9.2 Designing an Algorithm
  9.3 Runtime

10 Selection Problem
  10.1 Introduction to Problem
  10.2 Selection in Worst-Case Linear Time

11 Closest Pair
  11.1 Closest Pair
  11.2 Divide and Conquer Algorithm
  11.3 Closest Pair Between the Sets

12 Integer Multiplication
  12.1 Introduction and Problem Statement
  12.2 Designing the Algorithm
  12.3 Runtime

13 Stacks and Queues
  13.1 The Stack ADT
  13.2 Queues

14 Binary Heaps and Heapsort
  14.1 Definitions and Implementation
  14.2 Maintaining the Heap Property
  14.3 Building a Heap
      Correctness
      Runtime
  14.4 Heapsort
  14.5 Priority Queues

15 Huffman Coding
  15.1 From Text to Bits
  15.2 Variable-Length Encoding Schemes
      Prefix Codes
      Optimal Prefix Codes
  15.3 Huffman Encoding
      Binary Trees and Prefix Codes
      Huffman's Algorithm
      Designing the Algorithm
      Correctness of Huffman's Algorithm
      Running Time
  15.4 Extensions

16 Graph Traversals: BFS and DFS
  16.1 Graphs and Graph Representations
      Graph Representations
  16.2 Connectivity
  16.3 Breadth-First Search (BFS)
      BFS Properties
      Runtime of BFS
  16.4 Depth-First Search (DFS)
      Runtime of DFS
      DFS Properties and Extensions
      Classifying Edges

17 Application of BFS: Bipartiteness
  17.1 Definitions and Properties
  17.2 Algorithm
  17.3 Analysis

18 DAGs and Topological Sorting
  18.1 DAGs
  18.2 Topological Sorting
  18.3 Kahn's Algorithm
  18.4 Tarjan's Algorithm

19 Strongly Connected Components
  19.1 Introduction and Definitions
  19.2 Kosaraju's Algorithm
      Proof of Correctness

20 Shortest Path
  20.1 The Shortest Path Problem
  20.2 Dijkstra's Algorithm
      Analyzing the Algorithm
      Implementation and Running Time
  20.3 Shortest Path in DAGs

21 Minimum Spanning Trees
  21.1 Introduction and Background
  21.2 MST Algorithms
      Prim's Algorithm
      Kruskal's Algorithm
      Reverse-Delete
  21.3 Correctness of Prim's, Kruskal's, and Reverse-Delete
      Prim's Algorithm: Correctness
      Kruskal's Algorithm: Correctness
      Reverse-Delete Algorithm: Correctness
  21.4 Eliminating the Assumption that All Edge Weights are Distinct

22 Union Find
  22.1 Introduction
  22.2 Union by Rank
  22.3 Path Compression

23 Hashing
  23.1 Direct-Address Tables
  23.2 Hash Tables
      Collision Resolution by Chaining
      Analysis of Hashing with Chaining
  23.3 Hash Functions
      What makes a good hash function?
      Interpreting Keys as Natural Numbers
      The Division Method
      The Multiplication Method
  23.4 Open Addressing
      Linear Probing
      Quadratic Probing
      Double Hashing
      Analysis of Open-Address Hashing

24 Tries
  24.1 Introduction
  24.2 Standard Tries
  24.3 Compressed Tries
  24.4 Suffix Tries
      Saving Space
      Construction
      Using a Suffix Trie

25 Balanced BSTs: AVL Trees
  25.1 Review: Binary Search Tree
  25.2 Definition of an AVL Tree
  25.3 Update Operations: Insertion and Deletion
      Insertion
      Deletion

Advanced Topics

26 Skip Lists
  26.1 Skip Lists
  26.2 Analysis

27 Bloom Filters
  27.1 Bloom Filters

28 Balanced BSTs: Left-Leaning Red-Black Trees

Appendix

A Common Running Times
1 Review of Terms, Proofs, and Probability
1.1 Review of Proofs and Proof Techniques
The unique factorization theorem states that every integer greater than 1 can be uniquely represented as a product of primes. More formally, it can be stated as follows.
Given any integer n > 1, there exist a positive integer k, distinct prime numbers p_1, p_2, ..., p_k, and positive integers e_1, e_2, ..., e_k such that

n = p_1^{e_1} p_2^{e_2} p_3^{e_3} ··· p_k^{e_k}
and any other expression of n as a product of primes is identical to this except, perhaps, for the order in which the factors are written.
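For instance, the unique factorization of 360 is

360 = 2^3 · 3^2 · 5

so here k = 3, with primes 2, 3, 5 and exponents 3, 2, 1; any other way of writing 360 as a product of primes simply rearranges these same factors.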
Example. Prove that √2 is irrational using the unique factorization theorem.
Solution. Assume for the purpose of contradiction that √2 is rational. Then there are integers a and b (b ≠ 0) such that

√2 = a/b

Squaring both sides of the above equation gives

2 = a²/b², that is, a² = 2b²
Let S(m) be the sum of the number of times each prime factor occurs in the unique factorization of m. Note that S(a²) and S(b²) are both even. Why? Because the number of times that each prime factor appears in the prime factorization of a² (respectively b²) is exactly twice the number of times that it appears in the prime factorization of a (respectively b). Then S(2b²) = 1 + S(b²) must be odd. Since a² = 2b², this is a contradiction, as S(a²) is even and the prime factorization of a positive integer is unique.
Example. Prove or disprove that the sum of two irrational numbers is irrational.
Solution. The above statement is false. Consider the two irrational numbers, √2 and −√2. Their sum is
0 = 0/1, a rational number.
Example. Show that there exist irrational numbers x and y such that xy is rational.
Solution. We know that √2 is an irrational number. Consider (√2)^√2.

Case I: (√2)^√2 is rational. In this case we are done by setting x = y = √2.

Case II: (√2)^√2 is irrational. In this case, let x = (√2)^√2 and let y = √2. Then, x^y = ((√2)^√2)^√2 = (√2)^2 = 2, which is an integer and hence rational.
Induction
Induction is a proof technique that relies on the following logic: if I have a statement P(n) such that P(1) is true, and I know that whenever P(k) is true for any k ≥ 1, then P(k + 1) is also true, then the statement must be true for all integers n ≥ 1. The typical intuitive examples of induction are:
Suppose I have a line of dominoes. If I know I can knock down the first domino, and I know that if the k-th domino falls, it will knock over the (k + 1)-th domino, then all the dominoes will fall.
Suppose I have a ladder. If I know that I can get on the ladder, and I know that for each step of the ladder it is possible to get to the next step, then I can climb the whole ladder.
There are plenty of other examples. In general, an inductive proof of a claim P consists of proving that a base case (BC) holds (usually P(0) or P(1)), making an induction hypothesis (IH) that assumes that P(k) is true for some value k that is greater than or equal to the base case, and lastly an induction step (IS) which uses the IH to prove that P(k + 1) holds. Please note that induction can only be used to prove properties about integers; you can't use induction to prove statements about real numbers.
Example. Prove via induction that \sum_{i=1}^{n} i = n(n+1)/2.

Solution. We proceed via induction on n.

Base Case: The base case occurs when n = 1. \sum_{i=1}^{1} i = 1 and 1(1+1)/2 = 1, and so the claim holds in the base case.

Induction Hypothesis: Assume that for some integer k ≥ 1 we have that \sum_{i=1}^{k} i = k(k+1)/2.

Induction Step: We must show that \sum_{i=1}^{k+1} i = (k+1)(k+2)/2. Indeed:

\sum_{i=1}^{k+1} i = (k + 1) + \sum_{i=1}^{k} i
                   = (k + 1) + k(k + 1)/2          (by the induction hypothesis)
                   = (2(k + 1) + k(k + 1))/2
                   = (k + 1)(k + 2)/2
Note how we used the induction hypothesis in the induction step. If you find yourself doing a proof by induction without invoking the induction hypothesis, you are probably doing it incorrectly (or induction is not necessary).
Example. Prove using induction that, for any positive integer n, if x_1, x_2, ..., x_n are n distinct real numbers, then no matter how the parentheses are inserted into their product, the number of multiplications used to compute the product is n − 1.
Solution. Let P (n) be the property that “If x1, x2, . . . , xn are n distinct real numbers, then no matter how the parentheses are inserted into their product, the number of multiplications used to compute the product is n−1”.
Base Case: P (1) is true, since x1 is computed using 0 multiplications.
Induction Hypothesis: Assume that P(j) is true for all j such that 1 ≤ j ≤ k.
Induction Step: We want to prove P(k+1). Consider the product of k+1 distinct factors, x_1, x_2, ..., x_{k+1}. When parentheses are inserted in order to compute the product of factors, some multiplication must be the final one. Consider the two terms of this final multiplication. Each one is a product of at most k factors. Suppose the first and the second term in the final multiplication contain f_k and s_k factors, respectively. Clearly, 1 ≤ f_k, s_k ≤ k and f_k + s_k = k + 1. Thus, by the induction hypothesis, the number of multiplications to obtain the first term of the final multiplication is f_k − 1 and the number of multiplications to obtain the second term of the final multiplication is s_k − 1. It follows that the number of multiplications to compute the product of x_1, x_2, ..., x_k, x_{k+1} is

(f_k − 1) + (s_k − 1) + 1 = f_k + s_k − 1 = (k + 1) − 1 = k
1.2 Graphs
A graph G = (V, E) is a set of vertices (also called nodes) V together with a set of edges E ⊆ V × V, where V × V denotes the Cartesian product of V with itself. We denote an edge from u to v as (u, v). A graph can be undirected, in which case we consider the edges (u, v) and (v, u) to be the same edge, or it can be directed, in which case we distinguish (u, v) from (v, u). In CIS 121, we don't worry about so-called "self-loops", i.e. edges of the form (v, v). The degree of a vertex v, denoted deg(v), is the number of edges incident on it (for directed graphs, we distinguish between in-degree and out-degree).
A path from u to v is a sequence u = v_0, v_1, ..., v_n = v such that (v_i, v_{i+1}) ∈ E for all 0 ≤ i < n.

Example. Prove that a graph G with n vertices and m edges has at least n − m connected components.

Solution. We proceed by induction on the number of edges m. If m = 0, every vertex is its own connected component, so G has exactly n = n − 0 connected components. Assume that every graph with n vertices and k edges, for some k ≥ 0, has at least n − k connected components. Now let G be a graph with n vertices and k + 1 edges, let {u, v} be any edge of G, and let G′ be the graph obtained by deleting this edge from G. By the induction hypothesis, G′ has at least n − k connected components.

Case I: u and v belong to the same connected component in G′. Then adding back the edge {u, v} does not change the connected components, so G has at least n − k ≥ n − (k + 1) connected components.

Case II: u and v do not belong to the same connected component in G′.
First, let C_u and C_v be the connected components containing u and v respectively. We find that adding back {u, v} will result in C_u and C_v combining (while all other connected components remain unchanged), and thus reduce the number of connected components by exactly 1. Thus, G has at least n − k − 1 = n − (k + 1) connected components.
In either case we find that the number of connected components in G is at least n − (k + 1), completing our induction step and our proof.
Example. Prove that every connected graph with n vertices has at least n − 1 edges.
Solution. We will prove the contrapositive, i.e., a graph G with m ≤ n − 2 edges is disconnected. From the
result of the previous problem, we know that the number of components of G is at least n − m ≥ n − (n − 2) = 2
which means that G is disconnected. This proves the claim.
One could also have proved the above claim directly by observing that a connected graph has exactly one
connected component. Hence, 1 ≥ n − m. Rearranging the terms gives us m ≥ n − 1.
Example. Prove that every tree with at least two vertices has at least two leaves and deleting a leaf from
an n-vertex tree produces a tree with n − 1 vertices.
Solution. A connected graph with at least two vertices has an edge. In an acyclic graph, an endpoint of a maximal non-trivial path (a path that is not contained in a longer path) has no neighbors other than its only neighbor on the path. Hence, the endpoints of such a path are leaves.
Let v be a leaf of a tree T and let T′ = T − v. A vertex of degree 1 belongs to no path connecting two vertices other than v. Hence, for any two vertices u, w ∈ V(T′), every path from u to w in T is also in T′. Hence T′ is connected. Since deleting a vertex cannot create a cycle, T′ is also acyclic. Thus, T′ is a tree with n − 1 vertices.
Example. Prove that if G is a tree on n vertices then G is connected and has n − 1 edges.
Solution. We can prove this by induction on n. The property is clearly true for n = 1 as G has 0 edges. Assume that any tree with k vertices, for some k ≥ 1, has k − 1 edges. We want to prove that a tree G with k + 1 vertices has k edges. Let v be a leaf in G, which we know exists as G is a tree with at least two vertices. By the previous example, G′ = G − v is a tree with k vertices. By the induction hypothesis, G′ has k − 1 edges. Since deg(v) = 1, G has k edges.
Eulerian and Hamiltonian Graphs
An Eulerian circuit is a closed walk in which each edge appears exactly once. A graph is Eulerian if it contains an Eulerian circuit. A Hamiltonian circuit is a closed walk in which each vertex appears exactly once. A graph is Hamiltonian if it contains a Hamiltonian circuit.
To determine whether a graph is Hamiltonian or not is significantly harder than determining whether a graph is Eulerian or not. In this class we study the characterization of Eulerian graphs.
Example. If δ(G) ≥ 2, then G contains a cycle. (Here δ(G) denotes the minimum degree of a vertex in G.)
Solution. Let P be a longest path (actually, any maximal path suffices) in G and let u be an endpoint of P . Since P cannot be extended, every neighbor of u is a vertex in P . Since deg(u) ≥ 2, u has a neighbor v ∈ P via an edge that is not in P. The edge {u,v} completes the cycle with the portion of P from v to u.
Example. Prove that a connected graph G is Eulerian iff every vertex in G has even degree.
Solution. Necessity: To prove that “if G is Eulerian then every vertex in G has even degree”. Let C denote the Eulerian circuit in G. Each passage of C through a vertex uses two incident edges and the first edge is paired with the last at the first vertex. Hence every vertex has even degree.
Sufficiency: To prove that “if every vertex in G has even degree then G is Eulerian”. We will prove this using induction on the number of edges, m.
Base Case: m = 0. In this case G has only one vertex and that itself forms an Eulerian circuit.
Induction Hypothesis: Assume that the property holds for any graph G with n vertices and j edges, for all j
such that n − 1 ≤ j ≤ k.
Induction Step: We want to prove that the property holds when G has n vertices and k + 1 edges. Since G has at least one edge and because G is connected and every vertex of G has even degree, δ(G) ≥ 2. From the result of the previous problem, G contains a cycle, say C. Let G′ be the graph obtained from G by removing the edges in E(C). Since C has either 0 or 2 edges at every vertex of G, each vertex in G′ also has even degree. However, G′ may not be connected. By induction hypothesis, each connected component of G′ has an Eulerian circuit. We can now construct an Eulerian circuit of G as follows. Traverse C, but when a component of G′ is entered for the first time, we detour along the Eulerian circuit of that component. The circuit ends at the vertex where we began the detour. When we complete the traversal of C, we have completed an Eulerian circuit of G.
Alternative Proof for the Sufficiency Condition: Let G be a graph with all degrees even and let W = v_0 e_0 ... e_{l−1} v_l be the longest walk in G using no edge more than once. Since W cannot be extended, all edges incident on v_l are part of W. Since all vertices in G have even degree, it must be that v_l = v_0. Thus W is a closed walk. If W is Eulerian then we are done. Otherwise, there must be an edge in E[G] \ E[W] that is incident on some vertex in W. Let this edge be e = {u, v_i}. Then the walk

u e v_i e_i ... e_{l−1} v_l e_0 v_0 e_1 ... e_{i−1} v_i

is longer than W, a contradiction.

1.3 Probability
Example. Suppose we flip two fair coins. What is the probability that both tosses give heads given that one of the flips results in heads? What is the probability that both tosses give heads given that the first coin results in heads?
Solution. We consider the following events to answer the question.
A: event that both flips give heads.
B: event that one of the flips gives heads. C: event that the first coin flip gives heads.
Let's first calculate Pr[A|B].

Pr[A|B] = Pr[A ∩ B] / Pr[B] = Pr[A] / Pr[B] = (1/4) / (3/4) = 1/3

Similarly we can calculate Pr[A|C] as follows.

Pr[A|C] = Pr[A ∩ C] / Pr[C] = Pr[A] / Pr[C] = (1/4) / (1/2) = 1/2
The above analysis also follows from a tree diagram of the two coin flips: each of the four outcomes HH, HT, TH, TT occurs with probability 1/4, and events B and C are marked on the outcomes they contain.
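To make the two conditional probabilities concrete, here is a small Python simulation sketch (an illustration only, not part of the original analysis); it estimates Pr[A|B] and Pr[A|C] by sampling two fair coin flips.

import random

def estimate_conditional_probs(trials=100_000):
    count_B = count_AB = 0   # at least one heads; both heads among those
    count_C = count_AC = 0   # first flip heads; both heads among those
    for _ in range(trials):
        first = random.random() < 0.5    # True means heads
        second = random.random() < 0.5
        both = first and second
        if first or second:              # event B occurred
            count_B += 1
            count_AB += both
        if first:                        # event C occurred
            count_C += 1
            count_AC += both
    return count_AB / count_B, count_AC / count_C

pr_A_given_B, pr_A_given_C = estimate_conditional_probs()
print(pr_A_given_B)  # close to 1/3
print(pr_A_given_C)  # close to 1/2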
The Total Probability Theorem. Consider events E and F. Consider a sample point ω ∈ E. Observe that ω belongs to either F or F̄. Thus, the set E is a disjoint union of two sets: E ∩ F and E ∩ F̄. Hence we get

Pr[E] = Pr[E ∩ F] + Pr[E ∩ F̄] = Pr[F] × Pr[E|F] + Pr[F̄] × Pr[E|F̄]
In general, if A1, A2, . . . , An form a partition of the sample space and if ∀i, Pr[Ai] > 0, then for any event B in the same probability space, we have
Pr[B] = \sum_{i=1}^{n} Pr[A_i ∩ B] = \sum_{i=1}^{n} Pr[A_i] × Pr[B|A_i]
Example. A medical test for a certain condition has arrived in the market. According to the case studies, when the test is performed on an affected person, the test comes up positive 95% of the time and yields a "false negative" 5% of the time. When the test is performed on a person not suffering from the medical condition, the test comes up negative in 99% of the cases and yields a "false positive" in 1% of the cases. If 0.5% of the population actually have the condition, what is the probability that the person has the condition given that the test is positive?
Solution. We will consider the following events to answer the question.
C: event that the person tested has the medical condition.
C̄: event that the person tested does not have the condition.
P: event that the person tested positive.
We are interested in Pr[C|P]. From the definition of conditional probability and the total probability theorem we get
Pr[C|P] = Pr[C ∩ P] / Pr[P]
        = Pr[C] Pr[P|C] / (Pr[P ∩ C] + Pr[P ∩ C̄])
        = Pr[C] Pr[P|C] / (Pr[C] Pr[P|C] + Pr[C̄] Pr[P|C̄])
        = (0.005 × 0.95) / (0.005 × 0.95 + 0.995 × 0.01)
        ≈ 0.323
This result means that 32.3% of the people who are tested positive actually suffer from the condition!
1.4 Linearity of Expectation
One of the most important properties of expectation that simplifies its computation is the linearity of expectation. By this property, the expectation of the sum of random variables equals the sum of their expectations. This is given formally in the following theorem.
Theorem. For any finite collection of random variables X_1, X_2, ..., X_n,

E[\sum_{i=1}^{n} X_i] = \sum_{i=1}^{n} E[X_i]
Example. Suppose that n people leave their hats at the hat check. If the hats are randomly returned what is the expected number of people that get their own hat back?
Solution. Let X be the random variable that denotes the number of people who get their own hat back. Let Xi, 1 ≤ i ≤ n, be the random variable that is 1 if the ith person gets his/her own hat back and 0 otherwise. Clearly,
X = X1 + X2 + X3 + . . . + Xn
By linearity of expectation we get

E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} (n − 1)!/n! = n × (1/n) = 1
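A quick way to sanity-check this answer is to simulate random hat returns; the Python sketch below (an illustration, not from the original notes) counts fixed points of a random permutation and averages over many trials.

import random

def average_fixed_points(n, trials=100_000):
    total = 0
    for _ in range(trials):
        hats = list(range(n))
        random.shuffle(hats)                 # hats[i] is the hat that person i receives
        total += sum(1 for i in range(n) if hats[i] == i)
    return total / trials

print(average_fixed_points(10))   # close to 1, regardless of n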
Example. The following pseudo-code computes the minimum of n distinct numbers that are stored in an array A. What is the expected number of times that the variable min is assigned a value if the array A is a random permutation of the n elements.
FindMin(A, n)
    min ← A[1]
    for i ← 2 to n do
        if (A[i] < min) then
            min ← A[i]
    return min
Solution. Let X be the random variable denoting the number of times that min is assigned a value. We want to calculate E[X]. Let Xi be the random variable that is 1 if min is assigned A[i] and 0 otherwise. Clearly,
X = X_1 + X_2 + X_3 + ... + X_n

Using the linearity of expectation we get

E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} Pr[X_i = 1]

Note that Pr[X_i = 1] is the probability that A[i] contains the smallest element among the elements A[1], A[2], ..., A[i]. Since the smallest of these elements is equally likely to be in any of the first i locations, we have Pr[X_i = 1] = 1/i. Hence E[X] = \sum_{i=1}^{n} 1/i, the nth harmonic number.
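The pseudocode translates directly to Python. The sketch below (for illustration; the variable names and parameters are ours) also counts assignments to min and compares the average against the harmonic number \sum_{i=1}^{n} 1/i.

import random

def find_min_count_assignments(A):
    minimum = A[0]
    assignments = 1                 # the initial assignment min <- A[1]
    for x in A[1:]:
        if x < minimum:
            minimum = x
            assignments += 1
    return minimum, assignments

n, trials = 20, 100_000
total = 0
for _ in range(trials):
    A = random.sample(range(1_000_000), n)   # n distinct numbers in random order
    total += find_min_count_assignments(A)[1]
print(total / trials)                            # close to the harmonic number H(n)
print(sum(1 / i for i in range(1, n + 1)))       # H(n) for comparison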
1.5 Probability Distributions
Tossing a coin is an experiment with exactly two outcomes: heads (“success”) with a probability of, say p, and tails (“failure”) with a probability of 1 − p. Such an experiment is called a Bernoulli trial. Let Y be a random variable that is 1 if the experiment succeeds and is 0 otherwise. Y is called a Bernoulli or an indicator random variable. For such a variable we have
E[Y] = p · 1 + (1 − p) · 0 = p = Pr[Y = 1]

Thus for a fair coin, if we consider heads as "success", then the expected value of the corresponding indicator random variable is 1/2.
A sequence of Bernoulli trials means that the trials are independent and each has a probability p of success. We will study two important distributions that arise from Bernoulli trials: the geometric distribution and the binomial distribution.
The Geometric Distribution
Consider the following question. Suppose we have a biased coin with heads probability p that we flip repeatedly until it lands on heads. What is the distribution of the number of flips? This is an example of a geometric distribution. It arises in situations where we perform a sequence of independent trials until the first success where each trial succeeds with a probability p.
Note that the sample space Ω consists of all sequences that end in H and have exactly one H. That is Ω = {H,TH,TTH,TTTH,TTTTH,...}
For any ω ∈ Ω of length i, Pr[ω] = (1 − p)^{i−1} p.
Definition. A geometric random variable X with parameter p is given by the following distribution for i = 1, 2, ...:

Pr[X = i] = (1 − p)^{i−1} p

We can verify that the geometric random variable admits a valid probability distribution as follows:

\sum_{i=1}^{\infty} (1 − p)^{i−1} p = p \sum_{i=1}^{\infty} (1 − p)^{i−1} = (p / (1 − p)) \sum_{i=1}^{\infty} (1 − p)^{i} = (p / (1 − p)) · ((1 − p) / (1 − (1 − p))) = 1

Note that to obtain the second-last term we have used the fact that \sum_{i=1}^{\infty} c^{i} = c / (1 − c), |c| < 1.

Let's now calculate the expectation of a geometric random variable, X. We can do this in several ways. One way is to use the definition of expectation.

E[X] = \sum_{i=0}^{\infty} i Pr[X = i]
     = \sum_{i=0}^{\infty} i (1 − p)^{i−1} p
     = (p / (1 − p)) \sum_{i=0}^{\infty} i (1 − p)^{i}
     = (p / (1 − p)) · ((1 − p) / (1 − (1 − p))^2)        (∵ \sum_{k=0}^{\infty} k x^{k} = x / (1 − x)^2, for |x| < 1)
     = (p / (1 − p)) · ((1 − p) / p^2)
     = 1/p

Another way to compute the expectation is to note that X is a random variable that takes on non-negative values. From a theorem proved in last class we know that if X takes on only non-negative values then

E[X] = \sum_{i=1}^{\infty} Pr[X ≥ i]
Using this result we can calculate the expectation of the geometric random variable X. For the geometric random variable X with parameter p,

Pr[X ≥ i] = \sum_{j=i}^{\infty} (1 − p)^{j−1} p = (1 − p)^{i−1} p \sum_{j=0}^{\infty} (1 − p)^{j} = (1 − p)^{i−1} p × (1 / (1 − (1 − p))) = (1 − p)^{i−1}

Therefore

E[X] = \sum_{i=1}^{\infty} Pr[X ≥ i] = \sum_{i=1}^{\infty} (1 − p)^{i−1} = (1 / (1 − p)) \sum_{i=1}^{\infty} (1 − p)^{i} = (1 / (1 − p)) · ((1 − p) / (1 − (1 − p))) = 1/p
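As an illustration (a Python sketch, not part of the original notes, with an arbitrary choice of p), the snippet below samples a geometric random variable by flipping a biased coin until the first success and compares the sample mean with 1/p.

import random

def geometric_sample(p):
    flips = 1
    while random.random() >= p:   # failure with probability 1 - p
        flips += 1
    return flips

p, trials = 0.25, 100_000
mean = sum(geometric_sample(p) for _ in range(trials)) / trials
print(mean, 1 / p)   # the two values should be close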
Binomial Distributions
Consider an experiment in which we perform a sequence of n coin flips in which the probability of obtaining heads is p. How many flips result in heads?
If X denotes the number of heads that appear then

Pr[X = j] = \binom{n}{j} p^{j} (1 − p)^{n−j}

Definition. A binomial random variable X with parameters n and p is defined by the following probability distribution on j = 0, 1, 2, ..., n:

Pr[X = j] = \binom{n}{j} p^{j} (1 − p)^{n−j}

We can verify that the above is a valid probability distribution using the binomial theorem as follows:

\sum_{j=0}^{n} \binom{n}{j} p^{j} (1 − p)^{n−j} = (p + (1 − p))^{n} = 1
What is the expectation of a binomial random variable X? We can calculate E[X] in two ways. We first calculate it directly from the definition.

E[X] = \sum_{j=0}^{n} j \binom{n}{j} p^{j} (1 − p)^{n−j}
     = \sum_{j=1}^{n} j (n! / (j!(n − j)!)) p^{j} (1 − p)^{n−j}
     = \sum_{j=1}^{n} (n! / ((j − 1)!(n − j)!)) p^{j} (1 − p)^{n−j}
     = np \sum_{j=1}^{n} ((n − 1)! / ((j − 1)!((n − 1) − (j − 1))!)) p^{j−1} (1 − p)^{(n−1)−(j−1)}
     = np \sum_{k=0}^{n−1} ((n − 1)! / (k!((n − 1) − k)!)) p^{k} (1 − p)^{(n−1)−k}
     = np \sum_{k=0}^{n−1} \binom{n − 1}{k} p^{k} (1 − p)^{(n−1)−k}
     = np

The last equation follows from the binomial expansion of (p + (1 − p))^{n−1}.
We can obtain the result in a much simpler way by using the linearity of expectation. Let X_i, 1 ≤ i ≤ n, be the indicator random variable that is 1 if the ith flip results in heads and is 0 otherwise. We have X = \sum_{i=1}^{n} X_i. By the linearity of expectation we have

E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p = np
What is the variance of the binomial random variable X? Since X = \sum_{i=1}^{n} X_i, and X_1, X_2, ..., X_n are independent, we have

Var[X] = \sum_{i=1}^{n} Var[X_i] = \sum_{i=1}^{n} (E[X_i^2] − E[X_i]^2) = \sum_{i=1}^{n} (p − p^2) = np(1 − p)
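As a quick empirical check of the two formulas (a sketch, not part of the original notes; the values of n and p are arbitrary), the snippet below simulates X as a sum of n Bernoulli(p) indicators and compares the sample mean and variance with np and np(1 − p).

import random

def binomial_sample(n, p):
    return sum(1 for _ in range(n) if random.random() < p)

n, p, trials = 50, 0.3, 100_000
samples = [binomial_sample(n, p) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, n * p)            # sample mean vs. np
print(var, n * p * (1 - p))   # sample variance vs. np(1 - p)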
Examples
Example. Consider a sequence of n tosses of a fair coin. Define a “run” to be a sequence of consecutive tosses that all produce the same result (i.e. heads or tails). For example, if n = 7, then the sequence of outcomes
HHHTTHT
has four runs, namely (H H H), (T T), (H), and (T). Given a sequence of n tosses of a fair coin, what is
the expected number of runs you will see?
Solution. Let X be the random variable denoting the number of runs we see. We wish to compute E[X]. Define the indicator random variables

X_i = 1 if there is a run beginning at index i, and X_i = 0 otherwise.

Note that X_1 = 1 since the first outcome always begins a new run. Moreover, we see that

X = X_1 + ... + X_n = \sum_{i=1}^{n} X_i

and so by the linearity of expectation,

E[X] = E[\sum_{i=1}^{n} X_i] = \sum_{i=1}^{n} E[X_i]

Recall that for an indicator random variable Y, E[Y] = 1 · P(Y = 1) + 0 · P(Y = 0) = P(Y = 1), hence for each i, we have

E[X_i] = P(X_i = 1)

Now a run begins at index i whenever the (i − 1)-th outcome and the i-th outcome are different. For i > 1, this occurs with probability 1/2. Hence

E[X_i] = 1 for i = 1, and E[X_i] = 1/2 for i > 1.

Thus

E[X] = \sum_{i=1}^{n} E[X_i] = 1 + \sum_{i=2}^{n} E[X_i] = 1 + (n − 1)/2 = (n + 1)/2

Example. Find the expected length of an arbitrary run.
Solution. Let X be the length of an arbitrary run. The run can either be a run of heads or a run of tails. We thus condition on this case: let Y be the random variable that is 1 if the run is a run of heads and 0 if it is a run of tails. Then P(Y = 1) = P(Y = 0) = 1/2. Computing the expectation of X by conditioning on Y gives:

E[X] = E[X|Y = 1] P(Y = 1) + E[X|Y = 0] P(Y = 0)

Given that Y = 1, i.e. that the run is a run of heads, X is a geometric random variable with parameter p = 1/2, since finding the length of the run starting with a head is equivalent to finding the number of coin tosses we make until seeing the first tail. Hence E[X|Y = 1] = 2, since the expected value of a geometric random variable with parameter p is 1/p. Symmetrically, given that Y = 0, X is also a geometric random variable with parameter p = 1/2. Hence E[X|Y = 0] = 2 as well. This gives:

E[X] = 2 · (1/2) + 2 · (1/2) = 2
Note that because of the symmetry, we could have assumed WLOG that the run started with heads and proceeded from there. However, in the case of a biased coin, we don’t have symmetry and thus conditioning is the way to go.
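Both answers are easy to check empirically. The Python sketch below (an illustration only, with arbitrary parameters) counts runs in random sequences of fair coin flips, and separately measures the length of a run starting from a fresh flip, matching the calculation above.

import random

def count_runs(flips):
    runs = 1
    for i in range(1, len(flips)):
        if flips[i] != flips[i - 1]:   # a new run begins at index i
            runs += 1
    return runs

def first_run_length():
    first = random.choice("HT")
    length = 1
    while random.choice("HT") == first:   # keep flipping while the run continues
        length += 1
    return length

n, trials = 20, 100_000
avg_runs = sum(count_runs([random.choice("HT") for _ in range(n)]) for _ in range(trials)) / trials
avg_length = sum(first_run_length() for _ in range(trials)) / trials
print(avg_runs)     # close to (n + 1) / 2
print(avg_length)   # close to 2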
Example. We are trying to collect n different coupons that can be obtained by buying cereal boxes. The objective is to collect at least one coupon of each of the n types. Assume that each cereal box contains exactly one coupon and any of the n coupons is equally likely to occur. How many cereal boxes do we expect to buy to collect at least one coupon of each type?
Solution. Let the random variable X denote the number of cereal boxes bought until we have at least one coupon of each type. We want to compute E[X]. Let X_i be the random variable denoting the number of boxes bought to get the ith new coupon. Clearly,

X = X_1 + X_2 + X_3 + ... + X_n

Using the linearity of expectation we have

E[X] = E[X_1] + E[X_2] + E[X_3] + ... + E[X_n]

What is the distribution of random variable X_i? Observe that the probability of obtaining the ith new coupon is given by

p_i = (n − (i − 1)) / n = (n − i + 1) / n

Thus the random variable X_i, 1 ≤ i ≤ n, is a geometric random variable with parameter p_i, so

E[X_i] = 1/p_i = n / (n − i + 1)

Applying linearity of expectation, we get

E[X] = n/n + n/(n − 1) + n/(n − 2) + ... + n/2 + n/1 = n \sum_{i=1}^{n} 1/i

The summation \sum_{i=1}^{n} 1/i is known as the harmonic number H(n), and H(n) = ln n + c, for some constant c < 1.

Hence the expected number of boxes needed to collect n coupons is about nH(n) < n(ln n + 1).
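The nH(n) estimate is easy to confirm by simulation. The following Python sketch (an illustration, not from the original notes; the value of n is arbitrary) draws uniformly random coupons until all n types have been seen.

import random
from math import log

def boxes_needed(n):
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(random.randrange(n))   # each box contains a uniformly random coupon type
        boxes += 1
    return boxes

n, trials = 50, 10_000
average = sum(boxes_needed(n) for _ in range(trials)) / trials
harmonic = sum(1 / i for i in range(1, n + 1))
print(average, n * harmonic, n * (log(n) + 1))   # the average is close to nH(n)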
2 Gale-Shapley Stable Matching
2.1 Background and Intuition
Many problems in the real world involve pairing up two groups of people or things in such a way that everyone is “happy.” For example, one may consider matching applicants to open job positions, performers to gigs, patients to hospital beds, and so on. What does it mean for everyone to be happy? Consider the following scenario: you and your friend are using career services to apply to internships at companies A and B. You really want to work for A, but your friend wants to work for B. On the other hand, both company A and B think you would be a much better employee than your friend.
Unfortunately, due to some logistical mix-up at the career services center, you are matched with company B and your friend is matched with company A. Clearly, you are unhappy, as is company A. We call this situation an instability. It is unstable in the sense that there’s nothing stopping company A from firing your friend and offering you the job. You would of course accept (since company A was your top choice), but now your friend has no job, and company B has no employee. Moreover, it is not immediately clear that just forcing company B to hire your friend solves the problem: what if multiple companies/applicants are involved? The situation could spiral out of control.
One of the basic problems with the above process is that it is not self-enforcing: if people are allowed to act in their self-interest, then it risks breaking down.
As another example, suppose that career services instead matched you with A and your friend with B. In this case, you and your friend are happy, as is company A. Unfortunately, company B is unhappy. However, in this case there is no risk of spiraling out of control: company B may try to reach out to you to make you an offer, but you are happy where you are. In this case, the matching is stable.
Gale and Shapley studied scenarios like these and asked the following question: Given a set of preferences among employers and applicants, is it possible to assign applicants to employers so that for every employer E and every applicant A who is not scheduled to work for E, at least one of the following is true:
(1) E prefers every one of its accepted applicants to A
(2) A prefers her current situation over working for employer E
If this holds, the outcome is stable. That is, individual self-interest will prevent any applicant/employer deal from being made behind the scenes.
2.2 Formulating the Problem
In first analyzing this question, we will make some simplifications. The employer/applicant scenario has some significant asymmetries. For example:
- Employers typically look for multiple applicants, while applicants seek only a single employer
- Each applicant may not apply to every company
- The number of employers and applicants may differ
These notes were adapted from Kleinberg and Tardos’ Algorithm Design
We will make the following assumptions: each of n applicants applies to each of n companies, and each company wants a single applicant. Gale and Shapley rephrased the problem in terms of a so-called stable marriage: each of n men and n women will be paired up for marriage.
Formally:
Consider a set M = {m1, ..., mn} of n men and a set W = {w1, ..., wn} of n women. Let M × W denote the
set of all possible ordered pairs of the form (m,w) where m ∈ M and w ∈ W.
Definition. A matching S is a subset of ordered pairs in M × W such that each member of M and each member of W appears in at most one pair in S.

A matching S′ is called a perfect matching if it is a matching and has the property that each member of M and each member of W appears in exactly one pair of S′.

Intuitively, a perfect matching corresponds to a way of pairing off the men and the women in such a way that everyone ends up married to somebody of the opposite gender, and nobody is married to more than one person. There is neither singlehood nor polygamy.

Now we can add the notion of preferences to this setting.

Definition. Each man m ∈ M ranks all the women. There are no ties allowed. We say that m prefers w to w′ if m ranks w higher than w′. We will refer to the ordered ranking of m as his preference list.

Each woman, analogously, ranks all the men.

Note: There are no ties allowed in any man or woman's preference lists. As we go through the remainder of the analysis, you should think about which statements require this constraint and which do not.

Now that we've formulated the problem, how do we express the problem we faced in the introduction in this new notation? Using the motivating example as guidance, we should be worried about the following situation: there are two pairs, (m, w) and (m′, w′) in S such that m prefers w′ to w and w′ prefers m to m′. In this case, there is nothing to stop m and w′ from abandoning their current partners and heading off together: this set of marriages is not self-enforcing.

Definition. Using the notation above, the pair (m, w′) is called an instability with respect to the matching S. That is, (m, w′) does not belong to S, but each of m and w′ prefers the other to their partner in S.
Definition. A matching S is stable if:
1. it is perfect
2. there are no instabilities with respect to S
Our goal, then, is to find a stable matching. Two questions immediately spring to mind:
1. Does there exist a stable matching for every set of preference lists?
2. Given a set of preference lists, can we efficiently construct a stable matching if there is one?
Figure 2.1: Perfect matching S with instability (m, w′)
2.3 Examples
To illustrate the definitions from the previous section, consider the following two very simple instances of the stable matching problem:
First, suppose we have a set of two men, M = {m, m′} and a set of two women W = {w, w′}. The preference lists are as follows:
m prefers w to w′
m′ prefers w to w′
w prefers m to m′
w′ prefers m to m′
Intuitively, this case represents complete agreement: the men agree on the order of the women and the women agree on the order of the men. There is a unique stable matching in this case, consisting of the pairs (m, w) and (m′,w′). The other perfect matching, consisting of the pairs (m′,w) and (m,w′) is NOT stable: the pair
(m, w) would form an instability with respect to this matching. Here’s a more interesting example: suppose the preferences are:
m prefers w to w′
m′ prefers w′ to w
w prefers m′ to m
w′ prefers m to m′
In this case, the two men’s preferences mesh perfectly with each other (they rank different women first), and the two women’s preferences likewise mesh perfectly with each other. However, the men’s preferences clash completely with the women’s preferences.
In this second example, there are two different stable matchings. The matching consisting of the pairs (m, w) and (m′, w′) is stable because both men are as happy as possible and so neither would leave their matched partner. But the matching consisting of the pairs (m′, w) and (m, w′) is also stable, for the complementary reason that both women are as happy as possible. This is an important point to remember: it is possible for an instance to have more than one stable matching.
2.4 Designing an Algorithm
We still have two questions to answer, namely is there always a stable matching and how do we find one if/when it exists? The following provides the answer:

Proposition 1. Given a set of n men and a set of n women, there exists a stable matching for every set of preference lists among the men and women.

We will prove this claim by giving an algorithm that takes the preference lists and constructs a stable matching. This will answer both of the above questions simultaneously.
Before constructing the algorithm, let’s summarize the basic ideas that are relevant to the problem:
Initially, everyone is unmarried. Suppose an unmarried man m chooses the woman w who ranks highest on his preference list and proposes to her. Can we declare immediately that (m, w) will be one of the pairs in our final stable matching? Not necessarily: at some point in the future, a man m′ whom w prefers more than m may propose to her. On the other hand, it may be dangerous for w to reject m right away: she may never receive a proposal from someone she ranks as highly as m. So a natural idea would be to have the pair (m, w) enter into an intermediate state—engagement.
Suppose we are now at a state in which some men and women are free—not engaged—and some are engaged. The next step could look like this: An arbitrary man m chooses the highest-ranked woman w to whom he has not yet proposed, and he proposes to her. If w is also free, then m and w become engaged. Otherwise, w is already engaged to some other man m′. In this case, she determines which of m or m′ ranks higher on her preference list: this man becomes engaged to w and the other man becomes free.
Finally, the algorithm will terminate when no one is free. At this moment, all engagements are declared final, and the resulting perfect matching is returned
Here’s a more concise description of the Gale-Shapley Algorithm:
Gale-Shapley Algorithm for Stable Matching
Input: A set of n men, M, a set of n women, W, and the associated preference lists
Output: A stable matching
Initially all m ∈ M and w ∈ W are free
While there is a free man m who hasn't proposed to every woman
    Choose such a man m
    Let w be the highest ranked woman in m's preference list
        to whom he has not yet proposed
    If w is free then
        (m, w) become engaged
    Else w is currently engaged to m′
        If w prefers m′ to m then
            m remains free
        Else w prefers m to m′
            (m, w) become engaged
            m′ becomes free
Return the set S of engaged pairs
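A direct Python translation of this pseudocode might look as follows (a sketch for illustration; the data structures and names are ours, not from the notes). Men and women are numbered 0 to n−1, and each preference list is ordered from most to least preferred.

def gale_shapley(men_prefs, women_prefs):
    """men_prefs[m] and women_prefs[w] are preference lists, most preferred first."""
    n = len(men_prefs)
    # rank[w][m] = position of man m on woman w's list (lower is better)
    rank = [{m: i for i, m in enumerate(prefs)} for prefs in women_prefs]
    next_proposal = [0] * n        # index of the next woman on each man's list
    fiance = [None] * n            # fiance[w] = man w is engaged to, or None
    free_men = list(range(n))

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]     # highest-ranked woman not yet proposed to
        next_proposal[m] += 1
        if fiance[w] is None:                  # w is free: (m, w) become engaged
            fiance[w] = m
        elif rank[w][m] < rank[w][fiance[w]]:  # w prefers m to her current partner
            free_men.append(fiance[w])         # the old partner becomes free
            fiance[w] = m
        else:                                  # w prefers her current partner
            free_men.append(m)

    return [(fiance[w], w) for w in range(n)]

# The second instance from Section 2.3, with men 0, 1 and women 0, 1:
print(gale_shapley([[0, 1], [1, 0]], [[1, 0], [0, 1]]))   # [(0, 0), (1, 1)]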
2.5 Runtime of the GS Algorithm
Before proving the correctness of the algorithm, we first prove its runtime. That is, we will answer the question: how many iterations are needed for the algorithm to terminate?
Proposition 2. The GS algorithm terminates after at most n² iterations of the while loop.
Proof. A useful strategy for upper-bounding the running time of an algorithm is to find a measure of progress. That is, we want to find a precise way of saying that each step taken by the algorithm brings it closer to termination.
In the GS algorithm, each iteration consists of some man proposing (for the only time) to a woman he has never proposed to before. So if we let P(t) denote the set of pairs (m, w) such that m has proposed to w by the end of iteration t, we see that for all t, the size of P(t + 1) is strictly greater than the size of P(t). But there are only n² possible pairs of men and women in total, so the value of P(·) can increase at most n² times over the course of the algorithm. Thus there can be at most n² iterations.
An important note about the above algorithm: there are many quantities that would not have worked well as a progress measure for the GS algorithm, since they need not strictly increase in each iteration. For example, the number of free individuals could remain constant from one iteration to the next, as could the number of engaged pairs. Thus these quantities could not be used directly in giving an upper bound on the maximum possible number of iterations, in the style of the above proof.
2.6 Correctness of the GS Algorithm
While we showed above that the GS algorithm terminates, it is not obvious at all that this algorithm returns a stable matching, or even a perfect matching. We will prove through a series of intermediate facts that it does indeed return a stable matching.
Consider the view of a woman w during the execution of the GS algorithm. For a while, no one has proposed to her, and she is free. Then a man m may propose to her, and she becomes engaged. As time goes on, she may receive additional proposals, accepting those that increase the rank of her partner. Hence:

Lemma 1. w remains engaged from the point at which she receives her first proposal. The sequence of partners to which she is engaged gets better and better (in terms of her preference list)

The view of a man m during the execution of the algorithm is very different. He is free until he proposes to the highest ranked woman on his list. At this point, he may or may not become engaged. As time goes on, he may alternate between being free and being engaged, however it is easy to see that:

Lemma 2. The sequence of women to whom m proposes gets worse and worse (in terms of his preference list)

Now, let us establish that the set S returned at the termination of the algorithm is in fact a perfect matching. Why is this not immediately obvious? Essentially, we have to show that no man can "fall off" the end of his preference list. That is, the only way for the while loop to exit is for there to be no free man. In this case, the set of engaged couples would indeed be a perfect matching. The following lemma and its proof establish this claim.

Lemma 3. If m is free at some point in the execution of the algorithm, then there is a woman to whom he has not yet proposed

Proof. Seeking contradiction, suppose there comes a point when m is free but has already proposed to every woman. By Lemma 1, each of the n women is engaged at this point in time. Since the set of engaged pairs forms a matching, there must also be n engaged men at this point in time. But there are only n men in total, and m is not engaged, so this is a contradiction.

Lemma 4. The set S returned at termination is a perfect matching

Proof. The set of engaged pairs always forms a matching. Let us suppose for the sake of contradiction that the algorithm terminates with a free man m. At termination, it must be the case that m has already proposed to every woman, for otherwise the while loop would not have exited. But this contradicts Lemma 3, which says that there cannot be a free man who has proposed to every woman.

Now we have the tools to prove the main property of the algorithm, namely that it results in a stable matching.

Proposition 3. Consider an execution of the GS algorithm that returns a set of pairs S. The set S is a stable matching.

Proof. We know by Lemma 4 that S is a perfect matching. Thus, it suffices to show S has no instabilities. Seeking contradiction, suppose we had the following instability: there are two pairs (m, w) and (m′, w′) in S such that

m prefers w′ to w and w′ prefers m to m′

In the execution of the algorithm that produced S, the last proposal of m was, by definition, to w. Now we ask: Did m propose to w′ at some earlier point in this execution? If he didn't, then w must occur higher than w′ on m's preference list, contradicting our assumption that m prefers w′ to w. If he did, then he was rejected by w′ in favor of some other man m′′ whom w′ prefers to m. m′ is the final partner of w′, so either m′′ = m′ or, by Lemma 1, w′ prefers her final partner m′ to m′′. Either way, this contradicts our assumption that w′ prefers m to m′.

It follows that S is a stable matching.
2.7 Extensions
Now that we’ve shown the GS algorithm constructs a stable matching, we can now consider some further questions about the behavior of the GS algorithm and its relation to the properties of different stable matchings.
To begin, recall that we saw an example earlier in which there could be multiple stable matchings. The preference lists in this example were as follows:
m prefers w to w′
m′ prefers w′ to w
w prefers m′ to m
w′ prefers m to m′
Now, in any execution of the GS algorithm, m will become engaged to w, m′ will become engaged to w′ (perhaps in the other order), and things will stop there. Thus the other stable matching, consisting of the pairs (m′,w) and (m,w′) is not attainable from an execution of the GS algorithm in which the men propose.
On the other hand, it would be reached if we ran a version of the algorithm in which the women propose. You can imagine that in larger examples (i.e. with n > 2), it is possible to have an even larger collection of possible stable matchings, many of them not achievable by the GS algorithm.
This example shows a certain "unfairness" in the GS algorithm favoring men. If the men's preferences mesh perfectly (they all list different women as their first choice), then in all runs of the GS algorithm, all men end up matched with their first choice, independent of the preferences of the women. If the women's preferences clash completely with the men's preferences (as was the case in this example), then the resulting stable matching is as bad as possible for the women. So this simple set of preference lists compactly summarizes a world in which someone is destined to end up unhappy: women are unhappy if men propose, and men are unhappy if women propose.
Let’s now see if we can generalize this notion of “unfairness”.
Note that the above example illustrates that the GS algorithm is actually underspecified: as long as there is a free man, we are allowed to choose any free man to make the next proposal. Different choices specify different executions of the algorithm. This is why, to be careful, we stated in Proposition 3 to “Consider an execution of the GS algorithm that returns a set of pairs S” rather than “Consider the set S returned by the GS algorithm.”
This raises a natural question: Do all executions of the GS algorithm yield the same matching? It turns out, the answer is “Yes.”
All Executions Yield the Same Matching
It turns out that the easiest and most informative way to prove this claim is to uniquely characterize the matching that is obtained from the GS algorithm and then show that all executions result in the matching with this characterization.
First, let us introduce some relevant definitions:
Definition. A woman w is a valid partner of a man m if there is a stable matching that contains the pair (m, w).
A woman w is the best valid partner of a man m, denoted best(m), if w is a valid partner of m and no woman whom m ranks higher than w is a valid partner of his.
Define the set
S∗ = {(m, best(m)) | m ∈ M}
That is, the set where each man is paired with his best valid partner. We have the following result:
Proposition 4. Every execution of the GS algorithm results in the set S∗.
Before proving this, let’s appreciate this statement. First of all, it is not obvious at all that S∗ is even a matching, much less a stable matching. After all, why couldn’t two men have the same best valid partner? Second, the result shows that the GS algorithm gives the best possible outcome for every man simultaneously: there is no stable matching in which any of the men could have hoped to do better. Lastly, it answers the question above by showing that the order of proposals in the GS algorithm has absolutely no effect on the final outcome.
Despite all this, the proof is not so difficult:
Proof. Seeking contradiction, suppose that some execution ε of the GS algorithm results in a matching S in which some man is paired with a woman who is not his best valid partner. Since men propose in decreasing order of preference, this means that some man is rejected by a valid partner during the execution ε of the algorithm. So consider the first moment during the execution ε in which some man, say m, is rejected by a valid partner w. Again, since men propose in decreasing order of preference, and since this is the first time such a rejection has occurred, it must be that w is m’s best valid partner, i.e. w = best(m).
The rejection of m by w may have happened either because m proposed and was turned down in favor of w’s existing partner, or because w broke her engagement to m in favor of a better proposal. But either way, at this moment, w forms or continues an engagement with a man m′ whom she prefers to m.
Since w is a valid partner of m, there exists some stable matching S′ containing the pair (m, w). Now we ask: Who is m′ paired with in the matching S′? Suppose it is a woman w′ ≠ w.
Since the rejection of m by w was the first rejection of a man by a valid partner in the execution ε, it must be that m′ had not been rejected by any valid partner at the point in ε when he became engaged to w. Since he proposed in decreasing order of preference, and since w′ is clearly a valid partner of m′, it must be that m′ prefers w to w′. But we have already seen that w prefers m′ to m, for in execution ε she rejected m in favor of m′. Since (m′, w) ∉ S′, it follows that (m′, w) is an instability in S′.
This contradicts our claim that S′ is stable, and hence contradicts our initial assumption.
So for the men, the GS algorithm is ideal. Unfortunately the same cannot be said for the women.
Definition. m is a valid partner of a woman w if there is a stable matching that contains the pair (m, w).
m is the worst valid partner of w, denoted worst(w), if m is a valid partner of w and no man whom w ranks lower than m is a valid partner of hers.
Proposition 5. In the stable matching S∗, each woman is paired with her worst valid partner. That is, S∗ = {(m, best(m)) | m ∈ M} = {(worst(w), w) | w ∈ W}
Proof. Seeking contradiction, suppose there were a pair (m, w) in S∗ such that m is not the worst valid partner of w. Then there is a stable matching S′ in which w is paired with a man m′ whom she likes less than m. In S′, m is paired with a woman w′ ≠ w. Since w is the best valid partner of m, and w′ is a valid partner of m, we see that m prefers w to w′.
But from this it follows that (m,w) is an instability in S′, contradicting the claim that S′ is stable, and hence contradicting our initial assumption.
Thus, we find that the simple example above, in which the men’s preferences clashed with the women’s, hinted at a very general phenomenon: for any input, the side that does the proposing in the GS algorithm ends up with the best possible stable matching (from their perspective), while the side that does not do the proposing correspondingly ends up with the worst possible stable matching.
Note: The above analysis was performed under the assumption that there were no ties in any of the preference lists, and that the number of men and women were equal. What parts of the analysis would change if these assumptions didn’t hold?
3 Greatest Common Divisor

3.1 Definitions
We first begin by introducing some intuitive definitions:
Definition. Let a, b ∈ Z. We call an integer d a common divisor of a and b provided d | a and d | b. For example, the common divisors of 24 and 30 are −6, −3, −2, −1, 1, 2, 3, and 6.

Definition. Let a, b ∈ Z. We call an integer d the greatest common divisor of a and b provided
(1) d is a common divisor of a and b AND
(2) if d′ is any common divisor of a and b, then d′ ≤ d

We generally denote the greatest common divisor of a and b as gcd(a, b).

Going back to the example above, we have gcd(24, 30) = 6. Note also that gcd(−24, −30) = 6. If two numbers have a gcd, then it is unique. This justifies the use of the phrase "the greatest common divisor."
3.2 Calculating the GCD
A naive way to compute the GCD of two numbers is to simply list all of their common factors and choose the largest. This brute force solution gives rise to the following algorithm:
1. Let a and b be two positive integers
2. For every positive integer k from 1 to min(a, b), see whether k | a and k | b. If so, save that number k in
a list l
3. Return the largest number in the list l as gcd(a, b).
While this algorithm certainly works, its runtime is abysmal. Even for moderately large numbers (for example, a = 34902 and b = 34299883), the algorithm needs to perform many, many divisions.
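In Python, the brute force method above might look like the following sketch (the function name is ours, chosen for illustration):

def gcd_brute_force(a, b):
    # collect every k from 1 to min(a, b) that divides both a and b
    common = [k for k in range(1, min(a, b) + 1) if a % k == 0 and b % k == 0]
    return common[-1]   # the largest common divisor found

print(gcd_brute_force(24, 30))   # 6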
A faster and more clever algorithm for finding the GCD of two numbers a and b was first described in Euclid’s Elements over 2000 years ago. Not only is it extremely fast, but it is also easy to implement as a computer program.
The central idea behind the algorithm is based on the following proposition:
Proposition 1. Let a and b be positive integers and let c = a mod b. Then gcd(a, b) = gcd(b, c)
In other words, for positive integers a and b, we have
gcd(a, b) = gcd(b, a mod b)
These notes were adapted from Scheinerman’s Mathematics: A Discrete Introduction
Proof. As above, let a and b be positive integers and let c = a mod b. This means that a = qb + c where 0 ≤ c < b. Let d = gcd(a, b) and e = gcd(b, c). We must show that d = e. To do this, we will prove that d ≤ e and e ≤ d.
First we will show that d ≤ e. Since d = gcd(a, b), we know that d | a and d | b. Hence, since c = a − qb we must have d | c. Thus d is a common divisor of b and c. Since e is the greatest common divisor of b and c, we have d ≤ e.
Now, we will show that e ≤ d. Since e = gcd(b, c) we know e | b and e | c. Since a = qb + c, we must have that e | a as well. Since e | a and e | b, e is a common divisor of a and b. However, since d is the greatest common divisor of a and b, this tells us that e ≤ d.
We have shown that d ≤ e and e ≤ d, and hence d = e. That is, gcd(a,b) = gcd(b,c).
To illustrate how the above proposition enables us to calculate GCDs efficiently, we compute gcd(689, 234). The brute force algorithm described earlier would have us try all possible common divisors from 1 to 234 and select the largest, requiring hundreds of divisions!
Instead, we can use the above proposition to make life easier. Let a = 689 and b = 234. Performing one division, we see that c = 689 mod 234 = 221. The proposition tells us that to find gcd(689, 234), it is sufficient to find gcd(234, 221). Let’s record this step:
689 mod 234 = 221 ⇒ gcd(689, 234) = gcd(234, 221)
Now all we have to do is calculate gcd(234, 221). Using the same idea now with a = 234 and b = 221, we have
c = 234 mod 221 = 13. Note that so far we have only performed two divisions. Let’s record this step:
234 mod 221 = 13 ⇒ gcd(234, 221) = gcd(221, 13)
Now the problem is reduced to gcd(221, 13). Note that the numbers are significantly smaller than the original 689 and 234. To complete the computation, we perform a third and final division to find that 221 mod 13 = 0, i.e. 13 | 221. This tells us that the greatest common divisor of 13 and 221 is 13. Recording this final step gives:
221 mod 13 = 0 ⇒ gcd(221, 13) = 13
We are finished! After only three divisions, we have found that
gcd(689, 234) = gcd(234, 221) = gcd(221, 13) = 13
The steps we just performed are precisely the Euclidean algorithm. Here is a formal, general description:
Euclid’s Algorithm for Greatest Common Divisor
Input: Positive integers a and b
Output: gcd(a, b)
(1) Let c = a mod b
(2) If c = 0, return b
(3) Otherwise (c ≠ 0), calculate and return gcd(b, c)
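This three-step description translates almost directly into code. Below is a short Python sketch (our own illustration, not part of the original notes):

# A direct Python translation of Euclid's algorithm as described above.
def gcd(a, b):
    c = a % b
    if c == 0:
        return b
    return gcd(b, c)

print(gcd(689, 234))  # 13
print(gcd(63, 75))    # 3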
Let’s see how the algorithm works for a0 = 63 and b0 = 75.
The first step calculates c0 = a0 mod b0 = 63 mod 75 = 63.
Next, since c0 ≠ 0, we now compute gcd(b0, c0) = gcd(75, 63).
Now we restart the process with a1 = 75 and b1 = 63. We find c1 = 75 mod 63 = 12. Since 12 ≠ 0, we now compute gcd(b1, c1) = gcd(63, 12).
We restart again with a2 = 63, b2 = 12. We find c2 = 63 mod 12 = 3. Since 3 ≠ 0, we now compute gcd(b2, c2) = gcd(12, 3).
With a3 = 12 and b3 = 3, we find c3 = 12 mod 3 = 0. Hence we return b3 = 3 and we are finished.
Here is an overview of the calculation in chart form:
a    b    c
63   75   63
75   63   12
63   12   3
12   3    0
Note how the answer is found using only four divisions.
Another way to visualize the computation is by using a list. The first two entries are a and b. Now we extend the list by computing the mod of the last two entries of the list. When we reach 0 we stop. The next-to-last entry of the list is the GCD of a and b. In this example, this process would look like:
63 75
63 75 63
63 75 63 12
63 75 63 12 3
63 75 63 12 3 0
and so we would return 3, just as before.
3.3 Correctness of Euclid’s Algorithm
To prove the correctness of Euclid’s algorithm, we will use the “smallest-counterexample” method. This is a proof technique closely related to induction and sometimes useful for proving the correctness of recursive algorithms. It works by assuming the algorithm fails on some input, then considering the smallest such input, then proving that, in fact, there must be some smaller input for which the algorithm failed, giving a contradiction.
Proposition 2. Euclid’s algorithm as described above correctly computes gcd(a, b) for any positive integers a and b.
Proof. Suppose for the sake of contradiction that Euclid’s algorithm failed to correctly compute the GCD. Let a and b be two positive integers for which the algorithm failed, and let them be such that a + b is as small as possible.
It may be the case that a < b. If this is so, then the first pass of Euclid’s algorithm will simply interchange the values of a and b (as we saw above when computing gcd(63, 75)), since if a < b, then a mod b = a and so gcd(b, a mod b) = gcd(b, a).
Hence, WLOG we may assume a ≥ b.
The first step of the algorithm is to compute c = a mod b. In the case that c = 0, a mod b = 0, which implies b | a. Since b > 0 by assumption, b is clearly the largest divisor of b. Since b | a, b is the GCD of a and b. Since the algorithm returns b in this case, the algorithm returns the correct result. Thus it must be the case that c ≠ 0.
Recall, we may write a = qb + c where 0 < c < b (the inequality is strict since we now know c ̸= 0). We also have assumed WLOG that b ≤ a. Hence,
c < b ≤ a, and therefore b + c < a + b. Since a + b was chosen to be as small as possible among pairs on which the algorithm fails, the algorithm correctly computes gcd(b, c). By Proposition 1, gcd(a, b) = gcd(b, c), so the value the algorithm returns for a and b is also correct, a contradiction. Hence Euclid’s algorithm is correct.
3.4 Runtime of Euclid’s Algorithm
The key to bounding the running time of Euclid’s algorithm is that the remainder shrinks quickly.
Proposition 3. Let a, b ∈ Z with a ≥ b > 0. Let c = a mod b. Then c < a/2.
Proof. If b ≤ a/2, then c < b ≤ a/2 and we are done. Otherwise b > a/2. Then a > 0 and a − b ≥ 0 but a − 2b < 0, so the quotient when a is divided by b is 1 and the remainder is c = a − b < a − a/2 = a/2.
Since each step replaces gcd(a, b) by gcd(b, a mod b), Proposition 3 implies that the first argument is at least halved every two steps, so Euclid’s algorithm performs only O(log a) divisions.
Insertion Sort 4
4.1 Insertion Sort
Insertion sort sorts an input array A[1..n] in place: for each j from 2 to n it inserts the element A[j] into its proper position within the already sorted subarray A[1..j − 1].
Insertion-Sort(A)
1  for j = 2 to A.length
2      key = A[j]
3      // Insert A[j] into the sorted sequence A[1..j-1]
4      i = j - 1
5      while i > 0 and A[i] > key
6          A[i+1] = A[i]
7          i = i - 1
8      A[i+1] = key
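For readers who prefer running code, here is an equivalent Python sketch (0-indexed lists; our own illustration, not part of the original notes):

# A runnable Python version of Insertion-Sort, mirroring the pseudocode above.
def insertion_sort(A):
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        # shift elements larger than key one position to the right
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]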
Figure 4.1: From CLRS, the operation of Insertion-Sort on the array A = ⟨5, 2, 4, 6, 1, 3⟩. Array indices appear above the rectangles, and values stored in the array positions appear within the rectangles. (a)–(e) The iterations of the for loop of lines 1–8. In each iteration, the black rectangle holds the key taken from A[j], which is compared with the values in shaded rectangles to its left in the test of line 5. Shaded arrows show array values moved one position to the right in line 6, and black arrows indicate where the key moves to in line 8. (f) The final sorted array.
These notes were adapted from CLRS Chapter 2
Now, we must ask ourselves two questions: does this algorithm work, and does it have good performance?
4.2 Correctness of Insertion Sort
Once you figure out what Insertion-Sort is doing, you may think that it’s “obviously” correct. However, if you didn’t know what it was doing and just got the above code, maybe this wouldn’t be so obvious. Additionally, for algorithms that we’ll study in the future, it won’t always be obvious that it works, and so we’ll have to prove it. To warm us up for those proofs, let’s carefully go through a proof of correctness of Insertion-Sort.
We’ll prove the correctness of Insertion-Sort formally using a loop invariant:
At the start of each iteration of the for loop of lines 1-8, the subarray A[1..j − 1] consists of the
elements originally in A[1..j − 1], but in sorted order. To use loop invariants, we must show three things:
1. Initialization: It is true prior to the first iteration of the loop.
2. Maintenance: If it is true before an iteration of the loop, it remains true before the next iteration
3. Termination: When the loop terminates, the invariant gives us a useful property that helps show that
the algorithm is correct.
Note that this is basically mathematical induction (the initialization is the base case, and the maintenance is the inductive step).
In the case of insertion sort, we have:
Initialization: Before the first iteration (which is when j = 2), the subarray A[1..j − 1] is just the first element of the array, A[1]. This subarray is sorted, and consists of the elements that were originally in A[1..1].
Maintenance: Suppose A[1..j − 1] is sorted. Informally, the body of the for loop works by moving A[j − 1], A[j − 2], A[j − 3] and so on by one position to the right until it finds the proper position for A[j] (lines 4-7), at which point it inserts the value of A[j] (line 8). The subarray A[1..j] then consists of the elements originally in A[1..j], but in sorted order. Incrementing j for the next iteration of the for loop
then preserves the loop invariant.
Termination: The condition causing the for loop to terminate is that j > n. Because each loop iteration
increases j by 1, we must have j = n + 1 at that time. By the initialization and maintenance steps, we have shown that the subarray A[1..n + 1 − 1] = A[1..n] consists of the elements originally in A[1..n], but in sorted order. Observing that the subarray A[1..n] is the entire array, we conclude that the entire array is sorted. Hence, the algorithm is correct.
4.3 Running Time of Insertion Sort
The running time of an algorithm on a particular input is the number of primitive operations or “steps” executed. We assume that a constant amount of time is required to execute each line of our pseudocode. One line may take a different amount of time than another line, but we shall assume that each execution of the ith line takes time ci, where ci is a constant.
For each j = 2, 3, ..., n, let tj denote the number of times the while loop test in line 5 is executed for that value of j. The cost and the number of times each statement executes are:

Insertion-Sort(A)                                           cost   times
1  for j = 2 to A.length                                    c1     n
2      key = A[j]                                           c2     n − 1
3      // Insert A[j] into the sorted sequence A[1..j-1]    0      n − 1
4      i = j - 1                                            c4     n − 1
5      while i > 0 and A[i] > key                           c5     Σ_{j=2}^{n} tj
6          A[i+1] = A[i]                                    c6     Σ_{j=2}^{n} (tj − 1)
7          i = i - 1                                        c7     Σ_{j=2}^{n} (tj − 1)
8      A[i+1] = key                                         c8     n − 1

The running time of the algorithm is the sum of running times for each statement executed; a statement that takes ci steps to execute and executes n times will contribute ci·n to the total running time. (This characteristic does not necessarily hold for a resource such as memory: a statement that references m words of memory and is executed n times does not necessarily reference mn distinct words of memory.) To compute T(n), the running time of Insertion-Sort on an input of n values, we sum the products of the cost and times columns, obtaining

T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5 Σ_{j=2}^{n} tj + c6 Σ_{j=2}^{n} (tj − 1) + c7 Σ_{j=2}^{n} (tj − 1) + c8(n − 1)

So if I asked you how long an algorithm takes to run, you’d probably say, “it depends on the input you give it.” Even for inputs of a given size, an algorithm’s running time may depend on which input of that size is given. In this class we mostly consider the “worst-case running time”, which is the longest running time for any input of size n.

For insertion sort, what are the worst and best case inputs? The best case input is a sorted array, because you never have to move the elements, and the worst case input is a reverse sorted array (i.e., decreasing order).

In the best case, tj = 1 for all j, and thus we can rewrite T(n) as:

T(n) = c1·n + c2(n − 1) + c4(n − 1) + c5(n − 1) + c8(n − 1)
     = (c1 + c2 + c4 + c5 + c8)n − (c2 + c4 + c5 + c8)

We can express this running time as an + b for constants a and b that depend on the statement costs ci; it is thus a linear function of n.

In the worst case, we must compare each element A[j] with each element in the entire sorted subarray A[1..j − 1], and so tj = j for all j. Substituting Σ_{j=2}^{n} j = n(n + 1)/2 − 1 and Σ_{j=2}^{n} (j − 1) = n(n − 1)/2, we can rewrite T(n) as:

T(n) = (c5/2 + c6/2 + c7/2)n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8)n − (c2 + c4 + c5 + c8)

We can express this worst-case running time as an² + bn + c for constants a, b, and c that again depend on the statement costs ci; it is thus a quadratic function of n.
Running Time and Growth Functions 5
5.1 Measuring Running Time of Algorithms
One way to measure the running time of an algorithm is to implement it and then study its running time by executing it on various inputs. This approach has the following limitations.
It is necessary to implement and execute an algorithm to study its running time experimentally.
Experiments can be done only on a limited set of inputs and may not be indicative of the running time
on other inputs that were not included in the experiment.
It is difficult to compare the efficiency of two algorithms unless they are implemented and executed in
the same software and hardware environments.
Analyzing algorithms analytically does not require that the algorithms be implemented, takes into account all possible inputs, and allows us to compare the efficiency of two algorithms independent of the hardware and software environment.
5.2 RAM Model of Computation
The model of computation that we use to analyze algorithms is called the Random Access Machine (RAM). In this model of computation we assume that high-level operations that are independent of the programming language used take 1 time step. Such high-level operations include the following: performing arithmetic operations, comparing two numbers, assigning a value to a variable, calling a function, returning from a function, and indexing into an array. Note that each of the above high-level operations actually takes a few primitive low-level instructions whose execution time depends on the hardware and software environment, but this is bounded by a constant.
Note that in this model we assume that each memory access takes exactly one time step and we have unlimited memory. The model does not take into account whether an item is in the cache or on the disk.
While some of the above assumptions do not hold on a real computer, the assumptions are made to simplify the mathematical analysis of algorithms. Despite being simple, RAM captures the essential behavior of computers and is an excellent model for understanding how an algorithm will perform on a real computer.
5.3 Average Case and Worst Case
As we saw in class (Insertion Sort), an algorithm works faster on some inputs than others. How should we express the running time of an algorithm? Let T(n) denote the running time (number of high-level steps) of an algorithm on an input of size n. Note that there are 2^n input instances with n bits. Below are three possibilities of inputs that we will consider.
A typical input: The problem with considering a typical input is that different applications will have very different typical inputs.
Average Case: Average case analysis is challenging. It requires us to define a probability distribution on the set of inputs, which is a difficult task.
Worst Case: In this measure we consider the input on which the algorithm performs the slowest. When we say that T(n) is the worst case running time of an algorithm we mean that the algorithm takes no more than T(n) steps for any input of size n. In other words, it is an upper bound on the running time of the algorithm for any input. This measure is a nice clean mathematical definition and is easiest to analyze.
5.4 Order of Growth
To simplify our analysis we will ignore the lower-order terms in the function T(n) and the multiplicative constants in front. That is, it is the rate of growth or the order of growth that will be of interest to us. We also ignore the function for small values of n and focus on the asymptotic behavior as n becomes very large. Some reasons are as follows.
The multiplicative constants depend on how fast the machine is and on the precise definition of an operation.
Counting every operation that the algorithm takes is tedious and more work than it is worth.
It is much more significant whether the time complexity is T (n) = n2 or n3 than whether it is T (n) = 2n2
or T(n) = 3n2.
Definitions of Asymptotic Notations – O,Ω,Θ,o,ω
If f(n) is in O(g(n)) (pronounced “big-oh of g of n”), g(n) is an asymptotic upper bound for f(n).
If f(n) is in Ω(g(n)) (pronounced “omega of g of n”), g(n) is an asymptotic lower bound for f(n).
If f(n) is Θ(g(n)), g(n) is an asymptotically tight bound for f(n).
Definition. O-notation
f(n) ∈ O(g(n)) if there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0.
f(n) ∈ O(g(n)) if lim_{n→∞} f(n)/g(n) = 0 or a constant.
Definition. Ω-notation
f(n) ∈ Ω(g(n)) if there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0.
f(n) ∈ Ω(g(n)) if lim_{n→∞} f(n)/g(n) = ∞ or a constant.
Definition. Θ-notation
f(n) ∈ Θ(g(n)) if there exist positive constants c1, c2, and n0 such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for
all n ≥ n0.
Thus, we can say that f(n) ∈ Θ(g(n)) if and only if f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)).
f(n) ∈ Θ(g(n)) if lim_{n→∞} f(n)/g(n) = a nonzero constant.
Figure 5.1: From CLRS, graphic examples of the O, Ω and Θ notations. In each part, the value of n0 shown is the minimum possible value; any greater value would also work. (a) Θ-notation bounds a function to within constant factors. We write f(n) = Θ(g(n)) if there exist positive constants n0, c1, and c2 such that at and to the right of n0, the value of f(n) always lies between c1·g(n) and c2·g(n) inclusive. (b) O-notation gives an upper bound for a function to within a constant factor. We write f(n) = O(g(n)) if there are positive constants n0 and c such that at and to the right of n0, the value of f(n) always lies on or below cg(n). (c) Ω-notation gives a lower bound for a function to within a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n0 and c such that at and to the right of n0, the value of f(n) always lies on or above cg(n).

If f(n) is in o(g(n)) (pronounced “little-oh of g of n”), g(n) is an upper bound that is not asymptotically tight for f(n).

Definition. o-notation
f(n) ∈ o(g(n)) if for any positive constant c > 0, there exists a positive constant n0 > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ n0.
f(n) ∈ o(g(n)) if lim_{n→∞} f(n)/g(n) = 0.

If f(n) is in ω(g(n)) (pronounced “little-omega of g of n”), g(n) is a lower bound that is not asymptotically tight for f(n).

Definition. ω-notation
f(n) ∈ ω(g(n)) if for any positive constant c > 0, there exists a positive constant n0 > 0 such that 0 ≤ cg(n) < f(n) for all n ≥ n0.
f(n) ∈ ω(g(n)) if lim_{n→∞} f(n)/g(n) = ∞.

Note: the limit definitions only apply if the limit lim_{n→∞} f(n)/g(n) exists.

Example. Prove that 7n − 2 = Θ(n).
Solution. Note that 7n − 2 ≤ 14n. Thus 7n − 2 = O(n) follows by setting c = 14 and n0 = 1.
To show that 7n−2 = Ω(n), we need to show that there exists positive constants c and n0 such that 7n−2 ≥ cn,
foralln≥n0.Thisisthesameasshowingthatforalln≥n0,(7−c)n≥2.Settingc=1andn0 =1proves the lower bound. Hence 7n − 2 = Θ(n).
Example. Prove that 10n³ + 55n log n + 23 = O(n³).
Solution. The left hand side is at most 88n³. Thus setting c = 88 and n0 = 1 proves the claim.
Example. Prove that 360 = O(1).
Solution. This follows because 360 ≤ 360 · 1, for all n ≥ 0.
Example. Prove that 5/n = O(1/n).
Solution. This follows because 5/n ≤ 5 · (1/n), for all n ≥ 1.
Example. Prove that n²/8 − 50n = Θ(n²).
Solution. First Solution: To prove the upper bound, note that n²/8 − 50n ≤ n². More precisely, we need
0 ≤ n²/8 − 50n ≤ c2·n², for all n ≥ n0.
Setting c2 = 1 and n0 = 800 satisfies the above inequalities.
To show that n²/8 − 50n = Ω(n²), we need to show that there exist positive constants c1 and n0 such that n²/8 − 50n ≥ c1·n², for all n ≥ n0. This is the same as showing that for all n ≥ n0, (1/8 − c1)n ≥ 50, or equivalently (1 − 8c1)n ≥ 400. Setting c1 = 1/16 and n0 = 800 proves the lower bound.
Hence setting c1 = 1/16, c2 = 1, and n0 = 800 in the definition of Θ(·) proves that n²/8 − 50n = Θ(n²).
Second Solution: Note that lim_{n→∞} n²/(n²/8 − 50n) = lim_{n→∞} 8n/(n − 400) = lim_{n→∞} 8/(1 − 400/n) = 8. Thus the claim follows.
Example. Prove that lg n = O(n).
Solution. We will show using induction on n that lg n ≤ n, for all n ≥ 1.
Base Case: n = 1: the claim holds for this case as the left hand side is 0 and the right hand side is 1.
Induction Hypothesis: Assume that lg k ≤ k, for some k ≥ 1. Induction Step: We want to show that lg(k + 1) ≤ (k + 1). We have
LHS = lg(k + 1) ≤ lg(2k)
= lg 2 + lg k
≤ 1 + k (using induction hypothesis)
Example. Prove that 3n100 = O(2n).
Solution. We will show using induction on n that 3n100 ≤ 3(100100)(2n), for all n ≥ 200. That is c = 3(100100)
and n0 = 200.
Base Case: n = 200: the claim holds for this case as the left hand side is 3(100100)(2100) and the right hand
side is 3(100100)(2200).
Induction Hypothesis: Assume that 3k^100 ≤ 3(100^100)(2^k), for some k ≥ 200.
Induction Step: We want to show that 3(k + 1)^100 ≤ 3(100^100)(2^(k+1)). We have
LHS = 3(k + 1)^100
    = 3(k^100 + 100k^99 + (100 choose 2)k^98 + ··· + 1)      (using Binomial Theorem)
    ≤ 3(k^100 + 100k^99 + 100²k^98 + ··· + 100^100·k^0)
    = 3k^100 (1 + (100/k) + (100/k)² + (100/k)³ + ··· + (100/k)^100)
    ≤ 3(100^100)(2^k) Σ_{i=0}^{∞} (1/2)^i      (using induction hypothesis and the fact that k ≥ 200)
    ≤ 3(100^100)(2^(k+1))
Another way to prove the claim is as follows. Consider lim_{n→∞} 2^n / (3n^100). This can be written as
lim_{n→∞} 2^n / 2^(lg(3n^100)) = lim_{n→∞} 2^(n − lg(3n^100)) = ∞.
Example. Prove that 7n − 2 is not Ω(n10).
Solution. Note that limn→∞ n10/(7n − 2) = ∞ and hence 7n − 2 = o(n10). Thus it cannot be Ω(n10).
We can also prove the claim as follows using the definition of Ω. Assume for the sake of contradiction that 7n − 2 = Ω(n10). This means that for some constants c and n0, 7n − 2 ≥ cn10, for all n ≥ n0. This implies that
7n − 2 ≥ cn^10 ⟹ 7n ≥ cn^10 ⟹ 7 ≥ cn^9
This is a contradiction as the left hand side is a constant and the right hand side keeps growing with n. More specifically, the right hand side becomes greater than the left hand side as soon as n > 2c^(−1/9).
Example. Prove that n1+0.0001 is not O(n).
Solution. Assume for the sake of contradiction that n^(1+0.0001) = O(n). This means that for some constants c and n0, n^(1+0.0001) ≤ cn, for all n ≥ n0. This is equivalent to n^(0.0001) ≤ c, for all n ≥ n0. This is impossible as the left hand side grows unboundedly with n. More specifically, when n > c^(10^4), the left hand side becomes greater than c.
5.5 Properties of Asymptotic Growth Functions
We now state without proof some of the properties of asymptotic growth functions.
1. If f(n) = O(g(n)) and g(n) = O(h(n)) then f(n) = O(h(n)).
2. If f(n) = Ω(g(n)) and g(n) = Ω(h(n)) then f(n) = Ω(h(n)).
3. If f(n) = Θ(g(n)) and g(n) = Θ(h(n)), then f(n) = Θ(h(n)).
4. Suppose f and g are two functions such that for some other function h, we have f(n) = O(h(n)) and
g(n) = O(h(n)). Then f(n) + g(n) = O(h(n)).
We now state asymptotic bounds of some common functions.
1. Recall that a polynomial is a function that can be written in the form a0 + a1·n + a2·n² + ··· + ad·n^d for some integer constant d > 0 and ad non-zero. Let f be a polynomial of degree d in which the coefficient ad > 0. Then f = O(n^d).
2. For every b > 1 and any real numbers x, y > 0, we have (log_b n)^x = O(n^y). Note that (log_b n)^x is also written as log_b^x n.
3. For every r > 1 and every d > 0, we have n^d = O(r^n).
Example. As a result of the above bounds, we can conclude that asymptotically, 10^5 log^38 n grows slower than n/10^10.
Example. Prove that n^10 = o(2^(lg³ n)) and 2^(lg³ n) = o(2^n).
Solution. Note that
2^(lg³ n) = (2^(lg n))^(lg² n) = n^(lg² n).
Since we know that 10 = o(lg² n), we conclude that
n^10 = o(2^(lg³ n)).
Since we also know that any poly-logarithmic function in n such as lg³ n is asymptotically upper-bounded by a polynomial function in n, we conclude that lg³ n = O(n). Actually, lg³ n = o(n) and hence
2^(lg³ n) = o(2^n).
Analyzing Runtime of Code Snippets 6
Example. Consider the following code fragment.
for (i = 0; i < n; i++)
for (j = 0; j < i; j=j+10)
print (’’run time analysis’’)
Give a tight bound on the running time of this code fragment.
Solution. For each value of i, the inner loop executes i/10 times. Thus the running time of the body of the outer loop is at most c(i/10), for some positive constant c. Hence the total running time of the code fragment is given by
Σ_{i=0}^{n−1} c⌈i/10⌉ ≤ Σ_{i=0}^{n−1} c(i/10 + 1) = c(n − 1)n/20 + cn ≤ 2cn² = O(n²)
We will now show that Σ_{i=0}^{n−1} c⌈i/10⌉ = Ω(n²). Note that
Σ_{i=0}^{n−1} c⌈i/10⌉ ≥ Σ_{i=0}^{n−1} ci/10 = c(n − 1)n/20
We want to find positive constants c′ and n0, such that for all n ≥ n0,
c(n − 1)n/20 ≥ c′n²
This is equivalent to showing that n(c − 20c′) ≥ c. This is true when c′ = c/40 and n ≥ 2. Thus, the running time of the code fragment is Ω(n²).
Example. Consider the following code fragment.
i=n
while (i >= 10) do
i = i/3
for j = 1 to n do
print (’’Inner loop’’)
What is an upper-bound on the running time of this code fragment? Is there a matching lower-bound?
Solution. The running time of the body of the inner loop is O(1). Thus the running time of the inner loop is at most c1n, for some positive constant c1. The body of the outer loop takes at most c2n time, for some positive constant c2 (note that the statement i = i/3 takes O(1) time). Suppose the algorithm goes through t iterations of the while loop. At the end of the last iteration of the while loop, the value of i is n/3t. We know that the code fragment surely finishes when n/3t ≤ 1, solving which gives us t ≥ log3 n. This means that the number of iterations of the while loop is at most O(log n). Thus the total running time is O(n log n).
We will now show that the running time is Ω(n log n). We will lower-bound the number of iterations of the
outer loop. Note that when the value of i is more than 10 (say, 33), the outer loop has not terminated. Solving n/3t ≥ 33, gives us that log3 n − 3 is a lower bound on the number of iterations of the outer loop. For each iteration of the outer loop, the inner loop runs n times. Thus the total running time is at least cn(log3 n − 3), for some positive constant c. Note that cn(log3 n − 3) ≥ c′n log n, when c′ = c/2 and n ≥ 36. Thus the running time is Ω(n log n).
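As a rough sanity check (our own illustration, not part of the original notes), the following Python snippet counts the inner-loop operations of the fragment and compares them against n·log₃ n:

import math

# Count the inner-loop operations of the code fragment above; a rough
# empirical check of the Theta(n log n) bound, not a proof.
def count_ops(n):
    ops = 0
    i = n
    while i >= 10:
        i = i // 3       # integer division stands in for i = i/3
        ops += n         # the inner loop runs n times per outer iteration
    return ops

for n in [100, 1000, 10000]:
    print(n, count_ops(n), round(n * math.log(n, 3)))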
Example. Consider the following code fragment.
for i = 0 to n do
for j = n down to 0 do
for k = 1 to j-i do
print (k)
What is an upper-bound on the running time of this algorithm? What is the lower bound?
Solution. Note that for a fixed value of i and j, the innermost loop goes through max{0, j − i} ≤ n times.
Thus the running time of the above code fragment is O(n3).
To find the lower bound on the running time, consider the values of i, such that 0 ≤ i ≤ n/4 and values of j, such that 3n/4 ≤ j ≤ n. Note that for each of the n2/16 different combinations of i and j, the innermost loop executes at least n/2 times. Thus the running time is at least
(n²/16)(n/2) = Ω(n³)
Example. Consider the following code fragment.
for i = 1 to n do
for j = 1 to i*i do
for k = 1 to j do
print (k)
Give a tight bound on the running time of this algorithm. We will assume that n is a power of 2.
Solution. Note that the value of j in the second for-loop is upper bounded by n2 and the value of k in the innermost loop is also bounded by n2. Thus the outermost for-loop iterates for n times, the second for-loop iterates for at most n2 times, and the innermost loop iterates for at most n2 times. Thus the running time of the code fragment is O(n5).
We will now argue that the running time of the code fragment is Ω(n5). Consider the following code fragment.
for i = n/2 to n do
for j = (n/4)*(n/4) to (n/2)*(n/2) do
for k = 1 to (n/4)*(n/4) do
print (k)
Note that the values of i, j, k in the above code fragment form a subset of the corresponding values in the code fragment in question. Thus the running time of the new code fragment is a lower bound on the running time of the code fragment in question. Thus the running time of the code fragment in question is at least n/2 · 3n2/16 · n2/16 = Ω(n5).
Thus the running time of the code fragment in question is Θ(n5).
Example. Consider the following code fragment. We will assume that n is a power of 2.
for (i = 1; i <= n; i = 2*i) do
for j = 1 to i do
print (j)
Give a tight bound on the running time of this algorithm.
Solution. Observe that for 0 ≤ k ≤ lgn, in the kth iteration of the outer loop, the value of i = 2k. Thus
the running time T(n) of the code fragment can be written as follows.
T(n) = Σ_{k=0}^{lg n} 2^k
     = 2^(lg n + 1) − 1
     = 2n − 1
     = Θ(n)      (c1 = 1, c2 = 2, n0 = 1)
What is wrong with the following argument that the running time of the above code fragment is Ω(n log n)?
for (i = n/128; i<= n; i = 2*i) do
for j = 1 to n/128 do
print (j)
The outer loop runs Ω(lgn) times and the inner loop runs Ω(n) times and hence the running time is Ω(n log n).∗
Discussion: Consider a problem X with an algorithm A.
Algorithm A runs in time O(n2). This means that the worst case asymptotic running time of algorithm A is upper-bounded by n2. Is this bound tight? That is, is it possible that the run-time analysis of algorithm A is loose and that one can give a tighter upper-bound on the running time?
Algorithm A runs in time Θ(n2). This means that the bound is tight, that is, a better (tighter) bound on the worst case asymptotic running time for algorithm A is not possible.
Problem X takes time O(n2). This means that there is an algorithm that solves problem X on all inputs in time O(n2).
Problem X takes Θ(n1.5). This means that there is an algorithm to solve problem X that takes time O(n1.5) and no algorithm can do better.
Logarithm Facts: Below are some facts on logarithms that you may find useful.
i. log_b a = 1 / (log_a b)
ii. log_a b = (log_c b) / (log_c a)
iii. a^(log_a b) = b
iv. b^(log_a x) = x^(log_a b)
∗ Answer: the issue with the argument is that the outer loop only runs a constant number of times (8 to be specific, since 2^7 = 128), and therefore the outer loop runs in Ω(1) time instead of Ω(lg n), which is why it’s Θ(n), as the proof above shows.
Divide & Conquer and Recurrence Relations 7
7.1 Computing Powers of Two
Consider the problem of computing 2n for any non-negative integer n. Below are four similar looking algorithms to solve this problem.
powerof2(n)
    if n = 0
        return 1
    else
        return 2 * powerof2(n-1)

powerof2(n)
    if n = 0
        return 1
    else
        return powerof2(n-1) + powerof2(n-1)

powerof2(n)
    if n = 0
        return 1
    else
        tmp = powerof2(n-1)
        return tmp + tmp

powerof2(n)
    if n = 0
        return 1
    else
        tmp = powerof2(floor(n/2))
        if (n is even) then
            return tmp * tmp
        else
            return 2 * tmp * tmp
The recurrence for the first and the third method is T (n) = T (n − 1) + O(1). The recurrence for the second method is T (n) = 2T (n − 1) + O(1), and the recurrence for the last method is T (n) = T (n/2) + c (assuming that n is a power of 2). In all cases the base case is T(0) = 1.
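To see concretely why the second method is so much slower than the third, here is a small Python sketch (ours, not part of the original notes) that counts recursive calls:

# Count recursive calls to contrast the second method (two independent
# recursive calls) with the third (one recursive call whose result is reused).
def calls_two_recursions(n):
    if n == 0:
        return 1, 1                          # (value, number of calls)
    v1, c1 = calls_two_recursions(n - 1)
    v2, c2 = calls_two_recursions(n - 1)
    return v1 + v2, 1 + c1 + c2

def calls_one_recursion(n):
    if n == 0:
        return 1, 1
    v, c = calls_one_recursion(n - 1)
    return v + v, 1 + c

print(calls_two_recursions(10))  # (1024, 2047)  -- about 2^n calls
print(calls_one_recursion(10))   # (1024, 11)    -- about n calls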
We will solve these recurrences. The recurrence for the first and the third method can be solved as follows.
T (n) = T (n − 1) + c = T (n − 2) + 2c = T (n − 3) + 3c
... ...
= T(n − k) + kc
The recursion bottoms out when n − k = 0, i.e., k = n. Thus, we get
T (n) = T (0) + kc = 1 + nc
= Θ(n)
The recurrence for the second method can be solved as follows.
T(n) = 2T(n − 1) + c
     = 2²T(n − 2) + (2⁰ + 2¹)c
     = 2³T(n − 3) + (2⁰ + 2¹ + 2²)c
     ...
     = 2^k T(n − k) + c Σ_{i=0}^{k−1} 2^i
The recursion bottoms out when n − k = 0, i.e., k = n. Thus, we get
T(n) = 2^n T(0) + c Σ_{i=0}^{n−1} 2^i
     = 2^n + c(2^n − 1)
     = Θ(2^n)
The recurrence for the fourth method can be solved as follows.
T(n) = T(n/2) + c
     = T(n/2²) + 2c
     = T(n/2³) + 3c
... ...
= T(n/2k) + kc
The recursion bottoms out when n/2k < 1, i.e., when k > lg n. Thus, we get
T (n) = T (0) + c(lg n + 1) = 1 + Θ(lg n)
= Θ(lg n)
7.2 Linear Search and Binary Search
The input is an array A of elements in any arbitrary order and a key k and the objective is to output true, if k is in A, false, otherwise. Below is a recursive function to solve this problem.
LinearSearch (A[lo .. hi], k)
if lo > hi then
return False
else
return (A[hi] == k) or LinearSearch(A[lo..hi-1], k)
The recurrence relation to express the running time of LinearSearch is given by T (n) = T (n − 1) + c, with the base case being T(0) = 1. We have already solved this recurrence and it yields a running time of T (n) = Θ(n).
If the input array A is already sorted, we can do significantly better using binary search as follows.
BinarySearch (A[lo .. hi], k)
if lo > hi then
return False
else
mid = floor((lo+hi)/2)
if A[mid] = k then
return True
else if A[mid] < k then
return BinarySearch(A[mid+1 .. hi], k)
else
return BinarySearch(A[lo .. mid-1], k)
The running time of this method is given by the recurrence T(n) = T(n/2) + c, with the base case being T(0) = 1. As we have seen before, this recurrence yields a running time of T(n) = Θ(log n).
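For concreteness, here is an equivalent runnable Python sketch of binary search (0-indexed lists; our own illustration, not the notes’ pseudocode verbatim):

# A runnable Python sketch of BinarySearch on a sorted list.
def binary_search(A, k, lo, hi):
    if lo > hi:
        return False
    mid = (lo + hi) // 2
    if A[mid] == k:
        return True
    elif A[mid] < k:
        return binary_search(A, k, mid + 1, hi)
    else:
        return binary_search(A, k, lo, mid - 1)

A = [1, 3, 5, 7, 9, 11]
print(binary_search(A, 7, 0, len(A) - 1))  # True
print(binary_search(A, 4, 0, len(A) - 1))  # False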
7.3 MergeSort
Below is a recursive version of insertion sort that we studied a couple of lectures ago.
InsertionSort(A[lo..hi])
if lo = hi then
    return A
else
A’ = InsertionSort(A[lo..hi-1])
Insert(A’, A[hi]) // insert element A[hi] into the sorted array A’
Note that the Insert function takes Θ(n) time for an input array of size n. Thus the running time of Insertion sort is given by the following recurrence.
T(n) = 1,              n = 1
T(n) = T(n − 1) + n,   n ≥ 2
It is easy to see that this recurrence yields a running time of T(n) = Θ(n2).
To motivate the idea behind the next sorting algorithm (Merge Sort), let’s rewrite InsertionSort function as
follows.
InsertionSort(A[lo..hi])
if lo = hi then
    return A
else
// Merge combines two sorted arrays into one sorted array
Merge(InsertionSort(A[lo..hi-1]), InsertionSort(A[hi..hi]))
The function Merge is as follows.
Merge(A[1..p], B[1..q])
if p = 0 then
return B
if q = 0 then
return A
if A[1] <= B[1] then
return prepend(A[1], Merge(A[2..p], B[1..q]))
else
return prepend(B[1], Merge(A[1..p], B[2..q]))
Note that the running time of Merge is O(p + q). The second recursive call to InsertionSort takes O(1) time
and hence the running time of InsertionSort still is Θ(n2).
Observe that in InsertionSort the input array A is partitioned into two arrays, one of size |A| − 1 and another of size 1. In Merge Sort, we partition the input array of size n in two equal halves (assuming n is a power of 2). Below is the function.
MergeSort(A[1..n])
if n = 1 then
    return A
else
    return Merge(MergeSort(A[1..n/2]), MergeSort(A[n/2+1..n]))
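For reference, here is a short runnable Python sketch of the same idea (0-indexed lists; our own illustration, not the notes’ pseudocode verbatim):

# A runnable sketch of MergeSort in Python.
def merge(a, b):
    # Combine two sorted lists into one sorted list in O(len(a) + len(b)) time.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

print(merge_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]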
The running time of MergeSort is given by the following recurrence.
T(n) = 1,               n = 1
T(n) = 2T(n/2) + cn,    n ≥ 2
We can also solve recurrences by guessing the overall form of the solution and then figure out the constants as
we proceed with the proof. Below are some examples.
Example. Consider the following recurrence for the MergeSort algorithm.
T(n) = 1,              n = 1
T(n) = 2T(n/2) + n,    n ≥ 2
Prove that T(n) = O(n lg n).
Solution.
We will first prove the claim by expanding the recurrence as follows.
T(n) = 2T(n/2) + n
     = 2²T(n/2²) + 2n
     = 2³T(n/2³) + 3n
     ...
     = 2^k T(n/2^k) + kn
The recursion bottoms out when n/2^k = 1, i.e., k = lg n. Thus, we get
T(n) = 2^(lg n) T(1) + n lg n = Θ(n log n)
We will now prove that T (n) = O(n lg n) by using strong induction on n. We will show that for some constant c, whose value we will determine later, T (n) ≤ cn lg n, for all n ≥ 2.
Induction Hypothesis: Assume that the claim is true when n = j, for all j such that 2 ≤ j ≤ k. In other words, T (j) ≤ cj lg j.
Base Case: n = 2. The left hand side is given by T(2) = 2T(1)+2 = 4 and the right hand side is 2c. Thus the claim is true for the base case when c ≥ 2.
Induction Step: We want to show that for k ≥ 2, T(k + 1) ≤ c(k + 1) lg(k + 1). We have
T(k + 1) = 2T((k + 1)/2) + (k + 1)
         ≤ 2c((k + 1)/2) lg((k + 1)/2) + (k + 1)
         = c(k + 1)(lg(k + 1) − lg 2) + (k + 1)
         = c(k + 1) lg(k + 1) − (c − 1)(k + 1)
         ≤ c(k + 1) lg(k + 1)      (since c ≥ 2)
Example. Consider the following recurrence that you may want to try to solve on your own before reading the solution.
T(n) = 1,               n = 1
T(n) = 2T(n/2) + n²,    n ≥ 2
Prove that T(n) = Θ(n²).
Solution. Clearly, T(n) = Ω(n2) (because of the n2 term in the recurrence). To prove that T(n) = O(n2), we will show using strong induction that for some constant c, whose value we will determine later, T(n) ≤ cn2, for all n ≥ 1.
Induction Hypothesis: Assume that the claim is true when n = j, for all j such that 1 ≤ j ≤ k. In other words, T(j) ≤ cj2.
Base Case: n = 1. The claim is clearly true as the left hand side and the right hand side, both equal 1. Induction Step: We want to show that T (k + 1) ≤ c(k + 1)2. We have
T(k + 1) = 2T((k + 1)/2) + (k + 1)²
         ≤ 2c((k + 1)/2)² + (k + 1)²
         = (c/2 + 1)(k + 1)²
We want the right hand side to be at most c(k + 1)². This means that we want c/2 + 1 ≤ c, which holds when c ≥ 2. Thus we have shown that T(n) ≤ 2n², for all n ≥ 1, and hence T(n) = O(n²).
7.4 More Recurrence Practice
Example. The running time of Karatsuba’s algorithm for integer multiplication is given by the following recurrence. Solve the recurrence. Assume that n is a power of 2.
T(n) = 1,              n = 1
T(n) = 3T(n/2) + n,    n ≥ 2
Solution.
T(n) = 3T(n/2) + n
     = 3²T(n/2²) + 3n/2 + n
     = 3³T(n/2³) + (3/2)²n + 3n/2 + n
     ...
     = 3^k T(n/2^k) + n Σ_{i=0}^{k−1} (3/2)^i
The recursion bottoms out when n/2^k = 1, i.e., k = lg n. Thus, we get
T(n) = 3^(lg n) T(1) + n Σ_{i=0}^{lg n − 1} (3/2)^i
     = n^(lg 3) + n · ((3/2)^(lg n) − 1)/(3/2 − 1)
     = n^(lg 3) + 2n((3/2)^(lg n) − 1)
     = n^(lg 3) + 2n(n^(lg(3/2)) − 1)
     = n^(lg 3) + 2n(n^(lg 3 − 1)) − 2n
     = n^(lg 3) + 2n^(lg 3) − 2n
     = 3n^(lg 3) − 2n
     = Θ(n^(lg 3))      (can be argued by setting c = 3 and n0 = 1)
Example. Find the running time expressed by the following recurrence. Assume that n is a power of 3.
T(n) = 1,               n = 1
T(n) = 5T(n/3) + n²,    n ≥ 2
Solution.
T(n) = 5T(n/3) + n²
     = 5²T(n/3²) + 5(n/3)² + n²
     = 5³T(n/3³) + 5²(n/3²)² + 5(n/3)² + n²
     ...
     = 5^k T(n/3^k) + n²(1 + 5/9 + (5/9)² + ... + (5/9)^(k−1))
The recursion bottoms out when n/3^k = 1, i.e., k = log₃ n. Thus, we get
T(n) = 5^(log₃ n) T(1) + n² Σ_{i=0}^{log₃ n − 1} (5/9)^i
     = 5^(log₃ n) + n² · ((5/9)^(log₃ n) − 1)/((5/9) − 1)
     = 5^(log₅ n / log₅ 3) + (9n²/4)(1 − n^(log₃ 5)/n²)
     = n^(log₃ 5) + 9n²/4 − 9n^(log₃ 5)/4
     = 9n²/4 − 5n^(log₃ 5)/4
     = Θ(n²)
7.5 Simplified Master Theorem
Obviously, it would be nice if there existed some kind of shortcut for computing recurrences. The Simplified Master Theorem provides this shortcut for recurrences of the following form:
Theorem 1. Let a ≥ 1, b > 1, k ≥ 0 be constants and let T(n) be a recurrence of the form
T(n) = aT(n/b) + Θ(n^k)
defined for n ≥ 0. The base case T(1) can be any constant value. Then
Case 1: if a > b^k, then T(n) = Θ(n^(log_b a))
Case 2: if a = b^k, then T(n) = Θ(n^k log_b n)
Case 3: if a < b^k, then T(n) = Θ(n^k)
The statement of the full Master Theorem and its proof will be presented in CIS 320.
Example. Solve the following recurrences using the Simplified Master Theorem. Assume that n is a power
of 2 or 3 and T (1) = c for some constant c.
a. T(n) = 4T(n/2) + n b. T(n)=T(n/3)+n
c. T(n) = 9T(n/3) + n2.5 d. T(n) = 8T(n/2) + n3
Solution.
a. a = 4,b = 2, and k = 1. Thus case 1 of the Simplified Master Theorem applies and hence T(n) = Θ(nlog2 4) = Θ(n2).
b. a = 1, b = 3, and k = 1. Thus case 3 of the Simplified Master Theorem applies and hence T (n) = Θ(n).
c. a = 9, b = 3, and k = 2.5. Thus case 3 of the Simplified Master Theorem applies and hence T (n) = Θ(n2.5 ).
d. a = 8,b = 2, and k = 3. Thus case 2 of the Simplified Master Theorem applies and hence T(n) =
Θ(n3 log2 n).
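As a small illustration (our own, not part of the original notes), the case analysis of the Simplified Master Theorem can be mechanized in a few lines of Python:

import math

# Classify a recurrence T(n) = a*T(n/b) + Theta(n^k) into the three cases above.
def simplified_master(a, b, k):
    if a > b ** k:
        return "Theta(n^%.3f)" % math.log(a, b)   # Case 1: n^(log_b a)
    elif a == b ** k:
        return "Theta(n^%g * log n)" % k          # Case 2
    else:
        return "Theta(n^%g)" % k                  # Case 3

print(simplified_master(4, 2, 1))    # Theta(n^2.000)
print(simplified_master(1, 3, 1))    # Theta(n^1)
print(simplified_master(9, 3, 2.5))  # Theta(n^2.5)
print(simplified_master(8, 2, 3))    # Theta(n^3 * log n)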
Quicksort 8
8.1 Deterministic Quicksort
In quicksort, we first decide on the pivot. This could be the element at any location in the input array. The function Partition is then invoked. Partition accomplishes the following: it places the pivot in the location that it should be in the output and places all elements that are at most the pivot to the left of the pivot and all elements greater than the pivot to its right. Then we recurse on both parts. The pseudocode for Quicksort is as follows.
QSort(A[lo..hi])
if hi <= lo then
return
else
pIndex = floor((lo+hi)/2) (this could have been any location)
loc = Partition(A, lo, hi, pIndex)
QSort(A[lo..loc-1])
QSort(A[loc+1..hi])
One possible implementation of the function Partition is as follows.
Partition(A, lo, hi, pIndex)
pivot = A[pIndex]
swap(A, pIndex, hi)
left = lo
right = hi-1
while left <= right do
if (A[left] <= pivot) then
left = left + 1
else
swap(A, left, right)
right = right - 1
swap(A, left, hi)
return left
The worst case running time of the algorithm is given by
T(n) = 1,                n = 1
T(n) = T(n − 1) + cn,    n ≥ 2
Hence the worst case running time of QSort is Θ(n2).
An instance where the quicksort algorithm in which the pivot is always the first element in the input array performs poorly is an array in descending order of its elements.
8.2 Randomized Quicksort
In the randomized version of quicksort, we pick a pivot uniformly at random from all possibilities. We will now show that the expected number of comparisons made in randomized quicksort is equal to 2n ln n + O(n) = Θ(n log n).
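For reference, here is a short runnable Python sketch of randomized quicksort (our own illustration, using the same partitioning scheme as in Section 8.1; 0-indexed lists):

import random

# Randomized quicksort: the pivot index is chosen uniformly at random on every call.
def randomized_quicksort(A, lo=0, hi=None):
    if hi is None:
        hi = len(A) - 1
    if hi <= lo:
        return
    p_index = random.randint(lo, hi)           # uniform random pivot
    loc = partition(A, lo, hi, p_index)
    randomized_quicksort(A, lo, loc - 1)
    randomized_quicksort(A, loc + 1, hi)

def partition(A, lo, hi, p_index):
    pivot = A[p_index]
    A[p_index], A[hi] = A[hi], A[p_index]      # move pivot out of the way
    left, right = lo, hi - 1
    while left <= right:
        if A[left] <= pivot:
            left += 1
        else:
            A[left], A[right] = A[right], A[left]
            right -= 1
    A[left], A[hi] = A[hi], A[left]            # put pivot into its final slot
    return left

data = [9, 3, 7, 1, 8, 2, 5]
randomized_quicksort(data)
print(data)                                    # [1, 2, 3, 5, 7, 8, 9]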
Theorem 1. For any input array of size n, the expected number of comparisons made by randomized quicksort is
2n ln n + O(n) = Θ(n log n)
Proof. Let y1, y2, ..., yn be the elements in the input array A in sorted order. Let X be the random variable denoting the total number of pair-wise comparisons made between elements of A. Let Xi,j be the random variable denoting the total number of times elements yi and yj are compared during the algorithm.
We make the following observations:
Comparisons between elements in the input array are done only in the function Partition
Two elements are compared if and only if one of them is a pivot.
Let X^k_{i,j} be an indicator random variable that is 1 if and only if elements yi and yj are compared in the k-th call to Partition. Then we have:
X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_{i,j}     and     X_{i,j} = Σ_k X^k_{i,j}
We will now calculate E[X_{i,j}]. By the linearity of expectation, we have
E[X_{i,j}] = Σ_k E[X^k_{i,j}] = Σ_k Pr[X^k_{i,j} = 1]
Let t be the iteration of the first call to Partition during which one of the elements from yi, yi+1, ..., yj is used as the pivot (think about why such an iteration must exist). From our observations above, note that for all times before t, yi and yj are never compared, so X^k_{i,j} = 0 for all k < t. Moreover, yi and yj are compared in call t if and only if the chosen pivot is yi or yj itself; otherwise the pivot separates them into different subarrays and they are never compared afterwards, so X^k_{i,j} = 0 for all k > t as well.
Now, since there are j − i + 1 elements in the list yi, yi+1, ..., yj, and the pivot is chosen randomly, the probability that one of yi or yj is chosen as the pivot is 2/(j − i + 1). Hence:
E[X_{i,j}] = Pr[X^t_{i,j} = 1] = 2/(j − i + 1)
Now we use this result and apply the linearity of expectation to get:
Now we use this result and apply the linearity of expectation to get:
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[X_{i,j}]
     = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
     = Σ_{k=2}^{n} (2/k)(n − k + 1)      (see the note below)
     = (n + 1) Σ_{k=2}^{n} 2/k − 2(n − 1)
     = 2(n + 1) Σ_{k=1}^{n} 1/k − 2(n − 1) − 2(n + 1)
     = 2(n + 1)(ln n + c) − 4n,   where 0 ≤ c < 1
     = 2n ln n + O(n)
Here we have used the fact that the harmonic function H(n) = Σ_{k=1}^{n} 1/k is at most ln n + c for some constant 0 ≤ c < 1.
Also, the third equality follows by expanding the sum as
Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1) = (2/2 + 2/3 + ··· + 2/(n−1) + 2/n)
                                          + (2/2 + 2/3 + ··· + 2/(n−1))
                                          + ···
                                          + (2/2 + 2/3)
                                          + 2/2
and grouping the columns together: the term 2/k appears in exactly n − k + 1 of the rows.
Counting Inversions 9
9.1 Introduction and Problem Description
We will continue with our study of divide and conquer algorithms by considering how to count the number of inversions in a list of elements. This is a problem that comes up naturally in many domains, such as collaborative filtering and meta-search tools on the internet. A core issue in applications like these is the problem of comparing two rankings: you rank a set of n movies, and then a collaborative filtering system consults its database to look for other people who had “similar” rankings in order to make suggestions. But how does one measure similarity?
Suppose you and another person rank a set of n movies, labelling them from 1 to n accordingly. A natural way to compare your ranking with the other person’s is to count the number of pairs that are “out of order.” Formally, we will consider the following problem: We are given a sequence of n numbers, a1, ..., an, which we assume are distinct. We want to define a measure that tells us how far this list is from being in ascending order. The value of the measure should be 0 if a1 < a2 < ... < an and should increase as the numbers become more scrambled, taking on the largest possible value if an < an−1 < ... < a1.
A natural way to quantify this notion is by counting the number of inversions.
As an example, consider the sequence
2,4,1,3,5
There are three inversions in this sequence: (2,1),(4,1), and (4,3). A simple way to count the number of inversions when n is small is to draw the sequence of input numbers in the order they’re provided, and below that in ascending order. We then draw a line between each number in the top list and its copy in the lower list. Each crossing pair of line segments corresponds to an inversion.
2 4 1 3 5
1 2 3 4 5
Figure 9.1: There are three inversions: (2, 1), (4, 1), and (4, 3).
Note how the number of inversions is a measure that smoothly interpolates between complete agreement (when the sequence is in ascending order, then there are no inversions) and complete disagreement (if the
sequence is in descending order, then every pair forms an inversion and so there are n(n − 1)/2 of them).
These notes were adapted from Kleinberg and Tardos’ Algorithm Design
Definition. Given a sequence of numbers a1,a2,...,an, we say indices i < j form an inversion if ai > aj. That is, if ai and aj are out of order.
9.2 Designing an Algorithm
A naive way to count the number of inversions would be to simply look at every pair of numbers (ai, aj) and determine whether they constitute an inversion. This would take O(n2) time.
As you can imagine, there is a faster way that runs in O(n log n) time. Note that since there can be a quadratic number of inversions, such an algorithm must be able to compute the total number without ever looking at each inversion individually. The basic idea is to use a divide and conquer strategy.
Similar to the other divide and conquer algorithms we’ve seen, the first step is to divide the list into two pieces: set m = ⌈n/2⌉ and consider the two lists a1, ..., am and am+1, ..., an. Then we conquer each half by recursively counting the number of inversions in each half separately.
Now, how do we combine the results? We have the number of inversions in each half separately, so we must find a way to count the number of inversions of the form (ai,aj) where ai and aj are in different halves. We know that the recurrence T (n) = 2T (n/2) + O(n), T (1) = 1 has solution T (n) = O(n log n), and so this implies that we must be able to do this part in O(n) time if we expect to find an O(n log n) solution.
Note that the first-half/second-half inversions have a nice form: they are precisely the pairs (ai,aj) where ai is in the first half, aj is in the second half, and ai > aj.
To help with counting the number of inversions between the two halves, we will make the algorithm recursively sort the numbers in the two halves as well. Having the recursive step do a bit more work (sorting as well as counting inversions) will make the “combining” portion of the algorithm easier.
Thus the crucial routine in this process is Merge-and-Count. Suppose we have recursively sorted the first and second halves of the list and counted the inversions in each. We now have two sorted lists A and B, containing the first and second halves respectively. We want to produce a single sorted list C from their union, while also counting the number of pairs (a, b) with a ∈ A and b ∈ B and a > b.
This is closely related to a problem we have previously encountered: namely the “combining” step in MergeSort. There, we had two sorted lists A and B and we wanted to merge them into a single sorted list in O(n) time. The difference here is that we want to do something extra: not only should we produce a single sorted list from A and B, but we should also count the number of inverted pairs (a, b) where a ∈ A, b ∈ B and a > b as we do so.
We can do this by walking through the sorted lists A and B, removing elements from the front and appending them to the sorted list C. In a given step, we have a current pointer into each list, showing our current position. Suppose that these pointers are currently at elements ai and bj. In one step, we compare the elements ai and bj, remove the smaller one from its list, and append it to the end of list C.
This takes care of the merging. To count the number of inversions, note that because A and B are sorted, it is actually very easy to keep track of the number of inversions we encounter. Every time element ai is appended to C, no new inversions are encountered since ai is smaller than everything left in list B and it comes before all of them. On the other hand, if bj is appended to list C, then it is smaller than all of the remaining items in A and it comes after all of them, so we increase our count of the number of inversions by the number of elements remaining in A. This is the crucial idea: in constant time, we have accounted for a potentially large number of inversions.
Merge-and-Count
Input: Two sorted lists A and B
Output: A sorted list containing all elements in A and B, as well as the number of inversions assuming all elements in A precede those in B.
Merge-and-Count(A, B)
    L = []
    currA = 1
    currB = 1
    count = 0
    while currA <= A.length and currB <= B.length:
        a = A[currA]
        b = B[currB]
        if a < b then
            L.append(a)
            currA = currA + 1
        else
            L.append(b)
            remaining = A.length - currA + 1
            count = count + remaining
            currB = currB + 1
    Once one list is empty, append the remainder of the other list to L
    Return count and L
We use this Merge-and-Count routine in a recursive procedure that simultaneously sorts and counts the number of inversions in a list L.
Sort-and-Count
Input: A list L of n distinct numbers
Output: A sorted version of L as well as the number of inversions in L.
Sort-and-Count(L)
    if n = 1 then
        return 0 and L
    m = ⌈n/2⌉
    A = L[1,...,m]
    B = L[m+1,...,n]
    (rA, A) = Sort-and-Count(A)
    (rB, B) = Sort-and-Count(B)
    (r, L) = Merge-and-Count(A, B)
    return rA + rB + r and the sorted list L
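Here is a runnable Python sketch of the same pair of procedures (0-indexed lists; our own illustration, not the notes’ pseudocode verbatim):

# Count inversions while sorting, mirroring Sort-and-Count / Merge-and-Count.
def sort_and_count(L):
    if len(L) <= 1:
        return 0, L[:]
    m = len(L) // 2
    ra, A = sort_and_count(L[:m])
    rb, B = sort_and_count(L[m:])
    r, merged = merge_and_count(A, B)
    return ra + rb + r, merged

def merge_and_count(A, B):
    out, count, i, j = [], 0, 0, 0
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            out.append(A[i]); i += 1
        else:
            out.append(B[j]); j += 1
            count += len(A) - i      # B[j] is inverted with everything left in A
    out.extend(A[i:]); out.extend(B[j:])
    return count, out

print(sort_and_count([2, 4, 1, 3, 5]))  # (3, [1, 2, 3, 4, 5])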
9.3 Runtime
The running time of Merge-and-Count can be bounded by the analogue of the argument we used for the original Merge algorithm: each iteration of the while loop takes constant time, and in each iteration, we add some element to the output that will never be seen again. Thus the number of iterations can be at most the sum of the initial lengths of A and B, and so the total running time is O(n).
Since Merge-and-Count runs in O(n) time, the recurrence for the runtime of Sort-and-Count is
T(n) = 2T(n/2) + O(n),    n > 1
T(n) = 1,                 otherwise
This is exactly the same recurrence as we saw in the analysis of MergeSort, so we can immediately say the runtime of Sort-and-Count is
T(n) = O(nlogn)
Selection Problem 10
10.1 Introduction to Problem
Definition. The ith order statistic of a set of n elements is the ith smallest element.
Here we address the problem of selecting the ith order statistic from a set of n distinct numbers. We assume for convenience that the set contains distinct numbers, although virtually everything that we do extends to the situation in which a set contains repeated values. The selection problem is defined as:
Input: n integers in array A[1..n], and an integer i ∈ {1, 2, ..., n}
Output: the ith smallest number in A
We can solve the selection problem in O(n lg n) time, since we can sort the numbers using merge sort and then simply index the ith element in the output array. Here, we present a faster algorithm which achieves O(n) running time in the worst case.
If we could find an element e such that rank(e) = n/2 (the median) in O(n) time, we could make Quicksort run in Θ(n log n) time in the worst case: we could simply exchange e with the last element of A at the beginning of Partition and thus make sure that A is always partitioned in the middle.
10.2 Selection in Worst-Case Linear Time
We now examine a selection algorithm whose running time is O(n) in the worst case. The algorithm Select finds the desired element by recursively partitioning the input array. Here, however, we guarantee a good split upon partitioning the array. Select uses the deterministic partitioning algorithm Partition from Quicksort, but modified to take the element to partition around as an input parameter.
The Select algorithm determines the ith smallest of an input array of n > 1 distinct elements by executing the following steps. (If n = 1, then Select merely returns its only input value as the ith smallest.)
1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and at most one group made up of the remaining n mod 5 elements.
2. Find the median of each of the ⌈n/5⌉ groups by first Insertion-Sorting the elements of each group (of which there are at most 5) and then picking the median from the sorted list of group elements.
3. Use Select recursively to find the median x of the ⌈n/5⌉ medians found in step 2. (If there are an even number of medians, then by our convention, x is the lower median.)
4. Partition the input array around the median-of-medians x using the modified version of Partition. Let k be one more than the number of elements on the low side of the partition, so that x is the kth smallest element and there are n − k elements on the high side of the partition.
5. If i = k, then return x. Otherwise, use Select recursively to find the ith smallest element on the low side if i < k, or the (i − k)th smallest element on the high side if i > k.
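The sketch below is ours, not CLRS's code: for brevity it partitions by building the two sides as new lists rather than calling the in-place Partition routine, but it follows the same five steps and the same lower-median convention.

    def select(A, i):
        # Returns the ith smallest (1-indexed) of the distinct values in A.
        n = len(A)
        if n == 1:
            return A[0]
        # Steps 1-2: split into groups of (at most) 5 and take each group's lower median.
        groups = [A[j:j + 5] for j in range(0, n, 5)]
        medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
        # Step 3: recursively find the (lower) median of the medians.
        x = select(medians, (len(medians) + 1) // 2)
        # Step 4: partition around x (building new lists instead of in-place Partition).
        low = [a for a in A if a < x]
        high = [a for a in A if a > x]
        k = len(low) + 1                      # x is the kth smallest
        # Step 5: recurse on the side that contains the answer.
        if i == k:
            return x
        elif i < k:
            return select(low, i)
        else:
            return select(high, i - k)

    # Example: select([9, 1, 8, 2, 7, 3, 6, 4, 5], 5) returns 5 (the median).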
These notes were adapted from CLRS Chapter 9.3
Definition. A median, informally, is the “halfway point” of the set, occurring at i = ⌊(n + 1)/2⌋.∗

∗ When n is even, two medians exist. However, for simplicity, we will choose the lower median.
To analyze the running time of Select, we first determine a lower bound on the number of elements that are greater than the partitioning element x. The figure below helps us to visualize this bookkeeping. At least half of the medians found in step 2 are greater than or equal to the median-of-medians x (remember, we assumed distinct elements). Thus, at least half of the ⌈n/5⌉ groups contribute at least 3 elements that are greater than x, except for the one group that has fewer than 5 elements if 5 does not divide n exactly, and the one group containing x itself. Discounting these two groups, it follows that the number of elements greater than x is at least

3 (⌈(1/2)⌈n/5⌉⌉ − 2) ≥ 3n/10 − 6

Similarly, at least 3n/10 − 6 elements are less than x. Thus, in the worst case, step 5 calls Select recursively on at most n − (3n/10 − 6) = 7n/10 + 6 elements.

Figure 10.1: From CLRS, this figure helps understand the analysis of Select. The n elements are represented by small circles, and each group of 5 elements occupies a column. The medians of the groups are whitened, and the median-of-medians x is labeled. (When finding the median of an even number of elements, we use the lower median.) Arrows go from larger elements to smaller, from which we can see that 3 out of every full group of 5 elements to the right of x are greater than x, and 3 out of every group of 5 elements to the left of x are less than x. The elements known to be greater than x appear on a shaded background.

We can now develop a recurrence for the worst-case running time T(n) of the algorithm Select. Steps 1, 2, and 4 take O(n) time. (Step 2 consists of O(n) calls of Insertion-Sort on sets of size O(1).) Step 3 takes time T(⌈n/5⌉), and step 5 takes time at most T(7n/10 + 6), assuming that T is monotonically increasing. We make the assumption, which seems unmotivated at first, that any input of fewer than 140 elements requires O(1) time; the origin of the magic constant 140 will be clear shortly. We can therefore obtain the recurrence

T(n) ≤ O(1)                                if n < 140
T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n)      otherwise.

We show that the running time is linear by substitution. More specifically, we will show that T(n) ≤ cn for some suitably large constant c and all n > 0. We begin by assuming that T(n) ≤ cn for some suitably large constant c and all n < 140; this assumption holds if c is large enough. We also pick a constant a such that the function described by the O(n) term above (which describes the non-recursive component of the running time of the algorithm) is bounded above by an for all n > 0. Substituting this inductive hypothesis into the
right-hand side of the recurrence yields

T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (−cn/10 + 7c + an)

which is at most cn if

0 ≥ −cn/10 + 7c + an,    i.e., if c ≥ 10a(n/(n − 70))    (when n > 70)

Therefore, since we assume n ≥ 140∗, we have n/(n − 70) ≤ 2, and so choosing c ≥ 20a satisfies the inequality, showing that the worst-case running time of Select is therefore linear (i.e., runs in O(n) time).
∗ Note that there is nothing special about the constant 140; we could replace it by any integer strictly greater than 70 and then choose c accordingly.
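As a sanity check on the substitution argument, one can iterate the recurrence numerically. The constants below (a = 1, base cost 1 for n < 140) are arbitrary choices of ours; the point is simply that T(n)/n stays bounded, here by 20a, as the analysis predicts.

    from functools import lru_cache
    from math import ceil

    A_CONST = 1          # stands in for the O(n) term, i.e., an with a = 1

    @lru_cache(maxsize=None)
    def T(n):
        # Worst-case recurrence for Select, with our arbitrary base cost of 1.
        if n < 140:
            return 1
        return T(ceil(n / 5)) + T(7 * n // 10 + 6) + A_CONST * n

    # T(n) / n stays well below 20 * A_CONST, as the substitution argument predicts:
    # for n in (10**3, 10**5, 10**7): print(n, T(n) / n)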
11 Closest Pair

11.1 Closest Pair
Today, we consider another application of divide-and-conquer, which comes from the field of computational geometry. We are given a set P of n points in the plane, and we wish to find the closest pair of points p, q ∈ P (see (a) in the figure below). This problem arises in a number of applications. For example, in air-traffic control,
you may want to monitor planes that come too close together, since this may indicate a possible collision. Recall that, given two points p = (px, py) and q = (qx, qy), their (Euclidean) distance is

∥pq∥ = √((px − qx)² + (py − qy)²)

Clearly, this problem can be solved by brute force in O(n²) time, by computing the distance between each pair and returning the smallest. Today, we will present an O(n log n) time algorithm, which is based on a clever use of divide-and-conquer.

Before getting into the solution, it is worth pointing out a simple strategy that fails to work. If two points are very close together, then clearly both their x-coordinates and their y-coordinates are close together. So, what if we sort the points based on their x-coordinates and, for each point of the set, consider just the nearby points in the list? It would seem that (subject to figuring out exactly what “nearby” means) such a strategy might be made to work. The problem is that it could fail miserably. In particular, consider the point set of (b) in the figure below. The points p and q are the closest points, but we can place an arbitrarily large number of points between them in terms of their x-coordinates. We need to separate these points sufficiently far in terms of their y-coordinates that p and q remain the closest pair. As a result, the positions of p and q can be arbitrarily far apart in the sorted order. Of course, we can do the same with respect to the y-coordinate. Clearly, we cannot focus on one coordinate alone∗.
Figure 11.1: (a) The closest pair problem and (b) why sorting on x- or y-coordinate alone doesn’t work.
These notes were adapted from the University of Maryland’s CMSC 451 course
∗ While the above example shows that sorting along any one coordinate axis may fail, there is a variant of this strategy that can be used for computing nearest neighbors approximately. This approach is based on the observation that if two points are close together, their projections onto a randomly oriented vector will be close, and if they are far apart, their projections onto a randomly oriented vector will be far apart in expectation. This observation underlies a popular nearest neighbor algorithm called locality sensitive hashing.
11.2 Divide and Conquer Algorithm
Let us investigate how to design an O(n log n) time divide and conquer approach to the problem. The input consists of a set of points P, represented, say, as an array of n elements, where each element stores the (x,y) coordinates of the point. (For simplicity, let’s assume there are no duplicate x-coordinates.) The output will consist of a single number, being the closest distance. It is easy to modify the algorithm to also produce the pair of points that achieves this distance.
For reasons that will become clear later, in order to implement the algorithm efficiently, it will be helpful to begin by presorting the points, both with respect to their x- and y-coordinates. Let Px be an array of points sorted by x, and let Py be an array of points sorted by y. We can compute these sorted arrays in O(n log n) time. Note that this initial sorting is done only once. In particular, the recursive calls do not repeat the sorting process. Like any divide-and-conquer algorithm, after the initial base case, our approach involves three basic elements: divide, conquer, and combine.
Base Case: If |P | < 3, then just solve the problem by brute force in O(1) time.
Divide: Otherwise, partition the points into two subarrays PL and PR based on their x-coordinates. In
particular, imagine a vertical line l that splits the points roughly in half (see figure below). Let PL be the points to the left of l and PR be the points to the right of l.
In the same way that we represented P using two sorted arrays, we do the same for PL and PR. Since we have presorted Px by x-coordinates, we can determine the median element for l in constant time. After this, we can partition each of the arrays Px and Py in O(n) time each.
Figure: Divide-and-conquer closest pair algorithm (the vertical line l splits the points into PL and PR).
Conquer: Compute the closest pair within each of the subsets PL and PR each, by invoking the algorithm recursively. Let δL and δR be the closest pair distances in each case (see figure above). Let δ = min(δL, δR).
Combine: Note that δ is not necessarily the final answer, because there may be two points that are very close to one another but are on opposite sides of l. To complete the algorithm, we want to determine the closest pair of points between the sets, that is, the closest points p ∈ PL and q ∈ PR (see figure above). Since we already have an upper bound δ on the closest pair, it suffices to solve the following restricted problem: if the closest pair (p, q) are within distance δ, then we will return such a pair; otherwise, we may return any pair. (This restriction is very important to the algorithm’s efficiency.) In the next section, we’ll show how to solve this restricted problem in O(n) time. Given the closest such pair (p, q), let δ′ = ∥pq∥. We return min(δ, δ′) as the final result.

Assuming we can solve the “combine” step in O(n) time, it follows that the algorithm’s running time is given by the recurrence T(n) = 2T(n/2) + n, and (as in Mergesort) the overall running time is O(n log n), as desired.
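Putting the divide, conquer, and combine steps together, the overall recursion might look like the following Python sketch. The helper names (closest_rec, closest_in_strip) and the brute-force base-case threshold are our own choices, not part of the notes; closest_in_strip stands in for the O(n) “combine” step developed in the next section, and its check of a constant number of successors relies on the lemma proved there.

    from math import dist, inf

    def closest_pair(P):
        # Returns the smallest pairwise distance among the (x, y) points in P.
        Px = sorted(P)                            # presorted by x (done once)
        Py = sorted(P, key=lambda p: p[1])        # presorted by y (done once)
        return closest_rec(Px, Py)

    def closest_rec(Px, Py):
        n = len(Px)
        if n <= 3:                                # base case: brute force
            return min((dist(p, q) for idx, p in enumerate(Px) for q in Px[idx + 1:]),
                       default=inf)
        mid = n // 2
        x_split = Px[mid][0]                      # x-coordinate of the vertical line l
        PLx, PRx = Px[:mid], Px[mid:]
        PLy = [p for p in Py if p[0] < x_split]   # partition Py in O(n)
        PRy = [p for p in Py if p[0] >= x_split]
        delta = min(closest_rec(PLx, PLy), closest_rec(PRx, PRy))
        # Combine: only points within distance delta of l can beat delta.
        Sy = [p for p in Py if abs(p[0] - x_split) < delta]
        return min(delta, closest_in_strip(Sy, delta))

    def closest_in_strip(Sy, delta):
        # Stand-in for the O(n) strip scan of Section 11.3: by the lemma there,
        # only a constant number of successors in Sy need to be checked.
        best = delta
        for idx, p in enumerate(Sy):
            for q in Sy[idx + 1: idx + 8]:
                best = min(best, dist(p, q))
        return best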
11.3 Closest Pair Between the Sets
To finish up the algorithm, we need to compute the closest pair p and q, where p ∈ PL and q ∈ PR. As mentioned above, because we already know of the existence of two points within distance δ of each other, this algorithm is allowed to fail, if there is no such pair that is closer than δ. The input to our algorithm consists of the point set P, the x-coordinate of the vertical splitting line l, and the value of δ = min(δL,δR). Recall that our goal is to do this in O(n) time.
This is where the real creativity of the algorithm enters. Observe that if such a pair of points exists, we may assume that both points lie within distance δ of l, for otherwise the resulting distance would exceed δ. Let S
denote this subset of P that lies within a vertical strip of width 2δ centered about l (see (a) in the figure below)∗.
Figure 11.2: Closest pair in the strip.
How do we find the closest pair within S? Sorting comes to our rescue. Let Sy = ⟨s1, . . . , sm⟩ denote the points of S sorted by their y-coordinates (see (a) in the figure above). At the start of the notes, we asserted that considering the points that are close according to their x- or y-coordinate alone is not sufficient. It is rather surprising, therefore, that this does work for the set Sy.

The key observation is that if Sy contains two points that are within distance δ of each other, these two points must be within a constant number of positions of each other in the sorted array Sy. The following lemma formalizes this observation.
Lemma 1. Given any two points si, sj ∈ Sy, if ∥sisj∥ ≤ δ, then |j − i| ≤ 7.
Proof. Suppose that ∥sisj∥ ≤ δ. Since they are in S they are each within distance δ of l. Clearly, the y-coordinates of these two points can differ by at most δ. So they must both reside in a rectangle of width 2δ and height δ centered about l (see (b) in the figure above). Split this rectangle into eight identical squares each of side length δ/2. A square of side length x has a diagonal of length x√2, and no two points within such