A Companion to Data Organization
Aleksandar Ignjatović
© 2005
Figure 1: A medieval copy of arguably the most influential textbook ever written, Eu-
clid’s Elements (Στoιχεία), from about 300 BC. It contains the oldest known recorded
algorithm, Euclid’s algorithm for computing the Greatest Common Divisor (GCD)
of two natural numbers.
Figure 2: First data structures: a Mesopotamian cuneiform tablet from about 3000 BC
with an inventory list.
Preface
This booklet is not meant to be a textbook, but only a companion to the textbook that we
use in this course, Cormen, Leiserson, Rivest and Stein’s Introduction to Algorithms, the
second edition. Its purpose is to help you read the appropriate sections of the textbook,
and integrate it better with the rest of the course. It also adds a number of exercises not
present in the textbook. Please read the textbook and do not rely solely on this supplement.
Contents
1 Introduction 7
1.1 What is an Algorithm? 7
1.2 Abstraction Levels in Data Organization 10
1.2.1 Abstract Data Structures 11
1.2.2 Abstract Data Types 15
1.3 Recursion 21
1.3.1 What is Recursion? 21
1.3.2 Types of Recursion 22
1.3.3 Linear versus Divide-and-Conquer Recursion in Sorting 27
1.4 Asymptotic Time Complexity of Algorithms 33
1.4.1 Basics 33
1.4.2 Asymptotic growth rate of functions 35
1.4.3 Basic property of logarithms 37
2 Sorting Algorithms 47
2.1 The Quick Sort 48
2.2 Randomization 59
Chapter 1
Introduction
1.1 What is an Algorithm?
An algorithm is a collection of precisely defined steps that are executable using certain
specified mechanical methods. By “mechanical” we mean the methods that do not
involve any creativity, intuition or even intelligence. Thus, algorithms are specified by
detailed, easily repeatable “recipes”. The word “algorithm” comes from a corruption of
the name of Mukhammad ibn Musa Al-Khorezmi (780–850), who wrote an important
book on algebra, “Kitab al muhtasar fi hisab al gabr w’al muqubalah” (“A compact
introduction to calculation using rules of completion and reduction”). It is also believed
that the title of this book is the source of the word “algebra”.
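For instance, Euclid’s algorithm mentioned in the caption of Figure 1 is exactly such a mechanical recipe; a short Python rendering of it (added in this companion as an illustration, not part of the original text) is:

    def gcd(a, b):
        """Euclid's algorithm: repeatedly replace the pair (a, b) by (b, a mod b)
        until the second number becomes 0; the first number is then the GCD."""
        while b != 0:
            a, b = b, a % b
        return a

    # For example, gcd(48, 36) returns 12.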
In this course we will deal only with algorithms that are given as sequences of
steps, thus assuming that only one step can be executed at a time. Such algorithms are
called sequential. If the action of each step gives the same result whenever this step
is executed for the same input, we call such algorithms deterministic. If the execution
involves some random process, say throwing some (possibly software simulated) coin
or dice, such algorithms are called randomized algorithms. Randomized algorithms are
important tools for tasks that have to be repeated many times.
Example 1.1 A sorting algorithm for an input sequence of numbers 〈a1, a2, . . . , an〉 pro-
duces as output a permutation 〈aπ(1), aπ(2), . . . , aπ(n)〉 of the input sequence such that
aπ(1) ≤ aπ(2) ≤ . . . ≤ aπ(n).
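To make this specification concrete, the following small Python check (an illustration added in this companion, not part of the original text) tests whether a proposed output is a sorted permutation of the input:

    def is_correct_sort(inp, out):
        """out must contain exactly the elements of inp (with multiplicities)
        arranged in non-decreasing order."""
        is_permutation = sorted(inp) == sorted(out)
        is_non_decreasing = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
        return is_permutation and is_non_decreasing

    # is_correct_sort([3, 1, 2, 1], [1, 1, 2, 3]) returns True
    # is_correct_sort([3, 1, 2, 1], [1, 2, 3]) returns False (an element was lost)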
There are many algorithms that accomplish the task of sorting that have different
efficiency for different kinds of inputs. Efficiency of an algorithm is measured by its
time complexity, and to analyze an algorithm means to determine its time complexity
for the given type of inputs. Usually we do the worst case analysis, i.e., we determine
the largest number of basic steps needed to execute the algorithm for any input of size
n. Also important is the average case analysis. It provides the expected number of
basic steps needed to execute the algorithm for inputs of length at most n, assuming a
certain probability distribution of these inputs. Thus, if we assume that all inputs of
size at most n are equally likely, the average case analysis gives the average number of
steps needed to execute the algorithm for all inputs of size at most n. Clearly, a time
complexity function T(n) is non-decreasing, i.e., m ≤ n ⇒ T(m) ≤ T(n). An algorithm
is optimal if it provides the necessary functionality with the smallest possible number
of steps for each input.
Another important feature of an algorithm is its correctness, i.e., that it provides
correct functionality for all possible inputs. Generally, the only way to ensure the cor-
rectness of an algorithm is by means of a mathematical proof. This is due to the fact
that it is impossible to test an algorithm for all possible inputs. Sometimes such cor-
rectness proofs are very simple, because the correctness follows immediately from the
definition of the algorithm. However, sometimes the correctness of an algorithm is far
from obvious and requires a meticulous proof.
The activity of designing an algorithm and an appropriate data structure for the
problem at hand is called problem solving. Clearly, problem solving precedes writing
code, and in fact it is largely independent of the programming language or even of the
programming paradigm used (e.g., object oriented, imperative, etc.). This is true as
long as the algorithm is to be implemented on a standard, sequential processor. After
problem solving is completed and the correctness of designed algorithms is verified,
there is still much to do, namely one has to implement the algorithms and the associated
data structures in a specific programming language. Such implementation has to meet
numerous important criteria:
• reliability – the implementation is correct and it will not introduce vulnerabilities
to the system that runs it.
• maintainability – one can easily replace a part of the code in order to provide
somewhat different functionality;
• extensibility – to obtain more features, an additional piece of code can be integrated
with the existing code, with little or no modification of the already existing code
needed;
• reusability – possibility of using parts of the code in another application. This is
accomplished by modular design.
• robustness – the implementation does not crash if the user inputs an unexpected
or illegitimate input;
• user friendliness – users can interact with ease with the program during its exe-
cution;
• implementation time – implementation takes a reasonable amount of program-
ming time and other resources.
Note that these features are not features of algorithms and data structures, but rather,
of their implementations. Clearly, it is equally important that both the algorithms and
the data structures on one hand, as well as their implementation on the other hand, be
correct. Separation of design and implementation simplifies the process of software
development and helps ensure reliability of the final product, because it allows the
correctness of algorithms to be verified prior to their implementation.
1.2 Abstraction Levels in Data Organization
While solving a problem that involves a collection of objects of any kind, only a limited
number of properties of these objects and a limited number of their mutual relations are
relevant for the problem considered. Thus, objects can be “abstracted away”, i.e., re-
placed with suitable “names” and with an abstract representation of their corresponding
properties and mutual relations. We call these properties of objects and their relations
data.
Data is organized in a well-defined and structured abstract entity, called a data struc-
ture. Note that this makes sense regardless of whether data is to be manipulated by a
human or by a machine. The data corresponding to each object usually consists of a
key that identifies the object that data is abstracted from, and the remaining data that is
called the satellite data. Together they form a record, which is an element of the data
structure. Data structures can change their content as elements are added or deleted,
and for that reason a data structure is a dynamic set.
Example 1.2 Consider a group of students as a “set of objects”, with the data for each
student consisting of their name, used as the corresponding key, and their date of birth,
address and student number, organized in a data structure in the form of an alphabetical list.
1.2.1 Abstract Data Structures
We can further ignore what particular data is involved, and consider only the abstract
structure into which data is organized for the purpose of efficient storage and retrieval,
e.g., a linked list, a binary search tree, etc. In this way, we obtain an abstract data
structure (ADS). An ADS is defined by the details of its internal organization and by
internal operations that are used to maintain its properties as elements are added or
removed. Information about what kind of data will be stored in such structures is not a
part of the abstract data structure specification. This has the important practical consequence
that the same abstract data structure can be used to store different types of data, say
integers or floating point numbers, by simply instantiating the same data structure with
a different particular type of data.
Examples of Abstract Data Structures
1.1 A Singly Linked List is an Abstract Data Structure organized in a list of elements
that has a special beginning element (the head) and an end element (the tail). Further,
each element contains information about what the next element in the list is.
1.2 A Doubly Linked List satisfies all properties of a singly linked list, but each element
also contains information about both its predecessor and its successor.
1.3 An array allows a direct and thus fast (constant time) access to each of its elements,
but to insert a new element into the array while preserving the order among the existing
elements, we may have to move a lot of elements to free the appropriate space, as shown
on Figure 1.3.
In a linked list access to a particular element may require traversing many elements
in the list, but once the appropriate insertion point has been reached, insertion and
deletion involve changing only a few references (pointers), as shown on Figure 1.4.
This difference should be kept in mind when deciding if a list of elements should be
represented using an array or a linked list.
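The constant number of reference changes can be made concrete with a small Python sketch (added in this companion; the node class and the function names are hypothetical, chosen only for illustration):

    class ListNode:
        def __init__(self, value, next_node=None):
            self.value = value
            self.next = next_node

    def insert_after(node, value):
        """Insert a new element right after a given node of a singly linked list.
        Only two references change, regardless of the length of the list."""
        node.next = ListNode(value, node.next)

    def delete_after(node):
        """Delete the element that follows the given node, again by changing
        a single reference."""
        if node.next is not None:
            node.next = node.next.next

By contrast, inserting into the middle of an array requires shifting all subsequent elements by one position.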
Exercise 1.1 We say that a linked list contains a loop if one of its elements points back
to one of the previous elements in the list; see Figure 1.5.
1. Write a code fragment that, given any two integers m, n such that 0 ≤ n < m, produces a singly linked list of total length m with a loop of length n. (Note that n can be zero.) The content of each link is 0.

2. Write a code fragment that, given a singly linked list, say one produced by the code obtained in the first part of this exercise, determines whether the list contains a loop or not. The program is not allowed to make any modifications to the list, and it can use only a constant amount of additional storage space, i.e., an amount of additional space that does not depend on the size of the list.
Hint: Try advancing two pointers, one twice as fast as the other; a sketch of this idea is given at the end of this subsection.

3. Let A be an automaton that has a memory of 4 bits only, 3-bit wide input and output ports on two of its sides, an input for the synchronization clock pulses, and a single-bit special output, as shown on Figure 1.6. Consider now a very long chain consisting of 2k such automata (k is very large), shown on Figure 1.7, such that the output ports of each automaton are directly connected to the input ports of the next and the previous automaton. At every clock cycle, each automaton can read the input port values, perform a computation and set the values of the output port that become available to the previous and the next automata on the next clock cycle. Thus, it takes at least as many clock cycles as the length of the chain to send a message from the first to the last automaton. Design an algorithm that should be implemented by such automata to allow them to function in the following way. After a triggering pulse arrives at the input port of the first automaton, all automata should output 1 at their special outputs simultaneously (after the necessary number of clock cycles), and then reset and wait for the next triggering pulse. Notice that, given the extreme memory limitations of these finite automata, they cannot determine their position in the chain.
Hint: You might find the hint for part 2 above inspiring!

1.4 In a Binary Search Tree (BST) the key at every node is equal to or larger than all keys in the left subtree of this node, and equal to or smaller than all keys in the right subtree of that node; see Figure 1.8. Such a tree itself is not data and carries no information, but it is a construct used to organize data efficiently. The time complexity of a search for a particular key in a binary search tree is bounded by the height of such a tree, rather than by its size. This is important because, as we will see, the size of a well balanced tree is exponential in its height, and there are methods of ensuring a good balance as elements are added or deleted. Note that a tree is well balanced if at each of its nodes the height of the left subtree is about the same as the height of the right subtree; we will explain the details later.

If we have a collection of elements with a mutual relationship that naturally forms a tree, then we must treat such a tree structure as a part of the data. For example, members of an extended family with the “x is a parent of y” relationship form a genealogical tree. However, an abstract data structure that can be used to represent such a genealogical tree might not itself be a tree but, for example, a table instead, as shown on Figure 1.9. Consequently, we must be careful to distinguish between structures that are inherent in the collection of objects we consider, and are thus part of the data, and the auxiliary structures introduced in the process of data organization whose purpose is only to allow efficient storage and retrieval of data.
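The following Python sketch (added in this companion and not part of the textbook; the class and function names are chosen only for illustration) shows the two-pointer idea from the hint to Exercise 1.1, together with a builder for test lists as asked for in part 1 of that exercise:

    class Node:
        def __init__(self):
            self.value = 0      # the content of each link is 0
            self.next = None

    def build_list_with_loop(m, n):
        """Build a singly linked list of total length m whose last link points
        back n links, forming a loop of length n (n may be 0); returns the head."""
        assert 0 <= n < m
        nodes = [Node() for _ in range(m)]
        for i in range(m - 1):
            nodes[i].next = nodes[i + 1]
        if n > 0:
            nodes[m - 1].next = nodes[m - n]   # close a loop of length n
        return nodes[0]

    def has_loop(head):
        """Detect a loop with two pointers, one advancing twice as fast;
        only a constant amount of extra space is used."""
        slow = fast = head
        while fast is not None and fast.next is not None:
            slow = slow.next
            fast = fast.next.next
            if slow is fast:      # the fast pointer caught up: there is a loop
                return True
        return False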
1.2.2 Abstract Data Types

If we further ignore the particular internal organization of a data structure, and consider only its functionality, then we obtain an abstract data type (ADT). The functionality of an ADT is provided by a set of operations forming the interface of the ADT. This is the only aspect of an ADT visible and accessible from the outside. We say that these operations are exported by the ADT. The program that utilizes an ADT via its interface operations is usually called the client.

An ADT is realized by an appropriate abstract data structure that is in turn implemented as a construct in a particular programming language. For example, the Stack ADT can be realized by a Linked List ADS, which can be implemented as a pointer based construct in, say, C. This is illustrated on Figure 1.10.

Examples of Abstract Data Types

1.5 The Stack ADT has an interface consisting of the operations push(x), pop(), size(), top(), makeEmpty() and isEmpty(). Notice that the stack itself is a “hidden variable” of these operations; only push has an explicit input. Insertions and deletions from the stack follow the last-in-first-out (LIFO) scheme. Its functionality is described by the following properties of its operations:
• push(x) takes an element x as input and inserts it at the top of the stack; since the stack itself is hidden, this operation produces no output;
• pop() removes and returns the top element of the stack as the output; if the stack is empty, it returns an error message;
• top() returns the top element of the stack as the output without removing it; if the stack is empty, it returns an error message;
• size() returns the number of elements in the stack;
• makeEmpty() makes the stack empty;
• isEmpty() returns True just in case the stack is empty.
Note that in the above “programming language” type of notation, the stack itself is hidden, because the interface functions hide “the inside” of the abstract data structure. To formulate the above properties in a rigorous way, we make the hidden “stack” variable explicit and use for it a variable of a separate kind, denoted by a Greek letter. Accordingly, the output of each operation also has two components: the “visible”, exported value, given by the first element of the ordered pair, and the “invisible” new stack content, given by the second element of the ordered pair.
1. pop(push(x,α)) = (x,α). Thus, pop applied to the stack push(x,α) returns the top element x, but it also modifies the stack by removing that element; consequently, the output is the pair consisting of the top element x and the stack α.
2. top(push(x,α)) = (x, push(x,α)). Thus, top returns the top element of the stack without removing it; the output is the pair consisting of the top element x and the unchanged stack push(x,α).
3. isEmpty(α) = True ⇒ size(α) = 0; size(push(x,α)) = size(α)+1
4. isEmpty(α) = True ⇒ pop(α) = top(α) = error “EmptyStack”
5. isEmpty(makeEmpty(α)) = True
The functionality of the stack is thus defined purely algebraically. This highlights the fact that an ADT is defined only by its functionality, completely hiding how such functionality is achieved.
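For concreteness, here is a minimal sketch, added in this companion, of how the Stack interface could be realized in Python, with a built-in list playing the role of the hidden stack variable (the exceptions stand in for the error messages above):

    class Stack:
        """A realization of the Stack ADT; the list _items is the hidden stack."""
        def __init__(self):
            self._items = []

        def push(self, x):
            self._items.append(x)        # insert x at the top

        def pop(self):
            if not self._items:
                raise IndexError("EmptyStack")
            return self._items.pop()     # remove and return the top element

        def top(self):
            if not self._items:
                raise IndexError("EmptyStack")
            return self._items[-1]       # return the top element without removing it

        def size(self):
            return len(self._items)

        def makeEmpty(self):
            self._items = []

        def isEmpty(self):
            return len(self._items) == 0

A client that uses only these six operations does not depend on the fact that a Python list is used internally; the list could be replaced by a linked structure without changing the client.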
There are important practical benefits of such an approach:
1. It eliminates the need to maintain a uniform data representation throughout the system: as long as we know how the interface operations input and output data, inside an ADT data can be represented in an arbitrary way, different from the way the same data is represented in the client program. Thus, changes in the design of the client or of the ADT raise no question regarding their subsequent compatibility.
2. Design becomes modular; different modules can be designed by different people, who need to know nothing about how other modules are designed, but only what the interface operations are and what the functionality provided by the ADT is.
3. The structure of the system can be described on a higher level, without going into the details of implementation. This greatly simplifies design and understanding of complex systems. It also simplifies testing and verification of systems, because each module can be tested separately prior to its incorporation into the system.
4. Modules can be easily reused in other systems without change, as long as the functionality needed is the same.
Examples of stack applications include: the history of pages visited in a Web browser; the undo sequence in a text editor; the CPU stack that handles recursion; etc.

Exercise 1.2
1. Consider algebraic expressions built using the following rules:
(a) a variable denoted by a small Roman letter is an expression. For example, x and y are both expressions;
(b) if t1 and t2 are expressions, then so is (t1t2);
(c) expressions can be built only using the above two rules.
Thus, for example, (((xy)z)w) is a correct expression (we do not remove the outermost parentheses), but, for example, (xy)(z is not a correct expression. Similarly, ((xy)) is not a correct expression, because it cannot be obtained using the above rules. Design an algorithm that checks if a sequence of variables and brackets is a correct algebraic expression or not, i.e., if it has correctly matched parentheses. Implement your procedure and test it.
Hint: Set a counter and increment it whenever you open a bracket and decrement it when you close it. Ignore the variables.

The other most commonly used ADT is the queue. Unlike the stack, which follows the LIFO scheme, insertions and deletions from the queue follow the first-in-first-out (FIFO) scheme, as shown on the Figure below.

1.6 The Queue ADT has an interface consisting of the operations enqueue(x), dequeue(), front(), size(), makeEmpty() and isEmpty(). Again, the queue itself is a “hidden variable” of these operations; only enqueue has an explicit input. Since insertions and deletions from the queue follow the first-in-first-out (FIFO) scheme, a queue encodes the temporal order in which elements have been added. Its functionality is specified by the following properties of its operations:
• enqueue(x) takes an element x as input and inserts it at the end of the queue; since the queue itself is hidden, this operation produces no output;
• dequeue() removes and returns the first element in the queue as the output; if the queue is empty, it returns an error message;
• front() returns the first element in the queue as the output; if the queue is empty, it returns an error message;
• size() returns the number of elements in the queue;
• makeEmpty() makes the queue empty;
• isEmpty() returns True if the queue is empty.
An example of the Queue ADT is the printing queue: jobs are printed in the order they arrive at the printer.
As with the stack, if we make the queue itself visible by using a variable of a separate kind, we can define the functionality of a queue purely algebraically:
1. isEmpty(α) = True ⇒ dequeue(α) = front(α) = error “EmptyQueue”
2. isEmpty(α) = True ⇒ dequeue(enqueue(x,α)) = (x,α)
3. isEmpty(α) = True ⇒ front(enqueue(x,α)) = (x, enqueue(x,α))
4. isEmpty(α) = False ⇒ dequeue(enqueue(x,α)) = (y, enqueue(x,β)), where (y,β) = dequeue(α)
5. isEmpty(α) = False ⇒ front(enqueue(x,α)) = (y, enqueue(x,α)), where (y,α) = front(α)
6. isEmpty(α) = True ⇒ size(α) = 0
7. size(enqueue(x,α)) = size(α)+1
8. isEmpty(makeEmpty(α)) = True

Exercise 1.3 A simple implementation of a queue would consist of an array together with two pointers that keep track of where the front and the rear end of the queue are. However, with such a design, as elements are added and removed, the content of the queue would move through the array and would eventually “run away” (overflow the array) even if there are fewer elements in the queue than the size of the array. Solve this problem by designing a circular structure. Implement it and test it; one possible sketch is given below.
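One possible circular design for Exercise 1.3 is sketched below in Python (added in this companion; the class name, the fixed capacity and the error handling are assumptions of the sketch, not part of the exercise):

    class CircularQueue:
        """A queue stored in an array of fixed capacity; the index of the front
        element wraps around modulo the capacity, so the content never
        'runs away' towards the end of the array."""
        def __init__(self, capacity):
            self._items = [None] * capacity
            self._front = 0      # index of the first (oldest) element
            self._count = 0      # number of elements currently stored

        def enqueue(self, x):
            if self._count == len(self._items):
                raise OverflowError("FullQueue")
            rear = (self._front + self._count) % len(self._items)
            self._items[rear] = x
            self._count += 1

        def dequeue(self):
            if self._count == 0:
                raise IndexError("EmptyQueue")
            x = self._items[self._front]
            self._front = (self._front + 1) % len(self._items)
            self._count -= 1
            return x

        def front(self):
            if self._count == 0:
                raise IndexError("EmptyQueue")
            return self._items[self._front]

        def size(self):
            return self._count

        def isEmpty(self):
            return self._count == 0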
1.7 A Binary Search Tree (BST) is an ADT for efficient storage, removal and retrieval of records (k,a) with keys that are linearly ordered. There are several versions of BST; we chose one with the following interface operations:
• insert((k,x)) inserts the element (k,x) into the BST; if there is already an element (k,y) with the same key k, it is replaced with the new element;
• delete(k) deletes the element with the key k from the BST; if there is no such element, it leaves the tree unchanged;
• find(k) finds and outputs the element with key k if there is such an element in the BST, and returns an error message if there is no such element;
• in(k) returns True just in case the BST contains an element with the key k;
• size() returns the number of elements in the BST;
• isEmpty() returns True if the BST is empty;
• makeEmpty() makes the BST empty.
We require that all of the above operations run in time bounded by K lg(n+2), where K is a fixed constant, n is the number of elements in the tree, and lg n = log₂ n.

Thus, a purely algebraic definition of a Binary Search Tree ADT is expected to be more complicated than the definitions of a stack or a queue, because it involves not only the interface operations but also their complexity constraints. Informally speaking, such complexity constraints are met by requiring that a search in a BST that is obtained by joining two trees of equal size takes only a constant number of steps more than the search in one of them. We postpone the details until a later, systematic treatment of BSTs. Note that in the definition of a BST there is no reference to any kind of tree structure, but only to the properties of the interface operations. Their name comes from the fact that they are in fact realized using abstract data structures that do have a tree structure. Examples include AVL trees and Red-Black trees. Clearly, unlike stacks and queues, a BST does not encode the order in which elements are added to the structure, but instead, it allows retrieval of a record by the value of its key.

1.3 Recursion

1.3.1 What is Recursion?

Example 1.3 (Towers of Hanoi) Referring to Figure 1.12, the task is to move n discs from pole A to pole C, one at a time, never placing a larger disc on top of a smaller disc.
Algorithm: To move n discs from A to C:
Move n−1 discs from A to B;
Move the largest disc from A to C;
Move n−1 discs from B to C.
Thus, to obtain the procedure for n discs, we call the same procedure for n−1 discs twice, with a simple modification (changing the roles of the poles). We say that such recursion is linear or of type n → n−1, because the computation for an input with n elements (discs) is reduced to several computations with n−1 elements. Let XY stand for “move the top-most disc of pole X to pole Y”. For example, AB means “move the top-most disc of pole A to pole B”. If we denote by ∧ concatenation of moves, then the above algorithm can be written as:
f(n,A,B,C) = f(n−1,A,C,B) ∧ AC ∧ f(n−1,B,A,C)
For n = 4 we can “unwind” the recursion as in Figure 1.13 below. The boxes correspond to recursive calls. Note that the computations in each box can be completed only after the computations in the smaller boxes contained within are completed.

Exercise 1.4 Tom and his wife Mary went to a party where nine more couples were present. Not everyone knew everyone else, so people who did not know each other introduced themselves and shook hands. People who knew each other from before did not shake hands. Later that evening Tom got bored, so he walked around and asked all other guests (including his wife) how many hands they had shaken that evening, and got 19 different answers. How many hands did Mary shake?
Hint: Try recursion on the number of couples, by finding which couple you should eliminate first. If there are 19 different answers, what could they be? Which ones are the most interesting numbers among those 19 numbers, and how are these people related?

1.3.2 Types of Recursion

Consider the following two recipes for how to cook potatoes:
A: “Take a kilo of potatoes and boil them for sixty minutes.”
B: “Take a kilo of potatoes and boil them until soft.”
These two recipes exemplify the first major distinction among recursion types. Recipe A is an example of a recursive procedure that involves a number of steps that can be determined before the procedure has started (“keep boiling potatoes for 60 one-minute intervals”), i.e., of the form: for i=0 to 60 do {. . .}. On the other hand, the number of “one minute boiling intervals” in recipe B cannot be determined in advance; it is an example of a recursion of the form do {. . .} until {. . .}. Functions computable using only “for i=0 to . . . do {. . .}” loops are called primitive recursive functions; functions computable using “do {. . .} until {. . .}” loops are called general recursive functions.

Example 1.4
1. The factorial function is defined by primitive recursion: 1! = 1; n! = n · (n−1)!. Note that such a function is easily computable by initializing k = 1 and then executing the loop for i = 1 to n do k ← k · i.
2. f(n) = the nth number p such that both p and p+2 are prime. To get such a number we initialize k=0, p=2 and then execute: do {{if p and (p+2) are prime then k←k+1}; p←p+1} until {k==n}. In fact, we do not even know if for every n our algorithm will terminate. The hypothesis that there are infinitely many primes p such that p+2 is also a prime is known as the twin primes hypothesis.

There are functions that are not primitive recursive, yet they are still simpler than most general recursive functions. The best known example is the Ackermann function defined by double recursion:
F(0,n) = n+1
F(m,0) = F(m−1,1)
F(m,n) = F(m−1,F(m,n−1)) for m, n > 0
One can show that such a function cannot be reduced to the usual primitive recursion; in
fact, the Ackermann function grows faster than any function defined by primitive recursion.
Exercise 1.5 Write a program that computes F(m,n) and try to evaluate F(3,3) and
F(4,4).
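A direct transcription of this double recursion into Python (a sketch added in this companion) is given below; even the tiny argument pair (4,4) produces a value far too large to compute this way, which is exactly the point of the exercise:

    import sys
    sys.setrecursionlimit(100000)   # the recursion is deep even for small inputs

    def F(m, n):
        """The Ackermann function, transcribed directly from its definition."""
        if m == 0:
            return n + 1
        if n == 0:
            return F(m - 1, 1)
        return F(m - 1, F(m, n - 1))

    # F(3, 3) returns 61; F(4, 4) is astronomically large and will not finish.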
In this booklet we call a harder exercise a problem; nevertheless, you should not be
discouraged: try to solve it!
Problem 1.1 Two thieves have robbed a warehouse and have to split a pile of items
without price tags on them. How do they do this in a way that ensures that each thief
believes that he has got at least one half of the value of the whole pile? (You might want
to try to solve this problem before reading any further.) The solution is that one of the
two thieves splits the pile in two parts, so that he believes that both parts are of equal
value. The other thief then chooses the part that he believes is at least one half. Assume
now that ten thieves have robbed a warehouse. How do they split the pile of items so
that each thief believes that he got at least one tenth of the total value of the pile?
Linear Recursion considered above (except the double recursion used to define the
Ackermann function) reduces a computation of F(n,x) to one or several computations
of F(n−1, . . .). For example, the identity n! = n·(n−1)! reduces the computation of n! to a
computation of (n−1)! and a multiplication of the result by n. Much more efficient
is divide-and-conquer recursion, which reduces a computation of F(n,x) to computations
of F(⌈n/k⌉, . . .), where k > 1. Often k = 2, and so F(n, . . .) is reduced to computations
of the form F(⌈n/2⌉, . . .). We sometimes do not write the integer part function and simply
write F(n/2, . . .) for F(⌈n/2⌉, . . .), if no confusion can arise.
Example 1.5 y = x^n can be computed using linear recursion as follows: x^0 = 1;
x^(n+1) = x · x^n. However, one can also use divide-and-conquer recursion as follows:
x^0 = 1; x^(2n) = (x^n)²; x^(2n+1) = x · (x^n)². Clearly, this way the number of
multiplications needed to evaluate x^n is at most 2 lg n.
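A Python sketch of this divide-and-conquer recursion (added in this companion) is:

    def power(x, n):
        """Compute x**n with at most about 2 lg n multiplications by
        repeatedly halving the exponent."""
        if n == 0:
            return 1
        half = power(x, n // 2)
        if n % 2 == 0:
            return half * half          # x^(2k) = (x^k)^2
        return x * half * half          # x^(2k+1) = x * (x^k)^2

    # power(2, 10) returns 1024 using far fewer than 10 multiplications.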
Example 1.6 We are given 27 coins of the same denomination; we know that one of
them is counterfeit and that it is lighter than the others. Find the counterfeit coin by
weighing coins on a pan balance only three times.
Solution: Split 27 = 33 coins into three equal groups with 9 coins each. Put two
groups on the pan balance. If equal, the false coin must be in the third group; otherwise
take the lighter group. Split these nine coins into three groups with three coins each and
compare two of these groups. As in the previous step, you can find a group containing
the false coin. Finally, compare two of the three coins left; if equal, the false coin is the
remaining coin; otherwise take the coin on the lighter side. □
Notice that if we had 3^n coins and could weigh coins n times, then splitting them into
three groups with 3^(n−1) coins each and comparing two of the three groups, we end up
with 3^(n−1) coins among which the fake coin might be, and we can still weigh n−1 times. Thus, this
type of recursion reduces the problem to a sub-problem of size equal to a
fraction of the original problem; this is why such recursion is called divide-and-conquer
recursion.
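The generalized weighing strategy can be simulated in a few lines of Python (a sketch added in this companion; the pan balance is modelled by comparing sums of weights, and the function name is chosen here only for illustration):

    def find_light_coin(weights):
        """Given 3**n coin weights of which exactly one is lighter, locate the
        light coin with n simulated weighings; returns its index."""
        lo, hi = 0, len(weights)                   # current group is weights[lo:hi]
        while hi - lo > 1:
            third = (hi - lo) // 3
            first = sum(weights[lo:lo + third])
            second = sum(weights[lo + third:lo + 2 * third])   # one weighing
            if first == second:
                lo = lo + 2 * third                # light coin is in the third group
            elif first < second:
                hi = lo + third                    # light coin is in the first group
            else:
                lo, hi = lo + third, lo + 2 * third  # light coin is in the second group
        return lo

    # find_light_coin([2]*13 + [1] + [2]*13) returns 13 after three weighings.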
Exercise 1.6 We are given twelve coins and one of them is a counterfeit but we do not
know if it is heavier or lighter. Determine which one is a counterfeit and if it is lighter
or heavier by weighing coins on a pan balance three times only.
Hint: divide and conquer; note that you can reuse coins that are established not to be
counterfeit!
Example 1.7 We have nine coins and three of them are heavier than the remaining six.
Can you find the heavier coins by weighing coins on a pan balance only four times?
Solution: If we weigh the coins four times, since every weighing has three possible
outcomes, we have altogether 3^4 = 81 possible outcomes. However, there are
(9 choose 3) = (9 · 8 · 7)/3! = 84 possible combinations of which three coins out of nine might be the heavier ones.
Thus, the number of possibilities we have to distinguish between is larger than the
number of possible outcomes of our weighings! Consequently, it is not possible to find
the heavier coins by weighing coins on a pan balance only four times. □
This is an example of the lower bound estimation for the complexity of algorithms,
i.e., estimation of the minimal number of steps needed to solve a problem for an input
of given size. Exactly the same method as above can be used to determine the minimal
number of steps needed to sort a sequence of numbers by comparing pairs of numbers.
1.3.3 Linear versus Divide-and-Conquer Recursion in Sorting
1.8 Algorithm 1.1 is called Insertion Sort and is based on the procedure used to sort
playing cards held in one's hand. Note that the procedure does not call itself; thus, it is an iterative
procedure, rather than a recursive procedure. On the other hand, algorithm 1.2 is a
“genuine recursion” version of insertion sort.
Algorithm 1.1: INSERTION-SORT(A)
1: for j← 2 to length[A] do
2:    key← A[j]
3:    i← j−1
4:    while i > 0 and A[i] > key do
5:       A[i+1]← A[i]
6:       i← i−1
7:    A[i+1]← key
Algorithm 1.2: INSERTION-SORT-REC(A[1 . . .n])
1: if n > 1 then
2:    INSERTION-SORT-REC(A[1 . . .n−1])
3:    key← A[n]
4:    i← n−1
5:    while i > 0 and A[i] > key do
6:       A[i+1]← A[i]
7:       i← i−1
8:    A[i+1]← key
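For readers who prefer running code, here is a direct Python transcription of Algorithm 1.1 (added in this companion; it uses 0-based indexing, unlike the 1-based pseudocode):

    def insertion_sort(A):
        """Sort the list A in place; mirrors Algorithm 1.1 with 0-based indices."""
        for j in range(1, len(A)):
            key = A[j]
            i = j - 1
            while i >= 0 and A[i] > key:
                A[i + 1] = A[i]     # shift larger elements one place to the right
                i -= 1
            A[i + 1] = key          # insert the key into the freed position
        return A

    # insertion_sort([5, 2, 4, 6, 1, 3]) returns [1, 2, 3, 4, 5, 6]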
Note that insertion sort uses only the space taken by the elements of the array plus
only one extra storage space for the key. Thus, it is very memory efficient. We say that
it sorts in place.
We seldom sort “unattached” numbers; most often we sort records according to
their keys. When sorting by keys, an important feature that a sorting algorithm can
have is stability. A sorting algorithm is stable if it does not change the ordering among
equal keys. For example, medical records of patients arriving at the emergency ward
of a hospital are kept sorted according to their keys, which are assigned in a way that
reflects the severity of the patient’s injury. Thus, the algorithm that sorts medical records
according to such a key must be stable, to keep the patients with the same severity of
the injury in the same order as they have arrived. This ensures that the waiting time for
each patient is kept as short as possible.
We now analyze INSERTION-SORT, i.e., we estimate its running time.
Worst case: The array is in reverse order (from the largest to the smallest number).
In this case each element of the array (except the first) is put in key and then compared
to all of the previous elements. All of the previous elements have to be shifted to the
right and the element stored in key is then put into the first cell of the array. For the i-th
element this takes c·i steps, where the constant c depends on exactly which elementary
instructions can be executed in a single step. Thus the total number of steps is
a quadratic polynomial in n:
∑_{i=2}^{n} c·i = c (n(n+1)/2 − 1) = c (n² + n − 2)/2        (1.1)
We did not count overhead of invoking subroutines etc. However, this overhead is linear
in n (i.e., a constant multiple of n) and thus it does not change the nature of the estimate
of the time complexity of the entire algorithm, which remains quadratic in n. We will
develop later a calculus (the “O-notation”) that will allow us to estimate the growth rate
of functions without having to worry about such “details”.
Average case: If the numbers in the array are randomly chosen, any member of the
array is smaller than about one half of the previous elements of the array. Thus, for the
ith element in the array there will be about i/2 comparisons and shifts. Consequently,
the total number of operations will be about
∑_{i=2}^{n} c·i/2 = (c/2) · (n² + n − 2)/2        (1.2)
i.e., it is again a quadratic polynomial in n. Thus, the average case is about as bad as
the worst case. If you have to sort many arrays with randomly distributed content, the
algorithm will on average run in quadratic time in the length of the arrays. This, as we
will see, is very slow.
Best case: The array is already sorted. In this case each element of the array (except
the first) is put in key and then, after a comparison with the previous element, it is
returned to its old position. Thus, for each element of the array the number of atomic
operations (moving numbers, comparing numbers) is constant. Consequently the total
number of operations is of the form cn, i.e. it is linear in n.
What is wrong with the above “analysis”? There is no such thing as “the best
case analysis”. The “best case” provides no useful information. However, knowing a
significant class of inputs for which an algorithm is particularly fast can be useful. For
example, if you expect few inversions in your input sequences, then the INSERTION SORT
algorithm will run in linear time, which is very fast. This happens if you sort checks
in a bank: most of the checks arrive at the bank in the order you write them, with few
inversions.
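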
Exercise 1.7 Let k be a natural number. Consider the family Ak of all arrays A[1 . . .n]
such that for every i ≤ n there are at most k elements among A[1 . . . i− 1] larger than
A[i]. Show that there exists a constant c such that every array A[1 . . .n] from Ak can be
sorted in time c ·n.
Exercise 1.8 Insertion sort consists of two main operations: searching for the right
place in the list where the key should be inserted, and inserting the key element into the
list. If the list of elements is realized as an array, which operation is the bottleneck that
accounts for the quadratic run time of the Insertion Sort? What if the list is implemented
as a linked list? Refer to remark 1.3 on page 12.
Divide-and-Conquer Recursion consists of three steps at each level of recursion:
• Divide the problem into a number of sub-problems of smaller size.
• Conquer the sub-problems using recursive calls. If the problem size is very small,
i.e., at the base of recursion, do it directly without using a recursion call.
• Combine the solutions of the sub-problems into a solution of the starting prob-
lem.
MERGE-SORT(A, p,r) sorts the elements of the sequence contained in the sub-array
A[p . . .r] of an array A in the following way. It:
• divides the n-element sequence into two subsequences of size ⌈n/2⌉ and ⌊n/2⌋;
• conquers by sorting the two subsequences using recursive calls of Merge-Sort.
• merges the two sorted subsequences to get the initial sequence sorted.
When the divide procedure produces a sequence of length one, such sequence is
already sorted. To merge two sorted sequences we compare the first members of each
subsequence and pick the smaller one as the next element of the joint sequence. We
repeat this procedure until we exhaust one of the sequences, at which stage we just con-
catenate the remaining part of the other sequence to the end of the joint sequence. We
now give the pseudo-codes of the Merge[A, p,q,r] algorithm which merges sequences
A[p . . .q] and A[q+1 . . .r], and the pseudo-code for the Merge-Sort algorithm:
Algorithm 1.3: Merge[A, p,q,r]
1: i← p
2: j← q+1
3: k← p
4: while i < q+1 and j < r+1 do
5:    if A[i] ≤ A[j] then
6:       B[k]← A[i]
7:       i← i+1
8:       k← k+1
9:       if i = q+1 then
10:         for m← j to r do
11:            B[k− j+m]← A[m]
12:   else
13:      B[k]← A[j]
14:      j← j+1
15:      k← k+1
16:      if j = r+1 then
17:         for m← i to q do
18:            B[k− i+m]← A[m]
19: for n← p to r do
20:    A[n]← B[n]
Algorithm 1.4: MERGE-SORT(A, p,r)
1: if p < r then
2:    q← ⌊(p+r)/2⌋
3:    MERGE-SORT(A, p,q)
4:    MERGE-SORT(A,q+1,r)
5:    MERGE(A, p,q,r)
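Similarly, Algorithms 1.3 and 1.4 can be transcribed into Python as follows (a sketch added in this companion; it uses 0-based indexing and a temporary list in place of the auxiliary array B):

    def merge_sort(A, p=0, r=None):
        """Sort A[p..r] in place by divide-and-conquer; mirrors Algorithm 1.4."""
        if r is None:
            r = len(A) - 1
        if p < r:
            q = (p + r) // 2
            merge_sort(A, p, q)
            merge_sort(A, q + 1, r)
            merge(A, p, q, r)
        return A

    def merge(A, p, q, r):
        """Merge the sorted sub-arrays A[p..q] and A[q+1..r]; mirrors Algorithm 1.3."""
        B = []
        i, j = p, q + 1
        while i <= q and j <= r:
            if A[i] <= A[j]:        # taking the left element first keeps the sort stable
                B.append(A[i])
                i += 1
            else:
                B.append(A[j])
                j += 1
        B.extend(A[i:q + 1])        # one of these two remainders is empty
        B.extend(A[j:r + 1])
        A[p:r + 1] = B

    # merge_sort([7, 4, 2, 5, 3, 2, 3, 1]) returns [1, 2, 2, 3, 3, 4, 5, 7]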
[Table: a step-by-step run of the merging procedure on the sequences 〈8,13,19,26〉 and 〈7,16,17,18〉; at each step the smaller of the two front elements is removed and appended to the output, and the final row shows the merged sequence 7 8 13 16 17 18 19 26.]
An example of merging the sequences 〈8,13,19,26〉 and 〈7,16,17,18〉 into a single
increasing sequence 〈7,8,13,16,17,18,19,26〉 is represented in the above table; a run
of Merge-Sort on the input sequence 〈7,4,2,5,3,2,3,1〉 is shown on Figure 1.15.
Analysis of Merge-Sort. How efficient is the Merge-Sort? Assume that the input
array has n elements, and let us denote by T (n) the running time for arrays of length n.
Divide This part just calculates the middle point of the array, which takes constant
amount of time.
Conquer We apply the same algorithm on arrays of sizes ⌈n/2⌉ and ⌊n/2⌋. Thus,
the running times are T(⌈n/2⌉) and T(⌊n/2⌋), whose sum is at most 2T(⌈n/2⌉).
Combine Uses an application of the MERGE procedure applied on two sequences
of total length n. For each element of the merged sequence we did at most one com-
parison and a few moving operations. Thus, the total number of operations used in the
procedure MERGE is linear in n.
Summing up all three components, we get the inequality T(n) ≤ 2T(⌈n/2⌉) + cn.
Thus, since a time complexity function is non-decreasing, to find an upper bound for
T(n) it is enough to solve the recurrence T(n) = 2T(⌈n/2⌉) + cn.
We “unwind” the recursion as shown on Figure 1.16 and estimate the height of the
tree and the work we have to do at each level of the tree.
Clearly, since at each step of divide the size of sub-arrays is halved, for n > 1 the
height of the tree is equal to ⌈lg n⌉. Since the total number of elements in all sub-arrays
is always n, and the MERGE procedure works in time linear in n, the total work at each
level is cn for some constant c. Thus, the total number of steps of MERGE SORT is a
constant multiple of n lg n.
Since lg n is much smaller than n and, as we have seen, insertion sort runs in time
that is a multiple of n², MERGE SORT is in general much more efficient than INSERTION SORT.
In practice, MERGE SORT beats INSERTION SORT for inputs of size larger than
about 30. For much larger n the difference in performance is huge; the growth rates of
n² and n lg n are compared on Figure 1.18.
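The difference between the two growth rates can be sensed even without a plot; the following few lines of Python (added in this companion) print the ratio n²/(n lg n), which simplifies to n/lg n:

    from math import log2

    for n in (10, 100, 1000, 10**6):
        # n*n steps versus n*lg(n) steps: the ratio simplifies to n / lg(n)
        print(n, (n * n) / (n * log2(n)))

The ratio is about 3 for n = 10, about 100 for n = 1000, and about 50 000 for n = 10⁶, and it keeps growing without bound.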
Exercise 1.9 1. Show that the MERGE SORT is a stable sorting algorithm.
2. Show that if we replace line 5: A[i] ≤ A[ j] in Algorithm 1.3 (MERGE) with:
A[i] < A[j], the algorithm remains correct, but is no longer stable.

1.4 Asymptotic Time Complexity of Algorithms

1.4.1 Basics

Present day microprocessors are all capable of implementing the same kind of computations, and differ essentially only in terms of speed of execution and their memory capacity. This feature of processors is called Turing completeness. Turing completeness explains why we can buy software for new tasks without having to buy a new, different computer for each new task. Our computers are “universal machines” in the sense that they can execute different tasks by changing only their software.

Nearly all processors of the type that is commonly used in general purpose computers are built according to the same architecture, usually called the Von Neumann Architecture. They all consist of a CPU (Central Processing Unit) and a (potentially infinite) collection of memory cells that can each be directly accessed, called Random Access Memory (RAM). All such machines are equivalent to a theoretical model of computation, called the Universal Turing Machine, and thus are equally powerful computing machines (save memory and speed constraints).

Two different (sequential) processors A, B may have different “hard-wired” basic operations, i.e., different sets of instructions implemented in the hardware by special circuits. Since they can perform the same tasks, every instruction executable as a basic instruction of A, if not hard-wired in B, can be implemented as a sequence of instructions of B. Since there are only finitely many basic instructions implemented in the hardware of A, there exists a natural number M such that every basic instruction of A can be implemented as a sequence of at most M basic instructions of B. Thus, a program p containing only basic instructions available on A can be replaced by a program p′ containing only instructions available on B, such that p′ is at most M times as long as p. Consequently, the execution time of the same algorithm on A and B for the same input can differ by at most a multiplicative constant M.

Since algorithms to be run on machines of the Von Neumann Architecture are abstract entities independent of the individual platforms on which they can be executed, we need a method for measuring the execution time of an algorithm that is independent of inessential features of the algorithm, such as which particular basic instructions are used in specifying the algorithm. In light of the above discussion, we want to consider two algorithms whose execution times, for sufficiently large inputs, differ by at most a multiplicative constant as equally efficient. This leads us to the notion of asymptotic growth rate of functions.

1.4.2 Asymptotic growth rate of functions

“Big Oh notation”: f(n) = O(g(n)) is an abbreviation for the statement “There exist positive constants c and n₀ such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n₀”. In this case we say that g(n) is an asymptotic upper bound for f(n). Thus, f(n) = O(g(n)) means that f(n) does not grow substantially faster than g(n), because a multiple of g(n) eventually dominates f(n). Clearly, the multiplicative constants of interest will be larger than 1, thus “enlarging” g(n).

Example 1.8
1. f(n) = n² + 10n does not grow substantially faster than g(n) = n² in the sense of the above definition, because f(n) = n² + 10n ≤ 2n² = 2g(n) for all n ≥ 10. Thus, here we can take c = 2, n₀ = 10, and so n² + 10n = O(n²).
2. f(n) = 1000n² does not grow substantially faster than g(n) = n² because f(n) = 1000n² = 1000g(n) for all n ≥ 0. Thus, for c = 1000 and n₀ = 0 we get 1000n² = O(n²).
3. f(n) = n² does grow substantially faster than g(n) = 1000n because there are no constants c, n₀ such that f(n) = n² ≤ c·1000n for all n ≥ n₀. This is because n ≥ 1000c implies n² > 1000cn. Consequently, n² ≠ O(1000n).
Thus, a linear function can never be an asymptotic upper bound for a quadratic function.
“Omega notation”: f(n) = Ω(g(n)) is an abbreviation for the statement “There exist
positive constants c and n₀ such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n₀.” In this case we
say that g(n) is an asymptotic lower bound for f (n).
Thus, f (n) = Ω(g(n)) essentially says that f (n) grows at least as fast as g(n), be-
cause f(n) eventually dominates a multiple of g(n). Clearly, the multiplicative constants of
interest will be smaller than 1, thus “reducing” g(n) by a constant factor.
Example 1.9
1. g(n) = 10n is an asymptotic lower bound for f(n) = n² because when n is sufficiently large (n > 10) we have g(n) = 10n < n² = f(n). Thus, here we can take c = 1 and n₀ = 10.
2. g(n) = n² is not an asymptotic lower bound for f(n) = 10n because there is no constant c such that 10n > c·n² for all sufficiently large n. No matter how small c > 0 is chosen, c·n² will eventually be larger than 10n.
“Theta notation”: f(n) = Θ(g(n)) stands for f(n) = O(g(n)) and f(n) = Ω(g(n)). In
this case we say that f and g have the same asymptotic growth rate, or that they are
asymptotically equivalent.
Thus, f(n) = Θ(g(n)) means that g(n) is both an asymptotic upper bound and an
asymptotic lower bound for f(n). It is easy to see that in this case there exist positive
constants c₁, c₂ and n₀ such that 0 ≤ c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀.
Example 1.10
1. n² + 5n + 1 = Θ(n²) because 0 ≤ n² ≤ n² + 5n + 1 ≤ 2n² for all n ≥ 6. In general,
for any polynomial p(x) of degree k (with a positive leading coefficient) we have p(x) = Θ(x^k).
2. 2^(n+1) = Θ(2^n) because 0 ≤ 2^n ≤ 2^(n+1) = 2 · 2^n for all n ≥ 0.
1.4.3 Basic property of logarithms
1.9 The Logarithm function is very important for estimating the run time of algorithms.
For example, we have seen that the MERGE SORT algorithm runs in time O(n lg (n)).
We now review briefly the main properties of the logarithm function.
log_c(ab) = log_c a + log_c b        (1.3)
log_c(a/b) = log_c a − log_c b        (1.4)
log(a^b) = b log a        (1.5)
a^(log_a b) = b        (1.6)
log_a x = log_b x · log_a b = log_b x / log_b a        (1.7)
Proof: Properties 1.3 – 1.5 follow directly from the definition of the log function, i.e.,
from the fact that log_c x = y just in case c^y = x, and the corresponding basic properties
of exponentiation: c^(x+y) = c^x c^y; c^(x−y) = c^x / c^y; (c^x)^y = c^(xy). Property 1.7 is a bit
trickier: let log_b x = w; thus, b^w = x. Using 1.6 and substituting in b^w = x we get
(a^(log_a b))^w = x, i.e., a^(w·log_a b) = x. Take now log_a of both sides to get w·log_a b = log_a x, i.e.,
log_b x · log_a b = log_a x. □
Note that 1.7 implies that for every two numbers a, b > 1 we have log_a n = Θ(log_b n).
1.10 The growth rate of the factorial function. We now show that lg(n!) = Θ(n lg n).
Using Stirling’s formula n! = √(2πn) · (n/e)^n · (1 + Θ(1/n)) and taking lg of both sides we
get:
lg(n!) = lg √(2πn) + n lg n − n lg e + lg(1 + Θ(1/n))
However, it is easy to see that the expression on the right is Θ(n lg n).
Exercise 1.10 Prove lg (n!) = Θ(n lg n) using basic algebra only.
Hint: Note that for n > 0, n! ≤ n^n and take the logarithm of both sides; for the other
direction show that n^n ≤ (n!)² by suitably pairing the factors of (n!)² and then using the fact that x(n+1−x) ≥ n for all 1 ≤ x ≤ n. This inequality can be proved using basic algebra of quadratic polynomials.

Exercise 1.11 Determine if f(n) = O(g(n)), f(n) = Ω(g(n)) or f(n) = Θ(g(n)) for the following pairs:

f(n)                          g(n)
n                             (n − 2 lg n)(n + cos n)
(lg n)²                       lg(n^(lg n)) + 2 lg n
n^((1+sin(πn/2))/2)           √n   (a plot might help)

How important is the run time of an algorithm? Is it just a matter of “elegance”? The graph on Figure ?? might help explain the significance of algorithm efficiency. To comprehend the growth rate of these functions even better, assume we have algorithms with time complexities shown in the table, that are executed on a 1 GHz machine executing one instruction each clock cycle. In the table one can find how long it would take to complete these programs; time is in seconds unless stated otherwise.