Com S 311 Section B Introduction to the Design and Analysis of Algorithms
Lecture Two for Week 1
Xiaoqiu (See-ow-chew) Huang
Iowa State University
January 28, 2021
Analysis of Algorithms
Input: Two arrays A and B of the same length and without duplicates.
Problem: Do A and B contain the same elements? Which of the following strategies is better?
Strategy 1
for each index i in array A
if A[i] does not appear in array B
return false
return true
Strategy 2
make a copy of each array and sort both arrays
for each index i
if A[i] is different from B[i]
return false
return true
Some Measures of the Quality of Algorithms
The amount of time taken by an algorithm
The amount of memory …
The amount of network bandwidth …
The amount of effort to code an algorithm
Under each measure, a better algorithm takes a smaller amount of resource or effort.
We will focus on techniques for estimating the running time of an algorithm without turning the algorithm into a program. However, the techniques can apply to other measures too.
Concepts in Algorithms
A problem consists of a large number of instances.
An instance of the sorting problem is a sequence of numbers: 5,
57, 23, 49, 34.
The size of an instance is the input size, defined as the amount of memory needed to store the instance.
An input instance consists of n numbers with each number taking 64 bytes of memory. The input size is 64 × n.
We only need to compute the input size within a constant factor (accurate to a constant factor).
Thus, the input size in the above example is n, ignoring the constant factor 64.
Concepts in Algorithms
What is the input size of an instance consisting of two numbers each with n digits?
The input size is 2n, or just n, but not 2.
Running Time of an Algorithm
The running time of an algorithm on an instance is the number of primitive instructions executed.
An algorithm may take more time on some instances than on other instances of the same size.
The running time of an algorithm may be expressed as the function of the input size obtained by taking the average over all possible instances of the same size. However, an average case analysis is typically difficult.
Thus we often focus on finding the worst-case running time of an algorithm, the longest running time on any instance of size n, which is expressed as a function (e.g. 20n2 + 10n + 5) of the input size n.
Here the function 20n2 + 10n + 5 is an upper bound on the amount of time taken by the algorithm on any instance of input size n.
Running Time of an Algorithm
Our mathematical analysis based on the RAM model can only determine the running time of an algorithm on any real machine within a constant factor (accurate to a constant factor).
The exact amount of time depends on a particular real machine.
Thus there is no need to determine the exact amount of time taken by an algorithm on the RAM model.
Thus, we can say that the running time of the algorithm is proportional to n2, instead of using 20n2 + 10n + 5. The difference between 20n2 + 10n + 5 and n2 is a constant factor (≤ 45).
We use n2 instead of 20n2 +10n+5 as n2 is a simple function of n.
Big O Notation
This notation is used to give a simple upper bound on the running time of an algorithm.
Let the running time of an algorithm on any instance of size n be bounded from above by T(n).
Let f (n) be a simple function of n (e.g. n, n2 (n times n), n log n). We say that T(n) is O(f (n)) if there is an integer N and a
constant c such that
T(n) <= cf (n) for all n ≥ N.
Note that we use mathematical induction to show that the above inequality holds for all n ≥ N.
INSERTION-SORT(A)
0 n = A.length
1 forj=2ton //jgoesfrom2,3,...upton
2 key = A[j]
3 // Insert A[j] into the sorted sequence A[1..j-1].
4 i=j-1
5 while i>0andA[i]>key
6 A[i+1] = A[i];
7 i=i-1
8 A[i+1] = key;
Analysis of Insertion Sort
Let a constant d be an upper bound on the time (cost) of any line executed once.
The outer loop consists of three statements and a while loop with two statements in its body.
In one iteration of the outer loop, the while loop iterates at most j − 1 times.
So the cost on the while loop is bounded by 3d ×(j −1)+d.
Analysis of Insertion Sort
The worst-case ruuning time T(n) of the algorithm is bounded by the following expression:
T(n) ≤ d + nj=2[4d + (3d × (j − 1) + d)] + d = 2d + nj=2[5d] + 3d(nj=2[j − 1]
= 2d + 5d(n − 1) + 3d[1 + 2 + … + (n − 1)]
= 2d + 5d (n − 1) + 3d [ (n−1)n ]
2
≤ 2d +5dn+3dn2 ≤ 2dn2 +5dn2 +3dn2 = 10dn2 for each n ≥ 1.
So T(n) is in O(n2).
The Divide-and-Conquer Algorithm Design Technique
In a recursive algorithm, we first check if an instance is small enough. If so, we obtain the solution to it in a straightforward manner. Otherwise, we solve it by following a divide-and-conquer strategy, which consists of three steps:
Divide: An instance of the problem is divided into a number of smaller instances of the same problem.
Conquer: Each smaller instance is solved recursively by calling the algorithm on it.
Combine: The solutions to the smaller instances are combined into the solution for the original instance.
Merge Sort
We use the divide-and-conquer technique to design an algorithm (called Merge Sort) for sorting a sequence of n items.
Divide: A sequence of n items is divived into two subsequenes of n and n items, respectively.
Conquer: Each subsequene is sorted recursively.
Combine: The two sorted subsequenes are merged into one sorted
sequence.
Notethatn=n+n. Here⌈x⌉isthesmallestintegerk≥x,
22
and ⌊x⌋ is the largest integer k ≤ x.
So if n is even, then n + n = n + n = n.
If n is odd, i.e., n = 2k + 1 for some k, then
n + n = k + 1 + k + 1 = (k + 1) + k = n 2222
22
2222
MERGE-SORT
Let A[1..n] be an array of n elements.
The array is sorted by calling MERGE-SORT(A, 1, n).
MERGE-SORT(A, p, r) sorts the elements in the subarray A[p..r] by dividing A[p..r ] into A[p..q] and A[q + 1..r ], sorting each new subarray, and merging the sorted subarrays into one sorted subarray, where q = ⌊(p + r )/2⌋.
Let m = r −p+1 be the size of A[p..r]. Then the size of A[p..q] is q − p + 1 = ⌊(p + r)/2⌋ − p + 1 = ⌊(p + r − 2p)/2⌋ + 1
= ⌊(r − p)/2⌋ + 1 = ⌊(r − p + 2)/2⌋ = ⌊((r − p + 1) + 1)/2⌋ = ⌊m + 1)/2⌋ = ⌈m/2⌉
Let floor(x) denote ⌊x⌋ in the following code.
// Sorts the elements in the subarray $A[p .. r]$
MERGE-SORT(A, p, r)
1 2 3 4 5
ifp
T(1) = d,
T (n) = T ( n ) + T ( n ) + dn if n > 1. 22
Analysis of MERGE-SORT
Assumethatnisaperfectpowerof2,i.e.,n=2k forsomek≥0.
Then we have
T(n) = 2T(n/2) + dn = 2[(2T(n/22)) + d(n/2)] + dn
= [22T(n/22) + dn] + dn = 22T(n/22) + 2dn
= 2hT(n/2h) + hdn for 1 ≤ h ≤ k
= 2k T (n/2k ) + kdn = 2k T (1) + kdn = nd + kdn = dn(k + 1)
= dn(1 + log n) ≤ dn(log n + log n) = 2dn log n = cn log n, if n > 2 with c = 2d.
Analysis of MERGE-SORT
Ifnisnotaperfectpowerof2,thenwehave2k
Thus, T (n) is in O(n log n).