CS61B: 2020
Lecture 13: Introduction to Asymptotic Analysis
● Intuitive Runtime
● Detailed Analysis of Worst Case Order of Growth
● Simplified Analysis
● Big Theta Notation
● Big O Notation
datastructur.es
61B: Writing Efficient Programs
An engineer will do for a dime what any fool will do for a dollar. Efficiency comes in two flavors:
● Programming cost (course to date; will also revisit later).
  ○ How long does it take to develop your programs?
  ○ How easy is it to read, modify, and maintain your code?
    ■ More important than you might think!
    ■ Majority of cost is in maintenance, not development!
● Execution cost (from today until end of course).
  ○ How much time does your program take to execute?
  ○ How much memory does your program require?
Example of Algorithm Cost

Objective: Determine if a sorted array contains any duplicates.
● Given sorted array A, are there distinct indices i and j where A[i] = A[j]?

A = [-3, -1, 2, 4, 4, 8, 10, 12]

Silly algorithm: Consider every possible pair, returning true if any match.
● Are (-3, -1) the same? Are (-3, 2) the same? ...

Better algorithm?
● For each number A[i], look at A[i+1], and return true the first time you see a match. If you run out of items, return false.

Today's goal: Introduce a formal technique for comparing algorithmic efficiency.
Intuitive Runtime Characterizations
How Do I Runtime Characterization?
Our goal is to somehow characterize the runtimes of the functions below.
● Characterization should be simple and mathematically rigorous.
● Characterization should demonstrate superiority of dup2 over dup1.

dup1:

public static boolean dup1(int[] A) {
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}

dup2:

public static boolean dup2(int[] A) {
    for (int i = 0; i < A.length - 1; i += 1) {
        if (A[i] == A[i + 1]) {
            return true;
        }
    }
    return false;
}
Techniques for Measuring Computational Cost
Technique 1: Measure execution time in seconds using a client program.
● Tools:
  ○ Physical stopwatch.
  ○ Unix has a built-in time command that measures execution time.
  ○ Princeton Standard library has a Stopwatch class.

public static void main(String[] args) {
    int N = Integer.parseInt(args[0]);
    int[] A = makeArray(N);
    dup1(A);
}
Time Measurements for dup1 and dup2
Time to complete (in seconds):

N      | dup1 | dup2
-------|------|-----
10000  | 0.08 | 0.08
50000  | 0.32 | 0.08
100000 | 1.00 | 0.08
200000 | 8.26 | 0.1
400000 | 15.4 | 0.1
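A minimal timing client, sketched here with System.nanoTime as an alternative to the Unix `time` command or Princeton's Stopwatch (dup1/dup2 are copied from the earlier slide; makeArray is a stand-in assumption that builds a sorted, duplicate-free array, the worst case for both methods). Exact numbers will differ from the table depending on your machine.

```java
// Sketch of a timing client. makeArray builds a sorted array with no
// duplicates, so neither dup1 nor dup2 can return early.
public class TimingDemo {
    public static boolean dup1(int[] A) {
        for (int i = 0; i < A.length; i += 1) {
            for (int j = i + 1; j < A.length; j += 1) {
                if (A[i] == A[j]) { return true; }
            }
        }
        return false;
    }

    public static boolean dup2(int[] A) {
        for (int i = 0; i < A.length - 1; i += 1) {
            if (A[i] == A[i + 1]) { return true; }
        }
        return false;
    }

    public static int[] makeArray(int N) {
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) { A[i] = i; }  // sorted, no duplicates
        return A;
    }

    public static void main(String[] args) {
        int[] A = makeArray(50000);
        long start = System.nanoTime();
        dup1(A);
        System.out.printf("dup1: %.3f s%n", (System.nanoTime() - start) / 1e9);
        start = System.nanoTime();
        dup2(A);
        System.out.printf("dup2: %.3f s%n", (System.nanoTime() - start) / 1e9);
    }
}
```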
Techniques for Measuring Computational Cost
Technique 1: Measure execution time in seconds using a client program.
● Good: Easy to measure, meaning is obvious.
● Bad: May require large amounts of computation time. Result varies with machine, compiler, input data, etc.

public static void main(String[] args) {
    int N = Integer.parseInt(args[0]);
    int[] A = makeArray(N);
    dup1(A);
}
Techniques for Measuring Computational Cost
Technique 2A: Count possible operations for an array of size N = 10,000.
● Good: Machine independent. Input dependence captured in model.
● Bad: Tedious to compute. Array size was arbitrary. Doesn't tell you actual time.

for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;

operation        | count, N = 10,000
-----------------|-------------------
i = 0            | 1
j = i + 1        | 1 to 10,000
less than (<)    | 2 to 50,015,001
increment (+=1)  | 0 to 50,005,000
equals (==)      | 1 to 49,995,000
array accesses   | 2 to 99,990,000

The counts are tricky to compute. Work not shown.
Techniques for Measuring Computational Cost
Technique 2B: Count possible operations in terms of input array size N.
● Good: Machine independent. Input dependence captured in model. Tells you how the algorithm scales.
● Bad: Even more tedious to compute. Doesn't tell you actual time.

operation        | symbolic count
-----------------|----------------------
i = 0            | 1
j = i + 1        | 1 to N
less than (<)    | 2 to (N² + 3N + 2)/2
increment (+=1)  | 0 to (N² + N)/2
equals (==)      | 1 to (N² - N)/2
array accesses   | 2 to N² - N

In other words, if we plotted total runtime vs. N, what shape would we expect?
Intuitive Order of Growth Identification
Consider an algorithm with the operation counts below. What do you expect will be the order of growth of its runtime?

Answer: N³ [cubic]

operation        | count
-----------------|-------------
less than (<)    | 100N² + 3N
greater than (>) | 2N³ + 1
and (&&)         | 5,000

Argument:
● Suppose < takes α nanoseconds, > takes β nanoseconds, and && takes γ nanoseconds.
● Total time is α(100N² + 3N) + β(2N³ + 1) + 5000γ nanoseconds.
● For very large N, the 2βN³ term is much larger than the others.

Extremely important point. Make sure you understand it!
Intuitive Simplification 2: Restrict Attention to One Operation
Simplification 2: Pick some representative operation to act as a proxy for the overall runtime. We call our choice the "cost model".
● Good choice: increment.
● Bad choice: assignment of j = i + 1.
There are other good choices.

for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;

operation        | worst case count
-----------------|------------------
i = 0            | 1
j = i + 1        | N
less than (<)    | (N² + 3N + 2)/2
increment (+=1)  | (N² + N)/2
equals (==)      | (N² - N)/2
array accesses   | N² - N

cost model = increment
Intuitive Simplification 3: Eliminate low order terms.
Simplification 3: Ignore lower order terms.

for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;

operation        | worst case count
-----------------|------------------
increment (+=1)  | (N² + N)/2
Intuitive Simplification 4: Eliminate multiplicative constants.
Simplification 4: Ignore multiplicative constants.
● Why? It has no real meaning. We already threw away information when we chose a single proxy operation.

for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;

operation        | worst case count
-----------------|------------------
increment (+=1)  | N²/2
Simplification Summary
Simplifications:
1. Only consider the worst case.
2. Pick a representative operation (a.k.a. the cost model).
3. Ignore lower order terms.
4. Ignore multiplicative constants.

The last three simplifications are OK because we only care about the "order of growth" of the runtime.

operation        | count
-----------------|----------------------
i = 0            | 1
j = i + 1        | 1 to N
less than (<)    | 2 to (N² + 3N + 2)/2
increment (+=1)  | 0 to (N² + N)/2
equals (==)      | 1 to (N² - N)/2
array accesses   | 2 to N² - N

operation        | worst case o.o.g.
-----------------|-------------------
increment (+=1)  | N²

Worst case order of growth of runtime: N²
Repeating the Process for dup2

Simplifications:
1. Only consider the worst case. (This simplification is OK because we specifically only care about the worst case.)
2. Pick a representative operation (a.k.a. the cost model).
3. Ignore lower order terms.
4. Ignore multiplicative constants.
(The last three simplifications are OK because we only care about the "order of growth" of the runtime.)

operation        | count
-----------------|-------------
i = 0            | 1
less than (<)    | 0 to N
increment (+=1)  | 0 to N - 1
equals (==)      | 1 to N - 1
array accesses   | 2 to 2N - 2

operation        | worst case o.o.g.
-----------------|-------------------
array accesses   | N

Worst case order of growth of runtime: N
Any of the bottom four operations are good choices.
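As a sanity check on the table above, here is a small sketch (not part of the course code) that instruments dup2 to count array accesses. On a duplicate-free array of length N, the loop body runs N - 1 times and each iteration touches the array twice, so the count should be 2N - 2, matching the worst case row.

```java
// Sketch: dup2 with array accesses counted explicitly.
public class Dup2Count {
    public static long countAccesses(int[] A) {
        long accesses = 0;
        for (int i = 0; i < A.length - 1; i += 1) {
            accesses += 2;                          // reads A[i] and A[i + 1]
            if (A[i] == A[i + 1]) { return accesses; }
        }
        return accesses;
    }

    public static void main(String[] args) {
        int[] worstCase = {-3, -1, 2, 4, 8, 10, 12};             // N = 7, no duplicates
        System.out.println(countAccesses(worstCase));            // 2*7 - 2 = 12
    }
}
```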
Summary of Our (Painful) Analysis Process
Our process:
● Construct a table of exact counts of all possible operations.
● Convert the table into a worst case order of growth using the 4 simplifications.

operation        | count
-----------------|----------------------
i = 0            | 1
j = i + 1        | 1 to N
less than (<)    | 2 to (N² + 3N + 2)/2
increment (+=1)  | 0 to (N² + N)/2
equals (==)      | 1 to (N² - N)/2
array accesses   | 2 to N² - N

operation        | worst case o.o.g.
-----------------|-------------------
increment (+=1)  | N²

Worst case order of growth of runtime: N²

By using our simplifications from the outset, we can avoid building the table at all!
Simplified Analysis
Simplified Analysis Process
Rather than building the entire table, we can instead:
● Choose a representative operation to count (a.k.a. cost model).
● Figure out the order of growth for the count of the representative operation by either:
  ○ Making an exact count, then discarding the unnecessary pieces.
  ○ Using intuition and inspection to determine order of growth (only possible with lots of practice).

Let's redo our analysis of dup1 with this new process.
● This time, we'll show all our work.
Analysis of Nested For Loops (Based on Exact Count)
Find the order of growth of the worst case runtime of dup1 (N = 6 example below).

int N = A.length;
for (int i = 0; i < N; i += 1)
    for (int j = i + 1; j < N; j += 1)
        if (A[i] == A[j])
            return true;
return false;

[Figure: for N = 6, a grid of (i, j) pairs with one == mark for each pair where j > i, forming a right triangle of 15 comparisons; i and j each run from 0 to 5.]

Worst case number of == operations:
    C = 1 + 2 + 3 + ... + (N - 3) + (N - 2) + (N - 1)
    C = (N - 1) + (N - 2) + (N - 3) + ... + 3 + 2 + 1
Adding the two lines term by term:
    2C = N + N + ... + N = N(N - 1)        (N - 1 copies of N)
    ∴ C = N(N - 1)/2
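The closed form above can be checked empirically with a small sketch (not part of the lecture code) that counts the == comparisons dup1 actually performs. A sorted array with no duplicates forces every comparison to execute, so the count should equal N(N - 1)/2.

```java
// Sketch: dup1 instrumented to count == comparisons in the worst case.
public class CountDemo {
    public static long countEquals(int[] A) {
        long comparisons = 0;
        int N = A.length;
        for (int i = 0; i < N; i += 1) {
            for (int j = i + 1; j < N; j += 1) {
                comparisons += 1;                    // one == comparison
                if (A[i] == A[j]) { return comparisons; }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int N : new int[]{6, 100, 1000}) {
            int[] A = new int[N];
            for (int i = 0; i < N; i += 1) { A[i] = i; }   // no duplicates
            // Compare the measured count against the closed form N(N - 1)/2.
            System.out.println(countEquals(A) == (long) N * (N - 1) / 2);  // true
        }
    }
}
```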
Analysis of Nested For Loops (Based on Exact Count)
Find the order of growth of the worst case runtime of dup1.

Worst case number of == operations:
    C = 1 + 2 + 3 + ... + (N - 3) + (N - 2) + (N - 1) = N(N - 1)/2

operation | worst case o.o.g.
----------|-------------------
==        | N²

Worst case order of growth of runtime: N²
Analysis of Nested For Loops (Simpler Geometric Argument)
Find the order of growth of the worst case runtime of dup1.

Worst case number of == operations:
● Given by the area of a right triangle of side length N - 1.
● Order of growth of area is N².

operation | worst case o.o.g.
----------|-------------------
==        | N²

Worst case order of growth of runtime: N²
Big-Theta
Formalizing Order of Growth
Given a function Q(N), we can apply our last two simplifications (ignore lower order terms and multiplicative constants) to yield the order of growth of Q(N).
● Example: Q(N) = 3N³ + N²
● Order of growth: N³

Let's finish out this lecture by moving to a more formal notation called Big-Theta.
● The math might seem daunting at first.
● ... but the idea is exactly the same! Using "Big-Theta" instead of "order of growth" does not change the way we analyze code at all.
Order of Growth Exercise
Consider the functions below.
● Informally, what is the "shape" of each function for very large N?
● In other words, what is the order of growth of each function?

function        | order of growth
----------------|----------------
N³ + 3N⁴        | N⁴
1/N + N³        | N³
1/N + 5         | 1
N·e^N + N       | N·e^N
40 sin(N) + 4N² | N²
Big-Theta
Suppose we have a function R(N) with order of growth f(N).
● In "Big-Theta" notation we write this as R(N) ∈ Θ(f(N)).
● Examples:
  ○ N³ + 3N⁴ ∈ Θ(N⁴)
  ○ 1/N + N³ ∈ Θ(N³)
  ○ 1/N + 5 ∈ Θ(1)
  ○ N·e^N + N ∈ Θ(N·e^N)
  ○ 40 sin(N) + 4N² ∈ Θ(N²)
Big-Theta: Formal Definition (Visualization)
R(N) ∈ Θ(f(N)) means there exist positive constants k₁ and k₂ such that:
    k₁ · f(N) ≤ R(N) ≤ k₂ · f(N)
for all values of N greater than some N₀ (i.e. very large N).

Example: 40 sin(N) + 4N² ∈ Θ(N²)
● R(N) = 40 sin(N) + 4N²
● f(N) = N²
● k₁ = 3
● k₂ = 5
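A numeric spot-check of this example can be sketched in a few lines (a check, not a proof): since |40 sin(N)| ≤ 40, the bound 3N² ≤ 40 sin(N) + 4N² ≤ 5N² holds once N² ≥ 40, i.e. for all N ≥ 7.

```java
// Sketch: verify k1*f(N) <= R(N) <= k2*f(N) numerically for
// R(N) = 40 sin(N) + 4N^2, f(N) = N^2, k1 = 3, k2 = 5, N0 = 7.
public class ThetaCheck {
    public static boolean bounded(int N) {
        double R = 40 * Math.sin(N) + 4.0 * N * N;
        double f = (double) N * N;
        return 3 * f <= R && R <= 5 * f;
    }

    public static void main(String[] args) {
        boolean allHold = true;
        for (int N = 7; N <= 100000; N += 1) {
            allHold = allHold && bounded(N);
        }
        System.out.println(allHold);  // true
    }
}
```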
Big-Theta Challenge (Visualization)

Suppose R(N) = (4N² + 3N·ln(N))/2.
● Find a simple f(N) and corresponding k₁ and k₂.

Recall: R(N) ∈ Θ(f(N)) means there exist positive constants k₁ and k₂ such that k₁ · f(N) ≤ R(N) ≤ k₂ · f(N) for all values of N greater than some N₀ (i.e. very large N).

Solution:
● f(N) = N²
● k₁ = 1
● k₂ = 3
Big-Theta and Runtime Analysis
Using Big-Theta doesn't change anything about runtime analysis (no need to find k₁ or k₂ or anything like that).
● The only difference is that we use the Θ symbol anywhere we would have said "order of growth".

operation        | worst case count
-----------------|------------------
i = 0            | 1
j = i + 1        | Θ(N)
less than (<)    | Θ(N²)
increment (+=1)  | Θ(N²)
equals (==)      | Θ(N²)
array accesses   | Θ(N²)

operation        | worst case count
-----------------|------------------
increment (+=1)  | Θ(N²)

Worst case runtime: Θ(N²)
Big O Notation
Big Theta
We used Big Theta to describe the order of growth of a function.

function R(N)   | order of growth
----------------|----------------
N³ + 3N⁴        | Θ(N⁴)
1/N + N³        | Θ(N³)
1/N + 5         | Θ(1)
N·e^N + N       | Θ(N·e^N)
40 sin(N) + 4N² | Θ(N²)

We also used Big Theta to describe the rate of growth of the runtime of a piece of code.
Big O
Whereas Big Theta can informally be thought of as something like "equals", Big O can be thought of as "less than or equal". For example, the following are all true:
● N³ + 3N⁴ ∈ Θ(N⁴)
● N³ + 3N⁴ ∈ O(N⁴)
● N³ + 3N⁴ ∈ O(N⁶)
● N³ + 3N⁴ ∈ O(N!)
● N³ + 3N⁴ ∈ O(N^(N!))
Big O: Formal Definition (Visualization)
R(N) ∈ O(f(N)) means there exists a positive constant k₂ such that:
    R(N) ≤ k₂ · f(N)
for all values of N greater than some N₀ (i.e. very large N).

Example: 40 sin(N) + 4N² ∈ O(N⁴)
● R(N) = 40 sin(N) + 4N²
● f(N) = N⁴
● k₂ = 1
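This one-sided bound can also be spot-checked numerically (a sketch, not a proof): since 4N² + 40 ≤ N⁴ for all N ≥ 3, the inequality R(N) ≤ 1·N⁴ holds from N₀ = 3 onward. Note there is no lower-bound check here, which is exactly what distinguishes O from Θ.

```java
// Sketch: verify R(N) <= k2*f(N) numerically for R(N) = 40 sin(N) + 4N^2,
// f(N) = N^4, k2 = 1, N0 = 3. Only the upper bound is checked (Big O).
public class BigOCheck {
    public static boolean bounded(int N) {
        double R = 40 * Math.sin(N) + 4.0 * N * N;
        return R <= Math.pow(N, 4);
    }

    public static void main(String[] args) {
        boolean allHold = true;
        for (int N = 3; N <= 1000; N += 1) {
            allHold = allHold && bounded(N);
        }
        System.out.println(allHold);  // true
    }
}
```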
Big Theta vs. Big O
Family            | Informal meaning                               | Example | Family Members
------------------|------------------------------------------------|---------|------------------------
Big Theta Θ(f(N)) | Order of growth is f(N).                       | Θ(N²)   | N²/2, 2N², N² + 38N + N
Big O O(f(N))     | Order of growth is less than or equal to f(N). | O(N²)   | N²/2, 2N², lg(N)

We will see why Big O is practically useful in the next lecture.
Summary
Given a code snippet, we can express its runtime as a function R(N), where N is some property of the input of the function (often the size of the input).

Rather than finding R(N) exactly, we instead usually only care about the order of growth of R(N). One approach (not universal):
● Choose a representative operation, and let C(N) be the count of how many times that operation occurs as a function of N.
● Determine the order of growth f(N) for C(N), i.e. C(N) ∈ Θ(f(N)).
  ○ Often (but not always) we consider the worst case count.
● If the operation takes constant time, then R(N) ∈ Θ(f(N)).
● Can use O as an alternative for Θ. O is used for upper bounds.