
CS61B: 2020
Lecture 13: Introduction to Asymptotic Analysis
● Intuitive Runtime
● Detailed Analysis of Worst Case Order of Growth
● Simplified Analysis
● Big Theta Notation
● Big O Notation
datastructur.es

61B: Writing Efficient Programs
An engineer will do for a dime what any fool will do for a dollar. Efficiency comes in two flavors:
● Programming cost (course to date; will also revisit later).
  ○ How long does it take to develop your programs?
  ○ How easy is it to read, modify, and maintain your code?
    ■ More important than you might think!
    ■ Majority of cost is in maintenance, not development!
● Execution cost (from today until the end of the course).
  ○ How much time does your program take to execute?
  ○ How much memory does your program require?

Example of Algorithm Cost
Objective: Determine if a sorted array contains any duplicates.
● Given sorted array A, are there indices i and j (with i ≠ j) where A[i] == A[j]?

-3  -1  2  4  4  8  10  12

Silly algorithm: Consider every possible pair, returning true if any match.
● Are (-3, -1) the same? Are (-3, 2) the same? …
Better algorithm?

Today's goal: Introduce a formal technique for comparing algorithmic efficiency.
● For each number A[i], look at A[i+1], and return true the first time you see a match. If you run out of items, return false.
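The adjacent-comparison scan described above is exactly the dup2 method the rest of the lecture analyzes; as a self-contained Java sketch (the wrapper class name is ours, not from the lecture):

```java
public class DupCheck {
    /** Returns true iff sorted array A contains any duplicates.
     *  Because A is sorted, equal elements must be adjacent,
     *  so comparing each element to its successor suffices. */
    public static boolean dup2(int[] A) {
        for (int i = 0; i < A.length - 1; i += 1) {
            if (A[i] == A[i + 1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The example array from the slide: 4 appears twice.
        System.out.println(dup2(new int[]{-3, -1, 2, 4, 4, 8, 10, 12})); // true
        System.out.println(dup2(new int[]{-3, -1, 2, 4, 8}));            // false
    }
}
```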

Intuitive Runtime Characterizations

How Do I Runtime Characterization?
Our goal is to somehow characterize the runtimes of the functions below.
● Characterization should be simple and mathematically rigorous.
● Characterization should demonstrate the superiority of dup2 over dup1.

public static boolean dup1(int[] A) {
  for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
      if (A[i] == A[j]) {
        return true;
      }
    }
  }
  return false;
}

public static boolean dup2(int[] A) {
  for (int i = 0; i < A.length - 1; i += 1) {
    if (A[i] == A[i + 1]) {
      return true;
    }
  }
  return false;
}

Techniques for Measuring Computational Cost
Technique 1: Measure execution time in seconds using a client program.
● Tools:
  ○ Physical stopwatch.
  ○ Unix has a built-in time command that measures execution time.
  ○ The Princeton Standard library has a Stopwatch class.

public static void main(String[] args) {
  int N = Integer.parseInt(args[0]);
  int[] A = makeArray(N);
  dup1(A);
}

Time Measurements for dup1 and dup2
Time to complete (in seconds):

N        dup1   dup2
10000    0.08   0.08
50000    0.32   0.08
100000   1.00   0.08
200000   8.26   0.1
400000   15.4   0.1

Techniques for Measuring Computational Cost
Technique 1: Measure execution time in seconds using a client program.
● Good: Easy to measure, meaning is obvious.
● Bad: May require large amounts of computation time. Result varies with machine, compiler, input data, etc.

Techniques for Measuring Computational Cost
Technique 2A: Count possible operations for an array of size N = 10,000.
● Good: Machine independent. Input dependence captured in model.
● Bad: Tedious to compute. Array size was arbitrary. Doesn't tell you actual time.

for (int i = 0; i < A.length; i += 1) {
  for (int j = i + 1; j < A.length; j += 1) {
    if (A[i] == A[j]) { return true; }
  }
}
return false;

operation          count, N = 10000
i = 0              1
j = i + 1          1 to 10000
less than (<)      2 to 50,015,001
increment (+= 1)   0 to 50,005,000
equals (==)        1 to 49,995,000
array accesses     2 to 99,990,000

The counts are tricky to compute. Work not shown.
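These counts can be spot-checked empirically. Below is a minimal sketch (the wrapper class and method names are ours, not from the lecture) that instruments dup1's two += 1 operations with a counter; on a duplicate-free array the worst-case total is (N² + N)/2, which is 50,005,000 for N = 10,000, matching the table.

```java
public class IncrementCounter {
    /** Counts how many += 1 operations dup1 performs on array A. */
    static long countIncrements(int[] A) {
        long increments = 0;
        for (int i = 0; i < A.length; i += 1) {
            increments += 1;              // one i += 1 per outer iteration
            for (int j = i + 1; j < A.length; j += 1) {
                increments += 1;          // one j += 1 per inner iteration
                if (A[i] == A[j]) {
                    return increments;    // early exit on a duplicate
                }
            }
        }
        return increments;
    }

    public static void main(String[] args) {
        int N = 10000;
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = i;                     // sorted, no duplicates: worst case
        }
        // Worst case: N increments of i plus N(N - 1)/2 increments of j.
        System.out.println(countIncrements(A) == ((long) N * N + N) / 2); // prints true
    }
}
```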
Techniques for Measuring Computational Cost
Technique 2B: Count possible operations in terms of input array size N.
● Good: Machine independent. Input dependence captured in model. Tells you how the algorithm scales.
● Bad: Even more tedious to compute. Doesn't tell you actual time.

for (int i = 0; i < A.length; i += 1) {
  for (int j = i + 1; j < A.length; j += 1) {
    if (A[i] == A[j]) { return true; }
  }
}
return false;

operation          symbolic count
i = 0              1
j = i + 1          1 to N
less than (<)      2 to (N² + 3N + 2)/2
increment (+= 1)   0 to (N² + N)/2
equals (==)        1 to (N² - N)/2
array accesses     2 to N² - N

Intuitive Order of Growth Identification
Consider an algorithm with the operation counts below. What do you expect will be the order of growth of the runtime?

operation          count
less than (<)      100N² + 3N
greater than (>)   2N³ + 1
and (&&)           5,000

In other words, if we plotted total runtime vs. N, what shape would we expect?

Intuitive Order of Growth Identification
Consider the algorithm below. What do you expect will be the order of growth of the runtime for the algorithm?
A. N³ [cubic]

operation          count
less than (<)      100N² + 3N
greater than (>)   2N³ + 1
and (&&)           5,000

Argument:
● Suppose < takes α nanoseconds, > takes β nanoseconds, and && takes γ nanoseconds.
● Total time is α(100N² + 3N) + β(2N³ + 1) + 5000γ nanoseconds.
● For very large N, the 2βN³ term is much larger than the others.
Extremely important point. Make sure you understand it!
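To make the domination concrete, here is a small numeric sketch (it assumes α = β = γ = 1 nanosecond purely for illustration; the lecture makes no claim about the actual constants):

```java
public class GrowthDemo {
    public static void main(String[] args) {
        // Total time with alpha = beta = gamma = 1:
        // (100N^2 + 3N) + (2N^3 + 1) + 5000 nanoseconds.
        for (long N : new long[]{10, 100, 1000, 10000}) {
            long quadratic = 100 * N * N + 3 * N; // from the < operations
            long cubic = 2 * N * N * N + 1;       // from the > operations
            long constant = 5000;                 // from the && operations
            double share = (double) cubic / (quadratic + cubic + constant);
            System.out.printf("N = %5d: the 2N^3 term is %.1f%% of the total%n",
                    N, 100 * share);
        }
    }
}
```

At N = 10 the quadratic term still wins, but by N = 10,000 the cubic term accounts for nearly all of the total, which is why the order of growth is N³.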

Intuitive Simplification 2: Restrict Attention to One Operation
Simplification 2: Pick some representative operation to act as a proxy for the overall runtime.
We call our choice the "cost model".
● Good choice: increment.
● Bad choice: assignment of j = i + 1.
There are other good choices.
operation          worst case count
i = 0              1
j = i + 1          N
less than (<)      (N² + 3N + 2)/2
increment (+= 1)   (N² + N)/2
equals (==)        (N² - N)/2
array accesses     N² - N

for (int i = 0; i < A.length; i += 1) {
  for (int j = i + 1; j < A.length; j += 1) {
    if (A[i] == A[j]) { return true; }
  }
}
return false;

cost model = increment

Intuitive Simplification 3: Eliminate Low Order Terms
Simplification 3: Ignore lower order terms.

operation          worst case count
increment (+= 1)   (N² + N)/2

Intuitive Simplification 4: Eliminate Multiplicative Constants
Simplification 4: Ignore multiplicative constants.
● Why? It has no real meaning. We already threw away information when we chose a single proxy operation.

operation          worst case count
increment (+= 1)   N²/2

Simplification Summary
Simplifications:
1. Only consider the worst case.
2. Pick a representative operation (a.k.a. the cost model).
3. Ignore lower order terms.
4. Ignore multiplicative constants.
Simplifications 2 through 4 are OK because we only care about the "order of growth" of the runtime.

operation          count
i = 0              1
j = i + 1          1 to N
less than (<)      2 to (N² + 3N + 2)/2
increment (+= 1)   0 to (N² + N)/2
equals (==)        1 to (N² - N)/2
array accesses     2 to N² - N

operation          worst case o.o.g.
increment (+= 1)   N²

Worst case order of growth of runtime: N²
Repeating the Process for dup2
Simplifications:
1. Only consider the worst case. (This simplification is OK because we specifically only care about the worst case.)
2. Pick a representative operation (a.k.a. the cost model).
3. Ignore lower order terms.
4. Ignore multiplicative constants.
(Simplifications 2 through 4 are OK because we only care about the "order of growth" of the runtime.)

operation          count          worst case o.o.g.
i = 0              1
less than (<)      0 to N
increment (+= 1)   0 to N - 1
equals (==)        1 to N - 1
array accesses     2 to 2N - 2    N

Worst case order of growth of runtime: N
Any of the bottom four operations are good choices.

Summary of Our (Painful) Analysis Process
Our process:
● Construct a table of exact counts of all possible operations.
● Convert the table into a worst case order of growth using the 4 simplifications.

operation          count
i = 0              1
j = i + 1          1 to N
less than (<)      2 to (N² + 3N + 2)/2
increment (+= 1)   0 to (N² + N)/2
equals (==)        1 to (N² - N)/2
array accesses     2 to N² - N

operation          worst case o.o.g.
increment (+= 1)   N²

Worst case order of growth of runtime: N²
By using our simplifications from the outset, we can avoid building the table at all!

Simplified Analysis
Simplified Analysis Process
Rather than building the entire table, we can instead:
● Choose a representative operation to count (a.k.a. cost model).
● Figure out the order of growth for the count of the representative operation by either:
  ○ Making an exact count, then discarding the unnecessary pieces.
  ○ Using intuition and inspection to determine order of growth (only possible with lots of practice).
Let's redo our analysis of dup1 with this new process.
● This time, we'll show all our work.

Analysis of Nested For Loops (Based on Exact Count)
Find the order of growth of the worst case runtime of dup1.
N = 6
int N = A.length;
for (int i = 0; i < N; i += 1)
  for (int j = i + 1; j < N; j += 1)
    if (A[i] == A[j])
      return true;
return false;

[Figure: a grid of i vs. j in which the == comparisons form a triangle: row i = 0 has 5 marks, row i = 1 has 4, ..., row i = 4 has 1.]

Worst case number of == operations:
 C = 1 + 2 + 3 + ... + (N - 3) + (N - 2) + (N - 1)
 C = (N - 1) + (N - 2) + (N - 3) + ... + 3 + 2 + 1
2C = N + N + ... + N        (N - 1 copies of N)
   = N(N - 1)
∴ C = N(N - 1)/2

operation   worst case o.o.g.
==          N²

Worst case order of growth of runtime: N²

Analysis of Nested For Loops (Simpler Geometric Argument)
Find the order of growth of the worst case runtime of dup1.
Worst case number of == operations:
● Given by the area of a right triangle of side length N - 1.
● The order of growth of that area is N².

operation   worst case o.o.g.
==          N²

Worst case order of growth of runtime: N²

Big-Theta
Formalizing Order of Growth
Given a function Q(N), we can apply our last two simplifications (ignore low order terms and multiplicative constants) to yield the order of growth of Q(N).
● Example: Q(N) = 3N³ + N²
● Order of growth: N³
Let's finish out this lecture by moving to a more formal notation called Big-Theta.
● The math might seem daunting at first...
● ...but the idea is exactly the same!
Using "Big-Theta" instead of "order of growth" does not change the way we analyze code at all.

Order of Growth Exercise
Consider the functions below.
● Informally, what is the "shape" of each function for very large N?
● In other words, what is the order of growth of each function?

function            order of growth
N³ + 3N⁴            N⁴
1/N + N³            N³
1/N + 5             1
Ne^N + N            Ne^N
40 sin(N) + 4N²     N²

Big-Theta
Suppose we have a function R(N) with order of growth f(N).
● In "Big-Theta" notation we write this as R(N) ∈ Θ(f(N)).
● Examples:
  ○ N³ + 3N⁴ ∈ Θ(N⁴)
  ○ 1/N + N³ ∈ Θ(N³)
  ○ 1/N + 5 ∈ Θ(1)
  ○ Ne^N + N ∈ Θ(Ne^N)
  ○ 40 sin(N) + 4N² ∈ Θ(N²)

Big-Theta: Formal Definition
R(N) ∈ Θ(f(N)) means there exist positive constants k₁ and k₂ such that:
  k₁ · f(N) ≤ R(N) ≤ k₂ · f(N)
for all values of N greater than some N₀ (i.e. very large N).
Example: 40 sin(N) + 4N² ∈ Θ(N²)
● R(N) = 40 sin(N) + 4N²
● f(N) = N²
● k₁ = 3
● k₂ = 5

Big-Theta Challenge
Suppose R(N) = (4N² + 3N·ln(N))/2.
● Find a simple f(N) and corresponding k₁ and k₂.
Solution:
● f(N) = N²
● k₁ = 1
● k₂ = 3
● Sanity check: R(N) = 2N² + 1.5N·ln(N), and for large N we have N² ≤ 2N² + 1.5N·ln(N) ≤ 3N², since 1.5N·ln(N) ≤ N² once ln(N) ≤ (2/3)N.

Big-Theta and Runtime Analysis
Using Big-Theta doesn't change anything about runtime analysis (no need to find k₁ or k₂ or anything like that).
● The only difference is that we use the Θ symbol anywhere we would have said "order of growth".
operation          worst case count
i = 0              1
j = i + 1          Θ(N)
less than (<)      Θ(N²)
increment (+= 1)   Θ(N²)
equals (==)        Θ(N²)
array accesses     Θ(N²)

cost model:
operation          worst case count
increment (+= 1)   Θ(N²)

Worst case runtime: Θ(N²)

Big O Notation
Big Theta
We used Big Theta to describe the order of growth of a function.

function R(N)       order of growth
N³ + 3N⁴            Θ(N⁴)
1/N + N³            Θ(N³)
1/N + 5             Θ(1)
Ne^N + N            Θ(Ne^N)
40 sin(N) + 4N²     Θ(N²)

We also used Big Theta to describe the rate of growth of the runtime of a piece of code.

Big O
Whereas Big Theta can informally be thought of as something like "equals", Big O can be thought of as "less than or equal". For example, the following are all true:
● N³ + 3N⁴ ∈ Θ(N⁴)
● N³ + 3N⁴ ∈ O(N⁴)
● N³ + 3N⁴ ∈ O(N⁶)
● N³ + 3N⁴ ∈ O(N!)
● N³ + 3N⁴ ∈ O(N^(N!))

Big O: Formal Definition
R(N) ∈ O(f(N)) means there exists a positive constant k₂ such that:
  R(N) ≤ k₂ · f(N)
for all values of N greater than some N₀ (i.e. very large N).
Example: 40 sin(N) + 4N² ∈ O(N⁴)
● R(N) = 40 sin(N) + 4N²
● f(N) = N⁴
● k₂ = 1

Big Theta vs. Big O

Family      Informal meaning                               Example family members
Θ(f(N))     Order of growth is f(N).                       Θ(N²): N²/2, 2N², N² + 38N + N
O(f(N))     Order of growth is less than or equal to f(N). O(N²): N²/2, 2N², lg(N)

We will see why big O is practically useful in the next lecture.

Summary
Given a code snippet, we can express its runtime as a function R(N), where N is some property of the input of the function (often the size of the input).
Rather than finding R(N) exactly, we instead usually only care about the order of growth of R(N).
One approach (not universal):
● Choose a representative operation, and let C(N) be the count of how many times that operation occurs as a function of N.
● Determine the order of growth f(N) for C(N), i.e. C(N) ∈ Θ(f(N)).
  ○ Often (but not always) we consider the worst case count.
● If the operation takes constant time, then R(N) ∈ Θ(f(N)).
● Can use O as an alternative for Θ. O is used for upper bounds.

Citations
TSP problem solution, title slide: http://support.sas.com/documentation/cdl/en/ornoaug/65289/HTML/default/viewer.htm#ornoaug_optnet_examples07.htm#ornoaug.optnet.map002g
Table of runtimes for various orders of growth: Kleinberg & Tardos, Algorithm Design.