CISC 108: Introduction to Computer Science I, Fall 2016 Lab 10
Lab 10: Empirical Computer Science Goals
• Algorithmic runtimes in practice.
Instructions:
Using Sakai, turn in a single file lab10.rkt containing all code and documentation for this assignment. This is an individual lab. Do not work in pairs. However, you should find a parter for Project 2. You should stick with this partner for the rest of the semester. With your partner, you should decide WHICH project 2 you will attempt this week.
Problem 0: Background
Question 0.a: Who are you working with on Project 2 and for the remaining labs (starting Lab 11)? If you
and your partner don’t agree, then the pair will not happen.
The function time records the CPU time, wall clock time, and “Garbage Collection” (“GC”; reclaiming unused physical memory space) time. This time function is (in one way) quite different from any other we have seen so far: the result of the function time is the result of the function being timed. However, time also prints out the CPU, wall clock, and GC times to the interactions window. This printout is a side effect, something we have not heretofore seen outside of functions that write files.
The time function is used by calling (time (fn arg1 arg2 …))
and time will produce whatever value fn produced. However, for this entire assignment, you will be better off doing:
(time (boolean? (fn arg1 arg2 …))
Question 0.b: Why are you better off writing (time (boolean? (fn arg1 arg2 …)) instead of (time (fn arg1 arg2 …))? [Hint: consider that your function fn is sorting 1,000,000 numbers, or computing a 200-digit number to a 100-digit power…]
Problem 1: Sorting
Let’s start with our various sorting algorithms.
a. Write a function that takes as its parameter a natural number n, and a maximum value max, and produces a list of random numbers of length n with values between 0 and max-1. Use random, and the builtin function build-list.
b. Use your function from 1a to compare the structural isort (Insertion Sort) and quicksort that we wrote in class (do not use the built-in versions) and the builtin sort function with respect to CPU utilization. Make sure that you do MULTIPLE RUNS (at least 5) and average the results (or allow your graphing program to produce error bars)! Also, remember that sorting a few thousand elements is quite fast, so you will need to experiment with problem sizes that give useful results (for example, I need about 10,000 elements to get quicksort to take a few ms on my 3-year-old computer, although isort takes almost a minute on the same data).
Page 1 of 4
CISC 108: Introduction to Computer Science I, Fall 2016 Lab 10
Report your results scientifically using a graphing program. Here are some general points to keep in mind for graphs:
• Use a tool that generates the graphs automatically. Any of the spreadsheet tools, for example, will automatically generate a graph (or “chart”) from data in the spreadsheet, and will update the graph when the data changes.
• The entire graph and the x- and y-axes must have appropriate labels, e.g., “Time to execute quicksort on random list”, “time (milliseconds)”, and “length of list”.
• The ranges of the axes should be chosen to be close to (perhaps slightly larger than) the range of values being plotted.
• Graphs should be sufficiently large to show detail.
• Both axes must have tick marks and labels that show the values, e.g., “1000”, “2000”, etc.
• Sometimes you might want to use logarithmic scale for the y-axis. This is effective when
the y-value grows exponentially with x.
• Many modern tools have a “trend” or “best fit” tool which will find the best function from
a specified family that fits your data. For example, you can select “linear” and the tool will find the best line, or “polynomial (degree 2)” and the tool will find the best quadratic function. They will plot this best-fit function along with your data. This is a very effective way to see if your data fits some expected pattern.
Include these graphs in your report (3 graphs for 3 algorithms: isort, quicksort, sort)
c. Answer the following questions in your report. Your answers should refer to your data and graphs.
•
•
•
For insertion sort, do your results fit the O(n2) characterization? Explain, referring to your graphs and data.
For quicksort: do your results fit the O(n log n) characterization? Explain.
Based on your data, which algorithm do you think is used to implement sort?
These problems will be graded by the quality of the data, its presentation, and the arguments in your answers to the question above.
Page 2 of 4
CISC 108: Introduction to Computer Science I, Fall 2016 Lab 10
Problem 2: Exponentiation
Consider the simple computation of exponents, raising a base b to a natural number exponent n. First, we need to consider numerical loops. As long as we are just counting, looping over the natural numbers (the counting numbers) is quite similar to looping over the items in a list:
; A NatNum (a Natural Number) is one of: ;-0
; – (add1 NatNum)
You can think of add1 as a kind of constructor like cons, and so we also have a selector, sub1 that acts like rest. So the template looks like:
(define (n-fun n)
(cond [(zero? n) …]
[(positive? n)
… n …
… (n-fun (sub1 n)) …]))
You can either write add1 and sub1, which are actually defined, or (+ n 1) and (- n 1). For example, here is a function that consumes a string, and creates a list of n copies of the string (basically, this is the body of the builtin function build-list):
(check-expect (duplicate-string “hi” 0) empty)
(check-expect (duplicate-string “hi” 3) (list “hi” “hi” “hi”))
; duplicate-string: string number -> [listof string]
(define (duplicate-string astr n)
(cond [(zero? n) empty]
[(positive? n)
(cons astr (duplicate-string astr (sub1 n)))]))
a. Write slow-expt, which computes b to the nth power using the natural number recursive template on
n. Recall that b to the 0 power is always 1. Do NOT use the builtin BSL function expt.
Observe that this template leaves the multiplication step kind of “hanging out” until the recursive call is finished (like “insert” in our isort example, or “+ 1” in mylength). In your Definitions buffer, directly after your definition of slow-expt write a simple function call/application of the function, like (slow-expt 2 5).
Then use the Stepper to watch the computation proceed. Look at those (* 2 (* 2 (* 2 … delayed operations stack up!! The memory shape of this computation is called a linear recursion. The bigger the exponent n, the more of those delayed (* 2 …) operations will stack up.
In general, computer scientists are interested in two very important issues regarding the relative efficiency of the programs that they write:
1. How much time will it take?
2. How much physical computer memory (“space”) will it take?
The time that slow-expt will take is clearly related to the argument n, the bigger n, the more multiplications, and the more time (even if each multiplication takes less than a microsecond). In fact, here the relationship is linear: if I double n, it will take about double the time.
Similarly, the amount of space that the computation will take is related to how many of those delayed (* 2 …) that the computer must remember while it is waiting to finish the recursion. You’ll hypothesize, correctly, that this is also linear: double the n, double the (* 2 …) that are remembered in the stepper.
Maybe 80% of what you need to do to solve real problems with computers can be handled with templates based on the data you are processing. Of that 80%, a surprisingly small amount of program code really needs to be efficient in space or time (typical programs, including many you probably use,
Page 3 of 4
CISC 108: Introduction to Computer Science I, Fall 2016 Lab 10
are simply incorrect and crash/return the wrong answers regardless of how much time or space is used 🙂 🙂 :-).
However, real programmers do need to consider these constraints for some part of their work—is exponentiation really that important that we should worry about this issue?
Yes, actually it is. For example, every time someone buys something on the Internet it is done so secured only by cryptographic methods, which depend on the math of prime numbers, and (for example) checking primality uses exponentiation.
I don’t mean prime numbers like “13”. I mean, prime numbers with 512 digits. And the simplest algorithm for testing a number p for primality requires that you raise several numbers b < p to the pth power. So, again, raising a 128 digit number to a 512 digit power, might take some time and space!
Fixing the space issue is fairly easy, but requires an alternate template we haven't yet talked about (you’ll see it an a later lab).
Let's turn our attention to time. How can we make the function faster than linear time O(n)?
The trick is (similar to quicksort) to abandon the NatNum template and observe the following "divide
and conquer" strategy:
b8 =b*b*b*b*b*b*b*b [uses7multiplications]
however
b2 = b * b
b4 =b2 *b2
b8 = b4 * b4 [we can use 3 multiplications to compute the exact same answer!!!]
b. Write fast-expt using the idea above. You might want to use the predicates even? and odd? since our trick only works when n is even. However, if n is odd, then bn = b * b(n-1), where (n-1) is even.
You may base your code loosely on slow-expt and the natural numbers template (but note that this is a general algorithm: we are dividing an even number by 2, not subtracting 1 every time). The odd case is actually just like slow-expt. Note that computing base to the 2n needs only one more multiplication than computing base to the n. Double n, but only a constant additional time. This is a logarithmic time algorithm written "O(log n)".
c. Try (time (boolean? (slow-expt 314159 n))), and (time (boolean? (fast-expt 314159 n))), for various big n. What do you observe? Create a graph presenting your results scientifically, averaging multiple runs at each point. What are the "Big O" abstract running times for each algorithm. Do the average running times of these two algorithms behave more like the graphs of f(n) = n, n2, 2n, or log n? Does your graph support your hypothesis?
Page 4 of 4