COMP1100/1130 [2020 S1]:
PROGRAMMING AS PROBLEM SOLVING Research School of Computer Science
Menu
menu
» Labs » Week 11: Complexity
In this lab we discuss the topic of algorithmic complexity is, and learn how to determine the complexity of a particular algorithm. We also learn how to use big-O notation to describe complexity.
Table of Contents
Pre-Lab Checklist
Getting Started
Complexity
Best, Worse, Average case Exponential and logarithmic classes Implementing Sets
Extensions
Real World Data
Pre-Lab Checklist
You have completed both Lab09 and Lab10, and are comfortable with binary trees, binary search trees, and the binary search ordering constraint. (We will reuse functions from these labs.)
You are comfortable with higher order functions.
You are familiar with pattern matching against abstract data types.
Getting Started
1. Go to the project’s dashboard under the Project tab and click on the Fork button. The gitlab url for this lab is https://gitlab.cecs.anu.edu.au/comp1100/2020s1student]les/lab11
2. You will be asked where to fork the repository. Click on the user or group to where you’d like to add the forked project. You should see your own name here. Once you have successfully forked a project, you will be redirected to a new webpage where you will notice your name before > project-name. This means you are now the owner of this repository. The url in your browser should re^ect the change: https://gitlab.cecs.anu.edu.au/uXXXXXXX/lab11
3. Finally, clone your forked repository to your computer according to the instructions in the Week 2 Lab.
Complexity
Through the course, we have mentioned several times the concept of “eaciency” of code, and in this lab we hope to make this concept more concrete.
Intuitively, some algorithms are “easy”, in the sense that the time taken grows slowly as the input size grows. “Hard” problems are ones where the time taken grows very quickly with the input size.
The point of determining the algorithmic complexity of an algorithm, is to try and quantify precisely how fast the algorithm runs as a function of input size. A precise description of the complexity is usually too diacult to obtain, so we usually strive for an upper bound and a lower bound, expressed via big O notation.
Best, Worse, Average case
One problem with big O notation is that for an input of size n, the time taken to run may vary wildly with what the particular input is. Take for example the following code, the de]nition of the elem function. The “size” n of the input will be de]ned as the length of the input list.
If the element we are looking for is at the start of the list, then it would only take a single operation to ]nd it, and the complexity would be O(1).
If the element is at the end of the list, or not even present in the list at all, then we have to recurse through the entire list to (attempt to) ]nd it. In this case, the complexity would be O(n).
On average, assuming the element we are looking for is equally likely to be anywhere in the input list, we have to look though on average half the list until we ]nd it, which is O(n/2), which is still O(n).
We resolve these problems by talking about the complexity for the best, worse, and average cases separately In this course, we use the notation O(g) to refer to all three, and explicitly state whether this is best, worse or average case complexity.
Exercise 1
Open the ]le Complexity.hs.
The ]rst has been done for you. For functions that take a list as input, treat the size of the input as the length of that list. For functions that take an integer as input, treat the integer itself as the size n.
Remember that labs are not tests: feel free to discuss these with your fellow students, and with your tutors.
Submission required: Exercise 1 Complexity Exponential and logarithmic classes
Most of the complexity classes that we have seen so far appear to all be polynomial classes, usually linear or quadratic. Logarithmic classes appear when the size of the problem halves after each operation, and exponential classes appear when increasing the input size by one step, makes the problem twice (or worse!) as hard to solve.
Consider the method of bisection search described in Lab 10. At each stage, the number of remaining elements to search through is (approximately) halved. If the number of elements in the tree to begin with was doubled, then it would only take one extra operation to try and ]nd an element in this tree. This means that the complexity of trying to ]nd an element in a binary tree is O(log n).
Exponential cases show up when the problem grows twice as large for each increase in the size of the input problem. For example, consider the rather silly function exp2.
Each evaluation of exp2 calls itself twice more, which gives an order of O(2n). Exercise 2
See Complexity.hs.
Submission required: Exercise 2 fib1, fib2, power Exercise 3
See Complexity.hs. The function power has a reasonable complexity, but it could be made even more eacient. Note the following identities:
x2n =(xn)2 andx2n+1 =x2n ⋅x=(xn)2 ⋅x
We can use the following identities to compute exponentiation quickly.
For example, if we wanted to compute x10, the power function would compute x10 =x⋅x⋅x⋅x⋅x⋅x⋅x⋅x⋅x⋅xusingtenmultiplications.
Using this identity, we can do better: x10 = (x5)2 = ((x2)2 ⋅ x)2 where squaring a term can be evaluated by multiplying a term by itself. Now powerFast only needs four multiplications, performed in the following order.
You should NOT use the operator (**) or (^) to de]ne this function. Submission required: Exercise 3 powerFast
Implementing Sets
A set is a collection of unordered elements without duplicates. The concept of a “]rst” or “last” element is meaningless for a set, as the elements are stored without concern of the ordering. So the set {1, 2, 3} and the set {2, 3, 1} represent the same thing.
We’d like to implement sets in Haskell in two different ways: By using lists, and by using binary search trees.
We’ve seen before that binary search trees can be far more eacient than lists for various operations, and so it’s often useful to use trees as the underlying data structure. Have a look at Set_With_Lists.hs, where we have de]ned a data type Set that is implemented using a list, with the constraint that the list shouldn’t contain duplicates.
As we will see, lists are a clunky and ineacient way to implement sets, as it’s slow to ]nd elements, insert elements, delete elements, and so on.
Exercise 4
Have a look at the SetsWithLists.hs ]le.
Some of the functions have already been implemented for you, and their time
complexities have been stated.
Submission required: Exercise 4 setEquals, addElement, setUnion Exercise 5
Have a look at BinaryTree.hs, which contains a module for all the functions we wrote in Labs 9 and 10. We will need these functions for the rest of the exercises, so copy them over. If you haven’t ]nished any, now is the time to do so.
De]ning the complexity for trees can be diacult, as for a list we can assume that “on average” the element we are looking for is in the middle of the list. For trees, it’s hard to estimate what the average tree looks like, so we will only consider best and worst case. The tree being balanced or not will wildly affect the time complexity, so we consider that seperately.
Your implementations should be as eacient as you can make them, if they were not already.
Submission required: Exercise 5 BinaryTree.hs Exercise 6
Have a look at the SetsWithTrees.hs ]le. It contains all the same functions as SetsWithLists.hs, but now the underlying data structure for a set is a binary search tree. Some of the functions have been done for you, using functions from BinaryTree.hs.
You can use any of the functions imported from BinaryTree.hs to help you.
You may ]nd analysing the time complexity diacult. Don’t worry too much if so, as trees are much more diacult to reason about intuitively than lists. Even so, take a stab, you should at least think about how many operations each function might take: best case, worst case, and everything in between. Again, check and discuss your results with your tutor and your classmates if you are unsure.
Submission required: Exercise 6 SetsWithTrees.hs
Extensions [COMP1100 Optional, COMP1130
Compulsory] Extension 1
Be eacient!
Submission required: Extension 1 SetsWithLists.hs Extension 2
Be eacient!
Submission required: Extension 2 SetsWithTrees.hs Real World Data (Optional)
It can be hard to see how our new implementation of sets performs against lists when we only ever run it on short simple inputs. Here, we provide a testing suite SetTimer.hs and two books, “Alice’s Adventures in Wonderland” (AAIW) and “Through the Looking Glass” (TTLG). The code in SetTimer.hs has been provided for you to run your implementation of Set against a battery of tests, which are then timed for how long they take.
You can compile this code using the command
ghc -Wall -O2 SetTimer.hs
which behaves similarly to the cabal build command you have used in assignments.
You can then run the compiled program with the command
./SetTimer
The code proceeds by the following steps:
1. The two text ]les are read in and converted into lists of lower-case words, of type
[String].
2. Two distinct sets are created, one for each list of words (and therefore each text
]le), by repeatedly using the addElement function in your Set code.
3. Same as Step 2, but the words are inserted into the set in alphabetical order.
4. The input lists of words are ]rst split up into sublists of size 300 or less. These sublists are individually turned into “small” sets using the same technique as step b. Finally, these small sets are repeatedly combined into one large set using the set union operation.
5. Every word from the texts is repeatedly removed from their respective sets, resulting in the empty set.
6. Set difference in used repeatedly on the complete sets and all the small sets created above.
7. All the words that appear in either AAIW or TTLG are calculated, and it is checked which of those words are in both texts, using containsElement.
8. Same as step 7, but this time we take the set intersection of the word sets for AAIW and TTLG.
Exercise 7 (Optional)
And now for the actual task of the exercise. Your challenge is to experiment with your code, and to attempt to come up with ways to make it more eacient. You will ]nd that small changes in your code should produce measurable changes in the timing information—and large changes (such as ]nding a way to perform an operation that is O(n) rather than O(n2), for example) should produce signi]cant changes.
We are deliberately leaving this exercise open-ended. There are many, many different ways to optimise your code, and they will depend almost entirely on how you have written it in the ]rst place. Here are some ]rst-line things to consider:
1. Are there any unnecessary operations performed by a function? For example, does your containsElement function run through a complete list, or does it terminate when (if) it ]nds the desired element?
2. Are you scanning through your data structures more often than necessary? For example, by calling a length function you already went through an entire list once without doing anything but counting the number of elements—not necessarily a clever thing to do if you also have to read or process the actual elements in the list somehow.
3. Do you spot the same function call in multiple spots inside one function? Rather than actually running it multiple times, it might be cleverer to factor it out into a where clause. This will usually enhance both readability and performance.
4. Can you perform an operation using a (possibly higher-order) function from the Prelude or one of the various Haskell libraries? Often, these operations will be more eacient and tightly-coded than yours, because years worth of work has already gone into tuning them. But be careful: most libraries come with no guarantees about eaciency or logical correctness— make sure you understand where a speci]c library came from and how well it was tested or veri]ed. Libraries which are used by pretty much every user of the language on a daily basis (like the core language modules in Haskell) usually provide a higher level of con]dence.
5. How does the performance change when you modify the code to use your SetsWithTrees module rather than the SetsWithLists module? You would hopefully ]nd that it gets better, but is this always the case? Do any operations take longer to perform?
6. How does your tree code perform when you can’t guarantee that elements are inserted or removed in random order? For example, in Step e of the code (removing all elements one by one), the removal is done in exactly reverse order to insertion. What impact will this have on the performance of your tree operations?
7. While the whole texts which you process are only about 150 KB long, you might ]nd that your program uses large amounts of memory to process it. If you want to ]nd out, compile your program with
and run it with
This will give you a report at the end about the total usage of memory and the time your program spend mopping up no longer used memory blocks (“garbage collection” or “GC”). Would your program process a 150 KB long text by using hundreds of MB of memory (which costs time to handle)? Unfortunately there is rather little you can do about this problem from within an “automatically managed memory” language like Haskell.
You should, of course, feel free to discuss these problems with your tutor and with your class-mates. If you ]nd any particularly interesting ways that you can optimise your code, consider posting on the course forum and describing what you’re doing and how you understand the difference. There are no right or wrong answers to this exercise, but there is plenty to be learnt from experimenting.
(Content adapted from Uwe Zimmer 2015)
Note that this lab is more open ended than previous labs, and you’re encouraged to try and experiment with the last exercise.
This lab contains additional exercises for COMP1130 students. COMP1100 students are encouraged to attempt them once they have completed the compulsory exercises to receive the participation marks for the lab.
elem :: (Eq a) => a -> [a] -> Bool elem e list = case list of
[] -> False x:xs
|x == e -> True |otherwise -> elem e xs
Other sources may use different notation (Ω, Θ, O) for best case and average/tight case, rather than using big-O for all three. Be careful!
When talking about algorithms, sometimes it is also useful to consider the space complexity, that is, how the memory used by the function increases with the size of the input. Unfortunately it is very diacult to reason about the space complexity of programs in Haskell due to lazy evaluation and how Haskell actually works under the hood. Consequently, we will not discuss space complexity here.
Be very careful when you are talking about the average running time. “average” means that there is some probability distribution on the inputs. For example, suppose we were playing cards, and we were trying to ]nd the fastest way to sort the deck of cards in order. It’s reasonable to assume that when the cards are shuhed, the deck is more or less uniformly random. But if we were trying to sort a database after inserting a few more people, then the database might only be slightly out of order. The more you know about what the “average” input looks like, the better you can write an algorithm targeting those kinds of input.
Edit the ]le to replace all instances of O(?) with the correct complexity, as shown for elem below.
— | Checks if an element is present in a list. — best O(1), worst O(n), average O(n)
elem :: (Eq a) => a -> [a] -> Bool
elem e list = case list of
[] -> False x:xs
|e == x -> True |otherwise -> elem e xs
— | Computes the length of the input list — best O(?), worst O(?), average O(?) length :: [a] -> Integer
length list = case list of
[] -> 0
_:xs -> 1 + length xs
— | Takes in a list, and — best O(?), worst O(?),
throws the list away, and returns zero. average O(?)
zero :: [a] zero list = [] -> 0 _:xs ->
-> Integer case list of
zero xs
— | Returns the first element in a list — best O(?), worst O(?), average O(?) head :: [a] -> a
head list = case list of
[] -> error “head: Empty list has no first element” x:_ -> x
— | Returns the last element in a list — best O(?), worst O(?), average O(?) last :: [a] -> a
last list = case list of
[] -> error “last: Empty list has no last element” [x] -> x
x:xs -> last xs
— | Takes a list, and removes the duplicate elements — best O(?), worst O(?), average O(?)
nub :: (Eq a) => [a] -> [a]
nub list = case list of
[] -> [] x:xs
|elemxxs-> nubxs |otherwise -> x : nub xs
— | Concatenates two lists together
— best
(++) ::
[]
(x:xs)
O(?), worst O(?), average O(?)
[a] -> [a] -> [a] ++ys=ys
++ ys = x:(xs ++ ys)
— | Reverses a list
— best O(?), worst O(?), average O(?) rev1 :: [a] -> [a]
rev1 list = case list of
[] -> []
x:xs -> (rev1 xs) ++ [x]
— | Reverses a list
— best O(?), worst O(?), average O(?) rev2 :: [a] -> [a]
rev2 list = revHelper list []
where
revHelper list acc = case list of
[] -> acc
x:xs -> revHelper xs (x:acc)
exp2 :: Integer -> Integer -> Integer exp2 n
|n < 0 = error "exp2: undefined for negative arguments" |n == 0 = 1
|n>0 =exp2(n-1)+exp2(n-1)
State the complexity of the following functions: fib1, fib2, power by replacing all the instances of O(?) appropriately.
x^2 := x * x
x^4 := x^2 * x^2 x^5 := x^4 * x x^10 := x^5 * x^5
Use these identities to write a more eacient function powerFast that computes exponentiation by a whole number. How many multiplications must it perform to compute x^n?
Note that much like we cannot enforce the binary search ordering constraint for binary search trees, we cannot enforce that sets cannot contain duplicates as part of the de]nition of the type. It is up to the user to ensure that this constraint is respected.
In SetsWithLists.hs, complete the functions setEquals, addElement, setUnion, and write down the best, worse and average case time complexity for each. Try to be as eacient as you can.
Copy your solutions to Lab 9 and Lab 10 into BinaryTree.hs. State the best and worse time complexity of each function, with and without the assumption of the tree being balanced.
In SetsWithTrees.hs, complete the functions setEquals, addElement, setUnion. State the best and worse time complexity of each function, with and without the assumption of the tree being balanced.
In SetsWithLists.hs, complete the functions removeElement, setIntersection, setDifference, and write down the best, worse and average case time complexity for each.
In SetsWithTrees.hs, complete the functions removeElement, setIntersection, setDifference, and write down the best, worse and average case time complexity for each.
Think about under what circumstances trees will perform better than lists, and when trees will perform worse.
This section is quite long, and it provides you with the opportunity to experiment with running your code on some “real data”. Feel free to spend as much or as little time on it as you wish.
Module SetTimer.hs is initially written to test your SetsWithLists code, and can be easily modi]ed to work with SetsWithTrees by changing a single import statement at the top.
Note that if you haven’t done all the extensions, then the code will crash halfway through execution as soon as it comes across the unde]ned extension functions. This is normal, and you can still see the timing for all other functions up to that point.
Do not attempt to run SetTimer.hs using GHCi. Although interpreters are great for quick, interactive testing and debugging of code, they are not as reliable for testing speed as a compiled program.
You don’t need to (and are not expected to) understand how the code in SetTimer.hs works. Reading from ]les is a side-effecting action, which requires use of the IO () monad, well beyond the scope of this course.
Run the code several times. Do the reported times change? If so, by how much? What does this suggest to you?
ghc -W -O2 –make -rtsopts SetTimer.hs
./SetTimer +RTS -sstderr
Updated: 06 Jun 2020
Responsible Oacer: Director, RSCS
Page Contact:
Course Convenor
Contact ANU
Copyright
Disclaimer
Privacy
Freedom of Information
+61 2 6125 5111
The Australian National University, Canberra CRICOS Provider : 00120C ABN : 52 234 063 906