The University of Melbourne
School of Computing and Information Systems COMP90038 Algorithms and Complexity
Assignment 1, Semester 2 2020
Released: Friday August 28 2020. Deadline: Sunday September 13 2020 23:59 This assignment is marked out of 30 and is worth 20% of your grade for COMP90038.
Ob jectives
To improve your understanding of the time complexity of algorithms and recurrence relations. To develop problem-solving and design skills. To improve written communication skills; in particular the ability to present algorithms clearly, precisely and unambiguously.
Scenario
In this assignment we will consider the following scenario. Suppose you have been tasked with writing some algorithms whose purpose is to manage event logs in a computer program. It is very common for programs to generate messages describing what is occurring during the program. These messages are called events, and are typically saved into one or more files called log files. Each such file is called an event log.
As an example, the following is a sample of a few log messages relating to the wireless network adapter on my MacBook Pro that were generated in the minutes before I typed this sentence.
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
Mon Aug 24 14:21:35.791
The content of these messages is not important. But notice that each message begins with a timestamp, indicating the date and time at which the log event was generated.
In certain applications, such as the Linux kernel, log events are stored in memory rather than being immediately written to a log file. We will consider a common data structure for storing such events, called a ring buffer (also known as a circular queue). Importantly, this data structure uses a fixed amount of memory. However it behaves like a queue: items are accessed in a first-in-first-out fashion.
Ring Buffer
The style of ring buffer we consider for this application is implemented as an array, in which log entries are stored. That is, each array element holds one log entry. A log entry includes a timestamp and a message. The array is of a fixed size C , which is called its capacity, and it is indexed from 0, . . . , C − 1.
1
A:
C: 14 readindex: 10 writeindex: 16
0 1 2 3 4 5 6 7 8 9 10 11 12 13
writeindex mod C readindex mod C
Figure 1: A graphical depiction of the ring buffer data structure. Shaded array elements represent data stored in the buffer, while white ones represent free space. Here the depicted ring buffer has capacity 14. 16 items have been written to the ring buffer in total so far, while 10 have been read, leaving 6 items remaining in the buffer to be read at present.
Read and Write Indices There are two index variables called the write index and the read index respectively. The write index is used to calculate the location that will be used to hold the next item that will be added to the ring buffer. The read index is used to calculate the index of the oldest item in the ring buffer, i.e. which item will be the first to be read from the buffer.
The write index counts the number of items that have been added to the buffer. The read index counts the number of items that have been read from the buffer. For the purposes of this assignment, you can assume that these indices are unbounded, i.e. you don’t need to worry about what might happen if they get too large and overflow.
If the value of the write index is w, then the position used to store the next item to be added to the buffer is w mod C. Likewise, if the read index has value is r, then the oldest item in the buffer (if it exists) is at position r mod C.
Initial State Initially, both the read index and write index are set to 0.
Invariant An invariant is a property that talks about the internals of a data structure. Functions and procedures that operate on the data structure can assume that the invariant holds when they are called. However they must ensure that when they finish the invariant still holds.
The invariant for the ring buffer is that the write index must be greater than or equal to the read index, and the maximum difference between the read and write indices cannot exceed the buffer’s capacity C.
Empty and Non-Empty Buffer The ring buffer is empty when the read and write index are equal to each other. Otherwise it is non-empty and an attempt to read an item from the buffer must succeed.
Reading an Item Reading an item from the buffer removes that item from the buffer. Suppose the array that implements the buffer is A, whose capacity is C, and the read and write indices are stored respectively in the readindex and writeindex variables. Then the following function is used to read an item from the ring buffer.
function GetItem(A,C,readindex,writeindex) if readindex ̸= writeindex then
index ← readindex mod C readindex ← readindex + 1 return A[index]
else
return null
2
Notice how this function updates the read index, by incrementing it. This has the effect of ensuring that the read item cannot be read again, and so simulates the idea that it has been removed from the buffer (without having to explicitly zero out that array element).
Your Tasks
1. [2 marks] Notice that the GetItem function above increments the readindex parameter, when reading an item from the ring buffer. For this to work, this parameter must be passed by reference. What would happen instead if it were passed by value? Explain in no more than 2–3 sentences.
2. [3 marks] Implement the following Θ(1) procedure that adds an item x to the ring buffer. Your function should always succeed: if the ring buffer is already full, the oldest item should be discarded. You can assume that all arguments (parameters) to the procedure are passed by reference. You can also assume that the invariant holds when your procedure is called. You should ensure that it holds again when your procedure finishes (returns).
procedure PutItem(A, C, readindex , writeindex , x) . . .
3. [4 marks] Consider the following, very inefficient, decrease and conquer algorithm. Its job is to remove from the ring buffer all log entries that are older than the given timestamp t. Note that each item in the ring buffer stores two pieces of data: a timestamp and a message. Given an item x, x.time denotes x’s timestamp and x.msg denotes x’s message.
procedure RemoveOlder(A,C,readindex,writeindex,t) if readindex ̸= writeindex then
if A[readindex mod C].time < t then
readindex ← readindex + 1 RemoveOlder(A, C, readindex , writeindex , t)
Write down a recurrence relation for the worst-case complexity of this algorithm, that counts the number of basic operations it performs. The size of the input n is the number of items in the ring buffer: writeindex − readindex. The algorithm has two basic operations (1) the comparison A[readindex ].time < t and (2) the update of readindex : readindex ← readindex + 1.
Use telescoping (i.e. backward substitution) to find a closed form for your recurrence relation and, thus, determine the worst-case complexity of this algorithm.
4. [5 marks] Design a more efficient implementation of RemoveOlder that performs the same task but whose worse-case complexity is in Θ(logn). For this question you are free to assume that timestamps never decrease over time. That is, if an item x2 is added to the ring buffer after some item x1 has been added, then x2.time ≥ x1.time.
5. [4 marks] Suppose you have to implement an algorithm to remove from the ring buffer all items whose timestamps are within a range [t1, t2), that is remove all items x for which t1 ≤ x.time < t2. Explain why such an algorithm cannot be implemented with worst-case complexity Θ(log n). Your explanation should be no more than a paragraph of text, i.e. 5 or 6 sentences.
6. [8 marks] Suppose now that instead of storing log entries, we used the ring buffer (specifically GetItem and PutItem) to store graph nodes, to implement breadth-first traversal of a graph. That is, imagine you implement the standard breadth-first traversal algorithm from lectures using the ring buffer as the queue of nodes to be visited.
3
Suppose this hypothetical algorithm was run on a graph ⟨V, E⟩ for which the number of nodes |V | in the graph was larger than C, the buffer’s capacity. Answer each of the following questions:
(a) [2 marks] Would every node be guaranteed to be traversed? Explain your answer in no more than a paragraph (i.e. 5 or 6 sentences). You might like to draw a picture of a graph to assist your explanation.
(b) [6marks]Devisearecursive,depth-firstsearch-basedalgorithmfordeterminingwhetherbreadth- first traversal of a graph will succeed (i.e. a valid breadth-first traversal will be performed). By “depth-first search-based” we mean that your algorithm should be a variant of the recursive depth-first traversal algorithm shown in lectures. For this question you can assume that the graph is undirected. Your algorithm is allowed to be conservative: it can say that traversal is not possible even if it is. However, it should never say that traversal is possible when it is not. The worst-case running time of your algorithm should not exceed Θ(|V |+|E|), the complexity of depth-first traversal. Your algorithm is not permitted to attempt to run breadth-first traversal over the graph.
You should explain how your algorithm works in no more than a paragraph of text (i.e. 5 or 6 sentences).
The less conservative your algorithm is, the higher your score will be for this question. Suppose C was 4. Then your algorithm should answer “yes” for each of the following three graphs, while being general (i.e. you should not specialise your algorithm to these cases and it should work for arbitrary positive C).
Graph 1 Graph 2 Graph 3
7. [4 marks] For this question, now assume we are using the ring buffer data structure to store char- acters, i.e. each array element of the ring buffer holds a single character, and so a ring buffer stores a sequence of characters (also known as a string) of length writeindex − readindex. For a non- empty ring buffer, the first character in the string is at position readindex mod C, the second is at readindex + 1 mod C, and so on up to the final character at position writeindex − 1 mod C.
We discussed brute force string matching in lectures, to find the first occurrence of a patten p in a string s (the algorithm returns -1 if the pattern is not found and otherwise returns the position in the string at which p appears). It has O(|p| · |s|) time complexity in the worst case. However, there are algorithms for this task which have better worst-case time complexity of O(|p| + |s|). Suppose you have been given such an algorithm.
Now show how we can design an algorithm to determine whether a given ring buffer is a rotation of another ring buffer (both of capacity C), in O(C) time. Suppose ring buffer A stores the string a and ring buffer B stores the string b, then A and B are rotations of each other, if there exist strings uandvsuchthata=uvandb=vu(uvmeansuandvconcatenated). Notethatuorvcouldbe empty, so by our definition, every ring buffer is a rotation of itself.
4
Submission and Evaluation
• You must submit a PDF document via the LMS. Note: handwritten, scanned images, are acceptable only if they are clearly legible. Write very neatly, and if you photograph your submission be sure to use an app that auto crops and rotates images, such as OfficeLens, to ensure the resulting submission is easy to mark. Convert images to PDF before submission. Do not submit Microsoft Word documents — if you use Word, create a PDF version for submission.
• Marks are primarily allocated for correctness, but elegance of algorithms and how clearly you com- municate your thinking will also be taken into account. Where indicated, the complexity of algo- rithms also matters.
• We expect your work to be neat – parts of your submission that are difficult to read or decipher will be deemed incorrect. Make sure that you have enough time towards the end of the assignment to present your solutions carefully. Time you put in early will usually turn out to be more productive than a last-minute effort.
• You are reminded that your submission for this assignment is to be your own individual work. For many students, discussions with friends will form a natural part of the undertaking of the assignment work. However, it is still an individual task. You should not share your answers (even draft solutions) with other students. Do not post solutions (or even partial solutions) on social media or the discussion board. It is University policy that cheating by students in any form is not permitted, and that work submitted for assessment purposes must be the independent work of the student concerned.
Please see https://academicintegrity.unimelb.edu.au
If you have any questions, you are welcome to post them on the LMS discussion board so long as you do not reveal details about your own solutions. You can also email Toby Murray (toby.murray@unimelb.edu.au). In your message, make sure you include COMP90038 in the subject line. In the body of your message, include a precise description of the problem.
Late Submission and Extension
Late submission will be possible, but a late submission penalty will apply of 3 marks (10% of the assign- ment) per day.
Extensions will only be awarded in extreme/emergency cases, assuming appropriate documentation is provided simply submitting a medical certificate on the due date will not result in an extension.
5