CSC263H1, Summer 2020 Problem Set 2
General instructions
CSC263H1: Problem Set 2 Due Friday June 26 before 10pm
Please read the following instructions carefully before starting the problem set. They contain important information about general problem set expectations, problem set submission instructions, and reminders of course policies.
• Your problem sets are graded on both correctness and clarity of communication. Solutions that are technically correct but poorly written will not receive full marks. Please read over your solutions carefully before submitting them.
• Each problem set may be completed in groups of up to three. If you are working in a group for this problem set, please consult the articles on collaboration and plagiarism on posted on quercus.
• Solutions must be typeset electronically, and submitted as a PDF with the correct filename. Hand- written submissions will receive a grade of ZERO.
The required filename for this problem set is problem set2.pdf.
• Problem sets must be submitted online through MarkUs. If you haven’t used MarkUs before, give yourself plenty of time to figure it out, and ask for help if you need it! If you are working with a partner, you must form a group on MarkUs, and make one submission per group. “I didn’t know how to use MarkUs” is not a valid excuse for submitting late work.
• Your submitted file(s) should not be larger than 9MB. You might exceed this limit if you use a word processor like Microsoft Word to create a PDF; if it does, you should look into PDF compression tools to make your PDF smaller, although please make sure that your PDF is still legible before submitting!
• Submissions must be made before the due date on MarkUs.
• The work you submit must be that of your group; you may not use or copy from the work of other
groups, or external sources like websites or textbooks.
Additional instructions
• None.
Page 1/3
CSC263H1, Summer 2020 Problem Set 2
1. [12 marks] Binary tries. A binary trie is a binary tree in which each edge going from a parent to its left child is labelled with 0 and each edge going from a parent to its right child is labelled with 1. Each node v in the trie is labelled by the binary string L(v) consisting of the concatenation of the bits labelling the edges of the path from the root to v.
For example, if u is the root of a binary trie, v is u’s left child, and w is v’s right child, then L(u) = ε (the empty string), L(v) = 0, and L(w) = 01.
Consider any set of n different b-bit binary strings x1, . . . , xn ∈ {0, 1}b, where xi is (lexicographically) less than xi+1, for 1 ≤ i < n. This set can be represented by a depth b trie with n leaves, whose the labels, from left to right, are x1,...,xn. Each leaf with label xi, for 1 ≤ i < n, has a pointer s to the leaf with label xi+1 and each leaf with label xi, for 1 < i ≤ n, has a pointer p to the leaf with label xi−1. Each internal node with no left child stores a pointer to the leaf in its subtree with the smallest label. Similarly, each internal node with no right child stores a pointer to the leaf in its subtree with the largest label. Finally, for each depth d = 1, . . . , b, there is a linear size perfect hash table for the set of labels of the nodes of the trie at depth d.
(a) Draw a picture of this data structure for the set of 2-bit strings {00, 01, 11}.
(b) What is the maximum amount of space (i.e. number of bits), to within a constant factor, used by
the data structure for a set of n different b-bit binary strings? Justify your answer.
(c) Describe an algorithm that, given an instance of this data structure and a b-bit binary string x finds the deepest node v in the trie such that L(v) is a prefix of x. Your algorithm should run in O(log b) time, where standard operations on b-bit words are assumed to take O(1) time. Prove that your algorithm is correct and has the required worst case time complexity.
(d) Describe an algorithm that, given an instance of this data structure for the b-bit binary strings x1,...,xn and a b-bit binary string x, finds the predecessor of x in {x1,...,xn}. Your algorithm should run in O(logb) time, where standard operations on b-bit words are assumed to take O(1) time. Prove that your algorithm is correct and has the required worst case time complexity.
Page 2/3
CSC263H1, Summer 2020 Problem Set 2
2. [12 marks] Augmentation.
You are given an input array A of n distinct numbers, and you want to compute an output array B, in which B[i] is the number of such elements in A: (a) the element appears after A[i] in A, and (b) the element is strictly smaller than A[i].
For example, if the input is A = [3, 5, 1, 4, 7, 6, 2], then the output should be B = [2, 3, 0, 1, 2, 1, 0]. Because in A, after 3 (i.e., A[1]) there are 2 elements that are smaller than 3 (i.e., 1 and 2); and after 5 (i.e., A[2]) there are 3 elements that are smaller than 5 (i.e., 1, 4 and 2), etc.
Now you need to design an algorithm that computes B using A. The runtime of your algorithm must be O(n log n) in worst case, where n is the size of A. An O(n2) algorithm is not acceptable and will not get any mark. Describe you design by answering the following questions.
(a) Which data structure from CSC263 do you use?
(b) What modifications do you make to this data structure? If any.
(c) How does your algorithm populate this data structure using the input array A? (d) How does your algorithm compute the numbers in the output array B?
(e) Why does your algorithm run in O(n log n) time in worst case?
Page 3/3