CSC263H1, Summer 2020 Problem Set 2 Sample Solutions
CSC263H1: Problem Set 2 Sample Solutions Due Friday June 26 before 10pm
Note: solutions may be incomplete, and meant to be used as guidelines only. We encourage you to ask follow-up questions on the course forum or during office hours.
1. [12 marks] Binary tries. A binary trie is a binary tree in which each edge going from a parent to its left child is labelled with 0 and each edge going from a parent to its right child is labelled with 1. Each node v in the trie is labelled by the binary string L(v) consisting of the concatenation of the bits labelling the edges of the path from the root to v.
For example, if u is the root of a binary trie, v is u’s left child, and w is v’s right child, then L(u) = ε (the empty string), L(v) = 0, and L(w) = 01.
Consider any set of n different b-bit binary strings x1, . . . , xn ∈ {0, 1}b, where xi is (lexicographically) less than xi+1, for 1 ≤ i < n. This set can be represented by a depth b trie with n leaves, whose the labels, from left to right, are x1,...,xn. Each leaf with label xi, for 1 ≤ i < n, has a pointer s to the leaf with label xi+1 and each leaf with label xi, for 1 < i ≤ n, has a pointer p to the leaf with label xi−1. Each internal node with no left child stores a pointer to the leaf in its subtree with the smallest label. Similarly, each internal node with no right child stores a pointer to the leaf in its subtree with the largest label. Finally, for each depth d = 1, . . . , b, there is a linear size perfect hash table for the set of labels of the nodes of the trie at depth d.
(a) [2 marks] Draw a picture of this data structure for the set of 2-bit strings {00, 01, 11}.
Solution
0 /\\\
0/ \1 \1 \
/\\\ 00 <−−−> 01 <−−−> 11 <−−
− /\ 0/ \1 /\
1 −−−−
(b) [2 marks] What is the maximum amount of space (i.e. number of bits), to within a constant factor, used by the data structure for a set of n different b-bit binary strings? Justify your answer.
Solution
The number of nodes in the trie is maximized when the trie consists of a complete binary trie of depth ⌊log2 n⌋ and, at each subsequent level, has exactly n nodes. The total number of nodes in this trie is N = 2⌊log2 n⌋ + n(b − ⌊log2 n⌋) ∈ O(nb). Each node has pointers, each Θ(log N ) bits long, for a total of Θ(N log N ) bits. Note that b ≥ ⌈log2 n⌉; otherwise, there are fewer than n different b-bit binary strings. Hence N log N ∈ O(nb2).
The space needed to represent the hash table at each level is, to within a constant factor, the space needed to represent the nodes and their labels. For d = 1, . . . , ⌊log2 n⌋, there are 2d nodes at depth d, each with d bit labels. For d = ⌊log2 n+1,...,b⌋, there are n nodes at depth d, each
Page 1/4
CSC263H1, Summer 2020 Problem Set 2 Sample Solutions
⌊log2 n⌋ b
with d bit labels. In total, there are d2d + n d ∈ Θ(nlogn + nb2) = Θ(nb2)
bits.
d=1 d=⌊log2 n⌋+1
(c) [4 marks] Describe an algorithm that, given an instance of this data structure and a b-bit binary string x finds the deepest node v in the trie such that L(v) is a prefix of x. Your algorithm should run in O(log b) time, where standard operations on b-bit words are assumed to take O(1) time. Prove that your algorithm is correct and has the required worst case time complexity.
Solution
The idea is to do binary search on the levels of the trie.
DEEPESTPREFIX(l,h,x) returns the deepest node v in the trie such that L(v) is a prefix of x, provided that there is a node in the trie at level l whose label is a prefix of x and there is no node in the trie at level greater than h whose label is a prefix of x. This can be proved by induction on h−l. The proof is essentially the same as the proof of correctness of binary search. Since the empty string (the label of the root) is a prefix of every string and the trie contains no nodes at levels greater than b, DEEP EST P REF IX(0, b, x) returns the deepest node in the trie whose label is a prefix of x.
DEEP EST P REF IX(l, h, x):
if l = h then return the node at level l whose label is the length l prefix of x m ← ⌈(l + h)/2⌉
if the length m prefix of x is stored in the hash table for level m
then return DEEPESTPREFIX(m,h,x)
else return DEEPESTPREFIX(l,m−1,x)
Binary search on a range of size b performs O(log b) calls. For each call, this algorithm performs a constant amount of work (excluding recursive calls). Thus, the total time taken is O(log b).
(d) [4 marks] Describe an algorithm that, given an instance of this data structure for the b-bit binary strings x1, . . . , xn and a b-bit binary string x, finds the predecessor of x in {x1, . . . , xn}. Your algorithm should run in O(log b) time, where standard operations on b-bit words are assumed to take O(1) time. Prove that your algorithm is correct and has the required worst case time complexity.
Solution
Let v be the node returned by DEEPESTPREFIX(0,b,x). Suppose it is at depth d.
If d = b, then x is the label of one of the leaves of the trie, i.e. x ∈ {x1,...,xn}. In this case, this leaf’s predecessor in the doubly linked list of leaves is a leaf whose label is the predecessor of x in {x1,...,xn}.
Otherwise, the length d prefix of x is L(v) and the length d + 1 prefix of x does not label any node in the trie. If the (d+1)st bit of x is 0, this means that v does not have a left child. Hence, it stores a pointer to the leaf in its subtree with the smallest label. The label of this leaf is larger than x. However, its predecessor in the doubly linked list of leaves has a label smaller than x. This label is the predecessor of x in {x1,...,xn}. Similarly, if the (d+1)st bit of x is 1, then v stores a pointer to the leaf in its subtree having the largest label, which is predecessor of x in {x1,...,xn}.
Page 2/4
CSC263H1, Summer 2020 Problem Set 2 Sample Solutions
Besides the call to DEEPESTPREFIX(0,b,x), only a constant amount of work is done. Therefore, the total running time of this algorithm is in O(log b).
Page 3/4
CSC263H1, Summer 2020 Problem Set 2 Sample Solutions
2. [12 marks] Augmentation.
You are given an input array A of n distinct numbers, and you want to compute an output array B, in which B[i] is the number of such elements in A: (a) the element appears after A[i] in A, and (b) the element is strictly smaller than A[i].
For example, if the input is A = [3, 5, 1, 4, 7, 6, 2], then the output should be B = [2, 3, 0, 1, 2, 1, 0]. Because in A, after 3 (i.e., A[1]) there are 2 elements that are smaller than 3 (i.e., 1 and 2); and after 5 (i.e., A[2]) there are 3 elements that are smaller than 5 (i.e., 1, 4 and 2), etc.
Now you need to design an algorithm that computes B using A. The runtime of your algorithm must be O(n log n) in worst case, where n is the size of A. An O(n2) algorithm is not acceptable and will not get any mark. Describe you design by answering the following questions.
(a) [1 mark] Which data structure from CSC263 do you use?
(b) [1 mark] What modifications do you make to this data structure? If any.
(c) [4 marks] How does your algorithm populate this data structure using the input array A? (d) [4 marks] How does your algorithm compute the numbers in the output array B?
(e) [2 marks] Why does your algorithm run in O(n log n) time in worst case?
Solution
(a) Use augmented AVL-tree, or order-statistic tree
(b) Add attribute size[x] to each node, which is the total number of nodes in the subtree rooted
at node x.
(c) Insert the elements in A into the AVL-tree, starting from the last element, i.e., A[n], A[n −
1],A[n−2],...,A[1].
(d) After each insertion, call OS-RANK(x) where x is the number that’s just inserted, the rank minus one is the number of element that are after the element and that are smaller. That is, after inserting A[i], let B[i] ← OS-RANK(A[i]) − 1.
(e) Each insertion and OS-RANK takes O(log n) in worst case, therefore the total runtime is O(n log n) in worst case.
Page 4/4