代写代考 FIT2004 Assessed Preparation

Assessed Preparation
Week 6 Studio Sheet (Solutions)
Useful advice: The following solutions pertain to the theoretical problems given in the tutorial classes. You are strongly advised to attempt the problems thoroughly before looking at these solutions. Simply reading the solu- tions without thinking about the problems will rob you of the practice required to be able to solve complicated problems on your own. You will perform poorly on the exam if you simply attempt to memorise solutions to the tutorial problems. Thinking about a problem, even if you do not solve it will greatly increase your understanding of the underlying concepts. Solutions are typically not provided for Python implementation questions. In some cases, psuedocode may be provided where it illustrates a particular useful concept.
Problem1. Ahashtablecanbeusedtostoredifferenttypesofelementssuchasintegers,strings,tuples,oreven arrays. Assume we want to use a hash table to store arrays of integers (i.e., each element is an array of integers). Consider the following potential hash functions for hashing the arrays of positive integers. Rank them in terms of quality and give a brief explanation of the problems with each of them. Assume that they are all taken mod m , where m is the table size.
• Returnthefirstnumberinthearray(e.g.,ifarray=[10,5,7,2],hashindexwillbe10%m)
• Returnarandomnumberinthearray(e.g.,ifarray=[10,5,7,2],hashindexwillbearandomlychosen
element from the array mod m)
• Returnthesumofthenumbersinthearray(e.g.,ifarray=[10,5,7,2],hashindexwillbe24%m)
Give a better hash function than these three and explain why it is an improvement.
From best to worst:
1. Returnthesumofthenumbersinthearray.Thisisadecenthashfunction,asitaccountsforallof the contents of the array. It is not amazing since it will produce identical hashes for permutations of the same elements (or arrays having the same sum), e.g., array1=[10,5,7,2], array2=[2,5,7,10] and array3=[9,6,7,2] will all collide.
2. Return the first number in the array This is not very good since it only accounts for one element of the array, so any two arrays that begin with the same element will collide.
3. Returnarandomnumberinthearray.Thisisnotevenavalidhashfunctionsinceitmightreturn a different value for the same array when used multiple times. Hash functions must be consistent, that is they must always produce the same value for the same key.
A better hash function would be
h(A)=A[0]+A[1]x +A[2]x2 +…+A[n]xn,
for some value of x > 1. This is better since if we use this hash function, all of the array is taken into account, and permuting the order of the elements is likely to change the value of the hash. We can make it even better by selecting x >any value we could see in our input (for example, if we were hashing just a-z, giving values of 1-26, then we could choose x=27) and taking the sum mod p for a suitable prime p , which yields the standard polynomial hash function.
Problem2. Insert5intothefollowingAVLtreeandshowtherebalancingprocedurestepbystep. 1
We insert 5 using the ordinary BST insertion procedure and we obtain 50
This tree is imbalanced, as the node containing 20 has a height 2 left subtree, and a height 0 right subtree, resulting in a balance factor of 2. To rectify this, we rotate the node 10:
5 20 52 60
Studio Problems
Problem3. Insert6intothefollowingAVLtreeandshowtherebalancingprocedurestepbystep.
We insert 6 using the ordinary BST insertion procedure and we obtain the following tree. 3
This tree is imbalanced at node 8 since it has a left subtree of height 3 and a right subtree of height 1, leading to a balance factor of 2. Note that the root node 3 is also imbalanced, but we always perform rotations on the lower levels first since this may fix the imbalances at higher levels.
Since node 8 has a taller left subtree and node 5 has a taller right subtree, this is a left-right imbalance so we must perform a double rotation on node 7 to correct it. We perform the first rotation to obtain:
1 5 9 −→ 1 7 9 475
646 We then perform the second rotation and are left with:
1 7 9 −→ 5
46 This is a balanced tree, so we are done.
Problem4. Placeholdertoretaincorrectquestionnumbering
Problem 5. Recall the algorithm for deleting an element from a binary search tree. If that element has no
children, we do nothing. If it has one child, we can simply remove it and move its child upwards. Otherwise, if the node has two children, we swap it with its successor and then delete it. This algorithm works because of the following fact: The successor of a node with two children has no left child. Prove this fact.
(a) First prove that the successor of a node x with two children must be contained in the right subtree of x (b) Use this fact to prove that the successor of x can not have a left child
Recall that the successor of a node is the minimum value node that is greater than x . Consider a node x withtwochildrenxl andxr.First,wewillshowthatthesuccessorofxliesintherightsubtreexr ofx. All elements in the left subtree of x have values < x and hence can not be the successor of x . Suppose that x has a parent p. If px < x then x is a right child and hence neither px nor its left subtree can contain the successor of x since every node v in the left subtree of px satisfies v < px < x . Otherwise x is a left child of px and hence rx < p and in particular rx < v for every v in the right subtree of px since rx < px < v , therefore the successor of x is not a right descendant of px . This leaves the only possibility that the successor is in the subtree of rx . Suppose that the successor y of x had a left child yl . By the BST property we have yl < y , but since yl is contained in the right subtree of x , we also have x < yl . This means that x < yl < y which implies that y is not in fact the successor of x . This is a contradiction and hence y must have no left child. Problem 6. Consider the binary search tree shown below, with balance factors as indicated. Assume that the subtrees A, B, C and D are AVL trees. Show that performing the rotation algorithm for the right-left case results in an AVL tree. Let the height of subtree B be h. We can deduce the heights of subtrees C, D and A are also h (based on the balance factors). After applying the two phases of the right-left case rotations, we obtain the following situation: Based on the heights of A, B, C and D, we can deduce the balance factors of x, y and z as all being 0. Since A, B, C and D are AVL trees, the balance factors of all nodes in the tree are in {−1, 0, 1}. So the tree is now an AVL tree. Problem7. Considerthefollowingpotentialhashfunctionsforhashingintegers.Rankthemintermsofquality and give a brief explanation. Assume that they are all taken mod m , where m is the table size. • Returnx mod2p forsomeprimep • Return(ax+b)forsomepositive,andrandomlychosen,a,b • Returnarandominteger Ranked from best to worst: • Return (a x + b ) for some positive a , b . This hash is ok. It could be improved by taking the result modulo a prime number. • Return x mod 2p for some prime p . Note that, if p is small, taking modulo powers of two simply keeps the bottom p bits of the integer, which is bad. For p sufficiently large, this is just the same as just taking the value of x as the output of our hash function, which is fine if the keys are random, but not so great if they are not. • Returnarandominteger.Thisdoesnothashthesamekeytothesameplaceconsistently,andsois not even a valid hash function. Problem 8. Consider the following probing schemes and for each of them, explain whether they do or do not suffer from primary or secondary clustering. In each problem, h returns an index where the item k is to be inserted when probing for the i t h time. h ′ , h1 , h2 are hash functions. (a) h(k,i)=(h′(k)+5i) modm (b) h(k,i)=(h′(k)+i32 ) modm (c) h(k,i)=(h1(k)×(h2(k)i) modm (d) h(k,i)=(h′(k)+2i+1) modm (e) h(k,i)=(h1(k)×h2(k)+i) modm (a) This is just linear probing, but in fixed steps of 5 instead of fixed steps of 1. This means it has both primary and secondary clustering. (b) This is almost quadratic probing, except instead of steps of size i2 we have steps of size i3/2. It will behave in a similar way to quadratic probing, namely that two keys having the same hash will have the same probe chain, since the step size does not depend on the key, but keys with different hashes will have non-overlapping chains, because the step size increases each step. So we do not have primary clustering, but we do have secondary. (c) This is almost double hashing, but multiplying by h2 (k )i instead of adding h2 (k ) ∗ i . It will display neither primary nor secondary clustering. (d) Thishashfunctionprobeswitheverincreasingstepsizes,inasimilarwaytoquadraticprobing,but the step sizes are increasing powers of a given number. Two items which have the same value for h′(k,0) will have the same probe chains, so this hash displays secondary clustering. It is possible for items with different values for h′(k,0) to have partially overlapping chains, for example when h′(k1)+2=3,h′(k2)+2=9. Theneverysecondpositionintheprobechainofk1 willoverlapwith an element from the chain from k2. This is not as bad as primary clustering (and will occur very rarely). (e) Ifwedefineanewhashfunction,h3(k,i)=h1(k)∗h2(k)+i,wenoticethatthisisjustlinearprobing. So it has both primary and secondary clustering. Problem9. Youaregivenasetofdistinctkeysx1,x2,...,xn. (a) DesignanalgorithmthatcreatesabinarysearchtreeofminimalheightcontainingthosekeysinO(nlog(n)) (b) ProvethatyouralgorithmproducesaBSTofheightatmostlog(n). (c) ProvethatO(nlog(n))isthefastestalgorithmforthisprobleminthecomparisonmodel (a) First we sort the keys, this takes O (n log(n )). We take the median element of the sequence and insert it into a BST, then recursively do the same thing with the resulting left and right sublists. In the below code, we assume we have a Node class which contains a key, a left child, and a right child. We must sort the list before calling OPTIMAL_BST(x[1..n]). 1: 2: 3: 4: 5: function OPTIMAL_BST(x[1..n]) if x is empty then return null mid = ⌊n/2⌋ root = Node(x [mid]) root.left = OPTIMAL_BST(x [1..mid-1]) 6: root.right = OPTIMAL_BST(x[mid+1..n]) 7: return root 8: endfunction (b) To prove that the height of the constructed tree is at most log(n), we write a recurrence for the height. Using telescoping, H (n ) = 1 + H 􏰊􏰜 n − 1 􏰝􏰋 ≤ 1 + H 􏰏 n 􏰐 22 H (n ) ≤ 1 + 􏰒1 + H 􏰏 n 􏰐􏰓 , 4 ≤ 2 + 􏰒1 + H 􏰏 n 􏰐􏰓 , 8 ≤ ..., ≤k+H􏰏n 􏰐. 2k We know that H (1) = 0, so setting k = log(n ), we obtain H(n)≤log(n)+H(1)=log(n) as required. (c) IfwecouldconstructanyBSTfromnkeysinfasterthanO(nlog(n))time,wecouldthendoanin- order traversal of the resulting BST to obtain the keys in sorted order. This means we could sort faster than O (n log(n )), but O (n log(n )) is a lower bound for comparison-based sorting, so such an algorithm is impossible. Problem10. Weknowthatwheninsertinganelementintoabinarysearchtree,thereisonlyonevalidplaceto put that item in the tree. Let’s prove this fact rigorously. Let T be a binary search tree and let x be an integer not contained in T . Prove that exactly one of the following statements is true: • The successor of x has no left child • The predecessor of x has no right child That is, prove that it can not be the case that both of these statements are true simultaneously, nor that both of these statements are false simultaneously. This implies that there is a unique insertion point for x since upon insertion x must either become the left child of its successor or the right child of its predecessor. Let the predecessor of x be p, and the successor be s. We know that p < x < s, and that there is no key k in the tree such that p < k < x or x < k < s. We claim that either s is an ancestor of p, or p is an ancestor of s . Suppose for contradiction that neither is an ancestor of the other, and therefore p and s have a common ancestor r . Then we know that p < r < s. But that means that either p < r < x < s or p < x < r < s. But by the definition of successor and predecessor, both of those are impossible. So we have a contradiction, meaning that either s is an ancestor of p (in which case s has a left child, since p will be in its left subtree), or p is an ancestor of s (in which case p has a right child, since s will be in its right subtree). So we know that at least one of the statements is false, which means that at most, one statement can be true. x ... ... x ps ps xbetweenpandr xbetweenrands We still need to prove that both statements cannot be false, i.e. that at least one statement is true. Suppose forcontradictionthatbotharefalse,i.e.thatshasaleftchildsl andphasasrightchildpr.Weknowthat sl and pr are between s and p in value. We also know that x is between s and p in value, and that nothing is between s and x , and nothing is between x and p . This is a contradiction, so both statements cannot be false, which means at least one is true. Since at least one is true, at most one is true, exactly one must be true. Supplementary Problems Problem11. Implementhashtablesthatusechaining,linearprobing,andquadraticprobingforcollisionreso- lution. You may reuse hashtable code from previous units if you have it. Compare these hashtables on randomly generated integers and compare their performance. Try adjusting the hash function, the table size, and the num- ber of keys inserted and see how this affects the results. Problem 12. In lectures we claimed that AVL trees are good because the balance property guarantees that the tree always has height O (log(n )). Let’s prove this. (a) Write a recurrence relation for n(h), the minimum number of nodes in an AVL tree of height h. [Hint: It should be related to the Fibonacci numbers] (b) FindanexactsolutiontothisrecurrencerelationshipintermsoftheFibonaccinumbers1.[Hint:Compare the sequence with the Fibonacci sequence, find a pattern, and then prove that your pattern is right using induction] (c) Prove using induction that the Fibonacci numbers satisfy F (n ) ≥ 1.5n −1 for all n ≥ 0 (d) Using(a),(b),and(c),provethatavalidAVLtreewithnelementshasheightatmostO(log(n)) 1 For this problem, index the Fibonacci numbers from zero with the base case F (0) = F (1) = 1. Consider an AVL tree of height h whose root node has the two subtrees L and R . Suppose that L and R have the same height. We could remove the nodes from the lowest level of one of the two subtrees, making them differ in height by one and it would still be a valid AVL tree with the same height. Therefore an AVL tree with the fewest possible nodes must have L and R differing in height by one, hence they must have heights h − 1 and h − 2. Lastly, note that L and R must also be AVL trees with the fewest possible nodes. Therefore the minimum number of nodes in an AVL tree satisfies the following recurrence relationship: 1 if h = 0, n(h)= 2 ifh=1, 1+n(h−1)+n(h−2) ifh>0.
This looks suspiciously similar to the Fibonacci sequence, so lets compare them:
h 012345678 F(h) 1 1 2 3 5 8 13 21 34 n(h) 1 2 4 7 12 20 33 54 88
This table strongly suggests the pattern
n (h ) = F (h + 2) − 1
Lets prove this by induction. We have n(0) = F (2)−1 = 1 and n(1) = F (3)−1 = 2 as required. Now suppose n (h ) = F (h + 2) − 1 for all h ≤ k for some value of k . We need to show that n (k + 1) = F (k + 3) − 1. From the recurrence, we have
n(k +1)=1+n(k)+n(k −1). Invoking the inductive hypothesis, we can write
n (k + 1) = 1 + F (k + 2) − 1 + F (k + 1) − 1, =F(k +2)+F(k +1)−1
By the definition of Fibonacci numbers, F (k + 2) + F (k + 1) = F (k + 3), hence we have n (k + 1) = F (k + 2) + F (k + 1) − 1,
= F (k + 3) − 1,
as required. Hence by strong induction on k , we have n (h ) = F (h + 2) − 1 for all h .
Next, we prove that the Fibonacci numbers have the desired bound. In the base case, we have F(0)=1≥1.5−1, F(1)=1≥1.50.
Now,supposethatF(n−1)≥1.5n−2 andF(n)≥1.5n−1. WeneedtoprovethatF(n+1)≥1.5n. Fromthe definition of Fibonacci numbers, we have
F (n + 1) = F (n ) + F (n − 1), using using the inductive hypothesis, we can write the bound
F (n + 1) ≥ 1.5n −1 + 1.5n −2 , =1.5×1.5n−2 +1.5n−2,
=(1.5+1)1.5n−2, ≥2.25×1.5n−2,
= (1.5)21.5n−2, =1.5n.
Therefore by induction on n, we have F (n) ≥ 1.5n−1.
Finally, combining the above, we have n(h) = F (h + 2) − 1 ≥ 1.5h+1 − 1. Rearranging, we find 1.5h+1 ≤
n(h)+1, and hence
h +1≤log1.5(n(h)+1) =⇒ h =O(log(n)),
Problem 13. (Advanced) Consider a hashtable implementing linear probing, with size m = 17 using the hash as required.
h(k)=(7k+11) modm.
Give a sequence of keys to insert into the table that will cause its worst-case behaviour.
We want a sequence of keys that will all hash to the same slot, which will cause linear probing to probe the entire cluster every time. Let’s design a set of keys that all hash to slot 0. We want
7k+11≡0 mod17. Adding 6 to both sides, this is equivalent to
7k≡6 mod17.
Now we want to divide out the 7 to get an expression for k . To do so, we need to multiply by some number x such that 7x ≡ 1 mod 17 (in other words, we need the modular multiplicative inverse of 7 mod 17). We can simply find this by trial and error. Looking at multiples of 7, we find
1×7=7≡7 mod17, 2×7=14≡14 mod17, 3×7=21≡4 mod17, 4×7=28≡11 mod17, 5×7=35≡1 mod17.
Therefore we find that 5 is the inverse we are looking for. We therefore can write 5×7k ≡5×6 mod17,
from which we get
Therefore, a worst-case sequence of keys would be
k =13+17i, i ≥0,
i.e. k = 13, 30, 47, 64, 81, ….
Problem 14. (Advanced) Suppose that we insert n keys into a hashtable with m slots using a totally random hash function. What is the expected number of pairs of colliding elements? The pairs (k , k ′ ) and (k ′ , k ) are considered the same and should not be counted twice.
k≡13 mod17.
First, we define the following.
􏰅1 ifh(ki)=h(kj), Ii , j = 0 otherwise.
We call this an indicator function on the condition h (ki ) = h (k j ). For each item ki , the number of elements it collides with is given by
i−1 #collisionswithki =􏰁Ii,j.
The total number of collisions is therefore #collisions=􏰁􏰁Ii,j
So, by linearity of expectation, the expected number of collisions is
n i−1 n i−1 
E 􏰁􏰁Ii,j =􏰁􏰁E􏰍Ii,j􏰎. i=1 j=1 i=1 j=1
We have by the definition of expectation E􏰍Ii,j􏰎=1×Pr􏰍h(ki)=h(kj)􏰎+0×Pr􏰍h(ki)̸=h(kj)􏰎=Pr􏰍h(ki)=h(kj)􏰎.
Since the hash is totally random,
E 􏰍 I i , j 􏰎 = P r 􏰍 h ( k i ) = h ( k j ) 􏰎 = m1 . Therefore the expected number of colliding pairs is
n i−1 1 􏰈n􏰉1 n(n−1) 􏰁􏰁= = .
n i−1 i=1 j=1
i=1j=1m 2m 2m
Problem 15. (Advanced) Recall the algorithm for insertion into a hashtable using Cuckoo hashing. Assume that m ≥ 2n and MaxLoop = O (log(n )). Given that the probability that an insertion triggers a rebuild is O (n −2 ), and insertions succeed in expected constant time when a rebuild is not triggered:
(a) Provethattheprobabilitythatasequenceofninsertionssucceedsis1−O􏰆1􏰇 n
(b) ProvethattheexpectedtimecomplexityofinsertionisO(1) Solution
(a) The probability that an insertion triggers a rebuild is O(n−2). The probability that any of the n insertions triggers a rebuild is
􏰈n􏰉 Pr 􏰑{insertion i triggers a rebuild} .
The events are not mutually exclusive, but the probability of their union is bounded above by the
sum of their individual probabilities. Formally, this is called the union bound, or Boole’s inequality.
􏰑 insertion i triggers a rebuild ≤ P 􏰆insertion i triggers a rebuild􏰇 , i=1 i=1
≤nO(n−2), = O 􏰊 1 􏰋.
So the probability that a rebuild is not triggered is 1 − O 􏰆 1 􏰇 as required. n
(b) Let X denote the time taken by an insertion. A successful insertion takes O (1), and an unsuccessful insertion takes O (MaxLoop) to try, plus the cost of n sub