
EECS4101M Solutions to Assignment 2 Winter 2021 York University Instructor: Andy Mirzaian
(35%) Lecture Slide 4, Exercise 5: Split and Join on 2-3-4 Trees.
We will describe the split and join operations for B-trees in general. A 2-3-4 tree is a B-tree with degree d = 2. The Join Operation:
We first describe a restricted version of Join, denoted RJoin ( A, K, B, hA, hB ), where A and B are B-trees, K is a key, and hA = height(A), hB = height(B). Furthermore, as a precondition, every key in A is less than K, which is less than every key in B. RJoin returns the tuple (C , hC), where C is the B-tree C = A∪{K}∪B and hC = height(C).
Let ∆h = hA − hB. Below we will see that RJoin can be done in time O(d(1 + |∆h|)). This is O(1 + |∆h|) for 2-3-4 trees. To do RJoin there are the following three cases:
Case 1: ∆h > 0.
Follow the rightmost path of A down to the node x such that depth(x) = ∆h. Let y = parent[x]. Recursively insert K as the largest key in y and make B the rightmost subtree of y. In case the root of B is degree deficient, we can use a local node fusion followed by a possible node split to resolve that issue. This local change is not necessary in 2-3-4 trees. The recursive insertion of K into node y can be done by a call to the insert procedure, and the ensuing full-node splits may be done either top-down or bottom-up. Now root[C] becomes the (new) root of A. hC = hA, unless the root splits, in which case hC = hA + 1. The number of nodes visited is proportional to the depth of node x in A, which is O(1 + |∆h|). The time complexity is as stated above.
Case 2: ∆h < 0. This is symmetric to Case 1. Follow the leftmost path of B down to the node x such that depth(x) = −∆h, and let y = parent[x]. Recursively insert key K as the new smallest key in y, with A as the new leftmost subtree of y. The remaining discussion is similar to Case 1. The time complexity is as stated above.

Case 3: ∆h = 0. Create a new node root[C] = x. Insert the key K in x and make A its first subtree and B its second subtree. In case either of the roots of A or B (now children of the new root) is degree deficient, we can do a local node fusion/split to resolve the issue. Set hC = 1 + hA, unless a node fusion but no node split took place at the level below x, in which case hC = hA. The time complexity is as stated above.

Now we describe the operation C ← Join(A, B) as stated in the problem statement. If B = ∅, then set C = A and we are done. Otherwise, first do a DeleteMin(B) to remove and return the minimum key K in B. Then compute (the new) hB = height(B) and hA = height(A) by following their left/right shoulder paths from root down to a leaf. So far, this process takes time O(d hB + hA). Now complete the Join by calling RJoin(A, K, B, hA, hB). This takes time O(d(1 + |∆h|)). So, in total, Join takes time O(d(hA + hB)) = O(d logd n), where n = |C|. This time complexity is O(log n) for 2-3-4 trees.

The Split Operation:
First follow the (complete) search path P from the root to a leaf for the largest key value ≤ K. Let the sequence of nodes on path P be denoted x1, x2, ..., xh, where root[A] = x1 and xh is a leaf at depth h = height(A) (assuming height includes the external-node level). In what follows, we will use the fact that hi = height(xi) = h − i + 1. The figure below illustrates the situation; B is the shaded region.

[Figure: the search path P = (x1, x2, ..., xh); the keys k′1, k′2, ..., k′m ≤ K on the nodes of P; and the corresponding subtrees T′0, T′1, ..., T′m hanging to the left of P.]

The sequence of keys (k′1, k′2, ..., k′m) is the concatenation of the keys ≤ K, in ascending order, that appear on the nodes of path P. That is, first take all keys ≤ K in x1 from smallest to largest, then follow them with all keys ≤ K in x2 from smallest to largest, and so on. (Note that some of these nodes may have no keys ≤ K.) The sequence of subtrees (T′0, T′1, ..., T′m) corresponds to the above key sequence as follows. Suppose k′t is in node xi on the search path P. Then T′t−1 is the subtree of xi pointed to by the child pointer just to the left of k′t in xi. Therefore, all keys in T′t−1 are less than k′t. Furthermore, if k′t is not the smallest key in xi, then k′t−1 is to the left of k′t in node xi, and hence k′t−1 is less than every key in T′t−1. On the other hand, if k′t is the smallest key in xi and k′t−1 exists, the latter must be in some higher node xq on P. In this case, the subtree of xq just to the right of k′t−1 contains T′t−1, and hence k′t−1 is less than every key in T′t−1. We conclude that every key in T′t−1 is less than k′t, which is less than every key in T′t, for t = 1..m.

We conclude that B is the union T′0 ∪ {k′1} ∪ T′1 ∪ ... ∪ T′m−1 ∪ {k′m} ∪ T′m. The keys in nodes on P and the subtrees that form C can be characterized symmetrically (instead of left of P, take right of P). We can compute B and C by the following sequence of RJoin operations.

Algorithm Split (A, K)
1.      B ← C ← ∅ ;  hB ← hC ← 0
2.      Let P = (x1, ..., xh) denote the search path for the largest key value ≤ K
3.      for i ← h downto 1 do        // going upwards, we are at node xi on path P
4.          for each key k′t ≤ K in xi, in decreasing order do
5.              Let T′t−1 denote the subtree of xi just to the left of k′t
6.              (B, hB) ← RJoin (T′t−1, k′t, B, hi+1, hB)
7.          for each key k′t > K in xi, in increasing order do
8.              Let T′t denote the subtree of xi just to the right of k′t
9.              (C, hC) ← RJoin (C, k′t, T′t, hC, hi+1)
10.     return (B, C)
end
Time Analysis: The for-loop of line 3 is iterated h = O(logd n) times, where n = |A|. The for-loops of lines 4 and 7 are each iterated O(d) times per iteration of the main loop of line 3. At lines 6 and 9, hi+1 = hi − 1 is the height of the subtree (denoted T′t or T′t−1) that is RJoined with the current B or C. Each such RJoin takes time O(d(1 + |∆hi|)), where ∆hi = hi − hB ≥ 0 at line 6, and ∆hi = hC − hi ≤ 0 at line 9. The new hB at line 6, and the new hC at line 9, is hi (possibly ± 1). Since the heights hi = h − i + 1 decrease by exactly one per iteration of the main loop, these bounds telescope, and adding up the times over all iterations we get the total time

O(d Σi=1..h (1 + hi − hi+1)) = O(d h) = O(d logd n).

This is O(log n) for 2-3-4 trees.
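The three RJoin cases above can be made concrete. Below is a minimal Python sketch for 2-3-4 trees (B-trees with d = 2), assuming a simple dict representation — {"keys": [...], "kids": [...]}, with leaves holding empty "kids" — and assuming A and B are valid, nonempty trees. The names rjoin, fix_up, and inorder are ours, not from the solution.

```python
def rjoin(A, K, B, hA, hB):
    """Join A, {K}, B where every key of A < K < every key of B.
    Returns (C, hC). Overfull nodes (4 keys) are split bottom-up."""
    if hA == hB:                                  # Case 3: new root holds K
        return {"keys": [K], "kids": [A, B]}, hA + 1
    if hA > hB:                                   # Case 1: rightmost path of A
        path = [A]
        for _ in range(hA - hB - 1):
            path.append(path[-1]["kids"][-1])
        y = path[-1]                              # height(y) = hB + 1
        y["keys"].append(K)
        y["kids"].append(B)
        return fix_up(path, A, hA)
    path = [B]                                    # Case 2: leftmost path of B
    for _ in range(hB - hA - 1):
        path.append(path[-1]["kids"][0])
    y = path[-1]
    y["keys"].insert(0, K)
    y["kids"].insert(0, A)
    return fix_up(path, B, hB)

def fix_up(path, root, h):
    # Split any overfull node (4 keys) and push its middle key up the path.
    for i in range(len(path) - 1, -1, -1):
        x = path[i]
        if len(x["keys"]) <= 3:
            break
        left = {"keys": x["keys"][:2], "kids": x["kids"][:3]}
        right = {"keys": x["keys"][3:], "kids": x["kids"][3:]}
        up = x["keys"][2]
        if i == 0:                                # root split: height grows
            return {"keys": [up], "kids": [left, right]}, h + 1
        p = path[i - 1]
        j = p["kids"].index(x)
        p["kids"][j:j + 1] = [left, right]
        p["keys"].insert(j, up)
    return root, h

def inorder(t):
    # sorted key sequence of a 2-3-4 tree, for checking the result
    if not t["kids"]:
        return list(t["keys"])
    out = []
    for i, k in enumerate(t["keys"]):
        out += inorder(t["kids"][i]) + [k]
    return out + inorder(t["kids"][-1])
```

For example, rjoin of a height-1 tree over keys {1, ..., 7}, the key 8, and the leaf {9, 10} splits the overfull root and returns a height-2 tree whose inorder sequence is 1, ..., 10.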
(33%) Lecture Slide 5, Exercise 7: RB-Balanced BST.
Let bh(x) denote the black height of node x in T. Every path from x down to a descendant external node has 1 + bh(x) black nodes (including the external node). Since each red node has a black parent, the number of red nodes on such a path can be anywhere from 0 to bh(x) if x is black, or from 1 to 1 + bh(x) if x is red. Hence, the length of any path from x down to a descendant external node is at least bh(x) and at most 2bh(x) if x is black, and at least 1 + bh(x) and at most 1 + 2bh(x) if x is red. So, T is RB-balanced.
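The bound above also gives a direct way to test the RB-balance condition h(x) ≤ 2 dist(x) at every node. A small Python sketch (binary nodes as dicts with None for external nodes; the helper names are ours):

```python
def height(x):
    # length of the longest path from x down to an external node
    if x is None:
        return 0
    return 1 + max(height(x["left"]), height(x["right"]))

def dist(x):
    # length of the shortest path from x down to an external node
    if x is None:
        return 0
    return 1 + min(dist(x["left"]), dist(x["right"]))

def is_rb_balanced(x):
    # h(x) <= 2*dist(x) must hold at every node, not just the root
    if x is None:
        return True
    return (height(x) <= 2 * dist(x)
            and is_rb_balanced(x["left"])
            and is_rb_balanced(x["right"]))
```

A left chain of three nodes fails the test at its root (height 3, dist 1), while any completely balanced tree passes.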
Below we give an O(n) time algorithm that colours the nodes of any n-node RB-balanced BST T so that T becomes a Red-Black tree. Besides the additional colour bit per node, we also need a dist field for intermediate computations, where dist[x] is the length of the shortest path from x down to a descendant external node (dist[nil] = 0). If this field is not provided in the nodes of T, we can first make a copy of T into an otherwise identical tree T′ in which each node does have the additional dist field. This can be done by a simple linear-time traversal. After we have coloured T′ properly, with another linear-time traversal we can copy the node colours of T′ into the corresponding nodes of T. So, wlog we assume the nodes of T have the additional space for the dist field.
Algorithm Turn-to-RB-Tree (T)
1.      Tree-Black-Height ← SPL (root[T])
2.      Paint (root[T], Tree-Black-Height, red)
end {Turn-to-RB-Tree}

function SPL (x)        // Shortest Path Length to descendant external node
3.      if x = nil then return 0
4.      else do
5.          dist[x] ← 1 + min ( SPL(left[x]), SPL(right[x]) )
6.          return dist[x]
7.      end else
end {SPL}

procedure Paint (x, Black-Height, Parent-Colour)
8.      if x = nil then return
9.      if Parent-Colour = black and dist[x] > Black-Height
10.         then colour[x] ← red
11.     else do
12.         colour[x] ← black
13.         Black-Height ← Black-Height − 1
14.     end else
15.     Paint (left[x], Black-Height, colour[x])
16.     Paint (right[x], Black-Height, colour[x])
end {Paint}
The idea behind the algorithm is as follows. In the first step, we compute and store the value dist[x] for each node x of T by the SPL function. This is a post-order traversal and takes O(n) time. We colour the nodes in such a way that the shortest path from root[T] down to a descendant external node will all be coloured black. This would be the black height of the tree (indicated as Tree-Black-Height in the algorithm). The black depth of the external node at the end of that shortest path will be equal to Tree-Black-Height. To ensure that all other external nodes have the same black depth, we “pull up” the longer paths in pre-order (as early as possible) by making some nodes red along such paths. Procedure Paint achieves that. The third parameter of Paint is to ensure that no child of a red node is also coloured red. The second parameter of Paint is used to decide whether all the paths from the current node x to descendant external nodes are too long. If so, and if parent of x is not red, then we colour x red in order to shorten the black length of these paths. Below we will show that the RB-balance property will suffice to successfully complete the colouring.
Time Analysis: Both SPL and Paint are pre/post order traversals and spend O(1) time per node. Hence, the entire algorithm takes O(n) time.
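The pseudocode above can be transcribed directly into runnable form. The Python sketch below (our node layout: dicts with left, right, dist, colour fields) implements SPL and Paint, plus a checker for the two red-black properties — no red node with a red parent, and equal black depth for all external nodes:

```python
RED, BLACK = 0, 1          # numeric colours, as in the correctness proof

def spl(x):
    # post-order: dist[x] = shortest path length to a descendant external node
    if x is None:
        return 0
    x["dist"] = 1 + min(spl(x["left"]), spl(x["right"]))
    return x["dist"]

def paint(x, black_height, parent_colour):
    # pre-order: pull long paths up by colouring nodes red where allowed
    if x is None:
        return
    if parent_colour == BLACK and x["dist"] > black_height:
        x["colour"] = RED
    else:
        x["colour"] = BLACK
        black_height -= 1
    paint(x["left"], black_height, x["colour"])
    paint(x["right"], black_height, x["colour"])

def turn_to_rb_tree(root):
    paint(root, spl(root), RED)

def check_rb(x, parent_colour):
    # returns the black height of x's subtree; fails on any RB violation
    if x is None:
        return 0
    assert not (parent_colour == RED and x["colour"] == RED)
    bl = check_rb(x["left"], x["colour"])
    br = check_rb(x["right"], x["colour"])
    assert bl == br
    return bl + x["colour"]
```

Running turn_to_rb_tree on an RB-balanced tree colours its root black, and check_rb then returns the tree's black height.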
Correctness: First let us set up some (shorthand) notation. We let h(x), bh(x), and c(x) denote, respectively, the height, black height, and colour of x. To refer to the parameters in the above algorithms, we will use the shorthand notations TBH = Tree-Black-Height, BH = Black-Height, PC = Parent-Colour. In what follows, we will also assume a 0/1 numeric value for red/black colours, where red = 0 and black = 1. The RB-balance property says that for every node x in T we have h(x) ≤ 2dist(x), where we assume h(nil) = dist(nil) = 0.

With a simple induction we can show that SPL correctly computes dist(x) for every node x, and that TBH = dist(root(T)). The first call to Paint is Paint(root(T), TBH, red). To prove the correctness of the algorithm, we will show that any call to Paint(x, BH, PC) has the following pre/post-conditions:
Pre-Condition: BH ≤ dist(x) ≤ h(x) ≤ 2 BH + PC.
Post-Condition:
(i) c(x) + PC > 0 (i.e., not both x and its parent are red),
(ii) The subtree rooted at x is a Red-Black tree of black height bh(x) = BH, except that x may be red.
We will prove that the pre/post-conditions hold by induction. The first call is Paint(x, TBH, red) with x = root(T). So, BH = TBH and PC = red = 0. From the correctness of SPL, we have TBH = dist(x), thus BH = dist(x). Furthermore, by the RB-balance property, we have h(x) ≤ 2dist(x) = 2BH ≤ 2BH + PC. Therefore, the pre-condition holds. Now we want to prove that when this recursive call returns, its post-condition is satisfied. For this, we look at the details of procedure Paint(x, BH, PC).
At line 8, if x = nil, then dist(x) = 0. This and the pre-condition imply that BH = 0 (since BH ≤ dist(x) ≤ 2BH + PC ≤ 2BH + 1, and BH is an integer). Thus, BH = dist(x) = 0, and we may assume c(x) = black. This satisfies the post-conditions; x is the black root of an RB-tree with black height bh(x) = BH = 0.
If at line 8 x ≠ nil, then lines 9-16 will be executed. Lines 9-12 establish post-condition (i). When we get to line 14, the value of Black-Height has become BH − c(x). (Note that c(x) is either red = 0 or black = 1.)
Let y = left[x] and z = right[x]. We have h(x) = 1 + max ( h(y) , h(z)), and dist(x) = 1 + min ( dist(y) , dist(z)). Therefore, dist(x) − 1 ≤ dist(y) ≤ h(y) ≤ h(x) − 1, and dist(x) − 1 ≤ dist(z) ≤ h(z) ≤ h(x) − 1.
At lines 15 and 16 we make the recursive calls Paint(y, BH′, c(x)) and Paint(z, BH′, c(x)), where BH′ = BH − c(x) is the value of Black-Height after line 14. We will show that at line 15 the pre-condition for Paint(y, BH′, c(x)), i.e., BH′ ≤ dist(y) ≤ h(y) ≤ 2BH′ + c(x), is satisfied. The pre-condition for x says BH ≤ dist(x) ≤ h(x) ≤ 2BH + PC. We consider two cases:

Case 1: [dist(x) = BH] In this case the condition on line 9 fails. Hence c(x) = black = 1 and BH′ = BH − 1. By the RB-balance property, h(x) ≤ 2dist(x) = 2BH. Thus, h(y) ≤ h(x) − 1 ≤ 2BH − 1 = 2BH′ + 1 = 2BH′ + c(x). Also, dist(y) ≥ dist(x) − 1 = BH − 1 = BH′.

Case 2: [dist(x) > BH] In this case we must have PC + c(x) = 1 (i.e., exactly one of x and its parent is black), and BH′ = BH − c(x). Thus, h(y) ≤ h(x) − 1 ≤ 2BH + PC − 1 = 2BH − c(x) = 2BH′ + c(x). Also, dist(y) ≥ dist(x) − 1 ≥ BH + 1 − 1 = BH ≥ BH′.

Therefore, the pre-condition BH′ ≤ dist(y) ≤ h(y) ≤ 2BH′ + c(x) holds for y. (A similar proof holds for z.)
Since the pre-conditions hold for both recursive calls at lines 15 and 16, by the induction hypothesis we may assume that their post-conditions hold after these recursive calls return. These completely establish the post-condition for node x itself. Condition (i) holds by the proper colouring between x and its parent at lines 9-12. Post-conditions (i)-(ii) for its children then imply that node x, possibly red, is the root of an RB-tree with black height bh(x) = c(x) + bh(y) = c(x) + bh(z) = c(x) + (BH − c(x)) = BH. Hence, post-condition (ii) for x is also established.
Therefore, by induction, the post-condition for the first recursive call Paint(root(T) , TBH , red) is met, which implies that T is a valid RB-tree with black root and black height TBH.
(32%) Lecture Slide 6, Exercise 8: Energy Balanced BST.
(a) Suppose we want to rebuild the subtree Tx rooted at node x in the EB-BST. The number of elements in Tx is s = w[x]. We do the rebuilding in two stages. In stage one, we allocate an array A[1..s] and, by an inorder traversal of Tx, fill the elements of Tx into A[1..s]; the keys of Tx now appear in increasing order in the array. In stage two, we convert the array into a completely balanced BST T′ (with restored fields w and p). This is done as follows. We pick the middle element of A[1..s], at index m = (1 + s)/2, to form the root of T′. The subarray A[1..m−1] is recursively converted into the left subtree of T′, and the remaining subarray A[m+1..s] is recursively converted into the right subtree of T′. The algorithms appear below. Stage one is an inorder traversal; stage two is a postorder traversal. Both take linear time in the size of the subtree being rebuilt.

Algorithm Convert (x)                      // returns the root of the rebalanced subtree Tx
        s ← 0                              // global variable
        BST2Array (x)                      // fills array A[1..s], s = w[x]
        (r, weight) ← Array2BST (1, s)     // converts A[1..s] into the completely balanced BST T′
                                           // and returns a pointer to the root of T′ and the root weight
        return r
end

procedure BST2Array (x)                    // inorder traversal fills the array
        if x = nil then return
        BST2Array (left[x])
        s ← s + 1 ;  A[s] ← key[x]
        BST2Array (right[x])
end

procedure Array2BST (i, j)                 // builds the completely balanced subtree corresponding to A[i..j]
        if i > j then return (nil, 0)
        m ← (i + j)/2
        (LeftChild, LeftWeight) ← Array2BST (i, m − 1)
        (RightChild, RightWeight) ← Array2BST (m + 1, j)
        r ← a new node
        key[r] ← A[m]
        left[r] ← LeftChild
        right[r] ← RightChild
        w[r] ← 1 + LeftWeight + RightWeight
        p[r] ← 0
        return (r, w[r])
end
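The two stages translate almost verbatim into Python. In this sketch (our names bst_to_array, array_to_bst, convert; nodes are dicts with key, left, right, w, p fields), the array index arithmetic is 0-based:

```python
def bst_to_array(x, A):
    # stage one: inorder traversal fills A with the keys in sorted order
    if x is None:
        return
    bst_to_array(x["left"], A)
    A.append(x["key"])
    bst_to_array(x["right"], A)

def array_to_bst(A, i, j):
    # stage two: the middle element becomes the root; recurse on the halves.
    # Returns (root, weight) so that the w field can be restored bottom-up.
    if i > j:
        return None, 0
    m = (i + j) // 2
    left, lw = array_to_bst(A, i, m - 1)
    right, rw = array_to_bst(A, m + 1, j)
    r = {"key": A[m], "left": left, "right": right, "w": 1 + lw + rw, "p": 0}
    return r, r["w"]

def convert(x):
    # returns the root of the rebalanced (completely balanced) subtree
    A = []
    bst_to_array(x, A)
    root, _ = array_to_bst(A, 0, len(A) - 1)
    return root
```

Converting a right-spine chain of 7 keys yields a completely balanced tree of height 3 with the same inorder sequence.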
(b) Consider an arbitrary node z and let x and y denote its current children (possibly nil). We claim that (1), (2), and (3) below are invariants; explanations follow.
p[z] + 1 ≥ |w[y] − w[x]|      (1)
2p[z] ≤ w[z] − 1 (2)
w[z] = 1 + w[x] + w[y] (3)
Explanation of (1): (1) clearly holds initially and right after a rebuilding is triggered at z or at any of its proper ancestors: in that case p[z] = 0, and by the complete-balance property the right-hand side is at most 1. Furthermore, each time z is on the search path of a dictionary operation, the left-hand side increases by 1, while the right-hand side can increase by at most 1 (since only one of w[x] or w[y] changes, by exactly one). So (1) remains invariant.
Explanation of (2): As long as rebuilding is not triggered at z or at any of its ancestors, we know p[z] < w[z]/2, which implies (2). If rebuilding is triggered at z or at any of its proper ancestors, then p[z] is reset to 0 while w[z] ≥ 1. So (2) remains invariant.

Explanation of (3): This is obvious.

Combining (1)-(3), we get the desired inequality as follows:
w[x] + w[y] = w[z] − 1 ≥ 2p[z] ≥ 2|w[y] − w[x]| − 2 ≥ 2w[y] − 2w[x] − 2.
That is, w[x] + w[y] ≥ 2w[y] − 2w[x] − 2, or equivalently, w[y] ≤ 3w[x] + 2.

(c) Let z be a node with children x and y (possibly nil). Without loss of generality assume w[y] ≥ w[x]. Then, from (b), we have w[x] + 1 ≤ w[y] + 1 ≤ 3(w[x] + 1). So,
w[z] + 1 = (w[x] + 1) + (w[y] + 1) ≥ (w[y] + 1)/3 + (w[y] + 1) = (4/3)(w[y] + 1).
That is, w[z] + 1 is at least 4/3 times the larger of w[x] + 1 and w[y] + 1.

Now let h denote the height of an EB-BST of size n, and let P denote a longest path (of length h) from the root to a leaf. At the root, weight plus one is n + 1. With each step we take along path P, weight plus one drops by a factor of at least 4/3, while remaining at least 2. We conclude that h ≤ log4/3 ((n + 1)/2) = O(log n).

(d) We know from part (a) that rebuilding takes time linear in the size of the subtree being rebuilt. To be concrete, suppose that rebuilding a subtree of size s takes time at most cs, for some positive constant c. Define the potential function of an EB-BST T as

Φ(T) = 2c Σx∈T p[x].      (4)

This is a regular potential. The amortized cost of a dictionary operation is now as follows:

Case 1) No rebuilding takes place: Since, from part (c), the search path is O(log n) long, the actual cost is O(log n). Furthermore, p[x] is increased by one for each node x along this path, so Φ(T) increases by O(log n). We conclude that the amortized time is the actual time plus ∆Φ, which is O(log n) + O(log n) = O(log n).

Case 2) Rebuilding is triggered at some node x: We need to add to Case 1 the amortized time due to the rebuilding. The rebuilding was triggered by the fact that p[x] ≥ w[x]/2, that is, 2c p[x] ≥ c w[x]. Since after the rebuilding p[x] (and the p field of every descendant of x) is reset to zero, the drop in the potential function is at least 2c p[x], which is at least c w[x]. But the latter is an upper bound on the cost of the rebuilding. In other words, the drop in the potential function pays for the rebuilding cost, so the amortized cost of the rebuilding is at most 0. Hence, the amortized cost of the dictionary operation (including the rebuilding) is still O(log n).
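To see the whole scheme in action, here is a small Python simulation — our own construction, not part of the solution text. Each insertion increments w and p along the search path, and the highest node x on the path with p[x] ≥ w[x]/2 is rebuilt into a completely balanced subtree with its p fields reset, as in part (a). After n worst-case (sorted) insertions, the height stays within the part-(c) bound log4/3((n + 1)/2), plus one for the final external step under the height convention used here.

```python
def rebuild(x):
    # flatten to a sorted array, then rebuild completely balanced (p reset to 0)
    A = []
    def flat(t):
        if t:
            flat(t["left"]); A.append(t["key"]); flat(t["right"])
    flat(x)
    def build(i, j):
        if i > j:
            return None, 0
        m = (i + j) // 2
        l, lw = build(i, m - 1)
        r, rw = build(m + 1, j)
        return {"key": A[m], "left": l, "right": r,
                "w": 1 + lw + rw, "p": 0}, 1 + lw + rw
    return build(0, len(A) - 1)[0]

def insert(root, k):
    if root is None:
        return {"key": k, "left": None, "right": None, "w": 1, "p": 0}
    path, x = [], root
    while x is not None:                    # search path: bump weight and penalty
        path.append(x)
        x["w"] += 1
        x["p"] += 1
        x = x["left"] if k < x["key"] else x["right"]
    leaf = {"key": k, "left": None, "right": None, "w": 1, "p": 0}
    y = path[-1]
    if k < y["key"]:
        y["left"] = leaf
    else:
        y["right"] = leaf
    for i, x in enumerate(path):            # rebuild at the *highest* offender
        if 2 * x["p"] >= x["w"]:
            t = rebuild(x)
            if i == 0:
                return t
            par = path[i - 1]
            if par["left"] is x:
                par["left"] = t
            else:
                par["right"] = t
            return root
    return root

def height(x):
    return 0 if x is None else 1 + max(height(x["left"]), height(x["right"]))
```

With n = 200 sorted insertions, the bound log4/3(100.5) ≈ 16 is comfortably respected, and the inorder sequence stays sorted throughout.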