代写代考 Advanced algorithms and data

Advanced algorithms and data
structures
Week 4 lecture notes by
Based on lecture slides by

Copyright By PowCoder代写 加微信 powcoder

Faculty of Information Technology August 18, 2021

1 Disjoint-set data structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1 Union by height without path compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Union by rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1. DISJOINT-SET DATA STRUCTURES
1 Disjoint-set data structures
Owing to the simplicity of the implementation of disjoint-set data structures we will omit the full discussion for now as the the week 4 lecture contains adequate detail on the assessable sections. However, we would be remiss to not at least introduce the concepts behind the full analysis of this data structure. In particular, we will cover the amortised analysis of union by height (rank) with path compression. Before we continue please note that observations 1 − 3, lemma 2 and the amortised analysis (section 1.2) is not examinable in S2 2021.
1.1 Union by height without path compression
To begin let’s recall the (examinable) analysis of union by height without path compression. In this case we can imagine assigning a rank to each element x that is synonymous with the height of the subtree rooted at x. This is because without path compression the longest path from x to a leaf is never shortened – in fact no path is shortened – and if we define the rank of x to be the height of the subtree rooted at x initially, then it remains so for all time. Note that the rank of all the roots in our set would be stored in the parent array – more correctly we’d store −(rank + 1) – that represents our set.
Example 1: The rank of elements without path compression
The disjoint-set data structure below was constructed by the following sequence of unions: union(1,2), union(3,4), union(1,3), union(5,6), union(7,8), union(5,7), and union(1,5), performed on a collection of 8 initially disjoint elements {1,2,3,4,5,6,7,8} using union by height. In this tree the ranks of each node are equal to the height of the subtree rooted at the node as shown below.
x 12345678 rank(x) 3 0 1 0 2 0 1 0
Question 1: Union by height example
Starting with a set of 8 disjoint elements, perform the unions listed in example 1.
We can leverage this connection between the rank of x and the height of the subtree rooted at x to intuit the observations and lemmas below which will be central to the analysis presented in section 1.2. 1
1Later we will see that it is the notion of a rank that is critical to the analysis and not the height specifically. 2

1. DISJOINT-SET DATA STRUCTURES
Union by height without path compression observation 1
For any element x, that is not the root of a tree, we have,
rank[x] < rank[parent[x]]. If we return to example 1 we can see that, rank[2] < rank[1], rank[3] < rank[1], rank[5] < rank[1], rank[3] < rank[4], and so on. To prove that this is true for all elements that are not roots (which don’t have a parent) simply note that the rank of the parent y of an element x must be at least rank[x]+1 as x is the child of y. Union by height without path compression observation 2 For any element x, that is not the root of a tree, the rank of x does not change under further union operations. To see this we once again consider example 1. Figure 1 shows the state of the disjoint-set data structure after the first two union operations; union(1,2), and union(3,4). Let’s focus on the rank of element 3, which at this point is 1. After the next union - union(1,3) (see figure 2) - the rank of 3 remains unchanged as it was added as a child to the tree rooted at 1. From this point onwards the rank of 3 is never altered as every union operation either: 1. involves two trees that don’t include 3 or, 2. involves the tree that includes 3 and: (a) a new tree is made the child of the root of the tree that contains 3, or (b) the tree that contains 3 becomes the subtree of another tree. In all the cases above the subtree rooted at 3 is never altered and consequentially its rank is never changed. The only time the subtree rooted at x can be altered by a union by height without path compression is if x is the root of a tree. Furthermore, the rank (height) of a root is only altered if it is unioned with a tree of the same rank (height) and is made the root of the resulting tree. Figure 1: The disjoint-set data structure after union(1,2) and union(3,4). 1. DISJOINT-SET DATA STRUCTURES Figure 2: The disjoint-set data structure after union(1,2), union(3,4), and union(1,3) Union by height without path compression observation 3 For any path from an element x to the root of its tree, the ranks of the elements along that path form a strictly increasing sequence. Recalling that each element only maintains a parent pointer, any node (except the first) in our path will be the parent of the previous node in the path. Therefore by observation 1, the sequence of ranks for this path must be strictly increasing. For instance, consider the path 8 → 7 → 5 → 1 in example 1, which defines the following sequence of ranks, 0 → 1 → 2 → 3. 1. DISJOINT-SET DATA STRUCTURES Lemma 1: Size of subtrees Any element x with rank (height) k has at least 2k elements in its subtree. Proof: We actually proved this in the lecture, but we shall reproduce the proof here for the sake of complete- Let P(k) be the statement that any element x of rank k has 2k elements in its subtree. We will prove this statement by induction on k. Base case: When k = 0 the tree rooted at x is just a singleton node, i.e. x itself. In this case the number of nodes in the treeis20 =1andsoP(0)istrue. Inductive hypothesis: Assume P(n) is true for n > 0, i.e. assume that any element x with rank n has at least 2n elements in its subtree.
Inductive step:
Now consider P (n + 1). In order to create a tree of rank (height) n + 1 we must union two trees T1 and T2 of rank n. The size of the resulting tree (Tres) is given by,
size(Tres) = size(T1) + size(T2).
By our inductive hypothesis we know that size(T1) ≥ 2n and size(T2) ≥ 2n, and so we must have,
size(Tres) = size(T1) + size(T2) ≥ 2n + 2n
size(Tres) ≥ 2n+1,
Therefore, by the principle of mathematical induction, P (k) is true for all k ≥ 0. 􏰣
We conclude that,
and so P (n + 1) is true.
Corollary 1: A rank bound
From observation 1 we know that the element with the largest rank must be the root of one of the trees in the disjoint-set data structure. In addition, lemma 1 tells us that a root x must have at least 2k nodes in its tree, where k = rank[x]. If we have N elements in our disjoint-set then we must have,
size(x) ≤ N,
with equality occurring when all the elements belong to the same tree which is rooted at x. Then by lemma
1 we have,
N ≥ size(x) ≥ 2k, log2(N) ≥ k.
That is, the rank (height) of any element in our disjoint-set data structure, when using union by height without path compression, is bounded by log2(N).
While the bound given by corollary 1 is important, this is not all we gain from lemma 1 and we can make a further statement regarding the possible ranks that can exist in our set.

1. DISJOINT-SET DATA STRUCTURES
Lemma 2: The rank lemma
For any rank k ≥ 0, there are at most N elements with rank k in a disjoint-set data structure of size N. 2k
From lemma 1 we know that each element in the set with rank k has at least 2k elements in its subtree. In addition, each element in the set has exactly one parent (unless the element is the root of a tree) and from observation 1 we know that rank(x) < rank(parent(x)). From this it follows that two elements of the same rank - that are not roots - cannot have common descendants. Moreover, if one or both of the elements are roots then they still cannot have common descendants and so in general two elements of the same rank have no common descendants. Given the above, we conclude that any node of rank k accounts for at least 2k elements and that none of these elements are common to the subtree of another node of rank k. Thus, there can be at most N elements 2k of rank k. 􏰣 Question 2: The rank lemma The proof of the rank lemma (lemma 2) relies on the fact that no two elements of the same rank have common descendants and before continuing you should convince yourself of this. In particular, concentrate on why the fact that rank(x) < rank(parent(x)) (observation 1) implies this. 1. DISJOINT-SET DATA STRUCTURES 1.2 Union by rank Having established some basic properties regrading union by height without path compression, we are now ready to analyse union by height with path compression, which we’ll refer to as union by rank hereafter. To begin we first acknowledge that in the presence of path compression the our rank is no longer defined to be strictly equal to the height. Instead the rank of an element x is an upper bound on the height of the subtree rooted at x. Example 2: The rank of elements with path compression Consider the state of the disjoint-set data structure at the end of example 1 shown below. x 12345678 rank(x) 3 0 1 0 2 0 1 0 Now imagine that our next operation is find(8) and that we are using path compression. This find operation would traverse the path 8 → 7 → 5 → 1, and having reached 1 - the root - will update the parent pointer of every node (except 1) so that it points directly to 1, as shown below. However, this operation doesn’t alter the ranks of the elements along this path. In particular, the rank of 1 remains unchanged so that rank(1) = 3, even though the height of the tree rooted at 1 is no longer 3; it is now 2. Since path compression can only ever decrease (we are compressing, or shortening paths) the height of the subtree that is rooted at an element x, the rank of x in the presence of path compression will always be such that, rank(x) ≥ height(x). x 12345678 rank(x) 3 0 1 0 2 0 1 0 Example 2 illustrates how path compression can reduce the height of the trees in our disjoint-set data structure, but leaves the rank of every element unchanged. The only thing path compression changes about the rank of an element is its connection to the height of the subtree rooted at that element. However, it is important to note that the negative entries in our parent array now represent the rank of the root of the corresponding tree rather than its height. In fact, one reason we have changed the definition of our rank is to avoid having to update the ranks of root elements during path compression. Fortunately, none of our preceding arguments referred to the height of the subtree rooted at a particular element 1. DISJOINT-SET DATA STRUCTURES specifically, they only made reference to the rank of the element whose value is unchanged2. Thus, it is easy to show that observations 1 − 3, lemma 1, and lemma 2 all still hold in the presence of path compression. Question 3: Properties of union by rank Return to the proofs for observations 1 − 3, lemma 1, and lemma 2 and convince yourself that the results are unchanged in the presence of path compression. You should focus on how the arguments are built around the rank of an element and note that this is unchanged by path compression. 2Of course our definition of the rank has changed, but the purpose of this was to ensure that the newly defined ranks function precisely as they did before we introduced path compression. 1. DISJOINT-SET DATA STRUCTURES To complete our preparation for the amortised analysis of union by rank, we need to define two further concepts used in the proof. Definition 1: The iterated logarithm function We define the iterated logarithm function (log∗) as, 􏰦0 if n ≤ 1, log∗(n) = 1 + log∗(log(n)) otherwise. Intuitively log∗(n) is the number of times we must iteratively apply the logarithm function (assumed to be to the base 2) to n before the result is less than or equal to 1. For instance, log∗(16) = 1 + log∗(log(16)) = 1 + log∗(4) = 2 + log∗(log(4)) = 2 + log∗(2) = 3 + log∗(log(2)) = 3 + log∗(1) as we require three successive applications of the logarithm function before our result is less than or equal to 1. Similarly, we have, n=1, log∗(1)=0 n=2, log∗(2)=1 n∈[3,4], log∗(n)=2 n∈[5,16], log∗(n)=3 n ∈ [17, 65536], log∗ (n) = 4 n ∈ [65537, 265536], log∗(n) = 5, and we can see that log∗ is an incredibly slowly growing function. To put this positively glacial growth into perspective, the number of atoms in the universe is estimated to be < 2270. Definition 2: Rank blocks We can divide the ranks of our elements up Block Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 into the following rank blocks: Ranks encapsulated {1} {3, 4} {5,6,...16} {17,18,...216} {216 +1,216 +2,...265536} We can see that block n contains all ranks x such that, . log∗(x) = n − 1. 1. DISJOINT-SET DATA STRUCTURES The ranks in block n ≥ 2 are [xn−1 + 1, 2xn−1 ], where xn−1 is that largest rank in block n − 1 and we have x1 = 1. If we have N elements, then by lemma 1 we know that for any element x we have rank(x) ≤ log(N). In particular, the largest possible rank we could have in our set is log(N) and so there can only be log∗(N) rank blocks for this set. For instance consider N = 216, the largest possible rank in a disjoint-set with this many elements (when using union by rank) is log(216) = 16, and so we would need rank blocks 1, 2, 3, and 4 in order to encapsulate all possible ranks in this set. That is, we would need the first log∗(216) = 4 rank blocks. Finally, we are ready to perform the amortised analysis of union by rank and to do so we consider the following theorem. 1. DISJOINT-SET DATA STRUCTURES Theorem 1: Hopcraft-Ullman’s Theorem Let S be a sequence of m union/find operations, implemented using union by rank, applied to an initially dis- joint set containing n elements where n ≤ m. The complexity of this sequence of m operations is O(mlog∗(n)). We’ll use the aggregate method to determine the total cost of S. Let F be the sequence of find opera- tions performed in S and let TF denote the total cost of these finds. Similarly, let T(m) denote the total cost of our sequence S and note that, T(m)=TF +O(m), as the cost of all the finds in S is TF and there are at most m unions in S, which each contribute O(1) work in addition to the work required by two find operations already accounted for in TF . To derive an informative bound on T (m) we must now bound TF . To begin, recall that the complexity of each find operation is bounded by the number of parent pointers traversed, and if we divide the ranks of the elements in the disjoint-set into rank blocks two cases arise: • The the parent of an element x is in a higher rank block than x, i.e. rank(parent(x)) is in a higher block than rank(x). • Rank(parent(x)) and rank(x) are in the same rank block. Note that as we have rank(parent(x)) > rank(x) by observation 1 these are the only possibilities. Although the previous division accounts for all possibilities it is helpful to separate those elements whose parent is the root of a tree in our set. This defines three cases that we might encounter when traversing from an element x to its parent,
• Case 1: The parent of x is the root.
• Case 2: The the parent of x is not the root, but is in a higher rank block than x, i.e. rank(parent(x)) is
in a higher block than rank(x).
• Case 3: The parent of x is not the root and rank(parent(x)) and rank(x) are in the same rank block.
Let T1, T2, and T3 denote the total cost of the case 1, 2, and 3 traversals in F respectively. Since these are the only traversals possible we must have,
TF =T1+T2+T3.
Assume that F consists of k finds where k = O(m) as each union operation in S consists of two finds. Since each find in F makes exactly one traversal to a root we must have T1 = O(k). Similarly, since there are at most log∗(n) rank blocks we can have at most log∗(n) case 2 traversals in a single find operation and so we have T2 = O(klog∗(n)).
Finally, let’s bound the number of case 3 traversals. Suppose we are traversing from x to its parent which we’ll denote y, where both x and y are in the same rank block {m+1,m+2,…,2m}. Note that y cannot be the root since we have already counted such traversals in T1. Due to path compression, after this first traversal x’s parent will be the root and if our sequence only consists of finds every subsequent traversal involving x will be a case 1 traversal. However, our sequence can also involve unions that may make the tree containing x a subtree of another.
Let’s fix the element x and consider the sequence of parents y1 , y2 , . . . yl that x can have that yield case 3 traversals, i.e. yi is not a root and is in the same rank block as x for all 1 ≤ i ≤ l. As path compression means x is always pointed to the root after the traversal terminates and we don’t add to our sequence if x’s parent is a root, each element in y1, y2, . . . yl must be unique. Moreover, by observation 1 we know that the ranks of these nodes is strictly increasing,
rank(y1) < rank(y2) < . . . < rank(yl). 1. DISJOINT-SET DATA STRUCTURES To see this, consider the sequence of proposed traversals. First we move from x to y1 - which by definition is not the root - and after the find that caused this traversal terminates, x’s parent becomes the root of the tree y2. Now if the tree that contains x is not unioned with any other tree then any traversal from x to y2 will be a case 1 traversal and accounted for by T1. However, if y2 becomes the child of another root y3 due to a union then the traversal from x to y2 becomes a case 3 traversal and needs to be accounted for. Following the traversal from x to y2, x’s parent will become y3 (the root of our new tree) and the process repeats. Since y2 was the root of the tree that contained y1 (and x) we must have rank(y1) < rank(y2) by observation 1. Then as y2 becomes the child of y3 in the union we must have rank(y1) < rank(y2) < rank(y3) and so on. Since the elements in the sequence y1, y2, . . . yl all belong to the same rank block (they are all in the same block as x) and their ranks are strictly increasing, the length of the sequence must be bounded by the size of the rank block, so we have, l≤2m −(m+1)+1≤2m. But by the rank lemma (lemma 2) we know that the number of elements in the tree whose rank is δ is bounded by n . Thus, the number of elements that whose rank lies in the {m+1,m+2,...,2m} block is bounded by, 2δ n + n +... n ≤ n (by geometric series). 2m+1 2m+2 22m 2m So we have at most 2m case 3 traversals for each node in the {m+1,m+2,...,2m} block giving at most, n × 2m = n, case 3 traversals for this block. There are at most log∗(n) rank blocks and so we have O(nlog∗(n)) case 3 traversals. In summary we have, TF =T1+T2+T3 = O(k) + O(klog∗(n)) + O(nlog∗(n)) = O(m) + O(mlog∗(n)) + O(nlog∗(n)) (since k = O(m)) = O(mlog∗(n)) (since m ≥ n), and accounting for the extra work in the unions in S we obtain, T(m)=O(m)+TF =O(m)+O(mlog∗(n))=O(mlog∗(n)). The implications of Hopcraft-Ullman’s theorem are profound. The theorem itself tells us that if we preform many (m ≥ n) operations on our disjoint set (beginning with a completely disjoint set) then the total cost of this sequence of operations grows as O(mlog∗(n)). We also know that log∗(n) grows incredibly slowly and so for any practical n it is going to be bounded by a constant. This means that the amortised complexity of the finds and unions in our sequence is essentially O(1)! We’ll conclude by noting that the Hopcraft-Ullman bound is actually a loose bound and it was proved by Tarjan that and sequence of m unions and finds grows as O(mα(n)), where α(n) is the inverse Ackermann function. It turns out that α(n) grows excruciatingly more slowly than even log∗(n) and that O(mα(n)) is the optimal bound for the disjoint set data structure amoritsed over m operat 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com