Analysis of Algorithm
Disjoint Set Representation
Andres Mendez-Vazquez
November 8, 2015
1 / 114
Outline
1 Disjoint Set Representation
Definition of the Problem
Operations
2 Union-Find Problem
The Main Problem
Applications
3 Implementations
First Attempt: Circular List
Operations and Cost
Still we have a Problem
Weighted-Union Heuristic
Operations
Still a Problem
Heuristic Union by Rank
4 Balanced Union
Path compression
Time Complexity
Ackermann’s Function
Bounds
The Rank Observation
Proof of Complexity
Theorem for Union by Rank and Path Compression
2 / 114
Disjoint Set Representation
Problem
1 Items are drawn from the finite universe U = {1, 2, ..., n} for some fixed n.
2 We want to maintain a partition of U as a collection of disjoint sets.
3 In addition, we want to uniquely name each set by one of its items, called its representative item.
These disjoint sets are maintained under the following operations:
1 MakeSet(x)
2 Union(A,B)
3 Find(x)
4 / 114
Operations
MakeSet(x)
Given x ∈ U currently not belonging to any set in the collection, create a new singleton set {x} and name it x.
This is usually done at the start, once per item, to create the initial trivial partition.
Union(A,B)
It changes the current partition by replacing its sets A and B with A ∪ B. The new set is named either A or B.
The operation may choose either one of the two representatives as the new representative.
Find(x)
It returns the name of the set that currently contains item x.
6 / 114
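As a quick illustration of these three contracts, here is a minimal Python sketch. It is not the list or forest structure developed in these slides, just a naive dict-of-sets version (the class name and the linear-scan Find are ours, for illustration only):

```python
# Naive sketch of the disjoint-set interface: a dict mapping each set's
# name (its representative) to the set of its members.
class DisjointSets:
    def __init__(self):
        self.members = {}   # representative -> set of items

    def make_set(self, x):
        # Create the singleton {x}, named x.
        self.members[x] = {x}

    def find(self, x):
        # Return the name of the set currently containing x (linear scan).
        for name, items in self.members.items():
            if x in items:
                return name

    def union(self, a, b):
        # Replace the sets named a and b with their union; keep the name a.
        self.members[a] |= self.members.pop(b)

ds = DisjointSets()
for x in range(1, 10):
    ds.make_set(x)
ds.union(1, 2)
print(ds.find(2))   # 1: item 2 now lives in the set named 1
```

The efficient implementations below replace the O(n) Find and O(|B|) Union of this sketch with cheaper operations.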
Example
for x = 1 to 9 do MakeSet(x)
[Figure: the nine singleton sets {1}, {2}, ..., {9}]
Then, you do a Union(1, 2)
[Figure: the sets {1, 2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}]
Now, Union(3, 4); Union(5, 8); Union(6, 9)
[Figure: the sets {1, 2}, {3, 4}, {5, 8}, {6, 9}, {7}]
7 / 114
Example
Now, Union(1, 5); Union(7, 4)
[Figure: the sets {1, 2, 5, 8}, {3, 4, 7}, {6, 9}]
Then, if we do the following operations
Find(1) returns 5
Find(9) returns 9
Finally, Union(5, 9)
[Figure: the sets {1, 2, 5, 6, 8, 9}, {3, 4, 7}]
Then Find(9) returns 5
8 / 114
Union-Find Problem
Problem
Let S be a sequence of m = |S| MakeSet, Union and Find operations (intermixed in arbitrary order):
n of which are MakeSet.
At most n − 1 are Union.
The rest are Finds.
Cost(S) = total computational time to execute sequence S.
Goal: Find an implementation that, for every m and n, minimizes the amortized cost per operation:
Cost(S) / |S|   (1)
for any arbitrary sequence S.
10 / 114
Applications
Examples
1 Maintaining partitions and equivalence classes.
2 Graph connectivity under edge insertion.
3 Minimum spanning trees (e.g. Kruskal’s algorithm).
4 Random maze construction.
12 / 114
Circular lists
We use the following structures
Data structure: Two arrays Set[1..n] and next[1..n].
Set[x] returns the name of the set that contains item x.
A is a set if and only if Set[A] = A.
next[x] returns the next item on the list of the set that contains item x.
14 / 114
Circular lists
Example: n = 16.
Partition: {{1, 2, 8, 9}, {4, 3, 10, 13, 14, 15, 16}, {7, 6, 5, 11, 12}}
x:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Set:   1  1  4  4  7  7  7  1  1  4  7  7  4  4  4  4
next:  2  8 10  3 12  5  6  9  1 13  7 11 14 15 16  4
Set Position 1
[Figure: circular list of set 1: 1 → 2 → 8 → 9 → 1]
15 / 114
Circular lists
Set Position 7
[Figure: circular list of set 7: 7 → 6 → 5 → 12 → 11 → 7]
Set Position 4
[Figure: circular list of set 4: 4 → 3 → 10 → 13 → 14 → 15 → 16 → 4]
16 / 114
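The two arrays of this example can be encoded directly, and each set recovered by walking its circular list until the walk returns to the starting item. A small Python sketch (index 0 is unused so items run 1..16; the helper name is ours):

```python
# The slide's example: Set[x] names x's set, next_[x] is x's successor
# on the circular list of its set.
Set   = [0, 1, 1, 4, 4, 7, 7, 7, 1, 1, 4, 7, 7, 4, 4, 4, 4]
next_ = [0, 2, 8, 10, 3, 12, 5, 6, 9, 1, 13, 7, 11, 14, 15, 16, 4]

def members(a):
    # Walk the circular list starting at set name a until we return to a.
    out, x = [a], next_[a]
    while x != a:
        out.append(x)
        x = next_[x]
    return out

print(members(1))   # [1, 2, 8, 9]
print(members(7))   # [7, 6, 5, 12, 11]
```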
Operations and Cost
MakeSet(x)
    Set[x] = x
    next[x] = x
Complexity
O(1) Time
Find(x)
    return Set[x]
Complexity
O(1) Time
17 / 114
Operations and Cost
For the union
We are assuming Set[A] = A ≠ B = Set[B]
Union1(A, B)
    Set[B] = A
    x = next[B]
    while (x ≠ B)
        Set[x] = A      /* Rename set B to A */
        x = next[x]
    x = next[B]         /* Splice lists A and B */
    next[B] = next[A]
    next[A] = x
18 / 114
Operations and Cost
Thus, we have in the Splice part
[Figure: the circular lists of A and B spliced together through x]
19 / 114
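The pseudocode above runs as written. A runnable Python sketch of the same structure (global arrays for brevity; index 0 unused): MakeSet and Find are implicit in the array initialization and in reading Set[x], the while loop does the renaming, and the last three lines do the O(1) splice.

```python
# Circular-list representation: Set[x] names x's set, nxt[x] is x's
# successor on its set's circular list.
n = 9
Set = list(range(n + 1))   # Set[x] = x: every item names its own set
nxt = list(range(n + 1))   # nxt[x] = x: every circular list is a singleton

def union1(a, b):
    # Merge set b into set a, assuming Set[a] == a != b == Set[b].
    Set[b] = a
    x = nxt[b]
    while x != b:          # rename every other member of b
        Set[x] = a
        x = nxt[x]
    x = nxt[b]             # splice list b into list a in O(1)
    nxt[b] = nxt[a]
    nxt[a] = x

union1(1, 2)
union1(1, 3)
print(sorted(x for x in range(1, n + 1) if Set[x] == 1))   # [1, 2, 3]
```

Only the renaming loop is expensive; the splice itself touches two pointers, which is why the cost of Union1 is O(|B|).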
We have a Problem
Complexity
O(|B|) Time
Not only that, if we have the following sequence of operations
    for x = 1 to n
        MakeSet(x)
    for x = 1 to n − 1
        Union1(x + 1, x)
20 / 114
Thus
Thus, we have the following number of aggregated steps:
n + Σ_{i=1}^{n−1} i = n + n(n − 1)/2 = n + (n² − n)/2 = n²/2 + n/2 = Θ(n²)
21 / 114
Aggregate Time
Thus, the aggregate time is as follows
Aggregate Time = Θ(n²)
Therefore
Amortized Time per operation = Θ(n)
22 / 114
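The Θ(n²) aggregate can be checked by counting: in the bad sequence, Union1(x + 1, x) renames every member of the current set x, which has grown to x items by that point. A small Python count (the function name is ours; it simulates only the sizes, not the lists):

```python
# Count the pointer renames done by: MakeSet(1..n); Union1(x+1, x) for x < n.
# The x-th union renames x items, so the total is n(n-1)/2, i.e. Theta(n^2).
def renames_for_bad_sequence(n):
    size = {x: 1 for x in range(1, n + 1)}   # size of each current set
    total = 0
    for x in range(1, n):
        total += size[x]                     # Union1(x+1, x) renames |set x| items
        size[x + 1] += size.pop(x)           # set x is absorbed into set x+1
    return total

print(renames_for_bad_sequence(100))   # 4950 == 100*99/2
```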
This is not exactly good
Thus, we need to have something better
We will now try the Weighted-Union Heuristic!!!
23 / 114
Implementation 2: Weighted-Union Heuristic Lists
We extend the previous data structure
Data structure: Three arrays Set[1..n], next[1..n], size[1..n].
size[A] returns the number of items in set A if A == Set[A] (otherwise, we do not care).
25 / 114
Operations
MakeSet(x)
    Set[x] = x
    next[x] = x
    size[x] = 1
Complexity
O(1) time
Find(x)
    return Set[x]
Complexity
O(1) time
26 / 114
Operations
Union2(A, B)
    if size[Set[A]] > size[Set[B]]
        size[Set[A]] = size[Set[A]] + size[Set[B]]
        Union1(A, B)
    else
        size[Set[B]] = size[Set[A]] + size[Set[B]]
        Union1(B, A)
Note: Weight-Balanced Union: merge the smaller set into the larger set.
Complexity
O(min {|A|, |B|}) time.
27 / 114
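A runnable sketch of the heuristic on top of the circular lists (for simplicity, our union2 takes the two set names directly rather than looking them up through Set[·] as the slide does):

```python
# Weighted-union heuristic: size[] tracks set sizes at the roots, and the
# smaller set is always the one renamed, so a union costs O(min(|A|, |B|)).
n = 16
Set = list(range(n + 1))
nxt = list(range(n + 1))
size = [1] * (n + 1)

def union1(a, b):
    # Circular-list union from before: rename b's members, then splice.
    Set[b] = a
    x = nxt[b]
    while x != b:
        Set[x] = a
        x = nxt[x]
    x = nxt[b]
    nxt[b] = nxt[a]
    nxt[a] = x

def union2(a, b):
    # a, b are set names; merge the smaller set into the larger one.
    if size[a] > size[b]:
        size[a] += size[b]
        union1(a, b)
    else:
        size[b] += size[a]
        union1(b, a)

union2(1, 2)   # {1, 2}, named 2 (tie goes to the second argument)
union2(3, 4)   # {3, 4}, named 4
union2(2, 4)   # sizes tie again, so the result is named 4
print(Set[1], size[4])   # 4 4
```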
What about the operations eliciting the worst behavior?
Remember
    for x = 1 to n
        MakeSet(x)
    for x = 1 to n − 1
        Union2(x + 1, x)
We have then
n + Σ_{i=1}^{n−1} 1 = n + n − 1 = 2n − 1 = Θ(n)
IMPORTANT: This is not the worst sequence!!!
28 / 114
For this, notice the following worst sequence
Worst Sequence S
MakeSet(x), for x = 1, ..., n. Then do n − 1 Unions in round-robin manner.
Within each round, the sets have roughly equal size:
Starting round: each set has size 1.
Next round: each set has size 2.
Next: ... size 4.
...
We claim the following
Aggregate time = Θ(n log n)
Amortized time per operation = Θ(log n)
29 / 114
For this, notice the following worst sequence
Example n = 16
Round 0: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15} {16}
Round 1: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14} {15, 16}
Round 2: {1, 2, 3, 4} {5, 6, 7, 8} {9, 10, 11, 12} {13, 14, 15, 16}
Round 3: {1, 2, 3, 4, 5, 6, 7, 8} {9, 10, 11, 12, 13, 14, 15, 16}
Round 4: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
30 / 114
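The Θ(n log n) claim for this round-robin sequence can be checked by counting moved items: each round doubles the set sizes and moves n/2 items in total, and there are log₂ n rounds. A Python count (the function name is ours; it simulates set sizes only, for n a power of two):

```python
# Count the items moved by the weighted union over the round-robin
# worst case: log2(n) rounds, each moving n/2 items, so (n/2)*log2(n)
# moves in total -- Theta(n log n) aggregate work for n - 1 unions.
def round_robin_moves(n):
    sets = [{x} for x in range(1, n + 1)]
    moves = 0
    while len(sets) > 1:
        merged = []
        for a, b in zip(sets[::2], sets[1::2]):
            moves += min(len(a), len(b))   # cost of one weighted union
            merged.append(a | b)
        sets = merged
    return moves

print(round_robin_moves(16))   # 32 == (16 / 2) * log2(16)
```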
Now
Given the previous worst case
What is the complexity of this implementation?
31 / 114
Now, the Amortized Costs of this implementation
Claim 1: Amortized time per operation is O(log n)
For this, we have the following theorem!!!
Theorem 1
Using the linked-list representation of disjoint sets and the Weighted-Union heuristic, a sequence of m MakeSet, Union, and FindSet operations, n of which are MakeSet operations, takes O(m + n log n) time.
32 / 114
Proof
Because each Union operation unites two disjoint sets
We perform at most n − 1 Union operations over all.
We now bound the total time taken by these Union operations
We start by determining, for each object, an upper bound on the number of times the object’s pointer back to its set object is updated.
Consider a particular object x.
We know that each time x’s pointer was updated, x must have started in the smaller set.
The first time x’s pointer was updated, therefore, the resulting set must have had at least 2 members.
Similarly, the next time x’s pointer was updated, the resulting set must have had at least 4 members.
33 / 114
Proof
Continuing on
We observe that, for any k ≤ n, after x’s pointer has been updated ⌈log k⌉ times, the resulting set must have at least k members.
Thus
Since the largest set has at most n members, each object’s pointer is updated at most ⌈log n⌉ times over all the Union operations.
Then
The total time spent updating object pointers over all Union operations is O(n log n).
34 / 114
Proof
We must also account for updating the tail pointers and the list lengths
It takes only O(1) time per Union operation.
Therefore
The total time spent in all Union operations is thus O(n log n).
The time for the entire sequence of m operations follows easily
Each MakeSet and FindSet operation takes O(1) time, and there are O(m) of them.
35 / 114
Proof
Therefore
The total time for the entire sequence is thus O(m + n log n).
36 / 114
Amortized Cost: Aggregate Analysis
Aggregate cost O(m + n log n). Amortized cost per operation O(log n):
O(m + n log n) / m = O(1 + log n) = O(log n)   (2)
37 / 114
There are other ways of analyzing the amortized cost
It is possible to use
1 Accounting Method.
2 Potential Method.
38 / 114
Amortized Costs: Accounting Method
Accounting method
MakeSet(x): Charge (1 + log n): 1 to do the operation, log n stored as credit with item x.
Find(x): Charge 1, and use it to do the operation.
Union(A, B): Charge 0 and use 1 stored credit from each item in the smaller set to move it.
39 / 114
Amortized Costs: Accounting Method
Credit invariant
Total stored credit is Σ_S |S| log(n/|S|), where the summation is taken over the collection S of all disjoint sets of the current partition.
40 / 114
Amortized Costs: Potential Method
Potential function method
Exercise:
Define a regular potential function and use it to do the amortized analysis.
Can you make the Union amortized cost O(log n), and the MakeSet and Find costs O(1)?
41 / 114
Improving over the heuristic using union by rank
Union by Rank
Instead of using the number of nodes in each tree to make a decision, we
maintain a rank, an upper bound on the height of the tree.
We have the following data structure to support this:
We maintain a parent array p[1..n].
A is a set if and only if A = p[A] (a tree root).
x ∈ A if and only if x is in the tree rooted at A.
43 / 114
[Figure: an example forest of up-trees over the items 1..20, where each tree root names its set]
43 / 114
Forest of Up-Trees: Operations without union by rank or
weight
MakeSet(x)
1 p[x] = x
Complexity
O (1) time
Union(A, B)
1 p[B] = A
Note: We are assuming that p[A] == A, p[B] == B, and A ≠ B. This is the
reason we need a Find operation!
44 / 114
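As a minimal sketch of these three operations, here is the naive up-tree forest in Python (the names `parent`, `make_set`, `find`, and `union` are ours, not the slides'):

```python
# Naive forest of up-trees: no union by rank/weight, no path compression.
parent = {}

def make_set(x):
    """MakeSet(x): x becomes its own parent, i.e., a one-node tree."""
    parent[x] = x

def find(x):
    """Find(x): follow parent pointers up to the root (the set's name)."""
    while x != parent[x]:
        x = parent[x]
    return x

def union(a, b):
    """Union(A, B): assumes a and b are distinct roots; hang b's tree under a."""
    parent[b] = a
```

Note that `union` blindly trusts its arguments are roots; that is exactly why a `find` operation is needed before each union.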
Example
Remember: we perform the unions without any precaution against the
worst case.
[Figure: tree B linked directly under root A]
45 / 114
Forest of Up-Trees: Operations without union by rank or
weight
Find(x)
1 if x == p[x]
2 return x
3 return Find(p[x])
Example
46 / 114
Forest of Up-Trees: Operations without union by rank or
weight
Still, we can exhibit a horrible case
Sequence of operations
1 for x = 1 to n
2 MakeSet(x)
3 for x = 1 to n − 1
4 Union(x + 1, x)
5 for x = 1 to n − 1
6 Find(1)
47 / 114
Forest of Up-Trees: Operations without union by rank or
weight
We finish with this data structure
[Figure: a chain 1 → 2 → · · · → n − 1 → n, with n at the root]
Thus the last part of the sequence gives us a total time of
Aggregate Time Θ(n²)
Amortized Analysis per operation Θ (n)
48 / 114
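The degenerate chain above is easy to reproduce and measure; a self-contained sketch (the function name and the array encoding are ours) that builds the chain with the naive `p[x] = x + 1` unions and counts the hops Find(1) must take:

```python
# Build the worst case for the naive forest: a chain of n nodes.
def worst_case_chain_depth(n):
    parent = list(range(n + 1))      # parent[x] = x  (MakeSet for 1..n)
    for x in range(1, n):            # Union(x + 1, x): p[x] = x + 1
        parent[x] = x + 1
    # Find(1) now walks the whole chain; count the parent hops.
    steps, x = 0, 1
    while parent[x] != x:
        x = parent[x]
        steps += 1
    return steps
```

Each Find(1) costs n − 1 hops, so the n − 1 repeated finds alone cost Θ(n²) in aggregate.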
Self-Adjusting forest of Up-Trees
How do we avoid this problem?
Use the following heuristics together!!!
1 Balanced Union.
By tree weight (i.e., size)
By tree rank (i.e., height)
2 Find with path compression
Observations
Each single improvement (1 or 2) by itself will result in logarithmic
amortized cost per operation.
The two improvements combined will result in amortized cost per
operation approaching very close to O(1).
49 / 114
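Combining the two heuristics gives the standard disjoint-set forest; a compact Python sketch (class and method names are ours) with balanced union by rank and find with path compression:

```python
# Self-adjusting forest of up-trees: union by rank + path compression.
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))   # MakeSet for items 0..n-1
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            # Path compression: point x directly at its root.
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] > self.rank[rb]:    # balanced union by rank
            self.parent[rb] = ra
        else:
            self.parent[ra] = rb
            if self.rank[ra] == self.rank[rb]:
                self.rank[rb] += 1           # equal heights: rb grows by one
```

With both heuristics, any sequence of m operations on n items runs in O(m α(m, n)) time, which is effectively linear in practice.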
Balanced Union by Size
Using size for Balanced Union
We can use the size of each set to obtain what we want
50 / 114
We have then
MakeSet(x)
1 p[x] = x
2 size[x] = 1
Note: Complexity O (1) time
Union(A, B)
Input: assume that p[A] == A, p[B] == B, and A ≠ B
1 if size[A] > size[B]
2 size[A] = size[A] + size[B]
3 p[B] = A
4 else
5 size[B] = size[A] + size[B]
6 p[A] = B
Note: Complexity O (1) time
51 / 114
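The pseudocode above translates directly to Python; a self-contained sketch of balanced union by size (dictionary names `p` and `size` follow the slides, function names are ours):

```python
# Balanced union by size: the smaller tree always hangs under the larger.
p, size = {}, {}

def make_set(x):
    p[x] = x
    size[x] = 1

def union(a, b):
    """Assumes a and b are distinct roots."""
    if size[a] > size[b]:
        size[a] += size[b]
        p[b] = a          # b's smaller tree hangs under a
    else:
        size[b] += size[a]
        p[a] = b          # a's tree hangs under b (ties go to b)
```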
Example
Now, we use the size for the union
[Figure: since size[A] > size[B], tree B is linked under root A]
52 / 114
Nevertheless
Union by size can make the analysis too complex
People would rather use the rank
Rank
It is defined as an upper bound on the height of the tree
Because
The use of the rank simplifies the amortized analysis for the data structure!!!
53 / 114
Thus, we use the balanced union by rank
MakeSet(x)
1 p[x] = x
2 rank[x] = 0
Note: Complexity O (1) time
Union(A, B)
Input: assume that p[A] == A, p[B] == B, and A ≠ B
1 if rank[A] > rank[B]
2 p[B] = A
3 else
4 p[A] = B
5 if rank[A] == rank[B]
6 rank[B] = rank[B] + 1
Note: Complexity O (1) time
54 / 114
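The same pseudocode, sketched in Python (arrays `p` and `rank` as in the slides, function names ours); note that the rank only grows when two trees of equal rank are linked:

```python
# Balanced union by rank, following the slide's pseudocode.
p, rank = {}, {}

def make_set(x):
    p[x] = x
    rank[x] = 0

def union(a, b):
    """Assumes a and b are distinct roots."""
    if rank[a] > rank[b]:
        p[b] = a              # b hangs under a; no rank changes
    else:
        p[a] = b              # a hangs under b
        if rank[a] == rank[b]:
            rank[b] += 1      # heights were equal, so b's tree grew by one
```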
Example
Now
We use the rank for the union
Case I
The rank of A is larger than the rank of B
[Figure: since rank[A] > rank[B], tree B is linked under root A]
55 / 114
Example
Case II
The rank of B is larger than the rank of A
[Figure: since rank[B] > rank[A], tree A is linked under root B]
56 / 114
Here is the new heuristic to improve overall performance:
Path Compression
Find(x)
1 if x ≠ p[x]
2 p[x] = Find(p[x])
3 return p[x]
Complexity
O (depth (x)) time
58 / 114
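The recursive Find above compresses the path as the recursion unwinds. The same effect can be obtained iteratively with a common two-pass variant (a sketch; the function name and array encoding are ours):

```python
def find(p, x):
    """Two-pass iterative find with path compression on parent map p."""
    root = x
    while p[root] != root:        # pass 1: locate the root
        root = p[root]
    while p[x] != root:           # pass 2: point every path node at the root
        p[x], x = root, p[x]
    return root
```

Both versions leave every node on the find path as a direct child of the root, so later finds on those nodes take O(1) hops.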
Example
We have the following structure, on which we run the recursive Find(p [x]):
[Figures: the tree before the find, and the recursive calls walking up to the root]
59 / 114
Path compression
Find(x) must traverse the path from x up to its root.
It might as well create shortcuts along the way to improve the efficiency
of future operations.
Find(2)
[Figure: the tree before and after Find(2); every node on the path from 2
to the root becomes a direct child of the root]
62 / 114
Time complexity
Tight upper bound on time complexity
An amortized time of O(mα(m, n)) for m operations.
Where α(m, n) is the inverse of Ackermann’s function (almost a
constant).
This bound, for a slightly different definition of α than the one given
here, is shown in Cormen’s book.
64 / 114
Ackermann’s Function
Definition
A(1, j) = 2^j where j ≥ 1
A(i, 1) = A(i − 1, 2) where i ≥ 2
A(i, j) = A(i − 1, A(i, j − 1)) where i, j ≥ 2
Note:
This is one of several inequivalent but similar definitions
of Ackermann’s function found in the literature.
Cormen’s book authors give a different definition,
although they never really call theirs Ackermann’s
function.
Property
Ackermann’s function grows very fast, thus its inverse grows very slowly.
65 / 114
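The definition above can be evaluated directly for tiny arguments, which makes the explosive growth concrete (a sketch; the function name is ours, and anything beyond these inputs overflows any realistic budget):

```python
# The slide's variant of Ackermann's function; usable for tiny inputs only.
from functools import lru_cache

@lru_cache(maxsize=None)
def A(i, j):
    if i == 1:
        return 2 ** j                # A(1, j) = 2^j
    if j == 1:
        return A(i - 1, 2)           # A(i, 1) = A(i - 1, 2)
    return A(i - 1, A(i, j - 1))     # A(i, j) = A(i - 1, A(i, j - 1))
```

Already A(2, j) is a tower of j + 1 stacked 2s: A(2, 1) = 4, A(2, 2) = 16, A(2, 3) = 65536, and A(2, 4) = 2^65536.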
Ackermann’s Function
Example A(3, 4)
A(3, 4) = 2^2^···^2, a tower of 2s whose height is itself given by a tower
of 2s, bottoming out at a tower of 16 2s.
Notation: a tower 2^2^···^2 annotated with 10 means ten stacked 2s:
2^2^2^2^2^2^2^2^2^2
66 / 114
Inverse of Ackermann’s function
Definition
α(m, n) = min { i ≥ 1 | A (i, ⌊m/n⌋) > log n } (3)
Note: This is not a true mathematical inverse.
Intuition: Grows about as slowly as Ackermann’s function grows fast.
How slowly?
Let ⌊m/n⌋ = k; then m ≥ n → k ≥ 1
67 / 114
Thus
First
We can show that A(i, k) ≥ A(i, 1) for all i ≥ 1.
This is left to you...
For Example
Consider i = 4; then A(i, k) ≥ A(4, 1), itself an enormous tower of 2s,
far larger than 10^80.
Finally
If log n < 10^80, i.e., if n < 2^(10^80), then α(m, n) ≤ 4
68 / 114
Instead of Using the Ackermann Inverse
We define the following function
log* n = min { i ≥ 0 | log^(i) n ≤ 1 } (4)
The superscript (i) means log applied i times: log log · · · log n
Then
We will establish O (m log* n) as an upper bound.
69 / 114
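The iterated logarithm in (4) is tiny for any conceivable input; a direct Python sketch (the function name is ours; we take log to be base 2, matching the deck's conventions):

```python
import math

def log_star(n):
    """log* n = min { i >= 0 | log applied i times to n is <= 1 }."""
    i = 0
    while n > 1:
        n = math.log2(n)   # apply one more base-2 logarithm
        i += 1
    return i
```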
In particular
Something Notable
In particular, for an exponential tower of 2s of height k + 1 (a base 2
with k further 2s stacked above it), log* equals k + 1.
For Example
log* 2^65536 = log* 2^(2^(2^(2^2))) = 5 (5)
Therefore
We have that log* n ≤ 5 for all practical purposes.
70 / 114
The Rank Observation
Something Notable
Once a node becomes a child of another node, its rank never changes
under any subsequent operation.
71 / 114
For Example
The number to the right of each node is its rank
MakeSet(1),MakeSet(2),MakeSet(3), ..., MakeSet(10)
1/0 2/0 3/0 4/0 5/0
6/0 7/0 8/0 9/0 10/0
72 / 114
Example
Now, we do
Union(6, 1),Union(7, 2), ..., Union(10, 5)
1/1 2/1 3/1 4/1 5/1
6/0 7/0 8/0 9/0 10/0
73 / 114
Example
Next, assuming that you use FindSet to get the set representative
Union(1, 2)
3/1 4/1 5/1
8/0 9/0 10/0
2/2
7/01/1
6/0
74 / 114
Example
Next
Union(3, 4)
2/2
7/01/1
6/0
3/1
4/2
8/0
9/0
5/1
10/0
75 / 114
Example
Next
Union(2, 4)
2/2
1/1
6/0
7/0
3/1
4/3
8/0
9/0
5/1
10/0
76 / 114
Example
Now you call FindSet(8)
2/2
1/1
6/0
7/0
4/3
9/0
5/1
10/03/1 8/0
77 / 114
Example
Now you call Union(4, 5)
2/2
1/1
6/0
7/0
4/3
9/03/1 8/0 5/1
10/0
78 / 114
Properties of ranks
Lemma 1 (About the Rank Properties)
1 ∀x, rank[x] ≤ rank[p[x]].
2 ∀x with x ≠ p[x], rank[x] < rank[p[x]].
3 rank[x] is initially 0.
4 rank[x] does not decrease.
5 Once x ≠ p[x] holds, rank[x] does not change.
6 rank[p[x]] is a monotonically increasing function of time.
Proof
By induction on the number of operations...
79 / 114
For Example
Imagine a MakeSet(x)
Then, rank [x] ≤ rank [p [x]]
Thus, the claim is true after n operations.
Then we get the (n + 1)-th operation, which can be:
Case I - FindSet.
Case II - Union.
The rest is for you to prove
It is a good mental exercise!!!
80 / 114
The Number of Nodes in a Tree
Lemma 2
For all tree roots x, size(x) ≥ 2^rank[x]
Note: size (x) = number of nodes in the tree rooted at x
Proof
By induction on the number of link operations:
Basis Step
Before the first link, all ranks are 0 and each tree contains one node.
Inductive Step
Consider linking x and y (Link (x, y))
Assume the lemma holds before this operation; we show that it holds
after.
81 / 114
Case 1: rank[x] ≠ rank[y]
Assume rank [x] < rank [y]
Note (primes denote values after the link):
rank′[x] == rank [x] and rank′[y] == rank [y]
82 / 114
Therefore
We have that
size′(y) = size (x) + size (y)
≥ 2^rank[x] + 2^rank[y]
≥ 2^rank[y]
= 2^rank′[y]
83 / 114
Case 2: rank[x] == rank[y]
Assume rank [x] == rank [y]
Note (primes denote values after the link):
rank′[x] == rank [x] and rank′[y] == rank [y] + 1
84 / 114
Therefore
We have that
size′(y) = size (x) + size (y)
≥ 2^rank[x] + 2^rank[y]
= 2^(rank[y]+1)
= 2^rank′[y]
Note: In the worst case rank [x] == rank [y] == 0
85 / 114
The number of nodes at certain rank
Lemma 3
For any integer r ≥ 0, there are at most n/2^r nodes of rank r.
Proof
First fix r.
When rank r is assigned to some node x, imagine that you label
each node in the tree rooted at x by “x.”
By Lemma 2, 2^r or more nodes are labeled each time.
By Lemma 1, each node is labeled at most once, when its root is
first assigned rank r.
If there were more than n/2^r nodes of rank r,
then more than 2^r · (n/2^r) = n nodes would be
labeled by a node of rank r, a contradiction.
86 / 114
Corollary 1
Corollary 1
Every node has rank at most ⌊log n⌋.
Proof
If there were a rank r such that r > log n, there would be n/2^r < 1
nodes of rank r, a contradiction.
87 / 114
Providing the time bound
Lemma 4 (Lemma 21.7)
Suppose we convert a sequence S of m MakeSet, Union, and FindSet operations into a sequence S' of m' MakeSet, Link, and FindSet operations by turning each Union into two FindSet operations followed by a Link. Then, if sequence S' runs in O(m' log* n) time, sequence S runs in O(m log* n) time.
88 / 114
Proof:
The proof is quite easy
1 Since each Union operation in sequence S is converted into three operations in S':
m ≤ m' ≤ 3m (6)
2 We have that m' = O(m).
3 Then, if the new sequence S' runs in O(m' log* n), the old sequence S runs in O(m log* n).
89 / 114
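The operation count behind Lemma 4 can be sketched directly (illustrative helper, not from the slides): each Union becomes FindSet + FindSet + Link, so a sequence of m operations becomes one of m' operations with m ≤ m' ≤ 3m.

```python
# A counting sketch of the conversion in Lemma 4 (illustrative): each
# Union becomes FindSet + FindSet + Link, so a sequence of m operations
# becomes one of m' operations with m <= m' <= 3m.
def converted_length(ops):
    # ops: list of operation names, e.g. "MakeSet", "Union", "FindSet"
    return sum(3 if op == "Union" else 1 for op in ops)

seq = ["MakeSet"] * 4 + ["Union", "FindSet", "Union"]
m, m_prime = len(seq), converted_length(seq)
print(m <= m_prime <= 3 * m)  # True: here m = 7 and m' = 11
```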
Theorem for Union by Rank and Path Compression
Theorem
Any sequence of m MakeSet, Link, and FindSet operations, n of which are MakeSet operations, is performed in worst-case time O(m log* n).
Proof
First, MakeSet and Link take O(1) time.
The key of the analysis is to accurately charge FindSet.
90 / 114
For this, we have the following
We can do the following
Partition ranks into blocks.
Put each rank r into block log* r, for r = 0, 1, ..., ⌊log n⌋ (Corollary 1).
The highest-numbered block is log* (log n) = (log* n) − 1.
In addition, the cost of FindSet pays for the following situations
1 The FindSet pays for the cost of the root and its child.
2 A bill is given to every node whose parent changes during the path compression!!!
91 / 114
Now, define the Block function
Define the following Upper Bound Function

B(j) ≡  −1          if j = −1
        1           if j = 0
        2           if j = 1
        2^{B(j−1)}  if j ≥ 2

92 / 114
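The block bound B(j) and the iterated logarithm can be sketched as follows (illustrative helper names, not part of the slides):

```python
# A sketch of B(j) and the iterated logarithm log* n (illustrative
# helper names, not part of the slides).
import math

def B(j):
    # B(-1) = -1, B(0) = 1, and B(j) = 2^{B(j-1)} for j >= 1.
    if j == -1:
        return -1
    if j == 0:
        return 1
    return 2 ** B(j - 1)

def log_star(n):
    # Number of times log2 must be iterated before the value drops to <= 1.
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

print([B(j) for j in range(5)])  # [1, 2, 4, 16, 65536]
print(log_star(65536))           # 4
```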
First
Something Notable
These are going to be the upper bounds for the blocks of ranks
Where
For j = 0, 1, ..., log* n − 1, block j consists of the set of ranks:
B(j − 1) + 1, B(j − 1) + 2, ..., B(j)   (7)
(these are the elements in Block j)
93 / 114
For Example
We have that
B(−1) = −1
B(0) = 1
B(1) = 2
B(2) = 2^2 = 4
B(3) = 2^{2^2} = 2^4 = 16
B(4) = 2^{2^{2^2}} = 2^16 = 65536
94 / 114
For Example
Thus, we have

Block j | Set of Ranks
0       | 0, 1
1       | 2
2       | 3, 4
3       | 5, ..., 16
4       | 17, ..., 65536
...     | ...

Note B(j) = 2^{B(j−1)} for j > 0.
95 / 114
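The table above can be checked mechanically: the block that holds rank r is exactly log*(r). The snippet below is illustrative, with a hypothetical log_star helper:

```python
# A quick check of the block table above (illustrative): the block that
# holds rank r is exactly log*(r), using a hypothetical log_star helper.
import math

def log_star(n):
    # Iterated logarithm: how many times log2 is applied before n <= 1.
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

# rank -> expected block, read off the table above
table = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 5: 3, 16: 3, 17: 4, 65536: 4}
print(all(log_star(r) == blk for r, blk in table.items()))  # True
```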
Example
Now you give a Union(4, 5)
[Figure: the resulting forest, nodes labeled value/rank — 2/2, 1/1, 3/1, 5/1, 4/3, 6/0, 7/0, 8/0, 9/0, 10/0 — with ranks grouped into Block 0, Block 1, and Block 2]
96 / 114
Finally
Given our Bound on the Ranks
Thus, the blocks 0, 1, ..., (log* n) − 1 suffice to store all the ranks.
97 / 114
Charging for FindSets
Two types of charges for FindSet(x0)
Block charges and Path charges.
Charge each node as either:
1) Block Charge
2) Path Charge
98 / 114
Charging for FindSets
Thus, for find sets
The find operation pays for the work done for the root and its immediate child.
It also pays for all the nodes which are not in the same block as their parents.
99 / 114
Then
First
1 All these nodes are children of some other nodes, so their ranks will not change, and they are bound to stay in the same block until the end of the computation.
2 If a node is in the same block as its parent, it will be charged for the work done in the FindSet operation!!!
100 / 114
Thus
We have the following charges
Block Charge:
For j = 0, 1, ..., log* n − 1, give one block charge to the last node with rank in block j on the path x0, x1, ..., xl.
Also give one block charge to the child of the root, i.e., xl−1, and to the root itself, i.e., xl.
Path Charge:
Give the remaining nodes in x0, ..., xl a path charge until they point to a node whose rank is in a different block from their own.
101 / 114
Charging for FindSets
Two types of charges for FindSet(x0)
Block charges and Path charges.
Charge each node as either:
1) Block Charge
2) Path Charge
102 / 114
Next
Something Notable
The number of nodes whose parents are in different blocks is limited by (log* n) − 1.
This bounds the block charges given to the last node with rank in each block j.
Plus 2 charges for the root and its child.
Thus
The cost of the Block Charges per FindSet operation is upper bounded by:
(log* n − 1) + 2 = log* n + 1. (8)
103 / 114
Claim
Claim
Once a node other than a root or its child is given a Block Charge (B.C.),
it will never be given a Path Charge (P.C.)
104 / 114
Proof
Proof
Given a node x, we know that:
rank[p[x]] − rank[x] is monotonically increasing ⇒ log* rank[p[x]] − log* rank[x] is monotonically increasing.
Thus, once x and p[x] are in different blocks, they will always be in different blocks because:
The rank of the parent can only increase.
And the child's rank stays the same.
Thus, the node x is billed in the first FindSet operation a path charge, and a block charge if necessary.
Thus, the node x will never be charged a path charge again, because it is already pointing to the set representative.
105 / 114
Remaining Goal
The Total Cost of the FindSet Operations
Total cost of FindSets = Total Block Charges + Total Path Charges.
We want to show
Total Block Charges + Total Path Charges = O(m log* n)
106 / 114
Bounding Block Charges
This part is easy
Block numbers range over 0, ..., log* n − 1.
The number of Block Charges per FindSet is ≤ log* n + 1.
The total number of FindSets is ≤ m.
The total number of Block Charges is ≤ m(log* n + 1).
107 / 114
Bounding Path Charges
Claim
Let N(j) be the number of nodes whose ranks are in block j. Then, for all j ≥ 0, N(j) ≤ 3n/(2B(j)).
Proof
By Lemma 3, summing over all possible ranks in block j:

N(j) ≤ Σ_{r=B(j−1)+1}^{B(j)} n/2^r

For j = 0:

N(0) ≤ n/2^0 + n/2^1 = 3n/2 = 3n/(2B(0))

108 / 114
Proof of claim
For j ≥ 1:

N(j) ≤ (n/2^{B(j−1)+1}) · Σ_{r=0}^{B(j)−(B(j−1)+1)} 1/2^r
     < (n/2^{B(j−1)+1}) · Σ_{r=0}^{∞} 1/2^r
     = n/2^{B(j−1)}
     = n/B(j)    (this is where the fact that B(j) = 2^{B(j−1)} is used)
     < 3n/(2B(j))

109 / 114
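The claim can also be checked numerically. The sketch below (illustrative, not from the slides) sums n/2^r over the ranks B(j−1)+1, ..., B(j) of each small block and confirms the 3n/(2B(j)) bound:

```python
# Numeric sanity check of the claim (illustrative): summing n/2^r over
# the ranks B(j-1)+1, ..., B(j) of block j stays within 3n / (2 B(j)).
def B(j):
    return -1 if j == -1 else 1 if j == 0 else 2 ** B(j - 1)

n = 1_000_000
ok = True
for j in range(4):  # blocks 0..3 (block 4 already spans ranks 17..65536)
    total = sum(n / 2 ** r for r in range(B(j - 1) + 1, B(j) + 1))
    ok = ok and total <= 3 * n / (2 * B(j))
print(ok)  # True
```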
Bounding Path Charges
We have the following
Let P(n) denote the overall number of path charges. Then:

P(n) ≤ Σ_{j=0}^{log* n − 1} α_j · β_j (9)

where α_j is the max number of nodes with ranks in Block j, and β_j is the max number of path charges per node of Block j.
110 / 114
Then, we have the following
Upper Bounds
By the claim, α_j is upper-bounded by 3n/(2B(j)).
In addition, we need to bound β_j, the maximum number of path charges for a node x in block j.
Note: Any node in Block j that is given a P.C. will remain in Block j for all m operations.
111 / 114
Now, we bound βj
So, every time x is assessed a Path Charge, it gets a new parent with increased rank.
Note: x's rank is not changed by path compression.
Suppose x has a rank in Block j
Repeated Path Charges to x will ultimately result in x's parent having a rank in a block higher than j.
From that point onward, x is given Block Charges, not Path Charges.
Therefore, the Worst Case
x has the lowest rank in Block j, i.e., B(j − 1) + 1, and x's parents' ranks successively take on the values
B(j − 1) + 2, B(j − 1) + 3, ..., B(j)
112 / 114
Finally
Hence, x can be given at most B(j) − B(j − 1) − 1 Path Charges. Therefore:

P(n) ≤ Σ_{j=0}^{log* n − 1} (3n/(2B(j))) · (B(j) − B(j − 1) − 1)
     ≤ Σ_{j=0}^{log* n − 1} (3n/(2B(j))) · B(j)
     = (3/2) n log* n

113 / 114
Thus
FindSet operations contribute
O(m(log* n + 1) + n log* n) = O(m log* n) (10)
MakeSet and Link contribute O(n).
The entire sequence takes O(m log* n).
114 / 114
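The data structure the whole analysis refers to can be sketched in a few lines. This is a minimal Python sketch with illustrative naming, assuming the standard MakeSet / FindSet / Link / Union operations of the slides:

```python
# A minimal union-find sketch with union by rank and path compression
# (illustrative naming, following the MakeSet / FindSet / Link / Union
# operations analyzed above).

class DisjointSets:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        # x becomes the root of its own single-node tree with rank 0.
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # Path compression: every node on the find path is re-pointed
        # directly at the root (the set's representative).
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def link(self, x, y):
        # Union by rank: the root of smaller rank becomes a child of the
        # root of larger rank; ranks only grow on ties.
        if self.rank[x] > self.rank[y]:
            x, y = y, x
        self.parent[x] = y
        if self.rank[x] == self.rank[y]:
            self.rank[y] += 1
        return y

    def union(self, x, y):
        # Union is exactly two FindSets followed by a Link (Lemma 4).
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:
            self.link(rx, ry)

ds = DisjointSets()
for i in range(8):
    ds.make_set(i)
for a, b in [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (4, 6), (0, 4)]:
    ds.union(a, b)
print(len({ds.find_set(i) for i in range(8)}))  # 1: all items in one set
```

By the theorem above, any sequence of m such operations on n items runs in worst-case time O(m log* n).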
MikkiliSuresh
 
PPTX
MULTI LEVEL DATA TRACKING USING COOJA.pptx
dollysharma12ab
 
PDF
2025 Laurence Sigler - Advancing Decision Support. Content Management Ecommer...
Francisco Javier Mora Serrano
 
PDF
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
PDF
AI-Driven IoT-Enabled UAV Inspection Framework for Predictive Maintenance and...
ijcncjournal019
 
PPTX
MSME 4.0 Template idea hackathon pdf to understand
alaudeenaarish
 
PDF
Cryptography and Information :Security Fundamentals
Dr. Madhuri Jawale
 
PPTX
business incubation centre aaaaaaaaaaaaaa
hodeeesite4
 
PDF
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
PPTX
Module2 Data Base Design- ER and NF.pptx
gomathisankariv2
 
PPTX
Inventory management chapter in automation and robotics.
atisht0104
 
PDF
Chad Ayach - A Versatile Aerospace Professional
Chad Ayach
 
PPTX
Victory Precisions_Supplier Profile.pptx
victoryprecisions199
 
PPTX
MT Chapter 1.pptx- Magnetic particle testing
ABCAnyBodyCanRelax
 
PDF
Zero carbon Building Design Guidelines V4
BassemOsman1
 
PDF
Biodegradable Plastics: Innovations and Market Potential (www.kiu.ac.ug)
publication11
 
PDF
Zero Carbon Building Performance standard
BassemOsman1
 
PPTX
Tunnel Ventilation System in Kanpur Metro
220105053
 
PDF
LEAP-1B presedntation xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
hatem173148
 
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
FUNDAMENTALS OF ELECTRIC VEHICLES UNIT-1
MikkiliSuresh
 
MULTI LEVEL DATA TRACKING USING COOJA.pptx
dollysharma12ab
 
2025 Laurence Sigler - Advancing Decision Support. Content Management Ecommer...
Francisco Javier Mora Serrano
 
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
AI-Driven IoT-Enabled UAV Inspection Framework for Predictive Maintenance and...
ijcncjournal019
 
MSME 4.0 Template idea hackathon pdf to understand
alaudeenaarish
 
Cryptography and Information :Security Fundamentals
Dr. Madhuri Jawale
 
business incubation centre aaaaaaaaaaaaaa
hodeeesite4
 
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
Module2 Data Base Design- ER and NF.pptx
gomathisankariv2
 
Inventory management chapter in automation and robotics.
atisht0104
 
Chad Ayach - A Versatile Aerospace Professional
Chad Ayach
 
Victory Precisions_Supplier Profile.pptx
victoryprecisions199
 
MT Chapter 1.pptx- Magnetic particle testing
ABCAnyBodyCanRelax
 
Zero carbon Building Design Guidelines V4
BassemOsman1
 
Biodegradable Plastics: Innovations and Market Potential (www.kiu.ac.ug)
publication11
 
Zero Carbon Building Performance standard
BassemOsman1
 
Tunnel Ventilation System in Kanpur Metro
220105053
 
LEAP-1B presedntation xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
hatem173148
 

17 Disjoint Set Representation

  • 1. Analysis of Algorithm Disjoint Set Representation Andres Mendez-Vazquez November 8, 2015 1 / 114
  • 2. Outline 1 Disjoint Set Representation Definition of the Problem Operations 2 Union-Find Problem The Main Problem Applications 3 Implementations First Attempt: Circular List Operations and Cost Still we have a Problem Weighted-Union Heuristic Operations Still a Problem Heuristic Union by Rank 4 Balanced Union Path compression Time Complexity Ackermann’s Function Bounds The Rank Observation Proof of Complexity Theorem for Union by Rank and Path Compression) 2 / 114
  • 4. Disjoint Set Representation Problem 1 Items are drawn from the finite universe U = {1, 2, ..., n} for some fixed n. 2 We want to maintain a partition of U as a collection of disjoint sets. 3 In addition, we want to uniquely name each set by one of its items, called its representative item. These disjoint sets are maintained under the following operations 1 MakeSet(x) 2 Union(A,B) 3 Find(x) 4 / 114
  • 11. Operations MakeSet(x) Given x ∈ U currently not belonging to any set in the collection, create a new singleton set {x} and name it x. This is usually done at start, once per item, to create the initial trivial partition. Union(A,B) It changes the current partition by replacing its sets A and B with A ∪ B. Name the set A or B. The operation may choose either one of the two representatives as the new representative. Find(x) It returns the name of the set that currently contains item x. 6 / 114
  • 16. Example for x = 1 to 9 do MakeSet(x) (figure: nine singleton sets 1–9) Then, you do a Union(1, 2) (figure: {1, 2} and the remaining singletons) Now, Union(3, 4); Union(5, 8); Union(6, 9) (figure: the resulting sets {1, 2}, {3, 4}, {5, 8}, {6, 9}, {7}) 7 / 114
  • 19. Example Now, Union(1, 5); Union(7, 4) (figure: the sets {1, 2, 5, 8}, {3, 4, 7}, {6, 9}) Then, if we do the following operations Find(1) returns 5 Find(9) returns 9 Finally, Union(5, 9) (figure: the sets {1, 2, 5, 6, 8, 9} and {3, 4, 7}) Then Find(9) returns 5 8 / 114
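The slides allow Union to name the merged set after either representative. The sketch below (ours, not the slides') uses a plain Python dict and the convention that Union keeps the first argument's name, so the specific representatives differ from the slide's figures, but the resulting set memberships match. Union is always called on current representatives via Find.

```python
# Minimal dictionary-based sketch of MakeSet / Find / Union.
# Illustration only; the naming convention (keep first argument's
# name) is ours, not the slides'.
rep = {}  # rep[x] = name of the set currently containing x

def make_set(x):
    rep[x] = x  # x becomes a singleton set named after itself

def find(x):
    return rep[x]

def union(a, b):
    # Rename every member of set b to a.  This scans the whole
    # universe (O(n)); the circular-list version later avoids that.
    for x in rep:
        if rep[x] == b:
            rep[x] = a

# Replaying the slides' sequence of operations:
for x in range(1, 10):
    make_set(x)
union(1, 2); union(3, 4); union(5, 8); union(6, 9)
union(find(1), find(5)); union(find(7), find(4))
assert find(1) == find(2) == find(5) == find(8)  # {1,2,5,8} together
assert find(9) == find(6)                        # {6,9} together
assert find(3) == find(4) == find(7)             # {3,4,7} together
union(find(5), find(9))
assert find(9) == find(1)                        # 9 joined 1's set
```

With this convention Find(1) returns 1 rather than the 5 shown on the slide, which is why the assertions compare set memberships instead of fixed representative names.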
  • 21. Union-Find Problem Problem Let S be a sequence of m = |S| MakeSet, Union and Find operations (intermixed in arbitrary order): n of which are MakeSet. At most n − 1 are Union. The rest are Finds. Cost(S) = total computational time to execute sequence S. Goal: Find an implementation that, for every m and n, minimizes the amortized cost per operation: Cost (S) |S| (1) for any arbitrary sequence S. 10 / 114
  • 28. Applications Examples 1 Maintaining partitions and equivalence classes. 2 Graph connectivity under edge insertion. 3 Minimum spanning trees (e.g. Kruskal’s algorithm). 4 Random maze construction. 12 / 114
  • 33. Circular lists We use the following structures Data structure: Two arrays Set[1..n] and next[1..n]. Set[x] returns the name of the set that contains item x. A is a set if and only if Set[A] = A next[x] returns the next item on the list of the set that contains item x. 14 / 114
  • 38. Circular lists Example: n = 16, Partition: {{1, 2, 8, 9}, {4, 3, 10, 13, 14, 15, 16}, {7, 6, 5, 11, 12}}
    x:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Set:  1  1  4  4  7  7  7  1  1  4  7  7  4  4  4  4
    next: 2  8 10  3 12  5  6  9  1 13  7 11 14 15 16  4
(figure: the circular list of set 1 is 1 → 2 → 8 → 9 → 1) 15 / 114
  • 40. Circular lists (figure: set 7’s list is 7 → 6 → 5 → 12 → 11 → 7; set 4’s list is 4 → 3 → 10 → 13 → 14 → 15 → 16 → 4) 16 / 114
  • 41. Operations and Cost MakeSet(x) 1 Set[x] = x 2 next[x] = x Complexity O (1) Time Find(x) 1 return Set[x] Complexity O (1) Time 17 / 114
  • 45. Operations and Cost For the union We are assuming Set[A] = A ≠ B = Set[B] Union1(A, B) 1 Set[B] = A 2 x = next[B] 3 while (x ≠ B) 4 Set[x] = A /* Rename Set B to A*/ 5 x = next[x] 6 x = next[B] /* Splice list A and B */ 7 next[B] = next[A] 8 next[A] = x 18 / 114
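The Union1 pseudocode transcribes almost line-for-line into Python. A self-contained sketch with 1-based arrays (index 0 unused), including the O(1) MakeSet and Find:

```python
# Circular-list disjoint sets: Set[x] names x's set, Next[x] is the
# next item on that set's circular list.
n = 16
Set = list(range(n + 1))   # MakeSet(x) for every x: Set[x] = x
Next = list(range(n + 1))  # ... and next[x] = x

def find(x):
    return Set[x]          # O(1)

def union1(a, b):
    # Assumes a and b are representatives of two distinct sets.
    Set[b] = a
    x = Next[b]
    while x != b:          # rename every member of B to A: O(|B|)
        Set[x] = a
        x = Next[x]
    # splice the two circular lists (swap Next[a] and Next[b])
    Next[b], Next[a] = Next[a], Next[b]

union1(1, 2)
union1(1, 3)
assert find(2) == 1 and find(3) == 1
# set 1's circular list is now 1 -> 3 -> 2 -> 1
```

Note that the splice on lines 6–8 of the pseudocode is just an exchange of next[A] and next[B], which the tuple assignment expresses directly.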
  • 48. Operations and Cost (figure: splicing the circular lists of A and B) 19 / 114
  • 49. We have a Problem Complexity O (|B|) Time Not only that, if we have the following sequence of operations 1 for x = 1 to n 2 MakeSet(x) 3 for x = 1 to n − 1 4 Union1(x + 1, x) 20 / 114
  • 51. Thus Thus, we have the following number of aggregated steps: n + Σ_{i=1}^{n−1} i = n + n(n − 1)/2 = n + (n² − n)/2 = n²/2 + n/2 = Θ(n²) 21 / 114
  • 56. Aggregate Time Thus, the aggregate time is as follows Aggregate Time = Θ(n²) Therefore Amortized Time per operation = Θ(n) 22 / 114
  • 58. This is not exactly good Thus, we need to have something better We will try now the Weighted-Union Heuristic!!! 23 / 114
  • 60. Implementation 2: Weighted-Union Heuristic Lists We extend the previous data structure Data structure: Three arrays Set[1..n], next[1..n], size[1..n]. size[A] returns the number of items in set A if A == Set[A] (Otherwise, we do not care). 25 / 114
  • 61. Operations MakeSet(x) 1 Set[x] = x 2 next[x] = x 3 size[x] = 1 Complexity O (1) time Find(x) 1 return Set[x] Complexity O (1) time 26 / 114
  • 65. Operations Union2(A, B) 1 if size[Set[A]] > size[Set[B]] 2 size[Set[A]] = size[Set[A]] + size[Set[B]] 3 Union1(A, B) 4 else 5 size[Set[B]] = size[Set[A]] + size[Set[B]] 6 Union1(B, A) Note: Weight Balanced Union: Merge smaller set into larger set Complexity O (min {|A| , |B|}) time. 27 / 114
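Union2 is a thin layer of size bookkeeping over Union1. A self-contained sketch (union1 and find repeated so the snippet runs on its own); note that on equal sizes the else branch fires, so the merged set takes the second argument's name:

```python
# Weighted-union heuristic on the circular-list structure: always
# rename the smaller set, in O(min(|A|, |B|)) time.
n = 16
Set = list(range(n + 1))
Next = list(range(n + 1))
size = [1] * (n + 1)       # size[a] is valid only when a == Set[a]

def find(x):
    return Set[x]

def union1(a, b):          # rename B's members to A and splice: O(|B|)
    Set[b] = a
    x = Next[b]
    while x != b:
        Set[x] = a
        x = Next[x]
    Next[b], Next[a] = Next[a], Next[b]

def union2(a, b):          # only the smaller set's pointers are rewritten
    if size[a] > size[b]:
        size[a] += size[b]
        union1(a, b)
    else:
        size[b] += size[a]
        union1(b, a)

union2(1, 2)               # tie: the merged set {1, 2} is named 2
assert find(1) == 2
union2(3, find(1))         # singleton {3} is smaller, merged in
assert find(3) == 2 and size[find(3)] == 3
```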
  • 67. What about the operations eliciting the worst behavior Remember 1 for x = 1 to n 2 MakeSet(x) 3 for x = 1 to n − 1 4 Union2(x + 1, x) We have then n + Σ_{i=1}^{n−1} 1 = n + n − 1 = 2n − 1 = Θ(n) IMPORTANT: This is not the worst sequence!!! 28 / 114
  • 69. For this, notice the following worst sequence Worst Sequence S MakeSet(x), for x = 1, .., n. Then do n − 1 Unions in round-robin manner. Within each round, the sets have roughly equal size. Starting round: each set has size 1. Next round: each set has size 2. Next: ... size 4. ... We claim the following Aggregate time = Θ(n log n) Amortized time per operation = Θ(log n) 29 / 114
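The round-robin schedule can be simulated by instrumenting Union1 to count pointer rewrites. This sketch (our instrumentation, not the slides') confirms that for n = 16 no pointer moves more than lg n = 4 times, and the total work is (n/2) · lg n = 32, matching the Θ(n log n) claim:

```python
import math

# Weighted union with an updates[] counter: updates[x] records how
# often x's Set-pointer is rewritten.
n = 16
Set = list(range(n + 1))     # MakeSet for items 1..n
Next = list(range(n + 1))
size = [1] * (n + 1)
updates = [0] * (n + 1)

def union1(a, b):            # rename B's members to A, counting rewrites
    Set[b] = a
    updates[b] += 1
    x = Next[b]
    while x != b:
        Set[x] = a
        updates[x] += 1
        x = Next[x]
    Next[b], Next[a] = Next[a], Next[b]

def union2(a, b):            # weighted union: rename the smaller set
    if size[a] > size[b]:
        size[a] += size[b]
        union1(a, b)
    else:
        size[b] += size[a]
        union1(b, a)

# n - 1 unions in round-robin order: equal-sized sets in every round
reps = list(range(1, n + 1))
while len(reps) > 1:
    merged = []
    for i in range(0, len(reps), 2):
        union2(reps[i], reps[i + 1])
        merged.append(Set[reps[i]])   # representative of the merged set
    reps = merged

# For this power-of-two n: every pointer moved at most lg n times,
# and each of the lg n rounds rewrites n/2 pointers in total.
assert max(updates[1:]) == int(math.log2(n))
assert sum(updates[1:]) == (n // 2) * int(math.log2(n))
```

Each of the lg n rounds rewrites n/2 pointers (the "losing" half of every merge), which is exactly where the n log n aggregate comes from.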
  • 77. For this, notice the following worst sequence Example n = 16 Round 0: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10} {11} {12} {13} {14} {15} {16} Round 1: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14} {15, 16} Round 2: {1, 2, 3, 4} {5, 6, 7, 8} {9, 10, 11, 12} {13, 14, 15, 16} Round 3: {1, 2, 3, 4, 5, 6, 7, 8} {9, 10, 11, 12, 13, 14, 15, 16} Round 4: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16} 30 / 114
  • 82. Now Given the previous worst case What is the complexity of this implementation? 31 / 114
  • 83. Now, the Amortized Costs of this implementation Claim 1: Amortized time per operation is O(log n) For this, we have the following theorem!!! Theorem 1 Using the linked-list representation of disjoint sets and the weighted-Union heuristic, a sequence of m MakeSet, Union, and FindSet operations, n of which are MakeSet operations, takes O (m + n log n) time. 32 / 114
  • 85. Proof Because each Union operation unites two disjoint sets We perform at most n − 1 Union operations over all. We now bound the total time taken by these Union operations We start by determining, for each object, an upper bound on the number of times the object’s pointer back to its set object is updated. Consider a particular object x. We know that each time x’s pointer was updated, x must have started in the smaller set. The first time x’s pointer was updated, therefore, the resulting set must have had at least 2 members. Similarly, the next time x’s pointer was updated, the resulting set must have had at least 4 members. 33 / 114
• 90. Proof Continuing on We observe that for any k ≤ n, after x’s pointer has been updated ⌈log k⌉ times, the resulting set must have at least k members. Thus Since the largest set has at most n members, each object’s pointer is updated at most ⌈log n⌉ times over all the Union operations. Then The total time spent updating object pointers over all Union operations is O (n log n). 34 / 114
• 94. Proof We must also account for updating the tail pointers and the list lengths, which takes only O (1) time per Union operation. Therefore The total time spent in all Union operations is thus O (n log n). The time for the entire sequence of m operations follows easily Each MakeSet and FindSet operation takes O (1) time, and there are O (m) of them. 35 / 114
• 97. Proof Therefore, the total time for the entire sequence is O (m + n log n). 36 / 114
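The pointer-update bound above can be checked empirically. Below is a minimal Python sketch (all names, such as `weighted_union_demo`, are ours, not from the slides) that simulates list-based sets under the weighted-union heuristic and counts how often each element's representative pointer changes; no element is re-pointed more than log2 n times.

```python
def weighted_union_demo(n):
    """Simulate list-based sets under the weighted-union heuristic and
    count how often each element's representative pointer is updated."""
    members = {x: [x] for x in range(n)}   # representative -> member list
    rep = {x: x for x in range(n)}         # element -> representative
    moves = {x: 0 for x in range(n)}       # pointer updates per element

    def union(a, b):
        a, b = rep[a], rep[b]
        if a == b:
            return
        if len(members[a]) < len(members[b]):
            a, b = b, a                    # always move the smaller list
        for x in members[b]:
            rep[x] = a                     # re-point every moved element
            moves[x] += 1
        members[a].extend(members[b])
        del members[b]

    step = 1                               # balanced "tournament" unions,
    while step < n:                        # the worst case for total moves
        for i in range(0, n, 2 * step):
            if i + step < n:
                union(i, i + step)
        step *= 2
    return max(moves.values())

print(weighted_union_demo(64))  # → 6, i.e. at most log2(64) moves per element
```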
• 98. Amortized Cost: Aggregate Analysis Aggregate cost O(m + n log n). Amortized cost per operation O(log n): O (m + n log n) / m = O (1 + log n) = O (log n) (2) 37 / 114
  • 99. There are other ways of analyzing the amortized cost It is possible to use 1 Accounting Method. 2 Potential Method. 38 / 114
  • 100. Amortized Costs: Accounting Method Accounting method MakeSet(x): Charge (1 + log n). 1 to do the operation, log n stored as credit with item x. Find(x): Charge 1, and use it to do the operation. Union(A, B): Charge 0 and use 1 stored credit from each item in the smaller set to move it. 39 / 114
• 103. Amortized Costs: Accounting Method Credit invariant Total stored credit is ∑ S∈S |S| log (n/|S|), where the summation is taken over the collection S of all disjoint sets of the current partition. 40 / 114
  • 104. Amortized Costs: Potential Method Potential function method Exercise: Define a regular potential function and use it to do the amortized analysis. Can you make the Union amortized cost O(log n), MakeSet and Find costs O(1)? 41 / 114
• 107. Outline 1 Disjoint Set Representation Definition of the Problem Operations 2 Union-Find Problem The Main Problem Applications 3 Implementations First Attempt: Circular List Operations and Cost Still we have a Problem Weighted-Union Heuristic Operations Still a Problem Heuristic Union by Rank 4 Balanced Union Path compression Time Complexity Ackermann’s Function Bounds The Rank Observation Proof of Complexity Theorem for Union by Rank and Path Compression 42 / 114
• 108. Improving over the heuristic using union by rank Union by Rank Instead of using the number of nodes in each tree to make a decision, we maintain a rank, an upper bound on the height of the tree. We have the following data structure to support this: We maintain a parent array p[1..n]. A is a set if and only if A = p[A] (a tree root). x ∈ A if and only if x is in the tree rooted at A. 43 / 114
• 113. Forest of Up-Trees: Operations without union by rank or weight MakeSet(x) 1 p[x] = x Complexity O (1) time Union(A, B) 1 p[B] = A Note: We are assuming that p[A] == A, p[B] == B, and A ≠ B. This is the reason we need a find operation!!! 44 / 114
• 116. Example Remember, we are doing the joins without caring about balance, which can produce the worst case. 45 / 114
  • 117. Forest of Up-Trees: Operations without union by rank or weight Find(x) 1 if x ==p[x] 2 return x 3 return Find(p [x]) Example 46 / 114
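The three operations above translate almost line for line into Python. A minimal sketch (the dict `p` plays the role of the parent array; the function names are ours):

```python
def make_set(p, x):
    p[x] = x                      # a fresh element is its own root

def union(p, a, b):
    p[b] = a                      # assumes a and b are distinct roots

def find(p, x):
    while x != p[x]:              # climb until the root names the set
        x = p[x]
    return x

p = {}
for x in range(1, 6):
    make_set(p, x)
union(p, 1, 2)                    # {1, 2}
union(p, 1, 3)                    # {1, 2, 3}
print(find(p, 3), find(p, 5))     # → 1 5
```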
• 119. Forest of Up-Trees: Operations without union by rank or weight Still, I can give you a horrible case Sequence of operations 1 for x = 1 to n 2 MakeSet(x) 3 for x = 1 to n − 1 4 Union(x + 1, x) 5 for x = 1 to n − 1 6 Find(1) 47 / 114
• 121. Forest of Up-Trees: Operations without union by rank or weight We finish with this data structure: a chain of parent pointers 1 → 2 → · · · → n − 1 → n. Thus the last part of the sequence gives us a total time of Aggregate Time Θ (n²) Amortized Analysis per operation Θ (n) 48 / 114
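The Θ(n) cost per Find on the resulting chain can be observed directly. A small sketch (names ours) that builds the chain by setting p[x] = x + 1, as Union(x + 1, x) does, and counts the links that Find(1) follows:

```python
def find_depth(p, x):
    """Iterative find that also reports how many links it followed."""
    hops = 0
    while x != p[x]:
        x = p[x]
        hops += 1
    return x, hops

n = 100
p = {x: x for x in range(1, n + 1)}      # n MakeSet operations
for x in range(1, n):                    # Union(x + 1, x) without balancing:
    p[x] = x + 1                         # the tree degenerates into a chain
root, hops = find_depth(p, 1)
print(root, hops)  # → 100 99
```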
• 123. Self-Adjusting forest of Up-Trees How do we avoid this problem? Use the following heuristics together!!! 1 Balanced Union. By tree weight (i.e., size) By tree rank (i.e., height) 2 Find with path compression Observations Each single improvement (1 or 2) by itself will result in logarithmic amortized cost per operation. The two improvements combined will result in amortized cost per operation approaching very close to O(1). 49 / 114
  • 130. Balanced Union by Size Using size for Balanced Union We can use the size of each set to obtain what we want 50 / 114
• 131. We have then MakeSet(x) 1 p[x] = x 2 size[x] = 1 Note: Complexity O (1) time Union(A, B) Input: assume that A and B are distinct roots: p[A] = A, p[B] = B, A ≠ B 1 if size[A] >size[B] 2 size[A] =size[A]+size[B] 3 p[B] = A 4 else 5 size[B] =size[A]+size[B] 6 p[A] = B Note: Complexity O (1) time 51 / 114
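The union-by-size pseudocode above, as a runnable Python sketch (function and variable names are ours, not from the slides):

```python
def make_set(p, size, x):
    p[x] = x
    size[x] = 1

def union_by_size(p, size, a, b):
    """Link roots a and b; the root of the larger set survives.
    Assumes a and b are distinct roots."""
    if size[a] > size[b]:
        size[a] += size[b]
        p[b] = a                  # smaller tree hangs under the larger
    else:
        size[b] += size[a]
        p[a] = b

p, size = {}, {}
for x in [1, 2, 3, 4]:
    make_set(p, size, x)
union_by_size(p, size, 1, 2)      # sizes tie: 1 hangs under 2
union_by_size(p, size, 2, 3)      # size[2] == 2 > 1: 3 hangs under 2
print(p[3], size[2])  # → 2 3
```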
• 134. Example Now, we use the size for the union: since size[A] > size[B], B is linked under A. 52 / 114
• 135. Nevertheless Union by size can make the analysis too complex, so people would rather use the rank. Rank It is defined as an upper bound on the height of the tree. Because The use of the rank simplifies the amortized analysis for the data structure!!! 53 / 114
• 138. Thus, we use the balanced union by rank MakeSet(x) 1 p[x] = x 2 rank[x] = 0 Note: Complexity O (1) time Union(A, B) Input: assume that A and B are distinct roots: p[A] = A, p[B] = B, A ≠ B 1 if rank[A] >rank[B] 2 p[B] = A 3 else 4 p[A] = B 5 if rank[A] ==rank[B] 6 rank[B]=rank[B]+1 Note: Complexity O (1) time 54 / 114
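The union-by-rank pseudocode translates directly. A minimal Python sketch (names ours); note that a rank only grows when two trees of equal rank are linked:

```python
def make_set(p, rank, x):
    p[x] = x
    rank[x] = 0

def union_by_rank(p, rank, a, b):
    """Link roots a and b, keeping the taller tree's root.
    Assumes a and b are distinct roots."""
    if rank[a] > rank[b]:
        p[b] = a
    else:
        p[a] = b
        if rank[a] == rank[b]:    # equal heights: the merged tree grows by one
            rank[b] += 1

p, rank = {}, {}
for x in [1, 2, 3]:
    make_set(p, rank, x)
union_by_rank(p, rank, 1, 2)      # equal ranks: 1 under 2, rank[2] becomes 1
union_by_rank(p, rank, 2, 3)      # rank[2] > rank[3]: 3 under 2, rank unchanged
print(p[1], p[3], rank[2])  # → 2 2 1
```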
• 141. Example Now We use the rank for the union Case I The rank of A is larger than the rank of B 55 / 114
• 143. Example Case II The rank of B is larger than the rank of A 56 / 114
• 145. Outline 1 Disjoint Set Representation Definition of the Problem Operations 2 Union-Find Problem The Main Problem Applications 3 Implementations First Attempt: Circular List Operations and Cost Still we have a Problem Weighted-Union Heuristic Operations Still a Problem Heuristic Union by Rank 4 Balanced Union Path compression Time Complexity Ackermann’s Function Bounds The Rank Observation Proof of Complexity Theorem for Union by Rank and Path Compression 57 / 114
• 146. Here is the new heuristic to improve overall performance: Path Compression Find(x) 1 if x ≠ p[x] 2 p[x] = Find(p[x]) 3 return p[x] Complexity O (depth (x)) time 58 / 114
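The compressing Find, sketched in Python (the chain layout and names are ours): after a single call, every node on the search path points directly at the root.

```python
def find(p, x):
    """Find with path compression: every node visited on the way up
    is re-pointed straight at the root."""
    if x != p[x]:
        p[x] = find(p, p[x])
    return p[x]

# build a chain 1 -> 2 -> 3 -> 4 of parent pointers, then compress it
p = {1: 2, 2: 3, 3: 4, 4: 4}
root = find(p, 1)
print(root, p[1], p[2], p[3])  # → 4 4 4 4
```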
  • 148. Example We have the following structure 59 / 114
• 151. Path compression Find(x) should traverse the path from x up to its root. This might as well create shortcuts along the way to improve the efficiency of the future operations. Find(2) 62 / 114
• 152. Outline 1 Disjoint Set Representation Definition of the Problem Operations 2 Union-Find Problem The Main Problem Applications 3 Implementations First Attempt: Circular List Operations and Cost Still we have a Problem Weighted-Union Heuristic Operations Still a Problem Heuristic Union by Rank 4 Balanced Union Path compression Time Complexity Ackermann’s Function Bounds The Rank Observation Proof of Complexity Theorem for Union by Rank and Path Compression 63 / 114
  • 153. Time complexity Tight upper bound on time complexity An amortized time of O(mα(m, n)) for m operations. Where α(m, n) is the inverse of the Ackermann’s function (almost a constant). This bound, for a slightly different definition of α than that given here is shown in Cormen’s book. 64 / 114
• 156. Ackermann’s Function Definition A(1, j) = 2^j where j ≥ 1 A(i, 1) = A(i − 1, 2) where i ≥ 2 A(i, j) = A(i − 1, A(i, j − 1)) where i, j ≥ 2 Note: This is one of several inequivalent but similar definitions of Ackermann’s function found in the literature. Cormen’s book authors give a different definition, although they never really call theirs Ackermann’s function. Property Ackermann’s function grows very fast, thus its inverse grows very slowly. 65 / 114
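The slide's recurrence can be typed in directly, but only the very smallest arguments are computable; already A(3, 2) is a tower of 2s far beyond machine range. A memoized sketch of this variant (the name `A` follows the slides; the code is ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(i, j):
    """The slides' variant of Ackermann's function.
    Safe only for tiny i and j: the values explode immediately."""
    if i == 1:
        return 2 ** j                 # A(1, j) = 2^j
    if j == 1:
        return A(i - 1, 2)            # A(i, 1) = A(i - 1, 2)
    return A(i - 1, A(i, j - 1))      # A(i, j) = A(i - 1, A(i, j - 1))

print(A(1, 3), A(2, 1), A(2, 2), A(2, 3), A(3, 1))  # → 8 4 16 65536 16
```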
• 162. Ackermann’s Function Example A(3, 4) is a tower of 2s whose height is itself a tower of 2s, iterated down to 16. Notation: 2^2^···^2 (10) means a tower of ten 2s. 66 / 114
• 163. Inverse of Ackermann’s function Definition α(m, n) = min { i ≥ 1 | A (i, ⌊m/n⌋) > log n } (3) Note: This is not a true mathematical inverse. Intuition: Grows about as slowly as Ackermann’s function does fast. How slowly? Let k = ⌊m/n⌋; then m ≥ n → k ≥ 1 67 / 114
• 167. Thus First We can show that A(i, k) ≥ A(i, 1) for all i ≥ 1. This is left to you... For Example Consider i = 4, then A(i, k) ≥ A(4, 1) = 2^2^···^2 (a tower of ten 2s) ≈ 10^80. Finally if log n < 10^80, i.e., if n < 2^(10^80) =⇒ α(m, n) ≤ 4 68 / 114
• 172. Instead of Using the Ackermann Inverse We define the following function log∗ n = min { i ≥ 0 | log(i) n ≤ 1 } (4) Here log(i) n means log applied i times: log(log(· · · log n)). Then We will establish O (m log∗ n) as an upper bound. 69 / 114
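The iterated logarithm is trivial to compute directly from its definition. A minimal sketch (the name `log_star` is ours):

```python
import math

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied
    before the value drops to at most 1."""
    i = 0
    while n > 1:
        n = math.log2(n)
        i += 1
    return i

print(log_star(2), log_star(16), log_star(65536))  # → 1 3 4
```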
• 174. In particular Something Notable In particular, we have that log∗ of a tower 2^2^···^2 with k 2s in the exponent equals k + 1. For Example log∗ 2^65536 = log∗ 2^2^2^2^2 = 5 (5) Therefore We have that log∗ n ≤ 5 for all practical purposes. 70 / 114
• 177. The Rank Observation Something Notable Once a node becomes a child of another node, its rank does not change under any subsequent operation. 71 / 114
• 178. For Example The number to the right of each node is its rank MakeSet(1),MakeSet(2),MakeSet(3), ..., MakeSet(10) 1/0 2/0 3/0 4/0 5/0 6/0 7/0 8/0 9/0 10/0 72 / 114
• 179. Example Now, we do Union(6, 1),Union(7, 2), ..., Union(10, 5) 1/1 2/1 3/1 4/1 5/1 6/0 7/0 8/0 9/0 10/0 73 / 114
  • 180. Example Next - Assuming that you are using a FindSet to get the name set Union(1, 2) 3/1 4/1 5/1 8/0 9/0 10/0 2/2 7/01/1 6/0 74 / 114
  • 183. Example Now you give a FindSet(8) 2/2 1/1 6/0 7/0 4/3 9/0 5/1 10/03/1 8/0 77 / 114
  • 184. Example Now you give a Union(4, 5) 2/2 1/1 6/0 7/0 4/3 9/03/1 8/0 5/1 10/0 78 / 114
• 185. Properties of ranks Lemma 1 (About the Rank Properties) 1 ∀x, rank[x] ≤ rank[p[x]]. 2 ∀x with x ≠ p[x], rank[x] < rank[p[x]]. 3 rank[x] is initially 0. 4 rank[x] does not decrease. 5 Once x ≠ p[x] holds, rank[x] does not change. 6 rank[p[x]] is a monotonically increasing function of time. Proof By induction on the number of operations... 79 / 114
• 192. For Example Imagine a MakeSet(x) Then, rank [x] ≤ rank [p [x]], so the property is true after n operations. Then we consider the (n + 1)-th operation, which can be: Case I - FindSet. Case II - Union. The rest are for you to prove It is a good mental exercise!!! 80 / 114
• 197. The Number of Nodes in a Tree Lemma 2 For all tree roots x, size(x) ≥ 2^rank[x] Note size (x) = Number of nodes in the tree rooted at x Proof By induction on the number of link operations: Basis Step Before the first link, all ranks are 0 and each tree contains one node. Inductive Step Consider linking x and y (Link (x, y)) Assume the lemma holds before this operation; we show that it holds after. 81 / 114
  • 198. The Number of Nodes in a Tree Lemma 2 For all tree roots x, size(x) ≥ 2^rank[x]. Note: size(x) = number of nodes in the tree rooted at x. Proof By induction on the number of link operations. Basis Step: Before the first link, all ranks are 0 and each tree contains one node. Inductive Step: Consider linking x and y (Link(x, y)). Assume the lemma holds before this operation; we show that it still holds after. 81 / 114
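The induction above can be exercised directly. Below is a minimal union-by-rank sketch (the names `make_set`, `find_root`, `link` are ours, not from the slides) that tracks size(x) alongside rank[x] and asserts the Lemma 2 invariant, size(root) ≥ 2^rank[root], after every link:

```python
# Minimal union-by-rank sketch; checks size(root) >= 2**rank[root] (Lemma 2).
parent, rank, size = {}, {}, {}

def make_set(x):
    parent[x], rank[x], size[x] = x, 0, 1

def find_root(x):            # plain find; compression is not needed here
    while parent[x] != x:
        x = parent[x]
    return x

def link(x, y):              # x and y must be roots
    if rank[x] > rank[y]:
        x, y = y, x          # keep y as the root of larger (or equal) rank
    parent[x] = y
    size[y] += size[x]
    if rank[x] == rank[y]:   # equal ranks: the new root's rank grows by one
        rank[y] += 1
    assert size[y] >= 2 ** rank[y]   # Lemma 2 invariant
    return y

for i in range(8):
    make_set(i)
r = 0
for i in range(1, 8):        # fold elements 1..7 into the set of 0
    r = link(find_root(i), r)
```

Linking a chain of singletons like this keeps the root's rank at 1 while its size grows, so the inequality is loose; linking equal-rank roots is what pushes ranks up.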
  • 204. Case 1: rank[x] ≠ rank[y] Assume rank[x] < rank[y]. Note: after the link, rank'[x] == rank[x] and rank'[y] == rank[y]. 82 / 114
  • 205. Therefore We have that size'(y) = size(x) + size(y) ≥ 2^rank[x] + 2^rank[y] ≥ 2^rank[y] = 2^rank'[y]. 83 / 114
  • 209. Case 2: rank[x] == rank[y] Assume rank[x] == rank[y]. Note: after the link, rank'[x] == rank[x] and rank'[y] == rank[y] + 1. 84 / 114
  • 210. Therefore We have that size'(y) = size(x) + size(y) ≥ 2^rank[x] + 2^rank[y] = 2^(rank[y]+1) = 2^rank'[y]. Note: in the worst case rank[x] == rank[y] == 0. 85 / 114
  • 215. The number of nodes at a certain rank Lemma 3 For any integer r ≥ 0, there are at most n/2^r nodes of rank r. Proof First fix r. When rank r is assigned to some node x, imagine that you label each node in the tree rooted at x by “x.” By lemma 21.3, 2^r or more nodes are labeled each time. By lemma 21.2, each node is labeled at most once, when its root is first assigned rank r. If there were more than n/2^r nodes of rank r, then more than 2^r · n/2^r = n nodes would be labeled by a node of rank r, a contradiction. 86 / 114
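Lemma 3 can be sanity-checked on a concrete forest. This self-contained sketch (our own code, not the slides') performs pairwise union-by-rank merges on n = 64 nodes and verifies that no rank r is held by more than n/2^r nodes:

```python
# Numeric check of Lemma 3: after any sequence of union-by-rank operations
# on n nodes, at most n / 2**r nodes carry rank r. The union order below
# (binomial-style pairwise merging) is just one example sequence.
n = 64
parent = list(range(n))
rank = [0] * n

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(a, b):
    x, y = find(a), find(b)
    if x == y:
        return
    if rank[x] > rank[y]:
        x, y = y, x
    parent[x] = y
    if rank[x] == rank[y]:
        rank[y] += 1

for step in (1, 2, 4, 8, 16, 32):        # repeatedly merge equal-rank roots
    for i in range(0, n, 2 * step):
        union(i, i + step)

for r in range(max(rank) + 1):           # Lemma 3 bound, rank by rank
    count = sum(1 for v in rank if v == r)
    assert count <= n // 2 ** r, (r, count)
```

This merge order is the adversarial one: it drives ranks up as fast as possible, yet the rank-r population still halves at every level (32 nodes of rank 0, 16 of rank 1, and so on), exactly as the lemma predicts.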
  • 222. Corollary 1 Corollary 1 Every node has rank at most log n. Proof If there were a rank r such that r > log n, then there would be n/2^r < 1 nodes of rank r, a contradiction. 87 / 114
  • 224. Providing the time bound Lemma 4 (Lemma 21.7) Suppose we convert a sequence S of m MakeSet, Union, and FindSet operations into a sequence S' of m' MakeSet, Link, and FindSet operations by turning each Union into two FindSet operations followed by a Link. Then, if sequence S' runs in O(m' log∗ n) time, sequence S runs in O(m log∗ n) time. 88 / 114
  • 225. Proof: The proof is quite easy 1 Since each Union operation in sequence S is converted into three operations in S', m ≤ m' ≤ 3m (6) 2 We have that m' = O(m) 3 Then, if the new sequence S' runs in O(m' log∗ n), this implies that the old sequence S runs in O(m log∗ n) 89 / 114
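The conversion in Lemma 4 is mechanical. A sketch of it (naming is ours) with Union expressed literally as two FindSets plus one Link:

```python
# Lemma 4's conversion: one Union = two FindSet calls + one Link,
# so m operations become at most 3m operations.
parent, rank = {}, {}

def make_set(x):
    parent[x], rank[x] = x, 0

def find_set(x):
    while parent[x] != x:
        x = parent[x]
    return x

def link(x, y):                  # x, y are roots
    if x == y:
        return x
    if rank[x] > rank[y]:
        x, y = y, x
    parent[x] = y
    if rank[x] == rank[y]:
        rank[y] += 1
    return y

def union(a, b):
    # exactly the rewriting used in the lemma
    return link(find_set(a), find_set(b))

for v in "abcd":
    make_set(v)
union("a", "b")
union("c", "d")
union("a", "d")
```

Since the rewritten sequence is at most three times longer, any bound of the form O(m' log* n) on it transfers directly to the original sequence.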
  • 228. Theorem for Union by Rank and Path Compression Theorem Any sequence of m MakeSet, Link, and FindSet operations, n of which are MakeSet operations, is performed in worst-case time O(m log∗ n). Proof First, MakeSet and Link take O(1) time. The key of the analysis is to accurately charge FindSet. 90 / 114
  • 231. For this, we have the following We can do the following Partition ranks into blocks. Put each rank r into block log∗ r, for r = 0, 1, ..., log n (Corollary 1). The highest-numbered block is log∗ (log n) = (log∗ n) − 1. In addition, the cost of FindSet pays for the following situations 1 The FindSet pays for the cost of the root and its child. 2 A bill is given to every node whose parent changes during the path compression!!! 91 / 114
  • 236. Now, define the Block function Define the following Upper Bound Function B(j) ≡ −1 if j = −1; 1 if j = 0; 2 if j = 1; 2^2^···^2 (a tower of j twos) if j ≥ 2. 92 / 114
  • 237. First Something Notable These are going to be the upper bounds for blocks in the ranks Where For j = 0, 1, ..., log∗ n − 1, block j consists of the set of ranks: B(j − 1) + 1, B(j − 1) + 2, ..., B(j) Elements in Block j (7) 93 / 114
  • 239. For Example We have that B(−1) = −1 B(0) = 1 B(1) = 2 B(2) = 2^2 = 4 B(3) = 2^2^2 = 2^4 = 16 B(4) = 2^2^2^2 = 2^16 = 65536 94 / 114
  • 245. For Example Thus, we have — Block j : Set of Ranks — 0 : 0, 1 — 1 : 2 — 2 : 3, 4 — 3 : 5, ..., 16 — 4 : 17, ..., 65536 — ... : ... Note B(j) = 2^B(j−1) for j > 0. 95 / 114
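The table above follows from iterating the recurrence B(j) = 2^B(j−1). A short sketch (the function name `B` matches the slides' notation; the code is ours) that reproduces the values:

```python
# Compute the block bounds B(j) from B(-1) = -1, B(0) = 1, B(j) = 2**B(j-1).
def B(j):
    if j == -1:
        return -1
    v = 1                     # B(0)
    for _ in range(j):        # apply the recurrence j times
        v = 2 ** v
    return v

table = [B(j) for j in range(5)]   # bounds for blocks 0..4
```

Block j then covers the ranks B(j−1)+1 through B(j), matching the table: block 0 holds ranks 0..1, block 3 holds ranks 5..16, and so on.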
  • 247. Example Now you give a Union(4, 5) [figure: a rank forest with nodes labeled node/rank, partitioned into Block 0, Block 1, and Block 2] 96 / 114
  • 248. Finally Given our bound on the ranks, all the blocks from 0 to (log∗ n) − 1 suffice to hold every rank. 97 / 114
  • 249. Charging for FindSets Two types of charges for FindSet(x0) Block charges and Path charges. Charge each node as either: 1) Block Charge 2) Path Charge 98 / 114
  • 250. Charging for FindSets Thus, for find sets The find operation pays for the work done for the root and its immediate child. It also pays for all the nodes which are not in the same block as their parents. 99 / 114
  • 252. Then First 1 All these nodes are children of some other nodes, so their ranks will not change and they are bound to stay in the same block until the end of the computation. 2 If a node is in the same block as its parent, it will be charged for the work done in the FindSet Operation!!! 100 / 114
  • 254. Thus We have the following charges Block Charge : For j = 0, 1, ..., log∗ n − 1, give one block charge to the last node with rank in block j on the path x_0, x_1, ..., x_l. Also give one block charge to the child of the root, i.e., x_{l−1}, and to the root itself, i.e., x_l. Path Charge : Give each node in x_0, ..., x_l a path charge until its parent's rank lies in a block different from its own. 101 / 114
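The operation being charged here is the standard two-pass FindSet with path compression; a minimal self-contained sketch (naming is ours, not the slides'):

```python
# Two-pass FindSet with path compression: after the call, every node on
# the find path x_0, ..., x_l points directly at the root x_l.
parent = {}

def make_set(x):
    parent[x] = x

def find_set(x):
    root = x
    while parent[root] != root:      # pass 1: walk up to the root x_l
        root = parent[root]
    while parent[x] != root:         # pass 2: repoint the whole path
        parent[x], x = root, parent[x]
    return root

for i in range(6):
    make_set(i)
for i in range(5):                   # build the chain 0 -> 1 -> ... -> 5
    parent[i] = i + 1
find_set(0)                          # compresses the entire chain
```

After the single find, every node on the path has the root as its parent, so subsequent finds on those nodes touch only the root and its child, which is exactly what the block/path charging scheme accounts for.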
  • 261. Next Something Notable The number of nodes whose parents are in different blocks is limited by (log∗ n) − 1, making it an upper bound for the charges to the last node with rank in block j. Plus 2 charges for the root and its child. Thus The cost of the block charges for a FindSet operation is upper-bounded by: log∗ n − 1 + 2 = log∗ n + 1. (8) 103 / 114
  • 266. Claim Claim Once a node other than a root or its child is given a Block Charge (B.C.), it will never be given a Path Charge (P.C.) 104 / 114
  • 267. Proof Proof Given a node x, we know that: rank [p [x]] − rank [x] is monotonically increasing ⇒ log∗ rank [p [x]] − log∗ rank [x] is monotonically increasing. Thus, once x and p[x] are in different blocks, they will always be in different blocks because: The rank of the parent can only increase. And the child’s rank stays the same. Thus, the node x will be billed in the first FindSet operation a path charge, and a block charge if necessary. After that, the node x will never be charged a path charge again, because it is already pointing at the representative of its set. 105 / 114
  • 274. Remaining Goal The Total cost of the FindSet’s Operations Total cost of FindSet’s = Total Block Charges + Total Path Charges. We want to show Total Block Charges + Total Path Charges = O(m log∗ n) 106 / 114
  • 276. Bounding Block Charges This part is easy Block numbers range over 0, ..., log∗ n − 1. The number of Block Charges per FindSet is ≤ log∗ n + 1. The total number of FindSet’s is ≤ m. The total number of Block Charges is ≤ m(log∗ n + 1). 107 / 114
  • 280. Bounding Path Charges Claim Let N(j) be the number of nodes whose ranks are in block j. Then, for all j ≥ 0, N(j) ≤ 3n/(2B(j)). Proof By Lemma 3, summing over all ranks in block j: N(j) ≤ Σ_{r=B(j−1)+1}^{B(j)} n/2^r. For j = 0: N(0) ≤ n/2^0 + n/2^1 = 3n/2 = 3n/(2B(0)). 108 / 114
  • 285. Proof of claim For j ≥ 1: N(j) ≤ (n/2^(B(j−1)+1)) Σ_{r=0}^{B(j)−(B(j−1)+1)} 1/2^r < (n/2^(B(j−1)+1)) Σ_{r=0}^{∞} 1/2^r = n/2^(B(j−1)) = n/B(j) — this is where the fact that B(j) = 2^(B(j−1)) is used — < 3n/(2B(j)). 109 / 114
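The geometric-series bound in the claim can be checked with exact arithmetic. This sketch (our own code) sums the Lemma 3 bound n/2^r over the ranks of each block and confirms it never exceeds 3n/(2B(j)):

```python
# Exact numeric check of the claim N(j) <= 3n / (2 * B(j)): sum the
# per-rank bound n / 2**r over the ranks in block j, i.e. over
# r = B(j-1)+1, ..., B(j). Fractions keep the comparison exact.
from fractions import Fraction

def B(j):                            # block bounds, as defined earlier
    if j == -1:
        return -1
    v = 1
    for _ in range(j):
        v = 2 ** v
    return v

n = 1 << 16
for j in range(4):                   # blocks 0..3 cover ranks up to 16
    total = sum(Fraction(n, 2 ** r)
                for r in range(B(j - 1) + 1, B(j) + 1))
    assert total <= Fraction(3 * n, 2 * B(j)), j
```

For j = 0 the bound is tight (n + n/2 = 3n/2); for larger j the geometric tail makes it loose, which is exactly the slack the proof exploits.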
  • 290. Bounding Path Charges We have the following Let P(n) denote the overall number of path charges. Then: P(n) ≤ Σ_{j=0}^{log∗ n − 1} α_j · β_j (9) where α_j is the max number of nodes with ranks in block j, and β_j is the max number of path charges per node of block j. 110 / 114
  • 293. Then, we have the following Upper Bounds By the claim, α_j is upper-bounded by 3n/(2B(j)). In addition, we need to bound β_j, which represents the maximum number of path charges for a node x in block j. Note: Any node in block j that is given a P.C. will still be in block j after all m operations. 111 / 114
  • 297. Now, we bound βj So, every time x is assessed a Path Charge, it gets a new parent with increased rank. Note: x’s rank is not changed by path compression. Suppose x has a rank in block j. Repeated Path Charges to x will ultimately result in x’s parent having a rank in a block higher than j. From that point onward, x is given Block Charges, not Path Charges. Therefore, in the worst case x has the lowest rank in block j, i.e., B(j − 1) + 1, and x’s parents’ ranks successively take on the values B(j − 1) + 2, B(j − 1) + 3, ..., B(j). 112 / 114
  • 302. Finally Hence, x can be given at most B(j) − B(j − 1) − 1 Path Charges. Therefore: P(n) ≤ Σ_{j=0}^{log∗ n − 1} (3n/(2B(j))) (B(j) − B(j − 1) − 1) ≤ Σ_{j=0}^{log∗ n − 1} (3n/(2B(j))) B(j) = (3/2) n log∗ n. 113 / 114
  • 306. Thus FindSet operations contribute O(m(log∗ n + 1) + n log∗ n) = O(m log∗ n) (10) MakeSet and Link contribute O(n). The entire sequence takes O(m log∗ n). 114 / 114
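To see how small the log* factor in this bound really is, here is a sketch of the iterated logarithm (our own helper, base 2): the number of times log2 must be applied before the value drops to 1 or below.

```python
# Iterated logarithm log*(n): how many times log2 must be applied to n
# before the result is <= 1. It grows absurdly slowly, which makes the
# O(m log* n) bound nearly linear for any practical n.
import math

def log_star(n):
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

Even for n = 2^65536 (a number with tens of thousands of digits), log* n is only 5, so for every input that fits in a physical computer the amortized cost per operation is effectively constant.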