Path & Reachability
COMP9312_22T2
– Reachability
Transitive closure
Optimal tree cover
Two-hop labelling
– Shortest Path
Dijkstra’s algorithm
A* algorithm
Floyd-Warshall algorithm
Reachability
Problem formulation
Given an unweighted directed graph G and two nodes u and v, is there a path connecting u to v (denoted u↝v)?
0↝5? YES 0↝2? NO
Directed graph → DAG (directed acyclic graph) by coalescing the strongly connected components
Motivation
• Classical problem in graph theory.
• Studying the influence flow in social networks.
– Even undirected graphs (e.g., Facebook) are converted to directed graphs w.r.t. a certain attribute distribution
• Security: finding possible connections between suspects.
• Biological data: is that protein involved directly or indirectly in the
expression of a gene?
• Primitive for many graph related problems (pattern matching).
An Online Approach
Whether or not u↝v
• Conduct DFS or BFS starting from u
• If the node v is discovered: stop the search and report YES.
• If the stack/queue is empty: report NO.
No index and thus no construction overhead and no extra space consumption
Query time: O(m+n), since the entire graph may be traversed in the worst case
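A minimal C++ sketch of this online approach, assuming the graph is stored as an adjacency list adj; the function name reachable is a placeholder, not from the slides:

#include <vector>
#include <queue>

// Online reachability check via BFS: no index, O(n + m) per query.
bool reachable(const std::vector<std::vector<int>>& adj, int u, int v) {
    if (u == v) return true;
    std::vector<bool> visited(adj.size(), false);
    std::queue<int> q;
    visited[u] = true;
    q.push(u);
    while (!q.empty()) {
        int x = q.front(); q.pop();
        for (int y : adj[x]) {
            if (y == v) return true;          // the node v is discovered: report YES
            if (!visited[y]) {
                visited[y] = true;
                q.push(y);
            }
        }
    }
    return false;                             // the queue is empty: report NO
}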
Index-based methods
1. Transitive closure
Run the Floyd-Warshall algorithm and store all possible query results in a matrix.
2. Tree cover (DAG)
Use spanning trees to store the reachability information that is originally stored in transitive closure in hierarchy.
3. 2-hop labeling
For each node in the graph, assign two label sets for it to store the reachability information that is originally stored in transitive closure.
Index-based methods for directed graph
Most index-based reachability methods assume the directed graph is a DAG (directed acyclic graph), which can be derived by contracting all SCCs (strongly connected components).
Transitive Closure
Transitive Closure (TC)
A transitive closure is a Boolean matrix storing the answers of all possible reachability queries. The size of the matrix is O(n²), where n denotes the number of vertices in the graph.
The original graph G
The transitive closure of G
Transitive Closure (TC)
The transitive closure is a Boolean matrix:
bool tc[num_vertices][num_vertices];

// Initialize the matrix tc: O(n^2)
// tc[i][j] = 1 if there is an edge from i to j, or i == j

// Run Floyd-Warshall
for ( int k = 0; k < num_vertices; ++k ) {
    for ( int i = 0; i < num_vertices; ++i ) {
        for ( int j = 0; j < num_vertices; ++j ) {
            tc[i][j] = tc[i][j] || (tc[i][k] && tc[k][j]);
        }
    }
}
The Floyd-Warshall algorithm will be covered in Topic 2.2 (Shortest Path)
Transitive Closure (TC)
// Run Floyd-Warshall
for ( int k = 0; k < num_vertices; ++k ) {
    for ( int i = 0; i < num_vertices; ++i ) {
        for ( int j = 0; j < num_vertices; ++j ) {
            tc[i][j] = tc[i][j] || (tc[i][k] && tc[k][j]);
        }
    }
}
After iteration k, tc[i][j] is set for every reachable pair (i, j) whose connecting path uses only intermediate vertices from {v_0, v_1, ..., v_k}.
Transitive Closure (TC)
TC(G): each edge indicates the reachability information.
• It can be done by the dynamic-programming Floyd–Warshall algorithm in O(n³).
• It takes O(n²) space TOO BAD
BUT, queries can be answered in constant time 𝑂(1)
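With the matrix materialized, a query is a single array lookup; a sketch reusing the tc array from the code above:

// O(1) query: read the precomputed entry.
bool query(int u, int v) {
    return tc[u][v];
}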
Tree Cover
Reachability in a Tree
What if the DAG we are dealing with is just a tree?
Main idea:
For each node in the tree, we assign a label to indicate the nodes reachable.
Implementation:
Conduct a post-order traversal on the tree and record the post-order number.
For each node, record the minimum post-order number of its descendants.
Reachability in a Tree
Pseudo code for post-order-traversal
post-order-traversal(root):
    for each child v of root, from left to right:
        post-order-traversal(v)   // traverse the subtree rooted at v
    visit root
We use p(u) to denote the post-order number of u.
An example is shown on the right.
Post-order: left – right – root
Reachability in a Tree
For each vertex, we compute the minimum post-order number of its subtree.
For each vertex u, we construct an interval [p(v'), p(u)] as its label, where v' is the descendant of u with the smallest post-order number.
What do you observe?
Property of post-order numbers: if u is a descendant of v, then p(u) < p(v).
Query processing: ?(u↝v) ⟹ check whether u_start ≤ v_end < u_end, i.e., whether p(v) falls within u's interval.
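A minimal C++ sketch of the tree labelling and the query, assuming the tree is given as children lists; the names children, post, low, label and tree_reach are placeholders, and all vectors are assumed to be sized to n+1 before calling label(root):

#include <vector>
#include <algorithm>
#include <climits>

std::vector<std::vector<int>> children;  // children[u]: children of u in the tree (assumed input)
std::vector<int> post, low;              // the label of u is the interval [low[u], post[u]]
int counter = 0;

// Post-order traversal: number u after all of its descendants and
// record the minimum post-order number in u's subtree.
void label(int u) {
    int min_desc = INT_MAX;
    for (int c : children[u]) {
        label(c);
        min_desc = std::min(min_desc, low[c]);
    }
    post[u] = ++counter;                                    // p(u)
    low[u]  = (min_desc == INT_MAX) ? post[u] : min_desc;   // a leaf keeps [p(u), p(u)]
}

// u ↝ v in the tree iff p(v) lies in u's interval
// (≤ on the right so that u trivially reaches itself).
bool tree_reach(int u, int v) {
    return low[u] <= post[v] && post[v] <= post[u];
}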
Quick Exercise
Assign the post-order numbers for this tree:
Quick Exercise
ANSWER: the post-order numbers for this tree are shown in the figure.
Tree Cover
How to generalize the above steps to any DAG?
Main idea:
1. Consider a spanning tree (tree cover) of the DAG.
2. Go through the above steps for the tree.
3. Recover the non-tree edges and use them to pass on the reachability information.
We assume the DAG 𝐺 has only one connected component.
If 𝐺 contains multiple connected components, we connect them to a virtual root node.
A general DAG 𝐺
Tree Cover
The tree T we considered is also a spanning tree of the given DAG G. Thus, step one and step two are already completed.
Now we need to restore the non-tree edges:
Topological sort the vertices.
For each vertex q in reverse topological order:
for each edge (p,q) add the intervals of q to the intervals of p.
(1) We need to consider both tree and non-tree edges. (2) Remove the subsumed intervals. (A code sketch is given below.)
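A C++ sketch of this step, assuming each vertex already carries its tree interval in labels[u] and that in_edges[q] lists every p with an edge (p, q); Interval, add_interval and build_compression_scheme are illustrative names, not from the slides:

#include <vector>
#include <utility>

using Interval = std::pair<int,int>;               // [start, end]
std::vector<std::vector<Interval>> labels;         // labels[u]: intervals of u, initialised with its tree interval
std::vector<std::vector<int>> in_edges;            // in_edges[q]: all p such that (p, q) is an edge (tree or non-tree)

// Add interval x to the set, keeping only non-subsumed intervals.
void add_interval(std::vector<Interval>& set, Interval x) {
    for (const Interval& y : set)
        if (y.first <= x.first && x.second <= y.second) return;   // x is subsumed: nothing to do
    std::vector<Interval> kept;
    for (const Interval& y : set)
        if (!(x.first <= y.first && y.second <= x.second)) kept.push_back(y);  // drop intervals subsumed by x
    kept.push_back(x);
    set.swap(kept);
}

// Process vertices in reverse topological order; for each edge (p, q),
// pass q's intervals on to p.
void build_compression_scheme(const std::vector<int>& reverse_topo_order) {
    for (int q : reverse_topo_order)
        for (int p : in_edges[q])
            for (const Interval& iv : labels[q])
                add_interval(labels[p], iv);
}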
Tree Cover
A topological ordering of the vertices:
a, d, b, c, f, e, g, h
First consider vertex h, which has incoming edges (d, h), (c, h), and (e, h).
Considering (e, h), no need to change the interval of e.
Add edge (d, h): we add the interval associated with h to d.
Tree Cover
Add edge (c, h):
We add the interval associated with h to c.
Tree Cover
The next vertex to consider is g.
Among its incoming edges (d, g), (a, g), and (e, g), we consider (d, g), and (a, g) because (e, g) does not change the interval of e.
Add edge (d, g):
We add the interval [1,1] to d.
Tree Cover
The next vertex to consider is g.
(e, g) does not change the interval of e. We consider (d, g), and (a, g):
Add edge (a, g):
We DO NOT add the interval [1,1] to a. This is because [1,1] is contained by [1,8].
Tree Cover
The next vertex to consider is e.
(b, e) does not change the interval of b. We consider the edge: (d, e)
Add edge (d, e):
We add the interval [1,3] to d.
Since [1,1] and [2,2] are contained by [1,3], we only keep [1,3].
Tree Cover
The next vertex to consider is f.
(d, f) does not change the interval of d.
The next vertex to consider is c.
(a, c) does not change the interval of a.
The next vertex to consider is b.
(a, b) does not change the interval of a.
Its incoming non-tree edge is (d, b).
Add edge (d, b): we add the interval [1,4] to d. Since [1,3] is contained by [1,4], we keep only [1,4].
Tree Cover
The next vertex to consider is d.
(a, d) does not change the interval of a.
The next vertex to consider is a.
It does not have any incoming edges.
How many intervals are used in this compression scheme?
A compression scheme for 𝐺
Tree Cover
What if there is one more edge from h->g?
1. It will change the topological order (process g first then h)
2. Add the interval of g to h
3. When processing the incoming edges of h, remember to propagate the updated intervals!
A compression scheme for 𝐺
Optimal Tree Cover
Are all spanning trees (tree covers) equally good?
Optimality:
the tree cover with the minimum number of intervals in the resulting compression scheme.
An optimal tree cover is shown here. Construct the associated compression scheme.
An optimal tree cover
Optimal Tree Cover
Assign post-order number:
Optimal Tree Cover
For each vertex, we compute the
minimum post-order number of its subtree and assign an interval as the reachability label.
Optimal Tree Cover
Follow reverse topological order and recover the non-tree edges.
A topological ordering of the vertices: a, d, b, c, f, e, g, h
First consider vertex h, which has incoming non-tree edges
(d, h), (c, h) and tree edge (e, h).
(e, h) does not change the interval of e.
Add edge (d, h):
We add the interval associated with h to d.
[2,2] is subsumed by [1,6].
Optimal Tree Cover
Add edge (c, h):
We add the interval associated with h to c.
Optimal Tree Cover
The next vertex to consider is g. We consider (d, g), and (a, g):
(e, g) does not change the interval of e.
Add edge (d, g):
We DO NOT add the interval [1,1] to d because it is subsumed by [1,6].
Optimal Tree Cover
Add edge (a, g):
We DO NOT add the interval [1,1] to a because it is subsumed by [1,8].
Optimal Tree Cover
The next vertex to consider is e. We consider (b, e) and (d, e):
(b, e) does not change the interval of b.
Add edge (d, e):
We DO NOT add the interval [1, 3] to d because it is subsumed by [1, 6].
Optimal Tree Cover
The vertices f, c, and d do not have any incoming edges that can make any changes.
The next vertex to consider is b, which has one incoming non-tree edge (a, b) and tree-edge (d, b).
(d, b) does not change the interval of d.
Add edge (a, b):
We DO NOT add the interval [1, 4] to a because it is subsumed by [1, 8].
Optimal Tree Cover
How many intervals are used in this compression scheme?
Compared to the previous compression scheme, what do you observe?
Computing Optimal Tree Cover
https://dl.acm.org/doi/pdf/10.1145/66926.66950
Intuition: Make the tree like a path (unbalanced)
How to compute the optimal tree cover is optional.
Optimal Tree is not optimal?
Practical Optimization:
The number of intervals in the optimal tree cover may not be the smallest when merging intervals is allowed.
You do not need to merge intervals in the assignment/exam.
A compression scheme for 𝐺
Complexity analysis
Query time: 𝑂(𝑛)
Each vertex 𝑢 has at most 𝑛 intervals. Iterate through them and check if 𝑣 is contained by one of them.
Index construction time: 𝑂(𝑛×𝑚)
The dominating cost: for each non-tree edge (𝑢, 𝑣), attach the intervals of v to u, which takes 𝑂(𝑛) time. The number of non-tree edges is bounded by 𝑂(𝑚). Thus, the time complexity to build a compression scheme is 𝑂(𝑛×𝑚)
Space complexity: O(n²)
In the worst case, the space complexity of a tree cover is the same as the transitive closure, but in practice its storage cost is much smaller.
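A sketch of the query over the interval lists, reusing the assumed labels and post-order numbers from the earlier sketches:

// ?(u ↝ v): scan u's intervals and check whether p(v) falls inside one of them.
// A vertex has at most n intervals, so the worst-case query time is O(n).
bool reach(int u, int v) {
    for (const Interval& iv : labels[u])
        if (iv.first <= post[v] && post[v] <= iv.second) return true;
    return false;
}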
Tree Cover results
Two-Hop Cover
2-Hop Cover
An index which compresses transitive closure…
Intuition: if we choose a node u as a center node, then all u’s ancestors can reach u’s descendants.
Example: if we choose node 1 as a center node, each of its ancestors {0, 2, 3} can reach any node in its descendants {4, 5}.
The original graph G and its transitive closure TC(G)
2-Hop Cover
Based on that, we can label nodes as follows:
– Each node u is assigned two label sets L_in(u) ⊆ V and L_out(u) ⊆ V.
– For each v ∈ L_out(u), node u reaches node v.
– For each v′ ∈ L_in(u), node v′ reaches node u.
A 2-hop cover includes two label sets L_out and L_in that can cover all the edges in TC(G).
A possible 2-hop cover
2-Hop Cover: Query Processing
Now reachability queries can be answered using the labels: ?(u↝v)
if L_out(u) ∩ L_in(v) ≠ ∅ then return true
if L_out(u) ∩ L_in(v) = ∅ then return false
– Time complexity is O(|L_out(u)| + |L_in(v)|)
More about time complexity:
O(|L_out(u)| + |L_in(v)|): hash table
O(log(|L_out(u)|)·|L_out(u)| + log(|L_in(v)|)·|L_in(v)|): sort-merge join
O(min(|L_out(u)|, |L_in(v)|)): precomputed hash table
O(|L_out(u)| + |L_in(v)|): precomputed order + merge join
For example:
L_out(0) ∩ L_in(5) = {1, 4, 0} ∩ {1, 4, 5} ≠ ∅ → YES
L_out(0) ∩ L_in(2) = {1, 4, 0} ∩ {2} = ∅ → NO
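A sketch of the intersection test when both labels are kept as sorted std::vector<int> (the "precomputed order + merge join" case); reach_2hop is an illustrative name:

#include <vector>

// ?(u ↝ v): true iff L_out(u) ∩ L_in(v) ≠ ∅.
// With sorted labels, the merge join runs in O(|L_out(u)| + |L_in(v)|).
bool reach_2hop(const std::vector<int>& lout_u, const std::vector<int>& lin_v) {
    std::size_t i = 0, j = 0;
    while (i < lout_u.size() && j < lin_v.size()) {
        if (lout_u[i] == lin_v[j]) return true;   // a common hop node exists
        if (lout_u[i] < lin_v[j]) ++i; else ++j;
    }
    return false;
}

For the example above, reach_2hop({0, 1, 4}, {1, 4, 5}) returns true and reach_2hop({0, 1, 4}, {2}) returns false.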
2-Hop Cover Index: Minimum VS Minimal
When we say something is minimum, it is the globally smallest. When we say something is minimal, it has no redundancy (no label can be removed while it remains a valid cover).
Computing the minimum 2-hop cover index is NP-hard.
Conceptually, using all reachable vertices as the label is also a 2-hop cover index, but it is not minimal.
2-Hop Cover: the minimal index
Naive Index
Minimal Index
Total-order-based 2-Hop Cover
An algorithm to compute a minimal 2-hop cover
For each node u in the graph, from high degree to low degree:
• add u into both L_in(u) and L_out(u);
• mark u as processed;
• conduct a BFS from u and, for each reached node w:
  – if (u, w) has already been covered: stop exploring the out-neighbors of w;
  – else: add u into L_in(w);
• conduct a reverse BFS from u and, for each reached node w′:
  – if (w′, u) has already been covered: stop exploring the in-neighbors of w′;
  – else: add u into L_out(w′);
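A C++ sketch of this construction, assuming out- and in-adjacency lists adj and radj of the DAG and a vertex order already sorted by decreasing degree; covered, insert_sorted and build are illustrative names. The explicit "mark u as processed" step is kept implicit: the sketch relies on the covered() test alone, mirroring the stopping rule above.

#include <vector>
#include <queue>
#include <algorithm>
#include <iterator>

std::vector<std::vector<int>> adj, radj;   // out- and in-adjacency lists of the DAG (assumed input)
std::vector<std::vector<int>> Lin, Lout;   // the 2-hop labels, kept sorted

// Is the pair (s, t) already covered by the labels built so far?
bool covered(int s, int t) {
    std::vector<int> common;
    std::set_intersection(Lout[s].begin(), Lout[s].end(),
                          Lin[t].begin(),  Lin[t].end(),
                          std::back_inserter(common));
    return !common.empty();
}

void insert_sorted(std::vector<int>& label, int u) {
    label.insert(std::lower_bound(label.begin(), label.end(), u), u);
}

// Process vertices from high degree to low degree.
void build(const std::vector<int>& order_by_degree) {
    int n = (int)adj.size();
    Lin.assign(n, std::vector<int>());
    Lout.assign(n, std::vector<int>());
    for (int u : order_by_degree) {
        insert_sorted(Lin[u], u);
        insert_sorted(Lout[u], u);
        // Forward BFS: add u to L_in(w) for every reached w whose pair (u, w) is not yet covered.
        std::vector<bool> vis(n, false);
        std::queue<int> q;
        vis[u] = true; q.push(u);
        while (!q.empty()) {
            int x = q.front(); q.pop();
            for (int w : adj[x]) {
                if (vis[w]) continue;
                vis[w] = true;
                if (covered(u, w)) continue;    // covered: stop exploring the out-neighbors of w
                insert_sorted(Lin[w], u);
                q.push(w);
            }
        }
        // Reverse BFS: symmetric, adding u to L_out(w') for uncovered ancestors w'.
        std::fill(vis.begin(), vis.end(), false);
        vis[u] = true; q.push(u);
        while (!q.empty()) {
            int x = q.front(); q.pop();
            for (int w : radj[x]) {
                if (vis[w]) continue;
                vis[w] = true;
                if (covered(w, u)) continue;    // covered: stop exploring the in-neighbors of w
                insert_sorted(Lout[w], u);
                q.push(w);
            }
        }
    }
}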
2-Hop Cover – Example
After choosing node 1, we add it to L_in(1), L_out(1), L_in(4), L_in(5), L_out(0), L_out(2), and L_out(3).
2-Hop Cover – Example
Then we choose node 4 and add it to L_in(4) and L_out(4).
We do not add it to L_out(0) or L_out(3), because (0,4) is covered by 1 and (3,4) is covered by 1.
2-Hop Cover – Example
Then we choose node 0 and add it to L_in(0), L_out(0), and L_in(3).
2-Hop Cover – Example
Then we choose node 3 and add it to L_in(3) and L_out(3).
Then we choose node 5 and add it to L_in(5) and L_out(5).
Finally, we choose node 2 and add it to L_in(2) and L_out(2).
Quick Exercise
1. Can you compute the 2-hop cover of this graph? Note that you need to process the nodes in the order of
1, 2, 4, 3, 0
2. Based on the computed 2-hop cover, please compute ?(0↝2) and ?(1↝3).
Quick Exercise
We start with node 1 and add it to L_in(1), L_out(1), L_in(4), L_in(3), L_out(0), and L_out(2).
Quick Exercise
Then, we process node 2 and add it to L_in(2), L_out(2), and L_in(0).
Quick Exercise
Then, we process node 4 and add it to L_in(4), L_out(4), and L_in(3).