1 Spam farm
COMPSCI 753 Algorithms for Massive Data Semester 2, 2020
Tutorial – Graph
Kaiqi Zhao
Assume that two spam farmers, each having p = 1 support page and 1 target page, agree to link their spam farms. Assume that each target page has page rank x + 1−β without any
n
spam farm. Compute the page rank of the target page when
1. there is a bidirectional link between each target page and its own support page. 2. there is a bidirectional link between each target page and each support page.
Which case produces higher pagerank for each target page?
Solution: Denote the rank of each support page as z and the rank of each target page as y. The first case is the same as in our lecture, because each spam farm works on their own:
y = x + β2y + (βp + 1)(1 − β), n
wherethenumberofsupportpagep=1. Thenwegety= x +1. 1−β2 n
1
In the second case, for each support page, we have incoming edges from the two target pages, each contributes y , thus we have:
2
z = βy + (1 − β). (1) n
Similarly, for each target page, we have incoming edges from the two support pages, each contributes z , thus we have:
2
y = x + βz + (1 − β). (2) n
By substituting (1) to (2), we get the following: y = x + βz + (1 − β)
n
= x + β[βy + (1 − β)] + (1 − β) nn
= x + β2y + β(1 − β) + (1 − β) nn
(3)
Thus the pagerank of a target page y is: y = x + 1 . The pagerank of each target page 1−β2 n
is exactly the same for both cases.
2 Edge betweenness
Compute the edge betweenness of the edges in the following graph.
Solution:
Use Brandes’ algorithm. Notice that using any of the four nodes as root will result in the same hierarchy.
• Using 1 as root, EB(a) = EB(b) = 1.5 and EB(c) = EB(d) = 0.5 • Using 2 as root, EB(a) = EB(c) = 1.5 and EB(b) = EB(c) = 0.5 • Using 3 as root, EB(b) = EB(d) = 1.5 and EB(a) = EB(c) = 0.5
2
• Using 4 as root, EB(c) = EB(d) = 1.5 and EB(a) = EB(b) = 0.5
To sum up and divided by 2 (each path was considered twice u → … → v and v → … → u),
EB(a) = EB(b) = EB(c) = EB(d) = 2.
3 Spectral clustering
Given the following adjacency matrix of a graph:
1 0.8 0 0
0.8 1 0 0 W = 0 0 1 0.5
0 0 0.5 1
1. Compute the eigenvalues of the Laplacian. You may need to compute the determinant
(refer to https://en.wikipedia.org/wiki/Determinant).
2. What is the best RatioCut for bi-partitioning the graph? Give the two resulting
communities.
Solution: First, we need to compute the degree matrix D and then Laplacian matrix L:
1.8000 0.8−0.80 0 0 1.8 0 0 −0.8 0.8 0 0
D=0 0 1.5 0,L= 0 0 0.5 −0.5
0 0 0 1.5 0 0 −0.5 0.5 Apply the eigen equation |λI − L| = 0:
λ−0.80.8 0 0
0.8λ−0.80 0 |λI−L|= 0 0 λ−0.5 0.5
0 0 0.5λ−0.5
λ−0.8 0 0 0.8 0 0
=(λ − 0.8) 0 λ − 0.5 0.5 − 0.8 0 λ − 0.5 0 0 . 5 λ − 0 . 5 0 0 . 5
=(λ − 0.8)2((λ − 0.5)2 − 0.25) − 0.82((λ − 0.5)2 − 0.25) =(λ2 − 1.6λ)(λ2 − λ)
=λ2(λ − 1.6)(λ − 1) = 0
0.5 λ − 0 . 5
3
So, we have the four roots in ascending order as λ1 = λ2 = 0,λ3 = 1,λ4 = 1.6.
To get the best RatioCut, we need to use λ2 = 0. Let v = [v1, v2, v3, v4]T be the correspond- ing eigenvector: Lv = λ2v = 0. Then we have:
0.8v1 − 0.8v2 = 0
−0.8v1 + 0.8v2 = 0 0.5v −0.5v =0
34
According to above, the eigenvector should have v1 = v2 and v3 = v4. So, node 1 and 2 should have the same community label, and node 3 and node 4 should have the same label. Because we only care about the sign and without loss of generality, we assume v1 = v2 > 0. Then, we have two cases:
For the first case, we assign all the nodes with the same community label, that is not what we want as mentioned in the lecture, because it results in imbalanced communities. For the second case, we assign the first two nodes to one community, and the remaining two nodes to the other. This is the best RatioCut we can do, with cost of zero. The reason is that the original graph has two disconnected components with the same size.
4 Influence Spread
In the lectures, we discussed the influence spread on directed graph. But it is very straight- forward to be applied on undirected graphs. In the undirected graph below, the number on each edge denotes the probability the edge is live. Let the seed set S = {1}, consider calculating the influence spread under the independent cascade model using the expectation over the deterministic graphs.
−0.5v3 + 0.5v4 = 0
v1 =v2 >0andv3 =v4 >0 v1 =v2 >0andv3 =v4 <0
4
Let X = {X12, X13, X24, X34} be the binary label (live=1, blocked=0) for the edges. Fill in the following contingency table with corresponding values. Give the influence spread of the seed set S.
Solution:
X12, X13, X24, X34
0,0,0,0 0.8*0.6*0.5*0.8 = 0.192
0,0,0,1 0.8*0.6*0.5*0.2 = 0.048 0,0,1,0 0.8*0.6*0.5*0.8 = 0.192 0,0,1,1 0.8*0.6*0.5*0.2 = 0.048 0,1,0,0 0.8*0.4*0.5*0.8 = 0.128 0,1,0,1 0.8*0.4*0.5*0.2 = 0.032 0,1,1,0 0.8*0.4*0.5*0.8 = 0.128 0,1,1,1 0.8*0.4*0.5*0.2 = 0.032 1,0,0,0 0.2*0.6*0.5*0.8 = 0.048 1,0,0,1 0.2*0.6*0.5*0.2 = 0.012 1,0,1,0 0.2*0.6*0.5*0.8 = 0.048 1,0,1,1 0.2*0.6*0.5*0.2 = 0.012 1,1,0,0 0.2*0.4*0.5*0.8 = 0.032 1,1,0,1 0.2*0.4*0.5*0.2 = 0.008 1,1,1,0 0.2*0.4*0.5*0.8 = 0.032 1,1,1,1 0.2*0.4*0.5*0.2 = 0.008
σ(S) = 0.48 + (0.316) ∗ 2 + (0.112) ∗ 3 + (0.092) ∗ 4 = 1.816 5 Submodularity
1
1
1
2
3
2
4
2
2
3
4
3
4
4
4
prob[X ] #nodes reachable from S, σX(S) 1
Given that f(S) and g(S) are two non-negative submodular functions:
1. Let α and β be any non-negative real numbers, is σ2(S) = αf(S)+βg(S) submodular?
Show your proof if yes, otherwise, give a counter example.
2. LetA⊂BandB\AbethesetofelementsinBbutnotinA. Showthatf(B)−f(A)≤
v∈B\A f(v). (We use notation f(v) to denote f({v}) for convenience.) Solution:
Let S and T are two sets with S ⊂ T. And v be any node that v ∈/ S and v ∈/ T. 1. Compute the difference in marginal gains:
[σ2(S ∪ {v}) − σ2(S)] − [σ2(T ∪ {v}) − σ2(T )] =α[f(S ∪ {v}) − f(S) − f(T ∪ {v}) + f(T)]
+β[g(S ∪{v})−g(S)−g(T ∪{v})+g(T)] 5
Note that, because f(S) is submodular, f(S ∪ {v}) − f(S) − f(T ∪ {v}) + f(T) ≥ 0, for the same reason, g(S ∪{v})−g(S)−g(T ∪{v})+g(T) ≥ 0. Given that α ≥ 0 and β ≥ 0, then [σ2(S ∪ {v}) − σ2(S)] − [σ2(T ∪ {v}) − σ2(T )] ≥ 0. Therefore, σ2(S) is submodular.
2. We first show that f(S∪{v})−f(S) ≤ f(v) for a set S and any v ∈/ S. By submodularity of f, we have:
f(S ∪ {v}) − f(S) ≤ f(∅ ∪ {v}) − f(∅) = f(v) − f(∅) ≤ f(v). (4) The last inequity use the fact that f(∅) ≥ 0.
LetB\A={v1,v2,...,vk}. Then:
f(B) − f(A) =f(A ∪ {v1, v2, ..., vk}) − f(A) =f(A∪{v1,v2,...,vk})−f(A∪{v2,...,vk})+f(A∪{v2,...,vk})−f(A)
Consider S = A ∪ {v2, ..., vk}, apply (4)
≤f(v1)+f(A∪{v2,...,vk})−f(A)
=f(v1)+f(A∪{v2,...,vk})−f(A∪{v3,...,vk})+f(A∪{v3,...,vk})−f(A)
Consider S = A ∪ {v3, ..., vk}, apply (4) ≤f(v1)+f(v2)+f(A∪{v3,...,vk})−f(A)
≤f(v1) + f(v2) + ... + f(vk) = f(v) v∈B\A
6