Mining Social Networks
COMPSCI 753 Kaiqi Zhao
Community detection
§ Girvan-Newman algorithm § Spectral clustering
§ Modularity maximization
45
Modularity
§ The actual fraction of the edges that fall within the given groups minus the expected fraction if edges were distributed at random
§ Note: The node degree and number of edges in the graph should be invariant in random graphs
§ Example 1
4 3
2
1311 2424
2433 stubs Random graphs
Group 2
Actual Group 1
46
Expected number of edges
§ Compute the expected number of edges between any two nodes 𝑣 and 𝑤
… …
v … …w
𝑑/ 𝑑0
§ For each stub of 𝑣 can connect one of the 2𝑚 − 1 stubs §Let𝑋& betheeventthatthei-thstubof𝑣linkstoany
stubof𝑤.Then 𝑑,
𝐸 𝑋& =1⋅𝑝 𝑋& =1 +0⋅𝑝(𝑋& =0)=2𝑚−1
§ The expected number of edges between the two nodes:
𝐸 >𝑋& =>𝐸[𝑋&]= 𝑑!𝑑, ≈𝑑!𝑑, & & 2𝑚−12𝑚
47
Modularity
§ Given adjacency matrix 𝑀 of the graph, the difference between the actual and expected number of edges between two nodes 𝑣 and 𝑤
𝑀tu − v”v# >w
§ Suppose we a set 𝐶 of communities, the modularity is:
𝑄x = B ∑V∈x ∑t,u∈V(𝑀tu − v”v#) = ∑V∈x(𝑒VV − 𝑎V>) >w >w
𝑒)* = ∑/∈),0∈* 2!” is the fraction of edges connect community 𝑖 to 𝑗 (3
𝑎) = ∑* 𝑒),* is the fraction of edges that has at least one end point in 𝑖
48
Modularity maximization
§ Find a community partition such that the modularity is maximized
§ Directly optimizing the modularity is NP-hard – we have exponential
different numbers of partitions
§ Greedy algorithm (Newman, 2003)
§ Hierarchical clustering
§ Each time we merge two nodes if grouping them results in the largest Δ𝑄
49
Greedy algorithm
§ Each node is considered as a community
§ Merge 2 nodes that results in the largest Δ𝑄
Δ𝑄 = 𝑒BC +𝑒CB −2𝑎B𝑎C = 2(𝑒BC −𝑎B𝑎C) § Update modularity 𝑄
50
Example
§ Calculate Δ𝑄 = 2(𝑒VW − 𝑎V𝑎W)
1,3 ⇒ 2 0.1 − 0.2 ∗ 0.2 = 0.12 2,3 ⇒ 2 0.1 − 0.1 ∗ 0.2 = 0.16 1,5 ⇒ 2 0.1 − 0.2 ∗ 0.3 = 0.08 4,5 ⇒ 2 0.1 − 0.1 ∗ 0.3 = 0.14
5,6 ⇒ 2 0.1 − 0.3 ∗ 0.1 = 0.14
§ Max Δ𝑄 = 0.16, combine {2,3} § The new 𝑄 = −0.04
5
6
0 0 0.1 0 0.1 0
0 0 0.1 0 0 0 0.1 0.1 0 0 0 0 0 0 0 0 0.1 0
𝑎)
1
2
3
4
𝑒)*:
0.1 0 0 0.1 0 0.1
0 0 0 0 0.1 0
51
Example
§ Calculate Δ𝑄 = 2(𝑒VW − 𝑎V𝑎W)
1,{2,3} ⇒2 0.1−0.2∗0.3 =0.08 1,5 ⇒2 0.1−0.2∗0.3 =0.08
4,5 ⇒ 2 0.1 − 0.1 ∗ 0.3 = 0.14
5,6 ⇒ 2 0.1 − 0.3 ∗ 0.1 = 0.14
§ Max Δ𝑄 = 0.14, combine {4,5} or {5,6}. § Let’s say we first combine {5,6}
§ The new 𝑄 = 0.1
1
5
2 4 6 3
0 0.1 0 0.1 0 0.1 0.2 0 0 0 𝑒)*: 0 0 0 0.1 0
0.1 0 0.1 0 0.1 0 0 0 0.1 0
52
Example
§ Calculate Δ𝑄 = 2(𝑒VW − 𝑎V𝑎W)
1,{2,3} ⇒2 0.1 − 0.2 ∗ 0.3 =0.08 1,{5,6} ⇒2 0.1 − 0.2 ∗ 0.4 =0.04 4,{5,6} ⇒2 0.1 − 0.1 ∗ 0.4 =0.12
5
2 4 3
1
6
§ Max Δ𝑄 = 0.12, combine {4,5,6}. § The new 𝑄 = 0.22
0 0.1 0 0.1 0.1 0.2 0 0
𝑒)*:
0.1 0 0.1 0.2
0 0 0 0.1
53
Example
§ Calculate Δ𝑄 = 2(𝑒VW − 𝑎V𝑎W)
1,{2,3} ⇒2 0.1 − 0.2 ∗ 0.3 =0.08 1,{4,5,6} ⇒2 0.1−0.2∗0.5 =0
1
3
5
2
4
6
0 0.1 0.1 𝑒)*: 0.1 0.2 0
0.1 0 0.4
§ Max Δ𝑄 = 0.08, combine {1,2,3}. § The new 𝑄 = 0.3
54
Example
§ Calculate Δ𝑄 = 2(𝑒VW − 𝑎V𝑎W)
{1,2,3},{4,5,6} ⇒2 0.1−0.5∗0.5 =−0.3
1
2
3
5
6
4
§ Max Δ𝑄 = −0.3, combine all in a group. 𝑒)*: §Thenew𝑄=0
This algorithm generates a hierarchy of community!
Time complexity – O((|E|+|V|)|V|)
0.4 0.1 0.1 0.4
55