Mining Social Networks
COMPSCI 753 Kaiqi Zhao
Community detection
§ Girvan-Newman algorithm § Spectral clustering
§ Modularity maximization
24
Community detection using graph partitioning
§ Graph cut
§ Bi-partitioning – Consider partitioning the graph into two parts 𝐴 and 𝐵
𝐴
𝐶𝑢𝑡(𝐴, 𝐵)
𝐵
§ A cut 𝐶𝑢𝑡(𝐴, 𝐵) partition the graph into two disjoint groups 𝐴, 𝐵 § 𝐶𝑢𝑡 𝐴,𝐵 =∑&∈’,)∈*𝑤&) =1+1=2
25
What is a good partition?
§ We want each community to be dense subgraph § Maximize the number of within-group connections
§ We want to make each community has as less connections with other communities as possible
§ Minimize the number of between-group connections 𝐶𝑢𝑡(𝐴, 𝐵)
𝐴
𝐵
26
Graph partitioning criterion
§ Criterion 1: Min-cut
§ Minimize weight of connections between groups:
arg min 𝑐𝑢𝑡(𝐴, 𝐵) H,I
𝐴
Optimal cut
Min-cut
𝐵
§ Problems:
§ Only considers between-group connections § Imbalance groups
27
Graph partitioning criterion
§ Criterion 2: RatioCut
§ Normalize the value by the size of both 𝐴 and 𝐵 § Minimize weight of connections between groups:
𝑅𝑎𝑡𝑖𝑜𝐶𝑢𝑡 𝐴,𝐵 =𝑐𝑢𝑡 𝐴,𝐵 +𝑐𝑢𝑡 𝐴,𝐵 |𝐴| |𝐵|
𝐴
Cut1 11
𝐵 𝑅𝑎𝑡𝑖𝑜𝐶𝑢𝑡 𝐴%&’!, 𝐵%&’! = 9 + 1 ≈ 1.11
𝑅𝑎𝑡𝑖𝑜𝐶𝑢𝑡 𝐴%&'(, 𝐵%&'( = 25 + 25 = 0.8
Cut2
28
Graph partitioning criterion
§ Criterion 3: Normalized-cut (NCut) [Shi-Malik, ’97] § Normalize the value by the volume of both 𝐴 and 𝐵 § Minimize weight of connections between groups:
𝑁𝐶𝑢𝑡 𝐴,𝐵 =𝑐𝑢𝑡 𝐴,𝐵 +𝑐𝑢𝑡 𝐴,𝐵 𝑣𝑜𝑙 𝐴 𝑣𝑜𝑙 𝐵
3 4 5 Cut1 𝑁𝐶𝑢𝑡 𝐴 , 𝐵 = 1 + 1 ≈ 1.032 441 %&’!%&’!311
Degree of a node: 𝑑) =R𝑤)*
*
Volume of a set:
𝑣𝑜𝑙𝐴 =R𝑑) )∈,
𝐴
Cut2
𝐵 𝑁𝐶𝑢𝑡 𝐴%&'(,𝐵%&'( = 2 + 2 = 0.25 16 16
3 2 3 3
Problem: Finding minimum RatioCut and NCut is NP-hard!
29
Spectral graph partitioning
§ Adjacency matrix 𝑾:
§ Each entry 𝑤VW = 1, if (𝑖,𝑗) is an edge, else 0
1
2
3
4
5
6
1
0
1
1
0
1
0
2
1
0
1
0
0
0
3
1
1
0
1
0
0
4
0
0
1
0
1
1
5
1
0
0
1
0
1
6
0
0
0
1
1
0
1
2
3
4
5
6
30
Spectral graph partitioning
§ Degree matrix 𝑫:
§ Diagnoal matrix with 𝑑VV = ∑W 𝑤VW
𝑾
1
2 4 3
𝑫
5
6
1
2
3
4
5
6
1
0
1
1
0
1
0
2
1
0
1
0
0
0
3
1
1
0
1
0
0
4
0
0
1
0
1
1
5
1
0
0
1
0
1
6
0
0
0
1
1
0
1
2
3
4
5
6
1
3
0
0
0
0
0
2
0
2
0
0
0
0
3
0
0
3
0
0
0
4
0
0
0
3
0
0
5
0
0
0
0
3
0
6
0
0
0
0
0
2
31
Spectral graph partitioning
§ Remember the goal of RatioCut:
min 𝑅𝑎𝑡𝑖𝑜𝐶𝑢𝑡(𝐴, 𝐵) = min 𝑐𝑢𝑡(𝐴, 𝐵)( B + B )
Adjacency matrix 𝑾: Each entry 𝑤)* = 1, if
(𝑖, 𝑗) is an edge, else 0 Degree matrix 𝑫:
Diagnoal matrix with
𝑑)) = ∑* 𝑤)*
§ Let 𝒚 = 𝑦B, … , 𝑦Z [ where 𝑦V =
− |H|,𝑖𝑓𝑣V∈𝐵
§ Approx. the RatioCut problem to optimize the following: min^,_𝑅𝑎𝑡𝑖𝑜𝐶𝑢𝑡𝐴,𝐵 →min𝒚[ 𝑫−𝑾𝒚, 𝑠.𝑡.𝒚⊥𝟏
More on theoretical analysis on spectral method:
Ulrike von Luxburg. A Tutorial on Spectral Clustering
|H| |I|
|I| , 𝑖𝑓 𝑣V ∈ 𝐴 |H|
|I|
𝒚∈R!
32
Spectral graph partitioning
min𝒚[ 𝑫−𝑾𝒚, 𝑠.𝑡.𝒚⊥𝟏 𝒚∈R!
§ What is the meaning of min 𝒚[(𝑫 − 𝑾)𝒚? §𝒚[(𝑫−𝑾)𝒚=∑V,W 𝑑VW−𝑤VW 𝑦V𝑦W
§ = ∑ V 𝑑 V V 𝑦 V> − ∑ V , W ∈ b 2 𝑦 V 𝑦 W §=∑V,W∈b(𝑦V>+𝑦W>−2𝑦V𝑦W)=∑V,W∈b 𝑦V−𝑦W
By minimizing 𝒚[(𝑫 − 𝑾)𝒚, we try to make the two ends of an edge assigned to the same label!
>
33
Recap: eigenvalue and eigenvectors
§ Given an 𝑛×𝑛 matrix 𝑴, a 𝑛-dimension column vector 𝒙 and a real value number 𝜆. 𝜆 and 𝒙 are the eigenvalue and eigenvector of 𝑀 if
𝑴𝒙 = 𝜆𝒙
§ If 𝑴 is symmetric, 𝜆 and 𝒙 have the following properties: §Thereare𝑛eigenvalues𝜆B ≤𝜆> ≤⋯≤𝜆Z
§ Any two distinct eigenvectors are orthogonal, i.e., 𝒙V ⊥ 𝒙W , 𝑖 ≠ 𝑗
34
Laplacian matrix
§ The graph partitioning objective min𝒚[ 𝑫−𝑾𝒚, 𝑠.𝑡.𝒚⊥𝟏
𝒚∈R!
§ Let 𝑳 = 𝑫 − 𝑾. 𝑳 is called Laplacian matrix
𝑾𝑫𝑳
1
2
3
4
5
6
1
0
1
1
0
1
0
2
1
0
1
0
0
0
3
1
1
0
1
0
0
4
0
0
1
0
1
1
5
1
0
0
1
0
1
6
0
0
0
1
1
0
1
2
3
4
5
6
1
3
0
0
0
0
0
2
0
2
0
0
0
0
3
0
0
3
0
0
0
4
0
0
0
3
0
0
5
0
0
0
0
3
0
6
0
0
0
0
0
2
1
2
3
4
5
6
1
3
-1
-1
0
-1
0
2
-1
2
-1
0
0
0
3
-1
-1
3
-1
0
0
4
0
0
-1
3
-1
-1
5
-1
0
0
-1
3
-1
6
0
0
0
-1
-1
2
35
Laplacian matrix
§ The graph partitioning objective min 𝒚[𝑳𝒚, 𝑠.𝑡. 𝒚 ⊥ 𝟏
§ Properties of 𝑳 of undirected graph: § Symmetric
§ The smallest eigenvalue 𝜆B = 0, with corresponding eigenvector 𝟏
§ For other eigenvectors 𝒙 = 𝑥B,…,𝑥Z , 𝒙 ⊥ 𝟏, i.e.,
𝑳
1
2
3
4
5
6
1
3
-1
-1
0
-1
0
2
-1
2
-1
0
0
0
3
-1
-1
3
-1
0
0
4
0
0
-1
3
-1
-1
5
-1
0
0
-1
3
-1
6
0
0
0
-1
-1
2
𝒚∈R!
∑Z 𝑥=0 VdB V
36
Laplacian matrix
§ Graph partitioning objective with Laplacian
min 𝒚[𝑳𝒚, 𝑠.𝑡. 𝒚 ⊥ 𝟏 𝒚∈R!
The minimum value is given by the 2nd smallest eigenvalue 𝜆+ of matrix 𝑳, with 𝒚 be the corresponding eigenvector
37
Spectral clustering (bi-partitioning)
§ Compute Laplacian matrix for the graph §𝑳=𝑫−𝑾
§ Perform eigenvalue decomposition on 𝐿 and get the eigenvector 𝒚> corresponds to the 2nd smallest eigenvalue.
§ Use each dimension in 𝒚> denotes the label of a node
§ Assign nodes with positive values in 𝒚> to A and nodes with negative values to B.
38
Example: Spectral Partitioning
Fig from http://www.mmds.org/
Rank in y2
39
Value of y2
Example: Spectral Partitioning
Components of y2
Fig from http://www.mmds.org/
Rank in y2 40
Value of y2
Example: Spectral Partitioning
Components of y1
Components of y3
Fig from http://www.mmds.org/
41
Spectral clustering (k-way)
§ Compute §𝑳=𝑫−𝑾
§ Perform eigenvalue decomposition on 𝑳, construct a matrix
§ 𝒀 = [𝒚𝟐, 𝒚𝟑, … , 𝒚𝒌], each column 𝒚𝒊 is the eigenvector of the 𝑖-th
smallest eigenvalue
§ 𝒀 has 𝑛 rows, each corresponds to a data point
§ Use each row to represent a node §𝑣V =𝒀[𝑖,∶]
§ Apply clustering algorithms (e.g., k-means) on 𝑣B, … , 𝑣Z
42
Example
§ Data: 𝑥2, … , 𝑥3, the corresponding graph is 𝐺 § Cluster into 2 groups
5
𝑥(
𝑥!
0520 2𝑥𝑥
1 𝑊=5001 $2 –
2002 𝐺 0120
𝐿=𝐷−𝑊= 0.705
7 −5 −2 0 −5 6 0 −1 −2 0 4 −2 0 −1 −2 3
𝑦( =
0.789 −0.493 −1
𝑥! = [0.705] 𝑥( = [0.789] 𝑥$ = [−0.493] 𝑥- = [−1]
𝑥- 𝑥$ 0 𝑥! 𝑥( 𝑦( 43
Example
§ Data: 𝑥2, … , 𝑥3, the corresponding graph is 𝐺 § Cluster into 3 groups
5
𝑥(
𝑥!
0520 2𝑥𝑥
1 𝑊=5001 $2 –
2002 𝐺 0120
𝐿=𝐷−𝑊=
7 −5 −2 0 −5 6 0 −1 −2 0 4 −2 0 −1 −2 3
𝑦$
𝑥(
0
0.705 −0.125 0.789 𝑦$ = 0.512
𝑥! = 0.705, −0.125 .
𝑥 –
𝑥$
𝑥! 𝑦(
𝑦( =
−0.493 −1
𝑥( = 0.789, 0.512 . −1.387 𝑥$ = −0.493, −1.387 .
1 𝑥- = −1,1 .
44