Spectral Clustering
CSE, UNSW
April 15, 2021
Quadratic Form
Let $A$ be some $n \times n$ matrix.
What is $Ax$? What's the type of the output? What may $x$ represent?
Some numeric assignment to $\{1, 2, \ldots, n\}$ (i.e., think of $x_i$ as $x(i)$).
E.g., what if $x_i \in \{0, 1\}$? $x_i \in [0, 1]$? $x_i \in \mathbb{R}$?
What is $x^\top A x$? What's the type of the output? Why is it called a quadratic form?
$x^\top A x = \sum_{i,j} A_{ij} \cdot x_i x_j$
Exercise:
Rewrite $f_1(x) = (3x_1 - 2x_2)^2 + 4x_3^2$ into a quadratic form.
Rewrite $f_2(x) = (3x_1 - 2)^2 + (x_2 + x_1)^2$ into a quadratic form.
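A quick numerical sanity check of the identity above (a sketch of mine, not from the slides), using a random matrix:

```python
import numpy as np

# Check that x^T A x equals the double sum of A_ij * x_i * x_j.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

lhs = x @ A @ x                                  # quadratic form
rhs = sum(A[i, j] * x[i] * x[j]
          for i in range(4) for j in range(4))   # sum_{i,j} A_ij x_i x_j
assert np.isclose(lhs, rhs)
```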
Unnormalized Graph Laplacian /1
(See the example graph later.) Let $A$ be the adjacency matrix of a "normal" (unweighted) undirected graph $G$, where $V$ is the set of vertices of $G$ and $E$ is the set of edges of $G$.
An edge between $v_i$ and $v_j$ is modelled as both $(i, j)$ and $(j, i)$, i.e., $A_{ij} = A_{ji} = 1$.
$A_{ii} = ?$
Write out $A$ for the example graph.
How many edges are there in the example graph? 16
Unnormalized Graph Laplacian /2
What is $x^\top I_n x$?
What is $x^\top D x$, where $D = \mathrm{Diag}(d_1, d_2, \ldots, d_n)$ and $d_i = \deg(v_i) = \sum_{(i,j) \in E} w_{ij}$?
What is $x^\top A x$?
Now what about $2(x^\top D x - x^\top A x)$?
Unnormalized Graph Laplacian /3
(Recall that each undirected edge appears in $E$ in both directions.)
$x^\top I x = \sum_i x_i x_i$
$x^\top D x = \sum_i d_i \cdot x_i x_i = \sum_{(i,j) \in E} x_i x_i$
$x^\top A x = \sum_{i,j} A_{ij} x_i x_j = \sum_{(i,j) \in E} x_i x_j$
$2(x^\top D x - x^\top A x) = \sum_{(i,j) \in E} (2x_i^2 - 2 x_i x_j) = \sum_{(i,j) \in E} x_i^2 + \sum_{(i,j) \in E} x_j^2 - \sum_{(i,j) \in E} 2 x_i x_j = \sum_{(i,j) \in E} (x_i - x_j)^2$
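The derivation can be checked numerically; the sketch below uses a small 4-node graph of my own choosing (the slides' example graph is a figure):

```python
import numpy as np

# Verify 2(x^T D x - x^T A x) = sum over directed edges (i,j) of (x_i - x_j)^2.
A = np.array([[0, 1, 1, 0],      # adjacency matrix of a small test graph
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))       # degree matrix
x = np.array([0.5, -1.0, 2.0, 0.0])

lhs = 2 * (x @ D @ x - x @ A @ x)
rhs = sum((x[i] - x[j]) ** 2
          for i in range(4) for j in range(4) if A[i, j] == 1)
assert np.isclose(lhs, rhs)
```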
Example
$x^\top L x = \frac{1}{2} \sum_{(i,j) \in E} (x_i - x_j)^2$, where $L = D - A$.
The squared ($\ell_2$) differences between the assignments on the two ends of an edge, summed over all edges.
Example
[Figure: the example graph on six vertices $n_1, \ldots, n_6$, shown together with its adjacency matrix (rows and columns labelled $n_1$ to $n_6$).]
$\mathbf{1}_n$ is the all-ones vector.
$L \mathbf{1}_n = \mathbf{0}$ (NB: $L^\top = L$) $\Longrightarrow \lambda_1 = 0$, $v_1 = \mathbf{1}_n$.
For $x = \mathbf{1}_n$: $x^\top L x = 0$.
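A numerical illustration of this fact (with a stand-in 4-node graph, since the example graph is a figure):

```python
import numpy as np

# The smallest eigenvalue of L = D - A is 0, with eigenvector 1_n.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

assert np.allclose(L @ np.ones(4), 0.0)   # L 1_n = 0
eigvals, _ = np.linalg.eigh(L)            # ascending eigenvalues
assert np.isclose(eigvals[0], 0.0)        # lambda_1 = 0
```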
Binary x induces a Clustering /1
[Figure: the example graph with each vertex labelled $+1$ or $-1$, inducing a two-way split of the vertices.]
$x = ?$, $x^\top L x = ?$
Binary x induces a Clustering /2
[Figure: the example graph with a different $\pm 1$ labelling of the vertices.]
$x = ?$, $x^\top L x = ?$, $x^\top x = ?$
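To make the two worked examples concrete: for any $\pm 1$ assignment, each cut edge contributes $(1 - (-1))^2 = 4$ to $x^\top L x$ and each within-cluster edge contributes 0. A sketch with my stand-in graph:

```python
import numpy as np

# For x_i in {+1, -1}, x^T L x = 4 * (number of undirected cut edges).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = np.array([1, 1, -1, -1])               # cluster {0, 1} vs. {2, 3}
cut_edges = sum(A[i, j] for i in range(4) for j in range(4)
                if x[i] != x[j]) / 2       # each edge counted twice
assert np.isclose(x @ L @ x, 4 * cut_edges)
```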
Min Cut vs. Normalized Cut
Min cuts are not always desirable.
They are biased towards cutting off small sets of isolated nodes.
Cut: $\mathrm{cut}(A, B) = \sum_{v_i \in A, v_j \in B} w_{ij}$.
Normalized cut:
$\mathrm{ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)}$,
where $\mathrm{vol}(A) = \sum_{v_i \in A} d_i = \sum_{v_i \in A, v_j \in V} w_{ij}$.
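A direct transcription of these definitions into code (the function names `cut` and `ncut` are my own):

```python
import numpy as np

def cut(W, in_A):
    """Total weight of edges from A to its complement B."""
    return W[np.ix_(in_A, ~in_A)].sum()

def ncut(W, in_A):
    d = W.sum(axis=1)                            # degrees d_i
    vol_A, vol_B = d[in_A].sum(), d[~in_A].sum() # vol(A), vol(B)
    c = cut(W, in_A)
    return c / vol_A + c / vol_B

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
in_A = np.array([True, True, False, False])      # A = {0, 1}, B = {2, 3}
print(ncut(W, in_A))                             # -> 1.0
```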
Connection to L
$\mathrm{ncut}(A, B) = \mathrm{cut}(A, B) \cdot \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)$
Let $x_i = \frac{1}{\mathrm{vol}(A)}$ if $v_i \in A$, and $x_i = -\frac{1}{\mathrm{vol}(B)}$ otherwise.
$x^\top L x = \frac{1}{2} \sum_{(i,j) \in E} w_{ij} (x_i - x_j)^2 = 0 + \sum_{v_i \in A, v_j \in B} w_{ij} \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)^2$
$x^\top D x = (x^\top D) x = \sum_i d_i x_i^2 = \sum_{v_i \in A} \frac{d_i}{\mathrm{vol}(A)^2} + \sum_{v_j \in B} \frac{d_j}{\mathrm{vol}(B)^2} = \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)}$
Hence $\mathrm{ncut}(A, B) = \frac{x^\top L x}{x^\top D x}$.
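The identity is easy to verify numerically for a concrete partition (same stand-in graph as before):

```python
import numpy as np

# Check ncut(A, B) = (x^T L x) / (x^T D x) for the special x defined above.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
D, L = np.diag(d), np.diag(d) - W

in_A = np.array([True, True, False, False])
vol_A, vol_B = d[in_A].sum(), d[~in_A].sum()
x = np.where(in_A, 1.0 / vol_A, -1.0 / vol_B)

c = W[np.ix_(in_A, ~in_A)].sum()                 # cut(A, B)
ncut_val = c / vol_A + c / vol_B
assert np.isclose((x @ L @ x) / (x @ D @ x), ncut_val)
```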
Relaxation and Optimization
Minimize $\mathrm{ncut}(A, B) = \frac{x^\top L x}{x^\top D x}$, subject to $x_i \in \left\{ \frac{1}{\mathrm{vol}(A)}, -\frac{1}{\mathrm{vol}(B)} \right\}$.
NP-hard to optimize under the discrete constraint.
Relaxation: grow the feasible region of $x$ and find the minimum value within the enlarged region.
Allow $x$ to be any real vector?
Yes, but that region is too large. We keep a constraint that every discrete solution satisfies: $x^\top D \mathbf{1} = 0$, or equivalently $Dx \perp \mathbf{1}$. (You can verify this by plugging in any of the discrete vectors.)
Solution: the eigenvector corresponding to the second smallest eigenvalue of the generalized eigenvalue problem $Lx = \lambda D x$.
Normalized Laplacian:
$L' = D^{-1/2} (D - W) D^{-1/2} = I - D^{-1/2} W D^{-1/2}$
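A minimal sketch of the relaxed solution, assuming `scipy` is available (`scipy.linalg.eigh` solves the generalized symmetric eigenproblem with eigenvalues in ascending order):

```python
import numpy as np
from scipy.linalg import eigh

# Solve L x = lambda D x and take the eigenvector of the second smallest
# eigenvalue; its signs give a relaxed 2-way ncut split.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
D, L = np.diag(d), np.diag(d) - W

eigvals, eigvecs = eigh(L, D)       # generalized eigenproblem
second = eigvecs[:, 1]              # second smallest eigenvector
labels = second > 0                 # sign-based split
print(eigvals[1], labels)
```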
Spectral Clustering Algorithm Framework
Algorithm SC_recursive_bin_cut(data, k):
1. Construct the weighted graph G.
2. Construct the chosen graph Laplacian L for G.
3. Compute the eigenvector corresponding to the smallest non-zero eigenvalue of L. This is the new representation of the vertices in a 1-dimensional space (i.e., an embedding).
4. Cluster the vertices in the embedding space according to the objective function.
5. For each cluster, recursively call the algorithm if more clusters are needed; a sketch follows.
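A hedged sketch of this recursive framework (the function name, the even split of k between the two halves, and the sign-based 1-D clustering step are my choices, not prescribed by the slides):

```python
import numpy as np
from scipy.linalg import eigh

def sc_recursive_bin_cut(W, idx, k):
    """Recursively bipartition graph W until k clusters remain.

    idx holds the original vertex ids of the current subgraph.
    """
    if k <= 1 or len(idx) < 2:
        return [idx]
    d = W.sum(axis=1)
    L, D = np.diag(d) - W, np.diag(d)
    _, vecs = eigh(L, D)                 # generalized eigenproblem
    embedding = vecs[:, 1]               # 1-D embedding of the vertices
    left = embedding <= 0                # split by sign
    parts = []
    for mask, kk in ((left, k // 2), (~left, k - k // 2)):
        parts += sc_recursive_bin_cut(W[np.ix_(mask, mask)], idx[mask], kk)
    return parts

# Usage: sc_recursive_bin_cut(W, np.arange(len(W)), k=2)
```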
Spectral Clustering Algorithm Framework
Algorithm SC_k_way_cut(data, k):
1. Construct the weighted graph G.
2. Construct the chosen graph Laplacian L for G.
3. Compute the eigenvectors corresponding to the t smallest non-zero eigenvalues of L. These give the new representation of the vertices in a t-dimensional space (i.e., an embedding).
4. Cluster the vertices in the embedding space using another clustering algorithm (e.g., k-means); see the sketch below.
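A hedged sketch of the k-way framework, taking t = k − 1 non-trivial eigenvectors (one common choice; the slides leave t open) and using scipy's k-means:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def sc_k_way_cut(W, k, t=None):
    """Embed vertices via t non-trivial eigenvectors, then run k-means."""
    t = t if t is not None else k - 1    # one common choice for t
    d = W.sum(axis=1)
    L, D = np.diag(d) - W, np.diag(d)
    _, vecs = eigh(L, D)                 # ascending eigenvalues
    embedding = vecs[:, 1:t + 1]         # skip the trivial eigenvector
    _, labels = kmeans2(embedding, k, minit='++')
    return labels
```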
Notes on the Algorithms
How do we construct the weighted graph if only n objects are given?
Base it on the similarity or distance among the objects. E.g., $w_{ij} = \exp\left( -\frac{\| f(o_i) - f(o_j) \|^2}{2\sigma^2} \right)$, where $f(o)$ is the feature vector of object $o$. One can also induce a sparse graph by capping the raw weights with a threshold.
Which Laplacian to use?
Unnormalized graph Laplacian $L = D - W$.
Normalized graph Laplacian $L = D^{-1/2} (D - W) D^{-1/2}$.
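A sketch of the construction step (σ and the sparsification threshold are tuning parameters, not values from the slides):

```python
import numpy as np

def build_graph(X, sigma=1.0, threshold=0.0):
    """Gaussian similarity graph over rows of X (the feature vectors f(o))."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)             # no self-loops
    W[W < threshold] = 0.0               # cap raw weights -> sparse graph
    return W
```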
Comments on Spectral Clustering
Pros:
Usually better quality than other methods.
Can be thought of as (non-linear) dimensionality reduction or embedding.
Freedom to construct a (sparse) G that preserves local similarity/connectivity.
Only requires some similarity measure.
Could be more efficient than k-means for high-dimensional sparse vectors (esp. if k-means is not fully optimized for that case).
Cons:
Still need to determine k.
Assumes clusters are of similar sizes.
Does not scale well to large datasets, though more scalable variants exist.
It is only one relaxation of the original NP-hard problem, and may not be the tightest relaxation.