Spectral Clustering
Wei Wang @ CSE, UNSW
May 7, 2018
Quadratic Form

Let $A$ be some $n \times n$ matrix.
What is $Ax$? What is the type of the output? What may $x$ represent?
  Some numeric assignment on all nodes in $G$.
  E.g., what if $x_i \in \{0, 1\}$? $x_i \in [0, 1]$? $x_i \in \mathbb{R}$?
What is $x^\top A x$? What is the type of the output? Why is it called a quadratic form?
  $x^\top A x = \sum_{i,j} A_{ij} \cdot (x_i x_j)$
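A quick numeric check of this identity (not from the slides; the matrix and vector here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(4, 4)).astype(float)   # an arbitrary 4x4 matrix
x = rng.normal(size=4)                              # a numeric assignment on 4 nodes

quad = x @ A @ x                                    # x^T A x: a scalar
double_sum = sum(A[i, j] * x[i] * x[j]
                 for i in range(4) for j in range(4))
print(np.isclose(quad, double_sum))                 # True
```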
Unnormalized Graph Laplacian

Let $A$ be the adjacency matrix of a "normal" (unweighted) undirected graph $G$; $V$ are the vertices of $G$ and $E$ are the edges of $G$.
An edge between $v_i$ and $v_j$ is modelled as $(i, j)$ and $(j, i)$, i.e., $A_{ij} = A_{ji} = 1$.
  $A_{ii} = ?$
What about $x^\top I x$?
What about $x^\top D x$, where $D = \mathrm{Diag}(d_1, d_2, \ldots, d_n)$ and $d_i = \deg(v_i)$?
What about $x^\top A x$?
Now what about $2(x^\top D x - x^\top A x)$?
Example

$x^\top L x = \frac{1}{2} \cdot \sum_{e_{ij} \in E} (x_i - x_j)^2$, where $L = D - A$.
  $\ell_2$ (squared) differences between the assignments on the two ends of an edge, summed over all edges.
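A small sketch verifying the identity; the path graph and the vector $x$ here are illustrative, not from the slides:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # path graph 1-2-3-4
D = np.diag(A.sum(axis=1))                  # degree matrix
L = D - A                                   # L = D - A
x = np.array([3.0, 1.0, -2.0, 0.5])

lhs = x @ L @ x
rhs = 0.5 * sum(A[i, j] * (x[i] - x[j]) ** 2
                for i in range(4) for j in range(4))
print(np.isclose(lhs, rhs))                 # True
```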
Example
[Figure: an example graph on six nodes, shown with its matrix whose rows and columns are labelled $n_1, \ldots, n_6$.]
$\mathbf{1}$ is the all-ones vector.
$L\mathbf{1} = \mathbf{0}$, since every row of $L$ sums to zero (NB: $L^\top = L$) $\implies \lambda_1 = 0$, $v_1 = \mathbf{1}$.
$x^\top L x = ?$
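A minimal demonstration of this fact; the small connected graph here is an assumption, not the slide's six-node example:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # a small connected graph
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
print(np.isclose(vals[0], 0.0))             # True: lambda_1 = 0
print(vecs[:, 0])                           # proportional to the all-ones vector
```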
Binary x induces a Clustering /1
[Figure: the six-node graph with $x_i = 1$ on nodes 1, 2, 3 and $x_i = -1$ on nodes 4, 5, 6.]

$x = (1, 1, 1, -1, -1, -1)^\top$
$x^\top L x = ?$
Binary x induces a Clustering /2
[Figure: the six-node graph with the alternating assignment $x_i = 1$ on nodes 1, 3, 5 and $x_i = -1$ on nodes 2, 4, 6.]

$x = (1, -1, 1, -1, 1, -1)^\top$
$x^\top L x = ?$
$x^\top x = 6$
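The slides' figure is not reproduced here, so as an assumption for illustration take the six-node graph to be two triangles $\{1,2,3\}$ and $\{4,5,6\}$ joined by the single edge $(3,4)$; the two assignments above can then be compared directly:

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0
L = np.diag(A.sum(axis=1)) - A

x_aligned = np.array([1, 1, 1, -1, -1, -1], dtype=float)   # respects the two triangles
x_alt = np.array([1, -1, 1, -1, 1, -1], dtype=float)       # ignores the structure

print(x_aligned @ L @ x_aligned)   # 4.0: only the bridge edge (3,4) is cut
print(x_alt @ L @ x_alt)           # 20.0: five edges are cut
```

Both vectors have the same norm ($x^\top x = 6$), yet $x^\top L x$ is much smaller for the assignment that aligns with the graph's cluster structure.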
Min Cut vs. Normalized Cut
Min cuts are not always desirable.
They are biased towards cutting off small sets of isolated nodes.
Cut: $\mathrm{cut}(A, B) = \sum_{v_i \in A, v_j \in B} w_{ij}$.
Normalized cut: $\mathrm{ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)}$, where $\mathrm{vol}(A) = \sum_{v_i \in A} d_i = \sum_{v_i \in A, v_j \in V} w_{ij}$.
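A small sketch computing cut, vol and ncut, reusing the assumed two-triangles graph from above (unit weights, so $W = A$):

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0

A_idx, B_idx = [0, 1, 2], [3, 4, 5]          # the partition {1,2,3} | {4,5,6}
cut = W[np.ix_(A_idx, B_idx)].sum()          # total weight crossing the cut
vol_A = W[A_idx, :].sum()                    # sum of degrees within A
vol_B = W[B_idx, :].sum()
print(cut, vol_A, vol_B, cut / vol_A + cut / vol_B)   # 1.0 7.0 7.0 ~0.2857
```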
Connection to L
$\mathrm{ncut}(A, B) = \mathrm{cut}(A, B) \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)$
Let $x_i = \frac{1}{\mathrm{vol}(A)}$ if $v_i \in A$, and $x_i = \frac{-1}{\mathrm{vol}(B)}$ otherwise.
$x^\top L x = \sum_{e_{ij} \in E} w_{ij} (x_i - x_j)^2 = 0 + \sum_{v_i \in A, v_j \in B} w_{ij} \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)^2 = \mathrm{cut}(A, B) \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)^2$
$x^\top D x = (x^\top D) x = \sum_i d_i x_i^2 = \sum_{v_i \in A} \frac{d_i}{\mathrm{vol}(A)^2} + \sum_{v_j \in B} \frac{d_j}{\mathrm{vol}(B)^2} = \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)}$
Hence $\mathrm{ncut}(A, B) = \frac{x^\top L x}{x^\top D x}$.
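A numeric check of this identity on the same assumed two-triangles graph:

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0
d = W.sum(axis=1)
D, L = np.diag(d), np.diag(d) - W

vol_A, vol_B = d[:3].sum(), d[3:].sum()              # partition {1,2,3} | {4,5,6}
x = np.array([1 / vol_A] * 3 + [-1 / vol_B] * 3)     # the indicator-style vector
print(x @ L @ x / (x @ D @ x))                       # ~0.2857 = cut/vol(A) + cut/vol(B)
```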
Relaxation and Optimization
Minimize $\mathrm{ncut}(A, B) = \frac{x^\top L x}{x^\top D x}$
subject to $x_i \in \left\{ \frac{1}{\mathrm{vol}(A)}, \frac{-1}{\mathrm{vol}(B)} \right\}$.
NP-hard to optimize under the discrete constraint.
Relaxation: grow the feasible region of $x$ and find the minimum value within the enlarged region.
  Allow $x$ to be any real vector? Yes, but that region is too large.
  This gives the constraint $x^\top D \mathbf{1} = 0$, or equivalently $Dx \perp \mathbf{1}$. (You can verify this by plugging in any of the discrete vectors.)
Solution: the eigenvector corresponding to the second smallest eigenvalue of the generalized eigenvalue problem $Lx = \lambda Dx$.
Normalized Laplacian: $L' = D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$
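A sketch of the relaxed solution using scipy's generalized symmetric eigensolver (the graph is again the assumed two-triangles example):

```python
import numpy as np
from scipy.linalg import eigh

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0
d = W.sum(axis=1)
L, D = np.diag(d) - W, np.diag(d)

vals, vecs = eigh(L, D)                  # solves L x = lambda D x, ascending eigenvalues
fiedler = vecs[:, 1]                     # eigenvector of the 2nd smallest eigenvalue
print((fiedler > 0).astype(int))         # recovers {1,2,3} | {4,5,6} (labels up to sign)
```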
Spectral Clustering Algorithm Framework
Algorithm SC_recursive_bin_cut(data, k)
  Construct the weighted graph $G$.
  Construct the special graph Laplacian $L$ for $G$.
  Compute the eigenvector corresponding to the smallest non-zero eigenvalue of $L$. This is the new representation of the vertices in a new 1-dimensional space (i.e., an embedding).
  Cluster the vertices in the embedding space according to the objective function.
  For each cluster, recursively call the algorithm if more clusters are needed.
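A hypothetical sketch of this recursive framework; the function name and the policy of which cluster to split next are ours, not from the slides, and it assumes every subgraph stays connected with positive degrees:

```python
import numpy as np
from scipy.linalg import eigh

def sc_recursive_bin_cut(W, k):
    # Start with all vertices in one cluster; bisect until k clusters remain.
    clusters = [np.arange(W.shape[0])]
    while len(clusters) < k:
        # Naive policy (an assumption): always split the largest cluster.
        i = max(range(len(clusters)), key=lambda t: len(clusters[t]))
        c = clusters.pop(i)
        Wc = W[np.ix_(c, c)]                 # induced subgraph
        d = Wc.sum(axis=1)
        L, D = np.diag(d) - Wc, np.diag(d)
        _, vecs = eigh(L, D)                 # generalized eigenpairs of L x = lambda D x
        f = vecs[:, 1]                       # eigenvector of the 2nd smallest eigenvalue
        clusters += [c[f > 0], c[f <= 0]]    # bisect by sign
    return clusters
```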
Spectral Clustering Algorithm Framework
Algorithm SC_k_way_cut(data, k)
  Construct the weighted graph $G$.
  Construct the special graph Laplacian $L$ for $G$.
  Compute the eigenvectors corresponding to the $t$ smallest non-zero eigenvalues of $L$. This is the new representation of the vertices in a new $t$-dimensional space (i.e., an embedding).
  Cluster the vertices in the embedding space using another clustering algorithm (e.g., k-means).
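A corresponding sketch of the k-way variant, using scikit-learn's KMeans for the final step (the name and the default $t = k$ are our assumptions; a connected graph, i.e., a single zero eigenvalue, is assumed):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def sc_k_way_cut(W, k, t=None):
    t = t or k                           # embedding dimension; t = k is a common choice
    d = W.sum(axis=1)
    L, D = np.diag(d) - W, np.diag(d)
    _, vecs = eigh(L, D)                 # generalized eigenpairs, ascending
    embedding = vecs[:, 1:t + 1]         # drop the constant lambda = 0 eigenvector
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```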
Notes on the Algorithms
How to construct the weighted graph if only $n$ objects are given?
  Base it on the similarity or distance among the objects.
  E.g., $w_{ij} = \exp\left( -\frac{\| f(o_i) - f(o_j) \|^2}{2\sigma^2} \right)$, where $f(o)$ is the feature vector of object $o$. One can also induce a sparse graph by capping the raw weights with a threshold (see the sketch below).
Which Laplacian to use?
  Unnormalized graph Laplacian $L = D - W$.
  Normalized graph Laplacian $L = D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}}$.
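A sketch putting these notes together: Gaussian similarities, thresholding for sparsity, and both Laplacians ($\sigma$ and the threshold `eps` are illustrative parameters, not from the slides):

```python
import numpy as np

def build_graph(X, sigma=1.0, eps=1e-3):
    # Pairwise Gaussian similarities from feature matrix X (one row per object).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)             # no self-loops
    W[W < eps] = 0.0                     # cap raw weights -> sparser graph
    return W

def laplacians(W):
    d = W.sum(axis=1)
    L = np.diag(d) - W                                   # unnormalized: D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))               # assumes no isolated vertices
    return L, D_inv_sqrt @ L @ D_inv_sqrt                # normalized: D^-1/2 (D - W) D^-1/2
```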
Comments on Spectral Clustering
Pros:
Usually better quality than other methods.
Can be thought of as (non-linear) dimensionality reduction or embedding.
Freedom to construct a (sparse) G to preserve local
similarity/connectivity.
Only requires some similarity measure.
Could be more efficient than k-means for high-dimensional sparse vectors (esp. if k-means is not fully optimized for such cases).
Cons:
Still need to determine k.
Assumes clusters are of similar sizes.
Does not scale well with large datasets; but more scalable
variants exist.
It is one relaxation of the original NP-hard problem, and may not be the tightest relaxation.