Spectral Clustering
BEYOND K-MEANS: SIMILARITY-BASED CLUSTERING
Mark Crowley
Based on slides by Ahmed Farahat, 2014
University of Waterloo
ECE 657A: Lecture 8 – Clustering
Traditional k-means – Assumptions
• Data instances can be represented in a vector space
[Figure: a data matrix whose columns v1, …, vn are the data instances]
What about complex data structures?
• Data are distributed around centroids
What about this?
Similarity-based Clustering
Idea
• Calculate similarity between pairs of data instances
• Use these similarity measures to cluster data instances
into groups
Why
• Capture non-linearity in the data
• Work on complex data structures (text, strings, trees, etc.)
Two Algorithms
• Graph-based Clustering
• Kernel-based Clustering
GRAPH-BASED CLUSTERING
Clustering as Graph Partitioning
• Construct an undirected graph
• G=(V, E)
• V: data instances
• E: weighted edges between (similar)
instances
• Goal: Divide the graph into sub-graphs such that
  • the sum of weights on edges between sub-graphs is minimized
  • the sum of weights on edges within each sub-graph is maximized
Figure from [Schaeffer 2007]
Graph Cut
• Graph cut between two sub-graphs A and B:
  cut(A, B) = Σ_{i∈A, j∈B} w_ij
  = sum of weights of the edges connecting the two sub-graphs
• Goal: Find two partitions A and B such that the graph cut
  between them is minimum
[Figure: a graph partitioned into two sub-graphs A and B]
How to minimize cut(A,B)?
• Naïve algorithm: Enumerate all possible cuts and find the
  minimum. This is NP-hard.
• Solution: Define a cluster indicator vector f:
  f_i = +1 if i ∈ A, f_i = −1 if i ∈ B
• We can show that (see Proof 1):
  cut(A, B) = (1/4) f^T L f
  where L is the graph Laplacian (defined shortly)
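The naïve approach above can be made concrete with a tiny numeric sketch (not from the lecture; the weights are hypothetical): enumerate every non-trivial subset A of a 4-node graph and pick the one whose cut to the complement is smallest.

```python
import itertools
import numpy as np

# Hypothetical symmetric similarity matrix for a tiny 4-node graph:
# strong edges inside {0,1} and {2,3}, weak edges between them.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])

def cut(W, A):
    """cut(A, B): sum of edge weights between A and its complement B."""
    A = sorted(A)
    B = [i for i in range(len(W)) if i not in set(A)]
    return W[np.ix_(A, B)].sum()

# Enumerate all 2^n - 2 non-trivial subsets (each bipartition appears
# twice, once per side) -- this exponential search is why exact min-cut
# enumeration does not scale.
n = len(W)
subsets = [A for r in range(1, n) for A in itertools.combinations(range(n), r)]
best = min(subsets, key=lambda A: cut(W, A))
print(list(best))  # the weak edges separate {0, 1} from {2, 3}
```

For 4 nodes this is 14 subsets; for 100 nodes it is already astronomically many, which motivates the continuous relaxation on the next slides.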
How to minimize cut(A,B)?
• Find a continuous vector f which minimizes f^T L f
  (entries of f are now real values, e.g., +0.7, −0.36, …)
• Find a threshold T, and divide data instances into two
  clusters as follows: A = {i : f_i ≥ T}, B = {i : f_i < T}
• Note: We have to add constraints on f to avoid the trivial
  solution f = 0 (e.g., constrain the length of f)
Definitions
• W: Similarity matrix, W_ij = similarity between instances i and j
• D: Degree matrix (diagonal), D_ii = Σ_j W_ij
• L: Laplacian matrix, L = D − W
• We can show that for any f (see Proof 2):
  f^T L f = (1/2) Σ_{i,j} W_ij (f_i − f_j)²
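The Proof 2 identity is easy to check numerically. The sketch below (with a random, hypothetical similarity matrix) builds D and L from W and verifies that the quadratic form f^T L f equals the pairwise sum for an arbitrary f.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical symmetric similarity matrix with zero diagonal.
W = rng.random((5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))  # degree matrix: D_ii = sum_j W_ij
L = D - W                   # unnormalized graph Laplacian

f = rng.standard_normal(5)  # any real vector

quad = f @ L @ f
pairwise = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                     for i in range(5) for j in range(5))
print(np.isclose(quad, pairwise))  # True
```

Because the right-hand side is a sum of non-negative terms, the identity also shows L is positive semi-definite, which is what makes the eigenvector machinery on the following slides work.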
Example
• Where is the minimum cut?
Graph cut produces unbalanced partitions
Ratio Cut
• RatioCut(A, B) = cut(A, B)/|A| + cut(A, B)/|B|
• Define a cluster indicator vector f:
  f_i = +sqrt(|B|/|A|) if i ∈ A, f_i = −sqrt(|A|/|B|) if i ∈ B
• We can show that (see Proof 3):
  f^T L f = n · RatioCut(A, B), with f ⊥ 1 and ||f||² = n
Ratio Cut (Cont.)
• Minimizing the ratio cut is equivalent to
  min_f f^T L f   s.t.  f ⊥ 1, ||f||² = n
  (the constraints enforce balanced partitions)
• Solution: f is the eigenvector of L with the 2nd smallest
  eigenvalue (the Fiedler vector)
Normalized Cuts
• NCut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B),
  where vol(A) = Σ_{i∈A} D_ii
• Define a cluster indicator vector f:
  f_i = +sqrt(vol(B)/vol(A)) if i ∈ A, f_i = −sqrt(vol(A)/vol(B)) if i ∈ B
• We can show that (see Proof 4):
  f^T L f = vol(V) · NCut(A, B), with Df ⊥ 1 and f^T D f = vol(V)
Normalized Cut (Cont.)
• Minimizing the normalized cut is equivalent to
  min_f f^T L f   s.t.  Df ⊥ 1, f^T D f = vol(V)
• Solution: f is the eigenvector with the 2nd smallest eigenvalue
  of the generalized eigenproblem L f = λ D f
• or, equivalently, that of the normalized Laplacian matrix
  L_N = D^(−1/2) L D^(−1/2)
Example
Figure from [Ding et al 2008]
SC Algorithm for k Clusters
• Calculate the normalized Laplacian matrix L_N
• Find the k eigenvectors of L_N with the smallest eigenvalues
• Let U be the matrix whose columns are these eigenvectors;
  represent each data instance by its row of U, and normalize
  each row to unit length
• Run k-means on the rows of U
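The four steps above can be sketched end-to-end in a few lines (a sketch, not the lecture's code; the example graph and the simple farthest-point initialization for k-means are my own choices):

```python
import numpy as np

def spectral_clustering(W, k):
    """Normalized Laplacian -> k smallest eigenvectors -> k-means on rows of U."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_N = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian
    _, vecs = np.linalg.eigh(L_N)   # eigh returns eigenvalues in ascending order
    U = vecs[:, :k]                 # k eigenvectors with smallest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # normalize rows
    # Plain Lloyd's k-means on the rows of U, with a deterministic
    # farthest-point initialization (a simple stand-in for k-means++).
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(100):
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Two dense blocks joined by one weak edge.
W = np.array([[0, 1, 1, .01, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [.01, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], float)
labels = spectral_clustering(W, 2)
print(labels)  # nodes 0-2 share one label, nodes 3-5 the other
```

Note that k-means here runs in the k-dimensional eigenvector space, not on the original data, which is what lets the method recover non-convex clusters.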
SC Algorithm for k Clusters
Figure from [Azran 2007]
How to construct graphs?
• Similarity function
• Inner-products or cosine similarities
• Gaussian similarity function: w_ij = exp(−||x_i − x_j||² / (2σ²))
• Application-specific similarity function
• Connectivity
• Fully-connected graph
• k-Nearest Neighbor Graph
• ε-Nearest Neighbor Graph
What can SC do?
Figure from [Bach & Jordan 2006]
Image Segmentation
Figure from [Shi & Malik 2000]
Speech Separation
[Figure: ground truth vs. SC separation, from [Bach & Jordan 2006]]
Finding Communities in Graphs
Figure from [White & Smyth 2005]
Problems
• Calculating the eigenvectors of the similarity matrix between
  data instances is computationally expensive (O(n³)) and
  memory inefficient (O(n²)).
• Solutions:
• Sparsify the similarity matrix
• Approximate the eigenvectors of the Laplacian matrix
• Landmark-based spectral clustering
• Parallel and distributed implementations
• ….
KERNEL-BASED CLUSTERING
K-means
• Works on data points (or vectors)
• What if the data cannot be represented as vectors?
[Figure: a data matrix with columns v1, …, vn]
Kernel Matrices
• Can we estimate how close two instances are without explicit vectors?
• Any algorithm that depends only on inner-products between
  vectors can be "kernelized".
[Figure: kernel matrix K, where K_ij is the inner product of instances i and j in an implicit vector space]
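The kernel matrix idea can be sketched with the Gaussian (RBF) kernel (a sketch; the data are random): every K_ij behaves as an inner product in an implicit feature space, so squared distances there can be read off from kernel entries alone.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """K_ij = <phi(x_i), phi(x_j)> for the Gaussian kernel's implicit map phi."""
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.random.default_rng(0).random((4, 3))
K = rbf_kernel(X)

# Squared distance in the implicit space uses only kernel entries:
# ||phi(x_i) - phi(x_j)||^2 = K_ii - 2 K_ij + K_jj
i, j = 0, 2
d2_implicit = K[i, i] - 2 * K[i, j] + K[j, j]
print(d2_implicit >= 0)  # True: it is a genuine squared distance
```

This is the kernel trick: the algorithm never needs phi(x) itself, only the n × n matrix K.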
What about k-means?
• Initialize: Random partitions
• Repeat
• Update Step: Calculate centroids
• Assignment Step: Assign data points to clusters
• Problem: Cannot calculate explicit vectors for the data
  instances or the centroids
From Distances to Inner-Products
• Squared distances in the implicit space use only inner products:
  ||φ(x_i) − φ(x_j)||² = K_ii − 2 K_ij + K_jj
Assignment Step
• Distance from instance i to the (implicit) centroid m_c of cluster π_c:
  ||φ(x_i) − m_c||² = K_ii − (2/|π_c|) Σ_{j∈π_c} K_ij
                      + (1/|π_c|²) Σ_{j,l∈π_c} K_jl
Kernel k-means
• Initialize: Random partitions
• Repeat until assignments stop changing:
  • Assignment Step: Assign each instance to the cluster with the
    nearest implicit centroid, computed from kernel entries only
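Kernel k-means can be sketched directly from the assignment-step formula (a sketch, not the lecture's code; the blobs are hypothetical, and a deterministic round-robin initial partition stands in for the random one above):

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50):
    """Lloyd-style iterations using only the kernel matrix; centroids stay implicit."""
    n = len(K)
    labels = np.arange(n) % k   # deterministic initial partition
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = np.where(labels == c)[0]
            if len(idx) == 0:
                continue  # empty cluster: leave its distances at infinity
            # ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) sum_{j in c} K_ij
            #                        + (1/|c|^2) sum_{j,l in c} K_jl
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Two well-separated blobs; the RBF kernel makes cross-blob similarity near zero.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(10.0, 0.1, (5, 2))])
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 2.0 ** 2))
labels = kernel_kmeans(K, 2)
print(labels)  # first five points share one label, last five the other
```

Note that the algorithm never forms centroids explicitly: each iteration touches only rows and sub-blocks of K, which is exactly what the assignment-step formula permits.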
Problems
• Calculating pairwise similarities between data instances is
  computationally expensive (O(n²)) and memory inefficient (O(n²)).
• Solutions:
• Sparsify the similarity matrix
• Restrict the centroids to be in the space of a few landmarks
• Parallel and distributed implementations
• ….
References
• Shi & Malik, Normalized Cuts and Image Segmentation, 2000
• Ng et al., On Spectral Clustering: Analysis and an Algorithm, 2002
• Bach & Jordan, Learning Spectral Clustering, with Application to Speech Separation, 2006
• White & Smyth, A Spectral Clustering Approach to Finding Communities in Graphs, 2005
• von Luxburg, A Tutorial on Spectral Clustering, 2007
• Ding et al., Tutorial on Spectral Clustering, ICDM 2004: http://ranger.uta.edu/%7Echqding/Spectral/
• Azran, Tutorial on Spectral Clustering, 2007
• Ghodsi, Course Notes on Spectral Clustering, 2009: http://www.math.uwaterloo.ca/%7Eaghodsib/courses/s09stat946/notes/Spectral%20Clustering.ppt
• Schaeffer, Graph Clustering, Computer Science Review, 1(1), pp. 27–64, 2007
• Dhillon et al., Kernel k-means, Spectral Clustering and Normalized Cuts, KDD, 2004