程序代写代做代考 algorithm data structure Spectral Clustering

Spectral Clustering

BEYOND K-MEANS:
SIMILARTY-BASED CLUSTERING
Mark Crowley
Based on slides by Ahmed Farahat 2014
University of Waterloo

ECE 657A: Lecture 8 – ClusteringMark CrowleyMark Crowley ECE 657A: Lecture 8 – Clustering

Traditional k-means –Assumptions

• Data instances can be represented in a vector space

v1

v2

vn v1v2 vn

What about
complex data
structures?

• Data are distributed around centroids
What about this?

Similarity-based Clustering
Idea
• Calculate similarity between pairs of data instances
• Use these similarity measures to cluster data instances

into groups
Why
• Capture non-linearity in the data
• Work on complex data structures (text, strings, trees…etc)
Two Algorithms
• Graph-based Clustering
• Kernel-based Clustering

GRAPH-BASED CLUSTERING

Clustering as Graph Partitioning
• Construct an undirected graph

• G=(V, E)
• V: data instances
• E: weighted edges between (similar)

instances

• Goal: Divide graph into sub-graphs
• sum of weights on edges between each

two sub-graphs is minimum
• sum of weights on edges within sub-

graphs is maximum
Figure from

[Schaeffer 2007]

Graph Cut
• Graph cut between two sub-graphs:

= sum of weights of the edges connecting the two-sub-graphs

• Goal: Find two partitions such that the graph cut between
them are minimum

A
B

How to minimize cut(A,B)?
• Naïve algorithm: Enumerate all possible cuts, find the

minimum NP-hard
• Solution: Define a cluster indicator vector f

• We can show that (see Proof 1):

A
B

+1
-1

How to minimize cut(A,B)?
• Find a continuous vector f which minimizes

+0.7
-0.36

• Find a threshold T, and divide data instances into two
clusters as follows:

• Note: We have to add constraints on f to avoid the simple
solution f=0 (e.g., maximize the length of f)

Definitions
• W: Similarity matrix
• D: Degree matrix (diagonal)

• L: Laplacian matrix

• We can show that for any f (see Proof 2):

Example
• Where is the minimum cut?

Graph cut produces
unbalanced partitions

Ratio Cut

• Define a cluster indicator vector f

• We can show that (see Proof 3):

Ratio Cut (Cont.)
• Minimizing the ratio cut is equivalent to

s.t.
to enforce
balanced
partitions

• Solution: f is the 2nd smallest eigenvector of L

Normalized Cuts

• Define a cluster indicator vector f

• We can show that (see Proof 4):

Normalized Cut (Cont.)
• Minimizing the normalized cut is equivalent to

s.t.

• Solution: f is the 2nd smallest eigenvector of the
generalized eigenproblem:

• or that of the normalized Laplacian matrix

Example

Figure from
[Ding et al 2008]

SC Algorithm for k Clusters
• Calculate the normalized Laplacian matrix LN

• Find the 2nd smallest k eigenvectors of LN

• Represent data instances in the space of V, and
normalize data vectors

• Run k-means on the rows of U

SC Algorithm for k Clusters

Figure from
[Azran 2007]

How to construct graphs?
• Similarity function

• Inner-products or cosine similarities

• Gaussian similarity function

• Application-specific similarity function
• Connectivity

• Fully-connected graph
• k-Nearest Neighbor Graph
• ε-Nearest Neighbor Graph

What SC can do?

Figure from
[Bach & Jordan 2006]

Image Segmentation

Figure from
[Shi & Malik 2000]

Speech Separation

Truth

SC

Figure from
[Bach & Jordan 2006]

Finding Communities in Graphs

Figure from
[White & Smyth 2005]

Problems
• Calculating the eigenvectors of the similarity matrix

between data instances is computationally expensive
(O(n3)) and memory inefficient (O(n2)).

• Solutions:
• Sparsify the similarity matrix
• Approximate the eigenvectors of the Laplacian matrix
• Landmark-based spectral clustering
• Parallel and distributed implementations
• ….

KERNEL-BASED CLUSTERING

K-means
• Works on data points (or vectors)

• What if the data cannot be represented as vectors?

Data matrix

v1

v2

vn
v1 v2 vn

Kernel Matrices
• Can we estimate how close two vectors are?

• Any algorithm that depends on inner-products between
vectors can be “kernelized”.

Kernel matrix
j

Implicit vector space

i

What about k-means?
• Initialize: Random partitions
• Repeat

• Update Step: Calculate centroids

• Assignment Step: Assign data points to clusters

• Problem: Cannot calculate vectors for data instances and
centroids

From Distances to Inner-Products

Assignment Step

Kernel k-means
• Initialize: Random partitions
• Repeat

Problems
• Calculating pairwise similarities between data instances

is computationally expensive (O(n2)) and memory
inefficient (O(n2)).

• Solutions:
• Sparsify the similarity matrix
• Restrict the centroids to be in the space of a few landmarks
• Parallel and distributed implementations
• ….

Reference
• Shi & Malik, Normalized Cuts and Image Segmentation, 2000
• Ng et al., On Spectral Clustering: Analysis and an algorithm ,

2002
• Bach & Jordan, Learning Spectral Clustering, With Application

To Speech Separation, 2006
• White & Smyth, Spectral Clustering Approach To Finding

Communities in Graphs, 2005
• von Luxburg, A Tutorial on Spectral Clustering, 2007
• Ding et al. Tutorial on Spectral Clustering. ICDM 2004 (Link)
• Azran. Tutorial on Spectral Clustering. 2007 (Link)
• Ghodsi, Course Notes on Spectral Clustering, 2009 (Link)
• Schaeffer, Graph clustering, Computer Science Review, 1(1)

pp. 27-64, 2007
• Dhillon et al. Kernel k-means, spectral clustering and

normalized cuts. KDD, 2004.

http://ranger.uta.edu/%7Echqding/Spectral/

http://www.math.uwaterloo.ca/%7Eaghodsib/courses/s09stat946/notes/Spectral%20Clustering.ppt

BEYOND K-MEANS:
SIMILARTY-BASED CLUSTERING
Traditional k-means – Assumptions
Similarity-based Clustering
GRAPH-BASED CLUSTERING
Clustering as Graph Partitioning
Graph Cut
How to minimize cut(A,B)?
How to minimize cut(A,B)?
Definitions
Example
Ratio Cut
Ratio Cut (Cont.)
Normalized Cuts
Normalized Cut (Cont.)
Example
SC Algorithm for k Clusters
SC Algorithm for k Clusters
How to construct graphs?
What SC can do?
Image Segmentation
Speech Separation
Finding Communities in Graphs
Problems
KERNEL-BASED CLUSTERING
K-means
Kernel Matrices
What about k-means?
From Distances to Inner-Products
Assignment Step
Kernel k-means
Problems
Reference