Spectral Clustering
Wei Wang @ CSE, UNSW
May 7, 2018
Quadratic Form

Let $A$ be some $n \times n$ matrix.
What is $Ax$? What is the type of the output? What may $x$ represent?
  Some numeric assignment on all nodes in $G$.
  E.g., what if $x_i \in \{0, 1\}$? $x_i \in [0, 1]$? $x_i \in \mathbb{R}$?
What is $x^\top A x$? What is the type of the output? Why is it called a quadratic form?
  $x^\top A x = \sum_{i,j} A_{ij} \cdot (x_i x_j)$
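A quick numeric check of this identity (not from the slides; the matrix and vector here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(4, 4)).astype(float)   # an arbitrary 4x4 matrix
x = rng.normal(size=4)                              # a numeric assignment on 4 nodes

quad = x @ A @ x                                    # x^T A x: a scalar
double_sum = sum(A[i, j] * x[i] * x[j]
                 for i in range(4) for j in range(4))
print(np.isclose(quad, double_sum))                 # True
```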
Unnormalized Graph Laplacian

Let $A$ be the adjacency matrix of a "normal" (unweighted) undirected graph $G$; $V$ are the vertices of $G$ and $E$ are the edges of $G$.
An edge between $v_i$ and $v_j$ is modelled as $(i, j)$ and $(j, i)$, i.e., $A_{ij} = A_{ji} = 1$.
  $A_{ii} = ?$
What about $x^\top I x$?
What about $x^\top D x$, where $D = \mathrm{Diag}(d_1, d_2, \ldots, d_n)$ and $d_i = \deg(v_i)$?
What about $x^\top A x$?
Now what about $2(x^\top D x - x^\top A x)$?
Example

$x^\top L x = \frac{1}{2} \cdot \sum_{e_{ij} \in E} (x_i - x_j)^2$, where $L = D - A$.
  $\ell_2$ (squared) differences between the assignments on the two ends of an edge, summed over all edges.
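A small sketch verifying the identity; the path graph and the vector $x$ here are illustrative, not from the slides:

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # path graph 1-2-3-4
D = np.diag(A.sum(axis=1))                  # degree matrix
L = D - A                                   # L = D - A
x = np.array([3.0, 1.0, -2.0, 0.5])

lhs = x @ L @ x
rhs = 0.5 * sum(A[i, j] * (x[i] - x[j]) ** 2
                for i in range(4) for j in range(4))
print(np.isclose(lhs, rhs))                 # True
```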
Example
[Figure: an example graph on six nodes, shown with its matrix whose rows and columns are labelled $n_1, \ldots, n_6$.]
$\mathbf{1}$ is the all-ones vector.
$L\mathbf{1} = \mathbf{0}$, since every row of $L$ sums to zero (NB: $L^\top = L$) $\implies \lambda_1 = 0$, $v_1 = \mathbf{1}$.
$x^\top L x = ?$
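A minimal demonstration of this fact; the small connected graph here is an assumption, not the slide's six-node example:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # a small connected graph
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
print(np.isclose(vals[0], 0.0))             # True: lambda_1 = 0
print(vecs[:, 0])                           # proportional to the all-ones vector
```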
Binary x induces a Clustering /1
[Figure: the six-node graph with $x_i = 1$ on nodes 1, 2, 3 and $x_i = -1$ on nodes 4, 5, 6.]

$x = (1, 1, 1, -1, -1, -1)^\top$
$x^\top L x = ?$
Binary x induces a Clustering /2
[Figure: the six-node graph with the alternating assignment $x_i = 1$ on nodes 1, 3, 5 and $x_i = -1$ on nodes 2, 4, 6.]

$x = (1, -1, 1, -1, 1, -1)^\top$
$x^\top L x = ?$
$x^\top x = 6$
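The slides' figure is not reproduced here, so as an assumption for illustration take the six-node graph to be two triangles $\{1,2,3\}$ and $\{4,5,6\}$ joined by the single edge $(3,4)$; the two assignments above can then be compared directly:

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0
L = np.diag(A.sum(axis=1)) - A

x_aligned = np.array([1, 1, 1, -1, -1, -1], dtype=float)   # respects the two triangles
x_alt = np.array([1, -1, 1, -1, 1, -1], dtype=float)       # ignores the structure

print(x_aligned @ L @ x_aligned)   # 4.0: only the bridge edge (3,4) is cut
print(x_alt @ L @ x_alt)           # 20.0: five edges are cut
```

Both vectors have the same norm ($x^\top x = 6$), yet $x^\top L x$ is much smaller for the assignment that aligns with the graph's cluster structure.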
Min Cut vs. Normalized Cut
Min cuts are not always desirable.
They are biased towards cutting off small sets of isolated nodes.
Cut: $\mathrm{cut}(A, B) = \sum_{v_i \in A, v_j \in B} w_{ij}$.
Normalized cut: $\mathrm{ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)}$, where $\mathrm{vol}(A) = \sum_{v_i \in A} d_i = \sum_{v_i \in A, v_j \in V} w_{ij}$.
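A small sketch computing cut, vol and ncut, reusing the assumed two-triangles graph from above (unit weights, so $W = A$):

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0

A_idx, B_idx = [0, 1, 2], [3, 4, 5]          # the partition {1,2,3} | {4,5,6}
cut = W[np.ix_(A_idx, B_idx)].sum()          # total weight crossing the cut
vol_A = W[A_idx, :].sum()                    # sum of degrees within A
vol_B = W[B_idx, :].sum()
print(cut, vol_A, vol_B, cut / vol_A + cut / vol_B)   # 1.0 7.0 7.0 ~0.2857
```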
Connection to L
$\mathrm{ncut}(A, B) = \mathrm{cut}(A, B) \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)$
Let $x_i = \frac{1}{\mathrm{vol}(A)}$ if $v_i \in A$, and $x_i = \frac{-1}{\mathrm{vol}(B)}$ otherwise.
$x^\top L x = \sum_{e_{ij} \in E} w_{ij} (x_i - x_j)^2 = 0 + \sum_{v_i \in A, v_j \in B} w_{ij} \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)^2 = \mathrm{cut}(A, B) \left( \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)} \right)^2$
$x^\top D x = (x^\top D) x = \sum_i d_i x_i^2 = \sum_{v_i \in A} \frac{d_i}{\mathrm{vol}(A)^2} + \sum_{v_j \in B} \frac{d_j}{\mathrm{vol}(B)^2} = \frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)}$
Hence $\mathrm{ncut}(A, B) = \frac{x^\top L x}{x^\top D x}$.
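A numeric check of this identity on the same assumed two-triangles graph:

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0
d = W.sum(axis=1)
D, L = np.diag(d), np.diag(d) - W

vol_A, vol_B = d[:3].sum(), d[3:].sum()              # partition {1,2,3} | {4,5,6}
x = np.array([1 / vol_A] * 3 + [-1 / vol_B] * 3)     # the indicator-style vector
print(x @ L @ x / (x @ D @ x))                       # ~0.2857 = cut/vol(A) + cut/vol(B)
```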
Relaxation and Optimization
Minimize $\mathrm{ncut}(A, B) = \frac{x^\top L x}{x^\top D x}$
subject to $x_i \in \left\{ \frac{1}{\mathrm{vol}(A)}, \frac{-1}{\mathrm{vol}(B)} \right\}$.
NP-hard to optimize under the discrete constraint.
Relaxation: grow the feasible region of $x$ and find the minimum value within the enlarged region.
  Allow $x$ to be any real vector? Yes, but that region is too large.
  This gives the constraint $x^\top D \mathbf{1} = 0$, or equivalently $Dx \perp \mathbf{1}$. (You can verify this by plugging in any of the discrete vectors.)
Solution: the eigenvector corresponding to the second smallest eigenvalue of the generalized eigenvalue problem $Lx = \lambda Dx$.
Normalized Laplacian: $L' = D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}}$
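A sketch of the relaxed solution using scipy's generalized symmetric eigensolver (the graph is again the assumed two-triangles example):

```python
import numpy as np
from scipy.linalg import eigh

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0
d = W.sum(axis=1)
L, D = np.diag(d) - W, np.diag(d)

vals, vecs = eigh(L, D)                  # solves L x = lambda D x, ascending eigenvalues
fiedler = vecs[:, 1]                     # eigenvector of the 2nd smallest eigenvalue
print((fiedler > 0).astype(int))         # recovers {1,2,3} | {4,5,6} (labels up to sign)
```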
Spectral Clustering Algorithm Framework
Algorithm SC_recursive_bin_cut(data, k)
  Construct the weighted graph $G$.
  Construct the special graph Laplacian $L$ for $G$.
  Compute the eigenvector corresponding to the smallest non-zero eigenvalue of $L$. This is the new representation of the vertices in a new 1-dimensional space (i.e., an embedding).
  Cluster the vertices in the embedding space according to the objective function.
  For each cluster, recursively call the algorithm if more clusters are needed.
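A hypothetical sketch of this recursive framework; the function name and the policy of which cluster to split next are ours, not from the slides, and it assumes every subgraph stays connected with positive degrees:

```python
import numpy as np
from scipy.linalg import eigh

def sc_recursive_bin_cut(W, k):
    # Start with all vertices in one cluster; bisect until k clusters remain.
    clusters = [np.arange(W.shape[0])]
    while len(clusters) < k:
        # Naive policy (an assumption): always split the largest cluster.
        i = max(range(len(clusters)), key=lambda t: len(clusters[t]))
        c = clusters.pop(i)
        Wc = W[np.ix_(c, c)]                 # induced subgraph
        d = Wc.sum(axis=1)
        L, D = np.diag(d) - Wc, np.diag(d)
        _, vecs = eigh(L, D)                 # generalized eigenpairs of L x = lambda D x
        f = vecs[:, 1]                       # eigenvector of the 2nd smallest eigenvalue
        clusters += [c[f > 0], c[f <= 0]]    # bisect by sign
    return clusters
```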
Spectral Clustering Algorithm Framework
Algorithm SC_k_way_cut(data, k)
  Construct the weighted graph $G$.
  Construct the special graph Laplacian $L$ for $G$.
  Compute the eigenvectors corresponding to the $t$ smallest non-zero eigenvalues of $L$. This is the new representation of the vertices in a new $t$-dimensional space (i.e., an embedding).
  Cluster the vertices in the embedding space using another clustering algorithm (e.g., k-means).
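A corresponding sketch of the k-way variant, using scikit-learn's KMeans for the final step (the name and the default $t = k$ are our assumptions; a connected graph, i.e., a single zero eigenvalue, is assumed):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def sc_k_way_cut(W, k, t=None):
    t = t or k                           # embedding dimension; t = k is a common choice
    d = W.sum(axis=1)
    L, D = np.diag(d) - W, np.diag(d)
    _, vecs = eigh(L, D)                 # generalized eigenpairs, ascending
    embedding = vecs[:, 1:t + 1]         # drop the constant lambda = 0 eigenvector
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```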
Notes on the Algorithms
How to construct the weighted graph if only $n$ objects are given?
  Base it on the similarity or distance among the objects.
  E.g., $w_{ij} = \exp\left( -\frac{\| f(o_i) - f(o_j) \|^2}{2\sigma^2} \right)$, where $f(o)$ is the feature vector of object $o$. One can also induce a sparse graph by capping the raw weights with a threshold (see the sketch below).
Which Laplacian to use?
  Unnormalized graph Laplacian $L = D - W$.
  Normalized graph Laplacian $L = D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}}$.
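A sketch putting these notes together: Gaussian similarities, thresholding for sparsity, and both Laplacians ($\sigma$ and the threshold `eps` are illustrative parameters, not from the slides):

```python
import numpy as np

def build_graph(X, sigma=1.0, eps=1e-3):
    # Pairwise Gaussian similarities from feature matrix X (one row per object).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)             # no self-loops
    W[W < eps] = 0.0                     # cap raw weights -> sparser graph
    return W

def laplacians(W):
    d = W.sum(axis=1)
    L = np.diag(d) - W                                   # unnormalized: D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))               # assumes no isolated vertices
    return L, D_inv_sqrt @ L @ D_inv_sqrt                # normalized: D^-1/2 (D - W) D^-1/2
```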
Comments on Spectral Clustering
Pros:
Usually better quality than other methods.
Can be thought of as (non-linear) dimensionality reduction or embedding.
Freedom to construct a (sparse) G to preserve local
similarity/connectivity.
Only requires some similarity measure.
Could be more efficient than k-means for high-dimensional sparse vectors (esp. if k-means is not fully optimized for such cases).
Cons:
Still need to determine k.
Assumes clusters are of similar sizes.
Does not scale well with large datasets; but more scalable
variants exist.
It is one relaxation of the original NP-hard problem, and may not be the tightest relaxation.