Introduction to Machine Learning: Spectral Clustering
Prof. Kutty
Unsupervised Learning: Clustering
Basic idea: we want to group similar points. Intuitively:
– intra-cluster distances → small
– inter-cluster distances → large
Assign each datapoint to exactly one cluster.
(hard) clustering algorithms
• agglomerative clustering
• k-means clustering
• spectral clustering
agglomerative hierarchical clustering review
Agglomerative Clustering
Notation: $C_i$ = set of all points in cluster $i$
1. Assign each point its own cluster: for each point $i$, $C_i = \{\bar{x}^{(i)}\}$
2. Find the closest pair of clusters and merge them; repeat until convergence:
$\arg\min_{i,j} d(C_i, C_j)$ where $i \neq j$
(find the clusters $i$ and $j$ for which the distance between them is minimized)
Linkage criteria (see the sketch after this list):
Single-linkage: $d(C_i, C_j) = \min_{\bar{x} \in C_i,\, \bar{x}' \in C_j} d(\bar{x}, \bar{x}')$
Complete-linkage: $d(C_i, C_j) = \max_{\bar{x} \in C_i,\, \bar{x}' \in C_j} d(\bar{x}, \bar{x}')$
Average-linkage: $d(C_i, C_j) = \frac{1}{|C_i||C_j|} \sum_{\bar{x} \in C_i,\, \bar{x}' \in C_j} d(\bar{x}, \bar{x}')$
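The three linkage criteria above map directly onto SciPy's hierarchical clustering routines. A minimal sketch (the toy datapoints here are my own, not from the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy 2-D datapoints (illustrative only)
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]])

# single-, complete-, and average-linkage agglomerative clustering
for method in ["single", "complete", "average"]:
    Z = linkage(X, method=method, metric="euclidean")  # merge history
    labels = fcluster(Z, t=3, criterion="maxclust")    # cut into 3 clusters
    print(method, labels)
```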
Agglomerative Hierarchical Clustering
Idea: start with each point in its own cluster and build a hierarchy; cutting the resulting dendrogram at a given level yields a clustering (e.g., cut here for 3 clusters).
[Figure: dendrogram over example locations (CSE, Chrysler Center, Mason Hall) with a cut yielding 3 clusters]
in class exercise
For this dataset, with agglomerative clustering and the single-linkage criterion, construct the dendrogram and indicate the cut for 3 clusters.
Assume datapoints in $\mathbb{R}^2$ and Euclidean distance.
1. Assign each point its own cluster: for each point $i$, $C_i = \{\bar{x}^{(i)}\}$
2. Find the closest clusters and merge; repeat until convergence:
$\arg\min_{i,j} d(C_i, C_j)$ where $i \neq j$
in class exercise (solution)
1. Assign each point its own cluster: for each point $i$, $C_i = \{\bar{x}^{(i)}\}$
2. Find the closest clusters and merge; repeat until convergence: $\arg\min_{i,j} d(C_i, C_j)$ where $i \neq j$
Cut here for 3 clusters.
Assume Euclidean distance.
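A sketch of how the dendrogram and the 3-cluster cut could be produced with SciPy; the datapoints and cut height below are placeholders, not the ones from the slide:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# placeholder 2-D datapoints (the slide's dataset is not reproduced here)
X = np.array([[0, 0], [1, 0], [4, 0], [5, 0], [9, 0]], dtype=float)
Z = linkage(X, method="single", metric="euclidean")

dendrogram(Z, labels=["a", "b", "c", "d", "e"])
plt.axhline(y=2.5, linestyle="--")  # cutting here leaves 3 clusters
plt.show()
```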
k-means clustering review
K-Means Clustering (Initialization)
• Select k. Pick random means. – Example with k = 2.
parameter M hyper
K-Means Clustering (Iterations)
1. Assign points to clusters: assign each point to the nearest center.
2. Re-estimate cluster centroids: compute centers for the new cluster assignments.
Repeat: re-assign points to the now-nearest center, then re-estimate the centroids.
K-Means Clustering (Convergence)
Re-assigning points no longer changes anything: the cluster centers have stopped changing.
How do means determine cluster assignments?
By Jahobr – Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=61356414
k-means Clustering convergence
k-means performs coordinate descent on the objective function
$J(\bar{c}, \bar{\mu}) = \sum_{i=1}^{n} \left\| \bar{x}^{(i)} - \bar{\mu}_{c_i} \right\|^2$
k-means is guaranteed to converge*, but not guaranteed to converge to the global minimum.
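A minimal sketch of this coordinate descent, alternately minimizing $J$ over the assignments and over the centers until the centers stop changing (function and variable names are my own):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # random initial means
    for _ in range(n_iters):
        # minimize J over assignments c, holding centers mu fixed
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # minimize J over centers mu, holding assignments c fixed
        new_mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):  # centers stopped changing: converged
            break
        mu = new_mu
    return c, mu
```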
k-means++ Approximation Guarantee. The expected value of the objective returned by K- means++ is never more than O(log K) from optimal and can be as close as O(1) from optimal. Even in the former case, with 2K random restarts, one restart will be O(1) from optimal (with high probability).
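The k-means++ seeding rule behind this guarantee is D² sampling: each new center is drawn with probability proportional to the squared distance to the nearest center chosen so far. A sketch under that description (not a reference implementation):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]  # first center: uniform at random
    for _ in range(k - 1):
        C = np.array(centers)
        # squared distance from each point to its nearest chosen center
        d2 = np.min(np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2), axis=1)
        probs = d2 / d2.sum()            # D^2 weighting
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```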
k-means assumes globular-shaped clusters
Clustering Algorithms
http://scikit-learn.org/stable/modules/clustering.html
spectral clustering
Spectral Clustering
• doesn’t assume globular-shaped clusters
• Reformulates the data clustering problem as a graph partitioning problem
• Broadly:
– first, convert the data into a weighted graph
– next, partition the graph so that each component has weaker across-partition connections and stronger within-partition connections; ensure similarly sized partitions
• Correspondence:
– datapoint → vertex (node)
– similarity between datapoints → edge weight
Data ⟶ Weighted Graph
Gaussian kernel similarity metric:
$w_{ij} = \exp\left( -\frac{\left\| \bar{x}^{(i)} - \bar{x}^{(j)} \right\|^2}{2\sigma^2} \right)$
where $\sigma$ is a hyperparameter.
The rows and columns of the adjacency matrix are indexed by the datapoints $1, \dots, n$.
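A sketch of building $W$ with the Gaussian kernel above (sigma is the hyperparameter; the function name is my own):

```python
import numpy as np

def gaussian_adjacency(X, sigma=1.0):
    # pairwise squared Euclidean distances between all datapoints
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))  # symmetric, w_ij in (0, 1]
    return W
```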
Data → Graph: Adjacency matrix, W
Example: given the datapoints $\{\bar{x}^{(i)}\}_{i=1}^{n}$, compute the similarity between each pair of datapoints, e.g., with the Gaussian kernel similarity metric
$w_{ij} = \exp\left( -\frac{\left\| \bar{x}^{(i)} - \bar{x}^{(j)} \right\|^2}{2\sigma^2} \right)$
How might you get a weight of 0?
[Figure: example graph with edge weights; distant points get near-zero weights, e.g., 0.0015]
$W = \{w_{ij}\}$ for $i = 1, \dots, n$; $j = 1, \dots, n$. $W$ is symmetric and each $w_{ij} \geq 0$.
Definition: Complement
• Suppose $A \subset V$
• Complement of $A$: $\bar{A} = V \setminus A$
• Example: if $A = \{v_1, v_2, v_3\}$ then $\bar{A} = \{v_4, v_5\}$
Definition: Cost of a cut
• Cost of a cut between $A$ and $\bar{A}$:
$\text{cut}(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} w_{ij}$
• Example: if $A = \{v_1, v_2\}$, then $\bar{A} = \{v_3, v_4\}$ and
$\text{cut}(A, \bar{A}) = 0.37 + 0.0015$
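A sketch of this cut cost, with $A$ given as a set of vertex indices into $W$:

```python
import numpy as np

def cut_cost(W, A):
    A = np.asarray(sorted(A))
    A_bar = np.setdiff1d(np.arange(W.shape[0]), A)  # complement of A
    return W[np.ix_(A, A_bar)].sum()                # sum of crossing weights
```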
Spectral Clustering
Goal: Given a graph representing the data, find a minimum cost cut?
Issue: this may not give a reasonable solution; the minimum cut often just isolates a single vertex.
[Figure: example graph with edge weights 0.4, 0.8, 0.9 where the minimum-cost cut separates one vertex instead of giving a balanced partition]
Spectral Clustering
Goal: Given a graph representing the data, find a minimum cost RatioCut → k clustering:
$\arg\min_{A_1, \dots, A_k} \text{RatioCut}(A_1, \dots, A_k)$
where
$\text{RatioCut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{\text{cut}(A_i, \bar{A_i})}{|A_i|}$
is called the "ratio cut"; the $|A_i|$ in the denominator ensures clusters are reasonably large groups.
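A sketch of RatioCut on top of the cut_cost helper above; partition is a list of index sets $A_1, \dots, A_k$ covering all vertices:

```python
def ratio_cut(W, partition):
    # (1/2) * sum_i cut(A_i, complement of A_i) / |A_i|
    return 0.5 * sum(cut_cost(W, A) / len(A) for A in partition)
```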
Idea: Use the eigenvectors and eigenvalues of the graph Laplacian, i.e., the matrix $L = D - W$.
Adjacency Matrix W, Degree Matrix D, and Graph Laplacian L
• Adjacency matrix $W$: $W = \{w_{ij}\}$ for $i = 1, \dots, n$; $j = 1, \dots, n$, where $w_{ij} = \text{sim}(\bar{x}^{(i)}, \bar{x}^{(j)})$
• Degree matrix $D$: diagonal, with $d_{ii} = \sum_{j=1}^{n} w_{ij}$ and $d_{ij} = 0$ for $i \neq j$
• The graph Laplacian is the matrix $L = D - W$
• We are interested in the eigenvectors and eigenvalues of L
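A sketch of $D$ and $L$ from $W$, matching the definitions above:

```python
import numpy as np

def graph_laplacian(W):
    D = np.diag(W.sum(axis=1))  # degree matrix: d_ii = sum_j w_ij
    return D - W                # unnormalized graph Laplacian L
```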
Recall: Eigenvalues and eigenvectors
• A scalar ^ is called an eigenvalue of a matrix I if there is a non-trivial solution O of
• We say that O is the eigenvector corresponding to the eigenvalue ^
Note: an eigenvector cannot be 0., but an eigenvalue can be 0. • Example:
$A = \begin{pmatrix} 3 & 6 \\ 0 & 6 \end{pmatrix}, \quad \lambda = 6, \quad \bar{v} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$
$A\bar{v} = \begin{pmatrix} 3 & 6 \\ 0 & 6 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 12 \\ 6 \end{pmatrix} = 6 \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \lambda\bar{v}$
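A quick numerical check of the definition with NumPy, using the small example matrix above:

```python
import numpy as np

A = np.array([[3.0, 6.0],
              [0.0, 6.0]])
lams, V = np.linalg.eig(A)              # columns of V are eigenvectors
for lam, v in zip(lams, V.T):
    assert np.allclose(A @ v, lam * v)  # A v = lambda v
print(lams)                             # eigenvalues of A
```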
Why does this work? Spectral Clustering for $k$ partitions
$\min_{A_1, \dots, A_k} \text{RatioCut}(A_1, \dots, A_k)$
$= \min_{\bar{h}^{(1)}, \dots, \bar{h}^{(k)} \text{ indicator vectors}} \text{Tr}(H^T L H)$
where column $j$ of $H$ is the indicator vector $\bar{h}^{(j)}$ corresponding to partition $A_j$. Relaxing the indicator vectors to real vectors gives
$\min_{\bar{h}^{(1)}, \dots, \bar{h}^{(k)} \in \mathbb{R}^n} \text{Tr}(H^T L H) \quad \text{subject to } H^T H = I$
The solution to this relaxed problem is the first $k$ eigenvectors of $L$.
Graph Laplacian L
The graph Laplacian is the matrix $L = D - W$.
Properties of $L$:
• $L$ is symmetric and positive semi-definite: $\bar{z}^T L \bar{z} \geq 0$ for all $\bar{z} \in \mathbb{R}^n$
• $L$ has $n$ non-negative real-valued eigenvalues $0 \leq \lambda_1 \leq \dots \leq \lambda_n$
• Fact: the multiplicity of the eigenvalue 0 is the number of connected components in the graph
• Examples
image from: https://towardsdatascience.com/spectral-clustering-aba2640c0d5b
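A sketch verifying the fact above: for a graph with two connected components, eigenvalue 0 of $L$ has multiplicity 2 (the example graph here is my own):

```python
import numpy as np

# two components: {0, 1} and {2, 3}, each joined by a unit-weight edge
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
eigvals = np.linalg.eigvalsh(L)          # L is symmetric: use eigvalsh
print(np.sum(np.isclose(eigvals, 0.0)))  # -> 2, the number of components
```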
Spectral Clustering Example #1
[Graph: vertices A, B, C, D, E; edge A–B with weight 0.5; edge D–E with weight 0.25; C is isolated]
Input: similarity metric, number of clusters k
1. Build adjacency matrix $W$:
$W = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 \\ 0.5 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0.25 \\ 0 & 0 & 0 & 0.25 & 1 \end{pmatrix}$
2. Compute graph Laplacian $L = D - W$:
$L = \begin{pmatrix} 0.5 & -0.5 & 0 & 0 & 0 \\ -0.5 & 0.5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.25 & -0.25 \\ 0 & 0 & 0 & -0.25 & 0.25 \end{pmatrix}$
Spectral Clustering Example #1
Input: similarity metric, number of clusters k
1. Build adjacency matrix $W$
2. Compute graph Laplacian $L = D - W$
3. Compute eigenvectors/eigenvalues of $L$
(eigenvector, eigenvalue) pairs:
$\left( \begin{pmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \\ 0 \\ 0 \\ 0 \end{pmatrix}, 0 \right), \left( \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, 0 \right), \left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix}, 0 \right), \left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{pmatrix}, 0.5 \right), \left( \begin{pmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \\ 0 \\ 0 \\ 0 \end{pmatrix}, 1 \right)$
Spectral Clustering Example #1
Input: similarity metric, number of clusters k
1. Build adjacency matrix $W$
2. Compute graph Laplacian $L = D - W$
3. Compute eigenvectors of $L$
4. Build the matrix with the first k eigenvectors (corresponding to the k smallest eigenvalues) as columns; interpret its rows as new data points
5. Apply k-means to the new data representation
Output: cluster assignments
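A minimal end-to-end sketch of steps 1-5 (unnormalized spectral clustering), reusing the kmeans sketch from the k-means section; all names are my own:

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, seed=0):
    # 1. adjacency matrix via the Gaussian kernel
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    # 2. unnormalized graph Laplacian
    L = np.diag(W.sum(axis=1)) - W
    # 3./4. first k eigenvectors (k smallest eigenvalues) as columns
    _, V = np.linalg.eigh(L)         # eigh returns ascending eigenvalues
    U = V[:, :k]                     # each row is a new k-dim datapoint
    # 5. k-means on the rows of the embedding
    c, _ = kmeans(U, k, seed=seed)   # kmeans defined in the earlier sketch
    return c
```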
intuitively
Eigenvectors (as columns, ordered by eigenvalue; rows correspond to A, B, C, D, E):
$\begin{pmatrix} \frac{\sqrt{2}}{2} & 0 & 0 & 0 & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & 0 & 0 & 0 & -\frac{\sqrt{2}}{2} \\ 0 & 1.00 & 0 & 0 & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0 \end{pmatrix}$
Eigenvalues: $[0, 0, 0, 0.5, 1.0]$
The third column is the indicator vector for the 3rd partition, $\{D, E\}$.
Spectral Clustering Example #1
[Graph: vertices A, B, C, D, E; edge A–B with weight 0.5; edge D–E with weight 0.25; C is isolated]
$W = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 \\ 0.5 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0.25 \\ 0 & 0 & 0 & 0.25 & 1 \end{pmatrix}$
Eigenvectors (rows correspond to A, B, C, D, E):
$\begin{pmatrix} \frac{\sqrt{2}}{2} & 0 & 0 & 0 & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & 0 & 0 & 0 & -\frac{\sqrt{2}}{2} \\ 0 & 1.00 & 0 & 0 & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0 \end{pmatrix}$
Eigenvalues: $[0, 0, 0, 0.5, 1.0]$
Taking the first k = 3 eigenvector columns and reading off the rows gives a k-dimensional embedding of each datapoint.
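The example can be checked numerically: the eigenvalues come out as [0, 0, 0, 0.5, 1.0], and since every eigenvector with eigenvalue 0 is constant on each connected component, A and B get identical embedding rows, as do D and E:

```python
import numpy as np

W = np.array([[1.0, 0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 0.25],
              [0.0, 0.0, 0.0, 0.25, 1.0]])
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 4))         # -> [0. 0. 0. 0.5 1.]
print(np.round(eigvecs[:, :3], 2))  # 3-dim embedding rows for A..E
```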