
Spectral Clustering
Introduction

Illustrate the key idea of spectral clustering

Define basic graph notations useful for spectral clustering

Revisiting k-means & Mixture Models
|K-means uses "hard" membership, while mixture models allow "soft" membership.
|Both take a feature/vector representation of the data as input ➔ e.g., Euclidean distance is one natural (dis)similarity measure.
-What if the input data is NOT represented in feature/vector format?
• E.g., graph data.
• E.g., objects with only pair-wise similarities (like individuals in a social network ➔ community detection).

Revisiting k-means & Mixture Models
|In both k-means and mixture models, we look for compact clustering structures.
|In some cases, connected-component structures may be more desirable.

Source: Ng, A.Y., Jordan, M.I., and Weiss, Y. "On spectral clustering: Analysis and an algorithm." Advances in Neural Information Processing Systems, 2002.

Spectral Clustering
|A family of methods for finding such similarity-based clusters
-"Spectral": uses the eigenvalues (spectrum) of the similarity matrix of the data.
-Also known as graph clustering or similarity-based clustering.
|The objects to be clustered are not assumed to lie in a vector space.
-The primary feature is the similarity between objects.
-For any pair of objects i and j, we have a value s(i, j) measuring their similarity; all such values form the similarity matrix.
➔ Graphs are intuitive for representing/visualizing such data.

Graph Representation
|Definition: A graph G = (V, E) is defined by V, a set of N vertices, and E, a set of edges.
[Figure: an undirected graph and a directed graph]
|In spectral clustering, we consider undirected graphs.

Graph Representation (1/4)
|Adjacency matrix W of an undirected graph
- N × N symmetric binary matrix
- The rows and columns are indexed by the vertices, and the entries represent the edges of the graph:
  w_{i,j} = 1 if vertices i and j are connected; w_{i,j} = 0 if they are not connected
- Simple graph ➔ zero diagonal (no self-loops)

Graph Representation (2/4)
|Weighted adjacency matrix (sometimes called the affinity matrix)
- Allows values other than 0 or 1
- Each edge is weighted by the pairwise similarity:
  w_{i,j} = s(i, j) if i and j are connected; w_{i,j} = 0 if they are not connected
|w_{i,j} may be defined through some kernel function, e.g., a Gaussian kernel.
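As an illustration (not part of the original slides), a minimal sketch of building such an affinity matrix with a Gaussian kernel; the function name gaussian_affinity and the bandwidth parameter sigma are illustrative choices, and a fully connected graph with a zero diagonal is assumed:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Weighted adjacency (affinity) matrix from feature vectors X (n x d):
    w[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for i != j, 0 on the diagonal."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # simple graph: no self-loops
    return W
```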

Graph Representation (3/4)
|Degree matrix D of an undirected graph
- N × N diagonal matrix that contains information about the degree of each vertex.
- Degree d(v_i) of a vertex v_i: the number of edges incident to the vertex.
  • Extended to the sum of the weights of the edges incident to the vertex.
- So, we have:
  D = diag(d(v_1), ..., d(v_N))

Graph Representation (4/4)
|Laplacian matrix L of an undirected graph
- L = D − W (degree matrix minus affinity matrix); this is the unnormalized Laplacian.
- L is symmetric and positive semi-definite.
- L has N non-negative real-valued eigenvalues.
- The smallest eigenvalue is 0; the corresponding eigenvector is the all-ones vector (all elements equal to 1).
- The smallest non-zero eigenvalue of L is called the spectral gap.
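A small numerical check of these properties (added here as a sketch, assuming numpy); the helper name unnormalized_laplacian and the 4-vertex path graph are purely illustrative:

```python
import numpy as np

def unnormalized_laplacian(W):
    """Degree matrix D and unnormalized Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))   # (weighted) degree of each vertex on the diagonal
    return D, D - W

# Tiny example: a 4-vertex path graph 0-1-2-3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D, L = unnormalized_laplacian(W)
eigvals, eigvecs = np.linalg.eigh(L)  # symmetric matrix -> real eigenvalues, ascending order
print(eigvals)        # first value is ~0, all others are non-negative
print(eigvecs[:, 0])  # proportional to the all-ones vector
```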

Spectral Clustering
A Graph Cut Formulation

Objectives
Define the graph partition formulation
Learn how to solve a simple graph partition problem

Clustering as Graph Partition/Cut
|Find a partition of the graph such that the edges between different groups have very low weight, while the edges within a group have high weight.
|E.g., the minimum cut.
|More generally, consider weighted edges.
[Figure: an example bipartition with CutSize = 2]

2-way Spectral Graph Partitioning
|Weighted adjacency matrix W
-w_{i,j}: the weight between two vertices i and j
|(Cluster) membership vector q:
  q_i = 1 if i ∈ Cluster A, and q_i = −1 if i ∈ Cluster B
|Objective:
  q = argmin_{q ∈ {−1,1}^n} CutSize, where
  CutSize = J = (1/4) Σ_{i,j} (q_i − q_j)² w_{i,j}
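A short sanity check of why this objective measures the cut (added for clarity): for q_i ∈ {−1, 1}, (q_i − q_j)² = 4 when i and j fall in different clusters and 0 when they fall in the same cluster, so only edges crossing the cut contribute to J. Minimizing J over q ∈ {−1, 1}^n is therefore the minimum-cut problem (up to a constant factor that depends on whether the double sum runs over ordered or unordered pairs).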

Solving the Optimization Problem
q = argmin_{q ∈ {−1,1}^n} (1/4) Σ_{i,j} (q_i − q_j)² w_{i,j}
|Directly solving the above problem requires a combinatorial search ➔ exponential complexity.
|How can we reduce the computational complexity?

Relaxation Approach
|Key difficulty: each q_i has to be either −1 or 1.
-Relax q_i to be any real number.
-Impose the constraint: Σ_{i=1}^n q_i² = n
J = (1/4) Σ_{i,j} (q_i − q_j)² w_{i,j} = (1/4) Σ_{i,j} (q_i² − 2 q_i q_j + q_j²) w_{i,j}
  = (1/4) Σ_i 2 q_i² Σ_j w_{i,j} − (1/4) Σ_{i,j} 2 q_i q_j w_{i,j}
  = (1/2) Σ_i q_i² d_i − (1/2) Σ_{i,j} q_i w_{i,j} q_j = (1/2) Σ_{i,j} q_i (d_i δ_{i,j} − w_{i,j}) q_j
where d_i = Σ_j w_{i,j} and D ≡ [d_i δ_{i,j}]
➔ J = (1/2) q^T (D − W) q

Relaxation Approach (cont’d)
|The final problem formulation:
  q = argmin J = argmin q^T (D − W) q,
  subject to Σ_{i=1}^n q_i² = n
|Solution: the eigenvector of D − W corresponding to its second smallest eigenvalue:
  (D − W) q = λ_2 q
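A minimal sketch of this relaxed solution (not from the slides), assuming numpy; the helper name spectral_bipartition is illustrative, and the split at 0 anticipates the partition-recovery step discussed below:

```python
import numpy as np

def spectral_bipartition(W):
    """Relaxed 2-way min-cut: take the eigenvector of L = D - W associated
    with the second smallest eigenvalue and split its entries at 0."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    q = eigvecs[:, 1]                      # eigenvector for the 2nd smallest eigenvalue
    A = np.where(q < 0)[0]                 # cluster A: negative entries
    B = np.where(q >= 0)[0]                # cluster B: non-negative entries
    return A, B
```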

Graph Laplacian
|L = D − W
|L is a positive semi-definite matrix.
- For any x, we have x^T L x ≥ 0. (Why?)
|Minimum eigenvalue λ_1 = 0 (what is the corresponding eigenvector?)
  0 = λ_1 ≤ λ_2 ≤ λ_3 ≤ ... ≤ λ_N
|The eigenvector corresponding to the second smallest eigenvalue λ_2 gives the best bipartite graph partition (under the relaxation).

Recovering the Partitions

|Due to the relaxation, q_i can be any real number (not just −1 and 1).
|How do we construct the partition from the eigenvector?
|A simple strategy:
  A = { i : q_i < 0 },  B = { i : q_i ≥ 0 }

One Obvious Drawback

|Minimum cut does not balance the sizes of the two parts of the bipartition.
|How should we take other factors, such as the sizes of the partitions, into account?

Spectral Clustering
Going Beyond MinCut

Discuss several graph cut approaches

Illustrate the algorithm through an example

|In MinCut, we used the following objective function:
  J_MinCut(A, B) = Cut(A, B)
|We noted one drawback of MinCut: the sizes of the partitions are not considered.
|A few extensions exist.

Characterizing Graph Cut

|Cut(A, B) = Σ_{i∈A, j∈B} w_{ij}    e.g., Cut(A, B) = 0.3
|Cut(A, A) = Σ_{i∈A, j∈A} w_{ij}    e.g., Cut(A, A) = 2.6
|Cut(B, B) = Σ_{i∈B, j∈B} w_{ij}    e.g., Cut(B, B) = 2.4
|Vol(A) = Σ_{i∈A} Σ_{j=1}^n w_{ij}    e.g., Vol(A) = 5.5
|Vol(B) = Σ_{i∈B} Σ_{j=1}^n w_{ij}    e.g., Vol(B) = 5.1
||A| = 4, |B| = 3

[Figure: the example weighted graph from which these values are computed]

The Ratio Cut Method

|The objective function:
  J_RatioCut(A, B) = Cut(A, B) (1/|A| + 1/|B|)
|Attempts to produce balanced clusters.
|Example (using the values above): J_RatioCut(A, B) = 0.3 × (1/4 + 1/3) = 0.175

The Ratio Cut Method (cont'd)

|Similar to MinCut, the solution can be found via the following generalized eigenvalue problem:
  (D − W) q = λ D q, i.e., L q = λ D q

Normalized Cut (NCut)

|In Ratio Cut, the balance of the partitions is defined by the number of vertices.
|We may instead measure the "size" of a set by the weights of its edges ➔ NCut.
|The objective function is:
  J_NCut(A, B) = Cut(A, B) (1/Vol(A) + 1/Vol(B))
|Example: J_NCut(A, B) = 0.3 × (1/5.5 + 1/5.1) ≈ 0.1134

Additional Considerations

|In clustering, we should also consider within-cluster connections.
|A good partition should consider
-Inter-cluster connections, and
-Intra-cluster connections.
|1st constraint: inter-cluster connections should be minimized: Min Cut(A, B)
|2nd constraint: intra-cluster connections should be maximized: Max Cut(A, A) and Max Cut(B, B)
|These requirements may be simultaneously satisfied by minimizing the objective function:
  J_MinMaxCut(A, B) = Cut(A, B) (1/Cut(A, A) + 1/Cut(B, B))
|Example: J_MinMaxCut(A, B) = 0.3 × (1/2.6 + 1/2.4) ≈ 0.240

Normalized and MinMaxCut Methods

|As before, we may relax the indicator vector q to real values.
|For both NCut and MinMaxCut, the solution may be found by solving generalized eigenvalue problems.
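To make the quantities above concrete, here is a small sketch (not from the slides) that computes Cut, Vol, and the four 2-way objectives for a given partition, assuming numpy; the function name is illustrative, and Cut(A, A) / Cut(B, B) are computed counting each within-cluster edge once, which is the convention that reproduces the example values above (2.6 and 2.4 against volumes 5.5 and 5.1):

```python
import numpy as np

def two_way_cut_objectives(W, A, B):
    """Cut/Vol quantities and the MinCut, RatioCut, NCut, MinMaxCut objectives
    for a 2-way partition (A, B), given the weighted adjacency matrix W."""
    A, B = np.asarray(A), np.asarray(B)
    cut_ab = W[np.ix_(A, B)].sum()          # Cut(A, B): weight across the cut
    cut_aa = W[np.ix_(A, A)].sum() / 2.0    # Cut(A, A): each within-A edge counted once
    cut_bb = W[np.ix_(B, B)].sum() / 2.0    # Cut(B, B): each within-B edge counted once
    vol_a = W[A, :].sum()                   # Vol(A): sum of degrees of vertices in A
    vol_b = W[B, :].sum()                   # Vol(B): sum of degrees of vertices in B
    return {
        "MinCut":    cut_ab,
        "RatioCut":  cut_ab * (1.0 / len(A) + 1.0 / len(B)),
        "NCut":      cut_ab * (1.0 / vol_a + 1.0 / vol_b),
        "MinMaxCut": cut_ab * (1.0 / cut_aa + 1.0 / cut_bb),
    }
```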
An Illustrative Example

Graph and Similarity Matrix

[Figure: the example graph and its similarity (affinity) matrix]

Graph and Laplacian Matrix

[Figure: the example graph and its Laplacian matrix L = D − W]

Solve Eigen Problem

|Pre-processing
-Build the Laplacian matrix L of the graph.
-Compute the eigenvalues Λ and eigenvectors x of the matrix L.
-Map each vertex to the corresponding component of the 2nd eigenvector.

Spectral Clustering

Split at value 0
Cluster A: negative components
Cluster B: non-negative components

[Figure: vertices labeled with their components of the 2nd eigenvector, split at 0]

Spectral Clustering
Practical Considerations in Implementation

Discuss several practical implementation issues

Recursive Bi-partitioning

|Recursively apply the bi-partitioning algorithm in a hierarchical, divisive manner.
|Disadvantages: inefficient, stability issues.

K-way Graph Cuts

|Generalizing the 2-way objective functions (with Ā_i denoting the complement of A_i):
  J_RatioCut(A_1, ..., A_k) = Σ_{i=1}^k Cut(A_i, Ā_i) / |A_i|
  J_NCut(A_1, ..., A_k) = Σ_{i=1}^k Cut(A_i, Ā_i) / Vol(A_i)
  J_MinMaxCut(A_1, ..., A_k) = Σ_{i=1}^k Cut(A_i, Ā_i) / Cut(A_i, A_i)

Implementation Considerations (1/4)

|Preprocessing: spectral clustering methods can be interpreted as tools for analyzing the block structure of the similarity matrix.
-Building such matrices carefully can considerably improve the results.
|When building graphs from real data:
-The calculation of the similarity matrix is not self-evident.
-The choice of similarity function can strongly affect the results of the subsequent steps.
-A Gaussian kernel is often chosen, but other similarities, such as cosine similarity, might be more appropriate for specific applications.

Implementation Considerations (2/4)

|Choice of graph Laplacian: Laplacian matrices are generally chosen to be positive semi-definite, so that their eigenvalues are non-negative.
-A few variants:
  Unnormalized:  L = D − W
  Symmetric:     L_Sym = D^{−1/2} L D^{−1/2} = I − D^{−1/2} W D^{−1/2}
  Asymmetric:    L_As = D^{−1} L = I − D^{−1} W

Implementation Considerations (3/4)

|Computing the eigenvectors.
-Efficient methods exist for sparse matrices.
|Different ways of building the similarity graph:
-ε-neighborhood graph
-k-nearest-neighbor graph
-fully connected graph

Implementation Considerations (4/4)

|Choosing k:
-Similar to k-means, there are many heuristics to use.
-The eigengap heuristic: choose k such that the first k eigenvalues are very small but the (k+1)-th one is relatively large.
|Clustering: algorithms other than k-means can be used in the last stage, such as single linkage, k-lines, elongated k-means, mixture models, etc.

Recap: Pros and Cons of Spectral Clustering

|Advantages:
-Does not make strong assumptions on the forms of the clusters.
-Easy to implement, and can be implemented efficiently even for large data sets, as long as the similarity graph is sparse.
-Good clustering results.
-Reasonably fast for sparse data sets of several thousand elements.
|Disadvantages:
-May be sensitive to the choice of parameters for the neighborhood graph.
-Computationally expensive for large datasets.
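Tying the implementation considerations together, a minimal k-way sketch (not from the slides) using the symmetric normalized Laplacian and k-means on the spectral embedding; it assumes numpy and scikit-learn's KMeans, the function name is illustrative, and some variants (e.g., Ng, Jordan & Weiss, 2002) additionally row-normalize the embedding before k-means:

```python
import numpy as np
from sklearn.cluster import KMeans  # any k-means implementation would do

def spectral_clustering(W, k):
    """k-way spectral clustering: embed vertices with the first k eigenvectors
    of L_sym = I - D^{-1/2} W D^{-1/2}, then run k-means on the embedding."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))          # guard against isolated vertices
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)                   # ascending eigenvalues
    # Eigengap heuristic: if k is unknown, inspect np.diff(eigvals) for a large jump.
    U = eigvecs[:, :k]                                         # spectral embedding (n x k)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```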