
Introduction to Machine Learning: Introduction to Collaborative Filtering
Prof. Kutty

Spectral Clustering Review


Data → Graph: Adjacency Matrix
• Represent the data as a weighted undirected graph.
• Given data points x̄^(1), ..., x̄^(n), compute the similarity between each pair of datapoints, e.g., with the Gaussian kernel similarity metric

  w_ij = exp(−‖x̄^(i) − x̄^(j)‖²)

• The more similar two datapoints are, the higher the value of sim(x̄^(i), x̄^(j)) and hence the edge weight (e.g., 0.6 for a similar pair vs. 0.0015 for a dissimilar pair).
• How might you get a weight of 0?
• By convention, self-loops are not explicitly represented.
• W = {w_ij} for i = 1, ..., n; j = 1, ..., n. W is symmetric and each w_ij ≥ 0.
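As a concrete sketch of this step, the snippet below builds W with a Gaussian kernel in NumPy. The bandwidth parameter sigma and the toy points are illustrative assumptions, not from the slides.

```python
import numpy as np

def gaussian_adjacency(X, sigma=1.0):
    """Build the weighted adjacency matrix with w_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / sigma**2)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])  # two nearby points, one far away
W = gaussian_adjacency(X)
# The nearby pair gets a weight near 1; the far point's weights are near 0.
```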

Spectral Clustering
Goal: Given a graph representing the data, partition the vertex set V into k partitions A_1, ..., A_k (with A_i ∩ A_j = ∅ for i ≠ j); find a minimum-cost RatioCut k-clustering:

  minimize over A_1, ..., A_k:  RatioCut(A_1, ..., A_k) = (1/2) Σ_{i=1}^{k} cut(A_i, Ā_i) / |A_i|

This objective is called the "ratio cut"; dividing by |A_i| ensures clusters are reasonably large groups.
Idea: Use the eigenvectors and eigenvalues of the graph Laplacian matrix L = D − W.
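To make the objective concrete, here is a hedged sketch that evaluates the ratio cut of a candidate partition. The function names and the toy weights are my own, for illustration only.

```python
import numpy as np

def cut_value(W, A, B):
    """Sum of edge weights crossing between vertex sets A and B."""
    return W[np.ix_(A, B)].sum()

def ratio_cut(W, partition):
    """RatioCut(A_1,...,A_k) = (1/2) * sum_i cut(A_i, complement of A_i) / |A_i|."""
    n = W.shape[0]
    total = 0.0
    for A in partition:
        comp = [v for v in range(n) if v not in A]
        total += cut_value(W, A, comp) / len(A)
    return 0.5 * total

W = np.array([[0.0, 1.0, 0.1],
              [1.0, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
print(ratio_cut(W, [[0, 1], [2]]))  # 0.15: only the weak edges are cut
print(ratio_cut(W, [[0], [1, 2]]))  # 0.825: severing the heavy edge is costly
```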

Adjacency Matrix W, Degree Matrix D and Graph Laplacian L
• Adjacency matrix W:
  W = {w_ij} for i = 1, ..., n; j = 1, ..., n, where w_ij = sim(x̄^(i), x̄^(j))
• Degree matrix D with
  d_ii = Σ_{j=1}^{n} w_ij  and  d_ij = 0 for i ≠ j, i.e.,

  D = [[d_11, ...,  0  ],
       [ ⋮,    ⋱,   ⋮  ],
       [ 0,   ..., d_nn]]

• The graph Laplacian is the matrix L = D − W
• We are interested in the eigenvectors and eigenvalues of L
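A minimal NumPy sketch of these definitions; the toy adjacency values (reusing the 0.6 and 0.0015 weights from the earlier slide) are assumptions.

```python
import numpy as np

# Adjacency matrix of a small weighted graph (toy values for illustration).
W = np.array([[0.0,    0.6, 0.0015],
              [0.6,    0.0, 0.0],
              [0.0015, 0.0, 0.0]])

D = np.diag(W.sum(axis=1))  # degree matrix: d_ii = sum_j w_ij
L = D - W                   # unnormalized graph Laplacian

# Rows of L sum to zero, so the all-ones vector is an eigenvector with eigenvalue 0.
assert np.allclose(L @ np.ones(3), 0)
```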

Graph Laplacian L
The graph Laplacian is the matrix L = D − W.
Properties of L:
• L is symmetric
• L is PSD
  – L has n non-negative real-valued eigenvalues, listed in non-decreasing order:
    0 ≤ λ_1 ≤ ⋯ ≤ λ_n
• Fact: the multiplicity of the eigenvalue 0 is the number of connected components in the graph
• Examples
image from: https://towardsdatascience.com/spectral-clustering-aba2640c0d5b
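The multiplicity fact above can be checked numerically; a hedged sketch on a toy graph with two connected components (the edge weights are assumptions):

```python
import numpy as np

# Two components: vertices {0,1} and {2,3}, with no edges between them.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

eigvals = np.linalg.eigvalsh(L)          # sorted in non-decreasing order
n_zero = np.sum(np.isclose(eigvals, 0))  # multiplicity of eigenvalue 0
print(eigvals)   # [0. 0. 2. 2.]
print(n_zero)    # 2, matching the two connected components
```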

Spectral Clustering: intuition for k = 2

Example: for k = 2
Specify a cut as an indicator vector v̄ ∈ R^n, where for cut A and Ā the i-th dimension is

  v_i = +√(|Ā| / |A|)   if vertex i ∈ A
  v_i = −√(|A| / |Ā|)   if vertex i ∈ Ā

(Here |A| is the cardinality of the set A, i.e., the number of distinct elements in A.)

Example: for k = 2
Can show that v̄ᵀ L v̄ = n · RatioCut(A, Ā), and that Σ_i v_i = 0 and ‖v̄‖ = √n.

Example: for k = 2
Goal: min v̄ᵀ L v̄ subject to Σ_i v_i = 0 and ‖v̄‖ = √n,
where v̄ is an indicator vector as defined above. This problem is NP-hard.
Modified Goal: min v̄ᵀ L v̄ over real-valued v̄, subject to Σ_i v_i = 0 and ‖v̄‖ = √n.
Solution: the eigenvector corresponding to the (second*) smallest eigenvalue of L.
For k > 2, similar idea: use multiple indicator vectors.
* see examples
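A hedged sketch of the relaxed k = 2 procedure: take the eigenvector for the second smallest eigenvalue (the Fiedler vector) and split vertices by its sign. The toy graph below (a dense triangle weakly connected to an edge) is an assumption for illustration.

```python
import numpy as np

W = np.array([[0,   1, 1,   0.1, 0],
              [1,   0, 1,   0,   0],
              [1,   1, 0,   0,   0.1],
              [0.1, 0, 0,   0,   1],
              [0,   0, 0.1, 1,   0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in non-decreasing order
fiedler = eigvecs[:, 1]               # eigenvector of the 2nd smallest eigenvalue
cluster = (fiedler > 0).astype(int)   # sign split approximates the optimal cut
print(cluster)  # separates {0,1,2} from {3,4} (which side is 0 vs. 1 is arbitrary)
```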

Why does this work? Spectral Clustering for k partitions

  min over A_1, ..., A_k of RatioCut(A_1, ..., A_k)
    = min over indicator vectors h̄^(1), ..., h̄^(k) of Tr(Hᵀ L H)  s.t. Hᵀ H = I
    ≈ min over real vectors h̄^(1), ..., h̄^(k) of Tr(Hᵀ L H)  s.t. Hᵀ H = I

The solution to the relaxed problem is the first k eigenvectors of L.
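A quick numerical check of this claim (a sketch, reusing the two-component toy graph from above): setting H to the first k eigenvectors satisfies HᵀH = I and attains Tr(HᵀLH) equal to the sum of the k smallest eigenvalues, and a random orthonormal H never does better.

```python
import numpy as np

W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

k = 2
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in non-decreasing order
H = eigvecs[:, :k]                      # first k eigenvectors as columns
assert np.allclose(H.T @ H, np.eye(k))  # constraint H^T H = I holds
print(np.trace(H.T @ L @ H), eigvals[:k].sum())  # both equal the minimum value

# Any other H with orthonormal columns gives a trace at least this large.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, k)))
print(np.trace(Q.T @ L @ Q) >= eigvals[:k].sum() - 1e-9)  # True
```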

Spectral Clustering for k partitions

Input: valid similarity metric, number of clusters k
1. Build adjacency matrix W
2. Compute graph Laplacian L = D − W
3. Compute (eigenvector, eigenvalue) pairs of L, ordered in non-decreasing order of eigenvalue
4. Build the matrix with the first k eigenvectors (corresponding to the k smallest eigenvalues) as columns; interpret its rows as new data points, a low-dimensional embedding (∈ R^k) of the original dataset (∈ R^d)
5. Apply k-means to the new data representation
Output: cluster assignments

Spectral Clustering Example #1
Graph on vertices A, B, C, D, E with edges A–B (weight 0.5) and D–E (weight 0.25); C is isolated.

1. Build adjacency matrix W:

   W = [[1,   0.5, 0, 0,    0   ],
        [0.5, 1,   0, 0,    0   ],
        [0,   0,   1, 0,    0   ],
        [0,   0,   0, 1,    0.25],
        [0,   0,   0, 0.25, 1   ]]

2. Compute graph Laplacian L = D − W:

   L = [[ 0.5, −0.5, 0,  0,     0   ],
        [−0.5,  0.5, 0,  0,     0   ],
        [ 0,    0,   0,  0,     0   ],
        [ 0,    0,   0,  0.25, −0.25],
        [ 0,    0,   0, −0.25,  0.25]]

3. Compute eigenvectors/eigenvalues of L, ordered in non-decreasing order:

   Eigenvalues: [0, 0, 0, 0.5, 1.0]
   Eigenvectors (as columns, in the same order):

   [[√2/2, 0, 0,     0,     √2/2],
    [√2/2, 0, 0,     0,    −√2/2],
    [0,    1, 0,     0,     0   ],
    [0,    0, √2/2,  √2/2,  0   ],
    [0,    0, √2/2, −√2/2,  0   ]]

4. Build the matrix with the first k = 3 eigenvectors as columns; its rows give a 3-dimensional embedding of the original data:

   A → (√2/2, 0, 0)
   B → (√2/2, 0, 0)
   C → (0, 1, 0)
   D → (0, 0, √2/2)
   E → (0, 0, √2/2)

5. Run 3-means clustering on this embedding and return the clusters: {A, B}, {C}, {D, E}.

Spectral Clustering Example #2
Graph on vertices A–E with edge weights 0.8 and 0.4.
[Figures: the eigenvectors and eigenvalues of this graph's Laplacian, and a scatter plot of the resulting spectral embedding.]
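Putting the five steps together on Example #1, a hedged sketch using NumPy and scikit-learn's KMeans (the random seed and n_init are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

# Step 1: adjacency matrix W (Example #1).
W = np.array([[1,   0.5, 0, 0,    0],
              [0.5, 1,   0, 0,    0],
              [0,   0,   1, 0,    0],
              [0,   0,   0, 1,    0.25],
              [0,   0,   0, 0.25, 1]])

# Step 2: graph Laplacian L = D - W.
L = np.diag(W.sum(axis=1)) - W

# Step 3: eigendecomposition; eigh returns eigenvalues in non-decreasing order.
eigvals, eigvecs = np.linalg.eigh(L)

# Step 4: first k eigenvectors as columns; rows are the embedded data points.
k = 3
embedding = eigvecs[:, :k]

# Step 5: k-means on the new representation.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
print(labels)  # A,B share a label; C gets its own; D,E share one (label ids arbitrary)
```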
Clustering Algorithms
http://scikit-learn.org/stable/modules/clustering.html

Properties of Clustering Algorithms
Hierarchical:
- deterministic; no need to decide the number of clusters ahead of time
- only input parameters are the distance measure and the linkage criterion
- slow
k-means:
- fast: linear in all relevant quantities
- can be affected by outliers
- problems arise when clusters are of different sizes, densities, and shapes
Spectral:
- high performance with smaller datasets
- not very efficient, although faster approximate variants exist
  (Yan, Huang, and Jordan. 2009. Fast approximate spectral clustering.)

Collaborative Filtering

Recommendations as Matrix Completion
Call this the utility (or user-item) matrix Y.
How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction

Nearest Neighbor Prediction
Suppose user a has not rated movie i. To predict the rating:
- compute the similarity between user a and all other users in the system
- find the k 'nearest neighbors' (i.e., the k users most similar to user a) who have rated movie i
- compute a prediction based on these users' ratings of i

How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction

Restate problem with constraints
Given Y with empty cells, construct a low-rank Ŷ with no empty cells.

Low-Rank Factorization: example
COMEDY and ACTION are the latent factors; rows are user vectors, columns are item vectors.
(based on example from Aggarwal 2016)

Objective Function
First note that Ŷ = U Vᵀ, so ŷ_ai = [U Vᵀ]_ai = ū^(a) · v̄^(i), where U has rows ū^(1), ..., ū^(n) and V has rows v̄^(1), ..., v̄^(m).

J(U, V) = (1/2) Σ_{(a,i)∈D} (y_ai − [U Vᵀ]_ai)² + (λ/2) Σ_{a,k} u_ak² + (λ/2) Σ_{i,k} v_ik²
        = (1/2) Σ_{(a,i)∈D} (y_ai − ū^(a) · v̄^(i))² + (λ/2) Σ_a ‖ū^(a)‖² + (λ/2) Σ_i ‖v̄^(i)‖²

Idea: Minimize J(U, V) using coordinate descent.
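A minimal sketch of minimizing J(U, V) by (block) coordinate descent: alternately solve the regularized least-squares problem for each user vector with V fixed, then for each item vector with U fixed. The toy matrix, rank, λ, and iteration count are assumptions for illustration.

```python
import numpy as np

# Toy utility matrix: np.nan marks missing ratings.
Y = np.array([[5, 4, np.nan, 1],
              [4, np.nan, 1, 1],
              [1, 1, 5, np.nan]])
observed = ~np.isnan(Y)
n, m = Y.shape
rank, lam = 2, 0.1

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n, rank))  # user vectors u^(a) as rows
V = rng.normal(scale=0.1, size=(m, rank))  # item vectors v^(i) as rows

for _ in range(50):
    # Fix V; for each user a, minimize J over u^(a) (a ridge-regression solve).
    for a in range(n):
        idx = observed[a]
        Vi = V[idx]
        U[a] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(rank), Vi.T @ Y[a, idx])
    # Fix U; for each item i, minimize J over v^(i).
    for i in range(m):
        idx = observed[:, i]
        Ua = U[idx]
        V[i] = np.linalg.solve(Ua.T @ Ua + lam * np.eye(rank), Ua.T @ Y[idx, i])

Y_hat = U @ V.T  # completed matrix; read predictions off the formerly empty cells
print(np.round(Y_hat, 2))
```

Each inner solve is the exact minimizer of J with the other block held fixed, so J is non-increasing across iterations.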