Introduction to Machine Learning
Introduction to Collaborative Filtering
Prof. Kutty
Spectral Clustering Review
Data → Graph
Represent the data as a weighted undirected graph, summarized by an adjacency matrix.
Given data points $\bar{x}^{(1)}, \dots, \bar{x}^{(n)}$, compute the similarity between each pair of datapoints, e.g., with the Gaussian kernel similarity metric:

$$w_{ij} = \mathrm{sim}(\bar{x}^{(i)}, \bar{x}^{(j)}) = \exp\left(-\frac{\|\bar{x}^{(i)} - \bar{x}^{(j)}\|^2}{2\sigma^2}\right)$$

A higher value of $w_{ij}$ means the two points are more similar.
How might you get a weight of 0? When two points are very far apart, the kernel value is numerically 0; e.g., a distant pair might get a weight like 0.0015 while a nearby pair gets 0.6.
By convention, self-loops are explicitly represented: $w_{ii} = \mathrm{sim}(\bar{x}^{(i)}, \bar{x}^{(i)}) = 1$ for the Gaussian kernel.
$W = \{w_{ij}\}$ for $i = 1, \dots, n$; $j = 1, \dots, n$. $W$ is symmetric and each $w_{ij} \geq 0$.
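A minimal sketch of this graph construction in Python (numpy only; `X` is a hypothetical n × d data array and `sigma` the kernel bandwidth):

```python
import numpy as np

def gaussian_adjacency(X, sigma=1.0):
    """Adjacency matrix W with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    diffs = X[:, None, :] - X[None, :, :]     # pairwise differences, (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)    # squared Euclidean distances
    W = np.exp(-sq_dists / (2 * sigma ** 2))  # symmetric, entries in (0, 1]
    return W                                  # w_ii = 1: self-loops represented
```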
Spectral Clustering
Goal: Given a graph representing the data, partition the vertex set V into k partitions $A_1, \dots, A_k$ (with $A_i \cap A_j = \emptyset$ for $i \neq j$) so as to find a minimum cost RatioCut clustering:

$$\min_{A_1, \dots, A_k} \mathrm{RatioCut}(A_1, \dots, A_k) = \min_{A_1, \dots, A_k} \frac{1}{2} \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{|A_i|}$$

called the "ratio cut", where $\mathrm{cut}(A_i, \bar{A}_i)$ is the total weight of edges between $A_i$ and the rest of the graph. Dividing by $|A_i|$ ensures clusters are reasonably large groups.
Idea: Use the eigenvectors and eigenvalues of the graph Laplacian matrix $L = D - W$.
Adjacency Matrix W, Degree Matrix D and Graph Laplacian L
• Adjacency Matrix W: $W = \{w_{ij}\}$ for $i = 1, \dots, n$; $j = 1, \dots, n$, where $w_{ij} = \mathrm{sim}(\bar{x}^{(i)}, \bar{x}^{(j)})$
• Degree matrix D with $d_{ii} = \sum_j w_{ij}$ and $d_{ij} = 0$ for $i \neq j$:

$$D = \begin{pmatrix} d_{11} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_{nn} \end{pmatrix}$$

• The graph Laplacian is the matrix $L = D - W$
• We are interested in the eigenvectors and eigenvalues of L
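In code, D and L follow directly from W (a sketch; `W` is any symmetric nonnegative adjacency matrix, such as the one built above):

```python
import numpy as np

def graph_laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal degree matrix."""
    D = np.diag(W.sum(axis=1))  # d_ii = sum_j w_ij
    return D - W
```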
Graph Laplacian L
The graph Laplacian is the matrix $L = D - W$.
Properties of L:
• L is symmetric
• L is PSD
– L has n non-negative real-valued eigenvalues, ordered in non-decreasing order: $0 \leq \lambda_1 \leq \dots \leq \lambda_n$
• Fact: the multiplicity of the eigenvalue 0 is the number of connected components in the graph
• Examples
image from: https://towardsdatascience.com/spectral-clustering-aba2640c0d5b
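The connected-components fact is easy to check numerically; a small sketch (the two-component graph here is made up for illustration):

```python
import numpy as np

# Two connected components: a triangle {0, 1, 2} and an edge {3, 4}.
W = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
eigvals = np.linalg.eigvalsh(L)      # returned in non-decreasing order
print(np.isclose(eigvals, 0).sum())  # -> 2, the number of components
```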
Spectral Clustering: intuition for k = 2
Example: for k = 2, specify a cut as an indicator vector $\bar{c} \in \mathbb{R}^n$, where for cut $A$ and $\bar{A}$ ($|A|$ denotes the cardinality of subset A, i.e., the number of distinct elements in A), the i-th dimension is

$$c_i = \begin{cases} \sqrt{|\bar{A}| / |A|} & \text{if vertex } i \in A \\ -\sqrt{|A| / |\bar{A}|} & \text{if vertex } i \in \bar{A} \end{cases}$$

Can show that $\bar{c}^T L \bar{c} = n \cdot \mathrm{RatioCut}(A, \bar{A})$, $\sum_i c_i = 0$, and $\|\bar{c}\|^2 = n$.
Goal: $\min_{\bar{c}} \bar{c}^T L \bar{c}$ subject to $\sum_i c_i = 0$ and $\|\bar{c}\|^2 = n$, where $\bar{c}$ is an indicator vector as defined above. This discrete problem is NP-hard.
Modified Goal: relax $\bar{c}$ to be any real vector:
$\min_{\bar{c} \in \mathbb{R}^n} \bar{c}^T L \bar{c}$ subject to $\sum_i c_i = 0$ and $\|\bar{c}\|^2 = n$
Solution: the eigenvector corresponding to the (second*) smallest eigenvalue of L. For k > 2, similar idea: use multiple indicator vectors.
* see examples
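For k = 2 this relaxed solution can be turned into a clustering by thresholding the eigenvector; a minimal sketch (assuming a connected graph, so eigenvalue 0 is not repeated):

```python
import numpy as np

def two_way_cut(L):
    """Relaxed ratio cut for k = 2: split on the sign of the eigenvector
    for the second smallest eigenvalue of L (the Fiedler vector)."""
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in non-decreasing order
    fiedler = eigvecs[:, 1]
    return fiedler >= 0                   # boolean cluster assignment
```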
Why does this work? Spectral Clustering for k partitions

$$\min_{A_1, \dots, A_k} \mathrm{RatioCut}(A_1, \dots, A_k) \;\approx\; \min_{\bar{h}^{(1)}, \dots, \bar{h}^{(k)}} \mathrm{Tr}(H^T L H) \quad \text{s.t. } H^T H = I$$

where the columns $\bar{h}^{(1)}, \dots, \bar{h}^{(k)}$ of H are indicator vectors. Relaxing the columns to real vectors:

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^T L H) \quad \text{s.t. } H^T H = I$$

The solution to this is the first k eigenvectors of L.
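Why the trace objective matches the ratio cut (a standard argument, e.g., from von Luxburg's spectral clustering tutorial; the slide states the result without derivation): choose the columns of H to be scaled indicator vectors,

$$h^{(j)}_i = \begin{cases} 1/\sqrt{|A_j|} & \text{if vertex } i \in A_j \\ 0 & \text{otherwise} \end{cases} \qquad j = 1, \dots, k.$$

These columns are orthonormal, so $H^T H = I$, and a direct computation gives $(\bar{h}^{(j)})^T L \bar{h}^{(j)} = \mathrm{cut}(A_j, \bar{A}_j)/|A_j|$. Summing over j, $\mathrm{Tr}(H^T L H)$ equals the RatioCut objective up to a constant factor, which does not change the minimizer.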
Spectral Clustering for k partitions
Input: valid similarity metric, number of clusters k
1. Build adjacency matrix W
2. Compute graph Laplacian L = D − W
3. Compute (eigenvector, eigenvalue) pairs of L
4. Build matrix with the first k eigenvectors (corresponding to the k smallest eigenvalues) as columns; interpret rows as new data points, a low dimensional embedding ($\in \mathbb{R}^k$) of the original dataset ($\in \mathbb{R}^d$)
5. Apply k-means to the new data representation
Output: cluster assignments
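A sketch of the full pipeline in Python, assuming the `gaussian_adjacency` and `graph_laplacian` helpers sketched earlier, with scikit-learn's KMeans standing in for step 5:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Unnormalized spectral clustering: embed the data with the first k
    eigenvectors of L = D - W, then run k-means on the embedding."""
    W = gaussian_adjacency(X, sigma)            # step 1: adjacency matrix
    L = graph_laplacian(W)                      # step 2: L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)        # step 3: eigen-pairs, ascending
    H = eigvecs[:, :k]                          # step 4: n x k embedding
    return KMeans(n_clusters=k).fit_predict(H)  # step 5: cluster assignments
```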
Spectral Clustering Example #1
Graph: vertices A, B, C, D, E; edge A–B with weight 0.5, edge D–E with weight 0.25, and C isolated.
Input: similarity metric, number of clusters k
1. Build adjacency matrix W (self-loops represented on the diagonal):

$$W = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 \\ 0.5 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0.25 \\ 0 & 0 & 0 & 0.25 & 1 \end{pmatrix}$$

2. Compute graph Laplacian L = D − W
Spectral Clustering Example #1
2. Compute graph Laplacian L = D − W:

$$L = \begin{pmatrix} 0.5 & -0.5 & 0 & 0 & 0 \\ -0.5 & 0.5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.25 & -0.25 \\ 0 & 0 & 0 & -0.25 & 0.25 \end{pmatrix}$$

3. Compute the (eigenvector, eigenvalue) pairs of L, ordered in non-decreasing order of eigenvalue:

$$\left( \begin{pmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \\ 0 \\ 0 \\ 0 \end{pmatrix}, 0 \right), \left( \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, 0 \right), \left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix}, 0 \right), \left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{pmatrix}, 0.5 \right), \left( \begin{pmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \\ 0 \\ 0 \\ 0 \end{pmatrix}, 1.0 \right)$$
Spectral Clustering Example #1
Input: similarity metric, number of clusters k
1. Build adjacency matrix W
2. Compute graph Laplacian L = D − W
3. Compute eigenvectors of L
4. Build matrix with the first k eigenvectors (corresponding to the k smallest eigenvalues) as columns; interpret rows as new data points
5. Apply k-means to the new data representation
Output: cluster assignments

Eigenvalues: [0, 0, 0, 0.5, 1.0]
Spectral Clustering Example #1
For k = 3, build the matrix with the first 3 eigenvectors as columns; its rows (one per vertex A, ..., E) give a 3-dimensional embedding of the original dataset:

$$H = \begin{pmatrix} \frac{\sqrt{2}}{2} & 0 & 0 \\ \frac{\sqrt{2}}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{pmatrix}$$

Run 3-means clustering on this embedding and return the clusters: {A, B}, {C}, {D, E}.
Eigenvalues: [0, 0, 0, 0.5, 1.0]
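These hand-computed quantities can be checked with numpy (a verification sketch, not part of the original slides):

```python
import numpy as np

W = np.array([[1.0, 0.5, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 0.25],
              [0.0, 0.0, 0.0, 0.25, 1.0]])
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 4))  # [0. 0. 0. 0.5 1.]
H = eigvecs[:, :3]           # 3-dimensional embedding; rows are A..E
# The rows of H for A and B coincide, as do the rows for D and E,
# so 3-means recovers the clusters {A, B}, {C}, {D, E}.
```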
Spectral Clustering Example #2
[Figure: a second example graph with edge weights 0.8, 0.4, and 0.8, shown over several slides alongside plots of the eigenvectors and eigenvalues of its graph Laplacian.]
Clustering Algorithms
http://scikit-learn.org/stable/modules/clustering.html

Properties of Clustering Algorithms
Hierarchical:
- deterministic, don't need to decide # of clusters ahead of time
- only input parameters are distance measure and linkage criteria
- slow
k-means:
- fast, linear in all relevant quantities
- can be affected by outliers
- problems arise when clusters are of different sizes, densities, and shapes
Spectral:
- high performance with smaller datasets
- not very efficient, although faster approximate variants exist (Yan, Huang, and Jordan. 2009. Fast approximate spectral clustering.)
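As a usage sketch of the scikit-learn API referenced above (the dataset and parameter choices are illustrative, not from the slides):

```python
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Hierarchical (agglomerative): distance measure + linkage criteria as inputs.
hier = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)
# k-means: fast, but struggles with non-convex cluster shapes like these.
km = KMeans(n_clusters=2, random_state=0).fit_predict(X)
# Spectral: handles the non-convex shapes, but slower on large datasets.
spec = SpectralClustering(n_clusters=2, affinity="rbf",
                          random_state=0).fit_predict(X)
```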
Collaborative Filtering
Recommendations as Matrix Completion
Call this the utility (or user-item) matrix Y.
How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction
Nearest Neighbor Prediction
Suppose user a has not rated movie i. To predict the rating:
– compute similarity between user a and all other users in the system
– find the k 'nearest neighbors' (i.e., the k users most similar to user a) who have rated movie i
– compute a prediction based on these users' ratings of i
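A minimal sketch of this prediction rule (cosine similarity between users is an assumption here; the slides only require some valid similarity metric):

```python
import numpy as np

def predict_rating(Y, a, i, k=5):
    """Predict user a's rating of item i as a similarity-weighted average of
    the ratings of the k users most similar to a who have rated item i.
    Y is a (num_users x num_items) array with np.nan marking missing ratings."""
    sims = []
    for b in range(Y.shape[0]):
        if b == a or np.isnan(Y[b, i]):
            continue  # skip user a and users who haven't rated item i
        both = ~np.isnan(Y[a]) & ~np.isnan(Y[b])  # items rated by both users
        if not both.any():
            continue
        ya, yb = Y[a, both], Y[b, both]
        denom = np.linalg.norm(ya) * np.linalg.norm(yb)
        if denom > 0:
            sims.append((float(ya @ yb) / denom, Y[b, i]))
    top = sorted(sims, reverse=True)[:k]  # the k 'nearest neighbors'
    if not top:
        return np.nan  # no usable neighbors
    weights = np.array([s for s, _ in top])
    ratings = np.array([r for _, r in top])
    if weights.sum() == 0:
        return ratings.mean()
    return float(weights @ ratings) / weights.sum()
```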
How to solve for the missing ratings?
1) Matrix factorization
2) Nearest neighbor prediction
Restate problem with constraints
Given Y with empty cells,
construct a low rank Ŷ with no empty cells.
Low-Rank Factorization: example
COMEDY and ACTION are the latent factors; rows are user vectors, columns are item vectors. Y is approximated by the low-rank product Ŷ = UVᵀ.
(based on example from Aggarwal 2016)
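A tiny numeric illustration of the idea (the ratings and the two latent factors are made up, in the spirit of the Aggarwal example):

```python
import numpy as np

# U: each row is a user's affinity for the latent factors [COMEDY, ACTION].
U = np.array([[1.0, 0.0],    # user who only likes comedies
              [0.0, 1.0],    # user who only likes action movies
              [1.0, 1.0]])   # user who likes both
# V: each row is an item's loading on [COMEDY, ACTION].
V = np.array([[5.0, 0.0],    # pure comedy
              [0.0, 5.0],    # pure action
              [3.0, 2.0]])   # a mix of both
Y_hat = U @ V.T  # rank-2 utility matrix: every cell is filled in
```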
Objective Function
First note that $\hat{Y} = UV^T$.
So $\hat{Y} = UV^T$ where U has rows $\bar{u}^{(1)}, \dots, \bar{u}^{(n)}$ and V has rows $\bar{v}^{(1)}, \dots, \bar{v}^{(m)}$; hence $\hat{Y}_{ai} = \bar{u}^{(a)} \cdot \bar{v}^{(i)}$.

$$J(U, V) = \frac{1}{2} \sum_{(a,i) \in D} \left( Y_{ai} - \hat{Y}_{ai} \right)^2 + \frac{\lambda}{2} \sum_{a,k} U_{ak}^2 + \frac{\lambda}{2} \sum_{i,k} V_{ik}^2$$

$$= \frac{1}{2} \sum_{(a,i) \in D} \left( Y_{ai} - \bar{u}^{(a)} \cdot \bar{v}^{(i)} \right)^2 + \frac{\lambda}{2} \sum_a \|\bar{u}^{(a)}\|^2 + \frac{\lambda}{2} \sum_i \|\bar{v}^{(i)}\|^2$$

where D is the set of (user, item) pairs with observed ratings.
Idea: Minimize J(U, V) using coordinate descent.
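A sketch of the coordinate descent idea in the alternating least squares style: with V fixed, setting the gradient of J with respect to each $\bar{u}^{(a)}$ to zero gives a regularized least-squares update (the update formula is standard; the loop structure below is illustrative):

```python
import numpy as np

def update_users(Y, U, V, lam):
    """One coordinate-descent pass over U: with V held fixed, the minimizer
    of J over each row u^(a) solves (V_r^T V_r + lam*I) u = V_r^T y_r."""
    d = U.shape[1]
    for a in range(U.shape[0]):
        rated = ~np.isnan(Y[a])           # items user a has actually rated
        Vr = V[rated]                     # latent vectors of those items
        A = Vr.T @ Vr + lam * np.eye(d)   # from setting dJ/du^(a) = 0
        b = Vr.T @ Y[a, rated]
        U[a] = np.linalg.solve(A, b)
    return U

# Full coordinate descent: alternate until J stops decreasing.
# U = update_users(Y, U, V, lam)      # fix V, update U
# V = update_users(Y.T, V, U, lam)    # fix U, update V (by symmetry of J)
```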