Lab5-Specs
COMP9318-Lab5¶
Instructions¶
This note book contains instructions for COMP9318-lab5.
You are required to complete your implementation in a file submission.py provided along with this notebook.
You are not allowed to print out unnecessary stuff. We will not consider any output printed out on the screen. All results should be returned in appropriate data structures via corresponding functions.
You can submit your implementation for lab5 via following link: http://kg.cse.unsw.edu.au:8318/lab5/ .
For each question, we have provided you with detailed instructions along with question headings. In case of any problem, you can post your query @ Piazza.
You are allowed to add other functions and/or import modules (you may have to in this lab), but you are not allowed to define global variables. Only functions are allowed in submission.py.
You should not import unnecessary modules/libraries, failing to import such modules at test time will lead to errors.
We will provide immediate feedback on your submission. You can access your scores using the online submission portal on the same day.
For Final Evaluation we will be using a different dataset, so your final scores may vary.
You are allowed to submit as many times as you want before the deadline, but ONLY the latest version will be kept and marked.
Submission deadline for this assignment is 23:59:59 on 10th June, 2018. We will not accept any late submissions.
Question-1: Spectral Clustering…¶
In this lab you are required to implement the spectral clustering algorithm to cluster a given graph G into TWO clusters.
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import networkx as nx
# Reading edges in a Graph…
data_file=’./asset/a’
with open(data_file) as infile:
edges = [tuple(map(int,(line.strip().split(‘ ‘)))) for line in infile]
# Creating a new Graph…
G = nx.Graph(name=”Ex1″)
for edge in edges:
G.add_edge(*edge)
# You can visualize the Graph for your own understanding…
nx.draw_networkx(G, with_labels=True, node_color=’y’)
plt.show()
You need to implement spectral clustering algorithm (i.e., spectral_clustering() in the file: submission.py). The input arguments of spectral_clustering() are:
Input:
G: A graph as defined above
Output:
eigenvectors, i.e., a numpy matrix containing two eigenvectors starting with the second smallest eigenvector. Its dimensions should be (N,2), where N stands for number of nodes in the graph.
clusters, i.e., lists of sub-lists containing the NODE-IDs corresponding to two clusters.
For example, a sample output is shown in the cell given below:
In [2]:
# Reading edges in a Graph…
data_file=’./asset/a’
with open(data_file) as infile:
edges = [tuple(map(int,(line.strip().split(‘ ‘)))) for line in infile]
# Creating a new Graph…
G = nx.Graph(name=”Ex1″)
for edge in edges:
G.add_edge(*edge)
In [13]:
G.degree()
Out[13]:
{2: 2, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4, 8: 2, 9: 2}
In [28]:
G.nodes()
Out[28]:
[2, 3, 4, 5, 6, 7, 8, 9]
In [27]:
nx.adjacency_matrix(G).todense()
Out[27]:
matrix([[0, 1, 1, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 1, 1, 0, 1, 0, 1],
[0, 0, 0, 1, 1, 0, 1, 1],
[0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 1, 0, 0]], dtype=int64)
In [19]:
adj = nx.adjacency_matrix(G).todense()
degree = np.sum(adj, axis = 1)
for i in range(len(degree)):
adj[i, i] -= degree[i, 0]
lm = – adj
lm
Out[19]:
matrix([[ 2, -1, -1, 0, 0, 0, 0, 0],
[-1, 2, -1, 0, 0, 0, 0, 0],
[-1, -1, 3, 0, -1, 0, 0, 0],
[ 0, 0, 0, 3, -1, -1, -1, 0],
[ 0, 0, -1, -1, 4, -1, 0, -1],
[ 0, 0, 0, -1, -1, 4, -1, -1],
[ 0, 0, 0, -1, 0, -1, 2, 0],
[ 0, 0, 0, 0, -1, -1, 0, 2]], dtype=int64)
In [ ]:
nx.adjacency_matrix(G).todense()
In [22]:
u, v, w = np.linalg.svd(lm)
In [34]:
len(v)
Out[34]:
8
In [33]:
list(range(-1, -len(v), -1))
Out[33]:
[-1, -2, -3, -4, -5, -6, -7]
In [35]:
for z in range(-1, -len(v), -1):
if v[z] > 1e-8:
break
z
Out[35]:
-2
In [46]:
from nltk.cluster.kmeans import KMeansClusterer
import nltk
import random
In [53]:
eigenvectors = u[:, [z, z-1]]
print(eigenvectors.shape)
eigenvectors
(8, 2)
Out[53]:
matrix([[-0.50495006, -0.07917647],
[-0.50495006, -0.07917647],
[-0.32789765, 0.05021194],
[ 0.29532479, -0.25362181],
[ 0.14117906, 0.22693353],
[ 0.28773764, 0.03212914],
[ 0.3535069 , -0.60546503],
[ 0.26004936, 0.70816516]])
In [61]:
np.array(eigenvectors)
Out[61]:
array([[-0.50495006, -0.07917647],
[-0.50495006, -0.07917647],
[-0.32789765, 0.05021194],
[ 0.29532479, -0.25362181],
[ 0.14117906, 0.22693353],
[ 0.28773764, 0.03212914],
[ 0.3535069 , -0.60546503],
[ 0.26004936, 0.70816516]])
In [59]:
type(eigenvectors)
Out[59]:
numpy.matrixlib.defmatrix.matrix
In [58]:
type(np.array([[1,2], [3,4]]))
Out[58]:
numpy.ndarray
In [65]:
kmeans_ = KMeansClusterer(num_means=2, distance=nltk.cluster.util.euclidean_distance, repeats=50, normalise=True,rng=random.Random(10))
clusters = kmeans_.cluster(np.array(eigenvectors), assign_clusters=True)
res = [[], []]
nodes = G.nodes()
for i in range(len(clusters)):
res[clusters[i]].append(nodes[i])
res
Out[65]:
[[2, 3, 4], [5, 6, 7, 8, 9]]
In [23]:
u[:, -3]
Out[23]:
matrix([[-0.07917647],
[-0.07917647],
[ 0.05021194],
[-0.25362181],
[ 0.22693353],
[ 0.03212914],
[-0.60546503],
[ 0.70816516]])
In [31]:
w[-3,:]
Out[31]:
matrix([[-0.07917647, -0.07917647, 0.05021194, -0.25362181, 0.22693353,
0.03212914, -0.60546503, 0.70816516]])
In [25]:
G.nodes()
Out[25]:
[2, 3, 4, 5, 6, 7, 8, 9]
In [24]:
u[:, -2]
Out[24]:
matrix([[-0.50495006],
[-0.50495006],
[-0.32789765],
[ 0.29532479],
[ 0.14117906],
[ 0.28773764],
[ 0.3535069 ],
[ 0.26004936]])
In [26]:
v
Out[26]:
array([ 5.46755534e+00, 4.89885222e+00, 3.64878133e+00,
3.00000000e+00, 3.00000000e+00, 1.63417760e+00,
3.50633502e-01, 4.29864924e-16])
In [66]:
import submission as submission
# Reading edges in a Graph…
data_file=’./asset/a’
with open(data_file) as infile:
edges = [tuple(map(int,(line.strip().split(‘ ‘)))) for line in infile]
# Creating a new Graph…
G = nx.Graph(name=”Ex1″)
for edge in edges:
G.add_edge(*edge)
eigenvector, clusters =submission.spectral_clustering(G)
print(eigenvector)
print(eigenvector.shape)
print(type(eigenvector))
print(‘clusters = ‘,clusters)
[[-0.50495006 -0.07917647]
[-0.50495006 -0.07917647]
[-0.32789765 0.05021194]
[ 0.29532479 -0.25362181]
[ 0.14117906 0.22693353]
[ 0.28773764 0.03212914]
[ 0.3535069 -0.60546503]
[ 0.26004936 0.70816516]]
(8, 2)
clusters = [[2, 3, 4], [5, 6, 7, 8, 9]]
In [2]:
import submission as submission
# Reading edges in a Graph…
data_file=’./asset/a’
with open(data_file) as infile:
edges = [tuple(map(int,(line.strip().split(‘ ‘)))) for line in infile]
# Creating a new Graph…
G = nx.Graph(name=”Ex1″)
for edge in edges:
G.add_edge(*edge)
eigenvector, clusters =submission.spectral_clustering(G)
print(eigenvector)
print(eigenvector.shape)
print(type(eigenvector))
print(‘clusters = ‘,clusters)
[[-0.50495006 -0.07917647]
[-0.50495006 -0.07917647]
[-0.32789765 0.05021194]
[ 0.14117906 0.22693353]
[ 0.29532479 -0.25362181]
[ 0.28773764 0.03212914]
[ 0.26004936 0.70816516]
[ 0.3535069 -0.60546503]]
(8, 2)
clusters = [[6, 5, 7, 9, 8], [2, 3, 4]]
Note:
For this lab we will be using Un-normalized Graph Laplacian.
For k-means clustering you should use nltk with Euclidean distance.
Please use the following nltk commands to generate clusters from the eigenvectors
kmeans_ = KMeansClusterer(num_means=2, distance=nltk.cluster.util.euclidean_distance, repeats=50, normalise=True,rng=random.Random(10))
clusters = kmeans_.cluster(eigenvectors, assign_clusters=True)
Submission¶
You need to complete the function spectral_clustering() in the file: submission.py. You can test your submission against sample test cases via online submission system (i.e., http://kg.cse.unsw.edu.au:8318/lab5/).
Test Environment¶
For testing, we have pre-installed the requisite modules and/or libraries in the testing environment. You are only allowed to use following libraries:
python: 3.5.2
numpy: 1.14.0
networkx: 2.1
nltk: 3.2.5
Note:
1. You need to implement the methodology by yourself. You are not allowed to import **sklearn** and/or any other library in Lab5.
2. For cluster labels you should use the priorly provided commands (given in last cell) to generate the clusters.
In [ ]: