COMP9312_Project_Q1: Computing Shortest Distance¶
For details about the project, please refer to the project specification. You can edit this file and add anything you like. We will only use the code cell of ShortestDistance class for testing. Instead of creating a seperated PDF document, you can add descriptions and some theoretical analysis (e.g., index space, query time complexity, and indexing time complexity) in this file.
Copyright By PowCoder代写 加微信 powcoder
Note: Make sure to sequentially run all the cells in each section, so that the intermediate variables / packages will carry over to the next cell.
1. Code Template¶
You need to implement the ShortestDistance class to support shortest distance queries for large graphs (e.g., hundreds of millions). A code templage is given below.
The ShortestDistance class is initialized by a graph G. The data structure of G will be presented in the following section. The class calls the function preprocess to precompute some index structure for G. The query function has two inputs: source_vertex and target_vertex. The function outputs the shortest distance from source_vertex to target_vertex in G.
############################################################################
# Add any modules you want to use here~
############################################################################
class ShortestDistance(object):
def __init__(self, G):
self.G = G
self.preprocess(G)
############################################################################
# TODO: You may add some index data structure here~
# analyze the space usage of the index (all additional data structure)~
############################################################################
def preprocess(self, G):
############################################################################
# TODO: Your code here~
# precompute any data structure for G and use that to speed up your query processing
# analyze the time complexity of preprocess()
############################################################################
def query(self, source_vertex, target_vertex):
shortest_distance = 0
############################################################################
# TODO: Your code here~
# Input: source_vertex, target_vertex
# Output: the shortest distance between source_vertex and target_vertex
# analyze the time complexity of query()
############################################################################
return shortest_distance
############################################################################
# You can define any auxiliary functions~
############################################################################
2. Graph Data Structure¶
Below is the data stucture of the input graph G in the ShortestDistance class.
############################################################################
# Do not edit this code cell.
############################################################################
class DirectedWeightedGraph(object):
def __init__(self, edge_list):
self.vertex_dict = {}
self.adj_list_in = []
self.adj_list_out = []
self.vertex_num = 0
for [src, dst, weight] in edge_list:
self.add_edge(src, dst, weight)
def add_vertex(self, name):
id = self.vertex_num
self.vertex_dict[name] = id
self.vertex_num += 1
self.adj_list_in.append(list())
self.adj_list_out.append(list())
def add_edge(self, vertex1, vertex2, weight):
if vertex1 not in self.vertex_dict.keys():
self.add_vertex(vertex1)
if vertex2 not in self.vertex_dict.keys():
self.add_vertex(vertex2)
self.adj_list_out[self.vertex_dict[vertex1]].append([vertex2, weight])
self.adj_list_in[self.vertex_dict[vertex2]].append([vertex1, weight])
3. How to test your code¶
3.1 Download the sample dataset.¶
Running the following command will create a folder COMP9312_datasets containing files about three datasets. Cora (2k vertices) is a real citation graph with an synthetic random weight for each edge. map_BJ_part (4k vertices) is a real road network for a small area of Beijing, and map_NY_part (7k vertices) is a real road network for a small area of . Each dataset has three files. For each dataset, *.graph includes all graph edges. *.query includes a set of shortest distance queries for testing. *.answer includes the answer for each query computed by us in the *.query file for your reference.
If the dataset has already exists, there would be an error like “destination path ‘COMP9312_datasets’ already exists”.
Note: We will use different query datasets to test the algorithm.
!git clone https://github.com/guaiyoui/COMP9312_datasets.git
Cloning into ‘COMP9312_datasets’…
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 11 (delta 0), reused 11 (delta 0), pack-reused 0
Unpacking objects: 100% (11/11), done.
3.2 The main function¶
Our test procedure first loads the graph dataset and the query dataset. Then it calls the ShortestDistance class to preprocess the graph. After that, it will run each query and test their efficiency and correctness.
import time
import numpy as np
if __name__ == “__main__”:
print(‘\n##### Loading the dataset…’)
# edge_list = np.loadtxt(‘./COMP9312_datasets/cora.graph’, dtype=int)
# query_list = np.loadtxt(‘./COMP9312_datasets/cora.query’, dtype=int)
edge_list = np.loadtxt(‘./COMP9312_datasets/map_BJ_part.graph’, dtype=int)
query_list = np.loadtxt(‘./COMP9312_datasets/map_BJ_part.query’, dtype=int)
# edge_list = np.loadtxt(‘./COMP9312_datasets/map_NY_part.graph’, dtype=int)
# query_list = np.loadtxt(‘./COMP9312_datasets/map_NY_part.query’, dtype=int)
G = DirectedWeightedGraph(edge_list)
print(‘\n##### Start to preprocessing…’)
start_preprocessing = time.time()
SD = ShortestDistance(G)
end_preprocessing = time.time()
print(“preprocessing time: {}”.format(end_preprocessing-start_preprocessing))
print(‘\n##### Test on the query …’)
start_query = time.time()
for i in range(query_list.shape[0]):
distance = SD.query(query_list[i][0], query_list[i][1])
print(“the distance between {} and {} is: {}”.format(query_list[i][0], query_list[i][1], distance))
end_query = time.time()
print(“average query time: {}”.format((end_query-start_query)/query_list.shape[0]))
##### Loading the dataset…
##### Start to preprocessing…
preprocessing time: 1.0967254638671875e-05
##### Test on the query …
the distance between 211102 and 9401219 is: 0
the distance between 9907233 and 9311274 is: 0
the distance between 9704296 and 9301203 is: 0
the distance between 9903217 and 9301253 is: 0
the distance between 9609341 and 9307247 is: 0
the distance between 9605336 and 9405204 is: 0
average query time: 3.059705098470052e-05
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com