算法代写 Assignment 2

Assignment specification.

Assignment 2
Spatial and Top-k search algorithms Due Date: Oct 31st, 2018 5:00pm

The goal of this assignment is the development and testing of spatial and top-k search algorithms.

Dataset.

For this assignment, you will have to browse to http://www.cs.utah.edu/~lifeifei/SpatialDataset.htm and download two files using the following links:
California Road Network’s Nodes (Node ID, Longitude, Latitude)
California Road Network’s Edges (Edge ID, Start Node ID, End Node ID, L2 Distance)

In the first file, each line contains the data of a node in a spatial network. The line contains a unique identifier and its spatial coordinates. In the 2nd file, each line is an edge of the network, which is characterized by a pair of nodes (together with the Euclidean distance of the edge’s endpoints). We assume that the network is undirected (i.e., the edges do not have a direction; if there is an edge from node 0 to node 1 then there is an edge from node 1 to node 0).

Q1 (Programming) Storage and indexing of the network in memory [20%]

Write a program that reads the data from the given files and constructs a data structure for them in memory. Specifically, for each node the data structure stores (1) the coordinates of the node and (2) in a list (i.e., adjacency list) the neighbors of the node (which imply the existence of the corresponding edges).

Requirements:

Your data structure could be in the form of an array, where the position of a node implies its ID. For example, the node with Node-ID=0 is in the first position of the array. This position includes the coordinates of the node and a pointer to its adjacency list. For each neighbor of the node, you should store in the adjacency list the ID of the neighbor and its Euclidean distance from the node. You can use existing data structures from the programming language of your choice (e.g., vectors or ArrayList).

Q2 (programming) Dijkstra and A* [40%]

You are asked to implement the shortest-path search algorithms Dijkstra and A*.

Implement these search algorithms as functions that take as input (i) the identifier of the source node s and (ii) the identifier of the destination node t and they should compute correctly and return (in the form of a list) the shortest path from s to t and the corresponding shortest path distance from s to t. They should also compute and return the number of iterations (i.e., the number of visited nodes) during the execution of algorithm.

Requirements:

  1. a)  The algorithms should use the data structure which you implemented in Q1.
  2. b)  For the computation of the lower bound of the network distance, use the Euclidean distance computed from the coordinates of the nodes.
  3. c)  For the distance between two connected nodes you can use the existing value stored in your data structure.
  4. d)  The main program should take as command-line arguments the start and destination node-IDs, it should execute both algorithms and print their results. You can use off-the-shelf data structures from your programming language libraries (e.g., priority queue).

COMP8101 Advanced topics in data engineering

Q3 (programming) Best meeting point [40%]

You are asked to implement a variant of NRA top-k algorithm, which solves the following problem. We are given m nodes of the spatial network, which correspond to the initial positions of m users who want to meet at a node of the network. The best meeting point is the vertex of the network which minimizes the distance that the users have to travel in order to meet. The “cost” (i.e., penalty) of a potential meeting point is defined by aggregating the network distances that the m users have to travel in order to meet. The aggregate distance we will use is γ=max. The distance from a user to a meeting point is the shortest path distance from the location of the user to the meeting point. We want to find the point which minimizes the maximum distance that any user has to travel in order to for them to meet. For example, let n1, n2, …, nm be the starting points of the m users and let v be a possible meeting point. Then, the cost of v is defined as max(dist(n1, v), dist(n2, v),…, dist(nν, v)), where dist(ni, v) is the shortest path distance from ni to v for i=1,…,m. The best meeting point is the node with the minimum cost.

Your NRA algorithm should operate as follows. From each starting position of a user ni access the other nodes in ascending order of their network distance from ni. For this, you need to implement an incremental version of Dijkstra’s algorithm. For each visited node from any user we keep track the known distances and the corresponding shortest paths. We continue searching from the user node which has currently the smallest distance to its last visited node (compared to all other users). Specifically, assume that the user locations are n1, n2, …, nν and that the last nodes that we have visited from them are v1, v2, …, vν respectively. If min(dist(n1, v1), dist(n2, v2),…, dist(nν, vν)) = dist(ni, vi), then we will continue Dijkstra to visit the next node from node ni. If a node nx has been visited from all users, its exact cost is already known and we compare it to the best meeting point so far, which gets updated if necessary. The algorithm terminates if the candidate meeting points which have not yet been visited from all Dijkstra searches cannot become better than the best meeting point known so far.

Requirements:

  1. a)  Implement and test the algorithm. Your program should take as input a number of network nodes (indicated by their IDs) which correspond to the starting positions of the users. You may use off-the-shelf data structures and functions from your preferred programming language.
  2. b)  Select sets of users who are close to each other and also sets of users who are far away from each other and compare the performance of the algorithm for the different sets.
  3. c)  Suggest an alternative algorithm which emulates the TA top-k algorithm to solve the same problem. Do you think that this algorithm would be better or worse than the NRA algorithm?

COMP8101 Advanced topics in data engineering

Submission.

Please submit your assignment (one ZIP file) to moodle on or before 5:00pm, Oct 31st, 2018. Make sure all contents are readable. Your code should be compilable without problems and you should include basic instructions on how to compile and use it. Your program can be written in your preferred programming language (e.g., C, C++, Java, Python, etc.). Your program must run within reasonable time.

COMP8101 Advanced topics in data engineering