程序语言:Python
截止时间: 2020年4月22日北京时间上午11点之前(很急!)。
程序要求:计算机研究生课程作业,Python实现一个单机版的Chord DHT模拟器,再实现一个equally-spaced replication算法。模板代码、输入文件和Part 2的debug输出文件在和此doc文件同级文件夹中。
具体要求:
Main Idea
In this assignment, you will download, install, and experiment with software that emulates Chord rings. In the first part, you will set-up Chord rings, map objects to nodes in the ring, emulate queries for different objects, and measure different aspects of the Chord ring operation. In the second part, you will add equally-spaced replication of objects (based on the paper discussed in class) into the Chord emulator and demonstrate how replication can provide robust access to objects in the presence of some nodes that intentionally disrupt routing operation.
Part 1
There are open-source Chord emulators available in Python, Java, and C/C++. As with previous assignments, we recommend and will provide more support for Python. If you strongly prefer Java or C/C++, check with the professor to make sure the emulator you are proposing to use meets the requirements for the assignment. The main requirement is that the software runs as a single process emulator designed to emulate an entire Chord ring within one process running on a single computer. Some open-source Chord implementations are not emulators but are true peer-to-peer implementations designed to be run on many computers implementing a peer-to-peer Chord ring among them.
To proceed with Python, download the following Chord DHT class (slightly modified from PChord, authored by Huang Li and available on GitHub):
dht.py(见文件夹)
The following Python program is a simple test program that uses the DHT class to set up a Chord ring with around 100 nodes, store some objects at their proper locations in the ring, and then look up and print some of the stored objects. Run this program with the above DHT class to make sure the emulator runs correctly.
tests.py(见文件夹)
One thing PChord lacks is the hashing operation that maps both nodes and objects randomly into the Chord ring. In Part 1, you will add this feature to the given Python code, test it out, and perform some measurements. Most of this part can be done without modifying the DHT class itself. You should be able to do all but one part (printing routes of queries) within a separate Python program (like tests.py) and simply call methods from the DHT class to implement the necessary operations.
Part 1a:
Nodes will be given to you as IP addresses. Treating those IP addresses as strings, use the SHA-256 function (available in Python’s hashlib library) to hash the address. You can then take the hash result (converted to an int) modulo the size of the ring to map the node into the ring. Use a ring with 4096 possible IDs throughout this assignment. Objects stored will be file names, which are also strings. Before storing or looking up a file name, convert it to a key using the same hashing method just described. The value stored will be the file name itself and the key is the key produced by hashing the file name.
The IP addresses of nodes that make up the ring, the file names of objects to be stored, and the queries to be run are given in files available in the “Resources” part of this assignment (“nodes.txt”, “files.txt”, “queries.txt”). After you have set up the ring and mapped the file names to the appropriate nodes, output the ring structure and stored file names into a file named “ring_structure.csv” with the following format. Each line of the csv file describes one node in the Chord ring and should contain the following information (separated by tabs, as indicated):
where each of the file names is stored on the node with that
For the queries listed in “queries.txt”, output the routes followed in your Chord ring into a file named “routes.csv” with the following format. Each line of the csv file describes one query in the Chord ring and should contain the following information (separated by tabs, as indicated):
where
Turn in your “ring_structure.csv” and “routes.csv” files, a screen shot showing the emulator running on your computer (include something in the screen shot that is identifiably from your computer), and your new “dht.py” and “
Part 1b:
After running the test ring and queries, generate a new Chord ring with 32 nodes and map 1,024 non-colliding file names into the ring. The nodes should include the automatically added node 0 and 31 with randomly chosen IDs. You can choose the file names randomly or scrape them from any data set of file names you wish. Since a small number of collisions are likely with 1,024 random file names and 4,096 possible IDs, check to see whether a file name’s ID is already taken by a previous file name, throw it out if it causes a collision, and keep generating until you reach 1,024 non-colliding file names. Next, generate and emulate 50 queries with a random query initiator from the nodes in the ring and a random file name from the list of file names you have stored. Generate and turn in the following graphs:
a bar chart showing how many file names are stored at each of the 32 nodes; also report the mean and standard deviation of the number of files per node
a bar chart showing the number of hops in each of the 50 queries; also report the mean and standard deviation of the number of hops per query
You can do this with a minor variation of your code that generated the “ring_structures.csv” and “routes.csv” files. You will have to make another modification to findNode() to count the number of hops in a query.
Complete Part 1 Turn-in (需要提交的文件)
“ring_structure.csv” file
“routes.csv” file
unique screen shot of emulator running on your computer
new “dht.py”
“
bar graph of file storage for Part 1b
bar graph of hop counts of queries for Part 1b
Part 2
In this part, you will add the equally-spaced replication technique from the “Replica Placement” paper into the Chord emulator you used in Part 1. This part will require significant modifications to the DHT class. First, have the nodes 24.121.88.210 and 184.15.113.235 emulate malicious nodes in the following way. In the findNode() method during lookup operations, check for the node IDs of those two nodes. Any time either of those node IDs appears in a route during a lookup, have findNode() immediately return with a “None” value. (Again, make sure you only do this during lookup operations to make sure not to disrupt other Chord operations.) With this modified code, re-run the inputs from Part 1a and note how many return “None” when they should have found and returned a file name.
Now, implement equally-spaced replication in your ring with a replication degree of 4, meaning that each object will be stored with its primary key and 3 alternate IDs that are equally spaced around the ring. (Even though, in this assignment, the ring size is fixed at 4096 and the replication degree is fixed at 4, your implementation should work for any ring size and replication degree. Make those parameters in your implementation that are then set to the fixed values for the program executions called for in this part.) This will require modifying both the store() and lookup() methods in the class. In the store() method, you will have to find the 4 appropriate nodes to store the replicas and store the file name at each of them, instead of just the one primary node. Re-run the node join and store operations using the “nodes.txt” and “files.txt” files from Part 1a and generate a “ring_structure_with_rep.csv” file having the same format as “ring_structure.csv” from Part 1a (note that each file name should be listed with 4 different nodes).
The lookup() method should sequentially look for the file name at the 4 possible locations, stopping when either a non-“None” value is returned or all 4 locations return “None”. Repeat the queries from Part 1a in the “queries.txt” file and generate a “routes_with_rep.csv” file having the same format as “routes.csv” from Part 1a (only include the nodes in the route to the replica that actually provides the requested file name; if all routes fail, list the route to the last attempted replica).
NB for Part 2: In order to minimize collisions among keys, replicas of file names should be stored under their primary key, not under their replica ID. A particular replica should be looked up based on its replica ID but stored under its primary key on the node hosting that replica. As an example, a primary key with ID 52 will have a replica with ID 1076. Assume that the successor of 1076 in the Chord ring is 1103. The replica will be stored on node 1103 as the key, value pair (key=52, value=v), where v is the file name associated with this key. Keep this in mind as you modify the store() and lookup() methods.
Part 2 Turn-in (需要提交的文件)
“ring_structure_with_rep.csv” file
“routes_with_rep.csv” file
modified “dht.py” file
“
Resources for Assignment (资源文件)
Python code for modified PChord emulator and test program
dht.py(见文件夹)
tests.py(见文件夹)
Input files with nodes, file names, and queries
nodes.txt(见文件夹)
files.txt(见文件夹)
queries.txt(见文件夹)
Files for debugging
nodes_debug.txt(见文件夹)
files_debug.txt(见文件夹)
queries_debug.txt(见文件夹)
ring_structure_debug.csv(见文件夹)
routes_debug.csv(见文件夹)
ring_structure_debug_with_rep2.csv(见文件夹)(with replication degree = 2)
routes_debug_with_rep2.csv(见文件夹)(with 11.144.17.27 as the malicious node)