程序代写代做代考 distributed system algorithm dns Peer-to-peer (p2p) systems

Peer-to-peer (p2p) systems
• Idea: create distributed systems out of individually owned, unreliable machines
– Also different administrative domains
– This has been tried with parallel computers in several projects, including “Grid Computing”
• In p2p systems, the primary problem is
lookup
– given a data item stored at one or more places, find it

p2p lookup problem
• Traditional approach for this kind of problem is domain name system (DNS)
– DNS is tree based; with p2p, what happens if the root node (or node high in the tree) fails?
– Also, the load is not shared, even with caching

p2p lookup problem
• Possible approaches
– Centralized server (Napster)
• Problem: scalability, single point of failure
– Flood (Gnutella)
• Problem: inefficiency
– Unstructured (Freenet)
• Problem: not finding the object

DHT: Distributed Hash Tables
• Provides a lookup service similar to a hash table
– Hash table maps a key to a bucket
– With DHT, we want to translate a key to a node • And then to the data (within the node)
– Nodes have connectivity through an overlay network (just a virtual network)
• Goals of DHT:
– Load Balance, Decentralization, Scalability, Fault Tolerance (leaving and joining)

DHT: Distributed Hash Tables
• Operation is as follows
– Given a key k and data d, we call put(k, d) to store that data on a node that is determined by k • Practically, we may hash the data d to produce key k
• Hash function needs to spread keys out evenly
– To retrieve d, call get(k), which automatically retrieves it from the proper node

Picture of Distributed Hash Table

DHT Issues
• Balancing load: need to distribute the data evenly, or else may as well use DNS
• Forwarding algorithm: how does a node move a request closer to the node that has the data
• Handling joins, departures, and failures

Example DHT: Chord
• Nodes form a logical circle in order of node id
• Each node has a “skiplist”, which contains the node (think IP address) half-way around the circle, quarter-way around the circle, etc.
• On a put or get, forward to node in skiplist that has the highest id not exceeding k
– Gets us to O(log N) time if there are N nodes
– But what if there is a failure in a skiplist node? • See next slide

Picture of Chord

Example DHT: Chord
• Load balance: achieved by hashing the node’s IP address to get the node id, and by hashing the data to get the key
• Decentralization: no node is more important than any other
• Scalability: skiplist makes lookups O(log N)
• Fault Tolerance: keep a few successors at each node; effective as long as at least one successor is alive

Adding a node in Chord
• To add a new node i (assumes i is unique and known)
– Initialize predecessor, successors, and skiplist for node i
– Update skiplist, successors, and predecessor of existing nodes
– Move appropriate keys/data to this node • Removing a new node is symmetric