
ECS656U/ECS796P
Distributed Systems

What this lecture is about


• Peer to Peer
• Distributed Hash Tables

Peer to Peer Thanks to Prof.

Introduction: client-server architecture
• Servers:
• Always-on host
• Permanent IP address
• Server farms for scaling
© J. Kurose and K. Ross, 1996-2006

Introduction: client-server architecture
• Clients:
• May be intermittently connected
• May have dynamic IP addresses
• Communicate with server
• Do not communicate directly with each other
© J. Kurose and K. Ross, 1996-2006

Introduction: peer-to-peer architecture
• No always-on server (“pure” P2P)
• Arbitrary end-systems communicate
• Peers are intermittently connected and change IP addresses – like clients
© J. Kurose and K. Ross, 1996-2006

Introduction: peer-to-peer architecture
• P2P examples:
• File sharing: Napster, Gnutella, Kazaa, BitTorrent
• Streaming: KanKan, Tribler
• VoIP: Skype
© J. Kurose and K. Ross, 1996-2006

Why Peer-to-peer (P2P)?
• Scaling: system scales with number of clients, by definition
• Eliminate centralization:
• No single point of failure
• No single point of control
• Self-managing

P2P example
• Alice runs a P2P client application on her notebook computer
• Intermittently connects to the Internet; gets a new IP address for each connection
• Asks for MP3 file “Hey Jude”
• Application displays other peers that have a copy of “Hey Jude”
• Alice chooses one of the peers, say Bob

P2P example
• File is copied from Bob’s PC to Alice’s notebook, e.g., by HTTP
• While Alice does this, other users download from Alice
• E.g., some other MP3 file
• Alice’s peer is both a Web client and a transient Web server

P2P for the win!
✓ More users = more servers
✓ Scalability!
• Client/Server: adding a user means fewer resources per user
• P2P: adding a user adds that peer’s resources to the system

So, what does P2P have in common with this course?
• Typically, each member stores/provides access to content
• Basically, a replication system for files
• P2P allows files to be anywhere -> searching is the challenge
• Dynamic member list makes it more difficult

P2P fundamentals
• Localization: how to find a resource? How to enter a P2P network?
• Distribution: how to distribute resources? How to leave a P2P network?

Mapping Keys (files) and Resources to IP addresses

Locating content
• Locating content is often a two-step procedure
• What content is available? I.e., can MP3 “Hey Jude” be found at all?
• Who (what peers) can provide the content?
• We need a directory with the content available!

Distributing content
• Central directory
• Easy to manage ❤
• Bottleneck and single point of failure 💔
• If the content directory is overloaded or down, the whole system suffers 💔
• Distributed directory
• No bottlenecks, no single points of failure or overloading ❤
• Hard to manage 💔
• Harder the more peers and/or content you have in the system 💔

Locating and distributing content
• Three basic architectures:
• Centralized directory (Napster, early BitTorrent)
• Query flooding (Gnutella)
• Hierarchical and non-hierarchical overlay designs (Kazaa, BT DHT)

• Original “Napster” design
• Step 1: connection
1. Client uploads a list of the files that it wants to share
2. Server maintains a global list:
Note: Server does not store any file
Builds a database, mapping files to IP addresses.
centralized directory server

© J. Kurose and K. Ross, 1996-2006
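As a rough sketch (not the actual Napster protocol), the centralized directory could be modelled like this: peers register the list of files they share, and the server keeps only the file-to-peer mapping, never the files themselves. All names and addresses are made up for illustration.

```python
# Minimal sketch of a Napster-style centralized directory (illustrative only).
# The server stores a mapping from file names to the peers that host them;
# it never stores file contents.
from collections import defaultdict

class DirectoryServer:
    def __init__(self):
        # file name -> set of (ip, port) pairs that claim to have it
        self.index = defaultdict(set)

    def register(self, peer_addr, file_list):
        """Step 1 (connection): a peer uploads the list of files it shares."""
        for name in file_list:
            self.index[name].add(peer_addr)

    def search(self, keyword):
        """Step 2 (search): return peers whose file names match the keyword."""
        return {name: sorted(peers)
                for name, peers in self.index.items()
                if keyword.lower() in name.lower()}

# Usage: two peers register, then a client searches by keyword.
server = DirectoryServer()
server.register(("10.0.0.1", 6699), ["hey_jude.mp3", "let_it_be.mp3"])
server.register(("10.0.0.2", 6699), ["hey_jude.mp3"])
print(server.search("jude"))
```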

• Original “Napster” design
• Step 1: connection
• Step 2: search
1. Client sends keywords to the server to search with
2. Server returns a list of hosts
3. Client pings each host in the list to find transfer rates
4. Client fetches the file from the best host
centralized directory server
© J. Kurose and K. Ross, 1996-2006

• Original “Napster” design
• Step 1: connection
• Step 2: search
• Step 3: download
All communication uses the TCP protocol
centralized directory server
© J. Kurose and K. Ross, 1996-2006

• Original “Napster” design
• Step 1: connection
• Step 2: search
• Step 3: download
centralized directory server
Server can regularly probe peers to determine if they are still connected and update its database accordingly.
© J. Kurose and K. Ross, 1996-2006

Can be seen as a hybrid between P2P and client-server
Builds a database, mapping files to IP addresses.
centralized directory server
© J. Kurose and K. Ross, 1996-2006

BitTorrent: overview
• Contact a centralized tracker that has a list of peers
• The torrent file contains the list of trackers associated with that specific file
• If Alice wants to join the torrent:
• Register with the tracker
• Tracker randomly selects peers and sends their IP addresses to Alice
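A toy sketch of this tracker step (the peer addresses and the subset size of 50 are illustrative assumptions, not the real BitTorrent tracker protocol):

```python
import random

def tracker_response(swarm_peers, joining_peer, max_peers=50):
    """Return a random subset of known peers for a joining client.
    swarm_peers: set of (ip, port) tuples currently registered for this torrent."""
    candidates = [p for p in swarm_peers if p != joining_peer]
    return random.sample(candidates, min(max_peers, len(candidates)))

swarm = {("10.0.0.%d" % i, 6881) for i in range(1, 8)}
alice = ("192.168.1.5", 6881)
swarm.add(alice)                       # Alice registers with the tracker
print(tracker_response(swarm, alice))  # tracker sends her a random peer list
```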

BitTorrent: the players
• Tracker: a special type of server. It keeps track of where file copies reside on peer machines and which ones are available at the time of the client request (it does so by receiving heartbeats from peers), and it helps coordinate efficient transmission and reassembly of the copied file.

BitTorrent: the players
• Two kinds of peers
• Seed: has the full file. It continues to be part of the system and helps the other peers download the file
• Leecher: has some blocks of the file and is still looking to download the rest of it

BitTorrent: operation
• Alice receives a list of peer IP addresses from the Tracker
• Attempts to establish TCP connections to these peers
• TCP-connected peers are “neighbouring peers”
• Neighbouring peers will fluctuate over time

BitTorrent: operation
• File split into chunks (32KB – 256KB)
• Alice periodically asks neighboring peers for lists of chunks
• Alice will issue requests for chunks she is currently missing
• Alice will receive requests for chunks she has

BitTorrent: operation
• At any given time, Alice has
• A subset of chunks
• Knowledge about chunks her neighboring peers have
• Two important decisions to make
• Which chunks should be requested next?
• To which neighbor should she send requested chunks?

BitTorrent: operation
• Rarest chunks first
• Determine chunks that are rarest among her neighbors
• Request those rarest chunks first
• Rarest chunks will then get more quickly redistributed
• Equalizing the number of copies of each chunk in the torrent
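A small sketch of rarest-first selection, assuming Alice already knows which chunks each neighbouring peer advertises (peer names and chunk IDs are made up):

```python
# Sketch of rarest-first chunk selection.
from collections import Counter

def rarest_first(my_chunks, neighbour_chunks):
    """Return the missing chunk IDs ordered from rarest to most common
    among the neighbours."""
    counts = Counter()
    for chunks in neighbour_chunks.values():
        counts.update(chunks)
    missing = [c for c in counts if c not in my_chunks]
    return sorted(missing, key=lambda c: counts[c])

my_chunks = {0, 1}
neighbour_chunks = {
    "bob":   {0, 1, 2, 3},
    "carol": {0, 2},
    "dave":  {0, 3, 4},
}
print(rarest_first(my_chunks, neighbour_chunks))  # [4, 2, 3]: chunk 4 is rarest
```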

BitTorrent: operation (tit-for-tat)
• Give priority to neighbors currently supplying her data at the highest rate
• For each of her neighbors, Alice measures her receiving rate
• Determine the four neighbors feeding her at the highest rate
• Return the favor by sending chunks to the top four neighbors
Higher upload rate: find better trading partners, get the file faster!

BitTorrent: operation
• Every 10 seconds, recalculate the rates
• Possibly modifying the set of top four peers
• Every 30 seconds, be optimistic!
• Pick a new neighbor, Bob, at random and send it chunks
• Alice may then become one of Bob’s top four uploaders
• Bob would then start sending to Alice and maybe become one of her top four uploaders
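A sketch of this choking schedule: the 10-second re-ranking, the 30-second optimistic unchoke, and the "top four" come from the slides above, while the peer names, rates, and function names are made up.

```python
# Sketch of BitTorrent-style choking: every 10 s keep the four neighbours that
# upload to us fastest; every 30 s optimistically unchoke one random extra peer.
import random

def top_four(download_rates):
    """download_rates: dict neighbour -> measured bytes/s received from them."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    return set(ranked[:4])

def choking_round(download_rates, tick, optimistic=None):
    """Called every 10 s (tick counts rounds). Returns the unchoked set."""
    unchoked = top_four(download_rates)
    if tick % 3 == 0:  # every 3rd round = every 30 s: pick a random extra peer
        others = [p for p in download_rates if p not in unchoked]
        if others:
            optimistic = random.choice(others)
    if optimistic:
        unchoked.add(optimistic)
    return unchoked, optimistic

rates = {"bob": 120.0, "carol": 300.0, "dave": 50.0, "eve": 80.0, "frank": 10.0}
opt = None
for tick in range(3):
    unchoked, opt = choking_round(rates, tick, opt)
    print(tick, sorted(unchoked))
```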

BitTorrent: operation
• Random neighbor selection ->
• Peers capable of uploading at compatible rates tend to find each other
• New peers will get chunks so that they can start trading

BitTorrent compared to Napster
• Chunk-based downloading
• Anti-freeloading mechanisms

Centralized directory: limitations
• Single point of failure
• If server crashes, the entire P2P application crashes
• Performance bottleneck
• Huge number of users -> huge database and many queries/second
• Directory size
• When peers connect, they upload their content manifest (list of files)
• As the number of peers increases, the resulting database does too

Centralized directory: limitations
• Single point of failure
• Performance bottleneck
• Directory size
File transfer is decentralized, but locating content is highly centralized

Locating and distributing content
• Three basic architectures:
• Centralized directory (Napster, early BitTorrent)
• Query flooding (Gnutella)
• Hierarchical and non-hierarchical overlay designs (Kazaa, BT DHT)

Query flooding
• No central directory of either content or peers
• Search by sending query to all directly connected peers
• Peers forward the query to their directly connected peers, etc., i.e., flooding the network with queries.
• Some rules limit reach, prevent loops, etc. Still very costly.

Gnutella
• Each peer stores:
• Their own files
• Peer pointers (to their neighbours)
• Peers are connected in an overlay network
• Each link is an implicit Internet path
• Peer A and Peer B are neighbours if they know each other’s IP addresses and can send/receive messages

• How to find a file:
• Send the request to all neighbors
• Neighbours recursively forward the request
• Eventually a machine that has the file receives the request, and it sends back the answer (QueryHit)
• Transfers are done with HTTP between peers. If several QueryHits, select one!
© J. Kurose and K. Ross, 1996-2006

Gnutella: avoiding excessive traffic
• To avoid duplicate transmissions: each peer maintains a list of recently received messages
• Query forwarded to all neighbours except the one from which it was received
• Each Query forwarded only once (it has an ID associated)
• TTL associated to each Query to avoid messages running forever in the network
© J. Kurose and K. Ross, 1996-2006
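A small sketch of these flooding rules over an in-memory peer graph (illustrative only; real Gnutella exchanges Query/QueryHit messages over TCP between hosts):

```python
# Sketch of Gnutella-style query flooding with TTL, forward-once, and
# never-send-back-to-sender rules, over an in-memory peer graph.
class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbours = []
        self.seen = set()          # IDs of queries already forwarded

    def query(self, query_id, keyword, ttl, sender=None, hits=None):
        hits = hits if hits is not None else []
        if query_id in self.seen or ttl <= 0:
            return hits            # forward each query at most once, bounded by TTL
        self.seen.add(query_id)
        if any(keyword in f for f in self.files):
            hits.append(self.name) # QueryHit: this peer has the file
        for n in self.neighbours:
            if n is not sender:    # never send back to the peer it came from
                n.query(query_id, keyword, ttl - 1, sender=self, hits=hits)
        return hits

a, b = Peer("A", []), Peer("B", [])
c, d = Peer("C", ["hey_jude.mp3"]), Peer("D", ["hey_jude.mp3"])
a.neighbours = [b]; b.neighbours = [a, c, d]; c.neighbours = [b]; d.neighbours = [b]
print(a.query(query_id=1, keyword="jude", ttl=3))   # ['C', 'D']
```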

Gnutella: how to join the network
• Joining peer X must find some other peer in the Gnutella network: use list of candidate peers that are often up (e.g., at a Gnutella site)
• X sequentially attempts to establish TCP connections with peers on list until connection setup with peer Y
• X sends Ping message to Y; Y forwards Ping message (peer-count field used – TTL)
• All peers receiving Ping message respond with Pong message back through the overlay

Gnutella: problems
• Ping-Pong traffic constituted 50% of all traffic
• Popular files are requested many times: lots of Query messages
• Some peers might not have enough bandwidth for all the Query/Ping/Pong traffic
• 70% of users in 2000 were freeloaders: they just downloaded and never uploaded anything
• Flooding causes excessive traffic

Locating and distributing content
• Three basic architectures:
• Centralized directory (Napster, early BitTorrent)
• Query flooding (Gnutella)
• Hierarchical and non-hierarchical overlay designs (Kazaa, BT DHT)

KaZaA
• Basic idea: leverage heterogeneity of peers -> assign more responsibility to peers with better resources -> impose a hierarchy
• Kazaa borrows ideas from Napster and Gnutella
• Like in Gnutella, there is no dedicated server (or server farm) for tracking and locating content
• Unlike Gnutella, all peers are not equal in Kazaa—group leaders (or super peers) exist

• Group leaders maintain a database of content and IP addresses in a Napster- like fashion (group leader is a Napster-like hub)
• In contrast to Napster, a group leader is not a dedicated server

KaZaA: hierarchical overlay network
• Each peer is either a group leader or assigned to a group leader
• Group leader tracks the content in all its children
• Group leaders have high bandwidth connections
and high Internet connectivity
• A hierarchical overlay network
• Flooding limited to overlay of super peers
© J. Kurose and K. Ross, 1996-2006
(figure legend: ordinary peer, group-leader peer, neighbouring relationships in overlay network)

KaZaA: operation
• Publish:
• Node informs its group-leader of the list of files it wants to share
• Search:
• Node queries its group-leader. If the file is not found, the group-leader floods the query to a subset of group-leaders
• Group-leader responds to the node
• Node gets the file directly from peers
© J. Kurose and K. Ross, 1996-2006
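A toy sketch of this publish/search flow (plain in-memory objects, not the real KaZaA/FastTrack protocol; all names and addresses are illustrative):

```python
# Toy sketch of KaZaA-style publish/search through group leaders.
class GroupLeader:
    def __init__(self):
        self.index = {}            # file name -> set of peer addresses
        self.other_leaders = []    # overlay of super peers

    def publish(self, peer_addr, files):
        for f in files:
            self.index.setdefault(f, set()).add(peer_addr)

    def search(self, name, flood=True):
        hits = set(self.index.get(name, set()))
        if not hits and flood:
            # not found locally: flood the query to (a subset of) other leaders
            for leader in self.other_leaders:
                hits |= leader.search(name, flood=False)
        return hits

leader_a, leader_b = GroupLeader(), GroupLeader()
leader_a.other_leaders = [leader_b]
leader_b.publish(("10.0.0.7", 1214), ["hey_jude.mp3"])   # ordinary peer publishes to its leader
print(leader_a.search("hey_jude.mp3"))                   # found via the super-peer overlay
```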

KaZaA: pros and cons
• Improves scalability and search efficiency by exploiting node heterogeneity
• No guarantees on search time
• No mechanism to tackle freeloading
© J. Kurose and K. Ross, 1996-2006

The lookup complexity graph

(figure: per-peer memory, ranging from O(1) to O(N), plotted against communication overhead, also ranging from O(1) to O(N), where N is the number of peers; a centralized directory sits at one extreme)

How can we get the optimal trade-off?

Distributed Hash Table (DHT)
• Hash function
• Turns data into a small number (a “fingerprint”)
• Input -> hash function -> hash (or hash sum, or hash value)
• Choosing a good hash function can be tricky
• Hash table
• a structure that can map keys to values. It uses a hash function to compute an index into an array, from which the desired value can be found
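As a (non-distributed) baseline, a minimal hash-table sketch: a hash function turns a key into a small number, which indexes into an array of buckets. The SHA-1 fingerprint and the bucket count of 8 are arbitrary illustration choices.

```python
# A (non-distributed) hash table sketch.
import hashlib

NUM_BUCKETS = 8   # arbitrary, for illustration

def h(key: str) -> int:
    """Hash a key to a bucket index, using SHA-1 as the fingerprint."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def put(key, value):
    buckets[h(key)].append((key, value))

def get(key):
    for k, v in buckets[h(key)]:
        if k == key:
            return v
    return None

put("hey_jude.mp3", ("10.0.0.1", 6699))
print(h("hey_jude.mp3"), get("hey_jude.mp3"))
```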

Distributed Hash Table (DHT)
• Widely adopted in P2P systems
• Creates a fully decentralized index that maps file IDs to locations
• Allows a user to determine (basically) all the locations of a file without generating an excessive amount of search traffic

DHT for locating and distributing content
Can we cleverly distribute the mapping so that finding the resource is easy?

DHT basics
The hash function maps a resource-ID to a point in the hash output space
hash co-domain space

DHT basics
The codomain of the hash function is split among the peers
• Each peer is responsible for mapping a hashed ID to the IP address that can provide the resource
• The peer does not store the resource, just the mapping!

DHT basics
The codomain of the hash function is split among the peers
• to fetch a resource, the client must contact the correct peer
• the hash table maps keys to the peer that is responsible for that key
(figure: hash(K) falls in the slice of the hash co-domain owned by peer 2)
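A minimal sketch of this idea, under the simplifying assumption that the hash co-domain is split into equal, contiguous slices among four peers (peer names are illustrative); the Chord slides that follow show how a real DHT organises this assignment on a ring.

```python
# Sketch: the co-domain of the hash function is split among peers; a lookup
# hashes the key and contacts the peer that owns that slice of the space.
import hashlib

HASH_SPACE = 2 ** 16          # small co-domain for illustration (real DHTs use 2**160)

def h(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % HASH_SPACE

# Assume 4 peers, each owning an equal, contiguous slice of the space.
peers = ["peer1", "peer2", "peer3", "peer5"]
slice_size = HASH_SPACE // len(peers)

def owner(key: str) -> str:
    """Which peer is responsible for the mapping of this key?"""
    return peers[min(h(key) // slice_size, len(peers) - 1)]

# The owner stores only the mapping hash(K) -> IP of the peer(s) holding the
# file, not the file itself.
print(h("hey_jude.mp3"), owner("hey_jude.mp3"))
```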

DHT basics
This is challenging!!!!
• How to keep uniformly distributed load?
• How to handle peers that join and leave?
• How does a peer find who knows about a resource with key K?
(figure: a peer asking “who knows about K?” must somehow reach the peer that knows the owner of K)

The Chord example
• Developed at MIT
• Basic idea: assign to each node and key an m-bit identifier using a base hash function such as SHA-1
• ID(node) = hash(IP, Port) • ID(key) = hash(key)
48 m=6 16 range is [0…63]
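A minimal sketch of this identifier assignment with m = 6, as in the figure (the IP address and port are made up; Chord itself uses m = 160):

```python
# Sketch of Chord's identifier assignment with m = 6 (range [0..63]),
# using SHA-1 as the base hash function.
import hashlib

M = 6                      # number of identifier bits (Chord itself uses m = 160)

def chord_hash(data: str) -> int:
    return int(hashlib.sha1(data.encode()).hexdigest(), 16) % (2 ** M)

def node_id(ip: str, port: int) -> int:
    return chord_hash(f"{ip}:{port}")     # ID(node) = hash(IP, port)

def key_id(key: str) -> int:
    return chord_hash(key)                # ID(key) = hash(key)

print(node_id("10.0.0.1", 6881), key_id("hey_jude.mp3"))
```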

The Chord example
• The key ID space is the interval between 0 and 2^m − 1
• Chord uses m = 160
• One can assume m = O(log N)
• In this example we assume m = 6, so the ID range is [0…63]

The Chord example
Each peer has authority over its own ID and all the smaller ones until its predecessor peer
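A small sketch of this authority rule (the peer positions on the ring are made up): the peer responsible for a key ID is the first peer ID at or after it, wrapping around past the largest ID.

```python
# Sketch of the authority rule above: a key belongs to the first peer whose ID
# is >= the key ID, wrapping around the ring (m = 6, so IDs live in [0..63]).
import bisect

M = 6
peer_ids = sorted([1, 12, 23, 35, 48, 60])   # example peer positions on the ring

def responsible_peer(key_id: int) -> int:
    """Return the ID of the peer that has authority over key_id."""
    i = bisect.bisect_left(peer_ids, key_id)
    return peer_ids[i % len(peer_ids)]        # wrap around past the largest peer ID

print(responsible_peer(18))   # -> 23
print(responsible_peer(61))   # -> 1 (wraps around the ring)
```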

Chord with simple routing
Each peer keeps a pointer (IP address and port) to its successor

Chord with simple routing
Each peer keeps a pointer (IP address and port) to its successor
Simple Routing: Forward a lookup request to the successor until the correct peer is found
I am looking for K. Who is responsible for hash(K)=18?
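A minimal sketch of simple routing, with illustrative peer positions: each peer knows only its successor, and the lookup for hash(K) = 18 is forwarded around the ring until it reaches the peer whose interval covers 18 (O(N) hops in the worst case).

```python
# Sketch of Chord's simple routing: each peer knows only its successor, and a
# lookup for hash(K) = 18 is forwarded around the ring until the responsible
# peer is found. Peer positions are illustrative; m = 6.
class ChordPeer:
    def __init__(self, ident):
        self.id = ident
        self.successor = None      # set once the ring is built

    def owns(self, key_id):
        """True if key_id lies in (self.id, successor.id], handling wrap-around."""
        lo, hi = self.id, self.successor.id
        if lo < hi:
            return lo < key_id <= hi
        return key_id > lo or key_id <= hi   # interval wraps past 0

    def lookup(self, key_id, hops=0):
        if self.owns(key_id):
            # the key lies between this peer and its successor: the successor owns it
            return self.successor.id, hops
        return self.successor.lookup(key_id, hops + 1)   # forward to successor

# Build a small ring: 1 -> 12 -> 23 -> 35 -> 48 -> 60 -> 1
ring = [ChordPeer(i) for i in [1, 12, 23, 35, 48, 60]]
for a, b in zip(ring, ring[1:] + ring[:1]):
    a.successor = b

owner, hops = ring[0].lookup(18)    # start the lookup at peer 1
print(owner, hops)                  # -> 23, after 1 forwarding hop
```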
