COMP3221: Distributed Systems
Communication-Routing
Dr Nguyen Tran
School of Computer Science
The University of Sydney
Page 1
Previously…
– Add questions here – Previous lecture:
– Shared memory allows multiple programs to communicate – Today’s lecture:
– What if we don’t have shared memory?
– How to implement an alternative communication medium, like message
channel?
The University of Sydney Page 2
Outline
– Layered Protocols
– Routing
– Distance-vector – Link-state
– Message-Oriented Transient Communication
– Socket
– Stream-Oriented Communication
Computer Networking: A Top Down Approach
7th edition
Jim Kurose, Keith Ross Pearson/Addison Wesley April 2016
The University of Sydney
Note: Some slides are adapted from J.F Kurose and K.W. Ross
Page 3
Layered Protocols
Communication-Routing Week 3, COMP3221
The University of Sydney Page 4
Communication problem
Communicating is about transmitting encoded information
– Problem:twocomputersmustagreeonthecode(language)theyuse
– Openstandardsgiverulesforeveryonetocommunicatewitheach other
– InternationalStandardsOrganization(ISO)
The Open Systems Interconnection (OSI) reference model names communication
levels, and assigns roles to each level
– InternetEngineeringTaskForce(IETF)
The Request For Comments (RFC) are a public description of internet communication protocols (e.g., TCP->RFC793, UDP->RFC768, SMTP->RFC2821, ICMP->RFC792)
– Private“closed”protocolsexist
– Skype:peopledidreverseengineeringtodiscover
the protocol, find security issues, or to implement IM clients The University of Sydney
Page 5
Two modes of communication
Connection oriented vs. Connectionless
– Connection oriented
– Establish explicitly a connection with a partner before exchanging data – Protocol example: Transmission Control Protocol (TCP)
– Application usage: file transfer, web browsing, email
– Connectionless
– No setup in advance is needed
– Protocol example: User Datagram Packet (UDP, IP) – Application usage: VoIP (Skype), IPTV
The University of Sydney
Page 6
Layers
OSI layers
– Each layer deals with one specific aspect of the communication
– The problem is divided into sub-parts that can be implemented individually
The University of Sydney Page 7
Layers
Message structure
– As the message go through lower layers before being sent, each layer adds its header to the message.
– Upon reception, the message is unmarshalled by the successive layers from bottom to top
The University of Sydney Page 8
source
Encapsulation
application
transport
network
link
physical
M
M
message segment Ht
datagram frame
Hn Ht M
Hl Hn Ht M
link
physical
switch
network
link
Hn
Ht
M
physical
destination
HnHt M
application
transport
network
link
physical
Hl Hn Ht M
M
Ht M
HnHt M
Hl Hn Ht M
The University of Sydney
Page 9
router
Network communication
Low-level layers – Physicallayer
– sendbits(0and1)
– Datalinklayer
– groupsbitsintoframes
– assignssequencenumberstoframesandaddsspecialbitsatthebeginning
and end
– addsachecksum(theresultofsomeoperationontheframecontent)
– Ifreceiverdisagreeaboutthechecksum,thenitasksthesendertoresend
– Example:EthernetforLocalAreaNetwork(LAN)orPPTPinVirtualprivate
network (VPN)
– Networklayer
– Routingofdatagramsfromsourcetodestination
– Example:InternetProtocol(IP)forWideAreaNetwork(WAN)
The University of Sydney Page 10
Network communication
Transport Layer
– Transportlayer:
– splitthemessagefromtheapplicationlayer
– numberthem
– addtheamountsofsentandremainingpackets
– Reliabletransportlayercanbebuiltontopof:
– Connection-orientedprotocol:packetwouldbeordered – Connectionlessprotocol:packetcouldbereordered
– Exampleoftransportlayerprotocol:TransmissionControlProtocol (TCP)
– Exampleoftransportlayerandnetworklayercombination:TCP/IP
The University of Sydney Page 11
Network communication
High level layers
– OSI specifies three higher level layers:
– Session
– synchronization, checkpointing, recovery of data exchange, dialog
control
– Example:
• Domain Name Service (DNS)
• Lightweight Directory Protocol (LDAP)
– Presentation layer
– allow applications to interpret meaning of data, e.g., encryption,
compression, machine-specific conventions
– Sometimes the session and presentations layers are omitted: Internet
protocol suite.
– Application
– Hypertext Transfer Protocol (HTTP)
– File Transfer Protocol (FTP)
– TCP/IP Terminal Emulation Protocol (Telnet)
– X-Window
Internet protocol stack
application
transport
network
link
physical
The University of Sydney
Page 12
Network communication
Analogy: Mail service vs. Web service
– Application(noclueofintermediarysteps):
– Youpostaletterwithsomeaddress,thereceiverreadsit
– HTTP:AusertypesaURLinabrowser,aserversendsbackthewebpage
– Transport(errorcontrol):
– Uponwritingawrongaddressonaletter,theletterwillbesentbacktoyou – TCPinitializedaconnection,checksforpotentialerrorsandmayretransmit
– Internet(recipientisunknown):
– Anairplanemoveslettersbetweencitieswithoutknowingtherecipients – IPbringspacketsovertheWANpotentiallyfromoneLANtoanother
– Datalink:
– Trucksmoveletterswithinacity
– EthernethandlestransmissionwithintheLAN
– Physicallayer:
– Usepentowriteandglassestoreadletters
– Specifyingfiber,wire,radiototransmittheoneandzeroencodingthemessage The University of Sydney Page 13
Routing
Communication Week 3, COMP3221
The University of Sydney Page 14
Routing Problem
What are routing protocols used for?
– Dijkstra thought about the shortest path algorithm in Amsterdam
when trying to find his way [Frana&Misa, CACM’10]
– Related to path finding problem in graphs
– Graph nodes represent internet routers,
edges are communication links
GPS clients use shortest path algorithms
– Routing is necessary in all networks except Local Area Networks (LANs) where Ethernet provides direct communication between all pairs of attached hosts
– The routing protocol is implemented in the network layer of each router to determine the route for the transmission of packets to their destination
The University of Sydney Page 15
Graph abstraction: costs
2 u
1
5 v3w
5 2
c(x,x’) = cost of link (x,x’) e.g., c(w,z) = 5
cost could always be 1, or inversely related to bandwidth, or inversely related to congestion
2 x
3 1
1 y
z
cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
key question: what is the least-cost path between u and z ? routing algorithm: algorithm that finds that least cost path
The University of Sydney
Page 16
Routing algorithm classification
Q: global or decentralized information?
Global:
– allroutershavecomplete topology, link cost info
– “linkstate”algorithms
Decentralized:
Q: static or dynamic?
Static:
– routes change slowly over time
Dynamic:
– routes change more quickly
– periodic update
– in response to link cost changes
Page 17
– routerknowsphysically- connected neighbors, link costs to neighbors
– iterativeprocessofcomputation, exchange of info with neighbors
– “distancevector”algorithms The University of Sydney
Distance vector algorithm
Bellman-Ford equation (dynamic programming)
let
dx(y) := cost of least-cost path from x to y
then
dx(y) = min {c(x,v) + dv(y) } v
cost from neighbor v to destination y cost to neighbor v
min taken over all neighbors v of x
The University of Sydney
Page 18
Distance vector algorithm
iterative, asynchronous: each local iteration caused by:
– local link cost change
– DV update message from
neighbor
distributed:
– each node notifies neighbors only when its DV changes
– neighbors then notify their neighbors if necessary
each node:
wait for (change in local link cost or msg from neighbor)
recompute estimates
if DV to any dest has changed, notify neighbors
The University of Sydney
Page 19
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0 , 7+1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
node x
table x y z
x027 x023
cost to
cost to
x y z
yz ∞∞∞ ∞∞∞
yz
201 710
node y
table xyz
xy∞∞∞ 201
z
cost to
2y1 x7z
node z table
x∞∞∞
y z
∞∞∞
cost to
xyz
∞∞∞ 710
time
The University of Sydney
Page 20
from from
from
from
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0 , 7+1} = 2
cost to
x y z
x027 x023 x023
y∞∞∞ y201 y201 z∞∞∞ z710 z310
node y cost to cost to cost to table x y z x y z x y z
x∞∞∞ x027 x023 y201 y201 y201 z∞∞∞ z710 z310
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
node x
table x y z
cost to
cost to
xyz
2 y 1
x z 7
node z table
x∞∞∞ x027 x023
y∞∞∞ y201 y201
z710 z310 z310 time
cost to
cost to cost to
xyz
xyz xyz
The University of Sydney
Page 21
from from
from
from from from
from
from from
Distance-vector routing algorithm
Router Information Protocol (RIP)
– –
–
Mainly used in internet up to 1979
Relies on Bellman-Ford algorithm: Bellman algorithm [1957] distributed by Ford and Fulkerson [1972]
Link cost = 1
A1B 11C
D1E
– Ifanodedetectsalinkfailure,itsets∞astheassociatedcostofsuchalinkandsends its local table to neighbours
– Eventually(whenfailuresstop)eachroutergetsforeachdestinationthedirectionleading to the minimal cost
Routing table of router A
Dest.
Dir
Cost
A
local
0
B
B
1
C
B
2
D
D
1
E
B
2
1 1
The University of Sydney Page 22
Distance-vector routing algorithm
RIP (con’t)
Let’s re-start the algorithm from scratch: all routing tables are empty
Routing table of router A
Dest.
Dir
Cost
A
local
0
Routing table of router B
Dest.
Dir
Cost
B
local
0
A1B 11C
D1E
1 1
Routing table of router C
Dest.
Dir
Cost
C
local
0
Routing table of router D
Dest.
Dir
Cost
D
local
0
Routing table of router E
Dest.
Dir
Cost
E
local
0
The University of Sydney
Page 23
Distance-vector routing algorithm
RIP (con’t)
– Example: How is routing table A built from scratch?
TC
C
TC
Routing table of router A
Dest.
Dir
Cost
A
local
0
B
B
1
D
D
1
TD
TB
AB TE
DE
The University of Sydney
Page 24
TE
Distance-vector routing algorithm
RIP (con’t)
– Example: How is routing table A built from scratch?
TB’
AB
TE’ C
TD’
Routing table of router A
Dest.
Dir
Cost
A
local
0
B
B
1
C
B
2
D
D
1
E
B
2
DE
TE’
The University of Sydney
Page 25
Distance-vector routing algorithm
RIP (con’t)
– Example: How is routing table A built from scratch?
TB’’
AB TD’’
DE
C
Routing table of router A
Dest.
Dir
Cost
A
local
0
B
B
1
C
B
2
D
D
1
E
B
2
The University of Sydney
Page 26
RIP algorithm
– Advantages: – Simple
– Efficient in small networks
– Limitations:
– Costs based on the number of hops is unrealistic (bandwidth of links counts)
– No big deal: this is just a matter of defining link weights (support negative values)
– Inefficient in large networks as loops may occur before the convergence state is reached
The University of Sydney Page 27
A link-state routing algorithm
Dijkstra’s algorithm (1956): find the shortest path from a node i to other nodes
– net topology, link costs known to all nodes – accomplished via “link state broadcast” – all nodes have same info
– computes least cost paths from one node (“source”) to all other nodes
– gives forwarding table for that node
– iterative: after k iterations, know least cost path to k dest.’s
– Used in IS-IS (IP) and OSPF protocols
The University of Sydney
Page 28
Dijsktra’s algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Notation:
– c(x,y): link cost from node x to y; = ∞ if not direct neighbors
– D(v): current value of cost of path from source to dest. v
– p(v): predecessor node along path from source to v
– N’: set of nodes whose least cost path definitively known
Initialization:
N’={u}
for all nodes v
if v adjacent to u then D(v) = c(u,v)
else D(v) = ∞
Loop
The University of Sydney
Page 29
find w not in N’ such that D(w) is a minimum addwtoN’
update D(v) for all v adjacent to w and not in N’ :
D(v) = min( D(v), D(w) + c(w,v) )
/* new cost to v is either old cost to v or known shortest path cost to w plus cost from w to v */
until all nodes in N’
Dijkstra’s algorithm: example
Step N’
0 u 7,u 3,u 5,u ∞ ∞
1
2 uwx
6,w
11,w 14,x
10,v 14,x 12,y
uw 6,w 5,u11,w ∞
uwxv 4 uwxvy 5 uwxvyz
Notes:
3
• Construct shortest path tree by tracing predecessor nodes
• Ties can exist (can be broken arbitrarily)
u
8 3
y
The University of Sydney
D(v), p(v)
D(w), p(w)
D(x), p(x)
D(y), p(y)
D(z), p(z)
5
4 w
7
x
9
2 z Page 30
7
v
3
4
Dijkstra’s algorithm: another example
Step N’ D(v),p(v) D(w),p(w) 0 u 2,u 5,u 1ux2,u4,x
D(x),p(x) D(y),p(y) D(z),p(z) 1,u ∞ ∞
2,x
∞ 4,y 4,y 4,y
2 uxy 3 uxyv 4 uxyvw 5 uxyvwz
2,u 3,y 3,y
5 v3w
2 u
1
5 231z
x1y2
The University of Sydney
Page 31
Dijkstra’s algorithm: another example
Step N’ D(v),p(v) D(w),p(w) 0 u 2,u 5,u 1ux2,u4,x
D(x),p(x) D(y),p(y) D(z),p(z) 1,u ∞ ∞
2,x
∞ 4,y 4,y 4,y
2 uxy 3 uxyv 4 uxyvw 5 uxyvwz
2,u
3,y 3,y
5
Resulting shortest-path to w:
-wßyßxßu 2v3w5
u 1
231z x1y2
The University of Sydney
Page 32
Dijkstra’s algorithm, discussion
Algorithm complexity: n nodes
– each iteration: need to check all nodes, w, not in N – n(n+1)/2 comparisons: O(n2)
– more efficient implementations possible: O(nlogn)
The University of Sydney
Page 33
Link-state routing algorithm
OSPF (Open Shortest Path First)
– “open”: publicly available – Uses link-state algorithm
– link state packet dissemination
– topology map at each node
– route computation using Dijkstra’s algorithm
– router floods OSPF link-state advertisements to all other routers in entire AS
– carried in OSPF messages directly over IP (rather than TCP or UDP – link state: for each attached link
– multiple same-cost paths allowed (only one path in RIP)
The University of Sydney Page 34
Comparison of LS and DV algorithms
Message complexity
– LS: with n nodes, E links, O(nE) msgs sent
Robustness: what happens if router malfunctions?
LS:
– DV: exchange between neighbors only
– node can advertise incorrect link cost
– each node computes only its own table
– convergence time varies Speed of convergence
DV:
– LS: O(n2) algorithm requires O(nE) msgs
– may have oscillations
– DV: convergence time varies
– may be routing loops
– count-to-infinity problem
– –
DV node can advertise incorrect path cost
each node’s table used by others
• errorpropagatethru network
The University of Sydney
Page 35
Path finding in large scale networks
– Many graph libraries are available
– Python: NetworkX – https://networkx.github.io
– Efficient implementations for many commonly known graph algorithms are available
– Example: Dijkstra’s shortest path
The University of Sydney
Page 36
Message-Oriented Transient Communication
Socket
Communication Week 3, COMP3221
The University of Sydney Page 37
Socket
socket controlled by app developer
application
process
transport
network
link
physical
application
process
transport
network
link
physical
Internet
controlled by OS
– Socket: interface between application and network: communication
end-points write data sent out over the network, and incoming data
can be read
– Application creates a socket
– Socket type dictates the style of communications: connection less/oriented
– A socket is identified by an IP address concatenated with a port number
– In general, sockets use the client-server model:
– The server waits for incoming client requests by listening to a specified port – Once a request is received, the server accepts a connection from the client
The University of Sydney Page 38
Address, Port, and Socket
application
process
transport
network
link
physical
application
process
transport
network
link
physical
socket controlled by app developer
Internet
controlled by OS
– Like Apartment and Mailboxes
– You are the application
– Your apartment building address is address
– Your mailbox is the port
– Socket is the key giving you access to the right mailbox
– Q: How to choose which port a socket connects to ? The University of Sydney
Page 39
Socket
Port
– Port numbers <1024 are for specific service protocols e.g., 80:HTTP, SSH:22, FTP:21, SMTP:25
– A client initiating a connection is assigned a free port number (>1024) by its host
Socket (146.86.5.20:1625)
The University of Sydney
Page 40
Host X (146.86.5.20)
Communication using socket
Socket (161.25.19.8:80)
Web server (161.25.19.8)
Socket
Berkley Sockets: Socket interface as proposed in Berkley UNIX in the 70’s
1. Servers generally execute the first 4 primitives on accept
2. Bind associates the newly created socket to a local address and a port; the client binds implicitly to any available port
3. The client connects to a specified address
4. Once the connection is accepted by the server; the client and server can communicate with send and receive.
The University of Sydney
Page 41
Stream-Oriented Communication
Communication Week 3, COMP3221
The University of Sydney Page 42
Stream-oriented communication
When timing is crucial
– Previouscommunicationmodesdonotguaranteetransmissionrate
– Instream-orientedcommunication,thestreamshouldnotbeinterrupted, i.e., messages should keep being received at regular (typically small) time intervals
♬♪♩♬♩♬♪
Network
– Example:anaudiostreaminCDqualityw/soundwaveat44,100Hz
– Destinationstartsreadingbeforetheentireinformationhasbeentransmitted
– Eachpieceofdatashouldbereadinorderatfixedrate:every1/44,100 seconds.
The University of Sydney Page 43
Stream-oriented communication
Quality of Service (QoS)
– Requirements to ensure that the temporal relationships in a
stream can be preserved:
– The required bit rate at which data should be transported
– The maximum delay until a session has been set up (i.e., when an
application can start sending data)
– The maximum end-to-end delay (i.e., how long it will take until a data unit
makes it to a recipient)
– The maximum delay variance, or jitter
– The maximum round-trip delay
– Problem: how to stream over internet?
– Internet Protocol (IP) is best effort, it drops packet
– IP rarely implements QoS The University of Sydney
Page 44
Stream-oriented communication
Buffer to cope with variable delays
– Use a buffer to store several data in advance at the receiver
– If packets are delayed with a certain variance but the average rate is
sustainable
– The receiver can pass packets to the application at regular time intervals
– The size of the receiver buffer is 9 seconds of packets to pass, unfortunately
packet #8 took 11 seconds to reach the receiver at which time the buffer was
empty.
The University of Sydney Page 45
Stream-oriented communication
Error correction and interleaving to cope with message losses
– With best effort protocols, packets can be dropped
– TCP/IP retransmit drops packets, yet this is too heavy for streaming application
– Forward error correction: k out of n packets are enough to reconstruct k packets
– Interleaving circumvents dropped packet to contain multiple consecutive audio and video frame
– The effect of packet loss in (a) non-interleaved transmission and (b) interleaved transmission
The University of Sydney Page 46
Streaming multimedia: DASH
– DASH: Dynamic, Adaptive Streaming over HTTP – server:
– divides video file into multiple chunks
– each chunk stored, encoded at different rates – manifest file: provides URLs for different chunks
– client:
– periodically measures server-to-client bandwidth – consulting manifest, requests one chunk at a time
• chooses maximum coding rate sustainable given current bandwidth
• can choose different coding rates at different points in time (depending on available bandwidth at time)
The University of Sydney
Page 47
Streaming multimedia: DASH
– DASH: Dynamic, Adaptive Streaming over HTTP – “intelligence” at client: client determines
– when to request chunk (so that buffer starvation, or overflow does not occur)
– what encoding rate to request (higher quality when more bandwidth available)
– where to request chunk (can request from URL server that is “close” to client or has high available bandwidth)
The University of Sydney
Page 48
Video Streaming and CDNs: context
§ video traffic: major consumer of Internet bandwidth
• Netflix, YouTube: 37%, 16% of downstream residential ISP traffic
• ~1B YouTube users, ~75M Netflix users
§ challenge: scale – how to reach ~1B users? • single mega-video server won’t work (why?)
§ challenge: heterogeneity
§ different users have different capabilities (e.g.,
wired versus mobile; bandwidth rich versus bandwidth poor)
The University of Sydney
Page 49
Content distribution networks
– challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
– option 1: single, large “mega-server” – single point of failure
– point of network congestion
– long path to distant clients
– multiple copies of video sent over outgoing link ….quite simply: this solution doesn’t scale
The University of Sydney
Page 50
Content distribution networks
– challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
– option 2: store/serve multiple copies of videos at multiple geographically distributed sites (CDN)
– enter deep: push CDN servers deep into many access networks • close to users
• used by Akamai, 1700 locations
– bring home: smaller number (10’s) of larger clusters in POPs near
(but not within) access networks • used by Limelight
– solution: distributed, application-level infrastructure The University of Sydney
Page 51
manifest file
where’s Madmen?
Content Distribution Networks (CDNs)
§ CDN: stores copies of content at CDN nodes • e.g. Netflix stores copies of MadMen
§ subscriber requests content from CDN
• directed to nearby copy, retrieves content
• may choose different copy if network path congested
The University of Sydney
Page 52
…
…
…
…
…
…
Conclusion
– Network protocols are divided into layers
– Routing protocols are
– simple but not scalable (distance-vector) or – more complex but scalable (link-state)
– There are various ways of communicating depending on the needs:
– We covered: General-purpose communication (Socket)
– Streaming-oriented Communication
The University of Sydney
Page 53
What’s Next ?
– Tutorial on Wednesday.
– Read Chapter 4 of the textbook
– Next week: more about communications in distributed systems – See you all next week !
The University of Sydney Page 54