The University of Sydney Page 1
COMP3221: Distributed
Systems
Communication-Routing
Dr Nguyen Tran
School of Computer Science
The University of Sydney Page 2
Previously…
– Add questions here
– Previous lecture:
– Shared memory allows multiple programs to communicate
– Today’s lecture:
– What if we don’t have shared memory?
– How to implement an alternative communication medium, like message
channel?
The University of Sydney Page 3
Outline
– Layered Protocols
– Routing
– Distance-vector
– Link-state
– Message-Oriented Transient
Communication
– Socket
– Stream-Oriented Communication
Computer
Networking: A Top
Down Approach
7th edition
Jim Kurose, Keith Ross
Pearson/Addison Wesley
April 2016
Note: Some slides are adapted from J.F Kurose and K.W. Ross
The University of Sydney Page 4
Layered Protocols
Communication-Routing
Week 3, COMP3221
The University of Sydney Page 5
Communication problem
– Problem: two computers must agree on the code (language) they use
– Open standards give rules for everyone to communicate with each
other
– International Standards Organization (ISO)
The Open Systems Interconnection (OSI) reference model names communication
levels, and assigns roles to each level
– Internet Engineering Task Force (IETF)
The Request For Comments (RFC) are a public description of internet
communication protocols (e.g., TCP->RFC793, UDP->RFC768,
SMTP->RFC2821, ICMP->RFC792)
– Private “closed” protocols exist
– Skype: people did reverse engineering to discover
the protocol, find security issues, or to implement IM clients
Communicating is about transmitting encoded information
The University of Sydney Page 6
Two modes of communication
– Connection oriented
– Establish explicitly a connection with a partner before exchanging data
– Protocol example: Transmission Control Protocol (TCP)
– Application usage: file transfer, web browsing, email
– Connectionless
– No setup in advance is needed
– Protocol example: User Datagram Packet (UDP, IP)
– Application usage: VoIP (Skype), IPTV
Connection oriented vs. Connectionless
The University of Sydney Page 7
Layers
OSI layers
– Each layer deals with one specific aspect of the communication
– The problem is divided into sub-parts that can be implemented
individually
The University of Sydney Page 8
Layers
Message structure
– As the message go through lower layers before being sent, each
layer adds its header to the message.
– Upon reception, the message is unmarshalled by the successive
layers from bottom to top
The University of Sydney Page 9
source
application
transport
network
link
physical
HtHn M
segment Ht
datagram
destination
application
transport
network
link
physical
HtHnHl M
HtHn M
Ht M
M
network
link
physical
link
physical
HtHnHl M
HtHn M
HtHn M
HtHnHl M
router
switch
Encapsulation
message M
Ht M
Hn
frame
The University of Sydney Page 10
Network communication
– Physical layer
– send bits (0 and 1)
– Data link layer
– groups bits into frames
– assigns sequence numbers to frames and adds special bits at the beginning
and end
– adds a checksum (the result of some operation on the frame content)
– If receiver disagree about the checksum, then it asks the sender to resend
– Example: Ethernet for Local Area Network (LAN) or PPTP in Virtual private
network (VPN)
– Network layer
– Routing of datagrams from source to destination
– Example: Internet Protocol (IP) for Wide Area Network (WAN)
Low-level layers
The University of Sydney Page 11
Network communication
– Transport layer:
– split the message from the application layer
– number them
– add the amounts of sent and remaining packets
– Reliable transport layer can be built on top of:
– Connection-oriented protocol: packet would be ordered
– Connectionless protocol: packet could be reordered
– Example of transport layer protocol: Transmission Control Protocol
(TCP)
– Example of transport layer and network layer combination: TCP/IP
Transport Layer
The University of Sydney Page 12
Network communication
– OSI specifies three higher level layers:
– Session
– synchronization, checkpointing, recovery of data exchange, dialog
control
– Example:
• Domain Name Service (DNS)
• Lightweight Directory Protocol (LDAP)
– Presentation layer
– allow applications to interpret meaning of data, e.g., encryption,
compression, machine-specific conventions
– Sometimes the session and presentations layers are omitted: Internet
protocol suite.
– Application
– Hypertext Transfer Protocol (HTTP)
– File Transfer Protocol (FTP)
– TCP/IP Terminal Emulation Protocol (Telnet)
– X-Window
High level layers
application
transport
network
link
physical
Internet protocol stack
The University of Sydney Page 13
Network communication
Analogy: Mail service vs. Web service
– Application (no clue of intermediary steps):
– You post a letter with some address, the receiver reads it
– HTTP: A user types a URL in a browser, a server sends back the web page
– Transport (error control):
– Upon writing a wrong address on a letter, the letter will be sent back to you
– TCP initialized a connection, checks for potential errors and may retransmit
– Internet (recipient is unknown):
– An airplane moves letters between cities without knowing the recipients
– IP brings packets over the WAN potentially from one LAN to another
– Data link:
– Trucks move letters within a city
– Ethernet handles transmission within the LAN
– Physical layer:
– Use pen to write and glasses to read letters
– Specifying fiber, wire, radio to transmit the one and zero encoding the message
The University of Sydney Page 14
Routing
Communication
Week 3, COMP3221
The University of Sydney Page 15
Routing Problem
– Dijkstra thought about the shortest path
algorithm in Amsterdam
when trying to find his way [Frana&Misa,
CACM’10]
– Related to path finding problem in graphs
– Graph nodes represent internet routers,
edges are communication links
What are routing protocols used for?
GPS clients use shortest path algorithms
– Routing is necessary in all networks except Local Area Networks (LANs)
where Ethernet provides direct communication between all pairs of attached
hosts
– The routing protocol is implemented in the network layer of each router to
determine the route for the transmission of packets to their destination
The University of Sydney Page 16
Graph abstraction: costs
u
yx
wv
z
2
2
1
3
1
1
2
5
3
5
c(x,x’) = cost of link (x,x’)
e.g., c(w,z) = 5
cost could always be 1, or
inversely related to bandwidth,
or inversely related to
congestion
cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
key question: what is the least-cost path between u and z ?
routing algorithm: algorithm that finds that least cost path
The University of Sydney Page 17
Routing algorithm classification
Q: global or decentralized
information?
Global:
– all routers have complete
topology, link cost info
– “link state” algorithms
Decentralized:
– router knows physically-
connected neighbors, link costs to
neighbors
– iterative process of computation,
exchange of info with neighbors
– “distance vector” algorithms
Q: static or dynamic?
Static:
– routes change slowly over
time
Dynamic:
– routes change more
quickly
– periodic update
– in response to link cost
changes
The University of Sydney Page 18
Distance vector algorithm
Bellman-Ford equation (dynamic programming)
let
dx(y) := cost of least-cost path from x to y
then
dx(y) = min {c(x,v) + dv(y) }v
cost to neighbor v
min taken over all neighbors v of x
cost from neighbor v to destination y
The University of Sydney Page 19
iterative, asynchronous: each
local iteration caused by:
– local link cost change
– DV update message from
neighbor
distributed:
– each node notifies
neighbors only when its DV
changes
– neighbors then notify their
neighbors if necessary
wait for (change in local link
cost or msg from neighbor)
recompute estimates
if DV to any dest has changed,
notify neighbors
each node:
Distance vector algorithm
The University of Sydney Page 20
x y z
x
y
z
0 2 7
∞ ∞ ∞
∞ ∞ ∞
fr
om
cost to
fro
m
fr
om
x y z
x
y
z
0
x y z
x
y
z
∞ ∞
∞ ∞ ∞
cost to
x y z
x
y
z
∞ ∞ ∞
7 1 0
cost to
∞
2 0 1
∞ ∞ ∞
2 0 1
7 1 0
time
x z
12
7
y
node x
table
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2
Dx(z) = min{c(x,y) +
Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
32
node y
table
node z
table
cost to
fr
om
The University of Sydney Page 21
x y z
x
y
z
0 2 3
fr
om
cost to
x y z
x
y
z
0 2 7
fr
om
cost to
x y z
x
y
z
0 2 3
fr
om
cost to
x y z
x
y
z
0 2 3
fr
om
cost to
x y z
x
y
z
0 2 7
fr
om
cost to
2 0 1
7 1 0
2 0 1
3 1 0
2 0 1
3 1 0
2 0 1
3 1 0
2 0 1
3 1 0
time
x y z
x
y
z
0 2 7
∞ ∞ ∞
∞ ∞ ∞
fr
om
cost to
fro
m
fr
om
x y z
x
y
z
0
x y z
x
y
z
∞ ∞
∞ ∞ ∞
cost to
x y z
x
y
z
∞ ∞ ∞
7 1 0
cost to
∞
2 0 1
∞ ∞ ∞
2 0 1
7 1 0
time
x z
12
7
y
node x
table
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)}
= min{2+0 , 7+1} = 2
Dx(z) = min{c(x,y) +
Dy(z), c(x,z) + Dz(z)}
= min{2+1 , 7+0} = 3
32
node y
table
node z
table
cost to
fr
om
The University of Sydney Page 22
Distance-vector routing algorithm
– Mainly used in internet up to 1979
– Relies on Bellman-Ford algorithm: Bellman algorithm [1957] distributed by Ford and
Fulkerson [1972]
– Link cost = 1
– If a node detects a link failure, it sets ∞ as the associated cost of such a link and sends
its local table to neighbours
– Eventually (when failures stop) each router gets for each destination the direction leading
to the minimal cost
Router Information Protocol (RIP)
A
D
B
E
C
1
1
1 1
1
1
Routing table of router A
Dest. Dir Cost
A local 0
B B 1
C B 2
D D 1
E B 2
The University of Sydney Page 23
Distance-vector routing algorithm
Let’s re-start the algorithm from scratch: all routing tables are empty
RIP (con’t)
Routing table of router A
Dest. Dir Cost
A local 0
A
D
B
E
C
1
1
1 1
1
1
Routing table of router B
Dest. Dir Cost
B local 0
Routing table of router C
Dest. Dir Cost
C local 0
Routing table of router D
Dest. Dir Cost
D local 0
Routing table of router E
Dest. Dir Cost
E local 0
The University of Sydney Page 24
Distance-vector routing algorithm
– Example: How is routing table A built from scratch?
RIP (con’t)
Routing table of router A
Dest. Dir Cost
A local 0
B B 1
D D 1
A
D
B
E
CTE
TE TC
TC
TD
TB
The University of Sydney Page 25
Distance-vector routing algorithm
– Example: How is routing table A built from scratch?
RIP (con’t)
Routing table of router A
Dest. Dir Cost
A local 0
B B 1
C B 2
D D 1
E B 2
A
D
B
E
C
TD’
TB’
TE’
TE’
The University of Sydney Page 26
Distance-vector routing algorithm
– Example: How is routing table A built from scratch?
RIP (con’t)
Routing table of router A
Dest. Dir Cost
A local 0
B B 1
C B 2
D D 1
E B 2
A
D
B
E
C
TD’’
TB’’
The University of Sydney Page 27
RIP algorithm
– Advantages:
– Simple
– Efficient in small networks
– Limitations:
– Costs based on the number of hops is unrealistic (bandwidth of links counts)
– No big deal: this is just a matter of defining link weights (support negative
values)
– Inefficient in large networks as loops may occur before the convergence
state is reached
The University of Sydney Page 28
A link-state routing algorithm
Dijkstra’s algorithm (1956): find the shortest path from a node i
to other nodes
– net topology, link costs known to all nodes
– accomplished via “link state broadcast”
– all nodes have same info
– computes least cost paths from one node (“source”) to all
other nodes
– gives forwarding table for that node
– iterative: after k iterations, know least cost path to k dest.’s
– Used in IS-IS (IP) and OSPF protocols
The University of Sydney Page 29
Dijsktra’s algorithm
1 Initialization:
2 N’ = {u}
3 for all nodes v
4 if v adjacent to u
5 then D(v) = c(u,v)
6 else D(v) = ∞
7
8 Loop
9 find w not in N’ such that D(w) is a minimum
10 add w to N’
11 update D(v) for all v adjacent to w and not in N’ :
12 D(v) = min( D(v), D(w) + c(w,v) )
13 /* new cost to v is either old cost to v or known
14 shortest path cost to w plus cost from w to v */
15 until all nodes in N’
Notation:
– c(x,y): link cost from node x to y;
= ∞ if not direct neighbors
– D(v): current value of cost of path
from source to dest. v
– p(v): predecessor node along path
from source to v
– N’: set of nodes whose least cost
path definitively known
The University of Sydney Page 30
w3
4
v
x
u
5
3
7 4
y
8
z
2
7
9
Dijkstra’s algorithm: example
Step N’
D(v),
p(v)
0
1
2
3
4
5
D(w),
p(w)
D(x),
p(x)
D(y),
p(y)
D(z),
p(z)
u ∞ ∞ 7,u 3,u 5,u
uw ∞ 11,w 6,w 5,u
14,x 11,w 6,wuwx
uwxv 14,x 10,v
uwxvy 12,y
Notes:
• Construct shortest path tree by
tracing predecessor nodes
• Ties can exist (can be broken
arbitrarily)
uwxvyz
The University of Sydney Page 31
Dijkstra’s algorithm: another example
Step
0
1
2
3
4
5
N’
u
ux
uxy
uxyv
uxyvw
uxyvwz
D(v),p(v)
2,u
2,u
2,u
D(w),p(w)
5,u
4,x
3,y
3,y
D(x),p(x)
1,u
D(y),p(y)
∞
2,x
D(z),p(z)
∞
∞
4,y
4,y
4,y
u
yx
wv
z
2
2
1
3
1
1
2
5
3
5
The University of Sydney Page 32
Dijkstra’s algorithm: another example
Step
0
1
2
3
4
5
N’
u
ux
uxy
uxyv
uxyvw
uxyvwz
D(v),p(v)
2,u
2,u
2,u
D(w),p(w)
5,u
4,x
3,y
3,y
D(x),p(x)
1,u
D(y),p(y)
∞
2,x
D(z),p(z)
∞
∞
4,y
4,y
4,y
u
yx
wv
z
2
2
1
3
1
1
2
5
3
5
Resulting shortest-path to w:
– w ß y ß x ßu
The University of Sydney Page 33
Dijkstra’s algorithm, discussion
Algorithm complexity: n nodes
– each iteration: need to check all nodes, w, not in N
– n(n+1)/2 comparisons: O(n2)
– more efficient implementations possible: O(nlogn)
The University of Sydney Page 34
Link-state routing algorithm
OSPF (Open Shortest Path First)
– “open”: publicly available
– Uses link-state algorithm
– link state packet dissemination
– topology map at each node
– route computation using Dijkstra’s algorithm
– router floods OSPF link-state advertisements to all other routers
in entire AS
– carried in OSPF messages directly over IP (rather than TCP or UDP
– link state: for each attached link
– multiple same-cost paths allowed (only one path in RIP)
The University of Sydney Page 35
Comparison of LS and DV algorithms
Message complexity
– LS: with n nodes, E links, O(nE)
msgs sent
– DV: exchange between neighbors
only
– convergence time varies
Speed of convergence
– LS: O(n2) algorithm requires O(nE)
msgs
– may have oscillations
– DV: convergence time varies
– may be routing loops
– count-to-infinity problem
Robustness: what happens if
router malfunctions?
LS:
– node can advertise incorrect
link cost
– each node computes only its
own table
DV:
– DV node can advertise
incorrect path cost
– each node’s table used by
others
• error propagate thru
network
The University of Sydney Page 36
Path finding in large scale networks
– Many graph libraries are available
– Python: NetworkX – https://networkx.github.io
– Efficient implementations for many commonly known graph
algorithms are available
– Example: Dijkstra’s shortest path
https://networkx.github.io/
The University of Sydney Page 37
Message-Oriented Transient Communication
Socket
Communication
Week 3, COMP3221
The University of Sydney Page 38
Socket
– Socket: interface between application and network: communication
end-points write data sent out over the network, and incoming data
can be read
– Application creates a socket
– Socket type dictates the style of communications: connection less/oriented
– A socket is identified by an IP address concatenated with a port
number
– In general, sockets use the client-server model:
– The server waits for incoming client requests by listening to a specified port
– Once a request is received, the server accepts a connection from the client
Internet
controlled
by OS
controlled by
app developer
transport
application
physical
link
network
process
transport
application
physical
link
network
process
socket
The University of Sydney Page 39
Address, Port, and Socket
– Like Apartment and Mailboxes
– You are the application
– Your apartment building address is address
– Your mailbox is the port
– Socket is the key giving you access to the right mailbox
– Q: How to choose which port a socket connects to ?
Internet
controlled
by OS
controlled by
app developer
transport
application
physical
link
network
process
transport
application
physical
link
network
processsocket
The University of Sydney Page 40
Socket
– Port numbers <1024 are for specific service protocols e.g., 80:HTTP, SSH:22, FTP:21, SMTP:25 – A client initiating a connection is assigned a free port number (>1024) by its host
Port
Socket
(161.25.19.8:80)
Socket
(146.86.5.20:1625)
Host X
(146.86.5.20)
Communication using socket
Web server
(161.25.19.8)
The University of Sydney Page 41
Socket
1. Servers generally execute the first 4 primitives
on accept
2. Bind associates the newly created socket to a
local address and a port; the client binds
implicitly to any available port
3. The client connects to a specified address
4. Once the connection is accepted by the server;
the client and server can communicate with
send and receive.
Berkley Sockets: Socket interface as proposed in Berkley UNIX in the 70’s
The University of Sydney Page 42
Stream-Oriented Communication
Communication
Week 3, COMP3221
The University of Sydney Page 43
Stream-oriented communication
– Previous communication modes do not guarantee transmission rate
– In stream-oriented communication, the stream should not be interrupted,
i.e., messages should keep being received at regular (typically small)
time intervals
– Example: an audio stream in CD quality w/ sound wave at 44,100Hz
– Destination starts reading before the entire information has been transmitted
– Each piece of data should be read in order at fixed rate: every 1/44,100
seconds.
When timing is crucial
Network
♬ ♪ ♩ ♬ ♩ ♬ ♪
The University of Sydney Page 44
Stream-oriented communication
– Requirements to ensure that the temporal relationships in a
stream can be preserved:
– The required bit rate at which data should be transported
– The maximum delay until a session has been set up (i.e., when an
application can start sending data)
– The maximum end-to-end delay (i.e., how long it will take until a data unit
makes it to a recipient)
– The maximum delay variance, or jitter
– The maximum round-trip delay
– Problem: how to stream over internet?
– Internet Protocol (IP) is best effort, it drops packet
– IP rarely implements QoS
Quality of Service (QoS)
The University of Sydney Page 45
Stream-oriented communication
– Use a buffer to store several data in advance at the receiver
– If packets are delayed with a certain variance but the average rate is
sustainable
– The receiver can pass packets to the application at regular time intervals
– The size of the receiver buffer is 9 seconds of packets to pass, unfortunately
packet #8 took 11 seconds to reach the receiver at which time the buffer was
empty.
Buffer to cope with variable delays
The University of Sydney Page 46
Stream-oriented communication
– With best effort protocols, packets can be dropped
– TCP/IP retransmit drops packets, yet this is too heavy for streaming application
– Forward error correction: k out of n packets are enough to reconstruct k packets
– Interleaving circumvents dropped packet to contain multiple consecutive audio and video frame
– The effect of packet loss in (a) non-interleaved transmission and (b) interleaved transmission
Error correction and interleaving to cope with message losses
The University of Sydney Page 47
Streaming multimedia: DASH
– DASH: Dynamic, Adaptive Streaming over HTTP
– server:
– divides video file into multiple chunks
– each chunk stored, encoded at different rates
– manifest file: provides URLs for different chunks
– client:
– periodically measures server-to-client bandwidth
– consulting manifest, requests one chunk at a time
• chooses maximum coding rate sustainable given
current bandwidth
• can choose different coding rates at different points
in time (depending on available bandwidth at time)
The University of Sydney Page 48
Streaming multimedia: DASH
– DASH: Dynamic, Adaptive Streaming over HTTP
– “intelligence” at client: client determines
– when to request chunk (so that buffer starvation, or
overflow does not occur)
– what encoding rate to request (higher quality when more
bandwidth available)
– where to request chunk (can request from URL server
that is “close” to client or has high available bandwidth)
The University of Sydney Page 49
Video Streaming and CDNs: context
• Netflix, YouTube: 37%, 16% of downstream
residential ISP traffic
• ~1B YouTube users, ~75M Netflix users
§ challenge: scale – how to reach ~1B users?
• single mega-video server won’t work (why?)
§ challenge: heterogeneity
§ different users have different capabilities (e.g.,
wired versus mobile; bandwidth rich versus
bandwidth poor)
§ video traffic: major consumer of Internet bandwidth
The University of Sydney Page 50
Content distribution networks
– challenge: how to stream content (selected from millions of
videos) to hundreds of thousands of simultaneous users?
– option 1: single, large “mega-server”
– single point of failure
– point of network congestion
– long path to distant clients
– multiple copies of video sent over outgoing link
….quite simply: this solution doesn’t scale
The University of Sydney Page 51
Content distribution networks
– challenge: how to stream content (selected from millions of
videos) to hundreds of thousands of simultaneous users?
– option 2: store/serve multiple copies of videos at multiple
geographically distributed sites (CDN)
– enter deep: push CDN servers deep into many access networks
• close to users
• used by Akamai, 1700 locations
– bring home: smaller number (10’s) of larger clusters in POPs near
(but not within) access networks
• used by Limelight
– solution: distributed, application-level infrastructure
The University of Sydney Page 52
Content Distribution Networks (CDNs)
…
…
……
…
…
§ subscriber requests content from CDN
§ CDN: stores copies of content at CDN nodes
• e.g. Netflix stores copies of MadMen
where’s Madmen?
manifest file
• directed to nearby copy, retrieves content
• may choose different copy if network path congested
The University of Sydney Page 53
Conclusion
– Network protocols are divided into layers
– Routing protocols are
– simple but not scalable (distance-vector) or
– more complex but scalable (link-state)
– There are various ways of communicating depending on the
needs:
– We covered: General-purpose communication (Socket)
– Streaming-oriented Communication
The University of Sydney Page 54
What’s Next ?
– Tutorial on Wednesday.
– Read Chapter 4 of the textbook
– Next week: more about communications in distributed systems
– See you all next week !