CS计算机代考程序代写 distributed system algorithm The University of Sydney Page 1

The University of Sydney Page 1

COMP3221: Distributed
Systems

Communication (TCP)

Dr Nguyen Tran
School of Computer Science

The University of Sydney Page 2

Previously…

– Previous lecture:
– Message-based communication is complex (e.g., routing towards

destination, subject to message losses)

– Today’s lecture:
– How to avoid message losses?
– How to give the impression that everything happens locally (not

remotely)?
– One to many communication

The University of Sydney Page 3

Outline

– The Problem of Message Loss

– The TCP/IP Solution

– Multicast communication

The University of Sydney Page 4

The Problem of Message Loss

The University of Sydney Page 5

Message loss

– Networks are in general unreliable
– Messages can be lost (never been delivered even if sent)

– Examples:
– A server receives too many requests simultaneously so it cannot treat all
– A router drops the message because its queue is full

Message losses may impact the computation of a distributed system

Cause

The University of Sydney Page 6

Message loss

– Constraints of the problem
– Two armies, each led by a general on separate mountains surrounding a

battlefield (distributed system)

– Can only communicate via messengers (message passing)
– Messengers can be killed before reaching destination (message losses)

– Goal: they want to coordinate an attack
– If they attack at different times, they both die
– If they attack at the same time, they win

Coordinated Attack Problem

The University of Sydney Page 7

Message loss

– There is no protocols to make sure they will win!
Coordinated Attack Problem (con’t)

time
12h!

The University of Sydney Page 8

Message loss

– There is no protocols to make sure they will win!
Coordinated Attack Problem (con’t)

time
Attack at noon, ok?

Ok, it works
for me

Copy that

12h, ok?

Ok

Did you get my ok?

So what?

Good!

The University of Sydney Page 9

Message loss

– Constraints of the problem
– Two remote entities of a distributed system
– Can only communicate through messages
– The network is unreliable: messages can be dropped

– Goal: they want to make sure to do something simultaneously

This is impossible, even if all messages go through

Analogy in networking

The University of Sydney Page 10

TCP/IP

Communication 2/2
Week 4, COMP3221

The University of Sydney Page 11

TCP: Overview RFCs: 793,1122,1323, 2018, 2581

– Point-to-point:
– one sender, one receiver

– Connection-oriented, reliable, in-order byte steam:
– Full duplex data:
– bi-directional data flow in same connection

– Flow controlled:
– sender will not overwhelm receiver

– Congestion controlled:
– TCP congestion and flow control set window size

The University of Sydney Page 12

TCP seq. numbers, ACKs

sequence numbers:
–byte stream “number” of
first byte in segment’s
data

acknowledgements:
–seq # of next byte
expected from other side
–cumulative ACK

Q: how receiver handles out-
of-order segments
–A: TCP spec doesn’t say, –
up to implementor

source port # dest port #

sequence number
acknowledgement number

checksum
rwnd

urg pointer

incoming segment to sender

A

sent
ACKed

sent, not-
yet ACKed
(“in-
flight”)

usable
but not
yet sent

not
usable

window size
N

sender sequence number space

source port # dest port #

sequence number
acknowledgement number

checksum
rwnd

urg pointer

outgoing segment from sender

The University of Sydney Page 13

TCP seq. numbers, ACKs

User
types
‘C’

host ACKs
receipt

of echoed
‘C’

host ACKs
receipt of
‘C’, echoes
back ‘C’

simple telnet scenario

Host BHost A

Seq=42, ACK=79, data = ‘C’

Seq=79, ACK=43, data = ‘C’

Seq=43, ACK=80

The University of Sydney Page 14

TCP sender events:

data rcvd from app:
§ create segment with seq #
§ seq # is byte-stream

number of first data byte
in segment

§ start timer if not already
running
• think of timer as for

oldest unacked
segment

• expiration interval:
TimeOutInterval

timeout:
§ retransmit segment that

caused timeout
§ restart timer
ack rcvd:
§ if ack acknowledges

previously unacked
segments
• update what is known

to be ACKed
• start timer if there are

still unacked segments

The University of Sydney Page 15

TCP round trip time, timeout

Q: how to set TCP
timeout value?

§ longer than RTT
• but RTT varies

§ too short: premature
timeout, unnecessary
retransmissions

§ too long: slow reaction to
segment loss

Q: how to estimate RTT?
– SampleRTT: measured

time from segment
transmission until ACK receipt
– ignore retransmissions

– SampleRTT will vary, want
estimated RTT “smoother”
– average several recent

measurements, not just
current SampleRTT

The University of Sydney Page 16

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)

RT
T

(m
ill

is
ec

on
ds

)

SampleRTT Estimated RTT

EstimatedRTT = (1- a)*EstimatedRTT + a*SampleRTT

§ exponential weighted moving average
§ influence of past sample decreases exponentially fast
§ typical value: a = 0.125

TCP round trip time, timeout

RT
T

(m
ill

is
ec

on
ds

)

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

sampleRTT
EstimatedRTT

time (seconds)

The University of Sydney Page 17

– timeout interval: EstimatedRTT plus “safety margin”
– large variation in EstimatedRTT -> larger safety margin

– estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1-b)*DevRTT +

b*|SampleRTT-EstimatedRTT|

TCP round trip time, timeout

(typically, b = 0.25)

TimeoutInterval = EstimatedRTT + 4*DevRTT

estimated RTT “safety margin”

The University of Sydney Page 18

TCP: retransmission scenarios

lost ACK scenario

Host BHost A

Seq=92, 8 bytes of data

ACK=100

Seq=92, 8 bytes of data

Xtim
eo

ut

ACK=100

premature timeout

Host BHost A

Seq=92, 8 bytes of data

ACK=100

Seq=92, 8
bytes of data

tim
eo

ut

ACK=120

Seq=100, 20 bytes of data

ACK=120

SendBase=100

SendBase=120

SendBase=120

SendBase=92

The University of Sydney Page 19

TCP: retransmission scenarios

X

cumulative ACK

Host BHost A

Seq=92, 8 bytes of data

ACK=100

Seq=120, 15 bytes of data

tim
eo

ut

Seq=100, 20 bytes of data

ACK=120

The University of Sydney Page 20

TCP fast retransmit

§ time-out period often
relatively long:
• long delay before

resending lost packet
§ detect lost segments via

duplicate ACKs.
• sender often sends

many segments back-
to-back

• if segment is lost, there
will likely be many
duplicate ACKs.

if sender receives 3
ACKs for same data
(“triple duplicate ACKs”),
resend unacked
segment with smallest
seq #
§ likely that unacked

segment lost, so don’t
wait for timeout

TCP fast retransmit

(“triple duplicate ACKs”),

The University of Sydney Page 21

X

fast retransmit after sender
receipt of triple duplicate ACK

Host BHost A

Seq=92, 8 bytes of data

ACK=100

tim
eo

ut ACK=100

ACK=100
ACK=100

TCP fast retransmit

Seq=100, 20 bytes of data

Seq=100, 20 bytes of data

The University of Sydney Page 22

TCP flow control
application
process

TCP socket
receiver buffers

TCP
code

IP
code

application
OS

receiver protocol stack

application may
remove data from

TCP socket buffers ….

… slower than TCP
receiver is delivering
(sender is sending)

from sender

receiver controls sender, so
sender won’t overflow
receiver’s buffer by transmitting
too much, too fast

flow control

The University of Sydney Page 23

TCP flow control

buffered data

free buffer spacerwnd

RcvBuffer

TCP segment payloads

to application process

– receiver “advertises” free
buffer space by including
rwnd value in TCP header of
receiver-to-sender segments
– RcvBuffer size set via

socket options (typical default is
4096 bytes)

– many operating systems auto
adjust RcvBuffer

– sender limits amount of
unacked (“in-flight”) data to
receiver’s rwnd value

– guarantees receive buffer will
not overflow

receiver-side buffering

source port # dest port #

sequence number
acknowledgement number

checksum

rwnd
urg pointer

The University of Sydney Page 24

congestion:
– informally: “too many sources sending too much data too

fast for network to handle”
– different from flow control!
– manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)

– a top-10 problem!

Principles of congestion control

The University of Sydney Page 25

TCP congestion control

§ approach: sender increases transmission rate (window
size), probing for usable bandwidth, until loss occurs
• additive increase: increase cwnd by 1 MSS (Maximum

Segment Size) every RTT until loss detected
• multiplicative decrease: cut cwnd in half after loss

c
w
n
d
:

TC
P

s
en

de
r

co
ng

es
tio

n
w

in
do

w
s

iz
e

AIMD saw tooth
behavior: probing

for bandwidth

additively increase window size …
…. until loss occurs (then cut window in half)

time

The University of Sydney Page 26

TCP Congestion Control: details

§ sender limits transmission:

§ Congestion window (cwnd) is
dynamic function of perceived
network congestion

TCP sending rate:
§ roughly: send cwnd bytes,

wait RTT for ACKS, then
send more byteslast byte

ACKed sent, not-
yet ACKed
(“in-
flight”)

last byte
sent

cwnd

LastByteSent-
LastByteAcked

< cwnd sender sequence number space rate ~ cwnd RTT bytes/sec The University of Sydney Page 27 TCP Slow Start § when connection begins, increase rate exponentially until first loss event: • initially cwnd = 1 MSS (Maximum Segment Size) • double cwnd every RTT • done by incrementing cwnd for every ACK received § summary: initial rate is slow but ramps up exponentially fast Host A one segment R TT Host B time two segments four segments The University of Sydney Page 28 TCP: detecting, reacting to loss § Loss indicated by timeout: • cwnd set to 1 MSS; • window then grows exponentially (as in slow start) to threshold, then grows linearly § Loss indicated by 3 duplicate ACKs: TCP RENO • dup ACKs indicate network capable of delivering some segments • cwnd is cut in half window then grows linearly § TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) The University of Sydney Page 29 Q: when should the exponential increase switch to linear? A: when cwnd gets to 1/2 of its value before timeout. Implementation: § variable ssthresh § on loss event, ssthresh is set to 1/2 of cwnd just before loss event TCP: switching from slow start to Congestion Avoidance The University of Sydney Page 30 Connection Management before exchanging data, sender/receiver “handshake”: – agree to establish connection (each knowing the other willing to establish connection) – agree on connection parameters, e.g. MSS, rwnd, etc. – connection-oriented connection state: ESTAB connection variables: seq # client-to-server server-to-client rcvBuffer size at server,client application network connection state: ESTAB connection Variables: seq # client-to-server server-to-client rcvBuffer size at server,client application network Socket clientSocket = newSocket("hostname","port number"); Socket connectionSocket = welcomeSocket.accept(); The University of Sydney Page 31 TCP 3-way handshake SYNbit=1, Seq=x choose init seq num, x send TCP SYN msg ESTAB SYNbit=1, Seq=y ACKbit=1; ACKnum=x+1 choose init seq num, y send TCP SYNACK msg, acking SYN ACKbit=1, ACKnum=y+1 received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data received ACK(y) indicates client is live SYNSENT ESTAB SYN RCVD client state LISTEN server state LISTEN The University of Sydney Page 32 FIN_WAIT_2 CLOSE_WAIT FINbit=1, seq=y ACKbit=1; ACKnum=y+1 ACKbit=1; ACKnum=x+1 wait for server close can still send data can no longer send data LAST_ACK CLOSED TIMED_WAIT timed wait for 2*max segment lifetime CLOSED TCP: closing a connection FIN_WAIT_1 FINbit=1, seq=xcan no longer send but can receive data clientSocket.close() client state server state ESTABESTAB The University of Sydney Page 33 HTTP Overview HTTP: hypertext transfer protocol – Web’s application layer protocol – client/server model – client: browser that requests, receives, (using HTTP protocol) and “displays”Web objects – server:Web server sends (using HTTP protocol) objects in response to requests PC running Firefox browser server running Apache Web server iPhone running Safari browser HTTP request HTTP response HTT P r equ est HTT P r esp ons e The University of Sydney Page 34 HTTP overview Uses TCP: § client initiates TCP connection (creates socket) to server, port 80 § server accepts TCP connection from client § HTTP messages (application- layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server) § TCP connection closed HTTP is “stateless” – server maintains no information about past client requests What is HTTPS ? – HTTP over TLS (Transport Layer Security), e.g. over SSL The University of Sydney Page 35 Related terminology – Broadcast: one-to-all communication – Action of sending a message to all nodes of the system – Typically used for relatively small systems, like IP broadcast as part of Ethernet in LANs – Unicast: one-to-one communication – In contrast with broadcast, describe a special form of distribution to a single receiver – Used generally in the context of multimedia or streaming applications – Anycast: one-to-random-one communication – Send a message to an IP address range from to obtain a response from any potential receiver, whose IP address belongs to this range – Used with UDP and TCP – Multicast: one-to-many communication – Action of sending a message to multiple nodes of the system (not necessarily all nodes) – This term is used in many contexts (network, algorithm) The University of Sydney Page 36 Conclusion (Transportation layer) – Message losses can have dramatic consequences – TCP/IP protocol suite hides these losses from the application level – Socket, RPC, use TCP/IP – Sockets are complex – RPC is more transparent for the client – Multicast communication scales well for large distributed systems