CS计算机代考程序代写 distributed system algorithm COMP3221: Distributed Systems

COMP3221: Distributed Systems
Communication (TCP)
Dr Nguyen Tran
School of Computer Science
The University of Sydney
Page 1

Previously…
– Previous lecture:
– Message-based communication is complex (e.g., routing towards
destination, subject to message losses)
– Today’s lecture:
– How to avoid message losses?
– How to give the impression that everything happens locally (not remotely)?
– One to many communication
The University of Sydney
Page 2

Outline
– The Problem of Message Loss – The TCP/IP Solution
– Multicast communication
The University of Sydney Page 3

The Problem of Message Loss
The University of Sydney Page 4

Message loss
Cause
– Networks are in general unreliable
– Messages can be lost (never been delivered even if sent)
– Examples:
– A server receives too many requests simultaneously so it cannot treat all – A router drops the message because its queue is full
Message losses may impact the computation of a distributed system
The University of Sydney Page 5

Message loss
Coordinated Attack Problem
– Constraints of the problem
– Two armies, each led by a general on separate mountains surrounding a battlefield (distributed system)
– Can only communicate via messengers (message passing)
– Messengers can be killed before reaching destination (message losses)
– Goal: they want to coordinate an attack – If they attack at different times, they both die – If they attack at the same time, they win
The University of Sydney Page 6

Message loss
Coordinated Attack Problem (con’t)
– There is no protocols to make sure they will win! time
12h!
The University of Sydney
Page 7

Message loss
Coordinated Attack Problem (con’t)
– There is no protocols to make sure they will win! time
12h, ok?
Good! So what?
Ok
Did you get my ok?
The University of Sydney
Page 8

Attack at noon, ok?
Ok, it works for me
Copy that

Message loss
Analogy in networking
– Constraints of the problem
– Two remote entities of a distributed system
– Can only communicate through messages
– The network is unreliable: messages can be dropped
– Goal: they want to make sure to do something simultaneously This is impossible, even if all messages go through
The University of Sydney
Page 9

TCP/IP
Communication 2/2 Week 4, COMP3221
The University of Sydney Page 10

TCP: Overview RFCs: 793,1122,1323, 2018, 2581
– Point-to-point:
– one sender, one receiver
– Connection-oriented, reliable, in-order byte steam: – Full duplex data:
– bi-directional data flow in same connection – Flow controlled:
– sender will not overwhelm receiver – Congestion controlled:
– TCP congestion and flow control set window size The University of Sydney
Page 11

TCP seq. numbers, ACKs
sequence numbers:
–byte stream “number” of first byte in segment’s data
acknowledgements:
–seq # of next byte expected from other side
–cumulative ACK
Q: how receiver handles out-
of-order segments
–A: TCP spec doesn’t say, – up to implementor
outgoing segment from sender
source port # dest port #
sequence number
acknowledgement number
rwnd
checksum
urg pointer
window size N
sender sequence number space
sent ACKed
sent, not- yet ACKed (“in- flight”)
usable not but not usable yet sent
incoming segment to sender
source port # dest port #
sequence number
acknowledgement number
A
rwnd
checksum
urg pointer
The University of Sydney
Page 12

TCP seq. numbers, ACKs
Host A Host B
User types ‘C’
Seq=79, ACK=43, data = ‘C’ host ACKs
The University of Sydney
Page 13
receipt of echoed ‘C’
Seq=43, ACK=80
simple telnet scenario
Seq=42, ACK=79, data = ‘C’
host ACKs
receipt of ‘C’, echoes back ‘C’

TCP sender events:
data rcvd from app: timeout:
§ create segment with seq # § retransmit segment that
caused timeout numberoffirstdatabyte §restarttimer
§ seq # is byte-stream in segment
§ start timer if not already running
• think of timer as for oldest unacked segment
• expiration interval: TimeOutInterval
ack rcvd:
§ if ack acknowledges previously unacked segments
• update what is known to be ACKed
• start timer if there are still unacked segments
The University of Sydney
Page 14

TCP round trip time, timeout
Q: how to set TCP
timeout value? § longer than RTT
• but RTT varies
§ too short: premature timeout, unnecessary retransmissions
§ too long: slow reaction to segment loss
Q: how to estimate RTT?
– SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
– SampleRTT will vary, want
estimated RTT “smoother”
– average several recent measurements, not just current SampleRTT
The University of Sydney
Page 15

TCP round trip time, timeout
EstimatedRTT = (1- a)*EstimatedRTT + a*SampleRTT
§ exponential weighted moving average
§ influence of past sample decreases exponentially fast
§ typical value: a = 0.125
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
350
300
250
200
150
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
sampleRTT
EstimatedRTT
The University of Sydney
time (seconds)
Page 16
time (seconnds)
SampleRTT Estimated RTT
RTT (milliseconds)
RTT (milliseconds)

TCP round trip time, timeout
– timeout interval: EstimatedRTT plus “safety margin” – largevariationinEstimatedRTT->largersafetymargin
– estimate SampleRTT deviation from EstimatedRTT: DevRTT = (1-b)*DevRTT +
b*|SampleRTT-EstimatedRTT| (typically, b = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
The University of Sydney
estimated RTT
“safety margin”
Page 17

TCP: retransmission scenarios
Host A Host B
Seq=92, 8 bytes of data
Host A Host B SendBase=92
ACK=100
X
Seq=92, 8 bytes of data
ACK=100
lost ACK scenario
SendBase=100 SendBase=120
SendBase=120
Seq=92, 8 bytes of data Seq=100, 20 bytes of data
ACK=100 ACK=120
Seq=92, 8 bytes of data
ACK=120
premature timeout
The University of Sydney
Page 18
timeout
timeout

TCP: retransmission scenarios
Host A
Host B
Seq=92, 8 bytes of data
Seq=100, 20 bytes of data
X ACK=100 ACK=120
Seq=120, 15 bytes of data
cumulative ACK
The University of Sydney
Page 19
timeout

TCP fast retransmit
§ time-out period often relatively long:
• long delay before resending lost packet
§ detect lost segments via duplicate ACKs.
• sender often sends many segments back- to-back
• if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit
if sender receives 3 ACKs for same data
(“triple duplicate ACKs”), (“triple duplicate ACKs”),
resend unacked segment with smallest seq #
§ likely that unacked segment lost, so don’t wait for timeout
The University of Sydney
Page 20

TCP fast retransmit
Host A Host B
Seq=92, 8 bytes of data
Seq=100, 20 bytes of data
X
ACK=100
ACK=100
ACK=100 ACK=100
Seq=100, 20 bytes of data
The University of Sydney
fast retransmit after sender receipt of triple duplicate ACK
Page 21
timeout

TCP flow control
application process
TCP code
TCP socket receiver buffers
IP code
application may remove data from TCP socket buffers ….
… slower than TCP receiver is delivering (sender is sending)
application OS
flow control
receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
from sender
The University of Sydney
Page 22
receiver protocol stack

TCP flow control
– receiver“advertises”free buffer space by including rwnd value in TCP header of receiver-to-sender segments
– RcvBuffer size set via socket options (typical default is 4096 bytes)
– manyoperatingsystemsauto adjust RcvBuffer
– senderlimitsamountof unacked (“in-flight”) data to receiver’s rwnd value
– guaranteesreceivebufferwill not overflow
to application process
buffered data
free buffer space
RcvBuffer
rwnd
TCP segment payloads receiver-side buffering
source port # dest port #
sequence number
acknowledgement number
rwnd
checksum
urg pointer
The University of Sydney
Page 23

Principles of congestion control
congestion:
– informally: “too many sources sending too much data too
fast for network to handle”
– different from flow control!
– manifestations:
– lost packets (buffer overflow at routers) – long delays (queueing in router buffers)
– a top-10 problem! The University of Sydney
Page 24

TCP congestion control
§ approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase cwnd by 1 MSS (Maximum Segment Size) every RTT until loss detected
• multiplicative decrease: cut cwnd in half after loss additively increase window size …
…. until loss occurs (then cut window in half)
AIMD saw tooth behavior: probing for bandwidth
The University of Sydney
Page 25
time
cwnd: TCP sender congestion window size

TCP Congestion Control: details
sender sequence number space
TCP sending rate:
§ roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
cwnd
last byte ACKed
sent, not- yet ACKed (“in- flight”)
last byte sent
§ sender limits transmission:
§ Congestion window (cwnd) is dynamic function of perceived network congestion
The University of Sydney
Page 26
rate
~
cwnd RTT
bytes/sec
LastByteSent- < LastByteAcked cwnd TCP Slow Start § when connection begins, increase rate exponentially until first loss event: • initiallycwnd=1MSS (Maximum Segment Size) • double cwnd every RTT • done by incrementing cwnd for every ACK received § summary: initial rate is slow but ramps up exponentially fast Host A Host B The University of Sydney Page 27 time RTT one segment two segments four segments TCP: detecting, reacting to loss § Loss indicated by timeout: • cwnd set to 1 MSS; • windowthengrowsexponentially(asinslowstart)tothreshold,then grows linearly § Loss indicated by 3 duplicate ACKs: TCP RENO • dupACKsindicatenetworkcapableofdeliveringsomesegments • cwnd is cut in half window then grows linearly § TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks) The University of Sydney Page 28 TCP: switching from slow start to Congestion Avoidance Q: when should the exponential increase switch to linear? A: when cwnd gets to 1/2 of its value before timeout. Implementation: § variable ssthresh § on loss event, ssthresh is set to 1/2 of cwnd just before loss event The University of Sydney Page 29 Connection Management before exchanging data, sender/receiver “handshake”: – agree to establish connection (each knowing the other willing to establish connection) – agree on connection parameters, e.g. MSS, rwnd, etc. – connection-oriented application connection state: ESTAB connection variables: seq # client-to-server server-to-client rcvBuffer size at server,client network application connection state: ESTAB connection Variables: seq # client-to-server server-to-client rcvBuffer size at server,client network The University of Sydney Page 30 Socket clientSocket = newSocket("hostname","port number"); Socket connectionSocket = welcomeSocket.accept(); TCP 3-way handshake client state LISTEN choose init seq num, x server state LISTEN SYN RCVD SYNSENT SYNbit=1, Seq=x SYNbit=1, Seq=y ACKbit=1; ACKnum=x+1 ACKbit=1, ACKnum=y+1 choose init seq num, y send TCP SYNACK msg, acking SYN send TCP SYN msg received SYNACK(x) indicates server is live; send ACK for SYNACK; ESTAB this segment may contain client-to-server data received ACK(y) indicates client is live The University of Sydney Page 31 ESTAB TCP: closing a connection client state server state ESTAB CLOSE_WAIT LAST_ACK CLOSED ESTAB clientSocket.close() FIN_WAIT_1 FIN_WAIT_2 TIMED_WAIT can no longer send but can receive data wait for server close FINbit=1, seq=x ACKbit=1; ACKnum=x+1 FINbit=1, seq=y ACKbit=1; ACKnum=y+1 can still send data can no longer send data CLOSED The University of Sydney Page 32 timed wait for 2*max segment lifetime HTTP Overview HTTP: hypertext transfer protocol – Web’s application layer protocol – client/server model – client: browser that requests, receives, (using HTTP protocol) and “displays” Web objects – server: Web server sends (using HTTP protocol) objects in response to requests PC running Firefox browser server running Apache Web server The University of Sydney Page 33 iPhone running Safari browser HTTP request HTTP response HTTP request HTTP response HTTP overview Uses TCP: § client initiates TCP connection (creates socket) to server, port 80 § server accepts TCP connection from client § HTTP messages (application- layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server) § TCP connection closed The University of Sydney HTTP is “stateless” – server maintains no information about past client requests What is HTTPS ? – HTTP over TLS (Transport Layer Security), e.g. over SSL Page 34 Related terminology – Broadcast:one-to-allcommunication – Actionofsendingamessagetoallnodesofthesystem – Typicallyusedforrelativelysmallsystems,likeIPbroadcastaspartof Ethernet in LANs – Unicast:one-to-onecommunication – Incontrastwithbroadcast,describeaspecialformofdistributiontoasingle receiver – Usedgenerallyinthecontextofmultimediaorstreamingapplications – Anycast:one-to-random-onecommunication – SendamessagetoanIPaddressrangefromtoobtainaresponsefrom any potential receiver, whose IP address belongs to this range – UsedwithUDPandTCP – Multicast:one-to-manycommunication – Actionofsendingamessagetomultiplenodesofthesystem(not necessarily all nodes) – Thistermisusedinmanycontexts(network,algorithm) The University of Sydney Page 35 Conclusion (Transportation layer) – Message losses can have dramatic consequences – TCP/IP protocol suite hides these losses from the application level – Socket, RPC, use TCP/IP – Sockets are complex – RPC is more transparent for the client – Multicast communication scales well for large distributed systems The University of Sydney Page 36