Transport Layer
All material copyright 1996-2012
J.F Kurose and K.W. Ross, All Rights Reserved
George Parisis
School of Engineering and Informatics
University of Sussex
Outline
transport-layer services
multiplexing and demultiplexing
connectionless transport: UDP
principles of reliable data transfer
connection-oriented transport: TCP
segment structure
reliable data transfer
flow control
connection management
principles of congestion control
TCP congestion control
TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)
full duplex data:
bi-directional data flow in same connection
MSS: maximum segment size (application layer data)
connection-oriented:
handshaking (exchange of control messages): initialises sender, receiver state before data exchange
flow controlled:
sender will not overwhelm receiver
point-to-point:
one sender, one receiver
reliable, in-order byte stream:
no “message boundaries”
pipelined:
TCP congestion and flow control set window size
TCP segment structure
[Figure: TCP segment layout, each row 32 bits wide]
row 1: source port #, dest port #
row 2: sequence number (counting bytes of data, not segments!)
row 3: acknowledgement number (counting bytes of data, not segments!)
row 4: head len, not used, flags U A P R S F, receive window (# bytes rcvr willing to accept)
row 5: checksum (Internet checksum, as in UDP), Urg data pointer
then: options (variable length)
then: application data (variable length)
flags:
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
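The segment layout above can be sketched in code. The following is a minimal illustration, assuming the standard fixed 20-byte header with no options; the names `TCP_HEADER`, `FLAGS`, `pack_header` and `unpack_header` are hypothetical, and the checksum is left as a placeholder value rather than computed.

```python
import struct

# Fixed 20-byte TCP header (no options), network byte order:
# source port, dest port, sequence number, acknowledgement number,
# (head len + flags) word, receive window, checksum, urgent data pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

# Flag bits in the low 6 bits of the 16-bit (head len + flags) word.
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def pack_header(src, dst, seq, ack, flags, rwnd, checksum=0, urg_ptr=0):
    """Build a 20-byte header; head len is 5 (32-bit words), checksum is a placeholder."""
    flag_bits = 0
    for name in flags:
        flag_bits |= FLAGS[name]
    offset_flags = (5 << 12) | flag_bits
    return TCP_HEADER.pack(src, dst, seq, ack, offset_flags, rwnd, checksum, urg_ptr)


def unpack_header(raw):
    """Decode the fixed part of a TCP header into a dict."""
    src, dst, seq, ack, offset_flags, rwnd, checksum, urg_ptr = TCP_HEADER.unpack(raw[:20])
    return {
        "src": src, "dst": dst, "seq": seq, "ack": ack,
        "head_len_bytes": (offset_flags >> 12) * 4,
        "flags": {name for name, bit in FLAGS.items() if offset_flags & bit},
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
    }


# Hypothetical example values: a data segment with ACK and PSH set.
hdr = pack_header(12345, 23, seq=42, ack=79, flags=["ACK", "PSH"], rwnd=4096)
fields = unpack_header(hdr)
```

Packing and then unpacking returns the same field values, which makes the sketch handy for checking where each field sits in the 32-bit rows.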
TCP seq. numbers, ACKs
sequence numbers:
byte stream “number” of first byte in segment’s data
acknowledgements:
seq # of next byte expected from other side
cumulative ACK
Q: how does the receiver handle out-of-order segments?
A: TCP spec doesn’t say; it is up to the implementor
options: discard or keep (buffer) them
network bandwidth
[Figure: sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed (“in-flight”) | usable but not yet sent | not usable]
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
TCP seq. numbers, ACKs
simple telnet scenario
User types ‘C’ at Host A
Host A -> Host B: Seq=42, ACK=79, data = ‘C’
Host B ACKs receipt of ‘C’, echoes back ‘C’
Host B -> Host A: Seq=79, ACK=43, data = ‘C’
Host A ACKs receipt of echoed ‘C’
Host A -> Host B: Seq=43, ACK=80
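The numbers in the telnet scenario follow from one rule: the ACK field carries the seq # of the next byte expected from the other side. A small sketch of that arithmetic, using a hypothetical helper `next_ack`:

```python
def next_ack(seq, data_len):
    """ACK number = seq # of the next byte expected from the other side."""
    return seq + data_len


# Host A sends 'C' with Seq=42, ACK=79.
a_seq, a_ack = 42, 79

# Host B echoes 'C': its seq # is what A said it expects, its ACK covers A's byte.
b_seq = a_ack
b_ack = next_ack(a_seq, len(b"C"))

# Host A ACKs the echoed byte; this segment carries no data.
a_seq2 = b_ack
a_ack2 = next_ack(b_seq, len(b"C"))

print((b_seq, b_ack), (a_seq2, a_ack2))
```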
TCP reliable data transfer
TCP creates rdt service on top of IP’s unreliable service
pipelined segments
cumulative acks
single retransmission timer
retransmissions triggered by:
timeout events
duplicate acks
Let’s initially consider simplified TCP sender:
ignore duplicate acks
ignore flow control, congestion control
TCP round trip time, timeout
Q: how to set TCP timeout value?
longer than RTT
but RTT varies
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss
Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary, want estimated RTT “smoother”
average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 – α) * EstimatedRTT + α * SampleRTT
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: α = 0.125
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds plotted against time in seconds]
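To see why the influence of a past sample decreases exponentially fast, unroll the update: a sample taken k updates ago is weighted by α(1 – α)^k. A minimal sketch, assuming α = 0.125; the function names and the sample values are hypothetical.

```python
ALPHA = 0.125  # typical value from the slide


def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    """One EWMA step: EstimatedRTT = (1 - a) * EstimatedRTT + a * SampleRTT."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def sample_weight(age, alpha=ALPHA):
    """Weight carried by the sample taken `age` updates ago (0 = newest)."""
    return alpha * (1 - alpha) ** age


estimate = 100.0  # hypothetical starting EstimatedRTT in ms
for sample in (200.0, 100.0, 100.0):  # hypothetical SampleRTT values in ms
    estimate = update_estimated_rtt(estimate, sample)

weights = [sample_weight(age) for age in range(4)]
```

Each entry of `weights` is 0.875 times the one before it, so a single outlier such as the 200 ms sample fades from the estimate within a few updates.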
TCP round trip time, timeout
timeout interval: EstimatedRTT plus “safety margin”
large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1 – β) * DevRTT + β * |SampleRTT – EstimatedRTT|
(typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
(first term: estimated RTT; second term: “safety margin”)
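The timeout computation can be sketched as a small estimator object. Two details are assumptions rather than something the slides state: the estimator is seeded with the first sample (DevRTT set to half of it), and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate, as RFC 6298 does. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumption: seed from the first SampleRTT (RFC 6298 style).
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        # Assumption: deviation is measured against the previous EstimatedRTT.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(
            sample_rtt - self.estimated_rtt
        )
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt

    def timeout_interval(self):
        # EstimatedRTT plus the "safety margin" of 4 * DevRTT.
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)  # hypothetical sample in ms
initial_timeout = est.timeout_interval()
est.update(120.0)
later_timeout = est.timeout_interval()
```

Samples that stay close to the estimate shrink DevRTT and pull the timeout toward EstimatedRTT; jittery samples widen the margin.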
TCP sender events:
data rcvd from app:
create segment with seq #
seq # is byte-stream number of first data byte in segment
start timer if not already running
think of timer as for oldest unacked segment
expiration interval: TimeoutInterval
timeout:
retransmit segment that caused timeout
restart timer
ack rcvd:
if ack acknowledges previously unacked segments
update what is known to be ACKed
start timer if there are still unacked segments
TCP sender (simplified)
initially:
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum
wait for event
event: data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., “send”)
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer
event: timeout
    retransmit not-yet-acked segment with smallest seq. #
    start timer
event: ACK received, with ACK field value y
    if (y > SendBase) {
        SendBase = y
        /* SendBase–1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else stop timer
    }
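The three events of the simplified sender can be turned into a runnable sketch. This is an illustration under stated assumptions, not a real TCP implementation: the class name `SimplifiedTcpSender` is hypothetical, the timer is only a boolean flag, "sending" appends to a list, and sequence number wrap-around is ignored.

```python
class SimplifiedTcpSender:
    """Event handlers mirroring the simplified sender: app data, timeout, ACK."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> data of not-yet-acked segments
        self.timer_running = False  # stand-in for the single retransmission timer
        self.sent = []              # log of (seq #, data) passed to "IP"

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            return
        # Retransmit only the not-yet-acked segment with the smallest seq #.
        seq = min(self.unacked)
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y  # send_base - 1 is the last cumulatively ACKed byte
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Cumulative ACK: ACK=100 is lost, ACK=120 alone acknowledges both segments.
sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)
sender.on_app_data(b"y" * 20)
sender.on_ack(120)
```

After the single ACK=120, `send_base` is 120, nothing is left unacked and the timer is stopped, so no retransmission takes place.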
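TCP: retransmission scenarios
lost ACK scenario:
Host A -> Host B: Seq=92, 8 bytes of data
Host B -> Host A: ACK=100 (lost, X)
timeout at Host A
Host A -> Host B: Seq=92, 8 bytes of data (retransmission)
Host B -> Host A: ACK=100
premature timeout:
Host A -> Host B: Seq=92, 8 bytes of data (SendBase=92)
Host A -> Host B: Seq=100, 20 bytes of data
Host B -> Host A: ACK=100 and ACK=120, both still in transit when the timer expires
timeout at Host A
Host A -> Host B: Seq=92, 8 bytes of data (retransmission)
ACK=100 arrives: SendBase=100
ACK=120 arrives: SendBase=120
Host B -> Host A: ACK=120 (for the retransmitted segment); SendBase=120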
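TCP: retransmission scenarios
cumulative ACK:
Host A -> Host B: Seq=92, 8 bytes of data
Host A -> Host B: Seq=100, 20 bytes of data
Host B -> Host A: ACK=100 (lost, X)
Host B -> Host A: ACK=120 (arrives before the timeout)
Host A -> Host B: Seq=120, 15 bytes of data (nothing is retransmitted)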
TCP ACK generation [RFC 1122, RFC 2581]
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
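The four event/action pairs above can be sketched as a small decision function. Several points are assumptions for illustration: the receiver buffers out-of-order segments (the spec leaves this to the implementor), a segment below the expected seq # is simply re-ACKed, and the 500ms delayed-ACK timer itself is not modelled. The class name `AckPolicy` and the action strings are hypothetical.

```python
class AckPolicy:
    """Chooses the receiver action for each arriving segment."""

    def __init__(self, expected_seq):
        self.expected = expected_seq  # seq # of next in-order byte expected
        self.ack_pending = False      # True while a delayed ACK is waiting
        self.buffered = {}            # out-of-order segments: seq # -> length

    def on_segment(self, seq, length):
        """Return (action, ack_number)."""
        if seq > self.expected:
            # Gap detected: keep the segment, send a duplicate ACK at once.
            self.buffered[seq] = length
            self.ack_pending = False
            return ("duplicate ACK", self.expected)
        if seq < self.expected:
            # Assumption: old data is answered with the current cumulative ACK.
            return ("duplicate ACK", self.expected)
        filled_gap = bool(self.buffered)
        self.expected += length
        # Absorb buffered segments that are now in order.
        while self.expected in self.buffered:
            self.expected += self.buffered.pop(self.expected)
        if filled_gap:
            self.ack_pending = False
            return ("immediate ACK", self.expected)
        if self.ack_pending:
            self.ack_pending = False
            return ("immediate cumulative ACK", self.expected)
        self.ack_pending = True
        return ("delayed ACK", self.expected)


rcv = AckPolicy(expected_seq=92)
actions = [
    rcv.on_segment(92, 8),    # in order, nothing pending
    rcv.on_segment(100, 20),  # in order, one ACK pending
    rcv.on_segment(135, 6),   # out of order, gap at 120
    rcv.on_segment(120, 15),  # fills the gap from its lower end
]
```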
Doubling the Timeout Interval
length of timeout after a timer expiration?
TCP sets the next timeout interval to twice the previous value, rather than deriving it from the last EstimatedRTT and DevRTT
intervals grow exponentially after each retransmission
back to normal (interval again derived from EstimatedRTT and DevRTT) whenever the timer is started after either of the two other events (i.e. data received from application above, and ACK received)
limited form of congestion control
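A minimal sketch of the doubling rule, assuming the normal interval is EstimatedRTT + 4*DevRTT as defined earlier. The class name `RetransmissionTimer` and the numeric values are hypothetical, and real implementations also cap the interval, which is not modelled here.

```python
class RetransmissionTimer:
    """Timeout interval that doubles on expiry and resets on the other events."""

    def __init__(self, estimated_rtt, dev_rtt):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.interval = self.computed_interval()

    def computed_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self):
        # Timer expiration: next interval is twice the previous one.
        self.interval *= 2
        return self.interval

    def on_data_or_ack(self):
        # Timer started after app data or an ACK: back to the computed value.
        self.interval = self.computed_interval()
        return self.interval


timer = RetransmissionTimer(estimated_rtt=0.5, dev_rtt=0.0625)  # seconds
after_timeouts = [timer.on_timeout() for _ in range(3)]
after_ack = timer.on_data_or_ack()
```

With these values the interval grows 0.75 s -> 1.5 s -> 3 s -> 6 s across three consecutive expirations and returns to 0.75 s once an ACK arrives.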
TCP fast retransmit
time-out period often relatively long:
long delay before resending lost packet
detect lost segments via duplicate ACKs
sender often sends many segments back-to-back
if segment is lost, there will likely be many duplicate ACKs.
if sender receives 3 duplicate ACKs for same data
(“triple duplicate ACKs”), resend unacked segment with smallest seq #
likely that unacked segment lost, so don’t wait for timeout
Why not resend after the first duplicate ack?
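The duplicate-ACK rule can be sketched as a counter on the sender side. It assumes duplicates are counted after the first ACK for a given value, so the retransmission fires on the fourth identical ACK; the class name `DupAckCounter` is hypothetical.

```python
class DupAckCounter:
    """Signals a fast retransmit on the third duplicate ACK."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the seq # to fast-retransmit, or None."""
        if y > self.send_base:
            # New data acknowledged: advance and reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
            return None
        if y == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Resend the unacked segment with the smallest seq #.
                return y
        return None


counter = DupAckCounter(send_base=92)
# ACK=100 arrives four times: the original ACK plus three duplicates.
results = [counter.on_ack(100) for _ in range(4)]
```

Waiting for three duplicates rather than one gives mildly reordered segments a chance to arrive before the sender concludes that a segment was lost.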
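TCP fast retransmit
[Figure: fast retransmit after sender receipt of triple duplicate ACK]
Host A -> Host B: Seq=92, 8 bytes of data
Host A -> Host B: Seq=100, 20 bytes of data (lost, X)
Host B -> Host A: ACK=100, four times (the original ACK and three duplicates)
Host A -> Host B: Seq=100, 20 bytes of data (fast retransmit, before the timeout expires)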
Go-Back-N or Selective Repeat?
TCP acknowledgments are cumulative
correctly received but out-of-order segments are not individually ACKed
TCP looks a lot like a GBN-style protocol, but many TCP implementations will buffer correctly received but out-of-order segments. Example:
sender sends a sequence of segments 1, 2, . . . , N, and all of the segments arrive in order without error at the receiver
the acknowledgment for packet n < N gets lost, but the remaining N – 1 acknowledgments arrive at the sender before their respective timeouts
GBN would retransmit packet n and all of the subsequent packets n+1, . . . , N
TCP would retransmit at most one segment (segment n), and not even that if the cumulative ack for segment n+1 arrives before the timeout for segment n
TCP SACK: selective acknowledgments (ack out-of-order segments)
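The lost-acknowledgment example can be put into numbers. The sketch below models the example literally as stated in the comparison (it is not a protocol simulation), and both function names are hypothetical.

```python
def gbn_retransmissions(n, total):
    """Packets resent by GBN in the example: packet n and every later one."""
    return list(range(n, total + 1))


def tcp_retransmissions(n, total, later_ack_before_timeout=True):
    """Segments resent by TCP: none if a later cumulative ACK covers segment n."""
    if n < total and later_ack_before_timeout:
        return []
    return [n]


# Hypothetical run: N = 5 segments, the acknowledgment for n = 3 is lost.
gbn = gbn_retransmissions(3, 5)
tcp = tcp_retransmissions(3, 5)
tcp_worst_case = tcp_retransmissions(3, 5, later_ack_before_timeout=False)
```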
Connection Management
before exchanging data, sender/receiver “handshake”:
agree to establish connection (each knowing the other willing to establish connection)
agree on connection parameters
[Figure: client and server, each between application and network, hold the same connection state]
connection state: ESTAB
connection variables:
seq # client-to-server, server-to-client
rcvBuffer size at server, client
client: Socket clientSocket = new Socket("hostname", portNumber);
server: Socket connectionSocket = welcomeSocket.accept();
Agreeing to establish a connection
2-way handshake:
[Figure: “Let’s talk” answered by “OK”, both sides ESTAB]
[Figure: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB]
Q: will 2-way handshake always work in network?
variable delays
retransmitted messages (e.g. req_conn(x)) due to message loss
message reordering
can’t “see” other side
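Agreeing to establish a connection
2-way handshake failure scenarios:
scenario 1:
client: choose x, send req_conn(x)
server: ESTAB, send acc_conn(x)
client: ESTAB; meanwhile retransmit req_conn(x)
client terminates; connection x completes; server forgets x
server: retransmitted req_conn(x) arrives, ESTAB again
half open connection! (no client!)
scenario 2:
client: choose x, send req_conn(x)
server: ESTAB, send acc_conn(x)
client: ESTAB, send data(x+1)
server: accept data(x+1)
client: retransmit req_conn(x), retransmit data(x+1)
client terminates; connection x completes; server forgets x
server: retransmitted req_conn(x) arrives, ESTAB again
server: retransmitted data(x+1) arrives, accept data(x+1)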
TCP 3-way handshake
client state: CLOSED -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x)
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y; ACKbit=1; ACKnum=x+1)
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data
server: received ACK(y) indicates client is live
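The exchange can be walked through in a few lines of code. This is a sketch of the message contents and state changes only, with no networking; the function name `three_way_handshake` and the dictionary keys are hypothetical, and sequence numbers are taken modulo 2^32.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Return the three segments and the final (client, server) states."""
    client_state, server_state = "CLOSED", "LISTEN"

    # Client chooses init seq num x and sends SYN.
    syn = {"SYN": 1, "seq": x}
    client_state = "SYNSENT"

    # Server chooses init seq num y and sends SYNACK, acking the SYN.
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": (syn["seq"] + 1) % SEQ_MOD}
    server_state = "SYN RCVD"

    # Client checks the SYNACK acknowledges its SYN, then ACKs the SYNACK.
    if synack["acknum"] != (x + 1) % SEQ_MOD:
        raise ValueError("SYNACK does not acknowledge our SYN")
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_MOD}
    client_state = "ESTAB"

    # Server checks the ACK acknowledges its SYNACK.
    if ack["acknum"] != (y + 1) % SEQ_MOD:
        raise ValueError("ACK does not acknowledge our SYNACK")
    server_state = "ESTAB"

    return [syn, synack, ack], client_state, server_state


segments, client_state, server_state = three_way_handshake(x=1000, y=5000)
```

Because each side echoes the other's initial sequence number plus one, a stale req_conn-style duplicate carrying an old x would fail the check instead of silently creating a connection.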
TCP 3-way handshake: FSM
server side:
closed -> listen: on Socket connectionSocket = welcomeSocket.accept(); action: Λ (none)
listen -> SYN rcvd: on SYN(x); action: SYNACK(seq=y,ACKnum=x+1), create new socket for communication back to client
SYN rcvd -> ESTAB: on ACK(ACKnum=y+1); action: Λ (none)
client side:
closed -> SYN sent: on Socket clientSocket = new Socket("hostname", portNumber); action: SYN(seq=x)
SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1); action: ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
send TCP segment with FIN bit = 1
respond to received FIN with ACK
on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
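TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
client: clientSocket.close(); send FINbit=1, seq=x; enter FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1; ACKnum=x+1; enter CLOSE_WAIT (can still send data)
client: enter FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; enter LAST_ACK (can no longer send data)
client: send ACKbit=1; ACKnum=y+1; enter TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: on receiving the ACK, enter CLOSED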
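The teardown sequence reads naturally as two transition tables. The sketch below encodes only the client-initiated close shown above; the event labels, table names and the `run` helper are hypothetical, and simultaneous FIN exchanges are not covered.

```python
# (state, event) -> (next state, action); None means no segment is sent.
CLIENT_CLOSE = {
    ("ESTAB", "close()"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIMED_WAIT", "send ACK"),
    ("TIMED_WAIT", "2*MSL elapsed"): ("CLOSED", None),
}

SERVER_CLOSE = {
    ("ESTAB", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "close()"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run(table, state, events):
    """Apply events in order and return the list of states visited."""
    trace = [state]
    for event in events:
        state, _action = table[(state, event)]
        trace.append(state)
    return trace


client_trace = run(CLIENT_CLOSE, "ESTAB",
                   ["close()", "recv ACK", "recv FIN", "2*MSL elapsed"])
server_trace = run(SERVER_CLOSE, "ESTAB", ["recv FIN", "close()", "recv ACK"])
```

An event that is not valid in the current state raises `KeyError`, which makes it easy to spot an ordering that the diagram does not allow.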
Summary
TCP
Reliable data transfer
Connection management