Transport Layer
All material copyright 1996-2012
J.F. Kurose and K.W. Ross, All Rights Reserved
George Parisis
School of Engineering and Informatics
University of Sussex
Transport Layer 3-2
Outline
v transport-layer services
v multiplexing and demultiplexing
v connectionless transport: UDP
v principles of reliable data transfer
v connection-oriented transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
v principles of congestion control
v TCP congestion control
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
v full duplex data:
§ bi-directional data flow in
same connection
§ MSS: maximum segment
size (application layer
data)
v connection-oriented:
§ handshaking (exchange of
control messages):
initialises sender, receiver
state before data
exchange
v flow controlled:
§ sender will not overwhelm
receiver
v point-to-point:
§ one sender, one receiver
v reliable, in-order byte
stream:
§ no “message boundaries”
v pipelined:
§ TCP congestion and flow
control set window size
TCP segment structure
header layout (each row 32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | receive window
checksum | Urg data pointer
options (variable length)
application data (variable length)
field notes:
v sequence number, acknowledgement number: counting by bytes of data (not segments!)
v URG: urgent data (generally not used)
v ACK: ACK # valid
v PSH: push data now (generally not used)
v RST, SYN, FIN: connection estab (setup, teardown commands)
v receive window: # bytes rcvr willing to accept
v checksum: Internet checksum (as in UDP)
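The fixed 20-byte part of the header can be decoded with a few lines of Python. This is only a sketch: the function name `parse_tcp_header`, the dictionary keys and the example segment are illustrative assumptions, options are skipped rather than decoded, and the checksum is not verified.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header; options are skipped, not parsed."""
    (src_port, dst_port, seq, ack,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len field counts 32-bit words
    flags = offset_flags & 0x3F             # low six bits: U A P R S F
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "header_len": header_len,
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
        "data": segment[header_len:],
    }


# Hypothetical segment: port 12345 -> 80, Seq=42, ACK=79, ACK flag set, one data byte.
example = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x10, 4096, 0, 0) + b"C"
fields = parse_tcp_header(example)
```

The `!` in the format string selects network (big-endian) byte order, which is how the header fields travel on the wire.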
TCP seq. numbers, ACKs
sequence numbers:
§ byte stream “number” of
first byte in segment’s data
acknowledgements:
§ seq # of next byte expected
from other side
§ cumulative ACK
Q: how receiver handles out-of-order segments
§ A: TCP spec doesn’t say, up to implementor
• discard, keep
• network bandwidth
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; the incoming segment carries acknowledgement number A]
sender sequence number space (window size N):
v sent, ACKed
v sent, not-yet ACKed (“in-flight”)
v usable but not yet sent
v not usable
TCP seq. numbers, ACKs
simple telnet scenario:
v User types ‘C’
§ Host A -> Host B: Seq=42, ACK=79, data = ‘C’
v host ACKs receipt of ‘C’, echoes back ‘C’
§ Host B -> Host A: Seq=79, ACK=43, data = ‘C’
v host ACKs receipt of echoed ‘C’
§ Host A -> Host B: Seq=43, ACK=80
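The numbers in the telnet scenario follow one rule: a reply's sequence number is the ACK value just received, and its ACK value is the received sequence number plus the number of data bytes received. A minimal sketch of that rule, with the hypothetical helper name `reply_numbers`; it ignores SYN and FIN segments, which also consume a sequence number.

```python
def reply_numbers(seq, ack, data_len):
    """Return (Seq, ACK) for the reply to a segment carrying data_len bytes."""
    reply_seq = ack              # the byte the peer said it expects next from us
    reply_ack = seq + data_len   # the next byte we expect from the peer
    return reply_seq, reply_ack


# Host A sends Seq=42, ACK=79 with one byte ('C').
b_seq, b_ack = reply_numbers(42, 79, 1)         # Host B's echo
a_seq, a_ack = reply_numbers(b_seq, b_ack, 1)   # Host A's ACK of the echo
```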
TCP reliable data transfer
v TCP creates rdt service
on top of IP’s unreliable
service
§ pipelined segments
§ cumulative acks
§ single retransmission
timer
v retransmissions
triggered by:
§ timeout events
§ duplicate acks
let’s initially consider
simplified TCP
sender:
§ ignore duplicate acks
§ ignore flow control,
congestion control
TCP round trip time, timeout
Q: how to set TCP
timeout value?
v longer than RTT
§ but RTT varies
v too short:
premature timeout,
unnecessary
retransmissions
v too long: slow
reaction to segment
loss
Q: how to estimate
RTT?
v SampleRTT: measured
time from segment
transmission until ACK
receipt
§ ignore retransmissions
v SampleRTT will vary,
want estimated RTT
“smoother”
§ average several recent
measurements, not just
current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 – α) * EstimatedRTT + α * SampleRTT
v exponential weighted moving average
v influence of past sample decreases exponentially fast
v typical value: α = 0.125
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT in milliseconds (axis 100 to 350) plotted against time in seconds]
TCP round trip time, timeout
v timeout interval: EstimatedRTT plus “safety margin”
§ large variation in EstimatedRTT -> larger safety margin
v estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1 – β) * DevRTT + β * |SampleRTT – EstimatedRTT|
(typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4 * DevRTT
(first term: estimated RTT; second term: “safety margin”)
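The three formulas can be combined into one small estimator. The sketch below is illustrative, not a definitive implementation: the class name `RttEstimator` is hypothetical, and two choices are assumptions that the slides do not state. The estimator is seeded from the first sample (with the deviation at half of it), and DevRTT is updated using the EstimatedRTT value from before the current sample is folded in.

```python
class RttEstimator:
    """EWMA-based timeout estimator following the slide formulas."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumption: seed from the first sample, deviation at half of it.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never one measured on a retransmission)."""
        # Assumption: DevRTT uses EstimatedRTT from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        """TimeoutInterval = EstimatedRTT + 4 * DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)   # milliseconds
timeout = estimator.update(200.0)              # one slow sample arrives
```

With these inputs a single 200 ms sample moves EstimatedRTT only from 100 to 112.5 ms, while DevRTT grows to 62.5 ms, so the timeout becomes 362.5 ms: the safety margin reacts to variation faster than the average does.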
TCP sender events:
data rcvd from app:
v create segment with
seq #
v seq # is byte-stream
number of first data
byte in segment
v start timer if not
already running
§ think of timer as for
oldest unacked
segment
§ expiration interval:
TimeoutInterval
timeout:
v retransmit segment
that caused timeout
v restart timer
ack rcvd:
v if ack acknowledges
previously unacked
segments
§ update what is known
to be ACKed
§ start timer if there are
still unacked
segments
TCP sender (simplified)
initially (Λ):
  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum
then wait for event:

event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., “send”)
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

event: timeout
  retransmit not-yet-acked segment with smallest seq. #
  start timer

event: ACK received, with ACK field value y
  if (y > SendBase) {
    SendBase = y
    /* SendBase–1: last cumulatively ACKed byte */
    if (there are currently not-yet-acked segments)
      start timer
    else stop timer
  }
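The three events of the simplified sender translate into one method each. This is a sketch under stated assumptions: the class `SimplifiedTcpSender`, its attributes and the `sent` log are hypothetical names, the timer is reduced to a boolean flag, "passing a segment to IP" is modelled by appending to a list, and sequence number wraparound and partially acknowledged segments are ignored.

```python
class SimplifiedTcpSender:
    """Simplified sender: no duplicate-ACK handling, flow or congestion control."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> data of sent, not-yet-ACKed segments
        self.timer_running = False
        self.sent = []              # log of (seq #, data) handed to IP

    def on_app_data(self, data):
        """Event: data received from application above."""
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        """Event: timeout. Retransmit the oldest unACKed segment, restart timer."""
        if self.unacked:
            seq = min(self.unacked)
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        """Event: ACK received with ACK field value y (cumulative)."""
        if y > self.send_base:
            self.send_base = y
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Two segments sent, the first ACK is lost, the second (cumulative) ACK arrives.
sender = SimplifiedTcpSender(initial_seq_num=92)
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_ack(120)               # covers both segments, nothing is retransmitted
```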
TCP: retransmission scenarios
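lost ACK scenario:
v Host A -> Host B: Seq=92, 8 bytes of data
v Host B -> Host A: ACK=100 (lost, X)
v timeout at Host A
v Host A -> Host B: Seq=92, 8 bytes of data (retransmission)
v Host B -> Host A: ACK=100
premature timeout:
v Host A -> Host B: Seq=92, 8 bytes of data
v Host A -> Host B: Seq=100, 20 bytes of data
v Host B -> Host A: ACK=100, then ACK=120
v timeout at Host A before the ACKs arrive
v Host A -> Host B: Seq=92, 8 bytes of data (retransmission)
v Host B -> Host A: ACK=120
v SendBase at Host A: SendBase=92 at the start, SendBase=100 after ACK=100, SendBase=120 after ACK=120, still SendBase=120 after the second ACK=120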
TCP: retransmission scenarios
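cumulative ACK scenario:
v Host A -> Host B: Seq=92, 8 bytes of data
v Host A -> Host B: Seq=100, 20 bytes of data
v Host B -> Host A: ACK=100 (lost, X)
v Host B -> Host A: ACK=120 (arrives before the timeout)
v Host A -> Host B: Seq=120, 15 bytes of data (no retransmission needed)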
TCP ACK generation [RFC 1122, RFC 2581]
v event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
§ TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
v event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
§ TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
v event at receiver: arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
§ TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
v event at receiver: arrival of segment that partially or completely fills gap
§ TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
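The four event/action pairs can be read as a small decision function. The sketch below is an illustration only: `receiver_action`, its parameters and the returned strings are hypothetical, and a segment below the expected sequence number (an old duplicate) is rejected because the table does not cover that case.

```python
def receiver_action(seg_seq, expected_seq, ack_pending=False, gap_buffered=False):
    """Pick the receiver's ACK action for an arriving segment.

    ack_pending:  an earlier in-order segment is still waiting for its delayed ACK
    gap_buffered: out-of-order data is already buffered above expected_seq
    """
    if seg_seq < expected_seq:
        raise ValueError("old segment: not covered by the table")
    if seg_seq > expected_seq:
        # Out of order: a gap is detected (or is still open below this segment).
        return "send duplicate ACK immediately"
    if gap_buffered:
        # Segment starts at the lower end of the gap.
        return "send ACK immediately"
    if ack_pending:
        return "send single cumulative ACK immediately"
    return "delay ACK up to 500 ms"


in_order = receiver_action(100, 100)
second_in_order = receiver_action(120, 120, ack_pending=True)
out_of_order = receiver_action(140, 100)
fills_gap = receiver_action(100, 100, gap_buffered=True)
```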
Doubling the Timeout Interval
v length of timeout after a timer expiration?
v TCP sets the next timeout interval to twice the
previous value, rather than deriving it from the
last EstimatedRTT and DevRTT
v intervals grow exponentially after each
retransmission
v Back to normal whenever the timer is started
after either of the two other events (i.e. data
received from application above, and ACK
received)
v limited form of congestion control
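A minimal sketch of the doubling rule, assuming EstimatedRTT and DevRTT are already known. The class `RetransmissionTimer` and its method names are hypothetical, and the sketch applies no upper bound on the interval.

```python
class RetransmissionTimer:
    """Timeout interval that doubles on expiry and resets on the other events."""

    def __init__(self, estimated_rtt, dev_rtt):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.interval = self.fresh_interval()

    def fresh_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self):
        """Timer expired: next interval is twice the previous one."""
        self.interval *= 2
        return self.interval

    def on_data_or_new_ack(self):
        """Timer started after app data or an ACK: back to the derived value."""
        self.interval = self.fresh_interval()
        return self.interval


timer = RetransmissionTimer(estimated_rtt=100.0, dev_rtt=25.0)   # milliseconds
backoff = [timer.on_timeout() for _ in range(3)]   # three expirations in a row
recovered = timer.on_data_or_new_ack()
```

Starting from 200 ms, three consecutive expirations give 400, 800 and 1600 ms; the next ACK brings the interval back to 200 ms.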
TCP fast retransmit
v time-out period
often relatively long:
§ long delay before
resending lost packet
v detect lost segments
via duplicate ACKs
§ sender often sends
many segments
back-to-back
§ if segment is lost,
there will likely be
many duplicate
ACKs.
TCP fast retransmit:
if sender receives 3 duplicate ACKs for same data (“triple duplicate ACKs”, i.e. 3 more ACKs after the first one), resend unacked segment with smallest seq #
§ likely that unacked segment lost, so don’t wait for timeout
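TCP fast retransmit
fast retransmit after sender receipt of triple duplicate ACK:
v Host A -> Host B: Seq=92, 8 bytes of data
v Host A -> Host B: Seq=100, 20 bytes of data (lost, X)
v Host B -> Host A: ACK=100
v Host B -> Host A: ACK=100, ACK=100, ACK=100 (duplicates)
v Host A -> Host B: Seq=100, 20 bytes of data (fast retransmit, before the timeout expires)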
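Counting duplicate ACKs takes only a counter next to SendBase. A minimal sketch, assuming a threshold of three duplicates; the class `DupAckDetector` and its return convention (the sequence number to resend, or `None`) are hypothetical.

```python
class DupAckDetector:
    """Signal a fast retransmit on the third duplicate ACK."""

    def __init__(self, send_base, threshold=3):
        self.send_base = send_base
        self.threshold = threshold
        self.dup_count = 0

    def on_ack(self, y):
        """Return the seq # to fast-retransmit, or None."""
        if y > self.send_base:
            # New data acknowledged: advance and reset the counter.
            self.send_base = y
            self.dup_count = 0
            return None
        if y == self.send_base:
            self.dup_count += 1
            if self.dup_count == self.threshold:
                return y    # oldest unACKed segment starts at byte y
        return None


# Seq=92 is ACKed normally, then three duplicate ACK=100 arrive.
detector = DupAckDetector(send_base=92)
results = [detector.on_ack(100) for _ in range(4)]
```

The first ACK=100 is a normal ACK; only the fourth identical ACK (the third duplicate) returns 100, the segment to resend.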
Go-Back-N or Selective Repeat?
v TCP acknowledgments are cumulative
v correctly received but out-of-order segments are not individually ACKed
v TCP looks a lot like a GBN-style protocol but many TCP implementations will buffer correctly received but out-of-order segments. Example:
§ sender sends a sequence of segments 1, 2, . . . , N, and all of the segments arrive in order without error at the receiver
§ the acknowledgment for packet n < N gets lost, but the remaining N – 1 acknowledgments arrive at the sender before their respective timeouts
§ GBN would retransmit packet n and all of the subsequent packets
§ TCP would retransmit at most (remember cumulative acks) one segment (segment n)
v TCP SACK: selective acknowledgments (ack out-of-order segments)
Connection Management
before exchanging data, sender/receiver “handshake”:
v agree to establish connection (each knowing the other willing
to establish connection)
v agree on connection parameters
client side (application above network):
v connection state: ESTAB
v connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client
v Socket clientSocket = new Socket("hostname", "port number");
server side (application above network):
v connection state: ESTAB
v connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client
v Socket connectionSocket = welcomeSocket.accept();
Agreeing to establish a connection
2-way handshake:
v human analogy: “Let’s talk”, answered by “OK”
v client: choose x, send req_conn(x)
v server: on req_conn(x), ESTAB, send acc_conn(x)
v client: on acc_conn(x), ESTAB
Q: will 2-way handshake always work in network?
v variable delays
v retransmitted messages (e.g. req_conn(x)) due to message loss
v message reordering
v can’t “see” other side
Agreeing to establish a connection
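2-way handshake failure scenarios:
half open connection:
v client: choose x, send req_conn(x)
v server: ESTAB, send acc_conn(x); client: ESTAB
v client: retransmit req_conn(x)
v connection x completes: client terminates, server forgets x
v retransmitted req_conn(x) arrives at server: ESTAB
v half open connection! (no client!)
duplicate data accepted:
v client: choose x, send req_conn(x)
v server: ESTAB, send acc_conn(x); client: ESTAB
v client: send data(x+1); server: accept data(x+1)
v client: retransmit req_conn(x), retransmit data(x+1)
v connection x completes: client terminates, server forgets x
v retransmitted req_conn(x) arrives at server: ESTAB
v retransmitted data(x+1) arrives at server: accept data(x+1)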
TCP 3-way handshake
initially: client state CLOSED, server state LISTEN
v client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state SYNSENT
v server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1); server state SYN RCVD
v client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state ESTAB
v server: received ACK(y) indicates client is live; server state ESTAB
TCP 3-way handshake: FSM
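client transitions:
v closed -> SYN sent
§ event: Socket clientSocket = new Socket("hostname", "port number");
§ action: send SYN(seq=x)
v SYN sent -> ESTAB
§ event: receive SYNACK(seq=y,ACKnum=x+1)
§ action: send ACK(ACKnum=y+1)
server transitions:
v closed -> listen
§ event: Socket connectionSocket = welcomeSocket.accept();
§ action: Λ
v listen -> SYN rcvd
§ event: receive SYN(x)
§ action: send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
v SYN rcvd -> ESTAB
§ event: receive ACK(ACKnum=y+1)
§ action: Λ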
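The handshake states can be exercised with a small simulation. This is a sketch, not a protocol implementation: the class `Endpoint`, the dictionary-based segments and the chosen initial sequence numbers are all hypothetical, there is no loss or retransmission, and any segment that does not match the current state is simply ignored.

```python
class Endpoint:
    """One side of a TCP 3-way handshake (loss-free sketch)."""

    def __init__(self, state, isn):
        self.state = state
        self.isn = isn      # initial sequence number chosen by this side

    def connect(self):
        """Client: closed -> SYN sent, emitting SYN(seq=x)."""
        self.state = "SYN_SENT"
        return {"SYN": 1, "seq": self.isn}

    def receive(self, seg):
        """Apply one incoming segment; return the reply segment or None."""
        if self.state == "LISTEN" and seg.get("SYN"):
            self.state = "SYN_RCVD"
            return {"SYN": 1, "ACK": 1, "seq": self.isn, "acknum": seg["seq"] + 1}
        if (self.state == "SYN_SENT" and seg.get("SYN")
                and seg.get("acknum") == self.isn + 1):
            self.state = "ESTAB"
            return {"ACK": 1, "acknum": seg["seq"] + 1}
        if (self.state == "SYN_RCVD" and seg.get("ACK")
                and seg.get("acknum") == self.isn + 1):
            self.state = "ESTAB"
        return None         # nothing to send, or segment ignored


client = Endpoint("CLOSED", isn=1000)   # x = 1000
server = Endpoint("LISTEN", isn=5000)   # y = 5000
syn = client.connect()
synack = server.receive(syn)
ack = client.receive(synack)
server.receive(ack)
```

Each side reaches ESTAB only after seeing its own initial sequence number acknowledged, which is the check the 2-way handshake lacks.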
TCP: closing a connection
v client, server each close their side of
connection
§ send TCP segment with FIN bit = 1
v respond to received FIN with ACK
§ on receiving FIN, ACK can be combined with own
FIN
v simultaneous FIN exchanges can be handled
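TCP: closing a connection
initially: client state ESTAB, server state ESTAB
v client: clientSocket.close(); send FINbit=1, seq=x; client state FIN_WAIT_1 (can no longer send but can receive data)
v server: send ACKbit=1; ACKnum=x+1; server state CLOSE_WAIT (can still send data)
v client: on receiving the ACK, client state FIN_WAIT_2 (wait for server close)
v server: send FINbit=1, seq=y; server state LAST_ACK (can no longer send data)
v client: send ACKbit=1; ACKnum=y+1; client state TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
v server: on receiving the ACK, server state CLOSED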
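The teardown sequence can be written as two transition tables, one per side. The sketch below assumes the client closes first; the table names, event labels and the helper `run` are hypothetical, and simultaneous close is not modelled.

```python
# (state, event) -> (next state, segment sent or None)
CLIENT_CLOSE = {
    ("ESTAB", "close()"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "FIN"): ("TIMED_WAIT", "ACK"),
    ("TIMED_WAIT", "2*MSL timer expires"): ("CLOSED", None),
}
SERVER_CLOSE = {
    ("ESTAB", "FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "close()"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "ACK"): ("CLOSED", None),
}


def run(table, state, events):
    """Feed events through a transition table; return visited states and segments sent."""
    visited, sent = [state], []
    for event in events:
        state, segment = table[(state, event)]
        visited.append(state)
        if segment is not None:
            sent.append(segment)
    return visited, sent


client_states, client_sent = run(
    CLIENT_CLOSE, "ESTAB", ["close()", "ACK", "FIN", "2*MSL timer expires"])
server_states, server_sent = run(
    SERVER_CLOSE, "ESTAB", ["FIN", "close()", "ACK"])
```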
Summary
v TCP
v Reliable data transfer
v Connection management