Advanced Networks Transport layer: TCP
Dr. | Lecturer School of Computer Science
rdt3.0 in action
[Figure: rdt3.0 sender/receiver timelines for two cases: (c) ACK loss; (d) premature timeout/ delayed ACK. In both cases the sender resends pkt1 and the receiver detects the duplicate pkt1 and sends ack1 again; in (d) the sender receives ack1 a second time and does nothing.]
Performance of rdt3.0
› rdt3.0 is correct, but performance stinks
› e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
D_trans = L / R = 8000 bits / 10^9 bits/sec = 8 microsecs
§ U_sender: utilization – fraction of time sender busy sending
U_sender = (L / R) / (RTT + L / R) = .008 / 30.008 = 0.00027
§ if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link
v network protocol limits use of physical resources!
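The stop-and-wait numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration, and the values (8000 bit packet, 1 Gbps link, 30 msec RTT) are the ones from the slide.

```python
def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time the sender is busy: (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / link_bps
    return d_trans / (rtt_s + d_trans)


def stop_and_wait_throughput_bps(packet_bits, link_bps, rtt_s):
    """One packet of L bits is delivered every RTT + L/R seconds."""
    return packet_bits / (rtt_s + packet_bits / link_bps)


L_BITS = 8000      # packet length L
R_BPS = 1e9        # link rate R (1 Gbps)
RTT_S = 0.030      # 2 x 15 ms propagation delay

u = stop_and_wait_utilization(L_BITS, R_BPS, RTT_S)
thruput = stop_and_wait_throughput_bps(L_BITS, R_BPS, RTT_S)
print(f"U_sender = {u:.5f}")                          # 0.00027
print(f"throughput = {thruput / 8 / 1000:.1f} kB/sec")  # 33.3
```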
rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline. Sender: first packet bit transmitted, t = 0; last packet bit transmitted, t = L / R; ACK arrives, send next packet, t = RTT + L / R. Receiver: first packet bit arrives; last packet bit arrives, send ACK.]
U_sender = (L / R) / (RTT + L / R) = .008 / 30.008 = 0.00027
Pipelined protocols
pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
– range of sequence numbers must be increased
– buffering at sender and/or receiver
› two generic forms of pipelined protocols: go-Back-N, selective repeat
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight. Sender: first packet bit transmitted, t = 0; last bit transmitted, t = L / R; ACK arrives, send next packet, t = RTT + L / R. Receiver: first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK.]
3-packet pipelining increases utilization by a factor of 3!
U_sender = (3L / R) / (RTT + L / R) = .024 / 30.008 ≈ 0.0008
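A small sketch of how utilization grows with the number of packets kept in flight, using the same link parameters as the stop-and-wait example. The function name and the cap at 1.0 (the sender cannot be busy more than all of the time) are assumptions added for illustration.

```python
def pipelined_utilization(n_packets, packet_bits, link_bps, rtt_s):
    """Sender utilization with n_packets in flight: (N * L/R) / (RTT + L/R), at most 1."""
    d_trans = packet_bits / link_bps
    return min(1.0, n_packets * d_trans / (rtt_s + d_trans))


# 8000 bit packets, 1 Gbps link, 30 msec RTT
for n in (1, 3, 1000, 4000):
    u = pipelined_utilization(n, 8000, 1e9, 0.030)
    print(f"N={n}: U_sender = {u:.5f}")
```

With these numbers the link is only kept fully busy once several thousand packets are in flight, which is why the window size matters so much on long, fast links.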
Pipelined protocols: overview
Go-back-N:
›sender can have up to N unacked packets in pipeline
› receiver only sends cumulative ack
– does not ack packet if there is a gap
› sender has timer for oldest unacked packet
– when timer expires, retransmit all unacked packets
Selective Repeat:
›sender can have up to N unacked packets in pipeline
› receiver sends individual ack for each packet
› sender maintains timer for each unacked packet
– when timer expires, retransmit only that unacked packet
Go-Back-N: sender
› “window” of up to N, consecutive unacked pkts allowed
v ACK(n): ACKs all pkts up to, including seq # n – “cumulative ACK”
§ may receive duplicate ACKs (see receiver)
v timer for oldest in-flight pkt
v timeout(n): retransmit packet n and all higher seq # pkts in window
GBN: sender extended FSM
initially:
    base=1
    nextseqnum=1

rdt_send(data):
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)

timeout:
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    ...
    udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer

rdt_rcv(rcvpkt) && corrupt(rcvpkt):
    (no action)
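The sender FSM can be sketched as a small Python class. This is an illustration under stated assumptions, not a full protocol: the channel is a hypothetical `udt_send` callback, the timer is reduced to a boolean flag, checksums are omitted, and `max()` is added so that an old ACK can never move `base` backwards.

```python
class GBNSender:
    """Go-Back-N sender mirroring the extended FSM (timer is only a flag here)."""

    def __init__(self, window_size, udt_send):
        self.N = window_size
        self.base = 1                 # oldest unACKed seq #
        self.nextseqnum = 1           # next seq # to use
        self.sndpkt = {}              # seq # -> packet kept for retransmission
        self.udt_send = udt_send      # hypothetical channel callback
        self.timer_running = False

    def rdt_send(self, data):
        """Send data if the window has room; return False to refuse it."""
        if self.nextseqnum >= self.base + self.N:
            return False
        pkt = (self.nextseqnum, data)
        self.sndpkt[self.nextseqnum] = pkt
        self.udt_send(pkt)
        if self.base == self.nextseqnum:
            self.timer_running = True     # start_timer for the oldest pkt
        self.nextseqnum += 1
        return True

    def timeout(self):
        """Restart the timer and resend every unACKed packet in the window."""
        self.timer_running = True
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(self.sndpkt[seq])

    def rdt_rcv_ack(self, acknum):
        """Cumulative ACK: everything up to and including acknum is done."""
        new_base = max(self.base, acknum + 1)   # guard added for this sketch
        for seq in range(self.base, new_base):
            self.sndpkt.pop(seq, None)
        self.base = new_base
        self.timer_running = self.base != self.nextseqnum


sent = []
sender = GBNSender(window_size=4, udt_send=sent.append)
for data in "abcde":
    sender.rdt_send(data)      # the fifth call is refused: window full
sender.rdt_rcv_ack(2)          # pkts 1 and 2 are acknowledged
sender.timeout()               # resends pkts 3 and 4
print([seq for seq, _ in sent])
```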
GBN: receiver extended FSM
initially:
    expectedseqnum=1

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum):
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

default:
    udt_send(sndpkt)

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
- may generate duplicate ACKs
- need only remember expectedseqnum
› out-of-order pkt:
- discard (don’t buffer): no receiver buffering!
- re-ACK pkt with highest in-order seq #
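A matching sketch of the receiver side shows how little state it needs. Assumptions for illustration: packets arrive as a sequence number plus data, corruption is passed in as a flag, and an ACK number of 0 stands for "nothing received in order yet".

```python
class GBNReceiver:
    """Go-Back-N receiver: remembers only expectedseqnum and the last ACK sent."""

    def __init__(self):
        self.expectedseqnum = 1
        self.last_ack = 0        # assumption: 0 means no in-order pkt yet
        self.delivered = []      # data handed to the upper layer, in order

    def rdt_rcv(self, seqnum, data, corrupt=False):
        """Process one packet and return the ACK number to send back."""
        if not corrupt and seqnum == self.expectedseqnum:
            self.delivered.append(data)
            self.last_ack = seqnum
            self.expectedseqnum += 1
        # default case: out-of-order or corrupt pkt is discarded,
        # and the highest in-order seq # is re-ACKed
        return self.last_ack


rcvr = GBNReceiver()
arrivals = [(1, "a"), (2, "b"), (4, "d"), (5, "e"), (3, "c"), (4, "d")]
print([rcvr.rdt_rcv(seq, data) for seq, data in arrivals])
```

In this run pkt 3 arrives late, so pkts 4 and 5 are discarded and ACK 2 is repeated until pkt 3 shows up.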
GBN in action
[Figure: sender window (N=4) sliding over sequence numbers 0-8; pkt2 is lost]
sender: send pkt0; send pkt1; send pkt2 (X loss); send pkt3 (wait); rcv ack0, send pkt4; rcv ack1, send pkt5; ignore duplicate ACK; pkt 2 timeout; send pkt2, send pkt3, send pkt4, send pkt5
receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, discard, (re)send ack1; receive pkt4, discard, (re)send ack1; receive pkt5, discard, (re)send ack1; rcv pkt2, deliver, send ack2; rcv pkt3, deliver, send ack3; rcv pkt4, deliver, send ack4; rcv pkt5, deliver, send ack5
Selective repeat
› receiver individually acknowledges all correctly received pkts
- buffers pkts, as needed, for eventual in-order delivery to upper layer
› sender only resends pkts for which ACK not received
- sender timer for each unACKed pkt
› sender window
› receiver window
Selective repeat: sender, receiver windows
Selective repeat
sender
data from above:
› if next available seq # in window, send pkt
timeout(n):
› resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
› mark pkt n as received
› if n is smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
› send ACK(n)
› out-of-order: buffer
› in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
› ACK(n)
otherwise:
› ignore
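The receiver-side events can be sketched as follows. Assumptions for illustration: sequence numbers start at 0 and never wrap around, and a packet just below the window is re-ACKed while anything else is ignored, as in the usual description of selective repeat.

```python
class SRReceiver:
    """Selective repeat receiver: buffers out-of-order pkts, ACKs each one individually."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0         # smallest seq # not yet received
        self.buffer = {}         # seq # -> data waiting for in-order delivery
        self.delivered = []      # data handed to the upper layer, in order

    def receive(self, n, data):
        """Process pkt n; return the ACK number to send, or None to ignore."""
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer.setdefault(n, data)
            # deliver pkt rcvbase and any buffered pkts that directly follow it
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n < self.rcvbase:
            return n             # already delivered: ACK again, the old ACK may be lost
        return None


rcvr = SRReceiver(window_size=4)
for seq in (0, 1, 3, 4, 5, 2):           # pkt2 arrives last
    rcvr.receive(seq, f"pkt{seq}")
print(rcvr.delivered, rcvr.rcvbase)
```

Re-ACKing packets below the window matters because the sender cannot advance its own window until it has seen that ACK.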
Selective repeat in action
[Figure: sender window (N=4) and receiver window over sequence numbers 0-8; pkt2 is lost]
sender: send pkt0; send pkt1; send pkt2 (X loss); send pkt3 (wait); rcv ack0, send pkt4; rcv ack1, send pkt5; record ack3 arrived; pkt 2 timeout (resend pkt2); record ack4 arrived; record ack5 arrived
receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, buffer, send ack3; receive pkt4, buffer, send ack4; receive pkt5, buffer, send ack5; rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2
Q: what happens when ack2 arrives?
Connection-oriented Transport TCP
TCP: Overview RFCs: 793,1122,1323, 2018, 2581
› point-to-point:
- one sender, one receiver
› reliable, in-order byte stream
› pipelined:
- TCP congestion and flow control set window size
› full duplex data:
- bi-directional data flow in same connection
- MSS: maximum segment size
› connection-oriented:
- handshaking (exchange of control msgs) inits sender, receiver state before data exchange
› flow controlled:
- sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment layout]
fields: source port #, dest port #; sequence number; acknowledgement number; receive window; Internet checksum; Urg data pointer; options (variable length); application data (variable length)
annotations:
› URG: urgent data (generally not used)
› ACK: ACK # valid
› PSH: push data now (generally not used)
› RST, SYN, FIN: connection estab (setup, teardown commands)
› Internet checksum (as in UDP)
› sequence number, acknowledgement number: counted in bytes of data (not segments!)
› receive window: # bytes rcvr willing to accept
TCP seq. numbers, ACKs
sequence numbers:
- “number” of first byte in segment’s data
acknowledgements:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn’t say, up to implementor
- Most will store, but still use cumulative ACK
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, urg pointer]
[Figure: sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed (“in-flight”); usable but not yet sent; not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B)
› User types ‘C’: A sends Seq=4201, ACK=7901, data = ‘C’, 4201-4300
› host ACKs receipt of ‘C’, echoes back ‘C’: B sends Seq=7901, ACK=4301, data = ‘C’, 7901-8000
› host ACKs receipt of echoed ‘C’: A sends Seq=4301, ACK=8001, no data
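The ACK values in the telnet scenario follow directly from "seq # of next byte expected". A minimal sketch of that arithmetic, using the byte ranges from the scenario (the helper name `ack_for` is made up for illustration):

```python
def ack_for(seq, data_len):
    """Cumulative ACK: seq # of the next byte expected after this segment."""
    return seq + data_len


# Host A -> Host B: Seq=4201 carrying bytes 4201-4300
a_seq, a_len = 4201, 4300 - 4201 + 1
# Host B -> Host A: Seq=7901 carrying bytes 7901-8000
b_seq, b_len = 7901, 8000 - 7901 + 1

b_ack = ack_for(a_seq, a_len)   # ACK field in B's reply to A
a_ack = ack_for(b_seq, b_len)   # ACK field in A's final, data-less segment
print(b_ack, a_ack)             # 4301 8001
```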
TCP round trip time, timeout
Q: how to set TCP timeout value?
› longer than RTT - but RTT varies
› too short: premature timeout, unnecessary retransmissions
› too long: slow reaction to segment loss
Q: how to estimate RTT?
› SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions
› SampleRTT will vary, want estimated RTT “smoother”
- weighted average of several recent measurements, not just current SampleRTT
Ignore retransmissions
[Figure: two Host A / Host B timelines, each with a first attempt, a retransmission and an ACK. It is ambiguous whether SampleRTT should be measured from the first attempt or from the retransmission.]
TCP round trip time, timeout
EstimatedRTT = (1- a)*EstimatedRTT + a*SampleRTT
v exponential weighted moving average
v influence of past sample decreases exponentially fast
v typical value: a = 0.125
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT plotted as RTT (milliseconds) against time (seconds).]
TCP round trip time, timeout
› timeout interval: EstimatedRTT plus “safety margin”
- large variation in EstimatedRTT -> larger safety margin
› estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1-b)*DevRTT + b*|SampleRTT-EstimatedRTT|
(typically, b = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
(EstimatedRTT: estimated RTT; 4*DevRTT: “safety margin”)
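The three formulas (EstimatedRTT, DevRTT, TimeoutInterval) fit into a small estimator. This is a sketch under assumptions that the slides do not state: the first sample initialises EstimatedRTT and half of it initialises DevRTT, and DevRTT is updated using the EstimatedRTT from before the current sample, which is how RFC 6298 orders the two updates.

```python
class RTTEstimator:
    """EWMA estimate of RTT and its deviation, giving a timeout interval."""

    def __init__(self, first_sample, a=0.125, b=0.25):
        self.a = a
        self.b = b
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial value

    def update(self, sample_rtt):
        """Feed one SampleRTT (never from a retransmitted segment)."""
        # deviation first, using the previous EstimatedRTT (assumed ordering)
        self.dev_rtt = (1 - self.b) * self.dev_rtt + self.b * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.a) * self.estimated_rtt + self.a * sample_rtt
        return self.timeout_interval

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RTTEstimator(first_sample=100.0)       # milliseconds
for sample in (200.0, 120.0, 110.0):
    timeout = est.update(sample)
    print(f"sample={sample} EstimatedRTT={est.estimated_rtt:.1f} "
          f"DevRTT={est.dev_rtt:.1f} TimeoutInterval={timeout:.1f}")
```

A single outlier such as the 200 ms sample moves EstimatedRTT only slightly but widens the safety margin noticeably, which is the intended behaviour.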
Reliable Data Transfer in TCP
TCP reliable data transfer
› TCP creates rdt service on top of IP’s unreliable service
– pipelined segments
– cumulative acks
– single retransmission timer
› retransmissions triggered by:
– timeout events
– duplicate acks
let’s initially consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control
TCP sender events
data rcvd from app:
› create segment with seq #
› seq # is byte-stream number of first data byte in segment
› start timer if not already running
– think of timer as for oldest unacked segment
– expiration interval: TimeOutInterval
timeout:
› retransmit segment that caused timeout
› restart timer
ack rcvd:
› if ack acknowledges previously unacked segments
– update what is known to be ACKed
– start timer if there are still unacked segments
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

wait for event

data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., “send”)
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer

timeout:
    retransmit not-yet-acked segment with smallest seq. #
    start timer

ACK received, with ACK field value y:
    if (y > SendBase) {
        SendBase = y
        /* SendBase–1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else stop timer
    }
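The simplified sender translates almost line for line into Python. As a sketch under assumptions: the network is a hypothetical `send(seq, data)` callback, the timer is a boolean flag rather than a real clock, and duplicate ACKs, flow control and congestion control are ignored, as in the simplified description.

```python
class SimpleTCPSender:
    """Simplified TCP sender: cumulative ACKs, one retransmission timer."""

    def __init__(self, initial_seq_num, send):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq # of first byte -> segment data
        self.send = send             # hypothetical "pass segment to IP" callback
        self.timer_running = False

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.send(seq, data)
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)      # not-yet-acked segment with smallest seq #
        self.send(seq, self.unacked[seq])
        self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # send_base - 1: last cumulatively ACKed byte
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


log = []
sender = SimpleTCPSender(92, lambda seq, data: log.append((seq, len(data))))
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes of data
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes of data
sender.on_timeout()              # premature timeout: only Seq=92 is resent
sender.on_ack(120)               # cumulative ACK covers both segments
print(log, sender.send_base)
```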
TCP: retransmission scenarios
[Figure: three Host A / Host B timelines]
› lost ACK scenario: A sends Seq=92, 8 bytes of data; the ACK is lost (X); A times out and resends Seq=92, 8 bytes of data
› premature timeout: A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data; A times out before ACK=100 and ACK=120 arrive and resends Seq=92, 8 bytes of data; SendBase moves from 92 to 100 to 120
› cumulative ACK: A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data; ACK=100 is lost (X) but ACK=120 arrives; A continues with Seq=120, 15 bytes of data
TCP ACK generation [RFC 1122, RFC 2581]
event at receiver -> TCP receiver action
› arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
› arrival of in-order segment with expected seq #. One other segment has ACK pending -> immediately send single cumulative ACK, ACKing both in-order segments
› arrival of out-of-order segment with higher-than-expected seq. #. Gap detected -> immediately send duplicate ACK, indicating seq. # of next expected byte
› arrival of segment that partially or completely fills gap -> immediately send ACK, provided that segment starts at lower end of gap
Too many ACKs
[Figure: Host A / Host B timelines illustrating a cumulative ACK]
Need to ack this one?
TCP fast retransmit
› time-out period often relatively long:
– long delay before resending lost packet
› detect lost segments via duplicate ACKs.
– sender often sends many segments back-to-back
– if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit
if sender receives 3 duplicate ACKs for same data
(“triple duplicate ACKs”), resend unacked segment with smallest seq #
§ likely that unacked segment lost, so don’t wait for timeout
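The triple-duplicate-ACK rule can be sketched as a counter on the sender side. The class and method names are made up for illustration; the sketch assumes that the unacked segment with the smallest seq # starts at SendBase, which holds for the simplified sender.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals a fast retransmit on the third one."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the seq # to fast-retransmit, or None."""
        if y > self.send_base:        # new data ACKed: reset the counter
            self.send_base = y
            self.dup_acks = 0
            return None
        if y == self.send_base:       # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:
                return self.send_base # resend without waiting for the timeout
        return None


detector = DupAckDetector(send_base=92)
# the first ACK=100 is new; the next three are duplicates
print([detector.on_ack(100) for _ in range(4)])   # [None, None, None, 100]
```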
TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data; Host B keeps returning ACK=100; Host A resends Seq=100, 20 bytes of data. Caption: fast retransmit after sender receipt of triple duplicate ACK]
Flow Control in TCP
TCP flow control
› application may remove data from TCP socket buffers slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
[Figure: receiver protocol stack; segments from sender enter the TCP socket receiver buffers in the OS and are read by the application process]
TCP flow control
› receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
– RcvBuffer size set via socket options (typical default is 4096 bytes)
– many operating systems autoadjust RcvBuffer
› sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
› guarantees receive buffer will not overflow
[Figure: receiver-side buffering; RcvBuffer holds buffered data (TCP segment payloads on their way to the application process) and free buffer space, which is advertised as rwnd]
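Both sides of the rwnd rule can be sketched as two small functions. The byte counters (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) are names assumed here for illustration; the slides only state that rwnd is the free buffer space and that the sender keeps its in-flight data below it.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: free space = buffer size minus data not yet read by the app."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def usable_window(rwnd, last_byte_sent, last_byte_acked):
    """Sender: bytes it may still send without exceeding rwnd in flight."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


# 4096-byte buffer, app has read up to byte 1000, data received up to byte 3000
rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
# sender has 1000 bytes in flight
print(rwnd, usable_window(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```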
Connection Management in TCP
Connection Management
before exchanging data, sender/receiver “handshake”:
› agree to establish connection (each knowing the other willing to establish connection)
› agree on connection parameters
[Figure: client and server applications, each holding connection state: ESTAB and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
Agreeing to establish a connection
2-way handshake:
[Figure: “Let’s talk” answered by “OK”; req_conn(x) answered by acc_conn(x)]
Q: will 2-way handshake always work in network?
› variable delays
› retransmitted messages (e.g. req_conn(x)) due to message loss
› message reordering
› can’t “see” other side
Agreeing to establish a connection
2-way handshake failure scenarios:
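[Figure: two client/server timelines]
› scenario 1: req_conn(x), acc_conn(x); client retransmits req_conn(x); connection x completes, client terminates, server forgets x; the retransmitted req_conn(x) then arrives: half open connection! (no client!)
› scenario 2: req_conn(x), acc_conn(x), data(x+1), accept data(x+1); client retransmits req_conn(x) and data(x+1); connection x completes, client terminates, server forgets x; the retransmitted req_conn(x) and data(x+1) then arrive and the server accepts data(x+1) again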
TCP 3-way handshake
[Figure: client state / server state timeline]
› client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x)
› server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1)
› client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data
› server: received ACK(y) indicates client is live
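The three segments of the handshake can be written out as plain dictionaries to make the numbering explicit. This is a sketch, not real socket code: the field names are made up for illustration, and the arithmetic is done modulo 2^32 because TCP sequence numbers are 32 bits wide.

```python
import random

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit


def three_way_handshake(x=None, y=None):
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    x = random.randrange(SEQ_SPACE) if x is None else x   # client's choice
    y = random.randrange(SEQ_SPACE) if y is None else y   # server's choice
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": (x + 1) % SEQ_SPACE}
    ack = {"ACKbit": 1, "ACKnum": (y + 1) % SEQ_SPACE}
    return syn, synack, ack


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

Because each side must echo the other's freshly chosen number plus one, a delayed duplicate req_conn from an old connection cannot complete the exchange on its own.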
TCP: closing a connection
› client, server each closes their side of connection – send TCP segment with FIN bit = 1
› respond to received FIN with ACK
[Figure: client state / server state timeline]
› client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1; can no longer send but can receive data
› server: replies ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT; can still send data
› client: enters FIN_WAIT_2; wait for server close
› server: sends FINbit=1, seq=y; can no longer send data
› client: replies ACKbit=1; ACKnum=y+1; enters TIMED_WAIT; timed wait for 2*max segment lifetime
Principles of Congestion Control
Principles of congestion control
congestion:
› informally: “too many sources sending too much data too fast for network to handle”
› different from flow control!
› manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
› a top-10 problem!
Causes/costs of congestion: scenario 1
› two senders, two receivers
› one router, infinite buffers
› output link capacity: R
› no retransmission
[Figure: original data: λin; throughput: λout; unlimited shared output link buffers]
› maximum per-connection throughput: R/2
› large delays as arrival rate, λin, approaches capacity
Causes/costs of congestion: scenario 2
› one router, finite buffers
› sender retransmission of timed-out packet
– application-layer input = application-layer output: λin = λout (goodput)
– transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: λin: original data; λ'in: original data, plus retransmitted data; finite shared output link buffers]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
› sender sends only when router buffers available
[Figure: λin: original data; λ'in: original data, plus retransmitted data; finite shared output link buffers with free buffer space]
Causes/costs of congestion: scenario 2
Idealization: known loss
› packets can be lost, dropped at router due to full buffers
› sender only resends if packet known to be lost
[Figure: λin: original data; λ'in: original data, plus retransmitted data; λout. A packet that arrives when there is no buffer space is dropped and is resent once buffer space is free.]
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2
Causes/costs of congestion: scenario 2
Realistic: duplicates
› packets can be lost, dropped at router due to full buffers
› sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!