6.Transport_Part2
COMP 3331/9331:
Computer Networks and
Applications
Week 5
Transport Layer (Continued)
Reading Guide: Chapter 3, Sections: 3.5 – 3.7
2
Transport Layer Outline
3.1 transport-layer
services
3.2 multiplexing and
demultiplexing
3.3 connectionless
transport: UDP
3.4 principles of reliable
data transfer
3.5 connection-oriented
transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion
control
3.7 TCP congestion control
Practical Reliability Questions
v How do the sender and receiver keep track of
outstanding pipelined segments?
v How many segments should be pipelined?
v How do we choose sequence numbers?
v What does connection establishment and teardown
look like?
v How should we choose timeout values?
3
TCP: Overview RFCs: 793,1122,1323, 2018, 2581
v full duplex data:
§ bi-directional data flow
in same connection
§ MSS: maximum segment
size
v connection-oriented:
§ handshaking (exchange
of control msgs) inits
sender, receiver state
before data exchange
v flow controlled:
§ sender will not
overwhelm receiver
v point-to-point:
§ one sender, one receiver
v reliable, in-order byte
stream:
§ no “message boundaries”
v pipelined:
§ TCP congestion and flow
control set window size
v send and receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes data
application
reads data
4
TCP segment structure
source port # dest port #
32 bits
application
data
(variable length)
sequence number
acknowledgement number
receive window
Urg data pointerchecksum
FSRPAUheadlen
not
used
options (variable length)
URG: urgent data
(generally not used)
ACK: ACK #
valid
PSH: push data now
(generally not used)
RST, SYN, FIN:
connection estab
(setup, teardown
commands)
# bytes
rcvr willing
to accept
counting
by bytes
of data
(not segments!)
Internet
checksum
(as in UDP)
5
TCP segment structure
6
TCP Segments
source port # dest port #
32 bits
application
data
(variable length)
Urg data pointer
F S R P A U
head
len
not
used
checksum
receive window
sequence number
acknowledgement number
options (variable length)
20 Bytes
(UDP was 8)
Transport Layer Outline
3.1 transport-layer
services
3.2 multiplexing and
demultiplexing
3.3 connectionless
transport: UDP
3.4 principles of reliable
data transfer
3.5 connection-oriented
transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion
control
3.7 TCP congestion control
7
Recall: Components of a solution for
reliable transport
v Checksums (for error detection)
v Timers (for loss detection)
v Acknowledgments
§Cumulative
§ Selective
v Sequence numbers (duplicates, windows)
v Sliding Windows (for efficiency)
§Go-Back-N (GBN)
§ Selective Repeat (SR)
8
What does TCP do?
Many of our previous ideas, but some key
differences
v Checksum
9
10
TCP Header
Source port Destination port
Sequence number
Acknowledgment
Receive windowHdrLen Flags0
Checksum Urgent pointer
Options (variable)
Data
Computed
over header
and data
(SAME AS UDP)
What does TCP do?
Many of our previous ideas, but some key
differences
v Checksum
v Sequence numbers are byte offsets
11
TCP “Stream of Bytes” Service ..
B
yte 0
B
yte 1
B
yte 2
B
yte 3
B
yte 0
B
yte 1
B
yte 2
B
yte 3
Application @ Host A
Application @ Host B
B
yte 80
B
yte 80
12
.. Provided Using TCP “Segments”
B
yte 0
B
yte 1
B
yte 2
B
yte 3
B
yte 0
B
yte 1
B
yte 2
B
yte 3
Host A
Host B
B
yte 80
TCP Data
TCP Data
B
yte 80
Segment sent when:
1. Segment full (Max Segment Size),
2. Not full, but instructed by the Application e.g.,
1 byte in Telnet
13
TCP Maximum Segment Size
v IP packet
§ No bigger than Maximum Transmission Unit (MTU)
§ E.g., up to 1500 bytes with Ethernet
v TCP packet
§ IP packet with a TCP header and data inside
§ TCP header ³ 20 bytes long
v TCP segment
§ No more than Maximum Segment Size (MSS) bytes
§ E.g., up to 1460 consecutive bytes from the stream
§ MSS = MTU – 20 (min IP header ) – 20 ( min TCP header )
IP Hdr
IP Data
TCP HdrTCP Data (segment)
14
MTU
Sequence Numbers
Host A
ISN (initial sequence number)
Sequence number
= 1st byte in segment =
ISN + k
k bytes
15
Sequence numbers:
• byte stream “number” of first byte in
segment’s data
Sequence & Ack Numbers
Host B
TCP Data
TCP Data
TCP
HDR
TCP
HDR
ACK sequence number
= next expected byte
= seqno + length(data)
Host A
ISN (initial sequence number)
Sequence number
= 1st byte in segment =
ISN + k
k
16
17
Example
Note: Connection establishment not shown. Alice’s end point selects the initial
sequence number as 0 while Bob’s end point selects the initial sequence number as 10
18
Another Example
Note: Connection establishment not shown. Alice’s end point selects the initial
sequence number as 0 while Bob’s end point selects the initial sequence number as 10
HTTP response split into 3 segments (MSS = 1500 bytes)
Why choose random ISN?
v Avoids ambiguity with back-to-back connections
between same end-points
(a) When ISN=0 (b) When ISN is random
v Potential security issue if the ISN is known
3-19
What does TCP do?
Most of our previous tricks, but a few differences
v Checksum
v Sequence numbers are byte offsets
v Receiver sends cumulative acknowledgements (like GBN)
20
ACKing and Sequence Numbers
v Sender sends packet
§ Data starts with sequence number X
§ Packet contains B bytes [X, X+1, X+2, ….X+B-1]
v Upon receipt of packet, receiver sends an ACK
§ If all data prior to X already received:
• ACK acknowledges X+B (because that is next expected byte)
§ If highest in-order byte received is Y s.t. (Y+1) < X
• ACK acknowledges Y+1
• Even if this has been ACKed before
21
22
TCP seq. numbers, ACKs
Host BHost A
Seq=100, Data=50
ISN=100
Seq=150, Data=50
Seq=200, Data=50
Seq=250, Data=50
ACK=200, Received 150-199
Seq=???, Data=50
Seq 100 to Seq 149
ACK=150, Received 100-149
ACK=250, Received 200-249
ACK=300, Received 250-299
23
TCP seq. numbers, ACKs
Host BHost A
Seq=100, Data=50
ISN=100
Seq=150, Data=50
Seq=200, Data=50
Seq=250, Data=50
ACK=150, Received 100-149
ACK=???, Received 200-249
ACK=???, Received 250-299
Both ACKs would be 150, as that
is the next expected seq number
Normal Pattern
v Sender: seqno=X, length=B
v Receiver: ACK=X+B
v Sender: seqno=X+B, length=B
v Receiver: ACK=X+2B
v Sender: seqno=X+2B, length=B
v Seqno of next packet is same as last ACK field
24
Packet Loss
v Sender: seqno=X, length=B
v Receiver: ACK=X+B
v Sender: seqno=X+B, length=B
v Sender: seqno=X+2B, length=B
v Receiver: ACK = X+B
LOST
25
26
TCP Header
Source port Destination port
Sequence number
Acknowledgment
Receive windowHdrLen Flags0
Checksum Urgent pointer
Options (variable)
Data
Acknowledgment
gives seqno just
beyond highest
seqno received in
order
(“What Byte
is Next”)
Piggybacking
v So far, we’ve assumed
distinct “sender” and
“receiver” roles
v In reality, usually both
sides of a connection
send some data
27
Piggybacking
• So far, we’ve assumed
distinct “sender” and
“receiver” roles
• In reality, usually both
sides of a connection
send some data
– request/response is a
common pattern
Client Server
Without
Piggybacking
…
Client Server
With
Piggybacking
…
Quiz
Seq = ?, 2 KBytes of data
ACK = ?
ACK
= ?
Seq =
1024
, 1 KB
yte of
data
Seq = 101, 2 KBytes of data
ACK = 101 + 2048 = 2149
ACK = 1024 + 1024 = 2048
Seq = 2149
28
What does TCP do?
Most of our previous tricks, but a few differences
v Checksum
v Sequence numbers are byte offsets
v Receiver sends cumulative acknowledgements (like GBN)
v Receivers can buffer out-of-sequence packets (like SR)
29
Loss with cumulative ACKs
v Sender sends packets with 100Bytes and
sequence numbers:
§ 100, 200, 300, 400, 500, 600, 700, 800, 900, …
v Assume the fifth packet (seq. no. 500) is lost,
but no others
v Stream of ACKs will be:
§ 200, 300, 400, 500, 500, 500, 500,…
30
What does TCP do?
Most of our previous tricks, but a few differences
v Checksum
v Sequence numbers are byte offsets
v Receiver sends cumulative acknowledgements (like GBN)
v Receivers do not drop out-of-sequence packets (like SR)
v Sender maintains a single retransmission timer (like GBN) and
retransmits on timeout (how much?)
31
32
TCP round trip time, timeout
TCP round trip time, timeout
Q: how to set TCP
timeout value?
v longer than RTT
§ but RTT varies
v too short: premature
timeout, unnecessary
retransmissions
v too long: slow reaction
to segment loss and
connection has lower
throughput
Q: how to estimate RTT?
v SampleRTT: measured
time from segment
transmission until ACK
receipt
§ ignore retransmissions
v SampleRTT will vary, want
estimated RTT “smoother”
§ average several recent
measurements, not just
current SampleRTT
33
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T
(m
ill
is
ec
on
ds
)
SampleRTT Estimated RTT
EstimatedRTT = (1- a)*EstimatedRTT + a*SampleRTT
v exponential weighted moving average
v influence of past sample decreases exponentially fast
v typical value: a = 0.125
TCP round trip time, timeout
RT
T
(m
ill
is
ec
on
ds
)
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
sampleRTT
EstimatedRTT
time (seconds) 34
v timeout interval: EstimatedRTT plus “safety margin”
§ large variation in EstimatedRTT -> larger safety margin
v estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1-b)*DevRTT +
b*|SampleRTT-EstimatedRTT|
TCP round trip time, timeout
(typically, b = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
estimated RTT “safety margin”
35
Practice Problem:
http://wps.pearsoned.com/ecs_kurose_compnetw_6/216/55463/14198700.cw/index.html
36
TCP round trip time, timeout
TimeoutInterval = EstimatedRTT + 4*DevRTT
estimated RTT “safety margin”
(EstimatedRTT+4*DevRTT)
DevRTT
EstimatedRTT
RTT
Figure: Credits Prof David Wetherall UoW
Why exclude retransmissions in RTT
computation?
v How do we differentiate between the real ACK, and ACK of
the retransmitted packet?
ACK
Retransmission
Original Transmission
Sa
m
pl
eR
T
T
Sender Receiver
ACK
Retransmission
Original Transmission
SampleRTT
Sender Receiver
37
TCP sender events:
data rcvd from app:
v create segment with
seq #
v seq # is byte-stream
number of first data
byte in segment
v start timer if not
already running
§ think of timer as for
oldest unacked
segment
§ expiration interval:
TimeOutInterval
timeout:
v retransmit segment
that caused timeout
v restart timer
ack rcvd:
v if ack acknowledges
previously unacked
segments
§ update what is known
to be ACKed
§ start timer if there are
still unacked segments
38
PUTTING IT
TOGETHER
TCP sender (simplified)
wait
for
event
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
L
create segment, seq. #: NextSeqNum
pass segment to IP (i.e., “send”)
NextSeqNum = NextSeqNum + length(data)
if (timer currently not running)
start timer
data received from application above
retransmit not-yet-acked segment
with smallest seq. #
start timer
timeout
if (y > SendBase) {
SendBase = y
/* SendBase–1: last cumulatively ACKed byte */
if (there are currently not-yet-acked segments)
start timer
else stop timer
}
ACK received, with ACK field value y
39
PUTTING IT
TOGETHER
TCP: retransmission scenarios
lost ACK scenario
Host BHost A
Seq=92, 8 bytes of data
ACK=100
Seq=92, 8 bytes of data
Xtim
eo
ut
ACK=100
premature timeout
Host BHost A
Seq=92, 8 bytes of data
ACK=100
Seq=92, 8
bytes of data
tim
eo
ut
ACK=120
Seq=100, 20 bytes of data
ACK=120
SendBase=100
SendBase=120
SendBase=120
SendBase=92
40
TCP: retransmission scenarios
X
cumulative ACK
Host BHost A
Seq=92, 8 bytes of data
ACK=100
Seq=120, 15 bytes of data
tim
eo
ut
Seq=100, 20 bytes of data
ACK=120
41
X
cumulative ACK
Host BHost A
Seq=92, 8 bytes of data
tim
eo
ut
Seq=100, 20 bytes of data
ACK=?
ACK = 92
TCP ACK generation [RFC 1122, RFC 2581]
event at receiver
arrival of in-order segment with
expected seq #. All data up to
expected seq # already ACKed
arrival of in-order segment with
expected seq #. One other
segment has ACK pending
arrival of out-of-order segment
higher-than-expect seq. # .
Gap detected
arrival of segment that
partially or completely fills gap
TCP receiver action
delayed ACK. Wait up to 500ms
for next segment. If no next segment,
send ACK
immediately send single cumulative
ACK, ACKing both in-order segments
immediately send duplicate ACK,
indicating seq. # of next expected byte
immediate send ACK, provided that
segment starts at lower end of gap
42
What does TCP do?
Most of our previous tricks, but a few differences
v Checksum
v Sequence numbers are byte offsets
v Receiver sends cumulative acknowledgements (like GBN)
v Receivers may not drop out-of-sequence packets (like SR)
v Sender maintains a single retransmission timer (like GBN) and
retransmits on timeout
v Introduces fast retransmit: optimisation that uses duplicate
ACKs to trigger early retransmission
43
X
fast retransmit after sender
receipt of triple duplicate ACK
Host BHost A
Seq=92, 8 bytes of data
ACK=100
ACK=100
ACK=100
ACK=100
TCP fast retransmit
Seq=100, 20 bytes of data
Seq=100, 20 bytes of data
44
tim
eo
ut
TCP fast retransmit
v time-out period often
relatively long:
§ long delay before resending
lost packet
v “Duplicate ACKs” are a
sign of an isolated loss
§ The lack of ACK progress
means that packet hasn’t
been delivered
§ Stream of ACKs means some
packets are being delivered
§ Could trigger resend on
receiving “k” duplicate ACKs
(TCP uses k = 3)
if sender receives 3
duplicate ACKs for
same data
(“triple duplicate ACKs”),
resend unacked
segment with smallest
seq #
§ likely that unacked
segment is lost, so
don’t wait for timeout
TCP fast retransmit
45
What does TCP do?
Most of our previous ideas, but some key
differences
v Checksum
v Sequence numbers are byte offsets
v Receiver sends cumulative acknowledgements (like GBN)
v Receivers do not drop out-of-sequence packets (like SR)
v Sender maintains a single retransmission timer (like GBN) and
retransmits on timeout
v Introduces fast retransmit: optimization that uses duplicate
ACKs to trigger early retransmission
46
Quiz: TCP Sequence Numbers?
A TCP Sender is just about to send a segment of
size 100 bytes with sequence number 1234 and ack
number 436 in the TCP header. What is the highest
sequence number up to (and including) which this
sender has received all bytes from the receiver?
A. 1233
B. 436
C. 435
D. 1334
E. 536
47
www.zeetings.com/salil
ANSWER: C
Quiz: TCP Sequence Numbers?
A TCP Sender is just about to send a segment of
size 100 bytes with sequence number 1234 and ack
number 436 in the TCP header. Is it possible that
the receiver has received byte number 1335?
A. Yes
B. No
48
www.zeetings.com/salil
ANSWER: A
Quiz: TCP Sequence Numbers?
The following statement is true about the TCP
sliding window protocol for implementing reliable
data transfer
A. It exclusively uses the ideas of Go-Back-N
B. It exclusively uses the ideas of Selective Repeat
C. It uses a combination of ideas of Go-Back-N and
Selective-Repeat
D. It uses none of the ideas of Go-Back-N and
Selective-Repeat
49
www.zeetings.com/salil
ANSWER: C
Transport Layer Outline
3.1 transport-layer
services
3.2 multiplexing and
demultiplexing
3.3 connectionless
transport: UDP
3.4 principles of reliable
data transfer
3.5 connection-oriented
transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion
control
3.7 TCP congestion control
50
TCP flow control
application
process
TCP socket
receiver buffers
TCP
code
IP
code
application
OS
receiver protocol stack
application may
remove data from
TCP socket buffers ….
… slower than TCP
receiver is delivering
(sender is sending)
from sender
receiver controls sender, so
sender won’t overflow
receiver’s buffer by transmitting
too much, too fast
flow control
51
TCP flow control
buffered data
free buffer spacerwnd
RcvBuffer
TCP segment payloads
to application process
v receiver “advertises” free
buffer space by including
rwnd value in TCP header
of receiver-to-sender
segments
§ RcvBuffer size set via
socket options (typical default
is 4096 bytes)
§ many operating systems
autoadjust RcvBuffer
v sender limits amount of
unacked (“in-flight”) data to
receiver’s rwnd value
v guarantees receive buffer
will not overflow
receiver-side buffering
52
http://media.pearsoncmg.com/aw/aw_kurose_network_4/applets/flow/FlowControl.htm
53
TCP Header
Source port Destination port
Sequence number
Acknowledgment
Receive windowHdrLen Flags0
Checksum Urgent pointer
Options (variable)
Data
TCP flow control
v What if rwnd = 0?
§ Sender would stop sending data
§ Eventually the receive buffer would have space when
the application process reads some bytes
§ But how does the receiver advertise the new rwnd to
the sender?
v Sender keeps sending TCP segments with one
data byte to the receiver
v These segments are dropped but acknowledged
by the receiver with a zero-window size
v Eventually when the buffer empties, non-zero
window is advertised
54
Transport Layer Outline
3.1 transport-layer
services
3.2 multiplexing and
demultiplexing
3.3 connectionless
transport: UDP
3.4 principles of reliable
data transfer
3.5 connection-oriented
transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion
control
3.7 TCP congestion control
55
Connection Management
before exchanging data, sender/receiver “handshake”:
v agree to establish connection (each knowing the other willing
to establish connection)
v agree on connection parameters
connection state: ESTAB
connection variables:
seq # client-to-server
server-to-client
rcvBuffer size
at server,client
application
network
connection state: ESTAB
connection Variables:
seq # client-to-server
server-to-client
rcvBuffer size
at server,client
application
network
Socket clientSocket =
newSocket(“hostname”,”port
number”);
Socket connectionSocket =
welcomeSocket.accept();
56
TCP 3-way handshake
SYNbit=1, Seq=x
choose init seq num, x
send TCP SYN msg
ESTAB
SYNbit=1, Seq=y
ACKbit=1; ACKnum=x+1
choose init seq num, y
send TCP SYNACK
msg, acking SYN
Allocates buffer,
initialize variables
ACKbit=1, ACKnum=y+1
received SYNACK(x)
indicates server is live;
send ACK for SYNACK;
this segment may contain
client-to-server data
received ACK(y)
indicates client is live
SYNSENT
ESTAB
SYN RCVD
client state
CLOSED
server state
LISTEN
57
A B
SYN Consumes 1 Sequence No
Step 1: A’s Initial SYN Packet
A’s port B’s port
A’s Initial Sequence Number
(Irrelevant since ACK not set)
Advertised window5 Flags0
Checksum Urgent pointer
Options (variable)
Flags: SYN
ACK
FIN
RST
PSH
URG
A tells B it wants to open a connection…
Step 2: B’s SYN-ACK Packet
B’s port A’s port
B’s Initial Sequence Number
ACK = A’s ISN plus 1
Advertised window5 0
Checksum Urgent pointer
Options (variable)
Flags: SYN
ACK
FIN
RST
PSH
URG
B tells A it accepts, and is ready to hear the next byte…
… upon receiving this packet, A can start sending data
Flags
59
Step 3: A’s ACK of the SYN-ACK
A’s port B’s port
B’s ISN plus 1
Advertised window5 Flags0
Checksum Urgent pointer
Options (variable)
Flags: SYN
ACK
FIN
RST
PSH
URG
A tells B it’s likewise okay to start sending
A’s Initial Sequence Number+1
… upon receiving this packet, B can start sending data
60
TCP 3-way handshake: FSM
closed
L
listen
SYN
rcvd
SYN
sent
ESTAB
clientSocket.connect((serverName,
serverPort))
SYN(seq=x)
connectionSocket, addr =
welcomeSocket.accept();
SYN(x)
SYNACK(seq=y,ACKnum=x+1)
SYNACK(seq=y,ACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
L
61
welcomeSocket.listen(1);
What if the SYN Packet Gets Lost?
v Suppose the SYN packet gets lost
§ Packet is lost inside the network, or:
§ Server discards the packet (e.g., it’s too busy)
v Eventually, no SYN-ACK arrives
§ Sender sets a timer and waits for the SYN-ACK
§ … and retransmits the SYN if needed
v How should the TCP sender set the timer?
§ Sender has no idea how far away the receiver is
§ Hard to guess a reasonable length of time to wait
§ SHOULD (RFCs 1122,2988) use default of 3 second,
RFC 6298 use default of 1 second
62
SYN Loss and Web Downloads
v User clicks on a hypertext link
§ Browser creates a socket and does a “connect”
§ The “connect” triggers the OS to transmit a SYN
v If the SYN is lost…
§ 1-3 seconds of delay: can be very long
§ User may become impatient
§ … and click the hyperlink again, or click “reload”
v User triggers an “abort” of the “connect”
§ Browser creates a new socket and another “connect”
§ Essentially, forces a faster send of a new SYN packet!
§ Sometimes very effective, and the page comes quickly
63
TCP: closing a connection
v client, server each close their side of connection
§ send TCP segment with FIN bit = 1
v respond to received FIN with ACK
§ on receiving FIN, ACK can be combined with own FIN
v simultaneous FIN exchanges can be handled
64
FIN_WAIT_2
CLOSE_WAIT
FINbit=1, seq=y
ACKbit=1; ACKnum=y+1
ACKbit=1; ACKnum=x+1
wait for server
close
can still
send data
can no longer
send data
LAST_ACK
CLOSED
TIMED_WAIT
timed wait
for 2*max
segment lifetime
CLOSED
Normal Termination, One at a Time
FIN_WAIT_1 FINbit=1, seq=xcan no longer
send but can
receive data
clientSocket.close()
client state server state
ESTABESTAB
65
TIMED_WAIT: Can retransmit ACK if last ACK is lost
FIN Consumes 1 Sequence No
CLOSE_WAIT
ACKbit=1; ACKnum=y+1
ACKbit=1; ACKnum=x+1
wait for server
close
FIN + ACK
together
can no longer
send data
LAST_ACK
CLOSED
TIMED_WAIT
timed wait
for 2*max
segment lifetime
CLOSED
FIN_WAIT_1 FINbit=1, seq=xcan no longer
send but can
receive data
clientSocket.close()
client state server state
ESTABESTAB
66
FINbit=1, seq=y
Normal Termination, Both Together
ACKbit=1,
ACKnum=y+1
wait for server
close
Send Ack
can no longer
send datacan no longer
send but can
receive data
clientSocket.close()
FINbit=1, seq=x
client state server state
67
FINbit=1, seq=y
Simultaneous Closure
ACKbit=1,
ACKnum=x+1
CLOSING
TIMED_WAIT
CLOSED
FIN_WAIT_1
ESTAB
CLOSING
CLOSED
FIN_WAIT_1
ESTAB
TIMED_WAIT
Abrupt Termination
v A sends a RESET (RST) to B
§ E.g., because application process on A crashed
v That’s it
§ B does not ack the RST
§ Thus, RST is not delivered reliably
§ And: any data in flight is lost
§ But: if B sends anything more, will elicit another RST
SY
N
SY
N
A
C
K
A
C
K
D
at
a
R
ST
A
C
K
time
A
B
D
ata R
ST
68
69
TCP Finite State Machine
CS144, Stanford University
FSM Example: TCP Connection
6 http://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Tcp_state_diagram_fixed.svg/
70
TCP SYN Attack (SYN flooding)
v Miscreant creates a fake SYN packet
§ Destination is IP address of victim host (usually some server)
§ Source is some spoofed IP address
v Victim host on receiving creates a TCP connection state i.e allocates
buffers, creates variables, etc and sends SYN ACK to the spoofed
address (half-open connection)
v ACK never comes back
v After a timeout connection state is freed
v However for this duration the connection state is unnecessarily
created
v Further miscreant sends large number of fake SYNs
§ Can easily overwhelm the victim
v Solutions:
§ Increase size of connection queue
§ Decrease timeout wait for the 3-way handshake
§ Firewalls: list of known bad source IP addresses
§ TCP SYN Cookies (explained on next slide)
71
TCP SYN Cookie
v On receipt of SYN, server does not create connection
state
v It creates an initial sequence number (init_seq) that is a
hash of source & dest IP address and port number of SYN
packet (secret key used for hash)
§ Replies back with SYN ACK containing init_seq
§ Server does not need to store this sequence number
v If original SYN is genuine, an ACK will come back
§ Same hash function run on the same header fields to get the initial
sequence number (init_seq)
§ Checks if the ACK is equal to (init_seq+1)
§ Only create connection state if above is true
v If fake SYN, no harm done since no state was created
Quiz: TCP Connection Management?
Roughly how much time does it take for both the
TCP Sender and Receiver to establish connection
state since the connect() call?
A. RTT
B. 1.5RTT
C. 2RTT
D. 3RTT
72
www.zeetings.com/salil
ANSWER: B
Note that the final ACK will be
typically piggybacked with
the first data segment, so
often the TCP connection
setup is approximated to be
1RTT
Quiz: TCP Connection Management?
Assume that one end point of the TCP connection
sends a FIN segment. If it never receives an ACK,
what should it do?
A. Assume that the connection is closed and do
nothing
B. Retransmit the FIN
C. Transmit an ACK
D. Start crying
73
www.zeetings.com/salil
ANSWER: B
74
Transport Layer: Outline
3.1 transport-layer
services
3.2 multiplexing and
demultiplexing
3.3 connectionless
transport: UDP
3.4 principles of reliable
data transfer
3.5 connection-oriented
transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion
control
3.7 TCP congestion control
congestion:
v informally: “too many sources sending too much
data too fast for network to handle”
v different from flow control!
v manifestations:
§ lost packets (buffer overflow at routers)
§ long delays (queueing in router buffers)
v a top-10 problem!
Principles of congestion control
75
Congestion
76
Trash
Congestion
Router
Router’s buffer.
Incoming rate is faster than
outgoing link can support.
Ugh. I so
can’t deal
with this right
now!
Congestion Collapse
77
Congestion Collapse
…
…
…
…
Link A Link B
Congestion Collapse
78
Congestion Collapse
…
…
…
…
Link A Link B
One sender starts,
but there’s still
capacity at link A.
S1
Congestion Collapse
79
Congestion Collapse
…
…
…
…
Link A Link B
S1
S2
Another sender starts
up. Link A is showing
slight delay, but still
doing ok.
Congestion Collapse
80
Congestion Collapse
…
…
…
…
Link A Link B
S1
S2
Unrelated traffic
passes through and
congests link B.
Congestion Collapse
…
…
…
…
Link A Link B
S1
S2 S2’s traffic is being dropped at Link B, so it starts retransmitting
on top of what it was sending.
(This is very bad. S2 is now sending lots of traffic over link A
that has no hope of crossing link B.)
Congestion Collapse
81
Congestion Collapse
82
Congestion Collapse
…
…
…
…
Link A Link B
S1
S2
Increased traffic from S2
causes Link A to become
congested. S1 starts
retransmitting.
Congestion Collapse
83
Congestion Collapse
…
…
…
…
Link A Link B
S1
S2
Congestion
propagates
backwards…
congestion:
v Increases delays
§ If delays > RTO, sender retransmits
v Increases loss rate
§ Dropped packets also retransmitted
v Increases retransmissions, many unnecessary
§ Wastes capacity of traffic that is never delivered
§ Increase in load results in decrease in useful work done
v Increases congestion, cycle continues …
Without congestion control
84
Cost of Congestion
v Knee – point after which
§ Throughput increases slowly
§ Delay increases fast
v Cliff – point after which
§ Throughput starts to drop to zero
(congestion collapse)
§ Delay approaches infinity
Load
Load
T
hr
o
ug
hp
ut
D
el
ay
knee cliff
congestion
collapse
packet
loss
85
This happened to the Internet (then NSFnet) in
1986
v Rate dropped from a blazing 32 Kbps to 40bps
v This happened on and off for two years
v In 1988, Van Jacobson published “Congestion
Avoidance and Control”
v The fix: senders voluntarily limit sending rate
Congestion Collapse
86
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion
control:
v no explicit feedback
from network
v congestion inferred
from end-system
observed loss, delay
v approach taken by
TCP
network-assisted
congestion control:
v routers provide
feedback to end systems
§ single bit indicating
congestion (SNA,
DECbit, TCP/IP ECN,
ATM)
§explicit rate for
sender to send at
87
Transport Layer: Outline
3.1 transport-layer
services
3.2 multiplexing and
demultiplexing
3.3 connectionless
transport: UDP
3.4 principles of reliable
data transfer
3.5 connection-oriented
transport: TCP
§ segment structure
§ reliable data transfer
§ flow control
§ connection management
3.6 principles of congestion
control
3.7 TCP congestion control
88
TCP’s Approach in a Nutshell
v TCP connection maintains a window
§ Controls number of packets in flight
v TCP sending rate:
§ roughly: send cwnd bytes, wait RTT for ACKs, then
send more bytes
v Vary window size to control sending rate
rate ~~
cwnd
RTT
bytes/sec
last byte
ACKed sent, not-
yet ACKed
(“in-
flight”)
last byte
sent
cwnd
sender sequence number space
89
All These Windows…
v Congestion Window: CWND
§ How many bytes can be sent without overflowing routers
§ Computed by the sender using congestion control algorithm
v Flow control window: Advertised / Receive Window (RWND)
§ How many bytes can be sent without overflowing receiver’s buffers
§ Determined by the receiver and reported to the sender
v Sender-side window = minimum{CWND, RWND}
• Assume for this discussion that RWND >> CWND
90
CWND
v This lecture will talk about CWND in units of MSS
§ (Recall MSS: Maximum Segment Size, the amount of payload
data in a TCP packet)
§ This is only for pedagogical purposes
v Keep in mind that real implementations maintain CWND in
bytes
91
Two Basic Questions
v How does the sender detect congestion?
v How does the sender adjust its sending rate?
92
Detection Congestion: Infer Loss
v Duplicate ACKs: isolated loss
§ dup ACKs indicate network capable of delivering
some segments
v Timeout: much more serious
§Not enough dup ACKs
§Must have suffered several losses
v Will adjust rate differently for each case
93
X
fast retransmit after sender
receipt of triple duplicate ACK
Host BHost A
Seq=92, 8 bytes of data
ACK=100
tim
eo
ut ACK=100
ACK=100
ACK=100
RECAP: TCP fast retransmit (dup acks)
Seq=100, 20 bytes of data
Seq=100, 20 bytes of data
94
Rate Adjustment
v Basic structure:
§Upon receipt of ACK (of new data): increase rate
§Upon detection of loss: decrease rate
v How we increase/decrease the rate depends on
the phase of congestion control we’re in:
§Discovering available bottleneck bandwidth vs.
§Adjusting to bandwidth variations
95
TCP Slow Start (Bandwidth discovery)
v when connection begins,
increase rate exponentially
until first loss event:
§ initially cwnd = 1 MSS
§ double cwnd every RTT (full
ACKs)
§ Simpler implementation
achieved by incrementing
cwnd for every ACK
received
§ cwnd += 1 for each ACK
v summary: initial rate is slow
but ramps up exponentially
fast
Host A
one segment
R
TT
Host B
time
two segments
four segments
96
Adjusting to Varying Bandwidth
v Slow start gave an estimate of available bandwidth
v Now, want to track variations in this available
bandwidth, oscillating around its current value
§ Repeated probing (rate increase) and backoff (rate
decrease)
§Known as Congestion Avoidance (CA)
v TCP uses: “Additive Increase Multiplicative
Decrease” (AIMD)
97
AIMD
v approach: sender increases transmission rate (window size), probing
for usable bandwidth, until another congestion event occurs
§ additive increase: increase cwnd by 1 MSS every RTT until loss
detected
• For each successful RTT (all ACKS), cwnd = cwnd +1
• Simple implementation: for each ACK, cwnd = cwnd +
1/cwnd (since there are cwnd/MSS packets in a window)
§ multiplicative decrease: cut cwnd in half after loss
c
w
n
d
:
TC
P
s
en
de
r
co
ng
es
tio
n
w
in
do
w
s
iz
e
AIMD saw tooth
behavior: probing
for bandwidth
additively increase window size …
…. until loss occurs (then cut window in half)
time 98
Leads to the TCP “Sawtooth”
Loss
Exponential
“slow start”
t
Window
99
Loss
Loss
Loss Loss
Slow-Start vs. AIMD
v When does a sender stop Slow-Start and start
Additive Increase?
v Introduce a “slow start threshold” (ssthresh)
§ Initialized to a large value
v Convert to AI when cwnd = ssthresh, sender
switches from slow-start to AIMD-style increase
§On timeout, ssthresh = CWND/2
100
Implementation
v State at sender
§CWND (initialized to a small constant)
§ ssthresh (initialized to a large constant)
§ [Also dupACKcount and timer, as before]
v Events
§ACK (new data)
§ dupACK (duplicate ACK for old data)
§ Timeout
101
Event: ACK (new data)
v If CWND < ssthresh §CWND += 1 • Hence after one RTT (All ACKs with no drops): CWND = 2xCWND 102 Event: ACK (new data) v If CWND < ssthresh §CWND += 1 v Else §CWND = CWND + 1/CWND Slow start phase • Hence after one RTT (All ACKs with no drops): CWND = CWND + 1 “Congestion Avoidance” phase (additive increase) 103 Event: dupACK v dupACKcount ++ v If dupACKcount = 3 /* fast retransmit */ § ssthresh = CWND/2 §CWND = CWND/2 104 Event: TimeOut v On Timeout § ssthresh ß CWND/2 §CWND ß 1 105 Example t Window Slow-start restart: Go back to CWND = 1 MSS, but take advantage of knowing the previous value of CWND Slow start in operation until it reaches half of previous CWND, I.e., SSTHRESH Timeout Fast Retransmission SSThresh Set to Here 106 TCP Flavours v TCP-Tahoe § cwnd =1 on triple dup ACK & timeout v TCP-Reno § cwnd =1 on timeout § cwnd = cwnd/2 on triple dup ACK v TCP-newReno § TCP-Reno + improved fast recovery (SKIPPED) v TCP-SACK (NOT COVERED IN THE COURSE) § incorporates selective acknowledgements 107 Quiz: TCP Congestion Control? In the figure how many congestion avoidance intervals can you identify? A. 0 B. 1 C. 2 D. 3 E. 4 108 www.zeetings.com/salil ANSWER: C Quiz: TCP Congestion Control? In the figure how many slow start intervals can you identify? A. 0 B. 1 C. 2 D. 3 E. 4 109 www.zeetings.com/salil ANSWER: C Quiz: TCP Congestion Control? In the figure after the 16th transmission round, segment loss is detected by _______ ? A. Triple Dup Ack B. Timeout 110 www.zeetings.com/salil ANSWER: A Quiz: TCP Congestion Control? In the figure what is the initial value of sstresh (steady state threshold)? A. 0 B. 28 C. 32 D. 42 E. 64 111 www.zeetings.com/salil ANSWER: C Quiz: TCP Congestion Control? In the figure what is the value of sstresh (steady state threshold) at the 18th round? A. 1 B. 32 C. 42 D. 21 E. 20 112 www.zeetings.com/salil ANSWER: D Transport Layer: Summary v principles behind transport layer services: §multiplexing, demultiplexing § reliable data transfer § flow control § congestion control v instantiation, implementation in the Internet § UDP § TCP next: v leaving the network “edge” (application, transport layers) v into the network “core” 113