1/5/21
UCLA CS 118 Winter 2021
Instructor: Giovanni Pau
TAs: Hunter Dellaverson, Eric Newberry
This chapter's slide deck draws from different sources: the 7th and 8th editions of the textbook
Transport Layer: 3-1
Chapter 3 Transport Layer
A note on the use of these Powerpoint slides:
We’re making these slides freely available to all (faculty, students, readers). They’re in PowerPoint form so you see the animations; and can add, modify, and delete slides (including this one) and slide content to suit your needs. They obviously represent a lot of work on our part. In return for use, we only ask the following:
§ If you use these slides (e.g., in a class) that you mention their source (after all, we’d like people to use our book!)
§ If you post any slides on a www site, that you note that they are adapted from (or perhaps identical to) our slides, and note our copyright of this material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2016
J.F Kurose and K.W. Ross, All Rights Reserved
Computer Networking: A Top Down Approach
7th edition
Jim Kurose, Keith Ross Pearson/Addison Wesley April 2016
All material copyright 1996-2020
J.F Kurose and K.W. Ross, All Rights Reserved
Computer Networking: A Top-Down Approach
8th edition
Jim Kurose, Keith Ross Pearson, 2020
Transport layer: overview
Our goal:
§understand principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
§learn about Internet transport layer protocols:
• UDP: connectionless transport
• TCP: connection-oriented reliable transport
• TCP congestion control
Transport layer: roadmap
§Transport-layer services
§Multiplexing and demultiplexing
§Connectionless transport: UDP
§Principles of reliable data transfer
§Connection-oriented transport: TCP
§Principles of congestion control
§TCP congestion control
§Evolution of transport-layer functionality
Transport services and protocols
§provide logical communication between application processes running on different hosts
§transport protocols actions in end systems:
• sender: breaks application messages into segments, passes to network layer
• receiver: reassembles segments into messages, passes to application layer
§two transport protocols available to Internet applications
• TCP, UDP
[Figure: logical end-end transport between application processes on two end systems, each running the application / transport / network / data link / physical stack, across mobile, home, enterprise, ISP, content provider, and datacenter networks]
Transport vs. network layer services and protocols
§network layer: logical communication between hosts
§transport layer: logical communication between processes
• relies on, enhances, network layer services
household analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
§hosts = houses
§processes = kids
§app messages = letters in envelopes
§transport protocol = Ann and Bill who demux to in-house siblings
§network-layer protocol = postal service
Transport Layer Actions
Sender:
§is passed an application-layer message
§determines segment header field values
§creates segment
§passes segment to IP
Transport Layer Actions
Receiver:
§receives segment from IP
§checks header values
§extracts application-layer message
§demultiplexes message up to application via socket
Two principal Internet transport protocols
§TCP: Transmission Control Protocol
• reliable, in-order delivery
• congestion control
• flow control
• connection setup
§UDP: User Datagram Protocol
• unreliable, unordered delivery
• no-frills extension of “best-effort” IP
§services not available:
• delay guarantees
• bandwidth guarantees
[Figure: logical end-end transport between a client and an HTTP server. At the sender the HTTP msg gains a transport header (Ht HTTP msg) and then a network header (Hn Ht HTTP msg) as it moves down the application / transport / network / link / physical stack. A final panel shows client1 and client2 communicating with processes P-client1 and P-client2 at the HTTP server.]
Multiplexing/demultiplexing
multiplexing at sender:
handle data from multiple sockets, add transport header (later used for demultiplexing)
demultiplexing at receiver:
use header info to deliver received segments to correct socket
[Figure: three hosts running processes P1 to P4; each process attaches to the transport layer through a socket]
How demultiplexing works
§host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
§host uses IP addresses & port numbers to direct segment to appropriate socket
TCP/UDP segment format (32 bits wide):
source port # | dest port #
other header fields
application data (payload)
Connectionless demultiplexing
Recall:
§when creating socket, must specify host-local port #:
DatagramSocket mySocket1 = new DatagramSocket(12534);
§when creating datagram to send into UDP socket, must specify
• destination IP address
• destination port #
when receiving host receives UDP segment:
• checks destination port # in segment
• directs UDP segment to socket with that port #
IP/UDP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at receiving host
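The lookup described above can be sketched as a table keyed on destination port alone. This is a minimal illustration, not how any operating system implements UDP: the class `UdpDemuxSketch`, its methods, and the IP addresses are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: UDP-style demultiplexing keyed on destination port only.
class UdpDemuxSketch {
    // Maps a host-local port number to the name of the socket bound to it.
    private final Map<Integer, String> socketsByPort = new HashMap<>();

    void bind(int localPort, String socketName) {
        socketsByPort.put(localPort, socketName);
    }

    // Source IP and source port are accepted but deliberately ignored in the lookup.
    String demux(String srcIp, int srcPort, int destPort) {
        return socketsByPort.get(destPort); // null if no socket is bound to that port
    }

    public static void main(String[] args) {
        UdpDemuxSketch host = new UdpDemuxSketch();
        host.bind(6428, "serverSocket");
        // Two different senders, same destination port: same socket.
        System.out.println(host.demux("10.0.0.1", 9157, 6428));
        System.out.println(host.demux("10.0.0.2", 5775, 6428));
    }
}
```

Because the source fields never enter the lookup, an application that needs to tell senders apart has to read the source address from each received datagram itself.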
Connectionless demultiplexing: an example
DatagramSocket mySocket2 = new DatagramSocket(9157);
DatagramSocket serverSocket = new DatagramSocket(6428);
DatagramSocket mySocket1 = new DatagramSocket(5775);
[Figure: three hosts running processes P3, P1, and P4, exchanging UDP segments labeled with the port numbers below]
source port: 9157 dest port: 6428
source port: 6428 dest port: 9157
source port: ? dest port: ?
source port: ? dest port: ?
Connection-oriented demultiplexing
§TCP socket identified by 4-tuple:
• source IP address
• source port number
• dest IP address
• dest port number
§demux: receiver uses all four values (4-tuple) to direct segment to appropriate socket
§server may support many simultaneous TCP sockets:
• each socket identified by its own 4-tuple
• each socket associated with a different connecting client
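A minimal sketch of 4-tuple demultiplexing follows, assuming a simple in-memory table. The class `TcpDemuxSketch`, the nested `FourTuple` key, and the socket names are hypothetical and only illustrate that all four values take part in the lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: TCP-style demultiplexing keyed on the full 4-tuple.
class TcpDemuxSketch {
    static final class FourTuple {
        final String srcIp; final int srcPort; final String dstIp; final int dstPort;

        FourTuple(String srcIp, int srcPort, String dstIp, int dstPort) {
            this.srcIp = srcIp; this.srcPort = srcPort;
            this.dstIp = dstIp; this.dstPort = dstPort;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof FourTuple)) return false;
            FourTuple t = (FourTuple) o;
            return srcPort == t.srcPort && dstPort == t.dstPort
                    && srcIp.equals(t.srcIp) && dstIp.equals(t.dstIp);
        }

        @Override public int hashCode() {
            return Objects.hash(srcIp, srcPort, dstIp, dstPort);
        }
    }

    private final Map<FourTuple, String> sockets = new HashMap<>();

    void register(String srcIp, int srcPort, String dstIp, int dstPort, String socketName) {
        sockets.put(new FourTuple(srcIp, srcPort, dstIp, dstPort), socketName);
    }

    // All four values must match for a segment to reach a socket.
    String demux(String srcIp, int srcPort, String dstIp, int dstPort) {
        return sockets.get(new FourTuple(srcIp, srcPort, dstIp, dstPort));
    }

    public static void main(String[] args) {
        TcpDemuxSketch server = new TcpDemuxSketch();
        server.register("A", 9157, "B", 80, "socket-1");
        server.register("C", 9157, "B", 80, "socket-2");
        System.out.println(server.demux("A", 9157, "B", 80));
        System.out.println(server.demux("C", 9157, "B", 80));
    }
}
```

Two segments with the same destination IP and port, and even the same source port, still reach different sockets as long as the source IP differs.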
Connection-oriented demultiplexing: example
[Figure: host with IP address A (process P1), server with IP address B (processes P4, P5, P6), host with IP address C (processes P2, P3)]
source IP,port: A,9157 dest IP,port: B,80
source IP,port: B,80 dest IP,port: A,9157
source IP,port: C,5775 dest IP,port: B,80
source IP,port: C,9157 dest IP,port: B,80
Three segments, all destined to IP address B, dest port 80, are demultiplexed to different sockets
Summary
§ Multiplexing, demultiplexing: based on segment, datagram header field values
§ UDP: demultiplexing using destination port number (only)
§ TCP: demultiplexing using 4-tuple: source and destination IP addresses, and port numbers
§ Multiplexing/demultiplexing happen at all layers
UDP: User Datagram Protocol
§ “no frills,” “bare bones” Internet transport protocol
§ “best effort” service, UDP segments may be:
• lost
• delivered out-of-order to app
§ connectionless:
• no handshaking between UDP sender, receiver
• each UDP segment handled independently of others
Why is there a UDP?
§ no connection establishment (which can add RTT delay)
§ simple: no connection state at sender, receiver
§ small header size
§ no congestion control
• UDP can blast away as fast as desired!
• can function in the face of congestion
UDP: User Datagram Protocol
§UDP use:
• streaming multimedia apps (loss tolerant, rate sensitive)
• DNS
• SNMP
• HTTP/3
§if reliable transfer needed over UDP (e.g., HTTP/3):
• add needed reliability at application layer
• add congestion control at application layer
UDP: User Datagram Protocol [RFC 768]
UDP: Transport Layer Actions
UDP sender actions:
§is passed an application-layer message
§determines UDP segment header field values
§creates UDP segment
§passes segment to IP
[Figure: SNMP client and SNMP server protocol stacks; at the sender's transport layer the SNMP msg gains a UDP header (UDPh)]
UDP: Transport Layer Actions
UDP receiver actions:
§receives segment from IP
§checks UDP checksum header value
§extracts application-layer message
§demultiplexes message up to application via socket
[Figure: SNMP client and SNMP server protocol stacks; the segment (UDPh SNMP msg) arrives at the receiver and the SNMP msg is passed up to the application]
UDP segment format (32 bits wide)
UDP segment header:
source port # | dest port #
length | checksum
application data (payload)
length: in bytes of UDP segment, including header
application data (payload): data to/from application layer
UDP checksum
Goal: detect errors (i.e., flipped bits) in transmitted segment
              1st number   2nd number   sum
Transmitted:      5            6         11
Received:         4            6         11
receiver-computed checksum (4 + 6 = 10) ≠ sender-computed checksum (as received, 11)
UDP checksum
Goal: detect errors (i.e., flipped bits) in transmitted segment
sender:
§treat contents of UDP segment (including UDP header fields and IP addresses) as sequence of 16-bit integers
§checksum: one’s complement of the one’s complement sum of segment content
§checksum value put into UDP checksum field
receiver:
§compute checksum of received segment
§check if computed checksum equals checksum field value:
• Not equal – error detected
• Equal – no error detected. But maybe errors nonetheless? More later ….
Internet checksum: an example
example: add two 16-bit integers
             1110011001100110
           + 1101010101010101
wraparound  11011101110111011
sum          1011101110111100
checksum     0100010001000011
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result
* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Internet checksum: weak protection!
example: add two 16-bit integers
             1110011001100110   (last two bits flipped: 10 → 01)
           + 1101010101010101   (last two bits flipped: 01 → 10)
wraparound  11011101110111011
sum          1011101110111100
checksum     0100010001000011
Even though numbers have changed (bit flips), no change in checksum!
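The two checksum examples can be reproduced with a short sketch. The class and method names below are hypothetical; the code only covers the 16-bit one's complement arithmetic shown on the slides, not the full UDP pseudo-header handling.

```java
// Hypothetical sketch of the Internet checksum arithmetic over 16-bit words.
class InternetChecksumSketch {
    // One's complement sum: a carry out of the most significant bit is wrapped
    // around and added back into the low 16 bits.
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += (w & 0xFFFF);
            sum = (sum & 0xFFFF) + (sum >>> 16); // wraparound
        }
        return sum & 0xFFFF;
    }

    // Checksum: bitwise complement of the one's complement sum.
    static int checksum(int[] words) {
        return (~onesComplementSum(words)) & 0xFFFF;
    }

    // Receiver-side check as described on the slide: recompute and compare.
    static boolean noErrorDetected(int[] receivedWords, int receivedChecksum) {
        return checksum(receivedWords) == receivedChecksum;
    }

    public static void main(String[] args) {
        int[] original = {0xE666, 0xD555}; // 1110011001100110, 1101010101010101
        int[] flipped  = {0xE665, 0xD556}; // last two bits of each word flipped
        System.out.println(Integer.toBinaryString(checksum(original)));
        System.out.println(noErrorDetected(flipped, checksum(original)));
    }
}
```

The flipped pair sums to the same value as the original pair, so the receiver-side comparison reports no error even though two words changed.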
Summary: UDP
§ “no frills” protocol:
• segments may be lost, delivered out of order
• best effort service: “send and hope for the best”
§ UDP has its plusses:
• no setup/handshaking needed (no RTT incurred)
• can function when network service is compromised
• helps with reliability (checksum)
§ build additional functionality on top of UDP in application layer (e.g., HTTP/3)
Principles of reliable data transfer
[Figure: reliable service abstraction: the sending process passes data into a reliable channel, which delivers it to the receiving process. Reliable service implementation: the sender-side and receiver-side of a reliable data transfer protocol in the transport layer communicate over an unreliable channel provided by the network layer.]
Complexity of reliable data transfer protocol will depend (strongly) on characteristics of unreliable channel (lose, corrupt, reorder data?)
Sender, receiver do not know the “state” of each other, e.g., was a message received?
§ unless communicated via a message
Reliable data transfer protocol (rdt): interfaces
rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
udt_send(): called by rdt to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on receiver side of channel
deliver_data(): called by rdt to deliver data to upper layer
[Figure: the sending process calls rdt_send() on the sender-side implementation of rdt, which calls udt_send() to send a packet (header + data) over the unreliable channel; rdt_rcv() hands the packet to the receiver-side implementation, which calls deliver_data(). Communication over the unreliable channel is bi-directional.]
Reliable data transfer: getting started
We will:
§ incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
§ consider only unidirectional data transfer
• but control info will flow in both directions!
§ use finite state machines (FSM) to specify sender, receiver
• state: when in this “state”, next state uniquely determined by next event
• each transition from state 1 to state 2 is labeled with the event causing the state transition and the actions taken on the state transition
rdt1.0: reliable transfer over a reliable channel
§underlying channel perfectly reliable
• no bit errors
• no loss of packets
§separate FSMs for sender, receiver:
• sender sends data into underlying channel
• receiver reads data from underlying channel
sender
Wait for call from above:
  rdt_send(data)
    packet = make_pkt(data); udt_send(packet)
receiver
Wait for call from below:
  rdt_rcv(packet)
    extract(packet,data); deliver_data(data)
rdt2.0: channel with bit errors
§ underlying channel may flip bits in packet
• checksum (e.g., Internet checksum) to detect bit errors
§ the question: how to recover from errors?
How do humans recover from “errors” during conversation?
• acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
• negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
• sender retransmits pkt on receipt of NAK
stop and wait
sender sends one packet, then waits for receiver response
rdt2.0: FSM specifications
sender
Wait for call from above:
  rdt_send(data)
    sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
    next state: Wait for ACK or NAK
Wait for ACK or NAK:
  rdt_rcv(rcvpkt) && isNAK(rcvpkt)
    udt_send(sndpkt)
    next state: Wait for ACK or NAK
  rdt_rcv(rcvpkt) && isACK(rcvpkt)
    Λ (no action)
    next state: Wait for call from above
receiver
Wait for call from below:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    udt_send(NAK)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
Note: “state” of receiver (did the receiver get my message correctly?) isn’t known to sender unless somehow communicated from receiver to sender
§ that’s why we need a protocol!
rdt2.0: operation with no errors
sender: rdt_send(data); sndpkt = make_pkt(data, checksum); udt_send(sndpkt); moves to Wait for ACK or NAK
receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt); extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
sender: rdt_rcv(rcvpkt) && isACK(rcvpkt); Λ (no action); returns to Wait for call from above
rdt2.0: corrupted packet scenario
sender: rdt_send(data); sndpkt = make_pkt(data, checksum); udt_send(sndpkt); moves to Wait for ACK or NAK
receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt); udt_send(NAK)
sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt); udt_send(sndpkt); stays in Wait for ACK or NAK
receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt); extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
sender: rdt_rcv(rcvpkt) && isACK(rcvpkt); Λ (no action); returns to Wait for call from above
rdt2.0 has a fatal flaw!
what happens if ACK/NAK corrupted?
§sender doesn’t know what happened at receiver!
§can’t just retransmit: possible duplicate
handling duplicates:
§sender retransmits current pkt if ACK/NAK corrupted
§sender adds sequence number to each pkt
§receiver discards (doesn’t deliver up) duplicate pkt
stop and wait
sender sends one packet, then waits for receiver response
rdt2.1: sender, handling garbled ACK/NAKs
Wait for call 0 from above:
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)
    next state: Wait for ACK or NAK 0
Wait for ACK or NAK 0:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt))
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ (no action)
    next state: Wait for call 1 from above
Wait for call 1 from above:
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt)
    next state: Wait for ACK or NAK 1
Wait for ACK or NAK 1:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt))
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ (no action)
    next state: Wait for call 0 from above
rdt2.1: receiver, handling garbled ACK/NAKs
Wait for 0 from below:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    extract(rcvpkt,data); deliver_data(data)
    sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
    next state: Wait for 1 from below
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
Wait for 1 from below:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data); deliver_data(data)
    sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
    next state: Wait for 0 from below
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
rdt2.1: discussion
sender:
§seq # added to pkt
§two seq. #s (0,1) will suffice. Why?
§must check if received ACK/NAK corrupted
§twice as many states
• state must “remember” whether “expected” pkt should have seq # of 0 or 1
receiver:
§must check if received packet is duplicate
• state indicates whether 0 or 1 is expected pkt seq #
§note: receiver can not know if its last ACK/NAK received OK at sender
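The receiver behavior discussed above can be sketched as a two-state machine driven by the expected sequence number. This is an illustrative simplification: the class `Rdt21ReceiverSketch`, the `corrupt` flag standing in for a checksum test, and the string replies are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an rdt2.1-style receiver with 1-bit sequence numbers.
class Rdt21ReceiverSketch {
    private int expectedSeq = 0;                       // state: waiting for 0 or 1 from below
    final List<String> delivered = new ArrayList<>();  // data handed to the upper layer

    // Returns the reply sent back to the sender: "NAK" or "ACK".
    String onPacket(int seq, String data, boolean corrupt) {
        if (corrupt) {
            return "NAK";                  // corrupted packet: ask for a retransmission
        }
        if (seq == expectedSeq) {
            delivered.add(data);           // new packet: deliver and flip expected seq #
            expectedSeq = 1 - expectedSeq;
        }
        // Duplicate packets are not delivered again, but are still ACKed.
        return "ACK";
    }

    public static void main(String[] args) {
        Rdt21ReceiverSketch r = new Rdt21ReceiverSketch();
        System.out.println(r.onPacket(0, "a", false)); // new packet
        System.out.println(r.onPacket(0, "a", false)); // duplicate after a garbled ACK
        System.out.println(r.delivered);
    }
}
```

One bit of sequence number is enough here because stop-and-wait only ever has to distinguish the current packet from the one immediately before it.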
rdt2.2: a NAK-free protocol
§ same functionality as rdt2.1, using ACKs only
§ instead of NAK, receiver sends ACK for last pkt received OK
• receiver must explicitly include seq # of pkt being ACKed
§ duplicate ACK at sender results in same action as NAK: retransmit current pkt
As we will see, TCP uses this approach to be NAK-free
rdt2.2: sender, receiver fragments
sender FSM fragment
Wait for call 0 from above:
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)
    next state: Wait for ACK 0
Wait for ACK 0:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1))
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    Λ (no action)
receiver FSM fragment
transition into Wait for 0 from below:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data); deliver_data(data)
    sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
Wait for 0 from below:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt))
    udt_send(sndpkt)
rdt3.0: channels with errors and loss
New channel assumption: underlying channel can also lose packets (data, ACKs)
• checksum, sequence #s, ACKs, retransmissions will be of help … but not quite enough
Q: How do humans handle lost sender-to-receiver words in conversation?
Approach: sender waits “reasonable” amount of time for ACK
§ retransmits if no ACK received in this time
§ if pkt (or ACK) just delayed (not lost):
• retransmission will be duplicate, but seq #s already handle this!
• receiver must specify seq # of packet being ACKed
§ use countdown timer to interrupt after “reasonable” amount of time (timeout)
rdt3.0 sender
Wait for call 0 from above:
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer
    next state: Wait for ACK0
  rdt_rcv(rcvpkt)
    Λ (no action)
Wait for ACK0:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1))
    Λ (no action)
  timeout
    udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    next state: Wait for call 1 from above
Wait for call 1 from above:
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer
    next state: Wait for ACK1
  rdt_rcv(rcvpkt)
    Λ (no action)
Wait for ACK1:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0))
    Λ (no action)
  timeout
    udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    next state: Wait for call 0 from above
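A minimal sketch of the four-state rdt3.0 sender follows. It is an illustration under simplifying assumptions: the timer is a boolean flag rather than a real clock, packets are strings of the form seq:data, and `Rdt30SenderSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rdt3.0 (alternating-bit) sender state machine.
class Rdt30SenderSketch {
    enum State { WAIT_CALL_0, WAIT_ACK_0, WAIT_CALL_1, WAIT_ACK_1 }

    State state = State.WAIT_CALL_0;
    boolean timerRunning = false;                    // stands in for start_timer / stop_timer
    private String sndpkt;                           // last packet sent, kept for retransmission
    final List<String> channel = new ArrayList<>();  // everything handed to udt_send

    // rdt_send(data): accepted only in a "wait for call" state.
    boolean rdtSend(String data) {
        if (state != State.WAIT_CALL_0 && state != State.WAIT_CALL_1) {
            return false;
        }
        int seq = (state == State.WAIT_CALL_0) ? 0 : 1;
        sndpkt = seq + ":" + data;
        channel.add(sndpkt);
        timerRunning = true;
        state = (seq == 0) ? State.WAIT_ACK_0 : State.WAIT_ACK_1;
        return true;
    }

    // rdt_rcv(rcvpkt): only an uncorrupted ACK with the awaited seq # changes state.
    void rdtRcv(int ackSeq, boolean corrupt) {
        if (state == State.WAIT_ACK_0 && !corrupt && ackSeq == 0) {
            timerRunning = false;
            state = State.WAIT_CALL_1;
        } else if (state == State.WAIT_ACK_1 && !corrupt && ackSeq == 1) {
            timerRunning = false;
            state = State.WAIT_CALL_0;
        }
        // Anything else: no action; the timer will eventually trigger a retransmission.
    }

    // timeout: retransmit the outstanding packet and restart the timer.
    void timeout() {
        if (state == State.WAIT_ACK_0 || state == State.WAIT_ACK_1) {
            channel.add(sndpkt);
            timerRunning = true;
        }
    }

    public static void main(String[] args) {
        Rdt30SenderSketch s = new Rdt30SenderSketch();
        s.rdtSend("a");
        s.timeout();          // ACK never arrived: resend pkt0
        s.rdtRcv(0, false);   // ACK0 arrives
        System.out.println(s.channel + " " + s.state);
    }
}
```

Corrupted or wrong-numbered ACKs are ignored rather than triggering an immediate resend, which leaves recovery entirely to the timer.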
rdt3.0 in action
(a) no loss
sender: send pkt0
receiver: rcv pkt0, send ack0
sender: rcv ack0, send pkt1
receiver: rcv pkt1, send ack1
sender: rcv ack1, send pkt0
receiver: rcv pkt0, send ack0
(b) packet loss
sender: send pkt0
receiver: rcv pkt0, send ack0
sender: rcv ack0, send pkt1 (pkt1 lost, X)
sender: timeout, resend pkt1
receiver: rcv pkt1, send ack1
sender: rcv ack1, send pkt0
receiver: rcv pkt0, send ack0
rdt3.0 in action
(c) ACK loss
sender: send pkt0
receiver: rcv pkt0, send ack0
sender: rcv ack0, send pkt1
receiver: rcv pkt1, send ack1 (ack1 lost, X)
sender: timeout, resend pkt1
receiver: rcv pkt1 (detect duplicate), send ack1
sender: rcv ack1, send pkt0
receiver: rcv pkt0, send ack0
(d) premature timeout/ delayed ACK
sender: send pkt0
receiver: rcv pkt0, send ack0
sender: rcv ack0, send pkt1
receiver: rcv pkt1, send ack1 (ack1 delayed)
sender: timeout, resend pkt1
sender: rcv ack1, send pkt0
receiver: rcv pkt1 (detect duplicate), send ack1
receiver: rcv pkt0, send ack0
sender: rcv ack1 (ignore)
Performance of rdt3.0 (stop-and-wait)
§U_sender: utilization – fraction of time sender busy sending
§example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet
• time to transmit packet into channel:
$D_{trans} = \frac{L}{R} = \frac{8000 \text{ bits}}{10^9 \text{ bits/sec}} = 8 \text{ microsecs}$
rdt3.0: stop-and-wait operation
sender: first packet bit transmitted, t = 0
receiver: first packet bit arrives
receiver: last packet bit arrives, send ACK
sender: ACK arrives, send next packet, t = RTT + L / R
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008 \text{ ms}}{30.008 \text{ ms}} = 0.00027$
§ rdt 3.0 protocol performance stinks!
§ Protocol limits performance of underlying infrastructure (channel)
rdt3.0: pipelined protocols operation
pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged packets
• range of sequence numbers must be increased
• buffering at sender and/or receiver
Pipelining: increased utilization
sender: first packet bit transmitted, t = 0
sender: last bit transmitted, t = L / R
receiver: first packet bit arrives
receiver: last packet bit arrives, send ACK
receiver: last bit of 2nd packet arrives, send ACK
receiver: last bit of 3rd packet arrives, send ACK
sender: ACK arrives, send next packet, t = RTT + L / R
3-packet pipelining increases utilization by a factor of 3!
$U_{sender} = \frac{3L/R}{RTT + L/R} = \frac{0.024 \text{ ms}}{30.008 \text{ ms}} \approx 0.00080$
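The utilization figures for the 1 Gbps, 15 ms, 8000-bit example can be checked numerically. The sketch below is an assumption-laden helper, not part of the protocol: `UtilizationSketch` and its parameter names are hypothetical, and the formula simply generalizes the stop-and-wait expression to a fixed number of packets in flight.

```java
// Hypothetical sketch: sender utilization for stop-and-wait and simple pipelining.
class UtilizationSketch {
    // packetBits = L, linkBitsPerSec = R, rttSec = RTT,
    // packetsInFlight = number of packets sent back-to-back before waiting for an ACK.
    static double senderUtilization(double packetBits, double linkBitsPerSec,
                                    double rttSec, int packetsInFlight) {
        double dTrans = packetBits / linkBitsPerSec;          // L / R
        double u = packetsInFlight * dTrans / (rttSec + dTrans);
        return Math.min(1.0, u);                              // the link cannot be more than 100% busy
    }

    public static void main(String[] args) {
        double rtt = 2 * 0.015; // 15 ms propagation delay each way
        System.out.println(senderUtilization(8000, 1e9, rtt, 1)); // stop-and-wait
        System.out.println(senderUtilization(8000, 1e9, rtt, 3)); // 3-packet pipelining
    }
}
```

With these inputs the pipelined value is exactly three times the stop-and-wait value, since only the numerator changes.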
Go-Back-N: sender
§sender: “window” of up to N, consecutive transmitted but unACKed pkts
• k-bit seq # in pkt header
§ cumulative ACK: ACK(n): ACKs all packets up to, including seq # n
• on receiving ACK(n): move window forward to begin at n+1
§ timer for oldest in-flight packet
§ timeout(n): retransmit packet n and all higher seq # packets in window
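The sender rules above can be sketched with two counters. For simplicity this sketch assumes unbounded integer sequence numbers (no k-bit wraparound) and records transmissions in a list instead of sending them; `GoBackNSenderSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Go-Back-N sender window (sequence numbers never wrap here).
class GoBackNSenderSketch {
    final int windowSize;                          // N
    int base = 0;                                  // oldest transmitted-but-unACKed seq #
    int nextSeqNum = 0;                            // next seq # to assign
    final List<Integer> sent = new ArrayList<>();  // log of every transmission

    GoBackNSenderSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Send a new packet if the window has room; returns false when the window is full.
    boolean send() {
        if (nextSeqNum >= base + windowSize) {
            return false;
        }
        sent.add(nextSeqNum);
        nextSeqNum++;
        return true;
    }

    // Cumulative ACK(n): everything up to and including n is acknowledged.
    void onAck(int n) {
        if (n >= base && n < nextSeqNum) {
            base = n + 1;
        }
        // ACKs below base are duplicates and are ignored.
    }

    // Timeout: retransmit the oldest unACKed packet and all later ones in the window.
    void onTimeout() {
        for (int seq = base; seq < nextSeqNum; seq++) {
            sent.add(seq);
        }
    }

    public static void main(String[] args) {
        GoBackNSenderSketch s = new GoBackNSenderSketch(4);
        for (int i = 0; i < 4; i++) s.send();
        s.onAck(0); s.send();
        s.onAck(1); s.send();
        s.onTimeout();
        System.out.println(s.sent);
    }
}
```

A single timer for the oldest in-flight packet is enough in this scheme because a timeout resends the whole outstanding window.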
Go-Back-N: receiver
§ ACK-only: always send ACK for correctly-received packet so far, with highest in-order seq #
• may generate duplicate ACKs
• need only remember rcv_base
§on receipt of out-of-order packet:
• can discard (don’t buffer) or buffer: an implementation decision
• re-ACK pkt with highest in-order seq #
Receiver view of sequence number space (relative to rcv_base):
• received and ACKed
• Out-of-order: received but not ACKed
• Not received
Go-Back-N in action
sender window (N=4)
sender: send pkt0; send pkt1; send pkt2 (lost, X); send pkt3; (wait)
receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, discard, (re)send ack1
sender: rcv ack0, send pkt4; rcv ack1, send pkt5; ignore duplicate ACK
receiver: receive pkt4, discard, (re)send ack1; receive pkt5, discard, (re)send ack1
sender: pkt 2 timeout; send pkt2; send pkt3; send pkt4; send pkt5
receiver: rcv pkt2, deliver, send ack2; rcv pkt3, deliver, send ack3; rcv pkt4, deliver, send ack4; rcv pkt5, deliver, send ack5
Selective repeat
§receiver individually acknowledges all correctly received packets
• buffers packets, as needed, for eventual in-order delivery to upper layer
§sender times-out/retransmits individually for unACKed packets
• sender maintains timer for each unACKed pkt
§sender window
• N consecutive seq #s
• limits seq #s of sent, unACKed packets
Selective repeat: sender, receiver windows
Selective repeat: sender and receiver
sender
data from above:
§ if next available seq # in window, send packet
timeout(n):
§ resend packet n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
§ mark packet n as received
§ if n smallest unACKed packet, advance window base to next unACKed seq #
receiver
packet n in [rcvbase, rcvbase+N-1]
§ send ACK(n)
§ out-of-order: buffer
§ in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet
packet n in [rcvbase-N,rcvbase-1]
§ ACK(n)
otherwise:
§ ignore
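The receiver cases above map onto a small buffer keyed by sequence number. As a simplifying assumption, this sketch uses unbounded integer sequence numbers instead of a wrapping k-bit space; `SelectiveRepeatReceiverSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a selective-repeat receiver (sequence numbers never wrap here).
class SelectiveRepeatReceiverSketch {
    final int windowSize;                                            // N
    int rcvBase = 0;                                                 // next not-yet-received seq #
    private final Map<Integer, String> buffer = new HashMap<>();     // out-of-order packets
    final List<String> delivered = new ArrayList<>();                // in-order data passed up

    SelectiveRepeatReceiverSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Returns the seq # that is ACKed, or -1 if the packet is ignored.
    int onPacket(int n, String data) {
        if (n >= rcvBase && n <= rcvBase + windowSize - 1) {
            buffer.put(n, data);
            // Deliver any run of consecutive packets starting at rcvBase.
            while (buffer.containsKey(rcvBase)) {
                delivered.add(buffer.remove(rcvBase));
                rcvBase++;
            }
            return n;
        }
        if (n >= rcvBase - windowSize && n <= rcvBase - 1) {
            return n; // already delivered: ACK again so the sender can advance
        }
        return -1;
    }

    public static void main(String[] args) {
        SelectiveRepeatReceiverSketch r = new SelectiveRepeatReceiverSketch(4);
        r.onPacket(0, "p0");
        r.onPacket(1, "p1");
        r.onPacket(3, "p3"); // buffered, pkt2 still missing
        r.onPacket(2, "p2"); // fills the gap: p2 and p3 delivered together
        System.out.println(r.delivered);
    }
}
```

Re-ACKing packets below the window matters because the sender's window cannot move until it has seen an ACK for its oldest packet.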
Selective Repeat in action
sender window (N=4)
sender: send pkt0; send pkt1; send pkt2 (lost, X); send pkt3; (wait)
receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, buffer, send ack3
sender: rcv ack0, send pkt4; rcv ack1, send pkt5; record ack3 arrived
receiver: receive pkt4, buffer, send ack4; receive pkt5, buffer, send ack5
sender: pkt 2 timeout; send pkt2 (but not 3,4,5)
receiver: rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2
Q: what happens when ack2 arrives?
Selective repeat: a dilemma!
example:
§ seq #s: 0, 1, 2, 3 (base 4 counting)
§ window size=3
(a) no problem
sender: sends pkt0, pkt1, pkt2; after the ACKs arrive, the sender window advances and the sender sends pkt3 (lost, X) and a new pkt0
receiver: receives pkt0, pkt1, pkt2; receiver window advances; will accept packet with seq number 0
(b) oops!
sender: sends pkt0, pkt1, pkt2; all three ACKs are lost (X); timeout, retransmit pkt0
receiver: receives pkt0, pkt1, pkt2; receiver window advances; will accept packet with seq number 0
§receiver can’t see sender side
§receiver behavior identical in both cases!
§something’s (very) wrong!
Q: what relationship is needed between sequence # size and window size to avoid problem in scenario (b)?
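One way to explore the question is to test whether the sender's old window and the receiver's advanced window can share a sequence number. The sketch below models only the worst case from scenario (b), where every packet arrives and every ACK is lost; `SrWindowCheckSketch` and `ambiguous` are hypothetical names. The commonly stated answer, which this check agrees with, is that the window size should be at most half the size of the sequence number space.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: can a retransmitted old packet be mistaken for a new one?
class SrWindowCheckSketch {
    // Worst case: the receiver got a full window (seq # 0..windowSize-1) and advanced,
    // but every ACK was lost, so the sender may still retransmit any of those packets.
    static boolean ambiguous(int seqSpace, int windowSize) {
        Set<Integer> oldSenderWindow = new HashSet<>();
        for (int i = 0; i < windowSize; i++) {
            oldSenderWindow.add(i % seqSpace);
        }
        // Receiver's new window covers the next windowSize sequence numbers, modulo seqSpace.
        for (int i = windowSize; i < 2 * windowSize; i++) {
            if (oldSenderWindow.contains(i % seqSpace)) {
                return true; // same seq # means either an old or a new packet
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(ambiguous(4, 3)); // the slide's example
        System.out.println(ambiguous(4, 2));
    }
}
```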
Chapter 3: roadmap
§Transport-layer services
§Multiplexing and demultiplexing
§Connectionless transport: UDP
§Principles of reliable data transfer
§Connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
§Principles of congestion control
§TCP congestion control
TCP: overview RFCs: 793, 1122, 2018, 5681, 7323
§ point-to-point:
• one sender, one receiver
§ reliable, in-order byte stream:
• no “message boundaries”
§ full duplex data:
• bi-directional data flow in same connection
• MSS: maximum segment size
§ cumulative ACKs
§ pipelining:
• TCP congestion and flow control set window size
§ connection-oriented:
• handshaking (exchange of control messages) initializes sender, receiver state before data exchange
§ flow controlled:
• sender will not overwhelm receiver
Transport Layer: 3-79
TCP segment structure
Each row is 32 bits wide:
source port # | dest port #
sequence number (segment seq #: counting bytes of data into bytestream, not segments!)
acknowledgement number (ACK: seq # of next expected byte; A bit: this is an ACK)
head len (length of TCP header) | not used | flags C E U A P R S F | receive window (flow control: # bytes receiver willing to accept)
• C, E: congestion notification
• RST, SYN, FIN (R, S, F): connection management
checksum (Internet checksum) | Urg data pointer
options (variable length): TCP options
application data (variable length): data sent by application into TCP socket
Transport Layer: 3-80
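The field layout above can be made concrete by unpacking the fixed 20-byte part of a header. This is a minimal sketch using Python's standard `struct` module; the function name `parse_tcp_header` and the hand-built `example` header are illustrative assumptions, and options are not parsed.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len is counted in 32-bit words
    flags = offset_flags & 0xFF             # bits C E U A P R S F, high to low
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "header_len": header_len,
        "ACK": bool(flags & 0x10), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
    }


# Hand-built header for illustration: ports 12345 -> 80, Seq=42, ACK=79,
# head len = 5 words (no options), A bit set, receive window 4096.
example = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x10, 4096, 0, 0)
parsed = parse_tcp_header(example)
```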
TCP sequence numbers, ACKs
Sequence numbers:
• byte stream “number” of first byte in segment’s data
Acknowledgements:
• seq # of next byte expected from other side
• cumulative ACK
Q: how does receiver handle out-of-order segments?
• A: TCP spec doesn’t say; up to implementor
[Figure: outgoing segment from sender (sequence number field highlighted) and outgoing segment from receiver (acknowledgement number field and A bit highlighted), shown next to the sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed (“in-flight”) | usable but not yet sent | not usable]
Transport Layer: 3-81
TCP sequence numbers, ACKs
simple telnet scenario
Host A: user types ‘C’; sends Seq=42, ACK=79, data = ‘C’
Host B: host ACKs receipt of ‘C’, echoes back ‘C’; sends Seq=79, ACK=43, data = ‘C’
Host A: host ACKs receipt of echoed ‘C’; sends Seq=43, ACK=80
Transport Layer: 3-82
TCP round trip time, timeout
Q: how to set TCP timeout value?
§ longer than RTT, but RTT varies!
§ too short: premature timeout, unnecessary retransmissions
§ too long: slow reaction to segment loss
Q: how to estimate RTT?
§ SampleRTT: measured time from segment transmission until ACK receipt
• ignore retransmissions
§ SampleRTT will vary, want estimated RTT “smoother”
• average several recent measurements, not just current SampleRTT
Transport Layer: 3-83
TCP round trip time, timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
§ exponential weighted moving average (EWMA)
§ influence of past sample decreases exponentially fast
§ typical value: α = 0.125
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT, RTT (milliseconds) vs. time (seconds)]
Transport Layer: 3-84
TCP round trip time, timeout
§ timeout interval: EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT: want a larger safety margin
TimeoutInterval = EstimatedRTT + 4*DevRTT
(EstimatedRTT: estimated RTT; 4*DevRTT: “safety margin”)
§ DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT| (typically, β = 0.25)
* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Transport Layer: 3-85
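The two EWMA formulas and the timeout rule can be sketched in a few lines. The class name `RttEstimator` is hypothetical. Two details are assumptions borrowed from RFC 6298 rather than from the slides: the first sample seeds EstimatedRTT (with DevRTT set to half of it), and DevRTT is updated before EstimatedRTT, using the previous estimate.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout interval (times in milliseconds)."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # Assumption (RFC 6298 style): seed both values from the first sample.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Assumption: DevRTT is updated first, using the previous EstimatedRTT.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        """TimeoutInterval = EstimatedRTT + 4*DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator()
first_timeout = estimator.update(100.0)
second_timeout = estimator.update(120.0)
```

Samples taken from retransmitted segments should not be passed to `update`, matching the "ignore retransmissions" bullet.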
TCP Sender (simplified)
event: data received from application
§ create segment with seq #
§ seq # is byte-stream number of first data byte in segment
§ start timer if not already running
• think of timer as for oldest unACKed segment
• expiration interval: TimeOutInterval
event: timeout
§ retransmit segment that caused timeout
§ restart timer
event: ACK received
§ if ACK acknowledges previously unACKed segments
• update what is known to be ACKed
• start timer if there are still unACKed segments
Transport Layer: 3-86
TCP Receiver: ACK generation [RFC 5681]
Event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

Event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
Transport Layer: 3-87
TCP: retransmission scenarios
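lost ACK scenario:
Host A sends Seq=92, 8 bytes of data (SendBase=92)
Host B sends ACK=100, which is lost (X)
timeout: Host A resends Seq=92, 8 bytes of data
Host B sends ACK=100 again; SendBase=100
premature timeout:
Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data (SendBase=92)
Host B sends ACK=100, then ACK=120
timeout fires before the ACKs arrive: Host A resends Seq=92, 8 bytes of data
the ACKs arrive: SendBase=100, then SendBase=120
Host B answers the duplicate by sending cumulative ACK for 120 (ACK=120); SendBase=120
Transport Layer: 3-88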
TCP: retransmission scenarios
Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
Host B sends ACK=100, which is lost (X), then ACK=120
ACK=120 arrives before the timeout; Host A continues with Seq=120, 15 bytes of data
cumulative ACK covers for earlier lost ACK
Transport Layer: 3-89
TCP fast retransmit
if sender receives 3 additional ACKs for same data (“triple duplicate ACKs”), resend unACKed segment with smallest seq #
§ likely that unACKed segment lost, so don’t wait for timeout
Receipt of three duplicate ACKs indicates 3 segments received after a missing segment; lost segment is likely. So retransmit!
[Figure: Host A sends several segments to Host B; the segment with Seq=100, 20 bytes of data is lost (X) and is resent once the duplicate ACKs arrive, before the timeout]
Transport Layer: 3-90
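The triple-duplicate-ACK rule amounts to a small counter on the sender side. The following is a minimal sketch, assuming the sender only sees a list of cumulative ACK numbers; the function name `fast_retransmit_points` is hypothetical.

```python
def fast_retransmit_points(acks, dup_threshold=3):
    """Return the ACK numbers at which a fast retransmit would be triggered.

    `acks` is the sequence of cumulative ACK numbers seen by the sender.
    A retransmit of the segment starting at byte `ack` is triggered when
    `dup_threshold` additional ACKs for the same data have arrived.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                # Resend the unACKed segment with the smallest seq #.
                triggers.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggers


# One original ACK=100 followed by three duplicates triggers one retransmit.
triggered = fast_retransmit_points([100, 100, 100, 100])
```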
Chapter 3: roadmap
§ Transport-layer services
§ Multiplexing and demultiplexing
§ Connectionless transport: UDP
§ Principles of reliable data transfer
§ Connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
§ Principles of congestion control
§ TCP congestion control
Transport Layer: 3-91
TCP flow control
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?
Application removing data from TCP socket buffers
Network layer delivering IP datagram payload into TCP socket buffers
application process
TCP socket receiver buffers
TCP code
IP code
from sender
receiver protocol stack
Transport Layer: 3-92
TCP flow control
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?
receive window (flow control: # bytes receiver willing to accept)
[Figure: receiver protocol stack: application process, TCP socket receiver buffers, TCP code, IP code, from sender; application removing data from TCP socket buffers]
Transport Layer: 3-94
TCP flow control
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?
flow control
receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
Application removing data from TCP socket buffers
application process
TCP socket receiver buffers
TCP code
IP code
from sender
receiver protocol stack
Transport Layer: 3-95
TCP flow control
§TCP receiver “advertises” free buffer space in rwnd field in TCP header
• RcvBuffer size set via socket options (typical default is 4096 bytes)
• many operating systems autoadjust RcvBuffer
§sender limits amount of unACKed (“in-flight”) data to received rwnd
§guarantees receive buffer will not overflow
[Figure: TCP receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data (passed on to application process) and free buffer space (rwnd)]
Transport Layer: 3-96
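The bookkeeping behind `rwnd` can be sketched as follows. The class and function names are hypothetical; the variables `LastByteRcvd` and `LastByteRead` are not on the slide and are assumed here as the usual way to track how much of RcvBuffer is occupied.

```python
class ReceiveBuffer:
    """Byte-count bookkeeping for a TCP receiver's advertised window."""

    def __init__(self, rcv_buffer: int = 4096):
        self.rcv_buffer = rcv_buffer
        self.last_byte_rcvd = 0   # highest byte placed into the buffer
        self.last_byte_read = 0   # highest byte removed by the application

    def rwnd(self) -> int:
        """Free buffer space, advertised to the sender in the rwnd field."""
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def receive(self, nbytes: int) -> int:
        """Accept up to `nbytes` from the network; return the number accepted."""
        accepted = min(nbytes, self.rwnd())
        self.last_byte_rcvd += accepted
        return accepted

    def app_read(self, nbytes: int) -> int:
        """Application removes up to `nbytes` from the buffer."""
        taken = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += taken
        return taken


def sender_may_send(last_byte_sent: int, last_byte_acked: int, rwnd: int) -> int:
    """Bytes the sender may still put in flight without exceeding rwnd."""
    return max(0, rwnd - (last_byte_sent - last_byte_acked))


buf = ReceiveBuffer(4096)
buf.receive(3000)
window_after_receive = buf.rwnd()
```

As long as the sender keeps its in-flight data within the most recently advertised `rwnd`, `receive` never has to drop bytes.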
TCP connection management
before exchanging data, sender/receiver “handshake”:
§ agree to establish connection (each knowing the other willing to establish connection)
§ agree on connection parameters (e.g., starting seq #s)
client side (application, network):
connection state: ESTAB
connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client
Socket clientSocket = new Socket(“hostname”, “port number”);
server side (application, network):
connection state: ESTAB
connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client
Socket connectionSocket = welcomeSocket.accept();
Transport Layer: 3-98
Agreeing to establish a connection
2-way handshake:
human analogy: “Let’s talk”, “OK”; both sides ESTAB
client: choose x, send req_conn(x)
server: ESTAB, send acc_conn(x)
client: ESTAB
Q: will 2-way handshake always work in network?
§ variable delays
§ retransmitted messages (e.g. req_conn(x)) due to message loss
§ message reordering
§ can’t “see” other side
Transport Layer: 3-99
2-way handshake scenarios
No problem!
client: choose x, send req_conn(x)
server: ESTAB, send acc_conn(x)
client: ESTAB, send data(x+1)
server: accept data(x+1), send ACK(x+1)
connection x completes
Transport Layer: 3-100
2-way handshake scenarios
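client: choose x, send req_conn(x)
client: retransmit req_conn(x)
server: ESTAB, send acc_conn(x)
client: ESTAB
connection x completes; client terminates; server forgets x
the retransmitted req_conn(x) arrives at the server
server: ESTAB, send acc_conn(x)
Problem: half open connection! (no client)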
Transport Layer: 3-101
2-way handshake scenarios
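client: choose x, send req_conn(x)
client: retransmit req_conn(x)
server: ESTAB, send acc_conn(x)
client: ESTAB, send data(x+1)
client: retransmit data(x+1)
server: accept data(x+1)
connection x completes; client terminates; server forgets x
the retransmitted req_conn(x) arrives at the server: ESTAB
the retransmitted data(x+1) arrives at the server: accept data(x+1)
Problem: dup data accepted!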
TCP 3-way handshake
Client state:
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName,serverPort))
Server state:
serverSocket = socket(AF_INET,SOCK_STREAM)
serverSocket.bind((‘’,serverPort))
serverSocket.listen(1)
connectionSocket, addr = serverSocket.accept()
1. client (LISTEN to SYNSENT): choose init seq num, x; send TCP SYN msg: SYNbit=1, Seq=x
2. server (LISTEN to SYN RCVD): choose init seq num, y; send TCP SYNACK msg, acking SYN: SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1
3. client (SYNSENT to ESTAB): received SYNACK(x) indicates server is live; send ACK for SYNACK: ACKbit=1, ACKnum=y+1; this segment may contain client-to-server data
4. server (SYN RCVD to ESTAB): received ACK(y) indicates client is live
Transport Layer: 3-103
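The sequence and ACK numbers exchanged in the three segments can be written out explicitly. This sketch only models the header fields named on the slide, as plain dictionaries; the function name `three_way_handshake` is hypothetical, and the modulo reflects the 32-bit sequence number field.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit values


def three_way_handshake(x=None, y=None):
    """Return the SYN, SYNACK and ACK segments as dictionaries of fields."""
    x = random.randrange(SEQ_SPACE) if x is None else x   # client's init seq num
    y = random.randrange(SEQ_SPACE) if y is None else y   # server's init seq num
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": (x + 1) % SEQ_SPACE}
    ack = {"ACKbit": 1, "ACKnum": (y + 1) % SEQ_SPACE}
    return [syn, synack, ack]


syn, synack, ack = three_way_handshake(x=100, y=300)
```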
A human 3-way handshake protocol
1. On belay?
2. Belay on.
3. Climbing.
Transport Layer: 3-104
Closing a TCP connection
§ client, server each close their side of connection
• send TCP segment with FIN bit = 1
§respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
§simultaneous FIN exchanges can be handled
Transport Layer: 3-105
Chapter 3: roadmap
§ Transport-layer services
§ Multiplexing and demultiplexing
§ Connectionless transport: UDP
§ Principles of reliable data transfer
§ Connection-oriented transport: TCP
§ Principles of congestion control
§ TCP congestion control
§ Evolution of transport-layer functionality
Transport Layer: 3-106
Principles of congestion control
Congestion:
§informally: “too many sources sending too much data too fast for network to handle”
§ manifestations:
• long delays (queueing in router buffers)
• packet loss (buffer overflow at routers)
§ different from flow control!
§ a top-10 problem!
congestion control: too many senders, sending too fast
flow control: one sender too fast for one receiver
Transport Layer: 3-107
Causes/costs of congestion: scenario 1
Simplest scenario:
§ one router, infinite buffers
§ input, output link capacity: R
§ two flows
§ no retransmissions needed
[Figure: Host A sends original data: λin; Host B receives throughput: λout; infinite shared output link buffers at the router]
Q: What happens as arrival rate λin approaches R/2?
[Figure: throughput λout vs. λin, both axes up to R/2] maximum per-connection throughput: R/2
[Figure: delay vs. λin, up to R/2] large delays as arrival rate λin approaches capacity
Transport Layer: 3-108
Causes/costs of congestion: scenario 2
§ one router, finite buffers
§ sender retransmits lost, timed-out packet
• application-layer input = application-layer output: λin = λout
• transport-layer input includes retransmissions: λ’in ≥ λin
[Figure: Host A sends λin: original data and λ’in: original data, plus retransmitted data, through finite shared output link buffers to Host B, which receives λout]
Transport Layer: 3-109
Causes/costs of congestion: scenario 2
Idealization: perfect knowledge
§ sender sends only when router buffers available
[Figure: Host A (λin: original data; λ’in: original data, plus retransmitted data) sends a copy only when there is free buffer space in the finite shared output link buffers; throughput λout vs. λin, both axes up to R/2]
Transport Layer: 3-110
Causes/costs of congestion: scenario 2
Idealization: some perfect knowledge
§ packets can be lost (dropped at router) due to full buffers
§ sender knows when packet has been dropped: only resends if packet known to be lost
[Figure: Host A (λin: original data; λ’in: original data, plus retransmitted data) keeps a copy of each packet; a packet arriving when there is no buffer space in the finite shared output link buffers is dropped]
Transport Layer: 3-111
Causes/costs of congestion: scenario 2
Idealization: some perfect knowledge
§ packets can be lost (dropped at router) due to full buffers
§ sender knows when packet has been dropped: only resends if packet known to be lost
§ when sending at R/2, some packets are needed retransmissions
[Figure: throughput λout vs. λin, both axes up to R/2; the gap below R/2 is “wasted” capacity due to retransmissions]
Transport Layer: 3-112
Causes/costs of congestion: scenario 2
Realistic scenario: un-needed duplicates
§ packets can be lost, dropped at router due to full buffers, requiring retransmissions
§ but sender can time out prematurely, sending two copies, both of which are delivered
§ when sending at R/2, some packets are retransmissions, including needed and un-needed duplicates, that are delivered!
[Figure: Host A (λin: original data; λ’in: original data, plus retransmitted data) times out and sends a copy even though there is free buffer space; throughput λout vs. λin, both axes up to R/2, with “wasted” capacity due to un-needed retransmissions]
Transport Layer: 3-113
Causes/costs of congestion: scenario 3
§ four senders
§ multi-hop paths
§ timeout/retransmit
Q: what happens as λin and λ’in increase?
A: as red λ’in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C, D on multi-hop paths; Host A sends λin: original data and λ’in: original data, plus retransmitted data; finite shared output link buffers; λout]
Transport Layer: 3-115
Causes/costs of congestion: scenario 3
[Figure: λout vs. λ’in, both axes up to R/2]
another “cost” of congestion:
§ when packet dropped, any upstream transmission capacity and buffering used for that packet was wasted!
Transport Layer: 3-116
Causes/costs of congestion: scenario 2
Realistic scenario: un-needed duplicates
§ packets can be lost, dropped at router due to full buffers, requiring retransmissions
§ but sender can time out prematurely, sending two copies, both of which are delivered
“costs” of congestion:
§ more work (retransmission) for given receiver throughput
§ unneeded retransmissions: link carries multiple copies of a packet
• decreasing maximum achievable throughput
§ when sending at R/2, some packets are retransmissions, including needed and un-needed duplicates, that are delivered!
[Figure: throughput λout vs. λin, both axes up to R/2, with “wasted” capacity due to un-needed retransmissions]
Transport Layer: 3-114
Causes/costs of congestion: insights
§ throughput can never exceed capacity
§ delay increases as capacity approached
§ loss/retransmission decreases effective throughput
§ un-needed duplicates further decrease effective throughput
§ upstream transmission capacity / buffering wasted for packets lost downstream
[Figure: one small plot per insight: throughput λout or delay vs. λin (or λ’in), axes up to R/2]
Transport Layer: 3-117
Approaches towards congestion control
End-end congestion control:
§ no explicit feedback from network
§ congestion inferred from observed loss, delay
§ approach taken by TCP
Transport Layer: 3-118
Approaches towards congestion control
Network-assisted congestion control:
§routers provide direct feedback to sending/receiving hosts with flows passing through congested router
§may indicate congestion level or explicitly set sending rate
§TCP ECN, ATM, DECbit protocols
[Figure: data and ACKs flow between hosts through a congested router, which returns explicit congestion info]
Transport Layer: 3-119
Chapter 3: roadmap
§ Transport-layer services
§ Multiplexing and demultiplexing
§ Connectionless transport: UDP
§ Principles of reliable data transfer
§ Connection-oriented transport: TCP
§ Principles of congestion control
§ TCP congestion control
§ Evolution of transport-layer functionality
Transport Layer: 3-120
TCP congestion control: AIMD
§ approach: senders can increase sending rate until packet loss (congestion) occurs, then decrease sending rate on loss event
Additive Increase: increase sending rate by 1 maximum segment size every RTT until loss detected
Multiplicative Decrease: cut sending rate in half at each loss event
AIMD sawtooth behavior: probing for bandwidth
[Figure: TCP sender sending rate vs. time, showing the sawtooth]
Transport Layer: 3-121
TCP AIMD: more
Multiplicative decrease detail: sending rate is
§ Cut in half on loss detected by triple duplicate ACK (TCP Reno)
§ Cut to 1 MSS (maximum segment size) when loss detected by timeout (TCP Tahoe)
Why AIMD?
§ AIMD – a distributed, asynchronous algorithm – has been
shown to:
• optimize congested flow rates network wide!
• have desirable stability properties
Transport Layer: 3-122
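The sawtooth can be reproduced with a few lines of simulation. This is a sketch under simplifying assumptions: the window is counted in MSS units, time advances one RTT per step, and the loss events are supplied by the caller in the hypothetical `loss_at` set rather than produced by a network model.

```python
def aimd(rtts: int, loss_at: set, mss: float = 1.0, start: float = 1.0):
    """Return the sending window (in MSS units by default) at each RTT."""
    cwnd = start
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            cwnd = max(mss, cwnd / 2)   # multiplicative decrease: cut in half
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace


# A loss during RTT 4 halves the window; additive increase then resumes.
sawtooth = aimd(rtts=8, loss_at={4})
```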
TCP congestion control: details
TCP sending behavior:
§ roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
TCP rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space: last byte ACKed | sent, but not-yet ACKed (“in-flight”), spanning cwnd | last byte sent | available but not used]
§ TCP sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
§ cwnd is dynamically adjusted in response to observed network congestion (implementing TCP congestion control)
Transport Layer: 3-123
TCP slow start
§ when connection begins, increase rate exponentially until first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd for every ACK received
§ summary: initial rate is slow, but ramps up exponentially fast
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
Transport Layer: 3-124
TCP: from slow start to congestion avoidance
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
§ variable ssthresh
§ on loss event, ssthresh is set to 1/2 of cwnd just before loss event
* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Transport Layer: 3-125
Summary: TCP congestion control
[FSM with three states: slow start, congestion avoidance, fast recovery; Λ means no event or no action]
initial (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start

slow start:
§ new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
§ duplicate ACK: dupACKcount++
§ cwnd > ssthresh: Λ; go to congestion avoidance
§ timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
§ dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery

congestion avoidance:
§ new ACK: cwnd = cwnd + MSS • (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
§ duplicate ACK: dupACKcount++
§ timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
§ dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; go to fast recovery

fast recovery:
§ duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
§ new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
§ timeout: ssthresh = cwnd/2; cwnd = 1; dupACKcount = 0; retransmit missing segment; go to slow start
Transport Layer: 3-126
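The three-state summary can be sketched as a small class. This is an illustration only: the class name `RenoSender` and the value `MSS = 1460` are assumptions, the slide's `ssthresh + 3` is read as `ssthresh + 3 MSS`, `cwnd = 1` is read as 1 MSS, and actual segment transmission and retransmission are left out.

```python
MSS = 1460  # bytes; an assumed value for illustration


class RenoSender:
    """Window bookkeeping for slow start, congestion avoidance, fast recovery."""

    def __init__(self):
        self.state = "slow start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * 1024
        self.dup_ack_count = 0

    def on_new_ack(self):
        if self.state == "slow start":
            self.cwnd += MSS                        # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        elif self.state == "congestion avoidance":
            self.cwnd += MSS * MSS / self.cwnd      # about +1 MSS per RTT
        else:                                       # fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"
        self.dup_ack_count = 0

    def on_duplicate_ack(self):
        if self.state == "fast recovery":
            self.cwnd += MSS
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS     # assumption: "+ 3" means 3 MSS
            self.state = "fast recovery"            # missing segment is resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_ack_count = 0
        self.state = "slow start"                   # missing segment is resent here


sender = RenoSender()
for _ in range(3):
    sender.on_new_ack()
for _ in range(3):
    sender.on_duplicate_ack()
state_after_triple_dup = sender.state
```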
TCP CUBIC
§ Is there a better way than AIMD to “probe” for usable bandwidth?
§ Insight/intuition:
• Wmax: sending rate at which congestion loss was detected
• congestion state of bottleneck link probably (?) hasn’t changed much
• after cutting rate/window in half on loss, initially ramp to Wmax faster, but then approach Wmax more slowly
[Figure: window vs. time between Wmax/2 and Wmax for classic TCP and TCP CUBIC; TCP CUBIC achieves higher throughput in this example]
Transport Layer: 3-127
TCP CUBIC
§ K: point in time when TCP window size will reach Wmax
• K itself is tuneable
§ increase W as a function of the cube of the distance between current time and K
• larger increases when further away from K
• smaller increases (cautious) when nearer K
§ TCP CUBIC default in Linux, most popular TCP for popular Web servers
[Figure: TCP sending rate vs. time (t0 to t4) for TCP Reno and TCP CUBIC, relative to Wmax]
Transport Layer: 3-128
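The "cube of the distance to K" idea can be written as a one-line window function. The slide does not give the formula; the form below, W(t) = C·(t − K)³ + Wmax with C = 0.4, is taken from RFC 8312 and is an assumption here. The default `beta = 0.5` follows the slide's "cutting rate/window in half" (RFC 8312 itself uses 0.7), and the function name `cubic_window` is hypothetical.

```python
def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.5) -> float:
    """Window size t seconds after a loss that occurred at window w_max.

    Assumed form (RFC 8312): W(t) = c * (t - K)**3 + w_max. Right after the
    loss the window is beta * w_max, and K is the time at which the curve
    returns to w_max.
    """
    k = (w_max * (1 - beta) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max


# With w_max = 100: starts at 50, climbs quickly, then flattens near w_max at K = 5.
early_gain = cubic_window(1, 100) - cubic_window(0, 100)
late_gain = cubic_window(5, 100) - cubic_window(4, 100)
```

The large early gain and small late gain correspond to the two sub-bullets above: larger increases far from K, cautious ones near K.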
TCP and the congested “bottleneck link”
§TCP (classic, CUBIC) increase TCP’s sending rate until packet loss occurs at some router’s output: the bottleneck link
[Figure: source and destination protocol stacks (application, TCP, network, link, physical) connected through a router]
packet queue almost never empty, sometimes overflows (packet loss)
bottleneck link (almost always busy)
Transport Layer: 3-129
TCP and the congested “bottleneck link”
§TCP (classic, CUBIC) increase TCP’s sending rate until packet loss occurs at some router’s output: the bottleneck link
§understanding congestion: useful to focus on congested bottleneck link
insight: increasing TCP sending rate will not increase end-end throughput with congested bottleneck
insight: increasing TCP sending rate will increase measured RTT
[Figure: source and destination protocol stacks (application, TCP, network, link, physical); RTT measured across the bottleneck link]
Goal: “keep the end-end pipe just full, but not fuller”
Transport Layer: 3-130
Delay-based TCP congestion control
Keeping sender-to-receiver pipe “just full enough, but no fuller”: keep bottleneck link busy transmitting, but avoid high delays/buffering
measured throughput = (# bytes sent in last RTT interval) / RTTmeasured
Delay-based approach:
§ RTTmin: minimum observed RTT (uncongested path)
§ uncongested throughput with congestion window cwnd is cwnd/RTTmin
if measured throughput “very close” to uncongested throughput
    increase cwnd linearly /* since path not congested */
else if measured throughput “far below” uncongested throughput
    decrease cwnd linearly /* since path is congested */
Transport Layer: 3-131
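The rule above can be sketched as a single update step. The slide leaves “very close” and “far below” undefined, so the thresholds `close` and `far`, the `mss` step size, and the function name `delay_based_update` are all assumptions chosen for illustration.

```python
def delay_based_update(cwnd, bytes_sent_last_rtt, rtt_measured, rtt_min,
                       mss=1460, close=0.9, far=0.5):
    """Return the new cwnd (bytes) after one RTT of measurements.

    `close` and `far` are hypothetical thresholds standing in for the
    slide's "very close" and "far below".
    """
    uncongested = cwnd / rtt_min                    # best-case throughput
    measured = bytes_sent_last_rtt / rtt_measured   # what was actually achieved
    if measured >= close * uncongested:
        return cwnd + mss                 # path not congested: increase linearly
    if measured <= far * uncongested:
        return max(mss, cwnd - mss)       # path congested: decrease linearly
    return cwnd                           # in between: leave cwnd unchanged


# RTT equal to RTTmin: no queueing, so the window grows by one MSS.
grown = delay_based_update(14600, 14600, rtt_measured=0.100, rtt_min=0.100)
```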
Delay-based TCP congestion control
§ congestion control without inducing/forcing loss
§ maximizing throughput (“keeping the pipe just full… ”) while keeping delay low (“…but not fuller”)
§ a number of deployed TCPs take a delay-based approach
§ BBR deployed on Google’s (internal) backbone network
Transport Layer: 3-132
Explicit congestion notification (ECN)
TCP deployments often implement network-assisted congestion control:
§ two bits in IP header (ToS field) marked by network router to indicate congestion
• policy to determine marking chosen by network operator
§ congestion indication carried to destination
§ destination sets ECE bit on ACK segment to notify sender of congestion
§ involves both IP (IP header ECN bit marking) and TCP (TCP header C,E bit marking)
[Figure: source sends IP datagram with ECN=10; a congested router marks it ECN=11; destination returns TCP ACK segment with ECE=1; source and destination stacks show application, TCP, network, link, physical]
Transport Layer: 3-133
TCP fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Transport Layer: 3-134
Q: is TCP Fair?
Example: two competing TCP sessions:
§ additive increase gives slope of 1, as throughput increases
§ multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal bandwidth share line; the trajectory alternates “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
Is TCP fair?
A: Yes, under idealized assumptions:
§ same RTT
§ fixed number of sessions only in congestion avoidance
Transport Layer: 3-135
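The convergence argument can be checked numerically. The sketch below assumes the idealized conditions listed above plus one more: both flows see a loss in the same round whenever their combined rate exceeds the link capacity. The function name `two_flow_aimd` is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds, step=1.0):
    """Two AIMD flows on one link; both see a loss when x1 + x2 > capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2          # multiplicative decrease for both
        else:
            x1, x2 = x1 + step, x2 + step    # additive increase: slope of 1
    return x1, x2


# Start far from the equal-share line: 80 vs. 10 on a link of capacity 100.
flow1, flow2 = two_flow_aimd(80.0, 10.0, capacity=100.0, rounds=1000)
```

Additive increase leaves the difference between the two rates unchanged, while every loss event halves it, so the rates drift toward the equal bandwidth share.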
Fairness: must all network apps be “fair”?
Fairness and UDP
§multimedia apps often do not use TCP
• do not want rate throttled by congestion control
§instead use UDP:
• send audio/video at constant rate,
tolerate packet loss
§there is no “Internet police” policing use of congestion control
Fairness, parallel TCP connections
§application can open multiple parallel connections between two hosts
§ web browsers do this, e.g., link of rate R with 9 existing connections:
• new app asks for 1 TCP, gets rate R/10
• new app asks for 11 TCPs, gets R/2
Transport Layer: 3-136
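The arithmetic behind these two numbers, assuming every TCP connection on the link gets an equal share, can be checked with exact fractions. The helper name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing: int, new: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening `new` connections,
    assuming all connections on the link receive equal shares."""
    return Fraction(new, existing + new)


one_connection = app_share(existing=9, new=1)       # 1 of 10 connections
eleven_connections = app_share(existing=9, new=11)  # 11 of 20 connections
```

With 11 of 20 connections the app gets 11/20 of R, which the slide rounds to R/2.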
Transport layer: roadmap
§ Transport-layer services
§ Multiplexing and demultiplexing
§ Connectionless transport: UDP
§ Principles of reliable data transfer
§ Connection-oriented transport: TCP
§ Principles of congestion control
§ TCP congestion control
§ Evolution of transport-layer functionality
Transport Layer: 3-137
Evolving transport-layer functionality
§ TCP, UDP: principal transport protocols for 40 years
§ different “flavors” of TCP developed, for specific scenarios:
Scenario: Challenges
• Long, fat pipes (large data transfers): Many packets “in flight”; loss shuts down pipeline
• Wireless networks: Loss due to noisy wireless links, mobility; TCP treats this as congestion loss
• Long-delay links: Extremely long RTTs
• Data center networks: Latency sensitive
• Background traffic flows: Low priority, “background” TCP flows
§ moving transport-layer functions to application layer, on top of UDP
• HTTP/3: QUIC
Transport Layer: 3-138
QUIC: Quick UDP Internet Connections
§application-layer protocol, on top of UDP
• increase performance of HTTP
• deployed on many Google servers, apps (Chrome, mobile YouTube app)
[Figure: protocol stacks. HTTP/2 over TCP: HTTP/2 (application), TLS, TCP (transport), IP (network). HTTP/2 over QUIC over UDP, i.e. HTTP/3: HTTP/2 (slimmed) and QUIC (application), UDP (transport), IP (network)]
Transport Layer: 3-139
QUIC: Quick UDP Internet Connections
adopts approaches we’ve studied in this chapter for connection establishment, error control, congestion control
• error and congestion control: “Readers familiar with TCP’s loss detection and congestion control will find algorithms here that parallel well-known TCP ones.” [from QUIC specification]
• connection establishment: reliability, congestion control, authentication, encryption, state established in one RTT
§ multiple application-level “streams” multiplexed over single QUIC connection
• separate reliable data transfer, security
• common congestion control
Transport Layer: 3-140
QUIC: Connection establishment
TCP (reliability, congestion control state) + TLS (authentication, crypto state)
§ 2 serial handshakes: TCP handshake (transport layer), then TLS handshake (security), then data
QUIC: reliability, congestion control, authentication, crypto state
§ 1 handshake: QUIC handshake, then data
Transport Layer: 3-141
QUIC: streams: parallelism, no HOL blocking
[Figure: (a) HTTP 1.1: several HTTP GETs (application) share one TLS encryption, TCP RDT, and TCP Cong. Contr. (transport); an error in TCP RDT holds up every GET. (b) HTTP/2 with QUIC: no HOL blocking: each HTTP GET has its own QUIC encrypt and QUIC RDT over a common QUIC Cong. Cont. and UDP; an error in one stream’s QUIC RDT affects only that stream]
Transport Layer: 3-142
Chapter 3: summary
§principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
§ instantiation, implementation in the Internet
• UDP
• TCP
Up next:
§leaving the network “edge” (application, transport layers)
§into the network “core”
§two network-layer chapters:
• data plane
• control plane
Transport Layer: 3-143
Additional Chapter 3 slides
Transport Layer: 3-144
Go-Back-N: sender extended FSM
initial (Λ):
    base=1
    nextseqnum=1

state: Wait

rdt_send(data)
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)

timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    ...
    udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer

rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    Λ
Transport Layer: 3-145
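The sender FSM translates fairly directly into a class. This is a sketch, not the slides' code: `GbnSender` and the `udt_send` callback are hypothetical names, the checksum is omitted, the timer is reduced to a boolean flag, and a guard against stale ACKs is added that the FSM itself does not have.

```python
class GbnSender:
    """Go-Back-N sender bookkeeping; packets are (seqnum, data) tuples."""

    def __init__(self, window_size, udt_send):
        self.N = window_size
        self.udt_send = udt_send     # callback that puts a packet on the channel
        self.base = 1
        self.nextseqnum = 1
        self.sndpkt = {}             # seqnum -> packet, kept until ACKed
        self.timer_running = False

    def rdt_send(self, data) -> bool:
        if self.nextseqnum >= self.base + self.N:
            return False             # refuse_data: window is full
        pkt = (self.nextseqnum, data)
        self.sndpkt[self.nextseqnum] = pkt
        self.udt_send(pkt)
        if self.base == self.nextseqnum:
            self.timer_running = True        # start_timer
        self.nextseqnum += 1
        return True

    def rdt_rcv_ack(self, acknum: int):
        if acknum < self.base:
            return                   # added guard: ignore stale or duplicate ACKs
        for seq in range(self.base, acknum + 1):
            self.sndpkt.pop(seq, None)       # cumulative ACK frees these packets
        self.base = acknum + 1
        # stop_timer if nothing is outstanding, otherwise (re)start it
        self.timer_running = self.base != self.nextseqnum

    def timeout(self):
        self.timer_running = True            # start_timer
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(self.sndpkt[seq])  # resend every unACKed packet


sent = []
gbn = GbnSender(window_size=4, udt_send=sent.append)
accepted = [gbn.rdt_send(ch) for ch in "abcde"]
```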
Go-Back-N: receiver extended FSM
initial (Λ):
    expectedseqnum=1
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)

state: Wait

any other event
    udt_send(sndpkt)

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

ACK-only: always send ACK for correctly-received packet with highest in-order seq #
• may generate duplicate ACKs
• need only remember expectedseqnum
§ out-of-order packet:
• discard (don’t buffer): no receiver buffering!
• re-ACK pkt with highest in-order seq #
Transport Layer: 3-146
TCP sender (simplified)
initial (Λ):
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

state: wait for event

data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., “send”)
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer

timeout
    retransmit not-yet-acked segment with smallest seq. #
    start timer

ACK received, with ACK field value y
    if (y > SendBase) {
        SendBase = y
        /* SendBase–1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else stop timer
    }
Transport Layer: 3-147
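The three events of the simplified sender can be sketched as methods on one object. The names `SimpleTcpSender` and the `send` callback are hypothetical, the timer is a boolean flag, and sequence number wrap-around, flow control and congestion control are ignored.

```python
class SimpleTcpSender:
    """Simplified TCP sender: one timer, cumulative ACKs, no windows."""

    def __init__(self, initial_seq_num: int, send):
        self.send = send                  # callback: send(seq, data) passes to IP
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}                 # seq # -> data of not-yet-acked segments
        self.timer_running = False

    def on_app_data(self, data: bytes):
        seq = self.next_seq_num           # byte-stream number of first data byte
        self.unacked[seq] = data
        self.send(seq, data)
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True     # start timer

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)           # not-yet-acked segment, smallest seq. #
        self.send(seq, self.unacked[seq])
        self.timer_running = True         # start timer

    def on_ack(self, y: int):
        if y > self.send_base:
            self.send_base = y            # send_base - 1: last cumulatively ACKed byte
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


log = []
tcp = SimpleTcpSender(92, lambda seq, data: log.append((seq, len(data))))
tcp.on_app_data(b"x" * 8)     # Seq=92, 8 bytes of data
tcp.on_app_data(b"y" * 20)    # Seq=100, 20 bytes of data
tcp.on_timeout()              # premature timeout: resend Seq=92 only
```

The usage lines replay the premature timeout scenario from the retransmission slides: only the oldest segment is resent, and a later cumulative ACK=120 acknowledges both.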
TCP 3-way handshake FSM
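[FSM: Λ means no event or no action; each transition is written event / action]
server side:
closed to listen: Socket connectionSocket = welcomeSocket.accept(); / Λ
listen to SYN rcvd: SYN(x) / SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
SYN rcvd to ESTAB: ACK(ACKnum=y+1) / Λ
client side:
closed to SYN sent: Socket clientSocket = new Socket(“hostname”,”port number”); / SYN(seq=x)
SYN sent to ESTAB: SYNACK(seq=y,ACKnum=x+1) / ACK(ACKnum=y+1)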
Transport Layer: 3-148
Closing a TCP connection
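client state: ESTAB; server state: ESTAB
1. client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
2. server: sends ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data)
3. client: receives ACK; enters FIN_WAIT_2 (wait for server close)
4. server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
5. client: sends ACKbit=1; ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
6. server: receives ACK; enters CLOSED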
Transport Layer: 3-149
TCP throughput
§ avg. TCP thruput as function of window size, RTT?
• ignore slow start, assume there is always data to send
§ W: window size (measured in bytes) where loss occurs
• avg. window size (# in-flight bytes) is 3/4 W
• avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) * W / RTT bytes/sec
[Figure: sawtooth of window size between W/2 and W]
TCP over “long, fat pipes”
§ example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
§ requires W = 83,333 in-flight segments
§ throughput in terms of segment loss probability, L [Mathis 1997]:
TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
➜ to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10, a very small loss rate!
§versions of TCP for long, high-speed scenarios
Transport Layer: 3-151
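Both numbers on this slide can be reproduced from the two formulas. The function names below are hypothetical, and the 1500-byte segment size is used as the MSS, as the slide's example does.

```python
from math import sqrt


def window_segments(throughput_bps: float, rtt_s: float, segment_bytes: int) -> float:
    """In-flight segments needed to sustain `throughput_bps` over one RTT."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * sqrt(loss))


def required_loss(throughput_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Solve the formula above for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)     # about 83,333 segments
loss = required_loss(10e9, 1500, 0.100)    # about 2.1e-10
```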