Computer Systems Transport Layer
Dr. Mian M. Hamayun
m.m.hamayun@bham.ac.uk
Based on material and slides from
Computer Networking: A Top-Down Approach, 7th Edition, Chapter 3
Jim Kurose, Keith Ross (Pearson/Addison Wesley)
Lecture Objective
The objective of this lecture is to understand the conceptual and implementation aspects of transport layer protocols
Slide# 2 of 63
Lecture Outline
Transport Layer Services & Protocols
Multiplexing / Demultiplexing
User Datagram Protocol (UDP)
Principles of Reliable Data Transfer
Transmission Control Protocol (TCP)
Flow Control
Connection Management
Summary
Recap – Network Layers
Transport Layer Services
Provide logical communication between application processes running on different hosts.
Transport protocols run in end systems (not routers)
send side: breaks app messages into segments, passes to network layer
receive side: reassembles segments into messages, passes to application layer
More than one transport protocol available to apps
Internet: TCP and UDP
Transport vs. Network Layer
Network layer: logical communication between hosts.
Transport layer: logical communication between processes
relies on, enhances, network layer services
Household Analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
hosts = houses
processes = kids
app messages = letters in envelopes
transport protocol = Ann and Bill who demux to in-house siblings
network-layer protocol = postal service
Internet Transport Layer Protocols
Reliable, in-order delivery: TCP
congestion control
flow control
connection setup
Unreliable, unordered delivery: UDP
no-frills extension of “best-effort” IP
Services not available:
delay guarantees
bandwidth guarantees
Multiplexing / Demultiplexing
multiplexing at sender:
handle data from multiple sockets, add transport header (later used for demultiplexing)
demultiplexing at receiver:
use header info to deliver received segments to correct socket
How Demultiplexing Works
Host receives IP datagrams
each datagram has source IP address, destination IP address
each datagram carries one transport-layer segment
each segment has source, destination port number
port numbers 0 – 65535
well-known ports 0 – 1023
Host uses IP addresses & port numbers to direct segment to appropriate socket
TCP/UDP segment format
Connectionless Demultiplexing
When a UDP socket is created, a host-local port # is assigned to it.
When creating a datagram to send into a UDP socket, we must specify:
destination IP address
destination port #
When a host receives UDP segment:
checks destination port# in segment
directs UDP segment to socket with that port #
IP datagrams with the same dest. port #, but different source IP addresses and/or source port numbers, will be directed to the same socket at the destination
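The rule above can be sketched as a lookup keyed only on the destination port. This is a minimal illustration, assuming a hypothetical in-memory table named udp_sockets; the names and addresses are made up for the example and are not an operating-system API.

```python
from collections import defaultdict

# Hypothetical demux table: destination port -> list of received items.
udp_sockets = defaultdict(list)

def udp_demux(src_ip, src_port, dst_port, payload):
    """Deliver a UDP segment using only its destination port."""
    # The source address and port travel with the data so the application
    # can reply, but they play no part in choosing the socket.
    udp_sockets[dst_port].append(((src_ip, src_port), payload))

# Two different senders, same destination port: same socket.
udp_demux("10.0.0.1", 9157, 6428, b"from A")
udp_demux("10.0.0.2", 5775, 6428, b"from B")
print(len(udp_sockets[6428]))  # 2
```

Both segments end up in the one socket bound to port 6428, even though they come from different hosts.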
Connectionless Demultiplexing – Example
Connection-oriented Demultiplexing
TCP socket identified by 4-tuple:
source IP address
source port number
dest IP address
dest port number
Demux: receiver uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets:
each socket identified by its own 4-tuple
Web servers have different sockets for each connecting client
non-persistent HTTP will have different socket for each request
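By contrast with UDP, a TCP lookup uses all four values. The sketch below assumes a hypothetical dictionary named tcp_sockets and made-up addresses; it only illustrates the keying, not how a real kernel stores connections.

```python
# Hypothetical demux table: 4-tuple -> name of the connection socket.
tcp_sockets = {}

def tcp_open(src_ip, src_port, dst_ip, dst_port, name):
    """Register a connection socket under its 4-tuple."""
    tcp_sockets[(src_ip, src_port, dst_ip, dst_port)] = name

def tcp_demux(src_ip, src_port, dst_ip, dst_port):
    """Return the socket for a segment, or None if no connection matches."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))

# Two clients use the same source port and the same server port 80,
# yet each gets its own socket because the source IP differs.
tcp_open("10.0.0.1", 9157, "10.0.0.9", 80, "socket-A")
tcp_open("10.0.0.2", 9157, "10.0.0.9", 80, "socket-B")
print(tcp_demux("10.0.0.2", 9157, "10.0.0.9", 80))  # socket-B
```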
Connection-oriented Demultiplexing – Example
UDP: User Datagram Protocol [RFC 768]
“No Frills”, “Bare Bones” Internet transport protocol
“Best Effort” service, UDP segments may be:
Lost
Delivered out-of-order
Connectionless:
no handshaking between UDP sender, receiver
each UDP segment handled independently of others
UDP used in:
streaming multimedia apps (loss tolerant, rate sensitive)
DNS (Domain Name System)
SNMP (Simple Network Management Protocol)
Reliable transfer over UDP:
add reliability at application layer, e.g. the QUIC (Quick UDP Internet Connections) protocol
application-specific error recovery!
UDP: Popular Internet Applications
UDP: Segment Header
UDP segment format
Length field: length, in bytes, of UDP segment, including header
Why is there a UDP?
no connection establishment (which can add delay)
simple: no connection state at sender, receiver
small header size (8 bytes)
no congestion control: UDP can blast away as fast as desired
UDP Checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
Treat segment contents, including header fields, as sequence of 16-bit integers
Checksum: one’s complement of the one’s complement sum of segment contents
Sender puts checksum value into UDP checksum field
Receiver:
Compute checksum of received segment
Check if computed checksum equals checksum field value:
NO – error detected
YES – no error detected.
But maybe errors nonetheless? More later …
UDP Checksum – Example
Adding two 16-bit integers:
               1110011001100110
             + 1101010101010101
wraparound   1 1011101110111011
sum            1011101110111100
checksum       0100010001000011
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result
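A minimal sketch of the computation, applied to the two 16-bit words 1110011001100110 and 1101010101010101. The function name internet_checksum is chosen for this example; a real UDP checksum also covers a pseudo-header, which is left out here.

```python
def internet_checksum(words):
    """One's complement of the one's complement sum of 16-bit words."""
    total = 0
    for word in words:
        total += word
        # Wrap any carry out of bit 15 back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

words = [0b1110011001100110, 0b1101010101010101]
checksum = internet_checksum(words)
print(f"{checksum:016b}")  # 0100010001000011

# Receiver-side check: summing the data words together with the checksum
# gives all ones, so the function returns 0 when no error is detected.
print(internet_checksum(words + [checksum]))  # 0
```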
Principles of Reliable Data Transfer
Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0
Underlying channel perfectly reliable:
no bit errors
no loss of packets
Separate FSMs for sender, receiver:
sender sends data into underlying channel
receiver reads data from underlying channel
RDT2.0: Channel with Bit Errors
Underlying channel may flip bits in packet
checksum to detect bit errors
The question: how to recover from errors?
Acknowledgements (ACKs): receiver explicitly tells sender that packet was received OK
Negative Acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
sender retransmits packet on receipt of NAK
New mechanisms in rdt2.0 (beyond rdt1.0):
error detection
receiver feedback: control messages (ACK, NAK) from receiver to sender
RDT2.0: FSM Specifications
RDT2.0: Operation with No Errors
RDT2.0: Operation with Errors
RDT2.0 has a fatal flaw!
What happens if ACK / NAK corrupted?
Sender doesn’t know what happened at receiver!
Can’t just retransmit: possible duplicate
Handling Duplicates:
Sender retransmits current packet if ACK/NAK corrupted
Sender adds sequence number to each packet
Receiver discards (doesn’t deliver up) duplicate packet
Stop and Wait
sender sends one packet, then waits for receiver response
RDT2.1: Sender, handles garbled ACK/NAKs
RDT2.1: Receiver, handles garbled ACK/NAKs
RDT2.1: Discussion
Sender:
Seq # added to pkt
Two seq. #s (0, 1) will suffice. Why?
Must check if received ACK/NAK corrupted
Twice as many states
state must “remember” whether “expected” packet should have seq # of 0 or 1
Receiver:
Must check if received packet is duplicate
state indicates whether 0 or 1 is expected packet seq #
Note: receiver cannot know if its last ACK/NAK was received OK at sender
RDT2.2: A NAK-free Protocol
Same functionality as rdt2.1, using ACKs only
Instead of NAK, receiver sends ACK for last pkt received OK
receiver must explicitly include seq # of packet being ACKed
Duplicate ACK at sender results in same action as NAK: retransmit current packet
RDT2.2: Sender, Receiver FSM Fragments
RDT3.0: Channels with Errors and Loss
New Assumption:
underlying channel can also lose packets (data, ACKs)
Checksum, seq. #, ACKs, retransmissions will be of help … but not enough
Approach: sender waits “reasonable” amount of time for ACK
Retransmits if no ACK received in this time
If packet (or ACK) just delayed (not lost):
retransmission will be duplicate, but seq. #s already handle this
receiver must specify seq # of packet being ACKed
Requires countdown timer
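The interplay of sequence numbers, ACKs and timeouts can be sketched with a small deterministic simulation. It assumes a simplified model: losses are scripted by transmission index instead of being random, a lost packet or lost ACK always leads to a timeout and retransmission, and bit corruption is not modelled separately. The function name rdt3_send and its parameters are hypothetical.

```python
def rdt3_send(messages, drop_data=(), drop_ack=()):
    """Simulate rdt3.0 (alternating-bit, stop-and-wait) over a lossy channel.

    drop_data / drop_ack hold 0-based indices, counted over all data
    transmissions, whose data packet or returning ACK is lost.
    Returns (messages delivered to the receiving app, data transmissions).
    """
    delivered = []
    expected_seq = 0   # receiver state: seq # it is waiting for
    seq = 0            # sender state: seq # of the current packet
    tx = 0             # counts every data transmission, including resends
    for msg in messages:
        while True:
            attempt = tx
            tx += 1
            if attempt in drop_data:
                continue   # data lost: sender times out and retransmits
            # Receiver: deliver new data, discard duplicates, ACK either way.
            if seq == expected_seq:
                delivered.append(msg)
                expected_seq ^= 1
            if attempt in drop_ack:
                continue   # ACK lost: sender times out and retransmits
            break          # ACK arrived, move on to the next message
        seq ^= 1
    return delivered, tx

# Second transmission loses its data packet, fourth loses its ACK.
print(rdt3_send(["a", "b", "c"], drop_data={1}, drop_ack={3}))
# (['a', 'b', 'c'], 5)
```

Each message is delivered exactly once even though five transmissions were needed, which is the duplicate handling that the one-bit sequence number provides.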
RDT3.0: Channels with Errors and Loss
RDT3.0 in Action!
RDT3.0 in Action!
Performance of RDT 3.0
rdt3.0 is correct, but performance stinks
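e.g. 1 Gbps link, 15 ms propagation delay, 8000-bit packet:
Transmission delay: L/R = 8000 bits / 10^9 bits/sec = 8 microseconds
U_sender: utilization, the fraction of time the sender is busy sending
U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027
If RTT = 30 ms, one 1 kB packet every 30 ms: about 33 kB/sec throughput over a 1 Gbps link!
The network protocol limits the use of physical resources!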
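The arithmetic behind these figures can be checked with a short calculation. The sketch assumes the standard stop-and-wait formula U_sender = (L/R) / (RTT + L/R), where L is the packet size in bits and R the link rate; the function and variable names are chosen for this example.

```python
def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time the sender is busy: (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / link_bps
    return d_trans / (rtt_s + d_trans)

L, R, RTT = 8000, 1e9, 0.030   # bits, bits/sec, seconds
u = stop_and_wait_utilization(L, R, RTT)
# One packet of L/8 bytes is delivered per (RTT + L/R) seconds.
throughput_bytes_per_s = (L / 8) / (RTT + L / R)

print(f"U_sender = {u:.5f}")                                      # U_sender = 0.00027
print(f"throughput = {throughput_bytes_per_s / 1000:.1f} kB/s")   # throughput = 33.3 kB/s
```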
Pipelined Protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged packets
range of sequence numbers must be increased
buffering at sender and/or receiver
Two generic forms of pipelined protocols:
Go-Back-N (GBN) and Selective Repeat (SR)
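To see why pipelining helps, the sender utilization can be recomputed with several packets in flight. This sketch assumes the same 1 Gbps link, 30 ms RTT and 8000-bit packets as an illustration, and a hypothetical function name; with N packets sent per round trip the utilization grows by a factor of N until the link is saturated.

```python
def pipelined_utilization(n_in_flight, packet_bits, link_bps, rtt_s):
    """Sender utilization with n packets sent back-to-back per round trip."""
    d_trans = packet_bits / link_bps
    # Utilization cannot exceed 1.0 (the link is then always busy).
    return min(1.0, n_in_flight * d_trans / (rtt_s + d_trans))

u1 = pipelined_utilization(1, 8000, 1e9, 0.030)
u3 = pipelined_utilization(3, 8000, 1e9, 0.030)
print(f"{u1:.5f} {u3:.5f}")  # 0.00027 0.00080
```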
TCP Overview
[RFCs 793, 1122, 1323, 2018, 2581]
Point-to-Point:
one sender, one receiver
Reliable, in-order byte stream:
no “message boundaries”
Pipelined:
TCP congestion and flow control set window size
Full duplex data:
bi-directional data flow in same connection
MSS: maximum segment size (app data size, usually 1460 bytes)
Connection-oriented:
handshaking (exchange of control messages) initializes sender, receiver state before data exchange
Flow controlled:
sender will not overwhelm the receiver
TCP: Segment Structure
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection establishment (setup, teardown commands)
Checksum: Internet checksum (as in UDP)
Sequence number, acknowledgement number: counting by bytes of data (not segments!)
Receive window: # bytes receiver willing to accept
TCP Sequence Numbers & ACKs
Implicit numbering for each byte of data
File Size = 500,000 bytes
MSS = 1000 bytes
Total Segments = 500
First Segment Seq. # = 0
Second Segment Seq. # = 1000
Third Segment Seq. # = 2000, and so on.
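The numbering in this example can be reproduced in a couple of lines. The helper name segment_seq_numbers is hypothetical, and the initial sequence number is assumed to be 0 as in the example (real connections choose a different initial value).

```python
def segment_seq_numbers(file_size, mss, initial_seq=0):
    """Sequence number of the first data byte carried by each segment."""
    return list(range(initial_seq, initial_seq + file_size, mss))

seqs = segment_seq_numbers(500_000, 1000)
print(len(seqs), seqs[:3], seqs[-1])  # 500 [0, 1000, 2000] 499000
```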
TCP Sequence Numbers & ACKs
Sequence Numbers:
byte stream “number” of first byte in segment’s data
Acknowledgments:
seq # of next byte expected from other side
cumulative ACK
Q: How does the receiver handle out-of-order segments?
A: TCP spec doesn’t say, up to the implementer (usually kept)
TCP Sequence Numbers & ACKs
TCP Round Trip Time (RTT), Timeout
Q: How to set TCP timeout value?
longer than RTT, but RTT varies
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss
Q: How to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary, want estimated RTT “smoother”
average several recent measurements, not just current SampleRTT
TCP Round Trip Time (RTT), Timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
Exponential Weighted Moving Average (EWMA)
Influence of past sample decreases exponentially fast
typical value: α = 0.125
TCP Round Trip Time (RTT), Timeout
Timeout Interval: EstimatedRTT plus “safety margin”
large variation in EstimatedRTT -> larger safety margin
Estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
(EstimatedRTT is the estimated RTT; 4*DevRTT is the “safety margin”)
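The two moving averages and the resulting timeout can be sketched as a small class. Two details are assumptions not stated on the slides and are borrowed from RFC 6298: the first sample seeds EstimatedRTT with DevRTT set to half of it, and DevRTT is updated before EstimatedRTT on each new sample. The class and attribute names are hypothetical.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation (alpha=0.125, beta=0.25)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        # Assumed initialisation (as in RFC 6298): seed from first sample.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimates; return TimeoutInterval."""
        # Assumed ordering: deviation uses EstimatedRTT from before the update.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(first_sample=100.0)   # times in milliseconds
print(est.update(120.0))                 # 272.5
print(est.estimated_rtt, est.dev_rtt)    # 102.5 42.5
```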
TCP Reliable Data Transfer
TCP creates reliable data transfer service on top of IP’s unreliable best-effort service
pipelined segments
cumulative acks
single retransmission timer
Retransmissions are triggered by:
timeout events
duplicate acks
Let’s initially consider simplified TCP sender:
ignore duplicate acks
ignore flow control, congestion control
TCP Sender Events
Data Received from App:
Create segment with seq #
Seq # is byte-stream number of first data byte in segment
Start timer if not already running
think of timer as for oldest unacked segment
expiration interval: TimeoutInterval
Timeout:
Retransmit segment that caused timeout
Restart timer
Ack Received:
If ack acknowledges previously unacked segments
update what is known to be ACKed
start timer if there are still unacked segments
TCP Sender (Simplified)
TCP Retransmission Scenarios
Lost ACK Scenario
Time-out Scenario; Seg#100 not retransmitted
TCP Retransmission Scenarios
Cumulative ACK
TCP ACK Generation
[RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver Action
Event: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
Action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.
Event: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
Action: Immediately send single cumulative ACK, ACKing both in-order segments.
Event: Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected.
Action: Immediately send duplicate ACK, indicating seq. # of next expected byte.
Event: Arrival of segment that partially or completely fills gap.
Action: Immediately send ACK, provided that segment starts at lower end of gap.
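The cumulative-ACK behaviour in these rules can be sketched with a toy receiver. It is a simplification under stated assumptions: out-of-order segments are buffered (the spec leaves this to the implementer), segments never partially overlap, and the delayed-ACK timer is ignored so every arrival is ACKed at once. The class name Receiver and its fields are hypothetical.

```python
class Receiver:
    """Toy TCP receiver that returns the cumulative ACK number per segment."""

    def __init__(self, initial_seq=0):
        self.next_expected = initial_seq   # seq # of next in-order byte
        self.buffered = {}                 # seq # -> length, out-of-order data

    def on_segment(self, seq, length):
        if seq == self.next_expected:
            self.next_expected += length
            # The gap may now be filled: absorb buffered segments that follow.
            while self.next_expected in self.buffered:
                self.next_expected += self.buffered.pop(self.next_expected)
        elif seq > self.next_expected:
            # Gap detected: keep the data, the ACK below is a duplicate.
            self.buffered[seq] = length
        # Segments below next_expected are old duplicates and are just re-ACKed.
        return self.next_expected

r = Receiver()
acks = [r.on_segment(seq, 100) for seq in (0, 200, 300, 100)]
print(acks)  # [100, 100, 100, 400]
```

The segment starting at byte 100 is missing at first, so the receiver repeats ACK 100; once it arrives, a single ACK of 400 covers everything buffered.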
TCP Fast Retransmit
Time-out period often relatively long:
Long delay before resending lost packet
Detect lost segments via duplicate ACKs.
Sender often sends many segments back-to-back
If segment is lost, there will likely be many duplicate ACKs.
TCP Fast Retransmit
If sender receives 3 duplicate ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest sequence #.
Likely that unacked segment was lost, so don’t wait for timeout.
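A minimal sketch of the duplicate-ACK counting on the sender side. It assumes the sender tracks only the oldest unacknowledged byte (called send_base here) and records early retransmissions in a list instead of sending anything; timers and congestion control are left out, and the class name is hypothetical.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records which segment is resent early."""

    def __init__(self, send_base=0):
        self.send_base = send_base   # seq # of oldest unacked byte
        self.dup_acks = 0
        self.retransmitted = []      # seq #s resent before any timeout

    def on_ack(self, ack_num):
        if ack_num > self.send_base:
            # New data acknowledged: slide forward, reset the counter.
            self.send_base = ack_num
            self.dup_acks = 0
        elif ack_num == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Third duplicate: resend segment with smallest seq #.
                self.retransmitted.append(self.send_base)

s = FastRetransmitSender()
for ack in (100, 100, 100, 100):   # one new ACK, then three duplicates
    s.on_ack(ack)
print(s.retransmitted)  # [100]
```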
TCP Fast Retransmit
TCP Flow Control
[Speed-matching b/w Sender & Receiver]
Flow Control
Receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much data, too fast.
TCP Flow Control
[Speed-matching b/w Sender & Receiver]
Receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
RcvBuffer size set via socket options (typical default is 4096 bytes)
many operating systems auto-adjust RcvBuffer
Receiver-side Buffering
Receiver:
rwnd = RcvBuffer – [LastByteRcvd – LastByteRead]
Sender:
LastByteSent – LastByteAcked <= rwnd
Sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
Guarantees receive buffer will not overflow
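The two formulas above translate directly into code. The function names and the sample byte counts are made up for illustration; the buffer size of 4096 bytes is the typical default mentioned above.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space advertised by the receiver."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sendable_bytes(last_byte_sent, last_byte_acked, rwnd_value):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd_value - in_flight)

window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                              # 2096
print(sendable_bytes(5000, 4000, window))  # 1096
print(sendable_bytes(5000, 2000, window))  # 0
```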
TCP Connection Management
Before exchanging data, sender/receiver “handshake”:
agree to establish connection (each knowing the other is willing to establish connection)
agree on connection parameters
TCP 3-Way Handshake
Client states: LISTEN, SYNSENT, ESTAB
Server states: LISTEN, SYN RCVD, ESTAB
Step 1 (client): choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client enters SYNSENT
Step 2 (server): choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server enters SYN RCVD
Step 3 (client): received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client enters ESTAB
Step 4 (server): received ACK(y) indicates client is live; server enters ESTAB
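The header fields of the three handshake segments can be written out as a small sketch. Segments are modelled as plain dictionaries and the function name three_way_handshake is hypothetical; only the fields named in the handshake description are included.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = {"SYN": 1, "seq": x}                                       # client -> server
    synack = {"SYN": 1, "ACK": 1, "seq": y, "ack": syn["seq"] + 1}   # server -> client
    ack = {"ACK": 1, "ack": synack["seq"] + 1}                       # client -> server
    return [syn, synack, ack]

for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

With x=100 and y=300 the SYNACK carries ACKnum 101 and the final ACK carries ACKnum 301, since each SYN consumes one sequence number.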
TCP 3-Way Handshake
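[Figure: handshake FSMs for client and server]
Server: closed -> listen on Socket connectionSocket = welcomeSocket.accept();
Server: listen -> SYN rcvd on receiving SYN(x); sends SYNACK(seq=y,ACKnum=x+1) and creates new socket for communication back to client
Server: SYN rcvd -> ESTAB on receiving ACK(ACKnum=y+1)
Client: closed -> SYN sent on Socket clientSocket = new Socket("hostname","port number"); sends SYN(seq=x)
Client: SYN sent -> ESTAB on receiving SYNACK(seq=y,ACKnum=x+1); sends ACK(ACKnum=y+1)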
Closing a TCP Connection
Client, server each close their side of connection
send TCP segment with FIN bit = 1
Respond to received FIN with ACK
on receiving FIN, ACK can be combined with own FIN
Simultaneous FIN exchanges can be handled
Closing a TCP Connection
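Client states: ESTAB, FIN_WAIT_1, FIN_WAIT_2, TIMED_WAIT, CLOSED
Server states: ESTAB, CLOSE_WAIT, LAST_ACK, CLOSED
Step 1 (client): clientSocket.close(); send FINbit=1, seq=x; client enters FIN_WAIT_1 and can no longer send but can receive data
Step 2 (server): send ACKbit=1; ACKnum=x+1; server enters CLOSE_WAIT and can still send data; client enters FIN_WAIT_2 and waits for server close
Step 3 (server): send FINbit=1, seq=y; server enters LAST_ACK and can no longer send data
Step 4 (client): send ACKbit=1; ACKnum=y+1; client enters TIMED_WAIT, a timed wait for 2*max segment lifetime, then CLOSED; server enters CLOSED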
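The state sequences for closing can be sketched as two transition tables. The event labels (such as "recv FIN" and "2*MSL timer") are names invented for this example, and the tables cover only the client-initiated close described here, not simultaneous closes.

```python
# (current state, event) -> next state, for the side that closes first.
CLIENT_CLOSE = {
    ("ESTAB", "close()"): "FIN_WAIT_1",          # sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIMED_WAIT",    # sends ACK
    ("TIMED_WAIT", "2*MSL timer"): "CLOSED",
}
# Same shape for the side that receives the first FIN.
SERVER_CLOSE = {
    ("ESTAB", "recv FIN"): "CLOSE_WAIT",         # sends ACK
    ("CLOSE_WAIT", "close()"): "LAST_ACK",       # sends FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",
}

def run(table, state, events):
    """Apply events in order and return the final state."""
    for event in events:
        state = table[(state, event)]
    return state

print(run(CLIENT_CLOSE, "ESTAB", ["close()", "recv ACK", "recv FIN", "2*MSL timer"]))  # CLOSED
print(run(SERVER_CLOSE, "ESTAB", ["recv FIN", "close()"]))                             # LAST_ACK
```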
TCP Client & Server States
TCP Client & Server States
Summary
We have studied:
Principles behind transport layer services:
multiplexing, demultiplexing
reliable data transfer
flow control
Implementation of the Internet protocols:
UDP
TCP
References / Links
Chapter #3: Transport Layer, Computer Networking: A Top-Down Approach (7th edition) by Kurose & Ross