程序代写代做代考 dns C Chapter 3 Transport Layer

Chapter 3 Transport Layer
Chapter 3: Transport Layer
All material copyright 1996-2016
J.F Kurose and K.W. Ross, All Rights Reserved
Chapter 3 outline
Transport services and protocols
3.1 transport-layer services
3.5 connection-oriented transport: TCP
 
provide logical communication between app processes running on different hosts
transport
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
transport protocols run in end systems
3.3 connectionless transport: UDP
3.6 principles of congestion control
• send side: breaks app messages into segments, passes to network layer
3.4 principles of reliable data transfer
• rcv side: reassembles segments into messages, passes to app layer
application
3.7 TCP congestion control
transport
Transport Layer 2-1
Transport Layer 3-2
Transport Layer 3-3
Transport Layer 3-4

more than one transport protocol available to apps
network data link physical
our goals:
understandprinciples learnaboutInternet
behind transport layer services:
transport layer protocols:
• multiplexing, demultiplexing
• UDP: connectionless transport
• reliable data transfer
• flow control
• congestion control
• TCP: connection-oriented reliable transport
• Internet: TCP and UDP
• TCP congestion control
application
network data link physical

Transport vs. network layer
Internet transport-layer protocols
network layer: logical communication between hosts

reliable, in-order delivery (TCP)
application
transport layer: logical
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
• congestion control • flow control
• connection setup
network data link physical
network data link physical
communication between processes
 hosts = houses
network data link physical
• relies on, enhances, network layer services
 app messages = letters in envelopes
• no-frills extension of “best-effort” IP
network data link physical
network data link physical
Chapter 3 outline
Multiplexing/demultiplexing
3.1 transport-layer services
3.5 connection-oriented transport: TCP
multiplexing at sender:
demultiplexing at receiver:
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
handle data from multiple sockets, add transport header (later used for demultiplexing)
use header info to deliver received segments to correct socket
3.3 connectionless transport: UDP
application
3.4 principles of reliable data transfer
3.6 principles of congestion control
P3
transport network
P4
process
household analogy:
network data link physical
network data link physical
 processes = kids
 
unreliable, unordered delivery: UDP
 transport protocol = Ann and Bill who demux to in- house siblings
application
 network-layer protocol = postal service
services not available: • delay guarantees
• bandwidth guarantees
network data link physical
3.7 TCP congestion control
transport network
link physical
transport network
Transport Layer 3-5
Transport Layer 3-6
Transport Layer 3-7
Transport Layer 3-8
application
P1 P2
application
socket
link physical
link physical
transport
network data link physical
transport

How demultiplexing works
Connectionless demultiplexing
 host receives IP datagrams • each datagram has source IP
32 bits
 recall: created socket has
 recall: when creating datagram to send into UDP socket, must specify
address, destination IP address
DatagramSocket mySocket1
= new DatagramSocket(12534);
• destination IP address • destination port #
• each datagram carries one transport-layer segment
other header fields
• each segment has source, destination port number
application data (payload)
 when host receives UDP segment:
IP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at dest
 host uses IP addresses & port numbers to direct segment to appropriate socket
• checks destination port # in segment
Connectionless demux: example
Connection-oriented demux
DatagramSocket
mySocket2 = new
DatagramSocket
DatagramSocket
serverSocket = new
DatagramSocket
DatagramSocket
mySocket1 = new
DatagramSocket
(5775);

TCP socket identified by 4-tuple:
 server host may support many simultaneous TCP sockets:
(9157); application
(6428); application
• source IP address
• source port number • dest IP address
• dest port number
• each socket identified by its own 4-tuple
P3
P4
transport network
transport network
transport network
 web servers have different sockets for each connecting client
link physical
link physical
link physical

demux: receiver uses all four values to direct segment to appropriate socket
source port: 9157 dest port: 6428
source port: ? dest port: ?
source port: 6428 dest port: 9157
source port: ? dest port: ?
• non-persistent HTTP will have different socket for each request
P1
application
source port #
dest port #
host-local port #:
TCP/UDP segment format
• directs UDP segment to socket with that port #
Transport Layer 3-9
Transport Layer 3-10
Transport Layer 3-11
Transport Layer 3-12

Connection-oriented demux: example
Connection-oriented demux: example
host: IP address A
source IP,port: B,80 dest IP,port: A,9157
source IP,port: C,5775 dest IP,port: B,80
host: IP address C
host: IP address A
source IP,port: B,80 dest IP,port: A,9157
source IP,port: C,5775 dest IP,port: B,80
host: IP address C
application
P4
P5 P6
application
application
P4
application
P3
P2 P3
P3
P2 P3
transport network
transport network link
transport network
transport network
transport network link
transport network
link physical
physical
link physical
link physical
physical
link physical
source IP,port: A,9157 dest IP, port: B,80
source IP,port: C,9157 dest IP,port: B,80
source IP,port: A,9157 dest IP, port: B,80
source IP,port: C,9157 dest IP,port: B,80
three segments, all destined to IP address: B, dest port: 80 are demultiplexed to different sockets
Transport Layer 3-13
Transport Layer 3-14
Chapter 3 outline
UDP: User Datagram Protocol [RFC 768]
3.1 transport-layer services
3.5 connection-oriented transport: TCP
 “no frills,” “bare bones” Internet transport protocol
 UDP use:
 streaming multimedia
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
 “best effort” service, UDP segments may be:
apps (loss tolerant, rate sensitive)
3.3 connectionless transport: UDP
to app
 SNMP
 reliable transfer over
3.4 principles of reliable data transfer
3.6 principles of congestion control
 connectionless:
• no handshaking
UDP:
application
application
server: IP address B
server: IP address B
3.7 TCP congestion control
between UDP sender, receiver
 add reliability at application layer
Transport Layer 3-15
Transport Layer 3-16
• lost
• delivered out-of-order
 DNS
• each UDP segment handled independently of others
 application-specific error recovery!
threaded server

UDP: segment header
UDP checksum
source port # length
dest port # checksum
why is there a UDP?
sender:
receiver:
32 bits
length, in bytes of UDP segment, including header
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
application data (payload)
 no connection establishment (which can add delay)
 treat segment contents, including header fields, as sequence of 16-bit integers
 compute checksum of received segment
UDP segment format
 small header size
 no congestion control:
 sender puts checksum value into UDP checksum field
• YES – no error detected. But maybe errors nonetheless? More later ….
Internet checksum: example
Chapter 3 outline
example: add two 16-bit integers
3.1 transport-layer services
3.5 connection-oriented transport: TCP
 simple: no connection state at sender, receiver
 checksum: addition (one’s complement sum) of segment contents
 check if computed checksum equals checksum field value:
11110011001100110 11101010101010101
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer •flowcontrol •connectionmanagement
wraparound 11011101110111011 sum 11011101110111100
3.3connectionless transport:UDP
3.6principlesofcongestion control
checksum 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 Note: when adding numbers, a carryout from the most
3.4principlesofreliable data transfer
3.7 TCP congestion control
significant bit needs to be added to the result
UDP can blast away as fast as desired
Transport Layer 3-17
Transport Layer 3-18
Transport Layer 3-19
Transport Layer 3-20
• NO – error detected

Principles of reliable data transfer
Principles of reliable data transfer
 important in application, transport, link layers • top-10 list of important networking topics!
 important in application, transport, link layers • top-10 list of important networking topics!
 characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
 characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Principles of reliable data transfer
Reliable data transfer: getting started
 important in application, transport, link layers • top-10 list of important networking topics!
rdt_send(): called from above, (e.g., by app.). Passed data to deliver to receiver upper layer
deliver_data(): called by rdt to deliver data to upper
 characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on rcv-side of channel
Transport Layer 3-21
Transport Layer 3-22
Transport Layer 3-23
Transport Layer 3-24
send side
receive side

Reliable data transfer: getting started
rdt1.0: reliable transfer over a reliable channel
we’ll:
 incrementally develop sender, receiver sides of
 underlying channel perfectly reliable • no bit errors
• no loss of packets
reliable data transfer protocol (rdt)
 consider only unidirectional data transfer • but control info will flow on both directions!
 separate FSMs for sender, receiver:
• sender sends data into underlying channel
• receiver reads data from underlying channel
 use finite state machines (FSM) to specify sender,
receiver
event causing state transition actions taken on state transition
state: when in this “state” next state uniquely determined by next event
state 1
state 2
packet = make_pkt(data) udt_send(packet)
extract (packet,data) deliver_data(data)
rdt2.0: channel with bit errors
rdt2.0: channel with bit errors
 underlying channel may flip bits in packet • checksum to detect bit errors
 underlying channel may flip bits in packet • checksum to detect bit errors
 the question: how to recover from errors:
• acknowledgements (ACKs): receiver explicitly tells sender
 the question: how to recover from errors:
• acknowledgements (ACKs): receiver explicitly tells sender
that pkt received OK
that pkt received OK
event actions
sender
receiver
• negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
• negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
• sender retransmits pkt on receipt of NAK
How do humans recover from “errors”
• sender retransmits pkt on receipt of NAK
 new mechanisms in rdt2.0 (beyond rdt1.0):
 new mechanisms in rdt2.0 (beyond rdt1.0): during conversation?
• error detection
• error detection
• receiver feedback: control msgs (ACK,NAK) rcvr- >sender
• feedback: control msgs (ACK,NAK) from receiver to sender
Transport Layer 3-25
Transport Layer 3-26
Transport Layer 3-27
Transport Layer 3-28
Wait for call from above
rdt_send(data)
Wait for call from below
rdt_rcv(packet)

rdt2.0: FSM specification
rdt2.0: operation with no errors
rdt_send(data)
sndpkt = make_pkt(data, checksum)
receiver
rdt_send(data)
snkpkt = make_pkt(data, checksum)
udt_send(sndpkt)
rdt_rcv(rcvpkt) && isNAK(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) && isNAK(rcvpkt)
Wait for call from above
Wait for ACK or NAK
rdt_rcv(rcvpkt) && corrupt(rcvpkt)
Wait for call from above
Wait for ACK or NAK
rdt_rcv(rcvpkt) && corrupt(rcvpkt)
udt_send(sndpkt) rdt_rcv(rcvpkt) && isACK(rcvpkt)
udt_send(NAK)
udt_send(sndpkt) rdt_rcv(rcvpkt) && isACK(rcvpkt)
udt_send(NAK)
sender

Wait for call from below

Wait for call from below
rdt2.0: error scenario
rdt2.0 has a fatal flaw!
rdt_send(data)
snkpkt = make_pkt(data, checksum)
what happens if ACK/NAK corrupted?
handling duplicates:  sender retransmits
udt_send(sndpkt)
rdt_rcv(rcvpkt) && isNAK(rcvpkt)
Wait for call from above
Wait for ACK or NAK
rdt_rcv(rcvpkt) && corrupt(rcvpkt)
 
sender doesn’t know what happened at receiver!
current pkt if ACK/NAK corrupted
udt_send(sndpkt) rdt_rcv(rcvpkt) && isACK(rcvpkt)
udt_send(NAK)
 sender adds sequence number to each pkt

Wait for call from below
can’t just retransmit: possible duplicate
 receiver discards (doesn’t deliver up) duplicate pkt
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt,data)
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt,data)
deliver_data(data) udt_send(ACK)
deliver_data(data) udt_send(ACK)
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt,data)
sender sends one packet, then waits for receiver response
deliver_data(data) udt_send(ACK)
Transport Layer 3-29
Transport Layer 3-30
Transport Layer 3-31
Transport Layer 3-32
stop and wait

rdt2.1: sender, handles garbled ACK/NAKs
rdt2.1: receiver, handles garbled ACK/NAKs
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt) && isACK(rcvpkt)
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt) && isACK(rcvpkt)
sndpkt = make_pkt(NAK, chksum) udt_send(sndpkt)
sndpkt = make_pkt(NAK, chksum) udt_send(sndpkt)


rdt_rcv(rcvpkt) &&
not corrupt(rcvpkt) && has_seq1(rcvpkt)
Wait for 0 from below
Wait for 1 from below
rdt_rcv(rcvpkt) &&
not corrupt(rcvpkt) && has_seq0(rcvpkt)
rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
Wait for ACK or NAK 1
Wait for call 1 from above
sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt)
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt)
udt_send(sndpkt)
sndpkt = make_pkt(1, data, checksum) udt_send(sndpkt)
extract(rcvpkt,data) deliver_data(data)
sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt)
rdt2.1: discussion
rdt2.2: a NAK-free protocol
sender:
receiver:
 
same functionality as rdt2.1, using ACKs only
 seq # added to pkt
 two seq. #’s (0,1) will
 must check if received packet is duplicate
instead of NAK, receiver sends ACK for last pkt received OK
suffice. Why?
• state indicates whether 0 or 1 is expected pkt seq #

• receiver must explicitly include seq # of pkt being ACKed duplicate ACK at sender results in same action as
 must check if received ACK/NAK corrupted
 note: receiver can not know if its last ACK/NAK received OK at sender
NAK: retransmit current pkt
 twice as many states
• state must “remember” whether “expected” pkt should have seq # of 0 or 1
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
udt_send(sndpkt)
rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
extract(rcvpkt,data) deliver_data(data)
sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt)
Wait for call 0 from above
Wait for ACK or NAK 0
udt_send(sndpkt)
rdt_rcv(rcvpkt) && (corrupt(rcvpkt)
rdt_rcv(rcvpkt) && (corrupt(rcvpkt)
rdt_send(data)
Transport Layer 3-33
Transport Layer 3-34
Transport Layer 3-35
Transport Layer 3-36

rdt2.2: sender, receiver fragments
rdt3.0: channels with errors and loss
rdt_rcv(rcvpkt) && (corrupt(rcvpkt) ||
rdt_rcv(rcvpkt) &&notcorrupt(rcvpkt) && isACK(rcvpkt,0)
• checksum, seq. #, ACKs,retransmissions will be of help … but not enough
ifpkt(orACK)justdelayed (not lost):
has_seq1(rcvpkt)) udt_send(sndpkt)
Wait for 0 from below
receiver FSM fragment

• retransmission will be duplicate, but seq. #’s already handles this
rdt3.0 sender
rdt3.0 in action
rdt_rcv(rcvpkt)
send pkt0
pkt0
send pkt0 pkt0
timeout udt_send(sndpkt) start_timer
Wait for ACK1
Wait for call 1 from above
ack0
rcv pkt0 send ack0
resend pkt1 pkt1
ack1 send ack1

Wait for call 0from above
Wait for ACK0
timeout udt_send(sndpkt) start_timer
ack0
rcv pkt0 send ack0
rcv pkt0 ack0 send ack0
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt) && isACK(rcvpkt,1)
rcv ack0 send pkt1
pkt1
rcv pkt1 send ack1
rcv ack0
send pkt1 pkt1X
stop_timer
rcv ack1 send pkt0
pkt0
timeout
rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
rdt_send(data)


(b) packet loss
Transport Layer 3-40
rdt_send(data)
sndpkt = make_pkt(0, data, checksum)
new assumption:
approach: sender waits “reasonable” amount of time for ACK
udt_send(sndpkt)
rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) ||
underlying channel can also lose packets (data, ACKs)
Wait for call 0 from above
sender FSM fragment
Wait for ACK 0
isACK(rcvpkt,1) ) udt_send(sndpkt)
 retransmits if no ACK received in this time
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
• receiver must specify seq # of pkt being ACKed
extract(rcvpkt,data) deliver_data(data)
sndpkt = make_pkt(ACK1, chksum) udt_send(sndpkt)
Transport Layer 3-37
 requires countdown timer Transport Layer 3-38
rdt_send(data)
rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ) 
sndpkt = make_pkt(0, data, checksum) udt_send(sndpkt)
start_timer
sender
receiver
sender
receiver
sndpkt = make_pkt(1, data, checksum) udt_send(sndpkt)
start_timer
(a) no loss
ack0 send ack0
rdt_rcv(rcvpkt)
&& notcorrupt(rcvpkt) && isACK(rcvpkt,0) stop_timer
ack1
loss
rdt_rcv(rcvpkt)
rcv ack1 pkt0 send pkt0
rcv pkt1 rcv pkt0
Transport Layer 3-39

rdt3.0 in action
sender
receiver
Performance of rdt3.0
sender
receiver
send pkt0
pkt0
 rdt3.0 is correct, but performance stinks
 e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
send pkt0
pkt0 rcv pkt0 ack0 send ack0
ack0
rcv pkt0 send ack0
rcv ack0 send pkt1
rcv ack0 send pkt1
pkt1
rcv pkt1 send ack1
timeout
rcv ack1 send pkt0
rcv pkt1 (detect duplicate)
resend pkt1
pkt1 rcv pkt1 (detect duplicate)
send ack1
rcv ack1 send pkt0
ack1 send ack1
rcv ack1 send pkt0
ack1 ack0
rcv pkt0 send ack0
sender = = 30.008 = 0.00027 RTT + L / R
pkt1 rcv pkt1 ack1 send ack1
ack1
D = RL = 8000 bits = 8 microsecs trans 109 bits/sec
X
timeout
loss
resend pkt1
pkt1 pkt0
 U sender: utilization – fraction of time sender busy sending U L/R .008
(c) ACK loss
send ack0 (d) premature timeout/ delayed ACK
 network protocol limits use of physical resources! Transport Layer 3-42
pkt0
pkt0
rcv pkt0 (detect duplicate)
rcv pkt0 ack0 send ack0
ack0
if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link
rdt3.0: stop-and-wait operation
Pipelined protocols
sender first packet bit transmitted, t = 0
receiver
pipelining: sender allows multiple, “in-flight”, yet- to-be-acknowledged pkts
last packet bit transmitted, t = L / R RTT
first packet bit arrives
last packet bit arrives, send ACK
• range of sequence numbers must be increased • buffering at sender and/or receiver
ACK arrives, send next packet, t = RTT + L / R
Usender= L/R RTT + L / R
= .008 30.008
= 0.00027
 two generic forms of pipelined protocols: go-Back-N, selective repeat
Transport Layer 3-41
Transport Layer 3-43
Transport Layer 3-44

Pipelining: increased utilization
Pipelined protocols: overview
sender last bit transmitted, t = L / R
receiver
Go-back-N:
Selective Repeat:
first packet bit transmitted, t = 0
 sender can have up to N unacked packets in pipeline
 sender can have up to N unack’ed packets in pipeline

ACK(n): ACKs all pkts up to, including seq # n – “cumulative ACK”
Wait
 
• may receive duplicate ACKs (see receiver) timer for oldest in-flight pkt
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
ACK arrives, send next packet, t = RTT + L / R
• doesn’t ack packet if there’s a gap
U sender =
3L / R RTT + L / R
.0024
= 30.008 =
0.00081
• when timer expires, retransmit all unacked packets
• when timer expires, retransmit only that unacked packet
RTT
first packet bit arrives
last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK last bit of 3rd packet arrives, send ACK
 receiver only sends cumulative ack
 
rcvr sends individual ack for each packet
Go-Back-N: sender
GBN: sender extended FSM
 
k-bit seq # in pkt header
“window” of up to N, consecutive unack’ed pkts allowed
rdt_send(data)
timeout(n): retransmit packet n and all higher seq # pkts in window
base = getacknum(rcvpkt)+1 If (base == nextseqnum)
3-packet pipelining increases utilization by a factor of 3!
 sender has timer for oldest unacked packet
sender maintains timer for each unacked packet
Transport Layer 3-45
Transport Layer 3-46
Transport Layer 3-47
start_timer
Transport Layer 3-48

else refuse_data(data)
base=1 nextseqnum=1
timeout
start_timer udt_send(sndpkt[base]) udt_send(sndpkt[base+1])
… udt_send(sndpkt[nextseqnum-1])
rdt_rcv(rcvpkt)
&& corrupt(rcvpkt)
if (nextseqnum < base+N) { sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ } stop_timer else GBN: receiver extended FSM GBN in action  && hasseqnum(rcvpkt,expectedseqnum) 0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 Xloss expectedseqnum=1 Wait sndpkt = extract(rcvpkt,data) deliver_data(data) sndpkt = make_pkt(expectedseqnum,ACK,chksum) udt_send(sndpkt) expectedseqnum++ (wait) receive pkt3, discard, (re)send ack1 make_pkt(0,ACK,chksum) default udt_send(sndpkt) sender window (N=4) sender receiver rdt_rcv(rcvpkt) 012345678 &&notcurrupt(rcvpkt) 012345678 send pkt0 send pkt1 send pkt2 send pkt3 receive pkt0, send ack0 receive pkt1, send ack1 ACK-only: always send ACK for correctly-received pkt with highest in-order seq # ignore duplicate ACK • may generate duplicate ACKs • need only remember expectedseqnum pkt 2 timeout out-of-orderpkt: 012345678 012345678 • discard (don’t buffer): no receiver buffering! • re-ACK pkt with highest in-order seq # Selective repeat Selective repeat: sender, receiver windows  receiver individually acknowledges all correctly received pkts • buffers pkts, as needed, for eventual in-order delivery to upper layer  sender only resends pkts for which ACK not received • sender timer for each unACKed pkt  sender window • N consecutive seq #’s • limits seq #s of sent, unACKed pkts Transport Layer 3-49 Transport Layer 3-50 Transport Layer 3-51 Transport Layer 3-52 012345678 012345678 rcv ack0, send pkt4 rcv ack1, send pkt5 receive pkt4, discard, (re)send ack1 receive pkt5, discard, (re)send ack1 012345678 0 1 2 3 4 5 6 7 8 send pkt2 send pkt3 send pkt4 send pkt5 rcv pkt2, deliver, send ack2 rcv pkt3, deliver, send ack3 rcv pkt4, deliver, send ack4 rcv pkt5, deliver, send ack5 Selective repeat Selective repeat in action sender receiver sender window (N=4) sender receiver data from above: pkt n in [rcvbase, rcvbase+N-1] 012345678 012345678 012345678 012345678 send pkt0 send pkt1 send pkt2 send pkt3 receive pkt0, send ack0 receive pkt1, send ack1  if next available seq # in window, send pkt  send ACK(n) Xloss timeout(n):  out-of-order: buffer (wait)  resend pkt n, restart timer ACK(n) in [sendbase,sendbase+N]:  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt 012345678 012345678 rcv ack0, send pkt4 rcv ack1, send pkt5 receive pkt3, buffer, send ack3  mark pkt n as received  if n smallest unACKed pkt, record ack3 arrived receive pkt4, buffer, send ack4 advance window base to next unACKed seq #  ACK(n) otherwise: 012345678 send pkt2 record ack4 arrived record ack5 arrived Selective repeat: dilemma sender window (after receipt) receiver window (after receipt) Chapter 3 outline example: 0123012 pkt0 0123012 pkt1 0123012 pkt2 0123012 0123012 0123012 3.1 transport-layer services 3.5 connection-oriented transport: TCP  seq #’s: 0, 1, 2, 3 0 1 2 3 0 1 2 pkt3 X 0 1 2 3 0 1 2 pkt0  window size=3 will accept packet with seq number 0 3. 2 3. 3 3. 4 multiplexing and demultiplexing • segment structure • reliable data transfer • flow control • connection management  receiver sees no difference in two scenarios! (a) no problem  duplicate data accepted as new in (b) receiver can’t see sender side. receiver behavior identical in both cases! something’s (very) wrong! connectionless transport: UDP Q: what relationship between seq # size and window size to avoid problem in (b)? 0123012 X 0123012 3.7 TCP congestion control pkt n in [rcvbase-N,rcvbase-1] pkt 2 timeout receive pkt5, buffer, send ack5  ignore 012345678 012345678 rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2 0 1 2 3 0 1 2 pkt0 0123012 pkt1 0123012 pkt2 0123012 principles of reliable data transfer 3.6 principles of congestion control timeout X retransmit pkt0 pkt0 (b) oops! 0 1 2 3 0 1 2 will accept packet with seq number 0 Transport Layer 3-53 Transport Layer 3-54 Transport Layer 3-55 Transport Layer 3-56 0 1 2 3 4 5 6 7 8 Q: what happens when ack2 arrives? TCP: Overview RFCs: 793,1122,1323, 2018, 2581 TCP segment structure  point-to-point: • one sender, one receiver  full duplex data: • bi-directional data flow URG: urgent data (generally not used) source port # dest port # sequence number counting by bytes of data (not segments!)  reliable, in-order byte stream: in same connection ACK: ACK # valid acknowledgement number • no “message boundaries”  connection-oriented: • handshaking (exchange RST, SYN, FIN: connection estab (setup, teardown checksum options (variable length)  pipelined: • TCP congestion and of control msgs) inits sender, receiver state before data exchange commands) flow control set window size  flow controlled: • sender will not Internet checksum (as in UDP) application data (variable length) TCP seq. numbers, ACKs TCP seq. numbers, ACKs sequence numbers: outgoing segment from sender • byte stream “number” of first byte in segment’s data sequence number acknowledgements: window size N Seq=42, ACK=79, data = ‘C’ • seq # of next byte expected from other side sender sequence number space host ACKs receipt of ‘C’, echoes back ‘C’ • cumulative ACK Q: how receiver handles host ACKs receipt Seq=79, ACK=43, data = ‘C’ out-of-order segments sent ACKed sent, not- yet ACKed (“in- flight”) usable but not yet sent not usable of echoed ‘C’ • A: TCP spec doesn’t say, - up to implementor incoming segment to sender • MSS: maximum segment size PSH: push data now (generally not used) head not UAPRSF len used receive window Urg data pointer # bytes rcvr willing to accept source port # dest port # Host A Host B acknowledgement number rwnd checksum urg pointer User types ‘C’ overwhelm receiver source port # dest port # sequence number simple telnet scenario acknowledgement number A rwnd checksum urg pointer Transport Layer 3-57 Transport Layer 3-58 Transport Layer 3-59 Transport Layer 3-60 32 bits Seq=43, ACK=80 RTT (milliseconds) RTT (milliseconds) TCP round trip time, timeout TCP round trip time, timeout Q: how to set TCP timeout value? Q: how to estimate RTT? EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT  exponential weighted moving average longer than RTT • but RTT varies  SampleRTT: measured time from segment transmission until ACK receipt  influence of past sample decreases exponentially fast  typical value:  = 0.125  too short: premature timeout, unnecessary retransmissions • ignore retransmissions  SampleRTT will vary, want RTT: gaia.cs.umass.edu to fantasia.eurecom.fr  too long: slow reaction to segment loss • average several recent measurements, not just current SampleRTT 200 TCP round trip time, timeout Chapter 3 outline  timeout interval: EstimatedRTT plus “safety margin” • large variation in EstimatedRTT -> larger safety margin
3.1 transport-layer services
3.5 connection-oriented transport: TCP
 estimate SampleRTT deviation from EstimatedRTT: DevRTT = (1-)*DevRTT +
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
estimated RTT “smoother”
250
*|SampleRTT-EstimatedRTT| (typically,  = 0.25)
3.3 connectionless transport: UDP
3.6 principles of congestion control
TimeoutInterval = EstimatedRTT + 4*DevRTT
3.4 principles of reliable data transfer
3.7 TCP congestion control
estimated RTT “safety margin” Transport Layer 3-63
Transport Layer 3-64
Transport Layer 3-61
time (seconds) Transport Layer 3-62 SampleRTT Estimated RTT
350
300
150
sampleRTT EstimatedRTT
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)

timeout
timeout
TCP reliable data transfer
TCP sender events:
 TCP creates rdt service on top of IP’s unreliable service
data rcvd from app:
timeout:
• pipelined segments
let’s initially consider simplified TCP sender:
 seq # is byte-stream number of first data bytein segment
 restart timer ack rcvd:
• cumulative acks
ifackacknowledges previously unacked segments
• single retransmission timer
• ignore duplicate acks • ignore flow control,
 start timer if not already running
 retransmissions triggered by:
congestion control
• think of timer as for oldest unacked segment
• update what is known to be ACKed
• timeout events • duplicate acks
• expiration interval: TimeOutInterval
• start timer if there are still unacked segments
TCP sender (simplified)
TCP: retransmission scenarios

pass segment to IP (i.e., “send”) NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
Seq=92, 8 bytes of data
Seq=92, 8 bytes of data Seq=100, 20 bytes of data
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum
wait for event
start timer
ACK received, with ACK field value y
SendBase=100 SendBase=120
if (y > SendBase) {
SendBase = y
/* SendBase–1: last cumulatively ACKed byte */ if (there are currently not-yet-acked segments)
ACK=100
ACK=120
}
Transport Layer 3-67
start timer else stop timer
lost ACK scenario
premature timeout
Transport Layer 3-68
data received from application above create segment, seq. #: NextSeqNum
Host A Host B
Host A SendBase=92
Host B
timeout
ACK=100 ACK=120
retransmit not-yet-acked segment with smallest seq. #
start timer
Seq=92, 8 bytes of data
Seq=92, 8 bytes of data
Transport Layer 3-65
Transport Layer 3-66
 create segment with seq #
 retransmit segment that caused timeout
X
ACK=100
SendBase=120

timeout
timeout
TCP: retransmission scenarios
TCP ACK generation [RFC 1122, RFC 2581]
Host A Host B
Seq=92, 8 bytes of data Seq=100, 20 bytes of data
arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
delayed ACK. Wait up to 500ms
for next segment. If no next segment, send ACK
X ACK=100 ACK=120
arrival of in-order segment with expected seq #. One other segment has ACK pending
immediately send single cumulative ACK, ACKing both in-order segments
Seq=120, 15 bytes of data
arrival of out-of-order segment higher-than-expect seq. # . Gap detected
immediately send duplicate ACK, indicating seq. # of next expected byte
cumulative ACK
arrival of segment that partially or completely fills gap
immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
TCP fast retransmit
time-outperiod often relatively long:
Host A
Host B
• long delay before resending lost packet
if sender receives 3 ACKs for same data
Seq=92, 8 bytes of data Seq=100, 20 bytes of data
 detect lost segments via duplicate ACKs.
(“triple duplicate ACKs”), (“triple duplicate ACKs”),
X
• sender often sends many segments back- to-back
resend unacked segment with smallest seq #
ACK=100 ACK=100 ACK=100 ACK=100
• if segment is lost, there will likely be many duplicate ACKs.
 likely that unacked segment lost, so don’t wait for timeout
Seq=100, 20 bytes of data
TCP fast retransmit
Transport Layer 3-69
Transport Layer 3-70
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-72
event at receiver
TCP receiver action

Chapter 3 outline
TCP flow control
3.1 transport-layer services
3.5 connection-oriented transport: TCP
TCP socket receiver buffers
application OS
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
… slower than TCP receiver is delivering (sender is sending)
TCP code
3.3 connectionless transport: UDP
3.6 principles of congestion control
3.4 principles of reliable data transfer
flow control
IP code
TCP flow control
Chapter 3 outline
 receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
to application process
3.1 transport-layer services
3.5 connection-oriented transport: TCP
• RcvBuffer size set via socket options (typical default is 4096 bytes)
RcvBuffer
rwnd
buffered data free buffer space
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
• many operating systems autoadjust RcvBuffer
TCP segment payloads receiver-side buffering
3.4 principles of reliable data transfer
 sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
3.7 TCP congestion control
 guarantees receive buffer will not overflow
3.7 TCP congestion control
receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
Transport Layer 3-73
Transport Layer 3-74
Transport Layer 3-75
Transport Layer 3-76
3.3 connectionless transport: UDP
3.6 principles of congestion control
application may remove data from TCP socket buffers ….
application process
from sender
receiver protocol stack

Connection Management
Agreeing to establish a connection
before exchanging data, sender/receiver “handshake”:  agree to establish connection (each knowing the other willing
2-way handshake: Let’s talk
Q: will 2-way handshake always work in network?
variable delays retransmitted messages (e.g. req_conn(x)) due to message loss
to establish connection)
 agree on connection parameters
application
application
OK
ESTAB
 
connection state: ESTAB connection variables:
connection state: ESTAB connection Variables:
seq # client-to-server server-to-client
seq # client-to-server server-to-client
 
message reordering can’t “see” other side
rcvBuffer size at server,client
rcvBuffer size at server,client
choose x
req_conn(x) acc_conn(x)
network
network
ESTAB
Socket clientSocket = newSocket(“hostname”,”port
Socket connectionSocket = welcomeSocket.accept();
number”);
Agreeing to establish a connection
TCP 3-way handshake
2-way handshake failure scenarios:
choose x
choose x
LISTEN SYNSENT
choose init seq num, x send TCP SYN msg
LISTEN
retransmit req_conn(x)
retransmit req_conn(x)
choose init seq num, y send TCP SYNACK msg, acking SYN
SYN RCVD
ESTAB
ESTAB
data(x+1)
accept data(x+1)
SYNbit=1, Seq=y ACKbit=1; ACKnum=x+1
client terminates
server forgets x
client terminates
server req_conn(x) forgets x
this segment may contain client-to-server data
ACKbit=1, ACKnum=y+1
req_conn(x) acc_conn(x)
ESTAB
req_conn(x) acc_conn(x)
ESTAB
SYNbit=1, Seq=x
req_conn(x)
retransmit data(x+1)
received SYNACK(x) indicates server is live; send ACK for SYNACK;
connection x completes
connection x completes
ESTAB
half open connection! (no client!)
accept data(x+1)
ESTAB
data(x+1)
ESTAB
Transport Layer 3-77
Transport Layer 3-78
Transport Layer 3-79
Transport Layer 3-80
ESTAB
ESTAB
client state
server state
received ACK(y) indicates client is live
ESTAB

TCP 3-way handshake: FSM
TCP: closing a connection
client state
server state
3.1 transport-layer services
3.5 connection-oriented transport: TCP
ESTAB FIN_WAIT_1
ESTAB
FIN_WAIT_2
wait for server close
can still
connectionless transport: UDP
TIMED_WAIT
can no longer send data
CLOSED
Socket connectionSocket = welcomeSocket.accept();
 respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
SYN(x)
number”);
 simultaneous FIN exchanges can be handled
SYNACK(seq=y,ACKnum=x+1) create new socket for communication back to client
listen
SYN(seq=x)
ACK(ACKnum=y+1)
ESTAB
SYNACK(seq=y,ACKnum=x+1) ACK(ACKnum=y+1)

SYN rcvd
SYN sent
TCP: closing a connection
Chapter 3 outline
clientSocket.close()
can no longer send but can receive data
FINbit=1, seq=x ACKbit=1; ACKnum=x+1
CLOSE_WAIT send data
3. 2 3. 3 3. 4
multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
timed wait for 2*max
CLOSED
3.7 TCP congestion control
segment lifetime

Socket clientSocket = newSocket(“hostname”,”port
closed
 client, server each close their side of connection • send TCP segment with FIN bit = 1
FINbit=1, seq=y ACKbit=1; ACKnum=y+1
LAST_ACK
principles of reliable data transfer
3. 6
principles of congestion control
Transport Layer 3-81
Transport Layer 3-82
Transport Layer 3-83
Transport Layer 3-84

out
out
delay
Principles of congestion control
Causes/costs of congestion: scenario 1
congestion:
 informally: “too many sources sending too much
 two senders, two original data: in
throughput: out unlimited shared
data too fast for network to handle”
 different from flow control!
 manifestations:
output link buffers
• lost packets (buffer overflow at routers)
R/2
• long delays (queueing in router buffers)  a top-10 problem!
Causes/costs of congestion: scenario 2
Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of timed-out packet
idealization: perfect knowledge
R/2
• application-layer input = application-layer output: in =
 sender sends only when router buffers available
out
• transport-layer input includes retransmissions : 

in R/2
Host B
finite shared output link buffers
Host B
finite shared output link buffers
Host A
A
free buffer space!
in : original data
’in: original data, plus
out
copy
in : original data
’in: original data, plus
out
retransmitted data
retransmitted data

in
in
Transport Layer 3-85
Transport Layer 3-86
Transport Layer 3-87
Transport Layer 3-88
receivers
 one router, infinite buffers
 output link capacity: R
 no retransmission
Host A
Host B
in R/2
 maximum per-connection  large delays as arrival rate, in,
in R/2 throughput: R/2 approaches capacity

out
out out
Causes/costs of congestion: scenario 2
Causes/costs of congestion: scenario 2
Idealization: known loss packets can be lost, dropped at router due to full buffers
Idealization: known loss packets can be lost, dropped at router due to full buffers
R/2
 sender only resends if packet known to be lost
 sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
Host B
Host B
copy
in : original data
’in: original data, plus

out
in : original data
’in: original data, plus

out
A
no buffer space!
A
free buffer space!
retransmitted data
retransmitted data
Causes/costs of congestion: scenario 2
Causes/costs of congestion: scenario 2
Realistic: duplicates
 packets can be lost, dropped at
R/2
Realistic: duplicates
 packets can be lost, dropped at
R/2
router due to full buffers
when sending at R/2, some packets are retransmissions including duplicated that are delivered!
router due to full buffers
when sending at R/2, some packets are retransmissions including duplicated that are delivered!
 sender times out prematurely, sending two copies, both of which are delivered
in
R/2
 sender times out prematurely, sending two copies, both of which are delivered
in R/2
Host B
timecooutpy
in ’in
out
“costs” of congestion:
 more work (retrans) for given “goodput”
 unneeded retransmissions: link carries multiple copies of pkt
A
free buffer space!
Transport Layer 3-89
Transport Layer 3-90
Transport Layer 3-91
Transport Layer 3-92
• decreasing goodput
in
R/2

cwnd: TCP sender congestion window size
out
Causes/costs of congestion: scenario 3
Causes/costs of congestion: scenario 3
 four senders
 multihop paths
 timeout/retransmit
Q: what happens as in and in’ increase ?
C/2
Host D
 when packet dropped, any “upstream transmission capacity used for that packet was wasted!
Chapter 3 outline
TCP congestion control:
additive increase multiplicative decrease
3.1 transport-layer services
3.5 connection-oriented transport: TCP
 approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
3.2 multiplexing and demultiplexing
• segment structure
• reliable data transfer
• flow control
• connection management
• additive increase: increase cwnd by 1 MSS every RTT until loss detected
3.3 connectionless transport: UDP
• multiplicative decrease: cut cwnd in half after loss additively increase window size …
3.4 principles of reliable data transfer
3.6 principles of congestion control
…. until loss occurs (then cut window in half)
Host A
in : original data
’in: original data, plus
out Host B
A: as red in’ increases, all arriving blue pkts at upper queue are dropped, blue throughput  0
retransmitted data
 ’ in
C/2
finite shared output link buffers
Host C
3.7 TCP congestion control
AIMD saw tooth behavior: probing for bandwidth
Transport Layer 3-93
Transport Layer 3-94
Transport Layer 3-95
Transport Layer 3-96
another “cost” of congestion:
time

RTT
TCP Congestion Control: details
TCP Slow Start
sender sequence number space
 when connection begins, increase rate exponentially until first loss event:
Host A
Host B
last byte ACKed
sent, not- yet ACKed (“in- flight”)
last byte sent
 roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
• initially cwnd = 1 MSS
cwnd
TCP sending rate:
 sender limits transmission:
• done by incrementing cwnd for every ACK received
LastByteSent-
LastByteAcked
< cwnd  cwnd is dynamic, function  summary: initial rate is slow but ramps up exponentially fast of perceived network congestion time TCP: detecting, reacting to loss TCP: switching from slow start to CA loss indicated by timeout: Q: when should the exponential increase switch to linear? • cwnd set to 1 MSS; • window then grows exponentially (as in slow start) to threshold, then grows linearly A: when cwnd gets to 1/2 of its value before timeout. loss indicated by 3 duplicate ACKs: TCP RENO • dup ACKs indicate network capable of delivering Implementation: some segments • cwnd is cut in half window then grows linearly  variable ssthresh  TCP T ahoe always sets cwnd to 1 (timeout or 3 duplicate acks)  on loss event, ssthresh is set to 1/2 of cwnd just before loss event rate ~ cwnd bytes/sec RTT • double cwnd every RTT Transport Layer 3-97 Transport Layer 3-98 Transport Layer 3-99 Transport Layer 3-100 Summary: TCP Congestion Control TCP throughput   W: window size (measured in bytes) where loss occurs • avg. window size (# in-flight bytes) is 3⁄4 W • avg. thruput is 3/4W per RTT cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0 slow start cwnd > ssthresh
congestion avoidance
timeout ssthresh = cwnd/2
timeout
ssthresh = cwnd/2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
duplicate ACK dupACKcount++
3 W
4 RTT bytes/sec
cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
avg TCP thruput =
dupACKcount == 3
cwnd = ssthresh dupACKcount = 0
dupACKcount == 3
ssthresh= cwnd/2 cwnd = ssthresh + 3 retransmit missing segment
fast recovery
ssthresh= cwnd/2
cwnd = ssthresh + 3 retransmit missing segment
duplicate ACK dupACKcount++
New A C K !
n e w A C K. cwnd = cwnd + MSS
New A C K !
 avg. TCP thruput as function of window size, R T T? • ignore slow start, assume always data to send
timeout
ssthresh = cwnd/2
New ACK!
new ACK
cwnd = cwnd+MSS
dupACKcount = 0
transmit new segment(s), as allowed
(MSS/cwnd) dupACKcount = 0
cwnd = 1
dupACKcount = 0 retransmit missing segment
New ACK
W W/2
TCP Futures: TCP over “long, fat pipes” example: 1500 byte segments, 100ms RTT, want
TCP Fairness
10 Gbps throughput
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
 requires W = 83,333 in-flight segments
 throughput in terms of segment loss probability, L [Mathis 1997]:
TCP connection 1
TCP throughput = 1.22 . MSS RTT L

➜ to achieve 10 Gbps throughput, need a loss rate of L = 2·10-10 – a very small loss rate!
TCP connection 2
bottleneck router capacity R
 new versions of TCP for high-speed
transmit new segment(s), as allowed
duplicate ACK
cwnd = cwnd + MSS
transmit new segment(s), as allowed
Transport Layer 3-101
Transport Layer 3-102
Transport Layer 3-103
Transport Layer 3-104

Why is TCP fair?
Fairness (more)
two competing sessions:
 additive increase gives slope of 1, as throughout increases
 multiplicative decrease decreases throughput proportionally
Fairness and UDP
Fairness, parallel TCP connections
R
equal bandwidth share
• do not want rate throttled by congestion control
Connection 1 throughput R
• new app asks for 11 TCPs, gets R/2
loss: decrease window by factor of 2 congestion avoidance: additive increase
 instead use UDP:
• send audio/video at
 web browsers do this
 e.g., link of rate R with 9
Explicit Congestion Notification (ECN)
Chapter 3: summary
network-assisted congestion control:

principles behind transport layer services:
 two bits in IP header (ToS field) marked by network router to indicate congestion
• multiplexing, demultiplexing
next:
 congestion indication carried to receiving host
 leaving the network “edge” (application, transport layers)
 receiver (seeing congestion indication in IP datagram) ) sets ECE bit on receiver-to-sender ACK segment to notify sender of congestion
• reliable data transfer
• flow control
• congestion control
 into the network “core”
source
TCP ACK segment
destination
application transport network link physical
ECE=1
application transport network link physical

instantiation, implementation in the Internet
 two network layer chapters:
IP datagram
Transport Layer 3-107
Transport Layer 3-108
ECN=00
ECN=11
loss: decrease window by factor of 2 congestion avoidance: additive increase
constant rate, tolerate packet loss
existing connections:
Transport Layer 3-105
Transport Layer 3-106
 multimedia apps often do not use TCP
 application can open multiple parallel connections between two hosts
• UDP • TCP
• data plane
• control plane
• new app asks for 1 TCP, gets rate R/10