COMP3310/6331 – #12
Transport Layer: TCP/UDP intro
Dr Markus Buchhorn: markus.buchhorn@anu.edu.au
Where are we?
• Moving further up
Layer | Data unit
Application, Presentation, Session | Messages
Transport | Segments
Network (IPv4, v6) | Packets
Link (Ethernet, WiFi, …) | Frames
Physical (Cables, Space and Bits) | Bits
Ignore the network, focus on applications
[Figure: Browser (1) and Server (2) each sit on a Transport / Net / Link / Physical stack. HTTP runs browser-to-server; TCP, UDP, … run transport-to-transport; IP runs net-to-net. The path between the hosts crosses WiFi (air), 20 different LAN links, and Ethernet (copper).]
Applications don’t know nor care. Unless there is a performance question.
Getting into the transport layer
• Leave all the packet to-and-fro to the network layer
• Everything here is a payload for IP packets
– A Segment
• Offers rich functionality (or not) to Applications
– Reliability, performance, security, and other quality measures – on unreliable IP
• Routers and other network devices do not get in the way
– They (should) only look at ‘the envelope’ of a message, not the messages
– This is pure host-to-host.
[Figure: nested encapsulation: an Ethernet 802.3 frame carries an IP packet, which carries the Transport header and T’port Payload]
Simple client/server model
• Servers offer something
• Clients connect
– Send a request
– Server replies
• Servers can handle multiple clients
• Model breaks in p2p applications – everyone is both.
Transport Services
• What common application needs are there?
• Main decision:
– Reliable – everything has to arrive bit-perfect.
• Transport layer repairs packet loss, mis-ordering (and other damage)
• I can wait!
– Unreliable
• Don’t care about eventual perfection,
• Do care about performance, simplicity, …
• Two types of communication
– Messages: self-contained command and response (post office)
– Byte-stream: generic flow of bytes, chunked into segments (conversation)
Which does what?
(type) | Unreliable | Reliable
Messages | UDP (datagrams) |
Byte-stream | | TCP (Streams)
• Could have reliable messages – but can build that on top of TCP
• Could have unreliable byte-streams – but that looks like UDP
• Transmission Control Protocol: TCP = IP Protocol 6
• User Datagram Protocol: UDP = IP Protocol 17
• ICMP = 1, IGMP = 2, IPv6 encapsulation = 41, 130+ more
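As a small illustration of the protocol numbers above: Python's standard `socket` module exposes them as constants, and the socket type chosen at creation time selects messages (UDP) or a byte-stream (TCP). The table name `SERVICE_MODEL` is made up for this sketch.

```python
import socket

# Hypothetical lookup table for this sketch:
# service model -> (socket type, IP protocol number)
SERVICE_MODEL = {
    "unreliable messages (UDP)": (socket.SOCK_DGRAM, socket.IPPROTO_UDP),
    "reliable byte-stream (TCP)": (socket.SOCK_STREAM, socket.IPPROTO_TCP),
}

for name, (sock_type, proto) in SERVICE_MODEL.items():
    # int() works whether the constant is a plain int or an enum member
    print(f"{name}: type={sock_type.name}, IP protocol={int(proto)}")
```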
Compare them
TCP | UDP
Connection-oriented (significant state in transport layer @host) | Connectionless (minimal state in transport layer)
Delivers BYTES: once, reliably, in order (to your process) | Delivers MESSAGES: 0-n times, any order
Any number of bytes (in a stream) | Limited message size
Flow control (sender/receiver negotiate) | Don’t care
Congestion control (sender/network negotiate) | Don’t care
• UDP is an enhanced IP packet
• TCP is a lifestyle choice – many features
IP Multicast: UDP
• Connectionless, maybe time-sensitive
• Replica packets are fine!
[Figure: a single packet A is replicated through the network towards receivers 1 to 6]
Ports
• IP is “host-to-host”
• Applications are process-to-process
• Port: 16-bit identifier(s) for a process, on a host, on an interface, at each end
[Figure: Processes #1 to #5 sit above the Operating System, which sits above the Net. Interface]
Well-known (and other) ports
• https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml
• Opening ports below 1024 requires extra privileges
Port | Service | Use
20, 21 | ftp | File transfer
22 | ssh | Secure shell
25 | smtp | Email – outbound
80 | http | Web
110 | pop3 | Email – inbound
143 | imap | Email – inbound
443 | https | Secure-Web
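A minimal sketch of the two rules on this slide: a port is a 16-bit number, and ports below 1024 are the privileged range. The dictionary simply copies the table above; `WELL_KNOWN`, `needs_privilege` and `describe` are names invented for the example.

```python
# Copied from the slide's table; not a complete registry.
WELL_KNOWN = {
    20: "ftp", 21: "ftp", 22: "ssh", 25: "smtp",
    80: "http", 110: "pop3", 143: "imap", 443: "https",
}

def needs_privilege(port):
    """True if opening this port traditionally needs extra privileges."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("a port is a 16-bit number (0..65535)")
    return port < 1024

def describe(port):
    service = WELL_KNOWN.get(port, "unregistered in this sketch")
    kind = "privileged" if needs_privilege(port) else "unprivileged"
    return f"{port}: {service} ({kind})"

print(describe(443))
print(describe(8080))
```

On a real host, `socket.getservbyname("http", "tcp")` consults the local services database instead of a hard-coded table.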
A Port is just a start
• Inetd/xinetd
– Don’t continually run every server-service somebody may eventually talk to
– Single service, launch appropriate service on demand
– Listens to all (registered) ports and protocols (tcp, udp)
– Spawns the service to have the conversation
• Port mapping
– (e.g. remote procedure calls, bittorrent, …)
– Listen on a well-known port
– Accept connections
– Redirect them to a spawned service on another port
• Services can register with the portmapper
NAT is actually NAPT
• NAT has everyone ‘hiding’ behind a single public IP address
• But everyone wants access to/from the Internet at the same time
• So translate addresses and ports
• Router maintains a table
– Dynamically for outbound. Can be static for inbound.
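[Figure: hosts 10.0.0.2, 10.0.0.3, 10.0.0.4 and 10.0.0.5 sit behind a router with inside address 10.0.0.1 and public address 150.203.56.99, which connects to the Internet. The router's table holds:]
“150.203.56.99:7880 = 10.0.0.2:80”
“150.203.56.99:7881 = 10.0.0.4:80”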
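The router's table can be sketched as a pair of dictionaries. This is a simplified model under stated assumptions: the class name `Napt`, its methods, and the choice of 7880 as the first dynamic port are invented for illustration, and a real NAPT would also key on the protocol (and often the remote endpoint).

```python
import itertools

class Napt:
    """Toy address-and-port translation table (illustrative only)."""

    def __init__(self, public_ip, first_port=7880):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def add_static(self, public_port, private_ip, private_port):
        """Static inbound mapping, e.g. for a web server behind the NAT."""
        self.out[(private_ip, private_port)] = public_port
        self.back[public_port] = (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Dynamic mapping created when an inside host first sends out."""
        key = (private_ip, private_port)
        if key not in self.out:
            port = next(self._ports)
            while port in self.back:      # skip ports already in use
                port = next(self._ports)
            self.out[key] = port
            self.back[port] = key
        return (self.public_ip, self.out[key])

    def inbound(self, public_port):
        """Return the inside endpoint, or None if there is no mapping."""
        return self.back.get(public_port)

nat = Napt("150.203.56.99")
nat.add_static(7880, "10.0.0.2", 80)
nat.add_static(7881, "10.0.0.4", 80)
print(nat.inbound(7880))
print(nat.outbound("10.0.0.3", 5000))
```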
UDP
[Figure: an Ethernet 802.3 frame carries an IP packet, which carries the UDP header and UDP Payload]
UDP header (32 bits per row):
Source Port (16 bits) | Destination Port (16 bits)
Length (16 bits) | Checksum (16 bits)
Payload (…)
• UDP adds to IP: Ports, a length (header plus payload) and a Checksum
• And nothing else…
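The four 16-bit fields map directly onto a fixed 8-byte structure. A minimal sketch using Python's `struct` module; `build_udp` and `parse_udp` are hypothetical helper names, and the checksum is left at 0 (which IPv4 allows to mean "not computed") rather than calculated.

```python
import struct

# Network byte order ("!"), four unsigned 16-bit fields: 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp(src_port, dst_port, payload, checksum=0):
    """Prefix the payload with a UDP header. Length covers header + payload."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp(datagram):
    """Split a UDP datagram back into its header fields and payload."""
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src,
        "dst_port": dst,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }

datagram = build_udp(5353, 53, b"hello")
print(parse_udp(datagram))
```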
Byte-streams
• TCP segments carry chunks of a byte-stream
– “Message” boundaries are not preserved
• Sender packetises (eventually) on write()
– Multiple writes can be one packet and vice-versa – buffer dependent
[Figure: separate writes A, BC, DE, F enter the stream and arrive as …ABCDEF…]
• Receiver unpacks
– Applications read() a stream of bytes
• Hence: Segments
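The loss of write boundaries can be seen without a network. The sketch below uses `socket.socketpair()`, which gives a connected pair of local stream sockets (not TCP itself, but the same byte-stream semantics); the function name `stream_demo` is invented for the example. How many reads come back is buffer dependent, so only the joined bytes are predictable.

```python
import socket

def stream_demo(writes=(b"A", b"BC", b"DE", b"F")):
    """Send several separate writes, return whatever chunks the reader sees."""
    left, right = socket.socketpair()   # connected pair of stream sockets
    with left, right:
        for chunk in writes:
            left.sendall(chunk)          # four separate writes
        left.shutdown(socket.SHUT_WR)    # signal end of stream to the reader
        reads = []
        while True:
            data = right.recv(4096)
            if not data:                 # empty bytes means end of stream
                break
            reads.append(data)
    return reads

chunks = stream_demo()
print(chunks)             # chunking may differ from the original writes
print(b"".join(chunks))   # the byte sequence itself is preserved
```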
TCP segment format
TCP header (32 bits per row):
Source port (16 bits) | Destination Port (16 bits)
Sequence Number (32 bits)
Acknowledgement Number (32 bits)
TCP Header Length | CWR | ECE | URG | ACK | PSH | RST | SYN | FIN | Window Size (16 bits)
Checksum (16 bits) | Urgent Pointer (16 bits)
Options (0 or more)
Payload (optional…)
[Figure callouts group the fields by purpose: Ports, Reliability, Acknowledgement, Start&Stop, Performance]
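A sketch of decoding the fixed 20-byte part of this header with `struct`. The helper name `parse_tcp_header` and the sample values are assumptions for illustration; the flag bit positions follow the order shown in the header row (CWR is the high bit, FIN the low bit), and the header length is stored in 32-bit words in the top four bits of its byte.

```python
import struct

# ports (2x16), seq (32), ack (32), offset byte, flag byte,
# window (16), checksum (16), urgent pointer (16) = 20 bytes
TCP_FIXED = struct.Struct("!HHIIBBHHH")
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]  # bit 0 upward

def parse_tcp_header(segment):
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = TCP_FIXED.unpack_from(segment)
    header_len = (offset_byte >> 4) * 4          # 32-bit words -> bytes
    flags = [name for bit, name in enumerate(FLAG_NAMES)
             if flag_byte & (1 << bit)]
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent": urgent,
        "options": segment[TCP_FIXED.size:header_len],
        "payload": segment[header_len:],
    }

# Hand-built sample: a SYN from port 12345 to port 80, no options, no payload.
sample = TCP_FIXED.pack(12345, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
print(parse_tcp_header(sample))
```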
TCP Options
• These actually get used…
• Maximum Segment Size: how much each end is willing to take
• Window Scale: When 64kB is not enough – multiply
• Timestamp: For computing RTT and expanding sequence number space
• Selective Acknowledgement: Like ACK, but better.
Programming connections
• “Socket” programming – an address, a port, and a need to communicate
• Connections are identified in the Operating System by a ‘5-tuple’
– source/destination ip, source/destination port, protocol
[Figure: Host 1 and Host 2 hold a standard conversation: connect, request, reply, disconnect]
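To illustrate why all five fields are needed, here is a toy lookup table keyed on the 5-tuple. The names `FiveTuple`, `demux` and the sample addresses are invented for this sketch; a real operating system keeps this table inside the kernel.

```python
from collections import namedtuple

FiveTuple = namedtuple(
    "FiveTuple", "protocol src_ip src_port dst_ip dst_port")

def demux(table, protocol, src_ip, src_port, dst_ip, dst_port):
    """Find which connection an arriving segment belongs to (or None)."""
    return table.get(FiveTuple(protocol, src_ip, src_port, dst_ip, dst_port))

# Two browser tabs on one client talking to the same server port:
# only the source port tells the connections apart.
connections = {
    FiveTuple("tcp", "10.0.0.2", 50001, "150.203.56.99", 443): "tab 1",
    FiveTuple("tcp", "10.0.0.2", 50002, "150.203.56.99", 443): "tab 2",
}

print(demux(connections, "tcp", "10.0.0.2", 50002, "150.203.56.99", 443))
```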
Socket API
Primitive (function) | What it does
SOCKET | Create an object/descriptor
BIND | Attach a local address and port
LISTEN (tcp) | Tell network layer to get ready
ACCEPT (tcp) | Be ready!
CONNECT (tcp) | … Connect …
SEND (tcp) or SENDTO (udp) | … Send …
RECEIVE (tcp) or RECEIVEFROM (udp) | … Receive …
CLOSE | Release the connection/socket
So…
• Server needs to be prepared for connections
• Client initiates the connection
Host 1 (Client): 1. socket(), 5. connect() *, 7. send(), 8. recv() *, 10. close()
Host 2 (Server): 1. socket(), 2. bind(), 3. listen(), 4. accept() *, 6. recv() *, 9. send(), 10. close()
* = call blocks
[Figure: on the wire this appears as connect, request, reply, disconnect]
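The numbered steps can be followed in a single Python process by running the server side in a thread over the loopback interface. This is a minimal sketch, not a production server: `serve_once` and `run_exchange` are invented names, the port is chosen by the OS (bind to port 0), and the server assumes the small request arrives in one recv(), which a real server should not rely on.

```python
import socket
import threading

def serve_once(listener):
    conn, _peer = listener.accept()            # 4. accept() blocks
    with conn:
        request = conn.recv(1024)              # 6. recv() blocks
        conn.sendall(b"reply to " + request)   # 9. send()
    # leaving the with-block closes the connection (10. close())

def run_exchange(request=b"hello"):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # 1. socket()
    listener.bind(("127.0.0.1", 0))            # 2. bind(); port 0 = OS picks
    listener.listen(1)                         # 3. listen()
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # 1. socket()
    client.connect(("127.0.0.1", port))        # 5. connect() blocks
    client.sendall(request)                    # 7. send()
    reply = b""
    while True:                                # 8. recv() until server closes
        data = client.recv(1024)
        if not data:
            break
        reply += data
    client.close()                             # 10. close()

    server.join()
    listener.close()
    return reply

print(run_exchange())
```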
TCP and reliability
• TCP is a reliable, bidirectional byte-stream
– Uses Sequence Numbers and Acknowledgements to provide reliability
– Piggybacks control information on data segments in reverse direction
• If there’s no data, just sends feedback
• Sequence numbers: N-bit counter that wraps (e.g. …, 253, 254, 255, 0, 1, 2, …)
– Byte count (pointer) in a stream – a cumulative ACK
– Can wrap quickly on high-speed links (2^32 = 4 GB) – can use timestamps too
– Does not start from zero (for security)
• Acknowledgements: Which bytes have been received/are expected
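Wrapping counters need modular arithmetic for both addition and comparison. The sketch below assumes TCP's 32-bit sequence space; the helper names (`seq_add`, `seq_before`, `wrap_seconds`) are made up, and the comparison treats "ahead by less than half the space" as "later", which is the usual convention for wrapping counters.

```python
SEQ_MOD = 2 ** 32          # 32-bit sequence number space
HALF = SEQ_MOD // 2

def seq_add(seq, nbytes):
    """Advance a sequence number by a byte count, wrapping at 2^32."""
    return (seq + nbytes) % SEQ_MOD

def seq_before(a, b):
    """True if a comes before b, allowing for wrap-around."""
    return a != b and (b - a) % SEQ_MOD < HALF

def wrap_seconds(bits_per_second):
    """Time to send 2^32 bytes, i.e. to use up the whole sequence space."""
    return SEQ_MOD * 8 / bits_per_second

print(seq_add(SEQ_MOD - 1, 1))           # wraps to 0
print(seq_before(SEQ_MOD - 10, 5))       # True: 5 is 15 bytes later
print(round(wrap_seconds(1e9), 1))       # seconds at 1 Gb/s
```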
Getting connected – 3 way handshake
• TCP is full-duplex = two simplex paths
– Both need to start together(*)
• Synchronise Sequence numbers in both directions
• Connecting
– Originator sends a SYN
– Receiving transport stack decides:
• anybody listen()ing on that port?
– If not, ReSeT
– If yes, passed to receiving process listen()ing,
– Transport stack ACKnowledges (SYN/ACK)
– Originator ACKs that SYN/ACK
• and off they go
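The sequence-number bookkeeping of the handshake can be written out explicitly. A minimal sketch: `three_way_handshake` is an invented function, the initial sequence numbers are passed in rather than chosen randomly, and each SYN is counted as consuming one sequence number.

```python
SEQ_MOD = 2 ** 32

def three_way_handshake(client_isn, server_isn):
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    syn = ("client", "SYN", client_isn, None)
    syn_ack = ("server", "SYN+ACK", server_isn, (client_isn + 1) % SEQ_MOD)
    ack = ("client", "ACK",
           (client_isn + 1) % SEQ_MOD, (server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]

for segment in three_way_handshake(1000, 5000):
    print(segment)
```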
Hanging up
• Both need to end together
– Ideally…
– Time to flush buffers
• Disconnecting
– One side initiates close()
– Triggers a FIN(alise)
– Other side ACKs and FINs too
• And if FIN is lost? Resend…
Socket states:
State | Description
LISTEN | Accepting connections
ESTABLISHED | Connection up and passing data
SYN_SENT | Waiting for reply from remote endpoint
SYN_RECV | Session requested by remote, for a listen()ing socket
LAST_ACK | Closed; remote shut down; waiting for a final ACK
CLOSE_WAIT | Remote shut down; kernel waiting for application to close() socket
TIME_WAIT | Socket is waiting after close() for any packets left on the network
CLOSED | Socket is being cleared
CLOSING | Our socket shut; remote shut; not all data has been ACK’ed
FIN_WAIT1 | We sent FIN, waiting on ACK
FIN_WAIT2 | We sent FIN, got ACK, waiting on their FIN
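These states form a state machine driven by application calls and arriving segments. The sketch below is a simplified version of it: the event names (`active_open`, `recv_fin`, …) are invented labels, and rarer transitions such as simultaneous open or resets are left out.

```python
# (current state, event) -> next state; simplified, illustrative subset.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",        # server calls listen()
    ("CLOSED", "active_open"): "SYN_SENT",       # client connect() sends SYN
    ("LISTEN", "recv_syn"): "SYN_RECV",          # reply with SYN/ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED", # reply with ACK
    ("SYN_RECV", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT1",       # we send FIN first
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",   # they sent FIN first
    ("FIN_WAIT1", "recv_ack"): "FIN_WAIT2",
    ("FIN_WAIT1", "recv_fin"): "CLOSING",        # both sides closed at once
    ("FIN_WAIT2", "recv_fin"): "TIME_WAIT",
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",         # application finally closes
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout"): "CLOSED",
}

def run(events, state="CLOSED"):
    """Apply events in order and return the list of states visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited

client = run(["active_open", "recv_syn_ack", "close",
              "recv_ack", "recv_fin", "timeout"])
server = run(["passive_open", "recv_syn", "recv_ack",
              "recv_fin", "close", "recv_ack"])
print(client)
print(server)
```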
What’s happening on my machine?
% netstat -n
TCP Sliding Windows
• Want reliability and throughput (of course!)
• Start with ARQ – stop-and-wait
– Single segment outstanding = problem on high bandwidth*delay networks
• Say one-way-delay=50ms so round-trip-time (RTT)=2d=100ms
• Single segment per RTT = 10 packets/s
• Typical packet? Say 1000 bytes = ~10,000 bits -> 100kb/s
• Even if bandwidth goes up, throughput doesn’t!
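The stop-and-wait arithmetic above, written as a function so other delays and segment sizes can be tried. The function name is invented, and the slide's rounding of 1000 bytes to about 10,000 bits is kept. Note that link bandwidth does not appear in the formula at all, which is the point of the last bullet.

```python
def stop_and_wait_throughput(segment_bits, one_way_delay_s):
    """Bits per second when only one segment may be outstanding per RTT."""
    rtt = 2 * one_way_delay_s
    return segment_bits / rtt

# 10,000-bit segments, 50 ms one-way delay
print(stop_and_wait_throughput(10_000, 0.05))
```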
TCP Sliding Windows
• Allow W segments to be ‘outstanding’ (unACKed) per RTT
– Fill a pipeline/conveyor-belt with segments
• Set up a ‘window’ of W segments
• W=2*Bandwidth*delay
• At 100Mb/s, delay=50ms means W=10Mb
– Assuming same 10kb segments, W=1000 segments – 500 are out there somewhere!
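The window-sizing rule W = 2 * Bandwidth * delay can be checked the same way. Function names are invented for the sketch; the inputs are the slide's own numbers.

```python
def window_bits(bandwidth_bps, one_way_delay_s):
    """Bits needed in flight to keep the pipe full for one round trip."""
    return 2 * bandwidth_bps * one_way_delay_s

def window_segments(bandwidth_bps, one_way_delay_s, segment_bits):
    """The same window expressed as a number of segments."""
    return window_bits(bandwidth_bps, one_way_delay_s) / segment_bits

print(window_bits(100e6, 0.05))               # about 10 Mb
print(window_segments(100e6, 0.05, 10_000))   # about 1000 segments
```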
Sliding Window approach
• Sender buffers up W segments until they are ACKed
[Figure: three snapshots of the Seq# line with W=5]
– Acked | Un-Acked | Available | Waiting: Window not full, so send a packet
– Acked | Un-Acked | Waiting: no Available slots left
– Acked | Un-Acked | Available | Waiting: Packet ACKed, so Window not full
If(lost) then: ARQ – “Go Back N”
• Receiver buffers just a single segment
• If it’s the next one in sequence, ACK it, everyone happy
• If it’s not, drop it,
• Let sender retransmit what I’m actually waiting for
• Sender has a single timer. After timeout, resend (all) from (first) ACK-less.
• Really simple, but somewhat inefficient
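A toy simulation of Go-Back-N shows where the inefficiency comes from. It is a sketch under simplifying assumptions: time is modelled as rounds (send a window, then time out), each segment listed in `lose_once` is lost on its first transmission only, and ACKs are never lost. The function name and parameters are invented.

```python
def go_back_n(num_segments, window, lose_once=()):
    """Return the list of segment numbers in the order they were sent."""
    pending_losses = set(lose_once)
    sent_log = []
    base = 0                                   # first un-ACKed segment
    while base < num_segments:
        expected = base                        # receiver wants exactly this one
        for seq in range(base, min(base + window, num_segments)):
            sent_log.append(seq)
            if seq in pending_losses:
                pending_losses.discard(seq)    # lost in the network this time
                continue
            if seq == expected:
                expected += 1                  # in order: accept and ACK
            # otherwise out of order: receiver drops it
        base = expected                        # timeout: go back to first un-ACKed
    return sent_log

log = go_back_n(5, window=5, lose_once={1})
print(log, len(log))
```

With segment 1 lost once, segments 2 to 4 are sent twice even though they arrived the first time.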
ARQ – “Selective Repeat”
• Receiver buffers many segments
– Reduce retransmissions
• ACK what has been received in order
• And also ACK received segments that aren’t
– Any gap indicates a missing segment!
– Selective ACK (SACK)
– TCP header has an ACK flag (1 bit), and a SACK Option (32 bits…)
– 3 duplicate ACKs (plus SACKs) trigger resend
• Sender has a timer per unACKed-segment
– As each timer expires, resend that segment
• Cope with (some) misordering. Way more efficient, now widespread
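The same toy model adapted to Selective Repeat: the receiver keeps out-of-order segments and the sender resends only what is missing. As before this is a sketch with invented names, modelled in rounds, with each listed segment lost on its first transmission only and no ACK loss.

```python
def selective_repeat(num_segments, window, lose_once=()):
    """Return the list of segment numbers in the order they were sent."""
    pending_losses = set(lose_once)
    sent_log = []
    received = set()                           # receiver buffer, reported via SACK
    base = 0                                   # first segment not yet received in order
    while base < num_segments:
        for seq in range(base, min(base + window, num_segments)):
            if seq in received:
                continue                       # already SACKed: no resend
            sent_log.append(seq)
            if seq in pending_losses:
                pending_losses.discard(seq)    # lost in the network this time
                continue
            received.add(seq)                  # buffered even if out of order
        while base in received:
            base += 1                          # slide past the in-order prefix
    return sent_log

log = selective_repeat(5, window=5, lose_once={1})
print(log, len(log))
```

For the same single loss, only segment 1 is sent a second time.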
Everybody runs the same TCP…?
• No. There is no single TCP stack
• Many years of various optimisations, experiments, algorithms, …
– Suited to various circumstances
– And as vulnerabilities have been found and mitigated (and found and …)
• Doesn’t impact the network, only hosts, so you can do what you want…