COMP3310/6331 – #10-11
Network Layer: IPv4/v6
Dr Markus Buchhorn: markus.buchhorn@anu.edu.au
Where are we?
• Moving further up
• Layers, and the unit of data each one handles:
– Application, Presentation, Session: Messages
– Transport: Segments
– Network: Packets
– Link (Ethernet, WiFi, …): Frames
– Physical (Cables, Space and Bits): Bits
Remember circuits?
• POTS/PSTN
• Set up (and tear-down)
• Guaranteed channel
– But inefficient
– Solid block-out for duration of a conversation
After Circuit switched…
• MESSAGE switched:
– Postal Service:
• Put message in container
• Address it
• Put in the network
– Network (hopefully) takes care of delivery
– “Store-and-forward”: hold the entire message
• Examine container at each hop before forwarding
– Message loss is a large problem
• Less than a circuit. Failure along the path is flagged from that point
– Potentially High Latency
• Each hop takes time, especially for large messages
After Message Switched…
• PACKET switched
– Put fragments of message in multiple containers
– Address them all, and put them on the network
– Network (hopefully) takes care of delivery
• Examine each container at each hop before forwarding
– Acknowledgements happen sooner, more frequently
• Packet loss is more tolerable
– Recovery is a smaller effort
– Still high latency
• Each hop takes time, plus overheads
• But less than for large messages
– Not waiting for each hop
– And much better sharing of capacity
Role of the Network layer
• Each layer consumes services (functionality) from the layer below
• Each layer offers services (functionality) to the layer above
• Which functions belong in
– Link/Physical
– Transport
– Network?
Role of the (IP) Network layer
• Simplest common functions
– Across many/all link types
– Just a glue layer
1. End-to-end delivery of packets
2. Global addressing
3. Cope with evolving network topology
• Consume little from lower layer
• Offer little to higher layers
Hiding path complexity
[Figure: Browser (1) talks HTTP to Server (2) over TCP and IP. Each end has its own Transport, Net, Link and Physical layers (e.g. WiFi over air, Ethernet over copper), with 20 different LAN links in between.]
Applications neither know nor care, unless there is a performance question.
Guiding principles
• Simplicity and end-to-end design
– Keep connection/conversation state at the edge, not in the network
• Provide ‘best-effort’ delivery
– Minimal Service Level Agreement (SLA)
• Ensure ‘reliability’ is delivered (only) where it is needed
– For specific application needs: file transfers, audio calls, …
– Can be done at different layers
• Link (vlan, …), Transport (tcp, …)
• Network layer? (mpls, …)
Connectionless vs Connection-oriented
• What guarantees does your application need?
• Which layer provides it?
• Connectionless, packets
– Go where the network chooses, in realtime
• Connection-oriented, circuits
– In a packet-switched world: ‘virtual’ circuits
– Go where I configured the network to send them
• Need to understand how packets get sent towards their destination
Packet Forwarding and Routing
[Figure: a Source sends packets through routers 1-6 towards a Server and a Printer. Every router holds a Forwarding Table mapping Target → Port; e.g. Router 1’s next hops (now): Server → port A, Printer → port B.]
• Routing fills in the forwarding tables of all routers; forwarding uses them, one packet at a time
• Separation of control plane (global) and data plane (local)
Multi-path packet forwarding
• Statistical multiplexing
• Unpredictable ordering
• Variable delays
• No guarantees
• Receiver’s problem to deal with order, loss, jitter
• A bunch of independent decision makers…
[Figure: packets A, B, C, D leave the source in order, take different paths through routers 1-6, and arrive out of order.]
Circuits in a Packet world??
[Figure: in a circuit, packets carry a short Circuit ID plus Payload. Routers 1-6 use a Circuit Forwarding Table (Circuit → Port, e.g. X → A, Y → B) instead of a Packet Forwarding Table (Target → Port, e.g. Server → A, Printer → B).]
Circuits over packets
• Why???
– Guaranteed path
– Guaranteed (maybe) bandwidth/performance
• How?
– Circuit set-up and tear-down: manual, or on-demand
– Packets: More encapsulation: IP-in-IP, Multi-Protocol Label Switching (MPLS)
[Figure: encapsulation: HTTP inside TCP inside IP, inside an outer IP header or MPLS label, inside Ethernet (802.3)]
Circuits vs packets
Property | Packets | Circuits
Path router control | Not needed | Required
Prior setup | Nothing needed | Required
Router state | Per destination | Per circuit configuration
Addressing | Packets carry full src/dest | Packets carry short label
Forwarding | Per packet | Per circuit
Router failure | Packets lost, reroute | Circuit fails completely
Quality of service | Difficult | Easy(*)
Security | Per-packet, other layers | Maybe…
(The) Internet design
• Standards: the Internet Engineering Task Force (IETF.org)
– Just a bunch of people, arguing. Not a company, no board, no members
– Lots of Working Groups, under Areas
– Work revolves around ‘drafts’ and ‘request for comments’ (RFC)
• RFC
– Strict rules about structure, references, and language (MUST/SHOULD/MAY)
– Standards Track or
• Best Current Practice, Informational, Experimental, Historic (Lost interest or Detrimental)
– Locked down on publication. Regularly Obsoleted or Updated
– RFC-0001: April 1969, RFC-8571: March 2019
• Watch out for 1 April RFCs…
Taming the crowds
• IETF needs some structure:
– ISOC: Internet Society – international, non-profit, legal entity
– IESG: Internet Engineering Steering Group – oversees IETF processes, signs off
– IAB: Internet Architecture Board – Big picture view, identify/review issues
– IRTF: Internet Research Task Force – Researches issues…
• Overseen by IRSG: Internet Research Steering Group
– IANA: Internet Assigned Numbers Authority – Directory keeper
• Contracted to ICANN: Internet Corporation for Assigned Names and Numbers
– RFC Editor
IETF “principles”
• End-to-end…
• “Rough consensus and running code”
• “Be conservative in what you send, liberal in what you accept”
• Simplicity, clarity,
– Fight feature creep, use other layers, offer just one way to do something
– Don’t hardwire too much, let it be negotiated
– Aim for good, broad design; let others deal with edge-cases
• Think about scalability, non-linearity, heterogeneity, cost
– Law of Unintended Consequences
– https://www.rfc-editor.org/rfc/rfc3439.txt (updates RFC1958)
Introducing… The Internet Protocol!
Read left-to-right, top to bottom; each row is one 32-bit word (field widths in bits):
| Version (4) | IHL (4) | Diff. Serv. (6) | ECN (2) | Total Length (16) |
| Identification (16) | Flags (3): reserved, DF, MF | Fragment Offset (13) |
| Time to Live (8) | Protocol (8) | Header Checksum (16) |
| Source Address (32) |
| Destination Address (32) |
| Options (0 or more words) |
| Payload (…) |
Key fields
IPv4
• Total Length: in 8-bit bytes (header plus payload)
• Internet Header Length (IHL): in 32-bit words
• Addresses: 32-bit
• Time to Live: a hop count
• Protocol: for demultiplexing (which higher-layer protocol gets the payload)
• Updated every hop: Time to Live, and therefore the Header Checksum
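To make the header layout concrete, here is a minimal Python sketch (our own illustration, not part of the lecture material) that packs and unpacks a 20-byte IPv4 header with the standard `struct` module. The helper names `build_ipv4_header`, `parse_ipv4_header` and `forward_one_hop` are hypothetical; options are ignored, and the checksum is the usual one's-complement Internet checksum.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented (Internet checksum)."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in
    return ~total & 0xFFFF

def build_ipv4_header(src: str, dst: str, payload_len: int,
                      ttl: int = 64, proto: int = 6, ident: int = 0x1234) -> bytes:
    """Hypothetical helper: 20-byte header, no options, DF=1, checksum filled in."""
    hdr = struct.pack("!BBHHHBBH4s4s",
                      (4 << 4) | 5,                  # Version=4, IHL=5 words
                      0,                             # Diff. Serv. + ECN
                      20 + payload_len,              # Total Length in bytes
                      ident,                         # Identification
                      0x4000,                        # flags: DF=1, MF=0; offset 0
                      ttl, proto,
                      0,                             # checksum placeholder
                      bytes(int(x) for x in src.split(".")),
                      bytes(int(x) for x in dst.split(".")))
    return hdr[:10] + struct.pack("!H", internet_checksum(hdr)) + hdr[12:]

def parse_ipv4_header(packet: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F                             # header length in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ihl,
        "dscp": tos >> 2,
        "ecn": tos & 0x03,
        "total_length": total_len,                   # bytes, header + payload
        "identification": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,      # in 8-byte units
        "ttl": ttl,
        "protocol": proto,                           # e.g. 1=ICMP, 6=TCP, 17=UDP
        "checksum_ok": internet_checksum(packet[:ihl * 4]) == 0,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

def forward_one_hop(header: bytes) -> bytes:
    """What every router does: decrement TTL, then recompute the header checksum."""
    h = bytearray(header)
    if h[8] <= 1:
        raise ValueError("TTL would hit zero: drop it and send ICMP Time Exceeded")
    h[8] -= 1                                        # TTL is byte 8
    h[10:12] = b"\x00\x00"                           # zero checksum before summing
    h[10:12] = struct.pack("!H", internet_checksum(bytes(h)))
    return bytes(h)

hdr = build_ipv4_header("150.203.56.99", "150.203.18.27", payload_len=100)
print(parse_ipv4_header(hdr))
print(parse_ipv4_header(forward_one_hop(hdr))["ttl"])  # 63
```

`forward_one_hop` shows why TTL and the Header Checksum change at every router: the checksum covers the TTL byte, so it has to be recomputed whenever TTL is decremented.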
IP addressing
• 32 bits = 2^32 hosts ≈ 4 billion, in theory
• Written in ‘dotted-quad’ notation
– i.e. four numbers (one per byte, 0-255), separated by dots
• 11010101|11110000|10101010|00001111
• 213.240.170.15
• Not a host, but an interface
– 1 IP = 1 MAC, most of the time…
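The dotted-quad form is just the 32-bit number written one byte at a time. A small Python sketch (helper names are our own) converts between the two notations used above:

```python
def dotted_quad_to_bits(addr: str) -> str:
    """'213.240.170.15' -> '11010101|11110000|10101010|00001111'"""
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {addr!r}")
    return "|".join(format(o, "08b") for o in octets)

def bits_to_dotted_quad(bits: str) -> str:
    """Inverse: four 8-bit groups separated by '|' -> dotted quad."""
    return ".".join(str(int(group, 2)) for group in bits.split("|"))

def dotted_quad_to_int(addr: str) -> int:
    """The same address as a single 32-bit number."""
    value = 0
    for part in addr.split("."):
        value = (value << 8) | int(part)
    return value

print(dotted_quad_to_bits("213.240.170.15"))
print(dotted_quad_to_int("213.240.170.15"), "of", 2 ** 32, "possible values")
```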
IP prefixes
• Aggregate ‘nearby’ addresses into a block for routing (tables)
• A block of addresses is described by its prefix
• Split 32-bits into network and host components using upper L bits:
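[Figure: the 32-bit address split in two: the top L bits are the network address (the same top L bits for every address in the block, fixed); the remaining 32-L bits are the host addresses (2^(32-L) of them, variable).]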
IP prefixes
• Use a ‘/’ (‘slash’) notation:
• Network address/prefixlength:
– Network address is the first ‘host’ in the host-range
• For example: 150.203.0.0/16
– From 150.203.0.0 up to 150.203.255.255
– 32 bits, using 16 for the prefix: 2^(32-16) = 2^16 = 65,536 addresses
• A “/24” has 256 addresses [e.g. 150.203.0.0/24 = 150.203.0.0-150.203.0.255]
• A “/30” has 4 addresses
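Python’s standard `ipaddress` module can confirm these block sizes and ranges. This is only a quick check of the arithmetic, using the prefixes above:

```python
import ipaddress

def describe(prefix: str) -> tuple[int, str, str]:
    """(number of addresses, first address, last address) for a prefix."""
    net = ipaddress.ip_network(prefix)     # strict: the host bits must all be zero
    assert net.num_addresses == 2 ** (32 - net.prefixlen)
    return net.num_addresses, str(net.network_address), str(net.broadcast_address)

for p in ["150.203.0.0/16", "150.203.0.0/24", "150.203.0.0/30"]:
    print(p, describe(p))
# describe("150.203.10.5/24") raises ValueError, because host bits are set
```

The strict check mirrors the convention that the network address is the first address in the block.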
IP subnets
• Network address/prefixlength
• Network address = a subnet (a block of contiguous host addresses)
• Prefix length = a subnet mask
• For example: 150.203.10.0/24
– The 150.203.10.0 subnet
– /24 = 24-bit network ID, so mask = 24 1’s and 8 0’s: 255.255.255.0
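The mask is simply “prefix-length ones, then zeros”. A minimal sketch (function names are our own) builds it with bit shifts and converts back:

```python
def prefix_to_mask(prefix_len: int) -> str:
    """Dotted-quad subnet mask: prefix_len 1-bits followed by 0-bits."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be 0..32")
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def mask_to_prefix(mask: str) -> int:
    """Count the 1-bits, rejecting masks whose ones are not contiguous."""
    bits = "".join(format(int(octet), "08b") for octet in mask.split("."))
    if "01" in bits:
        raise ValueError("mask must be contiguous ones followed by zeros")
    return bits.count("1")

print(prefix_to_mask(24))               # 255.255.255.0
print(mask_to_prefix("255.255.240.0"))  # 20
```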
IP address classes
• Largely historical, but still common language
• Class A: /8
– First byte: 1-126
– Leading bit 0, 7-bit Network ID, Host ID (24 bits) [16M]
• Class B: /16
– First byte: 128-191
– Leading bits 10, 14-bit Network ID, Host ID (16 bits) [64k]
• Class C: /24
– First byte: 192-223
– Leading bits 110, 21-bit Network ID, Host ID (8 bits) [256]
• Class D:
– First byte: 224-239
– Leading bits 1110, IP Multicast group
• Class E:
– First byte: 240-255
– Leading bits 1111, (no longer) Experimental
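Because the class is encoded in the leading bits of the first byte, a router of that era could tell the prefix length from the address alone. A small sketch (our own helper) of that test:

```python
def address_class(addr: str) -> str:
    """Historical class from the leading bits of the first byte."""
    first = int(addr.split(".")[0])
    if first >> 7 == 0b0:        # 0xxxxxxx: 0-127 (0 and 127 are special, hence 1-126)
        return "A"
    if first >> 6 == 0b10:       # 10xxxxxx: 128-191
        return "B"
    if first >> 5 == 0b110:      # 110xxxxx: 192-223
        return "C"
    if first >> 4 == 0b1110:     # 1110xxxx: 224-239
        return "D (multicast)"
    return "E (experimental)"    # 1111xxxx: 240-255

DEFAULT_PREFIX = {"A": 8, "B": 16, "C": 24}
for a in ["10.1.2.3", "150.203.56.99", "192.168.178.1", "224.0.0.1", "240.0.0.1"]:
    cls = address_class(a)
    print(a, cls, f"/{DEFAULT_PREFIX[cls]}" if cls in DEFAULT_PREFIX else "")
```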
More or Less?
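• A “More-specific” prefix = longer prefix = fewer hosts
• A “Less-specific” prefix = shorter prefix = more hosts
[Figure: number of host addresses per prefix length: /0 = 2^32, /8 = 2^24, /16 = 2^16, /24 = 2^8, /32 = 2^0 = 1; longer prefixes are more specific, shorter ones less specific.]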
Forwarding, by prefix
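Router 1 forwarding table:
Target | Next Hop
150.99.10.0/24 | Router 5
150.203.0.0/16 | Router 4
[Figure: packets from the Source reach Router 1, which sends traffic for the Printer (150.99.10.12) towards Router 5 and traffic for the Server (150.203.18.27) towards Router 4.]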
Forwarding by longest matching prefix
• Prefixes in a forwarding table are allowed to overlap
– For good reasons!
– Aggregation benefit of hierarchical addressing (e.g. a /20 holds sixteen /24s)
– As well as flexibility to direct some specific traffic
• Longest matching prefix rule:
1. For each packet, identify all subnet prefixes that apply
2. Select the one with the longest matching prefix
• The ‘most specific’
3. Forward accordingly to the next hop
Longest Matching Prefix…
Router forwarding table:
Target | Next Hop | Covers
150.203.0.0/16 | Router 5 | 150.203.0.0 to 150.203.255.255
150.203.10.0/24 | Router 4 | 150.203.10.0 to 150.203.10.255
• 150.203.8.99 goes to … Router 5
• 150.203.100.6 goes to … Router 5
• 150.203.10.200 goes to … Router 4
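The three-step rule can be sketched directly in Python with the `ipaddress` module, using the two-entry table above. This is a linear scan for clarity only; real routers use specialised structures (tries, TCAM) to do the same lookup in nanoseconds.

```python
import ipaddress

# Forwarding table from the example: (prefix, next hop)
TABLE = [
    (ipaddress.ip_network("150.203.0.0/16"), "Router 5"),
    (ipaddress.ip_network("150.203.10.0/24"), "Router 4"),
]

def lookup(dest: str, table=TABLE):
    addr = ipaddress.ip_address(dest)
    # 1. identify all prefixes that contain the destination
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None      # no route: a real router would drop the packet
    # 2. select the most specific (longest) prefix, 3. forward to its next hop
    return max(matches, key=lambda m: m[0].prefixlen)[1]

for d in ["150.203.8.99", "150.203.100.6", "150.203.10.200"]:
    print(d, "->", lookup(d))
```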
Why?
• Provide default behaviour with shorter (less-specific) prefixes
– Catches more host addresses in a single block
• Support specialised behaviour with longer (more-specific) prefixes
– Key services to be reached via
• Higher performance paths
• Lower cost paths
• More secure paths
• … (policy reasons)
• Hierarchy generates more compact forwarding tables on routers
– Cost of lookups vs simple tables is largely optimised away now
Hosts as routers?
• How does a host decide how to send a packet?
– Assume it has learnt the destination IP address from somewhere
• Hosts are not good routers – keep it simple, let routers route!
• Two types of destination
– On my LAN: use LAN services
– Beyond the LAN (the Internet): use a router
Host forwarding table
• How to decide?
• Longest matching prefix plus a catch-all address:
• 0.0.0.0/0 = ‘the whole internet’
• Host knows its IP address and its prefix (subnet mask):
– “I’m 150.203.56.99 and I’m on a /24”
– So my network is 150.203.56.0/24
Host forwarding table (longest matching prefix wins):
Target | Next Hop
150.203.56.0/24 | Direct on my LAN
0.0.0.0/0 | My (default) router: 150.203.56.1, which is also on my LAN
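A host’s two-entry table boils down to one test: is the destination inside my own subnet? A minimal Python sketch using the example addresses (the function name is our own):

```python
import ipaddress

MY_INTERFACE = ipaddress.ip_interface("150.203.56.99/24")  # "I'm .99 on a /24"
DEFAULT_ROUTER = "150.203.56.1"                           # from the example table

def next_hop(dest: str) -> tuple[str, str]:
    """Return (how to send, which IP address to resolve on the LAN)."""
    d = ipaddress.ip_address(dest)
    if d in MY_INTERFACE.network:          # 150.203.56.0/24 matches: longest prefix wins
        return ("direct", str(d))
    return ("via router", DEFAULT_ROUTER)  # only 0.0.0.0/0 matches

print(MY_INTERFACE.network)                # 150.203.56.0/24
print(next_hop("150.203.56.7"))
print(next_hop("150.203.18.27"))
```

Either way the frame goes out on the LAN; what changes is whose MAC address the host has to find.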
My Forwarding table
• Lots of interfaces
• Default route
– My router
• LAN route
– 192.168.178/24 (i.e. 192.168.178.0/24): my subnet
– All “On-link”
• All learned routes
– No static or “persistent” routes
Home on the LAN
• Network-Layer:
– Hey, I’m on the same Ethernet as my target, I can just send this packet directly
– Link layer, send this to IP-address 150.203.56.99
• Link-Layer:
– What’s an IP address????
– I need a MAC (Link Layer) address
– Network-layer won’t help me
• Need to cross layers. Need some kind of Address Resolution Protocol
[Figure: frame layout: Source MAC | Dest MAC? | Source IP | Dest IP | Payload]
The Address Resolution Protocol (ARP)
• RFC 826 (and updates)
• Mapping IP addresses to Ethernet/etc. hardware addresses
– Not an IP packet: Link Layer
• A wants to send IP packet to C
– Send a Link layer broadcast
• Src MAC = AA:AA:AA:AA:AA:AA
• Dest MAC = FF:FF:FF:FF:FF:FF
• I am/Tell 150.203.56.88
• Who has 150.203.56.99?
[Figure: hosts A (150.203.56.88), B (150.203.56.77) and C (150.203.56.99), plus router R, on one LAN]
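To show what actually goes on the wire, here is a Python sketch that builds the broadcast request above as raw bytes, using the RFC 826 field layout for Ethernet/IPv4. It only constructs the frame; actually sending it would need a raw link-layer socket (OS-specific, usually administrator rights), which is left out. Helper names are our own.

```python
import struct

def mac_bytes(mac: str) -> bytes:
    return bytes(int(part, 16) for part in mac.split(":"))

def ip_bytes(ip: str) -> bytes:
    return bytes(int(part) for part in ip.split("."))

def arp_request_frame(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Ethernet frame carrying an ARP 'who has target_ip? tell src_ip' request."""
    eth = (mac_bytes("FF:FF:FF:FF:FF:FF")     # destination: link-layer broadcast
           + mac_bytes(src_mac)               # source MAC
           + struct.pack("!H", 0x0806))       # EtherType = ARP, not IP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                    # hardware type: Ethernet
        0x0800,                               # protocol type: IPv4
        6, 4,                                 # MAC / IP address lengths in bytes
        1,                                    # operation: 1 = request, 2 = reply
        mac_bytes(src_mac), ip_bytes(src_ip), # "I am / tell"
        bytes(6), ip_bytes(target_ip),        # target MAC unknown (zeros), "who has?"
    )
    return eth + arp

frame = arp_request_frame("AA:AA:AA:AA:AA:AA", "150.203.56.88", "150.203.56.99")
print(len(frame), frame.hex())
```

C’s reply uses the same layout with operation 2, its own MAC and IP as sender, and A’s MAC as the (unicast) destination.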
The Address Resolution Protocol
• B receives the broadcast
– And ignores it…
• C receives the broadcast
– And replies (Link layer)
• Src MAC = CC:CC:CC:CC:CC:CC
• Dst MAC = AA:AA:AA:AA:AA:AA
• Dear 150.203.56.88
• I am 150.203.56.99
• A gets the reply
– And can now send directly
The Address Resolution Protocol
• Some optimisations:
– Caching, with timeouts
– Catch passing ARP information
• B doesn’t ignore the broadcast
– Tell everyone of your changes
• Gratuitous ARP – “look for yourself”
• Also helps find duplicate IPs
• Also applies to packets going to/through the router
– Need MAC address of R
ARP is a simple Discovery Protocol
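The caching optimisations can be sketched as a small table with expiry times. This is a hypothetical illustration: the class name and the 60-second default timeout are assumptions, not values from any particular OS, and the clock is injectable so the expiry behaviour can be tested.

```python
import time

class ArpCache:
    """Hypothetical minimal ARP cache: IP -> (MAC, expiry time)."""

    def __init__(self, timeout_s: float = 60.0, clock=time.monotonic):
        self.timeout_s = timeout_s            # assumed value, OSes differ
        self.clock = clock
        self.entries: dict[str, tuple[str, float]] = {}

    def learn(self, ip: str, mac: str) -> None:
        # Called for replies *and* for ARP packets we overhear (incl. gratuitous ARP)
        self.entries[ip] = (mac, self.clock() + self.timeout_s)

    def lookup(self, ip: str):
        entry = self.entries.get(ip)
        if entry is None:
            return None                       # miss: broadcast an ARP request
        mac, expires = entry
        if self.clock() >= expires:
            del self.entries[ip]              # stale: forget it and ask again
            return None
        return mac

cache = ArpCache()
cache.learn("150.203.56.99", "CC:CC:CC:CC:CC:CC")
print(cache.lookup("150.203.56.99"))
```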
ARP cache
• Dynamic
– Learned
• Static
– Configured
• 2 interfaces
– WiFi, Ethernet
• 192.168.178/24 subnet
• Some special addresses
Getting addresses (blocks) for your network
• Need to consider
– Globally-unique allocation
– Routing aggregation opportunities
– Politics
• Need an authority, which scales
[Figure: allocation hierarchy: ICANN/IANA → Regional Internet Registries (EU, AP, Af, NA, L/SA) → ISPs → Consumers]
Regional Internet Registries
Addresses are not equal
• IPv4: 2^32 addresses, but we can’t use all of them
• Special allocations
– Private Networks: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
• Can be used on networks that are NOT connected directly to the Internet
– IP Multicasting: 224.0.0.0/4 [old class D]
• Distribute packets to groups of subscribers
• Requires additional services on the network
– Experimental: 240.0.0.0/4 [old class E, ~268M addresses!]
• Still waiting for an experiment
• Most OSes will drop such packets
Addresses are not equal (2)
• Special networks:
– This host on this network: 0.0.0.0/8
• Only used as a source address
• Used for ‘any interface’ or ‘I do not know’
– Local interface: 127.0.0.0/8
• Loopback interface: 127.0.0.1
– Link-local: 169.254.0.0/16
• My LAN when all else fails
– Broadcast: 255.255.255.255
• Specific address for a global broadcast
– In theory…
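A table-driven sketch (our own) that tags an address with the special block it falls into, using exactly the blocks listed on these two slides. Python’s `ipaddress` also offers properties such as `is_loopback`, `is_link_local` and `is_multicast`, but its `is_private` covers more than the three private blocks, so explicit networks are used here to match the slides.

```python
import ipaddress

# Special IPv4 blocks from the slides. Order matters: the /32 broadcast
# address must be checked before 240.0.0.0/4, which contains it.
_SPECIAL_BLOCKS = [
    ("255.255.255.255/32", "limited broadcast"),
    ("0.0.0.0/8", "this host on this network"),
    ("127.0.0.0/8", "loopback"),
    ("169.254.0.0/16", "link-local"),
    ("10.0.0.0/8", "private"),
    ("172.16.0.0/12", "private"),
    ("192.168.0.0/16", "private"),
    ("224.0.0.0/4", "multicast (old class D)"),
    ("240.0.0.0/4", "experimental (old class E)"),
]
SPECIAL = [(ipaddress.ip_network(block), label) for block, label in _SPECIAL_BLOCKS]

def classify(addr: str) -> str:
    a = ipaddress.ip_address(addr)
    for net, label in SPECIAL:
        if a in net:            # first match wins
            return label
    return "ordinary unicast"

for example in ["10.1.2.3", "172.32.0.1", "127.0.0.1", "169.254.7.7",
                "224.0.0.251", "241.1.1.1", "255.255.255.255"]:
    print(example, "->", classify(example))
```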
Address conventions
• Subnet broadcast: A.B.C.255
– All ones in the host field (.255 for /24, .255.255 for /16)
• The subnet: A.B.C.0
– Aka “the wire”, usually followed by /n
– All zeroes in the host field
• Router/gateway: A.B.C.1
– A convention. Makes it easy to find
• Note: Host field is shorter than 8 bits, for prefixes >24
– 150.203.56.0/28: (Sub)Netmask=255.255.255.240, Broadcast=150.203.56.15!
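These conventions fall straight out of the prefix. A short sketch with `ipaddress` (the function name is our own) reproduces the /28 example and compares it with a /24:

```python
import ipaddress

def subnet_summary(prefix: str) -> dict:
    """Assumes a prefix of /30 or shorter, so network and broadcast are distinct."""
    net = ipaddress.ip_network(prefix)
    return {
        "netmask": str(net.netmask),
        "network": str(net.network_address),         # all zeroes in the host field
        "broadcast": str(net.broadcast_address),     # all ones in the host field
        "router": str(net.network_address + 1),      # the '.1' convention, not a rule
        "usable_hosts": net.num_addresses - 2,       # minus network and broadcast
    }

for p in ["150.203.56.0/24", "150.203.56.0/28"]:
    print(p, subnet_summary(p))
```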
Getting feedback from the Internet
• Sometimes things happen to packets
– Along the route
– Loss, corruption, mistakes, …
• Wrong addresses, nobody home, packet malformed, …
• Sender needs to know what happened
– With little/no feedback from receiver
– Retransmission may be wrong
• Don’t keep making the same mistake: “Internet says NO!”
• Internet Protocol needs some Control Messages
Internet Control Message Protocol (ICMP)
• IP packet – protocol #1
• Designed mainly for routers to inform senders (including routers)
– Senders listen, but don’t (usually) send
• ‘Type’: category. ‘Code’: actual problem/question/answer
• Data=
– Header of packet that caused the problem, and 8+ bytes of its payload
– Other information related to the problem/request
| Type (8) | Code (8) | Checksum (16) |
| Data |
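As an illustration of the Type/Code/Checksum layout, this Python sketch builds an ICMP Echo request (Type 8, Code 0). For Echo messages the Data part starts with a 16-bit identifier and sequence number; the payload `b"ping"` is an arbitrary choice. Sending it needs a raw socket and usually administrator rights, so that part is omitted.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF

def icmp_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    # Type=8, Code=0, checksum computed over the whole ICMP message
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, identifier, sequence) + payload

pkt = icmp_echo_request(0x1234, 1)
print(pkt.hex())
```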
ICMP Types
• Type 0,8 – ICMP Echo
• Type 3 – Destination Unreachable
– Many reasons, at intermediate routers, final router, host
• Type 4 – Source Quench
– Please Slow Down! (deprecated)
• Type 5 – Redirect
– Look elsewhere
• Type 9,10 – Router discovery
• Type 11 – Time exceeded
– E.g. TTL has hit zero
• Type 12 – Bad header
• Type 13,14 – Timestamp
• Type 15+ – Deprecated, Experimental, Unallocated and Reserved
ICMP use by hosts?
• Ping
• Send an ICMP Echo-request (Type 8/Code 0) to an IP address
• If received, receiver sends back an ICMP Echo-reply (Type 0/0)
• Useful for testing (many options) – and for probing…
Traceroute
• Identify all the routers through which your packets are going (now)
• Use ‘TTL decrement’ and ‘ICMP Time Exceeded’ (Type 11/0)
– Replies include IP of router that hit zero
[Figure: probes sent with TTL=1, 2, 3, 4, 5 each expire one router further along; that router (X) replies ICMP TTL expired (11/0).]
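The TTL trick can be simulated without any sockets. In this hypothetical sketch the path is a list of made-up router addresses; each probe’s TTL is decremented once per router, and whichever router takes it to zero is the one that answers. Real tools see some other reply from the destination itself (e.g. an Echo reply or Port Unreachable, depending on the probe type).

```python
def simulate_traceroute(path: list[str], destination: str, max_ttl: int = 30):
    """Hypothetical simulation: 'path' lists the routers in order before the destination."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        remaining, reply = ttl, None
        for router in path:
            remaining -= 1                       # each router decrements TTL
            if remaining == 0:
                reply = (router, "ICMP Time Exceeded (11/0)")
                break
        if reply is None:                        # probe survived every router
            reply = (destination, "reached destination")
        hops.append((ttl, *reply))
        if reply[0] == destination:
            break
    return hops

# Made-up router addresses, for illustration only
for hop in simulate_traceroute(["10.0.0.1", "150.203.0.1", "150.203.56.1"], "150.203.56.99"):
    print(hop)
```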
Traceroute
• Really useful to identify path, intermediate devices and distances
• And to probe internal networks…
[Screenshot notes: 3 attempts each hop; RTT increases in jumps (within variations); hop #13 is a bit shy…?]
Big packets
• Bigger packets are more efficient, but can be ‘too big’.
• What’s a ‘big’ packet?
• Something bigger than the payload of your LAN
– Maximum Transmission Unit (MTU)
– Ethernet: 1500 bytes, WiFi: 2300 bytes
– Leads to Fragmentation
[Figure: a 1400-byte packet crosses links with MTU=1500, 2300, 1000 and 800; it is split into 1000 + 400 bytes, and the 1000-byte piece again into 800 + 200.]
IP Fragmentation
[Figure: the IPv4 header (Version, IHL, Diff. Serv., ECN, Total Length, Identification, DF, MF, Fragment Offset, Time to Live, Protocol, Header Checksum, Source Address, Destination Address, Options, Payload), highlighting the fragmentation fields]
• Identification = key to identify a packet uniquely
• MF = More Fragments (flag)
• DF = Don’t Fragment (flag)
• Total Length = of this packet (fragment)
• Fragment Offset = position within the original packet (in units of 8 bytes)
Router Fragmentation Process
• Incoming packet of size > outbound MTU
• Split packet into (large) new packets
• Copy IP Header to each new packet – including the Identification
• Adjust Length field for each packet
– And Checksum, and TTL
• Set Offset to identify location within overall packet
• Set MF flag on all packets, except the last one (which keeps the original packet’s MF value)
• Receiver collects all fragment-packets and reassembles
Fragmentation example
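[Figure: a 1400-byte packet crosses links with MTU=1500, 2300, 1000 and 800 (header overhead ignored).]
• At the MTU=1000 link: Length=1000 Offset=0 MF=1, and Length=400 Offset=1000 MF=0
• At the MTU=800 link, the first fragment is split again: Length=800 Offset=0 MF=1, and Length=200 Offset=800 MF=1
• The Length=400 Offset=1000 MF=0 fragment fits, and passes through unchanged
• The receiver reassembles 800 + 200 + 400 = 1400 bytes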
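The router steps above can be sketched in Python and checked against this example. Like the example, the sketch ignores header overhead and keeps offsets in bytes. In real IPv4 the MTU also has to hold the 20+ byte header, the Fragment Offset field stores the offset divided by 8, and every fragment except the last must carry a multiple of 8 data bytes (which the rounding below respects). Names are our own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    length: int   # payload bytes in this fragment
    offset: int   # byte position within the original payload
    mf: bool      # More Fragments flag

def fragment(pkt: Fragment, mtu: int) -> list[Fragment]:
    """Split one packet/fragment for a link of the given MTU (header overhead ignored)."""
    if pkt.length <= mtu:
        return [pkt]                          # fits: forwarded unchanged
    chunk = mtu - mtu % 8                     # offsets must be multiples of 8 bytes
    if chunk == 0:
        raise ValueError("MTU too small to fragment into")
    pieces, pos = [], 0
    while pos < pkt.length:
        size = min(chunk, pkt.length - pos)
        is_last = pos + size == pkt.length
        # MF=1 on every piece except the last, which keeps the incoming MF value
        pieces.append(Fragment(size, pkt.offset + pos, pkt.mf if is_last else True))
        pos += size
    return pieces

def send_along(link_mtus: list[int], length: int) -> list[Fragment]:
    frags = [Fragment(length, 0, False)]      # original packet: offset 0, MF=0
    for mtu in link_mtus:
        frags = [piece for f in frags for piece in fragment(f, mtu)]
    return frags

for f in send_along([1500, 2300, 1000, 800], 1400):
    print(f)
```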
It works, but…
• Has been used since the beginning of IP, and works well
• Creates performance issues
– More work for routers and receivers
– Increased probability of (total) packet loss
• No retransmission of fragments
– Security issues
• Easier to hide malicious traffic
• Harder for Deep Packet Inspection
Better approach
• Test the network and send the largest packet that fits the smallest MTU on the path
• Path MTU Discovery
• Looks like traceroute – but use packet sizes and DF=1
[Figure: probes with DF=1 cross links MTU-1 to MTU-5; a router whose next-hop MTU is too small replies ICMP Destination unreachable (Type 3), Fragmentation required, and DF flag set (Code 4), Data = next-hop MTU.]
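A hypothetical simulation of the probing loop, reusing the link MTUs from the fragmentation example. Each probe is sent with DF=1; the first link that is too small “replies” with its MTU (standing in for the ICMP 3/4 message), and the sender retries at that size until a probe gets through.

```python
def path_mtu_discovery(link_mtus: list[int], first_size: int) -> tuple[int, int]:
    """Hypothetical simulation; returns (path MTU, number of probes sent)."""
    size, probes = first_size, 0
    while True:
        probes += 1
        too_small = next((mtu for mtu in link_mtus if mtu < size), None)
        if too_small is None:
            return size, probes               # probe crossed the whole path
        # That link's router drops the DF=1 packet and reports its next-hop MTU
        size = too_small

print(path_mtu_discovery([1500, 2300, 1000, 800], 1500))  # (800, 3)
```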
IP Multicast
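• Packets sent to all group subscribers (224.0.0.0/4): the sender sends once
• Group membership managed with the Internet Group Management Protocol (IGMP)
[Figure: one Source sends a single stream through routers 1-6; it is copied where paths split and reaches every subscriber A.]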
IPv6!
• When IPv4 just won’t do it anymore
• IPv4 designed in a smaller, more scalable and way more trusting world
• Never considered planet-wide participation
• Never considered major infrastructure role
• Never considered IoT, smart devices, mobility, …
• Never considered bad people
• We now have a problem
– Several
– But especially the need for more than 4 billion devices
IPv4 is over
• A common catchphrase
• However…
• Addresses are largely exhausted
– RIRs ran out 2011-2015
– Lots of wasted address space
– Re-allocating ever-smaller chunks (/30)
• With tighter rules
– Can’t aggregate address blocks for routing
• Forwarding tables are immense
The routing problem
• Every router has a forwarding table
– Fortunately only 10-100 interfaces
• Across the whole internet
– Lookup tables of ~1M entries
– Fuzzy (longest-prefix) matching on 32-bit addresses
– In under 5ns (100Gb/s)
• And 170k updates/day
– 2 per second
The routing problem
Address blocks are getting much harder to aggregate
IPv6
• New effort from ~1994
– Address exhaustion was long predicted
– Note: around the rise of the WWW
• Standardised around 1998, OS support from 2000
• And till recently, limited effort
– 1983.1.1 Internet flag day: Comply or disappear. Can’t do that now!
– Hampered by deployment issues
– Lacking incentives
• Nobody does homework till there’s a deadline
• What do we get?
– Bigger addresses
– And other stuff…
IPv6 packets
Each row is one 32-bit word (field widths in bits):
| Version (4) | Traffic class (8) | Flow label (20) |
| Payload length (16) | Next Header (8) | Hop Limit (8) |
| Source Address (128-bit) |
| Destination Address (128-bit) |
| Payload (…) |
IPv6 addressing
• 128 bits = 2^96 times more addresses than IPv4
– 6×10^23 per square meter on Earth
– A few thousand for every atom on the surface
• ‘Coloned hex-quad (with compression)’
– Instead of ‘dotted quad’
• 8 groups of four hexadecimals (8*16bits)
• For visuals, compress
1. Drop leading zeroes
2. Replace one run of consecutive zero blocks with ‘::’
• Full: 3018:0ae8:0000:0000:0000:ae00:0098:8ac2
• Leading zeroes dropped: 3018:ae8:0:0:0:ae00:98:8ac2
• Zero blocks dropped: 3018:ae8::ae00:98:8ac2
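Python’s `ipaddress` module applies the same compression rules, which makes it a handy way to check hand-written IPv6 addresses:

```python
import ipaddress

addr = ipaddress.IPv6Address("3018:0ae8:0000:0000:0000:ae00:0098:8ac2")
print(addr.exploded)     # full form: 8 groups of 4 hex digits
print(addr.compressed)   # leading zeroes dropped, zero run replaced by '::'

# Step 1 only: drop leading zeroes in each 16-bit group
step1 = ":".join(format(int(group, 16), "x") for group in addr.exploded.split(":"))
print(step1)
```

Both spellings parse to the same 128-bit value, so `IPv6Address("3018:ae8::ae00:98:8ac2") == addr`.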
IPv6 address types and scopes
• Types:
– Unicast – to one
– Multicast – to a group
– Anycast – to the nearest in a group
• Note – no broadcast!
• Scopes: (except multicast)
– Link-local – my subnet
– Site-local – my organisation/site
– Global – everywhere
IPv6 address semantics – 1 example
Unicast-Global (field widths in bits, 128 in total):
| 001 (3) | TLA (13) | Res (8) | NLA (24) | SLA (16) | Interface (64) |
Public topology: 001, TLA, Res, NLA; Site topology: SLA; Host: Interface
• TLA: Top level aggregator – IANA -> global ISP
• NLA: Next level aggregator – global ISP -> site
• SLA: Site level aggregator – site -> subnets
• Res: reserved
Moving to IPv6
• Will probably have both for another decade or more – around 10-20% now
• Transitioning is a large problem…
– Bottom-up, top-down challenges – leaves islands of addressing
• Dual stack (run both)
• Translate – convert IPv6 <-> IPv4
– But how do you handle those addresses?
• Tunneling – IPv6 inside IPv4
– V4 is everywhere
IPv6 killer – NAT – Network Address Translation
• Use of private address spaces inside:
– Homes
– Mobile networks
– Organisations
• All ‘hiding’ behind a single public IP address
[Figure: private hosts 10.0.0.2, 10.0.0.3, 10.0.0.4 and 10.0.0.5 all reach the Internet via one public address, 150.203.56.99.]
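How do replies find the right private host if everyone shares one public address? The common answer is port translation (often called NAPT), which is an assumption here since the slide does not say. This hypothetical Python sketch keeps a two-way table from (private IP, port) to a public port; the class name and the starting port 40000 are made up for illustration.

```python
class Nat:
    """Hypothetical port-translating NAT: many private hosts share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port                  # assumed starting port
        self.out: dict[tuple[str, int], int] = {}    # (private ip, port) -> public port
        self.back: dict[int, tuple[str, int]] = {}   # public port -> (private ip, port)

    def outbound(self, src_ip: str, src_port: int) -> tuple[str, int]:
        key = (src_ip, src_port)
        if key not in self.out:                      # new flow: allocate a public port
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]         # rewritten source address/port

    def inbound(self, dst_port: int):
        # Replies map back to the private host; unsolicited packets find no entry
        return self.back.get(dst_port)

nat = Nat("150.203.56.99")
print(nat.outbound("10.0.0.2", 5000))
print(nat.outbound("10.0.0.3", 5000))
print(nat.inbound(40001))
```

The `inbound` miss for unsolicited traffic is one reason NAT breaks the end-to-end principle: hosts inside cannot easily be reached from outside.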
Still…
[Figure: IPv6 adoption over time, marking World IPv6 day and World IPv6 launch day]
Around the world