CS计算机代考程序代写 data structure database flex distributed system Excel algorithm COMP3310/6331 – #18

COMP3310/6331 – #18
Routing
Dr Markus Buchhorn: markus.buchhorn@anu.edu.au

The biggest application of all?
• Routing
– How do packets get across the Internet? – Without it there is no Internet
• Most complex, longest-running application ever? – Millions of devices
– Running 24/7
– Shifting Pb/s of traffic
– Dealing with multiple topology changes every second
• And crosses from technology to humanity – ‘Optimal’ for what or who
2

Packet Forwarding and Routing
Router 1: Next Hops
Target
Port
Server
A
Printer
B
Source
A
1
B
2 3
4
Forwarding Table
Routing
Server
Separation of control plane and data plane – globally
Target Target
Port
Port
Target
Port
A B
Printer
3
5
6
T
arget Port
Server A Server
A B
Server
X Y
Server Printer B
Printer Printer
Printer

Back into the network layer
• Distinction between forwarding and routing
– Local decisions vs global decisions
– Given a network with multiple paths/interfaces, which one do you send to?
• Focus on unicast routing
– Opposed to broadcast, multicast, anycast routing
unicast broadcast
multicast
anycast
4

Spanning Tree = routing?
• An wide-area view that spans a network
– Removes loops, establishes reliable paths
– But only runs at layer-2 (single-technology), and doesn’t scale
– Wastes paths
• No measure of ‘quality’ of a path
• Doesn’t use redundant paths when beneficial!
R
R
5

From local to global
• Locally find devices via ARP, but that doesn’t scale. • But then what on the WAN?
– LAN to WAN, a single default route? • Can have backup paths
– Routers advertise a prefix (subnet) – aggregation!
– Need to go from ‘enterprise’ networks up to global scale
Internet
6

Global routing is hard
• “Routing table” sizes – growing (1M+)
• Updates – growing (170k/day)
• Computing forwarding tables – growing
• Routers – used to be simple computers
– 100Gbps = one small IP packet every 5
nanoseconds.
– “Performing a lookup into a data structure of around one million entries for an imprecise match of a 32-bit value within 5 nanoseconds represents an extremely challenging silicon design problem.”
7

Carrier-grade routers
8

“Routing” at different timescales
• Routing = sending some traffic over some path at some time – It’s allocating bandwidth across the network
– While adapting to requirements and changing conditions
What you’re doing
How quickly
Because
Forwarding/ “load-sensitive” routing
Seconds
Bursts, Congestion
Routing
Minutes
Changes, failures
Traffic Engineering
Hours
Long-term load
Provisioning
Months
Customer demand
9

Expectations
• Routing has to work “right”, all the time
Expect
Because
Correctness
It has to get packets from A to B
Efficiency
Use available bandwidth well
Fairness
Don’t ignore capable network elements
Convergence
Recover quickly from any disturbances
Scalability
Copes with increasingly large and complex networks
10

Routing context – ideally:
• Decentralised, no controller, no hub – Up to a point…
• All nodes (routers) are alike
– Speak the same language, run the same algorithm, at the same time
• Learn through message exchanges with “neighbours” • Need to deal with router, link and message failures
11

What is the ‘best’ route?
F
• Various measures of ‘best’ – Latency (delay = distance) – Bandwidth (slow)
– Cost (money)
– Hops (forwarding delays)
• For a fixed topology
– Ignores link congestion – Ignores router load
G
E
A
H
B
D C
12

Shortest-path routing
• AKA “lowest-cost” path routing
• Associate some cost with each link
– You choose: $$, ms, hops, bps, …
– In each direction – can be asymmetric
• Add up the total end-to-end
• And try to minimise the total A
– If tied, pick one
4
F 2 1
G
10 4
3
3 4
E
2
H
C
B
D
2
2
3
13

Eyeball analysis
• Shortest (lowest cost) path A to E? • AE = 10
• ABFE = 9
• ABE = 8
• ABCE=7
4
F
2
E
G
10
3
3 4
2
A
4
B
1
D
H
C
2
2
3
14

Optimality property
• Sub-paths of the shortest path are themselves shortest paths
• ABCE = 7 = shortest path • So are
– AB, ABC – BC, BCE – CE
4
3
F
G
10
2
E
1
D
3
42
• If there’s a shorter sub-path, it’d be on the shortest total path
H
C
A
4
B
2
2
3
15

Sink (and source) trees
• Union of all shortest paths towards a node from each source
• Consider E’s sink tree.
– ABCE = shortest for A, B, C – Work out the rest
• Source tree = sink tree
G
10
4
4 F 3
2
E
3
– Usually, but doesn’t have to.
• Asymmetric costs => different trees • What happens if BC=2, CB=5?
2
2
A
B
42
1
D
H
C
3
16

So what?
• Regardless of where you start, routing decisions only considers destination
– Source is irrelevant (*)
• A,B,HgettoC–andthentoE
• Eachnodeonlyneedstoknow A next hop on the optimal path => forwarding table
• Forwardingtable=
– Next hop for every destination
4
3
F
2
E
G
3
4
B
D
10
42
1
2
H
3
C
2
17

Computing shortest paths
• Eyeballs are good, computers are better…
• (Edsger W.) Dijkstra’s algorithm, 1959
– Identifies source tree for a single source, when given the topology and costs – Uses Optimality Property
• Algorithm outline:
– Start at source node, mark all others as ‘tentative’
– Give source “0” node-cost, everyone else is “infinite” cost
– Loop: while (tentative nodes)
• Identify lowest-cost node, confirm it
• Add link to source tree
• Modify (‘relax’) other costs by distances you now know
18

Walkthrough – tree from A (calculated by A)

F2 3∞E
0 10 A
G∞ 3
4
42 4B∞1∞
∞2D
H3∞2 C
19

Walkthrough – tree from A
Neighbours of A get ‘relaxed’ (costs are reduced)
G∞
4

F2
3 10
3
42
E
0 10
4B41∞
A
∞2D
H3∞2 C
20

Walkthrough – tree from A
B is lowest node-cost, confirm AB

F2
3 10
3
42
4
G

E
0 10
4B41∞
A
∞2D
H3∞2 C
21

Walkthrough – tree from A
Add your own cost to neighbours
Now do neighbours of B 7F
7 4 2 G3
8(!)
0 10
3
42
E
4B41∞
A
∞2D
H362 C
22

Walkthrough – tree from A
C is next lowest cost after B, So confirm (B)C
Then do neighbours of C
7F
2
7 4 G
7(!!)
3 3
E
0 10 A
42 4B418
92D
H362 C
23

Walkthrough – tree from A
E,F,G are equal lowest-cost Pick one: confirm (B)G
Do neighbour(s) of G – no help
7F
2
7
E
42 4B418
7 4 G
3 3
0 10 A
92D
H362 C
24

Walkthrough – tree from A
E,F are equal lowest-cost Pick one: confirm F
Do neighbours of F – no help
4
3 4B418
92D
H362 C
7F 3
G
7
E
7
2 42
0 10 A
25

Walkthrough – tree from A
E is the lowest-cost, confirm (C)E Do neighbours of E – no help
And repeat for D – no changes And repeat for H – no changes
G
7
4
3E
7F27 42
3 A4B418
0 10 92D
H362 C
26

Dijkstra
• Works out from the source
– And you need to repeat for every source
• Leverages optimality property
– Use sub-paths to build longer shortest-paths
• Has some scaling issues in complex networks – Imagine 1000+ nodes, 1000’s links
– Imagine changes at any single point
– Lots of research…
• Needs complete topology – At each node/source
27

Bring on Distance Vector routing!
• When you don’t know the topology…
– Calculate Source tree in a distributed fashion – Looks a bit like Spanning Tree
• Nodes only know costs to neighbours • Nodes only talk to neighbours
• Nodes all run the same algorithm
• Nodes/links may fail or lose messages
28

The algorithm
• Distributed Bellman-Ford
• One of two major approaches for routing protocols
– Link-state is the other one
• Early routing approach (Routing Information Protocol (RIP), 1988)
– Simple to use, but slow to converge, and somewhat fragile – still improving
• Each node stores a vector of distances, and next hops, to all destinations – Initially vector has 0 cost to self, infinity to all others
– Send vector to neighbours
– Update for each destination with lowest cost heard, adding cost of link
– Repeat
29

Simplest (non-trivial) example
A
B
C
A
B
1
0
2
C
B
B’s table
A
B
C
A
0
1
5
B
C
A
B
C
A
B
C
5
2
0
A
12 5
C
A’s table
C’s table
30

B, C vectors shared with A B
A
B
C
A
B
1
0
2
C
A
B
C
A
0
1
5
B
1
0
2
C
5
2
0
A
B
C
A
B
C
5
2
0
A
12 5
C
31

A updates its vectors… B
A
B
C
A
B
1
0
2
C
A
B
C
A
0
1
3(B)
B
1
0
2
C
5
2
0
A
B
C
A
B
C
5
2
0
A
12 5
C
32

Other routers do the same…
A
B
C
A
0
1
5
B
1
0
2
C
3
2
0
B
A
B
C
A
0
1
3(B)
B
1
0
2
C
5
2
0
A
B
C
A
0
1
5
B
1
0
2
C
3(B)
2
0
A
12 5
C
33

Everyone updates… and converges
A
B
C
A
0
1
3(B)
B
1
0
2
C
3(B)
2
0
B
A
B
C
A
0
1
3(B)
B
1
0
2
C
3(B)
2
0
A
B
C
A
0
1
3(B)
B
1
0
2
C
3(B)
2
0
A
12 5
C
34

Distance-vector routing
• Adding routes:
– One hop wider awareness for every message exchange
• Deleting routes:
– Deliberately or due to failures
– Drop out of vectors, other nodes delete
• One small problem
– Count to infinity…
– When a particular piece of the network falls off.
35

Count to infinity
• Good news travels quickly, Bad news travels slowly A1B1C1D
• Normally everyone else (C, D) sees a path to A through B
– When A falls off, B sees a path to A through C, which sees a path through B…
• Deal with problem via “poison reverse”, “split horizon” – Don’t advertise route to the node you learnt it from
– These don’t scale well in certain circumstances
36

Link state routing
• The other, more common routing algorithm • More computation, but better behaviours
– Scales well to enterprise networks, though not globally
• Used in
– Open Shortest Path First (OSPF),
– Intermediate System to Intermediate System (IS-IS)
• Same baseline rules for a federated environment – Only talk to neighbours
– Only know their costs
– don’t know the topology (to start with)
– Deal with node/link/message failures
37

Link state routing
• Simple Algorithm
• 3 parts:
– Flood the network
– Learn the topology
– Compute tables with Dijkstra
– And repeat every time there’s a change.
• That’s it!
38

Flooding?
• Broadcast incoming (broadcast) message to all (other) outbound interfaces
– Yes, it’s bad. – Unless it isn’t.
• Just keep track so you don’t repeat yourself, or cause storms – Use incrementing sequence numbers
– May get multiple copies, deal with it
• Ironically, if you miss a message, use ARQ
39

Link State flooding
• Each node floods link-state-packet (LSP)
– to the entire reachable network
4
F
2
E
G
3
3
42
Node E LSP (+ some Seq #)
A
10
B
4
C
1
D
2
F
2
A
4
10
1
B
D
H
C
2
2
3
40

4
3
F
Link-state topology analysis
• Listen for LSPs, learn the topology
• Then run Dijkstra locally
– On each node!
– Wasteful A • replicated communication/computation,
• Lots of CPU grunt needed
– But it’s effective.
• On changes (node/link failures) the neighbours: – Detect a link down, or lack of heartbeat packets – flood updated LSP
– and everybody recomputes
G
2
E
1
D
2
C
3
42
4
10
2
B
H
3
• Various (rare) failure modes (flooding fails, node flaps, seq# errors, races, …) – Manage by ageing LSPs and timeouts
41

Remember our routing expectations?
Expectation
Distance Vector
Link State
Correctness
Distributed Bellman-Ford
Replicated Dijkstra
Efficiency
Reasonable – shortest path
Reasonable – shortest path
Fairness
Reasonable – shortest path
Reasonable – shortest path
Convergence
Slow… many exchanges
Fast – flood and compute
Scalability
Excellent – storage/compute
Ok – storage/compute
42

Equal-cost multipath routing (ECMP)
• Not a protocol/algorithm, but an extension for flexibility
• Allow for multiple paths for packets between source and destination
– Greater redundancy
– Improve performance • Capacity increase
• Load balance
• Need to
– Detect them, and
– Forward traffic along them
43

ECMP detection
• One approach: don’t tiebreak
• Allow shortest path to be a set
– Rather than a single choice
• AtoE
– With modified costs – ABE = 8
– ABCE=8
– ABCDE = 8
• Notatreenowbut
a directed acyclic graph
G
10
4
4 F 3
2
E
A
B
41 2
3
H
C
2
1
D
3
44

ECMP forwarding
• Forwarding tables now have a set of interfaces for each destination
• Allocate each packet randomly?
– Good for load balancing,
– Bad for jitter (packet delay variation)
• Allocate by ‘relationship’
– Use the destination and source IP#
• E.g. E chooses: F-H goes EC, E-H goes EDC • Equal cost, and consistent performance
F
G
4
3
2
E
3
41
• Allocate by ‘flow’
– Using flow identifiers (IPv6)
B
D
45
A
4
10
2
• Less balanced, but more predictable
2
1
H
C
3

Hierarchical routing
• Scaling problems as identified earlier – Routing tables growing
– Routing computing growing
– Forwarding tables growing
• Network aggregation
– Already have LAN prefixes aggregating a whole subnet
• Don’t need to advertise every single host on your LAN
– Can treat a group of subnets as a larger subnet • E.g. adjacent /24s within a /16 (150.203.aaa.bbb) • But not all subnets now are ‘adjacent’
– What about geographical aggregation?
46

Routing to a region
• Aggregate nodes/subnets – Hide internal complexity
– Shorter tables
• Downside:
– Less optimal paths
– Full1Ato5C[1B]=5hops
– Hier.1Ato5C[1C]=6hops
47

Routing to a region
• Outside of a region, routers get told one route to get to that region – All hosts are aggregated into a smaller table,
– Reducing communication and computation
• There can still be more than one route into or out of a region
– It’s still a local router decision how to get in/out for a particular region – A region sets a context, and designates some border routers
– Within a region, we make our own (excellent) arrangements
48

Policy-based routing – and routing policies
• At the heart of the Internet
– Multiple ISPs, interconnecting via Internet Exchange Points (IXP) – All running a business. Or a country.
IXP
ISP F
ISP A
CDN B
IXP IXP
Net C
Net D
CDN E
IXP
49

Policy routing
• Already have dynamic, complex, large routing databases and algorithms
• Now Introduce human needs: Layers 8+ – Money
– Politics – Security – Religion
• i.e. we have POLICIES to add to our protocols
– E.g. National Research and Education Networks have an R&E traffic policy
• Wholesale purchase AND don’t compete with commercial providers AND social good
50

Shortest path is a local priority…
• E.g. each ISP policy: offload as quickly as possible • Technical Term:
• Hot Potato Routing
• Sub-optimal shortest path
• Asymmetric paths!
• Hierarchy is
(consciously) broken,
for good business reasons
ISP A
ISP B
51

Most common policies: Transiting and Peering
• ISPs
– Take your traffic and pass it through their network to the Internet
– They take the Internet traffic and pass it through to you
– And you pay them.
ISP A
(Rest of) Internet
52

Common policies: Transiting and Peering
• ISPs
– Take your traffic and pass it through to the other network
– They take the other networks traffic and pass it through to you
– You cannot reach the Internet through them.
– Mutual benefit
• No money exchanged
• ACDN,oracloud provider, or an NREN, or…
(Rest of) Internet
ISP A
ISP B
53

Border Gateway Protocol (BGP)
• The main Internet routing protocol today
• Key concepts:
• Separation of interior routing protocols and exterior routing protocols – Intradomain vs Interdomain
• Enterprise vs International
• Identifies Border Routers (or Gateways) which run BGP
– Creates an edge between interior and exterior routing
• Aggregates nodes within an ‘Autonomous System’ (AS) – Think a region, a business, an ISP
54

BGP is more DV than LS
• Instead of Distance Vector it is a Path Vector
• Announcements:
– IP Prefix, Next Hop
– And the Path: list of AS’s to transit
• This allows loops to be detected and removed • No distance indications
IXP
ISP F
ISP A
CDN B
IXP IXP
Net C
Net D
Net E
IXP
55

BGP route advertisments
56

Policy implementation
• BGP allows you to configure your route “advertisements”
• Border routers advertise available paths – with policy constraints – Only to those AS’s that may use them
– And filter out those they cannot use
– E.g. offer transit to some, peering to others
– E.g. offer a faster path to some, slower to others ($$$) • Border routers listen for available paths
– And (given a choice) pick the one that suits them – for any reason! • Shortest, cheapest, friendliest, safest, politically/contractually-suitable, … • Human rather than ‘technical’ optimisation
57

BGP example
• Various businesses here:
– AS1 is selling transit to AS2,3,4
– AS2 and AS3 are peering (ditto AS3 and AS4) – AS2 is selling transit to customer A
58

BGP advertisements
• Various advertisements here:
– Customer: [A, (AS2), router2U] is sent by AS2 to AS1
– Transit: [B,(AS1,AS3), router1L]and [C,(AS1,AS4), router1L] is sent by AS1 to AS2
– Peer: [B,(AS3), router3L] is sent by AS3 to AS2, [A,(AS2), router2R] is sent by AS2 to AS3
59

Customer A view
• So AS2 (and hence customer A)
– Hears one option for reaching C: (AS1, AS4)
– Hears two options for reaching B: Transit(AS1,AS3) and Peer(AS3) • And peering traffic is usually free…
60

In closing
• Routing is complicated and hard – this has been a very high-level view! – DV, LS and BGP are very important
• Internet is large and complex
• Policies are an important factor
– the internet is also a business
• Connecting interior and exterior routing/gateway protocols
– Literally an edge case. Haven’t even discussed it
• Performance is challenging
– Scalability, convergence, reliability, trustworthiness, optimisation, … – All in a (globally) distributed system
• Anybody looking for a PhD topic?
61