
6CCS3OME/7CCSMOME – Optimisation Methods
Lecture 1
Single-source shortest-paths problem: Basic concepts, Relaxation technique, Bellman-Ford algorithm
Tomasz Radzik and Kathleen Steinhöfel
Department of Informatics, King’s College London 2020/21, Second term

Single-source shortest-paths problem
• G = (V, E) – directed graph; |V| = n, |E| = m.
• w(v, u) – the weight of edge (v, u).
• s ∈ V – the source vertex.
[Figure: an example weighted directed graph with source s.]
• Compute the shortest paths, that is, the paths with the smallest total weights, from s to all other vertices.
• The point-to-point shortest-path problem is normally solved using a single-source shortest-path algorithm (or an adaptation of such an algorithm).
6CCS3OME/7CCSMOME, Lecture 1 – 2 / 31

Preliminaries
[Figure: the example graph with nodes s, a, c, v, x, r, z, y. δ(s, v) = 14; there are two shortest paths from s to v. δ(s, y) = +∞.]
• A path: p = ⟨v1, v2, …, vk, vk+1⟩, where (vi, vi+1) is an edge for each i = 1, 2, …, k. For example, path Q = ⟨s, x, r, z, v⟩.
• The weight of a path p = ⟨v1, v2, …, vk, vk+1⟩:
w(p) = w(v1, v2) + w(v2, v3) + ··· + w(vk, vk+1). For example, w(Q) = 14.
• The shortest-path weight (distance) from u to v:
δ(u, v) = min{ w(p) : p is a path from u to v }, if there is any such path;
δ(u, v) = +∞, if there is no path from u to v.
• A shortest path from u to v is any path p from u to v with weight w(p) = δ(u, v).
6CCS3OME/7CCSMOME, Lecture 1 – 3 / 31

Preliminaries (cont.)
• Useful simple facts:
1. A subpath of a shortest path is a shortest path.
In the graph on the previous slide: path ⟨x, r, z⟩ is a subpath of a shortest path ⟨s, x, r, z, v⟩, so it must be a shortest path (from x to z).
2. Triangle inequality:
for each edge (u, v) ∈ E, δ(s, v) ≤ δ(s, u) + w(u, v).
[Figure: a shortest path from s to v, compared with a shortest path from s to u followed by the edge (u, v).]
Holds also if u or v is not reachable from s.
• First the general case: the weights of edges may be negative.
6CCS3OME/7CCSMOME, Lecture 1 – 4 / 31

Negative weights in an application
Example from financial analysis: a graph of exchange rates. Find paths maximising exchange rates.
• For an edge (x, y): γ(x, y) = the exchange rate from x to y. E.g., γ(S, D) = 1.6.
[Figure: two copies of a currency/stock graph (Dollar, Sterling, Yen, IBM, INTEL): one labelled with the exchange rates γ, the other with the weights w = −ln(γ).]
• To find shortest paths instead, set the weight of edge (x, y): w(x, y) = −ln(γ(x, y)).
(We use natural logarithms, but any fixed base > 1 would do.)
For example: w(S, D) = −ln(γ(S, D)) = −ln(1.6) ≈ −0.47.
• If γ(x, y) > 1, then w(x, y) < 0. We cannot avoid negative weights here, so we have to solve the shortest-paths problem in a graph with (some) edge weights negative.
• A path from v to u maximises the combined exchange rate from v to u, if and only if, it is a shortest path from v to u according to these edge weights.
6CCS3OME/7CCSMOME, Lecture 1 – 5 / 31
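A quick numerical check of this transformation (a minimal Python sketch, not from the original slides; apart from γ(S, D) = 1.6, the rates below are made-up illustrations):

  import math

  # Hypothetical exchange rates gamma(x, y); only gamma(S, D) = 1.6 is from the slide.
  rates = {("S", "D"): 1.6, ("D", "Y"): 100.0, ("S", "Y"): 155.0}

  # Shortest-paths weights: w(x, y) = -ln(gamma(x, y)).
  w = {e: -math.log(g) for e, g in rates.items()}

  # Compare two S -> Y routes: direct, and via D.
  rate_direct = rates[("S", "Y")]                       # 155.0
  rate_via_d  = rates[("S", "D")] * rates[("D", "Y")]   # 160.0
  w_direct = w[("S", "Y")]
  w_via_d  = w[("S", "D")] + w[("D", "Y")]

  # The route with the larger combined rate has the smaller total weight.
  assert (rate_via_d > rate_direct) == (w_via_d < w_direct)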

The exchange-rates example (cont.)
For a path P = ⟨v1, v2, …, vk⟩:
w(P) = w(v1, v2) + w(v2, v3) + ··· + w(vk−1, vk)
     = −ln(γ(v1, v2)) − ln(γ(v2, v3)) − ··· − ln(γ(vk−1, vk))
     = −[ln(γ(v1, v2)) + ln(γ(v2, v3)) + ··· + ln(γ(vk−1, vk))]
     = −ln(γ(v1, v2) γ(v2, v3) ··· γ(vk−1, vk))   { ln(a) + ln(b) + ln(c) = ln(abc) }
     = −ln(γ(P)).
For two paths P and Q, the following relations are equivalent:
γ(P) > γ(Q)   ⟺   ln(γ(P)) > ln(γ(Q))   ⟺   −ln(γ(P)) < −ln(γ(Q))   ⟺   w(P) < w(Q).
That is, "γ(P) > γ(Q)" if and only if "w(P) < w(Q)".
Thus, a path P from v to u is a maximum exchange-rate path from v to u, if and only if, P is a shortest path from v to u according to the edge weights w.
6CCS3OME/7CCSMOME, Lecture 1 – 6 / 31

Negative-weight cycles (negative cycles)

[Figure: a graph with a negative cycle on a path from s; going around the cycle repeatedly gives walks of smaller and smaller weight.]
• If there is a negative cycle on a path from s to v, then by going increasingly many times around such a cycle, we get paths from s to v of arbitrarily small (negative) weights.
• If there is a negative cycle on a path from s to v, then, by convention, δ(s, v) = −∞.
6CCS3OME/7CCSMOME, Lecture 1 – 7 / 31

We give up, if we detect a negative cycle in the input graph

• We consider only shortest-paths algorithms which:
  • compute shortest paths for graphs with no negative cycles reachable from the source;
  • for graphs with negative cycles reachable from the source, correctly detect that the input graph has a negative cycle (but are not required to compute anything else).
• Why don't we compute shortest simple paths (not containing cycles)? Such paths are always well defined, even if the graph has negative cycles.
  • This would be a valid, and interesting, computational problem (with some applications).
  • The issue: we know efficient algorithms which compute shortest paths in graphs with no negative cycles, but we don't know any efficient algorithm for computing shortest simple paths in graphs with negative cycles.
  • The problem of computing shortest simple paths in graphs containing negative cycles is NP-hard, that is, computationally difficult (by reduction from the Hamiltonian Path problem), and the algorithms which we discuss in Lectures 1–3 do not apply.
6CCS3OME/7CCSMOME, Lecture 1 – 8 / 31

Representation of computed shortest paths: A shortest-paths tree

[Figure: the example graph with a shortest-paths tree rooted at s; PARENT[a] = x, PARENT[x] = s, PARENT[s] = nil, PARENT[y] = nil, …]
• If there are multiple shortest paths from s to v, do we want to find all of them or just one?
  For each node v reachable from s, we want to find just one shortest path from s to v.
• How should we represent the output, that is, the computed shortest paths?
• If there is no negative cycle reachable from s, then shortest paths from s to all nodes reachable from s are well defined and can be represented by a shortest-paths tree: a tree rooted at s such that for any node v reachable from s, the tree path from s to v is a shortest path from s to v.
6CCS3OME/7CCSMOME, Lecture 1 – 9 / 31

Representation of computed shortest paths: A shortest-paths tree (cont.)

• A shortest-paths tree from s contains exactly one shortest path from s to each v reachable from s. There may be other shortest paths from s to v, which are not included in this tree.
• A shortest-paths tree T can be represented by an array PARENT[v], v ∈ V, in Θ(n) space (memory), where n is the number of nodes in the graph.
• PARENT[v] is the predecessor of node v in tree T.
• An explicit representation of the shortest paths from s (a list of shortest paths, one path per node reachable from s, and each path represented as a full sequence of all its edges) would take Θ(n²) space in the worst case.
6CCS3OME/7CCSMOME, Lecture 1 – 10 / 31

Output of a shortest-paths algorithm

• A shortest-paths algorithm computes the shortest-path weights and a shortest-paths tree, or detects a negative cycle reachable from s.
• Example. Input graph and the computed shortest-paths tree:
[Figure: the example graph with the computed shortest-paths tree marked.]
The output of a shortest-paths algorithm: an array PARENT[.] representing the computed shortest-paths tree and an array d[.] with the shortest-path weights:
  node          a  b    c  h  r  s    v   x  y    z
  PARENT[node]  x  nil  a  r  x  nil  z   s  nil  r
  d[node]       6  ∞    8  6  4  0    14  1  ∞    7
• Terminology and notation in [CLRS]: (parent, PARENT[v], d[v]) −→ (predecessor, v.π, v.d).
6CCS3OME/7CCSMOME, Lecture 1 – 11 / 31

Relaxation technique (for computing shortest paths)

• For each node v, maintain:
  • d[v]: the shortest-path estimate for v – an upper bound on the weight of a shortest path from s to v;
  • PARENT[v]: the current predecessor of node v.
• The relaxation technique: INITIALIZATION followed by a sequence of RELAX operations.
Initially:
  node    a    b    c    h    r    s    v    x    y    z
  PARENT  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil
  d       ∞    ∞    ∞    ∞    ∞    0    ∞    ∞    ∞    ∞

INITIALIZATION(G, s)
  d[s] ← 0; PARENT[s] ← NIL
  for each node v ∈ V − {s} do
    d[v] ← ∞; PARENT[v] ← NIL

RELAX(u, v, w)   { relax edge (u, v) }
  if d[v] > d[u] + w(u, v) then
    d[v] ← d[u] + w(u, v); PARENT[v] ← u

• All algorithms discussed in this module are based on the relaxation technique (as are most shortest-paths algorithms). They differ in the order of the RELAX operations.
• When should the computation stop?
6CCS3OME/7CCSMOME, Lecture 1 – 12 / 31
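In Python, INITIALIZATION and RELAX can be transcribed directly (a sketch; d and parent are dictionaries standing for the arrays d[.] and PARENT[.], and the later sketches in these notes inline the same logic):

  INF = float("inf")

  def initialization(nodes, s):
      """d[s] = 0 and d[v] = infinity otherwise; all PARENT pointers set to nil (None)."""
      d = {v: INF for v in nodes}
      parent = {v: None for v in nodes}
      d[s] = 0
      return d, parent

  def relax(u, v, w_uv, d, parent):
      """Relax edge (u, v) of weight w_uv; return True if the operation was effective."""
      if d[v] > d[u] + w_uv:
          d[v] = d[u] + w_uv
          parent[v] = u
          return True
      return False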

Relax operation

[Figure: the shortest-path estimates of a small network before and after RELAX(u, v); the operation decreases d[v] to d[u] + w(u, v) when d[v] > d[u] + w(u, v), and sets PARENT[v] = u.]
6CCS3OME/7CCSMOME, Lecture 1 – 13 / 31
Example of Relaxation Technique – LGT
[Figure: the example graph (nodes s, u, v, x, y) showing the d[.] values after successive relax operations.]
Relax the edges in the order: (s, u), (y, v), (u, x), (x, v), (v, y); then (u, v); then (x, y); … and so on.
6CCS3OME/7CCSMOME, Lecture 1 – 14 / 31

Properties of the relaxation technique
• (1) Non-increasing shortest-path estimates. Throughout the computation, for each node v, the shortest-path estimate d[v] can only decrease (it never increases). (from the definition of the operation)
• (2) For each node v, the shortest-path estimate d[v] is always either equal to ∞ (at the beginning) or equal to the weight of some path from s to v. (by induction)
[Figure: graph G with source s; nodes with finite d[.] values near s, nodes with infinite d[.] values further away.]
• (3) Upper-bound property. For each node v, we always have d[v] ≥ δ(s, v). (directly from (2))
• (4) No-path property. If there is no path from s to v, then we always have d[v] = δ(s, v) = ∞. (directly from (2))
6CCS3OME/7CCSMOME, Lecture 1 15 / 31

Properties of the relaxation technique (cont. 1)
• (5) Convergence property. If ⟨s, …, u, v⟩ is a shortest path from s to v and d[u] = δ(s, u) at any time prior to relaxing edge (u, v), then d[v] = δ(s, v) at all times afterwards.
• (6) Path-relaxation property. If p = ⟨v0, v1, …, vk⟩ is a shortest path from s = v0 to vk, and we relax the edges of p in the order (v0, v1), (v1, v2), …, (vk−1, vk), then d[vk] = δ(s, vk).
This property holds regardless of any other relaxation steps that occur, even if they are intermixed with relaxations of the edges of path p.
[Figure: a path v0, v1, …, vi, vi+1, …, vk from s = v0.]
6CCS3OME/7CCSMOME, Lecture 1 – 16 / 31

Relaxation technique: properties of the parent subgraph
• (7) For each edge (u, v) in the current parent subgraph (that is, u = PARENT[v]), d[u] + w(u, v) ≤ d[v]. (by induction – LGT)
• (8) If the graph contains no negative-weight cycle reachable from s, then
  • the parent subgraph is always a tree T rooted at s,
  • for each node v in tree T, d[v] ≥ the weight of the path in T from s to v.
  (prove using (7) – LGT)
[Figure: graph G; the nodes with finite d[.] form the current parent tree. The part of the parent tree "near" the source s is already part of the final shortest-paths tree; the rest may still change during subsequent relax operations.]
6CCS3OME/7CCSMOME, Lecture 1 – 17 / 31

Relaxation technique: properties of the parent subgraph (cont.)
• (9) When d[v] = δ(s, v) for all v ∈ V, then
  • the parent subgraph is a shortest-paths tree rooted at s, (from (8))
  • no further updates are possible. (from (3) – the upper-bound property)
[Figure: graph G; the parent tree rooted at s spans all nodes reachable from s; nodes not reachable from s have infinite d[.].]
6CCS3OME/7CCSMOME, Lecture 1 – 18 / 31

Effective relax operations
• Definition: an effective relax operation is a relax operation which decreases d[v].
• (10) The computation can progress if the shortest-path weights have not been reached yet:
  • (a) there exists a vertex x ∈ V such that d[x] > δ(s, x), if and only if,
  • (b) there exists an edge (u, v) ∈ E such that d[v] > d[u] + w(u, v) (that is, another effective relax operation is possible).
(b) ⇒ (a): from (3) and the Triangle Inequality.
(a) ⇒ (b): Assume d[x] > δ(s, x) for some node x (so δ(s, x) < +∞).
• Case δ(s, x) > −∞ (no negative cycle on a path from s to x). Consider a shortest s-to-x path P:
[Figure: path P from s to x, with d[s] = δ(s, s) at one end and d[x] > δ(s, x) at the other.]
There must be an edge (u, v) on P such that d[u] = δ(s, u) but d[v] > δ(s, v).
For such an edge: d[v] > δ(s, v) = δ(s, u) + w(u, v) = d[u] + w(u, v).
• Case δ(s, x) = −∞ for some vertex x: exercise (consider a path ⟨s, u1, …, uk, v1, …, vr, v1⟩, where ⟨v1, …, vr, v1⟩ is a negative cycle).
6CCS3OME/7CCSMOME, Lecture 1 – 19 / 31

The relaxation technique: summary for the case of no negative cycles
• If no negative cycle is reachable from s:
[Figure: graph G; the parent tree rooted at s; nodes not reachable from s keep infinite d[.].]
• Only finitely many effective relax operations occur during the computation.
(If there are no negative cycles, Property (2) becomes: d[v] is ∞ or the weight of a simple s-to-v path. Hence d[v] takes on only finitely many different values.)
• The PARENT subgraph is always a tree rooted at s. (property (8))
• When eventually no effective relax operation is possible, then for each node v, d[v] = δ(s, v) (property (10)), and the PARENT pointers form a shortest-paths tree (property (9)).
6CCS3OME/7CCSMOME, Lecture 1 20 / 31

The relaxation technique: summary for the case with negative cycles
[Figure: graph G with a negative cycle reachable from s.]
• If there is a negative cycle reachable from s:
  • There is always an edge (u, v) such that d[v] > d[u] + w(u, v), that is, an effective RELAX operation is always possible. (from property (10))
  • The PARENT subgraph eventually contains a cycle. (not easy to prove)
• Thus we can detect the existence of a negative cycle by periodically checking if the PARENT pointers form a cycle.
This way we can avoid an infinite loop, so we can turn any relaxation-technique computation into an algorithm.
How should we check for cycles in the parent subgraph, and how often?
6CCS3OME/7CCSMOME, Lecture 1 – 21 / 31

The Bellman-Ford algorithm (for single-source shortest-paths problem)
• Edge weights may be negative.
BELLMAN-FORD(G, w, s)   { G = (V, E) }
  INITIALIZATION(G, s)
  1: for i ← 1 to n − 1 do   { n = |V| }
       for each edge (u, v) ∈ E do   { edges considered in arbitrary order }
         RELAX(u, v, w)
  2: for each edge (u, v) ∈ E do
       if d[v] > d[u] + w(u, v) then
         return FALSE   { a negative cycle is reachable from s }
     return TRUE   { no negative cycles; for each v ∈ V, d[v] = δ(s, v) }

Representation of input: nodes indexed by 1, 2, …, n; one node index as the source; a list of edges.

• This algorithm is based on the relaxation technique: INITIALIZATION followed by a (finite) sequence of RELAX operations.
• The running time is Θ(mn) (exactly of that order), where n is the number of nodes and m is the number of edges.
• The worst-case running time of any algorithm for the single-source shortest-paths problem with negative weights is Ω(mn) (at least of that order).
6CCS3OME/7CCSMOME, Lecture 1 – 22 / 31
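A direct Python transcription of BELLMAN-FORD (a sketch; the example data reproduces the LGT example on the next slide, with edge weights consistent with the worked d[.] tables there):

  INF = float("inf")

  def bellman_ford(nodes, edges, s):
      """edges: list of (u, v, w) triples. Returns (True, d, parent) if no negative
      cycle is reachable from s, and (False, d, parent) otherwise."""
      d = {v: INF for v in nodes}
      parent = {v: None for v in nodes}
      d[s] = 0
      for _ in range(len(nodes) - 1):        # loop 1: n - 1 passes over all edges
          for u, v, w in edges:              # RELAX(u, v, w), arbitrary edge order
              if d[u] + w < d[v]:
                  d[v] = d[u] + w
                  parent[v] = u
      for u, v, w in edges:                  # loop 2: negative-cycle test
          if d[u] + w < d[v]:
              return False, d, parent
      return True, d, parent

  # The LGT example in the slide order of edges:
  nodes = ["s", "u", "v", "x", "y"]
  edges = [("s","u",10), ("s","x",5), ("y","s",7), ("v","y",4), ("u","v",1),
           ("x","v",9), ("y","v",6), ("x","u",3), ("x","y",2), ("u","x",2)]
  print(bellman_ford(nodes, edges, "s"))     # d = {s:0, u:8, v:9, x:5, y:7}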

Example of the computation by the Bellman-Ford algorithm – LGT
[Figure: the example graph with nodes s, u, v, x, y.]
Assume that the edges are given in this order:
(s, u), (s, x), (y, s), (v, y), (u, v), (x, v), (y, v), (x, u), (x, y), (u, x).
Initially (relaxation-technique initialization):
  node       s    u    v    x    y
  PARENT[.]  nil  nil  nil  nil  nil
  d[.]       0    ∞    ∞    ∞    ∞
6CCS3OME/7CCSMOME, Lecture 1 – 23 / 31

Example of the computation by the Bellman-Ford algorithm (cont.)
At the end of the 1st iteration of the main loop 1:
  node       s    u  v   x  y
  PARENT[.]  nil  x  u   s  x
  d[.]       0    8  11  5  7
At the end of the 2nd iteration of the main loop 1:
  node       s    u  v  x  y
  PARENT[.]  nil  x  u  s  x
  d[.]       0    8  9  5  7
The shortest paths and the shortest-path weights are now computed, but the algorithm will still execute the remaining two iterations of the main loop.
6CCS3OME/7CCSMOME, Lecture 1 – 24 / 31

Correctness of the Bellman-Ford algorithm
• If there is a negative cycle reachable from s, then Bellman-Ford algorithm returns FALSE (because an effective relax operation will always be possible).
That is, in this case the algorithm is correct.
• If no negative cycle is reachable from s, then the following claim holds.
Claim: At the termination of loop 1, d[v] = δ(s, v) for each vertex v ∈ V.
• This claim follows from the lemma given on the next slide.
• This claim implies that at the termination of loop 1, no effective relax operation is possible so the algorithm returns TRUE.
At the end of the computation:
(a) array d contains the shortest path distances from s to all other nodes;
(b) array PARENT contains a shortest-paths tree with node s as the source (from (a) and property (9) of the relaxation technique).
• That is, in the case of "no negative cycles", the algorithm is also correct.
6CCS3OME/7CCSMOME, Lecture 1 – 25 / 31

Correctness of the Bellman-Ford algorithm (cont.)
• Lemma: If the length (the number of edges) of a shortest (simple) path from s to a node v is k, then at the end of iteration k (of the main loop 1) in the Bellman-Ford algorithm, d[v] = δ(s, v).
• This lemma follows from the path-relaxation property of the relaxation technique (property (6)).
• A shortest path from s to v of length k (k edges on the path, k ≤ n − 1):
[Figure: a path s = x0, x1, x2, x3, …, xk = v.]
d[x1] = δ(s, x1) by the end of the 1st iteration, d[x2] = δ(s, x2) by the end of the 2nd iteration, d[x3] = δ(s, x3) by the end of the 3rd iteration, …, d[v] = δ(s, v) by the end of the k-th iteration.
• This lemma implies the claim from the previous slide, because each simple shortest path has at most n − 1 edges.
6CCS3OME/7CCSMOME, Lecture 1 – 26 / 31

Speeding-up the Bellman-Ford algorithm
• Try to decrease the number of iterations of loop 1:
• If no effective relax operation in the current iteration, then terminate: the
shortest-path weights are already computed.
• Check periodically (at the end of each iteration of loop 1?) if the PARENT pointers form a cycle. If they do, then terminate and return FALSE: there is a negative cycle reachable from s.
• Try to decrease the running time of one iteration of loop 1: consider only edges which may give effective relax operations.
• We say a vertex u is active if the edges outgoing from u have not been relaxed since the last time d[u] was decreased.
• Perform RELAX only on edges outgoing from active vertices.
• Store the active vertices in a Queue data structure.
(What is the Queue data structure?)
[Figure: an active vertex u, its parent edge from PARENT[u], and its outgoing edges to v1, v2, z.]
6CCS3OME/7CCSMOME, Lecture 1 – 27 / 31

The Bellman-Ford algorithm with FIFO Queue
BF-WITH-FIFO(G, w, s)   { G = (V, E) }
  INITIALIZATION(G, s)
  Q ← ∅; ENQUEUE(Q, s)   { Q – FIFO queue of active vertices }
  while Q ≠ ∅ do
    u ← head(Q); DEQUEUE(Q)
    for each v ∈ Adj[u] do   { for each node v adjacent to u }
      RELAX(u, v, w)
  return TRUE   { for each v ∈ V, d[v] = δ(s, v) }

RELAX(u, v, w):
  if d[v] > d[u] + w(u, v) then
    d[v] ← d[u] + w(u, v); PARENT[v] ← u
    if v is not in Q then ENQUEUE(Q, v)

Representation of input: the array Adj of adjacency lists; Adj[u] – the list of u's neighbours.
• A mechanism for detecting negative cycles must be added (Q is never empty if there is a negative cycle): periodically check the PARENT pointers for a cycle.
• The running time: O(mn), if properly implemented; Θ(mn) in the worst case.
6CCS3OME/7CCSMOME, Lecture 1 – 28 / 31
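A sketch of BF-WITH-FIFO using Python's collections.deque as the FIFO queue (an in_queue set implements the "v is not in Q" test; as noted above, a negative-cycle check must be added before running this on arbitrary graphs, since the loop would not terminate otherwise):

  from collections import deque

  INF = float("inf")

  def bf_with_fifo(adj, nodes, s):
      """adj[u]: list of (v, w) pairs. Assumes no negative cycle is reachable from s."""
      d = {v: INF for v in nodes}
      parent = {v: None for v in nodes}
      d[s] = 0
      queue = deque([s])            # Q - FIFO queue of active vertices
      in_queue = {s}
      while queue:
          u = queue.popleft()       # u <- head(Q); DEQUEUE(Q)
          in_queue.discard(u)
          for v, w in adj[u]:       # relax all edges out of the active vertex u
              if d[v] > d[u] + w:
                  d[v] = d[u] + w
                  parent[v] = u
                  if v not in in_queue:     # enqueue v only if it is not in Q
                      queue.append(v)
                      in_queue.add(v)
      return d, parent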

Example of the computation of Bellman-Ford with FIFO Queue – LGT
[Figure: the example graph with nodes s, u, v, x, y.]
Assume that the edges outgoing from the nodes are given in this order:
from s: (s, u), (s, x); from u: (u, v), (u, x); from v: (v, y);
from x: (x, u), (x, v), (x, y); from y: (y, s), (y, v).
6CCS3OME/7CCSMOME, Lecture 1 – 29 / 31

Examples and Exercises – LGT
1. Example of the relaxation technique – slide 14.
2. Show the properties of the relaxation technique given on slide 17:
• Property (7): For each edge (u, v) in the current parent subgraph (that is, u = PARENT[v]), d[u] + w(u, v) ≤ d[v].
• Property (8):
If the graph contains no negative-weight cycle reachable from s, then
• the parent subgraph is always a tree T rooted at s,
• for each node v in T, d[v] ≥ the weight of the path in T from s to v.
3. Example of the computation by the Bellman-Ford algorithm – slide 23.
4. Example of the computation of Bellman-Ford with FIFO Queue – slide 29.
6CCS3OME/7CCSMOME, Lecture 1 30 / 31

Exercises – LGT (cont.)
5. Design an efficient algorithm for checking whether the array PARENT[.] represents a tree (or whether it contains a cycle). Check how your algorithm works on these two arrays:

  NODE:    a  b  c  g  h  s    x  z
  PARENT:  x  s  s  c  x  NIL  b  c

  NODE:    a  b  c  g  h  s    x  z
  PARENT:  x  h  s  c  x  NIL  b  c
6CCS3OME/7CCSMOME, Lecture 1
31 / 31

6CCS3OME/7CCSMOME – Optimisation Methods
Lecture 2
Single-source shortest-paths: Dijkstra’s algorithm, shortest-paths algorithm for DAGs
Tomasz Radzik and Kathleen Steinhöfel
Department of Informatics, King’s College London 2020/21, Second term

Topics
• Single-source shortest-paths: restricted cases.
  • Only non-negative edge weights allowed: Dijkstra's shortest-paths algorithm.
  • The input graph is acyclic (a DAG – a directed acyclic graph): a single-source shortest-paths algorithm for DAGs.
• In both cases, the Bellman-Ford shortest-paths algorithm can be used, but the new algorithms are faster than the Bellman-Ford algorithm.
6CCS3OME/7CCSMOME, Lecture 2
2 / 27

Dijkstra's shortest-paths algorithm

• Crucial assumption: all edge weights are non-negative.
• For convenience, assume that all nodes are reachable from s.

DIJKSTRA(G, w, s)   { G = (V, E) }
  INITIALIZATION(G, s)   { "relaxation technique" initialization }
  S ← ∅   { nodes v for which we know that d[v] = δ(s, v) }
  Q ← V   { the other nodes, kept in a Priority Queue with keys d[.] }
  while Q ≠ ∅ do
    u ← EXTRACT-MIN(Q)   { u has the minimum d[.] value in Q }
    for each node v ∈ Adj[u] do
      RELAX(u, v, w)
    S ← S ∪ {u}
  end while

Initially:
  node    a    b    c    h    r    s    v    x    y    z
  PARENT  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil
  d       ∞    ∞    ∞    ∞    ∞    0    ∞    ∞    ∞    ∞

• Q is implemented as a Priority Queue data structure.
A Priority Queue maintains pairs (value, key), and the main operation is EXTRACT-MIN.
In Dijkstra's algorithm: pairs (node, d[node]).
[Figure: the sets S and Q during the computation; nodes in S are finished, nodes in Q carry keys d[.].]
6CCS3OME/7CCSMOME, Lecture 2 – 3 / 27

Example
From the [CLRS] textbook:
[Figure: the example graph with nodes s, u, v, x, y and non-negative edge weights.]
6CCS3OME/7CCSMOME, Lecture 2 – 4 / 27

Example: initialization (continued at LGT)

From the [CLRS] textbook:
[Figure: the example graph with d[s] = 0 and d[.] = ∞ for all other nodes.]
Q = { (s, 0), (u, ∞), (v, ∞), (x, ∞), (y, ∞) }
6CCS3OME/7CCSMOME, Lecture 2 – 5 / 27

8
8
The correctness of Dijkstra’s algorithm
An intermediate state of the computation (at the end of one iteration).
• Set S grows by one node per one iteration
set S
End 1-st iter.: S = {s} End last iter.: S = V
s
• All edges from nodes in S, and only these edges, have been relax’ed.
• AllnodesinS orattheendofedgesfromS:
• have finite d[.] values (by induction)
• have parents, forming a tree (no negative cycles and Prop. (8)). • For all other nodes, d[.] = ∞, and not in the parent tree.
6CCS3OME/7CCSMOME, Lecture 2
6 / 27
d[.] finite

The correctness of Dijkstra’s algorithm (2)
At the end of the computation: set S = V.
[Figure: S = V; all d[.] values are finite.]
• Set S = V. For each v ∈ V, d[v] is finite and d[v] ≥ δ(s, v). The parent subgraph is a tree (rooted at s) which includes all nodes.
• We haven't shown yet that the computed tree is a shortest-paths tree and that the d[.] values are the shortest-path weights.
6CCS3OME/7CCSMOME, Lecture 2 – 7 / 27

The correctness of Dijkstra's algorithm (2): the crucial property

The crucial property (the invariant of the computation of Dijkstra's algorithm):
At the end of each iteration (of the main loop) of Dijkstra's algorithm, for each node v ∈ S, d[v] = δ(s, v).
• This property can be shown by induction on the number of iterations.
• Basis of the induction: the property holds at the end of the first iteration: S = {s}, d[s] = 0 = δ(s, s).
[Figure: after the first iteration, S = {s} and the edges out of s have been relaxed.]
6CCS3OME/7CCSMOME, Lecture 2 – 8 / 27

The invariant of Dijkstra's algorithm: the induction step

Induction step:
• Assume the invariant holds at the end of some iteration (not the last one):
  for each v ∈ S, d[v] = δ(s, v).
• Let u denote the node selected in the next (current) iteration.
• Show that the invariant holds at the end of the current iteration (after u is added to set S).
• That is, show that d[u] = δ(s, u).
[Figure: the set S before and after the current iteration; u is the node extracted from Q.]
6CCS3OME/7CCSMOME, Lecture 2 – 9 / 27

The invariant of Dijkstra’s algorithm: the induction step (cont.)
• We have to show that at the beginning of the current iteration, d[u] is the shortest-path weight from s to u. That is, we have to show that d[u] = δ(s, u).
• We have d[u] ≥ δ(s, u) (relaxation technique). We show d[u] ≤ δ(s, u).
• Take any (simple) path R from s to u and show that its weight w(R) ≥ d[u].
• Let z be the first node on path R which is outside S, and let y be the predecessor of z on R (so y ∈ S).
• Let R1 be the initial part of path R ending at node y, and let R2 be the part of path R from node z to the final node u.
[Figure: the set S with the tree path to x = PARENT[u]; path R = R1 · (y, z) · R2 leaves S at the edge (y, z).]
• w(R) = w(R1) + w(y, z) + w(R2) ≥ d[y] + w(y, z) + w(R2) ≥ d[z] + w(R2) ≥ d[u] + w(R2) ≥ d[u].
• The inequalities above follow from: the inductive assumption, RELAX(y, z) having been done, the rule for selecting u, and the non-negative weights of edges, respectively.
• Thus no path from s to u has weight smaller than d[u], so d[u] ≤ δ(s, u).
6CCS3OME/7CCSMOME, Lecture 2 – 10 / 27

The running time of Dijkstra’s algorithm
• n – number of nodes; m – number of edges.
• Initialisation (as in the relaxation technique): Θ(n).
• For every edge (u, v), RELAX(u, v) is performed exactly once.
This happens during the iteration when node u is removed from Q. (Each node is removed from Q exactly once.)
• The running time depends on the implementation of the priority queue Q.
• Note that n − 1 ≤ m ≤ n(n − 1), so m = Ω(n) and m = O(n²).
• The naive implementation of Q uses an unordered list of elements:
[Figure: Q stored as an unordered list of nodes, alongside the array d.]
  • one EXTRACT-MIN(Q): O(n) time; all of them: O(n²);
  • one RELAX operation: O(1); all of them: O(m);
  • the total running time: Θ(n) + O(n²) + O(m) = O(n²).
• What would be the running time of Dijkstra's algorithm if Q was maintained as an ordered list? (LGT)
6CCS3OME/7CCSMOME, Lecture 2 – 11 / 27

Priority Queue implemented as a Heap: operations and performance

• To improve the running time of Dijkstra's algorithm, use the Heap implementation of the Priority Queue to maintain Q.
• The main Priority Queue operations on a Heap (n denotes the size of the heap, that is, the number of elements in the heap):
  • INSERT(Q, v, k): insert the value-key pair (v, k). O(log n) time.
  • EXTRACT-MIN(Q): remove and return the pair with the smallest key. O(log n) time.
  • Check if the Heap is not empty. O(1) (constant) time.
  • Initialise the Heap with n given elements. Θ(n) time.
• In Dijkstra's algorithm we also need the heap operation HEAP-DECREASE-KEY(Q, v, k) – decrease the key associated with v to k. O(log n) time.
6CCS3OME/7CCSMOME, Lecture 2 – 12 / 27

Dijkstra’s algorithm with Heap
• The set Q in Dijkstra’s algorithm maintained as Heap:
• Initialisation of the Heap: Θ(n) time.
• One EXTRACT-MIN(Q): O(log n) time; all: O(n log n) time.
• one RELAX(u, v): O(log n) time, since it may involve one operation HEAP-DECREASE-KEY; all: O(m log n) time.
• The total running time of Dijkstra’s algorithm with Heap is: Θ(n)+Θ(n)+O(nlogn)+O(mlogn)=O(mlogn).
• If the input graph is dense, that is (here), if m = Ω(n2/ log n), then the implementation of Dijkstra’s algorithm without Heap (maintaining Q as an unordered list) gives the better (worst-case) running time – O(n2).
• For the other cases (when m = O(n2/ log n)), the implementation of Dijkstra’s algorithm with Heap gives the better (worst-case) running time.
• In most applications of Dijkstra’s algorithm, graphs are not dense, so Dijkstra’s algorithm is commonly assumed to use Heap.
6CCS3OME/7CCSMOME, Lecture 2
13 / 27
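Python's heapq module has no HEAP-DECREASE-KEY, so a common alternative to the pos_in_Q bookkeeping described on the next slides is "lazy deletion": push a new (key, node) pair on every effective relax and skip stale pairs when they are popped. A sketch under that assumption (adj[u] lists (v, w) pairs, all w ≥ 0); the bound stays O(m log n) because at most one pair is pushed per relax:

  import heapq

  INF = float("inf")

  def dijkstra(adj, nodes, s):
      """Returns (d, parent) for non-negative edge weights."""
      d = {v: INF for v in nodes}
      parent = {v: None for v in nodes}
      d[s] = 0
      heap = [(0, s)]                  # priority queue of (d[v], v) pairs
      done = set()                     # the set S of finished nodes
      while heap:
          du, u = heapq.heappop(heap)  # EXTRACT-MIN(Q)
          if u in done:                # stale entry: u was already finished
              continue
          done.add(u)                  # S <- S + {u}
          for v, w in adj[u]:          # RELAX(u, v, w)
              if d[v] > du + w:
                  d[v] = du + w
                  parent[v] = u
                  heapq.heappush(heap, (d[v], v))   # instead of DECREASE-KEY
      return d, parent

  # The [CLRS] example graph used in the lecture:
  adj = {"s": [("u",10), ("x",5)], "u": [("v",1), ("x",2)], "v": [("y",4)],
         "x": [("u",3), ("v",9), ("y",2)], "y": [("s",7), ("v",6)]}
  print(dijkstra(adj, list(adj), "s")[0])   # {'s': 0, 'u': 8, 'v': 9, 'x': 5, 'y': 7}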

Heap implementation of Priority Queue data structure
• Heap: An array A[1..n] with a sequence of numbers (keys) and associated data (values), satisfying some specific partial order of the entries.
This specific order is called the heap property (Min-Heap here):
A[i] ≤ A[2i] and A[i] ≤ A[2i+1], for i = 1, 2, …
(An arrow in the figure points from a smaller key to a larger key.)
[Figure: the array A[1..7] drawn as a binary tree: A[1] holds the smallest key, and the children of A[i] are A[2i] and A[2i+1].]
• The operations of extracting the minimum and inserting a new element take O(log n) time, where n is the current size of the Heap.
6CCS3OME/7CCSMOME, Lecture 2 – 14 / 27
A[2i]
A[2i+1]
A[i]

Heap in Dijkstra’s algorithm
The entries in the Heap array are the pairs (u, d[u]), where u is a node in the graph and d[u] is its current shortest path estimate. The number d[u] is the key of this entry.
[Figure: the heap array Q_A stores the pairs (node, d[node]); an auxiliary array pos_in_Q records the position of each node's entry in the heap.]
• We need a heap operation HEAP-DECREASE-KEY, which restores the heap property when the key of one entry is decreased.

RELAX(u, v, w)   { in Dijkstra's algorithm, if Q is a Heap }
  if d[v] > d[u] + w(u, v) then
    d[v] ← d[u] + w(u, v); PARENT[v] ← u
    HEAP-DECREASE-KEY(Q, v, d[v])

Operation HEAP-DECREASE-KEY takes O(log n) time.
6CCS3OME/7CCSMOME, Lecture 2 – 15 / 27

Heap Extract-Min operation
[Figure: EXTRACT-MIN removes the root (the minimum), moves the last heap element to the root, and sifts it down; the sift-down path has length at most log₂(n).]
6CCS3OME/7CCSMOME, Lecture 2 – 16 / 27

Heap-Decrease-Key operation

[Figure: HEAP-DECREASE-KEY sifts the updated entry for v up towards the root, updating pos_in_Q along the way; at most log₂(n) steps.]
6CCS3OME/7CCSMOME, Lecture 2 – 17 / 27

Single-source shortest paths in DAG’s
Input:
• G = (V, E) – a directed acyclic graph (DAG);
• w – weights of edges (may be negative);
• s ∈ V – the source node.
[Figure: an example DAG with nodes s, a, b, c, d, e; some edge weights are negative.]
Output: the shortest-path weights from s to all other nodes, and a shortest-paths tree from s.
There may be edges with negative weights, but there is no problem with negative cycles, because there are no cycles at all.
Bellman-Ford: O(mn). We show an algorithm which runs in O(n + m) time. (Linear, and therefore optimal, time.)
6CCS3OME/7CCSMOME, Lecture 2 – 18 / 27

Application
• Determine the critical (longest) paths in PERT chart analysis (Program Evaluation and Review Technique).
• Nodes: milestones of the project (‘synchronisation points’) . Edges: tasks of the project. The weight of an edge: the duration of this task.
Find the longest path from node "begin" to node "end":
this gives the fastest possible time for completing the whole project (that is, for completing all tasks of the project).
[Figure: a PERT chart from "begin" to "end" with milestone nodes a, b, c, d, e, f, g, h, p, q and task durations on the edges.]
6CCS3OME/7CCSMOME, Lecture 2 – 19 / 27

DAGs: Longest paths to shortest paths by negating all edge weights

• To compute longest paths from the start node, negate all edge weights and compute shortest paths from the start node.
[Figure: the same PERT chart with all edge weights negated.]
6CCS3OME/7CCSMOME, Lecture 2 – 20 / 27

Algorithm

• Consider the nodes in a topological order: all edges go in the same direction.
[Figure: the PERT chart nodes arranged in one topological order: begin, a, e, g, b, c, h, d, f, p, q, end.]

DAGSHORTESTPATHS(G, w, s)
  1: topologically sort the nodes of G   { put the nodes in a topological order }
  2: INITIALIZATION(G, s)
  3: for each node u, taken in the topological order computed in step 1, do
  4:   for each node v ∈ Adj[u] do
  5:     RELAX(u, v, w)

• Running time: O(n + m) (topological sort takes O(n + m) time).
• Correctness: show (by induction) that when node u is considered in line 3, then d[u] = δ(s, u).
6CCS3OME/7CCSMOME, Lecture 2 – 21 / 27
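A Python sketch of DAGSHORTESTPATHS (assumptions: adj[u] lists (v, w) pairs and nodes is a list of all nodes; the topological sort here uses an in-degree counting pass, which is one possible O(n + m) method):

  from collections import deque

  INF = float("inf")

  def dag_shortest_paths(adj, nodes, s):
      """adj[u]: list of (v, w) pairs; the graph must be a DAG (weights may be negative)."""
      # Step 1: topological sort by repeatedly removing in-degree-0 nodes.
      indeg = {v: 0 for v in nodes}
      for u in nodes:
          for v, _ in adj[u]:
              indeg[v] += 1
      ready = deque(v for v in nodes if indeg[v] == 0)
      order = []
      while ready:
          u = ready.popleft()
          order.append(u)
          for v, _ in adj[u]:
              indeg[v] -= 1
              if indeg[v] == 0:
                  ready.append(v)
      # Steps 2-5: INITIALIZATION, then relax the edges out of each node in order.
      d = {v: INF for v in nodes}
      parent = {v: None for v in nodes}
      d[s] = 0
      for u in order:
          if d[u] < INF:                 # u is reachable from s
              for v, w in adj[u]:        # RELAX(u, v, w)
                  if d[v] > d[u] + w:
                      d[v] = d[u] + w
                      parent[v] = u
      return d, parent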

Examples and Exercises – LGT
1. Example of the computation of Dijkstra’s algorithm – slides 4-5.
2. What would be the running time of Dijkstra’s algorithm, if the priority queue
Q was maintained as an ordered list?
6CCS3OME/7CCSMOME, Lecture 2 22 / 27

Examples and Exercises – LGT (cont)
3. (Exercise 24.2-3 from [CLRS])
The PERT chart formulation given in this lecture is somewhat unnatural. More naturally, vertices would represent tasks and edges would represent order constraints; edge (u, v) would indicate that task u must be performed before task v. We would then assign weights (duration of the tasks) to vertices, not edges. Modify the DAGSHORTESTPATHS algorithm so that it finds, in linear time, a longest path in a directed acyclic graph with weighted vertices. Let w(v) denote the weight of a vertex v.
Trace the computation of the algorithm on the graph below.
[Figure: a vertex-weighted PERT chart from "begin" to "end" with vertices a, b, c, d, e, f, g, h, p, q. The green path has weight 20; the red path, with weight 26, is the longest path.]
6CCS3OME/7CCSMOME, Lecture 2 – 23 / 27

Exercises – SGT
1. This exercise shows that Dijkstra’s algorithm does not necessary compute shortest paths, if there are negative weights; even if there are no negative cycles.
(a) Show the values d[x] and the tree computed by Dijkstra's algorithm for the input graph:
[Figure: a graph with nodes s, a, b, c, d; one edge has the negative weight −6.]
(b) Show the shortest-path weights and a shortest-paths tree in this graph.
(c) How can Dijkstra's algorithm be modified so that it also works for graphs where edge weights may be negative?
(d) What is the running time of the modified algorithm?
6CCS3OME/7CCSMOME, Lecture 2 – 24 / 27

Exercises – SGT (cont.)
2. Design a linear-time (O(n + m)-time) algorithm which for a given directed acyclic graph (with n nodes and m edges) computes a topological order of the nodes.
Do not use the Depth-First Search (DFS) algorithm.
(There is a topological sort algorithm which is based on the DFS algorithm, but in this exercise you are asked to design an alternative algorithm.)
Hint: How to identify the first node for the topological order? How to identify subsequent nodes?
For the running time, assume the adjacency-list representation of the graph.
Specify all data structures which your algorithm needs to achieve the O(n + m) running time.
6CCS3OME/7CCSMOME, Lecture 2
25 / 27

Exercises – SGT (cont.)
3. Revise the Breadth-First Search (BFS) algorithm. Trace the execution of BFS on the graph shown below.
4. Revise the Depth-First Search (DFS) algorithm.
Trace the execution of DFS on the graph shown below.
[Figure: a graph with nodes a, b, c, d, e, f, g, h, k, s, used for both the BFS and the DFS traces.]
6CCS3OME/7CCSMOME, Lecture 2 – 26 / 27

Exercises – SGT (cont.)
5. The graph below has a negative cycle reachable from the source. Trace the computation of the Bellman-Ford algorithm with FIFO queue on this graph until the PARENT edges form a cycle.
[Figure: a graph with nodes s, u, v, x, y containing a negative cycle; some edge weights are negative (−2, −5).]
Assume that the edges outgoing from the nodes are given in this order:
from s: (s, u), (s, x); from u: (u, v), (u, x); from v: (v, x), (v, y);
from x: (x, u), (x, y); from y: (y, s), (y, v).
6CCS3OME/7CCSMOME, Lecture 2 – 27 / 27

6CCS3OME/7CCSMOME – Optimisation Methods
Lecture 3
All-pairs shortest paths
Point-to-point shortest-paths in geographical networks
Tomasz Radzik and Kathleen Steinhöfel
Department of Informatics, King’s College London 2020/21, Second term

Topics
• All-pairs shortest-paths problem: find shortest paths for all source-destination pairs of nodes.
Johnson’s algorithm
• Single-source single-destination shortest-path problem
• Geographical networks: geographical coordinates of the nodes are known; the weights of edges are at least the straight-line (geographical) distances between the nodes.
Adaptation of Dijkstra’s shortest-paths algorithm
• From Dijkstra’s algorithm to the A* search algorithm.
6CCS3OME/7CCSMOME, Lecture 3 2 / 23

All-pairs shortest-paths problem
Input: a weighted directed graph G = (V, E); w(v, u) – the weight of edge (v, u).
Assume that the nodes are numbered from 1 to n, that is, V = {1, 2, …, n}.
[Figure: an example graph on nodes A, B, C, D, E, with its distance matrix and parent matrix, and the shortest-paths trees, one rooted at each node.]
Example distance matrix D for the graph in the figure:
      A   B   C   D   E
  A   0  15  34  16  41
  B  16   0  19  32  26
  C  35  19   0  28  10
  D  16  31  50   0  57
  E  34  26  10  18   0
Output:
• the information whether G contains a negative cycle, and if it doesn't, then
• an n×n matrix D = (d_{i,j}) such that d_{i,j} is equal to δ(i, j) (the shortest-path weight from node i to node j); and
• shortest-paths trees, one from each node in the graph.
To avoid technicalities, we won't discuss the computation of shortest-paths trees.
6CCS3OME/7CCSMOME, Lecture 3 – 3 / 23

Repeatedly apply a single-source shortest-paths algorithm
• If all edge weights are non-negative:
  • Use Dijkstra's algorithm.
  • The total running time is n × O(min{m log n, n²}) = O(min{nm log n, n³}).
  • No method with a better (worst-case) running time is known.
• In the general case, when the edge weights may be negative:
  • We may use the Bellman-Ford algorithm.
  • If we do so, the total running time is n × O(nm) = O(n²m) (this is Θ(n⁴), if m = Θ(n²)).
  • The Floyd-Warshall algorithm: Θ(n³).
  • Johnson's algorithm: O(nm log n).
6CCS3OME/7CCSMOME, Lecture 3
4 / 23

Change the weights of edges without changing shortest paths
How can we change the weight of edges (ideally removing negative weights) without changing shortest paths?
Adding the same (large) number
to the weight of each edge doesn’t work.
Changing weights of edges in Johnson’s algorithm:
[Figure, left: adding the same large number M to the weight of each edge changes which paths are shortest, because longer paths are penalised more. Figure, right: in Johnson's re-weighting, each edge (x, y) changes by +h(x) − h(y), so every path from u to v changes by the same amount h(u) − h(v).]
6CCS3OME/7CCSMOME, Lecture 3 – 5 / 23

The main idea behind Johnson’s algorithm
• Reduces the general case (edge weights may be negative) to the case with all edge weights nonnegative.
• Re-weighting: compute new edge weights wˆ with these properties:
1. For all u, v ∈ V , a shortest path from u to v using the original weights w
is also a shortest path from u to v using the new weights wˆ. 2. wˆ(u,v) ≥ 0, for each edge (u,v) ∈ E.
• The general idea for re-weighting:
  • For each node v ∈ V, assign a number h(v) to v.
  • For each (u, v) ∈ E, let ŵ(u, v) = w(u, v) + h(u) − h(v).
• For any numbers h(v), the new edge weights satisfy Property 1 above.
• If there is no negative cycle in G, then we can find numbers h(v) which also satisfy Property 2.
6CCS3OME/7CCSMOME, Lecture 3
6 / 23

Re-weighting
• For any path P = ⟨v1, v2, …, vk⟩:
ŵ(P) = ŵ(v1, v2) + ŵ(v2, v3) + ··· + ŵ(vk−1, vk)
     = [w(v1, v2) + h(v1) − h(v2)] + [w(v2, v3) + h(v2) − h(v3)] + ··· + [w(vk−1, vk) + h(vk−1) − h(vk)]
     = w(v1, v2) + w(v2, v3) + ··· + w(vk−1, vk) + h(v1) − h(vk)
     = w(P) + h(v1) − h(vk).   (∗)
• Thus when we change the edge weights from w to ŵ, then for each pair of nodes u and v, the weight of each path from u to v changes by the same amount: h(u) − h(v).
[Figure: paths from u to v of weights 25, 22 and 18; with h(u) = 3 and h(v) = 8 they become 20, 17 and 13 – each changed by h(u) − h(v) = −5.]
• Hence a path P is a shortest path from u to v according to weights w, if and only if, P is a shortest path from u to v according to weights ŵ.
• We also have, from (∗): δ̂(u, v) = δ(u, v) + [h(u) − h(v)].
• How do we select the numbers h(v) so that all new weights are non-negative?
6CCS3OME/7CCSMOME, Lecture 3 – 7 / 23

Computation of the new edge weights – example
[Figure, left: the input graph with a new source s joined to every node by a 0-weight edge. Figure, middle: the values h(v) = δ(s, v) computed by Bellman-Ford from s. Figure, right: the resulting non-negative new edge weights.]
Calculate the original and the new weights of the paths: 1-3; 1-2-4-3; 1-2-5-4-3.
6CCS3OME/7CCSMOME, Lecture 3 – 8 / 23

Johnson’s algorithm
JOHNSON(G, w)   { G = (V, E), w – edge weights }
  compute G′ = (V′, E′): V′ = V ∪ {s}, E′ = E ∪ {(s, v) | v ∈ V}
  assign weights to the new edges: w(s, v) ← 0, for each v ∈ V
  if BELLMAN-FORD(G′, w, s) = FALSE then
    terminate: the input graph G contains a negative cycle
  else   { BELLMAN-FORD has computed the values δ(s, v) }
    for each v ∈ V do h(v) ← δ(s, v)
    for each (u, v) ∈ E do ŵ(u, v) ← w(u, v) + h(u) − h(v)
    for each u ∈ V do
      run DIJKSTRA(G, ŵ, u) to compute δ̂(u, v) for all v ∈ V
      for each v ∈ V do d_{uv} ← δ̂(u, v) − [h(u) − h(v)]
    return D = (d_{uv})

Running time: O(nm) (one BELLMAN-FORD); n · O(min{n², m log n}) (n runs of DIJKSTRA); O(n²) (other computation).
Summing up, the total running time is O(min{n³, nm log n}).
6CCS3OME/7CCSMOME, Lecture 3 – 9 / 23
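A self-contained Python sketch of JOHNSON, simplified to return only the distance matrix (assumes node names are hashable and mutually comparable, e.g. strings; the inner routines are the Bellman-Ford and lazy-deletion Dijkstra sketches from Lectures 1–2):

  import heapq

  INF = float("inf")

  def johnson(nodes, edges):
      """edges: list of (u, v, w). Returns None if G has a negative cycle,
      else a dict D with D[u][v] = delta(u, v)."""
      S = object()                                   # a fresh source node, not in V
      d = {v: INF for v in nodes}
      d[S] = 0
      for _ in range(len(nodes)):                    # Bellman-Ford from S on G'
          for u, v, w in [(S, x, 0) for x in nodes] + list(edges):
              if d[u] + w < d[v]:
                  d[v] = d[u] + w
      for u, v, w in edges:                          # negative-cycle check
          if d[u] + w < d[v]:
              return None
      h = {v: d[v] for v in nodes}                   # h(v) = delta(S, v)
      adj = {v: [] for v in nodes}                   # w_hat(u,v) = w + h(u) - h(v) >= 0
      for u, v, w in edges:
          adj[u].append((v, w + h[u] - h[v]))
      D = {}
      for u in nodes:                                # one Dijkstra per source
          dist = {v: INF for v in nodes}
          dist[u] = 0
          heap, done = [(0, u)], set()
          while heap:
              dx, x = heapq.heappop(heap)
              if x in done:
                  continue
              done.add(x)
              for y, w in adj[x]:
                  if dist[y] > dx + w:
                      dist[y] = dx + w
                      heapq.heappush(heap, (dist[y], y))
          # Undo the re-weighting: delta(u, v) = delta_hat(u, v) - [h(u) - h(v)].
          D[u] = {v: (dist[v] - (h[u] - h[v]) if dist[v] < INF else INF)
                  for v in nodes}
      return D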

Correctness of Johnson’s algorithm
1. The input graph G contains a negative cycle, if and only if, there is a negative cycle in graph G′ reachable from s.
   This means that the algorithm correctly identifies whether the input graph has a negative cycle.
2. For each edge (u, v) in G, the new weight ŵ(u, v) assigned to this edge in Johnson's algorithm is non-negative. Indeed,
   δ(s, u) + w(u, v) ≥ δ(s, v)
   (the Triangle Inequality for the shortest-path weights), so
   ŵ(u, v) = w(u, v) + h(u) − h(v) = w(u, v) + δ(s, u) − δ(s, v) ≥ 0.
   [Figure: a shortest path from s to v compared with a shortest path from s to u followed by the edge (u, v).]
   The non-negative weights imply that the algorithm correctly computes δ̂(u, v) for all pairs of vertices.
3. For any numbers h(.) and for each pair of nodes u and v in G:
   δ(u, v) = δ̂(u, v) − [h(u) − h(v)]   (shown earlier)
   Thus the algorithm correctly computes δ(u, v) for all pairs of vertices.
6CCS3OME/7CCSMOME, Lecture 3 – 10 / 23

Dijkstra’s algorithm for single-source single-destination
• Run Dijkstra from the source and stop when the destination node is considered (removed from Q). This may examine the whole network.
[Figure: the search region of a single Dijkstra run from s, versus two smaller regions when searching simultaneously from s and from d.]
• Run two Dijkstra computations in parallel, one from the source and one from the destination (considering edges backwards). Stop when the shortest s → d path is found. A smaller part of the network is examined.
How do we know that the combined path (one part from the shortest-paths tree from s and the other part from the shortest-paths tree to d) is the shortest s–d path? The implementation is somewhat tricky.
6CCS3OME/7CCSMOME, Lecture 3 – 11 / 23

Dijkstra’s algorithm for “geographical” networks

The nodes are geographical places (say, towns), and we know their coordinates. The weights of edges (road distances) are at least the straight-line distances between the nodes.
How can we use this information to speed up Dijkstra's computation for finding a shortest source-destination path?
• Use re-weighting of edges to favour the edges leading "geographically" closer to the destination.
• Re-weight with h(v) = −dist(v, d), where dist(v, d) is the straight-line distance from node v to the destination d (computed from the coordinates of v and d):
ŵ(u, v) = w(u, v) − dist(u, d) + dist(v, d)
(observe that this is always ≥ 0).
[Figure: an edge (u, v) near the destination d, before and after the re-weighting; edges leading towards d become cheaper relative to edges leading away from d.]
6CCS3OME/7CCSMOME, Lecture 3 – 12 / 23

Dijkstra’s algorithm with “geographical” re-weighting
• The search for the shortest path is “directed” towards the destination. Only relatively few edges away from the shortest path are considered. This can lead to a considerable improvement of the average running time. But compute the new weights only when you need them!
• The worst-case running time remains as in the main Dijkstra’s algorithm for the single-source (all destinations) case.
[Figure: the directed search explores a region stretched from s towards d, rather than a disc around s.]
6CCS3OME/7CCSMOME, Lecture 3 – 13 / 23

Dijkstra’s algorithm with re-weighting to direct search towards the target
• For single-source single-destination shortest-path queries in geographical networks, Dijkstra's algorithm with re-weighting by the straight-line distances to the destination works because the straight-line distances satisfy the triangle inequality:
dist(u, d) ≤ dist(u, v) + dist(v, d) ≤ w(u, v) + dist(v, d),
so the new weights are non-negative:
ŵ(u, v) = w(u, v) − dist(u, d) + dist(v, d) ≥ 0.
• The straight-line distances are used here as (under)estimates of the shortest-path weights.
• Can we use this idea of speeding up Dijkstra's algorithm in another context, when we might have some estimates on the shortest-path weights, but we cannot guarantee that those estimates satisfy the triangle inequality?
6CCS3OME/7CCSMOME, Lecture 3 – 14 / 23

Dijkstra’s algorithm with re-weighting: a general setting (1)
Example:
• The nodes of the graph represent all possible ‘configurations’, say of some game.
[Figure: a huge implicit graph of configurations, with the start configuration s, the target configuration t, and intermediate configurations c, c′, c′′.]
• An edge from a configuration c′ to a configuration c′′ represents a possible move, and the weight w(c′, c′′) of this edge represents the cost of this move (a positive number).
• Find a shortest (least costly) path of moves from a given starting configuration s to a given goal (target) configuration t.
• This is a single-source single-destination shortest-path problem with non-negative edge weights, so we can use Dijkstra's algorithm, but …
• The graph is huge, way too large to be given explicitly as input. The graph is given implicitly, by specifying a procedure which, for a given configuration, generates all possible moves from this configuration.
• We could try Dijkstra, but it would take too much time and memory.
6CCS3OME/7CCSMOME, Lecture 3 – 15 / 23

Dijkstra’s algorithm with re-weighting: a general setting (2)
• Let's say that for each configuration c we have an estimate b(c) ≥ 0 of the cost of getting from c to the target configuration t, and these estimates are easy to compute.
• We can re-weight using the estimates b(.): ŵ(u, v) = w(u, v) − b(u) + b(v).
• We want to apply Dijkstra's algorithm to the new weights ŵ(u, v), so that the computation is directed towards the target.
What should the considerations be here?
[Figure: the configuration graph with some re-weighted edge costs, e.g. w′ + 3 and w′′ − 5.]
6CCS3OME/7CCSMOME, Lecture 3 – 16 / 23

Dijkstra’s algorithm with re-weighting: a general setting (3)
• Firstly, we probably cannot guarantee non-negative new weights.
• We should therefore use the modified Dijkstra's algorithm, which moves a node back from S to Q if its shortest-path estimate is updated.
[Figure: the set S around s; relaxing edge (u, v) from the current node u may decrease d[v] for a node v already in S.]
• In the diagram, if node u is considered in the current iteration, RELAX(u, v) may update (reduce) the shortest-path estimate d[v] at v. If d[v] is reduced, then node v is removed from set S and put back into the priority queue Q.
• The modified Dijkstra guarantees that the shortest path to t is eventually found (there are no negative cycles, so the computation ends successfully), if we compute shortest paths to all nodes.
In other words, we cannot stop the computation when we take the target node t from Q, because the value d[t] at this point of the computation is not guaranteed to be the shortest-path weight to t.
6CCS3OME/7CCSMOME, Lecture 3 – 17 / 23

Dijkstra’s algorithm with re-weighting −→ A* search algorithm
• The modified Dijkstra's algorithm with re-weighting using a cost-estimate function b is the A* search algorithm (common in the context of AI methods).
• In the A* algorithm, the function b(.) is called a heuristic function, and is often denoted by h(.).
• The property of the heuristic function h which guarantees that, when the target node t is taken from Q, the shortest path to t has been computed:
for each edge (u, v), h(u) ≤ w(u, v) + h(v).
• A heuristic function h(.) which satisfies this property is called consistent or monotone; the property is equivalent to all new weights ŵ(u, v) = w(u, v) − b(u) + b(v) being non-negative.
[Figure: the A* search explores the configuration graph in a region directed from s towards t.]
6CCS3OME/7CCSMOME, Lecture 3 – 18 / 23
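A Python sketch of A* in this form (assuming a consistent heuristic h, so that stopping when a target is taken from the queue is safe; successors and is_target are caller-supplied, since the graph is implicit):

  import heapq
  from itertools import count

  INF = float("inf")

  def a_star(start, is_target, successors, h):
      """A* on an implicit graph. successors(c) yields (c2, cost) pairs with
      cost > 0; h(c) >= 0 is the heuristic. With a consistent h, the first
      time a target is taken from the queue, g[c] is the shortest-path weight."""
      tie = count()                           # tie-breaker, so configurations
      g = {start: 0}                          # themselves need not be comparable
      parent = {start: None}
      heap = [(h(start), next(tie), start)]   # priority = g[c] + h(c)
      closed = set()                          # the set S of finished nodes
      while heap:
          _, _, c = heapq.heappop(heap)
          if c in closed:
              continue                        # stale entry (lazy deletion)
          if is_target(c):
              return g[c], parent
          closed.add(c)
          for c2, cost in successors(c):
              g2 = g[c] + cost
              if g2 < g.get(c2, INF):         # effective relax
                  g[c2] = g2
                  parent[c2] = c
                  heapq.heappush(heap, (g2 + h(c2), next(tie), c2))
      return INF, parent                      # no target is reachable

Note the design choice: adding h(c) to the priority is exactly the re-weighting ŵ(u, v) = w(u, v) − h(u) + h(v) applied along the path, so this is the modified Dijkstra described above.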

Examples and Exercises – LGT
1. For an input graph with no negative edge weights, we have computed all-pairs shortest paths: two output matrices as on slide 3.
One edge (u, v) changes its weight, while all other edge weights remain as they were. What do we have to recalculate to update the all-pairs shortest-path matrices?
Consider cases:
• edge (u, v) belongs / doesn’t belong, to computed shortest-path trees;
• the weight of edge (u, v) increases / decreases.
2. What is the running time of Dijkstra’s algorithm for geographical networks, in terms of n, m, p and q, where n and m are the number of nodes and edges in the graph, respectively, p is the number of iterations (until the destination node is taken from Q), and q is the number of executed relax operations?
In the worst case p = n and q = m, but we would expect p and q to be much smaller than n and m. How should the algorithm be implemented so that the running time depends on p and q, but not on n or m?
6CCS3OME/7CCSMOME, Lecture 3 19 / 23

Examples and Exercises – LGT (cont.)
3. Consider the following problem of maximizing the number of deliveries. The input is a directed graph G = (V, E) with a designated vertex s where a courier is located at time 0.
Each vertex (location) v other than s has a specified fixed time tv > 0 when the message for v should be delivered. That is, if the courier wants to deliver the message at location v, it has to be done exactly at time tv.
Each edge (x, y) of G has a specified travel time w(x, y) > 0: the courier needs w(x, y) time to go from location x to location y.
Design an algorithm for computing a delivery route for the courier which maximises the number of delivered messages.
Further assumptions: (i) the courier can wait, that is, does not have to be constantly moving; (ii) when the courier is at a location v, the delivery of the messages at this location is instantaneous (no additional time); (iii) the courier can move along the same edges more than once.
Hint: analyse this problem referring to shortest-paths problems and use
appropriate shortest-paths algorithms.
6CCS3OME/7CCSMOME, Lecture 3 20 / 23

Exercises – SGT
1. How does the reweighting change the weights of the cycles?
2. If the input graph for Johnson's algorithm doesn't have negative cycles but has a cycle of weight 0, what are the new weights of the edges on this cycle?
3. Consider Johnson's all-pairs shortest-paths algorithm and an input graph with all edge weights non-negative.
(a) What is the relationship between the input weights w and the weights wˆ computed in Johnson’s algorithm?
(b) If Johnson’s algorithm computes the new edge weights using the “Bellman-Ford with FIFO Queue” algorithm, then what is the running time of Bellman-Ford in Johnson’s algorithm for such an input graph (when all edge weights are non-negative)? Give a precise bound.
4. Consider Johnson’s algorithm with the following modification: instead of adding a new vertex s, let s be any vertex of the input graph G.
(a) Give an example of an input graph for which Johnson’s algorithm modified in this way gives incorrect output.
(b) Show that if the input graph is strongly connected (every vertex is
reachable from every other vertex), then the modified Johnson’s
algorithm gives correct output. 6CCS3OME/7CCSMOME, Lecture 3
21 / 23

Exercises – SGT (cont.)
5. For the “geographical” network given below, the starting node p and the destination d, compare the computation of the standard Dijkstra’s shortest-paths algorithm, which uses only given original distances of the direct connections, with the computation of Dijkstra’s algorithm which uses the geographical re-weighting. The straight-line distances from all nodes to the destination node d are given below.
In both cases, show the shortest paths tree at the termination of the computation, that is, at the time when the destination node d is taken from the priority queue. Indicate also all edges to which the relax operation has been applied.
Note that the original distances are the same in both directions of an edge, but the re-weighted distance of an edge (x, y) may be different than the re-weighted distance of the reverse edge (y, x).
6CCS3OME/7CCSMOME, Lecture 3 22 / 23

Exercises – SGT (cont.)
5. (cont.)
[Figure: the "geographical" network with nodes a, b, c, d, e, f, g, h, i, j, k, l, p, q, r, the starting node p, the destination d, and the original distances on the edges.]
Straight-line distances to d:
  node:      a   b   c   d  e   f   g  h  i   j   k  l   p   q   r
  distance:  21  13  15  0  12  17  6  4  14  18  9  16  11  13  6
6CCS3OME/7CCSMOME, Lecture 3 – 23 / 23

6CCS3OME/7CCSMOME – Optimisation Methods
Lecture 4
Network flow problems Ford-Fulkerson method
Tomasz Radzik and Kathleen Steinhöfel
Department of Informatics, King’s College London 2020/21, Second term

Maximum network-flow problem
[Figure: an example flow network with source s, sink t and edge capacities.]
• A Flow Network: G = (V, E, c, s, t), where
  • V – set of n nodes; E – set of m directed edges (links);
  • for each (u, v) ∈ E, c(u, v) ≥ 0 is the capacity of edge (u, v);
    if (u, v) ∉ E, then (for convenience) define c(u, v) = 0;
  • two distinguished nodes: the source s and the sink t (s ≠ t).
• Find (design) a maximum flow from the source to the sink:
  • a flow of the maximum total amount, while the flow on each edge is not greater than the capacity of this edge;
  • a continuous flow of the maximum total rate, while the rate of flow on each edge is not greater than the capacity of this edge.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 2 / 40

Flow in network: example
[Figure, top: the example network with a flow of value 5 + 2 + 1 + 1 = 9. Figure, bottom: a maximum flow, of value 5 + 5 + 1 = 11.]
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method – 3 / 40

Application: Problem of Representatives
• Applications in transportation and communication networks.
• Applications discussed in the Ahuja-Magnanti-Orlin textbook:
parallel machine scheduling, assignment of computer modules to computer processors, rounding of census data, tanker scheduling,
the problem of representatives.
• The problem of representatives:
A town has r residents, q clubs, p political parties; r ≫ q ≫ p.
Each resident may be a member of a number of clubs, but belongs to exactly one political party.
Select the governing council of exactly q members, which satisfies the following balancing “properties”:
1. Each club is represented in the council by (exactly) one of its members. One council member can represent only one club.
2. For each political party Pk, the number of council members belonging to this party is at most uk. (The numbers uk are given.)
Is it possible to select a council which satisfies these properties?
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 4 / 40

Problem of Representatives reduced to the Maximum-flow Problem
[Figure: the flow network for the problem of representatives. Source s; one node per club C1, …, C4, with edges s → club of capacity 1; club → resident edges of capacity 1 for club members; one node per resident R1, …, R7; resident → party edges of capacity 1; party nodes P1, P2, P3 with edges Pk → t of capacity uk (here u1 = 1, u2 = 2, u3 = 2); sink t. The bold edges carry an integral flow marking the selected representatives.]
• An integral flow in the constructed flow network corresponds to a feasible “partial” council (some clubs might not have their representatives).
• It is possible to select a full balanced council (Properties 1 & 2), if, and only
if, the maximum flow value is equal to q (the number of clubs).
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 5 / 40

Other flow problems
• Flow-Feasibility problem (Transshipment problem): multiple sources and sinks with specified supplies and demands.
Find a feasible flow, that is, a flow which satisfies the edge capacities and the specified supply/demand values at the nodes.
[Figure: a network with edge capacities in parentheses and supply/demand labels at the nodes: +8 and +5 (sources), −6, −4 and −3 (sinks).]
This problem is "equivalent" to the maximum flow problem.
• Minimum-cost flow problems. Multicommodity flow problems.
These problems are more difficult than the maximum flow problem.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method – 6 / 40

Flows – formal definition
• We assume, for convenience, that if (u,v) ∈ E, then (v,u) ̸∈ E.
uv
If there are “antiparallel” edges (u, v) and (v, u)
in a given application, we can convert the network into an equivalent one with no antiparallel edges.
uv
• Aflowisafunctionf:E→R, wheref(u,v)≥0istheflow on edge (u, v), that has the following properties:
• Capacity constraints:
For each edge (u,v) ∈ E, 0 ≤ f(u,v) ≤ c(u,v).
0 <= f(u,v) <= c(u,v) . uv • Flow conservation: For each node v ∈ V − {s, t}, the total flow into v is equal to the total flow out of v. 2 In other words: the net flow into v is equal to 0. 3 v 8 4 6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 7 / 40 new vertex 1 Flow conservation Foreachnodev∈V −{s,t}: • Theflowintovisequal a v d flow in = flow out: to the flow out of v: b c . 􏰄 f(x,v) = 􏰄 f(v,z). (x,v)∈E (v,z)∈E • The net flow into v is equal to 0: a vd net flow into v: 􏰄 f(x,v)− 􏰄 f(v,z)=0. (x,v)∈E (v,z)∈E 6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 8 / 40 b c . e a+b+c = d+e e a+b+c−d−e = 0 The maximum flow problem (formally) • The value of a flow f is the net flow from the source, which is equal to the net flow into the sink, due to the flow conservation condition: |f| = def 􏰄􏰄s f(s,v) − f(u,s) = 􏰄 f(x,t) − 􏰄 f(t,z) (x,t)∈E (t,z)∈E (s,v)∈E (u,s)∈E • Maximum flow problem: For a given flow network, find a flow of the maximum possible value (maximizes the net flow from the source). Such a flow is called a maximum flow. (Not necessarily unique.) • For a given flow f, if f(v,u) = c(v,u), then we say that flow f saturates edge (v, u), and edge (v, u) is a saturated edge. 6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 9 / 40 t From flows to paths • G = (V,E,c,s,t): a flow network; f: a flow in G. • Flow f can be “decomposed” into at most m paths from s to t and cycles, where m is the number of edges in G. • Example (no flow cycles here). Decompose the following flow into paths (the numbers on the edges are the edge flows): s a b 35 • Algorithm: keep selecting and removing (from the current flow) “maximal” s–t path flows. Each path removes all remaining flow from at least one edge, so at most m paths. 4 • This algorithm can be extended to the case when there are flow cycles. • One flow path (or cycle) can be found in O(n) time, and there are at most m paths/cycles selected, so the total running time is O(nm). This is less than the time needed to find a maximum flow. 6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 10 / 40 11 4 3 7 3 c d 5 2 9 5 t The Flow-Feasibility problem (formally) (6) (4) (1) (8) (2) (3) (3) (2) (3) +8 (5) −6 (3) (2) (4) (4) (4) (1) Assume: 􏰃v∈V d(v) = 0, that is, total supply = total demand. • Find a feasible flow f, that is, a flow within the edge capacities and such that for each node v ∈ V , the net flow from v equals d(v): 􏰄 f(v,x)− 􏰄 f(z,v) = d(v). (v,x)∈E (z,v)∈E +5 −4 (1) −3 • Input: G = (V, E, c, d), where V and E are the sets of nodes and edges; for each (v,u) ∈ E, c(v,u) ≥ 0 is the capacity of edge (v,u); for each v ∈ V , d(v) indicates the supply/demand at node v. d(v) > 0: supply of d(v) units at v; d(v) < 0: demand of |d(v)| units at v; d(v)=0: a“transitional”node(flowonlypassesthrough). • The above condition is the flow conservation property for this problem. 6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 11 / 40 Reduction from flow-feasibility to maximum-flow s (2) (3) (3) (5) (3) (8) (1) (8) (2) (6) +8 (5) (2) −6 (1) (6) (3) +5 (2) −4 (1) (4) (4) (4) (4) −3 • For a given input G = (V, E, c, d) of the flow feasibility problem, construct the following input G′ = (V ′, E′, c′, s, t) for the maximum flow problem: • V′ =V ∪{s,t},wheresandtaretwonewnodes; • E′ =E∪{(s,v):v∈V, d(v)>0}∪{(u,t):u∈V, d(u)<0}; • c′(v, u) = c(v, u); c′(s, v) = d(v); c′(u, t) = −d(u). 
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 12 / 40

Reduction from flow-feasibility to maximum-flow (cont.)
• Compute a maximum flow f′ in G′. (Diagram omitted: the constructed network G′ with a maximum flow marked.)
• A maximum flow in G′ saturates all edges outgoing from s, if and only if, there is a feasible flow in G.
• If a maximum flow f′ in G′ saturates all edges outgoing from s, then remove the added nodes and edges to get a feasible flow in G.
• If f is a feasible flow in G, then saturate all edges outgoing from s and all edges incoming to t to get a maximum flow in G′.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 13 / 40

Residual capacities
• Let f be a flow in a flow network G = (V,E,c,s,t).
• The residual capacity of an edge (u,v) ∈ E is defined as:
cf(u,v) = c(u,v) − f(u,v).
(Diagram notation: x (y) on an edge means flow x, capacity y; e.g. capacity = 4 with 3 units of flow gives residual capacity = 1.)
• If for an edge (u,v) ∈ E, f(u,v) > 0, then the residual capacity of a reverse edge (v,u) is defined as:
cf(v,u) = f(u,v).
If 0 < f(u,v) < c(u,v), there are positive residual capacities in both directions.
• The positive residual capacity of the reverse edge (v,u) represents the possibility of decreasing the current positive flow f(u,v).
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 14 / 40

Residual networks
• If cf(u,v) > 0, then (u,v) is a residual edge.
• Let Ef denote the set of all residual edges.
m = |E| ≤ |Ef | ≤ 2|E| = 2m
• The residual network of G induced by flow f is the flow network
Gf = (V,Ef,cf,s,t),
where cf are the residual capacities and Ef is the set of residual edges.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
15 / 40
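To make these definitions concrete, here is a minimal Python sketch (our own illustrative code, not from the slides) that builds the residual capacities cf from a capacity function and a flow, both stored as edge dictionaries:

def residual_network(capacity, flow):
    """Build residual capacities cf from capacities c and a flow f.

    capacity: dict mapping edge (u, v) -> c(u, v), for (u, v) in E.
    flow:     dict mapping edge (u, v) -> f(u, v), with 0 <= f <= c.
    Returns a dict mapping each residual edge to its positive residual capacity.
    """
    cf = {}
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        if c - f > 0:        # forward residual edge: cf(u,v) = c(u,v) - f(u,v)
            cf[(u, v)] = c - f
        if f > 0:            # reverse residual edge: cf(v,u) = f(u,v)
            cf[(v, u)] = f
    return cf

Each original edge contributes at most two residual edges, which is exactly the bound m ≤ |Ef| ≤ 2m stated above.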

Increase the flow using the residual network: example
(Diagrams omitted: an input network G with a flow f of value 7; the residual network Gf; a flow f′ of value 1 in Gf; and the combined flow h = f ↑ f′ in G, of value 7 + 1 = 8.)
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
16 / 40

Augmenting the current flow by a flow in the residual network
• If f is a flow in G and f′ is a flow in the residual network Gf, then we can augment f by f′ to get a new, greater flow h in G.
• The new flow h = f ↑ f′ in G is defined in the following way:
h(u,v) = f(u,v) + f′(u,v) − f′(v,u), if (u,v) ∈ E,
h(u,v) = 0, otherwise.
(Diagram examples: flow 3 on an edge (u,v) of capacity 7 combined with f′(u,v) = 2 in the residual network gives combined flow 5; flow 3 combined with f′(v,u) = 2 on the reverse residual edge gives combined flow 1.)
• This definition of f ↑ f′ works also if both f′(u,v) and f′(v,u) are positive.
• |f ↑ f′| = |f| + |f′|, that is, the value of flow f ↑ f′ (the net flow from the source) is equal to the sum of the values of flows f and f′.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 17 / 40
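A minimal Python sketch of the augmentation operation h = f ↑ f′ defined above (same edge-dictionary conventions as the earlier residual-network sketch; all names are illustrative):

def augment(flow, f_prime, edges):
    """Combine a flow f on G with a flow f' in the residual network Gf."""
    h = {}
    for (u, v) in edges:
        # h(u,v) = f(u,v) + f'(u,v) - f'(v,u) on original edges, 0 elsewhere
        h[(u, v)] = (flow.get((u, v), 0)
                     + f_prime.get((u, v), 0)
                     - f_prime.get((v, u), 0))
    return h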

A general approach to finding a maximum flow
• Start with the zero flow and keep iteratively increasing the flow by flows in the residual network:
f ← zero flow  { f(u,v) = 0, for each (u,v) ∈ E }
loop
(1) Construct the residual network Gf
(2) Find a nonzero flow f′ in Gf;
if there is no such flow, then exit
(3) f ← f ↑ f′  { augmentation: augment flow f with flow f′ }
end of loop
• Most of the max-flow algorithms which follow this general approach compute in each iteration a path flow f′, that is, a flow along one residual path from s to t.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 18 / 40

Augmenting paths and path flows
G = (V,E,c,s,t) – a flow network; f – flow in G.
• Residual path: a simple path in the residual network Gf .
Augmenting path: a residual path from s to t.
• The residual capacity of an augmenting path p is the maximum amount of
flow that can be sent along path p:
cf(p) = min{cf(u,v) | (u,v) is an edge on p}.
(Diagram omitted: an augmenting path from s to t; its residual capacity is the bottleneck capacity, the minimum residual capacity of an edge along the path.)
• Let p be an augmenting path in Gf. The (path) flow fp in Gf is:
fp(u,v) = cf(p), if (u,v) is on p,
fp(u,v) = 0, otherwise.
• The value of flow fp is |fp| = cf(p) > 0.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
19 / 40

Ford-Fulkerson method
FORD-FULKERSON(G) { G=(V,E,c,s,t) }
f ← zero flow  { f(u,v) = 0 for each (u,v) ∈ E }
loop
(1) Construct the residual network Gf
(2) (*) Find an augmenting path p in Gf;
if there is no augmenting path in Gf, then exit
(3) f ← f ↑ fp  { path augmentation }
end of loop
Below, the highlighted edges have positive flow, the other edges zero flow. (Diagrams omitted: the current flow f, the path flow fp of value q along an augmenting path, and the new flow f ↑ fp, with edge flows adjusted by ±q.)
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
20 / 40
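Below is a compact, runnable sketch of the whole method in Python (depth-first search for step (2); the representation and all names are our own, so treat it as an illustration of the pseudocode rather than the module's reference implementation):

def ford_fulkerson(capacity, s, t):
    """Maximum flow by repeated augmentation along residual paths.

    capacity: dict (u, v) -> c(u, v); s, t: source and sink.
    Returns (value of the flow, flow dictionary).
    """
    flow = {e: 0 for e in capacity}
    nodes = {u for e in capacity for u in e}
    adj = {u: set() for u in nodes}
    for (u, v) in capacity:
        adj[u].add(v)
        adj[v].add(u)            # reverse residual edges may be used too

    def residual(u, v):
        # forward spare capacity plus the option of cancelling reverse flow
        return (capacity.get((u, v), 0) - flow.get((u, v), 0)
                + flow.get((v, u), 0))

    def find_path():
        # DFS for an augmenting path; returns a list of nodes or None
        stack, parent = [s], {s: None}
        while stack:
            u = stack.pop()
            if u == t:
                path = [t]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    stack.append(v)
        return None

    while True:
        path = find_path()
        if path is None:         # no augmenting path: the flow is maximum
            break
        q = min(residual(u, v) for u, v in zip(path, path[1:]))
        for u, v in zip(path, path[1:]):   # push q units along the path
            cancel = min(q, flow.get((v, u), 0))
            if cancel:                     # first cancel reverse flow
                flow[(v, u)] -= cancel
            if q > cancel:
                flow[(u, v)] = flow.get((u, v), 0) + (q - cancel)
    value = sum(flow.get((s, v), 0) - flow.get((v, s), 0) for v in adj[s])
    return value, flow

With integral capacities each augmentation raises the flow value by at least 1, which matches the O(|f*| m) bound discussed a few slides below.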

Ford-Fulkerson method (cont.)
FORD-FULKERSON(G)  { G = (V,E,c,s,t) }
f ← zero flow  { f(u,v) = 0 for each (u,v) ∈ E }
loop
(1) Construct the residual network Gf
(2) Find an augmenting path p in Gf;
if there is no augmenting path in Gf, then exit
(3) f ← f ↑ fp  { path augmentation }
end of loop
• Does this algorithm compute a maximum flow in G?
• What is the running time?
• The running time of one iteration is O(m): construct Gf: Θ(m), or O(n) if done incrementally; search Gf to find an augmenting path: O(m); update the flow: O(n).
(n – number of nodes; m – number of edges)
• The number of iterations depends on the selection strategy in step (2).
(Diagram omitted: a small network with four capacity-100 edges and one capacity-1 edge, on which a bad choice of augmenting paths leads to 200 iterations.)
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
21 / 40

Cuts
• A cut (S,T) in a network G: S ⊆ V, T = V − S, s ∈ S, t ∈ T.
That is, a cut is a partitioning of the set of nodes V into two disjoint sets S and T such that the source node s is in set S and the sink node t is in set T.
• The capacity of a cut (S,T) is the sum of the capacities of the edges from S to T:
c(S,T) = Σ_{(u,v)∈E: u∈S, v∈T} c(u,v).
• If f is a flow, then the net flow across the cut (S,T) is:
f(S,T) = Σ_{(u,v)∈E: u∈S, v∈T} f(u,v) − Σ_{(x,y)∈E: x∈T, y∈S} f(x,y).
• For each cut (S,T), f(S,T) = |f|,
that is, the net flow across the cut is equal to the net flow into the sink t. This follows from the flow conservation constraints.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 22 / 40
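The two quantities just defined are easy to compute; a short Python sketch with the same edge-dictionary conventions as before:

def cut_capacity(capacity, S):
    """c(S,T): total capacity of the edges going from S to T = V - S."""
    return sum(c for (u, v), c in capacity.items() if u in S and v not in S)

def cut_net_flow(flow, S):
    """f(S,T): flow from S to T minus flow coming back from T to S."""
    forward = sum(f for (u, v), f in flow.items() if u in S and v not in S)
    backward = sum(f for (u, v), f in flow.items() if u not in S and v in S)
    return forward - backward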

Cuts: example
(Diagram omitted: a flow network with a flow, and a cut (S,T) marked.)
For this cut (S,T) and this flow f:
• c(S,T) = 4+7+7+1+6 = 25.
• f(S,T) = 7+3+1−2 = 9 = |f|.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
23 / 40

Cuts (cont.)
• For any flow f and any cut (S,T): |f| = f(S,T) ≤ c(S,T).
(The net flow across a cut cannot be greater than the capacity of this cut.)
• Therefore, the value of a maximum flow is not greater than the minimum capacity of a cut.
max{|f| : f is a flow in G} ≤ min{c(S,T) : (S,T) is a cut in G}.
(Diagram omitted: the example network with a minimum-capacity cut marked.)
• This cut (S,T) is a minimum capacity cut, c(S,T) = 5+2+2+1 = 10, so the maximum value of flow for this network is not greater than 10.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 24 / 40

The Max-flow Min-cut theorem
• Theorem 1. (Max-flow Min-cut theorem)
The maximum value of a flow is equal to the minimum capacity of a cut:
max{|f| : f is a flow in G} = min{c(S,T) : (S,T) is a cut in G}.
• Theorem 2.
For a flow f in G, the following three conditions are equivalent. (a) f is a maximum flow in G.
(b) There is no augmenting path in the residual network Gf . (c) For some cut (S,T) in G, |f| = c(S,T).
• Theorem 2 can be proven by showing implications: (a) ⇒ (b) ⇒ (c) ⇒ (a).
• Theorem 1 follows from Theorem 2:
(a) ⇒ (c), so for a maximum flow f, there is a cut (S,T) such that
|f| = f(S,T) = c(S,T) ≥ min{c(S′,T′) : (S′,T′) is a cut in G}.
• Correctness of the FORD-FULKERSON method: When this method terminates, then for the computed flow f, there is no augmenting path in the residual network Gf , so f is a maximum flow (by Theorem 2, (b) ⇒ (a)).
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 25 / 40
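The (b) ⇒ (c) step on the next slide is constructive and gives a recipe for extracting a minimum cut from a maximum flow: take S to be the nodes reachable from s in the residual network. A Python sketch reusing residual_network from the earlier sketch (the linear scan per node is kept for clarity, not speed):

def min_cut(capacity, flow, s):
    """Given a maximum flow, return a minimum cut (S, T)."""
    cf = residual_network(capacity, flow)   # residual edges, positive capacity
    S, stack = {s}, [s]
    while stack:                            # graph search over residual edges
        u = stack.pop()
        for (x, y) in cf:
            if x == u and y not in S:
                S.add(y)
                stack.append(y)
    T = {u for e in capacity for u in e} - S
    return S, T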

Max-flow Min-cut (cont.)
• Theorem 2.
For a flow f in G, the following three conditions are equivalent. (a) f is a maximum flow in G.
(b) There is no augmenting path in the residual network Gf . (c) For some cut (S,T) in G, |f| = c(S,T).
• (a) ⇒ (b). Assume f is a maximum flow in G.
If there is an augmenting path, then the flow cannot be maximum because
we can use such a path to increase the value of flow.
• (b) ⇒ (c). Assume there is no augmenting path in Gf. Let S be the set of nodes reachable from s in the residual network Gf. Then t is not in S. All edges from S to T = V − S are saturated. All edges from T to S have zero flow.
Therefore, |f| = f(S,T) = c(S,T).
• (c) ⇒ (a). Let (S,T) be a cut as in (c).
|f| = f(S,T) = c(S,T), but there is no flow of value greater than c(S,T), so
f is a maximum flow.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 26 / 40

Number of iterations in Ford-Fulkerson
• Depends on the strategy of selecting an augmenting path p in each iteration.
• If FORD-FULKERSON is run on a network such that all edge capacities are integral:
• The residual capacities are integral throughout the computation, so the value of the current total flow in G increases in each iteration by some integer, that is, by at least 1.
• Thus, if f∗ denotes a maximum flow, then the number of iterations is at most |f∗|.
• Hence the total running time is O(|f∗|m).
• Note that this bound does not depend on the selection strategy used.
• There is no general bound on the number of iterations which would depend only on the size of the network (on numbers n and m).
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 27 / 40

The Edmonds-Karp algorithm
The Edmonds-Karp algorithm is the Ford-Fulkerson method with the following strategy for selecting augmenting paths:
In each iteration select a shortest augmenting path (that is, a path with the fewest edges).
EDMONDS-KARP(G)
f ← zero flow in G  (f(u,v) = 0, for each (u,v) ∈ E)
loop
1) Construct residual network Gf
2) BFS in Gf from s to find a shortest augmenting path p
3) If there is no augmenting path in Gf , then exit
(the current flow is optimal)
4) Let p be the selected augmenting path
f ← f ↑ fp  (augment the current flow with flow fp)
end of loop
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
28 / 40
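The only new ingredient compared with the generic Ford-Fulkerson method is the breadth-first search in step 2; a Python sketch (illustrative names, standard-library deque):

from collections import deque

def shortest_augmenting_path(cf, s, t):
    """BFS in the residual network: a fewest-edges s-t path, or None.

    cf: dict (u, v) -> positive residual capacity.
    """
    adj = {}
    for (u, v) in cf:
        adj.setdefault(u, []).append(v)
    parent, queue = {s: None}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:                      # reconstruct the path back to s
            path = [t]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None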

The running time of Edmonds-Karp
• Each iteration takes O(m) time and there are O(nm) iterations, so the total running time is O(nm²).
• This O(nm²) bound does not depend on the values of the edge capacities, and does not depend on the maximum value of a flow.
• The bound O(nm) on the number of iterations follows from the following claim.
Claim: Let q denote the number of iterations in EDMONDS-KARP, and let k1, k2, k3, . . . , kq denote the lengths of the augmenting paths selected in iterations 1,2,3,…,q. We have:
(a) 1 ≤ k1 ≤ k2 ≤ k3 ≤ ... ≤ kq ≤ n − 1,
(b) the same length appears in the sequence ⟨k1, k2, ..., kq⟩ at most m times.
• This claim immediately implies that the number of iterations q ≤ nm.
• We omit the proof of this claim.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 29 / 40

Other maximum-flow algorithms
• “Augmenting paths” algorithms:
• Modified Edmonds-Karp algorithm – O(n²m)
(by achieving the average time of O(n) per one iteration)
• Dinic’s algorithm – O(n³)
• fastest known “blocking flow” algorithm – O(nm log n)
• “Preflow-push” algorithms (not part of this module, but see textbook CLRS, if interested):
• the generic preflow-push algorithm – O(n²m)
• the lift-to-front algorithm – O(n³)
• if a special data structure is used – O(nm log n)
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 30 / 40

Resource assignment & Maximum bipartite matching
• Consider the following problem (an example of “resource assignment” problems):
• We have a number of employees E1,E2,…,Ep and a number of tasks T1,T2,…,Tq which should be done.
• Each employee Ei can do some of the tasks, depending on the employee’s skills and the skills required by individual tasks.
• We want to allocate tasks to employees in such a way that:
• each employee gets at most one task;
• each task is given to at most one employee;
• each employee can get only a task which this employee can do;
• the largest possible number of tasks are allocated.
• This problem can be modeled as
the maximum bipartite matching problem.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
31 / 40

Maximum bipartite matching (cont.)
• Construct a bipartite graph with nodes E1, E2, . . . , Ep on one side and nodes T1,T2,…,Tq on the other side.
An edge (Ti, Ej) means that task Ti can be done by employee Ej.
(Diagrams omitted: the bipartite graph with tasks T1, T2, T3, ..., Ti on one side and employees E1, E2, E3, ..., Ej on the other; and the same graph with a matching highlighted.)
• A matching in a bipartite graph is a subset of edges M such that each node belongs to at most one edge in M.
• In our application, a matching is a feasible allocation (each employee gets at most one task, each task is given to at most one employee, and each employee can get only a task which this employee can do).
• Thus maximising the number of allocated tasks is equivalent to finding in the corresponding bipartite graph a matching of the maximum size.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 32 / 40

Solving the maximum bipartite matching as the maximum flow problem
• For a given bipartite graph B, construct the following flow network G, where all edge capacities are equal to 1:
(Diagrams omitted: the bipartite graph B, and the flow network G built from it with a new source s and sink t.)
• There is a one-to-one correspondence between matchings in B and integral flows in G: an edge (v,u) in a given matching corresponds to a path flow of value 1 from s to t which passes over edge (v,u).
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
33 / 40

Solving the maximum bipartite matching as the maximum flow (cont.)
• For a matching M in B and the corresponding integral flow f in G, the size of the matching M is equal to the value of the flow f.
• Thus finding a maximum-size matching in B is equivalent to finding a maximum integral flow in G.
• Use the Ford-Fulkerson method to find a maximum flow in G. Since the edge capacities in G are integral, the computed maximum flow is integral. The calculated maximum flow gives a maximum-size matching in B.
• The running time is O(mn), where n and m are the number of nodes and
the number of edges in B: at most n iterations (since the value of a
maximum flow in G is at most n), and each iteration takes O(m) time. 6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 34 / 40
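A sketch of the whole reduction in Python, reusing ford_fulkerson from the earlier sketch (the added source and sink names 'S' and 'T' are our own, and employee/task identifiers are assumed distinct):

def max_bipartite_matching(can_do):
    """can_do: list of (employee, task) pairs, i.e. the edges of B.

    Returns a maximum matching as a list of (employee, task) pairs.
    """
    capacity = {}
    for (e, t) in can_do:
        capacity[('S', e)] = 1      # source -> employee
        capacity[(e, t)] = 1        # employee -> task
        capacity[(t, 'T')] = 1      # task -> sink
    _, flow = ford_fulkerson(capacity, 'S', 'T')
    # all capacities are 1, so edge flows are 0 or 1;
    # the flow-1 middle edges are exactly the matching
    return [(e, t) for (e, t) in can_do if flow.get((e, t), 0) == 1]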

Examples and Exercises – LGT
1. Assume that the flow in the right diagram on slide 5 is the flow at the end of some iteration of the execution of the Ford-Fulkerson method. Continue this method to obtain a maximum flow.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 35 / 40

Examples and Exercises – LGT (cont.)
2. Trace the computation of the Edmonds-Karp maximum-flow algorithm on the input given below.
For each iteration, show the residual network at the beginning of the iteration, the selected augmenting path, and the total flow at the end of the iteration
Iteration 1
(Diagram omitted: the input flow network, with nodes s, a, b, c, d, h, p, t and edge capacities in parentheses.)
The residual network is the same as the input network. Show on the diagram above the selected augmenting path (which gives the total flow at the end of this iteration).
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 36 / 40

Exercises (cont.)
Iteration 2
(Diagrams omitted: the network, for marking the residual edges and the flow.)
Show all residual edges and their residual capacities. Show the selected augmenting path.
Show the total flow at the end of this iteration.
Continue until the computation terminates. 6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
37 / 40

Exercises (cont.)
3. Decompose the computed maximum flow into paths.
4. Show a minimum cut in this network. Is the minimum cut unique?
5. If we want to modify the network so that the maximum flow increases by 1 unit, how many edge capacities do we have to increase?
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method 38 / 40

Exercises – SGT
1. Trace the computation of the Ford-Fulkerson maximum-flow method on the input given below (the same flow network as in the LGT). In each iteration select the augmenting path which has maximum capacity.
For each iteration, show the residual network at the beginning of the iteration, the selected augmenting path, and the total flow at the end of the iteration
(Diagram omitted: the same flow network as in the LGT exercise, with nodes s, a, b, c, d, h, p, t.)
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
39 / 40

Exercises – SGT (cont.)
2. The following diagram shows who (employees E1, E2, . . . , E6) can execute which tasks (T1,T2,…,T6).
(Diagram omitted: a bipartite graph showing which of the employees E1, ..., E6 can execute which of the tasks T1, ..., T6.)
Trace the execution of the Edmonds-Karp algorithm when applied to calculate a maximum allocation of tasks to employees in this example. Whenever you need to break a tie between vertices, assume that the vertex corresponding to T (or E) with the lower index comes first. For example, the order of the adjacency list of the vertex T4 will be (E1, E5, E6).
Show the computed allocation of tasks.
6CCS3OME/7CCSMOME, Lecture 4: Network flow problems; Ford-Fulkerson method
40 / 40

6CCS3OME/7CCSMOME – Optimisation Methods
Lecture 5
Minimum cost flow problem Multicommodity flow problems
Tomasz Radzik and Kathleen Steinhöfel
Department of Informatics, King’s College London 2020/21, Second term

Minimum cost flow problem
(Diagram omitted: an example network with supplies +8 and +5, demands −6, −4 and −3, (capacity, cost) pairs on the edges, and a flow marked.)
Cost of this flow: 1·2 + 4·7 + 3·3 + ... = 114. Is there a cheaper solution?
A Flow Network: G = (V, E, u, c, d).
• V – set of nodes, E – set of directed edges (links).
• For each edge (v,w) ∈ E,
u(v,w) ≥ 0 is the capacity of edge (v,w) (the 1st number on the edge).
• For each node v ∈ V (as in the flow-feasibility problem),
|d(v)| is the (initial) supply (if d(v) > 0) or demand (if d(v) < 0) at v. Σ_{v∈V} d(v) = 0.
• For each edge (v,w) ∈ E, c(v,w) ≥ 0 is the cost of one unit of flow on edge (v,w) (the 2nd number on the edge).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 2 / 27

Minimum cost flow problem (cont.)
• Objective: Find a flow which satisfies all supply and demand and has the minimum possible cost, or determine that it is not possible to satisfy supplies/demands.
• Formally, a flow (in network G) is a function f : E → R, assigning flows to edges; f(v,w) is the non-negative (amount of) flow on edge (v,w), which satisfies the capacity constraints and the flow conservation constraints.
• The cost of a flow f is: Σ_{(v,w)∈E} c(v,w) f(v,w).
• Capacity constraints: For each edge (v,w) ∈ E, f(v,w) ≤ u(v,w).
• A technical assumption (for convenience): if (v,w) ∈ E, then (w,v) ∉ E.
(Diagram omitted: how a pair of antiparallel edges between v and w can be handled.)
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 3 / 27

Flow conservation constraints
• Net flow outgoing from node v: Σ_{(v,x)∈E} f(v,x) − Σ_{(z,v)∈E} f(z,v).
(Diagram: with edge flows a, b, c on one side of v and d, e on the other, the net flow out of v is (a+b+c) − (d+e).)
• Flow conservation constraints:
• the net flow outgoing from each transitional node (d(v) = 0) is 0;
• the net flow outgoing from each "supply" node v is at most d(v);
• the net flow incoming to each "demand" node v is at most |d(v)| = −d(v) (the demand at v), or equivalently, the net flow outgoing from v is at least d(v).
Formally, the net flow outgoing from v:
Σ_{(v,x)∈E} f(v,x) − Σ_{(z,v)∈E} f(z,v)   { = 0, if d(v) = 0;   ≤ d(v), if d(v) > 0;   ≥ d(v), if d(v) < 0 }
• A flow satisfies all supplies and demands, if the net flow outgoing from each vertex v is equal to d(v).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 4 / 27

Flow conservation constraints: example
(Diagram omitted: a network with supply nodes s1 (+8) and s2 (+5), demand nodes t1 (−6), t2 (−3), t3 (−4), transitional nodes a, b, g, v, and a flow.)
transitional nodes (a, b, g, v):  net flow out = 0.
supply nodes, net flow out ≤ d(v):  s1: 7 ≤ 8;  s2: 4 ≤ 5.
demand nodes, net flow in ≤ −d(v):  t1: 5 ≤ 6 (⇔ −5 ≥ −6);  t2: 3 ≤ 3 (⇔ −3 ≥ −3);  t3: 3 ≤ 4 (⇔ −3 ≥ −4).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 5 / 27

Application: Production and distribution problems
• A car manufacturer produces several car models in several plants and ships them to several retail centres.
• Each retail centre requests a specific number of cars of each model.
• Determine the production plan for each plant and the shipping schedule so that the total cost of production and transportation is minimized. (Application 9.1 in Ahuja, Magnanti and Orlin’s textbook.)
• We can model this problem as a minimum cost flow problem:
• four types of nodes: plant nodes, plant/model nodes, retailer/model nodes and retailer nodes;
• three types of edges: production edges, transportation edges and demand edges.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 6 / 27

Application: Production and distribution problems (cont.)
(Diagram omitted: the layered network, with supplies +180 at plant p1 and +120 at plant p2, and demands −100 at retailer r1 and −200 at retailer r2.)
• A production edge (pi, pi/mj): capacity – maximum possible production of cars mj at plant pi; cost – production cost of one car mj at plant pi.
• A transportation edge (pi/mj, rk/mj): transportation of cars mj from plant pi to retailer rk; capacity and transportation cost per one car.
• A demand edge (rk/mj, rk): capacity – demand for cars mj at retailer rk.
• Supplies at nodes pi (production capacities) and demands at nodes rk.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 7 / 27

Application: Production and distribution problems (cont.)
(Diagram omitted: the same network with an example flow.)
• An (integral) flow corresponds to a feasible production/transportation schedule.
• A minimum cost (integral) flow gives an optimal (minimum cost) production/transportation schedule.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 8 / 27

Sending flows along cheapest paths
• Take a node v which has positive remaining supply; find a cheapest path from v to a node w with positive remaining demand; send as much flow as possible from v to w along this path.
• Example: 1st path: (a, c, p); 2nd path: (a, i); 3rd path: (b, c, i).
(Diagram omitted: a network with supplies +5 at a and +7 at b, and demands −9 at i and −3 at p.)
• The cost of this flow: 2·3 + 3·1 + 7·5 + 7·4 + 3·0 = 72. Not optimal.
• Sometimes we have to "re-route" the flow to get the optimal cost.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 9 / 27

Residual network
• Let f be a flow in network G = (V,E,u,c,d).
• The residual capacity of an edge (w,v) ∈ E: uf(w,v) = u(w,v) − f(w,v).
(Diagram notation: x (y, z) on an edge means flow x, capacity y, cost z; e.g. flow 2 on an edge of capacity 3 leaves residual capacity 1.)
• If (w,v) ∈ E and f(w,v) > 0, then the residual capacity of a reverse edge (v,w) is uf(v,w) = f(w,v), and its cost is c(v,w) = −c(w,v).
(Diagram omitted: an edge (w,v) with flow 2, capacity 3 and cost 5; its residual edges are (w,v) with capacity 1, cost 5, and the reverse edge (v,w) with capacity 2, cost −5.)
The positive residual capacity of the reverse edge (v,w) represents the possibility of reducing (some of the) flow on the actual edge (w,v).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
10 / 27

Residual network (cont.)
• The negative costs of the reverse edges ensure that sending flow g(x, y) through a residual edge (x, y) changes the cost of the flow by c(x, y)g(x, y).
Sending flow g(v, w) through a reverse residual edge (v, w) represents reducing the flow on the actual edge (w, v) by g(v, w) and this reduces the cost of the flow by c(w, v)g(v, w), so the cost of flow changes by:
−c(w, v)g(v, w) = c(v, w)g(v, w).
• For each node v ∈ V , the residual supply/demand at v is
df(v) = d(v) − Σ_{(v,x)∈E} f(v,x) + Σ_{(z,v)∈E} f(z,v).
• The residual network of network G with respect to flow f is
Gf = (V, Ef, uf, c, df), where Ef is the set of residual edges (edges with positive residual capacities).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
11 / 27

Example: residual networks and cheapest paths in residual networks
(Diagrams omitted: the current flow in the input network; the residual network, with residual edges labelled [residual capacity, cost]; a flow in the residual network; and the new flow in the input network.)
Paths from b to i in the residual network, and their costs:
(b, c, i): 9;  (b, p, c, i): 8;  (b, c, a, i): 6;  (b, p, c, a, i): 5.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
12 / 27

Example (cont.)
(Diagrams omitted: the current flow fold in the input network; the flow fp in the residual network; and the new flow fnew = fold ↑ fp in the input network.)
flow fold: value(fold) = 2+3 = 5; cost(fold) = 3·2 + 1·3 + 0·3 = 9.
flow fp: value(fp) = 3; cost(fp) = (3 + 0 + (−1) + 3) · 3 = 15.
flow fnew = fold ↑ fp: value(fnew) = value(fold) + value(fp) = 5 + 3 = 8; cost(fnew) = cost(fold) + cost(fp) = 9 + 15 = 24 (= 5·3 + 3·3).
Additional cost £18 for sending the additional 3 units through edges (a,i) and (b,p), but £3 back because after all we’re not using edge (a,c).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
13 / 27

Successive shortest path algorithm
SUCCESSIVESP(G)
(Diagram omitted: an example input network with (capacity, cost) pairs, supplies and demands.)
f ← zero flow;
S = {v ∈ V : df(v) > 0}
while S is not empty do
compute the residual network Gf;
select a node v ∈ S;  { an arbitrary node v ∈ S would do }
compute a shortest-path tree T in Gf from v to all nodes reachable from v;
if there is a node w in T with residual demand (df(w) < 0) then
  P ← the path in T from v to w;  { any w ∈ T with df(w) < 0 would do }
  q ← min{ df(v), −df(w), min{uf(x,y) : (x,y) ∈ P} };
  fP ← the flow of value q in Gf along path P;
  f ← f ↑ fP;  { update flow f by sending q units of flow along path P }
  if df(v) = 0 then S ← S − {v};
else S ← S − {v}.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 14 / 27

Successive Shortest Paths algorithm: correctness and running time
• At the end of each iteration, the current flow f has the minimum cost among all flows which satisfy the same demands and supplies as flow f. This can be shown by induction on the number of iterations (not easy!).
• Costs of residual edges can be negative, but there are no negative cycles.
• Appropriate "h(v)" numbers can be (efficiently) maintained so that in each iteration the "re-weighted" costs are non-negative. The re-weighting scheme is as in Johnson's all-pairs shortest-paths algorithm:
ĉ(v,w) = c(v,w) + h(v) − h(w).
Thus in each iteration, the computation of a shortest path tree can be done by Dijkstra's shortest path algorithm, in O(min{m log n, n²}) time.
• If all input edge capacities and node supplies/demands are integral, then the number of iterations is at most Σ{d(v) : v ∈ V, d(v) > 0} (the total supply in the network).
• The number of iterations can be exponential in the number of nodes.
• There are algorithms for the minimum cost flow problem which run in
polynomial time. The best known asymptotic running time of an algorithm
for the minimum cost flow problem is O(m² log² n).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 15 / 27
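A hedged Python sketch of the shortest-path subproblem inside SUCCESSIVESP: because reverse residual edges carry negative costs, a plain Bellman-Ford works directly (there are no negative cycles, as noted above); the reweighted-Dijkstra variant from the slide would replace it for speed. All names are illustrative:

def cheapest_paths(res_edges, src):
    """Bellman-Ford over residual edges; negative edge costs allowed.

    res_edges: dict (x, y) -> (residual capacity, cost per unit of flow).
    Returns (dist, parent), a shortest-path tree rooted at src.
    """
    nodes = {u for e in res_edges for u in e}
    dist = {u: float('inf') for u in nodes}
    parent = {u: None for u in nodes}
    dist[src] = 0
    for _ in range(len(nodes) - 1):          # safe: no negative cycles
        for (x, y), (cap, cost) in res_edges.items():
            if cap > 0 and dist[x] + cost < dist[y]:
                dist[y] = dist[x] + cost
                parent[y] = x
    return dist, parent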

Multicommodity flow problems
• Input instance:
• a (directed) network G = (V, E, u), where u(x, y) is the capacity of an
edge (x, y) ∈ E;
• k commodities. Commodity q (1 ≤ q ≤ k) is specified by (sq,tq,dq),
where
• sq ∈ V and tq ∈ V are the origin (source) and the destination (target)
of commodity q,
• dq (> 0) is the (amount of) demand of this commodity.
• Design simultaneous flow of all commodities which satisfies the demands of all commodities, and optimises some specified global objective.
• Application: allocation of communication paths to requests in telecommunication networks.
• For some multicommodity flow problems, flows do not need to satisfy the edge capacity constraints.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
16 / 27

Example
• Specification of a network:
(Diagram omitted: a network on nodes a, b, c, d, p, q, with edge capacities.)
• Specification of commodities:
commodity   source   destination   amount
    1         d           b           3
    2         a           c           6
    3         b           q           4
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
17 / 27

Example (cont.)
• Simultaneous flow of all commodities:
(Diagram omitted: the three commodity flows drawn on the network.)
• Specification of commodities:
commodity   source   destination   amount
    1         d           b           3
    2         a           c           6
    3         b           q           4
• Remark: this flow does not satisfy the edge capacities given on the
previous slide.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
18 / 27

Flow of commodities
• A simultaneous flow of the specified commodities consists of separate flows of the individual commodities: ⟨f1, f2, . . . , fq, . . . , fk⟩
• A flow fq of commodity q is a function fq : E → R,
where fq (x, y) is the amount of flow of commodity q on edge (x, y), which satisfies the flow conservation constraints.
(The capacity constraints not always present!)
• Flow conservation for commodity q:
the net flow of commodity q outgoing from the origin sq is dq;
the net flow of commodity q incoming to the destination tq is dq (equivalently, the net flow outgoing from tq is equal to −dq);
the net flow outgoing from each node v ∈ V − {sq, tq} is equal to 0.
• Formally, the flow conservation constraints for commodity q say that for each node v, the net flow from v is equal to:
Σ_{(v,x)∈E} fq(v,x) − Σ_{(z,v)∈E} fq(z,v) =
dq, if v = sq;
−dq, if v = tq;
0, if v ∈ V − {sq, tq}.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 19 / 27

Feasibility multicommodity-flow problem
Design a simultaneous flow ⟨f1, f2, ..., fk⟩ of all commodities which satisfies the capacity constraints:
for each edge (v,w) ∈ E, the total flow on this edge is at most the capacity u(v,w) of this edge, that is,
f(v,w) = [the total flow on edge (v,w)] =defn Σ_{q=1..k} fq(v,w) ≤ u(v,w).
In this example (diagram omitted), f(p,q) = 3 + 2 + 4 = 9.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
20 / 27

Minimum-cost multicommodity-flow problem
• Each edge (v, w) ∈ E, in addition to its capacity u(v, w), has also a cost c(v, w) associated with it.
(Diagram omitted: the example network with (capacity, cost) pairs on the edges.)
• The cost of a simultaneous flow ⟨f1, f2, ..., fk⟩ is equal to
Σ_{(v,w)∈E} c(v,w) f(v,w) = Σ_{(v,w)∈E} c(v,w) [f1(v,w) + f2(v,w) + ··· + fk(v,w)].
• Design simultaneous flow of all commodities which satisfies the capacity
constraints and has the minimum possible cost.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 21 / 27

Minimum-congestion multicommodity-flow problem
• The congestion of flow on an edge (x,y) is defined as f(x,y) / u(x,y), where f(x,y) is the total flow on edge (x,y).
In this example (diagram omitted), if u(p,q) = 10 and u(a,p) = 3, then the congestion:
on edge (p,q) is 9/10 = 0.9 (or 90%), on edge (a,p) is 4/3 = 1.33... (or 133%).
• Design a simultaneous flow of all commodities which minimises the maximum congestion, that is,
minimise: max{ f(x,y)/u(x,y) : (x,y) ∈ E }.
• Note that the optimal (minimal) congestion may be greater than 100%. 6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
22 / 27

Computational approaches to multicommodity flow problems
• Specialised algorithms
A possible approach: start with some initial simultaneous flow, and iteratively keep improving it by re-routing (a fraction of) the flow of each commodity onto better paths.
Undirected edge capacities: 2; commodities: (b,c,2), (a,b,2), (a,c,2).
(Diagrams omitted: a triangle network on nodes a, b, c; one routing has congestion 1.5 and cannot be improved by rerouting one path, while the optimal congestion is 1.0.)
• Mathematical programming
Express the multicommodity flow problem as a linear program (or other appropriate “mathematical program”) and use the general linear programming methods (a number of commercial and public-domain linear
programming packages are available).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 23 / 27

Exercises – LGT
1. We have an input network G for a minimum-cost flow problem and a flow f in this network which satisfies all supply and demand requirements and edge capacity constraints. Argue that if f′ is a flow in the residual network Gf which is positive only on the edges of one cycle, then f ↑ f′ is a flow in G which satisfies all supply and demand requirements and edge capacity constraints.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 24 / 27

Exercises – LGT (cont.)
2. A network and flow in this network are shown below. The numbers in parentheses next to an edge are the capacity (the first number) and the cost of this edge. The flow is highlighted in red.
(Diagram omitted: the network with supplies +8 and +5, demands −6, −4 and −3, (capacity, cost) pairs on the edges, and the current flow highlighted in red.)
(a) Show that there is a negative cost cycle in the residual network.
(b) Improve the cost of flow by sending flow along a negative cycle in the residual network. How does the flow and the cost of flow change?
(c) Compute a minimum-cost flow in this network using the Successive Shortest Paths algorithm. Give priority to the vertex with the initial supply +5 (when selecting from set S).
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 25 / 27

Exercises – SGT
Consider the following instance of the minimum-congestion multicommodity-flow problem.
Network:
(Diagram omitted: a network on nodes a, b, c, d, p, q with edge capacities in parentheses.)
Commodities:
commodity   source   destination   demand
    1         b           d           3
    2         a           q           7
    3         b           c           4
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow
26 / 27

Exercises – SGT (cont.)
1. Check that the flow of each commodity given in the table below satisfies the flow conservation constraints. What is the maximum edge congestion?
edge →        (a,b)  (a,q)  (b,c)  (b,p)  (c,d)  (p,d)  (p,q)  (q,c)
commodity 1:    0      0      2      1      2      1      0      0
commodity 2:    3      4      0      3      0      0      3      0
commodity 3:    0      0      3      1      0      0      1      1
2. Show that the maximum edge congestion of the above flow can be improved by re-routing some flow of commodity 3 from edge (b,c) onto the path ⟨b, p, q, c⟩. By how much can you reduce this way the maximum edge congestion? We allow fractional flows (that is, flows on edges can be fractional numbers).
3. By checking the total capacity of the edges going from the set of nodes {a,b} to the set of the remaining nodes {c,d,p,q}, argue that for this input, there is no flow which has the maximum edge congestion less than 7/6 ≈ 1.17.
6CCS3OME/7CCSMOME, Lecture 5: Min-cost flow, Multicommodity flow 27 / 27

Optimisation Methods
Kathleen Steinhöfel and Tomasz Radzik
6CCS3OME
Combinatorial Optimisation, SAT, MST
Combinatorial Optimisation, SAT, MST 1 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Combinatorial Optimisation
A combinatorial optimisation problem:
We can view an instance of a combinatorial optimisation problem as a pair (R, C), where
R – a (finite) set of (combinatorial) configurations, given in some implicit way (that is, not given as an explicit list of all configurations);
C – a cost function, C : R → Real numbers, assigning to each configuration its cost; specified as a procedure for calculating the cost of a given configuration.
Problem: find a configuration with the minimum cost. That is, find a configuration config ∈ R such that the cost C(config) of this configuration is as small as possible, i.e., there is no configuration with a smaller cost.
Combinatorial Optimisation, SAT, MST 2 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Example
Let’s look at how solving a problem instance can also be formulated as an optimisation problem.
. . . I was really good this year, so please
bring me a present OR
bring my mum a present OR
DO NOT bring my little brother a present.
Let’s represent the letter: (x1 ∨ x2 ∨ ¬x3)
There are 7 ways to satisfy this wish: (x1, x2, x3) =
{(000), (010), (011), (100), (101), (110), (111)}.
Combinatorial Optimisation, SAT, MST 3 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Can Santa satisfy all letters?
F = (x1 ∨ x2 ∨ ¬x3) ∧ (x3 ∨ x4 ∨ x5) ∧ (x1 ∨ x5 ∨ x6) . . .
Definition (Satisfiability Problem – SAT)
For a given CNF (conjunctive normal form) of a Boolean function over variables (x1, ..., xn), decide whether or not there is an assignment φ(x1,...,xn) with all xi ∈ {0,1} such that F(φ) = 1.
Satisfiability is one of the core combinatorial problems with many applications. First problem that was shown to be NP-complete, i.e. computationally hard.
Combinatorial Optimisation, SAT, MST 4 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Computationally Hard Problems
What is an algorithm?
What are the fundamental capabilities and limitations of computers? When should an algorithm be considered practically feasible?
What makes some problems computationally hard and others easy?
Combinatorial Optimisation, SAT, MST 5 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Time Complexity of Algorithms
Combinatorial Optimisation, SAT, MST 6 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Time Complexity of Problems
Combinatorial Optimisation, SAT, MST 7 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

NP, NP-Complete, and NP-Hard Problems
Combinatorial Optimisation, SAT, MST 8 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Back to Santa
Let’s restrict the letter size to three lines/wishes.
With n people sending possibly more than one letter, we have many
possible letters:
C(2n, 3) = 2n · (2n − 1) · (2n − 2) / (1·2·3) ≈ n³.
Santa decides for everyone whether they do or do not get a present. There are 2^n possible combinations.
Definition (MAX-3-SAT)
For a given CNF with clause length three find a φ that maximises the number of satisfied clauses.
Combinatorial Optimisation, SAT, MST 9 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25
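The objective function of MAX-3-SAT is simple to evaluate; a small Python sketch (our own conventions, not the slides': a clause is a tuple of non-zero integers, with i standing for xi and -i for ¬xi):

def satisfied_clauses(clauses, phi):
    """Count the clauses of a CNF satisfied by the assignment phi.

    clauses: list of tuples of non-zero ints (i for x_i, -i for NOT x_i).
    phi:     dict mapping variable index i -> 0 or 1.
    """
    def lit_true(lit):
        value = phi[abs(lit)]
        return value == 1 if lit > 0 else value == 0
    return sum(any(lit_true(l) for l in clause) for clause in clauses)

# Santa's letter (x1 OR x2 OR NOT x3): 7 of the 8 assignments satisfy it
letter = [(1, 2, -3)]
assert satisfied_clauses(letter, {1: 0, 2: 0, 3: 1}) == 0
assert satisfied_clauses(letter, {1: 0, 2: 0, 3: 0}) == 1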

3-SAT:
NP-complete decision problem. Reduced from general SAT. Just a few letters: likely to satisfy all.
Lots and lots of letters: probably not satisfiable. Phase-transition of 3-SAT.
MAX SAT:
It is a combinatorial optimisation problem.
There are 2^n different φ among which we want to find one φmax that maximises the number of satisfied clauses.
Already for n ≥ 300 there are more φs than atoms in the observable universe.
It is computationally hard to find a φmax .
Combinatorial Optimisation, SAT, MST 1 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Other Problems
Combinatorial Optimisation, SAT, MST 1 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Number of Edges in Complete Graphs
Combinatorial Optimisation, SAT, MST 1 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Number of Edges in Tree Graphs
Combinatorial Optimisation, SAT, MST 1 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Spanning Trees
Combinatorial Optimisation, SAT, MST 1 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Number of Spanning Trees
Combinatorial Optimisation, SAT, MST 1 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Weighted Graphs
Combinatorial Optimisation, SAT, MST 2 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Minimum Cost Spanning Trees (MST)
Combinatorial Optimisation, SAT, MST 2 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Prim’s Algorithm for Finding MST
Combinatorial Optimisation, SAT, MST 2 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Growing Forests to Find a MST
Combinatorial Optimisation, SAT, MST 2 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Kruskal’s Algorithm for Finding a MST
Combinatorial Optimisation, SAT, MST 2 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25

Complexity of MST Problems
Prim and Kruskal solve the MST problem in polynomial time.
Prim’s: O(|V|²); with a good choice of data structure, O(|E| log |V|). Kruskal’s: O(|E| log |V|).
Related problems that we cannot solve that easily: Degree-constrained minimum spanning tree. Steiner Tree Problem.
Finding MST has many applications. Most variations arise in VLSI design.
Later during the course we will explore how an efficient algorithm for finding MST can help finding approximate solutions for the TSP.
Combinatorial Optimisation, SAT, MST 2 Kathleen Steinh ̈ofel (KCL) Lecture 1 / 25
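The algorithms themselves were shown as figures on the slides above; as a stand-in, here is a minimal Kruskal sketch with union-find (our own illustrative Python, matching the O(|E| log |V|) bound just quoted):

def kruskal(n, edges):
    """Minimum spanning tree of a connected graph on vertices 0..n-1.

    edges: list of (weight, u, v) triples. Returns the chosen edges.
    """
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):     # edges in non-decreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                  # different components: no cycle created
            parent[ru] = rv
            tree.append((w, u, v))
    return tree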

Optimisation Methods
Kathleen Steinhöfel and Tomasz Radzik
6CCS3OME
Complexity Classes, TSP, Approximation
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 1 / 26

NP, NP-Complete, and NP-Hard Problems
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 2 / 26

Problems and Complexity Classes
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 3 / 26

P, NP, NP-hard
One of the Millennium Problems: Click here
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 4 / 26

Methods
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 5 / 26

Hamiltonian Cycles
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 6 / 26

HCs in Complete Graphs
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 7 / 26

Classic TSP
The input is a complete, weighted graph G = (V,E,w), where set V represents the cities to be visited, E represents the edges between cities, and the weights on edges given by w are all positive.
Decision problem: Given a target k, is there a tour which visits each city exactly once and has total weight less than k?
Optimisation problem: Determine the tour which visits each city exactly once and has the minimum possible total weight.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 8 / 26

Branch-and-Bound
Consider the following instance of a TSP with n = 5 cities.
The example is from ”Foundations of Algorithms” by R. Neapolitan and K. Naimipour (see Reading List).
      1    2    3    4    5
1     0   14    4   10   20
2    14    0    7    8    7
3     4    5    0    7   16
4    11    7    9    0    2
5    18    7   17    4    0
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 9 / 26

Branch-and-Bound: Example
Branch-and-Bound is an exhaustive method that evaluates partial configurations and discards those which cannot lead to an optimal solution.
The root is the city we start from. Each leaf represents a complete tour. The number of leaves for an n-city instance of TSP:
(n − 1)(n − 2) · · · 2 = (n − 1)!
For example: 4! = 1·2·3·4 = 24, 15! ≈ 1.3·10^12
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 10 / 26

Branch-and-Bound: Notions
Root: There is a path from the initial partial configuration to any configuration.
Internal Node: Partially specified configuration, which can be extended to a number of complete configurations.
Branching: takes as input a partial configuration P (which is not a complete configuration) and produces partial configurations P1,P2,…,Pk which extend P.
Bounding: A bounding procedure computes for a given partial configuration P, a lower bound on the cost of any configuration which can be obtained by extending configuration P.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 11 / 26

Branch-and-Bound for TSP
For TSP with 1,2,··· ,n cities:
Partial configuration: a sequence of distinct cities starting at city 1. Formally, a sequence of integers ⟨i1, i2, · · · , ik ⟩, with 1 ≤ k ≤ n − 1 and i1 = 1. All i1, i2, · · · , ik being distinct. Example: ⟨1, 3, 2, 7⟩.
Initial Partial Configuration: ⟨1⟩
Branching: For a partial configuration ⟨i1 , i2 , · · · , ik ⟩ output all partial configurations that extend the sequence by one city, which is not included in the partial configuration: ⟨i1,i2,··· ,ik,x⟩ with
x ̸= ij,1 ≤ j ≤ k.
For example, if n = 6 and the input is ⟨1, 3, 4⟩ then the output is ⟨1,3,4,2⟩, ⟨1,3,4,5⟩ and ⟨1,3,4,6⟩
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 12 / 26

Branch-and-Bound for TSP
Bounding: A procedure that computes for a given partial configuration P, a lower bound on the cost of any configuration which can be obtained by extending configuration P.
An example of a bounding procedure for TSP is:
For a P = ⟨i1, i2, · · · , ik ⟩, with k ≤ n − 2, sum the cost of the partial tour and the minimum outgoing edge from cities not already connected in P.
More formally, the procedure calculates a bound via the following sum:
CLB(P) = c(i1,i2) + c(i2,i3) + ··· + c(ik−1,ik)
       + min{c(ik,x) : x ∉ {i1, i2, ···, ik}}
       + Σ_{v ∉ {i1,i2,···,ik}} min{c(v,x) : x ∉ {i2, i3, ···, ik, v}}.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 13 / 26

Observation
If the input is the initial partial configuration ⟨1⟩, then sum, over all cities, the minimum cost of a link outgoing from each city:
Σ_{i=1..n} min{c(i,x) : x ∈ {1, 2, ···, n} − {i}}
This gives a lower bound on the cost of any TSP tour; in particular, a lower bound on the minimum cost of a TSP tour.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 14 / 26

Back to our Example
In our example, the bounding procedure on ⟨1⟩ gives 21.
      1    2    3    4    5
1     0   14    4   10   20
2    14    0    7    8    7
3     4    5    0    7   16
4    11    7    9    0    2
5    18    7   17    4    0
We have
c (1, 3) + c (2, 3) + c (3, 1) + c (4, 5) + c (5, 4) = 4 + 7 + 4 + 2 + 4 = 21.
Note: This is a lower bound, i.e., any tour of this TSP instance has a cost greater or equal to 21.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 15 / 26

Back to our Example
For a partial tour P = ⟨1, 2, 3⟩ our bounding procedure returns 34.
      1    2    3    4    5
1     0   14    4   10   20
2    14    0    7    8    7
3     4    5    0    7   16
4    11    7    9    0    2
5    18    7   17    4    0
Sum of the cost of C (P ) = c (1, 2) + c (2, 3) = 14 + 7 = 21 and minimum outgoing edge of 3 not going back min{c (3, 4), c (3, 5)} = 7 and minimum outgoing edges of cities not in P again not going back except for 1: min{c(4, 1), c(4, 5)} + min{c(5, 1), c(5, 4)} = 2 + 4 = 6. Therefore, we have 21 + 7 + 6 = 34.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 16 / 26
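The bounding procedure transcribes directly into Python (cities are 1-based in the slides, so city i is row i-1 of the 0-based matrix; the asserts reproduce the values 21 and 34 computed above):

COST = [[0, 14, 4, 10, 20],
        [14, 0, 7, 8, 7],
        [4, 5, 0, 7, 16],
        [11, 7, 9, 0, 2],
        [18, 7, 17, 4, 0]]

def lower_bound(P, cost):
    """CLB(P) for a partial tour P, a list of 1-based city labels."""
    n = len(cost)
    used = set(P)
    # cost of the partial tour itself
    bound = sum(cost[P[i] - 1][P[i + 1] - 1] for i in range(len(P) - 1))
    if len(P) < n:
        # cheapest way to leave the last city of P (not back into P)
        bound += min(cost[P[-1] - 1][x - 1]
                     for x in range(1, n + 1) if x not in used)
        # for every city v outside P: its cheapest outgoing edge, avoiding
        # P's interior cities (returning to city 1 is allowed)
        interior = set(P[1:])
        for v in range(1, n + 1):
            if v not in used:
                bound += min(cost[v - 1][x - 1]
                             for x in range(1, n + 1)
                             if x != v and x not in interior)
    return bound

assert lower_bound([1], COST) == 21
assert lower_bound([1, 2, 3], COST) == 34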

Branch-and-Bound Pseudocode
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 17 / 26
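The pseudocode on this slide was a figure that did not survive the export, so the following best-first sketch (our own, reusing lower_bound from the sketch above) only illustrates how branching and bounding fit together for TSP:

import heapq

def branch_and_bound_tsp(cost):
    """Best-first branch-and-bound; returns (best tour cost, best tour)."""
    n = len(cost)
    best_cost, best_tour = float('inf'), None
    heap = [(lower_bound([1], cost), [1])]        # (bound, partial tour)
    while heap:
        bound, P = heapq.heappop(heap)
        if bound >= best_cost:                    # prune: cannot beat best
            continue
        if len(P) == n:                           # complete: close the tour
            total = (sum(cost[P[i] - 1][P[i + 1] - 1] for i in range(n - 1))
                     + cost[P[-1] - 1][P[0] - 1])
            if total < best_cost:
                best_cost, best_tour = total, P
            continue
        for x in range(2, n + 1):                 # branching
            if x not in P:
                child = P + [x]
                b = lower_bound(child, cost)
                if b < best_cost:                 # bounding
                    heapq.heappush(heap, (b, child))
    return best_cost, best_tour

Calling branch_and_bound_tsp(COST) explores the configuration tree rooted at ⟨1⟩ and returns an optimal tour for the 5-city instance.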

Symmetric TSP with ∆ineq.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 18 / 26

Christofides’ Algorithm
For the symmetric TSP (∆ineq.) 2-approximation algorithms were known, e.g. Double Tree, Nearest Addition.
In 1976, Christofides published the following algorithm
1 Find the MST T with distance matrix [di,j].
2 Find the nodes of T with odd degree and find the shortest complete match M in the complete graph consisting of these nodes only. Let G be the multigraph with nodes {1, . . . , n} and edges those in T and M.
3 Find an Eulerian walk of G and an embedded tour.
It finds a 1.5-approximate tour.
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 19 / 26
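A sketch of the three steps using networkx (assuming a complete nx.Graph with 'weight' attributes satisfying the triangle inequality, and assuming networkx's standard minimum_spanning_tree, max_weight_matching and eulerian_circuit helpers; a sanity sketch, not a tuned implementation):

import networkx as nx

def christofides(G):
    """1.5-approximate TSP tour on a complete metric graph G."""
    T = nx.minimum_spanning_tree(G, weight='weight')       # step 1: MST
    odd = [v for v, deg in T.degree() if deg % 2 == 1]     # step 2a: odd nodes
    # step 2b: minimum-weight perfect matching on the odd-degree nodes
    # (weights negated so that max_weight_matching minimises total weight)
    H = nx.Graph()
    for i, u in enumerate(odd):
        for v in odd[i + 1:]:
            H.add_edge(u, v, weight=-G[u][v]['weight'])
    M = nx.max_weight_matching(H, maxcardinality=True)
    # step 3: Eulerian walk of T + M, shortcutting repeats to embed a tour
    multi = nx.MultiGraph(T)
    multi.add_edges_from(M)
    tour, seen = [], set()
    for u, _ in nx.eulerian_circuit(multi):
        if u not in seen:
            seen.add(u)
            tour.append(u)
    return tour + [tour[0]]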

Example
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 20 / 26

Step 1 – MST
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 21 / 26

Step 2a – Odd Degree Nodes
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 22 / 26

Step 2b – Minimum Matching
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 23 / 26

Step 3a – Eulerian Walk
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 24 / 26

Step 3b – Embedded Tour
Kathleen Steinh ̈ofel (KCL) Lecture 2 CC,TSP,BB 25 / 26

Proof
Theorem
Christofides’ Algorithm is a 1.5-approximation algorithm.
Proof:
MST and matching can be done in polynomial time.
The constructed tour is feasible.
The cost of the produced tour is at most the cost of the Euler tour, which is cost(T) + cost(M).
We know that cost(T) < OPT. We can show that cost(M) < OPT/2.
Kathleen Steinhöfel (KCL) Lecture 2 CC,TSP,BB 26 / 26

Optimisation Methods
Kathleen Steinhöfel and Tomasz Radzik
6CCS3OME
Linear Programming
Kathleen Steinhöfel (KCL) Lecture 3 LP 1 / 24

General Optimisation Problems
We have discussed how to formulate problems as Combinatorial Optimisation Problems, where we have an implicit list of candidate configurations R and a cost function C, such that we want to find an R ∈ R such that C(R) is a minimum.
This is useful for problems that have discrete values representing candidate configurations.
For General Optimisation Problems we have:
Decision variables representing the problem components;
Constraints on the values of the variables;
An objective function.
Note: If we do not specify an objective function, then we are asking for a feasible solution, an assignment of values to our variables such that all constraints are satisfied.
Kathleen Steinhöfel (KCL) Lecture 3 LP 2 / 24

Introductory Example
Activity          OME   Project   Other Modules
Studying Maths     4      -1          0.5
Coding             1       5          1
Our decision variables:
x1 time spent on studying maths.
x2 time spent on coding.
Constraints in order to get a first in OME: 4x1 + x2 ≥ 70  (note that x1 ≥ 0)
Constraints to get a 2:1 in the project: −x1 + 5x2 ≥ 60  (note that x2 ≥ 0)
For other modules: (1/2)x1 + x2 ≥ 50
Kathleen Steinhöfel (KCL) Lecture 3 LP 3 / 24

Finding a Solution Graphically
Plotting the constraints, we get a feasible region.
Note: Any values in the green region satisfy all constraints.
Which values for x1, x2 minimise the hours worked?
Kathleen Steinhöfel (KCL) Lecture 3 LP 4 / 24

Objective Function
Minimise: z = x1 + x2
Move the dotted line towards the feasible region.
The dotted lines describe a plane in the 3D space.
Kathleen Steinhöfel (KCL) Lecture 3 LP 5 / 24

3D Plot
Exercise: Find the values of x1 and x2 and the resulting corresponding values for q1 and q2.
Kathleen Steinhöfel (KCL) Lecture 3 LP 6 / 24

Formal Definition of Linear Programming
A mathematical program is an optimisation problem in which objectives and constraints are given as mathematical functions and relations. We have a linear program if the objective and all of the constraints are linear.
Find values of the decision variables x1, ..., xn, which minimise
f(x1,..,xn) = c1x1 + c2x2 + ··· + cnxn
subject to
a11x1 + a12x2 + ··· + a1nxn ≤ b1
a21x1 + a22x2 + ··· + a2nxn ≤ b2
...
aj1x1 + aj2x2 + ··· + ajnxn ≤ bj
...
am1x1 + am2x2 + ··· + amnxn ≤ bm
l1 ≤ x1 ≤ u1, ..., ln ≤ xn ≤ un
∀c1,..,cn ∈ R, ∀b1,..,bm ∈ R, and ∀aij ∈ R, for i = 1,..,n and j = 1,..,m.
Kathleen Steinhöfel (KCL) Lecture 3 LP 7 / 24

Some Observations
f(x1,..,xn) is called the objective function.
Instead of asking to minimise f(x1,..,xn) we could maximise it.
Linear programmes can be solved in polynomial time, i.e., the linear programming problem is in class P.
The objective function is a hyperplane in the (n+1)-dimensional space.
If we have n variables, then the region of feasible solutions is k-dimensional, for 0 ≤ k ≤ n (if not empty). That is why evaluating the objective function only at the intersections is enough to find a minimum value. When two intersections have an equal minimum value, then so do all points on the line that connects them.
No objective function means we only want to know if there are values of the variables which satisfy all constraints: the feasibility problem.
Kathleen Steinhöfel (KCL) Lecture 3 LP 8 / 24

Methods
Algorithms: simplex method; interior-point method; variations of these.
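As a concrete illustration, the introductory study-hours programme can be handed to any such solver; a minimal Python sketch with SciPy's linprog (an assumed available package, not one the module prescribes):

from scipy.optimize import linprog

# minimise x1 + x2 subject to the three study-hours constraints;
# linprog expects A_ub @ x <= b_ub, so each ">=" row is negated
result = linprog(
    c=[1, 1],
    A_ub=[[-4, -1],      # 4*x1 +   x2 >= 70
          [1, -5],       #  -x1 + 5*x2 >= 60
          [-0.5, -1]],   # x1/2 +   x2 >= 50
    b_ub=[-70, -60, -50],
    bounds=[(0, None), (0, None)],
)
print(result.x, result.fun)   # optimum near x1 ≈ 5.7, x2 ≈ 47.1, z ≈ 52.9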
(For the module, we focus on formulating problems as mathematical programmes only.)
Commercial packages: CPLEX, MINOS, GIPALS, ... (See Additional Material on KEATS; CPLEX has a student version free of charge.)
Open access packages: GLPK (GNU Linear Prog. Kit), LP Solve, ...
Kathleen Steinhöfel (KCL) Lecture 3 LP 9 / 24

Feasibility and Bounded Regions
Definition
We say a linear programme is feasible if there is an assignment of values to variables x1, ..., xn that satisfies all constraints. Otherwise, it is infeasible. The set of all feasible solutions is called the feasible region.
Definition
An assignment of values to variables x1, ..., xn for which f(x) is the maximum (or minimum) among all feasible assignments, we call an optimal solution. If a programme has some feasible solution but no optimal solution, we say it is unbounded.
Note: We can have optimal solutions even when the feasible region of a given linear programme is unbounded. (As seen in the introductory example of studying hours.)
Kathleen Steinhöfel (KCL) Lecture 3 LP 10 / 24

Formulating Problems on Graphs as LPs
Maximum Flow as LP
Recall some notation: c(u,v) ≥ 0 is the (non-negative) capacity of each edge (u,v). A flow is a non-negative real-valued function f : V × V → R that satisfies the capacity constraints and flow conservation. Let's assume c(u,v) = 0 if (u,v) ∉ E.
Problem: Find the maximum possible flow from source to sink while making sure to satisfy the capacity constraints of each edge and flow conservation in all nodes but the source and sink ones.
Kathleen Steinhöfel (KCL) Lecture 3 LP 11 / 24

Max Flow as Linear Programme
maximise   Σ_{v∈V} fsv − Σ_{v∈V} fvs
subject to Σ_{v∈V} fvu = Σ_{v∈V} fuv   for each u ∈ V − {s,t}
           fuv ≤ c(u,v)                for each u,v ∈ V
           fuv ≥ 0                     for each u,v ∈ V
Objective function: Maximise the net flow from the source.
First set of constraints: Satisfying flow conservation for each node that is neither the source nor the sink;
Second set of constraints: Not allowing more flow through an edge than its capacity, and
Third set of constraints: Making sure all flows are non-negative.
Kathleen Steinhöfel (KCL) Lecture 3 LP 12 / 24

Some Observations
How many variables do we have? |V|²
How many constraints? 2|V|² + |V| − 2
Note: We are counting a given fbd as a variable even though (b,d) might not be an edge of the graph.
Does the number of variables matter? Yes
The run-time of solvers is a function over the input size, i.e., the number of variables contributes to that size.
The student version of CPLEX is limited in the number of variables you can use.
Note: A good practice for actual implementations is not to add variables fbd if (b,d) ∉ E.
Kathleen Steinhöfel (KCL) Lecture 3 LP 13 / 24

Example for Max Flow as LP
Kathleen Steinhöfel (KCL) Lecture 3 LP 14 / 24

Minimum-cost Flow Problem as LP
So far, we have seen linear programmes that solve problems for which we already had polynomial algorithms.
However, for minimum-cost flow, we only saw an exponential-time algorithm (there are polynomial ones as well). Based on the methods we have seen so far, formulating the Minimum-cost Flow Problem as a linear programme is going to offer an actual improvement on the running time. (Recall LPs can be solved in polynomial time.)
Let us recall notation:
d(u,v) is the amount of flow we want from u to v.
u(u,v), as before, is the capacity of each edge.
c(u,v) is the cost to send 1 unit of flow over edge (u,v). Therefore, c(u,v)fuv is the cost of sending fuv of flow over (u,v).
We assume no antiparallel edges.
(Easy to deal with them if any arise.)
Kathleen Steinhöfel (KCL) Lecture 3 LP 15 / 24

Formulation
minimise   Σ_{(u,v)∈E} c(u,v) fuv
subject to Σ_{v∈V} fvu − Σ_{v∈V} fuv = d(u)   for each u ∈ V
           fuv ≤ u(u,v)                        for each u,v ∈ V
           fuv ≥ 0                             for each u,v ∈ V
Objective function: Minimise the total cost of the flow.
First set of constraints: Satisfying flow conservation for each node v with net flow d(v);
Second set of constraints: Not allowing more flow through an edge than its capacity, and
Third set of constraints: Making sure all flows are non-negative.
Kathleen Steinhöfel (KCL) Lecture 3 LP 16 / 24

Minimum-cost Flow as LP
Kathleen Steinhöfel (KCL) Lecture 3 LP 17 / 24

Multicommodity Flow as LP
For this problem, the only known polynomial-time algorithm expresses it as an LP and solves it with a polynomial-time linear-programming algorithm. Here, flows can be non-integers; otherwise the problem becomes NP-hard.
Description of the problem: k commodities, for each 1 ≤ q ≤ k we have (sq, tq, dq), where
sq is the source for commodity q.
tq is the destination of commodity q.
dq ≥ 0 is the (amount of) demand of commodity q.
c(u,v), as before, is the capacity of each edge.
fquv is the flow of commodity q over edge (u,v).
fuv = Σ_{q=1..k} fquv is the aggregate flow over edge (u,v).
We assume no antiparallel edges.
This is a feasibility problem: Is it possible to send each commodity to its destination? We do not optimise anything, just ask whether it is possible.
Kathleen Steinhöfel (KCL) Lecture 3 LP 18 / 24

Feasibility as LP
minimise   0
subject to fuv ≤ c(u,v)                          for each u,v ∈ V
           Σ_{v∈V} fqvu = Σ_{v∈V} fquv           for each q = 1,...,k and for each u ∈ V − {sq,tq}
           Σ_{v∈V} fqsqv − Σ_{v∈V} fqvsq = dq    for each q = 1,...,k
           fquv ≥ 0                              for each u,v ∈ V and q = 1,...,k
Exercise: Do we need to assure Σ_{v∈V} fqtqv − Σ_{v∈V} fqvtq = −dq ?
Kathleen Steinhöfel (KCL) Lecture 3 LP 19 / 24

Observations
Objective function: Set as null. (This is a feasibility problem!)
First set of constraints: Capacity constraints, i.e., not allowing more flow through an edge than its capacity,
Second set of constraints: Conservation constraints, i.e., satisfying flow conservation for each node that is neither the source nor the destination for a given commodity q,
Third set of constraints: The net flow out of the source sq is dq,
Fourth set of constraints: All flows are non-negative.
For the Minimum-cost multicommodity-flow problem: Set the objective function similar to the minimum-cost flow problem (single-commodity).
Kathleen Steinhöfel (KCL) Lecture 3 LP 20 / 24

Minimum-congestion Multicommodity Flow
Let's add the requirement that the total flow over each edge (summing up all commodities) divided by its capacity is minimised. In other words, we minimise the most congested edge (the max congestion to be called λ).
objective: minimise max{ fuv / c(u,v) : (u,v) ∈ E }
That is not a linear objective function. How can we fix it?
minimise λ subject to fuv / c(u,v) ≤ λ for each (u,v) ∈ E.
Note: These constraints can be written as, for each edge (u,v),
fuv − c(u,v)λ ≤ 0
Kathleen Steinhöfel (KCL) Lecture 3 LP 21 / 24

Minimum-congestion Multicommodity Flow as LP
minimise   λ
subject to fuv − c(u,v)λ ≤ 0                     for each u,v ∈ V
           Σ_{v∈V} fqvu = Σ_{v∈V} fquv           for each q = 1,...,k and for each u ∈ V − {sq,tq}
           Σ_{v∈V} fqsqv − Σ_{v∈V} fqvsq = dq    for each q = 1,...,k
           fquv ≥ 0                              for each u,v ∈ V and q = 1,...,k
Objective function: Minimise the congestion.
First set of constraints (new): Make sure that, for all edges, we have
Σ_{q=1..k} fquv / c(u,v) ≤ λ
Recall that Σ_{q=1..k} fquv = fuv. We allow edges to overflow.
Observations

We can add constraints to linear programmes very easily.
1. For example, if we want the total flow over (b,d) to equal the flow over (u,v), we just add the line f_bd − f_uv = 0.
2. Or perhaps we want something like f_bd ≥ f_uv.
3. We might have a leaking edge: say (b,d) loses 5% of its flow (like a water pipe); then the balance equation for d will be something like

    f_dv + ··· + f_dx − f_gd − ··· − 0.95·f_bd = 0

where the first group of terms covers the outgoing edges of d and the second group the incoming edges, with the leaking edge (b,d) delivering only 95% of its flow.

Conclusions

Let's highlight some key points:
Optimal solutions to an LP (when they exist) are attained at the boundary of the feasible region.
Linear programmes can be solved in polynomial time.
- The Simplex Method has a worst-case runtime exponential in the problem size (Klee and Minty, 1972).
- It is nevertheless widely used, as it has good average-time performance.
- The first worst-case polynomial-time algorithm for solving LPs, the ellipsoid method, runs in O(n^6 L) time (Leonid Khachiyan, 1979).
For Maximum Flow, polynomial-time combinatorial methods exist, so LP is not needed there.
For Multicommodity Flow and Minimum-congestion Multicommodity Flow, all known polynomial-time algorithms use LP.
Note: If we have the additional constraint that the variables x take integral values, then we have an integer linear-programming (ILP) problem. Determining whether an ILP problem has a feasible solution, however, is NP-hard.

Optimisation Methods
Kathleen Steinhöfel and Tomasz Radzik, 6CCS3OME
Lecture 4: Solving NP-hard Problems (Heuristic Search)

NP-hard Problems

Problems in NP are decision problems. We are discussing optimisation problems whose corresponding decision problem is in NP. For some NP problems, no polynomial-time methods are known. (Remember: finding a deterministic polynomial-time method for an NP-hard problem would imply the existence of polynomial-time algorithms for all problems in NP. Note: not all NP-hard problems are in NP.)

What can we do in reasonable time?
Find solutions that are within a provable range of the optimum value.
Use methods that solve a related problem and derive a solution from it; for example, use linear programming to attack ILPs.
Design methods that start at a random solution, improve it, and guarantee that they have the potential to find an optimum.
Use knowledge about the problem to find good solutions.

Approximation Methods

Optimisation problems that allow polynomial-time approximation algorithms with an approximation ratio bounded by a constant form the class APX.
Example: For the metric TSP we have a 1.5-approximation method that requires polynomial time.
There are problems for which no such polynomial-time approximation methods exist (unless P = NP), e.g., the TSP when the triangle inequality doesn't hold.
We say a problem has a polynomial-time approximation scheme (PTAS) if, for every fixed ε > 0, we can find approximate solutions within an approximation bound of 1 + ε in time polynomial in the input size, e.g., a running time of O(n^(1/ε)).
If we don't have a PTAS, then we need a different approach.

Solving ILPs

Integer programmes are linear programmes with the additional constraint that some of the variables are integers.
Note: The objective function and the constraints are still linear functions.
Idea: Let's relax the requirement that the variables take only discrete (integer) values, use an LP solver for the relaxed problem, and transform solutions with real values into feasible integer ones (see the sketch below).
Observations: Transforming the solution found for the relaxed LP can take exponential time, and we no longer have a guarantee of finding an optimum solution.
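A toy illustration of the relax-and-round idea (a sketch, not from the slides). For this two-variable instance we can afford to try every floor/ceil rounding of the fractional optimum; in general this enumeration is exponential and, as noted above, optimality is no longer guaranteed.

    # ILP via LP relaxation plus rounding, on a tiny made-up instance.
    from itertools import product
    from scipy.optimize import linprog

    # maximise x + y  subject to  2x + y <= 3,  x + 3y <= 5,  x, y >= 0 integral
    c = [-1, -1]                              # linprog minimises, so negate
    A_ub = [[2, 1], [1, 3]]
    b_ub = [3, 5]

    rel = linprog(c, A_ub=A_ub, b_ub=b_ub)    # fractional optimum (0.8, 1.4)
    candidates = product(*[(int(v), int(v) + 1) for v in rel.x])
    feasible = [p for p in candidates
                if all(sum(a * x for a, x in zip(row, p)) <= b
                       for row, b in zip(A_ub, b_ub))]
    best = max(feasible, key=sum)             # the objective is x + y here
    print("LP relaxation:", rel.x, "-> integer solution:", best)   # (1, 1)

Note that the rounded-down point (0, 1) has value 1, while the best rounding (1, 1) has value 2: rounding naively can lose quality even on tiny instances.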
Heuristic Search: Local Search

Remember Santa and the MAX-3-SAT problem: we cannot evaluate all possible assignments φ. A modern Santa with a PC (Intel Core i7 5960X) performing 3·10^11 instructions per second: one year to check all φ for n = 76.

Local search for combinatorial optimisation:
Is a powerful tool for finding good, near-optimal solutions.
Employs the power of randomness.
Examines a subset of all possible φ.
Starts with an initial solution picked at random.
Allows small, local changes to the solution.
Tries to preserve good features and improve the current best solution.

Energy Landscape

Search space: all φ.
Local change / neighbourhood function: flipping one bit of φ; e.g., φ = (0,1,1) and φ′ = (0,0,1) are neighbours.
Energy / objective function: C(F(φ)), the number of clauses of F that are satisfied.
These components induce an energy landscape.

Local Search: Guided walk through the landscape

Riding through the energy landscape:
1. Start with a φ selected at random.
2. Evaluate φ.
3. Select at random a φ′ from the neighbourhood of φ.
4. Evaluate φ′.
5. If φ′ is better, then check whether it is the best found so far and move to φ′.
6. Else decide whether to move to φ′ anyway.
7. Repeat from step 3 until a stopping criterion is reached.

Features of the Landscape

Let us denote by R the set of all possible φ, assume C is the number of clauses satisfied, and denote by S(φ) the neighbourhood of φ.

Definition (Local Optimum): φ is called a local optimum if ∀φ_i ∈ S(φ): C(φ_i) ≤ C(φ).
Definition (Global Optimum): φ is called a global optimum if ∀φ_i ∈ R: C(φ_i) ≤ C(φ).
Definition (Plateau): A set of solutions B is called a plateau if for all φ, φ′ ∈ B there exist φ_1 = φ, ..., φ_n = φ′ with φ_{i+1} ∈ S(φ_i) and C(φ_i) = C(φ_j) for all 1 ≤ i, j ≤ n.

TSP Revisited

Symmetric TSP: c(i,j) = c(j,i) for each pair of cities i and j. Let H be a travelling-salesman tour (a configuration). Possible neighbourhoods:
1. N1(H): the set of tours other than H which can be obtained from H by changing the order of two consecutive cities on H. This neighbourhood structure is not very useful.
2. N2(H): the set of 2-change modifications of tour H: a new tour H′ is obtained from H by replacing two non-consecutive edges of H with two new edges (see the sketch after this list).

More Neighbourhoods

3. N3(H): the set of all 3-change modifications of tour H. In this case, H′ is obtained from H by replacing three pairwise non-consecutive edges of H with three new edges. For n cities, the set N3(H) has Θ(n^3) configurations; for the same set of 3 removed edges, there are 4 possible new tours. This neighbourhood structure is also frequently used in practice.
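The 2-change neighbourhood is easy to implement on a tour stored as a list. The sketch below (not from the slides; the distance matrix is made up) generates a random 2-change neighbour and computes the cost difference from the four affected edges only, in O(1) time; the segment reversal itself takes O(n), a point taken up again in Lecture 5.

    # A random 2-change (2-opt) move with O(1) cost-delta evaluation (a sketch).
    import random

    def tour_cost(tour, c):
        n = len(tour)
        return sum(c[tour[i]][tour[(i + 1) % n]] for i in range(n))

    def random_2change(tour, c):              # assumes n >= 4
        n = len(tour)
        while True:                           # pick two non-consecutive tour edges
            p, q = sorted(random.sample(range(n), 2))
            if q - p >= 2 and not (p == 0 and q == n - 1):
                break
        a, b = tour[p], tour[p + 1]           # removed edge (a, b)
        d, e = tour[q], tour[(q + 1) % n]     # removed edge (d, e)
        delta = c[a][d] + c[b][e] - c[a][b] - c[d][e]
        new_tour = tour[:p + 1] + tour[p + 1:q + 1][::-1] + tour[q + 1:]
        return new_tour, delta

    c = [[0, 2, 9, 10],                       # symmetric toy distance matrix
         [2, 0, 6, 4],
         [9, 6, 0, 3],
         [10, 4, 3, 0]]
    tour = [0, 1, 2, 3]
    new_tour, delta = random_2change(tour, c)
    assert abs(tour_cost(new_tour, c) - tour_cost(tour, c) - delta) < 1e-9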
Weighted Graph Bisection Problem (WGBP)

[The slide shows an example graph partitioned into two equal-size halves.]

WGBP as a Combinatorial Optimisation Problem

A configuration: a partitioning of the set of nodes V into two equal-size sets L and R.
The set R of all configurations: all partitionings of the set of nodes V into two equal-size sets L and R.
Cost function: for a configuration (partitioning) (L,R),

    C(L,R) = Σ { w(x,y) : (x,y) ∈ E, x ∈ L, y ∈ R },

that is, the sum of the weights of the edges between the sets L and R.
Find a configuration (L,R) (a partitioning of V) such that C(L,R) is as small as possible.

WGBP - Some Observations

We want to split the node set of an undirected graph G = (V,E) into two groups of equal size (if |V| is odd, one side gets one more node), in such a way that the total weight of the edges between the groups is minimised.
Also known as the graph partition problem (bisection, trisection, ...).
Consider the sizes of cuts of G: let (L,R) be a partitioning ('left' and 'right') of the node set of G into two equal-size (or almost equal-size) sets. The size of the cut (L,R) is the sum of the weights of the edges that connect a node in L with a node in R.
Warning: The graph doesn't have to be bipartite! That is, we can also have edges within each part.

Possible Neighbourhood Functions

The most obvious neighbourhood structure: define N1(L,R) as the set of partitionings (L′,R′) which can be obtained from (L,R) by swapping a node in L with a node in R.
We could also consider the following neighbourhood structure: define N2(L,R) as the set of partitionings (L′,R′) which can be obtained from (L,R) by swapping two nodes from L with two nodes from R.

Neighbourhoods and Generation Mechanisms

Recall that an instance of a combinatorial optimisation problem is given by: R, the set of configurations, and C, a cost function (giving the costs of configurations).
For each configuration config ∈ R, the neighbourhood of this configuration is a well-defined set of configurations which can be obtained from it by a simple modification: N(config) = {config_1, config_2, ..., config_k}.
For the same problem, the neighbouring configurations can be defined in many different ways. It is important that the neighbourhood function satisfies certain properties.
Reachability: for any config ∈ R there is a sequence of neighbourhood steps config →_N ··· →_N config_opt leading to an optimal configuration config_opt.

And how do we pick a neighbour from the neighbourhood? A broad definition: a generation mechanism for a given neighbourhood structure N is a procedure which, for a given configuration config ∈ R, returns a neighbouring configuration chosen at random from the neighbourhood.
Note: We don't have to generate every neighbour. In fact, especially for more complicated neighbourhood functions, it is sufficient to pick at random the elements of config that take part in the local change.
An initial configuration should be a config ∈ R chosen uniformly at random. Unfortunately, choosing a config ∈ R truly at random isn't always easy; therefore, we use construction heuristics that produce one feasible solution. (A small generation-mechanism sketch for WGBP follows below.)
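As an illustration of a generation mechanism (a sketch, not from the slides; the weighted graph is made up): for WGBP with neighbourhood N1 it suffices to pick one node on each side at random and swap them. Here the cut cost is recomputed from scratch; a serious implementation would update it incrementally.

    # Generation mechanism for the N1 neighbourhood of WGBP (a sketch).
    import random

    def cut_cost(L, w):
        # w maps undirected edges (frozensets) to weights; a crossing edge
        # has exactly one endpoint in L.
        return sum(wt for e, wt in w.items() if len(e & L) == 1)

    def generate_N1(L, R):
        # pick, at random, only the two elements taking part in the change
        x, y = random.choice(sorted(L)), random.choice(sorted(R))
        return (L - {x}) | {y}, (R - {y}) | {x}

    w = {frozenset(e): wt for e, wt in
         [(("a", "b"), 3), (("a", "c"), 1), (("b", "d"), 2), (("c", "d"), 4)]}
    L, R = {"a", "b"}, {"c", "d"}
    print("current cut:", cut_cost(L, w))            # 3
    L2, R2 = generate_N1(L, R)
    print("neighbour:", sorted(L2), sorted(R2), "cut:", cut_cost(L2, w))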
Random Hill-Climbing

[The slide shows numbered pseudocode for random hill-climbing; a reconstruction is given below.]

The method maintains a current configuration and in each iteration tries to find a neighbour with a better cost value. A generation mechanism is used in line 3 to obtain a neighbour; recall that c(config) stands for the cost of a given configuration. When a better configuration is found, it replaces the current configuration (line 4). Such a computation may terminate in a local optimum that isn't global.
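The pseudocode itself did not survive the extraction, so here is a hedged reconstruction as runnable Python, instantiated for MAX-3-SAT with the bit-flip neighbourhood described earlier; the numbered comments correspond to the line references above, and the formula F is a made-up example. Since MAX-3-SAT maximises the number of satisfied clauses, "better" here means a larger value of C.

    # Random hill-climbing, reconstructed (a sketch).
    import random

    F = [(1, -2, 3), (-1, 2, -3), (2, 3, -1)]    # hypothetical 3-CNF clauses

    def C(phi):                                  # number of satisfied clauses
        return sum(any((lit > 0) == phi[abs(lit) - 1] for lit in cl) for cl in F)

    def hill_climb(n, steps=1000):
        phi = [random.random() < 0.5 for _ in range(n)]    # line 1: random start
        for _ in range(steps):                             # line 2: iterate
            i = random.randrange(n)                        # line 3: generation
            new_phi = phi[:]                               #   mechanism: flip one
            new_phi[i] = not new_phi[i]                    #   random bit
            if C(new_phi) > C(phi):                        # line 4: a better
                phi = new_phi                              #   neighbour replaces
        return phi                                         #   the current config

    print(C(hill_climb(3)))    # may still be a local optimum that isn't global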
Local Search

Local search is an iterative improvement heuristic: a guided walk through the energy landscape.

Observations

It is good to examine local optima; remember that a global optimum is also a local one.
To examine more than one local optimum, we need to find ways to leave them.
Properties of the neighbourhood function influence the performance of our search.
Remembering the best configuration seen so far allows us to stop at any time and return a valid configuration.
There is a trade-off between the total number of iterations (neighbourhood transitions) and the quality of the output.

Observations

In a plateau we can easily get lost. When can we be confident enough that we have found a good configuration?
Methods to guide local search:
Tabu Search: remembers which neighbours have already been visited.
Simulated Annealing: uses the Boltzmann distribution to make random decisions on whether a worse solution is accepted.
Genetic Local Search: runs multiple independent walks (a population) and tries to find short cuts (crossover).

Optimisation Methods
Kathleen Steinhöfel and Tomasz Radzik, 6CCS3OME
Lecture 5: Simulated Annealing and Genetic Local Search

Walking through the Energy Landscape

Components:
1. Problem instance: search space R; objective function: minimise/maximise C.
2. Neighbourhood function: our design choice.
Three important properties (there are many more):
1. Local optima: How many are there? How deep/high are they?
2. Plateaus: search within a plateau is a random walk, so it is important to accept (move to) configurations with the same cost value.
3. Escape height: how much worse do configurations need to get before better ones can be found?
Main questions:
1. Where do we start? A random initial configuration.
2. How do we walk? Guiding the search towards global optima.
3. Are we there yet? When do we stop?

Simulated Annealing

Basic idea: simulate the physical process of allowing molecules to find an optimal ordering. Introduced by Kirkpatrick et al. in 1983. A dictionary would tell us: anneal (verb): heat (metal or glass) and allow it to cool slowly, in order to remove internal stresses and toughen it.

Example: two opposite kinds of annealing.
1. Forming horseshoes: metal is heated and shaped; when cooled quickly, the molecules "freeze" in their current positions and are forced to retain the shape.
2. Creating crystal glass (artificial diamonds): to obtain extremely clear glass, liquid glass is heated and then very slowly cooled, to allow the molecules to find the optimum positions and create crystal-clear glass. (For the diamond effect, you still need to cut it and create facets, which together with the clear glass reflect the light in the desired way.)

Simulated Annealing

Simulated annealing is a general heuristic for combinatorial optimisation problems for which we can show convergence to optimum solutions.
Some questions we are trying to answer today:
How do we prove that the simulated annealing method works, i.e., gives us a better result than not using it?
What is the likelihood of finding the global optimum?
How slowly do we have to decrease the temperature to increase our chances of finding the optimum point?
For further reading on this subject (from our Reading List): Local Search in Combinatorial Optimization by Aarts and Lenstra, mainly Chapters 4 and 8.

The Simulated Annealing Algorithm

[The slide shows the numbered pseudocode of the algorithm; a reconstruction follows the notes below.]

In line 1, k is the counter of phases. T_k – the control parameter (temperature) in phase k, T_k > 0.
L_k – the number of transitions in phase k (the length of phase k).
Line 4 is the beginning of a new phase (given k).
For each phase k (i.e., the same temperature T_k), we iterate lines 5-10 L_k times.
If the new configuration is better than the original one, we accept it (line 8).
Otherwise, we might accept it anyway, with a probability given by an exponential function.
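The algorithm listing did not survive the extraction either, so the following is a hedged reconstruction in Python. The numbered comments follow the line references in the notes above (lines 1, 4, 5-10, 8); cost, generate, and all parameter values are placeholders to be supplied for a concrete problem.

    # Simulated annealing, reconstructed to match the notes above (a sketch).
    import math, random

    def simulated_annealing(initial, cost, generate,
                            T0=10.0, L0=100, alpha=0.9, phases=50):
        k = 0                                        # line 1: phase counter
        config = best = initial
        T, L = T0, L0
        while k < phases:                            # a simple stopping criterion
            # line 4: beginning of phase k (temperature T_k, phase length L_k)
            for _ in range(L):                       # lines 5-10, repeated L_k times
                new_config = generate(config)        # generate a random neighbour
                d = cost(new_config) - cost(config)
                if d <= 0:
                    config = new_config              # line 8: accept better (or equal)
                    if cost(config) < cost(best):
                        best = config
                elif random.random() < math.exp(-d / T):
                    config = new_config              # accept worse with prob. e^(-d/T)
            k, T = k + 1, alpha * T                  # cool down (see the cooling
        return best                                  # schedule discussed below)

    # toy usage: minimise (x - 3)^2 over the integers, neighbours are x +/- 1
    print(simulated_annealing(0, lambda x: (x - 3) ** 2,
                              lambda x: x + random.choice((-1, 1))))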

The probability of accepting a worse configuration
config – the current configuration.
new_config – the newly generated neighbouring configuration, which turns out to be worse than the current configuration config; that is, x := c(new_config) > c(config) =: a.
Let T > 0 denote the current "temperature" T_k.
The new, worse configuration new_config is accepted if, for a real number r randomly selected from [0, 1),

    r < exp( (c(config) − c(new_config)) / T ) = e^((a−x)/T) =: P_T(x).

If x > a, then 0 < P_T(x) < 1.

[The slide shows a plot of P_T(x) for two temperatures T1 and T2, with T2 < T1.] A smaller T means a lower probability of accepting a worse configuration.

The control parameter T (the "temperature")

The initial value T_0 of the control parameter: sufficiently large, so that for every config1 ∈ R and config2 ∈ N(config1) the probability of accepting config2 is "substantial".
The control parameter (the temperature) is then decreased, to decrease the probability of accepting a worse configuration.

Definition: For given a and x > a, the probability of accepting a (worse) configuration with cost x when the current configuration has cost a is

    P_T(x) = e^((a−x)/T),

which keeps decreasing to 0 as T decreases to 0.
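A few lines of Python (illustrative, not from the slides) make the dependence on T visible:

    # The acceptance rule for worse configurations and its dependence on T.
    import math, random

    def accept_worse(a, x, T):
        # accept a worse configuration (cost x > a) with probability e^((a-x)/T)
        return random.random() < math.exp((a - x) / T)

    # probability of accepting a configuration that is 2 cost units worse:
    for T in (10.0, 1.0, 0.1):
        print(T, math.exp((100 - 102) / T))    # ~0.82, ~0.14, ~2e-9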
A cooling schedule is a function for updating the temperature:
    T_{k+1} = α·T_k, for k = 0, 1, 2, ..., with α < 1 (typically 0.8 ≤ α ≤ 0.99).

This has to be considered together with the length of a phase.

Other parameters

The number of transitions in (the length of) each phase: usually increases; should be related to the size of the neighbourhood of a configuration, and is often set to that size if it is not too large.
The initial configuration: usually computed by a fast specialised heuristic.
Possible stopping criteria: when the temperature T_k reaches a specified, small value; when the current configuration has not changed during the last phase.
Potential problems arise if the energy landscape induced by the neighbourhood function contains a large number of plateaus or deep local minima.

The initial temperature

TSP with neighbourhood structure N2. Let c(i,j) ≥ 0 for all (i,j) and let M = max{c(i,j)}. Then c(new_config) ≤ c(config) + 2M, because the cost of the two new links which are in new_config but not in config is at most 2M. If T_0 = 4M, then the probability of accepting a worse configuration during the initial phase is at least

    exp( (c(config) − c(new_config)) / T_0 ) ≥ e^(−(2M)/(4M)) = e^(−0.5) ≈ 0.6.

This probability can be considered "substantial". Hence T_0 = 4M may be a good initial value, but it could also be too big: check by testing.

Running time
Trade-off between the total number of iterations (transitions) and the quality of the output.
The total number of iterations (transitions) should be monitored and tuned to a given class of input instances.
Devise a scheme of setting and updating the control parameters:
the initial temperature, updates of the temperature, the lengths of the phases.
Test various possibilities to identify effective and robust settings.
The running time of one iteration (transition): dominated by the running time of procedure Generate and the time needed to compute the cost of the new configuration.
A good data structure for maintaining the current configuration may be needed to achieve a good running time of one iteration.

TSP and neighbourhood structure N2

A straightforward implementation stores the current tour as a sequence ⟨i_1, i_2, ..., i_n⟩.
A 2-change modification: remove links (i_p, i_{p+1}) and (i_q, i_{q+1}), p < q, where (i_p, i_{p+1}) and (i_q, i_{q+1}) are not consecutive.

    conf = ⟨i_1, i_2, ..., i_p, i_{p+1}, ..., i_q, i_{q+1}, ..., i_n⟩ changes to
    new_conf = ⟨i_1, i_2, ..., i_p, i_q, i_{q-1}, ..., i_{p+1}, i_{q+1}, ..., i_n⟩

by reversing the part of the tour from i_{p+1} to i_q.

    c(new_conf) = c(conf) − c(i_p, i_{p+1}) − c(i_q, i_{q+1}) + c(i_p, i_q) + c(i_{p+1}, i_{q+1})

This gives O(n) time per iteration (dominated by the reversal). Representing the current tour in a special data structure based on balanced trees gives O(log n) time per iteration.

Sketch of Proof for Convergence

Our choice of acceptance probability is not arbitrary: it gives a Boltzmann distribution over the configurations in R, named after Ludwig Boltzmann, who first formulated it in 1868 during his studies of the statistical mechanics of gases in thermal equilibrium.
Imagine an m × m matrix, where m = |R|. An entry m_i,j of this matrix is the probability of generating and accepting config_j from config_i. In the one-step matrix, m_i,j > 0 if config_j ∈ N(config_i), and m_i,j = 0 otherwise; in later steps (powers of the matrix), m_i,j ≥ 0 whenever config_i →_N ··· →_N config_j.
If our neighbourhood function is reversible (config_j →_N config_i implies config_i →_N config_j) and has the property of reachability (optimum configurations can be reached via neighbourhood transitions from any config ∈ R), then the following convergence argument applies.

Markov Chains

This sequence of matrices defines a Markov chain. Given our generation mechanism and the acceptance probability e^(−Δc/T), one can show that the matrix converges to a stationary distribution (for phase length L → ∞ at constant T, and then for T → 0) in which

    m_i,j = 1/|OPT| if config_j ∈ OPT, and m_i,j = 0 otherwise,

where OPT is the set of optimum configurations.

Genetic Algorithms
Genetic algorithms are a general heuristic method (a meta-heuristic) for combinatorial optimisation problems:
Inspired by the process of genetic evolution,
Uses some notions of genetic evolution.
Configurations are called individuals. Instead of minimising the cost function of configurations, we maximise the fitness function of individuals.
That is, we apply genetic algorithms to maximisation problems, but minimisation problems (TSP, WGBP, . . . ) normally can be easily re-phrased as maximisation problems.
A genetic algorithm for a combinatorial optimisation problem maintains a collection of configurations, called a population of individuals.
In contrast, a simulated annealing algorithm maintains only one configuration.

Crossover and Mutation
New individuals (new configurations) are created mainly by applying a crossover operation.
A crossover operation takes two individuals (parents) and combines them (somehow) to create one or more new individuals (offspring). An offspring should inherit some elements from both parents.
A crossover operation is the core, and usually the most challenging part, of designing a genetic algorithm. The performance of a genetic algorithm often depends crucially on the quality of the crossover operation used.
To create new individuals, genetic algorithms also use the idea of "small modifications" (as in simulated annealing), calling them mutations.
In a genetic algorithm, a crossover operation is the main mechanism to create new individuals. Only a small fraction of new individuals are created by mutations. In contrast, in simulated annealing new configurations are created solely by means of small modifications (mutations).

Example Crossover
Assume that:
individuals can be represented as binary sequences of length n; each binary sequence of length n represents a (valid) individual.
That would be the case for MAX-SAT.
1. 1-point crossover operation: for two sequences p1 and p2 (the parents), select a random number i, 1 ≤ i ≤ n − 1, copy the first i characters from p1 and append a copy of the last n − i characters from p2, to obtain a new n-character sequence c (an offspring).
    p1: 0010 | 110101,  p2: 1000 | 100011  →  c: 0010100011
2. 2-point crossover operation: select two random crossover points and split each parent into 3 intervals. The offspring takes the 1st and 3rd intervals from p1 and the middle interval from p2. Example:
    p1: 0010 | 1101 | 01,  p2: 1000 | 1000 | 11  →  c: 0010100001
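Both operations take only a few lines of Python (a sketch; the parent strings are the ones from the example above):

    # 1-point and 2-point crossover on binary strings (a sketch).
    import random

    def crossover_1pt(p1, p2, i=None):
        if i is None:
            i = random.randint(1, len(p1) - 1)
        return p1[:i] + p2[i:]

    def crossover_2pt(p1, p2, i=None, j=None):
        if i is None or j is None:
            i, j = sorted(random.sample(range(1, len(p1)), 2))
        return p1[:i] + p2[i:j] + p1[j:]

    print(crossover_1pt("0010110101", "1000100011", i=4))       # 0010100011
    print(crossover_2pt("0010110101", "1000100011", i=4, j=8))  # 0010100001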

Observations
For every problem, we could represent configurations as sequences of characters or bits (that is how they are stored in memory).
That might not be a natural representation.
Random crossover points might not preserve good features. Random combinations could lead to invalid configurations.
Even if there is a reasonable representation as sequences, very often not all sequences represent valid configurations.
Simple k-point crossover operations may then produce invalid offspring, so these simple crossover operations might not be useful unless a good mechanism for dealing with invalid offspring is devised.
Crossover operations are therefore tailored to concrete problems, where configurations are represented in an appropriate way, not necessarily as sequences of characters.
Example: try to design a crossover operation for TSP.

Genetic Algorithms Pseudocode
GENETICALGORITHM(Fitness, k, r, q)
{ parameter Fitness is the fitness function of individuals;
k is the population size in each generation (e.g., 100);
r is the fraction of the population generated by crossover (e.g., 0.75);
q is the mutation rate (e.g., 0.05) }
P ← a set of k individuals generated randomly; { the initial population }
repeat
{ calculate a new population S (the next generation) }
S ← empty set;
for h ∈ P, let Prob(h) = Fitness(h) / Σ_{x∈P} Fitness(x);   { the probability of selecting h }
{ Crossover step }
select randomly r × k pairs of individuals from P according to probabilities Prob; for each selected pair (h1, h2):
z ← CROSSOVER(h1, h2); add z to S;   { S now has r × k individuals }

Genetic Algorithms Pseudocode: cont.
{ Survival of the fittest }
select randomly (1 − r ) × k individuals from P according to probabilities
Prob, and add them to S;   { S now has k individuals }
{ Mutate step }
select uniformly at random q × k individuals from S ;
for each selected individual h: MUTATE(h);
P ← S; { next generation has been created } until stopping condition;
select the individual h in P with the highest fitness; return h.

Summary
When we design a genetic algorithm, we have to provide:
A definition of individuals (configurations)
(normally directly from the definition of the problem);
A fitness function
(normally directly from the definition of the problem);
CROSSOVER procedure
(the most challenging part of the genetic algorithms method);
MUTATE procedure
(if employed; the neighbourhood mechanisms from simulated annealing can be used);
A strategy for random selection of individuals (various strategies have been proposed);
Values of parameters k, r and q; A stopping condition.

Selection Strategies
The selection of individuals used in our description of a genetic algorithm (the selection for crossover and the selection for survival) is called roulette wheel selection:
for the current population P and h ∈ P,

    Prob(h) = Fitness(h) / Σ_{x∈P} Fitness(x)

Selection Strategies
Rank Selection:
Rank the individuals according to their fitness
(the fittest individual in a population of size k gets rank k, the second fittest gets rank k − 1, and so on),
Apply roulette wheel selection to the ranks of the individuals. That is, the individual h_i with rank i is selected with probability

    Prob(h_i) = i / (1 + 2 + ··· + k).
Tournament Selection:
Choose randomly q individuals and select the fittest one among them. The parameter q is a small number; for example, q = 3.
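The three strategies are easy to compare side by side; the sketch below (not from the slides) uses a made-up three-individual population.

    # Roulette wheel, rank and tournament selection (a sketch).
    import random

    pop = {"h1": 10.0, "h2": 30.0, "h3": 60.0}     # individual -> fitness

    def roulette(pop):
        return random.choices(list(pop), weights=list(pop.values()))[0]

    def rank_selection(pop):
        ranked = sorted(pop, key=pop.get)          # least fit gets rank 1
        ranks = range(1, len(ranked) + 1)          # Prob(h_i) = i / (1+2+...+k)
        return random.choices(ranked, weights=ranks)[0]

    def tournament(pop, q=3):
        contenders = random.sample(list(pop), q)   # q random individuals
        return max(contenders, key=pop.get)        # ... and keep the fittest

    print(roulette(pop), rank_selection(pop), tournament(pop))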

Stopping Criterion
Possible stopping conditions:
When an individual with the desired fitness is found;
When there is not much change in the last iteration (or not much change over the last few iterations);
When a pre-defined number of iterations (number of populations) have been performed.
Important:
Let the fittest individual, or a few of the fittest ones, always survive.