Algorithms and Data
14: Network Flow II Professor Kevin Gold
•
•
Network Flow Review
With a source vertex s with unlimited supply, a sink t of unlimited demand, and edges with limited integer capacities, we find the maximum flow from the source to the sink
The Ford-Fulkerson algorithm finds the maximum flow by repeatedly sending the maximum flow down s-t paths, adding backwards edges to remind us how to backtrack
10/10 10/20 s 0/20t
The Max Flow is equal to the value of the Min Cut, the smallest
•
cut between source and sink
5/10 5/5
Today: Extensions and Applications of Network Flow
Multiple sources and sinks
•
Survey design: asking people about a limited number
•
of products they’ve purchased
• • •
Airline scheduling Image segmentation
Project selection: what sequence of projects will maximize profit when some are prerequisites for others
Multiple Sources and Sinks With Limited Supply and Demand
Suppose there are multiple sources and sinks on
•
the graph, instead of just s and t.
Each source can supply some limited amount of
Each sink demands some amount of stuff ti. +3
•
stuff, si.
+5 4 3
2 (4) 2
1 1 (3) 1 (6)
•
2
• We want to find a flow that carries the maximum +7 amount of stuff from these multiple sources to the multiple sinks, not exceeding any supply or
demand.
5
Is there a way to do this that uses our existing max
•
flow algorithm?
•
Solution: Create a supersource and supersink
Add s and edges from s to each source with capacity equal to the supply Add t and edges to t from each demand node with capacity equal to the
•
demand
The maximum flow on this graph will show how the maximum flow can get
•
from the sources to the sinks.
5+54 2(4)4 3323
s +3 11(3) t 726
+7
5 1(6)
Another Variant: Lower Bounds on Edge Flow
Suppose we’re required to put some minimum flow on particular edges in our
•
construction. (Maybe we’re bound by a contract to use a particular route…)
•
•
We can reduce this problem to the previous multiple-sink-and-source problem. Reduce the capacity of each edge by its lower bound.
Increase the demand of a node (or decrease its supply) by the sum of all
•
the outgoing lower bounds (siphon away what we would need for the extra flow)
Decrease the demand of a node (or increase supply) by the amount of
•
required flow going into it.
•
•
After finding a solution, add the required flows to the solution flows.
It’s already being fed by the flow we’ve removed from the problem
Another Variant: Lower Bounds
+54 2(1) 3 1 1 (3)
+4 3 (1) 1 (0) 2 0 0 (1)
+33 3
+1 2 +1 2
Suppose we require a minimum of 1 flow on
Reduce all capacities by 1, reduce demand by incoming edges, increase demand by outgoing edges
each edge… 1/4 +4 1/3 (1) 1
s 1 2 0 0 +1 2 +1 1/2
1/1 (0) 0
1/1 t (1)
+5 2/4 1/3
+3
1/2 (1)
1/1 2/3
1/1
(3)
1/1
Find a max flow on the s to t graph
Add the lower bounds of 1 back to the flow discovered that way
1/3
•
• •
• •
Suppose we’re an online retailer, and we have a database of customers and products — we know which customers bought which products.
We want to invite people to take surveys about these products.
Any one customer should not have to do a survey about all the products they’ve bought — we limit the questions to c per customer.
But we want to be sure each product is hit by at least p surveys. How do we pick who gets asked about which products?
Survey Design
Survey Design Through Max Flow with Lower Bounds
Treat the who-bought-what information as a bipartite graph —
Do bipartite graph matching, but put a max on the source-to-customer
•
customers to products, edge if bought
•
edges of c, the max questions
Put a lower bound on the product-to-sink edges of p, the number of
•
questions we want to ask about each product
1 1
min p sc 1 minpt
c
c
1 1
c bound: no more than c edges from node = c questions
min p
min p: minimum of p edges in = min p questions
• • •
Generalizations of Survey Design
How could we…
Tailor the maximum number of questions to individual customers?
Get larger or smaller sample sizes for different products? Ensure every customer is asked at least 1 question?
Force the number of questions asked of all customers to be the same, even if we need more than one question per customer?
1 1
•
•
1 1
min p sc 1 minpt
c
c
min p
•
Airline Scheduling (a little idealized)
Suppose an airline has only k planes and needs to figure out how to cover A particular flight has a start time, end time, start airport, and ending
E.g., BOS to LAX leaving 2pm EST arriving 5 PST
Exact time or even airport doesn’t matter, just “is there enough time after flight A to do maintenance, maybe fly to a different airport, maybe do some more maintenance, then do flight B”
Goal is to hit all the required flights with one plane apiece
•
all the flights they’ve sold tickets for
•
airport
That said, all that matters for modeling is a yes/no “can this plane do this
•
flight, then this other flight”
•
•
Airline Scheduling as Flow Problem
•
A unit of flow will represent a plane.
0/1 0/1
1/1
0/1
1/1
MIA 9:30
0/1
0/2
LGA 4:00
1/1
Represent the required flights as
0/1
•
edges with min flow 1, capacity 1: some plane must fly this.
BOS 7:00
source +2
•
capacity 1 edge between A’s end and B’s beginning (no min)
0/1
If a plane can fly A then B, add a
BOS 6:00
LGA 8:00
Capacity 1 edges from source to all
can’t get to DCA in time
•
flight starts, and from all flight ends
to sink (planes can start anywhere, end anywhere)
0/1
0/1
DCA 8:15
Supply k on source, demand k on
: possible
: required
•
sink, and edge with capacity k
sink (2)
directly from one to other for unused planes.
Airline Scheduling is
Actually More Complicated
The airline wants to maximize money and can change the flights offered Planes need to be in the right place for the next day
Crew must be picked up and rotated out
Laws & pricing rules further restrict the possibilities
For an interesting argument that the flip side, finding lowest airline fares for
consumers, is ridiculously complex (“undecidable”!) see:
http://www.demarcken.org/carl/papers/ITA-software-travel-complexity/img0.html
No shame in using tractable methods to approximate intractable problems!
Everything in real-life that has to do with airlines is rather more complex
•
than this
•
•
• • • •
•
Given an image, assign some pixels to foreground, some to background.
Image Segmentation
Each pixel, depending on its
•
color, has a cost fi for associating
it with foreground and cost bi for associating it with background
The goal; from (Nieuwenhuis and Cremers 2013). (They didn’t use the technique we’ll cover.)
Assigning noise in the hand to background would
make no sense. So we balance inherent color vs smoothness. (Source: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/CREMERS2/)
Maybe a dark or blue pixel is
•
more likely to be background, reducing the cost bi
Wherever we make the cut, there
•
is a cost sij for splitting pixels that
is greater the more similar the neighboring pixels are.
• •
Image Segmentation
As a Min-Cut Problem Treat all pixels as vertices
Add directed edges of capacity sij (cost of cut) going both ways between each pair of adjacent pixels
f1
b1
Add edges with capacity equal to
•
cost fi of assigning to foreground connecting source b to each pixel i
b
Add edges of capacity bi connecting
•
pixels i to sink f
fn
bn
a sab b
cd
The min capacity cut on this flow
•
graph is the minimum cost assignment — it’s background iff connected to b after the cut
sac
sbd
scd
f
Image Segmentation
•
•
As a Min-Cut Problem Treat all pixels as vertices
Add directed edges of capacity sij (cost of cut) going both ways between each pair of adjacent pixels
Add edges of capacity bi connecting
•
pixels i to sink f
Paying the foreground costs
b1
Add edges with capacity equal to cost fi of
f1
•
assigning to foreground connecting source b to each pixel i
b
f
• The min capacity cut on this flow graph is the minimum cost assignment — it’s background iff connected to b after the cut
f bn n
Quick Quiz: So how do we get the min cut?
Paying the background costs
Paying the costs to separate pixels
Project Selection
•
We have some projects, some of which are prerequisites for each other
Some of them have costs outweighing their payoffs, and we’re only
•
interested in them to unlock prequisites
• • •
Others have net positive payoffs
How do we choose projects that maximize the amount of money we make? “Open-Pit Mining Problem”: is it worth mining through these boring rocks?
Sometimes you gotta dig for the good stuff
• • •
•
Project Selection
Vertices for each project
If the payoff is positive, edge with that capacity from source to project.
If the payoff is negative (so positive cost), edge with capacity equal to the cost to the sink.
Edges of infinite capacity going from a vertex to its prerequisites.
If the costs outweigh the benefits, a min cut cuts the payoff. If the benefits So do the projects on the source side after a min cut.
$40 sgold∞ $20t
•
outweigh the costs, a min cut cuts all the prerequisites.
•
$100
∞
rocks
more rocks
• • •
•
Project Selection
Vertices for each project
If the payoff is positive, edge with that capacity from source to project.
If the payoff is negative (so positive cost), edge with capacity equal to the cost to the sink.
Edges of infinite capacity going from a vertex to its prerequisites.
If the costs outweigh the benefits, a min cut cuts the payoff. If the benefits So do the projects on the source side after a min cut.
$40
s gold ∞ $20 t
•
outweigh the costs, a min cut cuts all the prerequisites.
•
$100
∞
rocks
more rocks
• • •
•
Project Selection
Vertices for each project
If the payoff is positive, edge with that capacity from source to project.
If the payoff is negative (so positive cost), edge with capacity equal to the cost to the sink.
Edges of infinite capacity going from a vertex to its prerequisites.
If the costs outweigh the benefits, a min cut cuts the payoff. If the benefits So do the projects on the source side after a min cut.
$40
s gold ∞ $20 t
•
outweigh the costs, a min cut cuts all the prerequisites.
•
$10
∞
rocks
more rocks
Project Selection Analysis
•
So minimizing this cut must maximize our profit.
Edges in the precedence graph aren’t in the cut, because those are all infinite. The cut’s value is the sum of the costs of all bad projects on the s side of the
•
So each cut edge is attached to s or t.
•
cut, plus the sum of the payoffs of all the good projects on the t side
In other words, the cut is all the money we are losing one way or another (costs
•
& missed opportunities) and there’s no other way to miss out on money.
Further, it’s a valid project plan because we can’t cut the infinite capacity
•
edges; so projects are never separated from their prerequisites.
$40
s gold ∞ $20 t
$10
∞
rocks
more rocks
•
Push-Relabel:
A Faster Max-Flow
We’ll now see a method that can be used for an
O(|V|2|E|) max flow algorithm, potentially faster than either algorithm above
The implementations we’ve seen so far for max flow run in either
•
O(|E|C) or O(|E|2log C) time
This could be modified to a more complex variant (“relabel-to-
•
front”) that is O(|V|3) — we’ll skip this
•
To analyze the running time of push-relabel, we’ll introduce the concept of a “potential” on a data structure — a function indicating how many times something has happened or could happen
•
The state of the algorithm will no longer maintain a valid flow at every iteration. Excess flow into a vertex is possible, where less leaves than enters.
Preflows
The push-relabel algorithm will continue to use a metaphor
•
of fluid flowing through pipes.
In pipe analogy, each vertex can “drain” arbitrarily much
•
flow to an excess reservoir.
As long as capacity constraints are satisfied and no more flows to edges is called a preflow.
•
flows out of each vertex than flows in, an assignment of
•
•
•
Height, Push, Relabel We will assign each vertex a value called a “height.”
The height of the source is always |V|. The height of the sink is always 0.
The height of each other vertex starts 0 and can increase,
A push operation will send more flow from a vertex to a downhill
•
potentially above |V|.
•
neighbor. Flow can only be pushed (slightly) downhill.
A relabel operation will raise a height, changing it to one more than
•
the height of the lowest height neighbor to which it can push more flow.
Push Can only be called if all three apply:
vertex u is overflowing (flow in > flow out) edge (u, v) has excess capacity
the height of u, h[u], is h[v] + 1
(v is a little downhill)
Increase the flow on (u,v) by the smaller of the this df(u,v)
Decrease u’s overflow by df(u, v)
Increase v’s overflow by df(u, v)
5/5
2/5
•
•
• •
9
2
h[v] = 4
2/5
h[v] = 4
0/10 h[u] = 5
u
v
•
overflow on u and excess capacity on (u,v); call
4/4
5/5
11
u
v
• •
9/10 h[u] = 5
4/4
The Residual Graph
5
4
3
2
h[v] = 4
As with Ford-Fulkerson, we would
So we again have a residual graph – in adding flow in a direction, we just decrease the capacity in that direction and add capacity going the opposite way
Now a push operation can also act to shove back flow we don’t want after all
•
like to be able to easily undo moves
92 10
h[u] = 5
u
v
•
5023 1
h[u] = 5 9 h[v] = 4 (f(u,v) = 9)
11
u
v
•
4
Relabel Can only be called when all of the
h[x] = 9
10
h[y] = 10
(
x
•
following apply:
1
h[a] = 6 1
10
10
9
h[b] = 7 10
•
vertex u is overflowing
no neighbor of u in residual graph has
•
a smaller height (h[u] ≤ h[v])
h[u] = 5
Over all neighbors v such that (u,v) is in Set height h[v] = hmin + 1
•
the residual graph, find hmin = min(h[v])
: reversed flow)h[c] = 8
•
In some cases this could result in
•
“giving back” excess flow with next push
h[u] = 8
a
u
b
y
c
u
•
Initialize:
10
10/10
h[a] = 0
9
h[b] =0
8
h[c] =0
The Generic Push-Relabel Algorithm
Set all heights h[v], flows f(u,v), excesses e[v]
•
to 0.
Set height of source s to |V|
9/9 h[s] = |V|
s
• •
Put maximum flow on the neighbors of source s: set flows on these edges to maximum, set excesses of neighbors appropriately.
8/8
b
•
•
While we can push or relabel, Select an operation and do it.
a
c
• •
We’ll now go about proving this terminates and is optimal
Lemma: The push and relabel operations maintain that for all edges (u,v) in the residual graph, h(u) ≤ h(v)+1.
(no steep drops)
max height of u
u u u
:new residual edge
height difference
:max height difference
for relabel
Maintaining
the Height Function
Beginning residual graph has all heights 0 besides
•
source, and no residual edges from source exist after maxing out capacities.
•
A relabel changes a height so u is at most 1 higher than its outgoing neighbors. (It can be arbitrarily taller than incoming neighbor edges.)
A push can only create a residual edge (a,b) where
•
h(a) = h(b) – 1 ≤ h(b)+1.
u
v
An Overflowing Vertex Can Be Either Pushed or Relabeled
max height of u
u
u
u
: relabel legal (if no pushes available)
Suppose we can’t push an
•
overflowing vertex u.
u
v
•
Since we just guaranteed
h(u) ≤ h(v)+1 and the fact that we can’t push means h(u) ≠ h(v)+1,
then h(u) ≤ h(v) and we can relabel.
(Heights are always integers.)
: push legal
•
There is Never a Path From Source to Sink in the Residual Graph
Suppose not, and there is a path from source to sink. Then there is a simple path (remove the cycle) which therefore has |V|-1 or fewer steps.
•
We maintain the height function property,
So on this path, h(s) ≤ h(v1)+1 ≤ h(v2)+1 … ≤ h(t)+1 … at most
•
h(u) ≤ h(v)+1 for all edges in the residual graph
•
“one step down” on each step from |V|
h(t) = 0, so h(s) ≤ |V| -1 (each of |V|-1 steps back increases But the height of the source is |V|; contradiction.
•
height by at most 1)
•
Proof: Push-Relabel Works
Initialization obeys capacity constraints, has only excess flow on vertices. Relabel doesn’t change the flow values.
A push doesn’t make excesses negative and obeys capacity constraints.
The algorithm can only terminate when there is no overflow on any vertex.
•
Loop invariant: The residual graph represents a valid preflow (flow with
•
possible excess on vertices)
• • •
At termination, the residual graph represents a valid preflow with no excess:
•
a real flow.
Since the residual graph does not have a path from source to sink, just as in
•
the termination condition of Ford-Fulkerson, this is a maximum flow.
•
•
Beginnings of a Runtime Analysis
We split the operations into three kinds: Relabels
“Saturating pushes” — pushes that max out
•
edges
•
“Nonsaturating pushes” — pushes that get rid of all a vertex’s overflow without maxing out an edge
At Most O(|V|2) Relabels
•
•
The max simple path length from the source is |V|-1, height only +1 per step by the height function and previous lemma, so
|V| + |V| – 1 = 2|V|-1
Lemma: Every overflowing vertex has a simple path back to the
•
source in the residual graph.
Proof sketch: It’s an overflow, so there’s a path to it in the flow,
•
so there’s a simple path in the flow, so there’s a path back in the residual
Lemma: The maximum height any vertex can reach is 2|V|-1.
So at most O(|V|2) relabels: each of |V| vertices incremented at
•
most 2|V|-1 times.
At Most O(|V||E|) Saturating
Pushes
For any particular edge, we can push across (u,v) only if v is
•
“downhill by 1” from u.
Now we can’t do any more saturating pushes across this edge
•
until a push happens in the opposite direction.
That can only happen if u becomes “downhill by 1” from v. The
•
height of v needs to increase by 2.
If the maximum height of any vertex is 2|V|-1, this process can’t
•
happen more than |V| times.
Making the same argument across all edges in both directions
•
results in the argument that there are O(|V||E|) saturating pushes
How Many Nonsaturating Pushes? Introducing Potential Functions
•
Each nonsaturating push requires flow downhill.
If we have a maximum height, then there’s a limit to the number
•
of times our flow can “fall.”
To analyze this, we introduce the concept of a potential
•
function — some function on a data structure that:
• •
Starts zero
Remains non-negative
It’s “potential” as in “potential energy,” which is here directly tied
•
to the metaphor of flow going downhill.
• •
•
Using a Potential Function
to Analyze Push-Relabel
Our potential function: Φ(G) = Σ h[v] over all nodes with excess
•
flow e[v]
0 at first because heights of source neighbors are 0
Strictly non-negative
A nonsaturating push decreases Φ(G) by at least 1.
All the overflow flowed downhill toward a node with height 1
•
less; so potential drops by 1.
We can therefore bound the number of nonsaturating pushes by
•
the total amount Φ(G) ever rises.
• •
•
• •
So the number of nonsaturating pushes is at most O(|V|2|E|).
So push-relabel’s running time is O(|V|2) + O(|V||E|) + O(|V|2|E|) = O(|V|2|E|)
Potential Function Analysis
of Push-Relabel Φ(G) = Σ h[v] over all nodes with excess flow e[v]
Two ways potential function can increase:
Relabel increases potential by at most the maximum height, 2|V| – 1
A saturating push turns one more node into a node with overflow,
•
increasing the potential by at most the maximum height, 2|V| – 1
So the most Φ(G) can increase is (2|V|-1)(O|V|2 + O(|V||E|)) = O(|V|2|E|) •
•
A nonsaturating push decreases the potential by at least 1.
Summary:
More Network Flow
Survey Design Airline Scheduling Image Segmentation Project Selection
We also an O(|V|2|E|) way of doing network flow, push-relabel, which introduced a concept we’ll talk more about next time: the potential function
We saw a few interesting extensions and applications of
•
network flow
•
• • • •