Large Scale Optimization
We have seen a few examples of linear programs that are exponentially large in the size of the input of the problem they are supposed to be modeling. This week we see some techniques to handle linear programs with a large number of constraints or variables. We will also cover an alternative algorithm for solving linear programs called the Ellipsoid algorithm.
8.1 Large number of constraints
Consider the following relaxation of the traveling salesman problem¹:

minimize   ∑_{e∈E} w_e x_e
subject to ∑_{e∈δ(u)} x_e = 2   ∀ u ∈ V
           ∑_{e∈δ(S)} x_e ≥ 2   ∀ ∅ ⊂ S ⊂ V
           x_e ≥ 0              ∀ e ∈ E

¹ Recall that in the traveling salesman problem we are given a complete graph G = (V, E) with edge weights w : E → R_+ and we want to find a tour of the vertices with minimum weight.
Clearly, the formulation is too large even to generate except for very small values of |V|. However, we do not need to generate the whole description of the polyhedron, only the bits that are interesting for our particular objective function. Consider the polyhedra
P = { x ∈ R_+^{|E|} : ∑_{e∈δ(u)} x_e = 2 ∀ u ∈ V, ∑_{e∈δ(S)} x_e ≥ 2 ∀ ∅ ⊂ S ⊂ V }, and

Q = { x ∈ R_+^{|E|} : ∑_{e∈δ(u)} x_e = 2 ∀ u ∈ V }.
Notice that Q ⊇ P and that the size of Q is polynomial in the size of the instance. One strategy could be to optimize w · x over Q to obtain an optimal solution x∗ and check whether x∗ ∈ P. If that is the case, then x∗ is also an optimal solution for P.
On the other hand, if x∗ ∉ P, there must be a subset of vertices ∅ ⊂ S ⊂ V such that ∑_{e∈δ(S)} x∗_e < 2.
We could add the constraint corresponding to S to the polyhedron Q, re-optimize, and repeat the procedure.² This strategy yields a sequence of polyhedra

Q = Q_0 ⊃ Q_1 ⊃ ··· ⊇ P.

² The hope is that we will only need to repeat this a modest number of times. If we need to generate all the constraints that make up P, there is no benefit in doing so in this iterative fashion.
Notice that in order to carry out this scheme we only need an efficient procedure to detect whether a given solution x belongs to P and, if not, to return a violated constraint. We call such an algorithm a separation oracle³.

³ The name separation oracle refers to the fact that if x ∉ P, we need to return a hyperplane that leaves P on one side and x on the other side.
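To make the scheme concrete, the following is a minimal sketch of the cutting-plane loop in Python. It is not part of the notes' pseudocode: it assumes SciPy's linprog, variables indexed by edges in some fixed order, and a hypothetical separation_oracle that returns a violated cut as a pair (a, b), meaning the constraint a·x ≥ b, or None when x ∈ P.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(w, A_eq, b_eq, separation_oracle):
    """Minimize w.x over P by iteratively cutting Q (a sketch).

    We start from the degree constraints (A_eq x = b_eq, x >= 0),
    i.e. the polyhedron Q, and keep adding violated connectivity
    cuts of the form a.x >= b until the oracle certifies x in P.
    """
    cuts_a, cuts_b = [], []  # accumulated cuts a.x >= b
    while True:
        # linprog expects A_ub x <= b_ub, so negate the >= cuts.
        A_ub = -np.array(cuts_a) if cuts_a else None
        b_ub = -np.array(cuts_b) if cuts_b else None
        res = linprog(w, A_ub=A_ub, b_ub=b_ub,
                      A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        cut = separation_oracle(res.x)
        if cut is None:
            return res.x     # optimal over P, not just over Q
        a, b = cut
        cuts_a.append(a)
        cuts_b.append(b)
```

Each round either certifies optimality over P or grows Q_i into Q_{i+1} with one more cut.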
Lemma 8.1. The polyhedron P admits a polynomial time separation oracle.
Proof. Given a vector x, we need to check if x ∈ P. There are two types of constraints that make up P: degree constraints and connectivity constraints. We check each type separately.
Each degree constraint is associated with a vertex u ∈ V; so there are only |V| many constraints to check.
Each connectivity constraint is associated with a vertex subset S. Since there are too many sets to check, we need to do this implicitly. Consider the problem of identifying the set ∅ ⊂ S∗ ⊂ V minimizing ∑_{e∈δ(S∗)} x_e.
This can be done in polynomial time since it is equivalent to finding a global minimum capacity cut in the complete graph G where the capacity of edge e is x_e. If the connectivity constraint defined by S∗ is satisfied, then we know that all connectivity constraints are. If x does not satisfy the connectivity constraint defined by S∗, then we have found a violated constraint.⁴

⁴ In fact, the algorithm finds a most violated constraint!
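For concreteness, here is a minimal sketch of such an oracle in Python. It assumes networkx (whose stoer_wagner routine computes a global minimum cut) and a fractional solution x given as a dictionary keyed by edges; these representation choices are illustrative, not part of the notes.

```python
import networkx as nx

def tsp_separation_oracle(V, x, tol=1e-9):
    """Separation oracle for the subtour polyhedron P (a sketch).

    V is the vertex set and x maps each edge (u, v) to its value.
    Returns None if x is in P; otherwise a violated constraint,
    either ("degree", u) or ("connectivity", S).
    """
    # Degree constraints: the x-values around each vertex must sum to 2.
    for u in V:
        if abs(sum(v for e, v in x.items() if u in e) - 2) > tol:
            return ("degree", u)

    # Connectivity constraints: compute a global minimum cut of the
    # support graph with capacities x_e; if its value is >= 2, every
    # connectivity constraint is satisfied.
    G = nx.Graph()
    G.add_nodes_from(V)
    G.add_weighted_edges_from((u, v, val) for (u, v), val in x.items()
                              if val > tol)
    if not nx.is_connected(G):  # a component S gives a cut of value 0
        return ("connectivity", set(next(iter(nx.connected_components(G)))))
    cut_value, (S, _) = nx.stoer_wagner(G)
    if cut_value < 2 - tol:
        return ("connectivity", set(S))
    return None  # x belongs to P
```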
To optimize over the polyhedra Q_0, Q_1, . . . , we can use simplex. One disadvantage of this approach is that each time we add a new constraint, we make the previous optimal solution infeasible, so we need to re-optimize from scratch. A better alternative is to solve the dual program, where we have |E| constraints and an exponential number of variables. The next section deals with the problem of solving linear programs with many variables.
8.2 Large number of variables
The edge coloring⁵ problem is to partition the edge set of a given graph G = (V, E) into a small collection of matchings. Let M be the set of all matchings in the input graph. Consider the following linear programming relaxation of the problem:
minimize   ∑_{M∈M} x_M
subject to ∑_{M∈M : e∈M} x_M = 1   ∀ e ∈ E
           x_M ≥ 0                 ∀ M ∈ M
⁵ Edge coloring is polynomially solvable in bipartite graphs. In fact, for this class of graphs it is possible to find an edge coloring with ∆ = max_u deg(u) matchings, which is optimal. For general graphs, we can always find in polynomial time an edge coloring with ∆ + 1 matchings, but it is NP-complete to decide whether there exists a coloring with ∆ matchings.
Suppose we wanted to solve the program with simplex. Recall that the algorithm works by moving from basis to basis until it finds an optimal solution. A basis of this program is defined by a collection of |E| matchings. The idea is to keep track of the current basis and to run simplex implicitly without actually constructing the entire program. To accomplish this we need to:
(i) be able to find an initial feasible basis, and,
(ii) given a basis, decide which new variable to bring into the basis and which variable to evict.
Finding an initial basis is usually an easy task. For example, in our program, we can use the collection of matchings having a single edge; that is, for each edge e ∈ E we have the matching that contains only e. This is clearly a basis: the program is in standard form, and the constraint matrix has full row rank because the matching M = {e} only appears in the constraint of edge e.
Finding which variable to bring into the basis involves identifying a variable with negative reduced cost. Let B be the current basis. The reduced cost of the variable x_M for some matching M is given by

c_M − c_B^T A_B^{−1} A_M,

where c_M = 1 is the coefficient of x_M in the objective function and A_M is the column of the constraint matrix that corresponds to x_M. If we let y^T = c_B^T A_B^{−1}, then the reduced cost of x_M is

1 − y^T A_M = 1 − ∑_{e∈M} y_e,

since A_M is the 0/1 incidence vector of M over the edges.
Therefore, we can re-interpret the reduced cost of x_M as 1 minus the y-value of M. Thus, if we find a matching M with maximum⁶ y-value, we will have identified the variable x_M with smallest reduced cost. If the reduced cost happens to be negative, we bring x_M into the basis and continue with the execution of simplex. On the other hand, if the reduced cost of x_M is 0, then we are at an optimal basic feasible solution.

⁶ Maximum weight matching is polynomially solvable even in general graphs.
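As a sketch, the pricing step can be implemented with an off-the-shelf maximum weight matching routine. The version below uses networkx and assumes the duals y are given as a dictionary keyed by edges; both choices are illustrative.

```python
import networkx as nx

def price_out(y, tol=1e-9):
    """Pricing step for the edge coloring LP (a sketch).

    y maps each edge (u, v) to its dual value y_e.  Returns a
    matching M whose variable x_M has negative reduced cost
    1 - y(M), or None if all reduced costs are nonnegative.
    """
    H = nx.Graph()
    H.add_weighted_edges_from((u, v, val) for (u, v), val in y.items())
    # A maximum weight matching maximizes y(M), hence minimizes the
    # reduced cost 1 - y(M) over all matchings M.
    M = nx.max_weight_matching(H)
    y_value = sum(y.get((u, v), y.get((v, u), 0.0)) for u, v in M)
    return M if y_value > 1 + tol else None
```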
8.3 The Ellipsoid algorithm
The results in the previous section tell us that we can run simplex over a polyhedron with many variables or many constraints without necessarily writing down the entire linear program explicitly: the only thing we need is a separation oracle, or an oracle for identifying a variable with minimum reduced cost.
In the worst case, however, these methods may end up generating a large number of constraints or variables. So the question remains: can these programs be solved efficiently? Amazingly, the answer is yes! In this last section we will go over the main ideas behind the Ellipsoid algorithm, which accomplishes this.
On the two-dimensional plane, an ellipsoid is the area enclosed by an ellipse. In higher dimensions an ellipsoid can be defined in
terms of a ball and an affine transformation. The n-dimensional unit ball is defined as

B_n = { x ∈ R^n : x^T x ≤ 1 }.
An affine transformation f : R^n → R^n is defined by a non-singular matrix M ∈ R^{n×n} and a vector s ∈ R^n, where f(x) = Mx + s.
Definition 8.1. An n-dimensional ellipsoid E is the image of the n-dimensional unit ball under some affine transformation f(x) = Mx + s; namely,

E = { Mx + s : x ∈ B_n }.
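For example, taking s = 0 and a diagonal M recovers the familiar axis-aligned ellipsoid:

```latex
% With M = diag(a_1, ..., a_n), a_i > 0, and s = 0:
E = \{ Mx : x \in B_n \}
  = \Big\{ y \in \mathbb{R}^n : \sum_{i=1}^{n} y_i^2 / a_i^2 \le 1 \Big\}
```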
In its simplest form, the Ellipsoid algorithm does not optimize a linear program but rather tests feasibility of a given polyhedron

P = { x ∈ R^n : Ax ≤ b }.

As we have already seen, this primitive can be used to actually solve any given linear program. We will start by making some simplifying assumptions:
1. We can fit P inside a ball of radius R centered at the origin.
2. Either P is empty or we can fit a ball of radius ε inside of P.
3. Given an ellipsoid E centered at s and a half space H going through s, we can efficiently find an ellipsoid E′ such that E ∩ H ⊆ E′ and volume(E′) < e^{−1/(2n+2)} · volume(E).
An alternative definition of an ellipsoid can be obtained by putting together the definitions of affine transformation and unit ball:

{ Mx + s : x ∈ B_n }
  = { y : M^{−1}(y − s) ∈ B_n }
  = { y : (M^{−1}(y − s))^T (M^{−1}(y − s)) ≤ 1 }
  = { y : (y − s)^T (M^{−1})^T M^{−1} (y − s) ≤ 1 }
  = { y : (y − s)^T (M M^T)^{−1} (y − s) ≤ 1 }
  = { y : (y − s)^T Q^{−1} (y − s) ≤ 1 },

where Q = M M^T. If M is non-singular then Q is positive definite. Conversely, every positive definite matrix Q can be decomposed into Q = M M^T for some non-singular matrix M.
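The last sentence is exactly what a Cholesky factorization computes; a quick numerical check in Python (the matrix Q below is just an arbitrary positive definite example):

```python
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # an arbitrary positive definite matrix
M = np.linalg.cholesky(Q)       # lower triangular with Q = M M^T
assert np.allclose(M @ M.T, Q)  # M is non-singular: det(M)^2 = det(Q) > 0
```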
We are ready to formally list the pseudocode for the Ellipsoid algorithm.

1: function ellipsoid(P, ε, R)
2:   E = B_n(R), s = 0
3:   while s ∉ P do
4:     if we iterated more than n·(2n+2)·ln(R/ε) times then return "P is empty"
5:     let a·x ≤ b be a constraint of P that s violates
6:     H = { x : a·x ≤ a·s }
7:     E = the ellipsoid E′ from Assumption 3 w.r.t. E and H
8:     let s be the center of the new ellipsoid E
9:   return s
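Assumption 3 is usually discharged with closed-form formulas. Below is a sketch of one update step in Python, representing E by its center s and positive definite matrix Q as in the margin note above, and using the standard central-cut formulas; it assumes n ≥ 2 and is meant as an illustration rather than a robust implementation.

```python
import numpy as np

def ellipsoid_update(Q, s, a):
    """One central-cut step: shrink E = {x : (x-s)^T Q^{-1} (x-s) <= 1}
    to an ellipsoid E' containing E ∩ {x : a.x <= a.s} (a sketch)."""
    n = len(s)
    b = Q @ a / np.sqrt(a @ Q @ a)   # Qa scaled by sqrt(a^T Q a)
    s_new = s - b / (n + 1)          # move the center into H
    Q_new = (n**2 / (n**2 - 1.0)) * (Q - (2.0 / (n + 1)) * np.outer(b, b))
    return Q_new, s_new
```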
Theorem 8.1. Algorithm ellipsoid either returns a point inside P or correctly identifies that P = ∅, provided Assumptions 1 through 3 hold.
Proof. Clearly, if the algorithm returns a solution s then s ∈ P, so we only need to worry about what happens when P is reported to
be empty. Let us assume that this happens after k = n · (2n + 2) · ln(R/ε) iterations.
Let E_0, E_1, . . . , E_k be the sequence of ellipsoids computed by the algorithm. By Assumption 1, we have P ⊆ E_0. By Assumption 3 and induction, we get that in fact P ⊆ E_i for all i = 1, . . . , k.
By Assumption 2, the volume of P is at least the volume of B_n(ε). On the other hand, in each iteration the volume of the ellipsoid gets progressively smaller. In fact,

volume(E_k) < (e^{−1/(2n+2)})^k · volume(B_n(R))
            = e^{−n·ln(R/ε)} · volume(B_n(R))
            = (ε/R)^n · volume(B_n(R))
            = volume(B_n(ε)).
In other words, after k iterations, the ellipsoid E is too small to
contain a non-empty P provided Assumption 2 holds.
Our simplifying assumptions certainly make proving the correctness of the algorithm a breeze. Unfortunately, completely lifting these assumptions involves a fair amount of technical detail that is beyond the scope of this class. You only need to know that ellipsoid can be implemented to run in time that is polynomial in n, m, and the number of bits needed to write down the constraints.
8.4 Properties of the Ellipsoid algorithm
One remarkable property of ellipsoid is that the only thing needed to run the algorithm is to have a separation oracle.
Even though we have talked about ellipsoid in the context of linear programs, the algorithm can be used to solve optimization problems over arbitrary convex sets. The only thing we need is to be able to tell whether the center of the ellipsoid is feasible or to generate a hyperplane separating the center from the feasible region.
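As an illustration of this point, the whole algorithm can be driven by nothing more than an oracle. The sketch below reuses the hypothetical ellipsoid_update from the earlier sketch; oracle(s) is assumed to return None if s is feasible and otherwise a vector a defining a separating halfspace {x : a·x ≤ a·s}.

```python
import numpy as np

def ellipsoid_feasibility(oracle, n, R, eps):
    """Feasibility version of the Ellipsoid algorithm (a sketch).

    Relies on the hypothetical ellipsoid_update from the earlier
    sketch. oracle(s) returns None if s is feasible, else a
    separating direction a with the feasible region in a.x <= a.s.
    """
    Q = (R * R) * np.eye(n)  # E_0 = B_n(R): Q = R^2 I, center s = 0
    s = np.zeros(n)
    for _ in range(int(np.ceil(n * (2 * n + 2) * np.log(R / eps)))):
        a = oracle(s)
        if a is None:
            return s         # s lies in the feasible region
        Q, s = ellipsoid_update(Q, s, a)
    return None              # feasible region must be (essentially) empty
```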
Finally, let us remark that for all its amazing theoretical importance, ellipsoid has had little impact in practice: the degree of the polynomial bounding its running time is too large to be of much use. There are, however, other algorithms based on the so-called interior point method⁷ that are competitive in practice yet have worst-case polynomial time complexity.

⁷ Interior point method, Wikipedia.
Exercises

1. Derive the dual of the linear programming relaxation for edge coloring. Notice that the dual has an exponential number of constraints and a polynomial number of variables. Design a separation oracle for the dual.
2. Let G = (V, E) be an undirected graph with edge capacities c : E → R_+. For a given pair of vertices s, t ∈ V, let P be the set of s-t paths in G. Consider the following linear programming formulation for the maximum flow problem.

maximize   ∑_{P∈P} x_P
subject to ∑_{P∈P : e∈P} x_P ≤ c_e   ∀ e ∈ E
           x_P ≥ 0                   ∀ P ∈ P

What do you think is the meaning of the variable x_P? Show that the formulation is equivalent to the one where we have a flow variable f_e for each edge e ∈ E.
3. Derive the dual of the previous program and design a separation oracle for it.
4. In the bin packing problem we are given a set of items {1, . . . , n}. Item i has a size s_i ∈ (0, 1). The objective is to partition the items into a small number of bins B_1, B_2, . . . , B_k such that s(B_i) ≤ 1 for i = 1, . . . , k, where s(B) = ∑_{i∈B} s_i. Let B be the collection of feasible bins:
B = { B ⊆ {1, . . . , n} : s(B) ≤ 1 }.

Consider the following linear programming relaxation:

minimize   ∑_{B∈B} x_B
subject to ∑_{B∈B : i∈B} x_B ≥ 1   ∀ i = 1, . . . , n
           x_B ≥ 0                 ∀ B ∈ B
Suppose we wanted to run simplex without actually generating the whole program. What kind of optimization problem do we need to solve in order to test if there is a non-basic variable with negative reduced cost?
5. Consider the problem of scheduling a set of jobs J = {1, . . . , n} on a single machine. Each job j ∈ J has a processing time p_j and a weight w_j. Any ordering of the jobs σ_1, σ_2, . . . , σ_n induces a vector of completion times C_1, C_2, . . . , C_n, where C_{σ_k} = p_{σ_1} + p_{σ_2} + ··· + p_{σ_k}.
Prove that any such vector of completion times is feasible for the following linear program that aims to minimize the sum of weighted completion times.

minimize   ∑_{j∈J} w_j C_j
subject to ∑_{j∈S} p_j C_j ≥ (1/2) ∑_{j∈S} p_j² + (1/2) (∑_{j∈S} p_j)²   ∀ S ⊆ J
           C_j ≥ p_j   ∀ j ∈ J
* 6. Design an efficient separation oracle for the previous program.
* 7. Recall the following formulation for the minimum spanning tree problem.

minimize   ∑_{e∈E} w_e x_e
subject to ∑_{e∈E(S)} x_e ≤ |S| − 1   ∀ ∅ ⊂ S ⊆ V
           ∑_{e∈E} x_e = |V| − 1
           x_e ≥ 0   ∀ e ∈ E

Here E(S) denotes the set of edges with both endpoints in S.
Design a separation oracle for this program using minimum cut.
Solutions to selected exercises
1. Recall that M is the set of all matchings in the input graph and that the primal is

minimize   ∑_{M∈M} x_M
subject to ∑_{M∈M : e∈M} x_M = 1   ∀ e ∈ E
           x_M ≥ 0                 ∀ M ∈ M
Its dual is as follows:

maximize   ∑_{e∈E} y_e
subject to ∑_{e∈M} y_e ≤ 1   ∀ M ∈ M
To design a separation oracle, we need to find a matching M∗ ∈ M maximizing y(M∗) = ∑_{e∈M∗} y_e. If y(M∗) > 1 then we have found a violated constraint. Otherwise, y is feasible.