Week 1
Introduction to Discrete Optimization
This week we introduce the notion of optimization problems, draw a distinction between continuous and discrete optimization, in- troduce linear programming, and sketch the connection between polyhedral theory and optimization.
1.1 Optimization problems
An optimization problem is specified by a set of feasible solutions F and an objective function f : F → R. The objective is to find a solution x ∈ F minimizing1 f(x); that is, we want a solution x ∈ F such that f(x) ≤ f(y) for all y ∈ F.
Optimization problems can be classified into continuous or dis- crete depending on the nature of the set of feasible solutions F.
Continuous optimization
The defining feature of a continuous optimization problem is that its feasible set F can be described by real variables x1, . . . , xn.
An example for n = 1 is the problem of finding the value x > 0 maximizing−x2+x+2.InthiscaseF = {x∈R:x>0}and f(x) = x2 −x−2.
Although this unit of study deals mostly with discrete optimiza- tion, we will spend some time studying a special type of continu- ous optimization problem called linear programming.
A linear program is defined by a vector of variables x ∈ Rn, a system of linear inequalities Ax ≥ b defined by a constraints matrix A ∈ Rm×n and a vector b ∈ Rm, and a linear objective function c·xforsomevectorc ∈ Rn. Theproblemistominimizec·x subject to Ax ≥ b. In this case F = {x ∈ R : Ax ≥ b} and f(x) = c · x.
Discrete optimization
The defining feature of a discrete optimization problems is that its feasible set F is finite. It may seem at first that discrete optimiza- tion problems are trivial, since we can just go over every element in F and pick the best. The catch is that F is not given to us explicitly, but through a compact implicit representation.
1 The choice of minimization over maximization is arbitrary. Notice that we can maximize f(x) by minimizing −f(x), and vice-versa.
f(x) 2.25
0.5 x
The maximum is attained at x = 0.5.
© Copyright 2021 Julián Mestre.
This work is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike 4.0 International License.
week 1 – introduction to discrete optimization
discrete optimization 2
As an example, consider the Knapsack problem were we are
given n items, where the ith item has size si and value vi and a
knapsack with capacity C, meaning that we can only fit a subset
of items whose combined size is no more than C. The objective
is to find a set of items with maximum value that can fit into the
A Knapsack instance:
knapsack.Inthiscase,F =S⊆[1,n]: s ≤Candf(S)=
The optimal solution is {1, 3} and
has value 500; the only other feasible alternative, {2}, has a value of only 400.
A maximum weight spanning tree instance:
2
ab
2 41
c3d An optimal solution is
{(a, b), (b, d), (c, d)}
2 Kruskal’s algorithm, Wikipedia.
3 One polyhedron, two polyhedra.
In R2 , a bounded polyhedron is simply the area enclosed by a convex polygon.
In R2, a hyperplane is a line and a half-space is a half-plane.
4RecallthatasetS⊆Rnisconvex ifforallx,y ∈ Sandλ ∈ [0,1]the vector λx + (1 − λ)y also belongs to S. The vector λx+(1−λ)y is said to be a convex combination of x and y
−
i∈S i
i∈S vi. Notice that we may have |F| = Ω(cn) for some c >
1. Therefore, an algorithm that tries all possible solutions in F may take an amount of time that is exponential on the size of the representation of the instance, which in the Knapsack case is just Θ(n). In fact, even if |F| were small, it is not easy to identify F out of the 2n possible subsets of the n items. The challenge, therefore, is to solve the optimization problem in time that is polynomial on the size of the representation of F.
Another example is the minimum weight spanning tree problem.
Given a connected graph G = (V,E) and an edge weight function
w : E → R+,wewanttofindasubsetofedgesT ⊆ Ethatis
connecting and minimizes e∈T w(e). In this case, F ={T ⊆E:(V,T)isconnected}
and f(T) = e∈T w(e). Unlike the Knapsack problem, we know of algorithms that solve the minimum weight spanning tree problem in polynomial time2.
1.2 Polyhedral Theory Polyhedra
Definition 1.1. A polyhedron3 is a set {x ∈ Rn : Ax ≥ b} defined by a matrix A ∈ Rm×n and a vector b ∈ Rm.
We say a polyhedron P ⊆ Rn is bounded if P ⊆ [−C, C]n for some C > 0, and unbounded otherwise.
Letabeanon-zerovectorinRn andd∈R. Then{x∈Rn :a·x=d} is called a hyperplane and {x ∈ Rn : a · x ≥ d} is called a half-space.
Half-spaces are the building blocks of polyhedra: Each constraint in the inequality system Ax ≥ b is a half-space. Indeed, a polyhe- dron is nothing but the intersection of a bunch of half-spaces. We can use this observation to prove that polyhedra are convex.
Theorem 1.1. Every polyhedron is a convex4 set.
Proof. It is not hard to show that the intersection of two convex sets is also convex and that half-spaces are convex. It follows then every polyhedron is convex.
Extreme points, vertices, and basic feasible solutions
Let us take a look at a very simple polyhedron in R2, the unit square P = (x1,x2) ∈ R2 : 0 ≤ x1,x2 ≤ 1. One can immediately see that P has four corners: {(0, 0), (0, 1), (1, 0), (1, 1)}. If we were to
(0, 1)
(0, 0)
(1, 1)
i si vi 1 21 300
2 35 400 andC=50 3 22 200
(1, 0)
week 1 – introduction to discrete optimization
discrete optimization 3
optimize any linear objective over P, the minimum will be attained at one of these four corners. This simple observation is true in gen- eral, but before we can prove that, we need to have a more formal definition of “corner” solution that can be generalized to higher dimensional polyhedra.
Extreme points. A vector x ∈ P is an extreme point of P if there does not exist y,z ∈ P−x such that x is a convex combination of y and z. Certainly, if a vector x could be written as the convex combina-
tion of two other vectors in P, we would not consider it a “corner” point.
Vertices. A vector x ∈ P is a vertex of P if there exists a vector c such that c · x < c · y for all y ∈ P − x.
This definition is more related to optimization: It says is that there is some linear program whose feasible region is P and its unique optimal solution is x.
Basic solutions. For this definition we need the notion of tight con- straint. Given a polyhedron P = {x ∈ Rn : ai ·x ≥ bi for i = 1,...,m} and a vector x ∈ Rn, we will say that the ith constraint is active or tight at x if ai · x = bi.
Then x ∈ P is a basic feasible solution if the set of active con- straints at x has full rank. In other words, x is the only vector in Rn that meets all active constraints with equality.
This definition is purely algebraic and does away with all the geometry of the problem. As such, it will be very useful when dealing with high dimension polyhedra where it is impossible to visualize things.
Equivalence of extreme points, vertices, and basic feasible solutions
The next Theorem states that our three tentative definitions of a “corner” point are, in fact, equivalent. The reason why we have three definitions for the same concept is that, depending on the context, it will be convenient to view polyhedra “corners” as ver- tices, extreme points, or basic feasible solutions.
Theorem 1.2. Let P be a non-empty polyhedron and x ∈ P. The following statements about x are equivalent:
i) x is a vertex.
ii) x is an extreme point.
iii) x is a basic feasible solution.
Proof. First, we show that every vertex is also an extreme point. For the sake of contradiction assume that x is a vertex but not an extreme point. Because x is not an extreme point, we can write x isλy+(1−λ)zwhere0 ≤ λ ≤ 1andy,z ∈ P−x. Becausexisa
The only extreme points of the unit square are its four corners.
The only vertices of the unit square are its four corners. For example, if weletc = (1,1),thenc·x > 0forall x ∈ P−(0,0) and c·(0,0) = 0.
The only basic solutions of the unit square are its four corners. For ex- ample, at (0, 1) we have two active constraints x1 ≥ 0 and x2 ≤ 1, which clearly have full rank.
It is not difficult to show that there is only a finite number of basic feasible solutions. (See exercises.)
week 1 – introduction to discrete optimization
discrete optimization 4
vertex, we can find c such that c·x < c·y and c·x < c·z. However, this leads to the contradiction that
c·x = c·(λy+(1−λ)z) = λ c · y + (1 − λ) c · z >λc·x+ (1−λ)c·x = c · x.
Therefore, x must be an extreme point.
Second, we show that every extreme point is also a basic feasible
solution. For the sake of contradiction assume that x is an extreme point but not a basic feasible solution. This means that there exists y ∈ P−xsuchthatifaconstraintistightatxitmustalsobe tight at y. Then it follows that for small enough ε > 0 we have
p = x+ε(y−x) ∈ P and q = x−ε(y−x) ∈ P. However, this leads to the contradiction that
x = 12 p + 21 q .
Therefore, x must be a basic feasible solution.
Finally, we show that every basic feasible solution is a vertex. As-
sume that the constraints defining our polyhedron P are of the form
ai·x≥bi.LetxbeabasicfeasiblesolutionandT ={i:aix=bi}
be the set of tight constraints at x. We define c = i∈T ai. No- ticethatc·x = b.Furthermore,forally ∈ Pwehave
i∈Ti
c · y ≥ i∈T bi and we have equality only if all constraints in T
are tight at y. Because x is basic feasible, we know that x is the only solution in P that has this property. Therefore, x is a vertex.
1.3 Bounded polyhedra
The convex hull of a set of vectors a1,a2,…,ak ∈ Rn is the set of
all convex combinations of these vectors:
kk CH({a1,…,ak})= λiai : λi =1andλi ≥0 .
i=1 i=1
The convex hull of a set of vectors is indeed convex5.
Theorem 1.3. Let P ⊆ Rn be a bounded non-empty polyhedron. Then P=CH({x∈Rn :xisanextremepointofP}).
It follows that if P is a bounded6 non-empty polyhedron, then the linear program
minimize c·x subject to x ∈ P
has an optimal solution in its set of extreme points for all c.
This means that even though linear programming over a bounded
polyhedron is a continuous optimization problem, we can think of it as a discrete optimization problem.
5 In fact, it is not difficult to show that it is the smallest convex set containing those vectors.
6 If P is not bounded and non-empty, it may be unbounded so the Theorem 1.3 does not hold in general. In fact, a linear program can have bounded objective even though its feasible region may have no extreme point.
week 1 – introduction to discrete optimization
discrete optimization 5
1.4 Polyhedra in standard form
It will be convenient for our algorithms to deal with polyhedra in standard form. Such polyhedra are defined by a matrix A ∈ Rm×n having m linearly independent rows and a vector b ∈ Rm.
P={x∈Rn :Ax=bandx≥0} isapolyhedroninstandardform.
Similarly, a linear program is said to be in standard form if its
feasible region is a polyhedron in standard form. It can be shown
(see exercises) that any linear program can be turned into an equiv-
alent linear program in standard form.
Consider the polyhedron P = x∈R3 :Ax=b, x≥0where
1 1 2 1 A= 2 0 −1 andb= 1 .
basis {1, 2} {1, 3} {2, 3}
Given a polyhedron P ⊆ R
n
in standard form, a set B ⊆ [1, n]
isabasisif|B| = mandAB hasfullrank. (WherethenotationAB
is a short-hand for the matrix defined by the columns of A indexed
by B; similarly, xB is the vector defined by the coordinates of x
indexed by B.) The vector x defined as xB = A−1b and x[1,n]\B = 0
B
is the basic solution induced by the basis B.
Notice that if a basic solution x also happens to be feasible it
agrees with our previous definition of basic feasible solution.
Exercises
1. Come up with three well known discrete optimization problems. For each problem, describe F and f.
2. Come up with an example of an unbounded polyhedron in R2.
3. Come up with an example of a linear program that has bounded objective and whose feasible region is non-empty and has no ex- treme points.
4. What does a polyhedron in R1 look like?
5. Let P be a polyhedron in Rn defined by m constraints. Prove
that P cannot have more than m basic feasible solutions. n
6. Show that every linear program of the form minimize c · y subject to Ay ≥ b
can be turned into an equivalent linear program of the form minimize c′ ·x subject to A′x ≥ b′, x ≥ 0.
Equivalent means that if we can solve the latter then we can solve the former. The second program need not have the same number of variables as the first.
basic solution
(0.5, 0.5, 0) yes (0.6, 0, 0.2) yes
feasible? (0, 3, −1) no
week 1 – introduction to discrete optimization
discrete optimization 6
7. Show that every linear program of the form minimize c·y subject to Ay ≥ b, y ≥ 0
can be turned into an equivalent linear program of the form minimize c′ ·x subject to A′x = b′, x ≥ 0.
Equivalent means that if we can solve the latter then we can solve former. The second program need not have the same number of variables as the first.
* 8. Show that every feasible linear program of the form minimize c·y subject to Ay = b, y ≥ 0
can be turned into an equivalent linear program of the form minimize c′ ·x subject to A′x = b′, x ≥ 0,
such that the rows in A′ are linearly independent. Equivalent means that if we can solve the latter then we can solve former.
* 9. Let A1,…,Am be vectors in Rn and assume m ≫ n. The convex hull of these vectors is
m
C= λ1A1+···+λmAm :
Show that each y ∈ C can be written as a convex combination of just n + 1 of these vectors.
i=1
λi =1andλi ≥0fori=1,…,m .
week 1 – introduction to discrete optimization
discrete optimization 7
Solutions of selected exercises
3. Rather than presenting the instance out of the blue, we follow the just-do-it approach7 to show the thought process of how to get to the instance. We start with some observations to guide our construction.
By Theorem 1.3 we know that if a polyhedron is bounded and non-empty, then it must have extreme points. Therefore, we must consider unbounded polyhedra.
By Theorem 1.2 we know that the system of linear inequalities should not have too many linearly independent constraints, other- wise, we are bound to have basic solutions.
If our linear program has a single variable, then either the fea- sible region is the whole real line (in which case the program will never have bounded objective) or it is subset of it (in which case it will have an extreme point). Clearly, if we are to make this work, we need at least two variables.
At this point, we need to start playing around with some simple program until we stumble upon something that works, such as:
minimize x1 + x2 subject to x1 + x2 ≥ 0.
The above program clearly has bounded objective (any solution with x1 + x2 = 0 is optimal and has value 0) and has no extreme points because it has a single constraint (we need at least two lin- early independent tight constraints in order to have a basic solu- tion).
6. Notice that the only difference between the two linear programs is that in the second program variables are only allowed to take non-negative values. Therefore, we need to be able to model unre- stricted variables (ones that can take positive and negative values) with restricted variables that can only take non-negative values. This can be accomplished by replacing each unrestricted variable yi with two variables x−i , x+i ≥ 0 using the relation yi = x+i − x−i . The new program would look like
minimize c·(x+ −x−) subject to A(x+ −x−) ≥ b, x+ ≥ 0,x− ≥ 0.
Given a feasible solution y of the first program, we can construct a feasible solution to the second program with the same cost as follows
x+i = max(0, yi) and x−i = − min(0, yi).
Similarly, given a feasible solution (x+, x−) of the second program, we can construct a feasible solution to the first program with the same cost as follows
y i = x +i − x −i .
It follows that an optimal solution solution of the second pro- gram can be transformed into an optimal solution of the first pro-
7 Just-do-it proofs. ’s Weblog, August 16, 2008.
week 1 – introduction to discrete optimization discrete optimization 8
gram using the above mapping. Therefore, if we can solve pro- grams of the second kind, we can also solve programs of the first kind.