
Limits of Computation 2020/21
Part IV Complexity: Time Measure and Classes (Some notes on Lectures 12–15)
Bernhard Reus March 8, 2021
In this part of the course we focus on complexity results. We thus ask the question "is it feasible to compute …?" rather than "is it computable?". Therefore we focus on decidable problems and ask how much time it takes to compute the answer (yes or no for decision problems) for a given input.
12 Measuring time usage
Before we can look at any result in complexity we must decide how to measure time usage of programs and we want to do this for all models of computation we have discussed in the previous chapter. Having done that we will try to define classes of programs with similar runtime “needs” as well as problems that can be decided in “similar time”. We will make all this precise in this chapter.
12.1 Unit-cost time measure
For imperative languages (and machine models) with labelled instructions (which is all our models besides WHILE) we can use the following measure, called unit-cost time measure, which counts each instruction as a "unit", thus not distinguishing between any differences in time the various instructions may need to execute. The cost of the computation is then the sum of the costs of the instructions.
Definition of unit-cost measure: For an imperative language L, the function time^L : L-program → L-data → N⊥ is defined as follows: For any p ∈ L-program, d ∈ L-data we define

time^L_p(d) = t   if p ⊢ s₁ → s₂ → … → s_t where s₁ = Read_in(d) and s_t is terminal
time^L_p(d) = ⊥   otherwise
This means that with any completed program execution we associate as "runtime" the number of transition steps the program with the given input takes according to the semantics. Note that reading the input takes one step implicitly as the first state is indexed by 1 and so after one step we're in state 2 and so on.
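To make the unit-cost definition concrete, here is a minimal Python sketch (not part of the lecture material) that counts transition steps for an abstract machine given by a step function; the state shape and the toy countdown machine are invented purely for illustration.

from typing import Callable, Optional, Tuple

State = Tuple[str, int]   # (label, register) -- a made-up state shape

def unit_cost_time(step: Callable[[State], Optional[State]], initial: State,
                   max_steps: int = 10_000) -> Optional[int]:
    """Count transitions until a terminal state (step returns None).
    Returns the unit-cost time, or None (standing in for "bottom") if the
    machine has not terminated within max_steps transitions."""
    state, t = initial, 0
    while True:
        nxt = step(state)
        if nxt is None:            # terminal state reached
            return t
        state, t = nxt, t + 1      # every instruction costs one "unit"
        if t > max_steps:          # we cannot decide non-termination, so cut off
            return None

# Toy machine: decrement a register until it reaches 0.
def dec_step(state: State) -> Optional[State]:
    label, x = state
    return None if x == 0 else (label, x - 1)

print(unit_cost_time(dec_step, ("L1", 5)))   # prints 5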
Note also that the time a program takes for execution obviously depends on the input d. For different inputs it can (and very often will) take different time to execute the program. Since the time depends on the input, the running time measure is a function from L-data to natural numbers. It is a partial function since the program may not terminate on some input.
12.2 Is this unit-cost measure always “fair”?
When is the unit-cost measure not appropriate? It’s worth pointing out that the unit-cost measure treats all instructions the same way, discounting what those instructions actually achieve. If those instructions “did a huge amount of work” in terms of the size of the input then the measure would be questionable.
If we consider Turing machines, we know (as Turing explained himself in his seminal paper) that each instruction can only do very little (move left, right, write one symbol on the tape) so for this the unit-cost measure is fine. For GOTO we need to consider that we can only move data values around in variables as we cannot have proper expressions in assignments. Instead we need to use assignments of the form X:= cons Y Z or X:= hd Y. Those cannot contribute much to the size of a tree. The cons case for instance uses two trees already constructed (so the time for their construction should be already accounted for) and adds just one node to it. Thus, it is justified to count each assignment by one “unit” of time (whether it is one or two does not make a difference anyway).
The situation is a bit more complicated for RAM despite the fact that – as for GOTO – most instructions can only move data around (or jump): it is the binary operations that could create large numbers in one single assignment. Since this computation model deals with numbers as input and output we could construct polynomially big numbers in constant time with the help of the binary operators + and *. In order to get a "reasonably realistic" time measure we would have to employ a logarithmic measure of the size of numbers (how many digits are needed to represent them). This is entirely possible and not too difficult but we won't do this in the course and leave it to the interested student for further study.
This issue is also the motivation to look at SRAM (Successor RAM) which does not allow us to use any binary operators on registers but provides just an increment operator (X := Y + 1) and a decrement operator (X := Y ∸ 1). Here the unit-cost measure is realistic and adequate.
For Counter machines we have somewhat the opposite problem. For them only a counter itself can be incremented or decremented by one (X := X + 1 and X := X ∸ 1, respectively), but counters cannot be copied in one step (at cost "1") from one counter to another (as is possible in SRAM). Any copying action will need to deconstruct a counter (using decrement) and build another one (or several) of the same value (using increment). Therefore, counter machines are too weak to compute interesting results in polynomial time and are thus disregarded for time complexity.1
1They are however usable for space complexity considerations.
Finally, for WHILE the unit-cost time measure is definitely not adequate as it would disregard the time it costs to build a tree used in an assignment. Consider for instance the assignment X := [nil, nil, nil, …, nil] which stores a tree of arbitrary size (depending on how many nils we put in the list) in X in just one assignment that therefore incurs just the cost "1". This surely is not "fair" or adequate. We thus need to consider the size of tree expressions in the commands of a WHILE program.
12.3 Time measure for WHILE
We first define a measure for expressions. The slogan is "count operations and constants in expressions of commands as well". Note that for a "fair" measure of time complexity we exclude the equality = from expressions; any such comparison can be programmed already in WHILE using just comparisons with nil.2 Also, we only define the time measure for the pure WHILE-language without extensions, since we know that all extensions can always be translated into the pure language anyway.
Definition of time measure for WHILE expressions:
The function T maps WHILE expressions into the time it takes to evaluate them and thus has type:
T : WHILE-Expressions → N
It is defined inductively according to the shape of expressions as follows:
T nil          = 1                 (the special atom nil)
T X            = 1                 (where X is a variable)
T (hd E)       = 1 + T E
T (tl E)       = 1 + T E
T (cons E F)   = 1 + T E + T F
In other words, for the atom nil and for variables we require just 1 unit of time, as the atom is a basic constant and lookup of a variable should not cost much either. For the head hd and tail tl expressions, respectively, we measure the time for evaluating their argument and then take the left, or right, subtree, respectively, which costs one unit as well. For the binary tree constructor cons, we count 1 for the constructor and add the time needed to evaluate the subexpressions E and F. So T E describes the time it takes to evaluate a WHILE-expression E in terms of units of time. Those units of time are given as natural numbers. Note that our expressions have no side effects and thus their evaluation always terminates. Therefore T is a total function.
For instance, T (cons (hd X) (tl nil)) = 1 + (1+1) + (1+1) = 5. Note that in Neil Jones's book an extra store σ appears that can be dropped here as expressions do not have side effects.
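As an illustration only (the tuple encoding of expressions below is made up, not the course's official representation), the measure T can be implemented by structural recursion in Python:

# WHILE expressions encoded as nested tuples:
#   ("nil",), ("var", "X"), ("hd", e), ("tl", e), ("cons", e, f)

def T(e) -> int:
    """Time to evaluate a WHILE expression, following the inductive definition."""
    tag = e[0]
    if tag in ("nil", "var"):        # the atom nil or a variable lookup: 1 unit
        return 1
    if tag in ("hd", "tl"):          # 1 unit plus the time for the argument
        return 1 + T(e[1])
    if tag == "cons":                # 1 unit plus the time for both subexpressions
        return 1 + T(e[1]) + T(e[2])
    raise ValueError(f"unknown expression: {e!r}")

# The worked example from the text: cons (hd X) (tl nil) costs 5 units.
example = ("cons", ("hd", ("var", "X")), ("tl", ("nil",)))
assert T(example) == 5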
Next, we define a measure for WHILE commands. We will write C ⊢time σ ⇒ t to indicate that it takes t units of time to run command C in store σ. Like with the semantics of WHILE (due to the possibility of non-termination) we describe the above relation as the smallest relation fulfilling some rules.

2 As an exercise implement equality of trees without equality.
Definition of time measure for WHILE commands:
For a store σ the relation S ⊢time σ ⇒ t states that executing WHILE-statement list S in store σ takes t units of time. Deviating minimally from the grammar of Lecture 3, we allow statement lists to be empty here to avoid unnecessary distinction between empty and non-empty blocks. The relation • ⊢time • ⇒ • ⊆ StatementList × Store × N is defined as the smallest relation satisfying the rules below:
X := E ⊢time σ ⇒ t+1                      if T E = t

if E {ST} else {SE} ⊢time σ ⇒ t+1+t′      if E⟦E⟧σ ≠ nil, T E = t and ST ⊢time σ ⇒ t′

if E {ST} else {SE} ⊢time σ ⇒ t+1+t′      if E⟦E⟧σ = nil, T E = t and SE ⊢time σ ⇒ t′

while E {S} ⊢time σ ⇒ t+1                 if E⟦E⟧σ = nil, T E = t

while E {S} ⊢time σ ⇒ t+1+t′              if E⟦E⟧σ ≠ nil, T E = t and S; while E {S} ⊢time σ ⇒ t′

ε ⊢time σ ⇒ 0

C; S ⊢time σ ⇒ t+t′                       if C ⊢time σ ⇒ t, C ⊢ σ → σ′ and S ⊢time σ′ ⇒ t′
First, we consider statement lists that just contain one command: For an assignment, we measure the time it takes to evaluate E and then add one to account for the time it takes to perform the assignment. In case of the conditional, we have to first measure the time it takes to evaluate E, add one for the time it takes to check whether it is nil or not and then add the execution time for the branch that is executed according to the result of E. As for the semantics of WHILE, we have two possible cases for the while loop: either the loop terminates or the body is executed (at least once). If for while E {S} it is the case that E evaluates to nil, we measure the time it takes to evaluate E and add one for the test of being nil. If, however, the guard evaluates to "true" (not nil) we need to unfold the loop (according to the semantics discussed earlier), so we measure again the time it takes to evaluate E, adding one for the nil test, but we now also have to add the time for executing the body S followed, again, by the entire loop.
For statement lists, we distinguish two cases. The empty statement list, ε, requires no computation whatsoever and thus costs zero units. For sequential composition of a command C and a statement list S, we define the time measure to be the added measure for executing C and S. In order to be able to measure the time for S we need to know the (intermediate) store σ′ in which to begin the execution. We get this from actually computing the result state of C run in σ.
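Continuing the illustration (still with invented tuple encodings, not the official syntax), the rules for statement lists can be turned into a small Python evaluator that returns the final store together with the time units used:

# Trees: nil is None, cons is a pair (l, r). Expressions as in the previous
# sketch; statements are ("assign", x, e), ("if", e, st, se), ("while", e, body);
# statement lists are Python lists.

def E(e, store):
    """Evaluate a WHILE expression in a store (dict from variable name to tree)."""
    tag = e[0]
    if tag == "nil":  return None
    if tag == "var":  return store.get(e[1])
    if tag == "hd":   v = E(e[1], store); return v[0] if v else None
    if tag == "tl":   v = E(e[1], store); return v[1] if v else None
    if tag == "cons": return (E(e[1], store), E(e[2], store))

def T(e):
    if e[0] in ("nil", "var"): return 1
    if e[0] in ("hd", "tl"):   return 1 + T(e[1])
    return 1 + T(e[1]) + T(e[2])

def timed_exec(stmts, store):
    """Run a statement list; return (final store, time units) following the rules."""
    t = 0
    for s in stmts:
        if s[0] == "assign":
            _, x, e = s
            store = {**store, x: E(e, store)}
            t += T(e) + 1                          # evaluate E, plus 1 for the assignment
        elif s[0] == "if":
            _, e, st, se = s
            branch = st if E(e, store) is not None else se
            store, t2 = timed_exec(branch, store)
            t += T(e) + 1 + t2                     # guard, nil-test, chosen branch
        elif s[0] == "while":
            _, e, body = s
            t += T(e) + 1                          # guard evaluation and nil-test
            while E(e, store) is not None:
                store, t2 = timed_exec(body, store)
                t += t2 + T(e) + 1                 # body, then re-test the guard
    return store, t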
Definition of time measure for WHILE programs:
For a WHILE-program p = read X {S} write Y we define the time measure
time^WHILE : WHILE-program → (D → N⊥)
i.e. the time it takes p to run with input d as follows:
time^WHILE_p(d) = t + 2   if S ⊢time σ₀^p(d) ⇒ t
time^WHILE_p(d) = ⊥        otherwise

where σ₀^p(d) denotes the initial store for p with input d (the input variable bound to d, all other variables bound to nil).
Note that the program argument is written as a subscript here to enhance readability.
So the time it takes for program p to run on input d is the time it takes its body S to execute in the initial store for input d, plus one for the input (and one for the output)3. If the program p does not terminate on input d then the time measure time^WHILE_p(d) is undefined.
12.4 Comparing timed programming languages
We can now equip a programming language L (which consists of syntax and semantics, see Chapter 6) with a time measure and call the result a timed programming language.
Definition of timed programming language: A timed programming language thus consists of
1. two sets, namely L-programs and L-data,

2. a function ⟦·⟧^L : L-programs → (L-data → L-data⊥) that describes the semantics of L, and

3. a function time^L : L-programs → (L-data → N⊥), the time measure for L,

such that for every p ∈ L-programs and d ∈ L-data we have that ⟦p⟧^L(d) = ⊥ if, and only if, time^L_p(d) = ⊥.
Note that the last condition in (3) just prescribes the timing function (time measure) to be undefined on those inputs for which the semantics prescribes that the result is undefined (because the program does not terminate). So the time function returns undefined exactly for those inputs for which the program does not terminate. This makes sense since we cannot measure run time for non-terminating programs in a sensible way.
In Chapter 11 we discussed compilers and how programs could be compiled into equivalent programs of another language (computational model). If we can compile language L into language M (preserving program semantics) we can say that M simulates L. Now we want to add timing information to this. To keep things simple, we assume that both languages have the same datatype4. We thus define the following simulation relations for languages:

4 In cases where the data types are different one can still apply the same techniques but the translations get more complicated as one needs to encode data values and this is somewhat distracting.
Definition of simulation relation
Suppose we are given two timed programming languages L and M with L-data = M-data. We define
1. L ≼ptime M if for every L-program p there is an M-program q such that ⟦p⟧^L = ⟦q⟧^M and a polynomial f(n) such that for all d ∈ L-data

time^M_q(d) ≤ f(time^L_p(d))

In words: M can simulate L up to polynomial difference in time. (Or alternatively: L can be simulated by M up to polynomial difference in time.)

The inequality above can be explained in words as follows: the time it takes the simulating program q to run with any input (worst-case analysis!) is less than or equal to a polynomial applied to the time it takes the original program p to run with the same input.
2. L ≼lintime M if for every L-program p there is a constant a_p ≥ 0 and an M-program q such that ⟦p⟧^L = ⟦q⟧^M and for all d ∈ L-data

time^M_q(d) ≤ a_p × time^L_p(d)

In words: M can simulate L up to linear difference in time. Accordingly, a_p is called the overhead factor. It can be less than 1 (speedup) or (more likely) greater than 1 (slowdown).

The inequality above can be explained in words as follows: the time it takes the simulating program q to run with any input (worst-case analysis!) is less than or equal to a constant, program-dependent factor a_p times the time it takes the original program p to run with the same input.
3. L ≼lintime−pg−ind M if for every L-program p there is a constant a ≥ 0 and an M-program q such that ⟦p⟧^L = ⟦q⟧^M and for all d ∈ L-data

time^M_q(d) ≤ a × time^L_p(d)

In words: M can simulate L up to a program-independent linear time difference (or overhead factor).
The inequality above can be explained in words as follows: the time it takes the simulating program q to run with any input (worst-case analysis!) is less than or equal to a constant, program-independent factor a times the time it takes the original program p to run with the same input. Observe how this establishes a stronger, i.e. more restrictive, definition of "simulation up to linear time", where the linear overhead factor does not actually depend on the program p in question, but only on the programming language itself.
We can now define a simulation equivalence between timed programming languages simply by requiring simulation in both directions (just as a = b holds exactly when a ≤ b and b ≤ a):
Definition of simulation equivalence
Suppose we are given two timed programming languages L and M with L-data = M-data. We define
1. L ≡lintime −pg −ind M iff L ≼lintime −pg −ind M and M ≼lintime −pg −ind L. In words: L and M are strongly linearly equivalent.
2. L ≡lintime M iff L ≼lintime M and M ≼lintime L. In words: L and M are linearly equivalent.
3. L ≡ptime M iff L ≼ptime M and M ≼ptime L. In words: L and M are polynomi- ally equivalent.
13 Complexity Classes
Runtime bounds are total functions from natural numbers to natural numbers. They describe the time (i.e. number of time units) a program may take for a specific input in terms of the size of the input. Note that ⊥ ̸≤ n for any n, so programs with a time bound f are necessarily always terminating5. Since we only look at asymptotic time complexity we are interested in how the time bound (and thus the time needed at most by a program) grows when the input grows. In the image below several functions are plotted (quadratic in blue, exponential in red, linear in olive green) that describe such time bounds. According to the above, we are interested in the behaviour when we move (far) right on the x-axis. The time needed is supposed to increase, the question is by how much?
5This is of course sufficient since we are interested in runtime of decision procedures.
[Figure: plots of a linear (olive green), a quadratic (blue), and an exponential (red) time bound f(x) against the input size x.]
Recall that we abbreviate by |d| the size of a data value d ∈ L-data. Recall that we have defined this size function for D explicitly in Section 3.1.
Definition of programs with bounds Given a timed programming language L and a total function f : N → N, we define four sets of time-bounded programs:
1. Ltime(f) = { p ∈ L-programs | timeLp(d) ≤ f(|d|) for all d ∈ L-data }
This is the set of programs that have a runtime that is bounded by function
f.
2. L^ptime = ⋃_{p is a polynomial} L^time(p(n))

This set is the union of all sets of L-programs that have a polynomial function p as time bound. Recall that a polynomial function is of the form f(n) = c_k × n^k + c_{k−1} × n^{k−1} + … + c_2 × n² + c_1 × n + c_0.
3. L^lintime = ⋃_{k≥0} L^time(k×n)

This set is the union of all sets of L-programs that have a linear function f as time bound. Linear functions are of the form f(n) = k × n.
4. L^exptime = ⋃_{p is a polynomial} L^time(2^p(n))

This set is the union of all sets of L-programs that have an exponential function f as time bound. Exponential functions are of the form f(n) = 2^p(n) where p is a polynomial.
Why do we focus on linear time and polynomial time? Why don’t we like exponential time bounds for algorithms?
Let us have a closer look at the growth of linear, polynomial, and exponential functions. Assume that you have various algorithms for a problem and that a unit in the unit-cost measure is in "real time" 10⁻⁹ seconds. Consider how much time it takes (what the result of the time bound function is) if we consider the functions f(n) = n, f(n) = n², f(n) = n³, f(n) = 2ⁿ and f(n) = 3ⁿ. The table below6 shows the time bound expressed in the corresponding "real time" (instead of natural numbers) to convey a better feeling for the large numbers involved7.
size = n    10         30          50
n           0.01 μs    0.03 μs     0.05 μs
n²          0.1 μs     0.9 μs      2.5 μs
n³          0.01 ms    24.3 ms     0.31 s
2ⁿ          1 μs       1 s         13 days
3ⁿ          59 μs      2.4 days    2.3 × 10⁵ centuries

6 from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.3433&rep=rep1&type=pdf; notes by Luke Ong
7 It seems humans can much better grasp large amounts of time than large natural numbers. Maybe this is due to evolution?
Please note that we have only used polynomials with small exponents. In programming the polynomials that occur usually have small exponents, as those correspond to nested loops and only a certain depth of nesting is needed. It is obvious that exponential time bounds are too generous and do not guarantee good runtime behaviour. But there is another reason why polynomial time is considered in complexity theory, which will be discussed in the next section.
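For the curious, the exponential entries of the table are easy to reproduce with a throwaway Python check (assuming, as above, one step per 10⁻⁹ seconds):

SECOND = 1e9                                       # steps per second at 1 ns per step
DAY = 86_400 * SECOND
CENTURY = 100 * 365.25 * 86_400 * SECOND

for name, f in [("2^n", lambda n: 2 ** n), ("3^n", lambda n: 3 ** n)]:
    steps = f(50)
    print(name, steps / DAY, "days,", steps / CENTURY, "centuries")
# 2^n at n=50 gives about 13 days; 3^n at n=50 about 2.3e5 centuries,
# matching the last column of the table.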
Now that we have defined classes of programs with time bounds, we are interested in defining classes of problems that can be decided within certain time bounds.
One problem we run into here is that different languages, machine models, i.e. models of computation, use different data types. But we want to compare those classes of problems, so we need to "unify" the domain of discourse of those problems. Let us simply take the words over alphabet {0,1} as this type, i.e. {0,1}*. In a way they correspond to words of bits, which is how information is stored on modern devices anyway. It is also clear that Turing machines take such words as inputs (restricting the alphabet accordingly). In languages with binary tree type D we need to encode those words as lists of ⌜0⌝ and ⌜1⌝, which is not a problem. The word 010 corresponds then to the list [0, 1, 0] or [nil, [nil, nil], nil]. For register machines we can interpret lists of 0 and 1 as binary numbers and represent them e.g. in decimal.
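A quick sketch of this encoding in Python, using None for nil and pairs for cons (the helper names are made up):

NIL = None                              # nil
def cons(l, r): return (l, r)           # the binary tree constructor

ZERO, ONE = NIL, cons(NIL, NIL)         # encode bit 0 as nil and bit 1 as [nil]

def encode_word(word: str):
    """Encode a {0,1}-word as a list in D, e.g. "010" -> [0, 1, 0]."""
    tree = NIL
    for bit in reversed(word):          # build the list from the back
        tree = cons(ONE if bit == "1" else ZERO, tree)
    return tree

print(encode_word("010"))   # (None, ((None, None), (None, None)))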
Below we define time complexity classes for problems. We use boldface short abbreviations to denote complexity classes of problems as it is standard in the literature.
Definition of complexity classes Given a timed programming language L and a total function f : N → N, we define four (complexity) classes of problems:
1. The class of problems L-decidable in time f is:
TIMEL(f) = {A ⊆ {0,1}∗ | A is decided by some p ∈ Ltime(f) }
In other words, this is the class of all problems (or sets) A about words over alphabet {0, 1} that are decided by an L-program with a runtime that is bounded by time bound (function) f.
2. The class P^L of problems L-decidable in polynomial time is:

P^L = { A ⊆ {0,1}* | A is decided by some p ∈ L^ptime }
In other words, this is the class of all problems (or sets) A about words over alphabet {0, 1} that are decided by an L-program with a runtime that is bounded by a polynomial.
3. The class LINL of problems L-decidable in linear time is:
LIN^L = { A ⊆ {0,1}* | A is decided by some p ∈ L^lintime }
In other words, this is the class of all problems (or sets) A about words over alphabet {0, 1} that are decided by an L-program with a runtime that is bounded by a linear function.
4. The class EXPL of problems L-decidable in exponential time is:
EXP^L = { A ⊆ {0,1}* | A is decided by some p ∈ L^exptime }
In other words, this is the class of all problems (or sets) A about words over alphabet {0, 1} that are decided by an L-program with a runtime that is bounded by an exponential function.
We often drop the superscript indicating the language in question, and simply use LIN, P, or EXP if we implicitly understand which programming language we are talking about.
Another justification to drop the superscript would be if we were allowed not to care which programming language we use. In the next section we will justify exactly that. We will look at the “robustness” of the class P, i.e. how PL relate for different languages L and we will meet yet another thesis that is similar to the Church-Turing Thesis.
13.1 Lifting Simulation Properties to Complexity Classes
We would like to establish relationships between complexity classes that use different kinds of “effective procedures”. It turns out we can use the simulation relations defined earlier and lift them straightforwardly to relations between complexity classes.
Lemma L ≼lintime M implies LINL ⊆ LINM, and as a consequence, L ≡lintime M implies LINL = LINM.
This means that if L can be simulated by M up to linear time difference then every problem in LINL is already in LINM. And then, as a consequence, that if L and M are linearly equivalent then LINL and LINM are the same.
Proof in exercises.
We get a similar result for polynomial time:
Lemma L ≼ptime M implies PL ⊆ PM, and as a consequence, L ≡ptime M implies PL = PM.
Proof in exercises.
The different simulation relations are related to each other as one would expect: L ≼lintime−pg−ind M implies L ≼lintime M, which in turn implies L ≼ptime M, since a program-independent linear factor is a special case of a program-dependent one, and a linear function of the running time is in particular a polynomial of it.
14 Robustness of P
In the previous chapter we have seen how to measure the runtime of programs, and how to define complexity classes of problems accordingly. It is natural to ask what runtime is “feasible” or “tractable”. Which classes contain only problems that are feasible? It is also natural to ask “For complexity issues, does it matter which model of computation is used?” These questions will be addressed in this chapter with a specific focus on polynomial time computability. It will turn out that there is some evidence to consider polynomial time “good”, i.e. programs that run in polynomial time are “feasible”. The picture is however not that black and white.
“Robustness” is an invariance property. Before, we considered invariance of computability and decidability: What is computable (or decidable) is invariant, it does not depend on e.g. what model of computation is chosen (see Church-Turing thesis). Now we consider whether time-bounded computation is (in a sense to be specified) "invariant" too.
We need a refined definition of "robustness". Neil Jones suggests criteria for resource-bounded problem solving. He writes [Chapter 18, page 271]:
“Ideally, resource-bounded problem solvability should be:
1. invariant with respect to choice of machine model (ie. model of computation)
2. invariant with respect to size and kind of resource bound (e.g. quadratic time etc.); and
3. invariant with respect to the problem representation (e.g. the choice to represent a directed graph by an incidence matrix or by adjacency lists should not make a complexity difference).”
The first item states that the defined complexity classes should be robust un- der compilation between models of computations (machine models/languages). And we will show this in this section. The second item (not quite so clearly put) simply says that it should be invariant under instances of the chosen kind of resource bound. It should not matter, for instance, which quadratic time function is chosen if one defines a class for quadratic time.
As the Extended Church-Turing Thesis discussed above may not hold, there is a weaker version of the Extended Church-Turing Thesis for which we can find plenty of good evidence.
Cook's (Invariance) Thesis All "reasonable" sequential notions of computation can simulate each other up to a polynomial factor.
This is also known as Cook’s Thesis in Neil Jones’ book which phrases it as follows:
“PL is the same class of problems for all reasonable sequential (that is, nonparallel) computational models L.”
This means that whatever (reasonable) sequential computational model we use to decide problems, the difference in running time will be a polynomial factor. Similarly to the Church-Turing thesis, we can provide some evidence for this thesis which is, again, widely believed in computer science. We can give evidence by showing that the various instances of P for those models we defined formally are equal. In other words that
PSRAM = PTM = PGOTO = PWHILE = PWH1LE = PCA
It is important here that the notions of computation (models of computation) are restricted to sequential ones. This excludes true parallel computation that needs special treatment in complexity. The cellular automata are not excluded since we stipulate that their seed is always finite.
14.1 Evidence for Cook’s Thesis
We will show some auxiliary lemmas that relate the simulation property between languages to the inclusion relation between complexity classes. Once we have those lemmas, we can show inclusion (and equality) between complexity classes by timed simulation (or equivalence) between languages and thus by providing compilers that make certain guarantees about the runtime behaviour of their target programs w.r.t. source programs.
To provide evidence for Cook's thesis we look at the various models of computation and use the compilers discussed in Chapter 11. We analyse the runtime of the compiled program and compare it to the runtime of the original program, obtaining statements of the form L ≼lintime M or L ≼ptime M. With the auxiliary lemmas we can then derive results like
P^L ⊆ P^M and LIN^L ⊆ LIN^M
Listed below are the main results without detailed proof. The time differences between source and target program can be derived by inspecting the code that the compilers generate.
Lemma
TM ≼lintime−pg−ind GOTO ≼lintime−pg−ind SRAM ≼ptime TM
Proof: Compiling a Turing machine program into a GOTO program results in a GOTO program that is at most a constant (program-independent) linear factor slower than the original Turing machine program. Similarly, any GOTO program can be compiled into an SRAM program with only a (program-independent) constant slowdown. Compiling an SRAM program, however, into a Turing machine program will produce a Turing machine program that has a time bound that differs from the one of the original program by a polynomial.
This means that the resulting Turing machine program may be significantly slower. This is not surprising as the Turing machine head must move across the tape to find the right encoded registers and register values and the numbers (represented in a binary encoding on the Turing machine tape). By carefully monitoring how far the head moves in the worst case in terms of the size of the input one observes polynomial slow-down.
Lemma
TM ≼lintime CA ≼ptime TM
From the above lemmas as well as the "lifting" lemmas of the previous section it follows that:

Theorem It holds that
PCA = PTM = PGOTO = PSRAM
14.2 Cobham-Edmonds Thesis
The Cobham-Edmonds thesis is also known as Cook-Karp thesis in various text- books.
Cobham-Edmonds Thesis The tractable (feasible) problems are exactly those in P.
Again, this is "only" a thesis, because "tractable problems" is not a formally defined class. The statement is quite a strong one for which there is much less evidence than for Cook's thesis. It is true that polynomial functions have a much more benign rate of growth than exponential functions and as such are more welcome as time bounds. But two questions arise:
1. Is every polynomial time bound really a “good” time bound indicating feasibility?
2. Is every time bound beyond polynomial really a "bad" time bound indicating intractability?
Well, one might answer both questions with "no". Regarding (1), consider polynomials with a very high degree, e.g. f(n) = n²⁰. Is a problem with this time bound really tractable? If we consider f(n) = n²⁰, for n = 10 we obtain a value of 31.68 centuries! Surely this is not feasible, and taking a larger exponent like n¹⁰⁰ even less so. So in practice, "polynomial" is not automatically a synonym for "good". On the other hand, one could argue that argument (1) is not relevant because one does not know any algorithms that have a polynomial time bound where the degree is higher than 5 or 6.
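The figure of roughly 31.68 centuries is easy to check (again assuming 10⁻⁹ seconds per step):

steps = 10 ** 20                                   # f(n) = n^20 at n = 10
seconds = steps * 1e-9
print(seconds / (100 * 365.25 * 24 * 3600))        # approximately 31.7 centuries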
Regarding (2), there are algorithms with exponential worst-case complexity that are useful in practice, like the simplex algorithm, discovered in 1947 by George Dantzig for linear programming (a problem we discuss in more detail soon), which performs much better on average than polynomial algorithms for not too large problems. On the other hand, one could argue that argument (2) is not relevant because we are interested in worst-case complexity, not average-case, and because an alternative polynomial algorithm has been found for linear programming.
In any case, the Cobham-Edmonds Thesis, also known as Cook-Karp Thesis, is slightly controversial.
The next part also recaps Big-O notation, so here is the part of Section 15 you should already know from previous modules:
15 Recap: Big-O
You should have already encountered the so-called “Big-O” notation8, short O, in your algorithms and data structures or program analysis course. Here “O” stands for the “order of growth rate” of a function, ignoring (discounting) constant factors and only looking at what the function does in the limit (ignor- ing the initial values on the “left” of the x-axis so to speak). When measuring runtime it makes sense to disregard the concrete time measure of a program as this depends on many “little” factors like style of coding or even optimisations. Abstracting away from such constants allows one to focus on how the runtime increases with the size of the input independent of minor implementation deci- sions or from the speed of a specific processor or machine that is used to run the program. This will become most helpful in the following chapters.
In our quest for infeasible problems, constant factors are only relevant for small values anyway. For instance, it makes a big difference to us whether a program runs for 1 second or 50 seconds. However, if we consider large input and thus long running times, it won’t make any difference to us whether a program runs for 1 century or 50 centuries, both are equally useless to us.
The “Big-O” notation will be useful to indicate the quality of an algorithm in terms of asymptotic worst-case runtime.
We say g(n) ∈ O(f(n)), or equivalently g ∈ O(f), to mean that g(n) approximately (up to a constant factor) grows at most as quickly as f(n). So g can grow even more slowly than f.

Definition Let f : N → N be a function. The order of f, short O(f), is the set of all functions defined below:

O(f) = { g : N → N | ∀n > n₀. g(n) ≤ c × f(n), for some c ∈ N \ {0} and n₀ ∈ N }

In other words O(f) are those functions that up to constant factors grow at most as fast as f (or, in other words, not faster) and are thus at least "not worse" a runtime bound than f (maybe even "better"). For g ∈ O(f) we also say g is O(f).
The n₀ in the definition specifies the upper bound for "small numbers" one ignores for the purpose of asymptotic complexity considerations, and c allows one to ignore constant differences that in the limit are negligible as well. Recall that lim_{n→∞} c/n = 0 whatever constant c is.

8 "O" stands for order.
Example Consider the following examples of O( ):
1. n is O(n²) as it grows "at most like n²". In this case the constant factor can be one as obviously n ≤ n² for all n.

2. 3n² is O(n²), as it grows "like n² up to a constant factor." This can be seen by setting n₀ to 0 and c to 3.

3. 34n² + 23n + 12 is O(n²) as it grows "like n² up to a constant factor." This can be seen by setting n₀ to 0 and c to 69. This suffices as we have 34n² + 23n + 12 ≤ 69n² for all n > 0 because 35n² ≥ 23n + 12 for n ≥ 1.
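The constants c and n₀ from the definition can be sanity-checked numerically; a throwaway Python snippet (not part of the notes) for example 3:

def dominated(g, f, c, n0, up_to=10_000):
    """Check g(n) <= c * f(n) for all n0 < n <= up_to. This is only a finite
    sanity check, not a proof, since Big-O talks about all n beyond n0."""
    return all(g(n) <= c * f(n) for n in range(n0 + 1, up_to + 1))

g = lambda n: 34 * n**2 + 23 * n + 12
f = lambda n: n**2
print(dominated(g, f, c=69, n0=0))   # True: witnesses 34n^2+23n+12 in O(n^2)
print(dominated(g, f, c=30, n0=0))   # False: 30 is too small a constant (fails at n=1)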
The O( ) notation is widespread to explain the complexity of algorithms. Also there, constant factors are ignored since they might depend on details of the coding of the algorithm or the language it is implemented in.
Example A very famous example is sorting where the input is an array of data values and its size is usually measured as the length of the array. It is well known that for worst-case complexity, merge sort is in O(n log₂ n) whereas quick sort is in O(n²). For average-case complexity both are in O(n log₂ n). We will only be considering worst-case complexity in the following.

For a concrete program p one can give a concrete time bound, stating for instance that p ∈ WHILE^time(2n²), but for algorithms this is difficult as they can be implemented by various programs that differ slightly. The "O( )" notation will take care of this. On the one hand, when talking about algorithms, we might therefore say something like

"(The runtime of) quick sort is O(n²)."

where n is understood to be the size of the array to be sorted (i.e. the input). But on the other hand, when talking about concrete programs, we are able to say

"The WHILE-program p that "does" quick sort is in WHILE^time(9n²+37)."

although we might only be interested in the fact that

"The WHILE-program p that "does" quick sort is in WHILE^time(f) where f(n) ∈ O(n²)."
From these examples we see the advantage of allowing the use of “Big-O” and we would like to make it available to the definition of complexity class TIME:
Definition We define another complexity class using "Big-O" as follows:

TIME(O(f)) = ⋃_{g ∈ O(f)} TIME(g)

The following Chapter 15 has not been covered in the 2019/20 version of this module and will not be part of the assessments in 2019/20. It is left in these notes so that interested students can learn some more complexity theory if they wish to do so.
16 Hierarchy Theorems
Previously, we have defined classes of problems that can be decided (solved) by programs that run with certain time bounds. For instance, class TIMEL(f) is the class of problems that can be decided by an L-program with time bound f. The latter means that for a problem A in this class the question whether some d ∈ L-data is in A can be decided by running an L-program p such that the running time is bounded by f(|d|), i.e. timeLp(d) ≤ f(|d|). Note that we decided that for all complexity classes of problems we fixed L-data to be (suitably encoded) words over {0, 1}.
An obvious question that now arises is what the hierarchy of those complexity classes is: is the class TIME^L(f) bigger than TIME^L(g) (i.e. TIME^L(f) ⊋ TIME^L(g)) if f is, in some appropriate sense, "bigger" than g? In other words,
Can we decide more problems if we have more time available to do so?
This is a very natural question to ask and the so-called Hierarchy Theorems give us some answers.
16.1 Linear Time Hierarchy Theorems
Let us first consider more in detail what “bigger” means for time bounds. If we talk about linear time bounds, i.e. linear functions, then it should be obvious that function f(n) = a × n is “bigger” than g(n) = b × n if a > b. Therefore, for linear bounds, we can reformulate the above question to
Do constants matter for linear time complexity (classes)?
We can thus rephrase the original question for the special cases of linear runtime as follows:
Can L-programs decide more problems with a larger constant in their linear running time allowance?
or, more formally:

Does a < b imply TIME^L(a × n) ⊊ TIME^L(b × n)?

To answer this question we need a universal program that, in addition to interpreting its argument program, counts the steps it simulates: a timed universal program.

Definition A timed universal program tu for a timed programming language L (with programs-as-data and pairing) is an L-program such that for all p ∈ L-programs, d ∈ L-data and n ≥ 1:

1. If time^L_p(d) ≤ n then ⟦tu⟧^L [p, d, n] = [ ⟦p⟧^L(d) ]
2. If time^L_p(d) > n then ⟦tu⟧^L [p, d, n] = nil

This definition states that the effect of ⟦tu⟧^L [p, d, n] is to simulate p for either time^L_p(d) or n steps, whichever is the smaller. If time^L_p(d) ≤ n, i.e. p terminates within n steps, then tu produces a result that is not nil, namely a single element list that contains as its only element p's result. If not, then tu produces nil indicating "time limit exceeded". So a timed universal program has a built-in stopwatch. It is given an additional timeout parameter and when the number of steps simulating the argument program has exceeded the timeout parameter the interpreter stops and returns nil. As usual, we drop the encoding brackets around lists for the sake of readability.

Definition Assume L is a timed programming language with programs-as-data and pairing. A timed universal L-program tu is efficient if there is a constant k such that for all p, d ∈ L-data and n ≥ 1:

time^L_tu [p, d, n] ≤ k × min(n, time^L_p(d))

In other words, interpreting p using an efficient timed universal program is only a constant factor k slower than the running time of p where k is a program-independent constant.

The timed universal program still aborts program execution after the timeout parameter has been reached. That also means that the runtime of a timed universal program is bounded by the third input parameter (interpreted as a number) times some program-independent constant k.

A timed universal program for WH1LE can be programmed as the following WHILE-program tw.
tw read X {                              (* X = [p, d, nil^n] *)
   B     := hd tl hd X;                  (* code block to be executed *)
   val   := hd tl X;                     (* initial value for variable *)
   Cntr  := hd tl tl X;                  (* time bound *)
   DSt   := nil;                         (* initial data stack *)
   CSt   := B;                           (* initial code stack *)
   state := [CSt, DSt, val];             (* initial state *)
   while CSt {                           (* continue interpretation *)
      if Cntr {
         switch hd hd CSt {
            case quote, var, doHd, doTl,
                 doCons, doAsgn,
                 doIf, doWhile : { Cntr := tl Cntr }
         }
         state := <STEP> state;          (* one interpretation step via the STEP macro *)
         CSt   := hd state;
         val   := hd tl tl state;
         X     := [ val ]                (* wrap up potential result *)
      } else {                           (* out of time so stop *)
         CSt := nil; X := nil            (* and return nil *)
      }                                  (* end if *)
   }                                     (* end of while loop *)
}
write X
The STEP macro is the one used already for the universal program in Section 7. We make good use of the switch statement here to check whether an actual time consuming step has been executed (which could also be translated away to get a pure WHILE-program).
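To see the "stopwatch" idea independently of WHILE syntax, here is a small Python sketch of a timed (fuel-bounded) interpreter driver; the step function, state shape and example are placeholders and not the STEP macro itself:

def timed_universal(step, initial_state, is_final, extract_result, n):
    """Simulate a program for at most n steps.
    Returns [result] (a one-element list) if the simulated program halts
    within n steps, and None (playing the role of nil, "time limit
    exceeded") otherwise -- mirroring the behaviour required of tu."""
    state, fuel = initial_state, n
    while not is_final(state):
        if fuel == 0:
            return None              # out of time: stop and report nil
        state = step(state)          # one interpretation step of the argument program
        fuel -= 1                    # decrement the built-in stopwatch
    return [extract_result(state)]

# Toy example: a "program" that counts down from 5.
countdown = lambda s: s - 1
print(timed_universal(countdown, 5, lambda s: s == 0, lambda s: "done", n=10))  # ['done']
print(timed_universal(countdown, 5, lambda s: s == 0, lambda s: "done", n=3))   # None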
Theorem There is an efficient timed universal WH1LE program.
Proof To prove this we take the WHILE-program tw from above and translate it into a WH1LE program, which is always possible. The resulting universal program shall be called tu. It remains to show that tu is efficient. So we need to find two constants k₁ and k₂ such that time^WH1LE_tu([p, d, n]) ≤ k₁ × n in case the interpreter times out, and time^WH1LE_tu([p, d, n]) ≤ k₂ × time^WH1LE_p(d) in case the interpreter computes the result within the allowed time interval. Then one chooses the required linear factor k to be the maximum of k₁ and k₂. The first inequality simply follows from the construction as after n times having executed one of the special atoms { quote, var, doHd, doTl, doCons, doAsgn, doIf, doWhile } the counter is zero and tu terminates. The second inequality can be shown by close inspection of how the universal program interprets the original argument program via the STEP macro. Now we are able to prove a concrete version of the question
Does a < b imply TIME^L(a × n) ⊊ TIME^L(b × n)?

instantiating L to be WH1LE.
Linear Time Hierarchy Theorem There is a constant b such that for all a ≥ 1 there is a problem A in TIME^WH1LE(a × b × n) that is not in TIME^WH1LE(a × n).

This states the existence of a set (or problem) A that can be decided in linear time with bound f(n) = (a × b) × n but not with time bound f(n) = a × n. Therefore, in order to decide A, the bigger constant a × b is sufficient but a is insufficient.
Proof To show the existence of such a problem we construct one by diagonalisation, a concept we have already used before (for the proof of the undecidability of the Halting problem).
Let us first sketch the idea of this proof. We define A via diagonalisation such that the following hold:
1. A is decidable by a WH1LE program decA
2. decA ∈ WH1LE^time(a×b×n) for some b, in other words A ∈ TIME^WH1LE(a × b × n)
3. if A ∈ TIME^WH1LE(a × n) with a decision procedure p ∈ WH1LE^time(a×n), then we get a contradiction if we ask p ∈ A using p itself as decision procedure (diagonalisation), so we ask ⟦p⟧^WH1LE(p) = true?
In order to immediately ensure the first condition, we use a program, called diag as outlined below to directly define A as the set accepted by diag. The use of the timed universal program (or self-interpreter) is essential here:
diag read X {
   N         := <size> X;              (* N := |X| *)
   Timebound := <mult> [N, "a"];       (* Timebound := a × |X| *)
   Arg       := [X, X, Timebound];
   X         := <tu> Arg;              (* run X on X for up to a × |X| steps *)
   if hd X {                           (* if there is enough time *)
      X := false                       (* and result is true: return false *)
   } else {                            (* if there is not enough time *)
      X := true                        (* or result is false: return true *)
   }
}
write X
As discussed earlier, we can compile diag into a WH1LE program decA and thus define

A = { d | ⟦decA⟧^WH1LE(d) = true }
Some comments about diag are in order: First of all, the computation of a × |X| needs to be performed at runtime as only at runtime do we know the value of variable X. The computation makes use of macro calls: the program size computes the size of its argument and mult computes the multiplication of two (encoded) natural numbers. The timed universal program for WH1LE, tu, is also called via a macro. It is worth recalling that while an interpreter might not terminate, a timed interpreter always does since it has a timeout parameter. Consequently, program diag, and thus decA, always terminate.
Next, we show the third condition, that A ∉ TIME^WH1LE(a × n), by a diagonal argument. As before, we carry out a proof by contradiction and assume A ∈ TIME^WH1LE(a × n). By definition of the class TIME^WH1LE(a × n) this means that there is a WH1LE program p that decides membership in A and also satisfies time^WH1LE_p(d) ≤ a × |d| for all d ∈ D. We now ask the question:
“What happens if we run p on p itself (as we have done before and which is the reason why we call this a diagonal argument)?”
In other words, we ask what is ⟦p⟧^WH1LE(p)?

We use the fact that by assumption time^WH1LE_p(p) ≤ a × |p|, which means that tu has enough time to simulate p on input p. By definition of the efficient universal program this means that
⟦tu⟧^WH1LE([ p, p, a × |p| ]) = [ ⟦p⟧^WH1LE(p) ]

Now assume that ⟦p⟧^WH1LE(p) = false. Then, by definition of decA we know that
⟦decA⟧^WH1LE(p) = true. But then decA and p have different semantics and thus they can't both decide membership of A. In this case, the assumption must be wrong that p decides the membership of A and thus the assumption A ∈ TIME^WH1LE(a × n) must be wrong.
Consequently, it must be true that ⟦p⟧^WH1LE(p) = true. But in this case, by definition of decA again we know that ⟦decA⟧^WH1LE(p) = false. But then, by the same argument as above, it must be wrong that p decides the membership of A and thus the assumption A ∈ TIME^WH1LE(a × n) must be wrong.
So in either case the assumption A ∈ TIME^WH1LE(a × n) must be wrong and therefore A ∉ TIME^WH1LE(a × n).
It remains to show the second condition, namely that A ∈ TIMEWH1LE(a × b × n) for some b. This is shown by first analysing the runtime of A’s decision procedure diag which is a WHILE-program. We do this step by step:
• Assignment N := <size> X requires the execution of size, which is in O(|p|) as X holds p (of size |p|). Therefore, the assignment takes time 1 + k_s × |p| for some constant k_s.
• Assignment Timebound := <mult> [N, "a"]: here N is |p|, a is a constant, and we know that macro mult runs in time O(|p| × a). Therefore, the time this assignment takes is 1 + k_m × a × |p| for some constant k_m.
• Assignment X := <tu> Arg: we recall that tu is efficient, so we can estimate that the time it takes to execute this assignment is 1 + k_t × min(a × |p|, time^WH1LE_p(p)) ≤ k_t × a × |p| for some constant k_t.
• The conditional takes either time 1 + 2 + 2 (false) or 1 + 2 + 3 (true) and is thus at most 6.
Putting now all the commands in diag together we can estimate
time^WHILE_diag(p) ≤ 2 + (1 + k_s × |p|) + (1 + k_m × a × |p|) + (k_t × a × |p|) + 6

or, adding up the constants,

time^WHILE_diag(p) ≤ (k_s + (k_m + k_t) × a) × |p| + 10
Since |p| ≥ 1 (remember that nil has size 1) we get that

time^WHILE_diag(p) ≤ a × (k_s + k_m + k_t + 10) × |p|
Let c = k_s + k_m + k_t + 10. From one of our auxiliary lemmas we know that WHILE ≼lintime WH1LE, so we can conclude for decA that

time^WH1LE_decA(p) ≤ k′ × (a × c × |p|) ≤ a × (k′ × c) × |p|

So the required b is k′ × c.
A rather intriguing open problem that remains is whether a similar Linear Time Hierarchy Theorem can be proven for the languages WHILE and GOTO. Note that for the latter languages we cannot apply the proof technique used above, since we do not get an efficient timed self-interpreter for those languages in the same way. The reason is that the number of variables in use is not program-independent but has an impact on the running time of the universal interpreter due to variable lookup.
Note that, however, for SRAM we could prove an analogous theorem (with either the unit-cost model or a logarithmic time model). For more information about that see e.g. Neil Jones’ book.
16.2 Big-O and Little-o
Big-O notation and the class TIME(O(f)) were recalled in Section 15. To express the dual concept, namely that a function g grows asymptotically much faster than a function f, one uses the "Little-o" notation defined as follows:

Definition Let f and g be functions of type N → N. Then o(g) is the set of those functions that eventually grow much slower than g. Formally we can define this as follows:
f ∈ o(g) iff for all 0 < ε ∈ R there is an N ∈ N s.t. ε × g(n) ≥ f(n) for all n ≥ N

The above definition is equivalent to

f ∈ o(g) ⟺ lim_{n→∞} f(n)/g(n) = 0

which shows that f grows ultimately much slower than g, or in other words, g grows faster than f, as the limit of the quotient f(n)/g(n) approaches zero when n grows very large.

Example Consider the following examples of o( ):

1. n is in o(n²) as it grows "much slower than n²". This follows from the fact that lim_{n→∞} n/n² = lim_{n→∞} 1/n which is 0.

2. 3n² is not in o(n²), as it does not grow much slower. This can be seen by looking at the limit lim_{n→∞} 3n²/n² which is 3 and thus not 0.

3. n² is not in o(34n² + 23n + 12) as lim_{n→∞} n²/(34n² + 23n + 12) = 1/34 and thus not 0.

16.3 Beyond Linear Time

Let us now generalise the Hierarchy Theorem from linear time bounds to arbitrary ones that go beyond linear time. The original question we tackle in this chapter is now rephrased like so:

Let f and g be time bound functions that are not both linear (as previously). Can L-programs decide more problems with time bound f than with time bound g if f grows (asymptotically) faster than g?

The asymptotic complexity is built into the kind of complexity theory we have been using so far. Time bounds describe how running time grows when input data size grows to an arbitrarily large amount. This is fair enough since our investigation about feasibility of problems concerns large inputs. Let f and g be time bounds, i.e. functions of type N → N. We have already encountered a definition (and notation) for f growing (much) faster than g, namely g ∈ o(f). It should be pointed out that g ∈ o(f) does not make any statement about whether f(n) > g(n) for small n. But if g ∈ o(f) we know that if n gets really big (in other words, the size of the input of the problem at hand becomes really big) then f(n) is significantly greater than g(n).
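The limit characterisation suggests a quick numerical sanity check of such claims (illustration only; a finite sample of ratios is of course not a proof):

def ratio_tail(f, g, ns=(10, 100, 1_000, 10_000, 100_000)):
    """Sample f(n)/g(n) for growing n; a tail tending to 0 suggests f in o(g)."""
    return [f(n) / g(n) for n in ns]

print(ratio_tail(lambda n: n, lambda n: n**2))          # tends to 0, so n is in o(n^2)
print(ratio_tail(lambda n: 3 * n**2, lambda n: n**2))   # constant 3.0, so not in o(n^2)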
We first consider L = WH1LE.
Since we generalise from linear time bounds to arbitrary ones, there is one additional condition in this version of the Hierarchy Theorem, namely that f and g are time constructible. What does that mean? Informally, f is time constructible means that we can "find out how much time f(n) is available by a computation not taking more than the order of f(n) steps" [Neil Jones]. This is an intuitive restriction and many familiar functions are time constructible (in particular linear functions, polynomials, exponentials, and arithmetic combinations of those).
Definition A time bound f : N → N is called time constructible if there is a program p and a constant c > 0 such that for all n ≥ 0 we have that
⟦p⟧(⌜n⌝) = ⌜f(n)⌝   and   time_p(d) ≤ c × f(|d|) for all d

where ⌜n⌝ denotes the encoding of number n in the datatype of the language used. This definition works for any programming language as every language should be able to encode natural numbers (thus the language superscripts have been omitted).
Non-computable functions (like the Busy-Beaver function) cannot be time constructible. Also, we need to have enough time to produce the result, thus sub-linear (small growing) functions are often not time constructible. In the Linear Hierarchy Theorem this condition was implicit as linear functions are automatically time constructible.
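For intuition, f(n) = n² is time constructible because a program can produce the value n² while performing on the order of n² elementary steps, e.g. by repeated increments. A hedged Python sketch of that counting argument (the explicit step counter is for illustration only):

def square_by_increments(n: int):
    """Compute n*n using only unit increments, counting the elementary steps.
    The result is n**2 and the number of steps taken is itself about n**2,
    which is the shape of argument behind "n^2 is time constructible"."""
    result, steps = 0, 0
    for _ in range(n):
        for _ in range(n):
            result += 1          # one increment = one elementary step
            steps += 1
    return result, steps

print(square_by_increments(7))   # (49, 49)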
Asymptotic Hierarchy Theorem (WH1LE) If functions f and g are time constructible, f(n) ≥ n, g(n) ≥ n and g ∈ o(f), then it holds that:

TIME^WH1LE(O(f)) \ TIME^WH1LE(O(g)) ≠ ∅
Proof We proceed analogously to the Linear Time Hierarchy Theorem, i.e. we
need to find a problem A such that:
1. A is decidable by a WH1LE program decA
2. decA ∈ WH1LEtime(b×f(n)), in other words A ∈ TIMEWH1LE(b × f(n)) and thus A ∈ TIMEWH1LE(O(f))
3. if A ∈ TIME^WH1LE(O(g)) with a decision procedure p ∈ WH1LE^time(O(g)), then we get a contradiction if we ask p ∈ A using decision procedure p, thus actually asking ⟦p⟧^WH1LE(p) = true?

To obtain the appropriate decA we modify diag from above, changing the assignment to variable Timebound to

Timebound := <f> N;

where f is the program that computes time bound f. This program exists since f is time constructible. Again, we can compile diag into a WH1LE program decA and thus define

A = { d | ⟦decA⟧^WH1LE(d) = true }
Exactly as for the Linear Time Hierarchy Theorem we can prove that decA ∈ WH1LE^time(b×f(n)), for some b, so condition (2) is met. For condition (3), that A ∉ TIME^WH1LE(O(g)), as before, we carry out a proof by contradiction and assume A ∈ TIME^WH1LE(O(g)). This means that there is a WH1LE program p that decides membership in A and also satisfies time^WH1LE_p(d) ≤ T(|d|) for all d ∈ D, for a time bound T for which there is an n₀ ∈ N such that for all n > n₀ we have that T(n) ≤ c_T × g(n). We now ask (as before) what is ⟦p⟧^WH1LE(p)?

The crucial step here is to obtain

time^WH1LE_p(p) ≤ T(|p|)   =⇒   time^WH1LE_p(p) ≤ f(|p|)      (1)
If we have that then the same argument as in the Linear Time Hierarchy Theorem is applicable to derive a contradiction. As before, diag then allows the self-interpreter enough time to interpret p and we get two semantically different deciders for A, a contradiction. It follows that A ∉ TIME^WH1LE(O(g)) but by (2) A ∈ TIME^WH1LE(O(f(n))).
It remains to prove Eq. 1: It obviously suffices to show that T(n) ≤ f(n) where n is the size of p. Since T(n) ≤ c_T × g(n) for n > n₀ it suffices to show that c_T × g(n) ≤ f(n) for all n, assuming |p| > n₀. This, however, we get from the assumption g ∈ o(f), which implies

g(n) ≤ (1/c_T) × f(n) for all n > n₁

where n₁ is the threshold obtained from g ∈ o(f) for ε = 1/c_T. So if |p| > max{ n₀, n₁ } we are done. But what do we do if |p| is not large enough? Then we produce a program pnew from p that also meets the assumptions of condition (3), i.e. pnew ∈ WH1LE^time(O(g)), just by padding p with code of sufficiently large size.
The above theorem states that for two time bounds f and g (that are at least linear) such that f grows asymptotically faster than g, there are more problems decidable in time (bounded by) O(f) than there are problems decidable in time (bounded by) O(g).
The Asymptotic Hierarchy Theorem can analogously be shown for the languages WHILE, SRAM and even RAM (under the unit-cost model). The version for Turing Machines requires a stronger assumption as g ∈ o(f) does not suffice but g(n) × log g(n) ∈ o(f) does. The additional logarithmic factor is due to the extra cost for maintaining the program counter of the timed universal program. While maintaining the counter can be done in constant time in WHILE and RAM, it requires more moves on the tape of a TM.
It should be clear that the Linear Time Hierarchy Theorem is not a special case of the asymptotic one, as lim_{n→∞} g(n)/f(n) would boil down to lim_{n→∞} (a × n)/(a × b × n) = 1/b, which is not 0.
Corollary There exists a problem in EXP that is not in P.
Proof P ⊆ TIME(O(2ⁿ)) ⊊ TIME(O(2ⁿ × 2ⁿ)) by the Hierarchy Theorem as

lim_{n→∞} 2ⁿ/(2ⁿ × 2ⁿ) = lim_{n→∞} 1/2ⁿ = 0.

Now TIME(O(2ⁿ × 2ⁿ)) = TIME(O(2^(2n))) and TIME(O(2^(2n))) ⊆ EXP. We can put the above results together and obtain P ⊊ EXP.