Limits of Computation 2020/21 Notes on Lectures 1–4
⃝c Dr Bernhard Reus, University of Sussex January 27, 2021
1 Introduction (and module outline)
As final year undergraduate (or postgraduate) students you will know how a modern digital computing device works. You know about these gadgets’ archi- tecture, their operating systems, their input/output devices, their networking capabilities. You appreciate the importance of compilers which you can use to translate instructions of programs so that the machine architecture in question can interpret them (carry them out). This code you have learned to write in modern high-level programming languages like Java. You have also learned, sometimes painfully, how to debug your programs to find and eliminate bugs so that in the end the system you designed and implemented does the job you want.
All these skill are very important (also for your future employability) and building system is, of course, the creative “fun part” of being a computer sci- entist. Yet, as the term “scientist” suggests, there is more to being a computer scientist than being able to write nifty programs and debug them (which is the “engineering” aspect of computing).
“Rocket scientists” who design space ships need to know the laws of physics and the limits described by those laws. For instance, they need to know that nothing can go faster than the speed of light. Similarly, a computer scientist needs to know the limits of computation.
1.1 Computability Overview
Finding the limits of computation requires, of course, that we have a clear-cut definition of what “computing” means. The origins of what is called “com- putability theory” go back to Alan M. Turing1, a British mathematician, who not only helped to decode the German code used by the Enigma machines. He also wrote a world famous paper that for the first time gave a definition of what computing is, what criteria this definition should fulfil, and then showed that
1 To commemorate Turing’s 100th birthday, 2012 was the Alan Turing Year, see http://www.mathcomp.leeds.ac.uk/turing2012/.
1
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
based on this definition there is a function that cannot be computed. And this was in 1936 before the invention of the general purpose digital computer. We will look at his machines (and maybe his this seminal paper )a bit later2. When searching for the limits of computation we will simplify the kind of “problems” we consider to be computed. This is fine since if we can find limitations for a restricted class of problems we of course have also found limitations of a larger class of problems.
After we have defined an adequate class of problems we also will idealise our computation devices and investigate what cannot be computed using unbounded resources. In computing the resources are time and memory and we discuss them in the complexity part of the module.
So the first big quest is to find a problem that cannot be computed even with unlimited resources (time and memory).
To be able to do this ourselves in detail we need to work with a particular model of computation. Of course we would like to work with a language that is similar to our modern high-level languages and not e.g. with Turing machine programs. But on the other hand, we want to be able to also program inter- preter for the language in this language. So the language should not be too complicated. We choose what Neil Jones suggested in his textbook, namely a simple WHILE language, with variables, assignment, while loops and as data type binary trees. We also fix that there is one input value to every program and one output value. Again, this is a simplification but good enough to successfully go on our quest.
Then we will learn how to use programs as objects such that we can write a WHILE-interpreter in WHILE.
This will help us to further distinguish non-computable problems. We will then also look at more such undecidable problems. We will learn that, roughly speaking, any interesting statement about computability itself is undecidable. Historically, Hilbert’s Entscheidungsproblem is of interest. Hilbert and many others believed that there is a decision procedure for first-order logic sentences with axioms, for instance, arithmetic (number theory). That means that there is a procedure that given any arithmetic sentence can compute whether this sentence is true or not. In 1936 this was disproved by G ̈odel, Turing and Church.
Next we will have to ensure that our definition of computation is robust. That means that our result does not depend on the choice of computing device. Otherwise, somebody might say, that the problem we have found might have been computable with a more powerful computing device. It therefore needs to be shown that what is computable with unlimited resources does not depend on the computing device (as long as this allows for a minimum of instructions needed). This is called the Church-Turing-Thesis. We will give evidence for it but showing that all commonly used models of computation are of equivalent power. We can show this by writing compilers that compile from one language (machine type) to another.
2The paper is On Computable Numbers, with an Application to the Entscheidungsproblem, Proceedings of the London Mathematical Society in 1937, and there is a link on our Canvas site.
2
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
1.2 Complexity Overview
In the second part, we look at what can be computed with limited time re- sources, defining time complexity classes (those you should already know from your “Algorithms and Data Structures” module). Complexity classes for pro- grams/algorithms are classes of programs that can be computed with a certain time bound. That means, we consider how much time is needed to compute a program in relation to the size of the input data. It is clear that for bigger input computations take more time.
We can the define also complexity classes for problems or functions describing those problems or functions that can be solved, or computed, respectively, by programs that are in the corresponding complexity classes. The question then is at what point become the computations intractable, i.e. too long to be useful in practice. By the way, we usually consider “worst case time complexity” when we consider complexity classes.
It turns out that it is easy to show that a problem is in a complexity class by providing an algorithm and proving its complexity function is within the class. It is much harder though to show that a problem is not in a complexity class since one needs to prove that no algorithm solving the problem has the required complexity.
It also turns out that many practical (combinatorial) problems are hard in the sense that the only algorithms we can come up with are of exponential complexity. Such algorithms are intractable since for medium large input the runtime complexity gets astronomically large (this is down to what is called “combinatorial explosion”). We will look at some of those problems that are very much “real life” problems containing issues in scheduling and timetabling and are thus important in practical applications in industry and transport. However, we don’t know yet whether there are some very clever algorithms out there that are tractable (e.g. with polynomial complexity) that just nobody has found yet.
In order to resolve the situation another complexity class is defined that contains all those problems of which we don’t know yet whether they have polynomial solutions. Of this class we can prove some properties. It contains all the hard problems that we encounter in applications and, very remarkably, it is the case that either all problems in this class are tractable (of polynomial complexity) or none. Which one is true turns out to be the most famous open problem in theoretical computer science.
Regarding space complexity (which we will not treat in this module), the question is still open whether with polynomial space we can solve more problems than with polynomial time. Note that the complexity class mentioned earlier that contains all the hard problems lies in between these two classes but again nobody knows if any (and if so which ones) are actually equal.
After all this we seem to be left with a big dilemma: If there are no known tractable solutions to so many practical (combinatorial) problems what do we do? One could be led to think that it helps to look for approximated solutions rather than optimal ones. Instead of looking for the optimal scheduling one could
3
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
just be as happy with computing a scheduling that is at most 10% or 20% worse that the optimal solution. Alas, it turns out that computing these approximative solutions are as hard as computing the real thing. This is an ongoing research area. Another solution could be using many parallel computers/processors. This will certainly speed up matters. But does it make intractable problems tractable? Also this is an open problem and subject of ongoing research. But the problem is that one would need super-polynomially many processors and communication between those is likely to take super-polynomial time. And by the way, the class of polynomially computable problems with parallel processors is equal to the class of problems in polynomial space.
So what do we do to get out of this dilemma? Well, sometimes in practice the intractable version works just fine for certain inputs. If one is not in this lucky situation then one needs to resort to probabilistic algorithms. There are two versions: one is always correct but only probably fast. The other is always fast but only probably correct. Of course one needs to obtain a high probability to be able to use those algorithms. In the 1970s very good polynomial time algorithms for primality testing have been developed and the research area of probabilistic algorithms be established. Testing whether a number is prime is very important for (public key) cryptography like it is used today to encrypt communication between computers and servers. However, recently it has been discovered that primality testing itself is polynomial (the famous AKS algo- rithm after the initials of the inventors). Anyway, it is still unknown whether randomised algorithms resolve our problem and give us good complexity algo- rithms for our hard problems.
Another computational model is based on molecular computing (also called DNA computing). This has been pioneered in the 1990s by Len Adelman and followers. However, like quantum computing, this is still technically difficult (one needs the right experiment set up in a lab). The massive parallelism in those molecular computations stems from the massive parallelism in interaction in the ”molecular soup”. Another fascinating aspect is that DNA computing is highly efficient, information is highly packed and it consumes very little energy. A fact that already has been successfully used to store information in DNA (see recent publications about storing and reading out again Shakespeare’s sonnets). But also here it is unknown whether DNA computing could be used to resolve intractability.
Yet another hope for tractability of our hard problems is quantum com- puting. Quantum computers are able to do some sort of “massively parallel computations” via so-called superposition of states in the world of quantum physics. The problem here is that it is really really hard to build quantum com- puters, even quantum storage as on the quantum scale everything can disturb the quantum effects, from the casing to the wires. Moreover, it is difficult to harness the parallelism of quantum computers as any observation of the result states just produces one result and the magic of the superposition is lost. So this is still an active research area. In theory however, some breakthroughs have been achieved. Shor’s algorithm is a quantum algorithm for prime factoring.
In any case, whether it’s quantum computing or molecular computing, none 4
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
of them would allow us to solve more problems than we already can with con- ventional computers.
2 Effective Procedures and Problems
Our first quest is to find a problem that is not computable to see that not everything is computable. In order to do this we obviously need to define what we mean exactly by “problem” and what we mean exactly by “computable.” Whereas the former is easy to do the latter is a bit more tricky.
2.1 Effective Procedures
“Effective Procedures” are the “programs” that we use to solve or compute problems and functions. The adjective “effective” highlights an important con- cept: it means that programs can be really carried out to produce a result. One must be able to produce the intended result by carrying out the procedure’s instructions. This means that instructions (or commands) in programs must not be“vague”. For instance “find a number that has property P” cannot be carried out out effectively. How do we find the number? We need instructions that produce the number effectively such that the produced number has the desired property. Therefore we cannot use “oracles” or “choice axioms” in our effective procedures. Moreover, we must be able to carry out the procedures in finite amount of time. Infinite computations are by definition not effective.
Copeland gives the following definition3:
’Effective’ and its synonym ’mechanical’ …do not carry their ev- eryday meaning. A method, or procedure, M, for achieving some desired result is called ’effective’ or ’mechanical’ just in case
1. M is set out in terms of a finite number of exact instructions (each instruction being expressed by means of a finite number of symbols);
2. M will, if carried out without error, always produce the desired result in a finite number of steps;
3. M can (in practice or in principle) be carried out by a human being unaided by any machinery save paper and pencil;
4. M demands no insight or ingenuity on the part of the human being carrying it out.
3 Copeland, J.: The Church-Turing Thesis. References on Alan
(2000). www.alanturing.net/turing archive/pages/Reference%20Articles/The%20Turing- Church%20Thesis.html.
5
Turing
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
2.2 Problems
Since we are interested in the Limits of Computation, we are interested in neg- ative results, i.e. what cannot be achieved in computing. Since computing in the 21st century is ubiquitous and microprocessors are not only in computers, but also games consoles, mobile phones, music players and all kinds of consumer products, even washing machines, we need to restrict the problem domain since “problem” in the context of washing will be very different from “problem” in the context of game playing, or other areas. And every reader may have their own idea of what a “problem” is. Moreover, for the problems we consider we must be able to clearly understand what it means to solve them. Otherwise we will never be able to claim that there is no program (algorithm, machine) that can find the solution.
So we allow ourselves to restrict the definition of problem. If there is a problem of that kind that is not computable then we still have found some problem that is not computable, so this restriction does not take anything away from our ambition.
A problem of the kind we’re interested in is characterised by two features:
1. it is a uniform class of questions. Uniform refers to the input of the problem, ie what data the problem is about. The type of input must be precisely definable.
2. it can be given a definite and finite answer. Also here the type of output must be precisely definable.
2.2.1 Examples for problems of our kind
1. For a tree t what is it’s height? Input: a tree (apparently with arbitrary number of children). Output: a natural number.
2. For a natural number, is it even? Input: a natural number. Output: a boolean.
3. For a given formula in number theory (arithmetic) is it valid? Input: formula in arithmetic. Output: boolean.
A decision problem is a problem where the output is a boolean, i.e. the output is ‘yes’ or ’no’. So the last two examples are decision problems. If the result is not a Boolean we call the problem a function problem.
2.2.2
1. 2.
Examples for problems which are not problems of our kind
What is the meaning of life? This is not a uniform family of questions and we don’t know what the input is, nor the output type if we’re honest.
Is the number 5 even? This is not a “uniform” class of questions and it has no input as the question only refers to the number 5.
6
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
3 The WHILE-language
We said that Turing machines are very tedious to program since the Turing machine computation model is quite inconvenient as the memory is organised as a tape. Thus, operations consist of symbol manipulations which usually involve a significant amount of moving around on the tape.
Following Neil Jones we use a more convenient notion of computation or “effective procedure”, namely WHILE-programs. He states in his book that
“The WHILE language seems to have just the right mix of expressive power and simplicity.”
We like the former to easily write programs and the latter to easily understand the language. Both is going to be very helpful for what we want to do in this module (and will see examples of this throughout the term).
WHILE is an untyped, imperative programming language where assignment of variables is the basic mechanism to record computations. There is one built-in data type which is the type of binary trees and we will see that we can view this also as a list type. We first look at this type.
3.1 The data type of binary trees
The values in the data type of trees are binary trees where every node has either two subtrees or is a leaf labelled with an atom (the only atom we will use is nil). This datatype is similar to the LISP datatype. The set of binary trees is denoted D (short for data), and for a tree constructed with l and r as left and right subtree, respectively, we write ⟨ l.r ⟩.
Here is an example of such a binary tree where all atoms are nil:
nil nil
nil
Definition The set of binary trees is given inductively. It contains
1. the empty tree:
nil
2. any tree constructed from two binary trees tl and tr:
tl tr
and which is written ⟨tl.tr ⟩ in textual notation.
3. and no other trees.
The set of binary trees is denoted D (short for “data”).
7
nil nil
nil nil
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
3.2 Encoding lists and numbers
Since our language is untyped we can, actually we must, use the binary tree type to encode anything we want to program with, in particular lists and nanutral numbers.
Definition The empty list is encoded by the empty tree nil and appending an element at the front of the list is modelled by ⟨ . ⟩. More formally we define:
[] = nil (1) [a1,a2,…,an ] = ⟨a1.⟨a2. ⟨···⟨an.nil⟩⟩···⟩⟩ (2)
The list [ [ ], [ ] ] that contains two empty lists can be encoded as: [[],[]] = ⟨nil.⟨nil.nil⟩⟩
Note also that we can “listify” every tree, so every tree encodes some sort of list. The list a tree encodes can be defined simply by “running down the right spine” of the tree until a nil is found, the “list terminator”. Then anything that “dangles” off the “spine” is a list element.
Some more examples:
1. The list [ [ ], [ ] ] that contains two empty lists can be encoded as:
[[],[]] = ⟨nil.⟨nil.nil⟩⟩
The two-dimensional representation of this tree looks as follows:
nil
2. The list containing two trees ⟨ nil.nil ⟩ and ⟨ nil.⟨ nil.nil ⟩ ⟩ can be encoded as:
[⟨nil.nil⟩,⟨nil.⟨nil.nil⟩⟩] = ⟨⟨nil.nil⟩.⟨⟨nil.⟨nil.nil⟩⟩.nil⟩⟩
3. The list containing of three lists, namely a list of two nil, a list of one nil
and an empty list, can be encoded as:
[[nil,nil],[nil][]] = ⟨⟨nil.⟨nil.nil⟩⟩.⟨⟨nil.nil⟩.⟨nil.nil⟩⟩⟩
Natural numbers can be encoded in unary representation (like tally sticks) as lists. The encoding of 0 is the empty list, and a number n > 0 consists of a list of length n with all elements being nils:
Definition We encode numbers inductively as follows:
0 = nil (3)
n + 1 = ⟨nil.n⟩ (4) 8
nil nil
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
Assuming E is a tree encoding number n, i.e. n = E, the term tl E represents n − 1 whereas cons nil E represents n + 1.
If we interpret the encoding trees as lists, the number n will be encoded by a list containing n nil atoms:
0
1
2 = 3 = 4 = and so on.
=[]
= [nil]=[[]]
Note that not every
Definition We encode Boolean values as follows:
false = nil true = ⟨ nil.nil ⟩
3.3 WHILE-syntax
Here is a BNF (Backus-Naur-Form) grammar for the syntax:
⟨expression⟩
⟨block⟩
⟨statement-list⟩
⟨elseblock⟩ ⟨command⟩
⟨program⟩
::= ⟨variable⟩
| nil
| cons ⟨expression⟩ ⟨expression⟩ | hd ⟨expression⟩
| tl ⟨expression⟩
| ( ⟨expression⟩ )
::= { ⟨statement-list⟩ } | { }
::= ⟨command⟩
| ⟨command⟩ ; ⟨statement-list⟩
::= else ⟨block⟩
::= ⟨variable⟩ := ⟨expression⟩
| while ⟨expression⟩ ⟨block⟩
| if ⟨expression⟩ ⟨block⟩
| if ⟨expression⟩ ⟨block⟩ ⟨elseblock⟩
::= ⟨name⟩ read ⟨variable⟩ ⟨block⟩
write ⟨variable⟩ 9
(variable expression) (atom nil) (construct tree) (left subtree) (right subtree) (right subtree)
(block of commands) (empty block)
(single command list) (list of commands)
(else-case)
(assignment) (while loop) (if-then) (if-then-else)
[nil,nil] = [[],[]]
[nil,nil,nil] = [[],[],[]] [nil,nil,nil,nil] = [[],[],[],[]]
tree encodes a natural number.
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
The are two nonterminals in the grammar without a definition: ⟨variable⟩ and ⟨name⟩. The former is the set of all variables (or variable names), the latter is the set of all program names. Both are considered to be sets of identifiers. The identifiers are supposed to not contain keywords, and should not start with a digit. Normally, we allow digits and letters, but not special characters with the exception maybe of and $. The concrete set of allowed identifiers may depend on the tool using the language.
3.3.1 WHILE-expressions
An expression of the WHILE-language denoting such a tree is one of the following: a variable, a constructor or a destructor. We have two constructors: the empty tree nil and the construction of a nonempty tree via cons E F where E and F are expressions denoting also binary trees. We also have tow destructors: hd, short for “head”, and tl, short for “tail”. They “disassemble” a tree into its left (head) and right (tail) subtree, respectively. They do not have any effect on the empty tree (as it does not have any subtrees). The details can also be seen in the semantics below.
Note that it is important to distinguish the language expressions that denote trees in WHILE and the trees that are data values in D. The expressions can be evaluated to values in D using the semantic interpretation for expressions explained below.
3.3.2 WHILE-statements and blocks
There are only three types of commands in the WHILE-language: assignment, conditional (if-then-else), and while loop. Conditionals may or may not have an ‘else’ part.
We can then form blocks that are essentially lists of statements, thus pro- viding the means for sequential composition of statements. Some blocks may have no statements, then they are empty blocks.
3.3.3 WHILE-programs
A program has a name, consists of a read statement, a command as described above, and a write statement. So we have one input and one output. The passing of data in a read and write statement, respectively, is via a variable, which is called the input and output variable, respectively. So a program looks like this:
pname read X {
S
}
write Y
where S is a statement list.
10
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
From the above it should be obvious that WHILE is a really simple language which is, however, already quite expressive. We have basic imperative program- ming constructs as well as a rich data type. Programs only have one input and one output, but this is not really a restriction, as we can wrap several arguments in one list. Throughout this module, we will do this, if we have more than one input or output, respectively. Note that we don’t use any list wrappers, if there is just one argument.
Here is a sample program for list reversal:
reverse read X {
Y := nil;
while X {
Y := cons hd X Y;
X := tl X
} }
write Y
(* initialise accumulator Y *)
(* run through input list X *)
(* append first element of X to Y *)
(* remove first element from X *)
Note that the first assignment we don’t really need as variables are initialised to be nil. This will be also discussed int he next section.
4 The semantics of WHILE
If we want to take WHILE as “effective procedures” we better make sure we
understand exactly how to execute each command and make sure it can be
executed using finite resources. We thus need to define the semantics of the
WHILE
takes a WHILE-program and maps it into its semantics which is a description of its input/ouput behaviour and thus a partial function from binary trees to binary trees, i.e. a function of type
WHILE-language. The semantic map D → D⊥.
WHILE
Then p (d) = e means that running program p with input d terminates WHILE
and the output is e and p (d) = ⊥ means that running program p with input d does not terminate. How do we define the semantic function then?
4.1 Stores
Since our WHILE-programs manipulate (values of) variables, the semantics (the effects) of commands must track these changes of variables’ values. For this we use stores and usually denote them with the Greek symbol σ decorated with various subscripts and superscripts.
Definition A store for a WHILE-program is a finite set of variable bindings (pair of variable name and a value) and thus an element in Set (VariableName × D). A store, sometimes also called environment, thus contains the values of variables
11
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
produced by a program. It changes during the programs execution due to the assignments made to variables. We use the abbreviation
Store = Set(VariableName × D)
so Store denotes the set of all possible stores for a WHILE-program.
We use several operations on stores defined next.
Definition Let p be a WHILE-program. We define the following operations and notations on stores:
Notation Since a store is essentially a set of key-value pairs we can write con- crete stores accordingly as set of pairs “(variable name,value)” as follows:
{X1 : d1,X2 : d2,…,Xn : dn} where d1,…,dn are values in D.
Lookup We write σ(X) for the lookup operation in a store σ for variable X. This returns a value in D. If X does not occur in σ it returns value nil. This will be in line with the use of variables that are assumed to be initialised with value nil.
Update We write σ[X := d] for the update operation of X in σ which is defined as follows:
σ[X := d] = σ \ {(X, val )} ∪ {(X, d)} (5) For the execution of a WHILE-program (or any WHILEcommands) we need a
store to start off with, the initial store:
Definition We denote the initial store for a program p with input d by σ0p(d). Note the extra argument d which denotes the input of the program that will be initially assigned to the input variable. We thus get the following definition of the initial store σ0p(d) for a program p read X {S} write Y with input d:
σ 0p ( d ) = { X : d }
We now have the ingredients to define the semantics of commands and ex-
pressions.
4.2 Semantics of commands
The semantics of commands (statement lists) is given by the judgment S ⊢ σ1 → σ2
pronounced “executing S in store σ1 terminates and leads to a new store σ2”, which is a “cool” notation for (S, σ1, σ2) ∈ RSemanticsStmtList. The relation
12
⃝c Bernhard Reus, Sussex University 2014-21 Limits of Computation
is a relation on triples (a ternary relation), consisting of a statement list and two stores.4 This relation will be defined as the smallest relation fulfilling the following some rules that describe the effect of executing commands (statements) and lists thereof. They can be found in the book and will not be part of the assessed work.
4.3 Semantics of programs
Definition Let p read X {S} write Y be a WHILE-program where S is a state- ment list. The semantics of p is defined as follows:
p
WHILE e ifS⊢σ0p(d)→σandσ(Y)=e
(d) = ⊥ otherwise5 (6)
So the semantics of p (which is a partial function) applied to argument d equals e, if running the body of p, i.e. S, in the initial state σ0p(d) for p (which sets the input variable X to d and all other variables to nil terminates in a state σ in which the output variable Y has the value e. It equals ⊥ (undefined) if the body of p in the initial state σ0p(d) does not terminate.
4You know for instance equality, a binary relation, and write .e.g u = v instead of (u, v) ∈ Requality .
5i.e. if S does not terminate when run in initial store σ0p(d) 13