Limits of Computation 20/21
Part III Computability (Some notes on Lectures 9–11)
Bernhard Reus February 20, 2021
9 More undecidable problems
Next we consider more undecidable problems. There are quite a number of undecidable problems, infinitely many in fact, and they appear (in disguises) more often than you might think. But before we do that, let us make good use of the WHILE self-interpreter to show that HALT (our version of the Halting problem) is semi-decidable:
9.1 Semi-decidability of the Halting Problem
A WHILE-semi-decidable set (or a semi-decidable problem) is one for which an approximation of the membership test can be computed by a WHILE-program that may not terminate if the answer is “false”. Strictly speaking, the WHILE-program does not actually compute the membership test completely, only half of it (thus the name “semi”-decidable).
More formally we can define:
Definition A set A ⊆ D is WHILE-semi-decidable if, and only if, there exists a WHILE-program p such that for all d ∈ D the following holds: d ∈ A if, and only if, ⟦p⟧^WHILE(d) = true.

Let us for now assume we already have a WHILE-interpreter written in WHILE and call this self-interpreter u. To show that HALT, the Halting set or Halting problem, is semi-decidable, it suffices to come up with a WHILE-program q such that for all input values d ∈ D and all WHILE-programs p we have that (p, d) ∈ HALT if, and only if, ⟦q⟧^WHILE(p, d) = true. In other words (expanding the definition of HALT), HALT is semi-decidable if, and only if, ⟦p⟧^WHILE(d) ↓ iff ⟦q⟧^WHILE(p, d) = true. As usual, we drop the encoding brackets, and since we know how to encode pairs in WHILE the above statement is equivalent to: HALT is semi-decidable if, and only if, ⟦p⟧^WHILE(d) ↓ iff ⟦q⟧^WHILE [p, d] = true.
Theorem The Halting problem (for WHILE-programs) HALT is WHILE-semi-
decidable.
Proof: To show that HALT, the Halting set or Halting problem, is semi-decidable it suffices to come up with a WHILE-program sd such that for all input values d ∈ D and all WHILE-programs p we have that [p, d] ∈ HALT if, and only if, ⟦sd⟧^WHILE [p, d] = true. In other words (expanding the definition of HALT), HALT is semi-decidable if, and only if, ⟦p⟧^WHILE(d) ↓ iff ⟦sd⟧^WHILE [p, d] = true. We can write such a program sd as outlined below:
sd read PD {
   Res := <u> PD;   // run the self-interpreter u on the pair [p, d]
   X := true        // reached only if the interpreter call terminates
}
write X
This program clearly returns output true if, and only if, Res := <u> PD terminates, which by the fact that u is the self-interpreter is exactly the case if PD ∈ HALT.
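The same argument can be replayed outside WHILE. Below is a minimal Python sketch of a semi-decider built from an interpreter; the step-function interface is an assumption for illustration, not part of the text:

def semi_decide_halt(step, state):
    # Semi-decider: returns True if the interpreted program halts,
    # and loops forever otherwise -- it never returns False.
    # `step` advances a machine state by one step and returns None
    # once the program has terminated (assumed toy interface).
    while state is not None:
        state = step(state)
    return True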
In this subsection we state some important facts about (WHILE) decidability, semi-decidability, and their relationship. We can actually prove these results relatively easily by writing some programs.
Theorem
1. Any finite set A ⊆ D is WHILE-decidable.
2. If A ⊆ D is WHILE-decidable then so is D \ A, its complement in D.
3. Any WHILE-decidable set is WHILE-semi-decidable.
4. A problem (set) A ⊆ D is WHILE-decidable if, and only if, both A and its complement, D \ A, are WHILE-semi-decidable.
Items 1 and 2 were proved in exercises. Item 3 follows directly from the definitions. The proof of item 4 remains as a brain teaser (!).
9.2 Rice’s theorem
Next, we want to generalise the result regarding the undecidability of the Halting Problem. The Halting problem is a problem about (WHILE-) programs, namely whether such a program terminates on a specific input. It describes a property of WHILE-programs. We can generalise and show that many more properties of programs are undecidable: in fact, all “interesting” (semantical) properties of programs are undecidable. We need to say what we mean by that.
A property of a program is “interesting” in our very precise sense if it is a non-trivial property, i.e. one that not every program has but also that at least one program has, and if it is extensional. An extensional property is one that depends exclusively on the semantics, i.e. the input/output behaviour of the program. So “interesting” program properties are “semantical” program properties.
Definition A program property A is a subset of WHILE-programs.
A program property A is non-trivial if {} ≠ A ≠ WHILE-programs.
A program property is extensional if for all p, q ∈ WHILE-programs such that ⟦p⟧^WHILE = ⟦q⟧^WHILE it holds that p ∈ A if and only if q ∈ A.
The last property states that an extensional property is one that cannot distinguish between semantically equal programs. If we have two different programs that have the same semantics, either both must be in the set or neither of them is. It cannot be that one is in the set and the other one is not.
With the help of this definition we can now formulate Rice’s theorem, named after Henry Gordon Rice:1
Rice’s theorem: If A is an extensional and non-trivial program property then A is undecidable.
This is a remarkable result as from this we get an entire family of undecidable problems. It roughly states that “any interesting semantical property about computation itself is undecidable”. We also get a “recipe” for how to show that a problem about programs (or a program property) is undecidable, namely by applying Rice’s theorem.
But let us first prove this theorem. Once again we do a proof by contradiction. We are given a property (= problem) A that is extensional and non-trivial. We now assume A is decidable. Then we show that the Halting problem is decidable. The latter is clearly false, so our assumption that A is decidable must have been false and therefore A must be undecidable.
How can we derive the decidability of the Halting problem from the assumption that A is decidable? We will show that any given program p on any given input e terminates if, and only if, another program (that we construct) is in A. As a consequence A is decidable if, and only if, the Halting problem is decidable.
In order to carry out this plan, we need two auxiliary programs called diverge and comp. Let us define diverge as the program that never terminates:
diverge read X {
while true {
}
}
write Y
We have that ⟦diverge⟧^WHILE(d) = ⊥ for any d ∈ D. By non-triviality of A we know that there is a program that is not in A. Let us call this program comp.
Now assume A contains diverge (if this is not the case just swap the roles of A and its complement and perform the analogous proof). By extensionality of A we know that A contains all programs that never terminate for any input.
To show that the Halting problem is decidable assume we are given a program p and its input e. We wish to decide whether ⟦p⟧^WHILE(e) ≠ ⊥.
Without loss of generality, assume that the above program p and the program comp (which is not in A) have no variables in common. If they have, just consistently rename them without changing the semantics. We now construct a program q as outlined below:
1Henry Gordon Rice (July 18, 1920 – April 14, 2003) was an American logician and mathematician. He proved “his” theorem in his doctoral dissertation of 1951.
q read X {
   Y := <p> “e”;     // run p on the fixed value e
   Res := <comp> X   // then run comp on the original input X
}
write Res
We now consider the behaviour of q and whether it is in A or not. If ⟦p⟧^WHILE(e) = ⊥ then clearly ⟦q⟧^WHILE(d) = ⊥ for all d ∈ D. On the other hand, if ⟦p⟧^WHILE(e) ↓ then ⟦q⟧^WHILE(d) = ⟦comp⟧^WHILE(d) for all d ∈ D. Therefore we get that

⟦q⟧^WHILE = ⟦diverge⟧^WHILE   if ⟦p⟧^WHILE(e) = ⊥
⟦q⟧^WHILE = ⟦comp⟧^WHILE      if ⟦p⟧^WHILE(e) ≠ ⊥

So if p does not halt on e we have that ⟦q⟧^WHILE = ⟦diverge⟧^WHILE and so by extensionality of A and the fact that diverge ∈ A we get that q ∈ A. On the other hand, if p does halt on e then ⟦q⟧^WHILE = ⟦comp⟧^WHILE and so by extensionality of A and the fact that comp ∉ A we get that q ∉ A. Therefore we get that q ∉ A if, and only if, p halts on e, so the latter would be decidable if the former was. This completes the proof.
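The construction of q from p and e is itself computable, which is what makes the contradiction work. A minimal Python sketch, with Python functions standing in for WHILE-programs (all names illustrative, not from the text):

def make_q(p, e, comp):
    # Construct q as in the proof: q behaves like comp if p halts
    # on e, and diverges (on every input) otherwise.
    def q(x):
        p(e)            # diverges exactly when p does not halt on e
        return comp(x)  # otherwise q computes the same function as comp
    return q

If we had a decider in_A for an extensional, non-trivial A with diverge ∈ A and comp ∉ A, then not in_A(make_q(p, e, comp)) would decide the Halting problem, which is impossible.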
9.3 The Tiling Problem
This is a classic problem (also called the Domino problem) of which there are many variations. It dates back to the 1960s when the mathematician, logician, and philosopher Hao Wang proposed it as an interesting class of formal systems. A tile (now also called a Wang tile) has a quadratic shape and is divided up diagonally into four sections (triangles), each of which can have a colour (but the colours do not necessarily have to be different). The four sections are obtained by the cuts of the two diagonals of the square as follows:

[Figure: a square tile cut by its two diagonals into four triangles labelled N (top), W (left), E (right) and S (bottom)]

where N, S, W, E stand for the four directions North, South, West, and East. When laying out the tiles we say that tiles put next to each other match if the colours of their adjacent edges are identical in all four directions: N, S, W, E. Note that it is not allowed to rotate the tiles2. If the edges match colours, one obtains nice diagonal patterns, since matching tiles next to each other give rise to a diagonal square in one colour (made up of the two halves of the matching tiles); in the horizontal W/E direction this looks as depicted below:

[Figure: two horizontally adjacent matching tiles; the E triangle of the left tile and the W triangle of the right tile form one diagonal square of a single colour]
2This is not a real restriction as one can add the rotated versions to the set of tile types if one wishes to do so.
Therefore we can tile a quadratic floor (of size n × n) with a given set of tile types if we can put tiles of the given types next to each other (without rotating them) such that their adjacent edges have the same colour.
In particular, about such a tiling system Wang asked the following question:
“Given a certain (finite) number of tile types (colourings) and an infinite supply of tiles for each type, can one tile an arbitrarily large quadratic floor (obeying the rules mentioned above)?”
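For any fixed n, by contrast, the question “can this set of tile types tile an n × n floor?” is decidable by brute force; the undecidability lies in the quantification over all n. A small Python sketch, with a tile type represented as a tuple (N, E, S, W) of colours (a hypothetical encoding, not from the text):

def tiles_floor(tile_types, n):
    # Brute-force test: can an n-by-n floor be tiled so that all
    # adjacent edges match? Exponential, but fine for tiny n.
    cells = [(r, c) for r in range(n) for c in range(n)]
    def extend(assignment, i):
        if i == len(cells):
            return True
        r, c = cells[i]
        for t in tile_types:
            # the west neighbour's E edge must equal our W edge
            if c > 0 and assignment[(r, c - 1)][1] != t[3]:
                continue
            # the north neighbour's S edge must equal our N edge
            if r > 0 and assignment[(r - 1, c)][2] != t[0]:
                continue
            assignment[(r, c)] = t
            if extend(assignment, i + 1):
                return True
            del assignment[(r, c)]
        return False
    return extend({}, 0)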
Wang proposed this originally for a specific set of four tile types. He conjectured in 1961 that this question was decidable. In his proof argument he assumed that any set that could tile the plane would be able to do so periodically, i.e. with a repeating pattern like a wallpaper. Alas, he was wrong. His student Robert Berger managed to prove this wrong in 1966. He presented a case where the tiles would only tile the plane without a repeating pattern, originally using 20,426 distinct tile types, later reduced to 104. The idea of Berger’s proof was to show that if the Tiling problem was decidable then the Halting problem would be decidable too. This is the same proof technique that we used for proving Rice’s theorem. So the trick here is to come up with tile types (you can use arbitrarily many as long as they are finitely many) that can encode the configurations a specific Turing machine goes through during execution for a specific input. Doing this, one must ensure that one can tile an arbitrarily large quadratic floor with those tile types if, and only if, the encoded Turing machine run does not terminate (and thus runs for an arbitrarily long time). Since the run is determined only by the Turing machine (program) and its input, we can tile a quadratic floor of any size if, and only if, the encoded machine does not halt; so if the Tiling problem were decidable, the Halting problem would be too.
9.4 Problem Reduction
What we have seen just now with the proof of the undecidability of the Tiling problem or the proof of Rice’s theorem seems to be a good “recipe” for obtaining
new undecidability results from old ones. Indeed, what we have done there is to reduce the Halting problem to the problem we want to show to be undecidable. If this reduction works, it means that we could decide the Halting problem if the problem in question was decidable. But we already know that the Halting problem is undecidable, so we can conclude that the problem in question must also be undecidable. Thus a quite useful principle arises with whose help one can show that a problem is undecidable.
So let us pin down this principle a bit more formally. In particular we need to define more precisely what “reduction” means exactly. In our case, dealing with computability, we will need to require that the reduction itself is computable (otherwise our argument above would not stand).
Assume A and B are problems (or sets) about items in X and Y, respectively. (For WHILE-decidability X and Y will be D, the type of binary trees.) Informally, we have a reduction from problem A on X (or the question “is a given x ∈ X in A”) to problem B on Y (or the question “is a given y ∈ Y in B”) if we can find
1. a computable total function f : X → Y , mapping every element of X to an element of Y
2. such that we know that x ∈ A if, and only if, f(x) ∈ B. This condition is important to guarantee that the reduction is correct.
What do these two conditions mean? The first condition states that any input x ∈ X of which one can ask the question whether it is in A must be “translatable” into an element y ∈ Y of which one can ask whether it is in B; and, moreover, this translation cannot be “magic”, it must be computable. The second condition states that the questions whether x is in A and whether the translated x, namely f(x), is in B are equivalent. If you know the answer to one, you know the answer to the other.
The reduction and its ingredients are depicted below where the decision procedure’s input is at the bottom such that the simpler (or at least not harder than B) problem A is below B.
[Diagram: the input x ∈ X (at the bottom, problem A) is translated by f into f(x) ∈ Y; a decision procedure for B applied to f(x) produces the yes/no answer for “x ∈ A”]
Formally the above explanations can be stated as follows:
Definition of Reduction: Suppose one is given A ⊆ X and B ⊆ Y. Define A to be effectively reducible to B if there is a total computable function f : X → Y such that for all x ∈ X, we have x ∈ A if, and only if, f(x) ∈ B.
Symbolically we write this relation as A ≤rec B (“A is effectively reducible to B”).
The above definition specifies just one particular type of reduction. There are many others. The one we use here is also often called “recursive many-one reduction”. This is because many elements in A can be mapped via f to one and the same element3 in B, and because, historically, “recursive” is another word for “effective” or computable.
The following theorem now explains the importance of this concept of reduction; we have implicitly used it already to prove Rice’s theorem and to argue for the undecidability of the Tiling problem:
Theorem If A ≤rec B and B is decidable then A is also decidable. Contrapositively, if A ≤rec B and A is undecidable then B is also undecidable.

The first part was proved in the exercises. The second part is just the logical contraposition of the first part.
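The first part of the theorem is one line of code away: given a decider for B and the reduction f, composition decides A. A Python sketch (decide_B and f are assumed inputs):

def decide_A(x, f, decide_B):
    # x in A  iff  f(x) in B, and f is computable, so this decides A
    return decide_B(f(x))

Note how totality of f matters: if f could diverge, decide_A would not be a decision procedure.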
The notion of reduction is a very important tool in the toolbox of computability and complexity researchers, and we will meet another version that is most useful in the complexity part later.
9.5 Other (famous) Undecidable Problems
We have met a number of famous undecidable problems already and we have been given some “recipes” for how to obtain new ones. Below is a selective list of other famous undecidable problems, to show you that those problems are not so rare in computer science and mathematics, that some were quite hard to prove undecidable, and that new undecidable problems can be found, are still being found, and will be found in the future.
• Do languages generated by two different context free grammars overlap?
• Is a context free grammar ambiguous? In other words: are there two different derivations of the same word (string) using the rules of this grammar?
• Does a context free grammar generate all words over a given alphabet?
• Given two strings (over a finite alphabet) and a set of rewrite rules, does
the first string rewrite into the second one using the given rules?
• Is a given program typable in a functional programming language with polymorphic types4 where the set of all types is itself a type?

3That means that for different x ≠ y ∈ A we can still have f(x) = f(y).
4Haskell and ML are such programming languages but they exclude the type of all types.
• Given a set of polynomial equations with integer coefficients, does it have an integer solution? This was a long-standing open problem posed by David Hilbert known as Hilbert’s Tenth Problem (from the list of problems he presented at a conference in Paris in 1900), until Yuri Matiyasevich famously proved undecidability in 1970.
• The word problem for groups: given a finitely presented group, do two given words in the generators represent the same element? This has been shown to be undecidable by Pyotr Novikov.
• The “Entscheidungsproblem” (German for “decision problem”): is a given formula of first-order logic valid (or satisfiable)? This has been famously proved to be undecidable by Alan Turing in 1936.
9.6 Dealing with Undecidable Problems
We have seen that many problems, in particular in the context of program development, are undecidable. This is bad news and one might wonder what impact this has on the life of software engineers and programmers.
As usual in life, one has to find ways around fundamental problems like this. There are a few coping strategies:
1. approximate the decision procedure, accepting that the approximated decision procedure may return false negatives. This occurs, for instance, in certain cases of type checking. Semantically, type soundness is undecidable by Rice’s theorem. From this it follows immediately that any nontrivial, sound and decidable type system must necessarily be incomplete. It will reject some programs that are actually type sound5. Instead of working with semantic types, e.g. all trees in D that denote natural numbers, one works with syntactic types which are an approximation of the semantic types, e.g. introducing type expressions into the language and either requiring the programmer to annotate declarations with types or having a type inference algorithm that does that but may sometimes still need some typing hints from the programmer. Of course, to be useful, the type system must be expressive enough to type most “common” programs.
2. give up on uniformity, and just test a few inputs. This method is used to replace undecidable program verification (undecidable again by Rice’s theorem). Instead of verifying that a program is correct for all inputs, just test whether it is correct for some, ideally the most likely, inputs. One does not obtain a guarantee of correctness and thus not a decision procedure. This may be combined with probabilistic methods, if one has an idea how likely the inputs actually are.
5It may also return false positives, i.e. accept programs that are not type sound; this happens e.g. in the case of division-by-zero.
3. give up on the automation provided by the decision procedure. This strategy is very popular in program verification. Instead of a decision procedure that simply takes a program and a specification and returns a Boolean stating whether the program meets the specification, build an interactive tool that is guided by the user (the human verifier) to find the right proof.

4. simplify the problem. Instead of solving the undecidable problem, decide a simpler one. In program verification again, this strategy is used, for instance, by restricting the complexity of the properties that programs can be shown to have (e.g. memory safety) and/or by restricting the programs.

9.7 Busy Beaver
We have seen that for undecidable problems the characteristic function6 of the problem is non-computable. But a characteristic function is a function into the Booleans (as the test result is ‘yes’ or ‘no’). The example discussed in this section actually computes arbitrarily large natural numbers.
So, for a change, we now present a non-computable function that returns natural numbers, i.e. that is of type N → N⊥.
Tibor Radó7 in his 1962 paper introduced the “Busy Beaver Game” which corresponds to the following question for Turing machine programs (we can adapt this to WHILE-programs and will do so further below):
for every input number n return the greatest number that can be outputted by any Turing machine that has at most n states when run on the empty input.
To produce an output, the Turing machine in question needs to terminate. Observe also that in Radó’s original formulation the input/output alphabet of the Turing machine consists of just the symbol 1, so on the tape one can write only blanks and the symbol 1, and the output number is represented in unary form (three 1s denote the number 3, just like our encoding of numbers in WHILE is a list of three nils). Therefore, the Busy Beaver function for Turing machines is also often described as the highest number of non-blank symbols that a Turing machine with n states can write on the tape. It is relevant that the input is empty, so that one cannot simply “inject” big numbers via the input and then return them again as output.
For WHILE-programs we need to translate “n states” into something comparable for WHILE-programs:
“Can we compute the function that for every input number n returns the greatest number that can be outputted by any WHILE-program that is (as syntactic string) at most n characters long when run on input nil?”

6The characteristic function χ of a set A of elements of type T is of type T → B and defined by χ(d) = (d ∈ A).
7Tibor Radó (June 2, 1895 – December 29, 1965) was a Hungarian mathematician who moved to the USA and worked in Computer Science later in his life.
Definition (Busy Beaver): The (total) function BB : N → N is defined as follows:

BB(n) = max { ⟦p⟧^WHILE(nil) | p is a WHILE-program with |p| ≤ n and ⟦p⟧^WHILE(nil) ↓ }

where |p| denotes the length of the textual program p (not the size of the abstract syntax tree).
It should be obvious why this is called “busy beaver”: the implementation of such a function needs to be “busy as a beaver” to compute the highest of all possible outputs over all programs meeting the size constraint.
Maybe surprisingly, the Busy Beaver function attracted some interest in the research community (in Theoretical Computer Science) and still does so. The reason is the following:
Theorem: BB(x) grows more quickly than any computable function f : N → N. That means that for any f there is an x0 ∈ N such that for all x > x0 we have that BB(x) > f(x).
From this it follows quickly that the BB function is not computable. Assume BB was computable. By the above theorem, setting f := BB, it would follow that there is an x such that BB(x) > BB(x), which is impossible.
Although the function is not computable, it is clearly the case that for a fixed argument n the value BB(n) is computable, as only finitely many programs need to be taken into consideration. Note how this again stresses the importance of the difference between a function being computable and a specific value of a function being computable. Of course, the number of programs that have to be taken into consideration increases rapidly with the size of the argument of the Busy Beaver function. The sheer number of programs makes it difficult to compute the value of BB for even mildly large input values (such as 5). Traditionally, the values of BB have been investigated for Turing machines, as they are simple and one only has to count the number of states (i.e. instructions) and not the length of entire programs.
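To see where the difficulty sits, here is what the (hopeless) brute-force computation of a single value BB(n) looks like as a Python sketch. All three helper functions are assumptions for illustration; the point is that halts would have to be the uncomputable halting oracle:

def busy_beaver(n, programs_up_to_size, halts, run):
    # BB(n): maximise the output over the finitely many programs of
    # size <= n -- IF we could filter out the non-terminating ones.
    best = 0
    for p in programs_up_to_size(n):
        if halts(p):                # the step no program can perform
            best = max(best, run(p))
    return best

For each fixed n the set of programs is finite, so a correct answer exists (and can sometimes be proved by hand); what does not exist is a single program computing BB for all n.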
By the work of Marxen and Buntrock we know (for the Turing machine Busy Beaver) that BB(5) ≥ 47,176,870. They used computers to look through as many Turing machine programs of size 5 as possible, simulating their runs as efficiently as possible (the problem always is what to do with non-terminating Turing machines).
As recently as 2010, Pavel Kropitz found the latest record lower bound BB(6) ≥ 3.5 × 10^18267, a truly freakishly large number.
10 Partial Evaluation and Self-referencing Programs
The question we answer in this section is the following: when is a program that refers to itself (its own syntax or AST) well-defined? Under which assumptions does it have a semantics? Can we write a program, for instance, that returns itself?
Self-reference can lead to inconsistencies and paradoxes (as we have seen in the proofs of the undecidability of the Halting problem), so this question is more than justified. Such self-reference of programs is often called “reflection” and the program a “reflexive program”. You might have seen that modern programming languages come with reflection libraries that allow self-inspection and even manipulation. This is often used for flexibility, to make programs adaptable, or to elegantly achieve a complex change in an already existing larger software system. These changes then depend on the structure of the software system, for instance its classes. This often leads to elegant solutions, at least more elegant than manually editing thousands of lines of code or interfaces.
A fascinating fact is that the meaning of such reflective programs can be defined with the help of an old theorem, Kleene’s Recursion Theorem, that Stephen Kleene8 proved in 1938. One of many astonishing implications of Kleene’s Recursion Theorem is that
. . . any acceptable programming system is “closed under recursion”. [Neil Jones, page 222 of Computability and Complexity]
This then explains the name of the theorem. The above statement means that recursion can be “done” by reflection; it does not have to be built into the language. With the help of self-reference one can thus implement recursion, so self-referencing programs can also be called “recursive”. Note that the recursion theorem has many applications. Before we look into it and its applications in detail, we first discuss another result of Kleene’s that will be used in this context, and which has interesting applications in its own right and is related to (optimisation by) partial evaluation:
10.1 The S-m-n Theorem
The so-called “S-m-n” theorem, sometimes also called the parameterization theorem, can be regarded as a justification for partial evaluation as an optimisation technique. The reasons for the strange name will be explained soon. In terms
8Stephen Cole Kleene (1909–1994) was a famous American mathematician. You might know him already from a course on finite automata. He invented regular expressions! He was a student of Alonzo Church and one of the founders of recursion theory.
of metaprogramming (i.e. programming with programs) the theorem basically states the existence of specialisers.
S-1-1 Theorem For any programming language L with pairing and programs-as-data there exists a program spec such that

⟦⟦spec⟧^L(p, s)⟧^L(d) = ⟦p⟧^L(s, d)

where ( , ) denotes pairing for a general language L.9
This program spec is a specialiser from L to L as first mentioned in Chap. 6.
Recall from Chap. 6 that a specialiser takes a program and some of its input and produces a new program that needs “fewer input parameters”. The above equation specifies the resulting program’s behaviour. The specialiser “applies” the one given argument s to the given program p that requires two arguments, using s as p’s first argument. In doing so it constructs a result program with one remaining argument, otherwise preserving the program’s semantics. One might say the specialiser is “freezing” the first argument of the given program.
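In a language with first-class functions the S-1-1 equation reads like partial application. A rough Python analogy, with closures standing in for programs (so this captures only the semantic effect; a real specialiser returns program text, not a closure):

from functools import partial

def p(s, d):                 # a two-argument "program"
    return s * 10 + d

spec_p = partial(p, 3)       # freeze the first argument: spec(p, 3)
assert spec_p(4) == p(3, 4)  # the defining equation of the theorem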
We can easily prove the S-1-1 theorem for L = WHILE by simply constructing the program specWHILE as given on our Canvas site.
Imagine a program p to be specialised that has two arguments encoded by input variable XY, where X has been assigned hd XY and Y has been assigned hd tl XY using the standard list encoding of tuples. After p has been “specialised” with partial input 3, the resulting program will have an input variable, say A, and an extra assignment XY := cons 3 A added to it. The latter implements the passing of the first argument provided. Therefore, if the program contained an assignment of the form Z := e where the expression e depends only on the first argument (like hd XY), the value of Z is already fully determined at specialisation time.

In the above example such an assignment could accordingly be computed once by the specialiser rather than at every run of the specialised program; a specialiser that carries out such computations with the statically known input is performing partial evaluation.
So partial evaluation can be regarded as “efficient program specialisation” [Neil Jones].
10.2 Kleene’s Recursion Theorem
The following theorem guarantees the existence of self-referencing programs, which we can also call “recursive” since with the help of self-reference one can implement recursion. This will be discussed in more detail later. Let us first state Kleene’s theorem. Recall that for a generic language we use the notation ( , ) for pairs.
9In WHILE we know how to express pairing, namely as a list via [ , ] since we know the datatype is D.
Kleene’s Recursion Theorem Let L be an acceptable programming language. For any L-program p, there is an L-program q such that for all input d ∈ L-data we have

⟦q⟧^L(d) = ⟦p⟧^L(q, d)
How does this deal with reflective programs, and what is an acceptable programming language? The reflective program q is defined with the help of p, which has an extra parameter to represent “self”. So p may regard q as its own metadata representation.
Definition An acceptable programming language is a programming language L (in the sense we have already provided earlier) for which the following holds:
1. L has a self-interpreter (or universal program) and therefore must have programs-as-data as well as pairing;
2. The S-m-n theorem holds for L;
3. L is Turing-complete in the sense that it allows one to express all functions that a Turing machine can compute.10
What can the Recursion Theorem be used for? A famous example is the following: One can show that a self-reproducing program must exist with the help of the above theorem. Just choose p to be a program that satisfies
⟦p⟧^WHILE [x, y] = x

which is achieved by a program p that writes as output hd XY if XY is its input variable. As usual we do not write explicit encoding brackets for the list [x, y]. From this follows the existence of a program q such that

⟦q⟧^WHILE(d) = ⟦p⟧^WHILE [q, d] = q

so q outputs its own text on every input: a self-reproducing program.
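Self-reproducing programs exist in real languages too. For illustration, the classic Python quine (a concrete instance, not the construction from the proof below):

s = 's = %r\nprint(s %% s)'
print(s % s)

Running it prints its own two lines of source code exactly.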
The proof of Kleene’s (second) recursion theorem is relatively short but very clever indeed.
We start with a program p that takes two arguments, the first of which is considered to be a program itself. We know there is a specialiser spec for the programming language (as it is acceptable). We can therefore apply any program, say r, to itself, i.e. to its data representation r, with the help of the specialiser:
⟦spec⟧^L(r, r)    (the self-application of r)

which returns an L-program that “performs” this self-application. Then we can pass this resulting program as input to our p to obtain a function
f(r, d) = ⟦p⟧^L(⟦spec⟧^L(r, r), d)    (10.1)
10This is a relatively mild assumption on a language that we are prepared to call a programming language.
where f is now a function mapping L-data to L-data. Since L is acceptable, we know that f must be computable by an L-program pself. In short:

⟦pself⟧^L = f    (10.2)

Now we simply define the program q we are looking for as another self-application, this time of pself:

q = ⟦spec⟧^L(pself, pself)    (10.3)

(the self-application of pself).
This q is an L-program and it is actually what we will choose for the program required by the theorem. We have to show that it has the required property:
⟦q⟧^L(d)
  = ⟦⟦spec⟧^L(pself, pself)⟧^L(d)       (by Eq. 10.3, the definition of q)
  = ⟦pself⟧^L(pself, d)                 (by the S-1-1 Theorem with p := pself and s := pself)
  = f(pself, d)                         (by Eq. 10.2, the definition of pself)
  = ⟦p⟧^L(⟦spec⟧^L(pself, pself), d)    (by Eq. 10.1, the definition of f)
  = ⟦p⟧^L(q, d)                         (by Eq. 10.3, the definition of q, backwards)

This is yet another neat proof achieved with self-application11, two helpings of self-application actually.
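The construction can be replayed in any language with first-class functions. A minimal Python sketch in which closures stand in for program texts and spec is just partial application (illustrative, not the WHILE construction):

def spec(p, s):
    # S-1-1: freeze the first argument of a two-argument program
    return lambda d: p(s, d)

def kleene(p):
    # Return q with q(d) == p(q, d), mirroring Eqs. 10.1-10.3.
    pself = lambda r, d: p(spec(r, r), d)   # pself computes f (Eq. 10.2)
    return spec(pself, pself)               # q = spec(pself, pself) (Eq. 10.3)

# recursion obtained from self-reference (cf. Sect. 10.3):
fac = kleene(lambda q, n: 1 if n == 0 else n * q(n - 1))
assert fac(5) == 120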
10.3 Recursion Elimination
There are many applications of this theorem. A classic one is recursion elimination. So let us return to the specific language WHILE (so L := WHILE). The program fac below shows an attempt at implementing the factorial function that makes use of a macro call to mult and also calls itself (fac) recursively.
fac read n {
   if n {
      A := <fac> tl n;        // recursive macro call (not legal WHILE!)
      Res := <mult> [n, A]
   } else {
      Res := 1
   }
}
write Res

fac2 read qn {
   q := hd qn;
   n := hd tl qn;
   if n {
      A := [q, tl n];         // the argument pair for the recursive run
      Res := <mult> [n, <u> A]
   } else {
      Res := 1
   }
}
write Res
11Readers familiar with Alonzo Church’s untyped λ-calculus will recognise the similarity with the construction of the recursion operator Y f = (λx.f (x x)) (λx.f (x x)) which also uses two self-applications.
We have seen already that in WHILE we cannot define such a recursive definition directly, not even in extensions of WHILE, because recursive macros are not supported and thus we cannot call fac as suggested. We first need to be able to compile the macro away. However, with the help of an extra parameter that represents the program fac and using the universal program (self-interpreter) for WHILE, u, we can write program fac2 without a recursive macro. Program fac2 takes two arguments, the first of which is a program, namely the meta-representation of the factorial program. But fac2 is not yet the factorial function we want. Applying Kleene’s recursion theorem, however, we obtain a program facRefl such that

⟦facRefl⟧^WHILE(d) = ⟦fac2⟧^WHILE [facRefl, d]
and this program does implement the factorial function. We have used the self-interpreter here to make the recursion on the program source explicit.
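The same trick in the Python stand-in from Sect. 10.2: fac2 never calls itself; the “recursion” goes through its program argument q and the interpreter (names as in the kleene/spec sketch above, so this is an illustration, not the WHILE code):

def u(p, d):
    # toy stand-in for the self-interpreter: running a program = calling it
    return p(d)

def fac2(q, n):
    # no self-call here: q is "the program itself", handed in as data
    return 1 if n == 0 else n * u(q, n - 1)

facRefl = kleene(fac2)     # kleene() as sketched in Sect. 10.2
assert facRefl(6) == 720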
The proof of Kleene’s recursion theorem is constructive, which means that the proof tells us what the program q actually looks like. It turns out that, despite the elegance of the proof by self-application, the resulting program is not very efficient. For instance, to compute the factorial of n, self-application leads to a self-interpreter running a self-interpreter running a self-interpreter and so on, for n levels, which leads to a significant slow-down in runtime. This is the reason why many interpreters provide explicit reflective means in terms of syntactic operators to refer to the program’s own source (as abstract syntax tree) and to the self-interpreter.12 As a consequence only one level of self-interpreter needs to be running at any time (as it can itself be called repeatedly).
In compiled languages the situation is more difficult, as the compiled program usually has dropped all references to the source code and consists of executable binaries only. Often this is also in the spirit of the language (as reflective mechanisms may lead to execution overhead even if one does not use reflection); C and C++ are example cases. In cases where compiled code is executed in a runtime environment like a virtual machine, reflective libraries provide access to metadata produced by this runtime system13 intentionally for the purpose of reflective programming. This is possible because the intermediate code (byte code) has more information attached than the binaries produced by direct compilation.
11 The Church-Turing Thesis
So far, we have successfully identified quite a few undecidable problems and non-computable functions. We have also identified some recipes for how to show that a problem is decidable or undecidable. And we have looked at a weaker property than decidability, namely semi-decidability.
But we have done that using a particular notion of computation, in other words a particular choice of what “effective procedures” are. Our choice was
12In LISP we have for instance quote and unquote.
13e.g. the meta-representation of classes in Java
WHILE-programs. The WHILE-language had been picked because it is simple but at the same time rich enough to allow one to write complicated algorithms with relative ease.
But how do we know that the problems we have shown to be undecidable by WHILE-programs are not decidable if we used another computational device or language? That the results obtained in previous chapters do not depend on the choice of WHILE is something we still have to show, and we will do this now in this chapter. Naturally, we will have a look at a few more possible notions of computation: Turing machines, the flowchart language GOTO, register machines, a restricted version of those called counter machines, and cellular automata. The former four models are instances of machine languages, for which we generally discuss a framework for semantics.
We will provide arguments that those languages are all equally powerful. The claim that all reasonable formalisations of the (more or less) intuitive notion of computation are equivalent is called the Church-Turing thesis, which will be addressed first.
Before we do that let us first cite the famous Church-Turing Thesis that appears in all kinds of variations and disguises.14
The thesis was originally suggested by Turing and Church independently. They both had introduced a notion of “effective procedure”, namely the Turing machine and λ-calculus terms, respectively. And both claimed that any form of computation can be described with their preferred choice.
In its original “Turing” form the thesis thus states that:
A function is ‘effectively calculable’ (that is algorithmically com- putable or effectively mechanisable) if, and only if, it is computable by a Turing machine.
This would justify our choice of WHILE instead of the notoriously tedious-to-program Turing machines: WHILE-programs are ‘effectively calculable’ and thus, by the thesis above, each WHILE-program must be equivalent to some Turing machine. A more “liberal” formulation (also used by Neil Jones) is:
All reasonable formalisations of the intuitive notion of effective com- putability are equivalent.
where it is also added that Turing machine computability is such a reasonable formalisation of effective computability. This formulation “factors out” the concrete Turing machine model and simply focuses on the consequence that all models must be equivalent (as they are all equivalent to Turing machines). In the lecture slides I have also used the formulation below:
14Turing experts like Andrew Hodges do correctly emphasise that many of the theses that these days come with the label “Church-Turing thesis” are very free interpretations of it. Also philosophically these variations can be interpreted in very different ways, but we do not want to go into that. The interested reader is kindly referred to Andrew Hodges – http://www.turing.org.uk/publications/stanford.html – and Jack Copeland’s blogs – http://www.alanturing.net/turing_archive/pages/reference%20articles/The%20Turing-Church%20Thesis.html.
All natural models of computation are of equivalent power.
Whichever version you prefer, they all describe a thesis and not a theorem (and thus cannot be proved). The reason is that each statement refers to a vaguely defined notion of what a computational model is or what computable means: the first talks about “algorithmically computable”, the second uses “reasonable formalisation of the intuitive notion of effective computability” and the last uses “natural models of computation”. None of the mentioned “notions of computation” are formally defined like Turing machines are. Note that “reasonable” implicitly refers to the fact that the corresponding notion of computability must be powerful enough to express Turing machines. So, for instance, finite automata would not count as a “reasonable” notion of computation.
Although we can’t prove a thesis, we shall now provide evidence for the Church-Turing thesis, and there is plenty of it. In order to do that, we will introduce a few more notions of computation very precisely and formally, and then argue precisely (without formally proving anything) that they are equivalent:
• Turing machines TM
• Register machines RAM
• Counter machines CM
• a flow chart language GOTO
• an imperative language WHILE (already discussed at length in Chapters 3–5)
• Cellular Automata CA
Another language mentioned earlier is the λ-calculus that Alonzo Church had used for his definition of what computable means (namely computable by a λ-term). We won’t discuss the λ-calculus here, which explains its absence from the above list.
We split the list above into two types of models: machine-based ones and others. The machine-based ones include TM, GOTO, CM, RAM as they all have in common that they use programs that consist of labelled instructions. The WHILE-language is of a different nature, and different from all of them are cellular automata.
11.1 Semantic Framework for Machine-like Models
A machine program p consists of a list of labelled instructions

1:I1; 2:I2; …; m:Im

where the labels are natural numbers ranging from 1 to m.
Before we define the syntax and semantics of the languages not encountered so far, we first present a framework that can be instantiated to describe the semantics of all machine-like models.
Assuming we know the syntax of the machine language, i.e. the instruction set, to define the interpretation of programs, i.e. the semantics of the machine language, all we need are the following four ingredients:
• a definition of what a store is for the semantics of the language. The store, or memory, will typically contain the values of variables or registers or anything that is supposed to store data.
• a function Readin that takes the input and produces the initial store of the machine.
• a function Readout that takes the final store and reads out the result (output).
• the description of the semantics of the instructions for the machine (language), written as p ⊢ s → s′, which means that the machine program p transits from state (or configuration) s into state s′ in one step. In other words, it provokes a state change from s to s′. Note that machine instructions are simple and thus such a transition can be described without defining a more complex relation as done for the semantics of commands in WHILE.
Based on the last item, the one-step semantics p ⊢ s → s′, one defines the “many-step semantics” p ⊢ s →∗ s′ which shall denote that machine program p transits from state s into state s′ in zero, one or many steps. A state is a tuple consisting of the label of the instruction to be executed next and the store (or memory). So, a machine state is e.g. (3, σ) where σ is a concrete store for the type of machine in question.
If we have those four ingredients we can now define the semantics of a machine program p as follows:
⟦p⟧(x) = y  iff  σ0 = Readin(x) and p ⊢ (1, σ0) →∗ (m+1, σ) and y = Readout(σ)
In other words, σ0 describes the initial store and thus (1, σ0) the initial state of the machine, as the machine starts with instruction 1. We have reached the final state if the next instruction is one that is no longer in the program. As m is the label of the last instruction of our program p, we know that if we are at instruction m + 1 the program has terminated. One then reads out the result from the final store σ using Readout.
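The framework translates directly into a small generic interpreter. A Python sketch, where readin, step and readout are the model-specific ingredients to be supplied by each machine type below (an illustration, not part of the formal definition):

def run(program, x, readin, step, readout):
    # Generic machine semantics: iterate one-step transitions from
    # label 1 until the label falls past the last instruction m.
    label, store = 1, readin(x)
    while 1 <= label <= len(program):
        # step implements the relation  p |- (l, s) -> (l', s')
        label, store = step(program, label, store)
    return readout(store)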
11.2 The various models
We will now instantiate this framework by describing the ingredients in each case: the syntax and then the four ingredients for the semantic framework.
11.2.1 Turing machines TM
syntax Instructions are as follows:

rightj, leftj, writej S, ifj S goto l else l′

The j subscript indicates which tape the instruction is intended for.

store The stores for a Turing machine are its tapes. For Turing machines we also need to be aware of the head position on each tape, which is indicated by an underscore. So for k-tape machines the store is a k-tuple of the form

(L1 S1 R1, L2 S2 R2, …, Lk Sk Rk)

where the Li and Ri are strings of symbols of the tape alphabet and the Si are symbols of the tape alphabet (the scanned symbols).

readin Readin(x) = (Bx, B, B, …, B).

In other words, the input is stored on the first tape, the head is put on the blank left of the first input character, and all other tapes are initially blank.

readout Readout(L1 S1 R1, L2 S2 R2, …, Lk Sk Rk) = Prefix(R1)

where Prefix(R1 B R2) = R1 provided that R1 does not contain any blank symbols B.

In other words, the output is the word just right of the current head position on tape 1, up to and excluding the first blank.

semantics We just give below the semantics for Turing machines with 1 tape (it is pretty obvious how to generalise this to k tapes); the scanned symbol is the one written between the strings L and R. Note that p is the program 1:I1; …; m:Im.

p ⊢ (l, L S S′R)  → (l+1, LS S′ R)   if Il = right
p ⊢ (l, L S)      → (l+1, LS B)      if Il = right
p ⊢ (l, LS′ S R)  → (l+1, L S′ SR)   if Il = left
p ⊢ (l, S R)      → (l+1, B SR)      if Il = left
p ⊢ (l, L S R)    → (l+1, L S′ R)    if Il = write S′
p ⊢ (l, L S R)    → (l′, L S R)      if Il = if S goto l′ else l′′
p ⊢ (l, L S′ R)   → (l′′, L S′ R)    if Il = if S goto l′ else l′′ and S ≠ S′
The second and fourth line above deal with the cases where one moves right over the right end of the tape, and left over the left end of the tape, respectively. Since there should be no “end” of the tape (the tape is supposed to be potentially infinite) we accordingly can never “fall off the tape” in any direction. What happens is that we just write a blank symbol if we go to a tape cell we have never visited before.
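Instantiated for a 1-tape machine, the step ingredient of the generic run() sketch above could look as follows in Python (tape representation and instruction encoding are assumptions for illustration):

def tm_step(program, label, tape):
    # One-step semantics; tape = (left, scanned, right) with left and
    # right lists of symbols and 'B' the blank.
    left, s, right = tape
    instr = program[label - 1]              # labels start at 1
    if instr[0] == 'right':
        return label + 1, (left + [s], right[0] if right else 'B', right[1:])
    if instr[0] == 'left':
        return label + 1, (left[:-1], left[-1] if left else 'B', [s] + right)
    if instr[0] == 'write':                 # ('write', S')
        return label + 1, (left, instr[1], right)
    if instr[0] == 'if':                    # ('if', S, l1, l2)
        _, sym, l1, l2 = instr
        return (l1 if s == sym else l2), (left, s, right)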
11.2.2 GOTO language
This language uses the same datatype as WHILE, namely binary trees. The main difference is that it does not provide an abstract control flow mechanism like a while loop but instead a goto command which lends the language its name.
Such jumps are known from machine languages but also from the good old language BASIC (Beginner’s All-purpose Symbolic Instruction Code) that was the only language shipped with the first desktop computers in the early eighties. Note also that in GOTO we cannot build complex expressions E like in WHILE. So all arguments to the operations hd, tl, and cons are variables.
syntax In the following X, Y, and Z denote program variables. Instructions are as follows:

X := nil, X := Y, X := hd Y, X := tl Y, X := cons Y Z, if X goto l else l′

store The store is exactly the same as for WHILE-programs: a map from the program’s variables to elements in D, e.g. {X : d, …, Z : e}.

readin The input and output variable are usually both fixed as X (but could also be declared explicitly). We assume that Readin(d) = {X : d}.

In other words, the initial store maps the input d to the input variable X; all other variables of the program are initialised with nil.

readout Readout(σ) = σ(X).

In other words, the result of the program is the value of variable X.

semantics We give the one-step semantics now. Note that σ[X := d] abbreviates the store that is the same as σ with the one exception that the value for variable X is now set to be d. As before with other machines, we assume that p is the program 1:I1; …; m:Im.
p ⊢ (l, σ) → (l+1, σ[X := nil])      if Il = X := nil
p ⊢ (l, σ) → (l+1, σ[X := σ(Y)])     if Il = X := Y
p ⊢ (l, σ) → (l+1, σ[X := d])        if Il = X := hd Y and σ(Y) = ⟨d.e⟩
p ⊢ (l, σ) → (l+1, σ[X := nil])      if Il = X := hd Y and σ(Y) = nil
p ⊢ (l, σ) → (l+1, σ[X := e])        if Il = X := tl Y and σ(Y) = ⟨d.e⟩
p ⊢ (l, σ) → (l+1, σ[X := nil])      if Il = X := tl Y and σ(Y) = nil
p ⊢ (l, σ) → (l+1, σ[X := ⟨d.e⟩])    if Il = X := cons Y Z and σ(Y) = d and σ(Z) = e
p ⊢ (l, σ) → (l′, σ)                 if Il = if X goto l′ else l′′ and σ(X) ≠ nil
p ⊢ (l, σ) → (l′′, σ)                if Il = if X goto l′ else l′′ and σ(X) = nil

11.2.3 Register machines RAM and SRAM
RAM traditionally stands for Random Access Memory, i.e. memory that can be directly addressed (so one does not have to run through the storage sequentially to find the data like e.g. on a Turing machine tape). It is the machine type that comes closest to the actual CPUs in modern computing devices.
Since we deal with computability (and not yet complexity) we allow an unbounded number of registers which are indexed by natural numbers15. Moreover, each register contains a natural number of arbitrary size16. So we have an idealised register machine, in the same way as we had idealised WHILE and GOTO languages where variables can store arbitrarily large trees. Although our real-world gadgets always have a finite number of registers that store numbers of a certain maximum size, this is not a problem for our considerations of computability (without resource bounds). We will consider computations with resource bounds later, in the complexity part of the module.

15After all, we have not restricted the length of the tapes for Turing machines nor the number of variables that can occur in WHILE-programs.
Like most machine languages there are instructions to move data from one register to another, there are conditional jumps depending on whether the value of a register is 0 or not, and there are operations like increase or decrease of a register, addition or multiplication of registers (storing the result in a third register). Indirect addressing is also important and provided.
The successor RAM, called SRAM, is like the RAM machine but does not allow binary operations on registers.
syntax In the following Xi denotes the i-th register. Instructions are as follows:

Xi := Xi+1, Xi := Xi∸1, Xi := 0, Xi := Xj, Xi := <Xj>, <Xi> := Xj, if Xi=0 goto l else l′

The extra angle brackets in assignments indicate indirect addressing; the meaning will become clear from the semantics below. Note that Xi := Xi∸1 denotes the decrement operation on natural numbers, which stipulates that 0∸1 = 0. In order to highlight this one writes a dot on top of the minus symbol. Extra operations, not available in SRAM, are the binary operations Xi := Xj+Xk and Xi := Xj*Xk.
store Since registers store natural numbers and are indexed by natural numbers, a store maps register (indices) to their values. Thus a store actually maps natural numbers to natural numbers and therefore is a function of type N → N where N denotes the type of natural numbers.

readin The input and output variable are both fixed as register 0. Readin(x) is the function that maps 0 to x and all other numbers to 0. In other words, the initial store moves x into register 0; all other registers initially contain 0.

readout Readout(σ) = σ(0). In other words, the result of the program is the value of register 0.

semantics We give the one-step semantics now. Note that σ[i := n] abbreviates the store that is the same as σ with the one exception that the value for register i is now set to be the number n. As before with other machines, we assume that p is the program 1:I1; …; m:Im.

16We have not restricted the size of trees that can be stored in variables of WHILE-programs either.
p ⊢ (l, σ) → (l+1, σ[i := σ(i)+1])    if Il = Xi := Xi+1
p ⊢ (l, σ) → (l+1, σ[i := σ(i)−1])    if Il = Xi := Xi∸1 and σ(i) > 0
p ⊢ (l, σ) → (l+1, σ[i := 0])         if Il = Xi := Xi∸1 and σ(i) = 0
p ⊢ (l, σ) → (l+1, σ[i := σ(j)])      if Il = Xi := Xj
p ⊢ (l, σ) → (l+1, σ[i := 0])         if Il = Xi := 0
p ⊢ (l, σ) → (l′, σ)                  if Il = if Xi=0 goto l′ else l′′ and σ(i) = 0
p ⊢ (l, σ) → (l′′, σ)                 if Il = if Xi=0 goto l′ else l′′ and σ(i) ≠ 0
p ⊢ (l, σ) → (l+1, σ[i := σ(σ(j))])   if Il = Xi := <Xj>
p ⊢ (l, σ) → (l+1, σ[σ(i) := σ(j)])   if Il = <Xi> := Xj
Note how in the penultimate line the indirect addressing for Xj is semantically expressed by the “double feature” of looking up a register value: σ(σ(j)). The value found in register j, σ(j), is itself used as a register name/number, and then the value of this register, σ(σ(j)), is used. The “indirection” refers to this fact that the value is not taken directly from register j, but from the register addressed by the value in register j. The analogous indirection happens in the last line, where the indirect address determines the register into which the value from Xj is moved. This feature is most important for implementing pointers and arrays.
11.2.4 Counter machines CM
Counter machines are much simpler than register machines. Since the registers for such machines only permit increment and decrement operations and a test for 0, and not any arbitrary register transfer or even more complicated binary operations, the registers for such machines are called counters.
Note that the counter machine that only uses two counters is called 2CM. On closer inspection, the counter machine model is actually a subset of the SRAM model where the operations on registers are extremely limited, which justifies calling them counters now. Therefore the remainder of this definition should look familiar:
syntax In the following Xi denotes the i-th counter. Instructions are as follows: Xi := Xi+1, Xi := Xi∸1, if Xi=0 goto l else l′
store Like for register machines, a store is a function of type N → N where N denotes the type of natural numbers.
readin The input (and output) counter is fixed as counter 0.
Readin(x) is the function that maps 0 to x and all other numbers to 0. In other words the initial store moves x into counter 0, all other counters contain initially 0.
readout Readout(σ) = σ(0).
In other words the result of the program is the value of counter 0.
semantics We give the one-step semantics now. As before, σ[i := n] abbreviates the store that is the same as σ with the one exception that the value for register i is now set to be the number n, and we assume that p is the program 1:I1; …; m:Im.

p ⊢ (l, σ) → (l+1, σ[i := σ(i)+1])   if Il = Xi := Xi+1
p ⊢ (l, σ) → (l+1, σ[i := σ(i)−1])   if Il = Xi := Xi∸1 and σ(i) > 0
p ⊢ (l, σ) → (l+1, σ[i := 0])        if Il = Xi := Xi∸1 and σ(i) = 0
p ⊢ (l, σ) → (l′, σ)                 if Il = if Xi=0 goto l′ else l′′ and σ(i) = 0
p ⊢ (l, σ) → (l′′, σ)                if Il = if Xi=0 goto l′ else l′′ and σ(i) ≠ 0

11.2.5 Cellular Automata
Note that cellular automata are not part of the assessed material of this module.
The fascination of (two-dimensional) cellular automata has gripped students since John Horton Conway17 devised Game of Life (often simply called Life) in 1970. It shot to immediate fame through publication and promotion in the
“Mathematical Games” section of Scientific American.
The classic version of Life consists of an unbounded two-dimensional orthogonal grid of square cells, each of which can be in one of two possible states: dead or alive (0 or 1). Each cell interacts (only) with its eight neighbour cells. At each tick of the clock, the following transitions happen:
1. Any live cell with fewer than two live neighbours dies (under-population).
2. Any live cell with two or three live neighbours lives on to the next generation (survival).
3. Any live cell with more than three live neighbours dies (overcrowding).
4. Any dead cell with exactly three live neighbours becomes a live cell (reproduction).
The game starts with a seed of live cells at time tick zero, describing the 0-th generation, i.e. which cells in the two-dimensional grid are initially alive and which are dead. It is important that the rules above are applied to all cells simultaneously at each time tick.
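The four rules fit in a few lines of code. A minimal Python sketch that represents the (unbounded) grid simply as the set of live cells:

from collections import Counter

def life_step(alive):
    # Count, for every cell, its live neighbours...
    counts = Counter((x + dx, y + dy)
                     for (x, y) in alive
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # ...then apply rules 1-4 to all cells simultaneously:
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in alive)}

blinker = {(0, 0), (1, 0), (2, 0)}          # a period-2 oscillator
assert life_step(life_step(blinker)) == blinker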
Apparently, the syntax (and semantics) of cellular automata is very different from the notions of computation we have seen so far. This makes it also quite difficult to program a specific desired behaviour, and restricts applicability. Nevertheless, cellular automata are fascinating, for two reasons: first of all, it is astounding that complex behaviour emerges from very simple rules. Secondly, the computation rules of cellular automata are local and decentralised. There is no central control like in any of our other notions of computation, where the semantics always allowed the program to access the entire memory of a store or tape. In cellular automata, rules act locally on neighbourhoods of cells.
17John Conway (26 December 1937 – 11 April 2020) was a British mathematician who worked in the areas of finite groups, number theory, combinatorial game theory and others.
Applications of cellular automata can be found in simulations of dynamical systems in physics (e.g. flow systems, gas behaviour, growth of crystals or snowflakes) and other natural systems, as well as in image processing (for instance to express blurring). Cellular automata can also be used to produce natural (self-similar) patterns for computer-generated art (images and music).
Game of Life is just one particular instance of a (two-dimensional) cellular automaton. One can, of course, imagine a different rule for cells to develop and one could also imagine one-dimensional grids and more than just two states for each cell. We won’t discuss those generalisations in this module, nor do we now give a formal definition of Game of Life (but we easily could).
Despite the simplicity of the above rules for Game of Life, some “complex” behaviour may evolve (in the shape of patterns). A few simple patterns are depicted in the lecture slides (Lecture 11). There are oscillators like “blinker”, which changes in intervals of even and odd time ticks18 or “stills” like “block”, which do not change at all over time unless their local neighbourhood changes. Some configurations disappear over time.
Probably the most fascinating features of Life, however, are the moving patterns, also called "spaceships". An example is the "glider", which moves one cell diagonally every four time ticks. Even more amazing and famous are patterns that create such gliders at regular intervals: the "Gosper glider gun" "shoots" a new glider every 30 time ticks.
There is an ongoing discussion about whether cellular automata are more suitable than the more traditional notions of computation for explaining computational phenomena in nature, but this discussion is beyond the scope of this module.
11.3 Equivalence of Models
Now that we have introduced a number of different notions of computation, we can use them to give evidence for the Church-Turing thesis. We show that these formally defined models of computation are all equivalent. How do we do that? Well, we must show that whenever we can decide a problem with a program (machine) of one kind, we can also do it with a program (machine) of the other kind. In other words, we need to be able to translate programs between the languages, preserving the semantics of the program in question. But, hang on a second, this is exactly what we said a compiler does (see Chapter 6). Since we do not want to write a compiler for every pair of the computational models on show, and since we know that we can compose compilers, we sometimes use circular compilation instead. For instance, we can show that we can compile CM to SRAM to RAM to TM to GOTO to CM. Since we can compose compilers, we can thus compile CM to GOTO and also GOTO to CM, for instance.
The diagram in the lecture notes (Lecture 11) gives an overview of the compilers needed. Where we use L ⊂ L′, this indicates that the language L is already a subset of L′ and thus no compilation is actually needed19.
18Other patterns may need a longer interval to repeat their original state.
19In other words, the compilation is the identity function.
11.3.1 Equivalence of WHILE, WH1LE, GOTO, TM, RAM, SRAM, CM, 2CM, and CA
Below we give short explanations of the compilation processes involved. For more information consult Chapter 8 of Neil Jones’s book (or exercises).
• WHILE → WH1LE (list of variables as one variable):
The idea of this compiler has been mentioned earlier. We store the values of all the variables of the source program in a list that is kept in the one allowed variable of the target program. One has to produce extra code that accesses the right element of the list, but this is clearly possible (see the sketch below).
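A hypothetical Python sketch of that access code, with D-lists represented as nested pairs and None playing the role of nil (the names lookup and update are mine):

NIL = None                       # plays the role of nil

def lookup(vl, i):
    """hd (tl^i vl): walk i tails, then take the head."""
    for _ in range(i):
        _, vl = vl               # tl
    head, _ = vl                 # hd
    return head

def update(vl, i, value):
    """Rebuild vl with the i-th element replaced by value."""
    head, tail = vl
    if i == 0:
        return (value, tail)
    return (head, update(tail, i - 1, value))

store = (1, (2, (3, NIL)))       # store for X0 = 1, X1 = 2, X2 = 3
assert lookup(store, 1) == 2
assert lookup(update(store, 1, 42), 1) == 42

In WH1LE itself these operations become chains of hd and tl (and cons to rebuild), which is exactly the "extra code" referred to above.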
• WHILE → GOTO
The main issue is to compile away the complex tree expressions that can appear in WHILE, and the while loop itself. The while loop can clearly be compiled into conditional jumps (this is a classic compiler exercise). Complex expressions need to be "split" into several individual commands that manipulate only variables (as provided by GOTO); a sketch of this splitting follows below. For instance, X := cons hd nil tl X is compiled into Z := nil; Y := hd Z; U := tl X; X := cons Y U. Any (potentially nested) while loop can be compiled away using several jumps.
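The splitting of expressions can be sketched as a tiny Python function (hypothetical; expression trees are tuples, variables and nil are strings):

import itertools

fresh = (f"T{i}" for i in itertools.count())   # supply of new variable names

def flatten(expr, code):
    # Returns the variable holding the value of `expr`, appending
    # single-operator GOTO-style assignments to `code` on the way.
    if isinstance(expr, str):                  # a variable or nil
        return expr
    op, *args = expr
    arg_vars = [flatten(a, code) for a in args]
    t = next(fresh)
    code.append(f"{t} := {op} {' '.join(arg_vars)}")
    return t

code = []
code.append(f"X := {flatten(('cons', ('hd', 'nil'), ('tl', 'X')), code)}")
# code is now:
#   ['T0 := hd nil', 'T1 := tl X', 'T2 := cons T0 T1', 'X := T2']

This produces the same shape of target code as the example above, only with generated temporary names instead of Z, Y, U.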
• GOTO → WHILE (translation of flowcharts according to Böhm-Jacopini [?]): The idea is to model the program counter (as seen in the semantics of GOTO) explicitly in a variable. The WHILE-program then needs only one loop, which checks whether the current instruction counter is still "within the program". If it is, the instruction with the corresponding label is executed. Instructions change the program counter, and this is how jumps are executed. All other GOTO instructions are actually also WHILE-instructions.
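A hypothetical Python rendering of that single loop (with numbers instead of trees in D for brevity; the instruction formats are mine):

def run_goto(program, store):
    pc = 1
    while 1 <= pc <= len(program):       # counter still "within the program"?
        kind, *args = program[pc]
        if kind == "assign":             # ("assign", var, update_function)
            var, f = args
            store[var] = f(store)
            pc += 1
        elif kind == "goto":             # ("goto", label)
            (pc,) = args
        elif kind == "if0":              # ("if0", var, l_then, l_else):
            var, l1, l2 = args           # jump to l_then if var = 0
            pc = l1 if store[var] == 0 else l2
    return store

# doubling X into Y by repeated decrement:
prog = {1: ("if0", "X", 5, 2),
        2: ("assign", "Y", lambda s: s["Y"] + 2),
        3: ("assign", "X", lambda s: s["X"] - 1),
        4: ("goto", 1)}
assert run_goto(prog, {"X": 3, "Y": 0})["Y"] == 6

The while loop here corresponds to the single WHILE loop of the translated program; the dictionary dispatch corresponds to a case analysis on the labelled instructions.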
• TM → GOTO (tape as two lists):
As Turing machines use words over a tape alphabet, this requires a change of data representation. But tape symbols can easily be encoded in D, and words are then simply lists of such encoded symbols. The tape of the machine is represented as two lists. The first element of the second list is the currently scanned symbol on the tape. The first list contains the symbols left of the head in reversed order: its first element is the content of the cell immediately left of the head, its second element the content of the cell two positions left of the head, and so on. Clearly, each instruction of a Turing machine can then be simulated by one or several GOTO instructions; a sketch of the tape representation follows.
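A small Python sketch of the two-list representation (hypothetical helper names; "B" stands for the blank symbol, and the tape is blank beyond both lists):

def scan(left, right):
    return right[0] if right else "B"

def write(left, right, sym):
    return left, [sym] + (right[1:] if right else [])

def move_right(left, right):
    # the scanned symbol moves onto the (reversed) left list
    return [scan(left, right)] + left, (right[1:] if right else [])

def move_left(left, right):
    if not left:                       # fell off the left end: pull in a blank
        return [], ["B"] + right
    return left[1:], [left[0]] + right

left, right = [], ["1", "0", "1"]      # head on the first symbol of "101"
left, right = move_right(left, right)
assert scan(left, right) == "0" and left == ["1"]

Each of these operations is a constant amount of list manipulation, which is why every TM instruction compiles to only a few GOTO instructions.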
• GOTO → CM (tupling functions)
The main idea here is to encode each GOTO variable by a CM counter. This requires a change of data representation, as GOTO uses datatype D whereas CM works with natural numbers N. To this end, one defines a
coding function c : D → N as follows:
c(nil) = 0        c(⟨d1.d2⟩) = 1 + 2^{c(d1)} × 3^{c(d2)}
This encoding is appropriate since for each number there is clearly at most one tree with that encoding (thanks to unique factorisation and the use of the prime numbers 2 and 3).
One can simulate GOTO instructions by several CM instructions that use the above encoding. One also needs to show that operations on numbers like

m_c(x) = max{ y | there is a z such that x = c^y × z }

are CM-computable. This is necessary to implement the tree operations on codes (using c = 2 and c = 3); a sketch follows.
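A hypothetical Python sketch of the coding c and of m_c, and how the latter yields hd and tl on codes (trees are nested tuples, with () standing for nil; hd_code/tl_code assume their argument encodes a pair, i.e. is at least 2):

def c(d):
    if d == ():                    # nil
        return 0
    d1, d2 = d
    return 1 + 2 ** c(d1) * 3 ** c(d2)

def m(x, base):
    """Largest y with base^y dividing x (assumes x >= 1)."""
    y = 0
    while x % base == 0:
        x //= base
        y += 1
    return y

def hd_code(x):                    # c(hd d) from x = c(d), x encoding a pair
    return m(x - 1, 2)

def tl_code(x):
    return m(x - 1, 3)

t = ((), ((), ()))                 # the tree <nil.<nil.nil>>
assert hd_code(c(t)) == c(())      # hd t = nil
assert tl_code(c(t)) == c(((), ()))

Of course, in the actual compilation the arithmetic inside m must itself be realised by CM increments, decrements and zero-tests, which is where most of the work lies.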
• RAM → TM (registers as (address,content) on two tapes)
The idea here is relatively simple: each register is encoded as a pair of numbers, its address and its contents. These pairs are then stored on the Turing machine tapes, with numbers written in binary representation. It is best to use 4 tapes for the simulation: one tape for the addresses, one for the contents, one as "accumulator", and one for intermediate results (scratch). But one can in principle do the same with a single tape. A sketch of the register lookup follows.
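The lookup can be sketched in Python, with the two tapes abstracted as parallel lists (hypothetical names; the real construction scans the address tape in binary):

def load(addresses, contents, a):
    # Scan the address "tape" for register a; unwritten registers hold 0.
    for i, addr in enumerate(addresses):
        if addr == a:
            return contents[i]
    return 0

def store(addresses, contents, a, v):
    for i, addr in enumerate(addresses):
        if addr == a:
            contents[i] = v
            return
    addresses.append(a)            # first write to this register
    contents.append(v)

addresses, contents = [], []
store(addresses, contents, 7, 42)
assert load(addresses, contents, 7) == 42 and load(addresses, contents, 0) == 0

The linear scan mirrors how the TM must move its heads along the address and contents tapes in step to find a register.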
• CM → 2CM (pairing):
The idea is to use one of the two counters to store the values of all the counters of the original CM program. We use prime numbers to encode all their values in one number (see also GOTO → CM). So if the original program has counters X0, X1, ..., Xk, and at any given point in the execution they have values x0, x1, ..., xk, respectively, then the first of the two counters of the 2CM program, let's call it Y, will contain the following:
Y = 2^{x0} × 3^{x1} × ··· × h^{xk}
where h is the (k+1)-th prime number. Now any instruction of the original program can be compiled into one or several instructions of the new 2CM program. Of course, initially the input d must be encoded as
Readin(d) = {Y: 2^d × 3^0 × 5^0 × ··· × h^0, X: 0} = {Y: 2^d, X: 0}
and before the program returns, the result must be computed back from the two counters as follows:

Readout(σ) = x   if σ(Y) = 2^x × z for some z not divisible by 2

i.e. the result is the exponent of 2 in σ(Y). A sketch of this encoding follows.
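A hypothetical Python sketch of the prime encoding (it deliberately ignores how the 2CM realises multiplication and division with only two counters and increment/decrement; PRIMES is fixed for up to six simulated counters):

PRIMES = [2, 3, 5, 7, 11, 13]           # one prime per simulated counter

def encode(xs):
    """Pack counter values xs = [x0, ..., xk] into one number Y."""
    y = 1
    for p, x in zip(PRIMES, xs):
        y *= p ** x
    return y

def inc(y, i):                          # Xi := Xi + 1  ~  Y := Y * p_i
    return y * PRIMES[i]

def dec(y, i):                          # Xi := Xi - 1 (monus)
    return y // PRIMES[i] if y % PRIMES[i] == 0 else y

def is_zero(y, i):                      # test Xi = 0
    return y % PRIMES[i] != 0

def readout(y):                         # result = exponent of 2 in Y
    x = 0
    while y % 2 == 0:
        y //= 2
        x += 1
    return x

y = encode([3, 1])                      # X0 = 3, X1 = 1  =>  Y = 2^3 * 3 = 24
y = inc(y, 1)                           # X1 := X1 + 1    =>  Y = 72
assert not is_zero(y, 0) and readout(y) == 3

The second counter of the 2CM serves as scratch space for exactly these multiplications, divisions and divisibility tests.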
• TM → CA:
This has already been done by von Neumann and others, and more efficient translations have been suggested since. One can, however, go further and wonder whether the single, particular two-dimensional
CA Game of Life can compute whatever a Turing machine can. The latter requires the TM to be encoded in the seed of this particular CA, proving the universality of the cellular automaton, which is much harder than compiling a given TM into some purpose-built CA. Already in 1974, Wainwright observed that there are "Life patterns that can imitate computers." He showed how to control the spatial location of gliders such that glider streams can be used as signals, how to duplicate them, and how to implement logic gates20 for those signals. In 1982, Elwyn R. Berlekamp, John H. Conway, and Richard K. Guy proved that Game of Life is Turing-complete, i.e. that it can simulate Turing machines.
Much later a concrete universal register machine (a slightly different variant of the RAM we used) was implemented in Life by Paul Chapman in 2002. Moreover, a fully universal Turing machine was created21 by Paul Rendell. Since the rules of the game are fixed, in order to create register machines or Turing machines one has to find the right "seed pattern", which is extremely complex. To obtain and manage such complex behaviour, one uses several layers of abstraction, first implementing logic gates, registers, etc. Needless to say, these constructions are very involved, and their execution is extremely slow. Chapman, for instance, writes about his register machine: "Version 0 of the URM is large (268,096 live cells in an area of 4,558 × 21,469)". Rendell reports that running the universal Turing machine on a TM with 6 instructions and two symbols that doubles the string "101010" needed 116 million Life generations to complete. Therefore, the universal machines in Life discussed above are not usable to run any reasonable register machine or Turing machine program, respectively.
• CA → TM:
This direction is relatively straightforward, if tedious, as one "simply" has to implement the semantics of the cellular automaton. One can use the tape of the TM to simulate the cell space of the cellular automaton. Of course, all local updates have to be simulated sequentially on the TM, which is not at all efficient. It is also important that the initial configuration (seed) is finite.
It turns out that even a one-dimensional cellular automaton can simulate a universal Turing machine, as conjectured by Stephen Wolfram (a renowned advocate of cellular automata) and later proved by Matthew Cook.
Consequently, cellular automata are no more and no less powerful than any other notion of computation discussed here. Please note that, as before, we ignore complexity concerns and focus purely on computability.
20OR, AND, and NOT gates
21see also: http://www.youtube.com/watch?v=My8AsV7bA94