Lecture Notes #2.
0.1 A Theory of Computability
Computability is the part of logic and theoretical computer science that gives
a mathematically precise formulation
to the concepts algorithm, mechanical procedure, and calculable/computable function.
The advent of computability was strongly motivated, in the 1930s, by
Hilbert’s program to found mathematics on a (metamathematically provably) consistent (i.e., free from contradiction) axiomatic basis . . .
. . . in particular by his belief that the Entscheidungsproblem,
or decision problem, for axiomatic theories,
that is, the problem “is this formula a theorem of that theory?” was solvable by a mechanical procedure that was yet to be discovered.
So what IS a “mechanical procedure”?
Now, since antiquity, mathematicians have invented “mechanical procedures”, e.g., Euclid’s algorithm for the “greatest common divisor”,1 and had no problem recognizing such procedures when they encountered them.
But how do you mathematically prove the nonexistence of such a mechanical procedure for a particular problem?
You need a mathematical formulation of what is a “mechanical procedure” in order to do that!
1That is, the largest positive integer that is a common divisor of two given integers.
0.1.1 A Programming Framework for Computable Functions
So, what is a computable function, mathematically speaking? There are two main ways to approach this question.
1. One is to define a programming formalism—that is, a programming language—and say: “a function is computable precisely if it can be ‘programmed’ in this programming language”.
Examples of such programming languages are
• the Turing Machines (or TMs) of Turing, and
• the unbounded register machines (or URMs) of Shepherdson and Sturgis.
Note that the term machine in each case is a misnomer, as both the TM and the URM formulations are really programming languages:
a TM is very much like the assembly language of “real” computers,
while a URM reminds us more of (subsets of) Algol (or Pascal).
2. The other main way is to define a set of computable functions directly—without using a programming language as the agent of definition:
How? By a device that resembles a mathematical proof, called a derivation.
In this approach we say a “function is computable precisely if it is derivable”.
Either way, a computable function is generated by a finite device (program, or derivation).
In the by-derivation approach we start by accepting some set of initial functions I that are immediately recognizable as “intuitively computable”, and choose a set O of function-building operations that preserve the “computable” property.
We now embark on defining the high level programming language URM. The alphabet of the language is
←, +, ∸, :, X, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, if, else, goto, stop        (1)
Just like any other high level programming language, URM manipulates the contents of variables.
[SS63] called the variables “registers”.
1) These variables are restricted to be of natural number type.
2) Since this programming language is for theoretical analysis only—rather than practical implementation—every variable is allowed to hold any natural number whatsoever, without any limitation on its size, hence the “UR” in the language name (“unbounded register”).
3) The syntax of the variables is simple: A variable (name) is a string that starts with X and continues with one or more 1s:
URM variable set: X1, X11, X111, X1111, . . . (2)
4) Nevertheless, as is customary for the sake of convenience, we will utilize the bold face lower case letters x,y,z,u,v,w, with or without subscripts or primes as metavariables in most of our discussions of the URM, and in examples of specific programs (where yet more, convenient metanotations for variables may be employed).
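As a quick sanity check of the syntax (2), here is a tiny Python sketch (purely illustrative; the function name is ours, not part of the URM formalism) that recognizes legal URM variable names:

```python
import re

# A URM variable is the letter X followed by one or more 1s; cf. (2).
_URM_VAR = re.compile(r"X1+\Z")

def is_urm_variable(s: str) -> bool:
    """Return True iff s is a legal URM variable name."""
    return _URM_VAR.match(s) is not None

assert is_urm_variable("X1") and is_urm_variable("X111")
assert not is_urm_variable("X") and not is_urm_variable("x11") and not is_urm_variable("X12")
```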
0.1.1.1 Definition. (URM Programs) A URM program is a finite (ordered) sequence of instructions (or commands) of the following five types:
L : x ← a
L : x ← x + 1
L : x ← x ∸ 1        (3)
L : stop
L : if x = 0 goto M else goto R
where L,M,R,a, written in decimal notation, are in N, and x is some variable.
We call instructions of the last type if-statements.
This command is syntactically illegal (meaningless) if either M or R exceeds the label of the program’s stop instruction.
Each instruction in a URM program must be numbered by its position number, L, in the program, where “:” separates the position number from the instruction.
We call these numbers labels. Thus, the label of the first instruction MUST BE “1”.
The instruction stop must occur only once in a program, as the last instruction.
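The well-formedness conditions above (a single stop, placed last; if-targets not exceeding the label of stop) are easy to check mechanically. Here is a minimal Python sketch under a hypothetical tuple encoding of instructions that we choose only for illustration (labels are the positions 1, 2, 3, . . . and so need not be stored):

```python
# Hypothetical encoding of URM instructions as tuples, e.g.
#   ("assign", "X1", 7)     for  x <- 7
#   ("inc", "X1")           for  x <- x + 1
#   ("dec", "X1")           for  x <- x ∸ 1
#   ("if", "X1", 4, 2)      for  if x = 0 goto 4 else goto 2
#   ("stop",)               for  stop
def is_well_formed(prog) -> bool:
    """Check the label/stop/goto conditions stated above
    (labels are implicit: instruction i has label i + 1)."""
    if not prog or prog[-1] != ("stop",):
        return False                      # stop must be the last instruction
    if any(ins == ("stop",) for ins in prog[:-1]):
        return False                      # ...and must occur only once
    last = len(prog)                      # label of the stop instruction
    for ins in prog:
        if ins[0] == "if":
            _, _, m, r = ins
            if not (1 <= m <= last and 1 <= r <= last):
                return False              # goto targets must not exceed the stop label
    return True

assert is_well_formed([("inc", "X1"), ("stop",)])
assert not is_well_formed([("if", "X1", 5, 1), ("stop",)])   # target 5 exceeds stop label 2
```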
The semantics of each command is given below.
0.1.1.2 Definition. (URM Instruction and Computation Semantics)
A URM computation is a sequence of actions caused by the execution of the instructions of the URM as detailed below.
Every computation begins with the instruction labeled “1” as the current instruction.
The semantic action of instructions of each type is defined if and only if they are current, and is as follows:
(i) L : x ← a. Action: The value of x becomes the (natural) number a. Instruction L + 1 will be the next current instruction.
(ii) L : x ← x+1. Action: This causes the value of x to increase by 1. The instruction labeled L + 1 will be the next current instruction.
(iii) L : x ← x ∸ 1. Action: This causes the value of x to decrease by 1, if it was originally nonzero. Otherwise it remains 0. The instruction labeled L + 1 will be the next current instruction.
(iv) L : stop. Action: No variable (referenced in the program) changes value. The next current instruction is still the one labeled L.
(v) L : if x = 0 goto M else goto R. Action: No variable (referenced in the program) changes value. The next current instruction is numbered M if x = 0; otherwise it is numbered R.
What is missing? Read/Write statements! We will come to that!
We say that a computation terminates, or halts, iff it ever makes current (as we say, “reaches”) the instruction stop.
Note that the semantics of “L : stop” appear to require the computation to continue for ever. . .
. . . but it does so in a trivial manner where no variable changes value, and the current instruction remains the same: Practically, the computation is over.
When discussing URM programs (or, as we just say, “URMs”) one usually gives them names like

M, N, P, Q, R, F, H, G.
NOTATION: We write ⃗xn for the sequence of variables x1, x2, . . . , xn. We write ⃗an for the sequence of values a1, a2, . . . , an.
It is normal to omit the n (length) from ⃗xn and ⃗an if it is understood or we don’t care, in which case we write ⃗x and ⃗a.
0.1.1.3 Definition. (URM As an Input/Output Agent) A computation by the URM M computes a function that we denote by

M^{⃗xn}_y

in this precise sense: The notation means that we chose and designated as input variables of M the following: x1, . . . , xn. It also indicates that we chose and designated one variable y as the output variable.
We now conclude the definition of the function M^{⃗xn}_y: For every choice we make for input values ⃗an from Nn,
Aside: “Nn” is borrowed from set theory. It is the cartesian product of n copies of N = {0,1,2,3,…}, that is, Nn is the set of all length-n sequences a1,a2,…,an where each ai is in N; a natural number.
(1) We initialize the computation of URM M, by doing two things:
(a) We initialize the input variables x1, . . . , xn with the input values a1, . . . , an.
We also initialize all other variables of M to be 0.
This is an implicit read action.
(b) We next make the instruction labeled “1” current, and thus start the computation.
So, the initialisation is NOT part of the computation!
(2) If the computation terminates, that is, if at some point the instruction stop becomes current, then the value of y at that point (and hence at any future point, by (iv) above) is the value of the function M^{⃗xn}_y for input ⃗an.
This is an implicit write action.
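To see Definitions 0.1.1.2 and 0.1.1.3 in action, here is a minimal Python sketch of an interpreter, using the hypothetical tuple encoding of the earlier sketch; it is only an illustration, not part of the URM formalism. Faithfully to the semantics, a computation that never reaches stop makes the Python loop below run forever.

```python
def urm_compute(prog, input_vars, output_var, args):
    """Compute M^{x1,...,xn}_y (⃗an): initialize the inputs, zero every other
    variable, start at label 1, and read y when stop becomes current."""
    # Implicit read action: inputs get the values ⃗an, all other variables get 0.
    env = {v: 0 for ins in prog for v in ins[1:2] if isinstance(v, str)}
    env.update(dict(zip(input_vars, args)))
    env.setdefault(output_var, 0)

    label = 1                                   # the instruction labeled 1 is current
    while True:
        ins = prog[label - 1]
        op = ins[0]
        if op == "stop":                        # the computation has terminated
            return env[output_var]              # implicit write action
        if op == "assign":
            _, x, a = ins; env[x] = a; label += 1
        elif op == "inc":
            _, x = ins; env[x] += 1; label += 1
        elif op == "dec":
            _, x = ins; env[x] = max(env[x] - 1, 0); label += 1
        elif op == "if":
            _, x, m, r = ins; label = m if env[x] == 0 else r

# Example: the program  1: x <- x + 1;  2: stop  with x as input and output variable.
succ_prog = [("inc", "X1"), ("stop",)]
assert urm_compute(succ_prog, ["X1"], "X1", [5]) == 6
```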
0.1.1.4 Definition. (Computable Functions) A function f : Nn → N of n variables x1, . . . , xn is called partial computable iff for some URM, M, we have f = M^{x1,...,xn}_y.
The set of all partial computable functions is denoted by P.
The set of all the total functions in P—that is, those that are defined on all inputs from N—is the set of computable functions and is denoted by R. The term recursive is used in the literature synonymously with the term computable.
We resume: Lecture Notes #3. Sept. 16
Saying COMPUTABLE or RECURSIVE without qualification implies the qual- ifier TOTAL.
It is OK to add TOTAL on occasion for EMPHASIS!!
“PARTIAL” means “might be total or nontotal”; we do not care, or we do not know.
BTW, you recall from MATH1019 that the symbol

f : Nn → N        (Nn is called the left field, N the right field)

simply states that f takes input values from N in each of its input variables and outputs —if it outputs anything for the given input!— a number from N. Note also the terminology (left field, right field) displayed above!
Probably your 1019 text called Nn and N above “domain” and “range”. FORGET THAT! What is the domain of f really? (in symbols dom(f))
dom(f) =Def {⃗an : (∃y)f(⃗an) = y}

that is, the set of all inputs that actually cause an output.
The range is the set of all possible outputs:

ran(f) =Def {y : (∃⃗an)f(⃗an) = y}

A function f : Nn → N is total iff dom(f) = Nn; it is nontotal iff dom(f) ⊊ Nn.
If ⃗an ∈ dom(f) we write simply f(⃗an) ↓. Either way, we say “f is defined at ⃗an”.
The opposite situation is denoted by f(⃗an) ↑ and we say that “f is undefined at ⃗an”. We can also say “f is divergent at ⃗an”.

• Example of a total function: the “x + y” function on the natural numbers.
• Example of a nontotal function: the “⌊x/y⌋” function on the natural numbers. All input pairs of the form “a, 0” fail to produce an output: ⌊a/0⌋ is undefined. All the other inputs work.
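A minimal Python sketch of the two examples just given, where (purely as a modeling convenience) an undefined output (↑) is represented by None:

```python
def add(x: int, y: int) -> int:
    """Total: defined on every pair in N x N."""
    return x + y

def quotient(x: int, y: int):
    """Nontotal: floor(x/y) is undefined when y = 0; the undefined case is modeled by None."""
    return None if y == 0 else x // y

# dom(quotient) = { (a, b) : b != 0 };  quotient(a, 0) "diverges" for every a.
assert quotient(7, 2) == 3
assert quotient(7, 0) is None
```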
0.1.1.5 Example. Let M be the program
1 : x ← x + 1
2 : stop

Then M^x_x is the function f given, for all x ∈ N, by f(x) = x + 1, the successor function.
0.1.1.6 Remark. (λ Notation) To avoid saying verbose things such as “M^x_x is the function f given, for all x ∈ N, by f(x) = x + 1”, we will often use Church’s λ-notation and write instead “M^x_x = λx.x + 1”.
In general, the notation “λ · · · .” marks the beginning of a sequence of input variables “· · · ” by the symbol “λ”, and the end of the sequence by the symbol “.” What comes after the period “.” is the “rule” that indicates how the output relates to the input.
The template for λ-notation thus is λ“input”.“output-rule”
Relating to the above example, we note that f = λx.x + 1 = λy.y + 1 is correct and we are saying that the two functions viewed as tables are the same.
Note that x,y, are “apparent” variables (“dummy” or bound) and are not free (for substitution).
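Python’s lambda is a convenient analogue of this notation (the names below are ours). The bound (“apparent”) variable does not matter: both lambdas below denote the same table of input/output pairs.

```python
successor = lambda x: x + 1       # the notes' lambda x . x + 1
also_successor = lambda y: y + 1  # same function, different bound variable

# Same table of values on (an initial segment of) N:
assert all(successor(n) == also_successor(n) for n in range(100))
```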
0.1.1.7 Example. Let M be the program

1 : x ← x ∸ 1
2 : stop

Then M^x_x is the function λx.x ∸ 1, the predecessor function.
The operation ∸ is called “proper subtraction” —some people pronounce it “monus”— and is in general defined by

x ∸ y = x − y   if x ≥ y
x ∸ y = 0       otherwise
It ensures that subtraction (as modified) does not take us out of the set of
the so-called number-theoretic functions, which are those with inputs from N and outputs in N.
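A one-line Python rendering of proper subtraction (calling it monus is our choice), which indeed never leaves N:

```python
def monus(x: int, y: int) -> int:
    """Proper subtraction: x - y if x >= y, else 0. Stays within the natural numbers."""
    return max(x - y, 0)

assert monus(5, 3) == 2
assert monus(3, 5) == 0   # ordinary subtraction would give -2, outside N
```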
Pause. Why are we restricting computability theory to number-theoretic functions? Surely, in practice we can compute with negative numbers, rational numbers, and with nonnumerical entities, such as graphs, trees, etc. Theory ought to reflect, and explain, our practices, no?
It does. Negative numbers and rational numbers can be coded by natural number pairs.
Computability of number-theoretic functions can handle such pairing (and unpairing or decoding).
Moreover, finite objects such as graphs, trees, and the like that we manipu- late via computers can be also coded (and decoded) by natural numbers.
After all, the internal representation of all data in computers is, at the lowest level, via natural numbers represented in binary notation.
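As one standard illustration of such coding, here is the Cantor pairing function and its decoder in Python; the names are ours, and this particular coding is shown only as an example of the idea, not as a construction used later in these notes.

```python
from math import isqrt

def pair(x: int, y: int) -> int:
    """Cantor pairing: a bijection from N x N onto N."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z: int) -> tuple:
    """Inverse of pair: recover (x, y) from z."""
    w = (isqrt(8 * z + 1) - 1) // 2      # largest w with w*(w+1)/2 <= z
    t = w * (w + 1) // 2
    y = z - t
    return (w - y, y)

assert all(unpair(pair(x, y)) == (x, y) for x in range(50) for y in range(50))
```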
Computers cannot handle infinite objects such as (irrational) real numbers.
But there is an extensive “higher type” computability theory (which orig- inated with the work of [Kle43]) that can handle such numbers as inputs and also compute with them.
However, this theory is way beyond our scope.
0.1.1.8 Example. Let M be the program

1 : x ← 0
2 : stop

Then M^x_x is the function λx.0, the zero function.
In Definition 0.1.1.4 we spoke of partial computable and total computable functions.
We retain the qualifiers partial and total for all number-theoretic functions, even for those that may not be computable.
Total vs. nontotal (no hyphen) has been defined with respect to a chosen and fixed left field for any function in computability.
The set union of all total and nontotal number-theoretic functions is the set
of all partial (number-theoretic) functions. Thus partial is not synonymous with nontotal.
0.1.1.9 Example. The unconditional goto instruction, namely, “L : goto L′”, can be simulated by “L : if x = 0 goto L′ else goto L′”.

0.1.1.10 Example. Let M be the program

1 : x ← 0
2 : goto 1
3 : stop
Then M^x_x is the empty function ∅, sometimes written as λx. ↑.
Thus the empty function is partial computable but nontotal. We have just established ∅ ∈ P − R.
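A Python analogue of the empty function, shown only as an illustration of λx. ↑: a procedure that diverges (never returns) on every input, so do not actually call it.

```python
def empty(x: int) -> int:
    """Computes the empty function: diverges (never returns) on every input."""
    while True:       # mirrors the URM loop  1: x <- 0;  2: goto 1
        pass
```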
0.1.1.11 Example. Let M be the program segment

k − 1 : x ← 0
k     : if z = 0 goto k + 4 else goto k + 1
k + 1 : z ← z ∸ 1
k + 2 : x ← x + 1
k + 3 : goto k
k + 4 : . . .
What it does:
By the time the computation reaches instruction k + 4, the program segment has set the value of z to 0, and has made the value of x equal to the value that z had when instruction k − 1 was current.
In short, the above sequence of instructions simulates the following sequence

L     : x ← z
L + 1 : z ← 0
L + 2 : . . .
where the semantics of L : x ← z are standard in programming:
They require that, upon execution of the instruction, the value of z is copied into x but the value of z remains unchanged.
0.1.1.12 Exercise. Write a program segment that simulates precisely L : x ← z; that is, copy the value of z into x without causing z to change as a side-effect.
We say that the “normal” assignment x ← z is nondestructive.
Because of Exercise 0.1.1.12 above, without loss of generality, one may assume that any input variable, x, of a URM M is read-only.
This means that its value is retained throughout any computation of the program.
Why “without loss of generality”? Because if x is not such, we can make it be!
Indeed, let’s add a new variable, x′, as an input variable instead of x.
Then, in detail, do this to make x′ read-only:
• Add at the very beginning of M the instruction 1 : x ← x′ of Exercise 0.1.1.12.
• Adjust all the following labels consistently, including, of course, the ones
referenced by if-statements—a tedious but straightforward task.
• Call M′ the so-obtained URM.
Clearly, M′^{x′,y1,…,yn}_z = M^{x,y1,…,yn}_z, and M′ does not change x′.
0.1.1.13 Example. (Composing Computable Functions)
Suppose that λx⃗y.f(x,⃗y) and λ⃗z.g(⃗z) are partial computable, and say

f = F^{x,⃗y}_u    while    g = G^{⃗z}_x
Lecture #4. Sept. 21
We assume without loss of generality that x is the only variable common to F and G. Thus, if we concatenate the programs G and
F in that order, and
1. remove the last instruction of G (k : stop, for some k) —call the program segment that results from this G′, and
2. renumber the instructions of F as k, k + 1, . . . (and, as a result, the references that if-statements of F make) in order to give G′F the correct program structure,
then λ⃗y⃗z.f(g(⃗z),⃗y) = (G′F)^{⃗y,⃗z}_u.
Note that all non-input variables of F will still hold 0 as soon as the execution of (G′F) makes the first instruction of F current for the first time.
This is because none of these can be changed by G′ under our assumption, thus ensuring that F works as designed.
Thus, we have, by repeating the above a finite number of times:
0.1.1.14 Proposition. If λ⃗yn.f(⃗yn) and λ⃗z.gi(⃗z), for i = 1, . . . , n, are partial computable, then so is λ⃗z.f(g1(⃗z), . . . , gn(⃗z)).
Note that

f(g1(⃗a), . . . , gn(⃗a)) ↑ if any gi(⃗a) ↑.

Else f(g1(⃗a), . . . , gn(⃗a)) ↓, provided f is defined on all the gi(⃗a).

For the record, we will define composition to mean the somewhat rigidly defined operation used in 0.1.1.14, that is:

0.1.1.15 Definition. Given any partial functions (computable or not) λ⃗yn.f(⃗yn) and λ⃗z.gi(⃗z), for i = 1, . . . , n, we say that λ⃗z.f(g1(⃗z), . . . , gn(⃗z)) is the result of their composition.

We characterized Definition 0.1.1.15 as “rigid”.
Indeed, note that it requires all the arguments of f to be substituted by some gi(⃗z)—unlike Example 0.1.1.13, where we substituted a function invocation (cf. terminology in 0.1.1.6) only in one variable of f, and did nothing with the variables ⃗y.
Also, for each call gi(. . .) the argument list, “. . .”, must be the same; in 0.1.1.15 it was ⃗z.
As we will show in examples later, this rigidity is only apparent.
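Here is a minimal Python sketch of the composition of Definition 0.1.1.15 (the helper name compose is ours), again modeling ↑ by None so that the strictness noted above is visible: if any gi diverges on the given input, so does the composition.

```python
def compose(f, gs):
    """Return the function  z |-> f(g1(z), ..., gn(z))  in the sense of 0.1.1.15.
    None stands for an undefined value; it propagates, so the result is
    undefined as soon as any gi is undefined on the given input."""
    def composed(*zs):
        vals = [g(*zs) for g in gs]
        if any(v is None for v in vals):
            return None          # some gi diverges, hence the composition diverges
        return f(*vals)
    return composed

# Example: h(x, y) = floor(x/y) + x*y, undefined exactly when y = 0.
quotient = lambda x, y: None if y == 0 else x // y
add = lambda x, y: x + y
h = compose(add, [quotient, lambda x, y: x * y])
assert h(7, 2) == 17
assert h(7, 0) is None
```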
We can rephrase 0.1.1.14 by saying simply that:

0.1.1.16 Theorem. P is closed under composition.

0.1.1.17 Corollary. R is closed under composition.

Proof. Let f, gi be in R.
Then they are in P, hence so is h = λ⃗y.f(g1(⃗y), . . . , gm(⃗y)) by 0.1.1.16.
By assumption, the f, gi are total. So, for any ⃗y, we have gi(⃗y) ↓ —a number. Hence also f(g1(⃗y), . . . , gm(⃗y)) ↓.
That is, h is total, hence, being in P, it is also in R.
Composing a number of times that depends on the value of an input variable—or as we may say, a variable number of times—is iteration. The general case of iteration is called primitive recursion.
0.1.1.18 Definition. (Primitive Recursion) A number-theoretic function f is defined by primitive recursion from given functions λ⃗y.h(⃗y) and λx⃗yz.g(x, ⃗y, z) provided, for all x,⃗y, its values are given by the two equations below:
f(0, ⃗y) = h(⃗y)
f(x + 1, ⃗y) = g(x, ⃗y, f(x, ⃗y))
h is the basis function, while g is the iterator.
We can take for granted a fundamental (but difficult) result (see EECS 1028, W20, course notes), that a unique f that satisfies the above schema exists.
Moreover, if both h and g are total, then so is f as it can easily be shown by induction on x (Later).
It will be useful to use the notation f = prim(h, g) to indicate in shorthand that f is defined as above from h and g (note the order).
Note that
f (1, ⃗y) = g(0, ⃗y, h(⃗y)),
f (2, ⃗y) = g(1, ⃗y, g(0, ⃗y, h(⃗y))),
f (3, ⃗y) = g(2, ⃗y, g(1, ⃗y, g(0, ⃗y, h(⃗y)))), etc.
Thus the “x-value”, 0, 1, 2, 3, etc., equals the number of times we compose g with itself (i.e., the number of times we iterate g).
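A minimal Python sketch of the operation prim(h, g) for given (assumed total) Python callables h and g; the names are ours. It iterates g exactly x times, matching the expansions just displayed:

```python
def prim(h, g):
    """Return the f with  f(0, *y) = h(*y)  and  f(x+1, *y) = g(x, *y, f(x, *y))."""
    def f(x, *y):
        z = h(*y)                      # basis
        for i in range(x):             # iterate g exactly x times
            z = g(i, *y, z)
        return z
    return f

# Example (chosen here only for illustration): addition by primitive recursion,
#   add(0, y) = y;  add(x+1, y) = add(x, y) + 1
add = prim(lambda y: y, lambda i, y, z: z + 1)
assert add(3, 4) == 7
```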
With a little programming experience, it is easy to see that f(x,⃗y) of 0.1.1.18 is computed by the pseudo code below:
1 : z ← h(⃗y)
2 : for i = 0 to x − 1
3 :     z ← g(i, ⃗y, z)
At the end of the loop, z holds f(x,⃗y).
Here is how to implement the above as a URM:
0.1.1.19 Example. (Iterating Computable Functions)
Suppose that λx⃗yz.g(x,⃗y,z) and λ⃗y.h(⃗y) are partial computable, and, say, g = G^{i,⃗y,z}_z while h = H^{⃗y}_z.
By earlier remarks we may assume:
(i) The only variables that H and G have in common are z, ⃗y.
(ii) The variables ⃗y are read-only in both H and G.
(iii) i is read-only in G.
(iv) x does not occur in any of H or G.
We can now see that the following URM program, let us call it F, computes f defined as in 0.1.1.18 from h and g, where H′ is program H^{⃗y}_z with the stop instruction removed, and G′ is program G^{i,⃗y,z}_z with the stop instruction removed and instructions renumbered (and if-statements adjusted) as needed:

1 . . . r − 1      : H′   (computes z ← h(⃗y))
r                  : i ← 0
r + 1              : if x = 0 goto k + m + 2 else goto r + 2
r + 2              : x ← x ∸ 1
r + 3 . . . k − 1  : G′   (computes z ← g(i, ⃗y, z))
k                  : i ← i + 1
k + 1              : w1 ← 0
    ⋮
k + m              : wm ← 0
k + m + 1          : goto r + 1
k + m + 2          : stop

The instructions wi ← 0 set explicitly to zero all the variables of G′ other than i, z, ⃗y to ensure correct behavior of G′. Note that the wi are implicitly initialized to zero only the first time G′ is executed. Clearly, the URM F simulates the pseudo program above, thus f = F^{x,⃗y}_z.
Lecture #5 (Sept. 23)
We have at once:
0.1.1.20 Proposition. If f,g,h relate as in Definition 0.1.1.18 and h and g are in P, then so is f. We say that P is closed under primitive recursion.
0.1.1.21 Corollary. If f,g,h relate as in Definition 0.1.1.18 and h and g are in R, then so is f. We say that R is closed under primitive recursion.
Proof. As R ⊆ P, we have f ∈ P.
But we saw that if h and g are total, then so is f.
So, f ∈ R.
What does the following pseudo program do, if g = G^{x,⃗y}_z (say) for some URM G?
1 : x ← 0
2 : while g(x,⃗y) ≠ 0 do        (1)
3 :     x ← x + 1
We are out here (exited the while-loop) precisely because
• Testing for g(x,⃗y) ≠ 0 never got stuck due to calling g with some x = m that makes g(m,⃗y) ↑.
• The loop kicked us out exactly when g(k,⃗y) = 0 was detected, for some k, for the first time, in the while-test.
In short, the k satisfies
k = smallest such that g(k,⃗y) = 0 ∧ (∀z < k)g(z,⃗y) ↓
Now, this k depends on ⃗y so we may define it as a function f, for all INPUTS ⃗a in ⃗y, by:

k = f(⃗a) =Def min{x : g(x,⃗a) = 0 ∧ (∀y < x) g(y,⃗a) ↓}
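A minimal Python sketch of this search (the name unbounded_search is ours): it tries x = 0, 1, 2, . . . in turn and, just like the while-loop pseudo program above, returns the least zero of g if the search succeeds and runs forever otherwise (or if some earlier call of g diverges).

```python
def unbounded_search(g, *a):
    """Return the least x with g(x, *a) == 0, calling g on 0, 1, 2, ... in turn.
    Loops forever if no such x exists (and hangs if some earlier call of g hangs),
    exactly like the while-loop pseudo program above."""
    x = 0
    while g(x, *a) != 0:
        x += 1
    return x

# Example: the least x with 2*x >= a, found by searching for a zero of a test function.
geq = lambda x, a: 0 if x + x >= a else 1
assert unbounded_search(geq, 7) == 4   # 4 is the least x with 2*x >= 7
```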