A Subset of the URM Language; FA and NFA
This Note turns to a special case of the URM program- ming language that we call Finite Automata, for short FA.
This part presents almost a balance of How To and Limitations of Computing topics.
Main feature of the latter will be the so-called “Pump- ing Lemma”.
Intro to Automata© 2020, by George Tourlakis
2
0.1. The FA
The FA (programming language)† is introduced infor-
mally as a modified and restricted URM.
This new URM model will have explicit “read” in-
structions.∗
Secondly, any specific URM under this model will ONLY
have ONE variable that we may call generically “x”.
This variable will always be of type single-digit; it can- not hold arbitrary integers, rather it can only hold single digits as values.
†Note that some texts look at it as a “machine”, hence the terminology “automaton”.
∗In Notes #2 we explained why explicit read instructions are theoretically as redundant as explicit write instructions are.
Intro to Automata© 2020, by George Tourlakis
0.1. The FA 3
The FA has no instructions —other than “read”— compared to the FULL URM, except for a simplified if- goto instruction.
In the absence of a stop instruction, how does a compu- tation halt?
We postulate that our modified URMs halt simply by reading something that does not belong, that is, it saw in the input stream an object that is not a member of the input alphabet of permissible digits.
Such an “illegal” symbol serves as an end-marker of the useful stream digits that constitute the input string over the given alphabet. As such it is often called an “end-of-file” marker, for short, eof.
This eof -marker is any “illegal” symbol, that is, a sym-
bol not in the particular FA’s INPUT ALPHABET.
Thus the modified URM halts if IFF it runs out of input, as this is signaled by it reading something NOT in its input alphabet.
Intro to Automata© 2020, by George Tourlakis
4
Our insistence on a URM-like model for the automaton will be confined in this brief motivational introduction and is only meant to illustrate the indebtedness of the fi- nite automata model to the general URM model of Notes #2, as promised above.
Intro to Automata© 2020, by George Tourlakis
0.1. The FA 5 The FA has, for each label L, a group of instructions
as follows.
The typical group-instruction of an automaton.
read
if x = a then goto M′
if x = a′ then goto M′′ L: .
if x = a(n) then goto M(n) if x = eof then halt
where L and M′,…,M(n) are labels —not necessarily distinct— and a, a′, . . . , a(n) are all the possible digit val- ues in the context of a specific URM program, that is, {a, a′, . . . , a(n)} is the input alphabet.
The empty string, λ, will never be part of a FA’s input alphabet.
Intro to Automata© 2020, by George Tourlakis
6
For any particular FA (program) —a particular FA, as we say (omitting “program”)— labels, in practice, are not restricted to be numerical nor even to be consecutive (if numerical).
However, one instruction’s placement is significant.
It is often identified by a label such as “0”, or “q0”, or some such symbol and is placed at the very begin- ning of the program.
This instruction’s label is called the initial state of the specific automaton. Indeed, all labels in an automa- ton are called states in the literature.
Pause. A finite automaton does not care about the order of its other instructions, since they will be reach- able by the goto-structure as needed wherever they are.
Intro to Automata© 2020, by George Tourlakis
0.1. The FA 7 The semantics of the “typical” instruction above is:
• Read into the variable x the first unread digit-value from some “external (to the FA) input stream” that is waiting to be read.
• Then move to the next instruction as is determined by the a(i)s (or the eof ) in the if-cases above (p.5).
Intro to Automata© 2020, by George Tourlakis
8
In order to have the FA make a decision about the input string it just read, we (this is part of the design of the particular FA program) partition the instruction- labels of any given FA into two types: accepting and rejecting.
Their role is as follows: Such an FA, when it has halted,
Pause. When or if ?
will have finished scanning a sequence of digits —a string
over its alphabet.
This string is accepted if the program halted while in an accepting state, otherwise the input is rejected.
0.1.1 Definition. (The Language of an FA)
The language decided by a FA M is called in the liter- ature “the Language accepted by M”. It is, of course,
Def
L(M) = {x : x is accepted by automaton M}
Intro to Automata© 2020, by George Tourlakis
0.1. The FA 9
Since an FA cannot “write”, i.e., cannot change the con- tents of x —since it does not have any of the instructions x←c,x←x+1,x←x−. 1—weneedthetypeof state to “code” the yes/no (accept/reject) answer.
Intro to Automata© 2020, by George Tourlakis
10
0.2. Deterministic Finite Automata and their Languages
0.2.1 Example. Consider the FA below that operates over the input alphabet {0, 1}
read
if x = 0 then goto 0
0:
if x = 1 then goto 1
if x = eof then halt read
if x = 0 then goto 1 1:
if x = 1 then goto 0 if x = eof then halt
What does this program do? Once we have the graph model, we will elaborate on what the above automaton actually does. LATER!
In particular we will look into two cases: • When only state 0 is accepting.
• When only state 1 is accepting.
Intro to Automata© 2020, by George Tourlakis
0.2. Deterministic Finite Automata and their Languages 11 0.2.1. FA as Flow-Diagrams
Moving away from the URM-like programming language for automata, we next consider a “flow chart” or “flow di- agram” formalisation. This is achieved by first abstract- ing an instruction
L: read;ifx=athengotoM (1) as the configuration below:
a
Figure capturing (1) above
Thus the “read” part is implicit, while the labeled ar- row that connects the states L and M denotes exactly the semantics of (1).
L
Intro to Automata© 2020, by George Tourlakis
12
Therefore, an entire automaton can be viewed as a di- rected graph —that is, a finite set of (possibly) labeled circles, the states, and a finite set of arrows, the transi- tions, the latter labeled by members of the automaton’s input alphabet.
Intro to Automata© 2020, by George Tourlakis
0.2. Deterministic Finite Automata and their Languages 13
An arrow label a in the figure above represents “if x = a then goto M”. The arrows or edges interconnect the states. If L = M, then we have the configuration
a
L= M
where the optional label could be L, or M, or L = M (as above), or nothing.
We depict the partition of states into accepting and re- jecting by using two concentric circles for each accepting state as below.
The special start state is denoted by drawing an arrow, that comes from nowhere, pointing to the state.
Intro to Automata© 2020, by George Tourlakis
14
0.2.2 Definition. (FA as Flow Diagrams) A finite au- tomaton, for short, FA, over the FINITE input alphabet Σ is a finite directed graph of circular nodes —the states— and interconnecting edges —the transitions— the latter labeled by members of Σ.
We impose a restriction to the automaton’s structure:
For every state L and every a ∈ Σ, there will be precisely one edge, labeled a, leaving L and pointing to some state M (possibly, L = M).
We say the automaton is fully specified (corresponding to the italics in the part “For every state L and every a ∈ Σ, there will be . . . ”) and deterministic (correspond- ing to the italics in the part “there will be precisely one edge, . . . ”).
This graph depiction of a FA is called its flow diagram
To summarise and firm up:
and is akin to a programming “flow chart”.
Intro to Automata© 2020, by George Tourlakis
0.2. Deterministic Finite Automata and their Languages 15
0.2.3 Remark. (1) Thus, full specification makes the transition function total —that is, for any state-input pair (L, a) as argument, it will yield some state as “out- put”.
On the other hand, determinism ensures that the tran- sition function is indeed a function (single-valued).
(2) On Digits. Each “legal” input symbol is a member of the alphabet Σ, and vice versa. In the pream- ble of this chapter we referred to such legal symbols as “digits” in the interest of preserving the inheritance from the URM of Notes #2, the latter being a number- theoretic programming language.
But what is a “digit”? In binary notation it is one of 0 or1. Indecimalnotationwehavethedigits0,1,…,9. In hexadecimal notation† we add the “digits” a,b,c,d,e,f that have “values”, in that order, 10,11,12,13,14,15. The objective is to have single-symbol, atomic, digits to avoid ambiguities in string notation.
Thus, a “digit” is an atomic symbol (unlike “10” or “11”).
We will drop the terminology “digit” from now on. Thus our automata alphabets are finite sets of symbols
—any length-ONE symbols, period.
†Base 16 notation.
Intro to Automata© 2020, by George Tourlakis
16
0.2.4 Example. Thus, if our alphabet is A = {0, 1}, then we cannot have the following configurations be part of a FA.
Nontotal Transition Function
0
Non-determinism
0
0
0.2.5 Example. The FA of the example of 0.2.1, in flow diagram form but with no decision on which state(s) is/are accepting is given below:
00 1
1
We wrote q0 and q1 for the states “0” and “1” of 0.2.1.
Intro to Automata© 2020, by George Tourlakis
0.2. Deterministic Finite Automata and their Languages 17 Another way to define a FA without the help of flow
diagrams is as follows:
0.2.6 Alternative Definition. (FA —Algebraically)
Afiniteautomaton,FA,isatoolboxM =(Q,A,q0,δ,F),‡ where
(1) Q is a finite set of states.
(2) A is a finite set of symbols; the input alphabet.
(3) q0 ∈ Q is the distinguished start state.
(4) δ : Q×A → Q is a total function, called the transition function.
(5) F ⊆ Q is the set of accepting states; Q−F is the set
of rejecting states.
‡“M” is generic; for “machine”.
Intro to Automata© 2020, by George Tourlakis
18
0.2.7 Remark. Let us compare Definitions 0.2.2 and 0.2.6.
(1) The set of states corresponds with the nodes of the graph (flow diagram) model. It is convenient —but not theoretically necessary in general— to actually name (label) the nodes with names from Q.
(2) The A in the flow diagram model is not announced separately, but can be extracted as the set of all edge labels.
(3) q0 —the start state by any name; q0 being generic— in the graph model is recognised/indicated as the node pointed at by an arrow that emanates from no node.
(4) δ : Q × A → Q in the graph model is given by the arrow structure: Referring to the figure at the begin-
ning of 0.2.1, we have δ(L,a) = M.
Intro to Automata© 2020, by George Tourlakis
0.2. Deterministic Finite Automata and their Languages 19
How does a FA compute? From the URM analogy, we understand the computation of a FA consisting of suc- cessive
• read moves
• attendant changes of state
• until the program halts (by reading the eof ).
• At that point we proclaim that the string formed by the stream of symbols read is accepted or rejected according as the halted machine is in an accepting or rejecting state.
Intro to Automata© 2020, by George Tourlakis
20
To formalise/mathematise FA computations as described above, we use snapshots or Instantaneous Descriptions (of a computation), for short IDs.
The IDs of the FA are very simple, since the machine (program) is incapable of altering the input stream.
You do not need to keep track of how the contents of variables change.
Intro to Automata© 2020, by George Tourlakis
0.2. Deterministic Finite Automata and their Languages 21
0.2.8 Remark. We recall from discrete mathematics, that a binary relation R is a set of ordered pairs and we prefer to write aRb instead of (a, b) ∈ R or R(a, b). For example, we write a ≤ b if R is ≤.
We also recall that the so-called transitive closure of a relation R, denoted R+, is defined by
+ Def
aR b ≡ aRa1Ra2 …am−1Rb, for some ai,i = 1,…,m−1
We note that
for all i, aiRai+1Rai+2 is short for aiRai+1 and ai+1Rai+2 just as a ≤ b ≤ c means a ≤ b and b ≤ c.
The reflexive transitive closure of R is denoted by R∗ and is defined by
∗ Def + aR b ≡ a = b ∨ aR b
The following also are useful:
m Def
aR b ≡ aRa1Ra2Ra3Ra4…am−2Ram−1Rb
that is, exactly m copies of R occur in the R-chain —or just “chain” if R is understood—
aRa1Ra2Ra3Ra4 . . . am−2Ram−1Rb
Finally, “aR