程序代写代做代考 C graph algorithm go A Subset of the URM Language; FA and NFA

A Subset of the URM Language; FA and NFA
Lecture #22; Nov. 30
The FA and NFA of Notes #9 and #10 provide finite descriptions of regular languages, since an FA/NFA M is finite (a
graph, say) and a regular language is an L(M) for some
M.
The next section proposes another type of finite de- scription of regular languages.
Intro to Automata© 2020, by George Tourlakis

2
0.1. Regular Expressions
Regular expressions are familiar to users of the UNIX operating system.
They are names for regular sets as we will see.
• Do they name ALL regular sets, i.e., all sets of the
type L(M) where M is a FA (or NFA, equivalently)? • Do they name any NON regular sets?
We will see that we must answer YES, NO.
Regular Expressions are more than “just names” as they embody enough information —as we will see— to be me- chanically transformable into a NFA (and thus to a FA as well).
Intro to Automata© 2020, by George Tourlakis

0.1. Regular Expressions 3
0.1.1 Definition. (Regular expressions over Σ) Given the finite alphabet of atomic symbols Σ, we form the ex- tended alphabet
Σ∪{∅,+,·,∗,(,)} (1)
where the symbols ∅, +, ·, ∗, (, ) (not including the comma separators) are all abstract or formal∗ and do not occur in Σ. In particular, “∅” in this alphabet is just a symbol — do NOT interpret it! (Yet!)
So are “+”, “·”, “∗” and the brackets. All these sym- bols will be interpreted shortly.
The set of regular expressions over Σ is a set of strings over the augmented alphabet above, given inductively by
Regular expressions are names, formed as strings over the alphabet (1) as follows :
(1) Every member of Σ ∪ {∅} is a regular expression.
Examples for case (1): If Σ = {0, 1} then 0, 1, and ∅, all viewed as abstract symbols with no interpretation are each a regular expression.
(2) If α and β are (names of) regular expressions, then so is the string (α + β)
(3) If α and β are (names of) regular expressions, then so is the string (α · β)
(4) If α is a (name of) regular expression, then so is the string (α∗)
∗Employed to define form or structure.
Intro to Automata© 2020, by George Tourlakis

4
The letters α, β, γ are used as metavariables (syntactic variables) in this definition. They will stand for arbitrary regular expressions (we may add primes or subscripts to increase the number of our metavariables).
Intro to Automata© 2020, by George Tourlakis
􏰙

0.1. Regular Expressions 5 􏰚 0.1.2 Remark.
(i) We emphasize that regular expressions are built starting from the objects contained in Σ ∪ {∅}.
We also emphasize that we have NOT talked about semantics yet, that is, we did NOT say YET what sets these expressions will name, nor, what “+, “·” and “∗” mean.
(ii) We will often omit the “dot” in (α · β) and write simply (αβ).
(iii) We assign the highest priority to ∗, the next lower to · and the lowest to +.
We will let α ◦ α′ ◦ α′′ ◦ α′′′ group (“associate”) from right to left, for any ◦ ∈ {+, ·,∗ }.
Given these priorities, we may omit some brackets, as is usual.
Thus, α + βγ∗ means 􏰞α + 􏰜β(γ∗)􏰝􏰟
and αβγ means (α(βγ)). 􏰙 􏰚
Intro to Automata© 2020, by George Tourlakis

6
We next define what sets these expressions name (se- mantics).
0.1.3 Definition. (Regular expression semantics)
We define the semantics of any regular expression over Σ by recursion on the Definition 0.1.1.
We use the notation L(α) to indicate the set named by α.
(1) L(∅) = ∅, where the left “∅” is the symbol in the augmented alphabet (1) above, while the right “∅” is the name of the empty set in ordinary MATH.
(2) L(a) = {a}, for each a ∈ Σ
(3) L(α+β)=L(α)∪L(β)
(4) L(α · β) = L(α)L(β) —where for two languages (sets of strings!) L and L′, LL′ —the concatenation of the SETS in this order— stands for {xy : x ∈ L∧y ∈ L′}.
(5) L(α∗) = 􏰞L(α)􏰟∗† —where for any set S —finite or not— S∗ denotes the set of all strings
x1x2 …xn, for n ≥ 0, and where all (strings) xi ∈ S where n = 0 means that x1x2 …xn = λ.
Thus, in particular, we have always λ ∈ S∗.
􏰙
†The ∗ in S∗ is called the Kleene closure. So S∗ is the Kleene closure of S. Intro to Automata© 2020, by George Tourlakis

0.1. Regular Expressions 7
0.1.4 Example. Let Σ = {0, 1}. Then L􏰞(0 + 1)∗􏰟 = Σ∗. Indeed, this is because L􏰞0 + 1􏰟 = L(0) ∪ L(1) =
{0} ∪ {1} = {0, 1} = Σ.
0.1.5 Example. We note that L(∅ ) = L(∅) = ∅ =
{λ}.
Why so?
Because Σ∗ is λ along with the set of all strings formed using symbols from Σ.
∅ has no symbols to form strings with. So all we got is λ.
See last “red” comment in Def. 0.1.3.
Because of the above, we add “λ” as a DEFINED NAME —not in the original alphabet— for the set {λ}. 􏰙
∗ 􏰞 􏰟∗ ∗
􏰙
Intro to Automata© 2020, by George Tourlakis

8
Of course, two regular expressions α and β over the same alphabet Σ are equal, written α = β, iff they are so as strings.
We also have another, semantic, concept of regular expression “equality”:
0.1.6 Definition. (Regular expression equivalence) We say that two regular expressions α and β over the same alphabet Σ are equivalent, written α ∼ β, iff they name the same set/language, that is, iff L(α) = L(β).
Intro to Automata© 2020, by George Tourlakis
􏰙

0.1. Regular Expressions 9 0.1.7 Example. Let Σ = {0, 1}. Then (0+1)∗ ∼ 􏰜0∗1∗􏰝∗.
Indeed, L􏰜(0 + 1)∗􏰝 = Σ∗, by 0.1.4. So, if anything, we do have
L􏰞(0 + 1)∗􏰟 ⊇ L􏰞(0∗1∗)∗􏰟
Now —for L􏰞(0 + 1)∗􏰟 ⊆ L􏰞(0∗1∗)∗􏰟— the set
L􏰞(0∗1∗)∗􏰟 􏰥􏰤􏰣􏰦
A
is A∗ where
A=L(0∗1∗)={0n1m :n≥0∧m≥0}
because
L(0∗)=L(0)∗ ={0}∗ ={0n :n≥0}
and similarly for
L(1∗)=L(1)∗ ={1}∗ ={1m :m≥0}
􏰚 It should be clear that any string of 0s and 1s can be built using as building blocks 0n1m judiciously choosing n and m values. 􏰚
E.g., 0110011 can be thought of as 0110 00110 01110
More generally, to show that an arbitrary string over Σ, …0k …1r … (1)
is in A∗ view (1) as
…0k10 …001r …
Intro to Automata© 2020, by George Tourlakis

10
says that Σ∗ ⊆ L􏰞(0∗1∗)∗􏰟. Done.
􏰚 Bytheaboveexample,α∼βdoesNOTimplyα=β. 􏰚
But then the statement between the 􏰚 signs simply
Intro to Automata© 2020, by George Tourlakis
􏰙

0.2. From a Regular Expression to NFA and Back 11 0.2. From a Regular Expression to NFA and
Back
There is a mechanical procedure (algorithm), which from a given regular expression α constructs a NFA M so that L(α) = L(M), and conversely:
Given a NFA M constructs a regular expression α so that L(α) = L(M).
We split the procedure into two directions. First, we go from regular expression to a NFA.
Intro to Automata© 2020, by George Tourlakis

12
0.2.1 Theorem. (Kleene) For any regular expression α over an alphabet Σ we can construct a NFA M with input alphabet Σ so that L(α) = L(M).
Proof. Induction over the closure of Definition 0.1.1 — that is, on the formation of a regular expression α ac- cording to the said definition. For the basis we consider the cases
• α = ∅; the NFA below works
• α = a, where a ∈ Σ; the NFA below works a
Both of the above NFA have EXACTLY ONE accepting state. Our construction maintains this property through- out.
That is, all the NFA we construct in this proof will have that form, namely
Intro to Automata© 2020, by George Tourlakis

0.2. From a Regular Expression to NFA and Back 13
Assume now (the I.H. on regular expressions!) that we have built NFA for α and β —M and N— so that L(α) = L(M) and L(β) = L(N). Moreover, these M and N have the form above. For the induction step we have three cases:
• To build a NFA for α + β, that is, one that accepts the language L(M) ∪ L(N). The NFA below works since the accepting paths are precisely those from M and those from N.
q
q’
However, to maintain the single accepting state form, we modify it as the NFA below.
M
N
M
q
q’
N
Intro to Automata© 2020, by George Tourlakis

14
• To build a NFA for αβ, that is, one that accepts the language L(M)L(N).
The NFA below works —since the accepting paths are precisely those formed by concatenating an accepting path of M (labeled by some x ∈ L(M)) with an λ- move and then with an accepting path of N (labeled by some y ∈ L(N));
in that left to right order.
The λ that connects M and N will not affect the
path name: xλy = xy.
M
Intro to Automata© 2020, by George Tourlakis
N

0.2. From a Regular Expression to NFA and Back 15
• To build a NFA for α∗, that is, one that accepts the language L(M)∗. The NFA below, that we call P, works. That is, L(P) = L(M)∗.
Intro to Automata© 2020, by George Tourlakis
M
􏰙

16
0.2.2 Theorem. (Kleene) For any FA or NFA M with input alphabet Σ we can construct a regular expression α over Σ so that L(α) = L(M).
Proof. Given a FA M (if a NFA is given, then we convert it to a FA first).
We will construct an α with the required properties. The idea is to express L(M ) in terms of simple to describe (indeed, regular themselves) sets of strings over Σ by re- peatedly using the operations ·, ∪ and Kleene star, a finite number of times.
􏰚 These regular sets —NAMEABLE by RegEXs— are called by Kleene “Rk ”, where k ≤ n and where the state set of
the FA is
q1,q2,…,qn —the same “n” as above
It turns out that “􏰱 Rn ” is the set of all FA-acceptable 1j
j
strings, the union taken over all accepting qj. 􏰚
Lecture #23, Dec. 7
ij
Intro to Automata© 2020, by George Tourlakis

0.2. From a Regular Expression to NFA and Back 17
So let Q = {q1,q2,…,qn} be the set of states of M, where q1 is the start state.† We will refer to the set of M’s accepting states as F.
We next define several sets of strings (over Σ) —denoted by Rk , for k = 0,1,…,n and each i and j ranging from 1
ij
to n.
Rk ={x∈Σ∗ :xlabelsapathfromqi toqj
ij
and every qm in this path, other than the (1)
endpoints qi and qj, satisfies m ≤ k}
􏰚 A superscript of n removes the restriction on the path
qi⌢x qj (2) since every state qm satisfies m ≤ n.
Thus Rn contains ALL strings that name FA-paths ij
from qi to qj —no restriction on where these paths pass through. 􏰚
†We start numbering states from 1 rather than 0 for technical convenience; see the blue sentence at the top of next page.
Intro to Automata© 2020, by George Tourlakis

18
We first note that for k = 0 we get very small finite sets.
Indeed, since state numbering starts at 1, the condi- tion m ≤ 0 is false and therefore in R0 we have the
cases:
• if we have i ̸= j, then the condition (2) on p.17 can hold precisely when x = a ∈ Σ for some a —since there can be no nodes in the interior of x.
That is, we have precisely the case:
q i →a q j ( † )
• The case i = j also allows λ in the set, since we have ONE state:
qi = qj (‡) In words, “I can go from qi to qj DETERMINISTI-
CALLY without consuming ANY input”. To summarize, for all i and j we have
ij
􏰘{a∈Σ: Case(†)} ifi̸=j
R0 = (3)
ij {λ}∪{a∈Σ: Case(†)} ifi=j
Intro to Automata© 2020, by George Tourlakis

0.2. From a Regular Expression to NFA and Back 19 Since every finite set of strings can be named by a reg-
ular expression (Exercise!),
there are RegEx: α0 such that L(α0 ) = R0 , for all i,j
(4)
For example, say A = {3, 5, 8, λ}. This is a finite set. It is NOT an alphabet (contains λ).
Then the RegEX 3+5+8+λ = 3+5+8+∅∗ NAMES A. Why? BecauseA={3}∪{5}∪{8}∪{λ}.
Intro to Automata© 2020, by George Tourlakis
ij ijij

20
sively using k as the recursion variable and i, j as parameters, and taking (3) as the basis of the recursion.
To see this, consider a path labeled x in Rk , for k > 0. ij
It is possible that all qm (other than qi and qj) that occur in the path have m < k. Then this x also belongs to Rk−1. If on the other hand we DO have qk appear in the interior of the path labeled x, one or more times, then we have the picture below. ... where the qk occurrences start immediately after the path named z0 and are connected by paths named zi, for i = 1,...,t. Thus, x = z0z1z2 ...ztzt+1. Noting that z0 ∈ Rk−1, zi ∈ Rk−1 —for i = 1,...,t— and zt+1 ∈ Rk−1, we ik kk kj have that x ∈ Rk−1 ·􏰜Rk−1􏰝∗ ·Rk−1. We have established, ik kk kj for all k ≥ 1 and all i, j, that Rk = Rk−1 ∪ Rk−1 · 􏰜Rk−1􏰝∗ · Rk−1 (4) ij ij ik kk kj 􏰚 Explanation. Noting that 􏰜Rk−1􏰝∗ = {λ}∪Rk−1∪ kk kk Rk−1Rk−1∪Rk−1Rk−1Rk−1∪Rk−1Rk−1Rk−1Rk−1∪ . . . kk kk kk kk kk kk kk kk kk Intro to Automata© 2020, by George Tourlakis Next note that the Rk can be COMPUTED recur- ij ij 0.2. From a Regular Expression to NFA and Back 21 the set of paths, from qi to qj depicted in the following part of (4): may contain one interior qk two interior qk three interior qk four interior qk five interior qk etc. Rk−1 · 􏰜Rk−1􏰝∗ · Rk−1 ik kk kj case corresponds to λ case corresponds to Rk−1 case corresponds to Rk−1Rk−1 kk kk case corresponds to Rk−1Rk−1Rk−1 kk kk kk case corresponds to Rk−1Rk−1Rk−1Rk−1 kk kk kk kk 􏰚 I.H. that for k − 1 ≥ 0 (fixed!) and all ij that L(αk−1) = Rk−1 —that is, αk−1 NAMES the set ij ij ij Rk−1. ij We see that we can construct —from the αk−1— reg- ij ular expressions αk for the Rk . ij ij Indeed, using the I.H. and (4), we have the RegEX αk GIVEN, for all i,j and the fixed k, by αk = αk−1 + αk−1􏰜αk−1􏰝∗αk−1 (5) ij ij ik kk kj Now take the values of i and j we have regular expressions αk−1 such kk ij Along with the basis (3) that the R0 sets CAN be named ij being finite, this induction proves that all the Rk can be ij named by regular expressions, which we may construct, from the basis up. Intro to Automata© 2020, by George Tourlakis 22 Therefore, as a RegEX: finitely many terms 􏰲n 􏰣n n􏰦􏰥 n􏰤 α1j =α1j1 +α1j2 +...+α1jm qj ∈F The above is a finite union (F is finite!) of sets named by αn with qj ∈ F. Thus we may construct its name 1j as the “sum” (using “+”, that is) of the names αn with 1j Finally, the set L(M) can be so named. Indeed, L(M) = 􏰳 Rn 1j qj ∈F qj ∈F. 􏰙 Intro to Automata© 2020, by George Tourlakis 0.2. From a Regular Expression to NFA and Back 23 0.2.3 Example. Consider the FA below. 00 1 1 1 2 We will compute regular expressions for: • all sets R0 • all sets R1 ij • all sets R2 ij Recall the definition of the Rk , here for k = 0, 1, 2 and ij i, j ranging in {1, 2} (cf. proof of 0.2.2): {x : qi ⌢x qj , where no state in this computation, other than possibly the end-points qi and qj, has index higher than k} This leads —as we saw— to the recurrence: Rk = Rk−1 ∪ Rk−1(Rk−1)∗Rk−1 ij ij ik kk kj Below I employ the abbreviated (regular expression) name “λ” for ∅∗. Intro to Automata© 2020, by George Tourlakis ij SET RegEx R101 λ+0 R102 1 R201 1 R202 λ+0 24 Superscript 1 now: SET RegEx: By Direct Substitution R11 = R101 ∪ R101(R101)∗R101 λ+0+(λ+0)(λ+0)∗(λ+0) R12 = R102 ∪ R101(R101)∗R102 1+(λ+0)(λ+0)∗1 R211 = R201 ∪ R201(R101)∗R101 1+1(λ+0)∗(λ+0) R212 = R202 ∪ R201(R101)∗R102 λ+0+1(λ+0)∗1 Using the previous table, the reader will have no diffi- culty to fill in the regular expressions under the heading “RegEx: By Direct Substitution” in the next table. To make things easier it is best to simplify the reg- ular expressions of the previous table, meaning, find- ing simpler, equivalent ones. For example, L􏰜λ + 0 + (λ + 0)(λ + 0)∗(λ + 0)􏰝 = {λ,0} ∪ {λ,0}{λ,0}∗{λ,0} = {λ, 0} ∪ {λ, 0}{λ, 0, 00, 000, . . .}{λ, 0} = {0}∗, thus λ + 0 + (λ + 0)(λ + 0)∗(λ + 0) ∼ 0∗ Superscript 2: SET RegEx: By Direct Substitution R121 = R11 ∪ R12(R212)∗R211 R122 = R12 ∪ R12(R212)∗R212 R21 = R201 ∪ R202(R212)∗R211 R22 = R212 ∪ R212(R212)∗R212 Intro to Automata© 2020, by George Tourlakis 􏰙 0.3. Another Example 25 0.3. Another Example 0.3.1 Example. Let us show another NFA to FA con- version. OK, given the following NFA which clearly decides the language over Σ = {0, 1} given by the RegEx (0 + 1)∗00 that is, the language containing ALL strings that end in two 0s. 0, 1 0 ab 0 c The DETERMINISTIC FA equivalent to the above is the following: a 1 0 ab Intro to Automata© 2020, by George Tourlakis 1 1 0abc 0 􏰙