
The Myhill-Nerode Theorem

Prakash Panangaden

2nd February 2021

The collection of strings over an alphabet Σ, i.e. Σ∗, is an infinite set (except when Σ is the empty set) with a binary operation called concatenation, written by placing the arguments next to each other as in xy; occasionally we write x · y when we want to emphasize the operation. This operation has two properties. The first is associativity, which means ∀x, y, z ∈ Σ∗, (xy)z = x(yz), so bracketing is not needed when performing repeated concatenations. The second property of concatenation is that it has a unit element, namely ε: ∀x ∈ Σ∗, xε = εx = x. Such a structure is called a monoid.

Definition 1. A monoid is a set S with a binary associative operation and an identity element
for this operation.

Note that the operation is not necessarily commutative. If the operation happens to be commutative it is called a commutative monoid. The most basic example of a monoid is the set of natural numbers with addition as the operation and 0 as the identity element for this operation. Strings (words) over a non-empty alphabet, with concatenation as the operation and the empty word as the identity, form another example; as noted above, this one is not commutative. A third example of a monoid, also not commutative, is the set of functions from a set to itself, with function composition as the operation. (None of these examples is a group.)

Given any set, say X, and an equivalence relation ∼ on X, we have the set of equivalence classes of X, written X/∼. The number of equivalence classes is called the index of the relation. For example, we can define an equivalence relation on the integers by saying that two numbers are equivalent if they have the same remainder when divided by 5. This relation has 5 equivalence classes, hence its index is 5. Note that each equivalence class in this case is infinite, i.e. it contains infinitely many members. Please note the difference between there being infinitely many distinct classes and each class containing infinitely many elements.

Now we consider what happens when these two concepts interact.

Definition 2. An equivalence relation R on Σ∗ is said to be right-invariant if

∀x, y ∈ Σ∗, xRy ⇒ ∀z ∈ Σ∗, xzRyz.

In other words, if we have a related pair of strings and we stick the same string on the right side of each, we get another related pair, no matter what string we put on the right side. Of course, not every equivalence relation will have this property. We are going to look at two crucial examples of relations that do have this property.

Suppose we have a DFA M = (Q, Σ, q0, δ, F). We have previously defined the function
δ∗ : Q × Σ∗ −→ Q by the inductive definition

∀q ∈ Q, δ∗(q, ε) = q, and ∀a ∈ Σ, x ∈ Σ∗, δ∗(q, ax) = δ∗(δ(q, a), x).

It follows from this definition that

∀x, y ∈ Σ∗, δ∗(q, xy) = δ∗(δ∗(q, x), y).
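To make the inductive definition concrete, here is a minimal sketch in Python (my choice of language, not part of the note), assuming a DFA whose transition function δ is stored as a dictionary keyed by (state, symbol); the example machine, which accepts the strings over {a, b} ending in ab, is likewise only an illustrative assumption.

    def delta_star(delta, q, x):
        # delta*(q, epsilon) = q and delta*(q, ax) = delta*(delta(q, a), x), unrolled as a loop.
        for a in x:
            q = delta[(q, a)]
        return q

    # Hypothetical example DFA over {a, b} accepting strings that end in "ab":
    # states 0, 1, 2; start state 0; accepting state 2.
    delta = {(0, "a"): 1, (0, "b"): 0,
             (1, "a"): 1, (1, "b"): 2,
             (2, "a"): 1, (2, "b"): 0}

    assert delta_star(delta, 0, "abab") == 2
    # Spot-check of the property delta*(q, xy) = delta*(delta*(q, x), y):
    assert delta_star(delta, 0, "ab" + "ba") == delta_star(delta, delta_star(delta, 0, "ab"), "ba")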

In the rest of this note x, y, z always stand for strings in Σ∗.

Definition 3. We define an equivalence relation RM on Σ∗ as follows:

xRMy if and only if δ∗(q0, x) = δ∗(q0, y).

It is obviously an equivalence relation (make sure you see why!) and, in fact, it is right-invariant, as the following little calculation shows: suppose xRMy, then

∀z ∈ Σ∗, δ∗(q0, xz) = δ∗(δ∗(q0, x), z) = δ∗(δ∗(q0, y), z) = δ∗(q0, yz).

But this just means that xzRMyz.
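As an illustration (again with an assumed example machine, not part of the note): two strings are RM-related exactly when they drive the machine to the same state, and right-invariance amounts to the fact that both continuations then start from that common state.

    def delta_star(delta, q, x):
        for a in x:
            q = delta[(q, a)]
        return q

    # Hypothetical DFA over {a, b} accepting strings ending in "ab" (start 0, accepting state 2).
    delta = {(0, "a"): 1, (0, "b"): 0,
             (1, "a"): 1, (1, "b"): 2,
             (2, "a"): 1, (2, "b"): 0}

    def related_RM(x, y):
        # x R_M y iff both strings lead the machine to the same state from the start state.
        return delta_star(delta, 0, x) == delta_star(delta, 0, y)

    assert related_RM("ab", "abab")                                   # both reach state 2
    # Right-invariance: appending the same z keeps them related.
    assert all(related_RM("ab" + z, "abab" + z) for z in ["", "a", "ba", "bb"])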

Now we define another such equivalence relation, this time based on languages rather than on
machines.

Definition 4. Given any language L ⊆ Σ∗, not necessarily regular, we define an equivalence relation RL on Σ∗ as follows:

xRLy if and only if ∀z ∈ Σ∗, xz ∈ L ⇔ yz ∈ L.

This says that two strings x and y are related if, for any third string z, either both xz and yz are in the language or neither one is in the language. In particular, this means that both x and y are in L or neither one is in L; this follows from choosing z = ε.
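For a concrete (and again assumed) example, take L to be the set of strings over {a, b} with an even number of a's. Then ε and aa are RL-related, while ε and a are not, since the suffix z = ε already separates them. The sketch below tests the defining condition only for suffixes up to a small bound, so it is merely a finite approximation of the quantifier over all z.

    from itertools import product

    SIGMA = "ab"
    BOUND = 3   # assumption: only suffixes up to this length are tested

    def in_L(w):
        # Example language (an assumption): strings with an even number of a's.
        return w.count("a") % 2 == 0

    def related_RL_upto(x, y):
        # Finite approximation of x R_L y: compare membership of xz and yz for short z.
        suffixes = [""] + ["".join(p) for n in range(1, BOUND + 1)
                           for p in product(SIGMA, repeat=n)]
        return all(in_L(x + z) == in_L(y + z) for z in suffixes)

    assert related_RL_upto("", "aa")       # appending z changes the parity of a's identically
    assert not related_RL_upto("", "a")    # z = "" already distinguishes them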

Lemma 5. The equivalence relation RL is right invariant.

Proof. Suppose that xRLy; we must show that, for any choice of z, xzRLyz. Let u be any string and suppose that xzu (= x(zu)) ∈ L; then, using the definition of RL with the string zu (remember, the definition says for every z!), we get that y(zu) ∈ L. Similarly, if xzu is not in L then yzu is also not in L. What we have just shown is that if we consider xz and yz, then for any u, (xz)u ∈ L iff (yz)u ∈ L. But this is exactly what xzRLyz means. In short, if xRLy then xzRLyz, i.e. RL is right-invariant.



Now we are ready to state and prove the main theorem of this note.

Theorem 6 (Myhill-Nerode). The following three statements are equivalent:

1. The language L is accepted by a deterministic finite automaton.

2. L is the union of some of the equivalence classes of some right-invariant equivalence relation of finite index.

3. The equivalence relation RL has finite index. In fact, any right-invariant equivalence relation R with the property that L is the union of some of the equivalence classes of R will have index greater than or equal to the index of RL.

Proof. (1) implies (2):

Assume that L is accepted by some DFA M = (Q, Σ, q0, δ, F). We have to find a right-invariant equivalence relation that satisfies the conditions of statement (2). Since we have the DFA M, let us use the relation that is naturally associated with M, namely RM defined above. We already know that RM is right-invariant. How many equivalence classes can there be? As many as there are states reachable from q0. Since the DFA has finitely many states (that, after all, is what the "F" in DFA means), the equivalence relation RM has finite index. Now every reachable state of M defines an equivalence class. The equivalence class of words associated with the state q is the set

Sq := {x | δ∗(q0, x) = q}.

Now the language L is just

L = ⋃_{q ∈ F} Sq,

which is exactly what statement (2) claims.
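To see the sets Sq concretely, the sketch below (with the same assumed example DFA as in the earlier sketches) groups a finite sample of strings by the state they reach; the strings that land in accepting states are exactly the sampled members of L. This is of course only an illustration on a sample, not a proof.

    from itertools import product

    def delta_star(delta, q, x):
        for a in x:
            q = delta[(q, a)]
        return q

    # Hypothetical DFA over {a, b} accepting strings ending in "ab" (start 0, accepting {2}).
    delta = {(0, "a"): 1, (0, "b"): 0,
             (1, "a"): 1, (1, "b"): 2,
             (2, "a"): 1, (2, "b"): 0}
    accepting = {2}

    # All strings of length at most 3, grouped by the state they reach: S[q] is a sample of S_q.
    sample = ["".join(p) for n in range(0, 4) for p in product("ab", repeat=n)]
    S = {}
    for x in sample:
        S.setdefault(delta_star(delta, 0, x), set()).add(x)

    # The sampled part of L is the union of the S_q over accepting q.
    in_L_sample = {x for q in accepting for x in S.get(q, set())}
    assert in_L_sample == {x for x in sample if x.endswith("ab")}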

(2) implies (3):
In order to show this we will show that the index of RL is smaller than (or equal to) the index of any equivalence relation of the type mentioned in statement (2). We will do this by showing that, for any equivalence relation of the type mentioned in (2), the equivalence classes fit inside the equivalence classes of RL: RL has "bigger" equivalence classes, hence it can have no more of them. Let R be any right-invariant equivalence relation with finite index and such that L is the union of some collection of R's equivalence classes. Suppose xRy; we will be done if we show that xRLy. First note that, since L is the union of some set of equivalence classes of R, if any string x ∈ L then any string related by R to x must also be in L. Note also that, since R is right-invariant, for any z we have xzRyz. If xz ∈ L then xz belongs to one of the equivalence classes whose union is L, and hence yz is also in L. The same argument holds with x and y interchanged, so xRLy. Thus an equivalence class of R must fit inside an equivalence class of RL. So the index of RL is less than or equal to the index of R, which is finite; thus the index of RL is also finite.

(3) implies (1):

We construct a DFA M′ = (Q′, Σ, q′0, δ′, F′) from the language L. We are assuming that RL has finite index, and we already know from the earlier lemma that it is right-invariant. We take the collection of equivalence classes of RL to be the set of states Q′; we take the equivalence class of ε to be q′0; we take δ′ to be given by δ′([x], a) = [xa]; and for F′ we take {[x] | x ∈ L}. (Why is δ′ well defined? If I had chosen y from the equivalence class [x] I would have gotten [ya], but this is OK since we know that if xRLy then xaRLya, so it would have given the same equivalence class.) Now this is a DFA and

δ′∗(q′0, x) = δ′∗([ε], x) = [x],

so this machine accepts x if and only if x ∈ L, by the way we defined F′.
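This construction can be imitated in code for a concrete language if one is willing to approximate RL. In the heuristic sketch below (not the proof itself), two strings are treated as RL-equivalent when no suffix up to an assumed length bound distinguishes them; for the chosen example language the bound happens to be large enough, though the code does not verify this. The states are representatives of the discovered classes, the start state is the class of ε, δ′([x], a) = [xa], and the accepting states are the classes of words in L.

    from itertools import product

    SIGMA = "ab"
    BOUND = 3          # assumption: distinguishing suffixes of length <= 3 suffice for this L

    def in_L(w):
        # Example language (an assumption): strings over {a, b} ending in "ab".
        return w.endswith("ab")

    def signature(x):
        # The behaviour of x on all short suffixes; equal signatures are treated as "same R_L class".
        suffixes = [""] + ["".join(p) for n in range(1, BOUND + 1)
                           for p in product(SIGMA, repeat=n)]
        return tuple(in_L(x + z) for z in suffixes)

    reps = {signature(""): ""}      # class signature -> representative string for that class
    delta_prime = {}                # ([x], a) -> [xa], with classes named by their representatives
    frontier = [""]
    while frontier:
        x = frontier.pop()
        for a in SIGMA:
            sig = signature(x + a)
            if sig not in reps:
                reps[sig] = x + a
                frontier.append(x + a)
            delta_prime[(x, a)] = reps[sig]

    start = ""                                              # the class of epsilon
    accepting = {r for r in reps.values() if in_L(r)}
    assert sorted(reps.values()) == ["", "a", "ab"] and accepting == {"ab"}

For this language the sketch finds exactly three classes, represented by ε, a and ab, matching the three-state minimal DFA for strings ending in ab.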

Now two machines may have different names for the states but be essentially the same. We call
this concept an “isomorphism”.

Definition 7. We say two machines M = (Q, Σ, q0, δ, F) and M′ = (Q′, Σ, q′0, δ′, F′) are isomorphic if there is a bijection φ : Q −→ Q′ such that

1. φ(q0) = q′0,

2. φ(δ(q, a)) = δ′(φ(q), a); expressed diagrammatically this is:

        q  ---a-->  δ(q, a)
        |              |
        φ              φ
        |              |
        v              v
      φ(q) ---a--> δ′(φ(q), a) = φ(δ(q, a)),

3. and q ∈ F ⇔ φ(q) ∈ F′.

We will view two isomorphic machines as the same machine. So when I use the phrase, “the unique
machine” I mean unique up to isomorphism.

Proposition 8. The machine described in the last part of the proof of the theorem is the unique
minimal DFA that recognizes the language L.

Proof. We saw from the proof of (2) implies (3) that the number of states of any DFA accepting L must be greater than or equal to the number of states of the machine constructed in part (3). Suppose that there is a machine M = (Q, Σ, q0, δ, F) accepting L with the same number of states as the machine M′ = (Q′, Σ, q′0, δ′, F′) constructed in part (3) of the theorem. Then we can define a mapping φ of the states of Q to the states of Q′ as follows. Let q ∈ Q. There must be some string x such that δ∗(q0, x) = q, because if there weren't we could just remove q from the machine. We define φ(q) = [x]; note that this definition is independent of the choice of x (if δ∗(q0, x) = δ∗(q0, y) = q then xRMy, hence xRLy, so [x] = [y]). Note that φ(δ(q, a)) = [xa] = δ′([x], a) = δ′(φ(q), a). Now for every "state" [x] (equivalence class of words) of M′ there is some q ∈ Q with φ(q) = [x]. Why? Because φ(δ∗(q0, x)) = [x]. Thus φ is surjective and, since the sets of states have the same size, φ is a bijection. It is easy to see that φ(q0) = [ε]. Suppose that q ∈ F; then there is some word x in L such that δ∗(q0, x) = q, so φ(q) = [x], but if x ∈ L then [x] is an accept state of M′. It is easy to reverse this argument to show that if φ(q) = [x] and [x] is an accept state of M′, it must be true that q is an accept state of M. Thus φ is an isomorphism of DFAs.



This proposition says that the states of M behave exactly like the corresponding states of M′. In other words, if a machine accepting L has the same number of states as M′ then it looks just like M′; the only difference is that the names of the states are different.

The algorithms for minimization produce exactly the machine described in the proof of this theorem. Thus we can algorithmically decide whether two regular expressions define the same regular language by constructing NFAs from the regular expressions, determinizing them to get DFAs, and then minimizing the resulting DFAs. If they describe the same language we must get the same minimal DFA. It is completely mistaken to use the algebra of regular expressions to reason about machines; one uses machines to reason about regular expressions. There is a rich and interesting algebraic theory of regular expressions, and one can prove equations without using machine models, but it is much harder. There is no such result for NFAs; there is not even a clear notion of what is meant by a minimal NFA.
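As an illustration of the last step, here is a sketch of DFA minimization by iterated partition refinement (Moore's algorithm; the note does not single out an algorithm, so this choice is mine). It assumes a complete DFA all of whose states are reachable; on such input the stable partition corresponds to the classes of RL, so two DFAs for the same language should minimize to isomorphic machines, which can then be compared with the isomorphism check sketched earlier.

    def minimize(states, alphabet, delta, start, accepting):
        # Start with the partition {accepting, non-accepting} and refine it until stable.
        block = {q: int(q in accepting) for q in states}
        while True:
            # A state's signature: its current block together with the blocks of its successors.
            sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
                   for q in states}
            numbering = {s: i for i, s in enumerate(sorted(set(sig.values())))}
            refined = {q: numbering[sig[q]] for q in states}
            if len(set(refined.values())) == len(set(block.values())):
                break                        # no block was split, so the partition is stable
            block = refined
        # Quotient machine: one state per block of the stable partition.
        q_delta = {(block[q], a): block[delta[(q, a)]] for q in states for a in alphabet}
        return set(block.values()), block[start], q_delta, {block[q] for q in accepting}

    # Hypothetical example: an "ends in ab" recognizer with a redundant fourth state.
    states = {0, 1, 2, 3}
    delta = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 1, (1, "b"): 2,
             (2, "a"): 1, (2, "b"): 3, (3, "a"): 1, (3, "b"): 3}
    q_states, q_start, q_delta, q_acc = minimize(states, "ab", delta, 0, {2})
    assert len(q_states) == 3               # states 0 and 3 collapse into one class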

Do not read beyond this: I mean it!

In the definition of isomorphism we can weaken the condition that we have a bijection and just say
that we have a function. We call this a homomorphism of machines. If there is a homomorphism
between two machines they accept the same language. Now we can define a category with machines
as the objects and homomorphisms as the morphisms. In this category we can define an initial and
a terminal object: they will not be finite-state machines any more. One can define other similar
categories of machines not based on Set, thus one can define machines in Top or Vect to get
topological machines or linear machines or machines in categories of measure spaces (which I do for
a living). Thus the use of category theory allows one to lift the simple theory we are studying to
more exotic settings. There is even a beautiful duality theory for automata which I aaaarrrggg….
gasp… OK, I promise not to tell them any more.
