COMP30026 Models of Computation – Regular Expressions and Non-Regular Languages
COMP30026 Models of Computation
Regular Expressions and Non-Regular Languages
Bach Le / Anna Kalenkova
Lecture Week 8 Part 1
Semester 2, 2021
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 1 / 36
This Lecture is Being Recorded
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 2 / 36
Subset Construction Again…
1 2 3
a
a
ǫ
b
Adding new state to DFA:
1 Step 1: Move on a symbol
2 Step 2: Build ǫ-closure
A B∗ C ∗
D
a
a
b
bb a
a,b
a b
A={1} B∗ D
B∗={2,3} B∗ C∗
C∗={3} D C∗
D=∅ D D
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 3 / 36
Closure Results for Regular Languages
Theorem: The class of regular languages is closed under union.
Proof: Let A and B be regular languages, with DFAs MA and MB as
recognisers. Together the two DFAs make up an NFA which
recognises A ∪ B :
MA
MB
ǫ
ǫ
The ǫ-transitions go to the start states of MA and MB .
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 4 / 36
Closure Results for Regular Languages
Theorem: The class of regular languages is closed under ◦.
Proof: Let A and B be regular, with these DFAs as recognisers:
From these we can easily construct an NFA that recognises A ◦ B :
ǫ
ǫ
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 5 / 36
That Last Construction, Formally
Let recognisers for A and B be these DFAs, respectively:
MA = (Q,Σ, δ, q0, F )
MB = (Q
′,Σ, δ′, q′0, F
′)
A recogniser for A ◦ B is the NFA (Q ∪ Q ′,Σ, δ′′, {q0}, F
′), where
δ
′′(q, v ) =
{δ′(q, v )} if q ∈ Q ′ and v ∈ Σ
{δ(q, v )} if q ∈ Q and v ∈ Σ
{q′0} if q ∈ F and v = ǫ
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 6 / 36
Closure Results for Regular Languages
Theorem: The class of regular languages is closed under Kleene
star.
Proof: Let A be a regular
language with the DFA shown
on the right as recogniser.
Here is how we construct an
NFA to recognise A∗:
ǫ
ǫ
ǫ
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 7 / 36
Closure Results for Regular Languages
Regular languages have several other closure properties.
They are closed under
intersection,
complement, Ac
difference (this follows, as A \ B = A ∩ Bc)
reversal.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 8 / 36
Algorithms for Manipulating Automata
For some of these closure results, we will use the tutorials to develop
useful DFA manipulation algorithms.
For this reason, the exercises are very important.
You will see, for example, how to systematically build DFAs for
languages A ∩ B , out of DFAs for A and B .
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 9 / 36
Equivalence of DFAs
We can always find a minimal DFA for a given regular language
(by minimal we mean having the smallest possible number of states).
Since a DFA has a unique start state and the transition function is
total and deterministic, we can test two DFAs for equivalence
(modulo the names used for their states) by minimizing them.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 10 / 36
Minimizing DFAs
There is no guarantee that DFAs that are produced by the various
algorithms, such as the subset construction method, will be minimal.
1
2 3
a
ǫ
a, b
b
a =⇒
A
B∗ C ∗
D
a
a
b
b
a
a, b
b
A = {1, 3}, B∗ = {1, 2, 3}, C ∗ = {2, 3}, and D = ∅.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 11 / 36
Generating a Minimal DFA
The following algorithm takes an NFA and produces an equivalent
minimal DFA. Of course the input can also be a DFA.
1 Reverse the NFA;
2 Determinize the result;
3 Reverse again;
4 Determinize.
To reverse an NFA A with start states I and accept states F , F 6= ∅:
simply reverse every transition in A and swap I and F .
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 12 / 36
Minimization Example
Consider again the NFA that we determinized two slides ago.
Here it is on the left, with its reversal on the right:
1
2 3
a
ǫ
a, b
b
a
1
2 3
a
ǫ
a, b
b
a
Now make the reversed NFA
deterministic
(we have renamed the states to avoid
later confusion:
4 corresponds to {2}, 5 to {1, 2},
and 6 to {1, 2, 3}).
4 5 6
a
b
a
b a
b
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 13 / 36
Minimization Example
Now reverse the result:
4 5 6
a
b
a
b
a
b
Finally make the result deterministic:
{5, 6} {4, 5, 6}
∅
a
a, b
b
a, b
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 14 / 36
Regular Expressions
Regular expressions is a notation for languages.
You are probably familiar with similar notation in Unix, Python or
JavaScript (but note also that “regular expression” means different
things to different programmers).
Example:
(0 ∪ 1)(0 ∪ 1)(0 ∪ 1)((0 ∪ 1)(0 ∪ 1)(0 ∪ 1))∗ denotes the set of
non-empty strings with the lengths that are multiple of 3.
The star binds tighter than concatenation, which in turn binds tighter
than union.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 15 / 36
Regular Expressions
Syntax:
The regular expressions over an alphabet Σ = {a1, . . . , an} are given
by the grammar
regexp → a1 | · · · | an | ǫ | ∅
| regexp ∪ regexp | regexp regexp | regexp∗
Semantics:
L(a) = {a}
L(ǫ) = {ǫ}
L(∅) = ∅
L(R1 ∪ R2) = L(R1) ∪ L(R2)
L(R1 R2) = L(R1) ◦ L(R2)
L(R∗) = L(R)∗
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 16 / 36
Regular Expressions – Examples
ǫ : {ǫ}
1 : {1}
110 : {110}
((0 ∪ 1)(0 ∪ 1))∗ : all binary strings of even length
(0 ∪ ǫ)(ǫ ∪ 1) : {ǫ, 0, 1, 01}
1∗ : all finite sequences of 1s
ǫ ∪ 1 ∪ (ǫ ∪ 1)∗(ǫ ∪ 1) : all finite sequences of 1s
(1∗0∗)∗ : ?
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 17 / 36
Regular Expressions vs Automata
Theorem: L is regular iff L can be described by a regular expression.
First note that, given NFA N = (Q,Σ, δ, I , F ), we can build an
equivalent NFA with at most one start state, like so: If |I | ≤ 1, do
nothing. Otherwise transform N to N ′ = (Q ∪ {qi},Σ, δ
′, {qi}, F ) by
adding a new state qi , with ǫ transitions from qi to each state in I :
δ
′(q, v ) =
{
I if q = qi and v = ǫ
δ(q, v ) otherwise
Example:
N:
a
b
N ′:
ǫ
a
ǫ b
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 18 / 36
NFAs from Regular Expressions
We now show the ‘if’ direction of the theorem, by showing how to
convert a regular expression R into an NFA that recognises L(R).
The proof is by structural induction over the form of R .
Case R = a: Construct
a
Case R = ǫ: Construct
Case R = ∅: Construct
Case R = R1 ∪ R2, R = R1 R2, or R = R
∗
1 :
We already gave the constructions when we showed that regular
languages are closed under the regular operations! They work
because we can assume each NFA involved has a single start state.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 19 / 36
NFAs from Regular Expressions: Example
Let us construct, in the proposed systematic way, an NFA for
(a ∪ b)∗bc.
Start from innermost expressions and work out:
a
b
So a single-start state NFA for a ∪ b is:
ǫ
a
ǫ
b
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 20 / 36
NFAs from Regular Expressions
Then (a ∪ b)∗ yields:
ǫ
a
ǫ
b
ǫ
ǫ
ǫǫ
Finally (a ∪ b)∗bc yields:
ǫ
a
ǫ
b
ǫ
ǫ
ǫ
ǫ
ǫ
ǫ
b ǫ c
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 21 / 36
Regular Expressions from NFAs
We now show the ‘only if’ direction of the theorem.
First note that, given an NFA N, we can build an equivalent NFA
with at most one accept state. We transform N = (Q,Σ, δ, I , F ) to
N ′ = (Q ∪ {qf },Σ, δ
′, I , {qf }) like so: If |F | ≤ 1, do nothing.
Otherwise add a new qf and ǫ transitions to qf from each state in F .
qf becomes the only accept state:
δ
′(q, v ) =
{
δ(q, v ) ∪ {qf } if q ∈ F and v = ǫ
δ(q, v ) otherwise
N:
a
a
a
b
N ′:
a
a
a
bb
ǫ
ǫ
ǫ
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 22 / 36
Regular Expressions from NFAs
We sketch how an NFA can be turned into a regular expression in a
systematic process of “state elimination”.
In the process, arcs get labelled with regular expressions.
Start by making sure the NFA has a single accept state and a single
start state.
Repeatedly eliminate states that are neither start nor accept states.
The process produces either
R1 R3R2
R4
or
R
We get (R1 ∪ R2R
∗
3R4)
∗R2R
∗
3 in the first case; R
∗ in the second.
Note that some Rs may be ǫ or ∅Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 23 / 36
The State Elimination Process
Consider a node
R2
R1 R3
Any such pair of incoming/outgoing arcs get replaced by a single arc
that bypasses the node. The new arc gets the label R1R
∗
2R3.
If there are m incoming and n outgoing arcs, these arcs are replaced
by m × n bypassing arcs when the node is removed.
Let us illustrate the process on this example:
A B C D
0, 1
1 0, 1 0, 1
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 24 / 36
State Elimination Example
Create a single accept state:
A B C
D
E
0, 1
1 0, 1
0, 1
ǫ
ǫ
Eliminate D (and use regular expressions with all arcs):
A B C E
0 ∪ 1
1 0 ∪ 1 ǫ ∪ 0 ∪ 1
Now eliminate B : A C E
0 ∪ 1
1(0 ∪ 1) ǫ ∪ 0 ∪ 1
and then C : A E
0 ∪ 1
1(0 ∪ 1)(ǫ ∪ 0 ∪ 1)
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 25 / 36
State Elimination Example
Note that A E
0 ∪ 1
1(0 ∪ 1)(ǫ ∪ 0 ∪ 1)
is
R1 R3R2
R4
with
R1 = 0 ∪ 1
R2 = 1(0 ∪ 1)(ǫ ∪ 0 ∪ 1)
R3 = R4 = ∅
Hence the instance of the general “recipe” (R1 ∪ R2R
∗
3R4)
∗R2R
∗
3 is
(0 ∪ 1)∗1(0 ∪ 1)(ǫ ∪ 0 ∪ 1)
Sipser (see “Readings Online” on Canvas) provides more details of
this kind of translation.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 26 / 36
Some Useful Laws for Regular Expressions
A ∪ A = A
A ∪ B = B ∪ A
(A ∪ B) ∪ C = A ∪ (B ∪ C ) = A ∪ B ∪ C
(A B) C = A (B C ) = A B C
∅ ∪ A = A ∪ ∅ = A
ǫ A = A ǫ = A
∅ A = A ∅ = ∅
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 27 / 36
More Useful Laws for Regular Expressions
(A ∪ B) C = A C ∪ B C
A (B ∪ C ) = A B ∪ A C
(A∗)∗ = A∗
∅∗ = ǫ∗ = ǫ
(ǫ ∪ A)∗ = A∗
(A ∪ B)∗ = (A∗B∗)∗
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 28 / 36
Limitations of Finite-State Automata
Consider the language
{0n1n | n ≥ 0} = {ǫ, 01, 0011, 000111, . . .}
Intuitively we cannot build a DFA to recognise this language, because
a DFA has no memory of its actions so far.
Pumping lemma gives the formal proof.
Exercise: Is the language L1 = {0
n1n | 0 ≤ n ≤ 999999999} regular?
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 29 / 36
The Pumping Lemma for Regular Languages
This is the standard tool for proving languages non-regular.
Loosely, it says that if we have a regular language A and consider a
sufficiently long string s ∈ A, then a recogniser for A must traverse
some loop to accept s.
So A must contain infinitely many strings exhibiting repetition of
some substring in s.
Pumping Lemma: If A is regular then there is a number p such
that for any string s ∈ A with |s| ≥ p, s can be written as s = xyz ,
satisfying
1 xy iz ∈ A for all i ≥ 0
2 y 6= ǫ
3 |xy | ≤ p
We call p the pumping length
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 30 / 36
Proving the Pumping Lemma
Let DFA M = (Q,Σ, δ, q0, F ) recognise A.
Let p = |Q| and consider s with |s| ≥ p. Let the number of states of
M be p, let |s| ≥ p.
In an accepting run for s,
some state must be re-visited.
Let qi be the first such state.
At the first visit, x has been
consumed, at the second, xy ,
(strictly longer than x).
q0
qi
qf
x
z
y
Consider the first time a state (qi) is re-visited. This suggests a way
of splitting s into x , y and z such that xz , xyz , xyyz , . . . are all in A.
Notice that y 6= ǫ. Let m + 1 be the number of state visits when
reading xy , then |xy | = m ≤ p, because m+ 1 is the number of state
visits with only one repetition.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 31 / 36
Using the Pumping Lemma
The pumping lemma says:
A regular ⇒ ∃p∀s ∈ A :
{
s can be written
xyz such that . . .
We can use its contrapositive to show that a language is non-regular:
∀p∃s ∈ A :
{
s can’t be written
xyz such that . . .
}
⇒ A not regular
Coming up with such an s is sometimes easy, sometimes difficult.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 32 / 36
Pumping Example 1
We show that B = {0n1n | n ≥ 0} is not regular.
Assume it is, and let p be the pumping length.
Consider 0p1p ∈ B with length greater than p.
By the pumping lemma, 0p1p = xyz , with xy iz in B for all i ≥ 0,
y 6= ǫ, and |xy | ≤ p.
Since |xy | ≤ p, y consists entirely of 0s.
But then xyyz 6∈ B , a contradiction.
So we inevitably arrive at a contradiction if we assume that B is
regular.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 33 / 36
Pumping Example 2
C = {w | w has an equal number of 0s and 1s} is not regular.
A simple proof: If C were regular then also B from before would be
regular, since B = C ∩ 0∗1∗ and regular languages are closed under
intersection.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 34 / 36
Pumping Example 3
We show that D = {ww | w ∈ {0, 1}∗} is not regular.
Assume it is, and let p be the pumping length.
Consider 0p10p1 ∈ D with length greater than p.
By the pumping lemma, 0p10p1 = xyz , with xy iz in D for all i ≥ 0,
y 6= ǫ, and |xy | ≤ p.
Since |xy | ≤ p, y consists entirely of 0s.
But then xyyz 6∈ D, a contradiction.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 35 / 36
Example 4 – Pumping Down
We show that E = {0i1j | i > j} is not regular.
Assume it is, and let p be the pumping length.
Consider 0p+11p ∈ E .
By the pumping lemma, 0p+11p = xyz , with xy iz in E for all i ≥ 0,
y 6= ǫ, and |xy | ≤ p.
Since |xy | ≤ p, y consists entirely of 0s.
But then xz 6∈ E , a contradiction.
Models of Computation (Sem 2, 2021) Regular Expressions © University of Melbourne 36 / 36