PowerPoint Presentation
BU CS 332 – Theory of Computation
Lecture 6:
• Regexes = NFAs
• Non-regular languages
Reading:
Sipser Ch 1.3
“Myhill-Nerode” note
Sipser Ch 1.4 (optional)
Mark Bun
February 10, 2021
Regular Expressions – Syntax
A regular expression 𝑅𝑅 is defined recursively using the
following rules:
1. 𝜀𝜀, ∅, and 𝑎𝑎 are regular expressions for every 𝑎𝑎 ∈ Σ
2. If 𝑅𝑅1 and 𝑅𝑅2 are regular expressions, then so are
(𝑅𝑅1∪ 𝑅𝑅2), (𝑅𝑅1∘ 𝑅𝑅2), and (𝑅𝑅1
∗)
Examples: (over Σ = {𝑎𝑎, 𝑏𝑏, 𝑐𝑐}) (with simplified notation)
𝑎𝑎𝑏𝑏 𝑎𝑎𝑏𝑏∗𝑐𝑐 ∪ 𝑎𝑎∗𝑏𝑏 ∗ ∅
2/10/2021 CS332 – Theory of Computation 2
Regular Expressions – Semantics
𝐿𝐿(𝑅𝑅) = the language a regular expression describes
1. 𝐿𝐿(∅) = ∅
2. 𝐿𝐿 𝜀𝜀 = 𝜀𝜀
3. 𝐿𝐿(𝑎𝑎) = {𝑎𝑎} for every 𝑎𝑎 ∈ Σ
4. 𝐿𝐿((𝑅𝑅1∪ 𝑅𝑅2)) = 𝐿𝐿(𝑅𝑅1) ∪ 𝐿𝐿(𝑅𝑅2)
5. 𝐿𝐿((𝑅𝑅1∘ 𝑅𝑅2)) = 𝐿𝐿(𝑅𝑅1) ∘ 𝐿𝐿(𝑅𝑅2)
6. 𝐿𝐿 𝑅𝑅1
∗ = (𝐿𝐿 𝑅𝑅1 )∗
Example: 𝐿𝐿 𝑎𝑎∗𝑏𝑏∗ = {𝑎𝑎𝑚𝑚𝑏𝑏𝑛𝑛 ∣ 𝑚𝑚,𝑛𝑛 ≥ 0}
2/10/2021 CS332 – Theory of Computation 3
Regular Expressions Describe Regular Languages
Theorem: A language 𝐴𝐴 is regular if and only if it is
described by a regular expression
Theorem 1: Every regular expression has an equivalent NFA
Theorem 2: Every NFA has an equivalent regular expression
2/10/2021 CS332 – Theory of Computation 4
Regular expression -> NFA
Theorem 1: Every regex has an equivalent NFA
Proof: Induction on size of a regex
Base cases:
𝑅𝑅 = ∅
𝑅𝑅 = 𝜀𝜀
𝑅𝑅 = 𝑎𝑎
2/10/2021 CS332 – Theory of Computation 5
Regular expression -> NFA
Theorem 1: Every regex has an equivalent NFA
Proof: Induction on size of a regex
Inductive step:
𝑅𝑅 = (𝑅𝑅1∪ 𝑅𝑅2)
𝑅𝑅 = (𝑅𝑅1𝑅𝑅2)
𝑅𝑅 = 𝑅𝑅1
∗
2/10/2021 CS332 – Theory of Computation 6
Example Convert (1(0 ∪ 1))∗ to an NFA
2/10/2021 CS332 – Theory of Computation 7
Regular Expressions Describe Regular Languages
Theorem: A language 𝐴𝐴 is regular if and only if it is
described by a regular expression
Theorem 1: Every regular expression has an equivalent NFA
Theorem 2: Every NFA has an equivalent regular expression
2/10/2021 CS332 – Theory of Computation 8
NFA -> Regular expression
Theorem 2: Every NFA has an equivalent regex
Proof idea: Simplify NFA by “ripping out” states one at a
time and replacing with regexes
2/10/2021 CS332 – Theory of Computation 9
0
1
001*0
Generalized NFAs
• Every transition is labeled by a regex
• One start state with only outgoing transitions
• Only one accept state with only incoming transitions
• Start state and accept state are distinct
2/10/2021 CS332 – Theory of Computation 10
𝑞𝑞
𝑎𝑎∗𝑏𝑏
𝑞𝑞𝑠𝑠 𝑞𝑞𝑎𝑎
𝑎𝑎 ∪ 𝑏𝑏
𝑎𝑎
Generalized NFA Example
2/10/2021 CS332 – Theory of Computation 11
𝑅𝑅(𝑞𝑞𝑠𝑠, 𝑞𝑞) =
𝑅𝑅(𝑞𝑞𝑎𝑎, 𝑞𝑞) =
𝑅𝑅(𝑞𝑞, 𝑞𝑞𝑠𝑠) =
𝑞𝑞
𝑎𝑎∗𝑏𝑏
𝑞𝑞𝑠𝑠 𝑞𝑞𝑎𝑎
𝑎𝑎 ∪ 𝑏𝑏
𝑎𝑎
Which of these strings is accepted?
Which of the following strings is accepted by this GNFA?
2/10/2021 CS332 – Theory of Computation 12
a) 𝑎𝑎𝑎𝑎𝑎𝑎
b) 𝑎𝑎𝑎𝑎𝑏𝑏𝑏𝑏
c) 𝑏𝑏𝑏𝑏𝑏𝑏
d) 𝑏𝑏𝑏𝑏𝑎𝑎
𝑞𝑞
𝑎𝑎∗𝑏𝑏
𝑞𝑞𝑠𝑠 𝑞𝑞𝑎𝑎
𝑎𝑎 ∪ 𝑏𝑏
𝑎𝑎
NFA -> Regular expression
2/10/2021 CS332 – Theory of Computation 13
NFA GNFA
GNFA
GNFA
Regex
𝑘𝑘 states
𝑘𝑘 + 2 states
𝑘𝑘 + 1 states
2 states
…
NFA -> GNFA
2/10/2021 CS332 – Theory of Computation 14
NFAε
ε
ε
ε
• Add a new start state with no incoming arrows.
• Make a unique accept state with no outgoing arrows.
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one
out and relabel the arrows with regexes to account for the
missing state
2/10/2021 CS332 – Theory of Computation 15
𝑎𝑎∗𝑏𝑏
𝑞𝑞1 𝑞𝑞3
𝑎𝑎
𝑞𝑞2
𝑞𝑞1 𝑞𝑞3
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one
out and relabel the arrows with regexes to account for the
missing state
2/10/2021 CS332 – Theory of Computation 16
𝑎𝑎∗𝑏𝑏
𝑞𝑞1 𝑞𝑞3
𝑎𝑎
𝑞𝑞2
𝑎𝑎 ∪ 𝑏𝑏
𝑞𝑞1 𝑞𝑞3
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one
out and relabel the arrows with regexes to account for the
missing state
2/10/2021 CS332 – Theory of Computation 17
𝑎𝑎∗𝑏𝑏
𝑞𝑞1 𝑞𝑞3
𝑎𝑎
𝑞𝑞2
𝑎𝑎 ∪ 𝑏𝑏
𝑏𝑏
𝑞𝑞1 𝑞𝑞3
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one
out and relabel the arrows with regexes to account for the
missing state
2/10/2021 CS332 – Theory of Computation 18
𝑞𝑞1 𝑞𝑞3𝑞𝑞2
𝑅𝑅1
𝑅𝑅2
𝑅𝑅3
𝑅𝑅4
𝑞𝑞1 𝑞𝑞3
2/10/2021 CS332 – Theory of Computation 19
Non-Regular Languages
2/10/2021 CS332 – Theory of Computation 20
Motivating Questions
• We’ve seen techniques for showing that languages are
regular
• How can we tell if we’ve found the smallest DFA
recognizing a language?
• Are all languages regular? How can we prove that a
language is not regular?
2/10/2021 CS332 – Theory of Computation 21
An Example
𝐴𝐴 = 𝑤𝑤 ∈ {0, 1}∗ 𝑤𝑤 ends with 01
Claim: Every DFA recognizing 𝐴𝐴 needs at least 3 states
Proof: Let 𝑀𝑀 be any DFA recognizing 𝐴𝐴. Consider running
𝑀𝑀 on each of 𝑥𝑥 = 𝜀𝜀,𝑦𝑦 = 0,𝑤𝑤 = 01
2/10/2021 CS332 – Theory of Computation 22
A General Technique
Definition: Strings 𝑥𝑥 and 𝑦𝑦 are distinguishable by 𝐿𝐿 if
there exists 𝑧𝑧 such that exactly one of 𝑥𝑥𝑧𝑧 or 𝑦𝑦𝑧𝑧 is in 𝐿𝐿.
Ex. 𝑥𝑥 = 𝜀𝜀, 𝑦𝑦 = 0
Definition: A set of strings 𝑆𝑆 is pairwise distinguishable by
𝐿𝐿 if every pair of distinct strings 𝑥𝑥,𝑦𝑦 ∈ 𝑆𝑆 is distinguishable
by 𝐿𝐿.
Ex. 𝑆𝑆 = {𝜀𝜀, 0, 01}
2/10/2021 CS332 – Theory of Computation 23
𝐴𝐴 = 𝑤𝑤 ∈ {0, 1}∗ 𝑤𝑤 ends with 01
A General Technique
Theorem: If 𝑆𝑆 is pairwise distinguishable by 𝐿𝐿, then every
DFA recognizing 𝐿𝐿 needs at least |𝑆𝑆| states
Proof: Let 𝑀𝑀 be a DFA with < 𝑆𝑆 states. By the
pigeonhole principle, there are 𝑥𝑥,𝑦𝑦 ∈ 𝑆𝑆 such that 𝑀𝑀 ends
up in same state on 𝑥𝑥 and 𝑦𝑦
2/10/2021 CS332 - Theory of Computation 24
Back to Our Example
𝐴𝐴 = 𝑤𝑤 ∈ {0, 1}∗ 𝑤𝑤 ends with 01
Theorem: If 𝑆𝑆 is pairwise distinguishable by 𝐿𝐿, then every
DFA recognizing 𝐿𝐿 needs at least |𝑆𝑆| states
𝑆𝑆 = {𝜀𝜀, 0, 01}
2/10/2021 CS332 - Theory of Computation 25
Another Example
𝐵𝐵 = 𝑤𝑤 ∈ {0, 1}∗ 𝑤𝑤 = 2
Theorem: If 𝑆𝑆 is pairwise distinguishable by 𝐿𝐿, then every
DFA recognizing 𝐿𝐿 needs at least |𝑆𝑆| states
𝑆𝑆 = { }
2/10/2021 CS332 - Theory of Computation 26
Distinguishing Extension
Which of the following is a distinguishing extension for 𝑥𝑥 =
0 and 𝑦𝑦 = 00 for language 𝐵𝐵 = 𝑤𝑤 ∈ {0, 1}∗ 𝑤𝑤 = 2 ?
a) 𝑧𝑧 = 𝜀𝜀
b) 𝑧𝑧 = 0
c) 𝑧𝑧 = 1
d) 𝑧𝑧 = 00
2/10/2021 CS332 - Theory of Computation 27
Non-Regularity
Theorem: If 𝑆𝑆 is pairwise distinguishable by 𝐿𝐿, then every
DFA recognizing 𝐿𝐿 needs at least |𝑆𝑆| states
Corollary: If 𝑆𝑆 is an infinite set that is pairwise
distinguishable by 𝐿𝐿, then no DFA recognizes 𝐿𝐿
2/10/2021 CS332 - Theory of Computation 28
The Classic Example
Theorem: 𝐴𝐴 = 0𝑛𝑛1𝑛𝑛 𝑛𝑛 ≥ 0} is not regular
Proof: Construct an infinite pairwise distinguishable set
2/10/2021 CS332 - Theory of Computation 29
BU CS 332 – Theory of Computation
Regular Expressions – Syntax
Regular Expressions – Semantics
Regular Expressions Describe Regular Languages
Regular expression -> NFA
Regular expression -> NFA
Example
Regular Expressions Describe Regular Languages
NFA -> Regular expression
Generalized NFAs
Generalized NFA Example
Which of these strings is accepted?
NFA -> Regular expression
NFA -> GNFA
GNFA -> Regular expression
GNFA -> Regular expression
GNFA -> Regular expression
GNFA -> Regular expression
Slide Number 19
Non-Regular Languages
Motivating Questions
An Example
A General Technique
A General Technique
Back to Our Example
Another Example
Distinguishing Extension
Non-Regularity
The Classic Example