BU CS 332 – Theory of Computation
Lecture 6:
• NFAs -> Regular expressions • Context-free grammars
• Pumping lemma for CFLs
Reading:
Sipser Ch 1.3, 2.1, 2.3
Mark Bun February 10, 2020
Regular Expressions – Syntax
A regular expression 𝑅 is defined recursively using the following rules:
1. 𝜀, ∅, and 𝑎 are regular expressions for every 𝑎 ∈ Σ 2. If 𝑅 and 𝑅 are regular expressions, then so are
12
(𝑅 ∪𝑅 ),(𝑅 𝑅 ),and(𝑅∗) 12121
Examples: (over Σ = {𝑎, 𝑏, 𝑐})
𝑎𝑏 (𝑎𝑏∗ ∪ 𝑎∗𝑏)∗
∅∗
2/10/2020
CS332 – Theory of Computation
2
Regular Expressions – Semantics
𝐿(𝑅) = the language a regular expression describes
1. 𝐿(∅)=∅
2. 𝐿 𝜀 = 𝜀
3. 𝐿(𝑎) = {𝑎} for every 𝑎 ∈ Σ
4. 𝐿((𝑅 ∪𝑅 ))=𝐿(𝑅 )∪𝐿(𝑅 ) 1212
5. 𝐿((𝑅 𝑅 ))=𝐿(𝑅 )∘𝐿(𝑅 ) 1212
6.𝐿𝑅∗ =(𝐿𝑅)∗ 11
Example: 𝐿(𝑎∗𝑏∗) = {𝑎𝑚𝑏𝑛|𝑚, 𝑛 ≥ 0}
2/10/2020 CS332 – Theory of Computation 3
Regular Expressions Describe Regular Languages
Theorem: A language 𝐴 is regular if and only if it is described by a regular expression
Theorem 1: Every regular expression has an equivalent NFA Theorem 2: Every NFA has an equivalent regular expression
2/10/2020 CS332 – Theory of Computation 4
NFA -> Regular expression
Theorem 2: Every NFA has an equivalent regex
Proof idea: Simplify NFA by “ripping out” states one at a
time and replacing with regexes
0 01*0 0
1
2/10/2020 CS332 – Theory of Computation 5
Generalized NFAs
• Every transition is labeled by a regex
• One start state with only outgoing transitions
• Only one accept state with only incoming transitions • Start state and accept state are distinct
𝑎∪𝑏
𝑞 𝑎 𝑞𝑎
CS332 – Theory of Computation 6
𝑎∗𝑏 𝑞𝑠
2/10/2020
Generalized NFA Example
𝑎∪𝑏
𝑎∗𝑏 𝑞𝑠
𝑞 𝑎 𝑞𝑎 𝑅(𝑞𝑠,𝑞) =
𝑅(𝑞𝑎,𝑞) = 𝑅(𝑞,𝑞𝑠) =
2/10/2020
CS332 – Theory of Computation 7
NFA -> Regular expression
NFA
GNFA
𝑘 states
𝑘 + 2 states
𝑘 + 1 states
GNFA
GNFA
2 states
2/10/2020 CS332 – Theory of Computation
8
Regex
…
NFA -> GNFA
NFA
ε
ε
ε
ε
• Add a new start state with no incoming arrows.
• Make a unique accept state with no outgoing arrows.
2/10/2020 CS332 – Theory of Computation 9
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one out and relabel the arrows with regexes to account for the missing state
𝑎∗𝑏 𝑞1
𝑞𝑎𝑞 23
𝑞1
𝑞3
2/10/2020
CS332 – Theory of Computation
10
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one out and relabel the arrows with regexes to account for the
missing state
𝑎∪𝑏
𝑞𝑎𝑞 23
𝑎∗𝑏 𝑞1
𝑞1
𝑞3
2/10/2020
CS332 – Theory of Computation
11
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one out and relabel the arrows with regexes to account for the
missing state
𝑎∪𝑏
𝑞𝑎𝑞 23
𝑏
𝑎∗𝑏 𝑞1
𝑞1
𝑞3
2/10/2020
CS332 – Theory of Computation
12
GNFA -> Regular expression
Idea: While the machine has more than 2 states, rip one out and relabel the arrows with regexes to account for the
missing state
𝑅2
𝑞 𝑅3 𝑞
𝑅4
𝑅1
𝑞1
23
𝑞1
𝑞3
2/10/2020
CS332 – Theory of Computation
13
Example
2/10/2020 CS332 – Theory of Computation 14
2/10/2020 CS332 – Theory of Computation 15
2/10/2020 CS332 – Theory of Computation 16
Context-Free Grammars
2/10/2020 CS332 – Theory of Computation 17
Some History
An abstract model for two distinct problems
Rules for parsing natural languages
2/10/2020 CS332 – Theory of Computation 18
Some History
An abstract model for two distinct problems
Specification of syntax and compilation for programming languages
1977 ACM Turing Award citation (John Backus)
For profound, influential, and lasting contributions to the design of practical high- level programming systems, notably through his work on FORTRAN, and for seminal publication of formal procedures for the specification of programming languages.
2/10/2020 CS332 – Theory of Computation 19
Context-Free Grammar (Informal)
Example Grammar 𝐺
𝐴 → 0𝐴1 𝐴→𝐵 𝐵→#
Derivation
𝐿(𝐺) =
2/10/2020
CS332 – Theory of Computation 20
Context-Free Grammar (Informal)
Example Grammar 𝐺
𝐸 →𝐸+𝑇 𝐸→𝑇 𝑇→𝑇×𝐹 𝑇→𝐹
𝐹 →(𝐸) 𝐹→𝑎 𝐹→𝑏
Derivation
𝐿(𝐺) =
2/10/2020
CS332 – Theory of Computation 21
Socially Awkward Professor Grammar
2/10/2020
CS332 – Theory of Computation 22
Socially Awkward Professor Grammar
2/10/2020 CS332 – Theory of Computation 23
Context-Free Grammar (Formal)
A CFG is a 4-tuple 𝐺 = 𝑉,Σ,𝑅,𝑆
• 𝑉 is a finite set of variables
• Σ is a finite set of terminal symbols (disjoint from 𝑉)
• 𝑅 is a finite set of production rules of the form 𝐴 → 𝑤, where𝐴∈𝑉and𝑤∈(𝑉∪ Σ)∗
• 𝑆 ∈ 𝑉 is the start symbol
Example: 𝐺 = ({𝑆},Σ,𝑅,𝑆) where 𝑅 = {𝑆 → 𝑎𝑆𝑏,𝑆 → 𝜀}
2/10/2020 CS332 – Theory of Computation 24
Context-Free Grammar (Formal)
A CFG is a 4-tuple 𝐺 = 𝑉,Σ,𝑅,𝑆
𝑉 = variables Σ = terminals 𝑅 = rules 𝑆 = start
• We say 𝑢𝐴𝑣 ⇒ 𝑢𝑤𝑣 (“𝑢𝐴𝑣 yields 𝑢𝑤𝑣”) if 𝐴 → 𝑤 is a rule of the grammar
• We say 𝑢 ∗ 𝑣 (“𝑢 derives 𝑣”) if 𝑢 = 𝑣 or there exists a ⇒
sequence such that 𝑢 ⇒ 𝑢1 ⇒ 𝑢2 ⇒ ⋯ ⇒ 𝑣
• Languageofthegrammar:𝐿 𝐺 ={𝑤∈Σ∗|𝑆 ∗ 𝑤}
⇒
Example: 𝐺 = ({𝑆},Σ,𝑅,𝑆) where 𝑅 = {𝑆 → 𝑎𝑆𝑏,𝑆 → 𝜀} 𝐿 𝐺 = 𝑎𝑛𝑏𝑛 𝑛 ≥ 0}
2/10/2020 CS332 – Theory of Computation 25
CFG Examples
Give context-free grammars for the following languages 1. The empty language
2. Strings of properly nested parentheses
3. Strings with equal # of 𝑎’s and 𝑏’s
2/10/2020 CS332 – Theory of Computation 26
Pumping Lemma II: Pump Harder
2/10/2020 CS332 – Theory of Computation 27
Non context-free languages?
• Could it be the case that every language is context-free? 2/10/2020 CS332 – Theory of Computation 28
Pumping Lemma for regular languages
Let 𝐿 be a regular language.
Then there exists a “pumping length” 𝑝 such that
For every 𝑤 ∈ 𝐿 where |𝑤| ≥ 𝑝,
𝑤 can be split into three parts 𝑤 = 𝑥𝑦𝑧 where:
1. |𝑦| > 0
2. |𝑥𝑦| ≤ 𝑝
3. 𝑥𝑦𝑖𝑧𝐿forall𝑖 ≥ 0
2/10/2020 CS332 – Theory of Computation 29
Pumping Lemma for context-free languages
Let 𝐿 be a context-free language.
Then there exists a “pumping length” 𝑝 such that
For every 𝑤 ∈ 𝐿 where |𝑤| ≥ 𝑝,
𝑤 can be split into five parts 𝑤 = 𝑢𝑣𝑥𝑦𝑧 where:
Example:
𝐿= 𝑤∈ 0,1∗𝑤=𝑤𝑅 𝑤=0
1. |𝑣𝑦| > 0
2. |𝑣𝑥𝑦| ≤ 𝑝
3. 𝑢𝑣𝑖𝑥𝑦𝑖𝑧𝐿forall𝑖 ≥ 0
2/10/2020 CS332 – Theory of Computation 30
Pumping Lemma for context-free languages
Let 𝐿 be a context-free language.
Then there exists a “pumping length” 𝑝 such that
For every 𝑤 ∈ 𝐿 where |𝑤| ≥ 𝑝,
𝑤 can be split into five parts 𝑤 = 𝑢𝑣𝑥𝑦𝑧 where:
Example:
𝐿= 𝑤∈ 0,1∗𝑤=𝑤𝑅 𝑤 = 010
1. |𝑣𝑦| > 0
2. |𝑣𝑥𝑦| ≤ 𝑝
3. 𝑢𝑣𝑖𝑥𝑦𝑖𝑧𝐿forall𝑖 ≥ 0
2/10/2020 CS332 – Theory of Computation 31
Pumping Lemma as a game
1. YOU pick the language 𝐿 to be proved non context-free.
2. ADVERSARY picks a possible pumping length 𝑝.
3. YOU pick 𝑤 of length at least 𝑝.
4. ADVERSARY divides 𝑤 into 𝑢, 𝑣, 𝑥, 𝑦, 𝑧, obeying rules of the Pumping Lemma: |𝑣𝑦| > 0 and |𝑣𝑥𝑦| ≤ 𝑝.
5. YOU win by finding 𝑖 ≥ 0, for which 𝑢𝑣𝑖𝑥𝑦𝑖𝑧 is not in 𝐿.
If regardless of how the ADVERSARY plays this game, you
can always win, then 𝐿 is non context-free
2/10/2020 CS332 – Theory of Computation 32
Pumping Lemma example
Claim: 𝐿 = 𝑎𝑛𝑏𝑛𝑐𝑛 𝑛 ≥ 0} is not regular
Proof: Assume 𝐿 is regular with pumping length 𝑝
1.Find𝑤∈𝐿with 𝑤 ≥𝑝
2. Show that 𝑤 cannot be pumped
If 𝑤 = 𝑢𝑣𝑥𝑦𝑧 with |𝑣𝑦| > 0,|𝑣𝑥𝑦| ≤ 𝑝, then…
2/10/2020 CS332 – Theory of Computation 33
Pumping Lemma example
Claim: 𝐿 = 𝑎𝑛𝑏𝑛𝑐𝑛 𝑛 ≥ 0} is not regular
Proof: Assume 𝐿 is regular with pumping length 𝑝
1.Find𝑤∈𝐿with 𝑤 ≥𝑝
2. Show that 𝑤 cannot be pumped
If 𝑤 = 𝑢𝑣𝑥𝑦𝑧 with |𝑣𝑦| > 0,|𝑣𝑥𝑦| ≤ 𝑝, then…
2/10/2020 CS332 – Theory of Computation 34
Pumping Lemma example
Claim: 𝐿 = 𝑎𝑛𝑏𝑛𝑐𝑛 𝑛 ≥ 0} is not regular
Proof: Assume 𝐿 is regular with pumping length 𝑝
1.Find𝑤∈𝐿with 𝑤 ≥𝑝
2. Show that 𝑤 cannot be pumped
If 𝑤 = 𝑢𝑣𝑥𝑦𝑧 with |𝑣𝑦| > 0,|𝑣𝑥𝑦| ≤ 𝑝, then…
2/10/2020 CS332 – Theory of Computation 35