FIT2014 Theory of Computation Lecture 17 The Pumping Lemma for Context-Free Languages
Monash University
Faculty of Information Technology
FIT2014 Theory of Computation
Lecture 17
The Pumping Lemma for Context-Free Languages
slides by
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
Warning
This material has been reproduced and communicated to you by or on behalf of Monash University
in accordance with s113P of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act.
Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
1 / 22
Overview
I Pumping Lemma for CFLs
I Proof
I application:
showing that some languages
are not context-free
all languages
CFLs
regular
languages
?
2 / 22
Pumping Lemma for Regular Languages (paraphrased)
Recall:
If a Finite Automaton
with N states
accepts
a sufficiently long string,
then the path taken by the string
in the FA
contains a repeated state.
This enables us to “pump” the string —
by repeating one substring —
to generate an infinite family
of members of the language.
3 / 22
EQUAL
1. S → ε
2. S → bA
3. S → aB
4. A → a
5. A → aS
6. A → bAA
7. B → b
8. B → bS
9. B → aBB
Parse tree for bbaaParse tree for bbbaaaParse tree for bbbbaaaaParse tree for baParse tree for bbaaParse tree for bbbaaaParse tree for bbbbaaaaParse tree for ba
S
b A
b A A
aab A A
aab A A
aa
a
4 / 22
Nonterminal repetition in parse tree paths
S
E
E
u zv yx
uvxyz
S
E
E
E
u zv y
v yx
uv2xy2z
S
E
E
E
E
u zv y
v y
v yx
uv3xy3z
5 / 22
Nonterminal repetition in parse tree paths
S
E
E
u zv yx
uvxyz
S
E
u z
x
uxz
6 / 22
Nonterminal repetition in parse tree paths
In a parse tree:
If
length of a root-to-leaf path > # nonterminals
then
some nonterminal appears twice on that path.
Nonterminals: S , E , T , F
S
E
T
F
E
4
7 / 22
Nonterminal repetition in parse tree paths
How can we ensure that this happens?
How to guarantee that the parse tree for a sufficiently long string
has a path with a repeated state?
Consider:
length of a path from root to leaf = # non-leaf nodes in that path.
Each non-leaf node has a nonterminal symbol.
If
max root-to-leaf path length > # nonterminal symbols in the grammar
then some nonterminal symbol occurs twice on that path.
How to guarantee that the parse tree for a sufficiently long string has a sufficiently
long root-to-leaf path?
8 / 22
Nonterminal repetition in binary parse tree paths
Let’s use binary parse trees
Binary parse tree:
2max path length ≥ # leaves = word length
If
word length > 2# nonterminals
then
2max path length > 2# nonterminals ,
∴ max path length > # nonterminals
and so there will be a repeated symbol
in a root-to-leaf path.
S
T T
U U U U
a b b a b a a b
9 / 22
Nonterminal repetition in binary parse tree paths
Let’s use binary parse trees, from Chomsky Normal Form grammars!
Binary CNF parse tree:
2max path length−1 ≥ # leaves = word length
If
word length > 2# nonterminals−1
then
2max path length−1 > 2# nonterminals−1,
∴ max path length > # nonterminals
and so there will be a repeated symbol
in a root-to-leaf path.
S
T T
U U U U
a b b a
10 / 22
Pumping Lemma for CFLs
Let L be any context-free language that has a CFG in CNF with k non-terminal symbols.
Then for every word w ∈ L with > 2k−1 letters,
there exist strings u, v , x , y , z with vy 6= ε︸ ︷︷ ︸
(i.e., v , y not both ε)
such that
I w = uvxyz
I |vxy | ≤ 2k , and
I for all i ≥ 0, uv ixy iz ∈ L,
i.e.,
uxz , uvxyz , uvvxyyz , . . . , uvnxynz , . . . ∈ L.
Symbolically:
∀w ∈ L : |w | ≥ 2k−1 ⇒
(
∃u, v , x , y , z : (w = uvxyz) ∧ (vy 6= ε) ∧ (|vxy | ≤ 2k)
∧ (∀i ≥ 0 : uv ixy iz ∈ L)
)
11 / 22
Pumping Lemma for CFLs
Proof. (outline)
Take any word w ∈ L with > 2k−1 letters.
Let T be a parse tree for w , using the CNF CFG for L.
By our earlier Observations on root-to-leaf paths in CNF parse trees,
some root-to-leaf path P in T contains a repeated nonterminal symbol.
Among all pairs of nodes in P containing the same nonterminals,
choose the pair q, r , with q above r , such that q is as far as possible down the path P.
This ensures all nonterminals below q on P are distinct.
(Reason to be revealed later.)
12 / 22
Pumping Lemma for CFLs
Reading the letters of w from left to right, from the leaves of the tree, define:
u be the letters of w to the left of the subtree Tq rooted at node q.
v be the letters at the leaves of Tq that are to the left of the subtree Tr rooted at r .
x be the letters at the leaves of Tr .
y be the letters of Tq that are to the right of the subtree Tr .
z be the remaining letters of w , i.e., those to the right of Tq.
T
S
q
P
r
u zv yx
Tq
q
r
v yx
Tr
r
x
13 / 22
Pumping Lemma for CFLs
We have:
I w = uvxyz by construction.
I Since q, r are distinct nodes of the path P with q above r ,
the tree Tr is a proper subtree of Tq.
I Furthermore, since the grammar is in CNF, q has two children,
and only one of them is above r ,
so Tq has some leaves that do not belong to Tr .
I Therefore vy 6= ε.
14 / 22
Pumping Lemma for CFLs
I By our choice of q and r , all nonterminals appearing below q on P are distinct.
I Since we have k nonterminals altogether,
the subpath of P from q downwards has ≤ k + 2 nodes
(being q, then at most k nonterminals, then the leaf).
I Therefore it has length ≤ k + 1.
Therefore Tq has ≤ 2k leaves.
These leaves are the strings v , x , y , in order.
I Therefore |vxy | ≤ 2k .
15 / 22
Pumping Lemma for CFLs
I Replacement of Tq by Tr gives a parse tree for uxz .
I Replacement of Tr by Tq in T gives a parse tree for uvvxyyz .
I The new copy of Tq contains a copy of Tr .
Replacing that copy of Tr by Tq gives a parse tree for uvvvxyyyz .
I Any parse tree with a copy of Tr can be enlarged, to be a parse tree
of a longer string, by replacing Tr by Tq.
I These observations can be turned into a full formal proof by induction.
16 / 22
A Tale of Two Pumping Lemmas
If a Finite Automaton If a Context-Free Grammar in CNF
with N states with k nonterminals
accepts generates
a sufficiently long string, a sufficiently long string,
then the path taken by the string then some root-to-leaf path
in the FA in the parse tree
contains a repeated state. contains a repeated nonterminal.
This enables us to “pump” the string — This enables us to “pump” the string —
by repeating one substring — by repeating two substrings —
to generate an infinite family to generate an infinite family
of members of the language. of members of the language.
17 / 22
Pumping Lemma for CFLs: application
Consequence
Using the Pumping Lemma for CFLs we can show there are non-context-free languages.
Method
Assume L is context-free.
Then it has a Context-Free Grammar in CNF.
Let k be the number of nonterminal symbols in this CFG.
Choose a suitable word w ∈ L, of length > 2k−1.
Show that, for any u, v , x , y , z such that w = uvxyz and vy 6= ε and |vxy | ≤ 2k . . .
. . . there exists i ≥ 0 s.t. uv ixy iz 6∈ L.
Contradiction.
Compare quantifiers above with those in the Pumping Lemma for CFLs.
18 / 22
Non-context-free languages
L := {anbnan : n ≥ 0} = {ε, aba, aabbaa, aaabbbaaa, . . .}.
Theorem.
L is not context-free.
Proof. (by contradiction)
Assume that L is context-free. Then it has a CFG.
Then there is a CFG in Chomsky Normal Form that generates L \ {ε}.
Let k = # nonterminals in this CNF CFG.
Take N > 2k−1/3.
Choose w = aNbNaN .
Consider any u, v , x , y , z such that
I vy 6= ε,
I |vxy | ≤ 2k , and
I w = uvxyz .
Think: are uxz , uvxyz , uvvxyyz , . . . , uv ixy iz , . . . all in L?
19 / 22
Non-context-free languages
Case 1: v and y are each all a’s, or all b’s, or empty.
For example:
N letters︷ ︸︸ ︷
aaa · · · · · · · · · ·︸ ︷︷ ︸
v
· · aa
N letters︷ ︸︸ ︷
bbb · · · · · ·︸︷︷︸
y
· · · · · · bb
N letters︷ ︸︸ ︷
aaa · · · · · · · · · · · · aa
Then uvvxyyz can no longer have three equal-length stretches of a’s and b’s, since:
I The two strings v , y must each lie entirely within one stretch,
and there are three stretches,
so one of these stretches is unaltered by pumping.
I But at least one of the other stretches is lengthened, because vy 6= ε.
So uv2xy2z 6∈ L.
20 / 22
Non-context-free languages
Case 2: Either v or y contains ab.
For example: aaa · · · · · · · · · · · · aabbb · · ·︸ ︷︷ ︸
v
· · · · · · · · · bbaaa · · · · · · · · · · · · aa
Then uvvxyyz has two occurrences of ab.
This cannot happen for strings in L. So uv2xy2z 6∈ L.
Case 3: Either v or y contains ba.
For example: aaa · · · · · · · · · · · · aabbb · · · · · · · · · · · · bbaaa · · ·︸ ︷︷ ︸
v
· · · · · · · · · aa
Similar argument to Case 2.
In every possible case, we have found an i such that uv ixy iz 6∈ L.
This violates the conclusion of the Pumping Lemma for CFLs.
Contradiction.
21 / 22
Revision
Reading: Sipser, pp. 125–129.
all languages
CFLs
regular
languages
?
22 / 22