Pumping Lemma
Overview
The Pumping Lemma, whether for regular languages or context-free languages, is a necessary property of languages. In other words, any regular language must obey the conditions of the Pumping Lemma for regular languages, and similarly any context-free language must obey the conditions of the Pumping Lemma for context-free languages. In particular, the Pumping Lemma shows that if there is a ”long enough” string in the language, then there is an infinite number of further strings in the language, which follow a particular set of constraints (given in the statement of it).
The usual use of the Pumping Lemma is to show that a given language is either not regular or not context-free, by using a proof by contradiction. This follows the basic form below.
1. Assume the language L is regular (resp. context-free).
2. Apply the Pumping Lemma
3. Find w and i such that w obeys the constraints, but that xyiz (resp. xyizuiv) is not in L 4. This is a contradiction, so L is not regular (resp. context-free).
Pumping Lemma for Regular Languages Statement
Pumping Lemma for Regular Languages: Let L be a regular language. Then there is an integer n ≥ 1 such that for all strings w ∈ L such that |w| ≥ n, there exist strings x, y and z such that w = xyz and
1. |xy| ≤ n
2. y ̸= λ
3. xyiz∈Lforalli≥0
Note that condition 2 is sometimes stated equivalently as length(y) > 0. Justification
The number n in the statement of the Pumping Lemma is the number of states in a DFA that accepts L. For any string w accepted by this DFA with |w| ≥ n, there must be a cycle in the sequence of states. This generates the string y as above, with the strings x and z forming the parts of the string before and after the cycle respectively.
Use
As for the Pumping Lemma for regular languages, the main use is to show languages are not context-free using a proof by contradiction.
The general form of the proof is as follows. 1. Assume L is context-free
2. Apply Pumping Lemma
3. Choose string w ∈ L
4. Use |xy| ≤ n to get information about y
1
5. Choose i such that xyiz ̸∈ L (i = 2 usually works) 6. Contradiction!
From this, we conclude that L is not regular. All such proofs are the same except for steps 3 and 5.
Template
A template for all such proofs is below.
Claim: The language L = … is not regular.
Proof: Assume L is regular. Then the Pumping Lemma applies, and so there is an n ≥ 1 such that for all w ∈ L such that |w| ≥ n, we have w = xyz where
1. |xy| ≤ n
2. y ̸= λ
3. xyiz∈Lforalli≥0
Let w=STRING,andsow∈Land|w|≥n. SobythePumpingLemma, w=xyz=STRING and |xy| ≤ n.
Now as |xy| ≤ n, this means that y … (some information about y). Use information about y to find an i such that xyiz ̸∈ L.
This may involve analysing different cases. A good heuristic is to use i = 2. So we have xyiz ̸∈ L, which is a contraction. Hence L is not regular. QED
Pumping Lemma for Context-free Languages Statement
Pumping Lemma for Context-free Languages: Let L be a context-free language. Then there is an integer n ≥ 1 such that for all strings w ∈ L such that |w| ≥ n, there exist strings x,y,z,u and v such that w = xyzuv and
1. |yzu| ≤ n
2. y̸=λoru̸=λ
3. xyizuiv∈Lforalli≥0
Note that condition 2 is sometimes stated equivalently as length(y) + length(v) > 0, or that it is not the case that both y and v are λ.
Justification
The number n here is basically the number of variables in the grammar for the language when the grammar is in Chomsky normal form. In such a grammar, all derivations are in the form of a binary tree. This means that we can relate the depth of the derivation to the length of the string generated.
Chomsky normal form: A grammar G is in Chomsky normal form if every rule in the grammar is in one of the three following forms:
S→λandthisistheonlyoccurrenceofλinanyrule A→awhereaisaterminal
2
A → BC where B and C are non-terminals
So the only rule that can have an empty body is S → w and every other rule has a body either a
single terminal or exactly two non-terminals.
Lemma: For any context-free grammar G there is an equivalent grammar G′ in Chomsky normal form.
The proof of this can be found in Sudkamp’s book. The key use of Chomsky normal form is as follows. Lemma: Let G be a context-free grammar in Chomsky normal form, and A ⇒∗ w with the (binary)
derivation tree T . If depth(T ) ≤ n, then length(w) ≤ 2n − 1.
Proof: By induction on the depth of T.
Base case: depth(T) = 1 so the derivation is either S ⇒ λ or A ⇒ a. In either case we have length(w) ≤ 1 = 21 − 1, and so the base case holds.
Inductive case: We assume the result holds for all derivations of depth n. Let A ⇒∗ w be a deriva- tionofdepthn+1. AsGisinChomskynormalform,wemusthaveA⇒BC⇒∗ uvwhereB⇒∗ u and C ⇒∗ v. By the hypothesis the derivations B ⇒∗ u and C ⇒∗ v are of length n or less, and so length(u) ≤ 2n−1 and length(v) ≤ 2n−1, and so length(uv) ≤ 2n−1 + 2n−1 = 2 × 2n−1 = 2n. QED
Corollary: Let G be a context-free grammar in Chomsky normal form, and S ⇒∗ w be a derivation of w. If length(w) ≥ 2n, then the derivation tree has depth at least n + 1.
This is basically the contrapositive of the above lemma. From this we get the Pumping Lemma as follows. Let S ⇒ w where |w| = 2|V |. Then the depth of the derivation tree must be at least |V |+1, which means that there is a branch in this tree with a repeated variable, say A. This part of the derivation can be repeated 0, 1, 2, or an arbitrary number of times. This means that if w = xyzuv, we have |yzu| ≤ n (as this is the ’repeating part’, which can’t use more then n variables), y ̸= λ or u ̸= λ (as the number of steps in the derivation is non-zero, and the only occurrence of λ can be in steps of length 1), and xyizuiv ∈ L for all i ≥ 0 (which comes from the ability to use A in the derivation 1, 2, or an arbitrarily large number of times.
Use
As for the Pumping Lemma for regular languages, the main use is to show languages are not context-free using a proof by contradiction.
The general form of the proof is as follows. 1. Assume L is context-free
2. Apply Pumping Lemma
3. Choose string w ∈ L
4. Use |yzu| ≤ n to get information about y and u
5. Choose i such that xyizuiv ̸∈ L (i = 2 usually works) 6. Contradiction!
From this, we conclude that L is not context-free. All such proofs are the same except for steps 3 and 5.
3
Template
A template for all such proofs is below.
Claim: The language L = … is not context-free.
Proof: Assume L is context-free. Then the Pumping Lemma applies, and so there is an n ≥ 1 such that for all w ∈ L such that |w| ≥ n, we have w = xyzuv where
1. |yzu| ≤ n
2. y̸=λoru̸=λ
3. xyizuiv∈Lforalli≥0
Letw=STRING,andsow∈Land|w|≥n. SobythePumpingLemma,w=xyzuv=STRING and |yzu| ≤ n.
Now as |yzu| ≤ n, this means that y and u … (some information about y and u). Use information about y and u to find an i such that xyizuiv ̸∈ L.
This often involves analysing different cases. A good heuristic is to use i = 2.
So we have xyizuiv ̸∈ L, which is a contraction. Hence L is not context-free. QED
Examples
Claim: The language L = {aibici|i ≥ 0} is not context-free.
Proof: Assume L is context-free. Then the Pumping Lemma applies, and so there is an ngeq1 such
that for all w ∈ L such that |w| ≥ n, we have w = xyzuv where 1. |yzu| ≤ n
2. y̸=λoru̸=λ
3. xyizuiv∈Lforalli≥0
Letw=anbncn,andsow∈Land|w|≥n. SobythePumpingLemma,w=xyzuv=anbncn and |yzu| ≤ n.
Now as |yzu| ≤ n, this means that y and u can only contain two different symbols – they are too short to contains a’s and b’s and c’s, as this would require them to have length at least n + 2.
Now if y or u contains ab or ba, then xyyzuuv will contain ba somewhere. This means that xyyzuu ̸∈ L.
So that means y and u can contain only a’s or only b’s or only c’s. Now we need to work through the possibilities here.
If y contains only a’s and u contains only a’s or b’s, then xyyzuuv will contain more a’s then w, but the same number of c’s as w, and so xyyzuuv ̸∈ L.
If y contains only b’s and u contains only b’s or c’s, then xyyzuuv will contain more b’s then w, but the same number of a’s as w, and so xyyzuuv ̸∈ L.
If y contains only c’s and u contains only c’s, then xyyzuuv will contain more c’s then w, but the same number of a’s and b’s as w, and so xyyzuuv ̸∈ L.
So this exhausts all the possibilities for y and u, given that |yzu| ≤ n. So we have a contradiction, which means that L is not context-free. QED
Claim: The language L = {aibjaibj|i,j ≥ 0} is not context-free.
Proof: Assume L is context-free. Then the Pumping Lemma applies, and so there is an ngeq1 such that for all w ∈ L such that |w| ≥ n, we have w = xyzuv where
4
1. |yzu| ≤ n
2. y̸=λoru̸=λ
3. xyizuiv∈Lforalli≥0
Letw=anbnanbn,andsow∈Land|w|≥n. SobythePumpingLemma,w=xyzuv=anbnanbn and |yzu| ≤ n.
Nowas|yzu|≤n,thismeansthatyanduareeachoftheformakbl orbkal,wherek+l>0,ienot both k and l are 0.
Now if y or u contains ab or ba, then xyyzuuv will not be of the form aibjaibj, as there will be at least three non-empty sequences of a’s and b’s.
So that means y and u can contain only a’s or only b’s. Now we need to work through the possibilities here.
If y contains only a’s and u contains only a’s, then as |yzu| < n, y and u must both belong to the first cluster of a’s, or both to the second cluster. In the first case, xyyzuuv will have a mismatched number of a’s (more in the first cluster than the second) and so xyyzuuv ̸∈ L. In the second case, xyyzuuv will have a mismatched number of a’s (more in the second cluster than the first) and so xyyzuuv ̸∈ L.
If y contains only a’s and u contains only b’s, then either both belong to the first cluster of a’s and b’s, or both belong to the second cluster of a’s and b’s. In the first case, xyyzuuv will have a mismatched number of a’s or b’s (more in the first cluster than the second), and so xyyzuuv ̸∈ L. In the second case, xyyzuuv will have a mismatched number of a’s or b’s (more in the second cluster than the first) and so xyyzuuv ̸∈ L.
If y contains only b’s and u contains only a’s, then y must belong to the first cluster of b’s, and u to the second cluster of a’s. So xyyzuuv will have a mismatched number of b’s (more in the first cluster than the second) or a mismatched number of a’s (more in the second cluster than the first), and so xyyzuuv ̸∈ L.
If y contains only b’s and u contains only b’s, then as |yzu| < n, y and u must both belong to the first cluster of b’s, or both to the second cluster. In the first case, xyyzuuv will have a mismatched number of b’s (more in the first cluster than the second) and so xyyzuuv ̸∈ L. In the second case, xyyzuuv will have a mismatched number of b’s (more in the second cluster than the first) and so xyyzuuv ̸∈ L.
So this exhausts all the possibilities for y and u, given that |yzu| ≤ n. So we have a contradiction, which means that L is not context-free. QED
5