Slide 1
Context-Free and
Noncontext-Free Languages
Chapter 13
Languages That Are and
Are Not Context-Free
a*b* is regular.
AnBn = {anbn : n 0} is context-free but not regular.
AnBnCn = {anbncn : n 0} is not context-free.
Languages and Machines
The Regular and the CF Languages
Theorem: The regular languages are a proper subset of the context-free languages.
Proof: In two parts:
Every regular language is CF.
Every regular grammar is context-free.
or
Every FSM is a PDA (that is, ignoring its stack).
There exists at least one language that is CF but not regular.
AnBn
How Many Context-Free Languages Are There?
Theorem: There is a countably infinite number of CFLs.
Proof:
● Upper bound: we can lexicographically enumerate
all the CFGs.
● Lower bound: {a}, {aa}, {aaa}, … are all CFLs.
So there must exist some languages that are not context-free:
{anbncn : n 0}
Showing that L is Context-Free
Techniques for showing that a language L is context-free:
1. Exhibit a context-free grammar for L.
2. Exhibit a PDA for L.
3. Use the closure properties of context-free languages.
Unfortunately, these are weaker than they are for
regular languages.
Showing that L is Not Context-Free
Remember the pumping argument for regular languages:
134.unknown
A Review of Parse Trees
A parse tree, derived by a grammar G = (V, , R, S), is a rooted, ordered tree in which:
● Every leaf node is labeled with an element of {},
● The root node is labeled S,
● Every other node is labeled with some element of V – ,
● If m is a nonleaf node labeled X and the children of m are labeled x1, x2, …, xn, then the rule X x1, x2, …, xn is in R.
Example
E E + T
E T
T T * F
T F
F (E)
F id
Some Tree Basics
The height of a tree is the length of the longest path from the root to any leaf.
The branching factor of a tree is the largest number of sons associated with any node in the tree.
Theorem: The length of the yield of any tree T with height h and branching factor b is bh.
From Grammars to Trees
Given a context-free grammar G:
● Let n be the number of nonterminal symbols in G.
● Let b be the branching factor of G, i.e., max{|| : A in G}
Suppose that T is generated by G and no nonterminal appears more than once on any path:
The maximum height of T is:
The maximum length of T’s yield is:
n
bn
135.unknown
The Context-Free Pumping Theorem
This time we use parse trees, not automata as the basis for our argument.
If w is “long” , then its parse trees must look like:
Choose one such tree such that there’s no other with fewer nodes.
Choose the X’s the bottommost instances of a repeating nonterminal.
(|w| > bn)
The Context-Free Pumping Theorem
We have the derivations:
S * uXz * uxz L(G)
S * uXz * uvXyz * uvvXyyz * uvvxyyz L(G)
We can similarly derive all the strings: uv2xy2z, uv3xy3z, … L(G)
Thus: q 0, uvqxyqz L(G)
The Context-Free Pumping Theorem
vy ≠
Proof: If vy = , then the derivation S * uXz * uxz would also yield w and it would create a parse tree with fewer nodes. But that contradicts the assumption that we started with a tree with the smallest possible number of nodes.
The Context-Free Pumping Theorem
The height of the subtree rooted at X[1] is at most: n + 1 (because the X’s are the bottommost repeated nonterminals)
So |vxy| bn + 1.
The Context-Free Pumping Theorem
If L is a context-free language, then k 1 such that any w L with |w| k can be written as w = uvxyz, for some u, v, x, y, z *, such that
vy ,
|vxy| k and
q 0, uvqxyqz L
k serves two roles:
● How long must w be to guarantee it is pumpable?
● What’s the bound on |vxy|?
Let n be the number of nonterminals in G.
Let b be the branching factor of G.
What Is k?
If height(T) > n, then some nonterminal occurs more than once on some path. So T is pumpable.
If height(T) n, then |uvxyz| bn.
So if |w| = |uvxyz| > bn, w = uvxyz must be pumpable.
How Long Must w be?
Recall that we are considering the bottom-most two instances of a repeated nonterminal. Then the yield of the upper one has length at most bn+1. That is, |vxy| ≤ bn+1.
So, we need k = max(bn,bn+1); let k = bn+1.
What’s the Bound on |vxy|?
Example
grammar:
E E + T
E T
T T * F
T F
F (E)
F id
n = 3
b = 3
k = 34 = 81
string
w = id + id * id
w = uvxyz
u = id+
v =
x = id
y = *id
z =
vy ,
|vxy| k
q 0:
uvqxyqz = id+id(*id)q is
in the language L(G)
E
E
T [1]
+
T
F
id
T [2]
F
id
F
id
*
u
v
x
y
z
An Example of Pumping: AnBnCn
AnBnCn = {anbncn, n 0}
Choose w = ak bk ck = uvxyz
1 | 2 | 3
If v or y spans regions, then for q = 2 the order of the letters changes.
If v and y each contain only one letter, then for q = 2 the numbers of letters are not the same.
An Example of Pumping: { : n 0}
L = { , n 0}.
For n = k2 we have w = = uvxyz
vy = ap, for some nonzero p.
For q=2, must be in L but it is too short.
The next longer string in L, for n = k2+1, is
That means, p = 2k2+1 but p = |vy| ≤ k.
Nested and Cross-Serial Dependencies
PalEven = {wwR : w {a, b}*}
a a b b a a
The dependencies are nested – is context-free
WW = {ww : w {a, b}*}
a a b a a b
Cross-serial dependencies – is not context-free
Let ak+1bk+1ak+1bk+1 = uvxyz
For q=0, we have that uxz WW, that is, uxz = ww for some w.
Observe that uxz still has the shape a+b+a+b+ (because |vxy| ≤ k so deleting v and y cannot remove more than k symbols). Moreover, only one or two (adjacent) same-letter regions are affected. Also, uxz = ww means that w must have the shape a+b+.
If one region only is decreased, then ww is not in WW.
If two (adjacent) regions are decreased, then ww is not in WW.
WW = {ww : w {a, b}*}
Closure Theorems for Context-Free Languages
The context-free languages are closed under:
● Union
● Concatenation
● Kleene star
● Reverse
Closure Under Union
Let G1 = (V1, 1, R1, S1), and
G2 = (V2, 2, R2, S2).
Assume that G1 and G2 have disjoint sets of nonterminals,
not including S.
Let L = L(G1) L(G2).
We can show that L is CF by exhibiting a CFG for
it:
G = (V1 V2 {S}, 1 2,
R1 R2 {S S1, S S2},
S)
Closure Under Concatenation
Let G1 = (V1, 1, R1, S1), and
G2 = (V2, 2, R2, S2).
Assume that G1 and G2 have disjoint sets of nonterminals,
not including S.
Let L = L(G1)L(G2).
We can show that L is CF by exhibiting a CFG for it:
G = (V1 V2 {S}, 1 2,
R1 R2 {S S1S2},
S)
Closure Under Kleene Star
Let G = (V, , R, S1).
Assume that G does not have the nonterminal S.
Let L = L(G)*.
We can show that L is CF by exhibiting a CFG for it:
G = (V1 {S}, 1,
R1 {S , S SS1},
S)
Closure Under Reverse
LR= {w * : w = xR for some x L}.
Let G = (V, , R, S) be a context-free grammar.
Every rule in G is of the form X α, for α V*.
Construct, from G, a new grammar G, such that L(G) = LR:
G = (VG, G, R, SG), where R is constructed as follows:
● For every rule X α in G, add X αR.
Closure Under Intersection
The context-free languages are not closed under
intersection:
The proof is by counterexample. Let:
L1 = {anbncm: n, m 0} /* equal a’s and b’s.
L2 = {ambncn: n, m 0} /* equal b’s and c’s.
Both L1 and L2 are context-free, since there exist
straightforward context-free grammars for them.
But now consider:
L = L1 L2
= {anbncn: n 0}
Closure Under Complement
The context-free languages are not closed under
complement:
L1 L2 = (L1 L2)
The context-free languages are closed under union, so if they were closed under complement, they would be closed under intersection (which they are not).
Closure Under Complement
An Example
AnBnCn is context-free:
But (AnBnCn) = AnBnCn is not context-free.
Closure Under Difference
The context-free languages are not closed under
difference:
L = * – L.
* is context-free. So, if the context-free languages were closed under difference, the complement of any context-free language would necessarily be context-free. But we just showed that that is not so.
The Intersection of a Context-Free Language and a Regular Language is Context-Free
L = L(M1), a PDA = (K1, , 1, 1, s1, A1).
R = L(M2), a deterministic FSM = (K2, , , s2, A2).
We construct a new PDA, M3, that accepts L R by simulating the parallel execution of M1 and M2.
M = (K1 K2, , 1, , (s1, s2), A1 A2).
Insert into :
For each rule (( q1, a, ), ( p1, )) in 1,
and each rule ( q2, a, p2) in ,
((( q1, q2), a, ), (( p1, p2), )).
For each rule ((q1, , ), (p1, ) in 1,
and each state q2 in K2,
(((q1, q2), , ), ((p1, q2), )).
This works because: we can get away with only one stack.
Theorem: The difference (L1 – L2) between a context-free language L1 and a regular language L2 is context-free.
Proof: L1 – L2 = L1 L2.
If L2 is regular then so is L2.
If L1 is context-free, so is L1 L2.
The Difference between a Context-Free Language and a Regular Language is Context-Free
L = {w {a, b, c}* : #a(w) = #b(w) = #c(w) }
If L were context-free, then L = L a*b*c* would also be context-free.
But L = AnBnCn
So neither is L.
Using intersection with a regular language
a
n
2
ak
4
a
k
4
p
k
a
+
4
a(k
2+1)2 = ak
4+2k2+1
a
(k
2
+1)
2
=a
k
4
+2k
2
+1