
Context-Free and
Noncontext-Free Languages
Chapter 13

Languages That Are and
Are Not Context-Free
a*b* is regular.

AnBn = {a^n b^n : n ≥ 0} is context-free but not regular.

AnBnCn = {a^n b^n c^n : n ≥ 0} is not context-free.

Languages and Machines

The Regular and the CF Languages
Theorem: The regular languages are a proper subset of the context-free languages.

Proof: In two parts:

Every regular language is CF.
Every regular grammar is context-free.

or
Every FSM is a PDA (one that simply never uses its stack).

There exists at least one language that is CF but not regular.
AnBn

How Many Context-Free Languages Are There?
Theorem: There is a countably infinite number of CFLs.

Proof:
● Upper bound: we can lexicographically enumerate
all the CFGs.

● Lower bound: {a}, {aa}, {aaa}, … are all CFLs.

Since the set of all languages over any nonempty alphabet is uncountable, there must exist some languages that are not context-free, for example:
{a^n b^n c^n : n ≥ 0}

Showing that L is Context-Free
Techniques for showing that a language L is context-free:

1. Exhibit a context-free grammar for L.

2. Exhibit a PDA for L.

3. Use the closure properties of context-free languages.

Unfortunately, the closure properties of the context-free languages are weaker than those of the regular languages.

Showing that L is Not Context-Free
Remember the pumping argument for regular languages:


A Review of Parse Trees
A parse tree, derived by a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which:

● Every leaf node is labeled with an element of Σ ∪ {ε},

● The root node is labeled S,

● Every other node is labeled with some element of V – Σ,

● If m is a nonleaf node labeled X and the children of m are labeled x1, x2, …, xn, then the rule X → x1 x2 … xn is in R.

Example
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

Some Tree Basics
The height of a tree is the length of the longest path from the root to any leaf.

The branching factor of a tree is the largest number of children of any node in the tree.

Theorem: The length of the yield of any tree T with height h and branching factor b is ≤ b^h.
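A one-step sketch of the induction behind this bound (added here, not spelled out on the slide): a tree of height 0 is a single leaf, and for h > 0, assuming the bound for height h − 1,

```latex
\[
|\mathrm{yield}(T)|
  \;\le\; \underbrace{b}_{\text{children of the root}}
          \cdot \underbrace{b^{\,h-1}}_{\text{bound for each subtree}}
  \;=\; b^{h}.
\]
```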

From Grammars to Trees
Given a context-free grammar G:

● Let n be the number of nonterminal symbols in G.
● Let b be the branching factor of G, i.e., max{|α| : A → α is a rule of G}

Suppose that T is generated by G and no nonterminal appears more than once on any path:

The maximum height of T is: n

The maximum length of T's yield is: b^n


The Context-Free Pumping Theorem
This time we use parse trees, not automata, as the basis for our argument.

If w is "long" (|w| > b^n), then any parse tree for w must contain a path on which some nonterminal repeats.

Choose one such tree such that there is no other parse tree for w with fewer nodes.
Choose the X's to be the bottommost two instances of a repeating nonterminal.

The Context-Free Pumping Theorem

We have the derivations:
S * uXz * uxz  L(G)
S * uXz * uvXyz * uvvXyyz * uvvxyyz  L(G)

We can similarly derive all the strings: uv2xy2z, uv3xy3z, …  L(G)

Thus: q  0, uvqxyqz  L(G)

The Context-Free Pumping Theorem

vy ≠ ε
Proof: If vy = ε, then the derivation S ⇒* uXz ⇒* uxz would also yield w and it would create a parse tree with fewer nodes. But that contradicts the assumption that we started with a tree with the smallest possible number of nodes.

The Context-Free Pumping Theorem

The height of the subtree rooted at X[1] is at most n + 1 (because the X's are the bottommost repeated nonterminals).

So |vxy| ≤ b^(n+1).

The Context-Free Pumping Theorem

If L is a context-free language, then ∃k ≥ 1 such that any w ∈ L with |w| ≥ k can be written as w = uvxyz, for some u, v, x, y, z ∈ Σ*, such that:
vy ≠ ε,
|vxy| ≤ k, and
∀q ≥ 0, uv^q x y^q z ∈ L

k serves two roles:
● How long must w be to guarantee it is pumpable?
● What’s the bound on |vxy|?

Let n be the number of nonterminals in G.
Let b be the branching factor of G.
What Is k?

If height(T) > n, then some nonterminal occurs more than once on some path. So T is pumpable.

If height(T) ≤ n, then |uvxyz| ≤ b^n.

So if |w| = |uvxyz| > b^n, w = uvxyz must be pumpable.
How Long Must w be?

Recall that we are considering the bottommost two instances of a repeated nonterminal. Then the yield of the upper one has length at most b^(n+1). That is, |vxy| ≤ b^(n+1).

So we need k = max(b^n, b^(n+1)); let k = b^(n+1).
What’s the Bound on |vxy|?
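A minimal sketch (not from the slides) of how n, b, and k = b^(n+1) can be read off a grammar. The dictionary encoding of rules and the helper name pumping_constant are assumptions made for illustration; the grammar shown is the expression grammar used on the next slide.

```python
# Sketch: compute the pumping constant k = b^(n+1) for a CFG.
# A grammar is encoded (for this illustration) as a dict mapping each
# nonterminal to a list of right-hand sides, each a list of symbols.

def pumping_constant(rules):
    n = len(rules)                               # number of nonterminals
    b = max(len(rhs)                             # branching factor:
            for rhss in rules.values()           # length of the longest
            for rhs in rhss)                     # right-hand side
    return b ** (n + 1)                          # k = b^(n+1)

# The expression grammar from the next slide:
#   E -> E + T | T,   T -> T * F | F,   F -> ( E ) | id
expr_grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}

print(pumping_constant(expr_grammar))   # n = 3, b = 3, so k = 3**4 = 81
```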

Example
Grammar:
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

n = 3
b = 3
k = 3^4 = 81

String:
w = id + id * id
w = uvxyz
u = id+
v = ε
x = id
y = *id
z = ε
vy ≠ ε
|vxy| ≤ k
∀q ≥ 0: uv^q x y^q z = id+id(*id)^q is in the language L(G)
in the language L(G)
[Parse tree for w = id + id * id: the root E expands as E + T; the repeated nonterminal T appears twice on one path as T [1] and T [2], and the leaves divide into u = id+, v = ε, x = id, y = *id, z = ε.]
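A small check of this decomposition (a sketch added here, not part of the slides): the parser below recognizes L(G) using the equivalent non-left-recursive form E → T (+ T)*, T → F (* F)*, F → (E) | id, and confirms that every pumped string uv^q x y^q z = id+id(*id)^q is accepted.

```python
# Sketch: verify that the pumped strings u v^q x y^q z stay in L(G).
# Tokens are "id", "+", "*", "(", ")".

def accepts(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_E():                 # E -> T ("+" T)*
        parse_T()
        while peek() == "+":
            eat("+"); parse_T()

    def parse_T():                 # T -> F ("*" F)*
        parse_F()
        while peek() == "*":
            eat("*"); parse_F()

    def parse_F():                 # F -> "(" E ")" | "id"
        if peek() == "(":
            eat("("); parse_E(); eat(")")
        else:
            eat("id")

    try:
        parse_E()
        return pos == len(tokens)  # all input consumed
    except SyntaxError:
        return False

u, v, x, y, z = ["id", "+"], [], ["id"], ["*", "id"], []
for q in range(5):
    pumped = u + v * q + x + y * q + z          # u v^q x y^q z
    assert accepts(pumped), pumped
print("uv^q x y^q z is in L(G) for q = 0..4")
```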

An Example of Pumping: AnBnCn
AnBnCn = {a^n b^n c^n : n ≥ 0}

Choose w = a^k b^k c^k = uvxyz
              1  |  2  |  3      (the three regions of w)

If v or y spans two regions, then for q = 2 the order of the letters changes.

If v and y each contain only one distinct letter, then for q = 2 the numbers of a's, b's, and c's are no longer all equal.
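A brute-force illustration of this case analysis (a sketch added here; the helper names in_AnBnCn and decompositions are made up for the example): for a small k it enumerates every split w = uvxyz with |vxy| ≤ k and vy ≠ ε, and checks that pumping with q = 0 or q = 2 already leaves the language.

```python
# Sketch: exhaustively confirm the case analysis for w = a^k b^k c^k.

def in_AnBnCn(s):
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def decompositions(w, k):
    # choose 0 <= i <= j <= m <= l <= len(w) so that
    # u = w[:i], v = w[i:j], x = w[j:m], y = w[m:l], z = w[l:]
    n = len(w)
    for i in range(n + 1):
        for j in range(i, n + 1):
            for m in range(j, n + 1):
                for l in range(m, n + 1):
                    u, v, x, y, z = w[:i], w[i:j], w[j:m], w[m:l], w[l:]
                    if v + y and len(v + x + y) <= k:
                        yield u, v, x, y, z

k = 4
w = "a" * k + "b" * k + "c" * k
for u, v, x, y, z in decompositions(w, k):
    pumped_out = any(not in_AnBnCn(u + v * q + x + y * q + z) for q in (0, 2))
    assert pumped_out, (u, v, x, y, z)
print("every legal decomposition of", w, "fails to pump")
```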

An Example of Pumping: {a^(n^2) : n ≥ 0}
L = {a^(n^2) : n ≥ 0}.

For n = k^2 we have w = a^(k^4) = uvxyz.

vy = a^p, for some nonzero p.

For q = 2, a^(k^4 + p) must be in L, but it is too short.

The next longer string in L, for n = k^2 + 1, is a^((k^2 + 1)^2) = a^(k^4 + 2k^2 + 1).

That means p would have to be at least 2k^2 + 1, but p = |vy| ≤ |vxy| ≤ k.
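Spelling out the "too short" arithmetic (added here; recall 1 ≤ p = |vy| ≤ |vxy| ≤ k):

```latex
\[
k^{4} \;<\; k^{4} + p \;\le\; k^{4} + k \;<\; k^{4} + 2k^{2} + 1 \;=\; (k^{2}+1)^{2}
\qquad (k \ge 1),
\]
```

so a^(k^4 + p) lies strictly between two consecutive members of L and cannot be in L.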

Nested and Cross-Serial Dependencies
PalEven = {ww^R : w ∈ {a, b}*}

a a b b a a

The dependencies are nested – PalEven is context-free.

WW = {ww : w ∈ {a, b}*}

a a b a a b

The dependencies are cross-serial – WW is not context-free.

Let w = a^(k+1) b^(k+1) a^(k+1) b^(k+1) = uvxyz

For q = 0, we would have uxz ∈ WW, that is, uxz = ss for some string s.

Observe that uxz still has the shape a+b+a+b+ (because |vxy| ≤ k, so deleting v and y removes at most k symbols, while each region has k+1 of them). Moreover, only one or two (adjacent) same-letter regions are affected. Also, uxz = ss means that s must have the shape a+b+.

If only one region is decreased, then the two halves of uxz cannot match, so uxz ∉ WW.

If two (adjacent) regions are decreased, then again the two halves cannot match, so uxz ∉ WW.
WW = {ww : w ∈ {a, b}*}

Closure Theorems for Context-Free Languages
The context-free languages are closed under:

● Union

● Concatenation

● Kleene star

● Reverse

Closure Under Union
Let G1 = (V1, Σ1, R1, S1), and
G2 = (V2, Σ2, R2, S2).

Assume that G1 and G2 have disjoint sets of nonterminals,
not including S.

Let L = L(G1) ∪ L(G2).

We can show that L is CF by exhibiting a CFG for
it:
G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2,
R1 ∪ R2 ∪ {S → S1, S → S2},
S)

Closure Under Concatenation
Let G1 = (V1, Σ1, R1, S1), and
G2 = (V2, Σ2, R2, S2).

Assume that G1 and G2 have disjoint sets of nonterminals,
not including S.

Let L = L(G1)L(G2).

We can show that L is CF by exhibiting a CFG for it:

G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2,
R1 ∪ R2 ∪ {S → S1 S2},
S)

Closure Under Kleene Star
Let G = (V, , R, S1).

Assume that G does not have the nonterminal S.

Let L = L(G)*.

We can show that L is CF by exhibiting a CFG for it:

G = (V1  {S}, 1,
R1  {S  , S  SS1},
S)

Closure Under Reverse
LR= {w  * : w = xR for some x  L}.

Let G = (V, , R, S) be a context-free grammar.

Every rule in G is of the form X  α, for α  V*.

Construct, from G, a new grammar G, such that L(G) = LR:
G = (VG, G, R, SG), where R is constructed as follows:

● For every rule X  α in G, add X  αR.
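A compact sketch of the four grammar constructions from the preceding slides (union, concatenation, Kleene star, reverse). The tuple-of-dicts encoding is chosen here for illustration; "S0" plays the role of the fresh start symbol S on the slides, and the nonterminal sets are assumed to be disjoint (rename beforehand if they are not).

```python
# Sketch of the closure constructions on CFGs.
# A grammar is (rules, start), where rules maps each nonterminal to a list
# of right-hand sides (lists of symbols); [] is the empty right-hand side ε.

def union(g1, g2):
    (r1, s1), (r2, s2) = g1, g2
    return {**r1, **r2, "S0": [[s1], [s2]]}, "S0"       # S0 -> S1 | S2

def concatenation(g1, g2):
    (r1, s1), (r2, s2) = g1, g2
    return {**r1, **r2, "S0": [[s1, s2]]}, "S0"         # S0 -> S1 S2

def star(g):
    r1, s1 = g
    return {**r1, "S0": [[], ["S0", s1]]}, "S0"         # S0 -> ε | S0 S1

def reverse(g):
    r, s = g
    # every rule X -> α becomes X -> reverse(α)
    return {x: [list(reversed(rhs)) for rhs in rhss] for x, rhss in r.items()}, s

# Example: A generates {a^n b^n}, B generates {c^m}.
A = ({"A": [["a", "A", "b"], []]}, "A")
B = ({"B": [["c", "B"], []]}, "B")
print(union(A, B))            # grammar for L(A) ∪ L(B)
print(concatenation(A, B))    # grammar for L(A)L(B)
print(star(A))                # grammar for L(A)*
print(reverse(A))             # grammar for {b^n a^n}
```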

Closure Under Intersection
The context-free languages are not closed under
intersection:

The proof is by counterexample. Let:

L1 = {a^n b^n c^m : n, m ≥ 0} /* equal a's and b's */
L2 = {a^m b^n c^n : n, m ≥ 0} /* equal b's and c's */

Both L1 and L2 are context-free, since there exist
straightforward context-free grammars for them.

But now consider:
L = L1 ∩ L2
   = {a^n b^n c^n : n ≥ 0}, which we have already seen is not context-free.

Closure Under Complement
The context-free languages are not closed under
complement:

L1 ∩ L2 = ¬(¬L1 ∪ ¬L2)

The context-free languages are closed under union, so if they were closed under complement, they would be closed under intersection (which they are not).

Closure Under Complement
An Example
¬AnBnCn is context-free:

But ¬(¬AnBnCn) = AnBnCn is not context-free.

Closure Under Difference
The context-free languages are not closed under
difference:

L = * – L.

* is context-free. So, if the context-free languages were closed under difference, the complement of any context-free language would necessarily be context-free. But we just showed that that is not so.

The Intersection of a Context-Free Language and a Regular Language is Context-Free
L = L(M1), where M1 is a PDA (K1, Σ, Γ1, Δ1, s1, A1).
R = L(M2), where M2 is a deterministic FSM (K2, Σ, δ, s2, A2).

We construct a new PDA, M3, that accepts L ∩ R by simulating the parallel execution of M1 and M2.

M3 = (K1 × K2, Σ, Γ1, Δ, (s1, s2), A1 × A2).

Insert into Δ:

For each rule ((q1, a, β), (p1, γ)) in Δ1,
and each rule (q2, a, p2) in δ,
add (((q1, q2), a, β), ((p1, p2), γ)).

For each rule ((q1, ε, β), (p1, γ)) in Δ1,
and each state q2 in K2,
add (((q1, q2), ε, β), ((p1, q2), γ)).

This works because we can get away with only one stack: the FSM component M2 never needs one.
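A sketch of just the transition-building step of this product construction. The encodings and helper name are assumptions made for illustration, not the book's: a PDA transition is ((q1, a, pop), (p1, push)) with a == "" standing for ε, and the DFA's transition function is a dict mapping (state, symbol) to the next state.

```python
# Sketch: build the transition relation of the product PDA M3 = M1 x M2.

def product_transitions(pda_delta, dfa_delta, dfa_states):
    delta = set()
    for ((q1, a, pop), (p1, push)) in pda_delta:
        if a != "":                                  # input-consuming move:
            for (q2, sym), p2 in dfa_delta.items():  # pair with the DFA move
                if sym == a:                         # on the same symbol a
                    delta.add((((q1, q2), a, pop), ((p1, p2), push)))
        else:                                        # epsilon move: the DFA
            for q2 in dfa_states:                    # component stays put
                delta.add((((q1, q2), "", pop), ((p1, q2), push)))
    return delta

# Toy usage: a PDA for {a^n b^n : n >= 0} (states p, q) paired with a DFA
# over {a, b} that accepts strings of even length (states 0, 1).
pda = {
    (("p", "a", ""), ("p", "A")),     # push A for each a
    (("p", "", ""), ("q", "")),       # guess the middle (epsilon move)
    (("q", "b", "A"), ("q", "")),     # pop A for each b
}
dfa = {("0", "a"): "1", ("0", "b"): "1", ("1", "a"): "0", ("1", "b"): "0"}
for t in sorted(product_transitions(pda, dfa, {"0", "1"})):
    print(t)
```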

Theorem: The difference (L1 – L2) between a context-free language L1 and a regular language L2 is context-free.

Proof: L1 – L2 = L1 ∩ ¬L2.

If L2 is regular, then so is ¬L2.

If L1 is context-free, then so is L1 ∩ ¬L2.
The Difference between a Context-Free Language and a Regular Language is Context-Free

L = {w ∈ {a, b, c}* : #a(w) = #b(w) = #c(w)}

If L were context-free, then L′ = L ∩ a*b*c* would also be context-free.

But L′ = AnBnCn, which is not context-free.

So neither is L.
Using intersection with a regular language
