Slide 1
Context-Free Grammars
Chapter 11
Languages and Machines
Context-free Grammars, Languages, and PDAs
Context-free Language
Context-free Grammar
PDA
L
Accepts
Context-Free Grammars
A context-free grammar G is a quadruple,
(V, , R, S), where:
V is the rule alphabet
, a subset of V, the set of terminals
V – , the set of nonterminals
R, a finite subset of (V – ) V*, the set of rules
S, an element of V – , the start symbol
Example:
({S, a, b}, {a, b}, {S a S b, S }, S)
Derivations
x G y iff x = A
and A R
y =
w0 G w1 G w2 G . . . G wn is a derivation in G.
Let G* be the reflexive, transitive closure of G.
Then the language generated by G, denoted L(G), is:
L(G) = {w * : S G* w}.
An Example Derivation
Example:
Let G = ({S, a, b}, {a, b}, {S a S b, S }, S)
S a S b aa S bb aaa S bbb aaabbb
S * aaabbb
Definition of a Context-Free Grammar
A language L is context-free iff it is generated by some
context-free grammar G.
AnBn
S
S aSb
Balanced Parentheses
S
S SS
S (S)
Recursive Grammar Rules
A rule is recursive iff it is X w1Yw2, where:
Y * w3Xw4 for some w1, w2, w3, and w4 in V*.
A grammar is recursive iff it contains at least one recursive rule.
Examples: S (S) S (T)
T (S)
Self-Embedding Grammar Rules
A rule in a grammar G is self-embedding iff it is :
X w1Yw2, where Y * w3Xw4 and
both w1w3 and w4w2 are in +.
A grammar is self-embedding iff it contains at least one self-embedding rule.
Example: S aSa is self-embedding
S aS is recursive but not self-
embedding
S aT
T Sa is self-embedding
Where Context-Free Grammars Get Their Power
If a grammar G is not self-embedding then L(G) is regular.
If a language L has the property that every grammar that defines it is self-embedding, then L is not regular.
PalEven = {wwR : w {a, b}*}
G = {{S, a, b}, {a, b}, R, S}, where:
R = { S aSa
S bSb
S }.
Equal Numbers of a’s and b’s
Let L = {w {a, b}*: #a(w) = #b(w)}.
G = {{S, a, b}, {a, b}, R, S}, where:
R = { S aSb
S bSa
S SS
S }.
Arithmetic Expressions
G = (V, , R, E), where
V = {+, *, (, ), id, E},
= {+, *, (, ), id},
R = {
E E + E
E E E
E (E)
E id }
Backus-Naur Form (BNF)
The symbol | should be read as “or”.
Example: S aSb | bSa | SS |
‘ ‘ replaced by “: : -” (read as “can be”)
Allow a nonterminal symbol to be any sequence of characters surrounded by angle brackets.
Examples of nonterminals:
A notation for writing practical context-free grammars
BNF for a Java Fragment
if (
do
return | return
HTML
- Item 1, which will include a sublist
- First item in sublist
- Second item in sublist
- Item 2
A grammar:
HTMLtext Element HTMLtext |
Element UL | LI | … (and other kinds of elements that
are allowed in the body of an HTML document)
UL
- HTMLtext
LI
English
S NP VP
NP the Nominal | a Nominal | Nominal |
ProperNoun | NP PP
Nominal N | Adjs N
N cat | dogs | bear | girl | chocolate | rifle
ProperNoun Chris | Fluffy
Adjs Adj Adjs | Adj
Adj young | older | smart
VP V | V NP | VP PP
V like | likes | thinks | shots | smells
PP Prep NP
Prep with
Designing Context-Free Grammars
● Generate related regions together.
AnBn
● Generate concatenated regions:
A BC
● Generate outside in:
A aAb
Concatenating Independent Languages
Let L = {anbncm : n, m 0}.
The cm portion of any string in L is completely independent of the anbn portion, so we should generate the two portions separately and concatenate them together.
G = ({S, N, C, a, b, c}, {a, b, c}, R, S} where:
R = { S NC
N aNb
N
C cC
C }.
L = { : k 0 and i (ni 0)}
Examples of strings in L: , abab, aabbaaabbbabab
Note that L = {anbn : n 0}*.
G = ({S, M, a, b}, {a, b}, R, S} where:
R = { S MS
S
M aMb
M }.
Unequal a’s and b’s
L = {anbm : n m}
G = (V, , R, S), where
V = {a, b, S, A, B},
= {a, b},
R =
S A /* more a’s than b’s
S B /* more b’s than a’s
A a /* at least one extra a generated
A aA
A aAb
B b /* at least one extra b generated
B Bb
B aBb
Context free languages:
We care about structure.
E
E + E
id E * E
3 id id
5 7
Structure
To capture structure, we must capture the path we took through the grammar. Derivations do that.
Example:
S
S SS
S (S)
1 2 3 4 5 6
S SS (S)S ((S))S (())S (())(S) (())()
S SS (S)S ((S))S ((S))(S) (())(S) (())()
1 2 3 5 4 6
But the order of rule application doesn’t matter.
Derivations
Parse trees capture essential structure:
1 2 3 4 5 6
S SS (S)S ((S))S (())S (())(S) (())()
S SS (S)S ((S))S ((S))(S) (())(S) (())()
1 2 3 5 4 6
S
S S
( S ) ( S )
( S )
Derivations
Parse Trees
A parse tree, derived by a grammar G = (V, , R, S), is a rooted, ordered tree in which:
● Every leaf node is labeled with an element of {},
● The root node is labeled S,
● Every other node is labeled with some element of:
V – , and
● If m is a nonleaf node labeled X and the children of m
are labeled x1, x2, …, xn, then R contains the rule
X x1x2…xn.
Ambiguity
A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree.
For most applications of context-free grammars, this is a problem.
An Arithmetic Expression Grammar
E E + E id + id * id
E E E
E (E)
E id
Even a Very Simple Grammar Can be Highly Ambiguous
S (())()
S SS
S (S)
Inherent Ambiguity
Some languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.
Example:
L = {anbncm: n, m 0} {anbmcm: n, m 0}.
It can be proved that L is inherently ambiguous.
We can generate anbncm and anbmcm unambiguously but anbncn will be generated in two ways.
Inherent Ambiguity
Both of the following problems are undecidable:
Given a context-free grammar G, is G ambiguous?
Given a context-free language L, is L inherently
ambiguous?
But We Can Often Reduce Ambiguity
We can get rid of:
● rules like S ,
● rules with symmetric right-hand sides, e.g.,
S SS
E E + E
● rule sets that lead to ambiguous attachment of
optional postfixes.
A Highly Ambiguous Grammar
S
S SS
S (S)
Resolving the Ambiguity with a Different Grammar
The biggest problem is the rule.
A different grammar for the language of balanced parentheses:
S*
S* S
S SS
S (S)
S ()
Nullable Variables
A variable X is nullable iff either:
(1) there is a rule X , or
(2) there is a rule X PQR… and P, Q, R, …
are all nullable.
So compute N, the set of nullable variables, as follows:
1. Set N to the set of variables that satisfy (1).
2. Repeat until no change
Add variables satisfying (2)
A General Technique for Getting Rid of -Rules
Definition: a rule is modifiable iff it is of the form:
P R, for some nullable R, P
removeEps(G: cfg) =
1. Let G = G.
2. Find the set N of nullable variables in G.
3. For each modifiable rule P R of G do
Add the rule P .
4. Delete from G all rules of the form X .
5. Return G.
L(G) = L(G) – {}
An Example
G = {{S, T, A, B, C, a, b, c}, {a, b, c}, R, S), R =
{ S aTa
T ABC
A aA | C
B Bb | C
C c | }
Nullable variables = {A, B, C, T}
G: add : remove:
S aa T B C
T BC T C
T AC T A
T AB A a
B b
What If L?
atmostoneEps(G: cfg) =
1. G = removeEps(G).
2. If SG is nullable then /* i. e., L(G)
2.1 Create in G a new start symbol S*.
2.2 Add to RG the two rules:
S*
S* SG.
3. Return G.
But There is Still Ambiguity
S* What about ()()() ?
S* S
S SS
S (S)
S ()
Eliminating Symmetric Recursive Rules
S*
S* S
S SS
S (S)
S ()
Replace S SS with one of:
S SS1 /* force branching to the left
S S1S /* force branching to the right
So we get:
S* S SS1
S* S S S1
S1 (S)
S1 ()
Eliminating Symmetric Recursive Rules
So we get:
S*
S* S
S SS1
S S1
S1 (S)
S1 ()
S*
S
S S1
S S1
S1
( ) ( ) ( )
Arithmetic Expressions
E E + E
E E E
E (E)
E id
E E
E E E E
E E E E
id id id id id id
Problem 1: Associativity
Arithmetic Expressions
E E + E
E E E
E (E)
E id
E E
E E E E
E E E E
id id + id id id + id
Problem 2: Precedence
Arithmetic Expressions – A Better Way
E E + T
E T
T T * F
T F
F (E)
F id
Example:
id + id * id
*
Dangling “else”
The dangling else problem:
Consider:
if cond1 then if cond2 then stmt1 else stmt2
cond2
if
if
cond1
stmt1
stmt2
cond2
if
if
cond1
stmt2
stmt1
Dangling “else” ambiguity
Dangling “else” solution
|
|
| if
cond2
if
if
cond1
stmt1
stmt2
other_stmt
other_stmt
Dangling “else” solution
k
k
n
n
n
n
n
n
b
a
b
a
b
a
…
2
2
1
1