Chapter 1
Chapter 6
SIMPLIFICATION OF CONTEXT-FREE GRAMMARS AND NORMAL FORMS
Learning Objectives
At the conclusion of the chapter, the student will be able to:
Simplify a context-free grammar by removing useless productions
Simplify a context-free grammar by removing -productions
Simplify a context-free grammar by removing unit-productions
Determine whether or not a context-free grammar is in Chomsky normal form
Transform a context-free grammar into an equivalent grammar in Chomsky normal form
Determine whether or not a context-free grammar is in Greibach normal form
Transform a context-free grammar into an equivalent grammar in Greibach normal form
Methods for Transforming Grammars
The definition of a context-free grammar imposes no restrictions on the right side of a production
In some cases, it is convenient to restrict the form of the right side of all productions
Simplifying a grammar involves eliminating certain types of productions while producing an equivalent grammar, but does not necessarily result in a reduction of the total number of productions
For simplicity, we focus on languages that do not include the empty string
A Useful Substitution Rule
Theorem 6.1 states that, If A and B are distinct variables, a production of the form A uBv can be replaced by a set of productions in which B is substituted by all strings B derives in one step.
Consider the grammar
V = { A, B }, T = { a, b, c }, and productions
A a | aaA | abBc
B abbA | b
We can replace A abBc with two productions that replace B (in red), obtaining an equivalent grammar with productions
A a | aaA | ababbAc | abbc
B abbA | b
Useless Productions
A variable is useful if it occurs in the derivation of at least one string in the language
Otherwise, the variable and any productions in which it appears is considered useless
A variable is useless if:
No terminal strings can be derived from the variable
The variable symbol cannot be reached from S
In the grammar below, B can never be reached from the start symbol S and is therefore considered useless
S A
A aA |
B bA
Removing Useless Productions
It is always possible to remove useless productions from a context-free grammar:
Let V1 be the set of useful variables, initialized to empty
Add a variable A to V1 if there is a production of the form
A terminal symbols or variables in V1
(Repeat until nothing else can be added to V1)
Eliminate any productions containing variables not in V1
Use a dependency graph to identify and eliminate variables that are unreachable from S
Application of the Procedure for Removing Useless Productions
Consider the grammar from example 6.3:
S aS | A | C
A a
B aa
C aCb
In step 2, variables A, B, and S are added to V1
Since C is useless, it is eliminated in step 3, resulting in the grammar with productions
S aS | A
A a
B aa
In step 4, B is identified as unreachable from S, resulting in the grammar with productions
S aS | A
A a
-Productions
A production with on the right side is called a -production
A variable A is called nullable if there is a sequence of derivations through which A produces
If a grammar generates a language not containing , any -productions can be removed
In the grammar below, S1 is nullable
S aS1b
S1 aS1b|
Since the language is -free, we have the equivalent grammar
S aS1b | ab
S1 aS1b | ab
Removing -Productions
It is possible to remove -productions from a context-free grammar that does not generate :
Let VN be the set of nullable variables, initialized to empty
Add a variable A to VN if there is a production having one of the forms:
A λ
A variables already in VN
(Repeat until nothing else can be added to VN)
Eliminate -productions
Add productions in which nullable symbols are replaced by λ in all possible combinations
Application of the Procedure for Removing -Productions
Consider the grammar from example 6.5:
S ABaC
A BC
B b |
C D |
D d
In step 2, variables B, C, and A (in that order) are added to VN
In step 3, -productions are eliminated
In step 4, productions are added by replacing nullable symbols with in all possible combinations, resulting in
S ABaC | BaC | AaC | Aba | aC | Aa | Ba | a
A B | C | BC
B b
C D
D d
Unit-Productions
A production of the form A B (where A and B are variables) is called a unit-production
Unit-productions add unneeded complexity to a grammar and can usually be removed by simple substitution
Theorem 6.4 states that any context-free grammar without -productions has an equivalent grammar without unit-productions
The procedure for eliminating unit-productions assumes that all -productions have been previously removed
Removing Unit-Productions
Draw a dependency graph with an edge from A to B corresponding to every A B production in the grammar
Construct a new grammar that includes all the productions from the original grammar, except for the unit-productions
Whenever there is a path from A to B in the dependency graph, replace B using the substitution rule from Theorem 6.1, but using only the productions in the new grammar
Application of the Procedure for Removing Unit-Productions
Consider the grammar from example 6.6:
S Aa | B
A a | bc |B
B A | bb
The dependency graph contains paths from S to A, S to B, B to A, and A to B
After removing unit-productions and adding the new productions (in red), the resulting grammar is
S Aa | a | bc | bb
A a | bc | bb
B a | bc | bb
Simplification of Grammars
Theorem 6.5 states that, for any context-free language that does not include λ, there is a context-free grammar without useless, -, or unit-productions
Since the removal of one type of production may introduce productions of another type, undesirable productions should be removed in the following order:
Remove -productions
Remove unit-productions
Remove useless productions
Chomsky Normal Form
In Chomsky normal form, the number of symbols on the right side of a production is strictly limited.
A context-free grammar is in Chomsky normal form if all of its productions are in one of the forms below (A, B, C are variables; a is a terminal symbol)
A BC
A a
The grammar below is in Chomsky normal form
S AS | a
A SA| b
Transforming a Grammar into Chomsky Normal Form
For any context-free grammar that does not generate , it is possible to find an equivalent grammar in Chomsky normal form:
Copy any productions of the form A a
For other productions containing a terminal symbol x on the right side, replace x with a variable X and add the production X x
Introduce additional variables to reduce the lengths of the right sides of productions as necessary, replacing long productions with productions of the form W YZ (W, Y, Z are variables)
Application of the Procedure for Removing Unit-Productions
Consider the grammar from example 6.8, which is clearly not in Chomsky normal form
S ABa
A aab
B Ac
After replacing terminal symbols with new variables and adding new productions (in red), the resulting grammar is
S AC
C BX
A XD
D XY
B AZ
X a
Y b
Z c
Greibach Normal Form
In Greibach normal form, there are restrictions on the positions of terminal and variable symbols
A context-free grammar is in Greibach Normal Form if, in all of its productions, the right side consists of single terminal followed by any number of variables
The grammar below is in Greibach normal form
S aAB | bBB | bB
A aA| bB | b
B b
Transforming a Grammar into Greibach Normal Form
For any context-free grammar that does not generate , it is possible to find an equivalent grammar in Greibach normal form
Consider the grammar from example 6.10, which is clearly not in Greibach normal form
S abSb | aa
After replacing terminal symbols with new variables and adding new productions (in red), the resulting grammar is
S aBSB | aA
A a
B b
/docProps/thumbnail.jpeg