FIT2014 Theory of Computation Lecture 16 Chomsky Normal Form, Cocke-Younger-Kasami algorithm
Monash University
Faculty of Information Technology
FIT2014 Theory of Computation
Lecture 16
Chomsky Normal Form, Cocke-Younger-Kasami algorithm
slides by
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
Warning
This material has been reproduced and communicated to you by or on behalf of Monash University
in accordance with s113P of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act.
Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
1 / 22
Overview
I Chomsky Normal Form
I Nullability
I CYK Parsing algorithm
2 / 22
Chomsky Normal Form
A CFG is said to be in Chomsky Normal Form if all the productions are in the form
Nonterminal −→
(called a live production)
or
Nonterminal −→ terminal
(called a dead production).
3 / 22
Chomsky Normal Form
Theorem.
For any context-free language L, the non-empty words in L can be generated by a
grammar in Chomsky Normal Form.
Proof.
Outline:
1. Eliminate ε-productions. (i.e., production rules of the form X −→ ε )
2. Eliminate unit productions. (i.e., production rules of the form X −→ Y )
3. Give each terminal its own corresponding nonterminal that produces it.
4. Use these nonterminals to eliminate terminals, except where they appear alone.
5. Break down rules that produce at least three nonterminals, using new
nonterminals, to give a set of rules producing just two nonterminals.
4 / 22
1. Eliminate all ε-productions.
For every production rule X −→ ε:
For every other rule with X in the body (right-hand side):
Create new rules with all possible replacements of occurrences of X by ε
(and keep the old rule).
Remove the rule X −→ ε.
For example:
old rules with X in body new rules
A −→ bXQ A −→ bXQ and
A −→ bQ
A −→ bXQX A −→ bXQX and
A −→ bQX and
A −→ bXQ and
A −→ bQ
A −→ X A −→ X and
A −→ ε
5 / 22
Chomsky Normal Form
Keep doing this until there are no more ε-productions.
Once this step is complete: no rule has an empty right-hand side.
Housekeeping:
Suppose we have a nonterminal that never appears on the left of any rule.
(This situation may be created by our elimination of some ε-productions.)
Then we can delete all rules where it appears on the right.
I If a rule has such a nonterminal on the right,
then that nonterminal can never be replaced,
so such a rule can never be used in any derivation of a string of terminals.
I This deletion is not strictly necessary for getting a valid CNF grammar.
But it can yield a simpler result.
6 / 22
Chomsky Normal Form
2. Eliminate all unit productions.
For every production rule X −→ Y :
For every rule with Y on the left:
For every rule with X in the body (right-hand side):
Create a new rule with X on the left instead of Y (& keep the old rule).
Remove the rule X −→ Y .
For example:
old rules with X on left new rules
Y −→ abQR Y −→ abQR and
X −→ abQR
Y −→ Q Y −→ Q and
X −→ Q (unless X −→ Q has been dealt with previously)
Y −→ X Y −→ X and
X −→ X
7 / 22
Chomsky Normal Form
2. (continued)
Keep doing this until there are no more unit productions.
Once this step is complete: every rule’s right-hand side is either
I a single terminal, or
I at least two symbols (terminals and/or nonterminals)
8 / 22
Chomsky Normal Form
3. Give each terminal its own corresponding nonterminal that produces it.
For each terminal z , create a new nonterminal Xz and a new rule Xz −→ z .
For example:
If our terminals are a,b, then create new nonterminals Xa,Xb and new rules
Xa −→ a
Xb −→ b
9 / 22
Chomsky Normal Form
4. In all rules that don’t just produce a single terminal,
replace each terminal by its corresponding new nonterminal.
Y −→ abQR becomes Y −→ XaXbQR
Once this step is complete: every rule’s right-hand side is either
I a single terminal, or
I at least two nonterminals.
10 / 22
Chomsky Normal Form
5. For every rule with more than
two nonterminals on the right,
create new nonterminals as needed
to replace the rule by a set of rules
with just two nonterminals on
the right.
Once this step is complete:
every rule’s right-hand side is either
I a single terminal, or
I exactly two nonterminals.
Example:
old rule new rules
Y −→ Z1Z2Z3 Y −→ Z12Z3 and
Z12 −→ Z1Z2 and
Y −→ Z1Z2Z3Z4 Y −→ Z12Z34 and
Z12 −→ Z1Z2 and
Z34 −→ Z3Z4
Y −→ Z1Z2Z3Z4Z5 Y −→ Z1234Z5 and
Z1234 −→ Z12Z34 and
Z12 −→ Z1Z2 and
Z34 −→ Z3Z4
. . . where Z12,Z34,Z1234 are new nonterminals.
11 / 22
S −→ bA
S −→ aB
A −→ a
A −→ aS
A −→ bAA
B −→ b
B −→ bS
B −→ aBB
Steps
3 & 4
S −→ XbA
S −→ XaB
A −→ a
A −→ XaS
A −→ XbAA
B −→ b
B −→ XbS
B −→ XaBB
Xa −→ a
Xb −→ b
Step 5
S −→ XbA
S −→ XaB
A −→ a
A −→ XaS
A −→ YA
B −→ b
B −→ XbS
B −→ ZB
Xa −→ a
Xb −→ b
Y −→ XbA
Z −→ XaB
12 / 22
Consequences
Cocke-Younger-Kasami (CYK) algorithm (today)
I Given a CFG and a string,
decides whether or not the string can be generated by the CFG.
I polynomial time
I a bottom-up parsing algorithm.
Pumping Lemma for CFG (next lecture)
I for proving that certain languages are not context-free.
13 / 22
Nullability
Given a CFG, how to decide whether or not it generates the empty string?
A nonterminal A is nullable if the empty string can be derived from it:
A =⇒ · · · =⇒ ε.
Algorithm:
1. For every rule of the form X −→ ε, mark X as nullable.
2. While there is a rule Y −→ Y1Y2 · · ·Yk that
only produces nonterminals and all those nonterminals have been marked:
I Mark Y .
3. If S has been marked, Accept, else Reject.
14 / 22
CYK Algorithm
For each CFG and string s, we can decide whether or not s is generated by the CFG.
Input: s = t1t2 . . . tn, where each ti is a letter and n ≥ 0.
If s = ε then use the Nullability algorithm.
From now on, s is nonempty.
Find the Chomsky Normal Form for the non-empty words generated by the grammar.
For each letter tk find the nonterminals which can produce tk .
15 / 22
CYK Algorithm
For each pair of consecutive letters ti ti+1 (where 1 ≤ i ≤ n − 1),
find the nonterminals that can generate the pair, as follows:
I For each pair X ,Y such that
X generates ti and Y generates ti+1,
find all W such that there is a rule W −→ XY .
W
X Y
ti ti+1
16 / 22
CYK Algorithm
For each triple of consecutive letters ti ti+1ti+2,
find the nonterminals that can generate the triple,
as follows:
I For each pair X ,Y such that
X
∗
=⇒ ti ti+1 and Y
∗
=⇒ ti+2,
find all W such that
there is a rule W −→ XY .
I For each pair X ,Y such that
X
∗
=⇒ ti and Y
∗
=⇒ ti+1ti+2,
find all W such that
there is a rule W −→ XY .
W
X Y
ti ti+1 ti+2
W
X Y
ti ti+1ti+2
17 / 22
CYK Algorithm
For each quadruple of consecutive letters ti ti+1ti+2ti+3,
find the nonterminals that can generate it:
W
X Y
ti ti+1ti+2 ti+3
W
X Y
ti ti+1 ti+2ti+3
W
X Y
ti ti+1ti+2ti+3
Continue, in this way . . .
Eventually, find all nonterminals that can produce s = t1t2 . . . tn.
If S is one of them,
then Accept, as s can be generated; otherwise, Reject, as s cannot be generated.
18 / 22
CYK Algorithm
S → aSa
S → b
CNF S → TA
S → b
A → a
T → AS
Input string: aabaa
Single
letters
A→ a
S → b
Pairs
??⇒ AA⇒ aa
T ⇒ AS ⇒ ab
??⇒ SA⇒ ba
??⇒ AA⇒ aa
19 / 22
CYK Algorithm
Triples
??⇒ aa b
??⇒ AT ⇒ a ab
S ⇒ TA⇒ ab a
??⇒ a ba
??⇒ ba a
??⇒ b aa
4-tuples
??⇒ aab a
??⇒ aa ba
T ⇒ AS ⇒ a aba
??⇒ SA⇒ aba a
??⇒ ab aa
??⇒ a baa
5-tuples
S ⇒ TA⇒ aaba a
??⇒ aab aa
??⇒ aa baa
??⇒ a abaa
So S can generate aabaa, and we are done.
20 / 22
CYK Algorithm
Exercises:
Write the algorithm more formally.
Prove by induction that the algorithm works.
Determine the complexity of the algorithm, in big-O notation.
21 / 22
Revision
I Know the uses of Chomsky Normal Form, and be able to convert a grammar to it.
I Avoid confusion between
Chomsky Normal Form (CNF) and Conjunctive Normal Form (CNF)!
I Know and use the CYK algorithm.
Reading: Sipser, pp. 108–111.
22 / 22