CS计算机代考程序代写 algorithm FIT2014 Theory of Computation Lecture 16 Chomsky Normal Form, Cocke-Younger-Kasami algorithm

FIT2014 Theory of Computation Lecture 16 Chomsky Normal Form, Cocke-Younger-Kasami algorithm

Monash University
Faculty of Information Technology

FIT2014 Theory of Computation

Lecture 16
Chomsky Normal Form, Cocke-Younger-Kasami algorithm

slides by

COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969

Warning
This material has been reproduced and communicated to you by or on behalf of Monash University

in accordance with s113P of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act.

Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.

1 / 22

Overview

I Chomsky Normal Form

I Nullability

I CYK Parsing algorithm

2 / 22

Chomsky Normal Form

A CFG is said to be in Chomsky Normal Form if all the productions are in the form

Nonterminal −→

(called a live production)
or

Nonterminal −→ terminal

(called a dead production).

3 / 22

Chomsky Normal Form

Theorem.
For any context-free language L, the non-empty words in L can be generated by a
grammar in Chomsky Normal Form.

Proof.
Outline:

1. Eliminate ε-productions. (i.e., production rules of the form X −→ ε )
2. Eliminate unit productions. (i.e., production rules of the form X −→ Y )
3. Give each terminal its own corresponding nonterminal that produces it.

4. Use these nonterminals to eliminate terminals, except where they appear alone.

5. Break down rules that produce at least three nonterminals, using new
nonterminals, to give a set of rules producing just two nonterminals.

4 / 22

1. Eliminate all ε-productions.
For every production rule X −→ ε:

For every other rule with X in the body (right-hand side):
Create new rules with all possible replacements of occurrences of X by ε

(and keep the old rule).
Remove the rule X −→ ε.

For example:

old rules with X in body new rules

A −→ bXQ A −→ bXQ and
A −→ bQ

A −→ bXQX A −→ bXQX and
A −→ bQX and
A −→ bXQ and
A −→ bQ

A −→ X A −→ X and
A −→ ε

5 / 22

Chomsky Normal Form

Keep doing this until there are no more ε-productions.

Once this step is complete: no rule has an empty right-hand side.

Housekeeping:
Suppose we have a nonterminal that never appears on the left of any rule.

(This situation may be created by our elimination of some ε-productions.)

Then we can delete all rules where it appears on the right.

I If a rule has such a nonterminal on the right,
then that nonterminal can never be replaced,
so such a rule can never be used in any derivation of a string of terminals.

I This deletion is not strictly necessary for getting a valid CNF grammar.
But it can yield a simpler result.

6 / 22

Chomsky Normal Form
2. Eliminate all unit productions.
For every production rule X −→ Y :

For every rule with Y on the left:
For every rule with X in the body (right-hand side):

Create a new rule with X on the left instead of Y (& keep the old rule).
Remove the rule X −→ Y .

For example:
old rules with X on left new rules

Y −→ abQR Y −→ abQR and
X −→ abQR

Y −→ Q Y −→ Q and
X −→ Q (unless X −→ Q has been dealt with previously)

Y −→ X Y −→ X and
X −→ X

7 / 22

Chomsky Normal Form

2. (continued)
Keep doing this until there are no more unit productions.

Once this step is complete: every rule’s right-hand side is either

I a single terminal, or

I at least two symbols (terminals and/or nonterminals)

8 / 22

Chomsky Normal Form

3. Give each terminal its own corresponding nonterminal that produces it.

For each terminal z , create a new nonterminal Xz and a new rule Xz −→ z .

For example:
If our terminals are a,b, then create new nonterminals Xa,Xb and new rules

Xa −→ a
Xb −→ b

9 / 22

Chomsky Normal Form

4. In all rules that don’t just produce a single terminal,
replace each terminal by its corresponding new nonterminal.

Y −→ abQR becomes Y −→ XaXbQR

Once this step is complete: every rule’s right-hand side is either

I a single terminal, or

I at least two nonterminals.

10 / 22

Chomsky Normal Form

5. For every rule with more than
two nonterminals on the right,
create new nonterminals as needed
to replace the rule by a set of rules
with just two nonterminals on
the right.

Once this step is complete:
every rule’s right-hand side is either

I a single terminal, or

I exactly two nonterminals.

Example:

old rule new rules

Y −→ Z1Z2Z3 Y −→ Z12Z3 and
Z12 −→ Z1Z2 and

Y −→ Z1Z2Z3Z4 Y −→ Z12Z34 and
Z12 −→ Z1Z2 and
Z34 −→ Z3Z4

Y −→ Z1Z2Z3Z4Z5 Y −→ Z1234Z5 and
Z1234 −→ Z12Z34 and
Z12 −→ Z1Z2 and
Z34 −→ Z3Z4

. . . where Z12,Z34,Z1234 are new nonterminals.

11 / 22

S −→ bA
S −→ aB
A −→ a
A −→ aS
A −→ bAA
B −→ b
B −→ bS
B −→ aBB

Steps
3 & 4

S −→ XbA
S −→ XaB
A −→ a
A −→ XaS
A −→ XbAA
B −→ b
B −→ XbS
B −→ XaBB
Xa −→ a
Xb −→ b

Step 5

S −→ XbA
S −→ XaB
A −→ a
A −→ XaS
A −→ YA
B −→ b
B −→ XbS
B −→ ZB
Xa −→ a
Xb −→ b
Y −→ XbA
Z −→ XaB

12 / 22

Consequences

Cocke-Younger-Kasami (CYK) algorithm (today)

I Given a CFG and a string,
decides whether or not the string can be generated by the CFG.

I polynomial time

I a bottom-up parsing algorithm.

Pumping Lemma for CFG (next lecture)

I for proving that certain languages are not context-free.

13 / 22

Nullability

Given a CFG, how to decide whether or not it generates the empty string?

A nonterminal A is nullable if the empty string can be derived from it:

A =⇒ · · · =⇒ ε.

Algorithm:

1. For every rule of the form X −→ ε, mark X as nullable.
2. While there is a rule Y −→ Y1Y2 · · ·Yk that

only produces nonterminals and all those nonterminals have been marked:
I Mark Y .

3. If S has been marked, Accept, else Reject.

14 / 22

CYK Algorithm

For each CFG and string s, we can decide whether or not s is generated by the CFG.

Input: s = t1t2 . . . tn, where each ti is a letter and n ≥ 0.

If s = ε then use the Nullability algorithm.

From now on, s is nonempty.

Find the Chomsky Normal Form for the non-empty words generated by the grammar.

For each letter tk find the nonterminals which can produce tk .

15 / 22

CYK Algorithm

For each pair of consecutive letters ti ti+1 (where 1 ≤ i ≤ n − 1),
find the nonterminals that can generate the pair, as follows:

I For each pair X ,Y such that
X generates ti and Y generates ti+1,

find all W such that there is a rule W −→ XY .

W

X Y

ti ti+1

16 / 22

CYK Algorithm

For each triple of consecutive letters ti ti+1ti+2,
find the nonterminals that can generate the triple,
as follows:

I For each pair X ,Y such that
X


=⇒ ti ti+1 and Y


=⇒ ti+2,

find all W such that
there is a rule W −→ XY .

I For each pair X ,Y such that
X


=⇒ ti and Y


=⇒ ti+1ti+2,

find all W such that
there is a rule W −→ XY .

W

X Y

ti ti+1 ti+2

W

X Y

ti ti+1ti+2

17 / 22

CYK Algorithm
For each quadruple of consecutive letters ti ti+1ti+2ti+3,
find the nonterminals that can generate it:

W

X Y

ti ti+1ti+2 ti+3

W

X Y

ti ti+1 ti+2ti+3

W

X Y

ti ti+1ti+2ti+3

Continue, in this way . . .

Eventually, find all nonterminals that can produce s = t1t2 . . . tn.
If S is one of them,
then Accept, as s can be generated; otherwise, Reject, as s cannot be generated.

18 / 22

CYK Algorithm

S → aSa
S → b

CNF S → TA
S → b
A → a
T → AS

Input string: aabaa

Single
letters

A→ a
S → b

Pairs

??⇒ AA⇒ aa
T ⇒ AS ⇒ ab
??⇒ SA⇒ ba
??⇒ AA⇒ aa

19 / 22

CYK Algorithm

Triples

??⇒ aa b
??⇒ AT ⇒ a ab

S ⇒ TA⇒ ab a
??⇒ a ba

??⇒ ba a
??⇒ b aa

4-tuples

??⇒ aab a
??⇒ aa ba

T ⇒ AS ⇒ a aba

??⇒ SA⇒ aba a
??⇒ ab aa
??⇒ a baa

5-tuples

S ⇒ TA⇒ aaba a
??⇒ aab aa
??⇒ aa baa
??⇒ a abaa

So S can generate aabaa, and we are done.

20 / 22

CYK Algorithm

Exercises:

Write the algorithm more formally.

Prove by induction that the algorithm works.

Determine the complexity of the algorithm, in big-O notation.

21 / 22

Revision

I Know the uses of Chomsky Normal Form, and be able to convert a grammar to it.

I Avoid confusion between
Chomsky Normal Form (CNF) and Conjunctive Normal Form (CNF)!

I Know and use the CYK algorithm.

Reading: Sipser, pp. 108–111.

22 / 22