EECS2001:
Introduction to Theory of Computation
Summer 2019
Ali Mahmoodi
amahmoodi@cse.yorku.ca
Office: Lassonde 2015 Course page: Moodle
Notes based on work by Professor Suprakash Datta
6/19/2019 EECS2001, Summer 2019 1
Next
•Chapter 2:
• Context-Free Languages (CFL) • Context-Free Grammars (CFG) • Chomsky Normal Form of CFG
• RL CFL
6/19/2019 EECS2001, Summer 2019 2
Context-Free Languages (Ch. 2)
Context-free languages (CFLs) are a more powerful (augmented) model than FA.
CFLs allow us to describe non-regular languages like { 0n1n | n0}
General idea: CFLs are languages that can be recognized by automata that have one single stack:
{ 0n1n | n0} is a CFL
{ 0n1n0n | n0} is not a CFL
6/19/2019 EECS2001, Summer 2019 3
Context-Free Grammars
Grammars: define/specify a language
Which simple machine produces the non-regular
language { 0n1n | n N }?
Start symbol S with rewrite rules: 1) S → 0S1
2) S → ε
S yields 0n1n according to
S→0S1→00S11→…→0nS1n →0n1n
6/19/2019 EECS2001, Summer 2019 4
Context-Free Grammars (Def.)
A context free grammar G=(V,,R,S) is defined by
• V: a finite set variables
• : finite set terminals (with V=)
• R: finite set of substitution rules V → (V)* • S: start symbol V
The language of grammar G is denoted by L(G):
L(G) = { w* | S * w }
6/19/2019 EECS2001, Summer 2019 5
Derivation *
A single step derivation “” consist of the substitution of a variable by a string according to a substitution rule.
Example: with the rule “A→BB”, we can have the derivation “01AB0 01BBB0”.
A sequence of several derivations (or none) is indicated by “ * ”
Same example: “0AA * 0BBBB”
6/19/2019 EECS2001, Summer 2019 6
Some Remarks
The language L(G) = { w* | S * w } contains only strings of terminals, not variables.
Notation: we summarize several rules, like A→B
A → 01 by A → B | 01 | AA A→AA
Unless stated otherwise: topmost rule concerns the start variable
6/19/2019 EECS2001, Summer 2019 7
Context-Free Grammars (Ex.)
Consider the CFG G=(V,,R,S) with V = {S}
= {0,1}
R: S → 0S1 | 0Z1
Z → 0Z |
Then L(G) = {0i1j | ij }
S yields 0j+k1j according to: S0S1…0jS1j 0jZ1j0j0Z1j … 0j+kZ1j 0j+k1j = 0j+k1j
6/19/2019 EECS2001, Summer 2019 8
Importance of CFL
Model for natural languages (Noam Chomsky)
Specification of programming languages: “parsing of a computer program”
Describes mathematical structures
Intermediate between regular languages and computable languages (Chapters 3,4,5 and 6)
6/19/2019 EECS2001, Summer 2019 9
Example Boolean Algebra
Consider the CFG G=(V,,R,S) with V = {S,Z}
= {0,1,(,),,,}
R: S → 0 | 1 | (S) | (S)(S) | (S)(S)
Some elements of L(G): 0
(((0))(1)) (1)((0)(0))
Note: Parentheses prevent “100” confusion. 6/19/2019 EECS2001, Summer 2019 10
Human Languages
Number of rules:
Possible element: the boy sees the girl 6/19/2019 EECS2001, Summer 2019 11
Parse Trees
The parse tree of (0)((0)(1)) via rule
S → 0 | 1 | (S) | (S)(S) | (S)(S):
S (S)
0
0
(S)
(S)
1
(S)
6/19/2019
EECS2001, Summer 2019
12
Ambiguity
A grammar is ambiguous if some strings are derived ambiguously.
A string is derived ambiguously if it has more than one leftmost derivations.
Typical example: rule S → 0 | 1 | S+S | SS
S S+S SS+S 0S+S 01+S 01+1 versus
S SS 0S 0S+S 01+S 01+1 6/19/2019 EECS2001, Summer 2019 13
Ambiguity and Parse Trees
The ambiguity of 01+1 is shown by the two different parse trees:
S
SS10S+S
S
S
S
+S
S
01 11
6/19/2019 EECS2001, Summer 2019 14
More on Ambiguity
The two different derivations: S S+S 0+S 0+1
and
S S+S S+1 0+1
do not constitute an ambiguous string 0+1 (they will have the same parse tree)
Languages that can only be generated by ambiguous grammars are “inherently ambiguous”
6/19/2019 EECS2001, Summer 2019 15
Context-Free Languages
Any language that can be generated by a context free grammar is a context-free language (CFL).
The CFL { 0n1n | n0 } shows us that certain CFLs are nonregular languages.
Q1: Are all regular languages context free?
Q2: Which languages are outside the class CFL?
6/19/2019 EECS2001, Summer 2019 16
“Chomsky Normal Form”
A context-free grammar G = (V,,R,S) is in Chomsky normal form if every rule is of the form
A → BC or A→x
with variables AV and B,CV \{S}, and x For the start variable S we also allow the rule
S→
Advantage: Grammars in this form are far
easier to analyze.
6/19/2019 EECS2001, Summer 2019 17
Theorem 2.9
Every context-free language can be described by a grammar in Chomsky normal form.
Outline of Proof:
We rewrite every CFG in Chomsky normal form. We do this by replacing, one-by-one, every rule that is not ‘Chomsky’.
We have to take care of: Starting Symbol,
symbol, all other violating rules.
6/19/2019 EECS2001, Summer 2019 18
Proof of Theorem 2.9
Given a context-free grammar G = (V,,R,S), rewrite it to Chomsky Normal Form by
1) New start symbol S0 (and add rule S0→S) 2) Remove A→ rules (from the tail):
before: B→xAy and A→, after: B→ xAy | xy 3) Remove unit rules A→B (by the head): “A→B”
and “B→xCy”, becomes “A→xCy” and “B→xCy” 4) Shorten all rules to two: before: “A→B1B2…Bk”,
after: A→B1A1, A1→B2A2,…, Ak-2→Bk-1Bk
5) Replace ill-placed terminals “a” by T with T →a aa
6/19/2019 EECS2001, Summer 2019
19
Proof of Theorem 2.9
Given a context-free grammar G = (V,,R,S), rewrite it to Chomsky Normal Form by
1) New start symbol S0 (and add rule S0→S) 2) Remove A→ rules (from the tail):
before: B→xAy and A→, after: B→ xAy | xy 3) Remove unit rules A→B (by the head): “A→B”
and “B→xCy”, becomes “A→xCy” and “B→xCy” 4) Shorten all rules to two: before: “A→B1B2…Bk”,
after: A→B1A1, A1→B2A2,…, Ak-2→Bk-1Bk
5) Replace ill-placed terminals “a” by T with T →a aa
6/19/2019 EECS2001, Summer 2019
20
Careful Removing of Rules
Do not introduce new rules that you removed earlier.
Example: A→A simply disappears
When removing A→ rules, insert all new replacements:
B→AaA becomes B→ AaA | aA | Aa | a
6/19/2019 EECS2001, Summer 2019 21
Example of Chomsky NF
Initial grammar: S→ aSb | In Chomsky normal form:
6/19/2019
EECS2001, Summer 2019 22
S →|TT |TX 0aba
X → STb S→TT |TX
T→a a
Tb → b
aba
Example 2.10 Sipser 2nd Ed
6/19/2019 EECS2001, Summer 2019 23
Example 2.10 (Continued)
6/19/2019 EECS2001, Summer 2019 24
Example 2.10 (Continued)
6/19/2019 EECS2001, Summer 2019 25
RL CFL
Every regular language can be expressed by
a context-free grammar.
Proof Idea:
Given a DFA M = (Q,,,q0,F), we construct a corresponding CF grammar GM = (V,,R,S) with V = Q and S = q0
Rules of GM:
6/19/2019
qi → x (qi,x) for all qiV and all x
qi →
for all qiF
EECS2001, Summer 2019 26
The DFA
Example RL CFL 01
10
q1 q2 q3
leads to the
context-free grammar
GM = (Q,,R,q1) with the rules
q1 →0q1 |1q2
q2 →0q3 |1q2 | q3 →0q2 |1q2
0,1
6/19/2019 EECS2001, Summer 2019
27
Picture Thus Far
??
context-free languages
Regular languages
6/19/2019
EECS2001, Summer 2019
28
{ 0n1n }