FIT2014 Theory of Computation Lecture 12 Context-Free Grammars
Monash University
Faculty of Information Technology
FIT2014 Theory of Computation
Lecture 12
Context-Free Grammars
slides by Graham Farr
based in part on previous slides by David Albrecht
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
Warning
This material has been reproduced and communicated to you by or on behalf of Monash University
in accordance with s113P of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act.
Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
1 / 27
Overview
I Inductive Definitions
I Context Free Grammars
I Parse Trees
I Derivations
2 / 27
Arithmetic Expressions
1. All integers are Arithmetic Expressions
2. If A and B are Arithmetic Expressions, so are:
(i) A + B
(ii) A− B
(iii) A ∗ B
(iv) A/B
(v) (A)
3 / 27
Production Rules
AE → integer
AE → AE + AE
AE → AE − AE
AE → AE ∗ AE
AE → AE/AE
AE → (AE )
S → A
A → integer
A → A + A
A → A− A
A → A ∗ A
A → A/A
A → (A)
4 / 27
Backus-Naur Form (a.k.a. Backus Normal Form)
S → A
A → integer | A + A | A− A | A ∗ A | A/A | (A)
John Backus (1924–2007)
https://mathshistory.st-andrews.
ac.uk/Biographies/Backus/
Peter Naur (1928–2016)
https://datamuseum.dk/
5 / 27
https://mathshistory.st-andrews.ac.uk/Biographies/Backus/
https://mathshistory.st-andrews.ac.uk/Biographies/Backus/
https://datamuseum.dk/
Historical example: fragment of the BNF of ALGOL 60
from: J. W. Backus et al., Comm. ACM 3 (5) (May 1960) 299–314.
6 / 27
EQUAL
A string is in EQUAL if it has an equal number of a’s and b’s.
{ε, ab, ba, aabb, abab, abba, baba, . . .}
An a-type string has one more a than b.
A b-type string has one more b than a.
7 / 27
EQUAL
A string is in EQUAL if it is
I ε, or
I a followed by a string of b-type, or
I b followed by a string of a-type.
A string is of a-type if it is
I just a, or
I a followed by a string in EQUAL, or
I b followed by two strings of a-type.
A string is of b-type if it is
I just b, or
I b followed by a string in EQUAL, or
I a followed by two strings of b-type.
S −→ ε
S −→ aB
S −→ bA
A −→ a
A −→ aS
A −→ bAA
B −→ b
B −→ bS
B −→ aBB
8 / 27
Context Free Grammar (CFG)
A Context Free Grammar consists of:
1. An alphabet
I The letters are called terminals.
2. Another set of symbols
I We call these symbols nonterminals.
I often represented by upper-case letters.
I One of these symbols is the Start symbol .
I S is often used as the start symbol.
3. A finite set of production rules of the form:
One nonterminal −→ finite string of terminals and/or nonterminals
9 / 27
Context Free Grammar (CFG)
Definition
The language generated by a Context Free Grammar (CFG) consists of those strings
which can be produced from the start symbol using the production rules.
A language generated by a CFG is called a Context Free Language (CFL).
10 / 27
EQUAL
Terminals: a, b
Nonterminals: S , A, B
Production rules:
S −→ ε
S −→ bA
S −→ aB
A −→ a
A −→ aS
A −→ bAA
B −→ b
B −→ bS
B −→ aBB
This CFG generates the language EQUAL.
11 / 27
History
Pān. ini (c520BC–c460BC)
I studied Sanskrit
Noam Chomsky (b. 1928)
I studied natural languages
John Backus
I studied programming languages
https://mathshistory.st-andrews.ac.uk/
Biographies/Panini/
Noam Chomsky, during visit to Australia
in 2011 to accept Sydney Peace Prize.
http://www.abc.net.au/news/2011-06-02/
noam-chomsky/2741826 12 / 27
https://mathshistory.st-andrews.ac.uk/Biographies/Panini/
https://mathshistory.st-andrews.ac.uk/Biographies/Panini/
http://www.abc.net.au/news/2011-06-02/noam-chomsky/2741826
http://www.abc.net.au/news/2011-06-02/noam-chomsky/2741826
S → aS | Sa | ε
1. S → Sa
2. S → aS
3. S → ε
Derivation of aaaa
S ⇒ Sa (Rule 1)
⇒ aSa (Rule 2)
⇒ aaSa (Rule 2)
⇒ aaSaa (Rule 1)
⇒ aaεaa (Rule 3)
= aaaa
13 / 27
Parse Tree
S
S a
a S
a S
S a
ε
Derivation of aaaa
S ⇒ Sa (Rule 1)
⇒ aSa (Rule 2)
⇒ aaSa (Rule 2)
⇒ aaSaa (Rule 1)
⇒ aaεaa (Rule 3)
= aaaa
14 / 27
EQUAL
1. S → ε
2. S → bA
3. S → aB
4. A → a
5. A → aS
6. A → bAA
7. B → b
8. B → bS
9. B → aBB
Derivation of baaabbab
S ⇒ bA (Rule 2)
⇒ baS (Rule 5)
⇒ baaB (Rule 3)
⇒ baaaBB (Rule 9)
⇒ baaaBb (Rule 7)
⇒ baaabSb (Rule 8)
⇒ baaabbAb (Rule 2)
⇒ baaabbab (Rule 4)
15 / 27
Parse Tree
S
b A
a S
a B
a B B
b S b
b A
a
Derivation of baaabbab
S ⇒ bA (Rule 2)
⇒ baS (Rule 5)
⇒ baaB (Rule 3)
⇒ baaaBB (Rule 9)
⇒ baaaBb (Rule 7)
⇒ baaabSb (Rule 8)
⇒ baaabbAb (Rule 2)
⇒ baaabbab (Rule 4)
16 / 27
PARENTHESES: the Dyck Language
PARENTHESES is the language over the two-letter alphabet { (,) }
consisting of all strings of correctly matched parentheses.
PARENTHESES = { ε, (), ()(), (()), ()()(), ()(()), (())(), (()()), ((())), . . . }.
Non-members: ()) (((()))
Expressing PARENTHESES strings in terms of smaller PARENTHESES strings:
Any non-empty string of parentheses must start with ( . Where is its matching ) ?
It could be at the other end: ( · · · · · · · · · · · · · · · · · · · · · · · · · · ·︸ ︷︷ ︸
smaller PARENTHESES string
)
It could be before the other end: ( · · · · · · · · · · · · · · · )︸ ︷︷ ︸
smaller
PARENTHESES
string
( · · · · · · · · · )︸ ︷︷ ︸
smaller
PARENTHESES
string
17 / 27
PARENTHESES: the Dyck Language
Inductive Definition
A string of parentheses S is one of the
following:
I the empty string, ε
I (S ′), where S ′ is a string of parentheses
I S1S2, where S1, S2 are strings of parentheses.
Context-Free Grammar
1. S → ε
2. S → (S)
3. S → SS
18 / 27
PARENTHESES: the Dyck Language
1. S → ε
2. S → (S)
3. S → SS
Walther von Dyck (1856–1934)
https://mathshistory.st-andrews.
ac.uk/Biographies/Von_Dyck/
Derivation of ()(())
S ⇒ SS (Rule 3)
⇒ (S)S (Rule 2)
⇒ (S)(S) (Rule 2)
⇒ ()(S) (Rule 1)
⇒ ()((S)) (Rule 2)
⇒ ()(()) (Rule 1)
19 / 27
https://mathshistory.st-andrews.ac.uk/Biographies/Von_Dyck/
https://mathshistory.st-andrews.ac.uk/Biographies/Von_Dyck/
PARENTHESES: the Dyck Language
Parse tree:
S
S
( S )
ε
S
( S )
( S )
ε
Derivation of ()(())
S ⇒ SS (Rule 3)
⇒ (S)S (Rule 2)
⇒ (S)(S) (Rule 2)
⇒ ()(S) (Rule 1)
⇒ ()((S)) (Rule 2)
⇒ ()(()) (Rule 1)
20 / 27
Exercises
I Suppose we have two types of brackets, such as round and square: ( ) and [ ] .
Find a context-free language for the set of all valid strings of such brackets.
I Find a context-free grammar for PALINDROMES
I For other languages we have met:
I find context-free grammars for them, OR
I if you think they don’t have one, think about why.
21 / 27
A simple property of derivations
At any stage, the string to the left of the first nonterminal must be a prefix of the final
(derived) string.
S =⇒ · · ·
…
…
=⇒ x1 · · · xkAB · · ·
=⇒ x1 · · · xkaXYB · · · (using A −→ aXY )
…
…
=⇒ x1 · · · xka · · · · · · (derived string)
22 / 27
4 + 2*3
S −→ E
E −→ T + E | T − E | T
T −→ F ∗ T | F/T | F
F −→ integer | (E )
S
E
T + E
F
4
T
F * T
2 F
3
Leftmost derivation:
S ⇒ E
⇒ T + E
⇒ F + E
⇒ 4 + E
⇒ 4 + T
⇒ 4 + F ∗ T
⇒ 4 + 2 ∗ T
⇒ 4 + 2 ∗ F
⇒ 4 + 2 ∗ 3
23 / 27
4 + 2*3
S −→ E
E −→ T + E | T − E | T
T −→ F ∗ T | F/T | F
F −→ integer | (E )
S
E
T + E
F
4
T
F * T
2 F
3
Rightmost derivation:
S ⇒ E
⇒ T + E
⇒ T + T
⇒ T + F ∗ T
⇒ T + F ∗ F
⇒ T + F ∗ 3
⇒ T + 2 ∗ 3
⇒ F + 2 ∗ 3
⇒ 4 + 2 ∗ 3
24 / 27
Leftmost and rightmost derivations
In a Leftmost derivation, the leftmost non-terminal is always replaced first.
In a Rightmost derivation, the rightmost non-terminal is always replaced first.
Theorem.
Whenever a string has a derivation, it also has a leftmost derivation of the same length.
Proof. See Tute 4.
Does the same hold for rightmost derivations?
25 / 27
A simple property of leftmost derivations
Whenever a production
X −→ terminals Non-terminal theRest
is applied, the terminal letters on the left are appended to the current prefix to give a
larger prefix of the derived string.
S =⇒ · · ·
…
…
=⇒ x1 · · · xk AB · · ·
=⇒ x1 · · · xk aXY B · · · (using A −→ aXY )
…
…
=⇒ x1 · · · xka · · · · · · (derived string)
26 / 27
Revision
I Context Free Grammars
I Definition. How to use them.
I Parse Trees
I Definition. How to make them.
I The Dyck language
I Leftmost and rightmost derivations.
Read:
Sipser, Ch. 2, pp. 101–108.
27 / 27