lec4.key
CS 314 Principles of Programming Languages
Prof. Zheng Zhang
Rutgers University
Lecture 4: Syntax Analysis (Parsing)
September 14, 2018
Review: Overview of Compilation
Compiler is in charge of translation:
• Intermediate representation (IR)
• Front end maps legal code into IR
• Back end maps IR onto target machine
2
Source
Code
Machine
Code
Errors
Front
End
Back
End
IR
Compiler
Review: Compiler Front End
3
Source
Code
Tokens
Scanner Parser IR
Compiler Front End
Errors
Syntax & semantic analyzer, IR code generator
Front End Responsibilities:
• Recognize legal programs
• Report errors
• Produce intermediate representation
Review: Compiler Front End
4
Source
Code
Tokens
Scanner Parser IR
Compiler Front End
Errors
Syntax & semantic analyzer, IR code generator
Front End Responsibilities:
• Recognize legal programs
• Report errors
• Produce intermediate representation
Review: Parsing Example
5
if id <= id then
:=id
Token Sequence
parser
if then
Parse Tree
Review: Context Free Grammars (CFGs)
• A formalism to for describing languages
• A CFG G = < T, N, P, S >:
1. A set T of terminal symbols (tokens).
2. A set N of nonterminal symbols.
3. A set P production (rewrite) rules.
4. A special start symbol S.
• The language L(G) is the set of sentences of terminal symbols in T* that
can be derived from the start symbol S:
L(G) = {w ∈ T* | S ⇒* w}
6
7
Terminal Symbol: Symbol-in-Boldface
Non-Terminal Symbol: Symbol-in-Angle-Brackets
Production Rule: Non-Terminal ::= Sequence of Symbols
or
Non-Terminal ::= Sequence | Sequence |
…
Elements of BNF Syntax
Example:
…
An Example of a Partial Context Free Grammar
8
…
…
CFGs are rewrite systems with restrictions on the form of
rewrite (production) rules that can be used. The left hand side
of a production rule can only be one non-terminal symbol.
Context free grammar
Rule 1 $1 ⇒ 1&
Rule 2 $0 ⇒ 0$
Rule 3 &1 ⇒ 1$
Rule 4 &0 ⇒ 0&
Rule 5 $# ⇒ →A
Rule 6 &# ⇒ →B
Not a context free grammar
How a BNF Grammar Describes a Language
A sentence is a sequence of terminal symbols (tokens).
The language L(G) of a BNF grammar G is the set of sentences
generated using the grammar:
• Begin with start symbol.
• Iteratively replace non-terminals with terminals according to the
rules (rewrite system).
This replacing process is called a derivation (⇒).
Zero or multiple derivation steps are written as ⇒*.
Formally, L(G) = {w ∈ T* | S ⇒* w}
9
10
Is X2 := 0 ∈ L(G), in another word, can X2 := 0 be derived in G?
Derivation in a Grammar (G)
In which order to apply the rules?
Typically, there are three options:
leftmost (⇒L), rightmost (⇒R), any (⇒)
Grammar G:
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
Start Symbol :
X2 := 0
?…
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
11
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
12
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
13
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
14
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
15
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
16
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
17
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
18
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
19
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
20
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
21
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
22
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
23
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
24
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
25
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
26
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
27
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
28
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
29
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
30
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
31
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
32
Derivation in a Grammar (G)
Is X2 := 0 ∈ L(G), i.e., can X2 := 0 be derived in G?
leftmost derivation rule
applied
X
X2 := 0
rightmost derivation rule
applied
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
33
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
34
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
35
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
36
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
37
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
38
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
39
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
40
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
41
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
42
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
43
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
44
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X2 := 0 Rule
45
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
46
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
X2 := 0 Rule
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser).
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
47
Parsing of a Language L(G)
Can we recognize X2 := 0 as being in L(G)?
X2 := 0 Rule
We will talk about LL(1) grammars and an example parser for
a small language (tinyL) that is implemented using mutually
recursive procedures (recursive descent parser) in next class.
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
Parse Trees (in G)
48
A parse tree of X2 := 0 in (G):
Each internal (non-leaf) node is a nonterminal;
its children are the RHS of a production for that NT.
The parse tree demonstrates that the grammar generates the terminal
string on the frontier.
X2 := 0 Rule
2
Parse Trees (in G)
49
A parse tree of X2 := 0 in (G):
Each internal (non-leaf) node is a nonterminal;
its children are the RHS of a production for that NT.
The parse tree demonstrates that the grammar generates the terminal
string on the frontier.
X2 := 0 Rule
2
Parse Trees (in G)
50
A parse tree of X2 := 0 in (G):
Each internal (non-leaf) node is a nonterminal;
its children are the RHS of a production for that NT.
The parse tree demonstrates that the grammar generates the terminal
string on the frontier.
X2 := 0 Rule
2
Parse Trees (in G)
51
A parse tree of X2 := 0 in (G):
Each internal node is a nonterminal;
its children are the RHS of a production for that NT.
The parse tree demonstrates that the grammar generates the terminal
string on the frontier.
X2 := 0 Rule
2
X
Parse Trees (in G)
52
A parse tree of X2 := 0 in (G):
Each internal node is a nonterminal;
its children are the RHS of a production for that NT.
The parse tree demonstrates that the grammar generates the terminal
string on the frontier.
X2 := 0 Rule
2
X
53
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
A Language May Have Many Grammars
Consider G’:
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
The Original Grammar G:
54
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
How about the parse tree of X2 := 0 in G’ and G?
55
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
0
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
56
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
57
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
58
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
59
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
60
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
61
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
62
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
0
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
63
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
64
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
65
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
66
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
67
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
68
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
How about the parse tree of X2 := 0 in G and G?
G and G’ generate the same language, but yield different parse trees.
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < identifier > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
Grammars and Programming Languages
• Captures the logic structure of the language
⇒ structure carries some semantic information
(example: expression grammar)
• Use meaningful names
• Are easy to read
• Are unambiguous
• …
69
Many grammars may correspond to one programming language.
Good grammars:
What’s problem with ambiguity?
Ambiguous Grammars
70
• two distinct parse trees for w, or
• two distinct leftmost derivations for w, or
• two distinct rightmost derivations for w.
We want a unique semantics of our programs, which typically
requires a unique syntactic structure.
“Time flies like an arrow; fruit flies like a banana.”
A grammar G is ambiguous iff there exists a w ∈ L(G) such that
there are:
Simple Statement Grammar
71
if
if x == 0 then if y == 0 then z := 1 else z := 2
How are nested if statements parsed with this grammar?
Simple Statement Grammar
72
if x == 0 then if y == 0 then z := 1 else z := 2
How are nested if statements parsed with this grammar?
[ ]
if
Simple Statement Grammar
73
if x == 0 then if y == 0 then z := 1 else z := 2
How are nested if statements parsed with this grammar?
[ ]
if
Dangling Else Ambiguity
74
How to deal with ambiguity?
if x = 0 then if y = 0 then z := 1 else z := 2
• Change the language to include delimiters
Example: Adding new terminal symbols.
It can fix the dangling else, expression grammars.
• Change the grammar
Example: Impose associativity and precedence in an arithmetic
expression grammar.
Dangling Else Ambiguity
75
How to deal with ambiguity?
if x = 0 then if y = 0 then z := 1 else z := 2
• Change the language to include delimiters
Example: Adding new terminal symbols.
It can fix the dangling else, expression grammars.
• Change the grammar
Example: Impose associativity and precedence in an arithmetic
expression grammar.
Changing the Language to Include Delimiters
76
if
Algol 68 if statement:
How would you use this syntax to express the meaning of
the two different parse trees for:
if x = 0 then if y = 0 then z := 1 else z := 2
if x = 0 then if y = 0 then z := 1 fi else z := 2 fi
if x = 0 then if y = 0 then z := 1 else z := 2 fi fi
This:
Or that:
Dangling Else Ambiguity
77
How to deal with ambiguity?
if x = 0 then if y = 0 then z := 1 else z := 2
• Change the language to include delimiters
Example: Adding new terminal symbols.
It can fix the dangling else, expression grammars.
• Change the grammar
Example: Impose associativity and precedence in an arithmetic
expression grammar.
Dangling Else Ambiguity
78
How to deal with ambiguity?
if x = 0 then if y = 0 then z := 1 else z := 2
• Change the language to include delimiters
Example: Adding new terminal symbols.
It can fix the dangling else, expression grammars.
• Change the grammar
Example: Impose associativity and precedence in an arithmetic
expression grammar.
Arithmetic Expression Grammar
79
< start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Parse “8 – 3 * 2”:
80
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 2
8
IParse Tree
81
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 2
8
IParse Tree
82
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 2
8
IParse Tree
83
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 2
8
IParse Tree
84
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 2
8
IParse Tree
85
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 2
8
IParse Tree
86
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d >
< d >
3
2
8
IIParse Tree
87
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d >
< d >
3
2
8
IIParse Tree
88
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d >
< d >
3
2
8
IIParse Tree
89
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d >
< d >
3
2
8
IIParse Tree
90
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d >
< d >
3
2
8
IIParse Tree
91
Possible Parse Trees
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d >
< d >
3
2
8
IIParse Tree
Changing the Language to Include Delimiters
92
(
Changing the Language to Include Delimiters
93
(
Pretty ugly, isn’t it?
Is there any other way to disambiguate our expression grammar?
(8) – ( (3) * (2) )
OR
( (8) – (3) ) * (2)
“8 – 3 * 2”
Changing the Grammar to Impose Precedence
94
< start > ::= < expr >
< expr > ::= < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > ::= 0 | 1 | 2 | 3 | … | 9
< l > ::= a | b | c | … | z
< start > ::= < expr >
< expr > ::= < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< d > | < l >
< d > ::= 0 | 1 | 2 | 3 | … | 9
< l > ::= a | b | c | … | z
New Grammar G’ Original Grammar G
95
Grouping in Parse Tree Now Reflects Precedence
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< term > * < term >
< d > < d >
< term >
3 2
8
< d >
96
Grouping in Parse Tree Now Reflects Precedence
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< term > * < term >
< d > < d >
< term >
3 2
8
< d >
97
Grouping in Parse Tree Now Reflects Precedence
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > –
< term > * < term >
< d > < d >
< term >
3 2
8
< d >
< expr >
Only One Possible Parse Tree
Precedence
98
• Low Precedence:
Addition + and Subtraction –
• Medium Precedence:
Multiplication * and Division /
• Highest Precedence:
Exponentiation ^
< start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Still Have Ambiguity…
99
3 – 2 – 1 still a problem:
< start >
< expr > – < expr >
< expr > – < expr >
< start >
< expr >–< expr >
< expr > – < expr >
3 2
1 3
2 1
Eval: (3 – 2) – 1 = 0 Eval: 3 – (2 – 1) = 2
< start > ::= < expr >
< expr > :: = < expr > + < expr > | < expr > – < expr > | < term >
< term > ::= < term > * < term > | < term > / < term > | < d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Still Have Ambiguity…
100
3 – 2 – 1 still a problem:
< start >
< expr > – < expr >
< expr > – < expr >
< start >
< expr >–< expr >
< expr > – < expr >
3 2
1 3
2 1
Eval: (3 – 2) – 1 = 0 Eval: 3 – (2 – 1) = 2
< start > ::= < expr >
< expr > :: = < expr > + < expr > | < expr > – < expr > | < term >
< term > ::= < term > * < term > | < term > / < term > | < d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Still Have Ambiguity…
101
• Grouping of operators of same precedence not disambiguated.
• Non-commutative operators: only one parse tree correct.
3 – 2 – 1 still a problem:
< start > ::= < expr >
< expr > :: = < expr > + < expr > | < expr > – < expr > | < term >
< term > ::= < term > * < term > | < term > / < term > | < d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Imposing Associativity
102
< expr > :: = < d > – < expr > |
< d >
< d > :: = 0 | 1 | 2 | 3 | … | 9
Same grammar with left / right recursion for – :
Our choices:
< expr > :: = < expr > – < d > |
< d >
< d > :: = 0 | 1 | 2 | 3 | … | 9
Or: Which one do we want for
3 – 2 – 1 ?
Associativity
103
• Deals with operators of same precedence
• Implicit grouping or parenthesizing
• Left to right: *, /, +, –
• Right to left: ^
Complete, Unambiguous Arithmetic Expression Grammar
104
< start > ::= < expr >
< expr > ::= < expr > + < term > |
< expr > − < term > |
< term >
< term > ::= < term > * < factor > |
< term > / < factor > |
< factor >
< factor >::= < g > ^ < factor > |
< g >
< g > ::= ( < expr > ) | < d > | < l >
< d > ::= 0 | 1 | 2 | … | 9
< l > ::= a | b | c | … | z
< start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Original Ambiguous Grammar G
Unambiguous Grammar G
Dealing with Ambiguity
Some facts:
• Can’t always remove an ambiguity from a grammar by
restructuring productions.
• An inherently ambiguous language does not possess an
unambiguous grammar.
• Detecting ambiguity in context-free grammars is an
undecidable problem. There is no polynomial-time
algorithm that can examine an arbitrary context-free
grammar and tell if it is ambiguous.
105
Next Lecture
106
• Read Scott, Chapter 2.2 – 2.5 (skip 2.3.3 bottom-up parsing)
Things to do: