lec5.key
CS 314 Principles of Programming Languages
Prof. Zheng Zhang
Rutgers University
Lecture 5: Syntax Analysis (Parsing)
September 19, 2018
Class Information
2
• Homework 1 is being graded now.
The sample solution will be posted soon.
• Homework 2 will be posted tomorrow.
Review: Context Free Grammars (CFGs)
• A formalism to for describing languages
• A CFG G = < T, N, P, S >:
1. A set T of terminal symbols (tokens).
2. A set N of nonterminal symbols.
3. A set P production (rewrite) rules.
4. A special start symbol S.
• The language L(G) is the set of sentences of terminal symbols in T* that
can be derived from the start symbol S:
L(G) = {w ∈ T* | S ⇒* w}
3
4
Terminal Symbol: Symbol-in-Boldface
Non-Terminal Symbol: Symbol-in-Angle-Brackets
Production Rule: Non-Terminal ::= Sequence of Symbols
or
Non-Terminal ::= Sequence | Sequence |
…
Elements of BNF Syntax
Example:
…
terminal
non-terminalterminal
Review: Context Free Grammar
5
…
…
CFGs are rewrite systems with restrictions on the form of
rewrite (production) rules that can be used. The left hand side
of a production rule can only be one non-terminal symbol.
Rule 1 $1 ⇒ 1&
Rule 2 $0 ⇒ 0$
Rule 3 &1 ⇒ 1$
Rule 4 &0 ⇒ 0&
Rule 5 $# ⇒ →A
Rule 6 &# ⇒ →B
Context free grammar Not a context free grammar
6
A Language May Have Many Grammars
Consider G’: The Original Grammar G:
2
2
X2 := 0
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < ident > ::= < letter > |
4. < ident > < letterordigit >
5. < stmt > ::= < ident > := 0
6. < letterordigit > ::= < letter > | < digit >
1. < letter > ::= A | B | C | … | Z
2. < digit > ::= 0 | 1 | 2 | … | 9
3. < identifier > ::= < letter > |
4. < identifier > < letter > |
5. < identifier > < digit >
6. < stmt > ::= < identifier > := 0
X X
Review: Grammars and Programming Languages
• Captures the logic structure of the language
⇒ structure carries some semantic information
(example: expression grammar)
• Use meaningful names
• Are easy to read
• Are unambiguous
• …
7
Many grammars may correspond to one programming language.
Good grammars:
Review: Ambiguous Grammars
8
• two distinct parse trees for w, or
• two distinct leftmost derivations for w, or
• two distinct rightmost derivations for w.
We want a unique semantics of our programs, which
typically requires a unique syntactic structure.
“Time flies like an arrow; fruit flies like a banana.”
A grammar G is ambiguous iff there exists a w ∈ L(G) such that
there are:
Review: Arithmetic Expression Grammar
9
< start > ::= < expr >
< expr > :: = < expr > – < expr > |
< expr > * < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Parse “8 – 3 * 2”:
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 2
8
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d >
< d >
3
2
8
Two parse trees!
Review: Arithmetic Expression Grammar
10
Parse “8 – 3 * 2”:
< expr >
< expr > – < expr >
< expr > * < expr >
< d > < d >
< d >
3 28
< expr >
< expr > * < expr >
< expr > – < expr >
< d > < d > < d >
3 28
leftmost derivation
leftmost derivation
Two Parse Trees —> Two leftmost derivations!
Review: Ambiguity
11
How to deal with ambiguity?
• Change the language
Example: Adding new terminal symbols as delimiters.
Fix the dangling else, expression grammars.
• Change the grammar
Example: Impose associativity and precedence in an arithmetic
expression grammar.
Changing the Grammar to Impose Precedence
12
< start > ::= < expr >
< expr > ::= < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > ::= 0 | 1 | 2 | 3 | … | 9
< l > ::= a | b | c | … | z
< start > ::= < expr >
< expr > ::= < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< d > | < l >
< d > ::= 0 | 1 | 2 | 3 | … | 9
< l > ::= a | b | c | … | z
Modified Grammar G’Original Grammar G
13
Grouping in Parse Tree Now Reflects Precedence
Parse “8 – 3 * 2”: < start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< expr >
< expr > – < expr >
< term > * < term >
< d > < d >
< term >
3 28
< d >
Modified Grammar G’
Only One Possible Parse Tree
Precedence
14
• Low Precedence:
Addition + and Subtraction –
• Medium Precedence:
Multiplication * and Division /
• Highest Precedence:
Exponentiation ^
< start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< term >
< term > ::= < term > * < term > |
< term > / < term > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Still Have Ambiguity…
15
< start >
< expr > – < expr >
< expr > – < expr >
< start >
< expr >–< expr >
< expr > – < expr >
3 2
1 3
2 1
Eval: (3 – 2) – 1 = 0 Eval: 3 – (2 – 1) = 2
< start > ::= < expr >
< expr > :: = < expr > + < expr > | < expr > – < expr > | < term >
< term > ::= < term > * < term > | < term > / < term > | < d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
3 – 2 – 13 – 2 – 1 OR?
How about 3 – 2 – 1 ?
Still Have Ambiguity…
16
How about 3 – 2 – 1 ?
< start > ::= < expr >
< expr > :: = < expr > + < expr > | < expr > – < expr > | < term >
< term > ::= < term > * < term > | < term > / < term > | < d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
< start >
< expr > – < expr >
< expr > – < expr >
< start >
< expr >–< expr >
< expr > – < expr >
3 2
1 3
2 1
3 – 2 – 13 – 2 – 1 OR?
Still Have Ambiguity…
17
• Grouping of operators of same precedence not disambiguated.
• Non-commutative operators: only one parse tree correct.
< start > ::= < expr >
< expr > :: = < expr > + < expr > | < expr > – < expr > | < term >
< term > ::= < term > * < term > | < term > / < term > | < d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Imposing Associativity
18
< expr > :: = < d > – < expr > |
< d >
< d > :: = 0 | 1 | 2 | 3 | … | 9
Same grammar with left / right recursion for – :
Our choices:
< expr > :: = < expr > – < d > |
< d >
< d > :: = 0 | 1 | 2 | 3 | … | 9
Or:
Which one do we want for – in the calculator language?
< d > –
< d > – < d > –
< d > – < d > – < d > –
…
…
Associativity
19
• Deals with operators of same precedence
• Implicit grouping or parenthesizing
• Left to right: *, /, +, –
• Right to left: ^
Complete, Unambiguous Arithmetic Expression Grammar
20
< start > ::= < expr >
< expr > ::= < expr > + < term > |
< expr > − < term > |
< term >
< term > ::= < term > * < factor > |
< term > / < factor > |
< factor >
< factor >::= < g > ^ < factor > |
< g >
< g > ::= ( < expr > ) | < d > | < l >
< d > ::= 0 | 1 | 2 | … | 9
< l > ::= a | b | c | … | z
< start > ::= < expr >
< expr > :: = < expr > + < expr > |
< expr > – < expr > |
< expr > * < expr > |
< expr > / < expr > |
< expr > ^ < expr > |
< d > | < l >
< d > :: = 0 | 1 | 2 | 3 | … | 9
< l > :: = a | b | c | … | z
Original Ambiguous Grammar G
Unambiguous Grammar G
Abstract versus Concrete Syntax
21
Concrete Syntax:
Representation of a construct in a particular language,
including placement of keywords and delimiters.
Abstract Syntax:
Structure of meaningful components of each language construct .
Example:
22
< S > ::= < E >
< E > ::= < E > − < T > | < T >
< T > ::= < T > * id | id
Consider A * B − C:
Concrete Syntax Tree
Abstract Syntax Tree
(AST)
< S >
< E >
< E > < T > −
< T >
< T > * id
id
id
−
* id
id id
Abstract versus Concrete Syntax
23
Same abstract syntax, different concrete syntax:
Pascal while x <> A[i], do
i := i + 1
end
C while ( x != A[i])
i = i + 1;
Regular vs. Context Free
24
• All Regular languages are context free languages
• Not all context free languages are regular languages
Example:
is equivalent to: a b* | c+
Is {an bn |n ≥ 0} a context free language?
Question:
Regular vs. Context Free
Regular vs. Context Free
26
• All Regular languages are context free languages
• Not all context free languages are regular languages
Example:
is equivalent to: a b* | c+
Is {an bn |n ≥ 0} a context free language?
Is {an bn |n ≥ 0} a regular language?
Question:
Regular Grammars
27
CFGs with restrictions on the shape of production rules.
Left-linear:
Right-linear:
Complexity of Parsing
28
Classification of languages that can be recognized by
specific grammars.
Regular grammars DFAs O(n)
LR grammars Kunth’s algorithm O(n)
Arbitrary CFGs Earley’s algorithm O(n3)
Arbitrary CSGs LBAs P-SPACE
COMPLETE
Complexity:
Reading:
Scott Chapter 2.3.4 (for LR parser) and Chapter 2.4.3 for
language class.
Earley, Jay (1970), “An efficient context-free parsing
algorithm”, Communications of the ACM.
CSG
CFG
RG
LR
Top – Down Parsing – LL(1)
29
• The parse tree is constructed from the root, expanding non-terminal
nodes on the tree’s frontier following a leftmost derivation.
• The input program is read from left to right, and input tokens are read
(consumed) as the program is parsed.
• The next non-terminal symbol is replaced using one of its rules. The
particular choice has to be unique and uses parts of the input (partially
parsed program), for instance the first token of the remaining input.
Basic Idea:
x
y
α
β
Top – Down Parsing – LL(1) (cont.)
30
Example:
S ::= a S b | ε
How can we parse (automatically construct a leftmost derivation)
the input string a a a b b b using a PDA (push-down automaton)
and only the first symbol of the remaining input?
INPUT: a a a b b b eof
LL(1) Parsing Example
31
S ::= a S b | ε
S Remaining Input: a a a b b b
Sentential Form:
S
Applied Production:
S
LL(1) Parsing Example
32
S ::= a S b | ε
S Remaining Input: a a a b b b
Sentential Form:
a S b
Applied Production:
S ::= a S b
Sa b
a
S
b
LL(1) Parsing Example
33
S ::= a S b | ε
S Remaining Input: a a a b b b
Sentential Form:
a S b
Applied Production:
Sa b
Match!
a
S
b
LL(1) Parsing Example
34
S ::= a S b | ε
S Remaining Input: a a b b b
Sentential Form:
a S b
Applied Production:
Sa b
S
b
LL(1) Parsing Example
35
S ::= a S b | ε
S Remaining Input: a a b b b
Sentential Form:
a a S b b
Applied Production:
S ::= a S b
Sa b
Sa b
a
S
b
b
LL(1) Parsing Example
36
S ::= a S b | ε
S Remaining Input: a a b b b
Sentential Form:
a a S b b
Applied Production:
Sa b
Sa b
Match!
a
S
b
b
LL(1) Parsing Example
37
S ::= a S b | ε
S Remaining Input: a b b b
Sentential Form:
a a S b b
Applied Production:
Sa b
Sa b
S
b
b
LL(1) Parsing Example
38
S ::= a S b | ε
S Remaining Input: a b b b
Sentential Form:
a a a S b b b
Applied Production:
S ::= a S b
Sa b
Sa b
S ba
a
S
b
b
b
LL(1) Parsing Example
39
S ::= a S b | ε
S Remaining Input: a b b b
Sentential Form:
a a a S b b b
Applied Production:
Sa b
Sa b
S ba
Match!
a
S
b
b
b
LL(1) Parsing Example
40
S ::= a S b | ε
S Remaining Input: b b b
Sentential Form:
a a a S b b b
Applied Production:
Sa b
Sa b
S baS
b
b
b
LL(1) Parsing Example
41
S ::= a S b | ε
S Remaining Input: b b b
Sentential Form:
a a a b b b
Applied Production:
S ::= ε
Sa b
Sa b
S ba
ε
b
b
b
LL(1) Parsing Example
42
S ::= a S b | ε
S Remaining Input: b b b
Sentential Form:
a a a b b b
Applied Production:
Sa b
Sa b
S ba
ε
Match!b
b
b
LL(1) Parsing Example
43
S ::= a S b | ε
S Remaining Input: b b
Sentential Form:
a a a b b b
Applied Production:
Sa b
Sa b
S ba
εb
b
LL(1) Parsing Example
44
S ::= a S b | ε
S Remaining Input: b b
Sentential Form:
a a a b b b
Applied Production:
Sa b
Sa b
S ba
ε
Match!
b
b
LL(1) Parsing Example
45
S ::= a S b | ε
S Remaining Input: b
Sentential Form:
a a a b b b
Applied Production:
Sa b
Sa b
S ba
ε
b
LL(1) Parsing Example
46
S ::= a S b | ε
S Remaining Input: b
Sentential Form:
a a a b b b
Applied Production:
Sa b
Sa b
S ba
ε
Match!
b
LL(1) Parsing Example
47
S ::= a S b | ε
S Remaining Input:
Sentential Form:
a a a b b b
Applied Production:
Sa b
Sa b
S ba
ε
Next Lecture
48
• Read Scott, Chapter 2.3.1 – 2.3.2 (and the materials on companion site)
Next Time: