PowerPoint Presentation
Defining syntax using CFGs
Roadmap
Last time
Defined context-free grammar
This time
CFGs for specifying a language’s syntax
Language membership
List grammars
Resolving ambiguity
CFG Review
G = (N,Σ,P,S)
means “derives in 1 or more steps”
CFG generates a string by applying productions until no non-terminals remain
Example: Nested parens
N = { Q }
Σ = { ( , ) }
P = Q → ( Q )
| ε
S = Q
Formal Definition of a CFG’s Language
Let G = (N,Σ,P,S) be a CFG. Then
L(G) = where
S is the start nonterminal of G, and
w is a sequence that consists of
(only) terminal symbols or
4
A CFG Defines a Language
CFG productions define the syntax of a language
We call this notation “BNF” (for “Backus-Naur Form”) or “extended BNF”
HTTP grammar using BNF:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html
Prog → begin Stmts end
Stmts → Stmts semicolon Stmt
| Stmt
Stmt → id assign Expr
Expr → id
| Expr plus id
List Grammars
Useful to repeat a structure arbitrarily often
Stmts → Stmts semicolon Stmt | Stmt
Stmts
;
Stmts
Stmt
Stmts
;
Stmt
Stmts
Stmt
Stmts
;
Stmts
Stmt
Stmts
;
List skews left
…
List Grammars
Stmts
;
Stmts
Stmt
Stmts
;
Stmt
Stmts
Stmt
Stmts
;
Stmts
Stmt
Stmts
;
List skews right
Useful to repeat a structure arbitrarily often
Stmts → Stmt semicolon Stmts | Stmt
List Grammars
Stmts
;
Stmts
Stmts
What if we allowed both “skews”?
Stmts → Stmts semicolon Stmts | Stmt
Stmts
;
Stmts
Stmts
;
Stmts
Stmt
Stmts
;
Stmts
Derivation Order
Leftmost Derivation: always expand the leftmost nonterminal
Rightmost Derivation: always expand the rightmost nonterminal
Prog → begin Stmts end
Stmts → Stmts semicolon Stmt
| Stmt
Stmt → id assign Expr
Expr → id
| Expr plus id
Stmt
Stmts
end
begin
Stmts
Prog
semicolon
Leftmost expands
this nonterminal
Rightmost expands
this nonterminal
Ambiguity
Even with a fixed derivation order, it is possible to derive the same string in multiple ways
For Grammar G and string w
G is ambiguous if
>1 leftmost derivation of w
>1 rightmost derivation of w
> 1 parse tree for w
Exercise
Give a grammar G and a word w that has more than 1 left-most derivation in G
Example: Ambiguous Grammars
Expr
Expr
minus
Expr
Expr → intlit
| Expr minus Expr
| Expr times Expr
| lparen Expr rparen
Derive the string 4 – 7 * 3
(assume tokenization)
Expr
Expr
times
intlit
intlit
intlit
4
7
3
Expr
times
Expr
Expr
Expr
minus
intlit
intlit
4
7
Expr
intlit
3
Parse Tree 1
Parse Tree 2
Why is Ambiguity Bad?
Why is Ambiguity Bad?
Eventually, we’ll be using CFGs as the basis for our parser
Parsing is much easier when there is no ambiguity in the grammar
The parse tree may mismatch user understanding!
Expr
Expr
minus
Expr
Expr
Expr
times
intlit
intlit
intlit
4
7
3
Expr
times
Expr
Expr
Expr
minus
intlit
intlit
4
7
Expr
intlit
3
4 – 7 * 3
Operator precedence
Resolving Grammar Ambiguity: Precedence
Intuitive problem
Nonterminals are the same for both operators
To fix precedence
1 nonterminal per precedence level
Parse lowest level first
Expr → intlit
| Expr minus Expr
| Expr times Expr
| lparen Expr rparen
Resolving Grammar Ambiguity: Precedence
lowest precedence level first
1 nonterminal per precedence level
Expr → intlit
| Expr minus Expr
| Expr times Expr
| lparen Expr rparen
Expr → Expr minus Expr
| Term
Term → Term times Term
| Factor
Factor → intlit
| lparen Expr rparen
Expr
Expr
minus
Expr
Term
Term
Term
times
Factor
Factor
3
intlit
7
intlit
Term
Factor
intlit
4
Derive the string 4 – 7 * 3
Resolving Grammar Ambiguity: Precedence
Fixed Grammar
Expr → expr minus expr
| Term
Term → Term times Term
| Factor
Factor → intlit
| lparen Expr rparen
Expr
Expr
minus
Expr
Term
Term
Term
times
Factor
Factor
3
intlit
7
intlit
Term
Factor
intlit
4
Derive the string 4 – 7 * 3
Let’s try to re-build the wrong parse tree
Term
Term
times
Term
Factor
intlit
3
Expr
We’ll never be able to derive minus
without parens
Did we fix all ambiguity?
Fixed Grammar
Expr → Expr minus Expr
| Term
Term → Term times Term
| Factor
Factor → intlit
| lparen Expr rparen
Derive the string 4 – 7 – 3
Expr
Expr
minus
Expr
intlit
Term
Factor
intlit
Term
Factor
intlit
Term
Factor
Expr
Expr
minus
NO!
These subtrees could have
been swapped!
Where we are so far
Precedence
We want correct behavior on 4 – 7 * 9
A new nonterminal for each precedence level
Associativity
We want correct behavior on 4 – 7 – 9
Minus should be left associative: a – b – c = (a – b) – c
Problem: the recursion in a rule like
Expr → Expr minus Expr
Definition: Recursion in Grammars
A grammar is recursive in (nonterminal) X if
for non-empty strings of symbols and
A grammar is left-recursive in X if
for non-empty string of symbols
A grammar is right-recursive in X if
for non-empty string of symbols
Resolving Grammar Ambiguity: Associativity
Recognize left-assoc operators with left-recursive productions
Recognize right-assoc operators with right-recursive productions
Expr → Expr minus Expr
| Term
Term → Term times Term
| Factor
Factor → intlit | lparen Expr rparen
Term
Factor
Example: 4 – 7 – 9
E
–
E
intlit
T
F
intlit
T
F
intlit
T
F
E
–
4
7
9
Expr → Expr minus Term
| Term
Term → Term times Factor
| Factor
Factor → intlit | lparen Expr rparen
Example: 4 – 7 – 9
E
–
E
T
intlit
T
F
4
Resolving Grammar Ambiguity: Associativity
Let’s try to re-build the wrong parse tree again
We’ll never be able to derive minus
without parens
Example
Language of Boolean expressions
bexp → TRUE
bexp → FALSE
bexp → bexp OR bexp
bexp → bexp AND bexp
bexp → NOT bexp
bexp → LPAREN bexp RPAREN
Add nonterminals so that OR has lowest precedence, then AND, then NOT. Then change the grammar to reflect the fact that both AND and OR are left associative.
Draw a parse tree for the expression:
true AND NOT true.
Another ambiguous example
Stmt →
if Cond then Stmt |
if Cond then Stmt else Stmt | …
Consider this word in this grammar:
if a then if b then s else s2
How would you derive it?
Summary
To understand how a parser works, we start by understanding context-free grammars, which are used to define the language recognized by the parser.
terminal symbol
(non)terminal symbol
grammar rule (or production)
derivation (leftmost derivation, rightmost derivation)
parse (or derivation) tree
the language defined by a grammar
ambiguous grammar
/docProps/thumbnail.jpeg