Defining syntax using CFGs
– Defined context-free grammar
Copyright By PowCoder代写 加微信 powcoder
– CFGs for specifying a language’s syntax
• Language membership
• List grammars
• Resolving ambiguity
CFG Review
• G = (N,Σ,P,S)
• ⇒!means “derives in
1 or more steps”
• CFG generates a
string by applying
productions until no
non-terminals remain
Example: Nested parens
Σ = { ( , ) }
P = Q → ( Q )
Formal Definition of a CFG’s Language
Let G = (N,Σ,P,S) be a CFG. Then
L(G) = 𝑤 𝑆 ⇒! 𝑤 where
S is the start nonterminal of G, and
w is a sequence that consists of
(only) terminal symbols or 𝜀
A CFG Defines a Language
CFG productions define the syntax of a language
We call this notation “BNF” (for “Backus-Naur Form”) or “extended BNF”
HTTP grammar using BNF:
– http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html
1. Prog → begin Stmts end
2. Stmts → Stmts semicolon Stmt
3. | Stmt
4. Stmt → id assign Expr
5. Expr → id
6. | Expr plus id
List Grammars
• Useful to repeat a structure arbitrarily often
Stmts → Stmts semicolon Stmt | ;
StmtStmts ;
List skews left
List Grammars
List skews right
• Useful to repeat a structure arbitrarily often
Stmts → Stmt semicolon Stmts | Stmt
List Grammars
• What if we allowed both “skews”?
Stmts → Stmts semicolon Stmts | ; ; Stmts;Stmts
Derivation Order
• Leftmost Derivation: always expand the leftmost nonterminal
• Rightmost Derivation: always expand the rightmost nonterminal
1. Prog → begin Stmts end
2. Stmts → Stmts semicolon Stmt
3. | Stmt
4. Stmt → id assign Expr
5. Expr → id
6. | Expr plus id
Leftmost expands
this nonterminal
Rightmost expands
this nonterminal
Even with a fixed derivation order, it is
possible to derive the same string in
multiple ways
For Grammar G and string w
–G is ambiguous if
• >1 leftmost derivation of w
• >1 rightmost derivation of w
• > 1 parse tree for w
• Give a grammar G and a word w that has more
than 1 left-most derivation in G
Example: Ambiguous Grammars
ExprExpr minus
| Expr minus Expr
| Expr times Expr
| lparen Expr rparen
Derive the string 4 – 7 * 3
(assume tokenization)
ExprExpr times
Expr times
Expr minus
intlit intlit
Parse Tree 1 Parse Tree 2
Why is Ambiguity Bad?
Why is Ambiguity Bad?
Eventually, we’ll be using CFGs as the basis for our parser
– Parsing is much easier when there is no ambiguity in the
– The parse tree may mismatch user understanding!
ExprExpr minus
Expr times
Expr times
Expr minus
intlit intlit
precedence
Resolving Grammar Ambiguity: Precedence
Intuitive problem
– Nonterminals are the same for both
To fix precedence
– 1 nonterminal per precedence level
– Parse lowest level first
Expr → intlit
| Expr minus Expr
| Expr times Expr
| lparen Expr rparen
Resolving Grammar Ambiguity: Precedence
lowest precedence level first
1 nonterminal per precedence levelExpr → intlit
| Expr minus Expr
| Expr times Expr
| lparen Expr rparen
Expr → Expr minus Expr
Term → Term times Term
Factor → intlit
| lparen Expr rparen
ExprExpr minus
TermTerm times
Factor Factor
Derive the string 4 – 7 * 3
Resolving Grammar Ambiguity: Precedence
Fixed Grammar
Expr → expr minus expr
Term → Term times Term
Factor → intlit
| lparen Expr rparen
ExprExpr minus
TermTerm times
Factor Factor
3intlit 7 intlit
Derive the string 4 – 7 * 3
Let’s try to re-build the wrong parse tree
TermTerm times
We’ll never be able
to derive minus
without parens
Did we fix all ambiguity?
Fixed Grammar
Expr → Expr minus Expr
Term → Term times Term
Factor → intlit
| lparen Expr rparen
Derive the string 4 – 7 – 3
ExprExpr minus
ExprExpr minus
These subtrees could have
been swapped!
Where we are so far
Precedence
– We want correct behavior on 4 – 7 * 9
– A new nonterminal for each precedence level
Associativity
– We want correct behavior on 4 – 7 – 9
– Minus should be left associative: a – b – c = (a – b) – c
– Problem: the recursion in a rule like
Expr → Expr minus Expr
Definition: Recursion in Grammars
• A grammar is recursive in (nonterminal) X if
𝑋 ⇒! α𝑋γ for non-empty strings of symbols α and γ
• A grammar is left-recursive in X if
𝑋 ⇒! 𝑋γ for non-empty string of symbols γ
• A grammar is right-recursive in X if
𝑋 ⇒! α𝑋 for non-empty string of symbols α
Resolving Grammar Ambiguity: Associativity
Recognize left-assoc operators with left-recursive productions
Recognize right-assoc operators with right-recursive productions
Expr → Expr minus Expr
Term → Term times Term
Factor → intlit | lparen Expr rparen
Example: 4 – 7 – 9
Expr → Expr minus Term
Term → Term times Factor
Factor → intlit | lparen Expr rparen
Example: 4 – 7 – 9
Resolving Grammar Ambiguity: Associativity
Let’s try to re-build the wrong parse tree again
We’ll never be able
to derive minus
without parens
• Language of Boolean expressions
– bexp → TRUE
bexp → FALSE
bexp → bexp OR bexp
bexp → bexp AND bexp
bexp → NOT bexp
bexp → LPAREN bexp RPAREN
• Add nonterminals so that OR has lowest
precedence, then AND, then NOT. Then change
the grammar to reflect the fact that
both AND and OR are left associative.
• Draw a parse tree for the expression:
– true AND NOT true.
Another ambiguous example
if Cond then Stmt |
if Cond then Stmt else Stmt | …
Consider this word in this grammar:
if a then if b then s else s2
How would you derive it?
To understand how a parser works, we start by
understanding context-free grammars, which are used to
define the language recognized by the parser.
terminal symbol
– (non)terminal symbol
– grammar rule (or production)
– derivation (leftmost derivation, rightmost derivation)
– parse (or derivation) tree
– the language defined by a grammar
– ambiguous grammar
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com