留学生代考 Defining syntax using CFGs

Defining syntax using CFGs

– Defined context-free grammar

– CFGs for specifying a language’s syntax
• Language membership
• List grammars
• Resolving ambiguity

CFG Review

• G = (N,Σ,P,S)
• ⇒!means “derives in

1 or more steps”
• CFG generates a

string by applying
productions until no
non-terminals remain

Example: Nested parens
Σ = { ( , ) }
P = Q → ( Q )

Formal Definition of a CFG’s Language

Let G = (N,Σ,P,S) be a CFG. Then

L(G) = 𝑤 𝑆 ⇒! 𝑤 where
S is the start nonterminal of G, and
w is a sequence that consists of

(only) terminal symbols or 𝜀

A CFG Defines a Language
CFG productions define the syntax of a language

We call this notation “BNF” (for “Backus-Naur Form”) or “extended BNF”
HTTP grammar using BNF:

– http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html

1. Prog → begin Stmts end

2. Stmts → Stmts semicolon Stmt

3. | Stmt

4. Stmt → id assign Expr

5. Expr → id

6. | Expr plus id

List Grammars
• Useful to repeat a structure arbitrarily often

Stmts → Stmts semicolon Stmt | ;

StmtStmts ;

List skews left

List Grammars

List skews right

• Useful to repeat a structure arbitrarily often

Stmts → Stmt semicolon Stmts | Stmt

List Grammars

• What if we allowed both “skews”?

Stmts → Stmts semicolon Stmts | ; ; Stmts;Stmts

Derivation Order
• Leftmost Derivation: always expand the leftmost nonterminal
• Rightmost Derivation: always expand the rightmost nonterminal

1. Prog → begin Stmts end

2. Stmts → Stmts semicolon Stmt

3. | Stmt

4. Stmt → id assign Expr

5. Expr → id

6. | Expr plus id

Leftmost expands
this nonterminal

Rightmost expands
this nonterminal

Even with a fixed derivation order, it is
possible to derive the same string in
multiple ways
For Grammar G and string w
–G is ambiguous if
• >1 leftmost derivation of w
• >1 rightmost derivation of w
• > 1 parse tree for w

• Give a grammar G and a word w that has more
than 1 left-most derivation in G

Example: Ambiguous Grammars

ExprExpr minus

| Expr minus Expr

| Expr times Expr

| lparen Expr rparen

Derive the string 4 – 7 * 3

(assume tokenization)

ExprExpr times

Expr times

Expr minus

intlit intlit

Parse Tree 1 Parse Tree 2

Why is Ambiguity Bad?

Eventually, we’ll be using CFGs as the basis for our parser
– Parsing is much easier when there is no ambiguity in the

– The parse tree may mismatch user understanding!

ExprExpr minus

Expr times

Expr minus

intlit intlit

precedence

Resolving Grammar Ambiguity: Precedence

Intuitive problem
– Nonterminals are the same for both

To fix precedence
– 1 nonterminal per precedence level
– Parse lowest level first

Expr → intlit

| Expr minus Expr

| Expr times Expr

| lparen Expr rparen

Resolving Grammar Ambiguity: Precedence
lowest precedence level first
1 nonterminal per precedence levelExpr → intlit

| Expr minus Expr

| Expr times Expr

| lparen Expr rparen

Expr → Expr minus Expr

Term → Term times Term

Factor → intlit

| lparen Expr rparen

ExprExpr minus

TermTerm times

Factor Factor

Derive the string 4 – 7 * 3

Resolving Grammar Ambiguity: Precedence
Fixed Grammar

Expr → expr minus expr

Term → Term times Term

Factor → intlit

| lparen Expr rparen

ExprExpr minus

TermTerm times

Factor Factor

3intlit 7 intlit

Derive the string 4 – 7 * 3
Let’s try to re-build the wrong parse tree

TermTerm times

We’ll never be able
to derive minus
without parens

Did we fix all ambiguity?
Fixed Grammar

Expr → Expr minus Expr

Term → Term times Term

Factor → intlit

| lparen Expr rparen

Derive the string 4 – 7 – 3

ExprExpr minus

These subtrees could have
been swapped!

Where we are so far

Precedence
– We want correct behavior on 4 – 7 * 9
– A new nonterminal for each precedence level

Associativity
– We want correct behavior on 4 – 7 – 9
– Minus should be left associative: a – b – c = (a – b) – c
– Problem: the recursion in a rule like

Expr → Expr minus Expr

Definition: Recursion in Grammars

• A grammar is recursive in (nonterminal) X if
𝑋 ⇒! α𝑋γ for non-empty strings of symbols α and γ

• A grammar is left-recursive in X if
𝑋 ⇒! 𝑋γ for non-empty string of symbols γ

• A grammar is right-recursive in X if
𝑋 ⇒! α𝑋 for non-empty string of symbols α

Resolving Grammar Ambiguity: Associativity

Recognize left-assoc operators with left-recursive productions
Recognize right-assoc operators with right-recursive productions

Expr → Expr minus Expr

Term → Term times Term

Factor → intlit | lparen Expr rparen

Example: 4 – 7 – 9

Expr → Expr minus Term

Term → Term times Factor

Factor → intlit | lparen Expr rparen

Example: 4 – 7 – 9

Resolving Grammar Ambiguity: Associativity

Let’s try to re-build the wrong parse tree again

We’ll never be able
to derive minus
without parens

• Language of Boolean expressions
– bexp → TRUE

bexp → FALSE
bexp → bexp OR bexp
bexp → bexp AND bexp
bexp → NOT bexp
bexp → LPAREN bexp RPAREN

• Add nonterminals so that OR has lowest
precedence, then AND, then NOT. Then change
the grammar to reflect the fact that
both AND and OR are left associative.

• Draw a parse tree for the expression:
– true AND NOT true.

Another ambiguous example

if Cond then Stmt |
if Cond then Stmt else Stmt | …

Consider this word in this grammar:
if a then if b then s else s2

How would you derive it?

To understand how a parser works, we start by
understanding context-free grammars, which are used to
define the language recognized by the parser.
terminal symbol
– (non)terminal symbol
– grammar rule (or production)
– derivation (leftmost derivation, rightmost derivation)
– parse (or derivation) tree
– the language defined by a grammar
– ambiguous grammar

程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts