Context-Free Grammars
19 March 2019 OSU CSE 1
BL Compiler Structure
Copyright By PowCoder代写 加微信 powcoder
string of characters (source code)
string of tokens (“words”)
abstract program
integers (object code)
19 March 2019
The parser is arguably the most interesting, and most difficult, piece of the BL compiler.
Code Generator
Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
• Use the grammar to implement a
recursive-descent parser (i.e., an
algorithm to parse a BL program and construct the corresponding Program
19 March 2019 OSU CSE 3
Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
• Use the grammar to implement a recursive-descent parser (i.e., an
construct the corresponding Program
A algorithm to parse
a language.
formation rules for strings in
19 March 2019 OSU CSE 4
Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
• Use the grammar to implement a recursive-descent parser (i.e., an
A grammar is context-free algorithm to parse a BL program and
if it satisfies certain construct the corresponding Program
technical conditions described herein.
19 March 2019 OSU CSE 5
• A language is a set of strings over some
alphabet Σ
• If L is a language, then mathematically it is
a set of string of Σ
19 March 2019 OSU CSE 6
Aside: Characters vs. Tokens
• In the following examples of CFGs, we deal with languages over the alphabet of
individual characters (e.g., Java’s char values)
Σ = character
• In the BL project, we deal with languages
over an alphabet of tokens (to be explained later)
19 March 2019 OSU CSE 7
Example: Real-Number Constants
• Some syntactically valid real-number constants (i.e., some strings in the “language of valid real-number constants”):
19 March 2019
real-const
exponent digit-seq
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
E + digit-seq | E – digit-seq
→ digit digit-seq | digit
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 9
real-const
exponent digit-seq
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
E + digit-seq |
This is a rewrite rule (a E – digit-seq
replacement rule), which → digit digit-seq |
describes how strings in the
language may be formed.
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 10
real-const
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
exponent digit-seq →
e left of a
→ E digit-seq |
E + digit-seq |
terminal symbol.
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 11
real-const exponent
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
E + digit-seq |
digit-seq m
written as”
n be replaced by”.
G symbol →
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 12
real-const
exponent digit-seq
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
E + digit-seq |
The special CFG symbol | E – digit-seq
means “or”, i.e., there are → digit digit-seq |
multiple possible “rewrites”
for the same non-terminal.
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 13
real-const
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
E + digit-seq | E – digit-seq
exponent digit-seq
→ digit digit-seq |
So this …
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 14
real-const real-const real-const real-const exponent
CFG Rewrite Rules
→ digit-seq . digit-seq
→ digit-seq . digit-seq exponent → digit-seq .
→ digit-seq . exponent
→ E digit-seq |
E + digit-seq | E – digit-seq
… means exactly the same
→ digit digit-seq |
thing as these four separate
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 15
rewrite rules.
real-const
exponent digit-seq
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
E + digit-seq |
One non-terminal symbol
E – digit-seq
(normally in the first rewrite
→ digit digit-seq | rule) is called the
start symbol.
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 16
real-const
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
digit-seq digit
19 March 2019
E + digit-seq |
A symbol from the alphabet
E – digit-seq
on the right-hand side of a
→ digit digit-seq | rewrite rule is called a
terminal symbol. →0|1|2|3|4|5|6|7|8|9
OSU CSE 17
real-const exponent
CFG Rewrite Rules
→ digit-seq . digit-seq |
digit-seq . digit-seq exponent | digit-seq . |
digit-seq . exponent
→ E digit-seq |
E + digit-seq |
To remember the name: terminal
E – digit-seq
symbols are what you end up with
digit-seq → digit digit-seq |
when generating strings in the
language (see below).
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 18
Four Components of a CFG • Non-terminal symbols for this CFG:
– real-const, exponent, digit-seq, digit • Terminal symbols for this CFG:
– .,E,+,-,0,1,2,3,4,5,6,7,8,9 • Start symbol for this CFG:
– real-const
• Rewrite rules for this CFG:
– (see previous slides)
19 March 2019 OSU CSE 19
Derivations
• A derivation of a string of terminal symbols consists of a sequence of specific rewrite-rule applications that begin with the start symbol and continue until only terminal symbols remain
– A string is in the language of the CFG iff there is a derivation that leads to it
• The symbol ⇒ indicates a derivation step, i.e., a specific rewrite-rule application
19 March 2019 OSU CSE 20
Example: Derivation of 5.6E10 • Begin with the start symbol:
real-const ⇒
19 March 2019 OSU CSE 21
Example: Derivation of 5.6E10 • Begin with the start symbol:
real-const ⇒
• … and pick one possible rewrite:
real-const → digit-seq . digit-seq |
Which rewrite is appropriate
digit-seq . digit-seq exponent |
digit-seq . |
digit-seq . exponent to derive
5.6E10? 19 March 2019
OSU CSE 22
Example: Derivation of 5.6E10 • This is the first step of the derivation:
real-const ⇒ digit-seq . digit-seq exponent
19 March 2019 OSU CSE 23
Example: Derivation of 5.6E10 • Choose a non-terminal to rewrite:
real-const ⇒ digit-seq . digit-seq exponent
19 March 2019 OSU CSE 24
Example: Derivation of 5.6E10
real-const ⇒ digit-seq . digit-seq exponent
• Choose a non-terminal to rewrite:
• … and pick one possible rewrite: digit-seq → digit digit-seq |
Which rewrite is appropriate to derive 5.6E10?
19 March 2019
OSU CSE 25
Example: Derivation of 5.6E10 • This is the second step of the derivation:
real-const ⇒ digit-seq . digit-seq exponent ⇒ digit . digit-seq exponent
19 March 2019 OSU CSE 26
Example: Derivation of 5.6E10 • Choose a non-terminal to rewrite:
real-const ⇒ digit-seq . digit-seq exponent ⇒ digit . digit-seq exponent
19 March 2019 OSU CSE 27
Example: Derivation of 5.6E10 • Choose a non-terminal to rewrite:
real-const ⇒ digit-seq . digit-seq exponent ⇒ digit . digit-seq exponent
• … and pick one possible rewrite:
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 28
Example: Derivation of 5.6E10 • This is the third step of the derivation:
real-const ⇒ digit-seq . digit-seq exponent ⇒ digit . digit-seq exponent
⇒ 5 . digit-seq exponent
19 March 2019 OSU CSE 29
Example: Derivation of 5.6E10 • Choose a non-terminal to rewrite:
real-const ⇒ digit-seq . digit-seq exponent ⇒ digit . digit-seq exponent
⇒ 5 . digit-seq exponent
19 March 2019 OSU CSE 30
Example: Derivation of 5.6E10 • Choose a non-terminal to rewrite:
real-const ⇒ digit-seq . digit-seq exponent ⇒ digit . digit-seq exponent
⇒ 5 . digit-seq exponent
• … and pick one possible rewrite: digit-seq → digit digit-seq |
19 March 2019
OSU CSE 31
One Derivation of 5.6E10
real-const
⇒ digit-seq . digit-seq exponent ⇒ digit . digit-seq exponent
⇒ 5 . digit-seq exponent
⇒ 5 . digit exponent
19 March 2019
⇒ 5 . 6 exponent
⇒ 5 . 6 E digit-seq
⇒ 5 . 6 E digit digit-seq ⇒ 5.6 E1digit-seq
⇒ 5.6 E1digit
OSU CSE 32
19 March 2019
OSU CSE 33
One Derivation of 5.6E10
⇒ digit-seq .
d ⇒ digit.digit-s
real-const
⇒ 5 . digit-se l
⇒ 5.digit ex
⇒ 5 . 6 exponent
⇒ 5 . 6 E digit-seq
⇒ 5 . 6 E digit digit-seq ⇒ 5.6 E1digit-seq
⇒ 5.6 E1digit
string in the
ge of the CFG.
Another Derivation of 5.6E10
real-const
⇒ digit-seq . digit-seq exponent
⇒ digit-seq.digit-seqEdigit-seq
⇒ digit-seq . digit-seq E digit digit-seq ⇒ digit-seq . digit-seq E digit digit
⇒ digit-seq.digit-seqEdigit0
⇒ digit-seq.digit-seqE10
⇒ digit-seq.digitE10
⇒ digit-seq.6E10
⇒ digit.6E10
19 March 2019
OSU CSE 34
Derivation Trees
• A derivation tree depicts a derivation (such as those above) in a tree
• Note that the order in which rewrites are done is sometimes arbitrary
– A tree captures the required temporal order of rewrites from top-to-bottom
– A tree captures the required spatial order among terminal symbols from left-to-right
19 March 2019 OSU CSE 35
A Derivation Tree for 5.6E10 real-const
digit-seq . digit-seq digit digit
E digit-seq
digit digit-seq
19 March 2019
A Derivation Tree for 5.6E10 real-const
digit-seq . digit-seq digit digit
This tree captures both derivations previously illustrated (and all others) for 5.6E10.
E digit-seq
digit digit-seq
19 March 2019 OSU CSE
Other Examples
• Can you find a derivation tree for 5.E3? – If so, it’s in the language of the CFG;
otherwise it’s not in that language
• Can you find a derivation tree for .6E10?
– If so, it’s in the language of the CFG; otherwise it’s not in that language
19 March 2019 OSU CSE 38
expr term factor add-op mult-op
digit-seq digit
A Famous CFG
→ expr add-op term | term
→ term mult-op factor | factor → ( expr ) | digit-seq →+|-
→ * | DIV | REM
→ digit digit-seq | digit →0|1|2|3|4|5|6|7|8|9
19 March 2019
OSU CSE 39
Example: 4+6*2 • Find a derivation tree for 4+6*2
19 March 2019 OSU CSE 40
A Derivation Tree for 4+6*2 expr
term factor digit-seq
term mult-op factor *
46 19 March 2019 OSU CSE
Example: (4+6)*2 • Find a derivation tree for (4+6)*2
• How is it different from the previous one?
19 March 2019 OSU CSE 42
A Simpler CFG for Expressions
expr → expr op expr | ( expr ) | digit-seq op →+|-|*|DIV|REM
digit-seq → digit digit-seq | digit
digit →0|1|2|3|4|5|6|7|8|9
19 March 2019 OSU CSE 43
One Derivation Tree for 4+6*2 expr
digit-seq digit
+ expr op expr digit-seq * digit-seq
digit digit
19 March 2019
Another Derivation Tree for 4+6*2 expr
expr op digit-seq +
expr * digit-seq
19 March 2019 OSU CSE
• The second (simpler) CFG for arithmetic expressions is ambiguous because some strings in the language of the CFG have more than one derivation tree
• As is often the case, ambiguity is bad
– If you want to use the derivation tree as the basis for evaluating the expression, only one of the derivation trees shown above results in the right answer (which one?)
19 March 2019 OSU CSE 46
• Wikipedia: Context-Free Grammar
– http://en.wikipedia.org/wiki/Context-free_grammar
19 March 2019 OSU CSE 47
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com