代写 compiler Compiler Theory

Compiler Theory
Assignment 6, February 7
Parsing with Context-Free Grammars

Reading: Chapter 4 of the text.

Finite state machines and regular expressions cannot parse all the languages
we want; they can only handle “regular” languages. In particular, they cannot
count arbitrary numbers; they cannot match parentheses to arbitrary depth.
Therefore, we need a more powerful type of grammar and parser.

Programming languages these days are mostly designed to be “context-free”
languages, which means they can be defined by “context-free grammars”. A
context-free grammar is defined by substitution rules. In a substitution
rule, a term representing part of a program is allowed to be replaced by
a sequence of terms, until it gets down to the actual tokens of the program.
The terms representing individual tokens are called “terminals” (since the
substitution process stops there), and the terms that are substitutable, that
is occuring on the left side of substitution rules, are called “nonterminals”.
There is one nonterminal, such as “program”, that represents the entire
program.

“Context-free” means the left side of a substitution rule contains exactly
one nonterminal, and it may be replaced by the right side regardless of the
context in which the left side occurs.

Substitution rules are also called “productions” or “rewriting rules”.

“Parsing” is the finding of a sequence of substitutions that turns “program”
into the actual input stream of tokens. The parsing of a program is often
represented as a “parse tree”, which has “program” as the root at the top,
and each nonterminal is a node with its children being the right side of
some rule for that nonterminal, until the leaves of the tree are the input
tokens.

Parsing can also be represented as sequence of lines of text, starting with
program
and expanding nonterminals one at a time according to their rules, until
everything is input tokens. Doing this by hand works best for short programs,
obviously, but that is all we will be doing in this assignment.

leftmost derivation: The leftmost nonterminal in the line is substituted at
each step.

rightmost derivation: The rightmost nonterminal in the line is substituted at
each step.

There may be multiple possible substitution rules for a particular nonterminal,
but it needs only be possible to choose a sequence of substitutions to get
a parse. So parsing can be ambiguous.

In the exercises below, “e” will stand for the empty sequence of tokens.
Also, since the main theme of these is arithmetic expressions, “E” will stand
for a whole expression, instead of using “program”.

1. A grammar for simple nested parentheses is
E -> (E)
E -> e
Give a derivation and a parse tree for ((())).

OVER

2. a. Write a grammar for arbitrary properly nested parentheses, such as
(()())() .
b. Draw a parse tree for (()())().
c. Give a leftmost derivation for (()())().
d. Give a rightmost derivation for (()())().

3. a. Write a grammar for arbitrary properly nested parentheses and brackets,
such as ([](([()]))).
b. Draw a parse tree for ([](([()]))).
c. Give a leftmost derivation for ([](([()]))).

4. The grammar
E -> E + E
E -> E – E
E -> NUMBER
is ambiguous.
a. Show two parse trees for 1 + 2 – 3 – 4 that turn out to have
different answers when you actually do the arithmetic.
b. Rewrite the grammar, possibly introducing new nonterminals, so
the language is left associative, that is,
1 + 2 – 3 – 4
parses as ((1 + 2) – 3) – 4.

5. The grammar
E -> E + E
E -> E – E
E -> E * E
E -> E / E
E -> NUMBER
is ambiguous.
a. Show two parse trees for 1 + 6 / 3 * 4 that turn out to have
different answers when you actually do the arithmetic.
b. Rewrite the grammar, possibly introducing new nonterminals, so
that * and / have precedence over + and – (that is, * and /
happen before + and – when the arithmetic is actually done),
* and / are left-associative among themselves, and + and –
are left-associative among themselves, so 1 + 6 / 3 * 4 parses
as 1 + ((6 / 3) * 4).

6. The common if-then-else construct of programming languages is
notoriously ambiguous.
statement -> if expr then statement else statement
statement -> if expr then statement
statement -> S
where I am letting S stand for any non-if statement, and expr
is a Boolean expression you need not worry about further.
a. Show two different parse trees for
if expr then if expr then S else S
b. Rewrite the grammar, possibly introducing new nonterminals,
so that an “else” always goes with the immediately preceding
available “if”, so you would get
if expr then { if expr then S else S }