程序代写代做代考 chain algorithm compiler COMP90045 Programming Language Implementation

COMP90045 Programming Language Implementation
Context-Free Grammars
Harald Søndergaard Lecture 6
Semester 1, 2019
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 1 / 18

The Role of the Parser
The parser transforms a stream of tokens into a syntax tree.
In theory, one could convert the entire input into a stream a tokens before invoking the parser.
In practice, the execution of the parser and the lexer are usually interleaved: the parser invokes the scanner whenever it needs to know what the next token in the input is.
The syntax tree shows the hierarchical structure of the input.
We use context free grammars to describe the structure we expect the input to have.
Since the only grammars involved in syntax analysis are context free grammars, we often use “grammar” to mean “context free grammar”.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 2 / 18

Context-Free Grammars
The building blocks of CFGs are terminal and nonterminal symbols.
The terminal symbols of a grammar are the tokens returned by the scanner. They are called “terminal” because the grammar considers them to be atomic.
The nonterminal symbols of a grammar denote syntactic categories, such as expressions, statements and declarations. They are called “nonterminal” because the grammar specifies how they can be decomposed into sequences of smaller units. One nonterminal is distinguished as the start symbol.
A context-free grammar consists of a set of productions of the form nonterminal → (terminal | nonterminal)∗
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 3 / 18

An Example Grammar
E→E+T E→T T→T∗F T→F F→(E) F → id
This grammar says that an expression is the sum of one or more terms, a term is the product of one or more factors, and a factor is a parenthesized expression, or the value of a variable. E, T and F are nonterminals, whereas +, ∗, (, ) and id are terminals.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 4 / 18

Notations for Grammars
We often show grammars in a shorthand form, with all productions for a given nonterminal joined together:
E→E+T|T T→T∗F|F F → (E)|id
We use symbols such as α, β and γ to denote sequences of zero or more symbols, which may be either terminal or nonterminal symbols.
We use ǫ to denote the empty symbol sequence.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 5 / 18

Derivation Steps
A derivation step transforms one string of symbols into another by replacing a nonterminal in the input string by the right-hand side of one of the grammar’s productions for that nonterminal.
If A → γ is a production, then a derivation step can transform the string αAβ into the string αγβ.
This is denoted αAβ ⇒ αγβ.
Derivation steps can be chained together:
E⇒E+T ⇒T+T
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 6 / 18

Derivations
A derivation is a sequence of strings of symbols.
The first string consists of one nonterminal, the start symbol. The last consists of a sequence of terminals.
Each string in the sequence (except the first) is the result of applying a derivation step to the previous string.
We use A ⇒∗ α to indicate that α can be derived from A in zero or more steps, and A ⇒+ α to indicate that α can be derived from A in one or more steps.
Leftmost derivations replace the leftmost nonterminal at each step. Rightmost derivations replace the rightmost nonterminal at each step.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 7 / 18

An Example Derivation
E→E+T E→T T→T∗F T→F F→(E) F → id
E⇒E+T ⇒T+T ⇒F+T
⇒ id+T
⇒ id+F
⇒ id+(E)
⇒ id+(T)
⇒ id+(F)
⇒ id + (id)
We say that id + (id) is generated by the grammar, or derived from E.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 8 / 18

The Language of a Grammar
IfSisthestartsymbolofagrammarandS⇒∗ α,thenαisa sentential form.
If a sentential form α contains no nonterminals, then it is a sentence. Given our example grammar, id + (T ) is a sentential form, while
id + (id) is a sentence.
A string of terminals w is in L(G), the language recognized or
generated by a grammar G, if S is the start symbol of G and S ⇒+ w. One can build a parse tree for a sentence from a derivation yielding
that sentence; each derivation step adds the children of one node.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 9 / 18

Parse Trees and Derivations
L → L;S|S
S → incR|R=R+int R → id | ∗ id
L⇒L;S ⇒S;S
⇒ incR;S
⇒ incid; S
⇒ inc id ; R = R + int
⇒ incid;id=R+int
⇒ incid;id= ∗id+int
L L;S
S R = R + int inc R id * id
id
PLI (Sem 1, 2019)
Context-Free Grammars ⃝c University of Melbourne 10 / 18

Ambiguity
Consider the grammar
E → E+E|E∗E|(E)|0|1|…|9
This grammar allows not only different derivations, but different parse trees for3+7*2:
EE E+E E*E 3E*EE+E2
7237
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 11 / 18

Ambiguity
A grammar that has different parse trees for some sentence is ambiguous.
Sometimes we can find a better grammar which is not ambiguous, and which generates the same language, such as the grammar for expressions using terms and factors.
However, this is not always possible. Some context-free languages are inherently ambiguous.
Even when it is possible, it is sometimes preferable to keep the grammar ambiguous and give the parser disambiguation rules that discard all parse trees except the one we want.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 12 / 18

QUIZ: Ambiguity
Show that this grammar fragment for conditionals is ambiguous:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 13 / 18

QUIZ: Ambiguity
Show that this grammar fragment for conditionals is ambiguous:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
The sentence if expr then if expr then other else other has two
readings:
if expr then (if expr then other) else other
if expr then (if expr then other else other)
ICD’s Grammar 2.13 is an unambiguous grammar for the construct.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 13 / 18

Undecidable Grammar Problems
No algorithm can exists to determine whether the language generated by some grammar is inherently ambiguous. This is a well-known example of an undecidable problem.
Similarly it is impossible to come up with an algorithm that takes two context-free grammars and decides whether they generate the same language.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 14 / 18

Error Handling
Programs frequently have errors, so the compiler should assist the programmer in finding and identifying errors.
Types of errors, with examples:
lexical syntactic semantic logical
whiel (p) {…}
x = (x + 1))
y = abs(“xyz”) while (1) {x = x;}
misspelt keyword
unbalanced parentheses
incompatible types
infinite loop
Lexical errors are uncommon and most logical errors cannot be detected by the compiler, so most errors are found by during syntax analysis and semantic analysis.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 15 / 18

Syntax Error Handling
Aims:
Tell user the (approximate) position of error. Indicate the type of error.
On line 27
Error: unbalanced parenthesis ^
Error recovery: don’t stop parsing after error; allow a single compiler invocation to find and report as many of the errors in the program as possible.
x = (y + z * 2;
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 16 / 18

Error Recovery Strategies
Panic mode: discard tokens until we find a synchronizer, a token that programmers rarely misplace (e.g., a semicolon in C).
Phrase level recovery: modify the program so parsing can continue (e.g., add a right parenthesis).
Error productions: augment grammar with extra productions to handle common errors.
In all these cases, we must be careful to avoid error avalanches, in which later errors (syntactic or semantic) are caused by the parser’s attempt to fix earlier errors.
This is much easier to do with the simpler error-recovery strategies.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 17 / 18

Coming Up
Top-down parsing, including recursive descent.
PLI (Sem 1, 2019) Context-Free Grammars ⃝c University of Melbourne 18 / 18