PARSE TREES AND DERIVATIONS
PRINCIPLES OF PROGRAMMING LANGUAGES
Norbert Zeh Winter 2021
Dalhousie University
1/7
ROAD MAP
• Parsing: Transform (tokenized) program text into parse tree
• Modelling programming languages: Context-free grammars and languages
• Capturing the syntactic structure of a program: Parse trees
• Types of parsers and types of grammars they can parse
• Grammars that describe programming languages and can be parsed
efficiently
• Construction of an LL(1) grammar
• Parsing LL(1) languages
• Push-down automata
2/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence
Sentence
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase
Sentence
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase
Sentence
Phrase Verb Phrase
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
Sentence
Phrase Verb Phrase
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
Phrase Noun
Sentence
Verb Phrase
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun
Phrase Noun
Sentence
Verb Phrase
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun
Phrase Noun
Sentence Verb
Phrase Noun
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun ⇒ Jim Verb Noun
Phrase Noun
Sentence Verb
Phrase Noun
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun ⇒ Jim Verb Noun
Phrase
Noun
Jim
Sentence Verb
Phrase Noun
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun ⇒ Jim Verb Noun ⇒ Jim ate Noun
Phrase Noun Jim
Sentence Verb
Phrase Noun
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun ⇒ Jim Verb Noun ⇒ Jim ate Noun
Sentence Phrase Verb
Phrase Noun ate Noun
Jim
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun
⇒ Jim Verb Noun
⇒ Jim ate Noun
⇒ Jim ate cheese Jim
Sentence Phrase Verb
Phrase Noun ate Noun
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun ⇒ Jim Verb Noun ⇒ Jim ate Noun
⇒ Jim ate cheese
Phrase Noun Jim
Sentence Verb ate
Phrase
Noun
cheese
3/7
PARSE TREES
Every derivation can be represented by a parse tree:
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun ⇒ Jim Verb Noun ⇒ Jim ate Noun
⇒ Jim ate cheese
Phrase
Noun
Jim
Sentence Verb ate
Phrase
Noun
cheese
Note: In general, there are multiple derivations with the same parse tree.
3/7
PARSE TREES
Reading the yield left to
Every derivation can be represented by a parse tree:
right gives a string in L(G).
• The root is S.
• The leaves, called the yield of the parse tree, are terminals.
• Every internal node is a non-terminal.
• The children of each non-terminal are the symbols it is replaced with.
Sentence ⇒ Phrase Verb Phrase ⇒ Noun Verb Phrase
⇒ Noun Verb Noun ⇒ Jim Verb Noun ⇒ Jim ate Noun
⇒ Jim ate cheese
Phrase
Noun
Jim
Sentence Verb ate
Phrase
Noun
cheese
Note: In general, there are multiple derivations with the same parse tree.
3/7
AMBIGUITY
There are infinitely many context-free grammars generating a given context-free language:
4/7
AMBIGUITY
There are infinitely many context-free grammars generating a given context-free language:
Just add arbitrary non-terminals to the right-hand sides of productions and then add ε-productions for these non-terminals.
4/7
AMBIGUITY
There are infinitely many context-free grammars generating a given context-free language:
Just add arbitrary non-terminals to the right-hand sides of productions and then add ε-productions for these non-terminals.
There may be more than one parse tree for the same sentence generated by a grammar G. If this is the case, we call G ambiguous.
4/7
AMBIGUITY
There are infinitely many context-free grammars generating a given context-free language:
Just add arbitrary non-terminals to the right-hand sides of productions and then add ε-productions for these non-terminals.
There may be more than one parse tree for the same sentence generated by a grammar G. If this is the case, we call G ambiguous.
To allow parsing a programming language unambiguously, its grammar has to be unambiguous. (A bit of a tautology.)
4/7
AMBIGUITY
There are infinitely many context-free grammars generating a given context-free language:
Just add arbitrary non-terminals to the right-hand sides of productions and then add ε-productions for these non-terminals.
There may be more than one parse tree for the same sentence generated by a grammar G. If this is the case, we call G ambiguous.
To allow parsing a programming language unambiguously, its grammar has to be unambiguous. (A bit of a tautology.)
Some context-free languages are inherently ambiguous, that is, do not have unambiguous grammars. Usually, this is not the case for programming languages. 4/7
AMBIGUITY: EXAMPLE (1)
E → num E → id
E → E ’+’ E E → E ’-’ E E → E ’*’ E E → E ’/’ E E → ’(’ E ’)’
2+3∗4
5/7
AMBIGUITY: EXAMPLE (1)
E → num E → id
E → E ’+’ E E → E ’-’ E E → E ’*’ E E → E ’/’ E E → ’(’ E ’)’
2+3∗4
E
E ’+’ E
’2’ E ’*’ E ’3’ ’4’
5/7
AMBIGUITY: EXAMPLE (1)
E → num E → id
E → E ’+’ E E → E ’-’ E E → E ’*’ E E → E ’/’ E E → ’(’ E ’)’
2+3∗4 EE
E ’+’ E
’2’ E ’*’ E
’3’ ’4’
E ’*’ E E ’+’ E ’4’
’2’ ’3’
5/7
AMBIGUITY: EXAMPLE (1)
E → num E → id
E → E ’+’ E E → E ’-’ E E → E ’*’ E E → E ’/’ E E → ’(’ E ’)’
2+3∗4 EE
E ’+’ E
’2’ E ’*’ E
’3’ ’4’
This grammar is ambiguous!
E ’*’ E E ’+’ E ’4’
’2’ ’3’
5/7
AMBIGUITY: EXAMPLE (1)
E → num E → id
E → E ’+’ E E → E ’-’ E E → E ’*’ E E → E ’/’ E E → ’(’ E ’)’
2+3∗4 EE
E ’+’ E
’2’ E ’*’ E
’3’ ’4’
This grammar is ambiguous!
E ’*’ E E ’+’ E ’4’
’2’ ’3’
Violates precedence rules!
5/7
AMBIGUITY: EXAMPLE (2)
An unambiguous grammar for the same language:
E→T
E → E ’+’ T E → E ’-’ T T→F
T → T ’*’ F T → T ’/’ F F → num F → id
F → ’(’ E ’)’
6/7
AMBIGUITY: EXAMPLE (2)
An unambiguous grammar for the same language:
E→T E
E → E ’+’ T E → E ’-’ T T→F
T → T ’*’ F T → T ’/’ F F → num F → id
F → ’(’ E ’)’
E ’+’ T
T T ’*’ F FF ’4’
’2’ ’3’
6/7
AMBIGUITY: EXAMPLE (2)
An unambiguous grammar for the same language:
E→T E
E → E ’+’ T E → E ’-’ T T→F
T → T ’*’ F T → T ’/’ F F → num F → id
E ’+’ T
T T ’*’ F FF ’4’
’2’ ’3’
F → ’(’ E ’)’
This grammar respects precedence rules. 6/7
AMBIGUITY: EXAMPLE (2)
An unambiguous grammar for the same language:
E→TEE
E → E ’+’ T E → E ’-’ T T→F
T → T ’*’ F T → T ’/’ F F → num F → id
E ’+’ T
T T ’*’ F F F ’4’
’2’ ’3’
E ’-’ T E ’-’ T F
T
F
F ’4’ ’3’
F → ’(’ E ’)’
This grammar respects precedence rules. It also respects left-associativity. 6/7
’2’
AMBIGUITY: EXAMPLE (2)
An unambiguous grammar for the same language:
E→TEE Remember, variables capture
E → E ’+’ T E → E ’-’ T T→F
T → T ’*’ F T → T ’/’ F F → num F → id
“structural components” of
E ’-’ T E ’-’ T F
T
E ’+’ T the string:
E = Expression
T T ’*’ F
T = Term
F =FactorF ’4’
’2’ ’3’
F ’4’ F ’3’
F → ’(’ E ’)’
This grammar respects precedence rules. It also respects left-associativity. 6/7
’2’
LEFTMOST AND RIGHTMOST DERIVATIONS
For every language defined by an unambiguous grammar, there is only one parse tree that generates each sentence in the language.
7/7
LEFTMOST AND RIGHTMOST DERIVATIONS
For every language defined by an unambiguous grammar, there is only one parse tree that generates each sentence in the language.
There are many different derivations corresponding to this parse tree.
E
E ’+’ T
T T ’*’ F
FF ’4’ ’2’ ’3’
7/7
LEFTMOST AND RIGHTMOST DERIVATIONS
For every language defined by an unambiguous grammar, there is only one parse tree that generates each sentence in the language.
There are many different derivations corresponding to this parse tree. Leftmost derivation: Replaces the
leftmost non-terminal in each step.
E ⇒ E ’+’ T
⇒ T ’+’ T
⇒ F ’+’ T
⇒ ’2’ ’+’ T
⇒ ’2’ ’+’ T ’*’ F ⇒ ’2’ ’+’ F ’*’ F ⇒ ’2’ ’+’ ’3’ ’*’ F ⇒ ’2’ ’+’ ’3’ ’*’ ’4’
E
E ’+’ T
T T ’*’ F
FF ’4’ ’2’ ’3’
7/7
LEFTMOST AND RIGHTMOST DERIVATIONS
For every language defined by an unambiguous grammar, there is only one parse tree that generates each sentence in the language.
There are many different derivations corresponding to this parse tree.
Leftmost derivation: Replaces the leftmost non-terminal in each step.
Rightmost derivation: Replaces the rightmost non-terminal in each step.
E ⇒ E ’+’ T
⇒ T ’+’ T
⇒ F ’+’ T
⇒ ’2’ ’+’ T
⇒ ’2’ ’+’ T ’*’ F ⇒ ’2’ ’+’ F ’*’ F ⇒ ’2’ ’+’ ’3’ ’*’ F ⇒ ’2’ ’+’ ’3’ ’*’ ’4’
E E ’+’
T T
F F
’2’ ’3’
E
⇒ E ’+’ T
⇒ E ’+’ T ’*’ F
⇒ E ’+’ T ’*’ ’4’ ⇒ E ’+’ F ’*’ ’4’ ⇒ E ’+’ ’3’ ’*’ ’4’ ⇒ T ’+’ ’3’ ’*’ ’4’ ⇒ F ’+’ ’3’ ’*’ ’4’ ⇒ ’2’ ’+’ ’3’ ’*’ ’4’
T
’*’
F
’4’
7/7