Lecture 2. Context Free Grammar
January 22, 2021
COM S 342 Principles of Programming Languages @ Iowa State University 1
Overview
what is a grammar
what is a context free grammar (cfg) relations of strings and grammar
derivation: derive a string from a grammar
parsing: parse a string to determine if it comforms to the gramma
parse tree and semantics ambiguity
design grammar
COM S 342 Principles of Programming Languages @ Iowa State University 2
What is a grammar
COM S 342 Principles of Programming Languages @ Iowa State University 3
Specify Patterns in String
a program is a string that follows certain ”patterns” and ”rules”
the rules can be specified by:
informal grammar – English description formal grammar
1. what are the atoms? (terminals, lexeme, token), e.g., int a = 0; lexeme: int, a, =, 0, ; terminals (given in the grammar) identifier, numbers; tokens (terminal + link to the symbol table, for implementing compiler or interpreter)
2. how to compose them to form a sentence?
3. others, e.g., regular expressions
a program is syntactically correct (valid) if it follows the grammar of the language (rules) in which it is written
COM S 342 Principles of Programming Languages @ Iowa State University 4
Grammar
A language is the set of strings
Rules to define a string in the language
Every programming language has a grammar describing the syntax (the order or arrangement of terminals need to follow certain patterns, rules)
Bases for parsing – given a program (string) and a grammar, validate the grammatical correctness of the string, i.e., if the string follows the rules of the language
COM S 342 Principles of Programming Languages @ Iowa State University 5
Connection to Formal Language and Formal Grammars
Language types are determined by their grammars; the grammar types are determined by the patterns of their production rules
Type-3 : regular language – a*b*
Type-2 : context free languages – typical programs Type-1 : context-sensitive languages – (anbncn)
Type-0: recursively enumerable
Chomsky Hierarchy. COM S 331 Theory of Computing.
COM S 342 Principles of Programming Languages @ Iowa State University 6
Formal Grammars Define Formal Languages
Chomsky Hierarchy
COM S 342 Principles of Programming Languages @ Iowa State University 7
Context Free Grammar (CFG)
1. formal definitions
2. example
3. notations
4. string and grammar: derivation, parsing, design grammars
COM S 342 Principles of Programming Languages @ Iowa State University 8
Context Free Grammar (CFG)
A CFG contains:
A set of terminals or atoms in the language
A set of non-terminals
A set of (production rules) which describes how a non-terminal can be expanded/rewritten to a sequence of terminals and non-terminals
COM S 342 Principles of Programming Languages @ Iowa State University 9
Context Free Grammar (CFG)
A CFG is a tuple G = (Σ,V,S,P), where
σ is a set of terminals
V isasetofnon-terminalssuchthatΣ∩V =∅
S ∈ V is a start non-terminal
P is a set of product rules, each of the form: X → ω, such that X ∈ V and ω ∈ (Σ ∪ V )+
COM S 342 Principles of Programming Languages @ Iowa State University 10
Context Free Grammar (CFG): Example
G = (Σ,V,S,P), where
Σ = 0,1,2,3,4,5,6,7,8,9,+,−,.
V = unknown, part, digit, sign
S = unknown
P:
unknown → sign part . part|sign part sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9
What language is unknown?
COM S 342 Principles of Programming Languages @ Iowa State University 11
Notations for Production Rules
left hand side (LHS)→ right hand side (RHS)
if not specified, assume LHS of first production is the start symbol
productions with the same LHS are combined with |
Backus-Naur Form (BNF) production rules, e.g., ⟨A⟩ := ⟨B⟩ ”c” ⟨D⟩ for A → BcD, by John Backus, Peter Naur, used for describing the grammar of Algol in 1962
you will see slightly different notations in different systems
COM S 342 Principles of Programming Languages @ Iowa State University 12
Strings and grammar
L(G) is the language defined by grammar G: L(G) = {s ∈ Σ∗|S ⇒+ s}
S is the start symbol of the grammar
Σ is the set of terminals for that grammar
L(G) contains all strings that can be derived from the start symbol via one or more derivations
a program s is a string over a set of tokens (keywords, identifies, numbers, operators …). s is said to be syntactically correct if and only if s ∈ L(G ), L(G) is the programming language in which s is written, and G is its grammar
if s ∈ L(G ), we say that the grammar accepts the string
COM S 342 Principles of Programming Languages @ Iowa State University 13
Strings and Grammar: generating strings from grammar
Rewriting/derivation: apply a sequence of production rules starting at the start symbol, each application is called a rewrite
Given grammar rules: S → 0S|1S|ε, generating a string 011 from G
COM S 342 Principles of Programming Languages @ Iowa State University 14
Strings and Grammar: generating strings from grammar(contd.)
Rewriting/derivation: apply a sequence of production rules starting at the start symbol, each application is called a rewrite
Given grammar rules: S → 0S|1S|ε, generating a string 011 from G Derivation:
S ⇒ 0S ⇒ 0S
⇒ 01S ⇒ 011S ⇒ 011
Is ε a terminal?
COM S 342 Principles of Programming Languages @ Iowa State University 15
Derivation Notations
COM S 342 Principles of Programming Languages @ Iowa State University 16
In-class exercises: derivation
(1) Given grammar rules: S → (S)|ε, can you generate (), (()), ()()? (2) Given grammar: G = (Σ, V , S, P), where
Σ = 0,1,2,3,4,5,6,7,8,9,+,−,.
S = unknown
P:
unknown → sign part . part|sign part sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9
What strings can unknown generate?
COM S 342 Principles of Programming Languages @ Iowa State University 17
Leftmost and Rightmost Derivations
COM S 342 Principles of Programming Languages @ Iowa State University 18
Leftmost and Rightmost Derivations
S is a start symbol
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
find the derivation for 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 19
Leftmost and Rightmost Derivations
COM S 342 Principles of Programming Languages @ Iowa State University 20
Parsing
COM S 342 Principles of Programming Languages @ Iowa State University 21
Parsing
Parsing: discovering the derivation and constructing a tree (called a parse tree) based on the derivation
a parse tree shows the structure of an expressions it corresponds to a grammar
Example
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
find the derivation for 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 22
Parse Tree
COM S 342 Principles of Programming Languages @ Iowa State University 23
Parse Tree
A parse tree results from the derivation sequences
a terminal is a leaf node
each node in the tree is either a terminal or non-terminal in the
production rule
each edge in the tree from a non-terminal results from the application of production on the non-terminal
application of production rule always result in new nodes in the tree
COM S 342 Principles of Programming Languages @ Iowa State University 24
In-class exercise: constructing parse tree
Grammar: E → a|b|c|d|E + E|E − E|E ∗ E|(E) construct parse tree for a, a*c, c*(b+d)
COM S 342 Principles of Programming Languages @ Iowa State University 25
COM S 342 Principles of Programming Languages @ Iowa State University 26
COM S 342 Principles of Programming Languages @ Iowa State University 27
Computing Values Using Parse Tree: Evaluating Semantics
We call it evaluate an expression – get an value from the expression Semantics: meaning of a syntactically correct sentence
How to automatically get a value from the string
Classify the terminals into operands and operator classes
Associate meaning with each operand.
Associate meaning with the application of operator(s) on operand(s). Evaluate the semantics using parse tree
COM S 342 Principles of Programming Languages @ Iowa State University 28
Computing Values Using Parse Tree: Evaluating Semantics
Start from the leaf-nodes to create the operand and find their meanings.
Apply the operators on the generated operands to obtain the meaning of the application of the operators.
Continue until you reach the root-node.
COM S 342 Principles of Programming Languages @ Iowa State University 29
Evaluating Semantics: an example
Given a grammar
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
Given a string 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 30
Evaluating Semantics: an example continued
Meaning of a number n can be n points.
Meaning of Operators :+, -, *
Follow the evaluation order given by sem(+, sem(+, sem(1), sem(2)), sem(3)) to compute the final value
COM S 342 Principles of Programming Languages @ Iowa State University 31
Ambiguity
A grammar is ambiguous if there exists a string that has at least two distinct parse trees
Given a grammar
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
Draw a parse tree for 1+2*3
COM S 342 Principles of Programming Languages @ Iowa State University 32
Ambiguity
Ambiguity in Grammar can result in generating different values Ambiguity in Grammar can result in incorrect syntax-directed
translation
COM S 342 Principles of Programming Languages @ Iowa State University 33
Ambiguity
COM S 342 Principles of Programming Languages @ Iowa State University 34
Ambiguity
COM S 342 Principles of Programming Languages @ Iowa State University 35
Remove Ambiguity
Add delimiters (e.g., parenthesis) – modify grammars, add terminals of deliminator
Add operator precedence and associativity – modify grammars, giving additional rules beyond grammar
COM S 342 Principles of Programming Languages @ Iowa State University 36
Delimiters
COM S 342 Principles of Programming Languages @ Iowa State University 37
Give Operator Precedence
If more than one operator is present in the expression, the precedence order decides the order in which the operators should be applied.
Modify grammar: add non-terminals for each precedence level, push the higher levels towards the bottom of the parse trees
S→ part|S+S |S-S|S*S|S/S S→ S+S |S-S|T
T→ T*T|T/T|part
COM S 342 Principles of Programming Languages @ Iowa State University 38
Operator Precedence
COM S 342 Principles of Programming Languages @ Iowa State University 39
Define Associativity
If the same operator appears more than once in the same expression, then associativity rule decides the order in which the operators should be applied.
There are two types of associativity: left and right:
Left associativity: operators on the left are applied before the
operators on the right.
Right associativity: operators on the right are applied before the
operators on the left.
Examples: x/y/z becomes (x/y)/z if / is left associative.
COM S 342 Principles of Programming Languages @ Iowa State University 40
Define Associativity: example
Given: S→ S+S |S-S|T T→ T*T|T/T|part
Assume that + is left-associative: What is the parse tree for 1 + 2 + 3 * 4?
Allow expansion only on the left-hand side of the parse tree, how to modify the grammar?
S→ S+T |S-S|T T→ T*T|T/T|part
COM S 342 Principles of Programming Languages @ Iowa State University 41
Remove Ambiguity: revisit
There are no techniques that can always convert an ambiguous grammar to an unambiguous grammar
A language is ambiguous if its grammar is ambiguous
Some grammars are left ambiguous for better representation of
concepts
Observe how grammar rules can impact the shape of the parse trees
Questions:
Does the left-derivation or right-derivation strategies impact the ambiguity of the grammar?
If the grammar is unambiguous, do the two derivation strategies result in the same sequence of production rules?
COM S 342 Principles of Programming Languages @ Iowa State University 42
Design Grammars
How to specify the string patterns:
regular expressions a∗b+: ab, aab, … (* the letter occurs 0 or more time, + the letter occurs 1 or more times)
English description – palindrome on the alphabet of a b c: aba, aa, acca …
set {an(bm|cm)|m > n > 0}
grammars
COM S 342 Principles of Programming Languages @ Iowa State University 43
Designing Grammars
COM S 342 Principles of Programming Languages @ Iowa State University 44
Designing Grammars
COM S 342 Principles of Programming Languages @ Iowa State University 45
Review: homework and exam points
grammar
CFG
strings, language and grammar
derivation: left-most and right-most derivations parsing
parse tree
evaluating an expression using parse tree
ambiguity
removing ambiguity
design grammar
COM S 342 Principles of Programming Languages @ Iowa State University 46