Lecture 2. Context Free Grammar
Lecture 2. Context Free Grammar
January 22, 2021
COM S 342 Principles of Programming Languages @ Iowa State University 1
Overview
I what is a grammar
I what is a context free grammar (cfg)
I relations of strings and grammar
I derivation: derive a string from a grammar
I parsing: parse a string to determine if it comforms to the gramma
I parse tree and semantics
I ambiguity
I design grammar
COM S 342 Principles of Programming Languages @ Iowa State University 2
What is a grammar
COM S 342 Principles of Programming Languages @ Iowa State University 3
Specify Patterns in String
a program is a string that follows certain ”patterns” and ”rules”
the rules can be specified by:
I informal grammar – English description
I formal grammar
1. what are the atoms? (terminals, lexeme, token), e.g., int a = 0;
lexeme: int, a, =, 0, ; terminals (given in the grammar) identifier,
numbers; tokens (terminal + link to the symbol table, for
implementing compiler or interpreter)
2. how to compose them to form a sentence?
3. others, e.g., regular expressions
a program is syntactically correct (valid) if it follows the grammar of
the language (rules) in which it is written
COM S 342 Principles of Programming Languages @ Iowa State University 4
Grammar
I A language is the set of strings
I Rules to define a string in the language
I Every programming language has a grammar describing the syntax
(the order or arrangement of terminals need to follow certain
patterns, rules)
I Bases for parsing – given a program (string) and a grammar,
validate the grammatical correctness of the string, i.e., if the string
follows the rules of the language
COM S 342 Principles of Programming Languages @ Iowa State University 5
Connection to Formal Language and Formal Grammars
Language types are determined by their grammars; the grammar types
are determined by the patterns of their production rules
I Type-3 : regular language – a*b*
I Type-2 : context free languages – typical programs
I Type-1 : context-sensitive languages – (anbncn)
I Type-0: recursively enumerable
Chomsky Hierarchy. COM S 331 Theory of Computing.
COM S 342 Principles of Programming Languages @ Iowa State University 6
Formal Grammars Define Formal Languages
Chomsky Hierarchy
COM S 342 Principles of Programming Languages @ Iowa State University 7
Context Free Grammar (CFG)
1. formal definitions
2. example
3. notations
4. string and grammar: derivation, parsing, design grammars
COM S 342 Principles of Programming Languages @ Iowa State University 8
Context Free Grammar (CFG)
A CFG contains:
I A set of terminals or atoms in the language
I A set of non-terminals
I A set of (production rules) which describes how a non-terminal can
be expanded/rewritten to a sequence of terminals and non-terminals
COM S 342 Principles of Programming Languages @ Iowa State University 9
Context Free Grammar (CFG)
A CFG is a tuple G = (Σ,V ,S ,P), where
I σ is a set of terminals
I V is a set of non-terminals such that Σ ∩ V = ∅
I S ∈ V is a start non-terminal
I P is a set of product rules, each of the form: X → ω, such that
X ∈ V and ω ∈ (Σ ∪ V )+
COM S 342 Principles of Programming Languages @ Iowa State University 10
Context Free Grammar (CFG): Example
G = (Σ,V ,S ,P), where
I Σ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,+,−, .
I V = unknown, part, digit, sign
I S = unknown
I P:
unknown → sign part . part|sign part
sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9
What language is unknown?
COM S 342 Principles of Programming Languages @ Iowa State University 11
Notations for Production Rules
I left hand side (LHS)→ right hand side (RHS)
I if not specified, assume LHS of first production is the start symbol
I productions with the same LHS are combined with |
I Backus-Naur Form (BNF) production rules, e.g., 〈A〉 := 〈B〉 ”c” 〈D〉
for A→ BcD, by , , used for describing the
grammar of Algol in 1962
I you will see slightly different notations in different systems
COM S 342 Principles of Programming Languages @ Iowa State University 12
Strings and grammar
I L(G) is the language defined by grammar G:
L(G) = {s ∈ Σ∗|S ⇒+ s}
S is the start symbol of the grammar
Σ is the set of terminals for that grammar
I L(G) contains all strings that can be derived from the start symbol
via one or more derivations
I a program s is a string over a set of tokens (keywords, identifies,
numbers, operators …). s is said to be syntactically correct if and
only if s ∈ L(G ), L(G) is the programming language in which s is
written, and G is its grammar
I if s ∈ L(G ), we say that the grammar accepts the string
COM S 342 Principles of Programming Languages @ Iowa State University 13
Strings and Grammar: generating strings from grammar
Rewriting/derivation: apply a sequence of production rules starting at
the start symbol, each application is called a rewrite
Given grammar rules: S → 0S |1S |�, generating a string 011 from G
COM S 342 Principles of Programming Languages @ Iowa State University 14
Strings and Grammar: generating strings from
grammar(contd.)
Rewriting/derivation: apply a sequence of production rules starting at
the start symbol, each application is called a rewrite
Given grammar rules: S → 0S |1S |�, generating a string 011 from G
Derivation:
S ⇒ 0S
⇒ 0S
⇒ 01S
⇒ 011S
⇒ 011
Is � a terminal?
COM S 342 Principles of Programming Languages @ Iowa State University 15
Derivation Notations
COM S 342 Principles of Programming Languages @ Iowa State University 16
In-class exercises: derivation
(1) Given grammar rules: S → (S)|�, can you generate (), (()), ()()?
(2) Given grammar: G = (Σ,V ,S ,P), where
I Σ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,+,−, .
I S = unknown
I P:
unknown → sign part . part|sign part
sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9
What strings can unknown generate?
COM S 342 Principles of Programming Languages @ Iowa State University 17
Leftmost and Rightmost Derivations
COM S 342 Principles of Programming Languages @ Iowa State University 18
Leftmost and Rightmost Derivations
S is a start symbol
digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
find the derivation for 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 19
Leftmost and Rightmost Derivations
COM S 342 Principles of Programming Languages @ Iowa State University 20
Parsing
COM S 342 Principles of Programming Languages @ Iowa State University 21
Parsing
I Parsing: discovering the derivation and constructing a tree (called a
parse tree) based on the derivation
I a parse tree shows the structure of an expressions it corresponds to a
grammar
I Example
digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
find the derivation for 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 22
Parse Tree
COM S 342 Principles of Programming Languages @ Iowa State University 23
Parse Tree
A parse tree results from the derivation sequences
I a terminal is a leaf node
I each node in the tree is either a terminal or non-terminal in the
production rule
I each edge in the tree from a non-terminal results from the
application of production on the non-terminal
I application of production rule always result in new nodes in the tree
COM S 342 Principles of Programming Languages @ Iowa State University 24
In-class exercise: constructing parse tree
Grammar: E → a|b|c |d |E + E |E − E |E ∗ E |(E ) construct parse tree for
a, a*c, c*(b+d)
COM S 342 Principles of Programming Languages @ Iowa State University 25
COM S 342 Principles of Programming Languages @ Iowa State University 26
COM S 342 Principles of Programming Languages @ Iowa State University 27
Computing Values Using Parse Tree: Evaluating Semantics
I We call it evaluate an expression – get an value from the expression
I Semantics: meaning of a syntactically correct sentence
I How to automatically get a value from the string
I Classify the terminals into operands and operator classes
I Associate meaning with each operand.
I Associate meaning with the application of operator(s) on operand(s).
I Evaluate the semantics using parse tree
COM S 342 Principles of Programming Languages @ Iowa State University 28
Computing Values Using Parse Tree: Evaluating Semantics
I Start from the leaf-nodes to create the operand and find their
meanings.
I Apply the operators on the generated operands to obtain the
meaning of the application of the operators.
I Continue until you reach the root-node.
COM S 342 Principles of Programming Languages @ Iowa State University 29
Evaluating Semantics: an example
Given a grammar
digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
Given a string 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 30
Evaluating Semantics: an example continued
I Meaning of a number n can be n points.
I Meaning of Operators :+, -, *
I Follow the evaluation order given by
sem(+, sem(+, sem(1), sem(2)), sem(3))
to compute the final value
COM S 342 Principles of Programming Languages @ Iowa State University 31
Ambiguity
A grammar is ambiguous if there exists a string that has at least two
distinct parse trees
Given a grammar
digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
Draw a parse tree for 1+2*3
COM S 342 Principles of Programming Languages @ Iowa State University 32
Ambiguity
I Ambiguity in Grammar can result in generating different values
I Ambiguity in Grammar can result in incorrect syntax-directed
translation
COM S 342 Principles of Programming Languages @ Iowa State University 33
Ambiguity
COM S 342 Principles of Programming Languages @ Iowa State University 34
Ambiguity
COM S 342 Principles of Programming Languages @ Iowa State University 35
Remove Ambiguity
I Add delimiters (e.g., parenthesis) – modify grammars, add terminals
of deliminator
I Add operator precedence and associativity – modify grammars,
giving additional rules beyond grammar
COM S 342 Principles of Programming Languages @ Iowa State University 36
Delimiters
COM S 342 Principles of Programming Languages @ Iowa State University 37
Give Operator Precedence
I If more than one operator is present in the expression, the precedence
order decides the order in which the operators should be applied.
I Modify grammar: add non-terminals for each precedence level, push
the higher levels towards the bottom of the parse trees
S→ part|S+S |S-S|S*S|S/S
S→ S+S |S-S|T
T→ T*T|T/T|part
COM S 342 Principles of Programming Languages @ Iowa State University 38
Operator Precedence
COM S 342 Principles of Programming Languages @ Iowa State University 39
Define Associativity
I If the same operator appears more than once in the same expression,
then associativity rule decides the order in which the operators
should be applied.
I There are two types of associativity: left and right:
I Left associativity: operators on the left are applied before the
operators on the right.
I Right associativity: operators on the right are applied before the
operators on the left.
Examples: x/y/z becomes (x/y)/z if / is left associative.
COM S 342 Principles of Programming Languages @ Iowa State University 40
Define Associativity: example
Given: S→ S+S |S-S|T
T→ T*T|T/T|part
I Assume that + is left-associative: What is the parse tree for 1 + 2
+ 3 * 4?
I Allow expansion only on the left-hand side of the parse tree, how to
modify the grammar?
S→ S+T |S-S|T
T→ T*T|T/T|part
COM S 342 Principles of Programming Languages @ Iowa State University 41
Remove Ambiguity: revisit
I There are no techniques that can always convert an ambiguous
grammar to an unambiguous grammar
I A language is ambiguous if its grammar is ambiguous
I Some grammars are left ambiguous for better representation of
concepts
I Observe how grammar rules can impact the shape of the parse trees
Questions:
I Does the left-derivation or right-derivation strategies impact the
ambiguity of the grammar?
I If the grammar is unambiguous, do the two derivation strategies
result in the same sequence of production rules?
COM S 342 Principles of Programming Languages @ Iowa State University 42
Design Grammars
How to specify the string patterns:
I regular expressions a∗b+: ab, aab, … (* the letter occurs 0 or more
time, + the letter occurs 1 or more times)
I English description – palindrome on the alphabet of a b c: aba, aa,
acca …
I set {an(bm|cm)|m > n > 0}
I grammars
COM S 342 Principles of Programming Languages @ Iowa State University 43
Designing Grammars
COM S 342 Principles of Programming Languages @ Iowa State University 44
Designing Grammars
COM S 342 Principles of Programming Languages @ Iowa State University 45
Review: homework and exam points
I grammar
I CFG
I strings, language and grammar
I derivation: left-most and right-most derivations
I parsing
I parse tree
I evaluating an expression using parse tree
I ambiguity
I removing ambiguity
I design grammar
COM S 342 Principles of Programming Languages @ Iowa State University 46