CS计算机代考程序代写 Context Free Languages interpreter compiler Lecture 2. Context Free Grammar

Lecture 2. Context Free Grammar
January 22, 2021
COM S 342 Principles of Programming Languages @ Iowa State University 1

Overview
􏰉 what is a grammar
􏰉 what is a context free grammar (cfg) 􏰉 relations of strings and grammar
􏰉 derivation: derive a string from a grammar
􏰉 parsing: parse a string to determine if it comforms to the gramma
􏰉 parse tree and semantics 􏰉 ambiguity
􏰉 design grammar
COM S 342 Principles of Programming Languages @ Iowa State University 2

What is a grammar
COM S 342 Principles of Programming Languages @ Iowa State University 3

Specify Patterns in String
a program is a string that follows certain ”patterns” and ”rules”
the rules can be specified by:
􏰉 informal grammar – English description 􏰉 formal grammar
1. what are the atoms? (terminals, lexeme, token), e.g., int a = 0; lexeme: int, a, =, 0, ; terminals (given in the grammar) identifier, numbers; tokens (terminal + link to the symbol table, for implementing compiler or interpreter)
2. how to compose them to form a sentence?
3. others, e.g., regular expressions
a program is syntactically correct (valid) if it follows the grammar of the language (rules) in which it is written
COM S 342 Principles of Programming Languages @ Iowa State University 4

Grammar
􏰉 A language is the set of strings
􏰉 Rules to define a string in the language
􏰉 Every programming language has a grammar describing the syntax (the order or arrangement of terminals need to follow certain patterns, rules)
􏰉 Bases for parsing – given a program (string) and a grammar, validate the grammatical correctness of the string, i.e., if the string follows the rules of the language
COM S 342 Principles of Programming Languages @ Iowa State University 5

Connection to Formal Language and Formal Grammars
Language types are determined by their grammars; the grammar types are determined by the patterns of their production rules
􏰉 Type-3 : regular language – a*b*
􏰉 Type-2 : context free languages – typical programs 􏰉 Type-1 : context-sensitive languages – (anbncn)
􏰉 Type-0: recursively enumerable
Chomsky Hierarchy. COM S 331 Theory of Computing.
COM S 342 Principles of Programming Languages @ Iowa State University 6

Formal Grammars Define Formal Languages
Chomsky Hierarchy
COM S 342 Principles of Programming Languages @ Iowa State University 7

Context Free Grammar (CFG)
1. formal definitions
2. example
3. notations
4. string and grammar: derivation, parsing, design grammars
COM S 342 Principles of Programming Languages @ Iowa State University 8

Context Free Grammar (CFG)
A CFG contains:
􏰉 A set of terminals or atoms in the language
􏰉 A set of non-terminals
􏰉 A set of (production rules) which describes how a non-terminal can be expanded/rewritten to a sequence of terminals and non-terminals
COM S 342 Principles of Programming Languages @ Iowa State University 9

Context Free Grammar (CFG)
A CFG is a tuple G = (Σ,V,S,P), where
􏰉 σ is a set of terminals
􏰉 V isasetofnon-terminalssuchthatΣ∩V =∅
􏰉 S ∈ V is a start non-terminal
􏰉 P is a set of product rules, each of the form: X → ω, such that X ∈ V and ω ∈ (Σ ∪ V )+
COM S 342 Principles of Programming Languages @ Iowa State University 10

Context Free Grammar (CFG): Example
G = (Σ,V,S,P), where
􏰉 Σ = 0,1,2,3,4,5,6,7,8,9,+,−,.
􏰉 V = unknown, part, digit, sign
􏰉 S = unknown
􏰉 P:
unknown → sign part . part|sign part sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9
What language is unknown?
COM S 342 Principles of Programming Languages @ Iowa State University 11

Notations for Production Rules
􏰉 left hand side (LHS)→ right hand side (RHS)
􏰉 if not specified, assume LHS of first production is the start symbol
􏰉 productions with the same LHS are combined with |
􏰉 Backus-Naur Form (BNF) production rules, e.g., ⟨A⟩ := ⟨B⟩ ”c” ⟨D⟩ for A → BcD, by John Backus, Peter Naur, used for describing the grammar of Algol in 1962
􏰉 you will see slightly different notations in different systems
COM S 342 Principles of Programming Languages @ Iowa State University 12

Strings and grammar
􏰉 L(G) is the language defined by grammar G: L(G) = {s ∈ Σ∗|S ⇒+ s}
S is the start symbol of the grammar
Σ is the set of terminals for that grammar
􏰉 L(G) contains all strings that can be derived from the start symbol via one or more derivations
􏰉 a program s is a string over a set of tokens (keywords, identifies, numbers, operators …). s is said to be syntactically correct if and only if s ∈ L(G ), L(G) is the programming language in which s is written, and G is its grammar
􏰉 if s ∈ L(G ), we say that the grammar accepts the string
COM S 342 Principles of Programming Languages @ Iowa State University 13

Strings and Grammar: generating strings from grammar
Rewriting/derivation: apply a sequence of production rules starting at the start symbol, each application is called a rewrite
Given grammar rules: S → 0S|1S|ε, generating a string 011 from G
COM S 342 Principles of Programming Languages @ Iowa State University 14

Strings and Grammar: generating strings from grammar(contd.)
Rewriting/derivation: apply a sequence of production rules starting at the start symbol, each application is called a rewrite
Given grammar rules: S → 0S|1S|ε, generating a string 011 from G Derivation:
S ⇒ 0S ⇒ 0S
⇒ 01S ⇒ 011S ⇒ 011
Is ε a terminal?
COM S 342 Principles of Programming Languages @ Iowa State University 15

Derivation Notations
COM S 342 Principles of Programming Languages @ Iowa State University 16

In-class exercises: derivation
(1) Given grammar rules: S → (S)|ε, can you generate (), (()), ()()? (2) Given grammar: G = (Σ, V , S, P), where
􏰉 Σ = 0,1,2,3,4,5,6,7,8,9,+,−,.
􏰉 S = unknown
􏰉 P:
unknown → sign part . part|sign part sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9
What strings can unknown generate?
COM S 342 Principles of Programming Languages @ Iowa State University 17

Leftmost and Rightmost Derivations
COM S 342 Principles of Programming Languages @ Iowa State University 18

Leftmost and Rightmost Derivations
S is a start symbol
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
find the derivation for 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 19

Leftmost and Rightmost Derivations
COM S 342 Principles of Programming Languages @ Iowa State University 20

Parsing
COM S 342 Principles of Programming Languages @ Iowa State University 21

Parsing
􏰉 Parsing: discovering the derivation and constructing a tree (called a parse tree) based on the derivation
􏰉 a parse tree shows the structure of an expressions it corresponds to a grammar
􏰉 Example
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
find the derivation for 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 22

Parse Tree
COM S 342 Principles of Programming Languages @ Iowa State University 23

Parse Tree
A parse tree results from the derivation sequences
􏰉 a terminal is a leaf node
􏰉 each node in the tree is either a terminal or non-terminal in the
production rule
􏰉 each edge in the tree from a non-terminal results from the application of production on the non-terminal
􏰉 application of production rule always result in new nodes in the tree
COM S 342 Principles of Programming Languages @ Iowa State University 24

In-class exercise: constructing parse tree
Grammar: E → a|b|c|d|E + E|E − E|E ∗ E|(E) construct parse tree for a, a*c, c*(b+d)
COM S 342 Principles of Programming Languages @ Iowa State University 25

COM S 342 Principles of Programming Languages @ Iowa State University 26

COM S 342 Principles of Programming Languages @ Iowa State University 27

Computing Values Using Parse Tree: Evaluating Semantics
􏰉 We call it evaluate an expression – get an value from the expression 􏰉 Semantics: meaning of a syntactically correct sentence
􏰉 How to automatically get a value from the string
􏰉 Classify the terminals into operands and operator classes
􏰉 Associate meaning with each operand.
􏰉 Associate meaning with the application of operator(s) on operand(s). 􏰉 Evaluate the semantics using parse tree
COM S 342 Principles of Programming Languages @ Iowa State University 28

Computing Values Using Parse Tree: Evaluating Semantics
􏰉 Start from the leaf-nodes to create the operand and find their meanings.
􏰉 Apply the operators on the generated operands to obtain the meaning of the application of the operators.
􏰉 Continue until you reach the root-node.
COM S 342 Principles of Programming Languages @ Iowa State University 29

Evaluating Semantics: an example
Given a grammar
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
Given a string 1+2+3
COM S 342 Principles of Programming Languages @ Iowa State University 30

Evaluating Semantics: an example continued
􏰉 Meaning of a number n can be n points.
􏰉 Meaning of Operators :+, -, *
􏰉 Follow the evaluation order given by sem(+, sem(+, sem(1), sem(2)), sem(3)) to compute the final value
COM S 342 Principles of Programming Languages @ Iowa State University 31

Ambiguity
A grammar is ambiguous if there exists a string that has at least two distinct parse trees
Given a grammar
digit→ 0|1|2|3|4|5|…|9 part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S
Draw a parse tree for 1+2*3
COM S 342 Principles of Programming Languages @ Iowa State University 32

Ambiguity
􏰉 Ambiguity in Grammar can result in generating different values 􏰉 Ambiguity in Grammar can result in incorrect syntax-directed
translation
COM S 342 Principles of Programming Languages @ Iowa State University 33

Ambiguity
COM S 342 Principles of Programming Languages @ Iowa State University 34

Ambiguity
COM S 342 Principles of Programming Languages @ Iowa State University 35

Remove Ambiguity
􏰉 Add delimiters (e.g., parenthesis) – modify grammars, add terminals of deliminator
􏰉 Add operator precedence and associativity – modify grammars, giving additional rules beyond grammar
COM S 342 Principles of Programming Languages @ Iowa State University 36

Delimiters
COM S 342 Principles of Programming Languages @ Iowa State University 37

Give Operator Precedence
􏰉 If more than one operator is present in the expression, the precedence order decides the order in which the operators should be applied.
􏰉 Modify grammar: add non-terminals for each precedence level, push the higher levels towards the bottom of the parse trees
S→ part|S+S |S-S|S*S|S/S S→ S+S |S-S|T
T→ T*T|T/T|part
COM S 342 Principles of Programming Languages @ Iowa State University 38

Operator Precedence
COM S 342 Principles of Programming Languages @ Iowa State University 39

Define Associativity
􏰉 If the same operator appears more than once in the same expression, then associativity rule decides the order in which the operators should be applied.
􏰉 There are two types of associativity: left and right:
􏰉 Left associativity: operators on the left are applied before the
operators on the right.
􏰉 Right associativity: operators on the right are applied before the
operators on the left.
Examples: x/y/z becomes (x/y)/z if / is left associative.
COM S 342 Principles of Programming Languages @ Iowa State University 40

Define Associativity: example
Given: S→ S+S |S-S|T T→ T*T|T/T|part
􏰉 Assume that + is left-associative: What is the parse tree for 1 + 2 + 3 * 4?
􏰉 Allow expansion only on the left-hand side of the parse tree, how to modify the grammar?
S→ S+T |S-S|T T→ T*T|T/T|part
COM S 342 Principles of Programming Languages @ Iowa State University 41

Remove Ambiguity: revisit
􏰉 There are no techniques that can always convert an ambiguous grammar to an unambiguous grammar
􏰉 A language is ambiguous if its grammar is ambiguous
􏰉 Some grammars are left ambiguous for better representation of
concepts
􏰉 Observe how grammar rules can impact the shape of the parse trees
Questions:
􏰉 Does the left-derivation or right-derivation strategies impact the ambiguity of the grammar?
􏰉 If the grammar is unambiguous, do the two derivation strategies result in the same sequence of production rules?
COM S 342 Principles of Programming Languages @ Iowa State University 42

Design Grammars
How to specify the string patterns:
􏰉 regular expressions a∗b+: ab, aab, … (* the letter occurs 0 or more time, + the letter occurs 1 or more times)
􏰉 English description – palindrome on the alphabet of a b c: aba, aa, acca …
􏰉 set {an(bm|cm)|m > n > 0}
􏰉 grammars
COM S 342 Principles of Programming Languages @ Iowa State University 43

Designing Grammars
COM S 342 Principles of Programming Languages @ Iowa State University 44

Designing Grammars
COM S 342 Principles of Programming Languages @ Iowa State University 45

Review: homework and exam points
􏰉 grammar
􏰉 CFG
􏰉 strings, language and grammar
􏰉 derivation: left-most and right-most derivations 􏰉 parsing
􏰉 parse tree
􏰉 evaluating an expression using parse tree
􏰉 ambiguity
􏰉 removing ambiguity
􏰉 design grammar
COM S 342 Principles of Programming Languages @ Iowa State University 46