CS计算机代考程序代写 compiler Context Free Languages interpreter Lecture 2. Context Free Grammar

Lecture 2. Context Free Grammar

Lecture 2. Context Free Grammar

January 22, 2021

COM S 342 Principles of Programming Languages @ Iowa State University 1

Overview

I what is a grammar
I what is a context free grammar (cfg)
I relations of strings and grammar

I derivation: derive a string from a grammar
I parsing: parse a string to determine if it comforms to the gramma

I parse tree and semantics
I ambiguity
I design grammar

COM S 342 Principles of Programming Languages @ Iowa State University 2

What is a grammar

COM S 342 Principles of Programming Languages @ Iowa State University 3

Specify Patterns in String

a program is a string that follows certain ”patterns” and ”rules”

the rules can be specified by:

I informal grammar – English description
I formal grammar

1. what are the atoms? (terminals, lexeme, token), e.g., int a = 0;
lexeme: int, a, =, 0, ; terminals (given in the grammar) identifier,
numbers; tokens (terminal + link to the symbol table, for
implementing compiler or interpreter)

2. how to compose them to form a sentence?
3. others, e.g., regular expressions

a program is syntactically correct (valid) if it follows the grammar of
the language (rules) in which it is written

COM S 342 Principles of Programming Languages @ Iowa State University 4

Grammar

I A language is the set of strings

I Rules to define a string in the language

I Every programming language has a grammar describing the syntax
(the order or arrangement of terminals need to follow certain
patterns, rules)

I Bases for parsing – given a program (string) and a grammar,
validate the grammatical correctness of the string, i.e., if the string
follows the rules of the language

COM S 342 Principles of Programming Languages @ Iowa State University 5

Connection to Formal Language and Formal Grammars

Language types are determined by their grammars; the grammar types
are determined by the patterns of their production rules

I Type-3 : regular language – a*b*

I Type-2 : context free languages – typical programs

I Type-1 : context-sensitive languages – (anbncn)

I Type-0: recursively enumerable

Chomsky Hierarchy. COM S 331 Theory of Computing.

COM S 342 Principles of Programming Languages @ Iowa State University 6

Formal Grammars Define Formal Languages

Chomsky Hierarchy

COM S 342 Principles of Programming Languages @ Iowa State University 7

Context Free Grammar (CFG)

1. formal definitions

2. example

3. notations

4. string and grammar: derivation, parsing, design grammars

COM S 342 Principles of Programming Languages @ Iowa State University 8

Context Free Grammar (CFG)

A CFG contains:

I A set of terminals or atoms in the language

I A set of non-terminals

I A set of (production rules) which describes how a non-terminal can
be expanded/rewritten to a sequence of terminals and non-terminals

COM S 342 Principles of Programming Languages @ Iowa State University 9

Context Free Grammar (CFG)

A CFG is a tuple G = (Σ,V ,S ,P), where

I σ is a set of terminals

I V is a set of non-terminals such that Σ ∩ V = ∅
I S ∈ V is a start non-terminal
I P is a set of product rules, each of the form: X → ω, such that

X ∈ V and ω ∈ (Σ ∪ V )+

COM S 342 Principles of Programming Languages @ Iowa State University 10

Context Free Grammar (CFG): Example

G = (Σ,V ,S ,P), where

I Σ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,+,−, .
I V = unknown, part, digit, sign
I S = unknown
I P:

unknown → sign part . part|sign part
sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9

What language is unknown?

COM S 342 Principles of Programming Languages @ Iowa State University 11

Notations for Production Rules

I left hand side (LHS)→ right hand side (RHS)
I if not specified, assume LHS of first production is the start symbol
I productions with the same LHS are combined with |
I Backus-Naur Form (BNF) production rules, e.g., 〈A〉 := 〈B〉 ”c” 〈D〉

for A→ BcD, by , , used for describing the
grammar of Algol in 1962

I you will see slightly different notations in different systems

COM S 342 Principles of Programming Languages @ Iowa State University 12

Strings and grammar

I L(G) is the language defined by grammar G:
L(G) = {s ∈ Σ∗|S ⇒+ s}
S is the start symbol of the grammar
Σ is the set of terminals for that grammar

I L(G) contains all strings that can be derived from the start symbol
via one or more derivations

I a program s is a string over a set of tokens (keywords, identifies,
numbers, operators …). s is said to be syntactically correct if and
only if s ∈ L(G ), L(G) is the programming language in which s is
written, and G is its grammar

I if s ∈ L(G ), we say that the grammar accepts the string

COM S 342 Principles of Programming Languages @ Iowa State University 13

Strings and Grammar: generating strings from grammar

Rewriting/derivation: apply a sequence of production rules starting at
the start symbol, each application is called a rewrite

Given grammar rules: S → 0S |1S |�, generating a string 011 from G

COM S 342 Principles of Programming Languages @ Iowa State University 14

Strings and Grammar: generating strings from
grammar(contd.)

Rewriting/derivation: apply a sequence of production rules starting at
the start symbol, each application is called a rewrite

Given grammar rules: S → 0S |1S |�, generating a string 011 from G

Derivation:
S ⇒ 0S
⇒ 0S
⇒ 01S
⇒ 011S
⇒ 011

Is � a terminal?

COM S 342 Principles of Programming Languages @ Iowa State University 15

Derivation Notations

COM S 342 Principles of Programming Languages @ Iowa State University 16

In-class exercises: derivation

(1) Given grammar rules: S → (S)|�, can you generate (), (()), ()()?

(2) Given grammar: G = (Σ,V ,S ,P), where

I Σ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,+,−, .
I S = unknown
I P:

unknown → sign part . part|sign part
sign → +|−
part → digit|digit part
digit → 0|1|2|3|4|5|6|7|8|9

What strings can unknown generate?

COM S 342 Principles of Programming Languages @ Iowa State University 17

Leftmost and Rightmost Derivations

COM S 342 Principles of Programming Languages @ Iowa State University 18

Leftmost and Rightmost Derivations

S is a start symbol
digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S

find the derivation for 1+2+3

COM S 342 Principles of Programming Languages @ Iowa State University 19

Leftmost and Rightmost Derivations

COM S 342 Principles of Programming Languages @ Iowa State University 20

Parsing

COM S 342 Principles of Programming Languages @ Iowa State University 21

Parsing

I Parsing: discovering the derivation and constructing a tree (called a
parse tree) based on the derivation

I a parse tree shows the structure of an expressions it corresponds to a
grammar

I Example
digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S

find the derivation for 1+2+3

COM S 342 Principles of Programming Languages @ Iowa State University 22

Parse Tree

COM S 342 Principles of Programming Languages @ Iowa State University 23

Parse Tree

A parse tree results from the derivation sequences

I a terminal is a leaf node

I each node in the tree is either a terminal or non-terminal in the
production rule

I each edge in the tree from a non-terminal results from the
application of production on the non-terminal

I application of production rule always result in new nodes in the tree

COM S 342 Principles of Programming Languages @ Iowa State University 24

In-class exercise: constructing parse tree

Grammar: E → a|b|c |d |E + E |E − E |E ∗ E |(E ) construct parse tree for

a, a*c, c*(b+d)

COM S 342 Principles of Programming Languages @ Iowa State University 25

COM S 342 Principles of Programming Languages @ Iowa State University 26

COM S 342 Principles of Programming Languages @ Iowa State University 27

Computing Values Using Parse Tree: Evaluating Semantics

I We call it evaluate an expression – get an value from the expression
I Semantics: meaning of a syntactically correct sentence
I How to automatically get a value from the string

I Classify the terminals into operands and operator classes
I Associate meaning with each operand.
I Associate meaning with the application of operator(s) on operand(s).
I Evaluate the semantics using parse tree

COM S 342 Principles of Programming Languages @ Iowa State University 28

Computing Values Using Parse Tree: Evaluating Semantics

I Start from the leaf-nodes to create the operand and find their
meanings.

I Apply the operators on the generated operands to obtain the
meaning of the application of the operators.

I Continue until you reach the root-node.

COM S 342 Principles of Programming Languages @ Iowa State University 29

Evaluating Semantics: an example

Given a grammar

digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S

Given a string 1+2+3

COM S 342 Principles of Programming Languages @ Iowa State University 30

Evaluating Semantics: an example continued

I Meaning of a number n can be n points.
I Meaning of Operators :+, -, *
I Follow the evaluation order given by

sem(+, sem(+, sem(1), sem(2)), sem(3))
to compute the final value

COM S 342 Principles of Programming Languages @ Iowa State University 31

Ambiguity

A grammar is ambiguous if there exists a string that has at least two
distinct parse trees

Given a grammar

digit→ 0|1|2|3|4|5|…|9
part→ digit|digit part
S→ part|S+S |S-S|S*S|S/S

Draw a parse tree for 1+2*3

COM S 342 Principles of Programming Languages @ Iowa State University 32

Ambiguity

I Ambiguity in Grammar can result in generating different values

I Ambiguity in Grammar can result in incorrect syntax-directed
translation

COM S 342 Principles of Programming Languages @ Iowa State University 33

Ambiguity

COM S 342 Principles of Programming Languages @ Iowa State University 34

Ambiguity

COM S 342 Principles of Programming Languages @ Iowa State University 35

Remove Ambiguity

I Add delimiters (e.g., parenthesis) – modify grammars, add terminals
of deliminator

I Add operator precedence and associativity – modify grammars,
giving additional rules beyond grammar

COM S 342 Principles of Programming Languages @ Iowa State University 36

Delimiters

COM S 342 Principles of Programming Languages @ Iowa State University 37

Give Operator Precedence

I If more than one operator is present in the expression, the precedence
order decides the order in which the operators should be applied.

I Modify grammar: add non-terminals for each precedence level, push
the higher levels towards the bottom of the parse trees

S→ part|S+S |S-S|S*S|S/S

S→ S+S |S-S|T
T→ T*T|T/T|part

COM S 342 Principles of Programming Languages @ Iowa State University 38

Operator Precedence

COM S 342 Principles of Programming Languages @ Iowa State University 39

Define Associativity

I If the same operator appears more than once in the same expression,
then associativity rule decides the order in which the operators
should be applied.

I There are two types of associativity: left and right:
I Left associativity: operators on the left are applied before the

operators on the right.
I Right associativity: operators on the right are applied before the

operators on the left.

Examples: x/y/z becomes (x/y)/z if / is left associative.

COM S 342 Principles of Programming Languages @ Iowa State University 40

Define Associativity: example

Given: S→ S+S |S-S|T
T→ T*T|T/T|part

I Assume that + is left-associative: What is the parse tree for 1 + 2
+ 3 * 4?

I Allow expansion only on the left-hand side of the parse tree, how to
modify the grammar?

S→ S+T |S-S|T
T→ T*T|T/T|part

COM S 342 Principles of Programming Languages @ Iowa State University 41

Remove Ambiguity: revisit

I There are no techniques that can always convert an ambiguous
grammar to an unambiguous grammar

I A language is ambiguous if its grammar is ambiguous

I Some grammars are left ambiguous for better representation of
concepts

I Observe how grammar rules can impact the shape of the parse trees

Questions:

I Does the left-derivation or right-derivation strategies impact the
ambiguity of the grammar?

I If the grammar is unambiguous, do the two derivation strategies
result in the same sequence of production rules?

COM S 342 Principles of Programming Languages @ Iowa State University 42

Design Grammars

How to specify the string patterns:

I regular expressions a∗b+: ab, aab, … (* the letter occurs 0 or more
time, + the letter occurs 1 or more times)

I English description – palindrome on the alphabet of a b c: aba, aa,
acca …

I set {an(bm|cm)|m > n > 0}
I grammars

COM S 342 Principles of Programming Languages @ Iowa State University 43

Designing Grammars

COM S 342 Principles of Programming Languages @ Iowa State University 44

Designing Grammars

COM S 342 Principles of Programming Languages @ Iowa State University 45

Review: homework and exam points

I grammar

I CFG
I strings, language and grammar
I derivation: left-most and right-most derivations
I parsing
I parse tree
I evaluating an expression using parse tree
I ambiguity
I removing ambiguity
I design grammar

COM S 342 Principles of Programming Languages @ Iowa State University 46