CONTEXT-FREE GRAMMARS AND SYNTACTIC ANALYSIS
PRINCIPLES OF PROGRAMMING LANGUAGES
Norbert Zeh Winter 2021
Dalhousie University
1/10
ROAD MAP
• Parsing: Transform (tokenized) program text into parse tree
• Modelling programming languages: Context-free grammars and languages
• Capturing the syntactic structure of a program: Parse trees
• Types of parsers and types of grammars they can parse
• Grammars that describe programming languages and can be parsed
efficiently
• Construction of an LL(1) grammar
• Parsing LL(1) languages
• Push-down automata
2/10
GRAMMARS DESCRIBE LANGUAGES
A grammar for a subset of natural language
Sentence → Phrase Verb Phrase . Phrase → Noun
Phrase → Adjective Noun
Adjective → ’big’ | ’green’ Noun → ’cheese’ | ’Jim’
Verb → ’ate’
3/10
GRAMMARS DESCRIBE LANGUAGES
A grammar for a subset of natural language
Sentence → Phrase Verb Phrase . Phrase → Noun
Phrase → Adjective Noun
Adjective → ’big’ | ’green’ Noun → ’cheese’ | ’Jim’
Verb → ’ate’
Sentences in the language described by this grammar
• big Jim ate green cheese.
• green Jim ate green cheese. • Jim ate cheese.
• cheese ate Jim.
3/10
ANOTHER EXAMPLE
A grammar for simple arithmetic expressions
Expression → Term
Expression → Expression ‘+’ Term Expression → Expression ‘-’ Term
Term → Factor
Term → Term ‘*’ Factor Term → Term ‘/’ Factor
Factor → number
Factor → identifier Factor → ‘(’ Expression ‘)’
4/10
CONTEXT-FREE GRAMMARS
Definition: Context-free grammar
A quadruple G = (V, Σ, P, S) with • A set of non-terminals or
variables V,
• A set of terminals Σ,
• A set of rules or productions in the form
V → (V∪Σ)∗
• A start symbol S ∈ V.
5/10
CONTEXT-FREE GRAMMARS
Definition: Context-free grammar
A quadruple G = (V, Σ, P, S) with • A set of non-terminals or
variables V,
• A set of terminals Σ,
• A set of rules or productions in
The “alphabet” we use to form words in
V → (V∪Σ)∗ • A start symbol S ∈ V.
the form
the language described by the grammar.
5/10
CONTEXT-FREE GRAMMARS
Definition: Context-frReepgrreasmenmta“srtructural
a string.
A quadruple G = (V, Σ, P, S) with • A set of non-terminals or
variables V,
• A set of terminals Σ,
• A set of rules or productions in the form
V → (V∪Σ)∗
• A start symbol S ∈ V.
components” of
5/10
CONTEXT-FREE GRAMMARS
Definition: Context-free grammar
A quadruple G = (V, Σ, P, S) with • A set of non-terminals or
variables V,
• A set of terminals Σ,
• A set of rules or productions in the form
V → (V∪Σ)∗
• A start symbol S ∈ V.
Example:
Expression → Term
Expression → Expression ‘+’ Term Expression → Expression ‘-’ Term
Term → Factor
Term → Term ‘*’ Factor Term → Term ‘/’ Factor
Factor → number
Factor → identifier Factor → ‘(’ Expression ‘)’
5/10
CONTEXT-FREE GRAMMARS
Definition: Context-free grammar
A quadruple G = (V, Σ, P, S) with • A set of non-terminals or
variables V, Non-terminals:
•ExApsretssoifotne,rTmerinma,lFsaΣct, or
(All the symbols on the left-hand
• A set of rules or productions in sides of productions)
the form
Convention: ∗ V → (V∪Σ)
Non-terminals start with capital
• A start symbol S ∈ V. letters, use italic font
Example:
Expression → Term
Expression → Expression ‘+’ Term Expression → Expression ‘-’ Term
Term → Factor
Term → Term ‘*’ Factor Term → Term ‘/’ Factor
Factor → number
Factor → identifier Factor → ‘(’ Expression ‘)’
5/10
CONTEXT-FREE GRAMMARS
Definition: Context-free grammar
A quadruple G = (V, Σ, P, S) with • A set of non-terminals or
variables V, Terminals:
Example:
Expression → Term
Expression → Expression ‘+’ Term Expression → Expression ‘-’ Term
Term → Factor
Term → Term ‘*’ Factor Term → Term ‘/’ Factor
Factor → number
Factor → identifier Factor → ‘(’ Expression ‘)’
•nAumsebteor,fitdeernmtifinearls, ‘Σ+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’ (All the symbols not on the left-hand
• A set of rules or productions in sides of productions)
the form
Convention: ∗ V → (V∪Σ)
Terminals start with lowercase letters,
• A start symbol S ∈ V.
use italic font or are strings in quotes
5/10
CONTEXT-FREE GRAMMARS
Definition: Context-free grammar
A quadruple G = (V, Σ, P, S) with • A set of non-terminals or
Example:
Expression → Term
Expression → Expression ‘+’ Term Expression → Expression ‘-’ Term
Term → Factor
Term → Term ‘*’ Factor Term → Term ‘/’ Factor
Factor → number
Factor → identifier Factor → ‘(’ Expression ‘)’
variables V,
• A set of terminals Σ,
Rules:
The ones you see here
• A set of rules or productions in the form
V → (V∪Σ)∗ • A start symbol S ∈ V.
5/10
CONTEXT-FREE GRAMMARS
Definition: Context-free grammar
Convention:
A quadruple G = (V, Σ, P, S) with
The start symbol is the
• A set of non-terminals or left-hand side of the
variables V,
first rule
• A set of terminals Σ,
• A set of rules or productions in
the form
V → (V∪Σ)∗
• A start symbol S ∈ V.
Example:
Expression → Term
Expression → Expression ‘+’ Term Expression → Expression ‘-’ Term
Term → Factor
Term → Term ‘*’ Factor Term → Term ‘/’ Factor
Factor → number
Factor → identifier Factor → ‘(’ Expression ‘)’
5/10
NOTATIONAL VARIATIONS IN PRODUCTIONS (1)
6/10
NOTATIONAL VARIATIONS IN PRODUCTIONS (1)
Merging alternatives using ‘|’:
Phrase → Noun Phrase → Noun | Adjective Noun Phrase → Adjective Noun
6/10
NOTATIONAL VARIATIONS IN PRODUCTIONS (1)
Merging alternatives using ‘|’:
Phrase → Noun
Phrase → Adjective Noun
Backus-Naur form:
Phrase → Adjective Noun
Phrase → Noun | Adjective Noun
Phrase ::= Adjective Noun
6/10
NOTATIONAL VARIATIONS IN PRODUCTIONS (2)
7/10
NOTATIONAL VARIATIONS IN PRODUCTIONS (2)
Optional components:
FunctionDef → DeclSpecsopt Decl DeclListopt CompoundStmt
7/10
NOTATIONAL VARIATIONS IN PRODUCTIONS (2)
Optional components:
FunctionDef → DeclSpecsopt Decl DeclListopt CompoundStmt
Regular expression on RHS:
Expr → Term (op Term)∗
7/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
8/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
Example:
Sentence ⇒G Phrase Verb Phrase
8/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
Example:
Sentence ⇒G Phrase Verb Phrase Phrase Verb Phrase ⇒G Noun Verb Phrase
8/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
Example:
Sentence ⇒G Phrase Verb Phrase Phrase Verb Phrase ⇒G Noun Verb Phrase
Noun Verb Phrase ⇒G Noun Verb Noun
8/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
Example:
Sentence ⇒G Phrase Verb Phrase Phrase Verb Phrase ⇒G Noun Verb Phrase
Noun Verb Phrase ⇒G Noun Verb Noun Noun Verb Noun ⇒G Jim Verb Noun
8/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
Example:
Sentence ⇒G Phrase Verb Phrase Phrase Verb Phrase ⇒G Noun Verb Phrase
Noun Verb Phrase ⇒G Noun Verb Noun Noun Verb Noun ⇒G Jim Verb Noun
Jim Verb Noun ⇒G Jim ate Noun
8/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
Example:
Sentence ⇒G Phrase Verb Phrase Phrase Verb Phrase ⇒G Noun Verb Phrase
Noun Verb Phrase ⇒G Noun Verb Noun Noun Verb Noun ⇒G Jim Verb Noun
Jim Verb Noun ⇒G Jim ate Noun Jim ate Noun ⇒G Jim ate cheese
8/10
DERIVATIONS
Definition: Derivation
Sequence of rewriting operations that starts with the string σ = S and repeats the following until σ contains only terminals:
• Choose a non-terminal X in σ and a production X → β
in the grammar G.
• Replace X with β in σ.
As a picture:
σ=αXγ⇒G σ′ =αβγ
Example:
Sentence ⇒G Phrase Verb Phrase Phrase Verb Phrase ⇒G Noun Verb Phrase
Noun Verb Phrase ⇒G Noun Verb Noun Noun Verb Noun ⇒G Jim Verb Noun
Jim Verb Noun ⇒G Jim ate Noun Jim ate Noun ⇒G Jim ate cheese
The intermediate strings are called sentential forms.
8/10
THE LANGUAGE DEFINED BY A CONTEXT-FREE GRAMMAR
WewriteS⇒∗G σifthereexistsasequenceS⇒G σ1 ⇒G σ2 ⇒G ···⇒G σ.
9/10
THE LANGUAGE DEFINED BY A CONTEXT-FREE GRAMMAR
WewriteS⇒∗G σifthereexistsasequenceS⇒G σ1 ⇒G σ2 ⇒G ···⇒G σ.
Definition: Language defined by a grammar
The language defined by a grammar G = (V, Σ, P, S) is L(G)={σ∈Σ∗ |S⇒∗G σ}.
9/10
THE LANGUAGE DEFINED BY A CONTEXT-FREE GRAMMAR
WewriteS⇒∗G σifthereexistsasequenceS⇒G σ1 ⇒G σ2 ⇒G ···⇒G σ.
Definition: Language defined by a grammar
The language defined by a grammar G = (V, Σ, P, S) is L(G)={σ∈Σ∗ |S⇒∗G σ}.
If G is a context-free grammar, then L(G) is a context-free language.
9/10
THE LANGUAGE DEFINED BY A CONTEXT-FREE GRAMMAR
WewriteS⇒∗G σifthereexistsasequenceS⇒G σ1 ⇒G σ2 ⇒G ···⇒G σ.
Definition: Language defined by a grammar
The language defined by a grammar G = (V, Σ, P, S) is L(G)={σ∈Σ∗ |S⇒∗G σ}.
If G is a context-free grammar, then L(G) is a context-free language.
We only discussed context-free grammars so far:
Rules of the form X → σ are applicable no matter what
surrounds X in the current sentential form (without context).
Context-sensitive grammars may include rules of the form
αXβ → γYδ; the replacement of X with Y depends on the context.
9/10
THE LANGUAGE DEFINED BY A CONTEXT-FREE GRAMMAR
WewriteS⇒∗G σifthereexistsasequenceS⇒G σ1 ⇒G σ2 ⇒G ···⇒G σ.
Definition: Language defined by a grammar
The language defined by a grammar G = (V, Σ, P, S) is L(G)={σ∈Σ∗ |S⇒∗G σ}.
If G is a context-free grammar, then L(G) is a context-free language.
Example:
The “Jim-ate-cheese” grammar defines the language
L(G) = {Jim ate cheese, Jim ate Jim, big Jim ate cheese, . . .}.
9/10
CONTEXT-FREE LANGUAGES: EXAMPLES
←− ∗ The following grammar defines the language L(G) = {σ σ | σ ∈ {0, 1} }
S→ε S→0S0 S→1S1
←−
( σ is σ written backwards):
10/10
CONTEXT-FREE LANGUAGES: EXAMPLES
←− ∗ The following grammar defines the language L(G) = {σ σ | σ ∈ {0, 1} }
S→ε S→0S0 S→1S1
←−
( σ is σ written backwards):
The following grammar defines the language L(G) = {0n1n | n ≥ 0}: S→ε
S→0S1
10/10
CONTEXT-FREE LANGUAGES: EXAMPLES
←− ∗ The following grammar defines the language L(G) = {σ σ | σ ∈ {0, 1} }
S→ε S→0S0 S→1S1
←−
( σ is σ written backwards):
The following grammar defines the language L(G) = {0n1n | n ≥ 0}: S→ε
S→0S1 Neither of these two languages is regular!
10/10