COMP3630/6360: Theory of Computation Semester 1, 2022
The Australian National University
Context Free Languages
Copyright By PowCoder代写 加微信 powcoder
This lecture covers Chapter 5 of HMU: Context-free Grammars
(Context-free) Grammars
(Leftmost and Rightmost) Derivations
Parse Trees
An Equivalence between Derivations and Parse Trees Ambiguity in Grammars
Additional Reading: Chapter 5 of HMU.
Introduction to Grammars
We have so far seen machine-like means (e.g., DFAs) and declarative means (e.g., regular expressions) of defining languages
Grammars are a generative means of defining languages.
Grammars can be used to create a strictly larger class of languages.
They are especially useful in compiler and parser design; they can be used to check if:
ó parantheses are balanced in a program,
ó else occurrences have a matching if, etc.
Grammars: Formal Definition
A context-free grammar (CFG) G = (V,T,P,S), where
ó V is a finite set whose elements are called variables or non-terminal symbols.
Notation: upper case letters, e.g., A, B, . . ..
ó T is a finite set whose elements are called terminal symbols; T is precisely the alphabet of the language generated by the grammar G.
Notation: lower case letters, e.g., s1,s2,….
ó P ⊆ V ×(V ∪T)∗ is a finite set of production rules.
ó Each production rule (A, α) is also written as A −→ α.
Terminology: A , α are called the head and body of the production rule, resp. ó S ∈ T is the unique variable/non-terminal that ‘generates’ the language.
ó Strings consisting of non-terminals and/or terminals will be denoted by greek symbols, e.g., α, β, . . ..
ó Strings of terminals will be denoted by lower case letters, e.g., w,u,v
Derivations
How do Grammars Generate Languages?
A string w ∈ T∗ is in the language L(G) generated by G = (V,T,P,S) iff we can derive w from S, i.e.,
start from S and use production rule(s) repeatedly to replace heads of the rules by their bodies until a string in T∗ is obtained.
Let G = ({S}, {0, 1}, P, S) be a CFG with P given by
0S0 000 010
1111 11111
0110 01S10
(S,ε),(S,0),(S,1)
(S, 0S0), (S, 1S1) 0
01110 10101
S −→ 0 (2) S−→1
S −→ 0S0 S −→ 1S1
(3) S−→ε|0|1|0S0|1S1
10S01 1001
0000 00100 00000
Derivations
Derivation: Formal Definition
Definition
Given G = (V , T , P , S ) and α, β ∈ (V ∪ T )∗ , a derivation of β from α is a finite sequenceofstringsγ1 ⇒γ2 ⇒···⇒γk forsomek∈Nwhere
GGG 1. γ1 = α and γk = β;
2.γ1,…,γk ∈(V∪T)∗
3. For each i = 1,…,k − 1, γi+1 is obtained from γi by replacing the head of a
production rule of P by its body.
The following phrases are used interchangeably.
β is derived from α ⇔ there exists a derivation of β from α ⇔ α ⇒∗ β. G
For the grammar G = ({S},{0,1},P,S)with P given by S −→ ε|0|1|0S0|1S1, the following is a derivation of 010111010 from S
S ⇒ 0S0 ⇒ 01S10 ⇒ 010S010 ⇒ 0101S1010 ⇒ 010111010. GGGGG
S→0S0 S→1S1 S→0S0 S→1S1 S→1
Derivations
Sentential Forms and Language Generated by a Grammar: Definitions
Definition
GivenG=(V,T,P,S),anystringin(V∪T)∗ derivedfromSisasententialform.
The set of all sentential forms of G (denoted by SF(G)) is defined inductively:
ó Basis: S ∈ SF(G)
óInduction:ifαAγ∈SF(G)forsomeα,γ∈(V∪T)∗ andA∈V,andA−→βis
a production rule, then αβγ ∈ SF(G).
ó Only those strings that are generated by the above induction are sentential forms.
Definition
Given CFG G = (V , T , P , S ), the language L(G ) generated by G are the sentential forms that are in T∗, i.e., L(G) = SF(G) ∩ T∗.
For the CFG G = ({S},{0,1},P,S)with P given by S −→ ε|0|1|0S0|1S1,
(1) S, ε, 0, 1 0S0, 00, 000, 010, 1S1, 11, 101, 111,… are all sentential forms. (2) S, ε, 0, 1 0S0, 00, 000, 010, 1S1, 11, 101, 111,… are in L(G).
Derivations
Other Sentential Forms
At each step of a derivation, one can replace any variable by a suitable production.
If at each non-trivial step of the derivation the leftmost (or rightmost) variable is
replaced by a production rule, then the derivation is said to be a leftmost (or
rightmost) derivation, respectively. We let α ⇒∗ β (or α ⇒∗ β) to denote the LM RM
existence of a leftmost (or rightmost) derivation of β from α, respectively.
Sentential forms derived via leftmost (or rightmost) derivations are known as
leftmost (or rightmost) sentential forms, respectively.
Balanced Parantheses Example
Consider the CFG G = ({S},{(,)},P,S) with P given by S −→ SS |(S)|().
[Derivation] [Leftmost Derivation] [Rightmost Derivation
S ⇒ SS ⇒ (S)S ⇒ (S)() ⇒ (())() ↑G↑G↑G↑G
S ⇒ SS ⇒ (S)S ⇒ (())S ⇒ (())() ↑G↑G↑G↑G
S ⇒ SS ⇒ S() ⇒ (S)() ⇒ (())() ↑G↑G↑G↑G
In the above, ↑ indicates the variable that is replaced in the following step
Parse Trees
Parse Trees
Parse trees are a graphical method of representing derivations. They are used in compilers to represent the source program.
Definition
Given a CFG G = (V,T,P,S), a parse tree for G is any directed labelled tree that meets the following three conditions:
ó every interior node is labelled by a non-terminal (i.e., variable);
ó every leaf node is labelled by a non-terminal, or a terminal or ε; however if it is labelled by ε, it is the sole child of its parent.
ó if an interior node is labelled by A ∈ V , and it’s childrenarelabelleds1,…,sk ∈V∪T∪{ε},then A −→ s1 ···sk is a production rule in P.
The yield of a parse tree is the string formed from the labels of the tree leaves read from left to right.
Note: The yield is not necessarily a string of terminals.
G=( S à (à) àPàS) P : S ! SS (S) Ý
(S)(S) (S) Ý
Ý yield = (())()
An Equivalence between Parse Trees and Derivations
Derivations and Parse Trees
Parse trees, derivations, leftmost derivations, and rightmost derivations are equivalent means of generating the language L(G) of a CFG G.
The proof for equivalence of rightmost derivations mirrors that of leftmost derivations. (So we’ll not delve into rightmost derivations).
Theorem 5.5.1
Let CFG G = (V,T,P,S) be given. Let A ∈ V and w ∈ T∗. Then,
A⇒∗ w ⇔ A⇒∗ w ⇔ thereexistsaparsetreewithrootAandyieldw ⇔ A⇒∗ w.
Proof Idea
We’ll show the following implications.
Existence of a parse tree with root A and yield w
A)⇤ w ByDefinition A)⇤ w LM G
An Equivalence between Parse Trees and Derivations
Part (a) of Proof of Theorem 5.5.1: A ⇒∗ w ⇒ ∃ Parse Tree G
We use induction on the (length of the) derivation. Lemma 5.5.2
LetCFGG=(V,T,P,S)begiven.LetA∈Vandα∈SF(G).IfA⇒∗ α,thenthere G
exists a parse tree with root A and yield α.
Proof of Lemma 5.5.2 (Induction on the length of derivation)
ó Suppose A ⇒∗ α is a derivation of length 0. G
ó Then A is a parse tree with root A and yield A.
An Equivalence between Parse Trees and Derivations
Part (a) of Proof of Theorem 5.5.1: A ⇒∗ w ⇒ ∃ Parse Tree G
Proof of Lemma 5.5.2 (Induction on derivations)
ó Hypothesis: the claim is true for all derivations of length k − 1 or lesser for some k ≥ 1.
ó Suppose a derivation of α from A in k steps exists. A=γ1 ⇒γ2 ⇒γ3 ⇒···⇒γk−1 ⇒γk =α
ó The last step must involve the application of a
production rule. Hence, γk−1 = βBω and α = βλω where (a) β,ω ∈ (V ∪ T)∗, (b) B ∈ V, and (b)
B −→ λ is a production rule.
Parse tree for
A )⇤ þ = í G
Parse tree for
A )⇤ âk 1 G
ó Extend the parse tree from the first k − 1 steps by:
Ifλ=X1…Xn withX1,…,Xn ∈V∪T,add childen X1,…,Xn to node B.
An Equivalence between Parse Trees and Derivations
Part (b) of Proof of Theorem 5.5.1: Parse Tree ⇒ A ⇒∗ w LM
Proof of Theorem 5.5.1 (Induction on the height of the tree)
ó Base case: the parse tree has height 0
ó Then A is a leftmost derivation in zero steps.
ó Induction: Let the claim be true for all parse trees of up to height l − 1.
ó Consider the root and its (say k) children. This corre- sponds to a production rule A −→ X1 ···Xk.
ó If Xi is a leaf, then the yield of the sub-tree rooted at Xi is wi = Xi itself. Then trivially Xi ⇒∗ wi .
óIfXi isnotaleaf,letwi betheyieldoftheparse
(sub-)tree rooted at Xi of depth l − 1 or less. Then, by induction hypothesis, Xi ⇒∗ wi .
Then, the following is a leftmost derivation for α from A
A ⇒ X1X2 ···Xk ⇒∗ w1X2 ···Xk ⇒∗ w1w2X3 ···Xk ⇒∗ ··· ⇒∗ w1 ···wk G LM LM LMLM
s1 s2 ààà sÔ
í = s1 à à à sÔ
( A à í ) ⌘ ( A ! í ) 2 P
Induction:
Ambiguous Grammars
Ambiguity in CFGs
Definition
A given CFG G is ambiguous if a string w ∈ L(G ) is the yield of two different parse trees. Equivalently, a CFG G is ambiguous if a string w ∈ L(G ) has two different leftmost (or rightmost) derivations.
ó Ambiguity is a property of a grammar, and not the language it generates.
An Example
ó CFG G = ({E }, {0, 1, . . . , 9, +, ∗}, P , E ) with P : E −→ E + E |E ∗ E |0|1| · · · |9
ó Consider the parse trees for 9 + 2 ∗ 2.
ó Since there are two distinct parse trees, a compiler will not know to reduce this to 13 or to 22.
EE E+E E⇤E
ó This ambiguity is addressed by precedence rules for operators.
Ambiguous Grammars
Ambiguity in CFGs
ó Some languages are generated by unambiguous as well as ambiguous grammars.
Balanced Parantheses Example
ó CFG G1 = ({S},{(,)},P,S) with P : S −→ SS|(S)|()
ó CFG G2 = ({B,R},{(,)},Q,B) with Q : B −→ (RB|ε and R −→)|(RR
ó G1 is ambiguous for there are two leftmost derivations for ()()().
S ⇒SS ⇒()S ⇒()SS ⇒()()S ⇒()()() LM LM LM LM LM
S ⇒SS ⇒SSS ⇒()SS ⇒()()S ⇒∗ ()()() LM LM LM LM LM
ó G2 is not ambiguous, since there is precisely only one rule at any stage of derivation. B ⇒∗ (RB ⇒ ()B ⇒ ()(RB ⇒ ()()B ⇒ ()()()B ⇒ ()()()ε
LM LM LM LM LM LM
ó Some languages are intrinsically ambiguous, e.g., {0i 1j 2k : i = j or j = k }. All grammars for such languages are ambiguous.
ó In general, there is no way to tell if a grammar is ambiguous.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com