CS代考 COMP3630/6360: Theory of Computation Semester 1, 2022

COMP3630/6360: Theory of Computation Semester 1, 2022
The Australian National University
Context Free Languages

This lecture covers Chapter 5 of HMU: Context-free Grammars
(Context-free) Grammars
(Leftmost and Rightmost) Derivations
Parse Trees
An Equivalence between Derivations and Parse Trees Ambiguity in Grammars
Additional Reading: Chapter 5 of HMU.

Introduction to Grammars
We have so far seen machine-like means (e.g., DFAs) and declarative means (e.g., regular expressions) of defining languages
Grammars are a generative means of defining languages.
Grammars can be used to create a strictly larger class of languages.
They are especially useful in compiler and parser design; they can be used to check if:
ó parantheses are balanced in a program,
ó else occurrences have a matching if, etc.

Grammars: Formal Definition
A context-free grammar (CFG) G = (V,T,P,S), where
ó V is a finite set whose elements are called variables or non-terminal symbols.
Notation: upper case letters, e.g., A, B, . . ..
ó T is a finite set whose elements are called terminal symbols; T is precisely the alphabet of the language generated by the grammar G.
Notation: lower case letters, e.g., s1,s2,….
ó P ⊆ V ×(V ∪T)∗ is a finite set of production rules.
ó Each production rule (A, α) is also written as A −→ α.
Terminology: A , α are called the head and body of the production rule, resp. ó S ∈ T is the unique variable/non-terminal that ‘generates’ the language.
ó Strings consisting of non-terminals and/or terminals will be denoted by greek symbols, e.g., α, β, . . ..
ó Strings of terminals will be denoted by lower case letters, e.g., w,u,v

Derivations
How do Grammars Generate Languages?
A string w ∈ T∗ is in the language L(G) generated by G = (V,T,P,S) iff we can derive w from S, i.e.,
start from S and use production rule(s) repeatedly to replace heads of the rules by their bodies until a string in T∗ is obtained.
Let G = ({S}, {0, 1}, P, S) be a CFG with P given by
0S0 000 010
1111 11111
0110 01S10
􏰌 (S,ε),(S,0),(S,1) 􏰍
(S, 0S0), (S, 1S1) 0
01110 10101
S −→ 0 (2) S−→1
S −→ 0S0 S −→ 1S1
(3) S−→ε|0|1|0S0|1S1
10S01 1001
0000 00100 00000

Derivations
Derivation: Formal Definition
Definition
Given G = (V , T , P , S ) and α, β ∈ (V ∪ T )∗ , a derivation of β from α is a finite sequenceofstringsγ1 ⇒γ2 ⇒···⇒γk forsomek∈Nwhere
GGG 1. γ1 = α and γk = β;
2.γ1,…,γk ∈(V∪T)∗
3. For each i = 1,…,k − 1, γi+1 is obtained from γi by replacing the head of a
production rule of P by its body.
The following phrases are used interchangeably.
β is derived from α ⇔ there exists a derivation of β from α ⇔ α ⇒∗ β. G
For the grammar G = ({S},{0,1},P,S)with P given by S −→ ε|0|1|0S0|1S1, the following is a derivation of 010111010 from S
S ⇒ 0S0 ⇒ 01S10 ⇒ 010S010 ⇒ 0101S1010 ⇒ 010111010. GGGGG
S→0S0 S→1S1 S→0S0 S→1S1 S→1

Derivations
Sentential Forms and Language Generated by a Grammar: Definitions
Definition
GivenG=(V,T,P,S),anystringin(V∪T)∗ derivedfromSisasententialform.
The set of all sentential forms of G (denoted by SF(G)) is defined inductively:
ó Basis: S ∈ SF(G)
óInduction:ifαAγ∈SF(G)forsomeα,γ∈(V∪T)∗ andA∈V,andA−→βis
a production rule, then αβγ ∈ SF(G).
ó Only those strings that are generated by the above induction are sentential forms.
Definition
Given CFG G = (V , T , P , S ), the language L(G ) generated by G are the sentential forms that are in T∗, i.e., L(G) = SF(G) ∩ T∗.
For the CFG G = ({S},{0,1},P,S)with P given by S −→ ε|0|1|0S0|1S1,
(1) S, ε, 0, 1 0S0, 00, 000, 010, 1S1, 11, 101, 111,… are all sentential forms. (2) S, ε, 0, 1 0S0, 00, 000, 010, 1S1, 11, 101, 111,… are in L(G).

Derivations
Other Sentential Forms
At each step of a derivation, one can replace any variable by a suitable production.
If at each non-trivial step of the derivation the leftmost (or rightmost) variable is
replaced by a production rule, then the derivation is said to be a leftmost (or
rightmost) derivation, respectively. We let α ⇒∗ β (or α ⇒∗ β) to denote the LM RM
existence of a leftmost (or rightmost) derivation of β from α, respectively.
Sentential forms derived via leftmost (or rightmost) derivations are known as
leftmost (or rightmost) sentential forms, respectively.
Balanced Parantheses Example
Consider the CFG G = ({S},{(,)},P,S) with P given by S −→ SS |(S)|().
[Derivation] [Leftmost Derivation] [Rightmost Derivation
S ⇒ SS ⇒ (S)S ⇒ (S)() ⇒ (())() ↑G↑G↑G↑G
S ⇒ SS ⇒ (S)S ⇒ (())S ⇒ (())() ↑G↑G↑G↑G
S ⇒ SS ⇒ S() ⇒ (S)() ⇒ (())() ↑G↑G↑G↑G
In the above, ↑ indicates the variable that is replaced in the following step

Parse Trees
Parse Trees
Parse trees are a graphical method of representing derivations. They are used in compilers to represent the source program.
Definition
Given a CFG G = (V,T,P,S), a parse tree for G is any directed labelled tree that meets the following three conditions:
ó every interior node is labelled by a non-terminal (i.e., variable);
ó every leaf node is labelled by a non-terminal, or a terminal or ε; however if it is labelled by ε, it is the sole child of its parent.
ó if an interior node is labelled by A ∈ V , and it’s childrenarelabelleds1,…,sk ∈V∪T∪{ε},then A −→ s1 ···sk is a production rule in P.
The yield of a parse tree is the string formed from the labels of the tree leaves read from left to right.
Note: The yield is not necessarily a string of terminals.
G=( S à (à) àPàS) P : S ! SS (S) Ý
(S)(S) (S) Ý
Ý yield = (())()

An Equivalence between Parse Trees and Derivations
Derivations and Parse Trees
Parse trees, derivations, leftmost derivations, and rightmost derivations are equivalent means of generating the language L(G) of a CFG G.
The proof for equivalence of rightmost derivations mirrors that of leftmost derivations. (So we’ll not delve into rightmost derivations).
Theorem 5.5.1
Let CFG G = (V,T,P,S) be given. Let A ∈ V and w ∈ T∗. Then,
A⇒∗ w ⇔ A⇒∗ w ⇔ thereexistsaparsetreewithrootAandyieldw ⇔ A⇒∗ w.
Proof Idea
We’ll show the following implications.
Existence of a parse tree with root A and yield w
A)⇤ w ByDefinition A)⇤ w LM G

An Equivalence between Parse Trees and Derivations
Part (a) of Proof of Theorem 5.5.1: A ⇒∗ w ⇒ ∃ Parse Tree G
We use induction on the (length of the) derivation. Lemma 5.5.2
LetCFGG=(V,T,P,S)begiven.LetA∈Vandα∈SF(G).IfA⇒∗ α,thenthere G
exists a parse tree with root A and yield α.
Proof of Lemma 5.5.2 (Induction on the length of derivation)
ó Suppose A ⇒∗ α is a derivation of length 0. G
ó Then A is a parse tree with root A and yield A.

An Equivalence between Parse Trees and Derivations
Part (a) of Proof of Theorem 5.5.1: A ⇒∗ w ⇒ ∃ Parse Tree G
Proof of Lemma 5.5.2 (Induction on derivations)
ó Hypothesis: the claim is true for all derivations of length k − 1 or lesser for some k ≥ 1.
ó Suppose a derivation of α from A in k steps exists. A=γ1 ⇒γ2 ⇒γ3 ⇒···⇒γk−1 ⇒γk =α
ó The last step must involve the application of a
production rule. Hence, γk−1 = βBω and α = βλω where (a) β,ω ∈ (V ∪ T)∗, (b) B ∈ V, and (b)
B −→ λ is a production rule.
Parse tree for
A )⇤ þ = í G
Parse tree for
A )⇤ âk1 G
ó Extend the parse tree from the first k − 1 steps by:
Ifλ=X1…Xn withX1,…,Xn ∈V∪T,add childen X1,…,Xn to node B.

An Equivalence between Parse Trees and Derivations
Part (b) of Proof of Theorem 5.5.1: Parse Tree ⇒ A ⇒∗ w LM
Proof of Theorem 5.5.1 (Induction on the height of the tree)
ó Base case: the parse tree has height 0
ó Then A is a leftmost derivation in zero steps.
ó Induction: Let the claim be true for all parse trees of up to height l − 1.
ó Consider the root and its (say k) children. This corre- sponds to a production rule A −→ X1 ···Xk.
ó If Xi is a leaf, then the yield of the sub-tree rooted at Xi is wi = Xi itself. Then trivially Xi ⇒∗ wi .
óIfXi isnotaleaf,letwi betheyieldoftheparse
(sub-)tree rooted at Xi of depth l − 1 or less. Then, by induction hypothesis, Xi ⇒∗ wi .
Then, the following is a leftmost derivation for α from A
A ⇒ X1X2 ···Xk ⇒∗ w1X2 ···Xk ⇒∗ w1w2X3 ···Xk ⇒∗ ··· ⇒∗ w1 ···wk G LM LM LMLM
s1 s2 ààà sÔ
í = s1 à à à sÔ
( A à í ) ⌘ ( A ! í ) 2 P
Induction:

Ambiguous Grammars
Ambiguity in CFGs
Definition
A given CFG G is ambiguous if a string w ∈ L(G ) is the yield of two different parse trees. Equivalently, a CFG G is ambiguous if a string w ∈ L(G ) has two different leftmost (or rightmost) derivations.
ó Ambiguity is a property of a grammar, and not the language it generates.
An Example
ó CFG G = ({E }, {0, 1, . . . , 9, +, ∗}, P , E ) with P : E −→ E + E |E ∗ E |0|1| · · · |9
ó Consider the parse trees for 9 + 2 ∗ 2.
ó Since there are two distinct parse trees, a compiler will not know to reduce this to 13 or to 22.
EE E+E E⇤E
ó This ambiguity is addressed by precedence rules for operators.

Ambiguous Grammars
Ambiguity in CFGs
ó Some languages are generated by unambiguous as well as ambiguous grammars.
Balanced Parantheses Example
ó CFG G1 = ({S},{(,)},P,S) with P : S −→ SS|(S)|()
ó CFG G2 = ({B,R},{(,)},Q,B) with Q : B −→ (RB|ε and R −→)|(RR
ó G1 is ambiguous for there are two leftmost derivations for ()()().
S ⇒SS ⇒()S ⇒()SS ⇒()()S ⇒()()() LM LM LM LM LM
S ⇒SS ⇒SSS ⇒()SS ⇒()()S ⇒∗ ()()() LM LM LM LM LM
ó G2 is not ambiguous, since there is precisely only one rule at any stage of derivation. B ⇒∗ (RB ⇒ ()B ⇒ ()(RB ⇒ ()()B ⇒ ()()()B ⇒ ()()()ε
LM LM LM LM LM LM
ó Some languages are intrinsically ambiguous, e.g., {0i 1j 2k : i = j or j = k }. All grammars for such languages are ambiguous.
ó In general, there is no way to tell if a grammar is ambiguous.

程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts