COMP3630/6360: Theory of Computation Semester 1, 2022
The Australian National University
Normal Forms and Closure Properties
Copyright By PowCoder代写 加微信 powcoder
This lecture covers Chapter 7 of HMU: Properties of CFLs
Chomsky Normal Form
Pumping Lemma for CFGs Closure Properties of CFLs Decision Properties of CFLs
Additional Reading: Chapter 7 of HMU.
Chomsky Normal Form (CNF) for CFG
Chomsky Normal Form (CNF) for CFG
Chomsky Normal Form (CNF) for CFG
Chomsky Normal Forms
ó A normal or canonical form (be it in algebra, matrices, or languages) is a standardized way of presenting the object (in this case, languages).
ó A normal form for CFGs provides a prescribed structure to the grammar without compromising on its power to define all context-free languages.
ó Every non-empty language L with ε ∈/ L has Chomsky Normal Form grammar G = (V,T,P,S) where every production rule is of the form:
ó A −→ BC for A,B,C ∈ V, or
ó A−→aforA∈V anda∈T.
and every variable in V is useful, i.e. appears in the derivation of at least one
terminal string: for all X ∈ V there is α,β,w such that S ⇒∗ αXβ ⇒∗ w. GG
ó CNF disallows:
ó A −→ ε [ε-productions].
ó (A −→ B for A, B ∈ V . [Unit productions].
ó A−(→(B1···Bk,A∈V,Bi ∈V∪T fork≥2 [Complexproductions]. (
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 1: Remove ε-Productions]
ó ε-production: A −→ ε for some A ∈ V.
ó LetuscallavariableA∈V asnullableifA⇒∗ ε.
ó We can identify nullable variables as follows:
ó Basis: A∈V isnullableifA−→εisaproductionruleinP.
ó Induction: B ∈ V is nullable if B −→ A1 ···Ak is in P, and each Ai is nullable.
Procedure to Eliminate ε-Productions
ó Given G = (V,T,P,S) define Gno-ε = (V,T,Pno-ε,S) as follows:
1. Start with Pno-ε = P. Find all nullable variables of G. 3. For each production rule in P do the following:
ó If the body contains k > 0 nullable variables, add 2k productions to Pno-ε obtained by choosing a subset of nullable variables and replacing each by ε
4. Delete any production in Pno-ε of the form Y → ε for any Y ∈ V.
For example, suppose that in a given grammar, B,D are nullable and C is not. If A −→ BCD is a rule in P, then A −→ BCD|CD|BC|C are rules in Pno-ε. Similarly, if A −→ BD is a rule in P, then A −→ BD|B|D are rules in Pno-ε.
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 1: Remove ε-Productions]
An Example
Suppose G = ({A,B,C},{0,1},P,A) with P: A −→ BC; B −→ 0B|ε; C −→ C11|ε. ó B and C are nullable since B −→ ε and C −→ ε. Then, A is also nullable.
ó Define Gno-ε = ({A, B , C }, {0, 1}, Pno-ε , A) with Pno-ε containing
ó A −→ BC | B | C | ε
ó B −→ 0B | 0 | ε
ó C −→ C11 | 11 | ε
Theorem 7.1.1
The above induction procedure described in Slide 4 identifies all nullable variables. Theorem 7.1.2
L(Gno-ε) = L(G) \ {ε}.a
aProof in the Additional Proofs Section at the end
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 2: Remove Unit Productions]
ó Given a grammar G and variables A,B ∈ V, we say (A,B) form a unit pair if A ⇒∗ B G
using unit productions alone.
ó We can identify unit pairs as follows:
ó Basis: For each A ∈ V, (A,A) is a unit pair (since A ⇒∗ A). G
ó Induction: If (A,B) is a unit pair, and B → C is a production in P, then (A,C) is a unit pair.
ó Note: Suppose A −→ BC and C −→ ε are productions then A ⇒∗ B, but (A, B) is G
not a unit pair.
Procedure to Eliminate Unit Productions
ó Given G = (V,T,P,S) define Gno-unit = (V,T,Pno-unit,S) as follows:
1. Start with Pno-unit = P. Find all unit pairs of G.
2. For every unit pair (A, B) and non-unit production rule B −→ α, add rule
A −→ α to Pno-unit.
3. Delete all unit production rules in Pno-unit.
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 2: Remove Unit Productions]
An Example
Suppose G = ({A,B,C,D},{a,b},P,A) with P: A−→B|aC; B −→A|bD; C −→aC|ε; D −→bD|ε.
ó (A,B) and (B,A) are the only two non-trivial pairs of unit variables.
ó Define Gno-unit = ({A,B,C,D},{a,b},Pno-unit,A) with Pno-unit containing
ó A−→aC|bD|B
ó B −→ bD|aC|A óC−→aC|ε ó D −→ bD|ε
ó Note: Rules with B being the head can never be used. Theorem 7.1.3
The induction procedure on Slide 6 identifies all unit pairs. Theorem 7.1.4
L(Gno-unit) = L(G).b
bOutline of the proof is given in the Additional Proofs Section at the end
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 3: Remove Useless Variables]
ó A symbol X ∈ V ∪ T is said to be ógeneratingifX⇒∗ wforsomew∈T∗;
óreachableifS⇒∗ αXβforsomeα,β∈(V∪T)∗;and G
óusefulifS⇒∗ αXβ⇒∗ wforsomew∈T∗ andα,β∈(V∪T)∗. GG
(Useful ⇒ Reachable + Generating, but not necessarily vice versa!) ó Given a grammar G, we can identify generating variables as follows:
óBasis:Foreachs∈T,s⇒∗ s.Sosisgenerating G
ó Induction: If A −→ α, and every symbol of α is generating, so is A. ó Given a grammar G, we can identify reachable variables as follows:
ó Basis: S ⇒∗ S so S is reachable. G
ó Induction: If A −→ α, and A is reachable, so is every symbol of α.
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 3: Remove Useless Variables]
Procedure to Eliminate Useless Variables
ó Given G = (V,T,P,S) define GG = (VG,T,PG,S) as follows: ó Find all generating symbols of G
ó VG is the set of all generating variables.
ó PG is the set of production rules involving only generating symbols.
ó Now, define GGR = (VGR, TGR, PGR, S) as follows: ó Find all reachable symbols of GG
ó VGR is the set of all reachable variables.
ó PGR is the set of production rules involving only reachable symbols.
The Order of Eliminating Variables is Important!
ó Consider G = ({A,B,S},{0,1},P,S) with P : S −→ AB|0; A −→ 1A; B −→ 1.
ó A is not generating. Removing A and the rules S −→ AB and A −→ 1A results in B
being unreachable. Removing B and B → 1 yields GGR = ({S}, {0}, S −→ 0, S).
ó Reversing the order, we first see that all symbols are reachable; removing then the non-generating symbol A and production rules S −→ AB and A −→ 1A yields GRG = ({B,S},{0},S −→ 0 and B −→ 0,S). But B is unreachable now!
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 3: Remove Useless Variables]
Theorem 7.1.5
The induction procedure on Slide 9 identifies all generating variables. Theorem 7.1.6
The induction procedure on Slide 9 identifies all reachable variables. Theorem 7.1.7
(1) L(G) = L(GGR); and (2) Every symbol in GGR is useful.c cProof in the Additional Proofs Section at the end
Chomsky Normal Form (CNF) for CFG
Towards CNF [Step 4: Remove Complex Productions]
Procedure to Eliminate Complex Productions
ó Given G = (V,T,P,S), define Gˆ = (Vˆ,T,Pˆ,S) as follows:
ó Start with Gˆ = G and do the following operations.
ó For every terminal a ∈ T that appears in the body of length 2 or more, introduce
a new variable A and a new production rule A −→ a.
ó Replace the occurrence all such terminals in the body of length 2 or more by the
introduced variables.
ó Replace every rule A −→ B1 ···Bk for k > 2, by introducing k − 2 variables
D1,…,k −2, and by replacing the rule by the following k −1 rules:
A −→ B1D1 D2 −→ B3 D3 ··· Dk−2 −→ Bk−1Bk
D1 −→ B2D2 ··· Dk−3 −→ Bk−2 Dk−2 ó Note: Each introduced variable appears in the head exactly once.
Theorem 7.1.8
L(G) = L(Gˆ).d
dOutline of the proof is given in the Additional Proofs Section at the end
Chomsky Normal Form (CNF) for CFG
The Chomsky Normal Form
Theorem 7.1.9
For every context-free language L containing a non-empty string, there exists a grammar G in Chomsky Normal Form such that L \ {ε} = L(G ).
ó Since L is a CFL, it must correspond to some CFG G.
ó Eliminate ε productions (Step 1) to derive a grammar G1 from G such that
L(G1) = L(G) \ {ε}.
ó Eliminate unit productions (Step 2) to derive a grammar G2 from G1 such that
L(G2) = L(G1).
ó Eliminate useless variables (Step 3) to derive a grammar G3 from G2 such that
L(G3) = L(G2).
ó Eliminate complex productions (Step 4) to derive a grammar G4 from G3 such that
L(G4) = L(G3).
ó G4 contains no ε-productions, no unit productions, no useless variables, and no productions with body consisting of 3 or more symbols; Hence G4 is in CNF.
Pumping Lemma for CFLs
Pumping Lemma for CFLs
Pumping Lemma for CFLs
Pumping 7.2.1
Let L ̸= ∅ be a CFL. Then there exists n > 0 such that for any string z ∈ L with |z| ≥ n, (1) z=uvwxy; (2)vx̸=ε; (3)|vwx|≤n; uviwxiy∈Lforanyi≥0.
ó Since the claim only pertains to non-empty strings, we can show the claim for L \ {ε}. ó Let CNF grammar G generate L\{ε}. Choose n = 2m where m = |V| in G.
ó Pickanyz with|z|≥n.
ó Depth d ≥ m + 1.
At most two children per node
A ! s 23
2d- 2d-
22 yield=zand z 2m
Pumping Lemma for CFLs
ó SincedepthD=m+1ormore,theremustbeapathwithm+1ormoreedgesin the tree.
ó There must be two labels that match in the last m + 1 edges of the path! ó The claim follows from the following pictorial argument.
Pumping Lemma for CFLs
Uses of Pumping Lemma
ó Pumping lemma can be used to argue that some langauges are not CFLs.
Proof that L = {0n1n2n : n ≥ 0} is Not Context-Free
ó Suppose it were.
ó There exists an n such that for strings z longer than n pumping lemma applies.
ó Applying pumping lemma to z = 0n1n2n, we see that z = uvwxy such that |vwx| ≤ n.
znznzn 2 possible options for vwx
ó vwx cannot contain both zeros and twos. Two cases arise:
ó Case (a): Suppose vwx contains no 2s. Then uwy contains fewer 1s or 0s than
2s. Such a string is not in L.
ó Case (b): Suppose vwx contains no 1s. Then uwy contains fewer 1s or 2s than
1s. Such a string is not in L.
Closure Properties
Closure Properties
Closure Properties
Substitution of Symbols with Languages
ó LetLbeaCFLonΣ1,andlethbeasubstitution,i.e.,foreacha∈Σ1,h(a)isa language over some alphabet Σa.
ó We can extend the substitution to words by concatenation, i.e., h(s1 ···sk) = h(s1)h(s2)···h(sk).
ó One can then extend the substitution to languages by unioning, i.e., h(L) := h(s1 ···sl) = h(s1)···h(sl)
s1 ···sl ∈L s1 ···sl ∈L
i.e., h(L) is the language formed by substituting each symbol in a string in the
language L by a corresponding language. An Example
Suppose L = {anbn : n ≥ 0} and h(a) = {0} and h(b) = {1,11}. Then, h(L)={0n1m :n≤m≤2n}
Theorem 7.3.1
IfLisaCFLoverΣ1 andh(a)isaCFLforeverya∈Σ1,thenh(L)isalsoaCFL.
Closure Properties
Substitution of Symbols with Languages
Proof of Theorem 7.3.1
ó Let G = (V,Σ1,P,S) be a grammar that generates L.
ó Let for a ∈ Σ1, let Ga = (Va, Σa, Pa, Sa) be a grammar that generates h(a).
ó WLOG,assumethatV∩Va =∅foreacha∈Σ1.
ó Now define Gˆ = (V,{Sa : a ∈ Σ1},Pˆ,S) by
ó Every rule of Pˆ is a rule of P obtained by replacing each a ∈ Σ1 by Sa. óForexample,X→aXbinPwillcorrespondtoX→SaXSb inPˆifa,b∈Σ1.
ó Let Gsub = (V ∪ (∪a∈Σ1 Va), ∪a∈Σ1 Σa, Pˆ ∪ (∪a∈Σ1 Pa), S)
ó Claim: Gsub generates h(L).
ó Note that w ∈ h(L) can be written as wa1 ···wal for wai ∈ h(ai) for each i, and for some a1 ···al ∈ L.
S)⇤ a1àààaÔ (inG) Fori=1à:::àÔ, Sai )⇤ wai (inGai) ⇤+ˆ⇤+
S ) Sa1 àààSaÔ (in G as well as Gsub) Sai ) wai (in Gsub)
S ) Sa1 àààSaÔ ) wa1Sa2 àààSaÔ ) wa1wa2Sa3 àààSaÔ ) ààà ) wa1 àààwaÔ
Closure Properties
Closure under substitution means…
Closure under
ó (Finite) Union: Let L = {1,2,…,k} and h(i) = Li be a CFL for each i = 1,…,k.
By Theorem 7.3.1, h(L) = L1 ∪ ··· ∪ Lk is a CFL.
ó (Finite) Concatenation: Let L = {a1a2 ···ak} and h(ai) = Lai be a CFL for each
i = 1,…,k. By Theorem 7.3.1, h(L) = La1 ···Lak is a CFL.
ó Kleene-∗ closure: Let L = {a}∗ and h(a) = La be a CFL. By Theorem 7.3.1,
h(L) = (La)∗ is a CFL.
ó Positiveclosure: LetL={a}+ :={an :n≥1}andh(a)=La beaCFL.By
Theorem 7.3.1, h(L) = La(La)∗ is a CFL.
ó Homomorphism: Let L be a CFL and g be a homomorphism (i.e., h maps symbols to strings of symbols over some alphabet). Define h(a) = {g(a)}, which is a regular/CF language.Then, h(L) = g(L) and by Theorem 7.3.1, it is a CFL.
Closure Properties
Some More Closure Properties – 1
Theorem 7.3.2
If L is CFL, then so is LR.
Proof of Theorem 7.3.2
ó If G = (V,T,P,S) generates L, then GR = (V,T,PR,S) generates LR where A→X1···Xl inP ⇐⇒ A→(X1···Xl)R =XlXl−1···X1 inPR (1)
Theorem 7.3.3
If L is a CFL, R is a regular language, then L∩R is a CFL. Proof of Theorem 7.3.3
ó Product PDA Approach: Run the PDA accepting L1 and DFA accpeting L2 in parallel. Accept input string iff both machines are in one of their respective final states.
Closure Properties
Some More Closure Properties – 2
Theorem 7.3.4
If L is a CFL and h is a homomorphism, h−1(L) = {w : h(w) ∈ L} is also a CFL. A Coarse Outline of Proof of Theorem 7.3.4
PDA P accepting L
ó Note that editing the input tape is not a valid PDA operation.
ó To fix that, we need to alter the state of the PDA P to store h(a) in the state itself.
ó Let L be a language over {0,1} and h(0) = aa and h(1) = bbb.
ó Let the states of PDA P be q0,…,qk. Then, the PDA that accepts h−1(L) has
6k states, namely (qi,aa), (qi,a), (qi,ε), (qi,bbb), (qi,bb), and (qi,b).
ó The transition between states of P′ is defined as if the second component is the
input tape. When the second component is empty, the PDA has the choice to read an input symbol a and move into a state with h(a) as the second component.
Closure Properties
Some Non-Closure Properties
ó CFLs are not closed under intersection.
ó LetL1 ={0n1n2m :n,m≥0},L2 ={0n1m2n :n,m≥0}. BothareCFLs.
However, L1 ∩ L2 = {0n1n2n : n ≥ 0} is not a CFL.
ó CFLs are not closed under complementation.
ó Suppose CFLs are closed under complementation. Let L1,L2 be the
aforementioned CFLs. Then L1 ∩ L2 = (Lc1 ∪ Lc2)c must be a CFL, but it is not.
(See Slide 14). Hence, CFLs cannot be closed under complementation.
ó Note: There exist CFLs L such that Lc is a CFL as well.
ó CFLs are not closed under set difference.
ó Since CFLs are not closed under complementation, choose a CFL L such that Lc
is not a CFL. But Lc = Σ∗ \L and Σ∗ is a CFL. Hence, CFLs are not closed under
set difference.
ó Note: There exist CFLs L1, L2 such that L1 \ L2 is a CFL as well.
Decision Properties
Decision Properties
Decision Properties
Emptiness and Membership
ó Conversion of a grammar G to a corresponding PDA, PDA to a corresponding grammar G, and a grammar to CNF can each be achieved in polynomial time.
Emptiness of a CFL L
ó Let a grammar G = (V,T,P,S) generating the language L be given. (If PDA is
given, convert it to a grammar G).
ó G is non-empty ⇐⇒ S is generating.
Decision Properties
Emptiness and Membership
Membership of w in a CFL L
óGivenCNFG=(V,T,P,S)andw=a1···al weidentifyl(l+1)/2setsEi,j
ó Ei ,j corresponds to all variables that can derive ai · · · aj .
ó Ei,j’s are identify from bottom to top, left to right by the following induction.
ó Basis: For each i = 1,…,l, Ei,i contains all variables X such that X → ai.
ó Induction: For each i = 1,…,l and j > i, Eij contains X if: (1) X −→ YZ (2)
Y∈Ei,i′ andZ∈Ei′+1,j forsomei≤i′≤j. óS∈E1,l ⇐⇒w∈L(G).
E1àÔ E1àÔ 1 E2àÔ
E1à3 E2à4
E1à2 E2à3
E1à1 E2à2 E3à3 ààà EÔàÔ
a1 a2 a3 ààà aÔ
à à à EÔ 1àÔ
Decision Properties
Some Undecidable Questions about CFGs and CFLs
ó Is a given grammar unambiguous/ambiguous? ó Is a given CFL inherently ambiguous?
ó Is the intersection of two CFLs empty?
ó Are two CFLs identical?
ó Is a given CFL equal to Σ∗?
Additional Proofs
Additional Proofs
Additional Proofs
Additional Proofs
Proof of Theorem 7.1.2
⇐ Construct a parse tree with yield w ∈ L(G ) \ {ε}. Identify a maximal subtree, rooted at say X, whose yield is ε. Delete X and its subtree. Repeat until no such subtrees remain. In this illustrative example below, suppose that there is only one subtree with ε yield; let B be its label and let A −→ BCD be the production that introduced B. Now, delete B and its subtree. This new subtree is a parse tree for Gno-ε with yield w since A −→ CD is a valid production rule in Pno-ε [Why? B is nullable].
⇒ Construct a parse tree with yield w ∈ L(Gno-ε). Identify production rules (used in the tree) that are not in P. For each such rule, find an appropriate rule by appending nullable variables. To the parse tree, add the corresponding nullable variable(s) and a zero-yield subtrees to transform it to a parse tree for G.
In the example, the portion of the parse tree in yellow corresponds to the rule A −→ CD; then there must be some rule in P (namely A −→ BCD) such that the added variable(s) (B in this case) is nullable. So we add a child node with label B to the node with label A and append a sub-tree of yield ε rooted at B. This is now a parse tree for G with yield w.
Gno-Ý S G S
. yield:w .
A ! BCD A CD BCD
. . sub-yield = Ý . . yield:w . z .
Additional Proofs
Additional Proofs
Outline of Proof of Theorem 7.1.4
L(Gno-unit) ⊆ L(G): By definition, A → γ in Pno-unit iff there exists a B ∈ V such that A⇒∗ BandB−→γinP.
ó Thus, every production rule A → γ of Pno-unit is effectively a derivation A ⇒∗ α in G. G
ó Hence, every derivation of Gno-unit is a derivation of G. L(G) ⊆ L(Gno-unit): Consider a derivation of w ∈ L(G) from S.
ó Argue that every run of unit productions in P that are used in this derivation must be followed by a non-unit production rule in P.
ó Each such run of unit productions in P followed by a non-unit production can be condensed to a single production in Pno-unit. [See definition of Pno-unit]
ó The condensed derivation is then a derivation of w using rules in Pno-unit.
Additional Proofs
Additional Proofs
Proof of Theorem 7.1.7
(1) L(GGR) ⊆ L(G)) since the alphabets and the rule of GGR are subsets of those of G.
ó Suppose w ∈ L(G). Then, there must be such a derivation of w from S:
S⇒γ1⇒γ2⇒γ3···⇒γk =w. GGGG
Rule:R1 R2 R3 Rk
ó Since every variable symbol that appears in this derivation is generating, they and the
productionrulesR1,…,Rk usedinthisderivationwillbepresentinGG.
ó Every variable in this derivation is reachable; consequently, the variables that appear
and the rules R1, . . . , Rk will be present in GGR. Then, w ∈ L(GGR).
(2) A straightforward exercise in verifying the definition on Slide 7. Note that the remaining symbols have to be shown to be useful in GGR, and not in G!
Additional Proofs
Additional Proofs
Outline of Proof of Theorem 7.1.8
ó L(G) ⊆ L(Gˆ) because every production rule of Gˆ has a corresponding equivalent derivation of α from A in Gˆ.
ó Consider the parse tree of w ∈ L(Gˆ). If there are no introduced variables, then this is also the parse tree of w in G and hence w ∈ L(G).
ó If there are introduced variables, replace them by the complex production in G that introduced them in the first place (such replacements are always possible). The resultant tree is a parse tree for w in G, and hence w ∈ G.
B1 B2 ààà Bk B3
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com