The Australian National University Semester 1, 2020 Research School of Computer Science Tutorial 5
Theory of Computation
This tutorial sheet contains way more exercises that you will be able to solve or discuss in the tutorial.
Exercise 1 Context Free or Not?
Copyright By PowCoder代写 加微信 powcoder
For each of the following languages, either prove that it is context free, or prove that it is not.
1. L1 = {w ∈ {0, 1}∗ : w is a palindrome}
Solution. Yes, we can choose the CFG
S → 0S0 | 1S1 | 1 | 0 | ε
2. L2 ={0p :pprime}
Solution. No, we can show this language is not context-free via the pumping lemma. Assume for contradiction that L2 is context-free. Then there exists some n ∈ N such that the pumping lemma for CFLs applies. Let z = 0pn , where pn is the nth prime number. Clearly, pn > n. Then there exists a decomposition z = uvwxy with |vwx| ≤ n, vx ̸= ε and ∀t ≥ 0,uvtwxty ∈ L2. So, let u = 0i,v = 0j,w = 0k,x = 0l,y = 0m. Then we have that ∀t ≥ 0,0i0jt0kxtl0m ∈ L. So, for all t ≥ 0, i + jt + k + tl + m is prime.
Choose t = 0. Then Hence, i+k+m≥2.
i + jt + k + tl + m = i + k + m is prime
Now choose t = i + k + m. Then
i + jt + k + tl + m = (j + l)t + i + k + m = (j + l + 1)(i + k + m)
We have from before that i+k+m ≥ 2. Since vx ̸= ε, we have that |vx| = j+l ≥ 1. Hence j + l + 1 ≥ 2. So we have a valid factorization of i + jt + k + tl + m where each factor is at least 2, and hence i + jt + k + tl + m cannot be prime, a contradiction.
3. L3 = {aibjck | i ̸= j or j ̸= k}. Solution. Yes, we can choose the CFG
S1 →aS1b|aA|Bb A → aA | ε
B → Bb | ε
C → Cc | ε
4. L6 ={αα:α∈{0,1}∗}
S → S1C | AS2
Solution. No, we argue via the pumping lemma. There exists some n ∈ N such that pumping lemma applies. Choose z = 0n1n0n1n. Clearly, z ∈ L and |z| = 4n ≥ n. Then we decompose z = uvwxy.
We shall prove via cases on the middle section vwx.
Case 1: vwx sits entirely inside one of the blocks of consecutive digits.
Suppose vwx sits inside the first block of zeros. Then the string uv0wx0y = uwy is of the form 0k1n0n1n where k < n, which cannot be written as ww for some w, as the only place to split is into
S2 →bS2c|bB|Cc
0k1n and 0n1n (otherwise the strings will start/end with different characters), but then k = n, a contradiction.
We can repeat the same proof for all the blocks by the same argument.
Case 2: vwx straddles the boundary between the first block of zeros, and the first block of ones.
So vwx = 0i1j. Without loss of generality, assume that j ̸= 0 (otherwise we get the same case as before.) Then the string uv0wx0y has fewer zeros in the first half as compared to the second, or fewer ones in the first half than the second. For the same argument as above, there’s only one place to split the string to potentially be written as αα for some α, and the first α will have fewer tokens then the second α, a contradiction.
Case 3: vwx straddles the middle boundary between the first block of ones, and the second block of twos.
So vwx = 1i0j. As before, we can only hope to split the string via the middle boundary, so by deleting v and x, we upset the balance between the number of 1s, or the number of 0s (or both), so it cannot be written as αα for some α, a contradiction.
Case 4: vwx straddles the third boundary between the second block of zeros, and the second block of ones.
Repeat the same argument as case 2.
All cases being covered, we always get a contradiction. Hence L6 cannot be context free.
5. A regular language.
Solution. Yes, we prove by transforming each of the regular expression constructors to a corre- sponding production rule.
Base Case:
(a) ∅ is ignored, and we do not create a production rule. (If the entire regular expression is ∅, then the set of production rules is empty.)
(b) ε. We can add a production rule S → ε.
(c) x∈Σ.WecanaddaproductionruleS→x.
Step Case: Assume that R1,R2 are regular expressions, with a corresponding CFGs G1 = (V1, T1, S1, P1) and G2 = (V2, T2, S2, P2) respectively.
(a) R1 +R2. Create a new CFG G = (V1 ∪V2,T1 ∪T2,S,P = P1 ∪P2), and add a new rule S → S1 | S2 to P .
(b) R1R2. Same as above, but add rule S → S1S2.
(c) R∗1. Same as above, but add rule S → SS1 | ε, which allows us to generate arbitrary many
copies of S (emulating S∗.)
Alternatively, we have – for a regular language L – a DFA A that accepts precisely L, i.e. L = L(A). We can turn this into a PDA that doesn’t manipulate the stack at all. Let A = (Q, Σ, δ, q0, F ). We considerthePDAP =(Σ,Γ,∆,δP,q0,Z,F)where
• Γ = {Z} (we just have a single stack symbol that we never touch) • δP (q, a, Z) = {(δ(q, a), Z)} (we just simulate the automaton).
Given that both automata have the same final states, we see that L is also accepted by a PDA (even a deterministic PDA).
Exercise 2 Conversion to CNF
Convert the following CFG’s to Chomsky Normal Form.
Solution. We first remove the ε productions. Clearly, S is nullable. So we remove the epsilon
S → aSb | ε
production, and add a new production S → ab, as previously we has S → ε. Thus,
S → aSb | ab
There are no unit productions, nor useless symbols to eliminate. We now add new non-terminals
for each terminal symbol, A → a and B → b. Thus, S→ASB|ab A→a B→b
The last step is to break up the production S → ASB into two smaller productions, S → AX and X → SB. Hence,
S →AX |AB X → SB
S →aSbS |bSaS |ab|ba
Solution. There are no epsilon productions to eliminate. There are no unit productions, nor useless
variables.
We add new variables for each of the terminals,
and add new variables to split the lengthy productions S → aSbS and S → bSaS into productions
of the form X → Y Z.
S→AX1 |BY1 |AB|BA X1 →SX2
Y1 →SY2 Y2 → AS
S → RcKD | RcKE | MD K → cK | ε | E
R → aaRb | aRb | aab D → aDa
E → aE | ε
M → KE G → bG | ε
Solution. We first remove the epsilon productions. Immediately we have that K,E and G are nullable. We also have that M is nullable, as M → KE.
We remove the productions of the form X → ε, and for any production containing nullable variables, we rewrite them with all possible combinations of removing the nullable variable or not. Note that in the production M → KE, all the variables are nullable, so we discard the entire production. Afterwards, we have
S → RcKD | RcD | RcKE | RcK | RcE | Rc | MD | D K → cK | c
R → aaRb | aRb | aab D → aDa
F → aG | a
E → aE | a
G → bG | b
We now eliminate unit productions, the only one is S → D. We replace it with what D could
produce directly.
S → RcKD | RcD | RcKE | RcK | RcE | Rc | MD | aDa K → cK | c
R → aaRb | aRb | aab D → aDa
F → aG | a
E → aE | a
G → bG | b
We now remove the useless symbols. First, we remove non-generating symbols (symbols that cannot
produce a terminal string).
Clearly, K, F, E, G, R are all generating, as they have a production that directly returns a string of terminals. Thus, so is S, as S → Rc. The variable D is non-generating, as it can only make more copies of D surrounded by as. We remove it, and any productions containing D.
S →RcKE |RcK |RcE |Rc K → cK | c
R → aaRb | aRb | aab F → aG | a
E → aE | a
G → bG | b
We now remove all non-reachable symbols and their related productions. The only reachable symbols are S,K,R,E, so we remove F and G.
S →RcKE |RcK |RcE |Rc K → cK | c
R → aaRb | aRb | aab E → aE | a
We now add new productions for the terminal symbols,
A→a B→b C→c
and rewrite any appearances of them in productions with more than one output symbol.
S →RCKE |RCK |RCE |RC K → CK | c
R → AARB | ARB | AAB E → AE | a
At long last, we add intermediary symbols to break up longer productions to productions of length
Exercise 3
S →RS1 |RS3 |RS4 |RC S1 → CS2
K → CK | c
R→AR1 |AR2 |AR3 R1 → AR2
E → AE | a A→a B→b C→c
Understanding the CYK Algorithm
It can be tricky to check if a string is in a CFL by trying to find a derivation. Consider the following grammar that generates expressions containing zero, one, and sums of zeros and ones with parentheses.
S→U+S|U U →0|1|(S)
Note that the grammar contains no ε-productions, so every non-terminal either produces a single non- terminal, or a string of more terminals and non-terminals. So if we are half-way through trying to find a derivation for a string w of terminals, and the string it produces so far is strictly longer, or if a prefix/suffix of terminals doesn’t match, we know it cannot produce w.
1. Show by brute forcing leftmost derivations from S that this grammar cannot produce the string 0 + +. (You can reject a potential derivation as soon as it starts/ends with the wrong character, it contains a character not in the string 0 + +, or it’s at least 4 tokens long.)
Solution. We compute all possible left derivations, stopping when the first/last character doesn’t match, the string contains a left or right bracket, or it’s too long. Since left derivations describe the entire language, 0 + + is not in the language.
0+S 1+S (S)+S 0 1 (S) 0+U+S 0+U
0+0 0+1 0+(S)
2. Convert the grammar to Chomsky Normal Form.
Solution. There are no epsilon productions to remove. We remove the unit production S → U and rewrite as
S → U + S | 0 | 1 | (S) U →0|1|(S)
There are no useless symbols to remove. We add new non-terminals for each terminal symbol, P→+ L→( R→)
and rewrite so terminals are not present in any productions other than ones of the form X → x. S → UPS | 0 | 1 | LSR
U →0|1|LSR
We then break up long productions into chains of productions of length two. Note that LSR shows
up twice, so we can reuse the same new variable B → SR for both.
S → UA | 0 | 1 | LB A→PS
U → 0 | 1 | LB
P→+ L→( R→)
3. Using the CYK Algorithm, show that 0 + + is not in the language. (See Question 4 if you need a reminder of the CYK algorithm.)
Solution. We run the algorithm, and generate the following table.
{S,U} {P} {P} 0++
Since X13 doesn’t contain S, the CYK algorithm indicates that this string cannot be generated by this grammar.
4. Using the CYK Algorithm, show that 0 + (1 + 0) is in the language.
∅ ∅ ∅ {S} ∅
∅ ∅ ∅ ∅ {A} {B}
{S,U} {P} {L} {S,U} {P} {S,U} {R} 0+(1+0)
5. Using the previous question, give a leftmost derivation of the string 0 + (1 + 0).
S ⇒ UA ⇒ 0A ⇒ OPS ⇒ 0 + S ⇒ 0 + LB ⇒ 0 + (SR
⇒ 0 + (UAR ⇒ 0 + (1AR ⇒ 0 + (1PSR ⇒ 0 + (1 + SR ⇒ 0 + (1 + 0R ⇒ 0 + (1 + 0)
Exercise 4 Proving Properties of the CYK Algorithm
We now wish to prove that the CYK algorithm really can check if a string is in the language or not.
Let G = (V,T,S,P) be a CFG. Assume that G is already in Chomsky Normal Form.
Let w = a1a2 ...an ∈ T∗ be a string of length n. Let Xij denote the table entries in the CYK algorithm (as in the figure below for n = 5).
X13 X24 X35
X12 X23 X34 X45
X11 X22 X33 X44 X55
a1 a2 a3 a4 a5 We can describe the CYK algorithm inductively as follows.
Base Case: Since the grammar is in CNF, the only productions that can create terminals are of the form A → a. Hence,
Xii ={A∈V :(A→aii)∈P}
Step Case: When we want to compute Xij (given we have already computed all the rows below), we’re
looking for productions A ⇒ aiai+1 . . . aj . Since the grammar is in CNF, this is only possible if A has the G
production rule A → BC where B ⇒ aiai+1 ...ak and C ⇒ ak+1 ...aj. Since we’ve already computed GG
all variables that produce those two substrings (they are Xik and Xk+1,j respectively), we have that Xij ={A∈V :∃k,i≤k