HW1
50pts
Posted Friday, September 4, 2020 Due Friday, September 18, 2020
Note: Lecture 2 covers material for problems 1-3, Lectures 3 and 4 cover problems 4-5.
Problem 1 (10pts). Describe in English the languages denoted by the following regular expres- sions.
(a) (2pts) 0(0|1)∗0 (b) (2pts) ((ε|0)1*)*
(c) (2pts) (0|1)∗0(0|1)(0|1) (d) (2pts) 0*10*10*10*
(e) (2pts) (00|11)*((01|10)(00|11)*(01|10)(00|11)*)*
Note: Your description should be a high level characterization — one that would still make sense if we were using a different regular expression for the same language. For example, (10*10*)* should be described as “All strings of 0’s and 1’s, beginning with 1 and having even number of 1’s.” not as, for example, “The string of 1 followed by any number of 0’s followed by a 1 followed by any number of 0’s, repeated any number of times.”.
Problem 2 (10pts). Consider the grammar S→0S1S|1S0S|ε
(a) (2pts) Show that this grammar is ambiguous by constructing two different leftmost derivations for the sentence 0101.
(b) (2pts) Construct the corresponding rightmost derivations.
(c) (3pts) Construct the corresponding parse trees.
(d) (3pts) The grammar generates strings with equal number of 0’s and 1’s. Does it generate all such strings? (You are not required to write a formal proof. If you answer YES, write a brief justification. If you answer NO, show a string that cannot be generated by the grammar.)
Problem 3 (10pts). The following is an ambiguous expression grammar with one unary operator ∗ and n binary infix operators, θ1, θ2, …, θn, at n different levels of precedence:
E→Eθ1 E|Eθ2 E|…|Eθn E|E∗ |(E)|id
(a) (2pts) Show that the grammar is ambiguous.
(b) (8pts) Construct an equivalent unambiguous grammar such that binary operators θ1, θ2, …, θn
are all right-associative. Unary operator ∗ has the highest precedence. θn takes prece- dence over θn−1, θn−1 takes precedence over θn−2, etc.
Problem 4 (10pts). Consider the following LL(1) grammar for a simplified subset of Lisp: 1
2
P →E$$
E → atom E→’E
E → (EEs) Es → EEs
Es → ε
atom, ’, (, ), and $$ are the terminals (tokens), and P, E and Es are the nonterminals.
(a) (3pts) (b) (3pts)
(c) (4pts)
What is FOLLOW(Es)? FOLLOW(E)? PREDICT(Es → ε)?
Give a parse tree for the string (cdr ’(a b c)) $$. Note: keyword cdr and identifiers a, b, and c are atoms.
Consider a recursive descent parser running on the same input. At the point where the quote token (’) is matched, which recursive descent routines will be active (i.e., what routines will have a frame on the run-time stack)?
Problem 5 (10pts). For each grammar below, determine if it is LL(1) or not and if it is ambiguous or not. You do not need to justify your answer, just write YES or NO. As an example of the format for answers we are looking for: (x) LL(1): YES, Ambiguous: YES.
(a)(2pts) A→0A1|01 (b)(2pts) A→+AA| ∗ AA|a
(c)(2pts) A→A(A)A|ε
(d) (2pts) A → B a | b B c | d c | b d c B → d
(e)(2pts) S→CaCb|DbDa C→ε D→ε
3
Some additional problems if you are looking for an extra challenge. You DO NOT have to solve these problems. These problems won’t be graded and won’t bring credit even if you submit a solution.
Problem 1. The following grammar generates numbers in binary notation. C is the start symbol.
(1) C → C 0 | A 1 | 0
(2) A → B 0 | C 1 | 1
(3) B → A 0 | B 1
(a) Warmup: Construct a derivation that generates the binary notation of 21.
(b) Prove that the generated numbers are multiples of 3. (Hint: Structural induction.)
(c) Prove that all such numbers (i.e., nonnegative numbers that are multiples of 3) are generated by the grammar. (Hint: Strong induction on string length, contradiction.)
(d) Write a regular grammar G such that the numbers generated by G are multiples of 7 and all such numbers (i.e., nonnegative multiples of 7) are generated by G.
Problem 2. Write a top-down depth-first parser with backtracking for the language
(1) S → aSbS (2) S → bSaS (3) S → ε
Write your parser in Python. Include a function dfparse that takes a string, and returns a tuple (True, Seq) when the string is in the language, or (False, []) when the string is not in the language. Seq is the sequence of productions the depth-first parser applies. For example, dfparse(’abab’) yields (True, [1, 2, 3, 3, 3]), and dfparse(’abbb’) yields (False, []).