CS3342 – Assignment 1 due Feb. 10, 2021
1. (15pt) Given the simple calculator (Syntax: slide 16; textbook: p.54), give an example of input code that produces exactly three errors from the scanner. The scanner encounters an error when reading an invalid token. When that happens, assume that the scanner skips everything until the next white space (\ ,\t,\n). Your three invalid tokens should start with three different characters. Your input code will be a string over the alphabet {a,b,…,z,0,1,…,9,.,:,=,+,-,*,/,(,),\ ,\t,\n}. Explain in detail how the three errors are produced from the scanner.
2. (15pt) The scanners identify keywords as identifiers because they fit the definition of an identifier and because it is simpler to first categorize them as identifiers and then look them up to see whether in fact they are keywords. Draw a deterministic finite automaton that recognizes the keywords directly, that is, each keyword will be identified in a separate accepting state, which is different from any state recognizing identifiers. Label each accepting state accordingly.
Identifiers and keywords are defined as follows:
identifier −→ (_ | letter)(_ | letter | digit)∗ letter −→ a|b|···z
digit −→ 0|1|···9
keyword −→ if | in | int
That means, 0a is an invalid token whereas int1 and i are identifiers.
For simplicity, when labelling transitions, you are allowed to use a notation for ranges and comple- ments, e.g., a..d for a,b,c,d and ̸=7 for any character (in our alphabet) different from 7.
3. (20pt) Assume you have a C++ source file which you edit using the BBEdit editor (barebones.com). You need to replace all comments in the /* … */ style that span a single line with the same comment in the // … \n style. This can be done using the Find/Replace Grep feature. You are required to provide the two regular expressions corresponding to Find and Replace boxes. Comments spanning more than one line should not be affected. You are not required to handle strings containing comment-like strings, that is, such pseudo-comments would be replaced as well.
You will need to use the backreference feature: subpatterns captured by Find can be identified by enclosing them in parentheses, (…), and then used in Replace with \1,\2,… in the order they appear in Find.
4. (25pt) Consider the following grammar G (nonterminals are upper case, the rest are terminals):
1. P −→ E$$ 2. E −→ TL 3. L −→ +TL 4. L −→ ε
5. T −→ FM 6. M −→ *FM 7. M −→ ε
8. F −→ (E) 9. F −→ a
10. F −→ b 11. F −→ 0 12. F −→ 1
1
(a) (5pt) Show the parse tree for the input: (a+b)*0$$.
(b) (10pt) Compute first(X) and follow(X) sets for all nonterminals X (Syntax: slide 49). Use the algorithm given below and indicate, for each symbol added to a set, the pair (step, prod), where 1 ≤ step ≤ 3 is the step in the algorithm (showed as , in the algorithm) and 1 ≤ prod ≤ 11 is the production involved. (You can use jflap to check your calculations but have to show that you know how to compute the sets.)
(c) (10pt) Prove that G is LL(1). You need to use the definition of LL(1) (Syntax: slide 73).
1,
2
1
2
3
2
3
READ ME!
(a) (5pt) Is G LL(1)? Explain your answer.
(b) (10pt) Modify G by eliminating left-recursion on E. Is the new grammar, G′, LL(1)?
(c) (10pt) Modify G′ to become LL(1) by removing common prefixes.
Submit all your answers as a single pdf file on owl.uwo.ca. Solutions should be typed but high quality hand written solutions are acceptable. (Source code, if required, is submitted as separate files.)
You are allowed to use jflap (jflap.org) for your solutions, whenever possible.
The solutions should provide sufficient detail to support your claim. A simple yes/no answer without any supporting arguments will not be given any marks.
Self-reported absences can be used to extend the deadline by 48h. In this case, you must label “SELF-REPORTED ABSENCE” clearly at the top of the assignment and submit it by email to cs3342@uwo.ca instead of OWL.
5.
(25pt) Given the grammar G:
S −→ S −→ E −→ E −→ E −→
id=E $$ E
E +id
E *id id
3