CS计算机代考程序代写 c++ algorithm CS3342 – Assignment 1 due Feb. 10, 2021

CS3342 – Assignment 1 due Feb. 10, 2021
1. (15pt) Given the simple calculator (Syntax: slide 16; textbook: p.54), give an example of input code that produces exactly three errors from the scanner. The scanner encounters an error when reading an invalid token. When that happens, assume that the scanner skips everything until the next white space (\ ,\t,\n). Your three invalid tokens should start with three different characters. Your input code will be a string over the alphabet {a,b,…,z,0,1,…,9,.,:,=,+,-,*,/,(,),\ ,\t,\n}. Explain in detail how the three errors are produced from the scanner.
Solution: 5pt for each correct example
Input code: :: .. 0a
After reading ’:’, the scanner is expecting ’=’ but receives ’:’. After reading ’.’, the scanner is expecting a digit but receives another ’.’. After reading ’0’, the scanner is expecting another digit or a ’.’ but receives ’a’.
2. (15pt) The scanners identify keywords as identifiers because they fit the definition of an identifier and because it is simpler to first categorize them as identifiers and then look them up to see whether in fact they are keywords. Draw a deterministic finite automaton that recognizes the keywords directly, that is, each keyword will be identified in a separate accepting state, which is different from any state recognizing identifiers. Label each accepting state accordingly.
Identifiers and keywords are defined as follows:
identifier −→ (_ | letter)(_ | letter | digit)∗ letter −→ a|b|···z
digit −→ 0|1|···9 keyword −→ if | in | int
That means, 0a is an invalid token whereas int1 and i are identifiers.
For simplicity, when labelling transitions, you are allowed to use a notation for ranges and comple-
ments, e.g., a..d for a,b,c,d and ̸=7 for any character (in our alphabet) different from 7. Solution: 3pt for each correctly accepting state
3. (20pt) Assume you have a C++ source file which you edit using the BBEdit editor (barebones.com). You need to replace all comments in the /* … */ style that span a single line with the same comment in the // … \n style. This can be done using the Find/Replace Grep feature. You are required to provide the two regular expressions corresponding to Find and Replace boxes. Comments spanning more than one line should not be affected. You are not required to handle strings containing comment-like strings, that is, such pseudo-comments would be replaced as well.
1

You will need to use the backreference feature: subpatterns captured by Find can be identified by enclosing them in parentheses, (…), and then used in Replace with \1,\2,… in the order they appear in Find.
Solution: 10pt for each correct Find/Replace reg. ex. Find: /\*(.*)\*/
Replace: //\1\n
4. (25pt) Consider the following grammar G (nonterminals are upper case, the rest are terminals):
1. P −→ E$$ 2. E −→ TL 3. L −→ +TL 4. L −→ ε
5. T −→ FM 6. M −→ *FM 7. M −→ ε
8. F −→ (E) 9. F −→ a
10. F −→ b 11. F −→ 0 12. F −→ 1
(a) (5pt) Show the parse tree for the input: (a+b)*0$$.
(b) (10pt) Compute first(X) and follow(X) sets for all nonterminals X (Syntax: slide 49). Use the algorithm given below and indicate, for each symbol added to a set, the pair (step, prod), where 1 ≤ step ≤ 3 is the step in the algorithm (showed as , in the algorithm) and 1 ≤ prod ≤ 12 is the production involved. (You can use jflap to check your calculations but have to show that you know how to compute the sets.)
(c) (10pt) Prove that G is LL(1). You need to use the definition of LL(1) (Syntax: slide 73).
1,
2
2
3

1
2
3
Solution:
(a) (5pt) 5pt for entire correct tree – 1pt subtracted for each error
3

(b) (10pt) 2pt for each nonterminal other than P , and only if the (step, prod) is correct, otherwise 0 as it is from jflap (in some cases, there is more than one correct production; any correct one receives full marks)
X P E T L F M
first(X )
(,a,b,0,1 : (1,1)
(,a,b,0,1 : (1,2)
(,a,b,0,1 : (1,5)
+ : (1,3)
( : (1,8),a : (1,9),b : (1,10),0 : (1,11),1 : (1,12) * : (1,6)
follow(X )
$$ : (2,1),) : (2,8)
+ : (2,2),$$,) : (3,3) $$,) : (3,2)
* : (2,5),+,$$,) : (3,5) +,$$,) : (3,5)
2pt for F-productions, 4pt for L-productions, 4pt for M-productions have first(+T L) ∩ follow(L) = {+} ∩ {$$, )}∅. For the two M -productions (6-7), we have
first(*F M ) ∩ follow(M ) = {*} ∩ {+, $$, )}∅. 5. (25pt) Given the grammar G:
(c) (10pt)
All F-productions (8-12) start with different terminals. For the two L-productions (3-4), we
S −→ S −→ E −→ E −→ E −→
id=E $$ E
E +id
E *id id
(a) (5pt) Is G LL(1)? Explain your answer.
(b) (10pt) Modify G by eliminating left-recursion on E. Is the new grammar, G′, LL(1)?
(c) (10pt) Modify G′ to become LL(1) by removing common prefixes. Solution:
(a) (5pt) 5pt for correct argument G is not LL(1) because id ∈ first(E), so there is a problem with the two S-productions. (Or E-productions can be used as well – left recursion).
4

READ ME!
This grammar is LL(1) because follow(T) = follow(X) = {$$} and $$ does not belong to any first set.
Submit all your answers as a single pdf file on owl.uwo.ca. Solutions should be typed but high quality hand written solutions are acceptable. (Source code, if required, is submitted as separate files.)
You are allowed to use jflap (jflap.org) for your solutions, whenever possible.
The solutions should provide sufficient detail to support your claim. A simple yes/no answer without any supporting arguments will not be given any marks.
Self-reported absences can be used to extend the deadline by 48h. In this case, you must label “SELF-REPORTED ABSENCE” clearly at the top of the assignment and submit it by email to cs3342@uwo.ca instead of OWL.
(b) (10pt) 5pt for correct left-recursion elimination, 5pt for correct arguments G′, shown below, is still not LL(1) because the problem mentioned above in G has not been affected by the change.
id=idX $$ idX
+idX *idX
ε
And then eliminate the common prefix of the S-productions:
S −→ S −→ E −→ X −→ X −→ X −→
id=E $$ E
idX +idX *idX
ε
(c) (10pt)
There are no common prefixes to remove, but the grammar is still not LL(1). So, we need to think. The grammar is not LL(1) because of the conflict between the S-productions. There is no obvious common prefix, but a hidden one: the id that we can generate as S =⇒ id=E$$ as well as S =⇒ E =⇒ idX. This is actually making the grammar not LL(1). If we eliminate E (there is only one E-production), then we have common prefixes:
5pt for correct construction, 5pt for correct arguments
S −→
S −→
T −→
T −→ X −→ X −→ X −→
S −→ X −→ X −→ X −→
5
idT $$ =idX X +idX *idX ε