CS 352 Fall 2020 Compiler Project 2.1 Posted Oct. 3, 2020 October 18th, 2020, 11:59pm
1. Milestones in Project 2
To ensure that students make timely progress, we divide project 2 into three parts, setting three milestones (or due dates). The scope of the project increases from one milestone to the next, eventually covering the entire scope of MiniJava+. No “sample solution” will be provided for any of the milestones, except that a code skeleton will be provided as a start point for the first part of Project 2. Hence, if any language feature is missed in a submission, the student must still complete it as he or she moves to meet the next milestone, as such a feature may still have an impact on the test cases in a later.
This project is to be done by each student individually. Review the Academic Honesty Policy posted on Piazza carefully. In order to ensure timely progress, the project will have a few milestones (i.e. progressive deadlines). The due date/time for each milestone will be firm, except for the grace days allotted to each student (see the beginning note in Project 1 handout about the grace days).
Students are reminded that, with the scale of 100 for all projects this semester, Project 2 (including all parts) counts 35, of which the first part (P2.1) counts 5, the 2nd (P2.2) counts 10, and the last part (P2.3) counts 20.
We may release the specifications for P2.2 and P2.3 well before an earlier part is due. This will allow students who finish the earlier part before it is due to start working on the next part, and it gives students a better opportunity to write their code with the later parts in mind.
2. Overview of Project 2
This section gives an overview of the tasks in Project 2. Minute details will be clarified on Piazza as various questions may be posed by students.
2.1 The Main Tasks
Project 2 consists of three main components:
• Add semantic actions to the parser built in Project 1 to construct an abstract syntax tree (AST) and a symbol table for the input program.
• Traverse the AST to perform type checking and report type rule violations.
• Interpret the program execution if no type rule violations are found. The test cases for program interpretation are supposed to cause no exceptions. The error message must be printed exactly as specified, except that 2.5 A Special Note on Requirements versus Assumptions about Test Cases 1) No offline submission (such as email) is accepted. Hint: The program features covered by P2.1 are much less than the entire MiniJava+. The student may consider include just enough production rules to cover the required features for P2.1 to reduce the chance of parsing conflicts. Similar bonus points may be offered in the follow-up parts of Project 2 as the number of required rules increases. We note that Chapter 15 often refers to earlier sections, e.g. variable reference issues and types of variables (Chapter 4). Students must follow references that are relevant to Project 2. Keep in mind however, only a very small part of the referenced sections is relevant to MiniJava+. Also, MiniJava+ does not permit any type conversion or type casting, except the widening and narrowing of class references between a class and its extensions. A2. Handling Declarations
• If an operation is found to violate a type rule, then an error message must be printed. The type of the operation result becomes undefined. Any operation that has an undefined operand will not be reported as a type violation. NOTE: This prevents excessive error messages due to propagation of undefined types through a sequence of intermediate results.
The type checking part and the interpretation part will carry approximately the same weight in grading. There will be no partial credits for AST construction by itself, because there are no effective ways to check how good the AST is constructed without examining the type checking and/or interpretation results.
As discussed in the lectures, the production rules in your Yacc program need to be written in such a way that, when parsed by Yacc generated parser, the AST can be constructed correctly by the semantic actions according to Java semantic rules. Hence you may need to revise some of the production rules written for Project 1, especially those for arithmetic and logical operations. For projects 2 and 3, you are allowed to use Yacc defaults and Yacc operator precedence directives such as %left and %right in case your grammar causes parsing conflicts, but it is your responsibility to ensure that the AST reflects correct operation execution order according to Java semantics.
Note that although Yacc resolves parsing conflicts according to the specified operator precedence, it will still give warnings about the existence of such conflicts. To encourage students to apply the ideas discussed in the lectures to minimize parsing conflicts, bonus points will be granted for minimizing such conflicts. This will be explained in more details later in this document.
2.2 Type Checking
All test cases are supposed to meet the MiniJava+ syntax form posted for Project 1. Hence the only errors in the test cases will be the violations of the type rules. Your code generated by Yacc must print error messages to report type rule violations found in the input MiniJava+ program, using the format to be discussed later in this document. Such error messages must be written to the standard error file (stderr). If and only if a program has no type errors found, the program is interpreted and the output specified in the program (through the MiniJava+ printing statements) must be written to the standard output file (stdout). Your code generated by Yacc must not produce any output except those specified above. Hence, if your yacc program has additional printing statements added to yyerror() for debugging purpose, you must remove them before submitting the program.
The semantics of a MiniJava+ follows that in Java standard. Students should carefully read Java type rules (as specified in https://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.2 ).
Each error message must be printed on a separate line, in the form of
Type Violation in Line
NOTE: Many users have found it highly useful to track the line and column numbers using YYLTYPE as in an online document. (https://www.gnu.org/software/bison/manual/html_node/Tracking- Locations.html). Its explanation in the subsection “Location Default Action” can be modified to detect the number of lines in a multiline comments. Note that from your yacc file you can access the contents of the YYLTYPE by using “@$, @1, @2, …” in the same way “$$, $1, $2, …” are accessed.
2.3 Program Interpretation
The main purpose of implementing interpretation for this project is to ensure that your AST is constructed properly, which is also important to Project 3 which generates assembly code. Interpretation provides a way to debug AST at source level, which is much easier than at assembly level.
The general idea for interpreting a source code is to use the AST nodes to store the intermediate operation results in an expression and to use the symbol table to update the values of variables when they are re-assigned. In the introduction we already discussed how to interpret println and print.
2.4 Java Standard Classes and Methods in This Project
System.out.println, System.out.print and Integer.parseInt are standard Java methods. It is up to the students whether to treat these method name as special tokens (i.e. three reserved words) or store the class names System.out and Integer, as well as the method names println, print, and parseInt in the symbol table, marked as standard methods with predefined operations.
It is expected that System.out.println and System.out.print are both implemented, when an input program is interpreted, by calling C printf standard function, with %d format for integer and Boolean values and %s for string literals. No additional white spaces are to be printed. If the student chooses to implement the project in C++, an analogous C++ printing function should be called. For the execution semantics of this method, see standard description through this link https://docs.oracle.com/javase/8/docs/api/java/io/PrintStream.html
Integer.parseInt converts a string to an integer, e.g. Integer.parseInt(a[0]) in the sample program factorial.java. It is expected that this method to be implemented by calling C atoi() standard function. If the student chooses to implement the project in C++, an analogous C++ printing function should be called. For the execution semantics of this method, see standard description through this link https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#parseInt-java.lang.String-
In the project descriptions this semester, when we say “so-and-so is not allowed …” or “so-and-so is illegal …”, then your compiler must be able to detect any occurrence of so-and-so and report it as an error. If, however, we say that “the input is assumed to contain no so-and-so …”, then you may assume that so-and-so will not appear in the test cases.
3. Scope of Part 2.1
It is very important that students start early on this project and check to see whether the progress meets the timeline expectation. For this project (namely P2.1), the student must complete the following:
1. Focus on a subset of MiniJava+ programs that
(i) contain the main class, but no other classes.
(ii) The main class will have the main method only.
(iii) The main method body will have just a sequence of assignment statements and the two printing statements mentioned in the syntax specification. The method body has one name scope only, i.e. no nested block of statements.
(iv) All variables and constant literals are of int type. All variables are simple scalar variables (i.e. no arrays and no objects, etc). All expressions specified in MiniJava+ syntax specification must be implemented as long as the operands in the expressions are of the types specified above.
2. Implement type checking to make sure a variable is declared before it is referenced. You may assume that all variable declarations will declare variables to be of int type. However, both int expressions and string constant literals may appear in the printing statements. NOTE: In the terminology of programming languages, variables and constants are special cases of expressions. No concatenation or type conversion of int expressions and string literals will appear in any part of the test cases.
3. Implement a symbol table for declared variables so you are able to see whether a variable has been declared when a reference is made. It will also be used for storing the current value of each variable during the interpretation (see 4 below). For this milestone, no AST is needed, but students should feel free to implement it because the next milestone will need it.
4. Implement an interpreter to perform the computation specified in MiniJava+ programs in the subset defined above. For this project, this can be done by semantic actions associated with each recognized expression and assignment. If the student builds the AST, the interpretation can also be performed by traversing the tree.
4. Submission instruction
2) Use the following command on CS lab machines that run any Linux OS to submit your homework.
turnin –c cs352 –p p2_1 [your working directory]
(Your home directories are directly accessible to you on all CS lab machines.)
Prior to executing the turnin command, it is mandatory for you to make sure the contents in the submitted directory meet the requirement specified in Project 1.
3) You are free to write your own Make file, as long as the following requirements are met.
The grader will run your Makefile by executing the command “make interpret” to produce the executable program named interpret. Hence, it is important that your Makefile produces such an executable when invoked.
You must also write your Makefile such that when “make clean” is run, all files, except the Makefile, your Yacc and lex programs, and the .h and .c files (or the C++ equivalents), are all removed. This is to ensure that the system does not exceed the storage limit set by the system staff.
4) Your program MUST compile and run without any error on CS lab’s Linux machines. Please do a final test of this before submission, especially if you do your project on other computers.
5) Your code will be tested for grading by the following command;
> ./interpret program_name
Where “program_name” is the file name containing the input program.
NOTE: Deviation from the above requirement will get a 10% penalty. (For example, if your code receives 90% of the total points, the final score on the Brightspace for P2.1 will be 80%.)
5. Discussion Online
The policies for online discussion and information sharing concerning this project are the same as in Project 1 under the section titled “Discussion Online”
6. Bonus Points for Removing Parsing Conflicts
The bonus points received for removing parsing conflicts depend on the thoroughness of such removal. The maximum bonus points, 5% of the total points, are granted if no parsing conflicts are reported by Yacc. With each remaining parsing conflict reported, the bonus points received is decreased by 1 percentage point, i.e. 4% bonus points if one conflict remaining.
APPENDIX A Guideline for Type Checking and Interpretation
This appendix discusses type checking and interpretation for Project 2, not just for its first milestone.
The authoritative document for type and run-time evaluation rules for our project is Java Language Specification (Java SE 7bEdition, abbreviated as JSE7 in the rest of this document). This document can be found at https://docs.oracle.com/javase/specs/jls/se7/html/index.html
This is a large document, but only a small part concerns our Project 2, mainly because MiniJava+ is a small subset of Java. The limited syntax form of MiniJava+ dictates that its type rules are also limited. Students are expected to have learned the relevant type rules in the introductory Java programming class (CS180), but the JSE7 document formalizes the rules and serves as the final authority for correctness of type checking and program interpretation in Project 2.
It is always a good idea to test simplest forms of statements and expressions first, then increment the level of complexity of test cases. Students are encouraged to discuss and share test cases on Piazza, but the teaching staff will not comment on or endorse any opinions unless the discussion involves project requirements which are legitimately perceived as lacking sufficient clarity.
Once students gain confidence with the ASTs constructed for a subset of MiniJava+, the type checking implementation can proceed for that subset. Following the same idea, subsets of increased complexity are defined, and program interpretation assuming the absence of type errors is conducted to test the expanded ASTs. Lastly, type checking is performed. This continues until the full set of MiniJava+ is covered.
In Project 2, we do not require handling of run time errors and we do not perform run time type checking.
The rest of the document focuses mainly on the type checking issues, with program interpretation mentioned when appropriate.
A1. Expressions
We suggest that students start by reviewing all forms of expressions in MiniJavaPlus and enumerate the type rules govern such expressions. The most important document concerning type rules and run time semantics for expressions is Chapter 15 of JSE7. This may seem to be a long chapter, but much of it covers the syntax rules, which we already covered in Project 1. Also, much of it concerns language features not in MiniJava+ and can be skipped. Descriptions of type rules and run time semantics are the focus of Project 2. For the latter, students should pay special attention to the evaluation order of expressions, argument lists, and statements. For Project 2, we do not handle the issue of abrupt termination of evaluation of expression (Section 15.6). This issue will be discussed in a lecture later this semester.
The following is an incomplete, but most important, list of type checking issues concerning expressions. We will leave lingering issues for the students to discover, as an opportunity to learn how to understand a language specification, such as that in JSE7. Take this also as a great opportunity to strengthen what is learned from an introductory Java programming course and become a better-equipped Java programmer.
A general discussion of types of expression is made in JSE7 Sections 15.3 and 15.5. We suggest the following order (from easiest to the most complicated) for implementation of type checking concerning expressions.
A1.1 Operations involving simple variables of primitive types
This part involves unary plus, unary minus, and unary complementary logical operation, as well as binary arithmetic, relational, conditional logical, equality and simple assignment operations. The type rules and evaluation rules are specified in JSE7 Sections 15.15.3, 15.15.4, 15.15.6, 15.17, 15.20, 15.21, 15.23, 15.24, 15.26.1.
Note that for comparison operations between reference variables, MiniJava+ has much simpler cases than the full Java, because we do not have casting operations. 1.2 Operations involving fields of objects and members of arrays.
Work continues on aforementioned expressions that involve references to fields in an object and members in an array. Array creation can also be implemented in this step. The type and evaluation rules governing field accesses are presented in JSE7 Section 15.11. A brief discussion of rules governing array member accesses and array creation can be found in JSE7, Chapter 10. In this course, we do not handle array bound checking at run time.
This part will account for 10% of effort and grade points for type checking and program interpretation, respectively.
A1.2 Operations involving fields of objects and members of arrays.
The type and evaluation rules governing field accesses are presented in JSE7 Section 15.11. A brief discussion of rules governing array member accesses and array creation can be found in JSE7, Chapter 10. In this course, we do not handle array bound checking at run time.
A1.3 Operations involving method invocations
JSE7 has an elaborated discussion on method invocation rules (Sections 15.9, 15.10, 15.12). This involves simple method invocation, argument/parameter mapping, constructor invocation, creation of arrays, etc. Section 15.12 is specifically devoted to discussions of compiler tasks and potential complications for handling method invocations.
All type information for MiniJava+ originates from variable declarations. To obtain type information, the subtrees for variable declarations are traversed and information is stored in the symbol table
During program interpretation, the values of simple variables and simple object fields may be stored in the symbol table also. The intermediate results are stored in AST internal nodes that represent subexpressions. A variable assignment causes a new value to be stored to the symbol table entry for the assigned variable. A reference to an operand causes the interpreter to fetch the variable value from the corresponding symbol table entry.
However, values of array members must be recorded in corresponding “shadow” arrays that are dynamically allocated when the MiniJava+ array is dynamically created by a new operation.
A2.1 Classes and Method Types
Recall that MiniJava+ has no type conversion and type casting, except for the widening and narrowing of class references between a class and its extensions. The rules governing class extensions are found in JSE7, Section 5.5.1.
The members of a class type are all of the following:
• Members inherited from its direct superclass (§8.1.4), except in class Object, which has no direct superclass
• Members declared in the body of the class (§8.1.6)
Keep in mind that MiniJava+ does not have the keyword package and it has only a single Java class source file. Rules governing class declarations are in JSE7, Chapter 8
The type of a method is described by an ordered 3-tuple consisting of:
• Argument types: a list of the types of the arguments to the method.
• Return type: the return type of the method member.
• Throws clause (not in MiniJava+): exception types declared in the throws clause of the method member.
Fields, methods, and member types of a class type may have the same name, since they are used in different contexts and are disambiguated by different lookup procedures (JSE7, §6.5). However, this is discouraged as a matter of style.
We will discuss method invocations, including object constructors, in the lecture.