窗体顶端
窗体底端
Last updated: Mon 8 Apr 2019 11:16:58 AEST.
COMP4403/COMP7402 – Compilers and Interpreters
Assignment 2
Modify the assignment 2 compiler for the language PL0 (provided on the course web page with this assignment) to add record (struct) and pointer types.
Assignment Compiler files
All sources for the assignment PL0 compiler are available as a2.zip (below). Please be sure to use the version for this assignment and not the one used for the tutorials or another assignment. There are differences (like the lexical tokens you need for this assignment are only defined in this version).
• a2.zip Save this .zip file and follow the instructions for setting up a compiler project in IntelliJ
• Setting up and running PL0 in IntellliJ
• Brief documentation on assignment 2 files
Please ensure you follow the course Piazza bulletin board for any updates and further information on the assignment.
• Many people for assignment 1 imported arbitrary external libraries (that weren’t actually used). Such imports led to the compilation failing in my test environment that does not include those libraries. Please check you aren’t importing external libraries before submitted. The only external imports allowed are from java.util.*
• You must only modify the files that must be submitted (see below).
• You must not modify any other files because we will be testing your implementation using the existing other files with your submitted files.
• Please do not reformat the files because we would like to just print the differences between the originals and the versions you hand in.
• Please keep the length of lines in your files below 100 characters, so that we can print them sensibly.
• Please avoid using non-standard characters, e.g. Chinese characters, including in the comments. Non-standard characters are not accepted by some Java compilers and all comments should be readable by the person assessing your assignment.
• Your implementation should be in Java 1.8. Set the IntelliJ preferences for the Java compiler to 1.8 (or use the “-source 1.8” option to the Java compiler).
• Please remove any debugging output before your assignment is submitted because debugging output will cause your program to fail our automated testing of your assignment.
• Either avoid using tabs or set your tabs stops to 4 spaces (this is the default for IntelliJ) so that your files will print sensibly.
Read the fine print in detail before you start! And, most important, when you have finished implementing the assignment, come back and reread the fine print again.
Records and pointers
Records are similar to C structs (and very simple Java classes). They only have fields (no methods). As well as records, we have a pointer type. Unlike Java classes, we distinguish between a record object and a pointer to that object.
Records are accessed via value (rather than reference), i.e., assignment of records copies all fields rather than assigning a reference. However, for a variable of type pointer to a record we only assign the pointer.
Record and pointer types may be declared in a type definition, for example, we can declare List as the type of a pointer to a value of type Node. The type Node is a record that contains two fields: x of type integer, and next of type List (i.e., pointer to a Node).
type
Node = record x: int; next: List end;
List = ^ Node;
For example, one may declare variables r and r2 to be of a record type, and p and q to be pointers as follows.
var r: Node; r2: Node; p : List; q: List;
A pointer value may be assigned a dynamically allocated object of its base type via a new expression.
p := new List
Fields of a record r may be assigned appropriate values:
r.x := 2; r.next := p
and the values of the fields may be accessed:
write r.x
Assignment of whole records is also allowed, e.g.,
r2 := r
assigns both fields of r2 the corresponding fields of r.
A record constructor allows a value of record type to be built, e.g., the following assignment to r is equivalent to the two assignments to the fields of r above.
r := Node{ 2, p }
A record constructor consists of the type identifier of a record followed by a list of expressions in braces that are assigned to the fields of the record. The order of the expressions corresponds to the order of declaration of the fields in the record type.
The value pointed to by a pointer is referenced by p^ and the fields of that object by p^.x and p^.next. For example, we may set the x field of the object pointed to by p to be 34 via the assignment
p^.x := 34
and we may dynamically allocate a new record object and set the next field of the record pointed to by p to point to that object by
p^.next := new List
Note that a new List expression can be used anywhere an expression of type List is expected. Arbitrary chains of dereferencing are allowed. For example, p^.next^.next references the next field of the object pointed to by the next field of the object pointed to by p. Such references may be used either on the left side of an assignment or on the right side within an expression where a value of that type (List in the example) is expected. For example, we may increment the field x of the object p^.next by the statement:
p^.next^.x := p^.next^.x + 1
Pointers may be directly assigned. For example, after the assignment
q := p
both q and p both point to the same object. If we then perform the assignment
q^.next := nil
where nil is the special pointer constant for a null pointer, then because both q and p point to the same object, the value of p^.next will also be nil. Pointers may also be compared (but only for equality and inequality), so after the above assignment the comparison
if p^.next = nil then …
will return true and the if statement will take the then branch.
Syntax Changes
The reserved keywords “record” and “new” have already been added to the lexical analyser as the tokens KW_RECORD and KW_NEW, and the symbols “{“, “}”, “.” and “^” have been added as tokens LCURLY, RCURLY, PERIOD and POINTER. They have also been added to the terminals definitions in PL0.cup.
The syntax for record and pointer type definitions follows. It is given in EBNF, but you won’t be able to use the EBNF directly in the parser generator Java-CUP.
Type → SubrangeType | TypeIdentifier | RecordType | PointerType
RecordType → “record” Fields “end”
Fields → Field { “;” Field }
Field → IDENTIFIER “:” TypeIdentifier
PointerType → “^” TypeIdentifier
A reference to a field of a record can be used as an LValue either within an expression or on the left side of an assignment. A pointer dereference can also be used as an LValue. A “new” expression or a record expression can be used as a Factor in an expression.
LValue → IDENTIFIER | LValue “.” IDENTIFIER | LValue “^”
Factor → … | “new” TypeIdentifier | TypeIdentifier “{” ExpList “}”
ExpList → Condition { “,” Condition }
Note that a field of a record may itself be of type record. Hence the above syntax has an LValue before the “.” rather than an IDENTIFIER and similarly there is an LValue before “^”.
The Parser Generator Java-CUP
The parser specification for the compiler is specified in the file PL0.cup. You will need to add productions (and their associated actions) to the specification and then run the parser generator Java-CUP (using run configuration “Run CUP”) to generate the files CUPParser.java and CUPToken.java. Do not modify these two Java files directly (even if you think you understand them (do you really?)) – remake them from PL0.cup. You should make the compiler before you change anything just to see what forms of messages to expect. When you make the compiler (before you modify it) there will be some warning messages about the terminal symbols like ILLEGAL being declared but never used; these are to be expected at this stage. Any new warning/error messages will be significant. There is HTML documentation for Java-CUP available from the class web page (with the assignment).
The Scanner Generator JFlex
All the lexical tokens for this version of the compiler have already been added to the lexical analyser.
The file Lexer.java is automatically generated by the scanner generator JFlex from PL0.flex; again, do not modify Lexer.java – remake Lexer.java from PL0.flex.
Both Java-CUP and JFlex are available with the assignment files on the course web page, with instructions on how to run them in IntelliJ. Java archives for Java-CUP and JFlex are part of the IntelliJ project for the assignment.
Static Semantic Restrictions
Records
The names of fields of a record must be distinct.
All of the type identifiers used for fields must be defined type identifiers.
A field of a record may be of any type, including a pointer or record type but may not include the same record type as being declared either directly or indirectly, e.g. the following is invalid both because R directly includes R and because R indirectly includes R, via Q.
type R = record x : int; p : Q; next: R end;
Q = record y : int; q : R end;
The rule for well formedness of a record type is given below.
∀ i,j ∈ 1..n • i ≠ j ⇒ idi ≠ idj
∀ j ∈ 1..n • syms ⊢ typeof(tj) = Tj
syms ⊢ typeof(record id1: t1; id2: t2; … idn: tn end) = RecordType([ (id1, T1), (id2, T2), … (idn, Tn) ])
A reference to a field of a record (e.g., r.x) consists of an LValue, (e.g., r), which must be a record and a field name, (e.g., x), which must be one of the fields of that record. Note that an LValue may appear on either the left or right side of an assignment. Hence the two uses of ref in the following rule.
j ∈ 1..n
syms ⊢ e: ref(RecordType([ (id1, T1), (id2, T2), … (idn, Tn) ]))
syms ⊢ e.idj: ref(Tj)
For example,
syms ⊢ r: ref(RecordType([ (x, int), (b, boolean) ]))
syms ⊢ r.x : ref(int)
For a record constructor, its type identifier must be a record type and it gives the type of the constructed record. The expressions in the type constructor must match the fields of the record in the order in which they were declared in the record type declaration. Each expression must be assignment compatible with its corresponding field.
id ∈ dom(syms)
syms(id) = TypeEntry(RecordType([ (id1, T1), (id2, T2), … (idn, Tn) ]))
∀ j ∈ 1..n • syms ⊢ ej : Tj
syms ⊢ id{ e1, e2, … en } : RecordType([ (id1, T1), (id2, T2), … (idn, Tn) ])
Assignments between records are allowed, provided the left and right sides of the assignment are of the same type. Two types are considered equivalent if they are the same type identifier, or if one is a type identifier, B, that is defined as equal to another type identifier, A, i.e.,
type
A = record x: int; y: int end;
B = A;
[This makes the implementation simpler than doing structural equivalence of records.]
Comparison of records is not supported.
Pointers
The identifier in the pointer type definition must be a type identifier. It can be of any type (not necessarily a record type as in the example).
The rule for well formedness of a pointer type is given below.
syms ⊢ typeof(t) = T
syms ⊢ typeof(^t) = PointerType(T)
To be dereferenced (using “^”) an LValue, p, must be of type ref(PointerType(T)), for some type T. The type of the result of the pointer dereference is ref(T).
syms ⊢ e : ref(PointerType(T))
syms ⊢ e^ : ref(T)
The identifier in a new expression must be a pointer type. Hence, for a type identifier id,
id ∈ dom(syms)
syms(id) = TypeEntry(PointerType(B))
syms ⊢ (new id) : PointerType(B)
Assignment of pointer values and comparison of pointer values, for equality and inequality only, is allowed provided the types are equivalent. If T is declared as a pointer type (i.e., PointerType(B) for some type B) then the equality and inequality operators have the following additional types.
= : T x T → boolean
≠ : T x T → boolean
An exception here is the special constant nil which is a pointer value that is compatible with all pointer types. The constant nil has already been defined. It is of the special pointer type NIL_TYPE, which is compatible with any pointer type. Hence for any type identifier id,
id ∈ dom(syms)
syms(id) = TypeEntry(PointerType(B))
syms ⊢ nil : PointerType(B)
Two pointer types are considered equivalent if their base types are equivalent. e.g.,
type
C = ^A; D = C; E = ^A;
Types C, D and E are all compatible.
Note that Type.java already includes record and pointer types.
Dynamic Semantics
Records
A record must be allocated enough space to contain values for all its fields, either on the stack if it is a local variable, or on the heap if it is dynamically allocated using new.
A field reference, r.x, acts like a variable of the type of the field and hence can be used on either the left or right sides of assignments.
For the record assignment
r := p^
all the fields of r are assigned the corresponding fields of the record pointed to by p.
Because we are copying values and not references, all the fields need to be assigned (see the Object Code section for a way to do this simply).
Pointers
The dynamic semantics of the new expression and of accessing a pointer are conventional.
If p is of type ref(PointerType(T)) then a pointer dereference, p^, corresponds to the object (of type ref(T)) pointed to by the pointer p. If the value of the pointer p being dereferenced is nil the dereference is invalid (a run time error).
The expression new List dynamically allocates space on the heap for an object of the type pointed to by List and returns the address of that new object. Objects dynamically allocated via a new have a life time that is not restricted to that of the variable via which they were allocated. For example, an object may be created within a procedure using a local variable within the procedure to perform the new. Provided that variable’s value (the pointer to the new object) is assigned to a variable or field that is accessible via variables global to the procedure, before the procedure exits, the object will remain accessible.
Although we dynamically allocate objects via the new statement, we won’t implement garbage collection of objects that are no longer accessible.
Object Code
The PL0 compiler generates code for the Stack Machine. A description of the stack machine is available in Inst.pdf. See also the file StackMachine.java (near the end) for details of the instruction set.
Records
To implement record assignments the stack machine includes the LOAD_MULTI and STORE_MULTI instructions, both of which take a count of the number of words to load from or store to an address.
• the source address for loading (or target address for storing) is in second top of stack, and
• the number of words to be loaded (or stored) is in the top of the stack.
Pointers
There is an instruction, ALLOC_HEAP, which assumes that there is an integer on the top of the stack that represents the size of the object it is to allocate (new). It pops that value from the stack and replaces it with the absolute address of a newly allocated object of that size. The stack machine does not support disposing objects or garbage collection.
If there is insufficient space then ALLOC_HEAP will fail with a “memory overflow” message. In the implementation of the stack machine there is a single array representing the whole of memory: the stack (the bottom end of the array) and the heap of dynamically allocated objects (the top end). If either pushing onto the stack reaches the lower limit of the heap, or allocating on the heap reaches the top of the stack, then there is insufficient memory and the program aborts (with the same error message in both cases).
As the instructions LOAD_FRAME and STORE_FRAME expect an address that is an offset from the frame pointer for the current (procedure) scope, you will need to use the instruction TO_LOCAL which converts an absolute address (the pointer value) into an address relative to the current frame pointer.
The STOP instruction with a parameter of StackMachine.NIL_POINTER indicates a nil pointer dereference error.
Student Misconduct
Students are reminded of the University’s policy on student misconduct, including plagiarism. See the course profile and the School web page http://www.itee.uq.edu.au/itee-student-misconduct-including-plagiarism
You are expected to protect your files so that they are not readable by other users.
You are also reminded not to post your (partial) solutions to assignments to any place accessible by others, including the bulletin board or emailing to other students. If you need that sort of help consult the lecturer and/or tutor. Note that Piazza allows private posts to the instructors.
Late Submission
A penalty of 5% of the maximum mark for an assignment will be deducted for each day or part thereof late up to a limit of 7 days, after which time assignments will not be accepted for assessment unless you have been granted an extension.
As we plan to hand back assignments a week or two after submission, requests for extension will not be accepted more than one week late, unless there are exceptional circumstances.
Requests for extensions should be accompanied by suitable documentation (see the course profile for details).
Personal hardware or computer failures are not grounds for extension. Assignments must be backed up on the university system.
Submission
Please keep the length of lines in your files below 100 characters, so that we can print them sensibly. You should avoid using tabs or set your tabs stops to 4 spaces so that when we print them (with tab stops set to 4 spaces) they will print sensibly. Don’t forget to remove any code generating debugging output and rogue external imports before submission.
You must submit your completed assignment electronically through the assessment section of the course BlackBoard site (the BlackBoard Assessment page rather than the course web pages).
You need to submit the following list of individual files, not a .zip file (note that file names are case-sensitive) for evaluation and marking.
• PL0.cup
• StaticChecker.java
• CodeGenerator.java
• ExpNode.java
• ExpTransform.java
You can submit your assignment multiple times, but only the last copy submitted will be retained for marking.
Assessment
The assignment is marked out of a total of 20 marks.
Marks will be allocated as follows:
• 6 Syntax analysis and tree building
• 9 Static semantics checking
• 5 Code generation
Marks will be awarded for the correctness of the changes to each category. Readability and modular structure will also be criteria. We will not be concerned with the quality of syntactic error recovery because the parser generator CUP takes care of that for the most part, but you must handle semantic errors appropriately, including handling the situation when there is a syntax error, i.e., your compiler should not crash because of a syntax error.
Your assignment files will be compiled in the context of the remaining assignment files and put through a sequence of tests. The total mark for this assignment will be limited by the overall success of the development in the following way:
• The program submitted does not compile: Maximum 10/20.
• The program submitted will not correctly handle any test case with the new facilities: Maximum 13/20.
You are not required to correct any bugs that may exist in the original compiler. However, we would appreciate being informed of any such bugs that you detect, so that we can correct them, or any improvements to the compiler you might like to suggest.
窗体顶端
窗体底端