Table of Contents
1 Introduction …………………………………………………………………………………………………………………….. 2
1.1 Goal of the Project………………………………………………………………………………………………………4
1.2 Project Bundle …………………………………………………………………………………………………………… 5
2 First Part: The Sigma Calculus …………………………………………………………………………………………. 7
2.1 Abstract Syntax………………………………………………………………………………………………………….. 7
2.2 Interpreter …………………………………………………………………………………………………………………. 8
3 Second Part: An Extended Source Language …………………………………………………………………… 11
3.1 Abstract Syntax………………………………………………………………………………………………………… 11 3.2 Classes ……………………………………………………………………………………………………………………. 13 3.3 Single Inheritance …………………………………………………………………………………………………….. 15
4 Third Part: Free Extensions and Improvements ………………………………………………………………. 18
4.1 Some pointers ………………………………………………………………………………………………………….. 18
5 Guidelines ………………………………………………………………………………………………………………………. 19
5.1 Final Report …………………………………………………………………………………………………………….. 19 5.2 Advice …………………………………………………………………………………………………………………….. 20 5.3 Submission………………………………………………………………………………………………………………. 20
1 Introduction
In this project we will explore the design of a simple Object-Oriented Language. The language is based on the Sigma calculus, which is a core OOP language described by Abadi and Cardelli in the book:
A Theory of Objects
Unfortunatelly the book is not freely available online, but there are a few sets of slides available with information about the Sigma calculus. For example:
Slides for A Theory of Objects
In part two of the slides above a technical description of the Sigma calculus is given. Interested students can have a look for further reference, but the project bundle should be self-contained and explain the major concepts involved.
The Sigma calculus and the Theory of Objects play a similar role to the lambda calculus for Object-Oriented Languages. It aims to capture the key concepts in Object-Oriented Programming in a minimal language. In the Sigma calculus, there are 4 basic constructs:
1. Variables
2. Objects
3. Method Calls
4. Method Updates
The first 3 constructs should be familiar to anyone that has used an OOP language before. The 4th construct is related to method overriding, but it is more significantly more powerful.
Variables are just like the variables we covered in the course and that appear in essentially all programming languages.
Objects are collections of methods. An important point of the Sigma calculus is that classes are not needed to build objects: instead objects can be built directly. In the base Sigma calculus there is no new construct, which is what languages like Java use to create an object from a class. In technical terms the Sigma calculus is an object-based language, rather than a class based language. As a side note the Javascript OOP-model is also object-based, though in recent versions of JavaScript classes have been added.
We will also add classes to our language later. But for now we focus on the basic Sigma calculus.
Objects are built directly. The following expression gives a first example of objects in the Sigma calculus:
2
[l1 = {this} this, l2 = {this} 0]
In this example, we create an object. The syntax for objects is minimalistic, and it is basically a collection of components enclosed by square brackets. Each component starts with a label (the name of the method or field) and has a method definition.
An important point about the Sigma calculus is that method definitions take a special argument, which is basically the self-reference to the object. For instance the method definition for l1 is {this} this. The first part ({this}) introduces a variable called this, which is the self- reference. The this in the Sigma calculus plays a similar role to this in languages like Java. But the Sigma calculus is different from those languages in that:
• All methods have to explicitly introduce a variable for the self-reference; In contrast in languages like Java, this is a keyword and it is implicitly available in a class.
• You can name the self-reference with any name. For instance, we could have written the same program above as:
[l1 = {x} x, l2 = {x} 0]
In both examples, the first method (l1), returns the object itself. The second method l2 returns 0. The two versions of the program behave in exactly the same way. The only difference is syntactic: in one program the self-reference is called this, while in the other it is called x (note that you could even choose different names for each method).
Method calls, allow calling a method in an object, and they are similar to method calls in standard OOP languages. A simple example that illustrates methods calls is:
[l1 = {x} x, l2 = {x} 0].l1.l2
In this example, right after the creation of the object, we call methods l1 and l2. Since l1 just returns the object itself, the subsequent method method call to l2 will just return
0.
Another example is a small variant of the earlier program:
[l1 = {this} this, l2 = {this} 0, l3 = {this} this.l2]
In this program the method l3 calls l2 via the self reference this with the call expression this.l2.
The most unusual construct is the method update construct. A method update allows replacing an existing method implementation in an object by another one. Some dynamically typed
3
languages, offer similar functionality, but most statically typed OOP languages do not allow this. An example of a method update is:
[ contents = {x} 0, set = {this} this.contents <~ {y} 10]
In this code, the set method, when executed, will replace the implementation of contents by
{y} 10. Thus, if you execute:
[ contents = {x} 0, set = {this} this.contents <~ {y}
10].set.contents
You should get 10. Note that in the example, the self reference variables x and y are unused. In our concrete syntax, we can omit unused self-reference variables. Thus, you can write the code as:
[ contents = 0, set = {this} this.contents <~ 10]
Our parser will translate it to the correct form. Notice also, how the code above looks like mutable update of contents in the method set. This is not a mere coincidence: the method update construct generalizes assignment and mutable updates.
1.1 Goal of the Project
As part of the project we already provide an implementation of an interpreter for the Sigma calculus.
The project will consist of three main parts:
1. To extend the basic interpreter for the Sigma calculus with more constructs, such as more types of data, and local variable declarations;
2. To create a more complete OOP source language that “compiles” into our core Sigma calculus.
3. To allow for creative extensions where students are free to add their own features and improvements.
The first two parts of the project will basically account for 75% of the marks, while the 3rd part will account for the remaining 25%.
4
Note that the parser for the language (as well as large parts of the interpreter) are already implemented for you. You don’t need to extend the parser yourself, although for part 3 of the project, if you wish, you could try to extend the syntax of the language as well.
1.2 Project Bundle
For this project there is a bundle of files that can be downloaded from Moodle.
As assignments/tutorials, we already provide a stack project with full configuration. The structure of fold is:
-- app
|-- Main.hs
-- src
|-- Declare.hs
|-- Parser.hs
|-- Tokens.hs
|-- Target.hs
|-- Source.hs
|-- Testcase.hs
-- test
-- testcases
|-- (some examples)
In app folder, there is only one file Main.hs, for calling the functions in src folder. The src folder is the core.
Tokens.hs contains all tokens we might use, generated from Tokens.x. Parser.hs is the parser of concrete syntax, generated from Parser.y. Declare.hs contains all the definition of abstract syntax with a pretty printer. Target.hs contains a partial implementation of the Sigma Calculus.
Source.hs is the file that contains the source language. Testcase.hscontains some unit tests that help you check your program.
5
The test/testcaes folder provides some examples help you check the program. Feel free to add more tests.
There are several useful commands you might need to know:
stack clean delete the executable file and all the object files from the directory. stack build compile the whole project.
stack run -- (filename) load the file from test folder and run it. For example, if you want to run the file cell.obj in test/testcases, you should type stack run - testcases/cell.obj when your directory is “.../bundle”. stack test run the unit tests from the comments.
6
2 FirstPart:TheSigmaCalculus
In the bundle you will find an implementation of an interpreter for the Sigma calculus in the file Target.hs. The definitions of the syntax and other definitions can be found in the file Declare.hs.
2.1 Abstract Syntax
We start with the abstract syntax:
data Method = Method Var SigmaTerm deriving Eq
data SigmaTerm = SigmaVar Var | Object [(Label, Method)]
| Call SigmaTerm Label
| Update SigmaTerm Label Method
-- variables:
-- objects:
-- calls:
-- update:
x, y ...
[l = {this}
0]
x.l
x.l <~ {this}
0
| Let Var SigmaTerm SigmaTerm -- variable declarations: var x = 1; 2
| Lit Int -- numbers | Boolean Bool -- booleans | Binary BinaryOp SigmaTerm SigmaTerm-- binary op
| Unary UnaryOp SigmaTerm -- unary op
| If SigmaTerm SigmaTerm SigmaTerm -- conditionals if (x > 0) then 1 else 2 deriving Eq
The first 4 constructors correspond to the 4 language constructs of the Sigma calculus that were already presented in the introduction. In addition to those, we also have some other constructs that were covered in the classes: local variable declarations; integers, booleans, unary and binary operations and conditionals. For example, we support code such as
var x = 10; [ l = x + 10 ].l
If we evaluate this code, the result should be 20.
7
1,2, 100
true, false
1 + 2, 10 * 6
-(5), not true
2.2 Interpreter
The interpreter uses a standard environment that keeps track of local variables and the values associated with them. Moreover, it also uses a memory model, like the interpreter in Lecture 12.
The type Mem is a model of memory: type Object = [(Label,
MethodClosure)] type Mem = [Object]
It is used to store objects in memory. For this language we have to have memory, just as in the interpreter with mutable state, because objects are mutable. That is, the method update operation can actually modify methods stored in objects. Therefore the memory stores all the objects that are allocated in the program.
Notice that the objects stored in memory are essentially collections of method closures; each method has a name (the label) and a method closure. The method closure, like with function closures, stores the environment at the point of definition of the method.
We use a replace operation:
replace :: Int -> Label -> MethodClosure -> Mem -> Mem to update a method in an object stored in memory.
The values in the language are:
data Value = VInt Int | VBool Bool | ObjRef Int deriving Eq
We have 3 values: integers, booleans and object references. Whenever evaluation evaluates an expression that computes an “object” what is returned is not the object itself, but rather a reference to the location of the object in memory.
The type signature for the evaluator for the Sigma calculus is:
evaluateS :: SigmaTerm -> Env -> Mem -> Maybe (Value, Mem)
Basically we have 3 inputs and two outputs. The types of the inputs are: 1. SigmaTerm: The expression to be evaluated
8
2. Env: The current environment 3. Mem: The current memory
The outputs are:
1. Value: The value that is computed
2. Mem: The updated memory (in case method update operations have been performed)
We suggest that students study the (partial) implementation of the evaluateS in Target.hs. The main point is that when objects are created, memory is allocated to store the object information (the methods) in memory. To access the objects in memory we use object references.
Question 1. (10 points) Adding unary and binary operators, including pretty printer in Declare.hs and the definition in the sigma calculus interpreter (function evaluateS) in Target.hs. Note that, in the file Declare.hs there are already two datatypes BinaryOp and UnaryOp, which contain tags for all the supported unary and binary operations. You should implement code that deals with all the binary and unary operators in those datatypes.
Question 2. (10 points) Adding missing cases for local variable declarations and conditionals, including in the pretty printing function in Declare.hs and evaluateS function in Target.hs.
Question 3. (15 points) Add clone expression, including pretty printing in Declare.hs and evaluateS function in Target.hs.
The semantics of clone(a) is that it returns a new object with the same methods as the object denoted by the expression a. Any change of the cloned value should not affect the value of a. During the evaluation, you should retrieve the value of a, and allocated a fresh object with the same methods as a in memory.
An example using clone is:
var o1 = [l = {this} 0]; var o2 = (clone(o1).l <~ 10); o1.l + o2.l
In this example, we create an object o1, and then create a clone of the object, but with an updated implementation of l. The update of l should not affect o1 (only the cloned object). Thus the final result should be 10.
Question 4. (15 points) In this question the goal is to refactor the code for the interpreter in Target.hs to use a monadic style.
9
In Lecture 12 we saw how to write an interpreter with memory in two different styles.
The first style, lets call it direct-style, explicitly passes the memory as an argument and returns the updated memory as the result. The current implementation in Target.hs uses a direct style. The other implementation approach (lets call it the monadic approach) was to use the Stateful Monad. The second approach is described in slides 37 to 42, and the corresponding code for this implementation appears in the StatefulMonad.hs file in the code for Lecture 12.
In the monadic interpreter the code completely encapsulates the memory management. For instance, the refactored code for binary operations in the interpreter for Lecture 12 is:
evaluate (Binary op a b) env = do
v1 <- evaluate a env
v2 <- evaluate b env
return (binary op v1
v2)
In such code all the memory management is implicitly dealt by the bind and return operations of the Monad. Moreover, there are a few little auxiliary functions, such as:
newMemory :: Value -> Stateful Value newMemory val = ST (\mem-> (AddressV (length mem), mem ++ [val]))
That are helpful to encapsulate memory management even in cases where the memory needs to be changed. For instance, for mutable cells, we used the following code:
evaluate (Mutable e) env = do– @Mutable(2+3)
v <- evaluate e env
newMemory v
For this question you should refactor the code in Target.hs and write in monadic style, so that all memory management is encapsulated. You can create auxiliary functions, such as newMemory to help you with this. You can use a StatefulMonad like the one used in Lecture 12, or you can even create your own modified versions that will further help making the code more elegant.
10
3 SecondPart:AnExtendedSourceLanguage 3.1 Abstract Syntax
Realistic programming languages are often built around a small core like the Sigma Calculus or the Lambda Calculus, but providing many convenience source level features that can be encoded in terms of the small calculus. We will take a similar approach in the project. In addition to the Sigma Calculus, which will be our core language, we will have another, richer, language that translates to the Sigma calculus. In essence we will have the following architecture:
Source --> Sigma
The first thing that you might notice is that we do not have first-class functions in the Sigma Calculus. The reason is that we can encode lambda expressions with the existing constructs in the Sigma calculus. We use the concrete syntax \var -> exp to represent function expressions, just like Haskell and a(b) to represent function application.
The Abstract Syntax for the Source language is:
data SMethod = SMethod Var Exp deriving Eq
data Exp = SLit Int | SBool Bool
| SUnary UnaryOp Exp
| SBin BinaryOp Exp Exp
| SIf Exp Exp Exp
| SVar Var
| Lam Var Exp
| Apply Exp Exp
| SObject [(Label, SMethod)]
| SCall Exp Label | SUpdate Exp Label SMethod
| SLet Var Exp Exp deriving Eq
—
new!
—
new!
11
There are only two new constructs: Lam is used for defining lambda expressions, and Apply is used to define function applications. We have seen such constructs in the class. For example, the code (\x -> x)(5+6)
will evaluate to 11.
The code
[contents=0, set={this}\n- >this.contents<~n].set(3).contents will evaluate to 3.
Question 5. (10 points) Implement the extensions supporting first-class functions as an encoding in terms of the Sigma calculus in Source.hs and complete the pretty print in Declare.hs.
Concretely, you should implement a function
translate :: Exp -> SigmaTerm
It converts terms in the source language to terms in the target language. The translation of most constructs are straightforward, since they already exist in the target. Only the two new constructs for first-class functions are more challenging. We give some pseudo-
code for what the translation next. You can also look at slide 61 for a more formal definition of the translation (the first two cases in the slides can be ignored).
For lambda expressions \ x -> b, the translation proceeds as follows: translate (\x -> b) ~->
[ arg = {x} x.arg, val = {x} (substitute (translate b) x (x.arg)) ]
The idea is to encode a lambda expression using an object with two fields/methods. The first field (arg) stores the argument. For the second field, the translation first translates the lambda body b, and then substitures all the occurrences of x by x.arg in the translated body. A simple example is:
\x -> (x + 1) ~->
[arg = {x} x.arg, val = {x} x.arg+1]
For function applications b(a), the translation proceeds as follows: translate ( b(a) ) ~->
var f = clone(translate b); y = translate a; (f.arg <~ {z} y).val
12
The translation is also tricky. We firstly translate b, then clone the value and assign it to f. We then translate a, and have a method update that updates the function argument with y(f.arg <~ {z} y). Then we perform the method call by invoking val. A simple example of a translation of a function application is:
javascript (\x -> (x + 1) ) (5) ~-> let f = clone([arg={x} x.arg, val={x} x.arg+1]) in let y = 5 in (f.arg<~{_} y).val
3.2 Classes
In modern OOP languages, classes are a basic feature. We will also add support for classes in our source language. For that, we need three new constructs that we add to the source language.
data Exp = ...
| Top --- new
| Class [(Label, SMethod)] Exp --- new | SNew Exp --- new
Top represent an empty class. Top is like Object in Java. We use it to tell the interpreter that our class does not inherit from any class (or you can say it inherits from the Top class).
For example,
class { l = {x} exp }// exp refers to an expression
will be interpreted to Class [Label "l", SMethod (Var "x") exp] Top
Another example of a class definition in concrete syntax is:
class {
contents = {x} 0,
set = {x} \ n -> x.contents <~ n
}
Classes look like objects, but instead of defining methods, they define pre-methods. Unlike objects in the Sigma calculus, we cannot simply define a class and immediately call a method. We must first create an instance of the class using new. This is the same as class-based languages like Java.
Question 6. (10 points) In this question you need to complete the definition of the function classGen in Source.hs. classGen transforms source expressions: it takes a source
13
expression with classes, and produces another source expression where classes are encoded in terms of objects.
The procedure of classGen is basically to generate an object using 2 steps: 1) it changes all the pre-methods into methods that have a self-reference argument; 2) it adds adds a new method to create an object instance. In other words: assume that you have a class declaration with multiple methods:
class {l_i = {x_i} b_i} // for i from 1 to n classGen should return
[ new = {z} [li = {x} z.li(x) ],
li = \xi -> bi ]
For example, you have class definition:
class {
contents = {x} 0 , set = {x} \ n -> x.contents <~ n
}
// for i from 1 to n
Then the classGen function should return
[ new = {z}
[contents = {x} z.contents(x), set = {x}
z.set(x)], contents = \ x -> 0 , set = \ x ->
\ n -> x.contents <~ n
]
Object instances are created with new. An example of new is:
var cell = class {
contents = {x} 0 , set = {x} \ n -> x.contents <~ n
};
(new cell).set(3).contents It will evaluate to 3.
We denote a class as cell, and we create a new an instance and do some operations on that.
14
The expression new cell actually does two things: 1) applies classGen to cell and get an object; 2) invokes new component from the object.
For example, the result of classGen of cell will be:
[new={z} [set={x} z.set(x), contents={x} z.contents(x)], set = \x -> \n -> x.contents<~n, contents = \x -> 0]
For solve this question, we might need to change the function signature of translation function:
— before:
translate :: Exp -> SigmaTerm — new: type TClass = [(Var, Exp)] translate :: Exp -> TClass -> SigmaTerm classGen :: Exp -> TClass -> Exp
We add a new context TClass for storing class definitions. When you have class definition, you should store it to the context. When you use new on a class, you look up the class definition from the context.
The translation for new operation is (in pseudo-code):
translate (SNew a) tc = Call (translate (classGen a tc) tc) (Label
“new”)
3.3 Single Inheritance
Our classes can easily support single inheritance. We describe the functionality of classes and the encoding of classes as objects next.
An example of single inheritance using concrete syntax as:
var cell = class {
contents = {x} 0 , set = {x} \ n -> x.contents <~ n
};
var gcell = class {
15
get = {x} x.contents } extends cell;
(new gcell).set(5).get
After running the classGen function we should get:
[new = {z}
[get = {x} z.get(x), set = {x} z.set(x),
contents = {x} z.contents(x)], get = \x ->
x.contents, set =
super.set, contents = super.contents]
We can even support method overriding. In the
example below,
var cell = class {
contents = {x} 0 , set = {x} \ n -> x.contents <~ n
};
var uncell = class {
undo = {x} x, set = {x} super.set(var y=clone(x); x.undo <~ y)
} extends cell;
(new uncell).set(3).set(5).undo.contents After running the classGen function we should get:
[new = {z}
[get = {x} z.get(x), set = {x} z.set(x),
contents = {x} z.contents(x), undo = {x}
z.undo(x)],
set = \ x -> super.set(var y=clone(x); x.undo <~ y) , contents = super.contents,
undo = \ x -> x]
Remember that our class definition is
16
Class [(Label, SMethod)] Exp
With single inheritance enabled, Exp refers to the super class, otherwise it refers to Top. For
supporting single inheritance, we could just modify classGen a little bit:
1. when meeting the super, refer it to the super class. You already store all the class
definition in context.
2. for every pre-method m_i in super classes, check if it is defined in the sub class. If no,
change the pre-method to m_i = super.m_i in sub class.
3. you don’t need to change other things.
Question 7. (5 points) modify classGen to make it support single inheritance.
17
4 ThirdPart:FreeExtensionsandImprovements Question 8. (25 points)
The final part of the project gives you freedom to improve the interpreter and language in various ways. There are multiple ways in which the interpreter can be improved: that includes adding new language features; as well as improving the convenience and use of the interpreter. In this part we hope you can unleash your creativity. The more interesting your improvements are the better your grades will be for this part.
We leave a few ideas of possible improvements next:
• Improved Error Handling: Besides encapsulating memory menagement in the interpreter, you may want to have improve error handling so that the code uses a Checked Monad like in Lecture 11. The easier option to achieve this is to create a single Monad that combines the Stateful Monad from Lecture 12 with the Checked Monad from Lecture 11. This requires you to create a suitable datatype and monad instance. Reading more about monads will be helpful for this.
• A command line interpreter: Command line interpreters, such as ghci (for Haskell) are a great way to interact and test your programs. Thus, a command line interpreter would be a valuable extension to the current implementation.
• More language features: While the current interpreter already has a few interesting language features, there are many more that can be added. For instance, we have built-in support for lists and list-comprehensions, loops, support method overriding, support multiple inheritance, etc. All of these are just a few, of many possible language extensions.
• Type checking: The current language is dynamically typed, but it is possible to support static typing as well. You could try to add a simple type system to the language, perhaps without supporting all the features of the language. The slides on the theory of objects contain rules for a simple type system, for example. For simplicity, you can avoid the use of Subtyping by using only equality. Moreover also, notice that supporting methods that return this is tricky, as this requires recursive types. So, you may want to consider those methods as ill-typed (i.e. they would not type-check), since recursive types requires some advanced concepts.
4.1 Some pointers
To help you with ideas for programming language extensions, as well as implementation techniques that you can use. We leave a number of pointers next.
18
• More about Monads. The paper:
Monads for Functional Programming
describes the use of Monads for improving the code of interpreters. This paper can be helpful improving your code in the interpreter.
• Happy and Alex: So far, in this course, you did not need to extend the concrete syntax and grammar rules yourself. However, if you decide to add new language extensions you will need to extend the concrete syntax and the grammar of the language. Our implementations use the Happy Parser Generator for Haskell, and Alex for generating lexical analysers in Haskell. You can find documentation for both in the following links: https://www.haskell.org/happy/ https://www.haskell.org/alex/
• Types and Programming Languages: A book that covers many programming language features (and the type systems for those features) is the “Types and Programming Languages” book. Note that a pdf for an (early) version of this book can be found here:
http://ropas.snu.ac.kr/~kwang/520/pierce_book.pdf
You may find some interesting ideas for language extensions there. Generally speaking, other books on programming language implementation can be a source of ideas. You may want to read the chapters 27 (Imperative Objects) and 32 (Purely Functional Objects) of the book Types and Programming Languages for getting deeper understanding of Objects and some ideas for extra features.
• More slides for the Theory of Objects 5 Guidelines
Please read and follow the guidelines carefully!
5.1 Final Report
You must submit a report in pdf format along with the solution that:
1. Describes how you implemented each feature
2. Explains briefly any helper function that you used
3. Explains the extra features, in the final part, that you implemented.
19
Note that point 3 is particularly important for us to evaluate your extensions. You should try to explain the features, give some code examples or illustrate with screen captures, and describe how the feature was implemented (as well as listing any sources/references for the implementation techniques).
5.2 Advice
The main advice is to get the basic functionality for the project correct (i.e. parts 1 and
2), together with a good report. This alone will probably guarantee a good pass grade for the course. If time permits you can try to design some extra features (for part 3) for extra marks.
Although we provide some testcases, you’d better to check your program with more own examples.
Please make sure to refer to the course material in implementing new features.
5.3 Submission
Please make sure to submit your solution (report must be in src folder) via moodle before the deadline. Deadline (May 24th, 2021) is strictly set by university and we cannot change it. Late submissions will not be accepted in any case.
Please submit the complete bundle of solution via zip file. Your zip file must be named as PROJECT_XXXXXX.zip where XXXXXX must be replaced by your UID. For example PROJECT_3030058128.zip.
20