Assignment 4 Programming Languages and Paradigms
1 Introduction
In this project, you will implement a language called MiniML, in OCaml. This will allow you to work with a larger code base, make use of many of the concepts that you have seen in class throughout the semester, and gain a deeper understanding on key concepts such as free variables, substitution, evaluation, type checking and type inference. MiniML is similar to OCaml itself and very similar to the language that is discussed in the lectures and lecture notes; it is however also more powerful than the minimal language discussed in the lecture notes. In particular, it includes n-ary tuples (pairs written as (e1, . . . , en)), a more general let-expression, functions, function application, and recursion.
Goal of the project
In this project, you will extend existing implementation with simple code analysis tools, an inter- preter (which evaluates an expression), a type checker, and eventually, a type inference engine.
Overview
The general pipeline we will use in this project is as follows:
1. Frontend (From External Language to Internal Language): In the frontend, we read and parse a program written in an external language and produce the corresponding internal program (i.e. the program in the internal language). We already have implemented this frontend, which means you can conveniently write programs in the external language.
2. Core : The core of this project consists of code analyzer, an evaluator, a type checker, and a type inference engine. All these programs take a program in the internal language as an input. Hence, you will be writing programs that analyze and evaluate expressions in the internal language. The frontend translates MiniML programs into their corresponding internal representation and thus allows you to write MiniML programs in a clean, readable form.
We will explain this pipeline in more detail in the next section.
1
2 Internal and External Language
Our MiniML internal language can be described as an extension of the language that is given in the lecture notes:
typesτ:=int|bool|τ1 ->τ2 |τ1 ∗…∗τn
expressionse:=…|fnx:τ => e|e1 e2 |recf:τ=>e|(e1, …, e2)|letdsine end declaration d :=val x = e | val (x1,…,xn) = e | name x = e
declarations ds :=· | d ds
It is worth noting that this grammar describes an internal language where issues regarding precedence order, brackets, etc. have been already resolved. A programmer actually writes pro- grams in the external language, i.e. the language that the parser accepts. In the project, we provide a frontend which reads and parses programs written in the external language and produces the corresponding internal program (i.e. the program in the internal language). For example, we write
1 2 3 4 5 6 7 8
let fun fact (x : int) : int = if x = 0 then
1
else
x * fact(x – 1)
in
fact 5 end;
which corresponds to
letvalfact=recfact:int->int=>fnx:int => ifx=0then1elsex∗fact(x−1)infact5 end
Having such a frontend makes it easier to write programs as input.
In general, a valid MiniML program is just an expression followed by a semicolon. All MiniML programs eventually compute to values (if they terminate). Our external language supports various operators. They include arithmetic operators like addition (+), subtraction (-), multiplication (*), division (/) and negation (~), arithmetic comparisons (<, >, <=, >=), equality comparisons (=, !=) and logical operators like logic and (&&) and logic or (||). All operators have familiar precedence levels. Notice that equality comparison operators only apply to integers.
3 With A Little Help From Your Friendly Teaching Assistants . . . 3.1 Parsing
We provide a number of helper functions in order to help you focus on the essential part of the project.
Following is the signature of a parsing module which we provide:
1 2 3
module P : sig
val parse : string -> (string, exp) either
end
You can invoke this function by
2
1 P.parse “some string”
The function returns Right e if the string is a valid MiniML program, or Left s if it is not and s is
the corresponding (not-so-specific) error message. For example:
P.parse “let fun fact (x : int) : int = if x = 0 then
1 else
x * fact(x – 1) in
fact 5 end;”;; (* this – Right
([ Val (Rec
(“fact”, TArrow (TInt, TInt), Fn
(“x”, Some TInt , If
(Primop (Equals , [Var “x”; Int 0]),
Int 1,
Primop (Times ,
[Var “x”;
Apply (Var “fact”,
Primop (Minus ,
[Var “x”; Int 1]))])))),
“fact”)],
Apply (Var “fact”, Int 5)))
*)
program returns (Let
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
1 2
1 2 3 4
1 2 3 4 5 6 7
In a negative case:
P.parse “1” (* should end with semicolon *) ;; – : Left “Expected SEMICOLON token”
3.2 Pretty Printing
Clearly, abstract syntax is not very easy to read. Sometimes you want to quickly verify if an abstract syntax tree is as expected. In that case, you can use a printing helper which is provided by the following module:
module Print : sig
val exp_to_string : exp -> string val typ_to_string : typ -> string
end
which converts an exp/typ to a readable form:
let Right e = P.parse “let fun fact (x : int) : int = if x = 0 then
1 else
x * fact(x – 1) in
fact 5
3
end;” in
print_string Print.exp_to_string e; print_newline ();;
(* this program prints
let fun fact : int -> int x = if x = 0 then 1 else x * fact (x – 1) in fact 5 end *)
8
9 10 11 12 13 14 15
3.3 Testing
We have two methods to help you test your code, unit tests and hidden tests.
We implement some basic helper functions to unit test your code. For a requested function foo, you can write your own unit tests in foo_tests and run your tests by evaluating your code and executing run_foo_tests (). For example, for Q1, you can write some unit tests in free_vars_tests and run run_free_var_tests () to run all the tests. You are encouraged to test your code frequently! Hidden tests can be run by clicking the “Grade” button. The report will simply tell you either that you pass all the tests or the number of tests you fail. The tests are here to catch some common errors and do not meant to be exhaustive. You should write more unit tests to make sure your code
is correct!
At last, please remember to back up your code! You will be working on this project for days
and weeks. You don’t want to restart everything from scratch because LearnOCaml erases your work!
3.4 Other Helpers
We provide other helper definitions in the prelude buffer above the main buffer. It includes some useful helper functions as well as some examples of valid MiniML programs. Make sure to take a quick look before getting started!
4 Grading
We provide the testing mechanisms in order to help you focus on the technical problems themselves. This project is graded offline. That is, your code might pass all the hidden tests on LearnOCaml, but that does not imply a guaranteed 100%; bugs might still get caught by our offline tests and our manual code review. For this reason, please try your best to test your own code!
You can organize the given code freely, including defining your own top-level definitions, adding or removing rec, etc, as long as you implement the target functions correctly with the exact given types.
5 Questions
5.1 Q0 (0 point): Trying out Parser Frontend
Get yourself familiar with the external syntax of MiniML and the usage of the parsing function P.parse. Please use parse_tests : (string * (string, exp) either) list to familiarize yourself. Each
4
entry is a MiniML program and the expected output of P.parse. Hit Grade button and see if you have got the tests right.
5.2 Q1 (10 points): Free Variables
Implement the function free_vars : exp -> name list which when given an expression computes the free variables occurring in the expression. This function is key to implementing substitution properly.
Your implementation should follow closely the definition of free variables given in the notes and extend it to tuples, let expressions, functions and recursion. The order of the returned names does not matter.
5.3 Q2 (15 points): Unused Variables
OCaml also tests whether a given variable is used or not; this is a useful little tool which we would also like to support for MinML. Implement a function unused_vars: exp -> name list which given an expression checks for each variable introduced by a binding construct (i.e. a function, let expression, etc. ) whether the variable the construct introduces is in fact used in the body of the expression.
For example:
let val x = 3 in 4 end; xisunused letvalx=trueinletvaly=4inx+5endend; yisunused
let val x = 3 in let val x = 4 in x + x end end; x(thefirstoccurrence)isunused
Similarly, given the following program, x and test is unused: 1 let fun test (x : int) : int = 3 in 4 end;
Implement the function unused_vars: exp -> name list. It traverses an expression and for every construct which introduces some bound variables (i.e. functions, let-expression) we will check whether these variables occur free in the body; if they do, then indeed these variables are used; if they do not occur free in the body, then the variables are unused. You might want to use free_vars that you just implemented. The order of names does not matter.
5.4 Q3 (15 points): Substitution
Finish the implementation of function subst:(exp * name) -> exp -> exp for substitution to handle let-name, pairs, let-pair, functions, function application, and recursion.
The first argument to the function subst contains a tuple of an expression e’ and a variable name x denoting the substitution [e’/x]. The second argument is an expression e to which we apply the substitution. In other words,
subst (e’,x) e = [e’/x]e
i.e. the function subst will replace any free occurrence of the variable x in the expression e by the expression e’. Note that subst is capture-avoiding. Use fresh_var to obtain a fresh variable name if necessary.
You can then for example test your substitution function as follows:
1 2
# subst (Int 5, “x”) (If (Bool(true), Var “x”, Var “y”)); val it = If (Bool true, Int 5, Var “y”) : exp
5
5.5 Q4 (30 points): Big Step Evaluation
In this question, we will implement an interpreter based on big-step evaluation. Your task is to implement the missing cases of the function eval for big-step evaluation. In particular, add the cases for functions, function application, recursion, and handling generalized let-expressions, i.e. those cases which currently raise the exception NotImplemented. The evaluation rules are fully listed in Figure 1.
If you encounter a situation not covered by these rules, then you should raise (Stuck “message”) (with something more descriptive than “message”). See the other cases for examples of this.
The evaluator is not environment-based, so there is a direct translation from the above rules to code. Use subst that you just implemented.
As is often the case, not very much code is required.
5.6 Q5 (25 points): Type Checking
In this question, we are inferring the type of an expression. We use the type annotations on recur- sion and functions, to guarantee that we can always uniquely infer a unique type of an expression. It simplifies the type inference problem. For now, you can ignore type variables.
To describe the type inference we will use the following judgement:
Γ ⊢ e ⇒ T Infer type T for expression e in context Γ
The inference rules are given in Figure 2. Your implementation of the function infer: context -> exp -> typ should follow exactly the algorithm described by the typing rules. It takes as input a context describing the typing assumptions and an expression. It returns the type of the expression. For this question, you can assume that all functions are annotated with their types. We will handle those that are not in the next question.
If an expression is not well-typed, then we can raise a TypeError exception by calling the type_fail function.
5.7 Q6 (30 points): Unification
Lastly, we want to enable full type inference; as a consequence, we can omit the type annotations on functions and recursion. Key to type inference is unification. Your task is to implement the function unify: typ -> typ -> unit, which checks whether two types are unifiable, i.e. if we can make them syntactically equal.
You should follow the description of unification in the class notes. Type variables are modeled via references. An uninstantiated type variable is modeled as a pointer to a cell with content None. In other words to create a new type variable we can simply use a function
1 let fresh_tvar () = TVar (ref None)
Once we know what a type variable should be instantiated with we simply assign it the correct type. For example, if we have a type variable TVar x, then x has type (typ option) ref and we can replace every occurrence of TVar x by the type Int using assignment x := Some TInt.
This will destructively update the type variable x and directly propagate the instantiation for it. No extra implementation of a substitution function is necessary to propagate instantiations.
Implement the function unify : typ -> typ -> unit. If two types are unifiable, they will be de- noting the same type after unification succeeds. If unification fails, raise a TypeError exception by
6
e ⇓ v expression e evaluates to value v
e⇓v
let · in e end ⇓ v
e1 ⇓ v1 [v1/x](let decs in e end) ⇓ v let val x1 = e1 decs in e end ⇓ v [e1/x](let decs in e end) ⇓ v letnamex1 =e1 decsine end ⇓ v
(B-NO-DECS) (B-LET-VAL) (B-LETN) (B-LET-TUPLE) (B-TUPLE) (B-REC) (B-APP)
(B-FN) (B-IFTRUE) (B-IFFALSE) (B-ANNO)
e1 ⇓true e1||e2 ⇓true
e ⇓ v ∼e⇓−v
e1 ⇓ (v1,…,vn)
let val (x1,…,xn) = e1 decs in e end ⇓ v
for all i ei ⇓ vi
(e1, …, en) ⇓ (v1,…,vm) [recf:τ=>e/f]e ⇓ v recf:τ=>e ⇓ v
[v1/x1,…,vn/xn](let decs in e end) ⇓ v
e1 ⇓fnx=>e e2 ⇓v2 e1 e2 ⇓ v
[v2/x]e⇓v
fn x => e ⇓ fn x => e
e1 ⇓ true e2 ⇓ v
if e1 then e2 else e3 ⇓ v
e1 ⇓false e3 ⇓v
if e1 then e2 else e3 ⇓ v e⇓v
e:τ ⇓ v
n is a number b ∈ {true, false}
e1 ⇓true e1&&e2 ⇓v
n⇓n b⇓b e1 ⇓false e1 ⇓false
e2 ⇓v e1 ⇓ v1
e2 ⇓v e1&&e2 ⇓false e1||e2 ⇓v
e2 ⇓ v2 opisabinaryoperator e1ope2 ⇓v1opv2 B-OP
Figure 1: rules for big-step evaluation
7
Inference Rules
Γ ⊢e⇒bool Γ ⊢e1 ⇒τ Γ ⊢e2 ⇒τ Γ ⊢ if e then e1 else e2 ⇒ τ
Γ, x : τ ⊢ e ⇒ τ ′
Γ⊢(fnx:τ => e)⇒τ->τ′
Γ ⊢ e1 ⇒ τ1 · · · Γ ⊢ en ⇒ τn
Γ ⊢ (e1, . . . , en) ⇒ (τ1 ∗ · · · ∗ τn)
Γ, f : τ ⊢ e ⇒ τ
Γ ⊢ (rec f : τ => e) ⇒ τ
Γ ⊢e1 ⇒τ->τ′ Γ ⊢e2 ⇒τ
Γ ⊢ e1 e2 ⇒ τ′
Γ ⊢ decs ⇒ Γ ′ Γ, Γ ′ ⊢ e ⇒ τ Γ⊢letdecsine end⇒τ Γ⊢e⇒τ
Γ ⊢ (e : τ) ⇒ τ
(T-IF) (T-FN) (T-TUPLE) (T-REC) (T-APP) (T-LET) (T-ANNO)
n is a number Γ ⊢n⇒int
Rules for declarations
Γ ⊢dec1 ⇒Γ1 Γ,Γ1 ⊢decs⇒Γ2 Γ ⊢dec1 decs⇒Γ1,Γ2 Γ⊢e⇒τ
Γ ⊢(valx=e) ⇒ (x:τ) Γ⊢e⇒τ
Γ ⊢(namex=e) ⇒ (x:τ)
Γ ⊢ e ⇒ (τ1 ∗ · · · ∗ τn)
Γ ⊢ (val (x1,…,xn) = e) ⇒ (x1 : τ1),…,(xn : τn)
Figure 2: Type inference rules
b ∈ {true, false} Γ ⊢b⇒bool
Γ ⊢ei ⇒int opis<,<=,>=,>=,=,!= Γ ⊢e1 ope2 ⇒bool
Γ ⊢ei ⇒int
Γ ⊢e1 ope2 ⇒int
opis+,-,*,/ Γ ⊢ei ⇒bool
Γ ⊢e⇒int Γ ⊢∼e⇒int
opis&&,|| Γ ⊢e1 ope2 ⇒bool
(T-DECS) (T-BY-VAL) (T-BY-NAME) (T-BY-VAL-TUPLE)
8
calling the type_fail function. Follow the algorithm described in the notes to unify two types, and fill in the implementation for unify.
5.8 Q7 (10 points) EXTRA CREDIT: Type Inference
Modify your function infer that it supports type inference. This is quite easy: if the type annotation in recursion and functions is None and we don’t know the type of a variable x we want to add to the context, we simply generate a fresh type variable tv and add to the context the fact that x has type tv; instead of checking for equality, you call unification.
9