1 2 3 4 5 6
Chapter 5 References
“Many people tend to look at programming styles and languages like reli- gions: if you belong to one, you cannot belong to others. But this analogy is another fallacy.”
– Niklaus Wirth
Previously, we were careful to emphasize that expressions in OCaml (or similar functional languages) always have a type and evaluate to a value. In this note, we will see that expressions can also have an effect. An effect is an action resulting from evaluation of an expression other than returning a value. In particular, we will consider language extension which supports allocation and mutation of storage during evaluation. During evaluation, we will see that storage may be allocated, and updated, and thereby effect future evaluation.
This is one of the main differences between pure functional languages such as Haskell (languages which do not support effectful computation directly) and impure functional language (languages which do support effectful computation directly).
5.1 Binding, Scope
So far, we have seen variables in let-expressions or functions. An important point about these local variables or binders were that 1) they only exist within a scope and 2) their names did not matter. Recall the following example:
let test () =
letpi =3.14in
let area = (fun r -> pi *. r *. r) in leta2 =area(2.0)in
letpi =6.0 in
let a3 = area 2.0 in
(*1*) (* 2 *) (*3*) (*4*)
55 c B. Pientka – Fall 2020
Chapter 5: References 5.2 Reference Cells
7 8 9
1 2 3
1 2
print_string (“Area a2 = ” ^ string_of_float a2 ^ “\n”);
print_string (“Area a3 = ” ^ string_of_float a3 ^ “\n”) ;
The binder or local variable pi in line (* 1 *) is bound to 3.14. If we are trying to establish a new binding for pi in line (* 4 *), then this only overshadows the previos binding, but it is important to note that it does not effect or change other bindings such as area or a2. In fact, OCaml will give you a warning:
letpi =6.0 in (*4*) ^^
Warning Y: unused variable pi.
Since we do not use the second binding for pi in the subsequent code, the variable pi with the new value 6.0 is unused. The result of the expression above will be 12.56, and a2 will have value 12.56. The fact that we overshadowed the previous binding of pi in line (* 4 *) did not have any effect.
5.2 Reference Cells
To support mutable storage, we will extend our language with reference cells. A reference cell may be thought of as a container in which a data value of a specified type is stored. During execution of a program, the contents of a cell by be read or replaced by any other value of the appropriate type. Since reference cells are updated and read by issuing “commands”, programming with cells is also called imperative programming.
Changing the contents of a reference cell introduces a temporal aspect. We of- ten will speak of previous and future values of a reference cell when discussing the behavior of a program. This is in sharp contrast to the effect-free fragment we have encountered so far. A binding for a variable does not change when we evaluate within the scope of that variable. The content of a reference cell may well change when we evaluate another expression.
A reference cell is created or allocated by the constructor ref. ref 0 will create a reference cell with content 0, i.e. it is a reference cell in which we store integers. Hence the type of ref 0 will be int ref. Reference cells are like all other values. They may be bound to variable names, passed as arguments to functions, returned as results of functions, even be stored within other reference cells.
Here are some examples:
let r = ref 0 let s = ref 0
56 c B. Pientka – Fall 2020
1 2 3 4
1 2 3 4 5 6 7 8 9
10 11 12 13 14
Chapter 5: References 5.2 Reference Cells
In the first line, we create a new reference cell and initializes it with content 0. The name of the reference cell is r. In the second line, we create a new reference cell with content 0 and bind it to the name s. It is worth pointing out that s and r are not referring to the same reference cell! In fact, we can compare s and r, by r == s which will evaluate to false.
In OCaml, there are two comparisons we can in fact make:
• r == s compares whether the cell r, i.e. r’s address in memory, is the same cell as s, i.e. s’s address in memory. == compares the actual location, i.e. addresses. In the above example, this returns false.
• r = 1 compare the content of the cell r with the content of the cell s. In the above example, this will return true.
We can read the content of a reference cell using the construct !. !r yields the current content of the reference cell r, in this example it will return 0.
We can also directly pattern match on the content of a cell writing the following pattern:
Here x and y are bound to the values contained in cell r and s respectively.
let {contents = x} = r;; val x : int = 0
# let {contents = y} = s;; val y : int = 0
To change the content of a reference cell, we use the assignment operator := and it typically written as an infix operator. The first argument must be a reference cell of type ⌧ ref, and the second argument must be an expression of type ⌧. The assignment r := 5 will replace the content of the cell r with the value 5. Note that the old content of the cell r has been destroyed and does not exist anymore!
Consider the following piece of code:
let r = ref 0 let s = ref 0 let a = r=s
let a = r==s let_=r:= 3 letx=!s+ !r let t = r
let w = r = s let v = t = s letb=s==t letc=r==t let_=t:= 5 lety=!s+ !r letz=!t+ !r
(*1*) (*2*) (*3*) (*4*) (*5*) (*6*) (*7*) (*8*) (*9*) (* 10 *)
57 c B. Pientka – Fall 2020
1
1
1 2 3 4 5 6 7 8 9
Chapter 5: References 5.2 Reference Cells
In line (* 2 *), the variable x will be bound to 3. Line (* 3 *) establishes a new binding. We can now use the name t to refer to the same reference cell as r. This is also called aliasing, i.e. two variables are bound to the same reference cell. Comparing the content of r with the content of s (see line (* 4 *)) will return false. Comparing the content of t and s (see line (* 5 *))) also returns false. Comparing the address of cell s with the address of cell t will also return false. Obviously, the cell s and the cell t are not the same – they have different addresses in memory and contain different values.
However, comparing the address of cell r with the address of cell t will also return true! Updating the content of the reference cell t in line (* 8 *) will mean that also r will have the new value 5. Remember, that t and r do not stand for different reference cells, but they are just different names for the same reference cell!
Finally, y will evaluate to 5 (see line (* 9 *)), and z will evaluate to 10 (see line (* 10 *)).
Notice also the use of a wildcard in line (* 1 *). The value of the expressions r := 3 is discarded by the binding. Assignment r := 3 has the empty value () and is of type unit.
Often it is convenient to sequentially compose expressions. This can be done using the semicolon ;. The expression
exp1 ; exp2
is shorthand for the expression
let_=exp1in exp2
It essentially means we first evaluate exp1 for its effect, and then evaluate exp2.
Rewriting the introductory example with references gives us the following:
let test_update () =
letpi =ref3.14in
let area = (fun r -> !pi *. r *. r) in leta2 =area(2.0)in
let_ =(pi:=6.0)in
let a3 = area 2.0 in
print_string (“Area a2 = ” ^ string_of_float a2 ^ print_string (“Area a3 = ” ^ string_of_float a3 ^
(*1*) (* 2 *) (*3*) (*4*)
“\n”); “\n”)
;
Note that now a2 will still bound to 12.56, while when we compute the value for a3 the result will be 24. Updating the reference cell pi will have an effect on previously defined function. Since an assignment destroys the previous values held in a reference cell, it is also sometimes referred to as destructive update.
58 c B. Pientka – Fall 2020
Chapter 5: References 5.3 Observation
5.3 Observation
It is worth pointing out that OCaml and other ML-languages distinguishe cleanly between the reference cell and the content of a cell. In more familiar languages, such as C all variables are implicitely bound to reference cells, and they are implicitly dereferenced whenever they are used so that a variable always stands for its current contents.
Reference cells make reasoning about programs a lot more difficult. In OCaml, we typically use references sparingly. This makes OCaml programs often simpler to reason about.
5.4 Programming well with references
Using references it is possible to mimic the style of programming used in imperative languages such as C. For example, we might define the factorial function in imitation of such languages as follows:
1 2 3 4 5 6 7 8
let imperative_fact n = let result = ref 1 in let i = ref 0 in
let rec loop () =
if !i = n then ()
else (i := !i + 1; result := !result * !i; loop ()) in
loop (); !result
Notice that the function loop is essentially just a while loop. It repeatedly exe- cutes its body until the contents of the cell bound to i reaches n. This is bad style!!! The purpose of the function imperative fact is to compute a simple function on natural numbers. There is no reason why state must be maintained. The original program is shorter, simpler, more efficient, and hence more suitable.
However, there are good uses of references and important uses of state. For example, we may need to generate fresh names. To implement a function which generates fresh names, we clearly need a global variable which keeps track of what variable names we have generated so far.
let counter = ref 0
(* newName () ===> a, where a is a new name *) (*
Names are described by strings denoting natural numbers. *)
let newName () =
(counter := !counter+1;
1 2 3 4 5 6 7 8
59 c B. Pientka – Fall 2020
9
Chapter 5: References 5.4 Programming well with references
“a”^ string_of_int (!counter))
Later on we will see how to in fact model objects using functions (=closures) and references.
5.4.1 Mutable data structures
So far we have only considered immutable data structures such as lists or trees, i.e. data structures that it is impossible to change the structure of the list without build- ing a modified copy of that structure. Immutable data structures are persistent, i.e. operations performed on them does not destroy the original structure. This often makes our implementations easier to understand and reason about. However, some- times we do not want to rebuild our data structure. A classic example is maintaining a dictionary. It is clearly wasteful if we would need to carry around a large dictionary and when we want to update it, we need to make a copy of it. This is what we would like in this case is an ”in place update” operation. For this we must have ephemeral (opposite of persistent) datastructures. We can achieve this by using references in SML.
1 2
1 2 3 4 5 6 7 8
Then we can define a circular list as follows:
(* Datatype for Reference Lists. *)
type ’a rlist = Empty | RCons of ’a * ’a rlist ref
# let l1 = ref (RCons(4, ref Empty)) let l2 = ref (RCons(5, l1));;
val l1 : int rlist ref = {contents = RCons (4, {contents = Empty})} val l2 : int rlist ref =
{contents = RCons (5, {contents = RCons (4, {contents = Empty})})} # l1 := !l2;;
How does the list l1 and the list l2 look like?
60 c B. Pientka – Fall 2020
Chapter 5: References
Before the assignment l2
After the assignment l1 := !l2
l2 5
l1
5.4 Programming well with references
This is what l1 looks like…
This is what l2 looks like…
5
4
l1
l1
You can also ask OCaml to show you the result of l1. It is quite patient and will print the circular list repeating RCons(5, …) until you simply see
1 RCons (5,{contents = RCons (…)})
We can also observe its output behavior using the following function observe:’a rlist * int -> () . Given a reference list l and a bound n, this function will print the first n elements. If we would not have this bound n, then observe would print repeatedly the elements in a circular list and would loop for circular lists.
let rec observe l n = match l with | Empty -> print_string “0”
| RCons(x, l) ->
if n = 0 then print_string “STOP\n”
else (print_string (string_of_int x ^ ” “); observe !l (n-1))
1 2 3 4 5 6
1 2 3 4
So for example here is what happens if we want to observe how l1 looks like.
5.4.2 Destructive append and reverse
Next, consider appending two reference lists. Intuitively, the following should hap- pen: The first list is a pointer to a reference list. So we just walk to the end of this list, and re-assign it to the second list! What should the type of the function rapp be? Is there anything returned? No. We are destructively modifying the first list. So we will write the following function:
61 c B. Pientka – Fall 2020
Empty
5
5
# observe !l1 5;; 5 5 5 5 5 STOP
– : unit = ()
#
1 2 3 4 5 6
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17
1 2 3 4 5 6 7 8 9
10
Note that as a side effect the reference list r1 is updated. This change can only be observed by reading r1 after we have appended another list. So for example:
Chapter 5: References
5.4 Programming well with references
type ’a refList = (’a rlist) ref
(* rapp: ’ a refList * ’a refList -> unit*)
let rec rapp r1 r2 = match r1 with
| {contents = Empty} -> r1 := !r2
| {contents = RCons (h,t)} -> rapp t r2
# let r1 = ref (RCons (1, ref Empty));;
val r1 : int rlist ref = {contents = RCons (1, {contents = Empty})} # let r2 = ref (RCons (4, ref (RCons (6, ref Empty))));;
val r2 : int rlist ref =
{contents = RCons (4, {contents = RCons (6, {contents = Empty})})} # rapp r1 r2;;
– : unit = ()
# r1 ;;
– : int rlist ref =
{contents =
RCons (1,
{contents = RCons (4, {contents = RCons (6, {contents = Empty})})})}
# r2 ;;
– : int rlist ref =
{contents = RCons (4, {contents = RCons (6, {contents = Empty})})}
Note how executing the function rapp has modified the list r1 is pointing to.
Next, let us consider how reverse could be implemented. Notice again that we accomplish this destructively, and the function rev returns unit as a result.
(* rev : ’a refList -> unit *)
let rec rev l0 =
let r = ref Empty in
let rec rev’ l = match l with
| {contents = Empty} -> l0 := !r | {contents = RCons(h,t)} ->
(r := RCons(h, ref (!r)); rev’ t)
in
rev’ l0
The idea is that we first create a tempory reference to an empty list. While we recursively traverse the input list l, we will pop the elements from l and push them on to the tempory reference list r. When we are done, we let reassign l. Note that l is a pointer to a list, and by l := !r we let this pointer now point to the reversed list. In our program, we use pattern matching on reference cells to retrieve the value stored in a location. Instead of pattern matching, we could also have used the !-operator for simply reading from a location the stored value.
62 c B. Pientka – Fall 2020
1 2 3 4 5 6 7 8 9
10
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
Again we can observe the behavior.
Chapter 5: References
5.5 Closures, References and Objects
(* rev : ’a refList -> unit *)
let rev l0 =
let r = ref Empty in
let rec rev’ l = match !l with
in
| Empty | RCons (r := rev’
rev’ l0
-> l0 := !r
(h,t) ->
RCons(h, ref (!r)); t)
# rev r1;;
– : unit = ()
# r1;;
– : int rlist ref = {contents =
RCons (6,
{contents = RCons (4, {contents = RCons (1, {contents = Empty})})})}
#
Often in imperative languages, we explicitly return a value. This amounts to modifying the program and returning l0.
let rev l0 =
let r = ref Empty in
let rec rev’ l = match !l with
| Empty -> l0 := !r | RCons (h,t) ->
(r := RCons(h, ref (!r)); rev’ t)
in
(rev’ l0; l0)
1 2 3 4
It is not strictly necessary to actually return l0, as whatever location we passed as input to the function rev was updatd destructively and hence it will have changed.
5.5 Closures, References and Objects
We can also write a counter which is incremented by the function tick and reset by the function reset.
let counter = ref 0 in
let tick () = counter := !counter + 1; !counter in let reset () = (counter := 0) in
(tick , reset)
This declaration introduces two functions tick:unit -> int and reset:unit -> unit. Their definitions share a private variable counter that is bound to a reference
63 c B. Pientka – Fall 2020
Chapter 5: References 5.5 Closures, References and Objects
cell containing the current value of a shared counter. The tick operation increments the counter and returns its new value, and the reset operation resets its value to zero. The types already suggest that implicit state is involved. We then package the two functions together with a tuple.
Suppose we wish to have several different instances of a counter and different pairs of functions tick and reset. We can achieve this by defining a counter generator. We first declare a record counter_object which contains two functions (i.e. methods). You can think of this as the interface or signature of the object. We then define the methods in the function newCounter which when called will give us a new object with methods tick and reset.
1 2 3 4 5 6
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22
type counter_object
let newCounter () = let counter = ref {tick = (fun () reset = fun ()
= {tick : unit -> int ; reset: unit -> unit}
0 in
-> counter := !counter + 1; !counter) ; -> counter := 0}
We’ve packaged the two operations into a record containing two functions that share private state. There is an obvious analogy with class-based object-oriented programming. The function newCounter may be thought of as a constructor for a class of counter objects. Each object has a private instance variable counter that is shared between the methods tick and reset.
Here is how to use counters.
# let c1 = newCounter();;
val c1 : counter_object = {tick =
val c2 : counter_object = {tick =
– : unit -> int =
# c1.tick ();;
– : int = 1
# c1.tick ();;
– : int = 2
# c2.tick ();;
– : int = 1
# c1.reset ();;
– : unit = ()
# c2.tick ();;
– : int = 2
# c2.tick ();;
– : int = 3
# c1.tick ();;
– : int = 1
#
Notice, that c1 and c2 are distinct counters!
64 c B. Pientka – Fall 2020
type student = {name : string ;
id: int;
birthdate: int * int * int; city : string;
mutable balance: float }
1 2 3 4 5 6
1
2 3
4
1 2 3 4 5
6
1 2 3 4 5 6
Chapter 5: References 5.6 Other Common Mutable Data-structures
5.6 Other Common Mutable Data-structures
We already have seen records as an example of a mutable data-structure in OCaml1. Records are useful to for example create a data entry. Here we create a data entry for a student by defining a record type student with fields name, id, birthdate, and city. We also include a field balance which describes the amount the student is owing to the university. We define the field balance as mutable which ensures that we can update it. For example, we want to set it to 0 once the outstanding balance has been paid.
We can now create the first student data entry.
To access and update the student’s balance we simply write:
# let s1 = {name = “James McGill”; id = 206234444; birthdate = (6, 10, 1744); city = “Montreal”; balance = 1650.0 };;
val s1 : student =
{name = “James McGill”; id = 206234444; birthdate = (6, 10, 1744); city = ”
Montreal”; balance = 1650.} #
# s1.balance <- 0.0;; - : unit = ()
# s1;;
- : student =
{name = "James McGill"; id = 206234444; birthdate = (6, 10, 1744); city = " Montreal"; balance = 0.}
#
We can only update a field in a record, if it is declared to be mutable. For example, we cannot update the city to Toronto, in case James McGill moved.
# s1.city <- "Toronto";; Characters 0-20:
s1.city <- "Toronto";;
^^^^^^^^^^^^^^^^^^^^
Error: The record field city is not mutable #
1Records are simply a labelled n-ary tuple where we can use the label to retrieve a given compo- nent. In OCaml records are a a labelled n-arry tuple where each entry itself is a location in memory. In general, records do not have to be mutable.
65 c B. Pientka – Fall 2020
Chapter 5: References 5.6 Other Common Mutable Data-structures
Last, we mention that OCaml has both static and dynamic arrays that resize them- selves when elements are added or removed, which could be used to create a data- base of student data entries. For more information on arrays, please see the docu- mentation:
https://caml.inria.fr/pub/docs/manual-ocaml/libref/Array.html
66 c B. Pientka – Fall 2020