Chapter 6
Abstraction
Type structure is a syntactic discipline for maintaining levels of
abstraction – John Reynolds, “Types, Abstraction and Parametric
Polymorphism”
Abstraction, also known as information hiding, is fundamental to computer
science. When faced with creating and maintaining a complex system, the
interactions of different components can be simplified by hiding the details of
each component’s implementation from the rest of the system.
Details of a component’s implementation are hidden by protecting it with
an interface. An interface describes the information which is exposed to other
components in the system. Abstraction is maintained by ensuring that the rest
of the system is invariant to changes of implementation that do not affect the
interface.
6.1 Abstraction in OCaml
6.1.1 Modules
The most powerful form of abstraction in OCaml is achieved using the mod-
ule system. The module system is basically its own language within OCaml,
consisting of modules and module types. All OCaml definitions (e.g. values,
types, exceptions, classes) live within modules, so the module system’s support
for abstraction includes support for abstraction of any OCaml definition.
Structures
A structure creates a module from a collection of OCaml definitions. For exam-
ple, the following defines a module with the definitions of a simple implementa-
tion of a set of integers:
module IntSet = struct
61
62 CHAPTER 6. ABSTRACTION
type t = int l i s t
l e t empty = [ ]
l e t is_empty = function
| [ ] -> true
| _ -> f a l s e
l e t equal_member (x : int ) (y : int ) =
x = y
l e t rec mem x = function
| [ ] -> f a l s e
| y : : r e s t ->
i f (equal_member x y) then true
e l s e mem x res t
l e t add x t =
i f (mem x t ) then t
e l s e x : : t
l e t rec remove x = function
| [ ] -> [ ]
| y : : r e s t ->
i f (equal_member x y) then re s t
e l s e y : : ( remove x re s t )
l e t to_l i s t t = t
end
The module IntSet uses lists of integers to represent sets of integers. This
is indicated by the inclusion of a type t defined as an alias to int list. The
implementation provides the basic operations of sets as a collection of functions
that operate on these int lists.
The components of a structure are accessed using the . operator. For exam-
ple, the following creates a set containing 1, 2 and 3.
l e t one_two_three : IntSet . t =
IntSet . add 1 ( IntSet . add 2 ( IntSet . add 3 IntSet . empty) )
A structure’s components can also be made available using open to avoid
needing to repeatedly use the . operator:
open IntSet
l e t one_two_three : t =
add 1 (add 2 (add 3 empty) )
There is also a scoped opening syntax to temporarily make a structure’s com-
ponents available without the . operator:
l e t one_two_three : IntSet . t =
IntSet . ( add 1 (add 2 (add 3 empty) ) )
Structures can be built from other structures using include. For example,
we can build a structure containing all the components of IntSet as well as a
singleton function:
6.1. ABSTRACTION IN OCAML 63
module IntSetPlus = struct
include IntSet
l e t s ing leton x = add x empty
end
Signatures
Signatures are interfaces for structures. They are a kind of module type, and
the most general signature is automatically inferred for a structure definition.
The signature inferred for our IntSet structure is as follows:
s i g
type t = int l i s t
val empty : ’ a l i s t
val is_empty : ’ a l i s t -> bool
val equal_member : int -> int -> bool
val mem : int -> int l i s t -> bool
val add : int -> int l i s t -> int l i s t
val remove : int -> int l i s t -> int l i s t
val to_l i s t : ’ a -> ’a
end
We can use a signature to hide components of the structure, and also to
expose a component with a restricted type. For example, we can remove the
equal_member function, and restrict empty, is_empty and to_list to only op-
erate on int lists:
module IntSet : s i g
type t = int l i s t
val empty : int l i s t
val is_empty : int l i s t -> bool
val mem : int -> int l i s t -> bool
val add : int -> int l i s t -> int l i s t
val remove : int -> int l i s t -> int l i s t
val to_l i s t : int l i s t -> int l i s t
end = struct
. . .
end
For convenience, we can name the signature using a module type declaration:
module type IntSetS = s ig
type t = int l i s t
val empty : int l i s t
val is_empty : int l i s t -> bool
val mem : int -> int l i s t -> bool
val add : int -> int l i s t -> int l i s t
val remove : int -> int l i s t -> int l i s t
val to_l i s t : int l i s t -> int l i s t
end
module IntSet : IntSetS = struct
. . .
end
64 CHAPTER 6. ABSTRACTION
Abstract types
The above definition of IntSet still exposes the fact that our sets of integers are
represented using int list. This means that code outside of the module may rely
on the fact that our sets are lists of integers. For example,
l e t print_set ( s : IntSet . t ) : unit =
l e t rec loop = function
| x : : xs -> print_int x ; print_str ing ” ” ; loop xs
| [ ] -> ()
in
print_string ”{ ” ;
loop s ;
print_string ”}”
Such code is correct, but it will break if we later decide to use a different
representation for our sets of integers.
In order to prevent this, we must make the type alias IntSet.t into an abstract
type, by hiding its definition as an alias of int list. This gives us the following
definition:
module type IntSetS = s ig
type t
val empty : t
val is_empty : t -> bool
val mem : int -> t -> bool
val add : int -> t -> t
val remove : int -> t -> t
val to_l i s t : t -> int l i s t
end
module IntSet : IntSetS = struct
. . .
end
Observe that we also change int list in the types of the functions to t (except
for the result of to_list).
Now that the type is abstract, code outside of IntSet can only pass the set
values around and use the functions in IntSet to create new ones, it cannot
use values of type IntSet.t in any other way because it cannot see the type’s
definition.
This means that the implementation of IntSet can be replaced with a more
efficient one (perhaps based on binary trees), safe in the knowledge that the
change will not break any code outside of IntSet.
Compilation Units
In OCaml, every source file defines a structure (e.g. “foo.ml” defines a module
named Foo). The signature for these modules is defined in a corresponding
interface file (e.g. “foo.mli” defines the signature of the Foo module). Note that
all such compilation units in a program must have a unique name.
6.1. ABSTRACTION IN OCAML 65
6.1.2 Invariants
Abstraction has further implications beyond the ability to replace one implemen-
tation with another. In particular, abstraction allows us to preserve invariants
on types.
Consider the following module:
module Pos i t ive : s i g
type t
val zero : t
val succ : t -> t
val to_int : t -> int
end = struct
type t = int
l e t zero = 0
l e t succ x = x + 1
l e t to_int x = x
end
Here the abstract type t is represented by an int. However, we can also show
that, thanks to abstraction, all values of type t will be positive integers1.
Informally, this is because all values of type t must be created using either
zero (which is a positive integer), or succ (which returns a positive integer when
given a positive integer), so all values of type t must be positive integers.
6.1.3 The meaning of types
The ability for types to represent invariants beyond their particular data rep-
resentation fundamentally changes the notion of what a type is. It is a key
difference between languages with abstraction (e.g. System F) and languages
without it (e.g the simply typed lambda calculus).
In the simply typed lambda calculus types only represent particular data rep-
resentations. For example, Bool → Bool represents functions that take a boolean
and produce a boolean. In this setting, the purpose of types is to prevent mean-
ingless expressions which have no defined behaviour according to the semantics
of the language. Types cannot be used to represent concepts beyond these
semantics.
This prevents certain kinds of bugs but the majority of bugs in a program are
not related to the semantics of the language but to the intended semantics of the
program. For example, a program might be using integers to represent peoples’
heights, and whilst it would be a bug if a value intending to represent a height
was in fact a function, a more likely mistake would be for a person’s height to
be represented by a negative integer. Although the language’s semantics define
how this negative integer should behave for all the operations which could be
performed on a height (e.g. addition) it is still not an acceptable value for a
height.
Abstraction extends the possible meanings of types to include arbitrary in-
variants on values. This allows types to represent concepts within the intended
1We ignore overflow for the sake of simplicity
66 CHAPTER 6. ABSTRACTION
semantics of the program. For example, abstraction can create a type that rep-
resents people’s heights, and ensure that all values of this type are acceptable
values for a person’s height.
6.1.4 Phantom types
The Positive.t type in the earlier example represented not just the data repre-
sentation (an integer) but also an invariant (that the integer be positive). Using
higher-kinded types we can take the idea of types as invariants even further.
Consider the following file I/O interface:
module Fi l e : s i g
type t
val open_readwrite : s t r ing -> t
val open_readonly : s t r ing -> t
val read : t -> st r ing
val write : t -> st r ing -> unit
end = struct
type t = int
l e t open_readwrite f i lename = . . .
l e t open_readonly fi lename = . . .
l e t read f = . . .
l e t write f s = . . .
end
It allows files to be opened in either read-only or read-write mode, and it pro-
vides functions to read from and write to these files.
One problem interface is that it does not prevent you from trying to write to
a file which was opened read-only. Instead, such attempts result in a run-time
error:
# le t f = Fi l e . open_readonly ” foo ” in
Fi l e . write f ”bar ” ; ;
Except ion : Inval id_argument ” wr i t e : f i l e i s read−only ” .
This is unfortunate, since such errors could easily be caught at compile-time,
giving us more confidence in the correctness of our programs.
To detect these errors at compile-time we add a type parameter to the File.t
type, which represents whether the file was opened in read-only or read-write
mode. Each mode is represented by a type without a definition (readonly and
readwrite). These types have no data representation – they only exist to repre-
sent invariants:
module Fi l e : s i g
type readonly
type readwrite
type ’a t
val open_readwrite : s t r ing -> readwrite t
val open_readonly : s t r ing -> readonly t
val read : ’ a t -> st r ing
val write : readwrite t -> st r ing -> unit
end = struct
type readonly
6.1. ABSTRACTION IN OCAML 67
type readwrite
type ’a t = int
l e t open_readwrite f i lename = . . .
l e t open_readonly fi lename = . . .
l e t read f = . . .
l e t write f s = . . .
end
The return types of open_readonly and open_readwite are restricted to pro-
ducing files whose type parameter represents the appropriate mode. Similarly,
write is restricted to only operate on values of type readwrite t. This prevents
the errors we are trying to avoid. However, read is polymorphic in the mode of
the file to be read – it will operate on files opened in either mode.
Note that the File.t type is still defined as an integer. The type parameter is
not actually used in the type’s definition: it is a phantom type. Within the File
module the type readonly t is equal to the type readwrite t – since they both
equal int. However, thanks to abstraction, these types are not equal outside
of the module and the invariant that files opened by open_readonly cannot be
passed to write is preserved.
Using this interface, the previous example now produces a compiler-time
error:
# le t f = Fi l e . open_readonly ” foo ” in
Fi l e . write f ”bar ” ; ;
Characters 51−52:
F i l e . wr i t e f ” bar ” ; ;
^
Error : This expre s s i on has type F i l e . readon ly F i l e . t
but an expre s s i on was expec ted o f type
F i l e . r eadwr i t e F i l e . t
Type F i l e . readon ly i s not compat i b l e wi th type
F i l e . r eadwr i t e
6.1.5 The meaning of types (continued)
Just as abstraction allows types to represent more than just a particular data
representation, higher-kinded abstraction allows types to represent an even
wider set of concepts. Base-kinded abstraction restricts types to directly rep-
resenting invariants on values, with each type corresponding to particular set
of values. With higher-kinded abstraction, types can represent more general
concepts without a direct correspondence to values.
For example, the readonly type in the above example represents the general
concept of “read-only mode”. There are no actual values of this type since it
does not directly correspond to a property of values themselves. The File.t type
can then be parameterized by this concept in order to represent file handles for
read-only files.
68 CHAPTER 6. ABSTRACTION
Further types, (e.g. channels) may also be parameterized by the same con-
cept, allowing types to express relationships between these values (e.g. a func-
tion which takes a file and produces a channel of the same mode).
6.1.6 Existential types in OCaml
We have seen that OCaml’s module system provides abstraction for all OCaml
definitions. This includes abstract types, which are closely related to existential
types in System F𝜔. However, OCaml also provides more direct support for
existential types within its core language. This can sometimes be more conve-
nient than using the module system, which is quite verbose, but it only works
for types of kind *.
Type inference for general existential types is undecidable. As an illustration,
consider the following OCaml function:
fun p x y -> i f p then x e l s e y
This expression could have a number of System F𝜔 types, including:
∀𝛼 : : * . Bool → 𝛼 → 𝛼 → 𝛼
∀𝛼 : : * . ∀𝛽 : : * . Bool → 𝛼 → 𝛽 → ∃𝛾 : : * . 𝛾
and none of these types is more general than the rest, so we require some
annotations in order to type-check programs involving existentials. The required
annotations include explicit pack and open statements, as well as explicitly
specifying the type of the existential created by a pack statement.
Rather than directly using open and pack with type annotations, existential
types in OCaml are provided through sum types. The constructors of the sum
type act as pack statements in expressions, and open statements in patterns.
The declaration of a sum type includes specifying the types of its construc-
tors arguments, which provide us with the required type annotations for pack
statements.
The following definition defines a type corresponding to ∃𝛼.𝛼 × (𝛼 → 𝛼) ×
(𝛼 → string):
type t = E : ’a * ( ’ a -> ’a) * ( ’ a -> st r ing ) -> t
Building a value using the E constructor corresponds to the pack operation
of System F𝜔:
l e t in t s = E(0 , ( fun x -> x + 1) , string_of_int )
l e t f l o a t s = E(0 .0 , ( fun x -> x +. 1 .0) , str ing_of_float )
Destructing a value using the E constructor with let or match corresponds
to the open operation of System F𝜔:
l e t E(z , s , p) = int s in
p ( s ( s z ) )
6.1. ABSTRACTION IN OCAML 69
6.1.7 Example: lightweight static capabilities
To illustrate the kind of invariants that can be enforced using higher-kinded
abstraction, we will look at an example from the paper “Lightweight Static
Capabilities” (Kiselyov and Shan [2007]).
Consider the following interface for an array type:
module Array : s i g
type ’a t = ’a array
val length : ’ a t -> int
val set : ’ a t -> int -> ’a -> unit
val get : ’ a t -> int -> ’a
end
We can use this interface to try to write binary search of a sorted array:
l e t search cmp arr v =
l e t rec look low high =
i f high < low then None
e l s e begin
l e t mid = ( high + low)/2 in
l e t x = Array . get arr mid in
l e t res = cmp v x in
i f res = 0 then Some mid
e l s e i f res < 0 then look low (mid - 1)
e l s e look (mid + 1) high
end
in
look 0 (Array . length arr )
This function takes a comparison function cmp, an array arr sorted according to
cmp and a value v. If v is in arr then the function returns its index, otherwise
it returns None.
However, if we try a few examples with this function, we find that there is
a problem:
# le t arr = [ | ’ a ’ ; ’ b ’ ; ’ c ’ ; ’ d ’ | ] ; ;
v a l arr : char array = [ | ’ a ’ ; ’ b ’ ; ’ c ’ ; ’ d ’ | ]
# le t test1 = search compare arr ’ c ’ ; ;
v a l t e s t 1 : i n t op t ion = Some 2
# le t test2 = search compare arr ’a ’ ; ;
v a l t e s t 2 : i n t op t ion = Some 0
# le t test3 = search compare arr ’x ’ ; ;
Except ion : Inval id_argument ” index out o f bounds ” .
70 CHAPTER 6. ABSTRACTION
Our last example raises an Invalid_argument exception because we have tried
to access an index outside the bounds of the array.
The problem is easy enough to fix – we need to change the last line to use
the index of the last element of arr rather than its length:
look 0 ((Array . length arr ) - 1)
However, we would rather catch such mistakes at compile-time.
To prevent out-of-bounds accesses at compile-time, we add another type
parameter to the array type, which represents the size of the array. We also
replace int with an abstract type index for representing array indices. The
index type is also parameterized by a size type, which indicates that the index
is within the bounds of arrays of that size.
module BArray : s i g
type ( ’ s , ’ a ) t
type ’ s index
val l a s t : ( ’ s , ’ a ) t -> ’ s index
val set : ( ’ s , ’ a ) t -> ’ s index -> ’a -> unit
val get : ( ’ s , ’ a ) t -> ’ s index -> ’a
end
The types of set and get ensure that only indices that are within the bounds of
the array are allowed by enforcing the size parameter of the array and the index
to match.
We could try to use sophisticated types to represent sizes – perhaps en-
coding the size using type-level arithmetic. This would allow us to represent
relationships between the different sizes – for instance allowing us to represent
one array being smaller than another. However, such sophistication comes with
added complexity, so we will take a simpler approach: size types will be abstract.
We extend our array interface with a function brand:
type ’a brand =
| Brand : ( ’ s , ’ a ) t -> ’a brand
| Empty : ’ a brand
val brand : ’ a array -> ’a brand
The Brand constructor contains a value of type (’s, ’a) t, where ’s is an existential
type variable. In essence, the brand function takes a regular OCaml array and
returns a combination of an abstract size type and a t value of that size.
Since the size of each branded array is abstract, we cannot use indices for
one array to access a different array:
# le t Brand x = brand [ | ’a ’ ; ’b ’ ; ’ c ’ ; ’d ’ | ] in
l e t Brand y = brand [ | ’a ’ ; ’b ’ | ] in
get y ( l a s t x) ; ;
Characters 96−104:
g e t y ( l a s t x ) ; ;
^^^^^^^^
Error : This expre s s i on has type s#1 BArray . index
6.1. ABSTRACTION IN OCAML 71
but an expre s s i on was expec ted o f type
s#2 BArray . index
Type s#1 i s not compat i b l e wi th type s#2
Finally, we add some functions to our interface for manipulating indices.
val zero : ’ s index
val l a s t : ( ’ s , ’ a ) t -> ’ s index
val index : ( ’ s , ’ a ) t -> int -> ’ s index option
val pos i t ion : ’ s index -> int
val middle : ’ s index -> ’ s index -> ’ s index
val next : ’ s index -> ’ s index -> ’ s index option
val previous : ’ s index -> ’ s index ->
’ s index option
Each of these functions must maintain the invariant that an index of type ’s
index is always valid for an array of type (’s, ’a) t. For example, the next
function, which takes an index and returns the index of the next element in the
array, also takes an additional index parameter and will only return the new
index if it is less than this additional index. This ensures that the new index
lies between two existing indices, and is therefore a valid index.
The full implementation of this safe array interface is given in Fig. 6.1. We
can use it to implement our binary search function without too many changes:
l e t bsearch cmp arr v =
l e t open BArray in
l e t rec look barr low high =
l e t mid = middle low high in
l e t x = get barr mid in
l e t res = cmp v x in
i f res = 0 then Some ( pos i t ion mid)
e l s e i f res < 0 then
match previous low mid with
| Some prev -> look barr low prev
| None -> None
e l s e
match next mid high with
| Some next -> look barr next high
| None -> None
in
match brand arr with
| Brand barr -> look barr zero ( l a s t barr )
| Empty -> None
This function is guaranteed not to make an out-of-bounds access to the array,
giving us greater confidence in the correctness of its implementation.
Abstraction is the key to this technique. Thanks to abstraction we know that
if the implementation of BArray preserves our invariant then so must the entire
program. In essence, we have reduced the problem of proving our invariant for
the whole problem to proving our invariant for a small trusted kernel – if we
trust the implementation of this kernel we can trust the entire program.
72 CHAPTER 6. ABSTRACTION
type ( ’ s , ’ a ) t = ’a array
type ’a brand =
| Brand : ( ’ s , ’ a ) t -> ’a brand
| Empty : ’ a brand
l e t brand arr =
i f Array . length arr > 0 then Brand arr
e l s e Empty
type ’ s index = int
l e t index arr i =
i f i > 0 && i < Array . length arr then Some i
e l s e None
l e t pos i t ion idx = idx
l e t zero = 0
l e t l a s t arr = (Array . length arr ) - 1
l e t middle idx1 idx2 = ( idx1 + idx2 )/2
l e t next idx l imi t =
l e t next = idx + 1 in
i f next <= l imi t then Some next
e l s e None
l e t previous l imi t idx =
l e t prev = idx - 1 in
i f prev >= l imi t then Some prev
e l s e None
l e t set = Array . set
l e t get = Array . get
Figure 6.1: Implementation of the safe array interface
6.2. ABSTRACTION IN SYSTEM FΩ 73
As an additional benefit, we can safely adjust our implementation to use the
unsafe variants of OCaml’s get and set primitives:
l e t set = Array . unsafe_set
l e t get = Array . unsafe_get
This means that our array accesses will not perform any run-time checks for
out-of-bounds accesses: by using abstraction to preserve a safety invariant we
are able to improve the performance of our programs.
6.2 Abstraction in System F𝜔
6.2.1 Existential types
The abstract types in OCaml’s module system correspond to existential types in
System F𝜔. Just like abstract types, existentials can pack together operations
on a shared type, without exposing the definition of that type. As an example we
will implement our IntSet with an abstract type using existentials in System F𝜔.
For convenience, we will use natural numbers instead of integers and use simpler,
less efficient, implementations of the set operations.
First, we create a type constructor and some functions for dealing with the
products that represent the structures we are implementing:
NatSetImpl =
𝜆𝛼 : : * .
𝛼
× (𝛼 → Bool )
× (Nat → 𝛼 → Bool )
× (Nat → 𝛼 → 𝛼)
× (Nat → 𝛼 → 𝛼)
× (𝛼 → List Nat) ;
empty = Λ𝛼 : : * . 𝜆s : NatSetImpl 𝛼 .𝜋1 s ;
is_empty = Λ𝛼 : : * . 𝜆s : NatSetImpl 𝛼 .𝜋2 s ;
mem = Λ𝛼 : : * . 𝜆s : NatSetImpl 𝛼 .𝜋3 s ;
add = Λ𝛼 : : * . 𝜆s : NatSetImpl 𝛼 .𝜋4 s ;
remove = Λ𝛼 : : * . 𝜆s : NatSetImpl 𝛼 .𝜋5 s ;
to_l i s t = Λ𝛼 : : * . 𝜆s : NatSetImpl 𝛼 .𝜋6 s ;
Now we can create our implementation of sets of naturals, and give it the
type corresponding to the abstract IntSet signature using pack:
nat_set_package =
pack List Nat , 〈
n i l [Nat ] ,
isempty [Nat ] ,
𝜆n :Nat . fo ld [Nat ] [ Bool ]
(𝜆x :Nat .𝜆y : Bool . or y ( equal_nat n x) )
fa l s e ,
cons [Nat ] ,
𝜆n :Nat . fo ld [Nat ] [ L ist Nat ]
(𝜆x :Nat .𝜆 l : L i st Nat
i f ( equal_nat n x) [ List Nat ] l ( cons [Nat ] x l ) )
74 CHAPTER 6. ABSTRACTION
( n i l [Nat ] ) ,
𝜆 l : L i st Nat . l 〉
as ∃𝛼 : : * . NatSetImpl 𝛼 ;
By opening nat_set_package as nat_set in the environment using open
open nat_set_package as NatSet , nat_set ;
we are able to write one_two_three in System F𝜔:
one_two_three =
(add [ NatSet ] nat_set ) one
(( add [ NatSet ] nat_set ) two
((add [ NatSet ] nat_set ) three
(empty [ NatSet ] nat_set ) ) ) ;
If we look at the typing rules for existentials (Section 2.3.1), we can see that
the type which is packed (List Nat) is not present in the type of the package
(∃𝛼::*.NatSetImpl 𝛼) – it is replaced by a fresh type variable (𝛼). As with
OCaml’s abstract types, this means code outside of nat_set_package can only
pass the set values around and use the functions in nat_set_package to create
new ones, it cannot use values of type 𝛼 in any other way, because the it cannot
see the type’s definition.
This means that we can replace nat_set_impl with a more efficient imple-
mentation, safe in the knowledge that the change will not break code using
nat_set_package.
6.2.2 Relational abstraction
In the previous sections we have been quite loose in our description of abstrac-
tion. We have talked about abstraction as invariance under change of imple-
mentation, but we have not made this notion precise.
We can give precise descriptions of abstraction using relations between types.
To keep things simple we will restrict ourselves to System F for this discussion.
Changing set implementations
We have talked about abstraction in terms of a system being invariant under a
change of a component’s implementation that does not affect the component’s
interface. In order to give a precise definition to abstraction, we must consider
what it means to change a component’s implementation without affecting its
interface.
For example, consider the interface for sets of integers in Fig. 6.2. This is a
reduced version of the set interface used earlier in the chapter, with the addition
of the if_empty function. The if_empty function takes a set and two values as
arguments, if the set is empty it returns the first argument, otherwise it returns
the second argument.
Two implementations of this interface are shown in Fig. 6.3 and Fig. 6.4–
one based on lists, the other based on ordered trees.
6.2. ABSTRACTION IN SYSTEM FΩ 75
type t
val empty : t
val is_empty : t -> bool
val mem : t -> int -> bool
val add : t -> int -> t
val if_empty : t -> ’a -> ’a -> ’a
Figure 6.2: A set interface
type t list = int l i s t
l e t emptylist = [ ]
l e t is_emptylist = function
| [ ] -> true
| _ -> f a l s e
l e t rec memlist x = function
| [ ] -> f a l s e
| y : : r e s t ->
i f x = y then true
e l s e memlist x re s t
l e t addlist x t =
i f (memlist x t ) then t
e l s e x : : t
l e t if_emptylist t x y =
match t with
| [ ] -> x
| _ -> y
Figure 6.3: List implementation of the set interface
76 CHAPTER 6. ABSTRACTION
type ttree =
| Empty
| Node of ttree * int * ttree
l e t emptytree = Empty
l e t is_emptytree = function
| Empty -> true
| _ -> f a l s e
l e t rec memtree x = function
| Empty -> f a l s e
| Node( l , y , r ) ->
i f x = y then true
e l s e i f x < y then memtree x l
e l s e memtree x r
l e t rec addtree x t =
match t with
| Empty -> Node(Empty, x , Empty)
| Node( l , y , r ) as t ->
i f x = y then t
e l s e i f x < y then Node(addtree x l , y , r )
e l s e Node( l , y , addtree x r )
l e t if_emptytree t x y =
match t with
| Empty -> x
| _ -> y
Figure 6.4: Tree implementation of the set interface
6.2. ABSTRACTION IN SYSTEM FΩ 77
We would like to know that swapping one implementation with the other
will not affect the rest of our program. In other words, how do we show that
switching between these implementations will not affect the interface?
Relations between types
If the t types in our two implementations both represent sets then there must
be some relation between these that describes how sets in one representation
can be represented in the other representation.
In other words, given a set represented as a list and a set represented as a
tree there must be a relation that tells us if they represent the same set. For
example, the list [] and the tree Empty both represent the empty set. Similarly
the lists [1; 2] and [2; 1], and the trees Node(Node(Empty, 1, Empty), 2, Empty)
and Node(Empty, 1, Node(Empty, 2, Empty)) all represent a set containing 1
and 2.
Throughout this chapter we shall use relations of the following form:
(𝑥 ∶ 𝐴, 𝑦 ∶ 𝐵).𝜙[𝑥, 𝑦]
where A and B are System F types, and 𝜙[𝑥, 𝑦] is a logical formula involving 𝑥
and 𝑦.
We will not overly concern ourselves with the particular choice of logic used
for these formulae, but we will assume the existence of certain kinds of term:
• We will assume the that we have basic logical connectives:
𝜙 ∶∶= 𝜙 ∧ 𝜓 | 𝜙 ∨ 𝜓 | 𝜙 ⇒ 𝜓
• We will assume that we have universal quantification over terms, types
and relations:
𝜙 ∶∶= ∀𝑥 ∶ 𝐴.𝜙 | ∀𝛼.𝜙 | ∀𝑅 ⊂ 𝐴 × 𝐵.𝜙
and similarly for existential quantification:
𝜙 ∶∶= ∃𝑥 ∶ 𝐴.𝜙 | ∃𝛼.𝜙 | ∃𝑅 ⊂ 𝐴 × 𝐵.𝜙
• We will assume that we can apply a relation to terms:
𝜙 ∶∶= 𝑅(𝑡, 𝑢)
• We will assume that we have equality on terms at a given type:
𝜙 ∶∶= (𝑡 =𝐴 𝑢)
which represents the equational theory of System F (e.g. beta equalities,
eta equalities).
In the case of our set implementations we leave the precise logical formula
as an exercise for the reader, and will simply refer to the relation as 𝜎.
78 CHAPTER 6. ABSTRACTION
Relations between values
To show that the values in our implementations implement the same interface,
we must show that they have the same behaviour in terms of the sets being
represented. For each value, this equivalence of behaviour can be represented
by a relation. Considering each of the values in our set interface in turn:
empty The empty values of our implementations behave the same if they rep-
resent the same set. More precisely:
𝜎(empty
list
, empty
tree
)
where 𝜎 is the relation between sets as lists and sets as trees.
is_empty The is_empty values behave the same if they agree about which
sets are empty. They should return true on the same sets and false on the same
sets. More precisely:
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree .
𝜎(𝑥, 𝑦) ⇒ (is_empty
list
𝑥 = is_empty
tree
𝑦)
mem The mem values behave the same if they agree about which integers are
members of which sets. Their results should be the equivalent when given the
same sets and integers. More precisely:
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree . ∀𝑖 ∶ 𝐼𝑛𝑡. ∀𝑗 ∶ 𝐼𝑛𝑡.
𝜎(𝑥, 𝑦) ⇒ (𝑖 = 𝑗) ⇒ (memlist 𝑥 𝑖 = memtree 𝑦 𝑗)
add The relation for add values is similar to that for mem values, except that
instead of requiring that the results be equivalent we require that they represent
the same set:
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree . ∀𝑖 ∶ 𝐼𝑛𝑡. ∀𝑗 ∶ 𝐼𝑛𝑡.
𝜎(𝑥, 𝑦) ⇒ (𝑖 = 𝑗) ⇒ 𝜎(addlist 𝑥 𝑖, addtree 𝑦 𝑗)
if_empty The relation for if_empty is more complicated than the others. We
might be tempted to use the relation:
∀𝛾. ∀𝛿.
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree . ∀𝑎 ∶ 𝛾. ∀𝑏 ∶ 𝛾. ∀𝑐 ∶ 𝛿. ∀𝑑 ∶ 𝛿.
𝜎(𝑥, 𝑦) ⇒ (𝑎 = 𝑐) ⇒ (𝑏 = 𝑑) ⇒
(if_empty
list
𝑥 𝑎 𝑏 = if_empty
tree
𝑦 𝑐 𝑑)
which would ensure that the behaviour was the same for calls like:
if_empty t 5 6
6.2. ABSTRACTION IN SYSTEM FΩ 79
where t is a value representing a set. However, it would not ensure equivalent
behaviour for calls such as:
if_empty t t (add t 1)
where the second and third arguments are also sets. In this case, we do not
want to guarantee that our if_empty implementations will produce equivalent
sets when given equivalent inputs, since a set represented as a list will never be
equivalent to a set represented as a tree. Instead we would like to guarantee
that our implementations will produce related results when given related inputs.
This leads us to the much stronger relation:
∀𝛾. ∀𝛿. ∀𝜌 ⊂ 𝛾 × 𝛿.
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree . ∀𝑎 ∶ 𝛾. ∀𝑏 ∶ 𝛾. ∀𝑐 ∶ 𝛿. ∀𝑑 ∶ 𝛿.
𝜎(𝑥, 𝑦) ⇒ 𝜌(𝑎, 𝑐) ⇒ 𝜌(𝑏, 𝑑) ⇒
𝜌(if_empty
list
𝑥 𝑎 𝑏, if_empty
tree
𝑦 𝑐 𝑑)
that must be satisfied by our implementations of if_empty. This condition en-
sures that all relations will be preserved by if_empty, including equality between
integers and 𝜎 between sets.
The existence of the relation 𝜎 along with demonstrations that each of the
five relations above hold, is sufficient to demonstrate that our two implementa-
tions implement the same interface. One can safely be replaced by the other,
without affecting any of the other components in the system. By generalising
this approach we can produce a precise definition of abstraction.
The relational interpretation
The table in Fig. 6.5 compares the types of each of the values in our set interface
with the relations that they must satisfy. From this we can see that the type of
the value completely determines the relation:
• Every t in the type produces as 𝜎 in the relation.
• Every free type variable (e.g. int, bool) in the type produces an equality
in the relation.
• Every -> in the type produces an implication in the relation.
• Every universal quantification over types in the type produces a universal
quantification over relations in the relation.
We can represent this translation as an interpretation of types as relations.
Given a type 𝑇 with free variables ⃗⃗ ⃗⃗𝛼 = 𝛼1, … , 𝛼𝑛 and relations ⃗𝜌 = 𝜌1 ⊂
𝐴1 × 𝐵1, … , 𝜌𝑛 ⊂ 𝐴𝑛 × 𝐵𝑛, we define the relation J𝑇 K[ ⃗𝜌] ⊂ 𝑇 [ ⃗⃗ ⃗⃗ ⃗⃗𝐴] × 𝑇 [ ⃗⃗⃗ ⃗⃗𝐵] as
follows:
• if 𝑇 is 𝛼𝑖 then J𝑇 K[ ⃗𝜌] = 𝜌𝑖
80 CHAPTER 6. ABSTRACTION
val empty:
t 𝜎(empty
list
, empty
tree
)
val is_empty:
t -> bool
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree .
𝜎(𝑥, 𝑦) ⇒ (is_empty
list
𝑥 = is_empty
tree
𝑦)
val mem:
t -> int -> bool
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree . ∀𝑖 ∶ 𝐼𝑛𝑡. ∀𝑗 ∶ 𝐼𝑛𝑡.
𝜎(𝑥, 𝑦) ⇒ (𝑖 = 𝑗) ⇒
(memlist 𝑥 𝑖 = memtree 𝑦 𝑗)
val add:
t -> int -> t
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree . ∀𝑖 ∶ 𝐼𝑛𝑡. ∀𝑗 ∶ 𝐼𝑛𝑡.
𝜎(𝑥, 𝑦) ⇒ (𝑖 = 𝑗) ⇒
𝜎(addlist 𝑥 𝑖, addtree 𝑦 𝑗)
val if_empty:
t -> ’a -> ’a -> ’a
∀𝛾. ∀𝛿. ∀𝜌 ⊂ 𝛾 × 𝛿.
∀𝑥 ∶ 𝑡list . ∀𝑦 ∶ 𝑡tree . ∀𝑎 ∶ 𝛾. ∀𝑏 ∶ 𝛾. ∀𝑐 ∶ 𝛿. ∀𝑑 ∶ 𝛿.
𝜎(𝑥, 𝑦) ⇒ 𝜌(𝑎, 𝑐) ⇒ 𝜌(𝑏, 𝑑) ⇒
𝜌(if_empty
list
𝑥 𝑎 𝑏, if_empty
tree
𝑦 𝑐 𝑑)
Figure 6.5: Types and relations for the set interface
6.2. ABSTRACTION IN SYSTEM FΩ 81
• if 𝑇 is 𝑇 ′ × 𝑇 ″ then
J𝑇 K[ ⃗𝜌] = (𝑥 ∶ 𝑇 [ ⃗⃗ ⃗⃗ ⃗⃗𝐴], 𝑦 ∶ 𝑇 [ ⃗⃗⃗ ⃗⃗𝐵]).J𝑇 ′K[ ⃗𝜌](𝑓𝑠𝑡(𝑥), 𝑓𝑠𝑡(𝑦))
∧ J𝑇 ″K[ ⃗𝜌](𝑠𝑛𝑑(𝑥), 𝑠𝑛𝑑(𝑦))
• if 𝑇 is 𝑇 ′ + 𝑇 ″ then
J𝑇 K[ ⃗𝜌] = (𝑥 ∶ 𝑇 [ ⃗⃗ ⃗⃗ ⃗⃗𝐴], 𝑦 ∶ 𝑇 [ ⃗⃗⃗ ⃗⃗𝐵]).
∃𝑢′ ∶ 𝑇 ′[ ⃗⃗ ⃗⃗ ⃗⃗𝐴]. ∃𝑣′ ∶ 𝑇 ′[ ⃗⃗⃗ ⃗⃗𝐵].
𝑥 = 𝑖𝑛𝑙(𝑢′) ∧ 𝑦 = 𝑖𝑛𝑙(𝑣′)
∧ J𝑇 ′K[ ⃗𝜌](𝑢′, 𝑣′)
∨
∃𝑢″ ∶ 𝑇 ″[ ⃗⃗ ⃗⃗ ⃗⃗𝐴]. ∃𝑣″ ∶ 𝑇 ″[ ⃗⃗⃗ ⃗⃗𝐵].
𝑥 = 𝑖𝑛𝑟(𝑢″) ∧ 𝑦 = 𝑖𝑛𝑟(𝑣″)
∧ J𝑇 ″K[ ⃗𝜌](𝑢″, 𝑣″)
• if 𝑇 is 𝑇 ′ → 𝑇 ″ then
J𝑇 K[ ⃗𝜌] = (𝑓 ∶ 𝑇 [ ⃗⃗ ⃗⃗ ⃗⃗𝐴], 𝑔 ∶ 𝑇 [ ⃗⃗⃗ ⃗⃗𝐵]).
∀𝑢 ∶ 𝑇 ′[ ⃗⃗ ⃗⃗ ⃗⃗𝐴]. ∀𝑣 ∶ 𝑇 ′[ ⃗⃗⃗ ⃗⃗𝐵].J𝑇 ′K[ ⃗𝜌](𝑢, 𝑣) ⇒ J𝑇 ″K[ ⃗𝜌](𝑓 𝑢, 𝑔 𝑣)
• if 𝑇 is ∀𝛽.𝑇 ′ then
J𝑇 K[ ⃗𝜌] = (𝑥 ∶ 𝑇 [ ⃗⃗ ⃗⃗ ⃗⃗𝐴], 𝑦 ∶ 𝑇 [ ⃗⃗⃗ ⃗⃗𝐵]).
∀𝛾. ∀𝛿. ∀𝜌′ ⊂ 𝛾 × 𝛿.J𝑇 ′K[ ⃗𝜌, 𝜌′](𝑥[𝛾], 𝑦[𝛿])
• if 𝑇 is ∃𝛽.𝑇 ′ then
J𝑇 K[ ⃗𝜌] = (𝑥 ∶ 𝑇 [ ⃗⃗ ⃗⃗ ⃗⃗𝐴], 𝑦 ∶ 𝑇 [ ⃗⃗⃗ ⃗⃗𝐵]).
∃𝛾. ∃𝛿. ∃𝜌′ ⊂ 𝛾 × 𝛿.
∃𝑢 ∶ 𝑇 ′[ ⃗⃗ ⃗⃗ ⃗⃗𝐴, 𝛾]. ∃𝑣 ∶ 𝑇 ′[ ⃗⃗⃗ ⃗⃗𝐵, 𝛿].
𝑥 = pack 𝛾, 𝑢 as 𝑇 [ ⃗⃗ ⃗⃗ ⃗⃗𝐴]
∧ 𝑦 = pack 𝛿, 𝑣 as 𝑇 [ ⃗⃗⃗ ⃗⃗𝐵]
∧ J𝑇 ′K[ ⃗𝜌, 𝜌′](𝑢, 𝑣)
Using this relational interpretation, the relation that our two set implemen-
tations must satisfy to show that they are implementing the same interface can
82 CHAPTER 6. ABSTRACTION
be written:
J𝛼
× (𝛼 → 𝛾)
× (𝛼 → 𝛽 → 𝛾)
× (𝛼 → 𝛽 → 𝛼)
× (∀𝛿. 𝛼 → 𝛿 → 𝛿 → 𝛿)K[𝜎, =Int, =Bool](setlist , settree)
where setlist and settree are products containing the implementations of set using
lists and trees respectively.
The relational interpretation can be thought of as representing equality be-
tween values of that type under the assumption that the substituted relations
represent equality for values of the free type variables.
A relational definition of abstraction
Using the relational interpretation we can now give a precise meaning to the
idea that existential types provide abstraction.
Given a type 𝑇 with free variables 𝛼, 𝛽1, … , 𝛽𝑛:
∀𝐵1. … ∀𝐵𝑛. ∀𝑥 ∶ (∃𝛼.𝑇 [𝛼, 𝐵1, … , 𝐵𝑛]). ∀𝑦 ∶ (∃𝛼.𝑇 [𝛼, 𝐵1, … , 𝐵𝑛]).
𝑥 = 𝑦 ⇔
∃𝛾. ∃𝛿. ∃𝜎 ⊂ 𝛾 × 𝛿.
∃𝑢 ∶ 𝑇 [𝛾, 𝐵1, … , 𝐵𝑛]. ∃𝑣 ∶ 𝑇 [𝛿, 𝐵1, … , 𝐵𝑛].
𝑥 = pack 𝛾, 𝑢 as ∃𝛼.𝑇 [𝛼, 𝐵1, … , 𝐵𝑛]
∧ 𝑦 = pack 𝛿, 𝑣 as ∃𝛼.𝑇 [𝛼, 𝐵1, … , 𝐵𝑛]
∧ J𝑇 K[𝜎, =𝐵1 , … , =𝐵𝑛 ](𝑢, 𝑣)
This formula can be read as: For two values 𝑥 and 𝑦 with existential type,
if there is a way to view their implementation types (𝛾 and 𝛿) as representing
the same thing – captured by the relation 𝜎 – and their implementations (𝑢 and
𝑣) behave the same with respect to 𝜎, then 𝑥 and 𝑦 are equal: they will behave
the same in all contexts.
This is the essence of abstraction: if two implementations behave the same
with respect to some relation, then once they have been packed into an existen-
tial type they are indistinguishable.
6.2.3 Invariants
Now that we have a precise description of abstraction, we can talk about the
implications of abstraction beyond the ability to replace one implementation
with another. In particular, the ability of abstraction to preserve invariants on
types.
We can represent an invariant 𝜙[𝑥] on a type 𝛾 as a relation 𝜌 ⊂ 𝛾 × 𝛾:
𝜌(𝑥 ∶ 𝛾, 𝑦 ∶ 𝛾) = (𝑥 = 𝑦) ∧ 𝜙[𝑥]
6.2. ABSTRACTION IN SYSTEM FΩ 83
Using this representation, J𝑇 K[𝜌](𝑢, 𝑢) holds for some value 𝑢 of type 𝑇 [𝛾] iff 𝑢
preserves the invariant 𝜙 on type 𝛾.
Given
• a type 𝑇 with free variable 𝛼
• a type 𝛾
• a value 𝑢 of type 𝑇 [𝛾]
• an expression 𝐸 with free variable 𝑥 such that if 𝑥 has type 𝛽 then 𝐸 also
has type 𝛽
it can be shown from the abstraction property of existentials that:
∀𝜌 ⊂ 𝛾 × 𝛾. J𝑇 K[𝜌](𝑢, 𝑢) ⇒
𝜌(
open (pack 𝛾, 𝑢 as ∃𝛾. 𝑇 [𝛾]) as 𝑥, 𝛾 in 𝐸,
open (pack 𝛾, 𝑢 as ∃𝛾. 𝑇 [𝛾]) as 𝑥, 𝛾 in 𝐸
)
This means that if 𝑢 preserves an invariant on type 𝛾 – represented by
relation 𝜌 – and 𝑢 is packed up into an existential type then the invariant will
hold for any value of the (now abstract) type 𝛾 that are created from 𝑢.
In other words, if we can show that an implementation of an interface pre-
serves an invariant on an abstract type, then that invariant holds for all values
of that abstract type in the program.
6.2.4 Identity extension
The relational interpretation can be thought of as representing equality between
values of that type under the assumption that the substituted relations represent
equality for values of the free type variables.
In particular, given a type 𝑇 with free variables 𝛼1, … , 𝛼𝑛, if we substitute
equality relations (=) for a type’s free variables we get the equality relation of
that type:
∀𝐴1. … ∀𝐴𝑛. ∀𝑥 ∶ 𝑇 [𝐴1, … , 𝐴𝑛]. ∀𝑦 ∶ 𝑇 [𝐴1, … , 𝐴𝑛].
(𝑥 =𝑇 [𝐴1,…,𝐴𝑛] 𝑦) ⇔ J𝑇 K[=𝐴1 , … , =𝐴𝑛 ](𝑥, 𝑦)
This property of the relational interpretation is known as identity extension.
The abstraction property of existentials follow as a direct consequence of identity
extension.
Identity extension is a key property of System F which is often overlooked
when discussing the soundness of extensions to it. The traditional approach of
proving soundness by showing preservation and progress (Wright and Felleisen
[1994]) does not demonstrate that a system maintains identity extension, and
thus ignores the key question of whether abstraction is preserved by that system.
In the next chapter we will look at another consequence of identity extension:
parametricity.
84 CHAPTER 6. ABSTRACTION