From JavaScript to Haskell (via PureScript)
Learning Outcomes
· Compare a lambda-calculus inspired Haskell-like language (PureScript) with the functional programming concepts explored earlier in JavaScript
· Understand how tail call optimisation is applied in languages which support it
Introduction
JavaScript is a multiparadigm language that—due to its support for functions as objects, closures and, therefore, higher-order functions—is able to be used in a functional programming style. However, if you are really enamoured with currying and combining higher-order functions, then it really makes a lot of sense to use a language that is actually designed for it.
There are a number of purpose-built Functional Programming languages. Lisp (as we have already discussed) is the original, but there are many others. Scheme is a Lisp derivative, as is (more recently) Clojure. SML and its derivatives (e.g. OCaml, F#, etc.) form another family of functional programming languages. However, the strongest effort to build a language that holds to the principles of lambda-calculus inspired functional programming such as immutability (purity) is the Haskell family.
There are a number of efforts to bring Haskell-like purity to web programming, inspired by the potential benefits the functional-style holds for managing complex state in asynchronous and distributed applications. Firstly, it is possible to compile Haskell code directly to JavaScript (using GHCJS) although the generated code is opaque and requires a runtime. Another promising and increasingly popular haskell-inspired language for client-side web development is Elm, although this again requires a runtime. Also, Elm is rather specialised for creating interactive web apps.
The JavaScript-targeting Haskell derivative we are going to look at now is PureScript. The reason for this choice is that PureScript generates standalone and surprisingly readable JavaScript. For a full introduction to the language, the PureScript Book, written by the language’s creator, is available for free. However, in this unit we will only make a brief foray into PureScript as a segue from JavaScript to Haskell. To avoid overwhelming ourselves with minor syntactic differences we will also endeavor to stick to a subset of PureScript that is syntactically the same as Haskell.
Hello Functional Language!
Without further ado, here is some PureScript code. Fibonacci number computation is often called the “hello world!” of functional programming:
fibs :: Int -> Int
fibs 0 = 1
fibs 1 = 1
fibs n = fibs (n-1) + fibs (n-2)
Woah! A function for Fibonacci numbers that is about as minimal as you can get! And the top line, which just declares the type of the function, is often optional – depending on whether the compiler can infer it from the context. Having said that, it’s good practice to include a type declaration, especially for top-level functions (functions defined without indentation and therefore in-scope everywhere in the file). This function takes an Int (integer) parameter, and returns an Int. Note that the arrow shorthand for the function type definition is highly reminiscent of the JavaScript fat-arrow (=>) though skinnier.
The next three lines define the actual logic of the function, which very simply gives a recursive definition for the nth Fibonacci number. This definition uses a feature common to many functional programming languages: pattern matching. That is, we define the fibs function three times, with the first two definitions handling the base cases. It says, literally: “the 0th and 1st fibs are both 1”. The last line defines the general case, that the remaining fibonacci numbers are each the sum of their two predecessors. Note, this definition is not perfect. Calling:
fibs -1
would be a bad idea. Good practice would be to add some exceptions for incorrect input to our function. In a perfect world we would have a compiler that would check types dependent on values (actually, languages that support dependent types exist, e.g. the Idris language is an interesting possible successor to Haskell in this space).
One thing you will have noticed by now is that Haskell-like languages are light on syntax. Especially, use of brackets is minimal, and typically to be avoided when evaluation order can be inferred correctly by the compiler’s application of lambda-calculus inspired precedence rules for function and operator application.
We can define a main function for our program, that maps the fibs function to a (Nil-terminated) linked-list of numbers and displays them to the console like so:
main = log $ show $ map fibs $ 1..10
and here’s the output when you run it from the command line:
(1 : 2 : 3 : 5 : 8 : 13 : 21 : 34 : 55 : 89 : Nil)
I’m omitting the type declaration for main because the type for functions that have input-output side-effects is a little more complicated, differs from haskell – and the compiler doesn’t strictly need it yet anyway.
The above definition for main is a chain of functions and the order of evaluation (and hence how you should read it) is right-to-left. The $ symbol is actually shorthand for brackets around everything to the symbol’s right. In other words, the above definition for main is equivalent to:
main = log ( show ( map fibs ( 1..10 )))
The $ is not special syntax (i.e. it is not a keyword in the language definition). Rather, it is an operator defined in the PureScript Prelude like so:
infixr 0 apply as $
That is, $ is an infix, right associative operator with binding precedence 0 (the lowest) that invokes the apply function:
apply f x = f x
Woah! What is f and what is x? Well, in PureScript functions are generic by default – but we (and the compiler) can infer, since f x is a function call with argument x, that f is a function and x is… anything. So apply literally applies the function f to the argument x. Since the binding precedence of the $ operator is so low compared to most things that could be placed to its right, brackets are (usually) unnecessary.
Exercise
· If one didn’t happen to like the fact that function chaining with the $ operator reads right to left, how would one go about creating an operator that chains left to right? (Hint: infixl is a thing and you will need to make a slightly different apply function also).
So anyway, back to the chain of functions in main:
main = log $ show $ map fibs $ 1..10
log is a function that wraps JavaScript’s console.log show is a function that is overloaded to convert various types to strings. In this case, we’ll be showing a List of Int. map is (equivalent to our old friend from our JavaScript exercises) a function that applies a function to stuff inside a… let’s call it a container for now… in this case our Container is a List. 1..10 uses the .. (range) infix operator to create a List of Int between 1 and 10.
Peeking under the hood
So all this may seem pretty foreign, but actually, since we’ve already covered many of the functional programming fundamentals in JavaScript, let’s take a look at the JavaScript code that the PureScript compiler generates for fibs and main and see if anything looks familiar. Here’s fibs, exactly as it comes out of the compiler:
var fibs = function (v) {
if (v === 0) {
return 1;
};
if (v === 1) {
return 1;
};
return fibs(v – 1 | 0) + fibs(v – 2 | 0) | 0;
};
Woah! It’s pretty much the way a savvy JavaScript programmer would write it. The one part that may look a bit unusual are the expressions like v – 1 | 0. Of course, JavaScript has no Int type, so this is PureScript trying to sensibly convert to the all-purpose JavaScript number type. The | is a bitwise OR, so |0 ensures that resulting expression is an integer which is both a safety measure and a potential optimisation. It’s a situation where the declared types give the PureScript compiler more information about the intent of the code than would otherwise be present in JavaScript, and which it’s able to use to good effect.
At first glance the code generated for main is a bit denser. Here it is, again as generated by the compiler but I’ve inserted some line breaks so we can see it a little more clearly:
var main = Control_Monad_Eff_Console.log(
Data_Show.show(
Data_List_Types.showList(Data_Show.showInt)
)(
Data_Functor.map
(Data_List_Types.functorList)(fibs)(Data_List.range(1)(10))
)
);
Each of the functions lives in an object that encapsulates the module where it is defined. That’s pretty standard JavaScript practice. The rest is just function calls (application). The call to the range function is interesting:
Data_List.range(1)(10)
Woah! It’s a curried function! Data_List.range(1) returns a function that creates lists of numbers starting from 1. The second call specifies the upper bound.
Exercise
· What other functions called in the JavaScript code generated for the above definition of main are curried? Why?
Tail Call Optimisation
Our definition for fibs was recursive. This has a nice declarative style about it. The definition is very close to a mathematical definition. But at some point in your training for imperative programming you will have most likely been told that recursion is evil and inefficient. Indeed, we’ve seen at the start of this course that there is overhead due to creating new stack frames for each function call. Looping recursively creates a new stack frame for each iteration and so our (finite) stack memory will be consumed linearly with the number of iterations. However, there are certain patterns of recursive function calls that our compiler can easily recognise and replace with an iterative loop. We can see this happening directly in PureScript if we reconfigure our fibs definition to use a tail call.
fibs n = f n 0 1
where
f 0 _ b = b
f i a b = f (i-1) b (a+b)
In general, as we have seen with $, PureScript (and Haskell) have relatively few keywords, instead preferring functions and operators built with the language itself in the Prelude (the base library functions that are available by default). The where keyword, however, is one of the exceptions. It allows us to make some local definitions inside the scope of the function. Here we define f whose first parameter is an iteration counter, whose base case is 0. The key feature of f is that its recursive call is the very last thing to happen in the function body. That is, it is in the tail position.
The other important aspect of PureScript that we are encountering for the first time in the above definition is that indentation is used to determine scope (as in python).
Here’s the JavaScript that is generated this time:
var fibs = function (n) {
var f = function ($copy_v) {
return function ($copy_v1) {
return function ($copy_b) {
var $tco_var_v = $copy_v;
var $tco_var_v1 = $copy_v1;
var $tco_done = false;
var $tco_result;
function $tco_loop(v, v1, b) {
if (v === 0) {
$tco_done = true;
return b;
};
$tco_var_v = v – 1 | 0;
$tco_var_v1 = b;
$copy_b = v1 + b | 0;
return;
};
while (!$tco_done) {
$tco_result = $tco_loop($tco_var_v, $tco_var_v1, $copy_b);
};
return $tco_result;
};
};
};
return f(n)(0)(1);
};
Obviously, it’s a less direct translation than was generated for our previous version of fibs. However, you can fairly easily understand it still. Hint, the tco_ prefix in many of the generated variable names stands for “Tail Call Optimisation” and the local function f is a curried function, as are all functions of more than one argument in PureScript. The important thing is that the recursive call is gone, replaced by a while loop.
We have seen all we need for now of PureScript. It’s a small but nicely put together language. It takes the best features of Haskell and reinterprets some of them quite cleverly to achieve relatively seamless interop with JavaScript. However, it’s still a bit niche. For the remainder of this unit we’ll dive more deeply into Haskell, which has a long history and is supported by a very large and active community across academia and industry.
Creating and Running Haskell Programs
Learning Outcomes
· Use the GHCi REPL to test Haskell programs and expressions
· Compare the syntax of Haskell programs to Functional-style code in JavaScript
· Understand that by default Haskell uses a lazy evaluation strategy
· Create and use Haskell lists and tuples
· Create Haskell functions using pattern-matching, guards, and local definitions using where and let clauses.
Introduction
This section is not your usual “First Introduction To Haskell” because it assumes you have arrived here having already studied some reasonably sophisticated functional programming concepts in JavaScript, basic parametric polymorphic types in TypeScript, and Lambda Calculus. Familiarity with higher-order and curried functions is assumed.
I try to summarise the syntax we use with “cheatsheets” throughout, but the official CheatSheet will be a useful reference. The other reference every Haskell programmer needs is the official library docs on Hackage which is searchable by the excellent Hoogle search engine, which lets you search by types as well as function keywords.
If you would like a more gradual introduction, “ from First Principles” by Allen and Moronuki is a recent and excellent introduction to Haskell that is quite compatible with the goals of this course. The ebook is not too expensive, but unfortunately, it is independently published and hence not available from our library. There are a few copies of Programming Haskell by Hutton which is an excellent academic textbook, but it’s expensive to buy. ”Learn you a Haskell” by is a freely available alternative that is also a useful introduction.
Starting with the GHCi REPL
A good way to get started with haskell is simply to experiment with the GHCi REPL (Read Eval Print Loop).
Start by making a file: fibs.hs
fibs 0 = 1 — two base cases,
fibs 1 = 1 — resolved by pattern matching
fibs n = fibs (n-1) + fibs (n-2) — recursive definition
Then load it into GHCi like so:
$ stack ghci fibs.hs
You’ll get a prompt that looks like:
Prelude>
Where the text left of the > tells you what libraries are loaded into the current scope. “Prelude” is the default library which already includes everything we need for this chapter.
You can enter haskell expressions directly at the prompt:
Prelude> fibs 6
13
I’m going to stop showing the prompt now, but you can enter all of the following directly at the prompt and you will see similar results printed to those indicated below.
Basic logic operators are similar to C/Java/etc: ==, &&, ||.
fibs 6 == 13
True
An exception is “not-equal”, whose operator is /= (Haskell tends to prefer more “mathy” syntax whenever possible).
fibs 6 /= 13
False
If-then-else expressions return a result (like javascript ternary ? 🙂
if fibs 6 == 13 then “yes” else “no”
“yes”
if fibs 6 == 13 && fibs 7 == 12 then “yes” else “no”
“no”
GHCi also has a number of non-haskell commands you can enter from the prompt, they are prefixed by “:”. You can reload your .hs file into ghci after an edit with :r. Type :h for help.
You can declare local variables in the repl:
x = 3
y = 4
x + y
7
One of Haskell’s distinguishing features from other languages is that its variables are strictly immutable (i.e. not variable at all really). This is similar to variables declared with JavaScript’s const keyword – but everything in Haskell is deeply immutable by default.
However, in the GHCi REPL, you can redefine variables and haskell will not complain.
Creating a Runnable Haskell Program
Both the simplest and tail-recursive versions of our PureScript fibs code are also perfectly legal Haskell code. The main function will be a little different, however:
main :: IO ()
main = print $ map fibs [1..10]
I’ve included the type signature for main although it’s not absolutely necessary (the compiler can usually infer the type of such functions automatically, as it did for our fibs function definition above), but it is good practice to define types for all top-level functions (functions that are not nested inside other functions) and also the IO type is interesting, and will be discussed at length later. The main function takes no inputs (no need for -> with something on the left) and it returns something in the IO monad. Without getting into it too much, yet, monads are special types that can also wrap some other value. In this case, the main function just does output, so there is no wrapped value and hence the () (called unit) indicates this. You can think of it as being similar to the void type in C, Java or TypeScript.
What this tells us is that the main function produces an IO side effect. This mechanism is what allows Haskell to be a pure functional programming language while still allowing you to get useful stuff done. Side effects can happen, but when they do occur they must be neatly bundled up and declared to the type system, in this case through the IO monad. For functions without side-effects, we have strong, compiler-checked guarantees that this is indeed so (that they are pure).
By the way, once you are in the IO monad, you can’t easily get rid of it. Any function that calls a function that returns an IO monad, must have IO in its return type. Thus, effectful code is possible, but the type system ensures we are aware of it and can limit its taint. The general strategy is to use pure functions wherever possible, and push the effectful code as high in your call hierarchy as possible – that is, limit the size and scope of impure code as much as possible. Pure functions are much more easily reusable in different contexts.
The print function is equivalent to the PureScript log $ show. That is, it uses any available show function for the type of value being printed to convert it to a string, and it then prints that string. Haskell defines show for many types in the Prelude, but print in this case invokes it for us. The other difference here is that square brackets operators are defined in the prelude for linked lists. In PureScript they were used for Arrays – which (in PureScript) don’t have the range operator (..) defined so I avoided them. Speaking of List operators, here’s a summary:
Basic List and Tuple Operator Cheatsheet
The default Haskell lists are cons lists (linked lists defined with a cons function), similar to those we defined in JavaScript.
[] — an empty list
[1,2,3,4] — a simple lists of values
[1..4] — ==[1,2,3,4] (..) is range operator
1:[2,3,4] — ==[1,2,3,4], use `:` to “cons” an element to the start of a list
1:2:3:[4] — ==[1,2,3,4], you can chain `:`
[1,2]++[3,4] — ==[1,2,3,4], i.e. (++) is concat
— You can use `:` to pattern match lists in function definitions.
— Not the enclosing `()` to delimit the pattern for the parameter.
length [] = 0
length (x:xs) = 1 + length xs
— (although you don’t need to define `length`, it’s already loaded by the prelude)
length [1,2,3]
3
Some other useful functions for dealing with lists:
head [1,2,3] — 1
tail [1,2,3] — [2,3]
sum [1,2,3] — 6 (but only applicable for lists of things that can be summed)
minimum [1,2,3] — 1 (but only for Ordinal types)
maximum [1,2,3] — 3
map f [1,2,3] — maps the function f over the elements of the list returning the result in another list
Tuples are fixed-length collections of values that may not necessarily be of the same type. They are enclosed in ()
t = (1,”hello”) — define variable t to a tuple of an Int and a String.
fst t
1
snd t
“hello”
And you can destructure and pattern match tuples:
(a,b) = t
a
1
b
“hello”
Note that we created tuples in JavaScript using [] – actually they were fixed-length arrays, don’t confuse them for Haskell lists or tuples.
Lazy by Default
Haskell strategy for evaluating expressions is lazy by default – that is it defers evaluation of expressions until it absolutely must produce a value. Laziness is of course possible in other languages (as we have seen in JavaScript), and there are many lazy data-structures defined and available for PureScript (and most other functional languages). Conversely, Haskell can be forced to use strict evaluation and has libraries of datastructures with strict semantics if you need them.
However, lazy by default sets Haskell apart. It has pros and cons, on the pro side:
· It can make certain operations more efficient, for example, we have already seen in JavaScript how it can make streaming of large data efficient
· It can enable infinite sequences to be defined and used efficiently (this is a significant semantic difference)
· It opens up possibilities for the compiler to be really quite smart about its optimisations.
But there are definitely cons:
· It can be hard to reason about run-time performance
· Mixing up strict and lazy evaluation (which can happen inadvertently) can lead to (for example) O(n2) behaviour in what should be linear time processing.
A Side Note on the Y Combinator
The Haskell way of defining Lambda (anonymous) functions is heavily inspired by Lambda Calculus, but also looks a bit reminiscent of the JavaScript arrow syntax:
JavaScript
x=>x
Lambda Calculus
λx. x
Haskell
\x -> x
Since it’s lazy-by-default, it’s possible to transfer the version of the Y-combinator we explored in Lambda Calculus into haskell code almost as it appears in Lambda Calculus:
y = \f -> (\x -> f (x x)) (\x -> f (x x))
However, to get it to type-check one has to either write some gnarly type definitions or force the compiler to do some unsafe type coercion. The following (along with versions of the Y-Combinator that do type check in haskell) are from an excellent Stack Overflow post:
import Unsafe.Coerce
y :: (a -> a) -> a
y = \f -> (\x -> f (unsafeCoerce x x)) (\x -> f (unsafeCoerce x x))
main = putStrLn $ y (“circular reasoning works because ” ++)
Functional Programming in Haskell versus JavaScript =p) =p) Guards case Data Types and Type Classes Type Kinds Another sort of “kind” are for type classes which we will introduce more properly in a moment. For example, the “kind” for the Ord type class (the class of things that are Orderable and which we came across in our simple implementation of quicksort) is: Creating custom instances of type classes > Spade < Heart
True
The Show typeclass allows the data to be converted to strings with the show function (e.g. so that GHCi can display it). The Enum typeclass allows enumeration, e.g.:
> [Spade .. Heart] For example, the built-in lookup function can be used to search a list of key-value pairs, and fail gracefully by returning Nothing if there is no matching key. > :t lookup > lookup “Fred” phonebook > lookup “Tim” phonebook *GHCi> printNumber “Fred” Functor and Applicative
Consider the following pseudocode for a simple recursive definition of the Quick Sort algorithm:
QuickSort list:
Take head of list as a pivot
Take tail of list as rest
return
QuickSort( elements of rest < pivot ) ++ (pivot : QuickSort( elements of rest >= pivot ))
We’ve added a bit of notation here: a : l inserts a (“cons”es) to the front of a list l ; l1 ++ l2 is the concatenation of lists l1 and l2.
In JavaScript the fact that we have anonymous functions through compact arrow syntax and expression syntax if (with ? : ) means that we can write pure functions that implement this recursive algorithm in a very functional, fully-curried style. However, the language syntax really doesn’t do us any favours!
For example,
const
sort = lessThan=>
list=> !list ? null :
(pivot=>rest=>
(lesser=>greater=>
concat(sort(lessThan)(lesser))
(cons(pivot)(sort(lessThan)(greater)))
)(filter(a=> lessThan(a)(pivot))(rest))
(filter(a=> !lessThan(a)(pivot))(rest))
)(head(list))(tail(list))
Consider, the following, more-or-less equivalent haskell implementation:
sort [] = []
sort (pivot:rest) = lesser ++ [pivot] ++ greater
where
lesser = sort $ filter (
An essential thing to know before trying to type in the above function is that Haskell delimits the scope of multi-line function definitions (and all multiline expressions) with indentation (complete indentation rules reference here). The where keyword lets us create multiple function definitions that are visible within the scope of the parent function, but they must all be left-aligned with each other and to the right of the start of the line containing the where keyword.
Haskell also helps with a number of other language features.
First, is pattern matching. Pattern matching is like function overloading that you may be familiar with from languages like Java or C++ – where the compiler matches the version of the function to invoke for a given call by matching the type of the parameters to the type of the call – except in Haskell the compiler goes a bit deeper to inspect the values of the parameters.
There are two declarations of the sort function above. The first handles the base case of an empty list. The second handles the general case, and pattern matching is again used to destructure the lead cons expression into the pivot and rest variables. No explicit call to head and tail functions is required.
The next big difference between our Haskell quicksort and our previous JavaScript definition is the Haskell style of function application – which has more in common with lambda calculus than JavaScript. The expression f x is application of the function f to whatever x is.
Another thing that helps with readability is infix operators. For example, ++ is an infix binary operator for list concatenation. The : operator for cons is another. There is also the aforementioned $ which gives us another trick for removing brackets, and finally, the < and >= operators. Note, that infix operators can also be curried and left only partially applied as in (
I deliberately avoided the type declaration for the above function because, (1) we haven’t really talked about types properly yet, and (2) because I wanted to show off how clever Haskell type inference is. However, it is actually good practice to include the type signature. If one were to load the above code, without type definition, into GHCi (the EPL), one could interrogate the type like so:
> :t sort
sort :: Ord t => [t] -> [t]
Thus, the function sort has a generic type-parameter t (we’ll talk more about such parametric polymorphism in haskell later) which is constrained to be in the Ord type class (anything that is orderable – we’ll talk more about type classes too). It’s input parameter is a list of t, as is its return type. This is also precisely the syntax that one would use to declare the type explicitly. Usually, for all top-level functions in a Haskell file it is good practice to explicitly give the type declaration. Although, it is not always necessary, it can avoid ambiguity in many situations, and secondly, once you get good at reading Haskell types, it becomes useful documentation.
Here’s another refactoring of the quick-sort code. This time with type declaration because I just said it was the right thing to do:
sort :: Ord t => [t] -> [t]
sort [] = []
sort (pivot:rest) = below pivot rest ++ [pivot] ++ above pivot rest
where
below p = partition (
partition comparison = sort . filter comparison
The list parameter for below and above has been eta-reduced away just as we were able to eta-reduce lambda calculus expressions. The definition of the partition function in this version uses the . operator for function composition. That is, partition comparison is the composition of sort and filter comparison and again the list parameter is eta-reduced away.
Although it looks like the comparison parameter could also go away here with eta conversion, actually the low precedence of the . operator means there is (effectively) implicit parentheses around filter comparison. We will see how to more agressively refactor code to be point-free later.
The idea of refactoring our code into the above form was to demonstrate the freedom that Haskell gives us to express logic in a way that makes sense to us. This version reads almost like a natural language declarative definition of the algorithm. That is, you can read:
sort (pivot:rest) = below pivot rest ++ [pivot] ++ above pivot rest
as:
the sort of a list where we take the first element as the ‘pivot’ and everything after the list as ‘rest’ is everything that is below pivot in rest,
concatenated with a list containing just the pivot,
concatenated with everything that is above pivot in rest.
Haskell has a number of features that allow us to express ourselves in different ways. Above we used a where clause to give a post-hoc, locally-scoped declaration of the below and above functions. Alternately, we could define them at the start of the function body with let
sort :: Ord t => [t] -> [t]
sort [] = []
sort (pivot:rest) = let
below p = partition (
in
below pivot rest ++ [pivot] ++ above pivot rest
where
partition comparison = sort . filter comparison
Note that where is only available in function declarations, not inside expressions and therefore is not available in a lambda. However, let, in is part of the expression, and therefore available inside a lambda function. A silly example would be: \i -> let f x = 2*x in f i, which could also be spread across lines, but be careful to get the correct indentation.
Conditional Code Constructs Cheatsheet
Pattern matching
Provides alternative cases for function definitions matching different values or possible destructurings of the function arguments (more detail). As per examples above and:
fibs 0 = 1
fibs 1 = 1
fibs n = fibs (n-1) + fibs (n-2)
if-then-else
If
just like javascript ternary if operator:
fibs n = if n==0 then 1 else if n==1 then 1 else fibs (n-1) + fibs (n-2)
Can test Bool expressions (i.e. not just values matching as in pattern matching)
fibs n
| n == 0 = 1
| n == 1 = 1
|otherwise = fibs (n-1) + fibs (n-2)
fibs n = case n of
0 -> 1
1 -> 1
otherwise -> fibs (n-1) + fibs (n-2)
Learning Outcomes
· Define data structures using Haskell’s Algebraic Data Types and use pattern matching to define functions that handle each of the possible instances
· Use the alternate record syntax to define data structures with named fields
· Understand that Haskell type classes are similar to TypeScript interfaces in providing a definition for the set of functions that must be available for instances of those type classes and that typeclasses can extend upon one another to create rich hierarchies
· Understand that the Maybe type provides an elegant way to handle partial functions.
Algebraic Data Types
We can declare custom types for data in Haskell using the data keyword. Consider the following declaration of our familiar cons list:
data ConsList = Nil | Cons Int ConsList
The | operator looks rather like the union type operator in TypeScript, and indeed it serves a similar purpose. Here, a ConsList is defined as being a composite type, composed of either Null or a Cons of an Int value and another ConsList. This is called an “algebraic data type” because | is like an “or”, or algebraic “sum” operation for combining elements of the type while separating them with a space is akin to “and” or a “product” operation.
Note that neither Nil or Cons are built in. They are simply labels for constructor functions for the different versions of a ConsList node. You could equally well call them EndOfList and MakeList or anything else that’s meaningful to you. Nil is a function with no parameters, Cons is a function with two parameters. Int is a built-in primitive type for limited-precision integers.
Now we can create a small list like so:
l = Cons 1 $ Cons 2 $ Cons 3 Nil
Pattern Matching
In Haskell, we can define multiple versions of a function to handle the instances of an algebraic data types. This is done by providing a pattern in the variable list of the function definition, in the form of an expression beginning with the constructor of the data instance (e.g. Cons or Nil) and variable names which will be bound to the different fields of the data instance.
For example, we can create a function to determine a ConsList’s length using pattern matching; to not only create different definitions of the function for each of the possible instances of a ConsList, but also to destructure the non-empty Cons:
consLength :: ConsList -> Int
consLength Nil = 0
consLength (Cons _ rest) = 1 + consLength rest
Since we don’t care about the head value in this function, we match it with _, an unnamed variable, which effectively ignores it. Note that another way to conditionally destructure with pattern matching is using a case statement.
Note that such a definition for lists is made completely redundant by Haskell’s wonderful built-in lists, where [] is the empty list, and : is an infix cons operator. We can pattern match the empty list or destructure (head:rest), e.g.:
intListLength :: [Int] -> Int — takes a list of Int as input and returns an Int
intListLength [] = 0
intListLength (_:rest) = 1 + intListLength rest
Type Parameters and Polymorphism
Similar to typescript Haskell provides parametric polymorphism. That is, the type definitions for functions and data structures (defined with data like the ConsList above) can have type parameters (AKA type variables). For example, the definition intListLength above is defined to only work with lists with Int elements. This seems a silly restriction because in this function we don’t actually do anything with the elements themselves. Below, we introduce the type parameter a so that the length function will able to work with lists of any type of elements.
length :: [a] -> Int — a is a type parameter
length [] = 0
length (_:rest) = 1 + length rest
The following visual summary shows pair data structures with accessor functions fst and sec defined using Record Syntax with varying degrees of type flexibility, and compared with the equivalent typescript generic notation:
· Hard-coded for Int pairs only
· with one type parameter (by convention called a in Haskell, and T in TypeScript)
· with two type parameters such that the two elements may be different types
GHCi allows you to use the :kind (or :k) command to interrogate the Kind of types – think of it as “meta information” about types and their type parameters. The kind syntax indicates the arity or number of type parameters a type has. Note that it is like the syntax for function types (with the ->), you can think of it as information about what is required in terms of type parameters to instantiate the type. If the constructor takes no type parameters the kind is just *, (it returns a type), *->* if it takes one type parameter, *->*->* for two type parameters and so on.
> :k :: * -> Constraint
This tells us that Ord takes one type parameter (for example it could be an Int or other numeric type, or something more complex like the Student type below), and returns a Constraint rather than an actual type. Such a constraint is used to narrow the set of types to which a function may be applied, just as we saw Ord being used as the type constraint for sort:
> :t sort
sort :: Ord t => [t] -> [t]
Record Syntax
Consider the following simple record data type:
data Student = Student Int String Int
A Student has three fields, mysteriously typed Int, String and Int. Let’s say my intention in creating the above data type was to store a student’s id, name and mark. I would create a record like so:
> t = Student 123 “Tim” 95
Here’s how one would search for the student with the best mark:
best :: [Student] -> Student -> Student
best [] b = b
best _ _ am):rest) _ _ bm) =
if am > bm
then best rest a
else best rest b
The @ notation, as in _ _ bm) stores the record itself in the variable b but also allows you to unpack its elements, e.g. bm is bound to mark.
To get the data out of a record I would need to either destructure using pattern matching, as above, every time, or create some accessor functions:
id (Student n _ _) = n
name (Student _ n _) = n
mark (Student _ _ n) = n
> name t
“Tim”
It’s starting to look a bit like annoying boilerplate code. Luckily, Haskell has another way to define such record types, called record syntax:
data Student = Student { id::Integer, name::String, mark::Int }
This creates a record type in every way the same as the above, but the accessor functions id, name and mark are created automatically.
Typeclasses
Haskell uses “type classes” as a way to associate functions with types. A type class is like a promise that a certain type will have specific operations and functions available. You can think of it as being similar to a TypeScript interface.
Despite the name however, it is not like an ES6/TypeScript class, since a Haskell type class does not actually give definitions for the functions themselves, only their type signatures.
The function bodies are defined in “instances” of the type class. A good starting point for gaining familiarity with type classes is seeing how they are used in the standard Haskell prelude. From GHCi we can ask for information about a specific typeclass with the :i command, for example, Num is a typeclass common to numeric types:
GHCi> :i Num
class Num a where
(+) :: a -> a -> a
(-) :: a -> a -> a
(*) :: a -> a -> a
negate :: a -> a
abs :: a -> a
signum :: a -> a
fromInteger :: Integer -> a
{-# MINIMAL (+), (*), abs, signum, fromInteger, (negate | (-)) #-}
— Defined in `GHC.Num’
instance Num Word — Defined in `GHC.Num’
instance Num Integer — Defined in `GHC.Num’
instance Num Int — Defined in `GHC.Num’
instance Num Float — Defined in `GHC.Float’
instance Num Double — Defined in `GHC.Float’
The first line (beginning class) tells us that for a type to be an instance of the Num typeclass, it must provide the operators +, * and the functions abs, signum and fromInteger, and either (-) or negate. The last is an option because a default definition exists for each in terms of the other. The last five lines (beginning with “instance”) tell us which types have been declared as instances of Num and hence have definitions of the necessary functions. These are Word, Integer, Int, Float and Double. Obviously this is a much more finely grained set of types than JavaScript’s universal “number” type. This granularity allows the type system to guard against improper use of numbers that might result in loss in precision or division by zero.
The main numeric type we will use in this course is Int, i.e. fixed-precision integers.
Note some obvious operations we would likely need to perform on numbers that are missing from the Num typeclass. For example, equality checking. This is defined in a separate type class Eq, that is also instanced by concrete numeric types like Int:
> :i Eq
class Eq a where
(==) :: a -> a -> Bool
(/=) :: a -> a -> Bool
{-# MINIMAL (==) | (/=) #-}
…
instance Eq Int
…
Note again that instances need implement only == or /= (not equal to), since each can be easily defined in terms of the other. Still we are missing some obviously important operations, e.g., what about greater-than and less-than? These are defined in the Ord type class:
> :i Ord
class Eq a => Ord a where
compare :: a -> a -> Ordering
(<) :: a -> a -> Bool
(<=) :: a -> a -> Bool
(>) :: a -> a -> Bool
(>=) :: a -> a -> Bool
max :: a -> a -> a
min :: a -> a -> a
{-# MINIMAL compare | (<=) #-}
The compare function returns an Ordering:
> :i Ordering
data Ordering = LT | EQ | GT
A custom data type can be made an instance of Ord by implementing either compare or <=. The definition Eq a => Ord a means that anything that is an instance of Ord must also be an instance of Eq. Thus, typeclasses can build upon each other into rich hierarchies:
If we have our own data types, how can we make standard operations like equality and inequality testing work with them? Luckily, the most common type classes can easily be instanced automatically through the deriving keyword. For example, if we want to define a Suit type for a card game we can automatically generate default instances of the functions (and operators) associated with Equality testing, Ordinal comparisons, Enumerating the different possible values of the type, and Showing them (or converting them to string):
data Suit = Spade|Club|Diamond|Heart
deriving (Eq,Ord,Enum,Show)
[Spade,Club,Diamond,Heart]
We can also create custom instances of typeclasses by providing our own implementation of the necessary functions, e.g.:
instance Show Suit where
show Spade = “^” — OK, these characters are not
show Club = “&” — brilliant approximations of the
show Diamond = “O” — actual playing card symbols ♠ ♣ ♦ ♥
show Heart = “V” — but GHCi support for unicode
— characters is a bit sketch
[Spade .. Heart]
[^,&,O,V]
Maybe
Another important built-in type is Maybe:
> :i Maybe
data Maybe a = Nothing | Just a
All the functions we have considered so far are assumed to be total. That is, the function provides a mapping for every element in the input type to an element in the output type. Maybe allows us to have a sensible return-type for partial functions, that is, functions which do not have a mapping for every input:
phonebook :: [(String, String)]
phonebook = [ (“Bob”, “01788 665242”), (“Fred”, “01624 556442”), (“Alice”, “01889 985333”) ]
lookup :: Eq a => a -> [(a, b)] -> Maybe b
Just “01624 556442”
Nothing
We can use pattern matching to extract values from a Maybe (when we have Just a value), or to perform some sensible default behaviour when we have Nothing.
printNumber name = msg $ lookup name phonebook
where
msg (Just number) = print number
msg Nothing = print $ name ++ ” not found in database”
“01624 556442”
*GHCi> printNumber “Tim”
“Tim not found in database”
In this chapter we see how the Haskell language features we introduced in previous chapters (from function application rules based on Lambda Calculus to Typeclasses) lead to highly flexible and refactorable code and powerful abstractions.
Learning Outcomes
· Understand how eta-conversion, operator sectioning and compose, together provide the ability to transform code to achieve a composable point free form and use this technique to refactor code.
· Understand that in Haskell the ability to map over container structures is generalised into the Functor typeclass, such that any type that is an instance of Functor has the fmap or (<$>) operation.
· Understand that the extends Functor such that containers of functions may be applied (using the (<*>) operator) to containers of values.
· Understand that Functor and Applicative allow powerful composable types through exploring a simple applicative functor for parsing.
Refactoring Cheatsheet
The following equivalences make many refactorings possible in Haskell:
Eta Conversion
Exactly as per Lambda Calculus:
f x ≡ g x
f ≡ g
Operator Sectioning
Remember haskell binary operators are just infix curried functions of two parameters and that putting brackets around them makes them prefix instead of infix.
x + y ≡ (+) x y
≡ ((+) x) y — making function application precedence explicit
≡ (x+) y — binary operators can also be partially applied
Such operator sectioning allows us to get the right-most parameter of the function on it’s own at the right-hand side of the body expression such that we can apply Eta conversion, thus:
f x = 1 + x
f x = (1+) x
f = (1+) — Eta conversion
Compose
Has its own operator in haskell (.), inspired by the mathematical function composition symbol ∘:
(f ∘ g) (x) ≡ f (g(x)) — math notation
(f . g) x ≡ f (g x) — haskell
Again, this gives us another way to get the right-most parameter on it’s own outside the body expression:
f x = sqrt (1 / x)
f x = sqrt ((1/) x) — operator section
f x = (sqrt . (1/)) x — by the definition of composition
f = sqrt . (1/) — eta conversion
Point Free Code
We have discussed point-free and tacit coding style earlier in these notes. In particular, eta-conversion works in Haskell the same as in lambda calculus and for curried JavaScript functions. It is easy to do and usually declutters code of unnecessary arguments that help to distill their essence, e.g.:
lessThan :: (Ord a) => a -> [a] -> [a]
lessThan n aList = filter (
lessThan n = filter (
(f . g) x = f (g x)
To see how to use (.) in lessThan we need to refactor it to look like the right-hand side of the definition above, i.e. f (g x). For lessThan, this takes a couple of steps, because the order we pass arguments to (<) matters. Partially applying infix operators like (
lessThan n = filter (n>)
Now we can use the non-infix form of (>):
lessThan n = filter ((>) n)
And we see from our definition of compose, that if we were to replace filter by f, (>) by g, and n by x, we would have exactly the definition of (.). Thus,
lessThan n = (filter . (>)) n
And now we can apply eta-conversion:
lessThan = filter . (>)
Between operator sectioning, the compose combinator (.), and eta-conversion it is possible to write many functions in point-free form. For example, the flip combinator:
flip :: (a -> b -> c) -> b -> a -> c
flip f a b = f b a
can also be useful in reversing the arguments of a function or operator in order to get them into a position such that they can be eta-reduced.
In code written by experienced haskellers it is very common to see functions reduced to point-free form. Does it make it for more readable code? To experienced haskellers, many times yes. To novices, perhaps not. When to do it is a matter of preference. Experienced haskellers tend to prefer it, they will argue that it reduces functions like the example one above “to their essence”, removing the “unnecessary plumbing” of explicitly named variables. Whether you like it or not, it is worth being familiar with the tricks above, because you will undoubtedly see them used in practice. The other place where point free style is very useful is when you would otherwise need to use a lambda function.
Some more (and deeper) discussion is available on the .
Exercises
· Refactor the following function to be point-free:
f a b c = (a+b)*c
(This is clearly an extreme example but is a useful – and easily verified – practice of operator sectioning, composition and eta-conversion.)
Functor
We’ve been mapping over lists and arrays many times, first in JavaScript:
console> [1,2,3].map(x=>x+1)
[2,3,4]
Now in haskell:
GHCi> map (\i->i+1) [1,2,3]
[2,3,4]
Or (eta-reduce the lambda to be point-free):
GHCi> map (+1) [1,2,3]
[2,3,4]
Here’s the implementation of map for lists as it’s defined in the GHC standard library:
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
It’s easy to generalise this pattern to any data structure that holds one or more values: mapping a function over a data structure creates a new data structure whose elements are the result of applying the function to the elements of the original data structure.
In Haskell this pattern is captured in a type class called Functor, which defines a function called fmap.
Prelude> :i Functor
class Functor (f :: * -> *) where
fmap :: (a -> b) -> f a -> f b
instance Functor [] — naturally lists are an instance
instance Functor Maybe — but this may surprise!
… — and some other instances we’ll talk about shortly
The first line says that an instances of the Functor typeclass f must be over a type that has the kind (* -> *), that is, their constructors must be parameterised with a single type variable. After this, the class definition specifies fmap as a function that will be available to any instance of Functor and that f is the type parameter for the constructor function, which again, takes one type parameter, e.g. f a as the input to fmap, which returns an f b.
Naturally, lists have an instance:
Prelude> :i []
…
instance Functor [] — Defined in `GHC.Base’
We can actually look up GHC’s implementation of