Last time: staging basics
.< e>.
1/ 55
Staging recap
Goal: specialise with available data to improve future performance
New constructs: ’a code .< e >. .~e !.e
Example: pow
Improvements: unrolling loops
2/ 55
Power, staged
let rec pow x n =
if n = 0 then .< 1 >.
else .< .~x * .~(pow x (n - 1)) >.
let pow_code n = .< fun x → .~(pow .
# pow_code 3;;
.
# let pow3 ’ = !. (pow_code 3);;
val pow3 ’ : int → int =
# pow3 ’ 4;;
– : int = 64
3/ 55
The staging process, idealized
1. Write the program as usual:
val program : t_sta → t_dyn → t
2. Add staging annotations:
val staged_program : t_sta → t_dyn code → t code
3. Compile using back:
val back: (’a code → ’b code) → (’a → ’b) code
val code_generator : t_sta → (t_dyn → t)
4. Construct static inputs:
val s : t_sta
5. Apply code generator to static inputs:
val specialized_code : (t_dyn → t) code
6. Run specialized code to build a specialized function:
val specialized_function : t_dyn → t
4/ 55
The staging process, idealized
1. Write the program as usual:
val program : t_sta → t_dyn → t
2. Add staging annotations:
val staged_program : t_sta → t_dyn code → t code
3. Compile using back:
val back: (’a code → ’b code) → (’a → ’b) code
val code_generator : t_sta → (t_dyn → t)
4. Construct static inputs:
val s : t_sta
5. Apply code generator to static inputs:
val specialized_code : (t_dyn → t) code
6. Run specialized code to build a specialized function:
val specialized_function : t_dyn → t
4/ 55
The staging process, idealized
1. Write the program as usual:
val program : t_sta → t_dyn → t
2. Add staging annotations:
val staged_program : t_sta → t_dyn code → t code
3. Compile using back:
val back: (’a code → ’b code) → (’a → ’b) code
val code_generator : t_sta → (t_dyn → t)
4. Construct static inputs:
val s : t_sta
5. Apply code generator to static inputs:
val specialized_code : (t_dyn → t) code
6. Run specialized code to build a specialized function:
val specialized_function : t_dyn → t
4/ 55
The staging process, idealized
1. Write the program as usual:
val program : t_sta → t_dyn → t
2. Add staging annotations:
val staged_program : t_sta → t_dyn code → t code
3. Compile using back:
val back: (’a code → ’b code) → (’a → ’b) code
val code_generator : t_sta → (t_dyn → t)
4. Construct static inputs:
val s : t_sta
5. Apply code generator to static inputs:
val specialized_code : (t_dyn → t) code
6. Run specialized code to build a specialized function:
val specialized_function : t_dyn → t
4/ 55
The staging process, idealized
1. Write the program as usual:
val program : t_sta → t_dyn → t
2. Add staging annotations:
val staged_program : t_sta → t_dyn code → t code
3. Compile using back:
val back: (’a code → ’b code) → (’a → ’b) code
val code_generator : t_sta → (t_dyn → t)
4. Construct static inputs:
val s : t_sta
5. Apply code generator to static inputs:
val specialized_code : (t_dyn → t) code
6. Run specialized code to build a specialized function:
val specialized_function : t_dyn → t
4/ 55
The staging process, idealized
1. Write the program as usual:
val program : t_sta → t_dyn → t
2. Add staging annotations:
val staged_program : t_sta → t_dyn code → t code
3. Compile using back:
val back: (’a code → ’b code) → (’a → ’b) code
val code_generator : t_sta → (t_dyn → t)
4. Construct static inputs:
val s : t_sta
5. Apply code generator to static inputs:
val specialized_code : (t_dyn → t) code
6. Run specialized code to build a specialized function:
val specialized_function : t_dyn → t
4/ 55
Inner product
let dot
: int → float array → float array → float
= fun n l r →
let rec loop i =
if i = n then 0.
else l.(i) *. r.(i)
+. loop (i + 1)
in loop 0
5/ 55
Inner product, loop unrolling
let dot ’
: int → float array code → float array code → float code
= fun n l r →
let rec loop i =
if i = n then .< 0. >.
else .< ((.~l).(i) *. (.~r).(i))
+. .~(loop (i + 1)) >.
in loop 0
6/ 55
Inner product, loop unrolling
# .< fun l r → .~(dot ’ 3 .
– : (float array → float array → float) code =
.< fun l r →
(l.(0) *. r.(0)) +.
((l.(1) *. r.(1)) +. ((l.(2) *. r.(2)) +. 0.))>.
7/ 55
Inner product, eliding no-ops
let dot ’’
: float array → float array code → float code =
fun l r →
let n = Array.length l in
let rec loop i =
if i = n then .< 0. >.
else match l.(i) with
0.0 → loop (i + 1)
| 1.0 → .<(.~r).(i) +. .~(loop (i + 1)) >.
| x → .<(x *. (.~r).(i)) +. .~(loop (i + 1)) >.
in loop 0
8/ 55
Inner product, eliding no-ops
# .< fun r → .~(dot ’’ [| 1.0; 0.0; 3.5 |] .
– : (float array → float) code =
.< fun r → r.(0) +. ((3.5 *. r.(2)) +. 0.)>.
9/ 55
Binding-time analysis
Classify variables into dynamic (’a code) / static (’a)
let dot ’
: int → float array code → float array code → float code
=
fun n l r →
dynamic: l, r
static: n
Classify expressions into static (no dynamic variables) / dynamic
if i = n then 0
else l.(i) *. r.(i)
dynamic: l.(i) *. r.(i)
static: i = n
Goal: reduce static expressions during code generation.
10/ 55
Partially-static data structures
11/ 55
Stack machines again
. . .
x
y
. . .
x+y
Add
. . .
y
x
c
. . .
(y,x)[c]
If
. . . . . .
c
PushConst
12/ 55
Stack machines: higher-order vs first-order
type (’s, ’t) t = ’s → ’t
let add (x, (y, s)) = (x + y, s)
type (’s, ’t) t = (’s, ’t) instrs
let add = Add :: Stop
13/ 55
Recap: optimising stack machines
val (>>=) : ’a t → (’a → ’b t) → ’b t
val (⊗) : (’a → ’b) t → ’a t → ’b t
. . . . . .
x
. . .
x
y
. . .
x
y
b
. . .
y
PushConst x
PushConst y
PushConst true
If
. . . . . .
y
PushConst
y
14/ 55
Stack machines: basic interface
module type STACKM =
sig
type (’s, ’t) t
val nothing : (’s, ’s) t
val (⊗) : (’r, ’s) t →
(’s, ’t) t →
(’r, ’t) t
val add : (int * (int * ’s),
int * ’s) t
val _if_ : (bool * (’a * (’a * ’s)),
’a * ’s) t
val push_const : ’a → (’s,
’a * ’s) t
val execute : (’s, ’t) t → ’s → ’t
end
15/ 55
Higher-order stack machines
module StackM : STACKM =
struct
type (’s, ’t) t = ’s → ’t
let nothing s = s
let (⊗) f x s = x (f s)
let add (x, (y, s)) = ((x + y, s))
let _if_ (c, (x, (y, s))) = ((if c then x else y), s)
let push_const v s = (v, s)
let execute f s = f s
end
16/ 55
Optimising higher-order stack machines
Why are the higher-order machines hard to optimize?
let (⊗) f x s = x (f s)
let push_const v s = (v, s)
let add (x, (y, s)) = ((x + y, s))
push_const 3 ⊗
push_const 4 ⊗
add
17/ 55
Optimising higher-order stack machines
Why are the higher-order machines hard to optimize?
let (⊗) f x s = x (f s)
let push_const v s = (v, s)
let add (x, (y, s)) = ((x + y, s))
Inlining push_const, add:
(fun s → (3, s)) ⊗
(fun s → (4, s)) ⊗
(fun (x, (y, s)) → ((x + y, s)))
18/ 55
Optimising higher-order stack machines
Why are the higher-order machines hard to optimize?
let (⊗) f x s = x (f s)
let push_const v s = (v, s)
let add (x, (y, s)) = ((x + y, s))
Inlining ⊗:
(fun s →
(fun (x, (y, s)) → ((x + y, s)))
((fun s → (fun s → (4, s)) ((fun s → (3, s)) s))
s))
19/ 55
Optimising higher-order stack machines
Why are the higher-order machines hard to optimize?
let (⊗) f x s = x (f s)
let push_const v s = (v, s)
let add (x, (y, s)) = ((x + y, s))
Inlining ⊗:
(fun s →
(fun (x, (y, s)) → ((x + y, s)))
((fun s → (fun s → (4, s)) ((fun s → (3, s)) s))
s))
Difficulty: evaluating under lambda
19/ 55
Stack machines: higher-order vs first-order vs staged
type (’s, ’t) t = ’s → ’t
let add (x, (y, s)) = (x + y, s)
type (’s, ’t) t = (’s, ’t) instrs
let add = Add :: Stop
type (’s, ’t) t = ’s code → ’t code
let add p = .
20/ 55
Staging the higher-order stack machine
module type STACKM_staged =
sig
include STACKM
val compile : (’s, ’t) t → (’s → ’t) code
end
21/ 55
Staging the higher-order stack machine
module StackM_staged : STACKM_staged =
struct
type (’s, ’t) t = ’s code → ’t code
let nothing s = s
let (⊗) f x s = x (f s)
let add p =
(.< let (x, (y, s)) = .~p in (x + y, s) >.)
let _if_ p =
.< let (c, (x, (y, s))) = .~p in ((if c then x else y), s) >.
let push_const v s =
.< (v, .~s) >.
let compile f = .< fun s → .~(f .. ) >.
let execute f s = !.( compile f) s
end
22/ 55
Staging the higher-order stack machine: output
# compile (push_const true ⊗ _if_);;
– : (’_a * (’_a * ’_b) → ’_a * ’_b) code =
.< fun s_59 →
let (c,(x,(y,s))) = (true , s) in
((if c then x else y), s)>.
# compile (push_const 3 ⊗ push_const 4 ⊗
push_const false ⊗ _if_);;
– : (’_a → int * ’_a) code =
.< fun s →
let (c,(x,(y,s))) = (false , (4, (3, s))) in
((if c then x else y), s)>.
# compile (push_const 3 ⊗ push_const 4 ⊗
push_const false ⊗ _if_);;
– : (’_a → int * ’_a) code =
.< fun s →
let (c,(x,(y,s))) = (false , (4, (3, s))) in
((if c then x else y), s)>.
23/ 55
Possibly-static values
type ’a sd =
| Sta : ’a → ’a sd
| Dyn : ’a code → ’a sd
let unsd : ’a.’a sd → ’a code =
function
Sta v → .
| Dyn v → v
24/ 55
Partially-static stacks
type ’a stack =
Tail : ’a code → ’a stack
| :: : ’a sd * ’b stack → (’a * ’b) stack
let rec unsd_stack : type s.s stack → s code =
function
Tail s → s
| c :: s → .<(.~(unsd c), .~(unsd_stack s)) >.
25/ 55
Stack machine: binding-time analysis
type (’s, ’t) t = ’s → ’t
let add (x, (y, s)) = (x + y, s)
type (’s, ’t) t = (’s, ’t) instrs
let add = Add :: Stop
type (’s, ’t) t = ’s code → ’t code
let add p = .
type (’s, ’t) t = ’s stack → ’t stack
let rec add : type s.(int * (int * s), int * s) t =
function
Sta x :: Sta y :: s → Sta (x + y) :: s
| . . .
26/ 55
Stack machine: optimising add
let extend : ’a ’b.(’a * ’b) stack → (’a * ’b) stack =
function
Tail s → Dyn .
| _ :: _ as s → s
let rec add : type s.(int * (int * s), int * s) t =
function
Sta x :: Sta y :: s → Sta (x + y) :: s
| x :: y :: s → Dyn .<.~(unsd x) + .~(unsd y) >. :: s
| Tail _ as s → add (extend s)
| c :: (Tail _ as s) → add (c :: extend s)
27/ 55
Stack machine: optimising branches
let rec _if_
: type s a.(bool * (a * (a * s)), a * s) t =
function
| Sta true :: x :: y :: s → x :: s
| Sta false :: x :: y :: s → y :: s
| Dyn c :: x :: y :: s →
Dyn .< if .~c then .~(unsd y) else .~(unsd x) >. :: s
| (Tail _ as s) → _if_ (extend s)
| c :: (Tail _ as s) → _if_ (c :: extend s)
| c :: x :: (Tail _ as s) →
_if_ (c :: x :: extend s)
28/ 55
Stack machine: top-level compilation
val compile : (’s, ’t) t → (’s → ’t) code
let compile f =
.< fun s → .~(unsd_stack (f (Tail .. )) ) >.
29/ 55
Stack machine: flexible optimisation
# compile add;;
– : (int * (int * ’_a) → int * ’_a) code =
.< fun s → ((fst s + fst (snd s)), snd (snd s))>.
# compile _if_;;
– : (bool * (’_a * (’_a * ’_b)) → ’_a * ’_b) code =
.< fun s →
((if fst s
then fst (snd (snd s))
else fst (snd s)),
(snd (snd (snd s))))>.
# compile (push_const true ⊗ _if_);;
– : (’_a * (’_a * ’_b) → ’_a * ’_b) code =
.< fun s → (fst s, snd (snd s))>.
30/ 55
Stack machine: flexible optimisation
# compile (push_const false ⊗ _if_);;
– : (’_a * (’_a * ’_b) → ’_a * ’_b) code =
.< fun s → (fst (snd s), snd (snd s))>.
# compile (push_const 3 ⊗ push_const 4 ⊗
push_const false ⊗ _if_);;
– : (’_a → int * ’_a) code =
.< fun s → (3, s)>.
# compile (push_const 3 ⊗ push_const 4 ⊗
add ⊗ push_const 2 ⊗
push_const false ⊗ _if_);;
– : (’_a → int * ’_a) code =
.< fun s → (7, s)>.
31/ 55
Staging generic programming
val gshow : ’a data → (’a → string) code
32/ 55
Generic programming: binding-time analysis
gshow.q (list (int * bool)) [(1, true); (2; false)]
Type representations are static Values are dynamic.
We’ve used type representations to traverse values.
Now we’ll use type representations to generate code.
Goal: generate code that contains no typeable or data values.
33/ 55
Desired code for gshow
val gshow : ’a data → (’a → string) code
type tree =
Empty : tree
| Branch : branch → tree
and branch = tree * int * tree
let rec show_tree = function
Empty → “Empty”
| Branch b → “( Branch “^ show_branch b ^”)”
and show_branch (l, v, r) =
show_tree l ^”, “^ show_int v ^”, “^ show_tree r
34/ 55
Generic programming
Type equality
type ’a typeable
val int : int typeable
val (=~=) :
’a typeable → ’b typeable → (’a,’b) eql option
Traversals
type ’a data
and ’u genericQ =
{ q: ’t. ’t data → ’t → ’u }
val int : int data
val gmapQ : ’u genericQ → ’u list genericQ
Generic functions
val gshow : string genericQ
35/ 55
Generic programming, staged
Type equality
type ’a typeable
val int : int typeable
val (=~=) :
’a typeable → ’b typeable → (’a,’b) eql option
Traversals
type ’a data
and ’u genericQ =
{ q: ’t. ’t data → ’t code → ’u code }
val int : int data
val gmapQ : ’u genericQ → ’u list genericQ
Generic functions
val gshow : string genericQ
36/ 55
Staging gmapQ
let ( * ) a b = {
. . .
gmapQ = fun { q } (x, y) → [q a x; q b y];
}
let ( * ) a b = {
. . .
gmapQ = fun { q } p →
.< let (x, y) = .~p in [.~(q a .
}
37/ 55
Staging gshow
let rec gshow : string genericQ =
{ q =
fun data v ->
“(“^ data.constructor v
^ String.concat ” ” (( gmapQ gshow).q data v)
^ “)” }
Difficulty: recursion
38/ 55
Cyclic static structures
39/ 55
Cyclic type structures
tree
tree
branch
int
40/ 55
Cyclic type structures
tree
tree
branch
int
gshow tree
gshow branch
gshow tree
. . .
gshow intgshow tree
. . .
40/ 55
Memoization
let rec fib = function
0 → 0
| 1 → 1
| n → fib (n – 1) + fib (n – 2)
41/ 55
Memoization
let rec fib = function
0 → 0
| 1 → 1
| n → fib (n – 1) + fib (n – 2)
fib 4
41/ 55
Memoization
let rec fib = function
0 → 0
| 1 → 1
| n → fib (n – 1) + fib (n – 2)
fib 4
fib 2fib 3
41/ 55
Memoization
let rec fib = function
0 → 0
| 1 → 1
| n → fib (n – 1) + fib (n – 2)
fib 4
fib 2
fib 0fib 1
fib 3
fib 1fib 2
41/ 55
Memoization
let rec fib = function
0 → 0
| 1 → 1
| n → fib (n – 1) + fib (n – 2)
fib 4
fib 2
fib 0fib 1
fib 3
fib 1fib 2
fib 0fib 1
41/ 55
Memoization
let table = ref []
let rec fib n =
try List.assoc n !table
with Not_found →
let r = fib_aux n in
table := (n, r) :: !table;
r
and fib_aux = function
0 → 0
| 1 → 1
| n → fib (n – 1) + fib (n – 2)
fib 4
fib 3
fib 2
fib 0fib 1
42/ 55
Memoization, factored
val memoize : ((’a → ’b) → (’a → ’b)) → ’a → ’b
let memoize f n =
let table = ref [] in
let rec f’ n =
try List.assoc n !table
with Not_found →
let r = f f’ n in
table := (n, r) :: !table;
r
in f’ n
let open_fib fib = function
0 → 0
| 1 → 1
| n → fib (n – 1) + fib (n – 2)
let fib = memoize open_fib
43/ 55
Typed maps
type t
val empty : t
val add : t → ’a data → (’a → string) code → t
val lookup : t → ’a data → (’a → string) code option
44/ 55
Typed maps
type t
val empty : t
val add : t → ’a data → (’a → string) code → t
val lookup : t → ’a data → (’a → string) code option
type t =
Nil : t
| Cons : ’a data * (’a → string) code * t → t
45/ 55
Typed maps
type t
val empty : t
val add : t → ’a data → (’a → string) code → t
val lookup : t → ’a data → (’a → string) code option
type t =
Nil : t
| Cons : ’a data * (’a → string) code * t → t
let empty = Nil
let add t d x = Cons (d, x, t)
let rec lookup :
type a.t → a data → (a → string) code option =
fun t l → match t with
Nil → None
| Cons (r, d, rest) →
match l.typeable =~= r.typeable with
Some Refl → Some d
| None → lookup rest l
46/ 55
Generating recursive definitions
47/ 55
Mutually-recursive definitions
let rec evenp x =
x = 0 || oddp (pred x)
and oddp x =
not (evenp x)
Difficulty: building up arbitrary-size let rec . . . and . . . and . . ..
n-ary operators are difficult to abstract!
48/ 55
Recursion via mutable state
let evenp = ref (fun _ → assert false)
let oddp = ref (fun _ → assert false)
evenp := fun x → x = 0 || !oddp (pred x)
oddp := fun x → not (!evenp x)
What if evenp and oddp generated in different parts of the code?
Plan: use let-insertion to interleave bindings and assignments.
49/ 55
Let insertion
val let_locus : (unit → ’w code) → ’w code
val genlet : ’a code → ’a code
.< 1 + .~(let_locus (fun () → .< 2 + .~(genlet .< 3 + 4 >.) >.)) >.
1 +
let x = 3 + 4 in
2 + x
50/ 55
Let rec insertion
val letrec : ((’a → ’b) code → (’a → ’b) code) →
(’a → ’b) code
let letrec k =
let r = genlet (.< ref (fun _ -> assert false) >.) in
let _ = genlet (.<.~r := .~(k .< ! .~r >.) >.) in
.< ! .~r >.
51/ 55
Generating code for gshow
val memofix : (string genericQ → string genericQ) →
string genericQ
let memofix h =
{ q = fun t →
let tbl = ref empty in
let rec result d x = match lookup !tbl d with
Some f → .<.~f .~x >.
| None →
let g = letrec (fun self ->
tbl := add !tbl d self;
.< fun y -> .~(h result .
in .< .~g .~x >.
52/ 55
Generating code for gshow
val memofix : (string genericQ → string genericQ) →
string genericQ
let gshow_gen : string genericQ → string genericQ =
fun gshow →
{ q = fun data v →
.< "("^ .~(data.constructor v)
^ String.concat " " .~((gmapQ gshow).q
data v)
^ ")" >. }
let gshow = memofix gshow_gen
53/ 55
Generated code for gshow
let show_tree = ref (fun _ → assert false) in
let show_branch = ref (fun _ → assert false) in
let show_int = ref (fun _ → assert false) in
let _ = show_int :=
fun i →
“(“^ string_of_int i ^ String.concat ” ” [] ^”)” in
let _ = show_branch :=
fun b →
“(“^ “(,)”^
(( String.concat ” ”
(let (l,v,r) = b in
[! show_tree l; !show_int v; !show_tree r]))
^”)”) in
let _ = show_tree :=
(fun t →
“(“^ ((match t with Empty → “Empty”
| Branch _ → “Branch “) ^
(( String.concat ” ”
(match t with
| Empty → []
| Branch b → [! show_branch b])) ^”)”))) in
!show_tree
54/ 55
Next time: reagents
55/ 55