simpleFUN Semantics
1 The SimpleFUN Abstract Syntax
We include the abstract syntax here for easy reference when studying the domains and transition rules in the following sections. There
is one minor change in the abstract syntax from handout 4: we explicitly include expressions using the unary and binary operators (¬e
and e1⊕e2, respectively). They are important in deciding when to create a notK or binopLeftK/binopRightK continuation, respectively
(see subsection 2.2). Thus, we have a way of differentiating between a function call that uses a user-defined function (e f (e1 . . . en),
where e f is not a unary or binary operator) and one that uses a built-in operator (¬e and e1 ⊕ e2). Note that e1 ⊕ e2 is shorthand
for ⊕(e1, e2), and that ¬e is shorthand for ¬(e) (meaning in both cases, they are essentially function calls, where the function is the
operator).
x ∈ Variable n ∈ N b ∈ Bool name, cons, fld ∈ Label ⊕ ∈ BinaryOp
prog ∈ Program ::= typedef 1 . . . typedef n e
typedef ∈ TypeDef ::= type name = cons1 :τ1 . . . consn :τn
e ∈ Exp ::= x | n | b | nil | ¬e | e1 ⊕ e2 | (x1 :τ1 . . . xn :τn)⇒ e | e f (e1 . . . en)
| if e1 e2 e3 | let x = e1 in e2 | rec x :τ = e1 in e2 | [fld1 =e1 . . . fldn =en]
| e.fld | cons e | case e of cons1 x1 ⇒ e1 . . . consn xn ⇒ en
2 Semantics
In this section we describe the following: (1) the semantic domains; (2) the state transition rules; and (3) the helper functions used by
the above descriptions.
1
2.1 Semantic Domains
The following figure describes SimpleFUN’s semantic domains, or the set of things that have and give meaning to the language.
Evaluating a SimpleFUN expression means to turn a syntactic expression (for example, 1 + (2 ∗ 3)) into some final semantic value
(for example, the natural number 7). When evaluating an expression, it is often necessary to evaluate its subexpressions first. We
use a construct called a continuation to do so, which allows us to remember the larger expression we are evaluating before we begin
evaluating its smaller subexpressions. Upon finishing the evaluation of some expression and returning some semantic value, we can
look at the current continuation to see how to use this value in some larger context (explained in more detail below).
You may notice that unlike the abstract syntax, which uses the ::= and | symbols in creating production rules, the figure below
uses = to separate domains and + to separate values within each domain. This change is notation is to reflect that we are defining
semantic objects, rather than syntactic objects. The vector notation represents an ordered sequence where (by abuse of notation) the
sequence is of unspecified length n and the subscript i ranges from 1 to n.
F ∈ TransitionFunction = State→ State
ς ∈ State = Term × Env ×
−−−−→
Konti
t ∈ Term = Exp + Value
ρ ∈ Env = Variable→ Value
v ∈ Value = NumV + BoolV + NilV + ClosureV + RecordV + ConstructorV + LetRecV
nv ∈ NumV = numV(N)
bv ∈ BoolV = boolV(true + false)
u ∈ NilV = nilV
clo ∈ ClosureV = closureV(
−−−−−−−→
Variablei,Exp,Env)
r ∈ RecordV = recordV(Label→ Value)
con ∈ ConstructorV = constructorV(Label,Value)
rl ∈ LetRecV = letrecV(Variable,Exp,Env)
κ ∈ Kont = binopLeftK(⊕,Exp) +
binopRightK(⊕,Value) +
notK +
appK(
−−−→
Expi,
−−−−→
Valuei) +
ifK(Exp,Exp) +
letK(Variable,Exp) +
recordK(
−−−−−→
Labeli,
−−−→
Expi,
−−−−→
Valuei) +
accessK(Label) +
consK(Label) +
caseK(−−−−−−−−−−→cons x⇒ e) +
restoreK(Env)
2.1.1 Values
A value can be a number, boolean, nil, closure (a function and its environment), record, constructor (user-defined type), and recursive
let. A NumV represents the semantic value of a number. A BoolV represents the semantic value of a boolean. A ClosureV contains
a function and the environment that was present when it was created (used to capture any of its free variables). RecordV and
ConstructorV represent the semantic values of their syntactic counterparts. A LetRecV is used to allow recursion in the language, by
giving us a structure that holds the unevaluated recursive expression, and the variable to which it should be bound, until needed.
2
2.1.2 Continuations
As introduced above, a continuation is a construct used to model control flow in this purely functional language, meaning we can
explicitly and deterministically control which expression we need to currently evaluate as well as remember any expressions we need
to evaluate afterwards.
Continuation Example For example, say we have some SimpleFUN expression like (3+4)+ (5∗6). To evaluate it, we will have to
perform an addition between the operands (3 + 4) and (5 ∗ 6). However, neither of these operands are values yet, so they too must be
further evaluated. We say that the next expression to evaluate is (3 + 4), and we remember that we were in the middle of trying to add
the expression (3 + 4) to the expression (5 ∗ 6). This act of remembering is creating a continuation (here, a binopLeftK) and putting
said continuation on our continuation stack. Then we proceed with the evaluation of the subexpression (3 + 4). As explained above,
we differentiate between the syntactic number 3, which is an expression, and the semantic number, or actual value, of 3. Because
of this, we need to evaluate each of the operands in this subexpression as well. To do that, we make the expression 3 the current
expression under evaluation, remembering that we were in the middle of trying to add the expression 3 to the expression 4 by creating
another continuation (a binopLeftK) and putting it on top of the continuation stack. Now that the current expression under evaluation
is 3, we know to convert that syntactic 3 into a semantic 3 value (here there is no tangible difference between the syntactic 3 and the
semantic 3 (i.e. its mathematical meaning); however, this will not always be the case, so we must use an explicit rule that shows the
correspondence between syntax and mathematical meaning, even for numbers).
Now that we have a value, we look on top of the continuation stack to figure out what we were doing beforehand that necessitated
getting this value 3. Our top continuation is a binopLeftK, which tells us that we were in the middle of adding two operands and that
the current value we have is the left operand. We then proceed to make the right operand expression, 4, the current expression under
evaluation, remembering that we were in the middle of an addition operation and that we already know the left operand, which is
the value 3. To remember this, we pop off the old binopLeftK continuation and put a binopRightK continuation on the continuation
stack. Then we proceed evaluating the expression 4, which becomes the value 4, and again we look on top of the continuation stack
to see what we were in the middle of doing. Since the top continuation is a binopRightK, we know that the current value 4 is the
right operand of the addition, and since we now have evaluated both operands, we can add them together to get the value 7. We pop
off the old binopRightK continuation and see that the top of the continuation stack is binopLeftK; this is because we originally put
this continuation on the stack when we began evaluating the expression (3 + 4). Now that we know that (3 + 4) evaluates to 7 and
that the top continuation is a binopLeftK, we know that we were in the middle of an addition operation and that the left operand is
7. We proceed by popping off this binopLeftK continuation, setting the right-hand side expression (5 ∗ 6) as the current expression
under evaluation, and placing a binopRightK on the continuation stack. After following the steps as we did previously, except now
for the expression (5 ∗ 6), we’ll eventually get to the point that the current value we just evaluated is 30 and the top continuation is
binopRightK. Since both operands 7 and 30 are fully evaluated, we pop off that binopRightK continuation from the stack and see that
we can finally do that original addition we wanted to do all along, resulting in the value 37. Since we just evaluated a value, and since
the continuation stack will be empty at this point (meaning there’s nothing else to do), we know we have finished evaluation of the
entire program.
Continuation Descriptions The following are descriptions of each of the continuations that can be placed on the continuation stack
~κ. These descriptions are a high-level overview of the basic way each is used. To get a clearer picture of exactly how they are created
and used, please study the transition rules in subsection 2.2, especially noting how the −−→κnew column changes during the evaluation of
most of the expressions in rules #1 through #15. During rules #16 through #32, which show how to handle terms which are values,
the continuation stack is often inspected as part of the premises; the continuation on top of the stack tells us what expression or
operation we were in the middle of evaluating and which term we need to look at next.
binopLeftK A binopLeftK continuation is used to remember 1) that the current expression under evaluation is the left-hand
operand of some binary operation, 2) the binary operator, and 3) the right-hand operand to be evaluated later. When we eventually
evaluate the left-hand operand down to a single value and see that the continuation on the top of the stack is a binopLeftK, we know
that we must now begin evaluating the right-hand operand, which was saved for us in the binopLeftK.
binopRightK A binopRightK continuation is used to remember that 1) that the current expression under evaluation is the right-
hand operand of some binary operation, 2) the binary operator, and 3) the left-hand operand, which was evaluated to a value previ-
ously. When we eventually evaluate the right-hand operand down to a single value and see that the continuation on top of the stack is
binopRightK, we know that we have evaluated both operands and can now perform the actual operation with that saved operator and
the two operand values.
3
notK A notK continuation is used to remember that the current expression under evaluation needs to be negated when it fi-
nally becomes a value. So when we eventually evaluate the expression down to a value, assuming that value is a boolean and the
continuation on top of the stack is a notK, we know that we must produce the negated version of the current boolean value.
appK An appK continuation is used to remember that we are in the middle of evaluating the function and arguments to a function
call. When we encounter a function call expression, we evaluate the function expression and use an appK continuation to store the
list of unevaluated argument expressions and the list of evaluated arguments and the eventual function value. Whenever we have
evaluated an expression down to a value and the top of the continuation stack is an appK, we add the current value to that list of
evaluated arguments and set the current expression under evaluation to be the head of the list of unevaluated argument expressions.
When we finally reach a value and the list of unevaluated expressions in the appK continuation on the top of the stack is empty, we
know that we can proceed with the function call (meaning we set the new expression under evaluation to be the function’s body,
updating the environment to include a mapping from the function’s formal paramters to the actual evaluated argument values).
ifK An ifK continuation is used to remember that we’re in the middle of evaluating an if expression. When we first encounter
an if expression, we set the condition expression as the expression under evaluation and create an ifK continuation to remember the
expressions to evaluate if the condition is true and if it is false. When we finally evaluate the condition expression to a value and
see that the top of the continuation stack holds an if continuation, we set the new expression under evaluation to be one of the two
expressions saved in the if continuation, depending on the whether the value of the condition was true or false.
letK A letK continuation is used to remember that the current expression under evaluation needs to be bound to the variable being
introduced in the let expression when the current expression eventually evaluates to a value, so that we can then begin evaluating the
right-hand side of the let expression with this new variable-to-value mapping added to the environment.
recordK A recordK continuation is used to remember 1) that we are in the middle of evaluating a record expression and 2) the
list of fields that have been evaluated and remain to be evaluated. This is so that when we reach a value and see that the top of the
continuation stack is a recordK continuation, we know to add the current value to the list of saved values and then continue evaluation
by evaluating one of the remaining unevaluated expressions used in constructing the record. When the recordK continuation’s list of
unevaluated expressions is empty, we can finally proceed by creating a record value filled with the list of values we had been saving
all along.
accessK A accessK continuation is used to remember 1) that the current expression under evaluation should be a record when
finally evaluated and 2) the field which we wish to extract out of this record value.
consK A consK continuation is used to remember 1) the name of a constructor being created and 2) that the current expression
under evaluation, when finally evaluated, should be the value used to create the constructor value.
caseK A caseK continuation is used to remember 1) that the current expression under evaluation should eventually be a user-
defined constructor to be matched against and 2) the list of branches to take. The branch to take will be the branch whose constructor
name matches the eventual value of the expression under evaluation.
restoreK A restoreK continuation is used to remember some environment before evaluating expressions that may alter the
current environment, so that upon reaching a state where the top of the continuation stack is a restoreK, we may continue evaluation
with the original environment.
2.2 State Transition Rules
The transition function F is defined in the table below. We treat the sequence of continuations ~κ as a stack; thus, κ is the top of the
continuation stack in the source state and ~κ1 is the rest of that continuation stack. We use :: to indicate the concatenation of sequences,
so that ~κ = κ :: ~κ1. For example, when we write appK(~ei, v :: ~vi) :: ~κ1, this is the same as if we had popped the top continuation κ off the
stack and then pushed an appK(~ei, v :: ~vi) continuation onto the remaining part of the stack ~κ1 like so:
appK(~ei, v :: ~vi)
~κ1
This corresponds to the manipulation of the continuation stack that occurs in rule #23.
4
A Comment on Notation In the table of transition rules below, we rely on special notation to be both precise and concise. We
use “ ” when we don’t care about the value in a particular position. We use ρ[x 7→ v] to indicate updating the environment ρ to
include a mapping from the key x to the value v, yielding a new environment in the process. Similarly, we use ρ[~xi 7→ ~vi] to indicate
updating the environment ρ to include a mapping from a sequence of keys ~xi to a sequence of values ~vi; thus, it is shorthand for
ρ[x1 7→ v1, x2 7→ v2, . . . , xn−1 7→ vn−1, xn 7→ vn]. Implicit in this notation is that |~xi| = |~vi| = n.
5
Table 1: The transition function F . This moves from an input state (〈t, ρ,~κ〉) to the next state (〈tnew, ρnew,
−−→
κnew〉). A program terminates normally when our term t is a
non-letrecV value and the continuation stack ~κ is empty; we represent this by trivially looping forever via rule #17.
# t Premises tnew ρnew
−−→
κnew
1 x x ∈ keys(ρ), v = ρ(x) v ρ ~κ
2 n numV(n) ρ ~κ
3 b boolV(b) ρ ~κ
4 nil nilV ρ ~κ
5 e1 ⊕ e2 e1 ρ binopLeftK(⊕, e2) ::~κ
6 ¬e e ρ notK ::~κ
7 (x1 :τ1 . . . xn :τn)⇒ e closureV(x1 . . . xn, e, ρ) ρ ~κ
8 e f (e1 . . . en) e f ρ appK(e1 . . . en, []) ::~κ
9 if e1 e2 e3 e1 ρ ifK(e2, e3) ::~κ
10 let x = e1 in e2 e1 ρ letK(x, e2) ::~κ
11 rec x :τ = e1 in e2 e2 ρ[x 7→ letrecV(x, e1, ρ)] restoreK(ρ) ::~κ
12 [fld1 =e1 . . . fldn =en] e1 ρ recordK(fld1 . . . fldn, e2 . . . en, []) ::~κ
13 e.fld e ρ accessK(fld) ::~κ
14 cons e e ρ consK(cons) ::~κ
15 case e of −−−−−−−−−−→cons x⇒ ei e ρ caseK(
−−−−−−−−−−→cons x⇒ ei) ::~κ
16 letrecV(x, e1, ρ′) e1 ρ′[x 7→ letrecV(x, e1, ρ′)] restoreK(ρ) ::~κ
17 v v , letrecV( , , ),~κ = [] v ρ []
18 v v , letrecV( , , ), κ = restoreK(ρ′) v ρ′ ~κ1
19 v v , letrecV( , , ), k = binopLeftK(⊕, e) e ρ binopRightK(⊕, v) :: ~κ1
20 v2 v2 , letrecV( , , ), k = binopRightK(⊕, v1), v ρ ~κ1
v = valueOf(⊕, v1, v2)
21 boolV(true) k = notK boolV(false) ρ ~κ1
22 boolV(false) k = notK boolV(true) ρ ~κ1
23 v v , letrecV( , , ), k = appK(e :: ~ei, ~vi) e ρ appK(~ei, v :: ~vi) :: ~κ1
24 v v , letrecV( , , ), k = appK([], ~vi) e ρ′[~xi 7→
−−−→vargs] restoreK(ρ) :: ~κ1
v f ::
−−−→vargs = reverse(v :: ~vi),
v f = closureV(~xi, e, ρ′), |
−−−→vargs| = |~xi|
25 boolV(true) k = ifK(e2, e3) e2 ρ ~κ1
26 boolV(false) k = ifK(e2, e3) e3 ρ ~κ1
27 v v , letrecV( , , ), k = letK(x, e2) e2 ρ[x 7→ v] restoreK(ρ) :: ~κ1
28 v v , letrecV( , , ), k = recordK(
−−→
fldi, e :: ~ei, ~vi) e ρ recordK(
−−→
fldi, ~ei, v :: ~vi) :: ~κ1
29 v v , letrecV( , , ), k = recordK(
−−→
fldi, [], ~vi) recordV([
−−→
fldi 7→
−→vall]) ρ ~κ1
~vall = reverse(v :: ~vi)
30 recordV([ . . . , fld = v, . . . ]) k = accessK(fld) v ρ ~κ1
31 v v , letrecV( , , ), k = consK(cons) constructorV(cons, v) ρ ~κ1
32 constructorV(cons, v) k = caseK(−−−−−−−−−−→cons x⇒ ei) e ρ[x 7→ v] restoreK(ρ) :: ~κ1
(cons x⇒ e) ∈ −−−−−−−−−−→cons x⇒ ei
6
2.3 Helper Functions
We define the helper functions used by the previous sections. The functions are listed in alphabetical order.
2.3.1 valueOf
valueOf takes a binary operator and two values and returns the result of operating on those two values with said operator. By using
the ˆ symbol over an operator, we differentiate between a syntactic operator (e.g. +, which is just a token from the parser) and the
mathemtical meaning of this operator (e.g. +̂ , meaning addition). Normally, the semantic meaning of a syntactic operator is obvious,
but we must be explicit when defining semantics.
valueOf ∈ BinaryOp × Value × Value→ Value
valueOf(+, numV(n1), numV(n2)) = numV(n1 +̂ n2)
valueOf(−, numV(n1), numV(n2)) = numV(n1 −̂ n2)
valueOf(×, numV(n1), numV(n2)) = numV(n1 ×̂ n2)
valueOf(÷, numV(n1), numV(n2)) = numV(n1 ÷̂ n2)
valueOf(<, numV(n1), numV(n2)) = boolV(n1 <̂ n2)
valueOf(=, numV(n1), numV(n2)) = boolV(n1 =̂ n2)
valueOf(∧, boolV(b1), boolV(b2)) = boolV(b1 ∧̂ b2)
valueOf(∨, boolV(b1), boolV(b2)) = boolV(b1 ∨̂ b2)
2.3.2 reverse
reverse reverses the elements of the given list. Because its implementation is standard, we do not bother to list it explicitly.
reverse ∈ List[A]→ List[A]
3 Example
Example Program call1:
((x: num)⇒x)(7)
Table 2: The sequence of state transitions for program call1.
Rule # t ρ ~κ
8 ((x : num)⇒ x)(7) {} []
7 ((x : num)⇒ x) {} appK([7], []) :: []
23 closureV([x], x, {}) {} appK([7], []) :: []
2 7 {} appK([], [closureV([x], x, {})]) :: []
24 numV(7) {} appK([], [closureV([x], x, {})]) :: []
1 x {x 7→ numV(7)} restoreK({}) :: []
18 numV(7) {x 7→ numV(7)} restoreK({}) :: []
17 numV(7) {} []
4 Getting Stuck
In general, it is not always possible to transition from one state to another state. For example, consider the following program:
1 + true
Eventually, this program will reach a point where execution cannot proceed. Specifically, this will occur upon a call of the valueOf
helper function, with the + operation. The + operation expects both of its arguments to be instances of NumV , but in this case +
is being asked to operate on an instance of NumV and an instance of BoolV . There is no listed behavior for this case, so execution
cannot proceed. This is referred to as “getting stuck”. If we were to try to draw out the automata for this program’s execution, it
would abruptly stop at the state just before the fateful call to the valueOf helper function.
7
4.1 Relationship to Types
Oftentimes, stuck programs are somehow inherently erroneous. Stuck programs, by their construction, attempt to do something for
which we intentionally did not specify a behavior for, as no behavior made sense. Understandably, programmers usually do not want
their programs to get stuck, as stuck programs do not usually do useful work.
Type systems are commonly employed to help avoid stuck programs. Observe that with our example, the typechecker you
previously wrote for simpleFUN would reject this program ahead of time. In this way, static type systems try to eliminate programs
which we know may get stuck at runtime.
An alternative to rejecting programs which may get stuck ahead of time is to modify the semantics themselves to never get stuck.
With the previous example, this would mean adding a catch-all case to valueOf of some sort, describing exactly what to do if we
receive a nonsensical input. A simple approach is to return some sort of error code or to otherwise indicate that an error has occurred,
ideally in a way that the program itself can respond to the error. Importantly, these errors exist in the semantics themselves. This
approach is often employed in dynamically-typed languages upon encountering a runtime type error. Even statically-typed languages
usually have to do at least some runtime checks to avoid getting stuck, such as bounds checking for array accesses in Java.
While programmers generally want to know when their programs get stuck (as this usually indicates some sort of programming
error), observe that this comes at a cost: static type checking can be both difficult to perform and restrictive, and dynamically checking
types incurs runtime overhead. As such, some language implementations effectively give up on informing programmers that their
programs get stuck. This can simplify the implementation significantly without sacrificing performance or flexibility, though at the
cost of making the language less safe and reliable. This approach is taken by weakly typed languages like C, which give up on
trying to catch a number of programming errors, such as accessing out-of-bounds elements in arrays. Instead, C puts this burden on
programmers themselves, who “promise” the C compiler that they will not write programs that can ever possibly get stuck. If this
promise is ever broken, the behavior of the C code becomes undefined, meaning anything can happen. From an automata perspective,
undefined behavior means that instead of abruptly ending the automata at a given state S , S will instead transition to some completely
arbitrary state S ′. Moreover, based on the definitions specified in the C standard, the transition rules themselves get thrown out for
all subsequent execution: S ′ may transition to another completely arbitrary state, and there is no longer any way to predict what the
automata will do. In practice, this leads to all sorts of nasty bugs, and has been the basis for major exploits like Heartbleed and the
recent Cloudflare leaks.
8
The SimpleFUN Abstract Syntax
Semantics
Semantic Domains
Values
Continuations
State Transition Rules
Helper Functions
valueOf
reverse
Example
Getting Stuck
Relationship to Types