COMP3161/COMP9164 Supplementary Lecture Notes
Semantics
Gabriele Keller, Liam O’Connor September 29, 2019
After discussing syntactic properties of languages, let us now look at the semantics of program- ming languages and how they can be specified using inference rules. In a programming language, we distinguish between the static semantics and the dynamic semantics.
1 Static Semantics
Static semantics includes all those semantic properties which can be checked at compile time. What these properties actually, and to which extend a compiler is able to extract information about them depends on the programming language. In general, they are either related to the scoping or he typing rules of a language. In some languages, the type of an expression or the scope of a variable can only be determined during runtime, so this is not part of the language’s static semantics. We will look at both, dynamic typing and dynamic scoping, later on in the course.
In our language of arithmetic expressions, typing rules are pretty pointless, since we only have a single type, and every expression which is syntactically correct is also type correct. So the only interesting static semantic feature of this language is the scoping of the variables. The expression
let x=x+1
in
x*x
is syntactically correct, but not semantically, since x in the subexpression x + 1 is not bound to a value. We would like to have a set of inference rules which define a predicate ok which holds for a given expression e is correct in the sense that it contains no free variables at all, therefore e ok. As we know from the definition of the abstract syntax, the expression can either be a number, a variable, an addition, a multiplication, or a let-binding. Therefore, we need an inference rule for each of these cases stating under which conditions the expression is ok. For expressions representing simple numbers, it is straight forward, as they never contain a free variable, and we might write:
(Num int) ok
However, variables are more difficult, so we leave it for later, and look at addition instead: a term
(Plus e1 e2) is ok if both e1 and e2 are ok:
t1 ok t2 ok
(Plus t1 t2) ok
Obviously, multiplication works in exactly the same way. Now, let a look at let-expressions: the term (Let e1 x.e2) is ok if e1 is ok. What about e2, though? It may contain free occurrences of x. So, to determine is e2 is ok, we have to somehow remember that x is bound in e2. It seems it is
1
not sufficient to define ok depending solely on the expression, but we need a way to keep track of the variables which are bound in the expression. This can be done by changing adding a context Γ, a set of assumptions, to our judgment, and write
Γ ⊢ e ok
to mean e is ok under the assumptions in the context Γ. In this case, Γ will consist of assumptions written x bound, that indicates that x is in scope.
We extend the rules for numbers, addition and multiplication to include the context Γ: they do not change otherwise, for these cases, it is neither necessary to add anything to the environment nor to check the content.
Γ ⊢ (Num int) ok
Γ⊢t1 ok Γ⊢t2 ok Γ⊢t1 ok Γ⊢t2 ok
Γ⊢(Plus t1 t2)ok Γ⊢(Times t1 t2)ok
We can now also handle the case where the expression consists of a single variable: the expression is ok only if the variable is already in the environment. In case of a let-binding, we first check the right-hand side of the binding with the old environment, and then insert the new variable into the environment and check the body expression:
xbound ∈Γ Γ⊢t1 ok Γ,xbound ⊢t2 ok Γ⊢xok Γ⊢(Let t1 x.t2)ok
Initially, the environment is empty, and a arithmetic expression is ok if and only if we can derive ⊢ e ok (with an empty context) using the rules listed above.
2 Dynamic Semantics
Semantics of a programming language connect the syntax of the language to some kind of computa- tional model. There are different techniques to describe the semantics of programming languages: axiomatic semantics, denotational, and operational semantics. In this course, we only use opera- tional semantics, that is, we specify how programs are being executed.
Small Step Operational Semantics, also called Structured Operational Semantic (SOS) or Tran- sitional Semantics achieves this by defining an abstract machine and step-by-step execution of a program on this abstract machine. Big Step, Evaluation or Natural Semantics specifies the se- mantics of a program in terms of the results of the complete evaluation of subprograms.
2.1 Small Step or Structural Operational Semantics(SOS)
Let us start by giving the SOS for the arithmetic expression example. We first have to define a transition system, which can be viewed as a abstract machine together with a set of rules defining how the state of the machine changes during the evaluation of a program. To be more precise, we need to define:
• a set of states S on an abstract computing device
• asetofinitialstatesI ⊆ S
• asetoffinalstatesF ⊆ S
• a relation → ∈ S × S describing the effect of a single evaluation step on state s
2
A machine can start up in any of the initial states, and the execution terminates if the machine is inafinalstate. Thatis,forsf ∈ F,thereisnos ∈ S suchthatsf → s.
According to this notation s1 → s2 can be read as state s1 evaluates to s2 in a single execution step. A execution sequence or trace is simply a sequence of states s0, s1, s2, …, sn where s0 ∈ I ands0 →s1 →s2 →…→sn
We say that a execution sequence is maximal if there is no sn+1 such that sn → sn+1, and complete complete, if sn is a final state.
Note that, although every complete execution sequence is maximal, not every maximal sequence is complete, as there may be states for which no follow up state exist, but which are not in F . Such a state is called a stuck state and in some sense, it corresponds to run-time errors in a program. Obviously, stuck states should be avoided, and transition systems defined in a way that stuck states cannot be reached from any initial state.
How should S, I, F and → be defined for our arithmetic expressions? If we evaluate arithmetic expressions, we simplify them step wise, until we end up with the result. We can define our transition system similarly.
The Set of States S we include all syntactically correct arithmetic expressions. S ={e|∃Γ.Γ⊢ eok}
The Set of Initial States I then should be all expressions which can be evaluated to a integer value. These are all syntactically correct expressions without free variables:
I={e|⊢ eok}
The Set of Final States F as every expression should be evaluated to a number eventually,
we define the set of final states
F = {(Num int)}
Operations The next step is to determine the operations of the abstract machine. For all the arithmetic operations like addition and multiplication, we need the corresponding “machine” operation. To evaluate let-bindings, we also have to be able to replace variables in an expression by a value, so we add substitution of variables to the set of operations our abstract machine can perform. Substitution is a reasonable complex operation requiring the traversal of the whole expression, so by assuming substitution is an operation of the machine means that it is pretty far from a realistic machine. We will later look at alternative definition of abstract machines which do without substitution.
The →-Relation Finally, we do have to define the →-relation inductively over the structure. We do not have to provide any rules for terms of the form (Num n), since they represent final states. We start with the evaluation of addition. Keep in mind that → only describes a single evaluation step of the machine. If both arguments of plus are already fully evaluated, we can simply add the two values using the “machine addition”:
(Plus (Num n) (Num m)) → (Num (n + m))
What should happen if the arguments are not yet fully evaluated? We have to decide which
argument to evaluate first — it does not really matter. So, we start with the leftmost:
e 1 → e ′1
(Plus e1 e2) → (Plus e′1 e2)
3
This step is repeated until the first argument is fully evaluated, at which point be continue with the second argument:
e 2 → e ′2
(Plus (Num n) e2) → (Plus (Num n) e′2)
Multiplication works in exactly the same way:
(Times (Num n) (Num m)) → Num (n ∗ m)
e 1 → e ′1
(Times e1 e2) → (Times e′1 e2)
e 2 → e ′2
(Times (Num n) e2) → (Times (Num n) e′2)
Let-bindings are slightly more interesting. Again, we have to make up our mind in which order we want to evaluate the arguments. If we evaluate the first argument (i.e., the right-hand side of the binding) first and then replace all occurrences of the variable by this value, we have the following rules:
e 1 → e ′1
(Let e1 x.e2) → (Let e′1 x.e2)
(Let (Num n) x.e2) → e2[x := (Num n)]
Note that we could have decided to replace the variable immediately with the expression e1 in e2. If x occurs in e2 multiple times, it means, however, that we copy the expression, and consequently have to evaluate it more than once.
Example Given the rules listed above, the evaluation of an expression (Let (Plus 5,3) x. (Times x, 4)) proceeds as follows:
(Let (Plus (Num 5) (Num 3)) x.(Times x (Num 4)))
→
→
→
(Let (Num 8) x. (Times x (Num 4)))
(Times (Num 8) (Num 4))
(Num 32)
Note that we did not give the derivation for each of the evaluation steps.
4
⋆
More Notation We use the relation s1 → s2 to denote that a state s1 evaluates to a state s2 ⋆
in zero or more steps. In other words, the relation → is the reflexive, transitive closure of →. That is
⋆
s → s
s1 →s2 s2 →s3
s1 → s3
Furthermore, we write s1 → s2 to express that s1 fully evaluates in zero or more steps to a state
⋆ ⋆
!
2.2 Big Step Semantics
s2, or more formally,
!′⋆′′ s→s,ifs→sands ∈F
Let us now look at the big step semantics. Similar to the initial states in SOS, we have to define a set of evaluable expressions E, a set of values (corresponding to the final states in SOS), which can, but do not have to be a subset of E. Finally, we define a relation “evaluates to” ⇓ ∈ E × V . Note that in contrast to SOS, the relation does not say anything about the number of steps the evaluation requires.
Applied to our example, we define
• the set E of evaluable expressions to be : {e | ⊢ e ok} • the set V of values to be integers.
To define ⇓, we again have to consider all possible cases for e. This time, we do need rules for e = (Num(n)):
(Num n) ⇓ n
On the other hand, we only need a single rule for each addition and multiplication, big step semantics does not specify which of the arguments is evaluated first, since the evaluation of the two expressions is independent:
e1 ⇓ n1 e2 ⇓ n2 (Plus e1 e2) ⇓ (n1 + n2)
e1 ⇓ n1 e2 ⇓ n2 (Times e1 e2) ⇓ (n1 × n2)
The rules for the let-binding, however, state that e1 has to be evaluated first, because the variable is replaced by the resulting value, and therefore we have a data dependency:
e1 ⇓n1 e2[x:=(Num n1)]⇓n2 (Let e1 x.e2) ⇓ n2
Now consider the example expression used to demonstrate evaluation using the SOS rules. Not surprisingly, the expression evaluates to the same value using big step semantics. Here is the derivation:
(Num 5)⇓5 (Num 3)⇓3 (Num 8)⇓8 (Num 4)⇓4 (Plus (Num 5) (Num 3)) ⇓ 8 (Times (Num 8) (Num 4)) ⇓ 32
(Let (Plus (Num 5) (Num 3)) x.(Times x (Num 4))) ⇓ 32
The concept of an evaluation sequence does not make sense for big step semantics, as expressions evaluate in a single “step”.
5