程序代写代做代考 Haskell go flex compiler COMP3161/COMP9164 Supplementary Lecture Notes

COMP3161/COMP9164 Supplementary Lecture Notes
Syntax
Gabriele Keller September 26, 2019
1 Concrete Syntax versus Abstract Syntax
The concrete syntax of a programming language is designed with the user/programmer in mind: it should be well structured and easy to read. The parser checks if a given program adheres to the concrete syntax and translates it into a suitable internal representation. The internal representation is usually quite different form the concrete syntax expression. To demonstrate why, let us go back to the arithmetic expressions example. Consider the following three expressions:
•1+2∗3
• (1)+(2)∗(3) • ((1))+(2∗3)
Syntactically, all three are different, but semantically, they denote exactly the same computation. Therefore they should ideally have the same internal representation. If we would have chosen a term representation of arithmetic expressions instead of an infix notation, we would not have had to worry about the ambiguity of the grammar nor about superfluous parentheses, and we could have defined the language with the following three simple rules:
i ∈ Int (Num i) expr
t1 expr t2 expr t1 expr t2 expr (Plus t1 t2) expr (Times t1 t2) expr
and all three expressions above would correspond to the term Plus (Num 1) (Times (Num 2) (Num 3)). Such a term-based syntax is obviously not well suited as concrete syntax of a practical language — it would be a nightmare to write a program in this style. However, it is the appropriate format for the internal representation. A parser, therefore, has to translate expressions of the concrete syntax into terms of the abstract syntax.
2 First Order Abstract Syntax
How can we specify the translation of expressions of concrete syntax into abstract syntax terms? As we discussed previously, inference rules and judgements cannot only be used to define simple properties of single objects, but also relationships between a number of objects. We use inference rules, therefore to define a relationship
e1 SExpr ←→ e2 expr
which holds if and only if the (concrete grammar) expression e1 corresponds to (abstract grammar) expression e2.
1

We take the inference rules of SExpr as basis, and add the translation for each case:
e PExpr ←→ e′ expr e SExpr ←→ e′ expr
e FExpr ←→ e′ expr e PExpr ←→ e′ expr
eSExpr←→e′ expr
(e) FExpr ←→ e′ expr i FExpr ←→ (Num i) expr
Previously, we used the inference rules to prove that an object had a certain property, e.g., that 1 + 3 is indeed an SExpr. With relations, inference rules become more interesting. For example, we can view the rules as a description of how to construct a abstract syntax term for a given arithmetic expression, say 1 + 3 ∗ 4. In this case, we start of with the proof/derivation goal 1 + 2 ∗ 3 SExpr ←→ ???, that is, the right hand side of the relation is not fixed yet. To apply the addition rule, however, it has to have the form Plus ?? ??. With every rule application, we know more about the exact form of this term, until we finally end up with Plus (Num 1) (Times (Num 2) (Num (3))) :
e1 SExpr ←→ e′1 expr e2 PExpr ←→ e′2 expr e1 + e2 SExpr ←→ (Plus e′1 e′2) expr
e1 PExpr ←→ e′1 expr e2 FExpr ←→ e′2 expr e1 * e2 PExpr ←→ (Times e′1 e′2) expr
i∈Int
1 FExpr ←→ (Num 1) expr 1 PExpr ←→ (Num 1) expr 1 SExpr ←→ (Num 1) expr
2 FExpr ←→ (Num 2) expr 3 FExpr ←→ (Num 3) expr 2 PExpr ←→ (Num 2) expr
2 ∗ 3 PExpr ←→ (Times (Num 2) (Num 3)) expr 1 + 2 ∗ 3 SExpr ←→ (Plus (Num 1) (Times (Num 2) (Num 3))) expr
The process of finding, for a given s, s SExpr, a term t, t expr, with s SExpr ←→ t expr is called parsing. A parser has to be complete, in the sense that for every s SExpr it should find the corresponding abstract syntax term. Furthermore, it has to be unambiguous, and return for every s SExpr a unique t. The inverse process is called unparsing. Since each of the parsing rules given above directly corresponds to a production rule of SExpr, it is trivial to show the completeness using rule induction.
We can interpret the inference rules given above also differently, and instead of deriving a term t expr for a given s, we can start with an abstract syntax term and derive a matching arithmetic expression. This process is called unparsing. Unparsing is, in our example and in general, ambiguous. It is also not necessarily complete, although it is for the arithmetic expression we defined. Going one step further and converting the concrete syntax token sequence back into a string is called pretty printing. Pretty printing is useful, for example, to view intermediate code in a compiler, or in tools which re-format a program. Pretty printing is even more ambiguous than unparsing, since we are free to add spaces and new lines in many places. A pretty printer aims at choosing the most readable representation, which is of course a very subjective measure.
3 Higher Order Abstract Syntax
Let us extend our simple language of arithmetic expressions by introducing variables and a let- construct to bind variables to values:
let x = 3 let x = 3
in x+1 in let y = x+1
in x+y
We need to extend the set of rules which define the concrete and abstract syntax accordingly.
2

Concrete Syntax: A let-expression acts like a parenthesised expression, and behave therefore like a FExpr. As with numbers, where we use int to denote an integer without further specifying it, we use id to represent an identifier:
e1 SExpr e2 SExpr letid=e1 ine2 FExpr
Abstract Syntax: Variables are represented by terms, similar to numbers, with the exception that the argument of the term is a string id which contains the name of the variable. A let- expression is represented by a term which requires three arguments: an identifier (bound variable), a term (the term the variable is bound to), and the body of the let-expression, i.e., the term in which the variable is defined. Since the first argument has to be an identifier, we can drop the var operator in the argument position:
(Var id) expr t1 expr t2 expr (Var id) expr (Let id t1 t2) expr
3.1 Scope of a Variable
Given an expression let x = t1 in t2, the variable x is bound in t2, but not in t1. The term t2 is called the scope of x, and x is bound to the value of t1 everywhere in t2. In particular, in the following situation where the same variable is bound twice, the outer binding is shadowed by the inner binding, and the value of the expression is 10.
let x = 3
in let x = 5
in x + x
Note that the scope of a variable in a let-binding is defined differently in Haskell: x is bound
in t1 and t2. As a consequence, the compiler accepts an expression
let
x = x+1
in x
even though the expression cannot be evaluated and leads to a run-time exception.
3.2 Representation of Variables
In the first order abstract syntax definition of arithmetic expressions, variables are treated just like numbers and represented by terms, although they play a special role. Higher-order abstract syntax addresses this shortcoming, and provides variables and variable bindings as part of the meta-language: first order terms can either
1. be a constant (e.g, ints, strings), or
2. have the form (Op t1 … tn), where t1 to tn are terms,
– (Num 4)
– (Plus (Num 2) (Num 1))
In addition, a higher-order term can
3. be a variable
4. have the form x.t′, meaning the variable x is bound in term t′
– x. (Plus x (Num 1))
3

– x.y. (Plus x y)
An term of the form x.t is called an abstraction. It is a term whose value depends1 on the value of the variable x. In this respect, it is similar to a function body.
An abstraction x.t is said to bind all occurrence of x in t. All variables of a term which are not bound at the position they occur are called free variables of that term. We denote the set of free variables of a term by FV(t). It is inductively defined follows:
FV (int) = {}
FV (x) = {x}
FV (o (t1, …, tn)) = FV(t1) ∪ … FV(tn) FV (x.t) = FV(t) \ {x}
For example, x is in FV ( plus (x, let (5, x.x))), which corresponds to the concrete syntax expres- sion
x + (let x = 5 in x)
since
If we want to use higher order syntax, we have to change the rules for variables and let-bindings in the definition of the abstract syntax:
t1 expr t2 expr id expr (Let t1 (id.t2)) expr
and adapt the the translation accordingly:
e1 SExpr ←→ t1 expr e2 SExpr ←→ t2 expr id FExpr ←→ (id expr) let id = e1 in e2 FExpr ←→ (Let t1 (id.t2)) expr
Now, the operator let accepts only two arguments, one being the right hand side, the second the abstraction of the body of the body.
3.3 Substitution and α-equivalence Consider, for example, the following two expressions:
let let x=3 y=3
in in x+1 y+1
They express exactly the same computation and only differ in the choice of the variable names. They are represented by the term Let (Num 3) (x. Plus x (Num 1)) and Let (Num 3) (y. Plus y (Num 1)) respectively. If two terms, as in the above example, can be made identical by renaming the variables, they are called α-equivalent, written ≡α. As the name suggests, ≡α is an equivalence relation. This means ≡α is
1. reflexive: for all terms t, t ≡α t
1More precisely, may depend, since it is possible that x does not actually occur in t, as in x.1
FV ( (Plus x (Let (Num 5) (x.x)))) =FV (x) ∪ FV (Let (Num 5) (x.x)))) ={x} ∪ FV(5) ∪ FV(x.x)
={x} ∪ (FV(x) \ {x})
={x} ∪ ({x} \ {x}) ={x} ∪ {}
4

2. symmetric: for all terms t1, t2, if t1 ≡α t2 then t2 ≡α t1
3. transitive: for all terms t1, t2, and t3: if t1 ≡α t2 and t2 ≡α t3 then t1 ≡α t3
If we want to determine the value of a let-expression, at some point, we have to replace the variable in the body with the value the variable is bound to. This process of replacing a variable with a value, or in general, an arbitrary term, is called substitution. We use the notation:
t[x := t′]
to describe a term t where every free occurrence of x has been replaced by t′. We can rename the variables in a term now by replacing the variable at its binding occurrence, and substituting it wherever it occurs freely in the term:
x.t ≡α y.t[x := y], if y ̸∈ FV(t)
We have to be careful about the choice of y, though. If we try to rename x to y in the term x. x + ywedonotwanttoendupwiththetermy. y + y,sincetheyintheoriginalterm is now captured, and the new term is not α-equivalent to the original term anymore. Therefore, we require that the new variable does not occur freely anywhere in the original term.
Let us now give the exact definition of substitution, first for variables:
x[x := y]
z[x := y]
(Op t1 …tn)[x := y] x.t[x := y]
z.t[x := y]
y.t[x := y]
= y
= z, if x ̸= z
= (Op t1[x := y] … tn[x := y]) = x.t
= z.t[x := y] if x ̸= z, y ̸= z, = undefined if x ̸= y
To avoid the problem of capturing, we require that the variable which is introduced does not occur anywhere in the binding position of a term. Similarly, if we substitute terms, we require that none of the free variables in the new term occurs at a binding position in the original term:
x[x := u]
z[x := u]
(Op t1 …tn)[x := u] x.t[x := u]
z.t[x :=u]
y.t[x :=u]
= u
= z, if x ̸= z
= (Op t1[u := x]… tn[u := x])
= x.t
= z.t[u :=x] if x ̸= z, z ̸∈ FV(u), = undefined if y ∈ FV(u)
In practice, it does not matter that substitution is only partially defined, because we can always rename the variable such that clashes do not occur. Many compilers will actually rename all the variables defined by the user and map them to unique names to simplify the further compilation steps. We silently assume from now on that the programs we are dealing with have unique names.
5