COMP3000: Tips for assignment 2
The assignment is all about syntax analysis; so we need to understand how all the Kiama stuff works. I will
use examples from the weeks 5, 6 classes.
Parsers
Consider:
lazy val term : PackratParser[Expression] =
term ~ (“*” ~> factor) ^^ { case t ~ f => StarExp (t, f) } |
term ~ (“/” ~> factor) ^^ { case t ~ f => SlashExp (t, f) } |
factor
This is defining a new parser (PackratParser) called term. It is going to produce an Expression. There are
three possible things that can be parsed as a term and they are laid out in the next three lines. Note that
those three lines must be separated by “|”. It is very easy to forget one of those at the end of a line (but
not the last line).
Note that the three alternatives for term are tried in the order they are listed. Consider why factor was
listed last.
Each of those lines is usually going to look like:
some things to be parsed ^^ what to do with the data that comes from the parsers
The right part of the line is done only if the left part was successfully parsed.
Left of ^^
Each parser produces some data but not all of it is useful; e.g.
term ~ (“*” ~> factor)
Here there are three parsers but we don’t need any data from parsing “*”. It needs to be part of what gets
parsed but once the parsing has been successful then any data from “*” can be ignored because there is
no relevant data from it. Instead we want the data from term and factor. In the week 6 class we saw how
to discard unwanted data by having the parser combinators pointing at what we want to keep. From the
data above, we get:
(data from term) ~ (data from factor)
Here, we can think of ~ as binding these two data items together. The result of this is passed to the right-
hand side of our line.
^^
Simplistically, ^^ is an operator that expects its left operand to be data and its right operand to be a
function; and it passes that data as an argument to the function. E.g.
a ^^ f evaluates f(a)
Right of ^^
The right-hand side of our line is:
{ case t ~ f => StarExp (t, f) }
This is a Scala partial function – its parameter must be of the form t ~ f otherwise an exception will be
thrown. The data getting passed to it is indeed of that form:
t corresponds to the data from term
f corresponds to the data from factor
This partial function must produce a result of type Expression because this is what the first line of the term
parser declares (Packrat[Expression]). StarExp is a subclass of expression, so the result will be an
Expression.
^^ with function without parameters
^^ is smarter than what has been described above; for this example, it is sufficient to write:
term ~ (“*” ~> factor) ^^ StarExp |
If ^^’s left operand has n items of data bound together by n-1 occurrences of ~ (e.g. a ~ b ~ c), and you
have a function that expects n arguments then the above form can be used.
This is often not possible. Consider the integer and factor parsers:
lazy val factor : PackratParser[Expression] =
integer ^^ (s => IntExp (s.toInt)) |
idnuse ^^ IdnExp
lazy val integer : PackratParser[String] =
regex (“[0-9]+”.r)
We can see that the integer parser will return a String. In factor, for the integer alternative, integer
returns a string but IntExp expects an Int; so this conversion has to be explicitly handled and a lambda
expression (rather than a partial function) has been used because there is not requirement to match on
the function parameter.
^^^
There is a variant of ^^ that is sometimes useful. In assignment 2 in the alternatives for the literal parser
there is:
“false” ^^^ BoolExp (false)
Basically, this just says to ignore the result of the left operand (after the parse has been successful) and
return the value of the right operand.
Other lines
Some lines are even simpler when the result of the parse is the result that is required; e.g. the third
alternative for term is:
factor
This parser returns an Expression (as can be seen from the definition of factor) and that is what is required
for term, so nothing further to do.
Parsing repetition
+ and * can be used to express repetition in parsing; e.g. :
(statement+) ^^ ExpProgram
This is requiring one or more statements for a program. + and * will produce vectors. Note that the
brackets around statement+ are required.
Sometimes a list of things with a separator between each is needed, e.g. to parse a list of integers (“3, 1,
4”). This could be done with (assuming at least one integer):
integer ~ ((“,” ~> integer)*)
but this is a bit clunky. The simpler way to do this is (when there is a separator):
rep1sep(integer, “,”)
rep1sep corresponds to +, meaning at least one occurrence of the parser expression that is the first
argument. The second argument is the separator. For zero or more occurrences (with a separator), use
repsep.
\ in Scala strings
In a string delimited by “ (e.g. “abc”), \ is used to signal special characters (e.g. \n for newline). To get an
actual \ in your string, you need \\. E.g. “ab\\c” is a string of length 4 containing the characters ‘a’, ‘b’, ‘\’,
‘c’.
Scala also has multi-line strings delimited by “””. You will see this used in some of the bigger test cases. In
a multi-line string, \ has no special meaning, so it can be used as is.
Precedence and associativity
You should not start writing code for the assignment until you have rewritten the assignment grammar to
remove ambiguity from the grammar by using the rules of precedence and associativity as spelt out in the
assignment spec.
This needs to be done (separately) for:
expressions – you will need expression parsers between factor and exp
types – tipe and basictipe are enough type parsers; you just need to fill in their details
patterns – pat and basicpat are enough pattern parsers; you just need to fill in their details
There is an exercise in the week 7 class that shows how to rewrite the grammar.
We tend to think of the rules as applying to binary operators but this is too narrow a view. A good
example is function application. In HasLang, this is done just by juxtaposing two expressions – no brackets
or binary operator required. It still needs full precedence and associativity treatment.
What are precedence and associativity
Associativity tells us how to bracket an ambiguous expression; e.g.
60 / 6 / 2
This can be bracketed as:
(60 / 6) / 2 or 60 /( 6 / 2) [giving 5 or 20]
Normally division is left associative (i.e. bracket from the left like the left one above; but for this
assignment it is right associative.
Operators also have different levels of precedence; e.g. multiplication has higher precedence than
addition. This means that the ambiguous expression:
2 + 3 * 4
which can be bracketed as:
(2 + 3) * 4 or 2 + (3 * 4)
is done the second way because the operator with higher precedence (*) is done first. Precedence will
always be an issue when there are two adjacent operators of different precedence (like the example
above).
FunLine
In the definitions of variables, variable values (expressions) and multi-line functions are all represented by
FunLine objects.
A Funline takes three values:
an identifier (string)
a vector of patterns
an expression
For a definition of a variable with a value, the first two FunLine fields are irrelevant and are filled with “”
and Vector(). For example for:
x :: Int = 100
we get:
Defn(IdnDef(“x”, IntType()),
Vector(FunLine(“”, Vector(), IntExp(100))))
For a multi-line function, the identifier repeats the name of the function and there is a non-empty vector
of patterns. For example for:
length :: [Int] -> Int
length [] = 0.
length h:t = 1 + length t
we get:
Defn(IdnDef(“length”, FunType(ListType(IntType()),
IntType())),
Vector(FunLine(“length”, Vector(ListPat(Vector())),
IntExp(0)),
FunLine(“length”,
Vector(ConsPat(IdentPat(“h”),
IdentPat(“t”))),
PlusExp(IntExp(1),
AppExp(IdnUse(“length”),
IdnUse(“t”)))))
The first function line was used to populate the IdnDef with the name and type of the function.
The second and third lines of the function (note the dot separator at the end of the second line) each have
a FunLine with “length”, a pattern and an expression.
Additional information based on consult session 13 Sep
In SyntaxAnalysis.scala:
in factor: don’t change the existing lines in factor, just add some extra lines, but always keep the
failure line last (and don’t use failure anywhere else in the file)
exp needs to be the top-level parser for expressions; you will need other expression parsers
between factor and exp (based on precedence and associativity analysis) but do not rename exp
None of the existing test cases address precedence or associativity. You will need to write some.
Testing
The way you will need to test that == is not associative is to parse “1 == 2 == 3” and find that it ignores the
last part (“ == 3”) and produces a syntax tree just for “1 == 2”.
Because I have made – and / right associative, this has some consequences if you have + and – or * and /
next to each other. For example, “1 + 2 – 3” and “1 / 2 * 3” don’t produce what you might expect. There
will be no test cases like these two.
Some precedence and associativity tests don’t make sense from a type perspective but still need to be
done.
Additional information based on consult session 20 Sep
In the HasLang syntax, function application is shown as: exp exp , i.e. two expressions next to each
other.
In the specification of associativity, the list cons operator is listed; this also refers to “:” when used in a
pattern.
Some parts of the syntax do not have precedence and associativity because there is no parsing ambiguity
when they are used – they can be added to factor.
Note that once you have rewritten the grammar using the precedence and associativity rules, your
grammar will have extra rows (like the B row and the C row in the Precedence and Associativity
presentation). The A row corresponds to factor and the D row corresponds to exp. For each of the rows in
between, you will need a new parser, i.e. something that starts with
lazy val your-parser-name : PackratParser[Exp] =
You have rewritten the grammar but it is still just a grammar. Each line now needs to use Kiama’s parser
combinators (~, ~>, <~) and rewritten like:
... ~ ... ^^ ... (with | at end if another alternative follows)
See week 6 class for how the IF statement was incorporated into the parsing.
There will be no testing involving HasLang comments. Just ignore comments.
Distinguishing between a bracketed expression and a tuple may be difficult. The last page of the
Precedence and Associativity presentation may provide a hint.
20 Sep 2021