Advanced Programming DIKU, B1-2017/2018 Question 1: AP : A simple computer algebra system
Computer algebra system (CASs), such as M or M , are domain speci c languages for manipulating – often interactively – mathematical expressions, for purposes such as algebraic simpli cation or symbolic di erentiation.
Modern CASs actually have little, if any, knowledge of common mathematical syntax or semantics hard-wired into their core program code. Instead, they are based on relatively general-purpose term-rewriting engines that can be instantiated to many di erent tasks. In particular, such engines support repeatedly applying simple equations from algebra and calculus to transform a term into a desired form. As a special case, this rewriting can include just evaluating a closed arithmetic expression to a numeric result (like in a functional language), but CASs can also work meaningfully with terms containing unbound variables. In this question, you will implement a simple CAS called AP (pronounced like
“maple”, but without the initial ‘m’).
Informal presentation of AP
An AP term is either a variable, an (unbounded) integer, or a function symbol applied to a list of arguments. Some binary functions may be optionally declared as in x operators with speci ed precedence and associativity conventions. For example, f(x,y)+2 is a valid term.
AP rewriting rules are written as algebraic equations between terms, considered as oriented from left to right. For example, here are some possible simpli cation rules ( le tiny.ap) governing addition and multiplication (assuming that + and * have been declared as left-associative in x operators):
- 1 0 + t = t.
- 2 t + 0 = t.
- 3 t1 + (t2 + t3) = (t1 + t2) + t3.
- 4 t + t = 2 * t.
- 5 0 * t = 0.
(The rule numbers in the margin are just for reference in the following.) Rewriting a term consists of repeatedly matching the left-hand side (LHS) of a rule against a part of the term (by suitably instantiating variables in the rule) and replacing that subterm with the right-hand side (RHS) of the rule (with the same instantiation applied).
For example, with the previous ve rules, we can pose a query to simplify an expres- sion:
> 3 * (x + (0 + y)) ? 3*(x+y)
Note that this simpli cation can be achieved either by applying rule 1 to the subexpression 0 + y (instantiating t in the rule as y), or by rst applying rule 3 (with t1 taken as x, t2 as 0, and t3 as y), and then rule 2 (with t as x).
3
Advanced Programming DIKU, B1-2017/2018
In general, especially for complex rule sets, simplifying an expression may give di erent results depending on the strategy used for choosing which rules to apply, and where. To see how a result was derived, we can request verbose output from the query:
> (0+x)+(x+0) ?? 0+x+(x+0) = 0+x+x+0 = 0+x+x =
x+x = 2*x
(Note that terms are output without redundant parentheses, given that + is declared as left-associative; but rules only match according to the underlying tree structure of the term. For example, we could not use rule 4 to rewrite the term 0+x+x+0 to 0+2*x+0, because the term is actually ((0+x)+x)+0, which does not have x+x as a subterm.)
It is useful to think of the term on the LHS of a rule as a pattern in a functional language. However, unlike in Haskell (but like in Erlang or Prolog), the pattern may contain multiple occurrences of the same variable, in which case the rule will only match if all occurrences are instantiated to identical terms. For example, rule 4 above would rewrite x*y + x*y to 2*(x*y), but it it would not match, e.g., x+y or (x+1)+(1+x).
Also, like in a functional language, all variables occurring on the RHS of the rule must be bound by the rule; otherwise an error is reported:
> f(x) = y+1. > f(z)? f(z) Error: Unbound var: y
Sometimes we only want to apply a rule in particular circumstances. For example, if we want to move numeric constants to the end of an expression, we might add a conditional rule,
n + t = t + n | num(n).
This says that we should only swap the arguments of an addition when the rule variable n is instantiated as a numeric constant (as checked by the built-in predicate num), while t can be an arbitrary term. (Note that this rule will still result in in nite rewriting on a term like 3 + 4.) A more general form of conditional rules also allows variables to be bound by computation:
n1 + n2 = n3 | num(n1), num(n2), add(n1, n2; n3).
This says that we can constant-fold a +-expression: the built-in predicate add binds its third argument to the arithmetic sum of the rst two, which must be numerals. (The semicolon separates input and output arguments to the predicate.) Adding this rule before the commutativity one will ensure that 3+4 gets rewritten to 7, rather than to 4+3.
4
Advanced Programming DIKU, B1-2017/2018
Finally, as demonstrated previously, it is possible to accidentally specify rule sets that will lead to in nite rewriting in some cases. AP has two mechanisms for coping with this. First, if successive rewriting steps ever reach a term seen before, further rewriting would be pointless since (AP ’s strategy being deterministic), we’d just go into an in nite loop. So, with the rules above (except the constant folding), we’d get:
> 3 + (4 + y) ?? 3+(4+y) = 3+4+y = 4+3+y =
4+3+y Error: Loop
Second, any rewriting sequence will stop after a con gurable number of steps. (For simplicity, this is just a constant maxSteps in AST; a more general approach would be to include it in GEnv so that it could be changed interactively.) For example,
> t(x) = t(x+1) | num(x). > t(5)? t(505) Error: Too many steps
Note that each recursive call involves two rewriting steps: t(5) = t(5+1) = t(6) = t(6+1) = …. That is why we stop at 505, not at 1005.
A larger set of rules can be seen in Figure 1. With these de nitions, a sample interaction could be:
> mypoly(x,y)=(x+y)**3. > mypoly(3,4)? 343 > D(a,mypoly(a,b))? a*a+a*b+a*a+a*b+b*a+b*b+b*a+b*b+a*a+a*b+b*a+b*b
With this introduction, we now present the details of the assignment.
Part 1: Syntax of AP
The full grammar of AP is shown in Figure 2
This grammar can be parsed into the following AST:
module AST where
type ErrMsg = String — human-readable error messages
type VName = String — variable names
type FName = String — function (including operator) names
5
Advanced Programming DIKU, B1-2017/2018
n1 + n2 = n3 | num(n1), num(n2), add(n1,n2;n3). n1 * n2 = n3 | num(n1), num(n2), mul(n1,n2;n3).
0 + t = t. t + 0 = t. t1 + (t2 + t3) = t1 + t2 + t3.
t1 - t2 = t1 + ~1 * t2.
0 * t = 0. 1 * t = t. (t1 + t2) * t3 = t1 * t3 + t2 * t3.
t * 0 = 0. t * 1 = t. t1 * (t2 + t3) = t1 * t2 + t1 * t3.
t ** 0 = 1. t ** n = t * t ** (n + ~1) | num(n).
D(x,n) = 0 | num(n).
D(x,x) = 1. D(x,y) = 0 | var(y), lexless(x,y). D(x,y) = 0 | var(y), lexless(y,x).
D(x,t1+t2) = D(x,t1) + D(x,t2). D(x,t1*t2) = t1*D(x,t2) + t2*D(x,t1).
Figure 1: Sample rule collection, rules.ap
6
Advanced Programming
DIKU, B1-2017/2018
Term ::= |
| | |
Termz ::= |
Terms ::= |
Cond ::= |
Conds ::= |
Rule ::= |
Cmd ::= |
|
vname
number
fname ‘(’ Termz ‘)’ Term oper Term
‘(’ Term ‘)’
Terms Term
Term ‘,’ Terms pname ‘(’ Termz ‘)’
pname ‘(’ Termz ‘;’ Terms ‘)’ Cond
Cond ‘,’ Conds Term ‘=’ Term ‘.’
Term ‘=’ Term ‘|’ Conds ‘.’
Rule
Term ‘?’ Term ‘??’
Cmds ::= |
Cmd Cmds
• vname, fname, pname: any non-empty sequence of letters and digits, starting with a
Nonterminals:
letter. There are no reserved names.
- number: any non-empty sequence of decimal digits, optionally preceded (without intervening whitespace) by a tilde character (‘~’), representing a negative sign.
- oper: any non-empty sequence of characters from the set “!@#+-*/\<>=”, except for a single ‘=’ (which is reserved.
Whitespace:
• All tokens may be separated by arbitrary whitespace (spaces, tabs, and newlines). • There are no comments.Disambiguation: see text.
Figure 2: Grammar and lexical specication of AP
7
Advanced Programming
DIKU, B1-2017/2018
type PName = String — predicate names
data Term = TVar VName
| TNum Integer
| TFun FName [Term]
deriving (Eq, Ord, Show, Read)
data Cond = Cond PName [Term] [Term] deriving (Eq, Show, Read)
data Rule = Rule Term Term [Cond] deriving (Eq, Show, Read)
data Cmd = CRule Rule
| CQuery Term Bool{-verbosity flag-} deriving (Eq, Show, Read)
data Fixity = FLeft | FRight | FNone deriving (Eq, Show, Read)
data OpTable = OpTable [(Fixity, [FName])] deriving (Eq, Show, Read)
-- the remaing definitions only relate to the Semantics part
data Rsp = Rsp [Term] (Maybe ErrMsg) deriving (Eq, Show)
maxSteps :: Int maxSteps = 1000
Since AP is a general-purpose system, none of the operators are hardcoded in the grammar. Rather, the available operators are speci ed in a separate operator table, such as arithmetic.op:
OpTable [(FNone, ["<=", "<"]),
(FLeft, ["+", "-"]), (FLeft, ["*"]), (FRight, ["**"])]
This table lists all the operators, grouped by increasing order of precedence. To avoid ambiguities, we specify that all operators at a given precedence level have the same associativity (as in Haskell’s infixl, infixr, and infix declarations). You may assume that the operator table is well formed, i.e., that all operator names are lexically valid, and that any operator name occurs at most once in the tale.
8
Advanced Programming DIKU, B1-2017/2018 Question 1.1: Parsing
Implement a parser for AP , in the le ParserImpl.hs The parser must provide the following top-level functions:
parseStringTerm :: OpTable -> String -> Either ErrMsg Term parseStringCmds :: OpTable -> String -> Either ErrMsg [Cmd]
You may use Parsec or ReadP for your parser. If you use Parsec, then only plain Parsec is allowed, namely the following submodules of Text.Parsec: Prim, Char, Error, String, and Combinator (or the compatibility modules in Text.ParserCombinators). In particular you are disallowed to use Text.Parsec.Token and Text.Parsec.Expr.
Hint If you cannot get the general OpTable-parameterized parser to work, make the exported parser functions check that their rst argument is precisely the xed table in arithmetic.op, and hard-code this collection of operators in your grammar. Be sure to document the restriction in your report.
In fact, writing such a hard-coded parser may be a useful rst step towards producing a general one.
Question 1.2: Pretty-printing
Implement a pretty-printer for terms. The output should be a syntactically valid AP Term. It should contain the minimal number of parentheses, and no whitespace. For example, with the standard operator table, the term
TFun “*” [TNum 3,
TFun “+” [TFun “+” [TVar “x”,
TFun “f” [TVar “y”, TNum 4]],
TVar “z”]]
should pretty-print as the string “3*(x+f(y,4)+z)”.
The printer, in PrinterImpl.hs, must provide the following function: printTerm :: OpTable -> Term -> String
Hint If you cannot get the OpTable-parameterized pretty-printer to work, make a rst version that simply includes enough parentheses in the output that operator precedences and associativities do not matter. As a second version, make one that handles the xed set of operators from arithmetic.op correctly, and prints all others with redundant parentheses, so that the output is still parseable. You might nd the version that prints many parentheses useful in your testing.
9
Advanced Programming DIKU, B1-2017/2018 Question 1.3: QuickChecking the syntax handling
In addition to the normal unit testing of the Parser and Printer modules separately, you should devise one or more QuickCheck tests, in le SyntaxQC.hs, that verify some non- trivial properties of the parser and/or printer.
It is important that your quick-check tests are black-box: they should only refer to the above-mentioned three top-level functions, as exported by the Syntax module. That is, your tests would also work (and potentially nd bugs in) alternative implementations of the Syntax API. In the report, discuss brie y what kind of errors your QuickCheck tests would and would not be likely to nd in someone else’s implementations of the Parser and Printer.
Part 2: Semantics of AP
We elaborate on the intended behavior of AP (beyond the previously given informal overview) in terms of concrete syntax for readability, but it is not a requirement that you have a functioning parser or printer to complete this part.
Rule-application terminology A rule consists of a LHS term, a RHS term, and zero or more conditions. We say that a rule matches a term if we can consistently bind the rule’s variables to terms, such that the LHS instantiated according to those bindings becomes identical to the term we’re trying to match. The rule applies to the term if it matches, and additionally all the rule conditions (if any) are satis ed (as detailed below). The result of the application is the RHS of the rule, with all variables instantiated according to the bindings. (It is an error for the RHS to contain unbound variables.)
Finally, a rule applies inside a term if it applies to a proper subterm of the term. In that case, the result of the whole application is the result of applying the rule to the subterm, placed into the subterm’s original context, as in normal equational reasoning about algebraic expressions.
Rule selection For a given term, several rules may apply to it, or inside it. AP speci es a deterministic strategy for where and how to apply rules in each rewriting step:
- Iftwopotentialruleapplicationsarenestedwithineachothertheoutermostischosen. For example, using the rules from tiny.ap in the informal presentation, the term 3 + 0 * (0 + x)isrewrittenbyrule5to3 + 0(andnotbyrule1to3 + 0 * x).
- Iftwopotentialruleapplicationsareindependent(non-nested),theleftmostischosen. Forexample,theterm(3 + 0 * x) * (x + 0)isrewrittenbyrule5to
(3 + 0) * (x + 0)(andnotbyrule2to(3 + 0 * x) + x).
• If multiple rules apply at the same position in the term, the rst one (in the order listed) is chosen. For example, the term 3 * (0 + 0) is rewritten by rule 1 to 3 * 0 (andnotbyrule4to3 * (2 * 0)).
10
Advanced Programming DIKU, B1-2017/2018
Rule conditions In a rule with conditions, the conditions are evaluated left-to-right, after any variable bindings arising from matching the LHS of the rule against the term. (It is an error for an input argument of a condition to contain unbound rule variables.) The conditions are all invocations of a collection of built-in predicates. A successful predicate invocation may further bind rule variables in the output arguments of the condition. Such bindings apply to any subsequent conditions in the list, as well as to the RHS of the rule.
The built-in predicates are:
• num(t) succeeds i t is a numeric constant.
• var(t) succeeds i t is a variable.
• add(t1,t2;t3) succeeds if t1 and t2 are numeric constants, and t3 can be bound to (or already is) their sum. For example, add(3,4;7) simply succeeds; add(3,4;x) (where x is not already bound) succeeds while binding x to 7; and add(3,4,8) fails. If t1 and/or t2 are not numbers, the predicate signals an error.
• mul(t1,t2;t3) is analogous to add, only computing the product of the numbers, instead of their sum.
• lexless(t1,t2) succeeds if the term t1 is lexicographically less than t2 (as de ned by Haskell’s automatically derived Ord class instance on the type Term). In particular, if t1 and t2 are both numbers, the lexicographic ordering coincides with the usual arithmetic one; and if they are both variables, the ordering coincides with the string ordering on the variable names.
Any attempted invocations of predicates other than the above, or with the wrong argument counts, signal an error.
Question 1.4: A rewriting engine for AP
Your engine, in SemanticsImpl.hs should be organized as speci ed below. It is very important that you do not modify the types of any of the intermediate functions you are asked to implement, as they will be subjected to automated testing (as well as human inspection).
The Global monad and related functions
type GEnv = [Rule]
newtype Global a = Global {runGlobal :: GEnv -> Either (Maybe ErrMsg) a} instance Monad Global where …
getRules :: Global [Rule]
failS :: Global a
failH :: ErrMsg -> Global a
tryS :: Global a -> Global a -> Global a
11
Advanced Programming DIKU, B1-2017/2018
The Global monad organizes the general rewriting process, not related to any particular rule application. The GEnv type contains the list of all currently available rewrite rules, accessible at any time with the getRules function.
We also formalize that the rewriting can fail, in one of two ways: (1) soft failures, signaled by failS, which represent recoverable conditions, such as a particular rule LHS not matching a term, or a rule condition not being satis ed; and (2) hard failures, signaled by failH, which indicate that the rewriting process has encountered an irrecoverable error condition, such as referencing an unbound variable or predicate, or a rewriting loop. Soft failures carry no further data, while hard failures come with an error message to be reported; this correspond to the Maybe ErrMsg type in the Left branch of Global.
The function tryM m1 m2 runs m1; if that succeeds, its result is the result of the whole expression, and m2 is not used. If m1 signals a soft failure, m2 is run instead. On the other hand, if m1 signals a hard failure, then that will be returned, and m2 is again not used.
Complete the instance declaration of Global as a Monad (as well as Functor and Applicative, as usual), and de ne the related functions. The remainder of your code should never invoke the Global term constructor directly, but only through the above-de ned functions.
The Local monad and related functions
type LEnv = [(VName, Term)]
newtype Local a = Local {runLocal :: LEnv -> Global (a, LEnv)} instance Monad Local where …
inc :: Global a -> Local a
askVar :: VName -> Local Term tellVar :: VName -> Term -> Local ()
The Local monad keeps track of variable bindings in the context of applying a single rule. The local environment LEnv is an extend-only association list (i.e., once a variable has been bound, it cannot be further modi ed). inc m views a Global computation m as a special case of a Local computation that simply does not access or modify the local environment. askVar v returns the current binding of v, or signals a hard failure (with a suitable error message) if the variable is unbound. tellVar v t binds v to t, if v is currently unbound. If v is already bound to t, tellVar does nothing; whereas if v is bound to some term other than t, tellVar signals a soft failure.
Again, the remainder of your code should not use the Local term constructor directly, but only the above-de ned functions.
Matching and instantiation
matchTerm :: Term -> Term -> Local () instTerm :: Term -> Local Term
The function matchTerm p t attempts to match the subject term t against the pattern term p, potentially binding variables in p to the corresponding subterms of t (but not the other
12
Advanced Programming DIKU, B1-2017/2018
way around). If the match fails (possibly because of already existing con icting variable bindings), matchTerm signals a soft failure.
instTerm t replaces all variables in t with their current values from the local environment. If t contains an unbound variable, a hard failure is signaled.
Conditions and rule application
evalCond :: PName -> [Term] -> Global [Term] applyRule :: Rule -> Term -> Global Term
evalCond pn ts evaluates the predicate pn on the input arguments ts (which are assumed to have already been instantiated according to the local environment). If the predicate succeeds, it returns the values of its output arguments (if any), to be matched against the pattern terms (usually just variables) in the predicate invocation. If the predicate does not hold for the input arguments, it signals a soft failure. On the other hand, if the predicate is not de ned, or invoked on illegal arguments, evalCond signals a hard failure.
applyRule r t attempts to apply the rewrite rule r to the term t. If this is not possible (e.g., because the rule LHS does not match t, or because one or more conditions in the rule is not satis ed), applyRule signals a soft failure. On the other hand, if attempting to apply the rule leads to a hard failure (e.g., because the rule RHS contains an unbound variable, or a condition mis-invokes a predicate), applyRule fails likewise.
Single-step term rewriting
rewriteTerm :: Term -> Global Term
rewriteTerm t attempts to rewrite the term once, by applying a rule anywhere within t according to the strategy de ned for AP (i.e., outermost, leftmost, rst). If no rule applies, rewriteTerm signals a soft failure. If attempting to apply a rule signals a hard failure, so does rewriteTerm.
Top-level interaction
processCmd :: Cmd -> GEnv -> (Rsp, GEnv)
processCmd c ge processes the command c in the global environment ge, and returns a
response and a possibly updated new global environment.
The command CRule r simply adds r to the end of the rule list in the global environment and returns an empty response (no terms and no message).
The command CQuery t True returns a response with the list of the successive rewritings of t (starting with t itself). If the rewriting stopped because no further rules applied, there is no message. If rewriting stopped because of an error (either during a rewriting step, or because of a detected loop or timeout), the message will say so; see the examples in the informal presentation. In any case, the global environment is unchanged. The command CQuery t False is similar, but includes only the last term in the rewriting list (i.e., the nal result, or the term at which the error occurred), not all the preceding ones.
13