CISC/CMPE 422,835: Formal Methods in Software
Engineering (Fall 2020)
Assignment 1: Evaluating predicate logic queries for time series data
Due date: Mon, Sept 28, 5pm (GitHub classroom and OnQ submission)
Learning Outcomes
The purpose of this assignment is to give you practical experience with
expressing properties formally and declaratively using queries and constraints in first-order logic,
defining the syntax of a language through the use of a parser,
defining the semantics of a language through the implementation of an evaluator, and
incremental software development,
as well as exposure to relevant software engineering tooling such as
code repositories such as GitHub,
parser generators such as ANTLR, and
build systems such as Gradle.
Intro
In this assignment, you will implement an evaluator for a language called QL (Query Language). Starter code is
provided that already contains code to read in all necessary input. In particular, a parser and an abstract syntax tree
builder for QL are given, which will construct an appropriate internal representation for your evaluation code to
operate on.
QL is based on predicate logic and has been designed specifically to express properties of and queries for 2-
dimensional arrays of sales data. Every row in such an array represents a different product, while every column
represents a different day. Individual products (or days) in the array will be referred by the (1-based) number of the
row (or column) they are in. Consider, for instance, the array:
3 2 5 10 0
0 1 4 0 9
1 9 2 3 2
2 0 1 2 2
This array contains sales amounts for 4 products (which can be referred by the natural numbers 1 to 4) on 5 days
(which can be referred to by the natural numbers 1 to 5). It expresses that, e.g., 3 units of product 1 were sold on day
1, 4 units of product 2 were sold on on day 3, and 10 units of product 1 were sold on day 4.
Description of QL
QL allows the formulation of formulas and expressions that are evaluated using such a sales array. Formulas can
capture properties such as “no product had a day with no sales”, while expressions can be used to express queries to,
e.g., collect “the set of days with at least one product with no sales”, or determine “the number of products with sales
greater than 10 on day 1”. The syntax of QL is defined by the following grammar.
// formulas
::= Identifier
// numeric expressions
// set expressions
Start symbols are
days, products, and sales amounts), the standard propositional connectives (negation, conjunction, disjunction,
implication, and logical equivalence), as well as numerical expressions and set expressions. Numerical expressions
(
some arithmetic operation on these). Set expressions (
the following numeric expressions can be used:
M[
(i.e., row) and the day (i.e., column), respectively. E.g., for the sales array above, the expression M[2,3] would
evaluate to 4.
salesForD(M,
sales array above, salesForD(M,3) would evaluate to 12.
salesForP(M,
the sales array above, salesForP(M,3) would evaluate to 17.
salesForM(M) which refers to the sum of all sales for all products on all days. E.g., for the sales array above,
salesForM(M) would evaluate to 58.
Moreover, numerical expressions can be built by computing the size of a set, or performing some standard
artithmetic operation such as addition or multiplication, etc.
To build set expressions, the following atomic set expressions are supported:
Day, which refers to the set of days used in the sales array. E.g., for the array above, Day would evaluate to
the set {1,2,3,4,5}.
Prod, which refers to the set of products used in the sales array. E.g., for the array above, Prod would
evaluate to the set {1,2,3,4}.
Sale, which refers to the set of sale amounts used in the sales array. E.g., for the array above, Sale would
evaluate to the set {0,1,2,3,4,5,9,10}.
Also, set comprehension is available. E.g., for the array above, {s:Sale | exists p:Prod. exists d:Day. M[p,d]=s &&
s>=5} would evaluate to the set {5,9,10}. Finally, the standard set operations union (denoted by +), intersection (&),
and difference (-) can also be used to build sets.
Examples
Using QL, the property “no product had a day with no sales” can be expressed by the formula !(exists
p:Prod. exists d:Day. M[p,d]=0) or by the formula forall p:Prod. forall d:Day. !(M[p,d]=0).
To determine “the set of days with at least one product with no sales” the set comprehension {d:Day |
exists p:Prod. M[p,d]=0} can be used.
The numerical expression size({p:Prod | M[p,1]>10}) will evaluate to the “the number of products with
sales greater than 10 on day 1”.
Parsing QL
Before a QL formula or expression can be evaluated, it must be parsed. The evaluator code provided to you already
contains a parser. That parser has been generated with the ANTLR parser generator (www.antlr.org). After the QL
formula or expression has been read in, the parser will, assuming that it is syntactically correct, build and return an
internal representation of it in the form of an abstract syntax tree (AST). The following class diagram describes the
classes and their relationships (attributes) used to define ASTs (note the correspondence between this class diagram
and the grammar).
In the diagram above, the top-level classes Formula, NExp, and SExp are high-lighted for readability; their names are
shown in italics, because they are abstract; classes labeled with ‘E’ in the upper right-hand corner are enumerations;
also, all associations have an implicit ‘exactly 1’ multiplicity constraint on their target end (e.g., objects of type
AtomicS have two attributes lhs and rhs each of which will point to exactly one object of type SExp). Note that the
https://www.antlr.org/
https://en.wikipedia.org/wiki/Abstract_syntax_tree
grammar contains productions allowing for bracketed formulas and expressions, but, for simplicity, the class diagram
does not contain classes for these. Instead, AST builder produces an instance of the type of the bracketed formula or
expression (e.g., the type of the root object of the AST for (2+3) will be BinNExp, rather than BracketedNExp).
The following object diagram shows the AST created for the formula forall d:Day. exists p:Prod. !(M[p,d]>5).
Note how the AST conforms to class diagram, i.e., the AST objects have the attributes shown in the class diagram.
Note that the starter code contains code that will create these object diagrams for the input formula or expression.
Provided evaluator code
The structure of the starter code is as follows: The QL grammar for ANTLR is stored in file src/main/antlr/QL.g, and
after the build, the generated parser will reside in directory src/generated/java/ql. Directory src/main/java contains
code to
build the AST after the parsing (subdirectory ast),
read in a sales array from a file in csv format (class CSVReader),
implement an environment, i.e., a data structure that stores the binding of variables to values (class Env),
declare methods to evaluate formulas and expressions (class BaseEval), and
declare exceptions for problems that might occur during the evaluation (class BaseEval).
It also contains the class Main which
reads in the command line arguments,
calls the csv reader, the parser, and the AST builder (and, optionally, draws the AST of the input formula or
expression),
http://localhost:50724/src/main/antlr/QL.g
http://localhost:50724/src/main/java/
http://localhost:50724/src/main/java/ast/
http://localhost:50724/src/main/java/CSVReader.java
http://localhost:50724/src/main/java/Env.java
http://localhost:50724/src/main/java/BaseEval.java
http://localhost:50724/src/main/java/BaseEval.java
http://localhost:50724/src/main/java/Main.java
creates an instance of the Eval class which has access to the sales array read in and starts the evaluation by
invoking the top-level eval method with the AST of the input formula or expression as argument (line
boolean result = new Eval(data).eval(ast)), and then
outputs the evaluation result, or an error message should an exception be thrown during the evaluation.
Running the provided evaluator code
Use your favourite means to clone and create a local copy of this repository. If you are unfamiliar with GitHub and
don’t know how to do this, please see the GitHub Classroom Quick Start pages. Using Git-Bash, the following will
clone, build, and run all unit tests in src/test/java.
git clone https://github.com/CISC422/a1
cd a1 && ./gradlew test
To build a runnable jar file eval.jar with Git-Bash run the Gradle (a build tool) wrapper in directory a1
./gradlew jar
The evaluator supports the following command line options:
Usage: java -jar eval.jar (-f=
-f=
-h, –help Show this help message and exit
-i Parse input from stdin
-V, –version Print version information and exit
-x Show AST
Examples
java -jar eval.jar -f=’forall p:Prod. forall d:Day. !(M[p,d]=0)’ m1.csv will run the evaluator on the
given formula and the sales array in file m1.csv
java -jar eval.jar -i m1.csv starts an interactive input session in which formulas or expressions can be
input (one per line) and which will be evaluated on m1.csv whenever each line is terminated. Ctlr-c or ‘exit’
ends the interactive input session. For instance, the screen shot below shows the use of the interactive input
to evaluate the three QL examples given above in the sales array also given above:
java -jar eval.jar -f=’forall p:Prod. forall d:Day. !(M[p,d]=0)’ -x m1.csv will have as the previous
invocation, and also open a window showing the AST of the input formula.
http://localhost:50724/src/main/java/Eval.java
http://localhost:50724/src/main/java/BaseEval.java#L56-L66
http://localhost:50724/src/main/java/Main.java#L112
https://github.com/CISC422/classroom-quickstart
http://localhost:50724/src/test/java
Your task
Your task is to implement the evaluation function for QL formulas and expressions. To this end, complete the
implementation of the class Eval(src/main/java/Eval.java) in src/main/java. Eval imports the AST classes which will
allow you to access the AST passed in from the call eval(ast) in lines 105, 112, and 119 in Main. Depending on the
type of the top-level node of the AST, this call will in invoke evalFormula(formula, new Env()), evalNExp(nExp, new
Env()), or eval(sExp, new Env()). You must complete the class Eval by providing implementations of these three
methods. Note that you should modify only the class Eval and no other class should be changed.
In general, the evaluation will require a traversal of the AST. We suggest that you break down the implementation of
evalFormula(Formula f, Env e), evalNExp(NExp nExp, Env e), and eval(SExp sExp, Env e) in Eval.java into the
following incremental development steps, each of which you should test thoroughly before advancing to the next
step. Note that the environment mentioned above will not be needed until Step 5.
Step 1: No variables, atomic formulas and expressions only
First, assume that the input to be evaluated does not contain any variables and is atomic, as in, for instance,
Day=Prod, 1>3, 42, Sale, or salesForM(M). In this case, your implementation of evalFormula, evalNExp, and
evalSExp should be straight-forward with symbols denoting numbers (such as 42) or relational operators
comparing them (see non-terminal
above, the numeric expression salesForM(M) should evaluate to the sum of all sales amount of all products
on all days, while the atomic set expressions Day, Prod, and Sale should evaluate to, respectively, the set of
days, products, and sales amounts occuring in the input sales array.
Step 2: No variables, atomic formulas and set expressions, but composite numeric expressions
Add support for composite numeric expressions such as 1+2, (2(3+4))/3, 1+M[5-3,2], salesForD(M,23), and
salesForP(M,3). The arithmetic operators (see
with ‘/’ representing truncating division on natural numbers (e.g., 5/2 should evaluate to 2). salesForD
should evaluate to the sum of the sales amounts for all products on a given day. Similarly for salesForP.
Step 3: No variables, atomic formulas, but composite numeric expressions and composite set
expressions (w/o set comprehension)
Add support for composite set expressions such as Day + Prod, and (Sale – Day) & Prod where the set
operators (see
Step 4: No variables, but composite formulas and expressions (w/o set comprehension)
Add support for composite formulas such as !(Day+Prod = Day), and (M[1,2]=M[2,1] && Day=Prod) where
the propositional connectives (see
Step 5: Composite formulas and expressions with variables, but no set comprehension
Add support for quantified formulas such as forall p:Prod. exists d:Day. forall p1:Prod. (!p=p1 =>
M[p,d]>=M[p1,d]). The occurrence of an identifier x in some formula evaluates to a value in the type of the
most recent quantification of x that contains that occurrence of x in its scope. That quantification also
determines whether x is universally or existentially quantified. For instance, identifier x in the array access
M[x,2] in the formula forall x:Day. exists s:Sale. exists x:Prod. M[x,2]=s is existentially quantified
and may evaluate to any product in the sales array. This scoping rule has the following consequences:
1. The environment (i.e., the data stucture that holds the variable bindings) should behave like stack,
i.e., a new binding for some variable x put into the environment for the purposes of evaluating
some subformula f should not destroy any existing bindings of x, but only ‘shadow’ them
temporarily until the evaluation of f is complete and then be removed. That is why Env is
implemented as a stack. Also, note that the method lookup returns the value in the binding of x
http://localhost:50724/src/main/java
http://localhost:50724/src/main/java/Eval.java
http://localhost:50724/src/main/java/Main.java#L105-L117
http://localhost:50724/src/main/java/BaseEval.java#L47-L57
http://localhost:50724/src/main/java/Eval.java
that is closest to the top of the stack (i.e., that was pushed onto the stack most recently). You
should use the class Env as is (again, all your changes should be confined to class Eval).
2. If the scope of a quantification of x over a non-empty domain does not contain any free
occurrences of x, that quantification is redundant. For instance, since the scope of the outermost
quantification in the above formula does not contain a free occurrence of x, that quantification can
be omitted, that is, the above formula is logically equivalent to exists s:Sale. exists x:Prod.
M[x,2]=s Thus, the formula (forall x:Day. exists s:Sale. exists x:Prod. M[x,2]=s) <=>
(exists s:Sale. exists x:Prod. M[x,2]=s) should be valid, i.e., evaluate to true for any sales
array.
Step 6: All of QL
Add support for set comprehension.
Error conditions
Not every syntactically correct QL formula or expression can be evaluated. I.e., there are syntactically correct formulas
and expressions that do not have a value. There are two cases in which this can occur:
1. Division by 0: As, for instance, in 1 = 3/(1-1).
2. Free variable: If the formula contains an occurrence of a variable x that is not bound by any quantification,
that is, it is a free occurrence of x, then that occurrence of x cannot be evaluated. Consider, for instance,
5 = x
forall p:Prod. M[p,d]=2
(exists s:Sale. s>100) && s<50 The provided code defines two exceptions (DivisonByZeroException and UnboundException) for each of these cases (class BaseEval). When your implementation of the evaluation detects that the input formula or expression cannot be evaluated, it should raise the corresponding exception (which will then be caught in Main.java and cause an appropriate error message to be printed). Hints The parser doesn't handle operator precedences right, that is, 3 + 4 * 6 gets parsed into (3 + 4) * 6, not 3 + (4 * 6). Always use parenthesis to enforce the intented order of evaluation. To determine the type of a formula or expression, you may find Java's instanceof operator useful. Your implementation of the evaluation routine should be relatively short (our reference implementation of Eval.java consists of significantly less than 300 lines of code). Notes The lexer (converts the input stream of characters into a stream of tokens) and the parser (converts the stream of tokens into an AST) used in the starter code have been generated using ANTLR. The use of lexer and parser generators is very common in industry. ANTLR is one of the most mature and widely used such tools. Other frequently used generators include JavaCC, Java CUP, and LEX/YACC. The sales arrays used in this assignment are an example of time series data as found in many business, industrial, social, scientific, and engineering domains. Note how QL and its tooling (including your evaluator) could easily be changed to support other kinds of time series such as rain measurements, stock prices, or body temperature readings. The Gradle wrapper file gradlew.bat in the starter code will build the code for you which includes resolving any dependencies by downloading the necessary code (such as ANTLR or any Java libraries) and invoking http://localhost:50724/src/main/java/BaseEval.java http://localhost:50724/src/main/java/Main.java#L125 https://www.antlr.org/ https://en.wikipedia.org/wiki/Time_series http://localhost:50724/gradlew.bat the parser generator. Industrial code is almost always built using build automation software such as Gradle. Other popular build tools include make, Apache Ant, and Apache Maven. Instructions Important: ignoring the instructions below may reduce your grade! Only change the contents of Eval.java and leave all other files unchanged. Also, Eval.java should not print anything to the console. Make sure you comment out any print statements prior to submission. Document the your code appropriately. Test your implementation using the code and sales array provided in src/test, but note that your code will also be tested using other formulas, expressions, and sales arrays. What and how to submit As for Assignment 0, submit your assignment before the deadline to GitHub Classroom and OnQ. For the OnQ submission, download an archive of your repository from GitHub and upload it as is to OnQ (see submission instructions for Assignment 0 for more details).