RA: A Relational Algebra Interpreter Introduction
RA is a simple relational algebra interpreter written in Java. It is built on top of an SQL-based relational database system. It implements relational algebra queries by translating them into SQL queries and executing them on the underlying database system through JDBC. RA is packaged with SQLiteJDBC, so you can use RA as a standalone relational-algebra database system. Alternatively, you can use RA as a relational-algebra frontend to other database systems.
Currently Supported Database Systems
SQLite (SQLite JDBC v056): JDBC driver is included in the JAR. PostgreSQL 8.4: JDBC driver is included in the JAR.
MySQL 5: JDBC driver is included in the JAR.
IBM DB2 9.7: JDBC driver is included in the JAR.
Getting RA
The current version of RA is 2.2b (released Aug. 15, 2014). You can download one of the following:
A .tar.gz file with everything including source code.
A .zip file with everything including source code.
Just the .jar file. But we also recommend you download sample.db (sample SQLite database), sample.properties (RA connection properties file), and sample.ra (RA script for creating/restoring the sample database), and put them in the same directory as the .jar file.
Or follow the project on GitHub! Using RA
To run RA, just type “java -jar ra.jar” in the directory containing ra.jar. RA will then by default run on the SQLite database file sample.db (on which query examples below are based). To see more options of running RA, type “java -jar ra.jar -h”. Options -i (read commands/queries from a file), -o (log output to a file), and -v (turn on verbose mode) are all pretty useful.
If you want to connect to another database or a different database system, you will find it convenient to write a connection properties file of your own. An example is the file sample.properties. Once you have rolled your own properties file, say PROP_FILE, run RA using “java -jar ra.jar PROP_FILE”; add -P if you prefer typing your password at runtime instead of storing it in the properties file.
Once you are in RA, you will see the ra> prompt. For help, type \help;. You exit RA by issuing the \quit; command. Use the \list; command to see what relations are available for query in your database.
ADVANCED USERS: Using \sqlexec_{statement};, you can send an SQL command to the underlying database. This command allows you to manipulate the database in ways (e.g., updating it) that you cannot with just relational algebra. For example, the SQLite database file sample.db was in fact created by running “java -jar ra.jar sample.properties -i sample.ra”, where sample.ra makes heavy use of \sqlexec.
RA supports command-line input history and editing using arrow keys: Up/Down recall previous/next lines, and Left/Right move within the current line.
The simplest relational query you can write is one that returns the content of a relation: Just type relName;, where relName is the name of the relation. Note that every command/operator should start with a backslash (\), and every query/command should be terminated by a semicolon (;).
For most database systems that RA runs on (except MySQL), relation and attribute names are case-insensitive per SQL standard. For example, drink is just as good as DRINK. Attributes can be of a variety of types. Details are not important; just beware that types such as INTEGER, SMALLINT, FLOAT, REAL, DOUBLE, DECIMAL, NUMERIC are for numbers, and CHAR and VARCHAR are for strings.
Here is an example of a complex query, which returns beers liked by those drinkers who do not frequent James Joyce Pub:
\project_{beer} (
((\project_{name}
Drinker)
\diff
(\rename_{name}
// all drinkers
// rename so we can diff
\project_{drinker}
\select_{bar = ‘James Joyce Pub’}
// drinkers who frequent JJP
Frequents))
\join_{drinker = name} /* join with Likes to find beers */
Likes
);
The syntax is insensitive to white space, and it is fine to enter a query on multiple lines; RA will number the lines (beyond the first one) you enter for the current query. C/C++/Java-style comments (// and /*…*/) are supported.
RA supports the following relational algebra operators:
\select_{cond} is the relational selection operator. For example, to select Drinker tuples with name Amy or Ben, we can write \select_{name = ‘Amy’ or name = ‘Ben’} Drinker;. Syntax for cond follows SQL. Note that string literals should be enclosed in single quotes, and you may use boolean operators and, or, and not. Comparison operators <=, <, =, >, >=, and <> work on both string and numeric types. For string match you can use the SQL LIKE operator; e.g., \select_{name like ‘A%’} drinker; finds all drinkers whose name start with A, as % is a wildcard character that matches any number of characters. \project_{attr_list} is the relational projection operator, where attr_list is a comma-separated list of attribute names. For example, to find out what beers are served by Talk of the Town (but without the price information), we can write \project_{bar, beer} (\select_{bar = ‘Talk of the Town’} Serves);.
\join_{cond} is the relational theta-join operator. For example, to join Drinker(name, address) and Frequents(drinker, bar, times_a_week) relations together using drinker name, we can write Drinker \join_{name = drinker} Frequents;. Syntax for cond again follows SQL; see notes on \select for more details.
\join is the relational natural join operator. For example, to join Drinker(name, address) and Frequents(drinker, bar, times_a_week)
relations together using drinker name, we can write Drinker \join \rename_{name, bar, times_a_week} Frequents;. Natural join will automatically equate all pairs of identically named attributes from its inputs (in this case, name), and output only one attribute per pair. Here we use \rename to create two name attributes for the natural join; see notes on \rename below for more details.
\cross is the relational cross product operator. For example, to compute the cross product of Drinker and Frequents, we can write Drinker \cross Frequents;.
\union, \diff, and \intersect are the relational union, difference, and intersect operators. For a trivial example, to compute the union, difference, and intersection between Drinker and itself, we can write Drinker \union Drinker;, Drinker \diff Drinker;, and Drinker \intersect Drinker;, which would return Drinker itself, an empty relation, and Drinker itself, respectively.
\rename_{new_attr_name_list} is the relational rename operator, where new_attr_name_list is a comma-separated list of new names, one for each attribute of the input relation. For example, to rename the attributes of relation Drinker and compute the cross product of Drinker and itself, we can write \rename_{name1, address1} Drinker \cross \rename_{name2, address2} Drinker;.
Known Issues/Limitations
\rename only supports renaming of attributes; it does not support renaming of the input relation.
\union, \diff, and \intersect check the compatibility of input relations by checking the types of each pair of attributes in the order they appear in the input relations. Therefore, (\project_{name, address} Drinker) \diff (\project_{address, name} Drinker) will not return an empty table, because the ordering of attributes is important. This behavior is consistant with SQL (which departs from the “purest” relational model where the ordering of attributes is unimportant).
Error messages may not be especially meaningful. Recall that RA translates relational algebra expressions into SQL queries. RA expressions with ill-formed selection/join conditions or attribute name lists simply result in incorrect SQL
queries. In these cases, RA just passes back the error messages from the underlying database system, without attempting to create RA-specific messages. For some database systems, RA does not allow a relational algebra expression (or subexpression) to produce a result relation with identically named attributes, because these attributes cannot be distinguished. Depending on the underlying database system, you may get an error message like column “name” specified more than once (from PostgreSQL), or the statement does not include a required column list (from DB2).
In particular, \cross and \join_{cond} (theta-join) do not allow input relations with identically named attributes. For example, Drinker \cross Drinker; is illegal because it would generate a relation with two name and two address attributes. Therefore, the attributes need to be renamed before the cross product. Also, \project cannot output the same attribute more than once. SQLite has no problem with identically named attributes, because it automatically assigns attributes different names.
As a general rule, RA may signal an error whenever it has trouble determining how to name the attributes of a result relation. This behavior is database- dependent.
For example, consider Bar \union Beer;. Never mind what this query means, but it is arguably legal because both relations have two attributes with compatible string types. PostgreSQL, SQLite, and MySQL have no problem with this query, but DB2 will complain that their attribute names are different. In any case, \rename_{A, B} Bar \union \rename_{A, B} Beer; will always work. Also, \project_{price+1} Serves does not work on DB2, not because it does not support arithmetic operators, but because RA does not know how to name the resulting attribute. On the other hand, PostgreSQL will accept this query and assign a default attribute name ?column?, though it will still bark at \project_{price+1, price+2} Serves. SQLite and MySQL happily take both, by automatically assigning different names to output attributes.
Maintained by Jun Yang