CS 457/557 Fall 2015
Final Exam Project – Implement Document Store Queries
Mode: Individual
Due Date: Dec. 9 Demos 11:30-2:00 am in SEC3429/3430 – NO LATE DEMOS
For this assignment you are to write a program to implement the operations required to process and execute NoSQL queries on a preexisting file for a document store database. The data in the file corresponds to a collection in a database. In order to process the query you will have to parse the query to identify which operations it is requesting, perform the operations on the specified documents and display the results. You cannot run these queries using a database management system, instead you are implementing some of the software that would be used by the NoSQL DBS.
You may use the programming language of your choice since you can run it on your laptop for the DEMO.
Data: The name of your collection should be “CS457” for students in CS457 and “CS557” for students in CS557. Use the appropriate name, since it will be referenced in all queries. You will read in the input data and store it in a file. Assume each line in the input data represents a different document. Since this is a NoSQL database the names of each field are included in the data along with the value of the field. The fields may be stored in a different order for each document. Each field is a field_name: value – note the field name is followed by a colon and a space. We will assume all values for a field are integer. You should generate an ID field for each document.
Sample input data for final to be read and stored in a file
SNum: 555 Dept: 5
Dept: 10 Manager: 555 SNum: 777 Age: 20
SNum: 888 Age: 18
Age: 10 Manager: 555 SNum: 222
You can store the data anyway you choose. You are writing the DBMS, so do what you want to the data in order to process the query.
Queries: You will prompt the user to input a NoSQL query. The NoSQL query is similar to MongoDB but it is NOT EXACTLY the same. You will implement the following 3 operations.
Operation 1-
query: returns values to fields specified for documents that satisfy the specified condition(s)
db.CS457.query(condition, field)
condition — field_name op value
zero or more select conditions. If there is more than one condition they will be separated by an ‘and’ (‘or’ will not be included). Zero conditions are denoted with just a comma and a space before the field, and it means include all documents e.g. (, field).
op can be =, <, >
field — field_name(s)
zero or more field names separated by ‘+’ signs. This operation specifies the list of fields to be displayed. If there are zero fields, the closing parenthesis is used and it means include all fields for each document that satisfies the condition, including the ID field. Unlike Mongo DB, the ID field is only included if specified. If a document does not have a field in the field_name list, but it has other fields appearing in the list, then it should be included in the result.
query() returns the entire collection, all fields for all documents
Output: For each document satisfying the conditions, output each field name ending with a colon and the value for the field. The fields do not have to be in the same order in each of the documents, nor do the fields have to be in the same order as the field_name list.
Example queries and their results:
db.CS457.query(Manager = 555, SNum+Dept)
Dept: 10 SNum: 777
SNum: 222
db.CS457.query(Age > 15 and Manager = 555)
ID: 2 Dept: 10 Manager: 555 SNum: 777 Age: 20
db.CS457.query(, ID+SNum)
ID: 1 SNum: 555
ID: 2 SNum: 777
ID: 3 SNum: 888
ID: 4 SNum: 222
Since this is a NoSQL DB, there should not be an error generated if there is no match for the name of a collection or a field. In other words, if there is no such collection, output nothing. If no documents satisfy the specified condition, output nothing. If a field specified in a condition does not exist in the collection, then output nothing. If no such field exists in the document, ignore it in the output. However if other fields listed do exist, then list those in the output.
Example queries and their results:
db.CS457.query(Agge = 20, SNum)
//returns nothing, no Agge field
Operation 1-
aggregates: aggregate function which can be avg, sum or max
aggregate(field)
where avg is average of the specified field, sum is the sum of the values of the field and max is the maximum value of the field
Output: value of the computed aggregate. Use similar rules described above, e.g. if no such field, return nothing. Obviously, do not include a document in the calculation if it does not contain the field specified for the aggregate.
Example aggregate queries and their results:
db.CS457.sum(Age)
48
db.CS457.max(SNum)
888
db.CS457.avg(Age)
16
db.CS457.max(Cost)
//returns nothing, no Cost field in any document
Operation 3 –
cartprod: Cartesian product
cartprod(f1, f2)
Output: the Cartesian product between the all of the values of the two specified fields f1 and f2.
Example of Cartesian product query and its result:
db.CS457.cartprod(Dept, Age)
DEPT: 5 Age: 20
DEPT: 5 Age: 18
DEPT: 5 Age: 10
DEPT: 10 Age: 20
DEPT: 10 Age: 18
DEPT: 10 Age: 10
NOTE: I am sure there will be many questions about this assignment that I did not anticipate. Expect further clarifications to this assignment, so check it regularly. Start early so I can answer everyone’s questions.
Demos: You will demo your project on December 9 between 11:30 and 2:00. You will be given a series of queries to run. I will give you the dataset during dead week. Sign up sheet will available in class during dead week. Demos will be in SEC3429 and SEC3430.
Email your code to vrbsky@cs.ua.edu by 11 am Dec. 9.
CS557 – do the above plus:
1. Include the operation sort(field), which sorts the documents in ascending order of the specified field. Documents that do not contain the sort field should just be listed at the end. Always a assume a field will be specified and always assume ascending order of the sort. Example of sort and its result:
db.CS557.sort(Age)
Age: 10 Manager: 555 SNum: 222
SNum: 888 Age: 18
Dept: 10 Manager: 555 SNum: 777 Age: 20
SNum: 555 Dept: 5
2. Include the operation delete(field op value), which deletes the documents where the specified field value satisfies the condition. This operation returns the number of documents deleted. If no condition is specified then all the documents are deleted. Example of delete and its result:
db.CS557.delete(Manager = 555)
Number of documents deleted: 2
db.CS557.query()
SNum: 555 Dept: 5
SNum: 888 Age: 18