CS 457/557 Final Exam Project
Implement Document Store Queries
Fall 2016
Mode: Individual
Due Date: Dec. 7. Demos 11:30 am - 2:00 pm in SEC3429 & SEC3430 – NO LATE DEMOS
EMAIL your code to vrbsky@cs.ua.edu by 11 am Dec. 7
Data File
For this assignment you are to write a program that processes and executes NoSQL queries on a preexisting file of data. The data in the file corresponds to a collection in a database. To process a query you will have to parse it to identify which operations it requests, perform those operations on the specified documents, and display the results. You must write the code to implement operations similar to a select operation, a project operation, and an aggregate operation. You cannot run these queries using a database management system; instead, you are implementing some of the software that would be used by a NoSQL DBMS.
For the demo you will run one query at a time in a command line (something fancier is alright as well). You will not be inputting a file with all the queries.
You may use the programming language of your choice.
Data: The name of your collection is “final”. You must use this name, since it will be referenced in all queries. In the data file each line represents a different document. Since this is a NoSQL database, the name of each field is included in the data along with the value of the field. The fields may be stored in a different order for each document. Each field has the form fieldName: value. Note that the field name is followed by a colon and a space, and that fields are separated by a space. Assume all field values are integers. You should generate an ID field for each document.
Example of possible input data for collection final:
EID: 555 Dept: 5
Dept: 10 Manager: 555 EID: 777 Age: 20
EID: 888 Age: 18
Age: 20 Manager: 555 EID: 222
You can store the data any way you choose for your collection. You are writing the DBMS, so do whatever you want with the data in order to process the query.
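One possible way to load the file, sketched in Python, is to turn each line into a dictionary and attach a generated ID. The function names parse_document and load_collection are hypothetical, and storing the ID as a sequential integer (padded to three digits only when printed, as in the example results) is an assumption rather than a requirement.

```python
def parse_document(line):
    """Turn 'EID: 555 Dept: 5' into {'EID': 555, 'Dept': 5}."""
    tokens = line.split()
    doc = {}
    # Tokens alternate between 'name:' and an integer value.
    for name, value in zip(tokens[0::2], tokens[1::2]):
        doc[name.rstrip(":")] = int(value)
    return doc


def load_collection(lines):
    """Parse every non-blank line and attach a generated ID (1, 2, 3, ...)."""
    collection = []
    for line in lines:
        if line.strip():
            doc = {"ID": len(collection) + 1}  # assumption: sequential IDs
            doc.update(parse_document(line))
            collection.append(doc)
    return collection


sample = [
    "EID: 555 Dept: 5",
    "Dept: 10 Manager: 555 EID: 777 Age: 20",
    "EID: 888 Age: 18",
    "Age: 20 Manager: 555 EID: 222",
]
final = load_collection(sample)
```

In a real run the lines would come from the data file, for example by passing an open file object to load_collection.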
Queries: You will prompt the user to input a NoSQL query. There will be a series of queries for you to process. The NoSQL query is similar to MongoDB but it is NOT EXACTLY the same. You will implement the following 2 operations.
Operation 1-
find: returns the values of the specified fields for documents that satisfy the specified condition(s)
db.final.find((condition), [field])
condition — fieldName comparisonOp value
There will be zero or more select conditions. Zero conditions is denoted as empty () and it means include all documents. If there is more than one condition, the conditions will be joined by ‘and’ and/or ‘or’. The comparisonOp can be <, >, =, <>. You may have any number of parentheses nested within the find clause, e.g. ((Age>20 and Manager=555) or (Age<10 and C4<>20)). Since ( )’s may be nested, use a strategy from your data structures class to process the conditions in the find clause.
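As a rough illustration of handling nested parentheses, here is a Python sketch that uses recursive descent, which is only one option (an explicit stack would work as well). It assumes well-formed input and assumes that ‘and’ binds tighter than ‘or’, which the assignment does not state. The names tokenize and matches are hypothetical.

```python
import operator
import re

# One token: '<>', a single operator or parenthesis, a name, or an integer.
TOKEN = re.compile(r"\s*(<>|[<>=()]|[A-Za-z_]\w*|-?\d+)")
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq, "<>": operator.ne}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def matches(condition, doc):
    """Return True if the document (a dict) satisfies the condition string."""
    tokens = tokenize(condition)
    if not tokens:
        return True
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            right = parse_and()  # always parse, so the position advances
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            if tokens[pos] == ")":  # empty () matches every document
                pos += 1
                return True
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        field, op, value = tokens[pos:pos + 3]
        pos += 3
        # A field missing from the document makes the comparison false.
        return field in doc and OPS[op](doc[field], int(value))

    return parse_or()
```

For example, matches("(Age=20 and Manager=555)", {"Age": 20, "Manager": 555}) evaluates to True, while a condition on a field the document lacks evaluates to False.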
field — fieldName
Zero or more field names separated by commas. This is a projection operation. Zero fields is denoted as the empty list [] and it means include all fields for each document that satisfies the condition, including the ID field. When a non-empty field list is given, the ID field is included only if it is listed (unlike MongoDB). If a document does not have a field in the fieldName list, but it has other fields appearing in the list, then the document should still be included in the result, showing only the fields it has.
find((), []) returns the entire collection, all fields for all documents including the ID.
Output: For each document satisfying the conditions, output each field name ending with a colon, a space, and the value for the field. Separate fields with a space. The fields do not have to be in the same order in each of the documents, nor do the fields have to be in the same order as the fieldName list.
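The projection and output rules could be sketched in Python as follows. The names format_document and find_output are hypothetical, documents are assumed to be dictionaries with an integer ID, and the three-digit ID padding simply mirrors the example results. Skipping a document that has none of the listed fields is an interpretation of the rule above, not something the assignment states outright.

```python
def format_document(doc, fields):
    """Render the requested fields of one document as 'name: value' pairs."""
    names = fields if fields else list(doc)  # [] means every field, ID included
    parts = []
    for name in names:
        if name in doc:  # fields the document lacks are silently skipped
            value = doc[name]
            text = f"{value:03d}" if name == "ID" else str(value)
            parts.append(f"{name}: {text}")
    return " ".join(parts)


def find_output(docs, fields):
    """Format each matching document, dropping those with nothing to show."""
    lines = (format_document(doc, fields) for doc in docs)
    return [line for line in lines if line]
```

Here docs is the list of documents that already satisfied the condition; the caller would print each returned line.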
Example queries and their results:
db.final.find((Manager=555), [EID, Dept])
Dept: 10 EID: 777
EID: 222
db.final.find((Age=20 and Manager=555), [])
ID: 002 Dept: 10 Manager: 555 EID: 777 Age: 20
ID: 004 Age: 20 Manager: 555 EID: 222
db.final.find(( ), [ID, EID])
ID: 001 EID: 555
ID: 002 EID: 777
ID: 003 EID: 888
ID: 004 EID: 222
Since this is a NoSQL DB, there should not be an error generated if there is no match for the name of a collection or a field. In other words, if there is no such collection, output nothing. If no documents satisfy the specified condition, output nothing. If a field specified in a condition does not exist in the collection, ignore it (the condition evaluates to false) and process other conditions if they exist. If a field does not exist in the document, ignore it in the output. However, if other fields listed do exist, then list those in the output. If “find” is misspelled, print an error.
Example queries and their results:
db.final.find((Agge=6), [EID])
//returns nothing, no Agge field
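To separate the collection name, the operation name, and the arguments before applying the rules above, something like the following Python sketch might be used. The names parse_query and split_find_args are hypothetical, and raising ValueError for a misspelled operation is just one way to signal the error. Checking the collection name (and printing nothing when it is not “final”) is left to the caller.

```python
import re

# db.<collection>.<operation>(<arguments>)
QUERY = re.compile(r"^\s*db\.(\w+)\.(\w+)\((.*)\)\s*$")
OPERATIONS = {"find", "avg"}  # assumption: extend for the CS557 operations


def parse_query(query):
    """Split a query into (collection, operation, argument string)."""
    m = QUERY.match(query)
    if m is None:
        raise ValueError("not a query of the form db.collection.operation(...)")
    collection, operation, args = m.groups()
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return collection, operation, args


def split_find_args(args):
    """Split '(cond), [f1, f2]' into the condition text and a field list."""
    bracket = args.rindex("[")  # the field list is the last [...] group
    condition = args[:bracket].rstrip().rstrip(",").strip()
    inside = args[bracket + 1:args.rindex("]")]
    fields = [name.strip() for name in inside.split(",") if name.strip()]
    return condition, fields
```

A caller would compare the returned collection name with “final” and produce no output when it differs.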
Operation 2-
avg: a function which computes the average
db.final.avg(field)
field is one fieldName
Output: avg_field: computed average
Example:
db.final.avg(Age)
// returns avg_Age: 16
Use rules similar to those described above, e.g. if there is no such field, return nothing. Obviously, do not include a document in the calculation if it does not contain the field specified for the aggregate. If avg is misspelled, print an error.
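A minimal Python sketch of the average, assuming documents are stored as dictionaries. The name avg_output is hypothetical, the small data set is made up for illustration, and the result is left unrounded because the assignment does not say how to round.

```python
def avg_output(collection, field):
    """Return 'avg_<field>: <mean>', or None if no document has the field."""
    values = [doc[field] for doc in collection if field in doc]
    if not values:
        return None  # no such field: the caller prints nothing
    mean = sum(values) / len(values)
    return f"avg_{field}: {mean}"


docs = [{"Age": 10}, {"Age": 20}, {"EID": 1}]  # made-up data
result = avg_output(docs, "Age")  # the third document is skipped
```

Only the two documents that contain Age contribute to the mean here.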
NOTE: I am sure there may be questions about this assignment that I did not anticipate. Expect further clarifications to this assignment, so check it regularly. Start early so I can answer everyone’s questions.
Demos: You will demo your project on Wed. December 7 between 11:30 and 2:00. You will be given a series of queries to run. A sign-up sheet will be available in class during dead week. Demos will be in SEC3429 and SEC3430.
Email your code to vrbsky@cs.ua.edu by 11 am Dec. 7.
CS557 – do the above plus:
1. Do the aggregates min and count in addition to avg. The aggregates will have the same format as avg(field).
2. Include a group by for an aggregate of the form:
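db.final.group((field), aggregate(field)) where, for each value of the grouping field, the operation returns that value and the aggregate computed over the documents in that group. In the example below, for each value of Dept the operation returns the Dept value and its average Age.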
Example: db.final.group((Dept), avg(Age))
//Dept: 45667 avg_Age: 10
//Dept: 55588 avg_Age: 12
// etc.
Only the documents with values for both fields should be included in the results (yes, this is different from MongoDB).
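The grouping rule, together with the min and count aggregates, might be sketched in Python like this. The names AGGREGATES and group_output are hypothetical. Labeling the min and count results min_field and count_field (by analogy with avg_field) and sorting the groups by value are assumptions, and the data is made up for illustration.

```python
from collections import defaultdict

AGGREGATES = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "count": len,
}


def group_output(collection, group_field, aggregate, agg_field):
    """Return one output line per distinct value of group_field."""
    buckets = defaultdict(list)
    for doc in collection:
        # Only documents that have BOTH fields take part in the result.
        if group_field in doc and agg_field in doc:
            buckets[doc[group_field]].append(doc[agg_field])
    func = AGGREGATES[aggregate]
    return [
        f"{group_field}: {key} {aggregate}_{agg_field}: {func(values)}"
        for key, values in sorted(buckets.items())  # assumption: sorted output
    ]


docs = [
    {"Dept": 5, "Age": 10},
    {"Dept": 5, "Age": 20},
    {"Dept": 10, "Age": 30},
    {"Dept": 10},  # no Age: excluded
    {"Age": 40},   # no Dept: excluded
]
lines = group_output(docs, "Dept", "avg", "Age")
```

The last two documents are left out because each is missing one of the two fields.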