FIT5141 Assignment 2 2019
Due Date Sunday 15th Sept
Deployment of Dataset on MongoDB and R based simple analysis of Data
Introduction
This assignment is due by 11:55pm Sunday 15th Sept. It is worth 20% of the marks for your final assessment in this unit. A penalty of 10% (of the 20 Marks) per day, including each day of a weekend, will apply for late submission.
This is an individual assignment and must be entirely your own work. Please note the section on plagiarism, cheating and collusion in this document.
Specification
Overview
Implement a storage solution for the dataset selected in Assignment 1, in preparation for Assignment 3. Store the dataset in a database and perform simple data analytics and visualisations using R and, optionally, Tableau. The analyses and visualisations must convey useful summaries of the data.
The data storage solution should be implemented in MongoDB.
A report must be written about the implementation and the results of the analyses.
Details
• MongoDB server setup options:
o You can either set up a MongoDB server on your own computer and perform all operations there, or use the cloud server provided by us;
o We will provide a MongoDB server in the cloud that each student can use to set up his or her own database. We will provide each student that needs it with their own MongoDB login and a separate database.
• Database setup requirements:
• The dataset must be stored in more than one collection (a collection is equivalent to a table in a relational database).
o This means you will need to store descriptive information about the data in a separate collection from the primary data (e.g. for the Vic Roads dataset, the units of measurement, the meaning of “density” and “flow”, and so forth).
• Use the mongoimport command to import CSV or other supported data formats. E.g.: mongoimport -h 144.6.224.55 -d fit5141 -c vicroads --type csv --headerline < 3003.csv
• Connecting and analyzing data using R
o You will need to query MongoDB using R and therefore need to install the R package rmongodb from here: http://cran.r-project.org/web/packages/rmongodb/index.html. Refer to the R documentation for how to install a package.
o Perform simple statistical reports in the form of tables and charts on the data as appropriate for the business question chosen in Assignment 1, part B.
§ If you choose to change your dataset and/or business question between Assignment 1 and Assignment 2, this is OK, but change your report here accordingly.
FIT5141 Semester 2, 2019 1
§ Please discuss the business question and how you might report descriptive summary statistics with your tutor.
• Connect to MongoDB using Tableau (Optional for up to 5 bonus marks)
o There are several packages available to provide an ODBC or other interface for Tableau to be able to access MongoDB.
o Using Tableau, generate visualisations of the data appropriate to the business question, as described in the R section above. These can be similar summary statistics or related summary information to that provided in the R section.
[20 marks]
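As an illustration of the database setup requirements above, the following is a minimal shell sketch, not a prescribed method. It assumes the primary data and its descriptive information are kept in two CSV files and imported into two collections. The file names, the vicroads_meta collection name, the column names and the sample rows are made up for illustration; the host, database and vicroads collection are taken from the mongoimport example above. The commands are only printed, so the sketch runs without a server.

```shell
#!/bin/sh
# Sketch only: prepare two CSV files and print the mongoimport commands
# that would load them into two separate collections.

HOST="144.6.224.55"   # host from the example in this document
DB="fit5141"          # database from the example in this document

workdir=$(mktemp -d)

# Primary data (made-up rows; column names are assumptions).
cat > "$workdir/vicroads.csv" <<'EOF'
site,time,flow,density
3003,08:00,412,18
3003,08:15,455,21
EOF

# Descriptive information about each field (placeholder text only).
cat > "$workdir/vicroads_meta.csv" <<'EOF'
field,unit,meaning
flow,UNIT_PLACEHOLDER,MEANING_PLACEHOLDER
density,UNIT_PLACEHOLDER,MEANING_PLACEHOLDER
EOF

# Print the import command for one collection.
# $1 = collection name, $2 = path to a CSV file with a header line
import_cmd() {
  printf 'mongoimport --host %s --db %s --collection %s --type csv --headerline --file %s\n' \
    "$HOST" "$DB" "$1" "$2"
}

# Count data rows (excluding the header) so the number can be compared
# with the document count in the collection after importing.
data_rows() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}

import_cmd vicroads "$workdir/vicroads.csv"
import_cmd vicroads_meta "$workdir/vicroads_meta.csv"
echo "vicroads.csv data rows: $(data_rows "$workdir/vicroads.csv")"
```

With a reachable server, each printed line can be pasted into a terminal to perform the import. If the server requires a login, the mongoimport --username and --password options would also be needed.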
Assignment will be marked on:
◦ Quality of consideration of factors affecting implementation of storage solution
◦ Quality of projected performance of storage solution
◦ Completeness and seamlessness of database setup and R integration.
◦ Innovative approach to setting up the database, interfaces, and data analysis
Resources
• MongoDB download and installation instructions (if you are using your own PC) http://www.mongodb.org
http://docs.mongodb.org/manual/core/introduction/
• MongoDB management GUI umongo - http://www.edgytech.com/umongo
• The R interface for MongoDB, rmongodb, is an R package that can be installed directly through the RStudio package installation function, or manually from: http://cran.r-project.org/web/packages/rmongodb/index.html.
Interviews
Students will be interviewed for 15-20 minutes to confirm that the work is their own. During the interview they will be required to demonstrate:
• Setting up the MongoDB connection
• Connection to MongoDB using R
• Generation of simple statistics using R
Additional Guidelines
• The report part of the assignment should be up to 1000 words and should include:
o Description of how the implementation was conducted
o Examples of reported statistics and charts.
• Recommended font for the report body is Times New Roman, 12 point, single line spacing.
Please consult the lecturer if you have any enquiries about any of the aforesaid points or requirements.
Submission Requirements
This assignment is due to be submitted by 11:55pm Sunday 15th Sept 2019.
PLEASE NOTE: submission of the report component of the assignment requires only a single submission to Moodle through a Turnitin link.
Turnitin
Turnitin is a plagiarism detection system that is very effective in discovering and proving plagiarism and collusion. Submission can be made through the following.
Your submission to Turnitin is your report in an MS Word file or PDF file that must be named with your student id number followed by _A1. For example, if your student id number is 12345678, then the file you submit should be named 12345678_A1.doc.
Marks will be deducted for any non-compliance with any of these submission requirements.
Plagiarism, cheating and collusion
Students should consult University materials on this matter at:
http://www.monash.edu.au/students/policies/academic-integrity.html
The following excerpt is from the aforementioned url:
Plagiarism and cheating are regarded as very serious offences. In cases where cheating has been confirmed, students have been severely penalised, from losing all marks for an assignment, to facing disciplinary action at the Faculty level. While we would wish that all our students adhere to sound ethical conduct and honesty, I will ask you to acquaint yourself with the University Plagiarism policy and procedure (http://www.policy.monash.edu/policy-bank/academic/education/conduct/plagiarism-procedures.html) which applies to students detected plagiarising.
It is your responsibility to make yourself familiar with the contents of these documents.