Cardiff School of Computer Science and Informatics Coursework Assessment for CM3104 Large Scale Databases
Module Code: CM3104
Module Title: Large Scale Databases
Lecturer: C.B. Jones, A.I. Abdelmoty
Assessment Title: Coursework
Assessment Number: 1
Date Set: Week 5, Monday 28th October 2019
Submission Date and Time: Week 10, Friday 6th December 2019 at 9:30am.
Return Date: Week 12, Friday 10th January 2020.
This assignment is worth 30% of the total marks available for this module. If coursework is submitted late (and where there are no extenuating circumstances):
1. If the assessment is submitted no later than 24 hours after the deadline, the mark for the assessment will be capped at the minimum pass mark;
2. If the assessment is submitted more than 24 hours after the deadline, a mark of 0 will be given for the assessment.
Your submission must include the official Coursework Submission Cover sheet, which can be found here:
https://docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf
Submission Instructions

Description: Cover sheet (Compulsory)
Type: One PDF (.pdf) file
Name: [student number].pdf

Description: PART A (Compulsory)
Type: One PDF (.pdf) file comprising your answers to all questions, with snapshots of the MongoDB shell as explained below.
Name: PartA[student number].pdf

Description: PART A (Compulsory)
Type: The JavaScript file with your answers to all the questions in Questions 2, 3 and 4.
Name: PartA[student number].js

Description: PART B (Compulsory)
Type: One PDF (.pdf) file that includes, for each of the first five answers to Part B:
1. The Oracle SQL query;
2. The answer to the query;
3. A screenshot of the query in Oracle followed by the Oracle output from the query.
For Question 6 provide:
1. the computed area and discussion;
2. a list of the operations and screenshots as instructed;
3. screenshots of maps as instructed.
Name: PartB[student number].pdf

Any code submitted will be run on a system equivalent to those available in the Windows laboratory and must be submitted as stipulated in the instructions above.
Any deviation from the submission instructions above including the number and types of files submitted will result in a mark of zero for the assessment.
Staff reserve the right to invite students to a meeting to discuss coursework submissions.
Assignment
The coursework consists of two parts, Part A and Part B, each worth 15% of the total marks for this module.

Coursework PART A: NoSQL Databases (worth 15 marks)
In this part of the coursework you will make use of two data sets: a restaurants data set in the file restaurants.js and a zipcodes data set in the file zipcodes.js. You are building an application with MongoDB that will use both data sets to find information about restaurants in different cities.
An example record from the restaurants dataset is:
{
  "_id" : ObjectId("55cba2476c522cafdb053ae8"),
  "location" : {
    "coordinates" : [ -73.9973325, 40.61174889999999 ],
    "type" : "Point"
  },
  "name" : "C C Catering Service"
}
An example record from the zipcodes dataset is:
{
  "_id" : "01002",
  "city" : "CUSHMAN",
  "loc" : [ -72.51565, 42.377017 ],
  "pop" : 36963,
  "state" : "MA"
}
Here is a map of a sample from the restaurants dataset.

1. Build a MongoDB database to store both the restaurants and zipcodes datasets.
1 mark
2. Write queries over the database to:
I. Find all the Cafes in the dataset and count their number. A Cafe is any restaurant which has the word Cafe in its title.
II. For each Cafe from the above set, find the city that it is located in. You can assume that a restaurant is located in the nearest city to its location.
3 marks
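As a rough illustration of the two steps in Question 2, the following plain JavaScript sketch filters restaurants whose name contains the word Cafe and pairs each one with its nearest city by great-circle (haversine) distance. It runs without MongoDB and is not the shell query the question asks for. The second restaurant, the second city, the function names and the assumption that longitudes are negative (western hemisphere) are illustrative only.

```javascript
// Sample documents in the shapes shown above. Entries marked "hypothetical"
// are made up for this sketch and do not come from the datasets.
const restaurants = [
  { name: "C C Catering Service",
    location: { type: "Point", coordinates: [-73.9973325, 40.61174889999999] } },
  { name: "Example Cafe", // hypothetical
    location: { type: "Point", coordinates: [-72.52, 42.38] } },
];
const zipcodes = [
  { _id: "01002", city: "CUSHMAN", loc: [-72.51565, 42.377017], pop: 36963, state: "MA" },
  { _id: "99999", city: "EXAMPLEVILLE", loc: [-74.0, 40.6], pop: 1000, state: "NY" }, // hypothetical
];

// Great-circle distance in km between two [longitude, latitude] pairs.
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const R = 6371; // mean Earth radius in km
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Linear scan over all cities, keeping the closest one.
function nearestCity(restaurant, cities) {
  let best = null;
  let bestKm = Infinity;
  for (const c of cities) {
    const km = haversineKm(restaurant.location.coordinates, c.loc);
    if (km < bestKm) {
      best = c;
      bestKm = km;
    }
  }
  return best;
}

// Step I: a Cafe is any restaurant with the word Cafe in its name.
const cafes = restaurants.filter((r) => /Cafe/.test(r.name));

// Step II: attach the nearest city to each Cafe.
const cafeCities = cafes.map((r) => {
  const c = nearestCity(r, zipcodes);
  return { name: r.name, city: c.city, state: c.state };
});
```

A linear scan like this costs one distance computation per restaurant and city pair, which is why a database would normally rely on a spatial index rather than a hand-written loop.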
3. In your application you will need to retrieve the restaurants and their associated cities frequently.
Suggest and implement TWO different methods for relating the restaurants to the cities they are located in to allow for the efficient retrieval of the information.
6 marks
4. Query the new designs to find restaurants, grouped by city and state. Your answer should include: city name, state name, number of restaurants in the city and a list of all the names of the restaurants in the city.
An example document of the results of this query is as follows:
{
  "State" : "AL",
  "City" : "MOBILE",
  "No of Restaurants" : 3,
  "Restaurants" : [ "Island Soft Pretzel Stop", "Dairy Queen Grill Chill", "Statue Of Liberty Deli" ]
}
3 marks
5. Compare the TWO designs you implemented, referring to their effectiveness for storing and retrieving the information about restaurants and associated cities.
2 marks
UPLOADS FOR PART A
1. Save your answers to Questions 2, 3 and 4 in a JavaScript file with the name: PartA[student number].js
2. Save your answers to all questions in a PDF file: PartA[student number].pdf. You can take a snapshot of each question being executed in the MongoDB shell, clearly demonstrating the answer to the question (include snapshots of all intermediate steps as appropriate). A sample of the results is sufficient (this should be the first 5 documents). An example of a sufficient snapshot of the answer to a query to find all records in the zipcodes collection is as follows.

Coursework PART B: Spatial Databases (worth 15 marks)
Please note that instructions on accessing the data that are required for this section follow the questions below.
For each of the following questions: write the Oracle SQL query that answers the question and write the answer. Follow that with a screenshot that shows both the SQL query and the output of the query in your SQL interface (SQL Developer is recommended).
1. Name the populated places in the district of CAERDYDD CARDIFF that are within 100 metres of a motorway within the region of the provided datasets.
Note: the motorway is represented by a sequence of short segments, so be careful that each populated place is only counted once.
2 marks
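The note on Question 1 matters because a distance join returns one row per (place, segment) pair, so a place that lies near two adjacent motorway segments appears twice in the raw result. The sketch below shows the effect in plain JavaScript with made-up rows; in SQL the same de-duplication is typically done with DISTINCT. The place names and segment identifiers are hypothetical.

```javascript
// Hypothetical join output: one row for every (place, motorway segment)
// pair that satisfies the distance condition.
const matches = [
  { place: "Place A", segmentId: 101 },
  { place: "Place A", segmentId: 102 }, // same place, neighbouring segment
  { place: "Place B", segmentId: 102 },
];

// Collapse to one entry per place, regardless of how many segments matched.
const uniquePlaces = [...new Set(matches.map((m) => m.place))].sort();
```

Here three matching rows reduce to two distinct places, which is the count the question is after.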
2. Name the districts that share a boundary with CAERDYDD CARDIFF and give the area of each, in square kilometres.
2 marks
3. Count the number of segments of railway line that cross the M4 motorway within the region of the provided dataset.
Note that each segment of railway has a unique identifier. You can assume that no segment crosses a motorway more than once.
2 marks
4. Name the districts that have a coastal boundary and contain at least 5 railway stations, giving the number of railway stations for each of the districts.
2 marks
5. What is the total area of land, in square kilometres, that is within 5 metres of the M4 Motorway and is inside the district of CAERDYDD CARDIFF?
4 marks
6. Using either QGIS or ArcGIS attempt to verify the answer that you have obtained from Question 5. Discuss the likely origin of any discrepancy between the answer that you obtain here and that from Question 5.
In your answer you should also list the sequence of operations that you performed to get the solution and for each operation provide screenshots of the QGIS or ArcGIS interaction dialog. Also provide screenshots of the maps of (1) the initial two layers; (2) the layers after selection of the M4 and Cardiff District; (3) the area that is within 5 metres of the M4 motorway and that coincides with the district of Cardiff.
3 marks
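When comparing the two area values in Question 6, a back-of-the-envelope estimate can help judge whether either figure is plausible. The JavaScript sketch below computes the area within a distance d of a single straight line of length len, once with exact circular end caps and once with the caps approximated by a regular polygon. It assumes an idealised straight motorway of hypothetical length, ignores bends and clipping by the district boundary, and makes no claim about how Oracle, QGIS or ArcGIS actually build their buffers; the function names and the sample values are illustrative only. British National Grid coordinates are in metres, so areas come out in square metres and are divided by 1,000,000 to give square kilometres.

```javascript
// Area (m^2) within distance d of a straight segment of length len:
// a rectangle of width 2d plus two semicircular end caps (one full circle).
function bufferAreaExact(len, d) {
  return 2 * d * len + Math.PI * d * d;
}

// Same shape, but the end caps together are approximated by a regular
// polygon with n edges inscribed in the circle of radius d.
function bufferAreaPolygonal(len, d, n) {
  return 2 * d * len + 0.5 * n * d * d * Math.sin((2 * Math.PI) / n);
}

// Convert square metres to square kilometres.
function toKm2(squareMetres) {
  return squareMetres / 1e6;
}

// Hypothetical example: a 1000 m straight line buffered by 5 m.
const exactM2 = bufferAreaExact(1000, 5);
const coarseM2 = bufferAreaPolygonal(1000, 5, 8);
const differenceM2 = exactM2 - coarseM2;
const exactKm2 = toKm2(exactM2);
```

The polygonal version is always slightly smaller than the exact one, and the gap shrinks as n grows, so a difference in how finely two systems approximate curves is one candidate source of a small discrepancy.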

Instructions on accessing data required for Part B (Spatial Databases)
The file on Learning Central called LSDSpatialData2019Coursework.zip contains some Ordnance Survey digital map data in shape file format relating to the area around Cardiff in South Wales. You must import these files into tables in Oracle using the mapbuilder application. Note also that the same shapefile data can be used to add layers to a project in the ArcGIS or QGIS (Quantum GIS) GIS systems. Looking at the data on a map in a GIS can help in understanding all queries and is required for Question 6.
On Learning Central you will also find the program mapbuilder as a jar file, i.e. mapbuilder 12.2.1.3.0.jar. The purpose of this program is to access the shape files and transform them into tables in the Oracle database.
To run the mapbuilder program, open a terminal and set the directory to the location where you have saved mapbuilder12.2.1.3.0.jar and type:
java -jar mapbuilder12.2.1.3.0.jar
In Mapbuilder create a connection using:
Host: csoracle.cs.cf.ac.uk
Service Name: csora12edu.cs.cf.ac.uk
In the Mapbuilder application use the Import Shapefile option to import the following six shape files (i.e. with extension .shp) in the directory LSDSpatialData2019Coursework:
CardiffPopulatedPlaces
coastlineSouthWales
CardiffDistricts
stationsSouthWales
MotorwaysSouthWales
RailwaysSouthWales
The geometry table field tells you the name of the table that is created (it is the name of the shape file). Click next.
In the next dialogue, click on the SRID button to tell Oracle which spatial reference system (SRS) the data uses. Scroll down to and select British National Grid. For future reference, note that in the dialogue that prompts for SRID you can just type in the numeric code for the SRS, which in this case is 81989. The Create/update spatial index checkbox needs to be checked (which it is by default). Click next for the following dialogue on Theme name. It is not necessary to enter anything, just click Finish. The program should then go ahead and create the table. Repeat this process for all six shape files, resulting in six tables in Oracle. To see the fields of the created tables, go to SQL Developer and click on the table name in the Tables submenu for your connection.
In each table there is a column called geometry that contains the spatial data, which in the case of CardiffDistricts is a set of polygons; for stationsSouthWales and CardiffPopulatedPlaces it is points; and for MotorwaysSouthWales, RailwaysSouthWales and coastlineSouthWales it consists of lines.

Uploads for Part B
One PDF (.pdf) file, the name of which is in the form PartB[student number].pdf, that includes for each of the first five answers to Part B:
1. The SQL query, using Oracle syntax, typed out by you.
2. The output of the query, copied from the Oracle output and typed out by you.
3. A screenshot of the query in Oracle followed by the output from the query.
The screen output should match precisely the content of Items 1 and 2.
For Question 6 provide:
1. the computed area value, with a discussion of reasons why it might differ from the value obtained in Question 5 using Oracle;
2. a list of the operations that you performed, along with screenshots of the dialogue boxes with correct options selected;
3. screenshot maps of (a) the two layers; (b) the layers after selection of the M4 and Cardiff District; (c) the region within 5 metres of the motorway that coincides with the district of Cardiff.

Learning Outcomes Assessed
1. Demonstrate an appreciation of applications of large-scale databases in a variety of commercial, scientific and professional contexts;
2. Understand how relational databases are extended with object-relational technologies to support management of spatial information;
3. Describe non-relational database approaches to support access to very large databases;
5. Exhibit a sound understanding of data mining and show familiarity with data mining algorithms;
Criteria for assessment
Credit will be awarded against the following criteria.
Part A
Ability to create and populate a MongoDB database
Ability to query and analyse data stored in the database
Quality of reflection on modelling options using the document data model
Feedback on your performance will address each of these criteria.
Allocated marks are specified against each specific question, to a total of 15 marks.
Part B
Correct formulation of Oracle spatial SQL queries with appropriate use of spatial operators and functions, along with correct corresponding output from the queries.
For Question 6:
o insight into reasons for differences in the numeric answer;
o appropriate selection, and clarity of documentation, of the operations used to obtain the answer.
Allocated marks are specified against each specific question, to a total of 15 marks.
Feedback and suggestion for future learning
Feedback on your coursework will address the above criteria. Feedback and marks will be returned by email at your cardiff.ac.uk address in Week 12 on Friday 10th January 2020. It will also be possible to request face to face feedback with the lecturers by appointment.

Feedback from this assignment will be useful for the final exam and potentially for your future learning more broadly.