代写 C++ C data structure Java SQL XML database graph README first:

README first:
• This is an individual programming assignment, to be completed under the Consultation model of collaboration as per the Computing Science Department Course Policies.
• You must not upload binary files or large text files (e.g., XML files out of Osmosis) of any kind to your GitHub repository.
• Your solution will be graded and must run on any of the machines in the instructional labs, without the installation of libraries or tools not already there.
Submission
Your answers must be on a GitHub repository created by following the link on eClass. That repository already has the folder structure for the assignment and the configuration file for Travis-CI. DO NOT MODIFY the directory structure nor the .travis.yml file.
You must also submit the URL of your GitHub repository on eClass before the deadline.
Learning Objectives
• Understanding the OpenStreetMap (OSM) data model.
• Extracting OSM data from a binary map file into an XML representation.
• Writing one or more programs to parse the XML representation and extract data of interest.
• Further practicing your knowledge of SQL.
• Becoming acquainted with embedded SQL programming with SQLite3 and its C/C++ programming API.
Part I — Loading OSM data into SQLite
The OSM data model has four elements:
• A node represents a unique point on the Earth’s surface; it has an id and a pair of lat/loncoordinates. A node can be used to represent a park bench, a water fountain, a phone booth, etc.
• A way is an ordered list of nodes, defining a path (called polyline in the OSM documentation) on the map.
• A path that starts and ends with the same node is said to be closed and represents an area on the map (e.g., a lake, a farm, a building, …).
• A relation is a multi-purpose data structure that groups together several elements (e.g., multiple nodes and ways).
• A tag is a key-value pair, used to describe an element. OSM has a predefined list of accepted tags, but users can add their own.
The minimum bounding rectangle (MBR) of a set S of node elements is the smallest rectangle such that no element of S falls outside that rectangle.
Q1 (2 marks)
In a sentence, your task is to write and document scripts and/or programs, in any language (as long as they can run on the lab machines) to create a SQLite database with all nodes, paths and areas (closed paths) within the City of Edmonton, together with any tags associated with them, from an OSM map of Alberta.
You need to use the Osmosis command-line Java tool to process the binary OSM data file and produce XML files which are both human and machine readable. You will use map data encompassing the city of Edmonton. For this, you can download a recent map of Alberta from the Canadian section of the Geofabrik.de download site. You must extract all node and all wayelements within the MBR containing all of the City of Edmonton. To find the coordinates of such MBR, use the official boundaries of the City of Edmonton described by approximately 8 thousand points (lat/lon coordinates).
Extracting Map Data With Osmosis
Download and install Osmosis in your laptop or your home directory in a lab machine. Run Osmosis to extract an XML dataset for the city of Edmonton, providing Osmosis the MBR of the city as a command line parameter. Call your output file edmonton.osm (note that an XML file need not have xml as extension).
Read this to familiarize yourself with the command line parameters. You want to use the following ones (all in one line and in the order below):
–read-pbf –bounding-box bottom=… left=… top=… right=…–write-xml edmonton.osm
Nodes
The XML file you created should have several thousand  elements like this one:

Such nodes should be stored in a table
node (id integer, lat float, lon float)
Some of the node elements in the XML file span multiple lines and have tag elements nested within them (which you need to capture as well–see below):

NOTE: in XML terminology, the single-line node elements are empty and hence can be closed succinctly in the same line (with />). The multi-line ones, instead, are not empty and need to be closed explicitly withe a 
 tag.
Paths and Areas
Paths and areas are encoded as way elements. For example, Athabasca Hall looks like this:

Each  element refers to a node in the XML file identified by the ref attribute. Also, these elements are logically ordered in the same way they appear in the file. Paths (open or closed) must be represented in the database inside two tables:
way (id integer, closed boolean)waypoint (wayid integer, ordinal integer, nodeid integer)
Tags
Finally, tag elements should be stored in two separate tables:
nodetag (id integer, k text, v text)waytag (id integer, k text, v text)
Constraints
Your database must enforce the following constraints:
• Primary keys: node(id) and way(id)
• Foreign Keys:
• waypoint(wayid) is a foreign key referencing way(id)
• waypoint(nodeid) is a foreign key referencing node(id)
• nodetag(id) is a foreign key referencing node(id)
• waytag(id) is a foreign key referencing way(id).
• Other constraints
• closed in way should be true if and only if the path is closed.
• There can be no loops (i.e, the same node cannot appear more than once) in a way for which closed = false.
• The ordinal values in waypoint form a dense ordering; that is: (1) each node is assigned unique ordinal between 1 and the number of nodes elements in the path. The ordinals should initially correspond to the order the nodes appear in the XML file.
Your solution must recompute the ordering of the nodes and the value of the closed attribute after each valid update, as appropriate, so that the constraints are not violated..
Loading the XML Data into a SQLite Database
The final product of this part of the assignment is a database with the schema described above. Below we give a suggestion of how to do that. You are free to choose a different approach, and of course, you can discuss it with your TA beforehand. Regardless of the design you choose, you are free to either write programs that produce CSV or TSV files as output and load them by hand into SQLite or you can write programs that connect to SQLite and populate the tables with Embedded SQL.
NOTE: due to memory restrictions, you are NOT allowed to load the entire XML file in memory using DOM. Instead, we suggest you scan the file element by element.
The proper way to do this is using the Simple API for XML (SAX) for this purpose. However, the XML file you are dealing with here is so simple that a less elegant alternative can work, which is to parse the file line by and decide what to do after each line. You should never do this for arbitrary XML files where you do not know a priori how many levels of nesting are used.
Naming convention
The test scripts for the other questions assume that there will be a SQLite database on a file called edmonton.db inside the q1 folder. If you do not follow this convention, you need to change the test scripts in every question.
What will be graded?
Your TA will grade the README.md file inside the q1 folder, and the .gitignore file in your repository. README.md should have clear instructions to build the database, while .gitignoremust have the path to all XML or SQLite files mentioned in your solution.
Rubric for Q1
Percentage
Description
100
TA is able to build the SQLite database with the instructions provided, **on a lab machine**, without errors or warnings, starting from a binary OSM file (e.g., the Alberta map); instructions work both on an OSM file created by the TA as well as on the one created by the students; the solution handles escaped characters properly; database has **all constraints properly defined**; all data is extracted as specified.
75
Solution works with instructions provided but only on the OSM file created by following the instructions; OR TA must fix/guess steps from the instructions provided; database has at least one trigger-based constraint properly defined; OR all trigger-based constraints are partially defined.
50
Instructions provided are insufficient; TA must guess/fix several steps; OR not all data is extracted and stored as specified; OR not all data is extracted as specified; OR only the PK/FK and CHECK constraints are implemented.

25
TA is able to extract and load at least two kinds of data (e.g., nodes and tags but not ways) from your instructions; OR only the primary key and foreign key constraints are implemented.

0
Repository contains any binary file or any temporary XML file(s) created during testing; OR the solution is in a repository created by the student; OR the XML processing is done using DOM; OR the solution requires libraries or tools not in the instructional lab machines.
Part II — Embedded SQL
Required reading:
Read all of An Introduction To The SQLite C/C++ Interface before writing any code.
Further to that, read the following carefully before you write any code:
• Database Connection Handle
• Prepared Statement Object, paying especial attention to the part explaining life-cycle of a prepared statement object
• Result Values From a Query
• Binding Values To Prepared Statements
• Create or Redefine SQL Functions
Specifications
For each question, you are asked to write a C program that follows the life-cycle described here to accomplish the following tasks. (If you want to use C++ talk to the instructor.) Your programs should work on any SQLite database with the schema as in Part I**. Your TA will test your code on multiple databases conforming to that schema.
Input/Output
All input to the programs must be done via command line arguments in the order specified in the question, or in text files. A tag given as command line arguments will always be a single string in the form key=value (i.e., a single entry in the argument list). For example, a tag could be wheelchair_accessible=yes.
The output of the program must be as specified and match the unit test code provided.
Q2 (1 mark)
Write a C program, in a file called q2/src/solution.c that takes as input the database file and two node identifiers and prints to STDOUT their geographical distance in meters, computed by a suitable function from here and links therein.
The distance must be computed by a new user-defined SQL function, called from a single query in your code. In your README.md file for this assignment, with an explanation of why the function you chose is suitable for computing distances within the city of Edmonton. Your explanation will count for 1/2 mark towards your grade in this question.
Output: your program must print to STDOUT the geographical distance (i.e., a single number) or the word error (in a single line) if the parameters are incorrect (e.g., the database file is missing or the number of node ids is incorrect, or some id is missing from the database).
Q3 (0.5 mark)
Write a C program, in a file called q3/src/solution.c that takes as input the database file and a listof strings of the form key=value; finds every node in the database having at least one tag matching a key/value combination from the input list; and prints to STDOUT: the number of such node elements, as well as the largest pairwise distance among those nodes.
The maximum distance must be computed by a single SQL query that uses the function you created for Q2.
Output: your program must print to STDOUT two numbers separated by a space or a tab, or the word error (in a single line) if the parameters are incorrect.
Q4 (0.5 mark)
Write a C program, in a file called q4/src/solution.c that takes as input the database file and a listof strings of the form key=value, and finds every way in the database having at least one tag matching a key/value combination from the input list; and prints to STDOUT: the number of such paths, and the length of the longest such path, computed by a single SQL query as in Q3.
All lengths must be computed in SQL, as in Q2/Q3.
Output: your program must print to STDOUT two numbers separated by a space or a tab or the word error (in a single line) if the parameters are incorrect.
Q5 (0.5 mark)
Write a C program, in a file called q5/src/solution.c that takes as input a database file and a tsvfile containing zero or more lines, each describing a node to be inserted into the database.
In that file, each node must be described by at least three columns: (0) the node id, (1) the latitude, (2) the longitude. Subsequent columns in the file correspond to strings of the form key=valueproviding tags for the node, and must be inserted accordingly.
Output: your program must print to STDOUT the word success on a single line if the execution was successful or error on a single line followed by the SQLite error message on subsequent lines, in case the output is invalid or some constraint is violated.
All or nothing: your program must leave the database unchanged if there is an error on any of the nodes described in the input file.
Q6 (0.5 mark)
Write a C program q6/src/solution.c that takes as input a database file and a tsv file containing zero or more lines describing zero or more way elements, which are to be inserted into the database. The format of the input TSV is as follows:
• Each way element is described by two consecutive non-blank lines (i.e., blank lines are used to separate ways)
• The first such line has the id of the way in column 0, followed by zero or more strings of the form key=value in subsequent columns, with tags for the way.
• The second line has all the nodes (identifiers) in the way, with the column number corresponding to the order of the node.
Output: your program must print to STDOUT the word success on a single line if the execution was successful or error on a single line followed by the SQLite error message on subsequent lines, in case the output is invalid or some constraint is violated.
All or nothing: your program must leave the database unchanged if there is an error on any of the ways or any of the nodes described in the input file.
Rubric for all other questions
Percentage
Description
100
The code is correct, readable, and well documented, so that the TA can clearly understand what it does. Also, the TA is able to compile and execute the code on all test cases without errors or warnings, including test cases with invalid input, by following the instructions provided.
75
The code is correct and is understandable with some effort. The TA is able to compile and execute the code on all test cases, including test cases with invalid input, by guessing or fixing steps.
50
The code works only on (all) test cases with valid input; OR the TA is able to execute the code from the instructions but there are parts of the code that are not understandable with reasonable effort; OR the code does not follow the life-cycle mentioned in the Tasks section above.
25
The code works on one or two test cases with valid input; OR the code is able to read all inputs but does not compute the correct value nor modifies the database as required; OR the code is incomprehensible.
0
**Travis-CI raises errors or warnings for the question**; OR, the provided Travis-CI configuration file has been modified; OR the provided folder structure has been modified; OR the code requires libraries or tools not in the instructional lab machines.