MySQL Coursework 2.ii
Overview
This is part two of second coursework for the Databases/Big Data topic
· This work starts by extending that done on using Access to generate a SQL server-based solution, firstly using MySQL and then MS-SQL (with data warehousing between two servers).
· It then moves to BigData processing techniques (e.g. Hadoop, Hive, Spark etc.)
The breakdown of these tasks are
· Use of MySQL (this document)
· Use of MS-SQL
· Big Data Processing tasks
Submission (an overview)
The submission of coursework two will comprise
1. SQL scripts for the generate of required database schemas
2. Backup/detached SQL database files
3. Code/scripts etc. for the BigData tasks
4. Written Report (covering all aspects of the work)
Full details on the submission process (e.g. for creating (1) & (2) have been provided).
Marking rubrics are provided for tasks (to ensure all aspects of the work are understood).
For the written report, a template & marking rubric will be provided.
Practical support
The seminar sessions will provide a chance for you to develop, with support, your solutions and will also allow you to ask for feedback on your solutions prior to the final submission (I will of course be answering queries by email and/or team).
Coursework 2, Task 1: Instructions (MS-SQL tasks + Linked Server work)
This section details the work required for the second task, the MySQL implementation of the ‘student marks system’ (based on the original Access work)
One approach you might try is to generate a script to create the MySql database and ‘tweak’ it to create the MS-SQL version.
Tasks 1 & 2 are to be done via a single SQL script which processes the sample data into the relevant tables without any updating of the original table, for task 3 generate views for the data.
Task 4 requires the creation of a stored procedures that manipulate the marks table and (for 4.3 & 4.4) updates all statistics in a single function. The fact that the any script has been run should be logged in an audit table (you will need to add an additional table for this).
Pre-loaded to the virtual server is a .sql file showing how to access the linked server to retrieve data (needed for some of the view/queries).
1. Database Table generation
Remember: You are to generate a single script to complete this task
· To rationalise the data into a series of tables (remembering to consider field sizes/types)
· Define the relationships between the tables; they must be created in such a fashion as to ensure complete data integrity across the database
2. Statistics table
Remember: You are to generate a single script to complete this task
A specific requirement of the database is that there is a table of student IDs & full year statistics (credits taken, credits passed, year average etc.).
3. Views
Here the work changes you are not only to generate views (the designer is different) but requires you to allocate them to the relevant logins/groups (front of house, marks etc.).
You should consider which views are appropriate to different logins (not all will get all) and grant select permissions accordingly.
Once the tables have been created, you are to generate views to achieve the following
· V1 Produce a list of all students – no marks, suitable (say) for a register
· V2 Produce a list of modules by level (for, say, a module catalogue).
· V3 Produce a list of students and their marks.
· V4 A list of module statistics (min, max & average mark)
· V5 Identify students failed to pass in any academic year. There are three ways this occurs
· An overall weighted year average of < 40% · More than 20 credits in the range 30%-39% · A mark below 30% Note: Views should provide such fields as would make the data usable (i.e. a list of marks would need to include details such as the student ID, Name etc.). The names of the views should be indicative to the tasks they perform. 4. Data processing stored procedures The following stored procedure are to be developed 4.1 A procedure to update the FULL statistics table from the marks data – the fact the procedure has been run (and when) should be logged to a table in the audit table. 4.2 A procedure to update the statistics table from the marks data for a given. student– the fact the procedure has been run (and when) should be logged to an audit table in the database. 4.3 An add/update procedure that, for a given student/year/module, will add a new mark or update an existing mark 4.4 A delete procedure that, for a given student/year/module/mark will delete an existing mark. 4.5 A comparison of the local marks table and that on the MySQL server (the host for project marks) that details (outputs) any discrepancies (missing marks, different marks etc.) between the two sets of data. 4.6 A stored procedure to synchronise the project marks table with those on the MySql server (this removing any discrepancies found by 4.5) Remember: · Running of a stored procedure is to be logged · Stored procedures 4.3, 4.4 & 4.6 should, as part of their operation, update the statistics table · Stored procedure execute should ONLY be granted to the login(s)/groups(s) to which they are appropriate 5. Calculation of weighted average (a reminder) All averages are to be weighted averages, calculated as Average = sum ( module_mark * module_credits ) / sum (module_credits) Work to come… (the MS-SQL part). You will create an equivalent system using MS-SQL but add in a security model, management activities & systems for exchanging data between different systems etc.