University of Lincoln School of Computer Science 2016 – 2017
Assessment Item 1 of 2 Briefing Document
Title: CMP3036M Data Science
Indicative Weighting: 50%
Learning Outcomes: On successful completion of this component a student will have demonstrated competence in the following areas:
LO1 Critically apply fundamental concepts and techniques in data science
Overview
The objective of this assessment is to apply your knowledge of Data Science theory and techniques from Semester A to carry out data analysis and make predictions based on the datasets supplied by the delivery team. Specifically, you will be given access to the datasets (see Table below) in order to apply a number of the fundamental Data Science techniques you have learned while engaging with the module lectures and workshops. The processes you undertake while developing and applying the selected techniques must be fully documented in a written report, which must go into sufficient depth to demonstrate knowledge and critical understanding of the relevant processes involved (i.e. descriptive statistics, training, evaluation, and predictions). All available marks are awarded for the written report components, with clear and separate marking criteria for each required report section. Notably, a distinct and significant report section discussing and critiquing the analysis and implementation processes you carried out for your data solution is required.
Several data files will be made available for the assessment as shown below:
As you engage with the module and work towards making a data solution, you will be able to self-test your data solutions against the supplied training data until you are ready to make a final submission based on the test data. You must submit your data solution file to Blackboard together with your report. More details on making a correct submission are given in the submission section of this document, and will also be covered during the module delivery.
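One common way to self-test against the training data is to hold back part of it for validation. The sketch below is illustrative only: the function name `holdout_split`, the 20% validation fraction and the fixed seed are assumptions, not requirements of this assessment.

```python
import random


def holdout_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, validation) lists.

    validation_fraction and seed are illustrative defaults, not prescribed values.
    """
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatable splits
    n_validation = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


# Stand-in for rows read from ds_training.csv (real rows would be feature records).
rows = list(range(10))
train_rows, validation_rows = holdout_split(rows)
print(len(train_rows), len(validation_rows))
```

A model fitted on `train_rows` can then be scored on `validation_rows` before any predictions are made for the test data.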
Report
The report must conform to the structure below and include the required content as described. Information on the specific marking criteria for each section is available in the accompanying Criterion Reference Grid (CRG) document.
You must supply a written report containing five distinct sections that provide a full and reflective account of the processes undertaken: i) data programming tools, ii) training data pre-processing and analysis, iii) predictive models, iv) model evaluation, and v) references.
File name | Description
ds_training.csv | The training data. You will do statistical analysis on this dataset and train your models.
ds_test.csv | The test data. You will make predictions for its given input.
ds_sample_submission.csv | A submission example. You will make your submission following the same format.
CMP3036M Data Science Page 1 of 3
- i) Section I: Data Programming Tools (5%)
The tools section should discuss the core data programming tools you used to develop your data solution. You should highlight the capabilities of the tools that you have used, and discuss their strengths and weaknesses relative to similar tools for manipulating the assessment dataset(s). You may wish to use diagrams and academic literature to support your discussion where appropriate, and you should reference all tools used and discussed (~1000 words).
- ii) Section II: Training Data Pre-processing and Analysis (10%)
Your training data pre-processing and analysis section should: a) summarise the training dataset you have used in the assessment; b) discuss the data pre-processing methods you have applied; and c) present your data analysis findings. You should present the insights and new knowledge that the application of Data Science techniques to the dataset can produce. You may wish to use descriptive statistics, tables and graphs to describe your findings, and you should use academic literature to support your discussion, as well as reference relevant Data Science concepts/methods. You should not discuss Data Science tools in this section (~1000 words).
- iii) Section III: Predictive Models (20%)
The predictive models section should discuss the algorithms you have implemented on the training data for classification, and explain why and how they can provide predictions for the test data. You should provide a verbose and critical account of all processes undertaken to arrive at your final data solution, including programming challenges and any particular difficulties encountered while using the tools. You are expected to implement at least two algorithms for prediction purposes, and the underpinning theory for each algorithm and its subsequent implementation should be explained in sufficient detail. You could also discuss any advanced techniques implemented in your solution that have not been covered in depth in the module. This section must include academic literature and references to support your discussion. You must submit your data solution file following the required format. If you do not submit your data solution file following the required format, you will be given a mark of ‘zero’ for this section of the report. More details on making a correct submission are given in the submission section of this document, and will also be covered during the module delivery (~1500 words).
- iv) Section IV: Model Evaluation (10%)
Your evaluation section should provide a critical and reflective discussion comparing the algorithms/methods you have implemented, and explain how you selected the final algorithm for your final submission. Your final data solution submission will be evaluated using the area under the curve (AUC) metric. Therefore, your discussion should provide an introduction to the AUC. You should then explain why and how your final algorithm has achieved its performance under the AUC (~1000 words).
- v) Section V: References (5%)
The references section should contain a properly formatted list (Harvard style) of all academic literature and other supporting materials that have been cited throughout the report.
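As a rough illustration of the AUC metric named in Section IV, the sketch below computes it as the fraction of (positive, negative) example pairs in which the positive example receives the higher predicted score, with ties counted as half. The function name `auc_score` and the toy labels and scores are assumptions for illustration only; this is not the scoring code used by the delivery team.

```python
def auc_score(labels, scores):
    """Pairwise AUC: chance that a random positive outranks a random negative.

    labels are 0/1 class labels; scores are predicted scores (higher = more positive).
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): two negatives and two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc_score(labels, scores)
print(example_auc)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise loop is quadratic in the number of examples, so a library implementation is preferable on large datasets.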
Your report should be a maximum of 4500 words. As a level 3/4 year student, you are expected to communicate academic concepts and discussion from a critical standpoint that moves beyond a purely descriptive account. A presentation penalty of 5% will be strictly applied if you exceed the 4500-word limit (10% leeway applies). In summary, the report must:
- Contain your name, student number, email address, module name and code;
- Be in PDF format and no more than 4500 words (with 10% leeway), including cover page (if you have one), table of contents, appendices (if you have any) and references;
- Be formatted in single line spacing and use an 11pt font;
- Not include this briefing document.
Submission Instructions
The deadline for submission of this work is included in the School Submission dates on Blackboard. You must make an electronic submission of your work to Blackboard that includes the following mandatory items:
- a PDF of your written report (following the requirements above), submitted to the Turnitin upload area for assessment 1
- a ZIP file containing the following item: your data solution file, submitted to the supporting documentation area for assessment 1. You must make sure that your data solution file follows the same content format as the ds_sample_submission.csv file and your data solution file should be named as ds_submission_YourStudentID.csv, e.g., ds_submission_12345000.csv. Remember – If you do not submit your data solution file following the required format, you will be given a mark of ‘zero’ for report section III: Predictive Models (20%).
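Before uploading, the file name and format requirements above can be checked with a short script. The sketch below is a minimal example under stated assumptions: the student ID is taken to be all digits (as in the example name), the column names `id` and `prediction` are hypothetical placeholders for whatever header ds_sample_submission.csv actually uses, and matching row counts is assumed to be part of "the same content format".

```python
import csv
import io
import re

# Assumes the student ID consists of digits only, as in ds_submission_12345000.csv.
FILENAME_PATTERN = re.compile(r"ds_submission_\d+\.csv")


def is_valid_filename(name):
    """Return True if name follows the ds_submission_YourStudentID.csv convention."""
    return FILENAME_PATTERN.fullmatch(name) is not None


def same_format(sample_text, submission_text):
    """Return True if both CSV texts share a header row and a row count.

    Equal row counts is an assumption about what the required format implies.
    """
    sample_rows = list(csv.reader(io.StringIO(sample_text)))
    submission_rows = list(csv.reader(io.StringIO(submission_text)))
    if not sample_rows or not submission_rows:
        return False
    same_header = sample_rows[0] == submission_rows[0]
    same_length = len(sample_rows) == len(submission_rows)
    return same_header and same_length


# Hypothetical file contents; the real header comes from ds_sample_submission.csv.
sample_text = "id,prediction\n1,0.5\n2,0.5\n"
submission_text = "id,prediction\n1,0.91\n2,0.07\n"

print(is_valid_filename("ds_submission_12345000.csv"))
print(same_format(sample_text, submission_text))
```

In practice the two texts would be read from ds_sample_submission.csv and from your own submission file rather than written inline.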
This assessment is an individually assessed component. Your work must be presented according to the School of Computer Science guidelines for the presentation of assessed written work. Please make sure you have a clear understanding of the grading principles for this component as detailed in the accompanying Criterion Reference Grid. Your citations and referencing should be in accordance with University guidelines.
If you are unsure about any aspect of this assessment component, please seek the advice of the module coordinator: Bowei Chen <bchen@lincoln.ac.uk>