CMP3036M Data Science Page 1 of 3
University of Lincoln
School of Computer Science
2016 – 2017
Assessment Item 1 of 2 Briefing Document
Title: CMP3036M Data Science
Indicative Weighting: 50%
Learning Outcomes
On successful completion of this component a student will have demonstrated competence in the
following areas:
• LO1 Critically apply fundamental concepts and techniques in data science
Overview
The objective of this assessment is to apply your knowledge of Data Science theory and techniques from
Semester A to carry out data analysis and predictions based on the datasets supplied by the delivery team.
Specifically, you will be given access to the datasets (see Table below) in order to carry out a number of
fundamental Data Science techniques you have learned while engaging with the module lectures and
workshops. The processes you undertake during the development and application of the selected techniques
must be fully documented and disseminated in a written report, and go into sufficient depth to demonstrate
knowledge and critical understanding of the relevant processes involved (i.e. descriptive statistics, training,
evaluation, and predictions). All available marks are awarded for the written report, with clear and separate
marking criteria for each required report section. Notably, the report must include a distinct and
significant section that discusses and critiques the analysis and implementation processes you carried
out for your data solution.
Several data files will be made available for the assessment as shown below:
File name                   Description
ds_training.csv             The training data. You will carry out statistical analysis on this dataset and train your models on it.
ds_test.csv                 The test data. You will make predictions for its given inputs.
ds_sample_submission.csv    A submission example. Your submission must follow the same format.
As you engage with the module and work towards making a data solution, you will be able to self-test your
data solutions against the supplied training data until you are ready to make a final submission based on the
test data. You must submit your data solution file to Blackboard together with your report.
More details for making a correct submission are in the submission section of this document, and will also be
covered during the module delivery.
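Self-testing against the training data usually means holding back part of it as a validation set that the model never sees during fitting. A minimal sketch of such a split, using only the Python standard library; the function name `holdout_split`, the 20% validation fraction and the fixed seed are illustrative assumptions, not requirements of this assessment:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], valid_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows and split it into (train, validation) lists."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]
```

In use, `rows` could be the records read from `ds_training.csv` with `csv.DictReader`; each candidate model is fitted on the first list and scored on the second, and only the chosen model is then applied to `ds_test.csv`. For imbalanced classes, a stratified split or k-fold cross-validation is a common alternative to a single random split.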
Report
The report must conform to the structure below and include the required content as described. Information on
the specific marking criteria for each section is available in the accompanying CRG (Criterion Reference Grid) document.
You must supply a written report containing five distinct sections that provide a full and reflective account of
the processes undertaken: i) data programming tools, ii) training data pre-processing and analysis, iii)
predictive models, iv) model evaluation, and v) references.
i) Section I: Data Programming Tools (5%)
The tools section should discuss the core data programming tools you used to develop your data
solution. You should highlight the capabilities of the tools that you have used, and discuss their
strengths and weaknesses in relation to other similar tools to manipulate the assessment dataset(s).
You may wish to use diagrams and academic literature to support your discussion where appropriate,
and reference all tools used and discussed (~1000 words).
ii) Section II: Training Data Pre-processing and Analysis (10%)
Your training data pre-processing and analysis section should: a) summarise the training dataset
you have used in the assessment; b) discuss the data pre-processing methods you have applied; and c)
present your data analysis findings. You should present the insights and new knowledge that the
application of Data Science techniques on the dataset can produce. You may wish to use descriptive
statistics, tables and graphs to describe your findings, and you should use academic literature to
support your discussion, as well as reference relevant Data Science concepts/methods. You should not
discuss Data Science tools in this section. (~1000 words).
iii) Section III: Predictive Models (20%)
The predictive models section should discuss the algorithms you have implemented on the training
data for classification, and explain why and how they can provide predictions for the test data. You should
provide a detailed and critical account of all processes undertaken to arrive at your final data
solution, including programming challenges and any particular difficulties encountered while using the tools.
You are expected to implement at least two algorithms for prediction purposes, and the underpinning
theory for each algorithm and subsequent implementation should be explained in sufficient detail.
You could also discuss any advanced techniques implemented in your solution that have not been
covered in depth in the module. This section must include academic literature and references to
support your discussion. You must submit your data solution file following the required format. If
you do not submit your data solution file following the required format, you will be given a
mark of ‘zero’ for this section of the report. More details for making a correct submission are in the
submission section of this document, and will also be covered during the module delivery. (~1500
words).
iv) Section IV: Model Evaluation (10%)
Your evaluation section should provide a critical and reflective discussion on comparing the
algorithms/methods you have implemented, and explain how you selected the final algorithm for your
final submission. Your final data solution submission will be evaluated using the area under the
ROC curve (AUC). Your discussion should therefore provide an introduction to the AUC, and then
explain why and how your final algorithm achieved its performance under this metric.
(~1000 words)
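One way to introduce the AUC is through its probabilistic reading: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition, assuming binary labels coded 0/1 and one real-valued score per example; the function name `roc_auc` is illustrative and not part of the supplied materials.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting tied pairs as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # compare every positive/negative pair
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is O(n·m) and is meant for building intuition on small inputs; scikit-learn's `sklearn.metrics.roc_auc_score` is the usual library routine. Because the AUC depends only on how the scores rank the examples, predicting hard 0/1 labels instead of scores collapses the ROC curve to a single operating point and discards the ranking information the metric rewards.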
v) Section V: References (5%)
The references section should contain a properly formatted list (Harvard style) of all academic
literature and other supporting materials that have been cited throughout the report.
Your report should be a maximum of 4500 words. As a level 3/4 year student you are expected to
communicate academic concepts and discussion from a critical standpoint that moves beyond a purely
descriptive account. A presentation penalty of 5% will be strictly applied if you exceed the 4500-word
limit (a 10% leeway applies). In summary, the report must:
• Contain your name, student number, email address, module name and code;
• Be in PDF format and no more than 4500 words (with a 10% leeway), including the cover page (if you have one), table of
contents, appendices (if you have any) and references;
• Be formatted in single line spacing and use an 11pt font;
• Not include this briefing document.
Submission Instructions
The deadline for submission of this work is included in the School Submission dates on Blackboard. You
must make an electronic submission of your work to Blackboard that includes the following mandatory
items:
• a PDF of your written report (following the requirements above), submitted to the Turnitin upload
area for assessment 1;
• a ZIP file containing your data solution file, submitted to the supporting
documentation area for assessment 1. You must make sure that your data solution file follows the
same content format as the ds_sample_submission.csv file and is
named ds_submission_YourStudentID.csv, e.g., ds_submission_12345000.csv. Remember: if
you do not submit your data solution file following the required format, you will be given a
mark of ‘zero’ for report section III: Predictive Models (20%).
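Because a wrongly formatted or wrongly named solution file costs all the marks for Section III, it may be worth checking the file before uploading. The sketch below, in Python with only the standard library, compares a candidate file's name against the ds_submission_<StudentID>.csv pattern and its header and row count against ds_sample_submission.csv. It assumes that student IDs are all digits (as in the example above) and that a valid submission has exactly as many rows as the sample file; the function names `read_rows` and `submission_problems` are hypothetical.

```python
import csv
import re
from pathlib import Path
from typing import List

# Naming rule from the briefing: ds_submission_<StudentID>.csv.
# Assumption: the student ID consists of digits only, as in the example.
NAME_PATTERN = re.compile(r"ds_submission_\d+\.csv")


def read_rows(path: str) -> List[List[str]]:
    """Read a CSV file into a list of rows, each row a list of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def submission_problems(file_name: str,
                        submission_rows: List[List[str]],
                        sample_rows: List[List[str]]) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    problems: List[str] = []
    name = Path(file_name).name
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"file name {name!r} does not match ds_submission_<StudentID>.csv")
    if not submission_rows or not sample_rows:
        problems.append("submission or sample file is empty")
        return problems
    # The header row must match the sample submission exactly.
    if submission_rows[0] != sample_rows[0]:
        problems.append(f"header {submission_rows[0]} differs from sample header {sample_rows[0]}")
    # Assumption: one output row per row of the sample file.
    if len(submission_rows) != len(sample_rows):
        problems.append(
            f"{len(submission_rows) - 1} data rows, but the sample has {len(sample_rows) - 1}"
        )
    return problems
```

For example, `submission_problems(path, read_rows(path), read_rows("ds_sample_submission.csv"))` returns an empty list when no problem is found. It checks structure only, not whether the predicted values are sensible.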
This assessment is an individually assessed component. Your work must be presented according to the School
of Computer Science guidelines for the presentation of assessed written work. Please make sure you have a
clear understanding of the grading principles for this component as detailed in the accompanying Criterion
Reference Grid. Your citations and referencing should be in accordance with University guidelines.
If you are unsure about any aspect of this assessment component, please seek the advice of the module
coordinator: Bowei Chen (bchen@lincoln.ac.uk).