Introduction to information system
Questions & Answers
Tips for Assessment Item 1
Bowei Chen
School of Computer Science
University of Lincoln
CMP3036M/CMP9063M Data Science
Don’t know how to start your assignment?
Mine the data first!
Candidate models
Key Stages of This Data Mining Task
Data Preparation → Model Training → Model Evaluation
In the data preparation stage, two steps are important:
• Data exploration:
  • How many observations? How many variables?
  • Are there numeric and/or categorical variables?
  • What are the distributions of the variables, and how are they related?
  • How will you present your findings? Hint: use tables and plots.
  • …
• Data pre-processing:
  • Do you need to clean the training dataset? For example, are there any missing or extreme values?
  • Do you need to standardise/normalise variables? Why?
  • …
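A minimal sketch of both steps, using only the Python standard library and a small made-up table. The column names `age`, `income` and `region`, all values, and the helper names `explore`, `impute_median` and `standardise` are hypothetical, not taken from the assignment data. The sketch counts observations and variables, guesses which columns are numeric or categorical, counts missing values, fills numeric gaps with the column median, and rescales numeric columns to z-scores. Standardising matters mainly for models that depend on distances or coefficient size, such as k-nearest neighbours or regularised logistic regression. In practice, pandas (`DataFrame.describe`, `DataFrame.isna`) and scikit-learn (`StandardScaler`) provide the same operations.

```python
import statistics

# Hypothetical toy data: each row is one observation; None marks a missing value.
rows = [
    {"age": 25, "income": 32000.0, "region": "north"},
    {"age": 41, "income": None, "region": "south"},
    {"age": 33, "income": 45000.0, "region": "north"},
    {"age": None, "income": 51000.0, "region": "east"},
    {"age": 58, "income": 39000.0, "region": "south"},
]

def explore(rows):
    """Return (n_observations, n_variables, {column: (kind, n_missing)})."""
    columns = list(rows[0].keys())
    summary = {}
    for col in columns:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        # Treat a column as numeric if every non-missing value is an int or float.
        is_numeric = all(isinstance(v, (int, float)) for v in present)
        summary[col] = ("numeric" if is_numeric else "categorical",
                        len(values) - len(present))
    return len(rows), len(columns), summary

def impute_median(rows, col):
    """Return new rows where missing values in a numeric column become its median."""
    median = statistics.median(r[col] for r in rows if r[col] is not None)
    return [{**r, col: median if r[col] is None else r[col]} for r in rows]

def standardise(rows, col):
    """Return new rows where a numeric column has zero mean and unit sample std."""
    values = [r[col] for r in rows]
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [{**r, col: (r[col] - mean) / sd} for r in rows]

n_obs, n_vars, summary = explore(rows)
print(n_obs, n_vars, summary)

clean = impute_median(impute_median(rows, "age"), "income")
clean = standardise(standardise(clean, "age"), "income")
```

For brevity the sketch computes the median, mean and standard deviation on the whole toy table. For the assignment, compute them on the training data only and reuse them on the test data, so that no information leaks from the test set.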
Key Stages of This Data Mining Task
• Based on your understanding of the problem/question, what are your candidate models?
• Why are these candidate models suitable?
• Why do you need to split the dataset into training and test sets?
• How will you split the dataset?
• Should all the variables be used to develop your model?
• …
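Holding out a test set gives an estimate of how a model performs on data it did not see during training. For a binary target, one common choice is a stratified random split, which keeps the share of each class roughly the same in both parts. The sketch below is a hypothetical standard-library version: the function name `stratified_split`, the 30% test fraction, the seed and the labels are illustrative assumptions. scikit-learn's `train_test_split` with `stratify=y` does the same job.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx) so that each class keeps roughly its share."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random order within each class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 14 negatives and 6 positives.
y = [0] * 14 + [1] * 6
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))
print(sum(y[i] for i in test_idx) / len(test_idx))  # positive share in the test set
```

Stratifying matters most when one class is rare. A plain random split could then leave very few positives in the test set, which makes AUC estimates unstable.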
Key Stages of This Data Mining Task
• Which metrics are suitable/useful for evaluating the model?
• How do you calculate and interpret AUC?
• How do you compare your candidate models based on AUC?
• …
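AUC is the area under the ROC curve. For a binary classifier, it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as one half. A value of 0.5 therefore corresponds to random ranking and 1.0 to perfect ranking. That interpretation gives a direct way to compute AUC, sketched below by comparing every positive/negative pair. The labels and the scores of the two hypothetical models `model_a` and `model_b` are made up. scikit-learn's `roc_auc_score` computes the same quantity more efficiently. To compare candidate models, compute AUC for each on the same test set and prefer the higher value, keeping in mind that small differences may be noise.

```python
def auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # compare every positive ...
        for n in neg:      # ... with every negative
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test labels and predicted probabilities from two candidate models.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.3, 0.8, 0.4, 0.5, 0.2, 0.7, 0.1]
model_b = [0.6, 0.6, 0.9, 0.3, 0.7, 0.4, 0.5, 0.2]

for name, s in [("model_a", model_a), ("model_b", model_b)]:
    print(name, auc(y_test, s))
```

Here `model_a` ranks 15 of the 16 positive/negative pairs correctly (AUC 0.9375), while `model_b` manages only 9.5 of 16 (AUC 0.59375), so `model_a` would be preferred. The pairwise loop is quadratic in the number of examples. For large test sets, a rank-based formula or `roc_auc_score` is the practical choice.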
Key Stages of This Data Mining Task
• How are you going to select the model?
• Are there any other ways to improve the AUC result?
• …
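One common way to choose among candidate models is k-fold cross-validation on the training set. Split the training data into k folds, train on k - 1 of them, compute AUC on the held-out fold, repeat for every fold, and pick the model with the highest mean AUC before touching the test set. The sketch below is runnable only because it assumes two deliberately simple hypothetical "models" on a single made-up feature: `fit_raw_feature` scores by the raw feature value, and `fit_nearest_mean` scores by closeness to the class means. Replace them with your real candidates. The folds are stratified so that every fold contains both classes, which AUC requires. scikit-learn's `cross_val_score(..., scoring="roc_auc")` automates this loop.

```python
import random
import statistics

def auc(y_true, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_k_folds(y, k, seed=0):
    """Deal each class's shuffled indices round-robin into k folds.

    Every fold gets both classes as long as each class has at least k examples.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Hypothetical candidate models: fit(x_train, y_train) returns a scoring function.
def fit_raw_feature(x, y):
    return lambda v: v  # higher feature value -> higher score; ignores training data

def fit_nearest_mean(x, y):
    m0 = statistics.mean(v for v, t in zip(x, y) if t == 0)
    m1 = statistics.mean(v for v, t in zip(x, y) if t == 1)
    return lambda v: abs(v - m0) - abs(v - m1)  # closer to class-1 mean -> higher

def cv_auc(fit, x, y, k=3):
    """Mean validation AUC of one candidate model over k stratified folds."""
    aucs = []
    for fold in stratified_k_folds(y, k):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        score = fit([x[i] for i in train], [y[i] for i in train])
        aucs.append(auc([y[i] for i in fold], [score(x[i]) for i in fold]))
    return statistics.mean(aucs)

# Hypothetical training data: one feature, binary target.
x = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

candidates = [("raw_feature", fit_raw_feature), ("nearest_mean", fit_nearest_mean)]
results = {name: cv_auc(fit, x, y) for name, fit in candidates}
best = max(results, key=results.get)
print(results, "->", best)
```

Selecting on cross-validated AUC rather than on test-set AUC keeps the test set as an honest final check. If the test set is used to pick the model, its AUC tends to be optimistic.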
Key Stages of This Data Mining Task
• Are there any other data pre-processing methods that can be used to further improve the AUC results?
• …
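Two pre-processing steps that sometimes help are worth considering. One is encoding categorical variables as 0/1 indicator (one-hot) columns, so that models expecting numeric inputs can use them. The other is applying log(1 + x) to strongly right-skewed, non-negative numeric variables. Whether either one improves AUC on this dataset has to be checked empirically, for example with cross-validation. The sketch below is a hypothetical standard-library version with made-up values; the helper names `fit_one_hot`, `transform_one_hot` and `log_transform` are illustrative. Categories are learned from the training data only, and a category that appears only in the test data is encoded as all zeros. This matches the behaviour of scikit-learn's `OneHotEncoder(handle_unknown="ignore")`.

```python
import math

def fit_one_hot(train_values):
    """Learn the sorted list of categories seen in the training data."""
    return sorted(set(train_values))

def transform_one_hot(values, categories):
    """Encode each value as a 0/1 indicator list; unseen categories become all zeros."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def log_transform(values):
    """Apply log(1 + x) to compress a long right tail; assumes all values are >= 0."""
    return [math.log1p(v) for v in values]

# Hypothetical column values.
train_region = ["north", "south", "north", "east"]
test_region = ["south", "west"]  # "west" never appears in the training data

categories = fit_one_hot(train_region)
print(categories)
print(transform_one_hot(test_region, categories))
print(log_transform([0.0, 9.0, 99.0, 999.0]))
```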
These are just simple tips (you are not limited to them!).
When you obtain a good model, don’t forget to submit your prediction results!
Now start writing your assignment report.
Thank You!
bchen@Lincoln.ac.uk