Model Evaluation Metrics
Deema Hafeth and JingMin Huang
School of Computer Science
University of Lincoln
CMP3036M/CMP9063M
Data Science 2016 – 2017 Workshop
Today's Objectives
• Do Exercises 1–3.
• There are several hints about using the relevant R packages and built-in
functions. You can Google them and read the materials in the reference list (but
remember, the best way is to raise your hand and ask the demonstrators).
Exercises
Exercise 1/3
a) Download and import the "Worcester Heart Attack Trial.csv" dataset into R
b) Remove the variables 'id', 'admitdate' and 'foldate'
c) Use leave-one-out cross-validation to estimate the root mean squared error
(RMSE) of a multiple linear model where 'lenfol' is the response variable and
the rest of the variables are predictors [Hint: use the 'caret' library]
d) Randomly split the data into a training set and a test set. The split ratio of
training set to test set is 4:1. Set the seed to 1.
e) Develop a logistic regression model based on the training set where 'fstat' is the
response variable and the predictors are the rest of the variables. Make your
predictions for the inputs in the test set. (A sketch in R follows this exercise.)
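A minimal sketch of one possible solution in R, assuming the CSV file is in your
working directory and that the column names ('id', 'admitdate', 'foldate',
'lenfol', 'fstat') match the dataset; the object names (whas, lm_fit, pred_prob,
etc.) are only illustrative.

library(caret)

# a) import the data (assumes the CSV is in the working directory)
whas <- read.csv("Worcester Heart Attack Trial.csv")

# b) remove the 'id', 'admitdate' and 'foldate' variables
whas <- whas[, !(names(whas) %in% c("id", "admitdate", "foldate"))]

# c) leave-one-out cross-validation of a multiple linear model for 'lenfol'
ctrl   <- trainControl(method = "LOOCV")
lm_fit <- train(lenfol ~ ., data = whas, method = "lm", trControl = ctrl)
lm_fit$results$RMSE                       # estimated RMSE

# d) 4:1 training/test split with seed 1
set.seed(1)
train_idx <- sample(seq_len(nrow(whas)), size = round(0.8 * nrow(whas)))
train_set <- whas[train_idx, ]
test_set  <- whas[-train_idx, ]

# e) logistic regression for 'fstat'; predicted probabilities on the test set
glm_fit   <- glm(fstat ~ ., data = train_set, family = binomial)
pred_prob <- predict(glm_fit, newdata = test_set, type = "response")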
Exercise 2/3
a) Write a function which:
– Input:
1) Model prediction results (i.e. probabilities)
2) Real results (i.e., labels in the test set)
3) Threshold
– Output:
• True Positive, False Positive, True Negative, False Negative,
Precision, Recall and False Positive Rate
b) Use this function to evaluate the results from Exercise 1. What will the return
values be when you increase or decrease the threshold? (A sketch follows this exercise.)
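A minimal sketch of such a function, assuming the positive class is coded as 1
in the real labels; it reuses pred_prob and test_set from the Exercise 1 sketch,
and the threshold values shown are only examples.

# a) confusion-matrix based metrics at a given threshold
confusion_metrics <- function(pred_prob, real_labels, threshold) {
  pred_class <- as.integer(pred_prob >= threshold)
  tp <- sum(pred_class == 1 & real_labels == 1)
  fp <- sum(pred_class == 1 & real_labels == 0)
  tn <- sum(pred_class == 0 & real_labels == 0)
  fn <- sum(pred_class == 0 & real_labels == 1)
  list(TP = tp, FP = fp, TN = tn, FN = fn,
       Precision = tp / (tp + fp),
       Recall    = tp / (tp + fn),
       FPR       = fp / (fp + tn))
}

# b) evaluate the Exercise 1 predictions at a few thresholds
confusion_metrics(pred_prob, test_set$fstat, threshold = 0.5)
confusion_metrics(pred_prob, test_set$fstat, threshold = 0.3)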
Exercise 3/3
a) Write your own function to calculate the AUC based on the results you get
from Exercises 1 and 2.
– Input: 1) Model prediction results; 2) Real results
– Output: AUC value
Hint: use the prediction probabilities as the 'points' of the ROC curve.
b) Plot the ROC curve (see the figure on the next slide).
c) Now apply the 'auc' function from the 'pROC' package to calculate the AUC. Is it
the same as the result from your own function? (A sketch follows this exercise.)
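A minimal sketch, again reusing pred_prob and test_set from the Exercise 1
sketch. Each predicted probability is used as a threshold to obtain one ROC
point, and the AUC is approximated as the trapezoidal area under those points;
the helper roc_points is illustrative, not part of the exercise statement.

# ROC points: one (FPR, TPR) pair per threshold
roc_points <- function(pred_prob, real_labels) {
  thresholds <- sort(unique(pred_prob), decreasing = TRUE)
  tpr <- sapply(thresholds,
                function(t) sum(pred_prob >= t & real_labels == 1) / sum(real_labels == 1))
  fpr <- sapply(thresholds,
                function(t) sum(pred_prob >= t & real_labels == 0) / sum(real_labels == 0))
  # anchor the curve at (0, 0) and (1, 1)
  list(fpr = c(0, fpr, 1), tpr = c(0, tpr, 1))
}

# a) own AUC: trapezoidal rule over the ROC points
my_auc <- function(pred_prob, real_labels) {
  pts <- roc_points(pred_prob, real_labels)
  sum(diff(pts$fpr) * (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2)
}
my_auc(pred_prob, test_set$fstat)

# b) plot the ROC curve with the chance line
pts <- roc_points(pred_prob, test_set$fstat)
plot(pts$fpr, pts$tpr, type = "l",
     xlab = "False Positive Rate", ylab = "True Positive Rate")
abline(0, 1, lty = 2)

# c) compare with the 'pROC' package
library(pROC)
auc(roc(test_set$fstat, pred_prob))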
Exercise 3/3 b) [Figure: example ROC curve]
References
• Last week's lecture slides.
• David Page. Evaluating Machine Learning Methods.
http://pages.cs.wisc.edu/~dpage/cs760/evaluating.pdf
• The Caret Package: http://topepo.github.io/caret/index.html
• G. James, D. Witten, T. Hastie, and R. Tibshirani (2014). An Introduction to
Statistical Learning. Springer. (Chapter 5)
Thank You
dabdalhafeth@lincoln.ac.uk
jhua8590@gmail.com