Introduction to information system

Model Evaluation Metrics

Deema Hafeth and JingMin Huang

School of Computer Science

University of Lincoln

CMP3036M/CMP9063M

Data Science 2016 – 2017 Workshop

Today's Objectives

• Complete Exercises 1-3.

• The exercises include several hints about the relevant R packages and
built-in functions. You can also search online and read the materials in the
reference list (but remember: the best way is to raise your hand and ask the
demonstrators).

Exercises

Exercise 1/3

a) Download and import the “Worcester Heart Attack Trial.csv” dataset into R

b) Remove the variables 'id', 'admitdate' and 'foldate'

c) Use leave-one-out cross-validation to estimate the root mean squared error
(RMSE) of a multiple linear regression model where 'lenfol' is the response
variable and the remaining variables are predictors [Hint: use the 'caret'
library]

d) Randomly split the data into a training set and a test set. The ratio of
training set to test set is 4:1. Set the seed to 1.

e) Fit a logistic regression model on the training set where 'fstat' is the
response variable and the remaining variables are predictors. Make your
predictions for the inputs in the test set.
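The logic behind parts c) and d) can be sketched outside R. The workshop expects R and the 'caret' library; the block below is only a language-neutral illustration written in plain Python, and the names fit_ols, predict, loocv_rmse and split_indices are hypothetical helpers, not part of any package. It assumes numeric predictors and computes the leave-one-out RMSE by pooling the squared errors of all held-out predictions.

```python
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, x):
    """Prediction for one row x given coefficients beta (intercept first)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def loocv_rmse(X, y):
    """Refit n times, each time predicting the single held-out row."""
    sq_errors = []
    for i in range(len(y)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        sq_errors.append((y[i] - predict(beta, X[i])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def split_indices(n_rows, test_fraction=0.2, seed=1):
    """Seeded shuffle of row indices; a 4:1 split puts 20% in the test set."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = round(n_rows * test_fraction)
    return idx[n_test:], idx[:n_test]  # (training rows, test rows)
```

A fixed seed makes the split reproducible, but the same seed value will not give the same rows in Python and in R, because the two languages use different random number streams.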

Exercise 2/3

a) Write a function which:

– Input:

1) Model prediction results (i.e. probabilities)

2) Real results (i.e., labels in the test set)

3) Threshold

– Output:

• True Positive, False Positive, True Negative and False Negative counts,
Precision, Recall and False Positive Rate

b) Use this function to evaluate the predictions from Exercise 1. How do the
returned values change when you increase or decrease the threshold?
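The function described in part a) can be sketched as follows. This is a plain Python illustration of the logic rather than the R function the exercise asks for; the name confusion_metrics is hypothetical, labels are assumed to be coded 0/1, and a case is assumed to be predicted positive when its probability is greater than or equal to the threshold.

```python
def confusion_metrics(probs, labels, threshold=0.5):
    """Confusion-matrix counts and derived rates at one threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    nan = float("nan")  # returned when a denominator is zero
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,
        "FPR": fp / (fp + tn) if fp + tn else nan,
    }
```

For example, with probabilities 0.9, 0.8, 0.6, 0.4, 0.3, 0.1 and labels 1, 1, 0, 1, 0, 0, a threshold of 0.5 gives TP = 2, FP = 1, TN = 2, FN = 1. Raising the threshold to 0.7 removes the false positive (FPR falls to 0), while lowering it to 0.2 raises recall to 1 and FPR to 2/3. In general, raising the threshold can only lower or keep recall and FPR; precision often rises but is not guaranteed to.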

Exercise 3/3

a) Write your own function to calculate AUC based on the results you get

from Exercise 1 and 2.

– Input: 1) Model prediction results; 2) Real results

– Output: AUC value

Hint: use each predicted probability as a threshold; each threshold gives one
point on the ROC curve.

b) Plot the ROC curve (see the figure on the next slide)

c) Now apply the 'auc' function from the 'pROC' package to calculate the AUC.
Does it match the result of your own AUC function?
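One way to approach part a) is to treat every distinct predicted probability as a threshold, collect the resulting (FPR, TPR) points, and integrate with the trapezoid rule. The sketch below is in plain Python for illustration only (the exercise itself is in R); roc_points, auc_trapezoid and auc_pairwise are hypothetical names, labels are assumed to be coded 0/1, and both classes are assumed to be present. The pairwise function is an independent cross-check, playing the same role as the comparison with a library result in part c).

```python
def roc_points(probs, labels):
    """(FPR, TPR) at each distinct probability used as a threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points  # the last point is (1.0, 1.0)


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(probs, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

With probabilities 0.9, 0.8, 0.6, 0.4, 0.3, 0.1 and labels 1, 1, 0, 1, 0, 0, both functions give 8/9. The list returned by roc_points is also what part b) needs: plotting TPR against FPR for these points draws the ROC curve.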

Exercise 3/3 b): ROC curve (figure)

References

• Last week's lecture slides.

• David Page. Evaluating Machine Learning Methods.

http://pages.cs.wisc.edu/~dpage/cs760/evaluating.pdf

• The Caret Package: http://topepo.github.io/caret/index.html

• G. James, D. Witten, T. Hastie, and R. Tibshirani (2014). An Introduction
to Statistical Learning. Springer. (Chapter 5)

Thank You

dabdalhafeth@lincoln.ac.uk

jhua8590@gmail.com
