University of Cardiff
MAT012 Credit Risk Scoring (2019/20)
Lab Session 3
The first question for Lab Session 3 is challenging, so I’ve put together these notes on the solution, which will hopefully be of use to you. Use them alongside the spreadsheet “Lab3_solution_q1q3.xls”.
The question asks you to draw the scorecard, calculate the Kolmogorov-Smirnov statistic, and so on.
The first tab we’ll look at is “Task 1 Data”.
On this tab, you’ll see that column A contains numbers from 1 – 25. These are just for display purposes and have no effect on the calculation.
In column B you’ll see the scores that are in the question sheet and alongside them, in column C, whether that applicant was ultimately Good (didn’t default) or Bad (did default). You can think of the scores as the output of a credit risk scorecard model, much like the regressions we’ve looked at in previous weeks.
Column D contains a ‘recode’ of the Good/Bad into 1/0, just to make further computation straightforward.
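If you’d rather see the recode outside the spreadsheet, here is a minimal Python sketch. It only uses the outcomes of the first eight applicants (the ones discussed in the examples further down these notes), and it assumes Good maps to 1 and Bad to 0; the names `outcomes` and `good_flag` are mine, not the spreadsheet’s.

```python
# Outcomes of the first eight applicants only (scores 100 to 170);
# the full sheet has 25 applicants.
outcomes = ["Bad", "Bad", "Good", "Bad", "Bad", "Good", "Bad", "Good"]

# Assumed recode: Good -> 1, Bad -> 0.
good_flag = [1 if outcome == "Good" else 0 for outcome in outcomes]

print(good_flag)
```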
Column E contains a running total of ‘False Negative’ observations. What does that mean? It means that, if the ‘cutoff’ was set at the value given in column B for a specified row, that row in Column E contains the number of applicants predicted to be Bad that turned out to be Good.
Let’s elaborate on this by means of an example – look at row 11. In column B, you have the score 170. So, if you had this as a cutoff point, you would reject any applicant with a score of 170 or lower for credit – you’d consider them Bad. But, if you look at the observations (as the original table shows you), the applicants with scores of 120 (applicant 3), 150 (applicant 6) and 170 (applicant 8) all turn out to be Good. So, that means you’ve rejected 3 applicants who turned out to be Good – you’ve had 3 ‘False Negatives’. So, in Column E of that row, you’ll see the number 3.
Column F contains a running total of ‘True Negative’ observations. What does that mean? It means that, if the ‘cutoff’ was set at the value given in column B for a specified row, that row in Column F contains the number of predicted Bad observations that turned out to be Bad.
Again, let’s look at row 11 as an example. Column B has the score 170, so that’s our cutoff point. It means any applicant with a score at or below 170 gets rejected – you’d consider them Bad. If you look at Column C, applicants 1 (with a score of 100), 2 (score 110), 4 (score 130), 5 (score 140) and 7 (score 160) all turned out to be Bad – so you predicted them correctly. That means you predicted 5 applicants to be ‘Bad’ correctly – you have 5 ‘True Negatives’. So, in Column F of that row, you’ll see the number 5.
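The running totals in the two examples above can be sketched in Python as cumulative sums. This is only an illustration, not how the spreadsheet is built: it uses the first eight applicants only, assumes the scores are sorted in ascending order with no ties, and assumes the recode Good = 1, Bad = 0. The variable names are my own.

```python
from itertools import accumulate

# First eight applicants only; scores ascending, 1 = Good, 0 = Bad (assumed recode).
scores = [100, 110, 120, 130, 140, 150, 160, 170]
good_flag = [0, 0, 1, 0, 0, 1, 0, 1]

# Goods at or below each score: rejected at that cutoff but actually Good.
false_negatives = list(accumulate(good_flag))

# Bads at or below each score: rejected at that cutoff and actually Bad.
true_negatives = list(accumulate(1 - flag for flag in good_flag))

for score, fn, tn in zip(scores, false_negatives, true_negatives):
    print(score, fn, tn)
```

The last line printed is for the cutoff of 170 and shows 3 False Negatives and 5 True Negatives, matching the worked examples.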
Column G and Column H contain similar logic to calculate the False Positives (applicants predicted to be Good that turned out to be Bad) and True Positives (applicants predicted to be Good that turned out to be Good).
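One way to think about these two columns is as “whatever is left over”: every Good is either a False Negative or a True Positive, and every Bad is either a True Negative or a False Positive. The sketch below uses that idea. It assumes all 25 rows are applicants, of whom 15 are Good (so 10 are Bad); the function name is hypothetical and the spreadsheet formulas may well be written differently.

```python
TOTAL_APPLICANTS = 25                      # assumed: one applicant per numbered row
TOTAL_GOOD = 15                            # Good observations overall
TOTAL_BAD = TOTAL_APPLICANTS - TOTAL_GOOD  # assumed: everyone else is Bad


def positives_from_negatives(false_negatives, true_negatives):
    """Return (false_positives, true_positives) for a single cutoff."""
    true_positives = TOTAL_GOOD - false_negatives   # Goods that were accepted
    false_positives = TOTAL_BAD - true_negatives    # Bads that were accepted
    return false_positives, true_positives


# Cutoff of 170: 3 False Negatives and 5 True Negatives.
fp, tp = positives_from_negatives(3, 5)
print(fp, tp)
```

Under those assumptions, a cutoff of 170 gives 5 False Positives and 12 True Positives.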
To calculate the Kolmogorov-Smirnov statistic, you’ll need the distribution functions F(s|G) and F(s|B) – see slide 19 in the slide deck from Week 3 (MAT012_Lecture_3).
What is F(s|G)? You can think of it as follows: it’s the proportion of Good observations on or below score s. Again, look at the row for applicant 8 (the same row used in the examples above); the score is 170. On or below that score, there are 3 Good observations – applicant 3, with a score of 120, applicant 6, with a score of 150, and applicant 8, with a score of 170. There are 15 Good observations overall. So, 3/15 = 0.2, which is the number you see in Column I of that row.
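The same calculation can be sketched in Python. This is a rough illustration: it uses the first eight applicants only, so it is only meaningful for scores up to 170, and the total of 15 Goods is passed in by hand. The function name `cdf_given_good` is my own.

```python
def cdf_given_good(scores, good_flag, s, total_good):
    """Proportion of all Good observations with a score on or below s."""
    goods_at_or_below = sum(
        1 for score, flag in zip(scores, good_flag) if flag == 1 and score <= s
    )
    return goods_at_or_below / total_good


# First eight applicants only; 1 = Good, 0 = Bad (assumed recode).
scores = [100, 110, 120, 130, 140, 150, 160, 170]
good_flag = [0, 0, 1, 0, 0, 1, 0, 1]

f_good_170 = cdf_given_good(scores, good_flag, s=170, total_good=15)
print(f_good_170)
```

For s = 170 this counts 3 Goods out of 15, in line with the 0.2 above.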
I’ll leave F(s|B) as an exercise for you….
If you refer back to slide 19, you’ll see that the Kolmogorov-Smirnov statistic is defined as the maximum of the absolute difference between F(s|G) and F(s|B). On the spreadsheet, the absolute difference between F(s|G) and F(s|B) is given in Column N. The maximum value of this – the Kolmogorov-Smirnov statistic – is given in cell N29.
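To see the definition in action without giving away the lab answer, here is a small Python sketch on a made-up data set of six applicants. The scores and outcomes below are invented purely for illustration and are not the lab data; `ks_statistic` is a hypothetical helper name.

```python
def ks_statistic(scores, good_flag):
    """Maximum over observed scores s of |F(s|G) - F(s|B)|."""
    total_good = sum(good_flag)
    total_bad = len(good_flag) - total_good
    largest_gap = 0.0
    for s in sorted(set(scores)):
        # Empirical distribution functions evaluated at score s.
        f_good = sum(
            1 for score, flag in zip(scores, good_flag) if flag == 1 and score <= s
        ) / total_good
        f_bad = sum(
            1 for score, flag in zip(scores, good_flag) if flag == 0 and score <= s
        ) / total_bad
        largest_gap = max(largest_gap, abs(f_good - f_bad))
    return largest_gap


# Made-up example (NOT the lab data): 1 = Good, 0 = Bad.
toy_scores = [10, 20, 30, 40, 50, 60]
toy_flags = [0, 0, 1, 0, 1, 1]

toy_ks = ks_statistic(toy_scores, toy_flags)
print(toy_ks)
```

In this toy example the largest gap is 2/3, reached at scores 20 and 40. Running the same function on the full 25 rows of the lab data should, if the recode and sort order match the assumptions here, reproduce the value in cell N29.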
The plot of the Kolmogorov-Smirnov curve is given on the tab “Task 1 KS_Curve”. It’s basically a plot of columns I (which is F(s|G)) and J (which is F(s|B)) from the “Task 1 Data” tab.
I hope the other columns on the “Task 1 Data” tab (columns K, L, M and O) and the other tabs for Question 1 – “Q1 ROC_curve” and “Q1 CAP_curve” – will now make sense. Please mail me if they don’t and I’ll write another set of notes to help!