留学生代考 MAT012 Credit Risk Scoring

University of Cardiff

MAT012 Credit Risk Scoring

Copyright By PowCoder代写 加微信 powcoder

Lab Session 1: How scorecards are Developed- using SAS
1. Understand data set.
Open SAS, Click on . Then go to File( Import Data and click on Local Computer. Find the Excel spreadsheet Germanregressionstart and then Click on Next. Highlight Sheet 1 and tick on “first row of range contains field names”. The click on Next twice more and then Finish. Click on the final icon made to open the data set.
This is a data set consisting of 1000 cases of applicants for credit together with details of 20 of their characteristics. 700 of the applicants turned out to be Good ( 1 in the Good column) and 300 were Bad ( 1 in the Bad column). There is a data dictionary at the end of this handout but the data you have has been simplified by removing A and the characteristic number from the attributes. Thus the data in existing checking account has values 1, 2, 3 and 4 not characteristic A11, A12, A13 and A14. Check that you understand the meaning of the data set which is provided in Sheet 1.

The course structure looks at how to apply specific scorecard methodologies before looking at the practical details of preparing the data including coarse classifying. So in this first lab class we will use data that has already been put in the appropriate form with all the variables being binary (0 or 1) values. In the second workshop we will look at how to coarse classify the variables and so obtain such binary variables

2. Using logistic regression to build a scorecard

Again go to Go to File( Import Data and then click on Local Computer and again find and open the Excel spreadsheet Germanregressionstart but this time choose Sheet 2. Check that you again have “first row of range contains field names” ticked.. Then go through Next, next and Finish and by clicking on the icon open the data.. Note this set has only six of the variables in it – checking, duration, purpose, married, telephone and foreign,- but they are all in binary form together with the Good variable. There are 3 variables for checking 1,2,3, yet there was a fourth value available 4.

Why have we left the corresponding binary variable out?

Duration also has 3 variables. What is the variable that has been left out in its case?

Similarly what are the five variables that arise from the purpose characteristic . What sort of loans will have a 0 in all five cases?

Do you understand what a 1 in the married, telephone and foreign variables means ( One has to check the original data to see)
Now we will apply logistic regression to get a scorecard where we are regressing the Good variable on the characteristics discussed above.

Go to Task( Regression(Logistic …

In Data on the screen that now opens assign the variables to their tasks by using the arrow that clicks across

Good ( Dependent Variable

All other variables( Quantitative variables
( You can highlight them all by using Ctrl at the start of the group and keeping Ctrl and Shift down when you click on the far end of the group)

Then Model ( Response( make sure it is binary response type, logit type of model and change “fit model to level” to be 1 ( So the score will be high for the goods who have a 1 in the Good value)

Model (Effects. Highlight all and click on Main

Plots(tick show plots for regression analysis and click on custom list of plots. When the list of plots appear tick ROC curve
Predictions(Original sample (tick) (Predictions (tick)

Then click Run. The output should look like the following.
Analysis of Maximum Likelihood Estimates

Pr > ChiSq

checking 3

Duration 12-

Duration 12-36

duration 36-48

purpose 34

purpose 59

The Somers D ( which is also called the GINI coefficient) is 0.54 which is very good.

However you cans see that Duration 36-48, purpose0, purpose 59 and telephone are not really adding anything to the scorecard. Why do you know that? We also question the use of duration 12-36 and purpose 2 but leave them in for now.

To remove the four variables, in the Process Flow diagram, click on the logistic regression and then right click and choose modify logistic regression. When the data window appears click on Model(Effect and then on Telephone and click on remove effect. Repeat that to remove Duration36-48, purpose0 and purpose59
Then make sure that in Plots you choose ROC Curve and tick ROC curve and in Predictions make sure Original sample and predictions is ticked

Then Run and allow the programme to replace the original results.

If you encounter difficulty you can always start again from the data imported icon and use task( regression( logistic and then only choose the relevant variables (i.e. not telephone etc) as quantitative variables
( You should get results like this

Analysis of Maximum Likelihood Estimates

Pr > ChiSq

checking 3

Duration 12-

Duration 12-36

purpose 34

This suggests that all the variables in the scorecard are now relevant. The Gini has in fact dropped a little to 0.535 but that is not a worry as the scorecard is probably more robust and not over fitting. Have a look at the other statistics and see if you understand some of them.

The Scorecard we would end up with could be represented as

Score = 82 -201Checking1 -143 Checking 2 -87Checking 3 +140Duration 12

+77Duration12-36 +124Purpose1 +41purpose2 +69purpose{3,4} -36Married +150Foreign.

Do you agree?

Does it make sense in each of the variables. In particular does it make sense for checking?,

3. Building a scorecard by linear regression and testing it on a holdout sample

Now we will build a scorecard using linear regression not logistic regression and also find out how to see how it performs on a different data set – a holdout sample – to the one it was built . This is because the scorecard will always perform better on the set it was built on than on any other one because it has picked up some of the special features of that set that do not exist elsewhere. We use the same data as before but we have now split it into two sets – 700 borrowers which make up the training set ( Sheet 3) and 300 borrowers who make up the holdout sample ( Sheet 4).

We will also make this scorecard build into a project ( We could do that for the logistic regression as well) and use process flow screen to see the development of the project flow
Go again to File ( (. Import Data and then click on Local Computer and again find Germanregressionstart. Click on Next then Click on Sheet 3 and go through the rest of the windows and click on Finish. Remember what name the data sets is given by clicking on the icon Import Data and check the log (It will be SASUSER.germanregression start and some number).

Go to Task( Regression(Linear

In Data assign the variables to their tasks by using the arrow that clicks across

Good ( Dependent Variable
All other variables are explanatory variables

Click on Model and choose stepwise selection (This means one start with no variables in the regression and at each step introduces the variable that makes the biggest impact or drops out the one with the least impact if they are above/below some prescribed values/ Keep these values for entry and exit at the 0.15 suggested.

Statistics (correlation of estimates (tick) confidence limit of estimates (tick; leave as 95%)

Prediction (Original sample (tick); Predictions (tick)

You should end up after nine iterations with the following regression where R2 = 0.17
Type II SS

checking 3

Duration 12-

duration 36-48

Notice that you do not have to worry about dropping characteristics out because they are not relevant as the stepwise procedure (or the forwards or backwards one) would do that automatically. A corresponding scorecard would be

Score = 87 -33 *Check1-27*Check2-10*Check3+12 * Duration(12-) -13 Duration(36-48) -10 *Purpose0+ 15*Purpose1 -7*Married+4*telephone +21* Foreigner

Compare this with the scorecard obtained in Task 2.They are not the same but for individual characteristics do you get the same sort of pattern?

Now let us see what the results on the holdout sample are.

Click on the View Tab to get Process Flow diagram

File(Import Data(Local Computer. Find Germanregressionstart. Open it and choose Sheet 4 clicking through Next to the Finish. Remember what name this data sets is given; Click on Import Data and check the log ( It will be SASUSER.germanregressionstart and some number).

Click on the Linear icon in the Process flow diagram. This will take you to the Task Roles tables you had before and so will recalculate the same scorecard. Go to Prediction and make sure original data is not ticked but additional data is ticked. Browse through the data sets and choose the one with the germanregressionstart number corresponding to Sheet 4. Click on display all plots and on show predictions

If it asks do you want to replace the original results say No.

When the results come up you should have the same parameter estimates as before. You can check by going back and for between the two outputs.

Click on Process Flown and then click on the new Linear regression icon which looks like a table ( the one that is connected to the both data sets). This is the results for the extra 300 cases you did not build the model on. Go to output and check there are 300 cases. If you look at the last three columns you will see Good, Flag ( ignore) and Predicted Good. The latter is the score ( i.e. the prediction of that case being Good). It should start with

0.9144741217

0.5617142138

0.6960097216

0.5962368774

0.5236957807

0.6854915045

0.3625292558

0.6444213441

Next week we will see how to use Excel to draw the ROC curves and calculate the Gini coefficient for this data.

To save your work go to File and choose save project as. Save it somewhere in the Local computer you can access later.

Description of the German credit dataset.

Number of Attributes german: 20 (7 numerical, 13 categorical)

Number of Attributes german.numer: 24 (24 numerical)

Attribute description

Attribute 1: (qualitative)

Status of existing checking account

A11 : … < 0 DM A12 : 0 <= ... < 200 DM A13 : ... >= 200 DM /

salary assignments for at least 1 year

A14 : no checking account

Attribute 2: (numerical)

Duration in month

Attribute 3: (qualitative)

Credit history

A30 : no credits taken/

all credits paid back duly

A31 : all credits at this bank paid back duly

A32 : existing credits paid back duly till now

A33 : delay in paying off in the past

A34 : critical account/

other credits existing (not at this bank)

Attribute 4: (qualitative)

A40 : car (new)

A41 : car (used)

A42 : furniture/equipment

A43 : radio/television

A44 : domestic appliances

A45 : repairs

A46 : education

A47 : (vacation – does not exist?)

A48 : retraining

A49 : business

A410 : others

Attribute 5: (numerical)

Credit amount

Attibute 6: (qualitative)

Savings account/bonds

A61 : … < 100 DM A62 : 100 <= ... < 500 DM A63 : 500 <= ... < 1000 DM A64 : .. >= 1000 DM

A65 : unknown/ no savings account

Attribute 7: (qualitative)

Present employment since

A71 : unemployed

A72 : … < 1 year A73 : 1 <= ... < 4 years A74 : 4 <= ... < 7 years A75 : .. >= 7 years

Attribute 8: (numerical)

Installment rate in percentage of disposable income

Attribute 9: (qualitative)

Personal status and sex

A91 : male : divorced/separated

A92 : female : divorced/separated/married

A93 : male : single

A94 : male : married/widowed

A95 : female : single

Attribute 10: (qualitative)

Other debtors / guarantors

A101 : none

A102 : co-applicant

A103 : guarantor

Attribute 11: (numerical)

Present residence since

Attribute 12: (qualitative)

A121 : real estate

A122 : if not A121 : building society savings agreement/

life insurance

A123 : if not A121/A122 : car or other, not in attribute 6

A124 : unknown / no property

Attribute 13: (numerical)

Age in years

Attribute 14: (qualitative)

Other installment plans

A141 : bank

A142 : stores

A143 : none

Attribute 15: (qualitative)

A151 : rent

A152 : own

A153 : for free

Attribute 16: (numerical)

Number of existing credits at this bank

Attribute 17: (qualitative)

A171 : unemployed/ unskilled – non-resident

A172 : unskilled – resident

A173 : skilled employee / official

A174 : management/ self-employed/

highly qualified employee/ officer

Attribute 18: (numerical)

Number of people being liable to provide maintenance for

Attribute 19: (qualitative)

A191 : none

A192 : yes, registered under the customers name

Attribute 20: (qualitative)

foreign worker

A201 : yes

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com