University of Cardiff
MAT012 Credit Risk Scoring (2015/16)
Lab Session 2
1. Using SAS to plot graphs
Use variables and understand the dataset in SAS.
Create a library ‘crs’ and import the file ‘german.csv’ into it using Base SAS. This dataset consists of 1000 applicants for credit, together with 20 of their characteristics. 700 of the applicants turned out to be Good (‘1’ in the Good column) and 300 turned out to be Bad (‘1’ in the Bad column). A data dictionary is provided in this handout; check that you understand the meaning of each variable before you start.
a) Create a histogram for the variable Age
b) Create a histogram for the variable Purpose
c) Create a box plot for the variable Age by Good/Bad
2. Coarse classifying classes
The following table gives the numbers of Goods and Bads for each attribute of the characteristic “time with bank” in a credit scoring sample.
Time with bank (attributes): Under 6 months; 6-12 months; 12-18 months; 18-24 months; 24-36 months; 5-10 years
No. of Goods
No. of Bads
Use this information to find the best way of coarse classifying this characteristic (that is, which adjacent attributes to merge into classes) using
a. The Chi-square statistic
b. The information statistic
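The two statistics can also be checked numerically. The following is a minimal Python sketch, not part of the SAS exercise, assuming the usual definitions: for classes with g_i Goods and b_i Bads (totals G and B), the chi-square statistic sums (observed - expected)^2 / expected over the Goods and Bads of every class, with expected counts taken from the overall Good/Bad rate, and the information statistic is the sum of (g_i/G - b_i/B) * ln((g_i/G) / (b_i/B)). The counts in `counts` are hypothetical placeholders, not the handout's figures, and the function names are chosen for illustration only.

```python
from itertools import combinations
from math import log

# HYPOTHETICAL (goods, bads) counts per attribute, in order of time with bank.
# These are placeholders: substitute the figures from the handout's table.
counts = [(40, 30), (60, 35), (80, 30), (90, 25), (120, 25), (150, 20), (160, 15)]


def merge(counts, cuts):
    """Merge adjacent attributes into classes.

    `cuts` holds the indices at which a new class starts, e.g. (2, 5)
    gives the classes [0:2], [2:5] and [5:].
    """
    bounds = [0, *cuts, len(counts)]
    return [
        (sum(g for g, _ in counts[lo:hi]), sum(b for _, b in counts[lo:hi]))
        for lo, hi in zip(bounds, bounds[1:])
    ]


def chi_square(classes):
    """Chi-square statistic against the overall Good/Bad rate."""
    G = sum(g for g, _ in classes)
    B = sum(b for _, b in classes)
    total = 0.0
    for g, b in classes:
        exp_g = (g + b) * G / (G + B)  # expected Goods if the class were average
        exp_b = (g + b) * B / (G + B)  # expected Bads if the class were average
        total += (g - exp_g) ** 2 / exp_g + (b - exp_b) ** 2 / exp_b
    return total


def information_statistic(classes):
    """Information statistic; every class needs at least one Good and one Bad."""
    G = sum(g for g, _ in classes)
    B = sum(b for _, b in classes)
    return sum((g / G - b / B) * log((g / G) / (b / B)) for g, b in classes)


def best_split(counts, n_classes, statistic):
    """Try every split into `n_classes` contiguous classes; return the best cut points."""
    candidates = combinations(range(1, len(counts)), n_classes - 1)
    return max(candidates, key=lambda cuts: statistic(merge(counts, cuts)))


if __name__ == "__main__":
    for stat in (chi_square, information_statistic):
        cuts = best_split(counts, 3, stat)
        print(stat.__name__, cuts, round(stat(merge(counts, cuts)), 4))
```

The search only compares splits with the same number of classes, because the chi-square statistic has different degrees of freedom when the number of classes changes; rerun `best_split` with other values of `n_classes` to explore alternatives.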
Data dictionary (german.csv)
Age in years
Amount of loan
Status of existing checking account: 1: < 0 DM; 2: 0 to < 200 DM; 3: >= 200 DM/salary assignments for at least 1 year; 4: no checking account
Other debtors/guarantors: 1: none; 2: co-applicant; 3: guarantor
Number of dependents
Duration in months
Present employment since: 1: unemployed; 2: < 1 year; 3: 1 to < 4 years; 4: 4 to < 7 years; 5: >= 7 years
Number of existing credits at this bank
Foreign worker: 1: yes; 2: no
Good/bad payer
Credit history: 0: no credits taken/all credits paid back duly; 1: all credits at this bank paid back duly; 2: existing credits paid back duly till now; 3: delay in paying off in the past; 4: critical account/other credits existing (not at this bank)
Housing: 1: rent; 2: own; 3: for free
Instalment rate in percentage of disposable income
Job: 1: unemployed/unskilled – non-resident; 2: unskilled – resident; 3: skilled employee/official; 4: management/self-employed/highly qualified employee/officer
Marital status: 1: male: divorced/separated; 2: female: divorced/separated/married; 3: male: single; 4: male: married/widowed; 5: female: single
Other instalment plans: 1: bank; 2: stores; 3: none
Property: 1: real estate; 2: if not 1: building society savings agreement/life insurance; 3: if not 1/2: car or other, not in attribute 6; 4: unknown/no property
Purpose of loan: 0: car (new); 1: car (used); 2: furniture/equipment; 3: radio/television; 4: domestic appliances; 5: repairs; 6: education; 7: vacation; 8: retraining; 9: business; X: others
Date beginning permanent residence
Savings account/bonds: 1: < 100 DM; 2: 100 to < 500 DM; 3: 500 to < 1000 DM; 4: >= 1000 DM; 5: unknown/no savings account
Telephone: 1: none; 2: yes, registered under the customer’s name
3. Using logistic regression to build a scorecard
a. Prior to building the scorecard, we need to first recode the following variables:
Use the variable ‘checking’ and the five newly coded variables above to build a scorecard using logistic regression. You can use either ‘Bad’ or ‘Good’ as the target variable [Hint: use proc logistic and the option param=glm].
Consider the following:
i) Why do we need to recode all variables?
ii) Examine the results and explain whether they make sense for each variable.
iii) How would you interpret these estimates? Why has SAS not estimated a maximum likelihood coefficient for the last attribute of each variable?