SURVIVAL ANALYSIS METHODS FOR PERSONAL LOAN DATA
MARIA STEPANOVA and LYN THOMAS
Department of Management, University of Southampton, Southampton, United Kingdom, SO17 1BJ
(Received November 1999; revision received August 2000; accepted October 2000)
Credit scoring is one of the most successful applications of quantitative analysis in business. This paper shows how using survival-analysis tools from reliability and maintenance modeling allows one to build credit-scoring models that assess aspects of profit as well as default. This survival-analysis approach is also finding favor in credit-risk modeling of bond prices. The paper looks at three extensions of Cox’s proportional hazards model applied to personal loan data. A new way of coarse-classifying characteristics using survival-analysis methods is proposed. Also, a number of diagnostic methods for checking the adequacy of the model fit are tested for suitability with loan data. Finally, including time-by-characteristic interactions is proposed as a possible way to improve the model’s predictive power.
1. INTRODUCTION
Credit-scoring systems aid the decision of whether to grant credit to an applicant. Traditionally, this is done by estimating the probability that an applicant will default. In recent years the aim has been shifting towards choosing the customers of highest profit. That change means it is now important to know not only if but when a customer will default (Thomas et al. 1999). If the time to default is long, the acquired interest may compensate for or even exceed the losses resulting from default. Profitability is also affected when customers close their account early, paying off the loan early by switching to another lender or for other reasons. Depending on when the actual repayment occurred, the lender will lose a proportion of the interest on the loan.
It has been shown previously by Thomas et al. (1999) and Narain (1992) that survival analysis can be applied to estimate the time to default or to early repayment. Survival analysis is the area of statistics that deals with analysis of lifetime data. Examples of lifetime data can be found in medical or reliability studies, for example, when a deteriorating system is monitored and the time until an event of interest is recorded.
The major strength of survival analysis is that it allows censored data to be incorporated into the model. In the consumer credit context, a censored observation corresponds to a customer who never defaults, or never pays off early, so the event of interest is not observed. Clearly there is a great amount of such data because, luckily, most of the customers are “good.”
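To make the role of censoring concrete, the following is a minimal sketch, not taken from the paper, of the Kaplan-Meier product-limit estimator (a standard survival-analysis tool that this section does not name). The function name `kaplan_meier` and the toy loan data are hypothetical. A censored loan stays in the risk set until it leaves observation, but never counts as an event.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: months each loan was under observation.
    observed:  True if the event of interest (e.g. default) was seen at that
               time, False if the loan was censored (no event observed).
    """
    events = Counter(t for t, d in zip(durations, observed) if d)
    exits = Counter(durations)  # loans leaving the risk set at each time
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        if events[t]:
            # Only observed events reduce the survival estimate.
            surv *= 1.0 - events[t] / at_risk
            curve.append((t, surv))
        # Censored and failed loans both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five loans, two observed defaults, three censored.
durations = [6, 12, 12, 24, 24]
observed = [True, False, True, False, False]
curve = kaplan_meier(durations, observed)
```

In this toy data the estimate drops only at months 6 and 12, where defaults are observed; the three censored loans still contribute by enlarging the risk sets at those times.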
This approach to using survival analysis to estimate time to default has also been used to model credit risk in the pricing of bonds and other financial investments. There has been considerable work recently in developing default models to deal with credit risk; see the reviews by Cooper and Martin (1996), Lando (1997), Jarrow and Turnbull (2000). In his Ph.D. thesis, Lando (1994) introduced a proportional hazards survival-analysis model to estimate the time until a bond defaults, the aim being to use economic variables as covariates.
In credit scoring we look for differences in application characteristics for customers with different survival times. Also, it is possible that there are two or more types of failure outcome. In consumer credit we are interested in several possible outcomes when concerned with profitability: early repayment, default, closure, etc.
The idea of employing survival analysis for building credit-scoring models was first introduced by Narain (1992) and then developed further by Thomas et al. (1999). Narain (1992) applied the accelerated life exponential model to 24 months of loan data. The author showed that the proposed model estimated the number of failures at each failure time well. Then a scorecard was built using multiple regression, and it was shown that a better credit-granting decision could be made if the score was supported by the estimated survival times. Thus it was found that survival analysis adds a dimension to the standard approach. The author noted that these methods can be applied to any area of credit operations in which there are predictor variables and the time to some event is of interest.
Thomas et al. (1999) compared the performance of the exponential, Weibull, and Cox’s nonparametric models with that of logistic regression, and found that survival-analysis methods are competitive with, and sometimes superior to, the traditional logistic-regression approach. Furthermore, the idea of competing risks was employed when two possible outcomes were considered: default and early payoff.
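One common way to handle two competing outcomes is to fit a separate model for each cause and treat loans that ended for the other reason as censored. The sketch below illustrates that data coding only; it is an assumption about the setup, not a description of the cited authors' procedure, and the outcome labels and the helper `cause_specific` are hypothetical.

```python
# Hypothetical outcome labels; the actual data coding is not given in the text.
DEFAULT, EARLY_PAYOFF, STILL_OPEN = "default", "early_payoff", "open"


def cause_specific(loans, cause):
    """Split (months, outcome) pairs into durations and event flags for one cause.

    Loans that ended for any other reason, or that are still open,
    are flagged False, i.e. treated as censored for this cause.
    """
    durations = [months for months, _ in loans]
    observed = [outcome == cause for _, outcome in loans]
    return durations, observed


# Hypothetical toy portfolio of five loans.
loans = [
    (6, DEFAULT),
    (12, EARLY_PAYOFF),
    (12, DEFAULT),
    (24, STILL_OPEN),
    (24, EARLY_PAYOFF),
]

default_view = cause_specific(loans, DEFAULT)       # early payoff counts as censored
payoff_view = cause_specific(loans, EARLY_PAYOFF)   # default counts as censored
```

Each view can then be passed to whichever survival model is being fitted for that outcome; the durations are identical and only the event flags differ.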
It was noted by Thomas et al. (1999) that there are several possible ways of improving the performance of the simplest survival-analysis models, such as Weibull’s, exponential, or Cox’s proportional hazards models.
In this paper we explore three extensions of Cox’s proportional hazards model.
Subject classifications: Risk: estimating credit risk for personal loans. Failure models: Survival analysis applied to credit scoring models. Area of review: Financial Services.
0030-364X/02/0000-0001 $05.00 Operations Research © 2002 INFORMS 1526-5463 electronic ISSN 1 Vol. 00, No. 0, Xxxxx–Xxxxx 2002, pp. 1–13
Section 2 outlines the theory of methods used in the analysis. Section 3 looks at development of the techniques by applying them to personal loan data. The first improvement suggested is to coarse-classify the characteristic variables using survival-analysis techniques rather than the traditional approach. This not only keeps the whole approach consistent, but it means that at no point is it necessary to make arbitrary judgments about what time horizon is critical. In the traditional approach, failure before this time is considered “bad”; failure after it is considered “good.” Section 3.1 looks at this new method of coarse-classifying, while §§3.2 and 3.3 apply it to predicting early repayment and default, respectively.
The second improvement is to use diagnostics to test the adequacy of the credit-risk model, and these are applied to the data in §4. The final improvement is to allow the effect of a covariate on the predicted time to failure to decrease or increase as the loan evolves. Section 5 looks at this improvement, which overcomes the restriction (implicit in the proportional hazards assumption) that the same type of customer is most at risk at all times during the loan duration. Concluding remarks are found in §6.
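The restriction can be seen from the standard form of Cox’s proportional hazards model. The sketch below uses a single covariate x and assumes, purely for illustration, an interaction that is linear in time; the symbols h_0, β, and γ are notation introduced here, and the interaction form used in §5 may differ.

```latex
% Proportional hazards: the ratio for two customers does not depend on t.
h(t; x) = h_0(t)\, e^{\beta x}
\quad\Longrightarrow\quad
\frac{h(t; x_1)}{h(t; x_2)} = e^{\beta (x_1 - x_2)}

% With an assumed linear time-by-characteristic interaction,
% the ratio changes as the loan evolves.
h(t; x) = h_0(t)\, e^{\beta x + \gamma x t}
\quad\Longrightarrow\quad
\frac{h(t; x_1)}{h(t; x_2)} = e^{(\beta + \gamma t)(x_1 - x_2)}
```

Under the first form, whichever customer is riskier at the start of the loan stays riskier throughout. Under the second, the ordering can reverse at t = -β/γ when β and γ have opposite signs.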
2. SOME THEORY OF ANALYSIS OF LIFETIME DATA
Let T be the random variable representing time until repayment of a loan ceases, i.e., time until default or early payoff. Then one way to describe the distribution of T is the hazard function, which is defined as follows:
\[ h(t) = \lim_{\delta t \to 0} \frac{P\{t \le T < t + \delta t \mid T \ge t\}}{\delta t} \]