Classification methods applied to credit scoring: A systematic review and overall comparison
Francisco Louzada (a), Guilherme B. Fernandes (c)
(a) Department of Applied Mathematics & Statistics, University of São Paulo, São Carlos, Brazil
(b) Department of Statistics, Federal University of São Carlos, São Carlos, Brazil
(c) P&D e Innovation in Analytics, Serasa-Experian, São Paulo, Brazil
The need for controlling and effectively managing credit risk has led financial institutions to excel in improving techniques designed for this purpose, resulting in the development of various quantitative models by financial institutions and consulting companies. Hence, the growing number of academic studies about credit scoring shows a variety of classification methods applied to discriminate good and bad borrowers. This paper, therefore, aims to present a systematic literature review relating theory and application of binary classification techniques for credit scoring financial analysis. The general results show the use and importance of the main techniques for credit rating, as well as some of the scientific paradigm changes throughout the years.
Keywords: Credit Scoring, Binary Classification Techniques, Literature Review.
1. Introduction
The need for credit analysis arose in the early days of commerce, in conjunction with the borrowing and lending of money and the authorisation of purchases to be paid for in the future. However, the modern concepts and ideas of credit scoring analysis emerged about 70 years ago with Durand (1941). Since then, traders have gathered information on credit applicants and catalogued it to decide whether or not to lend a certain amount of money (Banasik et al., 1999; Marron, 2007; Louzada et al., 2012b).
According to Thomas et al. (2002), credit scoring is "a set of decision models and their underlying techniques that aid credit lenders in the granting of credit". A broader definition is considered in the present work: credit scoring is a numerical expression based on an analysis of the level of customer creditworthiness, a helpful tool for assessment and prevention of default risk, an important method in credit risk evaluation, and an active research area in financial risk management.
Preprint submitted to Elsevier February 8, 2016
arXiv:1602.02137v1 [stat.AP] 5 Feb 2016
At the same time, modern statistical and data mining techniques have made a significant contribution to the field of information science and are capable of building models that measure the risk level of a single customer conditional on his characteristics, and then classify him as a good or a bad payer according to that risk level. Thus, the main idea of credit scoring models is to identify the features that influence the payment or non-payment behaviour of the customer, as well as his default risk, so that customers are classified into two distinct groups characterised by the decision to accept or reject the credit application (Han et al., 2006).
Since the Basel Committee on Banking Supervision released the Basel Accords, especially the second accord from 2004, the use of credit scoring has grown considerably, not only for credit granting decisions but also for risk management purposes. The internal rating based approaches allow institutions to use internal ratings to determine the risk parameters and, therefore, to calculate the economic capital of a portfolio. Basel III, released in 2013, renders more accurate calculations of default risk, especially in the consideration of external rating agencies, which should have periodic, rigorous and formal comments that are independent of the business lines under review and that reevaluate their methodologies and models and any significant changes made to them (Rohit et al., 2013; RBNZ, 2013).
Hence, the need for an effective risk management has meant that financial institutions began to seek a continuous improvement of the techniques used for credit analysis, a fact that resulted in the development and application of numerous quantitative models in this scenario. However, the chosen technique is often related to the subjectivity of the analyst or state of the art methods. There are also other properties that usually differ, such as the number of datasets applied to verify the quality of performance capability or even other validation and misclassification cost procedures. These are natural events, since credit scoring has been widely used in different fields, including propositions of new methods or comparisons between different techniques used for prediction purposes and classification.
A remarkable, large and essential literature review was presented in the paper by Hand and Henley (1997), which discusses important issues of classification methods applied to credit scoring. Other literature reviews were also conducted, but they focused only on some types of classification methods and on discussion of the methodologies, namely Xu et al. (2009), Shi (2010), Lahsasna et al. (2010a) and Nurlybayeva and Balakayeva (2013). Also, Garcia et al. (2014) performed a systematic literature review, but limited the study to papers published between 2000 and 2013; these authors provided a short experimental framework comparing only four credit scoring methods. Lessmann et al. (2015), in their review, considered 50 papers published between 2000 and 2014 and provided a comparison of several classification methods in credit scoring. However, it is known that there are several different methods that may be applied for binary classification; they may be grouped by their general methodological nature and can be seen as modifications of other existing methods. For instance, linear discriminant analysis has the same general methodological nature as quadratic discriminant analysis. In this sense, even though Lessmann et al. (2015) considered several classification methods, they did not consider general methodologies such as genetic and fuzzy methods.
In this paper, therefore, we aim to present a more general systematic literature review of the application of binary classification techniques for credit scoring, which provides a better understanding of the practical applications of credit rating and its changes over time. In the present literature review, we aim to cover more than 20 years of research (1992-2015), including 187 papers, more than any literature review carried out so far, completely covering a period that has been only partially documented across different papers. Furthermore, we present a primary experimental simulation study under nine general methodologies, namely, neural networks, support vector machine, linear regression, decision trees, logistic regression, fuzzy logic, genetic programming, discriminant analysis and Bayesian networks, considering balanced and unbalanced databases based on three retail credit scoring datasets. We intend to summarise research findings and obtain useful guidance for researchers interested in applying binary classification techniques for credit scoring.
The remainder of this paper is structured as follows. In Section 2 we present the conceptual classification scheme for the systematic literature review, displaying some important practical aspects of the credit scoring techniques. The main credit scoring techniques are briefly presented in Section 3. In Section 4 we present the results of the systematic review of the eligible reviewed papers, as well as the systematic review over four different time periods based on a historical economic context. In Section 5 we compare all presented methods in a replication based study. Final comments in Section 6 end the paper.
2. Survey methodology
A systematic review is an adequate alternative for identifying and classifying key scientific contributions to a field through a systematic, qualitative and quantitative description of the content in the literature. Interested readers can refer to Hachicha and Ghorbel (2012) for more details on systematic literature review. It consists of an observational research method used to systematically evaluate the content of a recorded communication (Kolbe and Brunette, 1991).
Overall, the procedure for conducting a systematic review is based on the definition of sources and procedures for the search of papers to be analysed, as well as on the definition of instrumental categories for the classification of the selected papers, here based on four categories to understand the historical application of the credit scoring techniques: year of publication, title of the journal where the paper was published, name of the co-authors, and conceptual scheme based on 12 questions to be answered under each published paper. For this purpose, there is a need for defining the criteria to select credit scoring papers in the research scope. Thus, two selection criteria are used in this paper to select papers related to the credit scoring area to be included in the study:
• The study is limited to the published literature available on the following databases: Sciencedirect, Engineering Information, Reaxys and Scopus, covering 20,500 titles from 5,000 publishers worldwide.
• The systematic review restricts the study eligibility to journal papers in English, especially considering 'credit scoring' as a keyword related to 'machine learning', 'data mining', 'classification' or 'statistic' topics. Other publication forms such as unpublished working papers, master and doctoral dissertations, books, conference proceedings, white papers and others are not included in the review. The survey horizon covers a period of more than two decades: from January 1992 to December 2015.
Figure 1: Procedure of the systematic review.
The papers were selected according to the procedure shown in Figure 1. From 437 papers eligible as potentially related to credit scoring, 250 were discarded for not meeting the second selection criterion. The remaining 187 papers included in the study were subjected to the systematic review, according to 12 questions on the conceptual scenario of the techniques: What is the main objective of the paper? What is the type of the main classification method? Which type of datasets is used? Which is the type of the explanatory variables? Does the paper perform variable selection methods? Was missing values imputation performed? What is the number of datasets used in the paper? Was an exhaustive simulation study performed? What is the type of validation of the approach? What is the type of misclassification cost criterion? Does the paper use the Australian or the German datasets? Which is the principal classification method used in the comparison study? The 12 questions and possible answers are shown in Table A.1 in the Appendix.
2.1. The main objective of the papers
Although a series of papers is focused on the same area, they have different specific objectives, which can be grouped by broadly similar aims. In the present work, we consider seven types of main objectives: proposing a new method for rating, comparing traditional techniques, conceptual discussions, feature selection, literature review, performance measures studies and, at last, other issues. Conceptual discussions account for papers that deal with problems or details of the credit rating analysis. Other issues include papers whose objectives appeared with low frequency.
In the proposition of new methods, Lee et al. (2002) introduce a discriminant neural model to perform credit rating, and Gestel et al. (2006) propose a support vector machine model within a Bayesian evidence framework. Hoffmann et al. (2007) propose a boosted genetic fuzzy model, and Hsieh and Hung (2010) use a combined method that covers neural networks, support vector machine and Bayesian networks.
Shi (2010) performed a systematic literature review that covers multiple criteria linear programming models applied to credit scoring from 1969 to 2010. Other literature reviews were performed by Hand and Henley (1997); Gemela (2001); Xu et al. (2009); Shi (2010); Lahsasna et al. (2010a); et al. (2012).
Among the papers that perform a conceptual discussion, Bardos (1998) presents tools used by the Banque de France; Banasik et al. (1999) discuss how hazard models could be considered in order to investigate when the borrowers will default; and Hand (2001a) discusses the applications and challenges in credit scoring analysis. Martens et al. (2010) perform an application in credit scoring and discuss how their tool fits into a global Basel II credit risk management system. Other examples of conceptual discussion may be seen in Chen and Huang (2003), Marron (2007) and Thomas (2010).
In the comparison of traditional techniques, West (2000) compared five neural network models with traditional techniques. The results indicated that neural networks can improve the credit scoring accuracy and also that logistic regression is a good alternative to the neural networks. Baesens et al. (2003) performed a comparison involving discriminant analysis, logistic regression, logic programming, support vector machines, neural networks, Bayesian networks, decision trees and k-nearest neighbor. The authors concluded that many classification techniques yield performances which are quite competitive with each other. Other important comparisons may be seen in Adams et al. (2001); Hoffmann et al. (2002); Ong et al. (2005); Baesens et al. (2005); Wang et al. (2005); Lee et al. (2006); Huang et al. (2006b); Xiao et al. (2006); et al. (2007); Martens et al. (2007); Hu and Ansell (2007); Tsai (2008); Abdou et al. (2008); Sinha and Zhao (2008); Luo et al. (2009); Finlay (2009); Abdou (2009); Hu and Ansell (2009); Finlay (2010); Wang et al. (2011). Also, Liu and Schumann (2005); Somol et al. (2005); Tsai (2009); Falangis and Glen (2010); Chen and Li (2010); Yu and Li (2011); McDonald et al. (2012); Wang et al. (2012b) addressed feature selection. Hand and Henley (1997); Gemela (2001); Xu et al. (2009); Shi (2010); Lahsasna et al. (2010a); et al. (2012) produced literature reviews. Yang et al. (2004); Hand (2005a); Lan et al. (2006); Dryver and Sukkasem (2009) worked on performance measures. There are other papers covering model selection (Ziari et al., 1997), sample impact (Verstraeten and Poel, 2005), interval credit (Rezac, 2011), and segmentation and accuracy (Bijak and Thomas, 2012).
2.2. The main peculiarities of the credit scoring papers
Overall, the main classification methods in credit scoring are neural networks (NN) (Ripley, 1996), support vector machine (SVM) (Vapnik, 1998), linear regression (LR) (Hand and Kelly, 2002), decision trees (TREES) (Breiman et al., 1984), logistic regression (LG) (Berkson, 1944), fuzzy logic (FUZZY) (Zadeh, 1965), genetic programming (Koza, 1992), discriminant analysis (DA) (Fisher, 1986), Bayesian networks (BN) (Friedman et al., 1997), hybrid methods (HYBRID) (Lee et al., 2002), and ensemble methods (COMBINED), such as bagging (Breiman, 1996), boosting (Schapire, 1990), and stacking (Wolpert, 1992).
In comparison studies, the principal classification methods involve traditional techniques considered by the authors to contrast the predictive capability of their proposed methodologies. However, hybrid and ensemble methods are seldom used in comparison studies because they involve a combination of other traditional methods.
The main classification methods in credit scoring are briefly presented in Section 3, as well as other issues related to credit scoring modeling: the types of datasets used in the papers (public or not public); the use of the so-called Australian or German datasets; the type of the explanatory variables; feature selection methods; missing values imputation (Little and Rubin, 2002); the number of datasets used; exhaustive simulations; the validation approach, such as holdout sample, K-fold, leave-one-out and training/validation/test; and misclassification cost criteria, such as the Receiver Operating Characteristic (ROC) curve and metrics based on the confusion matrix, namely accuracy (ACC), sensitivity (SEN), specificity (SPE), precision (PRE) and false positive rate (FPR). Other traditional measures used in credit scoring analysis are the F-Measure and the two-sample K-S value.
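As an illustration of the confusion-matrix criteria and the two-sample K-S value named above, the following is a minimal Python sketch. It is not taken from the reviewed papers: the function names, the convention that label 1 marks a bad payer (the positive class), and the toy labels and scores are all assumptions made for the example.

```python
from bisect import bisect_right


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN); label 1 (assumed: bad payer) is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute ACC, SEN, SPE, PRE, FPR and the F-measure from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sen = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    spe = tn / (tn + fp) if tn + fp else 0.0   # specificity
    pre = tp / (tp + fp) if tp + fp else 0.0   # precision
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "SEN": sen,
        "SPE": spe,
        "PRE": pre,
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "F": 2 * pre * sen / (pre + sen) if pre + sen else 0.0,
    }


def ks_statistic(scores_bad, scores_good):
    """Two-sample K-S value: largest gap between the two empirical score CDFs."""
    bad, good = sorted(scores_bad), sorted(scores_good)
    gap = 0.0
    for s in set(bad + good):
        cdf_bad = bisect_right(bad, s) / len(bad)
        cdf_good = bisect_right(good, s) / len(good)
        gap = max(gap, abs(cdf_bad - cdf_good))
    return gap


# Hypothetical toy data: 3 bad payers (1) and 5 good payers (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
print(ks_statistic([0.9, 0.8, 0.4], [0.1, 0.2, 0.3, 0.7, 0.35]))
```

The ROC curve can be traced with the same functions by sweeping a threshold over the scores and recording the (FPR, SEN) pair at each threshold.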
3. The main classification methods in credit scoring
In this section, the main techniques used in credit scoring and their applications are briefly explained and discussed.
Neural networks (NN). A neural network (Ripley, 1996) is a system based on input variables, also known as explanatory variables, combined by linear and non-linear interactions through one or more hidden layers, resulting in the output variables, also called response variables. Neural networks were created in an attempt to simulate the human brain, since it is based on sending electronic signals between a huge number of neurons. The NN structure has elements which receive an amount of stimuli (the input variables), create synapses in several neurons (activation of neurons in hidden layers), and result in responses (output variables). Neural networks differ according to their basic structure. In general, they differ in the number of hidden layers and the activation functions applied to them. West (2000) shows that the mixture-of-experts and radial basis function neural network models should be considered for credit scoring models. Lee et al. (2002) proposed a two-stage hybrid modeling procedure to integrate the discriminant analysis approach with the artificial neural networks technique. More recently, different artificial neural networks have been suggested to tackle the credit scoring problem: probabilistic neural network (Pang, 2005), partial logistic artificial neural network (Lisboa et al., 2009), artificial metaplasticity neural network (Marcano-Cedeno et al., 2011) and hybrid neural networks (Chuang and Huang, 2011). In some datasets, neural networks have the highest average correct classification rate when compared with other traditional techniques, such as discriminant analysis and logistic regression, although the results were very close (Abdou et al., 2008). Possible particular methods of neural networks are feedforward neural network, multi-layer perceptron, modular neural networks, radial basis function neural networks and self-organizing network.
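To make the input, hidden layer and output structure described above concrete, here is a minimal Python sketch of a forward pass through a network with a single hidden layer. It only illustrates the general idea: the choice of tanh and sigmoid activations, the function names and all weight values are hypothetical, and no training step is shown.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one weight row per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input variables -> hidden layer (tanh) -> single output (sigmoid)."""
    hidden = dense(x, hidden_w, hidden_b, math.tanh)
    (output,) = dense(hidden, out_w, out_b, sigmoid)
    return output


# Hypothetical applicant with three standardised explanatory variables.
x = [0.5, -1.0, 2.0]
# Hypothetical weights: two hidden neurons, one output neuron.
hidden_w = [[0.2, -0.4, 0.1], [-0.3, 0.1, 0.5]]
hidden_b = [0.0, 0.1]
out_w = [[1.5, -2.0]]
out_b = [0.2]

score = forward(x, hidden_w, hidden_b, out_w, out_b)
print(score)  # a value in (0, 1), read here as an estimated default risk
```

Changing the number of rows in `hidden_w`, stacking further `dense` calls, or swapping the activation functions gives the structural variations mentioned in the text.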
Support vector machine (SVM). This technique is a statistical classification method and was introduced by Vapnik (1998). Consider a training set $\{(x_i, y_i)\}$, where $x_i$ is the explanatory variable vector of the $i$-th observation, $y_i$ represents the binary category of interest, and $n$ denotes the number of dimensions of the input vectors. SVM attempts to find an optimal hyper-plane, making it a non-probabilistic binary linear classifier. The optimal hyper-plane could be written as follows:

$$\sum_{j=1}^{n} w_j x_j + b = 0,$$

where $w = (w_1, w_2, \ldots, w_n)$ is the normal of the hyper-plane, $x_j$ is the $j$-th component of an input vector, and $b$ is a scalar threshold. Considering the hyper-plane separable with respect to $y_i \in \{-1, 1\}$ and with geometric distance $\frac{2}{\|w\|_2}$, the procedure maximizes this distance, subject to the constraint $y_i \left( \sum_{j=1}^{n} w_j x_{ij} + b \right) \geq 1$ for every training observation $i$, where $x_{ij}$ is the $j$-th component of $x_i$. Commonly, this maximization may be done through the Lagrange multipliers and using linear, polynomial, Gaussian or sigmoidal separations. Just recently support vector machine was considered a credit scoring model (Chen et al., 2009). Li et al. (2006); Gestel et al. (2006); Xiao and Fei (2006); Yang (2007); Chuang and Lin (2009); Zhou et al. (2009, 2010); Feng et al. (2010); Hens and Tiwari (2012); Ling et al. (2012) used support vector machine as the main technique for their new method. Possible particular methods of SVM are radial basis function least squares support vector machine, linear least-squares support vector machine, radial basis function support vector machine and linear support vector machine.
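The hyper-plane, the margin and the separability constraint of the linear SVM can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the function names, the four two-dimensional observations and the candidate values of w and b are hypothetical, and the code verifies a given hyper-plane rather than solving the optimisation problem.

```python
import math


def decision_value(w, b, x):
    """Left-hand side of the hyper-plane equation: sum_j w_j * x_j + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def margin_width(w):
    """Geometric distance between the two margin boundaries: 2 / ||w||_2."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))


def satisfies_constraints(w, b, X, y):
    """Check y_i * (w . x_i + b) >= 1 for every training observation."""
    return all(yi * decision_value(w, b, xi) >= 1.0 for xi, yi in zip(X, y))


def classify(w, b, x):
    """Assign the class according to the side of the hyper-plane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical, linearly separable toy data with labels in {-1, 1}.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [1, 1, -1, -1]

# Hypothetical candidate hyper-plane: 0.5 * x1 + 0.5 * x2 - 1 = 0.
w, b = (0.5, 0.5), -1.0

print(satisfies_constraints(w, b, X, y))  # all four constraints hold
print(margin_width(w))                    # 2 / sqrt(0.5)
print(classify(w, b, (2.5, 1.0)))         # class of a new observation
```

Halving w and b describes the same hyper-plane but violates the constraint for the observations closest to it, which is why the margin is maximised subject to the constraint rather than by shrinking w freely.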
Linear regression (LR). The linear regression ana