
International Journal of Forecasting 16 (2000) 149–172
A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers
L.C. Thomas*
Department of Business Studies, University of Edinburgh, William Robertson Building, 50 George Square, Edinburgh EH8 9JY, UK
Credit scoring and behavioural scoring are the techniques that help organisations decide whether or not to grant credit to consumers who apply to them. This article surveys the techniques used — both statistical and operational research based — to support these decisions. It also discusses the need to incorporate economic conditions into the scoring systems and the way the systems could change from estimating the probability of a consumer defaulting to estimating the profit a consumer will bring to the lending organisation — two of the major developments being attempted in the area. It points out how successful this under-researched area of forecasting financial risk has been. © 2000 Elsevier Science B.V. All rights reserved.


Keywords: Finance; Discriminant analysis; Classification; Economic forecasting; Profit scoring
1. Introduction
Forecasting financial risk has over the last thirty years become one of the major growth areas of statistics and probability modelling. When financial risk is mentioned one tends to think of portfolio management, pricing of options and other financial instruments (for example the ubiquitous Black-Scholes formula (Black & Scholes, 1973)), or bond pricing where Merton's paper (Merton, 1974) is seminal. Less well known but equally important are credit and behavioural scoring, which are the
*Tel.: +44-131-650-3798; fax: +44-131-668-3053. E-mail address: (L.C. Thomas)
applications of financial risk forecasting to consumer lending. An adult in the UK or US is being credit scored or behaviour scored on average at least once a week, as the annual reports of the credit bureaux imply. The fact that most people are not aware of being scored does not diminish its importance. This area of financial risk has a limited literature with only a few surveys (Rosenberg & Gleit, 1994; Hand & Henley, 1997; Thomas, 1992, 1998) and a handful of books (Hand & Jacka, 1998; Thomas, Crook & Edelman, 1992; Lewis, 1992; Mays, 1998). The aim of this survey is to give an overview of the objectives, techniques and difficulties of credit scoring as an application of forecasting. It also identifies two developments
0169-2070/00/$ – see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0169-2070(00)00034-0
www.elsevier.com/locate/ijforecast

in credit scoring where ideas from mainstream forecasting may help. Firstly, there is a need to identify consumer risk forecasting techniques which incorporate economic conditions and so would automatically adjust for economic changes. Secondly, instead of seeking to minimise the percentage of consumers who default, companies are hoping they can identify the customers who are most profitable. Part of the catalyst for this development is the massive increase in information on consumer transactions which has happened in the last decade.
Credit scoring and behavioural scoring are the techniques that help organisations decide whether or not to grant credit to consumers who apply to them. There are two types of decisions that firms who lend to consumers have to make. Firstly, should they grant credit to a new applicant? The tools that aid this decision are called credit scoring methods. The second type of decision is how to deal with existing customers. If an existing customer wants to increase his credit limit, should the firm agree to that? What marketing, if any, should the firm aim at that customer? If the customer starts to fall behind in his repayments, what actions should the firm take? Techniques that help with these decisions are called behavioural scoring.
The information that is available in making a credit scoring decision includes both the applicant's application form details and the information held by a credit reference agency on the applicant. However, there is also a mass of information on previous applicants — their application form details and their subsequent performance. In many organisations such information is held on millions of previous customers. There is one problem with this information though. The firm will have the application form details on those customers it rejected for credit but no knowledge of how they would have performed. This gives a bias in the sample. This is a serious problem because if the firm says those it rejected previously would
have been bad, this decision will be perpetuated in any scoring system based on this data, and such groups of potential customers can never have the opportunity to prove their worth. On the other hand, there are usually sound reasons for rejecting such applicants and so it is likely that the rejects have a higher default rate than those who were previously accepted. Whether one can impute whether the rejected customers will be good or bad has been the subject of considerable debate. The idea of 'reject inference' has been suggested and used by many in the industry. Hsia (1978) describes the augmentation method, while other approaches are suggested in Reichert, Cho and Wagner (1983) and Joanes (1993). Hand and Henley (1993) in a detailed study of the problem concluded that it cannot be overcome unless one can assume particular relationships between the distributions of the goods and the bads which hold for both the accepted and the rejected populations. One way around it is to accept everyone for a short period of time and to use that group as a sample. What firms do seems to depend as much on the culture of the organisation as on any statistical validation. Retailers and mail order firms tend to accept all applicants for a short period of time and use that group to build scorecards. Financial institutions, on the other hand, are swayed by the cost of default and feel there is no way they can accept everyone, even for a trial, and so use versions of reject inference.
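To make the augmentation idea concrete, the sketch below illustrates the general re-weighting scheme often described under that name: accepted accounts are weighted up by the inverse of their estimated probability of having been accepted, so that they stand in for similar rejected applicants. All data, features and the bad rate are invented, and this is a hedged illustration of the general idea rather than Hsia's (1978) exact procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(2_000, 3))            # invented applicant characteristics
accepted = rng.random(2_000) < 0.6         # invented historical accept/reject flag

# Step 1: model the probability of acceptance using ALL applicants,
# since accept/reject decisions are known even for the rejects.
accept_model = LogisticRegression().fit(X, accepted)
p_accept = accept_model.predict_proba(X[accepted])[:, 1]

# Step 2: fit the good/bad scorecard on accepts only, weighting each
# accepted account by 1 / P(accept) so it represents look-alike rejects.
y_bad = (rng.random(accepted.sum()) < 0.1).astype(int)  # invented outcomes
score_model = LogisticRegression().fit(
    X[accepted], y_bad, sample_weight=1.0 / p_accept)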
In the next section we review the history of credit scoring. Then we examine the way credit scoring works and give a general overview of the techniques that are useful in building credit scorecards. The fourth section gives a similar overview of behavioural scoring, while the subsequent sections look at two proposed extensions of credit scoring which could give more robust and more focussed scorecards. The first extension tries to introduce dependence on economic conditions into credit scoring, while

the second is the change of objective from minimising default to maximising profit.
2. History of credit scoring
Credit scoring is essentially a way of recognising the different groups in a population when one cannot see the characteristic that separates the groups but only related ones. This idea of discriminating between groups in a population was introduced in statistics by Fisher (1936). He sought to differentiate between two varieties of iris by measurements of the physical size of the plants and to differentiate the origins of skulls using their physical measurements. Durand (1941) was the first to recognise that one could use the same techniques to discriminate between good and bad loans. His was a research project for the US National Bureau of Economic Research and was not used for any predictive purpose. At the same time some of the finance houses and mail order firms were having difficulties with their credit management. Decisions on whether to give loans or send merchandise had been made judgementally by credit analysts for many years. However, these credit analysts were being drafted into military service and there was a severe shortage of people with this expertise. So the firms got the analysts to write down the rules of thumb they used to decide to whom to give loans (Johnson, 1992). These rules were then used by non-experts to help make credit decisions — one of the first examples of expert systems. It did not take long after the war ended for some folk to connect these two events and to see the benefit of statistically derived models in lending decisions. The first consultancy was formed in San Francisco by Bill Fair and Earl Isaac in the early 1950s, and their clients at that time were mainly finance houses, retailers and mail order firms.
The arrival of credit cards in the late 1960s
made the banks and other credit card issuers realise the usefulness of credit scoring. The number of people applying for credit cards each day made it impossible, both in economic and manpower terms, to do anything but automate the lending decision. When these organisations used credit scoring they found that it was also a much better predictor than any judgmental scheme, and default rates would drop by 50% or more — see Myers and Forgy (1963) for an early report on such success, or Churchill, Nevin and Watson (1977) for one from a decade later. The only opposition came from those like Capon (1982) who argued 'that the brute force empiricism of credit scoring offends against the traditions of our society'. He felt that there should be more dependence on credit history and that it should be possible to explain why certain characteristics are needed in a scoring system and others are not. The event that ensured the complete acceptance of credit scoring was the passing of the Equal Credit Opportunity Acts (ECOA, 1975, 1976) in the US in 1975 and 1976. These outlawed discrimination in the granting of credit unless the discrimination could be statistically justified. It is not often that lawmakers provide long-term employment for anyone but lawyers, but this ensured that credit scoring analysis was to be a growth profession for the next 25 years. This has proved to be the case and still is the case: the number of analysts in the UK has doubled even in the last four years.
In the 1980s the success of credit scoring in credit cards meant that banks started using scoring for their other products, like personal loans, while in the last few years scoring has been used for home loans and small business loans. Also in the 1990s the growth in direct marketing has led to the use of scorecards to improve the response rate to advertising campaigns. In fact this was one of the earliest uses in the 1950s, when Sears used scoring to decide to whom to send its catalogues (Lewis, 1992).

Advances in computing allowed other techniques to be tried to build scorecards. In the 1980s logistic regression and linear programming, the two main stalwarts of today's card builders, were introduced. More recently, artificial intelligence techniques like expert systems and neural networks have been piloted.
At present the emphasis is on changing the objectives from trying to minimise the chance a customer will default on one particular product to looking at how the firm can maximise the profit it can make from that customer. Moreover, the original idea of estimating the risk of defaulting has been augmented by scorecards which estimate response (how likely is a consumer to respond to a direct mailing of a new product), usage (how likely is a consumer to use a product), retention (how likely is a consumer to keep using the product after the introductory offer period is over), attrition (will the consumer change to another lender), and debt management (if the consumer starts to become delinquent on the loan, how successful are various approaches to prevent default).
3. Overview of the methods used for credit scoring
So what are the methods used in credit granting? Originally it was a purely judgmental approach. Credit analysts read the application form and said yes or no. Their decisions tended to be based on the view that what mattered was the 3Cs or the 4Cs or the 5Cs:
• The character of the person — do you know the person or their family?
• The capital — how much is being asked for?
• The collateral — what is the applicant willing to put up from their own resources?
• The capacity — what is their repaying ability? How much free income do they have?
• The condition — what are the conditions in the market?
Credit scoring nowadays is based on statistical or operational research methods. The statistical tools include discriminant analysis, which is essentially linear regression, a variant of this called logistic regression, and classification trees, sometimes called recursive partitioning algorithms. The operational research techniques include variants of linear programming. Most scorecard builders use one of these techniques or a combination of them. Credit scoring also lends itself to a number of different non-parametric statistical and AI modelling approaches. Ones that have been piloted in the last few years include the ubiquitous neural networks, expert systems, genetic algorithms and nearest neighbour methods. It is interesting that so many different approaches can be used on the same classification problem. Part of the reason is that credit scoring has always been based on a pragmatic approach to the credit granting problem. If it works, use it! The object is to predict who will default, not to give explanations for why they default or to answer hypotheses on the relationship between default and other economic or social variables. That is what Capon (1982) considered to be one of the main objections to credit scoring in his critique of the subject.
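As a concrete illustration of one of these tools, the sketch below fits a logistic regression scorecard to invented application-form data. The characteristics, the bad rate and the cut-off are all assumptions made for illustration, not features of any real scorecard.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000
# Invented applicant characteristics: age, income, years at address
X = np.column_stack([
    rng.integers(18, 70, n),         # age in years
    rng.normal(25_000, 8_000, n),    # annual income
    rng.integers(0, 20, n),          # years at current address
])
y = (rng.random(n) < 0.1).astype(int)    # 1 = 'bad' (defaulted), invented

# Standardise, then fit; the linear predictor plays the role of a score s(x).
scorecard = make_pipeline(StandardScaler(), LogisticRegression())
scorecard.fit(X, y)

# Accept applicants whose estimated probability of being bad falls below a
# cut-off chosen from the lender's loss trade-off (0.12 is arbitrary here).
p_bad = scorecard.predict_proba(X)[:, 1]
print(f"accepted {(p_bad < 0.12).mean():.0%} of the sample")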
So how are these various methods used? A sample of previous applicants is taken, which can vary from a few thousand to as high as hundreds of thousands (not a problem in an industry where firms often have portfolios of tens of millions of customers). For each applicant in the sample, one needs their application form details and their credit history over a fixed period — say 12 or 18 or 24 months. One then decides whether that history is acceptable, i.e. are they bad customers or not, where a definition of a bad customer is commonly taken to be someone who has missed three consecutive months of payments. There will be a number of customers where it is not possible to determine whether they are good or bad because they have not been customers long enough or their history

is not clear. It is usual to remove this set of ‘intermediates’ from the sample.
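A minimal sketch of this labelling step might look as follows, using the 'three consecutive missed payments' definition of bad quoted above. The 12-month window and the convention that mild, non-consecutive arrears count as 'intermediate' are assumptions made for illustration.

def classify_history(missed):
    """Label one customer from a list of monthly flags (1 = missed payment)."""
    if len(missed) < 12:
        return "intermediate"          # not a customer long enough
    run = 0
    for m in missed:
        run = run + 1 if m else 0
        if run >= 3:                   # three consecutive missed months
            return "bad"
    if sum(missed) == 0:
        return "good"
    return "intermediate"              # some arrears, but never 3 in a row

print(classify_history([0] * 12))                               # good
print(classify_history([0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))   # bad
print(classify_history([0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))   # intermediate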
One question is what is a suitable time horizon for the credit scoring forecast — the time between the application and the good/bad classification. The norm seems to be twelve to eighteen months. Analysis shows that the default rate, as a function of the time the customer has been with the organisation, builds up initially and it is only after twelve months or so (usually longer for loans) that it starts to stabilise. Thus any shorter horizon underestimates the bad rate and does not reflect in full the types of characteristics that predict default. A time horizon of more than two years leaves the system open to population drift, in that the distribution of the characteristics of a population changes over time, and so the population sampled may be significantly different from that on which the scoring system will be used. One is trying to use what are essentially cross-sectional models, i.e. ones that connect two snapshots of an individual at different times, to produce models that are stable when examined longitudinally over time. The time horizon — the time between these two snapshots — needs to be chosen so that the results are stable over time.
Another open question is what proportion of goods and bads to have in the sample. Should it reflect the proportions in the population or should it have equal numbers of goods and bads? Henley (1995) discusses some of these points in his thesis.
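The two sampling designs in question can be illustrated with a short sketch; the population size and the 8% bad rate below are invented.

import numpy as np

rng = np.random.default_rng(1)
labels = (rng.random(100_000) < 0.08).astype(int)   # 1 = bad, ~8% bad rate

goods = np.flatnonzero(labels == 0)
bads = np.flatnonzero(labels == 1)

# Proportional sample: draw at random, keeping the population mix.
proportional = rng.choice(len(labels), size=5_000, replace=False)

# Balanced sample: equal numbers of goods and bads.
k = min(len(bads), 2_500)
balanced = np.concatenate([rng.choice(goods, k, replace=False),
                           rng.choice(bads, k, replace=False)])

print(f"proportional bad rate: {labels[proportional].mean():.1%}")
print(f"balanced bad rate:     {labels[balanced].mean():.1%}")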
Credit scoring then becomes a classification problem where the input characteristics are the answers to the application form questions and the results of a check with a credit reference bureau, and the output is the division into 'goods' and 'bads'. One wants to divide the set of answers A into two subsets — A_B, the set of answers x given by those who turned out to be bad, and A_G, the set of answers of those who turned out to be good. The rule for new applicants would then be: accept if their answers are in the set A_G; reject if their answers are in the set A_B. It is also necessary to have some consistency and continuity in these sets and so we accept that we will not be able to classify everyone in the sample correctly. Perfect classification would be impossible anyway since, sometimes, the same set of answers is given by a 'good' and a 'bad'. However, we want a rule that misclassifies as few as possible and yet still satisfies some reasonable continuity requirement.
The simplest method for developing such a rule is to use a linear scoring function, which can be derived in three different ways — a Bayesian decision rule assuming normal distributions, discriminant analysis and linear regression. The first of these approaches assumes that:
• p_G is the proportion of applicants who are 'goods',
• p_B is the proportion of applicants who are 'bads',
• p(x|G) is the probability that a 'good' applicant will have answers x,
• p(x|B) is the probability that a 'bad' applicant will have answers x,
• p(x) is the probability that an applicant will have answers x,
• q(G|x) (respectively q(B|x)) is the probability that an applicant who has answers x will be 'good' ('bad'), so q(G|x) = p(x|G) p_G / p(x),
• L is the loss of profit incurred by classifying a 'good' as a bad and rejecting them,
• D is the debt incurred by classifying a 'bad' as a good and accepting them.
The expected loss is then:

l = L Σ_{x∈A_B} q(G|x) p(x) + D Σ_{x∈A_G} q(B|x) p(x) = L Σ_{x∈A_B} p(x|G) p_G + D Σ_{x∈A_G} p(x|B) p_B    (1)

and this is minimised when the set of 'goods' is taken to be:

A_G = {x | D p(x|B) p_B ≤ L p(x|G) p_G} = {x | p_B/p_G ≤ (p(x|G) L)/(p(x|B) D)}
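A worked numeric instance of this rule, with invented values for L, D, the priors and the class-conditional probabilities of one particular answer vector x:

# Accept (x is in A_G) iff D * p(x|B) * p_B <= L * p(x|G) * p_G.
L_loss = 100.0     # profit lost by rejecting a good
D_debt = 500.0     # debt incurred by accepting a bad
p_G, p_B = 0.9, 0.1
p_x_given_G = 0.02
p_x_given_B = 0.05

accept = D_debt * p_x_given_B * p_B <= L_loss * p_x_given_G * p_G
print("accept" if accept else "reject")   # here: 2.5 <= 1.8 is False, so reject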
If the distributions p(x|G), p(x|B) are multivariate normal with common covariance, this reduces to the linear rule:
A_G = {x | w_1x_1 + w_2x_2 + ··· + w_mx_m > c}
as outlined in several books on classification (Lachenbruch, 1975; Choi, 1986; Hand, 1981). If the covariances of the populations of the goods and the bads are different then the analysis leads to a quadratic discriminant function. However, in many classification situations (not necessarily credit scoring) (Titterington, 1992) the quadratic rule appears to be less robust than the linear one, and the number of instances of its use in credit scoring is minimal (Martell & Fitts, 1981).
One could think of the above rule as giving a score s(x) for each set of answers x, i.e.
s(x) = w_1x_1 + w_2x_2 + ··· + w_mx_m
If one could assume the discriminating power to differentiate between goods and bads was in the score s(x) rather than in x, then one has reduced the problem from one with m dimensions, represented by p(x|G), p(x|B), to one with one dimension corresponding to the probabilities p(s|G), p(s|B). This is the power of a scoring system, in that minimising the loss expression (1) reduces to finding the optimal cut-off c for the score, so that A_G = {x | s(x) > c}.
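A sketch of this one-dimensional cut-off search, estimating the loss expression (1) empirically on a sample of scores; the score distributions and the costs are invented.

import numpy as np

rng = np.random.default_rng(2)
scores_good = rng.normal(1.0, 1.0, 900)    # goods tend to score higher
scores_bad = rng.normal(-0.5, 1.0, 100)
L_loss, D_debt = 100.0, 500.0

candidates = np.sort(np.concatenate([scores_good, scores_bad]))

def expected_loss(c):
    # L * (number of goods rejected) + D * (number of bads accepted),
    # scaled by the sample size of 1000.
    return (L_loss * (scores_good <= c).sum()
            + D_debt * (scores_bad > c).sum()) / 1000

best = min(candidates, key=expected_loss)
print(f"best cut-off ~ {best:.2f}, loss ~ {expected_loss(best):.2f}")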
In discriminant analysis (where the two groups considered were not originally the goods and the bads in the credit scoring context), Fisher (1936) sought to find which linear combination of the variables best separates the two groups to be classified. He suggested that if we assume the two groups have a common sample variance, then a sensible measure of separation is:
M = (distance between sample means of the two groups) / (sample variance of each group)^{1/2}
Assume that the sample means are m_G and m_B for the goods and the bads, respectively, and that S is the common sample covariance matrix. If Y = w_1X_1 + w_2X_2 + ···
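Although the derivation is cut short here, the direction w that maximises M is the classical Fisher solution, proportional to inv(S)(m_G - m_B). The following minimal sketch computes it on two invented Gaussian samples with a common covariance.

import numpy as np

rng = np.random.default_rng(3)
XG = rng.multivariate_normal([2.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], 500)
XB = rng.multivariate_normal([0.5, 0.0], [[1.0, 0.3], [0.3, 1.0]], 500)

mG, mB = XG.mean(axis=0), XB.mean(axis=0)
# Pooled (common) sample covariance matrix S; equal group sizes here,
# so a simple average of the two group covariances is the pooled estimate.
S = (np.cov(XG, rowvar=False) + np.cov(XB, rowvar=False)) / 2

w = np.linalg.solve(S, mG - mB)               # Fisher direction inv(S)(m_G - m_B)
M = (w @ (mG - mB)) / np.sqrt(w @ S @ w)      # separation measure M for Y = w'X
print(f"weights: {w}, separation M = {M:.2f}")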
