MANG 2043 – Analytics for Marketing
MAT012 – Credit Risk Scoring
This Lecture’s Learning Contents
Classification methods in credit scoring
Divergence
Decision tree
Linear programming
Measuring scorecard performance
Assessing, monitoring and updating scorecards
Measuring scorecard performance (measuring the difference between distributions):
Divergence: difference in expectations of weights of evidence
Mahalanobis distance (briefly covered last time)
Kolmogorov-Smirnov statistic: difference in distribution functions
ROC curves: comparison of distribution functions
Gini coefficient / Somers' D concordance statistic
Confusion Matrix
Divergence
Introduced by Kullback (Continuous version of Information Value)
Let f(s|G) and f(s|B) be the density functions of the scores of the goods (G) and the bads (B) on a scorecard. Divergence is then defined by
D = ∫ ( f(s|G) − f(s|B) ) log( f(s|G)/f(s|B) ) ds = ∫ ( f(s|G) − f(s|B) ) w(s) ds,
where w(s) = log( f(s|G)/f(s|B) ) is the weight of evidence at score s.
D ≥ 0, and D = 0 if and only if f(s|G) = f(s|B).
D = ∞ if there is no overlap between the scores of the goods and the bads.
In practice one can only calculate the divergence by splitting the scores into bands. If band i contains gi goods and bi bads, with nG goods and nB bads overall, the discrete version is the information value
IV = Σi ( gi/nG − bi/nB ) log( (gi/nG) / (bi/nB) ).
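As a minimal sketch of the banded calculation (the band counts gi and bi below are made-up numbers, and NumPy is an assumed tool, not part of the lecture):

```python
# Sketch: divergence / information value from score bands.
# The band counts are illustrative only.
import numpy as np

g = np.array([100, 300, 500, 400, 200])   # goods g_i in each score band
b = np.array([120,  90,  60,  20,  10])   # bads  b_i in each score band

pG = g / g.sum()                           # g_i / n_G
pB = b / b.sum()                           # b_i / n_B
woe = np.log(pG / pB)                      # weight of evidence per band

iv = np.sum((pG - pB) * woe)               # discrete divergence = information value
print(round(iv, 3))
```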
Building scorecard by maximising divergence
Assume the score is linear in the attribute scores, s = c·x = c1x1 + c2x2 + … + cpxp, for some vector of attribute scores c.
We need to choose c to maximise the divergence. The likelihood functions f(x|G) and f(x|B) can be obtained empirically from the sample of past borrowers being used.
For any choice of attribute scores c we define the corresponding score distributions
f(s|G, c) = Σ over {x : c·x = s} of f(x|G)   and   f(s|B, c) = Σ over {x : c·x = s} of f(x|B),
and then find the maximum divergence value as c varies, i.e.
maxc D(c) = maxc ∫ ( f(s|G, c) − f(s|B, c) ) log( f(s|G, c)/f(s|B, c) ) ds.
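This maximisation has no closed form in general. A rough numerical sketch, binning the scores and searching over c with scipy's Nelder-Mead optimiser (the synthetic data, quantile bins, smoothing constant and optimiser are all illustrative assumptions, not the lecture's method):

```python
# Sketch: choose attribute scores c to maximise a binned divergence.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                       # three characteristics
good = (X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)) > 0

def neg_divergence(c, bins=10):
    s = X @ c                                     # score s = c . x
    edges = np.quantile(s, np.linspace(0, 1, bins + 1))
    gi, _ = np.histogram(s[good], bins=edges)
    bi, _ = np.histogram(s[~good], bins=edges)
    pG = (gi + 0.5) / (gi + 0.5).sum()            # smoothed to avoid log(0)
    pB = (bi + 0.5) / (bi + 0.5).sum()
    return -np.sum((pG - pB) * np.log(pG / pB))

res = minimize(neg_divergence, x0=np.ones(3), method="Nelder-Mead")
print(res.x, -res.fun)                            # fitted attribute scores and divergence
```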
Methods which group rather than score
Methods like classification trees, expert systems and neural nets end up with “scorecards” that classify applicants into groups rather than give a scorecard which adds the score for each answer.
Main approach is classification tree.
It was developed at the same time in statistics and in computer science, so it is also called the recursive partitioning algorithm.
It splits A, the set of combinations of answers, into two subsets, depending on the answer to one question, so that the two subsets are as different as possible.
Take each subset and repeat the process until one decides to stop.
Each terminal node is classified as belonging to AG (goods) or AB (bads).
A classification tree depends on:
Splitting rule: how to choose the best daughter subsets
Stopping rule: when one decides a node is a terminal node
Assigning rule: which category (good or bad) each terminal node is assigned to
Classification tree: credit risk example (figure: the example tree splits on residential status, years at bank and employment)
Rules in classification trees
Assigning Rule
Normally assign to the class which is the largest in that node. Sometimes, if D is the default cost and L the lost profit, assign the node to good if its good:bad ratio > D/L.
Stopping rule
Stop either if the subset is too small (say < 1% of the population),
or if the difference between the daughter subsets is too small (under the splitting rule).
Really it is a stopping-and-pruning rule, as one always has to cut back some of the nodes. Do this by using a second sample (one not used in building the tree).
Pruning Decision Trees
Split data into a training sample and a validation sample
Use training sample to grow the tree
Use validation sample to decide on optimal size of the tree
Two approaches
Grow the tree, monitor the error on the validation set, and stop growing when the latter starts to increase; or
Grow full tree, and prune retrospectively using the validation set
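A minimal sketch of the second approach using scikit-learn's cost-complexity pruning, with the validation sample choosing the pruning strength (the library, the synthetic data and using accuracy as the criterion are assumptions, not part of the lecture):

```python
# Sketch: grow a full tree, then prune back using a validation sample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths: larger ccp_alpha => smaller (more pruned) tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_acc = 0.0, -np.inf
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    acc = tree.score(X_val, y_val)        # validation performance decides the tree size
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print(best_alpha, best_acc, pruned.get_n_leaves())
```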
Splitting rules: Kolmogorov-Smirnov rule
Maximise |p(L|B) − p(L|G)| over the possible splits.
Residential status | Owner | Tenant | With parents
No. of goods | 1020 | 400 | 80
No. of bads | 180 | 200 | 120
Good:bad odds | 5.7:1 | 2:1 | 0.67:1
Think of the daughters as L (left) and R (right). p(L|B) is the proportion of the bads in the original set who fall in the left daughter (p(L|G) similarly).
Note: with just two daughters, |p(R|B) − p(R|G)| = |1 − p(L|B) − (1 − p(L|G))| = |p(L|B) − p(L|G)|, so either daughter gives the same answer.
Split 1: L = with parents; R = owner + tenant
p(L|B) = 120/500
p(L|G) = 80/1500
KS = |(120/500) − (80/1500)| = 0.187
Split 2: L = with parents + tenant; R = owner
p(L|B) = 320/500
p(L|G) = 480/1500
KS = |(320/500) − (480/1500)| = 0.32
Choose split 2.
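A small sketch reproducing the KS calculation for the two candidate splits, using the counts in the table above:

```python
# Sketch: Kolmogorov-Smirnov splitting measure for the residential-status example.
goods = {"owner": 1020, "tenant": 400, "parents": 80}    # n_G = 1500
bads  = {"owner": 180,  "tenant": 200, "parents": 120}   # n_B = 500
nG, nB = sum(goods.values()), sum(bads.values())

def ks(left):
    """|p(L|B) - p(L|G)| for a left daughter made of the given categories."""
    pLG = sum(goods[c] for c in left) / nG
    pLB = sum(bads[c] for c in left) / nB
    return abs(pLB - pLG)

print(round(ks({"parents"}), 3))             # split 1: 0.187
print(round(ks({"parents", "tenant"}), 3))   # split 2: 0.320 -> chosen
```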
Splitting rules: Chi square rule
Using the same residential-status data and the same L/R notation as above, assume the good/bad split in each daughter were the same as in the parent to obtain the expected numbers:
n(G)/N = 1500/2000 = 0.75 and n(B)/N = 500/2000 = 0.25, so ELG = 0.75 n(L), ELB = 0.25 n(L), etc.
The chi-square measure is the sum over the four cells (L/R by good/bad) of (observed − expected)²/expected.
Split 1: L = with parents; R = owner + tenant:
n(L) = 200, n(R) = 1800
χ² = (70)²(1/150 + 1/50 + 1/1350 + 1/450) = 145
Split 2: L = with parents + tenant; R = owner:
n(L) = 800, n(R) = 1200
χ² = (120)²(1/600 + 1/200 + 1/900 + 1/300) = 160
Choose split 2 (the larger chi-square value).
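The same comparison under the chi-square measure (a sketch using the same counts; the helper chi_square is just for illustration):

```python
# Sketch: chi-square splitting measure for the residential-status example.
goods = {"owner": 1020, "tenant": 400, "parents": 80}
bads  = {"owner": 180,  "tenant": 200, "parents": 120}
nG, nB, N = 1500, 500, 2000

def chi_square(left):
    right = set(goods) - set(left)
    stat = 0.0
    for side in (left, right):
        oG = sum(goods[c] for c in side)          # observed goods in this daughter
        oB = sum(bads[c] for c in side)           # observed bads in this daughter
        n = oG + oB
        eG, eB = n * nG / N, n * nB / N           # expected under the parent's odds
        stat += (oG - eG) ** 2 / eG + (oB - eB) ** 2 / eB
    return stat

print(round(chi_square({"parents"}), 1))             # split 1: about 145
print(round(chi_square({"parents", "tenant"}), 1))   # split 2: about 160 -> chosen
```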
Random forest
Since 2000, the random forest extension of classification trees has proved popular.
Build lots of trees, each using subset of data and subset of characteristics.
For each new case, classify under each tree
Choose class which majority of trees choose.
This ensemble idea can be used with the other classification approaches, but so far it has really only been tried on trees.
The assumption is that some trees are picking up local effects.
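A minimal sketch with scikit-learn's RandomForestClassifier (the library and the synthetic data are assumptions; bootstrap and max_features correspond to the "subset of data" and "subset of characteristics" ideas above):

```python
# Sketch: random forest = many trees, each on a bootstrap sample of cases
# and a random subset of characteristics at each split; classify by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=12, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

forest = RandomForestClassifier(
    n_estimators=200,      # number of trees
    max_features="sqrt",   # subset of characteristics tried at each split
    bootstrap=True,        # subset (with replacement) of the data per tree
    random_state=1,
).fit(X_train, y_train)

print(forest.score(X_test, y_test))   # majority-vote accuracy on new cases
```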
Nearest neighbour approach
There are other classification methods which make no assumptions about the underlying population. The most popular is the nearest neighbour method.
A metric is given to the space of application answers so that the distance from application form answers x1 to application form answers x2 is d(x1, x2). Applicant’s distance to the answers in a training sample set is calculated and the k-nearest neighbours are identified. Classify as good if a majority of these k are good.
Examples (using a simple scorecard on age and income) where k and the metric vary:
Linear discriminant and nearest 3 neighbours with Euclidean metric
Different metric and 5 nearest neighbours
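A sketch of the nearest-neighbour classifier with k = 3 and the Euclidean metric, on made-up age/income data (all the numbers and the scikit-learn call are illustrative assumptions):

```python
# Sketch: classify a new applicant by majority vote of its k nearest neighbours.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
age    = rng.uniform(20, 65, 500)
income = rng.uniform(10, 80, 500)                 # in thousands, say
X = np.column_stack([age, income])
y = (0.04 * age + 0.03 * income + rng.normal(0, 0.5, 500) > 2.5).astype(int)  # 1 = good

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
print(knn.predict([[30, 25]]))                    # classify one new applicant
```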
Dimension reduction
The computational resources required can be reduced by reducing the dimension of the problem, either by
reducing the number of original features, e.g. omitting features that are insignificant in a linear discriminant analysis, or by
using principal components analysis to replace the original features by a smaller number of variables formed from linear combinations of the original features with maximum variance.
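A sketch of the principal-components route with scikit-learn (synthetic data; standardising first and keeping five components are illustrative choices):

```python
# Sketch: replace the original features by a few principal components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)

Z = StandardScaler().fit_transform(X)       # PCA is sensitive to feature scales
pca = PCA(n_components=5).fit(Z)            # keep the 5 highest-variance combinations
X_reduced = pca.transform(Z)

print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```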
Choice of metric
There are some modern, more complex algorithms that can learn good metrics. For example, choose a vector of weights w to produce good classification on the basis of a weighted Euclidean metric of the form
Dw(x1, x2) = ( Σj wj (x1j − x2j)² )^1/2,
compared with the ordinary Euclidean metric d(x1, x2) = ( Σj (x1j − x2j)² )^1/2.
Two popular algorithms (too complicated to go into here) are:
Large margin nearest neighbour analysis
Neighbourhood components analysis
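A plain-NumPy sketch of nearest neighbours under a weighted Euclidean metric Dw; the weights here are arbitrary illustrations rather than weights learned by either algorithm above:

```python
# Sketch: k nearest neighbours under a weighted Euclidean metric D_w.
import numpy as np

rng = np.random.default_rng(4)
X_train = rng.normal(size=(200, 3))                 # training applicants
y_train = (X_train[:, 0] + 0.2 * X_train[:, 1] > 0).astype(int)
w = np.array([1.0, 0.2, 0.05])                      # metric weights (illustrative)

def classify(x_new, k=5):
    d = np.sqrt(((X_train - x_new) ** 2 * w).sum(axis=1))   # D_w(x_new, x_i)
    nearest = np.argsort(d)[:k]
    return int(y_train[nearest].mean() >= 0.5)               # majority vote

print(classify(np.array([0.3, -1.0, 2.0])))
```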
Mahalanobis Distance
If we assume f(s|G) and f(s|B) are normal with means mG and mB and a common variance σ², the divergence reduces to (mG − mB)²/σ².
The Mahalanobis distance DM = |mG − mB|/σ measures the distance between the two mean scores of the scorecard in units of their (pooled) standard deviation.
This is what discriminant analysis maximises.
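A small sketch of the calculation on simulated scores (the means and standard deviation are made-up numbers):

```python
# Sketch: Mahalanobis distance between the mean scores of goods and bads.
import numpy as np

rng = np.random.default_rng(9)
sG = rng.normal(620, 50, 5000)                       # illustrative good scores
sB = rng.normal(560, 50, 1000)                       # illustrative bad scores

pooled_sd = np.sqrt(((sG.size - 1) * sG.var(ddof=1) + (sB.size - 1) * sB.var(ddof=1))
                    / (sG.size + sB.size - 2))
D_M = abs(sG.mean() - sB.mean()) / pooled_sd
print(round(D_M, 2))                                 # about 1.2 here
```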
Kolmogorov-Smirnov Statistic
Not a difference in expectations but the maximum difference between the distribution functions: KS = maxs |F(s|G) − F(s|B)|.
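A sketch computing the KS statistic from samples of good and bad scores; scipy's ks_2samp returns exactly this maximum distance between the two empirical distribution functions (the score distributions below are made up):

```python
# Sketch: KS statistic = max_s |F(s|G) - F(s|B)| for the score distributions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)
scores_good = rng.normal(620, 50, 5000)    # illustrative score distributions
scores_bad  = rng.normal(560, 50, 1000)

ks = ks_2samp(scores_good, scores_bad).statistic
print(round(ks, 3))
```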
Confusion Matrix
Example of Confusion Matrix
Accuracy = (13798 + 765)/19117 = 0.76
Error rate = (4436 + 118)/19117 = 0.24
Sensitivity = 13798/18234 = 0.76 (proportion of the actual goods classified as good)
Specificity = 765/883 = 0.87 (proportion of the actual bads classified as bad)
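A sketch recovering these measures from the quoted counts, with goods treated as the positive class to match the figures above; swapping the roles of goods and bads simply swaps sensitivity and specificity:

```python
# Sketch: accuracy, error rate, sensitivity and specificity from the example counts.
TP = 13798   # goods predicted good
FN = 4436    # goods predicted bad
FP = 118     # bads predicted good
TN = 765     # bads predicted bad
N = TP + FN + FP + TN            # 19117

accuracy    = (TP + TN) / N      # 0.76
error_rate  = (FP + FN) / N      # 0.24
sensitivity = TP / (TP + FN)     # 13798/18234 = 0.76
specificity = TN / (TN + FP)     # 765/883 = 0.87
print(accuracy, error_rate, sensitivity, specificity)
```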
ROC (Receiver Operating Characteristics) curve
A small scorecard example
ROC curve for the small example
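A sketch of the ROC construction: sweep the cutoff s and plot F(s|B) against F(s|G) (the score distributions are illustrative; the trapezium-rule area under the curve is the usual summary of how well the score separates goods from bads):

```python
# Sketch: ROC curve by sweeping the cutoff s over the range of scores.
import numpy as np

rng = np.random.default_rng(6)
scores_good = rng.normal(620, 50, 5000)     # illustrative score distributions
scores_bad  = rng.normal(560, 50, 1000)

cutoffs = np.sort(np.concatenate([scores_good, scores_bad]))
F_G = np.searchsorted(np.sort(scores_good), cutoffs, side="right") / scores_good.size
F_B = np.searchsorted(np.sort(scores_bad),  cutoffs, side="right") / scores_bad.size

# Plotting F(s|B) against F(s|G) traces the ROC curve; the area under it
# (trapezium rule) summarises the separation between goods and bads.
auc = np.sum(np.diff(F_G) * (F_B[1:] + F_B[:-1]) / 2)
print(round(auc, 3))
```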
Gini coefficient
Concordant & Discordant Pairs
Example to calculate the C & D pairs
Somers' D – concordance statistic
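A sketch of Somers' D via concordant and discordant good-bad pairs, on a handful of made-up scores, showing that it equals the Gini coefficient 2·AUC − 1:

```python
# Sketch: concordant/discordant pairs, Somers' D and the Gini coefficient.
# A good-bad pair is concordant if the good scores higher than the bad.
good_scores = [640, 610, 600, 580, 570]    # illustrative
bad_scores  = [620, 560, 550]

concordant = discordant = tied = 0
for g in good_scores:
    for b in bad_scores:
        if g > b:
            concordant += 1
        elif g < b:
            discordant += 1
        else:
            tied += 1

pairs = concordant + discordant + tied
somers_d = (concordant - discordant) / pairs      # = Gini coefficient
auc = (concordant + 0.5 * tied) / pairs           # so Gini = 2*AUC - 1
print(somers_d, 2 * auc - 1)
```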
Cumulative Accuracy Profile (CAP) Curve
The Cumulative Gains Chart for the example
Lift chart: plot F(s|B)/F(s) against F(s) for various scores s.
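A sketch of the lift values at each decile of the score, on made-up scores where the bads tend to score lower:

```python
# Sketch: lift chart values F(s|B) / F(s) at each decile of the score.
import numpy as np

rng = np.random.default_rng(7)
scores = np.concatenate([rng.normal(620, 50, 5000), rng.normal(560, 50, 1000)])
is_bad = np.concatenate([np.zeros(5000, dtype=bool), np.ones(1000, dtype=bool)])

order = np.argsort(scores)                 # lowest (riskiest) scores first
bad_sorted = is_bad[order]

n = scores.size
for decile in range(1, 11):
    cut = int(n * decile / 10)
    F_s = cut / n                          # proportion of the population taken
    F_sB = bad_sorted[:cut].sum() / is_bad.sum()   # proportion of bads captured
    print(decile, round(F_sB / F_s, 2))    # lift; 1.0 means no discrimination
```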
Linear programming
Assume nG goods, labelled i = 1, 2, …, nG,
and nB bads, labelled i = nG+1, …, nG+nB.
Require weights wj, j = 1, 2, …, p, and a cutoff value c such that
for the goods: w1xi1 + w2xi2 + … + wpxip ≥ c − ai
for the bads: w1xi1 + w2xi2 + … + wpxip ≤ c + ai
where the ai ≥ 0 are error variables measuring by how much each case violates its constraint; the linear programme minimises the total error Σi ai subject to these constraints.
Linear programming deals more easily with large numbers of application characteristics.
Classification trees, neural nets and support vector machines pick up relationships between variables which may not be obvious.
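A sketch of the linear programme with the error variables ai described above, solved with scipy.optimize.linprog on made-up data (the gap of 1 in the constraints is a common normalisation to rule out the trivial all-zero solution, not something stated in the lecture):

```python
# Sketch: LP scorecard.  Minimise total error  sum_i a_i  subject to
#   goods: w.x_i >= c + 1 - a_i,   bads: w.x_i <= c - 1 + a_i,   a_i >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(8)
X_good = rng.normal(1.0, 1.0, size=(40, 2))   # made-up application characteristics
X_bad  = rng.normal(-1.0, 1.0, size=(20, 2))
nG, nB, p = len(X_good), len(X_bad), 2
n = nG + nB

# Decision variables, in order: [w_1..w_p, c, a_1..a_n]
cost = np.concatenate([np.zeros(p + 1), np.ones(n)])

A, b = [], []
for i, x in enumerate(X_good):                # -w.x + c - a_i <= -1
    row = np.zeros(p + 1 + n)
    row[:p], row[p], row[p + 1 + i] = -x, 1.0, -1.0
    A.append(row)
    b.append(-1.0)
for i, x in enumerate(X_bad):                 #  w.x - c - a_i <= -1
    row = np.zeros(p + 1 + n)
    row[:p], row[p], row[p + 1 + nG + i] = x, -1.0, -1.0
    A.append(row)
    b.append(-1.0)

bounds = [(None, None)] * (p + 1) + [(0, None)] * n
res = linprog(cost, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
w, cutoff = res.x[:p], res.x[p]
print(w, cutoff, res.fun)                     # weights, cutoff, total error
```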
Example of non-linear relationship
% of goods, by residential status and phone ownership:
Residential status | Phone | No phone | Total
Owner | 90 | 30 | 86
Rent furnished | 60 | 45 | 50
The effect of having a phone differs sharply by residential status (a 60-point difference in the good rate for owners, but only 15 points for furnished renters): an interaction that a purely additive (linear) scorecard cannot capture.