Problem Set 2 – Lending
Asaf Manela

In your report, please answer the questions in the order given here. Submit your script on Canvas in the same notebook (.ipynb) file as this problem set.
In this assignment, you will build a credit model and use it to predict consumer loan defaults. Specifically, you will study how default-prediction performance changes as you bring in more of the available information.
You will start with a sample database of 10,000 three-year consumer loans, “ps2data.csv”. The database contains the following details for each borrower: ID, Loan Amount, Interest Rate (annual, in percent), Credit Grade, Detailed Credit Grade, Length of Employment, Revolving Credit Balance, Revolving Credit Utility (in percent), FICO score, Home Ownership Status, Annual Income, Employment Verification, Loan Status, and Default status of the loan.

Problem 1: Basic Model Estimate – the case with FICO score only
In this part, we will work with the FICO score only and see how well it predicts default.
(a) Transform the variable default into a numeric format that Julia can work with (1 for default and 0 for non-default).
In [ ]:
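As a rough illustration of the 0/1 encoding asked for in part (a), the sketch below uses only base Julia. The raw labels `"Yes"` and `"No"` and the name `raw_default` are assumptions made for this example; the actual values stored in ps2data.csv may differ.

```julia
# Hypothetical raw labels for illustration; check the actual values in ps2data.csv.
raw_default = ["No", "Yes", "No", "No", "Yes"]

# The elementwise comparison gives a BitVector; Int.() converts it to 0/1.
default = Int.(raw_default .== "Yes")

println(default)
```

The same broadcasting pattern applies to a column loaded from the CSV, once you know which label marks a defaulted loan.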

(b) Before estimating the model, what sign do you expect to see for the coefficient of fico score in a logistic regression? Give a brief explanation. One to two sentences should suffice.

Answer:

(c) Write a Julia script to estimate a logistic regression model with FICO score as the predictor. Report your estimates for the intercept and the coefficient for fico. Does your result suggest that the FICO score has significant explanatory power?
In [ ]:
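To make concrete what the estimation in part (c) involves, here is a minimal sketch that fits a one-predictor logit by Newton-Raphson using only Julia's standard library. The helper names `sigmoid` and `fit_logit`, the ten-row toy data, and the rescaling of FICO are all assumptions for illustration. This is a hand-rolled stand-in, not the method the assignment prescribes, and it reports no standard errors, so it cannot by itself answer the significance question.

```julia
using LinearAlgebra

# Logistic function: maps a linear index to a probability in (0, 1).
sigmoid(z) = 1 / (1 + exp(-z))

# Newton-Raphson for the logit log-likelihood.
# X must already contain a column of ones for the intercept.
function fit_logit(X::AbstractMatrix, y::AbstractVector; iters=25, tol=1e-10)
    β = zeros(size(X, 2))
    for _ in 1:iters
        p = sigmoid.(X * β)
        grad = X' * (y .- p)                 # score vector
        H = X' * ((p .* (1 .- p)) .* X)      # negative Hessian
        delta = H \ grad
        β += delta
        norm(delta) < tol && break
    end
    return β
end

# Toy data for illustration only (not taken from ps2data.csv).
fico = [620, 640, 660, 680, 700, 720, 740, 760, 780, 800]
y    = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

# Center and rescale FICO so the slope is per 100 points (an assumption of this sketch).
x = (fico .- 700) ./ 100
X = hcat(ones(length(x)), x)

β = fit_logit(X, y)
println("intercept = ", β[1], ", slope per 100 FICO points = ", β[2])
```

Because the toy defaults cluster at low scores, the fitted slope comes out negative, which is the pattern part (b) asks you to reason about.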

Answer:

Problem 2: Model Evaluation
Now that you have the first fitted model, you are expected to evaluate its usefulness.
(a) Use your fitted model to estimate each loan’s probability of default.
In [ ]:
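For part (a), a fitted logit turns each loan's linear index into a probability through the logistic function. The sketch below shows that step in base Julia; the coefficient values `β0` and `β1` are made up purely for illustration and should be replaced with your own estimates.

```julia
# Logistic function: maps a linear index to a probability in (0, 1).
sigmoid(z) = 1 / (1 + exp(-z))

# Made-up coefficients for illustration only; substitute your own estimates.
β0, β1 = 8.0, -0.015

# A few hypothetical FICO scores.
fico = [620, 700, 780]

# Predicted probability of default for each loan.
p_default = sigmoid.(β0 .+ β1 .* fico)

println(p_default)
```

With a negative slope, the predicted probability falls as the FICO score rises.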

(b) Using the estimated probabilities, create an ROC curve plot. You may use the code provided below for this part of the problem.
In [1]:
using Plots, Printf
using MLBase: roc, ROCNums, Forward, Reverse, false_positive_rate, true_positive_rate

function auc_from_rates(fpr::AbstractVector{T}, tpr::AbstractVector{R}) where {T,R}
    O = promote_type(T,R)
    dfpr = diff(fpr)
    dtpr = diff(tpr)
    a = zero(O); b = zero(O)
    for i in 1:length(fpr)-1
        a += tpr[i] * dfpr[i]
        b += dtpr[i] * dfpr[i]
    end
    a + b / 2
end

"Area under curve (AUC) from an array of ROCNums"
function auc(rc::AbstractArray{T}) where {T <: ROCNums}
    rn = reverse(rc)
    auc_from_rates(false_positive_rate.(rn), true_positive_rate.(rn))
end

function appendauc(showauc, label, rc)
    if showauc
        a = @sprintf "%4.3f" auc(rc)
        label = "$label ($a)"
    end
    label
end

"""
Plots an ROC curve based on ground-truths ``gt``, ``scores`` by considering 100 evenly-spaced thresholds.

Prediction will be made as follows:

- When ``ord = Forward``: predicts ``1`` when ``scores[i] >= thres`` otherwise 0.
- When ``ord = Reverse``: predicts ``1`` when ``scores[i] <= thres`` otherwise 0.

When ``ord`` is omitted, it defaults to ``Forward``.
"""
function roccurve(gt, scores, ord=Forward; subplot=1, showauc=true, label="", kwargs...)
    rc = roc(gt, float.(scores), ord)
    label = appendauc(showauc, label, rc)
    plot(100*false_positive_rate.(rc), 100*true_positive_rate.(rc); label=label, subplot=subplot, kwargs...)
    plot!(0:100,0:100,l=:dash, linecolor=:black, label=""; subplot=subplot)
    ylabel!("True positive rate (% Sensitivity)")
    xlabel!("False positive rate (% 1 - Specificity)")
end

"Add another curve to existing ROC curve"
function roccurve!(gt, scores, ord=Forward; showauc=true, label="", kwargs...)
    rc = roc(gt, float.(scores), ord)
    label = appendauc(showauc, label, rc)
    plot!(100*false_positive_rate.(rc), 100*true_positive_rate.(rc); label=label, kwargs...)
end
Out[1]:
auc
In [ ]:

(c) What is the area under the curve? Is the area under your ROC curve greater than 0.5? Report the area. You can compute it with the function roc_auc_score from ScikitLearn.jl.
In [ ]:

Answer:

(d) Suppose you adopt a policy that rejects all loan applicants whose predicted probability of default is greater than 10%. What proportion of consumers do you mistakenly reject? (The denominator should be the total number of people who do not default.) What proportion of consumers do you correctly reject? (The denominator should be the total number of people who defaulted.)
In [ ]:

Answer:

Problem 3: With Slightly Richer Data
In this part, you will conduct further analysis that takes advantage of the additional variables available. Run a logistic regression with fico, loan amount, interest rate, detailed credit grade, revolving credit balance, and employment verification. Treat employment verification as two groups: non-verified and everything else.
(a) Which variables should be transformed to factors in Julia?
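The two-group treatment of employment verification described in the Problem 3 introduction could be sketched in base Julia as follows. The raw labels (`"Not Verified"`, `"Verified"`, `"Source Verified"`) and the variable names are assumptions for this example; check the actual values in ps2data.csv before reusing the comparison string.

```julia
# Hypothetical raw labels for illustration; the real column may use different strings.
verification = ["Not Verified", "Verified", "Source Verified", "Not Verified"]

# true for anything other than non-verified, false for non-verified.
verified = verification .!= "Not Verified"

println(verified)
```

Collapsing the column this way leaves a single binary indicator, so the regression estimates one coefficient for verification rather than one per raw category.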
Answer:
In [ ]:

(b) Run a logistic regression using the explanatory variables listed above. Report your logistic regression results. The intercept and coefficient estimates alone suffice, and you can skip the coefficients for categorical variables.
In [ ]:

Answer:

(c) Use the estimated probabilities to create an ROC curve plot for the new model, as you did in Problem 2. Plot the new ROC curve together with the one you obtained in Problem 2.
In [ ]:

(d) Is the area under your new ROC curve greater than that in Problem 2? Report the area. Interpret the difference.
In [ ]:

Answer:

In [ ]:
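For the rejection rule in Problem 2(d), the two proportions are the false positive rate and the true positive rate at a 10% threshold. The sketch below computes both in base Julia on toy inputs; the vectors `default` and `p_default` are invented for illustration and do not come from ps2data.csv.

```julia
# Toy inputs for illustration only (not from ps2data.csv).
default   = [0, 0, 1, 0, 1, 0, 1, 0]
p_default = [0.05, 0.12, 0.30, 0.08, 0.09, 0.15, 0.22, 0.03]

# Policy: reject when the predicted probability of default exceeds 10%.
reject = p_default .> 0.10

# Share of non-defaulters who are rejected (mistaken rejections).
mistaken_reject = sum(reject .& (default .== 0)) / sum(default .== 0)

# Share of defaulters who are rejected (correct rejections).
correct_reject = sum(reject .& (default .== 1)) / sum(default .== 1)

println("mistakenly rejected: ", mistaken_reject)
println("correctly rejected:  ", correct_reject)
```

Each rate uses its own denominator, matching the definitions given in the question.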