FM 9528 – Banking Analytics Coursework 2
Coursework 2 – Credit Risk Analytics
Freddie Mac is one of the largest actors in the secondary mortgage market. Financial institutions sell their mortgages to Freddie Mac, which stimulates the market: banks know that, if they evaluate their customers correctly, they can reduce their risk by having a larger institution bear the burden of a default should one occur. Freddie Mac, in turn, evaluates the portfolios on offer to make informed decisions about which mortgages should enter its own portfolio. Freddie Mac was chartered by the US Congress, and its mission is to “provide liquidity, stability and affordability to the nation”. As part of its transparency push, it makes available the Single-Family Loan Dataset¹, which, as the name implies, covers mortgages to single-family households.
In this coursework, you will develop a fully compliant PD model from this dataset, going from the raw data to the level 2 calibration, using what you have learned in the lectures. The objective of the coursework is to estimate the capital requirements for Freddie Mac as if it were a bank.
You are given information on approximately two million loans originated between 2014 and 2018. The data includes information from the origination of the loan (variables present in the table “Origination Data File” in the user guide), which can be used to predict the performance of the loan, and information from the last time the loan was observed in the dataset (variables present in the table “Monthly Performance File” in the user guide), which is NOT for prediction but can be used to understand costs and benefits for each loan (questions 5 and 6). You are also given a default flag (variable “Default”), which marks whether the mortgage has defaulted for the purposes of the coursework.
With this information, the dataset, and your knowledge from the course, answer the following questions:
1. (25%) Clean the dataset so it is ready to apply models to it. Discuss all your decisions.
2. (15%) Calculate the WoE and perform the variable selection procedures you see fit. Explain your decisions.
3. (20%) Construct a scorecard which can model the probability of default for the loans. Discuss your choice of variables, embedded selection methods, choice of parameters of these and your final performance in terms of AUC. How many variables do you recommend using?
4. (20%) Compare your scoring model with an XGBoost model and a Random Forest model trained over the data without the WoE transformation. Use cross-validation to determine your optimal parameters if necessary. Discuss the accuracy metrics you deem relevant. Compare the performance of the three models and discuss your findings.
5. (10%) Discuss the variable importance for all models. Do they agree? Why? Design a two-cut-off-point strategy for your scorecard and discuss its results.
6. (Extra credit, 20%; see the extra submission tab in OWL) Using the monthly macroeconomic information you consider relevant (see for example https://stats.oecd.org/Index.aspx), calibrate a long-run PD for the loans granted. For this, first segment your scorecard curve into 7 to 15 groups, then regress the monthly Freddie Mac PDs (grouped from your objective variable) against the macroeconomic variables and the past PDs, as discussed in the additional material left in OWL. Use the long-term forecasts you can find online from reputable sources (for example the OECD) for your long-term calibrated values. If you cannot find them, assume a value which makes sense to you and explain why. Analyse your results.
1 http://www.freddiemac.com/research/datasets/sf_loanlevel_dataset.page
The remaining 10% is given by the format and style as discussed in the rubric.
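As an illustration of the WoE calculation requested in question 2, the following is a minimal sketch only, not a prescribed method. It assumes the convention WoE = ln(share of non-defaulted loans in the bin / share of defaulted loans in the bin); some references use the opposite sign. The function name `weight_of_evidence`, the bin labels and the toy data are all hypothetical and are not taken from the Freddie Mac dataset. The sketch also assumes that every bin contains at least one defaulted and one non-defaulted loan.

```python
import math
from collections import defaultdict


def weight_of_evidence(bins, defaults):
    """Return ({bin: WoE}, information value) for one binned variable.

    Assumed convention: WoE = ln(share of goods in bin / share of bads in bin).
    Assumes every bin holds at least one good and one bad loan.
    """
    goods = defaultdict(int)
    bads = defaultdict(int)
    for b, d in zip(bins, defaults):
        if d == 1:
            bads[b] += 1
        else:
            goods[b] += 1
    total_goods = sum(goods.values())
    total_bads = sum(bads.values())

    woe = {}
    iv = 0.0
    for b in sorted(set(goods) | set(bads)):
        dist_good = goods[b] / total_goods  # share of all goods in this bin
        dist_bad = bads[b] / total_bads     # share of all bads in this bin
        woe[b] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[b]
    return woe, iv


# Toy data, invented for illustration only.
toy_bins = ["low"] * 10 + ["high"] * 10
toy_defaults = [1] * 4 + [0] * 6 + [1] * 1 + [0] * 9
woe, iv = weight_of_evidence(toy_bins, toy_defaults)
```

Under this convention, a bin with a negative WoE holds a larger share of the defaults than of the non-defaults. The accumulated information value is one possible input to a variable selection procedure.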
Conditions of the coursework
Software: You must use Python to run the numerical calculations over your portfolio. A copy of your Jupyter notebook must be attached to the coursework as an appendix in readable format, and a link to the notebook must also be included. Instructions on how to export to PDF can be found here: https://stackoverflow.com/questions/52588552/google-co-laboratory-notebook-pdf-download. The notebook text MUST be machine readable (so no screenshots of the notebook, please); otherwise a 25% discount will apply.
Word Limit: 2000 words. A margin of +/-10% either side of the word count is deemed to be acceptable. Any text that exceeds the additional 10% will not attract any marks. The relevant word count includes items such as cover page, executive summary, title page, table of contents, tables, figures, in-text citations and section headings, if used. The relevant word count excludes your list of references and any appendices at the end of your coursework submission (including the code).
You should always include the word count (from Microsoft Word, not Turnitin), at the end of your coursework submission, before your list of references.
Title/Cover Page: You must include a title/cover page that includes your Student ID, Course Code, Assignment Title and Word Count. This assignment will be marked anonymously; please ensure that your name does not appear on any part of your assignment, otherwise a discount will be applied.
Submission Deadline: November 18th, 23:59.
Turnitin Submission: The assignment MUST be submitted electronically via OWL. All required papers may be subject to submission for textual similarity review to the commercial plagiarism detection software under license to the University for the detection of plagiarism. All papers submitted for such checking will be included as source documents in the reference database for the purpose of detecting plagiarism of papers subsequently submitted to the system. Use of the service is subject to the licensing agreement, currently between The University of Western Ontario and Turnitin.com (http://www.turnitin.com).
Late Submission: Late submissions are possible up to seven days after the deadline. There is a linear penalty of 10% per day of late submission (Final mark = Original mark – 10% × days late), subtracted directly from the final mark. Submissions made more than seven days after the deadline are not accepted and will be considered a non-submission.
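The penalty arithmetic can be sketched as below. This is an illustration under stated assumptions, not an official calculator: it assumes marks are on a 0 to 100 scale, that "10%" means 10 percentage points per day, and that the result is floored at zero (the floor is an assumption not stated above). The function name `late_mark` is hypothetical.

```python
def late_mark(original_mark, days_late):
    """Mark after the late penalty, on an assumed 0-100 scale.

    Assumptions: 10 percentage points are deducted per day late and the
    result is floored at 0. More than seven days late is treated as a
    non-submission, represented here by None.
    """
    if days_late > 7:
        return None
    return max(0, original_mark - 10 * days_late)


# Example: an original mark of 75 submitted two days late.
example = late_mark(75, 2)  # 75 - 10 * 2 = 55
```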