SEMESTER 2 2018/19 COURSEWORK BRIEF:
Module Code: MANG6054
Assessment: Individual Coursework
Weighting: 100%
Module Title: Credit Scoring & Data Mining
Module Leader: Cristian Bravo
Submission Due Date: 31 May 2019 @ 16:00
Word Count: 2000
Method of Submission: Electronic via Blackboard Turnitin ONLY
(Please ensure that your name does not appear on any part of your work)
Any work submitted after 16:00 on the deadline date will be subject to the standard University late penalties (see below), unless an extension has been granted, in writing by the Senior Tutor, in advance of the deadline.
University Working Days Late | Mark
1 | (final agreed mark) * 0.9
2 | (final agreed mark) * 0.8
3 | (final agreed mark) * 0.7
4 | (final agreed mark) * 0.6
5 | (final agreed mark) * 0.5
More than 5 | 0
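As an illustration only, the penalty table above can be written as a small function. This is a sketch: the name `late_mark` and its arguments are hypothetical, and the University's own regulations remain the authoritative source.

```python
def late_mark(final_agreed_mark: float, working_days_late: int) -> float:
    """Apply the late-penalty table: 10% off per University working day late,
    and a mark of 0 once the work is more than 5 working days late."""
    if working_days_late < 0:
        raise ValueError("working_days_late must be non-negative")
    if working_days_late > 5:
        return 0.0
    # 0 days late keeps 100% of the mark, 5 days late keeps 50%.
    return final_agreed_mark * (100 - 10 * working_days_late) / 100


print(late_mark(60, 5))  # 30.0
```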
This assessment relates to the following module learning outcomes:
A. Knowledge and Understanding
A1. Understand the potential of KDD and data mining for developing scorecards.
B. Subject Specific Intellectual and Research Skills
B1. Work with software to develop credit scoring solutions; develop a scorecard using data mining techniques.
C. Transferable and Generic Skills
C1. Critically analyse practical difficulties that arise when implementing scorecards; understand the cross-fertilisation potential to other business contexts (e.g. fraud detection, CRM).
Coursework Brief: Guidelines
• All questions need to be answered!
• The coursework should be handed in as one report via TurnItIn.
• You should submit your coursework via TurnItIn by 4pm on the above-mentioned date.
• The report should contain page numbers and your student ID as a header on each page of the report. Marks will be deducted for not including your student ID.
• Only answer what is asked for, do not include any irrelevant material and/or appendices (marks will be deducted if you do)!
• Make sure not to exceed 2000 words!
Question 0 (15 marks)
These are marks assigned to the:
• Structure, formatting and layout of your report (including page numbers and header; see above)
• Writing style and language use
Question 1 (50 marks)
The dataset “MicroSyn.csv” contains information on 50,000 loans to small businesses, provided by a company in the UK. The following variables are available to you:
Variable | Description | Variable Type
AppNo | Application Number | ID (DON’T USE)
Region | County the customer lives in | Categorical
Area | Section of county the customer lives in | Categorical
Activity | Economic activity of the customer (coded) | Categorical
Guarantor | Does the customer provide a guarantor? | Binary
Collateral | Does the customer provide collateral? | Binary
Collateral_valuation | Value of the collateral (GBP) | Numerical
Age | Customer Age | Numerical
Properties_Status | Status of the property the customer lives in (A: Owner, B: Renter, C: Granted, D: Other) | Categorical
Properties_Total | Total number of properties the customer owns | Numerical
Amount | Loan amount (GBP) | Numerical
Term | Term of the loan (months) | Numerical
Historic_Loans | Lifetime total number of loans the customer has requested | Numerical
Current_Loans | Total number of loans the customer is actively repaying (excluding this one) | Numerical
Max_Arrears | Maximum number of days the customer has been in arrears in previous or current loans (excluding this one) | Numerical
Defaulter | If customer defaulted the current loan (TARGET) | Binary
1.1 Carefully preprocess the data set by considering the following activities (20 marks):
• exploratory data analysis
• missing value handling (if any). Marks will be deducted for simply replacing missing values with a single value; a proper study of the missing values is necessary.
• outlier detection and treatment (if any). Marks will be deducted for simply eliminating outliers or replacing them with a value without justification; a proper study of the outliers is necessary.
• categorisation of the continuous variables (if deemed useful)
• coding the nominal variables using Weights of Evidence (note that some additional coarse classification might be needed).
• splitting the data set into a training and test set. Each student should do this individually in a random way, using their student ID as a random seed where possible. Hence, it is very implausible that students come up with the same parameter estimates! Special attention will be given to students who come up with the same parameter estimates.
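The last two activities can be illustrated with a short Python sketch. It is not part of the required SAS Enterprise Miner workflow and only shows the underlying calculations. It assumes the common convention WoE = ln(share of goods / share of bads); some texts use the inverse ratio. The function names, the 30% test fraction, the example student ID and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter


def weights_of_evidence(categories, defaults):
    """Return {category: WoE}, with WoE = ln(share of goods / share of bads).

    `defaults` holds 1 for a defaulter (bad) and 0 otherwise (good).
    Assumes every category has at least one good and one bad; categories
    that do not should first be merged (coarse classification).
    """
    goods = Counter(c for c, d in zip(categories, defaults) if d == 0)
    bads = Counter(c for c, d in zip(categories, defaults) if d == 1)
    total_goods, total_bads = sum(goods.values()), sum(bads.values())
    return {
        c: math.log((goods[c] / total_goods) / (bads[c] / total_bads))
        for c in set(categories)
    }


def split_indices(n_rows, student_id, test_fraction=0.3):
    """Randomly split row indices into (train, test), seeded by the student ID."""
    rng = random.Random(student_id)  # same ID always gives the same split
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Toy data, invented for illustration (not taken from MicroSyn.csv).
status = ["A", "A", "A", "B", "B", "B", "B", "A"]
defaulter = [0, 0, 1, 0, 1, 1, 0, 0]
woe = weights_of_evidence(status, defaulter)
train_idx, test_idx = split_indices(len(status), student_id=12345678)
```

In this toy example category A holds 3 of the 5 goods and 1 of the 3 bads, so its WoE is ln(0.6 / 0.333) = ln(1.8), a positive value indicating lower-than-average risk under this convention. WoE values are usually estimated on the training set only and then applied to the test set.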
1.2 Estimate a scorecard using a logistic regression classifier and report the following (20 marks):
• The most important variables
• The impact of the variables on the target
• The performance of the model. Use various performance metrics and discuss their relationship if any.
Compare this scorecard with the results of a Random Forest run over the data without WoE and with WoE transformations. Discuss your results. Why do most banks use Logistic Regression as their base classifier? What do banks gain and lose by doing this?
In terms of software, use SAS Enterprise Miner. Carefully report the various steps of your methodology and discuss your results in a rigorous way!
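To make the relationship between common performance metrics concrete, the sketch below computes AUC, the Gini coefficient and the Kolmogorov-Smirnov (KS) statistic in plain Python. It is illustrative only, since the analysis itself must be carried out in SAS Enterprise Miner. It assumes that the score is a predicted probability of default (higher means riskier); the function names and toy data are hypothetical.

```python
def auc(scores, labels):
    """AUC: probability that a random defaulter scores higher than a random
    non-defaulter, with ties counted as one half."""
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    return wins / (len(bads) * len(goods))


def gini(scores, labels):
    """Gini coefficient, a linear rescaling of AUC: Gini = 2 * AUC - 1."""
    return 2 * auc(scores, labels) - 1


def ks_statistic(scores, labels):
    """KS: largest gap between the cumulative score distributions of
    defaulters and non-defaulters."""
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    return max(
        abs(sum(s <= t for s in bads) / len(bads)
            - sum(s <= t for s in goods) / len(goods))
        for t in set(scores)
    )


# Toy scores, invented for illustration.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels), gini(scores, labels), ks_statistic(scores, labels))
# 0.75 0.5 0.5
```

Because Gini is a linear function of AUC, the two always rank models identically, whereas KS measures separation at a single cut-off and can disagree with them. The pairwise AUC loop is quadratic in the number of rows, so a rank-based calculation would be preferable on a data set of 50,000 loans.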
Question 2 (35 marks)
Find an academic or business paper published in 2018 or later discussing a real-life application of data mining or credit scoring. The paper must be published in a journal rated 3 or 4 in the ABS journal list 2018 (available at https://charteredabs.org/academic-journal-guide-2018/). It is important that the case considered is a real-life case and not an artificial one. Some suggested journals on this list are:
• Informs (http://www.informs.org/), e.g.
o Informs Journal on Computing
o Informs Management Science
o Informs Operations Research
• Elsevier (www.elsevier.com), e.g.
o European Journal of Operational Research
o Journal of the Operational Research Society
o Omega
o Computers and Operations Research
o Machine Learning
o Expert Systems with Applications
• Oxford University Press (http://www.oxfordjournals.org/), e.g.
o IMA Journal of Management Mathematics
• Springer
o Data Mining and Knowledge Discovery
Once you have found an appropriate paper, report the following in separate sections:
• Title, authors and complete citation (journal name, book title, issue, year, …)
• The data mining problem considered
• The data mining techniques used
• The results reported
• A critical discussion of the model and results (assumptions made, shortcomings, limitations, …)
Make sure you demonstrate that you understand what the article is all about!
Do not copy and paste from the article. Using Turnitin, this will be easily detected!
Nature of Assessment: This is a SUMMATIVE ASSESSMENT. See ‘Weighting’ section above for the percentage that this assignment counts towards your final module mark.
Word Limit: +/-10% of the word count (see above) is deemed to be acceptable. Any text that exceeds an additional 10% will not attract any marks. The relevant word count includes items such as cover page, executive summary, title page, table of contents, tables, figures, in-text citations and section headings, if used. The relevant word count excludes your list of references and any appendices at the end of your coursework submission.
You should always include the word count (from Microsoft Word, not Turnitin), at the end of your coursework submission, before your list of references.
Title/Cover Page: You must include a title/cover page that includes: your Student ID, Module Code, Assignment Title, Word Count. This assignment will be marked anonymously; please ensure that your name does not appear on any part of your assignment.
References: You should use the Harvard style to reference your assignment. The library provides guidance on how to reference in the Harvard style and this is available from: http://library.soton.ac.uk/sash/referencing
Submission Deadline: Please note that the submission deadline for Southampton Business School is 16.00 for ALL assessments.
Turnitin Submission: The assignment MUST be submitted electronically via Turnitin, which is accessed via the individual module on Blackboard. Further guidance on submitting assignments is available on the Blackboard support pages.
It is important that you allow enough time prior to the submission deadline to ensure your submission is processed on time as all late submissions are subject to a late penalty. We would recommend you allow 30 minutes to upload your work and check the submission has been processed and is correct. Please make sure you submit to the correct assignment link.
You will know that your submission has completed successfully when you see a message stating ‘Congratulations – your submission is complete…’. It is vital that you make a note of your Submission ID (Digital Receipt Number). This is a unique receipt number for your submission, and is proof of successful submission. You may be required to provide this number at a later date. We recommend that you take a screenshot of this page, or note the number down on a piece of paper. You should also receive an email receipt containing this number, and the number can be found after submitting by following this guide. This method of checking your submission is particularly useful in the event that you don’t receive an email receipt.
You are allowed to test submit your assignment via Turnitin before the due date. You can use Turnitin to check your assignment for plagiarism before you submit your final version. See “Viewing Your Originality Report” for guidance. Please see the Module Leader/lecturer on your module if you would like advice on the Turnitin Originality report.
The last submission prior to the deadline will be treated as the final submission and will be the copy that is assessed by the marker.
It is your responsibility to ensure that the version received by the deadline is the final version; resubmissions after the deadline will not be accepted under any circumstances.
Important: If you have any problems during the submission process you should contact ServiceLine immediately by email at Serviceline@soton.ac.uk or by phone on +44 (0)23 8059 5656.
Late Penalties: Further information on penalties for work submitted after the deadline can be found here.
Special Considerations: If you believe that illness or other circumstances have adversely affected your academic performance, information regarding the regulations governing Special Considerations can be accessed via the Calendar: http://www.calendar.soton.ac.uk/sectionIV/special-considerations.html
Extension Requests: Extension requests along with supporting evidence should be submitted to the Student Office as soon as possible before the submission date. Information regarding the regulations governing extension requests can be accessed via the Calendar: http://www.calendar.soton.ac.uk/sectionIV/special-considerations.html
Academic Integrity Policy: Please note that you can access Academic Integrity Guidance for Students via the Quality Handbook: http://www.southampton.ac.uk/quality/assessment/academic_integrity.page?. Please note that any suspected breaches of Academic Integrity will be reported to the Academic Integrity Officer for investigation.
Feedback: Southampton Business School is committed to providing feedback within 4 weeks (University working days). Once the marks are released and you have received your feedback, you can meet with your Module Leader / Module Lecturer / Personal Academic Tutor to discuss the feedback within 4 weeks from the release of marks date. Any additional arrangements for feedback are listed in the Module Profile.
Student Support: Study skills and language support for Southampton Business School students is available at: http://www.sbsaob.soton.ac.uk/study-skills-and-language-support/.