1. Description of the Data Set
This data set is intended for predicting customer churn and suggesting actions to prevent it, a classic application of predictive analytics. Acquiring new customers usually costs much more than retaining existing ones. It is therefore of great importance for banks, and for other companies as well, to identify potential churners and find the right measures to win back their loyalty.
We greatly appreciate that UniCredit has agreed to sponsor this data set for academic purposes. The transmitted data must be treated as strictly confidential by all parties involved and may be used only for the academic purposes of the seminar “Data Science for Business”, WiSe 2020/21 at LMU.
The data set describes customers of a fictitious mobile-only bank, “DSC PocketBank”. For simplicity, DSC PocketBank offers basic banking services such as current accounts, credit cards, and debit cards in a convenient and easy-to-use way.
2. Classification Task
You receive two data sets: one contains information on the customers (e.g. age, residence, and customer status), and the other contains the customers' transactions over the last three months. Your task is to predict customer churn as accurately as possible. Some feature engineering is required to achieve good results. In addition, external data such as exchange rates can be useful to make transactions comparable. Your results will be evaluated using the Matthews Correlation Coefficient (MCC), a good performance measure for this classification task. As a data scientist, it is key to turn your insights into actions. You should therefore also suggest commercial actions.
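For reference, the MCC is computed from the four cells of the binary confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch in plain Python. The function name `matthews_corrcoef_binary` and the toy labels are illustrative assumptions and not part of the seminar material. Returning 0.0 when the denominator is zero is a common convention, assumed here.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """MCC for binary labels coded like churn_dummy (1 = churn, 0 = no churn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-churners correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed churners
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: MCC is reported as 0 when it is undefined
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, invented purely for illustration
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
mcc = matthews_corrcoef_binary(y_true, y_pred)
print(round(mcc, 4))
```

If scikit-learn is available, `sklearn.metrics.matthews_corrcoef` computes the same quantity and is probably the more convenient choice in practice.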
3. Feature Overview
A. Customer Characteristics
For each customer we have the following information:
Feature | Type | Exemplary Value | Description
--- | --- | --- | ---
id | String | ef63MaOSe0T16LYA152I3f47 | unique randomly generated id
gender | String | M = male, F = female | gender of the subject
tenure_days | Integer | 250 | number of days since the subject became a customer
age | Integer | 36 | age of the subject in years
primary_address_city | String | MILANO | city where the subject lives (* denotes “other”, i.e. cities with only a few customers)
italian_citizenship_dummy | Integer | 1 = Italian citizen, 0 = other citizen |
student_dummy | Integer | 1 = student, 0 = no student |
worker_dummy | Integer | 1 = worker, 0 = no worker |
promo_dummy | Integer | 1 = having a promotion, 0 = not having a promotion | customer was acquired through a promotion (“voucher”)
premium_dummy | Integer | 1 = owning a premium account, 0 = not owning a premium account | customer pays more for more services
churn_dummy | Integer | 1 = having closed the account, i.e. churn; 0 = having not closed the account | dependent variable
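The customer features listed above can be read into typed records along the following lines. This is only a sketch under stated assumptions: the delivery format is not specified in this document, so a CSV file with a header row matching the feature names is assumed, and the `Customer` class, the `parse_customer` helper, the second customer id, and all dummy values in the sample rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Customer:
    id: str
    gender: str
    tenure_days: int
    age: int
    primary_address_city: str
    italian_citizenship_dummy: int
    student_dummy: int
    worker_dummy: int
    promo_dummy: int
    premium_dummy: int
    churn_dummy: int


# Columns typed as Integer in the feature overview
INT_FIELDS = [
    "tenure_days", "age", "italian_citizenship_dummy", "student_dummy",
    "worker_dummy", "promo_dummy", "premium_dummy", "churn_dummy",
]


def parse_customer(row):
    """Convert one CSV row (all strings) into a typed Customer record."""
    values = dict(row)
    for name in INT_FIELDS:
        values[name] = int(values[name])
    return Customer(**values)


# Invented sample rows; the CSV layout itself is an assumption
SAMPLE_CSV = """\
id,gender,tenure_days,age,primary_address_city,italian_citizenship_dummy,student_dummy,worker_dummy,promo_dummy,premium_dummy,churn_dummy
ef63MaOSe0T16LYA152I3f47,M,250,36,MILANO,1,0,1,0,0,0
hypothetical0000000000id,F,90,22,*,0,1,0,1,0,1
"""

customers = [parse_customer(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]

# Share of churners, useful as a first check on class imbalance
churn_rate = sum(c.churn_dummy for c in customers) / len(customers)
print(churn_rate)
```

Checking the churn rate early is worthwhile because it shows how unbalanced the two classes are before any model is trained.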
B. Transaction Characteristics
For each customer, we have several transactions:
Feature | Type | Exemplary Value | Description
--- | --- | --- | ---
ids_id | String | 02eMeOS5aTL8c504YA7aeI93 | the foreign key for linking to the customers table
tp_mov_gk | String | ATMC = ATM cash withdrawal; COMM = fees (e.g. exchange fee); MOTO = Mail Order/Telephone Order (MOTO) payment; POSC = Contactless POS payment; POSF = POS payment; POSR = Recurrent payment; POSV = e-commerce POS; RFND = Refund; STPO = Payment Cancellation refund; VERS = Card recharge (with cash) | transaction type (* denotes “other”)
co_naz_iso | String | ITA | country where the transaction took place
dv_mov | String | EUR | currency
im_mov | Double | -6.85, 3.50 | transaction amount
tipo_carta | String | CD = debit card, CC = credit card | card type
new_dt_ope | Timestamp | 02.08.2019 14:37:00 | date and hour of the transaction
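As an illustration of the kind of feature engineering the transaction table allows, the sketch below converts amounts to EUR and aggregates them per customer. Several points are assumptions rather than facts from this document: the exchange rates are hypothetical placeholders, negative `im_mov` values are treated as outgoing payments, the timestamp is read as day.month.year, and the function name `aggregate_transactions` and the sample transactions are invented.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rates (EUR per one unit of currency); real rates would have to
# come from an external source.
EUR_PER_UNIT = {"EUR": 1.0, "USD": 0.9}

# Invented transactions that reuse the column names of the transaction table
transactions = [
    {"ids_id": "02eMeOS5aTL8c504YA7aeI93", "tp_mov_gk": "POSC", "dv_mov": "EUR",
     "im_mov": -6.85, "new_dt_ope": "02.08.2019 14:37:00"},
    {"ids_id": "02eMeOS5aTL8c504YA7aeI93", "tp_mov_gk": "ATMC", "dv_mov": "USD",
     "im_mov": -20.00, "new_dt_ope": "05.08.2019 09:10:00"},
    {"ids_id": "02eMeOS5aTL8c504YA7aeI93", "tp_mov_gk": "VERS", "dv_mov": "EUR",
     "im_mov": 3.50, "new_dt_ope": "03.08.2019 18:00:00"},
]


def aggregate_transactions(transactions, eur_per_unit):
    """Build per-customer features keyed by ids_id."""
    features = defaultdict(
        lambda: {"n_tx": 0, "spent_eur": 0.0, "received_eur": 0.0, "last_tx": None}
    )
    for tx in transactions:
        amount_eur = tx["im_mov"] * eur_per_unit[tx["dv_mov"]]
        # Assumption: day-first timestamp format
        ts = datetime.strptime(tx["new_dt_ope"], "%d.%m.%Y %H:%M:%S")
        f = features[tx["ids_id"]]
        f["n_tx"] += 1
        if amount_eur < 0:
            f["spent_eur"] += -amount_eur  # assumption: negative = outgoing
        else:
            f["received_eur"] += amount_eur
        if f["last_tx"] is None or ts > f["last_tx"]:
            f["last_tx"] = ts
    return dict(features)


per_customer = aggregate_transactions(transactions, EUR_PER_UNIT)
print(per_customer["02eMeOS5aTL8c504YA7aeI93"])
```

The resulting dictionary is keyed by `ids_id`, so each entry can be joined to the matching customer record via `id`. Counts per transaction type or per card type could be added in the same loop.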