Customer Analytics: Individual Research Project
1. Background Information
You are working for a telecommunication provider. The company wants to improve its customer lifetime value (CLV) calculations for newly acquired customers. The key question for the firm's marketing managers is how to account for the fact that it is very difficult to know a customer's relationship duration in advance. Yet customer relationship duration is one of the key pieces of information to be considered in CLV calculations.
When they describe this problem to you, you suggest that you might be able to help. In particular, you believe that you can use a survival model in order to predict customer survival probabilities and then use those probabilities to improve the CLV calculations. You agree with the marketing managers that you will check the data available and perform the necessary analyses for them.
2. Data, Sample and Variables
You are provided with a dataset that contains information about 3,333 randomly sampled customer relationships. The dataset is called "telecom_churn" and is a csv file. The dataset contains the following variables:
Churn: Information whether or not a customer has churned.
AccountWeeks: The duration of the customer relationship as it is reflected in the time the customer had an active account at the firm. The variable is captured in weeks.
DataPlan: Whether or not a customer has a data plan
DataUsage: Gigabytes of average monthly data usage
CustServCalls: How often a customer has called the service hotline
DayMins: Average daytime minutes (calling time) per month
DayCalls: Average number of daytime calls
MonthlyCharge: Average monthly bill
OverageFee: Largest overage fee in the last year
RoamMins: Average number of roaming minutes
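Before fitting any model, it can help to check how the variables listed above are coded. The following is a minimal sketch, not part of the original handout. It assumes that Churn and DataPlan are stored as 0/1 indicators and that the file is named telecom_churn.csv; both assumptions should be verified against the actual data. The data frame name Telco1 follows the suggested code further below, the helper prepare_telco() is hypothetical, and the toy rows are made up purely so that the snippet runs on its own.

```r
# Toy stand-in for: Telco1 <- read.csv("telecom_churn.csv")
# The four rows below are made-up values, NOT taken from the real dataset.
toy <- data.frame(
  Churn         = c(0, 1, 0, 1),          # assumed coding: 1 = churned, 0 = still active
  AccountWeeks  = c(128, 65, 84, 141),    # relationship duration in weeks
  DataPlan      = c(1, 0, 0, 1),          # assumed coding: 1 = has a data plan
  MonthlyCharge = c(89.0, 41.0, 57.1, 93.2)
)

# Hypothetical helper: recode DataPlan as a labelled factor so that
# levels(Telco1$DataPlan) and grouped summaries behave as expected later on.
prepare_telco <- function(df) {
  df$DataPlan <- factor(df$DataPlan, levels = c(0, 1), labels = c("No", "Yes"))
  df
}

Telco1 <- prepare_telco(toy)
str(Telco1)

# Average monthly charge per DataPlan group
avg_charge <- tapply(Telco1$MonthlyCharge, Telco1$DataPlan, mean)
avg_charge
```

With the real data, the toy data frame would be replaced by the result of read.csv(), and the rest of the snippet would stay the same.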
3. Your Tasks
Please use R to perform the following tasks. You can earn a total of 100 points.
1) (10 points) Estimate a base survival model (i.e., without explanatory variables) for an average customer. Call this mod0. Please provide the output and visualize the survival curve.
2) (20 points) Please estimate a model that includes DataPlan as an explanatory variable. Call it mod1. Please provide the output and visualize the survival curve. Would you prefer mod0 or mod1 for predicting customer survival probabilities? Why?
3) (10 points) You want to use the model mod1 to make predictions of survival probabilities to inform your customer acquisition efforts (e.g., which customers should preferably be acquired). Do you see a chance to improve model performance given the data at hand? Please explain your answer.
4) (10 points) You decide to move on with mod1. Critically evaluate the predicted curve. Do you see any reason for concern?
5) (30 points) You decide to use mod1 to calculate the expected CLV for customers without a data plan and customers with a data plan. The annual interest rate is 5% (note that you have to translate this into a weekly discount rate). For an assumed customer lifetime of 500 weeks, please calculate the CLV and the probability-corrected CLV for customers without a data plan and customers with a data plan. Please present the results in a table (i.e., CLV and probability-corrected CLV for both customer prototypes). Should the firm focus on one of the two customer types in its future customer acquisition efforts?
Here is a little helper on how to achieve that: First, you have to derive monthly average cash flows for the two customer prototypes separately from the variable MonthlyCharge (use DataPlan as a grouping variable).
Second, you have to calculate the average weekly cash flows from this data (average monthly cash flow * 12/51); you can use the weekly average cash flow as the cash flow for each of the 500 weeks.
Third, you have to derive the predicted survival probabilities for each customer. We have not done the coding for this in the tutorial but it can be achieved in a few steps. You just have to make sure that you use your own variable names in the code below.
Suggested code for this step
# First install the rms package, which is required to derive predictions for different points in
# time.
install.packages("rms")
library("rms")
# melt() below comes from the reshape2 package, so it has to be loaded as well
# (run install.packages("reshape2") first if it is not installed).
library("reshape2")
# Then you have to rerun mod1 using the psm function that is equivalent to the survreg
# function used in the tutorial
mod1_psm <- psm(Surv(AccountWeeks, Churn) ~ DataPlan, data = Telco1, dist = "weibull")
mod1_psm # This model is the same as the previous model mod1
# We produce a sequence which will define the points in time at which we want predicted survival probabilities from our model.
weeks <- seq(1, 500, by = 1)
# We define the levels of the DataPlan variable for which we want probabilities.
# Note: DataPlan must be a factor in Telco1, otherwise levels() returns NULL.
n.dat <- expand.grid(DataPlan = levels(Telco1$DataPlan))
# We ask the model for predictions for 500 weeks ahead.
b1 <- survest(mod1_psm, newdata = data.frame(n.dat), time = weeks)
# We rearrange the data such that we can easily use it for the cash flow predictions.
b2 <- cbind(n.dat, b1)
b3 <- melt(b2, id.vars = c("DataPlan"), variable.name = "time", value.name = "surv prob")
b3
Fourth, you now have all necessary information to calculate the CLV and probability corrected CLV for both customer prototypes. You can do this either in R or in Excel.
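The calculation in the fourth step can be sketched in R as follows. This is a minimal sketch under stated assumptions, not a reference solution: the weekly discount rate is obtained by compounding, (1 + 0.05)^(1/52) - 1, and cash flows are discounted from week 1 onward; other conventions (for example 0.05/52, or starting at week 0) are possible. The helper clv() is hypothetical, and the monthly cash flow and the Weibull survival curve below are placeholders rather than estimates; they would be replaced by the group mean of MonthlyCharge and the predicted survival probabilities for the respective DataPlan level.

```r
# Weekly discount rate from the 5% annual rate (assumption: compounding over 52 weeks)
annual_rate <- 0.05
weekly_rate <- (1 + annual_rate)^(1 / 52) - 1

# Hypothetical helper: discounted sum of weekly cash flows.
# surv holds the survival probability for each week; the default of 1
# (certain survival) yields the uncorrected CLV.
clv <- function(cash_flow, rate, surv = rep(1, length(cash_flow))) {
  t <- seq_along(cash_flow)               # week index 1, 2, ..., T
  sum(cash_flow * surv / (1 + rate)^t)
}

weeks <- 1:500

# PLACEHOLDER inputs, not results from the data
monthly_cf <- 60                          # replace with the group mean of MonthlyCharge
weekly_cf  <- monthly_cf * 12 / 51        # conversion factor as stated in the handout
surv       <- exp(-(weeks / 250)^1.5)     # replace with predicted survival probabilities

cash_flows <- rep(weekly_cf, length(weeks))
clv_plain  <- clv(cash_flows, weekly_rate)        # CLV assuming certain survival
clv_prob   <- clv(cash_flows, weekly_rate, surv)  # probability-corrected CLV

round(c(CLV = clv_plain, ProbCorrectedCLV = clv_prob), 2)
```

Running the same steps once per DataPlan level gives the four numbers needed for the results table. Because every survival probability is at most 1, the probability-corrected CLV can never exceed the uncorrected CLV.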
6) (20 points) Please present a simple visualization that demonstrates your key insight from the probability corrected CLV to managers. (It is easiest to use PowerPoint to provide an appropriate chart.)
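If you prefer to stay in R rather than PowerPoint, a grouped bar chart is one simple option. The sketch below uses base graphics only. The numbers in the results matrix are round placeholders chosen purely for illustration and must be replaced with your own CLV figures; the labels and the chart title are likewise assumptions.

```r
# PLACEHOLDER values, not results from the data: replace with your own figures.
# matrix() fills column by column, so each column is one customer prototype.
results <- matrix(
  c(1000,  600,    # no data plan: CLV, probability-corrected CLV
    1500, 1100),   # data plan:    CLV, probability-corrected CLV
  nrow = 2,
  dimnames = list(
    c("CLV", "Probability-corrected CLV"),
    c("No data plan", "Data plan")
  )
)

# Grouped bars: one group per customer prototype, one bar per CLV measure
mids <- barplot(
  results,
  beside      = TRUE,
  legend.text = rownames(results),
  args.legend = list(x = "topleft", bty = "n"),
  ylab        = "CLV over 500 weeks",
  main        = "CLV by customer prototype"
)

# Print the value on top of each bar (pos = 3 places text above the point)
text(x = mids, y = results, labels = round(results), pos = 3, xpd = TRUE)
```

Placing the two measures side by side for each prototype makes the effect of the survival correction visible at a glance.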