DSC 423: Data Analysis and Regression / DSC 324: Data Analysis & Statistical Software II
Assignment 6 | Total Points: 10 pts for DSC 423 / DSC 324
Due Date: 10/30/2018 by 11:59 pm
Note: For all questions, include the relevant output whether or not the question explicitly asks for it. Also state the sign or direction of each effect (negative/positive, increase/decrease) and the units of measurement (e.g., $, $99 million, %); otherwise points will be deducted.
Problem 1 [10 pts] Churn analysis – to be answered by everyone
Given the large number of competitors, cell phone carriers are very interested in analyzing and predicting customer retention and churn. The primary goal of churn analysis is to identify the customers who are most likely to stop using a service or product. The dataset churn_train.csv contains information about a random sample of customers of a cell phone company. For each customer, the company recorded the following variables:
- CHURN: 1 if customer switched provider, 0 if customer did not switch
- GENDER: M, F
- EDUCATION (categorical): coded 1 to 6 according to education level
- LAST_PRICE_PLAN_CHNG_DAY_CNT: No. of days since last price plan change
- TOT_ACTV_SRV_CNT: Total no. of active services
- AGE: customer age
- PCT_CHNG_IB_SMS_CNT: Percent change in incoming SMS for the latest 2 months relative to the previous 4 months
- PCT_CHNG_BILL_AMT: Percent change in bill amount for the latest 2 months relative to the previous 4 months
- COMPLAINT: 1 if the customer made at least one complaint during the two-month period, 0 if there were no complaints
The company is interested in a predictive churn model that identifies the most important predictors affecting the probability of switching to a different mobile phone company (CHURN = 1). Answer the following questions:
- Create two boxplots to analyze the observed values of AGE and PCT_CHNG_BILL_AMT by CHURN value. Analyze the boxplots and discuss how customer age and changes in bill amount affect churn probabilities. Include the boxplots.
- Fit a logistic regression model to predict the churn probability using the data in the dataset (CHURN is the response variable and the remaining variables are the independent x-variables). Remove x-variables that are not significant at alpha = 0.05. Include the SAS output. Write down the expression of the fitted model. (HINT: the probability of interest is p = Pr(CHURN = 1).)
- Analyze the final logistic regression model and discuss the effect of each variable on the churn probability. Discuss results in terms of odds ratios.
- Using SAS, compute the predicted churn probability and its confidence interval for a male customer who is 43 years old, with the following values: LAST_PRICE_PLAN_CHNG_DAY_CNT = 0, TOT_ACTV_SRV_CNT = 4, PCT_CHNG_IB_SMS_CNT = 1.04, PCT_CHNG_BILL_AMT = 1.19, and COMPLAINT = 1. Include the output, and interpret and explain the three values you obtained.
- Copy and paste your FULL SAS code into the Word document along with your answers.
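
For reference, the general form of the model behind the questions above can be sketched as follows. This is generic notation, not the fitted model: the symbols $\beta_j$, $x_j$, $k$, $\hat{\eta}$, and $\mathrm{SE}$ are introduced here for illustration only, and the actual estimates must come from your own SAS output. The interval shown is one common construction (a Wald interval computed on the logit scale and then back-transformed); check which method your SAS output reports.

```latex
\begin{align*}
% Logistic regression model for p = Pr(CHURN = 1) with k retained predictors
\log\!\left(\frac{p}{1-p}\right) &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \\
% Equivalent expression for the probability itself
p &= \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}
          {1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}} \\
% Odds ratio for a one-unit increase in x_j, other predictors held fixed
\mathrm{OR}_j &= e^{\beta_j} \\
% Approximate 95% interval for p: interval on the logit scale, then back-transformed
\hat{\eta} \pm 1.96\,\mathrm{SE}(\hat{\eta})
  \;&\longmapsto\;
  \frac{e^{\hat{\eta} \pm 1.96\,\mathrm{SE}(\hat{\eta})}}
       {1 + e^{\hat{\eta} \pm 1.96\,\mathrm{SE}(\hat{\eta})}},
  \qquad \hat{\eta} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k
\end{align*}
```

As a purely hypothetical illustration of the odds-ratio reading, a coefficient of 0.5 would correspond to an odds ratio of $e^{0.5} \approx 1.65$, meaning the odds of churn are about 65% higher for each one-unit increase in that predictor, with the other predictors held fixed.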