BU.450.760 Technical Document T2.1 –L.L.Bean Catalog Targeting Prof.
L.L.Bean catalog targeting
This document describes how to implement a catalog targeting campaign in the L.L.Bean example discussed in class. The process has two steps: (1) prediction of response probabilities and expected expenditures (see S2.1) and (2) policy evaluation (see E2.1). The raw data we are working from is contained in D2.1.
The key outcomes are y1 (whether or not the customer responded to the campaign) and y2 (how much the customer spent, if he/she responded). For obvious reasons, the latter variable is missing if a customer did not respond to the campaign. The sample includes data for about 175k customers. The dataset includes behavioral variables (recency, web and app browsing) as well as demographic variables (income, education, household size).
1. Estimating individual-level quantities
We aim to produce two quantities at the individual level: (1) a response probability (how likely each customer is to make a purchase if we send him/her the new catalog) and (2) an expected purchase amount (how much he/she will spend, conditional on responding).
We begin with workspace and estimation preliminaries (lines 5-20 of S2.1). Note:
• A large non-response rate of about 60% (line 9), and hence many missing y2 observations
• Packages that are helpful for model evaluation are loaded (lines 13-14)
• Education levels are categorical, hence declared as factors (line 15)
• You may get numbers different from those in this document even if you use the same random seed (line 17). You will, however, get the same numbers each time you re-run the analysis using the same seed.
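Assuming the seed in line 17 is used to draw the holdout sample behind the out-of-sample statistics, the following minimal Python sketch illustrates why fixing a seed makes a random split repeatable on a given setup. The helper name `holdout_split`, the 30% holdout share, and the seed value are hypothetical, not the settings in S2.1, and R's random number generator will of course produce a different split than Python's.

```python
import random


def holdout_split(customer_ids, test_share=0.3, seed=17):
    """Randomly split customer ids into training and holdout sets.

    test_share and seed are hypothetical illustrative values,
    not the settings used in S2.1.
    """
    rng = random.Random(seed)            # dedicated generator, seeded once
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)                # same seed -> same permutation on this setup
    n_test = round(len(shuffled) * test_share)
    holdout = sorted(shuffled[:n_test])
    training = sorted(shuffled[n_test:])
    return training, holdout


ids = list(range(1, 11))
train_a, test_a = holdout_split(ids)
train_b, test_b = holdout_split(ids)
print(test_a == test_b)                  # re-running with the same seed reproduces the split
```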
Response probabilities. We begin by predicting the response probabilities, comparing two specifications. The specification of line 23 only includes demographics. The richer specification of line 40 adds behavioral variables. We could have specified even richer models, for example by adding variable interactions (try it!). Models are estimated via logistic regression, and a 0.5 threshold is used to dichotomize the response probabilities (lines 26 and 44).
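The dichotomization step and the accuracy figures discussed next can be illustrated with a small Python sketch. It takes predicted response probabilities as given (in the document they come from the logistic regressions in S2.1), labels a customer a predicted responder when the probability exceeds 0.5, and tabulates the confusion matrix. The function name and toy data are hypothetical; how S2.1 treats a probability of exactly 0.5 is not stated, and here such ties are assigned to non-response.

```python
def confusion_matrix_and_accuracy(y_true, p_hat, threshold=0.5):
    """Classify customers as predicted responders when p_hat > threshold,
    then count true/false positives/negatives and compute accuracy."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p > threshold else 0   # ties at the threshold -> non-response
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1
        elif predicted == 0 and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
    return counts, accuracy


# Toy data: observed responses (y1) and predicted response probabilities
y1 = [1, 0, 1, 0, 0, 1]
p = [0.8, 0.3, 0.4, 0.6, 0.1, 0.7]
counts, acc = confusion_matrix_and_accuracy(y1, p)
print(counts, round(acc, 3))
```

Calling the same function on the training sample and on the holdout sample yields the in-sample and out-of-sample accuracies that are compared below.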
For logistic model 1, the confusion matrix output gives an in-sample accuracy of 0.69 and an out-of-sample accuracy of about the same (0.6886). Thus, there is no sign of an acute overfitting problem. The same is true for logistic model 2, for which both numbers are closer to 0.70.
Consistent with the out-of-sample accuracies, model 2 exhibits a larger out-of-sample AUC than model 1: 0.766 versus 0.741. We therefore select model 2. In line 53 we re-estimate this selected specification using the full dataset.
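To make the AUC comparison concrete, the following Python sketch computes the AUC as the probability that a randomly chosen responder receives a higher predicted probability than a randomly chosen non-responder (ties count one half). It uses the rank-sum (Mann-Whitney) formulation so that it scales to samples of this size. This is an illustration with toy data, not the package function used in S2.1.

```python
def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation, with average ranks for ties.

    y_true must contain only 0/1 labels.
    """
    n = len(scores)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one responder and one non-responder")

    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a block of tied scores
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


y1 = [1, 0, 1, 0, 0, 1]
p = [0.8, 0.3, 0.4, 0.6, 0.1, 0.7]
print(round(auc(y1, p), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of responders from non-responders. This is why a larger out-of-sample AUC favors model 2 regardless of the 0.5 classification threshold.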
Expected expenditures. In line 60 we turn to predicting expenditures. Recall that, to do this, we only have data from those customers who responded to the promotion (y1=1). For most calculations, R automatically recognizes that data are missing, and no special instructions are needed. For others, the missing values must be handled explicitly (for example, by setting na.rm = TRUE in functions such as mean()).
We consider the same two specifications used for the logistic model. Naturally, here we use a linear (as opposed to logistic) functional form. Model 1 is estimated and its predictions are generated in lines 62-63. Notice from the output of the line 64 command that predictions are generated for all observations: there are no missing values! This step is crucial. By assuming that the same linear model governs expected expenditure both for customers who responded and for those who did not respond to the previous campaign, we are able to generate predictions for the latter.
A histogram of these predicted values is created in line 65. All predicted values are positive. Had this not been the case (i.e., had some predictions been negative), we would have had to correct that by replacing negative values with 0.
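The key idea above can be sketched in Python: fit the expenditure model on responders only, predict for every customer, and replace any negative prediction with 0. To keep the sketch self-contained, it uses a single hypothetical predictor and closed-form least squares, whereas S2.1 fits linear models with several demographic and behavioral variables. `None` stands in for R's missing values.

```python
def fit_ols_one_predictor(x, y):
    """Closed-form least squares (intercept, slope), using only rows with observed y."""
    rows = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]  # drop non-responders
    n = len(rows)
    mean_x = sum(xi for xi, _ in rows) / n
    mean_y = sum(yi for _, yi in rows) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in rows)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_expenditure(x, intercept, slope):
    """Predict for every customer (responders or not) and floor predictions at 0."""
    return [max(0.0, intercept + slope * xi) for xi in x]


# Hypothetical predictor values and observed spend (None = did not respond)
x = [0, 1, 2, 3, 4, 5]
y2 = [None, None, 10.0, None, 30.0, 40.0]
b0, b1 = fit_ols_one_predictor(x, y2)
print(round(b0, 6), round(b1, 6))
print(predict_expenditure(x, b0, b1))
```

In this toy data the fitted intercept is negative, so the prediction for the first customer would be below zero and is floored at 0. This is exactly the correction described above.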
Fit statistics are computed in lines 66 (in-sample) and 67 (out-of-sample). Out-of-sample fit is only marginally worse, so there is no strong concern about overfitting.
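The document does not name the fit statistics printed in lines 66-67. Root mean squared error (RMSE) is one common choice, and the Python sketch below uses it as an assumption. The sketch also shows the explicit missing-value handling mentioned earlier: customers without an observed expenditure are skipped. All numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over customers whose expenditure is observed."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a is not None]
    if not pairs:
        raise ValueError("no observed expenditures to evaluate")
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


# Hypothetical training (in-sample) and holdout (out-of-sample) data
train_actual, train_pred = [20.0, 40.0, None], [22.0, 38.0, 30.0]
test_actual, test_pred = [None, 55.0, 15.0], [25.0, 52.5, 17.5]

rmse_in = rmse(train_actual, train_pred)
rmse_out = rmse(test_actual, test_pred)
print(rmse_in, rmse_out)   # 2.0 2.5
```

An out-of-sample error only slightly above the in-sample error is the pattern described in the text. A much larger gap would point to overfitting.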
The process is repeated for the second (richer) specification in lines 70-74. Its out-of-sample fit is substantially better than that of the simpler model, and there is no sign of serious overfitting. Thus, we select model 2. In line 77 we re-estimate it using the full sample.
The process is completed by exporting the predicted quantities to a spreadsheet, where we will decide which customers to target with the new campaign by combining these individual-level estimates with other campaign parameters. It is important to also export the customer id variable because this will later allow us to implement the targeting policy.
2. Targeting policy evaluation
Our goal here is twofold: (i) determine which customers will be targeted, and (ii) compute the return on marketing investment (ROMI). We will work on the Excel spreadsheet that was exported before. The analysis discussed below is presented in E2.1.
Overview of process and key outputs:
i. Add additional campaign parameters (column E)
ii. Compute per-customer expected profits if targeted (column F)
iii. Sort customers in decreasing order of expected profit (columns H-I)
iv. Under the unlimited budget policy, determine whether each customer will be targeted and evaluate the policy (columns K-L)
v. Under the limited budget policy, determine whether each customer will be targeted and evaluate the policy (columns N-P)
Add additional campaign parameters. These are presented in column E. Note that whenever we reference these parameters, the cell references are fixed (absolute), e.g., $E$6. Note also that we assume all customers/orders (i) have the same margin and (ii) have the same shipping and handling cost. We could relax these assumptions without changing the gist of our analysis.
Compute per-customer expected profits if targeted. This calculation addresses the question: if we target a given customer, how much profit should we expect to gain (or lose)? The formula in column F is: response probability × (margin × expected expenditure − shipping & handling) − targeting cost. Notice that the shipping & handling cost is paid only if the customer responds, whereas the targeting cost is paid for every targeted customer.
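The column F formula translates directly into code. In the Python sketch below, the margin, shipping and handling, and targeting cost values are hypothetical placeholders rather than the column E parameters in E2.1. The margin is expressed as a fraction of expenditure, as the formula implies.

```python
def expected_profit(p_response, expected_spend, margin, shipping_handling, targeting_cost):
    """Expected profit from mailing one customer.

    Shipping & handling is incurred only if the customer responds (so it sits
    inside the probability-weighted term); the targeting cost is paid for every
    targeted customer.
    """
    return p_response * (margin * expected_spend - shipping_handling) - targeting_cost


# Hypothetical campaign parameters (not the values in column E)
MARGIN, SHIPPING_HANDLING, TARGETING_COST = 0.5, 5.0, 2.0

likely_buyer = expected_profit(0.4, 100.0, MARGIN, SHIPPING_HANDLING, TARGETING_COST)
unlikely_buyer = expected_profit(0.05, 80.0, MARGIN, SHIPPING_HANDLING, TARGETING_COST)
print(round(likely_buyer, 2), round(unlikely_buyer, 2))
```

The second customer has a negative expected profit because the small chance of a response does not cover the targeting cost that is paid with certainty.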
Sort customers in decreasing order of expected profit. Determining which customers to target is much simpler once customers are sorted by the expected profits calculated in the previous step. To do this, we copy columns B and F into columns H and I. Importantly, we need to paste this information as values (otherwise we paste formulas whose values keep changing). Once this is done, we simply sort in decreasing order of the values in column I.
What you see in column I of E2.2 is the result of this process (screenshot above). Notice that, in contrast to column F, the expected profits in column I are in decreasing order. We can now determine the targeting policy simply by selecting customers from the top (most profitable) and successively moving down.
Under the unlimited budget policy, determine whether each customer will be targeted and evaluate the policy. If the budget is unlimited, we select all customers with non-negative expected profits. That is, we keep going down column I until expected profits become negative. This rule is precisely what the formulas in column K implement (i.e., target if expected profit is non-negative).
The number of targeted customers is simply the count of targeted cases, and the investment is this number times the per-customer targeting cost. The value created is the sum of expected profits over all targeted customers.1 The ROMI value indicates that each $1 invested in the campaign returns an additional $2.10.
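Steps iii and iv can be mirrored in a short Python sketch. It sorts customers by expected profit, walks down the list while expected profit is non-negative, and then computes the investment, the value created, and ROMI. The customer ids, profits, and $1 targeting cost are made-up toy values. ROMI is computed here as value created divided by investment. This is an assumption, but it is consistent with the statement that each $1 invested returns an additional amount, since expected profits are already net of the targeting cost.

```python
def unlimited_budget_policy(expected_profits, targeting_cost):
    """expected_profits: dict mapping customer id -> expected profit if targeted."""
    # Step iii: sort customers in decreasing order of expected profit
    ranked = sorted(expected_profits.items(), key=lambda item: item[1], reverse=True)

    # Step iv: target from the top down while expected profit is non-negative
    targeted, value_created = [], 0.0
    for customer_id, profit in ranked:
        if profit < 0:
            break                      # everyone further down is also unprofitable
        targeted.append(customer_id)
        value_created += profit

    investment = len(targeted) * targeting_cost
    romi = value_created / investment if investment > 0 else float("nan")
    return targeted, investment, value_created, romi


profits = {"A": 3.0, "B": -1.0, "C": 6.0, "D": 0.0, "E": 1.0}
targeted, investment, value, romi = unlimited_budget_policy(profits, targeting_cost=1.0)
print(targeted, investment, value, romi)
```

Because the list is sorted, the targeted customers always form a prefix of it, which is why the loop can stop at the first negative expected profit.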
Under the limited budget policy, determine whether each customer will be targeted and evaluate the policy. Now we consider the more realistic case in which the marketing budget is limited. The available budget is listed in cell E10 (you can change this parameter).
The process is very similar to the one for the unlimited budget case. The difference is that we now have a more stringent stopping rule. We target a customer if two conditions are met: (i) the customer has non-negative expected profits (as before) AND (ii) the budget used to target all previous (i.e., more profitable) customers is not larger than the budget limit.
1 If you are unfamiliar with the SUMPRODUCT function used in cell L8, watch this video.
To incorporate the latter, we create a column (column N) that tracks the total budget that would be used if all previous customers had been targeted. Notice how the targeting rule in column O makes this logical condition explicit through the use of the AND() function.
Not surprisingly, the number of targeted customers is smaller than before: we now focus on the 75% most profitable of the customers previously targeted under the unlimited budget. Accordingly, the ROMI increases from 2.1 to 2.6.
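The limited-budget rule can be sketched the same way in Python. One interpretive choice is worth flagging. The sketch counts the spend that would result from targeting the current customer together with all more profitable ones, so total spending never exceeds the budget. If column N excludes the current customer, the spreadsheet rule can overshoot the limit by up to one targeting cost. The toy data, the $1 cost, and the $2 budget are hypothetical, and ROMI is again assumed to be value created divided by investment.

```python
def limited_budget_policy(expected_profits, targeting_cost, budget):
    """Target customers in decreasing order of expected profit, subject to
    (i) non-negative expected profit AND (ii) cumulative spend within budget."""
    ranked = sorted(expected_profits.items(), key=lambda item: item[1], reverse=True)

    targeted, value_created = [], 0.0
    for customer_id, profit in ranked:
        # budget used if this customer and all more profitable ones are targeted
        spend_if_targeted = (len(targeted) + 1) * targeting_cost
        if profit >= 0 and spend_if_targeted <= budget:
            targeted.append(customer_id)
            value_created += profit
        else:
            break   # profits only fall and spend only rises further down the list

    investment = len(targeted) * targeting_cost
    romi = value_created / investment if investment > 0 else float("nan")
    return targeted, investment, value_created, romi


profits = {"A": 3.0, "B": -1.0, "C": 6.0, "D": 0.0, "E": 1.0}
targeted, investment, value, romi = limited_budget_policy(profits, targeting_cost=1.0, budget=2.0)
print(targeted, investment, value, romi)
```

With the same toy data and a budget that does not bind, all four customers with non-negative expected profit would be targeted at a ROMI of 2.5. The tighter budget therefore raises ROMI, mirroring the direction (not the magnitudes) reported in the text.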