
Statistical Science
2006, Vol. 21, No. 2, 206–222
DOI: 10.1214/088342306000000259
© Institute of Mathematical Statistics, 2006


Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology
Donald B. Rubin and Richard P. Waterman. Propensity score methods were proposed by Rosenbaum and Rubin [Biometrika 70 (1983) 41–55] as central tools to help assess the causal effects of interventions. Since their introduction more than two decades ago, they have found wide application in a variety of areas, including medical research, economics, epidemiology and education, especially in those situations where randomized experiments are either difficult to perform, or raise ethical questions, or would require extensive delays before answers could be obtained. In the past few years, the number of published applications using propensity score methods to evaluate medical and epidemiological interventions has increased dramatically. Nevertheless, thus far, we believe that there have been few applications of propensity score methods to evaluate marketing interventions (e.g., advertising, promotions), where the tradition is to use generally inappropriate techniques, which focus on the prediction of an outcome from background characteristics and an indicator for the intervention using statistical tools such as least-squares regression, data mining, and so on. With these techniques, an estimated parameter in the model is used to estimate some global "causal" effect. This practice can generate grossly incorrect answers that can be self-perpetuating: polishing the Ferraris rather than the Jeeps "causes" them to continue to win more races than the Jeeps ⇔ visiting the high-prescribing doctors rather than the low-prescribing doctors "causes" them to continue to write more prescriptions. This presentation will take "causality" seriously, not just as a casual concept implying some predictive association in a data set, and will illustrate why propensity score methods are generally superior in practice to the standard predictive approaches for estimating causal effects.
Key words and phrases: Rubin causal model, observational study, nonrandomized study, marketing research, promotion response, pharmaceutical detailing, return on investment.
Donald B. Rubin is Professor of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA. Richard P. Waterman is Adjunct Associate Professor, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2006, Vol. 21, No. 2, 206–222. This reprint differs from the original in pagination and typographic detail.
arXiv:math/0609201v1 [math.ST] 7 Sep 2006

1. INTRODUCTION
This presentation is very simple in some sense, but in our experience the issues being discussed are often misunderstood, despite their importance. The application that is used throughout the first sections is a real one that was encountered nearly a decade ago, but we think has enough in common with many applications in business to be of general interest. The basic situation involves ordering a list of individuals to contact (e.g., by telephone or personal visit), from most likely to generate additional revenue to least likely to do so. Every effort at contact requires some investment, and we want to target for contact those individuals who are most likely to generate a return on that investment (ROI). The basic confusion in this situation is due to the more general confusion between a "before–after" change associated with an intervening event and a change that is "caused by" that intervening event; the culprit here is the word "change"—change from what? The before–after comparison is a change in time from before to after, but is that a "causal change"? The answer is almost always "no."
Our specific application was a project for a major pharmaceutical company concerned with their marketing interventions with doctors for the purpose of promoting sales of a particular "life-style" drug. The marketing interventions could be visiting a doctor to describe the details of the drug (so called, "detailing"), or it could be dining the doctor at a nice restaurant to convey similar information, or it could be providing free samples of the drug. All of these interventions, and other similar ones, are designed to lead to "more" prescriptions (scripts) for the drug written by the detailed doctor. But the critical question is: "more" than what? The answer is quite clear: more than that doctor would have written without the visit, dinner or free sample. Otherwise, the investment has had no positive return. Marketing interventions are designed to CAUSE A DIFFERENCE, and this difference, or change, is generally NOT a change in time.
The causal effect of the intervention on a doctor is the comparison of something you can see (e.g., the number of scripts written after being visited) with something you cannot see (e.g., the number of scripts written during the same period of time without being visited). Causal effects can be well estimated by essentially no existing statistical software based on predictive approaches, because causal effect estimation differs from simple prediction. Nevertheless, causal effects can often be well estimated in examples like ours, by propensity score technology, described and illustrated later, starting in Section 3. Essentially, the idea is to create matched pairs of units, where one member of the pair has been exposed to the intervention and the other has not, but they are otherwise identical before the time of exposure, that is, they are "clones." Finding such clones is a tall order because exact matches are almost impossible to find in realistically sized data sets, and this is where the propensity score technology enters.
In the next section we describe the difference between (1) simple prediction from the past to the future and (2) causal effects, and illustrate this distinction in a couple of totally trivial, but hopefully revealing, artificial examples. We then describe in Section 3 the idea of "cloning" for causal effect estimation—not a new idea, but hopefully expressed in such a way that makes important points transparent. Here we also introduce propensity score techniques.
The real example that motivated this presentation will then be described in Section 4, and diagnostic information will be presented concerning how successful the cloning using propensity scores appeared to be in this example. The results of our approach are estimates of individual doctor-level causal effects, which could be used as building blocks for addressing complex causal questions involving ROI. Specifically, these estimated doctor-level causal effects were then used to create an ordered list of the unvisited doctors, summarized in Section 5, ranked from those having large estimated causal effects of a visit, who should be visited, to those having small estimated causal effects of a visit, who should not be visited. In Section 6 we present an evaluation of our ordered list versus the company's standard (or traditional) ordering, and document the superiority of our causal ordering over their standard ordering, using the company's own criteria based on future scripts written.
Section 7 presents a more mechanical and general description of the basic methodology (e.g., in terms of units of analysis rather than doctors). Section 8 continues with a brief description of possible opportunities for the general approach in e-commerce. Section 9 concludes with a discussion of three key features of our general approach: the absence of any outcome variables when creating the clones; the opportunity to refine the causal estimates using models relating the outcome variables to background characteristics; and the use of traditional prediction models to select units, as defined by their observed background variables, that can be anticipated to have large causal effects of the intervention, and thus, a large ROI.
2. A CAUSAL EFFECT IS A “CHANGE,” BUT NOT A CHANGE IN TIME
Display 1 is a very simple display of the title of this section. We have one doctor, and at time 1, we have in the left box the number of scripts that doctor has written in the six months prior to time 1. We have to make a choice to visit this doctor to provide details about the drug of interest, or not to visit. The top branch of the display represents what will happen if we visit, that is, detail, the doctor, where the box at the upper right gives the number of scripts written in the six months following the visit, up until time 2. In contrast, the bottom branch represents what will happen if we do not visit this doctor, and the box at the bottom right gives the number of scripts written during the same period of time if the doctor is not detailed.
The number of scripts written at time 2 given in the upper right box compared to the number at time 1 given in the left box is a change in time of the number of scripts written, but it is not the causal effect of the visit on number of scripts. It is the change in scripts written from time 1 to time 2 when the doctor is visited in between. Analogously, the number of scripts in the lower right box compared to the number of scripts in the left box also is a change of scripts written, and it is also a change in time, but is not the causal effect of not being visited on the number of scripts written.
The critical comparison here that is causal is the comparison of the number of scripts in the top right box and the number of scripts in the bottom right box, which is the causal effect of the doctor being visited versus not visited on the number of scripts written, which does not involve the box on the left at all, at least not without some overly strong assumption (e.g., the time-2 box without the visit is identical to the time-1 box—no change in time if not visited).
A causal effect is the comparison of the outcome that would be observed with the intervention and without the intervention, both measured at the same point in time. This is indicated by the comparison of apples with apples at time 2, whereas any comparison of something at time 2 with something at time 1 is indicated by the comparison of apples with oranges. This point, we know, is obvious, but its force is sometimes lost in the complication of real and hypothetical examples. The basic framework is often described as the "Rubin Causal Model" (Holland, 1986) for a sequence of articles starting in the 1970s, although the ideas obviously have much older roots (e.g., see Rubin, 1990, 2005, for some history, or Imbens and Rubin, 2006, for relationships to the history of causal inference in economics).
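In the potential-outcomes notation usually attached to this framework (the symbols below are ours, introduced only for compactness and not taken from the displays), write $Y_i(1)$ for the number of scripts doctor $i$ would write by time 2 if visited, and $Y_i(0)$ for the number the same doctor would write by time 2 if not visited. The unit-level causal effect is then
\[
\tau_i = Y_i(1) - Y_i(0),
\]
a comparison of two time-2 quantities; the time-1 count plays no role in the definition itself, entering only as a background variable.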
To illustrate, take a look at the specific case in Display 2, where Doctor A is a high prescribing doctor, writing 10 scripts at time 1, and 15 scripts at time 2, whether visited in between or not. Clearly, even though Doctor A writes a large number of scripts, there is no ROI to visit this doctor (for simplicity we are ignoring the cost of a detail, but that is simply a known constant). In contrast, take a look at Display 3, where Doctor B is a low prescribing doctor, writing only one script at time 1, and five or fewer at time 2; yet, Doctor B may be worth visiting, at least much more so than the higher prescribing Doctor A, because a visit to Doctor B will cause an increase in number of scripts from 1 to 5. Whether the four extra scripts, which are caused by the visit, generate a positive ROI for the visit depends on the cost of the visit, the profits from the scripts, and so on.
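Reading the numbers off Displays 2 and 3 in the notation above (our reading; the displays themselves are the authority), we have
\[
\tau_A = Y_A(1) - Y_A(0) = 15 - 15 = 0, \qquad
\tau_B = Y_B(1) - Y_B(0) = 5 - 1 = 4,
\]
so the high prescriber has zero causal effect while the low prescriber has a causal effect of four scripts, exactly the opposite ordering from what a ranking by time-1 volume, or by the before–after change under a visit, would suggest.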
The point is simply the following: we should make investment decisions based on a comparison of the expected returns when making the investment and when not making the investment: Visit those doctors for whom visiting makes a larger positive difference. Also, allocate company resources to those brands (and those marketing tactics) that provide the greatest marginal positive impact to the company. Great advice (like "buy low, sell high"), but how do we do this in practice?
3. THE ESTIMATION OF CAUSAL EFFECTS
The gold standard for the estimation of causal effects is to conduct randomized experiments, such as clinical trials, which are essentially required by the FDA (U.S. Food and Drug Administration) before approving a drug. An alternative, and one which is sometimes acceptable, even to the FDA, is to design and carefully execute an observational study (a nonrandomized design).
Causal effect estimation is not the simple prediction of future events from past events, although these activities can play a role in addressing causal questions.

Display 1. Causal effect vs. prediction for a particular doctor.
Display 2. Example: Doctor A. Temptation is to confuse “prediction” with “causal effect estimation.” Example: Visit high prescribing doctor. Waste of money to visit this doctor. Intervention has no causal effect. Here Causal Effect = 0.

Thus, causal effect estimation is not generally accomplished by: regression, data mining, neural nets, CART, support vector machines, random forests, and so on. Although such techniques can be helpful, none is central, and they can be especially helpful after causal effects of the intervention for each unit (e.g., of the visit for each doctor) have been estimated because it may often be of interest to classify doctors into subgroups based on background variables describing types of doctors, where the subgroups differ by the expected size of their causal effects; this would help future targeting efforts—more on this in Section 9.
So, specifically, how should we think about causal effect estimation from real data? This is easy to describe in principle from the hypothetical database depicted in Display 4. Each row in the matrix displayed there represents one unit (e.g., one doctor), and the columns represent the measurements on them: number of scripts written at time 1, background variables such as age, sex, race, place of doctoral degree, years of practice, type of practice, the number of scripts written by time 2 if visited between time 1 and time 2, the number of scripts written by time 2 if not visited, and the causal effect of being visited—the difference between the latter two. The checked boxes represent observed data values, and the question marks represent unobserved or missing values; the causal effects are all either: (1) a check minus a missing, or (2) a missing minus a check, and both (1) and (2) are effectively missing, and so the entire column of causal effects is always effectively missing.
We describe the process for estimating causal effects as "cloning" for causal effect estimation. Essentially, for each doctor who was visited, we seek a "matching" doctor, a "clone," who was not visited, and we use that doctor's observed outcome (i.e., number of scripts at time 2) to fill in for the first doctor's missing outcome. Similarly, for each doctor who was not visited, we seek a matching doctor, or clone, who was visited, and we use that doctor's observed outcome to fill in for the first doctor's missing outcome. If no matching doctor can be found, the database at hand cannot support causal effect conclusions, at least not without relying on assumptions outside the database (e.g., time-2 scripts without the visit equal time-1 scripts).
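As a concrete illustration of this filling-in step, here is a minimal sketch in Python of nearest-neighbor "cloning" on the background variables themselves; the toy data, the column names and the simple Euclidean matching rule are our assumptions for illustration, not the authors' implementation.

# Sketch of the "cloning" idea: for each visited doctor, find an unvisited
# doctor with similar background variables and use that doctor's time-2
# outcome to fill in the visited doctor's missing potential outcome.
# The data and column names below are invented for illustration only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "visited":        rng.integers(0, 2, n),    # 1 = detailed, 0 = not
    "scripts_t1":     rng.poisson(8, n),        # scripts written before time 1
    "years_practice": rng.integers(1, 40, n),
})
df["scripts_t2"] = df["scripts_t1"] + 2 * df["visited"] + rng.poisson(1, n)

covariates = ["scripts_t1", "years_practice"]
X = (df[covariates] - df[covariates].mean()) / df[covariates].std()

treated = df[df["visited"] == 1]
control = df[df["visited"] == 0]

effects = []
for i in treated.index:
    # nearest unvisited "clone" in standardized covariate space
    dist = ((X.loc[control.index] - X.loc[i]) ** 2).sum(axis=1)
    clone = dist.idxmin()
    # observed outcome minus the clone's outcome estimates Y_i(1) - Y_i(0)
    effects.append(df.loc[i, "scripts_t2"] - df.loc[clone, "scripts_t2"])

print("Estimated average effect of a visit on the visited:", np.mean(effects))

With only two covariates exact or near-exact clones are easy to find; the point of the next paragraphs is that with many background variables this simple matching breaks down.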
Display 5 is a simple reworking of Display 4 where all the missing "?" in Display 4 have been replaced by "!" to indicate that the missing values have been "found" (really "imputed") by using the clones. Of course, finding exact clones for everybody in any real problem is essentially impossible, and yet the conceptual foundations of the above cloning approach rely on using all background variables used in making the decisions to visit one doctor and not visit another. Thus, the pressure to collect many such background variables is great, which, in turn, makes it essentially impossible to find exact clones for anyone.
Display 3. Example: Doctor B. It is much better to visit this doctor. The investment pays off. Here Causal Effect = 4. Pays to visit to increase business.

Display 4.
The key idea for simplification is to use "propensity score" technology (Rosenbaum and Rubin, 1983). This approach allows all the covariates to be reduced to a single covariate. This single covariate, the propensity score, is essentially the probability of being visited as a function of all of the background variables, as estimated from the database. Our actual implementation incorporates other important adjustments and refinements, but the essential idea, and the one being discussed here, is to clone based on the propensity score. For a simple review of some ideas underlying propensity scores, see Rubin (1997, 2006). And for some more recent theoretical work, see Imbens (2000) and Imai and van Dyk (2004); for a couple of hundred thousand other references, just Google "propensity score." Instead of going into details here, we describe our example, and how propensity scores work there.
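To make the idea concrete, here is a minimal sketch in Python of the two basic steps: estimating the propensity score by logistic regression and then matching each visited doctor to the unvisited doctor with the nearest estimated score. The toy data, the choice of logistic regression and the scikit-learn calls are our assumptions, not the authors' actual implementation, which as noted above involved further adjustments and refinements.

# Sketch of propensity-score matching: estimate the probability of being
# visited from background variables, then match each visited doctor to the
# unvisited doctor with the closest estimated score.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "scripts_t1":     rng.poisson(8, n),
    "years_practice": rng.integers(1, 40, n),
})
# visits depend on the background variables, as in the real data
p_visit = 1 / (1 + np.exp(-(0.15 * df["scripts_t1"] - 1.5)))
df["visited"] = rng.binomial(1, p_visit)
df["scripts_t2"] = df["scripts_t1"] + 2 * df["visited"] + rng.poisson(1, n)

covariates = ["scripts_t1", "years_practice"]
model = LogisticRegression(max_iter=1000).fit(df[covariates], df["visited"])
df["pscore"] = model.predict_proba(df[covariates])[:, 1]  # the single summary covariate

treated = df[df["visited"] == 1]
control = df[df["visited"] == 0]
effects = []
for i in treated.index:
    # clone = unvisited doctor with the nearest estimated propensity score
    clone = (control["pscore"] - treated.loc[i, "pscore"]).abs().idxmin()
    effects.append(treated.loc[i, "scripts_t2"] - control.loc[clone, "scripts_t2"])

print("Estimated average effect of a visit on the visited doctors:",
      round(float(np.mean(effects)), 2))

The attraction of matching on the single estimated score rather than on all covariates at once is that near matches remain findable even when dozens or hundreds of background variables enter the model.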
4. ILLUSTRATION: AN ANONYMOUS CASE STUDY
The objective of our actual case study was to produce a target list of doctors based on the estimated effects of a marketing intervention, where the doctors who are thought to provide the best ROI would be visited, at least before the others. That is, we want the doctors ranked by their estimated causal effects (due to a visit) on the number of scripts they would write.
The database consisted of approximately 250,000 doctors in the United States who were active in the medical area of the drug to be promoted. The script data came from an industry standard physician-level prescription database, the sales intervention data came from the company's call reporting system and the doctors' background characteristics came from various other sources; these characteristics included specialty, region of the United States, dates of degrees, and more than a hundred other such variables. The company was (and still is) a top tier U.S. pharmaceutical company.
Display 6 shows the distribution of what is considered to be the most important determinant of whether a rep should visit a doctor: the number of prescriptions of this class of drugs written in the recent past. Notice the rather huge distributional difference between the number of scripts (at time 1) for those who were not visited on the left, and the number of scripts (at time 1) for those who were visited on the right. The doctors who were visited between time 1 and time 2 wrote about 50% more prescriptions per doctor at time 1. Why? Possibly because the salary compensation of the sales reps who visited the doctors was not tied to causing a difference but more to the number of total scripts written by the doctors whom they visited (so they polished the Ferraris!).
But distributional differences between those not visited and those visited are not confined to the number of scripts written at time 1. Display 7 shows the distributions of their specialties (General Practice, Family Practice, Internal Medicine, Endocrinology, OB/Gynecology, Cardiology) for doctors who were not visited (left) and who were visited (right). General Practice was more highly visited, as was Family Practice and Internal Medicine and Endocrinology, whereas OB/GYN was not visited as much. Similar displays could be created for dozens of other background variables.
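One common way to quantify such distributional differences, and to check later whether propensity-score matching has removed them, is the standardized mean difference of each background variable between the visited and unvisited groups. The short Python sketch below illustrates the calculation on simulated time-1 script counts; the roughly 50% gap echoes the description of Display 6, but the specific counts are invented for illustration.

# Standardized mean difference: (difference in group means) / (pooled SD).
# Values near zero indicate that the two groups look alike on that variable.
import numpy as np
import pandas as pd

def standardized_diff(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2.0)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

rng = np.random.default_rng(2)
# visited doctors write about 50% more time-1 scripts, as in the text
scripts_visited = pd.Series(rng.poisson(12, 500))
scripts_not_visited = pd.Series(rng.poisson(8, 500))
print("Standardized difference before matching:",
      round(float(standardized_diff(scripts_visited, scripts_not_visited)), 2))

The same quantity recomputed within matched pairs is the kind of diagnostic referred to in Section 1 for judging how successful the cloning appeared to be.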
Display 8 presents a single picture that summarizes, at least t
