Name: Student ID:
Homework 4
Submit only answers when turning in your homework. Do not copy questions.
Problem 1: Interpreting Regression Results for Heterogeneous Treatment Effects [24 points]

Non-pecuniary, information-based strategies are a popular way to encourage consumers to conserve electricity, water, natural gas, etc. without generating backlash. One example comes from Ferraro and Miranda, who ran an experiment in 2007 in partnership with the Water System in metropolitan Atlanta, Georgia, USA (you do not need to read the paper). To induce voluntary reductions in water use during a drought, the County sent three types of messages to randomly assigned households:
1. a tip sheet with information about reducing water consumption (pure information message)
2. a tip sheet and a personalized letter promoting pro-social behavior (weak social norm message)
3. a tip sheet, personalized letter promoting pro-social behavior, and a social comparison of the household’s water consumption with the median county consumption (strong social norm message).
In the Table below, the authors report the results of seven different regressions, all of which have water consumption in the Summer of 2007 as the outcome. In each regression, they estimate heterogeneous treatment effects of their three treatments with respect to one of seven covariates, which are listed at the top of each column. That is, each of the seven regressions estimates how the effect of each of the three treatments on water usage varies with respect to one covariate.
We will focus on Column 4. In Column 4, the authors examine whether there are different treatment effects of the messages on water consumption for owners versus for renters. That is, this column, like the others, reports the results of a regression with “Thousands of Gallons of Water Used During the Summer” as the outcome variable, and investigates whether the treatment effects of the messages differ for owners and renters.
The authors have coded the “subgroup var” in this column such that Owners = 1 and Renters = 0. “High subgroup var.” means the subgroup variable = 1. I.e., in this case, “Subgroup var. (high = 1)” can be read as if it said “Owner.”
The authors are therefore running a regression like:
Water.Use.In.Thousands.Of.Gallons ~ Treatment.1 + Treatment.2 +
Treatment.3 + Owner + Treatment.1 * Owner + Treatment.2 * Owner
+ Treatment.3 * Owner
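As a rough illustration of how coefficients from a regression of this form map onto subgroup effects, here is a minimal sketch with a single treatment instead of three. The data and the names `rows`, `cell_mean`, and `saturated_coefficients` are hypothetical and are not taken from the paper. With a binary treatment and a binary Owner indicator the model is saturated, so its OLS coefficients can be read directly off the four cell means. The sketch does not compute standard errors.

```python
from statistics import mean

# Hypothetical toy data: (treated, owner, water use in thousands of gallons).
# These numbers are invented for illustration and are NOT from the paper.
rows = [
    (0, 0, 30.0), (0, 0, 34.0),   # control renters
    (1, 0, 31.0), (1, 0, 31.0),   # treated renters
    (0, 1, 40.0), (0, 1, 44.0),   # control owners
    (1, 1, 37.0), (1, 1, 39.0),   # treated owners
]


def cell_mean(treated, owner):
    """Average outcome in one treatment-by-ownership cell."""
    return mean(y for t, o, y in rows if t == treated and o == owner)


def saturated_coefficients():
    """OLS coefficients of y ~ treated + owner + treated:owner via cell means."""
    intercept = cell_mean(0, 0)                      # control renters
    b_treated = cell_mean(1, 0) - cell_mean(0, 0)    # effect when owner = 0
    b_owner = cell_mean(0, 1) - cell_mean(0, 0)      # owner gap under control
    # Interaction: how much the treatment effect differs for owners.
    b_interaction = (cell_mean(1, 1) - cell_mean(0, 1)) - b_treated
    return intercept, b_treated, b_owner, b_interaction


intercept, b_treated, b_owner, b_interaction = saturated_coefficients()

# The treatment coefficient alone is the effect for the group coded 0;
# adding the interaction gives the effect for the group coded 1.
effect_renters = b_treated
effect_owners = b_treated + b_interaction
print(effect_renters, effect_owners)
```

In this toy example the interaction coefficient is the difference between the two subgroup effects, which is the quantity a heterogeneous-treatment-effect column is designed to report.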

Notes: * = p < 0.05, ** = p < 0.01.
a. In plain language, describe what the information in the red box above means. Please interpret the coefficient, standard error, and presence/absence of statistical significance. [5 points]
b. In plain language, describe what the information in the blue box above means. Please interpret the coefficient, standard error, and presence/absence of statistical significance. (Note, as described above, that owners = 1 and renters = 0 for this variable in their regression.) [5 points]
c. In plain language, describe what the information in the green box above means. Please interpret the coefficient, standard error, and presence/absence of statistical significance. [5 points]
d. Based on the data in this table, what is the treatment effect estimate of the strong social norm message for owners? [3 points]
e. What is the treatment effect estimate of the strong social norm message for renters? [3 points]

Problem 2: Non-Compliance: Encouraging Participation In State-Controlled Elections [39 points]
(Inspired by GG 5.10) Guan and Green report the results of a canvassing experiment conducted in Beijing on the eve of a local election. Students on the campus of Peking University were randomly assigned to treatment or control groups. Canvassers attempted to contact students in their dorm rooms and encourage them to vote. No contact with the control group was attempted. Students in the same dorm rooms were always assigned to the same experimental group. At every dorm room the students visited, even those where no one answered, canvassers left a leaflet encouraging students to vote.
You can download the dataset here. (The data can also be accessed from the HW4 folder on Canvas.) Turnout refers to the voting status of the student and equals 1 if the student voted, 0 otherwise.
Hint: For parts a-d, you do not need to consider the role of the leaflet in any way.
a. Suppose the canvassers used a placebo design: they had knocked on doors in the control group, too, and identified who was home at the time they canvassed. Then, to estimate the CACE, they compared the turnout rate of those who opened the door in the control group to those who opened the door in the treatment group. What would be the advantage of this approach? [5 points]
b. Name one assumption the above approach requires and an example of why it might be wrong. [5 points]
c. Estimate the average intent-to-treat effect (ITT) of being assigned to the treatment group. [5 points]
d. Use regression with clustered standard errors to test the null hypothesis that the average intent-to-treat effect (ITT) is zero. [5 points]
We now want to estimate the treatment effect of canvassing conversations, which means having a conversation with those reached, and to separate that from the effect of the leaflet that was distributed at all doors, which is not included in “canvassing conversations.” Furthermore, keep in mind that never-takers are those who don’t answer the door, and compliers are those who do answer the door.
e. First, assume that the leaflet had no effect on turnout among compliers or among never-takers. Estimate the CACE of the canvassing conversations. [5 points]
Now, we will develop a simple mathematical framework for evaluating what could generate this ITT estimate.
● Let ITT = the average Intent To Treat effect, or the difference in voter turnout rates between the entire treatment and control groups.
● Let α = the contact rate, the proportion of students in the treatment group the canvassers spoke with. Note that (1 - α) is therefore the proportion of students in the treatment group who the canvassers did not speak with.
● Let τ = the true CACE of the conversations (“the canvassing”).
● Let l = the true effect of leaving the leaflet, with l_c being the effect on compliers and l_n being the effect on never-takers.
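As a small illustration of the first two definitions above, the sketch below computes an ITT and a contact rate α from assignment, contact, and turnout records. The records and the names `records`, `rate`, and `itt_and_contact_rate` are hypothetical, not the Guan and Green data. The last line divides the ITT by α, which is a valid CACE estimate only under the assumption that the leaflet has no effect on anyone. The sketch ignores dorm-room clustering and produces no standard errors.

```python
# Hypothetical toy records: (assigned_to_treatment, contacted, voted).
# Invented for illustration; NOT the Guan and Green data.
records = (
    [(1, 1, 1)] * 30 + [(1, 1, 0)] * 20 +   # treatment, spoke with canvasser
    [(1, 0, 1)] * 20 + [(1, 0, 0)] * 30 +   # treatment, nobody answered
    [(0, 0, 1)] * 40 + [(0, 0, 0)] * 60     # control, never contacted
)


def rate(values):
    """Share of ones in an iterable of 0/1 values."""
    values = list(values)
    return sum(values) / len(values)


def itt_and_contact_rate(records):
    """Return (ITT, alpha) from (assigned, contacted, voted) records."""
    turnout_treatment = rate(v for z, d, v in records if z == 1)
    turnout_control = rate(v for z, d, v in records if z == 0)
    alpha = rate(d for z, d, v in records if z == 1)  # contact rate
    return turnout_treatment - turnout_control, alpha


itt, alpha = itt_and_contact_rate(records)

# Only valid if the leaflet has no effect on compliers or never-takers.
cace_if_leaflet_inert = itt / alpha
print(round(itt, 3), round(alpha, 3), round(cace_if_leaflet_inert, 3))
```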
l_n might be greater than zero, because people who weren’t home when the canvassers visited might still read the leaflet left at their doors once they get home. Given these variables, the ITT can be decomposed into the following:
ITT = α * (τ + l_c) + (1 − α) * l_n
f. Suppose the ITT is clearly positive and statistically significant. What does the formula above tell you about the potential drivers of this difference in turnout between the entire treatment and control groups? That is, name all the causal effects that could be contributing to an overall positive ITT. [4 points]
g. Assume the leaflet had a 1 percentage point effect on both the compliers and never-takers. Using the formula above, estimate the CACE of canvassing (τ). This will involve a small bit of algebra. [5 points]
h. Assume the leaflet had no effect on compliers but a 3 percentage point effect among never-takers. Using the formula above, estimate the CACE of canvassing (τ). This will involve a small bit of algebra. [5 points]

Problem 3: Non-Compliance in the C-Suite: A Retailer Provides Coupons [36 points]
(Inspired by GG 5.2) Suppose you are the Chief Marketing Officer for a retailer that has data on the home addresses of its 1,000,000 most active customers. You hope to determine whether sending out “20% off your entire purchase” coupons by mail will increase revenues. You conjecture that customers who have access to this coupon will spend more in the store over the next year. However, skeptics in your company argue that the coupons will just allow customers to spend less on items they would have purchased anyway. This is a debate that an experiment can resolve. In thinking about how large an experiment you need to have enough statistical power, you realize that many of the customers you send the coupons to in the mail will not open the mail and so will not realize they received the coupon.
a. The CEO argues that to estimate the effects of the coupons on revenue, you should compare the difference in revenues from a) people you sent the coupons and who used them and b) people you sent the coupons but who did not use them. Write a response to your CEO: describe the flaw with this plan in language the CEO will understand, and advocate for your proposed experiment. [5 points]
b. To avoid the cost of sending out more coupons than you need to, you ask the data science team to plan an experiment just large enough (with just enough statistical power) to reliably detect a treatment effect if the true effect on those who open the mail and realize they have the coupon is a $2 increase in revenue over the next year. The data science team tells you that an experiment with 100,000 people in the treatment group (leaving the remaining 900,000 in the control group) will be well-powered to detect an overall difference between the entire treatment and control groups of $2 in revenue over the next year. To send out the minimum number of coupons required while still having enough statistical power to detect a $2 effect of opening the mail, should you send out fewer, the same number of, or more coupons than 100,000? Explain the intuition behind your answer. [5 points]
c. Make up a hypothetical set of three pairs of potential outcomes for three individuals who open the mail (Compliers) and three pairs of potential outcomes for three individuals who do not open the mail (Never-takers) such that the true ATE (the true average treatment effect across all six individuals) is negative but the true CACE (the true average treatment effect but just for the three compliers) is positive. (Hint: the way to do this is to ensure there is a large negative effect among your three never-takers.) Each outcome should be “purchases over the next year, in dollars.” Each person will have one such potential outcome called “if opened mail” and one called “if did not open mail.” (Note that in the real world we would only ever observe the “if did not open mail” outcomes for the never-takers, regardless of whether they were in the treatment or control groups. However, we can still imagine what their outcomes might have been if they had in fact received mail and opened it. The following subquestions will ask you why this “imaginary” set of outcomes could still be important to consider.) [5 points]
d. Give a concrete example in the context of this experiment about why the true ATE could be negative (if everyone were successfully treated and so knew they had the coupon) but the true CACE (so, the effect of knowing they have the coupon on those who know) positive. In other words, why could there be a negative true effect for never-takers (were we able to force them to take the treatment) but a positive true effect for compliers? [4 points]
   i. Note that the true ATE and observed ITT are different when there is non-compliance. When there is non-compliance, an experiment will allow you to estimate the ITT, which is “driven by” the CACE among the compliers. However, just like we can conceptualize the potential outcomes noted above for never-takers, we can also still conceptualize what the treatment effect might be for never-takers. This is true even though in an experiment with non-compliance, the never-takers never drive the ITT since they don’t get treatment even when in the treatment group. The “true ATE” is the average treatment effect across the compliers and never-takers, which includes this more hypothetical effect we can conceptualize for the never-takers. That is, if there would be an effect for the never-takers if we had somehow managed to force the never-takers to get treatment, then the ITT you observe and the more theoretical ATE could be different.
e. Suppose you estimate a positive ITT and hence a positive CACE on revenue in this experiment. The CEO tells you that this means the store should cut prices for all 1,000,000 of the most active customers by 20%, not only with coupons but automatically at registers, because the experiment has shown that, in a random sample of these customers, lowering prices by 20% with the coupons has increased revenues. Assume all 1,000,000 of the most active customers would be aware of these proposed automatic discounts. Do you agree or disagree with the CEO’s interpretation of the experiment? Why? [4 points]
f. You seek to convince your company that these coupons increase in-store sales. You can’t really estimate the compliance rate in this experiment, because you don’t know how many people opened the mail. But, based on an educated guess, you assume that 50% of people opened the mail and received the coupon. Based on this estimated compliance rate, you estimate that the effect of opening the coupon is a CACE of an average increase in in-store spending of $10. The head of the data science team argues that your interpretation is flawed, as their data indicates that only about 5% of people open their mail. The data science team head argues that this means the coupons were likely less effective in stimulating in-store purchases than you have argued. How would you respond? In particular, grant the data science team’s alternative assumption of a 5% open rate, and describe whether you agree with their interpretation or not. [5 points]
g. Things go wrong in experiments. In this case, you learn from the marketing team that the wrong postage was put on 25,000 of the letters your mail vendor sent to the 100,000 people you intended to send the coupons to. No coupons arrived for those 25,000 people. You do know which 25,000 people in the treatment group got the improper postage and therefore didn’t receive the coupons. However, you have been unable to figure out what determines whether individuals got the right postage or not, so you can’t figure out who in the control group would not have received the coupons had they been sent them. The data science team argues the experiment now must be re-done from scratch. How would you respond? Is there a way to recover the average effect of sending a letter with correct postage, even though only 75% of your treatment group was sent the letter with correct postage? How? (In answering this question, ignore the question of whether people opened the letters. We are now trying to estimate just the average effect of sending a letter with correct postage.) [4 points]
h. You later learn what determined whether individuals got the wrong postage: everyone whose last name started with the letter A or B got the wrong postage and therefore didn’t get the coupons. Only individuals in the treatment group whose last names began with the rest of the alphabet, C-Z, therefore actually got the letters. (E.g., a customer whose last name starts with the letter B wouldn’t be sent the letter if in the treatment group, but a customer whose last name starts with L would get it.) With this new knowledge, you’re now able to determine who in the control group would not have been sent a letter, either: the As and the Bs. Now that you know this, would you change the analysis strategy you proposed in part (g)? How? [4 points]
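The distinction Problem 3 draws between the true ATE, the true CACE, and the ITT can be sketched with a small table of potential outcomes. Everything below is hypothetical: the six people, their dollar amounts, and the names `people` and `average_effect` are invented for illustration, and the numbers are deliberately not the sign pattern requested in part (c). The sketch assumes one-sided non-compliance, so never-takers are untreated even when assigned to treatment, and it treats all six people as if assigned to treatment when computing the ITT.

```python
from statistics import mean

# Hypothetical potential outcomes, in dollars of purchases over the next year:
# (type, outcome if did not open mail, outcome if opened mail).
people = [
    ("complier", 100.0, 112.0),
    ("complier", 80.0, 90.0),
    ("complier", 60.0, 68.0),
    ("never-taker", 50.0, 52.0),
    ("never-taker", 40.0, 42.0),
    ("never-taker", 30.0, 32.0),
]


def average_effect(rows):
    """Mean of (outcome if opened mail) minus (outcome if did not open mail)."""
    return mean(y1 - y0 for _, y0, y1 in rows)


compliers = [p for p in people if p[0] == "complier"]

true_ate = average_effect(people)       # includes the never-takers' effects
true_cace = average_effect(compliers)   # compliers only
share_compliers = len(compliers) / len(people)

# Never-takers stay untreated when assigned, so they contribute zero to the
# ITT: the ITT is the CACE scaled by the share of compliers.
itt = share_compliers * true_cace
print(true_ate, true_cace, itt)
```

In this toy example the three quantities all differ, which is the point of the note under part (d): an experiment with non-compliance identifies the ITT and the CACE, not the effect the never-takers would have had.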