HDP1238 Causal Inference
Introduction to Causal Inference
Second assignment
(Due March 2, 2022 at 11:59 PM)
In this assignment you will practice and evaluate the use of a candidate instrumental variable to reduce selection bias.
This set of questions requires a re-analysis of the Tennessee class size data. Note that this data set includes only a subsample for this exercise and does not represent the entire sample of the original webster data. Our interest here is in estimating the average effect of attending a small rather than a regular class (D) on first-grade reading achievement (Y, TREADSS1), separately for African Americans (srace=2) and people of other races (srace≠2). We refer to this distinction as group 1 (G=1) for African Americans and group 0 (G=0) for people of other races.
The Stata data (webster.dta) are provided. Restrict the sample to those with a non-missing outcome (TREADSS1<999). Confirm that the sample size retained is 1,303 for group 1 and 3,008 for group 0.
1. In the Tennessee class size study, students assigned at random to small classes in kindergarten (Z=1, cltypek=1) were expected to stay in small classes, while those assigned to regular classes in kindergarten (Z=0, cltypek=2, 3) were expected to remain in regular classes in the later years. According to the public documentation, there was no evidence of non-compliance during the kindergarten year. Here you are asked to compare the first-grade treatment received (D=1 if cltype1=1; D=0 if cltype1=2, 3) with the kindergarten treatment assigned (Z). Do you observe any evidence of non-compliance among first graders? In this data set, what are the relative proportions of compliers, always takers, and never takers, under the assumed absence of defiers, for group 1 (African Americans) and for group 0 (people of other races)? Explain how you obtain these relative proportions.
2. In order to decide whether to use the initial treatment assignment (Z) as an instrumental variable in this case, you need to evaluate the conceptual validity or effectiveness of this instrumental variable against each of the following five assumptions, which are made by both the LATE method and the econometric IV method, only by the econometric method, or only by the LATE method. The assumptions are: (a) the exclusion assumption, (b) the monotonicity assumption, (c) the exogeneity of Z, (d) the absence of correlation between the effects of Z on D and the effects of D on Y, and (e) the efficiency of using Z as an IV. For each of these five assumptions, identify whether it is required for both methods, required only for the econometric 2SLS method, or required only for the LATE method. Explain the reason for this classification.
For each of the five assumptions, explain whether the assumption can be justified, using either theoretical or empirical reasoning to judge how plausible it is in the context of this study.
3. Now restrict your analysis to group 1. Regress Y on D. What is the mean difference in first-grade reading achievement between those attending small classes (D=1) and those attending regular classes (D=0) in first grade? This is the prima facie (PF) estimate of the effect of D on Y. Subsequently, estimate the ITT effect of Z on D, denoted by γ1. (Although the dependent variable D is binary here, it does not pose a problem for a simple linear regression model in this particular case. This is because, when the predictor Z is also binary, γ1 simply represents the difference in proportions.) Then analyze a reduced-form regression model and estimate the ITT effect of Z on Y, denoted by β1. Finally, compute the ratio of the estimate of β1 to the estimate of γ1 and report the result. How does the estimated ratio of β1 to γ1 compare with the PF estimate of the effect of D on Y?
4. Continuation of 3: Working with the data for group 1 only, run a two-stage least squares analysis (you may use “ivregress 2sls” for IV analysis in Stata). Write down the regression models for the two stages of analysis. Record the estimated coefficient for D from the second stage and denote it as an IV estimate of δ1. How does this estimate compare with your final result from 3? According to your results from the two-stage least squares analysis, what do you conclude regarding the absence or presence of a causal effect of D on Y, and what is the average effect of class size reduction on the reading achievement of students?
5. Give a reason why the result from 4 might differ from the PF estimate obtained earlier when you simply regressed Y on D. Also explain why the result from 4 would differ from the estimated ITT effect of Z on Y.
6. Conduct a two-stage least squares analysis for group 0. What do you conclude from the result of this analysis?
7. Suppose you apply the LATE analysis instead of the econometric 2SLS estimate for the effect of D on Y. Answer the following questions.
(a) Do the LATE estimates differ from the econometric 2SLS estimate?
(b) Do the interpretations of the results differ between the two? If they differ, explain why.
8. Suppose you apply the following linear regression, using only the data of D=1:
Y = α + βZ + e
Assuming that Z is completely random and that monotonicity holds, express α and β using some of the following quantities. Here, the letters A, N, and C respectively indicate “always takers”, “never takers”, and “compliers”, and P(.) indicates the proportion of each latent class.
E(Y(0)|A), E(Y(0)|N), E(Y(0)|C), E(Y(1)|A), E(Y(1)|C), E(Y(1)|N), P(L=A), P(L=N), P(L=C)
9. For each group, conduct Black et al.'s endogeneity test without a control for covariates. What do you conclude from this analysis for the children attending small classes and for those attending regular classes in each racial group? In particular, which of the following statements is correct for each racial group? Explain the reason.
(a) The mean of Y does not differ between “compliers” and “always takers.”
(b) The mean of Y does not differ between “compliers” and “never takers.”
(c) The mean of Y does not differ among “compliers”, “always takers”, and “never takers.”
(d) The mean of Y differs among “compliers”, “always takers”, and “never takers.”
10. Based on the results of question 9, what can you do to increase the efficiency of the causal inference? Explain the reason. (You do not need an additional analysis here.)
11. Based on the two-stage least squares results of 4 and 6, test whether the difference in the average treatment effect between African Americans and people of other races is statistically significant. Describe the conclusion from this analysis. Note that there is no overlap in the sample between the two racial groups here.
12. Can the difference in the result of question 11 be interpreted as a causal effect? Explain the reason.
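As a rough illustration of the quantities that questions 1, 3, and 4 ask for, the following Python sketch computes them on synthetic data. Everything in it is an assumption made for illustration: the data are simulated rather than read from webster.dta, the latent class shares (10% always takers, 20% never takers, 70% compliers) and the effect size of 10 points are invented, and the function names are hypothetical. It relies on one standard fact: with a single binary instrument and no covariates, the 2SLS coefficient equals cov(Z, Y) / cov(Z, D), which in turn equals the ratio of the two ITT effects.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def simulate(n=20000, seed=0):
    """Synthetic (Z, D, Y) rows; NOT the webster.dta sample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)  # randomized assignment
        u = rng.random()
        # Hypothetical latent classes: A = always taker, N = never taker, C = complier.
        stratum = "A" if u < 0.10 else ("N" if u < 0.30 else "C")
        d = 1 if stratum == "A" or (stratum == "C" and z == 1) else 0
        # Z affects Y only through D (exclusion); always takers score higher
        # regardless of treatment, which biases the prima facie comparison.
        y = 500 + 10 * d + (40 if stratum == "A" else 0) + rng.gauss(0, 30)
        rows.append((z, d, y))
    return rows


def estimates(rows):
    z = [r[0] for r in rows]
    d = [r[1] for r in rows]
    y = [r[2] for r in rows]
    d_z1 = [di for zi, di in zip(z, d) if zi == 1]
    d_z0 = [di for zi, di in zip(z, d) if zi == 0]
    y_z1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y_z0 = [yi for zi, yi in zip(z, y) if zi == 0]
    y_d1 = [yi for di, yi in zip(d, y) if di == 1]
    y_d0 = [yi for di, yi in zip(d, y) if di == 0]

    itt_d = mean(d_z1) - mean(d_z0)  # ITT effect of Z on D
    itt_y = mean(y_z1) - mean(y_z0)  # ITT effect of Z on Y
    return {
        "p_always": mean(d_z0),      # P(D=1 | Z=0), assuming no defiers
        "p_never": 1 - mean(d_z1),   # P(D=0 | Z=1), assuming no defiers
        "p_complier": itt_d,         # remaining share under monotonicity
        "pf": mean(y_d1) - mean(y_d0),  # prima facie difference
        "itt_y": itt_y,
        "wald": itt_y / itt_d,       # ratio of the two ITT effects
        "iv_cov": covariance(z, y) / covariance(z, d),  # just-identified 2SLS slope
    }


if __name__ == "__main__":
    for name, value in estimates(simulate()).items():
        print(f"{name}: {value:.3f}")
```

In this simulation the prima facie difference is pushed upward because always takers have higher outcomes irrespective of treatment, while the ratio of ITT effects stays close to the simulated effect for compliers. Standard errors are not computed here; in Stata, “ivregress 2sls” reports them.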