HW 4 – DiD & FE model
Due: 11/28 11:59pm
Please submit a Word or PDF file and include the necessary R code within it.
Also attach the relevant R output and screenshots as you see fit.
Please submit to Blackboard.
Name:
Q1. A microfinance institution in Bangladesh raised the interest rate at two of its branches at a certain point in time (Tikapara, TI, and Kalyanpur, KA; consider these together as the treatment branches), while leaving the interest rate unchanged at a third branch (Geneva, GE; consider this the comparison branch). The dataset hw4.safesave_slim_data.csv gives you information on loan balances (loanbal), the outcome of interest, along with some other relevant variables (monthyear gives the time period, trend is a linear time trend, TIKA identifies the treatment group, and so on). The post period is defined as February 2000 onward.
Given this information, create a plot displaying the time trends in loan balances for the treatment and comparison branches, before and after the interest rate change. The comparison branch is the control group; use different colors for the treatment and control groups, and draw a vertical line to separate the pre- and post-treatment periods. Eyeballing the figure, do the trends look parallel?
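A minimal sketch of such a plot, in base R on simulated data. The data frame `sim`, its columns `period`, `treat`, and `y`, and the cutoff at period 13 are illustrative assumptions, not the contents of hw4.safesave_slim_data.csv; with the real data the analogous columns would be monthyear, TIKA, and loanbal.

```r
# Simulated stand-in for the loan data (all names and values are hypothetical).
set.seed(1)
sim <- expand.grid(id = 1:40, period = 1:24)
sim$treat <- as.integer(sim$id <= 20)   # first 20 ids play the "treated" role
sim$y <- 100 + 20 * sim$treat + 1.5 * sim$period + rnorm(nrow(sim), sd = 5)

# Average outcome per group and period.
means <- aggregate(y ~ period + treat, data = sim, FUN = mean)
ctrl  <- means[means$treat == 0, ]
trt   <- means[means$treat == 1, ]

# One line per group in different colors, plus a dashed vertical line
# at the (assumed) first post-treatment period.
cutoff <- 13
plot(ctrl$period, ctrl$y, type = "l", col = "blue",
     ylim = range(means$y), xlab = "Period", ylab = "Mean outcome")
lines(trt$period, trt$y, col = "red")
abline(v = cutoff, lty = 2)
legend("topleft", legend = c("Comparison", "Treatment"),
       col = c("blue", "red"), lty = 1)
```

Plotting group means rather than individual loans keeps the figure readable; other summaries (medians, for example) may suit skewed balances better.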
Now run a regression to test the parallel trends assumption in the pre-period (i.e., that the treatment and comparison branches have the same time trend before the interest rate change). Explain how you do this. Also check whether, within the comparison branch, either the intercept or the slope changed when the interest rate changed.
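One common way to set up these two checks is with interaction terms, sketched below on simulated data. The data frame `sim` and the columns `treat`, `post`, and `y` are hypothetical stand-ins; both groups are simulated with the same slope, so this illustrates the mechanics only and says nothing about the actual data.

```r
# Simulated stand-in (hypothetical names); slopes are identical by construction.
set.seed(2)
sim <- expand.grid(id = 1:40, trend = 1:24)
sim$treat <- as.integer(sim$id <= 20)
sim$post  <- as.integer(sim$trend >= 13)
sim$y <- 100 + 20 * sim$treat + 1.5 * sim$trend + rnorm(nrow(sim), sd = 5)

# Pre-period only: the trend:treat coefficient is the difference in slopes
# between the two groups, so its t-test is a test of parallel pre-trends.
pre_fit <- lm(y ~ trend * treat, data = subset(sim, post == 0))
print(summary(pre_fit)$coefficients["trend:treat", ])

# Comparison group only: post captures an intercept shift and
# trend:post a slope shift at the time of the change.
ctrl_fit <- lm(y ~ trend * post, data = subset(sim, treat == 0))
print(summary(ctrl_fit)$coefficients[c("post", "trend:post"), ])
```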
In general, what is needed for the DiD strategy to be valid? In this context, what do you find?
You will notice that the comparison and treatment branches have different levels of loan balances (i.e., different intercepts) before the interest rate change. Is this problematic for the DiD strategy? Does this difference persist when you control for the variables tinpr (time in program: the length of time the borrower has been with the bank) and nage (the age of the borrower)?
What is the effect of interest rate change on loan balance?
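For reference, the basic two-group, two-period DiD regression can be written as follows. The indicator name $\text{Post}_t$ (equal to 1 from February 2000 onward) is an assumed label; the dataset may use a different variable name.

```latex
\[
\text{loanbal}_{it} = \beta_0 + \beta_1\,\text{TIKA}_i + \beta_2\,\text{Post}_t
  + \beta_3\,(\text{TIKA}_i \times \text{Post}_t) + \varepsilon_{it}
\]
```

In this form $\beta_3$ is the DiD estimate: the pre-to-post change in mean loan balance at the treatment branches minus the corresponding change at the comparison branch.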
Make any necessary transformations to the DV or the model (yes, a DiD model is in essence a regression model, so all the assumption tests and remedies we learned apply here). Add loan fixed effects (nacc) and time fixed effects (notice that we now have multiple time points), along with tinpr and nage. Cluster the standard errors by nacc and report heteroskedasticity-robust standard errors.
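A rough sketch of a two-way fixed effects DiD with clustered standard errors, on simulated data. The names `panel`, `unit`, `period`, `x`, and `true_effect` are hypothetical, and the fixed effects are entered as dummies through `lm`; the clustered variance uses `vcovCL` from the sandwich package and is skipped if that package is not installed. Other routes (for example the plm package) are equally reasonable.

```r
# Simulated panel (all names and values are hypothetical).
set.seed(3)
panel <- expand.grid(unit = 1:40, period = 1:12)
panel$treat <- as.integer(panel$unit <= 20)
panel$post  <- as.integer(panel$period >= 7)
panel$x     <- rnorm(nrow(panel))
unit_fx     <- rnorm(40, sd = 3)
true_effect <- -2
panel$y <- 10 + unit_fx[panel$unit] + 0.5 * panel$period + 1.0 * panel$x +
  true_effect * panel$treat * panel$post + rnorm(nrow(panel))

# Unit and period dummies absorb the treat and post main effects,
# so only the interaction enters explicitly.
fe_fit <- lm(y ~ treat:post + x + factor(unit) + factor(period), data = panel)
print(coef(fe_fit)["treat:post"])

# Standard errors clustered by unit, if the sandwich package is available.
if (requireNamespace("sandwich", quietly = TRUE)) {
  vc <- sandwich::vcovCL(fe_fit, cluster = panel$unit)
  print(sqrt(diag(vc))["treat:post"])
}
```

With many loans, dummy variables become slow; a within-transformation estimator gives the same slope estimates at lower cost.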
Q2. Load the dataset panel_hw4.csv. This dataset examines voter turnout in 49 US states (Louisiana is omitted because of an unusual election in 1982) plus the District of Columbia over 11 elections (i.e., 50 units over 11 time periods). Submit write-ups for these problems with the necessary R code and output screenshots.
a. Regress turnout as a percent of voting age population (vaprate) on the number of days before the general election by which an individual needs to register (regdead), state per capita income (gsp), the dummy variable for midterm elections (midterm), and the dummy variables for West North Central (WNCentral), the South (South), and the Border states (Border).
Which coefficients are significant? Are there any regional effects? Conduct the appropriate hypothesis test.
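One way to test several dummy coefficients jointly is to compare a restricted and an unrestricted model with an F test, sketched here in base R on simulated data. The data frame `d` and the dummies `regA`, `regB`, and `regC` are hypothetical stand-ins for the regional indicators.

```r
# Simulated cross-section (hypothetical names); only regA has a true effect.
set.seed(4)
n <- 300
d <- data.frame(x = rnorm(n),
                region = sample(c("A", "B", "C", "Other"), n, replace = TRUE))
d$regA <- as.integer(d$region == "A")
d$regB <- as.integer(d$region == "B")
d$regC <- as.integer(d$region == "C")
d$y <- 1 + 0.5 * d$x + 2 * d$regA + rnorm(n)

# H0: the three regional coefficients are all zero.
restricted <- lm(y ~ x, data = d)
full       <- lm(y ~ x + regA + regB + regC, data = d)
joint      <- anova(restricted, full)
print(joint)
```

The individual t-tests and the joint F test answer different questions, so it is worth reporting both.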
b. Part (a) assumed that pooling the data was valid. Instead, estimate this with a fixed effects regression (individual effects only; NOT twoways; do NOT include time dummies in the model). Which variables are omitted from the estimation? Why?
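As a reminder of what the individual fixed effects estimator does, here is the within transformation in generic notation. The symbols $\alpha_i$, $x_{it}$, and $\bar{x}_i$ are generic and not tied to specific columns of panel_hw4.csv.

```latex
\[
y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}
\quad\Longrightarrow\quad
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i),
\qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}
\]
```

Subtracting unit means removes $\alpha_i$, so $\beta$ is identified only from variation within each unit over time.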
c. Test whether there is evidence of unobserved heterogeneity, that is, whether a fixed effects model is more appropriate than the pooled model. What hypothesis are you testing? What is your test statistic? Is pooling appropriate in light of the results of your test?
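A sketch of one standard approach, assuming an F test of the pooled model against a dummy-variable (LSDV) fixed effects model is what is intended. It uses base R and simulated data; `panel`, `unit`, and `x` are hypothetical names.

```r
# Simulated panel with sizeable unit effects (hypothetical names and values).
set.seed(5)
panel <- expand.grid(unit = 1:20, period = 1:8)
unit_fx <- rnorm(20, sd = 2)
panel$x <- rnorm(nrow(panel))
panel$y <- 1 + 0.5 * panel$x + unit_fx[panel$unit] + rnorm(nrow(panel))

# H0: all unit intercepts are equal (pooling is valid).
pooled <- lm(y ~ x, data = panel)
lsdv   <- lm(y ~ x + factor(unit), data = panel)
f_test <- anova(pooled, lsdv)
print(f_test)
```

The numerator degrees of freedom equal the number of units minus one, which is a quick check that the two models are nested as intended.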
d. Now estimate a random-effects model using the specification from part (a). Why do those results differ from the fixed effects results? Is there evidence of unobserved heterogeneity, that is, is a random effects model more appropriate than the pooled model? State the hypothesis you are testing and your conclusion. Are any variables omitted from the estimation? Why or why not?
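For the random effects versus pooled comparison, one commonly used test is the Breusch-Pagan Lagrange multiplier test. The sketch below computes the balanced-panel statistic by hand from pooled OLS residuals, assuming that is the intended test; the data are simulated and `panel`, `N`, and `T_len` are hypothetical names.

```r
# Simulated balanced panel with unit-level random effects (hypothetical values).
set.seed(6)
N <- 20
T_len <- 8
panel <- expand.grid(unit = 1:N, period = 1:T_len)
unit_fx <- rnorm(N, sd = 2)
panel$x <- rnorm(nrow(panel))
panel$y <- 1 + 0.5 * panel$x + unit_fx[panel$unit] + rnorm(nrow(panel))

# Pooled OLS residuals, summed within each unit.
e <- resid(lm(y ~ x, data = panel))
sum_by_unit <- tapply(e, panel$unit, sum)

# Breusch-Pagan LM statistic for a balanced panel;
# H0: the variance of the unit effects is zero. Chi-squared with 1 df.
lm_stat <- (N * T_len) / (2 * (T_len - 1)) *
  (sum(sum_by_unit^2) / sum(e^2) - 1)^2
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
print(c(statistic = lm_stat, p_value = p_value))
```

Packaged versions of this test exist (for example in plm), and they also handle unbalanced panels, which this hand computation does not.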
e. Between FE and RE, which model is more appropriate? What hypothesis underlies your test? What does the null hypothesis imply? Discuss the tradeoffs between using pooled OLS, fixed-effects, and random-effects for this model.
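Assuming the Hausman test is the intended comparison, its statistic contrasts the two sets of coefficient estimates on the regressors that both models can estimate. The notation below is generic.

```latex
\[
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
\;\overset{a}{\sim}\; \chi^2_k \quad \text{under } H_0
\]
```

Here $k$ is the number of coefficients being compared, that is, the time-varying regressors common to both models.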