Columbia University MA in Economics
GR 5411 Econometrics I Seyhan Erden
Problem Set 6
due on Dec.14th at 10am through Gradescope
__________________________________________________________________________________________
Instructions: Please aim for clarity, and make sure to give the answer to the question and nothing else. This problem set, like the previous ones, will help you get into the habit of getting your point across efficiently, a habit that will also help you in the final exam. Also, please specify the page number of each question when you submit your problem set to Gradescope.
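1. (Practice question, ungraded; please attend recitation for the solution) Prove the Cramér-Rao lower bound theorem simply by using the definition of correlation:
$$-1 \le \mathrm{corr}[t(x), \mathrm{score}] \le 1$$
$$-1 \le \frac{\mathrm{cov}[t(x), \mathrm{score}]}{\sqrt{\mathrm{var}[t(x)]\,\mathrm{var}[\mathrm{score}]}} \le 1$$
hence,
$$\left(\mathrm{cov}[t(x), \mathrm{score}]\right)^2 \le \mathrm{var}[t(x)]\,\mathrm{var}[\mathrm{score}]$$
where $t(x)$ is an unbiased estimator.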
2. (15p) Let $y_i \sim N(x_i'\beta, \sigma^2)$. That is, $\mu_i = x_i'\beta$.
Find the maximum likelihood estimator for $\beta$ and $\sigma^2$.
3. (13p) Show that $\tilde{\beta}^{EffGMM} = (X'ZH^{-1}Z'X)^{-1}X'ZH^{-1}Z'Y$ is the solution to the following minimization problem:
$$\min_b\ (Y - Xb)'ZH^{-1}Z'(Y - Xb)$$
where $H = Q_{ZZ}\,\sigma^2$.
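4. (Practice question, ungraded; please attend recitation for the solution) Let $y$ be $n \times 1$, and $X$ be $n \times k$ (rank $k$):
$$y = X\beta + \varepsilon$$
with $E(x_i\varepsilon_i) = 0$. Define the ridge regression estimator
$$\hat{\beta} = \left(\sum_{i=1}^{n} x_i x_i' + \lambda I_k\right)^{-1}\left(\sum_{i=1}^{n} x_i y_i\right)$$
where $\lambda > 0$ is a fixed constant. Find the probability limit of $\hat{\beta}$ as $n \to \infty$. Is $\hat{\beta}$ consistent for $\beta$?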
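To make the definition in problem 4 concrete, here is a minimal Python sketch of the estimator in the scalar case $k = 1$, where it reduces to $\sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The function names, the simulated design, and the parameter values are illustrative assumptions and not part of the problem.

```python
import random

def ridge_scalar(x, y, lam):
    """Scalar (k = 1) ridge estimator: sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

def simulate(n, beta=2.0, seed=0):
    """Hypothetical design: y = beta * x + e with standard normal x and e."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y

x, y = simulate(200)
ols = ridge_scalar(x, y, 0.0)    # lam = 0 gives OLS without an intercept
ridge = ridge_scalar(x, y, 5.0)  # lam > 0 shrinks the estimate toward zero
```

Evaluating `ridge_scalar` at several sample sizes with `lam` held fixed is one way to check an analytical answer numerically.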
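5. (24p) (Consistency of clustered standard errors) Consider the panel data model
$$Y_{it} = \beta X_{it} + \alpha_i + u_{it}$$
where all variables are scalars and usual fixed effects assumptions are valid. Let $\boldsymbol{M} = \boldsymbol{I}_T - T^{-1}\boldsymbol{i}\boldsymbol{i}'$, where $\boldsymbol{i}$ is a $T \times 1$ vector of ones. Also let $\boldsymbol{Y}_i = (Y_{i1}\ Y_{i2}\ \dots\ Y_{iT})'$, $\boldsymbol{X}_i = (X_{i1}\ X_{i2}\ \dots\ X_{iT})'$, $\boldsymbol{u}_i = (u_{i1}\ u_{i2}\ \dots\ u_{iT})'$, $\tilde{\boldsymbol{Y}}_i = \boldsymbol{M}\boldsymbol{Y}_i$, $\tilde{\boldsymbol{X}}_i = \boldsymbol{M}\boldsymbol{X}_i$ and $\tilde{\boldsymbol{u}}_i = \boldsymbol{M}\boldsymbol{u}_i$. For the asymptotic calculations in this problem, suppose $T$ is fixed and $n \to \infty$.
(a) (4p) Show that the fixed effects estimator of $\beta$ can be expressed as
$$\hat{\beta}_{FE} = \left(\sum_{i=1}^{n} \tilde{\boldsymbol{X}}_i'\tilde{\boldsymbol{X}}_i\right)^{-1}\sum_{i=1}^{n} \tilde{\boldsymbol{X}}_i'\tilde{\boldsymbol{Y}}_i$$
(b) (4p) Show that
$$\hat{\beta}_{FE} - \beta = \left(\sum_{i=1}^{n} \tilde{\boldsymbol{X}}_i'\tilde{\boldsymbol{X}}_i\right)^{-1}\sum_{i=1}^{n} \tilde{\boldsymbol{X}}_i'\tilde{\boldsymbol{u}}_i$$
(hint: $\boldsymbol{M}$ is idempotent)
(c) (4p) Let $Q_{\tilde{X}} = T^{-1}E\left(\tilde{\boldsymbol{X}}_i'\tilde{\boldsymbol{X}}_i\right)$ and
$$\hat{Q}_{\tilde{X}} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^2$$
Show that $\hat{Q}_{\tilde{X}} \to^{p} Q_{\tilde{X}}$.
(d) (4p) Let $\eta_i = \tilde{\boldsymbol{X}}_i'\tilde{\boldsymbol{u}}_i/\sqrt{T}$ and $\sigma_\eta^2 = \mathrm{var}(\eta_i)$. Show that
$$\sqrt{\frac{1}{n}}\sum_{i=1}^{n}\eta_i \to^{d} N\left(0, \sigma_\eta^2\right)$$
(e) (4p) Use your answer to (b) through (d) to prove
$$\sqrt{nT}\left(\hat{\beta}_{FE} - \beta\right) \to^{d} N\left(0, \frac{\sigma_\eta^2}{Q_{\tilde{X}}^2}\right)$$
Note that here you must use the following scalar version of fixed effects estimator
$$\hat{\beta}_{FE} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}\tilde{Y}_{it}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde{X}_{it}^2}$$
(f) (4p) Let $\hat{\sigma}^2_{\eta,clustered}$ be the infeasible clustered variance estimator, computed using the true errors instead of the residuals so that
$$\hat{\sigma}^2_{\eta,clustered} = \frac{1}{nT}\sum_{i=1}^{n}\left(\tilde{\boldsymbol{X}}_i'\boldsymbol{u}_i\right)^2$$
Show that $\hat{\sigma}^2_{\eta,clustered} \to^{p} \sigma_\eta^2$.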
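As a numerical companion to the hint in problem 5, the following Python sketch builds the matrix M = I_T - T^{-1} i i' for a small T and checks two of its properties: it is idempotent, and multiplying a vector by it subtracts the vector's mean. The helper names, the choice T = 4, and the example vector are assumptions made only for illustration. The check is no substitute for the algebraic argument the problem asks for.

```python
def demeaning_matrix(T):
    """M = I_T - (1/T) * i i', where i is a T x 1 vector of ones."""
    return [[(1.0 if r == c else 0.0) - 1.0 / T for c in range(T)]
            for r in range(T)]

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

T = 4                          # illustrative panel length
M = demeaning_matrix(T)
MM = matmul(M, M)              # should reproduce M (idempotency)
X_i = [1.0, 2.0, 4.0, 9.0]     # hypothetical observations for one entity (mean 4.0)
X_tilde = matvec(M, X_i)       # each entry minus the entity mean
```

Here `X_tilde` contains the deviations of `X_i` from its mean, which is the "entity demeaning" used by the fixed effects estimator.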
6. (8p) Use the data in cornwell.dta (from Cornwell and Trumbull¹).
(a) (2p) Using the data for all seven years, and using the logarithms of all variables, estimate a model
relating the crime rate to prbarr, prbconv, prbpris, avgsen, and polpc. Use pooled OLS and include a full set of year dummies. Test for serial correlation assuming that the explanatory variables are strictly exogenous. If there is serial correlation, obtain the fully robust standard errors.
(b) (2p) Add a one-year lag of log(crmrte) to the equation from part (a), and compare with the estimates from part (a).
(c) (2p) Test for first-order serial correlation in the errors in the model from part (b). If serial correlation is present, compute the fully robust standard errors.
(d) (2p) Add all of the wage variables (in logarithmic form) to the equation from part (c). Which ones are statistically and economically significant? Are they jointly significant? Test for joint significance of the wage variables allowing arbitrary serial correlation and heteroskedasticity.
¹ Cornwell, C. and W. Trumbull (1994) “Estimating the Economic Model of Crime with Panel Data,” Review of Economics and Statistics 76, 360-366.
7. (20p) The data file income_democracy.dta contains a panel data set for 195 countries for the years 1960, 1965, ..., 2000.² The data set contains an index of political freedom/democracy for each country in each year, together with each country’s income and various demographic controls. (The income and demographic controls are lagged five years relative to the democracy index to allow time for democracy to adjust to changes in these variables.)
Variable Name    Description
country          country name
year             year
dem_ind          index of democracy
log_gdppc        logarithm of real GDP per capita
log_pop          logarithm of population
age_1            fraction of the population age 0-14
age_2            fraction of the population age 15-29
age_3            fraction of the population age 30-44
age_4            fraction of the population age 45-59
age_5            fraction of the population age 60 and older
educ             average years of education for adults (25 years and older)
age_median       median age
code             country code
Notes: The income and demographic variables are lagged five years. For example, log_gdppc for year = 1965 is the logarithm of GDP per capita in 1960.
(a) (2p) Start by “telling” Stata that this is panel data. Is this data set a balanced panel? Explain.
(b) (2p) What is the democracy index for the United States in 1965? For Uruguay in 1965? For Trinidad and Tobago in 1995? For Venezuela in 1995?
(c) (2p) What is the average overall democracy index for all years in this data set? What are the minimum and maximum values of dem_ind? What is the standard deviation? What are the 10th, 25th, 50th, 75th, and 90th percentiles of its distribution?
(d) (2p) Regress democracy index on logarithm of per capita GDP using standard errors that are clustered by country. Report your results.
² The data were supplied by Professor Daron Acemoglu and are a subset of the data used in his paper with Simon Johnson, James Robinson, and Pierre Yared, “Income and Democracy,” American Economic Review, 2008, 98(3): 808-842.
(e) (2p) Interpret the coefficient on the logarithm of per capita GDP in part (d). Is it significant? Explain.
(f) (2p) If per capita income in a country increases by 20%, by how much is the democracy index predicted to increase? What is the 95% confidence interval for the prediction?
(g) (2p) Regress the democracy index on the logarithm of per capita GDP, controlling for country fixed effects and using standard errors that are clustered by country.
(h) (2p) Generate year dummies. Regress democracy index on logarithm of per capita GDP and year dummies, controlling for country fixed effects and using standard errors that are clustered by country. Report your results.
(i) (4p) Run the model in part (g), assuming homoskedasticity and no serial correlation, once with fixed effects and once with random effects, and perform a Hausman test. Which model would you choose according to the result of the Hausman test?
8. (20p) In this problem we will study how firms’ investment responds to changes in their valuation. In particular, we will see how closely related investment and “Tobin’s Q” are, using the dataset “q.dta”.³ This is a panel dataset of investment by 188 US firms over the 11 years 1975-1985. The main variables of interest are the broad investment-to-capital ratio, ikb, and the broad Q, qb. Q is the ratio of the market value of a firm’s equity and liabilities to their book value.
(a) (3p) Start by running a naïve regression. Regress ikb on qb using heteroskedasticity-robust standard errors (reg ikb qb, r). Interpret the sign and magnitude of the coefficient. Can you give this estimate a causal interpretation? In particular, can you think of omitted factors that vary across firms but not over time? And omitted factors that vary over time but not across firms?
(b) (4p) Now let’s deal with time-invariant omitted factors that vary across firms. The variable 𝑐𝑢𝑠𝑖𝑝 identifies the firms, and the variable 𝑦𝑒𝑎𝑟 identifies the year of each observation.
Declare the dataset to be a panel.
Show that the entity-demeaned regression and the fixed effects regression give the same answer. [Hint: for this, you will find the “bysort :” and “egen” commands useful. For example, egen ikb_m = mean(ikb) will add a variable called ikb_m to the dataset containing the mean of ikb, while bysort year: egen ikb_m = mean(ikb) will add a variable called ikb_m to the dataset containing the mean of ikb by year.]
³ Incidentally, you can access the data sets for the Wooldridge textbooks through Boston College (http://fmwww.bc.edu/ec-p/data/wooldridge/datasets.list.html). They even set up a nice Stata command, bcuse, that lets you read them straight into Stata. To install it, type “ssc install bcuse” in Stata. Then you can load this dataset by typing “bcuse q.dta, clear”.
What has happened to the coefficient on 𝑞𝑏? What does this suggest about the omitted factors in the regression in (a)?
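The mechanics of entity demeaning can also be sketched outside Stata. The Python snippet below is only an illustration on a made-up six-observation panel: the firm labels, the numbers, and the function names are assumptions and are not taken from q.dta. It mirrors the "mean by group, then subtract" step that bysort and egen perform, and compares a pooled slope with the slope from the demeaned data.

```python
from collections import defaultdict

def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each group's mean from its members (group mean, then deviation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical toy panel: y = firm effect + 2 * x exactly,
# with the firm effect negatively related to the level of x.
firm = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
effect = {"A": 10.0, "B": 0.0}
y = [effect[f] + 2.0 * xi for f, xi in zip(firm, x)]

pooled = slope(x, y)                                                 # ignores firm effects
within = slope(demean_by_group(x, firm), demean_by_group(y, firm))  # entity-demeaned
```

In this constructed example the demeaned regression recovers the slope of 2 used to build the data, while the pooled slope is negative because the omitted firm effect is correlated with x. Whether anything similar happens in the actual data is for you to determine.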
(c) (6p) Now let’s think about omitted factors that vary over time but not across firms.
(i) (2p) First, let’s just see whether investment varies a lot over time. Make a plot of how overall investment varies over time. [Hint: you may find the collapse command useful for this. After you plot the data, make sure to clear and run bcuse q.dta again before you start answering the next question.]
(ii) (2p) Add time fixed effects as well as firm fixed effects to your regression of ikb on qb. Make sure to use clustered (heteroskedasticity- and autocorrelation-consistent) standard errors, so the command must be:
xtreg ikb qb i.year, fe vce(cluster cusip)
Make a plot of the estimated time fixed effects over time. [Hint: you may find the coefplot package useful for this. To install it, type ssc install coefplot]
(iii) (2p) Comment on the patterns of these time effect estimates. Are they consistent with what you know about US economic history? (Hint: https://en.wikipedia.org/wiki/1979_oil_crisis)
(d) (4p) Now, let’s regress ikb on qb assuming homoskedasticity and no serial correlation, once using (i) a fixed effects model and once using (ii) a random effects model. Briefly explain the difference and test which one would be preferred.
(e) (3p) Between the pooled and random effects models, which one would you prefer? (Hint: use xttest0 after the RE model.) What is this test testing, that is, what is the null hypothesis?