HW 5 – 2SLS & Logistic model

Due: 12/12 11:59pm

Please submit a Word or PDF file and include the necessary R code in it.
Also attach R output and screenshots as you see appropriate.
Please submit to Blackboard.

Name:

Q1. Women with children work less than women without children. In a model where labor supply is regressed on the number of children in a household, the coefficient on the number of children is negative, large in magnitude, and statistically significant. This does not mean that the drop in work is actually caused by the presence of children in the house. (Why not?) To obtain a consistent estimate of the impact of kids on labor supply, some authors have suggested using whether a mother had twins at her first birth as an instrument for the number of children in the household. Twins are in many respects random, and a twin birth increases the number of children in the household by 1.

The data come from the 1980 Public-Use Micro Sample 5% Census (PUMS) data files. The file contains a sample of women aged 21-40 with at least one child. The 1980 PUMS records each person’s age at the time of the census and their quarter of birth. Because the census is taken on April 1st, we know each person’s year and quarter of birth, and we can infer that any two kids in the household with the same age and quarter of birth are twins. There are roughly 6,000 mothers whose first birth was a twin birth. The original data set has over 800,000 observations; the data file hw5.twin1stv13.csv contains the twin first births plus a random sample of about 6,500 non-twin first births, for a total of about 12,500 observations.
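The twin-identification rule described above can be sketched as follows. This is a minimal illustration in Python (the assignment itself is to be done in R), not the procedure actually used to build the data file. The function names and the (age, quarter of birth) tuple layout are assumptions made for this example.

```python
CENSUS_YEAR = 1980  # the census is taken on April 1st, 1980


def birth_year(age, quarter):
    """Infer year of birth from age on census day and quarter of birth.

    Someone born in quarter 1 (Jan-Mar) has already had a birthday by
    April 1st; someone born in quarters 2-4 has not (births exactly on
    April 1st are ignored in this sketch).
    """
    return CENSUS_YEAR - age if quarter == 1 else CENSUS_YEAR - age - 1


def first_birth_is_twin(children):
    """Return True if the two oldest children share year and quarter of birth.

    `children` is a list of (age_in_years, quarter_of_birth) tuples, one per
    child in the household (a hypothetical layout chosen for this example).
    """
    if len(children) < 2:
        return False
    # Sort by (birth year, quarter) so the oldest children come first.
    births = sorted((birth_year(age, q), q) for age, q in children)
    return births[0] == births[1]
```

For example, `first_birth_is_twin([(5, 2), (5, 2), (3, 1)])` treats the two five-year-olds born in the same quarter as twins at first birth.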

| Variable | Description |
| --- | --- |
| agem | Mother’s current age in years |
| agefst | Mom’s age when she first gave birth |
| race | 1 = white, 2 = black, 3 = other races |
| educm | Mother’s years of education |
| married | Dummy variable for current marital status, 1 = married, 0 = not |
| kids | Number of children ever born to the mother |
| boy1st | Dummy variable, = 1 if first kid is a boy, = 0 otherwise |
| twin1st | Dummy variable, = 1 if the first pregnancy ended in a twin birth |
| weeks | Weeks worked in previous year (from 0-52) |
| worked | Dummy variable, = 1 if the Mom worked at all in the previous year |
| lincome | Labor income earned in the previous year |

Answer the following questions.

(a). What fraction of women work? What is the average number of weeks worked among women who work? What are the median labor earnings of women who worked?

(b). Construct an indicator that equals 1 for women that have a second child. Call this variable SECOND. What fraction of women had a second child?

Consider a simple bivariate regression where WEEKS (Y) is regressed on SECOND (X), such as Y = β0 + β1X + ε. What is the estimate of β1 in this regression?
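As a rough illustration of the bivariate regression above, the sketch below builds the SECOND indicator and computes the OLS slope as cov(X, Y) / var(X). It is written in plain Python for illustration only (the assignment asks for R). The helper names and the toy numbers are made up and are not taken from hw5.twin1stv13.csv.

```python
from statistics import mean


def second_indicator(kids):
    """1 if the mother has had at least two children, else 0."""
    return [1 if k >= 2 else 0 for k in kids]


def ols_slope(x, y):
    """Slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Made-up toy data (hypothetical, for illustration only).
kids = [1, 2, 3, 1, 2, 1]
weeks = [40, 20, 10, 52, 30, 46]

second = second_indicator(kids)          # [0, 1, 1, 0, 1, 0]
share_second = mean(second)              # fraction with a second child
slope = ols_slope(second, weeks)         # -26.0 for this toy data
```

Because the regressor is a 0/1 dummy, the slope equals the difference in mean weeks worked between the two groups (20 versus 46 in the toy data).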

Because of the concern that X and ε are correlated, use twins on 1st birth TWIN1ST (Z) as an instrument for X in an instrumental variables model.

Consider the first stage regression of X on Z. Why is the coefficient on Z not 1? After all, don’t twins increase the number of kids in the house by 1?

What is the IV estimate of the coefficient on X? [FYI, this IV estimate is also known as the Wald estimate.] Compare the coefficient to the OLS estimate you produced above. Why does it differ?
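With a binary instrument, the Wald estimate is the ratio of two differences in means: the difference in mean Y between Z = 1 and Z = 0, divided by the difference in mean X between the same two groups. A minimal Python sketch of that arithmetic follows (the assignment asks for R). The function name and the toy data are assumptions made for illustration.

```python
from statistics import mean


def wald_estimate(y, x, z):
    """IV (Wald) estimate with a binary instrument z.

    Numerator: reduced form, mean(y | z=1) - mean(y | z=0).
    Denominator: first stage, mean(x | z=1) - mean(x | z=0).
    Raises ZeroDivisionError if the instrument does not move x at all.
    """
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    reduced_form = mean(y1) - mean(y0)
    first_stage = mean(x1) - mean(x0)
    return reduced_form / first_stage


# Made-up toy data (hypothetical): z = twin at first birth,
# x = has a second child, y = weeks worked.
z = [1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0]
y = [20, 30, 25, 45, 35, 55]

wald = wald_estimate(y, x, z)  # (25 - 40) / (1 - 0.5) = -30.0
```

In the toy data every mother with twins has X = 1, while only half of the other mothers do, so the first stage is 0.5 rather than 1.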

(c). A number of authors have used twins as an instrument for fertility. The argument is that twins are “random”, but the question is whether twins convey information about the mother. Create indicators for the mother’s race. Run a series of regressions of 6 different outcomes (EDUCM, AGEFST, MARRIED, and whether the mother is white, black, or some other race) on a single indicator: TWIN1ST.

Interpret the coefficients. Which coefficients are statistically significant? Are these differences economically meaningful, that is, are the coefficients large in magnitude [hint: you can compare the effect of a variable to the standard deviation of the variable in the sample (NOT the s.e. of the coefficient)]? What do these results suggest about the “randomness” of twins on first birth?
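The hint above amounts to dividing a coefficient by the sample standard deviation of the outcome it refers to. A small Python sketch of that comparison is shown below (the assignment asks for R). The function name and the numbers are hypothetical.

```python
from statistics import stdev


def effect_in_sd_units(coefficient, outcome_values):
    """Express a regression coefficient in sample standard deviations
    of the outcome variable (not the standard error of the coefficient)."""
    return coefficient / stdev(outcome_values)


# Made-up example: a coefficient of 0.5 on an outcome whose sample
# values are listed below (hypothetical numbers).
outcome = [10, 12, 14, 16, 18]
relative_size = effect_in_sd_units(0.5, outcome)
```

A ratio close to zero suggests a difference that is small in magnitude even when it is statistically significant in a large sample.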

(d). Now that we know twins are correlated with some observed characteristics, run two labor supply models via OLS, with weeks worked (WEEKS) and whether a mom worked (WORKED) as outcomes, and control for mothers’ agem, agefst, educm, black, other race, married and SECOND. What is the impact of a second child on labor supply and weeks worked?

Now, use TWIN1ST as an instrument (for SECOND) in these models. Compare these estimates to the IV (Wald) estimates in (b). What has happened to the labor supply impacts of having a second child? Explain.

For each of these two models, construct a Hausman test of the null hypothesis that SECOND is exogenous in the labor supply model. Can you reject the null hypothesis that SECOND is exogenous?

(e). The results in (c) suggest that twins might signal something about the mother that is correlated with labor supply, and as a result, the IV (Wald) estimates in (b) and the 2SLS estimates in (d) may be more inconsistent than OLS estimates. Calculate the correlation coefficient between Z and X. Given this value, is this a concern?
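The correlation asked for above is the ordinary Pearson correlation between the instrument and the endogenous regressor. A minimal Python sketch follows (the assignment asks for R). The function name and the toy data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# Made-up toy data (hypothetical): z = twin at first birth,
# x = has a second child.
z = [1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0]

r_zx = pearson_r(z, x)  # approximately 0.5 for this toy data
```

In the bivariate case, the inconsistency of IV relative to OLS is proportional to 1 / corr(Z, X), so a small correlation magnifies any correlation between Z and the error term.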

(f). Create three dummy variables that indicate whether the mother’s first birth was before age 20, between ages 20 and 24, or after age 24. Next, interact TWIN1ST with these three variables to construct three instruments. Estimate the 1st stage regression (use the set of controls in (d)) and see whether the effect of a twin first birth on fertility (SECOND) differs by the mother’s age at first birth (to clarify, include the three interactions only; do not include main effects of TWIN1ST or FIRST-BIRTH AGE indicators). Using F tests, test two different hypotheses. The first is that the coefficients on the three instruments are all equal; the second is that they are all equal to zero. Can you reject the null hypothesis in each case?
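The construction of the three instruments can be sketched as follows, in plain Python for illustration (the assignment asks for R). The function name is hypothetical, and the sketch assumes agefst is recorded in whole years, so "between ages 20 and 24" means 20 through 24 inclusive.

```python
def twin_age_instruments(twin1st, agefst):
    """Return the three interaction instruments for one mother.

    Order: (twin x first birth before 20,
            twin x first birth at 20-24,
            twin x first birth after 24).
    Assumes agefst is measured in whole years.
    """
    under20 = 1 if agefst < 20 else 0
    from20to24 = 1 if 20 <= agefst <= 24 else 0
    over24 = 1 if agefst > 24 else 0
    return (twin1st * under20, twin1st * from20to24, twin1st * over24)


# Example rows (made-up values): (twin1st, agefst)
rows = [(1, 19), (1, 22), (1, 30), (0, 22)]
instruments = [twin_age_instruments(t, a) for t, a in rows]
```

The three instruments always sum to TWIN1ST, which is one reason the main effect of TWIN1ST cannot be included alongside all three interactions.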

(g). Using weeks worked and whether the mother worked as outcomes and the same covariates as in (d), use the three instruments from (f) in a 2SLS model where SECOND is considered an endogenous variable. What has happened to the coefficient on SECOND in the WEEKS and WORKED equations in these over-identified models? Conduct tests of over-identifying restrictions for these two models. What are the degrees of freedom on these test statistics? Do you reject or not reject the null hypothesis that the model is correctly specified?
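One common form of the over-identification test (an assumption here, since the assignment does not say which version to use) is the Sargan-style statistic: regress the 2SLS residuals on all exogenous variables, including the excluded instruments, and multiply the R-squared of that auxiliary regression by the sample size. Under the null, the statistic is approximately chi-square with degrees of freedom equal to the number of excluded instruments minus the number of endogenous regressors. The Python sketch below only does that final arithmetic (the assignment asks for R). The function name and the input values are hypothetical.

```python
def overid_test(n_obs, aux_r_squared, n_excluded_instruments, n_endogenous):
    """Sargan-style n * R^2 statistic and its degrees of freedom.

    aux_r_squared is the R^2 from regressing the 2SLS residuals on all
    exogenous variables (controls plus excluded instruments).
    """
    df = n_excluded_instruments - n_endogenous
    if df <= 0:
        raise ValueError("model is not over-identified")
    statistic = n_obs * aux_r_squared
    return statistic, df


# Made-up inputs (hypothetical): 1,000 observations, auxiliary R^2 of 0.004,
# three excluded instruments and one endogenous regressor.
statistic, df = overid_test(1000, 0.004, 3, 1)
```

The statistic is then compared with a chi-square critical value at the chosen significance level for the returned degrees of freedom.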

Q2. It is September 2028. After his stunning and decisive upset victory over Donald Trump Jr. in the Indiana primary, Republican Presidential Nominee Ted Cruz now faces the daunting challenge of taking on his even more surprising opponent: Democratic nominee Pete Buttigieg, the youthful former mayor of South Bend, Indiana. As Biden’s Secretary of Transportation, Buttigieg has electrified rural voters with his successful efforts to build smart streets and roundabouts in every small town in America. Cruz, however, remains optimistic. He believes it is actually a very close race at the moment. Further, if he can identify which of his issues resonates most with the American people, he is confident he can win and provide the nation with the leadership it so desperately needs. His pollsters have therefore gathered the following information from over 4,000 likely voters:

| Variable | Description |
| --- | --- |
| cruz | 1 = supports Cruz, 0 = supports Buttigieg |
| male | 1 = male, 0 = female |
| socialcons | 1 = respondent considers themselves conservative on social issues, 0 = not socially conservative |
| fiscalcons | Fiscal conservatism scale (continuous variable). The higher the score, the more fiscally conservative the respondent is. The scale has been centered to have a mean of zero. The scale potentially runs from -13 to +13, but the observed values are a little less extreme than that. |
| ses | Socio-Economic Status (ordinal scale). 1 = Lower Class, 2 = Middle Class, 3 = Upper Class. |

Answer the following questions by analyzing the data hw5.cruz.csv.

(a). Cruz thinks he has almost as much support as Buttigieg. Do you agree?

(b). Run three nested logistic regression models. First model: cruz ~ male, second: cruz ~ male + socialcons, third: cruz ~ male + socialcons + fiscalcons.
Cruz’s staff thought model 3 was the best. Explain how both likelihood ratio chi-square statistics and AIC statistics support this decision.
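The two statistics mentioned above can be computed directly from the log-likelihoods of the fitted models. The Python sketch below shows the arithmetic (the assignment asks for R). The function names and the log-likelihood values are made up. The p-value shortcut is valid only for a test with one degree of freedom, which fits this setting because each successive model adds exactly one regressor.

```python
from math import erfc, sqrt


def aic(log_lik, n_params):
    """Akaike information criterion: -2 * logLik + 2 * k (lower is better)."""
    return -2.0 * log_lik + 2.0 * n_params


def lr_test_1df(log_lik_small, log_lik_big):
    """Likelihood ratio test of a nested model with one extra parameter.

    Returns (statistic, p_value). For a chi-square with 1 degree of freedom,
    P(X > s) = erfc(sqrt(s / 2)). Assumes log_lik_big >= log_lik_small,
    which holds for nested maximum likelihood fits.
    """
    stat = 2.0 * (log_lik_big - log_lik_small)
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Made-up log-likelihoods (hypothetical), NOT results from hw5.cruz.csv.
stat, p_value = lr_test_1df(-102.0, -100.0)  # statistic of 4.0
aic_small = aic(-102.0, 2)
aic_big = aic(-100.0, 3)
```

A larger model is supported when the likelihood ratio test rejects the smaller model and when its AIC is lower than that of the smaller model.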

(c). According to the linear probability model (lm(cruz ~ male + socialcons + fiscalcons)), the effect of male is .405. What does that number mean, i.e. how do we interpret it? Why is that number potentially problematic?

(d). Obtain the marginal effect (i.e., the average partial effect) of gender (male) on Cruz’s support. How does this differ from what the linear probability model said about the effect of gender?
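For a binary regressor such as male, one common way to compute the average partial effect from a fitted logit is to average, over all respondents, the difference between the predicted probability with male set to 1 and with male set to 0. Some software reports a derivative-based version instead, so results can differ slightly. The Python sketch below illustrates the discrete-difference version (the assignment asks for R). The function names, the coefficients and the data rows are hypothetical and are not estimates from hw5.cruz.csv.

```python
from math import exp


def logistic(v):
    """Logistic CDF: maps a linear index to a probability."""
    return 1.0 / (1.0 + exp(-v))


def average_partial_effect_binary(coefs, rows, var):
    """Average of P(y=1 | var=1, other x) - P(y=1 | var=0, other x).

    coefs: dict of coefficient values, including the key "intercept".
    rows:  list of dicts holding each respondent's regressor values.
    var:   name of the binary regressor whose effect is wanted.
    """
    diffs = []
    for row in rows:
        # Linear index from every regressor except `var`.
        base = coefs["intercept"] + sum(
            coefs[name] * value for name, value in row.items() if name != var
        )
        diffs.append(logistic(base + coefs[var]) - logistic(base))
    return sum(diffs) / len(diffs)


# Hypothetical coefficients and respondents, for illustration only.
coefs = {"intercept": -0.5, "male": 0.8, "socialcons": 1.2, "fiscalcons": 0.3}
rows = [
    {"male": 1, "socialcons": 0, "fiscalcons": -2.0},
    {"male": 0, "socialcons": 1, "fiscalcons": 1.5},
    {"male": 1, "socialcons": 1, "fiscalcons": 0.0},
]
ape_male = average_partial_effect_binary(coefs, rows, "male")
```

Unlike the linear probability model, the effect computed this way varies across respondents and always keeps predicted probabilities between 0 and 1.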

(e). Fill in the following classification table using the third model in (b). Pay attention to what 0 and 1 mean in the context. Use 0.5 as the cutoff.

| Classified (Predicted) | True: Cruz | True: Buttigieg | Total |
| --- | --- | --- | --- |
| Cruz | | | |
| Buttigieg | | | |
| Total | | | |

How many cases are classified correctly? What is the sensitivity? Specificity?
How many cases would you classify correctly if you just predicted that nobody supported Cruz? Does the classification table do better? Indicate whether you think the classification table is useful in this case. If you don’t think it is useful, briefly explain why.
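The quantities asked for above can be tallied from the observed outcomes and the predicted probabilities. The Python sketch below illustrates the bookkeeping (the assignment asks for R). The function name and the toy values are made up, and treating a predicted probability exactly equal to the cutoff as a predicted 1 is an assumption.

```python
def classification_summary(actual, predicted_prob, cutoff=0.5):
    """Tally a 2x2 classification table, where 1 = Cruz and 0 = Buttigieg.

    Assumes a predicted probability >= cutoff is classified as 1, and that
    both outcomes occur in `actual` (otherwise a rate is undefined).
    """
    tp = fp = tn = fn = 0
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    return {
        "table": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),   # share of true 1s classified as 1
        "specificity": tn / (tn + fp),   # share of true 0s classified as 0
        "baseline_all_zero": (tn + fp) / n,  # accuracy if everyone is predicted 0
    }


# Made-up toy data (hypothetical), NOT results from hw5.cruz.csv.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.45]
summary = classification_summary(actual, probs)
```

Comparing "accuracy" with "baseline_all_zero" shows how much the model improves on simply predicting that nobody supports Cruz.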