Problem Set 2: The Impact of Education on Smoking

This problem set uses the dataset smoking.dta.

A group of researchers is working for an organization that aims to reduce smoking among the population. They hypothesize that increasing education is associated with a decline in smoking. The underlying theory is that as people gain more education, they are better able to acquire and process information on the dangers of smoking.

The dependent variable is cigs, defined as the number of cigarettes an individual smokes per day. The independent variable of interest is education.

Table 1: Variable Definitions

Variable    Description
educ        years of schooling
cigpric     state cigarette price, cents per pack
white       =1 if white
age         age in years
income      annual income, $
cigs        cigarettes smoked per day
restaurn    =1 if state restaurant smoking restrictions

Useful Stata Commands

These commands may be helpful for the problem set, though do not be concerned if you do not use all of the commands described. There are often multiple ways to accomplish the same task in Stata. Do not forget about the other commands you have learned in previous lessons. The underlined portions are the abbreviations for the commands. The italicized words are user-defined variables and expressions. For additional information about each command, please see the help files.

• sort varlist : Sorts the observations according to the variables listed in varlist

• regress depvar indvar, r : Run a regression of depvar on indvar with robust standard errors

• Example: reg salary age if female==1, r (runs a regression of salary on age for all of the females in the sample)
• Example: reg salary age if yrseduc > 12, r (runs a regression of salary on age for those individuals who have more than 12 years of education)

• twoway scatter yaxisvar xaxisvar : Create a scatter plot of yaxisvar against xaxisvar.

• Example: twoway scatter salary age, title("Salary vs. Age") (creates a scatter plot of salary against age)
• Example: twoway (scatter salary age, title("Salary vs. Age")) (lfit salary age) (creates a scatter plot of salary against age with the regression line on the graph)

• generate newvar = expression : Creates a new variable that is equal to something (e.g. an existing variable, a number, etc.)

• Example: gen salarynew = salary (creates a new variable called salarynew that is equal to the old variable)

• replace var1 = expression : Replaces the values of an existing variable

• Example: replace salarynew = . if salary > 50000 (If the value of salary is greater than $50,000, the value of salarynew is replaced by a missing value)
• Example: replace salarynew = 1 (changes all values of salarynew to 1)

• drop varlist : Drops the variables in varlist from the dataset. Combined with if, drop removes observations instead of variables.

• Example: drop if education > 15 (drops all observations for which education is greater than 15)

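
The commands above can be tried together in a short sequence. The following is a minimal sketch, not part of the assignment: it assumes the auto dataset that ships with Stata (loaded with sysuse) in place of smoking.dta, and the variables price, mpg, and foreign belong to that example dataset.

```stata
* Illustrative only: uses the auto dataset shipped with Stata, not smoking.dta
sysuse auto, clear

* Sort the observations by price
sort price

* Regress price on mpg for foreign cars only, with robust standard errors
regress price mpg if foreign == 1, robust

* Scatter plot with the fitted regression line overlaid
twoway (scatter price mpg) (lfit price mpg), title("Price vs. MPG")

* Copy a variable, then set large values of the copy to missing
generate pricenew = price
replace pricenew = . if price > 10000

* Drop a variable, then drop observations that meet a condition
drop pricenew
drop if mpg > 30
```

Because drop permanently removes data from memory, it may be safest to reload the dataset before moving on to other work.
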
Part 1: Descriptive Statistics

Post all answers below each question—do not delete the questions!

• Complete the following table of descriptive statistics. Note: You do not need to run a regression for this question.

Table 2: Descriptive Statistics

Variable    Minimum    Maximum    Mean     Standard Deviation    Number of Observations
cigs        ______     ______     ______   ______                ______
educ        ______     ______     ______   ______                ______

• What is the mean number of years of education that individuals participating in this study reported?
• What is the average number of cigarettes participants smoked per day?
• Were there people in the study who did not smoke at all?
• What was the highest number of cigarettes smoked per day?
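
One possible way to obtain the statistics requested in Part 1 is sketched below. It assumes that smoking.dta is saved in the current working directory; summarize and count are standard Stata commands that are not in the list above, and other approaches would work equally well.

```stata
* Assumes smoking.dta is in the current working directory
use smoking.dta, clear

* Number of observations, mean, standard deviation, minimum, and maximum
summarize cigs educ

* Number of respondents who report smoking zero cigarettes per day
count if cigs == 0
```
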

Part 2: Regression Analysis

• Using Stata, run a regression of cigs on educ with robust standard errors. Copy and paste the Stata output into your write-up. Note: paste the output as a picture or, alternatively, as text in Consolas font, size 10 or 11. Make sure what you post is legible to your reader.

• Report the sample regression function (with robust standard errors in parentheses beneath the coefficients).

• Interpret the coefficient on education.

• Is the coefficient on education statistically significant? Explain how you arrived at your conclusion.

• Report the 95% confidence interval for the coefficient on education.

• Based on these findings, one of the researchers wants to publish the results that increased education is needed to reduce smoking. Why might you advise against this?

• Using the dataset, smoking.dta, create a different bivariate regression model that aims to explain the causes behind smoking. For your model provide the following:

• The null and alternative hypotheses.
• An interpretation of the coefficient on the independent variable, covering its direction, magnitude, and significance.
• An explanation of whether the model allows you to satisfactorily conclude that the chosen independent variable explains the number of cigarettes smoked per day.
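
The regression in Part 2 could be run along the following lines. This is a sketch under the assumption that smoking.dta is in the working directory; the scalar name tcrit is made up for illustration. The display lines rebuild the confidence interval and test statistic from the stored results so they can be checked against the regression table.

```stata
* Assumes smoking.dta is in the current working directory
use smoking.dta, clear

* Bivariate regression of cigs on educ with robust standard errors
regress cigs educ, robust

* Rebuild the 95% confidence interval by hand from the stored results
scalar tcrit = invttail(e(df_r), 0.025)
display "Lower bound: " _b[educ] - tcrit*_se[educ]
display "Upper bound: " _b[educ] + tcrit*_se[educ]

* t statistic and two-sided p-value for H0: coefficient on educ equals 0
display "t = " _b[educ]/_se[educ]
display "p = " 2*ttail(e(df_r), abs(_b[educ]/_se[educ]))
```

The same pattern applies to the alternative bivariate model: replace educ with the chosen independent variable in each line.
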

Do-file