Economics 430 Introduction to Statistical Methods
and Econometrics
Review of
Linear and Multiple Regression
1
Today’s Class
• Review of the Simple Regression Model
• Review of the Multiple Regression Model
2
The Simple Regression Model
• Definition of the simple linear regression model

  y = β0 + β1x + u

  “Explains variable y in terms of variable x”

  β0: intercept
  β1: slope parameter
  y: dependent variable, explained variable, response variable, …
  x: independent variable, explanatory variable, regressor, …
  u: error term, disturbance, unobservables, …
3
The Simple Regression Model
• Interpretation of the simple linear regression model: it studies how y varies with changes in x:

  Δy = β1Δx as long as Δu = 0

  By how much does the dependent variable change if the independent variable is increased by one unit? This interpretation is only correct if all other things remain equal when the independent variable is increased by one unit.

• The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons
4
The Simple Regression Model
• Example: Soybean yield and fertilizer

  yield = β0 + β1 fertilizer + u

  u: rainfall, land quality, presence of parasites, …
  β1 measures the effect of fertilizer on yield, holding all other factors fixed

• Example: A simple wage equation

  wage = β0 + β1 educ + u

  u: labor force experience, tenure with current employer, work ethic, intelligence, …
  β1 measures the change in hourly wage given another year of education, holding all other factors fixed
5
The Simple Regression Model
• When is there a causal interpretation?
• Conditional mean independence assumption

  E(u|x) = E(u) = 0

  The explanatory variable must not contain information about the mean of the unobserved factors

• Example: wage equation

  u includes, e.g., intelligence; the assumption requires E(u|educ) = E(u)

  The conditional mean independence assumption is unlikely to hold here because individuals with more education will also be more intelligent on average.
6
The Simple Regression Model
• Population regression function (PRF)
  – The conditional mean independence assumption E(u|x) = 0 implies that

    E(y|x) = β0 + β1x

  – This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable
7
The Simple Regression Model
[Figure: population regression function E(y|x) = β0 + β1x. For individuals with x = x_i, the average value of y is β0 + β1x_i.]
8
The Simple Regression Model
• Deriving the ordinary least squares estimates
• In order to estimate the regression model, one needs data
• A random sample of n observations

  {(x_i, y_i): i = 1, …, n}

  x_i: value of the explanatory variable of the i-th observation
  y_i: value of the dependent variable of the i-th observation
9
The Simple Regression Model
• What does “as good as possible” mean?
• Regression residuals

  û_i = y_i − β̂0 − β̂1x_i

• Minimize the sum of squared regression residuals: min Σ û_i²
• Ordinary Least Squares (OLS) estimates

  β̂1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
  β̂0 = ȳ − β̂1x̄
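These closed-form OLS formulas can be checked numerically. A minimal Python sketch on simulated data (all parameter values and names are illustrative, not from the examples in these slides):

  import numpy as np

  rng = np.random.default_rng(0)
  n = 100
  x = rng.normal(5, 2, n)                 # explanatory variable
  u = rng.normal(0, 1, n)                 # unobserved error
  y = 1.0 + 0.5 * x + u                   # population model: beta0 = 1, beta1 = 0.5

  # OLS estimates from the closed-form formulas above
  b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  b0 = y.mean() - b1 * x.mean()
  resid = y - b0 - b1 * x                 # regression residuals
  print(b0, b1, np.sum(resid ** 2))       # estimates and the minimized SSR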
10
The Simple Regression Model
• Fit as good as possible a regression line through the data points:

  [Figure: scatter of data points with the fitted regression line ŷ = β̂0 + β̂1x; the i-th data point deviates from the line by the residual û_i.]
11
The Simple Regression Model
• CEO salary and return on equity

  salary = β0 + β1 roe + u

  salary: salary in thousands of dollars; roe: average return on equity of the CEO’s firm (in percent)

• Fitted regression

  salary_hat = 963.191 + 18.501 roe

• Causal interpretation?

  If the return on equity increases by 1 percentage point, then salary is predicted to increase by 18.501 thousand dollars, i.e. by $18,501
12
The Simple Regression Model
[Figure: the fitted regression line (which depends on the sample) versus the unknown population regression line.]
13
The Simple Regression Model
• Wage and education

  wage: hourly wage in dollars; educ: years of education

• Fitted regression

  wage_hat = −0.90 + 0.54 educ

  In the sample, one more year of education was associated with an increase in hourly wage of $0.54
• Causal interpretation?
14
The Simple Regression Model
• Voting outcomes and campaign expenditures (two parties)

  voteA: percentage of the vote for candidate A; shareA: percentage of campaign expenditures accounted for by candidate A

• Fitted regression

  voteA_hat = 26.81 + 0.464 shareA

• Causal interpretation?

  If candidate A’s share of spending increases by one percentage point, he or she receives 0.464 percentage points more of the total vote
15
The Simple Regression Model
• Properties of OLS on any sample of data
• Fitted values and residuals

  ŷ_i = β̂0 + β̂1x_i (fitted or predicted values)
  û_i = y_i − ŷ_i (deviations from the regression line = residuals)

• Algebraic properties of OLS regression

  Σ û_i = 0 (deviations from the regression line sum up to zero)
  Σ x_i û_i = 0 (covariance between deviations and regressors is zero)
  ȳ = β̂0 + β̂1x̄ (sample averages of y and x lie on the regression line)
16
The Simple Regression Model
For example, CEO number 12’s salary was $526,023 lower than predicted using the information on his firm’s return on equity (û_12 = −526.023)
17
The Simple Regression Model
• Goodness-of-Fit
“How well does the explanatory variable explain the dependent variable?”
• Measures of variation

  SST = Σ (y_i − ȳ)²  (total sum of squares, represents total variation in the dependent variable)
  SSE = Σ (ŷ_i − ȳ)²  (explained sum of squares, represents variation explained by the regression)
  SSR = Σ û_i²  (residual sum of squares, represents variation not explained by the regression)
18
The Simple Regression Model
• Decomposition of total variation

  SST = SSE + SSR  (total variation = explained part + unexplained part)

• Goodness-of-fit measure (R-squared)

  R² = SSE/SST = 1 − SSR/SST

  R-squared measures the fraction of the total variation that is explained by the regression
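An illustrative Python sketch of this decomposition on simulated data (simulated numbers only, assuming the simple OLS formulas from above):

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(5, 2, 100)
  y = 1.0 + 0.5 * x + rng.normal(0, 1, 100)
  b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  b0 = y.mean() - b1 * x.mean()
  yhat = b0 + b1 * x

  sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
  sse = np.sum((yhat - y.mean()) ** 2)    # explained sum of squares
  ssr = np.sum((y - yhat) ** 2)           # residual sum of squares
  print(sst, sse + ssr)                   # SST = SSE + SSR
  print(sse / sst, 1 - ssr / sst)         # two equivalent R-squared formulas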
19
The Simple Regression Model
• CEO salary and return on equity

  R² = 0.0132: the regression explains only 1.3% of the total variation in salaries

• Voting outcomes and campaign expenditures

  R² = 0.856: the regression explains 85.6% of the total variation in election outcomes
• Caution: A high R2 does not necessarily mean
that the regression has a causal interpretation!
20
The Simple Regression Model
• Incorporating nonlinearities: semi-logarithmic form
• Regression of log wages on years of education

  log(wage) = β0 + β1 educ + u

  log(wage): natural logarithm of wage

• This changes the interpretation of the regression coefficient:

  Δlog(wage) ≈ β1 Δeduc, i.e. β1 gives the (approximate) percentage change of the wage if years of education are increased by one year
21
The Simple Regression Model

• Fitted regression

  log(wage)_hat = 0.584 + 0.083 educ

  The wage increases by 8.3% for every additional year of education (= the return to another year of education). In other words, the growth rate of the wage is 8.3% per year of education.
22
The Simple Regression Model
• Incorporating nonlinearities: log-logarithmic form
• CEO salary and firm sales

  log(salary) = β0 + β1 log(sales) + u

  log(salary): natural logarithm of CEO salary; log(sales): natural logarithm of his/her firm’s sales

• This changes the interpretation of the regression coefficient:

  β1 is the percentage change of salary if sales increase by 1%; logarithmic changes are always percentage changes
23
The Simple Regression Model
• CEO salary and firm sales: fitted regression

  log(salary)_hat = 4.822 + 0.257 log(sales)

• For example: +1% sales is associated with +0.257% salary

• The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model
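A small Python sketch of the constant-elasticity interpretation (simulated data with an assumed true elasticity of 0.257; these are not the CEO data):

  import numpy as np

  rng = np.random.default_rng(1)
  sales = np.exp(rng.normal(8, 1, 500))                 # simulated firm sales
  salary = 50 * sales ** 0.257 * np.exp(rng.normal(0, 0.3, 500))

  ls, lw = np.log(sales), np.log(salary)
  b1 = np.sum((ls - ls.mean()) * (lw - lw.mean())) / np.sum((ls - ls.mean()) ** 2)
  print(b1)   # close to the true elasticity 0.257: +1% sales => +0.257% salary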
24
The Simple Regression Model
• Expected values and variances of the OLS estimators
• The estimated regression coefficients are random variables
because they are calculated from a random sample
  The data are random and depend on the particular sample that has been drawn
• The question is what the estimators estimate on average and how large their variability in repeated samples is
25
The Simple Regression Model
Standard assumptions for the linear regression model
• Assumption SLR.1 (Linear in parameters)

  y = β0 + β1x + u

  In the population, the relationship between y and x is linear

• Assumption SLR.2 (Random sampling)

  {(x_i, y_i): i = 1, …, n}

  The data are a random sample drawn from the population; each data point therefore follows the population equation y_i = β0 + β1x_i + u_i
26
The Simple Regression Model
• Discussion of random sampling: Wage and education
– The population consists, for example, of all workers of country A
– In the population, a linear relationship between wages (or log wages) and years of education holds
– Draw completely randomly a worker from the population
– The wage and the years of education of the worker drawn are random because one does not know beforehand which worker is drawn
– Throw the worker back into the population and repeat the random draw n times
– The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education
27
The Simple Regression Model
(x_i, y_i): the values drawn for the i-th worker

u_i = y_i − β0 − β1x_i: the implied deviation from the population relationship for the i-th worker
28
The Simple Regression Model
• Assumptions for the linear regression model (cont.)
• Assumption SLR.3 (Sample variation in the explanatory variable)

  The values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable)

• Assumption SLR.4 (Zero conditional mean)

  E(u_i|x_i) = 0

  The value of the explanatory variable must contain no information about the mean of the unobserved factors
29
The Simple Regression Model
• Theorem (Unbiasedness of OLS)

  Under assumptions SLR.1 – SLR.4: E(β̂0) = β0 and E(β̂1) = β1

• Interpretation of unbiasedness
  – The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw
  – However, on average, they will be equal to the values that characterize the true relationship between y and x in the population
  – “On average” means: if drawing the random sample and doing the estimation were repeated many times
  – In a given sample, estimates may differ considerably from the true values
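A short Monte Carlo sketch of unbiasedness in Python (simulated population; the parameter values are illustrative):

  import numpy as np

  rng = np.random.default_rng(2)
  beta0, beta1 = 1.0, 0.5
  estimates = []
  for _ in range(5000):                   # repeat sampling + estimation
      x = rng.normal(5, 2, 50)
      y = beta0 + beta1 * x + rng.normal(0, 1, 50)
      b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
      estimates.append(b1)
  print(np.mean(estimates))               # close to the true beta1 = 0.5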
30
The Simple Regression Model
• Variances of the OLS estimators
– Depending on the sample, the estimates will be nearer to or farther away from the true population values
– How far can we expect our estimates to be from the true population values on average (= sampling variability)?
– Sampling variability is measured by the estimators’ variances

• Assumption SLR.5 (Homoskedasticity)

  Var(u_i|x_i) = σ²

  The value of the explanatory variable must contain no information about the variability of the unobserved factors
31
The Simple Regression Model
• Graphical illustration of homoskedasticity
The variability of the unobserved influences does not depend on the value of the explanatory variable
32
The Simple Regression Model
• An example for heteroskedasticity: Wage and education
The variance of the unobserved determinants of wages increases with the level of education
33
The Simple Regression Model
• Theorem (Variances of the OLS estimators)

  Under assumptions SLR.1 – SLR.5:

  Var(β̂1) = σ² / Σ(x_i − x̄)²
  Var(β̂0) = σ² n⁻¹ Σ x_i² / Σ(x_i − x̄)²

• Conclusion:
  – The sampling variability of the estimated regression coefficients is higher, the larger the variability of the unobserved factors, and lower, the higher the variation in the explanatory variable
34
The Simple Regression Model
• Estimating the error variance

  Var(u|x) = σ² = Var(u), i.e. the variance of u does not depend on x and is equal to the unconditional variance

  One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately this estimate would be biased

  σ̂² = (1/(n − 2)) Σ û_i²

  An unbiased estimate of the error variance can be obtained by subtracting the number of estimated regression coefficients (here: 2) from the number of observations
35
The Simple Regression Model
• Theorem (Unbiasedness of the error variance)

  Under assumptions SLR.1 – SLR.5: E(σ̂²) = σ²

• Calculation of standard errors for regression coefficients

  se(β̂1) = σ̂ / [Σ(x_i − x̄)²]^(1/2)    (plug in σ̂ for the unknown σ)

  The estimated standard deviations of the regression coefficients are called “standard errors.” They measure how precisely the regression coefficients are estimated.
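A Python sketch of the error-variance estimate and the slope’s standard error (simulated data; n − 2 degrees of freedom as above):

  import numpy as np

  rng = np.random.default_rng(3)
  n = 80
  x = rng.normal(5, 2, n)
  y = 1.0 + 0.5 * x + rng.normal(0, 1, n)
  b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  b0 = y.mean() - b1 * x.mean()
  resid = y - b0 - b1 * x

  sigma2_hat = np.sum(resid ** 2) / (n - 2)                   # unbiased error variance
  se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))   # standard error of slope
  print(sigma2_hat, se_b1)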
36
Multiple Regression Analysis: Estimation
• Definition of the Multiple Linear Regression Model

  y = β0 + β1x1 + β2x2 + … + βk xk + u

  “Explains variable y in terms of variables x1, x2, …, xk”

  β0: intercept
  β1, …, βk: slope parameters
  y: dependent variable, explained variable, response variable, …
  x1, …, xk: independent variables, explanatory variables, regressors, …
  u: error term, disturbance, unobservables, …
37
Multiple Regression Analysis: Estimation
• Motivation for multiple regression
  – Incorporate more explanatory factors into the model
  – Explicitly hold fixed other factors that otherwise would be in the error term
  – Allow for more flexible functional forms

• Example: Wage equation

  wage = β0 + β1 educ + β2 exper + u

  wage: hourly wage; educ: years of education; exper: years of labor market experience; u: all other factors

  β1 now measures the effect of education explicitly holding experience fixed
38
Multiple Regression Analysis: Estimation
• Example: Average test scores and per-student spending

  avgscore = β0 + β1 expend + β2 avginc + u

  avgscore: average standardized test score of a school; expend: per-student spending at this school; avginc: average family income of students at this school; u: other factors

  – Per-student spending is likely to be correlated with average family income at a given high school because of school financing
  – Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores
  – In a simple regression model, the effect of per-student spending would partly include the effect of family income on test scores
39
Multiple Regression Analysis: Estimation

• Example: Family income and family consumption

  consumption = β0 + β1 inc + β2 inc² + u

  consumption: family consumption; inc: family income; inc²: family income squared; u: other factors

  – The model has two explanatory variables: income and income squared
  – Consumption is explained as a quadratic function of income
  – One has to be very careful when interpreting the coefficients: by how much does consumption increase if income is increased by one unit? It depends on how much income is already there:

    Δconsumption ≈ (β1 + 2β2 inc) Δinc
40
Multiple Regression Analysis: Estimation

• Example: CEO salary, sales, and CEO tenure

  log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u

  log(salary): log of CEO salary; log(sales): log sales; ceoten, ceoten²: quadratic function of CEO tenure with the firm

  – The model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm
  – The model assumes a quadratic relationship between CEO salary and his or her tenure with the firm

• Meaning of “linear” regression
  – The model has to be linear in the parameters (not in the variables)
41
Multiple Regression Analysis: Estimation
• OLS estimation of the multiple regression model

• Random sample: {(x_i1, x_i2, …, x_ik, y_i): i = 1, …, n}

• Regression residuals

  û_i = y_i − β̂0 − β̂1x_i1 − … − β̂k x_ik

• Minimize the sum of squared residuals: min Σ û_i² (the minimization will be carried out by computer)
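In matrix form the minimization has the familiar closed-form solution β̂ = (X′X)⁻¹X′y. A Python sketch on simulated data (the variable names and true coefficients are illustrative):

  import numpy as np

  rng = np.random.default_rng(4)
  n = 200
  X = np.column_stack([np.ones(n),                 # constant
                       rng.normal(12, 2, n),       # e.g. education
                       rng.normal(10, 5, n)])      # e.g. experience
  beta = np.array([0.3, 0.09, 0.01])
  y = X @ beta + rng.normal(0, 0.4, n)

  # OLS: solve the normal equations X'X b = X'y
  b = np.linalg.solve(X.T @ X, X.T @ y)
  print(b)        # close to the true coefficients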
42
Multiple Regression
Analysis: Estimation
• Interpretation of the multiple regression model

  Δŷ = β̂j Δxj, holding all other independent variables fixed

  By how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant?
– The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration
– “Ceteris paribus”-interpretation
– It has still to be assumed that unobserved factors do
not change if the explanatory variables are changed
43
Multiple Regression Analysis: Estimation
• Example: Determinants of college GPA

  colGPA_hat = 1.29 + 0.453 hsGPA + 0.0094 ACT

  colGPA: grade point average at college; hsGPA: high school grade point average; ACT: achievement test score

• Interpretation
  – Holding ACT fixed, another point on the high school grade point average is associated with another .453 points of college grade point average
  – Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B
  – Holding high school grade point average fixed, another 10 points on the ACT are associated with less than one tenth of a point on college GPA
44
Multiple Regression Analysis: Estimation
• Properties of OLS on any sample of data
• Fitted values and residuals

  ŷ_i = β̂0 + β̂1x_i1 + … + β̂k x_ik (fitted or predicted values)
  û_i = y_i − ŷ_i (residuals)

• Algebraic properties of OLS regression

  Σ û_i = 0 (deviations from the regression line sum up to zero)
  Σ x_ij û_i = 0 for all j (covariances between deviations and regressors are zero)
  ȳ = β̂0 + β̂1x̄1 + … + β̂k x̄k (sample averages of y and of the regressors lie on the regression line)
45
Multiple Regression Analysis: Estimation
• Goodness-of-Fit
• Decomposition of total variation: SST = SSE + SSR

• R-squared

  R² = SSE/SST = 1 − SSR/SST

  Notice that R-squared can only increase if another explanatory variable is added to the regression

• Alternative expression for R-squared

  R² = [corr(y_i, ŷ_i)]²

  R-squared is equal to the squared correlation coefficient between the actual and the predicted value of the dependent variable
47
Multiple Regression Analysis: Estimation

• Example: Explaining arrest records

  narr86_hat = 0.712 − 0.150 pcnv − 0.034 ptime86 − 0.104 qemp86

  narr86: number of times arrested in 1986; pcnv: proportion of prior arrests that led to conviction; ptime86: months in prison in 1986; qemp86: quarters employed in 1986

• Interpretation:
  – If the proportion of prior arrests that led to conviction increases by 0.5, the predicted fall in arrests is 7.5 arrests per 100 men
  – If the months in prison increase from 0 to 12, the predicted fall in arrests is 0.408 arrests for a particular man
  – If the quarters employed increase by 1, the predicted fall in arrests is 10.4 arrests per 100 men
48
Multiple Regression Analysis: Estimation
• Example: Explaining arrest records (cont.)
  – An additional explanatory variable is added: avgsen, the average sentence length in prior convictions
  – R-squared increases only slightly

• Interpretation:
  – A longer average prior sentence increases the number of arrests (?)
  – Limited additional explanatory power, as R-squared increases by little

• General remark on R-squared
  – Even if R-squared is small (as in the given example), the regression may still provide good estimates of ceteris paribus effects
49
Multiple Regression
Analysis: Estimation
Standard Assumptions for the Multiple Regression Model
• Assumption MLR.1 (Linear in parameters)

  y = β0 + β1x1 + β2x2 + … + βk xk + u

  In the population, the relationship between y and the explanatory variables is linear

• Assumption MLR.2 (Random sampling)

  {(x_i1, …, x_ik, y_i): i = 1, …, n}

  The data are a random sample drawn from the population; each data point therefore follows the population equation
50
Multiple Regression
Analysis: Estimation
Standard Assumptions for the Multiple Regression Model
• Assumption MLR.3 (No perfect collinearity)
“In the sample (and therefore in the population), none
of the independent variables is constant and there are
no exact linear relationships among the independent variables.”
• Remarks on MLR.3
  – The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed
  – If an explanatory variable is a perfect linear combination of other explanatory variables, it is superfluous and may be eliminated
  – Constant variables are also ruled out (collinear with the intercept)
51
Multiple Regression Analysis: Estimation
• Example for perfect collinearity: small sample

  In a small sample, avginc may accidentally be an exact multiple of expend; it will not be possible to disentangle their separate effects because there is exact covariation

• Example for perfect collinearity: relationships between regressors

  Either shareA or shareB will have to be dropped from the regression because there is an exact linear relationship between them: shareA + shareB = 1
52
Multiple Regression Analysis: Estimation
Standard Assumptions for the Multiple Regression Model (cont.)
• Assumption MLR.4 (Zero conditional mean)

  E(u_i | x_i1, …, x_ik) = 0

  The values of the explanatory variables must contain no information about the mean of the unobserved factors

  – In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error

• Example: Average test scores

  If avginc was not included in the regression, it would end up in the error term; it would then be hard to defend that expend is uncorrelated with the error
53
Multiple Regression Analysis: Estimation
• Discussion of the zero conditional mean assumption
– Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4
– Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanat. var. are exogenous
– Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators
• Theorem (Unbiasedness of OLS)

  Under assumptions MLR.1 – MLR.4: E(β̂j) = βj, j = 0, 1, …, k

  – Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values
54
Multiple Regression Analysis: Estimation
• Including irrelevant variables in a regression model

  y = β0 + β1x1 + β2x2 + β3x3 + u, where β3 = 0 in the population

  No problem for unbiasedness, because E(β̂1) = β1, E(β̂2) = β2, and E(β̂3) = 0. However, including irrelevant variables may increase the sampling variance.

• Omitting relevant variables: the simple case

  True model (contains x1 and x2): y = β0 + β1x1 + β2x2 + u
  Estimated model (x2 is omitted): ỹ = β̃0 + β̃1x1
55
Multiple Regression Analysis: Estimation

• Omitted variable bias

  If x1 and x2 are correlated, assume a linear regression relationship between them: x2 = δ0 + δ1x1 + v

  Substituting into the true model gives

  y = (β0 + β2δ0) + (β1 + β2δ1) x1 + (β2v + u)

  If y is only regressed on x1, then (β0 + β2δ0) will be the estimated intercept and (β1 + β2δ1) will be the estimated slope on x1, i.e. E(β̃1) = β1 + β2δ1

• Conclusion: All estimated coefficients will be biased
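A Python sketch of this bias formula on simulated data (the true values β1 = 0.5, β2 = 0.3, δ1 = 0.8 are illustrative):

  import numpy as np

  rng = np.random.default_rng(5)
  n = 100_000
  x1 = rng.normal(0, 1, n)
  x2 = 0.2 + 0.8 * x1 + rng.normal(0, 1, n)     # delta1 = 0.8
  y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(0, 1, n)

  # short regression of y on x1 only
  b1_short = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)
  print(b1_short)          # about 0.5 + 0.3 * 0.8 = 0.74, not the true 0.5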
56
Multiple Regression
Analysis: Estimation
• Example: Omitting ability in a wage equation

  wage = β0 + β1 educ + β2 abil + u, with abil = δ0 + δ1 educ + v; β2 and δ1 will both be positive

  The return to education β1 will be overestimated because E(β̃1) = β1 + β2δ1 > β1. It will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average.

• When is there no omitted variable bias?
  – If the omitted variable is irrelevant (β2 = 0) or uncorrelated with the included variable (δ1 = 0)
57
Multiple Regression Analysis: Estimation
• Omitted variable bias: more general cases

  True model (contains x1, x2, and x3): y = β0 + β1x1 + β2x2 + β3x3 + u
  Estimated model (x3 is omitted): ỹ = β̃0 + β̃1x1 + β̃2x2

  – No general statements are possible about the direction of the bias
  – The analysis is as in the simple case if one regressor is uncorrelated with the others

• Example: Omitting ability in a wage equation

  If exper is approximately uncorrelated with educ and abil, then the direction of the omitted variable bias can be analyzed as in the simple two-variable case.
58
Multiple Regression Analysis: Estimation
Standard Assumptions for the Multiple Regression Model (cont.)
• Assumption MLR.5 (Homoskedasticity)

  Var(u_i | x_i1, …, x_ik) = σ²

  The values of the explanatory variables must contain no information about the variance of the unobserved factors

• Example: Wage equation

  Var(u_i | educ_i, exper_i, tenure_i) = σ²

  This assumption may also be hard to justify in many cases

• Shorthand notation

  Var(u_i | x_i) = σ², where all explanatory variables are collected in a random vector x_i = (x_i1, …, x_ik)
59
Multiple Regression
Analysis: Estimation
• Theorem (Sampling variances of the OLS slope estimators)

  Under assumptions MLR.1 – MLR.5:

  Var(β̂j) = σ² / [SST_j (1 − R_j²)], j = 1, …, k

  σ²: variance of the error term
  SST_j = Σ (x_ij − x̄_j)²: total sample variation in explanatory variable xj
  R_j²: R-squared from a regression of explanatory variable xj on all other independent variables (including a constant)
60
Multiple Regression Analysis: Estimation
Components of OLS Variances:
1) The error variance
– A high error variance increases the sampling variance because there is more “noise” in the equation
– A large error variance necessarily makes estimates imprecise
– The error variance does not decrease with sample size
2) The total sample variation in the explanatory variable
– More sample variation leads to more precise estimates
– Total sample variation automatically increases with the sample size
– Increasing the sample size is thus a way to get more precise estimates
61
Multiple Regression
Analysis: Estimation
3) Linear relationships among the independent variables

  Regress xj on all other independent variables (including a constant); the R-squared of this regression, R_j², will be higher, the better xj can be linearly explained by the other independent variables

  – The sampling variance of β̂j will be higher, the better the explanatory variable xj can be linearly explained by the other independent variables
  – The problem of almost linearly dependent explanatory variables is called multicollinearity (i.e. R_j² → 1 for some j)
62
Multiple Regression Analysis: Estimation
An example for multicollinearity

  avgscore = β0 + β1 teacherexp + β2 matexp + β3 otherexp + u

  avgscore: average standardized test score of a school; teacherexp: expenditures for teachers; matexp: expenditures for instructional materials; otherexp: other expenditures
The different expenditure categories will be strongly correlated because if a school has a lot of resources it will spend a lot on everything.
It will be hard to estimate the differential effects of different expenditure categories because all expenditures are either high or low. For precise estimates of the differential effects, one would need information about situations where expenditure categories change differentially.
As a consequence, sampling variance of the estimated effects will be large.
63
Multiple Regression Analysis: Estimation
• Discussion of the multicollinearity problem
– In the above example, it would probably be better to lump all expenditure categories together because their effects cannot be disentangled
– In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias)
64
Multiple Regression Analysis: Estimation
– Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of other effects may be very precise
– Note that multicollinearity is not a violation of MLR.3 in the strict sense
– Multicollinearity may be detected through “variance inflation factors”:

  VIF_j = 1 / (1 − R_j²)

  As an (arbitrary) rule of thumb, the variance inflation factor should not be larger than 10
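A Python sketch of the VIF computation (simulated, strongly correlated regressors; the common factor z is an assumption of the simulation):

  import numpy as np

  rng = np.random.default_rng(6)
  n = 500
  z = rng.normal(0, 1, n)
  x1 = z + rng.normal(0, 0.3, n)          # x1 and x2 share the common factor z
  x2 = z + rng.normal(0, 0.3, n)

  # R_j^2 from regressing x1 on a constant and x2
  X = np.column_stack([np.ones(n), x2])
  coef = np.linalg.solve(X.T @ X, X.T @ x1)
  resid = x1 - X @ coef
  r2 = 1 - resid @ resid / np.sum((x1 - x1.mean()) ** 2)
  print(1 / (1 - r2))                     # VIF of x1; large when correlation is strong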
65
Multiple Regression Analysis: Estimation
• Variances in Misspecified Models
– The choice of whether to include a particular variable in a regression can be made by analyzing the tradeoff between bias and variance
  True population model: y = β0 + β1x1 + β2x2 + u

  Estimated model 1: ŷ = β̂0 + β̂1x1 + β̂2x2
  Estimated model 2: ỹ = β̃0 + β̃1x1

  – It might be the case that the likely omitted variable bias in the misspecified model 2 is overcompensated by a smaller variance
66
Multiple Regression
Analysis: Estimation
• Variances in misspecified models (cont.)

  Conditional on x1 and x2, the variance in model 2 is always smaller than that in model 1: Var(β̃1) ≤ Var(β̂1)

• Case 1 (β2 ≠ 0): trade off bias and variance; caution: the bias will not vanish even in large samples
• Case 2 (β2 = 0): model 2 has no bias and a smaller variance; conclusion: do not include irrelevant regressors
67
Multiple Regression
Analysis: Estimation
• Estimating the error variance

  σ̂² = SSR / (n − k − 1)

  An unbiased estimate of the error variance can be obtained by subtracting the number of estimated regression coefficients from the number of observations. The number of observations minus the number of estimated parameters is also called the degrees of freedom. The n estimated squared residuals in the sum are not completely independent but related through the k+1 equations that define the first order conditions of the minimization problem.

• Theorem (Unbiased estimator of the error variance)

  Under assumptions MLR.1 – MLR.5: E(σ̂²) = σ²
68
Multiple Regression
Analysis: Estimation
• Estimation of the sampling variances of the OLS estimators

  The true sampling variation of the estimated β̂j: Var(β̂j) = σ² / [SST_j (1 − R_j²)]

  The estimated sampling variation of the estimated β̂j: plug in σ̂² for the unknown σ², i.e. Var̂(β̂j) = σ̂² / [SST_j (1 − R_j²)]
• Note that these formulas are only valid under assumptions MLR.1-MLR.5 (in particular, there has to be homoskedasticity)
69
Multiple Regression Analysis: Estimation
• Efficiency of OLS: The Gauss-Markov Theorem
– Under assumptions MLR.1 – MLR.5, OLS is unbiased
– However, under these assumptions there may be many other estimators that are unbiased
– Which one is the unbiased estimator with the smallest variance?
– In order to answer this question one usually limits oneself to linear estimators, i.e. estimators linear in the dependent variable:

  β̃j = Σ w_ij y_i

  The weights w_ij may be arbitrary functions of the sample values of all the explanatory variables; the OLS estimator can be shown to be of this form
70
Multiple Regression Analysis: Estimation
• Theorem (Gauss-Markov Theorem)
  – Under assumptions MLR.1 – MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e.

    Var(β̂j) ≤ Var(β̃j) for all linear estimators β̃j = Σ w_ij y_i for which E(β̃j) = βj, j = 0, 1, …, k

• OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is heteroskedasticity, for example, there are better estimators.
71
Multiple Regression Analysis: Inference
• Statistical inference in the regression model
  – Hypothesis tests about population parameters
  – Construction of confidence intervals
• Sampling distributions of the OLS estimators
– The OLS estimators are random variables
– We already know their expected values and their variances
– However, for hypothesis tests we need to know their distribution
– In order to derive their distribution we need additional assumptions
– Assumption about distribution of errors: normal distribution
72
Multiple Regression Analysis: Inference
• Assumption MLR.6 (Normality of error terms)

  u_i ~ Normal(0, σ²), independently of the explanatory variables x_i1, …, x_ik

  It is assumed that the unobserved factors are normally distributed around the population regression function. The form and the variance of the distribution do not depend on any of the explanatory variables.

  It follows that: y_i | x_i ~ Normal(β0 + β1x_i1 + … + βk x_ik, σ²)
73
Multiple Regression Analysis: Inference
• Discussion of the Normality Assumption
– The error term is the sum of “many” different unobserved factors
– Sums of independent factors are normally distributed (CLT)
– Problems:
  • How many different factors? Is the number large enough?
  • Possibly very heterogeneous distributions of individual factors
  • How independent are the different factors?
– The normality of the error term is an empirical question
– At least the error distribution should be “close” to normal
– In many cases, normality is questionable or impossible by definition
74
Multiple Regression Analysis: Inference
• Discussion of the Normality Assumption (cont.)
– Examples where normality cannot hold:
  • Wages (nonnegative; also: minimum wage)
  • Number of arrests (takes on a small number of integer values)
  • Unemployment (indicator variable, takes on only 1 or 0)
– In some cases, normality can be achieved through transformations of the dependent variable (e.g. use log(wage) instead of wage)
– Under normality, OLS is the best (even nonlinear) unbiased estimator
– Important: For the purposes of statistical inference, the assumption of normality can be replaced by a large sample size
75
Multiple Regression Analysis: Inference

• Terminology
  – MLR.1 – MLR.5: the “Gauss-Markov assumptions”
  – MLR.1 – MLR.6: the “classical linear model (CLM) assumptions”

• Theorem (Normal sampling distributions)

  Under assumptions MLR.1 – MLR.6:

  β̂j ~ Normal(βj, Var(β̂j)): the estimators are normally distributed around the true parameters with the variance that was derived earlier

  (β̂j − βj) / sd(β̂j) ~ Normal(0, 1): the standardized estimators follow a standard normal distribution
76
Multiple Regression Analysis: Inference
• Testing Hypotheses about a Single Population Parameter
• Theorem (t-distribution for the standardized estimators)

  Under assumptions MLR.1 – MLR.6:

  (β̂j − βj) / se(β̂j) ~ t(n−k−1)

  If the standardization is done using the estimated standard deviation (= standard error), the normal distribution is replaced by a t-distribution. Note: the t-distribution is close to the standard normal distribution if n−k−1 is large.

• Null hypothesis (for more general hypotheses, see below)

  H0: βj = 0

  The population parameter is equal to zero, i.e. after controlling for the other independent variables, there is no effect of xj on y
77
Multiple Regression Analysis: Inference
• t-statistic (or t-ratio)

  t = β̂j / se(β̂j)

  The t-statistic will be used to test the above null hypothesis. The farther the estimated coefficient is away from zero, the less likely it is that the null hypothesis holds true. But what does “far” away from zero mean? This depends on the variability of the estimated coefficient, i.e. its standard deviation. The t-statistic measures how many estimated standard deviations the estimated coefficient is away from zero.
• Distribution of the t-statistic if the null hypothesis is true: t = β̂j / se(β̂j) ~ t(n−k−1)

• Goal: Define a rejection rule so that, if H0 is true, it is rejected only with a small probability (= significance level, e.g. 5%)
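A Python sketch of such a t-test against a one-sided alternative (simulated data; the 5% level and the true coefficients are illustrative):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(7)
  n, k = 100, 2
  X = np.column_stack([np.ones(n), rng.normal(0, 1, (n, k))])
  y = X @ np.array([1.0, 0.2, 0.0]) + rng.normal(0, 1, n)

  b = np.linalg.solve(X.T @ X, X.T @ y)
  resid = y - X @ b
  sigma2 = resid @ resid / (n - k - 1)
  se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

  t1 = b[1] / se[1]                              # t-statistic for H0: beta1 = 0
  crit = stats.t.ppf(0.95, df=n - k - 1)         # one-sided 5% critical value
  print(t1, crit, t1 > crit)                     # reject H0 if t exceeds crit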
78
Multiple Regression Analysis: Inference
• Testing against one-sided alternatives (greater than zero)

  Test H0: βj = 0 against H1: βj > 0.

  Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is “too large” (i.e. larger than a critical value). Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. In the given example, this is the point of the t-distribution with 28 degrees of freedom that is exceeded in 5% of the cases.
Reject if t-statistic is greater than 1.701
79
Multiple Regression Analysis: Inference
• Example: Wage equation
  – Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages

  log(wage)_hat = .284 + .092 educ + .0041 exper + .022 tenure
                  (.104) (.007)      (.0017)       (.003)      (standard errors in parentheses)

  Test H0: β_exper = 0 against H1: β_exper > 0.

  One would either expect a positive effect of experience on hourly wage or no effect at all.
80
Multiple Regression Analysis: Inference
• Example: Wage equation (cont.)

  t-statistic: t_exper = .0041/.0017 ≈ 2.41

  Degrees of freedom: n − k − 1 = 526 − 3 − 1 = 522; here the standard normal approximation applies

  Critical values for the 5% and the 1% significance level (these are conventional significance levels): 1.645 and 2.326

  The null hypothesis is rejected because the t-statistic exceeds the critical value. “The effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.”
81
Multiple Regression Analysis: Inference
• Testing against one-sided alternatives (less than zero)

  Test H0: βj = 0 against H1: βj < 0.

  Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is “too small” (i.e. smaller than a critical value). Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. In the given example, this is the point of the t-distribution with 18 degrees of freedom such that 5% of the cases are below it.
Reject if t-statistic is less than -1.734
82
Multiple Regression Analysis: Inference
• Example: Student performance and school size
  – Test whether smaller school size leads to better student performance

  math10 = β0 + β1 totcomp + β2 staff + β3 enroll + u

  math10: percentage of students passing the maths test; totcomp: average annual teacher compensation; staff: staff per one thousand students; enroll: student enrollment (= school size)

  Test H0: β_enroll = 0 against H1: β_enroll < 0.

  Do larger schools hamper student performance or is there no such effect?
83
Multiple Regression Analysis: Inference
• Example: Student performance and school size (cont.)

  t-statistic: t_enroll = −.00020/.00022 ≈ −.91

  Degrees of freedom: 408 − 4 = 404; here the standard normal approximation applies

  Critical values for the 5% and the 15% significance level: −1.645 and −1.04

  The null hypothesis is not rejected because the t-statistic is not smaller than the critical value. One cannot reject the hypothesis that there is no effect of school size on student performance (not even at a lax significance level of 15%).
84
Multiple Regression Analysis: Inference
• Example: Student performance and school size (cont.)
  – Alternative specification of functional form: regress math10 on log(totcomp), log(staff), and log(enroll)

  R-squared is slightly higher than in the level specification

  Test H0: β_log(enroll) = 0 against H1: β_log(enroll) < 0.
85
Multiple Regression Analysis: Inference
• Example: Student performance and school size (cont.)

  t-statistic: t_log(enroll) = −1.29/0.69 ≈ −1.87

  Critical value for the 5% significance level: −1.645; reject the null hypothesis

  The hypothesis that there is no effect of school size on student performance can be rejected in favor of the hypothesis that the effect is negative.

  How large is the effect? +10% enrollment is associated with −0.129 percentage points of students passing the test (a small effect)
86
Multiple Regression Analysis: Inference
• Testing against two-sided alternatives

  Test H0: βj = 0 against H1: βj ≠ 0.

  Reject the null hypothesis in favour of the alternative hypothesis if the absolute value of the estimated coefficient is too large. Construct the critical values so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. In the given example, these are the points of the t-distribution with 25 degrees of freedom such that 5% of the cases lie in the two tails.

  Reject if the t-statistic is less than −2.06 or greater than 2.06
87
Multiple Regression Analysis: Inference
• Example: Determinants of college GPA

  colGPA_hat = 1.39 + .412 hsGPA + .015 ACT − .083 skipped
               (.33)  (.094)       (.011)     (.026)

  skipped: lectures missed per week

  For the critical values, use the standard normal distribution

  The effects of hsGPA and skipped are significantly different from zero at the 1% significance level. The effect of ACT is not significantly different from zero, not even at the 10% significance level.
88
Multiple Regression Analysis: Inference
• “Statistically Significant” Variables in a Regression
– If a regression coefficient is significantly different from zero in a two-sided test, the corresponding variable is said to be “statistically significant”
– If the number of degrees of freedom is large enough so that the normal approximation applies, the following rules of thumb apply:
  |t| > 1.645: “statistically significant at the 10% level”
  |t| > 1.96: “statistically significant at the 5% level”
  |t| > 2.576: “statistically significant at the 1% level”
89
Multiple Regression Analysis: Inference
• Testing more general hypotheses about a regression coefficient

• Null hypothesis: H0: βj = aj, where aj is the hypothesized value of the coefficient

• t-statistic

  t = (β̂j − aj) / se(β̂j)

• The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic
90
Multiple Regression Analysis: Inference
• Example: Campus crime and enrollment
  – An interesting hypothesis is whether crime increases by one percent if enrollment is increased by one percent

  log(crime)_hat = −6.63 + 1.27 log(enroll)
                   (1.03)  (.11)

  Test H0: β_log(enroll) = 1. The estimate is different from one, but is this difference statistically significant?

  t = (1.27 − 1)/.11 ≈ 2.45 > 1.96, so the hypothesis is rejected at the 5% level
91
Multiple Regression Analysis: Inference
• Computing p-values for t-tests
  – If the significance level is made smaller and smaller, there will be a point where the null hypothesis can no longer be rejected
  – The reason is that, by lowering the significance level, one increasingly guards against the error of rejecting a correct H0
  – The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test
– A small p-value is evidence against the null hypothesis because one would reject the null hypothesis even at small significance levels
– A large p-value is evidence in favor of the null hypothesis
– P-values are more informative than tests at fixed significance levels
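A Python sketch of a two-sided p-value from a t-statistic (the numeric values are illustrative):

  from scipy import stats

  t_stat, df = 1.85, 40                        # illustrative values
  p_value = 2 * stats.t.sf(abs(t_stat), df)    # P(|T| > |t|), two-sided
  print(p_value)    # reject H0 at level alpha if and only if p_value < alpha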
92
Multiple Regression Analysis: Inference
• How is the p-value computed (here: two-sided test)?

  The p-value is the significance level at which one is indifferent between rejecting and not rejecting the null hypothesis.

  In the two-sided case, the p-value is the probability that the t-distributed variable takes on a larger absolute value than the realized value of the test statistic:

  p-value = P(|T| > |t|)

  From this, it is clear that a null hypothesis is rejected if and only if the corresponding p-value is smaller than the significance level. For example, if the p-value is larger than 5%, the realized test statistic does not lie in the rejection region defined by the critical values for a 5% significance level.
93
Multiple Regression Analysis: Inference
• Guidelines for discussing economic and statistical significance
– If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance
– The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
– If a variable is statistically and economically important but has the “wrong” sign, the regression model might be misspecified
– If a variable is statistically insignificant at the usual levels (10%, 5%, or 1%), one may think of dropping it from the regression
– If the sample size is small, effects might be imprecisely estimated so that the case for dropping insignificant variables is less strong
94
Multiple Regression Analysis: Inference

• Confidence intervals

  β̂j ± c · se(β̂j), i.e. [β̂j − c·se(β̂j), β̂j + c·se(β̂j)]

  (lower and upper bounds of the confidence interval; c is the critical value of the two-sided test at the chosen confidence level)

• Interpretation of the confidence interval
  – The bounds of the interval are random
  – In repeated samples, the interval that is constructed in the above way will cover the population regression coefficient in 95% of the cases
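A Python sketch of a 95% confidence interval for a slope (simulated data; true slope 0.5 is an assumption of the simulation):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(8)
  n = 60
  x = rng.normal(0, 1, n)
  y = 1.0 + 0.5 * x + rng.normal(0, 1, n)

  b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  b0 = y.mean() - b1 * x.mean()
  resid = y - b0 - b1 * x
  se_b1 = np.sqrt(resid @ resid / (n - 2) / np.sum((x - x.mean()) ** 2))

  c = stats.t.ppf(0.975, df=n - 2)          # two-sided 5% critical value
  print(b1 - c * se_b1, b1 + c * se_b1)     # covers 0.5 in about 95% of samples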
95
Multiple Regression Analysis: Inference
• Confidence intervals for typical confidence levels

  99%: c = 2.576; 95%: c = 1.96; 90%: c = 1.645 (use the rules of thumb for large degrees of freedom)

• Relationship between confidence intervals and hypothesis tests

  If aj lies outside the confidence interval, reject H0: βj = aj in favor of H1: βj ≠ aj
96
Multiple Regression Analysis: Inference
• Example: Model of firms’ R&D expenditures

  log(rd)_hat = −4.38 + 1.084 log(sales) + .0217 profmarg

  rd: spending on R&D; sales: annual sales; profmarg: profits as a percentage of sales

  The effect of sales on R&D is relatively precisely estimated, as the confidence interval is narrow. Moreover, the effect is significantly different from zero, because zero is outside the interval.

  The effect of profmarg is imprecisely estimated, as the interval is very wide. It is not even statistically significant, because zero lies in the interval.
97
Multiple Regression Analysis: Inference
• Testing hypotheses about a linear combination of the parameters
• Example: Return to education at two-year vs. at four-year colleges

  log(wage) = β0 + β1 jc + β2 univ + β3 exper + u

  jc: years of education at two-year colleges; univ: years of education at four-year colleges; exper: months in the workforce

  Test H0: β1 − β2 = 0 against H1: β1 − β2 < 0.

  A possible test statistic would be:

  t = (β̂1 − β̂2) / se(β̂1 − β̂2)

  The difference between the estimates is normalized by the estimated standard deviation of the difference. The null hypothesis would have to be rejected if the statistic is “too negative” to believe that the true difference between the parameters is equal to zero.
98
Multiple Regression Analysis: Inference
• Impossible to compute with standard regression output, because

  se(β̂1 − β̂2) = [Var(β̂1) + Var(β̂2) − 2 Cov(β̂1, β̂2)]^(1/2)

  and Cov(β̂1, β̂2) is usually not available in regression output

• Alternative method: define θ1 = β1 − β2 and test H0: θ1 = 0 against H1: θ1 < 0.

  Inserting β1 = θ1 + β2 into the original regression gives

  log(wage) = β0 + θ1 jc + β2 (jc + univ) + β3 exper + u,

  i.e. a new regressor totcoll = jc + univ (= total years of college)
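A Python sketch of this reparametrization trick (simulated data; the point is that the standard error of θ̂1 comes straight out of the transformed regression):

  import numpy as np

  rng = np.random.default_rng(9)
  n = 1000
  jc = rng.poisson(1, n).astype(float)
  univ = rng.poisson(2, n).astype(float)
  exper = rng.normal(60, 20, n)
  y = 1.5 + 0.07 * jc + 0.08 * univ + 0.002 * exper + rng.normal(0, 0.4, n)

  def ols(X, y):
      b = np.linalg.solve(X.T @ X, X.T @ y)
      resid = y - X @ b
      s2 = resid @ resid / (len(y) - X.shape[1])
      return b, np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

  # transformed regression: y on jc, totcoll = jc + univ, exper
  X = np.column_stack([np.ones(n), jc, jc + univ, exper])
  b, se = ols(X, y)
  print(b[1], se[1])   # theta1_hat = beta1 - beta2, with its standard error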
99
Multiple Regression Analysis: Inference

• Estimation results

  log(wage)_hat = 1.472 − .0102 jc + .0769 totcoll + .0049 exper
                          (.0069)

  totcoll: total years of college

  t = −.0102/.0069 ≈ −1.48, so the hypothesis is rejected at the 10% level but not at the 5% level

• This method always works for single linear hypotheses
100
Multiple Regression Analysis: Inference
• Testing multiple linear restrictions: the F-test
• Testing exclusion restrictions

  log(salary) = β0 + β1 years + β2 gamesyr + β3 bavg + β4 hrunsyr + β5 rbisyr + u

  salary: salary of a major league baseball player; years: years in the league; gamesyr: average number of games per year; bavg: batting average; hrunsyr: home runs per year; rbisyr: runs batted in per year

  Test H0: β3 = 0, β4 = 0, β5 = 0 against H1: H0 is not true

  Test whether the performance measures have no effect / can be excluded from the regression.
101
Multiple Regression Analysis: Inference
• Estimation of the unrestricted model

  None of the performance variables is statistically significant when tested individually

  Idea: How would the model fit be if these variables were dropped from the regression?
102
Multiple Regression Analysis: Inference
• Estimation of the restricted model

  The sum of squared residuals necessarily increases, but is the increase statistically significant?

• Test statistic

  F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)]

  q: number of restrictions

  The relative increase of the sum of squared residuals when going from H1 to H0 follows an F-distribution (if the null hypothesis H0 is correct)
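A Python sketch of the F-statistic for exclusion restrictions (simulated data, q = 2 restrictions; not the baseball data):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(10)
  n = 300
  X = np.column_stack([np.ones(n), rng.normal(0, 1, (n, 3))])
  y = X @ np.array([1.0, 0.5, 0.1, 0.1]) + rng.normal(0, 1, n)

  def ssr(X, y):
      b = np.linalg.solve(X.T @ X, X.T @ y)
      r = y - X @ b
      return r @ r

  ssr_ur = ssr(X, y)            # unrestricted: all regressors
  ssr_r = ssr(X[:, :2], y)      # restricted: last two regressors excluded
  q, df = 2, n - 3 - 1
  F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
  print(F, stats.f.ppf(0.95, q, df))   # reject H0 if F exceeds the critical value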
103
Multiple Regression Analysis: Inference

• Rejection rule

  F ~ F(q, n−k−1) (if H0 is correct)

  An F-distributed variable only takes on positive values. This corresponds to the fact that the sum of squared residuals can only increase if one moves from H1 to H0. Choose the critical value so that the null hypothesis is rejected in, for example, 5% of the cases, although it is true.
104
Multiple Regression Analysis: Inference
• Test decision in the example

  F = [(198.311 − 183.186)/3] / [183.186/347] ≈ 9.55

  Number of restrictions to be tested: q = 3; degrees of freedom in the unrestricted model: 347

  The null hypothesis is overwhelmingly rejected (even at very small significance levels).

• Discussion
  – The three variables are “jointly significant”
  – They were not significant when tested individually
  – The likely reason is multicollinearity between them
105
Multiple Regression Analysis: Inference
• Test of overall significance of a regression

  H0: β1 = β2 = … = βk = 0; the null hypothesis states that the explanatory variables are not useful at all in explaining the dependent variable

  Restricted model (regression on a constant): y = β0 + u

  F = [R²/k] / [(1 − R²)/(n − k − 1)]

• The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected
106
Multiple Regression Analysis: Inference
• Testing general linear restrictions with the F-test
• Example: Test whether house price assessments are rational

  log(price) = β0 + β1 log(assess) + β2 log(lotsize) + β3 log(sqrft) + β4 bdrms + u

  price: actual house price; assess: the assessed housing value (before the house was sold); lotsize: size of the lot (in square feet); sqrft: square footage; bdrms: number of bedrooms

  If house price assessments are rational, a 1% change in the assessment should be associated with a 1% change in price: H0: β1 = 1.

  In addition, other known factors should not influence the price once the assessed value has been controlled for: β2 = 0, β3 = 0, β4 = 0.
107
Multiple Regression Analysis: Inference
• Unrestricted regression: the model above
• Restricted regression: imposing H0 gives log(price) − log(assess) = β0 + u, i.e. the restricted model is actually a regression of [y − x1] on a constant

• Test statistic

  F = [(SSR_r − SSR_ur)/4] / [SSR_ur/(n − 5)] ≈ 0.66, so H0 cannot be rejected
108
Multiple Regression Analysis: Inference
• Regression output for the unrestricted regression
When tested individually, there is also no evidence against the rationality of house price assessments
• The F-test works for general multiple linear hypotheses
• For all tests and confidence intervals, validity of assumptions MLR.1 – MLR.6 has been assumed. Tests may be invalid otherwise.
109
Multiple Regression Analysis: OLS Asymptotics
• So far we focused on properties of OLS that hold for any sample
• Properties of OLS that hold for any sample/sample size
  – Expected values/unbiasedness under MLR.1 – MLR.4
  – Variance formulas under MLR.1 – MLR.5
  – Gauss-Markov Theorem under MLR.1 – MLR.5
  – Exact sampling distributions/tests under MLR.1 – MLR.6
• Properties of OLS that hold in large samples
  – Consistency under MLR.1 – MLR.4
  – Asymptotic normality/tests under MLR.1 – MLR.5, without assuming normality of the error term!
Multiple Regression Analysis: OLS Asymptotics
• Consistency

  An estimator θ̂n is consistent for a population parameter θ if

  P(|θ̂n − θ| < ε) → 1 as n → ∞, for arbitrary ε > 0.

  Alternative notation: plim θ̂n = θ (the estimate converges in probability to the true population value)

• Interpretation:
  – Consistency means that the probability that the estimate is arbitrarily close to the true population value can be made arbitrarily high by increasing the sample size
• Consistency is a minimum requirement for sensible estimators
Multiple Regression Analysis: OLS Asymptotics
• Theorem (Consistency of OLS)

  Under assumptions MLR.1 – MLR.4: plim β̂j = βj, j = 0, 1, …, k

• Special case of the simple regression model

  plim β̂1 = β1 + Cov(x, u)/Var(x)

  One can see that the slope estimate is consistent if the explanatory variable is exogenous, i.e. uncorrelated with the error term: Cov(x, u) = 0.

• Assumption MLR.4’

  E(u) = 0 and Cov(xj, u) = 0, j = 1, …, k

  All explanatory variables must be uncorrelated with the error term. This assumption is weaker than the zero conditional mean assumption MLR.4.
Multiple Regression
Analysis: OLS Asymptotics
• For consistency of OLS, only the weaker MLR.4’ is needed
• Asymptotic analog of omitted variable bias

  True model: y = β0 + β1x1 + β2x2 + u
  Misspecified model: y regressed on x1 only, with slope β̃1

  plim β̃1 = β1 + β2 Cov(x1, x2)/Var(x1)   (asymptotic bias)

  There is no omitted variable bias if the omitted variable is irrelevant (β2 = 0) or uncorrelated with the included variable (Cov(x1, x2) = 0)
Multiple Regression Analysis: OLS Asymptotics
• Asymptotic Normality and Large Sample Inference
– In practice, the normality assumption MLR.6 is often questionable
– If MLR.6 does not hold, the results of t- or F-tests may be wrong
– Fortunately, F- and t-tests still work if the sample size is large enough
– Also, OLS estimates are normal in large samples even without MLR.6
• Theorem (Asymptotic normality of OLS)

  Under assumptions MLR.1 – MLR.5:

  (β̂j − βj)/se(β̂j) is approximately Normal(0,1) in large samples, i.e. the standardized estimates are normally distributed; also, plim σ̂² = σ²
Multiple Regression Analysis: OLS Asymptotics
• Practical Consequences
– In large samples, the t-distribution is close to the Normal(0,1) distribution
– As a consequence, t-tests are valid in large samples without MLR.6
– The same is true for confidence intervals and F-tests
– Important: MLR.1 – MLR.5 are still necessary, esp. homoskedasticity
• Asymptotic analysis of the OLS sampling errors

  se(β̂j) = σ̂ / [SST_j (1 − R_j²)]^(1/2), where σ̂² converges to σ², R_j² converges to a fixed number, and SST_j/n converges to Var(xj)
Multiple Regression Analysis: OLS Asymptotics
• Asymptotic analysis of the OLS sampling errors (cont.)

  SST_j grows at rate n, so se(β̂j) shrinks with the rate 1/√n

• This is why large samples are better

• Example: Standard errors in a birth weight equation: using only the first half of the observations inflates the standard errors roughly by the factor √2 ≈ 1.41
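A Python sketch of the 1/√n rate (simulated data, not the birth weight data; halving the sample should inflate the standard error by about √2):

  import numpy as np

  rng = np.random.default_rng(11)
  x = rng.normal(0, 1, 2000)
  y = 1.0 + 0.5 * x + rng.normal(0, 1, 2000)

  def se_slope(x, y):
      b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
      b0 = y.mean() - b1 * x.mean()
      r = y - b0 - b1 * x
      return np.sqrt(r @ r / (len(y) - 2) / np.sum((x - x.mean()) ** 2))

  print(se_slope(x[:1000], y[:1000]) / se_slope(x, y))   # about sqrt(2) = 1.41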
Multiple Regression Analysis: Further Issues
• More on Functional Form
• More on using logarithmic functional forms
– Convenient percentage/elasticity interpretation
– Slope coefficients of logged variables are invariant to rescalings
– Taking logs often eliminates/mitigates problems with outliers
– Taking logs often helps to secure normality and homoskedasticity
– Variables measured in units such as years should not be logged
– Variables measured in percentage points should also not be logged
– Logs must not be used if variables take on zero or negative values
– It is hard to reverse the log-operation when constructing predictions
Multiple Regression Analysis: Further Issues
• Using quadratic functional forms
• Example: Wage equation

  wage_hat = 3.73 + .298 exper − .0061 exper²   (concave experience profile)

• Marginal effect of experience

  Δwage_hat/Δexper = .298 − 2(.0061) exper

  The first year of experience increases the wage by some $.30, the second year by .298 − 2(.0061)(1) ≈ $.29, etc.
Multiple Regression Analysis: Further Issues
• Wage maximum with respect to work experience

  exper* = .298 / [2(.0061)] ≈ 24.4

  Does this mean the return to experience becomes negative after 24.4 years? Not necessarily. It depends on how many observations in the sample lie to the right of the turnaround point. In the given example, these are about 28% of the observations. There may be a specification problem (e.g. omitted variables).
Multiple Regression Analysis: Further Issues
• Example: Effects of pollution on housing prices

  log(price) = β0 + β1 log(nox) + β2 log(dist) + β3 rooms + β4 rooms² + β5 stratio + u

  nox: nitrogen oxide in the air; dist: distance from employment centers; stratio: average student/teacher ratio

  Does this mean that, at a low number of rooms, more rooms are associated with lower prices?
Multiple Regression Analysis: Further Issues
• Calculation of the turnaround point: rooms* = |β̂3| / (2 β̂4)
Increase rooms from 5 to 6:
Increase rooms from 6 to 7:
This area can be ignored as it concerns only 1% of the observations.
Multiple Regression Analysis: Further Issues
• Other possibilities
• Higher polynomials
Multiple Regression Analysis: Further Issues
• Models with interaction terms

  price = β0 + β1 sqrft + β2 bdrms + β3 sqrft·bdrms + β4 bthrms + u

  sqrft·bdrms: interaction term; the effect of the number of bedrooms depends on the level of square footage:

  Δprice/Δbdrms = β2 + β3 sqrft

• Interaction effects complicate the interpretation of parameters

  β2 alone is the effect of the number of bedrooms, but for a square footage of zero
Multiple Regression Analysis: Further Issues
• Reparametrization of interaction effects

  y = α0 + δ1 x1 + δ2 x2 + β3 (x1 − μ1)(x2 − μ2) + u

  μ1, μ2: population means; may be replaced by sample means

  δ2 is the effect of x2 if all variables take on their mean values

• Advantages of reparametrization
  – Easy interpretation of all parameters
  – Standard errors for partial effects at the mean values are available
  – If necessary, the interaction may be centered at other interesting values
Multiple Regression Analysis: Further Issues
• Average Partial Effects
– In models with quadratics, interactions, and other nonlinear functional forms, the partial effect depends on the values of one or more explanatory variables
– The average partial effect (APE) is a summary measure to describe the relationship between the dependent variable and each explanatory variable
– After computing the partial effect and plugging in the estimated parameters, average the partial effects for each unit across the sample; see the sketch below
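A Python sketch of an APE for a quadratic term (simulated data; the APE of x is the sample average of β̂1 + 2β̂2 x_i):

  import numpy as np

  rng = np.random.default_rng(12)
  n = 500
  x = rng.uniform(0, 30, n)
  y = 3.7 + 0.3 * x - 0.006 * x ** 2 + rng.normal(0, 1, n)

  X = np.column_stack([np.ones(n), x, x ** 2])
  b = np.linalg.solve(X.T @ X, X.T @ y)

  ape = np.mean(b[1] + 2 * b[2] * x)      # average of the unit-level partial effects
  print(ape)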
Multiple Regression Analysis: Further Issues
• More on goodness-of-fit and selection of regressors
• General remarks on R2
– A high R-squared does not imply that there is a causal interpretation
– A low R-squared does not preclude precise estimation of partial effects
• Adjusted R-squared
  – What is the ordinary R-squared supposed to measure?

    R² = 1 − (SSR/n)/(SST/n) is an estimate for the population R-squared

    population R-squared: 1 − Var(u)/Var(y)
Multiple Regression Analysis: Further Issues
• Adjusted R-squared (cont.)
  – A better estimate, taking into account degrees of freedom, would be

    adjusted R² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]

    (correct degrees of freedom of the numerator and denominator)

  – The adjusted R-squared imposes a penalty for adding new regressors
  – The adjusted R-squared increases if, and only if, the t-statistic of a newly added regressor is greater than one in absolute value
• Relationship between R-squared and adjusted R-squared

  adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)

  The adjusted R-squared may even become negative
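A Python sketch comparing the two measures (simulated data with an irrelevant regressor added):

  import numpy as np

  rng = np.random.default_rng(13)
  n = 50
  x1 = rng.normal(0, 1, n)
  x2 = rng.normal(0, 1, n)                 # irrelevant regressor
  y = 1.0 + 0.5 * x1 + rng.normal(0, 1, n)

  def r2_pair(X, y):
      b = np.linalg.solve(X.T @ X, X.T @ y)
      ssr = np.sum((y - X @ b) ** 2)
      sst = np.sum((y - y.mean()) ** 2)
      r2 = 1 - ssr / sst
      k = X.shape[1] - 1
      return r2, 1 - (1 - r2) * (n - 1) / (n - k - 1)

  print(r2_pair(np.column_stack([np.ones(n), x1]), y))
  print(r2_pair(np.column_stack([np.ones(n), x1, x2]), y))  # R2 rises, adj R2 may fall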
Multiple Regression Analysis: Further Issues
• Using adjusted R-squared to choose between nonnested models
– Models are nonnested if neither model is a special case of the other
– A comparison between the R-squared of both models would be unfair to the first model because the first model contains fewer parameters
– In the given example, even after adjusting for the difference in degrees of freedom, the quadratic model is preferred
Multiple Regression Analysis: Further Issues
• Comparing models with different dependent variables
– R-squared or adjusted R-squared must not be used to compare
models which differ in their definition of the dependent variable
• Example: CEO compensation and firm performance
There is much less variation in log(salary) that needs to be explained than in salary
Multiple Regression Analysis: Further Issues
• Controlling for too many factors in regression analysis
• In some cases, certain variables should not be held fixed
– In a regression of traffic fatalities on state beer taxes (and other factors) one should not directly control for beer consumption
– In a regression of family health expenditures on pesticide usage among farmers one should not control for doctor visits
• Different regressions may serve different purposes
– In a regression of house prices on house characteristics, one would only include price assessments if the purpose of the regression is to study their validity; otherwise one would not include them
Multiple Regression Analysis: Further Issues
• Adding regressors to reduce the error variance
– Adding regressors may exacerbate multicollinearity problems
– On the other hand, adding regressors reduces the error variance
– Variables that are uncorrelated with other regressors should be added because they reduce error variance without increasing multicollinearity
– However, such uncorrelated variables may be hard to find
• Example: Individual beer consumption and beer prices
– Including individual characteristics in a regression of beer consumption on beer prices leads to more precise estimates of the price elasticity
Multiple Regression Analysis: Further Issues
• Predicting y when log(y) is the dependent variable

  Under the additional assumption that u is independent of the explanatory variables:

  E(y|x) = exp(σ²/2) · exp(β0 + β1x1 + … + βk xk)

  Prediction for y: ŷ = exp(σ̂²/2) · exp(log(y)_hat)
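A Python sketch of this retransformation on simulated data (the naive exp(prediction) underestimates E(y|x) because it ignores E[exp(u)]):

  import numpy as np

  rng = np.random.default_rng(14)
  n = 2000
  x = rng.normal(0, 1, n)
  logy = 1.0 + 0.5 * x + rng.normal(0, 0.8, n)
  y = np.exp(logy)

  X = np.column_stack([np.ones(n), x])
  b = np.linalg.solve(X.T @ X, X.T @ logy)
  resid = logy - X @ b
  s2 = resid @ resid / (n - 2)

  naive = np.exp(X @ b)                   # ignores the error term
  adjusted = np.exp(s2 / 2) * naive       # corrects for E[exp(u)] under normality
  print(y.mean(), naive.mean(), adjusted.mean())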
Multiple Regression Analysis: Further Issues
• Comparing R2 of a logged and an unlogged specification
These are the R-squareds for the predictions of the unlogged salary variable (although the second regression is originally for logged salaries). Both R-squareds can now be directly compared.
Multiple Regression Analysis with Qualitative Information
• Qualitative Information
– Examples: gender, race, industry, region, rating grade, …
– A way to incorporate qualitative information is to use dummy variables
– They may appear as the dependent or as independent variables
• A single dummy independent variable

  wage = β0 + δ0 female + β1 educ + u

  female: dummy variable, = 1 if the person is a woman, = 0 if the person is a man

  δ0: the wage gain/loss if the person is a woman rather than a man (holding other things fixed)
Multiple Regression Analysis with Qualitative Information
• Graphical Illustration
Alternative interpretation of the coefficient: δ0 = E(wage | female = 1, educ) − E(wage | female = 0, educ), i.e. the difference in mean wage between men and women with the same level of education.

[Figure: parallel wage-education lines for men and women; the dummy produces an intercept shift.]
Multiple Regression Analysis with Qualitative Information
• Dummy variable trap

  wage = β0 + γ0 male + δ0 female + β1 educ + u

  This model cannot be estimated (perfect collinearity: male + female = 1 = the intercept regressor)

  When using dummy variables, one category always has to be omitted:

  wage = β0 + δ0 female + β1 educ + u  (the base category are men)
  wage = α0 + γ0 male + β1 educ + u  (the base category are women)

  Alternatively, one could omit the intercept:

  wage = γ0 male + δ0 female + β1 educ + u

  Disadvantages:
  1) More difficult to test for differences between the parameters
  2) The R-squared formula is only valid if the regression contains an intercept
Multiple Regression Analysis with Qualitative Information
• Estimated wage equation with intercept shift

  wage_hat = −1.57 − 1.81 female + .572 educ + .025 exper + .141 tenure

  Holding education, experience, and tenure fixed, women earn $1.81 less per hour than men

• Does that mean that women are discriminated against?
  – Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.
Multiple Regression Analysis with Qualitative Information
• Comparing means of subpopulations described by dummies
  wage_hat = 7.10 − 2.51 female

  Not holding other factors constant, women earn $2.51 per hour less than men, i.e. the difference between the mean wage of men and that of women is $2.51.

• Discussion
  – It can easily be tested whether the difference in means is significant
  – The wage difference between men and women is larger if no other things are controlled for; i.e. part of the difference is due to differences in education, experience, and tenure between men and women
Multiple Regression Analysis with
Qualitative Information
• Further example: Effects of training grants on hours of training

  hrsemp = β0 + δ0 grant + β1 log(sales) + β2 log(employ) + u

  hrsemp: hours of training per employee; grant: dummy variable indicating whether the firm received a training grant

• This is an example of program evaluation
  – Treatment group (= grant receivers) vs. control group (= no grant)
  – Is the effect of treatment on the outcome of interest causal?
Multiple Regression Analysis with Qualitative Information
• Using dummy explanatory variables in equations for log(y)

  The house price regression includes the term .054 colonial, where colonial is a dummy indicating whether the house is of colonial style

  As the dummy for colonial style changes from 0 to 1, the house price increases by approximately 5.4% (more precisely, by 100·[exp(.054) − 1] ≈ 5.5%)
Multiple Regression Analysis with Qualitative Information
• Using dummy variables for multiple categories
  – 1) Define membership in each category by a dummy variable
  – 2) Leave out one category (which becomes the base category)

  The coefficient on the dummy for married women is −.198: holding other things fixed, married women earn 19.8% less than single men (= the base category)
Multiple Regression Analysis with Qualitative Information
• Incorporating ordinal information using dummy variables
• Example: City credit ratings and municipal bond interest rates
Credit rating from 0-4 (0=worst, 4=best)
Municipal bond rate
This specification would probably not be appropriate as the credit rating only contains ordinal information. A better way to incorporate this information is to define dummies:
Dummies indicating whether the particular rating applies, e.g. CR1=1 if CR=1, and CR1=0 otherwise. All effects are measured in comparison to the worst rating (= base category).
Multiple Regression Analysis with Qualitative Information
• Interactions involving dummy variables
• Allowing for different slopes

  log(wage) = β0 + δ0 female + β1 educ + δ1 female·educ + u

  female·educ: interaction term

  β0 = intercept for men; β0 + δ0 = intercept for women
  β1 = slope for men; β1 + δ1 = slope for women

• Interesting hypotheses

  H0: δ1 = 0 (the return to education is the same for men and women)
  H0: δ0 = 0, δ1 = 0 (the whole wage equation is the same for men and women)
Multiple Regression Analysis with Qualitative Information
• Graphical illustration
Interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women
Multiple Regression Analysis with Qualitative Information
• Estimated wage equation with interaction term
No evidence against hypothesis that the return to education is the same for men and women
Does this mean that there is no significant evidence of lower pay for women at the same levels of educ, exper, and tenure? No: this is only the effect for educ = 0. To answer the question one has to recenter the interaction term, e.g. around educ = 12.5 (= average education).
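Below is a minimal statsmodels sketch of this specification and the recentering trick. The data are simulated stand-ins; the variable names (lwage, educ, female) and the centering value 12.5 follow the example, but the coefficients are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the wage data (numbers made up for illustration).
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"female": rng.integers(0, 2, n),
                   "educ": rng.integers(8, 18, n)})
df["lwage"] = 0.4 + 0.08 * df.educ - 0.23 * df.female + rng.normal(0, 0.4, n)

# Interaction model: separate intercept and slope for women.
m1 = smf.ols("lwage ~ female + educ + female:educ", data=df).fit()

# Recentering: the coefficient on female is now the gender gap at
# educ = 12.5 (average education) instead of at educ = 0.
df["educ_c"] = df["educ"] - 12.5
m2 = smf.ols("lwage ~ female + educ + female:educ_c", data=df).fit()
print(m1.params, m2.params, sep="\n")
```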
Multiple Regression Analysis with Qualitative Information
• Testing for differences in regression functions across groups
Unrestricted model (contains full set of interactions)
Dependent variable: college grade point average. Regressors: standardized aptitude test score, high school rank percentile, total hours spent in college courses
• Restricted model (same regression for both groups)
Multiple Regression Analysis with Qualitative Information
• Null hypothesis
All interaction effects are zero, i.e. the same regression coefficients apply to men and women
• Estimation of the unrestricted model
Tested individually, the hypothesis that the interaction effects are zero cannot be rejected
Multiple Regression Analysis with Qualitative Information
• Joint test with F-statistic
• Alternative way to compute F-statistic in the given case
– Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSR of these two regressions
– Run regression for the restricted model and store SSR
– If the test is computed in this way it is called the Chow test
– Important: The test assumes a constant error variance across groups
Null hypothesis is rejected
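For reference, the Chow statistic implied by this procedure (k regressors plus an intercept in each group, SSR_P from the pooled restricted regression, n the total sample size) is

\[ F=\frac{\left[SSR_P-(SSR_1+SSR_2)\right]/(k+1)}{(SSR_1+SSR_2)/\left[n-2(k+1)\right]} \]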
Multiple Regression Analysis with Qualitative Information
• A binary dependent variable: the linear probability model
• Linear regression when the dependent variable is binary
If the dependent variable only takes on the values 1 and 0
Linear probability model (LPM)
In the linear probability model, the coefficients describe the effect of the explanatory variables on the probability that y=1
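Written out, the model and the resulting coefficient interpretation are

\[ P(y=1\mid x)=\beta_0+\beta_1x_1+\dots+\beta_kx_k, \qquad \beta_j=\frac{\Delta P(y=1\mid x)}{\Delta x_j} \]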
Multiple Regression Analysis with Qualitative Information
• Example: Labor force participation of married women
Dependent variable: =1 if in labor force, =0 otherwise
Non-wife income (in thousand dollars per year)
If the number of kids under six years increases by one, the probability that the woman works falls by 26.2 percentage points
Does not look significant (but see below)
Multiple Regression Analysis with Qualitative Information
• Example: Labor force participation of married women (cont.)
Graph for nwifeinc=50, exper=5, age=30, kidslt6=1, and kidsge6=0
The maximum level of education in the sample is educ=17. For the given case, this leads to a predicted probability of being in the labor force of about 50%.
Negative predicted probability but no problem because no woman in the sample has educ < 5.
Multiple Regression Analysis with Qualitative Information
• Disadvantages of the linear probability model
– Predicted probabilities may be larger than one or smaller than zero
– Marginal probability effects are sometimes logically impossible
– The linear probability model is necessarily heteroskedastic
Variance of a Bernoulli variable: Var(y|x) = P(y=1|x)·[1 − P(y=1|x)]
– Heteroskedasticity-consistent standard errors need to be computed
• Advantages of the linear probability model
– Easy estimation and interpretation
– Estimated effects and predictions are often reasonably good in practice
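A sketch of LPM estimation with heteroskedasticity-robust standard errors in statsmodels. The data are simulated stand-ins; the variable names (inlf, nwifeinc, educ, kidslt6) follow the labor force example, but the coefficients are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 800
df = pd.DataFrame({"nwifeinc": rng.normal(20, 10, n),
                   "educ": rng.integers(8, 18, n),
                   "kidslt6": rng.integers(0, 3, n)})
# True participation probability (kept inside (0, 1) for the simulation).
p = np.clip(0.3 + 0.03 * df.educ - 0.2 * df.kidslt6 - 0.003 * df.nwifeinc,
            0.02, 0.98)
df["inlf"] = rng.binomial(1, p)

# The LPM is necessarily heteroskedastic, so request robust (HC1) SEs.
lpm = smf.ols("inlf ~ nwifeinc + educ + kidslt6", data=df).fit(cov_type="HC1")
print(lpm.summary())
```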
Multiple Regression Analysis with Qualitative Information
• More on policy analysis and program evaluation
• Example: Effect of job training grants on worker productivity
Dependent variable: the firm's scrap rate
=1 if firm received training grant, =0 otherwise
No apparent effect of the grant on productivity
Treatment group: grant receivers; control group: firms that received no grant
Grants were given on a first-come, first-served basis. This is not the same as giving them out randomly. It might be the case that firms with less productive workers saw an opportunity to improve productivity and applied first.
Multiple Regression Analysis with Qualitative Information
• Self-selection into treatment as a source for endogeneity
– In the given and in related examples, the treatment status is probably related to other characteristics that also influence the outcome
– The reason is that subjects self-select into treatment depending on their individual characteristics and prospects
• Experimental Evaluation
– In experiments, assignment to treatment is random
– In this case, causal effects can be inferred using a simple regression
The dummy indicating whether or not there was treatment is unrelated to other factors affecting the outcome.
Multiple Regression Analysis with Qualitative Information
• Further example of an endogenous dummy regressor
– Are nonwhite customers discriminated against?
Dummy indicating whether loan was approved
Race dummy
Credit rating
– It is important to control for other characteristics that may be important for loan approval (e.g. profession, unemployment)
– Omitting important characteristics that are correlated with the non-white dummy will produce spurious evidence for discrimination
Heteroskedasticity
• Consequences of heteroskedasticity for OLS
– OLS is still unbiased and consistent under heteroskedasticity!
– Also, interpretation of R-squared is not changed
Unconditional error variance is unaffected by heteroskedasticity (which refers to the conditional error variance)
– Heteroskedasticity invalidates variance formulas for OLS estimators
– The usual F tests and t tests are not valid under heteroskedasticity
– Under heteroskedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there may be more efficient linear estimators
Heteroskedasticity
• Heteroskedasticity-robust inference after OLS estimation
– Formulas for OLS standard errors and related statistics have been developed that are robust to heteroskedasticity of unknown form
– All formulas are only valid in large samples
– Formula for the heteroskedasticity-robust OLS standard error
Also called White/Huber/Eicker standard errors. They involve the squared residuals from the regression and from a regression of xj on all other explanatory variables.
– Using these formulas, the usual t test is valid asymptotically
– The usual F statistic does not work under heteroskedasticity, but heteroskedasticity-robust versions are available in most software
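Reconstructing the formula from the description above, the robust variance estimator for the j-th slope coefficient is

\[ \widehat{Var}(\hat\beta_j)=\frac{\sum_{i=1}^{n}\hat r_{ij}^{\,2}\,\hat u_i^{\,2}}{SSR_j^{2}} \]

where the r̂_ij are the residuals from regressing x_j on all other explanatory variables and SSR_j is the sum of squared residuals from that regression.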
Heteroskedasticity
• Example: Hourly wage equation
Heteroskedasticity robust standard errors may be larger or smaller than their nonrobust counterparts. The differences are often small in practice.
F statistics are also often not too different.
If there is strong heteroskedasticity, differences may be larger. To be on the safe side, it is advisable to always compute robust standard errors.
Heteroskedasticity
• Testing for heteroskedasticity
– It may still be interesting to know whether there is heteroskedasticity because then OLS may not be the most efficient linear estimator anymore
• Breusch-Pagan test for heteroskedasticity
Under MLR.4
The mean of u2 must not vary with x1, x2, ..., xk
Heteroskedasticity
• Breusch-Pagan test for heteroskedasticity (cont.)
Regress the squared residuals on all explanatory variables and test whether this regression has explanatory power.
A large test statistic (= a high R- squared) is evidence against the null hypothesis.
Alternative test statistic (= Lagrange multiplier statistic, LM). Again, high values of the test statistic (= high R-squared) lead to rejection of the null hypothesis that the expected value of u2 is unrelated to the explanatory variables.
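In symbols, with R²_{û²} denoting the R-squared from regressing the squared residuals on x_1, …, x_k:

\[ F=\frac{R^2_{\hat u^2}/k}{(1-R^2_{\hat u^2})/(n-k-1)}, \qquad LM=n\cdot R^2_{\hat u^2}\ \overset{a}{\sim}\ \chi^2_k \]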
Heteroskedasticity
• Example: Heteroskedasticity in housing price equations
In the logarithmic specification, homoskedasticity cannot be rejected
Heteroskedasticity
• The White test for heteroskedasticity
Regress the squared residuals on all explanatory variables, their squares, and interactions (here: example for k=3)
The White test detects more general deviations from homoskedasticity than the Breusch-Pagan test
• Disadvantage of this form of the White test
– Including all squares and interactions leads to a large number of estimated parameters (e.g. k=6 leads to 27 parameters to be estimated)
Heteroskedasticity
• Alternative form of the White test
This regression indirectly tests the dependence of the squared residuals on the explanatory variables, their squares, and interactions, because the predicted value of y and its square implicitly contain all of these terms.
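In symbols: regress the squared residuals on the fitted values and their squares, and use

\[ LM = n\cdot R^2_{\hat u^2}\ \overset{a}{\sim}\ \chi^2_2 \]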
• Example: Heteroskedasticity in (log) housing price equations
Heteroskedasticity
• Weighted Least Squares Estimation
• Heteroskedasticity is known up to a multiplicative constant
The functional form of the heteroskedasticity is known
Transformed model
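Reconstructing the transformed model: writing Var(u_i | x_i) = σ² h(x_i) with h(·) known and dividing the equation through by √h_i gives

\[ \frac{y_i}{\sqrt{h_i}}=\beta_0\frac{1}{\sqrt{h_i}}+\beta_1\frac{x_{i1}}{\sqrt{h_i}}+\dots+\beta_k\frac{x_{ik}}{\sqrt{h_i}}+\frac{u_i}{\sqrt{h_i}}, \qquad Var\!\left(\frac{u_i}{\sqrt{h_i}}\,\Big|\,x_i\right)=\sigma^2 \]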
Heteroskedasticity
• Example: Savings and income
Note that this regression model has no intercept
• If the other Gauss-Markov assumptions hold as well, OLS applied to the transformed model is the best linear unbiased estimator
• The transformed model is homoskedastic
Heteroskedasticity
• OLS in the transformed model is weighted least squares (WLS)
Observations with a large variance get a smaller weight in the optimization problem
• Why is WLS more efficient than OLS in the original model?
– Observations with a large variance are less informative than observations with a small variance and therefore should get less weight
• WLS is a special case of generalized least squares (GLS)
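A minimal statsmodels sketch, assuming (as in the savings/income example) that the error variance is proportional to income, i.e. h_i = inc_i; the data are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
inc = rng.uniform(10, 100, n)
# Error standard deviation grows with sqrt(income): Var(u|inc) ∝ inc.
sav = 2 + 0.1 * inc + rng.normal(0, 1, n) * np.sqrt(inc)

X = sm.add_constant(inc)
ols = sm.OLS(sav, X).fit()
wls = sm.WLS(sav, X, weights=1.0 / inc).fit()  # weight_i = 1 / h_i
print(ols.bse, wls.bse)  # WLS standard errors are typically smaller
```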
Heteroskedasticity
• Example: Financial wealth equation
Dependent variable: net financial wealth
Participation in a 401K pension plan
Assumed form of heteroskedasticity
WLS estimates have considerably smaller standard errors (which is in line with the expectation that they are more efficient).
Heteroskedasticity
• Important special case of heteroskedasticity
– If the observations are reported as averages at the city/county/state/country/firm level, they should be weighted by the size of the unit
Average contribution to the pension plan in firm i
Average earnings and age in firm i; percentage the firm contributes to the plan
Heteroskedastic error term
Error variance if errors are homoskedastic at the individual level
If errors are homoskedastic at the individual level, WLS with weights equal to firm size mi should be used. If the assumption of homoskedasticity at the individual level is not exactly right, one can calculate robust standard errors after WLS (i.e. for the transformed model).
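Reconstructing the variance claim: if the firm-level error is the average of m_i homoskedastic individual-level errors, then

\[ \bar e_i=\frac{1}{m_i}\sum_{s=1}^{m_i}e_{is}, \qquad Var(\bar e_i\mid m_i)=\frac{\sigma_e^2}{m_i} \]

so weighting each firm by m_i undoes the heteroskedasticity.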
Heteroskedasticity
• Unknown heteroskedasticity function (feasible GLS)
Assumed general form of heteroskedasticity; the exp-function is used to ensure positivity
Multiplicative error (assumption: independent of the explanatory variables)
Use the inverse values of the estimated heteroskedasticity function as weights in WLS
Feasible GLS is consistent and asymptotically more efficient than OLS.
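A sketch of this FGLS procedure in statsmodels (simulated data; the log/exp steps follow the description above).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 5, n)
u = rng.normal(0, 1, n) * np.exp(0.5 * x)   # multiplicative heteroskedasticity
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Step 1: regress log(u-hat^2) on the explanatory variables.
logu2 = np.log(ols.resid ** 2)
aux = sm.OLS(logu2, X).fit()

# Step 2: h-hat = exp(fitted values). Step 3: WLS with weights 1/h-hat.
h_hat = np.exp(aux.fittedvalues)
fgls = sm.WLS(y, X, weights=1.0 / h_hat).fit()
print(fgls.params, fgls.bse)
```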
Heteroskedasticity
• Example: Demand for cigarettes
• Estimation by OLS
Dependent variable: cigarettes smoked per day
Logged income and cigarette price
Smoking restrictions in restaurants
Reject homoskedasticity
• Estimation by FGLS
Heteroskedasticity
Now statistically significant
• Discussion
– The income elasticity is now statistically significant; other coefficients are also more precisely estimated (without changing the qualitative results)
Heteroskedasticity
• What if the assumed heteroskedasticity function is wrong?
– If the heteroskedasticity function is misspecified, WLS is still consistent under MLR.1 – MLR.4, but robust standard errors should be computed
– WLS is consistent under MLR.4 but not necessarily under MLR.4′
– If OLS and WLS produce very different estimates, this typically indicates that some other assumptions (e.g. MLR.4) are wrong
– If there is strong heteroskedasticity, it is still often better to use a wrong form of heteroskedasticity in order to increase efficiency
Heteroskedasticity
• WLS in the Linear Probability Model
In the LPM, the exact form of heteroskedasticity is known
Use inverse values as weights in WLS
• Discussion
– Infeasible if LPM predictions are below zero or greater than one
– If such cases are rare, they may be adjusted to values such as .01/.99
– Otherwise, it is probably better to use OLS with robust standard errors
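In symbols, the known form of heteroskedasticity and the implied weights are

\[ Var(y\mid x)=p(x)\left[1-p(x)\right] \;\Rightarrow\; \hat h_i=\hat y_i(1-\hat y_i), \qquad w_i=1/\hat h_i \]

which is infeasible whenever a fitted value ŷ_i lies outside (0, 1).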
More on Specification and Data Issues
• Tests for functional form misspecification
– One can always test whether explanatory variables should appear as squares or higher order terms by testing whether such terms can be excluded
– Otherwise, one can use general specification tests such as RESET
• Regression specification error test (RESET)
– The idea of RESET is to include squares and possibly higher order fitted values in the regression (similarly to the reduced White test)
Test for the exclusion of these terms. If they cannot be excluded, this is evidence for omitted higher order terms and interactions, i.e. for misspecification of functional form.
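A statsmodels sketch of RESET (simulated data in which the true relationship is nonlinear, so the test should tend to reject).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 400
df = pd.DataFrame({"x1": rng.uniform(0, 10, n), "x2": rng.uniform(0, 10, n)})
df["y"] = 1 + 0.5 * df.x1 + 0.2 * df.x2 ** 2 + rng.normal(0, 1, n)

# Restricted model omits the nonlinearity.
restricted = smf.ols("y ~ x1 + x2", data=df).fit()
df["fit2"] = restricted.fittedvalues ** 2
df["fit3"] = restricted.fittedvalues ** 3

# RESET: add squared and cubed fitted values, F-test their joint exclusion.
unrestricted = smf.ols("y ~ x1 + x2 + fit2 + fit3", data=df).fit()
print(unrestricted.compare_f_test(restricted))  # (F statistic, p-value, df)
```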
• Example: Housing price equation
More on Specification and Data Issues
Evidence for misspecification (level specification)
Less evidence for misspecification (logarithmic specification)
• Discussion
– One may also include higher order terms, which implies complicated interactions and higher order terms of all explanatory variables
– RESET provides little guidance as to where misspecification comes from
• Testing against nonnested alternatives
Model 1:
Model 2:
Which specification is more appropriate?
More on Specification and Data Issues
Define a general model that contains both models as subcases and test:
• Discussion
– Can always be done; however, a clear winner need not emerge
– Cannot be used if the models differ in their definition of the dependent variable
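For illustration only, suppose model 1 uses the regressors in levels and model 2 uses their logs (the concrete functional forms are an assumption here); the comprehensive model is then

\[ y=\gamma_0+\gamma_1x_1+\gamma_2x_2+\gamma_3\log(x_1)+\gamma_4\log(x_2)+u \]

Model 1 is tested via H_0: γ_3 = γ_4 = 0, and model 2 via H_0: γ_1 = γ_2 = 0.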
More on Specification and Data Issues
• Using proxy variables for unobserved explanatory variables
• Example: Omitted ability in a wage equation
Replace by proxy
In general, the estimates for the returns to education and experience will be biased because one has omitted the unobservable ability variable. Idea: find a proxy variable for ability which is able to control for ability differences between individuals so that the coefficients of the other variables will not be biased. A possible proxy for ability is the IQ score or similar test scores.
• General approach to using proxy variables
Omitted variable, e.g. ability
Regression of the omitted variable on its proxy
More on Specification and Data Issues
• Assumptions necessary for the proxy variable method to work
– The proxy is “just a proxy” for the omitted variable; it does not belong in the population regression, i.e. it is uncorrelated with its error
If the error and the proxy were correlated, the proxy would actually have to be included in the population regression function
– The proxy variable is a “good” proxy for the omitted variable, i.e. using other variables in addition will not help to predict the omitted variable
Otherwise x1 and x2 would have to be included in the regression for the omitted variable
More on Specification and Data Issues
• Under these assumptions, the proxy variable method works:
In this regression model, the error term is uncorrelated with all the explanatory variables. As a consequence, all coefficients will be correctly estimated using OLS. The coefficients for the explanatory variables x1 and x2 will be correctly identified. The coefficient for the proxy variable may also be of interest (it is a multiple of the coefficient of the omitted variable).
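Substituting the proxy regression from the previous slide into the population model makes this explicit (generic notation, with x₃* the omitted variable and x₃ its proxy): from x₃* = δ₀ + δ₃x₃ + v₃ and y = β₀ + β₁x₁ + β₂x₂ + β₃x₃* + u,

\[ y=(\beta_0+\beta_3\delta_0)+\beta_1x_1+\beta_2x_2+\beta_3\delta_3\,x_3+(u+\beta_3v_3) \]

so β₁ and β₂ are identified, and the proxy's coefficient β₃δ₃ is a multiple of the omitted variable's coefficient.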
• Discussion of the proxy assumptions in the wage example
– Assumption 1: Should be fulfilled as the IQ score is not a direct wage determinant; what matters is how able the person proves at work
– Assumption 2: Most of the variation in ability should be explainable by variation in the IQ score, leaving only a small remainder for educ and exper
More on Specification and Data Issues
As expected, the measured return to education decreases if IQ is included as a proxy for unobserved ability.
The coefficient on the proxy suggests that ability differences between individuals are important (e.g. +15 points in the IQ score are associated with a wage increase of about 5.4 percent).
Even if the IQ score imperfectly soaks up the variation caused by ability, including it will at least reduce the bias in the measured return to education.
No significant interaction effect between ability and education.
More on Specification and Data Issues
• Using lagged dependent variables as proxy variables
– In many cases, omitted unobserved factors may be proxied by the value of the dependent variable from an earlier time period
• Example: City crime rates
– Including the past crime rate will at least partly control for the many omitted factors that also determine the crime rate in a given year
– Another way to interpret this equation is that one compares cities which had the same crime rate last year; this avoids comparing cities that differ very much in unobserved crime factors
More on Specification and Data Issues
• Models with random slopes (= random coefficient models)
Average intercept
Random component
Average slope
Random component
The model has a random intercept and a random slope
Error term
Assumptions:
The individual random components are independent of the explanatory variable
WLS or OLS with robust standard errors will consistently estimate the average intercept and average slope in the population
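Putting the annotations together in generic notation:

\[ y_i=a_i+b_ix_i,\quad a_i=a+c_i,\quad b_i=b+d_i \;\Rightarrow\; y_i=a+bx_i+(c_i+d_ix_i) \]

With E(c_i) = E(d_i) = 0 and the random components independent of x_i, the composite error c_i + d_ix_i has zero mean but is heteroskedastic, which is why WLS or robust OLS is called for.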
More on Specification and Data Issues
• Properties of OLS under measurement error
• Measurement error in the dependent variable
Mismeasured value = True value + Measurement error
Population regression
Estimated regression
• Consequences of measurement error in the dependent variable
– Estimates will be less precise because the error variance is higher
– Otherwise, OLS will be unbiased and consistent (as long as the measurement error is uncorrelated with the values of the explanatory variables)
More on Specification and Data Issues
• Measurement error in an explanatory variable
Mismeasured value = True value + Measurement error
Population regression
Estimated regression
Classical errors-in-variables assumption: the measurement error is uncorrelated with the true value
The mismeasured variable x1 is correlated with the error term
More on Specification and Data Issues
• Consequences of measurement error in an explanatory variable
– Under the classical errors-in-variables assumption, OLS is biased and inconsistent because the mismeasured variable is endogenous
– One can show that the inconsistency is of the following form:
This factor (which involves the error variance of a regression of the true value of x1 on the other explanatory variables) will always be between zero and one
– The effect of the mismeasured variable suffers from attenuation bias, i.e. the magnitude of the effect will be attenuated towards zero
– In addition, the effects of the other explanatory variables will be biased
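Reconstructing the inconsistency formula from the description above:

\[ \operatorname{plim}\hat\beta_1=\beta_1\cdot\frac{\sigma_{r_1^*}^2}{\sigma_{r_1^*}^2+\sigma_{e_1}^2} \]

where r₁* is the population residual from regressing the true x₁* on the other explanatory variables and σ²_{e₁} is the measurement error variance; the factor lies between zero and one, so the estimate is attenuated toward zero.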
More on Specification and Data Issues
• Missing data and nonrandom samples
• Missing data as sample selection
– Missing data is a special case of sample selection (= nonrandom sampling) as the observations with missing information cannot be used
– If the sample selection is based on independent variables there is no problem as a regression conditions on the independent variables
– In general, sample selection is not a problem if it is uncorrelated with the error term of a regression (= exogenous sample selection)
– Sample selection is a problem if it is based on the dependent variable or on the error term (= endogenous sample selection)
More on Specification and Data Issues
• Example for exogenous sample selection
If the sample was nonrandom in the way that certain age groups, income groups, or household sizes were over- or undersampled, this is not a problem for the regression because it examines savings for subgroups defined by income, age, and household size. The distribution of subgroups does not matter.
• Example for endogenous sample selection
If the sample is nonrandom in the way individuals refuse to take part in the sample survey if their wealth is particularly high or low, this will bias the regression results because these individuals may be systematically different from those who do not refuse to take part in the sample survey.
More on Specification and Data Issues
• Outliers and Influential Observations
– Extreme values and outliers may be a particular problem for OLS because the method is based on squaring deviations
– If outliers are the result of mistakes that occurred when keying in the data, one should just discard the affected observations
– If outliers are the result of the data generating process, the decision whether to discard the outliers is not so easy
• Example: R&D intensity and firm size
More on Specification and Data Issues
• Example: R&D intensity and firm size (cont.)
The outlier is not the result of a mistake: One of the sampled firms is much larger than the others.
The regression without the outlier makes more sense.
More on Specification and Data Issues
• Least absolute deviations estimation (LAD)
– The least absolute deviations estimator minimizes the sum of absolute deviations (instead of the sum of squared deviations, i.e. OLS)
– It may be more robust to outliers as deviations are not squared
– The least absolute deviations estimator estimates the parameters of the conditional median (instead of the conditional mean with OLS)
– The least absolute deviations estimator is a special case of quantile regression, which estimates parameters of conditional quantiles
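A closing statsmodels sketch comparing OLS with LAD (median regression) on simulated data containing a single large outlier; all numbers are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.normal(0, 1, n)
y[0] += 100                          # one influential outlier

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
lad = sm.QuantReg(y, X).fit(q=0.5)   # q = 0.5 gives the conditional median
print(ols.params, lad.params)        # the LAD slope is far less distorted
```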