Economics 430
Multiple Regression Concepts
Today’s Class
• Introductory Concepts
  – Projection
  – Frisch-Waugh Theorem
  – Partial Correlation
  – Adjusted R²
Residuals vs. Disturbances
ε = Disturbances (Population); e = Residuals (Sample)
In the population: E[X′ε] = 0
In the sample: (1/N) Σ_{i=1}^{N} x_i e_i = 0
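A quick numerical illustration of the sample condition (a sketch, assuming the wage1 data from the np package that these slides use below):
# Verify that each regressor is orthogonal to the OLS residuals in the sample
library(np)
data(wage1)
m <- lm(lwage ~ educ + exper, data = wage1)
X <- model.matrix(m)   # N x K regressor matrix (including the constant)
e <- resid(m)          # OLS residuals
colMeans(X * e)        # (1/N) sum of x_i e_i: zero for every column, up to rounding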
Residuals vs. Disturbances
Disturbances (population): ε_i = y_i − x_i′β
Partitioning y:
y = E[y|X] + ε = conditional mean + disturbance
Residuals (sample): e_i = y_i − x_i′b
Partitioning y:
y = Xb + e = projection + residual
(Note: Xb is the projection of y into the column space of X, i.e., the set of linear combinations of the columns of X; Xb is one of these.)
Projection
Stochastic relation (population regression): y_i = x_i′β + ε_i
The estimate of β is denoted by b.
ε_i = disturbance associated with point i; e_i = residual = estimate of ε_i
Projection
In general, for the multiple regression case:
M = I − X(X′X)⁻¹X′
M = Residual Maker, because MX = 0
Projection
Recall that y = Xb + e, where e = My. Then:
y = Py + My  (Projection + Residual)
P = X(X′X)⁻¹X′  (P = Projection Matrix)
Py = Xb is the projection of y into the column space of X.
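A minimal R sketch (again assuming the np package’s wage1 data, used here only for illustration) building P and M explicitly and checking these identities:
# Construct the projection matrix P and residual maker M, then verify
# MX = 0 and y = Py + My
library(np)
data(wage1)
y <- wage1$lwage
X <- cbind(1, wage1$educ, wage1$exper)        # constant + regressors
P <- X %*% solve(t(X) %*% X) %*% t(X)         # projection matrix
M <- diag(nrow(X)) - P                        # residual maker
max(abs(M %*% X))                             # ~0: MX = 0
max(abs(y - (P %*% y + M %*% y)))             # ~0: y = Py + My
all.equal(as.vector(P %*% y),                 # Py equals the fitted values Xb
          unname(fitted(lm(lwage ~ educ + exper, data = wage1))))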
Frisch-Waugh (1933) Theorem
• Consider the Model: y ~ β1X1 + β2X2
• The Frisch-Waugh theorem says that the multiple regression coefficient on any single variable can also be obtained by first netting out (partialing out) the effect of the other variable(s) in the regression model from both the dependent variable and that independent variable.
Frisch-Waugh (1933) Theorem
Algorithm
• Model: log(wages) ~ β1educ + β2exper
  – X1 = educ, X2 = exper, y = log(wages)
• Goal: Estimate β1
• Step 1: regress y ~ X2 and save the residuals (= res1)
• Step 2: regress X1 ~ X2 and save the residuals (= res2)
• Step 3: regress res1 ~ res2; the slope coefficient is the estimate of β1
Frisch-Waugh (1933) Theorem
R Example
Estimate the model to compare results:
library(AER)
library(np)
data(wage1)
Step 0: m0 = lm(lwage ~ educ + exper, data = wage1)
Step 1: m1 = lm(lwage ~ exper, data = wage1); res1 = m1$res
Step 2: m2 = lm(educ ~ exper, data = wage1); res2 = m2$res
Step 3: m3 = lm(res1 ~ res2)  # slope on res2 = estimate of β1 (coefficient for educ)
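To confirm the theorem numerically, compare the two estimates after running the steps above:
coef(m0)["educ"]   # coefficient on educ from the full regression
coef(m3)[2]        # slope from the residual-on-residual regression; identical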
Frisch-Waugh Result
We can show, after some algebraic manipulation, that b2 = (X2′M1X2)⁻¹(X2′M1y), where M1 = I − X1(X1′X1)⁻¹X1′.
• This is Frisch and Waugh’s famous result: the “double residual regression.”
• How do we interpret this? A regression of residuals on residuals.
• “We get the same result whether we (1) detrend the other variables by using the residuals from a regression of them on a constant and a time trend and use the detrended data in the regression or (2) just include a constant and a time trend in the regression and not detrend the data”
• “Detrend the data” means compute the residuals from the regressions of the variables on a constant and a time trend.
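The matrix result above can be checked directly in R (a sketch reusing the wage1 example, with X1 = constant and exper, and X2 = educ):
# Double residual regression: b2 = (X2'M1X2)^(-1) (X2'M1y)
library(np)
data(wage1)
y  <- wage1$lwage
X1 <- cbind(1, wage1$exper)                  # constant + exper
X2 <- cbind(wage1$educ)                      # educ
M1 <- diag(nrow(X1)) - X1 %*% solve(t(X1) %*% X1) %*% t(X1)
b2 <- solve(t(X2) %*% M1 %*% X2) %*% (t(X2) %*% M1 %*% y)
b2                                           # matches the lm() coefficient:
coef(lm(lwage ~ educ + exper, data = wage1))["educ"]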
Partial Correlation
• Let X and Y be the two variables we have found to be correlated.
• Let r(X,Y) = simple correlation
• Introduce a third variable Z, which may or may not mediate the relationship between X and Y.
• Let r((X,Y)|Z) = partial correlation of X and Y, controlling for Z.
• Result: If r(X,Y) is relatively large, but r((X,Y)|Z) is much smaller, we can conclude that Z is a mediating variable. Z may explain, at least in part, the observed relationship between X and Y.
Partial Correlation
• Study on language skills (= X) and children’s toe size (= Y).
– Finding: strong correlation between language skills and size of the big toe.
• Changes in X may be a cause of changes in Y (or vice versa).
• How about a third variable, Z (e.g., age)?
• Could Z be producing changes in both X and Y?
Partial Correlation
Study results:
X = measure of language skills
Y = size of big toe
Z = child’s age
r(X,Y) = 0.40
r(X,Z) = 0.55, r(Y,Z) = 0.65
r((X,Y)|Z) = 0.07 (much smaller than 0.40)
Age explains the relationship between language skills and big toe size.
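These numbers are consistent with the standard formula r((X,Y)|Z) = [r(X,Y) − r(X,Z)·r(Y,Z)] / √[(1 − r(X,Z)²)(1 − r(Y,Z)²)]; a quick R check:
# Partial correlation from the three simple correlations
r_xy <- 0.40; r_xz <- 0.55; r_yz <- 0.65
(r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))   # ~ 0.07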
Partial Correlation
• Consider the Model: log(wages) ~ β1educ + β2exper
Q: How much of the measured correlation between wages and educ reflects a direct relation between them, rather than the fact that wages and education level both tend to rise with age?
A: Partial Correlation Coefficient
Partial Correlation
Algorithm
• Step 1: y* = the residuals in a regression of wages on a constant and age
• Step 2: z* = the residuals in a regression of education on a constant and age
• Step 3: r*yz = correlation of y* and z* = partial correlation of wages and education, controlling for age.
• Note: In R, you can use the “ppcor” package
Partial Correlation
R Example
library(AER)
library(np)
library(ppcor)
data("PSID1976", package = "AER")
PSID1976$kids <- with(PSID1976, factor((youngkids + oldkids) > 0,
  levels = c(FALSE, TRUE), labels = c("no", "yes")))
PSID1976$nwincome <- with(PSID1976, (fincome - hours * wage)/1000)
wage = PSID1976$wage
age = PSID1976$age
education = PSID1976$education
y.data <- data.frame(wage, age, education)
# Compute/Interpret 'simple correlation'
cor(y.data)
# Compute/Interpret 'partial correlation'
pcor(y.data)
# Partial correlation between "wage" and "education" given "age"
pcor.test(y.data$wage, y.data$education, y.data[, c("age")])
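As a cross-check, the same number comes from the residual-based algorithm on the previous slide (run after the code above):
# Steps 1-3 by hand: residual out age, then correlate the residuals
y.star <- resid(lm(wage ~ age, data = y.data))        # Step 1
z.star <- resid(lm(education ~ age, data = y.data))   # Step 2
cor(y.star, z.star)                                   # Step 3: r*yz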
Measure of Fit
Theorem: Change in R² When a Variable Is Added to the Regression
• Let R²Xz = R² from the regression of y on X and an additional variable z.
• Let R²X = R² from the regression of y on X alone.
• Let r*yz = partial correlation between y and z, controlling for X.
• Then: R²Xz = R²X + (1 − R²X)·(r*yz)²
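A numerical check of the theorem (a sketch using the PSID1976 data from AER, with X = experience and z = education):
# Verify R2Xz = R2X + (1 - R2X) * (r*yz)^2
library(AER)
data("PSID1976", package = "AER")
R2.X  <- summary(lm(wage ~ experience, data = PSID1976))$r.squared
R2.Xz <- summary(lm(wage ~ experience + education, data = PSID1976))$r.squared
ry <- resid(lm(wage ~ experience, data = PSID1976))        # y, net of X
rz <- resid(lm(education ~ experience, data = PSID1976))   # z, net of X
r.star <- cor(ry, rz)                                      # r*yz
R2.Xz
R2.X + (1 - R2.X) * r.star^2                               # identical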
Measure of Fit
Adjusted R²
• Let n = number of observations and K = number of parameters estimated (including the constant). We define the adjusted R² as:
  Adjusted R² = 1 − [(n − 1)/(n − K)](1 − R²)
• Theorem: Change in Adjusted R² When a Variable Is Added to the Regression:
  In a multiple regression, adjusted R² will fall (rise) when the variable x is deleted from the regression if the square of the t-ratio associated with this variable is greater (less) than 1.
Adjusted R²
R Example
library(AER)
library(np)
library(ppcor)
data("PSID1976", package = "AER")
Model 1:
fit1 = lm(wage ~ experience + education, data = PSID1976)
summary(fit1)
Q: Look at R² and adjusted R². What should happen to them if we add another variable to the model?
Model 2:
fit2 = lm(wage ~ experience + education + youngkids, data = PSID1976)
summary(fit2)
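By the theorem above, adjusted R² rises from Model 1 to Model 2 exactly when the t-ratio on the added variable exceeds 1 in absolute value; a quick check after running both fits:
# Compare adjusted R^2 and the t-ratio on youngkids
summary(fit1)$adj.r.squared
summary(fit2)$adj.r.squared
summary(fit2)$coefficients["youngkids", "t value"]   # |t| > 1 iff adjusted R^2 rose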