Economics 403A
Multiple Regression
Concepts
Dr. Randall R. Rojas
Today’s Class
• Introductory Concepts
– Projection
– Frisch-Waugh Theorem
– Partial Correlation
– Adjusted R2
Residuals vs. Disturbances
In the population: E[X'ε] = 0
In the sample: (1/N) Σi xi ei = 0
ε = Disturbances (Population)
e = Residuals (Sample)
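(A quick R sketch, not from the original slides, illustrating the sample moment condition: OLS residuals are orthogonal to the regressors by construction, so (1/N) Σi xi ei = 0 in any fitted sample. The wage1 data and variables are borrowed from the Frisch-Waugh example later in these slides.)
library(np)                          # provides the wage1 data
data("wage1")
fit <- lm(lwage ~ educ + exper, data = wage1)
X <- model.matrix(fit)               # regressors, including the constant
crossprod(X, resid(fit)) / nrow(X)   # (1/N) X'e: numerically zero by construction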
Residuals vs. Disturbances
Partitioning y:
Disturbances (population): y = E[y|X] + ε = conditional mean + disturbance
Residuals (sample): y = Xb + e = projection + residual
(Note: The projection is into the column space of X, i.e., the set of linear combinations of the columns of X; Xb is one of these.)
Projection
yi = xi'β + εi = xi'b + ei
εi = yi − xi'β  (Disturbance associated with point i)
E[yi|xi] = xi'β  (Population Regression)
The estimate of E[yi|xi] is denoted by ŷi = xi'b
yi = xi'β + εi  (Stochastic Relation)
ei = yi − xi'b  (Residual = estimate of εi)
Projection
In general for the multiple regression case:
y = Xb + e
e = y − Xb = My
M = Residual Maker, because MX = 0
Projection
Recall that y = ŷ + e, where e = My.
ŷ = y − e = y − My = (I − M)y = Py  (P = Projection Matrix)
y = Py + My  (Projection + Residual)
ŷ = Py is the projection of y into the column space of X.
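(A small sketch, not in the original slides, that builds P and M explicitly and checks the identities above; the wage1 regression from the Frisch-Waugh example is used purely as an illustration.)
library(np)
data("wage1")
X <- model.matrix(lm(lwage ~ educ + exper, data = wage1))
y <- wage1$lwage
P <- X %*% solve(crossprod(X)) %*% t(X)   # projection matrix P = X(X'X)^(-1)X'
M <- diag(nrow(X)) - P                    # residual maker M = I - P
max(abs(M %*% X))                         # MX = 0 (up to rounding error)
yhat <- P %*% y                           # ŷ = Py, projection of y into the column space of X
e <- M %*% y                              # e = My, residuals
all.equal(as.vector(yhat + e), y)         # y = Py + My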
Frisch-Waugh (1933) Theorem
• Consider the model: y ~ b1X1 + b2X2
• The Frisch-Waugh theorem says that the multiple regression coefficient of any single variable can also be obtained by first netting out (partialing out) the effect of the other variable(s) in the regression model from both the dependent variable and the independent variable of interest.
Frisch-Waugh (1933) Theorem
Algorithm
• Model: log(wages) ~ b1educ + b2exper
– X1 = educ, X2 = exper, y = log(wages)
• Goal: Estimate b1
• Step 1: regress y ~ X2 → save the residuals (= res1)
• Step 2: regress X1 ~ X2 → save the residuals (= res2)
• Step 3: regress res1 ~ res2 → b1
Frisch-Waugh (1933) Theorem
R Example
library(AER)
library(np)
data(wage1)
Estimate the model to compare results:
Step 0: m0 = lm(lwage ~ educ + exper, data = wage1)
Step 1: m1 = lm(lwage ~ exper, data = wage1) → res1 = m1$res
Step 2: m2 = lm(educ ~ exper, data = wage1) → res2 = m2$res
Step 3: m3 = lm(res1 ~ res2) → b1 (coefficient for educ)
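(As a quick check, not in the original slides, the slope from the double-residual regression in Step 3 should reproduce the educ coefficient from the full regression in Step 0:)
coef(m0)["educ"]   # b1 from the multiple regression
coef(m3)["res2"]   # b1 from the Frisch-Waugh double-residual regression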
Frisch-Waugh Result
We can show, after some algebraic manipulation, that:
b2 = [X2'M1X2]^(-1) [X2'M1y], where M1 is the residual maker based on X1.
• This is Frisch and Waugh’s famous result – the “double residual
regression.”
• How do we interpret this? A regression of residuals on residuals.
• “We get the same result whether we (1) detrend the other variables by
using the residuals from a regression of them on a constant and a time
trend and use the detrended data in the regression or (2) just include a
constant and a time trend in the regression and not detrend the data”
• “Detrend the data” means compute the residuals from the regressions of
the variables on a constant and a time trend.
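(A minimal matrix sketch, not in the original slides, of the double-residual formula using the wage1 variables from the earlier example; here X1 holds the constant and educ, X2 holds exper, so b2 is the exper coefficient. Which columns go into X1 versus X2 is just an illustrative choice.)
library(np)
data("wage1")
y <- wage1$lwage
X1 <- cbind(1, wage1$educ)                                    # constant and educ
X2 <- cbind(wage1$exper)                                      # exper
M1 <- diag(nrow(X1)) - X1 %*% solve(crossprod(X1)) %*% t(X1)  # residual maker based on X1
b2 <- solve(t(X2) %*% M1 %*% X2) %*% (t(X2) %*% M1 %*% y)     # [X2'M1X2]^(-1)[X2'M1y]
b2   # matches the exper coefficient from lm(lwage ~ educ + exper, data = wage1)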
Partial Correlation
• Let X and Y be the two variables we have found to be
correlated.
• Let r(X,Y) = simple correlation
• Introduce a third variable Z which may or may not mediate
the relationship between X and Y.
• Let r((X,Y)|Z) = partial correlation of X and Y, controlling for Z.
• Result: If r(X,Y) is relatively large, but r((X,Y)|Z) is much
smaller, we can conclude that Z is a mediating variable. Z may
explain, at least in part, the observed relationship between X
and Y.
Partial Correlation
• Study on language skills (=x) and
children’s toe size (=y).
– Finding: strong correlation between language
skills and size of the big toe.
• Changes in X may be a cause of changes in Y
(or vice versa).
• How about a third variable, Z (e.g., age)?
• Could Z be producing changes in both X and
Y?
Partial Correlation
§ Study results:
§ X = measure of language skills
§ Y = size of big toe
§ Z = child’s age
§ r(X,Y) = 0.40
§ r(X,Z) = 0.55, r(Y,Z) = 0.65
§ r((X,Y)|Z) = 0.07 (much smaller than 0.40)
→ Age explains the relationship between language skills and big toe size.
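(As a sanity check, not part of the slide, the usual first-order partial correlation formula reproduces the reported value from the three simple correlations:)
r_xy <- 0.40; r_xz <- 0.55; r_yz <- 0.65
(r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))   # about 0.07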
Partial Correlation
• Consider the Model:
log(wages) ~ b1educ + b2exper
Q: How much of the measured correlation
between wages and educ reflects a direct
relation between them, instead of the fact that
wages and education level tend to rise with age?
A: Partial Correlation Coefficient
Partial Correlation
Algorithm
• Step 1: y* = the residuals in a regression of
wages on a constant and age
• Step 2: z* = the residuals in a regression of
education on a constant and age
• Step 3: r*yz = the partial correlation of wages and education, controlling for age.
• Note: In R, you can use the “ppcor” package
r*yz² = (z*'y*)² / [(z*'z*)(y*'y*)]
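(A minimal residual-based sketch of Steps 1-3, not in the original slides; it uses the PSID1976 variables that appear in the R example on the next slide, so the variable names are taken from there.)
library(AER)
data("PSID1976", package = "AER")
y.star <- resid(lm(wage ~ age, data = PSID1976))          # Step 1: wage residuals, net of age
z.star <- resid(lm(education ~ age, data = PSID1976))     # Step 2: education residuals, net of age
r.yz <- sum(z.star * y.star) / sqrt(sum(z.star^2) * sum(y.star^2))   # Step 3: partial correlation r*yz
r.yz^2   # equals (z*'y*)^2 / [(z*'z*)(y*'y*)] from the formula above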
Partial Correlation
R Example
library(AER)
library(np)
library(ppcor)
data("PSID1976", package = "AER")
PSID1976$kids <- with(PSID1976, factor((youngkids + oldkids) > 0,
                      levels = c(FALSE, TRUE), labels = c("no", "yes")))
PSID1976$nwincome <- with(PSID1976, (fincome - hours * wage)/1000)
wage = PSID1976$wage
age = PSID1976$age
education = PSID1976$education
y.data <- data.frame(wage, age, education)
# Compute/Interpret 'simple correlation'
cor(y.data)
# Compute/Interpret 'partial correlation'
pcor(y.data)
# Partial correlation between "wage" and "education" given "age"
pcor.test(y.data$wage, y.data$education, y.data[, c("age")])
Measure of Fit
Theorem: Change in R2 When a Variable is Added to the Regression
• Let R2Xz = R2 from the regression of y on X and an additional variable z.
• Let R2X = R2 from the regression of y on X.
• Let r*yz = partial correlation between y and z, controlling for X.
R2Xz = R2X + (1 − R2X) r*yz²
Measure of Fit
Adjusted R2
• Let n = number of observations and K = number of parameters estimated. We define the adjusted R2 as:
adjusted R2 = 1 − [(n − 1)/(n − K)] (1 − R2)
• Theorem: Change in Adjusted R2 When a Variable is Added to the Regression: In a multiple regression, adjusted R2 will fall (rise) when the variable x is deleted from the regression if the square of the t-ratio associated with that variable is greater (less) than 1.
Adjusted R2
R Example
library(AER)
library(np)
library(ppcor)
data("PSID1976", package = "AER")
Model 1: fit1 = lm(wage ~ experience + education, data = PSID1976)
summary(fit1)
Q: Look at R2 and adjusted R2. What should happen to them if we add another variable to the model?
Model 2: fit2 = lm(wage ~ experience + education + youngkids, data = PSID1976)
summary(fit2)
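(A short sketch, not in the original slides, that computes adjusted R2 by hand from the formula above and checks the t-ratio rule for the added variable, using fit1 and fit2 from the example:)
n <- nobs(fit2)
K <- length(coef(fit2))                  # parameters estimated, including the intercept
R2 <- summary(fit2)$r.squared
1 - ((n - 1)/(n - K)) * (1 - R2)         # equals summary(fit2)$adj.r.squared
# Adjusted R2 rises when youngkids is added only if its squared t-ratio exceeds 1:
summary(fit2)$coefficients["youngkids", "t value"]^2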