Econometrics (M.Sc.): Exercises (with Solutions), Chapter 3
1. Problem
(a) Which assumptions are needed for the unbiasedness of the Ordinary Least Squares (OLS) estimator in the linear regression model? (Refer to Assumptions 1-4 in Chapter 3.)
(b) Which additional assumptions are needed to make OLS the best linear unbiased estimator?
(c) True or false: The phrase “linear” in part (b) refers to the fact that we are estimating a linear model.
Solution
(a) The following answer refers to Assumptions 1-4 from Chapter 3. Note, however, that unbiasedness can also be shown, for instance, in weakly stationary time series (i.e., non-i.i.d.) contexts.
First, Assumption 1 must hold, i.e., that the relationship between $Y_i$ and $X_i$ is given by
$$Y_i = \sum_{k=1}^{K} \beta_k X_{ik} + \varepsilon_i,$$
with $X_{i1} = 1$ for all $i$ and $(Y_i, X_i)$ being i.i.d. across $i = 1,\dots,n$. Then conditional unbiasedness of the Ordinary Least Squares (OLS) estimator $\hat\beta$ follows from the strict exogeneity assumption $E(\varepsilon \mid X) = 0$, which follows from our Assumption 2 (i.e., $E(\varepsilon_i \mid X_i) = 0$ for all $i$) together with Assumption 1 (i.e., $(Y_i, X_i)$ is i.i.d. across $i = 1,\dots,n$).
(b) The Gauss-Markov theorem states that, under certain assumptions, OLS is the best linear unbiased estimator. Here, “best” means that OLS has the smallest variance (i.e., the highest efficiency) among all linear unbiased estimators, provided the assumptions hold. The assumptions needed for the Gauss-Markov theorem are listed below; the simulation sketch after the list illustrates the result.
1. Assumption 1 (the data generating process must be well defined).
2. $E(\varepsilon \mid X) = 0$ (strict exogeneity, already required for unbiasedness; follows from Assumptions 1 and 2).
3. Spherical errors, i.e., $\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n$ (homoscedastic, uncorrelated error terms).
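The Gauss-Markov result can be illustrated numerically. The following is a minimal R sketch (not part of the exercise; the true coefficients 2 and 3, the sample size, and the choice of alternative estimator are illustrative assumptions): it compares the OLS slope with another linear unbiased slope estimator under homoscedastic, uncorrelated errors.

set.seed(1)
n <- 50
x <- sort(runif(n))                            # fixed regressor values
B <- 10000                                     # Monte Carlo replications
res <- replicate(B, {
  y <- 2 + 3 * x + rnorm(n)                    # homoscedastic, uncorrelated errors
  slope_ols <- coef(lm(y ~ x))[2]              # OLS slope estimator
  slope_alt <- (y[n] - y[1]) / (x[n] - x[1])   # another linear unbiased estimator
  c(slope_ols, slope_alt)
})
rowMeans(res)       # both means are close to the true slope 3 (unbiasedness)
apply(res, 1, var)  # the OLS slope has the smaller Monte Carlo variance

Both estimators are unbiased, but the OLS slope exhibits the clearly smaller variance, as the theorem predicts.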
(c) This statement is false. The phrase ‘linear’ here refers to the fact that the OLS estimator is a linear function of $Y$, since we can write $\hat\beta = CY$, where $C$ is the $(K \times n)$ matrix $C = (X'X)^{-1}X'$. By contrast, the model $E(Y \mid X) = X\beta$ is called linear because the mean of $Y \mid X$ is a linear function of the unknown parameters.
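As a quick check of this linearity claim, here is a minimal R sketch (the simulated data and coefficient values are purely illustrative): computing $\hat\beta = CY$ by hand reproduces the lm() coefficients.

set.seed(1)
n <- 30
X <- cbind(1, rnorm(n), rnorm(n))          # design matrix including an intercept column
Y <- drop(X %*% c(1, 2, -1) + rnorm(n))    # illustrative data
C <- solve(t(X) %*% X) %*% t(X)            # the (K x n) matrix C = (X'X)^{-1} X'
cbind(C %*% Y, coef(lm(Y ~ X - 1)))        # identical columns: beta_hat = C Y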
2. Problem
(a) Explain why the elasticity $El_x f(x)$ of a deterministic and differentiable function $f$, with $y = f(x)$, can be interpreted as the approximate percentage change in $y$ per 1% change in $x$. Moreover, show that the elasticity with respect to $x$ can be written as
$$El_x f(x) = \frac{\partial \log(f(x))}{\partial \log(x)}.$$
(b) Consider the following log-log regression model (our regularity Assumptions 1-4 are assumed to be true):
$$\log(Y_i) = \gamma_1 + \gamma_2 \log(X_i) + \varepsilon_i, \quad i = 1,\dots,n. \tag{1}$$
The parameter γ2 in this regression model is often interpreted as the elasticity of f(x) = E(Y |X = x) with respect to x, i.e. as the approximate percentage change in E(Y |X = x) per 1% change in x. When is this true?
Solution
(a) In the deterministic case, $y = f(x)$, the elasticity of the differentiable function $f(x)$ with respect to $x$ is defined as
$$El_x f(x) = \frac{x}{f(x)} \cdot \frac{\partial f(x)}{\partial x} = \frac{\partial f(x)/f(x)}{\partial x/x} \quad (\text{only if } f(x) \neq 0)$$
$$\approx \frac{\frac{f(x+\Delta)-f(x)}{f(x)} \cdot 100\%}{\frac{(x+\Delta)-x}{x} \cdot 100\%} = \frac{\% \text{ change in } f(x)}{\% \text{ change in } x} \quad (\text{for small } \Delta > 0).$$
Thus, the elasticity $El_x f(x)$ can be interpreted as the approximate percentage change of $f(x)$ per 1% change of $x$. The following alternative expression for the elasticity $El_x f(x)$ is often useful (remember $\partial \log(x)/\partial x = 1/x$):
$$El_x f(x) = x \cdot \frac{1}{f(x)} \cdot \frac{\partial f(x)}{\partial x} \quad (\text{for } f(x) \neq 0)$$
$$= \frac{\dfrac{\partial f(x)}{\partial x} \cdot \dfrac{1}{f(x)}}{\partial \log(x)/\partial x} = \frac{\partial \log(f(x))/\partial x}{\partial \log(x)/\partial x} = \frac{\partial \log(f(x))}{\partial \log(x)} \quad (\text{only if } f(x) > 0 \text{ and } x > 0)$$
(to reverse the last step use that $\frac{\partial \log(f(x))}{\partial \log(x)} = \frac{\partial \log(f(x))/\partial x}{\partial \log(x)/\partial x}$).
Note. In a stochastic setting we cannot assume that $y = f(x)$ for some known function $f$, because there are always unobserved factors (stochastic errors) affecting the outcome variable. Nevertheless, we can consider the partial elasticity $El_x f(x)$ with respect to $x$ for the conditional mean function $f(x) = E(Y \mid X = x)$.
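For intuition, the following small R sketch (the function $f(x) = 2x^{1.5}$ and the evaluation point are hypothetical choices) verifies both elasticity expressions numerically; a power function $c\,x^{\gamma}$ has constant elasticity $\gamma$:

f  <- function(x) 2 * x^1.5  # illustrative function with constant elasticity 1.5
x0 <- 3                      # evaluation point (arbitrary choice)
d  <- 1e-6                   # step size for the numerical derivatives
(x0 / f(x0)) * (f(x0 + d) - f(x0)) / d                    # x/f(x) * df/dx
(log(f(x0 + d)) - log(f(x0))) / (log(x0 + d) - log(x0))   # dlog(f)/dlog(x)
## both return approximately 1.5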
(b) We are asked to show when $\gamma_2$ equals the elasticity of $f(x) = E(Y \mid X = x)$. First, observe that
$$\log(Y) = \gamma_1 + \gamma_2 \log(X) + \varepsilon \quad \Leftrightarrow \quad Y = \exp(\gamma_1 + \gamma_2 \log(X) + \varepsilon) = \exp(\gamma_1 + \gamma_2 \log(X)) \exp(\varepsilon),$$
where we skip the index $i$ since the same result applies anyway to each $i = 1,\dots,n$.
We can now use this equivalent expression for model (1), i.e. $Y = \exp(\gamma_1 + \gamma_2 \log(X)) \exp(\varepsilon)$, and the above alternative elasticity expression, i.e. $El_x f(x) = \partial \log(f(x))/\partial \log(x)$, to do the following derivations:
$$\begin{aligned}
El_x f(x) &= El_x E[Y \mid X = x] = \frac{\partial \log E[Y \mid X = x]}{\partial \log(x)} \\
&= \frac{\partial \log E[\exp(\gamma_1 + \gamma_2 \log(X)) \exp(\varepsilon) \mid X = x]}{\partial \log(x)} \quad (\text{using } Y = \exp(\gamma_1 + \gamma_2 \log(X)) \exp(\varepsilon)) \\
&= \frac{\partial \left( \log\{\exp(\gamma_1 + \gamma_2 \log(x))\} + \log\{E[\exp(\varepsilon) \mid X = x]\} \right)}{\partial \log(x)} \\
&= \frac{\partial \left( \gamma_1 + \gamma_2 \log(x) + \log\{E[\exp(\varepsilon) \mid X = x]\} \right)}{\partial \log(x)} \\
&= \gamma_2 + \frac{\partial \log\{E[\exp(\varepsilon) \mid X = x]\}}{\partial \log(x)}.
\end{aligned}$$
So, $\gamma_2$ equals the elasticity of the conditional mean function $f(x) = E(Y \mid X = x)$ if $\frac{\partial \log\{E[\exp(\varepsilon) \mid X = x]\}}{\partial \log(x)} = 0$. This is the case, for instance, if the error term $\varepsilon$ is independent of $X$, since then $E[\exp(\varepsilon) \mid X = x] = c$ for some constant $c > 0$ such that
$$\frac{\partial \log\{c\}}{\partial \log(x)} = \frac{\partial \log\{c\}/\partial x}{\partial \log(x)/\partial x} = 0.$$
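A minimal simulation sketch of this case (the values $\gamma_1 = 1$, $\gamma_2 = 0.5$ and the error scale are illustrative assumptions): when $\varepsilon$ is independent of $X$, $E[\exp(\varepsilon) \mid X = x]$ is constant, and the OLS slope in the log-log regression recovers $\gamma_2$, which here equals the elasticity of $E(Y \mid X = x)$.

set.seed(1)
n   <- 100000
X   <- exp(rnorm(n))                # positive regressor
eps <- rnorm(n, sd = 0.8)           # error term independent of X
Y   <- exp(1 + 0.5 * log(X) + eps)  # log-log model with gamma_1 = 1, gamma_2 = 0.5
coef(lm(log(Y) ~ log(X)))[2]        # approximately 0.5, the elasticity of E[Y|X=x]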
3. Problem
Install the R package AER and load it. The AER package contains the data set Journals. Check ?Journals to learn more about the data. Create the variables citeprice (journal price per citation) and age (journal age) as follows:
# install.packages("AER")
suppressMessages(library("AER"))
## Attach the data set Journals to the current R session
data("Journals", package = "AER")
## ?Journals # Check the help file
##
## Select variables "subs" and "price"
journals <- Journals[, c("subs", "price")]
## Define variable 'journal price per citation'
journals$citeprice <- Journals$price/Journals$citations
## Define variable 'journal age'
journals$age <- 2020 - Journals$foundingyear
## Check variable names in 'journals'
names(journals)
## [1] "subs"      "price"     "citeprice" "age"
Estimate the coefficients $\beta_1$ and $\beta_2$ of the following linear regression model:
$$\log(Y_i) = \beta_1 + \beta_2 \log(X_i) + \varepsilon_i, \quad i = 1,\dots,n,$$
with $\log(Y) = \log(\text{subs})$ (i.e., the logarithm of the number of library subscriptions) and $\log(X) = \log(\text{citeprice})$ (i.e., the logarithm of the journal price per citation).
(a) Do you have heteroscedastic error-term variances? Explain your answer by discussing
a diagnostic plot showing the residuals against the fitted values.
(b) Estimate the standard error of the OLS estimator $\hat\beta_2$ using an appropriate variance estimator.
Solution
The following code computes the OLS estimation and shows a typical diagnostic plot for checking heteroscedasticity in the residuals:
jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)
## Diagnostic plot residuals against fitted values
## plot(y=resid(jour_lm), x=fitted(jour_lm))
## Or slightly more fancy
plot(jour_lm, which=1)
[Figure: “Residuals vs Fitted” diagnostic plot for lm(log(subs) ~ log(citeprice)); residuals (roughly −3 to 2) plotted against fitted values (roughly 3 to 7), with the journals IO, BoIES, and MEPiTE flagged as outliers.]
(a) The error terms seem to have heteroscedastic variances. This can be seen, for instance, by plotting the residuals $\hat\varepsilon_i$ against the fitted values $\hat Y_i = \hat\beta_1 + \hat\beta_2 X_i$.
Note. One usually plots the residuals $\hat\varepsilon_i$ against the fitted values $\hat Y_i$ (not against the explanatory variable $X_i$), since this also works in the case of multiple regressors ($K > 2$).
(b) In the case of heteroscedastic error-term variances, we need to consider a robust, heteroscedasticity-consistent variance estimator such as the following one:
$$\underbrace{\widehat{\operatorname{Var}}_{HC3}(\hat\beta)}_{(K \times K) = (2 \times 2)} = (X'X)^{-1} X' \underbrace{\widehat{\operatorname{Var}}(\varepsilon \mid X)}_{(n \times n)} X (X'X)^{-1}$$
with $\widehat{\operatorname{Var}}(\varepsilon \mid X) = \operatorname{diag}(\hat v_1, \dots, \hat v_n)$, where $\hat v_i = \hat\varepsilon_i^2/(1 - h_i)^2$, $i = 1,\dots,n$, and where $h_i = [P_X]_{ii}$ is the leverage statistic of $X_i$. Here is the estimation result using R:
library("sandwich") # HC robust variance estimation
## Robust estimation of the variance of \hat\beta:
Var_hat_beta_HC3 <- sandwich::vcovHC(jour_lm, type="HC3")
## Robust standard error of \hat\beta_2
sqrt(diag(Var_hat_beta_HC3)[2])
## log(citeprice)
## 0.03447364
## Comparison with the classic standard error estimation
sqrt(diag(vcov(jour_lm))[2])
## log(citeprice)
## 0.0356132
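To connect this output with the HC3 formula above, the following sketch recomputes the estimator by hand from the fitted model (only base-R functions; the object names are arbitrary):

## Manual HC3 computation, following the formula above:
X_mat <- model.matrix(jour_lm)            # n x K design matrix
h     <- hatvalues(jour_lm)               # leverages h_i = [P_X]_ii
v_hat <- resid(jour_lm)^2 / (1 - h)^2     # HC3 weights v_i
XtXi  <- solve(crossprod(X_mat))          # (X'X)^{-1}
V_HC3 <- XtXi %*% t(X_mat) %*% diag(v_hat) %*% X_mat %*% XtXi
sqrt(diag(V_HC3)[2])                      # matches vcovHC(jour_lm, type="HC3")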
4. Problem
Consider the following multiple linear regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i, \quad i = 1,\dots,n \quad (\text{in matrix notation: } Y = X\beta + \varepsilon),$$
where $\beta = (1, -5, 5)'$, $\varepsilon_i$ is a heteroscedastic error term,
$$\varepsilon_i \sim N(0, \sigma_i^2) \quad \text{with} \quad \sigma_i = |X_{3i}|,$$
and where for all $i = 1,\dots,n = 100$:
• $X_{2i} \sim N(10, 1.5^2)$
• $X_{3i} \sim U[0.2, 8]$
You’re given the following data generated from this regression model:
set.seed(109) # Sets the "seed" of the random number generators
n <- 100      # Number of observations
## Generate two explanatory variables plus an intercept-variable:
X_1 <- rep(1, n)                    # Intercept
X_2 <- rnorm(n, mean=10, sd=1.5)    # Draw realizations from a normal distribution
X_3 <- runif(n, min = 0.2, max = 8) # Draw realizations from a uniform distribution
X <- cbind(X_1, X_2, X_3)           # Save as a n x 3-dimensional data matrix
beta <- c(1, -5, 5)
## Generate realizations from the heteroscedastic error term
eps <- rnorm(n, mean=0, sd=abs(X_3))
## Dependent variable:
Y <- X %*% beta + eps
(a) Compute the theoretical variance of the OLS estimator $\hat\beta$ for the given data generating process and the given data.
(b) Use a Monte Carlo simulation to generate 10000 variance estimates
$$\widehat{\operatorname{Var}}_{HC3,1}(\hat\beta_2 \mid X), \dots, \widehat{\operatorname{Var}}_{HC3,10000}(\hat\beta_2 \mid X)$$
and 10000 variance estimates
$$\widehat{\operatorname{Var}}_{HC3,1}(\hat\beta_3 \mid X), \dots, \widehat{\operatorname{Var}}_{HC3,10000}(\hat\beta_3 \mid X).$$
These estimates represent typical estimation results. (Of course, in practice you observe only one estimation result, $\widehat{\operatorname{Var}}_{HC3}(\hat\beta_2 \mid X)$, for $\operatorname{Var}(\hat\beta_2 \mid X)$ and one other, $\widehat{\operatorname{Var}}_{HC3}(\hat\beta_3 \mid X)$, for $\operatorname{Var}(\hat\beta_3 \mid X)$. We can only do such a Monte Carlo simulation study when we know everything about the data-generating process, including the usually unknown quantities such as the value of $\beta$.)
Do the Monte Carlo realizations $\widehat{\operatorname{Var}}_{HC3,r}(\hat\beta_2 \mid X)$ and $\widehat{\operatorname{Var}}_{HC3,r}(\hat\beta_3 \mid X)$, $r = 1,\dots,10000$, estimate the true variances $\operatorname{Var}(\hat\beta_2 \mid X)$ and $\operatorname{Var}(\hat\beta_3 \mid X)$ well on average? Are there large estimation uncertainties? Visualize your results and describe your findings.
Solution
(a) The theoretical variance of the OLS estimator $\hat\beta$ is given by
$$\operatorname{Var}(\hat\beta \mid X) = (X'X)^{-1} X' \operatorname{Var}(\varepsilon \mid X) X (X'X)^{-1},$$
where $\operatorname{Var}(\varepsilon \mid X) = \operatorname{diag}(X_{31}^2, \dots, X_{3n}^2)$. To compute the values of the variance-covariance matrix $\operatorname{Var}(\hat\beta \mid X)$ we can use R as follows:
Var_theo <- solve(t(X) %*% X) %*% t(X) %*% diag(X_3^2) %*%
X %*% solve(t(X) %*% X)
rownames(Var_theo) <- c("", "", "") # remove row-names
colnames(Var_theo) <- c("", "", "") # remove col-names
Var_theo
##
## 7.22896936 -0.683424680 -0.072931570
## -0.68342468 0.069232031 -0.007624144
## -0.07293157 -0.007624144 0.057342462
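As an optional sanity check (a sketch, not part of the exercise; the number of replications B is an arbitrary choice), one can simulate the OLS estimator itself and compare its empirical variances with diag(Var_theo):

B <- 10000                                    # Monte Carlo replications
beta_hats <- replicate(B, {
  eps_r <- rnorm(n, mean = 0, sd = abs(X_3))  # new heteroscedastic errors
  Y_r   <- X %*% beta + eps_r                 # new dependent variable
  drop(qr.solve(X, Y_r))                      # OLS estimates (least squares via QR)
})
apply(beta_hats, 1, var)                      # approximately diag(Var_theo)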
(b) R code for the Monte Carlo simulation:
library("sandwich") # HC robust variance estimation
MC_reps <- 10000 # Number of Monte Carlo replications
VarHC3_estims <- matrix(NA, 3, MC_reps) # Container to collect the results
Econometrics (M.Sc.) 7
for(r in 1:MC_reps){
## Generate new realizations from the heteroscedastic error term
eps <- rnorm(n, mean=0, sd=abs(X_3))
## Generate new realizations from the dependent variable:
Y <- X %*% beta + eps
## Now OLS estimation
lm_fit <- lm(Y ~ X - 1) # OLS fit; X already contains the intercept column
## Now robust estimation of the variance of \hat\beta:
VarHC3_estims[,r] <- diag(sandwich::vcovHC(lm_fit, type="HC3"))
}
VarHC3_estims_means <- rowMeans(VarHC3_estims)
## Compare the theoretical variances Var(\hat\beta_2) and Var(\hat\beta_3)
## with the means of the 10000 variance estimations
## \hat{Var}(\hat\beta_2) and \hat{Var}(\hat\beta_3)
cbind(diag(Var_theo)[c(2,3)], VarHC3_estims_means[c(2,3)])
## [,1] [,2]
## 0.06923203 0.07232364
## 0.05734246 0.05935843
plot(x=c(1,2), y=c(0,0), ylim=range(VarHC3_estims[c(2,3),]), type="n", axes = FALSE, xlab = "", ylab = "")
box()
axis(1, c(1,2), labels=c(2,3))
axis(2)
points(x=rep(1,MC_reps), y=VarHC3_estims[2,], pch=21, col=gray(.5,.25), bg=gray(.5,.25))
points(x=1, y=VarHC3_estims_means[2], pch=21, col="black", bg="black")
points(x=1, y=diag(Var_theo)[2], pch=1)
points(x=rep(2,MC_reps), y=VarHC3_estims[3,], pch=21, col=gray(.5,.25), bg=gray(.5,.25))
points(x=2, y=VarHC3_estims_means[3], pch=21, col="black", bg="black")
points(x=2, y=diag(Var_theo)[3], pch=1)
legend("top",
legend = c("10000 Variance estimates", "Mean of variance estimates",
"True Variance"), bty = "n", pt.bg = c(gray(.5,.75),"black","black"),
pch = c(21,21,1), col=c(gray(.5,.75),"black","black"))
[Figure: Monte Carlo results. For each coefficient (x-axis: 2 and 3), the 10000 HC3 variance estimates (gray points, y-axis roughly 0.05 to 0.20), their mean (filled black point), and the true variance (open circle).]
On average, the 10000 estimates $\widehat{\operatorname{Var}}_{HC3}(\hat\beta_2 \mid X)$ and $\widehat{\operatorname{Var}}_{HC3}(\hat\beta_3 \mid X)$ approximate the true variances $\operatorname{Var}(\hat\beta_2 \mid X) = 0.069$ and $\operatorname{Var}(\hat\beta_3 \mid X) = 0.057$ well, but there are considerable estimation uncertainties (large variances of the estimators), with estimates ranging from 0.025 to 0.211 and from 0.017 to 0.157, respectively. The variances of the estimators $\widehat{\operatorname{Var}}_{HC3}(\hat\beta_2 \mid X)$ and $\widehat{\operatorname{Var}}_{HC3}(\hat\beta_3 \mid X)$ decline for larger sample sizes $n > 100$.