ECONOMETRICS I ECON GR5411
Lecture 6 – Linear Regression Model II
by Seyhan Erden
Columbia University
MA in Economics
Notation in Greene, Erden, Hansen, and other textbooks:
➢ Population parameters: $\beta$, $\varepsilon$ (Greene uses $\beta$, $\varepsilon$ as well). Any Greek or Latin letter, upper or lower case, without a hat is a population parameter.
$\varepsilon = y - X\beta, \qquad \varepsilon_i = y_i - x_i'\beta$
➢ Sample estimates: $\hat\beta$, $\hat e$ (Greene uses $b$, $e$). Any Greek letter with a hat, or any Latin letter, upper or lower case, belongs to a sample.
$\hat e = y - X\hat\beta, \qquad \hat e_i = y_i - x_i'\hat\beta \qquad (e_i = y_i - x_i'b \text{ in Greene})$
➢ Population regression function, matrix and vector notation:
$E[y|X] = X\beta, \qquad E[y_i|x_i] = x_i'\beta$
➢ Sample estimate, matrix and vector notation:
$\hat y = X\hat\beta, \qquad \hat y_i = x_i'\hat\beta$
Vector and Matrix Notation Match:
Simple regression model:
Vector version:
$y_i = x_i'\beta + \varepsilon_i$
Matrix version:
$y = X\beta + \varepsilon$
Matching:
$\sum_{i=1}^{n} x_i x_i' = X'X$
$\sum_{i=1}^{n} x_i y_i = X'y$
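These identities are easy to verify numerically (a minimal sketch; the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))      # n observations on k regressors
y = rng.normal(size=n)

# Sum of outer products x_i x_i' equals X'X
sum_xx = sum(np.outer(X[i], X[i]) for i in range(n))
assert np.allclose(sum_xx, X.T @ X)

# Sum of x_i y_i equals X'y
sum_xy = sum(X[i] * y[i] for i in range(n))
assert np.allclose(sum_xy, X.T @ y)
```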
Least Squares Regression:
Since
$(X'X)^{-1}X'y = (X'X/n)^{-1}(X'y/n),$
the OLS estimator can be written as
$\hat\beta = s_{xx}^{-1}s_{xy}$
where the $k\times k$ matrix $s_{xx}$ is the sample average of $x_i x_i'$,
$s_{xx} = \frac{1}{n}X'X = \frac{1}{n}\sum_{i=1}^{n} x_i x_i',$
and the $k\times 1$ vector $s_{xy}$ is the sample average of $x_i y_i$,
$s_{xy} = \frac{1}{n}X'y = \frac{1}{n}\sum_{i=1}^{n} x_i y_i.$
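A minimal computational sketch of this formula (simulated data; `beta_hat` is formed from the sample averages and checked against a library solver):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5, 2.0])           # population parameter
y = X @ beta + rng.normal(size=n)

s_xx = X.T @ X / n                          # k x k sample average of x_i x_i'
s_xy = X.T @ y / n                          # k x 1 sample average of x_i y_i
beta_hat = np.linalg.solve(s_xx, s_xy)      # equals (X'X)^{-1} X'y

assert np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0])
```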
Assumptions
➢ A1 – Linearity: $\partial y/\partial x_k$ does not depend on $x_k$.
➢ A2 – $X$ has full rank: $X$ has full column rank; the regressors are linearly independent.
➢ A3 – Exogeneity of regressors: $E[\varepsilon|X] = 0$.
➢ A4 – Spherical errors: homoskedasticity and no serial correlation.
➢ A5 – $x_i$ can be fixed or random.
➢ A6 – Normal distribution: the disturbances $\varepsilon_i$ are normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$, $\varepsilon|X \sim N(0, \sigma^2 I)$.
A1: Is linearity restrictive?
➢ $y = Ax^\beta e^\varepsilon$ implies
$\ln y = \ln A + \beta \ln x + \varepsilon$
This is known as the constant elasticity form. The elasticity of $y$ with respect to changes in $x_k$ is
$\frac{\partial \ln y}{\partial \ln x_k} = \beta_k$
where $x_k$ is the $k$th column of the $X$ matrix.
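Why the exponent is an elasticity, in one line (a short derivation sketch from the log-log form above):

```latex
\frac{\partial \ln y}{\partial \ln x}
  = \frac{\partial y / y}{\partial x / x}
  = \beta
```

so $\beta$ is the percentage change in $y$ associated with a one-percent change in $x$.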
The linearity assumption (A1) can be written compactly as
$\underset{(n\times 1)}{y} = \underset{(n\times k)}{X}\;\underset{(k\times 1)}{\beta} + \underset{(n\times 1)}{\varepsilon}$
Is linearity restrictive?
➢ Semi-log model for growth rates:
$\log y_t = x_t'\beta + \delta t + \varepsilon_t$
In this model the autonomous growth rate (the growth rate over time that is not explained by the model) is
$\frac{\partial \log y_t}{\partial t} = \delta$
Is linearity restrictive?
➢ Other variations of the general form
$f(y_t) = g(x_t'\beta + \varepsilon_t)$
also fit the definition of the linear model.
A2: X has Full Column Rank
(Identification condition)
(No Perfect Multicollinearity)
Assumption:
$X$ is an $n\times k$ matrix with rank $k$ ($X$ has full column rank).
The columns of $X$ are linearly independent. This assumption is known as the identification condition: none of the $k$ columns of the data matrix $X$ can be expressed as a linear combination of the other columns of $X$.
Identification condition:
Example: $y = X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + X_4\beta_4 + \epsilon$
➢ Identification problem when $X_4 = X_2 + X_3$.
➢ To see this,
$y = X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + (X_2 + X_3)\beta_4 + \epsilon$
$\phantom{y} = X_1\beta_1 + X_2(\beta_2 + \beta_4) + X_3(\beta_3 + \beta_4) + \epsilon$
We can only identify $\beta_2 + \beta_4$ and $\beta_3 + \beta_4$; we cannot identify each parameter separately.
➢ If $X$ does not have full rank, $X'X$ is not invertible.
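A minimal numerical illustration of this rank failure (simulated data; the fourth regressor is constructed as $X_2 + X_3$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = np.ones(n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = x2 + x3                     # exact linear combination: rank deficiency
X = np.column_stack([x1, x2, x3, x4])

print(np.linalg.matrix_rank(X))          # 3, not 4: X lacks full column rank
print(np.linalg.matrix_rank(X.T @ X))    # X'X is singular, hence not invertible
```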
A3: Exogeneity.
Conditional mean restriction:
$E[\varepsilon|X] = \begin{pmatrix} E[\varepsilon_1|X] \\ E[\varepsilon_2|X] \\ \vdots \\ E[\varepsilon_n|X] \end{pmatrix} = \begin{pmatrix} E[\varepsilon_1|x_1, x_2, \ldots, x_k] \\ E[\varepsilon_2|x_1, x_2, \ldots, x_k] \\ \vdots \\ E[\varepsilon_n|x_1, x_2, \ldots, x_k] \end{pmatrix} = 0$
Implications:
➢ The unconditional mean of $\varepsilon$ is zero:
$E[\varepsilon_i] = E\big[E[\varepsilon_i|X]\big] = E[0] = 0$
➢ $E[y|X] = X\beta$
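The second implication follows from A1 and A3 in one line (a short derivation sketch):

```latex
E[y \mid X] = E[X\beta + \varepsilon \mid X]
            = X\beta + E[\varepsilon \mid X]
            = X\beta
```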
Proof of Exogeneity.
The proof is a good illustration of the use of properties of conditional expectations.
Proof: Since $x_{jk}$ is an element of $X$, strict exogeneity implies
$E[\varepsilon_i|x_{jk}] = E\big[E[\varepsilon_i|X]\,\big|\,x_{jk}\big] = 0$
by the Law of Iterated Expectations from probability theory.
A4: Spherical errors (Homoskedasticity): $E[\varepsilon\varepsilon'|X] = \sigma^2 I$
$E[\varepsilon\varepsilon'|X] = \begin{pmatrix} E(\varepsilon_1\varepsilon_1|X) & E(\varepsilon_1\varepsilon_2|X) & \cdots & E(\varepsilon_1\varepsilon_n|X) \\ E(\varepsilon_2\varepsilon_1|X) & E(\varepsilon_2\varepsilon_2|X) & \cdots & E(\varepsilon_2\varepsilon_n|X) \\ \vdots & \vdots & \ddots & \vdots \\ E(\varepsilon_n\varepsilon_1|X) & E(\varepsilon_n\varepsilon_2|X) & \cdots & E(\varepsilon_n\varepsilon_n|X) \end{pmatrix} = \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I$
➢ Homoskedasticity:
$var(\varepsilon_i|X) = E[\varepsilon_i^2|X] = \sigma^2 > 0$ for all $i = 1, \ldots, n$
➢ No serial correlation (no correlation between observations):
$cov(\varepsilon_i, \varepsilon_j|X) = E(\varepsilon_i\varepsilon_j|X) = 0$ for all $i \neq j$
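A Monte Carlo sketch of this covariance structure (the normal draws, seed, and dimensions are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 4, 2.0, 200_000
# Draw many n x 1 spherical error vectors and average the outer products
eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
Ehat = eps.T @ eps / reps        # Monte Carlo estimate of E[epsilon epsilon']

print(np.round(Ehat, 2))         # approximately sigma^2 * I: 2's on the
                                 # diagonal, 0's off the diagonal
```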
A5: Data generating process (data are i.i.d.)
In experimental data, $x_i$ is nonstochastic. In economics, especially in macroeconomics, we hardly ever find experimental data; macroeconomic data are almost always observational.
We will assume that $X$ can be a mixture of constants and random variables, and (from A3 and A4) that the mean and variance of $\varepsilon$ are both independent of all elements of $X$.
A6: Normality
It is convenient to assume that the disturbances are normally distributed with zero mean and constant variance. That is,
$\varepsilon|X \sim N(0, \sigma^2 I)$
The normality assumption is not necessary for many of the results in multiple regression, but it is useful for constructing confidence intervals and test statistics for hypothesis testing.
Later this assumption will be relaxed.
Projection Matrix: $P$
The projection matrix is
$P = X(X'X)^{-1}X'$
Observe that
$PX = X(X'X)^{-1}X'X = X$
For any $Z = X\Gamma$, with $\Gamma$ an arbitrary matrix ($Z$ lies in the range space of $X$),
$PZ = PX\Gamma = X(X'X)^{-1}X'X\Gamma = X\Gamma = Z$
As an important example, if we partition $X$ into two matrices $X_1$ and $X_2$, so that $X = [X_1 \;\; X_2]$, then
$PX_1 = X_1$
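A numerical sketch of these projection facts (simulated $X$; `Gamma` is an arbitrary illustrative matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 20, 3
X = rng.normal(size=(n, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T         # P = X (X'X)^{-1} X'

assert np.allclose(P @ X, X)                 # PX = X

Gamma = rng.normal(size=(k, 2))
Z = X @ Gamma                                # Z lies in the range space of X
assert np.allclose(P @ Z, Z)                 # PZ = Z

assert np.allclose(P @ X[:, :2], X[:, :2])   # P X1 = X1 for a partition of X
```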
Projection Matrix: 𝑃
A special case arises when $X = i$, the $n\times 1$ vector of ones. Then
$P_i = i(i'i)^{-1}i' = \frac{1}{n}ii'$
Observe that
$P_i y = \frac{1}{n}ii'y = i\bar y = \begin{pmatrix} \bar y \\ \bar y \\ \vdots \\ \bar y \end{pmatrix}$
This is an $n\times 1$ vector of $\bar y$'s.
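A quick check of this averaging property (a sketch with simulated $y$):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
y = rng.normal(size=n)
i = np.ones((n, 1))                  # n x 1 vector of ones
P_i = i @ i.T / n                    # P_i = (1/n) i i'

# P_i y is an n x 1 vector whose entries are all y-bar
assert np.allclose(P_i @ y, np.full(n, y.mean()))
```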
Properties of projection matrix:
Let $X$ be an $n\times k$ matrix with $n \geq k$.
1. The projection matrix is symmetric: $P = P'$, since
$\big(X(X'X)^{-1}X'\big)' = X(X'X)^{-1}X'$
2. The projection matrix is idempotent: $PP = P$, since
$X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X'$
3. The trace of the projection matrix equals the number of columns of $X$, $k$: $tr(P) = k$, since
$tr\big(X(X'X)^{-1}X'\big) = tr\big((X'X)^{-1}X'X\big) = tr(I_k) = k$
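Properties 1–3 can be verified numerically (a sketch with simulated $X$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 15, 4
X = rng.normal(size=(n, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(P, P.T)               # 1. symmetric
assert np.allclose(P @ P, P)             # 2. idempotent
assert np.isclose(np.trace(P), k)        # 3. trace equals k
```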
Properties of projection matrix:
4. The eigenvalues of $P$ are 1 and 0: there are $k$ eigenvalues equal to 1 and $n - k$ equal to 0.
Proof: Since $P$ is symmetric, we can write $P = H\Lambda H'$, where $H$ is orthonormal (orthogonal with unit-length columns) and $\Lambda$ contains the eigenvalues. Since $P$ is idempotent,
$PP = H\Lambda H'H\Lambda H' = H\Lambda^2 H'$
Then it must be true that $\Lambda = \Lambda^2$, so the eigenvalues satisfy $\lambda_i = \lambda_i^2$ for $i = 1, \ldots, n$. Thus each $\lambda_i$ must be 0 or 1.
5. The rank of $P$ is $k$ (the number of nonzero eigenvalues).
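A numerical check of properties 4 and 5 (a sketch; `eigvalsh` is used because $P$ is symmetric):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 10, 3
X = rng.normal(size=(n, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T

eigvals = np.linalg.eigvalsh(P)          # eigenvalues of the symmetric matrix P
print(np.round(eigvals, 10))             # k ones and n - k zeros
assert np.isclose(eigvals.sum(), k)      # rank(P) = k: sum of the 0/1 eigenvalues
```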
Projection:
Note that since $\hat y = X\hat\beta = X(X'X)^{-1}X'y$,
$\hat y = Py$
$P$ is the matrix formed from $X$ such that when the vector $y$ is premultiplied by $P$, the result is the vector of fitted values from the regression of $y$ on $X$ (the result is $\hat y$).
Orthogonal Projection:
Define
$M = I - P = I - X(X'X)^{-1}X'$
Then
$MX = (I - P)X = X - X(X'X)^{-1}X'X = 0$
Thus $M$ and $X$ are orthogonal.
$M$ is called the orthogonal projection (annihilator) matrix, due to the property that for any matrix $Z$ in the range space of $X$,
$MZ = (I - P)Z = Z - Z = 0$
$M$ is also called the residual maker matrix, due to the property $\hat e = My$ (we will show this soon...)
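A final numerical sketch (simulated data; the residuals computed as $My$ match the OLS residuals $y - X\hat\beta$):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 30, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P                        # M = I - P

assert np.allclose(M @ X, 0)             # MX = 0: M annihilates X
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - X @ beta_hat                 # OLS residuals
assert np.allclose(M @ y, e_hat)         # My equals the residual vector
```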