ECONOMETRICS I ECON GR5411
Lecture 4 – Matrix Differentiation. Start of Linear Regression
by
Seyhan Erden, Columbia University
MA in Economics
The Regression Framework
Consider a simple regression:
$$\text{Test Scores} = \alpha + \beta\,(\text{Student-Teacher Ratio}) + \varepsilon$$
The unknown parameters of the stochastic relationship $y_i = x_i'\beta + \varepsilon_i$:
Population quantities are $\beta$ and $\varepsilon_i$.
Sample estimates of them will be $b$ (or $\hat\beta$) and $e_i$ (or $\hat\varepsilon_i$).
The population regression is
$$E[y_i \mid x_i] = x_i'\beta.$$
The sample estimate of $E[y_i \mid x_i]$ is denoted by
$$\hat y_i = x_i'b \quad \text{or} \quad \hat y_i = x_i'\hat\beta.$$
Matrix Notation:
Model:
$$y_i = x_i'\beta + \varepsilon_i$$
In matrix form:
$$y = X\beta + \varepsilon$$
where
$y$ is an $n\times 1$ vector of dependent variables (or regressands).
$X$ is an $n\times k$ matrix of independent variables (also called regressors, covariates, or predictors).
$\beta$ is a $k\times 1$ vector of parameters.
$\varepsilon$ is an $n\times 1$ vector of disturbances (or error terms).
$E[y \mid X] = f(X;\beta)$ is the population regression function.
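To make the notation concrete, here is a minimal sketch in numpy (simulated data added for illustration; the sample size, regressors, and coefficient values are made up and not from the lecture) that stacks the $n$ observations of $y_i = x_i'\beta + \varepsilon_i$ into the matrix form $y = X\beta + \varepsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                      # n observations, k regressors (incl. intercept)

# Build X with a constant in the first column and two simulated regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # n x k
beta = np.array([1.0, 2.0, -0.5])                            # true parameter vector (length k)
eps = rng.normal(size=n)                                     # disturbances (length n)

y = X @ beta + eps                                           # regressand (length n)

print(X.shape, beta.shape, y.shape)   # (100, 3) (3,) (100,)
```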
Matrix Notation:
$y$ is an $n\times 1$ vector, $\beta$ is a $k\times 1$ vector, and $\varepsilon$ is an $n\times 1$ vector:
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
Matrix Notation:
$x_i$ is a $k\times 1$ vector:
$$x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ik} \end{pmatrix}$$
By definition of the vector inner product,
$$x_i'\beta = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$
Thus,
$$y_i = x_i'\beta + \varepsilon_i, \qquad i = 1,2,\ldots,n.$$
Also, define $X$ as an $n\times k$ matrix of regressors (or covariates):
$$X = \begin{pmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots &        & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nk}
\end{pmatrix}
= \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}$$
Matrix Notation:
The disturbance associated with the $i$th data point is
$$\varepsilon_i = y_i - x_i'\beta$$
and can be estimated by the residual
$$e_i = y_i - x_i'b.$$
Matrix Differentiation:
Let $y = f(x_1, x_2, \ldots, x_n)$, or compactly $y = f(X)$ with $X = (x_1, \ldots, x_n)'$.
Gradient: the vector of all partial derivatives,
$$\frac{\partial f}{\partial X} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$
Matrix Differentiation:
Similarly,
$$\frac{\partial f}{\partial X'} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$
Hessian (the matrix of second partial derivatives):
$$\frac{\partial^2 f}{\partial X\,\partial X'} = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$
Matrix Differentiation:
Linear form: $a'x$, where $a$ and $x$ are vectors; this is the inner product.
Quadratic form: $x'Ax$, where $x$ is a vector and $A$ is a square matrix.
Bilinear form: $y'Bz$, where $y$ and $z$ are vectors and $B$ is a matrix (not necessarily square).
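For concreteness, a small illustration added beyond the slide (with $n = 2$, $A = [a_{jk}]$, and $B = [b_{jk}]$):
$$a'x = a_1 x_1 + a_2 x_2, \qquad
x'Ax = a_{11}x_1^2 + (a_{12} + a_{21})x_1 x_2 + a_{22}x_2^2, \qquad
y'Bz = \sum_{j}\sum_{k} y_j b_{jk} z_k.$$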
Matrix Differentiation:
Differentiation:
(1) Linear form:
$$\frac{\partial (a'x)}{\partial x} = a$$
The result is a column vector because $x$ is a column vector.
$$\frac{\partial (a'x)}{\partial x'} = a'$$
The result is a row vector because $x'$ is a row vector.
(2) Quadratic form:
$$\frac{\partial (x'Ax)}{\partial x} = (A + A')x$$
Show this with a $2\times 2$ case.
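Here is a sketch of that $2\times 2$ exercise (added for completeness): with $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ and $x = (x_1, x_2)'$,
$$x'Ax = a_{11}x_1^2 + a_{12}x_1 x_2 + a_{21}x_2 x_1 + a_{22}x_2^2,$$
so differentiating term by term,
$$\frac{\partial (x'Ax)}{\partial x} = \begin{pmatrix} 2a_{11}x_1 + (a_{12} + a_{21})x_2 \\ (a_{12} + a_{21})x_1 + 2a_{22}x_2 \end{pmatrix} = (A + A')x.$$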
Matrix Differentiation:
Hessian:
$$\frac{\partial^2 (x'Ax)}{\partial x\,\partial x'} = A + A'$$
If $A = A'$, then
$$\frac{\partial (x'Ax)}{\partial x} = 2Ax, \qquad \frac{\partial^2 (x'Ax)}{\partial x\,\partial x'} = 2A.$$
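As a quick numerical sanity check of these rules (my own sketch in numpy, not part of the lecture; the matrix and evaluation point are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))          # a generic (non-symmetric) square matrix
x = rng.normal(size=3)
f = lambda v: v @ A @ v              # quadratic form x'Ax

# Central-difference gradient: should equal (A + A')x
h = 1e-6
grad_fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(3)])
print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))      # True

# Second differences: the Hessian of x'Ax is the constant matrix A + A'
h2 = 1e-4
hess_fd = np.array([[(f(x + h2*ei + h2*ej) - f(x + h2*ei - h2*ej)
                      - f(x - h2*ei + h2*ej) + f(x - h2*ei - h2*ej)) / (4*h2*h2)
                     for ej in np.eye(3)] for ei in np.eye(3)])
print(np.allclose(hess_fd, A + A.T, atol=1e-4))            # True
```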
Matrix Differentiation:
Differentiation (cont'd):
(3) Bilinear form:
$$\frac{\partial (y'Bz)}{\partial y} = Bz$$
Here $y'$ is $1\times n$, $B$ is $n\times k$, and $z$ is $k\times 1$, so the derivative has to be $n\times 1$. Letting $x = y$ and $a = Bz$ in case (1), the derivative is $a = Bz$, as in case (1).
Similarly,
$$\frac{\partial (y'Bz)}{\partial z} = B'y, \qquad \frac{\partial (y'Bz)}{\partial B} = yz'.$$
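A short element-wise justification of the last rule (added as a reasoning step): since $y'Bz = \sum_j \sum_k y_j b_{jk} z_k$, differentiating with respect to a single entry $b_{jk}$ gives
$$\frac{\partial (y'Bz)}{\partial b_{jk}} = y_j z_k,$$
and collecting these derivatives into an $n\times k$ matrix yields $yz'$.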
Linear Regression Model
The linear regression model is used for three major purposes:
• Estimation
• Prediction
• Hypothesis testing
Least Squares Regression:
OLS objective function: minimize the sum of squared residuals,
$$\min_b \; S(b) = (y - Xb)'(y - Xb)$$
$$= y'y - y'Xb - b'X'y + b'X'Xb = y'y - 2y'Xb + b'X'Xb,$$
where the last step uses the fact that $y'Xb$ is a scalar, so $y'Xb = (y'Xb)' = b'X'y$.
The first-order conditions:
$$\frac{\partial S(b)}{\partial b} = -2X'y + 2X'Xb$$
Setting $\frac{\partial S(b)}{\partial b} = 0$ and solving for $b$ gives us the estimator that minimizes $S(b)$ (we must check the second-order conditions to make sure we are minimizing and not maximizing).
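A sketch of that second-order check, using the quadratic-form rules from earlier in the lecture: the Hessian of $S(b)$ is
$$\frac{\partial^2 S(b)}{\partial b\,\partial b'} = 2X'X,$$
and for any $c \neq 0$, $c'(2X'X)c = 2\|Xc\|^2 \geq 0$, with strict inequality whenever $X$ has full column rank, so the first-order conditions do identify a minimum.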
Least Squares Regression:
$$\frac{\partial S(b)}{\partial b} = -2X'y + 2X'Xb = 0$$
$$2X'Xb = 2X'y \quad\Longrightarrow\quad X'Xb = X'y$$
These are known as the normal equations. Hence
$$b = (X'X)^{-1}X'y$$
as long as $X'X$ is non-singular, i.e.
• $X'X$ has full rank,
• the inverse of $X'X$ exists,
• the columns of $X'X$ are linearly independent.
The solution that satisfies the FOC is
$$\hat\beta = (X'X)^{-1}X'y.$$
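A minimal numerical sketch of this formula (simulated data; the sample size and coefficient values are illustrative, not from the lecture), solving the normal equations and cross-checking against numpy's least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # n x k, full column rank
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# OLS via the normal equations: solve (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with numpy's built-in least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)                           # close to beta_true
print(np.allclose(b, b_lstsq))     # True
```

In practice one solves the linear system $X'Xb = X'y$ (or uses a QR/least-squares routine) rather than forming $(X'X)^{-1}$ explicitly, but the result is the same $b$.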