Linear Algebra Review
Australian National University
(James Taylor) 1 / 6
3.0
Linear Algebra Review
Matrix multiplication
Linear Transformations
Transposition
Determinants and Invertibility
(James Taylor) 2 / 6
Matrix Multiplication
(James Taylor) 3 / 6
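A minimal MATLAB sketch of the entry-wise definition $(AB)_{ij} = \sum_k a_{ik} b_{kj}$, checked against MATLAB's built-in product; the particular matrices are arbitrary choices for illustration.

% Entry-wise matrix multiplication, checked against the built-in product A*B.
A = [1 3 4; 0 1 0];          % A is 2x3
B = [2 0; 1 5; -1 2];        % B is 3x2
C = zeros(size(A,1), size(B,2));
for i = 1:size(A,1)
    for j = 1:size(B,2)
        C(i,j) = sum(A(i,:) .* B(:,j)');   % (AB)_ij = sum_k a_ik * b_kj
    end
end
disp(C - A*B)                % all zeros: the loop matches A*B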
Linear Transformations
(James Taylor) 4 / 6
1 dimension. Def: $f : \mathbb{R} \to \mathbb{R}$ is a linear transformation if $f(ax) = a\,f(x)$ for all $a \in \mathbb{R}$ and all $x \in \mathbb{R}$.
Theorem: If $f : \mathbb{R} \to \mathbb{R}$ is linear, then $f(x) = mx$ for some $m \in \mathbb{R}$.
Proof sketch: set $m = f(1)$; then $f(x) = f(x \cdot 1) = x\,f(1) = mx$. Conversely, if $f(x) = mx$ then $f(ax) = m(ax) = a(mx) = a\,f(x)$, so $f$ is linear.
$n$ dimensions. Def: $f : \mathbb{R}^n \to \mathbb{R}^m$ is linear if $f(ax) = a\,f(x)$ for all $a \in \mathbb{R}$ and all $x \in \mathbb{R}^n$ (together with additivity, $f(x + y) = f(x) + f(y)$).
Theorem: Let $f : \mathbb{R}^n \to \mathbb{R}^m$. Then $f$ is linear if and only if there exists an $m \times n$ matrix $M$ such that $f(x) = Mx$.
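A quick numerical check of the last theorem in base MATLAB: any map of the form f(x) = Mx satisfies the linearity conditions. The matrix, vectors, and scalar below are arbitrary illustrative choices.

% f(x) = M*x is linear: f(a*x) = a*f(x) and f(x+y) = f(x)+f(y).
M = [1 3 4; 0 1 0];              % an arbitrary 2x3 matrix
f = @(x) M*x;                    % f : R^3 -> R^2
x = [1; -2; 0.5];  y = [3; 0; 1];  a = 2.7;
disp(norm(f(a*x) - a*f(x)))      % homogeneity: 0 up to rounding error
disp(norm(f(x+y) - (f(x)+f(y)))) % additivity:  0 up to rounding error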
Transposition
(James Taylor) 5 / 6
$M'$, the transpose of $M$, swaps rows and columns; if $x \in \mathbb{R}^n$ is a column vector, then $x'$ is a row vector.
Properties:
1. $(M')' = M$
2. $(AB)' = B'A'$
3. $I_n' = I_n$
4. $M$ is symmetric if $M' = M$
5. $(A + B)' = A' + B'$
6. if $M$ is $m \times n$, then $M'$ is $n \times m$
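These rules are easy to spot-check numerically; a small MATLAB sketch with randomly drawn matrices (the dimensions are arbitrary choices):

% Numerical check of the transpose rules.
A = randn(3,4);  B = randn(4,2);  C = randn(3,4);
disp(norm((A*B)' - B'*A'))       % rule 2: (AB)' = B'A'
disp(norm((A+C)' - (A' + C')))   % rule 5: (A+B)' = A' + B'
disp(isequal(eye(3)', eye(3)))   % rule 3: I' = I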
Determinants and Invertibility
(James Taylor) 6 / 6
Def: a matrix $M$ is invertible if there exists a matrix $N$ such that $MN = NM = I$; we write $N = M^{-1}$.
1 dimension: is $m \in \mathbb{R}$ invertible? Yes, as long as $m \neq 0$ (take $n = 1/m$, so that $mn = 1$).
2 dimensions: is a $2 \times 2$ matrix invertible? Maybe: some $2 \times 2$ matrices are invertible and some are not.
Properties:
1. If $M$ is an $m \times n$ matrix with $m \neq n$, then $M$ is not invertible.
2. If $M$ is an $n \times n$ matrix with $MN = I_n$, then also $NM = I_n$, i.e. $N = M^{-1}$.
3. $(M^{-1})^{-1} = M$.
4. Shoe-Socks Theorem: $(MN)^{-1} = N^{-1}M^{-1}$.
Determinants and Invertibility
(James Taylor) 6 / 6
Determinants: let $M$ be an $n \times n$ matrix. $\det(M)$ is a notion of the "size" of $M$, analogous to an absolute value.
Properties:
1. $\det(kM) = k^n \det(M)$
2. $\det(MN) = \det(M)\det(N)$
3. in general $\det(M + N) \neq \det(M) + \det(N)$
Theorem: $M$ is invertible if and only if $\det(M) \neq 0$.
For a $2 \times 2$ matrix: $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$.
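A small MATLAB sketch illustrating the 2x2 formula, the product rule, and the invertibility criterion; the matrices are arbitrary illustrative choices.

% Determinant rules and the invertibility criterion on 2x2 examples.
M = [1 2; 3 4];  N = [0 1; -1 2];
disp(det(M) - (1*4 - 2*3))            % 2x2 formula: det = ad - bc
disp(det(M*N) - det(M)*det(N))        % det(MN) = det(M)det(N), so ~0
disp(det(M+N) - (det(M)+det(N)))      % generally nonzero
if abs(det(M)) > 1e-12
    disp(norm(M*inv(M) - eye(2)))     % M is invertible since det(M) ~= 0
end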
Models and Specifications
Australian National University
(James Taylor) 1 / 8
Ad Hoc methods
A collection of often-reasonable rules of thumb
typically easy to implement
does not require a particular statistical model
examples: random walk, historical mean, weighted historical mean
(James Taylor) 2 / 8
Ad Hoc methods
Statistical properties of the forecasts are often difficult (or impossible)
to analyse
Described as a “black box”
Example: Ordinary Least Squares (OLS)
Example: Exponential Smoothing
(James Taylor) 3 / 8
Specification vs Model
Model:
a full statistical description
we know the data generating process
and can generate data according to the model
Specification:
A component of a model, but not yet a full model
not sufficient to generate data
(James Taylor) 4 / 8
Specification vs Model – Example
Consider
yt = xtb + et
with yt as the variable of interest,
xt as a vector of explanatory variables,
b as a parameter vector, and
et as noise
(James Taylor) 5 / 8
Specification vs Model – Example
yt = xtb + et
If we assume et ~ F for some distribution F, then this is a model.
E.g. If F is a zero-mean normal distribution, then this is the classical
linear regression model
We can generate data, know the full statistical properties of yt
If we don’t assume a particular distribution, we don’t have a model
E.g. Only assume zero-mean, not necessarily normal
We can’t generate data, can’t determine the full statistical properties
of yt
(James Taylor) 6 / 8
The Specification
yt = xtb + et
Suppose we only specify the conditional expectation E[yt | b]:
ŷt = E[yt | b] = xtb
This equality is implied by the assumption et ~ N(0, s²), but
it is also implied by any model with E[et | xt , b] = 0.
So even given this equality, we still cannot generate data, determine
conditional densities f(y_{T+h} | I_T, θ), etc.
(James Taylor) 7 / 8
Estimation
So far we have assumed all parameters are known
But practically we need to estimate them
Suppose we assume only that
ŷt = E[yt | b] = xtb
What then is a good estimator for b?
(James Taylor) 8 / 8
OLS without Matrices
Why we use matrices
Australian National University
(James Taylor) 1 / 7
3.2
OLS – Definition
We assumed that, on average, yt is close to ŷt = xtb.
So the error (yt - xtb) should be small on average
One option is to find the value of b which minimises the sum of
squared errors:
$\hat{b} = \arg\min_{b} \sum_{t=1}^{T} (y_t - x_t b)^2$
b̂ is the OLS estimator
(James Taylor) 2 / 7
(Here $y_t = x_t b + e_t$ with $E[e_t \mid b] = 0$; 'arg min' means the argument, i.e. the value of $b$, at which the minimum is attained.)
OLS – Computation
Can we solve b̂ analytically?
We want a closed form formula for b̂
We can indeed find one, and it’s easy to implement (in Matlab)
The simplest derivation of the formula uses matrix differentiation
But here we’ll try small examples without matrices
(James Taylor) 3 / 7
A small example
Consider
yt = a + et
Only one parameter a
OLS estimator is
$\hat{a} = \arg\min_{a} \sum_{t=1}^{T} (y_t - a)^2$
As usual, we find the minimiser through differentiation:
$\hat{a} = \frac{1}{T} \sum_{t=1}^{T} y_t$
(James Taylor) 4 / 7
(Here $x_t = 1$ for every $t$, so this is a regression on an intercept only. Working: $\frac{\partial}{\partial a}\sum_{t=1}^{T}(y_t - a)^2 = -2\sum_{t=1}^{T}(y_t - a) = 0$, hence $\sum_{t=1}^{T} y_t = Ta$ and $\hat{a} = \frac{1}{T}\sum_{t=1}^{T} y_t$.)
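As a sanity check, a short MATLAB sketch comparing the closed-form estimate with a numerical minimiser of the sum of squared errors on simulated data; the true value a = 3, the sample size, and the seed are arbitrary choices.

% The a minimising sum((y - a).^2) coincides with the sample mean.
rng(1); y = 3 + randn(50,1);                     % simulate y_t = a + e_t with a = 3
ahat_closed  = mean(y);                          % closed-form OLS estimate
ahat_numeric = fminbnd(@(a) sum((y - a).^2), min(y), max(y));
disp([ahat_closed ahat_numeric])                 % the two should agree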
A medium example
Consider
yt = a + xtb + et
Only two parameters, a and b
OLS estimator is
$(\hat{a}, \hat{b}) = \arg\min_{a,b} \sum_{t=1}^{T} (y_t - a - x_t b)^2$
As usual, we find the minimiser through differentiation
But, we need to do two derivatives (two parameters)
(James Taylor) 5 / 7
(In this example $x_t$ is a single number, not a vector.)
(James Taylor) 6 / 7
Working: expand $\sum_{t=1}^{T}(y_t - a - x_t b)^2$ and set both partial derivatives to zero:
$\frac{\partial}{\partial a}\sum_{t=1}^{T}(y_t - a - x_t b)^2 = -2\sum_{t=1}^{T}(y_t - a - x_t b) = 0$
$\frac{\partial}{\partial b}\sum_{t=1}^{T}(y_t - a - x_t b)^2 = -2\sum_{t=1}^{T} x_t (y_t - a - x_t b) = 0$
Solving this pair of equations for $a$ and $b$ gives the estimators on the next slide.
A medium example
$\hat{a} = \dfrac{\sum x_t^2 \cdot \sum y_t - \sum x_t \cdot \sum (x_t y_t)}{T \sum x_t^2 - \left(\sum x_t\right)^2}$
$\hat{b} = \dfrac{T \sum (x_t y_t) - \sum x_t \cdot \sum y_t}{T \sum x_t^2 - \left(\sum x_t\right)^2}$
Imagine there were k parameters.
(James Taylor) 7 / 7
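A short MATLAB sketch confirming that these scalar formulas agree with the matrix formula b̂ = (X'X)⁻¹X'y used later in these slides, on simulated data; the true parameter values, sample size, and seed are arbitrary choices.

% Scalar OLS formulas for y_t = a + x_t*b + e_t, checked against (X'X)\(X'y).
rng(2); T = 100;
x = randn(T,1);
y = 1.5 + 0.8*x + 0.3*randn(T,1);                % simulated data
den  = T*sum(x.^2) - sum(x)^2;
ahat = (sum(x.^2)*sum(y) - sum(x)*sum(x.*y)) / den;
bhat = (T*sum(x.*y) - sum(x)*sum(y)) / den;
X = [ones(T,1) x];
disp([ahat bhat; ((X'*X)\(X'*y))'])              % the two rows should match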
Matrix Differentiation
Australian National University
(James Taylor) 1 / 10
3.3
Matrix Differentiation
Matrix Differentiation – just like scalar differentiation, but keep track
of all the partial derivatives
Gives lovely compact expressions
Easy to manipulate
Results look reasonable, so are easy to remember
(James Taylor) 2 / 10
Differentiating a Real-Valued Function – Definition
Let $y : \mathbb{R}^n \to \mathbb{R}$. The derivative of $y(x)$ with respect to $x$ is the row vector
$\frac{\partial y(x)}{\partial x} = \begin{pmatrix} \frac{\partial y(x)}{\partial x_1} & \cdots & \frac{\partial y(x)}{\partial x_n} \end{pmatrix}$
Example: Let $x = (x_1, x_2, x_3)'$ and $y(x) = x_1 + x_2 x_3 - x_3^4$. Then
$\frac{\partial y(x)}{\partial x} = \begin{pmatrix} \frac{\partial y(x)}{\partial x_1} & \frac{\partial y(x)}{\partial x_2} & \frac{\partial y(x)}{\partial x_3} \end{pmatrix} = \begin{pmatrix} 1 & x_3 & x_2 - 4x_3^3 \end{pmatrix}$
(James Taylor) 3 / 10
Differentiating a Vector-Valued Function – Definition
Let $y : \mathbb{R}^n \to \mathbb{R}^m$. That is,
$y(x) = y\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1(x) \\ \vdots \\ y_m(x) \end{pmatrix}$
The derivative of $y(x)$ with respect to $x$, also known as the Jacobian matrix of $y$, is
$J_y(x) = \frac{\partial y(x)}{\partial x} = \begin{pmatrix} \frac{\partial y_1(x)}{\partial x_1} & \cdots & \frac{\partial y_1(x)}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m(x)}{\partial x_1} & \cdots & \frac{\partial y_m(x)}{\partial x_n} \end{pmatrix}$
(James Taylor) 4 / 10
Differentiation – Linear Transformations
Let $y : \mathbb{R}^3 \to \mathbb{R}^2$ with
$y(x) = \begin{pmatrix} y_1(x) \\ y_2(x) \end{pmatrix} = \begin{pmatrix} x_1 + 3x_2 + 4x_3 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$
$y$ is linear in $x$, so its derivative should be a constant matrix:
$J_y(x) = \begin{pmatrix} \frac{\partial y_1(x)}{\partial x_1} & \frac{\partial y_1(x)}{\partial x_2} & \frac{\partial y_1(x)}{\partial x_3} \\ \frac{\partial y_2(x)}{\partial x_1} & \frac{\partial y_2(x)}{\partial x_2} & \frac{\partial y_2(x)}{\partial x_3} \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ 0 & 1 & 0 \end{pmatrix}$
(James Taylor) 5 / 10
(Compare the one-dimensional case: $f(x) = mx$ has derivative $\frac{d}{dx}f(x) = m$, a constant.)
Differentiation – Linear Transformations
Differentiating $y(x) = Ax$
Theorem: Let $A$ be an $m \times n$ matrix, and let $y : \mathbb{R}^n \to \mathbb{R}^m$ be given by $y(x) = Ax$. Then
$\frac{\partial y(x)}{\partial x} = \frac{\partial}{\partial x} Ax = A$
Proof: Right Now.
(James Taylor) 6 / 10
(James Taylor) 7 / 10
Let $y(x) = Ax$. We want to show $\frac{\partial y(x)}{\partial x} = A$. Writing out the $i$-th component,
$y_i(x) = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = \sum_{j=1}^{n} a_{ij}x_j$
so $\frac{\partial y_i(x)}{\partial x_j} = a_{ij}$. Stacking the partial derivatives gives
$\frac{\partial y(x)}{\partial x} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} = A$
Differentiation – Quadratic Forms
Let $y : \mathbb{R}^3 \to \mathbb{R}$ with
$y(x) = x_1^2 + x_2^2 + 3x_1x_2 + 6x_1x_3 = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} 1 & 3 & 4 \\ 0 & 1 & 0 \\ 2 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$
Functions of the form $y(x) = x'Ax$ are called quadratic forms
$y$ is quadratic in $x$, so its derivative should be linear in $x$:
$J_y(x) = \begin{pmatrix} 2x_1 + 3x_2 + 6x_3 & 2x_2 + 3x_1 & 6x_1 \end{pmatrix}$
(James Taylor) 8 / 10
(Notes: the matrix $A$ representing a quadratic form is not unique, since $x'Ax = x'\tfrac{1}{2}(A + A')x$; every term has total power 2; and differentiating a quadratic gives something linear in $x$.)
Differentiation – Quadratic Forms
Differentiating $y(x) = x'Ax$
Theorem: Let $A$ be an $n \times n$ matrix, and let $y : \mathbb{R}^n \to \mathbb{R}$ be given by $y(x) = x'Ax$. Then
$\frac{\partial y(x)}{\partial x} = \frac{\partial}{\partial x}\, x'Ax = x'(A + A')$
Proof: Right Now.
(James Taylor) 9 / 10
(Compare the one-dimensional case: $\frac{d}{dx}\, m x^2 = 2mx$; in $n$ dimensions, $\frac{\partial}{\partial x}\, x'Mx = x'(M + M')$.)
(James Taylor) 10 / 10
Let $y(x) = x'Ax$. We want to show $\frac{\partial y(x)}{\partial x} = x'(A + A')$. Writing out the double sum,
$y(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j$
The terms involving $x_k$ are those with $i = k$ or $j = k$, so
$\frac{\partial y(x)}{\partial x_k} = \sum_{j=1}^{n} a_{kj} x_j + \sum_{i=1}^{n} a_{ik} x_i$
The first sum is the $k$-th entry of the row vector $(Ax)' = x'A'$ and the second is the $k$-th entry of $x'A$. Stacking over $k = 1, \dots, n$ gives
$\frac{\partial y(x)}{\partial x} = x'A' + x'A = x'(A + A')$
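A finite-difference check of the theorem in MATLAB, using the 3x3 matrix from the quadratic-form example above; the evaluation point x and the step size h are arbitrary choices.

% Check that d/dx (x'Ax) = x'(A + A') by central finite differences.
A = [1 3 4; 0 1 0; 2 0 0];       % the matrix from the quadratic-form example
x = [0.5; -1; 2];
g_analytic = x'*(A + A');        % claimed derivative (1x3 row vector)
h = 1e-6;
g_numeric = zeros(1,3);
for k = 1:3
    e = zeros(3,1); e(k) = h;
    g_numeric(k) = ((x+e)'*A*(x+e) - (x-e)'*A*(x-e)) / (2*h);
end
disp([g_analytic; g_numeric])    % the two rows agree to about 1e-6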
OLS with Matrices
Australian National University
(James Taylor) 1 / 10
3.4
OLS Computation
Recall
$\hat{b} = \arg\min_{b} \sum_{t=1}^{T} (y_t - x_t b)^2$
or, in matrix form,
$\hat{b} = \arg\min_{b}\; (y - Xb)'(y - Xb)$
(James Taylor) 2 / 10
OLS Computation
We want to find $\hat{b}$, so differentiate the sum of squared errors with respect
to $b$, and set equal to zero:
$\frac{\partial}{\partial b}(y - Xb)'(y - Xb) = \frac{\partial}{\partial b}\left(y'y - y'Xb - b'X'y + b'X'Xb\right)$
$= \frac{\partial}{\partial b}\left(y'y - 2y'Xb + b'X'Xb\right)$
$= -2y'X + b'\left(X'X + (X'X)'\right)$
$= -2y'X + 2b'X'X = 0$
(James Taylor) 3 / 10
(Working: recall $\frac{\partial}{\partial b}\, b'Ab = b'(A + A')$ and $\frac{\partial}{\partial b}\, c'b = c'$. The two cross terms combine because $y'Xb$ is a $1 \times 1$ number, so $y'Xb = (y'Xb)' = b'X'y$. The last step uses the fact that $X'X$ is symmetric, $(X'X)' = X'X$.)
OLS Computation
$-2y'X + 2\hat{b}'X'X = 0$
$\implies \hat{b}'X'X = y'X$
$\implies X'X\hat{b} = X'y$
$\implies \hat{b} = (X'X)^{-1}X'y$
Note that we have assumed X'X is invertible, which is true only if the
columns of X are linearly independent.
(James Taylor) 4 / 10
(Working: divide through by 2 and rearrange to $\hat{b}'X'X = y'X$; take the transpose of both sides (we want $\hat{b}$, not $\hat{b}'$), using $(X'X)' = X'X$, to get $X'X\hat{b} = X'y$; then premultiply both sides by $(X'X)^{-1}$ to obtain $\hat{b} = (X'X)^{-1}X'y$.)
A medium example, but with matrices
Consider
yt = a + xtb + et
We want to re-write this as y = Xb + e. (This is the hard part)
$y = \begin{pmatrix} y_1 \\ \vdots \\ y_T \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_T \end{pmatrix}, \quad b = \begin{pmatrix} a \\ b \end{pmatrix}, \quad e = \begin{pmatrix} e_1 \\ \vdots \\ e_T \end{pmatrix}$
(James Taylor) 5 / 10
A medium example, but with matrices
yt = a + xtb + et , y = Xb + e
Then using the OLS estimator immediately gives
b̂ = (X'X)⁻¹X'y
No further work required; let the computer do the multiplication.
(James Taylor) 6 / 10
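For instance, a minimal MATLAB sketch on simulated data; the true parameters, sample size, and seed are arbitrary choices, and the column of ones picks up the intercept a.

% OLS for y_t = a + x_t*b + e_t in matrix form.
rng(3); T = 200;
x = randn(T,1);
y = 1.5 + 0.8*x + 0.3*randn(T,1);   % simulated data
X = [ones(T,1) x];                  % column of ones for the intercept a
betahat = (X'*X)\(X'*y);            % betahat = [ahat; bhat]
disp(betahat')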
The OLS Estimator
We solved the first order condition to get
b̂ = (X'X)⁻¹X'y
We should also confirm that this is indeed a minimum (and not some
other stationary point).
This follows from (y - Xb)'(y - Xb) being a convex function in b.
This can be shown by constructing the Hessian matrix and showing it
is positive definite,
But, we’re not going to.
(James Taylor) 7 / 10
OLS Forecasting Example
USA GDP
Australian National University
(James Taylor) 1 / 9
3.4a
Forecasting U.S. GDP – The Details
Last week we computed MSE, AIC and BIC for a range of trend
specifications (linear, quadratic and cubic) for US GDP data.
Let’s look at some of the details
First we will compute the OLS estimate using the full sample, the
in-sample forecasting approach
Then we will discuss the recursive forecasting exercise, the pseudo
out-of-sample approach
(James Taylor) 2 / 9
Computing the OLS
We have an easy formula for computing the OLS
We just need to define X appropriately
Let’s compute the OLS estimate for the quadratic specification
(James Taylor) 3 / 9
Computing the OLS
Recall: the quadratic trend model:
yt = b0 + b1 t + b2 t² + et
and we assumed E[et ] = 0.
Written in matrix form
y = Xb + e
(James Taylor) 4 / 9
(Writing out the first few observations under this specification: y1 = b0 + b1(1) + b2(1)² + e1, y2 = b0 + b1(2) + b2(2)² + e2, and so on; stacking these rows produces the X matrix on the next slide.)
Computing the OLS
y = Xb + e
with
$y = \begin{pmatrix} y_1 \\ \vdots \\ y_T \end{pmatrix}, \quad X = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ \vdots & \vdots & \vdots \\ 1 & T & T^2 \end{pmatrix}, \quad b = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix}$
The OLS estimator b̂ is then
b̂ = (X'X)⁻¹X'y
(James Taylor) 5 / 9
Fitted Values
Recall: the quadratic trend model:
yt = b0 + b1 t + b2 t² + et
and we assumed E[et ] = 0.
So the fitted value for yt , denoted ŷt is given by
ŷt = E[yt | b̂] = b̂0 + b̂1 t + b̂2 t²
We use the OLS estimator b̂ instead of the true b because we know b̂.
(James Taylor) 6 / 9
(For example, the forecast one period beyond the sample is ŷT+1 = b̂0 + b̂1(T+1) + b̂2(T+1)².)
MATLAB Code
load USGDP.csv;
y = USGDP;
T = size(y,1);
t = (1:T)';
X = [ones(T,1) t t.^2];          % quadratic trend regressors
betahat = (X'*X)\(X'*y);         % OLS estimate
yhat = X*betahat;                % fitted values
MSE = mean((y-yhat).^2);
AIC = T*MSE + 3*2;
BIC = T*MSE + 3*log(T);
Computing MSFE
To compute a MSFE in the out-of-sample forecasting exercise, we
first need ŷt+h|t
i.e. we compute the OLS estimate, using only the data available at
time t, and use it to obtain the h-step-ahead forecast.
ŷt+h|t = xt+h b̂|t
where xt+h = (1, t + h, (t + h)²) and the estimate b̂|t is computed
using data only up to time t.
(James Taylor) 8 / 9
MATLAB Code
T0 = 40;                          % size of the initial estimation sample
h = 4;                            % h-step-ahead forecast
fyhat = zeros(T-h-T0+1,1);
ytph = y(T0+h:end);               % observed y_{t+h}
for t = T0:T-h
    yt = y(1:t);                  % data available at time t
    s = (1:t)';
    Xt = [ones(t,1) s s.^2];
    betahat = (Xt'*Xt)\(Xt'*yt);  % OLS using data up to time t only
    yhat = [1 t+h (t+h)^2]*betahat;
    fyhat(t-T0+1) = yhat;         % store the forecasts
end
MSFE = mean((ytph-fyhat).^2);
Properties of the OLS Estimator
Australian National University
(James Taylor) 1 / 8
3.5
Properties of OLS
The linear regression in matrix form is
y = Xb + e
The OLS estimator is then
b̂ = (X'X)⁻¹X'y
which minimises the sum of squared errors
OLS is a linear estimator.
Does it have other nice properties?
(James Taylor) 2 / 8
Unbiased Estimators
An estimator θ̂ for θ is unbiased if
E[θ̂] = θ
OLS Estimator is Unbiased
Theorem: Let b̂ be the OLS estimator for y = Xb + e. Then
E[b̂ | X, b] = b.
Proof: Right now.
(James Taylor) 3 / 8
(James Taylor) 4 / 8
Have: y = Xb + e and b̂ = (X'X)⁻¹X'y. Want: E[b̂ | X, b] = b.
$E[\hat{b} \mid X, b] = E[(X'X)^{-1}X'y \mid X, b]$
$= E[(X'X)^{-1}X'(Xb + e) \mid X, b]$
$= E[(X'X)^{-1}X'Xb \mid X, b] + E[(X'X)^{-1}X'e \mid X, b]$
$= b + (X'X)^{-1}X'\,E[e \mid X, b]$
$= b$
since E[e | X, b] = 0.
Covariance Matrix
To do any further inference (confidence intervals, hypothesis testing,
etc.) we also need the covariance matrix of b̂, denoted S_b̂.
We will need an additional assumption, namely
Var[et | xt, b] = s²
Covariance of the OLS Estimator
Theorem: Let b̂ be the OLS estimator for y = Xb + e, with
Var[et | xt, b] = s². Then S_b̂ = s²(X'X)⁻¹.
Proof: Right now.
(James Taylor) 5 / 8
(Intuition: as the sample grows, X has more rows, X'X gets larger, and (X'X)⁻¹, and hence the variance of b̂, gets smaller in magnitude.)
(James Taylor) 6 / 8
Have: $S_{\hat{b}} = E[(\hat{b} - b)(\hat{b} - b)' \mid X, b]$, since $E[\hat{b} \mid X, b] = b$.
$\hat{b} - b = (X'X)^{-1}X'y - b = (X'X)^{-1}X'(Xb + e) - b = (X'X)^{-1}X'e$
so
$S_{\hat{b}} = E[(X'X)^{-1}X'e\,e'X(X'X)^{-1} \mid X, b]$
$= (X'X)^{-1}X'\,E[ee' \mid X, b]\,X(X'X)^{-1}$
$= (X'X)^{-1}X'(s^2 I_T)X(X'X)^{-1}$
$= s^2 (X'X)^{-1}X'X(X'X)^{-1} = s^2 (X'X)^{-1}$
using Var[et | xt, b] = s², so that E[ee' | X, b] = s² I_T.
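A Monte Carlo sketch in MATLAB illustrating both results, unbiasedness and the covariance formula; the design matrix, true parameters, error standard deviation s, number of replications, and seed are all arbitrary choices.

% Monte Carlo check: b_hat is unbiased and has covariance s^2*(X'X)^{-1}.
rng(4); T = 100;  s = 0.5;  b = [1; 2];
X = [ones(T,1) randn(T,1)];          % regressors held fixed across replications
R = 5000;
bhats = zeros(R, 2);
for r = 1:R
    y = X*b + s*randn(T,1);          % errors with E[e] = 0 and Var[e] = s^2
    bhats(r,:) = ((X'*X)\(X'*y))';
end
disp(mean(bhats))                    % close to b' = [1 2]   (unbiasedness)
disp(cov(bhats))                     % close to the matrix below
disp(s^2 * inv(X'*X))                % theoretical covariance s^2*(X'X)^{-1}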
Gauss-Markov Theorem
Gauss-Markov Theorem
Theorem: In a linear regression with assumptions
E[et | xt, b] = 0, Var[et | xt, b] = s²
the OLS estimator b̂ has the minimum variance of all linear unbiased
estimators for the parameter vector b.
(James Taylor) 7 / 8
b̂ is BLUE: the Best Linear Unbiased Estimator.
Summary
The OLS estimator b̂:
is unbiased: E[b̂ | X, b] = b
has covariance matrix S_b̂ = s²(X'X)⁻¹
is the ‘best’ linear unbiased estimator for b.
(James Taylor) 8 / 8