
Linear Algebra Review

Australian National University

(James Taylor) 1 / 6

3.0

Linear Algebra Review

Matrix multiplication

Linear Transformations

Transposition

Determinants and Invertibility

(James Taylor) 2 / 6

Matrix Multiplication

(James Taylor) 3 / 6


Linear Transformations

(James Taylor) 4 / 6

Definition (1 dimension): f : R → R is a linear transformation if f(ax) = a f(x) for all a ∈ R and all x ∈ R.

Theorem: If f : R → R is linear, then f(x) = mx for some m ∈ R.

(Conversely, any f(x) = mx is linear: f(ax) = m(ax) = a(mx) = a f(x).)

Definition (n dimensions): f : Rⁿ → Rᵐ is linear if f(ax) = a f(x) for all a ∈ R and all x ∈ Rⁿ (together with additivity, f(x + z) = f(x) + f(z)).

Theorem: Let f : Rⁿ → Rᵐ. Then f is linear if and only if there exists a matrix M such that f(x) = Mx.
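As a quick check of the last theorem, here is a minimal MATLAB sketch (the matrix and test point are illustrative choices, not from the slides) verifying that a map of the form f(x) = Mx satisfies f(ax) = a f(x):

M = [1 3 4; 0 1 0];      % an arbitrary 2x3 matrix, so f maps R^3 to R^2
f = @(x) M*x;            % the linear transformation
x = [2; -1; 5]; a = 7;   % an arbitrary point and scalar
disp(f(a*x) - a*f(x))    % should be (numerically) zero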

Transposition

(James Taylor) 5 / 6

M' (the transpose of M) flips rows and columns; if x ∈ Rⁿ is a column vector, then x' is a row vector.

Properties:
1. (M')' = M
2. (AB)' = B'A'
3. In' = In
4. M is symmetric if M' = M
5. (A + B)' = A' + B'
6. If M is m×n, then M' is n×m.
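A minimal MATLAB sketch checking these transpose properties numerically; the random test matrices are illustrative only:

A = randn(3,2); B = randn(2,4); C = randn(3,2); M = randn(3,3);
disp(norm((A')' - A))            % 1. (A')' = A
disp(norm((A*B)' - B'*A'))       % 2. (AB)' = B'A'
disp(norm(eye(3)' - eye(3)))     % 3. In' = In
S = M + M';                      % 4. M + M' is symmetric: S' = S
disp(norm(S' - S))
disp(norm((A + C)' - (A' + C'))) % 5. (A+B)' = A'+B'
disp(size(A'))                   % 6. A is 3x2, so A' is 2x3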

Determinants and Invertibility

(James Taylor) 6 / 6

Def: a matrix M is invertible if there exists a matrix N such that MN = NM = I. We write N = M⁻¹.

1 dimension: is m ∈ R invertible? Yes, as long as m ≠ 0: then m⁻¹ = 1/m and m · m⁻¹ = 1.

2 dimensions: is a 2×2 matrix invertible? Maybe: some 2×2 matrices are invertible and some are not.

Properties:
1. If M is an m×n matrix with m ≠ n, then M is not invertible.
2. If M is an n×n matrix and MN = In, then NM = In, i.e. N = M⁻¹.
3. (M⁻¹)⁻¹ = M.
4. (Shoes-and-Socks Theorem) (MN)⁻¹ = N⁻¹M⁻¹.

Determinants and Invertibility

(James Taylor) 6 / 6

Determinants: let M be an n×n matrix. det(M) is a notion of the "size" of M, a bit like an absolute value.

Properties:
det(kM) = kⁿ det(M), so in general det(kM) ≠ k det(M)
det(MN) = det(M) det(N)
det(M + N) ≠ det(M) + det(N) in general

Theorem: M is invertible if and only if det(M) ≠ 0.

For 2×2 matrices: det([a b; c d]) = ad − bc.
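A small MATLAB sketch (with made-up 2×2 matrices) checking the determinant properties and the invertibility criterion above:

M = [1 2; 3 4]; N = [0 1; 2 5]; k = 3; n = 2;
disp(det(k*M) - k^n*det(M))      % det(kM) = k^n det(M)
disp(det(M*N) - det(M)*det(N))   % det(MN) = det(M) det(N)
disp(det(M+N) - (det(M)+det(N))) % generally nonzero: det(M+N) ~= det(M)+det(N)
disp(det(M))                     % det([a b; c d]) = ad - bc = 1*4 - 2*3 = -2
if det(M) ~= 0
    disp(M*inv(M))               % invertible: M*M^{-1} = I
end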

Models and Specifications

Australian National University

(James Taylor) 1 / 8

Ad Hoc methods

A collection of often-reasonable rules of thumb

typically easy to implement

does not require a particular statistical model

examples: random walk, historical mean, weighted historical mean

(James Taylor) 2 / 8


Ad Hoc methods

Statistical properties of the forecasts are often difficult (or impossible)

to analyse

Described as a “black box”

Example: Ordinary Least Squares (OLS)

Example: Exponential Smoothing

(James Taylor) 3 / 8

Specification vs Model

Model:

a full statistical description

we know the data generating process

and can generate data according to the model

Specification:

A component of a model, but not yet a full model

not sufficient to generate data

(James Taylor) 4 / 8


Specification vs Model – Example

Consider

yt = xtb + et

with yt as the variable of interest,

xt as a vector of explanatory variables,

b as a parameter vector, and

et as noise

(James Taylor) 5 / 8


Specification vs Model – Example

yt = xtb + et

If we assume et ∼ F for some distribution F, then this is a model.

E.g. If F is a zero-mean normal distribution, then this is the classical

linear regression model

We can generate data, know the full statistical properties of yt

If we don’t assume a particular distribution, we don’t have a model

E.g. Only assume zero-mean, not necessarily normal

We can’t generate data, can’t determine the full statistical properties

of yt

(James Taylor) 6 / 8

et ∼ N(0, s2)

The Specification

yt = xtb + et

Suppose we only specify the conditional expectation E[yt | b]:

ŷt = E[yt | b] = xtb

This equality is implied by the assumption et ∼ N(0, s2), but

it is also implied by any model with E[et | xt , b] = 0.

So even given this equality, we still cannot generate data, determine

conditional densities f (yT+h | IT , q), etc.

(James Taylor) 7 / 8

Estimation

So far we have assumed all parameters are known

But practically we need to estimate them

Suppose we assume only that

ŷt = E[yt | b] = xtb

What then is a good estimator for b?

(James Taylor) 8 / 8

E[et] = 0

OLS without Matrices

Why we use matrices

Australian National University

(James Taylor) 1 / 7

3.2

OLS – Definition

We assumed that, on average, yt is close to ŷt = xtb.

So the error (yt − xtb) should be small on average

One option is to find the value of b which minimises the sum of

squared errors.

b̂ = argmin_b Σ_{t=1}^T (yt − xtb)²

b̂ is the OLS estimator

(James Taylor) 2 / 7

(Here yt = xtb + et with E[et | b] = 0, and "argmin" means the argument, i.e. the value of b, that minimises the sum.)

OLS – Computation

Can we solve b̂ analytically?

We want a closed form formula for b̂

We can indeed find one, and it’s easy to implement (in Matlab)

The simplest derivation of the formula uses matrix differentiation

But here we’ll try small examples without matrices

(James Taylor) 3 / 7

A small example

Consider

yt = a + et

Only one parameter a

OLS estimator is

â = argmin_a Σ_{t=1}^T (yt − a)²

As usual, we find the minimiser through differentiation:

â = (1/T) Σ_{t=1}^T yt

(James Taylor) 4 / 7

Here xt = 1 for every t. Differentiating the objective:
d/da Σ_{t=1}^T (yt − a)² = Σ (−2)(yt − a) = −2 Σ yt + 2Ta = 0
so Σ yt = Ta, i.e. â = (1/T) Σ yt.
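A minimal MATLAB sketch with made-up data, checking numerically that the sample mean minimises the sum of squared errors (fminsearch is used here only as a generic numerical minimiser):

y = [2.1; 1.8; 3.0; 2.4; 2.7];          % illustrative data
sse = @(a) sum((y - a).^2);             % objective from the slide
ahat_numeric = fminsearch(sse, 0);      % numerical minimiser
ahat_formula = mean(y);                 % (1/T) * sum of y_t
disp([ahat_numeric ahat_formula])       % the two should agree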


Note: E[et] = 0.

A medium example

Consider

yt = a + xtb + et

Only two parameters, a and b

OLS estimator is

(â, b̂) = argmin_{a,b} Σ_{t=1}^T (yt − a − xtb)²

As usual, we find the minimiser through differentiation

But, need to do two derivatives (two parameters)

(James Taylor) 5 / 7

Here xt is a single number (a scalar), not a vector.

(James Taylor) 6 / 7

Setting the two partial derivatives to zero:
∂/∂a Σ_{t=1}^T (yt − a − xtb)² = Σ (−2)(yt − a − xtb) = 0
∂/∂b Σ_{t=1}^T (yt − a − xtb)² = Σ (−2xt)(yt − a − xtb) = 0
Solving these two equations simultaneously for a and b gives the formulas on the next slide.

A medium example

â = ( Σ xt² · Σ yt − Σ xt · Σ (xt yt) ) / ( T Σ xt² − (Σ xt)² )

b̂ = ( T Σ (xt yt) − Σ xt · Σ yt ) / ( T Σ xt² − (Σ xt)² )

Imagine there were k parameters.

(James Taylor) 7 / 7
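A short MATLAB sketch with simulated data (all numbers illustrative), checking that the scalar formulas for â and b̂ agree with the matrix OLS formula b̂ = (X'X)⁻¹X'y introduced later:

T = 50; x = randn(T,1); y = 1 + 2*x + 0.5*randn(T,1);   % illustrative data
ahat = (sum(x.^2)*sum(y) - sum(x)*sum(x.*y)) / (T*sum(x.^2) - sum(x)^2);
bhat = (T*sum(x.*y) - sum(x)*sum(y))         / (T*sum(x.^2) - sum(x)^2);
X = [ones(T,1) x];
betahat = (X'*X)\(X'*y);        % matrix OLS
disp([ahat bhat; betahat'])     % the two rows should agree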

Matrix Differentiation

Australian National University

(James Taylor) 1 / 10

3.3

Matrix Differentiation

Matrix Differentiation – just like scalar differentiation, but keep track

of all the partial derivatives

Gives lovely compact expressions

Easy to manipulate

Results look reasonable, so are easy to remember

(James Taylor) 2 / 10

Differentiating a Real-Valued Function – Definition

Let y : Rⁿ → R. The derivative of y(x) with respect to x is the row vector

∂y(x)/∂x = ( ∂y(x)/∂x1   · · ·   ∂y(x)/∂xn )

Example: Let x = (x1, x2, x3)' and y(x) = x1 + x2x3 − x3⁴. Then

∂y(x)/∂x = ( ∂y/∂x1   ∂y/∂x2   ∂y/∂x3 ) = ( 1   x3   x2 − 4x3³ )

(James Taylor) 3 / 10
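A minimal MATLAB sketch checking the example gradient above by central finite differences; the evaluation point and step size are arbitrary choices:

y  = @(x) x(1) + x(2)*x(3) - x(3)^4;
dy = @(x) [1, x(3), x(2) - 4*x(3)^3];     % analytic row-vector derivative
x0 = [0.5; -1.2; 0.8]; h = 1e-6; g = zeros(1,3);
for j = 1:3
    e = zeros(3,1); e(j) = h;
    g(j) = (y(x0+e) - y(x0-e)) / (2*h);   % central difference for dy/dx_j
end
disp([g; dy(x0)])                         % the two rows should agree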


Differentiating a Vector-Valued Function – Definition

Let y : Rⁿ → Rᵐ. That is,

y(x) = y( (x1, ..., xn)' ) = ( y1(x), ..., ym(x) )'

The derivative of y(x) with respect to x, also known as the Jacobian matrix of y, is the m×n matrix whose (i, j) entry is ∂yi(x)/∂xj:

Jy(x) = ∂y(x)/∂x = [ ∂y1(x)/∂x1 · · · ∂y1(x)/∂xn ; ... ; ∂ym(x)/∂x1 · · · ∂ym(x)/∂xn ]

(James Taylor) 4 / 10


Differentiation – Linear Transformations

Let y : R³ → R² with

y(x) = (y1(x), y2(x))' = (x1 + 3x2 + 4x3, x2)' = [1 3 4; 0 1 0] x

y is linear in x, so its derivative should be a constant matrix:

Jy(x) = [ ∂y1/∂x1  ∂y1/∂x2  ∂y1/∂x3 ; ∂y2/∂x1  ∂y2/∂x2  ∂y2/∂x3 ] = [1 3 4; 0 1 0]

(James Taylor) 5 / 10

(Compare the one-dimensional case: f(x) = mx has derivative m, a constant.)

Differentiation – Linear Transformations

Differentiating y(x) = Ax

Theorem: Let A be an m×n matrix, and let y : Rⁿ → Rᵐ be given by y(x) = Ax. Then

∂y(x)/∂x = ∂/∂x (Ax) = A

Proof: Right Now.

(James Taylor) 6 / 10


(James Taylor) 7 / 10

Proof sketch: let y(x) = Ax, so the i-th component is yi(x) = ai1 x1 + ai2 x2 + · · · + ain xn. Then ∂yi(x)/∂xj = aij, so the Jacobian, whose (i, j) entry is ∂yi/∂xj, is exactly A. Hence ∂/∂x (Ax) = A.

Differentiation – Quadratic Forms

Let y : R³ → R with

y(x) = x1² + x2² + 3x1x2 + 6x1x3 = (x1 x2 x3) [1 3 4; 0 1 0; 2 0 0] (x1, x2, x3)'

Functions of the form y(x) = x'Ax are called quadratic forms.

y is quadratic in x, so its derivative should be linear in x:

Jy(x) = ( 2x1 + 3x2 + 6x3   2x2 + 3x1   6x1 )

(James Taylor) 8 / 10

Notes: the matrix A representing a quadratic form is not unique; every term of x'Ax has total degree 2; and the derivative is linear in x, as expected.

Differentiation – Quadratic Forms

Differentiating y(x) = x'Ax

Theorem: Let A be an n×n matrix, and let y : Rⁿ → R be given by y(x) = x'Ax. Then

∂y(x)/∂x = ∂/∂x (x'Ax) = x'(A + A')

Proof: Right Now.

(James Taylor) 9 / 10

Proof sketch: write y(x) = x'Ax = Σ_i Σ_j aij xi xj. Differentiate with respect to xk: the terms involving xk are akk xk² (derivative 2 akk xk), aki xk xi for i ≠ k (derivative aki xi), and aik xi xk for i ≠ k (derivative aik xi). Hence ∂y/∂xk = Σ_i (aki + aik) xi, which is the k-th component of x'(A + A'). Stacking the components into a row vector gives ∂/∂x (x'Ax) = x'(A + A').
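A minimal MATLAB sketch checking the quadratic-form derivative numerically, using the (non-symmetric) A from the earlier example; the evaluation point is an arbitrary choice:

A = [1 3 4; 0 1 0; 2 0 0];               % the A from the example above
y = @(x) x'*A*x;
x0 = [0.3; -0.7; 1.1]; h = 1e-6; g = zeros(1,3);
for j = 1:3
    e = zeros(3,1); e(j) = h;
    g(j) = (y(x0+e) - y(x0-e)) / (2*h);   % central difference for dy/dx_j
end
disp([g; x0'*(A + A')])                   % numerical vs analytic derivative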

OLS with Matrices

Australian National University

(James Taylor) 1 / 10

3.4

OLS Computation

Recall

b̂ = argmin_b Σ_{t=1}^T (yt − xtb)²

or, in matrix form

b̂ = argmin_b (y − Xb)'(y − Xb)

(James Taylor) 2 / 10


OLS Computation

We want to find b̂, so differentiate the sum of squared errors with respect to b, and set equal to zero:

∂/∂b (y − Xb)'(y − Xb) = ∂/∂b (y'y − y'Xb − b'X'y + b'X'Xb)
 = ∂/∂b (y'y − 2y'Xb + b'X'Xb)
 = −2y'X + b'(X'X + (X'X)')
 = −2y'X + 2b'X'X = 0

(James Taylor) 3 / 10

Recall ∂/∂b (b'Ab) = b'(A + A') and ∂/∂b (c'b) = c'. Also, b'X'y is a 1×1 scalar, so b'X'y = (b'X'y)' = y'Xb; that is why the two middle terms combine into −2y'Xb. Finally, X'X is symmetric, so X'X + (X'X)' = 2X'X.


OLS Computation

−2y'X + 2b̂'X'X = 0
⟹ b̂'X'X = y'X
⟹ X'Xb̂ = X'y
⟹ b̂ = (X'X)⁻¹X'y

Note that we have assumed X'X is invertible, which is true only if the columns of X are linearly independent.

(James Taylor) 4 / 10

From −2y'X + 2b̂'X'X = 0 we get b̂'X'X = y'X. Take the transpose of both sides (we want b̂, not b̂'): X'Xb̂ = X'y. Multiplying both sides by (X'X)⁻¹ gives (X'X)⁻¹X'Xb̂ = (X'X)⁻¹X'y, i.e. b̂ = (X'X)⁻¹X'y.
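A minimal MATLAB sketch with simulated data, computing b̂ both from the explicit formula and with the backslash operator; the latter is what the course's MATLAB code uses, and is numerically preferable to forming the inverse:

T = 100; X = [ones(T,1) randn(T,2)]; beta = [1; 2; -0.5];
y = X*beta + 0.3*randn(T,1);    % illustrative data
b1 = inv(X'*X)*(X'*y);          % the formula as written on the slide
b2 = (X'*X)\(X'*y);             % numerically preferable equivalent
disp([b1 b2])                   % the columns should agree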


A medium example, but with matrices

Consider

yt = a + xtb + et

We want to re-write this as y = Xb + e. (This is the hard part)

y = (y1, y2, ..., yT)'
X = the T×2 matrix whose t-th row is (1, xt), i.e. X = [1 x1; 1 x2; ...; 1 xT]
b = (a, b)'
e = (e1, e2, ..., eT)'

(James Taylor) 5 / 10




A medium example, but with matrices

yt = a + xtb + et   ⟺   y = Xb + e

Then using the OLS estimator immediately gives

b̂ = (X'X)⁻¹X'y

No further work required; let the computer do the multiplication.

(James Taylor) 6 / 10
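A minimal MATLAB sketch with made-up numbers, showing that the matrix form y = Xb + e simply stacks the T scalar equations yt = a + xt b + et:

T = 4; a = 1; b = 2;                 % illustrative parameter values
x = [0.5; -1; 2; 0]; e = 0.1*randn(T,1);
yt = a + x*b + e;                    % elementwise, one equation per t
X  = [ones(T,1) x];                  % each row of X is (1, x_t)
beta = [a; b];
y  = X*beta + e;                     % matrix form
disp([yt y])                         % identical columns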


The OLS Estimator

We solved the first order condition to get

b̂ = (X'X)⁻¹X'y

We should also confirm that this is indeed a minimum (and not some

other stationary point).

This follows from (y − Xb)'(y − Xb) being a convex function in b.

This can be shown by constructing the Hessian matrix and showing it

is positive definite,

But, we’re not going to.

(James Taylor) 7 / 10

OLS Forecasting Example

USA GDP

Australian National University

(James Taylor) 1 / 9

3.4a

Forecasting U.S. GDP – The Details

Last week we computed MSE, AIC and BIC for a range of trend

specifications (linear, quadratic and cubic) for US GDP data.

Let’s look at some of the details

First we will compute the OLS estimate using the full sample, the

in-sample forecasting approach

Then we will discuss the recursive forecasting exercise, the pseudo

out-of-sample approach

(James Taylor) 2 / 9

Computing the OLS

We have an easy formula for computing the OLS

We just need to define X appropriately

Let’s compute the OLS estimate for the quadratic specification

(James Taylor) 3 / 9

Computing the OLS

Recall: the quadratic trend model:

yt = b0 + b1t + b2t² + et

and we assumed E[et ] = 0.

Written in matrix form

y = Xb + e

(James Taylor) 4 / 9

Writing out the observations under the quadratic specification: y1 = b0 + b1·1 + b2·1² + e1, y2 = b0 + b1·2 + b2·2² + e2, and so on, so the t-th row of X is (1, t, t²).

Computing the OLS

y = Xb + e

with

y = (y1, ..., yT)'
X = the T×3 matrix whose t-th row is (1, t, t²), i.e. X = [1 1 1; 1 2 4; 1 3 9; ...; 1 T T²]
b = (b0, b1, b2)'

The OLS estimator b̂ is then

b̂ = (X'X)⁻¹X'y

(James Taylor) 5 / 9

Fitted Values

Recall: the quadratic trend model:

yt = b0 + b1t + b2t² + et

and we assumed E[et ] = 0.

So the fitted value for yt , denoted ŷt is given by

ŷt = E[yt | b̂] = b̂0 + b̂1t + b̂2t²

We use the OLS estimate b̂ in place of the true b because b is unknown, but b̂ can be computed from the data.

(James Taylor) 6 / 9

For forecasting: ŷ_{T+1} = b̂0 + b̂1(T+1) + b̂2(T+1)².

MATLAB Code

load USGDP.csv;            % creates a variable named USGDP
y = USGDP;
T = size(y,1);
t = (1:T)';                % time index 1, 2, ..., T
X = [ones(T,1) t t.^2];    % quadratic trend: columns 1, t, t^2
betahat = (X'*X)\(X'*y);   % OLS estimate
yhat = X*betahat;          % fitted values
MSE = mean((y-yhat).^2);
AIC = T*MSE + 3*2;         % 3 parameters
BIC = T*MSE + 3*log(T);

(James Taylor) 7 / 9


Computing MSFE

To compute the MSFE in the out-of-sample forecasting exercise, we

first need ŷt+h|t

i.e. we compute the OLS estimate, using only the data available at

time t, and use it to obtain the h-step-ahead forecast.

ŷt+h|t = xt+h b̂|t

where x_{t+h} = (1, t + h, (t + h)²) and the estimate b̂|t is computed

using data only up to time t.

(James Taylor) 8 / 9


MATLAB Code

T0 = 40;
h = 4; % h-step-ahead forecast
fyhat = zeros(T-h-T0+1,1);
ytph = y(T0+h:end); % observed y_{t+h}
for t = T0:T-h
    yt = y(1:t);
    s = (1:t)';
    Xt = [ones(t,1) s s.^2];
    betahat = (Xt'*Xt)\(Xt'*yt);
    yhat = [1 t+h (t+h)^2]*betahat;
    fyhat(t-T0+1) = yhat; % store the forecasts
end
MSFE = mean((ytph-fyhat).^2);

(James Taylor) 9 / 9

The loop runs t = T0, then T0 + 1, and so on until t = T − h; each iteration computes the forecast x_{t+h} b̂|t and stores it in fyhat(t − T0 + 1).

Properties of the OLS Estimator

Australian National University

(James Taylor) 1 / 8

3.5

Properties of OLS

The linear regression in matrix form is

y = Xb + e

The OLS estimator is then

b̂ = (X'X)⁻¹X'y

which minimises the sum of squared errors

OLS is a linear estimator.

Does it have other nice properties?

(James Taylor) 2 / 8


Unbiased Estimators

An estimator q̂ for q is unbiased if

E[q̂] = q

OLS Estimator is Unbiased

Theorem: Let b̂ be the OLS estimator for y = Xb + e. Then

E[b̂ | X, b] = b.

Proof: Right now.

(James Taylor) 3 / 8

(James Taylor) 4 / 8

Proof sketch: we have y = Xb + e and b̂ = (X'X)⁻¹X'y, and we want E[b̂ | X, b] = b.
E[b̂ | X, b] = E[(X'X)⁻¹X'y | X, b]
 = E[(X'X)⁻¹X'(Xb + e) | X, b]
 = E[(X'X)⁻¹X'Xb | X, b] + E[(X'X)⁻¹X'e | X, b]
 = b + (X'X)⁻¹X' E[e | X, b]
 = b,
since E[e | X, b] = 0.

Covariance Matrix

To do any further inference (confidence intervals, hypothesis testing,

etc.) we also need the covariance matrix of b̂, denoted Sb̂.

We will need an additional assumption, namely

Var[et | xt , b] = s2

Covariance of the OLS Estimator

Theorem: Let b̂ be the OLS estimator for y = Xb + e, with

Var[et | xt , b] = s2. Then Sb̂ = s2(X'X)⁻¹.

Proof: Right now.

(James Taylor) 5 / 8

Intuition: as X grows (more observations), (X'X)⁻¹ shrinks in magnitude, so the variance of b̂ falls.

(James Taylor) 6 / 8

Proof sketch: Sb̂ = E[(b̂ − b)(b̂ − b)' | X, b].
b̂ − b = (X'X)⁻¹X'y − b = (X'X)⁻¹X'(Xb + e) − b = (X'X)⁻¹X'e.
Hence
Sb̂ = E[(X'X)⁻¹X'e e'X(X'X)⁻¹ | X, b]
 = (X'X)⁻¹X' E[e e' | X, b] X(X'X)⁻¹
 = (X'X)⁻¹X' (s2 I) X(X'X)⁻¹    (using Var[et | xt, b] = s2 and uncorrelated errors)
 = s2 (X'X)⁻¹X'X(X'X)⁻¹
 = s2 (X'X)⁻¹.
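A Monte Carlo MATLAB sketch with simulated data (all values illustrative), checking that the average of b̂ across replications is close to b and that its sampling covariance is close to s2(X'X)⁻¹:

T = 50; s = 0.5; beta = [1; 2];
X = [ones(T,1) randn(T,1)];            % keep X fixed across replications
R = 5000; bhat = zeros(R, 2);
for r = 1:R
    y = X*beta + s*randn(T,1);         % errors with variance s^2
    bhat(r,:) = ((X'*X)\(X'*y))';
end
disp(mean(bhat))                       % should be close to beta' = [1 2]
disp(cov(bhat))                        % should be close to:
disp(s^2 * inv(X'*X))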

Gauss-Markov Theorem

Gauss-Markov Theorem

Theorem: In a linear regression with assumptions

E[et | xt , b] = 0, Var[et | xt , b] = s2

the OLS estimator b̂ has the minimum variance of all linear unbiased

estimators for the parameter vector b.

(James Taylor) 7 / 8

BLUE: Best Linear Unbiased Estimator.

Summary

The OLS estimator b̂:

is unbiased: E[b̂ | X, b] = b

has covariance matrix Sb̂ = s2(X'X)⁻¹

is the ‘best’ linear unbiased estimator for b.

(James Taylor) 8 / 8