ECON 61001: Lecture 5
Alastair R. Hall
The University of Manchester
Outline of this lecture
- Large sample behaviour of OLS with time series data
  - time series regression models
  - concepts and conditions
  - OLS as projection
- Non-spherical errors
  - Properties of OLS
  - Generalized Least Squares
Time series variables
Two types of time series:
flow variable – measured over an interval of time, for example monthly consumption expenditures;
stock variable – measured at a moment in time, such as price or quantity of shares owned.
Assume the time series is observed at regularly spaced intervals.
The frequency at which the time series is observed is known as the sampling frequency.
General dynamic regression models
$$y_t = \beta_{0,1} + \beta_{0,2} y_{t-1} + \beta_{0,3} h_t + \beta_{0,4} h_{t-1} + u_t = x_t'\beta_0 + u_t$$
contains lagged "y" and contemporaneous and lagged values of other variables (in this case "h").
Presence of lagged variables on the rhs of the regression model ⇒ a difference between the number of observations on the variables and the number of observations used in the estimation.
If we start with $T^\star$ observations and $p$ is the longest lag on the rhs, then the effective sample size is $T^\star - p$, with the first $p$ observations being used for conditioning.
For ease of notation, assume the effective sample runs $t = 1, 2, \ldots, T$ and $y_0, h_0', \ldots, y_{-p+1}, h_{-p+1}'$ are available for conditioning.
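To make the conditioning concrete, here is a minimal NumPy sketch (my own illustration, with hypothetical parameter values) that builds the regressor matrix $x_t = (1, y_{t-1}, h_t, h_{t-1})'$ from $T^\star$ raw observations; with longest lag $p = 1$, the first observation is used only for conditioning and the effective sample has $T^\star - 1$ rows.

```python
import numpy as np

rng = np.random.default_rng(0)
T_star = 200                      # raw number of observations
h = rng.normal(size=T_star)       # illustrative "h" series
y = np.zeros(T_star)
for t in range(1, T_star):        # simulate a simple dynamic model (hypothetical parameters)
    y[t] = 1.0 + 0.5 * y[t - 1] + 0.8 * h[t] - 0.3 * h[t - 1] + rng.normal()

# Longest lag on the right-hand side is p = 1, so the effective sample is T_star - p.
p = 1
X = np.column_stack([
    np.ones(T_star - p),          # intercept
    y[p - 1:-1],                  # y_{t-1}
    h[p:],                        # h_t
    h[p - 1:-1],                  # h_{t-1}
])
y_eff = y[p:]
print(X.shape, y_eff.shape)       # (199, 4) (199,)
```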
Statistical background
The sampling framework for time series is fundamentally different from that used in the analysis of cross-section data: it is based on stochastic process theory.
In stochastic process theory, the time series $v_t$ is viewed as evolving before we start observing it and continuing to evolve after we stop observing it, that is
$$\ldots, v_{-3}, v_{-2}, v_{-1}, v_0, \underbrace{v_1, v_2, v_3, \ldots, v_T}_{\text{sample}}, v_{T+1}, v_{T+2}, \ldots$$
The entire sequence $\{v_t\}_{t=-\infty}^{\infty}$ is known as a realization of the process.
Key difference: we sample once, obtaining a particular realization of the series.
Statistical background
So, as the sample size grows, we see more of one realization of the process.
Does this allow us to uncover the underlying probability distribution of vt as T → ∞?
Answer: yes, under certain conditions: stationarity and weak dependence.
We distinguish two forms of stationarity: strong and weak stationarity.
Strong stationarity
The time series $\{v_t\}_{t=-\infty}^{\infty}$ is said to be strongly stationary if the joint probability distribution function, $F(\cdot)$, of any subset of $\{v_t\}$ satisfies:
$$F(v_{t_1}, v_{t_2}, \ldots, v_{t_n}) = F(v_{t_1+c}, v_{t_2+c}, \ldots, v_{t_n+c})$$
for any integer $n$ and integer constant $c$.
Also known as "strict" stationarity.
Weak stationarity
The time series $\{v_t\}_{t=-\infty}^{\infty}$ is said to be weakly stationary if for all $t, s$ we have:
(i) $E[v_t] = \mu$ (independent of $t$);
(ii) $Var[v_t] = \Sigma$ (independent of $t$);
(iii) $Cov[v_t, v_s] = \Sigma_{t-s}$ (depends only on $t - s$).
- If $v_t$ is a scalar then $Cov[v_t, v_s]$ is the $|t-s|^{th}$ autocovariance of $v_t$.
- If $v_t$ is a vector then $Cov[v_t, v_s]$ is the $(t-s)^{th}$ autocovariance matrix of $v_t$:
  - diagonal elements are autocovariances of $v_{t,i}$;
  - off-diagonal elements are $Cov[v_{t,i}, v_{s,j}]$;
  - $Cov[v_t, v_s] = \{Cov[v_s, v_t]\}'$.
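For a scalar series, the $j^{th}$ autocovariance can be estimated by its sample analogue; the sketch below (not from the slides, with AR(1) parameters chosen purely for illustration) computes it directly.

```python
import numpy as np

def sample_autocov(v, j):
    """j-th sample autocovariance of a scalar series v:
    (1/T) * sum_{t=j+1}^{T} (v_t - vbar)(v_{t-j} - vbar)."""
    v = np.asarray(v, dtype=float)
    T = v.shape[0]
    vbar = v.mean()
    return np.sum((v[j:] - vbar) * (v[:T - j] - vbar)) / T

# Example: AR(1) process v_t = 0.6 v_{t-1} + e_t, whose j-th autocovariance is
# 0.6**j / (1 - 0.6**2); the sample versions should be close for large T.
rng = np.random.default_rng(1)
T = 5000
v = np.zeros(T)
for t in range(1, T):
    v[t] = 0.6 * v[t - 1] + rng.normal()
print([round(sample_autocov(v, j), 3) for j in range(4)])
```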
Weak dependence
Weak dependence places restrictions on the memory of $v_t$, that is, on the relationship between $v_t$ and $v_{t-s}$ as $s \to \infty$.
If $v_t$ is a weakly stationary process then weak dependence implies that $Cov[v_t, v_{t-s}] \to 0$ as $s \to \infty$, and at a sufficiently fast rate.
We can derive a WLLN and CLT for (strongly or weakly) stationary and weakly dependent series (subject to "certain other conditions" that are taken to hold without statement).
Note these conditions are sufficient, not necessary – see lecture notes.
Limit Theorems for time series
Weak Law of Large Numbers: Let $v_t$ be a stationary and weakly dependent time series with $E[v_t] = \mu$. Then, subject to certain other conditions, it follows that
$$T^{-1}\sum_{t=1}^{T} v_t \overset{p}{\to} \mu.$$
Central Limit Theorem: Let $v_t$ be a stationary and weakly dependent time series with $E[v_t] = \mu$ and $Cov[v_t, v_{t-j}] = \Gamma_j$. Then, subject to certain other conditions, it follows that
$$T^{-1/2}\sum_{t=1}^{T}(v_t - \mu) \overset{d}{\to} N(0, \Omega),$$
where $\Omega = \sum_{i=-\infty}^{\infty}\Gamma_i = \Gamma_0 + \sum_{i=1}^{\infty}\{\Gamma_i + \Gamma_i'\}$.
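A small simulation sketch (mine, with arbitrary parameter values) illustrating both results for an AR(1) series, which is stationary and weakly dependent: the sample mean settles near $\mu$, and the variance of $T^{1/2}(\bar{v} - \mu)$ across replications is close to the long-run variance $\Omega$ rather than $\Gamma_0$.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_ar1(T, mu=2.0, phi=0.5, sigma=1.0):
    """Simulate v_t - mu = phi (v_{t-1} - mu) + e_t, a stationary, weakly dependent series."""
    v = np.empty(T)
    v[0] = mu + rng.normal(scale=sigma / np.sqrt(1 - phi ** 2))  # start from stationary dist.
    for t in range(1, T):
        v[t] = mu + phi * (v[t - 1] - mu) + rng.normal(scale=sigma)
    return v

T, mu, phi = 1000, 2.0, 0.5

# WLLN: the sample mean of one long realization is close to mu.
print(simulate_ar1(T).mean())

# CLT: across replications, Var[ T^{1/2}(vbar - mu) ] is close to the long-run
# variance Omega = sigma^2 / (1 - phi)^2 = 4, not to Gamma_0 = 4/3.
z = np.array([np.sqrt(T) * (simulate_ar1(T).mean() - mu) for _ in range(1000)])
print(z.var(), 1.0 / (1 - phi) ** 2)
```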
More on CLT
$\Omega$ is known as the long run variance of $v_t$. The form of $\Omega = \lim_{T\to\infty}\Omega_T$ comes from:
$$\Omega_T = Var\left[T^{-1/2}\sum_{t=1}^{T}(v_t - \mu)\right] = T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} Cov[v_t, v_s]$$
$$= T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T}\Gamma_{t-s}, \quad \text{using stationarity},$$
$$= \Gamma_0 + \sum_{i=1}^{T-1}\frac{T-i}{T}\left(\Gamma_i + \Gamma_{-i}\right).$$
So $\lim_{T\to\infty}\Omega_T = \Gamma_0 + \sum_{i=1}^{\infty}\{\Gamma_i + \Gamma_i'\}$.
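In practice $\Omega$ has to be estimated; one common device is a Bartlett-weighted (Newey-West) sum of sample autocovariances. The sketch below is only an illustration of that idea for a scalar series, not a procedure prescribed in these slides.

```python
import numpy as np

def long_run_variance(v, bandwidth):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance
    Omega = Gamma_0 + sum_{i>=1} (Gamma_i + Gamma_i') for a scalar series."""
    v = np.asarray(v, dtype=float)
    T = v.shape[0]
    v = v - v.mean()
    gamma = lambda i: np.sum(v[i:] * v[:T - i]) / T   # i-th sample autocovariance
    omega = gamma(0)
    for i in range(1, bandwidth + 1):
        w = 1.0 - i / (bandwidth + 1.0)               # Bartlett weight
        omega += 2.0 * w * gamma(i)
    return omega

# Example: AR(1) with phi = 0.5 has true long-run variance 1/(1-0.5)^2 = 4; the
# estimate should be roughly close to this (the truncation induces some downward bias).
rng = np.random.default_rng(3)
T, phi = 5000, 0.5
v = np.zeros(T)
for t in range(1, T):
    v[t] = phi * v[t - 1] + rng.normal()
print(long_run_variance(v, bandwidth=int(4 * (T / 100) ** (2 / 9))))
```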
Finite sample properties of OLS
Recall model with stochastic regressors. Assumptions:
SR1: true model is $y = X\beta_0 + u$.
SR2: $X$ is stochastic.
SR3: $X$ is rank $k$ with probability 1.
SR4: $E[u|X] = 0$.
SR5: $Var[u|X] = \sigma_0^2 I_T$.
SR6: $u|X \sim$ Normal.
Argued OLS unbiased via:
$$E[\hat{\beta}_T] = \beta_0 + E_X\left[E_{u|X}\left[(X'X)^{-1}X'u\right]\right] = \beta_0 + E_X\left[(X'X)^{-1}X'E_{u|X}[u]\right] = \beta_0.$$
Assumption SR4 with time series
Nature of the condition $E[u|X] = 0$ in time series.
Whether it holds depends on the "degree of exogeneity" of the variables in $x_t$:
- $x_t$ is said to be contemporaneously exogenous if $E[u_t|x_t] = 0$ ⇒ $u_t$ and $x_t$ are uncorrelated.
- $x_t$ is said to be strictly exogenous if $E[u_t|\{x_t\}_{t=1}^{T}] = 0$ ⇒ $\{u_t\}_{t=1}^{T}$ and $\{x_t\}_{t=1}^{T}$ are uncorrelated.
Since $E[u_t|\{x_t\}_{t=1}^{T}] = E[u_t|X]$, $E[u|X] = 0$ only holds if $x_t$ is strictly exogenous.
Note: Assumption SR4 must fail if $x_t$ contains lagged values of $y$, see Lecture Notes. In these cases, in general, $E[\hat\beta_T] \neq \beta_0$.
Large sample analysis of OLS
Assume $x_t = (1, x_{2,t}')'$ and $x_{2,t}$ is a function of (vector) $h_t$ and $y_{t-i}, h_{t-j}$ (for $i, j > 0$).
Assumption TS1: $y_t = x_t'\beta_0 + u_t$, $t = 1, 2, \ldots, T$.
Assumption TS2: $(y_t, h_t')$ is a weakly stationary, weakly dependent time series.
Assumption TS3: $E[x_t x_t'] = Q$, a finite, positive definite matrix.
Assumption TS4: $E[u_t|x_t] = 0$ for all $t = 1, 2, \ldots, T$.
Assumption TS5: $Var[u_t|x_t] = \sigma_0^2$ for all $t = 1, 2, \ldots, T$.
Assumption TS6: For all $t \neq s$, $E[u_t u_s|x_t, x_s] = 0$.
Large sample analysis of OLS
Under these conditions we can use essentially the same arguments as for cross-section data to deduce the following results:
Theorem. If Assumptions TS1–TS4 hold then $\hat\beta_T$ is a consistent estimator of $\beta_0$.
Theorem. If Assumptions TS1–TS6 hold then $T^{1/2}(\hat\beta_T - \beta_0) \overset{d}{\to} N\left(0, \sigma_0^2 Q^{-1}\right)$.
All large sample inference procedures described in Lecture 4 go through.
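As an illustration of these results, the following sketch (plain NumPy, hypothetical parameter values) estimates the dynamic regression from earlier in the lecture by OLS and forms the conventional large-sample standard errors from $\hat\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate y_t = b1 + b2 y_{t-1} + b3 h_t + b4 h_{t-1} + u_t with illustrative values.
beta0 = np.array([1.0, 0.5, 0.8, -0.3])
T_star = 1000
h = rng.normal(size=T_star)
y = np.zeros(T_star)
for t in range(1, T_star):
    y[t] = beta0 @ np.array([1.0, y[t - 1], h[t], h[t - 1]]) + rng.normal()

# Effective sample: first observation used for conditioning (p = 1).
X = np.column_stack([np.ones(T_star - 1), y[:-1], h[1:], h[:-1]])
yy = y[1:]

# OLS and conventional large-sample standard errors from sigma2_hat * (X'X)^{-1}.
beta_hat = np.linalg.solve(X.T @ X, X.T @ yy)
resid = yy - X @ beta_hat
sigma2_hat = resid @ resid / (X.shape[0] - X.shape[1])
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
print(beta_hat)
print(se)
```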
Dynamic completeness
Regression models can be interpreted as a statement about the conditional mean of yt given an information set.
In cross-section data, this information set contains information about the $i^{th}$ sampling unit: $E[y_i|x_i] = x_i'\beta_0$.
In time series data, the relevant information set is not only $h_t$ but the history of both $y$ and $h$, that is,
$$I_t = \{h_t, y_{t-1}, h_{t-1}, y_{t-2}, h_{t-2}, \ldots, y_1, h_1\}.$$
Therefore, if the regression model specifies the conditional mean then we are really stating that we believe:
$$E[y_t|I_t] = x_t'\beta_0;$$
if this statement is true then the model is said to be dynamically complete.
Dynamic completeness
It is convenient to introduce this property as a specific assumption.
Assumption TS7: $E[y_t|I_t] = x_t'\beta_0$.
Note:
Assumption TS7 ⇒ Assumptions TS4 and TS6, see Lecture Notes.
OLS as projection
Suppose we wish to predict $y_t$ given $x_t$ using $\tilde{y}_t = c(x_t)$.
Issue: choice of $c(\cdot)$.
If we choose $c(\cdot)$ to minimize
$$MSE(\tilde{y}_t) = E\left[(y_t - \tilde{y}_t)^2\right]$$
then the solution is $c_o(x_t) = E[y_t|x_t]$.
If $c_o(\cdot)$ is unknown then we might restrict attention to the class of linear forecasts $y_t^{lp} = \alpha' x_t$, but then what should $\alpha$ be?
OLS as projection
The choice that minimizes MSE (over the class of linear forecasts) is the one associated with the linear projection of $y_t$ on $x_t$, which has the property
$$E[(y_t - \alpha'x_t)x_t] = 0$$
$$\Rightarrow \alpha = \{E[x_t x_t']\}^{-1} E[x_t y_t],$$
the population analogue of $\hat\beta_T$.
So OLS can be justified as an estimator of the weights in the linear projection of $y_t$ on $x_t$ – but this does not justify using the estimator to learn about how $x_t$ affects $y_t$; for that we need to impose assumptions about the relationship between the variables.
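A quick numerical check (my own sketch, simulated data) that the sample analogue of $\alpha = \{E[x_t x_t']\}^{-1}E[x_t y_t]$ coincides with the OLS estimator, since the $1/T$ factors cancel.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
x = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # x_t = (1, x_{2t}, x_{3t})'
y = x @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)      # hypothetical relationship

# Sample analogue of alpha = {E[x_t x_t']}^{-1} E[x_t y_t] ...
Sxx = (x.T @ x) / T
Sxy = (x.T @ y) / T
alpha_hat = np.linalg.solve(Sxx, Sxy)

# ... coincides with the OLS estimator (X'X)^{-1} X'y because the 1/T factors cancel.
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
print(np.allclose(alpha_hat, beta_hat))   # True
```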
Non-spherical errors
We have developed a large sample framework for inference with cross-section and time series data.
Do the assumptions always hold? No! In the next part of the course, we consider the consequences of violations of the assumptions about the second moments of the error term.
Terminology (using the fixed regressor model to simplify):
- If $Var[u] = \sigma_0^2 I_T$ then $u$ is said to have a spherical distribution.
- If $Var[u] \neq \sigma_0^2 I_T$ then $u$ is said to have a non-spherical distribution.
See Lecture notes Section 4.1 for origins of these terms.
Non-spherical errors
Specifically, we consider two examples of non-spherical errors:
- Cross-section data with $Var[u_i|x_i] = \sigma_i^2$ – heteroscedasticity
- Time series data with
  - $Var[u_t|x_t] = \sigma_t^2$ – heteroscedasticity
  - $Cov[u_t, u_s|x_t, x_s] \neq 0$ – serial correlation
OLS with non-spherical errors
Recall that our model is:
$$y = X\beta_0 + u$$
where
CA1: true model is $y = X\beta_0 + u$.
CA2: $X$ is fixed in repeated samples.
CA3: $X$ is rank $k$.
CA4: $E[u] = 0$.
CA5-NS: $Var[u] = \Sigma$ where $\Sigma$ is a $T \times T$ positive definite matrix.
CA6: $u \sim$ Normal.
OLS with non-spherical errors
$E[\hat\beta_T] = \beta_0$ because we impose Assumptions CA1-CA4.
$Var[\hat\beta_T] = (X'X)^{-1}X'E[uu']X(X'X)^{-1}$ and so
$$Var[\hat\beta_T] = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma_0^2(X'X)^{-1}.$$
$$\hat\beta_T \sim N\left(\beta_0, (X'X)^{-1}X'\Sigma X(X'X)^{-1}\right).$$
So the inference procedures from Lectures 2, 3 and 4 are not valid because they are based on the wrong formula for $Var[\hat\beta_T]$.
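The sketch below (fixed regressors and a known diagonal $\Sigma$, both purely illustrative) compares the correct sandwich variance $(X'X)^{-1}X'\Sigma X(X'X)^{-1}$ with what the spherical formula would report under heteroscedastic errors.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])

# A known non-spherical covariance matrix: heteroscedastic, uncorrelated errors.
sigma2_t = 0.5 + 1.5 * X[:, 1] ** 2
Sigma = np.diag(sigma2_t)

XtX_inv = np.linalg.inv(X.T @ X)
# Correct variance of OLS under Var[u] = Sigma (the sandwich formula).
var_sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
# What the spherical formula would report, using the average error variance.
var_spherical = sigma2_t.mean() * XtX_inv

print(np.sqrt(np.diag(var_sandwich)))
print(np.sqrt(np.diag(var_spherical)))   # differs, so the usual t/F inference is invalid
```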
Efficiency/GLS
The conditions of the Gauss-Markov Theorem do not hold, so this result cannot be used to justify OLS as BLUE.
Can we characterize the BLUE estimator? Yes!
It is the Generalized Least Squares (GLS) estimator of $\beta_0$,
given by
$$\hat\beta_{GLS} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y.$$
If Assumptions CA1-CA4, CA5-NS and CA6 hold then:
$$\hat\beta_{GLS} \sim N\left(\beta_0, (X'\Sigma^{-1}X)^{-1}\right).$$
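Assuming $\Sigma$ is known (here an AR(1)-type covariance chosen only for illustration), a minimal sketch of computing $\hat\beta_{GLS}$ and its variance $(X'\Sigma^{-1}X)^{-1}$, with OLS shown for comparison.

```python
import numpy as np

rng = np.random.default_rng(7)
T, rho = 300, 0.7
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta0 = np.array([1.0, 2.0])

# Known (illustrative) error covariance: Sigma_{ts} = rho^{|t-s|}, an AR(1) pattern.
idx = np.arange(T)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

# Draw errors with this covariance and generate y.
u = np.linalg.cholesky(Sigma) @ rng.normal(size=T)
y = X @ beta0 + u

# GLS: beta_GLS = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls, beta_ols)

# Variance of the GLS estimator under CA1-CA4 and CA5-NS: (X' Sigma^{-1} X)^{-1}.
print(np.linalg.inv(X.T @ Sigma_inv @ X))
```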
But need Σ – what happens if Σ is unknown?
Further reading
Notes: Sections 2.12, 3.3-3.5, 4.1, 4.2.
Greene:
- time series data: Section 20.1
- conditions for limit theorems: Sections 20.2 and 20.4 (but more detail than in the course)
- OLS with non-spherical errors: Sections 9.1 and 9.2 (finite sample part)