1 Notes on Time Series Models and Forecasting
These notes cover some relatively simple tools to help you understand and forecast
macroeconomic data. We will discuss stochastic processes, with a focus on
autoregressive (AR) models: how to work with these processes, how to estimate
their parameters, and how to forecast with them.
2 Stochastic Processes
We will use the concept of a stochastic process to develop our framework for making
forecasts. Forecasts are expressed formally using the expectations operator. We
describe both below.
2.1 Preliminaries
A stochastic process generates a sequence of random variables, indexed by time.
If {y_t} is a stochastic process, its sample path, or realization, is an assignment
to each date t of a possible value for y_t. Thus, a realization of {y_t} is a sequence
of real numbers, indexed by time. For example, suppose we have the variable
GNP, which is measured annually, and we have values for GNP for 50 years. Its
sample path is those 50 data points for GNP.
2.2 Understanding the Expectations Operator
We will use the expectations operator in these notes. This will help us mathematically
express how we make forecasts and how we evaluate the moments of
random variables. When applied to a random variable, the expectations operator
means to find the expected value of the random variable. The expected value
will be formed based on the random variable's stochastic process, which we will
discuss below. The expectations operator is denoted as E. It is a linear operator.
For the random variable y and the parameter \theta, this linearity means
that E(\theta y) = \theta E(y). That is, because the expectations operator is linear, we
can pull multiplicative constant terms outside the expectation.
The expectations operator can also be used to find the expected value of
a function of a random variable. For example, take the mean-zero random
variable y, which has a constant variance denoted by \sigma_y^2. Then we have
that E(y^2) = \sigma_y^2.
2.3 Introduction to Stochastic Processes
Note that we only observe one particular time series sample path, or realization,
of the stochastic process {y_t}. Ideally, we would like to observe many different
realizations of the stochastic process. If we did get a chance to see many
realizations (in particular, suppose that the number of realizations went to ∞),
then we would form the expected value of the random variable y at date t as:
E(y_t) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} y_{it}
This is called the ensemble mean. It is a theoretical concept, but is useful to help
us fully understand what a stochastic process is. Of course, we only see a single
realization of U.S. GNP – we don’t get a chance to see other realizations. In
some cases, however, the time series average of a single realization is a consistent
estimate of the mean, and is given by:
\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t
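To make the distinction concrete, here is a small Python sketch (not part of the original notes; the process and all parameter values are illustrative assumptions) that simulates many realizations of a simple stationary process and compares the ensemble mean at one date with the time-series average of a single realization:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.0            # assumed mean and standard deviation
N, T = 10000, 200               # number of realizations and length of each one

y = mu + sigma * rng.standard_normal((N, T))   # N sample paths of an i.i.d. process

ensemble_mean_at_t = y[:, 100].mean()   # average across realizations at one fixed date
time_average = y[0, :].mean()           # average over time of a single realization
print(ensemble_mean_at_t, time_average) # both are close to the assumed mean of 2.0

Both numbers should be close to the assumed mean of 2.0, which is what consistency of the time average requires for this simple process.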
Our goal for forecasting is to predict the future value of a random variable,
using information that is available today. That is, we will make a forecast of
the random variable, y, at a future time. Note that for definitional purposes,
we will always define the very first observation, or data point, of the random
variable to be period number 1.
Next, suppose we want to forecast the random variable y one year ahead
using information available to us right now at date t. We then want to formulate
the following mathematical expression:
E_t y_{t+1} = E(y_{t+1} \mid I_t)
Note what this expression tells us. E_t y_{t+1} means we make a forecast of the
variable y_{t+1}. Note that we have subscripted the expectations operator with
a "t": we are forming the expectation using information that is
available to us as of today (time t). On the right-hand side, I_t denotes the
information set that we use; it contains information up through today
(and may also include information from the past). Thus, this expression denotes
the mathematical prediction for the variable y one period into the future. We
will look at examples of this below.
In some cases, the unconditional mean of the random variable will be the
best predictor. But in other cases, we can use additional information to
predict the future, instead of just the mean of the random variable. These
notes tell us how to identify and estimate the parameters of the best predictor
of a random variable.
2.4 Autocovariance
Autocovariance tells us how a random variable at one point in time is related to
the random variable at a different point in time. Consider a mean zero sequence
of random variables: {xt}. The jth autocovariance is given by:
\gamma_{jt} = E(x_t x_{t-j})
Thus, autocovariance is just the covariance between a random variable at
different points of time. Specifically, this tells us the extent to which the random
variable today, and the random variable j periods ago tend to be statistically
related. If j = 1, then this formula tells us how the random variable is related
between today and yesterday. Notice that if j = 0, then the "0th" autocovariance
is just the variance: E(x_t x_t) = E(x_t^2). The autocovariance can be estimated
from a sample of data, and is given by:
\hat{\gamma}_j = \frac{1}{T - j} \sum_{t=1+j}^{T} x_t x_{t-j}
Note how we construct this autocovariance. We need to adjust the sample
length to take into consideration that we will be calculating relationships be-
tween current and past values of the random variable. To see this, suppose that
j = 1. In that case, we begin with observation number 2, which allows us to
connect the random variable in period 2 with the random variable in period 1,
which as you recall from above is the first period. Suppose that j = 2. Then we
begin in period 3, which allows us to connect the random variable in period 3
with the random variable in period 1.
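As a concrete illustration, the following Python sketch (illustrative, not from the notes) computes the jth sample autocovariance exactly as in the formula above, starting the sum at observation 1 + j; the simulated series is an arbitrary assumption:

import numpy as np

def sample_autocov(x, j):
    # jth sample autocovariance of a (mean-removed) series, summing from t = 1 + j
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                 # work with deviations from the sample mean
    T = len(x)
    return np.sum(x[j:] * x[:T - j]) / (T - j)

rng = np.random.default_rng(1)
x = rng.standard_normal(500)         # illustrative series; white noise here

gamma0 = sample_autocov(x, 0)        # sample variance
gamma1 = sample_autocov(x, 1)        # first autocovariance (near zero for white noise)
print(gamma0, gamma1, gamma1 / gamma0)   # the last ratio is the first autocorrelation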
Why are autocovariances important?
Autocovariance measures the covariance between a variable at two different
points in time. If there is a relationship between a variable at different points
in time, then we can potentially use current and past values of the random
variable to predict its future.
Forecasting: If we know the statistical relationship between a variable
at different points in time, this can help us forecast that variable in the future.
For example, suppose output is higher than normal today, and that when output
is higher than average today, it also tends to be higher than average tomorrow. This
will lead us to forecast output tomorrow to be higher than average. The details of
how we make forecasts will be considered later.
Economic Modeling: Our economic models summarize the behavior of
economic variables. If variables have large autocovariances, then some mecha-
nism is causing persistence in these variables, and our models should explain
that persistence through preferences, technologies, policies, or shocks. On the
other hand, if variables have zero autocovariances, then the variables have no
persistence, and our models should explain this as well.
2.5 Stationarity
2.5.1 Covariance Stationarity (Weak Stationarity)
If neither the mean of a random variable nor the autocovariances of the random
variable depend on the calendar date, then the stochastic process is called co-
variance stationary. This is also called a weakly stationary stochastic process.
The two terms are interchangeable. Technically, these requirements are given
by:
E(y_t) = \mu
E[(y_t - \mu)(y_{t-j} - \mu)] = \gamma_j
Note that the calendar date affects neither the mean nor the autocovariances
in the expressions above.
Example 1: Suppose {yt} is a mean zero process, with all autocovariances
equal to zero, and with a constant variance denoted as σ2. Verify that this
process is covariance stationary.
Example 2: Suppose y_t = \alpha t + \varepsilon_t, where t = 1, 2, 3, \ldots and \varepsilon_t is a normal
random variable with mean 0 and variance \sigma^2. Show that this process is not
covariance stationary.
Exercise: show that for a covariance stationary process, the following
property holds: E(y_t y_{t-j}) = E(y_t y_{t+j}).
(Note that since we define \gamma_j as E(y_t y_{t-j}), we define \gamma_{-j} as E(y_t y_{t+j}).)
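To build intuition for Example 2, here is a small simulation sketch (parameter values are assumed for illustration) showing that the ensemble mean of y_t = \alpha t + \varepsilon_t depends on the date t, so the process cannot be covariance stationary:

import numpy as np

rng = np.random.default_rng(2)
alpha, sigma, N, T = 0.5, 1.0, 5000, 100   # illustrative values

t = np.arange(1, T + 1)
y = alpha * t + sigma * rng.standard_normal((N, T))   # N realizations of the process

# Ensemble means at two different dates differ, so E(y_t) is not a constant mu.
print(y[:, 9].mean(), y[:, 49].mean())    # roughly alpha*10 = 5 and alpha*50 = 25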
2.5.2 Strict Stationarity
For what we will focus on in this class, covariance (weak) stationarity is what is
required. I will spend just a bit of time discussing strict stationarity. A process
is strictly stationary if the joint distribution for the stochastic process does not
depend on time. Note that a process that is strictly stationary with finite second
moments must be covariance stationary. Since many issues we are interested in
don't require strict stationarity, we will hereafter refer to a stationary time series
as one that is covariance stationary.
2.5.3 Autocorrelation
Just as it is useful to normalize covariances by dividing them by the respective
variables’ standard deviations, it is also useful to normalize autocovariances.
The jth autocorrelation is denoted \rho_j = \gamma_j / \gamma_0, and is given by:

\rho_j = \frac{E(y_t y_{t-j})}{\sqrt{E(y_t^2)} \sqrt{E(y_{t-j}^2)}}

Note that \rho_0 is equal to 1. Thus, the autocorrelation tells us the correlation
between a variable at two different points in time.
2.6 The White Noise Process
White noise is a serially uncorrelated process: all of the autocovariances
are zero. Consider the zero-mean process {\varepsilon_t}. Its first and
second moments are given by:
E(\varepsilon_t) = 0
E(\varepsilon_t^2) = \sigma^2
E(\varepsilon_t \varepsilon_\tau) = 0, \quad \tau \neq t
Note that this latter feature implies that the autocovariances are zero. For
the white noise process, the best prediction of the future value of the random
variable is just its mean value.
The white noise process is key because it is the building block for most of
the other stochastic processes that we are interested in.
The importance of white noise?
In studying certain classes of models, including rational expectations models,
we will see that the changes in the prices of some assets should be white noise. We
will also be interested in understanding how a macroeconomic variable responds
to a completely unanticipated change in an exogenous variable. This is called
impulse response function analysis. We will discuss this later in the course, time
permitting.
2.7 Moving Average Processes
Recall the white noise process, {\varepsilon_t}. We now use this process to build a process
in which one can use previous values of the variable to forecast future values.
The first such process is the moving average (MA) process. We first construct
the MA(1) process:
y_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}, \quad E(\varepsilon_t) = 0
This is called a moving average process because y_t is a weighted
average of current and past white noise terms.
The term \varepsilon_t is often called an innovation. Note that it is a mean-zero
process. We will also assume that \varepsilon has constant variance: E(\varepsilon_t^2) = \sigma^2.
The unconditional expectation of this process is:
E(y_t) = \mu + E(\varepsilon_t) + \theta E(\varepsilon_{t-1}) = \mu
The variance of y is a function of the variance of \varepsilon:
E(y_t - \mu)^2 = E(\varepsilon_t^2) + \theta^2 E(\varepsilon_{t-1}^2) = (1 + \theta^2)\sigma^2
The first autocovariance is given by:

E[(y_t - \mu)(y_{t-1} - \mu)] = \theta\sigma^2

To see this, we simply expand the expression as follows:

E[(y_t - \mu)(y_{t-1} - \mu)] = E[(\varepsilon_t + \theta\varepsilon_{t-1})(\varepsilon_{t-1} + \theta\varepsilon_{t-2})]

Expanding this further, we see that there are four terms in this expression:

E(\varepsilon_t \varepsilon_{t-1}), \quad E(\theta \varepsilon_t \varepsilon_{t-2}), \quad E(\theta \varepsilon_{t-1} \varepsilon_{t-1}), \quad E(\theta^2 \varepsilon_{t-1} \varepsilon_{t-2})

Because \varepsilon is a white noise process, only one of these four terms is non-zero,
namely E(\theta \varepsilon_{t-1} \varepsilon_{t-1}), which is equal to \theta\sigma^2.
Exercise: Verify that all other autocovariances are 0, and verify that this is
a covariance stationary process.
The MA(1) process has non-zero autocorrelation at lag 1. It is given by:

\rho_1 = \frac{\theta\sigma^2}{(1 + \theta^2)\sigma^2} = \frac{\theta}{1 + \theta^2}

The magnitude of this coefficient depends on the value of the parameter \theta.
But note that the maximum value is 0.5 (attained at \theta = 1).
How do shocks today affect this random variable today and into the future?
By construction, a one-unit shock to \varepsilon_t today changes the random variable y_t
by one unit. Tomorrow, this shock affects y_{t+1} by the factor \theta. But after
that, the shock today has no effect on the random variable of interest.
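A quick way to check the MA(1) formulas is by simulation. The following sketch (assumed parameter values, not part of the notes) simulates a long MA(1) sample and compares the sample first autocorrelation with \theta/(1 + \theta^2), and checks that higher-order autocorrelations are near zero:

import numpy as np

rng = np.random.default_rng(3)
mu, theta, sigma, T = 0.0, 0.6, 1.0, 200000   # assumed values

eps = sigma * rng.standard_normal(T + 1)
y = mu + eps[1:] + theta * eps[:-1]            # y_t = mu + eps_t + theta * eps_{t-1}

def acorr(x, j):
    # jth sample autocorrelation
    x = x - x.mean()
    return np.sum(x[j:] * x[:len(x) - j]) / np.sum(x * x)

print(acorr(y, 1), theta / (1 + theta**2))     # sample vs. theoretical rho_1 (about 0.44)
print(acorr(y, 2))                             # close to zero, as the theory implies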
The qth-order MA process is denoted MA(q), and is given by:

y_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}
Note that its variance is given by:
\gamma_0 = \left(1 + \sum_{i=1}^{q} \theta_i^2\right)\sigma^2

Exercise: verify that the variance is given by this formula.
For example, for the MA(2) process, the first autocorrelation, \rho_1, is given by:

\rho_1 = \frac{\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}
Note that in these higher-order MA processes, a shock today affects y for more
periods. In particular, the number of periods into the future that a shock today
has an effect equals the order of the process, q.
We will need one more assumption to talk about well-defined MA processes
when q is ∞. In this case, we will assume what is called square-summability.
This is a technical requirement:
\sum_{j=0}^{\infty} \theta_j^2 < \infty
If the process is square summable, then it is covariance stationary.

2.8 Autoregressive (Markov) Processes

Autoregressive, or Markov, processes are stochastic processes in which the random
variable is related to lagged values of itself. The first-order process, or AR(1)
process, is given by:

y_t = \mu + \phi y_{t-1} + \varepsilon_t

Assume that \varepsilon is a white noise process, with constant variance and mean zero.
We will focus on stationary AR processes. If |\phi| < 1, then the process is covariance
stationary. To see this, solve the difference equation using backwards substitution,
which yields:

y_t = \frac{\mu}{1 - \phi} + \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i}

Note that by solving this difference equation backwards, we have re-written it
as an MA(\infty). This is called the moving average representation of the AR(1).
This process is covariance stationary provided the following restriction holds:

\sum_{i=0}^{\infty} \phi^i = \frac{1}{1 - \phi} < \infty

The mean of this process is:

E(y_t) = \frac{\mu}{1 - \phi}

The variance is:

\gamma_0 = \frac{\sigma^2}{1 - \phi^2}

The jth autocovariance is:

\gamma_j = \frac{\sigma^2 \phi^j}{1 - \phi^2}

The jth autocorrelation is thus:

\rho_j = \frac{\gamma_j}{\gamma_0} = \phi^j

Exercise: Show that the unconditional mean of y_t is given by \frac{\mu}{1 - \phi}, and that
the variance is given by \frac{\sigma^2}{1 - \phi^2}.

The second-order autoregressive process is:

y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t

Recall from the study of difference equations that this equation is stable provided
that the roots of:

1 - \phi_1 z - \phi_2 z^2 = 0

lie outside the unit circle. If this is satisfied, then the process is stationary. Note
that econometric software programs will do this for you. Note that the equation
above is a quadratic equation, with two solutions (two roots). Thus we need both
of those roots to be greater than one in absolute value (outside the unit circle)
for the process to be stationary.

The importance of AR processes?

Statistically, almost ALL economic time series are well approximated by low-order
AR processes. Behaviorally, many of the dynamic economic models we use in
economics can be represented, or well approximated, as linear autoregressive
processes.

2.8.1 High Order AR Processes

The pth-order AR process is given by:

y_t = \mu + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t

It is stationary provided that the roots of

1 - \phi_1 z - \phi_2 z^2 - \ldots - \phi_p z^p = 0

all lie outside the unit circle. The autocovariances and autocorrelations are solved
for analogously to the second-order case.

2.9 ARMA Processes

ARMA processes contain both autoregressive and moving average components.
The ARMA(p,q) process is:

y_t = \mu + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

Stationarity requires the usual assumption on the roots of the pth-order AR
polynomial (that is, they all lie outside the unit circle):

1 - \phi_1 z - \phi_2 z^2 - \ldots - \phi_p z^p = 0

3 Principles of Forecasting

We now discuss forecasting using current and past values of variables or their
innovations. Define the collection of this information to be X_t. First we define
the forecast error:

\eta_{t+1} = y_{t+1} - y^*_{t+1|t}

The mean square forecast error is:

E(y_{t+1} - y^*_{t+1|t})^2

One possibility in forecasting is to minimize mean-square forecast error. If we
have linear models, such as:

y^*_{t+1|t} = \alpha' X_t

then the forecast that minimizes mean square error is the linear projection of y
on X, which satisfies the following:

E[(y_{t+1} - \alpha' X_t) X_t'] = 0

Note that this is analogous to least squares in regression analysis; the difference
is that the linear projection involves the population moments, while least squares
involves the sample moments.
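To connect the projection condition to something you have already seen, here is a Python sketch (simulated data; the AR(1) data-generating process and all parameter values are assumptions for illustration) showing that replacing the population moments in E[(y_{t+1} - \alpha' X_t) X_t'] = 0 with sample moments yields the familiar least-squares coefficients:

import numpy as np

rng = np.random.default_rng(4)
T, mu, phi, sigma = 5000, 1.0, 0.7, 1.0       # illustrative AR(1) data
y = np.zeros(T)
for t in range(1, T):
    y[t] = mu + phi * y[t - 1] + sigma * rng.standard_normal()

X = np.column_stack([np.ones(T - 1), y[:-1]])  # X_t = (1, y_t)
target = y[1:]                                 # y_{t+1}

# Sample-moment version of the projection condition: solve (X'X) alpha = X'y
alpha = np.linalg.solve(X.T @ X, X.T @ target)
print(alpha)    # approximately (mu, phi) = (1.0, 0.7)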
3.1 Forecasting an AR(1) Process

Let's forecast an AR(1) process, which is very straightforward. Note that the
big-picture idea here is that we exploit the fact that the future is related to the
past in order to make forecasts. The process is given by:

y_{t+1} = \phi_1 y_t + \varepsilon_{t+1}, \quad E(\varepsilon) = 0

Since our best forecast for the term \varepsilon_{t+1} is 0, our forecast is given by:

E_t y_{t+1} = \phi_1 y_t

The one-period forecast error is given by:

e^1_{t+1} = y_{t+1} - \phi_1 y_t = \varepsilon_{t+1}

The one-period forecast error variance is given by:

Var(y_{t+1} - \phi_1 y_t) = \sigma^2

Note that we can construct a forecast interval. Assuming that the data are
normally distributed, we can construct a 95% confidence interval around our
one-period forecast as:

E_t y_{t+1} \pm 1.96 \sqrt{\sigma^2}

where 1.96 is the 2.5% critical value of the standard normal distribution. For
two periods, we have the following:

y_{t+2} = \phi_1 y_{t+1} + \varepsilon_{t+2}

and our forecast is given by:

E_t y_{t+2} = \phi_1 E_t y_{t+1} = \phi_1^2 y_t

The two-period forecast error is given by:

e^2_{t+2} = y_{t+2} - \phi_1^2 y_t = \varepsilon_{t+2} + \phi_1 \varepsilon_{t+1}

The two-period forecast error variance is given by:

Var(y_{t+2} - \phi_1^2 y_t) = (1 + \phi_1^2)\sigma^2

Thus, the N-period forecast is given by:

E_t y_{t+N} = \phi_1^N y_t

Note that as N gets large, the forecast converges to the unconditional mean,
which is zero here.

Practice exercise: derive the formula for the variance of the N-period forecast
error.

If we have a non-zero mean, then it is easy to incorporate that component:

y_t = \mu + \phi_1 y_{t-1} + \varepsilon_t

Note that we also have:

y_{t+1} = \mu + \phi_1 y_t + \varepsilon_{t+1}

Now, let's form the forecast as follows:

E_t y_{t+1} = \mu + \phi_1 y_t

3.2 Forecasting an AR(2) Process

The stationary AR(2) process is:

y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t, \quad E(\varepsilon) = 0

The one-period forecast is given by:

E_t y_{t+1} = \mu + \phi_1 y_t + \phi_2 y_{t-1}

Let's look at a two-period forecast for the AR(2). Recall that we have:

y_{t+2} = \mu + \phi_1 y_{t+1} + \phi_2 y_t + \varepsilon_{t+2}, \quad E(\varepsilon) = 0

To construct the best two-period forecast, we begin as follows:

E_t y_{t+2} = \mu + \phi_1 E_t y_{t+1} + \phi_2 y_t

Substituting in from above for the forecast of y_{t+1}, we get:

E_t y_{t+2} = \mu + \phi_1(\mu + \phi_1 y_t + \phi_2 y_{t-1}) + \phi_2 y_t

Let's look at the 3-period-ahead forecast:

E_t y_{t+3} = \mu + \phi_1 E_t y_{t+2} + \phi_2 E_t y_{t+1}

Now, substitute in and we get:

E_t y_{t+3} = \mu + \phi_1(\mu + \phi_1(\mu + \phi_1 y_t + \phi_2 y_{t-1}) + \phi_2 y_t) + \phi_2(\mu + \phi_1 y_t + \phi_2 y_{t-1})

Note that as the forecast horizon gets longer, we get more and more terms in
the forecasting equation, but forecasting software packages will do this work for
you so that you don't have to! Note also that the same approach can be used for
higher-order AR processes. You just write out the forecasting equation and then
evaluate each term on the right-hand side, substituting in where necessary.

4 Estimating the Parameters of AR Models

OLS works well for estimating AR models. The idea is to treat the lagged
variables as explanatory variables (the "X" variables that you have learned about
in econometrics). We illustrate this with the AR(1). Minimizing the sum of
squared residuals is given by:

\min_{\mu, \phi} \sum_{t=2}^{T} (y_t - \mu - \phi y_{t-1})^2

This implies:

\begin{bmatrix} \mu_{ols} \\ \phi_{ols} \end{bmatrix} = \begin{bmatrix} T - 1 & \sum y_{t-1} \\ \sum y_{t-1} & \sum y_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum y_t \\ \sum y_{t-1} y_t \end{bmatrix}

where the sums run over t = 2, \ldots, T. The estimate for the innovation variance,
\sigma^2_{ols}, is given by:

\sigma^2_{ols} = \frac{\sum (y_t - \mu_{ols} - \phi_{ols} y_{t-1})^2}{T - 1}

Note that we lose one observation from the dataset because the right-hand-side
variable in the regression is lagged one period. For the AR(2), we lose two
observations, etc. Thus, for the AR(P) model, we estimate the parameters using
OLS, keeping in mind that we will lose P observations from the dataset.
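The following sketch (simulated data with assumed parameters; not part of the original notes) carries out the OLS estimation just described and then iterates the one-step forecasting rule to produce multi-step forecasts:

import numpy as np

rng = np.random.default_rng(5)
T, mu, phi, sigma = 500, 1.0, 0.8, 1.0         # assumed true values
y = np.zeros(T)
for t in range(1, T):
    y[t] = mu + phi * y[t - 1] + sigma * rng.standard_normal()

# OLS: regress y_t on a constant and y_{t-1}; one observation is lost to the lag.
X = np.column_stack([np.ones(T - 1), y[:-1]])
mu_ols, phi_ols = np.linalg.lstsq(X, y[1:], rcond=None)[0]
resid = y[1:] - mu_ols - phi_ols * y[:-1]
sigma2_ols = np.sum(resid**2) / (T - 1)
print(mu_ols, phi_ols, sigma2_ols)             # roughly 1.0, 0.8, 1.0

# Iterated forecasts from the last observation: E_t y_{t+n} = mu + phi * E_t y_{t+n-1}
forecasts, f = [], y[-1]
for n in range(1, 9):
    f = mu_ols + phi_ols * f
    forecasts.append(f)
print(forecasts)   # converge toward the unconditional mean mu_ols / (1 - phi_ols)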
For statistical inference, we can use the same tools that you have used previously
in econometrics. Specifically, one can use the t-test to test the significance of the
stationary AR parameters. The estimation of MA models is complex, so we will
leave that for you to learn in the next quarter.

5 Diagnostic Statistics

Typically, when we fit AR models for forecasting, we will not know the data
generating process. Therefore we will have to make an initial guess regarding
the type of model, and then test this guess. This can be boiled down as follows:
(1) Guess the model (start with an AR(1))
(2) Estimate the parameters
(3) Assess the adequacy of the model

5.1 Guessing the type of model

An important principle to keep in mind is simplicity: it is typically better to
consider simple models over complicated models. Thus, start from an AR(1),
and then test if it is adequate. If it is, then the residuals will be white noise.

Testing for residual autocorrelation

If you have estimated a decent model for the process, then the residuals from
the model should be white noise. In other words, there should be no autocorrelation
in those residuals. A simple approach is to graph the autocorrelations
of the residuals and visually inspect them to see if there is substantial autocorrelation.
A formal statistical test for white noise is the Ljung-Box test. This is
given as:

Q = T(T + 2) \sum_{\tau=1}^{P} \frac{r_\tau^2}{T - \tau}

where T is the number of observations, P is the number of autocorrelations
being tested, and r_\tau is the \tau th sample autocorrelation of the residuals. Under the null
hypothesis of white noise, the test statistic Q is distributed as a \chi^2 random
variable with P - p - q degrees of freedom, where p is the order of the AR component
of the model, and q is the order of the MA component of the model. A
useful approach is to pick P = 6, that is, 6 autocorrelations.
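As an illustration of these diagnostic steps, here is a hedged sketch using the statsmodels library (the data are simulated with assumed parameters, and the exact output format of these functions may vary across statsmodels versions):

import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(6)
T = 400
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.8 * y[t - 1] + rng.standard_normal()   # data truly AR(1)

res = AutoReg(y, lags=1, trend="c").fit()     # steps (1)-(2): guess AR(1), estimate by OLS
lb = acorr_ljungbox(res.resid, lags=[6], model_df=1)   # step (3): Q test with P = 6, p = 1
print(lb)   # a large p-value means we do not reject that the residuals are white noise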
6 Achieving Stationarity: Part 1

Economic time series often violate our assumption of covariance stationarity.
In particular, their mean is typically changing over time. Thus, the average
value of GDP in the U.S. in the 1990s is much higher than the average value of
U.S. GDP 100 years ago. For the time being, we will deal with this type of
nonstationarity simply by using stationarity-inducing transformations of the data.
We will now consider two of these transformations.

But before we develop these transformations, a preliminary transformation to
use is to take logs of the time series, unless they already are in logged form
(e.g., interest rates). This is useful, since the time series typically are growing,
and it is also a useful way of dealing with certain types of heteroskedasticity.

6.1 First-differencing

The first approach we will consider is to take first differences. Thus, after taking
logs, simply define a new variable, \Delta y_t, defined as:

\Delta y_t = y_t - y_{t-1}

Given that we have logged the variable, note that this transformation measures
the growth rate of the variable. This type of transformation almost always
induces stationarity for processes that have means (in log levels) that change
over time in a systematic way (e.g., trends). To understand this, note that the
log-difference transformation of a variable represents that variable in terms of
its growth rates; that is, log-differencing real GNP yields the growth rate of GNP.
Most growth rates of economic variables are stationary.

6.2 Removing Deterministic Trend Components

An alternative approach to inducing stationarity for processes that grow over
time is to remove a deterministic trend from their logged values. Removing a
linear trend means regressing the (logged) variable on a constant and a time
trend, y_t = \mu + \alpha t + u_t, and taking the residuals from that regression:

\hat{u}_t = y_t - \hat{\mu} - \hat{\alpha} t

In addition to removing linear trends, one may also add quadratic, cubic, etc.,
terms to this regression. In practice, removing these higher-order trend components
is not commonly done.
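To close, here is a small sketch (a simulated trending series with assumed parameters, not from the notes) showing both stationarity-inducing transformations: log first-differencing and removal of a linear trend by regression:

import numpy as np

rng = np.random.default_rng(7)
T = 200
# An illustrative "log GDP" series: a linear trend plus a small persistent component.
log_gdp = 7.0 + 0.02 * np.arange(T) + np.cumsum(0.01 * rng.standard_normal(T))

# Transformation 1: first differences of the log (approximate growth rates).
growth = np.diff(log_gdp)

# Transformation 2: residuals from a regression of log y_t on a constant and a trend t.
t = np.arange(T)
X = np.column_stack([np.ones(T), t])
coef = np.linalg.lstsq(X, log_gdp, rcond=None)[0]    # (mu_hat, alpha_hat)
detrended = log_gdp - X @ coef                       # u_hat_t = y_t - mu_hat - alpha_hat*t

print(growth.mean(), detrended.mean())   # growth is near 0.02; detrended series has mean ~0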