Some Simple Statistical Models
Australian National University
(James Taylor) 1 / 24
Introduction
Forecast: A statement about a future observable, given some set of
current information
Framework:
y_t is the quantity of interest at time t
could be any data: inflation rate, exchange rate, GDP, annual number
of tourists, course grades, etc.
we usually observe y_1, y_2, . . . , y_T
and want to describe y_{T+h}.
ŷ_{T+h} will denote the point forecast, our “best guess”
Main Approaches
Two main approaches: ad-hoc methods and model-based methods
Ad-hoc Methods:
Use rules of thumb, often reasonable
Example 1: Let ŷ_{T+h} = y_T. That is, just use the last observed value
Example 2: Let ŷ_{T+h} be some weighted average of past observations
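A minimal sketch of these two rules of thumb, in the same MATLAB style as the code later in the slides; the simulated series and the decaying weights are illustrative choices, not prescriptions from the lecture:
% two ad-hoc forecasts for an observed series y_1,...,y_T
T = 100; y = cumsum(randn(T,1));     % any observed series would do here
yhat_last = y(T);                    % Example 1: just use the last observed value
w = 0.5 .^ (1:T)'; w = w / sum(w);   % Example 2: weights that decay into the past
yhat_wavg = sum(w .* flipud(y));     % weighted average, most recent observation weighted most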
Ad-hoc Methods – Pros and Cons
Pros:
Appear reasonable (mostly)
Does not require us to specify a model
Modelling is a lot of work
We may not have a good model in mind
Easy to implement
Cons:
Doesn’t make the best use of the data
Not statistically justified
Difficult to analyse statistical properties of the forecast
Model-Based Approach
Build a statistical model, thinking carefully about the key features of
the data
Specify how the data are generated
Estimate the model parameters based on past observations
Use this to produce forecasts
Model-Based Approach – Pros and Cons
Pros:
Unified approach – can attack different problems using the same
approach
Can model relevant features of the data to produce good forecasts
Can analyse statistical properties of the forecasts
Cons:
Modelling is hard
Requires a background in statistical inference
Requires more involved programming
Data Generating Process
A model is a stylized description of the object of interest
We will work with statistical models which describe how the data are
generated
We’ll introduce a range of models which can generate
Autocorrelation
Trends
Cycles
The AR(1) Process
Fix y_1 = 0
For t = 2 onward, generate y_t according to
y_t = ρ y_{t-1} + ε_t ,  ε_t ~ N(0, σ²)
This is called a first-order autoregressive process or AR(1)
For ρ = 1, it is called a random walk. Why?
For |ρ| < 1, the process is stationary. Why?
Example AR(1) Process
Figure: AR(1) Processes. (a) Stationary Process; (b) Random Walk
Code for AR(1)
T = 100; y = [1:T]'; b = .8; a = 1; y(1) = 0;
for t = 2:T
y(t) = b*y(t-1) + a*randn;
end
x = 1:T; plot(x,y)
The MA(1) Process
Set ε_0 = 0 and draw ε_1, ε_2, . . . independently from N(0, σ²)
For t = 1 onward, generate y_t according to
y_t = θ ε_{t-1} + ε_t
This is called a first-order moving average process or MA(1).
Example MA(1) Processes
Figure: MA(1) Processes. (a) Small θ; (b) Large θ
Code for MA(1)
T = 100; e = [1:T]'; a = 1; b = 0.8;
for t = 1:T
e(t) = a*randn;
end
y = [1:T]'; y(1) = e(1);
for t = 2:T
y(t) = e(t) + b*e(t-1);
end
x = [1:T]'; plot(x,y)
Models with Trend and Cycle
Want a time series y_1, y_2, . . . where
y_t = μ_t + c_t + ε_t ,  ε_t ~ N(0, σ²)
with
Trend component μ_t
Cycle component c_t
Error term ε_t
Example - Model with trend and cycle
Let trend be linear: μ_t = a_0 + a_1 t
so a_0 is the level and a_1 is the slope of the trend
Let cycle be sinusoidal: c_t = b_1 sin(ωt) + b_2 cos(ωt)
amplitude and position are determined by b_1, b_2
frequency is determined by ω
Example Cyclical Processes
Figure: Trend + Cycle Models. (a) With drift; (b) Without drift
Code of model
T = 50; y = [1:T]'; a0 = 0; a1 = .5; c = .8; b1 = 1; b2 = 2; w = 1;
for t = 1:T
y(t) = a0 + a1*t + b1*sin(w*t) + b2*cos(w*t) + c*randn;
end
x = 1:T; plot(x,y)
Regression Model
Often we observe data in addition to the variable of interest
E.g. if forecasting international visitor arrivals, we also know
Exchange rates
Fuel costs
Major events in other tourist markets
We may be able to use this to generate a better forecast
Let x_{t-1} = (x_{1,t-1}, . . . , x_{k,t-1}) be a (row) vector of data for predicting y_t.
The linear regression model specifies the linear relationship between y_t
and the regressors as
y_t = x_{1,t-1} β_1 + · · · + x_{k,t-1} β_k + ε_t ,  ε_t ~ N(0, σ²)
where (β_1, . . . , β_k)′ is a (column) vector of regression coefficients
Simple Example
Let T = 3, k = 2. Then the regression model is
y_1 = x_{1,0} β_1 + x_{2,0} β_2 + ε_1
y_2 = x_{1,1} β_1 + x_{2,1} β_2 + ε_2
y_3 = x_{1,2} β_1 + x_{2,2} β_2 + ε_3
In applications T will (must) be much bigger than k.
Matrix notation
Usually we want to write this more concisely as a matrix form. In general
[ y_1 ]   [ x_{1,0}    x_{2,0}    ...  x_{k,0}   ] [ β_1 ]   [ ε_1 ]
[ y_2 ] = [ x_{1,1}    x_{2,1}    ...  x_{k,1}   ] [  .  ] + [ ε_2 ]
[  .  ]   [    .          .        .      .      ] [  .  ]   [  .  ]
[ y_T ]   [ x_{1,T-1}  x_{2,T-1}  ...  x_{k,T-1} ] [ β_k ]   [ ε_T ]
or more conveniently
y = Xβ + ε
Simple Example redux
[ y_1 ]   [ x_{1,0}  x_{2,0} ]           [ ε_1 ]
[ y_2 ] = [ x_{1,1}  x_{2,1} ] [ β_1 ] + [ ε_2 ]
[ y_3 ]   [ x_{1,2}  x_{2,2} ] [ β_2 ]   [ ε_3 ]
with
[ ε_1 ]       ( [ 0 ]   [ σ²  0   0  ] )
[ ε_2 ]  ~  N ( [ 0 ] , [ 0   σ²  0  ] )
[ ε_3 ]       ( [ 0 ]   [ 0   0   σ² ] )
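The slides give code for the AR(1), MA(1) and trend-plus-cycle models but not for the regression model, so here is a minimal simulation sketch in the same MATLAB style; T, k, the coefficient vector and σ are illustrative assumptions, and the final least-squares line is only a check that β can be recovered from the simulated data:
% simulate y = X*beta + e with independent N(0, sigma^2) errors (illustrative values)
T = 100; k = 2; beta = [1; -0.5]; sigma = 1;
X = randn(T,k);                  % regressors
y = X*beta + sigma*randn(T,1);   % generate the data
beta_hat = X\y;                  % least-squares estimate of beta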
Regression Model Finale
By making an appropriate change of variable (as we will see in a future
lecture), the regression model gives that y follows the multivariate
normal distribution:
y ~ N(Xβ, σ² I_T)
Combining these models
Very very often we will want to combine models, to make an ARX, or
ARMA, or ARMAX model (for example).
An ARX model would look like:
y_t = ρ y_{t-1} + x_{1,t-1} β_1 + · · · + x_{k,t-1} β_k + ε_t ,  ε_t ~ N(0, σ²)
Point Forecasts
Iterated and Direct Forecasts
Australian National University
(James Taylor) 1 / 14
Forecast Horizon
The forecast horizon is the number of periods between the current period
and the period which we forecast
Example: Annual GDP data, forecast GDP one year from now;
forecast horizon is one
Example: Quarterly inflation data, forecast inflation one year from
now; forecast horizon is four
Example: Monthly sales data, forecast sales one year from now;
forecast horizon is twelve
Models for the forecast horizon when h > 1
Iterated Forecasts
Direct Forecasts
Forecast Horizon – AR(1)
Usual AR(1) process:
y_t = ρ y_{t-1} + ε_t ,  ε_t ~ N(0, σ²)
We observe y_1, . . . , y_T, and want to forecast y_{T+1}
Assume we know the parameters ρ and σ.
What is a reasonable ŷ_{T+1}?
As y_{T+1} is a random variable, a reasonable estimate might be the
conditional expected value
Let I_t denote the information set at time t. Then
ŷ_{T+1} = E(y_{T+1} | I_T, (ρ, σ))
        = E(ρ y_T + ε_{T+1} | I_T, (ρ, σ))
        = E(ρ y_T | I_T, (ρ, σ)) + E(ε_{T+1} | I_T, (ρ, σ))
        = ρ y_T
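A small numerical illustration of this one-step forecast (a sketch only; ρ, σ and the simulated series are illustrative, not values from the slides):
% simulate an AR(1) series and form the one-step point forecast
rho = 0.8; sigma = 1; T = 100;
y = zeros(T,1);
for t = 2:T
y(t) = rho*y(t-1) + sigma*randn;
end
yhat_next = rho*y(T);   % E(y_{T+1} | I_T, (rho, sigma))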
Two-step-ahead Forecast
What about ŷ_{T+2}?
Can’t just use E(y_{T+2} | I_{T+1}, (ρ, σ)) because we don’t know I_{T+1}.
One option – iterate the AR(1) process
Iterated Forecasts
So we have
y_{T+2} = ρ y_{T+1} + ε_{T+2}
        = ρ(ρ y_T + ε_{T+1}) + ε_{T+2}
        = ρ² y_T + ρ ε_{T+1} + ε_{T+2}
Then taking conditional expectation we find
E(y_{T+2} | I_T, (ρ, σ)) = ρ² y_T
This is an iterated forecast.
More Iterations
If y is an AR(1) process it is straightforward to show that
E(y_{T+h} | I_T, (ρ, σ)) = ρ^h y_T
From this we see that for random walks
E(y_{T+h} | I_T, (ρ, σ)) = y_T
and for stationary processes
lim_{h→∞} E(y_{T+h} | I_T, (ρ, σ)) = 0
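A minimal sketch of how these iterated forecasts behave, with illustrative values of y_T, ρ and the horizon grid (none of these numbers come from the slides): ρ^h y_T decays towards 0 in the stationary case and stays at y_T for the random walk.
% iterated AR(1) forecasts yhat_{T+h} = rho^h * y_T for h = 1,...,H
yT = 2; H = 20; rho = 0.8;
h = (1:H)';
fc_stationary = (rho .^ h) * yT;   % |rho| < 1: decays towards 0
fc_randomwalk = yT * ones(H,1);    % rho = 1: stays at y_T for every h
plot(h, fc_stationary, h, fc_randomwalk)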
Direct Forecast
Instead of producing iterated forecasts, we could instead re-specify the
model to
y_{t+h} = ρ̃ y_t + ε_{t+h}
then find that E(y_{T+h} | I_T, (ρ̃, σ)) = ρ̃ y_T.
This is a direct h-step-ahead forecast
This is no longer an AR(1) model
It behaves very differently to our original model
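The slide does not say how ρ̃ would be obtained; one natural option, shown here only as a sketch with illustrative values, is to estimate it by least squares, regressing y_{t+h} on y_t over the observed sample:
% direct h-step forecast: fit y_{t+h} = rtilde*y_t + e_{t+h} by least squares
T = 200; rho = 0.8; h = 4;
y = zeros(T,1);
for t = 2:T
y(t) = rho*y(t-1) + randn;      % simulate a series to work with
end
rtilde = y(1:T-h) \ y(1+h:T);   % least-squares slope (no intercept)
yhat_direct = rtilde * y(T);    % direct forecast of y_{T+h}
yhat_iterated = rho^h * y(T);   % iterated forecast, for comparison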
Iterated vs Direct
May give quite different forecasts
Iterated forecasts often perform better than direct forecasts
Especially for large forecast horizons
But, we can’t always do an iterated forecast
Linear Regression Forecasts
Recall the linear model
y_t = x_{t-1} β + ε_t ,  ε_t ~ N(0, σ²)
Iterated forecasts are not possible for h > 1, because
y_{T+h} = x_{T+h-1} β + ε_{T+h}
and we have no idea what x_{T+h-1} could be.
No recursive relationship between x_t and x_{t-1}
Linear Regression Forecasts
So instead we will use a direct forecast model
y_t = x_{t-h} β̃ + ε_t ,  ε_t ~ N(0, σ²)
So that
y_{T+h} = x_T β̃ + ε_{T+h} ,  ε_{T+h} ~ N(0, σ²)
Taking conditional expectation we find
E(y_{T+h} | I_T, (β̃, σ)) = x_T β̃
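A sketch of the estimation and forecasting steps for this direct regression; the slides only state the model, so the simulated data, the horizon and the least-squares estimator below are illustrative assumptions:
% direct regression forecast: fit y_t = x_{t-h}*btilde + e_t, then forecast with x_T*btilde
T = 100; k = 2; h = 2;
X = randn(T,k);                               % row t holds the regressors observed at time t
btrue = [1; -0.5];
y = zeros(T,1);
y(1+h:T) = X(1:T-h,:)*btrue + randn(T-h,1);   % y_t generated from x_{t-h}
btilde = X(1:T-h,:) \ y(1+h:T);               % least-squares estimate of btilde
yhat = X(T,:) * btilde;                       % forecast of y_{T+h} given x_T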
Short vs Long Term
The forecast horizon will affect the choice of forecasting model.
For example, for GDP forecasts:
for near-future forecasts, short-term business cycle fluctuations will
drive almost all of the changes
for long-horizon forecasts, the business cycle matters very little and
the trend component becomes important
Short vs Long Term – Trend-cycle model
y_t = μ_t + c_t + ε_t ,  ε_t ~ N(0, σ²)
with cycle component
c_t = b_1 sin(ωt) + b_2 cos(ωt)
The c_t is bounded in time:
|c_t| = |b_1 sin(ωt) + b_2 cos(ωt)|
      ≤ |b_1 sin(ωt)| + |b_2 cos(ωt)|
      ≤ |b_1| + |b_2|
Short vs Long Term – Trend-cycle model
While the trend term μ_t is typically unbounded in time.
For example, if we specify a linear trend μ_t = a_0 + a_1 t with a_1 ≠ 0, then |μ_t| is
unbounded.
As |c_t| is bounded, and |μ_t| is not, we find
lim_{t→∞} |y_t / μ_t| = 1
That is, for large t, the variable y_t is determined almost entirely by
the trend component.
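A quick numerical check of this limit, reusing the parameter values from the earlier trend-plus-cycle code (the grid of dates is an illustrative choice):
% ratio y_t / mu_t for the linear trend plus cycle model at increasingly large t
a0 = 0; a1 = .5; c = .8; b1 = 1; b2 = 2; w = 1;
t = [10 100 1000 10000]';
mu = a0 + a1*t;
y = mu + b1*sin(w*t) + b2*cos(w*t) + c*randn(size(t));
disp([t, y./mu])   % the ratio approaches 1 as t grows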
Interval and Density Forecasts
Australian National University
(James Taylor) 1 / 8
More Informative Forecasts
In previous lecture modules we have discussed only point forecasts.
These give our best guess, ŷ_{T+h}, for y_{T+h}.
But a single value might not be enough information to aid decision
making
So we also consider Interval Forecasts, and
Density Forecasts
Interval Forecast
Forecast GDP growth rate next quarter
The point forecast might be -10%
How confident are we of this forecast? Are we nearly certain? Is it
just a guess?
What is the variability of the forecast?
Interval Forecast
Better Forecast: with probability 0.95 the growth rate will fall in
(−30%, −5%).
This is an interval forecast
While a point forecast gives a very succinct summary, the interval
forecast tells us something about the forecast uncertainty
Density Forecasts
Can we do even better than an interval forecast?
The future observable y_{T+h} is a random variable
So all the information about y_{T+h} is summarized in its probability
density function
What is a suitable pdf for y_{T+h}?
A good estimate would be the conditional density f(y_{T+h} | I_T, θ).
This is a density forecast.
Concerns
There are some issues with using f(y_{T+h} | I_T, θ)
It assumes θ is known.
It implicitly assumes a particular data generation process
So ignores both parameter and model uncertainty
We can deal with the first problem; the latter is trickier
AR(1) Density Forecast
y_t = ρ y_{t-1} + ε_t ,  ε_t ~ N(0, σ²)
We want a density forecast for y_{T+1}.
y_{T+1} = ρ y_T + ε_{T+1} ,  ε_{T+1} ~ N(0, σ²)
ρ and y_T are known, just ε_{T+1} is unknown.
But we know the distribution of ε_{T+1}.
So y_{T+1} ~ N(ρ y_T, σ²)
AR(1) Density Forecast
y_{T+1} ~ N(ρ y_T, σ²)
That is, y_{T+1} is distributed according to a normal distribution with
mean ρ y_T and variance σ².
So a 95% interval forecast is (ρ y_T − 1.96σ, ρ y_T + 1.96σ).
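A sketch of this density and interval forecast in the same MATLAB style as the earlier code, with illustrative values of ρ, σ and y_T (the slides treat these as known but give no numbers):
% one-step AR(1) density forecast: y_{T+1} ~ N(rho*y_T, sigma^2)
rho = 0.8; sigma = 1; yT = 2;
m = rho*yT;                                       % mean of the predictive density
interval95 = [m - 1.96*sigma, m + 1.96*sigma];    % 95% interval forecast
ygrid = linspace(m - 4*sigma, m + 4*sigma, 200);
pdf_forecast = exp(-(ygrid - m).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));
plot(ygrid, pdf_forecast)                         % the density forecast for y_{T+1}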