Outline
Random walk and unit-root tests
ARIMA
Seasonal models
Nonlinear models
Readings: SDA chapter 9; FTS chapter 4.
Stationarity condition
The beauty of a stationary process is that it can be modelled with relatively few
parameters. For example, we do not need a different expectation for each
observation Y_t; rather, they all share a common expectation.
An AR(1) process Y_t = δ + φ_1 Y_{t−1} + e_t is stationary if and only if
−1 < φ_1 < 1.
An AR(2) process Y_t = δ + φ_1 Y_{t−1} + φ_2 Y_{t−2} + e_t is stationary if and only if
−1 < φ_2 < 1, φ_2 + φ_1 < 1, φ_2 − φ_1 < 1.
An MA process is always stationary.
An ARMA(p,q) process is stationary if and only if its characteristic polynomial
φ(z) = 1 − φ_1 z − φ_2 z² − ⋯ − φ_p z^p satisfies φ(z) ≠ 0 when |z| ≤ 1,
i.e. all roots (including complex roots) of φ(z) lie outside the unit circle.
A stationary series should show oscillation around some fixed level, a phenomenon called mean-reversion.
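These root conditions are easy to check numerically. Below is a small Python sketch (the helper name `ar_is_stationary` is ours) that tests whether all roots of the AR polynomial lie outside the unit circle:

```python
import numpy as np

def ar_is_stationary(phi):
    """Check stationarity of an AR(p) process with coefficients phi = [phi_1, ..., phi_p]
    by testing whether all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    # numpy.roots expects coefficients from highest to lowest degree:
    # -phi_p z^p - ... - phi_1 z + 1
    poly = np.r_[-np.asarray(phi, dtype=float)[::-1], 1.0]
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.5]))        # AR(1) with phi = 0.5: stationary
print(ar_is_stationary([1.0]))        # unit root: not stationary
print(ar_is_stationary([0.5, 0.3]))   # AR(2) satisfying the three AR(2) conditions
```

The AR(2) case agrees with the triangle conditions above: φ_2 + φ_1 = 0.8 < 1, φ_2 − φ_1 = −0.2 < 1, and −1 < φ_2 < 1.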
Trends
Many economic, business, and financial series (e.g. interest rates, inflation rates, stock prices, and unemployment rates) do not conform to the assumption of stationarity; many are trending.
The results of classical estimation are derived under the assumption that the variables of concern are stationary. Standard techniques are largely invalid when the data are non-stationary.
(Figures: share prices, exchange rate, income.)
Trends
Many time series are trending and hence nonstationary. It is important to distinguish two kinds of trending processes:
Trend-stationary (TS): Y_t = ϑ + βt + ε_t, with ϑ, β constant; a process with a deterministic trend. Note that E[Y_t] = ϑ + βt is time dependent, but after removing the trend, Y_t − E[Y_t] = ε_t becomes stationary.
Difference-stationary (DS): Y_t = ϑ + Y_{t−1} + ε_t is a process with a stochastic trend; it contains a unit root and is called a random walk. Its demeaned series is still non-stationary, but its difference Y_t − Y_{t−1} = ϑ + ε_t becomes stationary.
Why is distinguishing the two kinds of trending important?
1) Shocks (e.g. bad news) have only transitory effects on TS series but permanent effects on DS series. It is interesting to know whether shocks have permanent or transitory effects.
2) To find an appropriate way to obtain a stationary process, based on which one can safely build models and make predictions.
3) For difference-stationary series such as the random walk, many asymptotic distributions change!
Trend-stationary
Consider a stationary AR(1) model with a deterministic linear trend term:
Y_t = θ Y_{t−1} + δ + γt + ε_t, t = 1, 2, …, T,
where |θ| < 1 and Y_0 is an initial value.
The solution for Y_t (MA representation) has the form
Y_t = θ^t Y_0 + μ + μ_1 t + ε_t + θ ε_{t−1} + θ² ε_{t−2} + θ³ ε_{t−3} + ⋯
where μ and μ_1 depend on δ and γ. Note that the mean,
E[Y_t] = θ^t Y_0 + μ + μ_1 t → μ + μ_1 t as t → ∞,
contains a linear trend, while the variance is constant:
var(ε_t + θ ε_{t−1} + θ² ε_{t−2} + ⋯) = σ² + θ² σ² + θ⁴ σ² + ⋯ = σ² / (1 − θ²).
We can analyse deviations from the mean, y_t; this is the same as a regression including a trend.
The original process, Y_t, is not stationary. The deviation from the mean,
y_t = Y_t − E[Y_t] = Y_t − μ − μ_1 t,
is a stationary process. The process Y_t is called trend-stationary.
The stochastic part of the process is stationary and shocks have transitory effects. We say that the demeaned process is mean-reverting.
Shock to a TS process
(Figure: simulated path y_0, …, y_100 driven by shocks ε_1, …, ε_100; the effect of each shock on a TS process is transitory.)
Unit root process
The random walk model Y_{t+1} = Y_t + ε_{t+1} is an example of a difference-stationary process.
It is called a random walk because each new value can be thought of as a random step away from the previous value. A random walk is a non-stationary process that has one unit root. The random walk model is equivalent to the naïve forecast method:
whatever happened last period will happen again this time.
Naïve forecasts trace the actual values well over time, though they lag behind.
Random Walk (RW)
Consider an AR(1) process with a unit root, φ = 1:
Y_t = Y_{t−1} + ε_t, ε_t ∼ IID(0, σ²), t = 1, 2, ⋯
Note that z = 1 is a root of the autoregressive polynomial θ(L) = 1 − L, so Y_t is non-stationary. The random walk model can be reformulated as
Y_t = Y_{t−1} + ε_t = Y_{t−2} + ε_{t−1} + ε_t = ⋯ = Y_0 + Σ_{i=1}^t ε_i.
With a random walk, the expected value of Y_t is always E(Y_0); it doesn't depend on t:
E(Y_t) = E(ε_t) + E(ε_{t−1}) + ⋯ + E(ε_1) + E(Y_0) = E(Y_0).
We say a random walk has long memory: no matter how far into the future we look, our best prediction of Y_{t+h} is today's value Y_t.
However, the variance of Y_t is time dependent; it increases with t and goes to infinity as t → ∞:
var(Y_t) = var(ε_t) + var(ε_{t−1}) + ⋯ + var(ε_1) = σ² t.
RW with drift
Form: Y_t = δ + Y_{t−1} + ε_t, δ ≠ 0.
It is an AR(1) model with coefficient φ = 1. The solution for Y_t is given by
Y_t = Y_0 + Σ_{i=1}^t (δ + ε_i) = Y_0 + δt + Σ_{i=1}^t ε_i,
with moments
E(Y_t) = δt + E(ε_t) + E(ε_{t−1}) + ⋯ + E(ε_1) + E(Y_0) = δt + E(Y_0)
and var(Y_t) = var(ε_t) + var(ε_{t−1}) + ⋯ + var(ε_1) = σ² t.
The effect of the initial value, Y_0, stays in the process.
The innovations ε_t are accumulated to a random walk, Σ_{i=1}^t ε_i. This is denoted a stochastic trend. Note that shocks have permanent effects.
The constant δ is accumulated to a linear trend in Y_t.
The process is denoted a random walk with drift.
Both the mean and the variance of Y_t change with time t.
Shock to a DS process
The effect of the initial value, Y_0, stays in the process. The innovations, ε_t, are accumulated to a random walk. The variance of Y_t grows with t.
This is denoted a stochastic trend. Shocks have permanent effects.
Difference operator
The difference operator Δ is used to express the difference between consecutive realizations of a time series. By Δy_t we denote the first-order difference of y_t:
Δy_t = y_t − y_{t−1}.
Note that Δ²y_t ≠ y_t − y_{t−2}. Instead,
Δ²y_t = Δ(Δy_t) = Δ(y_t − y_{t−1})
= (y_t − y_{t−1}) − (y_{t−1} − y_{t−2})
= y_t − 2y_{t−1} + y_{t−2}.
The difference operator can be expressed in terms of the lag operator because Δ = 1 − L. Hence Δ² = (1 − L)² = 1 − 2L + L².
In general, Δ^d = (1 − L)^d. From the binomial theorem,
(a + b)^d = Σ_{k=0}^d (d choose k) a^{d−k} b^k,
it follows that
Δ^d = (1 − L)^d = Σ_{k=0}^d (d choose k) (−L)^k.
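A quick numerical check of the second-difference identity, using numpy (the toy series is arbitrary):

```python
import numpy as np

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])   # arbitrary toy series

d2 = np.diff(y, n=2)                           # second difference, Δ²y_t = Δ(Δy_t)

print(np.allclose(d2, y[2:] - 2 * y[1:-1] + y[:-2]))   # True: Δ²y_t = y_t − 2y_{t−1} + y_{t−2}
print(np.allclose(d2, y[2:] - y[:-2]))                 # False: Δ²y_t ≠ y_t − y_{t−2}
```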
Differencing
A random walk with drift is a difference-stationary (DS) process. It is not stationary; even its demeaned series Y_t − E(Y_t) = Y_t − Y_0 − δt = Σ_{i=1}^t ε_i is not stationary. However, its difference, the change from one period to the next, ΔY_t = Y_t − Y_{t−1}, is stationary.
If a non-stationary time series Y_t has to be "differenced" d times to make it stationary, then Y_t is said to contain d "unit roots". It is customary to write Y_t ∼ I(d), which reads "Y_t is integrated of order d"; Y_t is also called an integrated process.
If Y_t ∼ I(1), then ΔY_t = Y_t − Y_{t−1} is stationary.
If Y_t ∼ I(2), then Δ²Y_t = (Y_t − Y_{t−1}) − (Y_{t−1} − Y_{t−2}) is stationary but ΔY_t is not stationary.
After estimating a model based on the differenced series, you can forecast both the differenced process (e.g. the return) and the original process (the price). Example: if the current price is 253 and the forecasted rate of return is 2.3%, the forecasted price is 259. If a 90% forecast interval for the rate of return is [1%, 4%], the 90% interval for the price level is [256, 263].
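The arithmetic of this example can be sketched as follows (Python, with the slide's illustrative numbers):

```python
# Forecast of the price level implied by a forecast of the (simple) rate of return.
# Numbers follow the slide's example: current price 253, forecast return 2.3%,
# 90% interval for the return [1%, 4%].
price = 253.0
r_hat, r_lo, r_hi = 0.023, 0.01, 0.04

p_hat = price * (1 + r_hat)
p_lo, p_hi = price * (1 + r_lo), price * (1 + r_hi)

print(round(p_hat))              # 259
print(round(p_lo), round(p_hi))  # 256 263
```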
Example
Simulations of 200 observations from AR(1) processes with various values of φ and μ = 0. The white noise process e_1, e_2, ⋯, e_200 is the same for all four AR(1) processes.
5_sim.R
Unit-root tests
A good way to think about unit-root tests is to compare two relevant models: H_0 and H_A.
What are the properties of the two models? Do they adequately describe the data?
Which one is the null hypothesis?
Consider two alternative tests:
Dickey-Fuller test: H_0 is a unit root, H_A is stationarity.
KPSS test: H_0 is stationarity, H_A is a unit root.
Challenge: unit-root tests have low power. Many economic time series are persistent, but is the root 0.98 or 1.0? This is often difficult to distinguish in practice.
Dickey-Fuller (DF) test
Consider first an AR(1) model that may have a root of absolute value 1:
Y_t = θ Y_{t−1} + ε_t.
If θ = 1, then Y_t is a non-stationary process, e.g. a random walk.
The unit-root null hypothesis against the stationarity alternative corresponds to H_0: θ = 1 against H_A: θ < 1.
Alternatively, the model can be formulated as
ΔY_t = Y_t − Y_{t−1} = (θ − 1) Y_{t−1} + ε_t = π Y_{t−1} + ε_t,
where ε_t is random noise and π = θ − 1 = −θ(1). The unit-root hypothesis becomes
H_0: π = 0 against H_A: π < 0.
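The DF regression can be run with plain least squares; only the reference distribution of the t-ratio changes. Below is a Python sketch on simulated data (the helper `df_stat` is ours; in practice τ is compared with tabulated DF critical values, not Student-t quantiles):

```python
import numpy as np

def df_stat(y):
    """DF regression without constant: dy_t = pi * y_{t-1} + eps_t.
    Returns (pi_hat, tau) with tau = pi_hat / se(pi_hat)."""
    dy, ylag = np.diff(y), y[:-1]
    pi_hat = (ylag @ dy) / (ylag @ ylag)
    resid = dy - pi_hat * ylag
    s2 = (resid @ resid) / (len(dy) - 1)
    se = np.sqrt(s2 / (ylag @ ylag))
    return pi_hat, pi_hat / se

rng = np.random.default_rng(1)
e = rng.normal(size=500)

rw = np.cumsum(e)                  # random walk: true pi = 0
ar = np.zeros(500)                 # stationary AR(1), theta = 0.5: true pi = -0.5
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + e[t]

print("random walk:", df_stat(rw))
print("AR(1):      ", df_stat(ar))  # tau strongly negative, against a unit root
```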
Dickey-Fuller (DF) test
The unit-root hypothesis becomes
H_0: π = 0 against H_A: π < 0.
Under H_0 the coefficient of Y_{t−1} equals zero (unit-root nonstationary), and under H_A this coefficient is negative.
Dickey-Fuller (DF) test statistic:
τ̂_n = (θ̂ − 1)/se(θ̂) = π̂/se(π̂), se(θ̂) = (σ̂² (Σ_{t=2}^n Y_{t−1}²)^{−1})^{1/2},
τ̂_n → (W(1)² − 1) / (2 (∫_0^1 W(u)² du)^{1/2}),
where θ̂ and σ̂² are the LS estimators of (θ, σ²), and W is a standard Wiener process.
In the random walk case (π = 0), conventional estimation methods such as least squares are still valid, BUT
τ̂_n is not Student-t distributed!
Reject H_0 if the test statistic (constructed with π̂) is smaller than the critical value (the critical values are tabulated). Equivalently, if the p-value is larger than 5%, we do not reject the null; this implies that the time series is not stationary but probably a DS series.
The DF distributions
DF test: Y_t = θ Y_{t−1} + ε_t, or ΔY_t = π Y_{t−1} + ε_t (zero mean)
DF with a constant: ΔY_t = π Y_{t−1} + τ + ε_t (single mean)
DF with a linear trend: ΔY_t = π Y_{t−1} + τ + γt + ε_t (trend)
Critical values are tabulated for each situation based on Monte Carlo experiments.
Dickey-Fuller (DF) test
The DF test is extended to an AR(p) model. Consider an AR(3):
Y_t = θ_1 Y_{t−1} + θ_2 Y_{t−2} + θ_3 Y_{t−3} + ε_t.
The test is most easily performed by rewriting the model:
y_t − y_{t−1} = (θ_1 − 1) y_{t−1} + θ_2 y_{t−2} + θ_3 y_{t−3} + ε_t
= (θ_1 − 1) y_{t−1} + (θ_2 + θ_3) y_{t−2} + θ_3 (y_{t−3} − y_{t−2}) + ε_t
= (θ_1 + θ_2 + θ_3 − 1) y_{t−1} + (θ_2 + θ_3)(y_{t−2} − y_{t−1}) + θ_3 (y_{t−3} − y_{t−2}) + ε_t.
Under the null, reformulate in terms of a stationary process:
Δy_t = π y_{t−1} + c_1 Δy_{t−1} + c_2 Δy_{t−2} + ε_t,
π = θ_1 + θ_2 + θ_3 − 1 = −θ(1), c_1 = −(θ_2 + θ_3), c_2 = −θ_3.
A unit root in θ(L) = 1 − θ_1 L − θ_2 L² − θ_3 L³ corresponds to θ(1) = 0.
The unit-root hypothesis is the same as before:
H_0: π = 0 against H_A: π < 0.
This is denoted the Augmented Dickey-Fuller (ADF) test.
Augmented Dickey-Fuller test
In the augmented Dickey-Fuller test, often called the ADF test, the null hypothesis is that there is a unit root. The usual alternative is that the process is stationary, but one can instead use the alternative that the process is explosive.
H_0: π = 0 against H_A: π < 0.
The DF distribution only holds if the errors are independent and identically distributed. The ADF test allows for autocorrelated errors by estimating a higher-order autoregression:
ΔY_t = π Y_{t−1} + Σ_{i=1}^p c_i ΔY_{t−i} + ε_t.
The purpose of adding the terms ΔY_{t−i} is to allow for ARMA error processes.
Problem: the choice of p.
If we increase p we hold the level (size) of the test, but we lose power!
ADF test: Level and Power
Suppose ε_t ∼ MA(1): ε_t = β ξ_{t−1} + ξ_t, with ξ_t IID(0, σ²). We have
var(ε_t) = σ²(1 + β²),
γ_1(ε_t) = cov(ε_t, ε_{t−1}) = β σ²,
γ_k(ε_t) = 0 for k ≥ 2.
For the process
y_t = θ y_{t−1} + ε_t = θ y_{t−1} + β ξ_{t−1} + ξ_t
we make simulations for the ADF test.

                   β = −0.99   β = −0.9   β = 0   β = 0.9
θ = 1,   p = 3:       0.995      0.722    0.045    0.034
θ = 1,   p = 11:      0.365      0.095    0.041    0.039
θ = 0.9, p = 3:       1.000      0.996    0.227    0.121
θ = 0.9, p = 11:      0.667      0.377    0.105    0.086

ADF test: simulated rejection probability for the process with a nominal level of 5% (Friedmann, 1992). When θ = 1, increasing p helps to reduce the wrong-rejection probability.
When θ < 1, the rejection probability unfortunately also falls, i.e. the chance of committing a type II error (accepting the null when it is wrong) increases with p.
Augmented Dickey-Fuller test
The level (size) is better held for larger p, but power is worse for large p.
Hence a conflict remains between the validity and the power of the test.
To determine the number of lags, p, we can use the usual procedures:
1. General-to-specific testing: start with p_max and delete insignificant lags.
2. Estimate possible models and use information criteria.
3. Make sure there is no autocorrelation.
Verbeek suggests calculating the DF test for all values of p. This is a robustness check, but why would we look at tests based on inferior/misspecified models?
Example: Danish Bond Rate
An AR(4) model gives
Δr_t = −0.0093 r_{t−1} + 0.4033 Δr_{t−1} − 0.0192 Δr_{t−2} − 0.0741 Δr_{t−3} + 0.0011,
with t-ratios (−0.672), (4.49), (−0.199), (−0.817), and (0.647).
Removing insignificant terms produces the model
Δr_t = −0.0122 r_{t−1} + 0.3916 Δr_{t−1} + 0.007,
with t-ratios (−0.911), (4.70), and (0.406).
The 5% critical value (T = 100) is −2.89, so we do not reject the null of a unit root.
We can also test for a unit root in the first difference. Deleting insignificant terms we find the preferred model
Δ²r_t (= Δr_t − Δr_{t−1}) = −0.6193 Δr_{t−1} − 0.00033,
with t-ratios (−7.49) and (−0.534).
Here we safely reject the null hypothesis of a unit root (−7.49 ≤ −2.89).
Based on the tests we conclude that r_t is non-stationary while Δr_t is stationary; that is, r_t ∼ I(1).
Unit root test
Remarks:
Another unit-root test is the Phillips-Perron test. It is similar to the Dickey-Fuller
test but differs in some details.
A third test is the KPSS test. The null hypothesis for the KPSS test is stationarity
and the alternative is a unit root, just the opposite of the hypotheses for the
Dickey-Fuller and Phillips-Perron tests.
KPSS (Kwiatkowski, Phillips, Schmidt, and Shin) test. H_0: stationarity.
Assume there is no trend:
Y_t = ξ_t + e_t,
where e_t is a stationary process and ξ_t is a random walk, i.e.
ξ_t = ξ_{t−1} + υ_t, υ_t ∼ IID(0, σ_υ²).
If σ_υ² = 0, then ξ_t = ξ_0 for all t and Y_t is stationary. That observation can be used to design a test:
H_0: σ_υ² = 0 against H_A: σ_υ² > 0.
Use a simple regression,
Y_t = μ̂ + ê_t,
to find the estimated stochastic component. Under the null, ê_t is stationary.
The test statistic is given by
KPSS = (1/T²) Σ_{t=1}^T S_t² / σ̂_∞²,
where S_t = Σ_{s=1}^t ê_s is a partial sum and σ̂_∞² is a consistent (long-run) variance estimator of ê_s.
Example: Inflation rates – Unit root test
Although the ADF and Phillips-Perron tests suggest that the inflation rate series is stationary since the null hypothesis of a unit root is rejected at 5% significance level, the KPSS test leads to the opposite conclusion. The null hypothesis for the KPSS is stationarity and it is rejected with a p-value smaller than 0.01.
The unit-root tests are contradictory. However, all three tests confirm stationarity of the change in the inflation rate.
5_inflation.R
Dating financial bubbles
Any signals of financial bubbles? If so, when do they start and end? The material is based on Phillips and Yu (2010).
Top: monthly data on the Nasdaq index from Feb 1973 to Jan 2009. Middle: monthly data on the house price index from Jan 1987 to Jan 2009. Bottom: monthly data on asset-backed commercial paper from Jan 2001 to Jan 2009.
Dating financial bubbles
Time series: NASDAQ, Home Price Index, Bond (monthly data). A model: the bubble period [τ_e, τ_f] with explosive behaviour,
X_t = ε_t + { δ X_{t−1}, t ∈ [τ_e, τ_f], δ > 1;  X_{t−1}, otherwise }.
Recursive Dickey-Fuller test: let τ be the fraction of the data being used and ρ_τ the lag-1 sample correlation computed from the data {X_t}_{t=1}^{⌊nτ⌋}. Define the Dickey-Fuller test statistic 𝒯_τ = nτ(ρ_τ − 1).
Any bubble: let r_0 be a given initial proportion. Define the test statistic 𝒯 = max_{τ ≥ r_0} 𝒯_τ.
Dating financial bubbles
Evidence of Migrations of Financial Bubbles.
Integrated ARMA Processes
Very often we observe that the mean and/or variance of an economic time series y_t changes over time. In this case, we say the series is nonstationary.
The series of changes from one period to the next, that is, the differenced series Δy_t, may have a mean and variance that do not change over time.
It is conceivable that higher-order differencing Δ^d y_t may be necessary to obtain a constant mean and variance.
An ARMA process is called an autoregressive integrated moving average (ARIMA) process, with order of differencing d, or an ARIMA(p, d, q) process, if it has the following form:
Φ(L) Δ^d y_t = Θ(L) ε_t.
The original level data y_t are recovered from the differenced series by appropriate summation ("integration"):
Δ^{d−1} y_t = Δ^{d−1} y_0 + Σ_{i=1}^t Δ^d y_i, d = 1, 2, …
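The summation step can be illustrated for d = 1 (Python sketch on an arbitrary level series):

```python
import numpy as np

y = np.array([2.0, 5.0, 4.0, 7.0, 11.0, 10.0])   # arbitrary level series

d = np.diff(y)                      # the d = 1 differenced series, Δy_t
recovered = y[0] + np.cumsum(d)     # y_t = y_0 + Σ_{i=1}^t Δy_i

print(np.allclose(recovered, y[1:]))   # True: levels recovered by summation
```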
What is an ARIMA(p,d,q) model?
The auto-regressive integrated moving average (ARIMA) model is a more general extension of AR models. The "I" in ARIMA stands for integrated, which basically means you're differencing the series.
In the STI example, Y is the price, Z is the return, d = 1, p = 1, and q = 0. In other words, we forecast based on an ARIMA(1,1,0) model.
ARIMA models
An ARIMA(p,d,q) model consists of:
an integrated process of order d, which stabilizes the original time series by differencing it d times (e.g. taking the d-th difference) until the new series becomes stationary (check with the ADF test). Don't over-difference.
an AR process of order p for the stationary series, where the past p values enter the regression on the stationary (differenced) series,
an MA process of order q for the stationary series, where the previous q forecast errors of the stationary (differenced) series are used to improve forecast accuracy.
You can denote a “regular” AR, MA or ARMA model using the same notation:
ARIMA(1,0,0) = ARMA(1,0) = AR(1) ARIMA(0,0,1) = ARMA(0,1) = MA(1) ARIMA(1,0,1) = ARMA(1,1)
Example: Inflation rates – ARIMA
In this example, the R function auto.arima is applied to the inflation rates. The ARIMA(1,1,1) model is selected by auto.arima using either AIC or BIC to select p and q after d = 1 is selected by the KPSS test.
5_inflation.R
Forecasting: Prediction Intervals
To assess the uncertainty associated with this prediction, we compute a confidence or prediction interval. The distributions of ŷ_t(h) and of the prediction error ε̂_t(h) = y_{t+h} − ŷ_t(h) are determined by the distribution of the ε_t.
Let z(α) denote the α × 100% quantile of the standard normal distribution. We have z(α/2) = −z(1 − α/2). Then
1 − α = Pr(−z_{1−α/2} ≤ ε̂_t(h)/σ(h) ≤ z_{1−α/2})
= Pr(−z_{1−α/2} σ(h) ≤ y_{t+h} − ŷ_t(h) ≤ z_{1−α/2} σ(h))
= Pr(ŷ_t(h) − z_{1−α/2} σ(h) ≤ y_{t+h} ≤ ŷ_t(h) + z_{1−α/2} σ(h)).
The interval
ŷ_t(h) ± z_{1−α/2} σ(h)
is called the (1 − α) × 100% h-step prediction interval. Usually, for α the values 0.05 or 0.01 are chosen.
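A minimal sketch of the interval computation (Python; the forecast 2.0 and standard error 0.5 are made-up values):

```python
from statistics import NormalDist

def prediction_interval(y_hat, sigma_h, alpha=0.05):
    """(1 - alpha) x 100% h-step interval: y_hat +/- z_{1-alpha/2} * sigma(h)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.975} ~ 1.96
    return y_hat - z * sigma_h, y_hat + z * sigma_h

# Hypothetical point forecast 2.0 with forecast standard error 0.5:
lo, hi = prediction_interval(2.0, 0.5)
print(round(lo, 2), round(hi, 2))   # 1.02 2.98
```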
Example: Forecasting Inflation rates
MA(3) for changes of inflation rates. ARIMA(0,1,3) for inflation rates.
5_forecast_inflation.R
Computing forecast limits by simulation
There are two important advantages to using simulation for forecasting:
1. simulation can be used in situations where standard software does not compute forecast limits, and
2. simulation does not require that the noise series be Gaussian.
Simulation can be used to compute forecast limits. A set of random forecasts up to m time units ahead is generated for an ARMA process by the recursion
Ŷ_{n+t} = μ̂ + φ̂_1 (Y_{n+t−1} − μ̂) + ⋯ + φ̂_p (Y_{n+t−p} − μ̂) + ê_{n+t} + θ̂_1 ê_{n+t−1} + ⋯ + θ̂_q ê_{n+t−q}, t = 1, ⋯, m,
where ê_j is the j-th residual if j ≤ n, and otherwise is a resample from the residuals. A large number, call it B, of sets of random forecasts are generated in this way. The α/2 upper and lower sample quantiles of the B random values are the forecast limits.
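The recursion and residual resampling can be sketched for an AR(1) with hypothetical fitted parameters (Python; B, m, and all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical fitted AR(1): (Y_t - mu) = phi (Y_{t-1} - mu) + e_t.
mu_hat, phi_hat = 0.0, 0.6
resid = rng.normal(0.0, 1.0, size=300)   # stand-in for the n in-sample residuals
y_n, m, B = 1.5, 8, 2000                 # last observation, horizon, replications

sims = np.empty((B, m))
for b in range(B):
    y = y_n
    shocks = rng.choice(resid, size=m)   # resample residuals with replacement
    for t in range(m):
        y = mu_hat + phi_hat * (y - mu_hat) + shocks[t]
        sims[b, t] = y

# alpha/2 lower and upper sample quantiles of the B paths are the forecast limits.
lower = np.quantile(sims, 0.025, axis=0)
upper = np.quantile(sims, 0.975, axis=0)
point = mu_hat + phi_hat ** np.arange(1, m + 1) * (y_n - mu_hat)
print("h=1 limits:", lower[0], upper[0])
print("h=8 limits:", lower[-1], upper[-1])
```

The limits widen with the horizon, as the AR(1) forecast variance accumulates.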
Example: Forecast limits of the changes of Inflation rates
5_forecast_inflation2.R
What should we consider when looking at time series data?
Data: Whole Foods Market, Inc. quarterly Sales ($M) 1995 — 2007.
Visual inspection via Time series plot: Time is measured on the horizontal axis; while the (dependent) variable of interest (quarterly sales) is measured on the vertical axis.
Trends
Seasonality
Cyclical elements
Autocorrelation
Random variation
Seasonal ARIMA models
Economic time series often exhibit strong seasonal variation. Here is a time series plot of the logarithms of quarterly urban housing starts in Canada from the first quarter of 1960 to final quarter of 2001. Housing starts are much lower in the first quarter than other quarters, jump to a peak in the second quarter, and then drop off slightly in the last two quarters.
Seasonal ARIMA{(p, d, q) × (p_s, d_s, q_s)_s} models
To remove seasonal nonstationarity, one uses seasonal differencing. Let s be the period. Define Δ_s = 1 − B^s, so that Δ_s Y_t = Y_t − Y_{t−s}.
The seasonal differencing operator Δ_s = 1 − B^s is not the non-seasonal differencing operator applied s times, Δ^s = (1 − B)^s.
The ARIMA{(p, d, q) × (p_s, d_s, q_s)_s} process is:
(1 − φ_1 B − ⋯ − φ_p B^p)(1 − φ*_1 B^s − ⋯ − φ*_{p_s} B^{s p_s}) {Δ^d Δ_s^{d_s} Y_t − μ}
= (1 + θ_1 B + ⋯ + θ_q B^q)(1 + θ*_1 B^s + ⋯ + θ*_{q_s} B^{s q_s}) e_t.
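Seasonal differencing is just a lag-s difference. A Python sketch on a toy quarterly series (trend plus a repeating quarterly pattern, so Δ_4 removes the pattern exactly; the numbers are arbitrary):

```python
import numpy as np

def seasonal_diff(y, s):
    """Seasonal differencing with period s: the lag-s difference y_t - y_{t-s}."""
    return y[s:] - y[:-s]

# Toy quarterly series: linear trend plus a fixed quarterly pattern.
t = np.arange(16)
pattern = np.tile([3.0, -1.0, 0.5, -2.5], 4)
y = 0.5 * t + pattern

d4 = seasonal_diff(y, 4)
print(np.allclose(d4, 2.0))               # True: pattern cancels, trend step 0.5*4 remains
print(np.allclose(np.diff(y, n=4), 2.0))  # False: (1-B)^4 is a different operator
```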
Example: seasonal differencing for housing starts
Economic time series often exhibit strong seasonal variation. For example, an investor in mortgage-backed securities might be interested in predicting future housing starts, and these are usually much lower in the winter months compared to the rest of the year.
Consider the logarithms of quarterly urban housing starts in Canada from the first quarter of 1960 to final quarter of 2001. The data are in the data set Hstarts in R’s Ecdat package.
5_house.R
Example: ARIMA{(1,1,1) × (0,1,1)_4} model for housing starts
Example: forecasting housing starts
Any evidence of nonlinear dynamics?
A possible non-stationary phenomenon in time series is different behaviour (a structural break) in one subsample (or regime) than in another. A potentially useful approach is to model this using nonlinear time series models.
If dates of regime switches are known
If the dates of the regime switches are known, the modelling can be worked out with dummy variables. For example, consider the following regression model:
y_t = x_t′ β_{S_t} + u_t, t = 1, ⋯, T,
where u_t ∼ N(0, σ²_{S_t}), S_t = 0 or 1 (regime 0 or 1), and
β_{S_t} = β_0 (1 − S_t) + β_1 S_t,
σ²_{S_t} = σ_0² (1 − S_t) + σ_1² S_t.
Thus under regime 1 (0), the coefficient parameter vector is β_1 (β_0) and the error variance σ_1² (σ_0²).
Consider an AR(1) model. Usually it is assumed that the possible difference between the regimes is a mean and volatility shift, but no autoregressive change. That is,
y_t = μ_{S_t} + φ(y_{t−1} − μ_{S_{t−1}}) + u_t, u_t ∼ N(0, σ²_{S_t}),
where μ_{S_t} = μ_0 (1 − S_t) + μ_1 S_t, and σ²_{S_t} is as defined above. If S_t is known a priori, then the problem is just a usual dummy-variable autoregression problem.
Markov switching models
y_t = μ_{S_t} + φ(y_{t−1} − μ_{S_{t−1}}) + u_t, u_t ∼ N(0, σ²_{S_t}).
In practice, however, the prevailing regime is not usually directly observable. Denote by
P(S_t = j | S_{t−1} = i) = p_ij, i, j = 0, 1,
the transition probabilities, with p_i0 + p_i1 = 1, i = 0, 1.
This kind of process, where the current state depends only on the state before, is called a Markov process, and the model a Markov switching model in the mean and the variance.
The probabilities in a Markov process can be conveniently presented in matrix form:
[ P(S_t = 0) ]   [ p_00  p_10 ] [ P(S_{t−1} = 0) ]
[ P(S_t = 1) ] = [ p_01  p_11 ] [ P(S_{t−1} = 1) ]
Estimation of the transition probabilities 𝑝𝑖𝑗 is usually done (numerically) by maximum likelihood.
Markov switching models
The conditional probability density function for the observation y_t, given the state variables S_t, S_{t−1} and the previous observations F_{t−1} = {y_{t−1}, y_{t−2}, ⋯}, is
f(y_t | S_t, S_{t−1}, F_{t−1}) = (2π σ²_{S_t})^{−1/2} exp( −(y_t − μ_{S_t} − φ(y_{t−1} − μ_{S_{t−1}}))² / (2σ²_{S_t}) ),
because u_t = y_t − μ_{S_t} − φ(y_{t−1} − μ_{S_{t−1}}) ∼ N(0, σ²_{S_t}).
The chain rule for conditional probabilities then yields the joint probability density function for the variables y_t, S_t, S_{t−1} given past information F_{t−1}:
f(y_t, S_t, S_{t−1} | F_{t−1}) = f(y_t | S_t, S_{t−1}, F_{t−1}) P(S_t, S_{t−1} | F_{t−1}),
such that the log-likelihood function to be maximized with respect to the unknown parameters becomes
l(θ) = Σ_{t=1}^T ln( Σ_{S_t=0}^1 Σ_{S_{t−1}=0}^1 f(y_t | S_t, S_{t−1}, F_{t−1}) P(S_t, S_{t−1} | F_{t−1}) ),
with θ = (p, q, φ, μ_0, μ_1, σ_0², σ_1²) and the transition probabilities
p := P(S_t = 0 | S_{t−1} = 0) and q := P(S_t = 1 | S_{t−1} = 1).
Chain rule for conditional probabilities: P(A ∩ B | C) = P(A | B ∩ C) · P(B | C).
Markov switching models
In order to find the conditional joint probabilities P(S_t, S_{t−1} | F_{t−1}), we use again the chain rule for conditional probabilities:
P(S_t, S_{t−1} | F_{t−1}) = P(S_t | S_{t−1}, F_{t−1}) P(S_{t−1} | F_{t−1}) = P(S_t | S_{t−1}) P(S_{t−1} | F_{t−1}),
where we have used the Markov property P(S_t | S_{t−1}, F_{t−1}) = P(S_t | S_{t−1}).
We note that the problem reduces to estimating the time-dependent state probabilities P(S_{t−1} | F_{t−1}) and weighting them with the transition probabilities P(S_t | S_{t−1}) to obtain the joint probabilities P(S_t, S_{t−1} | F_{t−1}).
This can be achieved as follows:
First, let P(S_0 = 1 | F_0) = π be given (such that P(S_0 = 0 | F_0) = 1 − π).
Then the probabilities P(S_{t−1} | F_{t−1}) are obtained using the following two-step algorithm.
Two steps algorithm
1. Given P(S_{t−1} = i | F_{t−1}), i = 0, 1, at the beginning of time t (the t-th iteration),
P(S_t = j, S_{t−1} = i | F_{t−1}) = P(S_t = j | S_{t−1} = i) P(S_{t−1} = i | F_{t−1}).
2. Once y_t is observed, we update the information set F_t = {F_{t−1}, y_t} and the probabilities by backwards application of the chain rule and using the law of total probability:
P(S_t = j, S_{t−1} = i | F_t) = P(S_t = j, S_{t−1} = i | F_{t−1}, y_t)
= f(S_t = j, S_{t−1} = i, y_t | F_{t−1}) / f(y_t | F_{t−1})
= f(y_t | S_t = j, S_{t−1} = i, F_{t−1}) P(S_t = j, S_{t−1} = i | F_{t−1}) / Σ_{s_t=0}^1 Σ_{s_{t−1}=0}^1 f(y_t | s_t, s_{t−1}, F_{t−1}) P(S_t = s_t, S_{t−1} = s_{t−1} | F_{t−1}).
We may then return to step 1 by applying again the law of total probability:
P(S_t = s_t | F_t) = Σ_{s_{t−1}=0}^1 P(S_t = s_t, S_{t−1} = s_{t−1} | F_t).
Once we have the joint probability for time point t, we can calculate the likelihood. The maximum likelihood estimate of θ is then obtained by iteratively maximizing the likelihood function, updating it at each iteration with the above algorithm.
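The two steps can be sketched in code. Below is a minimal Python implementation of one filter iteration for a simplified model with φ = 0 (a pure mean/variance switch); all parameter values and observations are made up for illustration:

```python
import math

# Hypothetical 2-state switching model with phi = 0, so f(y_t | S_t) = N(mu[S_t], sig2[S_t]).
mu, sig2 = [2.6, -3.3], [13.6, 20.8]
p, q = 0.86, 0.87                 # p = P(S_t=0|S_{t-1}=0), q = P(S_t=1|S_{t-1}=1)
P = [[p, 1 - p], [1 - q, q]]      # P[i][j] = P(S_t = j | S_{t-1} = i)

def norm_pdf(y, m, v):
    return math.exp(-(y - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def filter_step(prob_prev, y):
    """One iteration of the two-step algorithm.
    prob_prev[i] = P(S_{t-1} = i | F_{t-1}); returns P(S_t = j | F_t) and
    the one-step likelihood contribution f(y_t | F_{t-1})."""
    # Step 1: joint prior P(S_t = j, S_{t-1} = i | F_{t-1}).
    joint = [[P[i][j] * prob_prev[i] for j in range(2)] for i in range(2)]
    # Step 2: weight by the conditional density of y_t and renormalise.
    w = [[norm_pdf(y, mu[j], sig2[j]) * joint[i][j] for j in range(2)] for i in range(2)]
    lik = sum(sum(row) for row in w)
    # Law of total probability: marginalise over S_{t-1}.
    return [sum(w[i][j] for i in range(2)) / lik for j in range(2)], lik

# Start from the steady-state probabilities and filter three observations.
prob = [(1 - q) / (2 - p - q), (1 - p) / (2 - p - q)]
for y in [3.1, 2.4, -4.0]:
    prob, lik = filter_step(prob, y)
    print(y, [round(pr, 3) for pr in prob])
```

Summing the log of the `lik` values over the sample gives the log-likelihood to be maximized over θ.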
Probability in MS model
The probabilities P(S_0 = 1 | F_0) and P(S_0 = 0 | F_0) are called the steady-state probabilities and, given the transition probabilities p and q, are obtained as (exercise):
P(S_0 = 1 | F_0) = (1 − p) / (2 − p − q),
P(S_0 = 0 | F_0) = (1 − q) / (2 − p − q),
where p := P(S_t = 0 | S_{t−1} = 0) and q := P(S_t = 1 | S_{t−1} = 1).
Smoothed probabilities: recall that the state S_t is unobserved. However, once we have estimated the model, we can make inferences on S_t using all the information from the sample. This gives us
P(S_t = j | F_T), j = 0, 1,
which are called the smoothed probabilities.
In the estimation procedure we derived P(S_t = j | F_t), which are usually called the filtered probabilities.
Expected duration
The expected length of time the system stays in state j can be calculated from the transition probabilities. Let D_j denote the number of periods the system is in state j.
Application of the chain rule and the Markov property yields, for the probability of staying k periods in state j,
P(D_j = k) = p_jj^{k−1} (1 − p_jj),
which implies for the expected duration of that state
E(D_j) = Σ_{k=1}^∞ k P(D_j = k) = 1 / (1 − p_jj).
Note that in our case p_00 = p and p_11 = q.
Example: Are there long swings in the dollar/ sterling exchange rate?
Consider the following time series of the USD/GBP exchange rate from 1972I to 1996IV:
It appears that rather than being a simple random walk, the time series consists of distinct time periods of both upwards and downwards trends. In that case it may be put in a Markov switching framework as follows.
Example: Are there long swings in the dollar/ sterling exchange rate?
Model changes Δ𝑥𝑡 in the exchange rate as
Δ𝑥𝑡 = 𝛼0 +𝛼1𝑆𝑡 +𝜖𝑡,
so that Δx_t ∼ N(μ_0, σ_0²) when S_t = 0 and Δx_t ∼ N(μ_1, σ_1²) when S_t = 1,
where μ_0 = α_0 and μ_1 = α_0 + α_1 constitute two different drifts (if α_1 ≠ 0) in the random walk model.
Estimating the model from quarterly data over the sample period 1972I to 1996IV gives:

Parameter      Estimate   Std err
μ_0            2.605      0.964
μ_1            −3.277     1.582
σ_0²           13.56      3.34
σ_1²           20.82      4.79
p (regime 0)   0.857      0.084
q (regime 1)   0.866      0.097

The expected length of stay in regime 0 is 1/(1 − p) = 7.0 quarters, and in regime 1 it is 1/(1 − q) = 7.5 quarters.
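With the estimates above, the expected durations follow directly from E(D_j) = 1/(1 − p_jj):

```python
# Expected regime durations E(D_j) = 1 / (1 - p_jj), using the estimated
# transition probabilities from the USD/GBP example.
p, q = 0.857, 0.866        # p = P(stay in regime 0), q = P(stay in regime 1)

dur0 = 1 / (1 - p)         # expected stay in regime 0
dur1 = 1 / (1 - q)         # expected stay in regime 1
print(round(dur0, 1), round(dur1, 1))   # 7.0 7.5 quarters
```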
What if the number of regimes is unknown or even the autoregression is switching (Optional)
Adaptive approach to automatically detect significant switch.
The goal is to find the longest interval of homogeneity.
References: Chen and Spokoiny (2015), Modeling Nonstationary and Leptokurtic Financial Time Series, Econometric Theory, Volume 31, Issue 04, pp 703-728.
Chen and Niu (2014), Adaptive Dynamic Nelson-Siegel Term Structure Model with Applications, Journal of Econometrics, Vol. 180, pp. 98 – 115.
Chen, Härdle and Pigorsch (2010), Localized Realized Volatility Modelling, Journal of the American Statistical Association, Vol. 105, No. 492: 1376-1393.
R lab
Run the following code to input the Tbrate data set in the Ecdat package and plot the three quarterly time series in this data set as well as their auto- and cross-correlation functions. The last three lines of code run augmented Dickey- Fuller tests on the three series.
data(Tbrate, package = "Ecdat")
library(tseries)
# r = the 91-day treasury bill rate
# y = the log of real GDP
# pi = the inflation rate
plot(Tbrate)
acf(Tbrate)
adf.test(Tbrate[,1])
adf.test(Tbrate[,2])
adf.test(Tbrate[,3])
Data: Rlab5_Tbrate.txt R: Rlab5.R
R lab
Problem 1
a) Describe the signs of nonstationarity seen in the time series and ACF plots.
b) Use the augmented Dickey-Fuller tests to decide which of the series are nonstationary. Do the tests corroborate the conclusions of the time series and ACF plots?
Next run the augmented Dickey-Fuller test on the differenced series and plot the differenced series using the code below. Notice that the pairs function creates a scatterplot matrix, but the plot function applied to time series creates time series plots. [The plot function would create a scatterplot matrix if the data were in a data.frame rather than having "class" time series (ts). Check the class of diff_rate with attr(diff_rate, "class").] Both types of plots are useful. The former shows cross-sectional associations, while the time series plots are helpful when deciding whether differencing once is enough to induce stationarity. You should see that the first-differenced data look stationary.
R lab
diff_rate = diff(Tbrate)
adf.test(diff_rate[,1])
adf.test(diff_rate[,2])
adf.test(diff_rate[,3])
pairs(diff_rate)  # scatterplot matrix
plot(diff_rate)   # time series plots
Next look at the autocorrelation functions of the differenced series. These will be on the diagonal of a 3 × 3 matrix of plots. The off-diagonal plots are cross- correlation functions, which will be discussed in vector autoregressive models and can be ignored for now.
acf(diff_rate) # auto- and cross-correlations
Problem 2
1. Do the differenced series appear stationary according to the augmented Dickey-Fuller tests?
2. Do you see evidence of autocorrelations in the differenced series? If so, describe these correlations.
R lab
For the remainder of this lab, we will focus on the analysis of the 91-day T-bill rate. Since the time series are quarterly, it is good to see if the mean depends on the quarter. One way to check for such effects is to compare boxplots of the four quarters. The following code does this. Note the use of the cycle function to obtain the quarterly period of each observation; this information
is embedded in the data and cycle simply extracts it.
par(mfrow = c(1,1))
boxplot(diff_rate[,1] ~ cycle(diff_rate))
Problem 3 Do you see any seasonal differences in the boxplots? If so, describe them.
Regardless of whether seasonal variation is present, for now we will look at nonseasonal models. Seasonal models are introduced in Section 10.1. Next, use the auto.arima function in the forecast package to find a “best-fitting“ nonseasonal arima model for the T-bill rates. The specifications max.P=0 and max.Q=0 force the model to be nonseasonal, since max.P and max.Q are the number of seasonal AR and MA components.
library(forecast)
auto.arima(Tbrate[,1], max.P = 0, max.Q = 0, ic = "aic")
R lab
Problem 4
1. What order of differencing is chosen? Does this result agree with your previous conclusions?
2. WhatmodelwaschosenbyAIC?
3. Which goodness-of-fit criterion is being used here?
4. Change the criterion to BIC. Does the best-fitting model then change?
Finally, refit the best-fitting model with the following code, and check for any residual autocorrelation. You will need to replace the three question marks by the appropriate numerical values for the best-fitting model.
fit1 = arima(Tbrate[,1], order = c(?,?,?))
acf(residuals(fit1))
Box.test(residuals(fit1), lag = 10, type = "Ljung")
Problem 5 Do you think that there is residual autocorrelation? If so, describe this autocorrelation and suggest a more appropriate model for the T-bill series.
R lab
GARCH effects, that is, volatility clustering, can be detected by looking for auto-correlation in the mean-centered squared residuals. Another possibility is that some quarters are more variable than others. This can be detected for quarterly data by autocorrelation in the squared residuals at time lags that are a multiple of 4. Run the following code to look at autocorrelation in the mean- centered squared residuals.
resid2 = residuals(fit1)^2
acf(resid2)
Box.test(resid2, lag = 10, type="Ljung")
Problem 6 Do you see evidence of GARCH effects?
This example shows how to forecast a time series using R. Run the following code to fit a nonseasonal ARIMA model to the quarterly inflation rate. The code also uses the predict function to forecast 36 quarters ahead. The standard errors of the forecasts are also returned by predict and can be used to create prediction intervals. Note the use of col to specify colors. Replace c(?,?,?) by the specification of the ARIMA model that minimizes BIC.
data(Tbrate,package="Ecdat")
# r = the 91-day Treasury bill rate
# y = the log of real GDP
# pi = the inflation rate
dat = as.data.frame(Tbrate)
attach(dat)
library(forecast)
# fit the nonseasonal ARIMA model found by auto.arima
auto.arima(pi,max.P=0,max.Q=0,ic="bic")
fit = arima(pi,order=c(?,?,?))
forecasts = predict(fit,36)
plot(pi,xlim=c(1980,2006),ylim=c(-7,12))
lines(seq(from=1997,by=.25,length=36), forecasts$pred, col="red")
lines(seq(from=1997,by=.25,length=36), forecasts$pred + 1.96*forecasts$se, col="blue")
lines(seq(from=1997,by=.25,length=36), forecasts$pred - 1.96*forecasts$se, col="blue")
Problem 7 Include the plot with your work.
a) Why do the prediction intervals (blue curves) widen as one moves farther into the future?
b) What causes the predictions (red) and the prediction intervals to wiggle initially?
R lab – Results & Discussions
Problem 1
a) Describe the signs of nonstationarity seen in the time series and ACF plots.
None of the three time series plots shows clear signs of mean-reversion. The plot of the log of GDP is clearly nonstationary, and the Treasury bill rate and inflation rate may also be nonstationary. The ACF plots show slow, nearly linear decay, especially for log(GDP), which also suggests that none of the series is stationary.
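The slow, nearly linear ACF decay is exactly the signature of a stochastic trend. As a quick illustration (a Python sketch, not part of the lab; the simulated series and the sample_acf helper are ours), the sample ACF of a simulated random walk stays near 1 for many lags, while the ACF of its first difference drops to roughly zero:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc**2)
    return np.array([np.sum(xc[lag:] * xc[:-lag]) / denom
                     for lag in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
e = rng.normal(size=500)
rw = np.cumsum(e)                      # random walk: nonstationary, stochastic trend

acf_rw = sample_acf(rw, 10)            # stays near 1, decays slowly
acf_diff = sample_acf(np.diff(rw), 10) # first difference: near zero at all lags

print(acf_rw[0], acf_rw[9], acf_diff[0])
```

The same contrast is what the time series and ACF plots of the Tbrate data show before and after differencing.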
Problem 1
b) Use the augmented Dickey-Fuller tests to decide which of the series are nonstationary. Do the tests corroborate the conclusions of the time series and ACF plots?
> adf.test(Tbrate[,1])
Augmented Dickey-Fuller Test
data: Tbrate[, 1]
Dickey-Fuller = -1.925, Lag order = 5, p-value = 0.6075
alternative hypothesis: stationary
In the ADF tests, the null hypothesis is unit-root nonstationarity and the alternative is stationarity. None of the p-values is below 0.05 (or even 0.1), so we cannot reject the null hypothesis that each of the three series is nonstationary. The ADF tests corroborate the conclusions from the plots.
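The mechanics behind the test can be sketched as follows (a simplified Dickey-Fuller regression without augmentation lags, written in Python for illustration; the df_tstat helper is ours, not the tseries implementation). We regress the first difference on the lagged level: a coefficient near zero (small-magnitude t-statistic) is consistent with a unit root, while a strongly negative t-statistic rejects it:

```python
import numpy as np

def df_tstat(y):
    """t-statistic for gamma in: diff(y)_t = c + gamma * y_{t-1} + e_t.
    Under the unit-root null, gamma = 0; large negative values reject."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(2)
rw = np.cumsum(rng.normal(size=300))   # unit-root (nonstationary) series
ar = np.zeros(300)                     # stationary AR(1) with phi = 0.5
for t in range(1, 300):
    ar[t] = 0.5 * ar[t-1] + rng.normal()

t_rw, t_ar = df_tstat(rw), df_tstat(ar)
print(t_rw, t_ar)   # t_rw: small magnitude; t_ar: strongly negative
```

The actual adf.test adds lagged differences to the regression (the "Lag order = 5" in the output) and uses Dickey-Fuller critical values rather than the t distribution.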
Problem 2
1. Do the differenced series appear stationary according to the augmented Dickey-Fuller tests?
Yes, they appear stationary. In each case, the p-value is below 0.01 so we can reject the null hypothesis of nonstationarity.
> adf.test(diff_rate[,1])
Augmented Dickey-Fuller Test
data: diff_rate[, 1]
Dickey-Fuller = -5.2979, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(diff_rate[, 1]) : p-value smaller than printed p-value
Problem 2
2. Do you see evidence of autocorrelations in the differenced series? If so, describe these correlations.
There is evidence of auto- and cross-correlation, though none of it is particularly strong. There is autocorrelation in all three series at a lag of 1 quarter. This autocorrelation is negative in the differenced inflation rate series (pi). (Negative correlation can be caused by differencing.) There is also some cross-correlation between the differenced Treasury bill rate (r) and the differenced inflation rate series (pi).
There are no strong correlations at lags of 1 or more years, so seasonal effects are not evident.
Problem 3 Do you see any seasonal differences in the boxplots? If so, describe them.
The boxplots are similar, so there do not appear to be any major differences in level among the four quarters. There may be some rather minor differences in volatility, with the 1st quarter being the most variable and the 4th quarter the least variable. (The ACF plotted in Problem 6 will give more evidence of quarterly differences in volatility.)
Problem 4
a) What order of differencing is chosen? Does this result agree with your previous conclusions?
b) What model was chosen by AIC?
c) Which goodness-of-fit criterion is being used here?
d) Change the criterion to BIC. Does the best-fitting model then change?

> auto.arima(Tbrate[,1],max.P=0,max.Q=0,ic="aic")
Series: Tbrate[, 1]
(a) d = 1 differencing was chosen. This agrees with the previous conclusions
that the original series is nonstationary, but the first differences are stationary.
(b) The ARIMA(0,1,1) model was selected as best.
(c) AIC was used to select the model.
(d) BIC chose the same model as AIC, as can be seen in the output below.
ARIMA(0,1,1)
Coefficients: ma1 = 0.3275 (s.e. 0.0754)
sigma^2 estimated as 0.8096: log likelihood = -245.65
AIC = 495.3   AICc = 495.37   BIC = 501.76
> auto.arima(Tbrate[,1],max.P=0,max.Q=0,ic="bic")
Series: Tbrate[, 1]
ARIMA(0,1,1)
Coefficients: ma1 = 0.3275 (s.e. 0.0754)
sigma^2 estimated as 0.8096: log likelihood = -245.65
AIC = 495.3   AICc = 495.37   BIC = 501.76
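The reported criteria can be reproduced directly from the log-likelihood: AIC = -2 logL + 2k and BIC = -2 logL + k log n, with k = 2 estimated parameters (ma1 and sigma^2). A short Python check (the value n = 187 is our inference from the printed AIC/BIC gap, not shown in the output):

```python
import math

loglik = -245.65   # from the auto.arima output above
k = 2              # estimated parameters: ma1 and sigma^2
n = 187            # effective sample size after one difference (inferred)

aic = -2 * loglik + 2 * k            # penalty 2 per parameter
bic = -2 * loglik + k * math.log(n)  # penalty log(n) per parameter
print(round(aic, 2), round(bic, 2))  # 495.3 501.76
```

Since log(187) &gt; 2, BIC penalizes extra parameters more heavily than AIC, which is why BIC tends to select smaller models; here both select ARIMA(0,1,1).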
Problem 5 Do you think that there is residual autocorrelation? If so, describe this autocorrelation and suggest a more appropriate model for the T-bill series.
> fit1 = arima(Tbrate[,1],order=c(0,1,1))
> acf(residuals(fit1))
> Box.test(residuals(fit1), lag = 10, type="Ljung")
Box-Ljung test
data: residuals(fit1)
X-squared = 13.0169, df = 10, p-value = 0.2227
Neither the plot nor the Box-Ljung test suggests any residual autocorrelation.
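The Box-Ljung p-value can be checked by hand. The statistic Q = n(n+2) Σ_{k=1}^{m} ρ̂_k²/(n−k) is compared to a χ² distribution with m degrees of freedom, and for even degrees of freedom the χ² survival function has a closed form. This Python sketch (the chi2_sf_even helper is ours) reproduces the reported p-value:

```python
import math

def chi2_sf_even(x, df):
    """P(X > x) for a chi-square with EVEN df, via the closed-form
    survival function: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    m = df // 2
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(m))

# Reported output above: X-squared = 13.0169, df = 10
p = chi2_sf_even(13.0169, 10)
print(round(p, 4))   # 0.2227
```

Since 0.2227 is far above 0.05, the null hypothesis that the first 10 residual autocorrelations are zero is not rejected, matching the conclusion from the ACF plot.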
Problem 6 Do you see evidence of GARCH effects?
> resid2 = residuals(fit1)^2
> acf(resid2)
> Box.test(resid2, lag = 10, type="Ljung")
Box-Ljung test
data: resid2
X-squared = 92.0531, df = 10, p-value = 2.109e-15
There are noticeable GARCH effects since the autocorrelations in the squared residuals are strong and the Box-Ljung test has an extremely small p-value. Interestingly, the highest autocorrelations are at lags that are multiples of 1 year. This may indicate seasonal differences in the volatility, something suggested earlier by the boxplots; see Problem 3.
Note: the mean of the residuals is nearly, but not exactly, equal to 0. The residuals can be mean-centered by replacing resid2 = residuals(fit1)^2 with resid2 = (fit1$residuals - mean(fit1$residuals))^2. However, doing this has no noticeable effect on the ACF plot, and the p-value of the Box-Ljung test changes only very slightly, from 2.1e-15 to 2.0e-15.
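The diagnostic used here, autocorrelation in squared residuals, can be illustrated with a simulated ARCH(1) process (a Python sketch with assumed parameters a0 = 0.2, a1 = 0.5; not fit to the T-bill data). The simulated returns are serially uncorrelated, yet their squares are clearly autocorrelated, which is the signature of volatility clustering:

```python
import numpy as np

# Simulate ARCH(1): e_t = sigma_t * z_t with sigma_t^2 = a0 + a1 * e_{t-1}^2
rng = np.random.default_rng(3)
n, a0, a1 = 4000, 0.2, 0.5   # assumed parameters, chosen so moments exist
e = np.zeros(n)
for t in range(1, n):
    sigma2 = a0 + a1 * e[t-1]**2
    e[t] = np.sqrt(sigma2) * rng.normal()

def acf1(x):
    """Sample lag-1 autocorrelation."""
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)

print(acf1(e), acf1(e**2))   # e: near 0; e^2: clearly positive
```

This is exactly why acf(resid2) reveals GARCH effects that acf(residuals(fit1)) misses: the levels look like white noise, but the squares do not.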
Problem 7 Include the plot with your work.
a) Why do the prediction intervals (blue curves) widen as one moves farther into the future?
b) What causes the predictions (red) and the prediction intervals to wiggle initially?
dat = as.data.frame(Tbrate)
attach(dat)
library(forecast)
The output from auto.arima shows that an ARIMA(1,1,1) model is selected by BIC. (Switching to AIC shows that this model is also selected by AIC.)
> auto.arima(pi,max.P=0,max.Q=0,ic=”bic”)
Series: pi
ARIMA(1,1,1)
Call: auto.arima(x = pi, max.P = 0, max.Q = 0, ic = "bic")
Coefficients:
ar1 ma1
0.6749 -0.9078
s.e. 0.0899 0.0501
sigma^2 estimated as 3.516: log likelihood = -383.12
AIC = 772.24   AICc = 772.37   BIC = 781.94
Problem 7
(a) The prediction intervals widen because d = 1, so the fitted model is nonstationary. A nonstationary process is not mean-reverting and will drift away from its present value, so the variance of the prediction error increases without bound as the forecast horizon grows.
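This can be made concrete with the ψ-weights of the fitted model. Writing the ARIMA(1,1,1) model as Y_t = (1+φ)Y_{t-1} − φY_{t-2} + e_t + θe_{t-1}, the h-step forecast error variance is σ² Σ_{j&lt;h} ψ_j², which grows without bound because the ψ_j do not decay to zero for an integrated model. A Python sketch using the estimates from the output above (φ = 0.6749, θ = −0.9078, σ² = 3.516):

```python
import numpy as np

# Estimates from the auto.arima output above (ARIMA(1,1,1) for pi)
phi, theta, sigma2 = 0.6749, -0.9078, 3.516

# psi-weights of Y_t = (1+phi)Y_{t-1} - phi*Y_{t-2} + e_t + theta*e_{t-1}
H = 36                              # forecast horizon used in the lab
psi = np.zeros(H)
psi[0] = 1.0
psi[1] = (1 + phi) + theta
for j in range(2, H):
    psi[j] = (1 + phi) * psi[j-1] - phi * psi[j-2]

# h-step-ahead forecast error variance: sigma^2 * sum_{j<h} psi_j^2
var_h = sigma2 * np.cumsum(psi**2)
print(var_h[0], var_h[-1])          # strictly increasing in h
```

Here the ψ_j settle to the constant (1+θ)/(1−φ) ≈ 0.28, so the forecast error variance eventually grows linearly in h and the prediction intervals widen like √h.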
(b) More accurately, the predictions do not "wiggle" but rather decay to a constant value. The first few differences are forecast by an ARMA(1,1) model and are nonzero, but eventually the forecasted differences decay to their mean, which is assumed to be zero by auto.arima and arima. (arima uses a nonzero mean only for undifferenced series.)
To see whether a nonzero mean would provide a better fit, an ARMA(1,1) model was fit to diff(pi). This is the same model as used before, but now arima allows a nonzero mean. We can see that the estimated mean is extremely close to zero, so the assumption of a zero mean is reasonable and there is no point in redoing the prediction with a nonzero mean.
> arima(diff(pi),order=c(1,0,1))
Series: diff(pi)
ARIMA(1,0,1) with non-zero mean
Call: arima(x = diff(pi), order = c(1, 0, 1))
Coefficients:
ar1 ma1 intercept
0.6734 -0.9072 -0.0083
s.e. 0.0898 0.0498 0.0409
sigma^2 estimated as 3.515: log likelihood = -383.1
AIC = 774.2   AICc = 774.42   BIC = 787.13