Outline
Stationary process and autocorrelation. AutoRegressive (AR) models: Properties and estimation
Y_t = a_1 Y_{t−1} + a_2 Y_{t−2} + ⋯ + a_p Y_{t−p} + ε_t
Moving-Average (MA) models. Box-Jenkins methodology
Readings
SDA Chapter 9; FTS Chapter 2; SFM Chapter 11
Quiz: Are they time series?
1. Monthly number of airline tickets sold by Chan Brothers Travel Pte Ltd, a travel agency in SG.
2. Singapore quarterly unemployment rate between 2008:Q2 and 2013:Q4.
3. Quarterly number of home mortgage loan applications to DBS.
4. The annual number of road accidents reported to the Land Transport Authority and Traffic Police.
5. Time spent in training by workers in Microsoft Corp.
6. The dates on which a particular employee was absent from work due to illness over the past two years.
What is a time series …
Financial data are time series and have patterns related to time.
Whenever data are recorded sequentially over time and time is considered to be an important aspect, we have a time series.
Time sequence of data is important.
Most time series are equally spaced at roughly regular intervals,
such as daily, monthly, quarterly, or annually.
A time series can be considered a sample from a stochastic process. A stochastic process is a sequence of random variables and can be viewed as the “theoretical” or “population” counterpart of a time series. “Stochastic” is a synonym for random.
Population: Stochastic Process. Sample: Time series Data
Stationary process
When we observe a time series, the fluctuations appear random, but often with the same type of stochastic behaviour from one time period to the next. Stable estimation requires a stable relationship between current values and their lagged values over time!
Stationary stochastic processes are probability models for time series with time-invariant behaviour. Mathematically, if the properties of a stochastic process are unaffected by a change of time origin, that is, for any times s and t the probability distributions of (Y_1, ⋯, Y_t) and (Y_{1+s}, ⋯, Y_{t+s}) are the same, then the process is said to be stationary.
Population: Stationary Stochastic Process. Sample: Stationary Time Series
Weak stationarity
Weak stationarity: If all the moments up to some order 𝑓 are unaffected by a change of origin, the process is said to be weakly stationary of order 𝑓
A process is covariance stationary (weakly stationary) if its mean, variance, and
covariance are unchanged by time shifts:
constant mean: E(Y_t) = μ
constant variance: Var(Y_t) = E[(Y_t − μ)²] = σ²
constant autocovariance structure: Corr(Y_s, Y_t) = ρ_{|s−t|}, ∀ s, t
The mean and variance do not change with time and the correlation between two observations depends only on the lag, the time distance between them.
For example, the autocovariance between Y_{t−1} and Y_{t−2} (time lag 1) is the same as that between Y_{t−5} and Y_{t−6} (time lag 1). However, the autocorrelation between Y_{t−1} and Y_{t−2} at lag 1 may differ from the autocorrelation between Y_{t−1} and Y_{t−3} at lag 2.
White-Noise (WN) process
The sequence of random variables X_1, X_2, … is called IID noise if the observations of the time series are independent and identically distributed (IID) random variables. Let ε_t, t = 0, ±1, ±2, …, be a zero-mean IID sequence {ε_t} with
E(ε_t) = 0 and E(ε_s ε_t) = σ² if s = t, 0 if s ≠ t, for all t and s.
The sequence {ε_t} is called a purely random process, IID noise or simply strict white noise, and we write ε_t ~ IID(0, σ²).
If, in addition, the values follow a normal (Gaussian) distribution, the process is called Gaussian white noise, a strict white noise denoted ε_t ~ IID N(0, σ²).
If the sequence {𝜖𝑡} is only uncorrelated and not necessarily independent, then {𝜖𝑡} is known as an uncorrelated white noise process or weak white noise, 𝜖𝑡~WN(0,σ2).
A weak white noise process is weakly stationary with 𝜌0 = 1 and 𝜌𝑘 = 0,∀𝑘 ≠0.
Because of the lack of correlation, past values of a white noise process contain no information that can be used to predict future values. One cannot predict the future values of a white noise process.
Serial dependence of a stationary process
Given a stationary linear time series, we are interested to know:
• Expectation: μ = E(Y_t).
• Variance: γ_0 = Var(Y_t).
• Autocovariance (serial covariance): γ_{t−s} = Cov(Y_s, Y_t) = E[(Y_s − μ)(Y_t − μ)], which is symmetric with γ_{t−s} = γ_{s−t}.
• Autocorrelation: ρ_{t−s} = Cov(Y_s, Y_t)/Var(Y_t) = γ_{t−s}/γ_0.
Lagged scatterplot is a simple graphical summary of serial dependence in a time series, which is a scatterplot of the time series against itself offset in time by one to several time steps.
Let the time series of length T be y_1, …, y_T. The lagged scatterplot for lag k is a scatterplot of
the last T − k observations y_{k+1}, …, y_T (concurrent values Y_t)
against
the first T − k observations y_1, …, y_{T−k} (lagged values Y_{t−k}).
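As a hedged illustration, the base-R function lag.plot draws such lagged scatterplots; the series y below is a simulated placeholder, not data from the course.

# Sketch: lagged scatterplots of a series against its own lags 1-4.
# 'y' is a placeholder; an AR(1) is simulated so that lag-1 dependence is visible.
set.seed(42)
y <- arima.sim(model = list(ar = 0.7), n = 200)
lag.plot(y, lags = 4, do.lines = FALSE)   # scatterplots of the series vs its lagged values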
Autocorrelation
The (serial) dependence between values of a time series and their own past values is measured by autocorrelations.
The theoretical autocorrelation function is defined as
ρ_k = E[(Y_t − μ)(Y_{t−k} − μ)] / σ².
The ACF measures the linear relation between Y_t and Y_{t−k} for different values of k.
−1 ≤ ρ_k ≤ 1 and ρ_k = ρ_{−k}.
The covariance between Y_t and Y_{t−k} is denoted by γ_k, and γ(⋅) is called the autocovariance function:
γ_k = σ² ρ_k,  γ_0 = σ²,  ρ_k = γ_k/σ² = γ_k/γ_0.
Estimate autocorrelations
The theoretical autocorrelation function is defined as ρ_k = E[(Y_t − μ)(Y_{t−k} − μ)] / σ².
The first-order (or lag-1) autocorrelation measures the correlation between two successive observations in a time series. Given data, the sample estimator of the lag-1 autocorrelation is computed as
ρ̂_1 = Σ_{t=2}^{T} (Y_t − Ȳ)(Y_{t−1} − Ȳ) / Σ_{t=1}^{T} (Y_t − Ȳ)²,
where Ȳ = (1/T) Σ_{t=1}^{T} Y_t is the common sample mean of the whole time series.
Higher-order autocorrelations: the sample autocorrelation function at lag k is
ρ̂_k = Σ_{t=k+1}^{T} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{T} (Y_t − Ȳ)²,
with Ȳ again the common sample mean of the whole time series.
It is impossible to estimate ρ̂_k for k ≥ T, and ρ̂_k cannot be estimated accurately for large k. Rule of thumb: T ≥ 50 and k ≤ T/4.
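As a minimal sketch (assuming a simulated placeholder series y), the lag-k formula above can be coded directly and compared with R's acf():

# Hand-computed sample autocorrelation at lag k versus acf().
set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 200)
sample_acf <- function(y, k) {
  n <- length(y); ybar <- mean(y)
  sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar)) / sum((y - ybar)^2)
}
sapply(1:5, function(k) sample_acf(y, k))      # rho_hat_1, ..., rho_hat_5 from the formula
acf(y, lag.max = 5, plot = FALSE)$acf[2:6]     # acf() output (its first element is lag 0)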
Sample autocorrelation function (ACF)
A plot of sample autocorrelations 𝜌ො𝑘 against lags 𝑘 for 𝑘 = 1,2, … is called sample autocorrelation function (ACF) or correlogram, typically plotted for the first 𝑇/4 lags or thereabouts.
(By definition, ρ_0 = 1.)
4_ACF.R
Sampling distribution of AC estimator
Sampling distribution: For a given sample {y_t}_{t=1}^{T} of a stationary time series, let ȳ be the sample mean. Then the lag-k sample autocorrelation is
ρ̂_k = Σ_{t=k+1}^{T} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{T} (Y_t − Ȳ)².
If y_t is an IID sequence with finite variance, then ρ̂_k ∼ N(ρ_k, 1/T).
If y_t is a weakly stationary series, then ρ̂_k ∼ N(ρ_k, (1 + 2 Σ_{j=1}^{k−1} ρ_j²)/T).
Facts of normal random variables
About 68.26% of the values of X are within 1 standard deviation of its mean, about 95.44% are within 2 standard deviations, and about 99.72% (almost all) are within 3 standard deviations.
Test for randomness (individual test for autocorrelation)
If all autocorrelations are zero, past values are uncorrelated with future values. Conversely, nonzero autocorrelation(s) imply that past values can be used to forecast future values. However, the sample autocorrelations may not be identical to the theoretical ones because of random error.
Test for autocorrelation at any lag k: H_0: ρ_k = 0.
The lag-k autocorrelation is considered significant at the 5% level if the sample autocorrelation at lag k is larger than two standard deviations in magnitude:
|ρ̂_k| > 2/√T.
In that case we reject the null hypothesis H_0: ρ_k = 0 and conclude that the time series is not random. Otherwise, if all the considered autocorrelations are insignificant, we do not reject the null hypothesis of randomness of the time series.
The critical value at the 5% significance level is 1.96; 2 is used to simplify the computation.
Correlogram (Sample ACF)
The plot is normally supplemented with 5% significance limits (dashed lines at ±2/√T) to enable a graphical check of whether serial dependence exists at a particular lag. Any bar beyond the dashed lines indicates a significant autocorrelation.
Joint test for autocorrelations
The Ljung-Box test statistic, also called Q test statistic, can be used to determine if the first 𝑚 ACFs are jointly equal to zero.
H_0: ρ_1 = ρ_2 = ⋯ = ρ_m = 0 vs H_a: ρ_j ≠ 0 for some j.
Q(m) = T(T + 2) Σ_{k=1}^{m} ρ̂_k²/(T − k) ~ χ²(m),
which is asymptotically chi-squared distributed with m degrees of freedom.
Decision rule: reject H_0 if Q(m) > the critical value of χ²_m(α), or if the p-value is less than α.
Remark: 1. The autocorrelation tests, including the individual sampling-distribution test and the joint Q test, can be applied to the original time series and to residuals.
2. For residuals of a fitted model, we use the same test statistics to check the significance of autocorrelations. In this case the Q(m) statistic is asymptotically χ²_{m−g} distributed, where g is the number of estimated parameters in the fitted model.
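In R, this Q test corresponds to Box.test with type = "Ljung-Box"; the fitdf argument implements the degrees-of-freedom adjustment for residuals noted in the remark. The series and model below are simulated placeholders, not course data.

# Joint Ljung-Box test on a raw series and on the residuals of a fitted AR(1).
set.seed(7)
y <- arima.sim(model = list(ar = 0.4), n = 300)
Box.test(y, lag = 5, type = "Ljung-Box")                          # raw series: df = 5
fit <- arima(y, order = c(1, 0, 0))
Box.test(residuals(fit), lag = 5, type = "Ljung-Box", fitdf = 1)  # residuals: df = 5 - g, here g = 1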
Example
Daily prices of STI from 4th January, 2000 to 26th October, 2012. Q(5) = 16459.81, df = 5, p-value < 2.2e-16.
Daily log returns of STI: 𝑄 5 = 7.723(𝑃 − 𝑣𝑎𝑙𝑢𝑒: 0.172) and 𝑄 10 = 15.936(0.102)
Implication: Daily prices of STI depend on their own past values. However stock returns do not have significant serial correlations.
Autoregressive (AR) model of order 1
The autoregressive model of order 1, written AR(1), is
Y_t − μ = φ(Y_{t−1} − μ) + e_t,  e_t ∼ WN(0, σ_e²),
or Y_t = δ + φ Y_{t−1} + e_t, where δ = (1 − φ)μ.
The current value of Y_t can be predicted using its past value Y_{t−1}. The deviation between the realized value and the fitted value (from the model) is due to the existence of a random shock e_t.
If the AR coefficient φ is zero, Y depends purely on the random component (error), and there is no serial dependence.
If φ is large in magnitude, past values strongly influence future values.
Assumptions regarding the error term: zero mean, constant variance σ_e², and mutually uncorrelated (random).
If our AR(1) model successfully captures the data’s characteristics, then there should be no (significant) autocorrelations in the residuals! To check the adequacy of the fitted AR(1) model, we inspect the residuals from the regression for any “left-over” dependence. If significant dependence remains, try an AR model of higher order (more lagged values).
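A small sketch of this workflow, using a simulated AR(1) with known parameters (all names and values below are illustrative, not course data):

# Simulate an AR(1) with phi = 0.6 and mean 10, fit it, and inspect the residual ACF.
# Note (labelling assumption): in arima() output the coefficient reported as "intercept"
# is the estimated mean mu, not delta = (1 - phi) * mu.
set.seed(123)
phi <- 0.6; mu <- 10
y <- mu + arima.sim(model = list(ar = phi), n = 500)
fit <- arima(y, order = c(1, 0, 0))
fit                      # ar1 should be near 0.6, "intercept" near 10
acf(residuals(fit))      # no significant spikes expected if the AR(1) is adequate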
AR(1) model
The AR(1) model Y_t − μ = φ(Y_{t−1} − μ) + e_t is stationary if and only if −1 < φ < 1.
Stationary models have constant mean, variance and ACF over time.
Mean: E(Y_t) = μ = δ/(1 − φ)
Variance: γ_0 = Var(Y_t − μ) = φ² Var(Y_{t−1} − μ) + Var(e_t)
⇒ γ_0 = φ² γ_0 + σ_e² ⇒ Var(Y_t) = γ_0 = σ_e²/(1 − φ²)
ACF: γ_k = Cov(Y_t, Y_{t−k}) = E[(Y_t − μ)(Y_{t−k} − μ)]
 = φ E[(Y_{t−1} − μ)(Y_{t−k} − μ)] + E[e_t (Y_{t−k} − μ)] = φ γ_{k−1}
⇒ γ_k = φ γ_{k−1} ⇒ ρ_k = γ_k/γ_0 = φ γ_{k−1}/γ_0 = φ ρ_{k−1}
⇒ Corr(Y_t, Y_{t−k}) = ρ_k = φ^k, k = 1, 2, …
The ACF ρ_k decays exponentially as k increases.
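As a brief check of this result, the theoretical ACF ρ_k = φ^k can be computed directly and via stats::ARMAacf (φ = 0.8 is an arbitrary illustrative value):

# Theoretical ACF of an AR(1), two ways.
phi <- 0.8
k <- 1:10
phi^k                                 # direct formula rho_k = phi^k
ARMAacf(ar = phi, lag.max = 10)[-1]   # ARMAacf includes lag 0, which is dropped here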
Theoretical ACFs of AR(1) process
Sample ACFs will not look exactly like the theoretical forms due to random noise!
Sample autocorrelations of an AR(1) process
ACF of an AR(1) process with φ_1 = 0.9 (left) and φ_1 = −0.9 (right).
ρ̂_k = Σ_{t=k+1}^{T} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{T} (Y_t − Ȳ)²
How to check stationarity of a stochastic process – Visual inspection
Left: One-month inflation rate (in percent, annual rate) and its first difference. The differenced series oscillates around a fixed mean of 0%. The differenced series is clearly stationary, but whether or not the original series is stationary needs further investigation.
Right: Monthly totals of international airline passengers for the years 1949 to 1960. There are three types of nonstationarity seen in the plot: 1) upward trend, 2) seasonal variation, and 3) increase over time in the size of the seasonal oscillations.
4_examples.R
How to check stationarity of a stochastic process? --Lag operator, AR polynomial and stationarity
Let the value of a time series at time t, y_t, be a linear function of the last p values of y and of an exogenous term, denoted by ε_t:
y_t = a_1 y_{t−1} + a_2 y_{t−2} + ⋯ + a_p y_{t−p} + ε_t
Expressions of this type are called difference equations.
The lag operator, denoted by L (or B), is an operator that shifts the time index backward by one unit.
Applying L to the variable at time t yields the variable at time t − 1: L y_t = y_{t−1}.
Applying L², L² y_t = L(L y_t) = L y_{t−1} = y_{t−2}.
Formally, the lag operator transforms one time series, say {y_t}_{t=−∞}^{∞}, into another series, say {x_t}_{t=−∞}^{∞}, where x_t = y_{t−1}.
A constant c can be viewed as a special series {y_t}_{t=−∞}^{∞} with y_t = c for all t, and we can apply the lag operator to a constant, obtaining Lc = c.
By raising L to a negative power, we obtain a lead operator:
L^{−k} y_t = y_{t+k}.
Difference equations
The difference equation for an ARMA(p, q) process:
y_t = a_1 y_{t−1} + a_2 y_{t−2} + ⋯ + a_p y_{t−p} + ε_t + b_1 ε_{t−1} + ⋯ + b_q ε_{t−q}
can be written as
a(L) y_t = b(L) ε_t,
where a(L) = 1 − a_1 L − a_2 L² − ⋯ − a_p L^p is called the AR polynomial, and b(L) = 1 + b_1 L + b_2 L² + ⋯ + b_q L^q is called the MA polynomial.
The difference equation for an AR(3) process
y_t = a_1 y_{t−1} + a_2 y_{t−2} + a_3 y_{t−3} + ε_t
can be written as
y_t = a_1 L y_t + a_2 L² y_t + a_3 L³ y_t + ε_t
(1 − a_1 L − a_2 L² − a_3 L³) y_t = ε_t
or, in compact form,
a(L) y_t = ε_t,
where a(L) = 1 − a_1 L − a_2 L² − a_3 L³ is called the AR polynomial.
Characteristic equation
The reverse characteristic equation associated with the difference equation is a(λ) = 0.
Any value of 𝜆 which satisfies the reverse characteristic equation is called a root of polynomial 𝑎(𝜆). A polynomial of degree 𝑝 has 𝑝 roots 𝜆𝑘 , 𝑘 = 1, ... , 𝑝. In general, roots are complex numbers: 𝜆𝑘 = 𝑎𝑘 + 𝑏𝑘𝑖.
The coefficient form of a reverse characteristic equation is
1 − a_1 λ − ⋯ − a_p λ^p = 0.
An alternative is the root form, given by
(λ_1 − λ)(λ_2 − λ) ⋯ (λ_p − λ) = ∏_{i=1}^{p} (λ_i − λ) = 0.
The latter form reveals the roots directly.
The ARMA(p,q) process is stationary if the roots of the AR polynomial lie outside the unit circle.
Example
Given an AR(2) process
y_t = (3/2) y_{t−1} − (1/2) y_{t−2} + ε_t,
the reverse characteristic equation in coefficient form is given by
1 − (3/2)λ + (1/2)λ² = 0, or 2 − 3λ + λ² = 0,
which can be written in root form as
(1 − λ)(2 − λ) = 0.
Here, λ_1 = 1 and λ_2 = 2 represent the set of solutions for λ satisfying the reverse characteristic equation 1 − (3/2)λ + (1/2)λ² = 0.
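A minimal sketch of this computation in R uses polyroot(), whose argument lists the polynomial coefficients in increasing powers of λ:

# Roots of the reverse characteristic polynomial 1 - 1.5*lambda + 0.5*lambda^2.
r <- polyroot(c(1, -1.5, 0.5))
r        # approximately 1 and 2, as derived above
Mod(r)   # stationarity requires all moduli strictly greater than 1;
         # here one root lies on the unit circle, so this AR(2) sits on the stationarity boundary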
Model estimation
There are several approaches to estimating AR models.
1. The Yule-Walker estimator uses the Yule-Walker equations with k = 1, …, p and estimates the AR parameters of pure AR models from the SACF.
2. The least squares estimator (LSE) finds the parameter estimates that minimize the sum of the squared residuals. For pure AR models, the LSE leads to the linear OLS estimator.
3. The maximum likelihood estimator (MLE) maximizes the (exact or approximate) log-likelihood function associated with the specified model. To do so, an explicit distributional assumption for the disturbances ε_t has to be made.
Maximum Likelihood Estimator
The joint density function of two dependent random variables can be decomposed as
f(x_2, x_1) = f(x_2 | x_1) f(x_1).
Accordingly, for three dependent random variables,
f(x_3, x_2, x_1) = f(x_3 | x_2, x_1) f(x_2, x_1)
and
f(x_3, x_2, x_1) = f(x_3 | x_2, x_1) f(x_2 | x_1) f(x_1).
For time series data (y_0, …, y_T), the joint pdf can be written as
f(Y_0, …, Y_T) = f(y_T | Y_{T−1}) f(y_{T−1} | Y_{T−2}) ⋯ f(y_0)
and the likelihood function becomes
L(θ; Y_0, …, Y_T) = f(y_0) ∏_{t=1}^{T} f(y_t | Y_{t−1}).
Maximizing L(θ; Y_0, …, Y_T) is equivalent to maximizing the log-likelihood function
ln L(θ; Y_0, …, Y_T) = ln f(y_0) + Σ_{t=1}^{T} ln f(y_t | Y_{t−1}),
which is typically maximized in practice.
Maximum Likelihood Estimator
Consider an AR(1) process Y_t = φ Y_{t−1} + ε_t with ε_t ~ N(0, σ_ε²), IID. Thus
f(y_t | Y_{t−1}) ~ N(φ y_{t−1}, σ_ε²).
This gives
L(φ, σ_ε²; Y_0, …, Y_T) = f(y_0) ∏_{t=1}^{T} (2π σ_ε²)^{−1/2} exp(−(y_t − φ y_{t−1})² / (2σ_ε²))
 = f(y_0) (2π σ_ε²)^{−T/2} ∏_{t=1}^{T} exp(−(y_t − φ y_{t−1})² / (2σ_ε²)).
The log-likelihood function is
ln L(φ, σ_ε²; Y_0, …, Y_T) = ln f(y_0) − (T/2) ln 2π − (T/2) ln σ_ε² − (1/(2σ_ε²)) Σ_{t=1}^{T} (y_t − φ y_{t−1})².
The unconditional pdf of the initial value y_0 is normal with mean 0 and variance σ_ε²/(1 − φ²):
y_0 ~ N(0, σ_ε²/(1 − φ²)).
Therefore
ln f(y_0) = −(1/2)[ln 2π + ln σ_ε² − ln(1 − φ²)] − y_0²(1 − φ²)/(2σ_ε²).
Maximization with respect to φ and σ_ε² yields the exact maximum likelihood estimates.
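A hedged sketch of exact ML for a zero-mean AR(1), writing out the log-likelihood above and maximizing it numerically; the simulated series and starting values are illustrative assumptions:

# Exact Gaussian log-likelihood of a zero-mean AR(1), maximized with optim().
set.seed(1)
y <- arima.sim(model = list(ar = 0.6), n = 500)
neg_loglik <- function(par, y) {
  phi <- par[1]; sigma2 <- exp(par[2])     # log-variance keeps sigma2 positive
  if (abs(phi) >= 1) return(1e10)          # enforce stationarity
  # initial value: y_0 ~ N(0, sigma2 / (1 - phi^2))
  ll0 <- dnorm(y[1], mean = 0, sd = sqrt(sigma2 / (1 - phi^2)), log = TRUE)
  # conditional terms: y_t | y_{t-1} ~ N(phi * y_{t-1}, sigma2)
  llc <- sum(dnorm(y[-1], mean = phi * y[-length(y)], sd = sqrt(sigma2), log = TRUE))
  -(ll0 + llc)
}
fit <- optim(c(0, 0), neg_loglik, y = y)
c(phi_hat = fit$par[1], sigma2_hat = exp(fit$par[2]))
arima(y, order = c(1, 0, 0), include.mean = FALSE, method = "ML")   # built-in exact ML for comparison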
Conditional MLE and conditional LSE
Conditional MLE of AR(1) is obtained by conditioning 𝑦1, ... , 𝑦𝑇 on pre-sample realizations 𝑦0. It is a simpler estimator that deletes the marginal density of 𝑦0 from the likelihood and maximizes
−(T/2) ln 2π − (T/2) ln σ_ε² − (1/(2σ_ε²)) Σ_{t=1}^{T} (y_t − φ y_{t−1})².
This estimator is called the conditional least-squares estimator. It is a least-squares estimator because it minimizes
Σ_{t=1}^{T} (y_t − φ y_{t−1})².
The default method for the function arima in R is to use the conditional least- squares estimates as starting values for maximum likelihood.
Example: BMW log returns
Box-Ljung test
data: bmw
X-squared = 44.987, df = 5, p-value = 1.460e-08
A positive value of 𝜙 means that
there is some information in today's return that could be used for prediction of tomorrow's return, but a small value of 𝜙 means that the prediction will not be very accurate. The potential for profit might be negated by trading costs. 4_BMW.R
Why stationarity?
The beauty of a stationary process is that it can be modelled with relatively few parameters. For example, we do not need a different expectation for each observation Y_t; rather, they all have a common expectation.
A stationary series should show oscillation around some fixed level, a phenomenon called mean-reversion.
An AR(1) process is stationary if and only if −1 < 𝜙 < 1.
Example 1 (stationary): Y_t = 80 + 0.2 Y_{t−1} + e_t, with Y_1 = 120. The mean of the time series is μ = 80/(1 − 0.2) = 100. According to the AR(1) model (setting the shocks to zero), we obtain:
Y_2 = 80 + 0.2 × 120 = 104
Y_3 = 80 + 0.2 × 104 = 100.8
⋮
A stationary time series converges to the mean. (Mean reversion!)
Example 2 (non-stationary): Y_t = 80 + 1.2 Y_{t−1} + e_t, with Y_1 = 120. The formula would give a mean of μ = 80/(1 − 1.2) = −400. According to the AR(1) model (again setting the shocks to zero), we obtain:
Y_2 = 80 + 1.2 × 120 = 224
Y_3 = 80 + 1.2 × 224 = 348.8
⋮
A non-stationary time series of this kind is explosive.
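The two examples can be reproduced with a tiny deterministic recursion (shocks set to zero), as sketched below; the function name iterate is purely illustrative:

# Iterate Y_t = delta + phi * Y_{t-1} starting from Y_1.
iterate <- function(delta, phi, y1, n) {
  y <- numeric(n); y[1] <- y1
  for (t in 2:n) y[t] <- delta + phi * y[t - 1]
  y
}
round(iterate(80, 0.2, 120, 8), 2)   # 120, 104, 100.8, ... converges to 80/(1 - 0.2) = 100
round(iterate(80, 1.2, 120, 8), 2)   # 120, 224, 348.8, ... explodes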
Diagnostic checking
We need to check the adequacy of the fitted model.
The part of the data unexplained by the model (i.e., the residuals) should be small and not exhibit any systematic or predictable patterns.
One could design diagnostic checking procedures that take the modeling objectives explicitly into account.
Why residual autocorrelation indicate problem?
Suppose that we are fitting an AR(1) model, Y_t − μ = φ(Y_{t−1} − μ) + e_t, but the true model is an AR(2) process given by
Y_t − μ = φ_1(Y_{t−1} − μ) + φ_2(Y_{t−2} − μ) + e_t.
Since we are fitting the incorrect AR(1) model, there is no hope of estimating φ_2, since it is not in the model. Moreover, φ̂ does not necessarily estimate φ_1, because of bias caused by model misspecification. Let φ* be the expected value of φ̂.
Residual: ê_t = (Y_t − μ) − φ*(Y_{t−1} − μ)
 = φ_1(Y_{t−1} − μ) + φ_2(Y_{t−2} − μ) + e_t − φ*(Y_{t−1} − μ)
 = (φ_1 − φ*)(Y_{t−1} − μ) + φ_2(Y_{t−2} − μ) + e_t.
The residuals do not estimate the white noise process as they would if the correct AR(2) model were used. The presence of φ_2(Y_{t−2} − μ) in the residuals causes them to be autocorrelated.
Testing for whiteness of residuals
A standard assumption in econometric modeling is the white noise assumption:
E(ε_t) = 0,  E(ε_s ε_t) = σ_ε² if s = t, and 0 if s ≠ t.
Any departure from whiteness indicates that the residuals still contain serial dependent information that the model has not extracted from the data.
A systematic way of checking the whiteness: if the SACF and SPACF of the residuals have no significant elements, we conclude that they resemble white noise; otherwise, there is still information in the residuals.
One problem with checking the significance of individual elements of any of the identification functions is that each element might be individually insignificant, but all (or a subset) of the elements taken together may be jointly significant.
Portmanteau test
A popular goodness of fit test is the Box-Pierce Q-statistic, also known as the portmanteau test, which tests the joint hypothesis
H_0: ρ_{ε,1} = ρ_{ε,2} = ⋯ = ρ_{ε,m} = 0.
The Q-statistic is computed by
Q = T Σ_{k=1}^{m} ρ̂²_{ε,k}.
The sum of squared autocorrelations is intended to capture deviations from zero in either direction and at all lags up to m.
For data generated by a white noise process, Q has an asymptotic chi-square (χ²) distribution with (m − p − q) degrees of freedom, where p and q refer to the number of model parameters. In the AR(1) case, p + q = 1.
Portmanteau test
Ljung and Box adapted the Box-Pierce test for finite samples by modifying the Q-statistic to obtain the Q*-statistic,
Q* = T(T + 2) Σ_{k=1}^{m} (T − k)^{−1} ρ̂²_{ε,k},
which constitutes the Ljung-Box test. However, for moderate sample sizes the Ljung-Box test also has low power and may fail to detect model misspecifications.
Both versions of the portmanteau test check only for uncorrelatedness of the residuals and not for independence or "true" whiteness.
The detection of more complex temporal dependencies in the absence of autocorrelations indicates that the class of linear ARMA models is inappropriate for the data at hand.
Portmanteau test
Adapting the Ljung-Box test, McLeod and Li test the joint hypothesis on the ACF of the squared residuals,
H_0: ρ_{ε²,1} = ρ_{ε²,2} = ⋯ = ρ_{ε²,m} = 0,
by performing a Q test on the squared residuals:
Q*_2 = T(T + 2) Σ_{k=1}^{m} (T − k)^{−1} ρ̂²_{ε²,k}.
Under the null hypothesis of no autocorrelation, Q*_2 has a χ² distribution with m degrees of freedom.
Alternatively, a goodness-of-fit test based on residual partial autocorrelations can be used. If ψ_{ε,k} is the k-th order residual partial autocorrelation coefficient, then the statistic
Q_M = T(T + 2) Σ_{k=1}^{m} (T − k)^{−1} ψ²_{ε,k}
is asymptotically χ² distributed with (m − p − q) degrees of freedom if the fitted model is appropriate.
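A minimal sketch of the McLeod-Li idea in R applies the Ljung-Box test to squared residuals; the fitted model below is a simulated placeholder:

# McLeod-Li-type check: Ljung-Box test on squared residuals.
set.seed(5)
y <- arima.sim(model = list(ar = 0.5), n = 400)
fit <- arima(y, order = c(1, 0, 0))
Box.test(residuals(fit)^2, lag = 10, type = "Ljung-Box")   # a small p-value would point to
                                                           # remaining (e.g. ARCH-type) dependence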
Testing for normality
In the common case of a normal assumption, the residuals have to be tested for normality.
The Jarque-Bera test accomplishes this. It is based on the third and fourth sample moments of the residuals. Since the normal distribution is symmetric, the third (central) moment, denoted by μ_3, should be zero; and the fourth moment, μ_4, should satisfy μ_4 = 3σ⁴.
The measure of the third moment, or skewness, Ŝ, and the measure of the fourth moment, or kurtosis, K̂, can be calculated as
Ŝ = (1/T) Σ_{t=1}^{T} ε̂_t³ / σ̂³,
K̂ = (1/T) Σ_{t=1}^{T} ε̂_t⁴ / (3σ̂⁴) − 1.
JB test
The Jarque-Bera test tests the null hypothesis
H_0: μ_3/σ³ = 0 and μ_4/σ⁴ − 3 = 0.
The sample statistics
λ_1 = (T/6) [(1/T) Σ_{t=1}^{T} ε̂_t³/σ̂³]²  and  λ_2 = (T/24) [(1/T) Σ_{t=1}^{T} ε̂_t⁴/σ̂⁴ − 3]²
are each asymptotically χ²(1) distributed.
The null hypothesis H_0 as stated above is a joint test of λ_1 and λ_2 being zero and can be tested via λ_3 = λ_1 + λ_2, which is asymptotically χ²(2) distributed.
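The statistic can be computed directly from these definitions; the residual series e below is just Gaussian noise standing in for actual model residuals:

# Jarque-Bera statistic from the sample skewness and excess kurtosis.
set.seed(9)
e <- rnorm(500)
n <- length(e); ec <- e - mean(e)
s <- sqrt(mean(ec^2))                       # sigma_hat
S <- mean(ec^3) / s^3                       # skewness
K <- mean(ec^4) / s^4 - 3                   # excess kurtosis
JB <- n / 6 * S^2 + n / 24 * K^2            # lambda_3 = lambda_1 + lambda_2 ~ chi^2(2)
c(JB = JB, p_value = pchisq(JB, df = 2, lower.tail = FALSE))
# If the tseries package is installed, tseries::jarque.bera.test(e) should give a comparable result.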
Example: BMW log returns
The sample ACF of the residuals is plotted. None of the autocorrelations at low lags is outside the test bounds.
A few at higher lags are outside the bounds, but this type of behaviour is expected to occur by chance or because, with a large sample size, very small but nonzero true correlations can be detected.
Box-Ljung test
data: residuals(fitAR1)
X-squared = 6.8669, df = 5, p-value = 0.1431
The large p-value indicates that we do not reject the null hypothesis that the residuals are uncorrelated, at least at small lags. This is a sign that the AR(1) model provides an adequate fit. 4_BMW.R
Example: Inflation rate – AR(1) fit
One might try fitting an AR(1) to the changes in the inflation rate, since this series is stationary. However, the AR(1) model does not adequately fit the changes in the inflation rate.
Box-Ljung test
data: fit$resid
X-squared = 46.1752, df = 12, p-value = 3.011e-06. 4_inflation.R
Forecasting
Let the data observed up to period t be collected in the information set
F_t = {y_τ : τ ≤ t}.
Having observed a time series up to period t, we would like to forecast a future value y_{t+h} for period t + h, h = 1, 2, ….
We distinguish between the one-step-ahead predictor, ŷ_t(1) for y_{t+1}, and the multi-step-ahead predictor, ŷ_t(h) for y_{t+h}, given the forecasting horizon h and forecasting origin t.
To characterize the forecasts, three quantities are needed:
Forecast function 𝑦ො𝑡(h)
Forecast error ε̂_t(h)
Variance of the forecast error
Forecasting: Loss Function
Instead of considering the "true cost" of wrong predictions, we consider a purely statistical criterion, the mean-squared prediction error (MSE). Given the information set, we can also define the conditional expectation of 𝑦𝑡+h:
E_t(y_{t+h}) := E(y_{t+h} | F_t) = E(y_{t+h} | y_t, y_{t−1}, …).
We would like to find the estimate ŷ_t(h) of y_{t+h} which has the smallest possible MSE:
MSE(ŷ_t(h)) = E[(y_{t+h} − ŷ_t(h))²] = E[(y_{t+h} − E_t(y_{t+h}) + E_t(y_{t+h}) − ŷ_t(h))²].
Squaring the expression in brackets and using the fact that
E[(y_{t+h} − E_t(y_{t+h}))(E_t(y_{t+h}) − ŷ_t(h))] = 0,
we obtain
MSE(ŷ_t(h)) = MSE(E_t(y_{t+h})) + E[(E_t(y_{t+h}) − ŷ_t(h))²].
We see that MSE(ŷ_t(h)) is minimized if
ŷ_t(h) = E_t(y_{t+h}).
Example: AR(1)
1-step-ahead forecast at time t, the forecast origin: ŷ_t(1) = a_0 + a_1 y_t.
1-step-ahead forecast error: ε̂_t(1) := y_{t+1} − ŷ_t(1) = ε_{t+1}. Thus ε_{t+1} is the unpredictable part of y_{t+1}: it is the shock at time t + 1.
Variance of the 1-step-ahead forecast error: Var[ε̂_t(1)] = Var(ε_{t+1}) = σ_ε².
2-step-ahead forecast: ŷ_t(2) = a_0 + a_1 ŷ_t(1).
2-step-ahead forecast error: ε̂_t(2) := y_{t+2} − ŷ_t(2) = a_0 + a_1 y_{t+1} + ε_{t+2} − (a_0 + a_1 ŷ_t(1)) = ε_{t+2} + a_1 ε_{t+1}.
Variance of the 2-step-ahead forecast error: Var[ε̂_t(2)] = Var(ε_{t+2} + a_1 ε_{t+1}) = (1 + a_1²) σ_ε², which is greater than or equal to Var[ε̂_t(1)], implying that the uncertainty in forecasts increases as the number of steps increases.
h-step-ahead forecast: ŷ_t(h) = a_0 + a_1 ŷ_t(h − 1).
h-step-ahead forecast error: ε̂_t(h) := y_{t+h} − ŷ_t(h) = a_0 + a_1 y_{t+h−1} + ε_{t+h} − (a_0 + a_1 ŷ_t(h − 1)) = ε_{t+h} + a_1 ε_{t+h−1} + ⋯ + a_1^{h−1} ε_{t+1}.
Variance of the h-step-ahead forecast error: Var[ε̂_t(h)] = σ_ε² Σ_{k=0}^{h−1} a_1^{2k}.
Forecasting: Prediction Intervals
To assess the uncertainty associated with a prediction, we compute the confidence or prediction interval. The distributions of ŷ_t(h) and the prediction error ε̂_t(h) are determined by the distribution of the ε_t.
Let z(α) denote the α × 100% quantile of the standard normal distribution, so that z(α/2) = −z(1 − α/2), and let σ_ε(h) denote the standard deviation of the h-step forecast error. Then
1 − α = Pr(−z_{1−α/2} ≤ ε̂_t(h)/σ_ε(h) ≤ z_{1−α/2})
 = Pr(−z_{1−α/2} σ_ε(h) ≤ y_{t+h} − ŷ_t(h) ≤ z_{1−α/2} σ_ε(h))
 = Pr(ŷ_t(h) − z_{1−α/2} σ_ε(h) ≤ y_{t+h} ≤ ŷ_t(h) + z_{1−α/2} σ_ε(h)).
The interval
ŷ_t(h) ± z_{1−α/2} σ_ε(h)
is called the (1 − α) × 100% h-step prediction interval. Usually α = 0.05 or 0.01 is chosen.
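A short sketch of these forecasts and intervals for a fitted AR(1); predict() on an arima fit returns point forecasts and their standard errors (the series y is a simulated placeholder):

# h-step-ahead forecasts with 95% prediction intervals.
set.seed(11)
y <- arima.sim(model = list(ar = 0.7), n = 300)
fit <- arima(y, order = c(1, 0, 0))
fc <- predict(fit, n.ahead = 10)
lower <- fc$pred - 1.96 * fc$se    # y_hat_t(h) - z_{0.975} * sigma_eps(h)
upper <- fc$pred + 1.96 * fc$se
cbind(forecast = fc$pred, lower, upper)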
Autoregressive models of order p
An autoregressive model is a regression model based on a weighted average of prior values in the series, with weights given by a regression on lagged versions of the series.
The autoregressive model of order p, written AR(p), is
Y_t − μ = φ_1(Y_{t−1} − μ) + φ_2(Y_{t−2} − μ) + ⋯ + φ_p(Y_{t−p} − μ) + e_t, or
Y_t = δ + φ_1 Y_{t−1} + ⋯ + φ_p Y_{t−p} + e_t, where δ = (1 − φ_1 − φ_2 − ⋯ − φ_p) μ.
Assumptions regarding the error term are the same as before: Zero mean, constant variance, and mutually uncorrelated.
Most of the concepts we have discussed for AR(1) models generalize easily to AR(p) models.
AR(2) model
An autoregressive time series of order p = 2, or AR(2) model:
Y_t = δ + φ_1 Y_{t−1} + φ_2 Y_{t−2} + e_t.
Properties of AR(2):
• Mean: E(Y_t) = δ/(1 − φ_1 − φ_2).
• ACF: ρ_0 = 1, ρ_1 = φ_1/(1 − φ_2), ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2}, k ≥ 2.
• Forecasts: similar to AR(1) models.
• Stationarity condition: the AR(2) model is stationary if and only if the characteristic polynomial φ(z) = 1 − φ_1 z − φ_2 z² satisfies
φ(z) ≠ 0 when |z| ≤ 1,
i.e. all roots (including complex roots) of φ(z) lie outside the unit circle.
The condition is equivalent to:
φ_2 + φ_1 < 1, φ_2 − φ_1 < 1, −1 < φ_2 < 1.
Proof tip: ρ_1 = φ_1 + φ_2 ρ_1, |ρ_1| < 1, |z_1 z_2| > 1.
Sample autocorrelations of an AR(2) process
ACF of an AR(2) process with (φ_1 = 0.5, φ_2 = 0.4) (upper left), (φ_1 = 0.9, φ_2 = −0.4) (upper right), (φ_1 = −0.4, φ_2 = 0.5) (lower left) and (φ_1 = −0.5, φ_2 = −0.9) (lower right).
How to identify AR model and its order?
Partial autocorrelations
AR(p) model has a distinct rubric: Its partial autocorrelations for lag order higher than p are zero: 𝜋𝑘𝑘 = 0,𝑓𝑜𝑟 𝑎𝑙𝑙 𝑘 > 𝑝.
A partial autocorrelation is the amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order lags. In other words, it measures the dependence between Y_t and Y_{t+k} after correcting Y_t and Y_{t+k} for the linear influence of the variables Y_{t+1}, ⋯, Y_{t+k−1}:
π_kk = Corr(Y_t − 𝓟(Y_t | Y_{t+1}, ⋯, Y_{t+k−1}), Y_{t+k} − 𝓟(Y_{t+k} | Y_{t+1}, ⋯, Y_{t+k−1})),
where 𝓟(W | Z) is the best linear projection of W on Z.
The partial autocorrelations at all lags can be computed by fitting a succession of autoregressive models with increasing numbers of lags. In particular, the partial autocorrelation at lag k is equal to the estimated AR(k) coefficient in an autoregressive model with k terms–i.e., a multiple regression model in which Y is regressed on lag-1, lag-2, etc., up to lag-k.
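This equivalence can be checked numerically: the lag-k sample PACF should match the last coefficient of an AR(k) fitted by Yule-Walker (the simulated AR(2) below is only an illustration):

# Lag-k PACF versus the last coefficient of a fitted AR(k).
set.seed(3)
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)
pacf_vals <- as.vector(pacf(y, lag.max = 5, plot = FALSE)$acf)
last_coef <- sapply(1:5, function(k) ar.yw(y, aic = FALSE, order.max = k)$ar[k])
round(cbind(pacf = pacf_vals, last_AR_coef = last_coef), 4)   # the two columns should agree closely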
ACF and PACF of AR(1) models
PACF and ACF of order 𝑘 = 1 are identical: 𝜋11 = 𝜌1
Inspecting the correlogram and the sample PACF plot together helps to select the model type. Moreover, by mere inspection of the PACF we can determine how many AR terms are needed to explain the autocorrelation pattern in a time series: if the partial autocorrelation is significant at lag p and not significant at any higher-order lags, i.e., if the PACF “cuts off” at lag p, then this suggests fitting an autoregressive model of order p. Note that the sample PACF is also asymptotically normally distributed with standard deviation 1/√T.
ACF and PACF of AR(2) models
PACF cuts off at lag 2, while the ACF decays slowly and may have significant values at higher lags. We say that the series probably displays an “AR signature“ with order 2.
Yule-Walker equation
For an AR(p) process Y_t = a_1 Y_{t−1} + a_2 Y_{t−2} + ⋯ + a_p Y_{t−p} + ε_t we have
γ_k = a_1 γ_{k−1} + a_2 γ_{k−2} + ⋯ + a_p γ_{k−p}, k = 1, 2, …,
which carries over to the ACF, namely,
ρ_k = a_1 ρ_{k−1} + a_2 ρ_{k−2} + ⋯ + a_p ρ_{k−p}, k = 1, 2, …
These relations are called Yule-Walker equations.
Using sample autocorrelations and collecting the first p equations in matrix form, we obtain

  [ ρ̂_1 ]   [ 1         ρ̂_1       …   ρ̂_{p−1} ] [ a_1 ]
  [ ρ̂_2 ]   [ ρ̂_1       1         …   ρ̂_{p−2} ] [ a_2 ]
  [ ρ̂_3 ] = [ ρ̂_2       ρ̂_1       …   ρ̂_{p−3} ] [ a_3 ]
  [  ⋮  ]   [  ⋮                   ⋱     ⋮     ] [  ⋮  ]
  [ ρ̂_p ]   [ ρ̂_{p−1}   ρ̂_{p−2}   …   1       ] [ a_p ]

or, in short, ρ̂_p = T a.
Yule-Walker estimation
The YW estimator is then given by â = T^{−1} ρ̂_p. If the SACF is estimated by
ρ̂_k = Σ_{i=k+1}^{T} (y_i − μ̂)(y_{i−k} − μ̂) / Σ_{i=1}^{T} (y_i − μ̂)²,
all roots of the YW-estimated AR polynomial 1 − â_1 L − ⋯ − â_p L^p will be greater than unity in modulus.
Let’s estimate an AR(2) model where we use more than the first two equations (an over-identified YW system), say the first four:

  [ ρ̂_1 ]   [ 1     ρ̂_1 ]
  [ ρ̂_2 ] = [ ρ̂_1   1   ] [ a_1 ]
  [ ρ̂_3 ]   [ ρ̂_2   ρ̂_1 ] [ a_2 ]
  [ ρ̂_4 ]   [ ρ̂_3   ρ̂_2 ]

The over-identified YW estimator is then given by the least-squares solution
â = (T_4′ T_4)^{−1} T_4′ ρ̂_4.
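A minimal sketch of Yule-Walker estimation for an AR(2), solving the first two YW equations in the sample ACF by hand and comparing with ar.yw() (the series is a simulated placeholder):

# Hand-coded Yule-Walker estimates versus ar.yw().
set.seed(21)
y <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 1000)
rho <- acf(y, lag.max = 2, plot = FALSE)$acf[2:3]      # rho_hat_1, rho_hat_2
Tmat <- matrix(c(1, rho[1],
                 rho[1], 1), nrow = 2, byrow = TRUE)   # the matrix T for p = 2
solve(Tmat, rho)                                       # a_hat = T^{-1} rho_hat_p
ar.yw(y, aic = FALSE, order.max = 2)$ar                # built-in Yule-Walker estimator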
Extended Yule-Walker equations
In the case of a pure MA process, the autocovariance simplifies to
γ_k = σ² Σ_{j=k}^{q} b_j b_{j−k} if k = 0, 1, …, q (with b_0 = 1), and γ_k = 0 if k > q.
More generally, for an ARMA(p,q) process,
𝛾𝑘 =𝑎1𝛾𝑘−1+𝑎2𝛾𝑘−2+⋯+𝑎𝑝𝛾𝑘−𝑝, 𝑘=𝑞+1,𝑞+2,…
or
𝜌𝑘 =𝑎1𝜌𝑘−1+𝑎2𝜌𝑘−2+⋯+𝑎𝑝𝜌𝑘−𝑝, 𝑘=𝑞+1,𝑞+2,…
The latter recursions are sometimes called extended Yule-Walker equations.
Another application of YW equation: estimate partial autocorrelation function
To compute the PACF, let a_{kl} denote the l-th autoregressive coefficient of an AR(k) process, that is,
y_t = a_{k1} y_{t−1} + a_{k2} y_{t−2} + ⋯ + a_{k,k−1} y_{t−(k−1)} + a_{kk} y_{t−k} + ε_{k,t}.
Then the lag-k partial autocorrelation is α_k = a_{kk}, k = 1, 2, …
The k Yule-Walker equations for the ACF give rise to the system of linear equations

  [ 1         ρ_1       …   ρ_{k−1} ] [ a_{k1}     ]   [ ρ_1     ]
  [ ρ_1       1         …   ρ_{k−2} ] [ a_{k2}     ]   [ ρ_2     ]
  [ ρ_2       ρ_1       …   ρ_{k−3} ] [ a_{k3}     ] = [ ρ_3     ]
  [  ⋮                   ⋱     ⋮    ] [   ⋮        ]   [  ⋮      ]
  [ ρ_{k−1}   ρ_{k−2}   …   1       ] [ a_{kk}     ]   [ ρ_k     ]

or, in short, P_k a_k = ρ_k, k = 1, 2, …
Estimate PACF
Sample Partial Autocorrelation Function
To estimate the sample PACF (SPACF), we follow the procedure for computing the theoretical PACF described earlier, but replace the theoretical autocorrelations ρ_i by their estimates ρ̂_i. By Cramer’s rule,
â_kk = |P̂*_k| / |P̂_k|, k = 1, 2, …,
where P̂_k is the k × k matrix of sample autocorrelations with (i, j) entry ρ̂_{|i−j|} (so ρ̂_0 = 1 on the diagonal), and P̂*_k is P̂_k with its k-th column replaced by (ρ̂_1, …, ρ̂_k)′.
From the Yule-Walker equations it is evident that |P̂*_k| = 0 for an AR process whose order is less than k.
Estimate PACF
A computationally more efficient procedure for estimating the SPACF is the following recursion for k = 1, 2, …:
â_kk = (ρ̂_k − Σ_{l=1}^{k−1} â_{k−1,l} ρ̂_{k−l}) / (1 − Σ_{l=1}^{k−1} â_{k−1,l} ρ̂_l),
â_kl = â_{k−1,l} − â_kk â_{k−1,k−l}, l = 1, 2, …, k − 1,
with â_{ij} = 0 for i, j < 1. For large samples and sufficiently large values of k, the SPACF is approximately normally distributed with variance Var(â_kk) ≈ 1/T. The 95% confidence interval can be approximated by ±2/√T.
Impact of forecast error
Considering forecast error helps for forecasting exchange rates.
With increasing globalization and liberalization of economies, many corporations and firms are expanding their business operations overseas. These organizations usually encounter currency exposure and need to forecast exchange rates to hedge against the risk of exchange-rate fluctuation. In addition, corporations need exchange-rate forecasts for decisions regarding short-term and long-term investments, short-term and long-term financing, and other capital-budgeting decisions.
However, forecasted exchange rates are seldom accurate, and it is this aspect that gives rise to forecast error. Forecast errors are more frequent in periods with a lot of fluctuation in currency rates. Potential forecast errors can have a great impact on the financial position of the firm; it is for this reason that corporations need to examine and calculate the degree of impact of such potential errors on their financial position before taking any financial decision.
Moving average model of order 1
The moving average model of order 1, written MA(1), is
Y_t = μ + e_t + θ_1 e_{t−1}.
The current value of Y_t is determined by the past shock/error plus a new error e_t. The time series is regarded as a moving average (unevenly weighted, because of the different coefficients) of the random error series e_t. Assumptions regarding the error term are the same as before: zero mean, constant variance σ_e², and mutually uncorrelated.
If θ_1 is zero, Y depends purely on the error or shock (e) at the current time, and there is no serial dependence.
If θ_1 is large in magnitude, the previous error strongly influences the value of Y_t.
If our model successfully captures the dependence structure in the data, then the residuals should look random.
Property of MA(1)
The MA(1) process Y_t = μ + e_t + θ_1 e_{t−1} is always stationary:
E(Y_t) = μ
Var(Y_t) = γ_0 = (1 + θ_1²) σ_e²
Cov(Y_t, Y_{t−1}) = γ_1 = θ_1 σ_e²
γ_k = 0, k > 1
Corr(Y_t, Y_{t−1}) = ρ_1 = θ_1/(1 + θ_1²)
Corr(Y_t, Y_{t−k}) = ρ_k = 0, k > 1
• Finite memory! MA(1) models do not remember what happened two time periods ago. The autocorrelation function has a cut-off after lag 1.
• Invertibility condition: θ_1 can be solved from the equation θ_1² − θ_1/ρ_1 + 1 = 0. If θ_1 is a solution, so is 1/θ_1. Thus we require |θ_1| < 1.
• What is the link between the AR and MA models? The MA model can be reformulated as an AR(∞). Given the MA(1): Y_t = e_t + θ_1 e_{t−1} and Y_{t−1} = e_{t−1} + θ_1 e_{t−2} ⇒ e_{t−1} = Y_{t−1} − θ_1 e_{t−2}, we have
Y_t = e_t + θ_1 Y_{t−1} − θ_1² e_{t−2} = e_t + θ_1 Y_{t−1} − θ_1² Y_{t−2} + θ_1³ e_{t−3} = ⋯
Thus the PACF of an MA(1) is infinite in extent but damps out geometrically. Its signs alternate if θ_1 > 0 (equivalently, if ρ_1 > 0).
MA(1) model
Moving Average models have a simple ACF structure.
The MA(1) models have nonzero autocorrelations only for k=1.
A simulated MA model of lag 1 and its sample ACF:
Y_t = μ + e_t + θ e_{t−1}
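A sketch along these lines (θ = 0.8 is an arbitrary choice) reproduces such a plot together with the corresponding PACF:

# Simulated MA(1) with its sample ACF and PACF.
set.seed(2)
y <- arima.sim(model = list(ma = 0.8), n = 300)
op <- par(mfrow = c(1, 2))
acf(y)    # should cut off after lag 1
pacf(y)   # decays gradually, alternating in sign here since theta > 0
par(op)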
Moving average models
The moving average model of lag q, written as MA(q) is:
Y_t = μ + e_t + θ_1 e_{t−1} + ⋯ + θ_q e_{t−q}.
Assumptions regarding the error term are the same as before: zero mean, constant variance σ_e², and mutually uncorrelated.
• The MA models are stationary!
E(Y_t) = μ
Var(Y_t) = (1 + θ_1² + ⋯ + θ_q²) σ_e²
ρ_k = Corr(Y_t, Y_{t−k}) = 0, k > q.
• The MA(q) models have nonzero autocorrelations for k = 1, …, q. If the correlogram “cuts off” at lag q, this suggests that we should try fitting a moving average model of order q.
• Invertibility condition: the MA(q) model is invertible if and only if
θ(z) ≠ 0 when |z| ≤ 1,
i.e. all roots (including complex roots) of θ(z) lie outside the unit circle.
• PACF: infinite in extent.
ACF and PACF of MA(1) model
ACF: cuts off at lag 1.
PACF: declines over time.
ACF and PACF of MA(2) model
The lag at which the ACF cuts off is the indicated number of possible MA terms.
Example: Inflation rate – MA(q) fit
The auto.arima function in R’s forecast package found that q = 3 is the first local minimum of AIC, while the first local minimum of BIC is at q = 2.
Call:
arima(x = diff(x), order = c(0, 0, 3))
Coefficients:
            ma1        ma2        ma3   intercept
      -0.632950  -0.102734  -0.108172   -0.000156
s.e.   0.046017   0.051399   0.046985    0.020892
Thus, if an MA model is used, then only two or three MA parameters are needed. This is a strong contrast with AR models, which require far more parameters.
ARMA(1,1) models
A model may have both autoregressive and moving average components. Autoregressive Moving Average (ARMA) model is an extension of AR model, where the future values depend on both the historical values (AR part) and the past forecast errors (MA part).
The ARMA(1,1) model has the form: Y_t = δ + φ Y_{t−1} + e_t + θ e_{t−1}.
Just a combination of MA and AR terms.
Reformulation:
Y_t = δ + φ Y_{t−1} + e_t + θ e_{t−1} = δ + φ(δ + φ Y_{t−2} + e_{t−1} + θ e_{t−2}) + e_t + θ e_{t−1} = ⋯
 = A_1 + B_1 Y_{t−k} + e_t + C_1 e_{t−1} + ⋯ + C_k e_{t−k}
Y_t = δ + φ Y_{t−1} + e_t + θ e_{t−1} = δ + φ Y_{t−1} + e_t + θ(Y_{t−1} − δ − φ Y_{t−2} − θ e_{t−2}) = ⋯
 = A_2 + D_1 Y_{t−1} + ⋯ + D_m Y_{t−m} + e_t + E_1
where the A’s, B’s, C’s, D’s and E’s are coefficients built from the AR and MA terms.
Property of ARMA(1,1) process
From the ARMA(1,1) model Y_t = δ + φ Y_{t−1} + e_t + θ e_{t−1}, we obtain
Cov(Y_t, e_t) = E(Y_t e_t) = σ_e²,
since e_t is independent of e_{t−1} and Y_{t−1}.
Multiplying both sides by Y_t and taking expectations, we have
Var(Y_t) = γ_0 = φ² γ_0 + (1 + θ²) σ_e² + 2φθ σ_e² ⇒ γ_0 = (1 + θ² + 2φθ) σ_e²/(1 − φ²).
Similarly, we obtain
ρ_1 = (1 + φθ)(φ + θ)/(1 + θ² + 2φθ).
For k ≥ 2, multiply both sides by Y_{t−k} and take expectations:
ρ_k = φ ρ_{k−1}, k ≥ 2.
After one lag, the ACF of an ARMA(1,1) process decays in the same way as the ACF of an AR(1) process with the same φ.
ARMA(1,1) model
A simulated ARMA(1,1) model, Y_t = 0.75 Y_{t−1} + e_t + 0.75 e_{t−1}, and its sample ACF.
ARMA(p,q) models
The ARMA(p,q) model has the form:
Y_t = δ + φ_1 Y_{t−1} + ⋯ + φ_p Y_{t−p} + e_t + θ_1 e_{t−1} + ⋯ + θ_q e_{t−q},
where p indicates the order of the lagged values (AR part) and q refers to the order of the past errors (MA part). By including the MA part, the model learns from the errors made over time and tries to improve forecast accuracy in the future.
AR(p) can be written as ARMA(p,0). MA(q) can be represented as ARMA(0,q).
ACF and PACF of ARMA(1,1) models
For the ARMA(1,1), both the ACF and the PACF exponentially decrease.
Much of fitting ARMA models is guesswork and trial-and-error!
Identification of lag orders: visual inspection of the SACF and SPACF.
Estimation: By relating the sample autocorrelations and partial autocorrelations to the ACF and PACF of ARMA models, candidate models may be identified. The selected model is estimated and its residuals are tested for randomness using the Q test statistics on the residual ACFs. If significant autocorrelations are detected in the residuals, a new model, normally with higher lag orders, is considered and the procedure is repeated.
Two principles: forecast accuracy, and parsimony.
Identification of lag orders: model selection criteria
Let the residuals of an estimated ARMA(p, q) model be denoted by ε̂_t(p, q). The estimate of the corresponding residual variance, denoted by σ̂²_{p,q}, is
σ̂²_{p,q} = (1/T) Σ_{t=1}^{T} ε̂_t²(p, q).
Larger models tend to fit better in-sample. However, if we use too many parameters we fit noise and obtain poor forecasting capabilities. This phenomenon is called overfitting.
In the extreme, we could achieve a perfect fit by fitting a “model” that has as many parameters as observations. Such models overfit the data by also capturing non-systematic features contained in the data. In general, overparameterized models tend to be unreliable.
Log-likelihood:
−(T/2) ln 2π − (T/2) ln σ̂²_{p,q} − (1/(2σ̂²_{p,q})) Σ_{t=1}^{T} ε̂_t².
Model selection criteria
Several model-selection criteria attempting to overcome the overparameterization problem have been proposed in the literature:
1. The Akaike Information Criterion (AIC) is given by
AIC_{p,q} = ln σ̂²_{p,q} + 2(p + q)/T.
The (p, q)-combination that minimizes the AIC should be selected. However, this criterion may yield more than one minimum, depends on the assumption that the data are normally distributed, and tends to overparameterize.
2. The Bayesian Information Criterion (BIC) is given by
BIC_{p,q} = ln σ̂²_{p,q} + (p + q) ln T / T.
This criterion imposes a more severe penalty for each additional parameter and thereby tends to select lower-order models than the AIC.
Model selection criteria
3. The corrected Akaike Information Criterion (AICC), given by
AICC_{p,q} = ln σ̂²_{p,q} + 2(p + q + 1)/(T − p − q − 2),
attempts to correct the bias of the AIC that causes the overparameterization problem and is especially designed for small samples. For small sample sizes, it tends to select different models.
Let k be the number of estimated parameters of a model as recommended by an information criterion. Due to the different penalty terms we have
k_AIC ≥ k_AICC ≥ k_BIC.
Model selection criteria
The BIC is strongly consistent in selecting the orders of a process; namely, it determines the true model asymptotically.
In contrast, the AIC will always determine an overparameterized model, independent of the length of the sample.
In practice, information criteria should be viewed as supplementary guidelines to assist in the model selection process rather than as the main model selection criterion.
There may be several models that produce criterion values that are very close to the minimum value. All reasonable models should remain candidates for the final selection and be subjected to further diagnostic checks (for example, a test for whiteness of the residuals).
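A hedged sketch of how such a comparison might be run over a small grid of (p, q) orders; the series y is simulated, and fits that fail to converge are simply skipped:

# Compare AIC and BIC across candidate ARMA(p, q) models.
set.seed(8)
y <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 400)
results <- data.frame()
for (p in 0:2) for (q in 0:2) {
  fit <- try(arima(y, order = c(p, 0, q)), silent = TRUE)
  if (!inherits(fit, "try-error"))
    results <- rbind(results, data.frame(p = p, q = q, AIC = AIC(fit), BIC = BIC(fit)))
}
results[order(results$AIC), ]   # rank by AIC; the BIC column may favour a smaller model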
Example: Inflation rate – AR(p) fit
The auto.arima function in R’s forecast package found that p = 8 is the first local minimum of AIC, while the first local minimum of BIC is at p = 6.
> auto.arima(diff(x), max.p=10, max.q=0, ic="aic")
> auto.arima(diff(x), max.p=10, max.q=0, ic="bic")
4_inflation.R
Example: Inflation rate – AR(7) fit
Here are the results for p =7.
4_inflation.R
Time series models
(Diagram: Sample → Sample ACFs → Model → Residuals, mirroring Population → Properties → True dynamics → Errors.)
Given data, we conduct statistical analysis to discover:
Static distributional properties such as the sample mean and sample variance
Serial dependence analysis such as the correlogram (sample ACFs)
(Relationships with exogenous variables)
We then select the model that matches the data’s characteristics:
Model type: time series models AR(p), MA(q) or ARMA(p,q)? Order: which lag orders p and q?
Box-Jenkins procedure
1. The purpose of the identification step in the Box-Jenkins approach is to first determine the autoregressive order, p, and the moving average order, q. These initial guesses are typically not final values for these orders, but they are used to specify one or more tentative (competing) models.
2. Given values for p and q from the identification step, the parameters of the ARMA(p,q) model are derived in the estimation step. It delivers estimates of the ARMA coefficients for ARMA model formulations and the residual variance 𝜎2.
3. In the diagnostic-checking step we examine the adequacy of the estimated model. Here, the main objective is to determine whether or not all relevant information has been “extracted“ from the data. If so, the residuals should resemble white noise.
4. Forecast: A comparison of the forecasting performance of alternative models over several post sample periods may help to find the most suitable model.
Which model should we choose?
That depends on the assumptions we are comfortable making with respect to the data. A lower-order model is easier to understand and interpret, and it can forecast reasonably well while requiring less information. As a general rule in this kind of situation, I would recommend choosing the model with the lower order, other things being roughly equal.