QBUS 6840 Lecture 8
ARIMA models (II)
The University of Sydney Business School
Objectives
Fully understand MA(q) models and their ACF/PACF characteristics
Be able to derive forecasts and forecast variances
Be very familiar with using the B (backshift) operator in all the related models
Understand the concept of invertibility
Fully understand ARMA(p,q) models and their characteristics
Fully understand ARIMA(p,d,q) models
Understand how to estimate ARIMA(p,d,q) models, the general
modelling procedure, and model selection
Review of ACF and PACF
For nonseasonal time series, if the ACF either cuts off fairly
quickly or dies down fairly quickly, then the time series should
be considered stationary
For nonseasonal time series, if the ACF dies down extremely
slowly, then it should be considered nonstationary
Review of AR(p) Processes
Yt = c + φ1Yt−1 + φ2Yt−2 + · · ·+ φpYt−p + εt
where εt is i.i.d. with mean zero and variance σ²
Data characteristics
The ACF dies down
The PACF has spikes at lags 1, 2, …, p and cuts off after lag p
Model characteristics (Examples)
For an AR(1) model Yt = c + φ1Yt−1 + εt to be stationary:
−1 < φ1 < 1

Review of AR(p) Processes
Consider the following
Yt+2 = c + Yt + 0.23Yt−1 + 0.41Yt−2 + εt+2
where εt is i.i.d. with mean zero and variance σ².
What is the order p of this AR model?

Moving average (MA) processes
MA(q) processes
Yt = c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
where εt is i.i.d. with mean zero and variance σ².
a weighted moving average of the past few forecast errors.
appropriate to model quantities Yt, such as economic indicators, which are affected by random events that have both an immediate and a persistent effect on Yt
sometimes the εt are called random shocks: shocks caused by unpredictable events
See example Lecture08 Example01.py

Example: MA(1) process
Yt = c + εt + θ1εt−1.
Unconditional expectation:
E[Yt] = E[c + εt + θ1εt−1] = c + 0 + θ1 × 0 = c
Unconditional variance:
V(Yt) = V(c) + V(εt) + V(θ1εt−1) = 0 + σ² + σ²θ1² = σ²(1 + θ1²)

Example: MA(1) process
Properties
Covariance:
Cov(Yt, Yt−1) = Cov(c + εt + θ1εt−1, c + εt−1 + θ1εt−2)
= Cov(c, c) + Cov(c, εt−1) + Cov(c, θ1εt−2)
+ Cov(εt, c) + Cov(εt, εt−1) + Cov(εt, θ1εt−2)
+ Cov(θ1εt−1, c) + Cov(θ1εt−1, εt−1) + Cov(θ1εt−1, θ1εt−2)
= θ1Cov(εt−1, εt−1) = θ1V(εt−1) = θ1σ²
Autocorrelation:
ρ1 = Cov(Yt, Yt−1)/V(Yt) = θ1/(1 + θ1²)

Example: MA(1) process
Properties
Cov(Yt, Yt−2) = 0, (Why?)
ρk = 0 for k > 1.
Conclusion: the MA(1) process is stationary for every θ1, and its ACF plot cuts off after lag 1.
Partial ACF:
ρkk = −(−θ1)^k (1 − θ1²) / (1 − θ1^(2(k+1)))
The partial ACF plot dies down exponentially when |θ1| < 1.

Example: MA(1) process
Forecasting
From the process Yt+1 = c + εt+1 + θ1εt we easily have the following one-step-ahead forecast:
E(Yt+1|y1:t) = ŷt+1 = c + θ1ε̂t,  V(Yt+1|y1:t) = σ².
We use the forecast errors ε̂1, ..., ε̂t from the previous periods to construct the next forecast at time t + 1.
Let the forecast at time t be ŷt, with forecast error ε̂t = yt − ŷt = yt − (c + θ1ε̂t−1).
The forecast of Yt+1 is ŷt+1 = c + θ1ε̂t and the forecast error is ε̂t+1 = yt+1 − ŷt+1 = yt+1 − (c + θ1ε̂t).

Example: MA(1) process
Forecasting
E(Yt+2|y1:t) = c + E(εt+2|y1:t) + θ1E(εt+1|y1:t) = c
so ŷt+2|t = c and V(Yt+2|y1:t) = σ²(1 + θ1²)
In general, it is easy to show that
E(Yt+h|y1:t) = c for h > 1
V(Yt+h|y1:t) = σ²(1 + θ1²) for h > 1
Example: MA(1) process
Forecasting
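The one-step recursion above is straightforward to implement. Below is a minimal sketch (not the lecture's example file) that reconstructs the forecast errors ε̂t on a simulated MA(1) series, assuming c, θ1 and σ are known; all values are illustrative.

```python
# Sketch: recursive one-step MA(1) forecasting with known parameters.
import numpy as np

rng = np.random.default_rng(0)
c, theta1, sigma = 10.0, 0.6, 1.0

# Simulate Y_t = c + eps_t + theta1 * eps_{t-1}
eps = rng.normal(0.0, sigma, size=201)
y = c + eps[1:] + theta1 * eps[:-1]

# Recursively reconstruct eps_hat_t = y_t - (c + theta1 * eps_hat_{t-1}),
# starting from eps_hat_0 = 0.
eps_hat = 0.0
for t in range(len(y)):
    y_hat = c + theta1 * eps_hat   # one-step forecast of y[t]
    eps_hat = y[t] - y_hat         # realised forecast error at time t

print("one-step forecast:", c + theta1 * eps_hat)   # y_hat_{t+1}
print("h > 1 forecast:", c, "with variance", sigma**2 * (1 + theta1**2))
```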
MA(q) processes
Properties
Unconditional variance:
V(Yt) = Cov(c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
            c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q)
      = σ²(1 + θ1² + · · · + θq²)
Covariance:
Cov(Yt, Yt−1) = Cov(c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
                    c + εt−1 + θ1εt−2 + θ2εt−3 + . . . + θqεt−q−1)
             = σ²(θ1 + θ1θ2 + θ2θ3 + . . . + θq−1θq).
Autocorrelation:
ρ1 = (θ1 + θ1θ2 + θ2θ3 + . . . + θq−1θq) / (1 + θ1² + . . . + θq²)
MA(q) processes
Properties
Cov(Yt, Yt−q) = Cov(c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
                    c + εt−q + θ1εt−q−1 + θ2εt−q−2 + . . . + θqεt−2q)
             = σ²θq
ρq = θq / (1 + θ1² + . . . + θq²)
ρk = 0 for k > q
Question: What about ρk if 2 ≤ k < q?
ρk = (θk + θk+1θ1 + · · · + θqθq−k) / (1 + θ1² + . . . + θq²)

MA(q) processes
Forecasting
ŷt+h = E(Yt+h|y1:t) = c + θ1E(εt+h−1|y1:t) + . . . + θqE(εt+h−q|y1:t),
where
E(εt+h−i|y1:t) = 0 if h > i, and E(εt+h−i|y1:t) = ε̂t+h−i if h ≤ i
V(Yt+h|y1:t) = σ²(1 + θ1² + · · · + θm²) where m = min(q, h − 1)
Example: MA(3) Forecasting
Forecasting if we know all the parameters c and θi's
ŷt+h =E(Yt+h|y1:t)
=c + θ1E(εt+h−1|y1:t) + θ2E(εt+h−2|y1:t) + θ3E(εt+h−3|y1:t),
ŷt+1 =c + θ1E(εt |y1:t) + θ2E(εt−1|y1:t) + θ3E(εt−2|y1:t)
=c + θ1ε̂t + θ2ε̂t−1 + θ3ε̂t−2
ŷt+2 =c + θ1E(εt+1|y1:t) + θ2E(εt |y1:t) + θ3E(εt−1|y1:t)
=c + θ1 × 0 + θ2ε̂t + θ3ε̂t−1 = c + θ2ε̂t + θ3ε̂t−1
ŷt+3 =c + θ1E(εt+2|y1:t) + θ2E(εt+1|y1:t) + θ3E(εt |y1:t)
=c + θ1 × 0 + θ2 × 0 + θ3ε̂t = c + θ3ε̂t
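A sketch of the same recursion for the MA(3) case in Python, assuming the parameters c and θ1, θ2, θ3 are known (the values below are made up for illustration):

```python
# Sketch: multi-step MA(3) forecasts with known parameters, mirroring the
# y_hat_{t+1}, y_hat_{t+2}, y_hat_{t+3} formulas above.
import numpy as np

rng = np.random.default_rng(1)
c, thetas = 5.0, np.array([0.4, 0.3, 0.2])

# Simulate Y_t = c + eps_t + theta1*eps_{t-1} + theta2*eps_{t-2} + theta3*eps_{t-3}
eps = rng.normal(size=303)
y = c + eps[3:] + thetas[0]*eps[2:-1] + thetas[1]*eps[1:-2] + thetas[2]*eps[:-3]

# Reconstruct the forecast errors recursively (initial errors set to zero).
e = np.zeros(len(y))
for t in range(len(y)):
    past = [e[t - i] if t - i >= 0 else 0.0 for i in (1, 2, 3)]
    e[t] = y[t] - (c + thetas @ np.array(past))

print("y_hat_{t+1}:", c + thetas[0]*e[-1] + thetas[1]*e[-2] + thetas[2]*e[-3])
print("y_hat_{t+2}:", c + thetas[1]*e[-1] + thetas[2]*e[-2])
print("y_hat_{t+3}:", c + thetas[2]*e[-1])
print("y_hat_{t+h} = c =", c, "for h > 3")
```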
MA(q) processes
Properties and Estimation
ρk (ACF) cuts off after lag q.
ρkk (PACF) dies down exponentially.
What is the best way for us to estimate all the parameters c
and θi ’s (1 ≤ i ≤ q)?
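One standard answer is maximum likelihood, which handles the nonlinearity discussed under invertibility below. A sketch using statsmodels on a simulated MA(1) series; the data, seed and order are illustrative, not the lecture's example:

```python
# Sketch: estimating an MA(q) model by maximum likelihood with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
eps = rng.normal(size=501)
y = 3.0 + eps[1:] + 0.7 * eps[:-1]   # simulated MA(1): c = 3, theta1 = 0.7

res = ARIMA(y, order=(0, 0, 1), trend="c").fit()  # order=(0, 0, q) for MA(q)
print(res.params)   # estimated intercept, ma.L1 (theta1) and sigma2
```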
Backshift operators
We now introduce the Backshift operator, which is very useful for
describing time series models
BYt = Yt−1
B2Yt = B(BYt) = B(Yt−1) = Yt−2
BkYt = Yt−k
Particularly, for a constant series {d}, we define Bd = d.
Backshift operators
In context: AR(1)
Yt = c + φ1Yt−1 + εt
which gives µ = E(Yt) = E(Yt−1) = c/(1 − φ1)
(1− φ1B)Yt = c + εt
(1− φ1B)(Yt − µ) = εt
which comes from the fact c = (1− φ1)µ = (1− φ1B)µ, which is
from Bd = d for any constant d .
Denote Zt = Yt − µ, then
(1− φ1B)Zt = εt =⇒ Zt = φ1Zt−1 + εt
Backshift operators
In context: MA(1)
Yt = c + εt + θ1εt−1
which gives µ = E(Yt) = c.
Yt = c + (1 + θ1B)εt
(Yt − µ) = (1 + θ1B)εt
Denote Zt = Yt − µ, then
Zt = (1 + θ1B)εt =⇒ Zt = εt + θ1εt−1
Backshift operators
In context: AR(p)
Yt = c + φ1Yt−1 + . . . + φpYt−p + εt
   = c + φ1B(Yt) + . . . + φpB^p(Yt) + εt
(1 − φ1B − φ2B² − . . . − φpB^p)(Yt − µ) = εt
where µ = c/(1 − φ1 − φ2 − · · · − φp), i.e.
(1 − Σ_{i=1}^{p} φiB^i)(Yt − µ) = εt
Backshift operators
In context: MA(q)
Yt = c + θ1εt−1 + . . . + θqεt−q + εt
   = c + θ1B(εt) + . . . + θqB^q(εt) + εt
(Yt − µ) = (1 + θ1B + θ2B² + . . . + θqB^q)εt
(Yt − µ) = (1 + Σ_{i=1}^{q} θiB^i)εt
where µ = c.
Invertibility
Definition
An MA(q) process is invertible when we can rewrite Yt as a linear
combination of its past values (an AR(∞)) plus the
contemporaneous error term εt .
Yt = εt + c* + φ*1Yt−1 + φ*2Yt−2 + φ*3Yt−3 + · · ·
where c∗ and each φ∗ can be calculated from the MA parameters
θi (1 ≤ i ≤ q)
Invertibility
MA(1) (optional)
Yt = c + θ1εt−1 + εt
Note: For MA processes c = µ
(Yt − µ) = (1 + θ1B)εt ⇒ εt = (1/(1 + θ1B))(Yt − µ)
[Recall that 1/(1 + x) = 1 − x + x² − x³ + · · · for |x| < 1]
Under the condition |θ1| < 1, we have
εt = (1 − θ1B + θ1²B² − θ1³B³ + . . .)(Yt − µ)
εt = −µ(1 − θ1 + θ1² − θ1³ + . . .) + Yt − θ1Yt−1 + θ1²Yt−2 − · · ·
so that
Yt = c* − Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i + εt

Invertibility
MA(1) (alternative route) (optional)
The MA(1) gives εt = Yt − c − θ1εt−1
Yt = c + θ1εt−1 + εt
   = c + θ1(yt−1 − c − θ1εt−2) + εt
   = c(1 − θ1) + θ1yt−1 − θ1²εt−2 + εt
   = c(1 − θ1 + θ1²) + θ1yt−1 − θ1²yt−2 + θ1³εt−3 + εt
   = . . .
   = c(1 − θ1 + θ1² − θ1³ + . . .) − Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i + εt
   = c* − Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i + εt
or εt = Yt − c* + Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i

Invertibility
Why it matters
If we want to find the value εt at a certain period and the process is invertible, we only need to know the current and past values of Y.
For a noninvertible representation we would need to use all future values of Y!
Convenient algorithms for estimating parameters and forecasting are only valid if we use an invertible representation.

Invertibility
MA(1) (Estimate) (optional)
We wish to find θ1 (and c*) minimising the sum of squared errors Σt εt².
From the previous formula, we know, given Y1, Y2, ..., YT,
ε2 = Y2 − c* − θ1Y1
ε3 = Y3 − c* − θ1Y2 + θ1²Y1
ε4 = Y4 − c* − θ1Y3 + θ1²Y2 − θ1³Y1
. . .
εT = YT − c* − θ1YT−1 + θ1²YT−2 − · · · + (−1)^(T−1) θ1^(T−1) Y1
This results in a nonlinear least squares problem. Harder than AR.

Invertibility
What about MA(1) in the case of θ1 > 1? (optional)
Note that for any t, Yt+1 = c + θ1εt + εt+1, so εt = (1/θ1)(Yt+1 − c − εt+1). Then
Yt = c + θ1εt−1 + εt
   = c + θ1εt−1 + (1/θ1)(Yt+1 − c − εt+1)
   = c(1 − 1/θ1) + θ1εt−1 + (1/θ1)Yt+1 − (1/θ1²)(Yt+2 − c − εt+2)
   = c(1 − 1/θ1 + 1/θ1²) + θ1εt−1 + (1/θ1)Yt+1 − (1/θ1²)Yt+2 + (1/θ1²)εt+2
   = . . .
   = θ1εt−1 + c(1 − 1/θ1 + 1/θ1² − . . .) + (1/θ1)Yt+1 − (1/θ1²)Yt+2 + · · ·
We have to use future Y's to express any εt, so there is no way to use the
observed data y1:t for estimating θ1.
Every invertible MA(q) model can be written as an AR model
of infinite order.
If the coefficient terms on yt−k in the AR representation
decline with k then the MA model is invertible. So is AR(p)
invertible?
An MA(1) requires that |θ1| < 1 for invertibility.
Every stationary AR(p) model can be written as a MA model of
infinite order.
Example: AR(1) as MA(∞)
Yt = c + φ1Yt−1 + εt
   = c(1 + φ1) + φ1²Yt−2 + φ1εt−1 + εt
   = c(1 + φ1 + φ1²) + φ1³Yt−3 + φ1²εt−2 + φ1εt−1 + εt
   = . . .
   = c(1 + φ1 + . . . + φ1^(k−1)) + φ1^k Yt−k + Σ_{i=1}^{k−1} φ1^i εt−i + εt
Letting k → ∞ (with |φ1| < 1):
Yt = c/(1 − φ1) + Σ_{i=1}^{∞} φ1^i εt−i + εt
Checking Stationarity of AR(p)
Consider the AR(p) process
Yt = φ1Yt−1 + φ2Yt−2 + · · ·+ φpYt−p + εt
Accordingly, define the characteristic equation
1− φ1z − φ2z2 − · · · − φpzp = 0
whose roots are called the characteristic roots. There are p
such roots, although some of them may be equal.
Conclusion: The AR(p) is stationary if all the roots satisfy |z| > 1,
i.e., all roots lie outside the unit circle.
For example, the AR(1) is Yt = φ1Yt−1 + εt . The
characteristic equation is 1− φ1z = 0 and its only root is
z∗ = 1/φ1. |z∗| > 1 implies the AR(1) stationarity. This
means |φ1| < 1.
Checking Invertibility of MA(q)
Consider the MA(q) process
Yt = εt + θ1εt−1 + θ2εt−2 + · · ·+ θqεt−q
Accordingly, define the characteristic equation
1 + θ1z + θ2z² + · · · + θqz^q = 0
whose roots are called the characteristic roots. There are q
such roots, although some of them may be equal.
Conclusion: The MA(q) is invertible if all the roots satisfy |z| > 1.
For example, the MA(1) is Yt = εt + θ1εt−1. The
characteristic equation is 1 + θ1z = 0 and its only root is
z∗ = −1/θ1. |z∗| > 1 implies the MA(1) is invertible. This
means |θ1| < 1.
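Both root conditions are easy to check numerically with numpy, as in the sketch below (coefficients are made up; note that np.roots expects the highest-degree coefficient first):

```python
# Sketch: checking AR stationarity and MA invertibility via characteristic roots.
import numpy as np

phi = [0.5, 0.3]                              # AR(2): 1 - 0.5z - 0.3z^2 = 0
ar_roots = np.roots([-phi[1], -phi[0], 1.0])  # highest-degree coefficient first
print("AR stationary:", np.all(np.abs(ar_roots) > 1))   # True here

theta = [1.2]                                 # MA(1): 1 + 1.2z = 0
ma_roots = np.roots([theta[0], 1.0])
print("MA invertible:", np.all(np.abs(ma_roots) > 1))   # False: |z*| = 1/1.2 < 1
```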
ARMA(p, q) processes
Yt = c + φ1Yt−1 + . . .+ φpYt−p + θ1εt−1 + . . .+ θqεt−q + εt ,
where εt is i.i.d. with mean zero and variance σ²
Example: ARMA(0, 0) : (White Noise)
Yt = c + εt ,
Example: ARMA(1, 1) :
Yt = c + φ1Yt−1 + θ1εt−1 + εt ,
ARMA(p, q) processes
Properties
µ = E(Yt) = c / (1 − φ1 − . . . − φp)
ACF: ρk dies down.
PACF: ρkk dies down.
See Examples Lecture08 Example02.py
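In the spirit of that example file (whose contents are not reproduced here), a sketch that simulates an ARMA(1,1) and plots its ACF and PACF, both of which should die down:

```python
# Sketch: ARMA(1,1) simulation and its ACF/PACF.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

ar = np.array([1.0, -0.6])   # AR polynomial (1 - 0.6B)
ma = np.array([1.0, 0.4])    # MA polynomial (1 + 0.4B)
y = ArmaProcess(ar, ma).generate_sample(nsample=500)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=20, ax=axes[0])    # dies down
plot_pacf(y, lags=20, ax=axes[1])   # dies down
plt.show()
```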
ARMA(1, 1)
Forecasting
Yt+1 = c + φ1Yt + θ1εt + εt+1,
ŷt+1 = E(Yt+1|y1, . . . , yt) = c + φ1yt + θ1ε̂t
Var(Yt+1|y1, . . . , yt) = σ².
ARMA(1, 1)
Forecasting
Yt+2 = c + φ1Yt+1 + θ1εt+1 + εt+2
= c + φ1(c + φ1Yt + θ1εt + εt+1) + θ1εt+1 + εt+2
= c(1 + φ1) + φ1²Yt + φ1θ1εt + (φ1 + θ1)εt+1 + εt+2

ŷt+2 = c(1 + φ1) + φ1²yt + φ1θ1ε̂t
Var(Yt+2|y1, . . . , yt) = σ²(1 + (φ1 + θ1)²).
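These formulas can be cross-checked against statsmodels, as in the sketch below on a simulated ARMA(1,1); predicted_mean and var_pred_mean are attributes of statsmodels' prediction results:

```python
# Sketch: 1- and 2-step ARMA(1,1) forecasts and their variances via statsmodels.
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

y = ArmaProcess([1, -0.6], [1, 0.4]).generate_sample(nsample=500)
res = ARIMA(y, order=(1, 0, 1), trend="c").fit()

fc = res.get_forecast(steps=2)
print(fc.predicted_mean)   # y_hat_{t+1}, y_hat_{t+2}
print(fc.var_pred_mean)    # approx. sigma^2 and sigma^2 * (1 + (phi1 + theta1)^2)
```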
Stationary transforms
Box and Jenkins advocate difference transforms to achieve
stationarity, e.g.
∆Yt = Yt − Yt−1
∆²Yt = (Yt − Yt−1) − (Yt−1 − Yt−2) = Yt − 2Yt−1 + Yt−2
Stationary transforms
Example: S&P 500 index
Taking the first difference:
Autocorrelations for the differenced series:
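Since the S&P 500 data itself is not included here, the sketch below reproduces the same diagnostic on a simulated random walk: the ACF of the level dies down very slowly, while the differenced series shows little autocorrelation.

```python
# Sketch: first differencing a nonstationary series and inspecting the ACF.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=1000))   # random walk: nonstationary
dy = np.diff(y)                        # first difference, Delta Y_t

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=30, ax=axes[0])       # dies down extremely slowly
plot_acf(dy, lags=30, ax=axes[1])      # essentially no autocorrelation
plt.show()
```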
Autoregressive Integrated Moving Average Models:
ARIMA(p, d , q)
Suppose we consider the d-th order difference of the original
time series {Yt}. Denote Zt = ∆dYt
An ARMA(p,q) model on {Zt} is called an ARIMA(p,d,q)
model on {Yt}
Examples Lecture08 Example03.py
ARIMA(0, 1, 0) model: the random walk model
After taking the first difference, the series ∆Yt is white noise, i.e.,
∆Yt = εt . We can therefore write:
∆Yt = Yt − Yt−1 = εt
The random walk model is:
Yt = Yt−1 + εt .
Adding an intercept, we obtain the random walk plus drift model:
Yt − Yt−1 = c + εt ,
Yt = c + Yt−1 + εt .
ARIMA(0, 1, 0)
Random walk model
Yt = Yt−1 + εt
   = Yt−2 + εt−1 + εt
   = Yt−3 + εt−2 + εt−1 + εt
   = . . .
ARIMA(0, 1, 0)
Random walk model
Model equation: Yt = Yt−1 + εt
Yt+h = Yt + Σ_{i=1}^{h} εt+i
ŷt+h = yt
Var(Yt+h|y1:t) = hσ²
ARIMA(0, 1, 0)
Random walk plus drift model
Model equation: Yt = c + Yt−1 + εt
Yt+h = Yt + Σ_{i=1}^{h} (c + εt+i)
ŷt+h = yt + c × h
Var(Yt+h|y1:t) = hσ²
It is the formal statistical model for the drift forecasting method
mentioned earlier in the course.
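A sketch of these forecasts on simulated data, estimating c by the mean of the first differences (as the drift method does); the series and values are stand-ins:

```python
# Sketch: h-step forecasts from a random walk with drift,
# y_hat_{t+h} = y_t + c*h with variance h * sigma^2.
import numpy as np

rng = np.random.default_rng(4)
y = np.cumsum(0.5 + rng.normal(size=300))   # true drift c = 0.5, sigma = 1

dy = np.diff(y)
c_hat, sigma2_hat = dy.mean(), dy.var(ddof=1)

h = np.arange(1, 11)
print("forecasts:", y[-1] + c_hat * h)
print("variances:", h * sigma2_hat)
```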
Seasonally adjusted visitor arrivals in Australia
Example of modelling process
Seasonally adjusted visitor arrivals in Australia
Variance stabilising transform
We first take the log of the series as a variance stabilising
transformation:
Log seasonally adjusted visitor arrivals in Australia
ACF and PACF for the log series
Log seasonally adjusted visitor arrivals in Australia
Stationary transform
We then take the first difference:
Log seasonally adjusted visitor arrivals in Australia
Differenced series
Autocorrelations for the differenced series:
Log seasonally adjusted visitor arrivals in Australia
Tentative model identification
The ACF of the differenced series cuts off after lag one.
The PACF seems to die down.
This suggests that the differenced series may be an MA(1)
The original log series would then be an ARIMA(0, 1, 1)
Log seasonally adjusted visitor arrivals in Australia
ARIMA(0, 1, 1) model
Yt − Yt−1 = εt + θ1εt−1
(1− B)Yt = (1 + θ1B)εt
With an intercept:
Yt − Yt−1 = c + εt + θ1εt−1
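A sketch of fitting this ARIMA(0, 1, 1) with an intercept in statsmodels, using a simulated stand-in for the log arrivals series; note that with d = 1 the drift term is requested via trend="t":

```python
# Sketch: fitting the tentatively identified ARIMA(0,1,1) with drift.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
eps = rng.normal(scale=0.03, size=301)
log_y = np.cumsum(0.01 + eps[1:] + 0.5 * eps[:-1])  # simulated ARIMA(0,1,1) + drift

res = ARIMA(log_y, order=(0, 1, 1), trend="t").fit()  # "t" = drift when d = 1
print(res.summary())
print(res.forecast(steps=8))   # forecasts of the log series
```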
Log seasonally adjusted visitor arrivals in Australia
Forecasting by adding an intercept to the model
Log seasonally adjusted visitor arrivals in Australia
Residual analysis
ARIMA(0, 1, 1) model
Reinterpreting the model
Consider the ARIMA(0, 1, 1) model with the intercept c = 0:
Model equation: Yt = Yt−1 + εt + θ1εt−1
E(Yt |y1:t−1) = yt−1 + θ1εt−1
= yt−1 + θ1(yt−1 − yt−2 − θ1εt−2)
= (1 + θ1)yt−1 − θ1(yt−2 + θ1εt−2)
Now, label ℓt−1 = yt−1 + θ1εt−1 and α = 1 + θ1. We get:
ℓt−1 = αyt−1 + (1 − α)ℓt−2
The simple exponential smoothing model.
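This correspondence, α = 1 + θ1, can be checked numerically. A sketch on simulated data (the parameter names "ma.L1" and "smoothing_level" are statsmodels conventions; the two estimates agree only approximately):

```python
# Sketch: ARIMA(0,1,1) vs simple exponential smoothing, alpha = 1 + theta1.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(6)
eps = rng.normal(size=501)
y = np.cumsum(eps[1:] - 0.5 * eps[:-1])   # ARIMA(0,1,1) with theta1 = -0.5

theta1_hat = ARIMA(y, order=(0, 1, 1)).fit().params["ma.L1"]
alpha_hat = SimpleExpSmoothing(y).fit().params["smoothing_level"]
print("1 + theta1_hat =", 1 + theta1_hat)   # both should be near 0.5
print("alpha_hat      =", alpha_hat)
```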
ARMA(p, q) vs ARIMA(p, d , q) processes
Formulation with backshift operators
ARMA(p, q) processes in B-form:
(1 − Σ_{i=1}^{p} φiB^i) Yt = c + (1 + Σ_{i=1}^{q} θiB^i) εt
ARIMA(p, d, q) processes in B-form:
(1 − Σ_{i=1}^{p} φiB^i) (1 − B)^d Yt = c + (1 + Σ_{i=1}^{q} θiB^i) εt
Procedure to Estimate ARMA(p, q)/ARIMA(p, d , q)
processes:
1 For the given time series {Yt}, check its stationarity by looking at
its Sample ACF and Sample PACF.
2 If the ACF does not die down quickly, which means the given time series
{Yt} is nonstationary, we seek a transformation, e.g., the log
transformation {Zt = log(Yt)}, the first order difference
{Zt = Yt − Yt−1}, the difference of the log series, or the
difference of the first order difference, so that the transformed time
series is stationary (checked via its Sample ACF)
3 When both the Sample ACF and Sample PACF die down quickly, check
the lags at which the ACF and PACF cut off. The cutoff lag of the ACF
gives the order q of the ARIMA, the cutoff lag of the PACF gives the
order p, and the number of differences taken gives d.
4 Estimate the identified ARIMA(p, d , q), or ARMA(p, q) (if we did
not do any difference transformation)
5 Make forecasts with the estimated ARIMA(p, d, q), or ARMA(p, q)
ARIMA(p, d , q) processes
Order selection
(1 − Σ_{i=1}^{p} φiB^i)(1 − B)^d Yt = c + (1 + Σ_{i=1}^{q} θiB^i)εt
How to choose p (the number of AR terms) and q (the number of
MA) terms when the ACF and PACF do not give us a
straightforward answer?
ARIMA order selection
Akaike’s Information Criterion
We define Akaike’s Information Criterion as
AIC = −2 log(L) + 2(p + q + k + 1),
where L is the likelihood of the data and k = 1 if the model
has an intercept (k = 0 otherwise).
The model with the minimum value of the AIC is often the
best model for forecasting.
The corrected AIC described in FPP has better performance
in small samples.
ARIMA order selection
Corrected Akaike’s Information Criterion
The corrected Akaike’s Information Criterion is
AICc = AIC + 2(p + q + k + 1)(p + q + k + 2) / (n − p − q − k − 2)
where n is the number of observations.
The corrected AIC penalises extra parameters more
heavily and has better performance in small samples.
The AICc is the foremost criterion used by researchers in
selecting the orders of ARIMA models.
The AICc is based on the assumption of normally distributed
residuals.
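In statsmodels, all three criteria are available directly on a fitted model; a minimal sketch on simulated data:

```python
# Sketch: reading AIC, AICc and BIC off a fitted ARIMA model.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.default_rng(7).normal(size=200))  # stand-in series
res = ARIMA(y, order=(1, 1, 1)).fit()
print(res.aic, res.aicc, res.bic)   # smaller is better for each criterion
```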
ARIMA order selection
Information Criterion
A related measure is Schwarz’s Bayesian Information Criterion
(known as SBIC, BIC or SC):
BIC = −2 log(L) + log(n)(p + q + k + 1)
    = AIC + (log(n) − 2)(p + q + k + 1).
As with the AIC, minimizing the BIC is intended to give the
best model. The model chosen by BIC is either the same as
that chosen by AIC, or one with fewer parameters. This is
because BIC penalizes the model complexity more heavily
than the AIC.
Under some mathematical assumptions, BIC can select the
true model with enough data (if the true model exists!)
ARIMA model selection: Procedure
Make sure the given time series is stationary, may need
transformation.
Check its sample ACF and PACF to choose the largest AR order p0 and MA order q0 to consider
For each pair of (p, q) where 1 ≤ p ≤ p0 and 1 ≤ q ≤ q0,
estimate the model ARIMA(p,d,q) or ARMA(p,q) (if we did
not do any difference transformation)
Find out that pair (p, q) which gives the best BIC (or AIC, or
AICc) [depending on which criterion you choose]
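A sketch of this selection loop on simulated stand-in data, minimising the AICc (swap in .bic or .aic if preferred); orders that fail to estimate are simply skipped:

```python
# Sketch: grid search over (p, q) for fixed d, keeping the best AICc.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.default_rng(8).normal(size=300))  # stand-in data
d, p0, q0 = 1, 3, 3

best = None
for p in range(p0 + 1):
    for q in range(q0 + 1):
        try:
            res = ARIMA(y, order=(p, d, q)).fit()
        except Exception:
            continue                    # skip orders that fail to converge
        if best is None or res.aicc < best[0]:
            best = (res.aicc, p, q)

print("best (AICc, p, q):", best)
```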
ARIMA model selection: example
Alcohol related assaults in NSW
ARIMA model selection: example
Log series
ARIMA model selection: example
First differenced log series
ARIMA model selection: example
ACF and PACF for the first differenced series