QBUS 6840 Lecture 8
ARIMA models (II)
The University of Sydney Business School
Objectives
Fully understand MA(q) models and their ACF/PACF characteristics
Be able to derive forecasts and forecast variances
Be very familiar with using the B (backshift) operator in all the related models
Understand the concept of invertibility
Fully understand ARMA(p,q) models and their characteristics
Fully understand ARIMA(p,d,q) models
Understand how to estimate ARIMA(p,d,q) models, the general
modelling procedure, and model selection
Review of ACF and PACF
For nonseasonal time series, if the ACF either cuts off fairly
quickly or dies down fairly quickly, then the time series should
be considered stationary
For nonseasonal time series, if the ACF dies down extremely
slowly, then it should be considered nonstationary
Review of AR(p) Processes
Yt = c + φ1Yt−1 + φ2Yt−2 + · · ·+ φpYt−p + εt
where εt is i.i.d. with mean zero and variance σ²
Data characteristics
The ACF dies down
The PACF has spikes at lags 1, 2, …, p and cuts off after lag p
Model characteristics (Examples)
For an AR(1) model Yt = c + φ1Yt−1 + εt to be stationary:
−1 < φ1 < 1

Review of AR(p) Processes
Consider the following
Yt+2 = c + Yt + 0.23Yt−1 + 0.41Yt−2 + εt+2
where εt is i.i.d. with mean zero and variance σ².
What is the order p of this AR model?

Moving average (MA) processes
MA(q) processes
Yt = c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
where εt is i.i.d. with mean zero and variance σ².
a weighted moving average of the past few forecast errors.
appropriate to model quantities Yt, such as economic indicators, which are affected by random events that have both an immediate and a persistent effect on Yt
sometimes the εt are called random shocks: shocks caused by unpredictable events
See example Lecture08 Example01.py

Example: MA(1) process
Yt = c + εt + θ1εt−1.
Unconditional expectation:
E[Yt] = E[c + εt + θ1εt−1] = c + 0 + θ1 × 0 = c
Unconditional variance:
V(Yt) = V(c) + V(εt) + V(θ1εt−1) = 0 + σ² + σ²θ1² = σ²(1 + θ1²)

Example: MA(1) process
Properties
Covariance:
Cov(Yt, Yt−1) = Cov(c + εt + θ1εt−1, c + εt−1 + θ1εt−2)
= Cov(c, c) + Cov(c, εt−1) + Cov(c, θ1εt−2)
+ Cov(εt, c) + Cov(εt, εt−1) + Cov(εt, θ1εt−2)
+ Cov(θ1εt−1, c) + Cov(θ1εt−1, εt−1) + Cov(θ1εt−1, θ1εt−2)
= θ1Cov(εt−1, εt−1) = θ1V(εt−1) = θ1σ²
Autocorrelation:
ρ1 = Cov(Yt, Yt−1)/V(Yt) = θ1/(1 + θ1²)

Example: MA(1) process
Properties
Cov(Yt, Yt−2) = 0, (Why?)
ρk = 0 for k > 1.
Conclusion: the MA(1) process is stationary for every θ1, and its ACF plot cuts off after lag 1.
Partial ACF:
ρkk = −(−θ1)^k (1 − θ1²) / (1 − θ1^(2(k+1)))
The partial ACF plot dies down exponentially when |θ1| < 1.

Example: MA(1) process
Forecasting
From the process Yt+1 = c + εt+1 + θ1εt we easily have the following one-step-ahead forecast:
E(Yt+1|y1:t) = ŷt+1 = c + θ1ε̂t,  V(Yt+1|y1:t) = σ².
We use the forecast errors ε̂1, ..., ε̂t from the previous periods to construct the next forecast at time t + 1.
Let the forecast at time t be ŷt, with forecast error ε̂t = yt − ŷt = yt − (c + θ1ε̂t−1).
The forecast of Yt+1 is ŷt+1 = c + θ1ε̂t and the forecast error is ε̂t+1 = yt+1 − ŷt+1 = yt+1 − (c + θ1ε̂t).

Example: MA(1) process
Forecasting
E(Yt+2|y1:t) = c + E(εt+2|y1:t) + θ1E(εt+1|y1:t) = c
so ŷt+2|t = c and V(Yt+2|y1:t) = σ²(1 + θ1²)
In general, it is easy to show that
E(Yt+h|y1:t) = c for h > 1
V(Yt+h|y1:t) = σ²(1 + θ1²) for h > 1
Example: MA(1) process
Forecasting
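The one-step recursion above is straightforward to implement. Below is a minimal sketch (not the lecture's example file) that reconstructs the forecast errors ε̂t on a simulated MA(1) series, assuming c, θ1 and σ are known; all values are illustrative.

```python
# Sketch: recursive one-step MA(1) forecasting with known parameters.
import numpy as np

rng = np.random.default_rng(0)
c, theta1, sigma = 10.0, 0.6, 1.0

# Simulate Y_t = c + eps_t + theta1 * eps_{t-1}
eps = rng.normal(0.0, sigma, size=201)
y = c + eps[1:] + theta1 * eps[:-1]

# Recursively reconstruct eps_hat_t = y_t - (c + theta1 * eps_hat_{t-1}),
# starting from eps_hat_0 = 0.
eps_hat = 0.0
for t in range(len(y)):
    y_hat = c + theta1 * eps_hat   # one-step forecast of y[t]
    eps_hat = y[t] - y_hat         # realised forecast error at time t

print("one-step forecast:", c + theta1 * eps_hat)   # y_hat_{t+1}
print("h > 1 forecast:", c, "with variance", sigma**2 * (1 + theta1**2))
```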
MA(q) processes
Properties
Unconditional variance:
V(Yt) = Cov(c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
            c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q)
      = σ²(1 + θ1² + · · · + θq²)
Covariance:
Cov(Yt, Yt−1) = Cov(c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
                    c + εt−1 + θ1εt−2 + θ2εt−3 + . . . + θqεt−q−1)
             = σ²(θ1 + θ1θ2 + θ2θ3 + . . . + θq−1θq).
Autocorrelation:
ρ1 = (θ1 + θ1θ2 + θ2θ3 + . . . + θq−1θq) / (1 + θ1² + . . . + θq²)
MA(q) processes
Properties
Cov(Yt, Yt−q) = Cov(c + εt + θ1εt−1 + θ2εt−2 + . . . + θqεt−q,
                    c + εt−q + θ1εt−q−1 + θ2εt−q−2 + . . . + θqεt−2q)
             = σ²θq
ρq = θq / (1 + θ1² + . . . + θq²)
ρk = 0 for k > q
Question: What about ρk if 2 ≤ k < q?
ρk = (θk + θk+1θ1 + · · · + θqθq−k) / (1 + θ1² + . . . + θq²)

MA(q) processes
Forecasting
ŷt+h = E(Yt+h|y1:t) = c + θ1E(εt+h−1|y1:t) + . . . + θqE(εt+h−q|y1:t),
where
E(εt+h−i|y1:t) = 0 if h > i, and E(εt+h−i|y1:t) = ε̂t+h−i if h ≤ i
V(Yt+h|y1:t) = σ²(1 + θ1² + · · · + θm²) where m = min(q, h − 1)
Example: MA(3) Forecasting
Forecasting if we know all the parameters c and θi's
ŷt+h =E(Yt+h|y1:t)
=c + θ1E(εt+h−1|y1:t) + θ2E(εt+h−2|y1:t) + θ3E(εt+h−3|y1:t),
ŷt+1 =c + θ1E(εt |y1:t) + θ2E(εt−1|y1:t) + θ3E(εt−2|y1:t)
=c + θ1ε̂t + θ2ε̂t−1 + θ3ε̂t−2
ŷt+2 =c + θ1E(εt+1|y1:t) + θ2E(εt |y1:t) + θ3E(εt−1|y1:t)
=c + θ1 × 0 + θ2ε̂t + θ3ε̂t−1 = c + θ2ε̂t + θ3ε̂t−1
ŷt+3 =c + θ1E(εt+2|y1:t) + θ2E(εt+1|y1:t) + θ3E(εt |y1:t)
=c + θ1 × 0 + θ2 × 0 + θ3ε̂t = c + θ3ε̂t
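A sketch of the same recursion for the MA(3) case in Python, assuming the parameters c and θ1, θ2, θ3 are known (the values below are made up for illustration):

```python
# Sketch: multi-step MA(3) forecasts with known parameters, mirroring the
# y_hat_{t+1}, y_hat_{t+2}, y_hat_{t+3} formulas above.
import numpy as np

rng = np.random.default_rng(1)
c, thetas = 5.0, np.array([0.4, 0.3, 0.2])

# Simulate Y_t = c + eps_t + theta1*eps_{t-1} + theta2*eps_{t-2} + theta3*eps_{t-3}
eps = rng.normal(size=303)
y = c + eps[3:] + thetas[0]*eps[2:-1] + thetas[1]*eps[1:-2] + thetas[2]*eps[:-3]

# Reconstruct the forecast errors recursively (initial errors set to zero).
e = np.zeros(len(y))
for t in range(len(y)):
    past = [e[t - i] if t - i >= 0 else 0.0 for i in (1, 2, 3)]
    e[t] = y[t] - (c + thetas @ np.array(past))

print("y_hat_{t+1}:", c + thetas[0]*e[-1] + thetas[1]*e[-2] + thetas[2]*e[-3])
print("y_hat_{t+2}:", c + thetas[1]*e[-1] + thetas[2]*e[-2])
print("y_hat_{t+3}:", c + thetas[2]*e[-1])
print("y_hat_{t+h} = c =", c, "for h > 3")
```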
MA(q) processes
Properties and Estimation
ρk (ACF) cuts off after lag q.
ρkk (PACF) dies down exponentially.
What is the best way for us to estimate all the parameters c
and θi ’s (1 ≤ i ≤ q)?
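One standard answer is maximum likelihood, which handles the nonlinearity discussed under invertibility below. A sketch using statsmodels on a simulated MA(1) series; the data, seed and order are illustrative, not the lecture's example:

```python
# Sketch: estimating an MA(q) model by maximum likelihood with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
eps = rng.normal(size=501)
y = 3.0 + eps[1:] + 0.7 * eps[:-1]   # simulated MA(1): c = 3, theta1 = 0.7

res = ARIMA(y, order=(0, 0, 1), trend="c").fit()  # order=(0, 0, q) for MA(q)
print(res.params)   # estimated intercept, ma.L1 (theta1) and sigma2
```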
Backshift operators
We now introduce the Backshift operator, which is very useful for
describing time series models
BYt = Yt−1
B2Yt = B(BYt) = B(Yt−1) = Yt−2
BkYt = Yt−k
Particularly, for a constant series {d}, we define Bd = d.
Backshift operators
In context: AR(1)
Yt = c + φ1Yt−1 + εt
which gives µ = E(Yt) = E(Yt−1) = c/(1 − φ1)
(1− φ1B)Yt = c + εt
(1− φ1B)(Yt − µ) = εt
which comes from the fact c = (1− φ1)µ = (1− φ1B)µ, which is
from Bd = d for any constant d .
Denote Zt = Yt − µ, then
(1− φ1B)Zt = εt =⇒ Zt = φ1Zt−1 + εt
Backshift operators
In context: MA(1)
Yt = c + εt + θ1εt−1
which gives µ = E(Yt) = c.
Yt = c + (1 + θ1B)εt
(Yt − µ) = (1 + θ1B)εt
Denote Zt = Yt − µ, then
Zt = (1 + θ1B)εt =⇒ Zt = εt + θ1εt−1
Backshift operators
In context: AR(p)
Yt = c + φ1Yt−1 + . . . + φpYt−p + εt
   = c + φ1B(Yt) + . . . + φpB^p(Yt) + εt
(1 − φ1B − φ2B² − . . . − φpB^p)(Yt − µ) = εt
where µ = c/(1 − φ1 − φ2 − · · · − φp), i.e.
(1 − Σ_{i=1}^{p} φiB^i)(Yt − µ) = εt
Backshift operators
In context: MA(q)
Yt = c + θ1εt−1 + . . . + θqεt−q + εt
   = c + θ1B(εt) + . . . + θqB^q(εt) + εt
(Yt − µ) = (1 + θ1B + θ2B² + . . . + θqB^q)εt
(Yt − µ) = (1 + Σ_{i=1}^{q} θiB^i)εt
where µ = c.
Invertibility
Definition
An MA(q) process is invertible when we can rewrite Yt as a linear
combination of its past values (an AR(∞)) plus the
contemporaneous error term εt .
Yt = εt + c* + φ*1Yt−1 + φ*2Yt−2 + φ*3Yt−3 + · · ·
where c∗ and each φ∗ can be calculated from the MA parameters
θi (1 ≤ i ≤ q)
Invertibility
MA(1) (optional)
Yt = c + θ1εt−1 + εt
Note: For MA processes c = µ
(Yt − µ) = (1 + θ1B)εt ⇒ εt = (1/(1 + θ1B))(Yt − µ)
[Recall that 1/(1 + x) = 1 − x + x² − x³ + · · · for |x| < 1]
Under the condition |θ1| < 1, we have
εt = (1 − θ1B + θ1²B² − θ1³B³ + . . .)(Yt − µ)
εt = −µ(1 − θ1 + θ1² − θ1³ + . . .) + Yt − θ1Yt−1 + θ1²Yt−2 − · · ·
so that
Yt = c* − Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i + εt

Invertibility
MA(1) (alternative route) (optional)
The MA(1) gives εt = Yt − c − θ1εt−1
Yt = c + θ1εt−1 + εt
   = c + θ1(yt−1 − c − θ1εt−2) + εt
   = c(1 − θ1) + θ1yt−1 − θ1²εt−2 + εt
   = c(1 − θ1 + θ1²) + θ1yt−1 − θ1²yt−2 + θ1³εt−3 + εt
   = . . .
   = c(1 − θ1 + θ1² − θ1³ + . . .) − Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i + εt
   = c* − Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i + εt
or εt = Yt − c* + Σ_{i=1}^{∞} (−1)^i θ1^i Yt−i

Invertibility
Why it matters
If we want to find the value εt at a certain period and the process is invertible, we only need to know the current and past values of Y.
For a noninvertible representation we would need to use all future values of Y!
Convenient algorithms for estimating parameters and forecasting are only valid if we use an invertible representation.

Invertibility
MA(1) (Estimate) (optional)
We wish to find θ1 (and c*) minimising the sum of squared errors Σt εt².
From the previous formula, we know, given Y1, Y2, ..., YT,
ε2 = Y2 − c* − θ1Y1
ε3 = Y3 − c* − θ1Y2 + θ1²Y1
ε4 = Y4 − c* − θ1Y3 + θ1²Y2 − θ1³Y1
. . .
εT = YT − c* − θ1YT−1 + θ1²YT−2 − · · · + (−1)^(T−1) θ1^(T−1) Y1
This results in a nonlinear least squares problem. Harder than AR.

Invertibility
What about MA(1) in the case of θ1 > 1? (optional)
Note that for any t, Yt+1 = c + θ1εt + εt+1, so εt = (1/θ1)(Yt+1 − c − εt+1). Then
Yt = c + θ1εt−1 + εt
   = c + θ1εt−1 + (1/θ1)(Yt+1 − c − εt+1)
   = c(1 − 1/θ1) + θ1εt−1 + (1/θ1)Yt+1 − (1/θ1²)(Yt+2 − c − εt+2)
   = c(1 − 1/θ1 + 1/θ1²) + θ1εt−1 + (1/θ1)Yt+1 − (1/θ1²)Yt+2 + (1/θ1²)εt+2
   = . . .
   = θ1εt−1 + c(1 − 1/θ1 + 1/θ1² − . . .) + (1/θ1)Yt+1 − (1/θ1²)Yt+2 + · · ·
We have to use future Y's to express any εt, so there is no way to use the
observed data y1:t for estimating θ1.
Every invertible MA(q) model can be written as an AR model
of infinite order.
If the coefficient terms on yt−k in the AR representation
decline with k then the MA model is invertible. So is AR(p)
invertible?
An MA(1) requires that |θ1| < 1 for invertibility.
Every stationary AR(p) model can be written as a MA model of
infinite order.
Example: AR(1) as MA(∞)
Yt = c + φ1Yt−1 + εt
   = c(1 + φ1) + φ1²Yt−2 + φ1εt−1 + εt
   = c(1 + φ1 + φ1²) + φ1³Yt−3 + φ1²εt−2 + φ1εt−1 + εt
   = . . .
   = c(1 + φ1 + . . . + φ1^(k−1)) + φ1^k Yt−k + Σ_{i=1}^{k−1} φ1^i εt−i + εt
Letting k → ∞ (with |φ1| < 1):
Yt = c/(1 − φ1) + Σ_{i=1}^{∞} φ1^i εt−i + εt
Checking Stationarity of AR(p)
Consider the AR(p) process
Yt = φ1Yt−1 + φ2Yt−2 + · · ·+ φpYt−p + εt
Accordingly, define the characteristic equation
1− φ1z − φ2z2 − · · · − φpzp = 0
whose roots are called the characteristic roots. There are p
such roots, although some of them may be equal.
Conclusion: The AR(p) is stationary if all the roots satisfy |z| > 1,
i.e., all roots lie outside the unit circle.
For example, the AR(1) is Yt = φ1Yt−1 + εt . The
characteristic equation is 1− φ1z = 0 and its only root is
z∗ = 1/φ1. |z∗| > 1 implies the AR(1) stationarity. This
means |φ1| < 1.
Checking Invertibility of MA(q)
Consider the MA(q) process
Yt = εt + θ1εt−1 + θ2εt−2 + · · ·+ θqεt−q
Accordingly, define the characteristic equation
1 + θ1z + θ2z² + · · · + θqz^q = 0
whose roots are called the characteristic roots. There are q
such roots, although some of them may be equal.
Conclusion: The MA(q) is invertible if all the roots satisfy |z| > 1.
For example, the MA(1) is Yt = εt + θ1εt−1. The
characteristic equation is 1 + θ1z = 0 and its only root is
z∗ = −1/θ1. |z∗| > 1 implies the MA(1) is invertible. This
means |θ1| < 1.
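Both root conditions are easy to check numerically with numpy, as in the sketch below (coefficients are made up; note that np.roots expects the highest-degree coefficient first):

```python
# Sketch: checking AR stationarity and MA invertibility via characteristic roots.
import numpy as np

phi = [0.5, 0.3]                              # AR(2): 1 - 0.5z - 0.3z^2 = 0
ar_roots = np.roots([-phi[1], -phi[0], 1.0])  # highest-degree coefficient first
print("AR stationary:", np.all(np.abs(ar_roots) > 1))   # True here

theta = [1.2]                                 # MA(1): 1 + 1.2z = 0
ma_roots = np.roots([theta[0], 1.0])
print("MA invertible:", np.all(np.abs(ma_roots) > 1))   # False: |z*| = 1/1.2 < 1
```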
ARMA(p, q) processes
Yt = c + φ1Yt−1 + . . .+ φpYt−p + θ1εt−1 + . . .+ θqεt−q + εt ,
where εt is i.i.d. with mean zero and variance σ²
Example: ARMA(0, 0) : (White Noise)
Yt = c + εt ,
Example: ARMA(1, 1) :
Yt = c + φ1Yt−1 + θ1εt−1 + εt ,
ARMA(p, q) processes
Properties
µ = E(Yt) = c / (1 − φ1 − . . . − φp)
ACF: ρk dies down.
PACF: ρkk dies down.
See Examples Lecture08 Example02.py
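In the spirit of that example file (whose contents are not reproduced here), a sketch that simulates an ARMA(1,1) and plots its ACF and PACF, both of which should die down:

```python
# Sketch: ARMA(1,1) simulation and its ACF/PACF.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

ar = np.array([1.0, -0.6])   # AR polynomial (1 - 0.6B)
ma = np.array([1.0, 0.4])    # MA polynomial (1 + 0.4B)
y = ArmaProcess(ar, ma).generate_sample(nsample=500)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=20, ax=axes[0])    # dies down
plot_pacf(y, lags=20, ax=axes[1])   # dies down
plt.show()
```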
ARMA(1, 1)
Forecasting
Yt+1 = c + φ1Yt + θ1εt + εt+1,
ŷt+1 = E(Yt+1|y1, . . . , yt) = c + φ1yt + θ1ε̂t
Var(Yt+1|y1, . . . , yt) = σ².
ARMA(1, 1)
Forecasting
Yt+2 = c + φ1Yt+1 + θ1εt+1 + εt+2
= c + φ1(c + φ1Yt + θ1εt + εt+1) + θ1εt+1 + εt+2
= c(1 + φ1) + φ1²Yt + φ1θ1εt + (φ1 + θ1)εt+1 + εt+2

ŷt+2 = c(1 + φ1) + φ1²yt + φ1θ1ε̂t
Var(Yt+2|y1, . . . , yt) = σ²(1 + (φ1 + θ1)²).
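These formulas can be cross-checked against statsmodels, as in the sketch below on a simulated ARMA(1,1); predicted_mean and var_pred_mean are attributes of statsmodels' prediction results:

```python
# Sketch: 1- and 2-step ARMA(1,1) forecasts and their variances via statsmodels.
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

y = ArmaProcess([1, -0.6], [1, 0.4]).generate_sample(nsample=500)
res = ARIMA(y, order=(1, 0, 1), trend="c").fit()

fc = res.get_forecast(steps=2)
print(fc.predicted_mean)   # y_hat_{t+1}, y_hat_{t+2}
print(fc.var_pred_mean)    # approx. sigma^2 and sigma^2 * (1 + (phi1 + theta1)^2)
```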
Stationary transforms
Box and Jenkins advocate difference transforms to achieve
stationarity, e.g.
∆Yt = Yt − Yt−1
∆²Yt = (Yt − Yt−1) − (Yt−1 − Yt−2) = Yt − 2Yt−1 + Yt−2
Stationary transforms
Example: S&P 500 index
Taking the first difference:
Autocorrelations for the differenced series:
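Since the S&P 500 data itself is not included here, the sketch below reproduces the same diagnostic on a simulated random walk: the ACF of the level dies down very slowly, while the differenced series shows little autocorrelation.

```python
# Sketch: first differencing a nonstationary series and inspecting the ACF.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=1000))   # random walk: nonstationary
dy = np.diff(y)                        # first difference, Delta Y_t

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=30, ax=axes[0])       # dies down extremely slowly
plot_acf(dy, lags=30, ax=axes[1])      # essentially no autocorrelation
plt.show()
```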
Autoregressive Integrated Moving Average Models:
ARIMA(p, d , q)
Suppose we consider the d-th order difference of the original
time series {Yt}. Denote Zt = ∆dYt
An ARMA(p,q) model on {Zt} is called an ARIMA(p,d,q)
model on {Yt}
Examples Lecture08 Example03.py
ARIMA(0, 1, 0) model: the random walk model
After taking the first difference, the series ∆Yt is white noise, i.e.,
∆Yt = εt . We can therefore write:
∆Yt = Yt − Yt−1 = εt
The random walk model is:
Yt = Yt−1 + εt .
Adding an intercept, we obtain the random walk plus drift model:
Yt − Yt−1 = c + εt ,
Yt = c + Yt−1 + εt .
ARIMA(0, 1, 0)
Random walk model
Yt = Yt−1 + εt
   = Yt−2 + εt−1 + εt
   = Yt−3 + εt−2 + εt−1 + εt
   = . . .
ARIMA(0, 1, 0)
Random walk model
Model equation: Yt = Yt−1 + εt
Yt+h = Yt + Σ_{i=1}^{h} εt+i
ŷt+h = yt
Var(Yt+h|y1:t) = hσ²
ARIMA(0, 1, 0)
Random walk plus drift model
Model equation: Yt = c + Yt−1 + εt
Yt+h = Yt + Σ_{i=1}^{h} (c + εt+i)
ŷt+h = yt + c × h
Var(Yt+h|y1:t) = hσ²
It is the formal statistical model for the drift forecasting method
mentioned earlier in the course.
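A sketch of these forecasts on simulated data, estimating c by the mean of the first differences (as the drift method does); the series and values are stand-ins:

```python
# Sketch: h-step forecasts from a random walk with drift,
# y_hat_{t+h} = y_t + c*h with variance h * sigma^2.
import numpy as np

rng = np.random.default_rng(4)
y = np.cumsum(0.5 + rng.normal(size=300))   # true drift c = 0.5, sigma = 1

dy = np.diff(y)
c_hat, sigma2_hat = dy.mean(), dy.var(ddof=1)

h = np.arange(1, 11)
print("forecasts:", y[-1] + c_hat * h)
print("variances:", h * sigma2_hat)
```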
Seasonally adjusted visitor arrivals in Australia
Example of modelling process
Seasonally adjusted visitor arrivals in Australia
Variance stabilising transform
We first take the log of the series as a variance stabilising
transformation:
Log seasonally adjusted visitor arrivals in Australia
ACF and PACF for the log series
Log seasonally adjusted visitor arrivals in Australia
Stationary transform
We then take the first difference:
Log seasonally adjusted visitor arrivals in Australia
Differenced series
Autocorrelations for the differenced series:
Log seasonally adjusted visitor arrivals in Australia
Tentative model identification
The ACF of the differenced series cuts off after lag one.
The PACF seems to die down.
This suggests that the differenced series may be an MA(1)
The original log series would then be an ARIMA(0, 1, 1)
Log seasonally adjusted visitor arrivals in Australia
ARIMA(0, 1, 1) model
Yt − Yt−1 = εt + θ1εt−1
(1− B)Yt = (1 + θ1B)εt
With an intercept:
Yt − Yt−1 = c + εt + θ1εt−1
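A sketch of fitting this ARIMA(0, 1, 1) with an intercept in statsmodels, using a simulated stand-in for the log arrivals series; note that with d = 1 the drift term is requested via trend="t":

```python
# Sketch: fitting the tentatively identified ARIMA(0,1,1) with drift.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
eps = rng.normal(scale=0.03, size=301)
log_y = np.cumsum(0.01 + eps[1:] + 0.5 * eps[:-1])  # simulated ARIMA(0,1,1) + drift

res = ARIMA(log_y, order=(0, 1, 1), trend="t").fit()  # "t" = drift when d = 1
print(res.summary())
print(res.forecast(steps=8))   # forecasts of the log series
```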
Log seasonally adjusted visitor arrivals in Australia
Forecasting by adding an intercept to the model
Log seasonally adjusted visitor arrivals in Australia
Residual analysis
ARIMA(0, 1, 1) model
Reinterpreting the model
Consider the ARIMA(0, 1, 1) model with the intercept c = 0:
Model equation: Yt = Yt−1 + εt + θ1εt−1
E(Yt |y1:t−1) = yt−1 + θ1εt−1
= yt−1 + θ1(yt−1 − yt−2 − θ1εt−2)
= (1 + θ1)yt−1 − θ1(yt−2 + θ1εt−2)
Now, label ℓt−1 = yt−1 + θ1εt−1 and α = 1 + θ1. We get:
ℓt−1 = αyt−1 + (1 − α)ℓt−2
The simple exponential smoothing model.
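This correspondence, α = 1 + θ1, can be checked numerically. A sketch on simulated data (the parameter names "ma.L1" and "smoothing_level" are statsmodels conventions; the two estimates agree only approximately):

```python
# Sketch: ARIMA(0,1,1) vs simple exponential smoothing, alpha = 1 + theta1.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(6)
eps = rng.normal(size=501)
y = np.cumsum(eps[1:] - 0.5 * eps[:-1])   # ARIMA(0,1,1) with theta1 = -0.5

theta1_hat = ARIMA(y, order=(0, 1, 1)).fit().params["ma.L1"]
alpha_hat = SimpleExpSmoothing(y).fit().params["smoothing_level"]
print("1 + theta1_hat =", 1 + theta1_hat)   # both should be near 0.5
print("alpha_hat      =", alpha_hat)
```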
ARMA(p, q) vs ARIMA(p, d , q) processes
Formulation with backshift operators
ARMA(p, q) processes in B-form:
(1 − Σ_{i=1}^{p} φiB^i) Yt = c + (1 + Σ_{i=1}^{q} θiB^i) εt
ARIMA(p, d, q) processes in B-form:
(1 − Σ_{i=1}^{p} φiB^i) (1 − B)^d Yt = c + (1 + Σ_{i=1}^{q} θiB^i) εt
Procedure to Estimate ARMA(p, q)/ARIMA(p, d , q)
processes:
1 For the given time series {Yt}, check its stationarity by looking at
its Sample ACF and Sample PACF.
2 If the ACF does not die down quickly, which means the given time series
{Yt} is nonstationary, we seek a transformation, e.g., the log
transformation {Zt = log(Yt)}, the first order difference
{Zt = Yt − Yt−1}, the difference of the log series, or the
difference of the first order difference, so that the transformed time
series is stationary (checked via its Sample ACF)
3 When both the Sample ACF and Sample PACF die down quickly, check
the lags at which the ACF and PACF cut off. The cutoff lag of the ACF
gives the order q of the ARIMA, the cutoff lag of the PACF gives the
order p, and the number of differences taken gives d.
4 Estimate the identified ARIMA(p, d , q), or ARMA(p, q) (if we did
not do any difference transformation)
5 Make forecasts with the estimated ARIMA(p, d, q), or ARMA(p, q)
ARIMA(p, d , q) processes
Order selection
(1 − Σ_{i=1}^{p} φiB^i)(1 − B)^d Yt = c + (1 + Σ_{i=1}^{q} θiB^i)εt
How to choose p (the number of AR terms) and q (the number of
MA) terms when the ACF and PACF do not give us a
straightforward answer?
ARIMA order selection
Akaike’s Information Criterion
We define Akaike’s Information Criterion as
AIC = −2 log(L) + 2(p + q + k + 1),
where L is the likelihood of the data and k = 1 if the model
has an intercept (k = 0 otherwise).
The model with the minimum value of the AIC is often the
best model for forecasting.
The corrected AIC described in FPP has better performance
in small samples.
ARIMA order selection
Corrected Akaike’s Information Criterion
The corrected Akaike’s Information Criterion is
AICc = AIC + 2(p + q + k + 1)(p + q + k + 2) / (n − p − q − k − 2)
where n is the number of observations.
The corrected AIC penalises extra parameters more
heavily and has better performance in small samples.
The AICc is the foremost criterion used by researchers in
selecting the orders of ARIMA models.
The AICc is based on the assumption of normally distributed
residuals.
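In statsmodels, all three criteria are available directly on a fitted model; a minimal sketch on simulated data:

```python
# Sketch: reading AIC, AICc and BIC off a fitted ARIMA model.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.default_rng(7).normal(size=200))  # stand-in series
res = ARIMA(y, order=(1, 1, 1)).fit()
print(res.aic, res.aicc, res.bic)   # smaller is better for each criterion
```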
ARIMA order selection
Information Criterion
A related measure is Schwarz’s Bayesian Information Criterion
(known as SBIC, BIC or SC):
BIC = −2 log(L) + log(n)(p + q + k + 1)
    = AIC + (log(n) − 2)(p + q + k + 1).
As with the AIC, minimizing the BIC is intended to give the
best model. The model chosen by BIC is either the same as
that chosen by AIC, or one with fewer parameters. This is
because BIC penalizes the model complexity more heavily
than the AIC.
Under some mathematical assumptions, BIC can select the
true model with enough data (if the true model exists!)
ARIMA model selection: Procedure
Make sure the given time series is stationary, may need
transformation.
Check its sample ACF and PACF to choose the largest AR order p0 and MA order q0 to consider
For each pair of (p, q) where 1 ≤ p ≤ p0 and 1 ≤ q ≤ q0,
estimate the model ARIMA(p,d,q) or ARMA(p,q) (if we did
not do any difference transformation)
Find out that pair (p, q) which gives the best BIC (or AIC, or
AICc) [depending on which criterion you choose]
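A sketch of this selection loop on simulated stand-in data, minimising the AICc (swap in .bic or .aic if preferred); orders that fail to estimate are simply skipped:

```python
# Sketch: grid search over (p, q) for fixed d, keeping the best AICc.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.default_rng(8).normal(size=300))  # stand-in data
d, p0, q0 = 1, 3, 3

best = None
for p in range(p0 + 1):
    for q in range(q0 + 1):
        try:
            res = ARIMA(y, order=(p, d, q)).fit()
        except Exception:
            continue                    # skip orders that fail to converge
        if best is None or res.aicc < best[0]:
            best = (res.aicc, p, q)

print("best (AICc, p, q):", best)
```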
ARIMA model selection: example
Alcohol related assaults in NSW
ARIMA model selection: example
Log series
ARIMA model selection: example
First differenced log series
ARIMA model selection: example
ACF and PACF for the first differenced series