
Predictive Analytics
Week 13: ARIMA models (optional)

Semester 2, 2018

Discipline of Business Analytics, The University of Sydney Business School

Week 13: ARIMA models (optional)

1. Stationarity and Box-Jenkins methodology

2. Differencing

3. Models for stationary series

4. ARIMA models

5. Seasonal ARIMA models

2/63

ARIMA models

This module provides a discussion of ARIMA models, the most
widely used methods for univariate time series forecasting.

ARIMA models aim to describe the serial dependence in the data,
rather than to directly describe the time series components as in
exponential smoothing. The two approaches are complementary.

3/63

Stationarity and Box-Jenkins
methodology

Stationarity (key concept)

Intuitively, a stationary time series is one whose properties do not
depend on the time at which we observe it.

Time series with trend or seasonality are not stationary, since
these patterns change the mean of the series over time.

4/63

Strict stationarity (key concept)

Formally, a time series process is strictly stationary when the
joint distribution of Yt, Yt−1, . . . , Yt−k does not depend on t. That
is, the joint density

p(yt, yt−1, . . . , yt−k)

does not depend on t.

5/63

Weak stationarity (key concept)

A process is weakly stationary or covariance stationary if its
mean, variance and autocovariances do not change over time.
That is,

E(Yt) = µ,

Var(Yt) = σ2,

Cov(Yt, Yt−k) = γk,

for all t and k.

6/63

ARIMA models

• Introduced in the seminal book “Time Series Analysis:
Forecasting and Control” (1970) by Box and Jenkins.

• The Box-Jenkins approach relies on (a) finding a stationary
transformation of the data (b) modelling the autocorrelations
in the transformed data.

• This approach contrasts with exponential smoothing, where we
explicitly model the different time series components through
additive or multiplicative specifications.

7/63

Box-Jenkins methodology

• We consider log or Box-Cox transformations to stabilise the
variance of the series.

• Differencing (next section) leads to stationarity in the mean
by removing changes in the level of the series (due, for
example, to trend and seasonality).

• Autocorrelation (ACF) and partial autocorrelation (PACF)
plots help us to assess stationarity and to identify a suitable
specification for the stationary transformation of the series
(a plotting sketch follows).
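For concreteness, a minimal plotting sketch with statsmodels (the simulated series and lag counts are illustrative):

```python
# Sketch: ACF and PACF plots with statsmodels. The series `y` is a
# simulated random walk, so its ACF should decay slowly.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
y = rng.standard_normal(200).cumsum()   # toy non-stationary series

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=24, ax=axes[0])        # slow decay suggests differencing
plot_pacf(y, lags=24, ax=axes[1], method="ywm")
plt.show()
```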

8/63

Partial autocorrelation function (PACF)

• The partial autocorrelation of order k (labelled ρkk) is the
correlation between Yt and Yt−k net of the effects of the
intermediate lags t−1, t−2, . . . , t−k+1.

• rkk estimates ρkk.

9/63

PACF: interpretation via autoregressions

Yt = ρ10 + ρ11Yt−1 + at
Yt = ρ20 + ρ21Yt−1 + ρ22Yt−2 + at
Yt = ρk0 + ρk1Yt−1 + ρk2Yt−2 + · · · + ρkkYt−k + at

10/63

Differencing

Differencing (key concept)

Box and Jenkins advocate difference transformations to achieve
stationarity. The first difference of a time series is

∆Yt = Yt − Yt−1
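In code, first differencing is a one-liner with pandas; a minimal sketch with a made-up series:

```python
# Sketch: first differencing with pandas; `y` is an illustrative series.
import pandas as pd

y = pd.Series([100.0, 102.0, 101.0, 105.0])
dy = y.diff().dropna()   # ∆Y_t = Y_t − Y_{t−1}; the first value is NaN
print(dy.tolist())       # [2.0, -1.0, 4.0]
```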

11/63

Example: random walk

In the random walk model

Yt = Yt−1 + εt,

the first difference leads to a stationary white noise series

∆Yt = Yt − Yt−1 = εt.

12/63

Second order differencing

In rare cases, it may be necessary to difference the series a second
time to obtain stationarity:

∆2Yt = (Yt − Yt−1)− (Yt−1 − Yt−2) = Yt − 2Yt−1 + Yt−2

13/63

Differencing

The ACF helps us to determine whether the time series needs
differencing (or further differencing).

• The ACF of a non-stationary series will decrease slowly.

• The ACF of a stationary series should drop to zero relatively
quickly.

Unit root tests are also common for determining the need for
differencing, but they are sensitive to assumptions. When in doubt,
rely on model selection rather than hypothesis testing (a unit-root
test sketch follows).
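A hedged sketch of an augmented Dickey-Fuller test with statsmodels (the simulated series is illustrative); a large p-value is consistent with a unit root, that is, with the need for differencing:

```python
# Sketch: ADF unit-root test. A random walk has a unit root, so we
# expect a large p-value here (fail to reject non-stationarity).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
y = rng.standard_normal(300).cumsum()

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, autolag="AIC")
print(f"ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
```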

14/63

Example: AUD/USD exchange rate

15/63

Example: AUD/USD exchange rate

ACF and PACF for the time series

16/63

Example: AUD/USD exchange rate

17/63

Example: AUD/USD exchange rate

ACF and PACF for the first differenced series

18/63

Seasonal differencing

We can use seasonal differencing to address non-stationarity
caused by seasonality:

∆mYt = Yt − Yt−m,

where m is the number of seasons.

The ACF of a series that needs seasonal differencing will decrease
slowly at the seasonal lags m, 2m, 3m, etc.

19/63

First and seasonal differencing

Time series that have a changing level and a seasonal pattern may
require both first and seasonal differencing for stationarity.

The first and seasonally differenced series is

∆m(∆Yt) = (Yt − Yt−1)− (Yt−m − Yt−m−1),

noting that the order of differencing does not matter.
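In pandas the two differences compose directly, and we can check that the order is irrelevant; a minimal sketch with toy monthly data:

```python
# Sketch: first plus seasonal differencing (m = 12) with pandas.
import numpy as np
import pandas as pd

idx = pd.date_range("2007-01-31", periods=48, freq="M")
y = pd.Series(np.random.default_rng(2).standard_normal(48).cumsum(),
              index=idx)

d1 = y.diff().diff(12).dropna()   # ∆12(∆Y_t)
d2 = y.diff(12).diff().dropna()   # ∆(∆12 Y_t): same series
assert np.allclose(d1, d2)        # order of differencing does not matter
```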

20/63

Example: Visitor Arrivals in Australia

21/63

Example: Visitor Arrivals in Australia

Log transformation

22/63

Example: Visitor Arrivals in Australia

First differenced log series

23/63

Example: Visitor Arrivals in Australia

ACF and PACF for the first differenced log series

24/63

Example: Visitor Arrivals in Australia

First and seasonally differenced series

25/63

Example: Visitor Arrivals in Australia

ACF and PACF for the first and seasonally differenced log series

26/63

Backshift notation

The backshift operator is a useful notational device for ARIMA
models.

BYt = Yt−1

We can manipulate the backshift operator with standard algebra,
for example

B2Yt = B(BYt) = BYt−1 = Yt−2.

Therefore,
BkYt = Yt−k.

27/63

Differencing in backshift notation

First differenced series:

(1−B)Yt = Yt −BYt = Yt − Yt−1

Seasonally differenced series:

(1−Bm)Yt = Yt −BmYt = Yt − Yt−m

First and seasonally differenced series:

(1−B)(1−B^m)Yt = (1−B −B^m +B^(m+1))Yt
= (Yt − Yt−1)− (Yt−m − Yt−m−1)
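Since B manipulates like an ordinary polynomial variable, the expansion can be verified numerically; a quick sketch with numpy (m = 12 is arbitrary):

```python
# Sketch: check (1 − B)(1 − B^m) = 1 − B − B^m + B^(m+1) by multiplying
# the coefficient arrays of the two lag polynomials (lowest power first).
import numpy as np
from numpy.polynomial import polynomial as P

m = 12
first = np.array([1.0, -1.0])          # 1 − B
seasonal = np.zeros(m + 1)
seasonal[0], seasonal[m] = 1.0, -1.0   # 1 − B^m

product = P.polymul(first, seasonal)
print(np.nonzero(product)[0])          # powers 0, 1, 12, 13
print(product[[0, 1, m, m + 1]])       # coefficients 1, −1, −1, 1
```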

28/63

Models for stationary series

Autoregressive (AR) model (key concept)

The autoregressive model of order p, or AR(p) model, is

Yt = c+ φ1Yt−1 + φ2Yt−2 + . . .+ φpYt−p + εt,

where εt is a white noise series.
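A minimal simulation sketch with statsmodels (coefficients, seed and sample size are illustrative): generate an AR(2) and recover its coefficients by maximum likelihood.

```python
# Sketch: simulate an AR(2) process and fit it as ARIMA(2,0,0).
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

phi = np.array([0.6, 0.2])    # stationary AR(2) coefficients
ar = np.r_[1, -phi]           # ArmaProcess expects the AR lag polynomial
ma = np.array([1.0])
y = ArmaProcess(ar, ma).generate_sample(nsample=1000)

res = ARIMA(y, order=(2, 0, 0)).fit()
print(res.params)             # const, ar.L1, ar.L2, sigma2
```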

29/63

Example: AR(1) model

AR(1) model:
Yt = c+ φ1Yt−1 + εt,

where εt is i.i.d. with mean zero and variance σ2.

E(Yt|y1, . . . , yt−1) = E(Yt|yt−1) = c+ φ1yt−1.

Var(Yt|y1, . . . , yt−1) = Var(Yt|yt−1) = σ2.

30/63

AR(1) illustration: φ = 0.7

31/63

AR(1) illustration: φ = −0.7

32/63

AR model: ACF and PACF identification (key concept)

For an AR(p) process, we can show that:

• The theoretical autocorrelations ρk decrease exponentially.

• The theoretical partial autocorrelation ρkk cuts off to zero
after lag p.

• The pth partial autocorrelation ρpp is φp.

33/63

AR(1) with φ = 0.7: ACF (left) and Partial ACF (right)

34/63

AR(p) model forecasts

From the linearity of expectations,

E(Yt+h|y1:t) = c+ φ1E(Yt+h−1|y1:t) + . . .+ φpE(Yt+h−p|y1:t),

where

E(Yt+h−i|y1:t) =
ŷt+h−i if h > i (the value is not yet observed),
yt+h−i if h ≤ i (the value is already observed).

35/63

Example: AR(1) model

Yt = c+ φ1Yt−1 + εt

For t+ 1,

ŷt+1 = E(Yt+1|y1:t)
= E(c+ φ1Yt + εt+1|y1:t) = c+ φ1yt

Var(Yt+1|y1:t) = σ2.

36/63

Example: AR(1) model

For t+ 2,

ŷt+2 = c+ φ1ŷt+1
= c(1 + φ1) + φ1²yt.

Var(Yt+2|y1:t) = Var(φ1Yt+1 + εt+2|y1:t)
= φ1²Var(Yt+1|y1:t) + σ²
= (1 + φ1²)σ².

37/63

Example: AR(1) model

ŷt+h = c+ φ1ŷt+h−1
= c(1 + φ1 + φ1² + · · ·+ φ1^(h−1)) + φ1^h yt

Var(Yt+h|y1:t) = φ1²Var(Yt+h−1|y1:t) + σ²
= σ²(1 + φ1² + · · ·+ φ1^(2(h−1))).

As h gets larger, both the point forecast and the conditional
variance converge exponentially to a constant.
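These recursions translate directly into code; a sketch (all inputs illustrative) that also shows the convergence of both sequences:

```python
# Sketch: AR(1) h-step-ahead point forecasts and forecast variances.
def ar1_forecast(y_t, c, phi, sigma2, horizon):
    means, variances = [], []
    m, v = y_t, 0.0
    for _ in range(horizon):
        m = c + phi * m             # ŷ_{t+h} = c + φ1·ŷ_{t+h−1}
        v = phi ** 2 * v + sigma2   # variance recursion
        means.append(m)
        variances.append(v)
    return means, variances

means, variances = ar1_forecast(y_t=2.0, c=1.0, phi=0.7, sigma2=1.0,
                                horizon=50)
print(means[-1], 1.0 / (1 - 0.7))           # both ≈ 3.33 = c/(1 − φ1)
print(variances[-1], 1.0 / (1 - 0.7 ** 2))  # both ≈ 1.96 = σ²/(1 − φ1²)
```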

38/63

Illustration: AR(1) forecast

39/63

Stationarity conditions

AR(p) model:

Yt = c+ φ1Yt−1 + φ2Yt−2 + . . .+ φpYt−p + εt.

We need to impose restrictions on the AR coefficients such that
the model is stationary.

• AR(1): −1 < φ1 < 1.

• AR(2): −1 < φ2 < 1, φ1 + φ2 < 1, φ2 − φ1 < 1.

• AR(p) with p > 2: the conditions are more technical.

40/63

Moving average (MA) model (key concept)

The moving average model of order q, or MA(q) model, is

Yt = c+ εt + θ1εt−1 + θ2εt−2 + . . .+ θqεt−q,

where εt is a white noise series.

41/63

Example: MA(1) process

The MA(1) model is

Yt = c+ εt + θ1εt−1.

E(Yt|yt−1) = c+ θ1εt−1

Var(Yt|yt−1) = σ2

42/63

MA model: ACF and PACF identification

For an MA(q) process, we can show that:

• The theoretical autocorrelation ρk cuts off after lag q.

• The theoretical partial autocorrelations ρkk decrease
exponentially.

43/63

Invertibility

• An MA(q) process is invertible when we can write it as a
linear combination of its past values (an AR(∞) process) plus
the contemporaneous error term.

• Estimation and forecasting methods for MA models rely on
invertibility. We therefore impose restrictions on the MA
coefficients such that invertibility holds.

44/63

ARMA(p, q) model (key concept)

The ARMA(p, q) model is

Yt = c+ φ1Yt−1 + . . .+ φpYt−p + θ1εt−1 + . . .+ θqεt−q + εt,

where εt is a white noise series.

In backshift notation,

(1− φ1B − · · · − φpB^p)Yt = c+ (1 + θ1B + · · ·+ θqB^q)εt.

The autocorrelations and partial autocorrelations decrease
exponentially for ARMA processes.

45/63

Example: ARMA(1, 1)

The ARMA(1,1) model is

Yt = c+ φ1Yt−1 + θ1εt−1 + εt.

In backshift notation,

(1− φ1B)Yt = c+ (1 + θ1B)εt.

46/63

ARIMA models

ARIMA(p,d,q) model (key concept)

The ARIMA(p,d,q) model is

(1− φ1B − · · · − φpB^p)(1−B)^d Yt = c+ (1 + θ1B + · · ·+ θqB^q)εt,

p : autoregressive order.

d : degree of first differencing (nearly always d = 0 or d = 1).

q : moving average order.

47/63

ARIMA(p,d,q) model

ARIMA(p,d,q) model:

(1− φ1B − · · · − φpB^p)(1−B)^d Yt = c+ (1 + θ1B + · · ·+ θqB^q)εt,

where the first factor is the AR(p) component, (1−B)^d applies the
differencing, and the factor multiplying εt is the MA(q) component.

The ARIMA(p,d,q) model specifies a stationary ARMA(p,q) model
for the differenced series.

48/63

Example: ARIMA(0,1,1) model

The ARIMA(0,1,1) model is an MA(1) model for the first
differenced series,

Yt − Yt−1 = εt + θ1εt−1.

In backshift notation,

(1−B)Yt = (1 + θ1B)εt.

With an intercept:

Yt − Yt−1 = c+ εt + θ1εt−1

49/63

ARIMA(0,1,1): relation to exponential smoothing

ARIMA(0,1,1): Yt = Yt−1 + εt + θ1εt−1

E(Yt|y1:t−1) = yt−1 + θ1εt−1
= yt−1 + θ1(yt−1 − yt−2 − θ1εt−2)
= (1 + θ1)yt−1 − θ1(yt−2 + θ1εt−2)

Now, label ℓt−1 = yt−1 + θ1εt−1 and α = 1 + θ1. We get:

ℓt−1 = αyt−1 + (1− α)ℓt−2

The simple exponential smoothing model.
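A hedged numerical check of this correspondence (data and names are illustrative): fit ARIMA(0,1,1), read off α = 1 + θ1, and compare one-step-ahead fitted values with simple exponential smoothing at that α. Agreement is approximate because the two implementations initialise the level differently.

```python
# Sketch: ARIMA(0,1,1) one-step forecasts vs SES with α = 1 + θ1.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(3)
level = 10 + 0.2 * rng.standard_normal(300).cumsum()  # slowly moving level
y = level + rng.standard_normal(300)                  # noisy observations

arima = ARIMA(y, order=(0, 1, 1)).fit()
alpha = 1 + arima.params[0]        # params are [ma.L1, sigma2]; θ1 < 0 here
ses = SimpleExpSmoothing(y).fit(smoothing_level=alpha, optimized=False)

# After a burn-in the two one-step-ahead fits should nearly coincide.
print(np.max(np.abs(arima.fittedvalues[20:] - ses.fittedvalues[20:])))
```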

50/63

Intercept in a first differenced series

The inclusion of an intercept induces a linear trend in an
ARIMA(p,1,q) model.

For example, in the random walk plus drift model

Yt = c+ Yt−1 + εt,

we can derive

Yt+h = Yt + (c+ εt+1) + · · ·+ (c+ εt+h),

ŷt+h = yt + c× h,

Var(Yt+h|y1:t) = hσ².

51/63

ARIMA modelling

• Estimation: maximum likelihood.

• Order selection (p, q): visual identification, AIC, and model
validation (a grid-search sketch follows).

• Intercept terms induce permanent trends. Use model selection
to decide whether to include one.
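A minimal grid-search sketch with statsmodels (grid bounds and data are illustrative):

```python
# Sketch: choose (p, q) for an ARIMA(p,1,q) by minimising the AIC.
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
y = rng.standard_normal(400).cumsum()   # toy series that needs d = 1

best = None
for p, q in itertools.product(range(4), range(4)):
    try:
        res = ARIMA(y, order=(p, 1, q)).fit()
    except Exception:
        continue                        # skip orders that fail to converge
    if best is None or res.aic < best[0]:
        best = (res.aic, p, q)

print(f"Best AIC {best[0]:.1f} at ARIMA({best[1]},1,{best[2]})")
```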

52/63

Seasonal ARIMA models

Seasonal ARIMA model (key concept)

We refer to a seasonal ARIMA model as

ARIMA(p, d, q)(P,D,Q)m,

with a non-seasonal part (p, d, q) and a seasonal part (P,D,Q)m,

where D is the order of seasonal differencing, P and Q are the
orders of the seasonal AR and MA components, and m is the
number of seasons.

53/63

Seasonal ARIMA: ACF and PACF identification (key concept)

ARIMA(0,0,0)(P,0,0)

• Sample autocorrelations decrease exponentially for lags m,
2m, 3m, etc.

• Sample partial autocorrelations cut off at lag Pm.

ARIMA(0,0,0)(0,0,Q)

• Sample autocorrelations cut off at lag Qm.

• Sample partial autocorrelations decrease exponentially for lags
m, 2m, 3m, etc.

54/63

Seasonal ARIMA models

Seasonal AR(1) or ARIMA(0, 0, 0)(1, 1, 0)12:

Yt − Yt−12 = c+ φ1(Yt−12 − Yt−24) + εt

Seasonal MA(1) or ARIMA(0, 0, 0)(0, 1, 1)12:

Yt − Yt−12 = c+ θ1εt−12 + εt
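Either specification is one call with statsmodels SARIMAX; a sketch for the seasonal MA(1) case (the simulated series is illustrative):

```python
# Sketch: fit ARIMA(0,0,0)(0,1,1)12 with SARIMAX.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(5)
months = np.arange(240)
y = np.sin(2 * np.pi * months / 12) + 0.3 * rng.standard_normal(240)

res = SARIMAX(y, order=(0, 0, 0),
              seasonal_order=(0, 1, 1, 12), trend="c").fit(disp=False)
print(res.params)   # intercept, ma.S.L12, sigma2
```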

55/63

Seasonal ARIMA models

ARIMA(1,0,0)(0,1,1)12 model:

(1− φ1B)(1−B^12)Yt = c+ (1 + θ1B^12)εt

Yt − Yt−12 = c+ φ1(Yt−1 − Yt−13) + εt + θ1εt−12

56/63

Seasonal ARIMA models

ARIMA(1,1,1)(1,1,0)12 model:

(1− φ1B)(1− φ2B^12)(1−B)(1−B^12)Yt = c+ (1 + θ1B)εt

(Yt − Yt−1)− (Yt−12 − Yt−13) = c+ φ1 [(Yt−1 − Yt−2)− (Yt−13 − Yt−14)]
+ φ2 [(Yt−12 − Yt−13)− (Yt−24 − Yt−25)]
− φ1φ2 [(Yt−13 − Yt−14)− (Yt−25 − Yt−26)]
+ εt + θ1εt−1

57/63

Seasonal ARIMA modelling

• Estimation: maximum likelihood.

• Order selection (p, q, P, Q): visual identification, AIC, and
model validation.

• Usually only one seasonal AR or MA term is needed.

58/63

Example: Visitor Arrivals in Australia

Recall that we obtained the following ACF and PACF plots for the
first and seasonally differenced log series:

We select an ARIMA(3,1,1)(0,1,1)12 specification based on the AIC
(a fitting sketch follows).
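A sketch of the corresponding fit; the stand-in series below only mimics the shape of the arrivals data, so replace it with the real series to reproduce the summary on the next slide:

```python
# Sketch: fit ARIMA(3,1,1)(0,1,1)12 to the log series with SARIMAX.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Stand-in for the monthly arrivals series (Jan 2007 to Jul 2017).
idx = pd.date_range("2007-01-31", periods=127, freq="M")
t = np.arange(127)
arrivals = pd.Series(
    np.exp(0.003 * t                                   # trend
           + 0.1 * np.sin(2 * np.pi * t / 12)          # seasonality
           + 0.02 * np.random.default_rng(6).standard_normal(127)),
    index=idx)

res = SARIMAX(np.log(arrivals), order=(3, 1, 1),
              seasonal_order=(0, 1, 1, 12), trend="c").fit(disp=False)
print(res.summary())
```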

59/63

Example: Visitor Arrivals in Australia

Statespace Model Results
==========================================================================================
Dep. Variable: Arrivals No. Observations: 127
Model: SARIMAX(3, 1, 1)x(0, 1, 1, 12) Log Likelihood 212.670
Date: AIC -411.339
Time: BIC -391.430
Sample: 01-31-2007 HQIC -403.250

– 07-31-2017
Covariance Type: opg
==============================================================================

coef std err z P>|z| [0.025 0.975]
——————————————————————————
intercept 0.0007 0.000 2.957 0.003 0.000 0.001
ar.L1 0.0532 0.118 0.450 0.652 -0.178 0.285
ar.L2 -0.0454 0.112 -0.403 0.687 -0.266 0.175
ar.L3 0.2426 0.112 2.166 0.030 0.023 0.462
ma.L1 -0.9726 0.166 -5.873 0.000 -1.297 -0.648
ma.S.L12 -0.9976 7.778 -0.128 0.898 -16.242 14.247
sigma2 0.0010 0.008 0.129 0.897 -0.015 0.017
===================================================================================
Ljung-Box (Q): 65.43 Jarque-Bera (JB): 0.72
Prob(Q): 0.01 Prob(JB): 0.70
Heteroskedasticity (H): 0.49 Skew: 0.10
Prob(H) (two-sided): 0.03 Kurtosis: 2.66
==================================================================================

60/63

Example: Visitor Arrivals in Australia

61/63

Summary of modelling process (FPP)

Figure from FPP

62/63

Review questions

• What is stationarity and why is it a fundamental concept in
ARIMA modelling?

• What transformation do we apply to a time series to make it
stationary?

• How do we identify AR vs MA processes from ACF and PACF
plots?

• What is an ARIMA model?

• Write the equation for a seasonal ARIMA model using
backshift notation.

63/63
