ETC3231/5231 Business forecasting
Ch9. ARIMA models (OTexts.org/fpp3/)
Outline
1 Stationarity and differencing
2 Backshift notation
3 Non-seasonal ARIMA models
4 Estimation and order selection
5 ARIMA modelling in R
6 Forecasting
7 Seasonal ARIMA models
8 ARIMA vs ETS
ARIMA models
AR: autoregressive (lagged observations as inputs)
I: integrated (differencing to make series stationary)
MA: moving average (lagged errors as inputs)
An ARIMA model is rarely interpretable in terms of visible data structures like trend and seasonality. But it can capture a huge range of time series patterns.
Stationarity
Definition
If {yt} is a stationary time series, then for all s, the distribution of (yt, …, yt+s) does not depend on t.
A stationary series is:
roughly horizontal
constant variance
no patterns predictable in the long-term
Stationary?
gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) == 2018) %>%
  autoplot(Close) +
  labs(y = "Google closing stock price", x = "Day")
[Figure: Google closing stock price, Jan 2018 – Jan 2019]
Stationary?
gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) == 2018) %>%
  autoplot(difference(Close)) +
  labs(y = "Google closing stock price", x = "Day")
[Figure: daily changes in the Google closing stock price, Jan 2018 – Jan 2019]
Stationary?
aus_production %>%
  autoplot(Bricks) +
  labs(title = "Clay brick production in Australia")
[Figure: quarterly clay brick production in Australia]
Stationary?
prices %>%
  filter(year >= 1900) %>%
  autoplot(eggs) +
  labs(y = "$US (1993)", title = "Price of a dozen eggs")
[Figure: price of a dozen eggs in constant (1993) US dollars, from 1900]
Stationary?
global_economy %>%
  filter(Country == "Algeria") %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Algerian Exports")
[Figure: Algerian exports (% of GDP), annual]
Stationary?
aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria",
         year(Month) >= 2015) %>%
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
[Figure: monthly pigs slaughtered in Victoria (thousands), 2015–2019]
Stationary?
aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria",
         year(Month) >= 2010) %>%
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
[Figure: monthly pigs slaughtered in Victoria (thousands), from 2010]
Stationary?
aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
[Figure: monthly pigs slaughtered in Victoria (thousands), whole series]
Stationary?
pelt %>%
  autoplot(Lynx) +
  labs(y = "Number trapped",
       title = "Annual Canadian Lynx Trappings")
[Figure: annual Canadian lynx trappings]
Stationarity
Transformations help to stabilize the variance. For ARIMA modelling, we also need to stabilize the mean.
Non-stationarity in the mean
Identifying non-stationary series:
Look at the time plot.
The ACF of stationary data drops to zero relatively quickly.
The ACF of non-stationary data decreases slowly; the value of r1 is often large and positive.
Example: Google stock price
google_2018 <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) == 2018) %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
Example: Google stock price
google_2018 %>%
  autoplot(Close) + ylab("Closing stock price ($USD)") + xlab("Day")
[Figure: Google closing stock price over 250 trading days]
Example: Google stock price
google_2018 %>%
  ACF(Close) %>% autoplot()
[Figure: ACF of the Google closing stock price]
Example: Google stock price
google_2018 %>%
  autoplot(difference(Close)) +
  ylab("Change in Google closing stock price ($USD)") + xlab("Day")
[Figure: daily changes in the Google closing stock price]
Example: Google stock price
google_2018 %>% ACF(difference(Close)) %>% autoplot()
[Figure: ACF of the differenced Google closing stock price]
Differencing
Differencing helps to stabilize the mean.
The differenced series is the change between each observation in the original series:
y′t = yt − yt−1.
The differenced series will have only T − 1 values, since it is not possible to calculate a difference y′1 for the first observation.
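A quick sketch (using the google_2018 series defined earlier): difference() returns NA for the first observation, so the differenced series has T − 1 usable values.

google_2018 %>%
  mutate(diff_close = difference(Close)) %>%   # first value is NA
  select(trading_day, Close, diff_close)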
Example: Google stock price
The differences are the day-to-day changes. Now the series looks just like a white noise series:
No autocorrelations outside the 95% limits.
Large Ljung-Box p-value.
Conclusion: The daily change in the Google stock price is essentially a random amount uncorrelated with previous days.
Random walk model
Graph of differenced data suggests the following model:
yt − yt−1 = εt or yt = yt−1 + εt, where εt ∼ NID(0, σ²).
Very widely used for non-stationary data.
This is the model behind the naïve method.
Random walks typically have:
long periods of apparent trends up or down;
sudden/unpredictable changes in direction (stochastic trend).
Forecasts are equal to the last observation.
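A minimal sketch (the simulated series and seed are illustrative, assuming the fpp3 package is loaded): forecasts from the naïve method applied to a simulated random walk.

library(fpp3)
set.seed(30)   # illustrative seed
rw <- tsibble(t = 1:100, y = cumsum(rnorm(100)), index = t)
rw %>%
  model(NAIVE(y)) %>%   # equivalent to ARIMA(0,1,0) with no constant
  forecast(h = 10) %>%
  autoplot(rw)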
Random walk with drift model
If the differenced series has a non-zero mean then:
yt − yt−1 = c + εt or yt = c + yt−1 + εt, where εt ∼ NID(0, σ²).
c is the average change between consecutive observations.
If c > 0, yt will tend to drift upwards, and vice versa.
The trend is both stochastic and deterministic.
This is the model behind the drift method.
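A sketch of the drift method applied to the google_2018 series defined earlier; RW() with drift() corresponds to ARIMA(0,1,0) with a constant.

google_2018 %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 30) %>%
  autoplot(google_2018)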
Second-order differencing
Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:
y″t = y′t − y′t−1 = (yt − yt−1) − (yt−1 − yt−2) = yt − 2yt−1 + yt−2.
y″t will have T − 2 values.
In practice, it is almost never necessary to go beyond second-order differences.
Seasonal differencing
A seasonal difference is the difference between an observation and the corresponding observation from the previous year:
y′t = yt − yt−m, where m = number of seasons.
For monthly data m = 12.
For quarterly data m = 4.
Seasonally differenced series will have T − m observations.
If seasonally differenced data is white noise, this implies
yt − yt−m = εt or yt = yt−m + εt,
the model behind the seasonal naïve method.
Antidiabetic drug sales
a10 <- PBS %>%
  filter(ATC2 == "A10") %>%
  summarise(Cost = sum(Cost)/1e6)
Antidiabetic drug sales
a10 %>% autoplot(Cost)
[Figure: monthly antidiabetic drug cost]
Antidiabetic drug sales
a10 %>% autoplot(log(Cost))
[Figure: log of monthly antidiabetic drug cost]
Antidiabetic drug sales
a10 %>% autoplot(log(Cost) %>% difference(12))
[Figure: annual (lag-12) differences of the log cost]
Corticosteroid drug sales
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)/1e6)
Corticosteroid drug sales
h02 %>% autoplot(Cost)
[Figure: monthly corticosteroid drug cost]
Corticosteroid drug sales
h02 %>% autoplot(log(Cost))
[Figure: log of monthly corticosteroid drug cost]
Corticosteroid drug sales
h02 %>% autoplot(log(Cost) %>% difference(12))
[Figure: annual (lag-12) differences of the log cost]
Corticosteroid drug sales
h02 %>% autoplot(log(Cost) %>% difference(12) %>% difference(1))
[Figure: seasonally differenced then first-differenced log cost]
Corticosteroid drug sales
Seasonally differenced series is closer to being stationary.
Remaining non-stationarity can be removed with a further first difference.
If y′t = yt − yt−12 denotes the seasonally differenced series, then the twice-differenced series is
y*t = y′t − y′t−1 = (yt − yt−12) − (yt−1 − yt−13) = yt − yt−1 − yt−12 + yt−13.
Seasonal differencing
When both seasonal and first differences are applied, it makes no difference which is done first: the result will be the same.
If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
Interpretation of differencing
It is important that if differencing is used, the differences are interpretable:
first differences are the change between one observation and the next;
seasonal differences are the change from one year to the next.
But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.
Unit root tests
Statistical tests to determine the required order of differencing:
1 Augmented Dickey-Fuller test: null hypothesis is that the data are non-stationary and non-seasonal.
2 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.
3 Other tests are available for seasonal data.
KPSS test
google_2018 %>%
  features(Close, unitroot_kpss)
## # A tibble: 1 x 3
##   Symbol kpss_stat kpss_pvalue
## 1 GOOG       0.573      0.0252

google_2018 %>%
  features(Close, unitroot_ndiffs)
## # A tibble: 1 x 2
##   Symbol ndiffs
## 1 GOOG        1
Automatically selecting differences
STL decomposition: yt = Tt + St + Rt.
Seasonal strength: Fs = max(0, 1 − Var(Rt)/Var(St + Rt)).
If Fs > 0.64, do one seasonal difference.

h02 %>% mutate(log_sales = log(Cost)) %>%
  features(log_sales, list(unitroot_nsdiffs, feat_stl))
## # A tibble: 1 x 10
##   nsdiffs trend_strength seasonal_strength~ seasonal_peak_y~
## 1       1          0.954              0.952                6
## # ... with 6 more variables: seasonal_trough_year,
## #   spikiness, ..., stl_e_acf1
Automatically selecting differences
h02 %>% mutate(log_sales = log(Cost)) %>%
  features(log_sales, unitroot_nsdiffs)
## # A tibble: 1 x 1
##   nsdiffs
## 1       1

h02 %>% mutate(d_log_sales = difference(log(Cost), 12)) %>%
  features(d_log_sales, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
## 1      1
Backshift notation
A very useful notational device is the backward shift operator, B, which is used as follows:
Byt = yt−1
In other words, B, operating on yt, has the effect of shifting the data back one period. Two applications of B to yt shifts the data back two periods:
B(Byt) = B²yt = yt−2
For monthly data, if we wish to shift attention to “the same month last year”, then B¹² is used: B¹²yt = yt−12.
Backshift notation
The backward shift operator is convenient for describing the process of differencing. A first(-order) difference can be written as
y′t = yt − yt−1 = yt − Byt = (1 − B)yt
Note that a first difference is represented by (1 − B).
Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:
y″t = yt − 2yt−1 + yt−2 = (1 − B)²yt
Backshift notation
A second-order difference is denoted (1 − B)².
A second-order difference is not the same as a second difference, which would be denoted 1 − B².
In general, a dth-order difference can be written as
(1 − B)^d yt
A seasonal difference followed by a first difference can be written as
(1 − B^m)(1 − B)yt
Backshift notation
The “backshift” notation is convenient because the terms can be multiplied together to see the combined effect:
(1 − B)(1 − B^m)yt = (1 − B − B^m + B^{m+1})yt = yt − yt−1 − yt−m + yt−m−1.
For monthly data, m = 12 and we obtain the same result as earlier.
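A quick numerical check (a sketch using the a10 series defined earlier): applying difference() twice agrees with the expanded form.

a10 %>%
  mutate(
    via_difference = difference(difference(Cost, 12), 1),
    via_expansion  = Cost - lag(Cost) - lag(Cost, 12) + lag(Cost, 13)
  )  # the two new columns are identical (apart from initial NAs)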
Autoregressive models
Autoregressive model – AR(p):
yt = c + φ1yt−1 + φ2yt−2 + · · · + φpyt−p + εt,
where εt is white noise. This is a multiple regression with lagged values of yt as predictors.
[Figure: simulated AR(1) and AR(2) series]
AR(1) model
yt = −0.8yt−1 + εt, where εt ∼ N(0,1), T = 100.
[Figure: simulated AR(1) series]
AR(1) model
yt = φ1yt−1 + εt
When φ1 = 0, yt is equivalent to WN.
When φ1 = 1, yt is equivalent to a RW.
We require |φ1| < 1 for stationarity. The closer φ1 is to the bounds, the more the process wanders above or below its unconditional mean (zero in this case).
When φ1 < 0, yt tends to oscillate between positive and negative values.
AR(1) model including a constant
yt = 18 − 0.8yt−1 + εt, where εt ∼ N(0,1), T = 100.
[Figure: simulated AR(1) series with constant]
AR(1) model including a constant
yt = c + φ1yt−1 + εt
When φ1 = 0 and c = 0, yt is equivalent to WN.
When φ1 = 1 and c = 0, yt is equivalent to a RW.
When φ1 = 1 and c ≠ 0, yt is equivalent to a RW with drift.
AR(1) model including a constant
yt = c + φ1yt−1 + εt
c is related to the mean of yt: if E(yt) = μ, then c = μ × (1 − φ1).
ARIMA() takes care of whether you need a constant or not, or you can override it.
AR(1) model including a constant
If a constant is included, the estimated model is reported “w/ mean”:

Series: sim
Model: ARIMA(1,0,0) w/ mean
Coefficients:
         ar1  constant
     -0.8381   18.3527
s.e.  0.0540    0.1048
sigma^2 estimated as 1.11: log likelihood=-146.7
AIC=299.4 AICc=299.7 BIC=307.2
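A minimal sketch of how such a series and fit can be produced (the seed and tsibble construction are illustrative):

library(fpp3)
set.seed(42)   # illustrative seed
sim <- tsibble(
  idx = 1:100,
  # mean = c/(1 - phi1) = 18/1.8 = 10, so simulate around 10
  y   = 10 + as.numeric(arima.sim(list(ar = -0.8), n = 100)),
  index = idx
)
sim %>% model(ARIMA(y ~ 1 + pdq(1, 0, 0))) %>% report()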
AR(2) model
yt = 8 + 1.3yt−1 − 0.7yt−2 + εt, where εt ∼ N(0,1), T = 100.
[Figure: simulated AR(2) series]
Stationarity conditions
We normally restrict autoregressive models to stationary data, and then some constraints on the values of the parameters are required.
General condition for stationarity
Complex roots of 1 − φ1z − φ2z² − · · · − φpz^p lie outside the unit circle on the complex plane.
For p = 1: −1 < φ1 < 1.
For p = 2: −1 < φ2 < 1, φ2 + φ1 < 1, φ2 − φ1 < 1.
More complicated conditions hold for p ≥ 3. Estimation software takes care of this.
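As a sketch, we can verify this numerically for the AR(2) example above with base R's polyroot(), which returns the roots of 1 − 1.3z + 0.7z²:

abs(polyroot(c(1, -1.3, 0.7)))  # coefficients in increasing order of power
#> [1] 1.195229 1.195229        # both moduli > 1, so the process is stationary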
Moving Average (MA) models
Moving Average model – MA(q):
yt = c + εt + θ1εt−1 + θ2εt−2 + · · · + θqεt−q,
where εt is white noise. This is a multiple regression with past errors as predictors. Don’t confuse this with moving average smoothing!
[Figure: simulated MA(1) and MA(2) series]
MA(1) model
yt = 20 + εt + 0.8εt−1, where εt ∼ N(0,1), T = 100.
[Figure: simulated MA(1) series]
MA(2) model
yt = εt − εt−1 + 0.8εt−2, where εt ∼ N(0,1), T = 100.
[Figure: simulated MA(2) series]
MA(∞) models
It is possible to write any stationary AR(p) process as an MA(∞) process.
Example: AR(1)
yt = φ1yt−1 + εt
   = φ1(φ1yt−2 + εt−1) + εt
   = φ1²yt−2 + φ1εt−1 + εt
   = φ1³yt−3 + φ1²εt−2 + φ1εt−1 + εt
   ...
Provided −1 < φ1 < 1:
yt = εt + φ1εt−1 + φ1²εt−2 + φ1³εt−3 + · · ·
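A small sketch using base R's ARMAtoMA(), which computes these MA(∞) weights (here φ1^i) directly:

ARMAtoMA(ar = 0.8, lag.max = 5)
#> [1] 0.80000 0.64000 0.51200 0.40960 0.32768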
Invertibility
Any MA(q) process can be written as an AR(∞) process if we impose some constraints on the MA parameters.
Then the MA model is called "invertible". Invertible models have some mathematical properties that make them easier to use in practice.
Invertibility of an ARIMA model is equivalent to forecastability of an ETS model.
Invertibility
General condition for invertibility
Complex roots of 1 + θ1z + θ2z² + · · · + θqz^q lie outside the unit circle on the complex plane.
For q = 1: −1 < θ1 < 1.
For q = 2: −1 < θ2 < 1, θ2 + θ1 > −1, θ1 − θ2 < 1.
More complicated conditions hold for q ≥ 3. Estimation software takes care of this.
ARMA models
Autoregressive Moving Average model – ARMA(p, q):
yt = c + φ1yt−1 + · · · + φpyt−p + θ1εt−1 + · · · + θqεt−q + εt.
Predictors include both lagged values of yt and lagged errors.
Conditions on coefficients ensure stationarity.
Conditions on coefficients ensure invertibility.
These are stationary models. They are only suitable for stationary series.
ARIMA models
Combine an ARMA model with differencing: let y′t = (1 − B)^d yt.
Autoregressive Integrated Moving Average model – ARIMA(p, d, q):
y′t = c + φ1y′t−1 + · · · + φpy′t−p + θ1εt−1 + · · · + θqεt−q + εt.
ARIMA models
(AutoRegressive Integrated Moving Average) ARIMA(p, d, q) model
AR: p = order of the autoregressive part
I: d = degree of first differencing involved
MA: q = order of the moving average part
White noise model: ARIMA(0,0,0)
Random walk: ARIMA(0,1,0) with no constant
Random walk with drift: ARIMA(0,1,0) with constant
AR(p): ARIMA(p,0,0)
MA(q): ARIMA(0,0,q)
Backshift notation for ARIMA
AR(p): yt = c + φ1yt−1 + · · · + φpyt−p + εt, or
(1 − φ1B − · · · − φpB^p)yt = c + εt
MA(q): yt = c + εt + θ1εt−1 + · · · + θqεt−q, or
yt = c + (1 + θ1B + · · · + θqB^q)εt
ARMA(p, q):
(1 − φ1B − · · · − φpB^p)yt = c + (1 + θ1B + · · · + θqB^q)εt
ARIMA(p, d, q) model:
(1 − φ1B − · · · − φpB^p)(1 − B)^d yt = c + (1 + θ1B + · · · + θqB^q)εt
ARIMA(1,1,1) model
(1 − φ1B)(1 − B)yt = c + (1 + θ1B)εt
where (1 − φ1B) is the AR(1) part, (1 − B) the first difference, and (1 + θ1B) the MA(1) part.
Written out:
yt = c + yt−1 + φ1yt−1 − φ1yt−2 + θ1εt−1 + εt
R model
Intercept form:
(1 − φ1B − · · · − φpB^p)y′t = c + (1 + θ1B + · · · + θqB^q)εt
Mean form:
(1 − φ1B − · · · − φpB^p)(y′t − μ) = (1 + θ1B + · · · + θqB^q)εt
where y′t = (1 − B)^d yt, μ is the mean of y′t, and c = μ(1 − φ1 − · · · − φp).
ARIMA() in the fable package uses the intercept form.
Egyptian exports
global_economy %>%
  filter(Code == "EGY") %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Egyptian Exports")
[Figure: Egyptian exports (% of GDP), annual]
Egyptian exports
fit <- global_economy %>% filter(Code == "EGY") %>%
  model(ARIMA(Exports))
report(fit)
## Series: Exports
## Model: ARIMA(2,0,1) w/ mean
##
## Coefficients:
##         ar1      ar2     ma1  constant
##       1.676  -0.8034  -0.690     2.562
## s.e.  0.111   0.0928   0.149     0.116
##
## sigma^2 estimated as 8.046: log likelihood=-142
## AIC=293 AICc=294 BIC=303

ARIMA(2,0,1) model:
yt = 2.56 + 1.68yt−1 − 0.80yt−2 − 0.69εt−1 + εt,
where εt is white noise with a standard deviation of 2.837 = √8.046.
Egyptian exports
gg_tsresiduals(fit)
[Figure: residual diagnostics: time plot, ACF, and histogram of residuals]
Egyptian exports
augment(fit) %>%
  features(.resid, ljung_box, lag = 10, dof = 3)
## # A tibble: 1 x 4
##   Country          .model         lb_stat lb_pvalue
## 1 Egypt, Arab Rep. ARIMA(Exports)    5.78     0.565
Egyptian exports
fit %>% forecast(h=10) %>%
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Egyptian Exports")
[Figure: Egyptian exports with 10-year forecasts and 80%/95% prediction intervals]
Understanding ARIMA models
If c = 0 and d = 0, the long-term forecasts will go to zero.
If c = 0 and d = 1, the long-term forecasts will go to a non-zero constant.
If c = 0 and d = 2, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 0, the long-term forecasts will go to the mean of the data.
If c ≠ 0 and d = 1, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 2, the long-term forecasts will follow a quadratic trend.
Understanding ARIMA models
Forecast variance and d
The higher the value of d, the more rapidly the prediction intervals increase in size.
For d = 0, the long-term forecast standard deviation will go to the standard deviation of the historical data.
Cyclic behaviour
For cyclic forecasts, p ≥ 2 and some restrictions on the coefficients are required.
If p = 2, we need φ1² + 4φ2 < 0. Then the average cycle length is
2π / arccos(−φ1(1 − φ2)/(4φ2)).
Maximum likelihood estimation
Having identified the model order, we need to estimate the parameters c, φ1, . . . , φp, θ1, . . . , θq.
MLE is very similar to least squares estimation obtained by minimizing the sum of squared errors ∑ e²t over t = 1, . . . , T.
The ARIMA() function allows CLS or MLE estimation.
Non-linear optimization must be used in either case.
Different software will give different estimates.
Partial autocorrelations
Partial autocorrelations measure the relationship between yt and yt−k after removing the effects of the intermediate lags 1, 2, 3, . . . , k − 1.
αk = kth partial autocorrelation coefficient = the estimate of φk in the regression
yt = c + φ1yt−1 + φ2yt−2 + · · · + φkyt−k.
Varying the number of terms on the RHS gives αk for different values of k.
α1 = ρ1.
The same critical values of ±1.96/√T apply as for the ACF.
The last significant αk indicates the order of an AR model.
Egyptian exports
egypt <- global_economy %>% filter(Code == "EGY")
egypt %>% ACF(Exports) %>% autoplot()
egypt %>% PACF(Exports) %>% autoplot()
[Figure: ACF and PACF of Egyptian exports]
Egyptian exports
global_economy %>% filter(Code == "EGY") %>%
  gg_tsdisplay(Exports, plot_type='partial')
[Figure: time plot, ACF, and PACF of Egyptian exports]
ACF and PACF interpretation
AR(1)
ρk = φ1^k for k = 1, 2, . . . ;
α1 = φ1; αk = 0 for k = 2, 3, . . . .
So we have an AR(1) model when:
the autocorrelations decay exponentially;
there is a single significant partial autocorrelation.
ACF and PACF interpretation
AR(p)
ACF dies out in an exponential or damped sine-wave manner.
PACF has all zero spikes beyond the pth spike.
So we have an AR(p) model when:
the ACF is exponentially decaying or sinusoidal;
there is a significant spike at lag p in the PACF, but none beyond p.
ACF and PACF interpretation
MA(1)
ρ1 = θ1; ρk = 0 for k = 2, 3, . . . ;
αk = −(−θ1)^k.
So we have an MA(1) model when:
the PACF is exponentially decaying; and
there is a single significant spike in the ACF.
ACF and PACF interpretation
MA(q)
PACF dies out in an exponential or damped sine-wave manner.
ACF has all zero spikes beyond the qth spike.
So we have an MA(q) model when:
the PACF is exponentially decaying or sinusoidal;
there is a significant spike at lag q in the ACF, but none beyond q.
Example: Mink trapping
mink <- as_tsibble(fma::mink)
mink %>% autoplot(value) +
  xlab("Year") +
  ylab("Minks trapped (thousands)") +
  ggtitle("Annual number of minks trapped")
[Figure: annual number of minks trapped]
Example: Mink trapping
mink %>% gg_tsdisplay(value, plot_type='partial')
[Figure: time plot, ACF, and PACF of mink trappings]
Information criteria
Akaike’s Information Criterion (AIC):
AIC = −2 log(L) + 2(p + q + k + 1),
where L is the likelihood of the data, k = 1 if c ≠ 0 and k = 0 if c = 0.
Corrected AIC:
AICc = AIC + [2(p + q + k + 1)(p + q + k + 2)] / (T − p − q − k − 2).
Bayesian Information Criterion:
BIC = AIC + [log(T) − 2](p + q + k + 1).
Good models are obtained by minimizing the AIC, AICc or BIC. Our preference is to use the AICc.
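In fable, these criteria are reported by glance(). A quick sketch using the Egyptian exports fit from earlier:

glance(fit) %>%
  select(.model, AIC, AICc, BIC)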
How does ARIMA() work?
A non-seasonal ARIMA process:
φ(B)(1 − B)^d yt = c + θ(B)εt
Need to select appropriate orders: (p, d, q)
Hyndman and Khandakar (JSS, 2008) algorithm:
Select no. differences d and D via KPSS test and seasonal strength measure.
Select p, q by minimising AICc.
Use stepwise search to traverse model space.
How does ARIMA() work?
AICc = −2 log(L) + 2(p + q + k + 1)[1 + (p + q + k + 2)/(T − p − q − k − 2)],
where L is the maximised likelihood fitted to the differenced data, k = 1 if c ≠ 0 and k = 0 otherwise.
Step 1: Select the current model (with smallest AICc) from:
ARIMA(2, d, 2)
ARIMA(0, d, 0)
ARIMA(1, d, 0)
ARIMA(0, d, 1)
Step 2: Consider variations of the current model:
vary one of p, q from the current model by ±1;
p, q both vary from the current model by ±1;
include/exclude c from the current model.
The model with the lowest AICc becomes the current model.
Repeat Step 2 until no lower AICc can be found.
How does ARIMA() work?
[Figure: four snapshots of the stepwise search over the (p, q) grid, showing the current model moving step by step to neighbouring models with lower AICc]
Choosing your own model
global_economy %>% filter(Code == "EGY") %>%
  gg_tsdisplay(Exports, plot_type='partial')
[Figure: time plot, ACF, and PACF of Egyptian exports]
Egyptian exports
fit1 <- global_economy %>%
  filter(Code == "EGY") %>%
  model(ARIMA(Exports ~ pdq(4,0,0)))
report(fit1)
## Series: Exports
## Model: ARIMA(4,0,0) w/ mean
##
## Coefficients:
##         ar1     ar2    ar3     ar4  constant
##       0.986  -0.172  0.181  -0.328     6.692
## s.e.  0.125   0.186  0.186   0.127     0.356
##
## sigma^2 estimated as 7.885: log likelihood=-141
## AIC=293 AICc=295 BIC=305
Egyptian exports
fit2 <- global_economy %>%
  filter(Code == "EGY") %>%
  model(ARIMA(Exports))
report(fit2)
## Series: Exports
## Model: ARIMA(2,0,1) w/ mean
##
## Coefficients:
##         ar1      ar2     ma1  constant
##       1.676  -0.8034  -0.690     2.562
## s.e.  0.111   0.0928   0.149     0.116
##
## sigma^2 estimated as 8.046: log likelihood=-142
## AIC=293 AICc=294 BIC=303
Central African Republic exports
global_economy %>%
  filter(Code == "CAF") %>%
  autoplot(Exports) +
  labs(title="Central African Republic exports",
       y="% of GDP")
[Figure: Central African Republic exports (% of GDP), annual]
Central African Republic exports
global_economy %>%
  filter(Code == "CAF") %>%
  gg_tsdisplay(difference(Exports), plot_type='partial')
[Figure: time plot, ACF, and PACF of the differenced CAF exports]
Central African Republic exports
caf_fit <- global_economy %>%
  filter(Code == "CAF") %>%
  model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
        arima013 = ARIMA(Exports ~ pdq(0,1,3)),
        stepwise = ARIMA(Exports),
        search = ARIMA(Exports, stepwise=FALSE))
Central African Republic exports
caf_fit %>% pivot_longer(!Country, names_to = "Model name",
                         values_to = "Orders")
## # A mable: 4 x 3
## # Key: Country, Model name [4]
##   Country                  `Model name` Orders
## 1 Central African Republic arima210     <ARIMA(2,1,0)>
## 2 Central African Republic arima013     <ARIMA(0,1,3)>
## 3 Central African Republic stepwise     <ARIMA(2,1,2)>
## 4 Central African Republic search       <ARIMA(3,1,0)>
Central African Republic exports
glance(caf_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 4 x 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
## 1 search     6.52   -133.  274.  275.  282.
## 2 arima210   6.71   -134.  275.  275.  281.
## 3 arima013   6.54   -133.  274.  275.  282.
## 4 stepwise   6.42   -132.  274.  275.  284.
Central African Republic exports
caf_fit %>%
  select(search) %>%
  gg_tsresiduals()
[Figure: residual diagnostics for the search model: time plot, ACF, and histogram]
Central African Republic exports
augment(caf_fit) %>%
  features(.innov, ljung_box, lag = 10, dof = 3)
## # A tibble: 4 x 4
##   Country                  .model   lb_stat lb_pvalue
## 1 Central African Republic arima013    5.64     0.582
## 2 Central African Republic arima210   10.7      0.152
## 3 Central African Republic search      5.75     0.569
## 4 Central African Republic stepwise    4.12     0.766
Central African Republic exports
caf_fit %>%
  forecast(h=5) %>%
  filter(.model=='search') %>%
  autoplot(global_economy)
[Figure: CAF exports with 5-year forecasts and 80%/95% prediction intervals]
Modelling procedure with ARIMA()
1 Plot the data. Identify any unusual observations.
2 If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
3 If the data are non-stationary: take first differences of the data until the data are stationary.
4 Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
5 Try your chosen model(s), and use the AICc to search for a better model.
6 Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7 Once the residuals look like white noise, calculate forecasts.
Automatic modelling procedure with ARIMA()
1 Plot the data. Identify any unusual observations.
2 If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
3 Use ARIMA() to automatically select a model.
6 Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7 Once the residuals look like white noise, calculate forecasts.
Modelling procedure
[Figure: flowchart of the ARIMA modelling procedure]
Point forecasts
1 Rearrange the ARIMA equation so yt is on the LHS.
2 Rewrite the equation by replacing t with T + h.
3 On the RHS, replace future observations by their forecasts, future errors by zero, and past errors by the corresponding residuals.
Start with h = 1. Repeat for h = 2, 3, . . . .
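fable carries out these steps automatically. A sketch using the Egyptian exports fit from earlier:

fit %>% forecast(h = 10)   # point forecasts are the means of the forecast distributions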
Point forecasts
ARIMA(3,1,1) forecasts: Step 1
(1 − φ1B − φ2B² − φ3B³)(1 − B)yt = (1 + θ1B)εt,
[1 − (1 + φ1)B + (φ1 − φ2)B² + (φ2 − φ3)B³ + φ3B⁴]yt = (1 + θ1B)εt,
yt − (1 + φ1)yt−1 + (φ1 − φ2)yt−2 + (φ2 − φ3)yt−3 + φ3yt−4 = εt + θ1εt−1,
so
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
Point forecasts (h=1)
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
ARIMA(3,1,1) forecasts: Step 2
yT+1 = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + εT+1 + θ1εT.
ARIMA(3,1,1) forecasts: Step 3
ŷT+1|T = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + θ1eT.
Point forecasts (h=2)
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
ARIMA(3,1,1) forecasts: Step 2
yT+2 = (1 + φ1)yT+1 − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2 + εT+2 + θ1εT+1.
ARIMA(3,1,1) forecasts: Step 3
ŷT+2|T = (1 + φ1)ŷT+1|T − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2.
Prediction intervals
95% prediction interval:
ŷT+h|T ± 1.96√vT+h|T,
where vT+h|T is the estimated forecast variance.
vT+1|T = σ̂² for all ARIMA models, regardless of parameters and orders.
Multi-step prediction intervals for ARIMA(0,0,q):
yt = εt + θ1εt−1 + · · · + θqεt−q,
vT+h|T = σ̂² (1 + θ1² + θ2² + · · · + θh−1²), for h = 2, 3, . . . .
AR(1): rewrite as MA(∞) and use the above result.
Other models are beyond the scope of this subject.
Prediction intervals
Prediction intervals increase in size with the forecast horizon.
Prediction intervals can be difficult to calculate by hand.
Calculations assume residuals are uncorrelated and normally distributed.
Prediction intervals tend to be too narrow:
the uncertainty in the parameter estimates has not been accounted for;
the ARIMA model assumes historical patterns will not change during the forecast period;
the ARIMA model assumes uncorrelated future errors.
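A sketch for extracting interval bounds from a fable forecast (fit is the Egyptian exports model fitted earlier):

fit %>%
  forecast(h = 10) %>%
  hilo(level = 95)   # adds a column of 95% prediction intervals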
Seasonal ARIMA models
ARIMA(p, d, q)(P, D, Q)m
where (p, d, q) is the non-seasonal part of the model, (P, D, Q)m is the seasonal part, and m = number of observations per year.
Seasonal ARIMA models
E.g., ARIMA(1, 1, 1)(1, 1, 1)4 model (without constant):
(1 − φ1B)(1 − Φ1B⁴)(1 − B)(1 − B⁴)yt = (1 + θ1B)(1 + Θ1B⁴)εt
with non-seasonal AR(1) (1 − φ1B), seasonal AR(1) (1 − Φ1B⁴), non-seasonal difference (1 − B), seasonal difference (1 − B⁴), non-seasonal MA(1) (1 + θ1B), and seasonal MA(1) (1 + Θ1B⁴).
All the factors can be multiplied out and the general model written as follows:
yt = (1 + φ1)yt−1 − φ1yt−2 + (1 + Φ1)yt−4
  − (1 + φ1 + Φ1 + φ1Φ1)yt−5 + (φ1 + φ1Φ1)yt−6
  − Φ1yt−8 + (Φ1 + φ1Φ1)yt−9 − φ1Φ1yt−10
  + εt + θ1εt−1 + Θ1εt−4 + θ1Θ1εt−5.
Common ARIMA models
The US Census Bureau uses the following models most often:
ARIMA(0,1,1)(0,1,1)m with log transformation
ARIMA(0,1,2)(0,1,1)m with log transformation
ARIMA(2,1,0)(0,1,1)m with log transformation
ARIMA(0,2,2)(0,1,1)m with log transformation
ARIMA(2,1,2)(0,1,1)m with no transformation
Seasonal ARIMA models
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.
ARIMA(0,0,0)(0,0,1)12 will show:
a spike at lag 12 in the ACF, but no other significant spikes;
exponential decay in the seasonal lags of the PACF (i.e., at lags 12, 24, 36, . . . ).
ARIMA(0,0,0)(1,0,0)12 will show:
exponential decay in the seasonal lags of the ACF;
a single significant spike at lag 12 in the PACF.
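A sketch (the simulation details are illustrative, assuming fpp3 is loaded) showing the single ACF spike at lag 12 for a purely seasonal MA(1):

set.seed(1)
sim <- tsibble(
  idx = yearmonth("2000 Jan") + 0:199,
  y   = as.numeric(arima.sim(list(ma = c(rep(0, 11), 0.8)), n = 200)),
  index = idx
)  # ARIMA(0,0,0)(0,0,1)[12]: an MA coefficient at lag 12 only
sim %>% ACF(y, lag_max = 36) %>% autoplot()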
US leisure employment
leisure <- us_employment %>%
  filter(Title == "Leisure and Hospitality",
         year(Month) > 2000) %>%
  mutate(Employed = Employed/1000) %>%
  select(Month, Employed)
autoplot(leisure, Employed) +
  labs(title = "US employment: leisure and hospitality",
       y="Number of people (millions)")
[Figure: monthly US leisure and hospitality employment (millions)]
US leisure employment
leisure %>%
  gg_tsdisplay(difference(Employed, 12),
               plot_type='partial', lag=36) +
  labs(title="Seasonally differenced", y="")
[Figure: time plot, ACF, and PACF of the seasonally differenced series]
US leisure employment
leisure %>%
  gg_tsdisplay(difference(Employed, 12) %>% difference(),
               plot_type='partial', lag=36) +
  labs(title = "Double differenced", y="")
[Figure: time plot, ACF, and PACF of the double-differenced series]
US leisure employment
fit <- leisure %>%
  model(
    arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),
    arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),
    auto = ARIMA(Employed, stepwise = FALSE, approx = FALSE)
  )
fit %>% pivot_longer(everything(), names_to = "Model name",
                     values_to = "Orders")
## # A mable: 3 x 2
## # Key: Model name [3]
##   `Model name` Orders
## 1 arima012011  <ARIMA(0,1,2)(0,1,1)[12]>
## 2 arima210011  <ARIMA(2,1,0)(0,1,1)[12]>
## 3 auto         <ARIMA(2,1,0)(1,1,1)[12]>
US leisure employment
glance(fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 3 x 6
##   .model       sigma2 log_lik   AIC  AICc   BIC
## 1 auto        0.00142    395. -780. -780. -763.
## 2 arima210011 0.00145    392. -776. -776. -763.
## 3 arima012011 0.00146    391. -775. -775. -761.
US leisure employment
fit %>% select(auto) %>% gg_tsresiduals(lag=36)
[Figure: residual diagnostics for the auto model: time plot, ACF, and histogram]
US leisure employment
augment(fit) %>% features(.innov, ljung_box, lag=24, dof=4)
## # A tibble: 3 x 3
##   .model      lb_stat lb_pvalue
## 1 arima012011    22.4     0.320
## 2 arima210011    18.9     0.527
## 3 auto           16.6     0.680
US leisure employment
forecast(fit, h=36) %>%
  filter(.model=='auto') %>%
  autoplot(leisure) +
  labs(title = "US employment: leisure and hospitality",
       y="Number of people (millions)")
[Figure: 3-year forecasts of US leisure and hospitality employment with 80%/95% prediction intervals]
Corticosteroid drug sales
h02 %>% gg_tsdisplay(difference(log(Cost),12),
                     lag_max = 36, plot_type = 'partial')
[Figure: time plot, ACF, and PACF of the seasonally differenced log series]
Corticosteroid drug sales
Choose D = 1 and d = 0.
Spikes in the PACF at lags 12 and 24 suggest a seasonal AR(2) term.
Spikes in the PACF suggest a possible non-seasonal AR(3) term.
Initial candidate model: ARIMA(3,0,0)(2,1,0)12.
Corticosteroid drug sales
.model                    AICc
ARIMA(3,0,1)(0,1,2)[12]   -485
ARIMA(3,0,1)(1,1,1)[12]   -484
ARIMA(3,0,1)(0,1,1)[12]   -484
ARIMA(3,0,1)(2,1,0)[12]   -476
ARIMA(3,0,0)(2,1,0)[12]   -475
ARIMA(3,0,2)(2,1,0)[12]   -475
ARIMA(3,0,1)(1,1,0)[12]   -463
Corticosteroid drug sales
fit <- h02 %>%
  model(best = ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
report(fit)
## Series: Cost
## Model: ARIMA(3,0,1)(0,1,2)[12]
## Transformation: log(Cost)
##
## Coefficients:
##          ar1     ar2     ar3    ma1     sma1     sma2
##       -0.160  0.5481  0.5678  0.383  -0.5222  -0.1768
## s.e.   0.164  0.0878  0.0942  0.190   0.0861   0.0872
##
## sigma^2 estimated as 0.004278: log likelihood=250
## AIC=-486 AICc=-485 BIC=-463
Corticosteroid drug sales
gg_tsresiduals(fit)
[Figure: residual diagnostics: time plot, ACF, and histogram]
Corticosteroid drug sales
augment(fit) %>%
  features(.innov, ljung_box, lag = 36, dof = 6)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
## 1 best      50.7    0.0104
Corticosteroid drug sales
fit <- h02 %>% model(auto = ARIMA(log(Cost)))
report(fit)
## Series: Cost
## Model: ARIMA(2,1,0)(0,1,1)[12]
## Transformation: log(Cost)
##
## Coefficients:
##           ar1      ar2     sma1
##       -0.8491  -0.4207  -0.6401
## s.e.   0.0712   0.0714   0.0694
##
## sigma^2 estimated as 0.004387: log likelihood=245
## AIC=-483 AICc=-483 BIC=-470
Corticosteroid drug sales
gg_tsresiduals(fit)
[Figure: residual diagnostics: time plot, ACF, and histogram]
Corticosteroid drug sales
augment(fit) %>%
  features(.innov, ljung_box, lag = 36, dof = 3)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
## 1 auto      59.3   0.00332
Corticosteroid drug sales
fit <- h02 %>%
  model(best = ARIMA(log(Cost), stepwise = FALSE,
                     approximation = FALSE,
                     order_constraint = p + q + P + Q <= 9))
report(fit)
## Series: Cost
## Model: ARIMA(4,1,1)(2,1,2)[12]
## Transformation: log(Cost)
##
## Coefficients:
##           ar1    ar2    ar3     ar4     ma1   sar1    sar2
##       -0.0425  0.210  0.202  -0.227  -0.742  0.621  -0.383
## s.e.   0.2167  0.181  0.114   0.081   0.118  0.207   0.242
##         sma1   sma2
##       -1.202  0.496
## s.e.   0.249  0.213
##
## sigma^2 estimated as 0.004049: log likelihood=254
## AIC=-489 AICc=-487 BIC=-456
Corticosteroid drug sales
gg_tsresiduals(fit)
[Figure: residual diagnostics: time plot, ACF, and histogram]
Corticosteroid drug sales
Training data: July 1991 to June 2006
Test data: July 2006 to June 2008

fit <- h02 %>%
  filter_index(~ "2006 Jun") %>%
  model(
    ARIMA(log(Cost) ~ pdq(3, 0, 0) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ pdq(3, 0, 1) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ pdq(3, 0, 2) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ pdq(3, 0, 1) + PDQ(1, 1, 0))
    # ...
  )
fit %>%
  forecast(h = "2 years") %>%
  accuracy(h02)
Corticosteroid drug sales
.model                     RMSE
ARIMA(3,0,1)(1,1,1)[12]  0.0619
ARIMA(3,0,1)(0,1,2)[12]  0.0621
ARIMA(3,0,1)(0,1,1)[12]  0.0630
ARIMA(2,1,0)(0,1,1)[12]  0.0630
ARIMA(4,1,1)(2,1,2)[12]  0.0631
ARIMA(3,0,2)(2,1,0)[12]  0.0651
ARIMA(3,0,1)(2,1,0)[12]  0.0653
ARIMA(3,0,1)(1,1,0)[12]  0.0666
ARIMA(3,0,0)(2,1,0)[12]  0.0668
Corticosteroid drug sales
Models with the lowest AICc values tend to give slightly better results than the other models.
AICc comparisons must use the same orders of differencing, but RMSE test-set comparisons can involve any models.
Use the best model available, even if it does not pass all tests.
ARIMA(3,0,1)(0,1,2)12 has the lowest AICc, while ARIMA(3,0,1)(1,1,1)12 has the lowest RMSE. We stick with the lowest-AICc model, as these AICc values are comparable.
Corticosteroid drug sales
fit <- h02 %>%
  model(ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
fit %>% forecast() %>% autoplot(h02) +
  labs(y = "H02 Expenditure ($AUD)")
[Figure: H02 expenditure forecasts with 80%/95% prediction intervals]
ARIMA vs ETS
It is a myth that ARIMA models are more general than exponential smoothing.
Linear exponential smoothing models are all special cases of ARIMA models.
Non-linear exponential smoothing models have no equivalent ARIMA counterparts.
Many ARIMA models have no exponential smoothing counterparts.
ETS models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models have one unit root.
ARIMA vs ETS
[Figure: Venn diagram. ETS models (combinations of components): 9 non-additive models, including all models with multiplicative components; 6 additive models, which lie in the overlap with ARIMA. ARIMA models (modelling autocorrelations): potentially ∞ models, including all stationary models.]
Equivalences
ETS model    ARIMA model               Parameters
ETS(A,N,N)   ARIMA(0,1,1)              θ1 = α − 1
ETS(A,A,N)   ARIMA(0,2,2)              θ1 = α + β − 2; θ2 = 1 − α
ETS(A,Ad,N)  ARIMA(1,1,2)              φ1 = φ; θ1 = α + φβ − 1 − φ; θ2 = (1 − α)φ
ETS(A,N,A)   ARIMA(0,0,m)(0,1,0)m
ETS(A,A,A)   ARIMA(0,1,m + 1)(0,1,0)m
ETS(A,Ad,A)  ARIMA(1,0,m + 1)(0,1,0)m
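A sketch illustrating the first equivalence (the series and horizon are illustrative): ETS(A,N,N) and ARIMA(0,1,1) without a constant give essentially the same point forecasts when fitted to the same data.

global_economy %>%
  filter(Code == "AUS") %>%
  model(
    ets   = ETS(Exports ~ error("A") + trend("N") + season("N")),
    arima = ARIMA(Exports ~ 0 + pdq(0, 1, 1))   # no constant, matching ETS(A,N,N)
  ) %>%
  forecast(h = 5)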
Example: Australian population
aus_economy <- global_economy %>% filter(Code == "AUS") %>%
  mutate(Population = Population/1e6)
aus_economy %>%
  slice(-n()) %>%
  stretch_tsibble(.init = 10) %>%
  model(ets = ETS(Population),
        arima = ARIMA(Population)
  ) %>%
  forecast(h = 1) %>%
  accuracy(aus_economy) %>%
  select(.model, ME:RMSSE)
## # A tibble: 2 x 8
##   .model     ME   RMSE    MAE   MPE  MAPE  MASE RMSSE
## 1 arima  0.0420 0.194  0.0789 0.277 0.509 0.317 0.746
## 2 ets    0.0202 0.0774 0.0543 0.112 0.327 0.218 0.298
Example: Australian population
aus_economy %>%
  model(ETS(Population)) %>%
  forecast(h = "5 years") %>%
  autoplot(aus_economy) +
  labs(title = "Australian population",
       y = "People (millions)")
[Figure: Australian population forecasts with 80%/95% prediction intervals]
Example: Cement production
cement <- aus_production %>%
  select(Cement) %>%
  filter_index("1988 Q1" ~ .)
train <- cement %>% filter_index(. ~ "2007 Q4")
fit <- train %>%
  model(
    arima = ARIMA(Cement),
    ets = ETS(Cement)
  )
Example: Cement production
fit %>%
  select(arima) %>%
  report()
## Series: Cement
## Model: ARIMA(1,0,1)(2,1,1)[4] w/ drift
##
## Coefficients:
##          ar1     ma1   sar1    sar2    sma1  constant
##       0.8886  -0.237  0.081  -0.234  -0.898      5.39
## s.e.  0.0842   0.133  0.157   0.139   0.178      1.48
##
## sigma^2 estimated as 11456: log likelihood=-464
## AIC=941 AICc=943 BIC=957
Example: Cement production
fit %>%
  select(ets) %>%
  report()
## Series: Cement
## Model: ETS(M,N,M)
##
##   Smoothing parameters:
##     alpha = 0.753
##     gamma = 1e-04
##
##   Initial states:
##      l    s1    s2    s3     s4
##   1695  1.03  1.05  1.01  0.912
##
##   sigma^2: 0.0034
##
##   AIC AICc  BIC
##  1104 1106 1121
Example: Cement production
gg_tsresiduals(fit %>% select(arima), lag_max = 16)
[Figure: ARIMA residual diagnostics: time plot, ACF, and histogram]
Example: Cement production
gg_tsresiduals(fit %>% select(ets), lag_max = 16)
[Figure: ETS residual diagnostics: time plot, ACF, and histogram]
Example: Cement production
fit %>%
  select(arima) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 16, dof = 6)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
## 1 arima     6.37     0.783
Example: Cement production
fit %>%
  select(ets) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 16, dof = 6)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
## 1 ets       10.0     0.438
Example: Cement production
fit %>%
  forecast(h = "2 years 6 months") %>%
  accuracy(cement) %>%
  select(-ME, -MPE, -ACF1)
## # A tibble: 2 x 7
##   .model .type  RMSE   MAE  MAPE  MASE RMSSE
## 1 arima  Test   216.  186.  8.68  1.27  1.26
## 2 ets    Test   222.  191.  8.85  1.30  1.29
Example: Cement production
fit %>%
  select(arima) %>%
  forecast(h="3 years") %>%
  autoplot(cement) +
  labs(title = "Cement production in Australia",
       y="Tonnes ('000)")
[Figure: cement production forecasts with 80%/95% prediction intervals]