
ETC3231/5231 Business forecasting
Ch9. ARIMA models OTexts.org/fpp3/

Outline
1 Stationarity and differencing
2 Backshift notation
3 Non-seasonal ARIMA models
4 Estimation and order selection
5 ARIMA modelling in R
6 Forecasting
7 Seasonal ARIMA models
8 ARIMA vs ETS

ARIMA models
AR: autoregressive (lagged observations as inputs)
I: integrated (differencing to make series stationary)
MA: moving average (lagged errors as inputs)
An ARIMA model is rarely interpretable in terms of visible data structures like trend and seasonality. But it can capture a huge range of time series patterns.



Stationarity
Definition
If {yt} is a stationary time series, then for all s, the distribution of (yt, . . . , yt+s) does not depend on t.
A stationary series is:
roughly horizontal
constant variance
no patterns predictable in the long-term

Stationary?
gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) == 2018) %>%
  autoplot(Close) +
  labs(y = "Google closing stock price", x = "Day")
[Figure: Google closing stock price, 2018]

Stationary?
gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) == 2018) %>%
  autoplot(difference(Close)) +
  labs(y = "Google closing stock price", x = "Day")
[Figure: daily changes in the Google closing stock price, 2018]

Stationary?
global_economy %>%
  filter(Country == "Algeria") %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Algerian Exports")
[Figure: Algerian exports as a percentage of GDP]

Stationary?
aus_production %>%
  autoplot(Bricks) +
  labs(title = "Clay brick production in Australia")
[Figure: quarterly clay brick production in Australia]

Stationary?
prices %>%
  filter(year >= 1900) %>%
  autoplot(eggs) +
  labs(y = "$US (1993)", title = "Price of a dozen eggs")
[Figure: price of a dozen eggs in 1993 US dollars]

Stationary?
aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria",
         year(Month) >= 2015) %>%
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
[Figure: monthly pigs slaughtered in Victoria, from 2015]

Stationary?
aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria",
         year(Month) >= 2010) %>%
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
[Figure: monthly pigs slaughtered in Victoria, from 2010]

Stationary?
aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
[Figure: monthly pigs slaughtered in Victoria, full series]

Stationary?
pelt %>%
  autoplot(Lynx) +
  labs(y = "Number trapped",
       title = "Annual Canadian Lynx Trappings")
[Figure: annual Canadian lynx trappings]

Stationarity
Transformations help to stabilize the variance. For ARIMA modelling, we also need to stabilize the mean.

Non-stationarity in the mean
Identifying non-stationary series:
the time plot.
The ACF of stationary data drops to zero relatively quickly.
The ACF of non-stationary data decreases slowly. For non-stationary data, the value of r1 is often large and positive.

Example: Google stock price
google_2018 <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) == 2018) %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)

Example: Google stock price
google_2018 %>%
  autoplot(Close) + ylab("Closing stock price ($USD)") + xlab("Day")
[Figure: Google closing stock price over 2018 trading days]

Example: Google stock price
google_2018 %>%
  ACF(Close) %>% autoplot()
[Figure: ACF of the Google closing stock price]

Example: Google stock price
google_2018 %>%
  autoplot(difference(Close)) +
  ylab("Change in Google closing stock price ($USD)") + xlab("Day")
[Figure: daily changes in the Google closing stock price]

Example: Google stock price
google_2018 %>% ACF(difference(Close)) %>% autoplot()
[Figure: ACF of the differenced Google closing stock price]

Differencing
Differencing helps to stabilize the mean.
The differenced series is the change between each observation in the original series:
y′t = yt − yt−1.
The differenced series will have only T − 1 values, since it is not possible to calculate a difference y′1 for the first observation.
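The first difference and its length property can be checked directly with base R's diff() (a minimal sketch; the toy vector is made up):

```r
# First difference: y'_t = y_t - y_{t-1}
y <- c(10, 12, 11, 15)
dy <- diff(y)
stopifnot(length(dy) == length(y) - 1)  # only T - 1 values
stopifnot(all(dy == c(2, -1, 4)))
```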

Example: Google stock price
The differences are the day-to-day changes. Now the series looks just like a white noise series:
no autocorrelations outside the 95% limits;
a large Ljung-Box p-value.
Conclusion: the daily change in the Google stock price is essentially a random amount uncorrelated with previous days.

Random walk model
The graph of differenced data suggests the following model:
yt − yt−1 = εt or yt = yt−1 + εt
where εt ∼ NID(0, σ^2).
Very widely used for non-stationary data.
This is the model behind the naïve method.
Random walks typically have:
long periods of apparent trends up or down;
sudden/unpredictable changes in direction (stochastic trend).
Forecasts are equal to the last observation.
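The random walk and the naïve forecast can be illustrated in a few lines of base R (a sketch; the seed and sample size are arbitrary):

```r
# A random walk is the cumulative sum of white noise:
# y_t = y_{t-1} + eps_t. Its first difference recovers the noise,
# and the naive forecast at every horizon is the last observation.
set.seed(42)
eps <- rnorm(100)
y <- cumsum(eps)
fc <- rep(y[length(y)], 10)               # naive forecasts for h = 1..10
stopifnot(all(fc == y[100]))
stopifnot(isTRUE(all.equal(diff(y), eps[-1])))
```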

Random walk with drift model
If the differenced series has a non-zero mean, then:
yt − yt−1 = c + εt or yt = c + yt−1 + εt
where εt ∼ NID(0, σ^2).
c is the average change between consecutive observations.
If c > 0, yt will tend to drift upwards, and vice versa.
The trend is both stochastic and deterministic.
This is the model behind the drift method.

Second-order differencing
Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:
y′′t = y′t − y′t−1
     = (yt − yt−1) − (yt−1 − yt−2)
     = yt − 2yt−1 + yt−2.
y′′t will have T − 2 values.
In practice, it is almost never necessary to go beyond second-order differences.
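In base R, the second-order difference is just diff() applied twice, or equivalently differences = 2 (a sketch with a made-up series):

```r
y <- c(3, 5, 9, 15, 23)
d2 <- diff(diff(y))                     # y_t - 2*y_{t-1} + y_{t-2}
stopifnot(identical(d2, diff(y, differences = 2)))
stopifnot(length(d2) == length(y) - 2)  # T - 2 values
```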

Seasonal differencing
A seasonal difference is the difference between an observation and the corresponding observation from the previous year:
y′t = yt − yt−m, where m = number of seasons.
For monthly data, m = 12.
For quarterly data, m = 4.
A seasonally differenced series will have T − m observations.
If the seasonally differenced data are white noise, this implies:
yt − yt−m = εt or yt = yt−m + εt.
This is the model behind the seasonal naïve method.
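A seasonal difference is a lag-m difference, e.g. with base R's diff() (a sketch; the toy monthly series is made up):

```r
# Lag-12 (seasonal) difference for monthly data: y_t - y_{t-12}
y <- 1:24 + rep(c(5, 0, 3), length.out = 24)  # trend + period-3 pattern
sd12 <- diff(y, lag = 12)
stopifnot(length(sd12) == length(y) - 12)     # T - m observations
stopifnot(all(sd12 == 12))                    # +1 per month => lag-12 diff is 12
```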

Antidiabetic drug sales
a10 <- PBS %>%
  filter(ATC2 == "A10") %>%
  summarise(Cost = sum(Cost)/1e6)

Antidiabetic drug sales
a10 %>% autoplot(
  Cost
)
[Figure: monthly antidiabetic drug sales]

Antidiabetic drug sales
a10 %>% autoplot(
  log(Cost)
)
[Figure: log of monthly antidiabetic drug sales]

Antidiabetic drug sales
a10 %>% autoplot(
  log(Cost) %>% difference(12)
)
[Figure: lag-12 differences of log antidiabetic drug sales]

Corticosteroid drug sales
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)/1e6)

Corticosteroid drug sales
h02 %>% autoplot(
  Cost
)
[Figure: monthly corticosteroid drug sales]

Corticosteroid drug sales
h02 %>% autoplot(
  log(Cost)
)
[Figure: log of monthly corticosteroid drug sales]

Corticosteroid drug sales
h02 %>% autoplot(
  log(Cost) %>% difference(12)
)
[Figure: lag-12 differences of log corticosteroid drug sales]

Corticosteroid drug sales
h02 %>% autoplot(
  log(Cost) %>% difference(12) %>% difference(1)
)
[Figure: doubly differenced log corticosteroid drug sales]

Corticosteroid drug sales
The seasonally differenced series is closer to being stationary.
The remaining non-stationarity can be removed with a further first difference.
If y′t = yt − yt−12 denotes the seasonally differenced series, then the twice-differenced series is
y∗t = y′t − y′t−1
    = (yt − yt−12) − (yt−1 − yt−13)
    = yt − yt−1 − yt−12 + yt−13.
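The algebraic identity above can be verified numerically (a base-R sketch; the simulated series is arbitrary):

```r
# First difference of a seasonal difference equals
# y_t - y_{t-1} - y_{t-12} + y_{t-13}
set.seed(7)
y <- cumsum(rnorm(36))
dd <- diff(diff(y, lag = 12))               # seasonal then first difference
direct <- y[14:36] - y[13:35] - y[2:24] + y[1:23]
stopifnot(isTRUE(all.equal(dd, direct)))
```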

Seasonal differencing
When both seasonal and first differences are applied...
it makes no difference which is done first: the result will be the same.
If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.

Interpretation of differencing
If differencing is used, it is important that the differences are interpretable:
first differences are the change between one observation and the next;
seasonal differences are the change from one year to the next.
But taking lag-3 differences of yearly data, for example, results in a model that cannot be sensibly interpreted.

Unit root tests
Statistical tests to determine the required order of differencing:
1 Augmented Dickey-Fuller test: the null hypothesis is that the data are non-stationary and non-seasonal.
2 Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: the null hypothesis is that the data are stationary and non-seasonal.
3 Other tests are available for seasonal data.

KPSS test
google_2018 %>%
  features(Close, unitroot_kpss)
## # A tibble: 1 x 3
##   Symbol kpss_stat kpss_pvalue
##   <chr>      <dbl>       <dbl>
## 1 GOOG       0.573      0.0252
google_2018 %>%
  features(Close, unitroot_ndiffs)
## # A tibble: 1 x 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 GOOG        1

Automatically selecting differences
STL decomposition: yt = Tt + St + Rt
Seasonal strength: Fs = max(0, 1 − Var(Rt)/Var(St + Rt))
If Fs > 0.64, do one seasonal difference.
h02 %>% mutate(log_sales = log(Cost)) %>%
  features(log_sales, list(unitroot_nsdiffs, feat_stl))
## # A tibble: 1 x 10
##   nsdiffs trend_strength seasonal_strength_year seasonal_peak_year
##     <int>          <dbl>                  <dbl>              <dbl>
## 1       1          0.954                  0.952                  6
## # ... with 6 more variables: seasonal_trough_year <dbl>,
## #   spikiness <dbl>, linearity <dbl>, curvature <dbl>,
## #   stl_e_acf1 <dbl>, stl_e_acf10 <dbl>

Automatically selecting differences
h02 %>% mutate(log_sales = log(Cost)) %>%
  features(log_sales, unitroot_nsdiffs)
## # A tibble: 1 x 1
##   nsdiffs
##     <int>
## 1       1
h02 %>% mutate(d_log_sales = difference(log(Cost), 12)) %>%
  features(d_log_sales, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      1


Backshift notation
A very useful notational device is the backward shift operator, B, which is used as follows:
Byt = yt−1
In other words, B, operating on yt, has the effect of shifting the data back one period. Two applications of B to yt shift the data back two periods:
B(Byt) = B^2 yt = yt−2
For monthly data, if we wish to shift attention to "the same month last year", then B^12 is used, and the notation is B^12 yt = yt−12.

Backshift notation
The backward shift operator is convenient for describing the process of differencing. A first(-order) difference can be written as
y′t = yt − yt−1 = yt − Byt = (1 − B)yt
Note that a first difference is represented by (1 − B).
Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:
y′′t = yt − 2yt−1 + yt−2 = (1 − B)^2 yt

Backshift notation
A second-order difference is denoted (1 − B)^2.
A second-order difference is not the same as a second difference, which would be denoted 1 − B^2.
In general, a dth-order difference can be written as
(1 − B)^d yt
A seasonal difference followed by a first difference can be written as
(1 − B^m)(1 − B)yt

Backshift notation
The "backshift" notation is convenient because the terms can be multiplied together to see the combined effect:
(1 − B)(1 − B^m)yt = (1 − B − B^m + B^(m+1))yt
                   = yt − yt−1 − yt−m + yt−m−1.
For monthly data, m = 12 and we obtain the same result as earlier.


Autoregressive models
Autoregressive model – AR(p):
yt = c + φ1yt−1 + φ2yt−2 + ··· + φpyt−p + εt,
where εt is white noise. This is a multiple regression with lagged values of yt as predictors.
[Figure: simulated AR(1) and AR(2) series]
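Series like those in the figure can be simulated with base R's arima.sim() (a sketch; the seed and the added means are my own choices):

```r
# Simulate stationary AR processes; arima.sim() produces mean-zero
# series, so a constant is added to shift the level.
set.seed(1)
ar1 <- 10 + arima.sim(model = list(ar = 0.8), n = 100)           # AR(1)
ar2 <- 20 + arima.sim(model = list(ar = c(1.3, -0.7)), n = 100)  # AR(2)
stopifnot(length(ar1) == 100, length(ar2) == 100)
```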

AR(1) model
yt = −0.8yt−1 + εt, εt ∼ N(0, 1), T = 100.
[Figure: simulated AR(1) series]

AR(1) model
yt = φ1yt−1 + εt
When φ1 = 0, yt is equivalent to WN.
When φ1 = 1, yt is equivalent to a RW.
We require |φ1| < 1 for stationarity. The closer φ1 is to the bounds, the more the process wanders above or below its unconditional mean (zero in this case).
When φ1 < 0, yt tends to oscillate between positive and negative values.

AR(1) model including a constant
yt = 18 − 0.8yt−1 + εt, εt ∼ N(0, 1), T = 100.
[Figure: simulated AR(1) series with constant]

AR(1) model including a constant
yt = c + φ1yt−1 + εt
When φ1 = 0 and c = 0, yt is equivalent to WN;
when φ1 = 1 and c = 0, yt is equivalent to a RW;
when φ1 = 1 and c ≠ 0, yt is equivalent to a RW with drift.

AR(1) model including a constant
yt = c + φ1yt−1 + εt
c is related to the mean of yt. Let E(yt) = μ; then c = μ × (1 − φ1).
ARIMA() takes care of whether you need a constant or not, or you can override it.

AR(1) model including a constant
If a constant is included, the estimated model is reported "w/ mean":
Series: sim
Model: ARIMA(1,0,0) w/ mean
Coefficients:
          ar1  constant
      -0.8381   18.3527
s.e.   0.0540    0.1048
sigma^2 estimated as 1.11: log likelihood=-146.7
AIC=299.4 AICc=299.7 BIC=307.2

AR(2) model
yt = 8 + 1.3yt−1 − 0.7yt−2 + εt, εt ∼ N(0, 1), T = 100.
[Figure: simulated AR(2) series]

Stationarity conditions
We normally restrict autoregressive models to stationary data, and then some constraints on the values of the parameters are required.
General condition for stationarity
The complex roots of 1 − φ1z − φ2z^2 − ··· − φpz^p lie outside the unit circle on the complex plane.
For p = 1: −1 < φ1 < 1.
For p = 2: −1 < φ2 < 1, φ2 + φ1 < 1, φ2 − φ1 < 1.
More complicated conditions hold for p ≥ 3. Estimation software takes care of this.

Moving Average (MA) models
Moving Average model – MA(q):
yt = c + εt + θ1εt−1 + θ2εt−2 + ··· + θqεt−q,
where εt is white noise. This is a multiple regression with past errors as predictors. Don't confuse this with moving average smoothing!
[Figure: simulated MA(1) and MA(2) series]

MA(1) model
yt = 20 + εt + 0.8εt−1, εt ∼ N(0, 1), T = 100.
[Figure: simulated MA(1) series]

MA(2) model
yt = εt − εt−1 + 0.8εt−2, εt ∼ N(0, 1), T = 100.
[Figure: simulated MA(2) series]

MA(∞) models
It is possible to write any stationary AR(p) process as an MA(∞) process.
Example: AR(1)
yt = φ1yt−1 + εt
   = φ1(φ1yt−2 + εt−1) + εt
   = φ1^2 yt−2 + φ1εt−1 + εt
   = φ1^3 yt−3 + φ1^2 εt−2 + φ1εt−1 + εt
   ...
Provided −1 < φ1 < 1:
yt = εt + φ1εt−1 + φ1^2 εt−2 + φ1^3 εt−3 + ···

Invertibility
Any MA(q) process can be written as an AR(∞) process if we impose some constraints on the MA parameters. Then the MA model is called "invertible".
Invertible models have some mathematical properties that make them easier to use in practice.
Invertibility of an ARIMA model is equivalent to forecastability of an ETS model.

Invertibility
General condition for invertibility
The complex roots of 1 + θ1z + θ2z^2 + ··· + θqz^q lie outside the unit circle on the complex plane.
For q = 1: −1 < θ1 < 1.
For q = 2: −1 < θ2 < 1, θ2 + θ1 > −1, θ1 − θ2 < 1.
More complicated conditions hold for q ≥ 3. Estimation software takes care of this.

ARMA models
Autoregressive Moving Average model – ARMA(p, q):
yt = c + φ1yt−1 + ··· + φpyt−p + θ1εt−1 + ··· + θqεt−q + εt.
Predictors include both lagged values of yt and lagged errors.
Conditions on the coefficients ensure stationarity.
Conditions on the coefficients ensure invertibility.
These are stationary models. They are only suitable for stationary series.

ARIMA models
Combine an ARMA model with differencing. Let y′t = (1 − B)^d yt.
Autoregressive Integrated Moving Average model – ARIMA(p, d, q):
y′t = c + φ1y′t−1 + ··· + φpy′t−p + θ1εt−1 + ··· + θqεt−q + εt.

ARIMA models (AutoRegressive Integrated Moving Average)
ARIMA(p, d, q) model:
AR: p = order of the autoregressive part
I: d = degree of first differencing involved
MA: q = order of the moving average part
White noise model: ARIMA(0,0,0)
Random walk: ARIMA(0,1,0) with no constant
Random walk with drift: ARIMA(0,1,0) with constant
AR(p): ARIMA(p,0,0)
MA(q): ARIMA(0,0,q)

Backshift notation for ARIMA
AR(p): yt = c + φ1yt−1 + ··· + φpyt−p + εt
MA(q): yt = c + εt + θ1εt−1 + ··· + θqεt−q
ARMA(p, q): (1 − φ1B − ··· − φpB^p)yt = c + (1 + θ1B + ··· + θqB^q)εt
ARIMA(p, d, q) model: (1 − φ1B − ··· − φpB^p)(1 − B)^d yt = c + (1 + θ1B + ··· + θqB^q)εt

ARIMA(1,1,1) model
(1 − φ1B)(1 − B)yt = c + (1 + θ1B)εt
where (1 − φ1B) is the AR(1) part, (1 − B) the first difference, and (1 + θ1B) the MA(1) part.
Written out:
yt = c + yt−1 + φ1yt−1 − φ1yt−2 + θ1εt−1 + εt

R model
Intercept form:
(1 − φ1B − ··· − φpB^p) y′t = c + (1 + θ1B + ··· + θqB^q)εt
Mean form:
(1 − φ1B − ··· − φpB^p)(y′t − μ) = (1 + θ1B + ··· + θqB^q)εt
y′t = (1 − B)^d yt
μ is the mean of y′t.
c = μ(1 − φ1 − ··· − φp).
ARIMA() in the fable package uses the intercept form.

Egyptian exports
global_economy %>%
  filter(Code == "EGY") %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Egyptian Exports")
[Figure: Egyptian exports as a percentage of GDP]

Egyptian exports
fit <- global_economy %>% filter(Code == "EGY") %>%
  model(ARIMA(Exports))
report(fit)
## Series: Exports
## Model: ARIMA(2,0,1) w/ mean
##
## Coefficients:
##        ar1      ar2     ma1  constant
##      1.676  -0.8034  -0.690     2.562
## s.e. 0.111   0.0928   0.149     0.116
##
## sigma^2 estimated as 8.046: log likelihood=-142
## AIC=293 AICc=294 BIC=303
ARIMA(2,0,1) model:
yt = 2.56 + 1.68yt−1 − 0.80yt−2 − 0.69εt−1 + εt,
where εt is white noise with a standard deviation of 2.837 = √8.046.

Egyptian exports
gg_tsresiduals(fit)
[Figure: residual diagnostics (time plot, ACF, histogram of residuals)]

Egyptian exports
augment(fit) %>%
  features(.resid, ljung_box, lag = 10, dof = 3)
## # A tibble: 1 x 4
##   Country          .model         lb_stat lb_pvalue
##   <chr>            <chr>            <dbl>     <dbl>
## 1 Egypt, Arab Rep. ARIMA(Exports)    5.78     0.565

Egyptian exports
fit %>% forecast(h=10) %>%
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Egyptian Exports")
[Figure: Egyptian export forecasts with 80% and 95% prediction intervals]

Understanding ARIMA models
If c = 0 and d = 0, the long-term forecasts will go to zero.
If c = 0 and d = 1, the long-term forecasts will go to a non-zero constant.
If c = 0 and d = 2, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 0, the long-term forecasts will go to the mean of the data.
If c ≠ 0 and d = 1, the long-term forecasts will follow a straight line.
If c ≠ 0 and d = 2, the long-term forecasts will follow a quadratic trend.

Understanding ARIMA models
Forecast variance and d
The higher the value of d, the more rapidly the prediction intervals increase in size.
For d = 0, the long-term forecast standard deviation will go to the standard deviation of the historical data.
Cyclic behaviour
For cyclic forecasts, p ≥ 2 and some restrictions on the coefficients are required.
If p = 2, we need φ1^2 + 4φ2 < 0. Then the average cycle length is
(2π) / [arccos(−φ1(1 − φ2)/(4φ2))].

Maximum likelihood estimation
Having identified the model order, we need to estimate the parameters c, φ1, ..., φp, θ1, ..., θq.
MLE is very similar to least squares estimation, obtained by minimizing the sum of squared errors Σ e_t^2 (t = 1, ..., T).
The ARIMA() function allows CLS or MLE estimation.
Non-linear optimization must be used in either case.
Different software will give different estimates.

Partial autocorrelations
Partial autocorrelations measure the relationship between yt and yt−k when the effects of other time lags (1, 2, 3, ..., k − 1) are removed.
αk = kth partial autocorrelation coefficient
   = the estimate of φk in the regression
yt = c + φ1yt−1 + φ2yt−2 + ··· + φkyt−k.
Varying the number of terms on the RHS gives αk for different values of k.
α1 = ρ1.
Same critical values of ±1.96/√T as for the ACF.
The last significant αk indicates the order of an AR model.

Egyptian exports
egypt <- global_economy %>% filter(Code == "EGY")
egypt %>% ACF(Exports) %>% autoplot()
egypt %>% PACF(Exports) %>% autoplot()
[Figure: ACF and PACF of Egyptian exports]

Egyptian exports
global_economy %>% filter(Code == "EGY") %>%
  gg_tsdisplay(Exports, plot_type='partial')
[Figure: time plot, ACF, and PACF of Egyptian exports]

ACF and PACF interpretation
AR(1)
ρk = φ1^k for k = 1, 2, ...;
α1 = φ1; αk = 0 for k = 2, 3, ....
So we have an AR(1) model when:
the autocorrelations decay exponentially;
there is a single significant partial autocorrelation.

ACF and PACF interpretation
AR(p)
ACF dies out in an exponential or damped sine-wave manner
PACF has all zero spikes beyond the pth spike
So we have an AR(p) model when:
the ACF is exponentially decaying or sinusoidal;
there is a significant spike at lag p in the PACF, but none beyond p.

ACF and PACF interpretation
MA(1)
ρ1 = θ1/(1 + θ1^2); ρk = 0 for k = 2, 3, ...;
αk = −(−θ1)^k
So we have an MA(1) model when:
the PACF is exponentially decaying; and
there is a single significant spike in the ACF.

ACF and PACF interpretation
MA(q)
PACF dies out in an exponential or damped sine-wave manner
ACF has all zero spikes beyond the qth spike
So we have an MA(q) model when:
the PACF is exponentially decaying or sinusoidal;
there is a significant spike at lag q in the ACF, but none beyond q.

Example: Mink trapping
mink <- as_tsibble(fma::mink)
mink %>% autoplot(value) +
  xlab("Year") +
  ylab("Minks trapped (thousands)") +
  ggtitle("Annual number of minks trapped")
[Figure: annual number of minks trapped]

Example: Mink trapping
mink %>% gg_tsdisplay(value, plot_type='partial')
[Figure: time plot, ACF, and PACF of mink trappings]

Information criteria
Akaike's Information Criterion (AIC):
AIC = −2 log(L) + 2(p + q + k + 1),
where L is the likelihood of the data, k = 1 if c ≠ 0 and k = 0 if c = 0.
Corrected AIC:
AICc = AIC + [2(p + q + k + 1)(p + q + k + 2)] / (T − p − q − k − 2).
Bayesian Information Criterion:
BIC = AIC + [log(T) − 2](p + q + k + 1).
Good models are obtained by minimizing the AIC, AICc, or BIC. Our preference is to use the AICc.
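The AICc correction can be computed from the AIC as follows (a base-R sketch; the function name is my own):

```r
# AICc = AIC + 2(p+q+k+1)(p+q+k+2) / (T - p - q - k - 2).
# Writing m = p + q + k + 1 parameters, this is AIC + 2m(m+1)/(T - m - 1).
aicc_from_aic <- function(aic, p, q, k, T) {
  m <- p + q + k + 1
  aic + 2 * m * (m + 1) / (T - m - 1)
}
stopifnot(isTRUE(all.equal(aicc_from_aic(100, 0, 0, 0, 102), 100.04)))
```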


How does ARIMA() work?
A non-seasonal ARIMA process:
φ(B)(1 − B)^d yt = c + θ(B)εt
We need to select appropriate orders (p, d, q).
Hyndman and Khandakar (JSS, 2008) algorithm:
Select the numbers of differences d and D via the KPSS test and a seasonal strength measure.
Select p, q by minimising the AICc.
Use a stepwise search to traverse the model space.

How does ARIMA() work?
AICc = −2 log(L) + 2(p + q + k + 1)[1 + (p + q + k + 2)/(T − p − q − k − 2)],
where L is the maximised likelihood fitted to the differenced data, k = 1 if c ≠ 0 and k = 0 otherwise.
Step 1: select the current model (with smallest AICc) from:
ARIMA(2, d, 2)
ARIMA(0, d, 0)
ARIMA(1, d, 0)
ARIMA(0, d, 1)
Step 2: consider variations of the current model:
vary one of p, q from the current model by ±1;
vary both p and q from the current model by ±1;
include/exclude c from the current model.
The model with the lowest AICc becomes the current model.
Repeat Step 2 until no lower AICc can be found.
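The neighbourhood searched in Step 2 can be sketched as follows (an illustrative base-R function, not the fable implementation; the name is my own, and the include/exclude-c move is omitted):

```r
# Candidate (p, q) orders one step away from the current model,
# clipped to non-negative orders.
neighbours <- function(p, q) {
  cand <- rbind(c(p + 1, q), c(p - 1, q),            # vary p by +/- 1
                c(p, q + 1), c(p, q - 1),            # vary q by +/- 1
                c(p + 1, q + 1), c(p - 1, q - 1))    # vary both
  cand[cand[, 1] >= 0 & cand[, 2] >= 0, , drop = FALSE]
}
stopifnot(nrow(neighbours(0, 0)) == 3)  # (1,0), (0,1), (1,1)
stopifnot(nrow(neighbours(2, 2)) == 6)
```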

Choosing your own model
global_economy %>% filter(Code == "EGY") %>%
  gg_tsdisplay(Exports, plot_type='partial')
[Figure: time plot, ACF, and PACF of Egyptian exports]

Egyptian exports
fit1 <- global_economy %>%
  filter(Code == "EGY") %>%
  model(ARIMA(Exports ~ pdq(4,0,0)))
report(fit1)
## Series: Exports
## Model: ARIMA(4,0,0) w/ mean
##
## Coefficients:
##        ar1     ar2    ar3     ar4  constant
##      0.986  -0.172  0.181  -0.328     6.692
## s.e. 0.125   0.186  0.186   0.127     0.356
##
## sigma^2 estimated as 7.885: log likelihood=-141
## AIC=293 AICc=295 BIC=305

Egyptian exports
fit2 <- global_economy %>%
  filter(Code == "EGY") %>%
  model(ARIMA(Exports))
report(fit2)
## Series: Exports
## Model: ARIMA(2,0,1) w/ mean
##
## Coefficients:
##        ar1      ar2     ma1  constant
##      1.676  -0.8034  -0.690     2.562
## s.e. 0.111   0.0928   0.149     0.116
##
## sigma^2 estimated as 8.046: log likelihood=-142
## AIC=293 AICc=294 BIC=303

Central African Republic exports
global_economy %>%
  filter(Code == "CAF") %>%
  autoplot(Exports) +
  labs(title="Central African Republic exports",
       y="% of GDP")
[Figure: Central African Republic exports as a percentage of GDP]

Central African Republic exports
global_economy %>%
  filter(Code == "CAF") %>%
  gg_tsdisplay(difference(Exports), plot_type='partial')
[Figure: time plot, ACF, and PACF of the differenced exports series]

Central African Republic exports
caf_fit <- global_economy %>%
  filter(Code == "CAF") %>%
  model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
        arima013 = ARIMA(Exports ~ pdq(0,1,3)),
        stepwise = ARIMA(Exports),
        search = ARIMA(Exports, stepwise=FALSE))

Central African Republic exports
caf_fit %>% pivot_longer(!Country, names_to = "Model name",
                         values_to = "Orders")
## # A mable: 4 x 3
## # Key:     Country, Model name [4]
##   Country                  `Model name` Orders
##   <chr>                    <chr>        <model>
## 1 Central African Republic arima210     <ARIMA(2,1,0)>
## 2 Central African Republic arima013     <ARIMA(0,1,3)>
## 3 Central African Republic stepwise     <ARIMA(2,1,2)>
## 4 Central African Republic search       <ARIMA(3,1,0)>

Central African Republic exports
glance(caf_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 4 x 6
## .model sigma2 log_lik AIC AICc BIC
##
## 1 search 6.52 -133. 274. 275. 282.
## 2 arima210 6.71 -134. 275. 275. 281.
## 3 arima013 6.54 -133. 274. 275. 282.
## 4 stepwise 6.42 -132. 274. 275. 284.
101

Central African Republic exports
caf_fit %>%
select(search) %>%
gg_tsresiduals()
[Figure: residual diagnostics for the search model (time plot, ACF to lag 16, histogram)]
102

Central African Republic exports
augment(caf_fit) %>%
features(.innov, ljung_box, lag = 10, dof = 3)
## # A tibble: 4 x 4
##   Country                  .model   lb_stat lb_pvalue
##   <chr>                    <chr>      <dbl>     <dbl>
## 1 Central African Republic arima013    5.64     0.582
## 2 Central African Republic arima210   10.7      0.152
## 3 Central African Republic search      5.75     0.569
## 4 Central African Republic stepwise    4.12     0.766
103

Central African Republic exports
caf_fit %>%
  forecast(h = 5) %>%
  filter(.model == 'search') %>%
  autoplot(global_economy)
104
[Figure: five-year forecasts of CAF exports from the search model, with 80% and 95% prediction intervals]

Modelling procedure with ARIMA()
1 Plot the data. Identify any unusual observations.
2 If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
3 If the data are non-stationary: take first differences of the data until the data are stationary.
4 Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
5 Try your chosen model(s), and use the AICc to search for a better model.
6 Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7 Once the residuals look like white noise, calculate forecasts.
105

Automatic modelling procedure with ARIMA()
1 Plot the data. Identify any unusual observations.
2 If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
3 Use ARIMA to automatically select a model.
6 Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7 Once the residuals look like white noise, calculate forecasts.
106

Modelling procedure
[Figure: ARIMA modelling procedure flowchart]
107

Outline
1 Stationarity and differencing
2 Backshift notation
3 Non-seasonal ARIMA models
4 Estimation and order selection
5 ARIMA modelling in R
6 Forecasting
7 Seasonal ARIMA models
8 ARIMA vs ETS 108

Point forecasts
1 Rearrange ARIMA equation so yt is on LHS.
2 Rewrite equation by replacing t by T + h.
3 On RHS, replace future observations by their
forecasts, future errors by zero, and past errors by corresponding residuals.
Start with h = 1. Repeat for h = 2,3,….
109

Point forecasts
ARIMA(3,1,1) forecasts: Step 1
(1 − φ1B − φ2B2 − φ3B3)(1 − B)yt = (1 + θ1B)εt,
[1 − (1 + φ1)B + (φ1 − φ2)B2 + (φ2 − φ3)B3 + φ3B4]yt = (1 + θ1B)εt,
yt − (1 + φ1)yt−1 + (φ1 − φ2)yt−2 + (φ2 − φ3)yt−3 + φ3yt−4 = εt + θ1εt−1,
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
110

Point forecasts (h=1)
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
ARIMA(3,1,1) forecasts: Step 2
yT+1 = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + εT+1 + θ1εT.
ARIMA(3,1,1) forecasts: Step 3
ŷT+1|T = (1 + φ1)yT − (φ1 − φ2)yT−1 − (φ2 − φ3)yT−2 − φ3yT−3 + θ1eT.
111

Point forecasts (h=2)
yt = (1 + φ1)yt−1 − (φ1 − φ2)yt−2 − (φ2 − φ3)yt−3 − φ3yt−4 + εt + θ1εt−1.
ARIMA(3,1,1) forecasts: Step 2
yT+2 = (1 + φ1)yT+1 − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2 + εT+2 + θ1εT+1.
ARIMA(3,1,1) forecasts: Step 3
ŷT+2|T = (1 + φ1)ŷT+1|T − (φ1 − φ2)yT − (φ2 − φ3)yT−1 − φ3yT−2.
112
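The three-step recipe can be sketched numerically. Below is a plain-Python illustration (not part of the unit's R/fable workflow); the series, residual, and coefficient values used are made up for the example.

```python
# Point forecasts for an ARIMA(3,1,1) model in its expanded form:
#   yt = (1+phi1) y[t-1] - (phi1-phi2) y[t-2] - (phi2-phi3) y[t-3]
#        - phi3 y[t-4] + eps[t] + theta1 * eps[t-1]
def arima311_forecast(y, last_resid, phi1, phi2, phi3, theta1, h):
    """Return h-step point forecasts given the observed series y
    (length >= 4) and the last one-step residual e_T."""
    preds = []
    hist = list(y)                      # known values, extended by forecasts
    for step in range(h):
        eps_lag = last_resid if step == 0 else 0.0  # theta1*e_T only at h=1
        yhat = ((1 + phi1) * hist[-1]
                - (phi1 - phi2) * hist[-2]
                - (phi2 - phi3) * hist[-3]
                - phi3 * hist[-4]
                + theta1 * eps_lag)
        preds.append(yhat)              # future errors replaced by zero,
        hist.append(yhat)               # future values by their forecasts
    return preds
```

With all coefficients zero the model reduces to a random walk, so every forecast equals the last observation, which is a useful sanity check.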

Prediction intervals
95% prediction interval
ŷT+h|T ± 1.96 √(vT+h|T)
where vT+h|T is the estimated forecast variance.
vT+1|T = σ̂² for all ARIMA models, regardless of parameters and orders.
Multi-step prediction intervals for ARIMA(0,0,q):
yt = εt + θ1εt−1 + · · · + θqεt−q,
vT+h|T = σ̂²(1 + θ1² + · · · + θh−1²),  for h = 2, 3, . . . .
AR(1): rewrite as MA(∞) and use the above result. Other models are beyond the scope of this subject.
113
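Both variance formulas are easy to compute directly. Here is a small plain-Python sketch (illustrative only, not from the unit materials): the MA(q) case sums the squared θ's available at horizon h, and the AR(1) case uses the MA(∞) weights ψi = φ^i.

```python
# Forecast variance v_{T+h|T} for an MA(q) model:
#   sigma2 * (1 + theta_1^2 + ... + theta_{h-1}^2), truncated at q terms.
def ma_forecast_variance(sigma2, thetas, h):
    return sigma2 * (1 + sum(t ** 2 for t in thetas[:h - 1]))

# AR(1) rewritten as MA(infinity) has psi_i = phi^i, so the h-step
# forecast variance is sigma2 * sum_{i=0}^{h-1} phi^(2i).
def ar1_forecast_variance(sigma2, phi, h):
    return sigma2 * sum(phi ** (2 * i) for i in range(h))
```

Note how both reduce to σ̂² at h = 1, matching vT+1|T = σ̂² above.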

Prediction intervals
Prediction intervals increase in size with the forecast horizon.
Prediction intervals can be difficult to calculate by hand.
Calculations assume residuals are uncorrelated and normally distributed.
Prediction intervals tend to be too narrow:
the uncertainty in the parameter estimates has not been accounted for;
the ARIMA model assumes historical patterns will not change during the forecast period;
the ARIMA model assumes uncorrelated future errors.
115

Your turn
For the GDP data (from global_economy):
fit a suitable ARIMA model to the logged data for all countries
check the residual diagnostics for Australia; produce forecasts of your fitted model for Australia.
116

Outline
1 Stationarity and differencing
2 Backshift notation
3 Non-seasonal ARIMA models
4 Estimation and order selection
5 ARIMA modelling in R
6 Forecasting
7 Seasonal ARIMA models
8 ARIMA vs ETS 117

Seasonal ARIMA models
ARIMA(p, d, q)(P, D, Q)m
(p, d, q) = non-seasonal part of the model; (P, D, Q) = seasonal part of the model;
where m = number of observations per year.
118

Seasonal ARIMA models
E.g., ARIMA(1,1,1)(1,1,1)4 model (without constant)
(1 − φ1B)(1 − Φ1B4)(1 − B)(1 − B4)yt = (1 + θ1B)(1 + Θ1B4)εt
The factors, in order: non-seasonal AR(1); seasonal AR(1); non-seasonal difference; seasonal difference. On the right: non-seasonal MA(1); seasonal MA(1).
119

Seasonal ARIMA models
E.g., ARIMA(1, 1, 1)(1, 1, 1)4 model (without constant) (1−φ1B)(1−Φ1B4)(1−B)(1−B4)yt =(1+θ1B)(1+Θ1B4)εt
All the factors can be multiplied out and the general model written as follows:
yt = (1 + φ1)yt−1 − φ1yt−2 + (1 + Φ1)yt−4
− (1 + φ1 + Φ1 + φ1Φ1)yt−5 + (φ1 + φ1Φ1)yt−6
− Φ1yt−8 + (Φ1 + φ1Φ1)yt−9 − φ1Φ1yt−10 + εt + θ1εt−1 + Θ1εt−4 + θ1Θ1εt−5.
120
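The multiplication above can be verified mechanically. This is a plain-Python check (illustrative, not part of the unit's R code): multiply the four backshift polynomials and read off the coefficients, which match the expanded equation.

```python
# Multiply polynomials in the backshift operator B, represented as
# coefficient lists [c0, c1, ...] meaning c0 + c1*B + c2*B^2 + ...
def polymul(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def sarima_111_111_4_ar_side(phi1, Phi1):
    """(1 - phi1 B)(1 - Phi1 B^4)(1 - B)(1 - B^4) as a coefficient list."""
    p = polymul([1, -phi1], [1, 0, 0, 0, -Phi1])   # AR(1) x seasonal AR(1)
    p = polymul(p, [1, -1])                        # non-seasonal difference
    return polymul(p, [1, 0, 0, 0, -1])            # seasonal difference
```

For example, the coefficient of B in the product is −(1 + φ1), matching the (1 + φ1)yt−1 term once the lagged values are moved to the right-hand side.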

Common ARIMA models
The US Census Bureau uses the following models most often:
ARIMA(0,1,1)(0,1,1)m with log transformation
ARIMA(0,1,2)(0,1,1)m with log transformation
ARIMA(2,1,0)(0,1,1)m with log transformation
ARIMA(0,2,2)(0,1,1)m with log transformation
ARIMA(2,1,2)(0,1,1)m with no transformation
121

Seasonal ARIMA models
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.
ARIMA(0,0,0)(0,0,1)12 will show:
a spike at lag 12 in the ACF but no other significant spikes.
The PACF will show exponential decay in the seasonal lags; that is, at lags 12, 24, 36, . . . .
ARIMA(0,0,0)(1,0,0)12 will show:
exponential decay in the seasonal lags of the ACF
a single significant spike at lag 12 in the PACF.
122
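The ACF claim for a seasonal MA term can be checked from first principles, since the ACF of any MA process follows directly from its coefficients. A plain-Python sketch (illustrative only, not from the unit materials):

```python
# ACF of an MA process y_t = eps_t + theta_1 eps_{t-1} + ... computed
# from its coefficients (psi_0 = 1): rho(k) = gamma(k) / gamma(0), where
# gamma(k) = sum_i psi_i * psi_{i+k}.
def ma_acf(thetas, max_lag):
    psi = [1.0] + list(thetas)
    def gamma(k):
        return sum(psi[i] * psi[i + k] for i in range(len(psi) - k))
    g0 = gamma(0)
    return [gamma(k) / g0 for k in range(max_lag + 1)]

# A purely seasonal MA(1) with m = 12: a single theta at lag 12.
seasonal_theta = [0.0] * 11 + [0.8]
```

The resulting ACF is zero everywhere except lag 12, where it equals Θ/(1 + Θ²), consistent with the single-spike signature described above.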

US leisure employment
leisure <- us_employment %>%
  filter(Title == "Leisure and Hospitality", year(Month) > 2000) %>%
  mutate(Employed = Employed/1000) %>%
  select(Month, Employed)
autoplot(leisure, Employed) +
  labs(title = "US employment: leisure and hospitality",
       y = "Number of people (millions)")
[Figure: US leisure and hospitality employment, millions of people]
123

US leisure employment
leisure %>%
  gg_tsdisplay(difference(Employed, 12),
               plot_type = 'partial', lag = 36) +
  labs(title = "Seasonally differenced", y = "")
[Figure: seasonally differenced series with ACF and PACF to lag 36]
124

US leisure employment
leisure %>%
  gg_tsdisplay(difference(Employed, 12) %>% difference(),
               plot_type = 'partial', lag = 36) +
  labs(title = "Double differenced", y = "")
[Figure: double differenced series with ACF and PACF to lag 36]
125

US leisure employment
fit <- leisure %>%
model(
arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),
arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),
auto = ARIMA(Employed, stepwise = FALSE, approx = FALSE)
)
fit %>% pivot_longer(everything(), names_to = "Model name",
                     values_to = "Orders")
## # A mable: 3 x 2
## # Key:     Model name [3]
##   `Model name` Orders
##   <chr>        <model>
## 1 arima012011  <ARIMA(0,1,2)(0,1,1)[12]>
## 2 arima210011  <ARIMA(2,1,0)(0,1,1)[12]>
## 3 auto         <ARIMA(2,0,0)(1,1,0)[12] w/ drift>
126

US leisure employment
glance(fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 3 x 6
##   .model       sigma2 log_lik   AIC  AICc   BIC
##   <chr>         <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 auto        0.00142    395. -780. -780. -763.
## 2 arima210011 0.00145    392. -776. -776. -763.
## 3 arima012011 0.00146    391. -775. -775. -761.
127

US leisure employment
fit %>% select(auto) %>% gg_tsresiduals(lag=36)
[Figure: residual diagnostics for the auto model (time plot, ACF to lag 36, histogram)]
128

US leisure employment
augment(fit) %>% features(.innov, ljung_box, lag = 24, dof = 4)
## # A tibble: 3 x 3
##   .model      lb_stat lb_pvalue
##   <chr>         <dbl>     <dbl>
## 1 arima012011    22.4     0.320
## 2 arima210011    18.9     0.527
## 3 auto           16.6     0.680
129

US leisure employment
forecast(fit, h = 36) %>%
  filter(.model == 'auto') %>%
  autoplot(leisure) +
  labs(title = "US employment: leisure and hospitality",
       y = "Number of people (millions)")
[Figure: three-year forecasts with 80% and 95% prediction intervals]
130

Corticosteroid drug sales
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)/1e6)
131

Corticosteroid drug sales
h02 %>% autoplot(Cost)
[Figure: monthly H02 cost, 1991-2008]
132

Corticosteroid drug sales
h02 %>% autoplot(log(Cost))
[Figure: log of monthly H02 cost]
133

Corticosteroid drug sales
h02 %>% autoplot(log(Cost) %>% difference(12))
[Figure: seasonally differenced log H02 cost]
134

Corticosteroid drug sales
h02 %>% gg_tsdisplay(difference(log(Cost), 12),
                     lag_max = 36, plot_type = 'partial')
[Figure: seasonally differenced log cost with ACF and PACF to lag 36]
135

Corticosteroid drug sales
Choose D = 1 and d = 0.
Spikes in the PACF at lags 12 and 24 suggest a seasonal AR(2) term.
Spikes in the PACF suggest a possible non-seasonal AR(3) term.
Initial candidate model: ARIMA(3,0,0)(2,1,0)12.
136

Corticosteroid drug sales
.model                    AICc
ARIMA(3,0,1)(0,1,2)[12]   -485
ARIMA(3,0,1)(1,1,1)[12]   -484
ARIMA(3,0,1)(0,1,1)[12]   -484
ARIMA(3,0,1)(2,1,0)[12]   -476
ARIMA(3,0,0)(2,1,0)[12]   -475
ARIMA(3,0,2)(2,1,0)[12]   -475
ARIMA(3,0,1)(1,1,0)[12]   -463
137

Corticosteroid drug sales
fit <- h02 %>%
  model(best = ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
report(fit)
## Series: Cost
## Model: ARIMA(3,0,1)(0,1,2)[12]
## Transformation: log(Cost)
##
## Coefficients:
##          ar1     ar2     ar3    ma1     sma1     sma2
##       -0.160  0.5481  0.5678  0.383  -0.5222  -0.1768
## s.e.   0.164  0.0878  0.0942  0.190   0.0861   0.0872
##
## sigma^2 estimated as 0.004278:  log likelihood=250
## AIC=-486   AICc=-485   BIC=-463
138

Corticosteroid drug sales
gg_tsresiduals(fit)
[Figure: residual diagnostics (time plot, ACF to lag 18, histogram)]
139

Corticosteroid drug sales
augment(fit) %>%
features(.resid, ljung_box, lag = 36, dof = 6)
## # A tibble: 1 x 3
## .model lb_stat lb_pvalue
##
## 1 best 57.9 0.00163
140

Corticosteroid drug sales
fit <- h02 %>% model(auto = ARIMA(log(Cost)))
report(fit)
## Series: Cost
## Model: ARIMA(2,1,0)(0,1,1)[12]
## Transformation: log(Cost)
##
## Coefficients:
##           ar1      ar2     sma1
##       -0.8491  -0.4207  -0.6401
## s.e.   0.0712   0.0714   0.0694
##
## sigma^2 estimated as 0.004387:  log likelihood=245
## AIC=-483   AICc=-483   BIC=-470
141

Corticosteroid drug sales
gg_tsresiduals(fit)
[Figure: residual diagnostics (time plot, ACF to lag 18, histogram)]
142

Corticosteroid drug sales
augment(fit) %>%
features(.resid, ljung_box, lag = 36, dof = 3)
## # A tibble: 1 x 3
## .model lb_stat lb_pvalue
##
## 1 auto 64.7 0.000801
143

Corticosteroid drug sales
fit <- h02 %>%
  model(best = ARIMA(log(Cost), stepwise = FALSE,
                     approximation = FALSE,
                     order_constraint = p + q + P + Q <= 9))
report(fit)
## Series: Cost
## Model: ARIMA(4,1,1)(2,1,2)[12]
## Transformation: log(Cost)
##
## Coefficients:
##           ar1    ar2    ar3     ar4     ma1   sar1    sar2
##       -0.0425  0.210  0.202  -0.227  -0.742  0.621  -0.383
## s.e.   0.2167  0.181  0.114   0.081   0.207  0.242   0.118
##         sma1   sma2
##       -1.202  0.496
## s.e.   0.249  0.213
##
## sigma^2 estimated as 0.004049:  log likelihood=254
## AIC=-489   AICc=-487   BIC=-456
144

Corticosteroid drug sales
gg_tsresiduals(fit)
[Figure: residual diagnostics (time plot, ACF to lag 18, histogram)]
145

Corticosteroid drug sales
augment(fit) %>%
  features(.resid, ljung_box, lag = 36, dof = 9)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 best      44.7    0.0175
146

Corticosteroid drug sales
Training data: July 1991 to June 2006
Test data: July 2006 to June 2008
fit <- h02 %>%
  filter_index(~ "2006 Jun") %>%
  model(
    ARIMA(log(Cost) ~ pdq(3, 0, 0) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ pdq(3, 0, 1) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ pdq(3, 0, 2) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ pdq(3, 0, 1) + PDQ(1, 1, 0))
    # ...
  )
fit %>%
  forecast(h = "2 years") %>%
  accuracy(h02)
147

Corticosteroid drug sales
.model                     RMSE
ARIMA(3,0,1)(1,1,1)[12]  0.0619
ARIMA(3,0,1)(0,1,2)[12]  0.0621
ARIMA(3,0,1)(0,1,1)[12]  0.0630
ARIMA(2,1,0)(0,1,1)[12]  0.0630
ARIMA(4,1,1)(2,1,2)[12]  0.0631
ARIMA(3,0,2)(2,1,0)[12]  0.0651
ARIMA(3,0,1)(2,1,0)[12]  0.0653
ARIMA(3,0,1)(1,1,0)[12]  0.0666
ARIMA(3,0,0)(2,1,0)[12]  0.0668
148

Corticosteroid drug sales
Models with the lowest AICc values tend to give slightly better results than the other models.
AICc comparisons require the same orders of differencing, but RMSE test-set comparisons can involve any models.
Use the best model available, even if it does not pass all tests.
ARIMA(3,0,1)(0,1,2)12 has the lowest AICc, while ARIMA(3,0,1)(1,1,1)12 has the lowest RMSE. We stick with the lowest-AICc model, as these models are comparable.
149

Corticosteroid drug sales
fit <- h02 %>%
  model(ARIMA(Cost ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
fit %>% forecast() %>% autoplot(h02) +
  labs(y = "H02 Expenditure ($AUD)")
[Figure: forecasts of H02 expenditure with 80% and 95% prediction intervals]
150

Outline
1 Stationarity and differencing
2 Backshift notation
3 Non-seasonal ARIMA models
4 Estimation and order selection
5 ARIMA modelling in R
6 Forecasting
7 Seasonal ARIMA models
8 ARIMA vs ETS 151

ARIMA vs ETS
Myth that ARIMA models are more general than exponential smoothing.
Linear exponential smoothing models all special cases of ARIMA models.
Non-linear exponential smoothing models have no equivalent ARIMA counterparts.
Many ARIMA models have no exponential smoothing counterparts.
ETS models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models have one unit root.
152

ARIMA vs ETS
[Figure: Venn diagram. ETS models (combination of components): the 9 non-additive ETS models with M components sit outside the overlap, the 6 additive ETS models sit inside it. ARIMA models (modelling autocorrelations): potentially infinitely many models, including all stationary models and many large models.]
153

Equivalences
ETS model    ARIMA model              Parameters
ETS(A,N,N)   ARIMA(0,1,1)             θ1 = α − 1
ETS(A,A,N)   ARIMA(0,2,2)             θ1 = α + β − 2,  θ2 = 1 − α
ETS(A,Ad,N)  ARIMA(1,1,2)             φ1 = φ,  θ1 = α + φβ − 1 − φ,  θ2 = (1 − α)φ
ETS(A,N,A)   ARIMA(0,0,m)(0,1,0)m
ETS(A,A,A)   ARIMA(0,1,m+1)(0,1,0)m
ETS(A,Ad,A)  ARIMA(1,0,m+1)(0,1,0)m
154
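The first equivalence can be checked numerically: with θ1 = α − 1 and the same starting forecast, ETS(A,N,N) (simple exponential smoothing) and ARIMA(0,1,1) give identical one-step forecasts. A plain-Python sketch (illustrative only; not the fable implementation):

```python
# Simple exponential smoothing: level_t = level_{t-1} + alpha * e_t,
# with one-step forecast yhat_{t+1|t} = level_t.
def ses_forecasts(y, alpha, level0):
    level, out = level0, []
    for obs in y:
        out.append(level)               # forecast made before seeing obs
        level += alpha * (obs - level)
    return out

# ARIMA(0,1,1): yhat_{t+1|t} = y_t + theta * e_t, with e_t = y_t - yhat_{t|t-1}.
def arima011_forecasts(y, theta, yhat0):
    yhat, out = yhat0, []
    for obs in y:
        out.append(yhat)
        yhat = obs + theta * (obs - yhat)
    return out
```

Setting theta = alpha − 1 makes the two recursions coincide term by term, which is the equivalence in the first row of the table.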

Example: Australian population
aus_economy <- global_economy %>% filter(Code == "AUS") %>%
mutate(Population = Population/1e6)
aus_economy %>%
slice(-n()) %>%
stretch_tsibble(.init = 10) %>%
model(ets = ETS(Population),
arima = ARIMA(Population)
) %>%
forecast(h = 1) %>%
accuracy(aus_economy) %>%
select(.model, ME:RMSSE)
## # A tibble: 2 x 8
## .model ME RMSE MAE MPE MAPE MASE RMSSE
##
## 1 arima 0.0420 0.194 0.0789 0.277 0.509 0.317 0.746
## 2 ets 0.0202 0.0774 0.0543 0.112 0.327 0.218 0.298
155

Example: Australian population
aus_economy %>%
  model(ETS(Population)) %>%
  forecast(h = "5 years") %>%
  autoplot(aus_economy) +
  labs(title = "Australian population",
       y = "People (millions)")
[Figure: five-year ETS forecasts of the Australian population with 80% and 95% prediction intervals]
156

Example: Cement production
cement <- aus_production %>%
  select(Cement) %>%
  filter_index("1988 Q1" ~ .)
train <- cement %>% filter_index(. ~ "2007 Q4")
fit <- train %>%
model(
arima = ARIMA(Cement),
ets = ETS(Cement)
)
157

Example: Cement production
fit %>%
  select(arima) %>%
  report()
## Series: Cement
## Model: ARIMA(1,0,1)(2,1,1)[4] w/ drift
##
## Coefficients:
##          ar1     ma1   sar1    sar2    sma1  constant
##       0.8886  -0.237  0.081  -0.234  -0.898      5.39
## s.e.  0.0842   0.133  0.157   0.139   0.178      1.48
##
## sigma^2 estimated as 11456:  log likelihood=-464
## AIC=941   AICc=943   BIC=957
158

Example: Cement production
fit %>%
  select(ets) %>%
  report()
## Series: Cement
## Model: ETS(M,N,M)
##   Smoothing parameters:
##     alpha = 0.753
##     gamma = 1e-04
##   Initial states:
##        l    s1    s2    s3     s4
##     1695  1.03  1.05  1.01  0.912
##
##   sigma^2: 0.0034
##
##   AIC AICc  BIC
##  1104 1106 1121
159

Example: Cement production
gg_tsresiduals(fit %>% select(arima), lag_max = 16)
[Figure: ARIMA residual diagnostics (time plot, ACF to lag 16, histogram)]
160

Example: Cement production
gg_tsresiduals(fit %>% select(ets), lag_max = 16)
[Figure: ETS residual diagnostics (time plot, ACF to lag 16, histogram)]
161

Example: Cement production
fit %>%
select(arima) %>%
augment() %>%
features(.innov, ljung_box, lag = 16, dof = 6)
## # A tibble: 1 x 3
## .model lb_stat lb_pvalue
##
## 1 arima 6.37 0.783
162

Example: Cement production
fit %>%
select(ets) %>%
augment() %>%
features(.innov, ljung_box, lag = 16, dof = 6)
## # A tibble: 1 x 3
## .model lb_stat lb_pvalue
##
## 1 ets 10.0 0.438
163

Example: Cement production
fit %>%
forecast(h = “2 years 6 months”) %>%
accuracy(cement) %>%
select(-ME, -MPE, -ACF1)
## # A tibble: 2 x 7
## .model .type RMSE MAE MAPE MASE RMSSE
##
## 1 arima Test 216. 186. 8.68 1.27 1.26
## 2 ets Test 222. 191. 8.85 1.30 1.29
164

Example: Cement production
fit %>%
  select(arima) %>%
  forecast(h = "3 years") %>%
  autoplot(cement) +
  labs(title = "Cement production in Australia",
       y = "Tonnes ('000)")
[Figure: three-year ARIMA forecasts of cement production with 80% and 95% prediction intervals]
165