Week-5 ARIMA Models
Some of the slides are adapted from the lecture notes provided by Prof. Antoine Saure and Prof. Rob Hyndman
Business Forecasting Analytics
ADM 4307 – Fall 2021
ARIMA Models (cont’d)
Ahmet Kandakoglu, PhD
01 November, 2021
Outline
• Review of last lecture
• Non-Seasonal ARIMA models
• Estimation and Order Selection
• ARIMA Modelling in R
• Seasonal ARIMA models
• Examples
• ARIMA vs ETS
ADM 4307 Business Forecasting Analytics 2Fall 2021
Review of Last Lecture
• Stationarity
• Differencing
• Backshift Notation
• Autoregressive Models
• Moving Average Models
ADM 4307 Business Forecasting Analytics 3Fall 2021
Exponential Smoothing vs. ARIMA
• ARIMA models provide another approach to time series forecasting
• ARIMA
• AR: autoregressive (lagged observations as inputs)
• I: integrated (differencing to make series stationary)
• MA: moving average (lagged errors as inputs)
• While exponential smoothing models were based on a description of trend and
seasonality in the data, ARIMA models aim to describe the autocorrelations in the
data.
• Exponential smoothing and ARIMA models are the two most widely used approaches
to time series forecasting.
ADM 4307 Business Forecasting Analytics 4Fall 2021
Stationarity
• A stationary series is:
• roughly horizontal
• constant variance
• no patterns predictable in the long-term
• Transformations help to stabilize the variance.
• For ARIMA modelling, we also need to stabilize the mean.
ADM 4307 Business Forecasting Analytics 5Fall 2021
Differencing
• One way to make a time series stationary is to compute the differences between
consecutive observations. This is known as differencing
• Transformations such as logarithms can help to stabilize the variance of a time series
• Differencing can help stabilize the mean of a time series by removing changes in the
level of a time series, and so eliminating trend and seasonality
• The differenced series is the change between consecutive observations in the original
series, and can be written as
ADM 4307 Business Forecasting Analytics 6Fall 2021
Second-Order Differencing
• Occasionally, the differenced data will not appear stationary and it may be
necessary to difference the data a second time to obtain a stationary series:
• In this case, 𝑦𝑡
′′ will have values 𝑇 − 2.
• Then we would model the change in the changes of the original data.
• In practice, it is almost never necessary to go beyond second-order
differences
ADM 4307 Business Forecasting Analytics 7Fall 2021
Seasonal Differencing
• A seasonal difference is the difference between an observation and the
corresponding observation from the previous season
where 𝑚 is the number of seasons.
• For monthly data 𝑚 = 12
• For quarterly data 𝑚 = 4
ADM 4307 Business Forecasting Analytics 8Fall 2021
Example 1: Antidiabetic Drug Sales
Monthly anti-diabetic drug sales in Australia
a10 <- PBS %>% filter(ATC2 == “A10”) %>%
summarise(Cost = sum(Cost)/1e6)
a10 %>% autoplot(Cost)
a10 %>% autoplot(log(Cost))
a10 %>% autoplot(log(Cost) %>% difference(12))
ADM 4307 Business Forecasting Analytics 9Fall 2021
Example 2: Cortecosteroid Drug Sales
h02 <- PBS %>% filter(ATC2 == “H02”) %>%
summarise(Cost = sum(Cost)/1e6)
h02 %>% autoplot(Cost)
h02 %>% autoplot(log(Cost))
h02 %>% autoplot(log(Cost) %>% difference(12))
h02 %>% autoplot(log(Cost) %>% difference(12) %>%
difference(1))
ADM 4307 Business Forecasting Analytics 10Fall 2021
Automatically Selecting Differences
h02 %>% mutate(log_sales = log(Cost)) %>% features(log_sales, unitroot_nsdiffs)
# A tibble: 1 x 1
nsdiffs
1 1
• Because unitroot_nsdiffs() returns 1 (indicating one seasonal difference is required), we apply
the ndiffs() function to the seasonally differenced data.
h02 %>% mutate(d_log_sales = difference(log(Cost), 12)) %>%
features(d_log_sales, unitroot_ndiffs)
# A tibble: 1 x 1
ndiffs
1 1
• Since unitroot_ndiffs() returns 1, one difference is required.
ADM 4307 Business Forecasting Analytics 11Fall 2021
Autoregressive Models
• In multiple regression, we forecast the variable of interest using a linear
combination of predictors. In an autoregression model, we forecast the
variable of interest using a linear combination of past values of the variable (a
regression of the variable against itself)
• An autoregressive model of order 𝑝, denoted AR(𝑝) model, can be written as:
• 𝑐 is a constant and 𝑒𝑡 is white noise. This is like a multiple regression but with
lagged values of 𝑦𝑡 as predictors
ADM 4307 Business Forecasting Analytics 12Fall 2021
Moving Average Models
• A moving average model of order 𝑞, denoted by 𝑀𝐴(𝑞), uses 𝑞 past forecast
errors in a regression-like model, where 𝑒𝑡 is white noise (This is a multiple
regression with past errors as predictors)
• Moving average models should not be confused with moving average
smoothing
• A moving average model is used for forecasting future values while moving
average smoothing is used for estimating the trend-cycle of past values
ADM 4307 Business Forecasting Analytics 13Fall 2021
Outline
• Review of last lecture
• Non-Seasonal ARIMA models
• Estimation and Order Selection
• ARIMA Modelling in R
• Seasonal ARIMA models
• Examples
• ARIMA vs ETS
ADM 4307 Business Forecasting Analytics 14Fall 2021
Non-Seasonal ARIMA Models
• If we combine differencing with autoregression and a moving average model,
we obtain a non-seasonal ARIMA model.
• ARIMA is an acronym for AutoRegressive Integrated Moving Average
model (integration in this context is the reverse of differencing).
• The full model can be written as:
• 𝑦𝑡
′ is the differenced series (may be more than once)
• The predictors include both lagged values of 𝑦𝑡 and lagged errors.
ADM 4307 Business Forecasting Analytics 15Fall 2021
Non-Seasonal ARIMA Models
• ARIMA(p, d, q) model
• AR: p = order of the autoregressive part
• I: d = degree of first differencing involved
• MA: q = order of the moving average part.
• Many of the models we have already discussed are special cases of the
ARIMA model as shown below:
• White noise : ARIMA(0,0,0)
• Random walk : ARIMA(0,1,0) with no constant
• Random walk with drift : ARIMA(0,1,0) with a constant
• Autoregression : ARIMA(p,0,0)
• Moving average : ARIMA(0,0,q)
ADM 4307 Business Forecasting Analytics 16Fall 2021
Backshift Notation for ARIMA
• Once we start combining components in this way to form more complicated
models, it is much easier to work with the backshift notation.
• Selecting appropriate values for p, d and q can be difficult. The ARIMA()
function in R will do it for you automatically.
ADM 4307 Business Forecasting Analytics 17
𝑀𝐴(𝑞)𝐴𝑅(𝑝) 𝑑 diffs
Fall 2021
Example – Egyptian Exports
global_economy %>% filter(Code == “EGY”) %>%
autoplot(Exports) + labs(y = “% of GDP”, title = “Egyptian Exports”)
ADM 4307 Business Forecasting Analytics 18Fall 2021
Example – Egyptian Exports
fit <- global_economy %>% filter(Code == “EGY”) %>% model(ARIMA(Exports))
report(fit)
ADM 4307 Business Forecasting Analytics 19
Series: Exports
Model: ARIMA(2,0,1) w/ mean
Coefficients:
ar1 ar2 ma1 constant
1.6764 -0.8034 -0.6896 2.5623
s.e. 0.1111 0.0928 0.1492 0.1161
sigma^2 estimated as 8.046: log likelihood=-141.57
AIC=293.13 AICc=294.29 BIC=303.43
This is an ARIMA(2,0,1) model:
𝑦𝑡 = 2.5623 + 1.6764𝑦𝑡−1 − 0.8034𝑦𝑡−2 − 0.6896𝜀𝑡−1 + 𝜀𝑡
where 𝜀𝑡 is white noise with a standard deviation of 2.837= 8.046
Fall 2021
Example – Egyptian Exports
gg_tsresiduals(fit)
ADM 4307 Business Forecasting Analytics 20Fall 2021
Example – Egyptian Exports
fit %>% forecast(h=10) %>%
autoplot(global_economy) + labs(y = “% of GDP”, title = “Egyptian Exports”)
ADM 4307 Business Forecasting Analytics 21Fall 2021
Understanding ARIMA Models
• The ARIMA() function is useful, but anything automated can be a little dangerous.
• It is worth understanding something of the behavior of the models.
• The constant 𝑐 has an important effect on the long-term forecasts obtained from these models:
• If c = 0 and d = 0, the long-term forecasts will go to zero
• If c = 0 and d = 1, the long-term forecasts will go to a non-zero constant
• If c = 0 and d = 2, the long-term forecasts will follow a straight line
• If c ≠ 0 and d = 0, the long-term forecasts will go to the mean of the data
• If c ≠ 0 and d = 1, the long-term forecasts will follow a straight line
• If c ≠ 0 and d = 2, the long-term forecasts will follow a quadratic trend
• The value of 𝑑 also has an effect on the prediction intervals — the higher the value of 𝑑, the
more rapidly the prediction intervals increase in size.
ADM 4307 Business Forecasting Analytics 22Fall 2021
Outline
• Review of last lecture
• Non-Seasonal ARIMA models
• Estimation and Order Selection
• ARIMA Modelling in R
• Seasonal ARIMA models
• Examples
• ARIMA vs ETS
ADM 4307 Business Forecasting Analytics 23Fall 2021
Maximum Likelihood Estimation
• Once the model order has been identified (i.e., the values of 𝑝, d and q), we need to
estimate the parameters c, ∅1, … , ∅𝑝, 𝜃1, … , 𝜃𝑞
• When R estimates the ARIMA model, it uses maximum likelihood estimation (MLE).
• MLE is very similar to the least squares estimates that would be obtained by
minimizing
𝑡−1
𝑇
𝑒𝑡
2
• The ARIMA() function allows MLE estimation using a non-linear optimization
• Different software will give slightly different estimates.
ADM 4307 Business Forecasting Analytics 24Fall 2021
Information Criteria
• Akaike’s Information Criterion (AIC):
• Corrected AIC (AICc):
• Bayesian Information Criterion (BIC):
• These are useful for determining the order of an ARIMA model. Good models are
obtained by minimizing the AIC, AICc or BIC.
• Our preference is to use the AICc (only for selecting the values of p and q)
ADM 4307 Business Forecasting Analytics 25Fall 2021
ACF and PACF Plots
• It is usually not possible to tell, simply from a time plot, what values of p and q are
appropriate for the data.
• However, it is sometimes possible to use the ACF plot, and the closely related PACF
plot, to determine appropriate values for p and q.
• An ACF plot shows the autocorrelations which measure the relationship between
𝑦𝑡 and 𝑦𝑡−1 for different values of 𝑘.
• Now if 𝑦𝑡 and 𝑦𝑡−1 are correlated, then 𝑦𝑡−1 and 𝑦𝑡−2 must also be correlated. But
then 𝑦𝑡 and 𝑦𝑡−2 might be correlated, simply because they are both connected to
𝑦𝑡−1.
ADM 4307 Business Forecasting Analytics 26Fall 2021
ACF and PACF Plots
• Partial autocorrelations measure the relationship between 𝑦𝑡 and 𝑦𝑡−𝑘 after removing
the effects of other time lags 1, 2, . . , 𝑘 − 1.
• So the first partial autocorrelation is identical to the first autocorrelation, because
there is nothing between them to remove. The partial autocorrelations for lags 2, 3
and greater are calculated as follows:
• Varying the number of terms on the right hand side of this autoregression model gives
𝑎𝑘 for different values of 𝑘.
ADM 4307 Business Forecasting Analytics 27Fall 2021
Example – Egyptian Exports
egypt <- global_economy %>% filter(Code == “EGY”)
egypt %>% ACF(Exports) %>% autoplot()
egypt %>% PACF(Exports) %>% autoplot()
ADM 4307 Business Forecasting Analytics 28Fall 2021
ACF and PACF Plots
• A convenient way to produce a time plot, ACF plot and PACF plot in one command is
to use the gg_tsdisplay() function with plot_type = “partial“.
• If the data are from an ARIMA(p,d,0) or ARIMA(0,d,q) model, then the ACF and
PACF plots can be helpful in determining the value of p or q.
• If p and q are both positive, then the plots do not help in finding suitable values of p
and q.
ADM 4307 Business Forecasting Analytics 29Fall 2021
Example – Egyptian Exports
egypt <- global_economy %>% filter(Code == “EGY”)
egypt %>% gg_tsdisplay(Exports, plot_type=’partial’)
ADM 4307 Business Forecasting Analytics 30Fall 2021
ACF and PACF Plots
• The data may follow an ARIMA(p,d,0) model if the ACF and PACF plots of the
differenced data show the following patterns:
• The ACF is exponentially decaying or sinusoidal
• There is a significant spike at lag p in the PACF, but none beyond lag p
• The data may follow an ARIMA(0,d,q) model if the ACF and PACF plots of the
differenced data show the following patterns:
• The PACF is exponentially decaying or sinusoidal
• There is a significant spike at lag q in ACF, but none beyond lag q
ADM 4307 Business Forecasting Analytics 31Fall 2021
Example – Egyptian Exports
• There is a decaying sinusoidal pattern in the
ACF.
• PACF shows the last significant spike at lag 4.
• So in this case, the ACF and PACF lead us to
think an ARIMA(4,0,0) model might be
appropriate.
ADM 4307 Business Forecasting Analytics 32Fall 2021
Example – Egyptian Exports
egypt <- global_economy %>% filter(Code == “EGY”)
fit2 <- egypt %>% model(ARIMA(Exports ~ pdq(4,0,0)))
report(fit2)
ADM 4307 Business Forecasting Analytics 33
Series: Exports
Model: ARIMA(4,0,0) w/ mean
Coefficients:
ar1 ar2 ar3 ar4 constant
0.9861 -0.1715 0.1807 -0.3283 6.6922
s.e. 0.1247 0.1865 0.1865 0.1273 0.3562
sigma^2 estimated as 7.885: log likelihood=-140.53
AIC=293.05 AICc=294.7 BIC=305.41
This model is only slightly worse than the ARIMA(2,0,1) model identified by ARIMA() (with
an AICc value of 294.70 compared to 294.29).
Fall 2021
Order Selection using Particular Values
• We can also specify particular values of pdq() that ARIMA() can search for.
• For example, to find the best ARIMA model with p∈{1,2,3}, q∈{0,1,2} and d=1, you
could use ARIMA(y ~ pdq(p=1:3, d=1, q=0:2)).
ADM 4307 Business Forecasting Analytics 34Fall 2021
Example – Egyptian Exports
egypt <- global_economy %>% filter(Code == “EGY”)
fit3 <- egypt %>% model(ARIMA(Exports ~ pdq(p=2:6, d=0, q=0:2)))
report(fit3)
ADM 4307 Business Forecasting Analytics 35
Series: Exports
Model: ARIMA(2,0,1) w/ mean
Coefficients:
ar1 ar2 ma1 constant
1.6764 -0.8034 -0.6896 2.5623
s.e. 0.1111 0.0928 0.1492 0.1161
sigma^2 estimated as 8.046: log likelihood=-141.57
AIC=293.13 AICc=294.29 BIC=303.43
Fall 2021
Outline
• Review of last lecture
• Non-Seasonal ARIMA models
• Estimation and Order Selection
• ARIMA Modelling in R
• Seasonal ARIMA models
• Examples
• ARIMA vs ETS
ADM 4307 Business Forecasting Analytics 36Fall 2021
How does ARIMA() work?
• The ARIMA() function in R uses a variation of the Hyndman and Khandakar
algorithm, which combines KPSS unit root tests, minimization of the AICc and MLE to
obtain an ARIMA model.
Hyndman-Khandakar algorithm for automatic ARIMA modelling
• Select number of differences d and D via KPSS test and seasonal strength measure.
• Select p, q by minimizing AICc.
• Use stepwise search to traverse model space
ADM 4307 Business Forecasting Analytics 37Fall 2021
Modelling Procedure with ARIMA
1. Plot the data and identify any unusual observations
2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the
variance
3. If the data are non-stationary, take first differences of the data until the data are
stationary
4. Examine the ACF/PACF: Is an ARIMA(p,d,0) or ARIMA(0,d,q) model appropriate?
5. Try your chosen model(s) and use the AICc to search for a better model
6. Check the residuals from your chosen model by plotting the ACF of the residuals
and doing a portmanteau test. If they do not look like white noise, try a modified
model
7. Once the residuals look like white noise, calculate forecasts
ADM 4307 Business Forecasting Analytics 38Fall 2021
Automatic Modelling Procedure with ARIMA
1. Plot the data and identify any unusual observations
2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the
variance
3. Use ARIMA() to automatically select a model.
4. Check the residuals from your chosen model by plotting the ACF of the residuals
and doing a portmanteau test. If they do not look like white noise, try a modified
model
5. Once the residuals look like white noise, calculate forecasts
ADM 4307 Business Forecasting Analytics 39Fall 2021
Modelling Procedure
ADM 4307 Business Forecasting Analytics 40Fall 2021
Example – Central African Republic Exports
caf <- global_economy %>% filter(Code == “CAF”)
caf %>% autoplot(Exports) + labs(title=”Central African Republic exports”, y=”% of GDP”)
ADM 4307 Business Forecasting Analytics 41Fall 2021
Example – Central African Republic Exports
Step 1. The time plot shows some non-stationarity, with an overall decline.
Step 2. There is no evidence of changing variance, so we will not do a Box-Cox
transformation.
Step 3. To address the non-stationarity, we will take a first difference of the data.
ADM 4307 Business Forecasting Analytics 42Fall 2021
Example – Central African Republic Exports
caf %>% gg_tsdisplay(difference(Exports), plot_type=’partial’)
ADM 4307 Business Forecasting Analytics 43Fall 2021
Example – Central African Republic Exports
Step 4. PACF is suggestive of AR(2); so an initial candidate model is an
ARIMA(2,1,0).
The ACF suggests an MA(3) model; so an alternative candidate is an
ARIMA(0,1,3).
Step 5. We fit both an ARIMA(2,1,0) and an ARIMA(0,1,3) model along with two
automated model selections, one using the default stepwise procedure,
and one working harder to search a larger model space.
ADM 4307 Business Forecasting Analytics 44Fall 2021
Example – Central African Republic Exports
caf_fit <- caf %>%
model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
arima013 = ARIMA(Exports ~ pdq(0,1,3)),
stepwise = ARIMA(Exports),
search = ARIMA(Exports, stepwise=FALSE))
caf_fit
ADM 4307 Business Forecasting Analytics 45
# A mable: 1 x 5
# Key: Country [1]
Country arima210 arima013 stepwise search
1 Central Afr~
Fall 2021
Example – Central African Republic Exports
accuracy(caf_fit)
glance(caf_fit) %>% arrange(AICc) %>% select(.model:BIC)
ADM 4307 Business Forecasting Analytics 46
# A tibble: 4 x 11
Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
1 Central African Republic arima~ Trai~ -0.343 2.52 1.87 -2.97 9.14 0.853 0.870 0.0417
2 Central African Republic arima~ Trai~ -0.215 2.47 1.86 -1.97 9.12 0.849 0.851 -0.0417
3 Central African Republic stepw~ Trai~ -0.265 2.42 1.82 -2.33 8.93 0.832 0.836 -0.0417
4 Central African Republic search Trai~ -0.269 2.46 1.87 -2.39 9.18 0.853 0.850 -0.0470
# A tibble: 4 x 6
.model sigma2 log_lik AIC AICc BIC
1 search 6.52 -133. 274. 275. 282.
2 arima210 6.71 -134. 275. 275. 281.
3 arima013 6.54 -133. 274. 275. 282.
4 stepwise 6.42 -132. 274. 275. 284.
The full search has
found that an
ARIMA(3,1,0) gives
the lowest AICc value
Fall 2021
Example – Central African Republic Exports
caf_fit %>% select(search) %>% gg_tsresiduals()
Step 6. The ACF plot of the residuals from the ARIMA(3,1,0) model shows that all
autocorrelations are within the threshold limits, indicating that the residuals are
behaving like white noise.
ADM 4307 Business Forecasting Analytics 47Fall 2021
Example – Central African Republic Exports
augment(caf_fit) %>% filter(.model==’search’) %>%
features(.innov, ljung_box, lag = 10, dof = 3)
A portmanteau test returns a large p-value, also suggesting that the residuals are white
noise.
ADM 4307 Business Forecasting Analytics 48
# A tibble: 1 x 4
Country .model lb_stat lb_pvalue
1 Central African Republic search 5.75 0.569
Fall 2021
Example – Central African Republic Exports
Step 7. Forecasts from the chosen model
caf_fit %>% forecast(h=5) %>% filter(.model==’search’) %>% autoplot(global_economy)
caf_fit %>% select(search) %>% forecast(h=5)
ADM 4307 Business Forecasting Analytics 49
# A fable: 5 x 4 [1Y]
# Key: .model [1]
.model Year Exports .mean
1 search 2018 N(13, 6.5) 12.5
2 search 2019 N(13, 8.5) 12.6
3 search 2020 N(13, 11) 12.5
4 search 2021 N(13, 15) 12.5
5 search 2022 N(13, 18) 12.5
Fall 2021
Outline
• Review of last lecture
• Non-Seasonal ARIMA models
• Estimation and Order Selection
• ARIMA Modelling in R
• Seasonal ARIMA models
• Examples
• ARIMA vs ETS
ADM 4307 Business Forecasting Analytics 50Fall 2021
Seasonal ARIMA Models
• A seasonal ARIMA model includes additional seasonal terms. It is written as follows:
where m = number of periods per season
• We will use uppercase notation for the seasonal parts of the model and lowercase
notation for the non-seasonal parts
ADM 4307 Business Forecasting Analytics 51Fall 2021
Seasonal ARIMA Models
• The seasonal part of the model consists of terms that are very similar to the non-
seasonal components, but they involve backshifts of the seasonal period
• For example, an 𝐴𝑅𝐼𝑀𝐴(1,1,1)(1,1,1)4 model (without a constant) is for quarterly data
(m = 4) and can be written as:
ADM 4307 Business Forecasting Analytics 52Fall 2021
Seasonal ARIMA Models
• The seasonal part of an AR or MA model will be seen in the seasonal lags of the
PACF and ACF
• For example, an 𝐴𝑅𝐼𝑀𝐴(0,0,0)(0,0,1)12 model will show:
• A spike at lag 12 in the ACF but no other significant spikes
• The PACF will show exponential decay in the seasonal lags. That is, at lags 12,
24, 36,…
• Similarly, an 𝐴𝑅𝐼𝑀𝐴(0,0,0)(1,0,0)12 model will show:
• Exponential decay in the seasonal lags of the ACF
• A single significant spike at lag 12 in the PACF
ADM 4307 Business Forecasting Analytics 53Fall 2021
Outline
• Review of last lecture
• Non-Seasonal ARIMA models
• Estimation and Order Selection
• ARIMA Modelling in R
• Seasonal ARIMA models
• Examples
• ARIMA vs ETS
ADM 4307 Business Forecasting Analytics 54Fall 2021
Example: Monthly US Employment
We will describe seasonal ARIMA modelling using monthly US employment data for leisure and hospitality
jobs from January 2000 to September 2019.
leisure <- us_employment %>% filter(Title == “Leisure and Hospitality”,
year(Month) > 2000) %>% mutate(Employed = Employed/1000) %>% select(Month, Employed)
autoplot(leisure, Employed) + labs(y=”Number of people (millions)”)
ADM 4307 Business Forecasting Analytics 55Fall 2021
Example: Monthly US Employment
The data are clearly non-stationary, with strong seasonality and a nonlinear trend, so we will first take a
seasonal difference.
leisure %>% gg_tsdisplay(difference(Employed, 12),
plot_type=’partial’, lag=36) + labs(title=”Seasonally differenced”, y=””)
ADM 4307 Business Forecasting Analytics 56Fall 2021
Example: Monthly US Employment
These are also clearly non-stationary, so we take a further first difference.
leisure %>% gg_tsdisplay(difference(Employed, 12) %>% difference(),
plot_type=’partial’, lag=36) + labs(title = “Double differenced”, y=””)
ADM 4307 Business Forecasting Analytics 57Fall 2021
Example: Monthly US Employment
These are also clearly non-stationary, so we take a further first difference.
leisure %>% gg_tsdisplay(difference(Employed, 12) %>% difference(),
plot_type=’partial’, lag=36) + labs(title = “Double differenced”, y=””)
ADM 4307 Business Forecasting Analytics 58Fall 2021
Example: Monthly US Employment
• Our aim now is to find an appropriate ARIMA model
based on the ACF and PACF
• Let’s start with the ACF
• The significant spike at lag 2 in the ACF suggests a
non-seasonal MA(2) component.
• The significant spike at lag 12 in the ACF suggests
a seasonal MA(1) component.
• Consequently, we begin with an
ARIMA(0,1,2)(0,1,1)12 model, indicating
• a first difference,
• a seasonal difference,
• and non-seasonal MA(2) and
• seasonal MA(1) component
ADM 4307 Business Forecasting Analytics 59Fall 2021
Example: Monthly US Employment
• Let’s start with the PACF (using the PACF to select
the non-seasonal part of the model and the ACF to
select the seasonal part of the model)
• We may have selected an ARIMA(2,1,0)(0,1,1)12
model, indicating
• a first difference,
• a seasonal difference,
• and non-seasonal AR(2) and
• seasonal MA(1) component
• We will also include an automatically selected
model. By setting stepwise=FALSE and
approximation=FALSE, we are making R work
extra hard to find a good model.
ADM 4307 Business Forecasting Analytics 60Fall 2021
Example: Monthly US Employment
fit <- leisure %>%
model(
arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),
arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),
auto = ARIMA(Employed, stepwise = FALSE, approx = FALSE)
)
fit
ADM 4307 Business Forecasting Analytics 61
# A mable: 1 x 3
arima012011 arima210011 auto
1
Fall 2021
Example: Monthly US Employment
glance(fit) %>% arrange(AICc) %>% select(.model:BIC)
The three fitted models have similar AICc values, with the automatically selected model
being a little better.
ADM 4307 Business Forecasting Analytics 62
# A tibble: 3 x 6
.model sigma2 log_lik AIC AICc BIC
1 auto 0.00142 395. -780. -780. -763.
2 arima210011 0.00145 392. -776. -776. -763.
3 arima012011 0.00146 391. -775. -775. -761.
Fall 2021
Example: Monthly US Employment
fit %>% select(auto) %>% gg_tsresiduals(lag=36)
One small but significant spike (at lag 11) out of 36 is still consistent with white noise
ADM 4307 Business Forecasting Analytics 63Fall 2021
Example: Monthly US Employment
augment(fit) %>% features(.innov, ljung_box, lag=24, dof=4)
We use a Ljung-Box test, which has a large p-value, confirming that the residuals are similar to
white noise.
(The p value is greater than 0.05. So, the residuals are independent)
ADM 4307 Business Forecasting Analytics 64
# A tibble: 3 x 3
.model lb_stat lb_pvalue
1 arima012011 22.4 0.320
2 arima210011 18.9 0.527
3 auto 16.6 0.680
Fall 2021
Example: Monthly US Employment
• We now have a seasonal ARIMA model that passes the required checks and is ready for
forecasting. Forecast from the model for the next three years.
• The forecasts have captured the seasonal pattern very well, and the increasing trend extends
the recent pattern.
fit %>% forecast(h=36) %>% filter(.model==’auto’) %>% autoplot(leisure) +
labs(title = “US employment: leisure and hospitality”, y=”Number of people (millions)”)
ADM 4307 Business Forecasting Analytics 65Fall 2021
Example: Cortecosteroid Drug Sales
We will try to forecast monthly corticosteroid (H02) drug sales in Australia.
h02 <- PBS %>% filter(ATC2 == “H02”) %>% summarise(Cost = sum(Cost)/1e6)
h02 %>% autoplot(Cost)
There is a small increase in the variance with the level, so we take logarithms to stabilize the variance
ADM 4307 Business Forecasting Analytics 66Fall 2021
Example: Cortecosteroid Drug Sales
h02 %>% autoplot(log(Cost))
The data are strongly seasonal and obviously non-stationary, so seasonal differencing will be used.
ADM 4307 Business Forecasting Analytics 67Fall 2021
Example: Cortecosteroid Drug Sales
h02 %>% autoplot(log(Cost) %>% difference(12))
It is not clear at this point whether we should do another difference or not.
We decide not to, but the choice is not obvious.
ADM 4307 Business Forecasting Analytics 68Fall 2021
Example: Cortecosteroid Drug Sales
h02 %>% gg_tsdisplay(difference(log(Cost), 12), plot_type=’partial’, lag_max = 36)
ADM 4307 Business Forecasting Analytics 69
• Choose D = 1 and d = 0.
• Spikes in the PACF at lags 12 and 24
suggest seasonal 𝐴𝑅(2) term.
• Three significant spikes in the PACF
suggests possible non-seasonal 𝐴𝑅(3)
term.
• The pattern in the ACF is not indicative of
any simple model.
• Initial candidate model:
𝐴𝑅𝐼𝑀𝐴(3,0,0)(2,1,0)12
Fall 2021
Example: Cortecosteroid Drug Sales
• We fit 𝐴𝑅𝐼𝑀𝐴(3,0,0)(2,1,0)12, along with some variations on it, and compute the AICc values
• Of these models, the best is the 𝐴𝑅𝐼𝑀𝐴(3,0,1)(0,1,2)12 model (i.e., it has the smallest AICc
value)
ADM 4307 Business Forecasting Analytics 70Fall 2021
Example: Cortecosteroid Drug Sales
fit <- h02 %>% model(ARIMA(log(Cost) ~ pdq(3,0,1) + PDQ(0,1,2)))
fit %>% gg_tsresiduals(lag_max=36)
There are a few significant spikes in the ACF.
ADM 4307 Business Forecasting Analytics 71Fall 2021
Example: Cortecosteroid Drug Sales
augment(fit) %>% features(.innov, ljung_box, lag=36, dof=6)
The p value is less than 0.05. So, the residuals are correlated
There are a few significant spikes in the ACF, and the model fails the Ljung-Box test.
The model can still be used for forecasting, but the prediction intervals may not be accurate due
to the correlated residuals.
ADM 4307 Business Forecasting Analytics 72
# A tibble: 1 x 3
.model lb_stat lb_pvalue
1 ARIMA(log(Cost) ~ pdq(3, 0, 1) + PDQ(0, 1, 2)) 44.8 0.0406
Fall 2021
Example: Cortecosteroid Drug Sales
fit <- h02 %>% model(auto = ARIMA(log(Cost)))
report(fit)
ADM 4307 Business Forecasting Analytics 73
Series: Cost
Model: ARIMA(2,1,0)(0,1,1)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 sma1
-0.8491 -0.4207 -0.6401
s.e. 0.0712 0.0714 0.0694
sigma^2 estimated as 0.004387: log likelihood=245.39
AIC=-482.78 AICc=-482.56 BIC=-469.77
Fall 2021
Example: Cortecosteroid Drug Sales
fit %>% gg_tsresiduals(lag_max=36)
There are a few significant spikes in the ACF.
ADM 4307 Business Forecasting Analytics 74Fall 2021
Example: Cortecosteroid Drug Sales
augment(fit) %>% features(.innov, ljung_box, lag=36, dof=3)
The p value is less than 0.05. So, the residuals are correlated
There are a few significant spikes in the ACF, and the model fails the Ljung-Box test.
The model can still be used for forecasting, but the prediction intervals may not be accurate due
to the correlated residuals.
ADM 4307 Business Forecasting Analytics 75
# A tibble: 1 x 3
.model lb_stat lb_pvalue
1 auto 59.3 0.00332
Fall 2021
Example: Cortecosteroid Drug Sales
fit <- h02 %>% model(best = ARIMA(log(Cost), stepwise = FALSE, approximation = FALSE)
report(fit)
ADM 4307 Business Forecasting Analytics 76
Series: Cost
Model: ARIMA(2,1,3)(0,1,1)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 ma1 ma2 ma3 sma1
-1.0194 -0.8351 0.1717 0.2578 -0.4206 -0.6528
s.e. 0.1648 0.1203 0.2079 0.1177 0.1060 0.0657
sigma^2 estimated as 0.004203: log likelihood=250.8
AIC=-487.6 AICc=-486.99 BIC=-464.83
Fall 2021
Example: Cortecosteroid Drug Sales
fit %>% gg_tsresiduals(lag_max=36)
There are a few significant spikes in the ACF.
ADM 4307 Business Forecasting Analytics 77Fall 2021
Example: Cortecosteroid Drug Sales
augment(fit) %>% features(.innov, ljung_box, lag=36, dof=6)
The p value is less than 0.05. So, the residuals are correlated and the model fails the Ljung-Box
test.
ADM 4307 Business Forecasting Analytics 78
A tibble: 1 x 3
.model lb_stat lb_pvalue
1 best 46.1 0.0301
Fall 2021
Example: Cortecosteroid Drug Sales
ADM 4307 Business Forecasting Analytics 79
Forecasts from the 𝐴𝑅𝐼𝑀𝐴(3,0,1)(0,1,2)12 model (which has the lowest RMSE
value on the test set, and the best AICc value amongst models with only seasonal
differencing)
Fall 2021
Example: Cortecosteroid Drug Sales
• Models with lowest AICc values tend to give slightly better results than the
other models.
• AICc comparisons must have the same orders of differencing.
• But RMSE test set comparisons can involve any models.
• Use the best model available, even if it does not pass all tests.
ADM 4307 Business Forecasting Analytics 80Fall 2021
Example: Cortecosteroid Drug Sales
h02 %>% model(ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2))) %>%
forecast() %>% autoplot(h02) +
labs(y=” $AU (millions)”, title=”Corticosteroid drug scripts (H02) sales”)
ADM 4307 Business Forecasting Analytics 81Fall 2021
Outline
• Review of last lecture
• Non-Seasonal ARIMA models
• Estimation and Order Selection
• ARIMA Modelling in R
• Seasonal ARIMA models
• Examples
• ARIMA vs ETS
ADM 4307 Business Forecasting Analytics 82Fall 2021
ARIMA vs ETS
• It is a common myth that ARIMA models are more general than exponential smoothing
• Linear exponential smoothing models are all special cases of ARIMA models
• Non-linear exponential smoothing models have no equivalent ARIMA counterparts
• Many ARIMA models that have no exponential smoothing counterparts
ADM 4307 Business Forecasting Analytics 83Fall 2021
ARIMA and ETS Equivalences
ADM 4307 Business Forecasting Analytics 84Fall 2021
Business Forecasting Analytics
ADM 4307 – Fall 2021
ARIMA Models
ADM 4307 Business Forecasting Analytics 85Fall 2021