CS计算机代考程序代写 Bayesian algorithm Week-5 ARIMA Models

Week-5 ARIMA Models

Some of the slides are adapted from the lecture notes provided by Prof. Antoine Saure and Prof. Rob Hyndman

Business Forecasting Analytics
ADM 4307 – Fall 2021

ARIMA Models (cont’d)

Ahmet Kandakoglu, PhD

01 November, 2021

Outline

• Review of last lecture

• Non-Seasonal ARIMA models

• Estimation and Order Selection

• ARIMA Modelling in R

• Seasonal ARIMA models

• Examples

• ARIMA vs ETS

ADM 4307 Business Forecasting Analytics 2Fall 2021

Review of Last Lecture

• Stationarity

• Differencing

• Backshift Notation

• Autoregressive Models

• Moving Average Models

ADM 4307 Business Forecasting Analytics 3Fall 2021

Exponential Smoothing vs. ARIMA

• ARIMA models provide another approach to time series forecasting

• ARIMA

• AR: autoregressive (lagged observations as inputs)

• I: integrated (differencing to make series stationary)

• MA: moving average (lagged errors as inputs)

• While exponential smoothing models were based on a description of trend and

seasonality in the data, ARIMA models aim to describe the autocorrelations in the

data.

• Exponential smoothing and ARIMA models are the two most widely used approaches

to time series forecasting.

ADM 4307 Business Forecasting Analytics 4Fall 2021

Stationarity

• A stationary series is:

• roughly horizontal

• constant variance

• no patterns predictable in the long-term

• Transformations help to stabilize the variance.

• For ARIMA modelling, we also need to stabilize the mean.

ADM 4307 Business Forecasting Analytics 5Fall 2021

Differencing

• One way to make a time series stationary is to compute the differences between

consecutive observations. This is known as differencing

• Transformations such as logarithms can help to stabilize the variance of a time series

• Differencing can help stabilize the mean of a time series by removing changes in the

level of a time series, and so eliminating trend and seasonality

• The differenced series is the change between consecutive observations in the original

series, and can be written as

ADM 4307 Business Forecasting Analytics 6Fall 2021

Second-Order Differencing

• Occasionally, the differenced data will not appear stationary and it may be

necessary to difference the data a second time to obtain a stationary series:

• In this case, 𝑦𝑡
′′ will have values 𝑇 − 2.

• Then we would model the change in the changes of the original data.

• In practice, it is almost never necessary to go beyond second-order

differences

ADM 4307 Business Forecasting Analytics 7Fall 2021

Seasonal Differencing

• A seasonal difference is the difference between an observation and the

corresponding observation from the previous season

where 𝑚 is the number of seasons.

• For monthly data 𝑚 = 12

• For quarterly data 𝑚 = 4

ADM 4307 Business Forecasting Analytics 8Fall 2021

Example 1: Antidiabetic Drug Sales

Monthly anti-diabetic drug sales in Australia

a10 <- PBS %>% filter(ATC2 == “A10”) %>%

summarise(Cost = sum(Cost)/1e6)

a10 %>% autoplot(Cost)

a10 %>% autoplot(log(Cost))

a10 %>% autoplot(log(Cost) %>% difference(12))

ADM 4307 Business Forecasting Analytics 9Fall 2021

Example 2: Cortecosteroid Drug Sales

h02 <- PBS %>% filter(ATC2 == “H02”) %>%

summarise(Cost = sum(Cost)/1e6)

h02 %>% autoplot(Cost)

h02 %>% autoplot(log(Cost))

h02 %>% autoplot(log(Cost) %>% difference(12))

h02 %>% autoplot(log(Cost) %>% difference(12) %>%

difference(1))

ADM 4307 Business Forecasting Analytics 10Fall 2021

Automatically Selecting Differences

h02 %>% mutate(log_sales = log(Cost)) %>% features(log_sales, unitroot_nsdiffs)

# A tibble: 1 x 1

nsdiffs

1 1

• Because unitroot_nsdiffs() returns 1 (indicating one seasonal difference is required), we apply

the ndiffs() function to the seasonally differenced data.

h02 %>% mutate(d_log_sales = difference(log(Cost), 12)) %>%

features(d_log_sales, unitroot_ndiffs)

# A tibble: 1 x 1

ndiffs

1 1

• Since unitroot_ndiffs() returns 1, one difference is required.

ADM 4307 Business Forecasting Analytics 11Fall 2021

Autoregressive Models

• In multiple regression, we forecast the variable of interest using a linear

combination of predictors. In an autoregression model, we forecast the

variable of interest using a linear combination of past values of the variable (a

regression of the variable against itself)

• An autoregressive model of order 𝑝, denoted AR(𝑝) model, can be written as:

• 𝑐 is a constant and 𝑒𝑡 is white noise. This is like a multiple regression but with
lagged values of 𝑦𝑡 as predictors

ADM 4307 Business Forecasting Analytics 12Fall 2021

Moving Average Models

• A moving average model of order 𝑞, denoted by 𝑀𝐴(𝑞), uses 𝑞 past forecast
errors in a regression-like model, where 𝑒𝑡 is white noise (This is a multiple
regression with past errors as predictors)

• Moving average models should not be confused with moving average

smoothing

• A moving average model is used for forecasting future values while moving

average smoothing is used for estimating the trend-cycle of past values

ADM 4307 Business Forecasting Analytics 13Fall 2021

Outline

• Review of last lecture

• Non-Seasonal ARIMA models

• Estimation and Order Selection

• ARIMA Modelling in R

• Seasonal ARIMA models

• Examples

• ARIMA vs ETS

ADM 4307 Business Forecasting Analytics 14Fall 2021

Non-Seasonal ARIMA Models

• If we combine differencing with autoregression and a moving average model,

we obtain a non-seasonal ARIMA model.

• ARIMA is an acronym for AutoRegressive Integrated Moving Average

model (integration in this context is the reverse of differencing).

• The full model can be written as:

• 𝑦𝑡
′ is the differenced series (may be more than once)

• The predictors include both lagged values of 𝑦𝑡 and lagged errors.

ADM 4307 Business Forecasting Analytics 15Fall 2021

Non-Seasonal ARIMA Models

• ARIMA(p, d, q) model

• AR: p = order of the autoregressive part

• I: d = degree of first differencing involved

• MA: q = order of the moving average part.

• Many of the models we have already discussed are special cases of the

ARIMA model as shown below:

• White noise : ARIMA(0,0,0)

• Random walk : ARIMA(0,1,0) with no constant

• Random walk with drift : ARIMA(0,1,0) with a constant

• Autoregression : ARIMA(p,0,0)

• Moving average : ARIMA(0,0,q)

ADM 4307 Business Forecasting Analytics 16Fall 2021

Backshift Notation for ARIMA

• Once we start combining components in this way to form more complicated

models, it is much easier to work with the backshift notation.

• Selecting appropriate values for p, d and q can be difficult. The ARIMA()

function in R will do it for you automatically.

ADM 4307 Business Forecasting Analytics 17

𝑀𝐴(𝑞)𝐴𝑅(𝑝) 𝑑 diffs

Fall 2021

Example – Egyptian Exports

global_economy %>% filter(Code == “EGY”) %>%

autoplot(Exports) + labs(y = “% of GDP”, title = “Egyptian Exports”)

ADM 4307 Business Forecasting Analytics 18Fall 2021

Example – Egyptian Exports

fit <- global_economy %>% filter(Code == “EGY”) %>% model(ARIMA(Exports))

report(fit)

ADM 4307 Business Forecasting Analytics 19

Series: Exports

Model: ARIMA(2,0,1) w/ mean

Coefficients:

ar1 ar2 ma1 constant

1.6764 -0.8034 -0.6896 2.5623

s.e. 0.1111 0.0928 0.1492 0.1161

sigma^2 estimated as 8.046: log likelihood=-141.57

AIC=293.13 AICc=294.29 BIC=303.43

This is an ARIMA(2,0,1) model:

𝑦𝑡 = 2.5623 + 1.6764𝑦𝑡−1 − 0.8034𝑦𝑡−2 − 0.6896𝜀𝑡−1 + 𝜀𝑡

where 𝜀𝑡 is white noise with a standard deviation of 2.837= 8.046

Fall 2021

Example – Egyptian Exports

gg_tsresiduals(fit)

ADM 4307 Business Forecasting Analytics 20Fall 2021

Example – Egyptian Exports

fit %>% forecast(h=10) %>%

autoplot(global_economy) + labs(y = “% of GDP”, title = “Egyptian Exports”)

ADM 4307 Business Forecasting Analytics 21Fall 2021

Understanding ARIMA Models

• The ARIMA() function is useful, but anything automated can be a little dangerous.

• It is worth understanding something of the behavior of the models.

• The constant 𝑐 has an important effect on the long-term forecasts obtained from these models:

• If c = 0 and d = 0, the long-term forecasts will go to zero

• If c = 0 and d = 1, the long-term forecasts will go to a non-zero constant

• If c = 0 and d = 2, the long-term forecasts will follow a straight line

• If c ≠ 0 and d = 0, the long-term forecasts will go to the mean of the data

• If c ≠ 0 and d = 1, the long-term forecasts will follow a straight line

• If c ≠ 0 and d = 2, the long-term forecasts will follow a quadratic trend

• The value of 𝑑 also has an effect on the prediction intervals — the higher the value of 𝑑, the
more rapidly the prediction intervals increase in size.

ADM 4307 Business Forecasting Analytics 22Fall 2021

Outline

• Review of last lecture

• Non-Seasonal ARIMA models

• Estimation and Order Selection

• ARIMA Modelling in R

• Seasonal ARIMA models

• Examples

• ARIMA vs ETS

ADM 4307 Business Forecasting Analytics 23Fall 2021

Maximum Likelihood Estimation

• Once the model order has been identified (i.e., the values of 𝑝, d and q), we need to
estimate the parameters c, ∅1, … , ∅𝑝, 𝜃1, … , 𝜃𝑞

• When R estimates the ARIMA model, it uses maximum likelihood estimation (MLE).

• MLE is very similar to the least squares estimates that would be obtained by

minimizing

𝑡−1

𝑇

𝑒𝑡
2

• The ARIMA() function allows MLE estimation using a non-linear optimization

• Different software will give slightly different estimates.

ADM 4307 Business Forecasting Analytics 24Fall 2021

Information Criteria

• Akaike’s Information Criterion (AIC):

• Corrected AIC (AICc):

• Bayesian Information Criterion (BIC):

• These are useful for determining the order of an ARIMA model. Good models are

obtained by minimizing the AIC, AICc or BIC.

• Our preference is to use the AICc (only for selecting the values of p and q)

ADM 4307 Business Forecasting Analytics 25Fall 2021

ACF and PACF Plots

• It is usually not possible to tell, simply from a time plot, what values of p and q are
appropriate for the data.

• However, it is sometimes possible to use the ACF plot, and the closely related PACF

plot, to determine appropriate values for p and q.

• An ACF plot shows the autocorrelations which measure the relationship between

𝑦𝑡 and 𝑦𝑡−1 for different values of 𝑘.

• Now if 𝑦𝑡 and 𝑦𝑡−1 are correlated, then 𝑦𝑡−1 and 𝑦𝑡−2 must also be correlated. But
then 𝑦𝑡 and 𝑦𝑡−2 might be correlated, simply because they are both connected to
𝑦𝑡−1.

ADM 4307 Business Forecasting Analytics 26Fall 2021

ACF and PACF Plots

• Partial autocorrelations measure the relationship between 𝑦𝑡 and 𝑦𝑡−𝑘 after removing
the effects of other time lags 1, 2, . . , 𝑘 − 1.

• So the first partial autocorrelation is identical to the first autocorrelation, because

there is nothing between them to remove. The partial autocorrelations for lags 2, 3

and greater are calculated as follows:

• Varying the number of terms on the right hand side of this autoregression model gives

𝑎𝑘 for different values of 𝑘.

ADM 4307 Business Forecasting Analytics 27Fall 2021

Example – Egyptian Exports

egypt <- global_economy %>% filter(Code == “EGY”)

egypt %>% ACF(Exports) %>% autoplot()

egypt %>% PACF(Exports) %>% autoplot()

ADM 4307 Business Forecasting Analytics 28Fall 2021

ACF and PACF Plots

• A convenient way to produce a time plot, ACF plot and PACF plot in one command is

to use the gg_tsdisplay() function with plot_type = “partial“.

• If the data are from an ARIMA(p,d,0) or ARIMA(0,d,q) model, then the ACF and

PACF plots can be helpful in determining the value of p or q.

• If p and q are both positive, then the plots do not help in finding suitable values of p

and q.

ADM 4307 Business Forecasting Analytics 29Fall 2021

Example – Egyptian Exports

egypt <- global_economy %>% filter(Code == “EGY”)

egypt %>% gg_tsdisplay(Exports, plot_type=’partial’)

ADM 4307 Business Forecasting Analytics 30Fall 2021

ACF and PACF Plots

• The data may follow an ARIMA(p,d,0) model if the ACF and PACF plots of the

differenced data show the following patterns:

• The ACF is exponentially decaying or sinusoidal

• There is a significant spike at lag p in the PACF, but none beyond lag p

• The data may follow an ARIMA(0,d,q) model if the ACF and PACF plots of the

differenced data show the following patterns:

• The PACF is exponentially decaying or sinusoidal

• There is a significant spike at lag q in ACF, but none beyond lag q

ADM 4307 Business Forecasting Analytics 31Fall 2021

Example – Egyptian Exports

• There is a decaying sinusoidal pattern in the

ACF.

• PACF shows the last significant spike at lag 4.

• So in this case, the ACF and PACF lead us to

think an ARIMA(4,0,0) model might be

appropriate.

ADM 4307 Business Forecasting Analytics 32Fall 2021

Example – Egyptian Exports

egypt <- global_economy %>% filter(Code == “EGY”)

fit2 <- egypt %>% model(ARIMA(Exports ~ pdq(4,0,0)))

report(fit2)

ADM 4307 Business Forecasting Analytics 33

Series: Exports

Model: ARIMA(4,0,0) w/ mean

Coefficients:

ar1 ar2 ar3 ar4 constant

0.9861 -0.1715 0.1807 -0.3283 6.6922

s.e. 0.1247 0.1865 0.1865 0.1273 0.3562

sigma^2 estimated as 7.885: log likelihood=-140.53

AIC=293.05 AICc=294.7 BIC=305.41

This model is only slightly worse than the ARIMA(2,0,1) model identified by ARIMA() (with

an AICc value of 294.70 compared to 294.29).

Fall 2021

Order Selection using Particular Values

• We can also specify particular values of pdq() that ARIMA() can search for.

• For example, to find the best ARIMA model with p∈{1,2,3}, q∈{0,1,2} and d=1, you
could use ARIMA(y ~ pdq(p=1:3, d=1, q=0:2)).

ADM 4307 Business Forecasting Analytics 34Fall 2021

Example – Egyptian Exports

egypt <- global_economy %>% filter(Code == “EGY”)

fit3 <- egypt %>% model(ARIMA(Exports ~ pdq(p=2:6, d=0, q=0:2)))

report(fit3)

ADM 4307 Business Forecasting Analytics 35

Series: Exports

Model: ARIMA(2,0,1) w/ mean

Coefficients:

ar1 ar2 ma1 constant

1.6764 -0.8034 -0.6896 2.5623

s.e. 0.1111 0.0928 0.1492 0.1161

sigma^2 estimated as 8.046: log likelihood=-141.57

AIC=293.13 AICc=294.29 BIC=303.43

Fall 2021

Outline

• Review of last lecture

• Non-Seasonal ARIMA models

• Estimation and Order Selection

• ARIMA Modelling in R

• Seasonal ARIMA models

• Examples

• ARIMA vs ETS

ADM 4307 Business Forecasting Analytics 36Fall 2021

How does ARIMA() work?

• The ARIMA() function in R uses a variation of the Hyndman and Khandakar

algorithm, which combines KPSS unit root tests, minimization of the AICc and MLE to

obtain an ARIMA model.

Hyndman-Khandakar algorithm for automatic ARIMA modelling

• Select number of differences d and D via KPSS test and seasonal strength measure.

• Select p, q by minimizing AICc.

• Use stepwise search to traverse model space

ADM 4307 Business Forecasting Analytics 37Fall 2021

Modelling Procedure with ARIMA

1. Plot the data and identify any unusual observations

2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the

variance

3. If the data are non-stationary, take first differences of the data until the data are

stationary

4. Examine the ACF/PACF: Is an ARIMA(p,d,0) or ARIMA(0,d,q) model appropriate?

5. Try your chosen model(s) and use the AICc to search for a better model

6. Check the residuals from your chosen model by plotting the ACF of the residuals

and doing a portmanteau test. If they do not look like white noise, try a modified

model

7. Once the residuals look like white noise, calculate forecasts

ADM 4307 Business Forecasting Analytics 38Fall 2021

Automatic Modelling Procedure with ARIMA

1. Plot the data and identify any unusual observations

2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the

variance

3. Use ARIMA() to automatically select a model.

4. Check the residuals from your chosen model by plotting the ACF of the residuals

and doing a portmanteau test. If they do not look like white noise, try a modified

model

5. Once the residuals look like white noise, calculate forecasts

ADM 4307 Business Forecasting Analytics 39Fall 2021

Modelling Procedure

ADM 4307 Business Forecasting Analytics 40Fall 2021

Example – Central African Republic Exports

caf <- global_economy %>% filter(Code == “CAF”)

caf %>% autoplot(Exports) + labs(title=”Central African Republic exports”, y=”% of GDP”)

ADM 4307 Business Forecasting Analytics 41Fall 2021

Example – Central African Republic Exports

Step 1. The time plot shows some non-stationarity, with an overall decline.

Step 2. There is no evidence of changing variance, so we will not do a Box-Cox

transformation.

Step 3. To address the non-stationarity, we will take a first difference of the data.

ADM 4307 Business Forecasting Analytics 42Fall 2021

Example – Central African Republic Exports

caf %>% gg_tsdisplay(difference(Exports), plot_type=’partial’)

ADM 4307 Business Forecasting Analytics 43Fall 2021

Example – Central African Republic Exports

Step 4. PACF is suggestive of AR(2); so an initial candidate model is an

ARIMA(2,1,0).

The ACF suggests an MA(3) model; so an alternative candidate is an

ARIMA(0,1,3).

Step 5. We fit both an ARIMA(2,1,0) and an ARIMA(0,1,3) model along with two

automated model selections, one using the default stepwise procedure,

and one working harder to search a larger model space.

ADM 4307 Business Forecasting Analytics 44Fall 2021

Example – Central African Republic Exports

caf_fit <- caf %>%

model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),

arima013 = ARIMA(Exports ~ pdq(0,1,3)),

stepwise = ARIMA(Exports),

search = ARIMA(Exports, stepwise=FALSE))

caf_fit

ADM 4307 Business Forecasting Analytics 45

# A mable: 1 x 5

# Key: Country [1]

Country arima210 arima013 stepwise search

1 Central Afr~

Fall 2021

Example – Central African Republic Exports

accuracy(caf_fit)

glance(caf_fit) %>% arrange(AICc) %>% select(.model:BIC)

ADM 4307 Business Forecasting Analytics 46

# A tibble: 4 x 11

Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1

1 Central African Republic arima~ Trai~ -0.343 2.52 1.87 -2.97 9.14 0.853 0.870 0.0417

2 Central African Republic arima~ Trai~ -0.215 2.47 1.86 -1.97 9.12 0.849 0.851 -0.0417

3 Central African Republic stepw~ Trai~ -0.265 2.42 1.82 -2.33 8.93 0.832 0.836 -0.0417

4 Central African Republic search Trai~ -0.269 2.46 1.87 -2.39 9.18 0.853 0.850 -0.0470

# A tibble: 4 x 6

.model sigma2 log_lik AIC AICc BIC

1 search 6.52 -133. 274. 275. 282.

2 arima210 6.71 -134. 275. 275. 281.

3 arima013 6.54 -133. 274. 275. 282.

4 stepwise 6.42 -132. 274. 275. 284.

The full search has
found that an

ARIMA(3,1,0) gives
the lowest AICc value

Fall 2021

Example – Central African Republic Exports

caf_fit %>% select(search) %>% gg_tsresiduals()

Step 6. The ACF plot of the residuals from the ARIMA(3,1,0) model shows that all

autocorrelations are within the threshold limits, indicating that the residuals are

behaving like white noise.

ADM 4307 Business Forecasting Analytics 47Fall 2021

Example – Central African Republic Exports

augment(caf_fit) %>% filter(.model==’search’) %>%

features(.innov, ljung_box, lag = 10, dof = 3)

A portmanteau test returns a large p-value, also suggesting that the residuals are white

noise.

ADM 4307 Business Forecasting Analytics 48

# A tibble: 1 x 4

Country .model lb_stat lb_pvalue

1 Central African Republic search 5.75 0.569

Fall 2021

Example – Central African Republic Exports

Step 7. Forecasts from the chosen model

caf_fit %>% forecast(h=5) %>% filter(.model==’search’) %>% autoplot(global_economy)

caf_fit %>% select(search) %>% forecast(h=5)

ADM 4307 Business Forecasting Analytics 49

# A fable: 5 x 4 [1Y]

# Key: .model [1]

.model Year Exports .mean

1 search 2018 N(13, 6.5) 12.5

2 search 2019 N(13, 8.5) 12.6

3 search 2020 N(13, 11) 12.5

4 search 2021 N(13, 15) 12.5

5 search 2022 N(13, 18) 12.5

Fall 2021

Outline

• Review of last lecture

• Non-Seasonal ARIMA models

• Estimation and Order Selection

• ARIMA Modelling in R

• Seasonal ARIMA models

• Examples

• ARIMA vs ETS

ADM 4307 Business Forecasting Analytics 50Fall 2021

Seasonal ARIMA Models

• A seasonal ARIMA model includes additional seasonal terms. It is written as follows:

where m = number of periods per season

• We will use uppercase notation for the seasonal parts of the model and lowercase

notation for the non-seasonal parts

ADM 4307 Business Forecasting Analytics 51Fall 2021

Seasonal ARIMA Models

• The seasonal part of the model consists of terms that are very similar to the non-

seasonal components, but they involve backshifts of the seasonal period

• For example, an 𝐴𝑅𝐼𝑀𝐴(1,1,1)(1,1,1)4 model (without a constant) is for quarterly data
(m = 4) and can be written as:

ADM 4307 Business Forecasting Analytics 52Fall 2021

Seasonal ARIMA Models

• The seasonal part of an AR or MA model will be seen in the seasonal lags of the

PACF and ACF

• For example, an 𝐴𝑅𝐼𝑀𝐴(0,0,0)(0,0,1)12 model will show:

• A spike at lag 12 in the ACF but no other significant spikes

• The PACF will show exponential decay in the seasonal lags. That is, at lags 12,

24, 36,…

• Similarly, an 𝐴𝑅𝐼𝑀𝐴(0,0,0)(1,0,0)12 model will show:

• Exponential decay in the seasonal lags of the ACF

• A single significant spike at lag 12 in the PACF

ADM 4307 Business Forecasting Analytics 53Fall 2021

Outline

• Review of last lecture

• Non-Seasonal ARIMA models

• Estimation and Order Selection

• ARIMA Modelling in R

• Seasonal ARIMA models

• Examples

• ARIMA vs ETS

ADM 4307 Business Forecasting Analytics 54Fall 2021

Example: Monthly US Employment

We will describe seasonal ARIMA modelling using monthly US employment data for leisure and hospitality

jobs from January 2000 to September 2019.

leisure <- us_employment %>% filter(Title == “Leisure and Hospitality”,

year(Month) > 2000) %>% mutate(Employed = Employed/1000) %>% select(Month, Employed)

autoplot(leisure, Employed) + labs(y=”Number of people (millions)”)

ADM 4307 Business Forecasting Analytics 55Fall 2021

Example: Monthly US Employment

The data are clearly non-stationary, with strong seasonality and a nonlinear trend, so we will first take a

seasonal difference.

leisure %>% gg_tsdisplay(difference(Employed, 12),

plot_type=’partial’, lag=36) + labs(title=”Seasonally differenced”, y=””)

ADM 4307 Business Forecasting Analytics 56Fall 2021

Example: Monthly US Employment

These are also clearly non-stationary, so we take a further first difference.

leisure %>% gg_tsdisplay(difference(Employed, 12) %>% difference(),

plot_type=’partial’, lag=36) + labs(title = “Double differenced”, y=””)

ADM 4307 Business Forecasting Analytics 57Fall 2021

Example: Monthly US Employment

These are also clearly non-stationary, so we take a further first difference.

leisure %>% gg_tsdisplay(difference(Employed, 12) %>% difference(),

plot_type=’partial’, lag=36) + labs(title = “Double differenced”, y=””)

ADM 4307 Business Forecasting Analytics 58Fall 2021

Example: Monthly US Employment

• Our aim now is to find an appropriate ARIMA model

based on the ACF and PACF

• Let’s start with the ACF

• The significant spike at lag 2 in the ACF suggests a

non-seasonal MA(2) component.

• The significant spike at lag 12 in the ACF suggests

a seasonal MA(1) component.

• Consequently, we begin with an

ARIMA(0,1,2)(0,1,1)12 model, indicating

• a first difference,

• a seasonal difference,

• and non-seasonal MA(2) and

• seasonal MA(1) component

ADM 4307 Business Forecasting Analytics 59Fall 2021

Example: Monthly US Employment

• Let’s start with the PACF (using the PACF to select

the non-seasonal part of the model and the ACF to

select the seasonal part of the model)

• We may have selected an ARIMA(2,1,0)(0,1,1)12
model, indicating

• a first difference,

• a seasonal difference,

• and non-seasonal AR(2) and

• seasonal MA(1) component

• We will also include an automatically selected

model. By setting stepwise=FALSE and

approximation=FALSE, we are making R work

extra hard to find a good model.

ADM 4307 Business Forecasting Analytics 60Fall 2021

Example: Monthly US Employment

fit <- leisure %>%

model(

arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),

arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),

auto = ARIMA(Employed, stepwise = FALSE, approx = FALSE)

)

fit

ADM 4307 Business Forecasting Analytics 61

# A mable: 1 x 3

arima012011 arima210011 auto

1

Fall 2021

Example: Monthly US Employment

glance(fit) %>% arrange(AICc) %>% select(.model:BIC)

The three fitted models have similar AICc values, with the automatically selected model

being a little better.

ADM 4307 Business Forecasting Analytics 62

# A tibble: 3 x 6

.model sigma2 log_lik AIC AICc BIC

1 auto 0.00142 395. -780. -780. -763.

2 arima210011 0.00145 392. -776. -776. -763.

3 arima012011 0.00146 391. -775. -775. -761.

Fall 2021

Example: Monthly US Employment

fit %>% select(auto) %>% gg_tsresiduals(lag=36)

One small but significant spike (at lag 11) out of 36 is still consistent with white noise

ADM 4307 Business Forecasting Analytics 63Fall 2021

Example: Monthly US Employment

augment(fit) %>% features(.innov, ljung_box, lag=24, dof=4)

We use a Ljung-Box test, which has a large p-value, confirming that the residuals are similar to

white noise.

(The p value is greater than 0.05. So, the residuals are independent)

ADM 4307 Business Forecasting Analytics 64

# A tibble: 3 x 3

.model lb_stat lb_pvalue

1 arima012011 22.4 0.320

2 arima210011 18.9 0.527

3 auto 16.6 0.680

Fall 2021

Example: Monthly US Employment

• We now have a seasonal ARIMA model that passes the required checks and is ready for

forecasting. Forecast from the model for the next three years.

• The forecasts have captured the seasonal pattern very well, and the increasing trend extends

the recent pattern.

fit %>% forecast(h=36) %>% filter(.model==’auto’) %>% autoplot(leisure) +

labs(title = “US employment: leisure and hospitality”, y=”Number of people (millions)”)

ADM 4307 Business Forecasting Analytics 65Fall 2021

Example: Cortecosteroid Drug Sales

We will try to forecast monthly corticosteroid (H02) drug sales in Australia.

h02 <- PBS %>% filter(ATC2 == “H02”) %>% summarise(Cost = sum(Cost)/1e6)

h02 %>% autoplot(Cost)

There is a small increase in the variance with the level, so we take logarithms to stabilize the variance

ADM 4307 Business Forecasting Analytics 66Fall 2021

Example: Cortecosteroid Drug Sales

h02 %>% autoplot(log(Cost))

The data are strongly seasonal and obviously non-stationary, so seasonal differencing will be used.

ADM 4307 Business Forecasting Analytics 67Fall 2021

Example: Cortecosteroid Drug Sales

h02 %>% autoplot(log(Cost) %>% difference(12))

It is not clear at this point whether we should do another difference or not.

We decide not to, but the choice is not obvious.

ADM 4307 Business Forecasting Analytics 68Fall 2021

Example: Cortecosteroid Drug Sales

h02 %>% gg_tsdisplay(difference(log(Cost), 12), plot_type=’partial’, lag_max = 36)

ADM 4307 Business Forecasting Analytics 69

• Choose D = 1 and d = 0.

• Spikes in the PACF at lags 12 and 24

suggest seasonal 𝐴𝑅(2) term.

• Three significant spikes in the PACF

suggests possible non-seasonal 𝐴𝑅(3)
term.

• The pattern in the ACF is not indicative of

any simple model.

• Initial candidate model:

𝐴𝑅𝐼𝑀𝐴(3,0,0)(2,1,0)12

Fall 2021

Example: Cortecosteroid Drug Sales

• We fit 𝐴𝑅𝐼𝑀𝐴(3,0,0)(2,1,0)12, along with some variations on it, and compute the AICc values

• Of these models, the best is the 𝐴𝑅𝐼𝑀𝐴(3,0,1)(0,1,2)12 model (i.e., it has the smallest AICc
value)

ADM 4307 Business Forecasting Analytics 70Fall 2021

Example: Cortecosteroid Drug Sales

fit <- h02 %>% model(ARIMA(log(Cost) ~ pdq(3,0,1) + PDQ(0,1,2)))

fit %>% gg_tsresiduals(lag_max=36)

There are a few significant spikes in the ACF.

ADM 4307 Business Forecasting Analytics 71Fall 2021

Example: Cortecosteroid Drug Sales

augment(fit) %>% features(.innov, ljung_box, lag=36, dof=6)

The p value is less than 0.05. So, the residuals are correlated

There are a few significant spikes in the ACF, and the model fails the Ljung-Box test.

The model can still be used for forecasting, but the prediction intervals may not be accurate due

to the correlated residuals.

ADM 4307 Business Forecasting Analytics 72

# A tibble: 1 x 3

.model lb_stat lb_pvalue

1 ARIMA(log(Cost) ~ pdq(3, 0, 1) + PDQ(0, 1, 2)) 44.8 0.0406

Fall 2021

Example: Cortecosteroid Drug Sales

fit <- h02 %>% model(auto = ARIMA(log(Cost)))

report(fit)

ADM 4307 Business Forecasting Analytics 73

Series: Cost

Model: ARIMA(2,1,0)(0,1,1)[12]

Transformation: log(Cost)

Coefficients:

ar1 ar2 sma1

-0.8491 -0.4207 -0.6401

s.e. 0.0712 0.0714 0.0694

sigma^2 estimated as 0.004387: log likelihood=245.39

AIC=-482.78 AICc=-482.56 BIC=-469.77

Fall 2021

Example: Cortecosteroid Drug Sales

fit %>% gg_tsresiduals(lag_max=36)

There are a few significant spikes in the ACF.

ADM 4307 Business Forecasting Analytics 74Fall 2021

Example: Cortecosteroid Drug Sales

augment(fit) %>% features(.innov, ljung_box, lag=36, dof=3)

The p value is less than 0.05. So, the residuals are correlated

There are a few significant spikes in the ACF, and the model fails the Ljung-Box test.

The model can still be used for forecasting, but the prediction intervals may not be accurate due

to the correlated residuals.

ADM 4307 Business Forecasting Analytics 75

# A tibble: 1 x 3

.model lb_stat lb_pvalue

1 auto 59.3 0.00332

Fall 2021

Example: Cortecosteroid Drug Sales

fit <- h02 %>% model(best = ARIMA(log(Cost), stepwise = FALSE, approximation = FALSE)

report(fit)

ADM 4307 Business Forecasting Analytics 76

Series: Cost

Model: ARIMA(2,1,3)(0,1,1)[12]

Transformation: log(Cost)

Coefficients:

ar1 ar2 ma1 ma2 ma3 sma1

-1.0194 -0.8351 0.1717 0.2578 -0.4206 -0.6528

s.e. 0.1648 0.1203 0.2079 0.1177 0.1060 0.0657

sigma^2 estimated as 0.004203: log likelihood=250.8

AIC=-487.6 AICc=-486.99 BIC=-464.83

Fall 2021

Example: Cortecosteroid Drug Sales

fit %>% gg_tsresiduals(lag_max=36)

There are a few significant spikes in the ACF.

ADM 4307 Business Forecasting Analytics 77Fall 2021

Example: Cortecosteroid Drug Sales

augment(fit) %>% features(.innov, ljung_box, lag=36, dof=6)

The p value is less than 0.05. So, the residuals are correlated and the model fails the Ljung-Box

test.

ADM 4307 Business Forecasting Analytics 78

A tibble: 1 x 3

.model lb_stat lb_pvalue

1 best 46.1 0.0301

Fall 2021

Example: Cortecosteroid Drug Sales

ADM 4307 Business Forecasting Analytics 79

Forecasts from the 𝐴𝑅𝐼𝑀𝐴(3,0,1)(0,1,2)12 model (which has the lowest RMSE
value on the test set, and the best AICc value amongst models with only seasonal

differencing)

Fall 2021

Example: Cortecosteroid Drug Sales

• Models with lowest AICc values tend to give slightly better results than the

other models.

• AICc comparisons must have the same orders of differencing.

• But RMSE test set comparisons can involve any models.

• Use the best model available, even if it does not pass all tests.

ADM 4307 Business Forecasting Analytics 80Fall 2021

Example: Cortecosteroid Drug Sales

h02 %>% model(ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2))) %>%

forecast() %>% autoplot(h02) +

labs(y=” $AU (millions)”, title=”Corticosteroid drug scripts (H02) sales”)

ADM 4307 Business Forecasting Analytics 81Fall 2021

Outline

• Review of last lecture

• Non-Seasonal ARIMA models

• Estimation and Order Selection

• ARIMA Modelling in R

• Seasonal ARIMA models

• Examples

• ARIMA vs ETS

ADM 4307 Business Forecasting Analytics 82Fall 2021

ARIMA vs ETS

• It is a common myth that ARIMA models are more general than exponential smoothing

• Linear exponential smoothing models are all special cases of ARIMA models

• Non-linear exponential smoothing models have no equivalent ARIMA counterparts

• Many ARIMA models that have no exponential smoothing counterparts

ADM 4307 Business Forecasting Analytics 83Fall 2021

ARIMA and ETS Equivalences

ADM 4307 Business Forecasting Analytics 84Fall 2021

Business Forecasting Analytics
ADM 4307 – Fall 2021

ARIMA Models

ADM 4307 Business Forecasting Analytics 85Fall 2021