Week-12 Practical Forecasting Issues
Some of the slides are adapted from the lecture notes provided by Prof. Antoine Saure and Prof. Rob Hyndman
Business Forecasting Analytics
ADM 4307 – Fall 2021
Practical Forecasting Issues
Ahmet Kandakoglu, PhD
22 November, 2021
Outline
• Models for different frequencies
• Ensuring forecasts stay within limits
• Forecast combinations
• Missing values
• Outliers
• Choosing the right forecasting technique
Models for Different Frequencies
• Models for annual data
• ETS, ARIMA, Dynamic regression
• Models for quarterly data
• ETS, ARIMA/SARIMA, Dynamic regression, Dynamic harmonic regression, STL+ETS,
STL+ARIMA
• Models for monthly data
• ETS, ARIMA/SARIMA, Dynamic regression, Dynamic harmonic regression, STL+ETS,
STL+ARIMA
• Models for weekly data
• ARIMA/SARIMA, Dynamic regression, Dynamic harmonic regression, STL+ETS,
STL+ARIMA, TBATS
• Models for daily, hourly and other sub-daily data
• ARIMA/SARIMA, Dynamic regression, Dynamic harmonic regression, STL+ETS,
STL+ARIMA, TBATS
Ensuring Forecasts Stay within Limits
• It is common to want forecasts to be positive, or to require them to be within some
specified range [a, b].
• Both of these situations are relatively easy to handle using transformations.
• Positive Forecasts
• To impose a positivity constraint, simply work on the log scale, by specifying the
Box-Cox parameter λ=0.
• Forecasts Constrained to an Interval
• We can transform the data using a scaled logit transform which maps (a, b) to the
whole real line:
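The formula did not survive on this slide. A standard form of the scaled logit, written here with x for the original observation and y for the transformed value (symbols assumed for illustration), together with its inverse, is:

```latex
y = \log\left(\frac{x - a}{b - x}\right),
\qquad
x = \frac{(b - a)\,e^{y}}{1 + e^{y}} + a
```

Forecasts are produced for y and mapped back with the inverse, which always returns a value strictly between a and b.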
Ensuring Forecasts Stay within Limits
• For example, consider the real price of a dozen eggs (1900-1993; in cents):
Figure panels: forecasts constrained to lie between 50 and 400; forecasts constrained to be positive.
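As a rough illustration of the interval constraint, the sketch below implements a scaled logit and its inverse in base R and applies them with the bounds 50 and 400 from the egg price example. The function names and the numbers in prices and z_future are hypothetical, chosen only to show the mechanics; they are not the egg price data.

```r
# Scaled logit: maps the open interval (lower, upper) onto the real line.
scaled_logit <- function(x, lower = 0, upper = 1) {
  log((x - lower) / (upper - x))
}

# Inverse: maps any real number back into (lower, upper).
# plogis(y) equals exp(y) / (1 + exp(y)) but avoids overflow for large y.
inv_scaled_logit <- function(y, lower = 0, upper = 1) {
  (upper - lower) * plogis(y) + lower
}

# Hypothetical prices inside the allowed range (50, 400)
prices <- c(60, 120, 250, 390)
z <- scaled_logit(prices, lower = 50, upper = 400)

# Hypothetical forecasts on the transformed scale, including extreme values
z_future <- c(-10, 0, 10)
back <- inv_scaled_logit(z_future, lower = 50, upper = 400)
print(back)
```

However extreme the values on the transformed scale, the back-transformed forecasts stay inside the limits. In a fable workflow, a pair of functions like this could be registered with fabletools::new_transformation() so that forecasts are back-transformed automatically.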
Forecast Combinations
• An easy way to improve forecast accuracy is to use several different methods on the
same time series, and to average the resulting forecasts.
• Clemen (1989):
• The results have been virtually unanimous: combining multiple forecasts leads to
increased forecast accuracy. In many cases one can make dramatic performance
improvements by simply averaging the forecasts.
• While there has been considerable research on using weighted averages, or some
other more complicated combination approach, using a simple average has proven
hard to beat.
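A minimal base R sketch of the simple average is given below. All numbers are hypothetical and were picked so that the two methods err in opposite directions, which is the situation where averaging helps most; real data will not always behave this way.

```r
# Root mean squared error of a set of forecasts
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))

# Hypothetical test-set values and forecasts from two methods
actual <- c(100, 110, 120)
fc_a   <- c(90, 100, 110)    # method A forecasts too low
fc_b   <- c(108, 118, 128)   # method B forecasts too high

# Simple (equally weighted) average of the two forecasts
fc_comb <- (fc_a + fc_b) / 2

errors <- c(
  A = rmse(actual, fc_a),
  B = rmse(actual, fc_b),
  Combination = rmse(actual, fc_comb)
)
print(errors)
```

Because the errors of A and B partly cancel, the combination has a smaller RMSE than either method here. When both methods are biased in the same direction, the average lands between them and cannot beat the better one.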
Example – Expenditure on Eating Out
We form a combination in the mutate() function by simply taking an average of the estimated models.
library(fpp3)

auscafe <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  summarise(Turnover = sum(Turnover))
# Training set: observations up to the end of 2013
train <- auscafe %>% filter(year(Month) <= 2013)
cafe_models <- train %>%
  model(
    ETS = ETS(Turnover),
    ARIMA = ARIMA(log(Turnover))
  ) %>%
  # Equally weighted average of the two fitted models
  mutate(Combination = (ETS + ARIMA) / 2)
cafe_fc <- cafe_models %>% forecast(h = "5 years")
Example – Expenditure on Eating Out
Forecast combinations
cafe_fc %>%
  autoplot(auscafe %>% filter(year(Month) > 2008), level = NULL) +
  labs(y = "$ million", title = "Australian monthly expenditure on eating out")
Example – Expenditure on Eating Out
cafe_fc %>% accuracy(auscafe) %>% arrange(RMSE)
# A tibble: 3 x 10
  .model      .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
1 ARIMA       Test  -25.4  46.2  38.9 -1.77  2.65 0.949 0.890 0.786
2 Combination Test   30.6  57.4  45.1  1.87  3.02 1.10  1.10  0.814
3 ETS         Test   86.5 122.  101.   5.51  6.66 2.46  2.35  0.880
Missing Values
• Missing data can arise for many reasons.
• It is worth considering whether the missingness will induce bias in the forecasting
model.
• When missing values cause errors, there are at least two ways to handle the problem.
• We could just take the section of data after the last missing value, assuming there
is a long enough series of observations to produce meaningful forecasts.
• We could replace the missing values with estimates. The interpolate() function is
designed for this purpose.
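The two options above can be sketched in base R on a small hypothetical series. Note that approx() performs plain linear interpolation and is used here only as a stand-in; the interpolate() function mentioned above estimates the missing values from a fitted model instead.

```r
# Hypothetical series with two missing values
y <- c(5, NA, 7, 8, NA, 10, 11, 12, 13)

# Option 1: keep only the observations after the last missing value
na_pos <- which(is.na(y))
y_tail <- if (length(na_pos) > 0) y[-seq_len(max(na_pos))] else y

# Option 2: replace missing values with estimates
# (linear interpolation between the neighbouring observed points)
t <- seq_along(y)
observed <- !is.na(y)
y_filled <- approx(x = t[observed], y = y[observed], xout = t)$y

print(y_tail)
print(y_filled)
```

Option 1 discards information and is only sensible when the remaining section is long enough to fit a model. Option 2 keeps the full length of the series, at the cost of treating estimated values as if they had been observed.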
Missing Values
• Some methods allow for missing values without any problems; others do not.
• Functions that can handle missing values:
• ARIMA()
• TSLM()
• NNETAR()
• VAR()
• FASSTER()
• Functions that cannot handle missing values:
• ETS()
• STL()
• TBATS()
Example – Daily Gold Prices
We will use the gold dataset from the forecast package (install and load the package to access the dataset).
gold <- as_tsibble(gold)
gold %>% autoplot(value)
Example – Daily Gold Prices
gold_complete <- gold %>%
  model(ARIMA(value)) %>%
  interpolate(gold)
gold_complete %>%
  autoplot(value, colour = "red") +
  autolayer(gold, value)
Outliers
• Outliers are observations that are very different from the majority of the observations
in the time series.
• They may be errors, or they may simply be unusual.
• None of the methods we have considered in this course will work well if there are
extreme outliers in the data.
• In this case, we may wish to replace them with missing values, or with an estimate
that is more consistent with the majority of the data.
• Simply replacing outliers without thinking about why they have occurred may be a
dangerous practice. They may provide useful information which should be taken into
account when forecasting.
Example – Australia Visitors
Number of visitors to the Adelaide Hills region of South Australia.
There appears to be an unusual observation in 2002 Q4.
tourism %>%
  filter(Region == "Adelaide Hills", Purpose == "Visiting") %>%
  autoplot(Trips) +
  labs(title = "Quarterly overnight trips to Adelaide Hills", y = "Number of trips")
Example – Australia Visitors
One useful way to find outliers is to apply STL() to the series with the argument robust=TRUE.
Any outliers should then show up in the remainder series.
Since the data have almost no visible seasonality, we will apply STL without a seasonal
component by setting period=1.
ah_decomp <- tourism %>%
  filter(
    Region == "Adelaide Hills", Purpose == "Visiting"
  ) %>%
  # Fit a non-seasonal STL decomposition
  model(
    stl = STL(Trips ~ season(period = 1), robust = TRUE)
  ) %>%
  components()
ah_decomp %>% autoplot()
Example – Australia Visitors
In more challenging cases, a boxplot of the remainder series would be useful. A boxplot flags
observations that are more than 1.5 interquartile ranges (IQRs) from the central 50% of the data.
A stricter rule is to define outliers as those that are more than 3 IQRs from the central 50%
of the data.
outliers <- ah_decomp %>%
  filter(
    remainder < quantile(remainder, 0.25) - 3 * IQR(remainder) |
      remainder > quantile(remainder, 0.75) + 3 * IQR(remainder)
  )
outliers
# A dable: 1 x 9 [1Q]
# Key:     Region, State, Purpose, .model [1]
# :        Trips = trend + remainder
  Region         State Purpose .model Quarter Trips trend remainder season_adjust
1 Adelaide Hills Sout~ Visiti~ stl    2002 Q4  81.1  11.1      70.0          81.1
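The same 3 IQR rule can be checked by hand on a plain numeric vector. The sketch below uses a hypothetical remainder series (not the Adelaide Hills data) and then replaces the flagged value with a missing value, which is one of the options described on the Outliers slide.

```r
# Hypothetical remainder series with one extreme value
remainder <- c(-2, -1, 0, 1, 2, 1, -1, 0, 70, -2)

# Quartiles and the width of the 3 IQR fence
q <- unname(quantile(remainder, c(0.25, 0.75)))
width <- 3 * IQR(remainder)

# Flag values more than 3 IQRs below Q1 or above Q3
is_outlier <- remainder < q[1] - width | remainder > q[2] + width
print(which(is_outlier))

# Replace flagged values with NA so they can be estimated later
cleaned <- replace(remainder, is_outlier, NA)
print(cleaned)
```

Once the outlier has been set to missing, it can be filled in with an estimate using the same approach as for any other missing value, provided there is no good reason to keep the original observation.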
Choosing a Forecasting Technique
• No single technique works in every situation
• The two most important factors:
• Cost
• Accuracy
• Other factors in selecting a forecasting technique:
• Relevance and availability of historical data
• Forecasting horizon
• Time available for making the analysis
• Pattern of data