Week-12 Practical Forecasting Issues
Some of the slides are adapted from the lecture notes provided by Prof. Antoine Saure and Prof. Rob Hyndman
Business Forecasting Analytics
ADM 4307 – Fall 2021
Review
Ahmet Kandakoglu, PhD
29 November, 2021
What is forecasting?
• It is about predicting the future as accurately as possible, given all of the
information available, including historical data and knowledge of any future
events that might impact the forecasts
• It is the process of making predictions of the future based on past and present
data.
I see that you will get a 90 in
Forecasting Analytics this semester.
ADM 4307 Business Forecasting Analytics 2Fall 2021
Features Common to All Forecasts
• Assumes a causal system: past ==> future
• Forecasts are rarely perfect because of randomness
• Forecasts are more accurate for groups than for individuals
• Forecast accuracy decreases as the time horizon increases
Approaches to Forecasting
• Qualitative: Judgmental methods
• Non-quantitative analysis of subjective inputs
• Considers “soft” information such as human factors, experience, gut instinct
• Quantitative: Analyze “hard” data
• Time series models
• Extends historical patterns of numerical data
• Associative (causal) models
• Create equations with explanatory variables to predict the future
Quantitative Forecasting
• Conditions for their application:
• Information about the past is available
• Information can be quantified in numerical data
• Some aspects of the past pattern will continue into the future (continuity
assumption)
• Two extremes:
• Intuitive or ad hoc methods (simple, based on empirical experience, and no
accuracy information)
• Formal quantitative methods based on statistical principles
Quantitative Forecasting
• Time series models:
• Prediction of the future is based on past values of a variable and/or past errors
• The goal is to determine the pattern in the historical data series and extrapolate that
pattern into the future
• Black box that makes no attempt to discover the factors affecting the behavior of the forecast variable
• Explanatory models:
• Assume that the variable to be forecast has an explanatory relationship with one or
more independent variables
• The goal is to determine the form of the relationship and use it to forecast future values of
the forecast variable
Qualitative Forecasting
• Do not require data in the same manner as quantitative forecasting methods
• The inputs required are mainly the product of judgement and accumulated
knowledge
• Used mainly to provide hints, to aid the planner, and to supplement
quantitative forecasts, rather than to provide a specific numerical forecast
• Used almost exclusively for medium- and long-term situations
• Frequently the only alternative is no forecast at all
Elements of a Good Forecast
• Accurate and in writing
• Reliable
• Meaningful
• Cost-effective
• Useful time horizon
• Simple to understand & use
A Tidy Forecasting Workflow
• The process of producing forecasts for time series data can be broken down into a
few steps
Some Simple Forecasting Methods
• Some forecasting methods are very simple and surprisingly effective.
• Here are four methods that we will use as benchmarks for other forecasting
methods:
• Average method
• Naïve method
• Seasonal naïve method
• Drift method
Average Method
• Forecast of all future values is equal to the mean of the historical data $\{y_1, \dots, y_T\}$:
$\hat{y}_{T+h|T} = \bar{y} = (y_1 + \cdots + y_T)/T$
MEAN(y)  # y contains the time series
bricks <- aus_production %>%
  filter_index("1970 Q1" ~ "2004 Q4")
bricks %>% model(MEAN(Bricks))
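The average-method formula is easy to check by hand. The Python sketch below illustrates the formula only; it is not fable's `MEAN()` implementation, and the function name `mean_forecast` is hypothetical.

```python
def mean_forecast(y, h):
    """Average method: every future value equals the mean of y_1, ..., y_T."""
    y_bar = sum(y) / len(y)
    return [y_bar] * h  # the same value for horizons 1..h

print(mean_forecast([4, 6, 5, 9], h=3))  # [6.0, 6.0, 6.0]
```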
Naïve Method
• Forecasts equal to last observed value.
• Simple to use and understand, very low cost and low accuracy
• $\hat{y}_{T+h|T} = y_T$
NAIVE(y)
bricks %>% model(NAIVE(Bricks))
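As a minimal sketch of the formula (not fable's `NAIVE()` code; the name `naive_forecast` is hypothetical), the naive method simply repeats the last observation for every horizon:

```python
def naive_forecast(y, h):
    """Naive method: every future value equals the last observation y_T."""
    return [y[-1]] * h

print(naive_forecast([4, 6, 5, 9], h=3))  # [9, 9, 9]
```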
Seasonal Naïve Method
• Forecasts equal to last value from same season.
$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$
(where $m$ is the seasonal period and $k$ is the integer part of $(h-1)/m$)
SNAIVE(y ~ lag(m))
bricks %>% model(SNAIVE(Bricks ~ lag("year")))
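The index $T+h-m(k+1)$ can look opaque, so here is a small Python sketch that applies it literally (a hypothetical `snaive_forecast`, not fable's `SNAIVE()`). It assumes the series is a plain list whose first element is $y_1$.

```python
def snaive_forecast(y, m, h):
    """Seasonal naive: y_hat_{T+h|T} = y_{T+h-m(k+1)}, with k = floor((h-1)/m)."""
    T = len(y)
    forecasts = []
    for step in range(1, h + 1):
        k = (step - 1) // m
        t = T + step - m * (k + 1)  # 1-based time index of the same season last year
        forecasts.append(y[t - 1])  # shift to Python's 0-based indexing
    return forecasts

# Two years of quarterly data (m = 4), forecast six quarters ahead
y = [10, 20, 30, 40, 12, 22, 32, 42]
print(snaive_forecast(y, m=4, h=6))  # [12, 22, 32, 42, 12, 22]
```

Once the horizon passes one full year ($h > m$), $k$ increases and the forecasts cycle through the last observed year again.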
Drift Method
• A variation on the naïve method is to allow the forecasts to increase or decrease over time,
where the amount of change over time (called the drift) is set to be the average change seen
in the historical data
• So the forecast for time 𝑇 + ℎ is given by:
$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + h\left(\frac{y_T - y_1}{T-1}\right)$
• This is equivalent to drawing a line between the first and last observation, and extrapolating it
into the future
RW(y ~ drift())
bricks %>% model(RW(Bricks ~ drift()))
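The two forms of the drift formula agree because the sum of first differences telescopes to $y_T - y_1$. A minimal Python sketch (hypothetical `drift_forecast`, not fable's `RW()`) shows both points:

```python
def drift_forecast(y, h):
    """Drift method: extend the line through the first and last observations."""
    T = len(y)
    slope = (y[-1] - y[0]) / (T - 1)  # average historical change per period
    return [y[-1] + step * slope for step in range(1, h + 1)]

y = [1, 3, 2, 6]
# The sum of first differences telescopes to y_T - y_1
diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
assert sum(diffs) == y[-1] - y[0]
print(drift_forecast(y, h=3))
```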
Fitted Values and Residuals
• A residual in forecasting is the difference between an observed value and its
forecast based on other observations:
$e_i = y_i - \hat{y}_i$
• For time series forecasting, a residual is based on one-step forecasts; that is,
$\hat{y}_{t|t-1}$ is the forecast of $y_t$ based on observations $y_1, \dots, y_{t-1}$.
• The values $\hat{y}_{t|t-1}$ are also called fitted values.
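To make the one-step idea concrete, the sketch below uses the naive method, whose fitted value is simply the previous observation, $\hat{y}_{t|t-1} = y_{t-1}$. The function name is hypothetical, and the first observation has no fitted value.

```python
def naive_fitted_and_residuals(y):
    """One-step naive fitted values y_hat_{t|t-1} = y_{t-1} and residuals e_t = y_t - y_hat_t.

    No fitted value exists for the first observation, so it is marked None.
    """
    fitted = [None] + y[:-1]
    residuals = [None] + [y[t] - fitted[t] for t in range(1, len(y))]
    return fitted, residuals

fitted, residuals = naive_fitted_and_residuals([10, 12, 11, 15])
print(fitted)     # [None, 10, 12, 11]
print(residuals)  # [None, 2, -1, 4]
```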
Residual Diagnostics
• A good forecasting method will yield residuals with the following properties:
• The residuals are uncorrelated. If there are correlations between residuals, then there is
information left in the residuals which should be used in computing forecasts
• The residuals have zero mean. If the residuals have a mean other than zero, then the
forecasts are biased
• Any forecasting method that does not satisfy these properties can be
improved. That does not mean that forecasting methods that satisfy these
properties cannot be improved.
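The zero-mean property is the easiest to check and to fix: if the residuals have mean $m \neq 0$, one simple bias correction is to add $m$ to every forecast. A minimal sketch, assuming residuals are stored as a list (the values and function names are illustrative only):

```python
def residual_mean(residuals):
    """Mean of the residuals, skipping missing values (e.g. the first naive residual)."""
    values = [e for e in residuals if e is not None]
    return sum(values) / len(values)

def bias_adjust(forecasts, residuals):
    """Shift every forecast by the residual mean so the adjusted residuals average zero."""
    m = residual_mean(residuals)
    return [f + m for f in forecasts]

residuals = [None, 2, -1, 4, 3]               # illustrative values only
print(residual_mean(residuals))               # 2.0
print(bias_adjust([15, 15, 15], residuals))   # [17.0, 17.0, 17.0]
```

Checking for correlation in the residuals needs the autocorrelation function, which is covered under Autocorrelation.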
Residual Diagnostics
• In addition to these essential properties, it is useful (but not necessary) for the
residuals to also have the following two properties:
• The residuals have constant variance
• The residuals are normally distributed
• These two properties make the calculation of prediction intervals easier.
However, a forecasting method that does not satisfy these properties cannot
necessarily be improved
Forecast Errors
• Forecast “error”: the difference between an observed value and its forecast
$e_i = y_i - \hat{y}_i$
• Unlike residuals, forecast errors on the test set involve multi-step forecasts.
• These are true forecast errors, as the test data are not used in computing the
forecast $\hat{y}_i$
The Forecasting Scenario
Measures of Forecast Accuracy
• Key measures to evaluate the accuracy:
Mean absolute error: $\text{MAE} = \text{mean}(|e_i|)$
Mean squared error: $\text{MSE} = \text{mean}(e_i^2)$
Mean absolute percentage error: $\text{MAPE} = 100\,\text{mean}(|e_i|/|y_i|)$
Root mean squared error: $\text{RMSE} = \sqrt{\text{mean}(e_i^2)}$
• MAE, MSE and RMSE are all scale dependent.
• MAPE is scale independent but is only sensible if $y_t \gg 0$ for all $t$ and $y$ has a
natural zero.
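The four measures can be computed directly from the test-set errors. A minimal Python sketch with made-up actual and forecast values (the function name `accuracy` is hypothetical and unrelated to fable's `accuracy()`):

```python
import math

def accuracy(actual, forecast):
    """MAE, MSE, RMSE and MAPE for forecast errors e_i = y_i - y_hat_i."""
    errors = [y - f for y, f in zip(actual, forecast)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by |y_i|, so it breaks down when an actual value is zero or near zero
    mape = 100 * sum(abs(e) / abs(y) for e, y in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

print(accuracy([100, 200, 300, 400], [110, 190, 330, 400]))
```

Multiplying every value by 1,000 would multiply MAE and RMSE by 1,000 but leave MAPE unchanged, which is what "scale dependent" versus "scale independent" means in practice.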
Graphical Summaries
• Time plots
• The data are plotted over time
• Reveal trends over time, regular seasonal behavior and other systematic features of the
data
• Seasonal plots
• The data are plotted against the individual “seasons” in which the data were observed
• Enable the underlying seasonal pattern and substantial departures from the seasonal
pattern to be seen clearly
• Scatterplots
• Plot the variable that we wish to forecast against an explanatory variable
• Help us to visualize the relationship between two variables
Time Series Patterns
Horizontal pattern
• The data values fluctuate around a constant mean
• Such a series is called stationary in its mean
Seasonal pattern
• The data values are influenced by seasonal factors such as the month of the year or the day of the
week
• Seasonal series are sometimes called periodic although they do not exactly repeat themselves over
time
Cyclical pattern
• The data exhibit rises and falls that are not of a fixed period
Trend pattern
• There is a long-term increase or decrease in the data
Many data series include a combination of the preceding patterns
Autocorrelation
• Correlation measures the extent of a linear relationship between two variables.
• Autocorrelation measures the linear relationship between lagged values of a
time series.
• The autocorrelation coefficients make up the autocorrelation function or ACF.
• The autocorrelation coefficients for the beer production data can be computed
using the ACF() function.
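The course computes autocorrelations with `ACF()`. To show what each coefficient measures, here is a plain Python sketch of the standard sample autocorrelation $r_k = \sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y}) \big/ \sum_{t=1}^{T}(y_t-\bar{y})^2$; the function name `acf` and the toy series are illustrative.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series y."""
    T = len(y)
    y_bar = sum(y) / T
    dev = [v - y_bar for v in y]
    denom = sum(d * d for d in dev)  # total sum of squares about the mean
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

print(acf([1, 2, 3, 4, 5], max_lag=2))  # [0.4, -0.1]
```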
Transformations and Adjustments
• Adjusting the historical data can often lead to a simpler forecasting model
• Four kinds of adjustments:
• Calendar adjustments
• Population adjustments
• Inflation adjustments
• Mathematical transformations
• The main goal is to simplify the patterns in the historical data by removing known
sources of variation or by making the pattern more consistent across the data set
• Simpler patterns usually lead to more accurate forecasts
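Two of these adjustments are simple arithmetic. The sketch below (hypothetical function names, made-up numbers) shows a calendar adjustment that converts a monthly total to an average per day, removing variation caused only by month length, and a population adjustment that expresses a value per 1,000 people.

```python
import calendar

def per_day(year, month, monthly_total):
    """Calendar adjustment: average per day removes month-length variation."""
    days = calendar.monthrange(year, month)[1]  # number of days in that month
    return monthly_total / days

def per_capita(value, population, per=1000):
    """Population adjustment: express the value per `per` people."""
    return value * per / population

print(per_day(2021, 2, 280))        # 10.0 (February 2021 has 28 days)
print(per_capita(5000, 2_000_000))  # 2.5 (per 1,000 people)
```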
Time Series Decomposition
• Decomposition assumes that the data are made up as follows:
data = pattern + error = f(trend-cycle, seasonality, error)
• The error term is often called the irregular or the remainder component
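One concrete instance of this idea is the classical additive decomposition, data = trend-cycle + seasonal + remainder. The Python sketch below is illustrative only (the function name is hypothetical) and assumes the series starts at the first season and spans at least two full seasonal periods. For even $m$ it estimates the trend-cycle with a centred 2×m moving average, then averages the detrended values season by season.

```python
def classical_additive(y, m):
    """Classical additive decomposition: y_t = trend_t + seasonal_t + remainder_t.

    Assumes y starts at season 0 and spans at least two full seasonal periods.
    Trend is None where the centred moving average is undefined (series ends).
    """
    n, half = len(y), m // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if m % 2 == 0:
            # 2 x m-MA: end points get half weight so the average stays centred
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / m
        else:
            trend[t] = sum(window) / m  # plain centred m-MA for odd m
    detrended = [y[t] - trend[t] if trend[t] is not None else None for t in range(n)]
    # Average the detrended values season by season, then centre them to sum to zero
    means = []
    for j in range(m):
        vals = [detrended[t] for t in range(j, n, m) if detrended[t] is not None]
        means.append(sum(vals) / len(vals))
    offset = sum(means) / m
    seasonal = [means[t % m] - offset for t in range(n)]
    remainder = [y[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                 for t in range(n)]
    return trend, seasonal, remainder

# Linear trend plus a fixed quarterly pattern: the method should recover both exactly
pattern = [1, -1, 2, -2]
y = [t + pattern[t % 4] for t in range(16)]
trend, seasonal, remainder = classical_additive(y, m=4)
print(seasonal[:4])  # [1.0, -1.0, 2.0, -2.0]
```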
Decomposition Graphics
• Decomposition plot:
• Helps visualize the decomposition procedure
• Seasonal sub-series plot:
• Helps visualize the overall seasonal pattern and how the seasonal
component changes over time
History of Decomposition Methods
• Classical method originated in the 1920s.
• Census II method introduced in 1957. Basis for X-11 method and variants (including
X-12-ARIMA, X-13-ARIMA)
• STL method introduced in 1983
• TRAMO/SEATS introduced in the 1990s.
National Statistics Offices
• ABS uses X-12-ARIMA
• US Census Bureau uses X-13ARIMA-SEATS
• Statistics Canada uses X-12-ARIMA
• ONS (UK) uses X-12-ARIMA
• Eurostat uses X-13ARIMA-SEATS
Exponential Smoothing
• Exponential smoothing methods are weighted averages of past observations, with weights
decaying exponentially as the observations get older
• The most recent observations usually provide the best guide as to the future
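The simplest member of this family, simple exponential smoothing (SES), makes the weighting concrete: the level is updated as $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}$, which puts weight $\alpha(1-\alpha)^j$ on an observation $j$ periods old. In the Python sketch below, `alpha` and the starting level `level0` are assumed to be given; in practice both are estimated.

```python
def ses(y, alpha, level0):
    """Simple exponential smoothing: level_t = alpha * y_t + (1 - alpha) * level_{t-1}.

    The final level is the flat forecast for every future period.
    """
    level = level0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return level

def ses_weights(T, alpha):
    """Weights on y_T, y_{T-1}, ..., y_1: alpha * (1 - alpha)^j, decaying with age j."""
    return [alpha * (1 - alpha) ** j for j in range(T)]

print(ses([10, 12, 14], alpha=0.5, level0=10))  # 12.5
print(ses_weights(3, alpha=0.5))                # [0.5, 0.25, 0.125]
```

Here 12.5 = 0.5(14) + 0.25(12) + 0.125(10) + 0.125(10), where the last term is the leftover weight $(1-\alpha)^3$ on the starting level.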
Trend and Seasonality Patterns
• Patterns based on Pegels’ (1969) classification
• An inappropriate forecasting model, even when optimized, will be inferior to a
more appropriate model
State Space Models
• Each model has an observation equation and transition equations, one for each state
(level, trend, seasonal), i.e., state space models.
• Two models for each method: one with additive and one with multiplicative errors, i.e.,
in total 18 models.
• ETS(Error, Trend, Seasonal):
• Error = {A, M}
• Trend = {N, A, Ad}
• Seasonal = {N, A, M}
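The total of 18 follows from 2 error types × 3 trend types × 3 seasonal types. This small Python sketch only enumerates the labels; it does not fit any model.

```python
from itertools import product

errors = ["A", "M"]
trends = ["N", "A", "Ad"]
seasonals = ["N", "A", "M"]

# Every ETS(Error, Trend, Seasonal) label built from the three component sets
models = [f"ETS({e},{t},{s})" for e, t, s in product(errors, trends, seasonals)]
print(len(models))  # 18
print(models[:3])   # ['ETS(A,N,N)', 'ETS(A,N,A)', 'ETS(A,N,M)']
```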
Exponential Smoothing vs. ARIMA
• ARIMA models provide another approach to time series forecasting
• ARIMA
• AR: autoregressive (lagged observations as inputs)
• I: integrated (differencing to make series stationary)
• MA: moving average (lagged errors as inputs)
• While exponential smoothing models are based on a description of the trend and
seasonality in the data, ARIMA models aim to describe the autocorrelations in the
data.
• Exponential smoothing and ARIMA models are the two most widely used approaches
to time series forecasting.
Stationarity
• A stationary series is:
• roughly horizontal
• constant variance
• no predictable patterns in the long term
• Transformations help to stabilize the variance.
• For ARIMA modelling, we also need to stabilize the mean.
Differencing
• One way to make a time series stationary is to compute the differences between
consecutive observations. This is known as differencing
• Transformations such as logarithms can help to stabilize the variance of a time series
• Differencing can help stabilize the mean of a time series by removing changes in its
level, thereby eliminating (or reducing) trend and seasonality
• The differenced series is the change between consecutive observations in the original
series, and can be written as
$y'_t = y_t - y_{t-1}$
• The differenced series has only $T - 1$ values, since no difference can be computed for the first observation
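First differencing is a one-line operation. In this Python sketch (hypothetical `difference`, made-up data) note that the output is one value shorter than the input:

```python
def difference(y, lag=1):
    """Lagged differences y_t - y_{t-lag}; the result is `lag` values shorter than y."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

print(difference([100, 104, 109, 113, 120]))  # [4, 5, 4, 7]
```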
Second-Order Differencing
• Occasionally, the differenced data will not appear stationary and it may be
necessary to difference the data a second time to obtain a stationary series:
$y''_t = y'_t - y'_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$
• In this case, $y''_t$ will have $T - 2$ values.
• Then we would model the change in the changes of the original data.
• In practice, it is almost never necessary to go beyond second-order
differences
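The Python sketch below applies first differencing twice to a quadratic trend and checks the result against the closed form $y_t - 2y_{t-1} + y_{t-2}$. The data are made up and the helper name is hypothetical.

```python
def difference(y):
    """First differences y'_t = y_t - y_{t-1} (one value shorter than y)."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

y = [1, 4, 9, 16, 25, 36]    # a quadratic trend
first = difference(y)        # [3, 5, 7, 9, 11]: still trending
second = difference(first)   # [2, 2, 2, 2]: constant, T - 2 = 4 values

# Same result from the closed form y''_t = y_t - 2*y_{t-1} + y_{t-2}
direct = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
print(second == direct)      # True
```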
Seasonal Differencing
• A seasonal difference is the difference between an observation and the
corresponding observation from the previous season:
$y'_t = y_t - y_{t-m}$
where $m$ is the number of seasons.
• For monthly data 𝑚 = 12
• For quarterly data 𝑚 = 4
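A seasonal difference is a lag-$m$ difference, so the first $m$ observations are lost. A minimal Python sketch with made-up quarterly data ($m = 4$; the function name is hypothetical):

```python
def seasonal_difference(y, m):
    """Seasonal differences y_t - y_{t-m}; the first m observations are lost."""
    return [y[t] - y[t - m] for t in range(m, len(y))]

quarterly = [10, 20, 30, 40, 12, 22, 32, 42]  # m = 4
print(seasonal_difference(quarterly, m=4))    # [2, 2, 2, 2]
```

The strong within-year pattern disappears and only the year-on-year change of 2 remains.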
Unit Root Tests
• One way to determine more objectively if differencing is required is to use a
unit root test
• These are statistical hypothesis tests of stationarity that are designed for
determining whether differencing is required
• A number of unit root tests are available. They are based on different
assumptions and may lead to conflicting answers
• In this course, we use the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test
KPSS Test
• KPSS test:
• null hypothesis is that the data are stationary and non-seasonal and
• we look for evidence that the null hypothesis is false.
• Consequently, small p-values (e.g., less than 0.05) suggest that differencing is
required.
• The test can be computed using the unitroot_kpss() function.
Modelling Procedure
ARIMA vs ETS
• It is a common myth that ARIMA models are more general than exponential smoothing
• Linear exponential smoothing models are all special cases of ARIMA models
• Non-linear exponential smoothing models have no equivalent ARIMA counterparts
• Many ARIMA models have no exponential smoothing counterparts
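A standard example of the first point: simple exponential smoothing, ETS(A,N,N), corresponds to ARIMA(0,1,1) with $\theta_1 = \alpha - 1$. Rearranging the SES update gives $\hat{y}_{t+1} = y_t + (\alpha - 1)e_t$, where $e_t = y_t - \hat{y}_t$. The Python sketch below (hypothetical helper names, made-up data, and a shared starting value assumed) checks numerically that both recursions give the same one-step forecasts.

```python
def ses_one_step(y, alpha, level0):
    """One-step SES forecasts: y_hat_{t+1} = alpha*y_t + (1 - alpha)*y_hat_t."""
    forecasts = [level0]
    for obs in y:
        forecasts.append(alpha * obs + (1 - alpha) * forecasts[-1])
    return forecasts

def arima011_one_step(y, theta, start):
    """One-step ARIMA(0,1,1) forecasts: y_hat_{t+1} = y_t + theta*e_t, e_t = y_t - y_hat_t."""
    forecasts = [start]
    for obs in y:
        error = obs - forecasts[-1]
        forecasts.append(obs + theta * error)
    return forecasts

y = [10, 12, 11, 15, 14]
alpha = 0.3
a = ses_one_step(y, alpha, level0=10)
b = arima011_one_step(y, theta=alpha - 1, start=10)
print(all(abs(p - q) < 1e-9 for p, q in zip(a, b)))  # True
```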
Simple Linear Regression
• The basic concept is that we forecast variable 𝑦 assuming it has a linear relationship with a
variable 𝑥
𝑦𝑡 = 𝛽0 + 𝛽1𝑥𝑡 + 𝜀𝑡
• The model is called simple regression as we only allow one predictor variable 𝑥. The forecast
variable 𝑦 is sometimes also called the dependent or explained variable. The predictor
variable 𝑥 is sometimes also called the independent or explanatory variable.
• The parameters 𝛽0 and 𝛽1 determine the intercept and the slope of the line respectively. The
intercept 𝛽0 represents the predicted value of 𝑦 when 𝑥 = 0. The slope 𝛽1 represents the
average predicted change in 𝑦 resulting from a one unit increase in 𝑥.
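For a single predictor, least squares has a closed form: $\hat\beta_1 = \sum(x_t-\bar{x})(y_t-\bar{y}) / \sum(x_t-\bar{x})^2$ and $\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x}$. A minimal Python sketch with made-up data (the name `simple_ols` is hypothetical):

```python
def simple_ols(x, y):
    """Least squares intercept and slope for y_t = b0 + b1 * x_t + e_t."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # average change in y per one-unit increase in x
    b0 = y_bar - b1 * x_bar  # predicted y when x = 0
    return b0, b1

print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```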
Multiple Linear Regression
• The general form of a multiple regression is
𝑦𝑡 = 𝛽0 + 𝛽1𝑥1,𝑡 + 𝛽2𝑥2,𝑡 +⋯+ 𝛽𝑘 𝑥𝑘,𝑡 + 𝜀𝑡
where 𝑦𝑡 is the variable to be forecast and 𝑥1,𝑡 , … , 𝑥𝑘,𝑡 are the predictor variables.
• The coefficients 𝛽1, … , 𝛽𝑘 measure the effect of each predictor after taking account of
the effect of all other predictors in the model.
• Thus, the coefficients measure the marginal effects of the predictor variables.
Assumptions
• For forecasting, we make the following assumptions about the errors
(𝜀1, … , 𝜀𝑇):
• they have mean zero
• they are not autocorrelated
• they are unrelated to the predictor variables
• It is also useful to have the errors normally distributed with constant variance
in order to produce prediction intervals, but this is not necessary for
forecasting.
Least Squares Estimation
• The values of the coefficients 𝛽0, 𝛽1, … , 𝛽𝑘 are obtained by finding the minimum sum of squares
of the errors. That is, we find the values of 𝛽0, 𝛽1, … , 𝛽𝑘 which minimize
$\sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} \left(y_t - \beta_0 - \beta_1 x_{1,t} - \beta_2 x_{2,t} - \cdots - \beta_k x_{k,t}\right)^2$
• This is called least squares estimation because it gives
the least value of the sum of squared errors.
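With several predictors, the minimizing coefficients solve the normal equations $(X'X)\beta = X'y$. The Python sketch below (hypothetical `ols`) builds and solves them with Gauss-Jordan elimination for illustration. Statistical software typically uses a more numerically stable method, such as a QR decomposition, instead.

```python
def ols(X, y):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations (X'X) b = X'y.

    X is a list of rows [x_1t, ..., x_kt]; an intercept column of 1s is added here.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yt for r, yt in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
y = [3, 4, 6, 8, 22]
print(ols(X, y))  # close to [1, 2, 3]
```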
Goodness-of-Fit
• A common way to summarize how well a linear regression model fits the data is via
the coefficient of determination, or $R^2$:
$R^2 = \frac{\sum(\hat{y}_i - \bar{y})^2}{\sum(y_i - \bar{y})^2}$
where the summations are over all observations.
• Thus, it reflects the proportion of variation in the forecast variable that is accounted
for (or explained) by the regression model
• In all cases, $0 \le R^2 \le 1$.
• If the predictions are close to the actual values, we would expect $R^2$ to be close to 1.
On the other hand, if the predictions are unrelated to the actual values, then $R^2 = 0$.
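The ratio can be computed directly. In this Python sketch (hypothetical `r_squared`), the fitted values come from a least squares regression of $y = (1, 2, 3, 4)$ on a dummy variable $x = (0, 0, 1, 1)$, so they are simply the two group means. For least squares fits with an intercept, the ratio equals $1 - \text{SSE}/\text{SST}$.

```python
def r_squared(y, fitted):
    """R^2 = sum (y_hat_i - y_bar)^2 / sum (y_i - y_bar)^2."""
    y_bar = sum(y) / len(y)
    explained = sum((f - y_bar) ** 2 for f in fitted)
    total = sum((v - y_bar) ** 2 for v in y)
    return explained / total

y = [1, 2, 3, 4]
fitted = [1.5, 1.5, 3.5, 3.5]  # least squares fit of y on the dummy x = [0, 0, 1, 1]
print(r_squared(y, fitted))    # 0.8
```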
Evaluating the Regression Model
• After selecting the regression variables and fitting a regression model, it is necessary
to plot the residuals to check that the assumptions of the model have been satisfied.
• There are a series of plots that should be produced in order to check different aspects
of the fitted model and the underlying assumptions (whether the linear model was
appropriate).
• ACF plot of residuals
• Histogram of residuals
• Residual plots against predictors
• Residual plots against fitted values
Some Useful Predictors
• There are several useful predictors that occur frequently when using regression for
time series data:
• Trend
• Dummy variables
• Seasonal dummy variables
• Intervention variables
• Trading days
• Distributed lags
• Holidays
• Fourier series
Adjusted $R^2$ ($\bar{R}^2$)
• Computer output for regression will always give the $R^2$ value
• This is a useful summary of the model
• However:
• $R^2$ does not allow for “degrees of freedom”.
• Adding any variable tends to increase the value of $R^2$, even if that variable is
irrelevant.
• To overcome this problem, we can use adjusted $R^2$ ($\bar{R}^2$).
• Using this measure, the best model will be the one with the largest value of $\bar{R}^2$.
• Maximizing $\bar{R}^2$ is equivalent to minimizing the standard error.
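A common form of the adjustment, for $T$ observations and $k$ predictors, is $\bar{R}^2 = 1 - (1 - R^2)\frac{T-1}{T-k-1}$. The Python sketch below (hypothetical function name, made-up $R^2$) shows the penalty: with the same $R^2$, the model with more predictors gets a lower $\bar{R}^2$.

```python
def adjusted_r_squared(r2, T, k):
    """Adjusted R^2 = 1 - (1 - R^2)(T - 1)/(T - k - 1) for T observations and k predictors."""
    return 1 - (1 - r2) * (T - 1) / (T - k - 1)

# Same R^2, but the model with more predictors is penalised
print(adjusted_r_squared(0.80, T=20, k=1))
print(adjusted_r_squared(0.80, T=20, k=5))
```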
Selecting Predictors
• When there are many predictors, how should we choose which ones to use?
• We need a way of comparing two competing models.
• What not to do!
• Plot 𝑦 against a particular predictor (𝑥𝑖) and if it shows no noticeable relationship, drop it.
• Do a multiple linear regression on all the predictors and disregard all variables whose 𝑝
values are greater than 0.05.
• Maximize $R^2$ or minimize MSE
• Instead, we will use five measures of predictive accuracy (CV, AIC, AICc, BIC and
$\bar{R}^2$).
• They can be shown using the glance() function.
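For intuition about three of these measures, the Python sketch below uses the textbook forms based on the sum of squared errors (SSE) for a regression with $k$ predictors fitted to $T$ observations. For all three, lower values are better. This is not a re-implementation of `glance()`. CV is omitted, and the function name and inputs are hypothetical.

```python
import math

def info_criteria(sse, T, k):
    """AIC, AICc and BIC for a regression with k predictors (plus intercept) on T observations.

      AIC  = T log(SSE / T) + 2(k + 2)
      AICc = AIC + 2(k + 2)(k + 3) / (T - k - 3)
      BIC  = T log(SSE / T) + (k + 2) log(T)
    """
    base = T * math.log(sse / T)
    aic = base + 2 * (k + 2)
    aicc = aic + 2 * (k + 2) * (k + 3) / (T - k - 3)
    bic = base + (k + 2) * math.log(T)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

print(info_criteria(sse=10.0, T=10, k=1))
```

Each extra predictor must reduce the SSE enough to offset its penalty term, which is why these measures do not automatically favour larger models the way $R^2$ does.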