ETW3420: Principles of Forecasting and Applications
Topic 8: Advanced Forecasting Methods
1 Regression with ARIMA errors
2 Ensemble Forecasts
3 Bootstrap and Bagging
4 Practical Issues: Missing Values
5 Practical Issues: Outliers
Regression with ARIMA errors
Regression models
yt = β0 + β1x1,t + · · · + βkxk,t + εt,
yt modeled as function of k explanatory variables
x1,t, . . . , xk,t.
In ordinary regression, we assume that εt is white noise (WN).
Now we want to allow εt to be autocorrelated.
Example: ARIMA(1,1,1) errors
yt = β0 + β1x1,t + · · · + βkxk,t + ηt,
(1− φ1B)(1− B)ηt = (1 + θ1B)εt,
where εt is white noise.
Residuals and errors
Example: ηt = ARIMA(1,1,1)
yt = β0 + β1x1,t + · · · + βkxk,t + ηt,
(1− φ1B)(1− B)ηt = (1 + θ1B)εt,
Be careful in distinguishing ηt from εt.
Only the errors εt are assumed to be white noise.
In ordinary regression, ηt is assumed to be white noise, and so ηt = εt.
Estimation
If we minimize ∑ηt² (by using ordinary regression):
1 Estimated coefficients β̂0, . . . , β̂k are no longer the best
estimates, as some information is ignored;
2 Statistical tests associated with the model (e.g., t-tests on the
coefficients) are incorrect.
3 p-values for coefficients usually too small (“spurious
regression”).
4 AIC of fitted models misleading.
Minimizing ∑εt² avoids these problems.
Maximizing the likelihood is similar to minimizing ∑εt².
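As a rough sketch of how such a model can be estimated by maximum likelihood in R (using the forecast package loaded via fpp2; the data and the ARIMA order here are purely illustrative):
# Illustrative only: regression with ARIMA errors fitted by maximum likelihood.
# uschange comes with the fpp2 package; the (1,0,2) order is just an example.
library(fpp2)
fit <- Arima(uschange[, "Consumption"], xreg = uschange[, "Income"],
             order = c(1, 0, 2))
summary(fit)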
Stationarity
Regression with ARMA errors
yt = β0 + β1x1,t + · · · + βkxk,t + ηt,
where ηt is an ARMA process.
All variables in the model must be stationary.
If we estimate the model while any of these are
non-stationary, the estimated coefficients can be incorrect.
Difference variables until all stationary.
If necessary, apply same differencing to all variables.
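As a hedged sketch (using the forecast functions ndiffs() and diff(); the uschange data are illustrative), the required differencing can be checked and applied as follows. In practice, auto.arima() with an xreg argument takes care of the differencing for you.
# Sketch: check how many differences each variable needs, then difference both.
library(fpp2)
ndiffs(uschange[, "Consumption"])   # suggested number of differences for y
ndiffs(uschange[, "Income"])        # suggested number of differences for x
dy <- diff(uschange[, "Consumption"])
dx <- diff(uschange[, "Income"])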
Stationarity
Model with ARIMA(1,1,1) errors
yt = β0 + β1x1,t + · · · + βkxk,t + ηt,
(1− φ1B)(1− B)ηt = (1 + θ1B)εt,
Equivalent to model with ARIMA(1,0,1) errors
y′t = β1x′1,t + · · · + βkx′k,t + η′t,
(1− φ1B)η′t = (1 + θ1B)εt,
where y′t = yt − yt−1, x′t,i = xt,i − xt−1,i and η′t = ηt − ηt−1.
Regression with ARIMA errors
Any regression with an ARIMA error can be rewritten as a
regression with an ARMA error by differencing all variables
with the same differencing operator as in the ARIMA model.
Original data
yt = β0 + β1x1,t + · · · + βkxk,t + ηt
where φ(B)(1− B)^d ηt = θ(B)εt
After differencing all variables
y′t = β1x′1,t + · · · + βkx′k,t + η′t
where φ(B)η′t = θ(B)εt
and y′t = (1− B)^d yt, x′i,t = (1− B)^d xi,t, η′t = (1− B)^d ηt.
Model selection
Check that all variables are stationary. If not, apply
differencing. Where appropriate, use the same
differencing for all variables to preserve interpretability.
Fit regression model with automatically selected ARIMA errors.
Check that εt series looks like white noise.
Selecting predictors
AICc can be calculated for final model.
Repeat procedure for all subsets of predictors to be
considered, and select model with lowest AICc value.
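A minimal sketch of this selection step, assuming the uschange data used below and comparing models with and without the income predictor by AICc:
# Sketch: compare candidate predictor sets via AICc (illustrative).
library(fpp2)
fit1 <- auto.arima(uschange[, "Consumption"], xreg = uschange[, "Income"])
fit2 <- auto.arima(uschange[, "Consumption"])      # no predictors
c(with_income = fit1[["aicc"]], without_income = fit2[["aicc"]])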
US personal consumption and income
[Figure: quarterly changes in US consumption and personal income.]
US personal consumption and income
No need for transformations or further differencing.
An increase in income does not necessarily translate into an immediate
increase in consumption (e.g., after the loss of a job, it may
take a few months for expenses to be reduced to allow for
the new circumstances). We will ignore this for now.
US personal consumption and income
(fit <- auto.arima(uschange[,1], xreg=uschange[,2]))
## Series: uschange[, 1]
## Regression with ARIMA(1,0,2) errors
## Coefficients:
## ar1 ma1 ma2 intercept xreg
## 0.6922 -0.5758 0.1984 0.5990 0.2028
## s.e. 0.1159 0.1301 0.0756 0.0884 0.0461
## sigma^2 estimated as 0.3219: log likelihood=-156.95
## AIC=325.91 AICc=326.37 BIC=345.29
Write down the equations for the fitted model.
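For reference, one way of writing down the fitted model from the output above (coefficients rounded to three decimals) is:
yt = 0.599 + 0.203xt + ηt,
(1 − 0.692B)ηt = (1 − 0.576B + 0.198B²)εt,
εt ~ NID(0, 0.322),
where yt is the quarterly change in consumption and xt is the quarterly change in personal income.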
US personal consumption and income
ggtsdisplay(residuals(fit, type='regression'),
main="Regression errors")
[Figure: time plot, ACF and PACF of the regression errors.]
US personal consumption and income
ggtsdisplay(residuals(fit, type='innovation'),
main="ARIMA errors")
[Figure: time plot, ACF and PACF of the ARIMA (innovation) errors.]
US personal consumption and income
checkresiduals(fit, test=FALSE)
[Figure: residual diagnostics (time plot, ACF, histogram) for the regression with ARIMA(1,0,2) errors.]
US personal consumption and income
checkresiduals(fit, plot=FALSE)
## Ljung-Box test
## data: Residuals from Regression with ARIMA(1,0,2) errors
## Q* = 5.8916, df = 3, p-value = 0.117
## Model df: 5. Total lags used: 8
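The large p-value (0.117) suggests the residuals are consistent with white noise, so the fitted model looks adequate for forecasting.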
US personal consumption and income
fcast <- forecast(fit,
xreg=rep(mean(uschange[,2]),8), h=8)
autoplot(fcast) + xlab("Year") +
ylab("Percentage change") +
ggtitle("Forecasts from regression with ARIMA(1,0,2) errors")
[Figure: forecasts from regression with ARIMA(1,0,2) errors, with future income held at its historical mean.]
Forecasting
To forecast a regression model with ARIMA errors, we need
to forecast the regression part of the model and the ARIMA
part of the model and combine the results.
Some predictors are known into the future (e.g., time, dummy variables).
Separate forecasting models may be needed for other
predictors.
Forecast intervals ignore the uncertainty in forecasting the
predictors.
Daily electricity demand
Model daily electricity demand as a function of temperature using
quadratic regression with ARMA errors.
qplot(elecdaily[,"Temperature"], elecdaily[,"Demand"]) +
xlab("Temperature") + ylab("Demand")
[Figure: scatterplot of daily electricity demand against temperature.]
Daily electricity demand
autoplot(elecdaily, facets = TRUE)
Daily electricity demand
xreg <- cbind(MaxTemp = elecdaily[, "Temperature"],
MaxTempSq = elecdaily[, "Temperature"]^2,
Workday = elecdaily[, "WorkDay"])
fit <- auto.arima(elecdaily[, "Demand"], xreg = xreg)
checkresiduals(fit)
[Figure: residual diagnostics from the regression with ARIMA(2,1,2)(2,0,0)[7] errors.]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(2,1,2)(2,0,0)[7] errors
## Q* = 28.229, df = 4, p-value = 1.121e-05
## Model df: 10. Total lags used: 14
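The small p-value indicates that the residuals still contain significant autocorrelation, so prediction intervals from this model should be treated with some caution.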
Daily electricity demand
# Forecast one day ahead
forecast(fit, xreg = cbind(26, 26^2, 1))
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 189.769 181.2954 198.2427 176.8096 202.7284
Daily electricity demand
fcast <- forecast(fit,
xreg = cbind(rep(26,14), rep(26^2,14),
c(0,1,0,0,1,1,1,1,1,0,0,1,1,1)))
autoplot(fcast) + ylab("Electicity demand (GW)")
[Figure: 14-day forecasts from the regression with ARIMA(2,1,2)(2,0,0)[7] errors.]
1 Regression with ARIMA errors
2 Ensemble Forecasts
3 Bootstrap and Bagging
4 Practical Issues: Missing Values
5 Practical Issues: Outliers
Forecast Combinations
It turns out that combining multiple forecasting methods can
lead to improved predictive performance.
This has been well known since at least 1969, when Bates
and Granger wrote their famous paper "The Combination
of Forecasts", showing that combining forecasts often leads to
better forecast accuracy.
20 years later, Clemen (1989) wrote
The results have been virtually unanimous: combining multiple
forecasts leads to increased forecast accuracy. In many cases one
can make dramatic performance improvements by simply
averaging the forecasts.
Forecast Combinations
In the million-dollar Netflix Prize contest in 2009, the winning
team, Bellkor’s Pragmatic Chaos, combined results from
several teams to produce their winning forecasts.
In a 2010 article the Netflix Prize winners described the
power of their ensemble approach:
“An early lesson of the competition was the value of combining sets
of predictions from multiple models or algorithms. If 2 prediction
sets achieved similar RMSEs, it was quicker and more effective to
simply average the 2 sets than to try to develop a new model that
incorporated the best of each method. Even if the RMSE for one
set was much worse than the other, there was almost certainly a
linear combination that improved on the better set.”
Forecast Combinations
There has been considerable research on using weighted
averages, with packages developed for estimating these
weights (e.g. opera: “Online Prediction by ExpeRt
Aggregation").
However, using a simple average has proven hard to beat.
Monthly Expenditure on eating out in Australia
train <- window(auscafe, end=c(2012,9))
h <- length(auscafe) - length(train)
ETS <- forecast(ets(train), h=h)
ARIMA <- forecast(auto.arima(train, lambda=0, biasadj=TRUE), h=h)
STL <- stlf(train, lambda=0, h=h, biasadj=TRUE)
Combination <- (ETS[["mean"]] + ARIMA[["mean"]] +
STL[["mean"]])/3
Monthly Expenditure on eating out in Australia
autoplot(auscafe) +
autolayer(ETS, series="ETS", PI=FALSE) +
autolayer(ARIMA, series="ARIMA", PI=FALSE) +
autolayer(STL, series="STL", PI=FALSE) +
autolayer(Combination, series="Combination") +
xlab("Year") + ylab("$ billion") +
ggtitle("Australian monthly expenditure on eating out")
Monthly Expenditure on eating out in Australia
[Figure: Australian monthly expenditure on eating out with ETS, ARIMA, STL and Combination forecasts.]
Out-of-sample MAPE:
ETS ARIMA STL COMBINATION
3.682 3.036 5.201 2.388
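A minimal sketch of how such test-set MAPE values can be computed with accuracy(), assuming the objects created in the code above:
# Sketch: out-of-sample MAPE for each method and the combination.
test <- window(auscafe, start = c(2012, 10))
c(ETS = accuracy(ETS, test)["Test set", "MAPE"],
  ARIMA = accuracy(ARIMA, test)["Test set", "MAPE"],
  STL = accuracy(STL, test)["Test set", "MAPE"],
  Combination = accuracy(Combination, test)["Test set", "MAPE"])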
1 Regression with ARIMA errors
2 Ensemble Forecasts
3 Bootstrap and Bagging
4 Practical Issues: Missing Values
5 Practical Issues: Outliers
The basic idea of the bootstrap is to make inferences about an
estimate (such as a sample mean) of a population parameter θ
(such as the population mean) based on sample data.
It is a resampling method: we independently sample with
replacement from the existing sample of size n, and perform
inference across these resampled data sets.
In a time series context, bootstrapping can be used to
generate new time series that are similar to the observed series.
However, bootstrapping is a little tricky to implement with
time series data because of the ordering and autocorrelation in the data.
Solution: Block bootstrap.
Block Bootstrap
1 The time series is Box-Cox transformed, and decomposed into
trend, seasonal and remainder components using STL.
2 Obtain shuffled versions of the remainder component to get
bootstrapped remainder series.
(a) We cannot simply re-draw values at random, due to the
autocorrelation present in an STL remainder series.
(b) Instead, contiguous sections of the remainder series are
selected at random and joined together.
3 The bootstrapped remainder series are added to the trend
and seasonal components, and the Box-Cox transformation is
reversed to give variations on the original time series.
Monthly expenditure on retail debit cards in Iceland
bootseries <- bld.mbb.bootstrap(debitcards, 10) %>%
as.data.frame() %>%
ts(start=2000, frequency=12)
autoplot(debitcards) +
autolayer(bootseries, colour=TRUE) +
autolayer(debitcards, colour=FALSE) +
ylab("Bootstrapped series") + guides(colour="none")
Monthly expenditure on retail debit cards in Iceland
[Figure: ten bootstrapped versions of the debit card expenditure series.]
Uses of bootstrapping
1 Get a better measure of forecast uncertainty
2 Improve point forecasts using “bagging”
Bootstrapped time series can be used to improve forecast accuracy.
Produce forecasts from each of the bootstrapped time series →
average the forecasts → better forecasts.
This procedure is called "bagging", short for "bootstrap
aggregating".
Bagging with ETS
sim <- bld.mbb.bootstrap(debitcards, 10) %>%
as.data.frame() %>%
ts(start = 2000, freq = 12)
fc <- purrr::map(as.list(sim), function(x){forecast(ets(x))[["mean"]]}) %>%
as.data.frame() %>%
ts(freq = 12, start = c(2013,9))
autoplot(debitcards) +
autolayer(sim, colour = TRUE) +
autolayer(fc, colour = TRUE) +
autolayer(debitcards, colour = FALSE) +
ylab("Bootstrapped series") +
guides(colour = "none")
Bagging with ETS
[Figure: bootstrapped series and their ETS forecasts for the debit card data.]
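The forecast package also provides baggedETS() as a convenience wrapper for this procedure; a minimal sketch (using the function's default number of bootstrapped series rather than the 10 used above):
# Sketch: bagged ETS forecasts via baggedETS(), compared with plain ETS.
library(fpp2)
etsfc <- forecast(ets(debitcards), h = 36)
baggedfc <- forecast(baggedETS(debitcards), h = 36)
autoplot(debitcards) +
  autolayer(baggedfc, series = "Bagged ETS", PI = FALSE) +
  autolayer(etsfc, series = "ETS", PI = FALSE) +
  ylab("Expenditure")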