ETW3420: Principles of Forecasting and Applications
Topic 3: The Forecaster’s Toolbox
1 Some simple forecasting methods
2 Box-Cox transformations
3 Residual diagnostics
4 Evaluating forecast accuracy: The traditional approach
5 Evaluating forecast accuracy: The modern approach
6 Prediction intervals
Some simple forecasting methods
[Figure: time plot of Australian quarterly beer production]
How would you forecast these data?
Some simple forecasting methods
[Figure: time plot of the Dow-Jones index]
How would you forecast these data?
Some simple forecasting methods
1. Average method
Forecast of all future values is equal to mean of
historical data {y1, . . . , yT}.
Forecasts: ŷT+h|T = ȳ = (y1 + · · · + yT)/T
2. Naive method
Forecasts equal to last observed value.
Forecasts: ŷT+h|T = yT.
Consequence of efficient market hypothesis.
3. Seasonal naive method
Forecasts equal to last value from same season.
Forecasts: ŷT+h|T = yT+h−m(k+1), where m is the seasonal period and k is the integer part of (h − 1)/m.
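In R, these three benchmark methods are available as meanf(), naive() and snaive() in the forecast package. A minimal sketch (the data set, start year and horizon are illustrative, matching the beer example used later in this topic):

library(fpp2)              # loads forecast, ggplot2 and the course data sets
beer <- window(ausbeer, start=1992)
meanf(beer, h=8)           # 1. average method
naive(beer, h=8)           # 2. naive method
snaive(beer, h=8)          # 3. seasonal naive method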
Some simple forecasting methods
4. Drift method
Forecasts equal to last value plus average change.
Forecasts:
ŷT+h|T = yT + (h/(T − 1)) Σ_{t=2}^{T} (yt − yt−1) = yT + h(yT − y1)/(T − 1).
Equivalent to extrapolating a line drawn between first and
last observations.
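In R, the drift method is rwf() with drift=TRUE (a sketch; the series and horizon are illustrative):

rwf(goog200, drift=TRUE, h=20)   # random walk with drift forecasts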
Some simple forecasting methods
[Figure: Forecasts for quarterly beer production (seasonal naive method shown)]
Some simple forecasting methods
[Figure: Forecasts for the Dow Jones Index (daily ending 15 Jul 94)]
Variance stabilization
If the data show different variation at different levels of the series, then a transformation can be useful.
Denote original observations as y1, . . . , yn and transformed observations as w1, . . . , wn.

Mathematical transformations for stabilizing variation (in increasing strength):
Square root: wt = √yt
Cube root: wt = yt^(1/3)
Logarithm: wt = log(yt)

Logarithms, in particular, are useful because they are more interpretable: changes in a log value are relative (percent) changes on the original scale.
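These transformations are easy to try directly in R (a sketch, assuming the Australian electricity production series elec shown on the next slide and the fpp2 packages loaded earlier):

autoplot(elec)           # original scale
autoplot(sqrt(elec))     # square root
autoplot(elec^(1/3))     # cube root
autoplot(log(elec))      # logarithm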
Variance stabilization
[Figures: Australian electricity production on the original scale and after square root, cube root, log and inverse transformations]
Box-Cox transformations
Each of these transformations is close to a member of the family
of Box-Cox transformations:
wt = log(yt), if λ = 0;
wt = (yt^λ − 1)/λ, if λ ≠ 0.

λ = 1: (No substantive transformation)
λ = 1/2: (Square root plus linear transformation)
λ = 0: (Natural logarithm)
λ = −1: (Inverse plus 1)
Box-Cox transformations
autoplot(BoxCox(elec, lambda=1/3))
[Figure: Box-Cox transformed (λ = 1/3) electricity production]
Box-Cox transformations
yt^λ for λ close to zero behaves like logs.
If some yt = 0, then we must have λ > 0.
If some yt < 0, no power transformation is possible unless all values are first adjusted by adding a constant.
Simple values of λ are easier to explain.
Results are relatively insensitive to λ.
Often no transformation (λ = 1) is needed.
Transformations can have a very large effect on prediction intervals (PI).
Choosing λ = 0 is a simple way to force forecasts to be positive.
Automated Box-Cox transformations
(BoxCox.lambda(elec))
## [1] 0.2654076
This attempts to balance the seasonal fluctuations and
random variation across the series.
Always check the results.
A low value of λ can give extremely large prediction intervals.
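The estimated λ can be passed straight to the forecasting functions, which then transform, forecast and back-transform automatically (a sketch):

lambda <- BoxCox.lambda(elec)
fit <- snaive(elec, lambda=lambda)   # forecasts are returned on the original scale
autoplot(fit)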
Back-transformation
We must reverse the transformation (or back-transform) to obtain
forecasts on the original scale. The reverse Box-Cox
transformations are given by
yt = exp(wt), if λ = 0;
yt = (λwt + 1)^(1/λ), if λ ≠ 0.
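In R, the reverse transformation is available as InvBoxCox(); applying it after BoxCox() recovers the original series (a sketch; the λ value is illustrative):

w <- BoxCox(elec, lambda=1/3)
y <- InvBoxCox(w, lambda=1/3)
all.equal(as.numeric(y), as.numeric(elec))   # TRUE, up to numerical precision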
Back-transformation
fit <- snaive(elec, lambda=1/3)
autoplot(fit)
[Figure: Forecasts from seasonal naive method, full series]
Back-transformation
autoplot(fit, include=120)
[Figure: Forecasts from seasonal naive method, showing only the last 120 observations]
Bias adjustment
Back-transformed point forecasts are medians.
Back-transformed PI have the correct coverage.
Back-transformed means
Let X have mean µ and variance σ².
Let f(x) be the back-transformation function, and Y = f(X).
Taylor series expansion about the point X = µ:
f(X) ≈ f(µ) + (X − µ)f′(µ) + ½(X − µ)²f′′(µ).
Taking expectations: E[Y] = E[f(X)] ≈ f(µ) + ½σ²f′′(µ).
Bias adjustment
Box-Cox back-transformation:
yT+h|T = exp(wT+h|T), if λ = 0; yT+h|T = (λwT+h|T + 1)^(1/λ), if λ ≠ 0.

So f(x) = exp(x) if λ = 0, and f(x) = (λx + 1)^(1/λ) if λ ≠ 0,
giving f′′(x) = exp(x) if λ = 0, and f′′(x) = (1 − λ)(λx + 1)^(1/λ − 2) if λ ≠ 0.

Substituting into E[Y] ≈ f(µ) + ½σ²f′′(µ) gives the bias-adjusted back-transformed mean:
E[Y] ≈ exp(µ)[1 + σ²/2], if λ = 0;
E[Y] ≈ (λµ + 1)^(1/λ) [1 + σ²(1 − λ)/(2(λµ + 1)²)], if λ ≠ 0.
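A quick numerical check of the λ = 0 case (a sketch with simulated values, not part of the original slides): if the forecast on the log scale is N(µ, σ²), the simple back-transform exp(µ) is the median of the back-transformed distribution, while exp(µ)(1 + σ²/2) approximates its mean.

set.seed(1)
mu <- 2; sigma <- 0.5
w <- rnorm(1e6, mu, sigma)    # forecast distribution on the transformed (log) scale
y <- exp(w)                   # back-transformed values
median(y)                     # ~ exp(mu) = 7.39, the simple back-transformed forecast
mean(y)                       # ~ exp(mu + sigma^2/2) = 8.37
exp(mu) * (1 + sigma^2/2)     # = 8.31, the Taylor-series bias adjustment above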
Bias adjustment
fc <- rwf(eggs, drift=TRUE, lambda=0, h=50, level=80)
fc2 <- rwf(eggs, drift=TRUE, lambda=0, h=50, level=80,
biasadj=TRUE)
autoplot(eggs) +
autolayer(fc, series="Simple back transformation") +
autolayer(fc2, series="Bias adjusted", PI=FALSE) +
guides(colour=guide_legend(title="Forecast"))
[Figure: annual egg prices with drift forecasts on the log scale, comparing the simple back transformation and the bias-adjusted forecasts]
Fitted values
ŷt|t−1 is the forecast of yt based on observations y1, . . . , yt−1.
We call these “fitted values”.
Sometimes drop the subscript: ŷt ≡ ŷt|t−1.
For example:
ŷt = ȳ for average method.
ŷt = yt−1 + (yT − y1)/(T − 1) for drift method.
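In R, the one-step fitted values of these methods can be extracted with fitted() (a sketch, using the Google series from the example that follows):

fit <- rwf(goog200, drift=TRUE)
fitted(fit)    # ŷt = yt−1 + (yT − y1)/(T − 1), as stated above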
Forecasting residuals
Residuals in forecasting: difference between observed value and
its fitted value: êt = yt − ŷt|t−1.
Assumptions
1 {et} uncorrelated. If they aren’t, then information left
in residuals that should be used in computing forecasts.
2 {et} have mean zero. If they don’t, then forecasts are biased.
Useful properties (for prediction intervals)
3 {et} have constant variance.
4 {et} are normally distributed.
Example: Google stock price
autoplot(goog200) +
xlab("Day") + ylab("Closing Price (US$)") +
ggtitle("Google Stock (daily ending 6 December 2013)")
[Figure: Google Stock closing price (daily ending 6 December 2013)]
Example: Google stock price
Naïve forecast:
ŷt|t−1 = yt−1
êt = yt − yt−1
Note: êt are one-step-ahead forecast residuals
Example: Google stock price
fits <- fitted(naive(goog200))
autoplot(goog200, series="Data") +
autolayer(fits, series="Fitted") +
xlab("Day") + ylab("Closing Price (US$)") +
ggtitle("Google Stock (daily ending 6 December 2013)")
[Figure: Google Stock closing price with one-step fitted values from the naive method]
Example: Google stock price
res <- residuals(naive(goog200))
autoplot(res) + xlab("Day") + ylab("") +
ggtitle("Residuals from naive method")
[Figure: Residuals from the naive method]
Example: Google stock price
gghistogram(res, add.normal=TRUE) +
ggtitle("Histogram of residuals")
[Figure: Histogram of residuals with a normal density overlay]
Example: Google stock price
ggAcf(res) + ggtitle("ACF of residuals")
[Figure: ACF of residuals]
ACF of residuals
We assume that the residuals are white noise (uncorrelated,
mean zero, constant variance). If they aren’t, then there is
information left in the residuals that should be used in
computing forecasts.
So a standard residual diagnostic is to check the ACF of the
residuals of a forecasting method.
We expect these to look like white noise.
Portmanteau tests
Consider a whole set of rk values (the sample autocorrelations of the residuals at lags k = 1, . . . , h), and develop a test to see whether the set is significantly different from a zero set.

Box-Pierce test
Q = T Σ_{k=1}^{h} r²k,
where h is the maximum lag being considered and T is the number of observations.
If each rk close to zero, Q will be small.
If some rk values large (positive or negative), Q will be large.
Portmanteau tests
Ljung-Box test
Q* = T(T + 2) Σ_{k=1}^{h} (T − k)^{-1} r²k,
where h is max lag being considered and T is number of
observations.
My preferences: h = 10 for non-seasonal data, h = 2m for
seasonal data.
Better performance, especially in small samples.
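As a check on the formula, Q* can be computed by hand from the residual autocorrelations and compared with the built-in Box.test() used on the next slide (a sketch, reusing the naive-method residuals res from the Google example; the first residual is NA and is dropped):

e <- na.omit(res)                              # res <- residuals(naive(goog200)) from earlier
h <- 10
T <- length(e)
r <- acf(e, lag.max=h, plot=FALSE)$acf[-1]     # r_1, ..., r_h
T * (T + 2) * sum(r^2 / (T - (1:h)))           # Q* computed directly
Box.test(e, lag=h, fitdf=0, type="Ljung-Box")  # should report the same statistic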
Portmanteau tests
If the data are WN, Q* ~ χ²(h − K), where K = number of parameters in the model.
When applied to raw data, set K = 0.
For the Google example:
# lag=h and fitdf=K
Box.test(res, lag=10, fitdf=0, type="Ljung")
## Box-Ljung test
## data: res
## X-squared = 11.031, df = 10, p-value = 0.3551
checkresiduals function
checkresiduals(naive(goog200))
[Figure: checkresiduals() output for the naive method: residual time plot, ACF and histogram]
## Ljung-Box test
## data: Residuals from Naive method
## Q* = 11.031, df = 10, p-value = 0.3551
## Model df: 0. Total lags used: 10
Self-Practice
Compute seasonal naive forecasts for quarterly Australian beer
production from 1992.
beer <- window(ausbeer, start=1992)
fc <- snaive(beer)
autoplot(fc)
Test if the residuals are white noise.
checkresiduals(fc)
What do you conclude?
Partitioning
Partitioning is the 5th step in the forecasting process.
It refers to the splitting of the data set into two parts:
training set (70% - 80%) and test (validation) set.
We develop our forecasting model/method using the training
set (e.g. model identification and estimation).
The developed model is then used to produce forecasts for
the test set period. Actual values in the test set are then
compared with the forecasted values for the test set to
compute forecast errors.
These forecast errors are then summarized to produce
measures of forecasting accuracy.
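For example, the train/test split used in the worked R example at the end of this section can be created with window() (a sketch):

train <- window(ausbeer, start=1992, end=c(2007,4))   # training set
test  <- window(ausbeer, start=2008)                  # test set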
Training and test sets
[Diagram: the series split into an initial training period followed by a test period]
A model which fits the training data well will not necessarily
forecast well.
A perfect fit can always be obtained by using a model with
enough parameters.
Over-fitting a model to data is just as bad as failing to identify
a systematic pattern in the data.
The test set must not be used for any aspect of model
development or calculation of forecasts.
Forecast accuracy is based only on the test set.
Forecast errors
Forecast “error”: the difference between an observed value and
its forecast.
eT+h = yT+h − ŷT+h|T,
where the training data is given by {y1, . . . , yT}.
Unlike residuals, forecast errors on the test set involve
multi-step forecasts.
These are true forecast errors as the test data is not used in
computing ŷT+h|T.
Measures of forecast accuracy
[Figure: Forecasts for quarterly beer production (seasonal naive method shown)]
Measures of forecast accuracy
yT+h = (T + h)th observation, h = 1, . . . ,H
ŷT+h|T = its forecast based on data up to time T.
eT+h = yT+h − ŷT+h|T
1. Mean Absolute Error: MAE = (1/H) Σ_{h=1}^{H} |eT+h|
2. Mean Squared Error: MSE = (1/H) Σ_{h=1}^{H} e²T+h
Measures of forecast accuracy
yT+h = (T + h)th observation, h = 1, . . . ,H
ŷT+h|T = its forecast based on data up to time T.
eT+h = yT+h − ŷT+h|T
3. Root Mean Squared Error: RMSE = √[(1/H) Σ_{h=1}^{H} e²T+h]
4. Mean Absolute Percentage Error: MAPE = (100/H) Σ_{h=1}^{H} |eT+h / yT+h|
Measures of forecast accuracy
MAE, MSE, RMSE are all scale dependent.
MAPE is scale independent but is only sensible if yt ≫ 0 for all t.
The Mean Absolute Scaled Error (MASE) was subsequently developed by Hyndman and Koehler (IJF, 2006) to handle zero counts and to be scale independent.
Measures of forecast accuracy
5. Mean Absolute Scaled Error
MASE = MAE / Q,
where Q is a stable measure of the scale of the time series {yt}.
For non-seasonal time series,
Q = (T − 1)^{-1} Σ_{t=2}^{T} |yt − yt−1|
works well. Then MASE is equivalent to MAE relative to a naive method.
Measures of forecast accuracy
5. Mean Absolute Scaled Error
MASE = MAE / Q,
where Q is a stable measure of the scale of the time series {yt}.
For seasonal time series,
Q = (T − m)^{-1} Σ_{t=m+1}^{T} |yt − yt−m|
works well. Then MASE is equivalent to MAE relative to a seasonal naive method.
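A sketch of MASE computed by hand for a seasonal naive forecast of the beer data (the same split as the R example below), scaling the test-set MAE by the in-sample seasonal naive MAE Q:

train <- window(ausbeer, start=1992, end=c(2007,4))
test  <- window(ausbeer, start=2008)
fc    <- snaive(train, h=length(test))
mae   <- mean(abs(test - fc$mean))          # test-set MAE
m     <- frequency(train)                   # m = 4 for quarterly data
Q     <- mean(abs(diff(train, lag=m)))      # (T − m)^{-1} Σ |yt − yt−m|
mae / Q                                     # MASE; should match accuracy() below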
Measures of forecast accuracy - Computing in R
beer2 <- window(ausbeer, start=1992, end=c(2007,4)) #training
beer3 <- window(ausbeer, start=2008) #test
beerfit1 <- meanf(beer2, h=10)
beerfit2 <- rwf(beer2, h=10)
beerfit3 <- snaive(beer2, h=10)
accuracy(beerfit1, beer3)
accuracy(beerfit2, beer3)
accuracy(beerfit3, beer3)
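accuracy() returns a matrix with a "Training set" and a "Test set" row; method comparison should be based on the test-set row. A sketch of pulling out just the test-set measures:

accuracy(beerfit1, beer3)["Test set", c("RMSE", "MAE", "MAPE", "MASE")]
accuracy(beerfit2, beer3)["Test set", c("RMSE", "MAE", "MAPE", "MASE")]
accuracy(beerfit3, beer3)["Test set", c("RMSE", "MAE", "MAPE", "MASE")]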