
ETW3420: Principles of Forecasting and Applications

Topic 3: The Forecaster’s Toolbox

1 Some simple forecasting methods

2 Box-Cox transformations

3 Residual diagnostics

4 Evaluating forecast accuracy: The traditional approach

5 Evaluating forecast accuracy: The modern approach

6 Prediction intervals

Some simple forecasting methods

[Figure: Australian quarterly beer production, 1995–2010.]

How would you forecast these data?

Some simple forecasting methods

[Figure: Dow-Jones index (daily).]

How would you forecast these data?

Some simple forecasting methods

1. Average method
Forecast of all future values is equal to mean of
historical data {y1, . . . , yT}.
Forecasts: ŷT+h|T = ȳ = (y1 + · · · + yT)/T

2. Naive method
Forecasts equal to last observed value.
Forecasts: ŷT+h|T = yT.
Consequence of efficient market hypothesis.

3. Seasonal naive method
Forecasts equal to last value from same season.
Forecasts: ŷT+h|T = yT+h−m(k+1), where m = seasonal
period and k is the integer part of (h − 1)/m.
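
To make the seasonal naive indexing concrete, here is a small illustrative sketch in R (it assumes the fpp2 package, which supplies ausbeer and snaive(); the horizon h = 8 is an arbitrary choice): with m = 4, every forecast simply repeats the corresponding quarter from the last observed year.

library(fpp2)                      # assumed: provides ausbeer and the forecast functions
beer <- window(ausbeer, start=1992)
fc <- snaive(beer, h=8)            # h = 1,...,8, so k = 0 for the first year ahead, k = 1 for the second
tail(beer, 4)                      # last observed year of quarterly values
fc$mean                            # the same four quarterly values, repeated twice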


Some simple forecasting methods

4. Drift method
Forecasts equal to last value plus average change.
Forecasts:

ŷT+h|T = yT + (h/(T − 1)) ∑_{t=2}^{T} (yt − yt−1) = yT + (h/(T − 1)) (yT − y1).

Equivalent to extrapolating a line drawn between first and
last observations.
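
As a hedged sketch (assuming the fpp2 package is loaded, which supplies ausbeer and these functions), the four benchmark methods above map directly onto meanf(), naive(), snaive() and rwf(..., drift=TRUE); the horizon h = 11 is an arbitrary illustrative choice.

library(fpp2)
beer <- window(ausbeer, start=1992)
autoplot(beer) +
  autolayer(meanf(beer, h=11),           series="Mean",           PI=FALSE) +
  autolayer(naive(beer, h=11),           series="Naive",          PI=FALSE) +
  autolayer(snaive(beer, h=11),          series="Seasonal naive", PI=FALSE) +
  autolayer(rwf(beer, h=11, drift=TRUE), series="Drift",          PI=FALSE) +
  ggtitle("Forecasts for quarterly beer production")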

Some simple forecasting methods

[Figure: Forecasts for quarterly beer production (including the seasonal naive method), 1995–2010.]

Some simple forecasting methods

[Figure: Dow Jones Index (daily, ending 15 Jul 94).]


Variance stabilization

If the data shows different variation at different levels
of the series, then a transformation can be useful.

Denote original observations as y1, . . . , yn and
transformed observations as w1, . . . ,wn.

Mathematical transformations for stabilizing variation

Square root: wt = √yt
Cube root: wt = yt^(1/3)
Logarithm: wt = log(yt)
(These are listed in increasing order of strength.)

Logarithms, in particular, are useful because they are more
interpretable: changes in a log value are relative (percent)
changes on the original scale.
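
A minimal sketch of these transformations applied to the Australian electricity data (assuming fpp2's elec series); the plots summarised below can be reproduced along these lines when run interactively, one panel per call.

library(fpp2)
autoplot(elec)          # original series: variation grows with the level
autoplot(sqrt(elec))    # square root
autoplot(elec^(1/3))    # cube root
autoplot(log(elec))     # logarithm
autoplot(-1/elec)       # inverse, negated so the transformed series still trends upward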


Variance stabilization

[Figures: Australian electricity production, 1960–1990, on the original scale and after square root, cube root, log and inverse transformations.]

Box-Cox transformations

Each of these transformations is close to a member of the family
of Box-Cox transformations:

wt = log(yt), λ = 0;
wt = (yt^λ − 1)/λ, λ ≠ 0.

λ = 1: (No substantive transformation)
λ = 1/2: (Square root plus linear transformation)
λ = 0: (Natural logarithm)
λ = −1: (Inverse plus 1)
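
A small sketch (assuming the forecast package via fpp2, and the positive-valued elec series) comparing a hand-computed Box-Cox transform with BoxCox(), and checking that InvBoxCox() reverses it; the two implementations should agree for strictly positive data.

library(fpp2)
lambda <- 1/3
manual <- (elec^lambda - 1) / lambda                     # (y^lambda - 1)/lambda, since lambda != 0
range(manual - BoxCox(elec, lambda))                     # differences should be essentially zero
range(InvBoxCox(BoxCox(elec, lambda), lambda) - elec)    # back-transforming recovers the original series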


Box-Cox transformations

autoplot(BoxCox(elec, lambda=1/3))

[Figure: Box-Cox transformed (λ = 1/3) Australian electricity production, 1960–1990.]

Box-Cox transformations

yt^λ for λ close to zero behaves like logs.
If some yt = 0, then we must have λ > 0.
If some yt < 0, no power transformation is possible unless all yt are adjusted by adding a constant to every value.
Simple values of λ are easier to explain.
Results are relatively insensitive to λ.
Often no transformation (λ = 1) is needed.
The transformation can have a very large effect on prediction intervals.
Choosing λ = 0 is a simple way to force forecasts to be positive.

Automated Box-Cox transformations

(BoxCox.lambda(elec))
## [1] 0.2654076

This attempts to balance the seasonal fluctuations and random variation across the series.
Always check the results.
A low value of λ can give extremely large prediction intervals.

Back-transformation

We must reverse the transformation (or back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformations are given by

yt = exp(wt), λ = 0;
yt = (λwt + 1)^(1/λ), λ ≠ 0.

Back-transformation

fit <- snaive(elec, lambda=1/3)
autoplot(fit)

[Figure: Forecasts from the seasonal naive method, fitted on the transformed scale and back-transformed to the original scale.]

Back-transformation

autoplot(fit, include=120)

[Figure: The same forecasts, showing only the last 120 observations.]

Bias adjustment

Back-transformed point forecasts are medians.
Back-transformed prediction intervals have the correct coverage.

Back-transformed means
Let X have mean µ and variance σ². Let f(x) be the back-transformation function, and Y = f(X). A Taylor series expansion about the point X = µ gives

f(X) ≈ f(µ) + (X − µ)f′(µ) + ½(X − µ)²f″(µ),

so that

E[Y] = E[f(X)] ≈ f(µ) + ½σ²f″(µ).

Bias adjustment

Box-Cox back-transformation:

f(x) = exp(x), λ = 0; f(x) = (λx + 1)^(1/λ), λ ≠ 0.
f″(x) = exp(x), λ = 0; f″(x) = (1 − λ)(λx + 1)^(1/λ − 2), λ ≠ 0.

Hence the bias-adjusted (mean) back-transformed forecast is

E[Y] ≈ exp(µ)(1 + σ²/2), λ = 0;
E[Y] ≈ (λµ + 1)^(1/λ) [1 + σ²(1 − λ)/(2(λµ + 1)²)], λ ≠ 0.

Bias adjustment

fc <- rwf(eggs, drift=TRUE, lambda=0, h=50, level=80)
fc2 <- rwf(eggs, drift=TRUE, lambda=0, h=50, level=80, biasadj=TRUE)
autoplot(eggs) +
  autolayer(fc, series="Simple back transformation") +
  autolayer(fc2, series="Bias adjusted", PI=FALSE) +
  guides(colour=guide_legend(title="Forecast"))

[Figure: Egg prices, 1900–2050, with log-transformed drift forecasts: simple back transformation (median) vs bias adjusted (mean).]

Fitted values

ŷt|t−1 is the forecast of yt based on observations y1, . . . , yt−1.
We call these "fitted values".
Sometimes we drop the subscript: ŷt ≡ ŷt|t−1.
For example:
ŷt = ȳ for the average method.
ŷt = yt−1 + (yT − y1)/(T − 1) for the drift method.
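
A quick sketch (assuming fpp2's goog200 series) checking the drift-method fitted values against the formula above; fitted() on rwf(..., drift=TRUE) should reproduce yt−1 plus the average historical change, up to small numerical details.

library(fpp2)
fit <- rwf(goog200, drift=TRUE)
n <- length(goog200)
slope <- (goog200[n] - goog200[1]) / (n - 1)        # average change per period
manual <- c(NA, as.numeric(goog200)[-n] + slope)    # y_{t-1} plus the average change
head(cbind(fitted = as.numeric(fitted(fit)), manual = manual))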
Forecasting residuals

Residuals in forecasting: the difference between an observed value and its fitted value: êt = yt − ŷt|t−1.

Assumptions
1 {et} uncorrelated. If they aren't, then there is information left in the residuals that should be used in computing forecasts.
2 {et} have mean zero. If they don't, then the forecasts are biased.

Useful properties (for prediction intervals)
3 {et} have constant variance.
4 {et} are normally distributed.

Example: Google stock price

autoplot(goog200) + xlab("Day") + ylab("Closing Price (US$)") +
  ggtitle("Google Stock (daily ending 6 December 2013)")

[Figure: Google stock closing price (US$), days 0–200.]

Example: Google stock price

Naïve forecast:
ŷt|t−1 = yt−1
êt = yt − yt−1
Note: êt are one-step-ahead forecast residuals.

Example: Google stock price

fits <- fitted(naive(goog200))
autoplot(goog200, series="Data") +
  autolayer(fits, series="Fitted") +
  xlab("Day") + ylab("Closing Price (US$)") +
  ggtitle("Google Stock (daily ending 6 December 2013)")

[Figure: Google stock price with one-step-ahead fitted values from the naive method.]

Example: Google stock price

res <- residuals(naive(goog200))
autoplot(res) + xlab("Day") + ylab("") +
  ggtitle("Residuals from naive method")

[Figure: Residuals from the naive method, days 0–200.]

Example: Google stock price

gghistogram(res, add.normal=TRUE) + ggtitle("Histogram of residuals")

[Figure: Histogram of residuals with a normal curve overlaid.]

Example: Google stock price

ggAcf(res) + ggtitle("ACF of residuals")

[Figure: ACF of the residuals, lags 1–20.]

ACF of residuals

We assume that the residuals are white noise (uncorrelated, mean zero, constant variance).
If they aren't, then there is information left in the residuals that should be used in computing forecasts.
So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method. We expect these to look like white noise.

Portmanteau tests

Consider a whole set of rk values, and develop a test to see whether the set is significantly different from a zero set.

Box-Pierce test
Q = T ∑_{k=1}^{h} r_k²,
where h is the maximum lag being considered and T is the number of observations.
If each rk is close to zero, Q will be small.
If some rk values are large (positive or negative), Q will be large.
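
As a sketch of the Box-Pierce statistic (assuming the naive residuals from the Google example, with the leading NA removed, and h = 10 as in the text), Q can be computed directly from the sample autocorrelations and compared with Box.test(); the two should agree.

library(fpp2)
res <- na.omit(residuals(naive(goog200)))
h  <- 10                                          # maximum lag considered
Tn <- length(res)                                 # number of observations
rk <- acf(res, lag.max=h, plot=FALSE)$acf[-1]     # r_1, ..., r_h (drop lag 0)
Tn * sum(rk^2)                                    # hand-computed Box-Pierce Q
Box.test(res, lag=h, type="Box-Pierce")           # should report the same statistic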
Portmanteau tests

Consider a whole set of rk values, and develop a test to see whether the set is significantly different from a zero set.

Ljung-Box test
Q* = T(T + 2) ∑_{k=1}^{h} (T − k)⁻¹ r_k²,
where h is the maximum lag being considered and T is the number of observations.
My preferences: h = 10 for non-seasonal data, h = 2m for seasonal data.
Better performance, especially in small samples.

Portmanteau tests

If the data are WN, Q* ∼ χ²(h − K), where K = number of parameters in the model.
When applied to raw data, set K = 0.
For the Google example:

# lag=h and fitdf=K
Box.test(res, lag=10, fitdf=0, type="Ljung")

## Box-Ljung test
## data: res
## X-squared = 11.031, df = 10, p-value = 0.3551

checkresiduals function

checkresiduals(naive(goog200))

[Figure: residual time plot, ACF (lags 1–20) and histogram of residuals from the naive method.]

## Ljung-Box test
## data: Residuals from Naive method
## Q* = 11.031, df = 10, p-value = 0.3551
## Model df: 0. Total lags used: 10

Self-Practice

Compute seasonal naive forecasts for quarterly Australian beer production from 1992.

beer <- window(ausbeer, start=1992)
fc <- snaive(beer)
autoplot(fc)

Test if the residuals are white noise.

checkresiduals(fc)

What do you conclude?

Partitioning

Partitioning is the 5th step in the forecasting process. It refers to splitting the data set into two parts: a training set (70%–80%) and a test (validation) set.
We develop our forecasting model/method using the training set (e.g. model identification and estimation).
The developed model is then used to produce forecasts for the test-set period.
Actual values in the test set are then compared with the forecast values for the test set to compute forecast errors.
These forecast errors are then summarised to produce measures of forecasting accuracy.

Training and test sets

[Diagram: the series split in time into training data followed by test data.]

A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.
The test set must not be used for any aspect of model development or calculation of forecasts.
Forecast accuracy is based only on the test set.

Forecast errors

Forecast "error": the difference between an observed value and its forecast.
eT+h = yT+h − ŷT+h|T, where the training data are given by {y1, . . . , yT}.
Unlike residuals, forecast errors on the test set involve multi-step forecasts.
These are true forecast errors, as the test data are not used in computing ŷT+h|T.

Measures of forecast accuracy

[Figure: Forecasts for quarterly beer production, 1995–2010.]

Measures of forecast accuracy

yT+h = (T + h)th observation, h = 1, . . . , H
ŷT+h|T = its forecast based on data up to time T
eT+h = yT+h − ŷT+h|T

1. Mean Absolute Error: MAE = mean(|eT+h|)
2. Mean Squared Error: MSE = mean((eT+h)²)
3. Root Mean Squared Error: RMSE = √MSE
4. Mean Absolute Percentage Error: MAPE = 100 mean(|eT+h / yT+h|)
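
A hedged sketch computing these measures by hand for seasonal naive forecasts of the beer data (assuming fpp2; the training/test split anticipates the one used in the R code later in this section). The values should match the test-set columns reported by accuracy().

library(fpp2)
beer2 <- window(ausbeer, start=1992, end=c(2007,4))   # training set
beer3 <- window(ausbeer, start=2008)                  # test set
fc <- snaive(beer2, h=length(beer3))
e  <- as.numeric(beer3) - as.numeric(fc$mean)         # test-set forecast errors e_{T+h}
c(MAE  = mean(abs(e)),
  RMSE = sqrt(mean(e^2)),
  MAPE = 100 * mean(abs(e / as.numeric(beer3))))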
Measures of forecast accuracy

MAE, MSE and RMSE are all scale dependent.
MAPE is scale independent, but is only sensible if yt ≫ 0 for all t.
The Mean Absolute Scaled Error (MASE) was subsequently developed by Hyndman and Koehler (IJF, 2006) to be able to handle zero counts, and also to be a scale-independent measure.

Measures of forecast accuracy

5. Mean Absolute Scaled Error: MASE = MAE / Q,
where Q is a stable measure of the scale of the time series {yt}.
For non-seasonal time series, Q = (T − 1)⁻¹ ∑_{t=2}^{T} |yt − yt−1| works well. Then MASE is equivalent to MAE relative to a naive method.
For seasonal time series, Q = (T − m)⁻¹ ∑_{t=m+1}^{T} |yt − yt−m| works well. Then MASE is equivalent to MAE relative to a seasonal naive method.

Measures of forecast accuracy

[Figure: Forecasts for quarterly beer production, 1995–2010.]

Measures of forecast accuracy - Computing in R

beer2 <- window(ausbeer, start=1992, end=c(2007,4)) # training
beer3 <- window(ausbeer, start=2008)                # test
beerfit1 <- meanf(beer2, h=10)
beerfit2 <- rwf(beer2, h=10)
beerfit3 <- snaive(beer2, h=10)
accuracy(beerfit1, beer3)
accuracy(beerfit2, beer3)
accuracy(beerfit3, beer3)
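
To connect the MASE definition with accuracy()'s output, here is a hedged sketch (assuming fpp2 and the same training/test split as above; the scaling uses the seasonal Q with m = 4, which accuracy() should also use for quarterly data).

library(fpp2)
beer2 <- window(ausbeer, start=1992, end=c(2007,4))   # training set (as above)
beer3 <- window(ausbeer, start=2008)                  # test set (as above)
fc <- snaive(beer2, h=10)
m <- 4                                                # quarterly seasonal period
Q <- mean(abs(diff(as.numeric(beer2), lag=m)))        # (T - m)^(-1) * sum |y_t - y_{t-m}|
e <- as.numeric(beer3) - as.numeric(fc$mean)          # test-set forecast errors
mean(abs(e)) / Q                                      # hand-computed MASE
accuracy(fc, beer3)["Test set", "MASE"]               # should agree with the value above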