ETC3231/5231 Business forecasting
Ch5. The forecasters’ toolbox OTexts.org/fpp3/
Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation
A tidy forecasting workflow
The process of producing forecasts can be split up into a few fundamental steps.
1 Preparing data
2 Data visualisation
3 Specifying a model
4 Model estimation
5 Accuracy & performance evaluation
6 Producing forecasts
A tidy forecasting workflow
[Diagram: the workflow cycle: Tidy, Visualise, Specify, Estimate, Forecast, Evaluate]
Data preparation (tidy)
gdppc <- global_economy %>%
mutate(GDP_per_capita = GDP/Population) %>%
select(Year, Country, GDP, Population, GDP_per_capita)
gdppc
## # A tsibble: 15,150 x 5 [1Y]
## # Key: Country [263]
##    Year Country             GDP Population GDP_per_capita
##  1 1960 Afghanistan  537777811.    8996351           59.8
##  2 1961 Afghanistan  548888896.    9166764           59.9
##  3 1962 Afghanistan  546666678.    9345868           58.5
##  4 1963 Afghanistan  751111191.    9533954           78.8
##  5 1964 Afghanistan  800000044.    9731361           82.2
##  6 1965 Afghanistan 1006666638.    9938414          101.
Data visualisation
gdppc %>%
  filter(Country == "Sweden") %>%
  autoplot(GDP_per_capita) +
  labs(title = "GDP per capita for Sweden", y = "$US")
[Figure: GDP per capita for Sweden, $US vs Year]
Model estimation
The model() function trains models to data.
fit <- gdppc %>%
  model(trend_model = TSLM(GDP_per_capita ~ trend()))
fit
## # A mable: 263 x 2
## # Key: Country [263]
##   Country        trend_model
## 1 Afghanistan         <TSLM>
## 2 Albania             <TSLM>
## 3 Algeria             <TSLM>
## 4 American Samoa      <TSLM>
## 5 Andorra             <TSLM>
## 6 Angola              <TSLM>
A mable is a model table; each cell corresponds to a fitted model.
Producing forecasts
fit %>% forecast(h = "3 years")
## # A fable: 789 x 5 [1Y]
## # Key: Country, .model [263]
##   Country     .model       Year  GDP_per_capita .mean
## 1 Afghanistan trend_model  2018    N(526, 9653)  526.
## 2 Afghanistan trend_model  2019    N(534, 9689)  534.
## 3 Afghanistan trend_model  2020    N(542, 9727)  542.
## 4 Albania     trend_model  2018 N(4716, 476419) 4716.
## 5 Albania     trend_model  2019 N(4867, 481086) 4867.
## 6 Albania     trend_model  2020 N(5018, 486012) 5018.
## 7 Algeria     trend_model  2018 N(4410, 643094) 4410.
## 8 Algeria     trend_model  2019 N(4489, 645311) 4489.
## 9 Algeria     trend_model  2020 N(4568, 647602) 4568.
A fable is a forecast table with point forecasts and distributions.
Visualising forecasts
fit %>% forecast(h = "3 years") %>%
  filter(Country == "Sweden") %>%
  autoplot(gdppc) +
  labs(title = "GDP per capita for Sweden", y = "$US")
[Figure: GDP per capita for Sweden with three-year trend forecasts and 80%/95% prediction intervals]
Some simple forecasting methods
MEAN(y): Average method
Forecast of all future values is equal to the mean of the historical data {y1, . . . , yT}.
Forecasts: ŷT+h|T = ȳ = (y1 + · · · + yT)/T.
[Figure: Clay brick production in Australia, Bricks vs Quarter, with mean forecasts]
Some simple forecasting methods
NAIVE(y): Naïve method
Forecasts equal to last observed value.
Forecasts: ŷT+h|T = yT.
Consequence of the efficient market hypothesis.
[Figure: Clay brick production in Australia, 1995 Q1 to 2010 Q1, with naive forecasts]
Some simple forecasting methods
SNAIVE(y ~ lag(m)): Seasonal naïve method
Forecasts equal to last value from the same season.
Forecasts: ŷT+h|T = yT+h−m(k+1), where m = seasonal period and k is the integer part of (h − 1)/m.
[Figure: Clay brick production in Australia, 1995 Q1 to 2010 Q1, with seasonal naive forecasts]
Some simple forecasting methods
RW(y ~ drift()): Drift method
Forecasts equal to last value plus average change.
Forecasts:
ŷT+h|T = yT + (h/(T − 1)) Σ_{t=2}^{T} (yt − yt−1)
       = yT + h(yT − y1)/(T − 1).
Equivalent to extrapolating a line drawn between the first and last observations.
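The four benchmark methods can be sketched in a few lines of base R. The toy series and hand-computed forecasts below are illustrative only; in practice you would use MEAN(), NAIVE(), SNAIVE() and RW(y ~ drift()) from fable.

```r
# Toy quarterly series (m = 4): two years of invented data.
y <- c(10, 12, 14, 11, 13, 15, 17, 14)
n <- length(y)   # n plays the role of T in the formulas
m <- 4; h <- 1

mean_fc   <- mean(y)                              # average method: y-bar
naive_fc  <- y[n]                                 # naive: last observation
k         <- floor((h - 1) / m)
snaive_fc <- y[n + h - m * (k + 1)]               # seasonal naive: same quarter last year
drift_fc  <- y[n] + h * (y[n] - y[1]) / (n - 1)   # drift: last value plus average change
```

For h = 1 the seasonal naive forecast is simply the observation one full season back; for larger h the k term keeps the index pointing at the most recent same-season value.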
Some simple forecasting methods
Drift method
[Figure: Clay brick production in Australia, Bricks vs Quarter, with drift forecasts]
Model fitting
The model() function trains models to data.
brick_fit <- aus_production %>%
filter(!is.na(Bricks)) %>%
model(
Seasonal_naive = SNAIVE(Bricks),
Naive = NAIVE(Bricks),
Drift = RW(Bricks ~ drift()),
Mean = MEAN(Bricks)
)
## # A mable: 1 x 4
##   Seasonal_naive    Naive         Drift    Mean
## 1       <SNAIVE>  <NAIVE> <RW w/ drift>  <MEAN>
Producing forecasts
brick_fc <- brick_fit %>%
  forecast(h = "5 years")
## # A fable: 80 x 4 [1Q]
## # Key: .model [4]
##   .model         Quarter       Bricks .mean
## 1 Seasonal_naive 2005 Q3 N(428, 2336)   428
## 2 Seasonal_naive 2005 Q4 N(397, 2336)   397
## 3 Seasonal_naive 2006 Q1 N(355, 2336)   355
## 4 Seasonal_naive 2006 Q2 N(435, 2336)   435
## # … with 76 more rows
Visualising forecasts
brick_fc %>%
  autoplot(aus_production, level = NULL) +
  labs(title = "Clay brick production in Australia",
       y = "Millions of bricks") +
  guides(colour = guide_legend(title = "Forecast"))
[Figure: Clay brick production in Australia, millions of bricks vs Quarter, with Drift, Mean, Naive and Seasonal_naive forecasts]
Facebook closing stock price
# Extract training data
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
# Specify, estimate and forecast
fb_stock %>%
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 42) %>%
  autoplot(fb_stock, level = NULL) +
  labs(title = "Facebook closing stock price", y = "$US") +
  guides(colour = guide_legend(title = "Forecast"))
Facebook closing stock price
[Figure: Facebook closing stock price, $US vs trading_day, with Drift, Mean and Naive forecasts]
Fitted values
ŷt|t−1 is the forecast of yt based on observations y1, . . . , yt−1.
We call these "fitted values".
Sometimes we drop the subscript: ŷt ≡ ŷt|t−1.
They are often not true forecasts, since parameters are estimated on all data.
For example:
ŷt = ȳ for the average method.
ŷt = yt−1 + (yT − y1)/(T − 1) for the drift method.
Forecasting residuals
Residuals in forecasting: the difference between an observed value and its fitted value: et = yt − ŷt|t−1.
Assumptions
1 {et} uncorrelated. If they aren't, then information is left in the residuals that should be used in computing forecasts.
2 {et} have mean zero. If they don't, then the forecasts are biased.
Useful properties (for distributions & prediction intervals)
3 {et} have constant variance.
4 {et} are normally distributed.
Facebook closing stock price
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
fb_stock %>% autoplot(Close)
[Figure: Facebook closing stock price, Close vs trading_day]
Facebook closing stock price
fit <- fb_stock %>% model(NAIVE(Close))
augment(fit)
## # A tsibble: 1,258 x 7 [1]
## # Key: Symbol, .model [1]
##   Symbol .model     trading_day Close .fitted .resid .innov
## 1 FB     NAIVE(Clo~           1  54.7    NA    NA     NA
## 2 FB     NAIVE(Clo~           2  54.6    54.7  -0.150 -0.150
## 3 FB     NAIVE(Clo~           3  57.2    54.6   2.64   2.64
## 4 FB     NAIVE(Clo~           4  57.9    57.2   0.720  0.720
## 5 FB     NAIVE(Clo~           5  58.2    57.9   0.310  0.310
## 6 FB     NAIVE(Clo~           6  57.2    58.2  -1.01  -1.01
## 7 FB     NAIVE(Clo~           7  57.9    57.2   0.720  0.720
## 8 FB     NAIVE(Clo~           8  55.9    57.9  -2.03  -2.03
## 9 FB     NAIVE(Clo~           9  57.7    55.9   1.83   1.83
Naïve forecasts: ŷt|t−1 = yt−1, so et = yt − ŷt|t−1 = yt − yt−1.
Facebook closing stock price
augment(fit) %>%
  ggplot(aes(x = trading_day)) +
  geom_line(aes(y = Close, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted"))
[Figure: Close and fitted values vs trading_day]
Facebook closing stock price
augment(fit) %>%
  filter(trading_day > 1100) %>%
  ggplot(aes(x = trading_day)) +
  geom_line(aes(y = Close, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted"))
[Figure: Close and fitted values for trading_day > 1100]
Facebook closing stock price
augment(fit) %>%
  autoplot(.resid) +
  labs(y = "$US",
       title = "Residuals from naïve method")
[Figure: Residuals from the naive method, $US vs trading_day]
Facebook closing stock price
augment(fit) %>%
  ggplot(aes(x = .resid)) +
  geom_histogram(bins = 150) +
  labs(title = "Histogram of residuals")
[Figure: Histogram of residuals]
Facebook closing stock price
augment(fit) %>%
  ACF(.resid) %>%
  autoplot() + labs(title = "ACF of residuals")
[Figure: ACF of residuals]
gg_tsresiduals() function
gg_tsresiduals(fit)
[Figure: residual diagnostic panels: residual time plot, ACF of residuals, and histogram of residuals]
ACF of residuals
We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren’t, then there is information left in the residuals that should be used in computing forecasts.
So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method.
We expect these to look like white noise.
Portmanteau tests
Consider a whole set of rk values, and develop a test to see whether the set is significantly different from a zero set.
Box-Pierce test
Q = T Σ_{k=1}^{l} rk²
where l is the maximum lag being considered and T is the number of observations.
If each rk is close to zero, Q will be small.
If some rk values are large (positive or negative), Q will be large.
Ljung-Box test
Q* = T(T + 2) Σ_{k=1}^{l} (T − k)⁻¹ rk²
where l is the maximum lag being considered and T is the number of observations.
My preferences: l = 10 for non-seasonal data, l = 2m for seasonal data.
Better performance, especially in small samples.
Portmanteau tests
If the data are WN, Q* has a χ² distribution with (l − K) degrees of freedom, where K = number of parameters in the model.
When applied to raw data, set K = 0.
In features(): lag = l, dof = K.
augment(fit) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 x 4
##   Symbol .model       lb_stat lb_pvalue
## 1 FB     NAIVE(Close)    12.1     0.276
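Both statistics can also be computed directly from the residual autocorrelations, which makes the formulas concrete. This sketch uses simulated white noise as a stand-in for real residuals; Box.test() in base R's stats package computes the same Ljung-Box value.

```r
set.seed(1)
e <- rnorm(200)                                  # stand-in for model residuals
n <- length(e); l <- 10
r <- acf(e, lag.max = l, plot = FALSE)$acf[-1]   # r_1, ..., r_l (drop lag 0)

Q      <- n * sum(r^2)                           # Box-Pierce statistic
Q_star <- n * (n + 2) * sum(r^2 / (n - 1:l))     # Ljung-Box statistic
p_val  <- 1 - pchisq(Q_star, df = l)             # K = 0 for raw data
```

Since (T + 2)/(T − k) > 1 for every k, Q* is always a little larger than Q, which is where its better small-sample behaviour comes from.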
Forecast distributions
A forecast ŷT+h|T is (usually) the mean of the conditional distribution yT+h | y1, . . . , yT.
Most time series models produce normally distributed forecasts.
The forecast distribution describes the probability of observing any future value.
Forecast distributions
Assuming residuals are normal and uncorrelated, with standard deviation σ̂:
Mean: ŷT+h|T ~ N(ȳ, (1 + 1/T)σ̂²)
Naïve: ŷT+h|T ~ N(yT, hσ̂²)
Seasonal naïve: ŷT+h|T ~ N(yT+h−m(k+1), (k + 1)σ̂²)
Drift: ŷT+h|T ~ N(yT + h(yT − y1)/(T − 1), (h(T + h)/T)σ̂²)
where k is the integer part of (h − 1)/m.
Note that when h = 1 and T is large, these all give the same approximate forecast variance: σ̂².
Prediction intervals
A prediction interval gives a region within which we expect yT+h to lie with a specified probability.
Assuming forecast errors are normally distributed, a 95% PI is
ŷT+h|T ± 1.96 σ̂h
where σ̂h is the standard deviation of the h-step distribution.
When h = 1, σ̂h can be estimated from the residuals.
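The arithmetic behind a 95% interval is worth doing once by hand. This sketch builds an h-step naive interval on a simulated random walk; fable's hilo() output will differ slightly because of small degrees-of-freedom corrections in its residual variance estimate.

```r
set.seed(42)
y <- cumsum(rnorm(100))       # simulated random walk
e <- diff(y)                  # naive residuals: y_t - y_{t-1}
sigma <- sd(e)                # residual standard deviation, sigma-hat
h <- 4
sigma_h <- sigma * sqrt(h)    # naive method: h-step sd grows like sqrt(h)
lower <- y[length(y)] - 1.96 * sigma_h
upper <- y[length(y)] + 1.96 * sigma_h
```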
Prediction intervals
brick_fc %>% hilo(level = 95)
## # A tsibble: 80 x 5 [1Q]
## # Key: .model [4]
##    .model         Quarter       Bricks .mean          95%
##  1 Seasonal_naive 2005 Q3 N(428, 2336)   428 [333, 523]95
##  2 Seasonal_naive 2005 Q4 N(397, 2336)   397 [302, 492]95
##  3 Seasonal_naive 2006 Q1 N(355, 2336)   355 [260, 450]95
##  4 Seasonal_naive 2006 Q2 N(435, 2336)   435 [340, 530]95
##  5 Seasonal_naive 2006 Q3 N(428, 4672)   428 [294, 562]95
##  6 Seasonal_naive 2006 Q4 N(397, 4672)   397 [263, 531]95
##  7 Seasonal_naive 2007 Q1 N(355, 4672)   355 [221, 489]95
##  8 Seasonal_naive 2007 Q2 N(435, 4672)   435 [301, 569]95
##  9 Seasonal_naive 2007 Q3 N(428, 7008)   428 [264, 592]95
## 10 Seasonal_naive 2007 Q4 N(397, 7008)   397 [233, 561]95
Prediction intervals
Point forecasts are often useless without a measure of uncertainty (such as prediction intervals).
Prediction intervals require a stochastic model (with random errors, etc).
For most models, prediction intervals get wider as the forecast horizon increases.
Use the level argument to control coverage.
Check residual assumptions before believing them.
Usually too narrow due to unaccounted uncertainty.
Modelling with transformations
eggs <- prices %>%
  filter(!is.na(eggs)) %>% select(eggs)
eggs %>% autoplot() +
  labs(title = "Annual egg prices",
       y = "$US (adjusted for inflation)")
[Figure: Annual egg prices, $US (adjusted for inflation) vs year]
Modelling with transformations
Transformations used on the left of the formula will be automatically back-transformed. To model log-transformed egg prices, you could use:
fit <- eggs %>%
  model(RW(log(eggs) ~ drift()))
fit
## # A mable: 1 x 1
##   RW(log(eggs) ~ drift())
## 1           <RW w/ drift>
Forecasting with transformations
fc <- fit %>%
  forecast(h = 50)
fc
## # A fable: 50 x 4 [1Y]
## # Key: .model [1]
##   .model                   year             eggs .mean
## 1 RW(log(eggs) ~ drift())  1994 t(N(4.1, 0.018))  61.8
## 2 RW(log(eggs) ~ drift())  1995 t(N(4.1, 0.036))  61.4
## 3 RW(log(eggs) ~ drift())  1996 t(N(4.1, 0.054))  61.0
## 4 RW(log(eggs) ~ drift())  1997 t(N(4.1, 0.073))  60.5
## 5 RW(log(eggs) ~ drift())  1998 t(N(4.1, 0.093))  60.1
## 6 RW(log(eggs) ~ drift())  1999    t(N(4, 0.11))  59.7
## 7 RW(log(eggs) ~ drift())  2000    t(N(4, 0.13))  59.3
## 8 RW(log(eggs) ~ drift())  2001    t(N(4, 0.15))  58.9
Forecasting with transformations
fc %>% autoplot(eggs) +
  labs(title = "Annual egg prices",
       y = "US$ (adjusted for inflation)")
[Figure: Annual egg prices with drift forecasts and 80%/95% prediction intervals]
Bias adjustment
Back-transformed point forecasts are medians.
Back-transformed PIs have the correct coverage.
Back-transformed means
Let X have mean μ and variance σ².
Let f(x) be the back-transformation function, and Y = f(X).
Taylor series expansion about μ:
f(X) = f(μ) + (X − μ)f′(μ) + ½(X − μ)²f″(μ).
Then E[Y] = E[f(X)] ≈ f(μ) + ½σ²f″(μ).
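For a log transformation (λ = 0) the adjustment is easy to check by simulation: exp(μ) is the median of the back-transformed distribution, while the Taylor correction exp(μ)(1 + σ²/2) approximates the exact lognormal mean exp(μ + σ²/2). The numbers below are invented for illustration.

```r
set.seed(123)
mu <- 4; sigma <- 0.3
w <- rnorm(1e6, mu, sigma)     # "forecasts" on the log scale
y <- exp(w)                    # back-transformed draws

median_fc   <- exp(mu)                       # naive back-transformation (a median)
adjusted_fc <- exp(mu) * (1 + sigma^2 / 2)   # Taylor bias adjustment
exact_mean  <- exp(mu + sigma^2 / 2)         # lognormal mean, for comparison
```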
Bias adjustment
Box-Cox back-transformation:
yt = f(wt) = exp(wt) if λ = 0; (λwt + 1)^(1/λ) if λ ≠ 0.
f″(x) = e^x if λ = 0; (1 − λ)(λx + 1)^(1/λ − 2) if λ ≠ 0.
E[Y] ≈ e^μ (1 + σ²/2) if λ = 0; (λμ + 1)^(1/λ) [1 + σ²(1 − λ)/(2(λμ + 1)²)] if λ ≠ 0.
Bias adjustment
fc %>%
  autoplot(eggs, level = 80, point_forecast = lst(mean, median)) +
  labs(title = "Annual egg prices",
       y = "US$ (adjusted for inflation)")
[Figure: Annual egg prices with 80% interval and both mean and median point forecasts]
Forecasting and decomposition
yt = Ŝt + Ât
Ât is the seasonally adjusted component.
Ŝt is the seasonal component.
Forecast Ŝt using SNAIVE.
Forecast Ât using a non-seasonal time series method.
Combine forecasts of Ŝt and Ât to get forecasts of the original data.
US Retail Employment
us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)
us_retail_employment
## # A tsibble: 357 x 3 [1M]
##       Month Title        Employed
##  1 1990 Jan Retail Trade   13256.
##  2 1990 Feb Retail Trade   12966.
##  3 1990 Mar Retail Trade   12938.
##  4 1990 Apr Retail Trade   13012.
##  5 1990 May Retail Trade   13108.
##  6 1990 Jun Retail Trade   13183.
##  7 1990 Jul Retail Trade   13170.
##  8 1990 Aug Retail Trade   13160.
##  9 1990 Sep Retail Trade   13113.
## 10 1990 Oct Retail Trade   13185.
US Retail Employment
dcmp <- us_retail_employment %>%
  model(STL(Employed)) %>%
  components() %>% select(-.model)
dcmp
## # A tsibble: 357 x 6 [1M]
##       Month Employed  trend season_year remainder
##  1 1990 Jan   13256. 13291.       -38.1    3.1
##  2 1990 Feb   12966. 13272.      -261.   -44.2
##  3 1990 Mar   12938. 13252.      -291.   -23.0
##  4 1990 Apr   13012. 13233.      -221.     0.0892
##  5 1990 May   13108. 13213.      -115.     9.98
##  6 1990 Jun   13183. 13193.       -25.6   15.7
##  7 1990 Jul   13170. 13173.       -24.4   22.0
##  8 1990 Aug   13160. 13152.       -11.8   19.5
##  9 1990 Sep   13113. 13131.       -43.4   25.7
## 10 1990 Oct   13185. 13110.        62.5   12.3
## # … with 347 more rows, and 1 more variable: season_adjust
US Retail Employment
dcmp %>%
  model(NAIVE(season_adjust)) %>%
  forecast() %>%
  autoplot(dcmp) +
  labs(title = "Naive forecasts of seasonally adjusted data")
[Figure: Naive forecasts of seasonally adjusted data with 80%/95% prediction intervals]
US Retail Employment
us_retail_employment %>%
  model(stlf = decomposition_model(
    STL(Employed ~ trend(window = 7), robust = TRUE),
    NAIVE(season_adjust)
  )) %>%
  forecast() %>%
  autoplot(us_retail_employment)
[Figure: Employed with decomposition-model forecasts and 80%/95% prediction intervals]
Decomposition models
decomposition_model() creates a decomposition model.
You must provide a method for forecasting the season_adjust series.
A seasonal naïve method is used by default for the seasonal components.
The variances from both the seasonally adjusted and seasonal forecasts are combined.
Training and test sets
[Diagram: series split into training data followed by test data over time]
A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.
The test set must not be used for any aspect of model development or calculation of forecasts.
Forecast accuracy is based only on the test set.
Forecast errors
Forecast "error": the difference between an observed value and its forecast:
eT+h = yT+h − ŷT+h|T,
where the training data is given by {y1, . . . , yT}.
Unlike residuals, forecast errors on the test set involve multi-step forecasts.
These are true forecast errors, as the test data is not used in computing ŷT+h|T.
Measures of forecast accuracy
[Figure: Forecasts for quarterly beer production, megalitres vs Quarter, from the Drift, Mean, Naive and Seasonal_naive methods]
Measures of forecast accuracy
yT+h = (T + h)th observation, h = 1, . . . , H
ŷT+h|T = its forecast based on data up to time T
eT+h = yT+h − ŷT+h|T
MAE = mean(|eT+h|)
MSE = mean(e²T+h)
RMSE = √MSE
MAPE = 100 mean(|eT+h| / |yT+h|)
MAE, MSE, RMSE are all scale dependent.
MAPE is scale independent, but is only sensible if yt ≫ 0 for all t, and y has a natural zero.
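Each of these measures is a one-liner. The test observations and point forecasts below are invented purely to show the arithmetic.

```r
actual <- c(420, 440, 470, 430)      # invented test-set observations
fcast  <- c(410, 450, 460, 450)      # invented point forecasts
e <- actual - fcast                  # forecast errors e_{T+h}

MAE  <- mean(abs(e))                 # mean absolute error
RMSE <- sqrt(mean(e^2))              # root mean squared error
MAPE <- 100 * mean(abs(e) / abs(actual))   # mean absolute percentage error
```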
Measures of forecast accuracy
Mean Absolute Scaled Error
MASE = mean(|eT+h|/Q)
where Q is a stable measure of the scale of the time series {yt}.
Proposed by Hyndman and Koehler (IJF, 2006).
For non-seasonal time series,
Q = (T − 1)⁻¹ Σ_{t=2}^{T} |yt − yt−1|
works well. Then MASE is equivalent to MAE relative to a naïve method.
Measures of forecast accuracy
Mean Absolute Scaled Error
MASE = mean(|eT+h|/Q)
where Q is a stable measure of the scale of the time series {yt}.
For seasonal time series,
Q = (T − m)⁻¹ Σ_{t=m+1}^{T} |yt − yt−m|
works well. Then MASE is equivalent to MAE relative to a seasonal naïve method.
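A hand computation of the seasonal version, with invented quarterly data (m = 4) chosen so that every seasonal naive difference equals 5:

```r
y <- c(400, 420, 450, 410, 405, 425, 455, 415)   # invented training series, m = 4
m <- 4; n <- length(y)
Q <- mean(abs(y[(m + 1):n] - y[1:(n - m)]))      # (T - m)^-1 * sum |y_t - y_{t-m}|

e <- c(10, -10, 10, -20)                          # invented test-set errors
MASE <- mean(abs(e) / Q)                          # errors relative to seasonal naive scale
```

A MASE above 1 means the forecasts are, on average, worse than the in-sample seasonal naive method.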
Measures of forecast accuracy
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
train <- recent_production %>%
filter(year(Quarter) <= 2007)
beer_fit <- train %>%
model(
Mean = MEAN(Beer),
Naive = NAIVE(Beer),
Seasonal_naive = SNAIVE(Beer),
Drift = RW(Beer ~ drift())
)
beer_fc <- beer_fit %>%
forecast(h = 10)
Measures of forecast accuracy
accuracy(beer_fit)
## # A tibble: 4 x 6
##   .model         .type    RMSE  MAE  MAPE MASE
## 1 Drift          Training 65.3 54.8 12.2  3.83
## 2 Mean           Training 43.6 35.2  7.89 2.46
## 3 Naive          Training 65.3 54.7 12.2  3.83
## 4 Seasonal_naive Training 16.8 14.3  3.31 1
accuracy(beer_fc, recent_production)
## # A tibble: 4 x 6
##   .model         .type RMSE  MAE  MAPE MASE
## 1 Drift          Test  64.9 58.9 14.6  4.12
## 2 Mean           Test  38.4 34.8  8.28 2.44
## 3 Naive          Test  62.7 57.4 14.2  4.01
## 4 Seasonal_naive Test  14.3 13.4  3.17 0.937
Poll: true or false?
https://pollev.com/robjhyndman
Time series cross-validation
Traditional evaluation
[Diagram: a single split into training data followed by test data over time]
Time series cross-validation
[Diagram: a sequence of growing training sets, each followed by a small test set]
Forecast accuracy averaged over test sets.
Also known as "evaluation on a rolling forecasting origin".
Creating the rolling training sets
There are three main rolling types which can be used.
Stretch: extends a growing-length window with new data.
Slide: shifts a fixed-length window through the data.
Tile: moves a fixed-length window without overlap.
Three functions to roll a tsibble: stretch_tsibble(), slide_tsibble(), and tile_tsibble().
For time series cross-validation, stretching windows are most commonly used.
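The three schemes differ only in which observation indices each training set contains. A base-R sketch of the index sets (stretch_tsibble() and friends build the equivalent tsibbles for you):

```r
n <- 6; width <- 3   # six observations; minimum/fixed window of 3

# Stretch: growing window, always starting at observation 1
stretch <- lapply(width:n, function(k) 1:k)
# Slide: fixed-length window shifted one step at a time (overlapping)
slide <- lapply(1:(n - width + 1), function(s) s:(s + width - 1))
# Tile: fixed-length windows laid end to end (no overlap)
tile <- unname(split(1:n, ceiling(seq_len(n) / width)))
```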
Creating the rolling training sets
[Figure: the training sets produced by stretching, sliding, and tiling windows]
Time series cross-validation
Stretch with a minimum length of 3, growing by 1 each step.
fb_stretch <- fb_stock %>%
  stretch_tsibble(.init = 3, .step = 1) %>%
  filter(.id != max(.id))
## # A tsibble: 790,650 x 4 [1]
## # Key: .id [1,255]
##   Date       Close trading_day   .id
## 1 2014-01-02  54.7           1     1
## 2 2014-01-03  54.6           2     1
## 3 2014-01-06  57.2           3     1
## 4 2014-01-02  54.7           1     2
## 5 2014-01-03  54.6           2     2
## 6 2014-01-06  57.2           3     2
## 7 2014-01-07  57.9           4     2
## # … with 790,643 more rows
Time series cross-validation
Estimate RW w/ drift models for each window.
fit_cv <- fb_stretch %>%
  model(RW(Close ~ drift()))
## # A mable: 1,255 x 3
## # Key: .id, Symbol [1,255]
##     .id Symbol RW(Close ~ drift())
## 1     1 FB           <RW w/ drift>
## 2     2 FB           <RW w/ drift>
## 3     3 FB           <RW w/ drift>
## 4     4 FB           <RW w/ drift>
## # … with 1,251 more rows
Time series cross-validation
Produce one-step-ahead forecasts from all models.
fc_cv <- fit_cv %>%
  forecast(h = 1)
## # A fable: 1,255 x 6 [1]
## # Key: .id, Symbol, .model [1,255]
##   .model             .id Symbol trading_day      Close .mean
## 1 RW(Close ~ drif~     1 FB               4 N(58, 3.9)  58.4
## 2 RW(Close ~ drif~     2 FB               5   N(59, 2)  59.0
## 3 RW(Close ~ drif~     3 FB               6 N(59, 1.5)  59.1
## 4 RW(Close ~ drif~     4 FB               7 N(58, 1.8)  57.7
## # … with 1,251 more rows
Time series cross-validation
# Cross-validated
fc_cv %>% accuracy(fb_stock)
# Training set
fb_stock %>% model(RW(Close ~ drift())) %>% accuracy()
                  RMSE   MAE  MAPE
Cross-validation 2.418 1.469 1.266
Training         2.414 1.465 1.261
A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation.