
ETC3231/5231 Business forecasting
Ch5. The forecasters’ toolbox OTexts.org/fpp3/

Forecasting
Forecasting is estimating how the sequence of observations will continue into the future.
We usually think probabilistically about future sample paths.
What range of values covers the possible sample paths with 80% probability?

International visitors to Australia
[Figure: total international visitors to Australia (millions of visitors), 1980–2020, with ten simulated future sample paths (Future 1–10).]

Forecast intervals
[Figure: forecasts of total international visitors to Australia (millions of visitors) with 80% and 95% prediction intervals.]

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation


A tidy forecasting workflow
The process of producing forecasts can be split up into a few fundamental steps.
1 Preparing data
2 Data visualisation
3 Specifying a model
4 Model estimation
5 Accuracy & performance evaluation
6 Producing forecasts

A tidy forecasting workflow
[Diagram: the workflow cycle — Tidy, Visualise, Specify, Estimate, Forecast, with Evaluate feeding back into the cycle.]

Data preparation (tidy)
gdppc <- global_economy %>%
mutate(GDP_per_capita = GDP/Population) %>%
select(Year, Country, GDP, Population, GDP_per_capita)
gdppc
## # A tsibble: 15,150 x 5 [1Y]
## # Key:       Country [263]
##     Year Country             GDP Population GDP_per_capita
##    <dbl> <fct>             <dbl>      <dbl>          <dbl>
##  1  1960 Afghanistan  537777811.    8996351           59.8
##  2  1961 Afghanistan  548888896.    9166764           59.9
##  3  1962 Afghanistan  546666678.    9345868           58.5
##  4  1963 Afghanistan  751111191.    9533954           78.8
##  5  1964 Afghanistan  800000044.    9731361           82.2
##  6  1965 Afghanistan 1006666638.    9938414          101.

Data visualisation
gdppc %>%
  filter(Country == "Sweden") %>%
  autoplot(GDP_per_capita) +
  labs(title = "GDP per capita for Sweden", y = "$US")
[Figure: GDP per capita for Sweden in $US, 1960 onwards.]

Model estimation
The model() function trains models to data.
fit <- gdppc %>%
  model(trend_model = TSLM(GDP_per_capita ~ trend()))
fit
## # A mable: 263 x 2
## # Key:     Country [263]
##   Country        trend_model
##   <fct>              <model>
## 1 Afghanistan         <TSLM>
## 2 Albania             <TSLM>
## 3 Algeria             <TSLM>
## 4 American Samoa      <TSLM>
## 5 Andorra             <TSLM>
## 6 Angola              <TSLM>

A mable is a model table; each cell corresponds to a fitted model.

Producing forecasts
fit %>% forecast(h = "3 years")
## # A fable: 789 x 5 [1Y]
## # Key:     Country, .model [263]
##   Country     .model       Year  GDP_per_capita .mean
##   <fct>       <chr>       <dbl>          <dist> <dbl>
## 1 Afghanistan trend_model  2018    N(526, 9653)  526.
## 2 Afghanistan trend_model  2019    N(534, 9689)  534.
## 3 Afghanistan trend_model  2020    N(542, 9727)  542.
## 4 Albania     trend_model  2018 N(4716, 476419) 4716.
## 5 Albania     trend_model  2019 N(4867, 481086) 4867.
## 6 Albania     trend_model  2020 N(5018, 486012) 5018.
## 7 Algeria     trend_model  2018 N(4410, 643094) 4410.
## 8 Algeria     trend_model  2019 N(4489, 645311) 4489.
## 9 Algeria     trend_model  2020 N(4568, 647602) 4568.

A fable is a forecast table with point forecasts and distributions.

Visualising forecasts
fit %>% forecast(h = "3 years") %>%
  filter(Country == "Sweden") %>%
  autoplot(gdppc) +
  labs(title = "GDP per capita for Sweden", y = "$US")
[Figure: GDP per capita for Sweden with three-year forecasts and 80%/95% prediction intervals.]

A tidy forecasting workflow
[Diagram: the workflow cycle — Tidy, Visualise, Specify, Estimate, Forecast, with Evaluate feeding back into the cycle.]

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation

Some simple forecasting methods
[Figure: Australian quarterly beer production (megalitres), 1992–2010.]
How would you forecast these series?

Some simple forecasting methods
[Figure: number of pigs slaughtered in Victoria (thousands), monthly, 1990–1995.]
How would you forecast these series?

Some simple forecasting methods
[Figure: Facebook closing stock price ($USD) in 2018.]
How would you forecast these series?

Some simple forecasting methods
MEAN(y): Average method
Forecast of all future values is equal to the mean of the historical data {y1, . . . , yT}.
Forecasts: ŷT+h|T = ȳ = (y1 + · · · + yT)/T
[Figure: clay brick production in Australia (quarterly) with average-method forecasts.]

Some simple forecasting methods
NAIVE(y): Naïve method
Forecasts equal to the last observed value.
Forecasts: ŷT+h|T = yT.
Consequence of the efficient market hypothesis.
[Figure: clay brick production in Australia (quarterly) with naïve forecasts.]

Some simple forecasting methods
SNAIVE(y ~ lag(m)): Seasonal naïve method
Forecasts equal to the last value from the same season.
Forecasts: ŷT+h|T = yT+h−m(k+1), where m = seasonal period and k is the integer part of (h − 1)/m.
[Figure: clay brick production in Australia (quarterly) with seasonal naïve forecasts.]
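The index arithmetic above can be sketched in a few lines of base R (a toy illustration, not the fable implementation; the series y is invented): for quarterly data (m = 4), each horizon h looks up the most recent observation from the same quarter.

```r
# Sketch: the seasonal naive index y_{T + h - m(k+1)}, with
# k = floor((h - 1)/m), for quarterly data (m = 4)
snaive_forecast <- function(y, h, m = 4) {
  T <- length(y)
  k <- (h - 1) %/% m              # integer part of (h - 1)/m
  y[T + h - m * (k + 1)]          # last observed value from the same season
}

y <- c(10, 20, 30, 40, 12, 22, 32, 42)  # two years of quarterly data
sapply(1:6, function(h) snaive_forecast(y, h))
# h = 1..4 repeat the last full year; h = 5, 6 repeat it again
```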

Some simple forecasting methods
RW(y ~ drift()): Drift method
Forecasts equal to the last value plus the average change.
Forecasts:
ŷT+h|T = yT + (h/(T − 1)) Σ_{t=2}^{T} (yt − yt−1)
       = yT + h(yT − y1)/(T − 1).
Equivalent to extrapolating a line drawn between the first and last observations.
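A quick numerical check of the equivalence above (base R sketch with an invented series): the telescoping sum of one-step changes collapses to the first-to-last slope.

```r
# Sketch: the two drift formulas agree, since sum(diff(y)) = y[T] - y[1]
y <- c(100, 104, 103, 108, 110)   # toy series
T <- length(y); h <- 3

f1 <- y[T] + (h / (T - 1)) * sum(diff(y))   # mean-change form
f2 <- y[T] + h * (y[T] - y[1]) / (T - 1)    # line through first and last points
c(f1, f2)   # both give 117.5
```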

Some simple forecasting methods
[Figure: clay brick production in Australia (quarterly) with drift-method forecasts.]

Model fitting
The model() function trains models to data.
brick_fit <- aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(
    Seasonal_naive = SNAIVE(Bricks),
    Naive = NAIVE(Bricks),
    Drift = RW(Bricks ~ drift()),
    Mean = MEAN(Bricks)
  )
## # A mable: 1 x 4
##   Seasonal_naive   Naive         Drift    Mean
##          <model> <model>       <model> <model>
## 1       <SNAIVE> <NAIVE> <RW w/ drift>  <MEAN>

Producing forecasts
brick_fc <- brick_fit %>%
  forecast(h = "5 years")
## # A fable: 80 x 4 [1Q]
## # Key:     .model [4]
##   .model         Quarter       Bricks .mean
##   <chr>            <qtr>       <dist> <dbl>
## 1 Seasonal_naive 2005 Q3 N(428, 2336)   428
## 2 Seasonal_naive 2005 Q4 N(397, 2336)   397
## 3 Seasonal_naive 2006 Q1 N(355, 2336)   355
## 4 Seasonal_naive 2006 Q2 N(435, 2336)   435
## # ... with 76 more rows

Visualising forecasts
brick_fc %>%
  autoplot(aus_production, level = NULL) +
  labs(title = "Clay brick production in Australia",
       y = "Millions of bricks") +
  guides(colour = guide_legend(title = "Forecast"))
[Figure: clay brick production (millions of bricks) with Drift, Mean, Naive and Seasonal_naive forecasts.]

Facebook closing stock price
# Extract training data
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
# Specify, estimate and forecast
fb_stock %>%
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    Drift = RW(Close ~ drift())
  ) %>%
  forecast(h = 42) %>%
  autoplot(fb_stock, level = NULL) +
  labs(title = "Facebook closing stock price", y = "$US") +
  guides(colour = guide_legend(title = "Forecast"))

Facebook closing stock price
[Figure: Facebook closing stock price ($US) by trading day, with Drift, Mean and Naive forecasts.]

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation

Fitted values
ŷt|t−1 is the forecast of yt based on observations y1, . . . , yt−1.
We call these "fitted values".
Sometimes we drop the subscript: ŷt ≡ ŷt|t−1.
They are often not true forecasts, since parameters are estimated on all data.
For example:
ŷt = ȳ for the average method.
ŷt = yt−1 + (yT − y1)/(T − 1) for the drift method.
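The two examples can be sketched in base R (a toy illustration with an invented series, not the fable implementation; in fable, augment() returns fitted values in the .fitted column):

```r
# Sketch: fitted values for the average and naive methods
y <- c(10, 12, 11, 14, 13)
T <- length(y)

# Average method: every fitted value uses the full-sample mean,
# so these are not true one-step forecasts
fits_mean <- rep(mean(y), T)

# Naive method: fitted value at time t is the previous observation
fits_naive <- c(NA, y[-T])

fits_naive
```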

Forecasting residuals
Residuals in forecasting: the difference between an observed value and its fitted value: et = yt − ŷt|t−1.
Assumptions
1 {et} uncorrelated. If they aren't, there is information left in the residuals that should be used in computing forecasts.
2 {et} have mean zero. If they don't, the forecasts are biased.
Useful properties (for distributions & prediction intervals)
3 {et} have constant variance.
4 {et} are normally distributed.

Facebook closing stock price
fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)
fb_stock %>% autoplot(Close)
[Figure: Facebook closing price by trading day.]

Facebook closing stock price
fit <- fb_stock %>% model(NAIVE(Close))
augment(fit)
## # A tsibble: 1,258 x 7 [1]
## # Key:     Symbol, .model [1]
##    Symbol .model     trading_day Close .fitted .resid .innov
##    <chr>  <chr>            <int> <dbl>   <dbl>  <dbl>  <dbl>
##  1 FB     NAIVE(Clo~           1  54.7    NA   NA     NA
##  2 FB     NAIVE(Clo~           2  54.6    54.7 -0.150 -0.150
##  3 FB     NAIVE(Clo~           3  57.2    54.6  2.64   2.64
##  4 FB     NAIVE(Clo~           4  57.9    57.2  0.720  0.720
##  5 FB     NAIVE(Clo~           5  58.2    57.9  0.310  0.310
##  6 FB     NAIVE(Clo~           6  57.2    58.2 -1.01  -1.01
##  7 FB     NAIVE(Clo~           7  57.9    57.2  0.720  0.720
##  8 FB     NAIVE(Clo~           8  55.9    57.9 -2.03  -2.03
##  9 FB     NAIVE(Clo~           9  57.7    55.9  1.83   1.83
## 10 FB     NAIVE(Clo~          10  57.6    57.7 -0.140 -0.140
## # ... with 1,248 more rows

Naïve forecasts: ŷt|t−1 = yt−1, so et = yt − ŷt|t−1 = yt − yt−1. In the augment() output, .fitted is ŷt|t−1 and .resid is et.

Facebook closing stock price
augment(fit) %>%
  ggplot(aes(x = trading_day)) +
  geom_line(aes(y = Close, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted"))
[Figure: closing price with naïve fitted values overlaid.]

Facebook closing stock price
augment(fit) %>%
  filter(trading_day > 1100) %>%
  ggplot(aes(x = trading_day)) +
  geom_line(aes(y = Close, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted"))
[Figure: closing price and naïve fitted values, trading days 1100 onwards.]

Facebook closing stock price
augment(fit) %>%
  autoplot(.resid) +
  labs(y = "$US",
       title = "Residuals from naïve method")
[Figure: residuals from the naïve method across trading days.]

Facebook closing stock price
augment(fit) %>%
  ggplot(aes(x = .resid)) +
  geom_histogram(bins = 150) +
  ggtitle("Histogram of residuals")
[Figure: histogram of the residuals.]

Facebook closing stock price
augment(fit) %>%
  ACF(.resid) %>%
  autoplot() + labs(title = "ACF of residuals")
[Figure: ACF of residuals up to lag 30.]

gg_tsresiduals() function
gg_tsresiduals(fit)
[Figure: residual diagnostics from gg_tsresiduals() — time plot of the residuals, their ACF, and a histogram.]

ACF of residuals
We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren't, then there is information left in the residuals that should be used in computing forecasts.
So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method.
We expect these to look like white noise.

Portmanteau tests
Consider a whole set of rk values, and develop a test to see whether the set is significantly different from a zero set.
Box-Pierce test
Q = T Σ_{k=1}^{l} rk²
where l is the maximum lag being considered and T is the number of observations.
If each rk is close to zero, Q will be small.
If some rk values are large (positive or negative), Q will be large.

Portmanteau tests
Consider a whole set of rk values, and develop a test to see whether the set is significantly different from a zero set.
Ljung-Box test
Q* = T(T + 2) Σ_{k=1}^{l} (T − k)⁻¹ rk²
where l is the maximum lag being considered and T is the number of observations.
My preferences: l = 10 for non-seasonal data, l = 2m for seasonal data.
Better performance, especially in small samples.
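The Q* formula can be computed directly and checked against base R's Box.test() (a sketch on simulated white noise standing in for residuals):

```r
# Sketch: the Ljung-Box statistic by hand, vs stats::Box.test()
set.seed(1)
e <- rnorm(200)   # stand-in for model residuals
T <- length(e)
l <- 10           # maximum lag

r <- acf(e, lag.max = l, plot = FALSE)$acf[-1]   # r_1, ..., r_l (drop lag 0)
Q_star <- T * (T + 2) * sum(r^2 / (T - 1:l))

bt <- Box.test(e, lag = l, type = "Ljung-Box")
all.equal(unname(bt$statistic), Q_star)
```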

Portmanteau tests
If the data are WN, Q* has a χ² distribution with (l − K) degrees of freedom, where K = number of parameters in the model.
When applied to raw data, set K = 0.
lag = l, dof = K
augment(fit) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)
## # A tibble: 1 x 4
##   Symbol .model       lb_stat lb_pvalue
##   <chr>  <chr>          <dbl>     <dbl>
## 1 FB     NAIVE(Close)    12.1     0.276

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation

Forecast distributions
A forecast ŷT+h|T is (usually) the mean of the conditional distribution yT+h | y1, . . . , yT.
Most time series models produce normally distributed forecasts.
The forecast distribution describes the probability of observing any future value.

Forecast distributions
Assuming the residuals have zero mean, are uncorrelated and normal, with variance σ̂²:
Mean: ŷT+h|T ∼ N(ȳ, (1 + 1/T)σ̂²)
Naïve: ŷT+h|T ∼ N(yT, hσ̂²)
Seasonal naïve: ŷT+h|T ∼ N(yT+h−m(k+1), (k + 1)σ̂²)
Drift: ŷT+h|T ∼ N(yT + h(yT − y1)/(T − 1), (h(T + h)/T)σ̂²)
where k is the integer part of (h − 1)/m.
Note that when h = 1 and T is large, these all give the same approximate forecast variance: σ̂².

Prediction intervals
A prediction interval gives a region within which we expect yT+h to lie with a specified probability.
Assuming forecast errors are normally distributed, a 95% PI is
ŷT+h|T ± 1.96 σ̂h
where σ̂h is the standard deviation of the h-step forecast distribution.
When h = 1, σ̂h can be estimated from the residuals.
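Putting the pieces together for the naïve method (a base-R sketch with an invented toy series, not the fable implementation): σ̂ comes from the one-step residuals, and the h-step variance hσ̂² comes from the table of forecast distributions.

```r
# Sketch: a 95% prediction interval for the naive method, h steps ahead
y <- c(50, 52, 51, 55, 54, 56, 57, 55, 58, 60)  # toy series
e <- diff(y)                    # naive residuals: y_t - y_{t-1}
sigma_hat <- sd(e)              # residual standard deviation
h <- 3
sigma_h <- sigma_hat * sqrt(h)  # naive method: Var(h-step) = h * sigma^2

point <- tail(y, 1)             # naive point forecast: last observation
lower <- point - 1.96 * sigma_h
upper <- point + 1.96 * sigma_h
c(lower = lower, upper = upper)
```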

Prediction intervals
brick_fc %>% hilo(level = 95)
## # A tsibble: 80 x 5 [1Q]
## # Key:     .model [4]
##    .model         Quarter       Bricks .mean         `95%`
##    <chr>            <qtr>       <dist> <dbl>        <hilo>
##  1 Seasonal_naive 2005 Q3 N(428, 2336)   428 [333, 523]95
##  2 Seasonal_naive 2005 Q4 N(397, 2336)   397 [302, 492]95
##  3 Seasonal_naive 2006 Q1 N(355, 2336)   355 [260, 450]95
##  4 Seasonal_naive 2006 Q2 N(435, 2336)   435 [340, 530]95
##  5 Seasonal_naive 2006 Q3 N(428, 4672)   428 [294, 562]95
##  6 Seasonal_naive 2006 Q4 N(397, 4672)   397 [263, 531]95
##  7 Seasonal_naive 2007 Q1 N(355, 4672)   355 [221, 489]95
##  8 Seasonal_naive 2007 Q2 N(435, 4672)   435 [301, 569]95
##  9 Seasonal_naive 2007 Q3 N(428, 7008)   428 [264, 592]95
## 10 Seasonal_naive 2007 Q4 N(397, 7008)   397 [233, 561]95

Prediction intervals
Point forecasts are often useless without a measure of uncertainty (such as prediction intervals).
Prediction intervals require a stochastic model (with random errors, etc).
For most models, prediction intervals get wider as the forecast horizon increases.
Use the level argument to control coverage.
Check the residual assumptions before believing them.
They are usually too narrow due to unaccounted uncertainty.

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation

Mathematical transformations
If the data show different variation at different levels of the series, then a transformation can be useful.
Denote the original observations as y1, . . . , yn and the transformed observations as w1, . . . , wn.
Box-Cox transformations
wt = log(yt),          λ = 0;
wt = (yt^λ − 1)/λ,     λ ≠ 0.
Natural logarithms are particularly useful because they are more interpretable: changes in a log value are relative (percent) changes on the original scale.

Back-transformation
We must reverse the transformation (back-transform) to obtain forecasts on the original scale. The reverse Box-Cox transformations are given by
yt = exp(wt),             λ = 0;
yt = (λwt + 1)^(1/λ),     λ ≠ 0.
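The pair of formulas can be written as two small base-R functions and checked as a round trip (a sketch; fabletools provides box_cox() and inv_box_cox() for real use):

```r
# Sketch: Box-Cox transform and its inverse, matching the formulas above
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 20, 40, 80)
w <- box_cox(y, lambda = 0.3)
inv_box_cox(w, lambda = 0.3)   # recovers the original series
```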

Modelling with transformations
eggs <- prices %>%
  filter(!is.na(eggs)) %>% select(eggs)
eggs %>% autoplot() +
  labs(title = "Annual egg prices",
       y = "$US (adjusted for inflation)")
[Figure: annual egg prices, 1900–1993, in $US adjusted for inflation.]

Modelling with transformations
Transformations used on the left of the formula will be automatically back-transformed. To model log-transformed egg prices, you could use:
fit <- eggs %>%
  model(RW(log(eggs) ~ drift()))
fit
## # A mable: 1 x 1
##   `RW(log(eggs) ~ drift())`
##                     <model>
## 1             <RW w/ drift>

Forecasting with transformations
fc <- fit %>%
  forecast(h = 50)
fc
## # A fable: 50 x 4 [1Y]
## # Key:     .model [1]
##   .model                   year             eggs .mean
##   <chr>                   <dbl>           <dist> <dbl>
## 1 RW(log(eggs) ~ drift())  1994 t(N(4.1, 0.018))  61.8
## 2 RW(log(eggs) ~ drift())  1995 t(N(4.1, 0.036))  61.4
## 3 RW(log(eggs) ~ drift())  1996 t(N(4.1, 0.054))  61.0
## 4 RW(log(eggs) ~ drift())  1997 t(N(4.1, 0.073))  60.5
## 5 RW(log(eggs) ~ drift())  1998 t(N(4.1, 0.093))  60.1
## 6 RW(log(eggs) ~ drift())  1999    t(N(4, 0.11))  59.7
## 7 RW(log(eggs) ~ drift())  2000    t(N(4, 0.13))  59.3
## 8 RW(log(eggs) ~ drift())  2001    t(N(4, 0.15))  58.9

Forecasting with transformations
fc %>% autoplot(eggs) +
  labs(title = "Annual egg prices",
       y = "US$ (adjusted for inflation)")
[Figure: egg price forecasts on the original scale with 80%/95% intervals, back-transformed from the log scale.]

Bias adjustment
Back-transformed point forecasts are medians.
Back-transformed PIs have the correct coverage.
Back-transformed means
Let X have mean μ and variance σ².
Let f(x) be the back-transformation function, and Y = f(X). A Taylor series expansion about μ gives:
f(X) ≈ f(μ) + (X − μ)f′(μ) + ½(X − μ)²f′′(μ).
E[Y] = E[f(X)] ≈ f(μ) + ½σ²f′′(μ)

Bias adjustment
Box-Cox back-transformation:
yt = f(x) = exp(x),                 λ = 0;
            (λx + 1)^(1/λ),         λ ≠ 0.
f′′(x) = e^x,                              λ = 0;
         (1 − λ)(λx + 1)^(1/λ − 2),        λ ≠ 0.
E[Y] = e^μ [1 + σ²/2],                                     λ = 0;
       (λμ + 1)^(1/λ) [1 + σ²(1 − λ)/(2(λμ + 1)²)],        λ ≠ 0.
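The log case (λ = 0) can be checked by simulation (a base-R sketch; μ and σ² here are taken from the first egg-price forecast distribution above, t(N(4.1, 0.018))). For W ∼ N(μ, σ²), the exact lognormal mean is exp(μ + σ²/2), which the Taylor approximation exp(μ)(1 + σ²/2) tracks closely:

```r
# Sketch: checking the log-case bias adjustment by simulation
set.seed(42)
mu <- 4.1; sigma2 <- 0.018
w <- rnorm(1e6, mean = mu, sd = sqrt(sigma2))

naive_bt <- exp(mu)                      # back-transformed median
adjusted <- exp(mu) * (1 + sigma2 / 2)   # bias-adjusted mean (Taylor approx.)
exact    <- exp(mu + sigma2 / 2)         # exact lognormal mean

c(simulated = mean(exp(w)), adjusted = adjusted, exact = exact)
```

The adjusted mean is always slightly above the back-transformed median, which is why the mean and median point forecasts diverge in the plot on the next slide.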

Bias adjustment
fc %>%
  autoplot(eggs, level = 80, point_forecast = lst(mean, median)) +
  labs(title = "Annual egg prices",
       y = "US$ (adjusted for inflation)")
[Figure: egg price forecasts showing both mean and median point forecasts with an 80% interval.]

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation

Forecasting and decomposition
yt = Ŝt + Ât
Ât is the seasonally adjusted component.
Ŝt is the seasonal component.
Forecast Ŝt using SNAIVE.
Forecast Ât using a non-seasonal time series method.
Combine the forecasts of Ŝt and Ât to get forecasts of the original data.

US Retail Employment
us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)
us_retail_employment
## # A tsibble: 357 x 3 [1M]
##    Month    Title        Employed
##    <mth>    <chr>           <dbl>
##  1 1990 Jan Retail Trade   13256.
##  2 1990 Feb Retail Trade   12966.
##  3 1990 Mar Retail Trade   12938.
##  4 1990 Apr Retail Trade   13012.
##  5 1990 May Retail Trade   13108.
##  6 1990 Jun Retail Trade   13183.
##  7 1990 Jul Retail Trade   13170.
##  8 1990 Aug Retail Trade   13160.
##  9 1990 Sep Retail Trade   13113.
## 10 1990 Oct Retail Trade   13185.
## # ... with 347 more rows

US Retail Employment
dcmp <- us_retail_employment %>%
  model(STL(Employed)) %>%
  components() %>% select(-.model)
dcmp
## # A tsibble: 357 x 6 [1M]
##    Month    Employed  trend season_year remainder
##    <mth>       <dbl>  <dbl>       <dbl>     <dbl>
##  1 1990 Jan   13256. 13291.       -38.1    3.08
##  2 1990 Feb   12966. 13272.      -261.   -44.2
##  3 1990 Mar   12938. 13252.      -291.   -23.0
##  4 1990 Apr   13012. 13233.      -221.     0.0892
##  5 1990 May   13108. 13213.      -115.     9.98
##  6 1990 Jun   13183. 13193.       -25.6   15.7
##  7 1990 Jul   13170. 13173.       -24.4   22.0
##  8 1990 Aug   13160. 13152.       -11.8   19.5
##  9 1990 Sep   13113. 13131.       -43.4   25.7
## 10 1990 Oct   13185. 13110.        62.5   12.3
## # ... with 347 more rows, and 1 more variable: season_adjust <dbl>

US Retail Employment
dcmp %>%
  model(NAIVE(season_adjust)) %>%
  forecast() %>%
  autoplot(dcmp) +
  labs(title = "Naive forecasts of seasonally adjusted data")
[Figure: naïve forecasts of the seasonally adjusted series with 80%/95% intervals.]

US Retail Employment
us_retail_employment %>%
  model(stlf = decomposition_model(
    STL(Employed ~ trend(window = 7), robust = TRUE),
    NAIVE(season_adjust)
  )) %>%
  forecast() %>%
  autoplot(us_retail_employment)
[Figure: forecasts of US retail employment from the decomposition model, with 80%/95% intervals.]

Decomposition models
decomposition_model() creates a decomposition model.
You must provide a method for forecasting the season_adjust series.
A seasonal naïve method is used by default for the seasonal components.
The variances from both the seasonally adjusted and seasonal forecasts are combined.

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation

Training and test sets
[Diagram: the time axis split into training data followed by test data.]
A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.
The test set must not be used for any aspect of model development or calculation of forecasts.

Forecast errors
Forecast "error": the difference between an observed value and its forecast.
eT+h = yT+h − ŷT+h|T,
where the training data is given by {y1, . . . , yT}.
Unlike residuals, forecast errors on the test set involve multi-step forecasts.
These are true forecast errors, as the test data is not used in computing ŷT+h|T.

Measures of forecast accuracy
[Figure: forecasts for quarterly beer production (megalitres) from the Drift, Mean, Naive and Seasonal_naive methods.]

Measures of forecast accuracy
yT+h = (T + h)th observation, h = 1, . . . , H
ŷT+h|T = its forecast based on data up to time T.
eT+h = yT+h − ŷT+h|T
MAE = mean(|eT+h|)
MSE = mean(e²T+h)
RMSE = √mean(e²T+h)
MAPE = 100 mean(|eT+h|/|yT+h|)
MAE, MSE and RMSE are all scale dependent.
MAPE is scale independent but is only sensible if yt ≫ 0 for all t, and y has a natural zero.
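The definitions above are simple averages over test-set errors; a base-R sketch with invented numbers (in fable, accuracy() computes these for you):

```r
# Sketch: MAE, RMSE and MAPE by hand for a small test set
actual   <- c(420, 410, 480, 490)
forecast <- c(430, 400, 470, 500)
e <- actual - forecast

MAE  <- mean(abs(e))
RMSE <- sqrt(mean(e^2))
MAPE <- 100 * mean(abs(e) / abs(actual))

c(MAE = MAE, RMSE = RMSE, MAPE = MAPE)
```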

Measures of forecast accuracy
Mean Absolute Scaled Error
MASE = mean(|eT+h|/Q)
where Q is a stable measure of the scale of the time series {yt}.
Proposed by Hyndman and Koehler (IJF, 2006).
For non-seasonal time series,
Q = (T − 1)⁻¹ Σ_{t=2}^{T} |yt − yt−1|
works well. Then MASE is equivalent to MAE relative to a naïve method.
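The non-seasonal version can be written as a tiny function (a base-R sketch with invented numbers; mase, train, actual and fc are illustrative names, not fable's API):

```r
# Sketch: MASE for non-seasonal data — test-set MAE scaled by Q,
# the in-sample MAE of one-step naive forecasts
mase <- function(actual, forecast, train) {
  Q <- mean(abs(diff(train)))        # (T-1)^{-1} * sum |y_t - y_{t-1}|
  mean(abs(actual - forecast) / Q)
}

train  <- c(100, 102, 101, 105, 104)
actual <- c(106, 108)
fc     <- c(104, 104)                # e.g. naive forecasts from y_T
mase(actual, fc, train)              # > 1 means worse than in-sample naive
```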

Measures of forecast accuracy
Mean Absolute Scaled Error
MASE = mean(|eT+h|/Q)
where Q is a stable measure of the scale of the time series {yt}.
Proposed by Hyndman and Koehler (IJF, 2006).
For seasonal time series,
Q = (T − m)⁻¹ Σ_{t=m+1}^{T} |yt − yt−m|
works well. Then MASE is equivalent to MAE relative to a seasonal naïve method.

Measures of forecast accuracy
[Figure: forecasts for quarterly beer production (megalitres) from the Drift, Mean, Naive and Seasonal_naive methods.]

Measures of forecast accuracy
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
train <- recent_production %>%
  filter(year(Quarter) <= 2007)
beer_fit <- train %>%
  model(
    Mean = MEAN(Beer),
    Naive = NAIVE(Beer),
    Seasonal_naive = SNAIVE(Beer),
    Drift = RW(Beer ~ drift())
  )
beer_fc <- beer_fit %>%
  forecast(h = 10)

Measures of forecast accuracy
accuracy(beer_fit)
## # A tibble: 4 x 6
##   .model         .type     RMSE   MAE  MAPE  MASE
##   <chr>          <chr>    <dbl> <dbl> <dbl> <dbl>
## 1 Drift          Training  65.3  54.8 12.2  3.83
## 2 Mean           Training  43.6  35.2  7.89 2.46
## 3 Naive          Training  65.3  54.7 12.2  3.83
## 4 Seasonal_naive Training  16.8  14.3  3.31 1
accuracy(beer_fc, recent_production)
## # A tibble: 4 x 6
##   .model         .type  RMSE   MAE  MAPE  MASE
##   <chr>          <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Drift          Test   64.9  58.9 14.6  4.12
## 2 Mean           Test   38.4  34.8  8.28 2.44
## 3 Naive          Test   62.7  57.4 14.2  4.01
## 4 Seasonal_naive Test   14.3  13.4  3.17 0.937

Poll: true or false?
https://PollEv.com/georgeathana023

Outline
1 A tidy forecasting workflow
2 Some simple forecasting methods
3 Residual diagnostics
4 Distributional forecasts and prediction intervals
5 Forecasting with transformations
6 Forecasting and decomposition
7 Evaluating forecast accuracy
8 Time series cross-validation

Time series cross-validation
[Diagram: traditional evaluation splits the series once into training data followed by test data; time series cross-validation uses a sequence of growing training sets, each evaluated on observations h = 1, 2, 3, 4 steps ahead.]
Forecast accuracy averaged over test sets.
Also known as "evaluation on a rolling forecasting origin".

Creating the rolling training sets
There are three main rolling types which can be used.
Stretch: extends a growing-length window with new data.
Slide: shifts a fixed-length window through the data.
Tile: moves a fixed-length window without overlap.
Three functions to roll a tsibble: stretch_tsibble(), slide_tsibble(), and tile_tsibble().
For time series cross-validation, stretching windows are most commonly used.

Creating the rolling training sets
[Diagram: stretch, slide and tile windows moving over a time series.]

Time series cross-validation
Stretch with a minimum length of 3, growing by 1 each step.
fb_stretch <- fb_stock %>%
  stretch_tsibble(.init = 3, .step = 1) %>%
  filter(.id != max(.id))
## # A tsibble: 790,650 x 4 [1]
## # Key:     .id [1,255]
##   Date       Close trading_day   .id
##   <date>     <dbl>       <int> <int>
## 1 2014-01-02  54.7           1     1
## 2 2014-01-03  54.6           2     1
## 3 2014-01-06  57.2           3     1
## 4 2014-01-02  54.7           1     2
## 5 2014-01-03  54.6           2     2
## 6 2014-01-06  57.2           3     2
## 7 2014-01-07  57.9           4     2
## # ... with 790,643 more rows

Time series cross-validation
Estimate RW w/ drift models for each window.
fit_cv <- fb_stretch %>%
  model(RW(Close ~ drift()))
## # A mable: 1,255 x 3
## # Key:     .id, Symbol [1,255]
##     .id Symbol `RW(Close ~ drift())`
##   <int> <chr>                <model>
## 1     1 FB             <RW w/ drift>
## 2     2 FB             <RW w/ drift>
## 3     3 FB             <RW w/ drift>
## 4     4 FB             <RW w/ drift>
## # ... with 1,251 more rows

Time series cross-validation
Produce one-step-ahead forecasts from all models.
fc_cv <- fit_cv %>%
  forecast(h = 1)
## # A fable: 1,255 x 5 [1]
## # Key:     .id, Symbol [1,255]
##     .id Symbol trading_day      Close .mean
##   <int> <chr>        <dbl>     <dist> <dbl>
## 1     1 FB               4 N(58, 3.9)  58.4
## 2     2 FB               5   N(59, 2)  59.0
## 3     3 FB               6 N(59, 1.5)  59.1
## 4     4 FB               7 N(58, 1.8)  57.7
## # ... with 1,251 more rows

Time series cross-validation
# Cross-validated
fc_cv %>% accuracy(fb_stock)
# Training set
fb_stock %>% model(RW(Close ~ drift())) %>% accuracy()
                  RMSE   MAE  MAPE
Cross-validation 2.418 1.469 1.266
Training         2.414 1.465 1.261
A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation.