ETC3231/5231 Business forecasting
Ch10. Dynamic regression models OTexts.org/fpp3/
Outline
1 Regression with ARIMA errors
2 Stochastic and deterministic trends
3 Dynamic harmonic regression
4 Lagged predictors
2
Outline
1 Regression with ARIMA errors
2 Stochastic and deterministic trends
3 Dynamic harmonic regression
4 Lagged predictors
3
Regression with ARIMA errors Regression models
yt =β0 +β1×1,t +···+βkxk,t +εt,
yt modeled as function of k explanatory variables x1,t,…,xk,t.
In regression, we assume that εt was WN.
Now we want to allow εt to be autocorrelated.
4
Regression with ARIMA errors
Regression models
yt =β0 +β1×1,t +···+βkxk,t +εt,
yt modeled as function of k explanatory variables x1,t,…,xk,t.
In regression, we assume that εt was WN.
Now we want to allow εt to be autocorrelated.
Example: ARIMA(1,1,1) errors
yt =β0 +β1×1,t +···+βkxk,t +ηt, (1 − φ1B)(1 − B)ηt = (1 + θ1B)εt,
where εt is white noise. 4
Residuals and errors
Example: ηt = ARIMA(1,1,1)
yt =β0 +β1×1,t +···+βkxk,t +ηt,
(1 − φ1B)(1 − B)ηt = (1 + θ1B)εt,
5
Residuals and errors
Example: ηt = ARIMA(1,1,1)
yt =β0 +β1×1,t +···+βkxk,t +ηt,
(1 − φ1B)(1 − B)ηt = (1 + θ1B)εt,
Be careful in distinguishing ηt from εt.
Only the errors εt are assumed to be white noise. In ordinary regression, ηt is assumed to be white noise and so ηt = εt.
5
Estimation
If we minimize ηt2 (by using ordinary regression):
1 Estimated coefficients βˆ0, . . . , βˆk are no longer optimal as some information ignored;
2 Statistical tests associated with the model (e.g., t-tests on the coefficients) are incorrect.
3 p-values for coefficients usually too small (“spurious regression’ ’).
4 AIC of fitted models misleading.
6
Estimation
If we minimize ηt2 (by using ordinary regression):
1 Estimated coefficients βˆ0, . . . , βˆk are no longer optimal as some information ignored;
2 Statistical tests associated with the model (e.g., t-tests on the coefficients) are incorrect.
3 p-values for coefficients usually too small (“spurious regression’ ’).
4 AIC of fitted models misleading.
Minimizingε2t avoidstheseproblems. Maximizing likelihood similar to minimizing ε2t .
6
Stationarity
Regression with ARMA errors
yt =β0 +β1×1,t +···+βkxk,t +ηt, where ηt is an ARMA process.
All variables in the model must be stationary.
If we estimate the model while any of these are non-stationary, the estimated coefficients can be incorrect.
Difference variables until all stationary.
If necessary, apply same differencing to all variables.
7
Stationarity
Model with ARIMA(1,1,1) errors
yt =β0 +β1×1,t +···+βkxk,t +ηt, (1 − φ1B)(1 − B)ηt = (1 + θ1B)εt,
8
Stationarity
Model with ARIMA(1,1,1) errors
yt =β0 +β1×1,t +···+βkxk,t +ηt, (1 − φ1B)(1 − B)ηt = (1 + θ1B)εt,
Equivalent to model with ARIMA(1,0,1) errors
y′ =β1x′ +···+βkx′ +η′, t1,t k,tt
(1 − φ1B)ηt′ = (1 + θ1B)εt,
where yt′ = yt − yt−1, xt′,i = xt,i − xt−1,i and η t′ = η t − η t − 1 .
8
Regression with ARIMA errors
Any regression with an ARIMA error can be rewritten as a regression with an ARMA error by differencing all variables with the same differencing operator as in the ARIMA model.
9
Regression with ARIMA errors
Any regression with an ARIMA error can be rewritten as a regression with an ARMA error by differencing all variables with the same differencing operator as in the ARIMA model.
Original data
yt =β0 +β1×1,t +···+βkxk,t +ηt where φ(B)(1 − B)dηt = θ(B)εt
9
Regression with ARIMA errors
Any regression with an ARIMA error can be rewritten as a regression with an ARMA error by differencing all variables with the same differencing operator as in the ARIMA model.
Original data
yt =β0 +β1×1,t +···+βkxk,t +ηt where φ(B)(1 − B)dηt = θ(B)εt
After differencing all variables
y′ =β1x′ +···+βkx′ +η′. t1,t k,tt
where φ(B)ηt = θ(B)εt and yt′=(1−B)dyt
9
Regression with ARIMA errors
In R, we can specify an ARIMA(p, d, q) for the errors, and d levels of differencing will be applied to all variables (y, x1,t, . . . , xk,t).
Check that εt series looks like white noise.
AICc can be calculated for final model.
Repeat procedure for all subsets of predictors to be considered, and select model with lowest AICc value.
10
US personal consumption and income
Quarterly changes in US consumption and personal income
2 1 0
−1 −2
2.5
0.0 −2.5
2.5
0.0 −2.5 −5.0
40
20
0
−20
−40
−60 1.5
1.0 0.5 0.0
−0.5 −1.0
1980 Q1
2000 Q1
2020 Q1
Quarter
11
Consumption Income Production Savings Unemployment
US personal consumption and income
No need for transformations or further differencing.
Increase in income does not necessarily translate into instant increase in consumption (e.g., after the loss of a job, it may take a few months for expenses to be reduced to allow for the new circumstances). We will ignore this for now.
12
US personal consumption and income
fit <- us_change %>% model(ARIMA(Consumption ~ Income))
report(fit)
## Series: Consumption
## Model: LM w/ ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2 Income intercept
## 0.707 -0.617 0.2066 0.1976 0.595
## s.e. 0.107 0.122 0.0741 0.0462 0.085
##
## sigma^2 estimated as 0.3113: log likelihood=-163
## AIC=338 AICc=339 BIC=358
13
US personal consumption and income
fit <- us_change %>% model(ARIMA(Consumption ~ Income))
report(fit)
## Series: Consumption
## Model: LM w/ ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2 Income intercept
## 0.707 -0.617 0.2066 0.1976 0.595
## s.e. 0.107 0.122 0.0741 0.0462 0.085
##
## sigma^2 estimated as 0.3113: log likelihood=-163
## AIC=338 AICc=339 BIC=358
Write down the equations for the fitted model.
13
US personal consumption and income
## Series: Consumption
## Model: LM w/ ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2 Income intercept
## 0.707 -0.617 0.2066 0.1976 0.595
## s.e. 0.107 0.122 0.0741 0.0462 0.085
##
## sigma^2 estimated as 0.3113: log likelihood=-163
## AIC=338 AICc=339 BIC=358
Write down the equations for the fitted model.
14
US personal consumption and income
residuals(fit, type=’regression’) %>%
gg_tsdisplay(.resid, plot_type = ‘partial’) +
labs(title = “Regression errors”)
Regression errors
1
0 −1 −2
1980 Q1
2000 Q1
2020 Q1
Quarter
0.3 0.2 0.1 0.0
−0.1
2 4 6 8 10121416182022
lag [1Q]
0.3 0.2 0.1 0.0
−0.1
2 4 6 8 10121416182022
lag [1Q]
15
acf .resid
pacf
US personal consumption and income
residuals(fit, type=’innovation’) %>%
gg_tsdisplay(.resid, plot_type = ‘partial’) +
labs(title = “ARIMA errors”)
ARIMA errors
1
0 −1 −2
1980 Q1
2000 Q1
2020 Q1
Quarter
0.15 0.10 0.05 0.00
−0.05 −0.10 −0.15
0.15 0.10 0.05 0.00
−0.05 −0.10 −0.15
2 4 6 8 10121416182022
lag [1Q]
2 4 6 8 10121416182022
lag [1Q]
16
acf .resid
pacf
US personal consumption and income
augment(fit) %>%
features(.innov, ljung_box, dof = 5, lag = 12)
## # A tibble: 1 x 3
## .model
##
## 1 ARIMA(Consumption ~ Income) 5.54 0.595
lb_stat lb_pvalue
17
US personal consumption and income
us_change_future <- new_data(us_change, 8) %>%
mutate(Income = mean(us_change$Income))
forecast(fit, new_data = us_change_future) %>%
autoplot(us_change) +
labs(x = “Year”, y = “Percentage change”,
title = “Forecasts from regression with ARIMA(1,0,2) errors”)
18
Forecasts from regression with ARIMA(1,0,2) errors
2
1
0
−1
−2
l
1980 Q1
2000 Q1
2020 Q1
Year
evel
80 95
Percentage change
Forecasting
To forecast a regression model with ARIMA errors, we need to forecast the regression part of the model and the ARIMA part of the model and combine the results.
Some predictors are known into the future (e.g., time, dummies).
Separate forecasting models may be needed for other predictors.
Forecast intervals ignore the uncertainty in forecasting the predictors.
19
Daily electricity demand
Model daily electricity demand as a function of temperature using quadratic regression with ARMA errors.
vic_elec_daily %>%
ggplot(aes(x=Temperature, y=Demand, colour=Day_Type)) +
geom_point() +
labs(x = “Maximum temperature”, y = “Electricity demand (GW)”)
350
300
250
200
10 20 30 40 20 Maximum temperature
Day_Type Holiday
Weekday Weekend
Electricity demand (GW)
Daily electricity demand
vic_elec_daily %>%
pivot_longer(c(Demand, Temperature)) %>%
ggplot(aes(x = Date, y = value)) + geom_line() +
facet_grid(name ~ ., scales = “free_y”) + ylab(“”)
350
300
250
200
40
30
20
10
Jan 2014
Apr 2014
Jul 2014
Oct 2014
Jan 2015
Date
21
Demand Temperature
Daily electricity demand
fit <- vic_elec_daily %>%
model(ARIMA(Demand ~ Temperature + I(Temperatureˆ2) +
report(fit)
(Day_Type==”Weekday”)))
## Series: Demand
## Model: LM w/ ARIMA(2,1,2)(2,0,0)[7] errors
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2
## -0.1093 0.7226 -0.0182 -0.9381 0.1958 0.417
## s.e. 0.0779 0.0739 0.0494 0.0493 0.0525 0.057
## Temperature I(Temperature^2)
## -7.614 0.1810
## s.e. 0.448 0.0085
## Day_Type == “Weekday”TRUE
## 30.40
## s.e. 1.33
##
22
## sigma^2 estimated as 44.91: log likelihood=-1206
Daily electricity demand
gg_tsresiduals(fit)
40
20
0
−20
Jan 2014
Apr 2014
Jul 2014
Date
Oct 2014
Jan 2015
0.1
0.0
−0.1
7 14 21
lag [1D]
40
20
0
−20 0 20 40
.resid
23
acf .resid
count
Daily electricity demand
augment(fit) %>%
features(.innov, ljung_box, dof = 9, lag = 14)
## # A tibble: 1 x 3
## .model
##
## 1 “ARIMA(Demand ~ Temperature + I(Tempera~
lb_stat lb_pvalue
28.4 0.0000304
24
Daily electricity demand
# Forecast one day ahead
vic_next_day <- new_data(vic_elec_daily, 1) %>%
mutate(Temperature = 26, Day_Type = “Holiday”)
forecast(fit, vic_next_day)
## # A fable: 1 x 6 [1D]
## # Key: .model [1]
## .model Date Demand .mean Temperature Day_Type
##
## 1 “ARIMA(D~ 2015-01-01 N(161, 45) 161. 26 Holiday
25
Daily electricity demand
vic_elec_future <- new_data(vic_elec_daily, 14) %>%
mutate(
Temperature = 26,
Holiday = c(TRUE, rep(FALSE, 13)),
Day_Type = case_when(
Holiday ~ “Holiday”,
wday(Date) %in% 2:6 ~ “Weekday”,
TRUE ~ “Weekend”
) )
26
Daily electricity demand
forecast(fit, new_data=vic_elec_future) %>%
autoplot(vic_elec_daily) + labs(y=”GW”)
350
300
250
200
150
l
evel
80 95
Jan 2014
Apr 2014
Jul 2014
Date
Oct 2014
Jan 2015
27
GW
Outline
1 Regression with ARIMA errors
2 Stochastic and deterministic trends
3 Dynamic harmonic regression
4 Lagged predictors
28
Stochastic & deterministic trends Deterministic trend
yt = β0 + β1t + ηt where ηt is ARMA process.
29
Stochastic & deterministic trends
Deterministic trend
yt = β0 + β1t + ηt where ηt is ARMA process.
Stochastic trend
yt = β0 + β1t + ηt where ηt is ARIMA process with d = 1.
29
Stochastic & deterministic trends
Deterministic trend
yt = β0 + β1t + ηt where ηt is ARMA process.
Stochastic trend
yt = β0 + β1t + ηt where ηt is ARIMA process with d = 1.
Difference both sides until ηt is stationary: y t′ = β 1 + η t′
where ηt′ is ARMA process.
29
Air transport passengers Australia
aus_airpassengers %>%
autoplot(Passengers) +
labs(y = “Passengers (millions)”,
title = “Total air passengers”)
60
40
20
Total air passengers
1970 1980
1990 2000 2010
Year [1Y]
30
Passengers (millions)
Air transport passengers Australia
Deterministic trend
fit_deterministic <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + trend() + pdq(d = 0)))
report(fit_deterministic)
## Series: Passengers
## Model: LM w/ ARIMA(1,0,0) errors
##
## Coefficients:
## ar1 trend() intercept
## 0.9564 1.415 0.901
## s.e. 0.0362 0.197 7.075
##
## sigma^2 estimated as 4.343: log likelihood=-101
## AIC=210 AICc=211 BIC=217
31
Air transport passengers Australia
Deterministic trend
fit_deterministic <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + trend() + pdq(d = 0)))
report(fit_deterministic)
## Series: Passengers
## Model: LM w/ ARIMA(1,0,0) errors
##
## Coefficients:
## ar1 trend() intercept
## 0.9564 1.415 0.901
## s.e. 0.0362 0.197 7.075
##
## sigma^2 estimated as 4.343: log likelihood=-101
## AIC=210 AICc=211 BIC=217
yt = 0.901 + 1.415t + ηt ηt = 0.956ηt−1 + εt
εt ∼NID(0,4.343).
31
Air transport passengers Australia
Stochastic trend
fit_stochastic <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(d = 1)))
report(fit_stochastic)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
##
##
## s.e.
##
## sigma^2 estimated as 4.271: log likelihood=-98.2
## AIC=200 AICc=201 BIC=204
constant
1.419
0.301
32
Air transport passengers Australia
Stochastic trend
fit_stochastic <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(d = 1)))
report(fit_stochastic)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
##
##
## s.e.
##
## sigma^2 estimated as 4.271: log likelihood=-98.2
## AIC=200 AICc=201 BIC=204
yt − yt−1 = 1.419 + εt,
yt =y0 +1.419t+ηt
ηt = ηt−1 + εt
εt ∼NID(0,4.271). 32
constant
1.419
0.301
Air transport passengers Australia
100
75
50
25
100
75
50
25
Forecasts from trend models
.model
Deterministic trend Stochastic trend
1970 1980
1990 2000 2010 2020
Year
level
80 95
33
Air passengers (millions)
Deterministic trend Stochastic trend
Air transport passengers Australia
aus_airpassengers %>%
autoplot(Passengers) +
autolayer(fit_stochastic %>% forecast(h = 20),
colour = “#0072B2”, level = 95) +
autolayer(fit_deterministic %>% forecast(h = 20),
colour = “#D55E00”, alpha = 0.65, level = 95) +
labs(y = “Air passengers (millions)”,
title = “Forecasts from trend models”)
34
Forecasts from trend models
100
75
50
25
l
1980 2000 2020
Year [1Y]
evel 95
Air passengers (millions)
Forecasting with trend
Point forecasts are almost identical, but prediction intervals differ.
Stochastic trends have much wider prediction intervals because the errors are non-stationary. Be careful of forecasting with deterministic trends too far ahead.
34
Outline
1 Regression with ARIMA errors
2 Stochastic and deterministic trends
3 Dynamic harmonic regression
4 Lagged predictors
35
Dynamic harmonic regression
Combine Fourier terms with ARIMA errors
Advantages
it allows any length seasonality;
for data with more than one seasonal period, you can include Fourier terms of different frequencies;
the seasonal pattern is smooth for small values of K (but more wiggly seasonality can be handled by increasing K);
the short-term dynamics are easily handled with a simple ARMA error.
Disadvantages
seasonality is assumed to be fixed
36
Eating-out expenditure
aus_cafe <- aus_retail %>% filter(
Industry == “Cafes, restaurants and takeaway food service
year(Month) %in% 2004:2018
) %>% summarise(Turnover = sum(Turnover))
aus_cafe %>% autoplot(Turnover)
s”,
37
4000
3500
3000
2500
2000
2005 Jan
2010 Jan
2015 Jan
Month [1M]
Turnover
Eating-out expenditure
fit <- aus_cafe %>% model(
`K = 1` `K = 2` `K = 3` `K = 4` `K = 5` `K = 6`
glance(fit)
= ARIMA(log(Turnover) ~ fourier(K = 1) + PDQ(0,0,0)),
= ARIMA(log(Turnover) ~ fourier(K = 2) + PDQ(0,0,0)),
= ARIMA(log(Turnover) ~ fourier(K = 3) + PDQ(0,0,0)),
= ARIMA(log(Turnover) ~ fourier(K = 4) + PDQ(0,0,0)),
= ARIMA(log(Turnover) ~ fourier(K = 5) + PDQ(0,0,0)),
= ARIMA(log(Turnover) ~ fourier(K = 6) + PDQ(0,0,0)))
.model sigma2
K=1 0.002
K=2 0.001
K=3 0.001
K=4 0.001
K=5 0.000
K=6 0.000
log_lik AIC
317 -616 362 -700 394 -763 427 -822 474 -919 474 -920
AICc BIC
-615 -588
-698 -661
-761 -725
-818 -771
-917 -875
-918 -875 38
Eating-out expenditure
Log transformed LM w/ ARIMA(2,1,3) errors, fourier(K = 1)
AICc = −615
5000
4000
3000
2000
2005 Jan
2010 Jan
2015 Jan
2020 Jan
Month
level
80 95
39
Turnover
Eating-out expenditure
Log transformed LM w/ ARIMA(5,1,1) errors, fourier(K = 2)
AICc = −698
5000
4000
3000
2000
2005 Jan
2010 Jan
2015 Jan
2020 Jan
Month
level
80 95
40
Turnover
Eating-out expenditure
Log transformed LM w/ ARIMA(3,1,1) errors, fourier(K = 3)
AICc = −761
5000
4000
3000
2000
2005 Jan
2010 Jan
2015 Jan
2020 Jan
Month
level
80 95
41
Turnover
Eating-out expenditure
Log transformed LM w/ ARIMA(1,1,5) errors, fourier(K = 4)
AICc = −818
5000
4000
3000
2000
2005 Jan
2010 Jan
2015 Jan
2020 Jan
Month
level
80 95
42
Turnover
Eating-out expenditure
Log transformed LM w/ ARIMA(2,1,0) errors, fourier(K = 5)
AICc = −917
5000
4000
3000
2000
2005 Jan
2010 Jan
2015 Jan
2020 Jan
Month
level
80 95
43
Turnover
Eating-out expenditure
Log transformed LM w/ ARIMA(0,1,1) errors, fourier(K = 6)
AICc = −918
5000
4000
3000
2000
2005 Jan
2010 Jan
2015 Jan
2020 Jan
Month
level
80 95
44
Turnover
Example: weekly gasoline products
fit <- us_gasoline %>%
model(ARIMA(Barrels ~ fourier(K = 6) + PDQ(0,0,0)))
report(fit)
## Series: Barrels
## Model: LM w/ ARIMA(0,1,1) errors
##
## Coefficients:
##
##
## s.e.
##
##
## s.e.
##
##
## s.e.
##
##
## s.e.
##
##
## s.e.
##
ma1 fourier(K = 6)C1_52 fourier(K = 6)S1_52
-0.896 -0.1122
0.013 0.0123
fourier(K = 6)C2_52 fourier(K = 6)S2_52
0.0419 0.0316
0.0099 0.0099
fourier(K = 6)C3_52 fourier(K = 6)S3_52
0.0832 0.0345
0.0094 0.0094
fourier(K = 6)C4_52 fourier(K = 6)S4_52
0.0186 0.0397
0.0093 0.0092
fourier(K = 6)C5_52 fourier(K = 6)S5_52
-0.0314 0.0010
0.0092 0.0092
-0.2300
0.0122
fourier(K = 6)C6_52 fourier(K = 6)S6_52 intercept
46
Example: weekly gasoline products
forecast(fit, h = “3 years”) %>%
autoplot(us_gasoline)
10
9
Barrels
8 95
level 80
7
1990 W01 1999 W52
2009 W53 2020 W01
Week
46
5-minute call centre volume
(calls <- readr::read_tsv("http://robjhyndman.com/data/callcenter.txt") %>
rename(time = X1) %>%
pivot_longer(-time, names_to = “date”, values_to = “volume”) %>%
mutate(
date = as.Date(date, format = “%d/%m/%Y”),
datetime = as_datetime(date) + time
) %>%
as_tsibble(index = datetime))
## # A tsibble: 27,716 x 4 [5m]
## time date
##
5-minute call centre volume
calls %>% fill_gaps() %>% autoplot(volume) 400
300
200
100
0
volume
Apr Jul Oct
datetime [5m]
48
5-minute call centre volume
calls %>% fill_gaps() %>%
gg_season(volume, period = “day”, alpha = 0.1) +
guides(colour = FALSE)
400
300
200
100
volume
0
00 06 12 18
50
datetime
5-minute call centre volume
library(sugrrants)
calls %>% filter(month(date, label = TRUE) == “Apr”) %>%
ggplot(aes(x = time, y = volume)) +
geom_line() + facet_calendar(date)
300 200 100
300 200 100
300 200 100
300 200 100
Apr 07
Apr 14
Apr 21
Apr 28
Apr 01
Apr 08
Apr 15
Apr 22
Apr 29
Apr 02
Apr 09
Apr 16
Apr 23
Apr 30
Apr
Apr
Apr
Apr
03
10
17
24
Apr 04
Apr 11
Apr 18
Apr 25
Apr 05
Apr 12
Apr 19
Apr 26
Apr 06
Apr 13
Apr 20
Apr 27
volume
51
10:0104:0:0108:0:00:0010:0104:0:0108:0:00:0010:0104:0:0108:0:00:00
5-minute call centre volume
calls_mdl <- calls %>%
mutate(idx = row_number()) %>%
update_tsibble(index = idx)
fit <- calls_mdl %>%
model(ARIMA(volume ~ fourier(169, K = 10) + pdq(d=0) + PDQ(0,0,0)))
report(fit)
## Series: volume
## Model: LM w/ ARIMA(1,0,3) errors
##
## Coefficients:
##
##
## s.e.
##
##
## s.e.
##
##
## s.e.
##
##
## s.e.
##
ar1 ma1 ma2 ma3 fourier(169, K = 10)C1_169
0.989 -0.7383 -0.0333 -0.0282
0.001 0.0061 0.0075 0.0060
fourier(169, K = 10)S1_169 fourier(169, K = 10)C2_169
55.298 -32.361
0.701 0.378
fourier(169, K = 10)S2_169 fourier(169, K = 10)C3_169
13.742 -9.318
0.379 0.273
fourier(169, K = 10)S3_169 fourier(169, K = 10)C4_169
-13.645 -2.791
0.273 0.223
fourier(169, K = 10)S4_169 fourier(169, K = 10)C5_169
-79.1 0.7
51
5-minute call centre volume
gg_tsresiduals(fit, lag = 338)
150 100 50 0 −50 −100
0.050 0.025 0.000
−0.025
0
10000
20000
acf .resid
count
100 200
lag [1]
300
−100
−50
0 50
.resid
100 150
idx
1000 750 500 250 0
52
5-minute call centre volume
fit %>% forecast(h = 1690) %>%
autoplot(calls_mdl)
400
300
200
100
0
level
80 95
volume
0 10000
20000
30000
idx
53
5-minute call centre volume
fit %>% forecast(h = 1690) %>%
autoplot(filter(calls_mdl, idx > 25000))
400
300
200
100
level
80 95
volume
25000 26000
27000
28000
29000
idx
54
Outline
1 Regression with ARIMA errors
2 Stochastic and deterministic trends
3 Dynamic harmonic regression
4 Lagged predictors
55
Lagged predictors
Sometimes a change in xt does not affect yt instantaneously
56
Lagged predictors
Sometimes a change in xt does not affect yt instantaneously
yt = sales, xt = advertising.
yt = stream flow, xt = rainfall.
yt = size of herd, xt = breeding stock.
56
Lagged predictors
Sometimes a change in xt does not affect yt instantaneously
yt = sales, xt = advertising.
yt = stream flow, xt = rainfall.
yt = size of herd, xt = breeding stock.
These are dynamic systems with input (xt) and output (yt).
xt is often a leading indicator.
There can be multiple predictors.
56
Lagged predictors
The model include present and past values of predictor: xt,xt−1,xt−2,….
yt =a+γ0xt +γ1xt−1 +···+γkxt−k +ηt where ηt is an ARIMA process.
57
Lagged predictors
The model include present and past values of predictor: xt,xt−1,xt−2,….
yt =a+γ0xt +γ1xt−1 +···+γkxt−k +ηt
where ηt is an ARIMA process. Rewrite model as
yt =a+(γ0 +γ1B+γ2B2 +···+γkBk)xt +ηt = a + γ(B)xt + ηt.
57
Lagged predictors
The model include present and past values of predictor: xt,xt−1,xt−2,….
yt =a+γ0xt +γ1xt−1 +···+γkxt−k +ηt
where ηt is an ARIMA process. Rewrite model as
yt =a+(γ0 +γ1B+γ2B2 +···+γkBk)xt +ηt = a + γ(B)xt + ηt.
γ(B) is called a transfer function since it describes how change in xt is transferred to yt.
x can influence y, but y is not allowed to influence x.
57
Example: Insurance quotes and TV adverts
insurance %>%
pivot_longer(Quotes:TVadverts) %>%
ggplot(aes(x = Month, y = value)) + geom_line() +
facet_grid(vars(name), scales = “free_y”) +
labs(y = NULL, title = “Insurance advertising and quotations”)
18 16 14 12 10
8 11
10 9 8 7 6
Insurance advertising and quotations
2002 Jan
2003 Jan
2004 Jan
2005 Jan
Month
59
Quotes TVadverts
Example: Insurance quotes and TV adverts
Insurance advertising and quotations
18
16
14
12
10
8
6 7 8 9 10 11 6 7 8 9 10 11 6 7 8 9 10 11
TV_advert
lag0
lag1
lag2
59
Quotes
Example: Insurance quotes and TV adverts
fit <- insurance %>%
# Restrict data so models use same fitting period mutate(Quotes = c(NA,NA,NA,Quotes[4:40])) %>%
# Estimate models
model(
ARIMA(Quotes ~ pdq(d = 0) + TVadverts),
ARIMA(Quotes ~ pdq(d = 0) + TVadverts + lag(TVadverts)),
ARIMA(Quotes ~ pdq(d = 0) + TVadverts + lag(TVadverts) +
lag(TVadverts, 2)),
ARIMA(Quotes ~ pdq(d = 0) + TVadverts + lag(TVadverts) +
)
lag(TVadverts, 2) + lag(TVadverts, 3))
60
Example: Insurance quotes and TV adverts
glance(fit)
Lag order
0 1 2 3
sigma2
0.265 0.209 0.215 0.206
log_lik AIC AICc BIC
-28.3 66.6 68.3 75.0 -24.0 58.1 59.9 66.5 -24.0 60.0 62.6 70.2 -22.2 60.3 65.0 73.8
61
Example: Insurance quotes and TV adverts
fit_best <- insurance %>%
model(ARIMA(Quotes ~ pdq(d=0) + TVadverts + lag(TVadverts)))
report(fit_best)
## Series: Quotes
## Model: LM w/ ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2
## 0.512 0.917 0.459
## s.e. 0.185 0.205 0.190
##
## sigma^2 estimated as 0.2166: log likelihood=-23.9
## AIC=61.9 AICc=65.4 BIC=73.7
TVadverts
1.2527
0.0588
lag(TVadverts) intercept
0.1464 2.16
0.0531 0.86
62
Example: Insurance quotes and TV adverts
fit_best <- insurance %>%
model(ARIMA(Quotes ~ pdq(d=0) + TVadverts + lag(TVadverts)))
report(fit_best)
## Series: Quotes
## Model: LM w/ ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2
## 0.512 0.917 0.459
## s.e. 0.185 0.205 0.190
##
## sigma^2 estimated as 0.2166: log likelihood=-23.9
## AIC=61.9 AICc=65.4 BIC=73.7
yt = 2.155 + 1.253xt + 0.146xt−1 + ηt,
ηt = 0.512ηt−1 + εt + 0.917εt−1 + 0.459εt−2,
TVadverts
1.2527
0.0588
lag(TVadverts) intercept
0.1464 2.16
0.0531 0.86
62
Example: Insurance quotes and TV adverts
advert_a <- new_data(insurance, 20) %>%
mutate(TVadverts = 10)
forecast(fit_best, advert_a) %>% autoplot(insurance)
18
16
14
12
10
8
l
2002 Jan
2004 Jan
2006 Jan
Month
evel
80 95
63
Quotes
Example: Insurance quotes and TV adverts
advert_b <- new_data(insurance, 20) %>%
mutate(TVadverts = 8)
forecast(fit_best, advert_b) %>% autoplot(insurance)
18
16
14
12
10
8
l
2002 Jan
2004 Jan
2006 Jan
Month
evel
80 95
64
Quotes
Example: Insurance quotes and TV adverts
advert_c <- new_data(insurance, 20) %>%
mutate(TVadverts = 6)
forecast(fit_best, advert_c) %>% autoplot(insurance)
18
16
14
12
10
8
l
2002 Jan
2004 Jan
2006 Jan
Month
evel
80 95
65
Quotes