Faculty of Business & Economics
Semester One 2019 Examination Period
UNIT CODES: ETF3231/ETF5231
TITLE OF PAPER: “Business Forecasting – PAPER 1”
EXAM DURATION: 3.5 hours
Page 1 of 18
The exam contains FIVE questions. ALL questions must be answered. The exam is worth 100 marks in total.
SECTION A
Write about a quarter of a page on each of any FOUR of the following topics. (Clearly state whether you agree or disagree with each statement. No marks will be given without justification.)
1. The trouble with statistical methods of forecasting is that they assume the patterns in the past will continue in the future.
2. A time series decomposition into trend, seasonal and remainder terms is only useful when there are no cycles in the data.
3. Some ETS models are not always suitable and should be avoided.
4. The random walk model is a non-stationary process.
5. The combination of AR and MA components guides long-run ARIMA forecasts.
6. Linear regression models are simplistic because the real world is nonlinear.
END OF SECTION A
Total: 20 marks
SECTION B
Figure 1 shows the number of employees (in thousands) in child day care services in New York City over the period January 1990–February 2019.
Figure 1: Time plot titled "Number of employees in child day care services in New York City"; x-axis: Year (1990–2020); y-axis: thousands of persons.
1. The following code has been used to produce Figures 1, 2 and 3.
library(fpp3)  # assumed: loads tsibble, feasts, fable and ggplot2
# daycare is assumed to be a monthly tsibble with a Count column (thousands)
daycare %>%
  autoplot(Count) +
  ggtitle("Number of employees in child day care services in New York City") +
  xlab("Year") + ylab("Thousands of persons")

daycare %>%
  gg_season(Count, labels = "both") +  # seasonal plot, years labelled at both ends
  ggtitle("Number of employees in child day care services in New York City") +
  ylab("Thousands of persons")

daycare %>%
  gg_subseries(Count) +  # one panel per month
  ggtitle("Number of employees in child day care services in New York City") +
  ylab("Thousands of persons")
Using Figures 1, 2 and 3, describe the daycare series.
4 marks
Figure 2: Seasonal plot (gg_season) titled "Number of employees in child day care services in New York City". It has one line per year from 1990 to 2019, labelled at both ends; x-axis: Date (Jan to Dec); y-axis: thousands of persons.
Figure 3: Subseries plot (gg_subseries) of the same series, with one panel per month (Jan to Dec), each plotted over 1990–2020; x-axis: Date; y-axis: thousands of persons.
2. Using the code below, describe what is plotted in Figure 4. Comment on the selection of window.
daycare %>%
  model(STL(log(Count) ~ season(window = 21))) %>%
  components() %>%
  autoplot() +
  ggtitle("Number of employees in child day care services in New York City")
Figure 4: STL decomposition titled "Number of employees in child day care services in New York City", showing `log(Count)` = trend + season_year + remainder; panels: log(Count), trend, season_year, remainder; x-axis: Date (1990–2020).
6 marks
3. You are asked to provide forecasts for the next two years for the daycare series shown in Figure 1. Consider applying each of the methods and models below. Comment, in a few words each, on whether each one is appropriate for forecasting the data. No marks will be given for simply guessing whether a method or a model is appropriate without justifying your choice. Start your response by stating: SUITABLE or NOT SUITABLE.
(a) Seasonal naïve method.
(b) Drift method plus seasonal dummies.
(c) Holt-Winters additive damped trend method.
(d) Holt-Winters multiplicative damped trend method.
(e) ETS(A,N,M).
(f) ETS(M,Ad,M).
(g) ARIMA(1,1,4).
(h) ARIMA(3,1,2)(1,1,0)[12].
(i) ARIMA(0,1,1)(2,0,0)[12].
(j) Regression model with time and Fourier terms.
10 marks
END OF SECTION B
Total: 20 marks
SECTION C
The following R code and output concerns two models for the daycare series plotted in Figure 1. The estimated components of the models are plotted in Figure 5.
fit_ets <- daycare %>% model(
  trend = ETS(Count ~ trend(method = "A")),
  damped = ETS(Count ~ trend(method = "Ad"))
)
fit_ets %>% select(trend) %>% report()
## Series: Count
## Model: ETS(M,A,M)
##   Smoothing parameters:
##     alpha = 0.46736
##     beta = 0.00010498
##     gamma = 0.35681
##
##   Initial states:
##         l        b     s1     s2     s3     s4      s5      s6     s7     s8
##    14.013 0.061175 1.0192 1.0275 1.0127 1.0027 0.95847 0.96163 1.0028 0.9928
##        s9    s10    s11     s12
##    1.0102 1.0171 1.0017 0.99314
##
##   sigma^2:  2e-04
##
##      AIC   AICc    BIC
##   1198.8 1200.6 1264.4
fit_ets %>% select(damped) %>% report()
## Series: Count
## Model: ETS(M,Ad,M)
##   Smoothing parameters:
##     alpha = 0.89049
##     beta = 0.015802
##     gamma = 0.069298
##     phi = 0.98
##
##   Initial states:
##         l        b    s1     s2     s3      s4      s5      s6     s7     s8
##    13.545 0.045311 1.022 1.0193 1.0126 0.98462 0.92759 0.94846 1.0018 1.0164
##        s9    s10   s11    s12
##    1.0204 1.0211 1.014 1.0118
##
##   sigma^2:  1e-04
##
##      AIC   AICc    BIC
##   1147.3 1149.4 1216.8
Figure 5: Estimated components of the two models, in two panels titled "ETS(M,A,M) decomposition" and "ETS(M,Ad,M) decomposition". Each panel shows Count, level, slope, season and remainder against Date (1990–2020).
1. Comment on the differences between the two model specifications.
2 marks
2. Comment on Figure 5 and how it relates to the estimated parameters of the models.
6 marks
3. Figures 6 and 7, and the R output below them, relate to the residuals from the two estimated models. Comment on these in relation to the fit of the models, giving as much detail as you can. What do your conclusions imply about using these models for forecasting?
4 marks
fit_ets %>% select(trend) %>% gg_tsresiduals()
Figure 6: Residual diagnostics for the ETS(M,A,M) model: time plot of the residuals against Date, ACF of the residuals (lags up to 24 months) and histogram of .resid.
fit_ets %>%
  select(trend) %>%
  augment() %>%
  features(.resid, ljung_box, lag = 24, dof = 16)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
## 1 trend     191.         0
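The ljung_box() calls in this section test whether the first 24 residual autocorrelations are jointly zero. The textbook Ljung-Box statistic is given below for reference. T is the number of residuals, r_k is the lag-k residual autocorrelation, h is the value passed as lag and K is the value passed as dof. These symbols are notation introduced here, not R output.

```latex
Q^* = T(T+2)\sum_{k=1}^{h}\frac{r_k^2}{T-k},
\qquad
Q^* \overset{\text{approx.}}{\sim} \chi^2_{h-K} \quad \text{if the residuals are white noise.}
```

The reported lb_pvalue is the upper-tail probability of Q^* under this chi-squared reference distribution. With lag = 24 and dof = 16, the comparison is therefore made against a distribution with 24 - 16 = 8 degrees of freedom.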
fit_ets %>% select(damped) %>% gg_tsresiduals()
Figure 7: Residual diagnostics for the ETS(M,Ad,M) model: time plot of the residuals against Date, ACF of the residuals (lags up to 24 months) and histogram of .resid.
fit_ets %>%
  select(damped) %>%
  augment() %>%
  features(.resid, ljung_box, lag = 24, dof = 17)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
## 1 damped    98.1         0
4. Considering all the analysis so far, which model would you choose for forecasting and why?
2 marks
5. Figure 8 shows forecasts from the two models (to improve the visualization, only data from 2000 onwards are included in the plots). Comment on the two sets of forecasts. Based on these, would you change your choice of model for forecasting?
4 marks
Figure 8: Two panels, "Forecasts from ETS(M,A,M)" and "Forecasts from ETS(M,Ad,M)". Each shows the data from 2000 onwards with point forecasts and 80% and 95% prediction intervals; x-axis: Year; y-axis: thousands of persons.
6. Write down in full your selected estimated model.
2 marks
END OF SECTION C
Total: 20 marks
SECTION D
1. Figures 9 and 10 show time plots, ACFs and PACFs related to the daycare series. The variables plotted were constructed as follows.
daycare %>% mutate(
  log_count = log(Count),
  diff_log_count = difference(log(Count)),
  sdiff_log_count = difference(log(Count), 12),
  diff_sdiff_log_count = difference(difference(log(Count), 12))
)
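The four variables created by the mutate() call above correspond to the transformations below. Here y_t denotes Count and B is the backshift operator (B z_t = z_{t-1}). The shorthand z_t is introduced here for illustration only.

```latex
\begin{aligned}
\texttt{log\_count}:\quad & z_t = \log y_t \\
\texttt{diff\_log\_count}:\quad & z_t - z_{t-1} = (1 - B)\,z_t \\
\texttt{sdiff\_log\_count}:\quad & z_t - z_{t-12} = (1 - B^{12})\,z_t \\
\texttt{diff\_sdiff\_log\_count}:\quad & (z_t - z_{t-12}) - (z_{t-1} - z_{t-13}) = (1 - B)(1 - B^{12})\,z_t
\end{aligned}
```

Assuming tsibble's difference(), each differenced column is padded with NA at the start: 1, 12 and 13 values respectively for the three differenced variables.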
Figure 9: Time plots (value against Date, 1990–2020) of the four variables, in panels for log_count, diff_log_count, sdiff_log_count and diff_sdiff_log_count.
Figure 10: ACF and PACF (lags up to 24 months) of log_count, diff_log_count, sdiff_log_count and diff_sdiff_log_count.
Explain what each of the ACFs and PACFs shows about the stationarity, seasonality and other features of the time series.
4 marks
2. The following R code and output concerns two models estimated for the daycare series. Relate the estimated models to the relevant ACFs and PACFs from Figure 10 and describe what dynamics they capture.
fit.arima <- daycare %>% model(
  arima1 = ARIMA(log(Count)),
  arima2 = ARIMA(log(Count) ~ pdq(d = 1))
)
fit.arima %>% select(arima1) %>% report()
## Series: Count
## Model: ARIMA(1,0,1)(0,1,1)[12] w/ drift
## Transformation: log(.x)
##
## Coefficients:
##          ar1      ma1     sma1  constant
##       0.9191  -0.0920  -0.7295    0.0026
## s.e.  0.0257   0.0621   0.0403    0.0002
##
## sigma^2 estimated as 0.0001186:  log likelihood=1045.4
## AIC=-2080.8   AICc=-2080.6   BIC=-2061.7
fit.arima %>% select(arima2) %>% report()
## Series: Count
## Model: ARIMA(0,1,1)(0,1,1)[12]
## Transformation: log(.x)
##
## Coefficients:
##           ma1     sma1
##       -0.1420  -0.7254
## s.e.   0.0576   0.0399
##
## sigma^2 estimated as 0.0001224:  log likelihood=1036.6
## AIC=-2067.2   AICc=-2067.1   BIC=-2055.7
6 marks
3. Write down the estimated model of arima2 using backshift notation and expand this to the point where it can be used to generate point forecasts.
4 marks
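The general multiplicative seasonal ARIMA(p,d,q)(P,D,Q)_m model can be written in backshift notation (B y_t = y_{t-1}). Here y_t is a possibly log-transformed series and c is a constant. This is the standard textbook form, with the MA polynomials written with plus signs. The polynomial symbols are notation introduced here, not R output.

```latex
\begin{aligned}
&\phi(B)\,\Phi(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t = c + \theta(B)\,\Theta(B^m)\,\varepsilon_t,\\
&\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad \Phi(B^m) = 1 - \Phi_1 B^m - \cdots - \Phi_P B^{Pm},\\
&\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q, \qquad \Theta(B^m) = 1 + \Theta_1 B^m + \cdots + \Theta_Q B^{Qm}.
\end{aligned}
```

With d = D = 1 and m = 12, the differencing operator (1 - B)(1 - B^{12}) is the same transformation that difference(difference(log(Count), 12)) applies in the Section D code.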
4. Some diagnostics for the residuals of the arima2 model are presented below. Briefly comment on whether these are satisfactory, including the significant spikes in the ACF.
Figure 11: Residual diagnostics for the arima2 model: time plot of the residuals against Date, ACF of the residuals (lags up to 24 months) and histogram of .resid.
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
## 1 arima2    26.6     0.228
2 marks
5. Figure 12 plots forecasts from the two ARIMA models estimated in Question 2. Comment on the differences between the point and interval forecasts, and in particular on the role that differencing plays.
4 marks
Figure 12: Forecasts of Count from arima1 and arima2, in separate panels, with 80% and 95% prediction intervals; x-axis: Date (2000–2020); y-axis: Count.
END OF SECTION D
Total: 20 marks
SECTION E
In the following code, a series of dynamic harmonic regression models are fitted to the day care data shown in Figure 1.
dhr <- daycare %>% model(
  fit1 = ARIMA(log(Count) ~ trend() + fourier(12, 1)),
  fit2 = ARIMA(log(Count) ~ trend() + fourier(12, 2)),
  fit3 = ARIMA(log(Count) ~ trend() + fourier(12, 3)),
  fit4 = ARIMA(log(Count) ~ trend() + fourier(12, 4)),
  fit5 = ARIMA(log(Count) ~ trend() + fourier(12, 5)),
  fit6 = ARIMA(log(Count) ~ trend() + fourier(12, 6)),
  #fit7 = ARIMA(log(Count) ~ trend() + fourier(12, 7))
)
glance(dhr)
## # A tibble: 6 x 8
##   .model   sigma2 log_lik    AIC   AICc    BIC ar_roots ma_roots
## 1 fit1   0.000163   1024. -2034. -2034. -2007.
## 2 fit2   0.000136   1060. -2096. -2095. -2049.
## 3 fit3   0.000130   1071. -2116. -2115. -2066.
## 4 fit4   0.000120   1089. -2148. -2146. -2090.
## 5 fit5   0.000115   1097. -2160. -2158. -2094.
## 6 fit6   0.000115   1097. -2158. -2156. -2088.
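In the model formulas above, fourier(12, K) adds K pairs of sine and cosine regressors with seasonal period m = 12. In the usual textbook form, the k-th pair is as shown below. The index t is time and k is the harmonic; the x notation is introduced here for illustration.

```latex
x^{(s)}_{k,t} = \sin\!\left(\frac{2\pi k t}{m}\right), \qquad
x^{(c)}_{k,t} = \cos\!\left(\frac{2\pi k t}{m}\right), \qquad
k = 1, \dots, K, \quad m = 12.
```

So fit1 uses only the k = 1 pair, and each later model adds the next harmonic. In the R output these regressors appear as columns named in the pattern fourier(12, K)Sk_12 and fourier(12, K)Ck_12.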
1. The seventh model (commented out) would cause an error if it were uncommented and the code run. Why?
2 marks
2. Which of the six fitted models would you select? Why?
1 mark
3. One of the models has the following output:
## Series: Count
## Model: LM w/ ARIMA(1,0,1)(1,0,1)[12] errors
## Transformation: log(.x)
##
## Coefficients:
##          ar1      ma1    sar1     sma1  trend()  fourier(12, 5)C1_12
##       0.9016  -0.0615  0.8082  -0.5787   0.0027               0.0238
## s.e.  0.0271   0.0622  0.0784   0.1056   0.0001               0.0030
##       fourier(12, 5)S1_12  fourier(12, 5)C2_12  fourier(12, 5)S2_12
##                    0.0139              -0.0155              -0.0184
## s.e.               0.0030               0.0016               0.0016
##       fourier(12, 5)C3_12  fourier(12, 5)S3_12  fourier(12, 5)C4_12
##                    0.0049               0.0117              -0.0021
## s.e.               0.0012               0.0012               0.0010
##       fourier(12, 5)S4_12  fourier(12, 5)C5_12  fourier(12, 5)S5_12  intercept
##                   -0.0072              -0.0011               0.0025     2.6195
## s.e.               0.0010               0.0009               0.0009     0.0184
##
## sigma^2 estimated as 0.000115:  log likelihood=1096.8
## AIC=-2159.6   AICc=-2157.7   BIC=-2094
Write down the form of the model using equations. Explain how each of the coefficients contributes to the forecast function.
8 marks
4. Using the model above, the point forecast for the next observation is 35.34. Give a 95% prediction interval for this month.
3 marks
5. Why is the selected ARIMA model stationary when the data are clearly nonstationary?
2 marks
6. For the model shown above, the residuals can be tested to assess the model assumptions.
Figure 13: Residual diagnostics for the model above: time plot of the residuals against Date, ACF of the residuals (lags up to 24 months) and histogram of .resid.
## # A tibble: 1 x 2
##   lb_stat lb_pvalue
## 1    27.6   0.00111
Comment on what this tells you about the model assumptions. Should you trust the point forecasts and forecast intervals produced by the model?
4 marks
END OF SECTION E
Total: 20 marks
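Table 1: State space equations for each of the models in the ETS framework.

ADDITIVE ERROR MODELS
Trend N, Seasonal N: $y_t = \ell_{t-1} + \varepsilon_t$; $\ell_t = \ell_{t-1} + \alpha\varepsilon_t$
Trend N, Seasonal A: $y_t = \ell_{t-1} + s_{t-m} + \varepsilon_t$; $\ell_t = \ell_{t-1} + \alpha\varepsilon_t$; $s_t = s_{t-m} + \gamma\varepsilon_t$
Trend N, Seasonal M: $y_t = \ell_{t-1}s_{t-m} + \varepsilon_t$; $\ell_t = \ell_{t-1} + \alpha\varepsilon_t/s_{t-m}$; $s_t = s_{t-m} + \gamma\varepsilon_t/\ell_{t-1}$
Trend A, Seasonal N: $y_t = \ell_{t-1} + b_{t-1} + \varepsilon_t$; $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$; $b_t = b_{t-1} + \beta\varepsilon_t$
Trend A, Seasonal A: $y_t = \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t$; $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t$; $b_t = b_{t-1} + \beta\varepsilon_t$; $s_t = s_{t-m} + \gamma\varepsilon_t$
Trend A, Seasonal M: $y_t = (\ell_{t-1} + b_{t-1})s_{t-m} + \varepsilon_t$; $\ell_t = \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t/s_{t-m}$; $b_t = b_{t-1} + \beta\varepsilon_t/s_{t-m}$; $s_t = s_{t-m} + \gamma\varepsilon_t/(\ell_{t-1} + b_{t-1})$
Trend Ad, Seasonal N: $y_t = \ell_{t-1} + \phi b_{t-1} + \varepsilon_t$; $\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha\varepsilon_t$; $b_t = \phi b_{t-1} + \beta\varepsilon_t$
Trend Ad, Seasonal A: $y_t = \ell_{t-1} + \phi b_{t-1} + s_{t-m} + \varepsilon_t$; $\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha\varepsilon_t$; $b_t = \phi b_{t-1} + \beta\varepsilon_t$; $s_t = s_{t-m} + \gamma\varepsilon_t$
Trend Ad, Seasonal M: $y_t = (\ell_{t-1} + \phi b_{t-1})s_{t-m} + \varepsilon_t$; $\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha\varepsilon_t/s_{t-m}$; $b_t = \phi b_{t-1} + \beta\varepsilon_t/s_{t-m}$; $s_t = s_{t-m} + \gamma\varepsilon_t/(\ell_{t-1} + \phi b_{t-1})$

MULTIPLICATIVE ERROR MODELS
Trend N, Seasonal N: $y_t = \ell_{t-1}(1 + \varepsilon_t)$; $\ell_t = \ell_{t-1}(1 + \alpha\varepsilon_t)$
Trend N, Seasonal A: $y_t = (\ell_{t-1} + s_{t-m})(1 + \varepsilon_t)$; $\ell_t = \ell_{t-1} + \alpha(\ell_{t-1} + s_{t-m})\varepsilon_t$; $s_t = s_{t-m} + \gamma(\ell_{t-1} + s_{t-m})\varepsilon_t$
Trend N, Seasonal M: $y_t = \ell_{t-1}s_{t-m}(1 + \varepsilon_t)$; $\ell_t = \ell_{t-1}(1 + \alpha\varepsilon_t)$; $s_t = s_{t-m}(1 + \gamma\varepsilon_t)$
Trend A, Seasonal N: $y_t = (\ell_{t-1} + b_{t-1})(1 + \varepsilon_t)$; $\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha\varepsilon_t)$; $b_t = b_{t-1} + \beta(\ell_{t-1} + b_{t-1})\varepsilon_t$
Trend A, Seasonal A: $y_t = (\ell_{t-1} + b_{t-1} + s_{t-m})(1 + \varepsilon_t)$; $\ell_t = \ell_{t-1} + b_{t-1} + \alpha(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$; $b_t = b_{t-1} + \beta(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$; $s_t = s_{t-m} + \gamma(\ell_{t-1} + b_{t-1} + s_{t-m})\varepsilon_t$
Trend A, Seasonal M: $y_t = (\ell_{t-1} + b_{t-1})s_{t-m}(1 + \varepsilon_t)$; $\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha\varepsilon_t)$; $b_t = b_{t-1} + \beta(\ell_{t-1} + b_{t-1})\varepsilon_t$; $s_t = s_{t-m}(1 + \gamma\varepsilon_t)$
Trend Ad, Seasonal N: $y_t = (\ell_{t-1} + \phi b_{t-1})(1 + \varepsilon_t)$; $\ell_t = (\ell_{t-1} + \phi b_{t-1})(1 + \alpha\varepsilon_t)$; $b_t = \phi b_{t-1} + \beta(\ell_{t-1} + \phi b_{t-1})\varepsilon_t$
Trend Ad, Seasonal A: $y_t = (\ell_{t-1} + \phi b_{t-1} + s_{t-m})(1 + \varepsilon_t)$; $\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha(\ell_{t-1} + \phi b_{t-1} + s_{t-m})\varepsilon_t$; $b_t = \phi b_{t-1} + \beta(\ell_{t-1} + \phi b_{t-1} + s_{t-m})\varepsilon_t$; $s_t = s_{t-m} + \gamma(\ell_{t-1} + \phi b_{t-1} + s_{t-m})\varepsilon_t$
Trend Ad, Seasonal M: $y_t = (\ell_{t-1} + \phi b_{t-1})s_{t-m}(1 + \varepsilon_t)$; $\ell_t = (\ell_{t-1} + \phi b_{t-1})(1 + \alpha\varepsilon_t)$; $b_t = \phi b_{t-1} + \beta(\ell_{t-1} + \phi b_{t-1})\varepsilon_t$; $s_t = s_{t-m}(1 + \gamma\varepsilon_t)$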