SECTION A
Forecasting Exam 2020: solutions
Write about a quarter of a page each on any four of the following topics. (Clearly state if you agree or disagree with each statement. No marks will be given without any justification.)
Deduct marks for each major thing missed, and for each wrong statement. In general, be relatively generous if the answer makes sense and contains the main ideas.
1. The disadvantage of using a test set for choosing a forecasting model is that it uses only a small proportion of the data.
• This is true.
• Alternative is to use information criteria – use all data
• These are not useful/helpful in all situations, e.g., for comparing across model classes or
ARIMA models with different orders of differencing, etc.
• Another alternative is to use cross-validation.
• If used across many series and across various forecast horizons it can be slow. In this case,
test sets across the many series may be useful.
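As a rough illustration of the cross-validation alternative mentioned above, the sketch below evaluates a naive (last value) forecast at every forecast origin rather than on a single test set. The series values, the function name `rolling_origin_errors` and the choice of the naive method are all assumptions made for illustration only.

```python
from statistics import mean

def rolling_origin_errors(series, min_train, horizon=1):
    """Absolute errors of a naive (last value) forecast at each forecast origin."""
    errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]                  # data available at this origin
        actual = series[origin + horizon - 1]    # value `horizon` steps ahead
        forecast = train[-1]                     # naive forecast: repeat last observation
        errors.append(abs(actual - forecast))
    return errors

# Made-up numbers, purely for illustration.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21]

# Cross-validation: one error per forecast origin, so more of the data is used.
cv_errors = rolling_origin_errors(series, min_train=4)
cv_mae = mean(cv_errors)

# Single test set: only the last two observations are used for evaluation.
single_split_errors = [abs(y - series[7]) for y in series[8:]]
```

The cross-validated accuracy measure averages over six forecast origins here, while the single split uses only two observations, which is the trade-off between data usage and computation described above.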
2. The best forecasting models adapt rapidly to changes in the trend and seasonal patterns.
• This is not necessarily true.
• Good models need to be adaptive, but it is possible to overfit if the model adapts too
quickly to changes in the data.
• Models that don’t adapt to changes will give biased forecasts, while models that adapt too
quickly to changes will have inflated variances.
3. With STL decompositions and ETS models, we always need to transform our data before
estimating the components.
• This is not true.
• STL is an additive decomposition and therefore we need to take a transformation before
we apply the decomposition if the data has monotonically changing variance.
• ETS models can have multiplicative components and therefore we do not need to take a
transformation as these will take care of this.
4. The mean of a stationary AR(3) process is equal to the constant c.
• This is not true.
• The mean is related to the constant, but it is not equal to it.
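• The actual mean can be computed like this, using E(yt) = μ for all t (stationarity) and E(εt) = 0:
\begin{align*}
y_t &= c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \varepsilon_t \\
E(y_t) &= c + \phi_1 E(y_{t-1}) + \phi_2 E(y_{t-2}) + \phi_3 E(y_{t-3}) + E(\varepsilon_t) \\
\mu &= c + \phi_1 \mu + \phi_2 \mu + \phi_3 \mu \\
\mu &= \frac{c}{1 - \phi_1 - \phi_2 - \phi_3}
\end{align*}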
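The relationship between the constant and the mean can be checked numerically. The sketch below uses hypothetical coefficient values (c = 2, φ = 0.5, 0.2, 0.1, chosen only so that the process is stationary) and compares the closed-form mean with the value obtained by iterating the expectation recursion.

```python
def ar3_mean(c, phi1, phi2, phi3):
    """Mean of a stationary AR(3) process: c / (1 - phi1 - phi2 - phi3)."""
    return c / (1 - phi1 - phi2 - phi3)

def iterate_expectation(c, phis, steps=500):
    """Iterate E(y_t) = c + phi1*E(y_{t-1}) + phi2*E(y_{t-2}) + phi3*E(y_{t-3})."""
    past = [0.0, 0.0, 0.0]   # E(y_{t-1}), E(y_{t-2}), E(y_{t-3}), started at zero
    for _ in range(steps):
        new = c + sum(p * m for p, m in zip(phis, past))
        past = [new, past[0], past[1]]
    return past[0]

# Hypothetical values, not taken from the exam.
c, phis = 2.0, (0.5, 0.2, 0.1)
mu = ar3_mean(c, *phis)               # closed form
mu_iter = iterate_expectation(c, phis)  # limit of the recursion
```

With these assumed values the mean is about 10, five times the constant, so the mean and the constant clearly differ unless all the φ coefficients sum to zero.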
5. ARIMA models are better than ETS models because there are more possible models available.
• This is not true.
• There are more ARIMA models available, but that does not make them a better class of
models.
• ETS models allow for multiplicative behaviour, while ARIMA models do not.
• ARIMA models handle some types of data (e.g., cyclic, stationary) which ETS models
cannot handle.
6. Regression models are not very useful for forecasting because you have to forecast all the
predictors as well.
• This is not true.
• Sometimes the predictors are known such as trend, seasonality, public holidays, other
calendar effects, lagged predictors.
• When the predictors are unknown in the future, you do have to forecast them, or create
scenario forecasts.
— END OF SECTION A —
[Total: 20 marks]
SECTION B
1. Marks are allocated for the four main features, trend, seasonality, two-peaks, COVID-19.
Time plot:
• The series is trending.
• Strong seasonality with seasonal variation increasing with the level of the series, i.e.,
multiplicative seasonality.
• The last two peaks seem lower than in previous years.
Seasonal plot:
• There are two peaks: one in February, due to the beginning of the year, and one in July, due to the mid-year break.
• The drop in February 2020 shows the effect of COVID-19 and the travel bans.
Subseries plot:
• The effect of COVID-19 and the travel bans is clearly shown in February 2020.
• The average and growth of the peak season arrivals of February and July are almost
identical. Flatter growth is shown across May-Jun and Nov-Dec.
2.
• The panels show STL decompositions, where the lower three panels show the trend-cycle, seasonal and remainder components.
• As STL is an additive decomposition, a log transformation is first applied to account for the multiplicative seasonality.
• The default settings for trend(window) and season(window) are 21 and 13 respectively. These seem to be appropriate here, with both the trend and the seasonal components changing smoothly over time.
• The remainder in both decompositions shows a large value for Feb 2020 due to the travel ban restrictions related to COVID-19.
• Setting robust=TRUE removes its effect from the trend and makes the remainder value even larger.
• For forecasting, the robust setting doesn’t make much difference as it hardly affects seasonality, and the seasonally adjusted values will therefore be almost identical either way. For analysing the trend, we know that the downturn is likely to continue, so it should show up in the trend component. In that case, it is probably better not to use robust=TRUE, and instead to choose a smaller trend window. (Any sensible remarks along these lines are ok.)
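The reason a log transformation is applied before an additive decomposition can be illustrated with a small sketch. The seasonal indices and levels below are hypothetical and are not the exam data. In levels the seasonal swing grows with the level of the series, while in logs it is constant, which is what an additive decomposition assumes.

```python
import math

# Hypothetical multiplicative seasonal indices and a growing level.
seasonal = [1.3, 0.8, 0.9, 1.0]
levels = [100 * 1.5 ** year for year in range(4)]

# One row per "year": level times seasonal index (multiplicative seasonality).
series = [[lvl * s for s in seasonal] for lvl in levels]

def swing(values):
    """Size of the seasonal swing within one year."""
    return max(values) - min(values)

raw_swings = [swing(year) for year in series]                          # grows with the level
log_swings = [swing([math.log(v) for v in year]) for year in series]   # constant
```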
3.
(a) Seasonal naive.
Not suitable. Data needs to be transformed and is also trending.
(b) An STL decomposition combined with the drift method to forecast the seasonally adjusted component.
Not suitable. Data needs transformation before STL is implemented. Also drift will not capture the change in the trend due to COVID-19.
(c) An STL decomposition on the log transformed data combined with an ETS to forecast the seasonally adjusted component.
Suitable. Log transformation will take care of the multiplicative seasonality and ETS trend should update enough to deal with change in trend.
(d) Holt-Winters method with damped trend and multiplicative seasonality.
Suitable, both multiplicative seasonality and damped trend would be appropriate.
(e) ETS(A,A,A).
Not suitable. Some multiplicative component would be required.
(f) ETS(M,A,M).
Suitable. Accounting for multiplicative components.
(g) ARIMA(1,12,4).
Not suitable. Not a feasible model.
(h) ARIMA(3,2,1)(1,1,0)[12] on the log transformed data.
Not suitable. Too many differences.
(i) ARIMA(3,1,1)(2,1,0)[12] on the log transformed data.
Suitable. Order of differencing seems fine. Of course you will need to check the residuals.
(j) Regression with time and Fourier terms.
Not suitable. Seasonality is changing so need transformation. Also need something like a step variable, but probably hard to estimate its coefficient successfully with only limited data.
— END OF SECTION B —
[Total: 20 marks]
SECTION C
1. fit is a mable (model table) containing two models for the Chinese education series: an ETS model and a model combining an STL decomposition and ETS.
2. The full estimated model is:
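\begin{align*}
y_t &= (\ell_{t-1} + b_{t-1}) s_{t-m} (1 + \varepsilon_t) \\
\ell_t &= (\ell_{t-1} + b_{t-1})(1 + 0.1929\varepsilon_t) \\
b_t &= b_{t-1} + 0.0066(\ell_{t-1} + b_{t-1})\varepsilon_t \\
s_t &= s_{t-m}(1 + 0.292\varepsilon_t) \\
\varepsilon_t &\sim N(0, 0.0329).
\end{align*}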
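To make the state equations concrete, here is a minimal sketch of a single ETS(M,A,M) update using the estimated smoothing parameters from the model above. The function name `ets_mam_step`, the starting states and the observation are hypothetical and chosen only for illustration.

```python
def ets_mam_step(level, slope, season, y, alpha=0.1929, beta=0.0066, gamma=0.292):
    """One ETS(M,A,M) update; `season` plays the role of s_{t-m}.

    Returns the one-step forecast, the relative error and the updated states.
    """
    forecast = (level + slope) * season            # one-step point forecast
    eps = y / forecast - 1                         # multiplicative (relative) error
    new_level = (level + slope) * (1 + alpha * eps)
    new_slope = slope + beta * (level + slope) * eps
    new_season = season * (1 + gamma * eps)
    return forecast, eps, new_level, new_slope, new_season

# Hypothetical states and observation: the observation falls below the forecast,
# so the level, slope and seasonal states are all revised downwards.
forecast, eps, level, slope, season = ets_mam_step(20.0, 0.1, 1.1, y=18.0)
```

Because the seasonal smoothing parameter is the largest of the three, the seasonal state reacts most strongly to a given relative error in this sketch.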
3. Figure 6 shows the data in the first panel and below that the estimated components of the
ETS model.
• The level is adjusting over time, with a smoothing parameter of α = 0.2, especially showing an increase after 2015.
• Although the slope smoothing parameter is relatively low (β = 0.0066), the slope is adjusting, especially showing an increase after 2015 (not much change before that).
• It is clearly visible that the seasonal component is rapidly changing/adjusting with a coefficient of γ = 0.292.
• The level and trend show a dip at the end, showing the model adjusting and trying to account for the travel bans and COVID-19. The last peak of the seasonal component is also lower than previous peaks. The remainder shows a large residual due to the travel bans and COVID-19.
4. Residuals look like WN, with no significant autocorrelation left over. They are close to normally distributed. The effect of the travel bans and COVID-19 is visible by the large residual for Feb 2020.
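5. Point forecasts:
\begin{align*}
\hat{y}_{\text{Mar 2020}} &= (22.4 + 0.0747) \times 1.08 = 24.273 \\
\hat{y}_{\text{Apr 2020}} &= (22.4 + 2 \times 0.0747) \times 0.563 = 12.695
\end{align*}
Forecast intervals:
\begin{align*}
\text{Mar 2020}: &\quad 24.273 \pm 1.96\sqrt{19.4} = (15.64, 32.906) \\
\text{Apr 2020}: &\quad 12.695 \pm 1.96\sqrt{5.5} = (8.098, 17.292)
\end{align*}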
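The arithmetic for the point forecasts and intervals can be reproduced with a few lines of Python. The numbers are those quoted in the solution (final level and slope, seasonal states and forecast variances); the variable and function names are made up for this sketch, and the point forecasts are rounded to three decimals before the intervals are formed, which is assumed to match how the solution was computed.

```python
import math

level, slope = 22.4, 0.0747        # final level and slope states
s_mar, s_apr = 1.08, 0.563         # seasonal states for March and April
var_mar, var_apr = 19.4, 5.5       # forecast variances used in the intervals

def point_forecast(h, season):
    """h-step ETS(M,A,M) point forecast: (level + h * slope) * season."""
    return (level + h * slope) * season

def interval(point, variance, z=1.96):
    """Approximate 95% interval: point +/- z * sqrt(variance)."""
    half = z * math.sqrt(variance)
    return point - half, point + half

f_mar = round(point_forecast(1, s_mar), 3)
f_apr = round(point_forecast(2, s_apr), 3)
pi_mar = interval(f_mar, var_mar)
pi_apr = interval(f_apr, var_apr)
```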
6. The alternative model dcmp fits an STL decomposition on the log() transformed data. The seasonal component is projected using the seasonal naïve method and the seasonally adjusted series using an ETS model. It seems that the dcmp model adjusts more severely to the break in level and seasonal component compared to the ETS model.
— END OF SECTION C —
[Total: 20 marks]
SECTION D
1. Time plot:
• Strong seasonal component. [1 mark]
• Significant break in the level towards the end due to COVID-19 and the social distancing measures. [1 mark]
• Significant break in the variance together with the level towards the end due to COVID-19 and the social distancing measures. [1 mark]
Subseries plot:
• Maximum average pedestrian traffic happens on a Friday. Weekends have lower pedestrian traffic than weekdays, with Sunday being the lowest. [1 mark]
2.
• Seasonal component: ACF shows 1 spike, PACF shows exponential decay. Hence, I choose PDQ(0,1,1). [2 marks]
• Non-seasonal component: PACF shows 3 spikes (maybe 5), ACF decaying. Hence I choose pdq(3,0,0). [2 marks]
3. We expect something like the following output.
## Series: Count
## Model: ARIMA(3,0,0)(0,1,1)[7] w/ drift
##
## Coefficients:
##          ar1     ar2     ar3     sma1  constant
##       0.4543  0.1766  0.2633  -0.8241   -0.0140
## s.e.  0.0439  0.0470  0.0431   0.0344    0.0115
##
## sigma^2 estimated as 2.028: log likelihood=-893.01
## AIC=1798 AICc=1798.2 BIC=1823.3
Correct model and pasted estimation output.
[Figure: residual diagnostics for the fitted model, with panels showing the time plot of .resid against Date, the ACF of the residuals against lag [1D], and a histogram (count) of .resid.]
4.
• A run of residuals after the COVID-19 date shows a clear pattern. This may indicate that the model has not dealt well with the break.
• The ACF has one significant spike at lag 4 (although its value is fairly low). However, I would change my model to capture this, possibly by introducing an MA component.
• The histogram shows a longer left tail due to COVID-19, but otherwise normality seems a reasonable assumption.
## # A tibble: 1 x 3
##   .model                                      lb_stat lb_pvalue
##   <chr>                                         <dbl>     <dbl>
## 1 ARIMA(Count ~ pdq(3, 0, 0) + PDQ(0, 1, 1))    20.1    0.0176
The Ljung-Box test for lag 14 with dof=5 rejects the null of white noise at the 5% level of significance.
5.
• The residuals look more like white noise, although there is still some pattern after the COVID-19 break.
• The point forecasts are flat and too seasonal, having not properly accounted for the COVID-19 break.
• The prediction intervals are too wide and have negative lower bounds, which cannot be the case for the number of pedestrians on Melbourne streets. Taking logs would avoid this.
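The Ljung-Box p-value quoted in part 4 can be checked by hand. The sketch below assumes the test statistic is compared with a chi-square distribution on lag minus dof = 14 - 5 = 9 degrees of freedom, and uses the standard finite-sum expression for the chi-square upper tail with odd degrees of freedom. The function name `chi2_sf_odd` is made up for this example. Because the printed statistic (20.1) is rounded, the result is only expected to be close to the reported 0.0176, not identical.

```python
import math

def chi2_sf_odd(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with odd df."""
    if df % 2 != 1:
        raise ValueError("this sketch only handles odd degrees of freedom")
    chi = math.sqrt(x)
    normal_tail = math.erfc(chi / math.sqrt(2))             # equals 2 * P(Z > chi)
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    total, denominator = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        denominator *= 2 * r - 1                             # 1 * 3 * 5 * ... * (2r - 1)
        total += chi ** (2 * r - 1) / denominator
    return normal_tail + 2 * normal_pdf * total

lb_stat, lag, dof = 20.1, 14, 5
p_value = chi2_sf_odd(lb_stat, lag - dof)
reject_white_noise = p_value < 0.05
```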
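ARIMA(1,0,1)(0,1,1)[7]:
\begin{align*}
(1 - \phi_1 B)(1 - B^7) y_t &= (1 + \theta_1 B)(1 + \Theta_1 B^7)\varepsilon_t \\
(1 - \phi_1 B - B^7 + \phi_1 B^8) y_t &= (1 + \theta_1 B + \Theta_1 B^7 + \theta_1\Theta_1 B^8)\varepsilon_t \\
y_t &= \phi_1 y_{t-1} + y_{t-7} - \phi_1 y_{t-8} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \Theta_1\varepsilon_{t-7} + \theta_1\Theta_1\varepsilon_{t-8} \\
&= 0.9769 y_{t-1} + y_{t-7} - 0.9769 y_{t-8} + \varepsilon_t - 0.6132\varepsilon_{t-1} - 0.8459\varepsilon_{t-7} + 0.5187\varepsilon_{t-8}
\end{align*}
\begin{align*}
\hat{y}_{T+1|T} &= 0.9769 y_T + y_{T-6} - 0.9769 y_{T-7} - 0.6132 e_T - 0.8459 e_{T-6} + 0.5187 e_{T-7} \\
&= 0.9769 \times 3.27 + 3.22 - 0.9769 \times 3.12 - 0.6132 \times 1.12 - 0.8459 \times (-0.32) + 0.5187 \times 0.94 = 3.44 \\
\hat{y}_{T+2|T} &= 0.9769 \hat{y}_{T+1|T} + y_{T-5} - 0.9769 y_{T-6} - 0.8459 e_{T-5} + 0.5187 e_{T-6} \\
&= 0.9769 \times 3.44 + 3.17 - 0.9769 \times 3.22 - 0.8459 \times (-0.62) + 0.5187 \times (-0.32) = 3.74
\end{align*}
Prediction interval for 1-step ahead:
\[
3.44 \pm 1.96\sqrt{1.97} = [0.69, 6.19]
\]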
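The forecast arithmetic above can be checked with a short script. The coefficients, observations and residuals are the rounded values printed in the solution; the dictionary keys and variable names are made up for this sketch. With these rounded inputs the one-step expression evaluates to roughly 3.438.

```python
import math

phi1, theta1, Theta1 = 0.9769, -0.6132, -0.8459
cross = theta1 * Theta1    # coefficient on e_{t-8}, about 0.5187

# Most recent observations and residuals, as quoted in the solution.
y = {"T": 3.27, "T-5": 3.17, "T-6": 3.22, "T-7": 3.12}
e = {"T": 1.12, "T-5": -0.62, "T-6": -0.32, "T-7": 0.94}

# One step ahead: the future error e_{T+1} is replaced by zero.
f1 = (phi1 * y["T"] + y["T-6"] - phi1 * y["T-7"]
      + theta1 * e["T"] + Theta1 * e["T-6"] + cross * e["T-7"])

# Two steps ahead: y_{T+1} is replaced by its forecast, e_{T+1} and e_{T+2} by zero.
f2 = (phi1 * f1 + y["T-5"] - phi1 * y["T-6"]
      + Theta1 * e["T-5"] + cross * e["T-6"])

# Approximate 95% interval for the one-step forecast, with sigma^2 = 1.97.
half_width = 1.96 * math.sqrt(1.97)
interval = (f1 - half_width, f1 + half_width)
```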
— END OF SECTION D —
[Total: 20 marks]
SECTION E
1.
\[
\log(y_t) = \beta_0 + \beta_1 t + \beta_2 (t - \tau_1)_+ + \beta_3 (t - \tau_2)_+ + \sum_{k=1}^{3}\left[\alpha_k \sin\left(\frac{2\pi k t}{52.18}\right) + \gamma_k \cos\left(\frac{2\pi k t}{52.18}\right)\right] + \varepsilon_t,
\]
where τ1 corresponds to 11 March 2020 and τ2 corresponds to 1 April 2020.
• β1 is the trend prior to 11 March.
• β1 + β2 is the trend between 11 March and 1 April 2020.
• β1 + β2 + β3 is the trend after 1 April 2020.
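The interpretation of the slope coefficients in part 1 can be illustrated with a small sketch of the piecewise linear trend with Fourier terms. All coefficient values and knot positions below are hypothetical (they are not estimates from the exam data), and the helper names `hinge` and `log_mean` are made up for illustration.

```python
import math

def hinge(t, knot):
    """Piecewise linear basis function (t - knot)_+ ."""
    return max(t - knot, 0.0)

def log_mean(t, beta, knots, alpha, gamma, period=52.18):
    """Fitted value of log(y_t) without the error term."""
    trend = (beta[0] + beta[1] * t
             + beta[2] * hinge(t, knots[0])
             + beta[3] * hinge(t, knots[1]))
    fourier = sum(a * math.sin(2 * math.pi * k * t / period)
                  + g * math.cos(2 * math.pi * k * t / period)
                  for k, (a, g) in enumerate(zip(alpha, gamma), start=1))
    return trend + fourier

# Hypothetical coefficients and knots (time measured in periods).
beta = (5.0, 0.01, -0.30, 0.25)
knots = (10.0, 13.0)
no_season = ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])   # switch off the Fourier terms

def slope_at(t):
    """Change in the trend over one period starting at t."""
    return log_mean(t + 1, beta, knots, *no_season) - log_mean(t, beta, knots, *no_season)

slope_before = slope_at(5.0)     # beta1
slope_between = slope_at(11.0)   # beta1 + beta2
slope_after = slope_at(20.0)     # beta1 + beta2 + beta3
```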
2. No
The structural breaks in March and April mean that any model which fits well to the end of February may perform badly in March and April.
The performance of models before the structural breaks is not really important.
3.
• It is impossible to fit the model before the structural break as the β2 and β3 coefficients are not estimable.
4.
• We could try moving the knots or adding new knots and minimizing the AICc statistic.
5.
• The plot will show some autocorrelations in the ACF, and possibly an outlier in the histogram.
• We could add an ARIMA error term to the model to handle the autocorrelations.
• We could add additional knots where the residuals appear to show nonlinear trend.
6.
• The structural breaks are due to policy changes associated with COVID-19. Additional policy changes will probably change the trend again.
7.
• The simplistic approach would be to add another knot at 5 July, and assume all the trend changes would cease to have any effect after that date. This could be achieved by setting the piecewise linear predictors associated with β2 and β3 to be zero after 5 July. So then the long-term trend from before 11 March would be the only trend term continuing to have an effect after 5 July.
• However, it is impossible to know how people would respond, but it is unlikely things would return to normal. Therefore we would probably need some kind of judgmental adjustments to the forecasts to allow for scenarios around human responses to the policy changes.