Predictive Analytics – Week 12: Exponential Smoothing
Semester 2, 2018
Discipline of Business Analytics, The University of Sydney Business School
1. Simple exponential smoothing
2. Trend corrected exponential smoothing
3. Holt-Winters smoothing
4. Damped trend exponential smoothing
Exponential smoothing methods
Exponential smoothing forecasts are weighted averages of past
observations, where the weights decay exponentially as we go
further into the past.
Exponential smoothing can be useful when the time series
components are changing over time.
Simple exponential smoothing
Simple exponential smoothing (key concept)
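The simple exponential smoothing method specifies the
forecasting rule
$$
\begin{aligned}
\hat{y}_{t+1} &= \ell_t && \text{(forecast equation)} \\
\ell_t &= \alpha y_t + (1-\alpha)\ell_{t-1} && \text{(smoothing equation)}
\end{aligned}
$$
for an initial value $\ell_0$ and $0 \le \alpha \le 1$.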
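The recursion can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name, the made-up series, and the default of starting the level at the first observation when no initial value is supplied are all assumptions made for the example.

```python
def simple_exponential_smoothing(y, alpha, level0=None):
    """Return the smoothed levels l_1, ..., l_N for the series y."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: start the level at the first observation if none is given.
    level = y[0] if level0 is None else level0
    levels = []
    for obs in y:
        # Smoothing equation: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
        level = alpha * obs + (1.0 - alpha) * level
        levels.append(level)
    return levels


series = [10.0, 12.0, 11.0, 13.0]  # made-up data for illustration
levels = simple_exponential_smoothing(series, alpha=0.5)
one_step_forecast = levels[-1]  # forecast equation: y-hat_{N+1} = l_N
print(levels, one_step_forecast)
```

The last smoothed level is the forecast for the next period, as in the forecast equation.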
Exponentially weighted moving average
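$$
\begin{aligned}
\ell_1 &= \alpha y_1 + (1-\alpha)\ell_0 \\
\ell_2 &= \alpha y_2 + (1-\alpha)\ell_1 \\
       &= \alpha y_2 + (1-\alpha)\alpha y_1 + (1-\alpha)^2 \ell_0 \\
\ell_3 &= \alpha y_3 + (1-\alpha)\ell_2 \\
       &= \alpha y_3 + (1-\alpha)\alpha y_2 + (1-\alpha)^2 \alpha y_1 + (1-\alpha)^3 \ell_0 \\
\ell_4 &= \alpha y_4 + (1-\alpha)\ell_3 \\
       &= \alpha y_4 + (1-\alpha)\alpha y_3 + (1-\alpha)^2 \alpha y_2 + (1-\alpha)^3 \alpha y_1 + (1-\alpha)^4 \ell_0 \\
       &\;\;\vdots
\end{aligned}
$$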
Exponentially weighted moving average
It follows that
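$$
\begin{aligned}
\ell_t &= \alpha y_t + (1-\alpha)\ell_{t-1} \\
       &= \alpha y_t + (1-\alpha)\alpha y_{t-1} + (1-\alpha)^2 \alpha y_{t-2} + \ldots + (1-\alpha)^{t-1} \alpha y_1 + (1-\alpha)^t \ell_0.
\end{aligned}
$$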
Simple exponential smoothing is also known as the exponentially
weighted moving average (EWMA) method.
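As a quick numerical check of the expansion, the sketch below compares the recursive level with the explicit weighted sum and confirms that the weights add up to one. The helper name `ewma_weights` and the data are hypothetical, chosen only for illustration.

```python
def ewma_weights(alpha, t):
    """Weights on y_t, y_{t-1}, ..., y_1, followed by the weight on l_0."""
    weights = [alpha * (1.0 - alpha) ** j for j in range(t)]
    weights.append((1.0 - alpha) ** t)  # weight on the initial level l_0
    return weights


alpha, level0 = 0.3, 5.0
y = [4.0, 6.0, 5.5, 7.0]  # made-up data for illustration

# Recursive form: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
level = level0
for obs in y:
    level = alpha * obs + (1.0 - alpha) * level

# Explicit form: weighted average of past observations plus the l_0 term
w = ewma_weights(alpha, len(y))
explicit = sum(wj * obs for wj, obs in zip(w, reversed(y))) + w[-1] * level0

print(level, explicit, sum(w))
```

The weights decay geometrically by a factor of (1 − α) per period, which is the sense in which the average is exponentially weighted.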
Simple exponential smoothing
• Useful for forecasting time series with changing levels.
• A higher α gives larger weight to recent observations, making
the forecasts more adaptive to recent changes in the series.
• A lower α gives larger weights to past observations,
making the forecasts smoother.
• Initialisation: we typically set $\ell_0 = y_1$ for simplicity.
Alternatively, we can treat $\ell_0$ as a parameter to be estimated.
Example: AUD/USD exchange rate (figures not reproduced)
Estimation
We estimate α by least squares (empirical risk minimisation).
$$
\hat{\alpha} = \underset{\alpha}{\operatorname{argmin}} \sum_{t=1}^{N} (y_t - \ell_{t-1})^2
$$
Each $\ell_t$ is a nonlinear function of α, so there is no closed-form formula for
$\hat{\alpha}$. We use numerical optimisation methods to obtain the solution.
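A minimal sketch of the estimation step follows. It uses a crude grid search over [0, 1] as a stand-in for a proper numerical optimiser, and it assumes the initial level is set to the first observation; the function names and the data are illustrative only.

```python
def sse(alpha, y):
    """Sum of squared one-step errors (y_t - l_{t-1})^2, with l_0 = y_1 assumed."""
    level = y[0]
    total = 0.0
    for obs in y:
        error = obs - level          # one-step forecast error
        total += error ** 2
        level = alpha * obs + (1.0 - alpha) * level
    return total


def estimate_alpha(y, grid_size=1001):
    """Grid search for the least squares alpha (a stand-in for an optimiser)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda a: sse(a, y))


y = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up, steadily increasing series
alpha_hat = estimate_alpha(y)
print(alpha_hat, sse(alpha_hat, y))
```

For a steadily increasing series like this one, the level always lags behind, so the search pushes α to its upper bound. In practice a bounded scalar optimiser would replace the grid.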
Statistical model
In order to say more about the simple exponential smoothing
method, we need to formulate it as a statistical model. We assume
that
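$$
\begin{aligned}
Y_t &= \ell_{t-1} + \varepsilon_t, \\
\ell_t &= \alpha Y_t + (1-\alpha)\ell_{t-1},
\end{aligned}
$$
where the errors $\varepsilon_t$ are i.i.d. with mean zero and constant variance $\sigma^2$.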
Statistical model
In forecasting, we want to:
1. Compute point forecasts for multiple forecasting horizons h.
2. Compute interval forecasts for multiple forecasting horizons h.
In order to do this for the exponential smoothing method, we rewrite
the model in error correction form.
Error correction form
We obtain the error correction form as
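$$
\begin{aligned}
\ell_t &= \alpha Y_t + (1-\alpha)\ell_{t-1} \\
       &= \ell_{t-1} + \alpha(Y_t - \ell_{t-1}) \\
       &= \ell_{t-1} + \alpha\varepsilon_t.
\end{aligned}
$$
Hence, we can rewrite the model as:
$$
\begin{aligned}
Y_{t+1} &= \ell_t + \varepsilon_{t+1}, \\
\ell_t &= \ell_{t-1} + \alpha\varepsilon_t.
\end{aligned}
$$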
Error correction form
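Using $\ell_t = \ell_{t-1} + \alpha\varepsilon_t$,
$$
\begin{aligned}
\ell_{t+1} &= \ell_t + \alpha\varepsilon_{t+1} \\
\ell_{t+2} &= \ell_{t+1} + \alpha\varepsilon_{t+2} \\
           &= \ell_t + \alpha\varepsilon_{t+1} + \alpha\varepsilon_{t+2} \\
\ell_{t+3} &= \ell_{t+2} + \alpha\varepsilon_{t+3} \\
           &= \ell_t + \alpha\varepsilon_{t+1} + \alpha\varepsilon_{t+2} + \alpha\varepsilon_{t+3} \\
           &\;\;\vdots \\
\ell_{t+h} &= \ell_t + \sum_{i=1}^{h} \alpha\varepsilon_{t+i}
\end{aligned}
$$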
Constant plus noise representation
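Using $Y_t = \ell_{t-1} + \varepsilon_t$ and the previous slide,
$$
\begin{aligned}
Y_{t+1} &= \ell_t + \varepsilon_{t+1} \\
Y_{t+2} &= \ell_{t+1} + \varepsilon_{t+2} \\
        &= \ell_t + \alpha\varepsilon_{t+1} + \varepsilon_{t+2} \\
Y_{t+3} &= \ell_{t+2} + \varepsilon_{t+3} \\
        &= \ell_t + \alpha\varepsilon_{t+1} + \alpha\varepsilon_{t+2} + \varepsilon_{t+3} \\
        &\;\;\vdots \\
Y_{t+h} &= \ell_{t+h-1} + \varepsilon_{t+h} \\
        &= \ell_t + \sum_{i=1}^{h-1} \alpha\varepsilon_{t+i} + \varepsilon_{t+h}
\end{aligned}
$$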
Point forecast
Constant plus noise representation of future observations:
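$$
Y_{t+h} = \ell_t + \sum_{i=1}^{h-1} \alpha\varepsilon_{t+i} + \varepsilon_{t+h}
$$
From the linearity of expectations, the point forecast for any
horizon h is
$$
\begin{aligned}
\hat{y}_{t+h} &= E(Y_{t+h} \mid y_{1:t}) \\
              &= E\left( \ell_t + \sum_{i=1}^{h-1} \alpha\varepsilon_{t+i} + \varepsilon_{t+h} \,\middle|\, y_{1:t} \right) \\
              &= \ell_t
\end{aligned}
$$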
Forecast variance
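$$
\begin{aligned}
\operatorname{Var}(Y_{t+1} \mid y_{1:t}) &= \operatorname{Var}(\ell_t + \varepsilon_{t+1} \mid y_{1:t}) \\
  &= \sigma^2 \\
\operatorname{Var}(Y_{t+2} \mid y_{1:t}) &= \operatorname{Var}(\ell_t + \alpha\varepsilon_{t+1} + \varepsilon_{t+2} \mid y_{1:t}) \\
  &= \sigma^2(1 + \alpha^2) \\
  &\;\;\vdots \\
\operatorname{Var}(Y_{t+h} \mid y_{1:t}) &= \operatorname{Var}\left( \ell_t + \sum_{i=1}^{h-1} \alpha\varepsilon_{t+i} + \varepsilon_{t+h} \,\middle|\, y_{1:t} \right) \\
  &= \sigma^2\left(1 + (h-1)\alpha^2\right)
\end{aligned}
$$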
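Forecast equations for simple exponential smoothing
$$
\begin{aligned}
\hat{y}_{t+h} &= \ell_t \\
\operatorname{Var}(Y_{t+h} \mid y_{1:t}) &= \sigma^2\left(1 + (h-1)\alpha^2\right)
\end{aligned}
$$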
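One way to sanity-check the variance formula is to simulate future observations from the error correction form and compare the sample variance with the formula. The sketch below does this under the extra assumption of Gaussian errors, which is needed only to draw the simulated shocks; the function names, seed, and parameter values are illustrative.

```python
import random


def simulate_ses_observation(level_t, alpha, sigma, h, rng):
    """Draw one value of Y_{t+h} given l_t, using the error correction form."""
    level = level_t
    y_future = level_t
    for _ in range(h):
        eps = rng.gauss(0.0, sigma)   # assumption: Gaussian errors
        y_future = level + eps        # Y_{s+1} = l_s + eps_{s+1}
        level = level + alpha * eps   # l_{s+1} = l_s + alpha * eps_{s+1}
    return y_future


def ses_forecast_variance(alpha, sigma2, h):
    """Var(Y_{t+h} | y_{1:t}) = sigma^2 * (1 + (h - 1) * alpha^2)."""
    return sigma2 * (1 + (h - 1) * alpha ** 2)


rng = random.Random(0)
alpha, sigma, h = 0.4, 2.0, 5
draws = [simulate_ses_observation(10.0, alpha, sigma, h, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
sample_var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, sample_var, ses_forecast_variance(alpha, sigma ** 2, h))
```

The simulated mean should sit close to the current level and the simulated variance close to the formula, up to Monte Carlo error.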
Interval forecast
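If we assume that $\varepsilon_t \sim N(0, \sigma^2)$,
$$
Y_{t+h} \mid y_{1:t} \sim N\left( \ell_t,\; \sigma^2\left[1 + (h-1)\alpha^2\right] \right).
$$
To compute an interval forecast, we use the estimated values of α
and σ²:
$$
\hat{\ell}_t \pm z_{\text{crit}} \times \sqrt{\hat{\sigma}^2\left[1 + (h-1)\hat{\alpha}^2\right]},
$$
where
$$
\hat{\sigma}^2 = \frac{\sum_{t=1}^{N} (y_t - \ell_{t-1})^2}{N-1}.
$$
If the errors are not normal, you should use the bootstrap method
or other distributional assumptions.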
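The interval calculation can be sketched as follows, assuming normal errors and an initial level equal to the first observation. The function name `ses_interval`, the `coverage` argument, and the data are hypothetical; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def ses_interval(y, alpha, h, coverage=0.95):
    """Normal-theory interval forecast for Y_{N+h}, with l_0 = y_1 assumed."""
    level = y[0]
    sse = 0.0
    for obs in y:
        sse += (obs - level) ** 2                    # (y_t - l_{t-1})^2
        level = alpha * obs + (1.0 - alpha) * level  # smoothing equation
    sigma2_hat = sse / (len(y) - 1)
    z_crit = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    half_width = z_crit * (sigma2_hat * (1 + (h - 1) * alpha ** 2)) ** 0.5
    return level - half_width, level + half_width


y = [10.0, 12.0, 11.0, 13.0]  # made-up data for illustration
lower1, upper1 = ses_interval(y, alpha=0.5, h=1)
lower3, upper3 = ses_interval(y, alpha=0.5, h=3)
print((lower1, upper1), (lower3, upper3))
```

Both intervals are centred on the final level, and the interval widens with the horizon because the forecast variance grows with h.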
Example: AUD/USD exchange rate (figure not reproduced)
Trend corrected exponential smoothing
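The trend corrected or Holt exponential smoothing method
allows for a time-varying trend:
$$
\begin{aligned}
\hat{y}_{t+1} &= \ell_t + b_t && \text{(forecast equation)} \\
\ell_t &= \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) && \text{(smoothing equation)} \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1} && \text{(trend equation)}
\end{aligned}
$$
for initial values $\ell_0$ and $b_0$, $0 \le \alpha \le 1$, and $0 \le \beta \le 1$.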
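The three equations translate directly into a short loop. This is a sketch under stated assumptions: the function name and the made-up, exactly linear series are illustrative, and the initial values are passed in explicitly rather than estimated.

```python
def holt_smoothing(y, alpha, beta, level0, trend0):
    """Run the trend corrected recursions and return the final (l_N, b_N)."""
    level, trend = level0, trend0
    for obs in y:
        prev_level = level
        # Smoothing equation: l_t = alpha*y_t + (1 - alpha)*(l_{t-1} + b_{t-1})
        level = alpha * obs + (1.0 - alpha) * (prev_level + trend)
        # Trend equation: b_t = beta*(l_t - l_{t-1}) + (1 - beta)*b_{t-1}
        trend = beta * (level - prev_level) + (1.0 - beta) * trend
    return level, trend


y = [2.0, 4.0, 6.0, 8.0]  # made-up series with an exact linear trend
level, trend = holt_smoothing(y, alpha=0.5, beta=0.5, level0=0.0, trend0=2.0)
one_step_forecast = level + trend  # forecast equation
print(level, trend, one_step_forecast)
```

With a perfectly linear series and matching initial values, the level tracks the data exactly and the trend stays at the true slope.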
Trend corrected exponential smoothing
Consider the simple time series trend model
$$
\begin{aligned}
\ell_t &= a + b \times t, \\
Y_t &= \ell_t + \varepsilon_t.
\end{aligned}
$$
What is $\ell_t - \ell_{t-1}$ here?
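A short worked answer, offered as a sketch of the intended reasoning: substituting the linear trend model into the difference gives the constant slope, which is why the difference in levels serves as the new information about the trend in the trend equation.

```latex
\ell_t - \ell_{t-1} = (a + b\,t) - \bigl(a + b\,(t-1)\bigr) = b
```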
Trend corrected exponential smoothing model
The statistical model is
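$$
\begin{aligned}
Y_{t+1} &= \ell_t + b_t + \varepsilon_{t+1}, \\
\ell_t &= \alpha Y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}), \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1},
\end{aligned}
$$
where the errors $\varepsilon_t$ are i.i.d. with mean zero and constant variance $\sigma^2$.
The least squares estimates of α and β are
$$
\hat{\alpha}, \hat{\beta} = \underset{\alpha,\beta}{\operatorname{argmin}} \sum_{t=1}^{N} (y_t - \ell_{t-1} - b_{t-1})^2
$$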
Error correction form
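$$
\begin{aligned}
\ell_t &= \alpha Y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
       &= \ell_{t-1} + b_{t-1} + \alpha(Y_t - \ell_{t-1} - b_{t-1}) \\
       &= \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1} \\
    &= b_{t-1} + \beta(\ell_t - \ell_{t-1} - b_{t-1}) \\
    &= b_{t-1} + \beta(\ell_{t-1} + b_{t-1} + \alpha\varepsilon_t - \ell_{t-1} - b_{t-1}) \\
    &= b_{t-1} + \beta\alpha\varepsilon_t
\end{aligned}
$$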
Error correction form
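$$
\begin{aligned}
Y_{t+1} &= \ell_t + b_t + \varepsilon_{t+1} \\
\ell_t &= \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t \\
b_t &= b_{t-1} + \beta\alpha\varepsilon_t
\end{aligned}
$$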
Constant plus noise representation
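$$
\begin{aligned}
Y_{t+1} &= \ell_t + b_t + \varepsilon_{t+1} \\
Y_{t+2} &= \ell_{t+1} + b_{t+1} + \varepsilon_{t+2} \\
        &= \ell_t + 2b_t + \alpha(1+\beta)\varepsilon_{t+1} + \varepsilon_{t+2} \\
Y_{t+3} &= \ell_{t+2} + b_{t+2} + \varepsilon_{t+3} \\
        &= \ell_{t+1} + 2b_{t+1} + \alpha(1+\beta)\varepsilon_{t+2} + \varepsilon_{t+3} \\
        &= \ell_t + 3b_t + \alpha(1+2\beta)\varepsilon_{t+1} + \alpha(1+\beta)\varepsilon_{t+2} + \varepsilon_{t+3} \\
        &\;\;\vdots \\
Y_{t+h} &= \ell_t + h b_t + \alpha\sum_{i=1}^{h-1} (1+i\beta)\varepsilon_{t+h-i} + \varepsilon_{t+h}
\end{aligned}
$$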
Point forecast
Constant plus noise representation of future observations:
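$$
Y_{t+h} = \ell_t + h b_t + \alpha\sum_{i=1}^{h-1} (1+i\beta)\varepsilon_{t+h-i} + \varepsilon_{t+h}
$$
From the linearity of expectations, the point forecast for any
horizon h is
$$
\begin{aligned}
\hat{y}_{t+h} &= E(Y_{t+h} \mid y_{1:t}) \\
              &= E\left( \ell_t + h b_t + \alpha\sum_{i=1}^{h-1} (1+i\beta)\varepsilon_{t+h-i} + \varepsilon_{t+h} \,\middle|\, y_{1:t} \right) \\
              &= \ell_t + h b_t.
\end{aligned}
$$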
Forecast variance
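$$
\begin{aligned}
\operatorname{Var}(Y_{t+1} \mid y_{1:t}) &= \operatorname{Var}(\ell_t + b_t + \varepsilon_{t+1} \mid y_{1:t}) \\
  &= \sigma^2 \\
\operatorname{Var}(Y_{t+2} \mid y_{1:t}) &= \operatorname{Var}(\ell_t + 2b_t + \alpha(1+\beta)\varepsilon_{t+1} + \varepsilon_{t+2} \mid y_{1:t}) \\
  &= \sigma^2\left(1 + \alpha^2(1+\beta)^2\right) \\
  &\;\;\vdots \\
\operatorname{Var}(Y_{t+h} \mid y_{1:t}) &= \operatorname{Var}\left( \ell_t + h b_t + \alpha\sum_{i=1}^{h-1} (1+i\beta)\varepsilon_{t+h-i} + \varepsilon_{t+h} \,\middle|\, y_{1:t} \right) \\
  &= \sigma^2\left( 1 + \alpha^2\sum_{i=1}^{h-1} (1+i\beta)^2 \right)
\end{aligned}
$$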
Forecast equations for the trend corrected smoothing method
Point forecast:
$$
\hat{y}_{t+h} = \hat{\ell}_t + h\hat{b}_t
$$
Variance:
$$
\operatorname{Var}(Y_{t+h} \mid y_{1:t}) = \sigma^2\left( 1 + \alpha^2\sum_{i=1}^{h-1} (1+i\beta)^2 \right)
$$
We compute interval forecasts as before.
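The two forecast equations can be evaluated with a small helper. This is an illustrative sketch: the function name and the numerical inputs are made up, and the parameters are taken as given rather than estimated.

```python
def holt_forecast(level_t, trend_t, alpha, beta, sigma2, h):
    """Point forecast and forecast variance for the trend corrected method."""
    point = level_t + h * trend_t
    # sigma^2 * (1 + alpha^2 * sum_{i=1}^{h-1} (1 + i*beta)^2)
    variance = sigma2 * (
        1 + alpha ** 2 * sum((1 + i * beta) ** 2 for i in range(1, h))
    )
    return point, variance


for h in (1, 2, 3):
    print(h, holt_forecast(8.0, 2.0, alpha=0.5, beta=0.5, sigma2=4.0, h=h))
```

For h = 1 the sum is empty and the variance reduces to σ², matching the one-step case in the derivation.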
Example: assaults in Sydney (figures not reproduced)
Example: visitor arrivals (figures not reproduced)
Holt-Winters smoothing
Holt-Winters exponential smoothing
The Holt-Winters exponential smoothing method extends the
trend corrected method to seasonal data. It allows for additive or
multiplicative seasonality.
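Additive Holt-Winters smoothing (key concept)
$$
\begin{aligned}
\hat{y}_{t+1} &= \ell_t + b_t + S_{t+1-L} && \text{(forecast equation)} \\
\ell_t &= \alpha(y_t - S_{t-L}) + (1-\alpha)(\ell_{t-1} + b_{t-1}) && \text{(level)} \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1} && \text{(trend)} \\
S_t &= \delta(y_t - \ell_t) + (1-\delta)S_{t-L} && \text{(seasonal indices)}
\end{aligned}
$$
for a seasonal frequency $L$, initial values $\ell_0$, $b_0$, and $S_{i-L}$ for
$i = 1, \ldots, L$, and parameters $0 \le \alpha \le 1$, $0 \le \beta \le 1$, $0 \le \delta \le 1$.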
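A minimal sketch of the additive recursions is given below. The function name, the argument layout (the L initial seasonal indices passed as a list), and the made-up data are assumptions for illustration; a slot indexed by the season stores the most recent index, so reading it before the update yields the index from L periods earlier.

```python
def holt_winters_additive(y, alpha, beta, delta, level0, trend0, seasonal0):
    """Additive Holt-Winters; seasonal0 holds S_{1-L}, ..., S_0 (assumed layout)."""
    L = len(seasonal0)
    level, trend = level0, trend0
    seasonal = list(seasonal0)  # seasonal[t % L] is the latest index for that season
    forecasts = []
    for t, obs in enumerate(y):
        s_old = seasonal[t % L]                   # S_{t-L}
        forecasts.append(level + trend + s_old)   # one-step forecast of this y_t
        prev_level = level
        level = alpha * (obs - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % L] = delta * (obs - level) + (1 - delta) * s_old
    return level, trend, seasonal, forecasts


# Made-up data: level 10 growing by 1 per period, seasonal pattern (+2, -2)
y = [13.0, 10.0, 15.0, 12.0]
level, trend, seasonal, forecasts = holt_winters_additive(
    y, alpha=0.5, beta=0.5, delta=0.5, level0=10.0, trend0=1.0, seasonal0=[2.0, -2.0]
)
print(level, trend, seasonal, forecasts)
```

Because the data follow the assumed pattern exactly, the one-step forecasts reproduce the series and the components stay at their true values.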
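Multiplicative Holt-Winters smoothing (key concept)
$$
\begin{aligned}
\hat{y}_{t+1} &= (\ell_t + b_t) \times S_{t+1-L} && \text{(forecast equation)} \\
\ell_t &= \alpha(y_t / S_{t-L}) + (1-\alpha)(\ell_{t-1} + b_{t-1}) && \text{(level)} \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)b_{t-1} && \text{(trend)} \\
S_t &= \delta(y_t / \ell_t) + (1-\delta)S_{t-L} && \text{(seasonal indices)}
\end{aligned}
$$
for a seasonal frequency $L$, initial values $\ell_0$, $b_0$, and $S_{i-L}$ for
$i = 1, \ldots, L$, and parameters $0 \le \alpha \le 1$, $0 \le \beta \le 1$, $0 \le \delta \le 1$.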
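The multiplicative version differs only in that the seasonal index scales the level instead of shifting it. The sketch below uses the same hypothetical conventions: an illustrative function name, initial seasonal indices passed as a list of length L, and made-up data.

```python
def holt_winters_multiplicative(y, alpha, beta, delta, level0, trend0, seasonal0):
    """Multiplicative Holt-Winters; seasonal0 holds S_{1-L}, ..., S_0 (assumed layout)."""
    L = len(seasonal0)
    level, trend = level0, trend0
    seasonal = list(seasonal0)
    forecasts = []
    for t, obs in enumerate(y):
        s_old = seasonal[t % L]                     # S_{t-L}
        forecasts.append((level + trend) * s_old)   # one-step forecast of this y_t
        prev_level = level
        level = alpha * (obs / s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % L] = delta * (obs / level) + (1 - delta) * s_old
    return level, trend, seasonal, forecasts


# Made-up data: level 10 growing by 1 per period, seasonal factors (1.5, 0.5)
y = [16.5, 6.0, 19.5, 7.0]
level, trend, seasonal, forecasts = holt_winters_multiplicative(
    y, alpha=0.5, beta=0.5, delta=0.5, level0=10.0, trend0=1.0, seasonal0=[1.5, 0.5]
)
print(level, trend, seasonal, forecasts)
```

Multiplicative seasonality suits series whose seasonal swings grow with the level, since the seasonal factor multiplies the trend component.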
Statistical model
As before, we formulate a statistical model by specifying an
observation equation.
Additive:
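$$
Y_{t+1} = \ell_t + b_t + S_{t+1-L} + \varepsilon_{t+1},
$$
where the errors $\varepsilon_{t+1}$ are i.i.d. with variance $\sigma^2$.
Multiplicative:
$$
Y_{t+1} = (\ell_t + b_t) \times S_{t+1-L} + \varepsilon_{t+1},
$$
where the errors $\varepsilon_{t+1}$ are i.i.d. with variance $\sigma^2$.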
Estimation
We estimate α, β and δ by least squares.
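Additive:
$$
\hat{\alpha}, \hat{\beta}, \hat{\delta} = \underset{\alpha,\beta,\delta}{\operatorname{argmin}} \sum_{t=1}^{N} (y_t - \ell_{t-1} - b_{t-1} - S_{t-L})^2
$$
Multiplicative:
$$
\hat{\alpha}, \hat{\beta}, \hat{\delta} = \underset{\alpha,\beta,\delta}{\operatorname{argmin}} \sum_{t=1}^{N} \left(y_t - (\ell_{t-1} + b_{t-1}) \times S_{t-L}\right)^2
$$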
Forecast equations
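Additive:
$$
\hat{y}_{t+h} = \hat{\ell}_t + h\hat{b}_t + S_{t-L+(h \bmod L)}
$$
$$
\operatorname{Var}(Y_{t+h} \mid y_{1:t}) = \sigma^2\left( 1 + \sum_{i=1}^{h-1} \left[\alpha(1+i\beta) + I_{i,L}\,\delta(1-\alpha)\right]^2 \right),
$$
where mod is the modulo operator, $I_{i,L} = 0$ if $h \bmod L \neq i$ and
$I_{i,L} = 1$ if $h \bmod L = i$.
Multiplicative:
$$
\hat{y}_{t+h} = (\hat{\ell}_t + h\hat{b}_t) \times S_{t-L+(h \bmod L)}
$$
No simple expression exists for the variance in the multiplicative
model.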
Example: assaults in Sydney
The estimated parameters are α̂ = 0.117, β̂ = 0.023, and
δ̂ = 0.370.
Example: visitor arrivals
The estimated parameters are α̂ = 0.154, β̂ = 0.088, and
δ̂ = 0.271.
Example: visitor arrivals (figure not reproduced)
Damped trend exponential smoothing
Damped trend exponential smoothing addresses the problem
that extrapolating trends indefinitely into the future can lead to
implausible forecasts.
Model and forecast
Model:
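$$
\begin{aligned}
y_{t+1} &= \ell_t + \phi b_t + \varepsilon_{t+1}, \\
\ell_t &= \alpha y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1}), \\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)\phi b_{t-1},
\end{aligned}
$$
where $\phi$ is the damping parameter, with $0 \le \phi \le 1$.
Forecast equation:
$$
\hat{y}_{t+h} = \ell_t + \phi b_t + \phi^2 b_t + \phi^3 b_t + \ldots + \phi^h b_t
$$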
We can extend it to allow for additive or multiplicative seasonality.
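To see the effect of damping, the forecast equation can be evaluated for increasing horizons. The function name and input values below are illustrative only.

```python
def damped_trend_forecast(level_t, trend_t, phi, h):
    """y-hat_{t+h} = l_t + (phi + phi^2 + ... + phi^h) * b_t."""
    damping = sum(phi ** i for i in range(1, h + 1))
    return level_t + damping * trend_t


for h in (1, 2, 5, 50):
    undamped = damped_trend_forecast(10.0, 2.0, phi=1.0, h=h)
    damped = damped_trend_forecast(10.0, 2.0, phi=0.5, h=h)
    print(h, undamped, damped)
```

With φ = 1 the forecast reduces to the trend corrected forecast and grows without bound in h. With φ < 1 the geometric sum converges, so the forecasts level off at $\ell_t + \frac{\phi}{1-\phi} b_t$ (here 12) instead of extrapolating the trend indefinitely.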
Illustration: visitor arrivals (figure not reproduced)
Review questions
• What is exponential smoothing?
• What is the difference between simple, trend corrected, and
Holt-Winters exponential smoothing methods?
• Derive the point forecasts and forecast variances for the SES
and trend corrected methods, starting from the model
equations.
• Explain how to compute forecast intervals based on the SES
and trend corrected methods.