Smoothing with Levels
Australian National University
(James Taylor) 1 / 14
Exponential Smoothing
Overview: Construct forecasts as a weighted average of past
observations – smoothing the observed time series
Heavier weight is given to more recent observations
Weights decrease exponentially with time (thus exponential
smoothing)
Exponential Smoothing
Pro – Intuitive
Pro – Easy to implement
Pro – Forecast performance is often surprisingly good
Con – Only point forecasts are possible, so we cannot analyse statistical properties
A Basic Framework
Suppose a series μ_1, μ_2, … follows a random walk:
μ_t = μ_{t-1} + η_t
where the η_t are iid N(0, σ_η²)
The level μ_t wanders randomly up and down
The optimal h-step-ahead forecast is
E(μ_{T+h} | I_T, θ) = μ_T
The best forecast of a future value is the current value
A Basic Framework
Suppose we don't observe μ_t, but instead observe y_t:
y_t = μ_t + ε_t
where the ε_t are iid N(0, σ_ε²)
What is a good point forecast for y_{T+h}?
If we observed μ_T, then
E(y_{T+h} | μ_T) = E(μ_{T+h} + ε_{T+h} | μ_T) = μ_T
A Basic Framework
y_t = μ_t + ε_t ,  ε_t ∼ N(0, σ_ε²)
μ_t = μ_{t-1} + η_t ,  η_t ∼ N(0, σ_η²)
We don't observe μ_t
We need to estimate μ_T
There are two sources of information: y_T and μ_{T-1}
But we don't observe μ_{T-1} either, so we use y_{T-1} and μ_{T-2},…
and so on
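To make this concrete, here is a short Python sketch (illustrative, not from the slides) that simulates the model above, with assumed values σ_ε = σ_η = 0.5. Plotting μ_t and y_t gives pictures like the figures that follow.

```python
import random

random.seed(0)
T = 200
sigma_eta, sigma_eps = 0.5, 0.5      # assumed noise standard deviations

# hidden level: random walk  mu_t = mu_{t-1} + eta_t
mu, level = [], 0.0
for _ in range(T):
    level += random.gauss(0, sigma_eta)
    mu.append(level)

# observed series: y_t = mu_t + eps_t
y = [m + random.gauss(0, sigma_eps) for m in mu]
```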
Figure: Example random walk data
Figure: Example random walk data seen with noise
Figure: Noisy Data
A Basic Framework
The model
y_t = μ_t + ε_t ,  ε_t ∼ N(0, σ_ε²)
μ_t = μ_{t-1} + η_t ,  η_t ∼ N(0, σ_η²)
is a simple state-space model
Estimation of such models can be quite involved, and usually uses the Kalman filter
State-space models (and the Kalman filter) are the final topic of the course
For the moment we will use an easy smoothing method to produce
reasonable forecasts
Additive Smoothing
Suppose at time t−1 we have a reasonable forecast for y_t
Call this forecast the level L_{t-1}, so that
ŷ_{t|t-1} = L_{t-1}
Then at time t we observe y_t, and we update the level by taking a
weighted average of the old level and the new observation:
L_t = α y_t + (1−α) L_{t-1}, or equivalently
L_t = L_{t-1} + α (y_t − L_{t-1})
Additive Smoothing
L_t = L_{t-1} + α (y_t − L_{t-1})
We adjust the level to correct for the forecast error:
If we underestimated y_t, we make the next forecast L_t larger
If we overestimated y_t, we make the next forecast L_t smaller
The value of α affects the size of the adjustment: for small α the level
changes slowly, for large α it changes quickly
The parameter α is called the smoothing parameter
Simple Additive Smoothing Algorithm
Initialize with L_1 = y_1
For t = 2, …, T update the level L_t via
L_t = α y_t + (1−α) L_{t-1}
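As an illustration, the algorithm above translates directly into Python (an informal sketch; the function name is mine):

```python
def simple_smooth(y, alpha):
    """Simple additive smoothing: return the levels L_1, ..., L_T."""
    L = [y[0]]                            # initialise with L_1 = y_1
    for t in range(1, len(y)):
        L.append(alpha * y[t] + (1 - alpha) * L[-1])
    return L

levels = simple_smooth([10.0, 12.0, 11.0, 13.0], alpha=0.5)
# levels[-1] is the forecast of every future observation
```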
Figure: Example random walk data seen with noise
Why Level Smoothing is Exponential
Exponential Weights
This clearly gives smoothing; but what makes it exponential?
Let's write out the first few levels:
L_1 = y_1
L_2 = α y_2 + (1−α) L_1 = α y_2 + (1−α) y_1
L_3 = α y_3 + (1−α) L_2 = α y_3 + α(1−α) y_2 + (1−α)² y_1
L_4 = α y_4 + (1−α) L_3 = α y_4 + α(1−α) y_3 + α(1−α)² y_2 + (1−α)³ y_1
Exponential Weights
We can show that
L_t = α y_t + α(1−α) y_{t-1} + α(1−α)² y_{t-2} + ⋯ + (1−α)^{t-1} y_1
The weight declines exponentially (by a constant factor of 1−α) with each
step back in time.
Aside – There is a small issue at t = 1: the first observation gets weight (1−α)^{t-1} rather than α(1−α)^{t-1}, but for moderate t the difference is vanishingly small.
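This identity is easy to check numerically; the following Python snippet (mine, not part of the slides) confirms that the recursive and weighted-sum forms of L_t agree:

```python
alpha = 0.3
y = [2.0, 5.0, 3.0, 7.0, 4.0]

# recursive form: L_t = alpha*y_t + (1 - alpha)*L_{t-1}
L = y[0]
for obs in y[1:]:
    L = alpha * obs + (1 - alpha) * L

# weighted-sum form: weights decline by a constant factor (1 - alpha)
t = len(y)
weights = [alpha * (1 - alpha) ** k for k in range(t - 1)] + [(1 - alpha) ** (t - 1)]
L_sum = sum(w * obs for w, obs in zip(weights, reversed(y)))

assert abs(L - L_sum) < 1e-12
```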
h-step-ahead Forecast
One-step-ahead forecast: ŷ_{T+1|T} = L_T
Two-step-ahead forecast: we need to use ŷ_{T+1|T} in place of y_{T+1}:
ŷ_{T+2|T} = α ŷ_{T+1|T} + (1−α) L_T
= α L_T + (1−α) L_T
= L_T
In general, the h-step-ahead forecast is
ŷ_{T+h|T} = L_T
Holt-Winters Smoothing
Data with Trend
The additive smoothing discussed previously only applies to data
without trend
Consider a model with an evolving local level, but also a trend with
an evolving local slope:
y_t = μ_t + λ_t t + ε_t ,  ε_t ∼ N(0, σ_ε²)
μ_t = μ_{t-1} + η_t ,  η_t ∼ N(0, σ_η²)
λ_t = λ_{t-1} + ν_t ,  ν_t ∼ N(0, σ_ν²)
Figure: Example data with evolving trend with σ_ε = 0.5, σ_ν = 0.1, and σ_η = 0.5
Holt-Winters Smoothing
The Holt-Winters smoothing method is a two-component approach
The forecast ŷ_{t|t-1} has two components, L_{t-1} and b_{t-1}
These are a 'level' and a 'slope':
ŷ_{t|t-1} = L_{t-1} + b_{t-1}
Holt-Winters Smoothing – Updating
The level and slope are updated according to
L_t = α y_t + (1−α) ŷ_{t|t-1} = α y_t + (1−α)(L_{t-1} + b_{t-1})
b_t = β ΔL_t + (1−β) b_{t-1} = β (L_t − L_{t-1}) + (1−β) b_{t-1}
There are now two smoothing parameters, α and β
The level update is the same as before, except the old forecast is now ŷ_{t|t-1} = L_{t-1} + b_{t-1}
The slope b_t is a weighted average of the previous slope and the change in level
Holt-Winters Smoothing Algorithm
Initialize with L_1 = y_1 and b_1 = y_2 − y_1
For t = 2, …, T update the level L_t and slope b_t via
L_t = α y_t + (1−α)(L_{t-1} + b_{t-1})
b_t = β (L_t − L_{t-1}) + (1−β) b_{t-1}
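The algorithm above as a Python sketch (illustrative; names are mine):

```python
def holt_winters(y, alpha, beta):
    """Holt-Winters smoothing with trend: return the final level and slope."""
    L, b = y[0], y[1] - y[0]              # initialise: L_1 = y_1, b_1 = y_2 - y_1
    for t in range(1, len(y)):
        L_new = alpha * y[t] + (1 - alpha) * (L + b)
        b = beta * (L_new - L) + (1 - beta) * b
        L = L_new
    return L, b

L_T, b_T = holt_winters([1.0, 2.0, 3.0, 4.0], alpha=0.5, beta=0.5)
# on a perfectly linear series the level tracks the data and the slope stays at 1
```

The h-step-ahead forecast is then L_T + h·b_T.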
h-step-ahead Forecast
One-step-ahead: ŷ_{T+1|T} = L_T + b_T
Two-step-ahead: use ŷ_{T+1|T} instead of y_{T+1}, so
L_{T+1|T} = α ŷ_{T+1|T} + (1−α)(L_T + b_T)
= α (L_T + b_T) + (1−α)(L_T + b_T) = L_T + b_T
b_{T+1|T} = β (L_{T+1|T} − L_T) + (1−β) b_T
= β (L_T + b_T − L_T) + (1−β) b_T = b_T
ŷ_{T+2|T} = L_{T+1|T} + b_{T+1|T} = L_T + 2 b_T
In general:
ŷ_{T+h|T} = L_T + h b_T
Holt-Winters and US GDP
Forecasting U.S. GDP with Holt-Winters Smoothing
We've looked at US GDP data using linear, quadratic and cubic trend
models.
We found the quadratic performed best in terms of both AIC/BIC and
MSFE
i.e. by both in-sample and pseudo-out-of-sample forecasting measures
Forecasting U.S. GDP with Holt-Winters Smoothing
Now we will use Holt-Winters smoothing to produce forecasts for
h = 4 (one year ahead, as the data is quarterly) and h = 20 (five years ahead)
We will use three sets of smoothing parameters: α = β = 0.8,
α = β = 0.5 and α = β = 0.2
T0 = 40; h = 4;                 % evaluation start and forecast horizon
syhat = zeros(T-h-T0+1,1);      % storage for forecasts (T = length(y))
ytph = y(T0+h:end);             % observed y_{t+h}
alpha = .5; beta = .5;          % smoothing parameters
Lt = y(1); bt = y(2) - y(1);    % initialise level and slope
for t = 2:T-h
    newLt = alpha*y(t) + (1-alpha)*(Lt+bt);
    newbt = beta*(newLt-Lt) + (1-beta)*bt;
    yhat = newLt + h*newbt;     % h-step-ahead forecast made at time t
    Lt = newLt; bt = newbt;     % update Lt and bt
    if t >= T0                  % store the forecasts for t >= T0
        syhat(t-T0+1,:) = yhat;
    end
end
MSFE = mean((ytph-syhat).^2);
Out-of-sample Forecast Results

         quadratic trend   α, β = 0.2   α, β = 0.5   α, β = 0.8
h = 4         170400          82515        61051        61295
h = 20        413280         814090      1253500      1561500

Table: MSFE for various models of US GDP

For short-horizon forecasts the Holt-Winters method performs much
better
But for long-horizon forecasts it is much worse; it is probably worth taking
logs of the data
Figure: Holt-Winters forecast for h = 4 and α, β = 0.5, and real US GDP data
Holt-Winters with Seasonality
Holt-Winters Smoothing with Seasonality
If our data has seasonality as well, we simply add an extra component
to the point forecast:
ŷ_{t|t-1} = L_{t-1} + b_{t-1} + S_{t-s}
where s is the periodicity of the seasonality
E.g. usually s = 4 for quarterly data and s = 12 for monthly data
Updating Formulae
Level, slope and seasonality are updated as:
L_t = α (y_t − S_{t-s}) + (1−α)(L_{t-1} + b_{t-1})
b_t = β (L_t − L_{t-1}) + (1−β) b_{t-1}
S_t = γ (y_t − L_t) + (1−γ) S_{t-s}
Three smoothing parameters: α for the level, β for the slope, and γ for the
seasonal change
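One update step of these three formulae in Python (an illustrative sketch; S_old stands for the seasonal term from s periods ago, S_{t−s}):

```python
def hw_seasonal_update(y_t, L_prev, b_prev, S_old, alpha, beta, gamma):
    """One Holt-Winters update with seasonality; returns (L_t, b_t, S_t)."""
    L = alpha * (y_t - S_old) + (1 - alpha) * (L_prev + b_prev)
    b = beta * (L - L_prev) + (1 - beta) * b_prev
    S = gamma * (y_t - L) + (1 - gamma) * S_old
    return L, b, S

# example: the observation comes in exactly as the components predict
L, b, S = hw_seasonal_update(y_t=12.0, L_prev=10.0, b_prev=1.0, S_old=1.0,
                             alpha=0.5, beta=0.5, gamma=0.5)
```

When the observation equals the forecast L_{t-1} + b_{t-1} + S_{t-s}, the level simply advances by the slope and the slope and seasonal terms are unchanged.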
What does it mean when the data is high?
Suppose we get Quarter 1 data back and it's higher than expected.
The model suggests it's some combination of:
random upward noise (transient; ignore)
a random upward movement of the underlying level (permanent
once-off; increase the level)
an increase in the 'slope' (permanent slope increase; increase the slope)
Quarter 1's being a bit higher from now on (permanent once-off for
future Quarter 1's; increase S_t)
h-step-ahead Forecast
One-step-ahead: ŷ_{T+1|T} = L_T + b_T + S_{T+1-s}
Two-step-ahead: use ŷ_{T+1|T} instead of y_{T+1} for updating, to get
ŷ_{T+2|T} = L_T + 2 b_T + S_{T+2-s}
The h-step-ahead forecast (for h ≤ s) is
ŷ_{T+h|T} = L_T + h b_T + S_{T+h-s}
Australian Retail Sales
Holt-Winters with Seasonality Example
Forecasting Australian Retail Sales with Holt-Winters
Smoothing
We've looked at Australian Retail sales previously, with various
seasonal trend models.
We found the specifications all performed very similarly, and usually
not very well
Forecasting Australian Retail Sales with Holt-Winters
Smoothing
Now we produce forecasts with Holt-Winters smoothing with seasonality
Forecast horizons: 1- and 2-step-ahead
Three sets of smoothing parameters: α = β = γ = 0.8,
α = β = γ = 0.5, and α = β = γ = 0.2
Matlab Code
T0 = 15;
h = 1;
s = 4;                              % periodicity of seasonality
syhat = zeros(T-h-T0+1,1);
ytph = y(T0+h:end);                 % observed y_{t+h}
alpha = .5; beta = .5; gamma = .5;  % smoothing parameters
St = zeros(T-h,1);
% initialise. Many options exist; this is an easy one
Lt = mean(y(1:s)); bt = 0; St(1:s) = y(1:s) - Lt;
Matlab Code
for t = s+1:T-h
    newLt = alpha*(y(t) - St(t-s)) + (1-alpha)*(Lt+bt);
    newbt = beta*(newLt-Lt) + (1-beta)*bt;
    St(t) = gamma*(y(t)-newLt) + (1-gamma)*St(t-s);
    yhat = newLt + h*newbt + St(t+h-s);
    Lt = newLt; bt = newbt;     % update Lt and bt
    if t >= T0                  % store the forecasts for t >= T0
        syhat(t-T0+1,:) = yhat;
    end
end
MSFE = mean((ytph-syhat).^2);
Out-of-sample Forecast Results
Table: MSFE comparing Seasonal Dummies to Holt-Winters

        Seasonal Dummies   α, β, γ = 0.2   α, β, γ = 0.5   α, β, γ = 0.8
h = 1        1.3405            0.7204          0.4573          0.7466
h = 2        1.6084            0.8934          0.4648          1.0113

Holt-Winters generally works better than just using seasonal dummies
Smoothing parameters around 0.5 work well
For large h and large smoothing parameters, the model does not work
very well
Holt-Winters Forecasts
Figure: Holt-Winters forecast for h = 1 and α, β, γ = 0.5, and Australian Retail
data
Optimising Holt-Winters Parameters
Being very tricky – Optimising Holt-Winters
If we want to be very tricky, we can optimise our Holt-Winters process:
Make a function file which takes in the smoothing parameter values (α,
β, and γ if there is seasonality) and calculates the MSFE
Make an m-file which minimises the preceding function with respect to
those parameters
Make even better forecasts.
There is a small amount of statistical justification for this approach,
but not much.
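The slides do this with a Matlab function file and a numerical optimiser; the sketch below (mine, not from the slides) uses Python and a plain grid search over α and β instead, which is enough to show the idea:

```python
import itertools

def msfe(y, alpha, beta, h=1, T0=5):
    """Pseudo-out-of-sample MSFE of Holt-Winters forecasts for given parameters."""
    L, b = y[0], y[1] - y[0]
    errs = []
    for t in range(1, len(y) - h):
        L_new = alpha * y[t] + (1 - alpha) * (L + b)
        b = beta * (L_new - L) + (1 - beta) * b
        L = L_new
        if t + 1 >= T0:                      # start evaluating after a burn-in
            errs.append((y[t + h] - (L + h * b)) ** 2)
    return sum(errs) / len(errs)

y = [0.1 * t + 0.05 * (-1) ** t for t in range(40)]   # toy trending series
grid = [i / 10 for i in range(1, 10)]                 # candidate alpha, beta values
best = min(itertools.product(grid, grid), key=lambda p: msfe(y, *p))
```

A proper optimiser (e.g. fminsearch in Matlab) does the same minimisation more efficiently and without restricting the parameters to a grid.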
Optimising Holt-Winters for Aus Retail
If you do this for Aus Retail, you get nice results
The best parameters are α = 0.4556, β = 0.2459, and γ = 0.9173.
This suggests the seasonality changes quickly, the slope changes slowly,
and the level changes at a moderate pace.
The MSFE is 0.3667; the best we had previously was 0.4573 – about a
20% reduction.
Holt-Winters Optimisation
Figure: Holt-Winters MSFE for Aus Retail data with γ = 0.9173 and various α, β
Optimising Holt-Winters for US GDP
If you do this for US GDP, you get very strange results
The 'best' parameters are α = 1.42 and β = 0.035.
The outcome β ≈ 0 means the slope barely changes.
The outcome α > 1 means that if we observe a value above the
forecast, we should increase the level to above even the observation
(very strange).
Possible reason – Very strong feedback loops (unlikely)
Possible reason – The data is exponential, and should have been
logged first