Smoothing with Levels
Exponential Smoothing
Overview: Construct forecasts as a weighted average of past
observations – smoothing the observed time series
Heavier weight is given to more recent observations
Weights decrease exponentially with time (thus ‘exponential’ smoothing)
Pro – Intuitive
Pro – Easy to implement
Pro – Forecast performance is often surprisingly good
Con – Only point forecasts are possible, so cannot analyse statistical
A Basic Framework
Suppose a series $\mu_1, \mu_2, \ldots$ follows a random walk:
$\mu_t = \mu_{t-1} + \eta_t$
where the $\eta_t$ are iid $N(0, \sigma_\eta^2)$
The level $\mu_t$ wanders randomly up and down
The optimal h-step-ahead forecast is
$E(\mu_{T+h} \mid I_T, \theta) = \mu_T$
The best forecast of any future value is the current value
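Suppose we don’t observe $\mu_t$, but instead observe $y_t$:
$y_t = \mu_t + \varepsilon_t$
where the $\varepsilon_t$ are iid $N(0, \sigma_\varepsilon^2)$
What is a good point forecast for $y_{T+h}$?
If we observed $\mu_T$, then
$E(y_{T+h} \mid \mu_T) = E(\mu_{T+h} + \varepsilon_{T+h} \mid \mu_T) = \mu_T$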
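$y_t = \mu_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$
$\mu_t = \mu_{t-1} + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$
We don’t observe $\mu_t$
We need to estimate $\mu_T$
Two sources of information: $y_T$ and $\mu_{T-1}$
But we don’t observe $\mu_{T-1}$ either, so we use $y_{T-1}$ and $\mu_{T-2}$, and so on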
Figure: Example random walk data
Figure: Example random walk data seen with noise
Figure: Noisy Data
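Data of the kind shown in these figures can be simulated directly from the two model equations. The following is a minimal MATLAB sketch; the sample size, the two standard deviations and the seed are illustrative assumptions, not the values used for the figures.

```matlab
% Simulate a random-walk level observed with noise.
% T, sigma_eta, sigma_eps and the seed are illustrative assumptions.
rng(1);                           % fix the seed so the run is reproducible
T = 200;                          % number of observations
sigma_eta = 0.5;                  % std. dev. of the level innovations eta_t
sigma_eps = 1.0;                  % std. dev. of the observation noise eps_t
eta   = sigma_eta * randn(T,1);   % eta_t ~ N(0, sigma_eta^2)
eps_t = sigma_eps * randn(T,1);   % eps_t ~ N(0, sigma_eps^2)
mu = cumsum(eta);                 % mu_t = mu_{t-1} + eta_t, starting from mu_0 = 0
y  = mu + eps_t;                  % y_t = mu_t + eps_t
```

Plotting mu and y on the same axes (for example with plot(1:T, mu, 1:T, y)) shows the hidden level and the noisy observations around it.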
The model
$y_t = \mu_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$
$\mu_t = \mu_{t-1} + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$
is a simple state-space model
Estimation of such models can be quite involved
Usually use the Kalman Filter (the last part of the course)
The final topic of the course is State-Space models
For the moment we will use an easy smoothing method to produce
reasonable forecasts
Additive Smoothing
Suppose at time $t-1$ we have a reasonable forecast for $y_t$
Call this forecast the level $L_{t-1}$, so that
$\hat{y}_{t|t-1} = L_{t-1}$
Then at time $t$ we observe $y_t$, and we update the level by taking a
weighted average of the old level and the new observation:
$L_t = \alpha y_t + (1-\alpha) L_{t-1}$, or equivalently
$L_t = L_{t-1} + \alpha (y_t - L_{t-1})$
$L_t = L_{t-1} + \alpha (y_t - L_{t-1})$
We adjust the level to correct for the forecast error $y_t - L_{t-1}$
If we underestimate $y_t$, then we make the next forecast $L_t$ larger
If we overestimate $y_t$, then we make the next forecast $L_t$ smaller
The value of $\alpha$ affects the size of the adjustment. For small $\alpha$ the level
will change slowly; for large $\alpha$ the level will change quickly.
The parameter $\alpha$ is called the smoothing parameter
Simple Additive Smoothing Algorithm
Simple Additive Smoothing
Initialize with $L_1 = y_1$
For $t = 2, \ldots, T$ update the level $L_t$ via
$L_t = \alpha y_t + (1-\alpha) L_{t-1}$
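The algorithm above can be sketched in a few lines of MATLAB. The data vector and the value of alpha below are made up purely for illustration.

```matlab
% Simple additive (exponential) smoothing on a short made-up series.
y = [10; 12; 11; 13; 12];     % hypothetical observations
alpha = 0.5;                  % smoothing parameter (illustrative choice)
T = length(y);
L = zeros(T,1);               % storage for the levels L_1, ..., L_T
L(1) = y(1);                  % initialise with L_1 = y_1
for t = 2:T
    L(t) = alpha*y(t) + (1-alpha)*L(t-1);   % weighted average of new data and old level
end
yhat_next = L(T);             % one-step-ahead forecast of y_{T+1}
```

With these numbers the levels are 10, 11, 11, 12, 12, so the forecast of the next observation is 12.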
Figure: Example random walk data seen with noise
Why Level Smoothing is Exponential
Exponential Weights
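This clearly gives smoothing, but what makes it exponential?
Let’s write out the first few levels:
$L_1 = y_1$
$L_2 = \alpha y_2 + (1-\alpha) L_1 = \alpha y_2 + (1-\alpha) y_1$
$L_3 = \alpha y_3 + (1-\alpha) L_2 = \alpha y_3 + \alpha(1-\alpha) y_2 + (1-\alpha)^2 y_1$
$L_4 = \alpha y_4 + (1-\alpha) L_3 = \alpha y_4 + \alpha(1-\alpha) y_3 + \alpha(1-\alpha)^2 y_2 + (1-\alpha)^3 y_1$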
We can show that
$L_t = \alpha y_t + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \cdots + (1-\alpha)^{t-1} y_1$
The weight declines exponentially (by a constant factor $1-\alpha$) as we go back in time
Aside – There is a small issue at $t = 1$, but it’s a vanishingly small problem.
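The expansion can be checked numerically. The sketch below, with made-up data and an illustrative alpha, compares the recursion with the explicit weighted sum. It also shows that the first observation carries weight $(1-\alpha)^{T-1}$ rather than $\alpha(1-\alpha)^{T-1}$, which may be the small issue the aside refers to, and that the weights still sum to one.

```matlab
% Compare the recursive level with the explicit exponentially weighted sum.
alpha = 0.3; T = 6;                  % illustrative values
y = (1:T)';                          % hypothetical data
% Level from the recursion
L = y(1);
for t = 2:T
    L = alpha*y(t) + (1-alpha)*L;
end
% Explicit weights on y_T, y_{T-1}, ..., y_1
w = alpha*(1-alpha).^(0:T-1)';       % alpha*(1-alpha)^j for j = 0, ..., T-1
w(T) = (1-alpha)^(T-1);              % y_1 gets (1-alpha)^(T-1), without the factor alpha
L_direct = sum(w .* flipud(y));      % flipud orders the data as y_T, ..., y_1
weight_total = sum(w);               % should equal 1
```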
h-step ahead Forecast
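One-step-ahead forecast: $\hat{y}_{T+1|T} = L_T$
Two-step-ahead forecast: we need to use $\hat{y}_{T+1|T}$ in place of the unobserved $y_{T+1}$
$\hat{y}_{T+2|T} = \alpha \hat{y}_{T+1|T} + (1-\alpha) L_T$
$= \alpha L_T + (1-\alpha) L_T$
$= L_T$
In general, the h-step-ahead forecast is
$\hat{y}_{T+h|T} = L_T$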
Holt-Winters Smoothing
Data with Trend
The additive smoothing discussed previously only applies to data
without trend
Consider a model with an evolving local level, but also a trend with
an evolving local slope
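$y_t = \mu_t + \lambda_t t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$
$\mu_t = \mu_{t-1} + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2)$
$\lambda_t = \lambda_{t-1} + \nu_t, \quad \nu_t \sim N(0, \sigma_\nu^2)$
Figure: Example data with evolving trend, with $\sigma_\varepsilon = 0.5$, $\sigma_\nu = 0.1$ and $\sigma_\eta = 0.5$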
Holt-Winters Smoothing
The Holt-Winters Smoothing method is a two-component approach
The forecast $\hat{y}_{t|t-1}$ has two components, $L_{t-1}$ and $b_{t-1}$.
These are a ‘level’ and a ‘slope’
$\hat{y}_{t|t-1} = L_{t-1} + b_{t-1}$
Holt-Winters Smoothing – Updating
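The level and slope are updated according to
$L_t = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1} = \alpha y_t + (1-\alpha)(L_{t-1} + b_{t-1})$
$b_t = \beta \Delta L_t + (1-\beta) b_{t-1} = \beta (L_t - L_{t-1}) + (1-\beta) b_{t-1}$
There are now two smoothing parameters, $\alpha$ and $\beta$
The update of $L_t$ is the same as before, except that the forecast is now $\hat{y}_{t|t-1} = L_{t-1} + b_{t-1}$
$b_t$ is a weighted average of the previous slope and the change in level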
Holt-Winters Smoothing Algorithm
Holt-Winters Smoothing
Initialize with $L_1 = y_1$ and $b_1 = y_2 - y_1$
For $t = 2, \ldots, T$ update the level $L_t$ and slope $b_t$ via
$L_t = \alpha y_t + (1-\alpha)(L_{t-1} + b_{t-1})$
$b_t = \beta (L_t - L_{t-1}) + (1-\beta) b_{t-1}$
h-step-ahead Forecast
One-step-ahead: $\hat{y}_{T+1|T} = L_T + b_T$
Two-step-ahead: use $\hat{y}_{T+1|T}$ in place of $y_{T+1}$, so
$L_{T+1|T} = \alpha \hat{y}_{T+1|T} + (1-\alpha)(L_T + b_T)$
$= \alpha (L_T + b_T) + (1-\alpha)(L_T + b_T) = L_T + b_T$
$b_{T+1|T} = \beta (L_{T+1|T} - L_T) + (1-\beta) b_T$
$= \beta (L_T + b_T - L_T) + (1-\beta) b_T = b_T$
$\hat{y}_{T+2|T} = L_{T+1|T} + b_{T+1|T} = L_T + 2 b_T$
In general:
$\hat{y}_{T+h|T} = L_T + h b_T$
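The general formula can be verified numerically by repeating the two-step argument: at each future step the one-step forecast is fed back into the updating equations in place of the unobserved value. The sketch below does this with hypothetical values for the final level and slope and illustrative smoothing parameters.

```matlab
% Iterate the Holt-Winters updates with forecasts substituted for future data.
alpha = 0.5; beta = 0.3;      % illustrative smoothing parameters
LT = 100; bT = 2;             % hypothetical final level L_T and slope b_T
H = 5;                        % longest horizon to check
L = LT; b = bT;
yhat = zeros(H,1);
for h = 1:H
    yhat(h) = L + b;                          % forecast one step beyond the current state
    newL = alpha*yhat(h) + (1-alpha)*(L+b);   % level update using the forecast as data
    b = beta*(newL - L) + (1-beta)*b;         % slope update (leaves the slope unchanged)
    L = newL;
end
closed_form = LT + (1:H)'*bT;                 % L_T + h*b_T for h = 1, ..., H
```

Both vectors contain 102, 104, 106, 108, 110 (up to floating-point rounding), whatever values of alpha and beta are used.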
Holt-Winters and US GDP
Forecasting U.S. GDP with Holt-Winters Smoothing
We’ve looked at US GDP data using linear, quadratic and cubic trends
We found the quadratic trend performed best in terms of AIC/BIC and MSFE,
i.e. on both in-sample and pseudo-out-of-sample forecasting measures
Now we will use Holt-Winters smoothing to produce forecasts for
h = 4 and h = 20
We will use three sets of smoothing parameters: $\alpha = \beta = 0.8$,
$\alpha = \beta = 0.5$ and $\alpha = \beta = 0.2$
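% y is the T-by-1 vector of GDP observations, assumed to be loaded already
T = length(y);
T0 = 40; h = 4;                 % first forecast origin and forecast horizon
syhat = zeros(T-h-T0+1,1);      % storage for the h-step-ahead forecasts
ytph = y(T0+h:end);             % observed y_{t+h}
alpha = .5; beta = .5;          % smoothing parameters
Lt = y(1); bt = y(2) - y(1);    % initialise level and slope
for t = 2:T-h
    newLt = alpha*y(t) + (1-alpha)*(Lt+bt);   % update the level
    newbt = beta*(newLt-Lt) + (1-beta)*bt;    % update the slope
    yhat = newLt + h*newbt;                   % h-step-ahead forecast
    Lt = newLt; bt = newbt;                   % update Lt and bt
    if t >= T0                  % store the forecasts for t >= T0
        syhat(t-T0+1,:) = yhat;
    end
end
MSFE = mean((ytph-syhat).^2);   % mean squared forecast error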
Out-of-sample Forecast Results
          quadratic trend   $\alpha = \beta = 0.2$   $\alpha = \beta = 0.5$   $\alpha = \beta = 0.8$
h = 4     170400            82515                    61051                    61295
h = 20    413280            814090                   1253500                  1561500
Table: MSFE for various models of US GDP
For short-horizon forecasts the Holt-Winters method performs much better than the quadratic trend
But for long-horizon forecasts it is much worse; it is probably worth taking
logs of the data
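To see why logs might help, consider a hypothetical series that grows at a constant percentage rate. Its logarithm is exactly linear, so Holt-Winters applied to the logged data extrapolates it without error, while the same method applied to the levels extrapolates a straight line under a curve. The sketch below uses synthetic data and illustrative parameter values; it is not the GDP exercise itself.

```matlab
% Holt-Winters on levels versus logs for a synthetic exponentially growing series.
T = 60; h = 20; alpha = 0.5; beta = 0.5;     % illustrative settings
t_all = (1:T+h)';
y_all = 100 * 1.02.^t_all;                   % hypothetical series growing 2% per period
series = {y_all(1:T), log(y_all(1:T))};      % the sample in levels and in logs
fc = zeros(2,1);
for k = 1:2
    z = series{k};
    L = z(1); b = z(2) - z(1);               % initialise level and slope
    for t = 2:T
        newL = alpha*z(t) + (1-alpha)*(L+b);
        b = beta*(newL - L) + (1-beta)*b;
        L = newL;
    end
    fc(k) = L + h*b;                         % h-step-ahead forecast
end
fc(2) = exp(fc(2));                          % convert the log forecast back to levels
err = abs(fc - y_all(T+h));                  % absolute errors: [levels; logs]
```

Here err(2) is essentially zero while err(1) is not. Real data are noisy, so the gain from logging would be smaller and needs to be checked on the actual series.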
Figure: Holt-Winters forecast for h = 4 and $\alpha = \beta = 0.5$, with real US GDP data
Holt-Winters with Seasonality
Holt-Winters Smoothing with Seasonality
If our data has seasonality as well, we simply add an extra component
to the point forecast
$\hat{y}_{t|t-1} = L_{t-1} + b_{t-1} + S_{t-s}$
where $s$ is the periodicity of the seasonality
E.g. usually $s = 4$ for quarterly data, $s = 12$ for monthly data
Updating Formulae
Level, slope and seasonality are updated as:
$L_t = \alpha (y_t - S_{t-s}) + (1-\alpha)(L_{t-1} + b_{t-1})$
$b_t = \beta (L_t - L_{t-1}) + (1-\beta) b_{t-1}$
$S_t = \gamma (y_t - L_t) + (1-\gamma) S_{t-s}$
Three smoothing parameters: $\alpha$ for level, $\beta$ for slope, and $\gamma$ for
seasonal change
What does it mean when the data is high?
Suppose we get Quarter 1 data and it’s higher than expected.
The model suggests it’s some combination of:
random upward noise (transient; ignore)
a random upward movement of the underlying data (permanent,
once-off; increase level)
an increase in the ‘slope’ (permanent slope increase; increase slope)
Quarter 1s being a bit higher from now on (permanent, once-off for
future Quarter 1s; increase $S_t$)
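A single update step shows how the surprise is shared out between the components. All numbers below are hypothetical, and the smoothing parameters are set to 0.5 only to keep the arithmetic simple.

```matlab
% One seasonal Holt-Winters update after an observation above the forecast.
alpha = 0.5; beta = 0.5; gamma = 0.5;     % illustrative smoothing parameters
Lprev = 100; bprev = 1; Sprev = 5;        % hypothetical L_{t-1}, b_{t-1}, S_{t-s}
yhat = Lprev + bprev + Sprev;             % forecast: 106
y = 110;                                  % observation, 4 above the forecast
L = alpha*(y - Sprev) + (1-alpha)*(Lprev + bprev);   % 0.5*105 + 0.5*101 = 103
b = beta*(L - Lprev) + (1-beta)*bprev;               % 0.5*3   + 0.5*1   = 2
S = gamma*(y - L) + (1-gamma)*Sprev;                 % 0.5*7   + 0.5*5   = 6
```

Without the surprise the level would have moved to 101, the slope would have stayed at 1 and the seasonal term at 5. The forecast error of 4 therefore raises the level by 2, the slope by 1 and the seasonal component by 1, with the rest treated as noise.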
h-step-ahead Forecast
One-step-ahead: $\hat{y}_{T+1|T} = L_T + b_T + S_{T+1-s}$
Two-step-ahead: use $\hat{y}_{T+1|T}$ in place of $y_{T+1}$ when updating, to get
$\hat{y}_{T+2|T} = L_T + 2 b_T + S_{T+2-s}$
The h-step-ahead forecast (for $h \le s$) is
$\hat{y}_{T+h|T} = L_T + h b_T + S_{T+h-s}$
Australian Retail Sales
Holt-Winters with Seasonality Example
Forecasting Australian Retail Sales with Holt-Winters
We’ve looked at Australian Retail sales previously, with various
seasonal trend models.
We found the specifications were all very similar, and usually not very
Forecasting Australian Retail Sales with Holt-Winters
Now produce forecasts with Holt-Winters smoothing with seasonality
Forecast horizon: 1- and 2-step-ahead
Three sets of smoothing parameters: $\alpha = \beta = \gamma = 0.8$,
$\alpha = \beta = \gamma = 0.5$ and $\alpha = \beta = \gamma = 0.2$
Matlab Code
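% y is the T-by-1 vector of retail sales, assumed to be loaded already
T = length(y);
T0 = 15;                        % first forecast origin
h = 1;                          % forecast horizon (this code needs h <= s)
s = 4;                          % periodicity of seasonality
syhat = zeros(T-h-T0+1,1);      % storage for the h-step-ahead forecasts
ytph = y(T0+h:end);             % observed y_{t+h}
alpha = .5; beta = .5; gamma = .5;   % smoothing parameters
St = zeros(T-h,1);              % seasonal components
% initialize. Many options exist, this is an easy one
Lt = mean(y(1:s)); bt = 0; St(1:s) = y(1:s) - Lt;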
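for t = s+1:T-h
    newLt = alpha*(y(t) - St(t-s)) + (1-alpha)*(Lt+bt);   % update the level
    newbt = beta*(newLt-Lt) + (1-beta)*bt;                % update the slope
    St(t) = gamma*(y(t)-newLt) + (1-gamma)*St(t-s);       % update the seasonal component
    yhat = newLt + h*newbt + St(t+h-s);                   % h-step-ahead forecast
    Lt = newLt; bt = newbt;     % update Lt and bt
    if t >= T0                  % store the forecasts for t >= T0
        syhat(t-T0+1,:) = yhat;
    end
end
MSFE = mean((ytph-syhat).^2);   % mean squared forecast error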
Out-of-sample Forecast Results
Table: MSFE comparing Seasonal Dummy to Holt-Winters
         seasonal dummy   $\alpha, \beta, \gamma = 0.2$   $\alpha, \beta, \gamma = 0.5$   $\alpha, \beta, \gamma = 0.8$
h = 1    1.3405           0.7204                          0.4573                          0.7466
h = 2    1.6084           0.8934                          0.4648                          1.0113
Holt-Winters generally works better than just using seasonal dummies
Smoothing parameters around 0.5 work well
For large h and large smoothing parameters, the model does not work
very well
Holt-Winters Forecasts
Figure: Holt-Winters forecast for h = 1 and $\alpha = \beta = \gamma = 0.5$, with Australian Retail Sales data
Optimising Holt-Winters Parameters
Being very tricky – Optimising Holt-Winters
If we want to be very tricky, we can optimise our Holt-Winters process
Make a function file which takes in the $\alpha$, $\beta$ parameter values, and
calculates the MSFE
Make an m-file which minimises the preceding function with respect to
$\alpha$ and $\beta$
Make even better forecasts.
There is a small amount of statistical justification for this approach,
but not much.
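The idea can be sketched without separate function files by evaluating the MSFE on a grid of parameter values and picking the smallest. This is a grid search, swapped in here to keep the example self-contained; it is not the function-file-plus-minimiser setup described above (for which MATLAB's fminsearch would be one option). The data series, T0, h and the grid are all assumptions made for illustration, and the grid restricts both parameters to (0, 1].

```matlab
% Grid search for the Holt-Winters smoothing parameters that minimise the MSFE.
y = 0.5*(1:80)' + 3*sin((1:80)'/6);   % hypothetical series standing in for real data
T = length(y); T0 = 20; h = 1;        % illustrative first forecast origin and horizon
cand = 0.05:0.05:1;                   % candidate values for alpha and beta
M = zeros(numel(cand));               % M(i,j) = MSFE for alpha = cand(i), beta = cand(j)
for i = 1:numel(cand)
    for j = 1:numel(cand)
        alpha = cand(i); beta = cand(j);
        L = y(1); b = y(2) - y(1);    % initialise level and slope
        sse = 0; n = 0;
        for t = 2:T-h
            newL = alpha*y(t) + (1-alpha)*(L+b);
            b = beta*(newL - L) + (1-beta)*b;
            L = newL;
            if t >= T0                % accumulate squared h-step-ahead forecast errors
                sse = sse + (y(t+h) - (L + h*b))^2;
                n = n + 1;
            end
        end
        M(i,j) = sse/n;
    end
end
[best_msfe, idx] = min(M(:));         % smallest MSFE on the grid
[i_best, j_best] = ind2sub(size(M), idx);
best_alpha = cand(i_best); best_beta = cand(j_best);
```

The matrix M can also be plotted as a surface over the grid to see how sensitive the MSFE is to each parameter.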
Optimising Holt-Winters for Aus Retail
If you do this for Aus Retail, you get nice results
The best parameters are $\alpha = 0.4556$, $\beta = 0.2459$ and $\gamma = 0.9173$.
This suggests seasonality changes quickly, and slope changes slowly.
Level changes at a moderate pace.
MSFE is 0.3667. The best we had previously was 0.4573. About a
20% reduction.
Holt-Winters Optimisation
Figure: Holt-Winters MSFE for Aus Retail data with $\gamma = 0.9173$ and various $\alpha$, $\beta$
Optimising Holt-Winters for US GDP
If you do this for US GDP, you get very strange results
The ‘best’ parameters are $\alpha = 1.42$ and $\beta = 0.035$.
The outcome of $\beta \approx 0$ is that the slope doesn’t change much.
The outcome of $\alpha > 1$ is that if we observe a true value above the
forecast, then we should increase the level above even the observation
(very strange).
Possible reason – Very strong feedback loops (unlikely)
Possible reason – The data is exponential, and should have been
logged first
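The effect of a smoothing parameter above one is easy to see in a single level update. The forecast and observation below are hypothetical round numbers; only the value of alpha comes from the result above.

```matlab
% Level update with alpha > 1: the new level overshoots the observation.
alpha = 1.42;                     % 'best' value reported above for US GDP
yhat = 100;                       % hypothetical forecast L_{t-1} + b_{t-1}
y = 110;                          % hypothetical observation, above the forecast
L = alpha*y + (1-alpha)*yhat;     % same as yhat + alpha*(y - yhat)
overshoot = L - y;                % positive whenever alpha > 1 and y > yhat
```

Here the level moves to about 114.2, which is 4.2 above the value that was actually observed.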
