CS代写 QBUS2820 – Predictive Analytics

Tutorial_10

QBUS2820 – Predictive Analytics
Tutorial 10 – Exponential smoothing¶

Copyright By PowCoder代写 加微信 powcoder

In this tutorial we will study how to apply exponential smoothing methods to forecast seasonal data in Python.

We need to use forecast.py and statlearning.py modules for this tutroial as we did for the previous tutorial.

Data: NSW retail turnover

Time series decomposition

Trend corrected exponential smoothing

Holt-Winters smoothing

Model diagnostics

Model validation

Note: To be able to run the X-13 package below for time series decompostion, Python versions newer than 3.8 are required

This notebook relies on the following imports and settings

# Packages
import warnings
warnings.filterwarnings(“ignore”)
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

# Plot settings
sns.set_context(‘notebook’)
sns.set_style(‘ticks’)
red=’#D62728′
blue=’#1F77B4′
%matplotlib inline

Data: NSW Retail turnover¶
In this tutorial we will keep working with the Australian retail turnover series. The next cells load the data and reproduce some steps from last week.

data = pd.read_csv(‘nswretail.csv’, index_col=’Month’, parse_dates=True, dayfirst=True)
data.tail()

2017-02-01 7298.9
2017-03-01 8085.8
2017-04-01 7883.7
2017-05-01 8132.0
2017-06-01 8130.1

y = data[‘Turnover’].copy()
y.index = y.index.to_period(freq=’M’)

ts = data[‘Turnover’]

data.describe().round(2)

count 138.00
mean 6611.42
std 1168.73
min 4496.90
25% 5769.78
50% 6347.20
75% 7450.70
max 10783.10

fig, ax= plt.subplots(figsize=(8,5))
y.plot(color=red)
ax.set_xlabel(”)
ax.set_ylabel(‘Turnover’)
ax.set_title(‘NSW retail turnover (2005-2017)’)
ax.set_xticks([], minor=True)
sns.despine()
plt.show()

fig, ax= plt.subplots(figsize=(8,5))
np.log(y).plot(color=red)
ax.set_xlabel(”)
ax.set_ylabel(‘Log turnover’)
ax.set_title(‘Log series’)
ax.set_xticks([], minor=True)
sns.despine()
plt.show()

Time series decomposition¶
**Note: Methods for time series decompostion are not covered in the lecture, but this tutorial shows you how to perform the decompostion using a package**,

There are many other time series decomposition methods and packages, e.g. https://www.statsmodels.org/stable/generated/statsmodels.tsa.seasonal.STL.html#statsmodels.tsa.seasonal.STL

We will use the basic one, which applies a moving average to calculate the ‘trend’, this is, it smoothes the series around each point buy averaging the values of all observations around that point.

The smoothing window can be selected automatically, by here we will play with the argument filt that defines the window.
We use a simple average, this is, all observations have the same weight.
Run with window_size=12, look at the plots of the seasonal component and trend, then change the value of window size (e.g. to 6 (too small, just for illustration), 24, etc.) and convince yourself that the definition of trend affects the seasonal component and the results might vary. You might need to restart the notebook if you get an error, due to conflicts with code chucks at the end.

import statsmodels.tsa.seasonal
window_size = 12
decomposition = statsmodels.tsa.seasonal.seasonal_decompose(ts, filt=np.repeat(1/window_size, window_size))

You can use the dir method to check the available output.

dir(decomposition)[-6:]

[‘observed’, ‘plot’, ‘resid’, ‘seasonal’, ‘trend’, ‘weights’]

For example, if we want to retrieve and plot the seasonally adjusted time series, we can do as follows (Note that the seasonally adjusted time series is the original time series after removing the seasonal component).

decomposition.seasonal.tail()

2017-02-01 -738.799451
2017-03-01 -191.496117
2017-04-01 -354.003617
2017-05-01 -221.082784
2017-06-01 -322.940739
Name: seasonal, dtype: float64

fig, ax= plt.subplots(figsize=(8,5))
decomposition.seasonal.plot(color=red)
ax.set_xlabel(”)
ax.set_ylabel(‘Turnover’)
ax.set_title(‘Seasonally adjusted NSW retail turnover 2005-2017)’)
ax.set_xticks([], minor=True) # I prefer to remove the minor ticks for a cleaner plot
sns.despine()
plt.show()

fig, ax= plt.subplots(figsize=(8,5))
decomposition.trend.plot(color=”green”)
ax.set_xlabel(”)
ax.set_ylabel(‘Turnover’)
ax.set_title(‘Trend of NSW retail turnover 2005-2017)’)
ax.set_xticks([], minor=True) # I prefer to remove the minor ticks for a cleaner plot
sns.despine()
plt.show()

There is a convenience function for plotting all components.

decomposition.plot()
plt.show()

Trend corrected exponential smoothing¶
We will forecast with trend-corrected exponential smoothing.
We have not yet seen the more advanced exponential smoothing models in the lectures,
for now lets think about in superficial terms as doing exponential smoothing in two components, the level, as we have seen in the simple exponential smoothing and the trend.

import forecast
holt = forecast.holt(ts)
holt.fit()
holt.summary()

Holt (trend corrected) exponential smoothing

Smoothing parameters:
alpha (level) 0.072 (0.031)
beta (trend) 0.038 (0.023)

In-sample fit:
MSE 513537.917
Log-likelihood -1103.100
AIC 2212.200
BIC 2220.982

smoothed=pd.Series(holt.smooth(), index=y.index)

fig, ax= plt.subplots(figsize=(10,6))

ts.plot(color=’#D62728′, label=’trend-corrected turnover’)
smoothed.plot(color=’black’, label=’Smoothed’, alpha=0.65)
plt.legend(loc=’best’)
ax.set_xlabel(”)
ax.set_xticks([], minor=True)
ax.set_title(r’Holt exponential smoothing’, fontsize=13.5, fontweight=’bold’)
sns.despine()
plt.show()

test=pd.period_range(start=ts.index[-1]+1, periods=h, freq=’M’)

pred=pd.Series(holt.forecast(h), index=test)

intv1=pd.DataFrame(holt.intervalforecast(h, level=.8), index=test)
intv2=pd.DataFrame(holt.intervalforecast(h, level=.9), index=test)
intv3=pd.DataFrame(holt.intervalforecast(h, level=.99), index=test)

fig, ax = forecast.fanchart(ts[’01-2013′:], pred, intv1, intv2, intv3)
ax.set_xlabel(”)
ax.set_xticks([], minor=True)
plt.title(‘Trend corrected retail turnover forecast’, fontsize=13.5, fontweight=’bold’)

sns.despine()
plt.show()

Holt-winters exponential smoothing¶
Yet a more advanced version, consider an simple exponential smoothing applied to the level, another to the trend and another to the season

We estimate several candidate specifications below. The AIC suggests a multiplicative model without damping for the original series, but we would need to make a likelihood adjustment or a validation set comparison to selected between this method and an additive model for a log transformation.

ahw=forecast.holtwinters(y, additive=True, damped=False, m=12)
ahw.summary()

Additive Holt-winters exponential smoothing

Smoothing parameters:
alpha (level) 0.575 (0.072)
beta (trend) 0.027 (0.017)
delta (seasonal) 1.000 (0.173)

In-sample fit:
MSE 13865.368
RMSE 117.751
Log-likelihood -853.877
AIC 1715.754
BIC 1727.463

mhw=forecast.holtwinters(y, additive=False, damped=False, m=12)
mhw.summary()

Multiplicative Holt-winters exponential smoothing

Smoothing parameters:
alpha (level) 0.459 (0.071)
beta (trend) 0.024 (0.012)
delta (seasonal) 0.704 (0.144)

In-sample fit:
MSE 13016.536
RMSE 114.090
Log-likelihood -849.518
AIC 1707.036
BIC 1718.745

mhw_damped=forecast.holtwinters(y, additive=False, damped=True, m=12)
mhw_damped.fit()
mhw_damped.summary()

Multiplicative Holt-winters exponential smoothing (damped trend)

Smoothing parameters:
alpha (level) 0.330 (0.114)
beta (trend) 1.000 (0.558)
delta (seasonal) 0.552 (0.170)
phi (damping) 0.270 (0.198)

In-sample fit:
MSE 13608.225
RMSE 116.654
Log-likelihood -852.585
AIC 1715.170
BIC 1729.807

ahw_log=forecast.holtwinters(np.log(y), additive=True, damped=False, m=12)
ahw_log.fit()
ahw_log.summary()

Additive Holt-winters exponential smoothing

Smoothing parameters:
alpha (level) 0.529 (0.074)
beta (trend) 0.022 (0.014)
delta (seasonal) 0.823 (0.162)

In-sample fit:
MSE 0.000
RMSE 0.018
Log-likelihood 358.566
AIC -709.132
BIC -697.423

The smoothed series based on the multiplicative method tracks the original series very closely.

smoothed=pd.Series(mhw.smooth(), index=y.index)

fig, ax= plt.subplots(figsize=(10,6))

y.plot(color=’#D62728′, label=’NSW retail turnover’)
smoothed.plot(color=’black’, label=’Smoothed’, alpha=0.6)
plt.legend(loc=’best’)
ax.set_xlabel(”)
ax.set_xticks([], minor=True)
ax.set_title(r’Holt-winters exponential smoothing’, fontsize=13.5, fontweight=’bold’)
sns.despine()
plt.show()

Model diagnostics¶
We base the diagnostics on the multiplicative Holt-Winters model. The model seems to adequately capture the time series patterns in the data, leading to small and insignicant residual correlations. The residuals do not follow the normal distribution, so that ideally we should use alternative assumptions for computing prediction intervals.

You might rembmer the residual plot and the autocorrelation plot from model checking in linear regression.

resid = (y-mhw.smooth())[12:] # we remove the first 12 observations as they are for initialisitation only

fig, ax= plt.subplots(figsize=(8,5))
resid.plot(color=blue)
ax.set_xlabel(”)
ax.set_xticks([], minor=True)
ax.set_title(‘Residual plot’)
sns.despine()
plt.show()

fig, ax = plt.subplots(figsize=(8,5))
sm.graphics.tsa.plot_acf(resid, lags=40, ax=ax)
sns.despine()
plt.show()

from forecast import histogram, qq_plot

histogram(resid)
plt.show()

qq_plot(resid)
plt.show()

print(‘Residual skewness: {:.3f}’.format(resid.skew()))
print(‘Residual kurtosis: {:.3f}’.format(resid.kurt()))

Residual skewness: 0.163
Residual kurtosis: 0.788

Model validation¶
The real time forecasting analysis suggests that the log additive model for the log series is the most accurate method for one step ahead forecasts. The Holt-Winters forecasts perform significantly better than a seasonal random walk.

# Real time forecasting

ts=np.log(y) # log series for the log-additive model

validation=y[‘2012-01’:].index
start = y.index.get_loc(‘2012-01’)

pred1 = [] # seasonal random walk
pred2 = [] # additive holt winters
pred3 = [] # mutiplicative holt winters
pred4 = [] # log additive holt winters
actual= []

for i in range(start, len(y)):

actual.append(y.iloc[i]) # actual value

pred1.append(y.iloc[i-12]) # seasonal random walk forecast

model = forecast.holtwinters(y.iloc[:i], additive=True, damped=False, m=12)
model.fit()
pred2.append(model.forecast(1)[0]) # additive holt winters forecast

model = forecast.holtwinters(y.iloc[:i], additive=False, damped=False, m=12)
model.fit()
pred3.append(model.forecast(1)[0]) # multiplicate holt winters forecast

model = forecast.holtwinters(ts.iloc[:i], additive=True, damped=False, m=12)
model.fit()
resid = (ts.iloc[:i]-model.smooth())[12:] # residuals
y_pred = np.exp(model.forecast(1)[0])*np.mean(np.exp(resid)) # forecast with a retransformation adjustment
pred4.append(y_pred) # long additive holt winters forecast

columns=[‘Seasonal RW’, ‘Additive’, ‘Multiplicative’, ‘Log additive’, ‘Actual’]
results = np.vstack([pred1,pred2,pred3,pred4,actual]).T
results = pd.DataFrame(results, columns=columns, index=validation)

from statlearning import rmse_jack

table = pd.DataFrame(0.0, index=results.columns[:-1], columns=[‘RMSE’,’SE’])
for i in range(4):
table.iloc[i,0], table.iloc[i,1] = rmse_jack(results.iloc[:,i], results.iloc[:,-1])
table.round(2)

Seasonal RW 385.73 21.12
Additive 146.48 17.78
Multiplicative 134.39 10.07
Log additive 129.95 10.75

Finally, we generate a two year forecast based on the selected model.

test=pd.period_range(start=y.index[-1]+1, periods=h, freq=’M’)

pred=pd.Series(ahw_log.forecast(h), index=test)
var = ahw_log.forecastvariance(h)
pred=np.exp(pred+var/2) # retransformation under the assumption of normality, for simplicity

intv1=pd.DataFrame(ahw_log.intervalforecast(h, level=.8), index=test)
intv2=pd.DataFrame(ahw_log.intervalforecast(h, level=.9), index=test)
intv3=pd.DataFrame(ahw_log.intervalforecast(h, level=.99), index=test)
intv1=np.exp(intv1)
intv2=np.exp(intv2)
intv3=np.exp(intv3)

fig, ax = forecast.fanchart(y[’01-2014′:], pred, intv1, intv2, intv3)
ax.set_xlabel(”)
ax.set_xticks([], minor=True)

plt.title(‘NSW retail turnover forecast (log additive Holt-winters model)’, fontsize=13.5, fontweight=’bold’)
sns.despine()
plt.show()

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com