QBUS2820 – Predictive Analytics
Tutorial 9 – Forecasting¶
In this tutorial we study the practical application of basic forecasting methods in Python. We will use the functions coded in the forecast.py file, which you need to download from Canvas. The statlearning.py module is also needed for this tutorial and is available on Canvas.
Data: Australian CPI inflation
Exploratory data analysis
Random Walk
Model diagnostics
Model validation
This notebook relies on the following imports and settings.
# Packages
import warnings
warnings.filterwarnings("ignore")
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

# Plot settings
sns.set_context('notebook')
sns.set_style('ticks')
red='#D62728'
blue='#1F77B4'
%matplotlib inline
Data: Australian CPI inflation¶
Our data is the quarterly change in the Consumer Price Index (CPI) calculated by the Australian Bureau of Statistics. The original dataset is on the ABS website, where you can also find the explanatory notes. We use the index for all expenditure groups, which according to the ABS documentation already contains seasonal adjustments for components that are subject to calendar effects.
We start by loading the data and converting the index to quarterly periods (note that we have to specify this frequency when converting the index). We focus on the data since 1980, which gives a total of 168 observations.
data=pd.read_csv('inflation.csv', index_col='Date', parse_dates=True, dayfirst=True)
data.index=data.index.to_period(freq='Q') # converting the index to quarterly periods instead of dates
data=data['01-1980':] # keeping the data from January 1980 onwards
data.tail()
2020Q4 0.9
2021Q1 0.6
2021Q2 0.8
2021Q3 0.8
2021Q4 1.3
For univariate time series modelling, it is better to work with a pandas series rather than dataframe.
y=data['Inflation']
Exploratory data analysis¶
The first step in our analysis is a time series plot. We can see that both the level (i.e. the mean) and the volatility (i.e. the variance) of inflation are much lower in recent times than they were in the 80s. There is a noticeable outlier in the third quarter of 2000 due to the introduction of the GST in Australia.
fig, ax = plt.subplots(figsize=(8,5))
y.plot(color=red)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True) # I prefer to remove the minor ticks for a cleaner plot
sns.despine()
plt.show()
y.describe().round(2)
count 168.00
mean 0.95
std 0.88
min -1.90
25% 0.40
50% 0.70
75% 1.40
max 4.10
Name: Inflation, dtype: float64
Random walk¶
In this section we use the random walk method to illustrate the process that we will follow to generate forecasts.
1. Specify the forecast horizon.
2. Create a range of dates or periods starting from the time index following the last observation in the data.
3. Generate the forecasts and store them in a series indexed by the range created in step 2.
Below, we generate point forecasts for one to four quarters after the end of the series.
h = 4 # forecast horizon
test = pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')
pred = pd.Series(np.repeat(y.iloc[-1], h), index=test) # the forecast repeats the last observed value h times
pred
2022Q1 1.3
2022Q2 1.3
2022Q3 1.3
2022Q4 1.3
Freq: Q-DEC, dtype: float64
To compute interval forecasts, we first estimate the standard deviation of the errors.
resid=y-y.shift(1) # the shift lags the series by one period
sigma = resid.std()
round(sigma,3)
Using the formulas from the lecture, the interval forecasts are as below.
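For reference, the interval computed in the next cell is the usual Gaussian prediction interval for a random walk,

$$\hat{y}_{T+h} \pm z_{0.975}\,\hat{\sigma}\sqrt{h}, \qquad h = 1, \dots, 4,$$

where $\hat{\sigma}$ is the estimated standard deviation of the one-step errors computed above.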
intv = pd.concat([pred-stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1)),
pred+stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1))], axis=1)
intv.round(3)
2022Q1 -0.317 2.917
2022Q2 -0.987 3.587
2022Q3 -1.501 4.101
2022Q4 -1.934 4.534
Simple exponential smoothing¶
The exponential smoothing functions are in the forecast module from Canvas; please also explore the other forecasting models it contains.
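To make the method concrete before calling the module, here is a minimal sketch of simple exponential smoothing with a fixed smoothing parameter. The function name ses_sketch is hypothetical and this is not the forecast.py implementation, which additionally estimates alpha and chooses the initial level for you.

# Minimal sketch of simple exponential smoothing with a fixed alpha (illustration only):
# level recursion l_t = alpha*y_t + (1-alpha)*l_{t-1}, flat forecasts equal to the last level
def ses_sketch(y, alpha):
    level = y.iloc[0]                               # initialise the level at the first observation
    fitted = []
    for obs in y:
        fitted.append(level)                        # one-step-ahead forecast of this observation
        level = alpha*obs + (1 - alpha)*level       # update the level after observing it
    return pd.Series(fitted, index=y.index), level  # fitted values and final level (= point forecast)

For example, ses_sketch(y, 0.25)[1] returns the flat point forecast implied by alpha = 0.25, which should be broadly comparable to the fitted level from the module below.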
import forecast # you need to download the forecast.py file from Canvas
ses=forecast.ses(y)
ses.fit() # estimate the smoothing parameter (as in the model validation section below)
fitted=pd.Series(ses.smooth(), index=y.index)
ses.summary()
Simple exponential smoothing
Smoothing parameter:
alpha 0.250 (0.067)
In-sample fit:
MSE 0.439
Log-likelihood -169.214
AIC 344.427
BIC 353.799
fig, ax= plt.subplots(figsize=(8,5))
y.plot(color=red, label='Inflation')
fitted.plot(color='black', label='Simple exponential smoothing fit', alpha=0.8)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True)
plt.legend(loc='best')
sns.despine()
plt.show()
Model diagnostics¶
We now conduct residual diagnostics for the exponential smoothing. The key diagnostics for univariate time series are:
Residual plot.
Residual ACF.
Residual distribution plots.
We compute the residuals as follows.
resid=y-fitted
The following cells compute the diagnostics. We find that the residuals are uncorrelated, have non-constant variance (due to the higher volatility in the 80s), and are non-Gaussian.
fig, ax= plt.subplots(figsize=(8,5))
resid.plot(color=blue)
ax.set_xlabel('')
ax.set_xticks([], minor=True)
ax.set_title('Residual plot')
sns.despine()
plt.show()
fig, ax = plt.subplots(figsize=(8,5))
sm.graphics.tsa.plot_acf(resid, lags=40, ax=ax)
sns.despine()
plt.show()
We can see very low autocorrelation.
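To back this up with a formal check (this cell is my addition, not part of the original tutorial), we can run a Ljung-Box test on the residuals with statsmodels; the output format varies across statsmodels versions, but large p-values are consistent with uncorrelated residuals.

# Ljung-Box test of residual autocorrelation at a few lag choices
from statsmodels.stats.diagnostic import acorr_ljungbox
acorr_ljungbox(resid, lags=[4, 8, 12])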
def hist(series):
    fig, ax = plt.subplots(figsize=(8,5))
    sns.distplot(series, ax=ax, hist_kws={'alpha': 0.8, 'edgecolor':'black', 'color': blue},
                 kde_kws={'color': 'black', 'alpha': 0.7})
    sns.despine()
    return fig, ax
hist(resid)
plt.show()
#y=y['1991':]
ses=forecast.ses(y)
ses.summary()
Simple exponential smoothing
Smoothing parameter:
alpha 0.250 (0.067)
In-sample fit:
MSE 0.439
Log-likelihood -169.214
AIC 344.427
BIC 353.799
Model validation¶
We implement a real time forecasting exercise to compare the random walk and simple exponential smoothing methods.
# Real time forecasting - use it as a template
validation = y['2004Q1':].index # the validation period is Q1 2004 onwards
start = y.index.get_loc('2004Q1') # numerical index corresponding to Q1 2004
pred1 = []
pred2 = []
actual= []
for i in range(start, len(y)):
    actual.append(y.iloc[i])           # actual value
    pred1.append(y.iloc[i-1])          # random walk forecast
    model = forecast.ses(y.iloc[:i])
    model.fit()
    pred2.append(model.forecast(1)[0]) # SES forecast
columns = ['RW', 'SES', 'Actual']
results = np.vstack([pred1,pred2,actual]).T
results = pd.DataFrame(results, columns=columns, index=validation)
We find that simple exponential smoothing generates more accurate forecasts.
You need to download the statlearning module from Canvas.¶
from statlearning import rmse_jack
table = pd.DataFrame(0.0, index=results.columns[:-1], columns=['RMSE','SE'])
for i in range(2):
    table.iloc[i,0], table.iloc[i,1] = rmse_jack(results.iloc[:,i], results.iloc[:,-1])
table.round(3)
      RMSE     SE
RW   0.694  0.144
SES  0.564  0.084
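As a quick sanity check (my addition, not part of the original notebook), the point estimates in the RMSE column can be reproduced directly; rmse_jack presumably adds the jackknife standard error reported in the SE column.

# Recompute the plain RMSE of each method by hand
rmse_rw = np.sqrt(((results['RW'] - results['Actual'])**2).mean())
rmse_ses = np.sqrt(((results['SES'] - results['Actual'])**2).mean())
print(round(rmse_rw, 3), round(rmse_ses, 3))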
We use a fan chart to report our final forecast. For now, the prediction interval is based on the normal distribution, even though we saw that this is not a good assumption for this data.
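For reference, the textbook Gaussian interval for simple exponential smoothing at horizon $h$ is

$$\hat{y}_{T+h} \pm z\,\hat{\sigma}\sqrt{1+(h-1)\hat{\alpha}^2},$$

where $z$ is the normal quantile for the chosen coverage level and $\hat{\alpha}$ is the estimated smoothing parameter; the intervalforecast method may implement the calculation slightly differently.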
model = ses
test=pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')
pred=pd.Series(model.forecast(h), index=test)
intv1=pd.DataFrame(model.intervalforecast(h, level=.8), index=test)
intv2=pd.DataFrame(model.intervalforecast(h, level=.9), index=test)
intv3=pd.DataFrame(model.intervalforecast(h, level=.99), index=test)
fig, ax = forecast.fanchart(y['01-2012':], pred, intv1, intv2, intv3)
ax.set_xlabel('')
ax.set_xticks([], minor=True)
plt.title('Inflation forecast (simple exponential smoothing)')
sns.despine()
plt.show()