QBUS2820 – Predictive Analytics
Tutorial 9 – Forecasting¶
In this tutorial we study the practical application of basic forecasting methods in Python. We will use the functions coded in the forecast.py file, which you need to download from Canvas. The statlearning.py module is also needed for this tutorial and is available on Canvas.
Data: Australian CPI inflation
Exploratory data analysis
Random Walk
Model diagnostics
Model validation
This notebook relies on the following imports and settings.
# Packages
import warnings
warnings.filterwarnings("ignore")
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

# Plot settings
sns.set_context('notebook')
sns.set_style('ticks')
red='#D62728'
blue='#1F77B4'
%matplotlib inline
Data: Australian CPI inflation¶
Our data is the quarterly change in the Consumer Price Index (CPI) calculated by the Australian Bureau of Statistics. The original dataset is on the ABS website, where you can also find the explanatory notes. We use the index for all expenditure groups, which according to the ABS documentation already contains seasonal adjustments for components that are subject to calendar effects.
We start by loading the data and converting the index to quarterly periods (note that we have to specify this frequency when converting the index). We focus on the data since 1980, which gives a total of 168 observations.
data=pd.read_csv('inflation.csv', index_col='Date', parse_dates=True, dayfirst=True)
data.index=data.index.to_period(freq='Q') # converting the index to quarterly periods instead of dates
data=data['01-1980':] # keeping the data from January 1980 onwards
data.tail()
2020Q4 0.9
2021Q1 0.6
2021Q2 0.8
2021Q3 0.8
2021Q4 1.3
For univariate time series modelling, it is better to work with a pandas series rather than dataframe.
y=data['Inflation']
Exploratory data analysis¶
The first step in our analysis is a time series plot. We can see that both the level (i.e. the mean) and the volatility (i.e. the variance) of inflation are much lower in recent times than they were in the 80s. There is a noticeable outlier in the third quarter of 2000 due to the introduction of the GST in Australia.
fig, ax = plt.subplots(figsize=(8,5))
y.plot(color=red)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True) # I prefer to remove the minor ticks for a cleaner plot
sns.despine()
plt.show()
y.describe().round(2)
count 168.00
mean 0.95
std 0.88
min -1.90
25% 0.40
50% 0.70
75% 1.40
max 4.10
Name: Inflation, dtype: float64
Random walk¶
In this section we use the random walk method to illustrate the process that we will follow to generate forecasts.
1. Specify the forecast horizon.
2. Create a range of dates or periods starting from the time index following the last observation in the data.
3. Generate the forecasts and store them in a series indexed by the range created in step 2.
Below, we generate point forecasts for one to four quarters after the end of the series.
h = 4 # forecast horizon
test = pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')
pred = pd.Series(np.repeat(y.iloc[-1], h), index=test) # the forecast repeats the last observed value h times
pred
2022Q1 1.3
2022Q2 1.3
2022Q3 1.3
2022Q4 1.3
Freq: Q-DEC, dtype: float64
To compute interval forecasts, we first estimate the standard deviation of the errors.
resid=y-y.shift(1) # the shift lags the series by one period
sigma = resid.std()
round(sigma,3)
Using the formulas from the lecture, the interval forecasts are as below.
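For reference, the interval computed in the next cell is the usual Gaussian prediction interval for a random walk,

$$\hat{y}_{T+h} \pm z_{0.975}\,\hat{\sigma}\sqrt{h}, \qquad h = 1, \dots, 4,$$

where $\hat{\sigma}$ is the estimated standard deviation of the one-step errors computed above.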
intv = pd.concat([pred-stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1)),
pred+stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1))], axis=1)
intv.round(3)
2022Q1 -0.317 2.917
2022Q2 -0.987 3.587
2022Q3 -1.501 4.101
2022Q4 -1.934 4.534
Simple exponential smoothing¶
The exponential smoothing functions are in the forecast module from Canvas; please also explore the other forecasting models it contains.
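To make the method concrete before calling the module, here is a minimal sketch of simple exponential smoothing with a fixed smoothing parameter. The function name ses_sketch is hypothetical and this is not the forecast.py implementation, which additionally estimates alpha and chooses the initial level for you.

# Minimal sketch of simple exponential smoothing with a fixed alpha (illustration only):
# level recursion l_t = alpha*y_t + (1-alpha)*l_{t-1}, flat forecasts equal to the last level
def ses_sketch(y, alpha):
    level = y.iloc[0]                               # initialise the level at the first observation
    fitted = []
    for obs in y:
        fitted.append(level)                        # one-step-ahead forecast of this observation
        level = alpha*obs + (1 - alpha)*level       # update the level after observing it
    return pd.Series(fitted, index=y.index), level  # fitted values and final level (= point forecast)

For example, ses_sketch(y, 0.25)[1] returns the flat point forecast implied by alpha = 0.25, which should be broadly comparable to the fitted level from the module below.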
import forecast # you need to download the forecast.py file from Canvas
ses=forecast.ses(y)
ses.fit() # estimate the smoothing parameter (as in the model validation section below)
fitted=pd.Series(ses.smooth(), index=y.index)
ses.summary()
Simple exponential smoothing
Smoothing parameter:
alpha 0.250 (0.067)
In-sample fit:
MSE 0.439
Log-likelihood -169.214
AIC 344.427
BIC 353.799
fig, ax= plt.subplots(figsize=(8,5))
y.plot(color=red, label='Inflation')
fitted.plot(color='black', label='Simple exponential smoothing fit', alpha=0.8)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True)
plt.legend(loc='best')
sns.despine()
plt.show()
Model diagnostics¶
We now conduct residual diagnostics for the exponential smoothing. The key diagnostics for univariate time series are:
Residual plot.
Residual ACF.
Residual distribution plots.
We compute the residuals as follows.
resid=y-fitted
The following cells compute the diagnostics. We find that the residuals are uncorrelated, have non-constant variance (due to the higher volatility in the 80s), and are non-Gaussian.
fig, ax= plt.subplots(figsize=(8,5))
resid.plot(color=blue)
ax.set_xlabel('')
ax.set_xticks([], minor=True)
ax.set_title('Residual plot')
sns.despine()
plt.show()
fig, ax = plt.subplots(figsize=(8,5))
sm.graphics.tsa.plot_acf(resid, lags=40, ax=ax)
sns.despine()
plt.show()
We can see very low autocorrelation.
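To back this up with a formal check (this cell is my addition, not part of the original tutorial), we can run a Ljung-Box test on the residuals with statsmodels; the output format varies across statsmodels versions, but large p-values are consistent with uncorrelated residuals.

# Ljung-Box test of residual autocorrelation at a few lag choices
from statsmodels.stats.diagnostic import acorr_ljungbox
acorr_ljungbox(resid, lags=[4, 8, 12])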
def hist(series):
    fig, ax = plt.subplots(figsize=(8,5))
    sns.distplot(series, ax=ax, hist_kws={'alpha': 0.8, 'edgecolor':'black', 'color': blue},
                 kde_kws={'color': 'black', 'alpha': 0.7})
    sns.despine()
    return fig, ax
hist(resid)
plt.show()
#y=y['1991':]
ses=forecast.ses(y)
ses.summary()
Simple exponential smoothing
Smoothing parameter:
alpha 0.250 (0.067)
In-sample fit:
MSE 0.439
Log-likelihood -169.214
AIC 344.427
BIC 353.799
Model validation¶
We implement a real time forecasting exercise to compare the random walk and simple exponential smoothing methods.
# Real time forecasting - use it as a template
validation = y['2004Q1':].index # the validation period is Q1 2004 onwards
start = y.index.get_loc('2004Q1') # numerical index corresponding to Q1 2004
pred1 = []
pred2 = []
actual= []
for i in range(start, len(y)):
    actual.append(y.iloc[i])           # actual value
    pred1.append(y.iloc[i-1])          # random walk forecast
    model = forecast.ses(y.iloc[:i])
    model.fit()
    pred2.append(model.forecast(1)[0]) # SES forecast
columns = ['RW', 'SES', 'Actual']
results = np.vstack([pred1,pred2,actual]).T
results = pd.DataFrame(results, columns=columns, index=validation)
We find that simple exponential smoothing generates more accurate forecasts.
You need to download the statlearning module from Canvas.¶
from statlearning import rmse_jack
table = pd.DataFrame(0.0, index=results.columns[:-1], columns=['RMSE','SE'])
for i in range(2):
    table.iloc[i,0], table.iloc[i,1] = rmse_jack(results.iloc[:,i], results.iloc[:,-1])
table.round(3)
      RMSE     SE
RW   0.694  0.144
SES  0.564  0.084
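As a quick sanity check (my addition, not part of the original notebook), the point estimates in the RMSE column can be reproduced directly; rmse_jack presumably adds the jackknife standard error reported in the SE column.

# Recompute the plain RMSE of each method by hand
rmse_rw = np.sqrt(((results['RW'] - results['Actual'])**2).mean())
rmse_ses = np.sqrt(((results['SES'] - results['Actual'])**2).mean())
print(round(rmse_rw, 3), round(rmse_ses, 3))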
We use a fan chart to report our final forecast. For now, the prediction interval is based on the normal distribution, even though we saw that this is not a good assumption for this data.
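For reference, the textbook Gaussian interval for simple exponential smoothing at horizon $h$ is

$$\hat{y}_{T+h} \pm z\,\hat{\sigma}\sqrt{1+(h-1)\hat{\alpha}^2},$$

where $z$ is the normal quantile for the chosen coverage level and $\hat{\alpha}$ is the estimated smoothing parameter; the intervalforecast method may implement the calculation slightly differently.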
model = ses
test=pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')
pred=pd.Series(model.forecast(h), index=test)
intv1=pd.DataFrame(model.intervalforecast(h, level=.8), index=test)
intv2=pd.DataFrame(model.intervalforecast(h, level=.9), index=test)
intv3=pd.DataFrame(model.intervalforecast(h, level=.99), index=test)
fig, ax = forecast.fanchart(y['01-2012':], pred, intv1, intv2, intv3)
ax.set_xlabel('')
ax.set_xticks([], minor=True)
plt.title('Inflation forecast (simple exponential smoothing)')
sns.despine()
plt.show()