Homework 3: Time Series Analysis
Part A
Import data, extract end-of-quarter data, and plot level, difference, and log difference.
In [1]: import pandas as pd
import numpy as np
# read raw PPI data from .xlsx into a DataFrame
ppi_raw = pd.read_excel('ppi.xlsx')
ppi_raw.head()

Out[1]:
   Unnamed: 0   Series Id  Year Period  Value
0         NaN  WPSSOP3000  1947    Apr   26.0
1         NaN  WPSSOP3000  1947    May   26.1
2         NaN  WPSSOP3000  1947    Jun   26.2
3         NaN  WPSSOP3000  1947    Jul   26.2
4         NaN  WPSSOP3000  1947    Aug   26.3

In [2]: # format data and extract end-of-quarter data
ppi_format = ppi_raw.loc[ppi_raw['Period'].str.match('Mar|Jun|Sep|Dec'),
                         ['Year', 'Period', 'Value']]
# rename columns
ppi_format = ppi_format.rename(columns={'Period': 'Month', 'Value': 'PPI'})
ppi_format.head()

Out[2]:
    Year Month   PPI
2   1947   Jun  26.2
5   1947   Sep  26.7
8   1947   Dec  27.7
11  1948   Mar  28.0
14  1948   Jun  28.6
In [3]: # add columns of delta PPI and delta ln PPI
ppi_format['dPPI'] = ppi_format['PPI'].diff()
ppi_format['dlnPPI'] = np.log(ppi_format['PPI']).diff()
# remove the first row as it does not have valid values for delta PPI and delta ln PPI
ppi_format = ppi_format.iloc[1:, :]
ppi_format.head()

Out[3]:
    Year Month   PPI  dPPI    dlnPPI
5   1947   Sep  26.7   0.5  0.018904
8   1947   Dec  27.7   1.0  0.036769
11  1948   Mar  28.0   0.3  0.010772
14  1948   Jun  28.6   0.6  0.021202
17  1948   Sep  28.8   0.2  0.006969
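As an optional aside (not part of the original solution), the quarterly rows could be given a DatetimeIndex so the plots below show dates on the x-axis instead of integer row positions; a minimal sketch, where ppi_plot is a hypothetical copy used only for plotting:

dates = pd.to_datetime(ppi_format['Year'].astype(str) + ppi_format['Month'],
                       format='%Y%b')       # e.g. '1947Sep' -> 1947-09-01
ppi_plot = ppi_format.set_index(dates)      # hypothetical plotting copy

Plotting ppi_plot['PPI'] instead of ppi_format['PPI'] would then label the axis with dates.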
In [16]: import matplotlib.pyplot as plt
# plot PPI, delta PPI, and delta ln PPI
f, ax = plt.subplots(nrows=3, ncols=1, figsize=(12, 12))
ax[0].plot(ppi_format['PPI'])
ax[0].set_title('PPI')
ax[1].plot(ppi_format['dPPI'])
ax[1].set_title('delta PPI')
ax[2].plot(ppi_format['dlnPPI'])
ax[2].set_title('delta ln PPI')
plt.show()
Part B
PPI is obviously not stationary, as its mean increases over time. ΔPPI is not stationary either, as its variance is much smaller in earlier dates than in later ones. Δln(PPI) is more likely to be stationary, as both its mean and covariance appear constant over time. We select Δln(PPI) for the following analysis.
A formal test is the ADF (Augmented Dickey–Fuller) test.
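For reference (this equation is not in the original write-up), the ADF test estimates the regression

Δy_t = α + γ·y_{t−1} + δ_1·Δy_{t−1} + … + δ_p·Δy_{t−p} + ε_t

and tests H0: γ = 0 (unit root, non-stationary) against H1: γ < 0 (stationary). statsmodels' adfuller selects the lag order p automatically (by AIC, by default).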
In [5]: from statsmodels.tsa.stattools import adfuller
# ADF test for delta PPI
result = adfuller(ppi_format['dPPI'])
# print p-value
print(result[1])
# ADF test for delta ln PPI
dlnPPI = ppi_format['dlnPPI'].tolist()
result = adfuller(dlnPPI)
# print p-value
print(result[1])

0.12706509779572472
4.528905579802456e-05

ΔPPI has a p-value > 0.05, which means the null hypothesis (the series contains a unit root and is non-stationary) cannot be rejected at the 95% confidence level, so ΔPPI is non-stationary. Δln(PPI) has a p-value < 0.05, so it is more likely to be stationary at the 95% confidence level.

Part C
Plot ACF.
In [6]: from statsmodels.graphics.tsaplots import plot_acf
# ACF plot for 12 lags; pass ax so the figsize applies to the ACF figure
f, ax = plt.subplots(figsize=(10, 8))
plot_acf(dlnPPI, lags=12, ax=ax)
plt.show()
The ACF decreases quickly, so Δln(PPI) is most likely stationary. The MA effect is most likely within a lag of 3.
Part D
Plot PACF.
In [7]: from statsmodels.graphics.tsaplots import plot_pacf
# PACF plot for 12 lags; pass ax so the figsize applies to the PACF figure
f, ax = plt.subplots(figsize=(10, 8))
plot_pacf(dlnPPI, lags=12, ax=ax)
plt.show()
The PACF also decreases quickly; Δln(PPI) most likely has an AR effect within a lag of 2.
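To read the significant lags off numerically rather than from the plots, one can request confidence intervals directly from statsmodels (a minimal sketch, not part of the original solution; a lag is significant when zero lies outside its interval):

from statsmodels.tsa.stattools import acf, pacf

# ACF/PACF estimates with 95% confidence intervals for 12 lags
acf_vals, acf_ci = acf(dlnPPI, nlags=12, alpha=0.05)
pacf_vals, pacf_ci = pacf(dlnPPI, nlags=12, alpha=0.05)
# lag k is significant when 0 lies outside [lower, upper] for that lag
sig_acf = [k for k in range(1, 13) if not (acf_ci[k, 0] <= 0 <= acf_ci[k, 1])]
sig_pacf = [k for k in range(1, 13) if not (pacf_ci[k, 0] <= 0 <= pacf_ci[k, 1])]
print('significant ACF lags:', sig_acf)
print('significant PACF lags:', sig_pacf)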
Part E
We first fit the simplest ARMA model, ARMA(1, 1).
In [8]: from statsmodels.tsa.arima_model import ARMA
from statsmodels.stats.diagnostic import acorr_ljungbox
import seaborn as sns
def arma_analysis(data, arma_order):
    # function for fitting an ARMA model and analyzing its residuals
    # fit ARMA model and print summary
    model_fit = ARMA(data, order=arma_order).fit(disp=0)
    print('Model summary')
    print(model_fit.summary())
    # check stationarity; the ARMA() function automatically checks stationarity
    # and raises an error if non-stationary
    print('\nCheck stationarity (arroots must all be bigger than 1; the definition of roots is the inverse of that in Lecture)')
    print(model_fit.arroots)
    # residual ACF for 12 lags
    resid = model_fit.resid
    print('\nResidual ACF')
    f, ax = plt.subplots(figsize=(10, 8))
    plot_acf(resid, lags=12, ax=ax)
    plt.show()
    # Ljung-Box test of residuals (p-values)
    lbtest = acorr_ljungbox(resid, lags=12)
    print('\nLjung-Box test of residual (p-value)')
    print(lbtest[1])
    # residual scatter plot and distribution
    print('\nResidual scatter plot and distribution')
    f, ax = plt.subplots(nrows=1, ncols=2, figsize=(12, 6))
    ax[0].plot(resid)
    ax[0].set_title('Residual')
    sns.distplot(resid, hist=True, kde=True, ax=ax[1])
    ax[1].set_title('Residual Distribution')
    plt.show()
    # AIC
    print('\nAIC')
    print(model_fit.aic)
In [9]: # ARMA(1, 1)
arma_analysis(dlnPPI, [1, 1])
Model summary
                              ARMA Model Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  246
Model:                     ARMA(1, 1)   Log Likelihood                 750.864
Method:                       css-mle   S.D. of innovations              0.011
Date:                Mon, 19 Aug 2019   AIC                          -1493.727
Time:                        20:53:46   BIC                          -1479.706
Sample:                             0   HQIC                         -1488.081
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0072      0.002      3.640      0.000       0.003       0.011
ar.L1.y        0.7945      0.072     11.084      0.000       0.654       0.935
ma.L1.y       -0.4324      0.106     -4.077      0.000      -0.640      -0.225
                                    Roots
==============================================================================
                  Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            1.2586           +0.0000j            1.2586            0.0000
MA.1            2.3127           +0.0000j            2.3127            0.0000
------------------------------------------------------------------------------

Check stationarity (arroots must all be bigger than 1; the definition of roots is the inverse of that in Lecture)
[1.25862716]
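To spell the check out (this explanation is added for clarity): for ARMA(1, 1), statsmodels reports the root of the AR polynomial 1 − φ₁z = 0, i.e. z = 1/φ₁ = 1/0.7945 ≈ 1.2586. Stationarity requires every such root to lie outside the unit circle (|z| > 1), which for an AR(1) term is equivalent to |φ₁| < 1.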
Residual ACF
Ljung-Box test of residual (p-value)
[0.72941318 0.67254639 0.79180719 0.5638434  0.68308625 0.7928066
 0.72916177 0.64707825 0.64659405 0.72062997 0.43314038 0.51328013]

Residual scatter plot and distribution

AIC
-1493.7270552295504

From the characteristic root, we can see that this ARMA(1, 1) process is stationary. The residuals are white noise, as the ACF is insignificant at all lags, and their distribution appears to be normal.
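For reference (not in the original write-up), the Ljung-Box statistic at lag m is

Q = n(n + 2) · Σ_{k=1}^{m} ρ̂_k² / (n − k),

asymptotically χ² with m degrees of freedom under the white-noise null. The large p-values above mean white noise cannot be rejected at any lag up to 12.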
We can also use the auto_arima function from the pmdarima package to perform stepwise auto-selection of p and q for the ARMA model.
In [10]: import pmdarima as pm
# automatically select the ARMA order; the pmdarima package must be installed first
model = pm.auto_arima(dlnPPI, start_p=1, d=0, start_q=1, D=0, seasonal=False,
                      stationary=True, stepwise=True,
                      suppress_warnings=True, error_action='ignore', trace=True)
# ARMA model summary
print(model.summary())
Fit ARIMA: order=(1, 0, 1); AIC=-1493.727, BIC=-1479.706, Fit time=0.091 seconds
Fit ARIMA: order=(0, 0, 0); AIC=-1434.379, BIC=-1427.368, Fit time=0.010 seconds
Fit ARIMA: order=(1, 0, 0); AIC=-1483.275, BIC=-1472.759, Fit time=0.029 seconds
Fit ARIMA: order=(0, 0, 1); AIC=-1464.857, BIC=-1454.341, Fit time=0.035 seconds
Fit ARIMA: order=(2, 0, 1); AIC=-1493.318, BIC=-1475.792, Fit time=0.157 seconds
Fit ARIMA: order=(1, 0, 2); AIC=-1493.808, BIC=-1476.281, Fit time=0.141 seconds
Fit ARIMA: order=(2, 0, 3); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 0, 2); AIC=-1481.563, BIC=-1467.542, Fit time=0.091 seconds
Fit ARIMA: order=(2, 0, 2); AIC=-1491.817, BIC=-1470.785, Fit time=0.282 seconds
Fit ARIMA: order=(1, 0, 3); AIC=-1491.831, BIC=-1470.799, Fit time=0.268 seconds
Total fit time: 1.138 seconds

                              ARMA Model Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  246
Model:                     ARMA(1, 2)   Log Likelihood                 751.904
Method:                       css-mle   S.D. of innovations              0.011
Date:                Mon, 19 Aug 2019   AIC                          -1493.808
Time:                        20:53:55   BIC                          -1476.281
Sample:                             0   HQIC                         -1486.750
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0072      0.002      3.887      0.000       0.004       0.011
ar.L1.y        0.6962      0.119      5.857      0.000       0.463       0.929
ma.L1.y       -0.3578      0.139     -2.571      0.011      -0.631      -0.085
ma.L2.y        0.1410      0.093      1.509      0.133      -0.042       0.324
                                    Roots
==============================================================================
                  Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            1.4363           +0.0000j            1.4363            0.0000
MA.1            1.2683           -2.3412j            2.6627           -0.1710
MA.2            1.2683           +2.3412j            2.6627            0.1710
------------------------------------------------------------------------------
ARMA(1, 2) appears to be the best-fit model, with the lowest AIC (-1493.808). The pmdarima package can conveniently diagnose the residuals via its plot_diagnostics function, as shown below.
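As a quick sanity check on the criterion (added for clarity), AIC = 2k − 2 ln L. Here k = 5 parameters are estimated (the constant, one AR coefficient, two MA coefficients, and the innovation variance), so AIC = 2·5 − 2·751.904 = −1493.808, matching the table; likewise the ARMA(1, 1) fit gives 2·4 − 2·750.864 ≈ −1493.727.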
In [11]: # residual diagnostics
model.plot_diagnostics(figsize=(12, 12))
plt.show()
Part F
The one-step-ahead, out-of-sample forecast is a very typical forecast for a time series model. Unlike with cross-sectional data, as time moves forward with no test data added, the exercise becomes a multi-step forecast with increasing estimation error and uncertainty, which provides no more useful information than the one-step-ahead forecast.
To do a one-step-ahead forecast, the model is still trained on the training set only. However, for each step forward, the test data observed before that step is added to the information used for the forecast.
This type of forecast can be easily coded in MATLAB and R. However, Python (at least the version of statsmodels used here) does not seem to expose this capability directly; it seems to offer only multi-step out-of-sample forecasts and in-sample prediction. To perform the one-step-ahead, out-of-sample forecast, we therefore use a "trick" in the code: refit the model on the data including the test observations, set the initial parameters to those of the trained model, set the maximum number of iterations very low, and set the tolerance very high. This minimizes the deviation of the refitted parameters from the initial ones and makes the refitted model essentially the training model, but with the test data incorporated.
An alternative approach is to do the one-step-ahead forecast manually, using the ARMA formula with the parameters from the trained model. However, that is quite cumbersome, especially when working with four ARMA models of different orders.
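For concreteness, here is a minimal sketch of that manual approach for the ARMA(1, 2) case (not part of the original solution). It assumes statsmodels' convention that the fitted parameters are ordered (const, ar.L1, ma.L1, ma.L2) and that the reported const is the process mean; pre-sample innovations are set to zero. Newer statsmodels releases also provide Results.append/extend for this purpose.

import numpy as np

def manual_one_step_arma12(y, mu, ar1, ma1, ma2, start):
    # one-step-ahead forecasts for an ARMA(1, 2) in mean-deviation form:
    #   (y_t - mu) = ar1*(y_{t-1} - mu) + e_t + ma1*e_{t-1} + ma2*e_{t-2}
    y = np.asarray(y, dtype=float)
    e = np.zeros(len(y))      # innovation estimates; pre-sample values are 0
    preds = []
    for t in range(1, len(y)):
        yhat = mu + ar1 * (y[t - 1] - mu) + ma1 * e[t - 1]
        if t >= 2:
            yhat += ma2 * e[t - 2]
        e[t] = y[t] - yhat    # realized innovation once y_t is observed
        if t >= start:
            preds.append(yhat)
    return np.array(preds)

# usage (hypothetical), with parameters from the training fit below:
# mu, ar1, ma1, ma2 = model_fit.params
# preds = manual_one_step_arma12(dlnPPI, mu, ar1, ma1, ma2, start=214)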
In [12]: # index of 2000.IV
np.where((ppi_format['Year'] == 2000) & (ppi_format['Month'] == 'Dec'))[0]

Out[12]: array([213], dtype=int64)
In [13]: import warnings
def arma_predict_one_step(data, arma_order, param, start):
    # function for an out-of-sample, one-step forecast: use the model parameters
    # from training and add one test value at each step.
    # MATLAB and R have direct ways of doing this analysis; Python lacks this
    # capability here, so we use an indirect way as a hack.
    # suppress warnings
    warnings.filterwarnings("ignore")
    # add new data but still use the old model parameters by setting maxiter
    # very low and tol very high; as iteration is effectively turned off, the
    # refitted parameters stay very close to the training parameters in param
    model = ARMA(data, order=arma_order).fit(start_params=param, transparams=False,
                                             maxiter=0, tol=1e10)
    # turn warnings back on
    warnings.filterwarnings("default")
    # out-of-sample one-step prediction via the in-sample mechanism (dynamic=False);
    # this works because the refitted parameters are essentially the training ones
    predict = model.predict(start=start, end=(len(data) - 1), dynamic=False)
    return predict
In [14]: # train the ARMA(1, 2) model on data through 2000.IV
model_fit = ARMA(dlnPPI[0:214], order=[1, 2]).fit(disp=0)
# out-of-sample, one-step forecast
predict = arma_predict_one_step(dlnPPI, [1, 2], model_fit.params, 214)
# calculate MSPE
err = predict - dlnPPI[214:]
mspe = np.average(err**2)
print(mspe)

0.00038411371796335993

In [15]: # MSPE assuming no predictability (the current value is the best forecast
# for the future value)
np.average((np.array(dlnPPI[213:-1]) - np.array(dlnPPI[214:]))**2)

Out[15]: 0.00043350562564963664

ARMA(1, 2) has a lower MSPE than the no-predictability benchmark, which means that the forecasts from ARMA(1, 2) match the actual values better than having no model.
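For reference (added for clarity), MSPE = (1/T)·Σ(ŷ_t − y_t)² over the T test observations, and the no-predictability benchmark is the random-walk forecast ŷ_t = y_{t−1}. Here 0.000384 < 0.000434, so the ARMA(1, 2) forecasts reduce MSPE by roughly 11% relative to the benchmark.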