ECON7350: Applied Econometrics for Macroeconomics and Finance
Tutorial 6: Cointegration – I
At the end of this tutorial you should be able to:
• Automate the task of unit root testing in multiple time-series samples in R;
• Implement the Engle-Granger cointegration test in R;
• Interpret the outcome of an Engle-Granger test;
• Use the outcome of the Engle-Granger test to infer possible cointegrating relations.
In this tutorial you will test for cointegration using the Engle-Granger method. The data consists of four Australian interest rates: the 5 year (i5y) and 3 year (i3y) Treasury Bond (i.e., Capital Market) rates, along with the 180 day (i180d) and 90 day (i90d) Bank Accepted Bill (i.e., Money Market) rates. The data are annualized monthly rates for the period June 1992 to August 2010 (T = 219), and are saved in term_structure.csv.
1. Analyse the integration properties of each individual process: {i3yt}, {i5yt}, {i90dt} and {i180dt}. Based on the data, what inference can we draw about each of these processes resembling a unit root process?
Solution. For this tutorial, we load the following useful packages.
library(forecast)
library(dplyr)
library(zoo)
library(aTSA)
It is also useful to create some functions to help automate the task of constructing adequate sets of ADF specifications. The following two functions estimate the coefficients and record AIC/BIC values for a range of ADF regressions, specified by the number of lags combined with the inclusion or exclusion of a constant and/or a trend. One function performs the estimation in levels, while the other does the same in differences.
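For reference, every specification considered by these functions is an ADF regression of the general form (generic notation, not specific to the course notes):

$$\Delta y_t = a_0 + a_2\, t + \gamma\, y_{t-1} + \sum_{i=1}^{p} b_i\, \Delta y_{t-i} + \varepsilon_t,$$

where the constant a0 and the trend a2 t may each be included or excluded, and p is the number of lagged differences. The functions vary these three ingredients across specifications and record the AIC and BIC of each fitted model.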
# create a function to estimate a range of ADF regression
# specifications in levels along with the AICs and BICs
ADF_estimate_lev <- function(y, p_max = 9)
{
  TT <- length(y)
  ADF_est <- list()
  ic <- matrix(nrow = 3 * (1 + p_max), ncol = 5)
  colnames(ic) <- c("const", "trend", "p", "aic", "bic")
  i <- 0
  for (const in 0:1)
  {
    for (p in 0:p_max)
    {
      i <- i + 1
      ADF_est[[i]] <- Arima(diff(y), xreg = y[-TT],
                            order = c(p, 0, 0),
                            include.mean = as.logical(const),
                            include.drift = F)
      ic[i,] <- c(const, 0, p, ADF_est[[i]]$aic,
                               ADF_est[[i]]$bic)
    }

    if (const) {
      # only add a specification with trend if there is a
      # constant (i.e., exclude no constant with trend)
      for (p in 0:p_max)
      {
        i <- i + 1
        ADF_est[[i]] <- Arima(diff(y), xreg = y[-TT],
                              order = c(p, 0, 0),
                              include.mean = as.logical(const),
                              include.drift = T)
        ic[i,] <- c(const, 1, p, ADF_est[[i]]$aic,
                                 ADF_est[[i]]$bic)
      }
    }
  }

  ic_aic <- ic[order(ic[,4]),][1:10,]
  ic_bic <- ic[order(ic[,5]),][1:10,]

  return(list(ADF_est = ADF_est, ic = ic,
              ic_aic = ic_aic, ic_bic = ic_bic))
}
# create a function to estimate a range of ADF regression
# specifications in differences along with the AICs and BICs
ADF_estimate_diff <- function(y, p_max = 9)
{
  TT <- length(diff(y))
  ADF_est_diff <- list()
  ic_diff <- matrix(nrow = 3 * (1 + p_max), ncol = 5)
  colnames(ic_diff) <- c("const", "trend", "p", "aic", "bic")
  i <- 0
  for (const in 0:1)
  {
    for (p in 0:p_max)
    {
      i <- i + 1
      ADF_est_diff[[i]] <- Arima(diff(diff(y)),
                                 xreg = diff(y)[-TT],
                                 order = c(p, 0, 0),
                                 include.mean = as.logical(const),
                                 include.drift = F)
      ic_diff[i,] <- c(const, 0, p, ADF_est_diff[[i]]$aic,
                                    ADF_est_diff[[i]]$bic)
    }

    if (const) {
      # only add a specification with trend if there is a
      # constant (i.e., exclude no constant with trend)
      for (p in 0:p_max)
      {
        i <- i + 1
        ADF_est_diff[[i]] <- Arima(diff(diff(y)),
                                   xreg = diff(y)[-TT],
                                   order = c(p, 0, 0),
                                   include.mean = as.logical(const),
                                   include.drift = T)
        ic_diff[i,] <- c(const, 1, p, ADF_est_diff[[i]]$aic,
                                      ADF_est_diff[[i]]$bic)
      }
    }
  }

  ic_aic_diff <- ic_diff[order(ic_diff[,4]),][1:10,]
  ic_bic_diff <- ic_diff[order(ic_diff[,5]),][1:10,]

  return(list(ADF_est_diff = ADF_est_diff,
              ic_diff = ic_diff,
              ic_aic_diff = ic_aic_diff,
              ic_bic_diff = ic_bic_diff))
}
Next, load the data and extract the four variables.
mydata <- read.delim("term_structure.csv", header = TRUE,
sep = ",")
dates <- as.yearqtr(mydata$obs)
i3y <- mydata$I3Y
i5y <- mydata$I5Y
i90d <- mydata$I90D
i180d <- mydata$I180D
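Although not required for the question, it can help to plot the four rates together before any formal testing. The following is a minimal sketch using base R graphics; the colours and labels are arbitrary choices.

# plot the four series on a common scale for a quick visual comparison
plot(i3y, type = "l", xlab = "observation", ylab = "annualized rate (%)",
     ylim = range(c(i3y, i5y, i90d, i180d)))
lines(i5y, col = "red")
lines(i90d, col = "blue")
lines(i180d, col = "darkgreen")
legend("topright", c("i3y", "i5y", "i90d", "i180d"),
       col = c("black", "red", "blue", "darkgreen"), lty = 1, bty = "n")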
Now, consider the proximity of {i3yt} to a unit root process. We begin by constructing an adequate set of ADF regressions in the level of {i3yt}.
i3y_ADF_lev <- ADF_estimate_lev(i3y, p_max = 15)
print(i3y_ADF_lev$ic_aic)
[Output: top 10 ADF specifications for i3y in levels, ranked by AIC]
print(i3y_ADF_lev$ic_bic)
[Output: top 10 ADF specifications for i3y in levels, ranked by BIC]
The AIC and BIC rankings do not have any specifications in common, so we select from both top 10 rankings in a way that reflects some agreement. This is obviously very subjective! The justification we use is as follows: from the AIC list, take the most preferred specification along with a few others that have the lowest BIC values; then, do the same using the BIC list.
As a result, we obtain the following set of specifications on which we run our residuals analysis.
# combine selected rows from the AIC and BIC top-10 lists into one
# adequate set, sorted by specification
i3y_adq_set <- as.matrix(arrange(as.data.frame(
  rbind(i3y_ADF_lev$ic_aic[c(1, 6, 9),],
        i3y_ADF_lev$ic_bic[c(1, 3, 5:7),])),
  const, trend, p))
# locate each selected specification in the full list of estimates
i3y_adq_idx <- match(data.frame(t(i3y_adq_set[, 1:3])),
                     data.frame(t(i3y_ADF_lev$ic[, 1:3])))
for (i in 1:length(i3y_adq_idx)) {
  checkresiduals(i3y_ADF_lev$ADF_est[[i3y_adq_idx[i]]])
}
[Figure: checkresiduals output for the regression with ARIMA(1,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 20.038, df = 8, p-value = 0.01019
## Model df: 2. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(2,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 18.425, df = 6, p-value = 0.005253
## Model df: 4. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(3,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 16.161, df = 5, p-value = 0.0064
## Model df: 5. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(10,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(10,0,0) errors
## Q* = 7.0612, df = 3, p-value = 0.06997
## Model df: 12. Total lags used: 15
[Figure: checkresiduals output for the regression with ARIMA(2,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 19.404, df = 5, p-value = 0.001616
## Model df: 5. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(3,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 15.443, df = 4, p-value = 0.003866
## Model df: 6. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(7,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(7,0,0) errors
## Q* = 8.9851, df = 3, p-value = 0.02949
## Model df: 10. Total lags used: 13
[Figure: checkresiduals output for the regression with ARIMA(10,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(10,0,0) errors
## Q* = 6.5159, df = 3, p-value = 0.08904
## Model df: 13. Total lags used: 16
We reject white noise residuals at the 5% significance level for all models with p < 10. Hence, we remove all models from the adequate set except the two with p = 10, both of which contain a constant, with one also containing a trend.
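If we want the stored adequate set to reflect this pruning before moving on, one way to do it (a sketch, assuming i3y_adq_set retains the column names assigned by the helper function) is to drop the rejected rows and recompute the index:

# keep only the two specifications with p = 10 and re-match their positions
i3y_adq_set <- i3y_adq_set[i3y_adq_set[, "p"] == 10, , drop = FALSE]
i3y_adq_idx <- match(data.frame(t(i3y_adq_set[, 1:3])),
                     data.frame(t(i3y_ADF_lev$ic[, 1:3])))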
Given our adequate set of ADF regressions, we should run the ADF test with nlag = 11, but we will use nlag = 15 just to check how sensitive the results are to including more lags (which the AIC prefers, but the BIC rejects).
adf.test(i3y, nlag = 15)
## Augmented Dickey-Fuller Test
## alternative: stationary
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.896 0.358
## [2,] 1 -0.697 0.429
## [3,] 2 -1.220 0.242
## [4,] 3 -1.161 0.264
## [5,] 4 -1.103 0.284
## [6,] 5 -1.022 0.313
## [7,] 6 -0.915 0.352
## [8,] 7 -0.950 0.339
## [9,] 8 -0.761 0.406
## [10,] 9 -0.773 0.402
## [11,] 10 -0.738 0.415
## [12,] 11 -0.807 0.390
## [13,] 12 -0.709 0.425
## [14,] 13 -0.603 0.463
## [15,] 14 -0.645 0.448
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.74 0.4298
## [2,] 1 -2.05 0.3087
## [3,] 2 -2.80 0.0634
## [4,] 3 -3.10 0.0295
## [5,] 4 -2.85 0.0553
## [6,] 5 -2.38 0.1779
## [7,] 6 -2.22 0.2424
## [8,] 7 -2.65 0.0893
## [9,] 8 -2.32 0.2036
## [10,] 9 -2.54 0.1154
## [11,] 10 -2.09 0.2924
## [12,] 11 -1.97 0.3400
## [13,] 12 -1.60 0.4828
## [14,] 13 -1.55 0.5018
## [15,] 14 -1.37 0.5683
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -2.27 0.4618
## [2,] 1 -2.90 0.1999
## [3,] 2 -3.29 0.0728
## [4,] 3 -3.79 0.0203
## [5,] 4 -3.51 0.0422
## [6,] 5 -2.97 0.1689
## [7,] 6 -2.90 0.1985
## [8,] 7 -3.45 0.0475
## [9,] 8 -3.28 0.0758
## [10,] 9 -3.63 0.0310
## [11,] 10 -3.05 0.1333
## [12,] 11 -2.83 0.2293
## [13,] 12 -2.52 0.3550
## [14,] 13 -2.65 0.3046
## [15,] 14 -2.39 0.4093
## Note: in fact, p.value = 0.01 means p.value <= 0.01
For specifications with a constant and a trend and p ≥ 10, the null cannot be rejected at the 5% significance level. The same conclusion holds for specifications with a constant, no trend and p ≥ 10; in particular, the null is not rejected for either specification in our adequate set. The overall evidence suggests {i3yt} is not empirically distinguishable from a unit root process.
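If the relevant p-values are needed programmatically rather than read off the printed tables, adf.test in the aTSA package appears to return its three tables invisibly as a list; the sketch below assumes components named type1, type2 and type3 (check str() of the returned object if your version differs).

# store the ADF output instead of only printing it (assumed return structure)
i3y_adf <- adf.test(i3y, nlag = 15, output = FALSE)
# row 11 corresponds to p = 10 lagged differences in the tables above
i3y_adf$type2[11, ]   # constant, no trend
i3y_adf$type3[11, ]   # constant and trend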
Accordingly, we repeat the exercise for the differenced process {∆i3yt}.
i3y_ADF_diff <- ADF_estimate_diff(i3y, p_max = 15)
print(i3y_ADF_diff$ic_aic_diff)
[Output: top 10 ADF specifications for ∆i3y, ranked by AIC]
print(i3y_ADF_diff$ic_bic_diff)
[Output: top 10 ADF specifications for ∆i3y, ranked by BIC]
i3y_adq_set_diff <- as.matrix(arrange(as.data.frame(
i3y_ADF_diff$ic_bic_diff[c(1, 3, 5),]),
const, trend, p))
i3y_adq_idx_diff <- match(data.frame(
t(i3y_adq_set_diff[, 1:3])),
data.frame(
t(i3y_ADF_diff$ic_diff[, 1:3])))
for (i in 1:length(i3y_adq_idx_diff)) {
  checkresiduals(
    i3y_ADF_diff$ADF_est_diff[[i3y_adq_idx_diff[i]]])
}
[Figure: checkresiduals output for the regression with ARIMA(4,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 7.1851, df = 5, p-value = 0.2072
## Model df: 5. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(5,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(5,0,0) errors
## Q* = 6.5018, df = 4, p-value = 0.1647
## Model df: 6. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(4,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(4,0,0) errors
## Q* = 7.1557, df = 4, p-value = 0.1279
## Model df: 6. Total lags used: 10
adf.test(diff(i3y))
## Augmented Dickey-Fuller Test
## alternative: stationary
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -12.23 0.01
## [2,] 1 -8.31 0.01
## [3,] 2 -6.28 0.01
## [4,] 3 -6.30 0.01
## [5,] 4 -6.85 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -12.21 0.01
## [2,] 1 -8.32 0.01
## [3,] 2 -6.29 0.01
## [4,] 3 -6.31 0.01
## [5,] 4 -6.86 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -12.19 0.01
## [2,] 1 -8.30 0.01
## [3,] 2 -6.27 0.01
## [4,] 3 -6.30 0.01
## [5,] 4 -6.84 0.01
## Note: in fact, p.value = 0.01 means p.value <= 0.01
The null is rejected for all specifications, so {∆i3yt} is empirically distinguishable from an I(1) (unit root) process. Combined with the results for the levels, this means {i3yt} is not empirically distinguishable from an I(1) process.
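The inference rests on the usual definition of the order of integration:

$$y_t \sim I(1) \iff \Delta y_t \sim I(0) \;\text{ and }\; y_t \not\sim I(0),$$

so failing to reject a unit root in the level while rejecting one in the first difference is exactly the pattern we expect from an I(1) process.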
Repeating for {i5yt} and {∆i5yt}, we obtain the following.
i5y_ADF_lev <- ADF_estimate_lev(i5y, p_max = 15)
print(i5y_ADF_lev$ic_aic)
[Output: top 10 ADF specifications for i5y in levels, ranked by AIC]
print(i5y_ADF_lev$ic_bic)
[Output: top 10 ADF specifications for i5y in levels, ranked by BIC]
i5y_adq_set <- as.matrix(arrange(as.data.frame(
rbind(i5y_ADF_lev$ic_aic[1,],
i5y_ADF_lev$ic_bic[c(1, 7:8),])),
const, trend, p))
i5y_adq_idx <- match(data.frame(t(i5y_adq_set[, 1:3])),
data.frame(t(i5y_ADF_lev$ic[, 1:3])))
for (i in 1:length(i5y_adq_idx)) {
  checkresiduals(i5y_ADF_lev$ADF_est[[i5y_adq_idx[i]]])
}
[Figure: checkresiduals output for the regression with ARIMA(1,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 14.115, df = 8, p-value = 0.07883
## Model df: 2. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(2,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 14.329, df = 5, p-value = 0.01365
## Model df: 5. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(3,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 11.163, df = 4, p-value = 0.02479
## Model df: 6. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(10,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(10,0,0) errors
## Q* = 5.6557, df = 3, p-value = 0.1296
## Model df: 13. Total lags used: 16
We also examine a second selection of specifications from the AIC and BIC rankings.
i5y_adq_set <- as.matrix(arrange(as.data.frame(
rbind(i5y_ADF_lev$ic_aic[c(1, 6:8),],
i5y_ADF_lev$ic_bic[c(1, 3, 6, 10),])),
const, trend, p))
i5y_adq_idx <- match(data.frame(t(i5y_adq_set[, 1:3])),
data.frame(t(i5y_ADF_lev$ic[, 1:3])))
for (i in 1:length(i5y_adq_idx)) {
  checkresiduals(i5y_ADF_lev$ADF_est[[i5y_adq_idx[i]]])
}
[Figure: checkresiduals output for the regression with ARIMA(0,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(0,0,0) errors
## Q* = 25.775, df = 9, p-value = 0.002223
## Model df: 1. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(1,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 14.115, df = 8, p-value = 0.07883
## Model df: 2. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(2,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(2,0,0) errors
## Q* = 13.283, df = 7, p-value = 0.0655
## Model df: 3. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(3,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(3,0,0) errors
## Q* = 11.812, df = 6, p-value = 0.0663
## Model df: 4. Total lags used: 10
[Figure: checkresiduals output for the regression with ARIMA(9,0,0) errors]
## Ljung-Box test
## data: Residuals from Regression with ARIMA(9,0,0) errors
## Q* = 8.2944, df = 3, p-value = 0.0403
## Model df: 12. Total lags used: 15