1 ECMT2130 – 2021 semester 1 final exam solutions
1. (10 points) Briony’s ARMA model identification
Briony has a realisation of a stochastic process, xt, for t = 1, 2, 3, . . . , 1000. The estimated autocorrelation function and partial autocorrelation functions for that realisation are as shown below (the dotted blue lines represent the 95% confidence interval):
(a) (3 points) What order of ARMA model will provide the best approximation to the stochastic process that produced the sample of data?
(b) (2 points) Specify a reasonable point estimate for each of the AR and MA coefficients in the ARMA model you specified in part A. Base your answer on inspection of the ACF and PACF.
(c) (2 points) Is the ARMA model that you described in part B causal?
(d) (3 points) The sample standard deviation for the realisation of the stochastic process is 1.394. If you were to estimate the model by maximum-likelihood, assuming the shocks have a normal distribution,
and you wanted to choose initial parameter values to use in the numerical optimisation of the log-likelihood function, what would you use as an initial estimate of σ, the standard deviation of the white-noise shocks in the ARMA model? Use the point estimates for the ARMA model from part B in your answer.
(a) The ACF tapers off rather than abruptly cutting out, suggesting that the model has AR structure. The PACF cuts out abruptly at the second lag, indicating that the model has no MA structure and that the AR structure is of order 2. Thus the stochastic process is likely best described by an AR(2).
(b) Given that we are working with an AR(2), we could write the model as: xt = φ1xt−1 + φ2xt−2 + et
The coefficients to estimate are φ1 and φ2. Rough estimates for these can be based on the PACF. The first lag of the PACF is not significantly different from zero, so an estimate of φ1 equal to 0 would be reasonable. The second lag of the PACF is significant and close to 0.7, suggesting that a reasonable estimate of φ2 would be 0.7, given that φ1 is zero.
(c) The ACF tapers off rather than remaining high out to very large lags. This suggests that the stochastic process is causal. Also, the rough estimates of the AR coefficients suggest that the roots of the AR polynomial in the lag operator are outside the unit circle, consistent with the process being causal. (Either of those sentences would constitute a sufficient answer to get the marks.)
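The pattern described in parts (a)–(c) can be checked with a short simulation. The sketch below is purely illustrative (it is not part of the exam): it simulates an AR(2) with φ1 = 0 and φ2 = 0.7, inspects the sample ACF and PACF with statsmodels, and verifies causality from the roots of the AR polynomial.

```python
# Hypothetical illustration: simulate an AR(2) with phi1 = 0, phi2 = 0.7 and
# reproduce the ACF/PACF pattern and the causality check described above.
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(seed=0)
n, phi1, phi2, sigma = 1000, 0.0, 0.7, 1.0

x = np.zeros(n)
e = rng.normal(scale=sigma, size=n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + e[t]

print(acf(x, nlags=6))    # tapers off gradually (AR signature)
print(pacf(x, nlags=6))   # cuts off after lag 2 (AR(2) signature)

# Causality: roots of 1 - phi1*z - phi2*z^2 must lie outside the unit circle.
roots = np.roots([-phi2, -phi1, 1.0])   # coefficients of -phi2*z^2 - phi1*z + 1
print(np.all(np.abs(roots) > 1))        # True for phi1 = 0, phi2 = 0.7
```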
(d) If we start with the coefficient point estimates φ1 = 0 and φ2 = 0.7 from part B, then in the AR(2) model the variance of xt is related to the variance of the shocks by:
Var(xt) = 0.7²Var(xt) + Var(et)
Rearranging and writing σ² for Var(et), we have:
σ̂² = (1 − 0.7²) × 1.394² = 0.991, so σ̂ = √0.991 ≈ 1.0 after rounding to 1 decimal place.
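For reference, the same arithmetic as a minimal check (using the sample standard deviation 1.394 and the part B estimates):

```python
import numpy as np

phi2 = 0.7
sample_sd = 1.394

# Var(x) = phi2^2 * Var(x) + sigma^2  =>  sigma^2 = (1 - phi2^2) * Var(x)
sigma2_init = (1 - phi2**2) * sample_sd**2
sigma_init = np.sqrt(sigma2_init)
print(round(sigma2_init, 3), round(sigma_init, 1))   # 0.991 1.0
```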
2. (10 points) GARCH model
(a) (3 points) Explain why ARCH and GARCH models are able to explain fat tails in the distributions of high-frequency financial rate-of-return data, even if the shocks impacting the system have normal distributions.
(b) (2 points) Write out the variance equation for a GARCH(1,2) model.
(c) (3 points) What is the formula for the conditional variance forecast, 3 periods ahead, using the variance equation of the GARCH(1,2) model?
(d) (2 points) In an MA(1) model, a shock only influences the stochastic process for 2 periods; the period in which the shock occurs and the period afterward. Thereafter, the shock has no influence on the value of the stochastic process. This is not the case for an ARCH(1) model where the conditional variance is a function of a constant and the square of the first lag of the shock. Explain why an ARCH(1) will tend to exhibit multiple periods of higher volatility for shocks following a particularly large shock even though the variance equation only includes one lag of the squared shock.
(a) Fat tails will be in evidence if the excess kurtosis for a realisation of financial rate-of-returns on an asset or portfolio of assets is greater than 0. This excess kurtosis measure is based on an assumption that each observation in the sample is drawn from the same distribution. Excess kurtosis is measured relative to the kurtosis of a normal distribution with the same variance as the variance of the sample. If there is volatility clustering, as described by an ARCH or GARCH model, then the various observations in the sample will have different variances, some above the unconditional variance and some below.
When the conditional variance is above the unconditional variance, there will be a greater probability of extreme observations. With a large enough sample, this will lead to there being more observations in the tails of the distribution than would be expected from a normal distribution with the unconditional variance.
The relatively large number of tail observations will cause a finding of positive excess kurtosis, or fat tails, even though the actual shocks are normally distributed.
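This mechanism is easy to verify by simulation. The sketch below is an illustration with assumed GARCH(1,1) parameter values (ω = 0.1, α1 = 0.1, β1 = 0.85, not taken from the question): it generates returns with conditionally normal shocks and reports their excess kurtosis, which comes out positive.

```python
# Hypothetical illustration: GARCH(1,1) with normal shocks still produces fat tails.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(seed=1)
n = 100_000
omega, alpha1, beta1 = 0.1, 0.1, 0.85   # assumed parameter values

u = np.zeros(n)
sigma2 = np.full(n, omega / (1 - alpha1 - beta1))   # start at the unconditional variance
for t in range(1, n):
    sigma2[t] = omega + alpha1 * u[t - 1] ** 2 + beta1 * sigma2[t - 1]
    u[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# Excess kurtosis > 0 even though each shock is conditionally normal.
print(kurtosis(u, fisher=True))
```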
(b) The GARCH(1,2) conditional variance equation in its general form is:
σ²_t = ω + α1 u²_{t−1} + α2 u²_{t−2} + β1 σ²_{t−1}
(c) From model estimation in sample we have an estimate of the conditional variance in the last period of the sample, σ̂²_T, together with the in-sample squared residuals û²_T and û²_{T−1}. Using the law of iterated expectations, we can replace the squared residuals in the conditional variance equation with their expectations (the corresponding conditional variance forecasts) once they fall out of sample. Thus, the conditional variance forecasts 1, 2, and 3 steps ahead are given by:
σ̂²_{T+1} = ω + α1 û²_T + α2 û²_{T−1} + β1 σ̂²_T
σ̂²_{T+2} = ω + α1 σ̂²_{T+1} + α2 û²_T + β1 σ̂²_{T+1}
σ̂²_{T+3} = ω + α1 σ̂²_{T+2} + α2 σ̂²_{T+1} + β1 σ̂²_{T+2}
We could use the first two equations to substitute σ̂²_{T+1} and σ̂²_{T+2} out of the expression for σ̂²_{T+3}, but that is not necessary to get the marks.
To make these equations operational, estimates of the conditional variance equation coefficients are required.
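To make the recursion concrete, the following sketch iterates the forecasts forward three steps. All numerical values (coefficients, end-of-sample squared residuals, and the last conditional variance) are placeholders, since the question does not supply estimates.

```python
# Hypothetical values: estimated coefficients and end-of-sample quantities.
omega, alpha1, alpha2, beta1 = 0.05, 0.10, 0.05, 0.80
u2_T, u2_Tm1 = 1.2, 0.9        # squared residuals at T and T-1 (in sample)
sig2_T = 1.1                   # estimated conditional variance at T

# 1-step ahead: all right-hand-side terms are in sample.
sig2_T1 = omega + alpha1 * u2_T + alpha2 * u2_Tm1 + beta1 * sig2_T
# 2-steps ahead: E[u^2_{T+1}] is replaced by the 1-step variance forecast.
sig2_T2 = omega + alpha1 * sig2_T1 + alpha2 * u2_T + beta1 * sig2_T1
# 3-steps ahead: both lagged squared shocks are now out of sample.
sig2_T3 = omega + alpha1 * sig2_T2 + alpha2 * sig2_T1 + beta1 * sig2_T2

print(sig2_T1, sig2_T2, sig2_T3)
```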
(d) An ARCH(1) model includes a variance equation of the form:
σ²_t = ω + α1 u²_{t−1}
A particularly large shock, positive or negative, will cause the conditional variance in the following period to be relatively high. That relatively high conditional variance increases the probability of another large shock, immediately following the first large shock. It is that second large shock in a row that will cause the conditional variance to stay high. The increased probability of a large second shock is why periods of high volatility will tend to last several periods instead of just one period in an ARCH(1) model.
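A small simulation illustrates this persistence. The sketch below uses assumed parameter values (ω = 1, α1 = 0.8) and averages the conditional variance path over many replications following one large shock; the average stays above the unconditional variance for several periods before decaying back.

```python
# Hypothetical illustration: after a large shock, an ARCH(1) process tends to
# stay volatile for several periods even though only one lag enters the equation.
import numpy as np

rng = np.random.default_rng(seed=2)
omega, alpha1 = 1.0, 0.8       # assumed parameters
n_reps, horizon = 20_000, 6
big_shock = 5.0                # a particularly large initial shock

sig2_paths = np.zeros((n_reps, horizon))
for r in range(n_reps):
    u_prev = big_shock
    for k in range(horizon):
        sig2 = omega + alpha1 * u_prev ** 2
        sig2_paths[r, k] = sig2
        u_prev = np.sqrt(sig2) * rng.standard_normal()

# The average conditional variance stays above the unconditional variance
# omega / (1 - alpha1) = 5 for several periods before decaying back.
print(sig2_paths.mean(axis=0))
```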
3. (10 points) Unit roots and government bond yields
Michael has monthly data from 1995 on the Australian 2 and 10 year government bond yields (both are interest rates), expressed in percentages. Before undertaking analysis of the yield curve in Australia, he wants to check whether the 2-year government bond yield time series is I(0). Over the sample, both series trend down from around 10% to close to 0%.
He tests for at least one unit root in the 2-year bond yield series using an Augmented Dickey Fuller test, allowing for a drift but not a deterministic trend. His test results are reported below, based on estimation of the regression:
∆yt = α + βyt−1 + γ∆yt−1 + et

Coefficient   Point estimate   Standard error
α              0.0235           0.0267
β             -0.0102           0.0057
γ              0.3464           0.0519
(a) (5 points) Write up the formal Augmented Dickey-Fuller test. Perform the test at the 5% level of significance, using the corresponding critical value, −3.42.
The ACF for the residuals from this ADF regression is shown below:
(b) (2 points) Describe the information conveyed by this ACF and its implications for the ADF test.
(c) (2 points) He observes that government bond yields for different maturity bonds move closely together so he runs a simple linear regression (with intercept) of the 10-year bond yield on the 2-year bond yield. The slope coefficient has a t-ratio of 82 and the R-squared for the regression is 0.96. Given that both yields exhibit a clear downward trend over the sample, and given the findings from his unit root testing, what should be his primary concern for this regression?
(d) (1 point) Doing an ADF test with the residuals from the regression described in part C, the null hypothesis is rejected at the 1% level of significance. Explain the implications of this finding for the concerns raised in part C.
(a) The formal Augmented Dickey-Fuller test:
1. Test the hypothesis: H0: β = 0 against H1: β < 0 at the 5% level of significance.
2. The test statistic, DF , is computed as the t-ratio for β: DF = βˆ/SE(βˆ) = −0.0102/0.0057 = −1.7895.
3. Under the null hypothesis, and given that the sample is large, the test statistic is asymptotically distributed according to the Augmented Dickey-Fuller distribution.
4. The test is a one-sided lower-tail test. The critical value is −3.42. The decision rule: reject the null hypothesis if the test statistic lies below the critical value, −3.42, in which case there is evidence of weak stationarity; otherwise, fail to reject the null hypothesis.
5. The computed test statistic lies outside of the rejection region. Thus, we fail to reject the null hypothesis at the 5% level of significance. There is insufficient evidence to warrant concluding that there are no unit roots.
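The decision in step 5 can be reproduced mechanically. The sketch below is purely illustrative: it recomputes the DF statistic from the reported estimates, and the commented lines show how the same test could be run directly on the data (y2 is a placeholder name for the 2-year yield series, which is not reproduced here).

```python
# Reproducing the ADF decision from the reported regression output.
beta_hat, se_beta = -0.0102, 0.0057
df_stat = beta_hat / se_beta              # approximately -1.79
critical_5pct = -3.42
print(df_stat, df_stat < critical_5pct)   # -1.789..., False -> fail to reject

# Equivalent test run directly on the data (y2 = 2-year yield series, not shown):
# from statsmodels.tsa.stattools import adfuller
# adf_stat, pvalue, *rest = adfuller(y2, maxlag=1, regression='c', autolag=None)
```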
(b) The ACF for the residuals suggests that there is no evidence of remaining autocorrelation in the residuals of the regression used for the ADF test. This increases our confidence that the ADF test statistic does have the asymptotic Dickey-Fuller distribution under the null hypothesis. That would not be the case if the residuals exhibited significant autocorrelation; in that case, additional lagged differences would need to be added to the regression used to produce the test statistic.
(c) The regression of one trending series on another opens up the possibility of spurious regression results. This would cause the coefficient estimates and their standard errors to be biased and inconsistent. If the regression is spurious then the usual t-tests cannot be used to test hypotheses about the coefficients.
(d) If the regression is spurious, then the residuals will also exhibit trending behaviour. In this case they do not. There is sufficient evidence to conclude, using an ADF test, that the residuals are weakly stationary. This suggests that the long-term relationship between the two yields is not spurious.
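The two-step check described in parts (c) and (d) is the Engle-Granger approach. The sketch below is an illustration using simulated cointegrated series in place of the actual yield data, which is not reproduced here.

```python
# Hypothetical sketch of the Engle-Granger two-step check described above,
# using simulated cointegrated series in place of the actual yield data.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(seed=3)
n = 300
y2 = np.cumsum(rng.standard_normal(n))            # I(1) stand-in for the 2-year yield
y10 = 1.0 + 0.9 * y2 + rng.standard_normal(n)     # cointegrated stand-in for the 10-year yield

# Step 1: levels regression (intercept plus slope on the 2-year yield).
ols_fit = sm.OLS(y10, sm.add_constant(y2)).fit()
residuals = ols_fit.resid

# Step 2: ADF test on the residuals; rejection of the unit-root null points to
# cointegration rather than a spurious regression. (Strictly, Engle-Granger
# critical values should be used for this second-stage test.)
adf_stat, pvalue, *rest = adfuller(residuals, regression='c')
print(adf_stat, pvalue)
```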
4. (10 points) Anthony’s damped Holt-Winters model
Anthony has data on yt for periods t = 1, 2, 3, . . . , T and wants to forecast yt several steps ahead.
The first model he considers is a Holt-Winters model without a seasonal component or a trend component. It just uses a level component, with updating equation:
lt = αyt + (1 − α)(lt−1) (1)
(a) (3 points) In terms of the estimates of the level in the last available time period T, what is the formula for the forecast of yT+3?
Anthony notes that the data exhibits periods of upward trending followed by periods of decline so he adds a trend to the model so there are two updating equations.
The level updating equation becomes:
lt =αyt +(1−α)(lt−1 +bt−1) (2)
The slope updating equation is:
bt =β(lt −lt−1)+(1−β)bt−1 (3)
(b) (3 points) In terms of the estimates of the slope in the last available time period T , bT , and the level, lT , in the same period, what is the formula for the forecast of yT +3?
Anthony is concerned that the periods of upward trending and downward trending behaviour mean that forecasts based on a constant trend into the future will quickly become unrealistic as he projects further ahead. This causes him to alter the updating equations yet again, this time including a trend damping parameter φ where 0 < φ < 1.
The level equation becomes:
lt = αyt + (1 − α)(lt−1 + φbt−1) (4)
The slope equation becomes (in question variant 2, the symbol for the slope is s rather than b):
bt = β(lt − lt−1) + (1 − β)φbt−1 (5)
(c) (1 point) Using the model with the damping parameter, φ, derive the formula for the forecast of yT +3 in terms of the estimates of the slope, bT , and the level, lT , in the last available time period T.
(d) (1 point) If φ = 0.5, find the lowest value of h such that yˆT +h − yˆT +h−1 < 0.1bT where yˆT +h is the forecast for period T + h.
(e) (2 points) If the true stochastic process for yt is a random walk, yt = yt−1 + et
where et is white noise, which of the three forecasting models would be most appropriate to use for forecasting?
(a) The 3-step ahead forecast would be yˆT +3|T = lT
(b) The 3-step ahead forecast would be yˆT +3|T = lT + 3bT
(c) The 3-step ahead forecast would be ŷT+3|T = lT + (φ + φ² + φ³)bT. This can be derived as follows:
For 1-step ahead:
lT+1 = lT + φbT
bT+1 = β(lT+1 − lT) + (1 − β)φbT = φbT
ŷT+1|T = lT+1 = lT + φbT
For 2-steps ahead:
lT+2 = lT+1 + φbT+1 = lT + φbT + φ²bT
bT+2 = β(lT+2 − lT+1) + (1 − β)φbT+1 = φbT+1 = φ²bT
ŷT+2|T = lT+2 = lT + φbT + φ²bT
For 3-steps ahead:
lT+3 = lT+2 + φbT+2 = lT + φbT + φ²bT + φ³bT
bT+3 = β(lT+3 − lT+2) + (1 − β)φbT+2 = φbT+2 = φ³bT
ŷT+3|T = lT+3 = lT + φbT + φ²bT + φ³bT
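The structure of these forecasts can be checked with a small helper function. This is an illustrative sketch, not code from the course; the argument names (l_T, b_T, phi, h) are chosen for this example.

```python
# Hypothetical helper: h-step-ahead forecast from the damped Holt-Winters model,
# given the last estimated level l_T and slope b_T.
def damped_forecast(l_T: float, b_T: float, phi: float, h: int) -> float:
    damping_sum = sum(phi ** j for j in range(1, h + 1))   # phi + phi^2 + ... + phi^h
    return l_T + damping_sum * b_T

# With phi = 1 this collapses to l_T + h * b_T (part (b));
# with phi < 1 the trend contribution flattens out as h grows (part (c)).
print(damped_forecast(l_T=10.0, b_T=0.5, phi=0.5, h=3))   # 10 + (0.5 + 0.25 + 0.125) * 0.5
```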
(d) Extrapolating the forecasting model, ŷT+h − ŷT+h−1 = φ^h bT. Thus, if this difference is to be less than 0.1bT, we require φ^h < 0.1. With φ = 0.5, this implies h must be at least 4, since 0.5³ = 0.125 > 0.1 while 0.5⁴ = 0.0625 < 0.1.
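The threshold h can also be found numerically from φ^h < 0.1; a minimal check, assuming φ = 0.5 as in the question:

```python
import math

phi = 0.5
# Smallest integer h with phi**h < 0.1: h > log(0.1)/log(phi) ≈ 3.32, so h = 4.
h = math.ceil(math.log(0.1) / math.log(phi))
print(h, phi ** h)   # 4 0.0625
```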
Note that if the student makes an error in the previous part, they should get at least half marks here if they understood how h would need to be determined.
(e) The best forecast for a random walk process is the most recently available value for that process, yT . Therefore the first model, without any trend component, would be most suitable of the three models that were considered.
One mark for understanding the characteristics of the true process that we are trying to match. One mark for understanding which model fits that best.