Financial Econometrics and Data Science Volatility Modelling
Dr Ran Tao
8. Volatility and Correlation Modelling
8.1 Motivation
8.2 Autoregressive Conditionally Heteroskedastic Models
8.3 Generalised ARCH Models (GARCH)
8.4 Extensions of GARCH Models
8.5 An Example of the Application of GARCH Models
8.6 Correlation Modelling
8. Volatility and Correlation Modelling
8.1 Motivation
An Excursion into Non-linearity Land
Motivation: the linear structural (and time series) models cannot explain a number of important features common to much financial data
– leptokurtosis
– volatility clustering or volatility pooling
– leverage effects
Our “traditional” structural model could be something like:
y_t = β_1 + β_2 x_{2t} + ... + β_k x_{kt} + u_t
or more compactly y = Xβ + u.
We also assumed that u_t ∼ N(0, σ^2).
8.1 Motivation
A Sample Financial Asset Returns Time Series
Daily S&P 500 Returns for August 2003 – August 2013
8.1 Motivation
Models for Volatility
Modelling and forecasting stock market volatility has been the subject of vast empirical and theoretical investigation
There are a number of motivations for this line of inquiry:
– Volatility is one of the most important concepts in finance
– Volatility, as measured by the standard deviation or variance of returns, is often used as a crude measure of the total risk of financial assets
– Many value-at-risk models for measuring market risk require the estimation or forecast of a volatility parameter
– The volatility of stock market prices also enters directly into the Black–Scholes formula for deriving the prices of traded options
We will now examine several volatility models.
8.1 Motivation
Historical Volatility
The simplest model for volatility is the historical estimate
Historical volatility simply involves calculating the variance (or standard deviation) of returns in the usual way over some historical period
This then becomes the volatility forecast for all future periods
Evidence suggests that the use of volatility predicted from more sophisticated time series models will lead to more accurate forecasts and option valuations
Historical volatility is still useful as a benchmark for comparing the forecasting ability of more complex time series models
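As a concrete illustration of the historical benchmark, the sketch below computes historical volatility from a return series in Python (the returns are simulated as a stand-in for real data, and the usual 252-trading-day year is assumed):

```python
import numpy as np

# Historical volatility as a benchmark forecast. The return series here is
# simulated as a stand-in for real data.
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=0.01, size=250)  # ~one year of daily returns

# Sample standard deviation of returns over the historical window
daily_vol = returns.std(ddof=1)

# Annualise with the square-root-of-time rule (252 trading days assumed)
annual_vol = daily_vol * np.sqrt(252)
```

Under the historical approach, `annual_vol` would then be used as the forecast for every future period.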
8.1 Motivation
Heteroskedasticity Revisited
An example of a structural model is
y_t = β_1 + β_2 x_{2t} + β_3 x_{3t} + β_4 x_{4t} + u_t
with u_t ∼ N(0, σ_u^2).
The assumption that the variance of the errors is constant is known as homoskedasticity, i.e. Var(u_t) = σ_u^2.
What if the variance of the errors is not constant?
– Heteroskedasticity, which would imply that the standard error estimates could be wrong.
Is the variance of the errors likely to be constant over time? Not for financial data.
8.2 Autoregressive Conditionally Heteroskedastic Models (ARCH)
Autoregressive Conditionally Heteroskedastic (ARCH) Models
So use a model which does not assume that the variance is constant.
Recall the definition of the variance of ut:
σ_t^2 = var(u_t | u_{t-1}, u_{t-2}, ...) = E[(u_t − E(u_t))^2 | u_{t-1}, u_{t-2}, ...]
We usually assume that E(ut) = 0
so σ_t^2 = var(u_t | u_{t-1}, u_{t-2}, ...) = E[u_t^2 | u_{t-1}, u_{t-2}, ...]
What could the current value of the variance of the errors plausibly depend upon?
– Previous squared error terms.
8.2 Autoregressive Conditionally Heteroskedastic Models (ARCH) (Cont’d)
This leads to the autoregressive conditionally heteroscedastic model for the variance of the errors:
σ_t^2 = α_0 + α_1 u_{t-1}^2
This is known as an ARCH(1) model
The ARCH model due to Engle (1982) has proved very useful in finance.
The full model would be
y_t = β_1 + β_2 x_{2t} + ... + β_k x_{kt} + u_t,  u_t ∼ N(0, σ_t^2)
where σ_t^2 = α_0 + α_1 u_{t-1}^2
8.2 Autoregressive Conditionally Heteroskedastic Models (ARCH) (Cont’d)
We can easily extend this to the general case where the error variance depends on q lags of squared errors:
σ_t^2 = α_0 + α_1 u_{t-1}^2 + α_2 u_{t-2}^2 + ... + α_q u_{t-q}^2
This is an ARCH(q) model.
Instead of calling the variance σ_t^2, in the literature it is usually called h_t, so the model is
y_t = β_1 + β_2 x_{2t} + ... + β_k x_{kt} + u_t,  u_t ∼ N(0, h_t)
where h_t = α_0 + α_1 u_{t-1}^2 + α_2 u_{t-2}^2 + ... + α_q u_{t-q}^2
8.2 Autoregressive Conditionally Heteroskedastic Models (ARCH)
Another Way of Writing ARCH Models
For illustration, consider an ARCH(1). Instead of the above, we can write
y_t = β_1 + β_2 x_{2t} + ... + β_k x_{kt} + u_t,  u_t = v_t σ_t,  v_t ∼ N(0, 1)
σ_t = sqrt(α_0 + α_1 u_{t-1}^2)
The two are different ways of expressing exactly the same model. The first form is easier to understand while the second form is required for simulating from an ARCH model, for example.
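The second form lends itself directly to simulation: draw v_t ∼ N(0,1), update σ_t^2 from the lagged squared error, and set u_t = v_t σ_t. A minimal Python sketch (parameter values α_0 = 0.1, α_1 = 0.4 are illustrative; α_1 < 1 keeps the variance finite):

```python
import numpy as np

# Simulate an ARCH(1) process: u_t = v_t * sigma_t, sigma_t^2 = a0 + a1 * u_{t-1}^2
a0, a1 = 0.1, 0.4          # illustrative parameter values
T = 10_000
rng = np.random.default_rng(0)
v = rng.standard_normal(T)  # v_t ~ N(0, 1)

u = np.zeros(T)
sigma2 = np.zeros(T)
sigma2[0] = a0 / (1 - a1)   # start at the unconditional variance
u[0] = v[0] * np.sqrt(sigma2[0])
for t in range(1, T):
    sigma2[t] = a0 + a1 * u[t - 1] ** 2
    u[t] = v[t] * np.sqrt(sigma2[t])
```

The simulated series exhibits the two features the model is built for: its sample variance is close to the unconditional value a0/(1 − a1), and it is leptokurtic even though the v_t are Gaussian.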
8.2 Autoregressive Conditionally Heteroskedastic Models (ARCH)
Testing for ARCH Effects
1. First, run any postulated linear regression of the form given in the equation above,
y_t = β_1 + β_2 x_{2t} + ... + β_k x_{kt} + u_t
saving the residuals, û_t.
2. Then square the residuals, and regress them on q own lags to test for ARCH of order q, i.e. run the regression
û_t^2 = γ_0 + γ_1 û_{t-1}^2 + γ_2 û_{t-2}^2 + ... + γ_q û_{t-q}^2 + v_t
where v_t is an iid error term.
Obtain R^2 from this regression.
8.2 Autoregressive Conditionally Heteroskedastic Models (ARCH) (Cont’d)
3. The test statistic is defined as TR^2 (the number of observations multiplied by the coefficient of multiple correlation) from the last regression, and is distributed as a χ^2(q).
4. The null and alternative hypotheses are
H_0: γ_1 = 0 and γ_2 = 0 and γ_3 = 0 and ... and γ_q = 0
H_1: γ_1 ≠ 0 or γ_2 ≠ 0 or γ_3 ≠ 0 or ... or γ_q ≠ 0
If the value of the test statistic is greater than the critical value from the χ2 distribution, then reject the null hypothesis.
Note that the ARCH test is also sometimes applied directly to returns instead of the residuals from Stage 1 above.
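The auxiliary regression and the TR^2 statistic can be sketched in a few lines of Python (the function name is made up for this example; for iid residuals the statistic should be a small draw from a χ^2(q)):

```python
import numpy as np

# Engle's LM test for ARCH(q): regress squared residuals on q of their own
# lags and use T*R^2 from that auxiliary regression as the test statistic.
def arch_lm_stat(resid, q):
    u2 = resid ** 2
    y = u2[q:]                                  # dependent variable: u^2_t
    X = np.column_stack([np.ones(len(y))] +     # intercept plus q lags of u^2
                        [u2[q - i:-i] for i in range(1, q + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return len(y) * r2                          # ~ chi^2(q) under H0 of no ARCH

rng = np.random.default_rng(1)
iid_resid = rng.standard_normal(2000)           # no ARCH effects by construction
stat = arch_lm_stat(iid_resid, q=4)
```

With q = 4 the 5% critical value is about 9.49, so the decision rule is to reject H_0 whenever the statistic exceeds that value.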
8.2 Autoregressive Conditionally Heteroskedastic Models (ARCH)
Problems with ARCH (q) Models
How do we decide on q?
The required value of q might be very large
Non-negativity constraints might be violated.
– When we estimate an ARCH model, we require α_i ≥ 0 for all i = 0, 1, 2, ..., q (since a variance cannot be negative)
A natural extension of an ARCH(q) model which gets around some of these problems is a GARCH model.
8.3 Generalised ARCH Models (GARCH)
Generalised ARCH (GARCH) Models
Bollerslev (1986) generalises the ARCH to allow the conditional variance to be dependent on own lagged values
the variance equation in a GARCH(1,1) is then defined as
σ_t^2 = α_0 + α_1 u_{t-1}^2 + β σ_{t-1}^2   (1)
which is analogous to an ARMA(1,1) model (ARMA for returns, GARCH for variances). It follows immediately that:
σ_{t-1}^2 = α_0 + α_1 u_{t-2}^2 + β σ_{t-2}^2
σ_{t-2}^2 = α_0 + α_1 u_{t-3}^2 + β σ_{t-3}^2
8.3 Generalised ARCH Models (GARCH) (Cont’d)
Substituting into (1) for σ_{t-1}^2:
σ_t^2 = α_0 + α_1 u_{t-1}^2 + β(α_0 + α_1 u_{t-2}^2 + β σ_{t-2}^2)
      = α_0 + α_1 u_{t-1}^2 + α_0 β + α_1 β u_{t-2}^2 + β^2 σ_{t-2}^2
Now substituting for σ_{t-2}^2:
σ_t^2 = α_0 + α_1 u_{t-1}^2 + α_0 β + α_1 β u_{t-2}^2 + β^2(α_0 + α_1 u_{t-3}^2 + β σ_{t-3}^2)
      = α_0 + α_1 u_{t-1}^2 + α_0 β + α_1 β u_{t-2}^2 + α_0 β^2 + α_1 β^2 u_{t-3}^2 + β^3 σ_{t-3}^2
      = α_0(1 + β + β^2) + α_1 u_{t-1}^2 (1 + βL + β^2 L^2) + β^3 σ_{t-3}^2
where L is the lag operator. An infinite number of successive substitutions would yield
σ_t^2 = α_0(1 + β + β^2 + ...) + α_1 u_{t-1}^2 (1 + βL + β^2 L^2 + ...) + β^∞ σ_0^2
      = α_0 (1 − β)^{-1} + α_1 u_{t-1}^2 (1 − βL)^{-1} + β^∞ σ_0^2
8.3 Generalised ARCH Models (GARCH) (Cont’d)
So the GARCH(1,1) model can be written as an infinite order ARCH model, ARCH(∞)
We can again extend the GARCH(1,1) model to a GARCH( p, q):
σ_t^2 = α_0 + α_1 u_{t-1}^2 + α_2 u_{t-2}^2 + ... + α_q u_{t-q}^2 + β_1 σ_{t-1}^2 + β_2 σ_{t-2}^2 + ... + β_p σ_{t-p}^2
or, more compactly,
σ_t^2 = α_0 + Σ_{i=1}^{q} α_i u_{t-i}^2 + Σ_{j=1}^{p} β_j σ_{t-j}^2
But in general a GARCH(1,1) model will be sufficient to capture the volatility clustering in the data.
Why is GARCH usually better than ARCH?
8.3 Generalised ARCH Models (GARCH) (Cont’d)
– more parsimonious in parameters: p = q = 1 is sufficient in most cases, which avoids overfitting
– less likely to breach the non-negativity constraints
8.3 Generalised ARCH Models (GARCH)
The Unconditional Variance under the GARCH Specification
The unconditional variance of u_t is given by
var(u_t) = α_0 / (1 − (α_1 + β))
when α_1 + β < 1.
α_1 + β ≥ 1 is termed “non-stationarity” in variance.
α_1 + β = 1 would be known as a ‘unit root in variance’, also termed integrated GARCH (IGARCH).
For non-stationarity in variance, the conditional variance forecasts will not converge on their unconditional value as the horizon increases.
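The unconditional-variance formula can be checked by simulation. A minimal sketch, assuming illustrative parameter values chosen so that α_1 + β < 1:

```python
import numpy as np

# Check var(u_t) = a0 / (1 - a1 - b) for a stationary GARCH(1,1) by simulation.
a0, a1, b = 0.05, 0.1, 0.8          # illustrative values, a1 + b = 0.9 < 1
T = 200_000
rng = np.random.default_rng(2)
v = rng.standard_normal(T)

u = np.zeros(T)
sigma2 = np.full(T, a0 / (1 - a1 - b))   # start at the unconditional variance
u[0] = v[0] * np.sqrt(sigma2[0])
for t in range(1, T):
    sigma2[t] = a0 + a1 * u[t - 1] ** 2 + b * sigma2[t - 1]
    u[t] = v[t] * np.sqrt(sigma2[t])

uncond = a0 / (1 - a1 - b)   # = 0.5 with these values
```

The long-run sample variance of the simulated series settles near `uncond`; with α_1 + β ≥ 1 no such finite limit exists, which is the non-stationary case described above.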
8.3 Generalised ARCH Models (GARCH)
Estimation of ARCH / GARCH Models
OLS minimises the RSS which depends only on the parameters in the conditional mean equation, and not the conditional variance.
Since the model is no longer of the usual linear form, we cannot use OLS.
We use another technique known as maximum likelihood.
The method works by finding the most likely values of the
parameters given the actual data.
More specifically, we form a log-likelihood function and maximise it.
The steps involved in actually estimating an ARCH or GARCH model are as follows
8.3 Generalised ARCH Models (GARCH) (Cont’d)
1. Specify the appropriate equations for the mean and the variance – e.g. an AR(1)-GARCH(1,1) model:
y_t = μ + φ y_{t-1} + u_t,  u_t ∼ N(0, σ_t^2)
σ_t^2 = α_0 + α_1 u_{t-1}^2 + β σ_{t-1}^2
2. Specify the log-likelihood function to maximise:
L = −(T/2) log(2π) − (1/2) Σ_{t=1}^{T} log(σ_t^2) − (1/2) Σ_{t=1}^{T} (y_t − μ − φ y_{t-1})^2 / σ_t^2
3. The computer will maximise the function and give parameter values and their standard errors
8.3 Generalised ARCH Models (GARCH)
Parameter Estimation using Maximum Likelihood
Consider the bivariate regression case with homoscedastic errors for simplicity:
yt = β1 + β2xt + ut
Assuming that u_t ∼ N(0, σ^2), then y_t ∼ N(β_1 + β_2 x_t, σ^2), so that the probability density function for a normally distributed random variable with this mean and variance is given by
f(y_t | β_1 + β_2 x_t, σ^2) = (1 / (σ √(2π))) exp( −(1/2) (y_t − β_1 − β_2 x_t)^2 / σ^2 )   (2)
Successive values of yt would trace out the familiar bell-shaped curve.
Assuming that ut are iid, then yt will also be iid.
8.3 Generalised ARCH Models (GARCH) (Cont’d)
Then the joint pdf for all the y’s can be expressed as a product of the individual density functions
f(y_1, y_2, ..., y_T | β_1 + β_2 x_t, σ^2)
= f(y_1 | β_1 + β_2 x_1, σ^2) f(y_2 | β_1 + β_2 x_2, σ^2) ... f(y_T | β_1 + β_2 x_T, σ^2)
= Π_{t=1}^{T} f(y_t | β_1 + β_2 x_t, σ^2)   (3)
Substituting into equation (3) for every y_t from equation (2),
f(y_1, y_2, ..., y_T | β_1 + β_2 x_t, σ^2)
= (1 / (σ^T (√(2π))^T)) exp( −(1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)^2 / σ^2 )   (4)
8.3 Generalised ARCH Models (GARCH) (Cont’d)
The typical situation we have is that the xt and yt are given and we want to estimate β1, β2, σ2. If this is the case, then f(•) is known as the likelihood function, denoted LF(β1, β2, σ2), so we write
LF(β_1, β_2, σ^2) = (1 / (σ^T (√(2π))^T)) exp( −(1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)^2 / σ^2 )   (5)
Maximum likelihood estimation involves choosing parameter
values (β1, β2 ,σ2) that maximise this function.
We want to differentiate (5) w.r.t. β_1, β_2, σ^2, but (5) is a product containing T terms.
Since the maximum of f(x) and the maximum of ln(f(x)) occur at the same value of x, we can take logs of (5).
8.3 Generalised ARCH Models (GARCH) (Cont’d)
Then, using the various laws for transforming functions containing logarithms, we obtain the log-likelihood function, LLF:
LLF = −T ln σ − (T/2) ln(2π) − (1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)^2 / σ^2
which is equivalent to
LLF = −(T/2) ln σ^2 − (T/2) ln(2π) − (1/2) Σ_{t=1}^{T} (y_t − β_1 − β_2 x_t)^2 / σ^2   (6)
8.3 Generalised ARCH Models (GARCH) (Cont’d)
Differentiating (6) w.r.t. β_1, β_2, σ^2, we obtain
∂LLF/∂β_1 = (1/σ^2) Σ_t (y_t − β_1 − β_2 x_t)   (7)
∂LLF/∂β_2 = (1/σ^2) Σ_t (y_t − β_1 − β_2 x_t) x_t   (8)
∂LLF/∂σ^2 = −T/(2σ^2) + (1/2) Σ_t (y_t − β_1 − β_2 x_t)^2 / σ^4   (9)
Setting (7)–(9) to zero (the first-order conditions for a maximum), and putting hats above the parameters to denote the maximum likelihood estimators,
8.3 Generalised ARCH Models (GARCH) (Cont’d)
From (7),
Σ_t (y_t − β̂_1 − β̂_2 x_t) = 0
Σ_t y_t − T β̂_1 − β̂_2 Σ_t x_t = 0
(1/T) Σ_t y_t − β̂_1 − β̂_2 (1/T) Σ_t x_t = 0
β̂_1 = ȳ − β̂_2 x̄   (10)
8.3 Generalised ARCH Models (GARCH) (Cont’d)
From (8),
Σ_t (y_t − β̂_1 − β̂_2 x_t) x_t = 0
Σ_t y_t x_t − β̂_1 Σ_t x_t − β̂_2 Σ_t x_t^2 = 0
β̂_2 Σ_t x_t^2 = Σ_t y_t x_t − (ȳ − β̂_2 x̄) T x̄
β̂_2 Σ_t x_t^2 = Σ_t y_t x_t − T x̄ ȳ + β̂_2 T x̄^2
β̂_2 (Σ_t x_t^2 − T x̄^2) = Σ_t y_t x_t − T x̄ ȳ
β̂_2 = (Σ_t y_t x_t − T x̄ ȳ) / (Σ_t x_t^2 − T x̄^2)   (11)
8.3 Generalised ARCH Models (GARCH) (Cont’d)
From (9),
T / σ̂^2 = Σ_t (y_t − β̂_1 − β̂_2 x_t)^2 / σ̂^4
Rearranging,
σ̂^2 = (1/T) Σ_t (y_t − β̂_1 − β̂_2 x_t)^2
σ̂^2 = (1/T) Σ_t û_t^2   (12)
How do these formulae compare with the OLS estimators?
(10) & (11) are identical to OLS
(12) is different. The OLS estimator was
σ̂^2 = (1/(T − k)) Σ_t û_t^2
8.3 Generalised ARCH Models (GARCH) (Cont’d)
Therefore the ML estimator of the variance of the disturbances is biased, although it is consistent.
But how does this help us in estimating heteroscedastic models?
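Before moving on, the closed-form estimators (10)–(12) are easy to verify numerically. A sketch on simulated data (true values β_1 = 1, β_2 = 2, σ = 1.5 are made up for the check):

```python
import numpy as np

# Numerical check of the ML estimators (10)-(12) for y_t = b1 + b2*x_t + u_t
# with homoskedastic errors; data are simulated for illustration.
rng = np.random.default_rng(3)
T = 500
x = rng.uniform(0, 10, T)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, T)

xbar, ybar = x.mean(), y.mean()
b2_hat = (np.sum(y * x) - T * xbar * ybar) / (np.sum(x ** 2) - T * xbar ** 2)  # (11)
b1_hat = ybar - b2_hat * xbar                                                  # (10)
resid = y - b1_hat - b2_hat * x
sigma2_ml = np.sum(resid ** 2) / T           # (12): divides by T, biased but consistent
sigma2_ols = np.sum(resid ** 2) / (T - 2)    # OLS divides by T - k (here k = 2)
```

The slope and intercept coincide with OLS to machine precision, while the ML variance estimate is systematically the smaller of the two.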
8.3 Generalised ARCH Models (GARCH)
Estimation of GARCH Models Using Maximum Likelihood
Now we have
y_t = μ + φ y_{t-1} + u_t,  u_t ∼ N(0, σ_t^2)
σ_t^2 = α_0 + α_1 u_{t-1}^2 + β σ_{t-1}^2
L = −(T/2) log(2π) − (1/2) Σ_{t=1}^{T} log(σ_t^2) − (1/2) Σ_{t=1}^{T} (y_t − μ − φ y_{t-1})^2 / σ_t^2
Unfortunately, the LLF for a model with time-varying variances cannot be maximised analytically, except in the simplest of cases. So a numerical procedure is used to maximise the log-likelihood function.
All methods work by “searching” over the parameter space until the values of the parameters that maximise the log-likelihood function are found.
8.3 Generalised ARCH Models (GARCH) (Cont’d)
If the LLF has only one maximum with respect to the parameter values, any optimisation method should be able to find it - although some methods will take longer than others.
However, as is often the case with non-linear models such as GARCH, the LLF can have many local maxima, so that different algorithms could find different local maxima of the LLF.
The way we do the optimisation is:
1. Set up the LLF.
2. Use regression to get initial guesses for the mean parameters.
3. Choose some initial guesses for the conditional variance parameters.
4. Specify a convergence criterion (e.g. stop when the change in the value of the log-likelihood function or in the parameter estimates between iterations falls below a given tolerance).
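These steps can be sketched in Python with a generic numerical optimiser (the slides do this in R; here the mean equation is simplified to a constant μ rather than an AR(1), the data are simulated so the answer is known, and step 4 is delegated to the optimiser's built-in convergence criterion):

```python
import numpy as np
from scipy.optimize import minimize

def neg_llf(params, y):
    """Negative Gaussian log-likelihood for a constant-mean GARCH(1,1)."""
    mu, a0, a1, b = params
    u = y - mu
    sigma2 = np.empty(len(y))
    sigma2[0] = u.var()                      # initialise at the sample variance
    for t in range(1, len(y)):
        sigma2[t] = a0 + a1 * u[t - 1] ** 2 + b * sigma2[t - 1]
    if not np.all(np.isfinite(sigma2)) or np.any(sigma2 <= 0):
        return 1e10                          # penalise inadmissible parameter values
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + u ** 2 / sigma2)

# Simulate from a known GARCH(1,1) so the estimates can be sanity-checked
rng = np.random.default_rng(4)
mu0, a00, a10, b0 = 0.1, 0.05, 0.1, 0.8      # illustrative true values
T = 5000
u = np.zeros(T)
s2 = np.full(T, a00 / (1 - a10 - b0))
for t in range(1, T):
    s2[t] = a00 + a10 * u[t - 1] ** 2 + b0 * s2[t - 1]
    u[t] = np.sqrt(s2[t]) * rng.standard_normal()
y = mu0 + u

# Steps 2-3: initial guesses for the mean and variance parameters
x0 = np.array([y.mean(), 0.1, 0.05, 0.5])
bounds = [(None, None), (1e-6, None), (0.0, 1.0), (0.0, 1.0)]
res = minimize(neg_llf, x0, args=(y,), bounds=bounds, method="L-BFGS-B")
mu_hat, a0_hat, a1_hat, b_hat = res.x
```

Minimising the negative LLF is equivalent to maximising the LLF; the bounds enforce the non-negativity constraints discussed earlier.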
8.3 Generalised ARCH Models (GARCH)
To estimate a GARCH(1,1) in R, you can use the command garchFit() from package fGarch. Print out your coefficients with the summary() command.
8.3 Generalised ARCH Models (GARCH)
All three GARCH parameters are significant. Also note the diagnostics of the residuals.
8.3 Generalised ARCH Models (GARCH)
For other (more complex) GARCH models, you can use the commands ugarchspec() and ugarchfit() from package rugarch.
8.3 Generalised ARCH Models (GARCH)
Non-Normality and Maximum Likelihood
Recall that the conditional normality assumption for ut is essential.
We can test for normality using the following representation:
u_t = v_t σ_t,  v_t ∼ N(0, 1)
σ_t = sqrt(α_0 + α_1 u_{t-1}^2 + β σ_{t-1}^2),  v_t = u_t / σ_t
The sample counterpart is
v̂_t = û_t / σ̂_t
Are the v̂_t normal? Typically v̂_t are still leptokurtic, although less so than the û_t. Is this a problem? Not really, as we can use ML with a robust variance/covariance estimator. ML with robust standard errors is called Quasi-Maximum Likelihood, or QML.
8.4 Extensions of GARCH Models
Extensions to the Basic GARCH Model
Since the GARCH model was developed, a huge number of extensions and variants have been proposed. Three of the most important examples are EGARCH, GJR, and GARCH-M models.
Problems with GARCH(p,q) Models:
– Non-negativity constraints may still be violated
– GARCH models enforce a symmetric response of volatility to positive and negative shocks, so they cannot capture leverage effects
Possible solutions: the exponential GARCH (EGARCH) model or the GJR model, which are asymmetric GARCH models.
8.4 Extensions of GARCH Models
The EGARCH Model
Suggested by Nelson (1991). The variance equation is given by
ln(σ_t^2) = ω + β ln(σ_{t-1}^2) + γ u_{t-1}/sqrt(σ_{t-1}^2) + α [ |u_{t-1}|/sqrt(σ_{t-1}^2) − sqrt(2/π) ]
Advantages of the model
– Since we model ln(σ_t^2), then even if the parameters are negative, σ_t^2 will be positive.
– We can account for the leverage effect: if the relationship between volatility and returns is negative, γ will be negative.
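A one-step EGARCH update makes the asymmetry concrete. In the sketch below the function name and parameter values are made up for illustration; with γ < 0, a negative shock raises next-period variance by more than a positive shock of the same size:

```python
import numpy as np

# One-step EGARCH(1,1) log-variance update, following the equation above.
def egarch_log_var(u_prev, sigma2_prev, omega=-0.1, beta=0.9, gamma=-0.1, alpha=0.2):
    z = u_prev / np.sqrt(sigma2_prev)        # standardised shock
    return (omega + beta * np.log(sigma2_prev)
            + gamma * z
            + alpha * (abs(z) - np.sqrt(2 / np.pi)))

# Compare shocks of equal size but opposite sign, starting from sigma^2 = 1
s2_neg = np.exp(egarch_log_var(-1.0, 1.0))   # negative shock
s2_pos = np.exp(egarch_log_var(+1.0, 1.0))   # positive shock
```

Note that the update is for ln(σ_t^2), so exponentiating always yields a positive variance regardless of the signs of the parameters.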
8.4 Extensions of GARCH Models
The GJR Model
Due to Glosten, Jagannathan and Runkle (1993). The variance equation is
σ_t^2 = α_0 + α_1 u_{t-1}^2 + β σ_{t-1}^2 + γ u_{t-1}^2 I_{t-1}
where I_{t-1} = 1 if u_{t-1} < 0, and I_{t-1} = 0 otherwise.
For a leverage effect, we would see γ > 0.
We require α1 + γ ≥ 0 and α1 ≥ 0 for non-negativity.
8.4 Extensions of GARCH Models
An Example of the use of a GJR Model
Using monthly S&P 500 returns, December 1979–June 1998
Estimating a GJR model, we obtain the following results.
y_t = 0.172
      (3.198)
σ_t^2 = 1.243 + 0.015 u_{t-1}^2 + 0.498 σ_{t-1}^2 + 0.604 u_{t-1}^2 I_{t-1}
       (16.372)  (0.437)  (14.999)
with t-ratios in parentheses.
Note that the asymmetry term, γ, has the correct sign and is significant.
Suppose that σ_{t-1}^2 = 0.823, and consider û_{t-1} = ±0.5.
– If û_{t-1} = 0.5, this implies that σ_t^2 = 1.65.
– If û_{t-1} = −0.5, this implies that σ_t^2 = 1.80.
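The ±0.5 calculation follows directly from plugging the estimated coefficients into the GJR variance equation (the function name below is made up; the values reproduce the figures above up to rounding):

```python
# Reproduce the +/-0.5 shock calculation from the estimated GJR model above.
def gjr_var(u_prev, sigma2_prev, a0=1.243, a1=0.015, beta=0.498, gamma=0.604):
    indicator = 1.0 if u_prev < 0 else 0.0   # I_{t-1} = 1 only for negative shocks
    return a0 + a1 * u_prev ** 2 + beta * sigma2_prev + gamma * u_prev ** 2 * indicator

s2_up = gjr_var(0.5, 0.823)      # positive shock: ~1.66
s2_down = gjr_var(-0.5, 0.823)   # negative shock of equal size: ~1.81
```

The γ term fires only for the negative shock, which is exactly the asymmetric (leverage) response the model is designed to capture.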
8.4 Extensions of GARCH Models
News Impact Curves
The news impact curve plots the next period volatility (ht) that would arise from various positive and negative values of ut−1, given an estimated model.
News Impact Curves for S&P 500 Returns using Coefficients from GARCH and GJR Model Estimates:
[Figure: news impact curves implied by the estimated GARCH and GJR models, plotting the value of the conditional variance (vertical axis) against the value of the lagged shock, from −1 to +1 (horizontal axis).]
8.4 Extensions of GARCH Models
What Use Are GARCH-type Models?
GARCH can model the volatility clustering effect since the conditional variance is autoregressive. Such models can be used to forecast volatility.
In reality, the application of forecasting could be pricing the future value of a call option, which is a function of current value of the underlying, the strike price, the time to maturity, the risk-free interest rate and volatility.
It is possible to use a simple historical average measure as the forecast of future volatility.
But another method would be to use a time series model such as GARCH to compute the volatility forecasts.
8.4 Extensions of GARCH Models
Forecasting Variances using GARCH Models
Producing conditional variance forecasts from GARCH models uses a very similar approach to producing forecasts from ARMA models.
It is again an exercise in iterating with the conditional expectations operator.
Consider the following GARCH(1,1) model:
y_t = μ + u_t,  u_t ∼ N(0, σ_t^2),  σ_t^2 = α_0 + α_1 u_{t-1}^2 + β σ_{t-1}^2
What is needed is to generate forecasts of σ_{T+1}^2 | Ω_T, σ_{T+2}^2 | Ω_T, ..., σ_{T+s}^2 | Ω_T, where Ω_T denotes all information available up to and including observation T.
8.4 Extensions of GARCH Models (Cont’d)
Adding one to each of the time subscripts of the above conditional variance equation, and then two, and then three, would yield the following equations:
σ_{T+1}^2 = α_0 + α_1 u_T^2 + β σ_T^2
σ_{T+2}^2 = α_0 + α_1 u_{T+1}^2 + β σ_{T+1}^2
σ_{T+3}^2 = α_0 + α_1 u_{T+2}^2 + β σ_{T+2}^2
Let σf_{1,T}^2 be the one-step-ahead forecast for σ^2 made at time T. This is easy to calculate since, at time T, the values of all the terms on the RHS are known.
σf_{1,T}^2 would be obtained by taking the conditional expectation of the first equation above:
σf_{1,T}^2 = α_0 + α_1 u_T^2 + β σ_T^2
8.4 Extensions of GARCH Models (Cont’d)
Given σf_{1,T}^2, how is σf_{2,T}^2, the two-step-ahead forecast for σ^2 made at time T, calculated? Taking the conditional expectation of the second equation above:
σf_{2,T}^2 = α_0 + α_1 E(u_{T+1}^2 | Ω_T) + β σf_{1,T}^2
where E(u_{T+1}^2 | Ω_T) is the expectation, made at time T, of u_{T+1}^2, which is the squared disturbance term.
We can write
E(u_{T+1}^2 | Ω_T) = σ_{T+1}^2
8.4 Extensions of GARCH Models (Cont’d)
but σ_{T+1}^2 is not known at time T, so it is replaced with the forecast for it, σf_{1,T}^2, so that the two-step-ahead forecast is
σf_{2,T}^2 = α_0 + α_1 σf_{1,T}^2 + β σf_{1,T}^2
           = α_0 + (α_1 + β) σf_{1,T}^2
By similar arguments, the three-step-ahead forecast will be given by
σf_{3,T}^2 = E_T(α_0 + α_1 u_{T+2}^2 + β σ_{T+2}^2)
           = α_0 + (α_1 + β) σf_{2,T}^2
           = α_0 + (α_1 + β)[α_0 + (α_1 + β) σf_{1,T}^2]
           = α_0 + α_0(α_1 + β) + (α_1 + β)^2 σf_{1,T}^2
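The recursion σf_{s,T}^2 = α_0 + (α_1 + β) σf_{s-1,T}^2 is a one-line loop in code. A sketch with made-up parameter and state values (u_T^2, σ_T^2 would come from the estimated model in practice):

```python
import numpy as np

# Iterate the s-step-ahead GARCH(1,1) variance forecasts:
# f_1 = a0 + a1*u_T^2 + b*sigma_T^2, then f_s = a0 + (a1 + b)*f_{s-1} for s >= 2.
a0, a1, b = 0.05, 0.1, 0.8      # illustrative parameter values, a1 + b < 1
u_T2, sigma_T2 = 0.9, 0.6       # made-up last squared shock and conditional variance

f = [a0 + a1 * u_T2 + b * sigma_T2]      # one-step-ahead forecast (= 0.62 here)
for s in range(2, 21):                   # forecasts out to 20 steps ahead
    f.append(a0 + (a1 + b) * f[-1])

# With a1 + b < 1, the forecasts converge to the unconditional variance
uncond = a0 / (1 - a1 - b)               # = 0.5 here
```

Because α_1 + β < 1, the forecast path decays geometrically towards the unconditional variance; in the non-stationary case α_1 + β ≥ 1, no such convergence occurs, matching the remark earlier in the chapter.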
8.4 Extensions of GARCH Models (Cont’d)
Any s-step ahead