Review on ARIMA
Population
Sample/Data
Stochastic Process
Time Series
White Noise
AR(p)
MA(q)
ARMA(p,q)
I(d) (e.g. Random walk is I(1) process)
Markov switching process
Theoretical behaviors: mean, var, ACF, PACF, roots
1. Empirical features: stationarity? How to estimate mean, var, ACF, PACF -> ADF test -> visual inspection -> model selection
2. Model estimation: OLS, MLE, Yule-Walker
3. Diagnostic checking: Q-test, normality -> residuals
4. Forecast: 1-step and h-step -> variance of forecast errors increases with the horizon
ARIMA(p,d,q)
Outline
ARCH/GARCH Process.
Estimation of the GARCH Models. Variants of the GARCH Model.
Realized volatility
Readings
SDA chapter 18; FTS chapters 3 and 5; SFM chapter 12
Estimating conditional means and variances
We care about volatility because expected asset returns are related to volatility (the risk-return trade-off); we want to understand how volatility is generated and to predict future volatility in order to price assets today.
Consider regression modeling with a constant conditional variance, $\mathrm{Var}(Y_t \mid X_{1,t},\dots,X_{p,t}) = \sigma_\epsilon^2$. The general form for the regression of $Y_t$ on $X_{1,t},\dots,X_{p,t}$ is

$$Y_t = f(X_{1,t},\dots,X_{p,t}) + \epsilon_t$$

where $\epsilon_t$ is independent of $X_{1,t},\dots,X_{p,t}$ and has expectation 0 and a constant conditional variance $\sigma_\epsilon^2$.
Implied volatility
Implied volatility of the underlying asset can be inferred from observed option prices. Example: Consider a three-month European call option on a non-dividend paying stock with the following characteristics: c = $1.875, S=$21, X=$20, r=10%.
Given c, S, X, r, and T, we can infer σ from the Black-Scholes option pricing formula. It is in fact equal to 23.5%.
$$C(S,t) = N(d_1)\,S - N(d_2)\,X e^{-r(T-t)}$$
$$d_1 = \frac{1}{\sigma\sqrt{T-t}}\left[\ln\frac{S}{X} + \Big(r + \frac{\sigma^2}{2}\Big)(T-t)\right], \qquad d_2 = d_1 - \sigma\sqrt{T-t}$$
where 𝑁(⋅) is the cumulative distribution function of the standard normal distribution, 𝑇 − 𝑡 is the time to maturity, S is the spot price of the underlying asset, X is the strike price, r is the risk free rate and 𝜎 is the volatility of returns of the underlying asset.
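Since the Black-Scholes price is strictly increasing in $\sigma$, the implied volatility can be backed out numerically. A minimal Python sketch (the course labs use R; the function names `bs_call` and `implied_vol` are mine, and bisection is just one simple root-finding choice):

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, X, r, sigma, T):
    # Black-Scholes price of a European call; T is time to maturity in years
    d1 = (log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return norm_cdf(d1) * S - norm_cdf(d2) * X * exp(-r * T)

def implied_vol(c, S, X, r, T, lo=1e-4, hi=3.0, tol=1e-10):
    # Bisection works because the call price is strictly increasing in sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, X, r, mid, T) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The slide's example: c = $1.875, S = $21, X = $20, r = 10%, three months
iv = implied_vol(c=1.875, S=21.0, X=20.0, r=0.10, T=0.25)
print(round(iv, 3))
```

For the example above the bisection lands close to the slide's 23.5%.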
Volatility smile: for a given expiration, options whose strike price differs substantially from the underlying asset's price have higher implied volatilities than at-the-money options. These options are said to be either deep in-the-money or deep out-of-the-money.
The market prices of different options on the same asset are often used to calculate a volatility term structure. This is the relationship between the implied volatilities of the options and their maturities.
Estimating volatility from historical data
While traders use implied volatilities extensively, risk management is largely based on historical volatilities.
$n$ — number of observations
$S_i$ — stock price at the end of interval $i$
$\tau$ — length of the time interval (in years)

Let the return during the $i$-th interval be

$$r_i = \ln\frac{S_i}{S_{i-1}}$$

The estimate of the standard deviation of the $r_i$ is

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(r_i - \bar r)^2}$$

where $\bar r$ is the mean of the $r_i$. $s$ is an estimate of $\sigma\sqrt{\tau}$, and $\sigma$ can be estimated using $s$ as $\hat\sigma = s/\sqrt{\tau}$. So, if daily volatility equals 1.216%, what is the annual volatility? Assuming that one year consists of 252 trading days, $1.216\% \times \sqrt{252} = 19.3\%$.
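The estimate and the annualization step can be sketched in a few lines of Python (the course labs use R; `annualized_vol` is my name for this helper):

```python
import math

def annualized_vol(prices, trading_days=252):
    # Log returns r_i = ln(S_i / S_{i-1})
    r = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    rbar = sum(r) / len(r)
    # s estimates sigma * sqrt(tau); with daily data tau = 1/trading_days,
    # so sigma_hat = s / sqrt(tau) = s * sqrt(trading_days)
    s = math.sqrt(sum((x - rbar) ** 2 for x in r) / (len(r) - 1))
    return s * math.sqrt(trading_days)

# A constant-growth price path has zero return dispersion, hence zero volatility
flat = [100.0 * math.exp(0.01 * i) for i in range(21)]
print(annualized_vol(flat))
```

With real daily data, a sample standard deviation of 1.216% annualizes to `0.01216 * sqrt(252)`, about 19.3%, matching the slide.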
Conditional heteroscedasticity
Most popular option pricing models, such as Black-Scholes, assume that the volatility of the underlying asset is constant. This assumption is far from perfect. In practice, the volatility of an asset, like the asset's price, is a stochastic variable. Unlike the asset price, however, volatility is not directly observable.
A widely observed phenomenon in finance, the so-called volatility clustering, refers to the tendency of large changes in asset prices to be followed by large changes and small changes to be followed by small changes.
–reported by Mandelbrot in 1963.
Conditional variance is serially correlated. This is called conditional heteroscedasticity.
Estimate time-dependent conditional
variances
Let $\sigma^2(X_{1,t},\dots,X_{p,t})$ be the conditional variance of $Y_t$ given $X_{1,t},\dots,X_{p,t}$:

$$Y_t = f(X_{1,t},\dots,X_{p,t}) + \sigma(X_{1,t},\dots,X_{p,t})\, z_t$$

where $z_t$ has conditional (given $X_{1,t},\dots,X_{p,t}$) mean equal to 0 and conditional variance equal to 1. The volatility function $\sigma(X_{1,t},\dots,X_{p,t})$ should be nonnegative since it is a standard deviation.
Rolling window technique: $\hat\sigma_t^2 = \frac{1}{M}\sum_{i=1}^{M} r_{t-i}^2$

Recursive: $\hat\sigma_t^2 = \frac{1}{t-1}\sum_{i=1}^{t-1} r_{t-i}^2$

Weighting scheme: the formulas above give equal weight to each past return. Sometimes risk managers like to give more weight to more recent data, so they calculate volatility as $\hat\sigma_t^2 = \sum_{i=1}^{t}\alpha_i r_{t-i}^2$, where the $\alpha_i$ are the weights.
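The rolling-window and weighted estimators can be sketched directly (a Python illustration; `rolling_var` and `weighted_var` are my names, and the 0-based indexing convention is an assumption of this sketch):

```python
def rolling_var(returns, t, M):
    # sigma_t^2 = (1/M) * sum_{i=1..M} r_{t-i}^2 (equal weights on the last M returns)
    window = returns[t - M:t]
    return sum(r * r for r in window) / M

def weighted_var(returns, t, weights):
    # sigma_t^2 = sum_i alpha_i * r_{t-i}^2; weights[0] multiplies r_{t-1}
    return sum(a * returns[t - i] ** 2 for i, a in enumerate(weights, start=1))

r = [0.10, 0.20, 0.30]
print(rolling_var(r, 3, 2))            # (0.2^2 + 0.3^2) / 2 = 0.065
print(weighted_var(r, 3, [0.7, 0.3]))  # 0.7 * 0.3^2 + 0.3 * 0.2^2 = 0.075
```

Making the weights decay geometrically in `i` gives the familiar EWMA (RiskMetrics-style) scheme.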
The Autoregressive conditional heteroskedasticity (ARCH) model and its generalization, the generalized autoregressive conditional heteroskedasticity (GARCH) model, provide a convenient framework to model volatility clustering.
ARCH(1) Process
Consider an ARCH(1) model for $y_t = f(X_{1,t},\dots,X_{p,t}) + \epsilon_t$:

$$\epsilon_t = \sigma_t z_t, \quad z_t \sim \text{IID } N(0,1)$$
$$\sigma_t^2 = a_0 + a_1\epsilon_{t-1}^2$$

where $\epsilon_t$ can be understood as demeaned returns and $\sigma_t$ denotes volatility, the conditional standard deviation.

It is also popular to use $h_t$, denoting the conditional variance of $\epsilon_t$, in ARCH models:

$$\epsilon_t = \sqrt{h_t}\, z_t, \quad z_t \sim \text{IID } N(0,1)$$
$$h_t = a_0 + a_1\epsilon_{t-1}^2$$

Let $\mathcal{I}_t$ denote the filtration (information) up to time $t$:

$$E(\epsilon_t^2 \mid \mathcal{I}_{t-1}) = E(h_t z_t^2 \mid \mathcal{I}_{t-1}) = h_t E(z_t^2 \mid \mathcal{I}_{t-1}) = h_t E(z_t^2) = h_t$$

Thus, we can write $\epsilon_t \mid \mathcal{I}_{t-1} \sim N(0, h_t)$.
The Gaussian assumption for 𝑧𝑡 is not critical. We can relax it and allow for heavy-tailed distributions, such as the Student’s t-distribution, as is typically required in finance.
The parameters in the conditional variance equation should satisfy 𝑎0 > 0 and 𝑎1 > 0.
ARCH(1) captures the effect that a large value in 𝜖𝑡 leads to a larger variance (volatility) in the following period.
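The course provides an R script (6_SimulatedARCH(1).R) for simulation; a minimal Python analogue of the same recursion is sketched below (`simulate_arch1` and the parameter values are my choices):

```python
import math
import random

def simulate_arch1(n, a0=0.1, a1=0.5, seed=42):
    # eps_t = sqrt(h_t) * z_t,  h_t = a0 + a1 * eps_{t-1}^2,  z_t ~ IID N(0,1)
    rng = random.Random(seed)
    eps_prev = 0.0
    eps, h = [], []
    for _ in range(n):
        h_t = a0 + a1 * eps_prev ** 2
        e = math.sqrt(h_t) * rng.gauss(0.0, 1.0)
        h.append(h_t)
        eps.append(e)
        eps_prev = e
    return eps, h

eps, h = simulate_arch1(10_000)
var_hat = sum(e * e for e in eps) / len(eps)
print(var_hat)  # fluctuates around a0 / (1 - a1) = 0.2
```

A large |eps| at time t-1 mechanically raises h_t, which is exactly the effect described above.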
Properties of ARCH(1) Process
$\epsilon_t = \sqrt{h_t}\, z_t$ is a nonlinear function with the following properties:

1. The unconditional mean of $\epsilon_t$ is zero, since
$$E(\epsilon_t) = E[E(\epsilon_t \mid \mathcal{I}_{t-1})] = E[\sqrt{h_t}\, E(z_t)] = 0$$

2. The conditional variance of $\epsilon_t$ is
$$E(\epsilon_t^2 \mid \mathcal{I}_{t-1}) = E(h_t z_t^2 \mid \mathcal{I}_{t-1}) = h_t E(z_t^2 \mid \mathcal{I}_{t-1}) = h_t = a_0 + a_1\epsilon_{t-1}^2$$

3. The unconditional variance of $\epsilon_t$ is obtained as
$$\mathrm{Var}(\epsilon_t) = E(\epsilon_t^2) = E[E(\epsilon_t^2 \mid \mathcal{I}_{t-1})] = E(a_0 + a_1\epsilon_{t-1}^2) = a_0 + a_1 E(\epsilon_{t-1}^2) = \frac{a_0}{1 - a_1}$$
It implies $a_0 > 0$ and $0 < a_1 < 1$.

4. The kurtosis of $\epsilon_t$, $K_\epsilon$ (heavy tails):
$$K_\epsilon = \frac{E(\epsilon_t^4)}{[\mathrm{Var}(\epsilon_t)]^2} = \frac{E(\epsilon_t^4)}{[E(\epsilon_t^2)]^2} = 3\,\frac{1 - a_1^2}{1 - 3a_1^2} > 3, \quad \text{if } 1 - 3a_1^2 > 0$$
using
$$E(\epsilon_t^4) = E[E(\epsilon_t^4 \mid \mathcal{I}_{t-1})] = E[E(h_t^2 z_t^4 \mid \mathcal{I}_{t-1})] = E(z_t^4)\,E[(a_0 + a_1\epsilon_{t-1}^2)^2] = 3\left[a_0^2 + 2a_0 a_1 E(\epsilon_{t-1}^2) + a_1^2 E(\epsilon_{t-1}^4)\right]$$
Properties (1) through (4) also hold for higher order ARCH models.
Example: A simulated ARCH(1) process
(a) White noise: $z_t$; (b) conditional std dev: $\sqrt{h_t}$ ($\sigma_t$); (c) ARCH: $\epsilon_t = \sqrt{h_t}z_t$; (d) AR/ARCH: $y_t = f(y_{t-1}) + \epsilon_t$.
R: 6_SimulatedARCH(1).R
ARCH(q) Process
Consider an ARCH(q) model for $y_t = f(X_{1,t},\dots,X_{p,t}) + \epsilon_t$:

$$\epsilon_t = \sqrt{h_t}\, z_t, \quad z_t \sim \text{IID } N(0,1)$$
$$h_t = a_0 + \sum_{i=1}^{q} a_i\epsilon_{t-i}^2$$

The variable $h_t$ is the conditional variance of $\epsilon_t$.

Let $\mathcal{I}_t$ denote the filtration (information) up to time $t$. Then

$$E(\epsilon_t^2 \mid \mathcal{I}_{t-1}) = E(h_t z_t^2 \mid \mathcal{I}_{t-1}) = h_t E(z_t^2 \mid \mathcal{I}_{t-1}) = h_t$$

Thus, we can write $\epsilon_t \mid \mathcal{I}_{t-1} \sim N(0, h_t)$.
The parameters in the conditional variance equation should satisfy $a_0 > 0$ and $a_i > 0$ for $i = 1, 2, \dots, q$.
The Gaussian assumption for 𝑧𝑡 is not critical. We can relax it and allow for
more heavy-tailed distributions, such as the Student’s t-distribution, as is typically required in finance.
Properties of ARCH(q) Process
The unconditional variance is $\mathrm{var}(\epsilon_t) = \frac{a_0}{1 - a_1 - \dots - a_q}$. A necessary and sufficient condition for weak stationarity of the ARCH(q) process is $0 < \sum_j a_j < 1$.
ARCH models possess some drawbacks in practical applications:
It might be difficult to determine the order of q, the number of lags of the squared residuals in the model.
Due to the structure of the model, only the recent squared 𝜖𝑡−𝑖 affect the current volatility, h𝑡 , i.e. the impact of a large shock lasts only for q periods. This may be unrealistic.
Positive and negative shocks of the same magnitude have the same effect on volatility, i.e. asymmetric (leverage) effects are ignored.
GARCH(1,1) Model
The GARCH(1,1) model is defined as

$$\epsilon_t = \sqrt{h_t}\, z_t$$
$$h_t = a_0 + a_1\epsilon_{t-1}^2 + b_1 h_{t-1}$$

where $a_0 \ge 0$, $a_1 \ge 0$, $b_1 \ge 0$.
GARCH model allows the conditional variance to be modelled by past values of
itself in addition to the past shocks. We obtain for 𝑡 ≥ 2
$$h_t = h_0\prod_{i=1}^{t}\left(b_1 + a_1 z_{t-i}^2\right) + a_0\left[1 + \sum_{k=1}^{t-1}\prod_{i=1}^{k}\left(b_1 + a_1 z_{t-i}^2\right)\right]$$
GARCH(1,1) is equivalent to an ARCH process of infinite order. To define the model, we need the definition of the probability measure for the starting value $h_0$, or the assumption that the system extends infinitely far into the past.
The start of the system at time 0 requires that $h_0 > 0$ and finite with probability one, and that $h_0$ and $\epsilon_0$ are independent. The obtained model $\{(h_t, \epsilon_t)\}_{t=0}^{\infty}$ is the conditional model.
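The recursion starting from $h_0$ is easy to simulate; a minimal Python sketch (the course script 6_SimulatedGARCH(1,1).R does this in R; `simulate_garch11` and the parameter values are my choices):

```python
import math
import random

def simulate_garch11(n, a0=0.05, a1=0.1, b1=0.8, h0=None, seed=7):
    # eps_t = sqrt(h_t) * z_t,  h_t = a0 + a1 * eps_{t-1}^2 + b1 * h_{t-1}
    rng = random.Random(seed)
    # One common choice: start at the unconditional variance a0 / (1 - a1 - b1)
    h_prev = a0 / (1.0 - a1 - b1) if h0 is None else h0
    eps_prev = 0.0
    eps, h = [], []
    for _ in range(n):
        h_t = a0 + a1 * eps_prev ** 2 + b1 * h_prev
        e = math.sqrt(h_t) * rng.gauss(0.0, 1.0)
        eps.append(e)
        h.append(h_t)
        eps_prev, h_prev = e, h_t
    return eps, h

eps, h = simulate_garch11(20_000)
print(sum(e * e for e in eps) / len(eps))  # fluctuates around a0 / (1 - a1 - b1) = 0.5
```

With $b_1$ large relative to $a_1$, simulated volatility moves slowly, producing visible volatility clustering.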
Properties of the GARCH(1,1) Model
$\epsilon_t$ is a nonlinear function with the following properties:

The conditional variance of $\epsilon_t$ is
$$E(\epsilon_t^2 \mid \mathcal{I}_{t-1}) = E(h_t z_t^2 \mid \mathcal{I}_{t-1}) = h_t E(z_t^2 \mid \mathcal{I}_{t-1}) = h_t = a_0 + a_1\epsilon_{t-1}^2 + b_1 h_{t-1}$$

The unconditional variance of $\epsilon_t$ is
$$\mathrm{Var}(\epsilon_t) = E(\epsilon_t^2) = E[E(\epsilon_t^2 \mid \mathcal{I}_{t-1})] = E(a_0 + a_1\epsilon_{t-1}^2 + b_1 h_{t-1}) = a_0 + a_1 E(\epsilon_{t-1}^2) + b_1 E(\epsilon_{t-1}^2) = \frac{a_0}{1 - a_1 - b_1}$$

Weak stationarity: $a_0 > 0$ and $a_1 + b_1 < 1$.

Volatility clustering.

Heavy tails: if $1 - 2a_1^2 - (a_1 + b_1)^2 > 0$, then the kurtosis of $\epsilon_t$ is
$$K_\epsilon = \frac{E(\epsilon_t^4)}{[\mathrm{Var}(\epsilon_t)]^2} = \frac{E(\epsilon_t^4)}{[E(\epsilon_t^2)]^2} = 3\,\frac{1 - (a_1 + b_1)^2}{1 - 2a_1^2 - (a_1 + b_1)^2} > 3$$
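The two closed-form moments above are simple enough to compute directly; a Python sketch (function names are mine):

```python
def garch11_uncond_var(a0, a1, b1):
    # Var(eps_t) = a0 / (1 - a1 - b1); requires a1 + b1 < 1
    assert a1 + b1 < 1.0
    return a0 / (1.0 - a1 - b1)

def garch11_kurtosis(a1, b1):
    # K = 3 * (1 - (a1+b1)^2) / (1 - 2*a1^2 - (a1+b1)^2); denominator must be > 0
    p = a1 + b1
    denom = 1.0 - 2.0 * a1 ** 2 - p ** 2
    assert denom > 0.0
    return 3.0 * (1.0 - p ** 2) / denom

print(garch11_uncond_var(0.05, 0.1, 0.8))  # about 0.5
print(garch11_kurtosis(0.1, 0.8))          # > 3: heavier tails than the normal
```

For a1 = 0.1, b1 = 0.8 the kurtosis is 3 * 0.19 / 0.17, roughly 3.35, so even a moderate ARCH coefficient produces excess kurtosis.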
GARCH(p,q) Process
We consider the GARCH(p, q) process for the time series $\epsilon_t$:

$$\epsilon_t = \sqrt{h_t}\, z_t, \quad z_t \sim \text{IID } N(0,1)$$
$$h_t = a_0 + \sum_{i=1}^{q} a_i\epsilon_{t-i}^2 + \sum_{j=1}^{p} b_j h_{t-j}$$

We assume that the parameters $(a_0, a_1, \dots, a_q, b_1, \dots, b_p)$ are restricted such that $h_t > 0$ for all $t$, which is ensured when $a_0 > 0$, $a_i \ge 0$ for $i = 1, 2, \dots, q$, and $b_j \ge 0$ for $j = 1, 2, \dots, p$. We also assume that the fourth moment of $\epsilon_t$ exists.

Weak stationarity: $a_0 > 0$ and $\sum_{i=1}^{q} a_i + \sum_{j=1}^{p} b_j < 1$.
GARCH(1,1) and ARMA(1,1)
Let us consider the simple 𝐺𝐴𝑅𝐶𝐻(1,1) model:
$$\epsilon_t = \sqrt{h_t}\, z_t, \quad z_t \sim N(0,1)$$
$$h_t = a_0 + a_1\epsilon_{t-1}^2 + b_1 h_{t-1}$$

The GARCH process $\epsilon_t^2$ has the ARMA(1,1) representation

$$\epsilon_t^2 = a_0 + (a_1 + b_1)\epsilon_{t-1}^2 + u_t - b_1 u_{t-1},$$

where $u_t = h_t(z_t^2 - 1)$ is white noise:

$$E(u_t) = E(h_t)\,E(z_t^2 - 1) = 0$$
$$\mathrm{var}(u_t) = E(u_t^2) = E(h_t^2)\,E[(z_t^2 - 1)^2] = 2E(h_t^2) = \text{const}$$
$$\mathrm{cov}(u_t, u_{t+s}) = E[h_t(z_t^2 - 1)\,h_{t+s}(z_{t+s}^2 - 1)] = E[h_t(z_t^2 - 1)h_{t+s}]\,E(z_{t+s}^2 - 1) = 0, \quad s > 0$$

Plugging in the GARCH process, we have

$$\epsilon_t^2 = h_t z_t^2 = u_t + h_t$$
$$= a_0 + a_1\epsilon_{t-1}^2 + b_1 h_{t-1} + u_t$$
$$= a_0 + (a_1 + b_1)\epsilon_{t-1}^2 + b_1\big(h_{t-1} - \epsilon_{t-1}^2\big) + u_t$$
$$= a_0 + (a_1 + b_1)\epsilon_{t-1}^2 + b_1\big(h_{t-1} - h_{t-1}z_{t-1}^2\big) + u_t = a_0 + (a_1 + b_1)\epsilon_{t-1}^2 + u_t - b_1 u_{t-1}$$
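The derivation above is an exact path-by-path identity, so it can be checked numerically on a simulated GARCH(1,1) path (a Python sketch; the simulation setup and parameter values are my choices):

```python
import math
import random

# Simulate a short GARCH(1,1) path and verify the ARMA(1,1) identity
# eps_t^2 = a0 + (a1 + b1) * eps_{t-1}^2 + u_t - b1 * u_{t-1},  u_t = eps_t^2 - h_t
a0, a1, b1 = 0.05, 0.1, 0.8
rng = random.Random(1)
h = a0 / (1 - a1 - b1)
eps = math.sqrt(h) * rng.gauss(0, 1)
hs, es = [h], [eps]
for _ in range(500):
    h = a0 + a1 * eps ** 2 + b1 * h
    eps = math.sqrt(h) * rng.gauss(0, 1)
    hs.append(h)
    es.append(eps)

u = [e * e - ht for e, ht in zip(es, hs)]  # u_t = h_t * (z_t^2 - 1)
max_gap = max(
    abs(es[t] ** 2 - (a0 + (a1 + b1) * es[t - 1] ** 2 + u[t] - b1 * u[t - 1]))
    for t in range(1, len(es))
)
print(max_gap)  # zero up to floating-point rounding
```

The identity holds exactly for every draw of z_t, not just in expectation, which is what makes the ARMA representation useful for deriving the ACF of squared returns.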
GARCH process and ARMA representation
Similarly, rewriting the GARCH(p, q) equation for $h_t$, we obtain an ARMA(r, p) representation for $\epsilon_t^2$:

$$\epsilon_t^2 = a_0 + \sum_{i=1}^{r}(a_i + b_i)\epsilon_{t-i}^2 + u_t - \sum_{j=1}^{p} b_j u_{t-j}$$

where $r = \max(p, q)$ (with $a_i = 0$ for $i > q$ and $b_j = 0$ for $j > p$), and $\{u_t\}$ is a martingale difference series. In compact form, we can write the ARMA representation as

$$\Phi(L)\,\epsilon_t^2 = a_0 + b(L)\,u_t$$

where $L$ is the lag operator, $\Phi(L) = 1 - \sum_{i=1}^{r}\Phi_i L^i$ with $\Phi_i = a_i + b_i$, $r = \max(p, q)$, and $b(L) = 1 - \sum_{j=1}^{p} b_j L^j$.
Sources of excess kurtosis
Sources of excess kurtosis:
Bai, Russell, and Tiao consider the ARMA representation of the GARCH(p, q) process and analyse the relationship between

1. the excess kurtosis of $\epsilon_t$, called the overall kurtosis and denoted by $K_\epsilon$, if it exists,
2. the excess kurtosis of $z_t$, called the IID kurtosis and denoted by $K_z$, and
3. the excess kurtosis of the normal GARCH process, called the GARCH kurtosis and denoted by $K_\epsilon^{(g)}$, if it exists.
Sources of excess kurtosis
Their results show that if $\epsilon_t$ follows the GARCH(p, q) process and satisfies the two additional assumptions (that the $u_t$'s are uncorrelated with zero mean and finite variance, and that the $\{\epsilon_t^2\}$ process is weakly stationary), the following holds:

$$K_\epsilon^{(g)} = \frac{6k_1}{1 - 2k_1}, \quad \text{where } k_1 = \sum_i \Psi_i^2 \text{ and } \Psi(L) = b(L)/\Phi(L)$$

$$K_\epsilon = \frac{K_\epsilon^{(g)} + K_z + (5/6)\,K_\epsilon^{(g)}K_z}{1 - (1/6)\,K_\epsilon^{(g)}K_z}$$

For a normal GARCH(1,1) model, the first expression reduces to

$$K_\epsilon = K_\epsilon^{(g)} = \frac{6a_1^2}{1 - (a_1 + b_1)^2 - 2a_1^2}$$
The normal (Gaussian) GARCH(1,1) model is not capable of matching the large leptokurtosis typically found in the data.
A non-normal GARCH model fits the time-varying volatility well and matches the sample kurtosis much better.
ARIMA($p_A$, $d$, $q_A$)/GARCH($p_G$, $q_G$) Models
(Figure: simulated GARCH(1,1) and AR(1)/GARCH(1,1) processes. Panels: (a) white noise $z_t$; (b) conditional std dev $\sqrt{h_t}$ ($\sigma_t$); (c) GARCH: $\epsilon_t = \sqrt{h_t}z_t$; (d) AR/GARCH: $y_t = f(y_{t-1}) + \epsilon_t$; (e) ACF of the GARCH series; (f) ACF of the GARCH series squared; (g) ACF of the AR/GARCH series; (h) ACF of the AR/GARCH series squared, lags 0–20.)
R: 6_SimulatedGARCH(1,1).R
Estimation of the GARCH Models
The conditional variance is an unobserved variable, which must itself be explicitly estimated along with the parameters of the model.
Engle suggested two possible methods for estimating the parameters of the model, namely the least squares estimator (LSE) and the maximum likelihood estimator (MLE). The LSE is given as

$$\hat{\boldsymbol\theta} = \left(\sum_{t=2}^{T}\boldsymbol\epsilon_{t-1}\boldsymbol\epsilon_{t-1}'\right)^{-1}\sum_{t=2}^{T}\boldsymbol\epsilon_{t-1}\epsilon_t^2$$

where $\boldsymbol\theta = (a_0, a_1, \dots, a_p)'$ and $\boldsymbol\epsilon_t = (1, \epsilon_t^2, \dots, \epsilon_{t-p+1}^2)'$.

The conditional likelihood, $l_t$, of $\epsilon_t$ is

$$l_t = \frac{1}{\sqrt{2\pi h_t}}\exp\left(-\frac{\epsilon_t^2}{2h_t}\right)$$

By iterating the conditioning argument, we obtain

$$f(\epsilon_T, \dots, \epsilon_1 \mid \epsilon_0) = f(\epsilon_T \mid \epsilon_{T-1}, \dots, \epsilon_0)\cdots f(\epsilon_2 \mid \epsilon_1, \epsilon_0)\,f(\epsilon_1 \mid \epsilon_0) = \prod_{t=1}^{T} f(\epsilon_t \mid \mathcal{I}_{t-1})$$
Maximum Likelihood Estimation
The joint likelihood of the entire sample of $T$ observations is

$$L = \prod_{t=1}^{T} l_t$$

and for the log-likelihood we obtain

$$\log f(\epsilon_T, \dots, \epsilon_1 \mid \epsilon_0) = \sum_{t=1}^{T}\log f(\epsilon_t \mid \mathcal{I}_{t-1}) = -\frac{T}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{T}\log h_t - \frac{1}{2}\sum_{t=1}^{T}\frac{\epsilon_t^2}{h_t}$$

The conditional log-likelihood function can thus be written as

$$L(\theta) = -\frac{1}{2}\sum_{t=1}^{T}\left[\log 2\pi + \log h_t(\theta) + \frac{\epsilon_t^2(\theta)}{h_t(\theta)}\right]$$

where $\theta = (a_0, a_1, \dots, a_q, b_1, \dots, b_p)$. The value of $\theta$ which maximizes $L(\theta)$ is referred to as the maximum likelihood estimate (MLE).
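Evaluating the conditional log-likelihood amounts to running the variance recursion and summing the Gaussian terms. A minimal Python sketch of the negative log-likelihood for GARCH(1,1) (`garch11_nll` is my name; initializing $h_1$ at the sample second moment is one common convention, and in practice the function would be handed to a numerical optimizer):

```python
import math

def garch11_nll(params, eps):
    # Negative conditional Gaussian log-likelihood of GARCH(1,1):
    #   -L(theta) = (1/2) * sum_t [log(2*pi) + log(h_t) + eps_t^2 / h_t]
    a0, a1, b1 = params
    if a0 <= 0 or a1 < 0 or b1 < 0:
        return float("inf")
    # Initialize h_1 at the sample second moment of eps (one common choice)
    h = sum(e * e for e in eps) / len(eps)
    nll = 0.0
    eps_prev = None
    for e in eps:
        if eps_prev is not None:
            h = a0 + a1 * eps_prev ** 2 + b1 * h  # variance filter
        nll += 0.5 * (math.log(2 * math.pi) + math.log(h) + e * e / h)
        eps_prev = e
    return nll

print(garch11_nll((0.01, 0.0, 0.0), [0.1, 0.1, 0.1, 0.1]))
```

Minimizing this function over (a0, a1, b1), e.g. with a quasi-Newton routine, gives the (Q)MLE; packages such as fGarch do exactly this internally with their own initialization conventions.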
Newton-Raphson
The MLE of a GARCH model is obtained by numerically maximizing the log- likelihood function using iterative optimization methods like Gauss-Newton or Newton-Raphson.
Starting values for the parameters in $\theta$ and initializations of the two series $\epsilon_t^2$ and $h_t$ need to be specified for iterative ML optimization.
The usual default solution to parameter initializations is to set parameter values in the mean equal to those estimated using a “first pass" LS estimation, and the parameters (except 𝑎0) in the conditional-variance equation to zero.
Differences in parameter initialization of the packages may yield different results.
When the innovation process is not normal or the conditional distribution is not perfectly known, one may still use Gaussian MLE methods, which retain good asymptotic properties (consistency and asymptotic normality). Such estimates are known as pseudo- or quasi-MLE (PMLE or QMLE).
Example: AR(1)/GARCH(1,1) fit to BMW returns
Consider the BMW daily log returns. An AR(1)/GARCH(1,1) model was fit to these returns using R's garchFit function in the fGarch package. Although garchFit allows the white noise to have a non-Gaussian distribution, in this example we specified Gaussian white noise (the default).
The fitted model is

$$r_t - 4.0092\text{e-}04 = 0.098596\,(r_{t-1} - 4.0092\text{e-}04) + \epsilon_t$$
$$\epsilon_t = \sqrt{h_t}\, z_t, \quad z_t \sim N(0,1)$$
$$h_t = 8.9043\text{e-}06 + 0.10210\,\epsilon_{t-1}^2 + 0.85944\,h_{t-1}$$
Data: 6_bmw.txt
R: 6_BMWReturnsFit.R
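The fitted coefficients directly imply a persistence and an unconditional volatility level; a short Python check of the arithmetic (the 252-trading-day annualization is an assumption of this sketch):

```python
import math

# Fitted GARCH(1,1) parameters reported on the slide
a0, a1, b1 = 8.9043e-06, 0.10210, 0.85944

persistence = a1 + b1                   # close to 1: persistent volatility
uncond_var = a0 / (1.0 - persistence)   # implied unconditional daily variance
daily_vol = math.sqrt(uncond_var)
annual_vol = daily_vol * math.sqrt(252)  # assuming 252 trading days per year

print(round(persistence, 4))  # 0.9615
print(round(100 * daily_vol, 2), "% daily")
print(round(100 * annual_vol, 1), "% annualized")
```

A persistence of 0.9615 means that shocks to volatility decay slowly, consistent with the volatility clustering discussed below for these returns.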
Example: AR(1)/GARCH(1,1) fit to BMW returns
In the output, $\phi_1$ is denoted by ar1, the mean $\mu$ by mu, and omega ($\omega$) is the intercept of the GARCH process.

Note that $\hat\phi_1 = 0.0986$ and is statistically significant, implying a small amount of positive autocorrelation. Both $\hat\alpha_1$ and $\hat\beta_1$ are highly significant, and $\hat\beta_1 = 0.859$ implies rather persistent volatility clustering.
There are two additional information criteria reported, SIC (Schwarz's information criterion) and HQIC (Hannan-Quinn information criterion). These are less widely used compared to AIC and BIC and will not be discussed here.
Diagnostic checking
We need to check the adequacy of the fitted model.
The part of the data unexplained by the model (i.e., the residuals $\hat\epsilon_t$ and the standardized residuals $\hat z_t = \hat\epsilon_t/\sqrt{\hat h_t}$) should be small and should not exhibit any systematic or predictable patterns.

Ljung and Box test for the residuals (and standardized residuals):

$$Q = T(T+2)\sum_{k=1}^{m}(T-k)^{-1}\hat\rho_{\epsilon,k}^2$$

McLeod and Li test the joint hypothesis on the ACF of the squared residuals (and squared standardized residuals):

$$H_0:\ \rho_{\epsilon^2,1} = \rho_{\epsilon^2,2} = \dots = \rho_{\epsilon^2,m} = 0$$

by performing a Q test on the squared residuals:

$$Q^2 = T(T+2)\sum_{k=1}^{m}(T-k)^{-1}\hat\rho_{\epsilon^2,k}^2$$

Under the null hypothesis of no autocorrelation, $Q^2$ has a $\chi^2$ distribution with $m - p - q$ degrees of freedom.
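The Q statistic is straightforward to compute from sample autocorrelations; a Python sketch (`acf` and `ljung_box` are my names; R's Box.test provides the same statistic):

```python
def acf(x, m):
    # Sample autocorrelations rho_hat_1, ..., rho_hat_m
    T = len(x)
    xbar = sum(x) / T
    c0 = sum((v - xbar) ** 2 for v in x)
    return [
        sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(T - k)) / c0
        for k in range(1, m + 1)
    ]

def ljung_box(x, m):
    # Q = T * (T + 2) * sum_{k=1}^{m} rho_hat_k^2 / (T - k)
    T = len(x)
    return T * (T + 2) * sum(r * r / (T - k) for k, r in enumerate(acf(x, m), start=1))

# For the McLeod-Li version, pass the squared residuals [e**2 for e in res]
# and compare Q^2 to a chi-square with m - p - q degrees of freedom.
x = [1.0, -1.0] * 4       # strongly alternating series: rho_hat_1 = -7/8
print(ljung_box(x, 1))    # 8 * 10 * (7/8)^2 / 7 = 8.75
```

A large Q relative to the chi-square critical value rejects the null of no autocorrelation, signalling that the fitted model has not absorbed all the dependence.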
Example: BMW returns
In the output, the Ljung-Box tests labeled R are applied to the residuals (here R denotes the residuals, not the R software), while the Ljung-Box tests labeled R^2 are applied to the squared residuals.
Stationarity of ARMA-GARCH models
We have to capture the conditional mean of the data with an adequate model so that the residuals obtained from this model satisfy the assumptions for the white- noise sequence {𝜖𝑡} which enters the conditional variance.
For asset returns, the conditional mean is typically captured by an AR or ARMA
model:
$$y_t = \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=1}^{q} b_i\epsilon_{t-i} + \epsilon_t$$
$$\epsilon_t = \sqrt{h_t}\, z_t, \qquad h_t = \alpha_0 + \sum_{i=1}^{r}\alpha_i\epsilon_{t-i}^2 + \sum_{j=1}^{s}\beta_j h_{t-j}$$
If all the roots of AR polynomial lie outside the unit circle, the ARMA-GARCH process 𝑦𝑡 is strictly stationary if 𝜖𝑡 is strictly stationary.
The parameters of an ARMA-GARCH model can be jointly estimated via MLE. Alternatively, a two-step procedure can be adopted:
1. Estimate the parameters of the conditional mean in the first equation.
2. From the residuals of the first equation, estimate the parameters of the GARCH model.
Variants of the GARCH Model
Empirically relevant variants of the GARCH models are able to accommodate particular features of asset return series.
Conditional non-normality of the error process can better explain the leptokurtosis of the return series. For example, daily percentage change in exchange rates based on 12 different rates for a 10-year period. The “true” distribution has heavier tails than the normal distribution: Extreme events are more common than predicted by the normal distribution.
Asymmetric responses to negative and positive return innovations to model the asymmetry in the reaction of conditional volatility to the arrival of different news.
Long-memory, i.e., variances generated by fractionally integrated processes.
GARCH Model with Student’s t-distributed Innovations
The shortcoming of the GARCH model with Gaussian innovations is that the assumption of conditional normality for {𝜖𝑡} usually does not hold.
The error term or residual $\epsilon_t$ is conditionally normal if the standardized residual $\hat z_t = \hat\epsilon_t/\sqrt{h_t}$ is normally distributed.
The GARCH model is only able to capture partially the leptokurtosis in the
unconditional distribution of asset returns.
We can estimate a GARCH model with the Student's t-distribution or generalized exponential distribution (GED).
GARCH Model with Student’s t-distributed Innovations
The Student's t-distribution, or the standardized t(d) distribution, has only one parameter, $d$, and its density is

$$f_{t(d)}(x; d) = \frac{\Gamma\big((d+1)/2\big)}{\Gamma(d/2)\sqrt{\pi(d-2)}}\left(1 + \frac{x^2}{d-2}\right)^{-(1+d)/2}$$

It is a power function of the random variable $x$, which allows the standardized t(d) distribution to have fatter tails.

This distribution is symmetric around zero. The mean, variance, skewness, and excess kurtosis are 0, 1, 0, and $6/(d-4)$, respectively.

When using the assumption that $z_t \sim t(d)$ in the GARCH model, the estimation can be done by quasi maximum likelihood estimation.
Example: ARMA(1,1)/GARCH(1,1) fit to BMW returns with Student-t distribution
The fitted model is:

$$r_t - 1.736\text{e-}04 = 0.2987\,(r_{t-1} - 1.736\text{e-}04) + \epsilon_t + 0.3690\,\epsilon_{t-1}$$
$$\epsilon_t = \sqrt{h_t}\, z_t, \quad z_t \sim t(4.046)$$
$$h_t = 6.052\text{e-}06 + 0.09292\,\epsilon_{t-1}^2 + 0.8869\,h_{t-1}$$
Example: AR(1)/GARCH(1,1) fit to BMW returns
The Jarque-Bera test of normality strongly rejects the null hypothesis that the innovation process is Gaussian.
Data: 6_bmw.txt
R: 6_BMWReturnsFit.R
Asymmetry in returns (skewness)
Traders react more strongly to negative information than to positive information which leads to asymmetry in rates of return.
Rates of return are slightly negatively skewed.
(Figure: a left-skewed (negatively skewed) distribution; the mean lies to the left of the median.)
To accommodate asymmetric distributions, we may use the generalized hyperbolic distribution that is able to accommodate both fat tails and asymmetry.
Leverage effect: Exponential GARCH model
Since future return volatility tends to respond asymmetrically with respect to negative or positive shocks, the symmetric GARCH model is not appropriate.
To overcome this limitation, a nonlinear exponential GARCH (EGARCH) specification was proposed by Nelson.
The conditional variance $h_t$ is specified as

$$\log h_t = a_0 + \sum_{i=1}^{q} a_i\, g(z_{t-i}) + \sum_{j=1}^{p} b_j \log h_{t-j}$$

where $\epsilon_t = \sqrt{h_t}\, z_t$ and $g(z_t) = \theta z_t + \gamma\big[|z_t| - E|z_t|\big]$ are the weighted innovations that model asymmetric effects between positive and negative asset returns.

The function $g(\cdot)$ can be rewritten as

$$g(z_t) = \begin{cases} (\theta + \gamma)\,z_t - \gamma E|z_t| & \text{if } z_t \ge 0 \\ (\theta - \gamma)\,z_t - \gamma E|z_t| & \text{if } z_t < 0 \end{cases}$$
With this parametrization, if $\theta < 0$ a positive return shock (surprise) will increase volatility less than a negative one of the same magnitude. This phenomenon is referred to as the leverage effect.
Moreover, EGARCH relaxes the constraint of positive GARCH coefficients and guarantees the estimated volatility to be non-negative.
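The asymmetry of the news-impact function $g$ is easy to see numerically; a Python sketch (the parameter values are illustrative choices of mine):

```python
import math

E_ABS_Z = math.sqrt(2.0 / math.pi)  # E|z| for a standard Gaussian z

def g(z, theta, gamma):
    # EGARCH weighted innovation: g(z) = theta*z + gamma*(|z| - E|z|)
    return theta * z + gamma * (abs(z) - E_ABS_Z)

# With theta < 0, a negative shock raises log-volatility more than a
# positive shock of the same magnitude (leverage effect):
theta, gamma = -0.1, 0.2
print(g(-1.0, theta, gamma), ">", g(1.0, theta, gamma))
```

Because $g$ feeds into $\log h_t$, even large negative values of $g$ keep the implied variance $h_t = e^{\log h_t}$ positive, which is the second advantage noted below.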
Exponential GARCH Model
For a standard Gaussian random variable $z_t$, we have $E|z_t| = \sqrt{2/\pi}$. For the standardized Student's t-distribution we have

$$E|z_t| = \frac{2\sqrt{d-2}\,\Gamma\big((d+1)/2\big)}{(d-1)\,\Gamma(d/2)\sqrt{\pi}}$$

where $d$ is the number of degrees of freedom.
In summary, the EGARCH model has two advantages over the symmetric GARCH
specification:
Function 𝑔 enables the model to respond asymmetrically to positive and
negative lagged values of 𝜖𝑡.
Use of the log-conditional variance in EGARCH specification relaxes the constraint of positive model coefficients.
APARCH Models
In some financial time series, large negative returns appear to increase volatility more than do positive returns of the same magnitude. This is called the leverage effect.
The APARCH(p, q) model for the conditional standard deviation is

$$h_t^{\delta} = \omega + \sum_{i=1}^{p}\alpha_i\big(|\epsilon_{t-i}| - \gamma_i\epsilon_{t-i}\big)^{\delta} + \sum_{j=1}^{q}\beta_j h_{t-j}^{\delta}$$
The leverage effect enters through the function $g_{\gamma_i}$, where $g_\gamma(x) = |x| - \gamma x$. When $\gamma > 0$, $g_\gamma(-x) > g_\gamma(x)$ for any $x > 0$, so there is a leverage effect. If $\gamma < 0$, then there is a leverage effect in the opposite direction to what is expected: positive past values of $\epsilon_t$ increase volatility more than negative past values of the same magnitude.
The gamma parameter: when $\gamma > 0$, larger weights are given to negative shocks than to positive ones; the larger the magnitude of $\gamma$, the more asymmetric the model.
Integrated GARCH Model
The estimation of ARCH-type processes on log-return data yields a similar pattern in the results:
For longer samples, the estimated parameters 𝑎1, … , 𝑎𝑞 and 𝑏1, … , 𝑏𝑝 of the model sum up to values that are typically close to one.
For shorter samples, the sum of the estimated coefficients, although not small, stays away from 1.
These two observed facts are known as the integrated GARCH (IGARCH) effect. Engle and Bollerslev introduced the integrated GARCH(p, q) (IGARCH(p, q))
process for which

$$a_0 > 0 \quad \text{and} \quad \sum_{i=1}^{q} a_i + \sum_{j=1}^{p} b_j = 1$$
The IGARCH model has a strictly stationary solution (h𝑡), and therefore {𝜖𝑡} is strictly stationary as well, but 𝜖𝑡’s do not have a finite second moment:
$$E(h_t) = a_0 + \sum_{i=1}^{q} a_i E(\epsilon_{t-i}^2) + \sum_{j=1}^{p} b_j E(h_{t-j}) = a_0 + E(h_t)$$

which has no finite solution.
Integrated GARCH Model
We can also rewrite the GARCH(1,1) model as

$$(1 - a_1 L - b_1 L)\,\epsilon_t^2 = a_0 + (1 - b_1 L)\big(\epsilon_t^2 - h_t\big)$$

If the polynomial $1 - a_1 L - b_1 L$ contains a unit root, i.e. $a_1 + b_1 = 1$, we obtain the IGARCH model of Engle and Bollerslev.
For the GARCH case, integrated is not synonymous with nonstationarity.
The stationary GARCH(1,1) process is not persistent in variance if and only if 𝑎1 + 𝑏1 < 1. The stationary IGARCH(1,1) model with 𝑎1 + 𝑏1 = 1 is persistent in variance so that shocks to conditional variance never die out.
Forecasting with GARCH Models
GARCH models describe the evolution of the conditional variance of $\epsilon_t$, which can be linked with the evolution of $y_t$:

$$\mathrm{Var}(y_t \mid y_{t-1}, y_{t-2}, \dots) = \mathrm{Var}(\epsilon_t \mid \epsilon_{t-1}, \epsilon_{t-2}, \dots)$$

Consider the GARCH(1,1) model and let $t$ be the forecast origin. The 1-step ahead forecast is

$$h_t(1) = h_{t+1} = a_0 + a_1\epsilon_t^2 + b_1 h_t$$

and the 2-step ahead forecast is

$$h_t(2) = E(h_{t+2} \mid \mathcal{I}_t) = a_0 + a_1 E(\epsilon_{t+1}^2 \mid \mathcal{I}_t) + b_1 h_{t+1} = a_0 + (a_1 + b_1)\big(a_0 + a_1\epsilon_t^2 + b_1 h_t\big)$$
$$= a_0\big[1 + (a_1 + b_1)\big] + (a_1 + b_1)\big(a_1\epsilon_t^2 + b_1 h_t\big)$$

This can be extended to the general recursion for the n-step ahead forecast:

$$h_t(n) = a_0 + (a_1 + b_1)\,h_t(n-1), \quad n > 1$$
Forecasting with GARCH Models
For the GARCH(1,1) model, the n-step ahead forecast can be written as

$$h_t(n) = \hat h_{t+n} = \hat a_0\sum_{i=0}^{n-2}\big(\hat a_1 + \hat b_1\big)^i + \big(\hat a_1 + \hat b_1\big)^{n-1}\big(\hat a_1\epsilon_t^2 + \hat b_1 h_t\big)$$

for any $n \ge 2$, where the quantities on the right-hand side are known at time $t$.

For the GARCH(1,1) model, given that $(\hat a_1 + \hat b_1) < 1$, the n-step ahead forecast is

$$\hat h_{t+n} = \hat a_0\,\frac{1 - (\hat a_1 + \hat b_1)^{n-1}}{1 - (\hat a_1 + \hat b_1)} + \big(\hat a_1 + \hat b_1\big)^{n-1}\hat h_{t+1}, \quad n \ge 2$$

As the forecast horizon grows, the long-term prediction tends towards the unconditional variance:

$$\hat h_{t+n} \longrightarrow \frac{\hat a_0}{1 - (\hat a_1 + \hat b_1)}, \quad \text{as } n \to \infty$$
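The forecast recursion is short enough to implement directly; a Python sketch (`garch11_forecast` and the parameter values are my choices):

```python
def garch11_forecast(a0, a1, b1, eps_t, h_t, n):
    # 1-step: h_t(1) = a0 + a1 * eps_t^2 + b1 * h_t
    # n-step: h_t(n) = a0 + (a1 + b1) * h_t(n - 1), for n > 1
    h = a0 + a1 * eps_t ** 2 + b1 * h_t
    for _ in range(n - 1):
        h = a0 + (a1 + b1) * h
    return h

a0, a1, b1 = 0.1, 0.1, 0.8
print(garch11_forecast(a0, a1, b1, eps_t=0.2, h_t=0.15, n=1))    # 1-step forecast
print(garch11_forecast(a0, a1, b1, eps_t=0.2, h_t=0.15, n=500))  # near a0/(1-a1-b1) = 1.0
```

Because a1 + b1 < 1 here, each extra step pulls the forecast geometrically towards the unconditional variance, which is the mean-reversion property stated above.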
Forecast evaluation
To measure forecast accuracy, we use usual summary statistics based directly on the deviation between forecasts and realizations such as the root mean squared error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE):
$$\mathrm{RMSE} = \left[\frac{1}{n}\sum_{\tau=t+1}^{t+n}\big(\hat h_\tau - h_\tau\big)^2\right]^{1/2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{\tau=t+1}^{t+n}\big|\hat h_\tau - h_\tau\big|$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{\tau=t+1}^{t+n}\frac{\big|\hat h_\tau - h_\tau\big|}{h_\tau}$$
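The three criteria are one-liners; a Python sketch (function names are mine):

```python
import math

def rmse(forecast, actual):
    # Root mean squared error of the variance forecasts
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))

def mae(forecast, actual):
    # Mean absolute error
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

def mape(forecast, actual):
    # Mean absolute percentage error (actual values must be nonzero)
    return sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)

h_hat = [0.9, 1.1, 1.0]
h_real = [1.0, 1.0, 1.0]
print(rmse(h_hat, h_real), mae(h_hat, h_real), mape(h_hat, h_real))
```

As the next slides point out, the choice of the "actual" series (squared returns versus realized variance) matters at least as much as the choice of criterion.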
Does Anything Beat a GARCH(1,1)?
A voluminous literature has emerged for modeling the temporal dependencies in financial market volatility at the daily and lower frequencies using ARCH and stochastic volatility type models. Most of these studies find highly significant in-sample parameter estimates and pronounced intertemporal volatility persistence. Meanwhile, when judged by standard forecast evaluation criteria, based on the squared or absolute returns over daily or longer forecast horizons, standard volatility models provide seemingly poor forecasts.
Andersen, T. G., And T. Bollerslev (1998): “Answering the skeptics: Yes, standard volatility models do provide accurate forecasts,” International Economic Review, 39(4), 885–905.
While $E(r_t^2 \mid \mathcal{I}_{t-1}) = h_t E(z_t^2) = h_t$ appears to justify the use of the squared returns over the relevant forecast horizon as a proxy for the ex-post volatility, the squared innovation may yield very noisy measurements due to the idiosyncratic error term $z_t$.
In empirically realistic situations the GARCH models actually produce accurate interdaily forecasts for the latent volatility factor if realized variance is used as benchmark.
Hansen, P. Reinhard, and Asger Lunde (2001): "A comparison of volatility models: Does anything beat a GARCH (1, 1)." Unpublished manuscript. Department of Economics, Brown University.
Performed out-of-sample comparison of 330 different volatility models using daily exchange rate data (DM/$) and IBM stock prices. Interestingly, the best models do not provide a significantly better forecast than the GARCH(1,1) model, while an ARCH(1) model is clearly outperformed.
Illustration: Estimate Monthly Vola
Let $r_t^m$ be the $t$-th month log return and $\{r_{t,i}\}_{i=1}^{T}$ the daily log returns within the $t$-th month:

$$r_t^m = \sum_{i=1}^{T} r_{t,i}$$

We have

$$\mathrm{var}(r_t^m \mid F_{t-1}) = \sum_{i=1}^{T}\mathrm{var}(r_{t,i} \mid F_{t-1}) + 2\sum_{i<j}\mathrm{cov}\big[(r_{t,i}, r_{t,j}) \mid F_{t-1}\big]$$

where $F_{t-1}$ denotes the information available at month $t-1$ (inclusive).

If $r_{t,i}$ is a white noise series, then $\sigma_m^2 = \mathrm{var}(r_t^m \mid F_{t-1}) = T\,\mathrm{var}(r_{t,1}) = T\sigma^2$, and the estimated monthly variance is

$$\tilde\sigma_m^2 = \frac{T}{T-1}\sum_{i=1}^{T}\big(r_{t,i} - \bar r_t\big)^2, \qquad \bar r_t = \frac{1}{T}\sum_{i=1}^{T} r_{t,i}$$

If the sample mean $\bar r_t$ is zero, then

$$\tilde\sigma_m^2 = \frac{T}{T-1}\sum_{i=1}^{T}\big(r_{t,i} - \bar r_t\big)^2 \approx \sum_{i=1}^{T} r_{t,i}^2$$

Applying the same idea to intra-daily log returns, we obtain the Realized Variance $RV_t = \sum_{i=1}^{T} r_{t,i}^2$, with $r_{t,i}$ denoting the $i$-th intra-daily log return on day $t$.
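The realized variance is just the sum of squared intra-daily log returns; a Python sketch with a toy price path (the 78 five-minute intervals roughly match a US trading day, an assumption of this example):

```python
import math

def realized_variance(intraday_prices):
    # RV_t = sum of squared intra-daily log returns on day t
    r = [math.log(b / a) for a, b in zip(intraday_prices, intraday_prices[1:])]
    return sum(x * x for x in r)

# Toy example: log prices moving by +/- 0.1% each 5-minute step
prices = [100.0]
for i in range(78):
    prices.append(prices[-1] * math.exp(0.001 if i % 2 == 0 else -0.001))
print(realized_variance(prices))  # 78 * 0.001^2 = 7.8e-05 up to rounding
```

With clean prices, sampling more finely makes RV a better volatility benchmark; the microstructure-noise slides below explain why this breaks down at very high frequencies.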
Realized variance and high frequency data
Tick by tick data of the IBM share traded at the New York Stock Exchange (NYSE) in the year 2005. Data source: Trades and Quotes (TAQ).
High-frequency (HF) financial data (Intradaily data, tick- by-tick data or transaction data): include transaction information recorded in markets at very high frequencies, e.g. minute, second, millisecond.
Basic features:
Irregular time intervals
Discrete values, e.g. price in multiples of tick size
Large sample size
Multi-dimensional variables, e.g. price, volume,
quotes, etc.
Daily periodic/Diurnal Pattern
Multiple transactions within a single time unit
Leptokurtic or heavy tails
High frequency data
Left: intraday trading intensity for IBM on 11 March 1998. The height of each block represents the number of trades in a 6-minute interval.
Right: multiple transactions within one second. The histogram depicts the occurrence of seconds with 2, 3, 4, or more transactions per second, based on GE data for 2005. The maximum number of transactions per second is 153.
Data cleaning
Data cleaning:
Detect errors in the raw data
Synchronize the unequally spaced
data by interpolation
(Fig: Raw HF stock price of IBM in 2004)
Market Microstructure:
Asynchronous trading can introduce:
lag-1 cross-correlation between stock returns,
lag-1 serial correlation in a portfolio return,
in some situations (mean is nonzero) negative serial correlations of the return series of a single stock.
Bid-ask bounce
Impact of changes in tick size, after-hour trading, etc. Impact of daily price limits (many foreign markets)
Asynchronous Trading
Suppose log returns $r_t$ are IID$(\mu, \sigma^2)$. For each time index $t$, the probability of no trading is $\pi$; that is, $r_t$ is unobserved if there is no trade. The observed log return series $r_t^o$ becomes

$$r_t^o = \begin{cases} 0 & \text{with prob } \pi \\ r_t & \text{with prob } (1-\pi)^2 \\ r_t + r_{t-1} & \text{with prob } (1-\pi)^2\pi \\ \quad\vdots & \quad\vdots \\ \sum_{i=0}^{k} r_{t-i} & \text{with prob } (1-\pi)^2\pi^k \\ \quad\vdots & \quad\vdots \end{cases}$$

One can use this relation to show:

$$E(r_t^o) = \mu,$$
$$\mathrm{var}(r_t^o) = \sigma^2 + \frac{2\pi\mu^2}{1-\pi},$$
$$\mathrm{cov}\big(r_t^o, r_{t-j}^o\big) = -\mu^2\pi^j, \quad j \ge 1.$$
Impact of Microstructure Noise
Let $p_{t,i}$ be the observed $i$-th log price on day $t$. It is contaminated by market microstructure noise $u_{t,i}$:

$$p_{t,i} = p_{t,i}^* + u_{t,i}, \quad i = 1, 2, \dots, n$$

where $p_{t,i}^*$ is the efficient log price following a Brownian motion (a continuous-time stochastic process), and the $u_{t,i}$ are IID normally distributed.

The observed intra-day return and the naive RV are equal to:

$$r_{t,i} = r_{t,i}^* + \varepsilon_{t,i}, \qquad \varepsilon_{t,i} = u_{t,i} - u_{t,i-1}$$
$$RV_t = RV_t^* + 2\sum_{i=1}^{n} r_{t,i}^*\varepsilon_{t,i} + \sum_{i=1}^{n}\varepsilon_{t,i}^2$$
$$E(RV_t) = RV_t^* + 2n\,E\big(u_{t,i}^2\big)$$

When $n$ is large, the naive estimator with the contaminated returns will be dominated by the variance of the noise!
Optimal frequency: Minimizing Mean Squared Error.
Consistent estimators under presence of Microstructure Noise?
Two Scaled Realized Variance (TSRV)
Zhang, Lan, Per A. Mykland, and Yacine Aït-Sahalia. "A tale of two time scales: Determining integrated volatility with noisy high-frequency data." Journal of the American Statistical Association 100.472 (2005): 1394-1411.
Subsampling: allocate the intra-day observations into $Q$ non-overlapping subsamples, each with $n_q$ observations. Compute the RV for each subsample, $RV_t^{(q,n_q)}$, and average them, $\frac{1}{Q}\sum_{q=1}^{Q} RV_t^{(q,n_q)}$, to reduce the impact of microstructure noise.

RV based on returns at the lower sampling frequency:

$$RV_t^{(q,n_q)} = \sum_{i=1}^{n_q} r_{t,i}^2$$

RV based on returns at the higher sampling frequency:

$$RV_t = \sum_{i=1}^{n} r_{t,i}^2$$

TSRV:

$$TSRV_t = \frac{1}{Q}\sum_{q=1}^{Q} RV_t^{(q,n_q)} - \frac{\bar n}{n}\,RV_t$$

where $\bar n = \frac{1}{Q}\sum_{q=1}^{Q} n_q$.
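The slide's formula can be sketched in Python; note this is a simplified illustration, not the exact construction of Zhang, Mykland, and Aït-Sahalia (who use overlapping grids): here each subsample q aggregates Q consecutive tick returns starting at offset q, which is one way to realize the "non-overlapping subsamples" of the slide:

```python
def rv(returns):
    # Naive realized variance: sum of squared returns
    return sum(r * r for r in returns)

def tsrv(returns, Q):
    # Two-scale RV: average of Q slow-scale RVs minus a bias correction
    # using the fast-scale (all-tick) RV, per the slide's formula.
    n = len(returns)
    rv_all = rv(returns)
    slow_rvs, sizes = [], []
    for q in range(Q):
        coarse = []
        i = q
        while i + Q <= n:
            coarse.append(sum(returns[i:i + Q]))  # return over one coarse interval
            i += Q
        slow_rvs.append(rv(coarse))
        sizes.append(len(coarse))
    n_bar = sum(sizes) / Q  # average number of coarse returns per subsample
    return sum(slow_rvs) / Q - (n_bar / n) * rv_all

print(tsrv([0.0] * 20, 4))  # a flat price path gives zero variance
```

The second term subtracts (an estimate of) the noise-induced bias 2nE(u^2) that dominates the all-tick RV at high frequencies.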
Realized Kernel: Hansen, P. R., Lunde, A. (2006b). Realized variance and market microstructure noise (with discussion), Journal of Business and Economic Statistics 24:127–218.
Pre-averaging: Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., & Vetter, M. (2009). Microstructure noise in the continuous case: the pre-averaging approach. Stochastic processes and their applications, 119(7), 2249-2276.
Example: Realized volatility and GARCH
(Daily) variance of IBM for year 2005.
R lab
Run the following code to load the data set Tbrate, which has three variables: the 91-day T-bill rate, the log of real GDP, and the inflation rate. In this lab you will use only the T-bill rate.
data(Tbrate, package = "Ecdat")
library(tseries)
library(fGarch)
# r = the 91-day treasury bill rate
# y = the log of real GDP
# pi = the inflation rate
Tbill = Tbrate[,1]
Del.Tbill = diff(Tbill)
Problem 1 Plot both Tbill and Del.Tbill. Use both time series and ACF plots. Also, perform ADF and KPSS tests on both series. Which series do you think are stationary? Why? What types of heteroscedasticity can you see in the Del.Tbill series?
Data:6_Tbrate.txt
R lab
In the following code, the variable Tbill can be used if you believe that series is stationary. Otherwise, replace Tbill by Del.Tbill. This code will fit an ARMA/GARCH model to the series.
garch.model.Tbill = garchFit(formula = ~ arma(1,0) + garch(1,0), Tbill)
summary(garch.model.Tbill)
garch.model.Tbill@fit$matcoef
Problem 2
a) Which ARMA/GARCH model is being fit? Write down the model using the same parameter names as in the R output.
b) What are the estimates of each of the parameters in the model?
R lab
Next, plot the residuals (ordinary or raw) and standardized residuals in various ways using the code below. The standardized residuals are best for checking the model, but the residuals are useful to see if there are GARCH effects in the series.
res = residuals(garch.model.Tbill)
res_std = res / garch.model.Tbill@sigma.t
par(mfrow=c(2,3))
plot(res)
acf(res)
acf(res^2)
plot(res_std)
acf(res_std)
acf(res_std^2)
R lab
Problem 3
a) Describe what is plotted by acf(res). What, if anything, does the plot tell you about the fit of the model?
b) Describe what is plotted by acf(res^2). What, if anything, does the plot tell you about the fit of the model?
c) Describe what is plotted by acf(res_std^2). What, if anything, does the plot tell you about the fit of the model?
d) What is contained in the variable garch.model.Tbill@sigma.t?
e) Is there anything noteworthy in the plot produced by the code plot(res_std)?
Problem 4 Now find an ARMA/GARCH model for the series del.log.tbill, which we will define as diff(log(Tbill)). Do you see any advantages of working with the differences of the logarithms of the T-bill rate, rather than with the difference of Tbill as was done earlier?
R lab
On Black Monday, the return on the S&P 500 was −22.8%. Ouch! This exercise attempts to answer the question, "What was the conditional probability of a return this small or smaller on Black Monday?" "Conditional" means given the information available the previous trading day. Run the following R code:
library(Ecdat)
library(fGarch)
data(SP500,package="Ecdat")
returnBlMon = SP500$r500[1805]
x = SP500$r500[(1804-2*253+1):1804]
plot(c(x,returnBlMon))
results = garchFit(~arma(1,0)+garch(1,1), data=x, cond.dist="std")
dfhat = as.numeric(results@fit$par[6])
forecast = predict(results,n.ahead=1)
R Lab
The S&P 500 returns are in the data set SP500 in the Ecdat package. The returns are the variable r500. (This is the only variable in this data set.) Black Monday is the 1805th return in this data set. This code fits an AR(1)/GARCH(1,1) model to the last two years of data before Black Monday, assuming 253 trading days/year. The conditional distribution of the white noise is the t-distribution (called "std" in garchFit). The code also plots the returns during these two years and on Black Monday.
From the plot you can see that Black Monday was highly unusual. The parameter estimates are in results@fit$par and the sixth parameter is the degrees of freedom of the t-distribution. The predict function is used to predict one-step ahead, that is, to predict the return on Black Monday; the input variable n.ahead specifies how many days ahead to forecast, so n.ahead=5 would forecast the next five days. The object forecast will contain meanForecast, which is the conditional expected return on Black Monday, meanError, which you should ignore, and standardDeviation, which is the conditional standard deviation of the return on Black Monday.
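The one-step-ahead tail probability asked for in Problem 5 can be cross-checked outside R. Below is a Python sketch of fGarch's pstd, the CDF of the standardized t-distribution (rescaled to unit variance); the forecast mean, standard deviation, and degrees of freedom are placeholder values, not the actual fitted output:

```python
import numpy as np
from scipy.stats import t as t_dist

def pstd(q, mean, sd, nu):
    # CDF of the location-scale t rescaled to unit variance, mirroring fGarch's pstd:
    # if T ~ t_nu, then T / sqrt(nu/(nu-2)) has variance 1.
    z = (q - mean) / sd * np.sqrt(nu / (nu - 2.0))
    return t_dist.cdf(z, df=nu)

# Hypothetical one-step-ahead forecast values (placeholders, for illustration only)
mu_hat, sd_hat, nu_hat = 0.0005, 0.011, 5.0
prob = pstd(-0.228, mu_hat, sd_hat, nu_hat)   # P(return <= -22.8%)
```

Even with these placeholder inputs, the heavy t tails leave the conditional probability of a −22.8% return extremely small, which is the qualitative point of the exercise.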
R lab
Problem 5 Use the information above to calculate the conditional probability of a return less than or equal to −0.228 on Black Monday.
Problem 6 Compute and plot the standardized residuals. Also plot the ACF of the standardized residuals and their squares. Include all three plots with your work. Do the standardized residuals indicate that the AR(1)/GARCH(1,1) model fits adequately?
Problem 7 Would an AR(1)/ARCH(1) model provide an adequate fit? (Warning: If you apply the function summary to an fGarch object, the AIC value reported has been normalized by division by the sample size. You need to multiply by the sample size to get AIC.)
Problem 8 Does an AR(1) model with a Gaussian conditional distribution provide an adequate fit? Use the arima function to fit the AR(1) model. This function only allows a Gaussian conditional distribution.
R lab – Results & Discussions
Problem 1 Which series do you think are stationary? Why? What types of heteroskedasticity can you see in the Del.Tbill series?
The plot of Tbill (below, left) looks non-stationary because it seems not to be mean-reverting. The plot of Del.Tbill (below, right) appears stationary, though there is clearly some volatility clustering and an increase in volatility with time (or with the T-bill rate).
R lab – Results & Discussions
Problem 1 Which series do you think are stationary? Why? What types of heteroskedasticity can you see in the Del.Tbill series?
The ADF test fails to reject the null hypothesis of a unit root (non-stationarity) for Tbill, and the KPSS test rejects the null hypothesis of stationarity for this series. Thus, the two tests agree with each other and with the plot of Tbill that the rate is non-stationary.
> adf.test(Tbill)
Augmented Dickey-Fuller Test
data: Tbill
Dickey-Fuller = -1.925, Lag order = 5, p-value = 0.6075
alternative hypothesis: stationary
> kpss.test(Tbill)
KPSS Test for Level Stationarity
data: Tbill
KPSS Level = 2.777, Truncation lag parameter = 3, p-value = 0.01
For Del.Tbill, the ADF test rejects the null hypothesis of unit-root non-stationarity and the KPSS test fails to reject the null hypothesis of stationarity (the p-value is greater than 0.1). Thus, from both tests and the plot, we conclude that Del.Tbill is stationary.
R lab – Results & Discussions
Problem 2 (a) Which ARMA/GARCH model is being fit? Write down the model using the same parameter names as in the R output.
(b) What are the estimates of each of the parameters in the model?
(a) The model is AR(1) for the conditional mean and ARCH(1) for the conditional variance. The model for the conditional mean is μ + φ(Y_{t−1} − μ), and the model for the conditional variance is ω + α_1 a^2_{t−1}, where a_{t−1} is the time-(t−1) innovation (the deviation of Y_{t−1} from its conditional mean).
(b) The estimates are in the table below. Here ar1 is φ.
Error Analysis:
        Estimate  Std. Error  t value   Pr(>|t|)
mu       0.08350     0.05391    1.549   0.121395
ar1      0.24163     0.07280    3.319   0.000902 ***
omega    0.33816     0.06145    5.503   3.73e-08 ***
alpha1   0.83483     0.24295    3.436   0.000590 ***
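The ARCH(1) variance recursion can be checked by simulation: the simulated series should have unconditional variance ω/(1 − α_1) and positively autocorrelated squares (volatility clustering). A minimal Python sketch with illustrative parameters, not the fitted values from the R output:

```python
import numpy as np

omega, alpha = 0.2, 0.3        # illustrative ARCH(1) parameters
n = 50_000
rng = np.random.default_rng(42)

a = np.zeros(n)
for i in range(1, n):
    sigma2 = omega + alpha * a[i - 1] ** 2      # ARCH(1) conditional variance
    a[i] = np.sqrt(sigma2) * rng.standard_normal()

var_hat = a.var()                               # should be near omega / (1 - alpha)
a2 = a ** 2
acf1 = np.corrcoef(a2[:-1], a2[1:])[0, 1]       # lag-1 ACF of the squared series
```

The series a itself is white noise (uncorrelated), yet a² is autocorrelated; that is exactly the pattern acf(res) versus acf(res^2) is checking for in the lab.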
R lab – Results & Discussions
Problem 3 (a,b,c) Describe what is plotted by acf(res)/acf(res^2)/acf(res_std^2). What, if anything, does the plot tell you about the fit of the model?
(a) acf(res) plots the autocorrelation function of the residuals. The autocorrelations at low lags are not significantly different from 0, which indicates that the AR(1) model for the conditional mean is adequate.
(b) acf(res^2) plots the autocorrelation function of the squared residuals. Several are significant, which shows that there is volatility clustering. However, this does not indicate a problem with the model, since the ARCH(1) specification may be adequate. We need to look at the ACF of the standardized residuals to check.
(c) acf(res_std^2) plots the ACF of the squared standardized residuals. The standardized residuals are the raw residuals divided by the estimated conditional standard deviation. The ACF of the squared standardized residuals does not have significant autocorrelations at low lags and indicates that the ARCH(1) model for the conditional variance is adequate.
R lab – Results & Discussions
Problem 3 (d) What is contained in the variable garch.model.Tbill@sigma.t? (e) Is there anything noteworthy in the plot produced by the code plot(res_std)?
(d) garch.model.Tbill@sigma.t contains the estimates of the conditional standard deviations.
(e) As can be seen in the lower left plot above, the variability of res_std increases with time (as does the Tbill rate itself; see the plot in Problem 1). This problem could perhaps be remedied by using a transformation of Tbill, such as a log transformation.
R lab – Results & Discussions
Problem 4 Do you see any advantages of working with the differences of the logarithms of the T-bill rate, rather than with the difference of Tbill as was done earlier?
In the following plots, the fit of the AR(1)/ARCH(1) model to del.log.tbill shows an inadequacy of the ARCH(1) model for the conditional variance, because there is autocorrelation in the squared standardized residuals.
R lab – Results & Discussions
Problem 4 Do you see any advantages of working with the differences of the logarithms of the T-bill rate, rather than with the difference of Tbill as was done earlier?
However, if we use a GARCH(1,1) model for the conditional variance, then the residual plots look good, as can be seen in the next figure.
The advantage of using a log transformation prior to taking differences is that the log transformation removes a type of heteroscedasticity that GARCH models cannot model adequately, specifically that in the original Tbill series the conditional variance of the difference is proportional to the rate. This behaviour can be seen in the plots in Problem 1.
R lab – Results & Discussions
Problem 4 Do you see any advantages of working with the differences of the logarithms of the T-bill rate, rather than with the difference of Tbill as was done earlier?
The best way to detect heteroscedasticity is to plot absolute residuals. This is done in the following graph. The plots in this graph show that the standardized residuals from AR(1) / ARCH(1) fit to Del.Tbill still have this type of heteroscedasticity (see left plot), but the standardized residuals from AR(1)/GARCH(1,1) fit to del.log.tbill do not (right plot). The R function lowess was used to add smooth curves to the plots. Using lowess helps one see the patterns but is not essential.
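The absolute-residual-versus-level diagnostic can be sketched in Python with statsmodels' lowess smoother. The data here are synthetic: residuals are constructed so their standard deviation grows with the level, mimicking the heteroscedasticity seen in the Del.Tbill fit (this is not the T-bill series):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
level = np.sort(rng.uniform(1.0, 10.0, 400))     # stand-in for the T-bill rate
resid = 0.1 * level * rng.standard_normal(400)   # residual sd grows with level

# Smooth |residual| against the level; a rising curve flags level-dependent
# heteroscedasticity that a GARCH model alone cannot capture.
fit = lowess(np.abs(resid), level, frac=2/3, return_sorted=True)
# fit[:, 0] is the sorted level, fit[:, 1] the smoothed |residual| curve
```

Plotting fit[:, 1] against fit[:, 0] reproduces the red lowess curves in the slides: flat for the log-transformed fit, rising for the untransformed one.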
R lab – Results & Discussions
Problem 4 Do you see any advantages of working with the differences of the logarithms of the T-bill rate, rather than with the difference of Tbill as was done earlier?
Here is the additional R code needed to produce the plot.
garch.model.Tbill_log_11 = garchFit(formula = ~arma(1,0) + garch(1,1), del.log.tbill)
summary(garch.model.Tbill_log_11)
res_log_11 = residuals(garch.model.Tbill_log_11)
res_std_log_11 = res_log_11 / garch.model.Tbill_log_11@sigma.t
par(mfrow=c(1,2))
plot(Tbill[-1], abs(res_std), xlab="Tbill", ylab="abs std residual", main="no log transform")
lines(lowess(Tbill[-1], abs(res_std)), col="red", lwd=3)
plot(Tbill[-1], abs(res_std_log_11), xlab="Tbill", ylab="abs std residual", main="log transform")
lines(lowess(Tbill[-1], abs(res_std_log_11)), col="red", lwd=3)
R lab – Results & Discussions
Problem 5 Use the information above to calculate the conditional probability of a return less than or equal to -0.228 on Black Monday.
> probBlackMonday = pstd(returnBlMon, mean=forecast$meanForecast,
    sd=forecast$standardDeviation, nu=dfhat)
> round(probBlackMonday,7)
[1] 2.11e-05
The conditional probability is 2.11e-05.
R lab – Results & Discussions
Problem 6 Do the standardized residuals indicate that the AR(1)/GARCH(1,1) model fits adequately?
The plots are produced by the R code below. The first ACF plot shows no correlation in the standardized residuals, which indicates that the AR(1) model is suitable for the conditional mean. Moreover, there is no autocorrelation in the squared standardized residuals, so the GARCH(1,1) model is suitable for the conditional variance.
std_res = results@residuals / results@sigma.t
par(mfrow=c(1,3))
plot(std_res)
acf(std_res)
acf(std_res^2)
R lab – Results & Discussions
Problem 7 Would an AR(1)/ARCH(1) model provide an adequate fit?
The AR(1)/GARCH(1,0) model (i.e., AR(1)/ARCH(1)) does provide an adequate fit, though the AR(1)/GARCH(1,1) model's fit is slightly better.
The normalized AIC values are -6.518632 and -6.518026 for the AR(1)/GARCH(1,1) and AR(1)/GARCH(1,0) models, respectively. Multiplying by the sample size, the AIC values are -3298.428 and -3298.121, so the AR(1)/GARCH(1,1) model is only slightly better than the AR(1)/GARCH(1,0) model.
> -6.518632 *length(x) [1] -3298.428
> -6.518026 * length(x) [1] -3298.121
The standardized residual plots for the AR(1)/GARCH(1,0) model do not show any serious problems. However, the lag-2 autocorrelation of the squared standardized residuals of the AR(1)/GARCH(1,0) model hits the test bound, whereas this autocorrelation is much smaller with the AR(1)/GARCH(1,1) model.
R lab – Results & Discussions
Problem 8 Does an AR(1) model with a Gaussian conditional distribution provide an adequate fit?
fitAR1 = arima(x, order=c(1,0,0))
fitAR1
par(mfrow=c(1,3))
residAR1 = residuals(fitAR1)
plot(residAR1)
acf(residAR1)
acf(residAR1^2)

Call:
arima(x = x, order = c(1, 0, 0))

Coefficients:
         ar1  intercept
      0.1275      8e-04
s.e.  0.0455      5e-04

sigma^2 estimated as 9.45e-05: log likelihood = 1626.52, aic = -3247.03

The AIC is -3247.03 and is much larger than the AIC for the AR(1)/GARCH models, so the fit of the AR(1) model with a Gaussian conditional distribution is worse than the fit of the AR(1)/GARCH models. Also, the squared residuals show some autocorrelation, especially at lag 2. The poor fit of the AR(1) model could be due both to the lack of a GARCH effect in the model and to the conditional Gaussian assumption.