3 Stylized facts of asset returns
In this section, we study some statistical properties of asset returns from the viewpoint of exploratory data analysis. In particular, we illustrate some stylized (i.e., commonly observed) properties of asset returns. Then, in Section 3.4, we discuss the geometric Brownian motion (GBM) as a model of asset price.
3.1 Histogram and kernel density estimator
Consider in this subsection the Russell 1000 Index (symbol: ^RUI). It is a capitalization-weighted index consisting of the largest 1000 stocks in the US stock market. Its performance is very similar to that of the S&P 500. In Figure 3.1 we plot the daily log returns (left) and their histogram (right). This data set contains n = 5524 data points.
[Figure 3.1. Left panel: Russell 1000 daily log return. Right panel: Histogram of daily log return.]
Figure 3.1: Daily log returns of Russell 1000 from January 2000 to December 2021.
The histogram provides an overall picture of the data but loses all information about the ordering (time dependence) of the observations. The empirical distribution of the log return resembles that of a continuous distribution on the real line.
The kernel density estimator allows one to compute an estimate of the density from a data set. Let K(·) be a probability density function on R; it is called the kernel. A typical choice of K is the standard normal density. Given data x1, . . . , xn ∈ R and a bandwidth h > 0, the kernel density estimator is
$$
\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i), \qquad K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right). \tag{3.1}
$$
That is, $\hat f_h(x)$ is the sample average of the rescaled kernels centered at each of the data points. If the data points are i.i.d. samples from a continuous distribution, it can be shown under suitable conditions that as n → ∞ the kernel estimator converges (in an appropriate sense) to the true density. Software packages typically choose a default bandwidth h based on asymptotic convergence theory, but h can be manually tuned as well. It is clear from Figure 3.1 that the daily log returns (over the 20-year span) are not i.i.d.[1]; here, we only use the kernel density estimator to visualize the empirical distribution.

[Figure 3.2. Panel title: Kernel density estimate & normal curve.]
Figure 3.2: Kernel density estimate (in blue) of the daily log return of Russell 1000. The red dashed curve plots the normal density where the mean and standard deviation match those of the data.
In Figure 3.2 we plot (in blue) the kernel density estimator of the daily log return data. Also shown (in red) is the normal density
$$
\varphi(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right),
$$
where the mean μ and the standard deviation σ are chosen to match the sample mean and sample standard deviation. From the figure, we see that the empirical distribution of the log return is roughly symmetric about 0. Compared to the normal distribution, it has a higher peak near the origin and thicker tails in both directions. These features are typical for asset returns. In the next subsection we will quantify these observations using the concepts of skewness and kurtosis.
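The estimator (3.1) is short enough to write out by hand. The following is a minimal sketch in R with a Gaussian kernel; the function name `kde`, the simulated series `r` (a stand-in, not the Russell 1000 data) and the evaluation grid are assumptions made for illustration. It also evaluates the matching normal density and cross-checks against R's built-in `density()`.

```r
# Gaussian kernel density estimator (3.1), written out by hand.
kde <- function(x, data, h) {
  # For each evaluation point, average the rescaled kernels K_h(x - x_i).
  sapply(x, function(x0) mean(dnorm((x0 - data) / h) / h))
}

set.seed(1)
# Hypothetical stand-in for a daily log return series.
r <- 0.01 * rt(2000, df = 4)

h <- bw.nrd0(r)                       # R's default rule-of-thumb bandwidth
grid <- seq(-0.1, 0.1, length.out = 401)
f_hat <- kde(grid, r, h)

# Normal density with matching mean and standard deviation.
f_norm <- dnorm(grid, mean = mean(r), sd = sd(r))

# Built-in estimator on the same grid (it uses a binned approximation).
f_builtin <- density(r, bw = h, from = -0.1, to = 0.1, n = 401)$y
max(abs(f_hat - f_builtin))           # small compared with max(f_hat)
```

Plotting `f_hat` and `f_norm` against `grid` gives a picture of the same kind as Figure 3.2. Rerunning with a larger or smaller `h` shows how the bandwidth trades smoothness against detail.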
3.2 Describing a univariate distribution
Even though stock prices take values in a discrete set (due to the tick size which, for US stocks, is $0.01), it is customary to model asset prices and returns using continuous random variables. In this subsection, we review some vocabulary for describing a univariate distribution.
Let X be a real-valued random variable. Its cumulative distribution function (cdf) is given by
$$
F_X(x) = P(X \le x), \quad x \in \mathbb{R}. \tag{3.2}
$$
Then $F_X$ is nondecreasing, right-continuous, and satisfies $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$. If $F_X$ is differentiable, then the distribution of X has a density given by $f_X = F_X'$. Given α ∈ (0, 1), we say that x ∈ R is a 100α%-quantile of the distribution if $F_X(x) = \alpha$. If the distribution has a density which is everywhere positive (i.e., $f_X(x) > 0$ for x ∈ R), then $F_X$ is strictly increasing and so the 100α%-quantile exists uniquely and is given by $x = Q_X(\alpha) := F_X^{-1}(\alpha)$. We call $Q_X$ the quantile function. Here is an example:
[1] More precisely, they are extremely unlikely to be a realization of an i.i.d. process.
Example 3.1 (Laplace distribution). The Laplace (also called double exponential) distribution is a location-scale family with density
$$
f(x;\mu,b) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right), \quad x \in \mathbb{R}.
$$
It is standard to show that its cdf is given by
$$
F(x) = \begin{cases} \frac{1}{2}\exp\left(\frac{x-\mu}{b}\right), & \text{if } x < \mu; \\ 1 - \frac{1}{2}\exp\left(-\frac{x-\mu}{b}\right), & \text{if } x \ge \mu. \end{cases}
$$
Inverting, we see that the quantile function is given by
$$
F^{-1}(\alpha) = \begin{cases} \mu + b\log(2\alpha), & \text{if } \alpha < \frac{1}{2}; \\ \mu - b\log(2 - 2\alpha), & \text{if } \alpha \ge \frac{1}{2}. \end{cases}
$$
The quantile function has important applications in risk management. Here, we state two relevant definitions.

Definition 3.2 (Value at risk). Let X be a random variable interpreted as the monetary loss of a portfolio over some specified horizon.[2] The value at risk (VaR) at level α ∈ (0, 1) is the (1 − α)-quantile of X:
$$
\mathrm{VaR}_\alpha(X) = Q_X(1-\alpha) = F_X^{-1}(1-\alpha), \tag{3.6}
$$
provided $Q_X$ is well-defined.

[2] Thus loss is positive and profit is negative.

In words, with probability 1 − α the potential loss of the portfolio is less than or equal to $\mathrm{VaR}_\alpha(X)$. Value at risk is often used to monitor the risk of a portfolio or a firm. However, value at risk has its limitations. In particular, it neglects the tail of the distribution beyond the quantile. A theoretically more sound quantity is the expected shortfall, which is also called the conditional value at risk.

Definition 3.3 (Expected shortfall). Consider the context of Definition 3.2. The expected shortfall of X at level α ∈ (0, 1) is defined by
$$
\mathrm{ES}_\alpha(X) = E[X \mid X \ge \mathrm{VaR}_\alpha(X)]. \tag{3.7}
$$
Next we consider descriptive quantities based on the moments of the distribution. If E[|X|] < ∞, then the mean (or expected value) of X exists finitely and is given by
$$
\mu = E[X]. \tag{3.8}
$$
For k = 0, 1, ..., the centered k-th moment of X is defined by
$$
m_k = E[(X-\mu)^k], \tag{3.9}
$$
provided $E[|X-\mu|^k] < \infty$. Otherwise, we say that $m_k$ does not exist. If X admits a density $f_X$, then
$$
m_k = \int_{-\infty}^{\infty} (x-\mu)^k f_X(x)\,dx.
$$
By construction, we have $m_1 = 0$. When it exists, $m_2$ is the variance of X:
$$
\sigma^2 = \mathrm{Var}(X) = m_2 = E[(X-\mu)^2].
$$
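To make Example 3.1 and Definitions 3.2 and 3.3 concrete, here is a small sketch in R. The function names (`plaplace`, `qlaplace`, `var_laplace`, `empirical_var`, `empirical_es`) and the simulated losses are illustrative assumptions, not part of any package used in these notes. The empirical VaR is a sample quantile, and the empirical expected shortfall is the average of the losses at or beyond it.

```r
# Laplace cdf and quantile function from Example 3.1.
plaplace <- function(x, mu = 0, b = 1) {
  ifelse(x < mu, 0.5 * exp((x - mu) / b), 1 - 0.5 * exp(-(x - mu) / b))
}
qlaplace <- function(alpha, mu = 0, b = 1) {
  ifelse(alpha < 0.5, mu + b * log(2 * alpha), mu - b * log(2 - 2 * alpha))
}

# Value at risk (3.6): the (1 - alpha)-quantile of the loss distribution.
var_laplace <- function(alpha, mu = 0, b = 1) qlaplace(1 - alpha, mu, b)

# Empirical VaR and expected shortfall (3.7) from a sample of losses.
empirical_var <- function(loss, alpha) unname(quantile(loss, 1 - alpha))
empirical_es <- function(loss, alpha) {
  v <- empirical_var(loss, alpha)
  mean(loss[loss >= v])
}

set.seed(1)
# Hypothetical losses: Laplace(0, 1) draws via the inverse-cdf method.
loss <- qlaplace(runif(1e5))
alpha <- 0.05
c(var_exact = var_laplace(alpha),
  var_emp   = empirical_var(loss, alpha),
  es_emp    = empirical_es(loss, alpha))
```

For the Laplace distribution the tail beyond any point x ≥ μ is exponential with mean b, so in this example the expected shortfall exceeds the value at risk by b = 1. This illustrates how ES accounts for the tail that VaR ignores.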
It is well known that if X is normally distributed, then all moments of X exist finitely. In fact, if X ∼ N(μ, σ²) then
$$
E[e^{tX}] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}, \quad t \in \mathbb{R}. \tag{3.10}
$$
The normalized centered third moment is the skewness of X:
$$
S = E\left[\frac{(X-\mu)^3}{\sigma^3}\right] = \frac{m_3}{\sigma^3}. \tag{3.11}
$$
If the distribution of X is symmetric about μ (for example, when $f_X(\mu + x) = f_X(\mu - x)$), then S = 0. Intuitively, S > 0 (resp. S < 0) indicates that the right tail of the distribution is thicker (resp. thinner) than the left tail.
The normalized centered fourth moment is called the kurtosis:
$$
K = E\left[\frac{(X-\mu)^4}{\sigma^4}\right] = \frac{m_4}{\sigma^4}. \tag{3.12}
$$
Intuitively, since $(x-\mu)^4$ is non-negative and increases rapidly as x moves away from μ, K measures the thickness of the tails of the distribution. The quantity K − 3 is often called the excess kurtosis because of the following result:
Example 3.4 (Normal distribution). If X ∼ N(μ, σ²) is normally distributed, then E[X] = μ, Var(X) = σ², S = 0, and K = 3. One way to show this is to differentiate repeatedly the moment generating function (3.10) and then set t = 0.
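The claim in Example 3.4 can be checked with a short calculation. As a sketch, write Y = X − μ for the centered variable (a notational shortcut introduced here) and expand its moment generating function, which follows from (3.10):

$$
E[e^{tY}] = e^{-\mu t}\,E[e^{tX}] = e^{\frac{1}{2}\sigma^2 t^2} = 1 + \frac{\sigma^2 t^2}{2} + \frac{\sigma^4 t^4}{8} + \cdots
$$

Comparing with the Taylor expansion $E[e^{tY}] = \sum_{k \ge 0} E[Y^k]\, t^k / k!$ gives $E[Y] = 0$, $E[Y^2] = \sigma^2$, $E[Y^3] = 0$ and $E[Y^4] = 4! \cdot \sigma^4 / 8 = 3\sigma^4$. Hence $S = m_3/\sigma^3 = 0$ and $K = m_4/\sigma^4 = 3$.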
The normal distribution is often taken as a benchmark distribution for comparison purposes. A distribution with kurtosis greater than 3 is said to be leptokurtic. The empirical distributions of asset returns are often leptokurtic as in Figure 3.2. Indeed, there is evidence that some higher moments of asset returns are infinite. Financially, leptokurticity means that “extreme events” are much more probable than predicted by the normal distribution.
A distribution which is commonly used to model heavy tails is the t-distribution:
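Example 3.5 (t-distribution). Let ν > 0, μ ∈ R and σ > 0. The t-distribution with ν degrees of freedom, location parameter μ and scale parameter σ has density on R
$$
p(x;\nu,\mu,\sigma) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma}\left(1 + \frac{1}{\nu}\frac{(x-\mu)^2}{\sigma^2}\right)^{-\frac{\nu+1}{2}}. \tag{3.13}
$$
The mean exists (and is μ) if and only if ν > 1. The variance is finite (and is $\sigma^2 \frac{\nu}{\nu-2}$) if and only if ν > 2. The skewness exists (and is 0) if and only if ν > 3. Last but not least, the kurtosis exists (and is $3 + \frac{6}{\nu-4}$) if and only if ν > 4.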
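The density in Example 3.5 can be evaluated in R by rescaling the standard t density `dt()`. The sketch below compares that with a direct transcription of the formula and checks the variance numerically. The helper names `dt_ls` and `dt_ls_explicit` and the parameter values are assumptions made for illustration.

```r
# Location-scale t density obtained by rescaling R's standard t density.
dt_ls <- function(x, nu, mu = 0, sigma = 1) {
  dt((x - mu) / sigma, df = nu) / sigma
}

# The same density written out from the formula, computed on the log scale.
dt_ls_explicit <- function(x, nu, mu = 0, sigma = 1) {
  log_const <- lgamma((nu + 1) / 2) - lgamma(nu / 2) -
    0.5 * log(pi * nu) - log(sigma)
  exp(log_const - (nu + 1) / 2 * log1p(((x - mu) / sigma)^2 / nu))
}

nu <- 5; mu <- 0.001; sigma <- 0.01   # hypothetical parameter values
x <- seq(-0.05, 0.05, by = 0.01)
max(abs(dt_ls(x, nu, mu, sigma) - dt_ls_explicit(x, nu, mu, sigma)))

# Variance check on the standardized scale: the integral of z^2 * dt(z, nu)
# should be close to nu / (nu - 2) when nu > 2.
v_std <- integrate(function(z) z^2 * dt(z, df = nu), -Inf, Inf)$value
c(numerical = sigma^2 * v_std, formula = sigma^2 * nu / (nu - 2))
```

Setting `nu` to a value at or below 2 makes the integrand in the variance check non-integrable, which is the numerical counterpart of the variance not being finite.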
Given data points $x_1, x_2, \ldots, x_n$, we define the sample mean by
$$
\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
$$
the sample variance by
$$
\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat\mu)^2,
$$
and the sample skewness and kurtosis respectively by
$$
\hat S = \frac{1}{(n-1)\hat\sigma^3}\sum_{i=1}^{n} (x_i - \hat\mu)^3, \qquad \hat K = \frac{1}{(n-1)\hat\sigma^4}\sum_{i=1}^{n} (x_i - \hat\mu)^4.
$$
The sample excess kurtosis is $\hat K - 3$. (Whether one uses n − 1 or n does not matter when n is sufficiently large.) For a given data set, the above quantities can be regarded as summary statistics. If $x_1, \ldots, x_n$ are i.i.d. samples from a distribution, then they define consistent estimates of the population quantities. (Recall that an estimator $\hat\theta_n$ (where n denotes the sample size) of θ is said to be consistent if $\hat\theta_n$ converges in probability to θ as n → ∞.)
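The sample quantities above translate directly into R. The following sketch uses the n − 1 normalization from the text; the function names `sample_skewness` and `sample_kurtosis` are illustrative, and the data are a simulated normal sample rather than real returns.

```r
# Sample skewness and kurtosis with the n - 1 normalization used in the text.
sample_skewness <- function(x) {
  n <- length(x); m <- mean(x); s <- sd(x)
  sum((x - m)^3) / ((n - 1) * s^3)
}
sample_kurtosis <- function(x) {
  n <- length(x); m <- mean(x); s <- sd(x)
  sum((x - m)^4) / ((n - 1) * s^4)
}

set.seed(1)
# Hypothetical i.i.d. normal sample: we expect S near 0 and K near 3.
z <- rnorm(1e5)
c(mean = mean(z), var = var(z),
  skewness = sample_skewness(z), kurtosis = sample_kurtosis(z))
```

Replacing `z` by a heavy-tailed sample, for instance `rt(1e5, df = 5)`, should push the sample kurtosis well above 3.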
We illustrate some of the above concepts with an example. More empirical results will be shown in the next subsection.
Example 3.6. Consider the daily log return data of Russell 1000 considered in Section 3.1. We have
$$
\hat\mu = 0.00022, \quad \hat\sigma^2 = 0.00016, \quad \hat S = -0.44, \quad \hat K = 11.
$$
Thus the data is slightly negatively skewed and is leptokurtic with excess kurtosis about 8.
Let us fit a t-distribution to the data using maximum likelihood estimation. If the return series is represented by a vector x, the maximum likelihood estimator (MLE) can be computed using the following code:

```r
library(MASS)  # provides fitdistr()

# Fit a location-scale t-distribution (m = location, s = scale, df = degrees of freedom).
t_fit <- fitdistr(x, "t",
                  start = list(m = mean(x), s = sd(x), df = 3),  # initial guess
                  lower = c(-1, 0.001, 1))                        # lower bounds for (m, s, df)
```

The additional parameters in fitdistr() provide an initial guess and lower bounds of the parameters; with them the optimization algorithm converges. The estimates of the parameters are
$$
\hat\mu = 0.00070\ (0.00012), \quad \hat\sigma = 0.0073\ (0.00014), \quad \hat\nu = 3.00\ (0.16),
$$
where the standard errors are shown in parentheses. Since the data is unlikely to be i.i.d. (as shown by the low and high volatility periods in Figure 3.1), the standard errors are not so meaningful and are only shown for completeness. Maximum likelihood estimation fits the return data with a t-distribution with ν = 3 degrees of freedom. Note that under this fitted t-distribution the variance is finite but the skewness and kurtosis do not exist. In Figure 3.3 we plot the density of the fitted t-distribution together with the kernel density estimate and the normal fit. We see that the t-distribution provides a much better fit.

[Figure 3.3. Panel title: Fit with t- and normal distributions.]
Figure 3.3: Blue solid: The kernel density estimate as in Figure 3.2. Red dashed: Normal density. Purple dotted: t-distribution with parameters given by the maximum likelihood estimates. The t-distribution (here ν = 3) provides a better fit than the normal distribution.

We end this section with a statistical test of normality. Consider the null hypothesis that $X_1, X_2, \ldots$ are i.i.d. observations from a normal distribution N(μ, σ²) for some μ ∈ R and σ > 0. Recall that the normal distribution has skewness 0 and kurtosis 3. This motivates the Jarque-Bera test which adopts the test statistic
$$
JB = \frac{\hat S^2}{6/n} + \frac{(\hat K - 3)^2}{24/n}, \tag{3.17}
$$
where n is the sample size. Under the null hypothesis, the JB statistic is asymptotically (this means as n → ∞) distributed as a chi-squared random variable with 2 degrees of freedom (we denote the chi-squared distribution with ν degrees of freedom by $\chi^2_\nu$). Thus a large value of JB suggests departure from normality. Given a significance level α, say 0.01, one rejects the null hypothesis at level α if the p-value of the JB statistic is less than α, or equivalently if $JB > z_0$, where $z_0$ is the (1 − α)-quantile of the distribution $\chi^2_2$.

For the Russell 1000 daily log return data discussed above, the value of the JB statistic is 27834, and the p-value is less than $2.2 \times 10^{-16}$. This practically eliminates the possibility that the daily log returns are i.i.d. samples from a normal distribution (this is not surprising given the fat tails and the fact that the data set spans over 20 years). Nevertheless, the normal distribution and associated models such as geometric Brownian motion still have many applications in financial modelling, especially over shorter time horizons and lower time frequencies.
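The Jarque-Bera statistic (3.17) can be computed directly from the sample skewness and kurtosis. The sketch below is a hand-rolled version, not a library routine: the function name `jarque_bera` and the two simulated samples are assumptions for illustration. The p-value uses the asymptotic chi-squared distribution with 2 degrees of freedom.

```r
# Jarque-Bera statistic (3.17) and its asymptotic chi-squared(2) p-value.
jarque_bera <- function(x) {
  n <- length(x); m <- mean(x); s <- sd(x)
  S <- sum((x - m)^3) / ((n - 1) * s^3)   # sample skewness
  K <- sum((x - m)^4) / ((n - 1) * s^4)   # sample kurtosis
  jb <- n * (S^2 / 6 + (K - 3)^2 / 24)
  c(statistic = jb, p.value = pchisq(jb, df = 2, lower.tail = FALSE))
}

set.seed(1)
jb_normal <- jarque_bera(rnorm(5000))        # hypothetical normal sample
jb_heavy  <- jarque_bera(rt(5000, df = 4))   # hypothetical heavy-tailed sample
rbind(normal = jb_normal, heavy = jb_heavy)

# Threshold for rejecting when the p-value is below 0.01:
# the 0.99-quantile of the chi-squared distribution with 2 degrees of freedom.
z0 <- qchisq(0.99, df = 2)
z0
```

For the heavy-tailed sample the statistic should lie far above `z0`, mirroring the very large value reported for the Russell 1000 data.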
3.3 Stylized facts of asset returns
Some statistical properties observed in the previous subsection, such as fat tails, are common across a wide range of assets. In this subsection we illustrate these and other stylized facts for the following four indices and stocks:
FTSE 100 Index (^FTSE)
SSE Composite Index (000001.SS)
Walmart (WMT)
MS
The sample period is 2000–01–01 to 2021–12–16. For more discussion and other stylized facts, see the paper Cont (2001). We emphasize that the stylized facts are empirical patterns that seem to occur frequently and generally; for a given sample they may or may not hold.
For each asset, we consider the log return at daily, weekly and monthly frequencies. For each return series we report three summary statistics, namely the sample skewness (S), sample excess kurtosis (Ke), and the Jarque-Bera statistic (JB). Note that there are missing values in some of the price series. Those values (which are rare) are simply removed in the analysis (this means that a few of the returns are aggregated over two or more periods). The results are shown in Table 1.
We make the following observations:
Fact 1. Heavy tails.
At all frequencies and for all assets considered, the (sample) kurtosis is larger than 3 (i.e., the excess kurtosis is positive). (In the table, the p-values of all JB statistics are less than 0.01, implying statistically significant departure from normality.)

Frequency  Statistic  FTSE     SSE      WMT      MS
Daily      S          −0.335   −0.365   0.17     1.19
Daily      Ke         7.91     5.15     7.79     46.9
Daily      JB         14500    6000     14000
Weekly     S          −1.25    −0.201   −0.136   −1.11
Weekly     Ke         12.4     2.41     3.71     2.17
Weekly     JB         7600     277      662      87100
Monthly    S          −0.721   −0.537   −0.456   −0.944
Monthly    Ke         1.22     2.17     1.34     3.40
Monthly    JB         39.2     64.2     28.9     166

Table 1: Summary statistics of the log return of the 4 assets at daily, weekly and monthly frequencies.
Fact 2. Gain/loss asymmetry.
Large negative values are more likely to occur than large positive values. This is reflected by the fact that most skewness values in the table are negative.
Fact 3. Aggregational Gaussianity.
Recall that the log return is additive over time. As the frequency decreases, it appears that the empirical distribution of the (log) return becomes closer to the normal distribution. In Table 1, we observe that for each asset, the value of the JB statistic decreases as the frequency decreases. In particular, the excess kurtosis decreases. In Figure 3.4 we illustrate this phenomenon graphically for Walmart.
[Figure 3.4. Panel titles: Daily log return; Weekly log return; Monthly log return.]
Figure 3.4: Kernel density estimates and normal fits for the Walmart data, at daily, weekly and monthly frequencies. It appears that the distribution of the log return looks more and more like a normal distribution as the frequency decreases.
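The aggregation effect described above can be reproduced on simulated data. In the sketch below, the daily log returns are hypothetical i.i.d. Laplace draws (excess kurtosis 3), and the block lengths of 5 and 21 trading days per week and month are assumptions. The helper names `excess_kurtosis` and `aggregate_returns` are illustrative. Because log returns are additive, weekly and monthly returns are block sums of daily returns.

```r
excess_kurtosis <- function(x) {
  n <- length(x); s <- sd(x)
  sum((x - mean(x))^4) / ((n - 1) * s^4) - 3
}

# Sum consecutive blocks of k returns (log returns are additive over time).
aggregate_returns <- function(r, k) {
  n_blocks <- length(r) %/% k
  colSums(matrix(r[seq_len(n_blocks * k)], nrow = k))
}

set.seed(1)
# Hypothetical daily log returns: the difference of two exponentials is Laplace.
daily <- 0.01 * (rexp(210000) - rexp(210000))
weekly <- aggregate_returns(daily, 5)     # assumes 5 trading days per week
monthly <- aggregate_returns(daily, 21)   # assumes 21 trading days per month

ek <- sapply(list(daily = daily, weekly = weekly, monthly = monthly),
             excess_kurtosis)
ek
```

For i.i.d. returns with a finite fourth moment, the excess kurtosis of a sum of k returns is 1/k times that of a single return, so the decrease here is purely a central limit effect. Real returns are not i.i.d., so this sketch only illustrates one mechanism behind the pattern in Table 1.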
The previous properties depend on the empirical distribution which does not take into account the temporal structure. The next properties are related to the autocorrelation of the returns. Some concepts from time series analysis will be reviewed in Section 4.
Fact 4. Absence of (linear) autocorrelations.
The returns over successive periods are not strongly correlated. Here, we only show the lag-1 (sample) correlation between $r_t$ and $r_{t-1}$. In Table 2, we see that the correlations are generally close to 0 for all assets and for all frequencies. (Note that correlation may become significantly nonzero at high (intraday) frequencies due to microstructure effects.) Zero correlation means that it is not possible to predict future stock returns using linear predictors based on previous returns, but zero correlation does not imply independence.
Frequency  Series               FTSE     SSE     WMT      MS
Daily      log return           −0.038   0.021   −0.048   −0.012
Daily      absolute log return  0.27     0.18    0.27     0.37
Weekly     log return           −0.06    0.05    −0.17    −0.20
Weekly     absolute log return  0.19     0.19    0.24     0.40
Monthly    log return           0.01     0.11    −0.09    0.05
Monthly    absolute log return  0.20     0.13    0.01     0.26

Table 2: Correlation between $r_t$ and $r_{t-1}$, and correlation between $|r_t|$ and $|r_{t-1}|$, for the 4 assets and at different frequencies.
Fact 5. Volatility clustering.
Volatility clustering refers to the phenomenon that periods of high volatility (as well as low volatility) tend to occur in clusters. For example, in Figure 2.2, we see that there are high volatility periods during the financial crisis in 2008 and the COVID crisis in 2020. One way to quantify this is to examine $\mathrm{Cov}(|r_t|, |r_{t-1}|)$ or more generally $\mathrm{Cov}(|r_t|^\alpha, |r_{t-1}|^\alpha)$, where α > 0 is a constant. We expect that the empirical correlations are generally positive, meaning that a large return (in magnitude) is usually followed by another large return. In Table 2 we also show the empirical correlation between $|r_t|$ and $|r_{t-1}|$. Note that the correlations are generally positive. One way to capture this phenomenon is to use the ARCH and GARCH models.
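Facts 4 and 5 can be seen together on a simulated series. The sketch below generates returns from an ARCH(1) recursion, in which the conditional variance depends on the previous squared return. The parameter values `omega` and `a`, the helper `lag1_cor` and the sample size are assumptions made for illustration. It then computes the lag-1 correlations of the returns and of their absolute values.

```r
# Lag-1 sample correlation between x_t and x_{t-1}.
lag1_cor <- function(x) cor(x[-1], x[-length(x)])

set.seed(1)
n <- 20000
omega <- 1e-5; a <- 0.5       # hypothetical ARCH(1) parameters
r <- numeric(n)
sigma2 <- omega / (1 - a)     # start at the unconditional variance
for (i in seq_len(n)) {
  r[i] <- sqrt(sigma2) * rnorm(1)
  sigma2 <- omega + a * r[i]^2   # conditional variance for the next period
}

c(returns = lag1_cor(r), abs_returns = lag1_cor(abs(r)))
```

The returns themselves should be nearly uncorrelated, while their absolute values are positively correlated. The series is therefore uncorrelated but not independent, which is the distinction made under Fact 4.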
3.4 Geometric Brownian motion
In this subsection, we review the geometric Brownian motion (GBM) and consider its estimation.
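Definition 3.7 (Brownian motion). A continuous stochastic process $(W_t)_{t \ge 0}$ is a (standard) Brownian motion if $W_0 = 0$ and for any $0 \le t_0 < t_1 < \cdots < t_n$, the increments $W_{t_i} - W_{t_{i-1}}$, i = 1, ..., n, are independent and $W_{t_i} - W_{t_{i-1}} \sim N(0, t_i - t_{i-1})$.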
If $S_t = S_0 e^{\gamma t + \sigma W_t}$ is a geometric Brownian motion, we call γ the growth rate and σ the volatility. It is also of interest to consider the parameter $\mu = \gamma + \frac{1}{2}\sigma^2$ due to the representation of S by stochastic differential equations (SDEs):
$$
dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \iff d\log S_t = \gamma\,dt + \sigma\,dW_t.
$$
The equivalence of the two representations follows immediately from Itô's formula. Note that γ is called the growth rate because we have the following law of large numbers:
$$
\lim_{t \to \infty} \frac{1}{t}\log\frac{S_t}{S_0} = \gamma, \quad \text{almost surely.}
$$
Suppose we model the price $S_t$ of an asset (e.g. a market index) by a geometric Brownian motion, and we record the log return $r_i = \log S_{t_i} - \log S_{t_{i-1}}$ over a uniform time grid, i.e., $t_i - t_{i-1} = h > 0$ is independent of i. Then the log returns $r_1, r_2, \ldots$ are i.i.d. normal random variables:
$$
r_i \sim N(h\gamma, h\sigma^2). \tag{3.18}
$$
Consequently, to test whether the GBM is an adequate model, we can use e.g. the Jarque-Bera test discussed in Section 3.2. We saw there that the i.i.d. Gaussian assumption is usually not adequate. Nevertheless, the GBM is widely applied in practice due to its tractability.
Consider estimation of the GBM model. Assume we observe n data points r1, . . . , rn from the model (3.18) (for simplicity we assume the time points form a uniform grid). Then, we may estimate hγ by the sample mean
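$$
\bar r_n = \frac{1}{n}\sum_{i=1}^{n} r_i.
$$
Thus we estimate the growth rate γ by
$$
\hat\gamma = \frac{1}{h}\bar r_n.
$$
Since $r_i = \log S_{t_i} - \log S_{t_{i-1}}$, we have
$$
\hat\gamma = \frac{1}{hn}\sum_{i=1}^{n}\left(\log S_{t_i} - \log S_{t_{i-1}}\right) = \frac{1}{t_n - t_0}\log\frac{S_{t_n}}{S_{t_0}}. \tag{3.19}
$$
Note that the estimator of the growth rate parameter depends only on the initial and final values of S. We can estimate σ² by
$$
\hat\sigma^2 = \frac{1}{h}s_n^2 := \frac{1}{h(n-1)}\sum_{i=1}^{n}(r_i - \bar r_n)^2,
$$
where $s_n^2$ is the sample variance of the log return.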
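The estimators above are easy to try on a simulated path. The following sketch draws log returns from the model (3.18) and computes the estimates of the growth rate and the volatility. All numerical values (`gamma`, `sigma`, 252 trading days per year, a 20-year sample, `S0`) are hypothetical choices for illustration.

```r
set.seed(1)
gamma <- 0.05; sigma <- 0.2   # hypothetical annualized growth rate and volatility
h <- 1 / 252                  # daily sampling, assuming 252 trading days per year
n <- 252 * 20                 # 20 years of observations
S0 <- 100

# Under GBM the log returns are i.i.d. N(h * gamma, h * sigma^2), see (3.18).
r <- rnorm(n, mean = h * gamma, sd = sqrt(h) * sigma)
S <- S0 * exp(cumsum(r))      # simulated prices at t_1, ..., t_n

gamma_hat <- mean(r) / h      # growth rate estimate
sigma2_hat <- var(r) / h      # estimate of sigma^2

# (3.19): the growth rate estimate only uses the initial and final prices.
gamma_hat_endpoints <- log(S[n] / S0) / (n * h)

c(gamma_hat = gamma_hat, endpoints = gamma_hat_endpoints,
  sigma_hat = sqrt(sigma2_hat))
```

In runs of this kind the volatility is typically recovered quite accurately, while the growth rate estimate can be far from the true value even with 20 years of daily data, since it depends only on the two endpoint prices.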
Next consider the estimation errors and how they relate to the sampling interval h. Let $T = t_n - t_0$ be the (fixed) length of the sample period (so h = T/n). From (3.19), we have
$$
\hat\gamma \sim N\left(\gamma, \frac{\sigma^2}{T}\right).
$$
So $\hat\gamma$ is unbiased, i.e., $E[\hat\gamma] = \gamma$. Also, the standard error (standard deviation) of $\hat\gamma$ is $\sigma/\sqrt{T}$, which decays like the inverse square root of T and does not depend on the sampling interval h. Since $r_1, \ldots, r_n$ are i.i.d., it is a standard
result that
is distributed as the t-distr