The Covariance Matrix
The mathematical theme this week is the covariance matrix. We will study the following topics
• Understandingthedefinitionofthemultivariatenormaldistribution.
• Simulating samples from a multivariate normal distribution using Cholesky decomposition. • The Markowitz model for portfolio selection
Copyright By PowCoder代写 加微信 powcoder
The financial theme this week is The Markowitz model.
The question Markowitz addressed was what quantities you should buy of different stocks. The assump- tions on the market are that you know the price of the stocks, their expected return and the covariance matrix of their returns. The assumptions on the investor are that we know what return the investor wants to achieve and that they want to minimize risk, as measured by the standard deviation of the return.
Markowitz showed that you shouldn’t just put your money into the stock with the best expected return, you should balance your portfolio across a number of stocks. This is called diversification.
(1927-) for Economics 1990
• Problem: Choosing an optimal selection of stocks.
• Key paper: “Portfolio Selection” The Journal of Finance 1952 • Keywords: Diversification, The Efficient Frontier
The computational theme this week is using Python’s built in libraries.
• Reading data using an Excel file of data downloaded from a Bloomberg terminal
• Solvinganoptimizationproblemusingcvxopt
We will put these ingredients together to solve the Markowitz problem with real data.
Before writing this weeks slides, I didn’t know how to read an Excel file into Python and I’d never used cvxopt before.
If you want to perform some task you have never done before in Python, you should do what I did: search on the internet for how to solve the problem in Python. I was able to quickly find advice on how to read an Excel file and how to perform quadratic programming in Python. My contribution is to put these components together to solve the Markowitz problem.
We won’t use a lot of libraries in this course, because as novice programmers you will want to write some simple code from scratch so you can understand how it works completely. In the real world, however, programmers spend most of their time gluing lib
1 Multivariate normal distributions
An Rn valued random vector Y is said to follow a multivariate normal distribution if it has density
(2π)−n2 exp −(y−μ) Σ (y−μ) (detΣ)−21
for some positive definite symmetric n × n matrix Σ and some vector μ ∈ Rn.
But where on earth does this strange formula come from? And what does positive-definite mean?
We will show that if X has independent components which follow a standard normal distribution and if Y = HX + b for some matrix H and some vector b then Y follows a multivariate normal distribution.
A standard normally distributed random variable, X, has density function
1 −x2 fX(x)=√ e 2
x = np.linspace(-4,4,1000)
y = 1/sqrt(2*pi)*np.exp(-x**2/2)
ax = plt.gca() ax.set_xlim(min(x),max(x)) ax.axvline(x=0, color=’gray’) ax.axhline(y=0, color=’gray’); ax.plot(x,y); #+
By definition, an Rn valued random vector X has a density fX if for all measurable sets A ⊆ Rn
P(X ∈ A) =
The integral on the right as an n-dimensional integral of the kind you will have studied in vector calculus courses. (You will learn what a measurable set means in FM01, just treat it as meaning the kind of set you can integrate over.)
As you will learn in FM01, not all random variables have a density.
If X and Y are independent random variables with densities fX and fY then their joint density function is
fX,Y (x, y) = fX (x)fY (y).
This is a basic result about multivariate distributions and is proved in FM01 (it follows quickly from the
definition of independent).
More generally if X1 , . . . Xn are independent random variables with densities fXi then the density of the
random vector X := (X1, . . . Xn) is
fX(x1,…,xn) =
So if X1 and X2 are independent standard normal random variables, their joint density function is
22 1 −x1 1 −x2 fX1,X2(x1,x2) = √2πe 2 √2πe 2
1 x2+x2 = exp−1 2
More generally, if X is a vector of independent standard normal random variables then its density function
⊤ fX(x)=(2π)−n2 exp −x x
We generate a 3D plot of the density of the multivariate standard normal distribution in two variables.
1.1 Generating a 3D plot of a normal distribution
x1 = np.linspace(-4, 4, 1000)
x2 = np.linspace(-4, 4, 1000)
x1_dash, x2_dash = np.meshgrid(x1, x2)
f = 1/(2*pi)*np.exp(-(x1_dash**2 + x2_dash**2)/2)
ax = plt.gca(projection=’3d’)
ax.plot_surface(x1_dash,x2_dash,f,
cmap=matplotlib.cm.coolwarm); #+
/tmp/ipykernel_27247/3151608498.py:6: MatplotlibDeprecationWarning: Calling
gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two
minor releases later, gca() will take no keyword arguments. The gca() function
should only be used to get the current axes, or if no axes exist, create new
axes with default keyword arguments. To create a new axes with non-default
arguments, use plt.axes() or plt.subplot().
ax = plt.gca(projection=’3d’)
There are 1000 x1 values we wish to consider and 1000 x2 values. That means there are 1000 × 1000 points to plot. The meshgrid command takes vectors representing the x1 and x2 values we need and generates two matrices vectors x′1 and x′2 each with 1000 × 1000 entries. If we run through all the entries of x′1 and x′2 we list all the possible combinations of x1 and x2 values. The (i, j) entry of x′1 is given by the j-th entry of x1 and the (i, j) entry of x′2 is given by the i-th entry of x2.
It is probably easier just to look at a numerical example.
x1 = np.array([0,1,2,3])
x2 = np.array([4,5,6])
x1_dash, x2_dash = np.meshgrid(x1,x2)
print( x1_dash )
print( x2_dash)
[[0 1 2 3]
[0 1 2 3]]
x1 = np.random.randn(1000)
x2= np.random.randn(1000)
ax = plt.gca() ax.set_aspect(1) ax.set_xlim(-4,4) ax.set_ylim(-4,4) ax.scatter(x1,x2); #+
[[4 4 4 4]
[6 6 6 6]]
1.2 Generating a scatter plot of a normal distribution
We generate a scatter plot of the density of the multivariate standard normal distribution in two variables. I personally find this easier to follow than the 3D plot.
1.3 Geneating a contour plot of a normal distribution
Another possible visualisation of the standard normal distribution is a contour plot
x1 = np.linspace(-4, 4, 1000)
x2 = np.linspace(-4, 4, 1000)
x1_dash, x2_dash = np.meshgrid(x1, x2)
f = 1/(2*pi)*np.exp(-(x1_dash**2 + x2_dash**2)/2)
ax = plt.gca()
ax.set_aspect(1)
ax.set_xlim(-4,4)
ax.set_ylim(-4,4)
ax.contour(x1_dash,x2_dash,f,
cmap=matplotlib.cm.coolwarm); #+
1.4 Applying a linear transformation
Suppose we compute a new variable Y from X by performing a linear transformation. For example we might define
or in matrix form
Y1 = 2X1, Y2 = X1 + X2
To perform matrix multiplication in numpy use @. Remember that * means elementwise multiplication. [7]:
X = np.random.randn(2,1000) H = np.array([[2,0],[1,1]])
fig,[ax1,ax2] = plt.subplots(nrows=1,ncols=2)
ax1.set_aspect(1)
ax1.scatter(X[0,:],X[1,:]);
ax1.set_title(‘X’)
ax2.set_aspect(1) ax2.scatter(Y[0,:],Y[1,:]) ax2.set_title(‘Y’); #+
Let h : Rn → Rn be a smooth invertible function with a smooth inverse u.
We think of h as mapping one system of coordinates x to another system of coordinated y, with u being the
inverse transformation.
We write ui (with 1 ≤ i ≤ n) for the components of u, so we can form a matrix J(y) with components
This is called the Jacobian matrix.
The change of variables formula for multi-dimensional integrals is
Differentiating
Hence J(y) = U so
f(x) dx = f(u(y)) det J(y) dy. h(A)
y = h(x), x = u(y).
We will use this formula to solve the following problem:
Suppose that X follows a standard multivariate normal distribution of dimension n. Let H be an invertible
(n × n)-matrix. Define What is the density of Y?
Write U = H−1. And define
Write uij for the components of U so that we may write
h(x) = Hx, u(y) = Uy.
∂ui =uij. ∂yj
detJ(y) = detU = det(H) = 1 det H
P(Y ∈ B) = =
In this calculation we have used in sequence
• The definitions of Y and h
• Thefacthistheinverseofu
• The known density of X
• The transformation formula for integrals • Thefacthistheinverseofu
• Our comnputation of J(y)
P(h(X) ∈ B)
P(X ∈ u(B))
(2π)− n2 exp − x
2 (2π)− n2 exp − u(y)
u(y) (det J (y)) dy ⊤
(2π)−n2 exp −(Uy) Uy (detJ(y))dy B 2
(2π)−n2 exp −y⊤U⊤Uy detUdy 2
We have shown that if X follows a standard normal distribution, then Y = HX has density ⊤⊤
(2π)− n2 exp − y U U y det U 2
where U = H−1.
Since the components Xi of X are independent, they are uncorrelated and so have no covariance. Each
component Xi has variance equal to 1. We deduce that
Let us compute the covariance matrix of Y. Write Yi for the components of Y and Hij for the components of H.
Cov(Xi,Xj) =
So Cov(Xi , Xj ) gives the components of the identity matrix.
1 0otherwise
In this calculation we have used
So the density of Y is
U⊤U = (H−1)⊤H−1 = (HH⊤)−1 = Σ−1
detU = (det(U⊤U))12 = (detHH⊤)−21 = (detΣ)−12 .
(2π)− n2 exp − y Σ y (det Σ)− 12
Cov(Yi, Yj ) = E(YiYj ) n
n b=1
= HiaHjbE(XaXb)
a=1 b=1 n n
= HiaHjbCov(Xa, Xb) a=1 b=1
• The definition of covariance together with the fact that the mean of Yi is zero (an easy exercise) • The explicit formula for multiplication of a vector by a matrix as a sum
• The linearity of expectation
• The fact that the Xa have mean 0
• ThefactthatCov(Xa,Xb)givesthecoefficientsoftheidentitymatrix • The explicit formula for matrix multiplication as a sum
Let us write Σ for the covariance matrix of Y .
We know that the density of Y = H(X) is given by
⊤⊤ (2π)− n2 exp − y U U y
where U = H−1. Notice that
We conclude that Y follows a multivariate normal distribution with μ = 0 and Σ = HH⊤. This justifies
the definition of a multivariate normal distribution in the case when μ = 0. Definition: A symmetric matrix Σ is positive-definite if for all non-zero v, v⊤Σv > 0. Lemma: If H is invertible then HH⊤ is symmetric and positive-definite.
Proof: (HH⊤)⊤ = (H⊤)⊤H⊤ = HH⊤. If H is invertible, then H⊤ is invertible as they have the same determinant. So if v is non-zero, w := H⊤v is non-zero, hence w⊤w = v⊤HH⊤v > 0 □
This explains why the definition of a multivariate normal random vector requires that Σ is positive-definite. I have only explained the definition of a multivariate normal distributions with mean μ = 0. The general
case is left for you to complete in the exercises.
fig = plt.figure(figsize=(6,6), dpi=300)
fig.patch.set_facecolor(‘black’)
x1 = np.random.randn(1000)
x2= np.random.randn(1000)
ax = plt.gca()
ax.set_aspect(1) ax.set_xlim(-4,4) ax.set_ylim(-4,4) ax.scatter(x1,x2); #+ ax.set_facecolor(‘black’) plt.savefig(‘normal-scatter.jpg’) plt.tight_layout()
2 Exercises 2.1 Exercise
Suppose that X follows a standard normal distribution. Define Y = HX + μ for a vector μ ∈ Rn and an invertible matrix H with U := H−1. Show that Y has density
⊤⊤ (2π)−n2 exp −(y−μ) U U(y−μ) detU
Show that the mean of Y is μ.
This generalizes the results I have shown you to include distributions with non-zero mean. (This is a maths exercise and is not automatically marked)
3 Cholesky Decomposition
Suppose that we know that Y has a multivariate normal distribution with density
(2π)−n2 exp −(y−μ) Σ (y−μ) (detΣ)−21
for some positive definite symmetric matrix Σ and vector μ.
We wish to know how to simulate Y .
To simulate Y, we reverse the process we used to derive the density of a multivariate normal. We want to find a matrix U such that
X = U(Y − μ) We can simulate X easily. We can then simulate Y as
Y = HX + μ
where H = U−1.
From our previous calculations we know that we will require HH⊤ = Σ if Y is to have the desired multi-
variate normal distributions.
Definition: A pseudo square-root of a symmetric matrix, Σ is a matrix H satisfying HH⊤ = Σ
Theorem: Any positive-definite symmetric matrix, Σ, has a unique lower-triangular pseudo square-root with positive diagonal, L. This is called the Cholesky decomposition of Σ.
follows a standard normal distribution.
Corollary:AllmultivariatenormalrandomvectorsYmaybewrittenasY =LX+μwhereXisastandard normal random vector, and μ is
Corollary: The mean of a multivariate normal distribution is μ and the covariance matrix is Σ (We have already shown this if the distribution arises as Y = HX + μ, but we now know this is the general case)
Application: If we can compute L, we can simulate Y by first simulating X then computing LX + μ Proof:BecauseLislowertriangular,wecanwritetheequationLLT =Σoutindetailas:
a1100…0 a11a21a31…an1
a21 a22 0 … 0
0 0 . . .
a22 a32 … an2
a31 a32 a33 . . .
… 0 . . .
a33 … an3 =Σ . .
Take the lower triangular part of both sides. This gives n(n−1) equations in the same number of unknowns.
an1 an2 an3
2 a211 = Σ11
The first row gives:
We can now solve for a unique positive a11. The next row gives: a21a11 = Σ21
a21 + a22 = Σ22
We solve the first for a21. Now we can read off the unique positive a22.
The third row gives:
a31a11 = Σ31 a31a21 + a32a22 = Σ32
a231 + a232 + a233 = Σ33 Continuing in this way we obtain an algorithm for finding L □
4 Exercises 4.1 Exercise
You can compute the Cholesky decomposition of a matrix using the Python function numpy.linalg.cholesky. Use this to compute the Cholesky decomposition of
We solve for a31 then a32 then a33.
and check your answer.
Use this to draw a scatter plot of the multivariate normal distribution with mean 0 and covariance matrix Σ.
To check your answer using the automated tests, create a 2 × 100000 matrix whose columns contain 100000 samples from such a distribution and call this matrix sample.
4.2 Exercise
Compute the Cholesky decomposition of the matrix
analytically. Write a function my_chol which takes a single parameter rho and returns the Cholesky de- composition of this matrix computed using your formula.
What condition do you need on ρ for this matrix to be positive-definite? 4.3 Exercise
Show that pseudo square roots are far from unique by finding 4 lower triangular pseudo square roots of the
and one upper triangular pseudo square root.
Write a function find_5_pseudo_square_roots which returns a list of the 5 roots you have found, starting with the lower traingular roots.
4.4 Exercise
What gaps do you think there are in the proof of the existence of the Cholesky decomposition given in lectures?
4.5 Exercise
Implement your own cholesky function using the following formulae:
if i > j, where lij are the components of L and sij are the components of Σ and L is the Cholesky decom-
position of the positive definite symmetric matrix Σ.
Note that this formula has been written using the mathematics convention that indices start at 1. To change
to the maths convention simply change the term k = 1 to k = 0 and the formulae will still hold. 15
j−1 ljj= sjj− ljk
lij = l sij − likljk
4.6 Further Reading
We now give a complete proof of the existence of the Cholesky decomposition of a positive-definite sym- metric matrix Σ.
Suppose as an induction hypothesis that the result is true for (n − 1) × (n − 1) matrices. We write Σ in
block diagonal form as
whereΣn−1 isan(n−1)×(n−1)symmetricmatrix,vn−1 isavectoroflength(n−1)andsisascalar.
LLT =Σ. This last condition is equivalent to the three conditions
Ln−1LTn−1 = Σn−1, (1) Ln−1wn−1 = vn−1 (1) (2)
wnT−1wn−1 + w2 = s. (3) It is easy to check that Sn−1 is positive definite. So there is a unique choice for Ln−1 by our induction
hypothesis.
Since Ln−1 is lower triangular with positive diagonal, it has a non-zero determinant and so is invertible. Hence there is a unique wn−1 solving (1). (Indeed this equation will already be in echelon form so it is quick and easy to solve). We now require that:
w2 = s − wnT−1wn−1. There will be a unique positive solution to this equation if and only if
s − wnT−1wn−1 > 0.
To see that this inequality will hold, we define a vector v in block diagonal form by:
−(LT )−1w v= n−1 n−1
we know that vT Σv > 0 as Σ is positive definite. We compute
Σn−1 vn−1 vnT−1 s
We now define L by
where wn−1 is some n vector to be determined, and w is a scalar to be determined. We require that Ln−1
is lower triangular with positive diagonal and that
Ln−1 0 wnT−1 w
n−1 n−1 n−1 n−1 n−1
Σ v 1 n−1
−(LT )−1w
5 The Markowitz Model
The return of an asset from time 0 to time T is defined by return = PT − P0
wherePT isthepriceoftheassetattimeT.
One possible reason to work with returns rather than the price is that you might expect the returns each day to follow the same probability distribution independent of the current price, so it is reasonable to look at historic returns to guess what the future returns might be.
Suppose that you create a portfolio across the different stocks, investing a certain proportion wi in stock i (1 ≤ i ≤ n). The values wi are called the weights of your portfolio. We write w for the vector of weights that determines the portfolio.
The total of all the proportions must clearly add up to 1. So we must have n
wi = 1 i=1
or in vector notation
We assume that the random vector r whose entries are given by returns of each stock i follows a multivariate normal distribution with mean μ and covariance matrix Σ.
−w (L )−1 n−1 n−1
−w (L )−1 n−1 n−1
−w (L n−1 n−1
This quantity is positive, as required, so there is a unique positive solution for w.
L LT L w −(LT )−1w
w nT − 1 L Tn − 1 s
= s − wnT−1wn−1.
−L w +L w n−1 n−1 n−1 n−1
−w (L n−1 n−1
−wnT−1wn−1 + s 0
−wnT−1wn−1 + s
We will allow negative weights. Just as you can use negative numbers to indicate borrowing money, you
where 1 is a vector of 1′s.
can use negative weights to indicate borrowing stock. Borrowing stock is called short selling.
The return of our porfolio will be a random variable given by the weighted sum of all the returns in the portfolio
rportfolio =
It follows that the expected return of the portfolio is
This is a controversial decision. A normal distribution is determined entirely by it’s mean and standard deviation. This means that if you know the distribution of returns will be normal, then once the expected return is known, the standard deviation is the only possible measure of risk. However, for more general distributions it is not so obvious how to measure risk.
This means that if you know that asset returns follow a multivariate normal distribution, then you can safely measure risk using the standard deviation. For other distributions of returns you might decide to use standard deviation to measure risk anyway just because it is so convenient and easy to understand, but you should keep in mind that you are choosing to measure risk in a way that is perhaps more convenient than it is logically sound.
Whatever the distribution of the returns, we can compute the covariance matrix of a portfolio from the covariance matrix of the individual assets as follows.
E(rportfolio) =
Markowitz suggested measuring the risk of the portfolio using its variance, or equivalently the standard
deviation.
i=1 j=1 = w⊤Σw
i=1 j=1 n n
Suppose you wish to vary the weights to achieve a target expected return of R but while minimizing the variance.
A formal mathematical description of this problem can now be written as
riwi = r⊤w.
μiwi = μ⊤w.
riwi, wiwjCov(ri,rj) wiwjΣij
5.1 The Efficient Frontier
w∈Rn subject to
w⊤ Σw 1⊤w = 1
I obtained historic data for 75 diff
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com