Principal Components Analysis
Chris Hansman
Empirical Finance: Methods and Applications
Imperial College Business School
February 15-16
Today: Four Parts
1. Geometric Interpretation of Eigenvalues and Eigenvectors
2. Geometric Interpretation of Correlation Matrices
3. An Introduction to PCA
4. An Example of PCA
Topic 1: Geometry of Eigenvalues and Eigenvectors
1. Technical definitions of eigenvalues and eigenvectors
2. Geometry of matrix multiplication: rotate and stretch
3. Eigenvectors are only stretched
4. Length of eigenvectors doesn’t matter
A Review of Eigenvalues and Eigenvectors
Consider a square n×n matrix A.
An eigenvalue λ_i of A is a (1×1) scalar
The corresponding eigenvector v⃗_i is an (n × 1) vector
where λ_i, v⃗_i satisfy:
A v⃗_i = λ_i v⃗_i
Geometric Interpretation of Eigenvalues and Eigenvectors
Consider the square n×n matrix A.
A times any (n×1) vector gives an (n×1) vector
Useful to think of this as a linear function that:
Takes n×1 vectors as inputs
Gives n×1 vectors as outputs:
f : R^n → R^n
Specifically, for the input vector v⃗, this is the function that outputs: f(v⃗) = Av⃗
Geometric Interpretation of Eigenvalues and Eigenvectors
Consider the square n×n matrix A.
Think of this matrix as the function that maps vectors to vectors:
Let's say
f(v⃗) = Av⃗
A = [5 0; 2 3] and v⃗ = (2, 1)'
What is f(v⃗)? menti.com
The Matrix A Can be Thought of as a Function
Consider the square n×n matrix A.
Think of this matrix as the function that maps vectors to vectors:
Let's say
f(v⃗) = Av⃗
A = [5 0; 2 3] and v⃗ = (1, 0)'
What is f(v⃗)?
f(v⃗) = Av⃗ = (5, 2)'
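To make this concrete, here is a minimal sketch in R (the language used for the eigendecompositions later in these slides) of applying the function f to a vector:

# The matrix A acts as a function: f(v) = A v
A <- matrix(c(5, 0,
              2, 3), nrow = 2, byrow = TRUE)
v <- c(1, 0)
A %*% v  # returns (5, 2)', matching the slide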
The Matrix A Rotates and Stretches a Vector v⃗
Let's say
A = [5 0; 2 3] and v⃗ = (1, 0)' ⇒ Av⃗ = (5, 2)'
The Matrix A Rotates and Stretches a Vector v⃗
Let's say
A = [5 0; 2 3] and v⃗ = (−1, 2)' ⇒ Av⃗ = (−5, 4)'
The Matrix A Rotates and Stretches a Vector v⃗
Let's say
A = [5 0; 2 3] and v⃗_2 = (0, 1)' ⇒ Av⃗_2 = (0, 3)' = 3v⃗_2
For Some Vectors v⃗, Matrix A Only Stretches
Let's say
A = [5 0; 2 3] and v⃗_2 = (0, 1)' ⇒ Av⃗_2 = (0, 3)' = 3v⃗_2
Some vectors, like v⃗_2 = (0, 1)', have a special relationship with A:
The matrix A only stretches v⃗_2. No rotations!
Are there any other vectors like v⃗_2 = (0, 1)'?
Let's try v⃗_1 = (1, 1)'
For Some Vectors v⃗, Matrix A Only Stretches
Let's say
A = [5 0; 2 3] and v⃗_1 = (1, 1)' ⇒ Av⃗_1 = (5, 5)' = 5v⃗_1
For Some Vectors v⃗, Matrix A Only Stretches
For the matrix A, we've found two vectors with this special property:
v⃗_1 = (1, 1)' with Av⃗_1 = (5, 5)' = 5v⃗_1
v⃗_2 = (0, 1)' with Av⃗_2 = (0, 3)' = 3v⃗_2
We call these vectors eigenvectors of the matrix A
Note that they get stretched by different factors: 5 for v⃗_1, 3 for v⃗_2
We call these stretching factors eigenvalues:
λ_1 = 5, λ_2 = 3
Defining Eigenvalues and Eigenvectors
This notion of only stretching is the defining feature of eigenvalues and eigenvectors:
Eigenvalue λ_i and corresponding eigenvector v⃗_i are λ_i, v⃗_i such that: A v⃗_i = λ_i v⃗_i
In our example:
A v⃗_1 = λ_1 v⃗_1: [5 0; 2 3] (1, 1)' = 5 (1, 1)'
A v⃗_2 = λ_2 v⃗_2: [5 0; 2 3] (0, 1)' = 3 (0, 1)'
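We can check both eigenpairs numerically; a minimal sketch in R:

A  <- matrix(c(5, 0,
               2, 3), nrow = 2, byrow = TRUE)
v1 <- c(1, 1)
v2 <- c(0, 1)
A %*% v1         # (5, 5)' = 5 * v1
A %*% v2         # (0, 3)' = 3 * v2
eigen(A)$values  # 5 and 3 (eigen() rescales eigenvectors to unit length)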
Length of Eigenvector Doesn't Change Anything
Imagine multiplying an eigenvector by some constant (e.g. 1/2):
A = [5 0; 2 3] and v⃗_1 = (1, 1)' ⇒ Av⃗_1 = (5, 5)' = 5v⃗_1
Scaling by 1/2:
A = [5 0; 2 3] and v⃗_1 = (0.5, 0.5)' ⇒ Av⃗_1 = (2.5, 2.5)' = 5v⃗_1
Length of Eigenvector Doesn't Change Anything
Imagine multiplying an eigenvector by some constant (e.g. 2):
A = [5 0; 2 3] and v⃗_2 = (0, 2)' ⇒ Av⃗_2 = (0, 6)' = 3v⃗_2
Length of Eigenvector Doesn’t Change Anything
Any multiple of an eigenvector is also an eigenvector: if v⃗_i is an eigenvector, so is cv⃗_i for any scalar c.
As a result, we often normalize them so that they have unit length, i.e. v⃗_i'v⃗_i = 1
Best to think of an eigenvector v⃗_i as a direction
Think of eigenvalue λ_i as a stretching factor
Finding Eigenvalues of Symmetric Matrices
From here, focus on symmetric matrices (like the covariance Σ_x)
How do we calculate the eigenvalues?
Use a computer
But if you have to, in the 2×2 case:
A = [a b; b d]
λ_1 = ((a + d) + √((a − d)² + 4b²)) / 2
λ_2 = ((a + d) − √((a − d)² + 4b²)) / 2
What are the eigenvalues of:
A = [7 0; 0 2]?
Menti.com
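For checking your menti answer afterwards, a sketch in R that applies the 2×2 formula above and compares it with R's built-in routine:

a <- 7; b <- 0; d <- 2
A <- matrix(c(a, b,
              b, d), nrow = 2, byrow = TRUE)
# The 2x2 formula from the slide
lambda1 <- ((a + d) + sqrt((a - d)^2 + 4 * b^2)) / 2
lambda2 <- ((a + d) - sqrt((a - d)^2 + 4 * b^2)) / 2
c(lambda1, lambda2)  # the eigenvalues via the formula
eigen(A)$values      # the same values via eigen()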
Finding Eigenvectors of Symmetric Matrices
From here, we will focus on symmetric matrices (like Σ_x)
Given the eigenvalues, how do we calculate the eigenvectors?
Again, use a computer
But if you have to, simply solve (A − λ_i I)v⃗_i = 0 for each eigenvalue λ_i
Important note: Symmetric matrices have orthogonal eigenvectors:
v⃗_i'v⃗_j = 0 for any i ≠ j
Finding Eigenvalues of Diagonal Matrices
Diagonal matrices are a subset of symmetric matrices: A = [a 0; 0 d]
How do we calculate the eigenvalues and eigenvectors?
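A one-line check in R (a sketch, using arbitrary numbers on the diagonal):

D <- diag(c(3, 1))  # a diagonal (hence symmetric) matrix
eigen(D)$values     # the diagonal entries themselves
eigen(D)$vectors    # the standard basis vectors (the identity matrix)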
Topic 2: Geometric Interpretations of Correlation Matrices
1. Uncorrelated assets: eigenvalues are variances
2. Correlated assets: first eigenvector finds direction of maximum variance
Uncorrelated Standardized Data: z = (z_a, z_b)'
[Scatter plot of z_b against z_a]
Cov(z) = Σ_z = [1 0; 0 1]
Uncorrelated (Non-Standardized) Data: x = (x_a, x_b)'
[Scatter plot of x_b against x_a]
Cov(x) = Σ_x = [4 0; 0 4]
Uncorrelated (Non-Standardized) Data: x = (x_a, x_b)'
[Scatter plot of x_b against x_a]
Cov(x) = Σ_x = [3 0; 0 1]
Eigenvalues of Σ_x with Uncorrelated Data
Σ_x = [3 0; 0 1]
What are the eigenvalues and eigenvectors of Σ_x?
Uncorrelated assets: eigenvalues are the variances of each asset return!
λ_1 = 3, λ_2 = 1
Eigenvectors:
v⃗_1 = (1, 0)', v⃗_2 = (0, 1)'
The first eigenvector points in the direction of the largest variance
We sometimes write the eigenvectors together as a matrix:
Γ = (v⃗_1 v⃗_2) = [1 0; 0 1]
Uncorrelated (Non-Standardized) Data: x = (x_a, x_b)'
[Scatter plot with the first eigenvector drawn to scale: ‖v⃗_1‖ = λ_1 = 3]
Cov(x) = Σ_x = [3 0; 0 1]
Uncorrelated (Non-Standardized) Data: x = (x_a, x_b)'
[Scatter plot of x_b against x_a]
Cov(x) = Σ_x = [1 0; 0 3]
Eigenvalues of Σ_x with Uncorrelated Data
Σ_x = [1 0; 0 3]
What are the eigenvalues and eigenvectors of Σ_x?
With uncorrelated assets, eigenvalues are just the variances of each asset return!
Eigenvectors:
v⃗_1 = (0, 1)', v⃗_2 = (1, 0)'
Note that the first eigenvector points in the direction of the largest variance
We sometimes write the eigenvectors together as a matrix:
Γ = (v⃗_1 v⃗_2) = [0 1; 1 0]
Correlated Data: x = (x_a, x_b)'
[Scatter plot of x_b against x_a]
Cov(x) = Σ_x = [2 1; 1 2]
Eigenvalues of Σ_x with Correlated Data
Σ_x = [2 1; 1 2]
What are the eigenvalues and eigenvectors of Σ_x?
With correlated assets, eigenvalues are a bit trickier
The eigenvalues are 3 and 1
Which of the following is not an eigenvector of Σ_x?
r = (1, 1)'/2, w = (−1, 1)'/2, s = (2, 1)'/2
menti.com
Eigenvectors are Γ = (v⃗_1 v⃗_2) = [1/√2 −1/√2; 1/√2 1/√2]
[Scatter plot with v⃗_1 = (1, 1)' and v⃗_2 = (−1, 1)' drawn]
Cov(x) = Σ_x = [2 1; 1 2]
Eigenvectors of Σ_x with Correlated Data
Σ_x = [2 1; 1 2]
Γ = (v⃗_1 v⃗_2) = [1/√2 −1/√2; 1/√2 1/√2]
Just as with uncorrelated data, the first eigenvector finds the direction with the most variability
The second eigenvector points in the direction that explains the maximum amount of the remaining variance
Note that the two are perpendicular
This is the geometric implication of the fact that they are orthogonal: v⃗_1'v⃗_2 = 0
The fact that they are orthogonal (with unit length) also implies: Γ'Γ = I
Eigenvalues of Σ_x with Correlated Data
Σ_x = [2 1; 1 2], λ_1 = 3, λ_2 = 1
The eigenvalues are the same as our uncorrelated data
Note that the scatter plot looks quite similar to our uncorrelated data
Just rotated a bit
Imagine rotating the data so that the first eigenvector is lined up with the x-axis
The first eigenvalue is the variance (along the x axis) of this rotated data
The second eigenvalue is the variance along the y axis
Eigenvalues Represent Variance along the Eigenvectors
[Scatter plot with v⃗_1 = (1, 1)' and v⃗_2 = (−1, 1)' drawn]
Cov(x) = Σ_x = [2 1; 1 2]
What is This Rotation?
So with a little rotation, we take our data drawn from Σ_x = [2 1; 1 2]
And get back what looks like our uncorrelated data, which was generated by
Σ̃ = [3 0; 0 1]
How do we rotate x into this uncorrelated data? Γ'x
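A minimal simulation sketch in R (the sample size and seed are arbitrary choices, not from the slides): draw data with covariance Σ_x = [2 1; 1 2], rotate by Γ', and check that the rotated data has a diagonal covariance matrix:

set.seed(1)
n     <- 10000
Sigma <- matrix(c(2, 1,
                  1, 2), nrow = 2, byrow = TRUE)
# Draw n correlated observations with covariance Sigma (Cholesky trick)
z <- matrix(rnorm(2 * n), ncol = 2)
x <- z %*% chol(Sigma)
# Rotate by the eigenvector matrix: each row of p is (Gamma' x_t)'
Gamma <- eigen(Sigma, symmetric = TRUE)$vectors
p     <- x %*% Gamma
round(cov(p), 2)  # approximately [3 0; 0 1]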
Topic 3: Introduction to Principal Components Analysis
Principal Components Analysis
This notion of rotation underlies the concept of Principal Components Analysis
Consider a general cross-section of returns on m assets, x_t
E[x_t] = α, Cov(x_t) = Σ_x
Principal Components Analysis
x_t is m × 1, with E[x_t] = α and Cov(x_t) = Σ_x
Define the normalized asset returns: x̃_t = x_t − α
Let the eigenvalues of Σ_x be given by:
λ_1 ≥ λ_2 ≥ λ_3 ≥ ··· ≥ λ_m
Let the eigenvectors be given by:
v⃗_1, v⃗_2, v⃗_3, ···, v⃗_m
Principal Components Analysis
Cov(x_t) = Σ_x
Note that the eigenvectors are orthogonal: v⃗_i'v⃗_j = 0 for i ≠ j
Because the scaling doesn't matter, we can normalize: v⃗_i'v⃗_i = 1
These scaled, normalized vectors are called orthonormal
As before, let Γ be the matrix with eigenvectors as columns:
Γ = [v⃗_1 v⃗_2 ··· v⃗_m]
Principal Components Analysis
Define the principal components variables as the rotation:
p_t = Γ'x̃_t, with E[p_t] = 0
Or written out further:
p_t = ( v⃗_1'(x_t − α), v⃗_2'(x_t − α), ···, v⃗_m'(x_t − α) )'
Principal Components Analysis
Recall the eigendecomposition: Σ_x = ΓΛΓ'
where Λ = diag(λ_1, ···, λ_m)
Cov(p_t) = Cov(Γ'x̃_t) = Γ'Cov(x̃_t)Γ
= Γ'ΓΛΓ'Γ = Λ
Aside: Variance Decomposition
A nice result from linear algebra:
∑_{i=1}^m var(x_it) = ∑_{i=1}^m λ_i
So the proportion of the total variance of x_t that is explained by the
i-th principal component (eigenvalue λ_i) is simply:
λ_i / ∑_{i=1}^m λ_i
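A quick numerical check of this result, reusing the 2×2 covariance matrix from Topic 2 (a sketch):

Sigma  <- matrix(c(2, 1,
                   1, 2), nrow = 2, byrow = TRUE)
lambda <- eigen(Sigma, symmetric = TRUE)$values
sum(diag(Sigma))      # total variance: 2 + 2 = 4
sum(lambda)           # 3 + 1 = 4, the same total
lambda / sum(lambda)  # proportions explained: 0.75 and 0.25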
Principal Components Analysis
Our principal components variables provide a transformation of the data into variables that are:
Uncorrelated (orthogonal)
Ordered by how much of the total variance they explain (size of eigenvalue)
What if m is large, but the first few (2, 5, 20) principal components explain most of the variation?
Idea: Use these as "factors". Dimension reduction!
Principal Components Analysis
Note that because Γ' = Γ⁻¹, we can write x̃_t = Γp_t
We can also partition Γ into the first K < m eigenvectors and the remaining m − K:
Γ = [Γ_1 Γ_2]
Partition p_t into its first K elements p_1 and the remaining m − K elements p_2:
p_t = (p_1', p_2')'
We can then write:
x_t = α + Γ_1 p_1 + Γ_2 p_2
Principal Components Analysis
x_t = α + Γ_1 p_1 + Γ_2 p_2
This looks just like a factor model:
x_t = α + Bf_t + ε_t
with f_t = p_1 and ε_t = Γ_2 p_2
One minor difference:
Cov(ε_t) = Ψ = Γ_2 Λ_2 Γ_2'
which is (likely) not diagonal, as assumed in factor analysis
Implementing Principal Components Analysis
x_t = α + Γ_1 p_1 + Γ_2 p_2
Recall the sample covariance matrix:
Σ̂_x = (1/T) X̃X̃'
Calculate this, and perform the eigendecomposition (using a computer):
Σ̂_x = ΓΛΓ'
We now have everything we need to compute the sample principal components at each t:
P = [p_1 p_2 ··· p_T] = Γ'X̃
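Putting the steps together, a minimal sketch in R; the function name and the orientation of X (a T × m matrix with one period per row, so that Σ̂_x = X̃'X̃/T) are assumptions for illustration, not part of the slides:

pca_by_hand <- function(X) {
  T_    <- nrow(X)
  alpha <- colMeans(X)                     # sample mean of each asset
  Xd    <- sweep(X, 2, alpha)              # demeaned returns (T x m)
  Sigma <- crossprod(Xd) / T_              # sample covariance, 1/T convention
  eig   <- eigen(Sigma, symmetric = TRUE)  # eigenvalues and eigenvectors
  P     <- Xd %*% eig$vectors              # principal components (T x m)
  list(alpha = alpha, Gamma = eig$vectors, lambda = eig$values, P = P)
}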
An Example of Principal Components Analysis
1. Principal Components on Yield Changes for US Treasuries
2. Most Patterns Explained by 2 Principal Components
Recall: Implementing Principal Components Analysis
In each period t, we see a vector of asset returns x_t
We'll consider the covariance of asset returns:
Cov(x_t) = Σ_x
Today, we’ll consider the yields on US Treasury debt
Yields on T-bills, notes and bonds
Constant Maturity Treasuries from 3 months to 20 years
The Yield Curve: May 24th, 2001
[Plot: yield against maturity in months]
The Yield Curve: January 6th, 2003
[Plot: yield against maturity in months]
The Yield Curve: January 4th, 2005
[Plot: yield against maturity in months]
Constant Maturity Treasury Yields: 2001-2005
[Time series plot of yields; series: DGS3MO, DGS6MO, DGS2, DGS3, DGS5, DGS7, DGS10, DGS20]
Implementing Principal Components Analysis
What patterns do you see in the data?
What two or three things characterize the data in a given period?
Idea behind principal components:
Can we choose a small number of variables that characterize most of the variation in the data?
Do these variables have an intuitive meaning that helps us understand the data itself?
Technical Point: We Will Actually Examine Differences
[Time series plot of yield differences; series: DGS3MO, DGS6MO, DGS2, DGS3, DGS5, DGS7, DGS10, DGS20]
x_nt = Y_nt − Y_n,t−1
Setting up The Basics
x_t is the cross-section of yield changes (at time t):
x_t = ( Y_{3 month,t} − Y_{3 month,t−1}, Y_{6 month,t} − Y_{6 month,t−1}, ···, Y_{20 year,t} − Y_{20 year,t−1} )'
We denote the mean by: α = E[x_t]
And will work with demeaned values: x̃_t = x_t − α
Implementing Principal Components Analysis
First step: computing the covariance matrix Σ̂_x = côv(x̃_t)
Below is the (slightly more intuitive) correlation matrix (maturities increasing from 3 months to 20 years):
1.000 0.815 0.641 0.480 0.443 0.395 0.341 0.312 0.236
0.815 1.000 0.863 0.696 0.651 0.592 0.539 0.499 0.411
0.641 0.863 1.000 0.874 0.833 0.779 0.733 0.694 0.610
0.480 0.696 0.874 1.000 0.972 0.919 0.882 0.842 0.761
0.443 0.651 0.833 0.972 1.000 0.955 0.922 0.888 0.812
0.395 0.592 0.779 0.919 0.955 1.000 0.974 0.953 0.894
0.341 0.539 0.733 0.882 0.922 0.974 1.000 0.979 0.942
0.312 0.499 0.694 0.842 0.888 0.953 0.979 1.000 0.960
0.236 0.411 0.610 0.761 0.812 0.894 0.942 0.960 1.000
Implementing Principal Components Analysis
With Σ̂_x = côv(x̃_t) in hand, we can perform the eigendecomposition:
Σ̂_x = ΓΛΓ'
Where Λ = diag(λ_1, ···, λ_9) is the diagonal matrix of eigenvalues
And Γ is the orthogonal matrix of eigenvectors
This is easy in R: eigen(Sigma_x)
Implementing Principal Components Analysis
We can now construct the principal components variables: p_t = Γ'(x_t − α)
This creates 9 new principal components variables
Each is a linear combination of our original 9 yield changes
We can use them to recreate x_t
The first variable now explains the largest proportion of the variance
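For reference, base R's prcomp wraps the same steps (it uses the 1/(T − 1) covariance convention rather than 1/T, so the implied eigenvalues differ by that factor); here X is assumed to be the T × 9 matrix of yield changes:

fit <- prcomp(X, center = TRUE, scale. = FALSE)
fit$rotation  # Gamma: loadings (eigenvectors) in columns
fit$x         # the principal components variables, one row per period
fit$sdev^2    # the variances of the components (the eigenvalues)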
Examining the First Few Principal Components
By examining the first few factors, we get a sense of the key drivers of these yields
In particular, exploring the first few columns of Γ (the Loadings) helps us interpret/name these drivers
Examining the factors themselves gives a sense of how they have changed over time
Loadings on First Principal Component
(Cumulative) First Principal Component: 2001-2005
[Plot: cumulative first principal component over time]
Constant Maturity Treasury Yields: 2001-2005
[Time series plot of yields; series: DGS3MO, DGS6MO, DGS2, DGS3, DGS5, DGS7, DGS10, DGS20]
First Principal Component: Interpretation
The first principal component measures level shifts in the yield curve
It is basically just a (weighted) average of all the yields:
p_1t = −0.11 × CMT 3 Month − 0.16 × CMT 6 Month + ··· − 0.31 × CMT 20 Year
Maybe not surprising: they all move together
Highest weight given to middle (belly) of the yield curve
Loadings on Second Principal Component
(Cumulative) Second Principal Component: 2001-2005
[Plot: cumulative second principal component over time]
Constant Maturity Treasury Yields: 2001-2005
[Time series plot of yields; series: DGS3MO, DGS6MO, DGS2, DGS3, DGS5, DGS7, DGS10, DGS20]
Second Principal Component: Interpretation
Second principal component measures the slope of the yield curve
Difference between yields on long and short maturities
High when the yields are spread out, low when they are compressed
How much of the variation does each component explain?
Recall: the proportion of the total variance of x_t that is explained by principal component i can be measured by:
λ_i / ∑_{i=1}^m λ_i
Where λi is the eigenvalue associated with that principal component
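In R, with Sigma_x computed as before, a minimal sketch:

eig  <- eigen(Sigma_x, symmetric = TRUE)
prop <- eig$values / sum(eig$values)  # proportion explained by each component
cumsum(prop)                          # cumulative proportion explained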
Proportion of Variance Explained by Each Component
[Bar plot: proportion of variance explained by components 1 through 9]
How much of the variation does each component explain?
First principal component explains just under 85% of the variation in the data
The second explains almost another 10%
Together, these account for 95% of the variation
Let’s use these two components to predict all yields over time
(Cumulative) First Two Principal Components: 2001-2005
[Plot: cumulative first two principal components over time]
Summarizing with the first two factors
Recall that we can split up x_t, rewriting it as:
x_t = α + Γ_1 p_1 + Γ_2 p_2
Here:
Γ_1 is the first two columns of Γ
p_1 represents the first two principal components
Now imagine considering the following "summary" of x_t:
x̂_t = α + Γ_1 p_1
We are forgetting about 7 of our principal components
It might still do a decent job of capturing the patterns in the data:
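A minimal sketch in R of this two-component summary, assuming alpha, Gamma, and P come from the decomposition sketched earlier (rows of Xhat are the fitted x̂_t'):

K      <- 2
Gamma1 <- Gamma[, 1:K]                # first two columns of Gamma
P1     <- P[, 1:K]                    # first two principal components (T x K)
Xhat   <- P1 %*% t(Gamma1)            # predicted demeaned yield changes
Xhat   <- sweep(Xhat, 2, alpha, "+")  # add the means back: xhat_t = alpha + Gamma1 p1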
Predicted Differences Using Two Principal Components