MAST 90138: MULTIVARIATE STATISTICAL TECHNIQUES
See Härdle and Simar, chapter 16.
7 CANONICAL CORRELATION ANALYSIS (CCA)
7.1 MOST INTERESTING LINEAR COMBINATIONS
A tool developed by Hotelling for discovering and quantifying association between two sets of variables.
Setup: two random vectors $X \in \mathbb{R}^q$ and $Y \in \mathbb{R}^p$, with $X \sim (\mu, \Sigma_{XX})$ and $Y \sim (\nu, \Sigma_{YY})$.
Also,
$$\operatorname{Cov}(X, Y) = E\left[(X - \mu)(Y - \nu)^T\right] = \Sigma_{XY} = \Sigma_{YX}^T,$$
a $q \times p$ matrix.
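Goal: Find two coefficient vectors $a \in \mathbb{R}^q$ and $b \in \mathbb{R}^p$ such that the correlation between $a^T X$ and $b^T Y$ is maximized.
For any $a$ and $b$,
$$\rho(a, b) := \operatorname{Corr}(a^T X, b^T Y) = \frac{a^T \Sigma_{XY} b}{(a^T \Sigma_{XX} a)^{1/2} \, (b^T \Sigma_{YY} b)^{1/2}}.$$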
Also, for any constants $c, d \in \mathbb{R}^+$, we have $\rho(a, b) = \rho(ca, db)$, so the scale of $a$ and $b$ can be fixed without loss of generality.
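The formula for ρ(a, b) and its invariance to positive rescaling can be checked numerically. The following is a minimal sketch, not part of the original notes: the covariance blocks are randomly generated and the names `rho`, `S_xx`, `S_xy`, `S_yy` are chosen here purely for illustration.

```python
import numpy as np

def rho(a, b, S_xx, S_yy, S_xy):
    """Correlation between a^T X and b^T Y, computed from the covariance blocks."""
    num = a @ S_xy @ b
    den = np.sqrt(a @ S_xx @ a) * np.sqrt(b @ S_yy @ b)
    return num / den

# Hypothetical joint covariance of (X, Y) with q = 2, p = 3 (illustration only).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
S = M @ M.T + 5 * np.eye(5)          # symmetric positive definite
S_xx, S_xy, S_yy = S[:2, :2], S[:2, 2:], S[2:, 2:]

a = np.array([1.0, -2.0])
b = np.array([0.5, 1.0, 3.0])

r1 = rho(a, b, S_xx, S_yy, S_xy)
r2 = rho(4.0 * a, 0.1 * b, S_xx, S_yy, S_xy)   # rescale with c = 4, d = 0.1
print(r1, r2)                                   # the two values agree up to rounding
```

Because rescaling leaves ρ unchanged, only the directions of a and b matter, which is what makes it possible to impose a normalization on them.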
Refined goal: Find $a$ and $b$ such that $a^T \Sigma_{XY} b$ is maximized subject to the constraints
$$a^T \Sigma_{XX} a = 1 \quad \text{and} \quad b^T \Sigma_{YY} b = 1.$$
How to solve this?
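For this problem, define
$$K = \Sigma_{XX}^{-1/2} \, \Sigma_{XY} \, \Sigma_{YY}^{-1/2}.$$
Also, look at its singular value decomposition (SVD): $K = \Gamma \Lambda \Delta^T$.
Recall: $\Gamma = (\gamma_1 | \dots | \gamma_k)$, $\Delta = (\delta_1 | \dots | \delta_k)$, $\Lambda = \operatorname{diag}(\lambda_1^{1/2}, \dots, \lambda_k^{1/2})$.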
Also, the number $k$ is
– $\operatorname{rank}(K)$
– $\operatorname{rank}(\Sigma_{XY})$
– $\operatorname{rank}(\Sigma_{YX})$
– the number of non-zero eigenvalues of $K K^T$ or $K^T K$. (Precisely, $\lambda_1, \dots, \lambda_k$ are the non-zero eigenvalues of these two matrices.)
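The construction of K and the different characterizations of k can be sketched numerically. This is an illustrative sketch under stated assumptions: the covariance matrix is randomly generated, the helper `inv_sqrt` is a name introduced here, and the inverse square root is taken to be the symmetric one obtained from an eigendecomposition.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Hypothetical joint covariance of (X, Y) with q = 2, p = 3 (illustration only).
rng = np.random.default_rng(1)
q, p = 2, 3
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

K = inv_sqrt(S_xx) @ S_xy @ inv_sqrt(S_yy)

# Reduced SVD: K = Gamma @ diag(sing) @ Delta_T, singular values in descending order.
Gamma, sing, Delta_T = np.linalg.svd(K, full_matrices=False)

k = np.linalg.matrix_rank(K)
# Non-zero eigenvalues of K K^T, sorted in descending order.
lam = np.linalg.eigvalsh(K @ K.T)[::-1][:k]

print("k =", k)
print("singular values squared:", sing[:k] ** 2)
print("eigenvalues of K K^T:   ", lam)
```

The squared singular values of K coincide with the non-zero eigenvalues λ_i, which is why Λ carries the square roots λ_i^{1/2} on its diagonal.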
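Now, define
$$a_i = \Sigma_{XX}^{-1/2} \gamma_i \quad \text{and} \quad b_i = \Sigma_{YY}^{-1/2} \delta_i$$
for $i = 1, \dots, k$. They are called canonical correlation vectors.
Using them, define the canonical correlation variables
$$\eta_i = a_i^T X \quad \text{and} \quad \psi_i = b_i^T Y$$
for $i = 1, \dots, k$.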
The quantities
$$\rho_i = \lambda_i^{1/2}$$
for $i = 1, \dots, k$ are called canonical correlation coefficients.
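The definitions of the canonical correlation vectors and coefficients translate almost directly into code. The sketch below works at the population level from given covariance blocks; the function names `inv_sqrt` and `cca` and the random covariance matrix are assumptions made here for illustration, not part of the course material.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def cca(S_xx, S_xy, S_yy):
    """Return (A, B, rho): columns of A and B are the a_i and b_i, rho holds the rho_i."""
    Rx, Ry = inv_sqrt(S_xx), inv_sqrt(S_yy)
    Gamma, sing, Delta_T = np.linalg.svd(Rx @ S_xy @ Ry, full_matrices=False)
    k = np.linalg.matrix_rank(S_xy)
    A = Rx @ Gamma[:, :k]          # a_i = S_xx^{-1/2} gamma_i
    B = Ry @ Delta_T.T[:, :k]      # b_i = S_yy^{-1/2} delta_i
    return A, B, sing[:k]          # rho_i = lambda_i^{1/2}

# Hypothetical joint covariance of (X, Y) with q = 3, p = 2 (illustration only).
rng = np.random.default_rng(2)
q, p = 3, 2
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

A, B, rho = cca(S_xx, S_xy, S_yy)
a1, b1 = A[:, 0], B[:, 0]
print("canonical correlation coefficients:", rho)
print("Var(eta_1) =", a1 @ S_xx @ a1, " Var(psi_1) =", b1 @ S_yy @ b1)
print("Cov(eta_1, psi_1) =", a1 @ S_xy @ b1)
```

With sample data one would typically plug in estimated covariance matrices in place of the population blocks; that step is not covered by this sketch.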
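Solution to our problem: $a = a_1$ and $b = b_1$, that is,
$$\eta_1 = a_1^T X, \quad \psi_1 = b_1^T Y.$$
Proof: demonstrated in class (can be in exam!). Precisely, $\operatorname{Cov}(\eta_1, \psi_1) = \rho_1 = \sqrt{\lambda_1}$.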
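One standard way to see why the first pair solves the problem is a change of variables; the argument given in class may be organised differently, so treat this as a sketch only. The symbols $\alpha = \Sigma_{XX}^{1/2} a$ and $\beta = \Sigma_{YY}^{1/2} b$ are introduced here for illustration, and the last line uses the fact that the largest singular value of a matrix is the maximum of $\alpha^T K \beta$ over unit vectors.

```latex
\begin{align*}
a^T \Sigma_{XY} b
  &= \alpha^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} \beta
   = \alpha^T K \beta, \\
a^T \Sigma_{XX} a &= \alpha^T \alpha = 1, \qquad
b^T \Sigma_{YY} b = \beta^T \beta = 1, \\
\max_{\|\alpha\| = \|\beta\| = 1} \alpha^T K \beta
  &= \lambda_1^{1/2}
  \quad \text{attained at } \alpha = \gamma_1,\ \beta = \delta_1 .
\end{align*}
```

Transforming back gives $a = \Sigma_{XX}^{-1/2} \gamma_1$ and $b = \Sigma_{YY}^{-1/2} \delta_1$, which are the first canonical correlation vectors.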
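In fact, similar to PCA and PLS, one can continue to ask: Given $\eta_1 = a_1^T X$ and $\psi_1 = b_1^T Y$, find $a \in \mathbb{R}^q$ and $b \in \mathbb{R}^p$ such that
$$\operatorname{Cov}(a^T X, b^T Y)$$
is maximized subject to
– $\operatorname{Corr}(a^T X, \eta_1) = \operatorname{Corr}(b^T Y, \eta_1) = 0$,
– $\operatorname{Corr}(a^T X, \psi_1) = \operatorname{Corr}(b^T Y, \psi_1) = 0$,
– $a^T \Sigma_{XX} a = 1$ and $b^T \Sigma_{YY} b = 1$.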
It turns out that the solution is $\eta_2 = a_2^T X$ and $\psi_2 = b_2^T Y$. Proof: shown in class (maybe on exam again!). Precisely, $\operatorname{Corr}(\eta_2, \psi_2) = \rho_2 = \sqrt{\lambda_2}$.
Continuing, we can further ask:
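Given $\eta_1, \dots, \eta_l$ and $\psi_1, \dots, \psi_l$, find $a \in \mathbb{R}^q$ and $b \in \mathbb{R}^p$ such that
$$\operatorname{Cov}(a^T X, b^T Y)$$
is maximized subject to
– $\operatorname{Corr}(a^T X, \eta_i) = \operatorname{Corr}(b^T Y, \eta_i) = 0$, for all $i = 1, \dots, l$,
– $\operatorname{Corr}(a^T X, \psi_i) = \operatorname{Corr}(b^T Y, \psi_i) = 0$, for all $i = 1, \dots, l$,
– $a^T \Sigma_{XX} a = 1$ and $b^T \Sigma_{YY} b = 1$.
As expected, the solution is $\eta_{l+1} = a_{l+1}^T X$ and $\psi_{l+1} = b_{l+1}^T Y$. This process can continue until $l = k$.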
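Theorem (Härdle and Simar p.447):
Let $\eta = (\eta_1, \dots, \eta_k)^T$ and $\psi = (\psi_1, \dots, \psi_k)^T$. Then
$$\operatorname{Cov}\begin{pmatrix} \eta \\ \psi \end{pmatrix} = \begin{pmatrix} I_k & \Lambda \\ \Lambda & I_k \end{pmatrix}.$$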
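The block structure in this theorem can be verified numerically by stacking the canonical correlation vectors into matrices. This is a minimal sketch on a randomly generated covariance matrix; the names `inv_sqrt`, `A`, `B` and `Lam` are introduced here for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Hypothetical joint covariance of (X, Y) with q = 3, p = 4 (illustration only).
rng = np.random.default_rng(3)
q, p = 3, 4
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

Rx, Ry = inv_sqrt(S_xx), inv_sqrt(S_yy)
Gamma, sing, Delta_T = np.linalg.svd(Rx @ S_xy @ Ry, full_matrices=False)
k = np.linalg.matrix_rank(S_xy)

A = Rx @ Gamma[:, :k]        # columns a_1, ..., a_k
B = Ry @ Delta_T.T[:, :k]    # columns b_1, ..., b_k
Lam = np.diag(sing[:k])      # diag(rho_1, ..., rho_k)

# Covariance of the stacked vector (eta, psi), where eta = A^T X and psi = B^T Y.
cov_stacked = np.block([
    [A.T @ S_xx @ A,   A.T @ S_xy @ B],
    [B.T @ S_xy.T @ A, B.T @ S_yy @ B],
])
expected = np.block([[np.eye(k), Lam], [Lam, np.eye(k)]])
print(np.round(cov_stacked, 6))
```

The identity blocks say that the canonical variables within each set are uncorrelated with unit variance; the diagonal off-diagonal blocks say that η_i is correlated only with its partner ψ_i.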
Theorem (invariance to invertible transformation, Härdle and Simar p.448):
Let $X^* = U^T X + u$ and $Y^* = V^T Y + v$, where $U$ and $V$ are invertible matrices, and $u$ and $v$ are constant vectors. Then
– the canonical correlations between $X^*$ and $Y^*$ are the same as those between $X$ and $Y$;
– the canonical correlation vectors of $X^*$ and $Y^*$ are given by $a_i^* = U^{-1} a_i$, $b_i^* = V^{-1} b_i$, $i = 1, \dots, k$.
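Both statements of the invariance theorem can be checked on a small example. The sketch below assumes a randomly generated covariance matrix and two hand-picked invertible matrices `U` and `V`; all names are illustrative. The shifts u and v do not affect covariances, so they do not appear.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def cca(S_xx, S_xy, S_yy):
    """Return (A, B, rho): canonical correlation vectors (as columns) and coefficients."""
    Rx, Ry = inv_sqrt(S_xx), inv_sqrt(S_yy)
    Gamma, sing, Delta_T = np.linalg.svd(Rx @ S_xy @ Ry, full_matrices=False)
    k = np.linalg.matrix_rank(S_xy)
    return Rx @ Gamma[:, :k], Ry @ Delta_T.T[:, :k], sing[:k]

# Hypothetical joint covariance of (X, Y) with q = 2, p = 3 (illustration only).
rng = np.random.default_rng(4)
q, p = 2, 3
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

# Hand-picked invertible matrices (det U = 2, det V = 4).
U = np.array([[2.0, 1.0], [0.0, 1.0]])
V = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]])

# Covariance blocks of X* = U^T X + u and Y* = V^T Y + v.
Sxx_s, Sxy_s, Syy_s = U.T @ S_xx @ U, U.T @ S_xy @ V, V.T @ S_yy @ V

A, B, rho = cca(S_xx, S_xy, S_yy)
_, _, rho_star = cca(Sxx_s, Sxy_s, Syy_s)
print("rho      :", rho)
print("rho_star :", rho_star)

# Candidate vectors from the theorem: a_i* = U^{-1} a_i, b_i* = V^{-1} b_i.
A_star = np.linalg.inv(U) @ A
B_star = np.linalg.inv(V) @ B
print(np.round(A_star.T @ Sxy_s @ B_star, 6))   # diagonal matrix of the rho_i
```

The check compares the transformed vectors through the constraints and the attained covariances rather than against the SVD output of the transformed problem, since singular vectors are only determined up to sign.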