程序代写 PCA Algorithm – Data

PCA Algorithm – Data
Let’s say this is our data matrix (say our houses), where each data point is an d-dimensional row vector.
̈ ̨
T
̊ x1 ‹ ̊‹ ̊‹
̊ xT ‹ ̊‹
X“2
̊ . ‹
̊ . ‹ ̊‹ ̋‚
xTn
The dimensions are n ˆ d
⃝c -Trenn, King’s College London
2

PCA Algorithm
Step 1: Compute the mean row vector x ̄ “ n1 řni“1 xi ̈ ̨ ̈ ̨
T
̊1‹ ̊ x ̄ ‹ ̊‹ ̊‹ ̊‹ ̊‹ ̊‹ ̊T‹ ̊1‹ ̊ x ̄ ‹
Step2:ComputethemeanrowmatrixX ̄ “ ̊ ‹ ̈x ̄T “ ̊ ‹
̊.‹ ̊ . ‹ ̊.‹ ̊ . ‹ ̊‹ ̊‹ ̋‚ ̋‚
1 x ̄T
The dimensions are n ˆ d
⃝c -Trenn, King’s College London
3

PCA Algorithm
Step 3: Subtract mean (obtain mean centred data)
The dimensions are n ˆ d Example:
B “ X ́ X ̄
50 40 30 20 10
-3 -2 -1 -10
-20 -30
1 2 3 4 5 6 7
50 40 30 20 10
-3 -2 -1 -10
-20 -30
1 2 3 4 5 6 7
⃝c -Trenn, King’s College London
4

PCA Algorithm
Step 4: Compute the covariance matrix of rows of B C “ n1 B T B
ThedimensionsarepnˆdqT ˆpnˆdq“pdˆnqˆpnˆdq“dˆd
⃝c -Trenn, King’s College London 5

PCA Algorithm
Step 5: Compute the k largest eigenvectors v1 , v1 , . . . , vk of C (not covered how to do this in this module. You use Python or WolframAlpha).
Each eigenvector has dimensions 1 ˆ d
Pro tip: Python doesn’t sort the eigenvectors for you. Sort eigenvectors by decreasing order of eigenvalues.
Step 6: Compute matrix W of k-largest eigenvectors ̈ ̨
̊‹ ̊‹ ̊‹
W“ ̊v v … v ‹ 12k
̊‹ ̋‚
Dimensions of W are pd ˆ kq.
⃝c -Trenn, King’s College London
6

PCA Algorithm
Step 7: Multiply each datapoint xi for i P t1,2,…,nu with WT yi “ WT ̈ xi
Dimensionsofyi arepkˆdqˆpdˆ1q“kˆ1
⃝c -Trenn, King’s College London 7

PCA Algorithm
Step 7: Multiply each datapoint xi for i P t1,2,…,nu with WT yi “ WT ̈ xi
Dimensionsofyi arepkˆdqˆpdˆ1q“kˆ1
Congratulations! You’ve reduced the number of dimensions from d to k!
⃝c -Trenn, King’s College London 8

Why do we compute the covariance matrix?
-10
-20 Example illustration: -30
v2
50 40
v1 30
20 λ1 10
-3 -2 -1 λ 1 2 3 4 5 6 7 2
The covariance matrix measures the correlation between pairs of features. Finding the largest eigenvectors allows us to explain most of the variance in data The more variance is explained by the eigenvectors, the more important they are
⃝c -Trenn, King’s College London 9

Why do we compute the covariance matrix?
We can measure the explained variance by considering the quantity
řri“1 λi řdi“1 λi
Example:
⃝c -Trenn, King’s College London
10