PCA Example
Say you have a bunch of house listings and you would like to group them into student housing, regular housing and luxury housing.
© -Trenn, King’s College London

PCA Example
Let’s say we have the following features:
• Floor size (m²)
• Number of rooms
• Distance to supermarket
• Distance to King’s
• Hipster vibe
Let’s say we want a nice visual representation. Can we reduce the data to two features?

Why does PCA work?
Reduce to two or three features:
• Size
  ◦ Floor size (m²)
  ◦ Number of rooms
• Location
  ◦ Distance to supermarket
  ◦ Distance to King’s
  ◦ Hipster vibe
Why does this make sense?

PCA
[Figure: size (m²) vs. number of rooms]
For example, the floor size and the number of rooms are often correlated.
Let’s see what it would look like if we compressed both dimensions into one dimension.
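To make "often correlated" concrete, here is a minimal sketch that computes the Pearson correlation between the two features. The seven listings are made up for illustration and are not the data behind the slide's plot.

```python
import math

# Hypothetical listings (made up for illustration, not taken from the slides).
rooms = [1, 2, 3, 4, 5, 6, 7]
size_m2 = [22, 28, 33, 41, 46, 52, 60]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson(rooms, size_m2)
print(f"correlation between rooms and size: {r:.3f}")
```

A value close to 1 means that one feature largely predicts the other, which is why compressing both into a single dimension loses little.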

PCA
[Figure: size (m²) vs. number of rooms, with the fitted line and the projected points]
If we take the line that minimises the sum of squared distances from the points to the line (least squares), we get the following projection of the points.
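The projection step can be sketched in a few lines. This is an illustration under assumptions, not the computation behind the slide: the listings are made up, and the angle formula is the closed form for the two-dimensional case of the line through the mean with the smallest sum of squared perpendicular distances.

```python
import math

# Hypothetical listings (made up for illustration, not taken from the slides).
rooms = [1, 2, 3, 4, 5, 6, 7]
size_m2 = [22, 28, 33, 41, 46, 52, 60]

n = len(rooms)
mean_x = sum(rooms) / n
mean_y = sum(size_m2) / n

# Centre the data so that the line passes through the mean point.
dx = [x - mean_x for x in rooms]
dy = [y - mean_y for y in size_m2]

# Entries of the 2x2 covariance matrix.
sxx = sum(a * a for a in dx) / n
syy = sum(b * b for b in dy) / n
sxy = sum(a * b for a, b in zip(dx, dy)) / n

# Angle of the line through the mean with the smallest sum of squared
# perpendicular distances (closed form for the two-dimensional case).
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
ux, uy = math.cos(theta), math.sin(theta)

# One number per listing: its position along the line.
coords = [a * ux + b * uy for a, b in zip(dx, dy)]

# The same projected points expressed in the original two dimensions.
projected = [(mean_x + t * ux, mean_y + t * uy) for t in coords]

# Signed perpendicular distance of each original point from the line.
residuals = [b * ux - a * uy for a, b in zip(dx, dy)]

for (px, py), r in zip(projected, residuals):
    print(f"projected to ({px:.2f}, {py:.2f}), distance to line {abs(r):.2f}")
```

The list `coords` is the one-dimensional representation of the listings; `residuals` is the information that is thrown away.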

PCA
[Figure: size (m²) vs. number of rooms]
After cleaning up, this is what we get.

PCA
What if we take a different line (purple)?
[Figure: size (m²) vs. number of rooms, with the alternative line in purple]

PCA
[Figure: size (m²) vs. number of rooms, with the black and the purple line]
The spread here is the variance of the data, and we would like to maximise it.
Intuitively, the more variance we capture, the better we can approximate the higher-dimensional space (here d = 2).
If we compare the two lines, we see that the points are less spread out on the purple line.
The black line actually maximises the spread and is therefore the best for approximating the higher-dimensional space.
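The comparison of the two lines can be reproduced numerically. The sketch below uses made-up listings and a brute-force scan over directions in one-degree steps; the 20-degree direction is an arbitrary stand-in for the purple line, since the slide's actual line is not recoverable from the text.

```python
import math

# Hypothetical listings (made up for illustration, not taken from the slides).
rooms = [1, 2, 3, 4, 5, 6, 7]
size_m2 = [22, 28, 33, 41, 46, 52, 60]

def centred(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

dx = centred(rooms)
dy = centred(size_m2)

def variance_along(theta):
    """Variance of the points after projecting them onto the direction at angle theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    coords = [a * ux + b * uy for a, b in zip(dx, dy)]
    # The data are centred, so the mean of the projected coordinates is zero.
    return sum(t * t for t in coords) / len(coords)

# Try every direction in one-degree steps and keep the one with the largest spread.
angles = [math.radians(deg) for deg in range(180)]
best_theta = max(angles, key=variance_along)

# Arbitrary alternative direction, standing in for the purple line.
other_theta = math.radians(20)

# Total variance of the 2D data: variance along x plus variance along y.
total_variance = variance_along(0.0) + variance_along(math.pi / 2)

print(f"best direction:  {math.degrees(best_theta):.0f} degrees, "
      f"captures {variance_along(best_theta) / total_variance:.1%} of the variance")
print(f"other direction: {math.degrees(other_theta):.0f} degrees, "
      f"captures {variance_along(other_theta) / total_variance:.1%} of the variance")
```

For any direction, the variance along it plus the variance perpendicular to it equals the total variance. Maximising the spread along the line is therefore the same as minimising the squared distances to the line.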

Why should we maximise the variance?
[Figure: two panels with the same axes]
Left: a new input.
Right: two potential lines onto which we can project. Consider projecting onto a horizontal line and onto a vertical line.

Why should we maximise the variance?
This is what the output would look like.
Which line retains more information?
Clearly the black line: all points on the purple line end up at the same location.
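A small numeric version of this extreme case, as a sketch on made-up points. It assumes an input whose points differ in only one coordinate; which colour the slide gives to the horizontal and the vertical line is not recoverable here, so the code names the lines by orientation instead.

```python
# Hypothetical input (made up): the points differ only in their first coordinate.
points = [(1, 40), (2, 40), (3, 40), (5, 40), (7, 40)]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Projecting onto a horizontal line keeps the first coordinate,
# projecting onto a vertical line keeps the second one.
on_horizontal = [x for x, _ in points]
on_vertical = [y for _, y in points]

print("variance on the horizontal line:", variance(on_horizontal))
print("variance on the vertical line:  ", variance(on_vertical))
print("distinct positions on the vertical line:", len(set(on_vertical)))
```

The line with zero variance maps every point to the same position, so the points can no longer be told apart. The line with the larger variance keeps them distinguishable.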

3D to 2D
Let’s say our three dimensions (x1, x2 and x3) are as on the left-hand side:
• Distance to supermarket
• Distance to King’s
• Hipster vibe
Then, after reducing to 2D, the data looks like the right-hand side.
We may also wish to reduce it to just a line (1D), but we can see that this would be very lossy.
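How lossy a reduction is can be quantified by the share of the total variance that the kept directions capture. The following sketch uses made-up 3D points that lie exactly in a plane, and a plain power iteration with deflation as a dependency-free stand-in for a library eigenvalue routine. The helper names `covariance_matrix` and `top_eigenvalues` are hypothetical.

```python
import math

# Hypothetical 3D data (made up): x3 is an exact mix of x1 and x2,
# so all points lie in a plane inside the 3D space.
points = [(t, s, 0.5 * t + 0.5 * s)
          for t in (-2.0, -1.0, 0.0, 1.0, 2.0)
          for s in (-1.0, 0.0, 1.0)]

def covariance_matrix(rows):
    """Population covariance matrix of a list of equal-length tuples."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]

def top_eigenvalues(matrix, k, iterations=200):
    """Largest k eigenvalues of a symmetric positive semi-definite matrix,
    found by power iteration with deflation (assumes the start vector is
    not orthogonal to the directions being searched for)."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    eigenvalues = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary start vector
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(v[i] * mv[i] for i in range(d))  # Rayleigh quotient, v has length 1
        eigenvalues.append(lam)
        # Deflation: remove the direction just found from the matrix.
        for i in range(d):
            for j in range(d):
                m[i][j] -= lam * v[i] * v[j]
    return eigenvalues

cov = covariance_matrix(points)
total_variance = sum(cov[i][i] for i in range(len(cov)))
lam1, lam2 = top_eigenvalues(cov, 2)

explained_1d = lam1 / total_variance
explained_2d = (lam1 + lam2) / total_variance
print(f"variance kept in 1D: {explained_1d:.1%}")
print(f"variance kept in 2D: {explained_2d:.1%}")
```

For this toy data, two directions keep essentially all of the variance, while a single direction drops a substantial share. This matches the observation that going from 3D to 2D can be nearly free while going down to 1D is lossy.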

5D to 3D
If we plot our 5D data using the components we found (one for size and two for location), we get this 3D plot.
We can see that our different classes (student housing, regular and luxury) are well separated.
This is the whole point: reduce the information, but keep the important information!
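Plotting 5D data "using the components" means computing, for each listing, one coordinate per component. The sketch below shows only that projection step. The three component vectors are hand-picked stand-ins that follow the size/location grouping from the earlier slide; they are not the result of running PCA. The listings are made up and assumed to be already centred and rescaled, and `project` is a hypothetical helper name.

```python
import math

# Feature order: floor size, number of rooms, distance to supermarket,
# distance to King's, hipster vibe.
# Hypothetical, already centred and rescaled feature values (made up).
listings = {
    "listing A": [-1.1, -1.0, -0.4, -0.9, 0.6],
    "listing B": [0.1, 0.0, 0.7, 0.8, -1.0],
    "listing C": [1.3, 1.2, -0.2, 0.3, 0.9],
}

s = 1.0 / math.sqrt(2.0)
# Hand-picked stand-ins for the three components (NOT computed by PCA).
components = [
    [s, s, 0.0, 0.0, 0.0],      # "size": floor size and number of rooms
    [0.0, 0.0, s, s, 0.0],      # "location 1": the two distances
    [0.0, 0.0, 0.0, 0.0, 1.0],  # "location 2": hipster vibe
]

def project(x, comps):
    """Coordinates of the 5D vector x along each component direction."""
    return [sum(c_j * x_j for c_j, x_j in zip(c, x)) for c in comps]

for name, features in listings.items():
    z = project(features, components)
    print(name, [round(v, 2) for v in z])
```

Each listing is now described by three numbers instead of five, and those three numbers are the coordinates one would use for a 3D plot.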