How should we choose k?
⃝c -Trenn, King’s College London 2
How should we choose k?
⃝c -Trenn, King’s College London 3
Elbow method
Source: https://towardsdatascience.com/
The more clusters the better the lower the cost/score
Of course we could have one cluster per datapoint, but that does not really help One way of strike a good tradeoff is to use the elbow method
⃝c -Trenn, King’s College London 4
Elbow method
Source: https://towardsdatascience.com/
The more clusters the better the lower the cost/score
Of course we could have one cluster per datapoint, but that does not really help
One way of strike a good tradeoff is to use the elbow method
A natural definition: Pick the k P t2, 3, . . . , n ́ 1u that maximises costk ́1{costk, where costj is the cost of the clustering with j clusters
k “ 1 and k “ n are ruled out as they result in trivial clusters
⃝c -Trenn, King’s College London 5
Flat Clustering
So far we tried to assign the points to k clusters.
We haven’t assumed any structure among the clusters
⃝c -Trenn, King’s College London 6
Clustering
The problem with flat clustering is that it’s flat
Example: Cluster the following news headlines in 3 categories
⃝c -Trenn, King’s College London 7
Clustering
The problem with flat clustering is that it’s flat
Example: Cluster the following news headlines in 3 categories
‚ Deepmind’s AlphaBingo wins world championship
‚ Black holes swallow stars whole according to new study ‚ Someone didn’t dope
‚ Researcher finally figured out the rules of cricket
⃝c -Trenn, King’s College London 8
Clustering
The problem with flat clustering is that it’s flat
Example: Cluster the following news headlines in 3 categories
CS
Physics
Sports Sports
‚ Deepmind’s AlphaBingo wins world championship
‚ Black holes swallow stars whole according to new study ‚ Someone didn’t dope
‚ Researcher finally figured out the rules of cricket
⃝c -Trenn, King’s College London 9
Clustering
The problem with flat clustering is that it’s flat
Example: Cluster the following news headlines in 3 categories
Science
Science
Cycling
‚ Deepmind’s AlphaBingo wins world championship
‚ Black holes swallow stars whole according to new study
‚ Someone didn’t dope
‚ Researcher finally figured out the rules of cricket Structure is lost …
Cricket
⃝c -Trenn, King’s College London 10
Hierarchical Clustering
Recursive partitioning of data at increasingly finer granularity represented as a tree The leaves of the hierarchical cluster tree represent data.
Sci
Sports
News
⃝c -Trenn, King’s College London
11
CS
Phy
Hierarchical Clustering
Sci
Sports
News
⃝c -Trenn, King’s College London
12
CS
Phy
Hierarchical Clustering
News
Sci
Sports
CS
Phy
‚Google’s AlphaBingo wins world champinship ‚Someone finally figured out why neural nets work ‚Black holes swallow stars whole according to new study ‚Someone didn’t dope
‚Researcher finally figured out the rules of cricket
⃝c -Trenn, King’s College London 13
Hierarchical Clustering
News
Sci
Sports
Science Science Science Sports Sports
‚Google’s AlphaBingo wins world champinship ‚Someone finally figured out why neural nets work ‚Black holes swallow stars whole according to new study ‚Someone didn’t dope
‚Researcher finally figured out the rules of cricket
CS
Phy
⃝c -Trenn, King’s College London 14
Hierarchical Clustering
News
Sci
Sports
CS
CS Physics Cycling Cricket
‚Google’s AlphaBingo wins world champinship ‚Someone finally figured out why neural nets work ‚Black holes swallow stars whole according to new study ‚Someone didn’t dope
‚Researcher finally figured out the rules of cricket
CS
Phy
⃝c -Trenn, King’s College London 15