Introduction
Unsupervised Learning
Clustering
K-Means Clustering
Fuzzy K-means Clustering
Iterative Optimisation
Hierarchical Clustering
Competitive Learning
Clustering for Unknown Number of Clusters
Introduction
Previously, all our training samples were labelled: these methods were said to be "supervised".
We now investigate "unsupervised" procedures, which use unlabelled samples.
Collecting and labelling a large set of sample patterns can be costly.
We can train with large amounts of (less expensive) unlabelled data, and only then use supervision to label the groupings found.
Unsupervised Learning
This is appropriate for large “data mining” applications where the contents of a large database are not known beforehand.
This is also appropriate in many applications when the characteristics of the patterns can change slowly with time.
Improved performance can often be achieved if classifiers running in an unsupervised mode are used.
We can use unsupervised methods to identify features that will then be useful for categorisation.
We gain some insight into the nature (or structure) of the data.
Clustering
What is clustering?
A cluster is a group of samples with similar properties.
Clustering assigns each sample to one of K clusters, such that samples in the same cluster are similar to each other but dissimilar to samples in other clusters.
Clustering for understanding: By putting similar objects together, it allows us to understand the common characteristics of the objects. For example: retrieving relevant information from the Internet; analysing the market by grouping similar types of customers; extracting useful patterns from a large amount of genetic information.
Clustering for utility: Use the extracted information in different areas, for example summarisation, compression, and finding the nearest neighbours.
Types of Clusterings
[Diagram: clustering methods divide into partitional and hierarchical approaches; hierarchical clustering can be agglomerative or divisive.]
Partitional clustering: It divides the samples into clusters such that each sample is in one cluster.
Hierarchical clustering: It divides the samples into nested clusters organised as a tree. Each cluster is the union of its sub-clusters.
Agglomerative approach: It treats each sample as a single cluster at the beginning and then merges them into larger clusters in a bottom-up fashion.
Divisive approach: It treats the whole set of samples as a single cluster and then breaks it down into smaller clusters in a top-down fashion.
[Figure 1: Examples of clusters – different ways of clustering the same set of points: (a) original points, (b) two clusters, (c) four clusters, (d) six clusters.]
K-Means Clustering
Problem: Given c clusters/classes (K = c) and n samples, xi, i = 1, 2, …, n, assign each sample to one of the c clusters.
Goal: Find the c mean vectors, m1, m2, …, mc.
Criterion: Minimise the distance of each sample xi to the mean mj of its assigned cluster, j ∈ {1, 2, …, c}.
Use the Euclidean distance as the metric and ∥xi − mj∥² as the cost function.
Group the samples according to the closest mean.
An iterative process is employed to adjust mj until classes do not change.
Algorithm: K-means clustering
begin initialise n, c (K = c), m1, m2, …, mc
   do classify n samples according to the nearest mj, j ∈ {1,2,…,c} †
      recompute mj ‡
   until no change in mj
   return m1, m2, …, mc
end
† minimise the Euclidean distance
‡ average of the samples in the jth cluster
Table 1: Pseudo Code: Algorithm for K-means clustering.
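The following is a minimal NumPy sketch of the procedure in Table 1 (an illustration, not the module's reference implementation); the function and argument names are my own.

```python
import numpy as np

def k_means(X, initial_means, max_iter=100):
    """Basic K-means following Table 1.

    X: (n, d) array of samples; initial_means: (c, d) array of initial centres.
    Returns the final means and the cluster label of each sample.
    """
    means = np.asarray(initial_means, dtype=float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # classify the n samples according to the nearest mean (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)   # (n, c)
        labels = dists.argmin(axis=1)
        # recompute each mean as the average of the samples in its cluster
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
                              for j in range(len(means))])
        if np.allclose(new_means, means):        # until no change in mj
            break
        means = new_means
    return means, labels
```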
Example: Given the data set S and the initial centres as m1 = x1 and m2 = x6,
perform the K-means algorithm with K = 2 until the means converge.
S = {x1, x2, x3, x4, x5, x6} = {(−1, 3), (1, 2), (0, 1), (4, 0), (5, 4), (3, 2)}
Example: Initialisation: c = 2; m1 = (−1, 3) and m2 = (3, 2)
First iteration:
Step 1: Compute ∥xi −mj∥, i = 1, 2, …, 6; j = 1, 2 and classify samples.
i  xi       ∥xi − m1∥  ∥xi − m2∥  Assigned cluster
1  (−1, 3)  0.0000     4.1231     1
2  (1, 2)   2.2361     2.0000     2
3  (0, 1)   2.2361     3.1623     1
4  (4, 0)   5.8310     2.2361     2
5  (5, 4)   6.0828     2.8284     2
6  (3, 2)   4.1231     0.0000     2
Step 2: Recompute mj.
m1 = ((−1,3)+(0,1))/2 = (−0.5, 2), m2 = ((1,2)+(4,0)+(5,4)+(3,2))/4 = (3.25, 2)
Second iteration:
m1 = (−0.5, 2) and m2 = (3.25, 2)
Step 1: Compute ∥xi −mj∥, i = 1, 2, …, 6; j = 1, 2 and classify samples.
i  xi       ∥xi − m1∥  ∥xi − m2∥  Assigned cluster
1  (−1, 3)  1.1180     4.3661     1
2  (1, 2)   1.5000     2.2500     1
3  (0, 1)   1.1180     3.4004     1
4  (4, 0)   4.9244     2.1360     2
5  (5, 4)   5.8523     2.6575     2
6  (3, 2)   3.5000     0.2500     2
Step 2: Recompute mj.
m1 = ((−1,3)+(1,2)+(0,1))/3 = (0, 2), m2 = ((4,0)+(5,4)+(3,2))/3 = (4, 2).
Third iteration: m1 = (0, 2) and m2 = (4, 2) do not change, so the algorithm terminates with these means.
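As a quick check (an illustrative sketch rather than part of the original example), the loop below reproduces the converged means m1 = (0, 2) and m2 = (4, 2) for this data set and these initial centres.

```python
import numpy as np

X = np.array([(-1, 3), (1, 2), (0, 1), (4, 0), (5, 4), (3, 2)], dtype=float)
means = np.array([(-1, 3), (3, 2)], dtype=float)        # m1 = x1, m2 = x6

for _ in range(100):                                    # iterate until the means converge
    # Step 1: assign each sample to the nearest mean
    labels = np.linalg.norm(X[:, None] - means[None, :], axis=2).argmin(axis=1)
    # Step 2: recompute the means
    new_means = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new_means, means):
        break
    means = new_means

print(means)    # [[0. 2.] [4. 2.]] – i.e. m1 = (0, 2), m2 = (4, 2)
```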
General Measures of Similarity
Define a cost function as the sum of squared Euclidean distances of the samples in each cluster from the cluster mean mj:
Je = ∑_{j=1}^{c} ∑_{x∈Dj} ∥x − mj∥²
This criterion defines clusters by their mean vectors mj, in the sense that it minimises the sum of the squared lengths of the errors x − mj.
The optimal partition is defined as the one that minimises Je; it is also called the minimum-variance partition.
[Figure: when two natural groupings have very different numbers of points, minimising the sum-squared-error criterion may not reveal the underlying structure – Je can be larger for the natural clustering than for a less natural partition.]
It works fine when clusters form well-separated, compact clouds, but less well when there are great differences in the number of samples in different clusters.
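As a small illustration (a sketch using the example data set and the final partition found above; the variable names are mine), Je can be evaluated as follows.

```python
import numpy as np

X = np.array([(-1, 3), (1, 2), (0, 1), (4, 0), (5, 4), (3, 2)], dtype=float)
means = np.array([(0, 2), (4, 2)], dtype=float)   # final means from the worked example
labels = np.array([0, 0, 0, 1, 1, 1])             # final cluster assignments

# Je = sum over clusters of the squared distances of the members from their mean
Je = sum(np.sum((X[labels == j] - means[j]) ** 2) for j in range(len(means)))
print(Je)    # 4.0 + 10.0 = 14.0
```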
K-Means Clustering
Clustering results are sensitive to initial locations of the centres
It works fine when clusters form well-separated, compact clouds
Simple to implement
Variations on standard k-means use distance metrics other than Euclidean distance
K-means clustering is also known as C-means clustering
Fuzzy K-means Clustering
Same as K-means clustering, except:
A sample can have graded, or fuzzy, membership of a cluster.
Membership of sample xj in cluster i is defined by μij, with
μij ∈ [0, 1] and ∑_{i=1}^{c} μij = 1 for each j.
Cluster centres: mi = (∑_{j=1}^{n} μij^b xj) / (∑_{j=1}^{n} μij^b), where b ≥ 1.
A sample can therefore partially belong to multiple clusters.
Not a (hard) partitional clustering method.
Fuzzy K-means clustering is also known as fuzzy C-means clustering.
Goal is to find the cluster centres: m1, m2, …, mc
Each cluster centre is the mean of all the samples weighted by the membership value μij (also called the membership grade).
An iterative algorithm adjusts the cluster centres and the membership values to minimise the weighted sum of squared Euclidean distances between the centres and the samples:
Je = ∑_{i=1}^{c} ∑_{j=1}^{n} μij^b ∥xj − mi∥²
where b ≥ 1 (called the "fuzzifier").
Small b causes samples to be members of one or a few clusters; large b causes samples to be members of many clusters.
Example: Assume K = 2, with the two cluster centres denoted by m1 and m2.
S = {x1, x2, x3, x4, x5, x6} = {(−1, 3), (1, 2), (0, 1), (4, 0), (5, 4), (3, 2)}
Given membership values:
(μ11, μ21) = (0.1, 0.9) for x1, (μ12, μ22) = (0.2, 0.8) for x2, (μ13, μ23) = (0.3, 0.7) for x3,
(μ14, μ24) = (0.95, 0.05) for x4, (μ15, μ25) = (0.85, 0.15) for x5, (μ16, μ26) = (0.75, 0.25) for x6
Cluster centres:
m1 = (∑_{j=1}^{6} μ1j^b xj) / (∑_{j=1}^{6} μ1j^b) = (μ11^b x1 + μ12^b x2 + μ13^b x3 + μ14^b x4 + μ15^b x5 + μ16^b x6) / (μ11^b + μ12^b + μ13^b + μ14^b + μ15^b + μ16^b)
m2 = (∑_{j=1}^{6} μ2j^b xj) / (∑_{j=1}^{6} μ2j^b) = (μ21^b x1 + μ22^b x2 + μ23^b x3 + μ24^b x4 + μ25^b x5 + μ26^b x6) / (μ21^b + μ22^b + μ23^b + μ24^b + μ25^b + μ26^b)
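As an illustrative sketch (not from the slides), the two centres can be evaluated numerically for these membership values; the fuzzifier value b = 2 is an assumption made here for the example.

```python
import numpy as np

X = np.array([(-1, 3), (1, 2), (0, 1), (4, 0), (5, 4), (3, 2)], dtype=float)
# mu[i, j] = membership of sample xj in cluster i+1 (values given in the example)
mu = np.array([[0.10, 0.20, 0.30, 0.95, 0.85, 0.75],
               [0.90, 0.80, 0.70, 0.05, 0.15, 0.25]])
b = 2.0                                        # fuzzifier; b = 2 is an assumed value

w = mu ** b                                    # weights mu_ij^b
centres = (w @ X) / w.sum(axis=1, keepdims=True)
print(centres)                                 # the weighted means m1 and m2
```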
Algorithm: Fuzzy K-means clustering
begin initialise n, c (K = c), μij (i = 1,2,…,c; j = 1,2,…,n), m1, m2, …, mc
   normalise cluster memberships: μij ← μij / ∑_{r=1}^{c} μrj
   do update cluster centres: mi ← (∑_{j=1}^{n} μij^b xj) / (∑_{j=1}^{n} μij^b)
      update memberships: μij ← (1/∥xj − mi∥²)^{1/(b−1)} / ∑_{r=1}^{c} (1/∥xj − mr∥²)^{1/(b−1)}
   until no change in mi
   return m1, m2, …, mc
end
Table 2: Pseudo Code: Algorithm for Fuzzy K-means clustering.
Classification: classify samples according to the nearest mj, j ∈ {1, 2, …, c}.
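A compact NumPy sketch of the loop in Table 2 (an interpretation with assumed defaults such as random initial memberships; not the module's reference code):

```python
import numpy as np

def fuzzy_k_means(X, c, b=2.0, max_iter=100, tol=1e-6, seed=None):
    """Fuzzy K-means (fuzzy C-means) following the updates in Table 2."""
    rng = np.random.default_rng(seed)
    n = len(X)
    mu = rng.random((c, n))
    mu /= mu.sum(axis=0, keepdims=True)              # normalise memberships over clusters
    centres = np.zeros((c, X.shape[1]))
    for _ in range(max_iter):
        w = mu ** b
        new_centres = (w @ X) / w.sum(axis=1, keepdims=True)        # update cluster centres
        # update memberships: mu_ij proportional to (1/||xj - mi||^2)^(1/(b-1))
        d2 = ((X[None, :, :] - new_centres[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                   # guard against division by zero
        inv = (1.0 / d2) ** (1.0 / (b - 1.0))
        mu = inv / inv.sum(axis=0, keepdims=True)
        if np.allclose(new_centres, centres, atol=tol):             # until no change in mi
            centres = new_centres
            break
        centres = new_centres
    labels = mu.argmax(axis=0)      # classify by largest membership (nearest centre)
    return centres, mu, labels
```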
Iterative Optimisation
Once a criterion function has been selected, clustering becomes a problem of discrete optimisation.
As the sample set is finite there is a finite number of possible partitions, and the optimal one can always be found by exhaustive search.
Most frequently, an iterative optimisation procedure is adopted to seek the optimal partition.
The basic idea is to start from a reasonable initial partition and "move" samples from one cluster to another in order to minimise the criterion function.
In general, these kinds of approaches guarantee local, not global, optimisation.
Iterative Clustering
Let us consider an iterative procedure to minimise the sum-of-squared-error criterion
Je = ∑_{j=1}^{c} Jj, where Jj = ∑_{x∈Dj} ∥x − mj∥² is the effective error per cluster.
It can be proved that if a sample x̂ currently in cluster Di is tentatively moved into Dj, the change of the errors in the two clusters is
Jj* = Jj + (nj/(nj + 1)) ∥x̂ − mj∥²,  Ji* = Ji − (ni/(ni − 1)) ∥x̂ − mi∥²
where ni and nj denote the number of data points in clusters i and j, respectively.
Right move (the transfer reduces Je): (ni/(ni − 1)) ∥x̂ − mi∥² > (nj/(nj + 1)) ∥x̂ − mj∥²
After moving, update mi and mj
Pseudo Code: Algorithm for basic iterative minimum-squared-error clustering
begin initialise n, c, m1, m2, …, mc
   do randomly select a sample x̂
      i ← argmin_{i′} ∥x̂ − m_{i′}∥ (classify x̂)
      if ni ≠ 1 (do not destroy a singleton cluster)
         compute ρj = (nj/(nj + 1)) ∥x̂ − mj∥² for j ≠ i and ρi = (ni/(ni − 1)) ∥x̂ − mi∥², j = 1, 2, …, c
         k ← argmin_j ρj (find the minimum ρj)
         transfer x̂ to Dk from Di
         recompute Je, mi, mk
      end
   until no change in Je in n attempts
   return m1, m2, …, mc
end
Table 3: Pseudo Code: Algorithm for basic iterative minimum-squared-error clustering.
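A rough Python sketch of Table 3 (my interpretation: the stopping test "no change in Je in n attempts" is approximated by "no transfer in n attempts", and the initial partition is supplied by the caller):

```python
import numpy as np

def iterative_mse_clustering(X, labels, c, max_attempts=None, seed=None):
    """Sequential minimum-squared-error clustering (Table 3).

    X: (n, d) samples; labels: initial cluster index of every sample
    (every index 0..c-1 is assumed to appear at least once).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n = len(X)
    max_attempts = max_attempts or n
    means = np.array([X[labels == j].mean(axis=0) for j in range(c)])
    counts = np.bincount(labels, minlength=c)
    unchanged = 0
    while unchanged < max_attempts:              # "no change in n attempts" stopping rule
        idx = rng.integers(n)                    # randomly select a sample x_hat
        x_hat = X[idx]
        i = labels[idx]                          # current cluster D_i of x_hat
        if counts[i] == 1:                       # do not destroy a singleton cluster
            unchanged += 1
            continue
        d2 = ((means - x_hat) ** 2).sum(axis=1)
        rho = counts / (counts + 1.0) * d2       # cost of moving x_hat into D_j, j != i
        rho[i] = counts[i] / (counts[i] - 1.0) * d2[i]   # saving of removing it from D_i
        k = int(rho.argmin())                    # find the minimum rho_j
        if k != i:                               # transfer x_hat from D_i to D_k
            means[i] = (means[i] * counts[i] - x_hat) / (counts[i] - 1)
            means[k] = (means[k] * counts[k] + x_hat) / (counts[k] + 1)
            counts[i] -= 1
            counts[k] += 1
            labels[idx] = k
            unchanged = 0
        else:
            unchanged += 1
    return means, labels
```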
Iterative Clustering
This procedure is a sequential version of the K-means algorithm, with the difference that K-means waits until all n samples have been reclassified before updating the means, whereas this procedure updates the means each time a sample is reclassified.
This procedure is more prone to being trapped in local minima and depends on the order of presentation of the samples, but it is online!
The starting point is always a problem.
Hierarchical Clustering
The partitional clustering methods have only formed disjoint clusters.
Hierarchical clustering methods split clusters into sub-clusters, sub-clusters into sub-sub-clusters, and so on.
Hierarchical information can be extracted from the dataset.
Agglomerative Hierarchical Clustering
Goal: Given n samples, with each sample initially representing its own cluster, merge the nearest clusters one by one at each level until c clusters are left.
First, each cluster contains exactly one sample (first level);
then partition into n − 1 clusters by merging the two nearest clusters into one (second level);
the next partition into n − 2 clusters by merging the two nearest clusters into one (third level); · · · until c clusters are achieved.
Pseudo Code: Algorithm for agglomerative hierarchical clustering
begin initialise c, ĉ ← n, Di ← {xi}, i = 1, 2, …, n
   do ĉ ← ĉ − 1
      find the nearest clusters, say, Di and Dj
      merge Di and Dj
   until c = ĉ
   return c clusters
end
c: target number of clusters; ĉ: current number of clusters.
Table 4: Pseudo Code: Algorithm for Agglomerative hierarchical clustering.
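A minimal sketch of Table 4 (illustrative only; here the "nearest clusters" are found under the single-link distance, one of the inter-cluster distance measures defined below):

```python
import numpy as np

def agglomerative_cluster(X, c):
    """Agglomerative hierarchical clustering down to c clusters (Table 4)."""
    clusters = [[i] for i in range(len(X))]       # D_i = {x_i}: one cluster per sample

    def single_link(A, B):                        # minimum distance between members
        return min(np.linalg.norm(X[a] - X[b]) for a in A for b in B)

    while len(clusters) > c:                      # c_hat decreases by one at every merge
        # find the nearest pair of clusters D_i and D_j
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge D_j into D_i
        del clusters[j]
    return clusters                               # each cluster is a list of sample indices
```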
Agglomerative Hierarchical Clustering
How to define inter-cluster similarity? A distance measure between clusters is needed.
Single-link clustering – the distance between clusters is the minimum distance between constituent samples:
dmin(Di, Dj) = min_{x∈Di, x′∈Dj} ∥x − x′∥
Complete-link clustering – the distance between clusters is the maximum distance between constituent samples:
dmax(Di, Dj) = max_{x∈Di, x′∈Dj} ∥x − x′∥
Group-average clustering – the distance between clusters is the average distance between all pairs of samples:
davg(Di, Dj) = (1/(ni nj)) ∑_{x∈Di} ∑_{x′∈Dj} ∥x − x′∥, where ni and nj are the number of elements in clusters i and j, respectively.
Centroid clustering – the distance between clusters is the distance between the averages of the feature vectors (the centroids) in each cluster:
dmean(Di, Dj) = ∥mi − mj∥, where mi and mj are the cluster centres for clusters i and j, respectively.
Agglomerative Hierarchical Clustering
[Figure 2: An example of agglomerative hierarchical clustering of eight samples x1, …, x8, shown as (a) a dendrogram, in which all eight points start in singleton clusters at level 1 and the two most similar clusters are merged at each subsequent level (levels 2–8), with the vertical axis showing a generalised measure of similarity among clusters, and (b) an equivalent set (Venn diagram) representation.]
Textual representation: {{x1,{x2,x3}},{{{x4,x5},{x6,x7}},x8}}
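The bracketed textual representation maps naturally onto nested tuples; a small illustrative sketch (the sample names are plain strings here):

```python
# nested-tuple encoding of {{x1,{x2,x3}},{{{x4,x5},{x6,x7}},x8}}
tree = (("x1", ("x2", "x3")), ((("x4", "x5"), ("x6", "x7")), "x8"))

def to_braces(node):
    """Render the hierarchy in the bracketed notation used above."""
    if isinstance(node, str):
        return node
    return "{" + ",".join(to_braces(child) for child in node) + "}"

print(to_braces(tree))   # {{x1,{x2,x3}},{{{x4,x5},{x6,x7}},x8}}
```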