Hierarchical and k-Means Clustering
Khasha Dehnad
Clustering Task

• Clustering refers to grouping records, observations, or cases into classes of similar objects
– A cluster is a collection of records similar to one another
– Records in one cluster are dissimilar to records in other clusters
• Clustering is an unsupervised data mining task
– Therefore, no target variable is specified
• Clustering algorithms segment records so as to maximize homogeneity within subgroups
– Similarity to records outside the cluster is minimized
Clustering Task (cont’d)

• For example, Claritas, Inc. provides demographic profiles of geographic areas, according to zip code
– The PRIZM segmentation system clusters zip codes in terms of lifestyle types
– Recall the clusters identified for 90210 Beverly Hills, CA:
– Cluster 01: Blue Blood Estates
– Cluster 02: Winner’s Circle
– Cluster 07: Money and Brains
– Cluster 08: Young Literati
– Cluster 10: Bohemian Mix
• Cluster 01, Blue Blood Estates: “Established executives, professionals, and ‘old money’ heirs that live in America’s wealthiest suburbs…”
http://www.claritas.com/MyBestSegments/Default.jsp?ID=20#
Clustering Task (cont’d)

• Clustering tasks in business and research:
– Target marketing for a niche product, without a large marketing budget
– Segmenting financial behavior into benign and suspicious categories
– Gene expression clustering, where genes exhibit similar characteristics
• Clustering is often performed as a preliminary step in the data mining process
– Clustering results are then used as input to other data mining techniques
Clustering Task (cont’d)

• Clustering identifies groups of highly similar records
• Algorithms construct clusters where between-cluster variation (BCV) is large compared to within-cluster variation (WCV)
– Analogous to the concept behind analysis of variance
[Figure: two clusters of points, with arrows illustrating between-cluster variation and within-cluster variation]
Clustering Task (cont’d)

• Applying cluster analysis to enormous databases is helpful
– Reduces the search space for downstream algorithms
• Cluster analysis addresses issues similar to those encountered in classification:
– Similarity measurement
– Recoding categorical variables
– Standardizing and normalizing variables
– Number of clusters
Distance Function

• How is similarity defined between an unclassified record and its neighbors?
• A distance metric is a real-valued function d, used to measure the similarity between coordinates x, y, and z, with properties:

1. d(x, y) \ge 0, and d(x, y) = 0 if and only if x = y
2. d(x, y) = d(y, x)
3. d(x, z) \le d(x, y) + d(y, z)

– Property 1: Distance is always non-negative
– Property 2: Commutative; the distance from “A to B” is the distance from “B to A”
– Property 3: The triangle inequality holds; the distance from “A to C” must be less than or equal to the distance from “A to B to C”
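To see the three properties in action, here is a minimal Python check (an illustration of mine, not part of the lecture) that verifies them numerically for Euclidean distance on a few sample points:

```python
import math
import itertools

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

points = [(1, 3), (3, 3), (1, 1)]

for x, y in itertools.product(points, repeat=2):
    assert euclidean(x, y) >= 0                      # Property 1: non-negative
    assert (euclidean(x, y) == 0) == (x == y)        # ... and 0 iff x == y
    assert euclidean(x, y) == euclidean(y, x)        # Property 2: symmetric
for x, y, z in itertools.product(points, repeat=3):
    # Property 3: triangle inequality (small tolerance for float rounding)
    assert euclidean(x, z) <= euclidean(x, y) + euclidean(y, z) + 1e-12
```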
Clustering Task (cont’d)

• Measuring Similarity
– Euclidean distance measures the distance between records:

d_{\text{Euclidean}}(x, y) = \sqrt{\sum_i (x_i - y_i)^2},

where x = (x_1, x_2, \ldots, x_m) and y = (y_1, y_2, \ldots, y_m) represent the m attribute values of two records
– Other distance measurements include City-Block distance and Minkowski distance:

d_{\text{City-Block}}(x, y) = \sum_i |x_i - y_i|

d_{\text{Minkowski}}(x, y) = \Big(\sum_i |x_i - y_i|^q\Big)^{1/q}
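A small sketch of these distance functions in plain Python (function names are mine); City-Block and Euclidean are the q = 1 and q = 2 special cases of Minkowski:

```python
def minkowski(x, y, q=2):
    """Minkowski distance of order q between two numeric records."""
    return sum(abs(xi - yi) ** q for xi, yi in zip(x, y)) ** (1 / q)

def euclidean(x, y):
    return minkowski(x, y, q=2)

def city_block(x, y):
    return minkowski(x, y, q=1)

# Example: two records with m = 2 attributes
print(euclidean((1, 3), (2, 1)))   # 2.236...
print(city_block((1, 3), (2, 1)))  # 3.0
```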
Clustering Task (cont’d)

• The “Different From” function measures similarity between categorical attributes:

\text{different}(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{otherwise} \end{cases}

– Substitute different(x_i, y_i) for each categorical attribute in the Euclidean distance function
• Normalizing the data enhances the performance of clustering algorithms
– Use Min-Max Normalization or Z-Score Standardization:

\text{Min-Max: } X^* = \frac{X - \min(X)}{\max(X) - \min(X)} \qquad \text{Z-Score: } X^* = \frac{X - \text{mean}(X)}{\text{standard deviation}(X)}
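A minimal sketch of the two rescaling methods (function names are mine, not from the slides):

```python
def min_max_normalize(values):
    """Rescale a numeric attribute to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_standardize(values):
    """Rescale a numeric attribute to mean 0, standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5  # population SD
    return [(v - mean) / sd for v in values]

print(min_max_normalize([2, 5, 9, 15, 16, 18, 25, 33, 33, 45]))
```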
Hierarchical Clustering Methods

• Clustering algorithms are either hierarchical or non-hierarchical
• Hierarchical methods:
– A tree-like cluster structure (dendrogram) is created through recursive partitioning (Divisive Methods) or combining (Agglomerative Methods) of existing clusters
• Divisive Methods:
– All records are initialized into a single cluster
– At each iteration, the most dissimilar record is split off into a separate cluster
– Continues until each record represents its own cluster
Hierarchical Clustering Methods (cont’d)

• Agglomerative Methods:
– Each observation is initialized to become its own cluster
– At each iteration the two closest clusters are aggregated together
– The number of clusters is reduced by one at each step
– Eventually, all records are combined into a single cluster
• Agglomerative is the more popular hierarchical method
– Therefore, the focus remains on this approach
• Measuring distance between records is straightforward once recoding and normalization are applied
– However, how is the distance between clusters determined?
Hierarchical Clustering Methods (cont’d)

• Distance Between Clusters
– Several criteria are examined to determine the distance between clusters A and B
• Single Linkage
– Known as the Nearest-Neighbor approach
– Cluster similarity is based on the most similar records from each cluster
– Minimum distance between any record in cluster A and any record in cluster B
– Tends to form long, slender clusters
– Sometimes heterogeneous records are clustered together
Hierarchical Clustering Methods (cont’d)

• Average Linkage
– The measure is the average distance of records in cluster A from records in cluster B
– Resulting clusters have approximately equal within-cluster variability
• Next, the linkage methods are examined using a small data set (a code sketch of the two linkage criteria follows below):

2  5  9  15  16  18  25  33  33  45
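The sketch promised above: single-linkage and average-linkage distances between two clusters of one-dimensional records (my own illustration, not from the slides):

```python
def single_linkage(A, B):
    """Minimum distance between any record in A and any record in B."""
    return min(abs(a - b) for a in A for b in B)

def average_linkage(A, B):
    """Average distance between records in A and records in B."""
    return sum(abs(a - b) for a in A for b in B) / (len(A) * len(B))

A, B = [2, 5, 9], [15, 16, 18]
print(single_linkage(A, B))   # 6    (|9 - 15|)
print(average_linkage(A, B))  # 11.0
```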
Single-Linkage Clustering

• To begin, each record is assigned to its own cluster
• Single linkage seeks the minimum distance between any two records in separate clusters
– Step 1: The minimum cluster distance is between clusters {33} and {33}. Distance = 0, so the clusters are combined
– Step 2: Clusters {15} and {16} are combined, where distance = 1
– Step 3: Cluster {15, 16} is combined with cluster {18}
– Step 4: Clusters {2} and {5} are combined
[Dendrogram over the data values 2, 5, 9, 15, 16, 18, 25, 33, 33, 45, showing the first merges: {33, 33}, {15, 16}, {15, 16, 18}, {2, 5}]
Single-Linkage Clustering (cont’d)

• Agglomeration continues similarly in Steps 4–9
[Dendrogram over the data values 2, 5, 9, 15, 16, 18, 25, 33, 33, 45, merging as {2, 5} → {2, 5, 9}; {15, 16} → {15, 16, 18}; {33, 33}; then {2, 5, 9, 15, 16, 18} → {2, 5, 9, 15, 16, 18, 25} → {2, 5, 9, 15, 16, 18, 25, 33, 33} → {2, 5, 9, 15, 16, 18, 25, 33, 33, 45}]
• Above, the last cluster {2, 5, 9, 15, 16, 18, 25, 33, 33, 45} contains all records in the data set
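The same agglomeration can be reproduced with SciPy’s hierarchical clustering routines; a sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# The ten one-dimensional records from the slides
X = np.array([2, 5, 9, 15, 16, 18, 25, 33, 33, 45], dtype=float).reshape(-1, 1)

# Each row of Z describes one merge: (cluster i, cluster j, distance, new size)
Z = linkage(X, method="single")
print(Z)

# dendrogram(Z) would draw the tree structure shown on the slide
```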
k-Means Clustering

• k-Means is effective at finding clusters in data
• k-Means Algorithm:
– Step 1: Analyst specifies k = number of clusters to partition the data into
– Step 2: k records are randomly assigned to be the initial cluster centers
– Step 3: For each record, find the nearest cluster center. Each cluster center “owns” a subset of records, so we have a partition of the data set into k clusters, C_1, C_2, …, C_k
– Step 4: For each of the k clusters, find the cluster centroid and update the cluster center location to the centroid
– Step 5: Repeat Steps 3–4 until convergence or termination (a sketch of the full loop follows)
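A compact NumPy version of Steps 1–5 (variable and function names are mine; the sketch assumes no cluster ever becomes empty):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: k is given; pick k records at random as initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each record to its nearest center (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each center to the centroid of the records it owns
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centroids no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```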
k-Means Clustering (cont’d)

• The “nearest” criterion in Step 3 is typically Euclidean distance
• Determining the Cluster Centroid
– The centroid of a set of points is the center of gravity of those points
– Assume n data points (a_1, b_1, c_1), (a_2, b_2, c_2), …, (a_n, b_n, c_n)
– The centroid is located at the point

\left( \sum_i a_i / n, \;\; \sum_i b_i / n, \;\; \sum_i c_i / n \right)

• For example, the points (1, 1, 1), (1, 2, 1), (1, 3, 1), and (2, 1, 1) have centroid

\left( \frac{1+1+1+2}{4}, \; \frac{1+2+3+1}{4}, \; \frac{1+1+1+1}{4} \right) = (1.25, 1.75, 1.00)
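In code, the centroid is simply the column-wise mean of the cluster’s records, for example with NumPy:

```python
import numpy as np

cluster = np.array([(1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 1, 1)], dtype=float)
print(cluster.mean(axis=0))  # [1.25 1.75 1.  ]
```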
k-Means Clustering (cont’d)

• The k-means algorithm terminates when the centroids no longer change
– For k clusters, C_1, C_2, …, C_k, all records “owned” by each cluster remain in that cluster
• A convergence criterion may also cause termination
– For example, no significant reduction in SSE:

\text{SSE} = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2,

where p is each data point in cluster i and m_i represents the centroid of cluster i
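SSE is easy to compute from the assignments produced by the k_means sketch above; a short helper (the name is mine):

```python
import numpy as np

def sse(X, centers, labels):
    """SSE: sum of squared distances from each record to its own centroid."""
    diffs = X - centers[labels]     # each record minus its assigned centroid
    return float((diffs ** 2).sum())
```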
Example of k-Means Clustering at Work

• Assume k = 2 to cluster the following data points:

Point   a       b       c       d       e       f       g       h
(x, y)  (1, 3)  (3, 3)  (4, 3)  (5, 3)  (1, 2)  (4, 2)  (1, 1)  (2, 1)

• Step 1: k = 2 specifies the number of clusters to partition
• Step 2: Randomly assign k = 2 cluster centers
– For example, m1 = (1, 1) and m2 = (2, 1)
Example of k-Means Clustering at Work
First Pass (Copied from book)

Centroid   d1   d2
m1         1    1
m2         2    1

Point   d1   d2   Distance from m1   Distance from m2   Cluster Membership   SE
a       1    3    2.00               2.24               C1                   4
b       3    3    2.83               2.24               C2                   5
c       4    3    3.61               2.83               C2                   8
d       5    3    4.47               3.61               C2                   13
e       1    2    1.00               1.41               C1                   1
f       4    2    3.16               2.24               C2                   5
g       1    1    0.00               1.00               C1                   0
h       2    1    1.00               0.00               C2                   0

SSE = 36.00    BCV = 1    BCV/WCV = 0.028

Centroid (newly calculated)   d1    d2
m1                            1     2
m2                            3.6   2.4
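This first-pass table can be reproduced in a few lines of Python (point labels and output layout are mine):

```python
import math

points = {"a": (1, 3), "b": (3, 3), "c": (4, 3), "d": (5, 3),
          "e": (1, 2), "f": (4, 2), "g": (1, 1), "h": (2, 1)}
m1, m2 = (1, 1), (2, 1)

for name, p in points.items():
    d1, d2 = math.dist(p, m1), math.dist(p, m2)   # Euclidean (Python 3.8+)
    cluster = "C1" if d1 <= d2 else "C2"
    se = min(d1, d2) ** 2                          # squared error to own center
    print(f"{name}  {d1:.2f}  {d2:.2f}  {cluster}  SE={se:.0f}")
```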
Example of k-Means Clustering at Work
Second Pass

Centroid   d1    d2
m1         1     2
m2         3.6   2.4

Point   d1   d2   Distance from m1   Distance from m2   Cluster Membership   SE
a       1    3    1.00               2.67               C1                   1
b       3    3    2.24               0.85               C2                   0.72
c       4    3    3.16               0.72               C2                   0.52
d       5    3    4.12               1.52               C2                   2.32
e       1    2    0.00               2.63               C1                   0
f       4    2    3.00               0.57               C2                   0.32
g       1    1    1.00               2.95               C1                   1
h       2    1    1.41               2.13               C1                   2

SSE = 7.88    BCV = 2.631    BCV/WCV = 0.334

Centroid (newly calculated)   d1     d2
m1                            1.25   1.75
m2                            4      2.75
Example of k-Means Clustering at Work
Third Pass (Copied from book)

Centroid   d1     d2
m1         1.25   1.75
m2         4      2.75

Point   d1   d2   Distance from m1   Distance from m2   Cluster Membership   SE
a       1    3    1.27               3.01               C1                   1.625
b       3    3    2.15               1.03               C2                   1.0625
c       4    3    3.02               0.25               C2                   0.0625
d       5    3    3.95               1.03               C2                   1.0625
e       1    2    0.35               3.09               C1                   0.125
f       4    2    2.76               0.75               C2                   0.5625
g       1    1    0.79               3.47               C1                   0.625
h       2    1    1.06               2.66               C1                   1.125

SSE = 6.25    BCV = 2.926    BCV/WCV = 0.468

Centroid (newly calculated, unchanged)   d1     d2
m1                                       1.25   1.75
m2                                       4      2.75
Example of k-Means Clustering at Work

• Assume k = 2 to cluster the following data points:

Point   a       b       c       d       e       f       g       h
(x, y)  (1, 3)  (3, 3)  (4, 3)  (5, 3)  (1, 2)  (4, 2)  (1, 1)  (2, 1)

• Step 1: k = 2 specifies the number of clusters to partition
• Step 2: Randomly assign k = 2 cluster centers
– For example, m1 = (1, 1) and m2 = (2, 1)
• First Iteration
– Step 3: For each record, find the nearest cluster center
– Euclidean distances from the points to m1 and m2 are shown:

Point              a     b     c     d     e     f     g     h
Distance from m1   2.00  2.83  3.61  4.47  1.00  3.16  0.00  1.00
Distance from m2   2.24  2.24  2.83  3.61  1.41  2.24  1.00  0.00
Cluster            C1    C2    C2    C2    C1    C2    C1    C2
Example of k-Means Clustering at Work (cont’d)

• Cluster membership assigned; now SSE is calculated
– Cluster m1 contains {a, e, g} and m2 has {b, c, d, f, h}

\text{SSE} = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2 = 2^2 + 2.24^2 + 2.83^2 + 3.61^2 + 1^2 + 2.24^2 + 0^2 + 0^2 \approx 36

• Recall clusters are constructed where between-cluster variation (BCV) is large compared to within-cluster variation (WCV)

\frac{\text{BCV}}{\text{WCV}} = \frac{d(m_1, m_2)}{\text{SSE}} = \frac{1}{36} \approx 0.0278,

where d(m_1, m_2) is a surrogate for BCV and SSE is a surrogate for WCV
• The ratio BCV/WCV is expected to increase for successive iterations
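The same surrogate computation in NumPy, using the first-iteration centers and memberships from the table above (a self-contained illustration):

```python
import numpy as np

X = np.array([(1, 3), (3, 3), (4, 3), (5, 3),
              (1, 2), (4, 2), (1, 1), (2, 1)], dtype=float)
m = np.array([(1, 1), (2, 1)], dtype=float)     # centers m1, m2
labels = np.array([0, 1, 1, 1, 0, 1, 0, 1])     # memberships from the table

sse_val = float(((X - m[labels]) ** 2).sum())   # WCV surrogate = 36.0
bcv = float(np.linalg.norm(m[0] - m[1]))        # BCV surrogate = 1.0
print(bcv / sse_val)                            # 0.0277...
```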
Example of k-Means Clustering at Work (cont’d)

• Step 4: For the k clusters, find the cluster centroid and update its location
– Cluster 1 centroid = [(1 + 1 + 1)/3, (3 + 2 + 1)/3] = (1, 2)
– Cluster 2 centroid = [(3 + 4 + 5 + 4 + 2)/5, (3 + 3 + 3 + 2 + 1)/5] = (3.6, 2.4)
• The figure shows the movement of cluster centers m1 and m2 (triangles) after the first iteration of the algorithm
[Figure: scatter plot of the eight points on a 0–6 by 0–5 grid, with the updated centroids marked as triangles]
Example of k-Means Clustering at Work (cont’d)

• Step 5: Repeat Steps 3–4 until convergence or termination
• Second Iteration
– Repeat the procedure for Steps 3–4
– Again, for each record find the nearest cluster center, m1 = (1, 2) or m2 = (3.6, 2.4)
– Cluster m1 contains {a, e, g, h} and m2 has {b, c, d, f}
– SSE = 7.88, and BCV/WCV = 0.334
– Note 0.334 has increased compared to the First Iteration value of 0.0278
– Between-cluster variation is increasing with respect to within-cluster variation
Example of k-Means Clustering at Work (cont’d)

• After the Second Iteration, the cluster centroids are shown to move slightly
– Cluster centroids updated to m1 = (1.25, 1.75) and m2 = (4, 2.75)
[Figure: scatter plot of the eight points on a 0–6 by 0–5 grid, with the centroids after the second iteration]
Example of k-Means Clustering at Work (cont’d)

• Third (Final) Iteration
– Repeat the procedure for Steps 3–4
– Now, for each record find the nearest cluster center, m1 = (1.25, 1.75) or m2 = (4, 2.75)
– SSE = 6.25, and BCV/WCV = 0.468
– Again, BCV/WCV has increased compared to the previous value of 0.334
– This time, no records shift cluster membership
– Centroids remain unchanged; therefore the algorithm terminates
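Running the loop from the earlier k_means sketch on these eight points, with the initial centers fixed at m1 = (1, 1) and m2 = (2, 1) rather than chosen at random, reproduces this behavior, converging to centroids (1.25, 1.75) and (4, 2.75) on the third iteration:

```python
import numpy as np

X = np.array([(1, 3), (3, 3), (4, 3), (5, 3),
              (1, 2), (4, 2), (1, 1), (2, 1)], dtype=float)

centers = np.array([(1, 1), (2, 1)], dtype=float)   # m1, m2 from Step 2
for iteration in range(1, 10):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                    # Step 3: assign records
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new_centers, centers):            # Step 5: no change -> stop
        print(f"converged on iteration {iteration}")
        break
    centers = new_centers                            # Step 4: update centers

print(centers)  # [[1.25 1.75] [4.   2.75]]
```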
Example of k-Means Clustering at Work (cont’d)

• Summary
– k-Means is not guaranteed to find the global minimum SSE
– Instead, a local minimum is found
– Invoking the algorithm using a variety of initial cluster centers improves the probability of achieving a global minimum
– One approach places the first cluster center at a random point, with the remaining cluster centers placed far from the previous centers (Moore)
• Potential problem for applying k-Means: what is an appropriate value for k?
– Analyst may have a priori knowledge of k
Example of k-Means Clustering at Work (cont’d)

• An outer loop to the algorithm is possible
– Cycles through different k values
– Results are compared, selecting the solution with the smallest SSE (see the sketch below)
• What attributes to use as input?
– Some attributes are likely more relevant than others
– Apply axis-stretching methods for quantifying attribute relevance, discussed in Chapter 5
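The sketch mentioned above, reusing the k_means and sse helpers from the earlier sketches. One caveat: raw SSE always shrinks as k grows, so in practice an “elbow” in this curve is a common way to choose k rather than the bare minimum:

```python
import numpy as np

X = np.array([(1, 3), (3, 3), (4, 3), (5, 3),
              (1, 2), (4, 2), (1, 1), (2, 1)], dtype=float)

# Outer loop: for each candidate k, keep the best SSE over random restarts
for k in range(1, 5):
    best = min(sse(X, *k_means(X, k, seed=s)) for s in range(10))
    print(f"k = {k}: best SSE = {best:.2f}")
```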