
Introduction to Machine Learning k-Means Clustering
Prof. Kutty

Unsupervised Learning


Sometimes labels are not available
Sn = {(x̄^(i), y^(i))}, i = 1, …, n;  x̄ ∈ X

Unsupervised learning: Applications
"Now, when there are multiple stories related to your search, we'll also organize the results by story so it's easier to understand what's most relevant and you can make a more informed decision on which specific articles to explore."
Other Examples:
• finding similar homes for sale
• grouping patients by symptoms
• mining customer purchase patterns
• grouping search results according to topic
• grouping emails
• image processing → segmenting an image into regions…

Unsupervised Datasets
Goal: Learn a representation of the data, uncover useful structure, identify groups/clusters of similar examples

Clustering
unsupervised learning
• points that are ‘close’ should be in the same cluster
• points that are ‘further apart’ should be in different clusters

Clustering
unsupervised learning
• points that are ‘close’ should be in the same cluster
• points that are ‘further apart’ should be in different clusters

Clustering
Input: Sn = {x̄^(i)}, i = 1, …, n;  x̄^(i) ∈ R^d
Output: a set of cluster assignments c1, …, cn with each ci ∈ {1, …, k}
Goal: assign similar (dissimilar) points to the same (different) cluster(s)
Clusters can be described by their corresponding set of examples or by a representative point

unsupervised learning clustering

image courtesy Bishop 2006
Example: image segmentation using k-means clustering
n pixels in the picture
Pixels a, c are in the same cluster
Pixels a, b are in different clusters

Outline: k-means
• algorithm execution by example
• k-means objective
• algorithm details
– initialization
– hyperparameter selection
– convergence

k-means Clustering
k is a hyperparameter; here k = 2
mean of each cluster (guess)
assign each point to the closest mean
recompute the means
assign each point to the closest mean
recompute the means
no change in cluster assignments → terminate

Outline: k-means
• algorithm execution by example
• k-means objective
• algorithm details
– initialization
– hyperparameter selection
– convergence

k-means Clustering
(k is a hyperparameter)
Partition data into k clusters (defined by k “means”) such that the sum of squares of Euclidean distances of each point to its cluster’s mean is minimized

k-means objective function
Data points: x̄^(1), …, x̄^(n)
Cluster assignments (one per data point): c1, …, cn with each ci ∈ {1, …, k}
Means (one per cluster): μ̄^(1), …, μ̄^(k)
Goal is to minimize the objective function
J(c̄, μ̄) = Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖²
e.g., k = 3; d = 2

k-means Clustering algorithm: more formally
Data points x̄^(1), …, x̄^(n) and fixed k
initialize means μ̄^(1), …, μ̄^(k)
Iteratively
• for each point x̄^(i), reassign x̄^(i) to the cluster with the closest mean: ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
• recompute μ̄^(j) = Σ_i [[ci = j]] x̄^(i) / Σ_i [[ci = j]]
where [[·]] is the 0/1 indicator function, so each sum runs over the points that are in cluster j
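The two steps above can be sketched in pure Python. This is a minimal illustration only, not the course's reference implementation; all names (`kmeans`, `n_iters`, etc.) are our own.

```python
import math
import random

def kmeans(points, k, n_iters=100, seed=0):
    """Cluster `points` (a list of coordinate tuples) into k clusters.

    Returns (assignments, means): assignments[i] is the cluster index of
    points[i], and means[j] is the mean of cluster j.
    """
    rng = random.Random(seed)
    means = rng.sample(points, k)   # initialize means from the data (assumes k <= n)
    assignments = None
    for _ in range(n_iters):
        # Step 1: reassign each point to the cluster with the closest mean.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        if new_assignments == assignments:
            break                   # no change in cluster assignments -> terminate
        assignments = new_assignments
        # Step 2: recompute each mean as the average of its cluster's points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:             # guard against an empty cluster
                means[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignments, means
```

For two well-separated groups of points and k = 2, the loop recovers the groups within a few iterations and then terminates because the assignments stop changing.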

k-means algorithm, in-class exercise: k = 2; n = 4
Data points x̄^(1), x̄^(2), x̄^(3), x̄^(4) (coordinates as plotted on the slide)
initialize the two means μ̄^(1), μ̄^(2) at two of the data points
Iteratively
• ∀i reassign x̄^(i) to ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
• ∀j recompute μ̄^(j) = Σ_{i: ci = j} x̄^(i) / |{i : ci = j}|

terminates

Outline: k-means
• algorithm execution by example
• k-means objective
• algorithm details
– initialization
– hyperparameter selection
– convergence

k-means Clustering algorithm: details
Data points x̄^(1), …, x̄^(n) and fixed k; initialize means μ̄^(1), …, μ̄^(k)
Iteratively
• reassign x̄^(i) to ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
• recompute μ̄^(j) = Σ_{i: ci = j} x̄^(i) / |{i : ci = j}|

Q: How do we initialize the means?
A: Typically, pick k points at random from the given data (assumes k ≤ n)
Other choices:
• random points in the space
• pick points with probability proportional to their distance from the already-selected means (k-means++ uses the squared distance)
See e.g., Celebi, M. E., Kingravi, H. A., & Vela, P. A. (2013). A comparative study of efficient initialization methods for the k-means clustering algorithm.
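The two data-driven strategies above might be sketched as follows; this is a hedged illustration with our own function names, and `power=2` gives the squared-distance weighting used by k-means++.

```python
import math
import random

def init_random_data_points(points, k, rng):
    """Pick k distinct data points uniformly at random (assumes k <= n)."""
    return rng.sample(points, k)

def init_distance_weighted(points, k, rng, power=2):
    """Pick the first mean at random; pick each later mean with probability
    proportional to its distance to the nearest already-selected mean,
    raised to `power` (power=2 is the k-means++ weighting)."""
    means = [rng.choice(points)]
    while len(means) < k:
        # distance of each point to its nearest already-selected mean
        weights = [min(math.dist(p, m) for m in means) ** power for p in points]
        means.append(rng.choices(points, weights=weights, k=1)[0])
    return means
```

Already-selected points get weight 0, so (with distinct data) the distance-weighted variant never picks the same point twice.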

k-means Clustering algorithm: details
Data points x̄^(1), …, x̄^(n) and fixed k; initialize means μ̄^(1), …, μ̄^(k)
Iteratively
• reassign x̄^(i) to ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
• recompute μ̄^(j) = Σ_{i: ci = j} x̄^(i) / |{i : ci = j}|

Q: How do we pick k?

in class exercise
One possible approach to finding the optimal hyperparameter k is to vary k (i.e., set k = 1, 2, …) and pick the value of k that gives you the lowest objective function. Is this reasonable?
A. Yes, this is reasonable
B. Yes, but there are complications
C. No, this is not reasonable

Elbow method: the objective J decreases as k grows (with J = 0 when k = n), so rather than the minimum, pick k at the "elbow" where the rate of decrease levels off
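The elbow method can be sketched as follows: run k-means for several values of k, record the final objective J for each, and look for the bend. The minimal helper below is our own illustration, not the lecture's code.

```python
import math
import random

def kmeans_objective(points, k, n_iters=50, seed=0):
    """Run a minimal k-means and return the final objective J."""
    rng = random.Random(seed)
    means = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(n_iters):
        assign = [min(range(k), key=lambda j: math.dist(p, means[j])) for p in points]
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            if members:
                means[j] = tuple(sum(x) / len(members) for x in zip(*members))
    # J = sum of squared distances of each point to its cluster's mean
    return sum(math.dist(p, means[c]) ** 2 for p, c in zip(points, assign))

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
curve = {k: kmeans_objective(points, k) for k in range(1, 5)}
# For this data, J drops sharply from k = 1 to k = 2 and flattens
# afterwards, so the elbow suggests k = 2.
```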

Outline: k-means
• algorithm execution by example
• k-means objective
• algorithm details
– initialization
– hyperparameter selection
– convergence

k-means Clustering algorithm: details
Data points x̄^(1), …, x̄^(n) and fixed k; initialize means μ̄^(1), …, μ̄^(k)
Iteratively (until when?)
• reassign x̄^(i) to ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
• recompute μ̄^(j) = Σ_{i: ci = j} x̄^(i) / |{i : ci = j}|
k-means is guaranteed to converge*

k-means Clustering convergence
objective function
J(c̄, μ̄) = Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖²
where x̄^(i) is a data point, ci is x̄^(i)'s cluster (with assignments (c1, …, cn)), and μ̄^(ci) is the mean of x̄^(i)'s cluster
k-means performs coordinate descent on the objective function

k-means Clustering performs coordinate descent on the objective function
Data points x̄^(1), …, x̄^(n) and fixed k; initialize means μ̄^(1), …, μ̄^(k)
Iteratively
• reassign x̄^(i) to ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
  (fix μ̄, choose c̄ to minimize J(c̄, μ̄) = Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖²)
• recompute μ̄^(j) = Σ_{i: ci = j} x̄^(i) / |{i : ci = j}|
  (fix c̄, choose μ̄ to minimize J(c̄, μ̄) = Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖²)

k-means Clustering convergence
k-means performs coordinate descent on the objective function
k-means is guaranteed to converge*

k-means Clustering: Good news…
Iteratively
• reassign x̄^(i) to ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
  (fix μ̄, choose c̄ to minimize J(c̄, μ̄) = Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖²)
• recompute μ̄^(j) = Σ_{i: ci = j} x̄^(i) / |{i : ci = j}|
  (fix c̄, choose μ̄ to minimize J(c̄, μ̄) = Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖²)
Each step can only lower (or keep) the objective, so J must monotonically decrease; since J ≥ 0, J must (eventually) converge
k-means is guaranteed to converge (but…)

k-means Clustering Bad news…
J is not convex, so k-means is not guaranteed to converge to the global minimum

Getting stuck in local minima
• What we want

Getting stuck in local minima
• What we get

k-means Clustering: trying to find a global optimum
• vary the initialization of the means
• pick the clustering with the lowest (eventual) objective function value (for a fixed k)
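This random-restart strategy might be sketched as below. All names are ours, and a minimal inner k-means is included only to keep the example self-contained.

```python
import math
import random

def kmeans_once(points, k, rng, n_iters=50):
    """One k-means run from a random initialization; returns (J, assignments, means)."""
    means = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(n_iters):
        assign = [min(range(k), key=lambda j: math.dist(p, means[j])) for p in points]
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            if members:
                means[j] = tuple(sum(x) / len(members) for x in zip(*members))
    j_final = sum(math.dist(p, means[c]) ** 2 for p, c in zip(points, assign))
    return j_final, assign, means

def kmeans_restarts(points, k, n_restarts=10, seed=0):
    """Vary the initialization; keep the clustering with the lowest final J."""
    rng = random.Random(seed)
    return min(kmeans_once(points, k, rng) for _ in range(n_restarts))
```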

k-means global optimum
• What we want

k-means global optimum
• What we get

agglomerative hierarchical clustering

Clustering

Mason Hall
Chrysler Center

Clustering
k-means doesn't return additional information about how the clusters relate to one another.

Hierarchical Clustering
Idea: start with each point in its own cluster and build a hierarchy
(cutting the resulting dendrogram at a given level yields a clustering, e.g., "cut here for 3 clusters")

CSE Chrysler Center
Mason Hall

Agglomerative Clustering
1. Assign each point its own cluster
   for each point i, Ci = {x̄^(i)}
2. Find the closest pair of clusters and merge them; repeat until convergence
   merge arg min_{i,j : i ≠ j} d(Ci, Cj)
Notation: Ci is the set of all points in cluster i

Linkage Criteria
Single-linkage: d(Ci, Cj) = min_{x̄ ∈ Ci, x̄′ ∈ Cj} d(x̄, x̄′)
  susceptible to outliers and to the chaining phenomenon: larger clusters are naturally biased toward having closer distances to other points and will therefore attract a successively larger number of points
Complete-linkage: d(Ci, Cj) = max_{x̄ ∈ Ci, x̄′ ∈ Cj} d(x̄, x̄′)
  susceptible to outliers, but tends to merge clusters so that all clusters have roughly the same diameter
Average-linkage: d(Ci, Cj) = (1 / (|Ci| |Cj|)) Σ_{x̄ ∈ Ci, x̄′ ∈ Cj} d(x̄, x̄′)
  less susceptible to outliers; has efficiency concerns
Notation: Ci is the set of all points in cluster i
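Plugged into the merge-closest-pair loop, the three criteria might look like this naive sketch (names are ours; real implementations cache pairwise distances rather than recomputing linkages every merge):

```python
import math

def single_link(a, b):
    """Distance between the closest pair of points across the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_link(a, b):
    """Distance between the farthest pair of points across the two clusters."""
    return max(math.dist(p, q) for p in a for q in b)

def average_link(a, b):
    """Average distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points, n_clusters, linkage=single_link):
    clusters = [[p] for p in points]          # 1. each point is its own cluster
    while len(clusters) > n_clusters:         # 2. merge the closest pair, repeat
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```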


k-means Clustering performs coordinate descent on the objective function
example for k = 2
Iteratively
• reassign x̄^(i) to ci = argmin_j ‖x̄^(i) − μ̄^(j)‖
Fix M = (μ̄^(1), μ̄^(2)); choose c̄ to minimize J(c̄, M) = Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖²
Choosing the cluster assignments (c1, …, cn) to minimize Σ_{i=1}^{n} ‖x̄^(i) − μ̄^(ci)‖² decomposes over examples: for each example x̄^(i), choose ci ∈ {1, 2} to minimize ‖x̄^(i) − μ̄^(ci)‖

k-means Clustering performs coordinate descent on the objective function
example for k = 2
Iteratively
• recompute μ̄^(j) = Σ_{i: ci = j} x̄^(i) / |{i : ci = j}|
Fix the cluster assignments (c1, …, cn); choose M = (μ̄^(1), μ̄^(2)) to minimize
J(c̄, M) = Σ_{i: ci = 1} ‖x̄^(i) − μ̄^(1)‖² + Σ_{i: ci = 2} ‖x̄^(i) − μ̄^(2)‖²
The sum of squared distances of a set of points to a single point μ̄ is minimized when μ̄ is the mean of those points, so each term is minimized by setting μ̄^(j) to the mean of cluster j
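The claim that the sum of squared distances to a single point is minimized at the mean follows from setting the gradient to zero; a short derivation (added for completeness) for any fixed points x̄^(1), …, x̄^(m):

```latex
J(\bar\mu) = \sum_{i=1}^{m} \lVert \bar{x}^{(i)} - \bar\mu \rVert^2, \qquad
\nabla_{\bar\mu} J(\bar\mu) = -2 \sum_{i=1}^{m} \left( \bar{x}^{(i)} - \bar\mu \right) = 0
\quad \Longrightarrow \quad
\bar\mu = \frac{1}{m} \sum_{i=1}^{m} \bar{x}^{(i)}
```

Since J is convex in μ̄ for fixed assignments, this stationary point is the minimizer.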
