Live Coding Wk5 – Lecture 13 – Clustering¶
We have received a strange task. Now you can hear whispers of “k-means clustering” throughout the night. I suppose that's pretty normal. Let's go through how to run the k-means clustering algorithm shown in the lectures.
### Imports and data you will need
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
Problem: Clustering our thoughts¶
We will be exploring k-means clustering through a couple of fun exercises. Stay until the end for a treat 😉
A union of points¶
Compelled to learn, we've decided to play around with a toy dataset. It is made from a union of blobs.
Plot the toy dataset with different colours determined by the true class.
# First let's create the toy dataset
# make_blobs generates isotropic Gaussian blobs for clustering. It returns:
# X : ndarray of shape (n_samples, n_features)
#     The generated samples.
# y : ndarray of shape (n_samples,)
#     The integer labels for cluster membership of each sample.
x_toy, y_toy = make_blobs(n_samples=500, n_features=2, centers=4, cluster_std=2, random_state=6)
np.unique(y_toy) # the 4 labels (categories)
array([0, 1, 2, 3])
# Set up code
colours = ['r', 'g', 'b', 'y']
colour_map = list(zip(np.unique(y_toy), colours)) # [(0, 'r'), (1, 'g'), (2, 'b'), (3, 'y')]
plt.figure(figsize=(16, 10))
for category, cl in colour_map:
    cur_idx = y_toy == category # select the indexes where y_toy == category
    plt.scatter(x_toy[cur_idx, 0], x_toy[cur_idx, 1], c=cl)
plt.title('Real Clusters')
plt.show()
Let's now go through how we can use KMeans from sklearn to determine the categories instead. Take a look at the corresponding class.
There are a few interesting parameters we can look at for initialisation. Here are a few of them:
| Parameter | Description |
| --- | --- |
| n_clusters | Number of clusters |
| max_iter | Maximum number of iterations for a single run |
| init | Initialisation method for the centroids: 'k-means++', 'random', or an array of starting centroids |
Discussion: What would make a bad centroid initialisation? What would be the best initialisation?
The worst case is when all of our initial centroids sit in one corner of the plot, far from most of the data. The best case is when each initial centroid already sits on a blob center.
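To make the discussion concrete, here is a minimal sketch that starts k-means from a deliberately poor set of centroids and from k-means++, then prints how each run ended. The corner offsets in bad_init and the choice random_state=0 are assumptions made for illustration; the numbers you see depend on them, and a bad start may still recover on data this simple.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same toy data as in this notebook
x_toy, y_toy = make_blobs(n_samples=500, n_features=2, centers=4,
                          cluster_std=2, random_state=6)

# Hypothetical "bad" start: four centroids packed into the bottom-left corner
corner = x_toy.min(axis=0)
bad_init = corner + np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])

# n_init=1 so that each model runs exactly once from its starting centroids
km_bad = KMeans(n_clusters=4, init=bad_init, n_init=1).fit(x_toy)
km_pp = KMeans(n_clusters=4, init='k-means++', n_init=1, random_state=0).fit(x_toy)

# inertia_ is the sum of squared distances from each point to its closest centroid
print('corner init   : inertia', km_bad.inertia_, 'iterations', km_bad.n_iter_)
print('k-means++ init: inertia', km_pp.inertia_, 'iterations', km_pp.n_iter_)
```

Comparing inertia_ and n_iter_ across a few different starts is a quick way to see how much the initialisation matters on a given dataset.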
Let's define and train a k-means clustering model on our toy dataset.
# by default init='k-means++'; also check what the other parameters are
kmeans = KMeans(n_clusters=4) # Trying to fit 4 clusters
kmeans.fit(x_toy, y_toy) # Training is unsupervised: y_toy is accepted by the API but ignored
y_pred = kmeans.predict(x_toy)
# examine the features of the trained cluster model
kmeans.cluster_centers_ # the cluster centers
array([[ 0.40473078, -1.59470061],
[ 7.94493775, -3.44856612],
[-7.86130864, 1.87598373],
[ 6.46173504, -9.52333651]])
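If you are curious what fit is doing conceptually, the following is a rough NumPy sketch of the standard assign-then-update loop (often called Lloyd's algorithm). It is not sklearn's implementation: the function name lloyd_kmeans, the random choice of starting points, and the stopping rule are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

def lloyd_kmeans(x, k, n_iter=100, seed=0):
    """Plain k-means sketch: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centroids stopped moving, so the assignments are stable
        centers = new_centers
    return centers, labels

# Same toy data as in this notebook
x_toy, _ = make_blobs(n_samples=500, n_features=2, centers=4,
                      cluster_std=2, random_state=6)
centers, labels = lloyd_kmeans(x_toy, k=4)
print(centers)
```

The centroids printed here need not match kmeans.cluster_centers_ row for row, since the starting points and the cluster numbering are different.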
Now plot, for each k-means cluster, (1) the centroid and (2) the points coloured by their predicted class.
Use marker='D', c='k', s=500 as the plot parameters for the centroids so that they are visible.
Hint: the centroids are available through kmeans.cluster_centers_.
plt.figure(figsize=(16, 10))
for (category, cl), center in zip(colour_map, kmeans.cluster_centers_):
    cur_idx = y_pred == category # note that this time, we are now selecting based on y_pred == category
    plt.scatter(center[0], center[1], marker='D', c='k', s=500) # plot the cluster center
    plt.scatter(x_toy[cur_idx, 0], x_toy[cur_idx, 1], c=cl)
plt.title('Predicted Clusters with the Cluster Centers')
plt.show()
Looks pretty good.
Discussion: What would happen if we decreased the number of clusters?
Why are the colours now different?
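On the colour question: k-means numbers its clusters arbitrarily, so predicted label 0 need not be the blob that make_blobs called 0. Here is a small sketch, assuming a fixed random_state=0 for repeatability, that cross-tabulates the two labelings, relabels each predicted cluster by the true class it overlaps most, and reports a permutation-invariant score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Same toy data as in this notebook
x_toy, y_toy = make_blobs(n_samples=500, n_features=2, centers=4,
                          cluster_std=2, random_state=6)
y_pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(x_toy)

# counts[i, j] = number of points with true label i and predicted label j
counts = np.zeros((4, 4), dtype=int)
np.add.at(counts, (y_toy, y_pred), 1)
print(counts)

# Map each predicted cluster to the true label it overlaps with the most
pred_to_true = counts.argmax(axis=0)
y_aligned = pred_to_true[y_pred]

print('raw agreement    :', np.mean(y_pred == y_toy))
print('aligned agreement:', np.mean(y_aligned == y_toy))
# The adjusted Rand index ignores how the clusters are numbered
print('adjusted Rand    :', adjusted_rand_score(y_toy, y_pred))
```

Using y_aligned in place of y_pred when picking colours would make the predicted plot use the same colour scheme as the true one.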
Your turn: experiment with different numbers of clusters and maybe also different random seeds¶
# create another model with, say, 3 clusters and a different random seed
# plot again
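One possible starting point for this experiment, as a sketch only: fit 3 clusters with init='random' under a few seeds and print the cluster sizes. The seeds 0, 1 and 2 are arbitrary choices; plotting is left out so you can reuse the plotting loop from earlier.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same toy data as in this notebook
x_toy, y_toy = make_blobs(n_samples=500, n_features=2, centers=4,
                          cluster_std=2, random_state=6)

for seed in (0, 1, 2):  # arbitrary seeds, chosen only for illustration
    # n_init=1 so that each seed corresponds to exactly one random start
    km = KMeans(n_clusters=3, init='random', n_init=1, random_state=seed)
    labels = km.fit_predict(x_toy)
    sizes = np.bincount(labels, minlength=3)
    print('seed', seed, 'cluster sizes', sizes, 'inertia', round(km.inertia_, 1))
```

With fewer clusters than blobs, at least one cluster has to cover more than one blob, and which blobs get merged can change from seed to seed.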
A bold new task¶
We are given a mysterious image file. We are compelled to use k-means clustering on it. So let's do it.
Below is some setup code showing how you would deal with images in Python using cv2 (you don't need to worry about this).
mystery_image = cv2.imread('data/images/mystery')
mystery = cv2.cvtColor(mystery_image, cv2.COLOR_BGR2RGB)
mystery = np.float32(mystery.reshape(-1, 3))
COLOUR_SPLIT = 10
mystery.shape
(589380, 3)
#plt.imshow(mystery_image)
#plt.imshow(cv2.cvtColor(mystery_image, cv2.COLOR_BGR2RGB))
mystery_image.shape
(627, 940, 3)
Our image mystery is now an array of shape (589380, 3).
Discussion: What would the dimension size 3 correspond to?
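A tiny sketch of the reshape may help here. It uses a made-up 2x3 image rather than the mystery file: each row of the flattened array is one pixel and the three columns are its red, green and blue values, which is also why 627 * 940 = 589380 rows appear above.

```python
import numpy as np

# Hypothetical 2x3 RGB image with shape (height, width, channels)
tiny = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
                 [[0, 0, 0], [128, 128, 128], [255, 255, 255]]], dtype=np.uint8)

# One row per pixel; the columns are the R, G and B channels
pixels = np.float32(tiny.reshape(-1, 3))
print(tiny.shape, '->', pixels.shape)  # (2, 3, 3) -> (6, 3)

# Reshaping back with the original shape recovers the image exactly
restored = np.uint8(pixels).reshape(tiny.shape)
print(np.array_equal(restored, tiny))  # True
```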
Alrighty, it's now your turn to run k-means on the rows of the array.
Set n_clusters = COLOUR_SPLIT.
m_kmeans = KMeans(n_clusters=COLOUR_SPLIT)
m_kmeans.fit(mystery)
KMeans(n_clusters=10)
We separate the training and predicting into different blocks because of the fitting time.
Get the predicted classes for the mystery array. Also get the cluster centroids and cast them to integers using np.uint8.
mystery_pred = m_kmeans.predict(mystery)
mystery_center = np.uint8(m_kmeans.cluster_centers_)
It's time for you to construct a new image array using the k-means prediction. For each class in mystery_pred, fill the matching rows of the new array with the corresponding centroid from mystery_center.
new_mystery = np.zeros(mystery.shape, dtype=int)
# Fill new_mystery with centroids
for i in range(COLOUR_SPLIT):
    new_mystery[mystery_pred == i] = mystery_center[i] # replace the category with the cluster center
new_mystery = new_mystery.reshape(mystery_image.shape) # reshape back to the original figure shape
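As an aside, the fill loop can also be written as a single NumPy indexing step, because indexing the centroid array with the label array picks one centroid row per pixel. The sketch below checks this on made-up stand-ins (centers and pred are hypothetical, not the notebook's mystery_center and mystery_pred).

```python
import numpy as np

# Hypothetical stand-ins: 2 centroid colours and labels for 6 pixels
centers = np.array([[255, 0, 0], [0, 0, 255]], dtype=np.uint8)
pred = np.array([0, 1, 1, 0, 1, 0])

# Loop version, as in the notebook
looped = np.zeros((len(pred), 3), dtype=int)
for i in range(len(centers)):
    looped[pred == i] = centers[i]

# Indexing version: row k of the result is centers[pred[k]]
indexed = centers[pred]
print(np.array_equal(looped, indexed))  # True

# Reshape to (height, width, 3) to get an image back
image = indexed.reshape(2, 3, 3)
print(image.shape)  # (2, 3, 3)
```

The indexed result keeps the uint8 dtype of the centroids, which plt.imshow interprets as values in the 0 to 255 range.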
Now we can finally plot our image.
plt.imshow(new_mystery)
plt.show()
Truly an amazing spectacle. Our sleepless nights were truly worth it. But we can do better! Let's randomise the colour values of our clusters. They need to be integers between 0 and 255.
new_mystery_col = np.zeros(mystery.shape, dtype=int)
# Fill new_mystery_col with randomised colours, one per cluster
random_center = np.uint8(np.random.rand(*mystery_center.shape) * 255)
for i in range(COLOUR_SPLIT):
    new_mystery_col[mystery_pred == i] = random_center[i] # replace the category with a random colour
new_mystery_col = new_mystery_col.reshape(mystery_image.shape)
plt.imshow(new_mystery_col)
plt.show()
Now it's perfect. If you have extra time, you can play around with the COLOUR_SPLIT parameter, or re-run your randomised colouring to make more amazing images.
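To try several values of COLOUR_SPLIT quickly, the steps above can be wrapped in one helper. This is only a sketch: quantise is a hypothetical name, and it is run here on a small random image so that it stays self-contained. Swap in your own RGB array to use it on the mystery image.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantise(image_rgb, n_colours, seed=0):
    """Replace every pixel with the centroid colour of its k-means cluster."""
    pixels = np.float32(image_rgb.reshape(-1, 3))
    km = KMeans(n_clusters=n_colours, n_init=1, random_state=seed).fit(pixels)
    palette = np.uint8(km.cluster_centers_)  # one colour per cluster
    return palette[km.labels_].reshape(image_rgb.shape)

# Hypothetical 20x30 random image standing in for a real photo
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)

for k in (2, 5, 10):
    out = quantise(fake_image, k)
    n_unique = len(np.unique(out.reshape(-1, 3), axis=0))
    print('k =', k, 'output shape', out.shape, 'distinct colours', n_unique)
```

Smaller values of n_colours give a flatter, more poster-like image, while larger values stay closer to the original at the cost of a longer fit.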
Your turn now.