SIT384 Cyber security analytics
Pass Task 7.1P: K-Means and Hierarchical Clustering
Task description:
In machine learning, clustering is used for analyzing and grouping data that has no pre-labeled classes, or even no class attribute at all. K-Means clustering and hierarchical clustering are both unsupervised learning algorithms.
In K-Means, a cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters. K-Means divides the objects into clusters such that each object belongs to exactly one cluster, not several.
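The partitioning idea above can be illustrated with a minimal NumPy sketch of Lloyd's algorithm, the classic K-Means iteration: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat until nothing changes. This is an illustration only, not scikit-learn's implementation; it assumes random initialization rather than k-means++, and the function name `kmeans` and the toy points are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm K-Means (random init, NOT k-means++)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid: shape (n_samples, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Each point is assigned to exactly one cluster: its nearest centroid
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points
        # (keep the old centroid if a cluster happens to be empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of two points each
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

Each row of `X` ends up with a single label, which is the "exactly one cluster" property described above.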
In hierarchical clustering, clusters have a tree-like structure, i.e. a parent-child relationship. In the agglomerative (bottom-up) variant, each object starts in its own cluster; the two most similar clusters are then merged, and merging continues until all objects are in the same cluster.
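The bottom-up merging can be sketched naively in NumPy, assuming Euclidean distance and a tiny toy dataset; the helper name `agglomerative_merges` and the `LINKAGE` table are hypothetical. It rescans every cluster pair on each step, so it is far too slow for real data and is meant only to show the merge order, not to replace `AgglomerativeClustering`.

```python
import numpy as np

# Cluster-to-cluster distance for each linkage criterion
LINKAGE = {"single": np.min, "complete": np.max, "average": np.mean}

def agglomerative_merges(X, linkage="average"):
    """Naive agglomerative clustering that records every merge.

    Every object starts in its own cluster; the two closest clusters are
    merged repeatedly until a single cluster remains.
    Returns a list of (members_a, members_b, distance) tuples.
    """
    # Pairwise Euclidean distances between individual points
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the most similar (closest) pair of clusters
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = LINKAGE[linkage](D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((clusters[a], clusters[b], d))
        # Replace the two merged clusters with their union
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges

# Hypothetical toy data: two tight pairs plus one distant outlier
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [20, 20]], dtype=float)
for left, right, dist in agglomerative_merges(X):
    print(left, "+", right, f"at distance {dist:.2f}")
```

With n objects there are always n - 1 merges, and the last merge joins everything into one cluster; this sequence of merges is exactly what a dendrogram draws.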
In this task, you will use the K-Means and Agglomerative Hierarchical algorithms to cluster a given dataset and compare the results.
You are given:
• np.random.seed(0)
• make_blobs() function with input:
o n_samples: 200
o centers: [2,1], [-1,-1], [5,3], [9,4]
o cluster_std: 0.9
• KMeans() estimator with settings: init = “k-means++”, n_clusters = 4, n_init = 12
• AgglomerativeClustering() estimator with settings: n_clusters = 4, linkage = ‘average’
• Other settings of your choice
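The given settings translate fairly directly into scikit-learn calls. The following is a sketch assuming scikit-learn's `make_blobs`, `KMeans`, and `AgglomerativeClustering`; the variable names are illustrative. Because `random_state` is left unset, both `make_blobs` and `KMeans` draw from NumPy's global random state, which `np.random.seed(0)` fixes.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Seed NumPy's global RNG so the dataset and K-Means runs are reproducible
np.random.seed(0)

# 200 points around four given centres, each with standard deviation 0.9
X, y = make_blobs(n_samples=200,
                  centers=[[2, 1], [-1, -1], [5, 3], [9, 4]],
                  cluster_std=0.9)

# K-Means with k-means++ seeding, 4 clusters, best of 12 initialisations
k_means = KMeans(init="k-means++", n_clusters=4, n_init=12)
k_means_labels = k_means.fit_predict(X)

# Agglomerative clustering with average linkage, cut at 4 clusters
agglom = AgglomerativeClustering(n_clusters=4, linkage="average")
agglom_labels = agglom.fit_predict(X)
```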
You are asked to:
• plot your created dataset
• plot the two clustering models for your created dataset
• set the K-Means plot title to “KMeans”
• set the Agglomerative Hierarchical plot title to “Agglomerative Hierarchical”
• calculate the distance matrix for Agglomerative Clustering from the input feature matrix (linkage = complete)
• display the dendrogram
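One possible way to produce the requested plots, distance matrix, and dendrogram is sketched below, assuming matplotlib and SciPy's `scipy.spatial.distance` and `scipy.cluster.hierarchy` modules; the task does not prescribe these functions, and the three-panel layout is a choice of this sketch. The data and models are rebuilt so the block runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Recreate the dataset and both models with the settings given in the task
np.random.seed(0)
X, _ = make_blobs(n_samples=200,
                  centers=[[2, 1], [-1, -1], [5, 3], [9, 4]],
                  cluster_std=0.9)
k_means = KMeans(init="k-means++", n_clusters=4, n_init=12).fit(X)
agglom = AgglomerativeClustering(n_clusters=4, linkage="average").fit(X)

# Plot the raw dataset and both clusterings side by side
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].scatter(X[:, 0], X[:, 1], marker=".")
axes[0].set_title("Dataset")
axes[1].scatter(X[:, 0], X[:, 1], c=k_means.labels_, cmap="viridis", marker=".")
# Mark the K-Means centroids with red crosses
axes[1].scatter(*k_means.cluster_centers_.T, c="red", marker="x", s=100)
axes[1].set_title("KMeans")
axes[2].scatter(X[:, 0], X[:, 1], c=agglom.labels_, cmap="viridis", marker=".")
axes[2].set_title("Agglomerative Hierarchical")

# Euclidean distance matrix of the input feature matrix
condensed = pdist(X)                 # condensed form expected by linkage()
dist_matrix = squareform(condensed)  # full 200 x 200 matrix, for inspection

# Complete-linkage hierarchy and its dendrogram
Z = hierarchy.linkage(condensed, method="complete")
plt.figure(figsize=(15, 5))
dendro = hierarchy.dendrogram(Z)
plt.title("Dendrogram (complete linkage)")
plt.show()
```

`linkage()` is given the condensed output of `pdist()`: if the square 200 × 200 matrix were passed instead, SciPy would treat each row as an observation vector rather than as a list of distances.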
The sample output shown in the following figure is for demonstration purposes only; your output may differ.
Submission:
Submit the following files to OnTrack:
1. Your program source code (e.g. task7_1.py)
2. A screenshot of your program running
Check the following before submitting:
1. Add proper comments to your code