IFN647 Week 12 Workshop: WordCloud and Clustering
********************************************************************
Task 1. Working with csv files
CSV stands for "comma-separated values". A csv file is a simplified spreadsheet stored as a plain text file. Please see the attached example.csv and the corresponding spreadsheet, example.xlsx.
Please try the following to read a csv file and save the contents in a list.
>>> import csv
>>> dFile = open('example.csv')
>>> dReader = csv.reader(dFile)
>>> df = list(dReader)
>>> dFile.close()
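The same steps can also be written as a small script. The sketch below is one possible arrangement, assuming the first row of the file is a header row; the function name read_csv_rows is illustrative and not part of the workshop material.

```python
import csv


def read_csv_rows(path):
    """Read a CSV file and return (header, rows), each row a list of strings."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(path, newline="", encoding="utf-8") as csv_file:
        all_rows = list(csv.reader(csv_file))
    if not all_rows:
        return [], []
    # Assumption: the first row holds the column names
    return all_rows[0], all_rows[1:]
```

For example, header, rows = read_csv_rows('example.csv') would return the column names and the data rows separately. The with statement closes the file even if reading fails.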
Task 2. Generating Word-Cloud in Python
A word cloud is a data visualization method for text data in which the size of each word indicates its frequency or importance. A word cloud can therefore highlight significant information. For more details, see
https://www.geeksforgeeks.org/generating-word-cloud-python/
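To make the frequency idea concrete, the following sketch counts how often each word occurs in a short piece of text, using only the standard library. The sample sentence and the function name word_frequencies are made up for illustration, and the tokenisation rule (runs of letters and apostrophes) is a simplifying assumption.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word occurs in text."""
    # Treat runs of letters and apostrophes as words; anything else separates words
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


freqs = word_frequencies("Nice song, nice video. Love this song!")
print(freqs.most_common(2))  # the two most frequent words with their counts
```

Words with higher counts would be drawn larger in a word cloud. The wordcloud package can also take such a mapping directly through WordCloud.generate_from_frequencies.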
The following modules are needed for generating a word cloud in Python: matplotlib, pandas and wordcloud.
To install them, run the following commands:
pip install matplotlib
pip install pandas
pip install wordcloud
The attached csv file comes from the UCI Machine Learning Repository. It consists of YouTube comments on videos of popular artists (Dataset Link: https://archive.ics.uci.edu/ml/machine-learning-databases/00380/).
You are required to write a Python program to open this csv file, store its contents in a list of rows, select the CONTENT column, and produce a word cloud figure to show the important information in
(a) All CONTENTS
(b) The positive CONTENTS (the class = 1)
(c) The negative CONTENTS (the class = 0)
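One possible way to approach this task is sketched below. It assumes the file has a header row containing the columns CONTENT and CLASS (the name CLASS is an assumption based on "the class = 1" above), and that class values are stored as the strings "1" and "0". The function names and the path passed to main are hypothetical.

```python
import csv


def load_rows(path):
    """Return the CSV file at path as a list of rows (lists of strings)."""
    with open(path, newline="", encoding="utf-8") as csv_file:
        return list(csv.reader(csv_file))


def contents_text(rows, label=None):
    """Join the CONTENT column into one string, optionally for one class only."""
    header, data = rows[0], rows[1:]
    content_idx = header.index("CONTENT")
    # "CLASS" is an assumed column name; it is only looked up when filtering
    class_idx = header.index("CLASS") if label is not None else None
    selected = [
        row[content_idx]
        for row in data
        if label is None or row[class_idx] == label
    ]
    return " ".join(selected)


def show_word_cloud(text, title):
    """Draw a word cloud for text; needs the wordcloud and matplotlib packages."""
    # Imported here so the helpers above work without the plotting packages
    import matplotlib.pyplot as plt
    from wordcloud import STOPWORDS, WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white",
                      stopwords=STOPWORDS).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


def main(path):
    """Produce the three figures (a), (b) and (c) for the file at path."""
    rows = load_rows(path)
    show_word_cloud(contents_text(rows), "All CONTENTS")
    show_word_cloud(contents_text(rows, "1"), "CLASS = 1")
    show_word_cloud(contents_text(rows, "0"), "CLASS = 0")
```

Calling main with the path of one of the downloaded csv files would display the three figures in turn. pandas is not used in this sketch because the task asks for a list of rows; reading the file with pandas.read_csv would be an equally reasonable alternative.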
Task 3. k-Means clustering using python
The k-means algorithm aims to partition n documents X into k clusters C, in which each document belongs to the cluster with the nearest mean μ_j (the cluster centre or cluster centroid).
The k-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion:
$$\sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2$$
where x_i is the i-th document.
In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it can converge to a local minimum. That's why it can be useful to restart it several times with different initial centroids.
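The inertia criterion can be computed directly for any candidate set of centroids. The sketch below uses plain Python and made-up 2-D points to show that a poorly placed set of centroids gives a much larger value; the function name inertia and the sample data are illustrative assumptions.

```python
def inertia(points, centroids):
    """Sum, over all points, of the squared distance to the nearest centroid."""
    total = 0.0
    for point in points:
        total += min(
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids
        )
    return total


points = [[0, 0], [0, 2], [10, 0]]
good = [[0, 1], [10, 0]]  # one centroid per visible group
poor = [[0, 0], [5, 1]]   # second centroid sits between the groups

print(inertia(points, good))  # 2.0
print(inertia(points, poor))  # 30.0
```

Restarting k-means from several initialisations and keeping the run with the lowest inertia is the usual remedy for local minima.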
sklearn.cluster.KMeans
fit(X[, y, sample_weight])            Compute k-means clustering.
fit_predict(X[, y, sample_weight])    Compute cluster centers and predict cluster index for each sample.
fit_transform(X[, y, sample_weight])  Compute clustering and transform X to cluster-distance space.
get_params([deep])                    Get parameters for this estimator.
predict(X[, sample_weight])           Predict the closest cluster each sample in X belongs to.
score(X[, y, sample_weight])          Opposite of the value of X on the K-means objective.
set_params(**params)                  Set the parameters of this estimator.
transform(X)                          Transform X to a cluster-distance space.
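The sketch below shows how several of these methods relate to each other on a tiny made-up data set. The data, the variable names, and the choices n_init=10 and random_state=0 are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (illustrative only): two well-separated groups of 2-D points
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])

# n_init=10 reruns the algorithm from 10 different initialisations
km = KMeans(n_clusters=2, n_init=10, random_state=0)

labels = km.fit_predict(X)   # fit the model and return a cluster index per sample
distances = km.transform(X)  # shape (4, 2): distance from each sample to each centre

print(labels)
print(km.cluster_centers_)
print(distances.argmin(axis=1))   # nearest centre per sample, same as km.predict(X)
print(km.score(X), -km.inertia_)  # score is the negative of the k-means objective
```

The fitted attributes cluster_centers_, labels_ and inertia_ become available after fit or fit_predict has been called.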
Design a python program to
(a) Cluster the following six documents X (where each document is represented as a triple) into 3 clusters (i.e., assign labels (0, 1, 2) to them) and print the centre of each cluster.
[[1 2 1] [1 4 2] [1 0 0] [10 2 0] [10 4 1] [10 0 5]]
(b) Assign cluster labels to four incoming documents [0, 0, 0], [12, 3, 5], [11, 0, 6] and [11, 2, 0] based on the centres calculated in (a).
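A minimal sketch of one way to approach (a) and (b) with sklearn.cluster.KMeans follows. The settings n_init=10 and random_state=0 are choices made for this sketch (to rerun the algorithm several times and make the result repeatable); they are not specified by the task.

```python
import numpy as np
from sklearn.cluster import KMeans

# The six documents from (a), one triple per row
X = np.array([[1, 2, 1], [1, 4, 2], [1, 0, 0],
              [10, 2, 0], [10, 4, 1], [10, 0, 5]])

# random_state and n_init are choices made for this sketch, not part of the task
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("Labels:", kmeans.labels_)
print("Centres:")
print(kmeans.cluster_centers_)

# The four incoming documents from (b), labelled by their nearest centre
new_docs = np.array([[0, 0, 0], [12, 3, 5], [11, 0, 6], [11, 2, 0]])
new_labels = kmeans.predict(new_docs)
print("New labels:", new_labels)
```

The label numbers themselves are arbitrary: a different random_state may permute them while leaving the grouping of the documents unchanged.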