Assignment 5 – Data Visualization with Kibana
Introduction
In this assignment you will work with NYC OpenData, published by the City of New York, covering 311 service requests collected since 2010. The dataset has over 21 million rows and 41 columns.
The live dataset can be seen at this link:
https://nycopendata.socrata.com/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9.
This assignment has the following objectives:
1. To further expose you to using the ELK stack as an analytic tool for analyzing realistic streaming big data. You will set up the ELK stack by following the posted lecture videos.
2. To give you experience working with open-ended problems that are similar to the problems you will face in your career as a big data professional.
3. At the end of this project you should:
a. Gain sufficient confidence in using Logstash configuration files and in creating Elasticsearch indices, advanced queries, charts, maps and dashboards using Kibana, i.e. fully using the ELK stack in real big data scenarios.
b. Gain an appetite for working with large streaming datasets.
c. Be aware of the potential and benefits of analyzing large streaming datasets using big data tools.
What is expected from you?
1. You should demonstrate that you have a good understanding of the features that the ELK stack provides, such as advanced queries, indexing, table management, aggregations, charts/graphs, maps and dashboards.
2. You should demonstrate your ability to work with multiple large datasets and use the tools to gain valuable insights from them. You should also be able to present these insights in a manner that is easily consumable by stakeholders and other interested parties.
Problem Background
You have been hired as a data analyst by the City of New York to gain valuable insights from its large dataset of 311 service requests. Your task is to use the ELK stack that you installed and configured on GCP (see the posted videos). Successful completion of this task includes creating a GCP instance, running Logstash with the provided configuration file and a geo-point template (for maps) to ingest the NYC 311 service request data into Elasticsearch, and using Kibana to analyze and visualize the results as per the questions given.
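To illustrate what a geo-point template is for, the sketch below builds a minimal index mapping that types one field as `geo_point`, and folds a record's latitude and longitude columns into that field. The field names `location`, `Latitude` and `Longitude` and both helper functions are assumptions made for illustration; the template and Logstash configuration supplied with the assignment may use different names.

```python
import json


def build_geo_mapping(geo_field="location"):
    """Return a mapping body that types `geo_field` as geo_point.

    `geo_field` is a hypothetical field name chosen for this sketch.
    """
    return {"mappings": {"properties": {geo_field: {"type": "geo_point"}}}}


def to_geo_point(record, lat_key="Latitude", lon_key="Longitude"):
    """Turn string lat/lon columns into a {"lat", "lon"} object.

    Returns None when either coordinate is missing or not numeric,
    so that such rows can be indexed without a location.
    """
    try:
        return {"lat": float(record[lat_key]), "lon": float(record[lon_key])}
    except (KeyError, TypeError, ValueError):
        return None


if __name__ == "__main__":
    print(json.dumps(build_geo_mapping(), indent=2))
    print(to_geo_point({"Latitude": "40.7128", "Longitude": "-74.0060"}))
```

Without a `geo_point` mapping, coordinates would typically be indexed as plain numbers or text, and Kibana's map visualizations would have no geographic field to plot.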

The required results are:
results for the analytical questions (tables, charts, tag clouds, maps and dashboard) in an MS Word or PDF document.
Analytical Questions
1. Create a table showing the top 10 cities with the most calls alongside the counts of the top 10 complaint calls (by Descriptor) in each city.
2. Create a pie chart showing the top 5 cities with the most calls alongside the top 5 calls (by Descriptor) in each city.
3. Create a tag cloud representing the top 20 call descriptors.
4. Create a coordinate map of all the major call descriptors in each city.
5. Create a dashboard for all visualizations from 1 to 4 above.
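Questions 1 and 2 both amount to a "top N within top M" breakdown, which in Elasticsearch terms is a `terms` aggregation nested inside another `terms` aggregation. The sketch below is a rough illustration, not the required solution: it builds such a search body and computes the same breakdown locally on a small list of records as a sanity check. The field names `City.keyword` and `Descriptor.keyword`, the aggregation names and both function names are assumptions; the actual field names depend on how the Logstash configuration maps the columns.

```python
from collections import Counter, defaultdict


def top_descriptors_by_city_query(n_cities=10, n_descriptors=10,
                                  city_field="City.keyword",
                                  descriptor_field="Descriptor.keyword"):
    """Build a search body: top cities by call count, each with its top descriptors.

    The field names are assumed; adjust them to the index mapping in use.
    """
    return {
        "size": 0,  # return aggregation buckets only, no individual documents
        "aggs": {
            "top_cities": {
                "terms": {"field": city_field, "size": n_cities},
                "aggs": {
                    "top_descriptors": {
                        "terms": {"field": descriptor_field, "size": n_descriptors}
                    }
                },
            }
        },
    }


def top_descriptors_by_city(records, n_cities=10, n_descriptors=10):
    """Compute the same breakdown in plain Python for a small list of dicts."""
    per_city = defaultdict(Counter)
    for record in records:
        per_city[record["City"]][record["Descriptor"]] += 1
    totals = Counter({city: sum(c.values()) for city, c in per_city.items()})
    return [
        (city, total, per_city[city].most_common(n_descriptors))
        for city, total in totals.most_common(n_cities)
    ]


if __name__ == "__main__":
    sample = (
        [{"City": "BROOKLYN", "Descriptor": "Loud Music/Party"}] * 3
        + [{"City": "BROOKLYN", "Descriptor": "Pothole"}]
        + [{"City": "BRONX", "Descriptor": "Pothole"}] * 2
    )
    for city, total, descriptors in top_descriptors_by_city(sample, 2, 2):
        print(city, total, descriptors)
```

In Kibana the same structure is usually configured through the visualization editor (a terms bucket on the city field, then a second terms bucket on the descriptor field) rather than written by hand.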
Note:
1. This is a group assignment (up to three people in a group). Each group member must submit at least two visualizations of the data. If working in a group, please clearly indicate the work that each member has done. You can also work on this assignment individually.
2. You do not have to open the downloaded data file. You should view and explore the dataset on the above website. Knowing and understanding your dataset will certainly help you do better analysis.
3. A static version of the dataset is provided to you (up to December 2020) but you are welcome to download the latest version (up to April 2021).