Data report
Overview
Content warning — this assignment asks you to work with sensitive information regarding gun violence. If you are uncomfortable with this for any reason, please contact me to discuss it and work out an alternative if necessary.
Coding skills don’t just allow you to analyze data; they also enable you to communicate important information to broad audiences. In this exercise, you’ll create a report on this crowdsourced mass shooting data from 2018. This project will challenge you to apply the skills you’ve acquired thus far to independently learn a new library for building an interactive map, and to build a report about a dataset.
The Data
To make an interactive map of your data, you’ll need to read it into R. A file named shootings-2018.csv is already in the data/ directory for you to use. This data was originally crowdsourced on this website, though I’ve added latitude and longitude to make mapping easier. The data is in the following format:
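One way to confirm the format for yourself is to read the file and inspect its columns. The following is a minimal sketch, not a required approach: `load_shootings` is a hypothetical helper name, and only the path data/shootings-2018.csv comes from the assignment text.

```r
# Read the shootings file into a data frame.
# `load_shootings` is a hypothetical helper name; the default path follows
# the assignment text (data/shootings-2018.csv).
load_shootings <- function(path = "data/shootings-2018.csv") {
  if (!file.exists(path)) {
    stop("Could not find data file: ", path)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Only read when the file is present, e.g. when run from the repo root.
if (file.exists("data/shootings-2018.csv")) {
  shootings <- load_shootings()
  str(shootings) # inspect column names and types before relying on them
}
```

Wrapping the read in a function keeps the path in one place, so the rest of the analysis does not change if the file is updated or moved.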
Expectations
At this point in the quarter, we expect you to be following the best practices we’ve incorporated into the class. This means:
• Proper use of libraries such as dplyr for data wrangling
• Structuring your code so that if the data changes, you can easily update your entire report
• Leveraging markdown syntax to provide structure to your report
• Clearly commenting and properly organizing your code (adheres to tidyverse style guide, passes linting)
• Writing functions to encapsulate chunks of code that you use more than once
• Creating appropriate labels for your visualizations
Instructions
As with previous assignments, follow this link to create your own private repository for this assignment. This should automatically create a private repository, which you will submit to Canvas as your assignment.
Unlike previous assignments, the repo will not have all necessary starter files. You should begin by creating an index.Rmd file, as well as one (or more) files in which to do your analysis (e.g., analysis.R). As you might expect, you’ll complete all of your analysis (calculations, creation of maps/graphs, etc.) in your analysis.R file, and you’ll use your index.Rmd simply to show the information you’ve computed. More detailed instructions are below, not on GitHub.
This is an opportunity for you to explore a dataset, and communicate the insights that you find most important. The format of the report is up to you, but it must include each of the following components:
Summary information
To start your report, you should summarize relevant features of your dataset. Write a paragraph providing a high-level overview of shootings in the US, based on the dataset. This should provide your reader with a sense of scale of the issue, including answers to these questions:
• How many shootings occurred?
• How many lives were lost?
• Which city was most impacted by shootings (make sure to clarify how you are measuring “impact”)?
• Two other insights of your choice
Data in this paragraph should reference values that you calculate in R, and should not simply be typed as text into the paragraph.
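As a rough sketch of how such values might be computed in analysis.R, the example below uses a few invented toy rows in place of the real file. The column names (city, state, num_killed, num_injured) are assumptions and should be checked against the actual data, and "impact" is defined here as killed plus injured, which is only one possible choice.

```r
library(dplyr)

# Toy rows with invented values, standing in for data/shootings-2018.csv.
# Column names are assumptions; check them against the real file.
shootings <- data.frame(
  city = c("City A", "City B", "City A", "City C"),
  state = c("State X", "State Y", "State X", "State Y"),
  num_killed = c(1, 0, 3, 2),
  num_injured = c(4, 5, 1, 2),
  stringsAsFactors = FALSE
)

# How many shootings occurred? Assumes one row per incident.
total_shootings <- nrow(shootings)

# How many lives were lost?
total_killed <- shootings %>%
  summarize(total = sum(num_killed, na.rm = TRUE)) %>%
  pull(total)

# Which city was most impacted? "Impact" here is killed plus injured.
most_impacted_city <- shootings %>%
  group_by(city) %>%
  summarize(impact = sum(num_killed + num_injured, na.rm = TRUE)) %>%
  filter(impact == max(impact)) %>%
  pull(city)
```

In index.Rmd, after sourcing analysis.R in a code chunk, the paragraph could then reference these values with inline R expressions such as `r total_shootings` instead of typed numbers, so the text updates whenever the data changes.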
Summary Table
To show a set of quantitative values to your reader, you should include a well-formatted summary table of your choosing. The table should be sorted in a meaningful way. This should not just be the raw data, but instead should be an aggregate table of information. How you aggregate the information (by city, state, month, day of the week, etc.) is up to you. Make sure to include accompanying text that describes the important insights from the table.
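One possible shape for such a table is sketched below, aggregating by state and sorting by lives lost. The rows are invented toy values, and the column names (state, num_killed, num_injured) are assumptions about the real file.

```r
library(dplyr)

# Toy rows with invented values; column names are assumptions.
shootings <- data.frame(
  state = c("State X", "State Y", "State X", "State Y"),
  num_killed = c(1, 0, 3, 2),
  num_injured = c(4, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Aggregate to one row per state, then sort so the states with the most
# deaths appear first.
summary_table <- shootings %>%
  group_by(state) %>%
  summarize(
    incidents = n(),
    total_killed = sum(num_killed, na.rm = TRUE),
    total_injured = sum(num_injured, na.rm = TRUE)
  ) %>%
  arrange(desc(total_killed))
```

In the report, a table like this could be rendered with knitr::kable(summary_table), passing readable column names through its col.names argument.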
Description of a particular incident
Your report will include a paragraph (4+ sentences) of in-depth information about a particular (single) incident. You should provide your reader with relevant information from the dataset, such as the date and location of the incident, as well as the number of people impacted (injured, killed). You should include a link to at least one outside resource (not found in the data). Data in this paragraph should reference values that you calculate in R, and should not simply be typed as text into the paragraph.
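A minimal sketch of pulling a single incident's details out of the data follows. Selecting the row with the most deaths is an arbitrary choice for illustration, the rows are invented toy values, and the column names (date, city, state, num_killed, num_injured) are assumptions.

```r
library(dplyr)

# Toy rows with invented values; column names are assumptions.
shootings <- data.frame(
  date = c("2018-01-05", "2018-02-11", "2018-03-03"),
  city = c("City A", "City B", "City A"),
  state = c("State X", "State Y", "State X"),
  num_killed = c(1, 0, 3),
  num_injured = c(4, 5, 1),
  stringsAsFactors = FALSE
)

# Keep exactly one row: here, the first incident with the most deaths.
incident <- shootings %>%
  filter(num_killed == max(num_killed, na.rm = TRUE)) %>%
  slice(1)

# Store each detail separately so the report text can reference it.
incident_date <- incident %>% pull(date)
incident_city <- incident %>% pull(city)
incident_killed <- incident %>% pull(num_killed)
incident_injured <- incident %>% pull(num_injured)
```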
An interactive map
While maps are not always the most appropriate visual representation of geographic data, they are extraordinarily popular and attract broad audiences. Before rendering your map, make sure to introduce the purpose of displaying the map in the report (e.g., what types of comparisons it affords). You’ll build an interactive map that shows a marker at the location of each shooting. On your map, manipulate the size of the markers based on the underlying dataset (# injured, # killed, etc.). When hovered or clicked on, each point should provide at least 3 pieces of information about the incident (with a line break, <br>, between each piece of information) and no irrelevant information.
Below your map, you must note at least 2 insights revealed by the map.
Choice of plotting library is up to you, though I suggest you consult the interactive visualization chapter of the book — remember, the map must be interactive.
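To make the marker requirements concrete, here is a hedged sketch using the leaflet package, which is one option among several. The rows and coordinates are invented toy values, the column names (city, state, num_killed, num_injured, lat, long) are assumptions, `build_popup` is a hypothetical helper, and the radius formula is an arbitrary scaling chosen for illustration.

```r
library(dplyr)

# Toy rows with invented values and coordinates; column names are assumptions.
shootings <- data.frame(
  city = c("City A", "City B"),
  state = c("State X", "State Y"),
  num_killed = c(1, 0),
  num_injured = c(4, 5),
  lat = c(40.0, 35.0),
  long = c(-100.0, -90.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: three pieces of information separated by <br>.
build_popup <- function(city, state, num_killed, num_injured) {
  paste0(
    "Location: ", city, ", ", state, "<br>",
    "Killed: ", num_killed, "<br>",
    "Injured: ", num_injured
  )
}

# Add popup text and a marker radius that grows with the number of people
# impacted (the constants 4 and 2 are arbitrary).
map_data <- shootings %>%
  mutate(
    popup_text = build_popup(city, state, num_killed, num_injured),
    radius = 4 + 2 * (num_killed + num_injured)
  )

# Build the map only if leaflet is installed.
if (requireNamespace("leaflet", quietly = TRUE)) {
  shootings_map <- leaflet::leaflet(data = map_data)
  shootings_map <- leaflet::addTiles(shootings_map)
  shootings_map <- leaflet::addCircleMarkers(
    shootings_map,
    lat = ~lat,
    lng = ~long,
    radius = ~radius,
    popup = ~popup_text,
    stroke = FALSE
  )
}
```

Building the popup text in its own function keeps the formatting in one place, which helps if you later change which fields are shown.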
A plot of your choice
In addition to the interactive map, you will build an additional plot of your choice to answer a specific question about your data. You can do this using the package of your choice, such as ggplot2, plotly, bokeh, or others. The choices you make should be tied directly to the question you have about your data.
As with your map, you should integrate your plot seamlessly with the rest of your report, and reference/describe it in your text. Regardless of library, the chart should have a meaningful and clear title, axis labels, and legend (if appropriate).
You should provide a defense of why you chose the visual encodings of the chart (i.e., you chose a layout to answer a specific question), and list at least 2 insights gained from the chart.
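As one illustration, the sketch below asks "how does the number of incidents vary by month?" and answers it with a labeled bar chart in ggplot2. The rows are invented toy values, and both the column name (date) and the ISO date format are assumptions; the real file may store dates differently and need a different parsing step.

```r
library(dplyr)

# Toy rows with invented values; the `date` column and its ISO format are
# assumptions about the real file.
shootings <- data.frame(
  date = c("2018-01-05", "2018-01-20", "2018-02-11", "2018-04-02"),
  num_killed = c(1, 0, 3, 2),
  stringsAsFactors = FALSE
)

# Count incidents per month, keeping months in calendar order.
incidents_by_month <- shootings %>%
  mutate(
    month_number = as.integer(format(as.Date(date), "%m")),
    month = factor(month.abb[month_number], levels = month.abb)
  ) %>%
  count(month, name = "incidents")

# Build the chart only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  month_plot <- ggplot2::ggplot(
    incidents_by_month,
    ggplot2::aes(x = month, y = incidents)
  ) +
    ggplot2::geom_col() +
    ggplot2::labs(
      title = "Shootings per month (toy data)",
      x = "Month",
      y = "Number of incidents"
    )
}
```

A bar chart suits this question because bar height makes month-to-month counts easy to compare; a different question would likely call for a different encoding.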
Submission
As with last week, you’ll make sure your page is hosted on the web through GitHub (Settings -> GitHub Pages). As with the previous assignment, you should add and commit your changes using git, and push your assignment to GitHub. Review the rubric.md file for grading expectations. You will submit the URL of your repository as your assignment, and you must provide a link to your live website in your README.md file.