---
title: "assignment_8"
author: "Dong Fang"
date: "10/25/2020"
output:
  html_document:
    df_print: paged
---
In this exercise, you will use a very large dataset that documents all the fatal motor vehicle crashes that occurred in the US from 2006 to 2018. You will plot the individual crashes over a map of the state of Wisconsin. You will also show the rate of crashes per capita for all the counties in the state.

Objectives:

- Create maps using a raster image that shows roads and topography
- Calculate a meaningful indicator of crash likelihood based on the population of each county
- Use a faceted and a differenced representation to show which counties are seeing an increase or a decrease in fatal crashes
- Create a plot that might be a more effective way of showing trends in crash data

Be sure to download the "fars2020.rda" data file.

Submit:

Complete each code chunk below to process the data and create the graphs. I have given you some bits of code for transformations that we have not discussed in class. Briefly answer the questions posed, describing what each chunk does and the meaning of the graphs.

## Load packages
```{r}
library(tidyverse) # Includes ggplot and dplyr
library(ggalt)
library(ggpointdensity) # To address overplotting and color lines with density
library(ggrepel)
library(patchwork) # To combine plots
library(ggforce)
library(maps) # Map data for states and counties
library(mapproj) # For a range of map projections: mercator, albers, mollweide, gilbert…
library(usmap) # For 50 states
library(viridis)
library(viridisLite)
library(ggmap)
library(sf)
rm(list = ls()) # Clears all variables
```
## Load and transform data
```{r}
## Load and transform the fatal crash data
load("fars2020.rda")   # provides the data frame fars.df
fars.df <- fars.df %>%
  select(latitude, longitud, everything()) %>%
  filter(longitud < 777) %>%                          # Removes missing values coded by 777, 888…
  mutate(state = str_pad(state, 2, pad = "0"),        # e.g. 55 -> "55"
         county = str_pad(county, 3, pad = "0")) %>%  # e.g. 1 -> "001"
  unite("fips", state, county, sep = "", remove = FALSE)  # 5-digit county FIPS code
```
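The padding step matters because the resulting FIPS code is a string: a state code of 55 and a county code of 1 become "55" and "001", so the key is "55001". The sketch below runs the same `str_pad()`/`unite()` steps on two made-up rows (the codes are illustrative, not read from the FARS file). Because `fips` is character, selecting Wisconsin with the padded `state == "55"` is safer than comparing `fips` to a numeric range.

```{r}
library(dplyr)
library(tidyr)
library(stringr)

# Two made-up rows with numeric state and county codes (illustrative only)
toy_fips.df <- tibble(state = c(55, 55), county = c(1, 141))

toy_fips.df <- toy_fips.df %>%
  mutate(state = str_pad(state, 2, pad = "0"),       # 55  -> "55"
         county = str_pad(county, 3, pad = "0")) %>% # 1   -> "001"
  unite("fips", state, county, sep = "", remove = FALSE)

toy_fips.df$fips  # "55001" "55141"
```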
## Load a raster image of the Wisconsin area and overlay crashes
Hint: Adapt the code for raster image of the midwest in the MapsJoins demo notebook
Why might the resulting image not be very useful?
Crashes can happen anywhere there is a road and a car, so places with more people and more traffic record more crashes simply because of volume. The map therefore largely reproduces the population density of Wisconsin rather than showing where driving is riskier.
```{r}
## Filter crashes to include only crashes from Wisconsin (state FIPS code "55")
fars_wisc.df <- fars.df %>% filter(state == "55")
midwest.bb <- c(left = -95, bottom = 40, right = -86, top = 47)
# Note: ggmap >= 4.0 replaces get_stamenmap() with get_stadiamap(), which needs a Stadia Maps API key
map <- get_stamenmap(midwest.bb,              # bounding box in latitude and longitude
                     zoom = 7,                # level of detail, lower for a bigger area
                     maptype = "toner-lite")  # map style
## Use ggmap to create a raster-based plot with a point-density overlay
## (a single density-coloured layer avoids two competing colour mappings)
ggmap(map) +
  geom_pointdensity(data = fars_wisc.df,
                    mapping = aes(x = longitud, y = latitude),
                    adjust = 4, size = 0.5, alpha = 0.5) +
  scale_color_viridis(option = "plasma") +
  theme_void()
```
## Plot the Wisconsin counties using polygon data and fill with the per capita crash rate
Hint: Use the "countypop" data from the usmap package
Hint: Be sure to filter the polygons to include just Wisconsin when using the usmap map data
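Wisconsin is a major tourist destination. What are the implications of this for the construct validity of the per capita crash rate?
Some fatal crashes involve tourists, who add to a county's crash count without being counted in its resident population. Counties that draw many visitors will therefore show inflated per capita rates, so the measure partly reflects visitor traffic rather than the crash likelihood faced by residents, which weakens its construct validity.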
```{r}
## Transform filtered data to calculate the per capita fatal crash rate for each county
# pop_2015 is the column in older usmap releases; newer releases may use a later year
wiscpop.df <- countypop %>% filter(abbr == "WI")   # county populations from usmap
wisccrash.df <- fars_wisc.df %>%
  count(fips, name = "crashes") %>%                 # one row per county: number of fatal crashes
  right_join(wiscpop.df, by = "fips") %>%           # keep counties with no recorded crashes
  mutate(crashes = replace_na(crashes, 0L),
         crashrate_percapita = crashes / pop_2015 * 100000)  # crashes per 100,000 residents
## Use plot_usmap to draw the county polygons filled by the per capita crash rate
plot_usmap(regions = "counties", include = "WI",
           data = wisccrash.df, values = "crashrate_percapita") +
  scale_fill_viridis_c(option = "viridis", name = "Crashes per 100k") +
  labs(title = "Fatal crash rate per capita in Wisconsin, 2006-2018") +
  theme_void()
```
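Dividing raw crash rows by population gives one tiny number per crash; the rate has to be computed after collapsing crashes to one row per county. A minimal sketch of that aggregate-then-join step on made-up numbers (the three counties and their populations are hypothetical) shows why `count()` comes before the join and why counties without crashes are kept with a rate of zero.

```{r}
library(dplyr)
library(tidyr)

# Made-up illustration: one row per crash, plus a county population table
toy_crashes.df <- tibble(fips = c("55001", "55001", "55003", "55003", "55003"))
toy_pop.df <- tibble(fips = c("55001", "55003", "55005"),
                     pop  = c(20000, 15000, 45000))

toy_rate.df <- toy_crashes.df %>%
  count(fips, name = "crashes") %>%         # collapse crash rows to one row per county
  right_join(toy_pop.df, by = "fips") %>%   # keep counties that had no crashes
  mutate(crashes = replace_na(crashes, 0L),
         rate_per_100k = crashes / pop * 100000) %>%
  arrange(fips)

toy_rate.df  # rates of 10, 20 and 0 crashes per 100,000 residents
```

Scaling to "per 100,000 residents" only makes the numbers readable; it does not change the ranking of counties.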
```{r}
## Transform filtered data to calculate the per capita fatal crash rate for each county
wisccrash.df <- fars_wisc.df %>%
  count(fips, name = "crashes") %>%                          # fatal crashes per county
  right_join(filter(countypop, abbr == "WI"), by = "fips") %>%
  mutate(crashes = replace_na(crashes, 0L),
         crashrate_percapita = crashes / pop_2015 * 100000,  # crashes per 100,000 residents
         # "St. Croix County" -> "st croix", the form used by the maps package
         county = county %>% tolower() %>% str_remove(" county$") %>% str_remove_all("\\."))
## County polygons from the maps package, converted to an sf object
wisc_county.sf <- st_as_sf(maps::map("county", "wisconsin", plot = FALSE, fill = TRUE)) %>%
  mutate(county = str_remove(ID, "^wisconsin,")) %>%   # To keep just county name
  left_join(wisccrash.df, by = "county")
## Use ggplot to plot polygons filled by per capita crash rates
ggplot(wisc_county.sf) +
  geom_sf(aes(fill = crashrate_percapita), colour = "white") +
  scale_fill_viridis_c(option = "viridis", name = "Crashes per 100k") +
  labs(title = "Fatal crash rate per capita in Wisconsin") +
  theme_void()
```
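The join between the population table and the map polygons only works if the county names agree. The sketch below assumes the two formats seen in the code above ("Adams County" style names in `countypop`, "wisconsin,adams" style IDs in the maps database, with the period dropped in "st croix"); the three names are examples, not the full county list. Any name left over in `setdiff()` would end up as a county with a missing fill.

```{r}
library(dplyr)
library(stringr)

# Assumed name formats: usmap::countypop vs. the maps package "county" database
toy_usmap_names <- c("Adams County", "Fond du Lac County", "St. Croix County")
toy_maps_ids    <- c("wisconsin,adams", "wisconsin,fond du lac", "wisconsin,st croix")

toy_usmap_clean <- toy_usmap_names %>%
  tolower() %>%               # maps uses lower case
  str_remove(" county$") %>%  # drop the trailing " county"
  str_remove_all("\\.")       # "st. croix" -> "st croix"

toy_maps_clean <- str_remove(toy_maps_ids, "^wisconsin,")  # keep just the county name

setdiff(toy_usmap_clean, toy_maps_clean)  # character(0) when every name matches
```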
```{r}
## Recreate the labelled Wisconsin map from the demo, using the crash data instead of unemployment
wi_crash.df <- fars_wisc.df %>%
  count(fips, name = "crashes") %>%
  right_join(filter(countypop, abbr == "WI"), by = "fips") %>%
  mutate(crashes = replace_na(crashes, 0L),
         crashrate_percapita = crashes / pop_2015 * 100000,
         county = county %>% tolower() %>% str_remove(" county$") %>% str_remove_all("\\."))
wi_county.sf <- st_as_sf(maps::map("county", "wisconsin", plot = FALSE, fill = TRUE)) %>%
  mutate(county = str_remove(ID, "^wisconsin,")) %>%  # To keep just county name
  left_join(wi_crash.df, by = "county")               # Merges data with shape information of map
ggplot(wi_county.sf) +
  geom_sf(aes(fill = crashrate_percapita), colour = "white") +
  geom_point(data = maps::us.cities %>% filter(country.etc == "WI"),
             aes(long, lat), colour = "red", size = 1) +
  geom_label_repel(
    data = wi_county.sf %>% slice_max(crashrate_percapita, n = 5),  # five highest rates
    aes(label = county, geometry = geom),
    stat = "sf_coordinates",
    min.segment.length = 1.5,
    colour = "black", size = 3,
    segment.colour = "grey25") +
  scale_fill_viridis_c() +
  labs(title = "Crash rate per capita in Wisconsin",
       fill = "Crashes per 100k") +
  theme_void()
```
## Divide the data and plot it using a facet and a difference between pre- and post-2012
Hint: You need to use “if_else” to create the pre/post variable
Hint: You need to use “spread” to create a difference variable
What is a benefit of the facet plot relative to the difference plot?
What is a benefit of the difference plot relative to the facet plot?
```{r}
## Create pre/post variable and calculate the mean crash rate
## Recreate county plot faceted horizontally by the pre/post 2012
## Plot the difference between pre and post 2012
```
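The two hints describe a flag-then-reshape pattern. The sketch below applies it to made-up yearly rates for two hypothetical counties: `if_else()` creates the pre/post flag, and `pivot_wider()` (tidyr's successor to `spread()`) puts the two period means side by side so they can be subtracted. Assigning 2012 itself to "post" is an assumption here.

```{r}
library(dplyr)
library(tidyr)

# Made-up yearly crash rates (per 100,000 residents) for two hypothetical counties
toy_years.df <- tibble(
  county = rep(c("county A", "county B"), each = 4),
  year   = rep(c(2009, 2011, 2013, 2015), times = 2),
  rate   = c(12, 14, 10, 8,   5, 5, 7, 9)
)

toy_prepost.df <- toy_years.df %>%
  mutate(period = if_else(year < 2012, "pre", "post")) %>%       # 2012 counts as "post"
  group_by(county, period) %>%
  summarise(mean_rate = mean(rate), .groups = "drop") %>%
  pivot_wider(names_from = period, values_from = mean_rate) %>%  # one column per period
  mutate(difference = post - pre)                                # > 0 means crashes rose

toy_prepost.df  # county A: 13 -> 9 (difference -4); county B: 5 -> 8 (difference 3)
```

The long table before `pivot_wider()` is the form a `facet_wrap(~ period)` map needs; the wide table with `difference` is the form a single difference map needs.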
## Create a plot that might be better for showing trends in county crash data
Hint: Reconsider using maps in favor of another representation such as slope graph
Hint: If you choose to create a slope graph make sure your data are in the right form
Why would you recommend this representation relative to the map (or not)?
```{r}
```
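For a slope graph, "the right form" means long data: one row per county per period, with the period as an ordered factor on the x axis and the county as the line group. A minimal sketch with made-up pre/post means for three hypothetical counties:

```{r}
library(dplyr)
library(ggplot2)

# Made-up pre/post mean crash rates, one row per county per period
toy_slope.df <- tibble(
  county = rep(c("county A", "county B", "county C"), times = 2),
  period = factor(rep(c("pre-2012", "post-2012"), each = 3),
                  levels = c("pre-2012", "post-2012")),  # fixes left-to-right order
  rate   = c(13, 5, 9,   9, 8, 9)
)

toy_slope.plot <- ggplot(toy_slope.df, aes(x = period, y = rate, group = county)) +
  geom_line(colour = "grey50") +              # one line per county
  geom_point(size = 2) +
  geom_text(data = filter(toy_slope.df, period == "post-2012"),
            aes(label = county), hjust = -0.2) +  # label each line at its right end
  labs(x = NULL, y = "Fatal crashes per 100,000 residents") +
  theme_minimal()

toy_slope.plot
```

Rising, falling, and flat lines (counties B, A, and C here) read directly from the slope, which a pair of choropleth maps only conveys through color comparison.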