—
title: “Maps and joins”
author: “John D. Lee”
date: 10/20/2020
output:
html_notebook:
toc: yes
—
“`{r LoadPackages, warning=FALSE, message=FALSE}
library(tidyverse) # Includes ggplot and dplyr
library(ggalt)
library(ggpointdensity) # To address overplotting and color lines with density
library(ggrepel)
library(patchwork) # To combine plots
library(ggforce)
library(HistData) # For Snow Choleara data
library(maps) # Map data for states and counties
#library(fiftystater) # Shows all 50 states together by moveing Alaska and Hawaii
library(mapproj) # For a range of map projections: mercator, albers, mollweide, gilbert…
library(usmap) # For 50 states
library(umap) # For umpap dimensionality reduction
rm(list = ls()) # Clears all variables
“`
## Three general methods for creating maps with ggplot
There are three general ways to make graphs with ggplot:
– A raster image using ggmap with standard ggplot layers on top
– A polygon map using geom_polygon and a dataframe that contains data describing state boundaries
– A simple feature (sf) map that uses geom_sf to work directly with geographic elements
## Method 1. Raster maps with ggmap
ggmap provides a coordinate system that you can overlay points and lines just as you might a cartesian coordinate system–like a scatterplot. You are not able to modify the map data itself (i.e, change the fill of areas based on variables in your dataframe)
Benefits:
* Can access a variety of maps that have geographic details, such as streets, cities, and landmarks
* Provides access to google maps, but you need to set up an account
Limits:
* Does not support mapping fill to regions or mapping color to region boundary
### Raster map: Plot data as scatterplot with detailed map features as background
“`{r}
midwest.bb <- c(left = -95, bottom = 40, right = -86, top = 47)
map <- get_stamenmap(midwest.bb, # bounding box in lattitude and longitude
zoom = 7, # specifies level of detail, lower for bigger area
maptype = "toner-lite" # specifies map style
)
ggmap(map) +
geom_point(x = -89.4012302, y = 43.0730517, size = 4, colour = "red") +
theme_void()
```
## Method 2. Polygon maps using geom_polygon
geom_polygon can be used to create maps, where the polygon is defined by the coordinates of the state boundary. State boundary information can be extracted from the maps data using the "map_data" and the "maps" package data. To ensure the desired projection use coord_map() and specify the coordinate system of choice.
Benefits:
* Data can be mapped to the polygon regions (e.g., counties, state, or countries) using fill or color.
* The "coord_map" makes it possible to show maps with a wide variety of projections.
Limits:
* The maps do not have the details that come with google maps, such as landmarks and elevation.
* The data you want to plot must have a variable (e.g., state name or fips number) that exactly matches the map data
### Polygon map: Join map data and state crime data to plot as fill
```{r}
# From https://www.datanovia.com/en/blog/how-to-create-a-map-using-ggplot2/
## Transform data to be plotted to include a variable that matches the map data
arrests.df = USArrests
arrests.df$region = str_to_lower(rownames(USArrests)) # Adds a variable that can be used to join with the map data
## Extract the map data
states_map = map_data("state")
arrests_map = left_join(states_map, arrests.df, by = "region") # Variable that matches exactly (i.e. lower case)
# Create the map
ggplot(arrests_map, aes(long, lat, group = group)) +
geom_polygon(aes(fill = Assault), color = "white") +
scale_fill_viridis_c(option = "viridis") +
coord_map(projection = "mollweide") +
labs(title = "Wisconsin has few assault arrests") +
theme_void()
```
## Method 3. Simple feature (sf) maps using geom_sf
The geom_sf plots simple feature data that is a common format for map data. The process of integrating data and plotting is similar to geom_polygon, but allows more control of aspect ratios with coordinate systems.
Benefits:
* Data can be mapped to the polygon regions (e.g., counties, state, or countries) using fill or color.
* Allows you to operate directly on geometric entities, such as finding the centroid of a region or subtract one region from another
* Support statistical analysis on points within regions, such as Kriging
Limits:
* The maps do not have the details that come with google maps, such as landmarks and elevation.
* The data you want to plot must have a variable (e.g., state name) that exactly matches the map data
* You must tranlate data frames into sf objects
### sf map: Add layer of points to create a scatterplot with a map for context
```{r}
# https://cfss.uchicago.edu/geoviz_plot.htmlairports_n
library(sf)
library(nycflights13)
airports_n = flights %>% count(dest) %>% left_join(airports, by = c(“dest” = “faa”))
us_state.geo = st_as_sf(map(‘state’, plot = FALSE, fill = TRUE))
ggplot(data = us_state.geo) +
geom_sf(fill = “grey95”, colour = “grey75”) +
geom_point(data = airports_n, aes(x = lon, y = lat, size = n),
fill = “grey30”, color = “black”, alpha = .4) +
coord_sf(xlim = c(-125, -70), ylim = c(25, 50)) +
scale_size_area(guide = FALSE) +
labs(title = “Flights from New York City”) +
theme_void()
## Join flights with airlines to add airline name
named.flights = left_join(flights, airlines)
## Use anti_join to check that names flights contains all records in “flights”
anti_join(named.flights, flights)
“`
### sf map: Adding fill requires joining data to sf object
“`{r}
us_county.sf = st_as_sf(map(‘county’, plot = FALSE, fill = TRUE))
## County unemployment data from maps library
us_cities.df = us.cities
us_unemp.df = unemp # Unemployment data indexed by fips number
us_county.df = county.fips # Data that links fips number to county
## Merges data with shape information of map
us_county.sf = left_join(us_county.sf, us_county.df, by = c(“ID” = “polyname”))
us_county.sf = left_join(us_county.sf, us_unemp.df, by = “fips”) # Adds column for unemployment data
ggplot() +
geom_sf(data = us_county.sf, aes(fill = unemp)) +
geom_point(data = us_cities.df, aes(long, lat), colour = “red”, size = .1) +
coord_sf(xlim = c(-125, -70),
ylim = c(27, 50)) +
scale_fill_viridis_c() +
labs(title = “US cities and Unemployment”) +
theme_void()
“`
### sf map: Adding labels with geom_sf map data
Adding labels involves adding a mapping to the geometry of the sf and adding stat = “sf_coordinates” to link to the coordinate system.
“`{r}
library(ggrepel)
## Repeat with Wisconsin map
wi_county.sf = st_as_sf(map(‘county’, ‘wisconsin’, plot = FALSE, fill = TRUE)) %>%
mutate(county = str_replace(ID, pattern = “wisconsin,”, “”)) # To keep just county name
## Merges data with shape information of map
wi_county.sf = left_join(wi_county.sf, us_county.df, by = c(“ID” = “polyname”))
wi_county.sf = left_join(wi_county.sf, us_unemp.df, by = “fips”) # Adds column for unemployment data
ggplot(wi_county.sf) +
geom_sf(aes(fill = unemp), colour = “white”) +
geom_point(data = us_cities.df %>% filter(country.etc == “WI”),
aes(long, lat), colour = “red”, size = 1) +
geom_label_repel(
data = wi_county.sf %>% filter(unemp > 9),
aes(label = county, geometry = geom),
stat = “sf_coordinates”,
min.segment.length = 1.5,
colour = “black”, size = 3,
segment.colour = “grey25”) +
scale_fill_viridis_c() +
labs(title = “Wisconsin counties with highest unemployment”,
fill = “Percent unemployed”) +
theme_void()
“`
## Maps place data in spatial context: Snow’s map of cholera deaths
This partial replication of Snow’s map overlays spatial analytics (tesselation) to show points closest to each pump.
This map uses line segments defined by a data frame for the map.
“`{r SnowMap}
deaths.df = Snow.deaths
streets.df = Snow.streets
pumps.df = Snow.pumps
snow.plot =
ggplot() +
geom_line(data = streets.df, aes(x, y, group = street),
colour = “grey70”) +
geom_voronoi_tile(data = pumps.df, aes(x, y, group = 1L), fill = NA,
colour = “cornflowerblue”, alpha = .9, size = .6,
bound= c(min(streets.df$x), max(streets.df$x),
min(streets.df$y), max(streets.df$y))) +
geom_point(data = deaths.df, aes(x, y), size =.75) +
geom_point(data = pumps.df, aes(x, y),
colour = “cornflowerblue”, fill = “cornflowerblue”,
shape = 22, size = 3) +
labs(title = “J. Snow’s map of cholera deaths with tesselation for pumps”) +
theme_void() +
coord_equal()
snow.plot
“`
## Maps place data in spatial context: Minard’s map of Napoleon’s march
This partial replication of Minard’s map shows temperature and time, as well as geographic features, such as rivers that contributed to troup deaths. This plot builds on a raster image map that shows rivers and roads.
It also includes a lot of fine-grained positioning of the temperature and map objects. You will probably never need this. Patchwork package does essentially the same thing MUCH more easily.
“`{r}
# ## Match troop data to temperature data
# Small difference in position require reconciliation
Minard.troops$survivors[Minard.troops$survivors==340000] = 422000 # Data don’t match annotation on original plot
Minard.troop.temp = Minard.troops %>% filter(direction == “R”) %>%
filter(round(long, 0) %in% round(Minard.temp$long, 0)) %>%
mutate(long = round(long, 0)) %>% distinct(long, .keep_all = TRUE)
Minard.temp = Minard.temp %>% mutate(long = round(long, 0))
library(ggmap) # For raster maps
library(ggpubr)
library(pdp) # For fine-grained control of graph objects
## Create map
# campaign_map = get_map(
# location = c(min(Minard.troops$long)-1, min(Minard.troops$lat)-.5,
# max(Minard.troops$long)+1, max(Minard.troops$lat)+.75),
# maptype = “toner-lines”, source = “stamen”)
# save(file = “minard_map”, campaign_map)
load(file = “minard_map”)
ggmap(campaign_map)
# ## Plot troop path
troop.plot = ggmap(campaign_map) +
geom_path(data = Minard.troops,
aes(long, lat, size = (survivors)^1.33, colour = direction, group = group),
lineend = “round”, linejoin = “bevel”) +
geom_text(data = Minard.cities,
aes(long, lat, label = city), size = 2.75) +
geom_segment(data = Minard.troop.temp,
aes(x = long, xend = long, y = lat, yend = -Inf), alpha = .33) +
coord_cartesian(xlim = c(23.5, 38.5)) +
scale_size_continuous(“Survivors”, range = c(.25, 12)) +
scale_color_manual(“Direction”, values = c(“grey75”, “grey35”)) +
labs(title = “Napoleon’s March on Moscow”) +
theme_void() +
theme(legend.position = “none”,
legend.box=”horizontal”,
plot.margin = unit(c(0, 1.5, 0, 0), “cm”))
# ## Plot temperature vs. longitude, with labels for dates
temp.plot = ggplot(Minard.temp, aes(long, temp)) +
geom_path(color = “grey75”, size=1) +
geom_point(size=1) +
geom_segment(aes(x = long, xend = long, y = temp, yend = Inf), alpha = .33) +
geom_text(aes(label=paste(temp, “属”, ” “, date, sep = “”)), nudge_y = -4, size = 3) +
labs(x= “”, y = “Temp. (C属)”) +
scale_y_continuous(position = “right”) +
coord_cartesian(xlim = c(24.2, 38.15), ylim = c(-36, 5)) +
theme_minimal() +
theme(plot.margin = unit(c(0, .175, 0, 0), “cm”),
axis.title.y = element_text(size = rel(.8)),
panel.grid.minor = element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
## Combine the plots
p1 = ggplot_gtable(ggplot_build(troop.plot))
p2 = ggplot_gtable(ggplot_build(temp.plot))
maxWidth = unit.pmax(p1$widths[2:3], p2$widths[2:3])
p1$widths[2:3] <- maxWidth
p2$widths[2:3] <- maxWidth
pdp::grid.arrange(p1, p2, heights = c(4, 1.5))
```
## Maps use projections to translate locations on a elipsoid to a plane
These projections all involve distortion and so there is no optimal projection. The best projection depends on the task of the viewer. Coastal navigation is well-served by the Mercator projection because it perserves angles, but a straight line is not the shortest path between two points as it is with the Gnomonic projection. The Mercator projection also makes Africa look about the same size as Greenland even though Greenland is 14 times larger.
```{r StateProjection}
## Create dataframe based on the states map data
states_map.df = map_data("state") # Extracts boundary data from the maps package
mercator.plot =
ggplot(data = states_map.df, aes(long, lat, group = group)) +
geom_polygon(fill = NA, color = "grey60") +
coord_map(projection = "mercator") +
labs(title = "Mercator: Angle preserving") +
theme_light()
gnomonic.plot =
ggplot(data = states_map.df, aes(long, lat, group = group)) +
geom_polygon(fill = NA, color = "grey60") +
coord_map(projection = "gnomonic") +
labs(title = "Gnomonic: Great circle preserving") +
theme_light()
bonne.plot =
ggplot(data = states_map.df, aes(long, lat, group = group)) +
geom_polygon(fill = NA, color = "grey60") +
coord_map(projection = "bonne", lat = 45) +
labs(title = "Bonne: Shape preserving") +
theme_light()
mollweide.plot =
ggplot(data = states_map.df, aes(long, lat, group = group)) +
geom_polygon(fill = NA, color = "grey60") +
coord_map(projection = "mollweide") +
labs(title = "Mollweide: Area preserving") +
theme_light()
(mercator.plot + gnomonic.plot)/(bonne.plot + mollweide.plot)
```
## Join dataframes to add information to the map
```{r Choropleth48}
states_map.df = map_data("state") # Extracts boundary data from the maps package
## Read data and create region variable
arrests.df = USArrests
arrests.df = arrests.df %>%
mutate(region = rownames(USArrests)) %>%
mutate(region = tolower(region)) # converts to lower case
## Join map and arrest data
arrests.df = left_join(arrests.df, states_map.df, by = “region”)
## A choropleth maps a variable to the color of a region
ggplot(data = arrests.df, aes(long, lat, group = group, fill = Assault)) +
geom_polygon(color = “grey60”) +
coord_map(projection = “mollweide”) +
labs(title = “Assault arrests per 100,000”) +
theme_void()
“`
## Including all states requires a custom projection
Don’t try to override this projection with another such as “mollweide”, it crashes R.
“`{r Choropleth50}
## Read state data and change name of state variable
all_states.df = us_map()
all_states.df = all_states.df %>% mutate(state = str_to_lower(full))
## Read data and create state variable to match key in all_states
arrests.df = USArrests
arrests.df = arrests.df %>%
mutate(state = rownames(USArrests)) %>%
mutate(state = str_to_lower(state)) # converts to lower case
arrests.df = left_join(arrests.df, all_states.df, by = “state”)
## The custom projection causes other projections to fail
ggplot(data = arrests.df, aes(x, y, group = group, fill = Assault)) +
geom_polygon(color = “grey60”) +
#coord_map(projection = “mollweide”) +
labs(title = “Assault arrests per 100,000”) +
theme_void()
“`
## Think beyond maps: Comparisons with spatial entities might not be spatial
“`{r Dotplot, fig.height=3., units = “in”}
arrests.df = arrests.df %>% group_by(state) %>% slice(1) %>%
mutate(state = str_to_title(state)) # Converts name to title case
ggplot(arrests.df, aes(reorder(state, Assault), Assault)) +
geom_hline(aes(yintercept = mean(Assault)), colour = “grey80”) +
geom_point() +
geom_linerange(aes(ymin = 0, ymax = Assault)) +
labs(x = “”, y = “Assault arrests per 100,000”) +
scale_y_continuous(expand = expansion(mult = c(0, .1))) + # To bring lines close to labels
coord_flip() +
theme_light()
“`
## Scatterplot map: Comply with the Dark-is-more perceptual compatibility
In a scatterplot the dark-is-more mapping is clear. More dots stacking up occlude the map below.
Use direction =-1 to override the default:
scale_color_viridis_c(option = “inferno”, direction = -1)
This graph demonstrates the common problem of plotting instances or total counts. The result typically shows population density and does not reveal anything interesting about the underlying issue: Areas of dense population have many cars and many crashes.
“`{r}
## Load and transform the fatal crash data
load(“fars_2020.data”)
fars.df = fars.df %>% select(latitude, longitud, everything()) %>%
filter(longitud < 777) %>% # Removes missing values coded by 777, 888…
mutate(state = str_pad(state, 2, pad = “0”)) %>%
mutate(county = str_pad(county, 3, pad = “0”)) %>%
unite(“fips”, state:county, sep = “”, remove = FALSE)
## Reads boundary data from the maps package
states_map.df = map_data(“state”)
## Scatterplot with no map features–state boundaries
ggplot(data = states_map.df) +
geom_polygon(aes(long, lat, group = group), fill = “grey80”, color = “grey70”) +
geom_pointdensity(data = fars.df %>% filter(state != “02” & state != “15”), # Remove Alaska and Hawaii
aes(longitud, latitude), size = 0.05, show.legend = FALSE) +
scale_color_viridis_c(option = “inferno”, direction = -1) +
labs(title = “Fatal vehicle crashes reveal ares of dense population”) +
coord_map(projection = “mollweide”) +
theme_void()
“`
## Choropleth: Comply with the Dark-is-more perceptual compatibility
With a choropleth and heat maps more generally the dark-is-more mapping is not as obvious and not the default of ggplot.
“`{r message=FALSE}
## Caculate the mean number of fatalities per year
fips_fars.df = fars.df %>%
group_by(fips) %>%
summarise(m_fatal = mean(fatals))
all_counties.df = us_map(“counties”)
all_counties.df = left_join(all_counties.df, fips_fars.df, by = “fips”)
all_counties.df = left_join(all_counties.df, countypop, by = c(“fips”, “county”, “abbr”))
all_counties.df = all_counties.df %>% mutate(fatal_rate = m_fatal/pop_2015)
##
p1 = ggplot(data = all_counties.df, aes(x, y, group = group, fill = log(fatal_rate))) +
geom_polygon(color = “grey60”, size =.05, show.legend = FALSE) +
scale_fill_viridis_c(option = “viridis”, direction = 1) +
labs(title = “Light is more”) +
coord_equal()+
theme_void()
p2 = ggplot(data = all_counties.df, aes(x, y, group = group, fill = log(fatal_rate))) +
geom_polygon(color = “grey60”, size =.05, show.legend = FALSE) +
scale_fill_viridis_c(option = “viridis”, direction = -1) +
labs(title = “Dark is more”) +
coord_equal()+
theme_void()
p1+p2 + patchwork::plot_annotation(title = “Calculating the percapita rate of fatal crashes reveals dangerous counties”)
“`
## Projecting high-dimensional data onto a plane
“`{r UMAP}
wine.df = read_csv(“winequality-red.csv”)
wine.umap = wine.df %>%
select(-quality) %>%
umap()
umap.layout = wine.umap$layout # Extract the dimensions
colnames(umap.layout) = c(“x”, “y”)
wine_umap.df = cbind(wine.df, umap.layout)
ggplot(wine_umap.df, aes(x, y, colour = quality > 5)) +
geom_point(size = .8) +
labs(title = “Dimensions of wine quality reduced to two”,
x = “”, y = “”,
colour = “Wine quality”) +
scale_colour_brewer(palette = “Dark2”, labels = c(“Bad”, “Good”)) +
theme_void()
“`