---
title: "Assignment 7 Notes"
output:
  html_document:
    toc: yes
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE)
```

## 1. Cancellations and Destination Location
It is useful to add a `canceled` variable to the `flights` table,
assuming that canceled flights are those with both `dep_time` and
`arr_time` missing:
```{r, message = FALSE}
library(dplyr)
library(nycflights13)
flights <- mutate(flights, canceled = is.na(dep_time) & is.na(arr_time))
```
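A small sketch of the cancellation rule, using a hypothetical three-row data frame rather than the real `flights` table. Under this definition, a flight that has a departure time but no arrival time (possibly a diversion) is not counted as canceled:

```r
# Hypothetical toy rows, not taken from nycflights13
toy_flights <- data.frame(
  dep_time = c(517, NA, 1807),
  arr_time = c(830, NA, NA)
)
# Canceled only when BOTH times are missing
toy_flights$canceled <- is.na(toy_flights$dep_time) & is.na(toy_flights$arr_time)
toy_flights$canceled  # FALSE TRUE FALSE: the third row departed, so it is not canceled
```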
For each destination, using only flights in the first three months,
compute the number of flights, the percent canceled, and the average
arrival delay:
```{r}
fl3 <- filter(flights, month <= 3)  # January through March only
fl3 <- summarize(group_by(fl3, dest),
                 n = n(),                                # number of flights
                 pcan = 100 * mean(canceled),            # percent canceled
                 delay = mean(arr_delay, na.rm = TRUE))  # mean arrival delay (minutes)
```
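The `pcan` column relies on the fact that `mean()` of a logical vector is the proportion of `TRUE` values. A base-R sketch of the same per-destination summary, on hypothetical toy rows with made-up airport codes, makes this explicit:

```r
# Hypothetical toy flights with made-up destination codes
toy_fl <- data.frame(
  dest      = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  canceled  = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  arr_delay = c(NA, 10, -5, 25, 40, NA)
)
n_by_dest <- tapply(toy_fl$dest, toy_fl$dest, length)
# mean() of a logical vector is the proportion of TRUE values
pcan <- tapply(toy_fl$canceled, toy_fl$dest, function(x) 100 * mean(x))
# Missing arrival delays are dropped before averaging
delay <- tapply(toy_fl$arr_delay, toy_fl$dest, mean, na.rm = TRUE)
summary_toy <- data.frame(dest  = names(pcan),
                          n     = as.vector(n_by_dest),
                          pcan  = as.vector(pcan),
                          delay = as.vector(delay))
summary_toy  # AAA: n 4, pcan 25, delay 10; BBB: n 2, pcan 0, delay 40
```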
To show the data on a map, add location information by joining with
data from the `airports` table:
```{r}
fl3 <- left_join(fl3, select(airports, faa, lat, lon, alt), c("dest" = "faa"))
```
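`left_join()` keeps every row of `fl3`; a destination with no matching `faa` code in `airports` keeps its row but gets `NA` for `lat`, `lon`, and `alt`, and ggplot2 drops such points with a warning when plotting. A base-R analogue using `merge(all.x = TRUE)` on hypothetical tables with made-up codes shows the behavior:

```r
# Hypothetical tables with made-up airport codes and coordinates
dest_summary <- data.frame(dest = c("AAA", "BBB"), n = c(120, 15))
airport_locs <- data.frame(faa = c("AAA", "CCC"),
                           lat = c(40.1, 33.6),
                           lon = c(-75.2, -84.4))
# all.x = TRUE keeps every destination, like a left join
located <- merge(dest_summary, airport_locs,
                 by.x = "dest", by.y = "faa", all.x = TRUE)
located  # BBB has no match, so its lat and lon are NA
```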
A map showing the cancellation percentages for the 50 destinations with
the most flights:
```{r}
library(ggplot2)
# borders() needs the maps package; coord_map() needs mapproj
pm <- ggplot(top_n(fl3, 50, n), aes(x = lon, y = lat)) +
    borders("state") + coord_map()
pm + geom_point(aes(size = pcan)) + scale_size_area()
```
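`top_n(fl3, 50, n)` ranks destinations by `n` and keeps those ranked in the top 50, so ties at the cutoff can return more than 50 rows. A base-R sketch of that tie-keeping selection, using `rank(ties.method = "min")` as a stand-in for dplyr's internal ranking, on hypothetical counts:

```r
# Hypothetical destination counts with a tie at the cutoff
counts_df <- data.frame(dest = c("AAA", "BBB", "CCC", "DDD"),
                        n    = c(50, 30, 30, 10))
k <- 2
# Rank in decreasing order of n; tied values share the smallest rank
keep <- rank(-counts_df$n, ties.method = "min") <= k
top_k <- counts_df[keep, ]
nrow(top_k)  # 3, not 2: BBB and CCC tie for second place
```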
Alpha blending can reduce the over-plotting along the East Coast:
```{r}
pm + geom_point(aes(size = pcan), alpha = 0.3) + scale_size_area()
```
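Assuming points of one color are composited in the usual "over" fashion, k overlapping points each drawn with opacity `alpha` give a combined opacity of `1 - (1 - alpha)^k`, so clusters still look darker while each individual point remains visible. A quick check for `alpha = 0.3`:

```r
alpha <- 0.3
k <- 1:5  # number of overlapping points
# Combined opacity under standard "over" compositing
opacity <- 1 - (1 - alpha)^k
round(opacity, 3)  # 0.300 0.510 0.657 0.760 0.832
```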
Cancellation percentages tend to be higher for destinations closer to
New York and for destinations likely to be experiencing similar weather
conditions.
Whether the average arrival delay is 20 minutes or more can be encoded
using color or shape:
```{r}
pm + geom_point(aes(size = pcan, color = delay >= 20)) + scale_size_area()
pm + geom_point(aes(size = pcan, shape = delay >= 20)) + scale_size_area()
```
With a 15-minute cutoff there are a few more high-delay destinations:
```{r}
pm + geom_point(aes(size = pcan, color = delay >= 15)) + scale_size_area()
pm + geom_point(aes(size = pcan, shape = delay >= 15)) + scale_size_area()
```
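Lowering the cutoff can only add destinations to the high-delay group, never remove any. A sketch of counting destinations at each cutoff, using hypothetical average delays (on the real data the vector would be `fl3$delay`):

```r
# Hypothetical average arrival delays (minutes) for six destinations
avg_delay <- c(5.2, 12.0, 16.5, 18.9, 21.3, NA)
cutoffs <- c(15, 20)
# Count destinations at or above each cutoff; NA delays are not counted
high <- sapply(cutoffs, function(k) sum(avg_delay >= k, na.rm = TRUE))
names(high) <- cutoffs
high  # 15: 3 destinations, 20: 1 destination
```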
The size and shape channels interfere with each other; color and size
do not. Picking out the rarer shape is also harder than spotting the
rarer color: color achieves better pop-out.

## 2. Wind Speed, Time of Day, and Departure Delays
Average departure delays increase roughly linearly with the scheduled
hour of departure until early evening:
```{r, message = FALSE}
library(nycflights13)
library(dplyr)
library(ggplot2)
# Treat implausible wind speeds (over 1000 mph) as missing
weather <- mutate(weather,
                  wind_speed = ifelse(wind_speed > 1000, NA, wind_speed))
# Attach the weather observation for each flight's origin and scheduled hour
flights <- left_join(flights,
                     select(weather, -(year : hour)),
                     c("origin", "time_hour"))
fl <- summarize(group_by(flights, wind_speed, hour),
                avg_delay = mean(dep_delay, na.rm = TRUE),
                n = n())
p <- ggplot(fl, aes(x = hour, y = avg_delay)) +
    geom_point(aes(size = n)) + scale_size_area()
p
```
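The join matches on both `origin` and `time_hour`, so each flight picks up only the observation taken at its own airport in its own scheduled hour. A base-R sketch of the cleaning step and the two-key left join, on hypothetical toy tables (the values are made up for illustration):

```r
# Hypothetical toy tables, not nycflights13 data
flights_toy <- data.frame(
  origin    = c("EWR", "JFK", "EWR"),
  time_hour = c("2013-01-01 05:00", "2013-01-01 05:00", "2013-01-01 06:00"),
  dep_delay = c(2, 10, -1)
)
weather_toy <- data.frame(
  origin     = c("EWR", "JFK"),
  time_hour  = c("2013-01-01 05:00", "2013-01-01 05:00"),
  wind_speed = c(10.4, 1200)
)
# Treat physically implausible wind speeds as missing
weather_toy$wind_speed <- ifelse(weather_toy$wind_speed > 1000, NA,
                                 weather_toy$wind_speed)
# Left join on BOTH keys; flights with no matching observation get NA
joined_w <- merge(flights_toy, weather_toy,
                  by = c("origin", "time_hour"), all.x = TRUE)
joined_w  # only EWR at 05:00 keeps a wind speed
```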
The pattern is roughly the same across the five wind speed groups:
```{r}
p + facet_wrap(~ cut_number(wind_speed, 5))
```
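`cut_number(wind_speed, 5)` splits wind speed into five groups with roughly equal numbers of observations, using quantiles as break points. A base-R sketch of the same idea on a hypothetical vector (an approximation of the approach, not ggplot2's own code); missing values stay `NA`, which is why the faceted plot can include an `NA` panel:

```r
x <- c(1:100, NA)  # hypothetical values with one missing entry
# Quantile break points at 0%, 20%, ..., 100%
brks <- quantile(x, probs = seq(0, 1, length.out = 6), na.rm = TRUE)
bins <- cut(x, brks, include.lowest = TRUE)
table(bins, useNA = "ifany")  # five bins of 20 values each, plus one NA
```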
Conditioning on time of day also shows little variation in average
delay across moderate wind speeds:
```{r}
ggplot(filter(fl, wind_speed <= 25),
       aes(x = wind_speed, y = avg_delay)) +
    geom_point(aes(size = n)) + scale_size_area() +
    facet_wrap(~ cut_number(hour, 6))
```
The association between departure delay and wind speed seen previously
appears to reflect an association between wind speed and time of day:
```{r}
w <- summarize(group_by(weather, hour),
               avg_wind_speed = mean(wind_speed, na.rm = TRUE))
ggplot(w, aes(x = hour, y = avg_wind_speed)) + geom_point()
```
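The argument is one of confounding: if time of day drives both wind speed and delays, the two will be associated overall even when there is no association at any fixed hour. A minimal sketch with hypothetical, deliberately constructed data, where delay depends only on hour by construction:

```r
# Hypothetical toy data: wind rises with hour, and delay depends ONLY on hour
hour  <- rep(c(6, 12, 18), each = 3)
wind  <- hour / 2 + rep(c(-1, 0, 1), times = 3)
delay <- 2 * hour + rep(c(1, -2, 1), times = 3)

# Marginal association: strongly positive, driven entirely by hour
marginal <- cor(wind, delay)

# Within each hour there is no association at all
within <- sapply(split(data.frame(wind, delay), hour),
                 function(d) cor(d$wind, d$delay))
round(marginal, 3)  # about 0.939
within              # 0 for each hour
```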