
---
title: "Assignment 6 Notes"
output:
  html_document:
    toc: yes
---

```{r global_options, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE)
```

```{r, include = FALSE}
library(dplyr)
library(nycflights13)
library(ggplot2)
```

## 1. Fleet City Gas Mileage

```{r, include = FALSE}
library(dplyr)
library(ggplot2)

library(readr)
if (! file.exists("vehicles.csv.zip"))
    download.file("http://www.stat.uiowa.edu/~luke/data/vehicles.csv.zip",
                  "vehicles.csv.zip")
newmpg <- read_csv("vehicles.csv.zip", guess_max = 10000)
```

After reading the data into the variable `newmpg`, start by focusing on
the non-electric vehicles for the years since 2009:

```{r}
nm <- filter(newmpg, fuelType1 != "Electricity", year >= 2009)
```

Compute the average city mileage for the models for each year and make:

```{r}
nm <- summarize(group_by(nm, make, year), cty = mean(city08))
```

The result in `nm` is still grouped, so remove the grouping structure
before identifying the top five for 2018:

```{r}
nm <- ungroup(nm)
tnm18 <- top_n(filter(nm, year == 2018), 5, cty)
```
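
To see why the ungrouping step matters, here is a minimal sketch on a made-up table. The tibble `toy_mpg` and its values are hypothetical; only the column names mirror the ones used above. With its default settings, `summarize` drops just the last grouping variable, so a summary over `make` and `year` is still grouped by `make`.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_mpg <- tibble(make   = c("A", "A", "B", "B"),
                  year   = c(2017, 2018, 2017, 2018),
                  city08 = c(20, 22, 30, 34))

toy_avg <- summarize(group_by(toy_mpg, make, year), cty = mean(city08))

# Only the last grouping level (year) has been peeled off.
group_vars(toy_avg)

# After ungroup() no grouping variables remain.
group_vars(ungroup(toy_avg))
```

If the grouping were left in place, a later `top_n` would pick the top rows within each make rather than across all makes.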

The averages over the years for these makes can be extracted by
filtering on `make %in% tnm18$make`:

```{r}
tnm <- filter(nm, make %in% tnm18$make)
```

An alternative is to use `semi_join`:

```{r}
tnm1 <- semi_join(nm, tnm18, "make")
identical(tnm, tnm1)
```
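
The two approaches can be compared on a small example. This is a sketch with invented data (`toy_avg_all` and `toy_top` are hypothetical names); it also contrasts `semi_join` with `inner_join`, which would not be a drop-in replacement because it brings along the columns of the second table.

```r
library(dplyr)

# Hypothetical yearly averages for three makes.
toy_avg_all <- tibble(make = c("A", "A", "B", "B", "C", "C"),
                      year = rep(c(2017, 2018), 3),
                      cty  = c(20, 21, 30, 31, 25, 26))

# Hypothetical "top makes" table for 2018.
toy_top <- tibble(make = c("B", "C"), year = 2018, cty = c(31, 26))

by_filter <- filter(toy_avg_all, make %in% toy_top$make)
by_semi   <- semi_join(toy_avg_all, toy_top, by = "make")
by_inner  <- inner_join(toy_avg_all, toy_top, by = "make")

# semi_join keeps only the columns of the first table ...
names(by_semi)
# ... while inner_join adds suffixed copies of the shared columns.
names(by_inner)
```

A semi join is a filtering join: it only decides which rows of the first table to keep, which is why it matches the `%in%` filter here.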

A plot shows a steady increase in fleet average city gas mileage for
all of these manufacturers over this period.

```{r}
ggplot(tnm, aes(year, cty, color = make)) + geom_line()
```

## 2. Arrival Delays and Cancellations

```{r, include = FALSE}
library(dplyr)
library(nycflights13)
```

After loading the data package, a useful first step is to add a
`canceled` variable to the `flights` table:

```{r}
flights <- mutate(flights, canceled = is.na(dep_time) | is.na(arr_time))
```

Compute the average arrival delay, proportion of canceled flights, and
number of flights for each origin airport and hour of the day:

```{r}
fl <- summarize(group_by(flights, origin, hour),
                delay = mean(arr_delay, na.rm = TRUE),
                pcan = mean(canceled),
                n = n())
head(fl)
```
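
The summary above relies on two facts: the mean of a logical vector is the proportion of `TRUE` values, and `na.rm = TRUE` leaves flights without a recorded arrival delay out of the average. A minimal sketch on invented data (the tibble `toy_flights` is hypothetical, and it assumes a flight counts as canceled when either time is missing):

```r
library(dplyr)

# Hypothetical flights: three scheduled at hour 6, two at hour 18.
toy_flights <- tibble(hour      = c(6, 6, 6, 18, 18),
                      dep_time  = c(601, NA, 615, 1805, NA),
                      arr_time  = c(800, NA, 830, NA, NA),
                      arr_delay = c(-5, NA, 10, NA, NA))

toy_flights <- mutate(toy_flights,
                      canceled = is.na(dep_time) | is.na(arr_time))

toy_smry <- summarize(group_by(toy_flights, hour),
                      delay = mean(arr_delay, na.rm = TRUE),
                      pcan = mean(canceled),
                      n = n())
toy_smry
```

When every delay in a group is missing, as at hour 18 here, the mean with `na.rm = TRUE` is `NaN`, so groups with very few flights deserve a second look.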

The first hour has only one flight, which is canceled:

```{r}
filter(flights, hour == 1)
```

This seems out of place relative to the many other flights in each of
the hours of operation; no other flights are scheduled to depart
between midnight and 5 AM. So set it aside for now:

```{r}
fl <- filter(fl, hour != 1)
```

A plot of the average delays against departure hour:

```{r}
ggplot(fl, aes(x = hour, y = delay, color = origin)) + geom_point()
```

Adding smooth fitted curves helps:

```{r}
ggplot(fl, aes(x = hour, y = delay, color = origin)) +
    geom_point() + geom_smooth()
```

Delays increase over the day, tapering off a little in the later
evening. Delays are similar across all three airports during the
morning. For flights leaving in the late afternoon and early evening,
flights from Newark experience greater delays and flights from JFK
experience smaller delays. Early morning seems the best time to depart
for an on-time arrival.

Cancellations also happen more often for flights leaving later than
for flights leaving earlier. So again an early departure looks like a
good idea.

```{r}
ggplot(fl, aes(x = hour, y = pcan, color = origin)) +
    geom_point() + geom_smooth(se = FALSE)
```

To see whether the conclusions change for shorter or longer flights,
add a classification of distance into `short` or `long`:

```{r}
fl2 <- mutate(flights, type = ifelse(distance < 1000, "short", "long"))
```

Then redo the summaries:

```{r}
fl2 <- summarize(group_by(fl2, origin, hour, type),
                 delay = mean(arr_delay, na.rm = TRUE),
                 pcan = mean(canceled),
                 n = n())
fl2 <- filter(fl2, hour != 1)
```

For the average delays on longer flights, all three airports follow a
similar pattern of delays increasing throughout the day. For shorter
flights, delays for flights out of Newark become substantially larger
in the afternoon and evening than for the other two airports.

```{r}
ggplot(fl2, aes(x = hour, y = delay, color = origin)) +
    geom_point() + geom_smooth(se = FALSE) + facet_wrap(~ type)
```

For longer flights the proportion canceled varies little throughout
the day. For shorter flights it increases slightly through most of
the day.

```{r}
ggplot(fl2, aes(x = hour, y = pcan, color = origin)) +
    geom_point() + geom_smooth(se = FALSE) + facet_wrap(~ type)
```

## 3. Departure Delays and Wind Speed

Box plots of wind speeds at the three NYC airports show a very high
value for one measurement:

```{r}
library(nycflights13)
ggplot(weather) + geom_boxplot(aes(y = wind_speed, x = origin))

filter(weather, wind_speed > 1000)
```

This value is [not plausible](https://en.wikipedia.org/wiki/Wind_speed),
so set it to `NA`:

```{r}
weather <- mutate(weather,
                  wind_speed = ifelse(wind_speed > 1000, NA, wind_speed))
```
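
A small sketch of how this replacement behaves, using an invented vector (`toy_ws` is hypothetical). Values already missing stay missing, because a comparison with `NA` yields `NA` and `ifelse` passes that through.

```r
# Hypothetical wind speeds: one already missing, one implausibly large.
toy_ws <- c(10, NA, 1048, 5)

# Replace implausible readings by NA; other values are kept unchanged.
toy_clean <- ifelse(toy_ws > 1000, NA, toy_ws)
toy_clean
```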

Join the weather data to the flights data using `origin` and
`time_hour` as the key. We don't need the `year`, `month`, `day`, and
`hour` variables from the weather table, so drop them to simplify the
result:

```{r}
fl <- left_join(flights,
                select(weather, -(year : hour)),
                c("origin", "time_hour"))
```
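
The reason for dropping the redundant columns can be seen on a toy example. The tables `toy_fl` and `toy_wx` below are hypothetical, with a simplified numeric `time_hour`. Columns that appear in both tables but are not part of the key come back from the join with `.x` and `.y` suffixes.

```r
library(dplyr)

# Hypothetical miniature versions of the flights and weather tables.
toy_fl <- tibble(origin = c("EWR", "JFK"), time_hour = c(1, 1),
                 hour = c(5, 5), dep_delay = c(2, 4))
toy_wx <- tibble(origin = c("EWR", "JFK"), time_hour = c(1, 1),
                 hour = c(5, 5), wind_speed = c(10, 12))

# Keeping the shared non-key column produces hour.x and hour.y.
kept <- left_join(toy_fl, toy_wx, by = c("origin", "time_hour"))
names(kept)

# Dropping it from the weather side first gives a cleaner result.
dropped <- left_join(toy_fl, select(toy_wx, -hour),
                     by = c("origin", "time_hour"))
names(dropped)
```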

Check that this key is a good primary key for the weather table:

```{r}
nrow(filter(count(weather, origin, time_hour), n > 1))
! anyNA(weather$origin) && ! anyNA(weather$time_hour)
```
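
The two checks, no duplicated key values and no missing key values, can be wrapped in a small helper. This is only a sketch: `is_primary_key` is a hypothetical function written for illustration, not part of dplyr, and the toy tables are invented.

```r
library(dplyr)

# Hypothetical helper: TRUE if the given columns uniquely identify
# each row and contain no missing values.
is_primary_key <- function(tbl, ...) {
    dups <- filter(count(tbl, ...), n > 1)
    keys <- select(tbl, ...)
    nrow(dups) == 0 && ! any(is.na(keys))
}

good_wx <- tibble(origin = c("EWR", "EWR", "JFK"),
                  time_hour = c(1, 2, 1), wind_speed = c(5, 6, 7))
dup_wx  <- tibble(origin = c("EWR", "EWR", "JFK"),
                  time_hour = c(1, 1, 1), wind_speed = c(5, 6, 7))
na_wx   <- tibble(origin = c("EWR", NA),
                  time_hour = c(1, 1), wind_speed = c(5, 6))

is_primary_key(good_wx, origin, time_hour)
is_primary_key(dup_wx, origin, time_hour)
is_primary_key(na_wx, origin, time_hour)
```

Only the first table passes: the second has a repeated `origin`/`time_hour` pair and the third has a missing `origin`.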

Compute the average departure delay and number of flights for each
wind speed level:

```{r}
flw <- summarize(group_by(fl, wind_speed),
                 delay = mean(dep_delay, na.rm = TRUE),
                 n = n())
```

A scatter plot of the average delay times for each wind speed:

```{r}
ggplot(flw, aes(x = wind_speed, y = delay)) + geom_point()
```

For wind speeds below 20 MPH the average delay increases nearly
linearly with wind speed. For higher wind speeds the relation is more
diffuse.

Using size to encode the number of departures at each wind speed:

```{r}
ggplot(flw, aes(x = wind_speed, y = delay, size = n)) +
    geom_point() + scale_size_area()
```

The number of departures for wind speeds above 20 MPH is much lower
than at lower wind speeds, so the averages are based on less data and
are thus more variable. For lower wind speeds the relation between
departure delay and wind speed seems quite solid:

```{r}
ggplot(filter(flw, wind_speed < 25),
       aes(x = wind_speed, y = delay, size = n)) +
    geom_point() + scale_size_area()
```
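
The point that averages based on fewer departures are more variable can be made concrete with the standard error of a mean, which is the standard deviation divided by the square root of the number of observations. The sketch below uses made-up numbers: the standard deviation of 40 minutes and the two counts are assumptions for illustration, not values computed from the data, and the formula treats delays as independent, which real flights are not.

```r
# Standard error of a mean for a given standard deviation and sample size.
se_mean <- function(sd, n) sd / sqrt(n)

sd_delay <- 40                                   # assumed, in minutes
n_levels <- c(low_wind = 10000, high_wind = 25)  # hypothetical counts

se <- se_mean(sd_delay, n_levels)
se
```

Under these assumptions the average at the sparsely populated wind speed is about twenty times less precise than the one at the common wind speed.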

<!--
try first, third quarter:
flq <- mutate(flights, quarter = (month - 1) %/% 3 + 1)
flq <- summarize(group_by(flq, dest, quarter),
                 n = n(),
                 pcan = 100 * mean(canceled),
                 delay = mean(pmax(arr_delay, 0), na.rm = TRUE))
flq <- left_join(flq, select(airports, faa, lat, lon, alt), c(dest = "faa"))
flq <- semi_join(flq, top_n(fl3, 50, n), "dest")

mflq <- semi_join(filter(flq, quarter %in% c(1, 3)),
                  top_n(filter(flq, quarter == 1), 50, n),
                  "dest")
pmq <- ggplot(mflq, aes(x = lon, y = lat)) +
    borders("state") + coord_map()
pmq + geom_point(aes(size = pcan, color = delay > 15)) + scale_size_area() +
    facet_wrap(~ quarter)
comparisons?
plotly?
-->
