程序代写 US-36 Crawfordsville 139

Manipulating Data in R Lab

Manipulating Data in R Lab

Copyright By PowCoder代写 加微信 powcoder

library(readr)
library(dplyr)

## Attaching package: ‘dplyr’

## The following objects are masked from ‘package:stats’:
## filter, lag

## The following objects are masked from ‘package:base’:
## intersect, setdiff, setequal, union

library(ggplot2)
library(tidyr)
library(stringr)

Read in the Wide Bike lanes data: data/Bike_Lanes_Wide.csv . Name the data set wide.

wide = read_csv(“data/Bike_Lanes_Wide.csv”)

## Rows: 134 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: “,”
## chr (1): name
## dbl (8): BIKE BOULEVARD, BIKE LANE, CONTRAFLOW, SHARED BUS BIKE, SHARROW, SI…
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Reshape wide using gather. Call this data long. Make the key lanetype, and the value the_length. Make sure we gather all columns but name, using -name. Note the NAs here:

long = wide %>%
gather(key = “lanetype”, value = “the_length”, -name)
head(long)

## # A tibble: 6 × 3
## name lanetype the_length
##
## 1 ALBEMARLE ST BIKE BOULEVARD NA
## 2 ALICEANNA ST BIKE BOULEVARD NA
## 3 ARGONNE DR BIKE BOULEVARD NA
## 4 ART MUSEUM DR BIKE BOULEVARD NA
## 5 AUCHENTOROLY TERR BIKE BOULEVARD NA
## 6 BANK ST BIKE BOULEVARD NA

This exercise is motivated by https://anythingbutrbitrary.blogspot.com/2012/08/manipulating-data-frames-using-sqldf.html

read in the roads and crashes CSVs. Call them crash and road crash: data/crashes.csv road: data/roads.csv

crash = read_csv(“data/crashes.csv”)

## Rows: 110 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: “,”
## chr (1): Road
## dbl (3): Year, N_Crashes, Volume
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

road = read_csv(“data/roads.csv”)

## Rows: 5 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: “,”
## chr (2): Road, District
## dbl (1): Length
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(crash)

## # A tibble: 6 × 4
## Year Road N_Crashes Volume
##
## 1 1991 Interstate 65 25 40000
## 2 1992 Interstate 65 37 41000
## 3 1993 Interstate 65 45 45000
## 4 1994 Interstate 65 46 45600
## 5 1995 Interstate 65 46 49000
## 6 1996 Interstate 65 59 51000

head(road)

## # A tibble: 5 × 3
## Road District Length
##
## 1 Interstate 65 Greenfield 262
## 2 Interstate 70 Vincennes 156
## 3 US-36 Crawfordsville 139
## 4 US-40 Greenfield 150
## 5 US-52 Crawfordsville 172

Replace (using str_replace) any hyphens (-) with a space in crash$Road. Call this data crash2. Table the Road variable.

crash2=crash;
crash2$Road=str_replace(crash2$Road,”-“,” “)

crash2 = crash %>% mutate(Road = str_replace(Road, “-“, ” “))
table(crash2$Road)

## Interstate 275 Interstate 65 Interstate 70 US 36 US 40
## 22 22 22 22 22

How many observations are in each dataset?

dim(crash)

## [1] 110 4

## [1] 5 3

nrow(crash)

## [1] 110

nrow(road)

Separate the Road column (using separate) into (type and number) in crash2. Reassign this to crash2. Table crash2$type

crash2 = separate(crash2, col = “Road”, into = c(“type”, “number”))
table( crash2$type)

## Interstate US
## 66 44

Create a new variable calling it road_hyphen using the unite function. Unite the type and number columns using a hyphen (-) and then table road_hyphen

crash2 = unite(crash2, col = “road_hyphen”, type, number ,sep = “-“)
table( crash2$road_hyphen)

## Interstate-275 Interstate-65 Interstate-70 US-36 US-40
## 22 22 22 22 22

Which and how many years were data collected?

unique(crash$Year)

## [1] 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005
## [16] 2006 2007 2008 2009 2010 2011 2012

length(unique(crash$Year))

Read in the Bike lanes data: data/Bike_Lanes.csv Name the data set bike,

bike = read_csv(“data/Bike_Lanes.csv”)

## Rows: 1631 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: “,”
## chr (6): subType, name, block, type, project, route
## dbl (3): numLanes, length, dateInstalled
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Keep rows where the record is not missing type and not missing name and re-assign the output to bike.

bike = filter(bike, !is.na(type) & !is.na(name))

Summarize and group the data by grouping name and type and take the sum of the length (reassign length variable). Call this data set sub

sub = bike %>%
group_by(name, type) %>%
summarize(length = sum(length))

## `summarise()` has grouped output by ‘name’. You can override using the `.groups`
## argument.

Reshape sub using spread. Spread the data where the key is type and we want the value in the new columns to be length – the bike lane length.

Call this wide.

wide = spread(sub, key = type, value = length)

Look at the column names of wide – what are they? (they also have spaces)

Join data to retain only complete data, (using an inner join) e.g. those observations with road lengths and districts. Merge without using by argument, then merge using by = “Road”. call the output merged. How many observations are there?

merged = inner_join(crash, road)

## Joining, by = “Road”

merged = inner_join(crash, road, by = “Road”)
dim(merged)

## [1] 88 6

Join data using a full_join. Call the output full. How many observations are there?

full = full_join(crash, road)

## Joining, by = “Road”

nrow(full)

## [1] 111

Do a left join of the road and crash. ORDER matters here! How many observations are there?

left = left_join(road, crash)

## Joining, by = “Road”

nrow(left)

Repeat above with a right_join with the same order of the arguments. How many observations are there?

right = right_join(road, crash)

## Joining, by = “Road”

nrow(right)

## [1] 110

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com