HW 5. Parser and Tidying Data
Stat 412/612 Statistical Programming in R
Due: Thursday, June 11
Instructions
• Turn in this assignment as a single PDF, knitted from R Markdown, on Blackboard.
• Only include the necessary code, not any extraneous code, to answer the questions.
• Please also turn in the R Markdown file you used to create the PDF.
• Learning objectives:
– Gain more familiarity with tidyr
– Parsing practice
– Import messy data
– A little more dplyr practice
• Grading: 10 pts.
Exercises
1. (5 pts) Baltimore City Crime Data:
a. (1 pt) Import the data. (See Blackboard.)
b. (2 pts) Convert the given dates (CrimeDate) and times (CrimeTime) to date and time classes. Note that, for CrimeTime, not all of the rows conform to the “HH:MM:SS” format. Remove the rows where the parsing failed. Then, use an appropriate graph to show the distribution of CrimeTime. Save the output data frame for the following questions. (Hint: You can use problems() to identify the rows with parsing errors.)
c. (1 pt) Split Location 1 into two columns, LocationLat and LocationLon. The new columns should be numeric.
d. (1 pt) Determine the percentage of crimes committed between midnight and 4:00 am. (Hint: use logicals and the mean() function.)
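The two hints in parts b and d can be illustrated on a toy vector. This is only a minimal sketch, not a solution: the values in raw_times are made up, the names raw_times, parsed, ok and share_early are hypothetical, and whether 4:00:00 itself counts as "between midnight and 4:00 am" is a choice left to you.

```r
library(readr)

# Toy stand-in for a time column (made-up values, NOT the course data).
# The third entry deliberately breaks the "HH:MM:SS" pattern.
raw_times <- c("00:30:00", "13:45:10", "2400", "03:59:59", "23:05:00")

# parse_time() returns NA (with a warning) where the format does not match
# and records each failure so that problems() can report it.
parsed <- parse_time(raw_times, format = "%H:%M:%S")

# One row per failed entry; the `row` column gives its position in the input.
problems(parsed)

# Keep only the successfully parsed times.
ok <- parsed[!is.na(parsed)]

# A parsed time is stored as seconds since midnight, so a comparison yields
# a logical vector, and the mean of a logical vector is the share of TRUEs.
# Assumption: 4:00:00 itself is excluded (strict "<").
share_early <- mean(as.numeric(ok) < 4 * 3600)
share_early * 100  # as a percentage
```

On a data frame, the same idea applies column-wise: parse the column, drop the rows where the result is NA, then take the mean of the logical condition.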
2. (3 pts) Import the billboard dataset (posted as a .csv on Blackboard) and tidy it up. The values in column wkx are a song’s ranking x weeks after being released.
a. (1 pt) First, tidy the data frame so that there is one observational unit (row) for each week-song combination when there is an entry (i.e. drop the missing values). Save the output data frame for the following questions.
b. (1 pt) Then, convert the week variable to a number, and use the week variable and the date.entered variable to figure out the dates corresponding to each week on the chart. Save the output data frame for the following questions.
c. (1 pt) Sort the data by artist, track, and week. Here is what your first entries should look like (formatting can be different):
## # A tibble: 5,307 x 7
## year artist track time week rating date
##