Programming in R – Week 2 Assignment
IPAL – The University of Chicago
Due: Sunday, July 12, 2020 at 11:59pm on Canvas
This assignment will focus on gathering, analyzing, and plotting real data. The answers here are much more open-ended than those in Problem Set 1, and there may not be an obvious “right” way to do things. Try your best and record any assumptions or major choices you make as comments. Like before, this problem set is broken into three sections, each worth 16 points.
The goal of this assignment is to explore the relationship between temperature and homicides in Chicago. We will use temperature data from the National Oceanic and Atmospheric Administration (NOAA) and crime data from the City of Chicago Data Portal. The temperature data is pre-collected, but you will need to retrieve the crime data yourself via an API.
Start by creating a new project/folder for this assignment. Create a new R script to save your code. For each chunk of code you create, please preface it with a comment describing what your code is doing.
For example, your answers might look like this:
# Loading in saved homicide data
homicides <- read_csv("homicides.csv")

# Finding the number of crimes for each police district
homicides %>%
  group_by(district) %>%
  summarize(count = n())
Section 1: Visualizing Weather Data
The provided weather data (ohare_temps.csv) comes from the NOAA weather station at O’Hare Airport. It includes a timestamp and corresponding temperature (in Fahrenheit) for each hour since January 1st, 2001.
Using functions from the tidyverse and lubridate packages, start by reading the provided CSV and converting the timestamp column to a datetime format. Next, extract the year, month, day, week, and hour from your datetime-formatted column and make them into separate columns, called year, month, day, week, and hour.
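For instance, the read-and-parse step might look like the following sketch, which assumes the CSV’s columns are named timestamp and temp (hypothetical names; check the file and adjust):

library(tidyverse)
library(lubridate)

# Read the hourly temperature data; column names are assumptions
temps <- read_csv("ohare_temps.csv") %>%
  mutate(
    timestamp = ymd_hms(timestamp),  # use the lubridate parser that matches the file's format
    year  = year(timestamp),
    month = month(timestamp),
    day   = day(timestamp),
    week  = week(timestamp),
    hour  = hour(timestamp)
  )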
Calculate the average temperature for each year and save your results to a new dataframe. Your results should look similar to the ones below:
head(temps_avg, n = 4)
## # A tibble: 4 x 2
##    year mean_temp
##   <dbl>     <dbl>
## 1  2001      51.3
## 2  2002      51.2
## 3  2003      49.3
## 4  2004      50.4
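A minimal sketch of that yearly summary, assuming the temps data frame and the temp column from the previous sketch:

# Average temperature by year (the temp column name is an assumption)
temps_avg <- temps %>%
  group_by(year) %>%
  summarize(mean_temp = mean(temp, na.rm = TRUE))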
Answer the following questions using code and comments:
1. What are the top two coldest years in your summarized dataset? Exclude 2020, since it has no summer or fall data.
2. On average, across all months and years, what hour of the day is the hottest?
3. What day has the largest swing in temperature from 3 AM to 3 PM? (Hint: there are many ways to calculate this; I suggest filter() and spread(), or the lag() function. One possible approach is sketched after this list.)
4. What week in the dataset had the largest year-to-year absolute change in average temperature? In other words, comparing the average temperature of weeks across all years, what week had the greatest change from the same week in the previous year?
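One possible approach to question 3, sketched under the assumption that there is exactly one reading per hour per day (if not, average the duplicates first):

# Keep only the 3 AM and 3 PM readings, spread them into two columns,
# and compute the within-day swing
swings <- temps %>%
  filter(hour %in% c(3, 15)) %>%
  mutate(date = as_date(timestamp)) %>%
  select(date, hour, temp) %>%
  spread(key = hour, value = temp) %>%
  rename(temp_3am = `3`, temp_3pm = `15`) %>%
  mutate(swing = abs(temp_3pm - temp_3am)) %>%
  arrange(desc(swing))

head(swings, n = 1)  # the day with the largest swing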
Finally, using your summarized dataset, do your best to replicate the following plot. Note: 2020 is excluded.
[Figure: “Average Temperature by Year in Chicago (2001−2019)” — Year on the x-axis, Temperature °F on the y-axis (roughly 48–54 °F), with an annotation marking “Chicago’s polar vortex year”. Source: NOAA Weather Station, O’Hare Airport.]
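A rough ggplot2 sketch of this plot, assuming the temps_avg data frame from above; the annotation coordinates, theme, and exact labels are placeholders to be tuned by eye:

# Yearly average temperature, excluding 2020
temps_avg %>%
  filter(year < 2020) %>%
  ggplot(aes(x = year, y = mean_temp)) +
  geom_line() +
  geom_point() +
  annotate("text", x = 2014, y = 48.5, label = "Chicago's polar vortex year") +  # position is a guess
  labs(
    title   = "Average Temperature by Year in Chicago (2001-2019)",
    x       = "Year",
    y       = "Temperature °F",
    caption = "Source: NOAA Weather Station, O'Hare Airport"
  )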

Section 2: Downloading and Summarizing Homicide Data
The City of Chicago keeps a fairly comprehensive database of crimes, which can be found here: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2. Within this database there are records of all the homicides committed in Chicago since 2001.
We want to extract only these records. However, our previous method of downloading a CSV and using filter() to keep only the records we want is unlikely to work, because the crimes dataset CSV is multiple gigabytes in size. Instead, we can use the Data Portal’s API to grab only the records we’re interested in. This can be accomplished in two ways:
1. Use the RSocrata library and connect to the crimes API. There is example documentation on the city’s website.
2. Use the raw API and the read_json() function from the jsonlite library to query the API directly and read the JSON data into R. (A hedged sketch of both options follows this list.)
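In the sketch below, the field name primary_type and the $limit value are assumptions based on the Data Portal’s API conventions, so verify them against the dataset’s API documentation:

library(RSocrata)
library(jsonlite)

# Option 1: RSocrata builds the request and pages through results for you
homicides <- read.socrata(
  "https://data.cityofchicago.org/resource/ijzp-q8t2.json?primary_type=HOMICIDE"
)

# Option 2: raw API call. The API returns only 1,000 rows by default,
# so raise $limit above the number of homicide records.
# (jsonlite::fromJSON() works similarly if read_json() balks at the URL.)
homicides <- read_json(
  "https://data.cityofchicago.org/resource/ijzp-q8t2.json?primary_type=HOMICIDE&$limit=50000",
  simplifyVector = TRUE
)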
Your dataset of all homicides should contain 10,000 rows. Once you’ve successfully read the data into R, use lubridate functions similar to those you used on the temperature data to extract the year, month, day, week, and hour columns.
Next, answer the following questions using code and comments:
1. What year had the highest number of homicides in Chicago?
2. What hour, on average, has the most homicides?
3. What community areas had the lowest number of homicides? Use a join and community area data from the Data Portal to determine the names of each community area. (One possible join is sketched after this list.)
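One way to sketch that join, assuming a lookup file community_areas.csv downloaded from the Data Portal with columns area_number and community (hypothetical names), and a community_area field on the crime records:

# Count homicides per community area and attach the area names
community_areas <- read_csv("community_areas.csv")

homicides %>%
  mutate(community_area = as.numeric(community_area)) %>%  # API fields often arrive as character
  count(community_area, name = "homicides") %>%
  left_join(community_areas, by = c("community_area" = "area_number")) %>%
  arrange(homicides)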
Finally, replicate the plot below to the best of your ability:
[Figure: “Homicides Over Time in Chicago” — Week (10–50) on the x-axis, Year (2001–2019) on the y-axis, filled by binned Number of Homicides: (0,10], (10,20], (20,30], (30,40], (40,50]. Source: City of Chicago Data Portal.]
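A rough sketch of this plot, assuming the year and week columns created earlier and reading the legend above as bins of width 10 (adjust the breaks if any week exceeds 50 homicides):

# Weekly homicide counts, binned and drawn as a year-by-week grid
homicides %>%
  count(year, week, name = "homicides") %>%
  mutate(bin = cut(homicides, breaks = seq(0, 50, by = 10))) %>%
  ggplot(aes(x = week, y = factor(year), fill = bin)) +
  geom_tile() +
  labs(
    title   = "Homicides Over Time in Chicago",
    x       = "Week",
    y       = "Year",
    fill    = "Number of Homicides",
    caption = "Source: City of Chicago Data Portal"
  )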

Section 3: Combining Both Datasets
Finally, we want to combine aggregated data from both datasets into a single plot. First, find the mean temperature and mean number of homicides by week across all years. Then, merge your results and replicate the plot below to the best of your ability.
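A sketch of that aggregation and merge, assuming the temps and homicides data frames built earlier (column names are illustrative):

# Mean temperature per week across all years
weekly_temps <- temps %>%
  group_by(week) %>%
  summarize(mean_temp = mean(temp, na.rm = TRUE))

# Mean number of homicides per week across all years
weekly_homicides <- homicides %>%
  count(year, week) %>%
  group_by(week) %>%
  summarize(mean_homicides = mean(n))

weekly <- left_join(weekly_temps, weekly_homicides, by = "week")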
[Figure: “Homicides vs Temperature in Chicago (2001 − 2019)” — Week (0–50) on the x-axis, Average # of Homicides (roughly 6–15) on the left y-axis, and Average Temp (roughly 20°F–70°F) on a secondary y-axis, with a “Type” legend distinguishing Homicides from Temp.]
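One hedged way to draw the two series on a single panel is to rescale the temperature onto the homicide axis and label a secondary axis, as in the sketch below; the scale factor is an assumption to be tuned so the curves sit roughly where they do in the figure:

# Combined weekly plot with a secondary temperature axis, assuming `weekly` from above
scale_factor <- 5  # illustrative: maps roughly 20-70 °F onto 4-14 on the homicide axis

weekly %>%
  ggplot(aes(x = week)) +
  geom_line(aes(y = mean_homicides, color = "Homicides")) +
  geom_line(aes(y = mean_temp / scale_factor, color = "Temp")) +
  scale_y_continuous(
    name     = "Average # of Homicides",
    sec.axis = sec_axis(~ . * scale_factor, name = "Average Temp",
                        labels = function(x) paste0(x, "°F"))
  ) +
  labs(
    title = "Homicides vs Temperature in Chicago (2001 - 2019)",
    x     = "Week",
    color = "Type"
  )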
What is potentially wrong with this plot? Is there a way we could improve it? What might explain the phenomenon that it shows? Answer in a comment.