Problem 1: Get the basic station information into the database. Load the station file “Capital_Bikeshare_Terminal_Locations.csv” into a table in the database. Your solution should check to see if the data table exists and creates the table if it does not exist. Your solution should also output to the screen the total number of records entered into the data table. This problem gives you the skills required to connect to a database, create tables, import data from a CSV and use Python abstract data structures. HINTS: use the executemany command in SQLite to load the data into the database. Recommend you use the lists to temporarily store the data in memory before loading it into the database (as opposed to using a dictionary – which seems to be a bit more difficult to implement correctly for some people). Use the following date-time format when loading the data into the SQLite database: YYYY-MM-DD HH:SS. The dashes are important because that is the only format that SQLite knows and recognizes inherently as a date.
Problem 2 : Get the trip data into the database. Iterate over all the data files that contain trip history (these are the 24 data files contained in the zipped directory) and load them into a single data table in a database. Your solution should check for the existence of the data table that holds the bike data (and create it if it does not exist), and load the contents of the files into the data table. Your solution should also output the number of records transferred from each data set as well as return the total number of records that have been loaded into the data table. This problem gives you the skills required to iterate over multiple data sets, insert data into an existing table, and creating tables if one does not exist. HINTS: use the same hints for Problem 1 and I recommend that you extract the data from the zipped archive into a directory as opposed to trying to access the files from within the archive. Iterate over the 24 files do not type all 24 file names into your routine.
Problem 3: Write a routine (or call an existing module) that takes as its arguments the LAT/LON for any two points and whether distance should be calculated as miles or kilometers. The output of the routine should be the distance between the two points (in the specified unit of measure). This problem exercises your ability to make routines that take in required arguments and passes out a formatted response. Hint: The function I am asking you to implement is the Haversine function. As an alternative, you can also use the Vincenty function. You can find the actual math formulas on line, or you can load and call the two libraries that already have been written. You just need to find them so you can install them.
Problem 4 : Create a routine that returns a dictionary that has as its keys all pairs of bike terminals and the distance between those two locations. Use your solution to problem 3 for this problem. Your solution should include calling a query in your SQLite database that provides you the data you need to calculate the distance between all station pairs. This problem exercises your ability to manipulate data sets within a database as well as use dictionaries to save data. Hint: This means you have to make a join between two tables. You could do a Cartesian join or you could iterate over all bike terminals and then find the distance from terminal to all other terminals.
Problem 5 : Create a routine that takes as its argument a dictionary, a Bikeshare terminal, and a distance and returns a list of all stations that are within the specified distance of the specified docking station. This problem tests your ability to write a routine that takes in arguments, passes out results, and tests your ability to filter off of keys in a dictionary. Hint: You don’t need to query the database for this problem if you did Problem 4 correctly.
Problem 6: Create a routine that takes as its argument any two BikeShare stations and a start and end date and returns the total number of trips made by riders between those two stations over the period of time specified by the start and stop date. This problem tests your ability to write a select statement on a table in a database and return the results from a select query. Hint: You have to call the database for this routine to work correctly.
Problem 7: Write commands (and include them in the module) that actually calls and executes all of the scripts you wrote for Problems 1 through 6. This problem makes it easier for me to grade your assignment. Hint: There is a strong correlation between how easy it is to grade your homework and how good your grade is. This holds true for the rest of the course and in all probability with the majority of your professors.
Format for the data files follows:
Capital_Bikeshare_Terminal_Locations.csv (1 file in total)
- TERMINAL_NUMBER: The unique numeric identifier for the terminal
- ADDRESS: The street address for the terminal
- LATITUDE: The latitude for the terminal
- LONGITUDE: The longitude of the terminal
- DOCKS: The number of docking stations at the terminal
Year_201X_QTR_Y_bikeTrips.csv (24 files in total)
- TRIP_DURATION: The length of time the bike was leased (measured in milliseconds)
- START_DATE: The date and time the bike lease started
- START_STATION: The station ID where the bike trip started
- STOP_DATE: The date and time the bike lease ended
- STOP_STATION: The station ID where the bike lease ended
- BIKE_ID: The unique identifier for the bike that was leased
- USER_TYPE: Indicates whether the user was a registered user (member) or a casual user (not a member)