Data Science: Data
Coursework 1, weight: 50%
Submission deadline: 29th of January
For this coursework we will focus on developing a Python 3 library that wraps around API calls to access live data provided by Transport for London (TfL).
As mentioned in the lectures, creating reusable libraries that deal with all the messy aspects of using APIs and parsing their outputs is considered to be best practice in data science project management and workflow. By implementing the library once and sharing it, the rest of the data science team can focus on utilising the data and building models rather than every member having to write their own code to handle and obtain data.
The documentation for TfL’s API is here: https://api.tfl.gov.uk/swagger/ui/index.html
Note that the documentation for this API is not comprehensive and is example-based. This means that you might need to do some reverse engineering to understand how the API calls work, but this is common in publicly available APIs. You do not need to register for an API key.
Question 1
Part (i) [20 marks]
Create a library called tfl_data.
The library should contain two functions, with the following signatures and specifications:
get_line_severity(line_id):
# Takes the name of a tube line as input and returns its current status.
Example:
Input: 'victoria'
Output: 'good service'
If TfL is not able to understand the given line id (for example due to typos) then the function should return a suitable error message.
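One possible shape for this function is sketched below. It is a sketch under assumptions, not a reference solution: the endpoint pattern `/Line/{line_id}/Status` and the JSON field names `lineStatuses` and `statusSeverityDescription` are assumptions about the TfL response and should be checked against the Swagger documentation, and the helper name `parse_line_severity` and the wording of the error messages are purely illustrative. Parsing is kept separate from the HTTP call so that it can be checked without network access.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint pattern; check it against the Swagger documentation.
STATUS_URL = "https://api.tfl.gov.uk/Line/{line_id}/Status"


def parse_line_severity(payload):
    """Pull the status text out of an already-decoded JSON payload.

    Assumes a list of line objects, each with a 'lineStatuses' list whose
    entries carry a 'statusSeverityDescription' string.
    """
    try:
        return payload[0]["lineStatuses"][0]["statusSeverityDescription"].lower()
    except (IndexError, KeyError, TypeError, AttributeError):
        return "Error: unexpected response format from TfL."


def get_line_severity(line_id):
    """Return the current status of a tube line, or an error message."""
    url = STATUS_URL.format(line_id=urllib.parse.quote(str(line_id)))
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            payload = json.load(response)
    except urllib.error.HTTPError:
        # Assumption: an unknown line id produces an HTTP error status.
        return f"Error: TfL does not recognise the line id '{line_id}'."
    except OSError:
        # Covers connection failures and timeouts.
        return "Error: could not reach the TfL API."
    except ValueError:
        return "Error: TfL returned a response that is not valid JSON."
    return parse_line_severity(payload)
```

Calling `get_line_severity('victoria')` performs a live request, so its result depends on the state of the network and of the line at that moment.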
get_air_quality(is_future):
# Takes a boolean is_future. If it is True, returns a one-word summary of the
# future forecast of air quality as determined by TfL; if it is False, returns
# a one-word summary of the current state of air quality as determined by TfL.
Example:
Input: True
Output: 'Low'
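A minimal sketch of how this function could be structured follows. The endpoint `/AirQuality` and the field names `currentForecast`, `forecastType` and `forecastBand` are assumptions about the TfL response, to be confirmed by inspecting a real response; the helper `parse_air_quality` and the error messages are illustrative only.

```python
import json
import urllib.request

# Assumed endpoint; check it against the Swagger documentation.
AIR_QUALITY_URL = "https://api.tfl.gov.uk/AirQuality"


def parse_air_quality(payload, is_future):
    """Return the one-word band for the requested forecast.

    Assumes the payload has a 'currentForecast' list whose entries carry a
    'forecastType' of 'Current' or 'Future' and a 'forecastBand' summary.
    """
    wanted = "Future" if is_future else "Current"
    try:
        for forecast in payload["currentForecast"]:
            if forecast["forecastType"] == wanted:
                return forecast["forecastBand"]
    except (KeyError, TypeError):
        pass
    return f"Error: no {wanted.lower()} forecast found in the TfL response."


def get_air_quality(is_future):
    """Return a one-word air quality summary, or an error message."""
    try:
        with urllib.request.urlopen(AIR_QUALITY_URL, timeout=10) as response:
            payload = json.load(response)
    except OSError:
        # Covers HTTP errors, connection failures and timeouts.
        return "Error: could not retrieve air quality data from TfL."
    except ValueError:
        return "Error: TfL returned a response that is not valid JSON."
    return parse_air_quality(payload, is_future)
```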

Part (ii) [10 marks]
Assume that you have a contractually restricted API quota imposed by your data provider (in this case TfL), under which no more than 5 API calls can be made within any 5 minute period. Implement your tfl_data library so that it adheres to this restriction in every session in which it is used. That is, if a function call would break the quota, the function returns the error message “Can’t make an API call now due to quota limit, try again in a few minutes.” Note that in this case the function is required to return the error message and exit before sending an API call to TfL.
Hint: to implement this feature you might need to store the timestamp of every call that is made in a session. You may find Python’s time library to be useful.
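The hint can be illustrated with a sliding-window tracker that keeps the timestamps of recent calls. This is only one possible design: the names `QuotaTracker`, `try_acquire` and `call_with_quota` are hypothetical, and the boundary is treated conservatively (a timestamp exactly 5 minutes old still counts against the quota), which is an interpretation of "within any 5 minute period" rather than something the brief states.

```python
import time
from collections import deque

QUOTA_MESSAGE = ("Can’t make an API call now due to quota limit, "
                 "try again in a few minutes.")


class QuotaTracker:
    """Tracks call timestamps for one session (hypothetical helper)."""

    def __init__(self, max_calls=5, period_seconds=300, clock=time.monotonic):
        self.max_calls = max_calls
        self.period_seconds = period_seconds
        self.clock = clock  # injectable so tests can control time
        self._timestamps = deque()

    def try_acquire(self):
        """Record a call and return True, or return False if over quota."""
        now = self.clock()
        # Forget calls that are strictly older than the window.
        while self._timestamps and now - self._timestamps[0] > self.period_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_calls:
            return False
        self._timestamps.append(now)
        return True


# One tracker per Python session, shared by every function in the library.
_session_tracker = QuotaTracker()


def call_with_quota(fetch, tracker=_session_tracker):
    """Run fetch() only if the quota allows it; otherwise return the message.

    The check happens before fetch() is invoked, so no request is sent
    when the quota is exhausted.
    """
    if not tracker.try_acquire():
        return QUOTA_MESSAGE
    return fetch()
```

Using `time.monotonic` rather than `time.time` is a design choice: it is unaffected by changes to the system clock during a session.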
Part (iii) [10 marks]
Implement a test library for tfl_data. The unit tests should only cover the quota restriction requirement implemented in Part (ii).
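As an illustration of how such tests might look, the sketch below uses Python's `unittest` with a fake clock, so that no test has to wait 5 minutes or contact TfL. The `QuotaTracker` class here is a hypothetical stand-in defined inline so that the example runs on its own; real tests would import the quota logic from tfl_data instead, and its interface may well differ.

```python
import unittest
from collections import deque

QUOTA_MESSAGE = ("Can’t make an API call now due to quota limit, "
                 "try again in a few minutes.")


class QuotaTracker:
    """Hypothetical stand-in for the quota logic inside tfl_data."""

    def __init__(self, clock, max_calls=5, period_seconds=300):
        self.clock = clock
        self.max_calls = max_calls
        self.period_seconds = period_seconds
        self.timestamps = deque()

    def guarded(self, fetch):
        """Run fetch() if the quota allows it, else return the message."""
        now = self.clock()
        while self.timestamps and now - self.timestamps[0] > self.period_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_calls:
            return QUOTA_MESSAGE
        self.timestamps.append(now)
        return fetch()


class FakeClock:
    """Controllable clock so the tests never have to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class QuotaTests(unittest.TestCase):
    def setUp(self):
        self.clock = FakeClock()
        self.tracker = QuotaTracker(clock=self.clock)
        self.api_calls = 0

    def fake_fetch(self):
        # Counts how many times the "API" was actually reached.
        self.api_calls += 1
        return "good service"

    def test_first_five_calls_are_allowed(self):
        results = [self.tracker.guarded(self.fake_fetch) for _ in range(5)]
        self.assertEqual(results, ["good service"] * 5)

    def test_sixth_call_is_refused_before_reaching_the_api(self):
        for _ in range(5):
            self.tracker.guarded(self.fake_fetch)
        self.assertEqual(self.tracker.guarded(self.fake_fetch), QUOTA_MESSAGE)
        self.assertEqual(self.api_calls, 5)

    def test_quota_recovers_once_the_window_has_passed(self):
        for _ in range(5):
            self.tracker.guarded(self.fake_fetch)
        self.clock.advance(301)
        self.assertEqual(self.tracker.guarded(self.fake_fetch), "good service")

    def test_window_slides_rather_than_resets(self):
        # Calls at t = 0, 60, 120, 180 and 240 seconds.
        for _ in range(5):
            self.assertEqual(self.tracker.guarded(self.fake_fetch), "good service")
            self.clock.advance(60)
        # t = 300: the t = 0 call is exactly 5 minutes old and still counts.
        self.assertEqual(self.tracker.guarded(self.fake_fetch), QUOTA_MESSAGE)
        self.clock.advance(1)
        # t = 301: only the t = 0 call has expired, so one slot is free.
        self.assertEqual(self.tracker.guarded(self.fake_fetch), "good service")
        self.assertEqual(self.tracker.guarded(self.fake_fetch), QUOTA_MESSAGE)
```

Saved as a module, the tests can be run with `python -m unittest`. Counting calls in `fake_fetch` is what verifies that the refused call exits before any request would be sent.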
Question 2
[10 marks]
Assume you work for a large telecom corporation that has recently set up a data science team. Like all large corporations, it has plenty of legacy databases spread around the company (marketing database, operational database concerning mobile towers, HR databases, retail outlets selling SIM cards, …).
Now imagine that your manager has asked you to provide her with a brief report outlining the pros and cons of rolling out an internal central API service, which sits on top of the company’s existing legacy databases and can be used by the company’s new data science team.
The business case should not contain more than 400 words, and it is considered best practice in such executive reports to lay out the pros and cons using bullet points.