QBUS3600 Assignment 1 (A2B) Due date: Thursday 9 April 2020 Value: 30%
Notes to Students
1. The assignment MUST be submitted electronically to Turnitin through the QBUS3600 Canvas site. Please do NOT submit a zipped file.
2. The assignment is due at 16:00 on Thursday, 9 April 2020. The late penalty for the assignment is 5% of the assigned mark per day, starting after 16:00 on the due date. The closing date, Thursday, 16 April 2020, 16:00, is the last date on which an assessment will be accepted for marking.
3. Your answers shall be provided as a word-processed report (Microsoft Word, LaTeX or equivalent) giving full explanation and interpretation of any results you obtain. Output without explanation will receive zero marks.
4. Be warned that plagiarism between individuals is always obvious to the markers of the assignment and can be easily detected by Turnitin.
5. The datasets for this assignment can be downloaded from Canvas. The dataset is highly confidential, and you are responsible for keeping it secure and using it only for your QBUS3600 coursework.
6. Presentation of the assignment is part of the assignment. Marks are assigned for clarity of writing and presentation.
7. You should submit your Python code or Jupyter Notebook to the separate submission portal available on Canvas.
8. You may insert small sections of your code into the report for better interpretation when necessary. However, you must consider the audience of your report.
9. Think about the best and most structured way to present your work, summarise the procedures implemented, support your results/findings and prove the originality of your work.
10. Numbers with decimals should be reported to three decimal places.
2020S1 QBUS3600 Assignment 1 (A2B) 1
Background
Taxi prices are heavily regulated across Australia. In NSW, these regulations are detailed in the “Point to Point Transport (Fares) Order 2018” legislation. Fares are the aggregate of five charges: base fare, booking fee, government levy, charge per minute or charge per km (based on a cross-over speed). Customers may also incur extras such as tolls and time-based tariffs. Due to the multi-faceted nature of taxi charges, there are many approaches which can be taken to estimate fares, such as high-level estimation using historical data or more granular methods, such as fare table estimation, including different hourly predictions.
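To make the fare structure above concrete, the following is a minimal Python sketch of one way a metered fare could be assembled. The names `Tariff` and `estimate_fare` and every rate are hypothetical placeholders, not figures from the Fares Order. The sketch also assumes that a segment is charged per minute when its speed is below the cross-over speed and per km otherwise, which is one reading of "based on a cross-over speed".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tariff:
    # All values are hypothetical placeholders, NOT the regulated NSW rates.
    base_fare: float = 4.0
    booking_fee: float = 2.0
    government_levy: float = 1.0
    per_km: float = 2.0
    per_minute: float = 1.0

    @property
    def crossover_kmh(self) -> float:
        # Speed at which one minute of travel costs the same under either rate.
        return 60.0 * self.per_minute / self.per_km


def estimate_fare(segments, tariff=Tariff(), extras=0.0):
    """Estimate a fare from a list of (distance_km, duration_min) segments.

    Assumption: a segment slower than the cross-over speed is charged per
    minute; any other segment is charged per km.
    """
    metered = 0.0
    for distance_km, duration_min in segments:
        if duration_min > 0:
            speed_kmh = distance_km / (duration_min / 60.0)
        else:
            speed_kmh = float("inf")
        if speed_kmh < tariff.crossover_kmh:
            metered += tariff.per_minute * duration_min
        else:
            metered += tariff.per_km * distance_km
    fixed = tariff.base_fare + tariff.booking_fee + tariff.government_levy
    return round(fixed + metered + extras, 3)


# A made-up trip: 5 minutes in slow traffic, then 6 km of free-flowing road.
trip = [(0.5, 5.0), (6.0, 6.0)]
print(estimate_fare(trip, extras=3.0))
```

Splitting a trip into segments is a simplification; the dataset itself only records totals per trip, so a comparison against real fares would have to work from total distance and total time.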
Scenario Story: It is March 2020, and you have just recently started your job as a business analyst at A2B Australia Limited (A2B), the parent company to several of Australia’s most iconic taxi brands, notably 13Cabs and Silver Service. Your job has been smooth so far and you spend your days sipping jasmine tea and querying SQL databases. One morning as you make your way into the office, breakfast in hand, you spot Ali exiting the Chief Operating Officer’s (COO) office, deep in thought. As you attempt to sneak past to your desk, he notices you and walks over.
Ali tells you that Fred, the COO, is interested in taxi fare prediction because frequent taxi users generally favour adding upfront payment options for trips, which would give certainty to both drivers and passengers. Ali is certain that the in-house analytics team can develop an accurate solution. As most of the department is on holiday, Ali entrusts this important task to you and suggests you start working alone on understanding the data and the relevant business context.
Initially, Ali is interested in patterns in taxi fares in Sydney.
Data
The taxi trip data will be provided. It includes trips in Sydney between August 2019 and October 2019.
BookingID: Unique identifier for each trip.
M_On_Time: Records when the taxi meter was switched on for the trip. Data is stored in YYYY-MM-DD HH:MM:SS format.
Lat_M_On: Latitude of start point.
Lat_M_Off [sic]: Longitude of start point.
M_Off_Time: Records when the taxi meter was switched off for the trip. Data is stored in YYYY-MM-DD HH:MM:SS format.
Lat_M_Off: Latitude of end point.
Lat_M_Off [sic]: Longitude of end point.
City: Designates the city where the transaction took place.
Dest_Suburb: Records the destination suburb of the passenger.
DelJobDistance: Records the total distance of the trip. Data is stored in metres.
DelJobTime: Records the total time of the trip. Data is stored in seconds.
ChargesPrice: Records the cost of the trip. Data is stored in dollars.
ChargesExtras: Records any surcharges for the trip. Data is stored in dollars.
ChargesFlagfall: Records the fixed start fee for the taxi. Data is stored in dollars.
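As an illustration of how the stored formats and units above might be handled, here is a minimal Python sketch that parses the meter timestamps and converts metres and seconds into kilometres and minutes. The helper `derive_trip_features`, the derived field names and the example row are all hypothetical; only the column names and formats come from the data dictionary. It assumes each row arrives as a dictionary of strings, as `csv.DictReader` would produce.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the YYYY-MM-DD HH:MM:SS format
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def derive_trip_features(row):
    """Turn one raw trip record (dict of strings) into analysis-ready values."""
    meter_on = datetime.strptime(row["M_On_Time"], TIME_FORMAT)
    meter_off = datetime.strptime(row["M_Off_Time"], TIME_FORMAT)
    distance_km = float(row["DelJobDistance"]) / 1000.0  # metres -> km
    duration_min = float(row["DelJobTime"]) / 60.0       # seconds -> minutes
    price = float(row["ChargesPrice"])                   # dollars
    return {
        "pickup_hour": meter_on.hour,
        "pickup_weekday": WEEKDAYS[meter_on.weekday()],
        # Interval between meter on and off, to cross-check DelJobTime.
        "meter_minutes": (meter_off - meter_on).total_seconds() / 60.0,
        "distance_km": distance_km,
        "duration_min": duration_min,
        "avg_speed_kmh": distance_km / (duration_min / 60.0) if duration_min > 0 else None,
        "price_per_km": price / distance_km if distance_km > 0 else None,
    }


# Made-up example row, not taken from the A2B dataset.
example_row = {
    "M_On_Time": "2019-08-05 08:15:00",
    "M_Off_Time": "2019-08-05 08:35:00",
    "DelJobDistance": "9000",
    "DelJobTime": "1200",
    "ChargesPrice": "30.00",
}
features = derive_trip_features(example_row)
print(features)
```

Returning `None` for zero-distance or zero-duration trips keeps such records visible, so they can be inspected as possible data-quality issues instead of silently raising a division error.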
Task 1 – Preliminary Investigation (80 marks)
Working in a team of analysts at A2B you have been placed in charge of leading the preliminary investigation. This investigation, in the form of a technical report, is intended to serve as a reference for other business analysts working in your team.
You must address the following key questions:
1. Is it possible to establish any relationship among variables or features recorded in the dataset? If so, provide detail about the nature of the relationships.
For example (these are just examples; you are encouraged to think of other relationships):
a. Is the charged trip cost related to the distances, locations, pick-up times, durations etc.?
b. What kinds of trips are more common?
c. How is the trip time related to trip distance? The fare regulation provides a formula, but in the real world, traffic, routes, and driving conditions impact fares. What is the relationship between trip distance and fare under different conditions?
d. How does taxi transport change throughout the week (e.g. Monday compared to Friday)?
e. And more (anything you think is worth exploring).
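Questions such as (a), (c) and (d) can be approached with simple summary statistics before any modelling. The sketch below computes a Pearson correlation between distance and fare and compares the mean fare by weekday. The helpers `pearson` and `mean_by_group`, the field names and the four trips are made up for illustration; a real analysis would run on the full dataset and would likely use a data-frame library.

```python
import math
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_by_group(records, key, value):
    """Mean of `value` for each distinct `key`, rounded to three decimals."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {k: round(mean(v), 3) for k, v in groups.items()}


# Made-up trips, not taken from the A2B dataset.
trips = [
    {"weekday": "Monday", "distance_km": 2.0, "price": 10.0},
    {"weekday": "Monday", "distance_km": 4.0, "price": 17.0},
    {"weekday": "Friday", "distance_km": 6.0, "price": 21.0},
    {"weekday": "Friday", "distance_km": 8.0, "price": 28.0},
]

r = pearson([t["distance_km"] for t in trips], [t["price"] for t in trips])
fare_by_weekday = mean_by_group(trips, "weekday", "price")
print(round(r, 3), fare_by_weekday)
```

A high correlation only indicates a linear association; repeating the calculation within subsets (for example peak versus off-peak hours) is one way to examine the relationship "under different conditions".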
2. How would you segment the trips in the given dataset? What are some segments, and their characteristics? Some examples of ways you could segment trips:
a. By how much the trip costs
b. By how long the trip distance is
c. By locations where passengers are picked up
d. By locations where passengers are dropped off
e. By the extras or surcharges passengers paid
f. Can you make some assumptions about the above to translate clusters to passenger or trip type (work, leisure, etc.)?
g. How does Sydney move in taxis? How do pickup and drop-off areas change throughout the day, or the week?
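One simple way to start on segmentation by cost or distance is quantile binning, sketched below in Python. This is a stand-in for illustration only: a clustering method such as k-means would be an equally valid choice, and the helper names, the labels "low", "mid" and "high", and the six trips are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean, quantiles


def segment_by_quantile(trips, field, labels=("low", "mid", "high")):
    """Split trips into len(labels) equally sized bins of `field`."""
    cuts = quantiles([t[field] for t in trips], n=len(labels))
    segments = defaultdict(list)
    for trip in trips:
        # bisect_left places a value equal to a cut point in the lower bin.
        segments[labels[bisect_left(cuts, trip[field])]].append(trip)
    return dict(segments)


def describe_segments(segments):
    """Per-segment trip count, mean price and mean distance."""
    return {
        label: {
            "trips": len(members),
            "mean_price": round(mean(t["price"] for t in members), 3),
            "mean_distance_km": round(mean(t["distance_km"] for t in members), 3),
        }
        for label, members in segments.items()
    }


# Made-up trips, not taken from the A2B dataset.
trips = [
    {"price": 8.0, "distance_km": 1.5},
    {"price": 12.0, "distance_km": 3.0},
    {"price": 15.0, "distance_km": 4.0},
    {"price": 22.0, "distance_km": 7.0},
    {"price": 30.0, "distance_km": 11.0},
    {"price": 55.0, "distance_km": 24.0},
]

segments = segment_by_quantile(trips, "price")
summary = describe_segments(segments)
print(summary)
```

Quantile bins are easy to explain to a non-technical audience, but they always produce equally sized groups; whether natural groupings exist in the data is a separate question that clustering or visual inspection would need to answer.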
To address Questions 1 and 2, you are expected to perform a preliminary Exploratory Data Analysis (EDA) over the dataset. You are expected to find or reveal all possible properties, characteristics, patterns and statistics hidden in the dataset with a view to the problem at hand.
You are limited to a maximum of 15 pages. Your report should describe, explain, and justify your findings. Ensure that your report is concise and objective.
List key resources as references at the end of your report, such as journal articles, conference papers, reports, news items and software. Use APA style for your references.
Task 2 – Executive Briefing (20 marks)
You have been asked to summarise your findings so that they can be shared with the wider business and in particular, management. This one-page briefing should concisely describe your findings to a non-technical audience and primarily address the business problem. In the briefing you should also outline your suggestions for acting on your findings.
You are limited to a maximum of 1 page.
Marking and Key Rules
Your reports will be marked against the following criteria and rules:
• Demonstrate a clear understanding of the problem
• Demonstrate consideration for the audience
• Clear outline and demonstration of investigation process
• The analysis overall is sound and logical
• Clearly draw conclusions based on analysis
• Statements are clear, concise and accurate, with correct spelling, grammar and punctuation
• Use of visual presentation is appropriate
• The report is well structured, and sentences are well connected
• Closely and consistently follow a referencing style specified in the Business School Referencing Guide (e.g. APA)
• Clear, concise and commented Python code, if any