Business Analytics Coding Semester 203- Assignment 1
Information
• Due Date – Sunday 25/10/2020 11:59pm
• Percentage of Grade – 20%
• This assignment has a total of 100 marks
• Data required for this assignment is on iLearn in the raw_orders.rds, orders.csv,
customers.csv and items_in_orders.csv files
• You are required to submit your solutions to this assignment via iLearn. Submit a zip file of
your R project folder
• Save your zip file as DTSC-100_A1_Firstname_Lastname_SID.zip
• An example filename would be DTSC-100_A1_Robin_Banks_12345678.zip
Task Description
A recently opened pizza restaurant called Pizza Place collects data from the online orders placed by customers and saves it in a single RDS file called raw_orders.rds. RDS files contain R objects and can be loaded into R using the readRDS function. The file you have been provided with has data for the month of January 2019 and contains the following fields.
Field
Description
Order_ID
Unique identifier for the order
Date
Date the order was placed
Coupon
Optional coupon code
Type
Type of the order, with values of Pick Up or Delivery
Customer
First and last name of the customer who placed the order (e.g. John Smith). If there are more than two words, the first is the first name and the remaining words are the last name
Customer_ID
Unique identifier for the customer placing the order
Customer_DOB
Date of birth of the customer
Products
List containing vectors of product names (e.g. Supreme Pizza)
Quantities
List containing vectors of quantities, corresponding to each of the Products
Prices
List containing vectors of unit prices, corresponding to each of the Products
The owner of Pizza Place, John Smith, knows that the current ordering system is not scalable as the business grows and collects more data. He would like to have a program that cleans the data in the CSV file and moves it to a database with multiple tables – one for customers, items in the order and overall order information. He also would like to build a web application that can help him analyse the data that his business collects.
John has asked you to help him with this project by writing an R program to clean the data in the orders CSV file and split it into three data frames so that it is ready to be put into a database. Specifically, he would like you to:
• Extract the unique customer names, IDs and birth dates and place them into a data frame containing customers’ information. Break each customer’s name into first and last name columns. Additionally, use the birth date column to create an age column that reports the customer’s age in years (whole number).
• Extract the product names, quantities and unit prices and place them into a data frame containing the items in each order (i.e. one row for every item in an order). The order that a row in this data frame is associated with should be identifiable by the order ID.
• Create an orders data frame containing overall information about each order. Specifically, the data frame should contain the order ID, customer ID of the customer purchasing the order, order type, coupon code and the date the order was placed. Additionally, the orders data frame should contain fields calculating the delivery fee (if any), sub-total before discount, discount (if any), sub-total after discount, GST and final total of each order. Dollar figures should be rounded to two decimal places.
• Save all three data frames as separate CSV files.
To show you what you are trying to achieve, John has manually prepared sample CSV files for the customers, orders and items in orders data frames using the first 50 rows in the raw_orders.rds file. It is a good idea to check that your final data frames match these samples. Additionally, if you are struggling to create the final data frames and want to work on other parts of the project, you can use these sample CSV files to complete them.
In order to help you with your tasks, John has provided you with the following business rules.
• An order can be placed for either Pick Up or Delivery. Pick up is a free service while delivery will incur an additional $12 delivery charge
• For orders containing more than 3 pizzas, a 10% discount is automatically applied. This should be reported automatically by the program as LARGE10 in the Coupon field (i.e. the user won’t enter LARGE10, the program needs to do that)
• The system should also support the coupon codes of: OFF5 for 5% off the total price, and OFF10 for 10% off the total price
• If a customer orders more than 3 pizzas and uses a coupon code, only the greatest discount should be applied and put in the Coupon field
• There is a Goods & Services Tax (GST) of 10% that has to be applied to the sub-total after any discount has been factored in
While John is hiring someone else to design the front end (user interface) of his web application, John would like you to build some R functions that will be used to pass information to the application. The functions that he would like you to create are detailed below.
• Customer Reporting – this function will be used to generate a report about a particular customer. It will take in a customer ID, start date and end date (both dates will be in YYYY- MM-DD format) and then print out information about the customer’s orders between (inclusive) the start and end dates. Specifically, it will print out the customer’s full name, age, number of orders, mean/median order totals, percentage breakdown of the order types and percentage breakdown of products. You can assume that the customers, items in orders and
orders data frames are loaded globally (i.e. you can refer to them in your functions without
passing them in as parameters).
• Plotting Data – John wants to create a graph showing the breakdown between order types,
coupons and the products purchased. He has already written some R code that will create the graph (show below). However, he needs to get the data into the format required by the plotting code. John wants you to create a function that takes in the items in orders and orders data frames and returns the data required by the plotting code. The plotting code, required data format (i.e. what your function needs to return) and plot output are shown below to help you test your function. Note that NA values for the Coupon field are to be replaced by values of “NONE” to indicate that no coupon has been used for that order.
Plotting Code
plotting_data <- expand_items_in_orders(items_in_orders, orders)
ggplot(plotting_data, aes(x = Product, fill = Coupon)) + geom_bar() +
coord_flip() +
facet_wrap(vars(Order_Type)) +
xlab("Product") +
ylab("Count") +
ggtitle("Product by Coupon and Order Type")
Data Format (on all data – first 6 rows)
Plot Output (on all data)
Finally, John has some specific questions that he would like you to answer about the current dataset.
1. What’s the breakdown of discounts applied to orders (e.g. how many orders have no discounts, OFF5, OFF10 and LARGE10)?
2. What’s the minimum, maximum, mean and median of the order totals?
3. What’s the total revenue of the restaurant (i.e. sum of the order totals)?
4. Who are the top 10 most valuable customers (i.e. customers with highest lifetime order
total)?
5. What is the most valuable product (i.e. product with highest total lifetime order total)?
6. Print the total revenue of the restaurant for each day in January 2019. Identify the best and
worst days, as well as the difference in revenue between them.
In terms of structuring your solution, John has given you the following guidelines.
• Create an R project to contain your solution
• Create a sub-folder to contain the data files
• Put the data cleaning and splitting code in an R file
• Put the customer reporting function and plotting data function code in a separate R file
• Put the answers to John’s questions in an RMD file. Compile the RMD file into a HTML
document
• You will likely find the tidyverse package useful when creating your solution
Marking Criteria
• Good programming style (indentation, comments, meaningful variable names, adhering to John’s structure guidelines) - 15 marks
• Cleaning and splitting data into data frames
o Customersdataframe–10marks
o Itemsinordersdataframe–10marks o Ordersdataframe–20marks
• Functions
o CustomerReporting–10marks o PlottingData–10marks
• Questions about data o Q1–5marks o Q2–3marks o Q3–2marks o Q4–5marks o Q5–5marks o Q6–5marks