Business Analytics Coding
Semester 213 – Assignment 1
Information
• Due date: 22nd of October 2021, 11:59pm AEST
• Percentage of Grade: 20%
• Total marks: 100
• Data Required: Please use the raw_orders.rds file on iLearn. You may also find the
orders.csv, customers.csv, and items_in_orders.csv files useful.
• Submission: You are required to submit your solutions for this assignment via iLearn. Submit
a zip file of your R project folder. Do not submit a gz or similar compression file other than a
zip file.
• Save your zip file as DTSC-100_StudentID_A1.zip.
For example, DTSC-100_12345678_A1.zip is the submission file for a student with a student
ID of 12345678.
• Ensure you adhere to the Academic Integrity Guidelines for Coding.
Task Description
A nearby restaurant collects data from the orders placed by customers and saves it in a single RDS file
called raw_orders.rds. RDS files contain R objects and can be loaded into R using the readRDS
function. The file you have been provided with has data for the months of January 2021 to March
2021 and contains the following fields.
Field Description
Order_ID Unique identifier for the order
Date Date the order was placed
Discount_Code Optional discount code
Type Type of the order, with values of “Dine In” or “Delivery”
Customer First and last name of the customer who placed the order (e.g. John Smith).
If there are more than two words, the first is the first name and the
remaining words are the last name
Customer_ID Unique identifier for the customer placing the order
Customer_DOB Date of birth of the customer
Products List containing vectors of product names (e.g. Pad Thai)
Quantities List containing vectors of quantities corresponding to each of the Products
Prices List containing vectors of unit prices corresponding to each of the Products
James thinks that the current ordering system is not scalable as the business grows and collects more
data. He would like to have a program that cleans the data in the RDS file and moves it to a database
with multiple tables – one for customers, items in the order and overall order information.
James has asked you to help with this project by writing an R program to clean the data in the
raw_orders RDS file and split it into three data frames so that it is ready to be put into a database.
Specifically, he would like you to:
• Extract the unique customer names, IDs and birth dates and place them into a data frame
containing customers’ information. Break each customer’s name into first and last name
columns. Additionally, use the birth date column to create an age column that reports the
customer’s age in years (whole number).
• Extract the product names, quantities and unit prices and place them into a data frame
containing the items in each order (i.e. one row for every item in an order). Each row in the
data frame should be associated with an order, identifiable by the order ID column.
• Create an orders data frame containing summary information about each order. Specifically,
the data frame should contain the order ID, customer ID of the customer purchasing the order,
order type, discount code and the date the order was placed. Additionally, the orders data
frame should contain fields calculating the delivery fee (if any), sub-total before discount,
value of the discount (if any), sub-total after discount, GST and final total of each order.
Dollar figures should be rounded to two decimal places.
• Save all three data frames as separate CSV files.
To show you what you are trying to achieve, James has manually prepared sample CSV files for the
“customers”, “orders” and “items in orders” data frames using the first 50 rows in the raw_orders.rds
file. It is a good idea to check that your final data frames match these samples. Additionally, if you are
struggling to create the final data frames and want to work on other parts of the project, you can use
these sample CSV files to complete them.
In order to help you with your tasks, James has provided you with the following business rules.
• An order can be placed for either Dine In or Delivery. Dine in is a free service while delivery
will incur an additional $3.99 delivery charge
• For orders containing more than 2 items, a 10% discount is automatically applied. This should
be reported automatically by the program as LARGE10 in the Discount Code field (i.e. the
user won’t enter LARGE10, the program needs to do that)
• The system should also support the discount codes of: OFF5 for 5% off the total price, and
OFF15 for 15% off the total price
• If a customer orders more than 2 items and uses a discount code, only the greatest discount
should be applied and put in the Discount Code field
• There is a Goods & Services Tax (GST) of 10% that has to be applied to the sub-total after
any discount has been factored in
While James is hiring someone else to design the front end (user interface) of a related web
application, James would like you to build some R functions that will be used to pass information to
the application. The functions that he would like you to create are detailed below.
• Customer Reporting – this function will be used to generate a report about a particular
customer. It will take in a customer ID, start date and end date (both dates will be in YYYY-
MM-DD format) and then print out information about the customer’s orders between
(inclusive) the start and end dates. Specifically, it will print out the customer’s full name, age,
number of orders, mean/median order totals, percentage breakdown of the order types and
percentage breakdown of products. You can assume that the “customers”, “items in orders”
and “orders” data frames are loaded globally (i.e. you can refer to them in your functions
without passing them in as parameters). If the function finds no orders for a given customer
ID and date range, it should tell the user whether or not the customer exists in the data.
• Plotting Data – James wants to create a graph showing the breakdown between order types,
discounts and the products purchased. He has already written some R code that will create the
graph (shown below). However, he needs to get the data into the format required by the
plotting code. James wants you to create a function that takes in the “items in orders” and
“orders” data frames and returns the data required by the plotting code. The plotting code,
required data format (i.e. what your function needs to return) and plot output are shown below
to help you test your function. Note that NA values for the Discount_Code field are to be
replaced by values of “NONE” to indicate that no discount has been used for that order.
Plotting Code
plotting_data <- expand_items_in_orders(items_in_orders, orders) ggplot(plotting_data, aes(x = Product, fill = Discount_Code)) + geom_bar() + coord_flip() + facet_wrap(~Order_Type) + xlab("Product") + ylab("Count") + ggtitle("Product by Discount and Order Type") Data Format (on all data – first 6 rows) Plot Output (on all data) Finally, James has some specific questions that he would like you to answer about the current dataset. 1. What’s the breakdown of discounts applied to orders (e.g. how many orders have no discounts, OFF5, OFF15 and LARGE10)? Report the breakdown in terms of both frequency (count of each value) and relative frequency (proportion of each value). 2. What’s the minimum, maximum, mean and median of the order totals? 3. What’s the total revenue of the restaurant? What is the total revenue for orders with a Discount Code of “LARGE10”? 4. Who are the top 10 most valuable customers (i.e. customers with highest lifetime order total)? 5. What is the highest revenue product (i.e. product with highest total lifetime order total)? 6. Print the total revenue of the restaurant for each day in February 2021. Identify the best and worst days, as well as the difference in revenue between them. In terms of structuring your solution, James has given you the following guidelines. • Create an R project to contain your solution • Create a sub-folder to contain the data files • Put the data cleaning and splitting code in an R file • Put the customer reporting function and plotting data function code in a separate R file • Put the answers to James’ questions in an RMD file. Compile the RMD file into a HTML document. Marking Criteria • Good programming style (indentation, comments, meaningful variable names, adhering to James’s structure guidelines) - 15 marks • Cleaning and splitting data into data frames o Customers data frame – 10 marks o Items in orders data frame – 10 marks o Orders data frame – 20 marks • Functions o Customer Reporting – 10 marks o Plotting Data – 10 marks • Questions about data o Q1 – 5 marks o Q2 – 2 marks o Q3 – 3 marks o Q4 – 5 marks o Q5 – 5 marks o Q6 – 5 marks