Prompt
Practice Assignment
2020-05-11
You are working for a local retail firm. The firm has many clients and is interested in some basic information about them. It has supplied you with the spreadsheet Masterdata.csv and asked you to do some basic analysis on the dataset. You are asked to do the following:
Basic
1) How many missing values are there in each column? How many missing values are there in total? Put your answers in a table.
2) How many unique states does the company operate in? Are there any typos in this column? If so, correct the typos. Save the corrections as data.cleaned. Unless otherwise told, work with data.cleaned.
3) Find the average invoice and the percent of invoices paid.
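As a hint for part 1, here is a minimal sketch of counting missing values per column and in total. It runs on a made-up data frame named toy, not on Masterdata.csv; the column names (Names, State, Invoice) and the object names na_per_column and na_table are assumptions for illustration only.

```r
# Toy data frame standing in for the real dataset; column names are assumptions.
toy <- data.frame(
  Names   = c("Ann Lee", NA, "Bo Chen", "Cy Diaz"),
  State   = c("Arizona", "Alaska", NA, "Arizona"),
  Invoice = c(120.5, NA, 80, NA),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts the TRUE values in each column.
na_per_column <- colSums(is.na(toy))

# Build a table with one row per column and a final row for the grand total.
na_table <- data.frame(
  column  = c(names(na_per_column), "Total"),
  missing = c(unname(na_per_column), sum(na_per_column))
)
print(na_table)
```

The same two calls should carry over to the real data once it is loaded, but check the result against the actual columns rather than relying on this toy example.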
Advanced
4) Create a new dataframe where the “Names” column is split into 2 columns: First Name and Last Name. Nothing else needs to be changed. Save it as data.cleaned.split. For part 5, use data.cleaned.split.
5) Write a function named “client_status” that takes a last name and a first name as input and outputs: i) the state the client lives in, ii) the number of invoices that still need to be paid, and iii) the outstanding debt. Here, outstanding debt means the total value of the invoices not yet paid. Have your code return an error if someone enters a name that is not in data.cleaned.split. Provide a line of code that runs your function for Jordyn Kang.
6) Create a new dataframe named p6 where you:
i) drop all rows that have at least one NA value
ii) group by state
iii) return the number of invoices per state as well as the average number of invoices paid
7) I wrote a function that compares the number of invoices between two states using p6. It is supposed to return which state has more invoices. It’s not working and I don’t know why. In three sentences or fewer, explain why my code won’t work. Then fix the code so that it returns which state has more invoices. [1]

comp <- function(state1,state2){
  if(p6$state1 > p6$state2){
    return(state1)
  }else{
    return(state2)
  }
}

#Running the code
comp("Arizona","Alaska")

[1] If you have extra time, is there a way to run the function without putting quotes around “Arizona” and “Alaska”? What about extending it to more than 2 states?
8) Create a barplot that compares the average invoice value across three groups: paid invoices, unpaid invoices, and invoices with missing payment information. Label the graph accordingly.
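As a hint for part 8, here is a minimal sketch of one way to treat missing payment information as its own group before plotting. It uses a made-up data frame; the column names Invoice and Paid, the "Yes"/"No" coding, and the object names status and avg_by_status are all assumptions and may not match Masterdata.csv.

```r
# Toy invoices; column names and the "Yes"/"No" coding are assumptions.
toy <- data.frame(
  Invoice = c(100, 200, 50, 150, 300),
  Paid    = c("Yes", "No", NA, "Yes", NA),
  stringsAsFactors = FALSE
)

# Recode NA into an explicit label so it forms its own group.
status <- ifelse(is.na(toy$Paid), "Missing", toy$Paid)

# Mean invoice value within each status group (NA invoice values are skipped).
avg_by_status <- tapply(toy$Invoice, status, mean, na.rm = TRUE)

# Draw one bar per group, with a title and axis labels.
barplot(avg_by_status,
        main = "Average invoice value by payment status",
        xlab = "Payment status",
        ylab = "Average invoice value")
```

Recoding NA first matters because grouping functions such as tapply drop NA groups by default, which would leave the third bar out of the plot.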
Technical Suggestions
• Please set your working directory to the source file location and name the original dataset data.
• Please comment your code.
• Your code will be run using the original dataset you were provided. This means every change that is done to the data must be done within your script.
• If anything is confusing, please ask! You may email dklinenberg@ucsb.edu.