FACULTY OF BUSINESS ADMINISTRATION ISOM2007 PROGRAMMING FOR BUSINESS ANALYTICS
ASSIGNMENT TWO (FIRST SEMESTER 2020/2021) DUE BY MIDNIGHT ON DEC. 25, 2020; SUBMIT VIA MOODLE
Submission Instructions:
Submit all your answers for the assignment in a single file with the file extension .ipynb via the link on Moodle by midnight on Dec. 19, 2020.
It is your responsibility to ensure that the file you submit is compatible with Jupyter Notebook, as your GA may test-run your program in Jupyter Notebook at random. No marks will be given if it cannot be opened or executed correctly.
For consistency, name your file in the format “BB912345 ISOM2007-001 Assignment 02.ipynb” (i.e. your student ID, the course code and section #, followed by the assignment #, with one space between elements).
ALL work submitted by you MUST be your own. Any form of plagiarism is an offense, and all students involved will receive zero (0) marks.
No late submissions will be accepted. Late submissions, whatever the excuse or reason, will not be entertained and will be graded zero.
Please pay attention to any new announcements and requirements before the due date of the assignment.
Assessment criteria:
Your work will be assessed roughly on the following components.
Appearance & Styles (~15%) – enhance the readability of your program, for example by leaving proper spacing between blocks of code and making each part / module / cell self-explanatory, so that the reader (i.e. your marker) can easily follow the idea / logic of your program.
Comments (~15%) – your documentation should be clear and concise and allow a user who has not read this handout to fully understand how to use and manipulate your code.
Programming Styles (~30%) – Your variable names should be meaningful and your code as simple and clear as possible. You should have as little duplicated code as possible. If you find yourself repeating code, there’s a good chance you could find a simpler (lazier) method.
Correctness (~40%) – Try to ensure that you have performed all the necessary testing of your programs. Your test cases should not only cover all input categories, but should also be clearly labelled and organized in a sensible manner.
Assignment #2 1
1. Input both the SalesConsultants.xlsx and SalesManagers.xlsx files. Design a program (i.e. one program that contains all functions) that allows users to look up the following information: (25 scores)
a) Enter a Sales Consultant ID and generate a report that contains the Sales Consultant ID, name, sales for each of the 4 quarters and the annual sales.
b) Enter a Sales Manager ID and generate a report that contains the Sales Manager ID, name, and the total quarterly sales and total annual sales of all the Sales Consultants under his/her supervision.
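The two look-ups can be sketched roughly as follows. This is only an illustration under stated assumptions: small in-memory DataFrames stand in for the two Excel files, and every column name (ConsultantID, Name, ManagerID, Q1 to Q4) is hypothetical, since the actual workbook layout is not given in this handout.

```python
import pandas as pd

# Hypothetical stand-ins for the two workbooks. In the assignment the frames
# would come from pd.read_excel("SalesConsultants.xlsx") and
# pd.read_excel("SalesManagers.xlsx"). All column names are assumptions.
consultants = pd.DataFrame({
    "ConsultantID": ["C01", "C02", "C03"],
    "Name": ["Ann", "Bob", "Cat"],
    "ManagerID": ["M01", "M01", "M02"],
    "Q1": [10, 20, 30], "Q2": [11, 21, 31],
    "Q3": [12, 22, 32], "Q4": [13, 23, 33],
})
managers = pd.DataFrame({"ManagerID": ["M01", "M02"], "Name": ["Dan", "Eve"]})

QUARTERS = ["Q1", "Q2", "Q3", "Q4"]


def consultant_report(consultant_id):
    """Return ID, name, quarterly sales and annual sales for one consultant."""
    rows = consultants[consultants["ConsultantID"] == consultant_id]
    if rows.empty:
        raise KeyError(f"Unknown consultant ID: {consultant_id}")
    row = rows.iloc[0]
    report = {"ID": row["ConsultantID"], "Name": row["Name"]}
    report.update({q: int(row[q]) for q in QUARTERS})
    report["Annual"] = sum(report[q] for q in QUARTERS)
    return report


def manager_report(manager_id):
    """Return ID, name and team totals (per quarter and annual) for one manager."""
    rows = managers[managers["ManagerID"] == manager_id]
    if rows.empty:
        raise KeyError(f"Unknown manager ID: {manager_id}")
    # Sum the quarterly columns over every consultant supervised by this manager.
    team = consultants[consultants["ManagerID"] == manager_id]
    totals = team[QUARTERS].sum()
    report = {"ID": manager_id, "Name": rows.iloc[0]["Name"]}
    report.update({q: int(totals[q]) for q in QUARTERS})
    report["Annual"] = int(totals.sum())
    return report


print(consultant_report("C02"))
print(manager_report("M01"))
```

Raising KeyError for an unknown ID is one possible design choice; a program that prompts the user could instead print a message and ask again.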
2. Input the dataset from the airports file and find all airports in a given state. Finish writing the code for the function find_state. The state parameter indicates the state whose airports are to be found. The DataFrame is passed in as the parameter df. (15 scores)
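A row filter of this kind is usually written with a boolean mask. The sketch below assumes that the airports data has a column named "state" and that find_state takes df first and state second; both are assumptions, and the toy rows are made up.

```python
import pandas as pd


def find_state(df, state):
    """Return the rows of df whose state column equals the requested state."""
    # "state" is an assumed column name; adjust it to match the airports file.
    return df[df["state"] == state]


# Toy data standing in for the airports dataset (values are illustrative only).
airports = pd.DataFrame({
    "iata": ["AAA", "BBB", "CCC"],
    "state": ["TX", "CA", "TX"],
})
texas = find_state(airports, "TX")
print(texas)
```

A state with no airports yields an empty DataFrame rather than an error, which keeps the function easy to test.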
3. Input the dataset from the tips file and perform the following: (15 scores)
i) Split the tips DataFrame into three new DataFrame objects with roughly equal numbers of rows.
ii) Vertically stack only two of these three DataFrame objects into a new DataFrame.
iii) Create a pivot table on this new DataFrame using the time column as the horizontal index and the sex column as the vertical index.
iv) Compute and display the mean and standard deviation for the tips column in this pivot table.
Hints:
tc = pd.concat([tr1, tr2]) # vertical stacking, where tr1 & tr2 are DataFrames
tm = pd.concat([tc1, tc2], axis=1) # horizontal stacking
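Steps i) to iv) might be chained together as in the sketch below. It rests on several assumptions: the rows are made up, the column names tip, sex and time are assumed, and "horizontal index" is read as the pivot table's columns while "vertical index" is read as its rows.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the tips file; the column names are assumptions.
tips = pd.DataFrame({
    "tip":  [1.0, 2.0, 3.0, 4.0, 9.0, 9.0, 9.0, 9.0, 3.0, 4.0, 5.0, 6.0],
    "sex":  ["F", "M", "F", "M"] * 3,
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"] * 3,
})

# i) Three consecutive slices of roughly equal length.
bounds = np.linspace(0, len(tips), 4).astype(int)
parts = [tips.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

# ii) Stack the first and third parts vertically.
stacked = pd.concat([parts[0], parts[2]])

# iii) and iv) time across the columns, sex down the rows,
# with the mean and standard deviation of tip in each cell.
pivot = stacked.pivot_table(values="tip", index="sex", columns="time",
                            aggfunc=["mean", "std"])
print(pivot)
```

Slicing with iloc is used here for the split; other approaches, such as sampling rows at random, would also give three parts of roughly equal size.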
4. Input the iris.csv dataset and store it in a pandas DataFrame df. Your task is to finish the function df_manipulate, into which df is passed. Using df, do the following: (15 scores)
a) Find the sum of the “sepal length in cm” column. Store this in a variable called sepal_length_sum.
b) Transpose the dataframe and store it in a variable called dfT.
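One possible shape for df_manipulate is sketched below. Returning the two values from the function is an assumption made so the result can be inspected; the task itself only asks for them to be stored. The two data rows are made up and stand in for iris.csv.

```python
import pandas as pd


def df_manipulate(df):
    """Return the sum of the sepal-length column and the transposed frame."""
    # Column name taken from the task text; the header in iris.csv may differ.
    sepal_length_sum = df["sepal length in cm"].sum()
    dfT = df.T  # rows become columns and columns become rows
    return sepal_length_sum, dfT


# Two made-up rows standing in for iris.csv (values are illustrative only).
df = pd.DataFrame({
    "sepal length in cm": [5.0, 4.5],
    "sepal width in cm": [3.5, 3.0],
})
sepal_length_sum, dfT = df_manipulate(df)
print(sepal_length_sum)
print(dfT)
```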
5. Plot the output of 2 + 3 - log(x), which is saved in the variable y, and plot the output of 2 + 3 + log(x), which is saved in the variable y2. (15 scores)
In addition:
i) Title the plot “Plot of y = 2 + 3 - log(x) and y = 2 + 3 + log(x)”
ii) Label the x axis as “x–axis”
iii) Label the y axis as “y–axis”
iv) Change the color of the plot for y2 to red
v) Add a legend so that the line for y2 is labeled “adding log(x)” and the line for y
is labeled “subtracting log(x)”. IMPORTANT: Make sure you add the legend text in the same command as making the plot.
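The plotting requirements can be illustrated with the sketch below. The non-logarithmic part of the expression is represented by a placeholder named base, which is an assumption made purely for illustration and should be replaced with the expression from the task statement; the title string is likewise a placeholder. The legend text is supplied through the label argument of each plot call, as item v) requires.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.1, 10, 200)  # log(x) is only defined for x > 0

# Placeholder (an assumption): stands in for the non-log part of the expression.
base = 2 * x + 3

y = base - np.log(x)
y2 = base + np.log(x)

fig, ax = plt.subplots()
# The legend text is given in the same command that draws each line.
ax.plot(x, y, label="subtracting log(x)")
ax.plot(x, y2, color="red", label="adding log(x)")
ax.set_title("Plot of y and y2")  # placeholder; use the title required by item i)
ax.set_xlabel("x-axis")
ax.set_ylabel("y-axis")
ax.legend()  # picks up the labels passed to plot()
```

In a notebook the figure is displayed when the cell finishes; in a script, plt.show() would be called at the end.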
6. Write a function called histogram_plotter that takes in a DataFrame, a column name from that data frame, and a number of bins and then plots a histogram of the data in that column. (15 scores)
Furthermore:
i) Set the y axis label to “Counts”
ii) Set the x axis label to the name of the column being plotted
iii) Design your testing module
Note: In this question, you are asked to create a function that plots a histogram. As a developer, you are responsible for testing the correctness of your function by supplying values for its parameters (i.e. the DataFrame, the column name, and the # of bins). You may make the # of bins optional, with a default value of 10.
For example, you may have a call statement such as histogram_plotter(TipsDF, 'total_bill', 10), or histogram_plotter(TipsDF, 'tip'), or histogram_plotter(column='tip', df=TipsDF).
You may use any of the data files given in class to test your function. Remember, however, that testing the correctness of your work is compulsory: you need to provide evidence of this testing and indicate it in your work.
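A minimal sketch of such a function and a small testing module is given below. The parameter names df, column and bins follow the example calls above; returning the Axes object is an assumption made so that the labels and bar count can be checked afterwards, and the data in TipsDF is made up.

```python
import pandas as pd
import matplotlib.pyplot as plt


def histogram_plotter(df, column, bins=10):
    """Plot a histogram of df[column] and return the Axes for inspection."""
    fig, ax = plt.subplots()
    ax.hist(df[column], bins=bins)
    ax.set_xlabel(column)    # x label is the name of the plotted column
    ax.set_ylabel("Counts")
    return ax


# Made-up data standing in for a class data file (values are illustrative only).
TipsDF = pd.DataFrame({
    "total_bill": [10.0, 12.5, 20.0, 31.0],
    "tip": [1.0, 2.0, 3.0, 5.0],
})

# Test 1: positional arguments, bins left at its default of 10.
ax_default = histogram_plotter(TipsDF, "tip")
assert ax_default.get_xlabel() == "tip"
assert ax_default.get_ylabel() == "Counts"

# Test 2: keyword arguments in a different order, explicit number of bins.
ax_keyword = histogram_plotter(column="total_bill", df=TipsDF, bins=4)
assert ax_keyword.get_xlabel() == "total_bill"
```

Checks like these are one way to provide evidence of testing; displaying the resulting figures next to each labelled test case is another.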