ACCT5034 Analysis and Visualization of Financial Data
Folio 2 Tableau Desktop and Visualisations for Data Analytics Profile Sem 2, 2020
Introduction
This is the final assessment in this Unit. It combines your developing knowledge of Excel and Tableau with new learning about coding for data management, cleaning and visualization in “R”. R is an open source platform and collaborative project with many thousands of contributors, both teams and individuals, around the world. This Folio extends the CIMIC Case: you will use data visualizations and tables generated through coding in R to benchmark the performance of CIMIC Group Ltd against data and summary statistics for the global Metals & Mining industry.
Prior to commencing Folio 2, your Instructor will have demonstrated best-practice project management in RStudio with R Scripts (files of R code), Data (source data files such as .csv), Plots (output visualisations from packages like ggplot2), and packages for statistical analysis and summary statistics. You will also have completed your learning with the Instructor’s demonstration of Swirl Courses and RStudio Cloud Primers (a “Primer” is defined as “a beginner class”).
Case – “R” for Roseberry
As the new Graduate Accountants at CIMIC Group Ltd, you and your two closest friends, Zhising and Frank, have a running joke with your line manager, Roseberry Peters, that she has become your IT Guru. The reason? Roseberry has recently been promoted to Strategic Management Accountant (SMA) after Sherlock Poems, the previous SMA, was promoted to Chief Financial Officer. It is part of Roseberry’s new role to run the graduate training sessions on Friday afternoons. The training runs for three hours, from 2pm to 5pm, followed by drinks and snacks in the Perth office staff room. The title of the past few weeks’ training is “Coding for Analytics and Statistics with R”, and Roseberry has thrown herself into the teaching role with the Grad Accounting team. She seems to love the challenge.
You and your two graduate accountant buddies have really been enjoying the R training using the Swirl package in RStudio and the RStudio Cloud Primers. You all respect Roseberry’s enthusiasm. Last week Frank invited you all to a WhatsApp discussion group called “R for Roseberry”, and the training group is using it for chatting and general Q&A. In fact, this forum has acted like a virtual team, with Roseberry as the project manager and a very enthusiastic contributor!
After the most recent Grad Training session (last Friday), a WhatsApp message from your boss arrives this morning in the “R for Roseberry” group:
Later in the day you meet with Roseberry for about 30 minutes to discuss her early morning post to the group. It seems her presentation to the local Chamber of Commerce (CoC), “A Stakeholder perspective on the Metals & Mining industry”, in early September was received so well that the local CoC is funding her to represent Western Australian mining services at a conference in Toronto, Canada.
After the meeting you review an extract of your notes (below) and realize there is quite a lot of work to get done before her conference presentation in early November 2020.
“Notes from meeting with the boss on Monday morning:
a) Complete Friday Grad Training in RStudio Cloud (ACCT5034 Project; you have been emailed an invite to this project). In this ACCT5034 Project you need to finish the Swirl Courses “R_Programming” and “Getting and Cleaning Data” (10-15 hours). You also need to do the “Visualize Data” Primer in RStudio Cloud (an excellent way to learn quite a bit about visualizing data in R with ggplot2 from the Tidyverse suite of packages).
b) Read Hadley Wickham’s article titled “Tidy Data” from the Journal of Statistical Software (2014) (available in ‘Module 3 – R>Data’ of the Unit’s Blackboard site).
c) Take the Metals & Mining dataset that you have previously cleaned in Excel and load it into the RStudio Environment, using the install.packages() function first to install any packages you need. Then use an R Script file to write code for analysis of the data. Need to refer to the data transformation functions available in the dplyr package – refer to the “Data Transformation with dplyr :: Cheat Sheet” available from the RStudio website. Roseberry says … not limited in any way but should try to use at least the following functions: the %>% pipe from the magrittr package, plus summarise, count, group_by, select, sample_n, slice, top_n, mutate, transmute, mutate_all, arrange and desc from the dplyr package (part of the tidyverse suite of packages). Of course, which functions to use is driven by the question you want to ask about the data. Roseberry emphasized this point!
d) Use the ggplot2 package, and any of the datasets you generate through the data analysis above (transformations using dplyr functions), to create interesting new plots.”
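The dplyr functions named in note c) are usually chained together with the pipe, with each chain answering one question about the data. The sketch below is illustrative only: the data frame `mining` and its columns (`company`, `country`, `revenue`, `net_income`) are hypothetical stand-ins, not the actual columns of the Metals & Mining dataset, so substitute your own names.

```r
library(dplyr)   # loading dplyr also makes the %>% pipe (from magrittr) available

# Hypothetical stand-in data; replace with your cleaned Metals & Mining data.
mining <- tibble(
  company    = c("A", "B", "C", "D", "E", "F"),
  country    = c("Australia", "Australia", "Canada", "Canada", "Canada", "Chile"),
  revenue    = c(1200, 800, 450, 2300, 150, 975),
  net_income = c(130, -20, 45, 310, -5, 88)
)

# Question 1: which countries have the highest average net profit margin?
margin_by_country <- mining %>%
  select(company, country, revenue, net_income) %>%
  mutate(net_margin = net_income / revenue) %>%   # add a derived column
  group_by(country) %>%
  summarise(
    companies  = n(),                 # number of companies in each country
    avg_margin = mean(net_margin)
  ) %>%
  arrange(desc(avg_margin))           # highest average margin first

# Question 2: which three companies report the largest revenue?
top_revenue <- mining %>%
  top_n(3, revenue) %>%
  arrange(desc(revenue))

# Question 3: how many companies are there per country?
# count() is a shortcut for group_by() followed by summarise(n = n()).
companies_per_country <- mining %>% count(country)
```

Each result (`margin_by_country`, `top_revenue`, `companies_per_country`) is a new data frame that could feed a plot or be exported for Tableau.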
Notes from Instructor about requirements:
You are required to submit a Tableau Data Story which Roseberry Peters can deliver at the Canadian conference on the mining industry. She will present at the mining service chapter of the conference, so please include observations about CIMIC’s financial data in your analysis. (You will need to join the Metals & Mining and CIMIC Group data that you have previously cleaned.)
The primary aim of Folio 2 is to use the dplyr functions listed in note c) above for data analytics, and to produce novel datasets (data frames in R) that can be output from R for use in Tableau Desktop. The analysis should offer interesting insights into the global Metals & Mining financial data, for comparison with CIMIC, to support Roseberry Peters’ presentation and Data Story.
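One possible way to set up the comparison with CIMIC is to summarise the industry data and then join CIMIC's figures onto that summary. This is a minimal sketch under stated assumptions: the data frames `industry` and `cimic`, the key column `year` and the margin columns are all hypothetical, and your cleaned files may need a different key or a different type of join.

```r
library(dplyr)

# Hypothetical stand-in data; the column names are assumptions.
industry <- tibble(
  year       = c(2018, 2018, 2019, 2019),
  company    = c("A", "B", "A", "B"),
  net_margin = c(0.10, 0.06, 0.08, 0.04)
)
cimic <- tibble(
  year         = c(2018, 2019),
  cimic_margin = c(0.05, 0.07)
)

# Average the industry margin per year, then attach CIMIC's figure
# for the same year and compute the gap between the two.
benchmark <- industry %>%
  group_by(year) %>%
  summarise(industry_margin = mean(net_margin)) %>%
  left_join(cimic, by = "year") %>%
  mutate(difference = cimic_margin - industry_margin)
```

Here `left_join()` keeps every year present in the industry summary; a year with no CIMIC figure would show `NA` rather than being dropped.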
A secondary aim of Folio 2 is to fully explore the use of the ggplot2 package in R for producing novel plots. Plots generated in RStudio with ggplot2 can be saved from R to your working directory using the ggsave() function, as demonstrated in class. These plots can then be added as images to Roseberry’s Data Story.
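As a rough illustration of building a plot and saving it with ggsave(), the sketch below draws a bar chart from a small summary table. The data frame `margin_by_country`, its values, the “Plots” folder and the file name are all hypothetical; use a data frame from your own analysis and whichever folder your project uses.

```r
library(ggplot2)

# Hypothetical summary data; replace with a data frame from your own analysis.
margin_by_country <- data.frame(
  country    = c("Australia", "Canada", "Chile"),
  avg_margin = c(0.042, 0.067, 0.090)
)

# Horizontal bar chart, with countries ordered by their average margin.
margin_plot <- ggplot(margin_by_country,
                      aes(x = reorder(country, avg_margin), y = avg_margin)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Average net margin by country (illustrative data)",
    x     = NULL,
    y     = "Average net margin"
  )

# Save the plot as a PNG in a "Plots" folder under the working directory.
dir.create("Plots", showWarnings = FALSE)
ggsave(file.path("Plots", "margin_by_country.png"), plot = margin_plot,
       width = 8, height = 5, dpi = 300)
```

Passing the plot object explicitly through `plot =` avoids relying on ggsave()'s default of saving whichever plot was displayed last.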
Files required to be submitted:
Data file – Metals&Mining.csv
Example naming: 19245879_GregWhite_Data.csv
(This file is the starting data set which your R Script file (below) will run from)
R Script – [student id]_[Name]_Folio2.R
Example naming: 19245879_GregWhite_Folio2.R
(This file is your code saved in an R Script file)
Tableau Workbook – [student id]_[Name]_Folio2.twbx
Example naming: 19245879_GregWhite_Folio2.twbx
(This file is the Data Story for Roseberry saved as a packaged Tableau Workbook)
MSWord doc – [student id]_[Name]_Folio2.docx
Example naming: 19245879_GregWhite_Folio2.docx
(This file is your R Script above copied and saved as text in an MS Word file. ***Please submit this to the Turnitin Assignment link, which is separate from where you submit the above files.)
Notes from Instructor about process:
You will need to install base R and RStudio on your computer as demonstrated. Your Instructor will demonstrate the best way to manage your assessment using an R Project. Basically, you need to create a new R Project, save it somewhere safe (such as your own cloud, USB or shared-drive storage) and create folders called “Scripts” and “Data” under the Project. Set the project directory you created as the working directory and use RStudio to import the global Metals & Mining data. Ensure that the tidyverse and magrittr packages are installed with install.packages() and loaded with library().
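The setup steps above might look something like the following at the top of your R Script. Treat it as a sketch rather than the required method: it assumes the script is run from inside the R Project (so the working directory is the project folder) and borrows the example data file name from the submission list, which you would replace with your own.

```r
# Install each package only if it is not already available on this machine.
for (pkg in c("tidyverse", "magrittr")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}

# Load the packages for this session.
library(tidyverse)
library(magrittr)

# Create the project folders if they do not exist yet.
# Paths are relative to the working directory (check it with getwd()).
dir.create("Scripts", showWarnings = FALSE)
dir.create("Data", showWarnings = FALSE)

# Hypothetical file name, following the example naming in this brief.
data_path <- file.path("Data", "19245879_GregWhite_Data.csv")

# Read the data only if the file has been copied into the Data folder.
if (file.exists(data_path)) {
  mining <- read_csv(data_path)
  glimpse(mining)   # quick look at column names and types
}
```

install.packages() only needs to run once per computer, whereas library() must run in every new R session, which is why the two are kept as separate steps.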
NOTE: Whenever code in your script file creates a new dataset (data frame) in the R Environment, please save it to the “Data” folder of your R Project (in your working directory) using the write.csv function. Whenever your code creates a new plot using ggplot2, you can use ggsave in a similar way to save the plot.
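A minimal sketch of saving a data frame with write.csv follows. The data frame `margin_by_country` and the output file name are hypothetical placeholders for whatever your own analysis produces.

```r
# Hypothetical result of an earlier analysis step.
margin_by_country <- data.frame(
  country    = c("Australia", "Canada", "Chile"),
  avg_margin = c(0.042, 0.067, 0.090)
)

# Make sure the Data folder exists, then build the output path.
dir.create("Data", showWarnings = FALSE)
out_path <- file.path("Data", "margin_by_country.csv")

# row.names = FALSE stops R from writing the row numbers
# as an extra, unnamed first column in the CSV file.
write.csv(margin_by_country, out_path, row.names = FALSE)

# Read the file back in to confirm it was written as expected.
check <- read.csv(out_path)
```

The resulting .csv file can then be connected to in Tableau Desktop as a text file data source.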
IMPORTANT: Please make sure that your Folio R Project is being saved in a safe cloud, shared drive or storage media location. As a way of protecting your files you might like to email them to yourself as attachments from your project working directory.
End of Folio 2