---
title: "MA7419 Course Work November 2020"
output:
  html_document:
    df_print: paged
---
This contains the CW exercises.
## Instructions
This is the RMD file you will edit and submit for your coursework. Please save the edited file with the same name as this one but with your user name (i.e. your university login name, not your student number) added to the end, e.g. "MA7419_CW_pk255.RMD".
Complete the course work by adding code where needed and typing text between the horizontal lines where indicated. You should not need to add any new code chunks, or any variable/object names (apart from in the final part).
Marks indicated for each section are out of a total of 90. (There are 5 marks for submitting code that runs and 5 for submitting a PDF file that has been knitted from the code.) This work comprises 30% of the total assessment for the module.
__It is very important that you work on your own for this assessment. No collaboration is allowed and anyone found doing so will be subject to the penalties in the University Regulations (they are very severe). Anti-plagiarism software will be used for this submission.__
If you get __completely__ stuck, contact me and I will give you some guidance. If that will result in the loss of any marks, I will tell you and give you the choice to continue on your own.
You must email me both your RMD file and your PDF file by 12 noon on Tuesday 17 November.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
```
## Load libraries
```{r, warning=FALSE, message=FALSE}
library(tidyverse)
library(readxl)  # read_excel() is not attached by library(tidyverse)
```
## Download and read in data (2 marks)
Read in the file called Leeds_salaries_2014.xlsx as a data frame (tibble) called salaries_raw.
(This data was downloaded from here: https://datamillnorth.org/dataset/senior-officer-salaries)
```{r}
salaries_raw <- read_excel("~/7419/Leeds_salaries_2014.xlsx")
# View(salaries_raw)  # interactive inspection only; keep commented out when knitting
```
## Count rows and columns (2 marks)
Assign variables to the number of rows and number of columns read in.
```{r}
nrows_raw <- dim(salaries_raw)[1]
ncols_raw <- dim(salaries_raw)[2]
dim(salaries_raw)
```
## Select Columns (6 marks)
The vector `not_required` contains the names of columns that you won't need for the rest of this exercise. Create a new data frame without these columns.
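As a hint, here is a minimal sketch of dropping columns whose names are held in a character vector. The tibble `toy`, its column names, and the vector `drop_cols` are invented for illustration and are not taken from the salaries file. The block is a plain fenced example, so it is not run when knitting.

```r
library(dplyr)

# Toy data; these column names are hypothetical
toy <- tibble(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(10, 20, 30),
  notes = c("x", "y", "z")
)

# Names of the columns to drop, held in a character vector
drop_cols <- c("id", "notes")

# all_of() picks the columns named in the vector; the minus sign excludes them
toy_small <- toy %>%
  select(-all_of(drop_cols))

names(toy_small)
```

Wrapping the vector in `all_of()` makes it explicit that `drop_cols` is an external vector of names rather than a column of the data frame.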
```{r}
not_required <- names(salaries_raw)[c(1,2,3,4,6,11,19,20,21,22)]
salaries <-
salaries_raw %>%
```
## Filter rows (6 marks)
Remove any rows where the salary is 0.
```{r}
salaries <-
salaries %>%
```
## Summarise whole data frame (6 marks)
Use `summarise()` to calculate min, mean and max salaries for the whole group.
```{r}
global_stats <-
salaries %>%
```
## Create factor and group_by (8 marks)
Convert the column `Grade` to a factor and then use `group_by()` and `summarise()` to calculate min, mean, and max salaries for each grade.
```{r}
salaries <-
salaries %>%
```
```{r}
grade_stats <-
salaries %>%
```
Repeat the code you just wrote, but add a column for the number in each group, filter so that you only show groups containing more than 5 roles, and order by decreasing mean salary.
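As a hint, the pattern described above (grouped summary, group size, filter on size, sort by mean) might look like this on toy data. The tibble `toy`, the columns `team` and `pay`, and the threshold `min_size` are all invented for illustration; this is a sketch of the general technique, not the expected answer. The block is a plain fenced example, so it is not run when knitting.

```r
library(dplyr)

# Toy data; column names are hypothetical
toy <- tibble(
  team = c("A", "A", "A", "B", "B", "C"),
  pay  = c(10, 20, 30, 40, 60, 100)
)

min_size <- 1  # hypothetical threshold: keep groups with more than this many rows

team_stats <- toy %>%
  mutate(team = factor(team)) %>%   # convert the grouping column to a factor
  group_by(team) %>%
  summarise(
    min_pay  = min(pay),
    mean_pay = mean(pay),
    max_pay  = max(pay),
    n        = n()                  # number of rows in each group
  ) %>%
  filter(n > min_size) %>%          # drop groups that are too small
  arrange(desc(mean_pay))           # largest mean first

team_stats
```

Here team C has only one row, so it is filtered out, and the remaining groups are listed with the higher mean first.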
```{r}
grade_stats <-
salaries %>%
```
## Calculate a new column and plot (12 marks)
Add a column to salaries called PensionPercent which contains the pension contribution as a percentage of salary.
```{r}
salaries <-
salaries %>%
```
Plot a histogram of the distribution of the pension contributions. Your histogram should be well labelled and clearly display the information requested.
```{r}
```
## Comment on your chart (5 marks)
--------------

Comments on chart here.

--------------
## Use regular expressions to create new grouping factors (10 marks)
Create a column called `JobGroup` containing a factor with the following levels:
"Director", "Chief", "Head", "Manager", "Other"
and allocate roles to the levels based on the JobTitle (i.e. whether it contains "Director of", "Chief Officer", "Head of", "Manager", or none of these). You may want to use the functions `case_when()` and `str_detect()`.
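As a hint, here is a minimal sketch of combining `case_when()` and `str_detect()` to build a factor from text patterns. The tibble `toy`, the columns `title` and `group`, and the reduced set of levels are invented for illustration; the patterns and levels for the coursework are the ones listed above. The block is a plain fenced example, so it is not run when knitting.

```r
library(dplyr)
library(stringr)

# Toy data; column names and job titles are hypothetical
toy <- tibble(
  title = c("Head of Finance", "Project Manager", "Clerk", "Deputy Head of Legal")
)

# Levels in the order they should appear in the factor
levels_wanted <- c("Head", "Manager", "Other")

toy <- toy %>%
  mutate(
    group = case_when(
      str_detect(title, "Head of") ~ "Head",
      str_detect(title, "Manager") ~ "Manager",
      TRUE                         ~ "Other"   # fallback when nothing matches
    ),
    group = factor(group, levels = levels_wanted)
  )

toy
```

`case_when()` checks its conditions in order and uses the first one that is true, so the order of the patterns matters when a title could match more than one of them.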
```{r}
salaries <-
salaries %>%
```
## Make box plots (10 marks)
Make a box plot of salary by job group. Put `JobGroup` on the vertical axis.
Your chart should look similar to the example in the file ExampleChart.png.
```{r}
```
## Comment on the box plot (5 marks)
Comment on the plot and suggest any further actions you might take to make the plot more meaningful.

-------------------------

Comments and further actions here

-------------------------
## Compare with other years (10 marks)
Find some data from a different year and compare salaries between the two years. Include at least one graph or chart and comment on your results.
```{r}
```
-------------------------

Comments and discussion here

-------------------------