Part 1: Interpreting Data, Extracting Information, and Data Exploration
You are a Business Analytics Consultant staffed on a project to help Chimera Corporation bring down its attrition rate. Recall that Chimera Corporation is an office supply and automation company in Western Europe. The company has 19,000 employees and a high attrition rate of ~14%, compared to the industry average of ~8-9%. This project aims to develop an analytics-informed strategy for Chimera Corp to address its attrition problem and bring the rate down to the industry average.
We have received last year’s employee data (attached) from Chimera Corp, with the following structure and details. Note that the data covers an entire year: it lists the employees who were present at the start of the year and marks those who had exited by the end of that year.
For every assignment question, use pseudo-code or bullet points to describe what the R code that follows does (e.g., # see the class of x).
1. Is the data analyzable and usable? (Is any additional cleaning or data wrangling required?) Is additional data required?
Hint: Show in your R code, or describe (using pseudo-code or bullet points), how you checked whether this data is usable and analyzable.
Remember to go column by column and check the data object, the data type, etc. Is all the data in numeric or integer format?
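As one possible starting point for the column-by-column check described above, the sketch below runs on a tiny toy data frame. The column names (emp_id, age, gender, exited) and all values are hypothetical stand-ins, since the real names come from the attached file.

```r
# Toy stand-in for the attached file; column names and values are hypothetical.
emp <- data.frame(
  emp_id = 1:6,
  age    = c("34", "41", "n/a", "29", "52", "38"),  # numbers stored as text
  gender = c("F", "M", "M", "F", NA, "F"),
  exited = c(0, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# see the class of the object and its dimensions
class(emp)
dim(emp)

# see the class of each column
col_types <- sapply(emp, class)
print(col_types)

# count missing values in each column
na_counts <- colSums(is.na(emp))
print(na_counts)

# convert a text column to numeric; entries that cannot be parsed become NA
age_num   <- suppressWarnings(as.numeric(emp$age))
n_bad_age <- sum(is.na(age_num) & !is.na(emp$age))

# check for duplicated employee IDs
n_dup <- sum(duplicated(emp$emp_id))
```

A column that is meant to be numeric but reports class "character" (like age here) usually signals stray text entries that need cleaning before any statistics are computed.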
2. Employee Profiles: basic demographics of the employees at Chimera Corp. (at the beginning of the year)
• What is the gender ratio?
• What is the age range of employees?
• What is the decadal age distribution of employees (1-10,11-20,21-30,…)?
• What percentage of employees are foreign nationals?
• How many cities are employees located in?
• Any other demographic profiles you think relevant?
• Hint: Use the table, plot, boxplot, and hist commands as needed
• Beyond looking at our class code, remember to search online for exactly what you want to do
• We suggest you answer each question with tables first, before plotting
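The demographic questions above can be approached with tables first and plots second. The following is a minimal sketch on toy data; the column names (gender, age, nationality, city) and the "foreign" label are assumptions for illustration, not the names in the attached file.

```r
# Toy data; column names, labels, and values are hypothetical.
emp <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "M", "M"),
  age         = c(23, 35, 47, 29, 51, 38, 44, 62),
  nationality = c("local", "foreign", "local", "local",
                  "foreign", "local", "local", "local"),
  city        = c("Lyon", "Lyon", "Ghent", "Porto",
                  "Ghent", "Lyon", "Porto", "Ghent")
)

# gender ratio as proportions
gender_share <- prop.table(table(emp$gender))

# age range (minimum and maximum)
age_range <- range(emp$age, na.rm = TRUE)

# decadal age bands: (0,10], (10,20], (20,30], ... i.e. 1-10, 11-20, 21-30, ...
age_band    <- cut(emp$age, breaks = seq(0, 100, by = 10))
band_counts <- table(age_band)

# percentage of foreign nationals
pct_foreign <- 100 * mean(emp$nationality == "foreign")

# number of distinct cities
n_cities <- length(unique(emp$city))

# plots, after the tables have been checked
barplot(band_counts, las = 2, main = "Employees by age decade")
boxplot(age ~ gender, data = emp, main = "Age by gender")
```

Because cut() uses right-closed intervals by default, an employee aged exactly 30 falls in the 21-30 band, which matches the banding in the question.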
3. Describe the Summary Descriptive Statistics for Employee Profiles (columns used in 2):
a. Of all employees at the beginning of the year
b. Of employees exiting the firm by the end of the year
c. Of employees who remained with the firm by the end of the year
d. Comparing these descriptive statistics – Is there anything that sets apart the employees who exited from those who stayed?
Hint: You might want to subset the data or split the data to make comparisons.
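One way to make the leaver versus stayer comparison suggested in the hint is sketched below. It assumes a hypothetical 0/1 exit flag named exited and hypothetical age and tenure columns; the values are toy data.

```r
# Toy data; column names and values are hypothetical (exited: 1 = left, 0 = stayed).
emp <- data.frame(
  age    = c(25, 31, 45, 52, 28, 39),
  tenure = c(1, 3, 10, 15, 2, 6),
  exited = c(1, 1, 0, 0, 1, 0)
)

# a. all employees at the beginning of the year
summary(emp[, c("age", "tenure")])

# b. employees who exited by the end of the year
leavers <- subset(emp, exited == 1)
summary(leavers[, c("age", "tenure")])

# c. employees who remained
stayers <- subset(emp, exited == 0)
summary(stayers[, c("age", "tenure")])

# d. group means side by side for a quick comparison
group_means <- aggregate(cbind(age, tenure) ~ exited, data = emp, FUN = mean)
print(group_means)
```

The aggregate() call gives one row per group, which makes differences between the two groups easier to spot than reading two separate summary() outputs.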
4. Employee Climate Survey
a. What was the distribution of employee job satisfaction at Chimera Corp?
b. What was the distribution of employee evaluations (kpi_performance)?
c. What was the distribution of manager evaluations (boss_survey)?
d. Describe (summarize or plot) any other columns that describe the “employment climate,”
i.e., measures that might shed light on the employees’ work culture, satisfaction, etc.
Hint: use hist() function or any other plotting functions you think relevant
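A minimal sketch of the distribution checks above follows. kpi_performance and boss_survey are the column names given in the question; job_satisfaction is an assumed name, and the scales and values are toy assumptions.

```r
# Toy data; job_satisfaction is a hypothetical column name, and all
# scales and values are assumptions for illustration.
emp <- data.frame(
  job_satisfaction = c(3, 4, 2, 5, 4, 1, 3, 4),
  kpi_performance  = c(72, 85, 60, 91, 78, 55, 80, 88),
  boss_survey      = c(4, 4, 3, 5, 2, 3, 4, 5)
)

# a. frequency table of job satisfaction scores
sat_counts <- table(emp$job_satisfaction)
print(sat_counts)

# b. distribution of employee evaluations
kpi_summary <- summary(emp$kpi_performance)
print(kpi_summary)
hist(emp$kpi_performance, main = "KPI performance", xlab = "kpi_performance")

# c. distribution of manager evaluations (bar plot suits a short discrete scale)
barplot(table(emp$boss_survey), main = "Manager evaluations", xlab = "boss_survey")
```

For survey items on a short discrete scale, a table or bar plot is often easier to read than a histogram, whose bin edges can split or merge adjacent scores.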
5. Describe the Summary Descriptive Statistics for Employee Climate Survey (columns used in 4):
a. Of all employees at the beginning of the year
b. Of employees exiting the firm by the end of the year
c. Of employees who remained with the firm by the end of the year
d. Comparing these descriptive statistics – Is there anything that sets apart the employees who exited from those who stayed?
Hint: Subset the data like in Q3
6. Compared to employees at the beginning of the year, what percentage of employees left the firm in the last year?
a. What percentage of core employees (employees who worked in core business) left the firm?
b. What percentage of high-potential employees left the firm?
Hint: Subset the data like in Q3 and use tables
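The attrition percentages above can be computed by subsetting and taking the share of exits in each subset. The sketch below assumes hypothetical 0/1 flags named exited, core_business, and high_potential, with toy values.

```r
# Toy data; all three 0/1 flag columns and their names are hypothetical.
emp <- data.frame(
  exited         = c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0),
  core_business  = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1),
  high_potential = c(0, 1, 0, 1, 0, 0, 1, 0, 0, 1)
)

# overall attrition: exits as a percentage of the start-of-year headcount
overall_rate <- 100 * mean(emp$exited == 1)

# a. attrition among core employees only
core_rate <- 100 * mean(emp$exited[emp$core_business == 1] == 1)

# b. attrition among high-potential employees only
hipo_rate <- 100 * mean(emp$exited[emp$high_potential == 1] == 1)

# the same comparison as a table of row percentages
core_table <- 100 * prop.table(table(core = emp$core_business,
                                     exited = emp$exited), margin = 1)
print(round(core_table, 1))
```

Using margin = 1 makes each row of the table sum to 100%, so the exited = 1 column reads directly as the attrition rate within each group.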
7. Given our preliminary data exploration, what next steps would you recommend, and why?
Hint: From everything you looked at in Questions 1-6, what are the next steps on this analytics project to identify the causes or significant indicators of employee exits?
Should we do some hypothesis testing? And if so, what exact hypothesis should we test?
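To illustrate what "exact hypothesis" might mean here, the sketch below tests two example hypotheses on simulated data: H0 that mean job satisfaction is equal for leavers and stayers (Welch t-test), and H0 that exiting is independent of working in the core business (chi-squared test). The column names are hypothetical, the data are randomly generated, and the choice of these two hypotheses is only an assumption about what the exploration might suggest.

```r
# Simulated data for illustration only; column names are hypothetical and
# the values carry no information about Chimera Corp.
set.seed(1)
n <- 200
emp <- data.frame(
  exited           = rbinom(n, 1, 0.14),
  job_satisfaction = sample(1:5, n, replace = TRUE),
  core_business    = rbinom(n, 1, 0.5)
)

# H0: mean job satisfaction is the same for leavers and stayers
# (t.test with a formula runs Welch's two-sample t-test by default)
tt <- t.test(job_satisfaction ~ exited, data = emp)
print(tt)

# H0: exiting is independent of working in the core business
ct <- chisq.test(table(emp$core_business, emp$exited))
print(ct)
```

A small p-value would be evidence against the corresponding H0. Since the data here are random, no conclusion should be drawn from the printed results.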