Description
Please note that the scenario for this assignment is fictional.
The National Tertiary Education Union (NTEU) has hired your data science team to conduct an evaluation of the “Rate My Professors” (RMP) website. The NTEU team suspects that the RMP website is used in the hiring and promotion process of staff at universities in Australia. They believe that student evaluations are not a reliable measure of staff performance and this information should not be used. The NTEU believes that the universities have collected the RMP profiles in order to build a model to predict student ratings for professors without profiles.
The NTEU suggests that the perceived difficulty of the class, the demographics of the professor and the discipline of study are bigger influences on a Professor’s rating than the quality of education they deliver. The NTEU further claims that students use the website to choose professors or classes that are perceived as being easier, which undermines the academic process.
The NTEU has asked you, as an independent third party, to address three key questions:
• What affects a Professor’s rating on RMP?
• Can a model be built to predict a Professor’s rating, and could this model be used to predict a rating if the Professor does not have a profile on RMP?
• What, in your expert opinion, are the social and ethical issues involved?
This task requires you to produce a final report which is the deliverable for the NTEU project. This report must address the three key questions.
As an independent third party, you should also assess the validity of the NTEU’s original claims. If it is not possible to directly answer the NTEU’s questions using the data, then you must investigate what can be done to address the questions as best as possible.
To increase the integrity of your findings, others must be able to replicate your work. Therefore, the NTEU has asked that you provide all code used to produce the report. This code should be runnable without error, neat and documented.
Q1: What affects a Professor’s rating on RMP?
To answer this question, you should provide a cohesive EDA that combines and extends your individual analyses. You may use any combination of EDA techniques and traditional statistical testing. This might include non-graphical or graphical EDA, clustering or preliminary statistical testing. You must provide accompanying written explanations of your intentions and subsequent findings. Finally, you should provide a summary statement.
It is up to you how you choose to answer this question. You will be assessed on the depth and quality of your written analysis and justification, and on the clarity and presentation of your results.
We have not covered how to do statistical testing in tutorials. However, we have equipped you with the skills and knowledge to research this on your own should you feel the need to take this approach.
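As a starting point for that research, the following is a minimal sketch of one possible test, a two-sided permutation test for a difference in mean rating between two groups. It is an illustration only, not a required method. The function name `permutation_test`, the group labels and the ratings are all assumptions: the data is synthetic and is not taken from RMP. Libraries such as SciPy provide ready-made tests (for example `scipy.stats.ttest_ind`) that may be more convenient in practice.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of a minus mean of b) and an
    estimated p-value. Hypothetical helper written for illustration.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values simulates "group label does not matter".
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic ratings on a 1-5 scale (illustrative only, NOT real RMP data).
data_rng = random.Random(42)
easy_ratings = [min(5.0, max(1.0, data_rng.gauss(4.0, 0.6))) for _ in range(60)]
hard_ratings = [min(5.0, max(1.0, data_rng.gauss(3.4, 0.6))) for _ in range(60)]

observed_diff, p_value = permutation_test(easy_ratings, hard_ratings)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

A small p-value here only says that a difference this large would be unusual if the group labels were irrelevant. It does not show that difficulty causes lower ratings, so any such result still needs the written justification described above.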
Q2: Can a model be built to predict a Professor’s rating?
This question will require you to carefully consider the goal of the problem and the available data.
You must provide a description of how one could build a model to answer question 2 including:
• technical explanation of the model
• discussion of how such a model could be evaluated
• a thorough discussion of assumptions and shortcomings of said model
• a discussion of how to interpret the model generally and an example interpretation of the prototype model
It is expected that you will have to make simplifications or assumptions to make the model work. It is important that you clearly explain your steps and provide justification for your work.
You must include a proof of concept in your Jupyter Notebook. The prototype only needs to be a close facsimile of the model you describe in your report that implements the core components. In your report you may describe more advanced techniques which could be incorporated later.
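To make the scope of a proof of concept more concrete, the sketch below fits a linear regression by ordinary least squares, evaluates it on held-out data against a predict-the-mean baseline, and reads off one coefficient. It is one possible shape for a prototype, not the expected solution. Everything in it is an assumption made for illustration: the data is synthetic (not real RMP data), and the feature names `difficulty` and `is_stem` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data (NOT real RMP data); feature names are hypothetical.
difficulty = rng.uniform(1, 5, n)   # perceived difficulty on a 1-5 scale
is_stem = rng.integers(0, 2, n)     # 1 if the discipline is STEM, else 0
noise = rng.normal(0, 0.5, n)
rating = np.clip(4.8 - 0.45 * difficulty - 0.2 * is_stem + noise, 1, 5)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), difficulty, is_stem])

# Hold-out split: fit on 400 rows, evaluate on the remaining 100.
idx = rng.permutation(n)
train, test = idx[:400], idx[400:]

# Ordinary least squares fit on the training rows only.
coef, *_ = np.linalg.lstsq(X[train], rating[train], rcond=None)

# Evaluation: compare test RMSE with a baseline that always predicts
# the mean training rating.
pred = X[test] @ coef
rmse_model = np.sqrt(np.mean((rating[test] - pred) ** 2))
baseline = rating[train].mean()
rmse_baseline = np.sqrt(np.mean((rating[test] - baseline) ** 2))

print(f"intercept={coef[0]:.2f}, difficulty={coef[1]:.2f}, is_stem={coef[2]:.2f}")
print(f"RMSE model={rmse_model:.3f}, RMSE baseline={rmse_baseline:.3f}")
```

An example interpretation in this setting: the `difficulty` coefficient is the estimated change in rating for a one-point increase in perceived difficulty, holding discipline fixed. Note that a feature such as perceived difficulty would itself come from an RMP profile, so a model meant for Professors without profiles could only rely on features available from other sources, which is the kind of assumption and shortcoming the report should discuss.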
Q3: What are the social and ethical issues involved?
This section will require you to analyse the social and ethical issues surrounding the University’s suspected use of the data. You must critically evaluate whether the claims made by the NTEU are plausible, what impact they may have on society and whether it would be ethical for the University to act on such data.
You must also discuss the ethical issues created by your team’s analysis. These issues might include the collection of data, any pre-processing of the data and assumptions made in the EDA or modelling phases.