Ethical Implications of AI Bias as a Result of Workforce Gender Imbalance
CIS & The Policy Lab, The University of Melbourne
Final Report for UniBank (Teachers Mutual Bank Limited)
To contextualise the findings provided within, this report should be read in conjunction with the Literature Review deliverable dated 26th June 2020 (attached within the Appendix for convenience). The terms resumé and CV will be used interchangeably throughout the report.
The authors have also provided a Python Jupyter Notebook1 (titled UniBank Project Source Code) which contains the Python source code, technical documentation, experimental methodology (including internal tests and calculations), and results. This notebook is provided in the native computer-readable IPYNB format – with a print-friendly PDF version also included in the Appendix. The current report contains cross-references to the relevant Jupyter Notebook sections where applicable.
1 The programming language of choice is Python, due to its widespread use in data science and machine learning applications, as well as its extensive library of specialist functions (e.g. numpy for mathematical operations). It is provided in the Jupyter Notebook file format, which integrates both code and technical documentation. For a brief introduction to the Jupyter Notebook file format, please refer to https://en.wikipedia.org/wiki/Project_Jupyter.
Authors
Marc …, School of Computing and Information Systems
…, School of Computing and Information Systems
…, School of Computing and Information Systems
…, The Policy Lab
Leah Ruppanner, The Policy Lab
…, School of Computing and Information Systems
Preamble
Contents
Executive Summary
Gender Roles, Gendered Judgements
Designing research: gender biases in panellists' recruitment
Quantifying gender bias in panellists' recruitment decisions
Patterns of heuristic judgements
Algorithms: eradicating or exacerbating gender bias?
Lessons from the Amazon Case
Data Science Approach: Machine Learning
Design Decisions
Experimental Design and Key Tasks
Experimental Results
Can the computer match human decision-making?
Machine Bias Could Influence Algorithmic Outcomes in Three Meaningful Ways
Human Bias Could Influence Algorithms in Two Meaningful Ways
Discussion and Future Work
Recommendations and Conclusion
Bibliography
Appendices
Executive Summary
This report summarises the key findings from the Exploring the Ethical Implications of AI Bias as a Result of Workforce Gender Imbalance project by University of Melbourne researchers (from the Policy Lab and the School of Computing and Information Systems (CIS)/Centre for AI and Digital Ethics (CAIDE)). This interdisciplinary project draws upon sociological research and human panel experiments from the Policy Lab, and industry-standard data science approaches for machine learning and algorithmic development and evaluation from CIS/CAIDE.
This report will first discuss background research on gender bias with respect to hiring, in order to contextualise the results of the hiring decisions (and rationales) made by the Policy Lab panel when given resumés characteristic of the three industries, with applicants' names controlled (as markers of gender). Next, the report focuses on the design decisions and justifications for the machine learning algorithms (including transparency, participatory design, and use of open-source software), before describing the prototype machine learning algorithms developed to automate (and replicate) human judgements, as well as a critical investigation of their outputs. We use the term 'classifier' for an algorithm which attempts to classify a data point (in our case, e.g. 'hired' versus 'not hired'). The term 'predictor' or 'estimator' is used for an algorithm which, as its name implies, predicts an unknown value (in our case, e.g. whether a candidate is ranked '2nd' or '5th'). Hence, throughout the report, the terminology used reflects the context in which the algorithm in question is discussed2.

The report finds that both human bias and machine bias can influence hiring outcomes, illustrating three hypotheses on how algorithms affect hiring decisions and entrench stereotypes, and two hypotheses on how human factors also play a role. Recommendations to reduce bias include: training programs for human resource professionals; audits of hiring and gender discrimination across all positions; establishing quota systems for hiring; and creating proprietary hiring algorithms that are transparent and trained with the express aim of reducing gender bias (with regular audits).

2 Further reading: Han, Pei & Kamber 2011.
Gendered Roles, Gendered Judgements
Gender bias occurs when specific traits tied to broader expectations of gender are applied to individuals regardless of whether an individual exhibits these traits3. Gender bias often extends into job recruitment when hiring panels rely on existing heuristics to rank men and women as qualified for positions4.
3 Sczesny et al. 2016  4 Cohen 1976; Russo 1976  5 Bailey, LaFrance & Dovidio 2018  6 Burgess 2013  7 Cohen 1976  8 Gaucher et al. 2011  9 Sczesny et al. 2016
Throughout history, women's position in society has been relegated5 to the role of homemaker. Although this gender-role norm has somewhat weakened6 today, its lingering impact still elicits subconscious gender bias in the modern-day recruitment arena. As women are often still associated with domestic work, they are presumed to be less productive in the workplace than men. Despite identical qualifications and skills, employers often favour male candidates, describing them as 'competitive', 'experienced' and 'ambitious' in comparison to women7. It remains unclear, however, how these gender associations have been generated and why these specific traits became part of the recruitment success metrics.
These descriptions are not limited to employers. In fact, men and women often describe themselves with these gendered adjectives8. Women often use more communal, social and expressive vocabularies in comparison to men, who use language that is more managerial and directive. Moreover, these gender differences are apparent in how society describes itself. Descriptions of men contain words that highlight 'prominence', such as 'outstanding' or 'unique', whilst women are often described with more 'social' connotations, such as 'warm' and 'collaborative'9.
[Figure 1 here. Panellists: graduate students with experience in hiring. Resumés: gender-manipulated and designed for three specific industries – male-dominated (Data Analyst), gender-balanced (Finance Officer) and female-dominated (Recruitment Officer). Each panellist assesses three sets of resumés.]
Designing research: gender biases in panellists' recruitment
Our research first investigates how gender bias is incorporated into employment recruitment. We observe hiring behaviours for three specific occupational roles, selected for their gender ratios, to include: male- dominated, gender-balanced and female- dominated industries. The three occupational roles are Data Analyst, Finance Officer and Recruitment Officer respectively (Figure 1).
For each occupational role, we provided hiring panellists with a set of real resumés, with a balanced ratio of male and female candidates. Half of the panellists were given the original resumés, without the gender of the candidate manipulated. The other half were given the exact same resumés, but with the gender changed (male to female and female to male – indicated by names as the gender variable; for instance, 'Mark' to 'Sarah', or 'Rachel' to 'John'). The hiring panellists within each group were then instructed to rank each resumé individually, with a lower value denoting a better rank (i.e., their favoured candidate has rank 1). They were asked to provide their ranked lists to the researchers prior to the group meeting. Then, the panellists deliberated in groups of three to collectively decide on the Top 3 and Bottom 3 resumés for each occupational role. Finally, we used these discussions and rankings to develop a hiring algorithm that produces a consensus ranking of candidates for the position.
Figure 1. Experimental setup for human panellists in assessing resumés
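To illustrate the final aggregation step only, the sketch below (using invented panellist and candidate names and ranks, and not the project's actual algorithm from the Jupyter Notebook) shows one simple way of turning individual rankings into a consensus ranking by averaging ranks:

```python
# Minimal sketch of a mean-rank consensus; all names and ranks are hypothetical.
import pandas as pd

# One row per panellist, one column per candidate; values are ranks (1 = most favoured).
rankings = pd.DataFrame(
    {"Sarah": [1, 2, 3], "John": [2, 1, 1], "Rachel": [3, 3, 2]},
    index=["panellist_1", "panellist_2", "panellist_3"],
)

# Lower mean rank = better consensus position.
consensus = rankings.mean().sort_values()
print(consensus)  # e.g. John first, then Sarah, then Rachel
```

Mean rank is only one of several possible aggregation rules; the accompanying project notebook documents the approach actually used.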
Quantifying gender bias in panellists' recruitment decisions
The findings in Figure 2 demonstrate that panellists of both genders ranked female resumés lower than male resumés in both male-dominated (Data Analyst) and gender-balanced (Finance Officer) roles, on average.
On the other hand, female resumés are favoured more in the female- dominated (Recruitment Officer) role, being almost half a rank better than male resumés on average.
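The kind of aggregation behind Figure 2 can be sketched as follows; the ranks below are invented for illustration and are not the study's data:

```python
# Average individual rank by candidate gender within each role (illustrative values).
import pandas as pd

df = pd.DataFrame({
    "role":   ["Data Analyst", "Data Analyst", "Finance Officer", "Finance Officer",
               "Recruitment Officer", "Recruitment Officer"],
    "gender": ["male", "female"] * 3,
    "rank":   [3.8, 4.6, 4.0, 4.5, 4.7, 4.2],   # hypothetical average ranks
})

# Lower values indicate more favourable rankings.
print(df.groupby(["role", "gender"])["rank"].mean().unstack())
```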
Figure 2a. Average individual rankings across 40 panellists, by candidate gender (male vs. female) and role (Data Analyst, Finance Officer, Recruitment Officer). Note: the lower the value, the higher the ranking.
Figure 2b. Average rankings by (a) female panellists and (b) male panellists, by candidate gender and role (Data Analyst, Finance Officer, Recruitment Officer). Note: the lower the value, the higher the ranking.
Figure 3. Count of topmost (ranks 1, 2 and 3) and bottommost (ranks 6, 7 and 8) rankings for resumés by gender, per industry type
As shown in Figure 3, our male candidates were more often ranked in the Top 3 for all jobs listed. By contrast, our female candidates were more often ranked in the Bottom 3. The figures also demonstrate that gender bias is most pronounced among the Bottom 3 candidates for the Finance Officer role, with male resumés half as likely as female resumés to be ranked at the bottom (9 male vs. 18 female). As a close second, the Top 3 rankings for Finance Officer also illustrate a possible gender bias, with more male than female resumés ranked in the Top 3 (15 male vs. 11 female).
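The tallies reported in Figure 3 amount to a simple count of appearances in the Top 3 and Bottom 3, as in the sketch below (illustrative data only, not the study's group rankings):

```python
# Count Top 3 vs Bottom 3 appearances by gender (hypothetical consensus ranks).
import pandas as pd

group_ranks = pd.DataFrame({
    "gender": ["male", "female"] * 4,
    "rank":   [1, 2, 3, 4, 5, 6, 7, 8],
})

top3 = group_ranks[group_ranks["rank"] <= 3].groupby("gender").size()
bottom3 = group_ranks[group_ranks["rank"] >= 6].groupby("gender").size()
print(top3)     # counts of male vs. female resumés in the Top 3
print(bottom3)  # counts of male vs. female resumés in the Bottom 3
```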
Patterns of heuristic judgements
Our study uncovered several themes during deliberation. We identified subjective assessments from the panellists that may increase the risk of introducing gender biases in hiring decisions. These findings are consistent with extant research on decision-making heuristic biases10.
First impression as a ruling lens
During each deliberation, panellists expressed their first impressions of the candidates' 'self-presentation'. These first impressions emerged from the structure of the resumé, its layout and word choices in self-description. Most panellists believe that these components signify a candidate's passion, likeability and resilience. These assumptive associations are often referred to as first impression bias, in which one's judgement is influenced by societal and cultural connotations and personal encounters11.
Personal experience as a benchmark of quality
Panellists assess the resumés against the given job descriptions, including qualifications, skills and educational backgrounds; however, most panellists describe the term ‘experienced’ subjectively, by comparing against their own background. Studies have identified this pattern as an affinity bias, which has the potential to exclude women and other minority groups when underrepresented on hiring panels.12 This includes the panellists’ varying judgements over what ‘job churn’ means.
Whilst some favour this component as they believe it indicates versatility and agility based on their experience, others perceive this component as undesirable as it may indicate an undervaluing of growth and persistence. Given that females often have gaps in employment for childrearing, these differences in opinions may introduce gender bias.
10 Lim, Benbasat & Ward 2000; Lindsay & Norman 2013; Rivera 2012  11 Judge & Cable 2004; Rivera 2012  12 Lewicki et al. 2016
Algorithms: eradicating or exacerbating gender bias?
These discussions and group rankings guided us to develop a suite of automated and semi-automated algorithms to rate the candidates' suitability for each of these jobs. The idea of adopting hiring algorithms was initially grounded on the premises that, firstly, the absence of human intervention gives rise to purported impartiality and neutrality. Secondly, technology enables optimal efficiency and accuracy in sorting a massive volume of applications with minimum cost and for maximum benefit of the company13.
Theoretically, hiring algorithms should be able to create an optimum amalgam of excellent candidates based on pure meritocracy. However, decision-making algorithms – such as those designed to mimic how a human would, in this case, choose a potential employee – ultimately work with associations, just as our human brains do. Without careful risk mitigation, algorithms are not immune to gender bias; in fact, in some cases, they may instead exacerbate gender bias14.
Gender bias in hiring algorithms can occur in three forms:
Bias in datasets
The limitation of the datasets is a major factor in the algorithms' proneness to discrimination, as the data defines the scope against which a candidate is benchmarked15. If the algorithms are fed limited datasets with which to assess and rank candidates, the algorithms will form their benchmark based on this specific set of data. Without proportional representation of gender and other protected attributes, datasets can introduce bias into algorithmic judgements. Simply put, if the data lacks enough female candidates, the algorithm will make hiring decisions based largely on male attributes.
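A minimal sketch of such a dataset check is shown below; the column names and values are hypothetical and not drawn from the project's data. Before training, the gender composition of the training set and the historical hiring rate per gender are inspected:

```python
# Audit the training data: gender representation and historical hiring rates.
import pandas as pd

train = pd.DataFrame({
    "gender": ["male"] * 70 + ["female"] * 30,          # imbalanced, for illustration
    "hired":  [1] * 40 + [0] * 30 + [1] * 10 + [0] * 20,
})

print(train["gender"].value_counts(normalize=True))  # share of each gender in the data
print(train.groupby("gender")["hired"].mean())       # base hiring rate the model would learn
```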
Bias in the system
Correlational bias remains a major concern in algorithms, including those designed for hiring management16. One of the most prominent reasons for this is the use of proxy attributes to represent a parameter or an individual. For instance, algorithms make correlations between 'creativity' and the individual's length of employment within the same job17. They also make associations between higher levels of 'inquisitiveness' and their likelihood of finding other opportunities. When these correlations are made based on demographic traits, such as neighbourhood, race or gender, the algorithms are at risk of bias that can influence the whole corporate culture18. Therefore, when these proxies are embedded within the algorithms' judgement of suitability for employment, they will repeat the same societal bias. For women, this may lead to discrimination based on schooling, certain types of extracurricular activities, employment gaps for parental leave, and/or other correlated gendered characteristics. Another example is the models used in natural language processing: these models, trained on large corpora of language data (from real-world news sites to webpages), will pick up any biased language usage, however subtle. As a result, these biases, in one form or another, will manifest themselves statistically in the language models19.
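The proxy problem can be made visible with a simple cross-tabulation, as in the sketch below. The 'womens_college' flag is an invented example of a proxy attribute, not a feature from the project's data:

```python
# Even if 'gender' is excluded from the model, a skewed attribute can leak it.
import pandas as pd

df = pd.DataFrame({
    "gender":          ["female"] * 50 + ["male"] * 50,
    "womens_college":  [1] * 20 + [0] * 30 + [0] * 50,   # hypothetical resumé keyword flag
})

# A strongly skewed table signals that the attribute acts as a gender proxy.
print(pd.crosstab(df["gender"], df["womens_college"], normalize="index"))
```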
Bias in human decisions
Algorithms are generally trained to record and memorise past decisions and learn from them20. This implies that the algorithms memorise the patterns of previous decisions and can replicate them. Without concrete mitigation plans, algorithms adopt human decision patterns and replicate them as a predictor of success metrics. Here, humans may rely on internalised gender bias to make hiring decisions, or hiring panels may fail to include sufficient female representation, leading to gender bias codified in the algorithms. Ultimately, algorithms trained with human interference can replicate human bias.
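The sketch below illustrates this point on synthetic data (all variables and coefficients are invented): a classifier trained on historical decisions that favoured men learns to weight gender heavily, even though a genuinely job-relevant feature is available:

```python
# A model trained on biased historical decisions reproduces the bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
gender = rng.integers(0, 2, n)          # 0 = female, 1 = male (assumed encoding)
skill = rng.normal(size=n)              # genuinely job-relevant feature
# Historical decisions favour men regardless of skill.
hired = (0.2 * skill + 1.5 * gender + rng.normal(scale=0.5, size=n)) > 0.75

model = LogisticRegression().fit(np.column_stack([gender, skill]), hired)
print(model.coef_)   # the gender coefficient dominates the skill coefficient
```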
13 Preuss 2017; Kulkarni & Che 2017  14 O'Neil 2016; Costa et al. 2020; Kim 2019  15 Costa et al. 2020  17 O'Neil 2016  18 O'Neil 2016  19 Chang, Prabhakaran & Ordonez 2019  20 O'Neil 2016
Lessons from the Amazon Case

The Literature Review documents that in 2014, Amazon generated hiring algorithms to predict the suitability of applicants. The algorithms were trained using internal company data from the past 10 years21. Years later, it was found that Amazon's hiring algorithms discriminated against female applicants22. This bias was not introduced by the algorithms; rather, it was a consequence of the biased datasets that mirror the existing gender inequality in the workplace23.

As the majority of Amazon's employees were Caucasian men, their hiring algorithms used this pattern as a determining factor of success, thereby discriminating against female candidates24. Keywords such as "all-women's college" and "female" served as proxies that ranked female applicants lower25.

Information Systems theory can also help explain the Amazon case. Research suggests that there is a reciprocal relationship between technologies, the organisational environment and organisational agents26. When ranking algorithms for recruitment are trained with biased data sets, the technology impacts the organisation in a way that reflects the organisational operation, while at the same time influencing the way it operates. This means hiring algorithms trained with biased data can replicate existing inequalities while also introducing new ones.

21 Costa et al. 2020  22 Bogen 2019; Dastin 2018  23 Costa et al. 2020; O'Neil 2016  24 Costa et al. 2020; Faragher 2019  26 Orlikowski 1991
Data Science Approach: Machine Learning
Design decision 1: Prioritising explainability and interpretability over commercial accuracy

The choice of machine-learning algorithms used is to be as transparent and as explainable28 as possible. Many existing algorithms do not show the math, logic or programming behind them. We prioritised transparency, which allows us to scrutinise the algorithmic judgements – with emphasis on gender bias propagated from humans29 – over the higher degrees of accuracy needed in a commercial or production-ready system. Hence, we refrain from using complex techniques such as deep neural networks30.
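As an example of the kind of transparent model this design decision favours, the sketch below (with assumed feature names, not the prototype from the project notebook) trains a shallow decision tree whose complete rule set can be printed and audited:

```python
# A shallow, human-readable model: every decision rule can be inspected.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features: years_experience, has_degree, referee_listed (hypothetical).
X = [[5, 1, 0], [2, 0, 1], [8, 1, 1], [1, 0, 0]]
y = ["hired", "not hired", "hired", "not hired"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["years_experience", "has_degree", "referee_listed"]))
```

By contrast, the learned weights of a deep neural network cannot be read off as rules in this way, which is why such techniques are avoided here.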
Design decision 2: User-centred design and domain expert knowledge
To simulate a real-world user-centred systems development process, our data scientist programmer (co-author McLoughney) is embedded within the panel sessions conducted at Policy Lab (by co-authors Ruppanner and Njoto). Our data scientist is required to observe, ask questions and solicit input from panel participants – including assumptions, justifications, and clarifications – in order to gain domain knowledge of the task at hand. This meant the hiring panel rather than the data scientist drove the logic of the