DOI: 10.1111/lnc3.12432
Published 1 August 2021 in Language and Linguistics Compass

Five sources of bias in natural language processing

Shrimai Prabhumoye, E. Hovy

Abstract

Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptualize our research). We explore each of the bias sources in detail in this article, including examples and links to related work, as well as potential counter‐measures.

Related Articles

Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender

Anne Lauscher, Dirk Hovy, Archie Crowley

24 February 2022

The world of pronouns is changing – from a closed word class with few members to an open set of terms to reflect identities. However, Natural Language Processing (NLP) barely reflects this linguistic shift, resulting in the possible exclusion of non-binary users, even though recent work outlined the harms of gender-exclusive language technology. The current modeling of 3rd person pronouns is particularly problematic. It largely ignores various phenomena like neopronouns, i.e., novel pronoun sets that are not (yet) widely established. This omission contributes to the discrimination of marginalized and underrepresented groups, e.g., non-binary individuals. It thus prevents gender equality, one of the UN’s sustainable development goals (goal 5). Further, other identity-expressions beyond gender are ignored by current NLP technology. This paper provides an overview of 3rd person pronoun issues for NLP. Based on our observations and ethical considerations, we define a series of five desiderata for modeling pronouns in language technology, which we validate through a survey. We evaluate existing and novel modeling approaches w.r.t. these desiderata qualitatively and quantify the impact of a more discrimination-free approach on an established benchmark dataset.

Fairness And Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, And Mitigation Strategies

Emilio Ferrara

16 April 2023

The significant advancements in applying artificial intelligence (AI) to healthcare decision-making, medical diagnosis, and other domains have simultaneously raised concerns about the fairness and bias of AI systems. This is particularly critical in areas like healthcare, employment, criminal justice, credit scoring, and increasingly, in generative AI models (GenAI) that produce synthetic media. Such systems can lead to unfair outcomes and perpetuate existing inequalities, including generative biases that affect the representation of individuals in synthetic data. This survey study offers a succinct, comprehensive overview of fairness and bias in AI, addressing their sources, impacts, and mitigation strategies. We review sources of bias, such as data, algorithm, and human decision biases—highlighting the emergent issue of generative AI bias, where models may reproduce and amplify societal stereotypes. We assess the societal impact of biased AI systems, focusing on perpetuating inequalities and reinforcing harmful stereotypes, especially as generative AI becomes more prevalent in creating content that influences public perception. We explore various proposed mitigation strategies, discuss the ethical considerations of their implementation, and emphasize the need for interdisciplinary collaboration to ensure effectiveness. Through a systematic literature review spanning multiple academic disciplines, we present definitions of AI bias and its different types, including a detailed look at generative AI bias. We discuss the negative impacts of AI bias on individuals and society and provide an overview of current approaches to mitigate AI bias, including data pre-processing, model selection, and post-processing. We emphasize the unique challenges presented by generative AI models and the importance of strategies specifically tailored to address these. 
Addressing bias in AI requires a holistic approach involving diverse and representative datasets, enhanced transparency and accountability in AI systems, and the exploration of alternative AI paradigms that prioritize fairness and ethical considerations. This survey contributes to the ongoing discussion on developing fair and unbiased AI systems by providing an overview of the sources, impacts, and mitigation strategies related to AI bias, with a particular focus on the emerging field of generative AI.

Detecting race and gender bias in visual representation of AI on web search engines

M. Makhortykh, R. Ulloa, Aleksandra Urman

26 June 2021

Web search engines influence perception of social reality by filtering and ranking information. However, their outputs are often subjected to bias that can lead to skewed representation of subjects such as professional occupations or gender. In our paper, we use a mixed-method approach to investigate presence of race and gender bias in representation of artificial intelligence (AI) in image search results coming from six different search engines. Our findings show that search engines prioritize anthropomorphic images of AI that portray it as white, whereas non-white images of AI are present only in non-Western search engines. By contrast, gender representation of AI is more diverse and less skewed towards a specific gender, which can be attributed to higher awareness about gender bias in search outputs. Our observations indicate both the need and the possibility for addressing bias in representation of societally relevant subjects, such as technological innovation, and emphasize the importance of designing new approaches for detecting bias in information retrieval systems.

We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

Jan Philip Wahle, Mohamed Abdalla, Saif Mohammad + 2 others

23 October 2023

Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular, citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.
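The exact definition of the Citation Field Diversity Index is given in the paper itself; purely as an illustration of how a citation-field diversity score on a 0-to-1 scale can be computed, the sketch below uses a Gini-Simpson index over the fields a paper cites. The function name and the choice of index are assumptions for illustration, not the paper's definition.

```python
from collections import Counter

def field_diversity_index(cited_fields):
    """Gini-Simpson diversity over the fields a paper's citations go to.

    Illustrative stand-in for a CFDI-like score: returns 0.0 when all
    citations target a single field, and approaches 1.0 as citations
    spread evenly across many fields.
    """
    counts = Counter(cited_fields)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # 1 minus the probability that two random citations share a field.
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# All citations to one field: no diversity.
print(field_diversity_index(["CS"] * 10))  # 0.0
# Citations spread evenly over four fields: higher diversity.
print(field_diversity_index(["CS", "Linguistics", "Math", "Psychology"]))  # 0.75
```

Under such an index, the decline the authors report (0.58 to 0.31) corresponds to citations concentrating into fewer fields over time.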

Fairness in Machine Learning: A Survey

C. Haas, Simon Caton

4 October 2020

When Machine Learning technologies are used in contexts that affect citizens, companies as well as researchers need to be confident that there will not be any unexpected social implications, such as bias towards gender, ethnicity, and/or people with disabilities. There is significant literature on approaches to mitigate bias and promote fairness, yet the area is complex and hard to penetrate for newcomers to the domain. This article seeks to provide an overview of the different schools of thought and approaches that aim to increase the fairness of Machine Learning. It organizes approaches into the widely accepted framework of pre-processing, in-processing, and post-processing methods, subcategorizing into a further 11 method areas. Although much of the literature emphasizes binary classification, a discussion of fairness in regression, recommender systems, and unsupervised learning is also provided along with a selection of currently available open source libraries. The article concludes by summarizing open challenges articulated as five dilemmas for fairness research.
