DOI: 10.1145/3392866
Published 28 May 2020 in Proc. ACM Hum. Comput. Interact.

How We've Taught Algorithms to See Identity: Constructing Race and Gender in Image Databases for Facial Analysis

M. Scheuerman, Kandrea Wade, Caitlin Lustig + 1 author

Abstract

Race and gender have long sociopolitical histories of classification in technical infrastructures, from the passport to social media. Facial analysis technologies are particularly pertinent to understanding how identity is operationalized in new technical systems. What facial analysis technologies can do is determined by the data available to train and evaluate them with. In this study, we specifically focus on this data by examining how race and gender are defined and annotated in image databases used for facial analysis. We found that the majority of image databases rarely contain underlying source material for how those identities are defined. Further, when they are annotated with race and gender information, database authors rarely describe the process of annotation. Instead, classifications of race and gender are portrayed as insignificant, indisputable, and apolitical. We discuss the limitations of these approaches given the sociohistorical nature of race and gender. We posit that the lack of critical engagement with this nature renders databases opaque and less trustworthy. We conclude by encouraging database authors to address both the histories of classification inherently embedded into race and gender, as well as their positionality in embedding such classifications.

Related Scholarly Articles

Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development

A. Hanna, M. Scheuerman, Emily L. Denton

9 August 2021

Data is a crucial component of machine learning. The field is reliant on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation - how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both a structured and thematic content analysis, we document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identify sit in opposition to social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.

“It’s Complicated”: Negotiating Accessibility and (Mis)Representation in Image Descriptions of Race, Gender, and Disability

Cynthia L. Bennett, Anhong Guo, Cole Gleason + 3 others

6 May 2021

Content creators are instructed to write textual descriptions of visual content to make it accessible; yet existing guidelines lack specifics on how to write about people’s appearance, particularly while remaining mindful of consequences of (mis)representation. In this paper, we report on interviews with screen reader users who were also Black, Indigenous, People of Color, Non-binary, and/or Transgender on their current image description practices and preferences, and experiences negotiating their own and others’ appearances non-visually. We discuss these perspectives, and the ethics of humans and AI describing appearance characteristics that may convey the race, gender, and disabilities of those photographed. In turn, we share considerations for more carefully describing appearance, and contexts in which such information is perceived salient. Finally, we offer tensions and questions for accessibility research to equitably consider politics and ecosystems in which technologies will embed, such as potential risks of human and AI biases amplifying through image descriptions.

FACET: Fairness in Computer Vision Evaluation Benchmark

Melissa Hall, Cheng-Yang Fu, Chloé Rolland + 5 others

31 August 2023

Computer vision models have known performance disparities across attributes such as gender and skin tone. This means during tasks such as classification and detection, model performance differs for certain classes based on the demographics of the people in the image. These disparities have been shown to exist, but until now there has not been a unified approach to measure these differences for common use-cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks - image classification, object detection and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, manually draw bounding boxes and label fine-grained person-related classes such as disc jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes using an intersectional approach (e.g. hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes. These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is available publicly at https://facet.metademolab.com.
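The disaggregated, intersectional evaluation that benchmarks like FACET enable can be illustrated with a minimal sketch. The record layout, attribute names, and values below are hypothetical stand-ins, not the actual FACET schema or API:

```python
# Minimal sketch of disaggregated evaluation across demographic attributes,
# in the spirit of FACET. Records, attribute names, and values here are
# hypothetical, not the real FACET annotations.
from collections import defaultdict

# Each record pairs a model's per-image outcome with annotated attributes.
records = [
    {"correct": True,  "skin_tone": "lighter", "hair_type": "straight"},
    {"correct": False, "skin_tone": "darker",  "hair_type": "coily"},
    {"correct": True,  "skin_tone": "darker",  "hair_type": "straight"},
    {"correct": True,  "skin_tone": "lighter", "hair_type": "coily"},
]

def accuracy_by_group(records, attributes):
    """Accuracy disaggregated by one attribute or an intersection of attributes."""
    hits, totals = defaultdict(int), defaultdict(int)
    for record in records:
        group = tuple(record[a] for a in attributes)
        totals[group] += 1
        hits[group] += int(record["correct"])
    return {group: hits[group] / totals[group] for group in totals}

# Single-attribute view, then an intersectional view (skin tone x hair type).
print(accuracy_by_group(records, ["skin_tone"]))
print(accuracy_by_group(records, ["skin_tone", "hair_type"]))

# One simple disparity summary: the gap between best- and worst-served groups.
per_group = accuracy_by_group(records, ["skin_tone"])
print(max(per_group.values()) - min(per_group.values()))
```

The toy accuracy metric stands in for whatever task-appropriate score (classification accuracy, detection AP, segmentation quality) a real evaluation would report per group and per intersection.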

Detecting race and gender bias in visual representation of AI on web search engines

M. Makhortykh, R. Ulloa, Aleksandra Urman

26 June 2021

Web search engines influence perception of social reality by filtering and ranking information. However, their outputs are often subjected to bias that can lead to skewed representation of subjects such as professional occupations or gender. In our paper, we use a mixed-method approach to investigate the presence of race and gender bias in representation of artificial intelligence (AI) in image search results coming from six different search engines. Our findings show that search engines prioritize anthropomorphic images of AI that portray it as white, whereas non-white images of AI are present only in non-Western search engines. By contrast, gender representation of AI is more diverse and less skewed towards a specific gender, which can be attributed to higher awareness about gender bias in search outputs. Our observations indicate both the need and the possibility for addressing bias in representation of societally relevant subjects, such as technological innovation, and emphasize the importance of designing new approaches for detecting bias in information retrieval systems.

An Exploration of Intersectionality in Software Development and Use

Alicia E. Boyd, Hana Winchester, Brittany Johnson

1 May 2022

The growing ubiquity of machine learning technologies has led to concern and concentrated efforts at improving data-centric research and practice. While much work has been done on addressing equity concerns with respect to unary identities (e.g., race or gender), little to no work in Software Engineering has studied intersectionality to determine how we can provide equitable outcomes for complex, overlapping social identities in data-driven tech. To this end, we designed a survey to learn the landscape of intersectional identities in tech, where these populations contribute data, and how marginalized populations feel about the impact technology has on their day-to-day lives. Our data thus far, collected from 12 respondents and composed mostly of white and male identities, further highlights the lack of representation in modern data sets and the need for contributions that explicitly explore how to support data-driven research and development. ACM Reference Format: Hana Winchester, Alicia E. Boyd, and Brittany Johnson. 2022. An Exploration of Intersectionality in Software Development and Use. In Third Workshop on Gender Equality, Diversity, and Inclusion in Software Engineering (GE@ICSE’22), May 20, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3524501.3527605


Citing Articles

1 citation