Retiring Adult: New Datasets for Fair Machine Learning
Abstrak
Although the fairness community has recognized the importance of data, researchers in the area primarily rely on UCI Adult when it comes to tabular data. Derived from a 1994 US Census survey, this dataset has appeared in hundreds of research papers where it served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets derived from US Census surveys that extend the existing data ecosystem for research on fair machine learning. We create prediction tasks relating to income, employment, health, transportation, and housing. The data span multiple years and all states of the United States, allowing researchers to study temporal shift and geographic variation. We highlight a broad initial sweep of new empirical insights relating to trade-offs between fairness criteria, performance of algorithmic interventions, and the role of distribution shift based on our new datasets. Our findings inform ongoing debates, challenge some existing narratives, and point to future research directions. Our datasets are available at https://github.com/zykls/folktables.
Artikel Ilmiah Terkait
C. Haas Simon Caton
4 Oktober 2020
When Machine Learning technologies are used in contexts that affect citizens, companies as well as researchers need to be confident that there will not be any unexpected social implications, such as bias towards gender, ethnicity, and/or people with disabilities. There is significant literature on approaches to mitigate bias and promote fairness, yet the area is complex and hard to penetrate for newcomers to the domain. This article seeks to provide an overview of the different schools of thought and approaches that aim to increase the fairness of Machine Learning. It organizes approaches into the widely accepted framework of pre-processing, in-processing, and post-processing methods, subcategorizing into a further 11 method areas. Although much of the literature emphasizes binary classification, a discussion of fairness in regression, recommender systems, and unsupervised learning is also provided along with a selection of currently available open source libraries. The article concludes by summarizing open challenges articulated as five dilemmas for fairness research.
E. Shmueli Dana Pessach
3 Februari 2022
An increasing number of decisions regarding the daily lives of human beings are being controlled by artificial intelligence and machine learning (ML) algorithms in spheres ranging from healthcare, transportation, and education to college admissions, recruitment, provision of loans, and many more realms. Since they now touch on many aspects of our lives, it is crucial to develop ML algorithms that are not only accurate but also objective and fair. Recent studies have shown that algorithmic decision making may be inherently prone to unfairness, even when there is no intention for it. This article presents an overview of the main concepts of identifying, measuring, and improving algorithmic fairness when using ML algorithms, focusing primarily on classification tasks. The article begins by discussing the causes of algorithmic bias and unfairness and the common definitions and measures for fairness. Fairness-enhancing mechanisms are then reviewed and divided into pre-process, in-process, and post-process mechanisms. A comprehensive comparison of the mechanisms is then conducted, toward a better understanding of which mechanisms should be used in different scenarios. The article ends by reviewing several emerging research sub-fields of algorithmic fairness, beyond classification.
Giovanna Guerrini B. Catania Chiara Accinelli
9 Juni 2022
The data science era is characterized by data-driven automated decision systems (ADS) enabling, through data analytics and machine learning, automated decisions in many contexts, deeply impacting our lives. As such, their downsides and potential risks are becoming more and more evident: technical solutions, alone, are not sufficient and an interdisciplinary approach is needed. Consequently, ADS should evolve into data-informed ADS , which take humans in the loop in all the data processing steps. Data-informed ADS should deal with data responsibly , guaranteeing nondiscrimination with respect to protected groups of individuals. Nondiscrimination can be characterized in terms of different types of properties, like fairness and diversity. While fairness, i.e., absence of bias against minorities, has been widely investigated in machine learning, only more recently this issue has been tackled by considering all the steps of data processing pipelines at the basis of ADS, from data acquisition to analysis. Additionally, fairness is just one point of view of nondiscrimination to be considered for guaranteeing equity: other issues, like diversity, are raising interest from the scientific community due to their relevance in society. This paper aims at critically surveying how nondiscrimination has been investigated in the context of complex data science pipelines at the basis of data-informed ADS, by focusing on the specific data processing tasks for which nondiscrimination solutions have been proposed.
Jiongli Zhu Babak Salimi Sainyam Galhotra + 1 lainnya
21 Desember 2022
This paper proposes a novel framework for certifying the fairness of predictive models trained on biased data. It draws from query answering for incomplete and inconsistent databases to formulate the problem of consistent range approximation (CRA) of fairness queries for a predictive model on a target population. The framework employs background knowledge of the data collection process and biased data, working with or without limited statistics about the target population, to compute a range of answers for fairness queries. Using CRA, the framework builds predictive models that are certifiably fair on the target population, regardless of the availability of external data during training. The framework's efficacy is demonstrated through evaluations on real data, showing substantial improvement over existing state-of-the-art methods.
Zhenpeng Chen M. Harman J Zhang + 2 lainnya
15 Juli 2022
This article provides a comprehensive survey of bias mitigation methods for achieving fairness in Machine Learning (ML) models. We collect a total of 341 publications concerning bias mitigation for ML classifiers. These methods can be distinguished based on their intervention procedure (i.e., pre-processing, in-processing, post-processing) and the technique they apply. We investigate how existing bias mitigation methods are evaluated in the literature. In particular, we consider datasets, metrics, and benchmarking. Based on the gathered insights (e.g., What is the most popular fairness metric? How many datasets are used for evaluating bias mitigation methods?), we hope to support practitioners in making informed choices when developing and evaluating new bias mitigation methods.
Daftar Referensi
0 referensiTidak ada referensi ditemukan.
Artikel yang Mensitasi
0 sitasiTidak ada artikel yang mensitasi.