The Craft and Coordination of Data Curation: Complicating Workflow Views of Data Science
Abstract
Data curation is the process of making a dataset fit-for-use and archivable. It is critical to data-intensive science because it makes complex data pipelines possible, studies reproducible, and data reusable. Yet the complexities of the hands-on, technical, and intellectual work of data curation are frequently overlooked or downplayed. Obscuring the work of data curation not only renders the labor and contributions of data curators invisible but also hides the impact that curators' work has on the later usability, reliability, and reproducibility of data. To better understand the work and impact of data curation, we conducted a close examination of data curation at a large social science data repository, the Inter-university Consortium for Political and Social Research (ICPSR). We asked: What does curatorial work entail at ICPSR, and what work is more or less visible to different stakeholders and in different contexts? And how is that curatorial work coordinated across the organization? We triangulated accounts of data curation from interviews and records of curation in Jira tickets to develop a rich and detailed account of curatorial work. While we identified numerous curatorial actions performed by ICPSR curators, we also found that curators rely on a number of craft practices to perform their jobs. The reality of their work practices defies the rote sequence of events implied by many life cycle or workflow models. Further, we show that craft practices are needed to enact data curation best practices and standards. The craft that goes into data curation is often invisible to end users, but it is well recognized by ICPSR curators and their supervisors. Explicitly acknowledging and supporting data curators as craftspeople is important in creating sustainable and successful curatorial infrastructures.
Related Scholarly Articles
P. Almklov Thomas Østerlie Elena Parmiggiani
2022
Much information systems research on data science treats data as preexisting objects and focuses on how these objects are analyzed. Such a view, however, overlooks the work involved in finding and preparing the data in the first place, such that they are available to be analyzed. In this paper, we draw on a longitudinal study of data management in the oil and gas industry to shed light on this backroom data work. We find that this type of work is qualitatively different from the front-stage data analytics in the realm of data science but is also deeply interwoven with it. We show that this work is unstable and bidirectional. That is, the work practices are constantly changing and must simultaneously take into account what data might be possible to access as well as the potential future uses of the data. It is also a collaborative endeavor involving cross-disciplinary expertise that seeks to establish control over data and is shaped by the epistemological orientation of the oil and gas domain.
Michael J. Muller Dakuo Wang Amy X. Zhang
18 January 2020
Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.
Tom Steinberger J. King M. Ackerman + 1 other
30 March 2022
Domain experts play an essential role in data science by helping data scientists situate their technical work beyond the statistical analysis of large datasets. How domain experts themselves may engage with data science tools as a type of end-user remains largely invisible. Understanding data science as domain expert-driven depends on understanding how domain experts use data. Drawing on an ethnographic study of a craft brewery in Korea, we show how craft brewers worked with data by situating otherwise abstract data within their brewing practices and settings. We contribute theoretical insight into how domain experts use data distinctly from technical data scientists in terms of their view of data (situated vs. abstract), purposes for engaging with data (guiding processes over predicting outcomes), and overall goals of using data (flexible control vs. precision). We propose four ways in which working with data can be supported through the design of data science tools, and discuss how craftwork can be a useful lens for integrating domain expert-driven understandings of data science into CSCW and HCI research.
Tianling Yang A. Hanna Diana Serbanescu + 3 others
3 March 2021
In industrial computer vision, discretionary decisions surrounding the production of image training data remain widely undocumented. Recent research taking issue with such opacity has proposed standardized processes for dataset documentation. In this paper, we expand this space of inquiry through fieldwork at two data processing companies and thirty interviews with data workers and computer vision practitioners. We identify four key issues that hinder the documentation of image datasets and the effective retrieval of production contexts. Finally, we propose reflexivity, understood as a collective consideration of social and intellectual factors that lead to praxis, as a necessary precondition for documentation. Reflexive documentation can help to expose the contexts, relations, routines, and power structures that shape data.
Michelle Wilkerson Kathryn A. Lanouette Victor R. Lee
23 September 2021
There is growing interest in how to better prepare K–12 students to work with data. In this article, we assert that these discussions of teaching and learning must attend to the human dimensions of data work. Specifically, we draw from several established lines of research to argue that practices involving the creation and manipulation of data are shaped by a combination of personal experiences, cultural tools and practices, and political concerns. We demonstrate through two examples how our proposed humanistic stance highlights ways that efforts to make data personally relevant for youth also necessarily implicate cultural and sociopolitical dimensions that affect the design and learning opportunities in data-rich learning environments. We offer an interdisciplinary framework based on literature from multiple bodies of educational research to inform design, teaching and research for more effective, responsible, and inclusive student learning experiences with and about data.