Data science: a game changer for science and innovation
Abstract
This paper shows data science's potential for disruptive innovation in science, industry, policy, and people's lives. We discuss how data science will impact science and society at large in the coming years, including the ethical problems of managing human behavioral data and the quantitative expectations for data science's economic impact. We introduce concepts such as open science and e-infrastructure as useful tools for supporting ethical data science and for training new generations of data scientists. Finally, this work outlines the SoBigData Research Infrastructure as an easy-to-access platform for executing complex data science processes. The services proposed by SoBigData aim to use data science to understand the complexity of our contemporary, globally interconnected society.
Related Scientific Articles
S. Vadhan, Sophia Song, Tania Schlatter + 2 others
23 February 2023
Across academia, government, and industry, data stewards are facing increasing pressure to make datasets more openly accessible for researchers while also protecting the privacy of data subjects. Differential privacy (DP) is one promising way to offer privacy along with open access, but further inquiry is needed into the tensions between DP and data science. In this study, we conduct interviews with 19 data practitioners who are non-experts in DP as they use a DP data analysis prototype to release privacy-preserving statistics about sensitive data, in order to understand perceptions, challenges, and opportunities around using DP. We find that while DP is promising for providing wider access to sensitive datasets, it also introduces challenges into every stage of the data science workflow. We identify ethics and governance questions that arise when socializing data scientists around new privacy constraints and offer suggestions to better integrate DP and data science.
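The abstract does not say how the study's prototype computes its privacy-preserving statistics. To make the idea of a DP release concrete, here is a minimal sketch assuming the standard Laplace mechanism for a counting query; this is one textbook DP technique, not necessarily what the prototype uses, and the function name `dp_count` is hypothetical. A count has sensitivity 1 (adding or removing one person changes it by at most 1), so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Hypothetical helper: release a count with epsilon-DP via the Laplace mechanism.

    A counting query has sensitivity 1, so Laplace noise with
    scale = sensitivity / epsilon = 1 / epsilon is sufficient.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws
    # is Laplace-distributed with mean 0 and the given scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Usage (illustrative data): a noisy count of records aged 65 or older.
ages = [23, 67, 45, 71, 80, 34, 66]
print(dp_count(ages, lambda a: a >= 65, epsilon=0.5, rng=random.Random(0)))
```

Smaller epsilon means a larger noise scale, so stronger privacy comes at the cost of less accurate statistics; this accuracy trade-off is one source of the workflow challenges the interviewees describe.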
David Donoho
2 October 2023
A purported 'AI Singularity' has been in the public eye recently. Mass media and US national political attention have focused on 'AI Doom' narratives hawked by social media influencers. The European Commission is announcing initiatives to forestall 'AI Extinction'. In my opinion, 'AI Singularity' is the wrong narrative for what's happening now; recent happenings signal something else entirely. Something fundamental to computation-based research really changed in the last ten years. In certain fields, progress is dramatically more rapid than before, as the fields undergo a transition to frictionless reproducibility (FR). This transition markedly changes the rate at which ideas and practices spread, affects mindsets, and erases memories of much that came before. The emergence of frictionless reproducibility follows from the maturation of three data science principles in the last decade. Those principles involve data sharing, code sharing, and competitive challenges, but implemented in the particularly strong form of frictionless open services. Empirical Machine Learning (EML) is today's leading adherent field, and its consequent rapid changes are responsible for the AI progress we see. Still, other fields can and do benefit when they adhere to the same principles. Many rapid changes from this maturation are misidentified. The advent of FR in EML generates a steady flow of innovations; this flow stimulates outsider intuitions that there's an emergent superpower somewhere in AI. This opens the way for PR to push worrying narratives: not only 'AI Extinction', but also the supposed monopoly of big tech on AI research. The helpful narrative observes that the superpower of EML is adherence to frictionless reproducibility practices; these practices are responsible for the striking progress in AI that we see everywhere.
P. Subedi, Jack Smith, Y. Samuel + 2 others
19 April 2021
There has been increasing interest in, and a growing need for, high performance computing (HPC), popularly known as supercomputing, in domains such as textual analytics, business domain analytics, forecasting and natural language processing (NLP), in addition to the relatively mature supercomputing domains of quantum physics and biology. HPC has been widely used in computer science (CS) and other traditionally computation-intensive disciplines, but has remained largely siloed away from the vast array of social, behavioral, business and economics disciplines. However, with ubiquitous big data, there is a compelling need to make HPC technologically and economically accessible, easy to use, and operationally democratized. Therefore, this research makes two key contributions. The first is the articulation of strategies based on availability, accessibility and usability for the demystification and democratization of HPC, based on an analytical review of Caliburn, a notable supercomputer at its inception. The second is a set of principles for HPC adoption based on an experiential narrative of HPC usage for textual analytics and NLP of social media data from a first-time user perspective. Both the HPC usage process and the output of the early-stage analytics are summarized. This research study synthesizes expert input on HPC democratization strategies and chronicles, from a multidisciplinary perspective, the challenges and opportunities of a case of rapid adoption of supercomputing for textual analytics and NLP. Deductive logic is used to identify strategies that can lead to efficacious engagement, adoption, production and sustained usage for research, teaching, application and innovation by researchers, faculty, professionals and students across a broad range of disciplines.
Julia S. Lowndes, A. Fredston
7 July 2023
Open science is a global movement happening across all research fields. Enabled by technology and the open web, it builds on years of efforts by individuals, grassroots organizations, institutions, and agencies. The goal is to share knowledge and broaden participation in science, from early ideation to making research outputs openly accessible to all (open access). With an emphasis on transparency and collaboration, the open science movement dovetails with efforts to increase diversity, equity, inclusion, and belonging in science and society. The US Biden-Harris Administration and many US government agencies have declared 2023 the Year of Open Science, providing a great opportunity to boost participation in open science for the oceans. For researchers day-to-day, open science is a critical piece of modern analytical workflows that handle increasing amounts of data. Therefore, we focus this article on open data science (the tooling and people enabling reproducible, transparent, inclusive practices for data-intensive research) and its intersection with the marine sciences. We discuss the state of various dimensions of open science and argue that technical advancements have outpaced our field's cultural change to incorporate them. Increasing inclusivity and building technical skills are interlinked and must be prioritized within the marine science community to find collaborative solutions for responding to climate change and other threats to marine biodiversity and society. Expected final online publication date for the Annual Review of Marine Science, Volume 16 is January 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
P. Almklov, Thomas Østerlie, Elena Parmiggiani
2022
Much information systems research on data science treats data as preexisting objects and focuses on how these objects are analyzed. Such a view, however, overlooks the work involved in finding and preparing the data in the first place, such that they are available to be analyzed. In this paper, we draw on a longitudinal study of data management in the oil and gas industry to shed light on this backroom data work. We find that this type of work is qualitatively different from the front-stage data analytics in the realm of data science but is also deeply interwoven with it. We show that this work is unstable and bidirectional. That is, the work practices are constantly changing and must simultaneously take into account what data might be possible to access as well as the potential future uses of the data. It is also a collaborative endeavor involving cross-disciplinary expertise that seeks to establish control over data and is shaped by the epistemological orientation of the oil and gas domain.