DOI: 10.1145/3575663
Published 20 January 2023 in Applied Data Science

What is Data Science?

Michael L. Brodie

Abstract

The Communications website, https://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts. Koby Mike and Orit Hazzan consider why multiple definitions are needed to pin down data science.

Related Scientific Articles

Foundations of Data Science

J. Hopcroft, R. Kannan, Avrim Blum

17 January 2020

Computer science as an academic discipline began in the 1960s. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context-free languages, and computability. In the 1970s, the study of algorithms was added as an important component of theory. The emphasis was on making computers useful. Today, a fundamental change is taking place and the focus is more on applications. There are many reasons for this change. The merging of computing and communications has played an important role. The enhanced ability to observe, collect, and store data in the natural sciences, in commerce, and in other fields calls for a change in our understanding of data and how to handle it in the modern setting. The emergence of the web and social networks as central aspects of daily life presents both opportunities and challenges for theory.

Data science curriculum in the iField

B. Bishop, Chirag Shah, I. Song + 10 others

30 July 2022

Many disciplines, including the broad Field of Information (iField), offer Data Science (DS) programs. There have been significant efforts exploring an individual discipline's identity and unique contributions to the broader DS education landscape. To advance DS education in the iField, the iSchool Data Science Curriculum Committee (iDSCC) was formed and charged with building and recommending a DS education framework for iSchools. This paper reports on the research process and findings of a series of studies to address important questions: What is the iField identity in the multidisciplinary DS education landscape? What is the status of DS education in iField schools? What knowledge and skills should be included in the core curriculum for iField DS education? What are the jobs available for DS graduates from the iField? What are the differences between graduate‐level and undergraduate‐level DS education? Answers to these questions will not only distinguish an iField approach to DS education but also define critical components of DS curriculum. The results will inform individual DS programs in the iField to develop curriculum to support undergraduate and graduate DS education in their local context.

In the Backrooms of Data Science

P. Almklov, Thomas Østerlie, Elena Parmiggiani

2022

Much information systems research on data science treats data as preexisting objects and focuses on how these objects are analyzed. Such a view, however, overlooks the work involved in finding and preparing the data in the first place, such that they are available to be analyzed. In this paper, we draw on a longitudinal study of data management in the oil and gas industry to shed light on this backroom data work. We find that this type of work is qualitatively different from the front-stage data analytics in the realm of data science but is also deeply interwoven with it. We show that this work is unstable and bidirectional. That is, the work practices are constantly changing and must simultaneously take into account what data might be possible to access as well as the potential future uses of the data. It is also a collaborative endeavor involving cross-disciplinary expertise that seeks to establish control over data and is shaped by the epistemological orientation of the oil and gas domain.

What issues are data scientists talking about? Identification of current data science issues using semantic content analysis of Q&A communities

Fatih Gurcan

18 May 2023

Background: Because of the growing involvement of communities from various disciplines, data science is constantly evolving and gaining popularity. The growing interest in data science-based services and applications presents numerous challenges for their development. Therefore, data scientists frequently turn to various forums, particularly domain-specific Q&A websites, to solve difficulties. These websites evolve into data science knowledge repositories over time. Analysis of such repositories can provide valuable insights into the applications, topics, trends, and challenges of data science. Methods: In this article, we investigated what data scientists are asking by analyzing all posts to date on DSSE, a data science-focused Q&A website. To discover main topics embedded in data science discussions, we used latent Dirichlet allocation (LDA), a probabilistic approach for topic modeling. Results: As a result of this analysis, 18 main topics were identified that demonstrate the current interests and issues in data science. We then examined the topics’ popularity and difficulty. In addition, we identified the most commonly used tasks, techniques, and tools in data science. As a result, “Model Training”, “Machine Learning”, and “Neural Networks” emerged as the most prominent topics. Also, “Data Manipulation”, “Coding Errors”, and “Tools” were identified as the most viewed (most popular) topics. On the other hand, the most difficult topics were identified as “Time Series”, “Computer Vision”, and “Recommendation Systems”. Our findings have significant implications for many data science stakeholders who are striving to advance data-driven architectures, concepts, tools, and techniques.

A Call for a Humanistic Stance Toward K–12 Data Science Education

Michelle Wilkerson, Kathryn A. Lanouette, Victor R. Lee

23 September 2021

There is growing interest in how to better prepare K–12 students to work with data. In this article, we assert that these discussions of teaching and learning must attend to the human dimensions of data work. Specifically, we draw from several established lines of research to argue that practices involving the creation and manipulation of data are shaped by a combination of personal experiences, cultural tools and practices, and political concerns. We demonstrate through two examples how our proposed humanistic stance highlights ways that efforts to make data personally relevant for youth also necessarily implicate cultural and sociopolitical dimensions that affect the design and learning opportunities in data-rich learning environments. We offer an interdisciplinary framework based on literature from multiple bodies of educational research to inform design, teaching and research for more effective, responsible, and inclusive student learning experiences with and about data.
