DOI: 10.18653/v1/2023.emnlp-main.797
Published on 23 October 2023 at the Conference on Empirical Methods in Natural Language Processing

We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

Jan Philip Wahle Mohamed Abdalla Saif Mohammad + 2 authors

Abstract

Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyze ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular -- citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.
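The Citation Field Diversity Index (CFDI) reported above is a score on a 0-1 scale summarizing how evenly a body of citations spreads across fields. The paper's exact definition is not reproduced here; as a hedged illustration only, the sketch below computes a generic field-diversity score (the Gini-Simpson index: 1 minus the sum of squared field proportions) over the fields of a paper's outgoing citations. The function and field labels are hypothetical.

```python
from collections import Counter

def field_diversity(cited_fields):
    """Gini-Simpson diversity over the fields of a paper's outgoing citations.

    `cited_fields` holds one field label per cited paper. Returns a value in
    [0, 1]: 0 when every citation goes to a single field, approaching 1 as
    citations spread evenly across many fields.
    """
    if not cited_fields:
        return 0.0
    counts = Counter(cited_fields)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# A citation list dominated by one field scores low ...
print(field_diversity(["computer science"] * 9 + ["linguistics"]))  # ~0.18
# ... while an even spread across four fields scores high.
print(field_diversity(["computer science", "linguistics", "math", "psychology"]))  # 0.75
```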

Related Scholarly Articles

The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research

Saif M. Mohammad Jan Philip Wahle Aurélie Névéol + 4 others

4 May 2023

Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. Since industry is one of the big players in NLP, alongside governments and universities, it is important to track its influence on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore the industry presence in the field since the early 90s. We find that industry presence among NLP authors had been steady before a steep increase over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.

NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature

Saif M. Mohammad

31 May 2020

As part of the NLP Scholar project, we created a single unified dataset of NLP papers and their meta-information (including citation numbers) by extracting and aligning information from the ACL Anthology and Google Scholar. In this paper, we describe several interconnected interactive visualizations (dashboards) that present various aspects of the data. Clicking on an item within a visualization or entering query terms in the search boxes filters the data in all visualizations in the dashboard. This allows users to search for papers in their area of interest, published within specific time periods, published by specified authors, etc. The interactive visualizations presented here, and the associated dataset of papers mapped to citations, have additional uses as well, including understanding how the field is growing (both overall and across sub-areas) and quantifying the impact of different types of papers on subsequent publications.
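As a rough illustration of the kind of filtering the dashboards perform (not the actual NLP Scholar implementation, whose dataset schema is not shown here), the Python sketch below filters a table of papers by author and publication-year range; the column names and sample rows are assumptions.

```python
import pandas as pd

# Hypothetical schema; the real NLP Scholar dataset may use different columns.
papers = pd.DataFrame({
    "title": ["Paper A", "Paper B", "Paper C"],
    "authors": ["Saif M. Mohammad", "Jane Doe; Saif M. Mohammad", "John Smith"],
    "year": [2018, 2020, 2015],
    "citations": [120, 45, 300],
})

def filter_papers(df, author=None, start_year=None, end_year=None):
    """Mimic a dashboard filter: keep rows matching an author substring and a year range."""
    mask = pd.Series(True, index=df.index)
    if author is not None:
        mask &= df["authors"].str.contains(author, case=False, regex=False)
    if start_year is not None:
        mask &= df["year"] >= start_year
    if end_year is not None:
        mask &= df["year"] <= end_year
    return df[mask]

# Papers by "Mohammad" published in 2016 or later.
print(filter_papers(papers, author="Mohammad", start_year=2016))
```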

What Can Natural Language Processing Do for Peer Review?

Jonathan K. Kummerfeld Nihar B. Shah Aurélie Névéol + 21 others

10 May 2024

The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time-consuming, and prone to error. Since the artifacts involved in peer review -- manuscripts, reviews, discussions -- are largely text-based, Natural Language Processing has great potential to improve reviewing. As the emergence of large language models (LLMs) has enabled NLP assistance for many new tasks, the discussion on machine-assisted peer review is picking up the pace. Yet, where exactly is help needed, where can NLP help, and where should it stand aside? The goal of our paper is to provide a foundation for future efforts in NLP for peer-review assistance. We discuss peer review as a general process, exemplified by reviewing at AI conferences. We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance, illustrated by existing work. We then turn to the big challenges in NLP for peer review as a whole, including data acquisition and licensing, operationalization and experimentation, and ethical issues. To help consolidate community efforts, we create a companion repository that aggregates key datasets pertaining to peer review. Finally, we issue a detailed call for action for the scientific community, NLP and AI researchers, policymakers, and funding bodies to help bring research in NLP for peer review forward. We hope that our work will help set the agenda for research in machine-assisted scientific quality control in the age of AI, within the NLP community and beyond.

From human writing to artificial intelligence generated text: examining the prospects and potential threats of ChatGPT in academic writing.

Ismail Dergaa H. Ben Saad P. Żmijewski + 1 other

1 April 2023

Natural language processing (NLP) has been studied in computing for decades. Recent technological advancements have led to the development of sophisticated artificial intelligence (AI) models, such as Chat Generative Pre-trained Transformer (ChatGPT). These models can perform a range of language tasks and generate human-like responses, which offers exciting prospects for academic efficiency. This manuscript aims at (i) exploring the potential benefits and threats of ChatGPT and other NLP technologies in academic writing and research publications; (ii) highlighting the ethical considerations involved in using these tools; and (iii) considering the impact they may have on the authenticity and credibility of academic work. This study involved a literature review of relevant scholarly articles published in peer-reviewed journals indexed in Scopus as quartile 1. The search used keywords such as "ChatGPT," "AI-generated text," "academic writing," and "natural language processing." The analysis was carried out using a quasi-qualitative approach, which involved reading and critically evaluating the sources and identifying relevant data to support the research questions. The study found that ChatGPT and other NLP technologies have the potential to enhance academic writing and research efficiency. However, their use also raises concerns about the impact on the authenticity and credibility of academic work. The study highlights the need for comprehensive discussions of the potential uses, threats, and limitations of these tools, emphasizing the importance of ethical and academic principles, with human intelligence and critical thinking at the forefront of the research process. It also recommends that academics exercise caution when using these tools and ensure transparency in their use.

Five sources of bias in natural language processing

Shrimai Prabhumoye E. Hovy

1 August 2021

Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptualize our research). We explore each of the bias sources in detail in this article, including examples and links to related work, as well as potential countermeasures.
