DOI: 10.1145/3649598
Published 2 March 2024 in ACM Transactions on Software Engineering and Methodology

Communicating Study Design Trade-offs in Software Engineering

Jin L. C. Guo, Igor Steinmacher, M. Nassif + 7 authors

Abstract

Reflecting on the limitations of a study is a crucial part of the research process. In software engineering studies, this reflection is typically conveyed through discussions of study limitations or threats to validity. In current practice, such discussions seldom provide sufficient insight to understand the rationale for decisions taken before and during the study, and their implications. We revisit the practice of discussing study limitations and threats to validity and identify its weaknesses. We propose to refocus this practice of self-reflection to a discussion centered on the notion of trade-offs. We argue that documenting trade-offs allows researchers to clarify how the benefits of their study design decisions outweigh the costs of possible alternatives. We present guidelines for reporting trade-offs in a way that promotes a fair and dispassionate assessment of researchers’ work.

Related Scientific Articles

SEGRESS: Software Engineering Guidelines for REporting Secondary Studies

D. Budgen, L. Madeyski, B. Kitchenham

1 March 2023

Context: Several tertiary studies have criticized the reporting of software engineering secondary studies. Objective: Our objective is to identify guidelines for reporting software engineering (SE) secondary studies which would address problems observed in the reporting of software engineering systematic reviews (SRs). Method: We review the criticisms of SE secondary studies and identify the major areas of concern. We assess the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement as a possible solution to the need for SR reporting guidelines, based on its status as the reporting guideline recommended by the Cochrane Collaboration whose SR guidelines were a major input to the guidelines developed for SE. We report its advantages and limitations in the context of SE secondary studies. We also assess reporting guidelines for mapping studies and qualitative reviews, and compare their structure and content with that of PRISMA 2020. Results: Previous tertiary studies confirm that reports of secondary studies are of variable quality. However, ad hoc recommendations that amend reporting standards may result in unnecessary duplication of text. We confirm that the PRISMA 2020 statement addresses SE reporting problems, but is mainly oriented to quantitative reviews, mixed-methods reviews and meta-analyses. However, we show that the PRISMA 2020 item definitions can be extended to cover the information needed to report mapping studies and qualitative reviews. Conclusions: In this paper and its Supplementary Material, we present and illustrate an integrated set of guidelines called SEGRESS (Software Engineering Guidelines for REporting Secondary Studies), suitable for quantitative systematic reviews (building upon PRISMA 2020), mapping studies (PRISMA-ScR), and qualitative reviews (ENTREQ and RAMESES), that addresses reporting problems found in current SE SRs.

How Should Software Engineering Secondary Studies Include Grey Material?

B. Kitchenham, L. Madeyski, D. Budgen

1 February 2023

Context: Recent papers have proposed the use of grey literature (GL) and multivocal reviews. These papers have raised issues about the practices used for systematic reviews (SRs) in software engineering (SE) and suggested that there should be changes to the current SR guidelines. Objective: To investigate whether current SR guidelines need to be changed to support GL and multivocal reviews. Method: We discuss the definitions of GL and the importance of GL and of industry-based field studies in SE SRs. We identify properties of SRs that constrain the material used in SRs: a) the nature of primary studies; b) the requirements of SRs to be auditable, traceable, and reproducible; and explain why these requirements restrict the use of blogs in SRs. Results: SR guidelines have always considered GL as a possible source of primary studies and have never supported exclusion of field studies that incorporate the practitioners’ viewpoint. However, the concept of GL, which was meant to refer to documents that were not formally published, is now being extended to information from sources such as blogs/tweets/Q&A posts. Thus, it might seem that SRs do not make full use of GL because they do not include such information. However, the unit of analysis for an SR is the primary study. Thus, it is not the source but the type of information that is important. Any report describing a rigorous empirical evaluation is a candidate primary study. Whether it is actually included in an SR depends on the SR eligibility criteria. However, any study that cannot be guaranteed to be publicly available in the long term should not be used as a primary study in an SR. This does not prevent such information from being aggregated in surveys of social media and used in the context of evidence-based software engineering (EBSE). Conclusions: Current guidelines for SRs do not require extensions, but their scope needs to be better defined. SE researchers require guidelines for analysing social media posts (e.g., blogs, tweets, vlogs), but these should be based on qualitative primary (not secondary) study guidelines. SE researchers can use mixed-methods SRs and/or the fourth step of EBSE to incorporate findings from social media surveys with those from SRs and to develop industry-relevant recommendations.

Decision-Making Principles for Better Software Design Decisions

A. Tang, R. Kazman

1 November 2021

Software design is about making decisions. The quality of design decisions influences the quality of software design. This article describes nine decision-making principles to give software designers a systematic approach for decision making.

Ethical Aspects of ChatGPT in Software Engineering Research

Peng Liang, A. Khan, M. Akbar

13 June 2023

ChatGPT can improve software engineering (SE) research practices by offering efficient, accessible information analysis, and synthesis based on natural language interactions. However, ChatGPT could bring ethical challenges, encompassing plagiarism, privacy, data security, and the risk of generating biased or potentially detrimental data. This research aims to fill the given gap by elaborating on the key elements: motivators, demotivators, and ethical principles of using ChatGPT in SE research. To achieve this objective, we conducted a literature survey, identified the mentioned elements, and presented their relationships by developing a taxonomy. Furthermore, the identified literature-based elements (motivators, demotivators, and ethical principles) were empirically evaluated by conducting a comprehensive questionnaire-based survey involving SE researchers. In addition, we employed an interpretive structure modeling approach to analyze the relationships between the ethical principles of using ChatGPT in SE research and develop a level-based decision model. We further conducted a cross-impact matrix multiplication applied to classification analysis to create a cluster-based decision model. These models aim to help SE researchers devise effective strategies for ethically integrating ChatGPT into SE research by following the identified principles by adopting the motivators and addressing the demotivators. The findings of this study will establish a benchmark for incorporating ChatGPT services in SE research with an emphasis on ethical considerations.

Can GPT-4 Replicate Empirical Software Engineering Research?

Carmen Badea, Thomas Zimmermann, Christian Bird + 4 others

3 October 2023

Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering- and science-related tasks, these models could help replicate and thus democratize empirical software engineering research. In this paper, we examine GPT-4’s abilities to perform replications of empirical software engineering research on new data. We specifically study their ability to surface assumptions made in empirical software engineering research methodologies, as well as their ability to plan and generate code for analysis pipelines on seven empirical software engineering papers. We perform a user study with 14 participants with software engineering research expertise, who evaluate GPT-4-generated assumptions and analysis plans (i.e., a list of module specifications) from the papers. We find that GPT-4 is able to surface correct assumptions, but struggles to generate ones that apply common knowledge about software engineering data. In a manual analysis of the generated code, we find that the GPT-4-generated code contains correct high-level logic, given a subset of the methodology. However, the code contains many small implementation-level errors, reflecting a lack of software engineering knowledge. Our findings have implications for leveraging LLMs for software engineering research as well as practitioner data scientists in software teams.
