Can AI serve as a substitute for human subjects in software engineering research?
Abstract
Research within sociotechnical domains, such as software engineering, fundamentally requires the human perspective. Nevertheless, traditional qualitative data collection methods suffer from difficulties in participant recruitment, scaling, and labor intensity. This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI), especially large language models (LLMs) like ChatGPT and multimodal foundation models. We explore the potential of AI-generated synthetic text as an alternative source of qualitative data, discussing how LLMs can replicate human responses and behaviors in research settings. We discuss AI applications in emulating humans in interviews, focus groups, surveys, observational studies, and user evaluations. We discuss open problems and research opportunities to implement this vision. In the future, an integrated approach where both AI and human-generated data coexist will likely yield the most effective outcomes.
Related Scientific Articles
Piper Vasicek Kevin Seppi Courtni Byun
19 April 2023
Machine Learning models have become more advanced than could have been supposed even a few years ago, often surpassing human performance on many tasks. Large language models (LLM) can produce text indistinguishable from human-produced text. This begs the question, how necessary are humans - even for tasks where humans appear indispensable? Qualitative Analysis (QA) is integral to human-computer interaction research, requiring both human-produced data and human analysis of that data to illuminate human opinions about and experiences with technology. We use GPT-3 and ChatGPT to replace human analysis and then to dispense with human-produced text altogether. We find GPT-3 is capable of automatically identifying themes and generating nuanced analyses of qualitative data arguably similar to those written by human researchers. We also briefly ponder philosophical implications of this research.
Piyush Kulkarni Vinay Kulkarni Akanksha Somase + 1 other
22 February 2024
The Software Development Life Cycle (SDLC) comprises multiple phases, each requiring Subject Matter Experts (SMEs) with phase-specific skills. The efficacy and quality of deliverables of each phase are skill dependent. In recent times, Generative AI techniques, including Large-scale Language Models (LLMs) like GPT, have become significant players in software engineering. These models, trained on extensive text data, can offer valuable contributions to software development. Interacting with LLMs involves feeding prompts with the context information and guiding the generation of textual responses. The quality of the response is dependent on the quality of the prompt given. This paper proposes a systematic prompting approach based on meta-model concepts for SDLC phases. The approach is validated using ChatGPT for small but complex business application development. We share the approach and our experience, learnings, benefits obtained, and the challenges encountered while applying the approach using ChatGPT. Our experience indicates that Generative AI techniques, such as ChatGPT, have the potential to reduce the skills barrier and accelerate software development substantially.
Mazen Mohamad Ranim Khojah Philipp Leitner + 1 other
23 April 2024
Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently, there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how the (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.
Sajed Jalil Suzzana Rafi Kevin Moran + 2 others
7 February 2023
Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human written text, which includes code and natural language. However, despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users. The introduction of models, such as ChatGPT, has already spurred fervent discussion from educators, ranging from fear that students could use these AI tools to circumvent learning, to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge related to how well they perform in different educational settings, and the potential promise (or danger) that they might pose to traditional forms of instruction. As such, in this paper, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. We found that given its current capabilities, ChatGPT is able to respond to 77.5% of the questions we examined and that, of these questions, it is able to provide correct or partially correct answers in 55.6% of cases, provide correct or partially correct explanations of answers in 53.0% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct answers and explanations. Based on these findings, we discuss the potential promises and perils related to the use of ChatGPT by students and instructors.
Tahsinur Rahman M. Akbar Muhammad Hamza + 1 other
17 December 2023
This paper investigates the dynamics of human-AI collaboration in software engineering, focusing on the use of ChatGPT. Through a thematic analysis of a hands-on workshop in which 22 professional software engineers collaborated for three hours with ChatGPT, we explore the transition of AI from a mere tool to a collaborative partner. The study identifies key themes such as the evolving nature of human-AI interaction, the capabilities of AI in software engineering tasks, and the challenges and limitations of integrating AI in this domain. The findings show that while AI, particularly ChatGPT, improves the efficiency of code generation and optimization, human oversight remains crucial, especially in areas requiring complex problem-solving and security considerations. This research contributes to the theoretical understanding of human-AI collaboration in software engineering and provides practical insights for effectively integrating AI tools into development processes. It highlights the need for clear role allocation, effective communication, and balanced AI-human collaboration to realize the full potential of AI in software engineering.