ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks

Sourav Mazumdar G. Sridhara Ranjani H.G.

Abstrak

ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot launched by OpenAI on November 30, 2022. OpenAI's GPT-3 family of large language models serve as the foundation for ChatGPT. ChatGPT is fine-tuned with both supervised and reinforcement learning techniques and has received widespread attention for its articulate responses across diverse domains of knowledge. In this study, we explore how ChatGPT can be used to help with common software engineering tasks. Many of the ubiquitous tasks covering the breadth of software engineering such as ambiguity resolution in software requirements, method name suggestion, test case prioritization, code review, log summarization can potentially be performed using ChatGPT. In this study, we explore fifteen common software engineering tasks using ChatGPT. We juxtapose and analyze ChatGPT's answers with the respective state of the art outputs (where available) and/or human expert ground truth. Our experiments suggest that for many tasks, ChatGPT does perform credibly and the response from it is detailed and often better than the human expert output or the state of the art output. However, for a few other tasks, ChatGPT in its present form provides incorrect answers and hence is not suited for such tasks.

Artikel Ilmiah Terkait

Accelerating Software Development Using Generative AI: ChatGPT Case Study

Piyush Kulkarni Vinay Kulkarni Akanksha Somase + 1 lainnya

22 Februari 2024

The Software Development Life Cycle (SDLC) comprises multiple phases, each requiring Subject Matter Experts (SMEs) with phase-specific skills. The efficacy and quality of deliverables of each phase are skill dependent. In recent times, Generative AI techniques, including Large-scale Language Models (LLMs) like GPT, have become significant players in software engineering. These models, trained on extensive text data, can offer valuable contributions to software development. Interacting with LLMs involves feeding prompts with the context information and guiding the generation of textual responses. The quality of the response is dependent on the quality of the prompt given. This paper proposes a systematic prompting approach based on meta-model concepts for SDLC phases. The approach is validated using ChatGPT for small but complex business application development. We share the approach and our experience, learnings, benefits obtained, and the challenges encountered while applying the approach using ChatGPT. Our experience indicates that Generative AI techniques, such as ChatGPT, have the potential to reduce the skills barrier and accelerate software development substantially.

ChatGPT in Action: Analyzing Its Use in Software Development

Costain Nachuma A. I. Champa Md. Fazle Rabbi + 1 lainnya

15 April 2024

The emergence of AI tools such as ChatGPT is being used to assist with software development, but little is known of how developers utilize these tools as well as the capabilities of these tools in software engineering tasks. Using the DevGPT dataset, we conduct quantitative analyses of the tasks developers seek assistance from ChatGPT and how effectively ChatGPT addresses them. We also examine the impact of initial prompt quality on conversation length. The findings reveal where ChatGPT is most and least suited to assist in the identified 12 software development tasks. The insights from this research would guide the software developers, researchers, and AI tool providers in optimizing these tools for more effective programming aid.

ChatGPT and Software Testing Education: Promises & Perils

Sajed Jalil Suzzana Rafi Kevin Moran + 2 lainnya

7 Februari 2023

Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the ad-vent of general purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human written text, which includes code and natural language. However, despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users.The introduction of models, such as ChatGPT, has already spurred fervent discussion from educators, ranging from fear that students could use these AI tools to circumvent learning, to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge related to how well they perform in different educational settings, and the potential promise (or danger) that they might pose to traditional forms of instruction. As such, in this paper, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. We found that given its current capabilities, ChatGPT is able to respond to 77.5% of the questions we examined and that, of these questions, it is able to provide correct or partially correct answers in 55.6% of cases, provide correct or partially correct explanations of answers in 53.0% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct answers and explanations. Based on these findings, we discuss the potential promises and perils related to the use of ChatGPT by students and instructors.

Prompt Engineering or Fine Tuning: An Empirical Assessment of Large Language Models in Automated Software Engineering Tasks

Maleknaz Nayebi Song Wang Clark Tang + 3 lainnya

2023

The advancements in Large Language Models (LLMs) have opened up new opportunities for Automated Software Engineering (ASE). Two orthogonal approaches have been widely used to customize LLMs for ASE tasks, i.e., prompt engineering and fine-tuning. Prompt engineering-based approaches leverage different prompting strategies to query LLMs (e.g., ChatGPT ) to automate a task such as code generation, while fine-tuning-based approaches further train the pre-trained models (e.g., CodeBERT ) on customized data related to the down-stream task (in this example, code generation datasets). However, to date, there is no comprehensive and in-depth analysis of these two orthogonal approaches, in the ASE literature. In this paper, we investigate the effectiveness of state-of-the-art LLM, i.e., GPT-4 , with three different prompting engineering techniques (i.e., basic prompting, in-context learning, and task-specific prompting) against 18 fine-tuned LLMs on three typical ASE tasks, i.e., code generation, code summarization, and code translation. Our quantitative analysis of these prompting strategies suggests that prompt engineering GPT-4 cannot necessarily and significantly outperform fine-tuning smaller/older LLMs in all three tasks. For comment generation, GPT-4 with the best prompting strategy (i.e., task-specific prompt) had outperformed the first-ranked fine-tuned model by 8.33% points on average in BLEU. However, for code generation, the first-ranked fine-tuned model outperforms GPT-4 with best prompting by 16.61% and 28.3% points, on average in BLEU. For code translation, GPT-4 and fine-tuned baselines tie as they outperform each other on different translation tasks. To explore the impact of different prompting strategies, we conducted a user study with 27 graduate students and 10 industry practitioners. From our qualitative analysis, we find that the GPT-4 with conversational prompts (i

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Jianguo Li Zi Gong Rui Wang + 5 lainnya

14 November 2023

In this work we systematically review the recent advancements in software engineering with language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 related works. Unlike previous works, we integrate software engineering (SE) with natural language processing (NLP) by discussing the perspectives of both sides: SE applies language models for development automation, while NLP adopts SE tasks for language model evaluation. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also go beyond programming and review LLMs' application in other software engineering activities including requirement engineering, testing, deployment, and operations in an endeavor to provide a global view of NLP in SE, and identify key challenges and potential future directions in this domain. We keep the survey open and updated on GitHub at https://github.com/codefuse-ai/Awesome-Code-LLM.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.