Large Language Models for Software Engineering: A Systematic Literature Review
Abstrak
Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review (SLR) on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We selected and analyzed 395 research papers from January 2017 to January 2024 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application, highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and highlighting promising areas for future study. Our artifacts are publicly available at https://github.com/xinyi-hou/LLM4SE_SLR .
Artikel Ilmiah Terkait
Weisong Sun Quanjun Zhang Yang Xie + 5 lainnya
23 Desember 2023
Software Engineering (SE) is the systematic design, development, maintenance, and management of software applications underpinning the digital infrastructure of our modern world. Very recently, the SE community has seen a rapidly increasing number of techniques employing Large Language Models (LLMs) to automate a broad range of SE tasks. Nevertheless, existing information of the applications, effects, and possible limitations of LLMs within SE is still not well-studied. In this paper, we provide a systematic survey to summarize the current state-of-the-art research in the LLM-based SE community. We summarize 62 representative LLMs of Code across three model architectures, 15 pre-training objectives across four categories, and 16 downstream tasks across five categories. We then present a detailed summarization of the recent SE studies for which LLMs are commonly utilized, including 947 studies for 112 specific code-related tasks across five crucial phases within the SE workflow. We also discuss several critical aspects during the integration of LLMs into SE, such as empirical evaluation, benchmarking, security and reliability, domain tuning, compressing and distillation. Finally, we highlight several challenges and potential opportunities on applying LLMs for future SE studies, such as exploring domain LLMs and constructing clean evaluation datasets. Overall, our work can help researchers gain a comprehensive understanding about the achievements of the existing LLM-based SE studies and promote the practical application of these techniques. Our artifacts are publicly available and will be continuously updated at the living repository: https://github.com/iSEngLab/AwesomeLLM4SE.
Mark Harman Shubho Sengupta Angela Fan + 4 lainnya
14 Mei 2023
This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.
Kai-Chun Ning Zibin Zheng Yanlin Wang + 4 lainnya
22 Agustus 2023
Large Language Models (LLMs) have drawn widespread attention and research due to their astounding performance in text generation and reasoning tasks. Derivative products, like ChatGPT, have been extensively deployed and highly sought after. Meanwhile, the evaluation and optimization of LLMs in software engineering tasks, such as code generation, have become a research focus. However, there is still a lack of systematic research on applying and evaluating LLMs in software engineering. Therefore, this paper comprehensively investigate and collate the research and products combining LLMs with software engineering, aiming to answer two questions: (1) What are the current integrations of LLMs with software engineering? (2) Can LLMs effectively handle software engineering tasks? To find the answers, we have collected related literature as extensively as possible from seven mainstream databases and selected 123 timely papers published starting from 2022 for analysis. We have categorized these papers in detail and reviewed the current research status of LLMs from the perspective of seven major software engineering tasks, hoping this will help researchers better grasp the research trends and address the issues when applying LLMs. Meanwhile, we have also organized and presented papers with evaluation content to reveal the performance and effectiveness of LLMs in various software engineering tasks, guiding researchers and developers to optimize.
Ashkan Sami Jasmin Jahić
4 Juni 2024
Large Language Models (LLMs) are finding their way into Software Engineering by assisting with tasks such as code generation. Furthermore, LLMs might have a potential to perform even more complex tasks, such as suggesting architectural design. However, there is a lack of empirical surveys on how software engineering companies use (and plan to use) LLMs and if LLMs truly can provide benefits to software architects. To understand the state of practice considering adoption of LLMs in software engineering, existing challenges, and future trends, we have surveyed 15 different software engineering companies. To understand the ability of LLMs to perform more complex tasks, we report on our experiments with LLM-assisted architectural design. We applied ChatGPT on 5 software projects and in total performed 50 different experiments. Our results capture the state of the practice of LLMs in software engineering and demonstrate how LLMs perform when assisting with (more complex task such as) architectural design. Engineers, architects, and project managers should profit from these results to guide their decision towards targeted adoption of LLMs in their business and engineering domains.
Jieke Shi Zhou Yang David Lo
20 Desember 2024
Large Language Models (LLMs) have recently shown remarkable capabilities in various software engineering tasks, spurring the rapid growth of the Large Language Models for Software Engineering (LLM4SE) area. However, limited attention has been paid to developing efficient LLM4SE techniques that demand minimal computational cost, time, and memory resources, as well as green LLM4SE solutions that reduce energy consumption, water usage, and carbon emissions. This paper aims to redirect the focus of the research community towards the efficiency and greenness of LLM4SE, while also sharing potential research directions to achieve this goal. It commences with a brief overview of the significance of LLM4SE and highlights the need for efficient and green LLM4SE solutions. Subsequently, the paper presents a vision for a future where efficient and green LLM4SE revolutionizes the LLM-based software engineering tool landscape, benefiting various stakeholders, including industry, individual practitioners, and society. The paper then delineates a roadmap for future research, outlining specific research paths and potential solutions for the research community to pursue. While not intended to be a definitive guide, the paper aims to inspire further progress, with the ultimate goal of establishing efficient and green LLM4SE as a central element in the future of software engineering.
Daftar Referensi
1 referensiArtikel yang Mensitasi
2 sitasiTrustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps
David Lo
14 Mei 2023
For decades, much software engineering research has been dedicated to devising automated solutions aimed at enhancing developer productivity and elevating software quality. The past two decades have witnessed an unparalleled surge in the development of intelligent solutions tailored for software engineering tasks. This momentum established the Artificial Intelligence for Software Engineering (AI4SE) area, which has swiftly become one of the most active and popular areas within the software engiueering field. This Future of Software Engineering (FoSE) paper navigates through several focal points. It commences with a succinct introduction and history of AI4SE. Thereafter, it underscores the core challenges inherent to AI4SE, particularly highlighting the need to realize trustworthy and synergistic AI4SE. Progressing, the paper paints a vision for the potential leaps achievable if AI4SE's key challenges are surmounted, suggesting a transition toward Software Engineering 2.0. Two strategic roadmaps are then laid out: one centered on realizing trustworthy AI4SE, and the other on fostering synergistic AI4SE. While this paper may not serve as a conclusive guide, its intent is to catalyze further progress. The ultimate aspiration is to position AI4SE as a linchpin in redefining the horizons of software engineering, propelling us toward Software Engineering 2.0.
Can GPT-4 Replicate Empirical Software Engineering Research?
Carmen Badea Thomas Zimmermann + 5 lainnya
3 Oktober 2023
Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering- and science-related tasks, these models could help replicate and thus democratize empirical software engineering research. In this paper, we examine GPT-4’s abilities to perform replications of empirical software engineering research on new data. We specifically study their ability to surface assumptions made in empirical software engineering research methodologies, as well as their ability to plan and generate code for analysis pipelines on seven empirical software engineering papers. We perform a user study with 14 participants with software engineering research expertise, who evaluate GPT-4-generated assumptions and analysis plans (i.e., a list of module specifications) from the papers. We find that GPT-4 is able to surface correct assumptions, but struggles to generate ones that apply common knowledge about software engineering data. In a manual analysis of the generated code, we find that the GPT-4-generated code contains correct high-level logic, given a subset of the methodology. However, the code contains many small implementation-level errors, reflecting a lack of software engineering knowledge. Our findings have implications for leveraging LLMs for software engineering research as well as practitioner data scientists in software teams.