Recommending API Function Calls and Code Snippets to Support Software Development

P. T. Nguyen Claudio Di Sipio Massimiliano Di Penta + 2 penulis

Abstrak

Software development activity has reached a high degree of complexity, guided by the heterogeneity of the components, data sources, and tasks. The proliferation of open-source software (OSS) repositories has stressed the need to reuse available software artifacts efficiently. To this aim, it is necessary to explore approaches to mine data from software repositories and leverage it to produce helpful recommendations. We designed and implemented FOCUS as a novel approach to provide developers with API calls and source code while they are programming. The system works on the basis of a context-aware collaborative filtering technique to extract API usages from OSS projects. In this work, we show the suitability of FOCUS for Android programming by evaluating it on a dataset of 2,600 mobile apps. The empirical evaluation results show that our approach outperforms two state-of-the-art API recommenders, UP-Miner and PAM, in terms of prediction accuracy. We also point out that there is no significant relationship between the categories for apps defined in Google Play and their API usages. Finally, we show that participants of a user study positively perceive the API and source code recommended by FOCUS as relevant to the current development context.

Artikel Ilmiah Terkait

Googling for Software Development: What Developers Search For and What They Find

André C. Hora

1 Mei 2021

Developers often search for software resources on the web. In practice, instead of going directly to websites (e.g., Stack Overflow), they rely on search engines (e.g., Google). Despite this being a common activity, we are not yet aware of what developers search from the perspective of popular software development websites and what search results are returned. With this knowledge, we can understand real-world queries, developers’ needs, and the query impact on the search results. In this paper, we provide an empirical study to understand what developers search on the web and what they find. We assess 1.3M queries to popular programming websites and we perform thousands of queries on Google to explore search results. We find that (i) developers’ queries typically start with keywords (e.g., Python, Android, etc.), are short (3 words), tend to omit functional words, and are similar among each other; (ii) minor changes to queries do not largely affect the Google search results, however, some cosmetic changes may have a non-negligible impact; and (iii) search results are dominated by Stack Overflow, but YouTube is also a relevant source nowadays. We conclude by presenting detailed implications for researchers and developers.

Analysing app reviews for software engineering: a systematic literature review

A. Perini Jacek Dąbrowski Emmanuel Letier + 1 lainnya

20 Januari 2022

App reviews found in app stores can provide critically valuable information to help software engineers understand user requirements and to design, debug, and evolve software products. Over the last ten years, a vast amount of research has been produced to study what useful information might be found in app reviews, and how to mine and organise such information as efficiently as possible. This paper presents a comprehensive survey of this research, covering 182 papers published between 2012 and 2020. This survey classifies app review analysis not only in terms of mined information and applied data mining techniques but also, and most importantly, in terms of supported software engineering activities. The survey also reports on the quality and results of empirical evaluation of existing techniques and identifies important avenues for further research. This survey can be of interest to researchers and commercial organisations developing app review analysis techniques and to software engineers considering to use app review analysis.

SoTaNa: The Open-Source Software Development Assistant

B. Chen Yanlin Wang Dongmei Zhang + 6 lainnya

25 Agustus 2023

Software development plays a crucial role in driving innovation and efficiency across modern societies. To meet the demands of this dynamic field, there is a growing need for an effective software development assistant. However, existing large language models represented by ChatGPT suffer from limited accessibility, including training data and model weights. Although other large open-source models like LLaMA have shown promise, they still struggle with understanding human intent. In this paper, we present SoTaNa, an open-source software development assistant. SoTaNa utilizes ChatGPT to generate high-quality instruction-based data for the domain of software engineering and employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA. We evaluate the effectiveness of \our{} in answering Stack Overflow questions and demonstrate its capabilities. Additionally, we discuss its capabilities in code summarization and generation, as well as the impact of varying the volume of generated data on model performance. Notably, SoTaNa can run on a single GPU, making it accessible to a broader range of researchers. Our code, model weights, and data are public at \url{https://github.com/DeepSoftwareAnalytics/SoTaNa}.

The State of the ML-universe: 10 Years of Artificial Intelligence & Machine Learning Software Development on GitHub

Thomas Zimmermann Danielle Gonzalez Nachiappan Nagappan

1 Mei 2020

In the last few years, artificial intelligence (AI) and machine learning (ML) have become ubiquitous terms. These powerful techniques have escaped obscurity in academic communities with the recent onslaught of AI & ML tools, frameworks, and libraries that make these techniques accessible to a wider audience of developers. As a result, applying AI & ML to solve existing and emergent problems is an increasingly popular practice. However, little is known about this domain from the software engineering perspective. Many AI & ML tools and applications are open source, hosted on platforms such as GitHub that provide rich tools for large-scale distributed software development. Despite widespread use and popularity, these repositories have never been examined as a community to identify unique properties, development patterns, and trends. In this paper, we conducted a large-scale empirical study of AI & ML Tool (700) and Application (4,524) repositories hosted on GitHub to develop such a characterization. While not the only platform hosting AI & ML development, GitHub facilitates collecting a rich data set for each repository with high traceability between issues, commits, pull requests and users. To compare the AI & ML community to the wider population of repositories, we also analyzed a set of 4,101 unrelated repositories. We enhance this characterization with an elaborate study of developer workflow that measures collaboration and autonomy within a repository. We've captured key insights of this community's 10 year history such as it's primary language (Python) and most popular repositories (Tensorflow, Tesseract). Our findings show the AI & ML community has unique characteristics that should be accounted for in future research.

AutoML from Software Engineering Perspective: Landscapes and Challenges

Chao Wang Minghui Zhou Zhenpeng Chen

1 Mei 2023

Machine learning (ML) has been widely adopted in modern software, but the manual configuration of ML (e.g., hyper-parameter configuration) poses a significant challenge to software developers. Therefore, automated ML (AutoML), which seeks the optimal configuration of ML automatically, has received increasing attention from the software engineering community. However, to date, there is no comprehensive understanding of how AutoML is used by developers and what challenges developers encounter in using AutoML for software development. To fill this knowledge gap, we conduct the first study on understanding the use and challenges of AutoML from software developers’ perspective. We collect and analyze 1,554 AutoML downstream repositories, 769 AutoML-related Stack Overflow questions, and 1,437 relevant GitHub issues. The results suggest the increasing popularity of AutoML in a wide range of topics, but also the lack of relevant expertise. We manually identify specific challenges faced by developers for AutoML-enabled software. Based on the results, we derive a series of implications for AutoML framework selection, framework development, and research.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.