DOI: 10.1145/3639474.3640052

Terbit pada 10 Maret 2024 Pada 2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET)

LLMs Still Can't Avoid Instanceof: An Investigation Into GPT-3.5, GPT-4 and Bard's Capacity to Handle Object-Oriented Programming Assignments

B. Cipriano P. Alves

Lihat di Semantic Scholar

Abstrak

Large Language Models (LLMs) have emerged as promising tools to assist students while solving programming assignments. However, object-oriented programming (OOP), with its inherent complexity involving the identification of entities, relationships, and responsibilities, is not yet mastered by these tools. Contrary to introductory programming exercises, there exists a research gap with regard to the behavior of LLMs in OOP contexts. In this study, we experimented with three prominent LLMs - GPT-3.5, GPT-4, and Bard - to solve real-world OOP exercises used in educational settings, subsequently validating their solutions using an Automatic Assessment Tool (AAT). The findings revealed that while the models frequently achieved mostly working solutions to the exercises, they often overlooked the best practices of OOP. GPT-4 stood out as the most proficient, followed by GPT-3.5, with Bard trailing last. We advocate for a renewed emphasis on code quality when employing these models and explore the potential of pairing LLMs with AATs in pedagogical settings. In conclusion, while GPT-4 show-cases promise, the deployment of these models in OOP education still mandates supervision.

Artikel Ilmiah Terkait

GPT-3 vs Object Oriented Programming Assignments: An Experience Report

B. Cipriano P. Alves

29 Juni 2023

Recent studies show that AI-driven code generation tools, such as Large Language Models, are able to solve most of the problems usually presented in introductory programming classes. However, it is still unknown how they cope with Object Oriented Programming assignments, where the students are asked to design and implement several interrelated classes (either by composition or inheritance) that follow a set of best-practices. Since the majority of the exercises in these tools' training dataset are written in English, it is also unclear how well they function with exercises published in other languages. In this paper, we report our experience using GPT-3 to solve 6 real-world tasks used in an Object Oriented Programming course at a Portuguese University and written in Portuguese. Our observations, based on an objective evaluation of the code, performed by an open-source Automatic Assessment Tool, show that GPT-3 is able to interpret and handle direct functional requirements, however it tends not to give the best solution in terms of object oriented design. We perform a qualitative analysis of GPT-3's output, and gather a set of recommendations for computer science educators, since we expect students to use and abuse this tool in their academic work.

Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments

Natalie Kiesler D. Schiffner

15 Agustus 2023

This paper investigates the performance of the Large Language Models (LLMs) ChatGPT-3.5 and GPT-4 in solving introductory programming tasks. Based on the performance, implications for didactic scenarios and assessment formats utilizing LLMs are derived. For the analysis, 72 Python tasks for novice programmers were selected from the free site CodingBat. Full task descriptions were used as input to the LLMs, while the generated replies were evaluated using CodingBat's unit tests. In addition, the general availability of textual explanations and program code was analyzed. The results show high scores of 94.4 to 95.8% correct responses and reliable availability of textual explanations and program code, which opens new ways to incorporate LLMs into programming education and assessment.

On ChatGPT: Perspectives from Software Engineering Students

Cemal Yilmaz Orçun Çetin Khadija Hanifi

22 Oktober 2023

ChatGPT, an increasingly popular Large Language Model (LLM), has found widespread acceptance, especially among the younger generation, who rely on it for various tasks, such as comprehending complex course materials and tackling homework assignments. This surge in interest has drawn the attention of researchers, leading to numerous studies that delve into the advantages and disadvantages of the upcoming LLM dominant era. In our research, we explore the influence of ChatGPT and similar models on the field of software engineering, specifically from the perspective of software engineering students. Our main objective is to gain valuable insights into their usage habits and opinions through a comprehensive survey. The survey encompassed diverse questions, addressing the specific areas where ChatGPT was utilized for assistance and gathering students’ reflections on each aspect. We found that ChatGPT has garnered widespread acceptance among software engineering students, with 93% of them utilizing it for their projects. These students expressed satisfaction with the level of assistance provided, and most intend to continue using it as a valuable tool in their work. During our investigation, we also assessed the students’ awareness of the underlying technologies behind ChatGPT. Approximately half of the students demonstrated awareness of these technologies, while 38.7% had made extra efforts to explore prompt engineering to enhance ChatGPT’s productivity. However, an important finding was that 90.6% of the students reported experiencing hallucinations during their interactions with ChatGPT. These hallucinations were shared as examples, raising significant concerns that warrant further exploration and mitigation. Moreover, we delved into potential improvements and gathered valuable recommendations, which could help ChatGPT to become even more effective and dependable in its applications.

CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes

Jaromír Šavelka Paul Denny Brad E. Sheese + 1 lainnya

14 Agustus 2023

Computing educators face significant challenges in providing timely support to students, especially in large class settings. Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale, but there are concerns that students may over-rely on the outputs produced by these models. In this paper, we introduce CodeHelp, a novel LLM-powered tool designed with guardrails to provide on-demand assistance to programming students without directly revealing solutions. We detail the design of the tool, which incorporates a number of useful features for instructors, and elaborate on the pipeline of prompting strategies we use to ensure generated outputs are suitable for students. To evaluate CodeHelp, we deployed it in a first-year computer and data science course with 52 students and collected student interactions over a 12-week period. We examine students’ usage patterns and perceptions of the tool, and we report reflections from the course instructor and a series of recommendations for classroom use. Our findings suggest that CodeHelp is well-received by students who especially value its availability and help with resolving errors, and that for instructors it is easy to deploy and complements, rather than replaces, the support that they provide to students.

An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project

Guanlin Wang Sanka Rasnayaka Ganesh Neelakanta Iyer + 1 lainnya

29 Januari 2024

Large Language Models (LLMs) represent a leap in artificial intelligence, excelling in tasks using human language(s). Although the main focus of general-purpose LLMs is not code generation, they have shown promising results in the domain. However, the usefulness of LLMs in an academic software engineering project has not been fully explored yet. In this study, we explore the usefulness of LLMs for 214 students working in teams consisting of up to six members. Notably, in the academic course through which this study is conducted, students were encouraged to integrate LLMs into their development tool-chain, in contrast to most other academic courses that explicitly prohibit the use of LLMs.In this paper, we analyze the AI-generated code, prompts used for code generation, and the human intervention levels to integrate the code into the code base. We also conduct a perception study to gain insights into the perceived usefulness, influencing factors, and future outlook of LLM from a computer science student’s perspective. Our findings suggest that LLMs can play a crucial role in the early stages of software development, especially in generating foundational code structures, and helping with syntax and error debugging. These insights provide us with a framework on how to effectively utilize LLMs as a tool to enhance the productivity of software engineering students, and highlight the necessity of shifting the educational focus toward preparing students for successful human-AI collaboration.CCS CONCEPTS• Software and its engineering → Software development techniques; • Applied computing → Education.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.