CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes

Jaromír Šavelka Paul Denny Brad E. Sheese + 1 penulis

Abstrak

Computing educators face significant challenges in providing timely support to students, especially in large class settings. Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale, but there are concerns that students may over-rely on the outputs produced by these models. In this paper, we introduce CodeHelp, a novel LLM-powered tool designed with guardrails to provide on-demand assistance to programming students without directly revealing solutions. We detail the design of the tool, which incorporates a number of useful features for instructors, and elaborate on the pipeline of prompting strategies we use to ensure generated outputs are suitable for students. To evaluate CodeHelp, we deployed it in a first-year computer and data science course with 52 students and collected student interactions over a 12-week period. We examine students’ usage patterns and perceptions of the tool, and we report reflections from the course instructor and a series of recommendations for classroom use. Our findings suggest that CodeHelp is well-received by students who especially value its availability and help with resolving errors, and that for instructors it is easy to deploy and complements, rather than replaces, the support that they provide to students.

Artikel Ilmiah Terkait

Exploring the Potential of Large Language Models to Generate Formative Programming Feedback

Dominic Lohr H. Keuning Natalie Kiesler

31 Agustus 2023

Ever since the emergence of large language models (LLMs) and related applications, such as ChatGPT, its performance and error analysis for programming tasks have been subject to research. In this work-in-progress paper, we explore the potential of such LLMs for computing educators and learners, as we analyze the feedback it generates to a given input containing program code. In particular, we aim at (1) exploring how an LLM like ChatGPT responds to students seeking help with their introductory programming tasks, and (2) identifying feedback types in its responses. To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT along with questions required to elicit feedback and correct solutions. The results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors, which means that students can potentially benefit. However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.

Next-Step Hint Generation for Introductory Programming Using Large Language Models

H. Keuning Lianne Roest J. Jeuring

3 Desember 2023

Large Language Models possess skills such as answering questions, writing essays or solving programming exercises. Since these models are easily accessible, researchers have investigated their capabilities and risks for programming education. This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints. We investigate prompt practices that lead to effective next-step hints and use these insights to build our StAP-tutor. We evaluate this tutor by conducting an experiment with students, and performing expert assessments. Our findings show that most LLM-generated feedback messages describe one specific next step and are personalised to the student’s code and approach. However, the hints may contain misleading information and lack sufficient detail when students approach the end of the assignment. This work demonstrates the potential for LLM-generated feedback, but further research is required to explore its practical implementation.

On ChatGPT: Perspectives from Software Engineering Students

Cemal Yilmaz Orçun Çetin Khadija Hanifi

22 Oktober 2023

ChatGPT, an increasingly popular Large Language Model (LLM), has found widespread acceptance, especially among the younger generation, who rely on it for various tasks, such as comprehending complex course materials and tackling homework assignments. This surge in interest has drawn the attention of researchers, leading to numerous studies that delve into the advantages and disadvantages of the upcoming LLM dominant era. In our research, we explore the influence of ChatGPT and similar models on the field of software engineering, specifically from the perspective of software engineering students. Our main objective is to gain valuable insights into their usage habits and opinions through a comprehensive survey. The survey encompassed diverse questions, addressing the specific areas where ChatGPT was utilized for assistance and gathering students’ reflections on each aspect. We found that ChatGPT has garnered widespread acceptance among software engineering students, with 93% of them utilizing it for their projects. These students expressed satisfaction with the level of assistance provided, and most intend to continue using it as a valuable tool in their work. During our investigation, we also assessed the students’ awareness of the underlying technologies behind ChatGPT. Approximately half of the students demonstrated awareness of these technologies, while 38.7% had made extra efforts to explore prompt engineering to enhance ChatGPT’s productivity. However, an important finding was that 90.6% of the students reported experiencing hallucinations during their interactions with ChatGPT. These hallucinations were shared as examples, raising significant concerns that warrant further exploration and mitigation. Moreover, we delved into potential improvements and gathered valuable recommendations, which could help ChatGPT to become even more effective and dependable in its applications.

An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project

Guanlin Wang Sanka Rasnayaka Ganesh Neelakanta Iyer + 1 lainnya

29 Januari 2024

Large Language Models (LLMs) represent a leap in artificial intelligence, excelling in tasks using human language(s). Although the main focus of general-purpose LLMs is not code generation, they have shown promising results in the domain. However, the usefulness of LLMs in an academic software engineering project has not been fully explored yet. In this study, we explore the usefulness of LLMs for 214 students working in teams consisting of up to six members. Notably, in the academic course through which this study is conducted, students were encouraged to integrate LLMs into their development tool-chain, in contrast to most other academic courses that explicitly prohibit the use of LLMs.In this paper, we analyze the AI-generated code, prompts used for code generation, and the human intervention levels to integrate the code into the code base. We also conduct a perception study to gain insights into the perceived usefulness, influencing factors, and future outlook of LLM from a computer science student’s perspective. Our findings suggest that LLMs can play a crucial role in the early stages of software development, especially in generating foundational code structures, and helping with syntax and error debugging. These insights provide us with a framework on how to effectively utilize LLMs as a tool to enhance the productivity of software engineering students, and highlight the necessity of shifting the educational focus toward preparing students for successful human-AI collaboration.CCS CONCEPTS• Software and its engineering → Software development techniques; • Applied computing → Education.

Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments

Natalie Kiesler D. Schiffner

15 Agustus 2023

This paper investigates the performance of the Large Language Models (LLMs) ChatGPT-3.5 and GPT-4 in solving introductory programming tasks. Based on the performance, implications for didactic scenarios and assessment formats utilizing LLMs are derived. For the analysis, 72 Python tasks for novice programmers were selected from the free site CodingBat. Full task descriptions were used as input to the LLMs, while the generated replies were evaluated using CodingBat's unit tests. In addition, the general availability of textual explanations and program code was analyzed. The results show high scores of 94.4 to 95.8% correct responses and reliable availability of textual explanations and program code, which opens new ways to incorporate LLMs into programming education and assessment.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.