Using Large Language Models to Enhance Programming Error Messages
Abstract
A key part of learning to program is learning to understand programming error messages. They can be hard to interpret, and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then use the information to highlight areas in code. Researchers have been working on making these errors more novice-friendly since the 1960s; however, progress has been slow. The present work contributes to this stream of research by using large language models to enhance programming error messages with explanations of the errors and suggestions on how to fix them. Large language models can be used to create useful and novice-friendly enhancements to programming error messages that sometimes surpass the original programming error messages in interpretability and actionability. These results provide further evidence of the benefits of large language models for computing educators, highlighting their use in areas known to be challenging for students. We further discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
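In practice, the approach described above amounts to prompting a model with the student's code and the raw error message and asking for a plain-language explanation plus a suggested fix. The sketch below illustrates that idea only; the paper's experiments used OpenAI Codex, so the chat completions client, the model name, and the prompt wording here are assumptions chosen for readability, not the authors' exact setup.

# Illustrative sketch only: the client library, model name, and prompt wording
# are assumptions, not the paper's original Codex-based setup.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def enhance_error_message(source_code: str, error_message: str) -> str:
    """Ask an LLM for a novice-friendly explanation of an error and a fix suggestion."""
    prompt = (
        "A student's Python program produced the error message below.\n"
        "Explain in plain language what the error means and suggest how to fix it.\n\n"
        f"Code:\n{source_code}\n\nError message:\n{error_message}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any capable chat model would do
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example: a classic novice syntax error
buggy_code = "for i in range(10)\n    print(i)"
raw_error = "SyntaxError: expected ':'"
print(enhance_error_message(buggy_code, raw_error))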
Related Scientific Articles
Ioannis Karvelas Joe Dillane Brett A. Becker
25 February 2020
Improving the feedback that novices receive from programming environments is an important and often overlooked aspect of computing education research. This work in progress examines the effects of various mechanisms by which environments deliver feedback to users. By providing insights on the effects of these mechanisms, we aim to inform designers, developers and educators about more effective design and use of such environments for students.
Dominic Lohr H. Keuning Natalie Kiesler
31 August 2023
Ever since the emergence of large language models (LLMs) and related applications, such as ChatGPT, their performance and error analysis for programming tasks have been subject to research. In this work-in-progress paper, we explore the potential of such LLMs for computing educators and learners, as we analyze the feedback they generate for a given input containing program code. In particular, we aim at (1) exploring how an LLM like ChatGPT responds to students seeking help with their introductory programming tasks, and (2) identifying feedback types in its responses. To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT, along with questions required to elicit feedback and correct solutions. The results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors, which means that students can potentially benefit. However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.
Jaromír Šavelka Paul Denny Brad E. Sheese + 1 more
14 August 2023
Computing educators face significant challenges in providing timely support to students, especially in large class settings. Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale, but there are concerns that students may over-rely on the outputs produced by these models. In this paper, we introduce CodeHelp, a novel LLM-powered tool designed with guardrails to provide on-demand assistance to programming students without directly revealing solutions. We detail the design of the tool, which incorporates a number of useful features for instructors, and elaborate on the pipeline of prompting strategies we use to ensure generated outputs are suitable for students. To evaluate CodeHelp, we deployed it in a first-year computer and data science course with 52 students and collected student interactions over a 12-week period. We examine students’ usage patterns and perceptions of the tool, and we report reflections from the course instructor and a series of recommendations for classroom use. Our findings suggest that CodeHelp is well-received by students who especially value its availability and help with resolving errors, and that for instructors it is easy to deploy and complements, rather than replaces, the support that they provide to students.
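One way to read the "guardrails" idea in the CodeHelp abstract is as a system prompt that constrains the model's behaviour before any student input is seen. The following is a hypothetical sketch of such a prompt and the surrounding message structure; the wording and helper function are assumptions, not CodeHelp's actual prompting pipeline.

# Hypothetical guardrail prompt in the spirit of CodeHelp; the wording and the
# helper below are assumptions, not the tool's actual prompt pipeline.
GUARDRAIL_SYSTEM_PROMPT = (
    "You are a teaching assistant for an introductory programming course. "
    "Help the student understand their error and point them toward a fix, "
    "but never write out a complete corrected solution. "
    "If asked for a full solution, respond with hints and guiding questions instead."
)

def build_messages(student_question: str, code_snippet: str) -> list[dict]:
    """Assemble a chat request with the guardrail instructions placed first."""
    return [
        {"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT},
        {"role": "user", "content": f"{student_question}\n\nCode:\n{code_snippet}"},
    ]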
Sumit Gulwani Tung Phung Tobias Kohn + 4 more
24 January 2023
Large language models (LLMs), such as Codex, hold great promise in enhancing programming education by automatically generating feedback for students. We investigate using LLMs to generate feedback for fixing syntax errors in Python programs, a key scenario in introductory programming. More concretely, given a student's buggy program, our goal is to generate feedback comprising a fixed program along with a natural language explanation describing the errors/fixes, inspired by how a human tutor would give feedback. While using LLMs is promising, the critical challenge is to ensure high precision in the generated feedback, which is imperative before deploying such technology in classrooms. The main research question we study is: Can we develop LLM-based feedback generation techniques with a tunable precision parameter, giving educators quality control over the feedback that students receive? To this end, we introduce PyFiXV, our technique to generate high-precision feedback powered by Codex. The key idea behind PyFiXV is to use a novel run-time validation mechanism to decide whether the generated feedback is suitable for sharing with the student; notably, this validation mechanism also provides a precision knob to educators. We perform an extensive evaluation using two real-world datasets of Python programs with syntax errors and show the efficacy of PyFiXV in generating high-precision feedback.
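The abstract does not spell out how the run-time validation works, so the sketch below shows only one plausible ingredient under stated assumptions: sample several candidate fixes, show feedback only if enough of them are syntactically valid Python, and expose the required fraction as a crude precision knob. This is an illustration of the general idea, not PyFiXV's actual mechanism.

# Minimal sketch of a run-time validation idea, assuming the student's code is
# Python; the threshold acts as a simple precision knob. Not PyFiXV's mechanism.
import ast

def parses(program: str) -> bool:
    """Return True if the program contains no Python syntax errors."""
    try:
        ast.parse(program)
        return True
    except SyntaxError:
        return False

def validate_feedback(candidate_fixes: list[str], threshold: float = 0.5) -> bool:
    """Share feedback only if enough sampled candidate fixes are syntactically valid."""
    if not candidate_fixes:
        return False
    valid = sum(parses(fix) for fix in candidate_fixes)
    return valid / len(candidate_fixes) >= threshold

# Example: two of the three sampled fixes parse, so feedback passes at threshold 0.5
print(validate_feedback(["print(1)", "print(2)", "print(3"]))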
F. Hermans
1 August 2020
One of the aspects of programming that learners often struggle with is the syntax of programming languages: remembering the right commands to use and combining those into a working program. Prior research demonstrated that students submit source code with syntax errors in 73% of cases and even the best students do so in 50% of cases. An analysis of 37 million compilations by 250,000 students found that the most common error was a syntax error, which occurred in almost 800,000 compilations. It was also found that Java and Perl are not easier to understand than a programming language with randomly generated keywords, stressing the difficulties that novices face in understanding syntax. This paper presents Hedy: a new way of teaching the syntax of a programming language to novices, inspired by educational methods by which punctuation is taught to children. Hedy starts as a simple programming language without any syntactic elements such as brackets, colons or indentation. The rules slowly and gradually change until the novices are programming in Python. Hedy is evaluated on 9,714 programs.