Need a Programming Exercise Generated in Your Native Language? ChatGPT's Got Your Back: Automatic Generation of Non-English Programming Exercises Using OpenAI GPT-3.5

Kevin Ly Mollie Jordan Adalbert Gerald Soosai Raj

Abstrak

Large language models (LLMs) like ChatGPT are changing computing education and may create additional barriers to those already faced by non-native English speakers (NNES) learning computing. We investigate an opportunity for a positive impact of LLMs on NNES through multilingual programming exercise generation. Following previous work with LLM exercise generation in English, we prompt OpenAI GPT-3.5 in 4 natural languages (English, Tamil, Spanish, and Vietnamese) to create introductory programming problems, sample solutions, and test cases. We evaluate these problems on their sensibility, readability, translation, sample solution accuracy, topicality, and cultural relevance. We find that problems generated in English, Spanish, and Vietnamese are largely sensible, easily understood, and accurate in their sample solutions. However, Tamil problems are mostly non-sensible and have a much lower passing test rate, indicating that the abilities of LLMs for problem generation are not generalizable across languages. Our analysis suggests that these problems could not be given verbatim to students, but with minimal effort, most errors can be fixed. We further discuss the benefits of these problems despite their flaws, and their opportunities to provide personalized and culturally relevant resources for students in their native languages.

Artikel Ilmiah Terkait

Automatic Generation of Programming Exercises and Code Explanations Using Large Language Models

Paul Denny Juho Leinonen Arto Hellas + 1 lainnya

3 Juni 2022

This article explores the natural language generation capabilities of large language models with application to the production of two types of learning resources common in programming courses. Using OpenAI Codex as the large language model, we create programming exercises (including sample solutions and test cases) and code explanations, assessing these qualitatively and quantitatively. Our results suggest that the majority of the automatically generated content is both novel and sensible, and in some cases ready to use as is. When creating exercises we find that it is remarkably easy to influence both the programming concepts and the contextual themes they contain, simply by supplying keywords as input to the model. Our analysis suggests that there is significant value in massive generative machine learning models as a tool for instructors, although there remains a need for some oversight to ensure the quality of the generated content before it is delivered to students. We further discuss the implications of OpenAI Codex and similar tools for introductory programming education and highlight future research streams that have the potential to improve the quality of the educational experience for both teachers and students alike.

Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments

Natalie Kiesler D. Schiffner

15 Agustus 2023

This paper investigates the performance of the Large Language Models (LLMs) ChatGPT-3.5 and GPT-4 in solving introductory programming tasks. Based on the performance, implications for didactic scenarios and assessment formats utilizing LLMs are derived. For the analysis, 72 Python tasks for novice programmers were selected from the free site CodingBat. Full task descriptions were used as input to the LLMs, while the generated replies were evaluated using CodingBat's unit tests. In addition, the general availability of textual explanations and program code was analyzed. The results show high scores of 94.4 to 95.8% correct responses and reliable availability of textual explanations and program code, which opens new ways to incorporate LLMs into programming education and assessment.

Exploring the Potential of Large Language Models to Generate Formative Programming Feedback

Dominic Lohr H. Keuning Natalie Kiesler

31 Agustus 2023

Ever since the emergence of large language models (LLMs) and related applications, such as ChatGPT, its performance and error analysis for programming tasks have been subject to research. In this work-in-progress paper, we explore the potential of such LLMs for computing educators and learners, as we analyze the feedback it generates to a given input containing program code. In particular, we aim at (1) exploring how an LLM like ChatGPT responds to students seeking help with their introductory programming tasks, and (2) identifying feedback types in its responses. To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT along with questions required to elicit feedback and correct solutions. The results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors, which means that students can potentially benefit. However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.

GPT-3 vs Object Oriented Programming Assignments: An Experience Report

B. Cipriano P. Alves

29 Juni 2023

Recent studies show that AI-driven code generation tools, such as Large Language Models, are able to solve most of the problems usually presented in introductory programming classes. However, it is still unknown how they cope with Object Oriented Programming assignments, where the students are asked to design and implement several interrelated classes (either by composition or inheritance) that follow a set of best-practices. Since the majority of the exercises in these tools' training dataset are written in English, it is also unclear how well they function with exercises published in other languages. In this paper, we report our experience using GPT-3 to solve 6 real-world tasks used in an Object Oriented Programming course at a Portuguese University and written in Portuguese. Our observations, based on an objective evaluation of the code, performed by an open-source Automatic Assessment Tool, show that GPT-3 is able to interpret and handle direct functional requirements, however it tends not to give the best solution in terms of object oriented design. We perform a qualitative analysis of GPT-3's output, and gather a set of recommendations for computer science educators, since we expect students to use and abuse this tool in their academic work.

Large Language Models (GPT) for automating feedback on programming assignments

Maciej Pankiewicz R. Baker

30 Juni 2023

Addressing the challenge of generating personalized feedback for programming assignments is demanding due to several factors, like the complexity of code syntax or different ways to correctly solve a task. In this experimental study, we automated the process of feedback generation by employing OpenAI’s GPT-3.5 model to generate personalized hints for students solving programming assignments on an automated assessment platform. Students rated the usefulness of GPT-generated hints positively. The experimental group (with GPT hints enabled) relied less on the platform's regular feedback but performed better in terms of percentage of successful submissions across consecutive attempts for tasks, where GPT hints were enabled. For tasks where the GPT feedback was made unavailable, the experimental group needed significantly less time to solve assignments. Furthermore, when GPT hints were unavailable, students in the experimental condition were initially less likely to solve the assignment correctly. This suggests potential over-reliance on GPT- generated feedback. However, students in the experimental condition were able to correct reasonably rapidly, reaching the same percentage correct after seven submission attempts. The availability of GPT hints did not significantly impact students' affective state.

Daftar Referensi

0 referensi

Tidak ada referensi ditemukan.

Artikel yang Mensitasi

0 sitasi

Tidak ada artikel yang mensitasi.