StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

Authors

Arjun Guha (Northeastern University + Roblox), Hannah McLean Babe (Oberlin College), Sydney Nguyen (Wellesley College), Yangtian Zi (Northeastern University), Molly Q. Feldman (Oberlin College), Carolyn Jane Anderson (Wellesley College)

Venue

Findings of the Association for Computational Linguistics (ACL) 2024

Abstract

Code LLMs are being rapidly deployed, and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who had completed only one semester of Python programming. The students wrote these prompts while working interactively with a Code LLM, and their success rates varied widely. We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. We analyze the prompts and find significant variation in students' prompting techniques. We also find that nondeterministic LLM sampling could mislead students into thinking that their prompts are more (or less) effective than they actually are, which has implications for how to teach with Code LLMs.
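The point about nondeterministic sampling can be illustrated with a small simulation. This is a minimal sketch, not the paper's evaluation code: it assumes a hypothetical prompt whose generated programs pass the tests independently with a fixed probability `p_correct`. It compares the single pass/fail outcome a student would see on one try with an estimate computed from many samples, using the unbiased pass@k estimator introduced with the Codex evaluation (Chen et al., 2021). The function names `pass_at_k` and `simulate_prompt` are illustrative only.

```python
import math
import random


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n generations of which c are
    correct, passes the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def simulate_prompt(p_correct: float, n: int, seed: int = 0) -> tuple[bool, float]:
    """Hypothetical model of one prompt: each sampled program passes with
    probability p_correct. Returns (outcome of the first sample, i.e. what a
    student sees after one try, and the pass@1 estimate over all n samples)."""
    rng = random.Random(seed)
    outcomes = [rng.random() < p_correct for _ in range(n)]
    single_try = outcomes[0]
    estimate = pass_at_k(n, sum(outcomes), 1)
    return single_try, estimate


if __name__ == "__main__":
    # The same prompt (assumed 40% success rate) can look like a clear
    # success or a clear failure on a single try, depending on the seed.
    for seed in range(5):
        single, est = simulate_prompt(p_correct=0.4, n=200, seed=seed)
        print(f"seed={seed} single try passed={single} pass@1 estimate={est:.2f}")
```

With a single sample the student only ever sees "worked" or "failed", whereas averaging over many samples recovers a success rate near the assumed `p_correct`.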

Related Publications

MATCHA: Can Multi-Agent Collaboration Build a Trustworthy Conversational Recommender?

SelfCodeAlign: Self-Alignment for Code Generation

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions