
Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

dc.contributor.author Paul, Indraneil
dc.contributor.author Gurevych, Iryna
dc.contributor.author Glavaš, Goran
dc.date.accessioned 2026-05-02T16:46:05Z
dc.date.created 2026-05-02
dc.date.issued 2026-05-02
dc.description Themis-CodeRewardBench is a code-specific reward model evaluation benchmark comprising ~8.9k diverse code preference pairs across eight programming languages and five quality scoring dimensions. The accompanying code repository is available at https://github.com/iNeil77/Themis. The benchmark is part of the Themis project and evaluates code reward models on five code quality dimensions: Functional Correctness (FC), Execution Efficiency (EE), Memory Efficiency (ME), Readability & Maintainability (R&M), and Security Hardness (SH). It covers eight programming languages: C, C#, C++, Go, Java, JavaScript, Python, and Ruby. The evaluation metric is preference accuracy. The benchmark draws from 13 distinct pre-existing and newly constructed code preference datasets, spanning human-written, LLM-generated, and mixed-provenance prompts and responses. Compared to the code subsets in existing RM benchmarks, it introduces a largely novel distribution of code preferences over more complex code.
Key differentiators:
- Evaluates across 5 quality dimensions, not just functional correctness
- Covers 8 programming languages, not just Python
- Includes human-written code from real commits, not only contest/synthetic code
- Introduces a novel distribution of code preferences with increased code complexity compared to existing RM benchmarks
dc.description.version v1.0
dc.identifier.uri https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/5115
dc.language.iso en
dc.rights Apache 2.0 License
dc.rights.licenseother
dc.rights.uri https://www.apache.org/licenses/LICENSE-2.0
dc.subject reward modelling, code evaluation, benchmark
dc.subject.classification 4.43-04
dc.subject.ddc 004
dc.title Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring
dc.type Text
dcterms.accessRights openAccess
person.identifier.orcid #PLACEHOLDER_PARENT_METADATA_VALUE#
person.identifier.orcid 0000-0003-2187-7621
person.identifier.orcid 0000-0002-1301-6314
tuda.agreements true
tuda.unit TUDa
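
The description above names preference accuracy as the evaluation metric. A minimal sketch of how such a metric could be computed per quality dimension is given below. The record keys (`dimension`, `prompt`, `chosen`, `rejected`) and the `score` callable are hypothetical and chosen for illustration only; they are not taken from the dataset or the Themis repository. The sketch also assumes that a tie counts as a miss.

```python
from collections import defaultdict


def preference_accuracy(pairs, score):
    """Return, per dimension, the fraction of pairs where the preferred
    response receives a strictly higher score than the rejected one.

    `pairs` is an iterable of dicts with the hypothetical keys
    "dimension", "prompt", "chosen" and "rejected".
    `score` is any callable (prompt, response) -> float, standing in
    for a reward model.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pair in pairs:
        dim = pair["dimension"]
        totals[dim] += 1
        chosen_score = score(pair["prompt"], pair["chosen"])
        rejected_score = score(pair["prompt"], pair["rejected"])
        # Assumption: ties are counted as incorrect.
        if chosen_score > rejected_score:
            hits[dim] += 1
    return {dim: hits[dim] / totals[dim] for dim in totals}


# Toy data and a toy scorer (response length), for illustration only.
toy_pairs = [
    {"dimension": "FC", "prompt": "p", "chosen": "return a + b", "rejected": "return a"},
    {"dimension": "FC", "prompt": "p", "chosen": "x", "rejected": "xxxx"},
    {"dimension": "EE", "prompt": "p", "chosen": "abc", "rejected": "a"},
]


def toy_score(prompt, response):
    return len(response)


result = preference_accuracy(toy_pairs, toy_score)
print(result)  # {'FC': 0.5, 'EE': 1.0}
```

Replacing `toy_score` with a call into an actual reward model would give per-dimension accuracies in the same shape.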

Files

Original bundle

Name | Description | Size | Format
Themis-CodeRewardBench.jsonl | (none) | 60.68 MB | Unknown data format
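
Since the record does not document the schema of the JSONL file, one cautious first step is to stream it and list the keys that actually occur. The sketch below assumes only that each non-empty line is a JSON object; the helper names `iter_jsonl` and `peek_keys` are illustrative, not part of any Themis tooling.

```python
import json


def iter_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def peek_keys(path, limit=100):
    """Collect the keys seen in the first `limit` records.

    Assumes each record is a JSON object (a dict after parsing).
    """
    keys = set()
    for index, record in enumerate(iter_jsonl(path)):
        if index >= limit:
            break
        keys.update(record.keys())
    return sorted(keys)
```

For example, `peek_keys("Themis-CodeRewardBench.jsonl")` would list the field names found in the first 100 records without loading the whole 60.68 MB file into memory.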

Collections