SciCoQA: Quality Assurance for Scientific Paper-Code Alignment
Date
2026-01-19
Abstract
We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their codebases to ensure faithful implementations. We construct SciCoQA from GitHub issues and reproducibility papers, and to scale our dataset, we propose a synthetic data generation method for constructing paper-code discrepancies. We analyze the paper-code discrepancies in detail and propose discrepancy types and categories to better understand the occurring mismatches. In total, our dataset consists of 611 paper-code discrepancies (81 real, 530 synthetic), spanning diverse computational science disciplines, including AI, Physics, Quantitative Biology, and others.
Is Part Of
https://arxiv.org/abs/2601.12910
License
Except where otherwise noted, this license is described as CC BY 4.0 - Attribution 4.0 International

