SciCoQA

dc.contributor.author Baumgärtner, Tim
dc.contributor.author Gurevych, Iryna
dc.date.accessioned 2026-01-19T08:58:11Z
dc.date.created 2026-01
dc.date.issued 2026-01-19
dc.description We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their codebases to ensure faithful implementations. We construct SciCoQA from GitHub issues and reproducibility papers, and to scale our dataset, we propose a synthetic data generation method for constructing paper-code discrepancies. We analyze the paper-code discrepancies in detail and propose discrepancy types and categories to better understand the occurring mismatches. In total, our dataset consists of 611 paper-code discrepancies (81 real, 530 synthetic), spanning diverse computational science disciplines, including AI, Physics, Quantitative Biology, and others.
dc.description.version v1.0
dc.identifier.uri https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4994
dc.language.iso en
dc.rights.license CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0)
dc.subject AI4Science
dc.subject Peer Review
dc.subject Paper-Code Alignment
dc.subject.classification 4.43-04
dc.subject.ddc 004
dc.title SciCoQA
dc.type Text
dcterms.accessRights openAccess
person.identifier.orcid 0000-0001-6903-5509
person.identifier.orcid 0000-0003-2187-7621
tuda.agreements true
tuda.unit TUDa

Files

Original bundle

Name              Description  Size     Format
scicoqa-v1.0.zip               1.87 MB  ZIP archive
