SciCoQA
| dc.contributor.author | Baumgärtner, Tim | |
| dc.contributor.author | Gurevych, Iryna | |
| dc.date.accessioned | 2026-01-19T08:58:11Z | |
| dc.date.created | 2026-01 | |
| dc.date.issued | 2026-01-19 | |
| dc.description | We present SciCoQA, a dataset for detecting discrepancies between scientific publications and their codebases to ensure faithful implementations. We construct SciCoQA from GitHub issues and reproducibility papers, and to scale our dataset, we propose a synthetic data generation method for constructing paper-code discrepancies. We analyze the paper-code discrepancies in detail and propose discrepancy types and categories to better understand the occurring mismatches. In total, our dataset consists of 611 paper-code discrepancies (81 real, 530 synthetic), spanning diverse computational science disciplines, including AI, Physics, Quantitative Biology, and others. | |
| dc.description.version | v1.0 | |
| dc.identifier.uri | https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4994 | |
| dc.language.iso | en | |
| dc.rights.license | CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0) | |
| dc.subject | AI4Science | |
| dc.subject | Peer Review | |
| dc.subject | Paper-Code Alignment | |
| dc.subject.classification | 4.43-04 | |
| dc.subject.ddc | 004 | |
| dc.title | SciCoQA | |
| dc.type | Text | |
| dcterms.accessRights | openAccess | |
| person.identifier.orcid | 0000-0001-6903-5509 | |
| person.identifier.orcid | 0000-0003-2187-7621 | |
| tuda.agreements | true | |
| tuda.unit | TUDa |
Files
Original bundle
| Name | Description | Size | Format | |
|---|---|---|---|---|
| scicoqa-v1.0.zip | | 1.87 MB | ZIP archive | |
