PeerQA-XT

datacite.relation.cites https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4467
datacite.relation.cites https://aclanthology.org/2025.naacl-long.22/
datacite.relation.isSupplementTo https://doi.org/10.26083/tuda-7777
dc.contributor.author Ngen, Joy Jiaxi
dc.date.accessioned 2026-02-19T15:58:56Z
dc.date.created 2025
dc.date.issued 2026-02-19
dc.description The rapid growth of scientific publications makes it increasingly difficult for researchers to keep up with new findings. Scientific question answering (QA) systems aim to answer questions automatically based on scientific articles. Advancing these systems requires high-quality, large-scale datasets, but existing work is either limited in scale by costly manual annotation or lacks realistic depth when generated synthetically. To address this gap, this thesis introduces a novel framework for automatically generating scientific QA pairs from research literature using large language models (LLMs). The framework extracts QA pairs from peer reviews and rebuttals with state-of-the-art open-source LLMs, applying automated filtering and validation to ensure coherence and relevance. The resulting dataset comprises 12,628 free-form, open-ended QA pairs across ten scientific domains. We conduct extensive experiments to evaluate the dataset, examining both the impact of fine-tuning on our resource and the resulting models' performance across several benchmarks. Results show that fine-tuning substantially improves a model's ability to understand and apply scientific knowledge. These findings highlight the value of our framework and demonstrate the potential of peer review-based resources for advancing scientific QA, particularly for generative tasks and long-context reasoning.
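The automated filtering step mentioned in the abstract could, in spirit, look like the following minimal sketch. The `QAPair` type, the word-count thresholds, and the specific checks are illustrative assumptions for this sketch, not the actual pipeline used in the thesis.

```python
# Minimal sketch of a rule-based pre-filter for candidate QA pairs,
# applied before any LLM-based validation. All thresholds are assumptions.
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    domain: str


def passes_filter(pair: QAPair, min_q_words: int = 4, min_a_words: int = 5) -> bool:
    """Reject pairs that are too short or whose question is not interrogative."""
    q, a = pair.question.strip(), pair.answer.strip()
    if not q.endswith("?"):
        return False
    if len(q.split()) < min_q_words or len(a.split()) < min_a_words:
        return False
    return True


candidates = [
    QAPair("Why was dropout removed in the ablation study?",
           "The reviewers noted it degraded convergence on small batches.", "ML"),
    QAPair("Baseline numbers.", "See Table 2.", "ML"),
]
# Only the first, well-formed pair survives the filter.
kept = [p for p in candidates if passes_filter(p)]
```

In a real pipeline such cheap heuristics would run first, so that the more expensive LLM validation pass only sees plausible candidates.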
dc.description.version v1.0
dc.identifier.uri https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/5041
dc.language.iso en
dc.rights CC-BY-NC-SA 4.0
dc.rights.license other
dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject AI4Science, NLP, Question Answering
dc.subject.classification 4.43-04
dc.subject.ddc 004
dc.title PeerQA-XT
dc.type Text
dcterms.accessRights openAccess
person.identifier.orcid 0009-0005-1269-9095
tuda.agreements true
tuda.unit TUDa

Files

Original bundle

Name: peerqa-xt-v1.0.zip
Size: 197.1 MB
Format: ZIP archive