PeerQA-XT
Date
2026-02-19
Abstract
The rapid growth of scientific publications makes it increasingly difficult for researchers to keep up with new findings. Scientific question answering (QA) systems aim to automatically respond to questions based on scientific articles. Advancing these systems requires high-quality, large-scale datasets. Existing resources are either limited in scale by costly manual annotation or lack realistic depth when generated synthetically. To address this gap, this thesis introduces a novel framework for automatically generating scientific QA pairs from research literature using large language models (LLMs). The framework extracts QA pairs from peer reviews and rebuttals with state-of-the-art open-source LLMs, applying automated filtering and validation to ensure coherence and relevance. The resulting dataset comprises 12,628 free-form, open-ended QA pairs across ten scientific domains. We conduct extensive experiments to evaluate the dataset, examining both the impact of fine-tuning on our resource and the performance of the resulting models across several benchmarks. Results show that fine-tuning substantially improves a model’s ability to understand and apply scientific knowledge. These findings highlight the value of our framework and demonstrate the potential of peer review–based resources in advancing scientific QA, particularly for generative tasks and long-context reasoning.
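
For illustration only, the extraction-and-filtering step described in the abstract could be prototyped roughly as in the Python sketch below. This is not the thesis's implementation: the model name, endpoint, prompt wording, and length-based filter are hypothetical assumptions, shown only to indicate how review-and-rebuttal pairs might be turned into QA pairs with an open-source LLM served behind an OpenAI-compatible API.

# Illustrative sketch (not the thesis's code): extract QA pairs from a peer
# review and its rebuttal with an LLM behind an OpenAI-compatible endpoint,
# then apply a crude length-based filter as a stand-in for validation.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # assumed local server

PROMPT = (
    "Extract question-answer pairs from the peer review and author rebuttal below. "
    "Questions must come from the review, answers from the rebuttal. "
    "Return a JSON list of objects with 'question' and 'answer' keys.\n\n"
    "REVIEW:\n{review}\n\nREBUTTAL:\n{rebuttal}"
)

def extract_qa_pairs(review: str, rebuttal: str) -> list[dict]:
    """Query the LLM for QA pairs and keep only well-formed, non-trivial ones."""
    reply = client.chat.completions.create(
        model="open-llm",  # placeholder model name, not the model used in the thesis
        messages=[{"role": "user",
                   "content": PROMPT.format(review=review, rebuttal=rebuttal)}],
        temperature=0.0,
    )
    try:
        pairs = json.loads(reply.choices[0].message.content)
    except json.JSONDecodeError:
        return []  # discard malformed generations
    return [p for p in pairs
            if isinstance(p, dict)
            and len(p.get("question", "")) > 20
            and len(p.get("answer", "")) > 20]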
License
Except where otherwise noted, this item's license is described as CC-BY-NC-SA 4.0
