Open Access

Publication data analysis for TUDa, RWTH and KIT, 2018-2025, based on OpenAlex

Description

The Python script oa_analysis_2026.py retrieves bibliometric data via the OpenAlex API (https://openalex.org) to analyze the Open Access (OA) publishing behavior of selected universities. It processes peer-reviewed journal articles from 2018 to 2025 and calculates key metrics, including the share of OA publications, the share of Gold OA within OA (both with and without Diamond OA), and the share of Diamond OA within OA. Retrieved data are stored as JSON for reproducibility. Results are exported as a formatted Excel report and also printed to the console. The script can be run as is to analyze peer-reviewed journal articles published by members of Technical University of Darmstadt (TUDa), RWTH Aachen University (RWTH), and Karlsruhe Institute of Technology (KIT) during 2018–2025.

The script oa_analysis_2026.py is an updated version of the original script OpenAlex_statistics.py (Preuß, Waldecker, 2024, https://doi.org/10.48328/tudatalib-1391.2). It follows a modular pipeline architecture (data retrieval, analysis, export), which improves readability, maintainability, and reproducibility. It includes a retry mechanism for API requests, structured logging, and error handling for incomplete or corrupted data. Informative error messages and recovery suggestions make the workflow more robust and help identify data processing issues systematically. Institutional identifiers are now predefined as OpenAlex IDs to ensure consistent and reproducible analysis. Compared to the earlier version, this implementation classifies OA status strictly by the OpenAlex oa_status field and explicitly distinguishes Gold OA from Diamond OA. It also provides a more detailed set of metrics and improved output formats, making the results easier to reuse for reporting and further analysis.
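The metrics named above can be illustrated with a minimal sketch. It is not taken from oa_analysis_2026.py: the function name oa_metrics, the flat record layout with an "oa_status" key, and the set of status values treated as OA are assumptions made for illustration only.

```python
from collections import Counter

# Assumption: these oa_status values count as Open Access.
# The actual script may group or name categories differently.
OA_STATUSES = {"diamond", "gold", "green", "hybrid", "bronze"}


def oa_metrics(works):
    """Compute OA shares from an iterable of dicts with an 'oa_status' key.

    Hypothetical helper for illustration; records without a status are
    counted as non-OA under the label 'unknown'.
    """
    counts = Counter(w.get("oa_status") or "unknown" for w in works)
    total = sum(counts.values())
    oa_total = sum(counts[s] for s in OA_STATUSES)

    def share(part, whole):
        # Avoid division by zero when no publications (or no OA ones) exist.
        return part / whole if whole else 0.0

    return {
        "total": total,
        "oa_total": oa_total,
        "oa_share": share(oa_total, total),
        "gold_incl_diamond_in_oa": share(counts["gold"] + counts["diamond"], oa_total),
        "gold_excl_diamond_in_oa": share(counts["gold"], oa_total),
        "diamond_in_oa": share(counts["diamond"], oa_total),
    }
```

Calling oa_metrics on a list such as [{"oa_status": "gold"}, {"oa_status": "closed"}] returns a dictionary of absolute counts and relative shares, which mirrors the combination of absolute and relative metrics in the report.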
While the original script relies on simple console output (print statements), the updated implementation uses structured logging with multiple severity levels (e.g., INFO, WARNING, ERROR), which improves traceability, debugging, and monitoring of the analysis process. Results are exported to structured Excel files with multiple worksheets, formatted headers, and both absolute and relative metrics, so they can be reused directly in reports and further analysis. Overall, oa_analysis_2026.py is a more robust and methodologically rigorous implementation. Its key improvements are a modular design, enhanced reproducibility, precise OA classification, improved data validation, and comprehensive metric reporting, which together contribute to more reliable and valid analytical results.

This repository includes the script, a README file, a user guide, the retrieved data, and the generated Excel report. For installation, requirements, and usage, please refer to the README and user guide.

The authors thank Anne-Christine Günther and Harald Gerlach for their feedback on the approach and implementation.
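As an illustration of how a retry mechanism and severity-level logging can work together, the following sketch wraps an arbitrary request function. It is an assumption-based example, not the code in oa_analysis_2026.py: the name call_with_retries, the logger name, the exponential backoff, and the default parameters are all hypothetical.

```python
import logging
import time

# Hypothetical logger name; the script's own logging setup may differ.
logger = logging.getLogger("oa_analysis_sketch")


def call_with_retries(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    Failed attempts are logged as WARNING, the final failure as ERROR,
    and a success as INFO. The wait doubles after each failed attempt
    (an assumed backoff policy).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = fetch()
            logger.info("Request succeeded on attempt %d", attempt)
            return result
        except Exception as exc:  # broad on purpose in this sketch
            if attempt == max_attempts:
                logger.error("Giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning(
                "Attempt %d failed (%s); retrying in %.1f s", attempt, exc, delay
            )
            sleep(delay)
```

The request itself is passed in as a callable, so the sketch runs without network access; in practice fetch would perform the API call. Calling logging.basicConfig(level=logging.INFO) beforehand makes all three severity levels visible on the console.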

License

Except where otherwise noted, this item is licensed under CC BY 4.0 (Attribution 4.0 International).