arXiv

RenoBench: A Citation Parsing Benchmark

June 2, 2026 · Parth Sarin, Juan Pablo Alperin, Adam Buttrick, Dione Mentis

Abstract:

Machine-readable scholarly infrastructure depends on accurate citation parsing. Despite continued attention to this problem, existing evaluation methods are often limited: they generalize poorly, rely on synthetic data, or are not publicly available. To address these gaps, we present RenoBench, a public benchmark for citation parsing. The dataset is derived from PDFs collected from four publishing ecosystems: Open Research Europe, the Public Knowledge Project, Redalyc, and SciELO.

Starting from 161,000 annotated citations, we used automated validation and feature-based sampling to curate a dataset of 10,000 citations that covers a range of languages, publication formats, and platforms. We then evaluated several citation parsing systems and report field-level precision and recall for each. The results indicate that language models, especially fine-tuned ones, perform strongly. RenoBench enables reproducible, standardized evaluation of citation parsing tools and provides a foundation for progress in automated citation parsing and metascientific research.
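To make "field-level precision and recall" concrete, here is a minimal sketch of one way such metrics could be computed. It is not the benchmark's actual scoring procedure: the function names, the field names (`author`, `year`, `title`), and the choice of exact matching after case and whitespace normalization are all assumptions made for illustration.

```python
from collections import defaultdict


def _norm(value):
    """Normalize a field value: collapse whitespace and ignore case (an assumption)."""
    return " ".join(value.split()).casefold() if value else ""


def field_level_scores(gold, predicted):
    """Per-field precision and recall over parallel lists of parsed citations.

    Each citation is a dict mapping a field name to a string value.
    A predicted value counts as correct only if it equals the gold value
    after normalization; a wrong value is both a false positive and a
    false negative for that field.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_cit, pred_cit in zip(gold, predicted):
        for field in set(gold_cit) | set(pred_cit):
            g_val = _norm(gold_cit.get(field))
            p_val = _norm(pred_cit.get(field))
            if g_val and p_val and g_val == p_val:
                tp[field] += 1
            else:
                if p_val:
                    fp[field] += 1  # parser emitted a value that is not correct
                if g_val:
                    fn[field] += 1  # gold value was missed or mis-parsed

    scores = {}
    for field in set(tp) | set(fp) | set(fn):
        pred_total = tp[field] + fp[field]
        gold_total = tp[field] + fn[field]
        scores[field] = {
            "precision": tp[field] / pred_total if pred_total else 0.0,
            "recall": tp[field] / gold_total if gold_total else 0.0,
        }
    return scores
```

Scoring each field separately shows where a parser fails (for example, years versus author names) instead of collapsing everything into a single per-citation accuracy figure.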

Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC