arXiv

RenoBench: A Citation Parsing Benchmark

Abstract:

Machine-readable scholarly infrastructure depends on accurate citation parsing. Despite sustained attention to the problem, existing evaluation methods are often limited: they generalize poorly, rely on synthetic data, or are not publicly available. To address these gaps, we present RenoBench, a public benchmark for citation parsing built from PDFs drawn from four publishing ecosystems: Open Research Europe, the Public Knowledge Project, Redalyc, and SciELO.

Starting from 161,000 annotated citations, we applied automated validation and feature-based sampling to curate a dataset of 10,000 citations that covers a range of languages, publication formats, and platforms. We then evaluated several citation parsing systems and report field-level precision and recall. The results indicate that language models, particularly fine-tuned ones, perform strongly. RenoBench enables reproducible, standardized evaluation of citation parsing tools and provides a basis for progress in automated citation parsing and metascientific research.
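To make the field-level precision and recall mentioned above concrete, the following is a minimal sketch of how such scores could be computed. It is not the benchmark's evaluation code: the function names, the dictionary representation of a parsed citation, and the matching rule (exact match after lowercasing and whitespace normalization) are all assumptions made for illustration, and the abstract does not specify how RenoBench compares field values.

```python
from collections import Counter


def normalize(value):
    # Assumed matching rule: case-insensitive, whitespace-collapsed exact match.
    return " ".join(str(value).lower().split())


def field_level_scores(gold, predicted):
    """Compute per-field precision and recall.

    `gold` and `predicted` are parallel lists of dicts mapping a field name
    (e.g. "title", "year") to its string value. Hypothetical data layout.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        for field in set(g) | set(p):
            g_val, p_val = g.get(field), p.get(field)
            if (
                g_val is not None
                and p_val is not None
                and normalize(g_val) == normalize(p_val)
            ):
                tp[field] += 1
            else:
                if p_val is not None:
                    fp[field] += 1  # predicted a value that is wrong or spurious
                if g_val is not None:
                    fn[field] += 1  # gold value was missed or mismatched
    scores = {}
    for field in set(tp) | set(fp) | set(fn):
        n_pred = tp[field] + fp[field]
        n_gold = tp[field] + fn[field]
        scores[field] = {
            "precision": tp[field] / n_pred if n_pred else 0.0,
            "recall": tp[field] / n_gold if n_gold else 0.0,
        }
    return scores


# Toy example with made-up citations (not taken from the dataset).
gold = [
    {"title": "Deep Parsing", "year": "2020"},
    {"title": "Citation Graphs", "year": "2019"},
]
predicted = [
    {"title": "deep  parsing", "year": "2020"},
    {"title": "Citation Graphs", "volume": "3"},
]
scores = field_level_scores(gold, predicted)
```

In this toy example the parser recovers both titles, finds one of the two years, and emits a volume that the gold annotation does not contain, so `year` has perfect precision but a recall of 0.5, while `volume` scores zero on both. Reporting the two metrics per field separates parsers that omit fields from parsers that hallucinate them.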


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
