End-to-End Text Line Detection and Ordering
Title: Integrated Detection and Sequencing of Text Lines
Abstract:
Standard pipelines for recognizing text in historical documents usually split layout analysis into two separate stages: detecting text lines, then determining their reading order. The ordering stage typically relies on hand-coded geometric rules, which often fail on marginalia, multi-column layouts, tables, and source-specific editorial conventions. To address these limitations, this paper presents Orli (Ordered Regression of Lines), an end-to-end framework that unifies both tasks as a single image-to-sequence problem. Given a page image, Orli autoregressively generates text-line baselines directly in reading order. Each baseline uses a chord-frame parameterization that defines the line’s position, orientation, and length, with perpendicular offsets capturing local geometry. The final curve is produced by an iterative refinement head combined with a local visual refiner.
Trained on a diverse dataset of 196,691 pages covering ten different writing systems, Orli slightly surpasses the current state of the art for cBAD line detection without requiring dataset-specific training. It achieves near-perfect coverage and ordering accuracy across various reading-order benchmarks in a zero-shot setting. Furthermore, the model can adapt to specialized out-of-domain layouts with minimal fine-tuning. The source code and model weights are publicly accessible under an open license at https://github.com/mittagessen/orli.
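To make the chord-frame idea in the abstract more concrete, the following is a minimal illustrative sketch, not Orli's actual implementation. It assumes a baseline is encoded as a chord midpoint, a chord angle, a chord length, and a list of perpendicular offsets sampled at evenly spaced positions along the chord. The function name `decode_chord_frame`, the even spacing, and the sign convention of the normal are all assumptions made for illustration; the paper's exact encoding may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def decode_chord_frame(
    cx: float,
    cy: float,
    theta: float,
    length: float,
    offsets: Sequence[float],
) -> List[Point]:
    """Convert a hypothetical chord-frame encoding into polyline points.

    (cx, cy): chord midpoint, theta: chord angle in radians,
    length: chord length, offsets: signed perpendicular distances
    sampled at evenly spaced positions from chord start to chord end.
    """
    if len(offsets) < 2:
        raise ValueError("need at least two offsets (one per chord endpoint)")

    # Unit vector along the chord and its normal (rotated by +90 degrees).
    ux, uy = math.cos(theta), math.sin(theta)
    nx, ny = -uy, ux

    n = len(offsets)
    points: List[Point] = []
    for i, d in enumerate(offsets):
        # Signed distance from the midpoint along the chord.
        t = (i / (n - 1) - 0.5) * length
        points.append((cx + t * ux + d * nx, cy + t * uy + d * ny))
    return points


# A horizontal 100-unit chord centered at (50, 20) with a bulge in the middle.
baseline = decode_chord_frame(50.0, 20.0, 0.0, 100.0, [0.0, 5.0, 0.0])
```

In this sketch the chord carries the global placement of the line, while the offsets only describe how the baseline deviates from a straight segment, so a perfectly straight line has all offsets equal to zero.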
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




