What to Format and How: A Benchmark and Workflow Approach for Document Formatting
Title: Strategies for Document Formatting: A Benchmark and Workflow Framework
Abstract:
The emergence of large language models (LLMs) has unlocked new potential for automating document formatting. In practice, however, formatting tasks frequently require identifying specific targets by analyzing the document’s content. This content-dependent setting remains underexplored and difficult, largely because no specialized evaluation datasets exist. To address this gap and enable assessment in realistic, content-aware settings, we present DocFormBench. This benchmark extends Text-to-Format evaluation to a wide array of formatting needs and includes metrics that measure both accuracy and efficiency.
To reduce the redundant document reading common in current methods, we introduce DocFormFlow. This workflow-based approach separates target localization from the execution of modifications, distinguishing "what" needs to be formatted from "how" it should be done. Extensive experiments across various LLMs and multimodal models show that DocFormFlow consistently improves formatting accuracy and lowers token usage compared with leading baseline methods. Further analysis indicates that the precision of target localization is the most significant determinant of overall formatting performance. We anticipate that DocFormBench and DocFormFlow will support future work on more intelligent and dependable document formatting systems.
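The separation of "what" from "how" described above can be illustrated with a minimal Python sketch. It is not the DocFormFlow implementation: the `Paragraph` class, the `locate_targets` and `apply_format` functions, and the heading rule are all hypothetical names and assumptions introduced for illustration, and a simple predicate stands in for the model-driven localization step.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Paragraph:
    # Hypothetical document unit: raw text plus a dict of style attributes.
    text: str
    style: dict = field(default_factory=dict)


def locate_targets(doc: list[Paragraph],
                   is_target: Callable[[Paragraph], bool]) -> list[int]:
    """'What' stage: read the content once and return the target indices."""
    return [i for i, p in enumerate(doc) if is_target(p)]


def apply_format(doc: list[Paragraph], targets: list[int], style: dict) -> None:
    """'How' stage: modify only the located paragraphs, by index."""
    for i in targets:
        doc[i].style.update(style)


doc = [
    Paragraph("1. Introduction"),
    Paragraph("Large language models can automate formatting."),
    Paragraph("2. Method"),
]

# Assumed content-dependent rule: a heading starts with a digit and has ". ".
targets = locate_targets(doc, lambda p: p.text[:1].isdigit() and ". " in p.text)
apply_format(doc, targets, {"bold": True})
```

Because the second stage receives only indices and a style, it never needs to re-read the document content, which is one plausible way the redundant reading mentioned above could be avoided.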
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





