PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
Title: PaddleOCR-VL-1.6: Advancing Document Parsing via Targeted Region Refinement and Sequential Post-Training
Abstract: This paper presents PaddleOCR-VL-1.6, an enhanced, lightweight document parsing model derived from PaddleOCR-VL-1.5. The predecessor established a robust baseline at 0.9B parameters, but its residual errors were concentrated in under-optimized regions characterized by unstable model dynamics, insufficient data representation, or inconsistent supervision. Rather than indiscriminately scaling the training dataset, PaddleOCR-VL-1.6 adopts a region-centric data optimization strategy: it isolates the weak regions identified in the prior model, applies focused improvements to them, and makes the supervision signals more reliable. The model also uses a sequential post-training methodology based on curated data selection and reinforcement learning, improving performance through phased optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art result of 96.33% on OmniDocBench v1.6, is competitive with leading VLMs, and offers a practical post-training framework for the PaddleOCR-VL lineage.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





