CAPER: Clause-Aligned Process Supervision for Text-to-SQL
Abstract:
Text-to-SQL systems are typically evaluated by query-level execution correctness. This final signal, however, offers limited insight into which intermediate SQL decisions caused a success or a failure. Dense token-level supervision is also ineffective: individual SQL tokens rarely correspond to complete semantic decisions, token-level labels can unfairly penalize execution-equivalent queries, and such labels are difficult to produce reliably at scale.
To address these issues, we introduce CAPER, which automatically generates clause-level supervision through counterfactual interventions on the SQL abstract syntax tree (AST), enabling root-cause error localization for reward modeling. We use the generated data to train CAPER-9B, a lightweight clause-level process reward model (Clause-PRM) that provides clause-boundary feedback for both policy optimization and candidate verification.
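The abstract does not spell out the intervention procedure, so the following is only a minimal sketch of one possible reading: swap a single gold clause into a failing query and check whether execution then matches the gold result. The clause dictionary representation, the names `render`, `execute`, and `label_clauses`, the order-insensitive result comparison, and the toy `singer` table are all assumptions made for illustration, not the authors' implementation, which operates on a parsed AST.

```python
import sqlite3

# Hypothetical clause inventory; a real system would walk a parsed SQL AST.
CLAUSE_ORDER = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]

def render(clauses):
    """Assemble a SQL string from a {clause keyword: body} mapping."""
    return " ".join(f"{k} {clauses[k]}" for k in CLAUSE_ORDER if clauses.get(k))

def execute(conn, clauses):
    """Return rows in an order-insensitive form, or None if the query fails."""
    try:
        rows = conn.execute(render(clauses)).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)

def label_clauses(conn, predicted, gold):
    """Label each clause 1 (not implicated) or 0 (swapping it fixes the query)."""
    target = execute(conn, gold)
    names = [c for c in CLAUSE_ORDER if c in predicted or c in gold]
    if execute(conn, predicted) == target:
        return {name: 1 for name in names}  # already correct: nothing to blame
    labels = {}
    for name in names:
        patched = dict(predicted)
        patched[name] = gold.get(name)  # counterfactual: swap in one gold clause
        labels[name] = 0 if execute(conn, patched) == target else 1
    return labels

# Toy database and query pair (assumed data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("A", 30), ("B", 20), ("C", 40)])

gold = {"SELECT": "name", "FROM": "singer", "WHERE": "age > 25"}
predicted = {"SELECT": "name", "FROM": "singer", "WHERE": "age > 35"}
labels = label_clauses(conn, predicted, gold)
print(labels)
```

In this toy case only the WHERE clause is labeled 0, since replacing it alone restores the gold result. Single-clause swaps cannot isolate failures that need several clauses fixed at once, which is one reason this should be read as a sketch and not as the full method.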
Evaluations on the BIRD and Spider datasets show that clause-aligned supervision significantly improves execution accuracy (EX), with a relative improvement of up to 15.3% in EX over GPT-5.4. It also strengthens failure localization, reaching 84.53% accuracy and 90.60% mean reciprocal rank (MRR) on held-out failures. Further details are available on our project page: https://github.com/banrichard/RL-NL2SQL.
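For readers unfamiliar with the localization metrics, the sketch below computes top-1 accuracy and MRR using their standard definitions. It assumes each failed query comes with a ranked list of suspect clauses and a single true faulty clause; the function name `localization_metrics` and the example data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def localization_metrics(rankings, true_clauses):
    """Return (top-1 accuracy, mean reciprocal rank) over a set of failures.

    rankings: one list of clause names per failure, most suspicious first.
    true_clauses: the faulty clause for each failure, in the same order.
    """
    if not true_clauses:
        raise ValueError("need at least one failure to evaluate")
    hits, rr_sum = 0, 0.0
    for ranked, truth in zip(rankings, true_clauses):
        if ranked and ranked[0] == truth:
            hits += 1  # the top-ranked clause is the faulty one
        if truth in ranked:
            rr_sum += 1.0 / (ranked.index(truth) + 1)  # reciprocal of 1-based rank
    n = len(true_clauses)
    return hits / n, rr_sum / n

# Two assumed failures: the faulty WHERE clause is ranked 1st, then 2nd.
rankings = [["WHERE", "SELECT", "FROM"], ["SELECT", "WHERE", "FROM"]]
true_clauses = ["WHERE", "WHERE"]
accuracy, mrr = localization_metrics(rankings, true_clauses)
print(accuracy, mrr)  # 0.5 0.75
```

A faulty clause missing from the ranking contributes zero to the MRR here, which is a common convention but still an assumption about how such cases are scored.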
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





