Global News Digest

arXiv

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

Title: SIRIUS-SQL: Leveraging Execution Feedback to Anchor Multi-Candidate Text-to-SQL Generation

Abstract:

Generating accurate SQL queries for complex schemas remains a challenge when relying on single-pass approaches. To mitigate this, contemporary systems often produce multiple SQL candidates and employ voting mechanisms to discard errors. However, voting in isolation proves insufficient due to three interconnected limitations inherent in multi-candidate strategies. First, drawing additional samples from a single generator yields increasingly redundant outputs. Second, current pipelines typically apply a uniform correction to all non-clean execution results, ignoring the fact that runtime errors, timeouts, and empty results signify varying degrees of deviation from the correct answer. Third, existing selection methods depend on a singular perspective—such as majority voting on results or pairwise SQL comparisons—failing to capture insights from other analytical angles.
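The second limitation, treating every non-clean execution alike, can be made concrete with a small sketch. The Python below is an illustration only, not the paper's code: it assumes a SQLite backend, and the function name `classify_execution`, the four category labels, and the `timeout_s` parameter are all hypothetical. It only separates clean results, empty results, runtime errors, and timeouts so that a later stage could handle each differently.

```python
import sqlite3
import time


def classify_execution(conn, sql, timeout_s=2.0):
    """Run `sql` and return (category, rows).

    category is one of 'ok', 'empty', 'error', 'timeout' (hypothetical labels).
    """
    deadline = time.monotonic() + timeout_s
    timed_out = False

    def _check_deadline():
        # Returning a non-zero value tells SQLite to abort the running query.
        nonlocal timed_out
        if time.monotonic() > deadline:
            timed_out = True
            return 1
        return 0

    # Invoke the handler every 1000 SQLite virtual-machine instructions.
    conn.set_progress_handler(_check_deadline, 1000)
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        # An aborted query also raises, so the flag tells the two cases apart.
        return ("timeout" if timed_out else "error"), None
    finally:
        conn.set_progress_handler(None, 0)  # clear the handler
    return ("ok", rows) if rows else ("empty", rows)
```

A caller could then route each category to a different repair step, for example by passing the error message back to the generator for runtime errors but relaxing filters for empty results. That routing is an assumption about how such a pipeline might be wired, not a description of SIRIUS-SQL itself.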

SIRIUS-SQL is introduced to resolve these three specific weaknesses. The system utilizes a difficulty-smoothing reinforcement learning (RL) framework to train SIRIUS-32B, enabling the generation of diverse, executable SQL candidates. This specialist model is complemented by a generalist LLM designed to address any gaps left by the primary generator. The pipeline incorporates an execution-grounded lifecycle that categorizes each outcome and applies precise repairs before returning viable candidates to the pool. Furthermore, a confidence-gated hybrid selector merges execution-result consensus with pairwise SQL-form evaluation, resorting to a deterministic structural check only for closely contested cases.
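The confidence-gated selection idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `gate` threshold of 0.6, the `judge` callback standing in for pairwise SQL-form evaluation, and the shortest-query tie-break standing in for the deterministic structural check are all hypothetical choices made for this example.

```python
from collections import defaultdict


def select_candidate(candidates, judge, gate=0.6):
    """Pick one SQL string from `candidates`, a list of (sql, rows) pairs.

    judge(sql_a, sql_b) is a hypothetical pairwise comparator: it returns a
    positive number if sql_a is preferred, negative if sql_b, 0 if undecided.
    """
    if not candidates:
        raise ValueError("no candidates to select from")

    # Group candidates whose execution results match, ignoring row order.
    groups = defaultdict(list)
    for sql, rows in candidates:
        key = tuple(sorted(map(repr, rows)))
        groups[key].append(sql)

    # Largest group first; sorted() is stable, so ties keep insertion order.
    ranked = sorted(groups.values(), key=len, reverse=True)
    share = len(ranked[0]) / len(candidates)

    # Confident consensus: accept the majority result without further checks.
    if share >= gate or len(ranked) == 1:
        return ranked[0][0]

    # Contested case: compare representatives of the two largest groups.
    a, b = ranked[0][0], ranked[1][0]
    verdict = judge(a, b)
    if verdict > 0:
        return a
    if verdict < 0:
        return b

    # Still undecided: deterministic fallback (assumed rule: shortest query,
    # then lexicographic order).
    return min(a, b, key=lambda s: (len(s), s))
```

With three candidates where two return the same rows, the vote share is 2/3, which clears the assumed gate, so the comparator is never called. With an even split, the decision falls to `judge`, and to the deterministic fallback only if `judge` returns 0.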

In evaluations, SIRIUS-SQL achieved a score of 75.88% on the BIRD development set and 91.20% on the SPIDER test set. Notably, two out of three generalist pairings outperformed Agentar-Scale-SQL, which currently stands as the most robust published multi-candidate system on the BIRD development benchmark.


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
