APEX-SQL: Talking to the data via Agentic Exploration for Text-to-SQL
Title: APEX-SQL: Enabling Data Interaction through Agentic Exploration for Text-to-SQL
Abstract:
While Large Language Model-driven Text-to-SQL systems have achieved high scores on academic benchmarks, they often falter when applied to complex enterprise contexts. This bottleneck stems largely from their dependence on static schema definitions, which are insufficient for resolving semantic ambiguities or scaling to large, intricate databases. To overcome these challenges, we introduce APEX-SQL, an Agentic Text-to-SQL Framework that transitions the field from passive translation to active, agentic exploration.
Central to our approach is a hypothesis-verification loop that anchors the model’s reasoning in actual data. During schema linking, we utilize logical planning to articulate hypotheses, employ dual-pathway pruning to narrow the search space, and conduct parallel data profiling to verify column roles against real-world data. This process culminates in global synthesis to guarantee topological connectivity. For the SQL generation phase, we implement a deterministic mechanism to retrieve exploration directives. This enables the agent to effectively navigate data distributions, refine its hypotheses, and produce semantically precise SQL queries.
On BIRD (70.65% execution accuracy) and Spider 2.0-Snow (51.01% execution accuracy), APEX-SQL surpasses competitive baselines while using fewer tokens. Further analysis indicates that agentic exploration serves as a performance multiplier, unlocking the latent reasoning capabilities of foundation models in enterprise scenarios. Ablation studies validate the essential role of each component in ensuring robust and accurate data analysis. The source code is available at https://github.com/Tencent/APEX-SQL-Project.
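Illustrative sketch (not taken from the paper or its repository): the abstract describes verifying column roles against actual data during schema linking. The snippet below shows one minimal way such a check could look, assuming a SQLite database and a hypothesis of the form "the literal mentioned in the question lives in one of these candidate columns". The table `orders`, its columns, and the helpers `build_demo_db`, `profile_column`, and `matching_columns` are all hypothetical names introduced for this example. APEX-SQL's actual profiling logic may differ substantially.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical example data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, region TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "shipped", "EMEA"),
            (2, "pending", "APAC"),
            (3, "shipped", "APAC"),
            (4, None, "EMEA"),
        ],
    )
    return conn


def profile_column(conn, table, column, sample_size=5):
    """Collect row counts and a few distinct sample values for one column.

    Assumes `table` and `column` come from a trusted schema listing,
    since identifiers cannot be bound as SQL parameters.
    """
    total, non_null, distinct = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    ).fetchone()
    samples = [
        row[0]
        for row in conn.execute(
            f'SELECT DISTINCT "{column}" FROM "{table}" '
            f'WHERE "{column}" IS NOT NULL ORDER BY "{column}" LIMIT ?',
            (sample_size,),
        )
    ]
    return {
        "total": total,
        "non_null": non_null,
        "distinct": distinct,
        "samples": samples,
    }


def matching_columns(conn, table, candidates, literal):
    """Return the candidate columns in which `literal` actually occurs.

    This is the verification step: a hypothesis such as "'shipped' is a
    value of `region`" is rejected if no row supports it.
    """
    verified = []
    for column in candidates:
        hit = conn.execute(
            f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1',
            (literal,),
        ).fetchone()
        if hit is not None:
            verified.append(column)
    return verified


if __name__ == "__main__":
    conn = build_demo_db()
    print(profile_column(conn, "orders", "status"))
    print(matching_columns(conn, "orders", ["status", "region"], "shipped"))
```

In this toy setting, a question such as "how many shipped orders are there?" would be linked to `status` rather than `region`, because only `status` contains the literal. An agent could run such checks for several candidate columns in parallel and feed the surviving hypotheses into SQL generation.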
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC






