arXiv

From Graph Retrieval to Schema Realization: Counterfactual Validation for Text-to-SPARQL over Heterogeneous Knowledge Graphs

June 2, 2026 · Yang Zhao, Chengxiao Dai, Yue Xiu, Dusit Niyato

Abstract:

Text-to-SPARQL translates natural-language questions into executable SPARQL queries over RDF knowledge graphs. Conventional benchmarks usually assume a single, fixed target graph, but real-world Knowledge Graph Question Answering (KGQA) often involves collections of heterogeneous graphs with divergent schemas, partial alignments, and sparse metadata. In such settings, query generation is more than a matter of SPARQL syntax: the system must first identify a graph whose schema can express the predicates, entity types, joins, filters, and constraints the question requires.
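To make the idea of a schema that can "accommodate" a question concrete, here is a minimal sketch, assuming (hypothetically) that both a graph schema and a question's requirements are encoded as sets of (domain type, predicate, range type) patterns. `GraphSchema`, `missing_patterns`, and `is_realizable` are illustrative names, not part of SchemaForge, and a real check would also have to reason about joins across patterns and filter-compatible literal types.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of a schema pattern: (domain type, predicate, range type).
Pattern = tuple[str, str, str]


@dataclass
class GraphSchema:
    """Illustrative stand-in for one graph's schema in a heterogeneous collection."""
    name: str
    patterns: set[Pattern] = field(default_factory=set)


def missing_patterns(required: set[Pattern], schema: GraphSchema) -> set[Pattern]:
    """Required patterns the schema cannot express; an empty set means realizable."""
    return required - schema.patterns


def is_realizable(required: set[Pattern], schema: GraphSchema) -> bool:
    """True when every pattern the question needs is present in the schema."""
    return not missing_patterns(required, schema)


# "Which films directed by Christopher Nolan were released after 2010?"
# needs a director edge plus a date-typed literal that a FILTER can compare.
required = {("Film", "director", "Person"), ("Film", "releaseDate", "xsd:date")}
movies = GraphSchema("movies_kg", {("Film", "director", "Person"),
                                   ("Film", "releaseDate", "xsd:date")})
people = GraphSchema("people_kg", {("Film", "director", "Person"),
                                   ("Person", "birthPlace", "City")})

for schema in (movies, people):
    print(schema.name, is_realizable(required, schema), missing_patterns(required, schema))
# movies_kg True set()
# people_kg False {('Film', 'releaseDate', 'xsd:date')}
```

The second graph overlaps the question through the director edge but cannot express the date filter, which is why a partially aligned graph is not automatically a usable query target.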

To address this, we introduce SchemaForge, an agentic, schema-grounded framework for Text-to-SPARQL over heterogeneous KG collections. At its core is question-conditioned schema-slice alignment: weak graph evidence first narrows the set of plausible candidate graphs, and stronger schema evidence then verifies whether a local schema slice can realize the intended query. The selected slice constrains both query generation and verification before execution. When only a single graph is available, the approach reduces to standard single-KG Text-to-SPARQL with schema grounding.
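The coarse-to-fine flow described above can be sketched as follows. This is an illustration under stated assumptions, not SchemaForge's implementation: weak graph evidence is approximated by lexical overlap between question tokens and graph labels, a "schema slice" is reduced to the set of schema predicates the question touches, and verification is plain coverage of the required predicates. The names `weak_score`, `align`, `Graphs`, and the shortlist size `k` are hypothetical.

```python
import re

# Hypothetical collection format: name -> (labels, schema predicates).
# Labels stand in for sparse graph metadata (weak evidence); predicates stand in
# for the graph's schema (strong evidence).
Graphs = dict[str, tuple[set[str], set[str]]]


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens, splitting camelCase labels such as 'releaseDate'."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z]+", text.lower()))


def weak_score(question: str, labels: set[str]) -> float:
    """Cheap graph-level evidence: fraction of question tokens that occur in the labels."""
    q = tokens(question)
    label_tokens = set().union(*(tokens(label) for label in labels))
    return len(q & label_tokens) / max(len(q), 1)


def align(question: str, required: set[str], graphs: Graphs, k: int = 3):
    """Shortlist the top-k graphs by weak evidence, then return the first
    (graph name, schema slice) whose slice covers every required predicate,
    or None if no shortlisted graph can realize the query."""
    ranked = sorted(graphs, key=lambda g: weak_score(question, graphs[g][0]), reverse=True)
    for name in ranked[:k]:
        schema_slice = required & graphs[name][1]  # part of the schema the question touches
        if schema_slice == required:               # strong evidence: the slice realizes the query
            return name, schema_slice
    return None


graphs: Graphs = {
    "movies_kg": ({"Film", "director", "releaseDate"}, {"director", "releaseDate", "genre"}),
    "people_kg": ({"Person", "birthPlace", "director"}, {"director", "birthPlace"}),
    "geo_kg": ({"City", "population"}, {"population"}),
}
q = "Which film by director Christopher Nolan has the latest release date?"

print(align(q, {"director", "releaseDate"}, graphs, k=2))      # selects movies_kg
print(align(q, {"director", "birthPlace"}, graphs, k=1))       # None
print(align(q, {"director", "birthPlace"}, graphs, k=2))       # selects people_kg
print(align(q, {"population"}, {"geo_kg": graphs["geo_kg"]}))  # single-KG case: geo_kg
```

Requesting `{"director", "birthPlace"}` with `k=1` returns `None` because the lexically closest graph (`movies_kg`) lacks `birthPlace`; widening the shortlist to `k=2` lets the schema check select `people_kg` instead. With a one-graph collection the shortlist is just that graph, so only the schema check remains, mirroring the single-KG reduction.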

We evaluate SchemaForge on four public benchmarks: LC-QuAD 2.0, QALD-9 Plus, QALD-10, and Spider4SPARQL. Averaged across these datasets, SchemaForge outperforms the strongest matched agent baseline by 11.50 percentage points in execution accuracy. On Spider4SPARQL, it raises execution accuracy from 54.86% to 64.18%, a gain of 9.32 points. It also achieves graph allocation accuracies of 73.0% at Top-1 and 97.0% at Top-3. These results indicate that moving from weak graph evidence to schema-specific query commitments, reinforced by counterfactual answer-set checks, substantially improves the generation of executable queries over heterogeneous knowledge graphs.
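The abstract reports execution accuracy and Top-1/Top-3 graph allocation accuracy without defining them. The sketch below assumes common definitions: a question counts as correct when the predicted query's answer set equals the gold answer set (a failed execution counts as wrong), and allocation is correct at Top-k when the gold graph appears among the first k ranked graphs. The function names and toy data are illustrative only and are not the paper's evaluation code or results.

```python
from typing import Hashable, Iterable, Optional, Sequence


def execution_accuracy(predicted: Sequence[Optional[Iterable[Hashable]]],
                       gold: Sequence[Iterable[Hashable]]) -> float:
    """Share of questions whose predicted answers equal the gold answers as sets.
    A failed or non-executable query is represented by None and counts as wrong."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    hits = sum(p is not None and set(p) == set(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


def top_k_allocation_accuracy(rankings: Sequence[Sequence[str]],
                              gold_graphs: Sequence[str], k: int) -> float:
    """Share of questions whose gold graph is among the first k ranked graphs."""
    if len(rankings) != len(gold_graphs) or not gold_graphs:
        raise ValueError("need equally sized, non-empty ranking and gold lists")
    hits = sum(gold in ranking[:k] for ranking, gold in zip(rankings, gold_graphs))
    return hits / len(gold_graphs)


# Toy data for illustration only (not results from the paper).
predicted = [{"Q1"}, {"Q2", "Q3"}, None, {"Q4"}]
gold = [{"Q1"}, {"Q3", "Q2"}, {"Q5"}, {"Q6"}]
rankings = [["kg_a", "kg_b", "kg_c"], ["kg_b", "kg_a", "kg_c"],
            ["kg_c", "kg_b", "kg_a"], ["kg_a", "kg_c", "kg_b"]]
gold_graphs = ["kg_a", "kg_a", "kg_a", "kg_b"]

print(execution_accuracy(predicted, gold))                    # 0.5
print(top_k_allocation_accuracy(rankings, gold_graphs, k=1))  # 0.25
print(top_k_allocation_accuracy(rankings, gold_graphs, k=3))  # 1.0
```

Under this definition Top-k allocation accuracy can only stay the same or rise as k grows, which is consistent with the reported gap between 73.0% at Top-1 and 97.0% at Top-3.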

Source: arXiv