Global News Digest

arXiv

From Graph Retrieval to Schema Realization: Counterfactual Validation for Text-to-SPARQL over Heterogeneous Knowledge Graphs

Abstract:

Text-to-SPARQL is the task of translating natural-language questions into executable SPARQL queries over RDF knowledge graphs. Conventional benchmarks typically assume a fixed, pre-defined target graph, but real-world Knowledge Graph Question Answering (KGQA) often requires navigating collections of heterogeneous graphs with divergent schemas, partial alignments, and sparse metadata. In such settings, generating a query takes more than correct SPARQL syntax: the system must first identify a graph schema that can accommodate the predicates, entity types, joins, filters, and constraints the user's question requires.

To address this, we introduce SchemaForge, a schema-grounded agentic framework for Text-to-SPARQL over heterogeneous KG collections. At its core is question-conditioned schema-slice alignment. The framework first uses weak graph evidence to narrow the set of plausible candidate graphs, then uses stronger schema evidence to verify whether a local schema slice can realize the intended query. The selected schema slice then constrains both query generation and pre-execution verification. When only a single graph is available, the approach reduces to standard single-KG Text-to-SPARQL with schema grounding.
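The two-stage selection described above can be illustrated with a minimal sketch. This is not SchemaForge's implementation: the `GraphSchema` container, the token-overlap score used as "weak graph evidence", and the subset test used as "schema evidence" are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GraphSchema:
    """Hypothetical container for one knowledge graph in the collection."""
    name: str
    description: str  # sparse metadata, used only as weak evidence
    predicates: set[str] = field(default_factory=set)
    classes: set[str] = field(default_factory=set)


def weak_score(question: str, graph: GraphSchema) -> int:
    """Assumed weak evidence: token overlap between question and metadata."""
    q_tokens = set(question.lower().split())
    d_tokens = set(graph.description.lower().split())
    return len(q_tokens & d_tokens)


def shortlist(question: str, graphs: list[GraphSchema], k: int = 3) -> list[GraphSchema]:
    """Stage 1: keep the k graphs with the highest weak-evidence score."""
    ranked = sorted(graphs, key=lambda g: weak_score(question, g), reverse=True)
    return ranked[:k]


def schema_slice(required_predicates: set[str], required_classes: set[str],
                 graph: GraphSchema) -> dict | None:
    """Stage 2: return a slice only if the graph covers every required term."""
    if required_predicates <= graph.predicates and required_classes <= graph.classes:
        return {
            "graph": graph.name,
            "predicates": required_predicates,
            "classes": required_classes,
        }
    return None


def select_graph(question: str, required_predicates: set[str],
                 required_classes: set[str], graphs: list[GraphSchema],
                 k: int = 3) -> dict | None:
    """Walk the shortlist in rank order; commit to the first realizable slice."""
    for graph in shortlist(question, graphs, k):
        found = schema_slice(required_predicates, required_classes, graph)
        if found is not None:
            return found
    return None
```

In this sketch a graph that ranks first on weak evidence is still rejected if its schema cannot supply a required predicate or class, which mirrors the distinction the abstract draws between retrieving a graph and confirming that its schema can realize the query.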

We evaluated SchemaForge on four public benchmarks: LC-QuAD 2.0, QALD-9 Plus, QALD-10, and Spider4SPARQL. Averaged across these datasets, SchemaForge exceeds the strongest matched agent baseline by 11.50 percentage points in execution accuracy. On Spider4SPARQL, it raises execution accuracy from 54.86% to 64.18%. It also achieves graph allocation accuracy of 73.0% at Top-1 and 97.0% at Top-3. These findings indicate that moving from weak graph evidence to schema-specific query commitments, reinforced by counterfactual answer-set checks, improves the generation of executable queries over heterogeneous knowledge graphs.
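For readers comparing the reported figures, the short calculation below separates the absolute gain in percentage points from the relative gain on Spider4SPARQL. It uses only the two accuracy values quoted in the abstract; the variable names are illustrative.

```python
# Execution accuracy on Spider4SPARQL, as quoted in the abstract (percent).
baseline_accuracy = 54.86
schemaforge_accuracy = 64.18

# Absolute gain, in percentage points.
absolute_gain = round(schemaforge_accuracy - baseline_accuracy, 2)

# Relative gain, as a percentage of the baseline accuracy.
relative_gain = round(100 * (schemaforge_accuracy - baseline_accuracy) / baseline_accuracy, 1)

print(f"absolute gain: {absolute_gain} percentage points")
print(f"relative gain: {relative_gain}%")
```

The absolute gain is 9.32 percentage points, which is the unit the abstract uses for its 11.50-point cross-benchmark average; the relative gain over the baseline is about 17.0%.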


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
