arXiv

CauTion: Knowing When to Trust LLMs for Ensemble Causal Discovery

Title: CauTion: Navigating Trust in Large Language Models for Ensemble Causal Discovery

Abstract

Deriving causal structures from observational data is inherently difficult, primarily because purely statistical approaches face fundamental constraints. These include the inability to distinguish within equivalence classes and a pronounced sensitivity to limited sample sizes. Although Large Language Models (LLMs) present a valuable avenue for incorporating domain expertise to support statistical inference, current LLM-integrated methods are prone to errors introduced by the models themselves and involve significant token expenses. Furthermore, depending on a single data-driven algorithm can render outcomes vulnerable to specific algorithmic biases.

To overcome these challenges, we introduce CauTion, a novel framework designed to robustly embed LLM-derived domain knowledge into an ensemble of statistical causal discovery methods. This integration is achieved through consensus filtering and the estimation of LLM reliability. The CauTion process unfolds across three distinct phases:

  1. Consensus Filtering: An ensemble of statistical algorithms votes on each candidate edge. Edges on which the methods agree, up to 96% of all edges, are resolved directly, with near-perfect accuracy on the edges retained in the consensus.
  2. Trust-Calibrated Arbitration: An annotation-free trust calibration procedure estimates the relative reliability of the LLM and of the statistical algorithms. These trust estimates drive a trust-weighted vote that restricts LLM intervention to edges where the algorithmic evidence is deemed unreliable.
  3. Cycle Repair: A final cycle repair mechanism ensures that the resulting causal graph is strictly acyclic and structurally valid.
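
The three phases above can be illustrated with a minimal Python sketch. The abstract does not specify the voting rule, the trust estimator, or the repair strategy, so everything here is an assumption made for illustration: the function names (`consensus_filter`, `trust_weighted_vote`, `repair_cycles`), the unanimity threshold, the simple weighted-majority rule, and the greedy confidence-ordered cycle repair are hypothetical and are not taken from the CauTion implementation.

```python
from collections import defaultdict

def consensus_filter(votes, threshold=1.0):
    """Split edges into decided and contested (assumed unanimity rule).

    votes maps an edge (u, v) to a list of 0/1 votes, one per algorithm.
    """
    decided, contested = {}, []
    for edge, ballot in votes.items():
        frac = sum(ballot) / len(ballot)
        if frac >= threshold:          # all algorithms accept the edge
            decided[edge] = True
        elif frac <= 1 - threshold:    # all algorithms reject the edge
            decided[edge] = False
        else:                          # disagreement: defer to arbitration
            contested.append(edge)
    return decided, contested

def trust_weighted_vote(alg_votes, alg_trust, llm_vote, llm_trust):
    """Accept an edge if trust-weighted support exceeds half the total weight."""
    support = sum(t for v, t in zip(alg_votes, alg_trust) if v)
    if llm_vote:
        support += llm_trust
    total = sum(alg_trust) + llm_trust
    return support > total / 2

def repair_cycles(edges, confidence):
    """Greedy repair (assumed): add edges by descending confidence,
    skipping any edge that would close a directed cycle."""
    adj = defaultdict(set)

    def reaches(src, dst):
        # Iterative depth-first search over the edges kept so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node])
        return False

    kept = []
    for u, v in sorted(edges, key=lambda e: -confidence[e]):
        if not reaches(v, u):          # adding u -> v keeps the graph acyclic
            adj[u].add(v)
            kept.append((u, v))
    return kept

# Toy example with three hypothetical algorithms and made-up trust scores.
votes = {("A", "B"): [1, 1, 1], ("B", "C"): [0, 0, 0], ("C", "A"): [1, 0, 1]}
decided, contested = consensus_filter(votes)
accept_ca = trust_weighted_vote(votes[("C", "A")], [0.2, 0.2, 0.2],
                                llm_vote=False, llm_trust=0.9)
dag = repair_cycles([("A", "B"), ("B", "C"), ("C", "A")],
                    {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "A"): 0.3})
```

In this toy run only the edge on which the algorithms disagree reaches the trust-weighted vote, which mirrors the stated goal of confining LLM input to edges with unreliable algorithmic evidence. How trust scores are actually calibrated without annotations is not described in the abstract and is left out of the sketch.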

Empirical evaluations across six datasets show that CauTion consistently surpasses both data-centric and LLM-augmented baseline methods. The performance improvements are particularly notable on larger graphs, and the framework demonstrates strong resilience against LLM inaccuracies. The source code for this framework is publicly accessible at https://github.com/OpenCausaLab/CauTion.


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
