arXiv

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

June 2, 2026 · Iñaki Dellibarda Varela, R. Sendra-Arranz, Pablo Romero-Sorozabal, J. M. Valverde-García, Annemarie F. Laudanski, Álvaro Gutiérrez, Eduardo Rocon, Manuel Cebrian


Abstract: While integrating Large Language Models into Multi-Agent Systems (LLM-MAS) has significantly enhanced reasoning abilities, uncharacterized emergent failures and hallucinations continue to hinder their adoption in safety-critical sectors. This challenge is further exacerbated by the legal risks posed by emerging AI regulations. Current evaluation methods are fundamentally flawed because centralized judgment mechanisms create single points of failure and require specialized domain knowledge. To address this, we introduce POIROT, a protocol that uses the system’s own agents as a diagnostic layer, capitalizing on the inherent epistemic diversity within the architecture. Our evaluations show that POIROT surpasses single-LLM evaluator baselines, with performance improvements that increase alongside problem complexity (odds ratio OR = 1.60, $p = 0.008$), the number of agents, and fault dimensionality. These gains remain consistent even under compound fault conditions. The findings suggest that safety oversight does not need to be externalized; rather, the agents performing specific roles possess sufficient collective intelligence to audit their own actions. We are releasing POIROT as an open-source library, accompanied by BLAME, a new benchmark designed for fault attribution in safety-critical multi-agent environments.
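The abstract does not specify how the agents' answers are combined, so the following is only a minimal sketch of the general idea of using a system's own agents as a diagnostic layer. It assumes each agent is asked which peer it believes caused a failure and that the answers are aggregated by plurality vote. The function name `attribute_fault`, the agent names, and the voting rule are all hypothetical illustrations, not the POIROT library's API or the authors' method.

```python
from collections import Counter
from typing import Optional


def attribute_fault(accusations: dict[str, Optional[str]]) -> Optional[str]:
    """Hypothetical plurality-vote aggregation (an assumption, not POIROT itself).

    `accusations` maps each interrogated agent to the peer it blames,
    or to None if that agent reports no fault.
    Returns the most-blamed agent, or None when nobody is blamed
    or the top two candidates are tied.
    """
    # Count only actual accusations; None means "no fault observed".
    votes = Counter(suspect for suspect in accusations.values() if suspect is not None)
    if not votes:
        return None

    ranked = votes.most_common(2)
    # A tie between the top two suspects is treated as inconclusive.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Illustrative answers from four hypothetical role agents.
answers = {
    "planner": "executor",
    "executor": None,
    "verifier": "executor",
    "retriever": "planner",
}
print(attribute_fault(answers))
```

Unlike a single external judge, this aggregation has no one evaluator whose error decides the outcome: a verdict is only returned when a strict plurality of the interrogated agents agree, and a tie is reported as inconclusive rather than resolved arbitrarily.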

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC