Fundamental Limitation in Explaining AI
Title: A Core Barrier to Comprehensively Explaining AI
Original: arXiv:2605.24727v2 Announce Type: replace Abstract: While large-scale models such as LLMs and diffusion models have achieved practical success, public institutions have emphasized the importance of explainability in AI. Existing methods for explaining AI, however, are not designed to provide completely faithful explanations of the behavior of large-scale AI systems. Although a completely faithful and interpretable explanation of the behavior of an AI system might be useful for AI governance, it has not been known whether providing such an explanation is theoretically possible. In this paper, we mathematically prove a fundamental quadrilemma in explaining AI, stating that AI and its explanation cannot satisfy the following four conditions simultaneously: 1) the complexity of the operation environment, 2) the goodness of the AI's performance, 3) the interpretability of the AI's explanation, and 4) the complete faithfulness of the AI's explanation. This quadrilemma suggests that, in most applications where we cannot change the environment or sacrifice good AI performance and an interpretable explanation, we should give up complete faithfulness of explanations and should instead aim to explain only the parts that are important for applications. As a consequence, the quadrilemma implies that AI governance should be designed on the premise that the faithfulness of AI explanations is always incomplete.
Rewrite: Large-scale models such as Large Language Models (LLMs) and diffusion models have achieved practical success, and public institutions have stressed that AI systems should be explainable. Existing explanation methods, however, are not designed to give completely faithful accounts of how large-scale AI systems behave. A completely faithful and interpretable explanation could be useful for AI governance, but it has not been known whether one is theoretically possible. This paper mathematically proves a fundamental "quadrilemma" in explaining AI, showing that an AI system and its explanation cannot simultaneously satisfy four conditions: 1) a complex operating environment, 2) good AI performance, 3) an interpretable explanation, and 4) a completely faithful explanation. In most applications, the environment cannot be changed and neither good performance nor interpretability can be sacrificed, so the quadrilemma suggests giving up complete faithfulness and instead explaining only the parts that matter for the application. Consequently, AI governance should be designed on the premise that the faithfulness of AI explanations is always incomplete.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




