Multi-component Causal Tracing in Large Language Models
Abstract:
This study introduces a unified framework for the simultaneous causal tracing of multiple components within large language models (LLMs). While prior research has largely focused on single-component or single-layer interventions, this work expands the scope to systematically identify and quantify the causal pathways that connect specific inputs or internal computations to targeted performance metrics. By applying flexible interventions across a broad spectrum of metrics, the framework isolates the subsets of components—such as multi-layer perceptron neurons and attention heads—that are most pivotal to desired outcomes, including accuracy and fairness.
To overcome the combinatorial complexity inherent in multi-component analysis, we develop an efficient algorithm that recasts the discrete search over component subsets as a continuous optimization problem. The recasting relies on soft interventions and a specialized metric transformation, which allow the problem to be solved efficiently under appropriate constraints. The resulting procedure yields precise binary decisions for component selection. Our experiments indicate that the proposed method pinpoints high-impact component subsets and outperforms current baseline techniques. The source code is publicly available at https://github.com/ZiruiYan/multi-component-causal-tracing.
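The idea of relaxing a discrete component search into a continuous one can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic soft-mask relaxation on a toy additive model, and it omits the metric transformation mentioned above. The names `masked_output` and `relax_and_select`, the zero ablation baseline, the L1-style sparsity penalty, and all numeric values are assumptions made for illustration only.

```python
# Illustrative sketch only: a generic soft-mask relaxation on a toy additive
# "model". All names and numbers here are hypothetical, not from the paper.

def masked_output(weights, mask):
    """Toy model output when component i is scaled by mask[i] in [0, 1].

    mask[i] = 1 keeps the component; mask[i] = 0 ablates it to a zero baseline.
    """
    return sum(w * m for w, m in zip(weights, mask))


def relax_and_select(weights, sparsity=0.2, lr=0.01, steps=2000, threshold=0.5):
    """Projected gradient descent over a continuous mask, then binarize."""
    target = masked_output(weights, [1.0] * len(weights))  # un-intervened output
    mask = [0.5] * len(weights)  # start every soft intervention halfway
    for _ in range(steps):
        residual = masked_output(weights, mask) - target
        # Gradient of (residual^2 + sparsity * sum(mask)) w.r.t. each mask entry.
        grads = [2.0 * residual * w + sparsity for w in weights]
        # Gradient step followed by projection back onto the box [0, 1].
        mask = [min(1.0, max(0.0, m - lr * g)) for m, g in zip(mask, grads)]
    # Thresholding turns the relaxed mask into binary selection decisions.
    selected = [i for i, m in enumerate(mask) if m >= threshold]
    return mask, selected


toy_weights = [3.0, 0.05, 2.0, -0.02, 0.1]  # made-up component contributions
final_mask, selected = relax_and_select(toy_weights)
print("relaxed mask:", [round(m, 3) for m in final_mask])
print("selected components:", selected)
```

In this toy setting the continuous mask replaces a search over all 2^5 subsets with a single convex optimization, and the sparsity penalty pushes low-impact components toward ablation. A real LLM would require interventions on actual activations and a task-specific metric, neither of which is modeled here.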
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



