Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
Abstract: Causal tracing of factual recall has focused mainly on dense transformer language models, where interventions typically localize information flow to specific layers or feed-forward modules. Sparse mixture-of-experts (MoE) architectures raise a finer-grained question: within a routed MoE block, which expert contributions mediate a factual prediction? This study develops an expert-aware causal tracing framework for sparse MoE language models. Using facts from CounterFact, we first corrupt the model's factual preference by adding noise to the subject-token embeddings, then test whether restoring clean MoE-block outputs or clean expert-level updates recovers the logit contrast between the true and foil answers. In experiments with Qwen3-30B-A3B-Base, a layer sweep identified layer 44 as critical, and expert-level tracing isolated L44E069, an expert frequently selected in clean runs, whose held-out patch outperformed patches from the other active experts in the same layer. For Mixtral-8x7B-v0.1, by contrast, layer-level tracing confirmed a mid-layer signal, but that signal was not confined to any single selected expert; a coalition check instead recovered it through routed multi-expert updates. These findings indicate that factual tracing can be made expert-aware in MoEs, and that localization to individual experts depends on both the model architecture and the tracing protocol rather than being a universal property.
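The corrupt-then-patch protocol described in the abstract can be illustrated on a toy routed layer. The sketch below is not the paper's implementation. The single-layer model, the tanh experts, the softmax-over-top-k gating, every function name, and the normalized recovery score (fraction of the clean-versus-corrupted contrast gap restored by a patch) are assumptions chosen for illustration.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def expert_updates(model, h):
    """Gated update of each top-k routed expert: {expert index: vector}."""
    scores = matvec(model["router"], h)
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    top = top[: model["k"]]
    peak = scores[top[0]]
    weights = [math.exp(scores[e] - peak) for e in top]  # softmax over top-k
    total = sum(weights)
    return {
        e: [(w / total) * math.tanh(v) for v in matvec(model["experts"][e], h)]
        for e, w in zip(top, weights)
    }

def logit_contrast(model, h, updates, true_id, foil_id):
    """logit(true) - logit(foil) after adding expert updates to the residual."""
    out = list(h)
    for u in updates.values():
        out = [o + ui for o, ui in zip(out, u)]
    logits = matvec(model["readout"], out)
    return logits[true_id] - logits[foil_id]

def recovery(clean, corrupted, patched):
    """Fraction of the clean-vs-corrupted contrast gap restored by a patch."""
    gap = clean - corrupted
    return (patched - corrupted) / gap if gap else float("nan")

def patch_recovery(model, h_clean, h_corr, true_id, foil_id, expert_ids):
    """Restore the clean updates of `expert_ids` inside the corrupted run."""
    clean_u = expert_updates(model, h_clean)
    corr_u = expert_updates(model, h_corr)
    patched_u = dict(corr_u)
    for e in expert_ids:  # each id must be routed in the clean run
        patched_u[e] = clean_u[e]
    return recovery(
        logit_contrast(model, h_clean, clean_u, true_id, foil_id),
        logit_contrast(model, h_corr, corr_u, true_id, foil_id),
        logit_contrast(model, h_corr, patched_u, true_id, foil_id),
    )

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Toy setup: random weights stand in for a trained model (assumption).
random.seed(0)
d, n_experts, vocab = 8, 4, 5
model = {
    "router": rand_matrix(n_experts, d),
    "experts": [rand_matrix(d, d) for _ in range(n_experts)],
    "readout": rand_matrix(vocab, d),
    "k": 2,
}
h_clean = [random.gauss(0, 1) for _ in range(d)]
h_corr = [v + random.gauss(0, 3) for v in h_clean]  # noised hidden state

routed = sorted(expert_updates(model, h_clean))
single = {e: patch_recovery(model, h_clean, h_corr, 0, 1, [e]) for e in routed}
coalition = patch_recovery(model, h_clean, h_corr, 0, 1, routed)
print("single-expert recovery:", single)
print("coalition recovery:", coalition)
```

In this toy the readout is linear and nothing follows the patched layer, so the coalition score equals the sum of the single-expert scores. In a real multi-layer model the downstream computation is nonlinear, so a coalition patch can recover signal that no single-expert patch does, which is why a separate coalition check is informative.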
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



