arXiv

Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time

June 2, 2026 · Mingkuan Zhao, Yide Gao, Wentao Hu, Suquan Chen, Tianchen Huang, Zhenhua An, Zetao Chang, Xiayu Sun, Yuheng Min

Large Language Models (LLMs) often suffer from "contextual disregard" when confronted with input evidence that contradicts their internal parametric memory, resulting in persistent factual hallucinations. Current mitigation strategies typically depend on suppressing specific neuron activations or utilizing computationally intensive contrastive decoding mechanisms. These approaches, however, frequently lead to increased perplexity or substantially higher inference latency.

To overcome these constraints, we introduce Resonant Context Anchoring (RCA), a lightweight intervention applied at inference time. Motivated by the dynamics of signals in the residual stream, RCA is designed to counteract the attenuation of external evidence as it propagates through deep networks. At its core, RCA orthogonally decouples routing logic (the attention probability distribution) from information magnitude (the norms of the value vectors) within the self-attention module. Using the raw pre-softmax attention scores as an immediate measure of semantic alignment, we generate a dynamic gain field through non-linear rectification. This gain field selectively amplifies the norms of the value vectors associated with context tokens while leaving the attention probability distribution unchanged. Consequently, RCA boosts the signal-to-noise ratio (SNR) of input evidence within the residual-stream mixture, firmly anchoring the generation trajectory to the truthful context during inference.
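The decoupling of routing and gain can be illustrated for a single attention head and a single query position. This is a minimal sketch under stated assumptions, not the authors' implementation: the gain form `1 + alpha * max(0, s - tau)`, the hyperparameters `alpha` and `tau`, the boolean `context_mask` marking context tokens, the helper names `rca_attention` and `context_share`, and the use of the scaled score q·k/√d as the "raw" pre-softmax score are all hypothetical choices. `context_share` is likewise only an assumed proxy for the evidence SNR. What the sketch does show is the property described above: the softmax probabilities are computed from unmodified scores, and only the value vectors of context tokens are rescaled.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rca_attention(
    q: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    context_mask: Sequence[bool],
    alpha: float = 1.0,  # hypothetical gain strength
    tau: float = 0.0,    # hypothetical rectification threshold
) -> Tuple[Vector, Vector, Vector]:
    """One query, one head: unchanged routing, rescaled context values.

    Returns (output, attention probabilities, per-key gains).
    """
    d = len(q)
    # Pre-softmax scores; including the 1/sqrt(d) factor is an assumption.
    scores = [dot(q, k) / math.sqrt(d) for k in keys]
    # Routing: probabilities come from the unmodified scores.
    probs = softmax(scores)
    # Gain field (assumed form): ReLU-rectified score, for context tokens only.
    gains = [
        1.0 + alpha * max(0.0, s - tau) if is_ctx else 1.0
        for s, is_ctx in zip(scores, context_mask)
    ]
    # Magnitude: each value vector's norm is scaled by its gain (g >= 1 for alpha >= 0).
    out = [0.0] * len(values[0])
    for p, g, v in zip(probs, gains, values):
        for i, vi in enumerate(v):
            out[i] += p * g * vi
    return out, probs, gains


def context_share(probs, gains, values, context_mask) -> float:
    """Hypothetical SNR proxy: fraction of the summed weighted value norms
    contributed by context tokens."""
    weights = [p * g * math.sqrt(dot(v, v)) for p, g, v in zip(probs, gains, values)]
    ctx = sum(w for w, is_ctx in zip(weights, context_mask) if is_ctx)
    return ctx / sum(weights)


# Toy example: tokens 0 and 2 are context evidence, token 1 is not.
q = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mask = [True, False, True]

out, probs, gains = rca_attention(q, keys, values, mask, alpha=1.0)
base, base_probs, base_gains = rca_attention(q, keys, values, mask, alpha=0.0)
print("probs unchanged:", probs == base_probs)
print("context share:", context_share(base_probs, base_gains, values, mask),
      "->", context_share(probs, gains, values, mask))
```

In this sketch, setting `alpha=0` recovers standard attention exactly, which makes the intervention easy to ablate. Because the gain multiplies only value vectors, the attention probabilities are identical with and without it, while the context tokens' share of the mixed output grows.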

Extensive experiments on the Llama-3 model series show that RCA substantially improves contextual faithfulness across a range of factual-consistency and strong knowledge-conflict tasks, effectively curbing parametric hallucinations. Moreover, as a training-free, plug-and-play module with negligible computational overhead, RCA delivers a Pareto improvement in both faithfulness and fluency while preserving the model's general language understanding capabilities.