Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge
Title: Overcoming the Reversal Curse in Autoregressive Language Models Using an Identity Bridge
Original: arXiv:2602.02470v2 Announce Type: replace Abstract: Autoregressive large language models (LLMs) have achieved remarkable success in many complex tasks, yet they can still fail in very simple logical reasoning such as the "reversal curse" -- when trained on forward knowledge data of the form "$A \rightarrow B$" (e.g., Alice's husband is Bob), the model is unable to deduce the reversal knowledge "$B \leftarrow A$" (e.g., Bob's wife is Alice) during test. Extensive prior research suggests that this failure is an inherent, fundamental limit of autoregressive causal LLMs, indicating that these models tend to memorize factual-level knowledge rather than capture higher-level rules. In this paper, we challenge this view by showing that this seemingly fundamental limit can be mitigated by slightly tweaking the training data with a simple regularization data recipe called the Identity Bridge of the form "$A \to A$" (e.g., The name of Alice is Alice). Theoretically, we prove that under this recipe, even a one-layer transformer can break the reversal curse by analyzing the implicit bias of gradient descent. Empirically, we show that a 1B pretrained language model finetuned with the proposed data recipe achieves a 50% success rate on reversal tasks, in stark contrast to a near-zero success rate when trained solely on forward-knowledge data. Our work provides a novel theoretical foundation for the reversal curse and offers a principled, low-cost path to encouraging LLMs to learn higher-level rules from data.
Rewrite: Autoregressive large language models (LLMs) perform well across many complex tasks, yet they can still fail at very simple logical reasoning, a failure known as the "reversal curse." A model trained on forward facts such as "Alice's husband is Bob" ($A \rightarrow B$) cannot infer the corresponding reverse fact, "Bob's wife is Alice" ($B \leftarrow A$), at test time. Prior work has widely regarded this failure as an inherent limit of autoregressive causal LLMs, suggesting that these models memorize individual facts rather than capture higher-level rules. We challenge this view by showing that the limit can be mitigated with a small change to the training data: a simple regularization data recipe called the "Identity Bridge," which adds identity statements of the form $A \to A$, such as "The name of Alice is Alice." Theoretically, by analyzing the implicit bias of gradient descent, we prove that even a one-layer transformer can break the reversal curse under this recipe. Empirically, a 1-billion-parameter pretrained language model fine-tuned with the recipe achieves a 50% success rate on reversal tasks, compared with a near-zero success rate when trained solely on forward-knowledge data. The work provides a new theoretical foundation for the reversal curse and offers a principled, low-cost path to encouraging LLMs to learn higher-level rules from data.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
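Example: The data recipe described in the abstract can be illustrated with a minimal sketch that mixes forward facts ("A -> B") with identity statements ("A -> A"). The two sentence templates come from the abstract's examples. Everything else is an assumption made for illustration and is not taken from the paper: the `Fact` class, the function names, the `bridge_objects` flag, and the choice of which entities receive an identity statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One forward fact "A -> B" (hypothetical container, not from the paper)."""
    subject: str   # entity A, e.g. "Alice"
    relation: str  # e.g. "husband"
    obj: str       # entity B, e.g. "Bob"


def forward_text(fact: Fact) -> str:
    # Forward knowledge, following the abstract's example "Alice's husband is Bob".
    return f"{fact.subject}'s {fact.relation} is {fact.obj}"


def identity_text(entity: str) -> str:
    # Identity bridge, following the abstract's example "The name of Alice is Alice".
    return f"The name of {entity} is {entity}"


def build_training_texts(facts, bridge_objects=False):
    """Return forward sentences followed by identity-bridge sentences.

    Assumption: by default only subject entities (A) get an identity
    statement, matching the "A -> A" form in the abstract. Setting
    bridge_objects=True also adds one for each object entity (B); whether
    the paper does this is not stated in the abstract.
    """
    texts = [forward_text(f) for f in facts]
    entities = []
    for f in facts:
        entities.append(f.subject)
        if bridge_objects:
            entities.append(f.obj)
    # dict.fromkeys removes duplicate entities while preserving order.
    texts.extend(identity_text(e) for e in dict.fromkeys(entities))
    return texts


if __name__ == "__main__":
    facts = [Fact("Alice", "husband", "Bob")]
    for line in build_training_texts(facts):
        print(line)
```

The reverse statement ("Bob's wife is Alice") never appears in the generated training text; in the setup the abstract describes, it is used only as a test query after fine-tuning.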




