Evaluating the Reversal Curse in Model Editing
Title: Assessing the Reversal Curse in Model Editing
Abstract
Large language models (LLMs) frequently hallucinate because the information they store is incorrect or obsolete. Because retraining these models is computationally expensive, model editing has gained significant traction. However, despite the development of various benchmarks and techniques, current editing and evaluation frameworks are unidirectional and have overlooked the "reversal curse," in which a model that has learned "A is B" fails to infer "B is A." This study investigates bidirectional language model editing to determine rigorously whether edited LLMs can retrieve knowledge in both directions. We introduce a reverse generalization metric and establish a benchmark named Bidirectional Assessment for Knowledge Editing (BAKE) to test whether a model updated with a specific fact can recall it when queried in the reverse direction. Through extensive experiments with diverse LLMs and editing techniques, we find that most methods recall facts successfully along the direction of the edit but fail systematically when evaluated in reverse. To uncover the root causes of this reversal curse and identify mitigation strategies, we perform a comprehensive analysis from three distinct angles. Our results indicate that In-Context Learning (ICL) offers some relief from the reversal curse, but it is constrained by input length, lacks consistency, and risks introducing new hallucinations. Consequently, integrating the strengths of ICL with other editing methods presents a viable pathway toward new editing paradigms.
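To make the forward versus reverse comparison concrete, the following is a minimal sketch of how recall in each direction might be scored after an edit. It is not the BAKE implementation or the paper's reverse generalization metric, whose exact definition is not given here: exact-match scoring, the record fields (`forward_prompt`, `reverse_prompt`, and their targets), the function `directional_scores`, and the made-up facts are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, List


def directional_scores(
    answer: Callable[[str], str],
    cases: List[Dict[str, str]],
) -> Dict[str, float]:
    """Return the fraction of edited facts recalled in each query direction.

    `answer` stands in for an edited model: it maps a prompt to a completion.
    Scoring is plain exact match, which is an assumption of this sketch.
    """
    if not cases:
        raise ValueError("need at least one edit case")
    forward_hits = sum(
        answer(c["forward_prompt"]).strip() == c["forward_target"] for c in cases
    )
    reverse_hits = sum(
        answer(c["reverse_prompt"]).strip() == c["reverse_target"] for c in cases
    )
    n = len(cases)
    return {"forward": forward_hits / n, "reverse": reverse_hits / n}


# Invented facts about fictional entities, used only to exercise the function.
TOY_CASES = [
    {
        "forward_prompt": "The capital of Freedonia is",
        "forward_target": "Sylvania City",
        "reverse_prompt": "Sylvania City is the capital of",
        "reverse_target": "Freedonia",
    },
    {
        "forward_prompt": "The founder of Acme Widgets is",
        "forward_target": "Jane Roe",
        "reverse_prompt": "Jane Roe is the founder of",
        "reverse_target": "Acme Widgets",
    },
]

# A stand-in "edited model" that only memorized the forward direction,
# mimicking the failure pattern the abstract describes.
_FORWARD_ONLY = {c["forward_prompt"]: c["forward_target"] for c in TOY_CASES}


def toy_answer(prompt: str) -> str:
    return _FORWARD_ONLY.get(prompt, "")


scores = directional_scores(toy_answer, TOY_CASES)
print(scores)  # {'forward': 1.0, 'reverse': 0.0}
```

In this toy setting the stand-in model scores perfectly in the edited direction and zero in reverse, which is the gap a bidirectional evaluation is meant to expose. A real evaluation would replace `toy_answer` with calls to an edited LLM and would likely need a more forgiving match than string equality.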
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





