arXiv

eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion

June 2, 2026 · Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin, Jinyu Guo, Malu Zhang, Peng Wang, Yang Yang


Original: arXiv:2606.02054v1. Abstract: While Large Language Models (LLMs) achieve impressive performance on multi-step reasoning tasks, their reliability is persistently hindered by critical limitations such as unconstrained hallucinations and poor numerical computation. Fundamentally, these issues arise because standard models treat reasoning as a transient, one-off generation process rather than retaining and refining successful procedural logic. To address these challenges, we propose eMoT (evolving Memory-of-Thought), a unified framework that stabilizes multi-step reasoning by treating reasoning trajectories as dynamic, evolving memories rather than static templates. The framework primarily consists of three interconnected modules: (i) a memory corrosion mechanism that reinforces high-utility reasoning structures while gradually decaying less frequent ones; (ii) a symbolic anchoring engine that utilizes Python for deterministic computation, much like a human uses a calculator; and (iii) a consistency-driven refinement process that aligns neural inference with symbolic outcomes, reducing the accumulation of logical discrepancies. Across multiple reasoning benchmarks, eMoT improves accuracy and solution consistency over standard Chain-of-Thought and structured reasoning baselines. On the traditional task Game of 24, eMoT achieves 100% accuracy, surpassing the baseline by up to 17.6%. Evaluations on the mathematical tasks GSM8K, ASDiv, SVAMP, and MGSM further show consistent gains in multi-step mathematical reasoning. In our evaluation, we achieve superior performance despite utilizing a lightweight backbone model with constrained baseline capabilities. Compared to alternative methods that rely on massively scaled models, our results demonstrate that the performance gains are fundamentally driven by the eMoT framework's reasoning control rather than sheer model size.
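As a rough illustration of module (i), the sketch below shows one way a store of reasoning templates could reinforce the ones that keep succeeding while letting the rest fade. The abstract does not give the update rule, so the class name `ThoughtMemory`, the multiplicative decay, the additive reward, and the pruning threshold are all assumptions made for this example, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ThoughtMemory:
    """Toy store of reasoning templates, each with a scalar strength.

    Every name and constant here is a hypothetical choice for illustration;
    the paper's actual corrosion rule is not stated in the abstract.
    """
    decay: float = 0.9        # assumed per-step corrosion factor
    reward: float = 1.0       # assumed boost for a successful reuse
    prune_below: float = 0.1  # assumed threshold below which a template is forgotten
    strength: Dict[str, float] = field(default_factory=dict)

    def step(self, used: Optional[str] = None, success: bool = False) -> None:
        # Corrode every stored template a little on each step.
        for name in self.strength:
            self.strength[name] *= self.decay
        # Reinforce the template that just led to a verified answer.
        if used is not None and success:
            self.strength[used] = self.strength.get(used, 0.0) + self.reward
        # Drop templates whose strength has corroded below the threshold.
        self.strength = {
            name: s for name, s in self.strength.items() if s >= self.prune_below
        }

    def best(self) -> Optional[str]:
        # Return the currently strongest template, or None if memory is empty.
        if not self.strength:
            return None
        return max(self.strength, key=self.strength.get)


memory = ThoughtMemory()
memory.step("decompose", success=True)
memory.step("enumerate", success=True)
memory.step("enumerate", success=True)
print(memory.best())
```

With these assumed constants, a template that is never reused falls below the threshold after a few dozen steps, while one that is reused regularly stays near the top of the ranking.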

Rewritten: Despite the strong capabilities of Large Language Models (LLMs) in multi-step reasoning, their reliability remains compromised by persistent issues such as uncontrolled hallucinations and weak numerical computation. These shortcomings arise because standard models treat reasoning as a fleeting, single-pass generation rather than as a process that retains and refines effective procedural logic. To overcome these obstacles, we introduce eMoT (evolving Memory-of-Thought), a unified framework that stabilizes multi-step reasoning by treating reasoning trajectories as dynamic, evolving memories instead of static templates. eMoT is built on three interlinked components: (i) a memory corrosion mechanism that strengthens high-utility reasoning structures while gradually decaying less frequently used ones; (ii) a symbolic anchoring engine that uses Python for deterministic computation, much as a human uses a calculator; and (iii) a consistency-driven refinement stage that aligns neural inference with symbolic results, reducing the buildup of logical discrepancies. Across multiple reasoning benchmarks, eMoT improves on standard Chain-of-Thought and structured reasoning baselines in both accuracy and solution consistency. On the classic Game of 24 task, eMoT reaches 100% accuracy, exceeding the baseline by up to 17.6%. Evaluations on the mathematical datasets GSM8K, ASDiv, SVAMP, and MGSM show consistent gains in multi-step mathematical reasoning. These results were obtained with a lightweight backbone model whose baseline capabilities are limited. In contrast to methods that depend on massively scaled models, the findings indicate that the gains come primarily from the eMoT framework’s reasoning control rather than from model size.
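Modules (ii) and (iii) can be pictured with a small Game of 24 checker: a candidate expression proposed by the model is evaluated deterministically in Python, and the result is compared against the target before the answer is accepted. This is only a sketch under stated assumptions. The function names `evaluate`, `numbers_used`, and `check_game_of_24` are hypothetical, and the abstract does not describe how eMoT actually invokes Python or resolves a mismatch.

```python
import ast
import operator
from fractions import Fraction
from typing import List

# Only the four Game of 24 operators are allowed (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> Fraction:
    """Evaluate an arithmetic expression exactly, without calling eval()."""
    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


def numbers_used(expr: str) -> List[int]:
    """Return the sorted integer literals that appear in the expression."""
    tree = ast.parse(expr, mode="eval")
    return sorted(n.value for n in ast.walk(tree) if isinstance(n, ast.Constant))


def check_game_of_24(expr: str, cards: List[int]) -> bool:
    """Accept a candidate only if it uses exactly the cards and equals 24."""
    try:
        value = evaluate(expr)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False
    return value == 24 and numbers_used(expr) == sorted(cards)


# A candidate that a model might propose for the cards 3, 3, 8, 8.
print(check_game_of_24("8 / (3 - 8 / 3)", [3, 3, 8, 8]))
```

Exact rational arithmetic via `Fraction` is a deliberate choice here: with floating point, an intermediate value such as 8/3 could make a correct solution fail the equality test. A rejected candidate would be the signal for a refinement step to ask the model for another attempt.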

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC