InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain
Title: InfoMem: Enhancing Long-Context Memory Agents via Answer-Conditioned Information Gain
Abstract:
Large language models applied to long-context tasks must extract and retain answer-relevant information from very long inputs. Chunk-wise memory agents address this by reading a document one chunk at a time, updating a compact memory state after each chunk, and producing the final answer from the accumulated memory. However, existing reinforcement learning (RL) methods for these agents have a key limitation: they rely either on sparse rewards tied only to the final answer or on lexical-overlap metrics for intermediate memory and retrieval steps. These signals measure overall task success or local matching, but they do not directly assess whether the final memory contains enough information to support the correct answer.
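The chunk-wise read, update, answer loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not InfoMem's implementation: `llm` is a hypothetical callable standing in for the policy model, the prompt wording is invented, and chunks are split by characters rather than tokens for simplicity.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: prompt string in, completion string out.
LLM = Callable[[str], str]


def split_into_chunks(document: str, chunk_size: int) -> List[str]:
    """Fixed-size character chunks (a real agent would more likely chunk by tokens)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]


def run_memory_agent(llm: LLM, question: str, document: str,
                     chunk_size: int = 2000) -> Tuple[str, str]:
    """Read the document chunk by chunk, rewriting a compact memory after each
    chunk, then answer from the final memory alone (hypothetical prompts)."""
    memory = ""  # the memory state starts empty
    for chunk in split_into_chunks(document, chunk_size):
        prompt = (f"Question: {question}\n"
                  f"Current memory: {memory}\n"
                  f"New chunk: {chunk}\n"
                  "Rewrite the memory, keeping only what helps answer the question.")
        memory = llm(prompt)  # overwrite, not append
    answer = llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
    return memory, answer
```

Because each update overwrites the memory instead of appending to it, every prompt contains only the question, the current memory, and one chunk, regardless of document length. The returned `memory` is the final memory that a final-memory reward would score.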
To address this gap, we introduce InfoMem, a reward mechanism for training chunk-wise memory agents that scores the final memory by its answer-conditioned information gain. Specifically, InfoMem measures how much the final memory increases the model's per-token log-likelihood of the ground-truth answer. To keep RL optimization stable, this signal is applied only to successful trajectories and is normalized before being combined with the other reward components.
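A minimal sketch of this reward follows. It assumes the per-token log-probabilities of the ground-truth answer have already been obtained from two scoring passes: one with the final memory in the prompt and one without. The gain is then the mean answer-token log-probability with memory minus the mean without it. The no-memory baseline, the min-max normalization over the successful trajectories of one GRPO group, the mixing `weight`, and all function names are hypothetical choices made for illustration. The abstract does not specify them.

```python
from typing import List, Sequence


def mean_token_logprob(token_logprobs: Sequence[float]) -> float:
    """Per-token (length-normalized) log-likelihood of the answer tokens."""
    if len(token_logprobs) == 0:
        raise ValueError("the answer must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def answer_info_gain(lp_with_memory: Sequence[float],
                     lp_without_memory: Sequence[float]) -> float:
    """How much the final memory raises the per-token log-likelihood of the
    ground-truth answer (assumed baseline: same prompt without the memory)."""
    return mean_token_logprob(lp_with_memory) - mean_token_logprob(lp_without_memory)


def compose_rewards(outcome: Sequence[float],
                    success: Sequence[bool],
                    gains: Sequence[float],
                    weight: float = 0.5) -> List[float]:
    """Add a gated, normalized gain bonus to each outcome reward in one group.

    Hypothetical choices: min-max scaling over successful trajectories only,
    and a fixed mixing weight.
    """
    succ_gains = [g for g, ok in zip(gains, success) if ok]
    lo = min(succ_gains) if succ_gains else 0.0
    hi = max(succ_gains) if succ_gains else 0.0
    span = hi - lo
    rewards = []
    for r, ok, g in zip(outcome, success, gains):
        if ok and span > 0:
            bonus = (g - lo) / span  # scaled into [0, 1]
        else:
            bonus = 0.0  # failed trajectories (and degenerate groups) get no bonus
        rewards.append(r + weight * bonus)
    return rewards


if __name__ == "__main__":
    print(answer_info_gain([-0.5, -0.5], [-1.5, -2.5]))  # 1.5
    print(compose_rewards([1.0, 1.0, 0.0, 1.0],
                          [True, True, False, True],
                          [1.5, 0.5, 3.0, 1.0]))  # [1.5, 1.0, 0.0, 1.25]
```

In this toy group, the failed third trajectory receives no bonus despite having the largest raw gain, which mirrors the restriction to successful trajectories. Min-max scaling keeps every bonus in [0, weight], so with outcome rewards of 1 for success and 0 for failure, adding the bonus can never push a successful trajectory below a failed one.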
Experiments under the same GRPO framework and training budget show that InfoMem outperforms existing memory-agent RL baselines on long-context tasks. Our analysis further identifies three properties of effective final-memory rewards: they should be restricted to successful trajectories, normalized before reward composition, and conditioned on the answer rather than the initial query. The code is publicly available at https://github.com/GenSouKa1/InfoMem.
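GRPO turns each trajectory's scalar reward into a group-relative advantage by normalizing it against the other rollouts sampled for the same prompt. The sketch below shows that standard form. It is not taken from the InfoMem code. The `eps` constant and the use of the population standard deviation are assumptions, since implementations differ on these details.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-6) -> List[float]:
    """Standardize each reward within its group of rollouts for one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    print(group_relative_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

Because advantages depend only on reward differences within a group, the scale of any auxiliary term added to the outcome reward directly affects how trajectories are ranked. This is one plausible reading of why normalizing a final-memory reward before composition matters.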
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



