Training-Free Lexical-Dense Fusion for Conversational-Memory Retrieval
Original: arXiv:2606.04194v1 Abstract: Retrieving the few past turns that answer a new query across long multi-session histories is the retrieval bottleneck behind long-term conversational memory (LoCoMo, LongMemEval). Recent concurrent work, Nano-Memory, shows that scoring a session by the maximum query-turn similarity (late interaction, "Turn Isolation Retrieval") beats mean-pooled session embeddings. We do not claim that effect; we replicate it and ask what a training-free, CPU-only retrieval stage should add around it. We report four findings. (1) Fuse: score-level fusion of the late-interaction dense score with BM25, under a single leave-one-conversation-out weight, adds +8.8 to +17.2 points of LoCoMo Hit@1 over late interaction alone across six encoders (all p<1e-4), reaching Hit@1 0.752 / NDCG@5 0.829 (e5-large-v2), +11.2 pp over BM25. (2) An off-the-shelf web-search cross-encoder reranker over the fused top-10 hurts here, degrading Hit@1 by 6.9 pp (one reranker, one configuration). (3) A pooling-operator ablation shows top-k late interaction matches max-similarity, but a naive smooth-max (log-sum-exp) collapses for half the encoders. (4) The late-minus-early gap is large for all six encoders and tends to be larger for larger ones, while the marginal fusion gain shrinks; on LongMemEval-S, a lexical regime where BM25 saturates, the net fusion gain over BM25 is small and not significant. A per-category analysis frames the gain as a division of labor: dense late interaction helps most on multi-hop and temporal questions but trails BM25 on adversarial ones. The contribution is a controlled, reproducible account of a strong training-free retrieval recipe, not the late-interaction retriever itself (Nano-Memory's). We make no claim to a complete memory architecture; this is a retrieval-stage study.
Rewrite: The retrieval bottleneck behind long-term conversational memory, as measured by benchmarks such as LoCoMo and LongMemEval, is finding the few past turns that answer a new query across long multi-session histories. Recent concurrent work, Nano-Memory, shows that scoring a session by its maximum query-turn similarity (late interaction, or "Turn Isolation Retrieval") outperforms mean-pooled session embeddings. We do not claim that effect as our own; we replicate it and ask what a training-free, CPU-only retrieval stage should add around it. We report four findings. First, score-level fusion of the late-interaction dense score with BM25, using a single weight chosen leave-one-conversation-out, raises LoCoMo Hit@1 by 8.8 to 17.2 points over late interaction alone across six encoders (all p < 1e-4). With the e5-large-v2 encoder, fusion reaches Hit@1 0.752 and NDCG@5 0.829, which is 11.2 percentage points above BM25. Second, an off-the-shelf web-search cross-encoder reranker applied to the fused top-10 hurts in this setting, lowering Hit@1 by 6.9 percentage points (one reranker, one configuration). Third, a pooling-operator ablation shows that top-k late interaction matches max-similarity, whereas a naive smooth-max (log-sum-exp) collapses for half of the encoders. Fourth, the gap between late interaction and early (mean-pooled) scoring is large for all six encoders and tends to be larger for larger encoders, while the marginal gain from fusion shrinks.
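The pooling operators compared in the third finding can be illustrated with a minimal NumPy sketch. It assumes each session is a matrix of turn embeddings and the query is a single vector. The function names, the choice k=2, the temperature parameter, and the reading of "mean-pooled" as an average of turn embeddings are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x):
    # Scale vectors to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def turn_similarities(query_vec, turn_vecs):
    """Cosine similarity between one query and every turn of a session."""
    return l2_normalize(turn_vecs) @ l2_normalize(query_vec)

def pool_mean_embedding(query_vec, turn_vecs):
    # "Early" pooling (assumed form): average turn embeddings, compare once.
    return float(l2_normalize(turn_vecs.mean(axis=0)) @ l2_normalize(query_vec))

def pool_max(sims):
    # Late interaction: the session score is its best-matching turn.
    return float(np.max(sims))

def pool_topk(sims, k=2):
    # Mean of the k highest turn similarities (k is a hypothetical choice).
    k = min(k, len(sims))
    return float(np.mean(np.sort(sims)[-k:]))

def pool_logsumexp(sims, temperature=1.0):
    # Smooth maximum; temperature=1.0 is the "naive" unscaled variant.
    z = np.asarray(sims, dtype=float) / temperature
    m = z.max()
    return float(temperature * (m + np.log(np.exp(z - m).sum())))

# Toy similarities: a short session with one strong match versus a long
# session made of many weak matches.
short = np.array([0.9, 0.1])
long_ = np.full(20, 0.3)
for name, pool in [("max", pool_max), ("top-k", pool_topk),
                   ("logsumexp", pool_logsumexp)]:
    print(name, "prefers short session:", pool(short) > pool(long_))
```

In this toy setup, max and top-k pooling prefer the short session with the strong match, while the unscaled log-sum-exp prefers the long session, because its value grows with the logarithm of the number of turns. Whether this property is what drives the collapse reported for half of the encoders is not stated in the abstract.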
On LongMemEval-S, a lexical regime where BM25 saturates, the net gain of fusion over BM25 is small and not statistically significant. A per-category analysis frames the gain as a division of labor: dense late interaction helps most on multi-hop and temporal questions but trails BM25 on adversarial ones. The contribution is a controlled, reproducible account of a strong training-free retrieval recipe, not the late-interaction retriever itself, which is Nano-Memory's. We make no claim to a complete memory architecture; this is a retrieval-stage study.
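The score-level fusion in the first finding can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes per-query min-max normalization, a convex combination with a single weight w, and a grid search over w that maximizes Hit@1 on all conversations except the held-out one. The dense and BM25 scores are hypothetical toy numbers, and the helper names are invented for this example.

```python
import numpy as np

def minmax(scores):
    # Rescale one query's candidate scores to [0, 1] (assumed normalization).
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return np.zeros_like(scores) if span == 0 else (scores - scores.min()) / span

def fuse(dense, bm25, w):
    # Convex combination: w weights the dense late-interaction score,
    # (1 - w) weights BM25.
    return w * minmax(dense) + (1.0 - w) * minmax(bm25)

def hit_at_1(queries, w):
    # Fraction of queries whose top-ranked candidate is the gold one.
    hits = [int(np.argmax(fuse(q["dense"], q["bm25"], w)) == q["gold"])
            for q in queries]
    return float(np.mean(hits))

def loco_weights(by_conversation, grid=np.linspace(0.0, 1.0, 11)):
    # For each held-out conversation, choose w using only the others.
    chosen = {}
    for held_out in by_conversation:
        train = [q for conv, qs in by_conversation.items()
                 if conv != held_out for q in qs]
        chosen[held_out] = float(max(grid, key=lambda w: hit_at_1(train, w)))
    return chosen

# Toy data: query_a is answered by BM25 but not by dense scoring alone,
# query_b the other way round. All numbers are made up for illustration.
query_a = {"dense": [0.8, 0.7, 0.1], "bm25": [1.0, 5.0, 0.5], "gold": 1}
query_b = {"dense": [0.2, 0.9, 0.3], "bm25": [4.0, 3.0, 0.0], "gold": 1}
by_conversation = {c: [query_a, query_b] for c in ("conv1", "conv2", "conv3")}

weights = loco_weights(by_conversation)
for conv, w in weights.items():
    print(conv, "w =", round(w, 2),
          "held-out Hit@1 =", hit_at_1(by_conversation[conv], w))
```

Choosing w only from the other conversations keeps the held-out conversation's questions out of the tuning step, which is the point of a leave-one-conversation-out protocol. On this toy data, either retriever alone answers half of the queries, while an intermediate weight answers both.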
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC




