arXiv

Training-Free Lexical-Dense Fusion for Conversational-Memory Retrieval

Original: arXiv:2606.04194v1 (Announce Type: new)

Abstract: Retrieving the few past turns that answer a new query across long multi-session histories is the retrieval bottleneck behind long-term conversational memory (LoCoMo, LongMemEval). Recent concurrent work, Nano-Memory, shows that scoring a session by the maximum query-turn similarity (late interaction, "Turn Isolation Retrieval") beats mean-pooled session embeddings. We do not claim that effect; we replicate it and ask what a training-free, CPU-only retrieval stage should add around it. We report four findings. (1) Fuse: score-level fusion of the late-interaction dense score with BM25, under a single leave-one-conversation-out weight, adds +8.8 to +17.2 points of LoCoMo Hit@1 over late interaction alone across six encoders (all p<1e-4), reaching Hit@1 0.752 / NDCG@5 0.829 (e5-large-v2), +11.2 pp over BM25. (2) An off-the-shelf web-search cross-encoder reranker over the fused top-10 hurts here, degrading Hit@1 by 6.9 pp (one reranker, one configuration). (3) A pooling-operator ablation shows top-k late interaction matches max-similarity, but a naive smooth-max (log-sum-exp) collapses for half the encoders. (4) The late-minus-early gap is large for all six encoders and tends to be larger for larger ones, while the marginal fusion gain shrinks; on LongMemEval-S, a lexical regime where BM25 saturates, the net fusion gain over BM25 is small and not significant. A per-category analysis frames the gain as a division of labor: dense late interaction helps most on multi-hop and temporal questions but trails BM25 on adversarial ones. The contribution is a controlled, reproducible account of a strong training-free retrieval recipe, not the late-interaction retriever itself (Nano-Memory's). We make no claim to a complete memory architecture; this is a retrieval-stage study.
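The difference between early (mean-pooled) and late (max-similarity) session scoring, together with the pooling operators compared in finding (3), can be illustrated with a small pure-Python sketch. It is not Nano-Memory's or the authors' code: the 2-D vectors stand in for encoder embeddings, k = 2 for top-k pooling and temperature tau = 1 for log-sum-exp are arbitrary choices, and all helper names (`early_score`, `late`, `pool_topk`, `pool_lse`, `rank`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector, List[Vector]], float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def early_score(query: Vector, turns: List[Vector]) -> float:
    """Early interaction: compare the query with the mean-pooled session vector."""
    mean = [sum(col) / len(turns) for col in zip(*turns)]
    return cosine(query, mean)


def turn_sims(query: Vector, turns: List[Vector]) -> List[float]:
    """Late interaction: score every turn separately, pool afterwards."""
    return [cosine(query, t) for t in turns]


def pool_max(sims: List[float]) -> float:
    return max(sims)


def pool_topk(sims: List[float], k: int = 2) -> float:
    """Mean of the k largest turn similarities (k = 2 is an arbitrary choice)."""
    top = sorted(sims, reverse=True)[:k]
    return sum(top) / len(top)


def pool_lse(sims: List[float], tau: float = 1.0) -> float:
    """Naive smooth-max tau * log(sum exp(s / tau)), shifted by the max for stability."""
    m = max(sims)
    return m + tau * math.log(sum(math.exp((s - m) / tau) for s in sims))


def late(pool: Callable[[List[float]], float]) -> Scorer:
    """Turn a pooling operator into a session scorer."""
    return lambda q, turns: pool(turn_sims(q, turns))


def rank(query: Vector, sessions: Dict[str, List[Vector]], scorer: Scorer) -> List[str]:
    """Session ids sorted by descending score."""
    return sorted(sessions, key=lambda sid: scorer(query, sessions[sid]), reverse=True)


# Toy 2-D "embeddings" (made up, not encoder outputs or data from the paper).
query = [1.0, 0.0]
sessions = {
    "A": [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.0, 1.0]],  # two on-topic turns, two off-topic
    "B": [[0.6, 0.8], [0.6, 0.8]],  # loosely related throughout
    "C": [[0.0, 1.0]] * 10,  # long and entirely off-topic
}

for name, scorer in [
    ("mean-pooled", early_score),
    ("max-sim", late(pool_max)),
    ("top-2", late(pool_topk)),
    ("log-sum-exp", late(pool_lse)),
]:
    print(name, rank(query, sessions, scorer))
# mean-pooled ranks B first; max-sim and top-2 rank A first; log-sum-exp ranks C first.
```

In this toy data, mean pooling ranks B first because A's on-topic turns are diluted by its off-topic ones, while max-sim and top-2 both rank A first. The log-sum-exp score ranks the long, entirely off-topic session C first: with tau = 1, n turns of equal similarity s score s + log n, so the operator rewards session length. This is one possible mechanism for the collapse reported in the abstract, offered as an illustration rather than the authors' explanation.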

Rewrite: Retrieving the few past turns that answer a new query from long, multi-session histories is the retrieval bottleneck behind long-term conversational memory, as evaluated on benchmarks such as LoCoMo and LongMemEval. Recent concurrent work, Nano-Memory, shows that scoring a session by its maximum query-turn similarity (late interaction, or "Turn Isolation Retrieval") outperforms mean-pooled session embeddings. We do not claim this effect as our contribution. Instead, we replicate it and ask what a training-free, CPU-only retrieval stage should add around it. We report four findings.

First, score-level fusion of the late-interaction dense score with BM25, using a single fusion weight chosen by leave-one-conversation-out validation, improves LoCoMo Hit@1 by 8.8 to 17.2 points over late interaction alone across six encoders (all p < 1e-4). With the e5-large-v2 encoder, the fused retriever reaches a Hit@1 of 0.752 and an NDCG@5 of 0.829, 11.2 percentage points above BM25. Second, applying an off-the-shelf web-search cross-encoder reranker to the fused top-10 hurts in this setting, lowering Hit@1 by 6.9 percentage points; this result covers a single reranker in a single configuration. Third, an ablation of pooling operators shows that top-k late interaction matches max-similarity, whereas a naive smooth-max (log-sum-exp) collapses for half of the encoders. Fourth, the gap between late interaction and early (mean-pooled) scoring is large for all six encoders and tends to be larger for larger encoders, while the marginal gain from fusion shrinks. On LongMemEval-S, a lexical regime in which BM25 saturates, the net gain of fusion over BM25 is small and not statistically significant.

A per-category analysis frames the gain as a division of labor: dense late interaction helps most on multi-hop and temporal questions but trails BM25 on adversarial ones. Our contribution is a controlled, reproducible account of a strong training-free retrieval recipe, not the late-interaction retriever itself, which is Nano-Memory's. We do not claim a complete memory architecture; this study addresses only the retrieval stage.
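The fusion step in the first finding can be sketched as follows, under assumptions the abstract does not pin down. BM25 uses common Okapi defaults (k1 = 1.5, b = 0.75) with a non-negative idf; scores are min-max normalized per query before a convex combination w * dense + (1 - w) * BM25 (the abstract says only "score-level fusion", so this normalization is our choice); and the leave-one-conversation-out step picks, for each held-out conversation, the grid weight that maximizes Hit@1 on the remaining conversations. The helper names, the weight grid, the toy turns, and the dense scores are all hypothetical, not data from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bm25(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t]:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def minmax(xs: Sequence[float]) -> List[float]:
    """Rescale one query's scores to [0, 1] so dense and BM25 scores are comparable."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def fuse(dense: Sequence[float], lexical: Sequence[float], w: float) -> List[float]:
    """Score-level fusion (assumed form): w * dense + (1 - w) * BM25, both min-max normalized."""
    return [w * d + (1 - w) * s for d, s in zip(minmax(dense), minmax(lexical))]


def hit_at_1(queries: List[dict], w: float) -> float:
    """Fraction of queries whose top-scoring candidate is the gold turn."""
    hits = 0
    for q in queries:
        scores = fuse(q["dense"], q["bm25"], w)
        top = max(range(len(scores)), key=scores.__getitem__)
        hits += int(top == q["gold"])
    return hits / len(queries)


def loco_weights(convs: Dict[str, List[dict]], grid: Sequence[float]) -> Dict[str, float]:
    """For each held-out conversation, pick the w that maximizes Hit@1 on all the others."""
    chosen = {}
    for held_out in convs:
        train = [q for cid, qs in convs.items() if cid != held_out for q in qs]
        chosen[held_out] = max(grid, key=lambda w: hit_at_1(train, w))
    return chosen


def make_query(text: str, turns: List[str], dense: List[float], gold: int) -> dict:
    """Bundle toy dense scores with BM25 scores over whitespace-tokenized turns."""
    docs = [t.lower().split() for t in turns]
    return {"dense": dense, "bm25": bm25(text.lower().split(), docs), "gold": gold}


# Lexical case: only the gold turn shares a word ("zermatt") with the query,
# but the made-up dense scores slightly prefer a distractor.
lexical_q = make_query(
    "when did I go to Zermatt",
    ["we skied in Zermatt last March", "my winter holiday was great", "the cat is asleep"],
    dense=[0.70, 0.72, 0.10], gold=0,
)
# Paraphrase case: the gold turn shares no words with the query, a distractor
# shares "i" and "buy", and the made-up dense scores favor the gold turn.
dense_q = make_query(
    "what vehicle did I buy",
    ["picked up a used Honda sedan yesterday", "I buy coffee every morning", "the weather turned cold"],
    dense=[0.80, 0.50, 0.20], gold=0,
)

convs = {cid: [lexical_q, dense_q] for cid in ("c1", "c2", "c3")}
all_qs = [q for qs in convs.values() for q in qs]
grid = [i / 10 for i in range(11)]

weights = loco_weights(convs, grid)
loco_hit1 = sum(hit_at_1(convs[c], weights[c]) for c in convs) / len(convs)
print("BM25 only:", hit_at_1(all_qs, 0.0))   # w = 0
print("dense only:", hit_at_1(all_qs, 1.0))  # w = 1
print("LOCO weights:", weights, "fused Hit@1:", loco_hit1)
```

Each retriever alone reaches a Hit@1 of 0.5 on the toy data because each misses one of the two query types, while every fold selects w = 0.7 and the fused ranking answers both. This mirrors the division-of-labor framing above, but the numbers are constructed for illustration, not measured.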


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
