arXiv

Test-Time Training for Zero-Resource Dense Retrieval Reranking

June 2, 2026 · Shiyan Liu, Yichen Li

Title: Enhancing Zero-Resource Dense Retrieval Reranking Through Test-Time Training

Abstract

Dense retrievers are highly effective at generating initial candidate sets, but they often struggle to rerank those candidates efficiently when no training data is available. Existing methods face a trade-off: cross-encoders offer superior reranking accuracy but require expensive supervised training and add substantial latency, whereas unsupervised BM25 reranking degrades dense retrieval performance on most BEIR benchmarks.

To address this challenge, we introduce DART (Dense Adaptive Reranking at Test-time), which adapts the scoring function at inference time. For each query, the system treats the top-ranked documents as pseudo-positives and the bottom-ranked documents as pseudo-negatives. These provide immediate, albeit noisy, supervision for adapting a bilinear scoring matrix, $W$, with only a few gradient updates. We additionally introduce a confidence-weighted margin loss and a cross-query momentum buffer that warm-starts the adaptation across queries.
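The per-query adaptation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the loss, the confidence weights, or the buffer update, so the following are assumptions made for illustration: the score is $s(q, d) = q^\top W d$ with $W$ starting at the identity plus the buffer, the confidence weight is a sigmoid of the initial score gap, and the buffer is an exponential moving average of per-query updates to $W$. The function name `dart_rerank` and all hyperparameter names and defaults are hypothetical.

```python
import numpy as np

def dart_rerank(q, docs, buffer, k=3, steps=3, lr=0.1, margin=1.0, beta=0.9):
    """Rerank docs (n, dim) for query q (dim,). Returns (order, new_buffer)."""
    n, dim = docs.shape
    assert n >= 2 * k, "pseudo-positives and pseudo-negatives must not overlap"
    base = docs @ q                          # initial dense (dot-product) scores
    init_order = np.argsort(-base)
    pos, neg = init_order[:k], init_order[-k:]   # pseudo-positives / pseudo-negatives

    # Assumed confidence weight per (pos, neg) pair: sigmoid of the initial gap.
    gap0 = base[pos][:, None] - base[neg][None, :]
    conf = 1.0 / (1.0 + np.exp(-gap0))

    W_start = np.eye(dim) + buffer           # warm start from the cross-query buffer
    W = W_start.copy()
    for _ in range(steps):
        s = docs @ (W.T @ q)                 # s_i = q^T W d_i
        gap = s[pos][:, None] - s[neg][None, :]
        active = (gap < margin) * conf       # weighted hinge, nonzero where violated
        # dLoss/dW = -outer(q, sum_ij active_ij * (d_pos_i - d_neg_j))
        diff = active.sum(axis=1) @ docs[pos] - active.sum(axis=0) @ docs[neg]
        W += lr * np.outer(q, diff)          # one gradient-descent step

    # Assumed buffer rule: moving average of this query's change to W.
    new_buffer = beta * buffer + (1.0 - beta) * (W - W_start)
    order = np.argsort(-(docs @ (W.T @ q)))
    return order, new_buffer

# Usage on random vectors; the buffer is carried from one query to the next.
rng = np.random.default_rng(0)
docs = rng.normal(size=(10, 8))
buffer = np.zeros((8, 8))
for _ in range(2):
    q = rng.normal(size=8)
    order, buffer = dart_rerank(q, docs, buffer)
```

With `steps=0` and an empty buffer, the sketch reduces to the plain dot-product ranking, which makes the baseline easy to compare against.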

Evaluated across six BEIR benchmarks, DART delivers a mean per-dataset relative NDCG@10 improvement of +2.1% over the standard dense retrieval baseline. This gain comes with less than 10 ms of added latency per query, highlighting the method’s potential for improving zero-shot performance and generalizing across diverse domains.
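A "mean per-dataset relative improvement" is most naturally read as the relative gain computed on each dataset and then averaged, which can differ from the relative gain of the averaged scores. The short sketch below shows that reading. The function name, the dataset keys, and the scores are placeholders invented for illustration and are not results from the paper.

```python
def mean_relative_improvement(baseline, adapted):
    """Average over datasets of (adapted - baseline) / baseline, in percent."""
    rel = [(adapted[name] - baseline[name]) / baseline[name] for name in baseline]
    return 100.0 * sum(rel) / len(rel)

# Placeholder NDCG@10 values (NOT from the paper), keyed by hypothetical dataset names.
baseline = {"dataset_a": 0.50, "dataset_b": 0.40}
adapted = {"dataset_a": 0.51, "dataset_b": 0.42}
gain = mean_relative_improvement(baseline, adapted)  # about 3.5 (percent)
```

Here the per-dataset gains are 2% and 5%, so the mean is about 3.5%, whereas the relative gain of the averaged scores (0.45 to 0.465) is about 3.3%.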
