ANN Search: Recall What Matters
Abstract:
Approximate nearest neighbor (ANN) search has emerged as a fundamental component in information retrieval and contemporary machine learning workflows, underpinning applications ranging from classification to retrieval-augmented generation. Currently, the research community predominantly assesses and optimizes ANN algorithms based on their throughput at specific Recall@k levels, a metric quantifying the proportion of true exact neighbors successfully retrieved. However, we contend that the critical factor in ANN search is the inherent quality of the retrieved results, rather than their mere overlap with the ground-truth k-nearest neighbors.
We demonstrate that relying on Recall@k to gauge retrieval quality imposes unnecessary computational burdens. Consequently, we propose substituting it with 1/Ratio@k, which represents the inverse approximation ratio. This metric assesses the disparity in distances between the retrieved neighbors and the true neighbors. Notably, 1/Ratio@k requires no judgment calls or hyperparameter tuning and can be calculated solely using standard inputs from ANN benchmarks.
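The contrast between the two metrics can be made concrete with a small sketch. The abstract does not give the exact formula, so the code below assumes a common definition of the approximation ratio: the mean, over ranks i = 1..k, of the distance to the i-th retrieved neighbor divided by the distance to the i-th true neighbor. The function names and toy data are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def recall_at_k(retrieved_ids, true_ids, k):
    """Fraction of the exact top-k neighbors that appear in the retrieved top-k."""
    return len(set(retrieved_ids[:k]) & set(true_ids[:k])) / k


def inverse_ratio_at_k(retrieved_dists, true_dists, k):
    """1/Ratio@k under an ASSUMED definition of Ratio@k:
    the mean over ranks of (retrieved distance / true distance),
    with both distance lists sorted in ascending order.
    Assumes each list holds at least k distances.
    """
    r = sorted(retrieved_dists)[:k]
    t = sorted(true_dists)[:k]
    ratios = []
    for d_ret, d_true in zip(r, t):
        if d_true == 0.0:
            # Exact match at distance zero: ratio is 1 only if the
            # retrieved neighbor is also at distance zero.
            ratios.append(1.0 if d_ret == 0.0 else math.inf)
        else:
            ratios.append(d_ret / d_true)
    ratio = sum(ratios) / k
    return 1.0 / ratio  # evaluates to 0.0 when ratio is infinite


# Toy query: two of the four true neighbors are missed, but the
# substitutes are only 1% farther away than the neighbors they replace.
true_ids = [1, 2, 3, 4]
true_dists = [1.0, 2.0, 3.0, 4.0]
retrieved_ids = [1, 2, 5, 6]
retrieved_dists = [1.0, 2.0, 3.03, 4.04]

recall = recall_at_k(retrieved_ids, true_ids, k=4)
inv_ratio = inverse_ratio_at_k(retrieved_dists, true_dists, k=4)
print(f"Recall@4 = {recall:.3f}, 1/Ratio@4 = {inv_ratio:.3f}")
```

In this toy case Recall@4 drops to 0.5 while 1/Ratio@4 stays near 0.995, which illustrates the gap the abstract describes. Both functions use only neighbor IDs and distances, the standard inputs of an ANN benchmark.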
We conducted a comprehensive benchmark of state-of-the-art ANN algorithms across datasets of differing intrinsic dimensionality, comparing the two metrics on efficiency, downstream classification performance, and retrieval-augmented generation outcomes. In terms of efficiency, optimizing for 1/Ratio@k reaches operational quality thresholds at a significantly lower computational cost than optimizing for Recall@k. In downstream tasks, key performance indicators (label precision, semantic similarity, BERTScore, and LLM-graded quality) remain remarkably stable even when Recall@k declines substantially. The inverse approximation ratio tracks this stability closely and therefore reflects true utility far more accurately than Recall@k. Ultimately, while Recall@k exaggerates the actual cost of approximation, 1/Ratio@k serves as a more precise and practical proxy for genuine ANN quality.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




