arXiv

ANN Search: Recall What Matters

Abstract:

Approximate nearest neighbor (ANN) search has become a fundamental component of information retrieval and modern machine learning workflows, underpinning applications that range from classification to retrieval-augmented generation. The research community predominantly evaluates and optimizes ANN algorithms by their throughput at a given Recall@k, the fraction of the true k nearest neighbors that a search retrieves. We contend, however, that what matters in ANN search is the quality of the retrieved results, not their overlap with the ground-truth k nearest neighbors.

We demonstrate that relying on Recall@k to gauge retrieval quality imposes unnecessary computational burdens. We therefore propose replacing it with 1/Ratio@k, the inverse approximation ratio. This metric measures how much farther from the query the retrieved neighbors are than the true neighbors. Notably, 1/Ratio@k requires no judgment calls or hyperparameter tuning and can be calculated solely from the standard inputs of ANN benchmarks.
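To make the contrast between the two metrics concrete, the sketch below computes both for a single query. The abstract does not spell out the formula for Ratio@k, so the code assumes a common convention: the mean, over ranks 1 to k, of the retrieved neighbor's distance divided by the true neighbor's distance at the same rank. The function names, the `eps` guard against zero distances, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def recall_at_k(retrieved_ids, true_ids):
    """Fraction of the true k nearest neighbors present in the retrieved set."""
    true_set = set(true_ids)
    return len(true_set.intersection(retrieved_ids)) / len(true_set)


def inverse_ratio_at_k(retrieved_dists, true_dists, eps=1e-12):
    """Inverse of the mean rank-wise distance ratio (assumed definition).

    Both inputs hold distances from the query to k neighbors. They are
    sorted so that rank i of the retrieved list is compared with rank i
    of the exact list. `eps` is a hypothetical guard for zero distances.
    """
    r = np.sort(np.asarray(retrieved_dists, dtype=float))
    t = np.sort(np.asarray(true_dists, dtype=float))
    ratio = np.mean((r + eps) / (t + eps))
    return 1.0 / ratio


# Toy query with k = 4: two true neighbors are missed, but the
# substitutes are only 10% farther away than the neighbors they replace.
true_ids = [0, 1, 2, 3]
true_dists = [1.0, 2.0, 2.0, 4.0]
retrieved_ids = [0, 1, 5, 6]
retrieved_dists = [1.0, 2.0, 2.2, 4.4]

print("Recall@4   =", recall_at_k(retrieved_ids, true_ids))
print("1/Ratio@4  =", round(inverse_ratio_at_k(retrieved_dists, true_dists), 4))
```

In this toy case Recall@4 drops to 0.5 while 1/Ratio@4 stays near 0.95, which illustrates the kind of gap the abstract describes. Both functions need only the neighbor IDs and distances that ANN benchmarks already store as ground truth.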

We benchmarked state-of-the-art ANN algorithms on datasets of differing intrinsic dimensionality, comparing the two metrics on efficiency, downstream classification performance, and retrieval-augmented generation outcomes. On efficiency, targeting 1/Ratio@k reaches operational quality thresholds at a significantly lower computational cost than targeting Recall@k. In downstream tasks, key performance indicators (such as label precision, semantic similarity, BERTScore, and LLM-graded quality) remain stable even when Recall@k declines substantially. The inverse approximation ratio tracks this stability closely and so reflects true utility far more accurately than Recall@k. In short, Recall@k overstates the cost of approximation, whereas 1/Ratio@k is a more precise and practical proxy for ANN quality.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
