When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation
Title: The Price of Information: Implementing Cost-Aware Evidence Selection in Retrieval-Augmented Generation
Abstract:
While Retrieval-Augmented Generation (RAG) frameworks generally assume that external knowledge is freely available, much high-quality data sits behind paywalls, licensing restrictions, or other financial barriers. To address this gap, we propose "cost-aware RAG," a framework in which each piece of retrieved evidence carries an access cost and systems must answer within a fixed budget. We operationalize this setting by adding access-friction tiers to the MS MARCO v2.1 dataset and evaluating budget-constrained evidence selection on both general and specialized question-answering benchmarks.
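To make the budget-constrained selection problem concrete, here is a minimal sketch, not the authors' implementation. It assumes three hypothetical tiers ("open", "registration", "paywalled") with made-up integer costs, and it assumes each candidate passage already has a retriever relevance score. It compares two fixed selectors: a relevance-first greedy heuristic and an exact 0/1 knapsack that maximises summed relevance under both a cost budget and a passage cap `k`. All names (`Passage`, `select_by_relevance`, `select_knapsack`, `TIER_COST`) are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical access-friction tiers and integer costs (illustrative only;
# not the tier definitions or prices used in the paper).
TIER_COST = {"open": 0, "registration": 1, "paywalled": 3}


@dataclass(frozen=True)
class Passage:
    pid: str
    tier: str
    relevance: float  # retriever score, assumed to be precomputed

    @property
    def cost(self) -> int:
        return TIER_COST[self.tier]


def select_by_relevance(passages: List[Passage], budget: int, k: int) -> Tuple[List[Passage], int]:
    """Fixed heuristic: take passages in descending relevance, skipping any that would overrun the budget."""
    chosen, spent = [], 0
    for p in sorted(passages, key=lambda x: x.relevance, reverse=True):
        if len(chosen) == k:
            break
        if spent + p.cost <= budget:
            chosen.append(p)
            spent += p.cost
    return chosen, spent


def select_knapsack(passages: List[Passage], budget: int, k: int) -> Tuple[List[Passage], int]:
    """Exact 0/1 knapsack: maximise summed relevance with total cost <= budget and at most k passages."""
    # dp[(cost, count)] = (summed relevance, passages picked) for the best subset found so far
    dp = {(0, 0): (0.0, ())}
    for p in passages:
        # Iterate over a snapshot so each passage is used at most once.
        for (c, n), (score, picked) in list(dp.items()):
            key = (c + p.cost, n + 1)
            if key[0] > budget or key[1] > k:
                continue
            if key not in dp or score + p.relevance > dp[key][0]:
                dp[key] = (score + p.relevance, picked + (p,))
    _, best = max(dp.values(), key=lambda v: v[0])
    return list(best), sum(p.cost for p in best)


if __name__ == "__main__":
    pool = [
        Passage("a", "open", 0.30),
        Passage("b", "paywalled", 0.70),
        Passage("c", "registration", 0.65),
        Passage("d", "open", 0.20),
        Passage("e", "registration", 0.60),
    ]
    for name, selector in [("relevance-first", select_by_relevance), ("knapsack", select_knapsack)]:
        picked, spent = selector(pool, budget=3, k=3)
        print(name, [p.pid for p in picked], "cost", spent)
```

With a budget of 3, the relevance-first selector spends the whole budget on the paywalled passage `b` and fills the rest with cheap passages (`b`, `a`, `d`), while the knapsack selector keeps `a`, `c`, `e` at cost 2. Summed retriever relevance is only an assumed proxy here; because the abstract reports that a larger budget does not reliably improve answers, neither selector should be read as a recommended policy.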
Our analysis shows that static selection methods are fragile: no single fixed selector consistently outperforms the others, and increasing the budget does not guarantee better answer quality, even when expensive sources are highly relevant to the domain. We further examine agentic cost-aware RAG, in which large language models autonomously decide when to retrieve, which access tier to use, and when to stop. While these agents show considerable promise as adaptive controllers for evidence acquisition, their performance varies substantially across models and tasks. These findings position cost-aware evidence acquisition as a key open challenge for next-generation RAG systems. All associated code and data are available at https://github.com/Mignonmy/Cost-Aware.
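The agentic setting (decide when to retrieve, which tier to query, and when to stop) can be illustrated with a small control loop. This is a minimal sketch under stated assumptions, not the paper's agent: the tier costs are hypothetical, `toy_policy` is a rule-based stand-in for an LLM controller, and `toy_retrieve` is a hypothetical lookup that ignores the question. The loop enforces the budget as a hard guard, so an unaffordable request ends the episode.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tier costs; not the paper's actual tiers or prices.
TIER_COST: Dict[str, int] = {"open": 0, "registration": 1, "paywalled": 3}

Action = Tuple[str, Optional[str]]  # ("retrieve", tier) or ("stop", None)


@dataclass
class AgentState:
    question: str
    budget: int
    spent: int = 0
    evidence: List[str] = field(default_factory=list)
    tiers_queried: List[str] = field(default_factory=list)


def run_agent(state: AgentState,
              policy: Callable[[AgentState], Action],
              retrieve: Callable[[str, str], List[str]],
              max_steps: int = 10) -> AgentState:
    """Let the policy decide, step by step, whether to retrieve, from which tier, or to stop."""
    for _ in range(max_steps):
        action, tier = policy(state)
        if action == "stop":
            break
        cost = TIER_COST[tier]
        # Hard budget guard: an unaffordable request ends the episode.
        if state.spent + cost > state.budget:
            break
        state.spent += cost
        state.tiers_queried.append(tier)
        state.evidence.extend(retrieve(state.question, tier))
    return state


def toy_policy(state: AgentState) -> Action:
    """Hypothetical stand-in for an LLM: try tiers from cheapest to priciest until 2 passages are held."""
    if len(state.evidence) >= 2:
        return ("stop", None)
    for tier in sorted(TIER_COST, key=TIER_COST.get):
        if tier not in state.tiers_queried:
            return ("retrieve", tier)
    return ("stop", None)  # every tier has already been tried


def toy_retrieve(question: str, tier: str) -> List[str]:
    """Hypothetical per-tier corpus lookup that ignores the question text."""
    corpus = {"open": ["open-1"], "registration": ["reg-1"], "paywalled": ["pay-1"]}
    return corpus[tier]


if __name__ == "__main__":
    final = run_agent(AgentState("example question", budget=2), toy_policy, toy_retrieve)
    print(final.tiers_queried, final.evidence, final.spent)
```

Keeping the budget check in `run_agent` rather than in the policy means any controller, including a real LLM prompted to pick tiers, can be swapped in without being trusted to respect the budget on its own.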
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





