arXiv

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

June 4, 2026 · Shanu Kumar, Shubhanshu Khandelwal, Akhila Yesantarao Venkata, Parag Agrawal, Yova Kementchedjhieva, Manish Gupta

Abstract:

Optimizing prompts for peak accuracy often produces longer prompts, which raise inference cost on every model call. Because the right balance between accuracy and cost depends on the task and the available budget, prompt optimization is better framed as a search over the Pareto front of accuracy versus prompt-token cost than as a hunt for a single optimal prompt. A common but flawed approach collapses the two objectives into a weighted sum. This fixes the trade-off weight before the search begins and typically recovers only a narrow segment of the Pareto front, a failure mode we term "scalarization collapse."
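The contrast between a Pareto front and a weighted sum can be made concrete with a minimal sketch. It is an illustration of the general idea, not the authors' implementation: the candidate values, the function names, and the particular normalization of token cost are all assumptions made for this example.

```python
from typing import List, Tuple

# (accuracy, prompt_tokens); hypothetical numbers chosen for illustration only.
Candidate = Tuple[float, int]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one.
    Accuracy is maximized, prompt-token cost is minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

def weighted_sum_best(cands: List[Candidate], w: float) -> Candidate:
    """Pick the single candidate maximizing w * accuracy - (1 - w) * normalized cost.
    The normalization by the largest token count is an assumption of this sketch."""
    max_tokens = max(c[1] for c in cands)
    return max(cands, key=lambda c: w * c[0] - (1 - w) * c[1] / max_tokens)

candidates: List[Candidate] = [
    (0.60, 50), (0.70, 120), (0.72, 200),
    (0.80, 400), (0.78, 450), (0.65, 150),
]

front = pareto_front(candidates)

# A weight fixed before the search yields exactly one prompt.
single_choice = weighted_sum_best(candidates, w=0.5)

# Even sweeping the weight afterwards only reaches part of the front.
swept = {weighted_sum_best(candidates, w=i / 20) for i in range(21)}

print("front:", front)
print("fixed weight 0.5:", single_choice)
print("reached by weight sweep:", sorted(swept, key=lambda c: c[1]))
```

In this toy data the candidate (0.72, 200) sits on the Pareto front but in a non-convex region, so no weight selects it. That is a well-known limitation of weighted sums in general; the abstract's "scalarization collapse" refers more specifically to committing to one weight before the search.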

To address this, we introduce CRAFT (Cost-aware Refinement And Front-aware Tuning), an optimizer designed to navigate the Pareto front. CRAFT treats validation calls to the target large language model (LLM) as a limited resource and allocates them to candidate prompts near the optimistic candidate front. In each iteration, separate accuracy-focused and cost-focused generators propose edits. A Pareto-gap acquisition function manages the per-round validation budget, while NSGA-II retention keeps the population diverse and well spread.
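The retention step can be sketched with the standard NSGA-II recipe: rank candidates by non-dominated sorting, then break ties within the last admitted front by crowding distance so that sparse regions survive. This is a generic sketch of that textbook procedure, not CRAFT's code; the population values and all function names are hypothetical. The Pareto-gap acquisition function is not sketched, because the abstract does not define it.

```python
from typing import List, Tuple

Candidate = Tuple[float, int]  # (accuracy, prompt_tokens); illustrative values only

def dominates(a: Candidate, b: Candidate) -> bool:
    """Accuracy is maximized, prompt-token cost is minimized."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def non_dominated_sort(cands: List[Candidate]) -> List[List[Candidate]]:
    """Peel off successive non-dominated fronts (simple quadratic version)."""
    remaining = list(cands)
    fronts: List[List[Candidate]] = []
    while remaining:
        front = [c for c in remaining if not any(dominates(o, c) for o in remaining)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts

def crowding_distance(front: List[Candidate]) -> List[float]:
    """Larger distance means fewer neighbours; boundary points get infinity."""
    n = len(front)
    dist = [0.0] * n
    for k in range(2):  # objective 0: accuracy, objective 1: tokens
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for j in range(1, n - 1):
            gap = front[order[j + 1]][k] - front[order[j - 1]][k]
            dist[order[j]] += gap / (hi - lo)
    return dist

def retain(cands: List[Candidate], size: int) -> List[Candidate]:
    """Keep at most `size` candidates, preferring better fronts, then sparser points."""
    kept: List[Candidate] = []
    for front in non_dominated_sort(cands):
        slots = size - len(kept)
        if len(front) <= slots:
            kept.extend(front)
            continue
        dist = crowding_distance(front)
        ranked = sorted(range(len(front)), key=lambda i: dist[i], reverse=True)
        kept.extend(front[i] for i in ranked[:slots])
        break
    return kept

population: List[Candidate] = [
    (0.60, 50), (0.70, 120), (0.72, 200),
    (0.80, 400), (0.78, 450), (0.65, 150),
]
survivors = retain(population, size=3)
print(survivors)
```

Because boundary points receive infinite crowding distance, both the cheapest and the most accurate non-dominated prompts survive truncation, which is one way a retained front can span both low-cost and high-accuracy regions.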

Evaluations across six benchmarks covering classification and reasoning tasks demonstrate that CRAFT’s retained fronts successfully encompass both high-accuracy and low-cost areas. In contrast, baseline methods relying on accuracy-only, cost-only, or weighted-sum objectives tend to cluster within much narrower segments of the solution space. Consequently, CRAFT transforms the accuracy-cost trade-off from a pre-defined weight into a decision made after the search is complete.
