The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs
Title: The Economic Cost of Cognition: Optimizing LLM Budgets Through Resource Allocation
Abstract:
While inference-time scaling offers a promising path to boosting Large Language Model (LLM) capabilities, practical implementation is often hindered by rigid computational limits. This study addresses this challenge by treating the distribution of inference budgets as a global constrained optimization problem, applying economic frameworks to the task. We model the utility derived from reasoning per query using a shifted-surge function, which allows us to establish an optimal allocation strategy driven by a global shadow price. This price mechanism balances marginal utility against resource scarcity. Leveraging this theoretical foundation, we introduce Constrained Latent-utility Equilibrium Allocation for Reasoning (CLEAR). The CLEAR framework enables intelligent resource withdrawal from queries deemed insolvent, redirecting those resources toward queries that are close to their solvability thresholds. Our extensive evaluations across various reasoning tasks and traffic patterns reveal that CLEAR substantially enhances the trade-off between total token expenditure and average accuracy. In environments characterized by tight resource constraints, CLEAR delivers a global accuracy gain of up to three times that of uniform allocation strategies.
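The shadow-price mechanism described above can be illustrated with a minimal sketch. The abstract does not give the form of the shifted-surge function or the CLEAR procedure itself, so everything here is an assumption: `utility` is a hypothetical placeholder (zero below a per-query threshold, saturating above it), and `allocate` is a generic Lagrangian-relaxation search that bisects on a single global price until total spending fits the budget. It is not the authors' implementation.

```python
import math

def utility(budget, threshold, gain, scale):
    """Hypothetical stand-in for per-query utility (NOT the paper's function):
    zero until the budget passes the threshold, then saturating toward `gain`."""
    if budget <= threshold:
        return 0.0
    return gain * (1.0 - math.exp(-(budget - threshold) / scale))

def best_budget(query, price, grid):
    """Budget on the grid that maximizes utility minus price * budget.
    Ties go to the smallest budget because the grid is sorted ascending."""
    return max(grid, key=lambda b: utility(b, *query) - price * b)

def allocate(queries, total_budget, grid, iters=60):
    """Bisect on one global price so that total spending fits the budget."""
    lo, hi = 0.0, 1.0
    # Raise the upper price until the resulting allocation is feasible.
    while sum(best_budget(q, hi, grid) for q in queries) > total_budget:
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        spend = sum(best_budget(q, mid, grid) for q in queries)
        if spend > total_budget:
            lo = mid   # price too low: demand exceeds the budget
        else:
            hi = mid   # feasible: try a lower price
    return hi, [best_budget(q, hi, grid) for q in queries]

# Illustrative queries as (threshold, gain, scale); all values are made up.
queries = [
    (50, 0.9, 100),    # easy query
    (200, 0.8, 100),   # query near its solvability threshold
    (5000, 0.9, 100),  # insolvent: threshold exceeds any budget on the grid
]
grid = list(range(0, 2001, 50))  # candidate token budgets, including 0
price, budgets = allocate(queries, total_budget=1000, grid=grid)
print(price, budgets)
```

Under these assumptions, the third query receives a zero budget at any positive price, since no budget on the grid reaches its threshold, while the remaining tokens are split between the other two queries. Including 0 in the grid is what allows a query to be dropped entirely.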
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



