On Effectiveness and Efficiency of Agentic Tool-calling and RL Training
Title: Assessing the Efficacy and Computational Efficiency of Agentic Tool-Calling and Reinforcement Learning Training
Abstract:
Tool-calling is a foundational capability of contemporary large language model (LLM) agents, extending them beyond their parametric knowledge. This study examines tool-calling through two complementary lenses: effectiveness, which concerns how the capability is measured, and efficiency, which concerns how it is learned. On effectiveness, we systematically analyze evaluation pipelines and show that results are highly sensitive to minor, frequently undocumented implementation details. The choice of random seed, the system prompt, the construction of multi-turn templates, and the way prior interaction and reasoning history is carried forward can cause significant variance in reported performance. In multi-turn scenarios, the absence of rigorous standardization renders leaderboard rankings unreliable. On efficiency, we examine standard reinforcement learning (RL) approaches for tool-calling and identify two primary sources of computational waste: first, many prompts yield no learning signal during rollouts; second, policy updates incur prohibitive computational cost during optimization. Building on these insights, we propose two techniques to accelerate RL-based tool-calling training. These methods deliver considerable wall-clock speedups while maintaining performance.
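The first source of waste named in the abstract, prompts that yield no learning signal, can be illustrated with a minimal sketch. It assumes a group-normalized advantage in which each rollout's reward is centered on the mean reward of its prompt's group; the abstract does not name the RL algorithm, so this setup, the reward values, and the helper names `group_advantages` and `has_learning_signal` are all hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Assumed group-normalized advantage: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards):
    """Under this assumption, a prompt contributes a nonzero advantage
    only if its rollouts disagree in reward."""
    return max(rewards) != min(rewards)


# Hypothetical rollout rewards per prompt
# (1.0 = correct tool call, 0.0 = incorrect).
rollouts = {
    "prompt_easy": [1.0, 1.0, 1.0, 1.0],   # always solved
    "prompt_hard": [0.0, 0.0, 0.0, 0.0],   # never solved
    "prompt_mixed": [1.0, 0.0, 1.0, 0.0],  # rollouts disagree
}

# Keep only prompts whose rollouts produce a nonzero advantage.
informative = {p: r for p, r in rollouts.items() if has_learning_signal(r)}

for prompt, rewards in rollouts.items():
    print(prompt, group_advantages(rewards))
```

In this toy setup, the always-solved and never-solved prompts produce all-zero advantages, so the rollout compute spent on them does not change the policy; only the mixed prompt does.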
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




