R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling
Title: R2IF: Enhancing Interpretable LLM Function Calling by Aligning Reasoning with Decisions Through Composite Rewards
Abstract:
While large language models (LLMs) leverage function calling to interact with external tools, current reinforcement learning (RL) methods often struggle with a disconnect between the model's reasoning steps and its actual tool-selection decisions. To address this, we introduce R2IF, a reasoning-aware RL framework designed for interpretable function calling. This approach utilizes a composite reward structure that combines format and correctness constraints, a Chain-of-Thought Effectiveness Reward (CER), and a Specification-Modification-Value (SMV) reward, all optimized through the GRPO algorithm. Our evaluations on BFCL and ACEBench demonstrate that R2IF surpasses baseline methods by as much as 34.62% (specifically with Llama3.2-3B on BFCL). Additionally, it achieves a positive Average CoT Effectiveness score of 0.05 for Llama3.2-3B, thereby improving both the accuracy of function calls and the interpretability required for reliable deployment of tool-augmented LLMs.
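As a rough illustration of how a composite reward of this kind could feed GRPO, the sketch below sums four per-sample components (format, correctness, CER, SMV) and then normalizes rewards within a group of sampled completions, which is the group-relative advantage step GRPO is known for. The abstract does not give the weighting or the scoring rules for CER and SMV, so the names `RewardParts`, `composite_reward`, `group_relative_advantages`, the equal default weights, and the example values are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardParts:
    """Hypothetical per-sample reward components (names are assumptions)."""
    format_score: float   # output follows the required format
    correctness: float    # function call matches the reference
    cer: float            # Chain-of-Thought Effectiveness Reward
    smv: float            # Specification-Modification-Value reward


def composite_reward(parts, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four components; equal weights are an assumption."""
    w_format, w_correct, w_cer, w_smv = weights
    return (w_format * parts.format_score
            + w_correct * parts.correctness
            + w_cer * parts.cer
            + w_smv * parts.smv)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative values only: three sampled completions for one prompt.
group = [
    RewardParts(1.0, 1.0, 0.5, 0.0),
    RewardParts(1.0, 0.0, 0.0, 0.0),
    RewardParts(0.0, 0.0, 0.0, 0.0),
]
rewards = [composite_reward(p) for p in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive positive advantages and those below receive negative ones, so a reasoning trace that earns CER or SMV credit is reinforced relative to its siblings even when the final call is equally correct.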
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



