Finding the Minimal Parameter Budget for Implicit Reasoning: A Data Complexity Driven Scaling Law for Language Models
Title: Determining the Minimum Parameter Budget for Implicit Reasoning: A Data Complexity-Based Scaling Law for Language Models
Abstract:
Reasoning is a fundamental capability of language models (LMs), yet how much model capacity is needed for reasoning to emerge during pretraining remains an open question. This study investigates the minimal parameter budget required for implicit reasoning, defined as the model's ability to deduce novel facts from acquired knowledge without explicit chain-of-thought supervision. To isolate this phenomenon, we train LMs from scratch in a controlled synthetic setting that replicates the structure and distribution of real-world knowledge graphs, then evaluate how well they complete missing edges through multi-hop inference. Both theoretically and empirically, we uncover a scaling law that relates the optimal parameter budget to a graph search entropy metric. Our analysis, spanning a range of model sizes, training durations, and graph complexities, shows that an optimally sized language model can reason over at most approximately 0.008 bits of information per parameter. These results characterize the minimal sufficient capacity for implicit reasoning during pretraining. They also offer principled guidance for matching model size to data complexity and shed new light on the scaling dynamics of reasoning in large language models.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
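
The headline figure in the abstract, roughly 0.008 bits of information per parameter, can be turned into a back-of-the-envelope sizing rule. The sketch below is illustrative only: it assumes the figure can be treated as a constant ratio between graph search entropy (in bits) and parameter count, which the abstract does not state in this exact form. The function names `min_parameter_budget` and `reasoning_capacity_bits` are hypothetical and do not come from the paper.

```python
# Approximate upper bound quoted in the abstract (bits of information
# an optimally sized LM can reason over, per parameter).
BITS_PER_PARAMETER = 0.008


def min_parameter_budget(graph_search_entropy_bits: float,
                         bits_per_parameter: float = BITS_PER_PARAMETER) -> float:
    """Estimate the parameter count needed for a graph of given search entropy.

    Assumption (not from the paper): the relationship is a simple ratio,
    parameters = entropy / bits_per_parameter.
    """
    if graph_search_entropy_bits < 0:
        raise ValueError("entropy must be non-negative")
    if bits_per_parameter <= 0:
        raise ValueError("bits_per_parameter must be positive")
    return graph_search_entropy_bits / bits_per_parameter


def reasoning_capacity_bits(num_parameters: float,
                            bits_per_parameter: float = BITS_PER_PARAMETER) -> float:
    """Inverse view: entropy (in bits) a model of this size could cover."""
    if num_parameters < 0:
        raise ValueError("num_parameters must be non-negative")
    return num_parameters * bits_per_parameter


if __name__ == "__main__":
    # Hypothetical knowledge graph with 1e6 bits of graph search entropy.
    entropy = 1_000_000.0
    print(f"estimated parameters: {min_parameter_budget(entropy):.3g}")
    # Hypothetical 100M-parameter model.
    print(f"capacity in bits: {reasoning_capacity_bits(1e8):.3g}")
```

Under this simplified reading, a graph with one million bits of search entropy would call for on the order of 125 million parameters, and a 100-million-parameter model would cover about 800,000 bits. The actual scaling law in the paper may include additional terms or constants that the abstract does not give.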




