arXiv

A Primer in Post-Training Reasoning Data: What We Know About How It Works

June 2, 2026 · Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun, Xiangzheng Zhang, Tong Yang

Title: Understanding Post-Training Reasoning Data: A Comprehensive Overview of Current Knowledge

Abstract:

Post-training has emerged as the central catalyst for recent advancements in large reasoning models, with the quality and composition of reasoning data serving as the decisive factor in the success of this phase. Although research into post-training reasoning data has expanded swiftly, the existing body of work remains fragmented, dispersed across various sources such as dataset publications, reinforcement learning methodologies, reward model analyses, benchmarking studies, and reports on frontier systems. This article presents the first comprehensive primer, consolidating insights from more than 150 significant public studies and system reports focused on post-training reasoning data. The field is structured around four fundamental inquiries: the nature of the data objects involved, the attributes that render them effective, the methods used to construct them, and the strategies for scaling them. Collectively, this framework offers a structured attribution model to guide future releases of reasoning data and the development of post-training protocols.
