Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs
Title: Reasoning Traces Are Not Private: Investigating Exposure in Large Language Models
Abstract: Reasoning traces have become an important learning signal for improving and transferring the capabilities of large language models. In particular, fine-grained traces make it possible to distill reasoning behavior from a strong teacher model into a weaker student model. Because this capability transfer is so valuable, many deployed systems built on reasoning models hide the raw internal traces and show users only summaries and final answers. This raises a question: does such interface-level concealment actually prevent users from extracting useful reasoning supervision through prompting? To investigate, we introduce Reasoning Exposure Prompting (REP), a lightweight in-context elicitation technique. REP uses demonstrations generated by shadow models, wrapped in auxiliary code-like structures, to elicit user-visible reasoning traces from a target model. Experiments across standard reasoning datasets, several target (victim) models, and multiple student-model distillation settings show that REP significantly increases the similarity between the exposed traces and the REP-conditioned internal traces, while preserving useful reasoning signals.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




