Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention
Title: Demystifying Latent Reasoning: An Interpretability-Driven Strategy for Intervention
Abstract:
Large Language Models (LLMs) can use latent reasoning to perform multi-step inference within continuous hidden states, which is more efficient than explicit Chain-of-Thought (CoT) methods. However, the opacity of these continuous thought vectors undermines their reliability and controllability. This study aims to bridge the gap between mechanistic interpretability and practical control. Through a comprehensive analysis using structural, causal, and geometric probes, we show that latent vectors encode compressed, faithful representations of reasoning steps, with earlier vectors serving as pivotal causal hubs. Building on these insights, we develop a collection of training-free, decode-time interventions that improve the latent reasoning process by enforcing the discovered geometric and semantic priors. Extensive experiments across multiple model scales and a wide range of task domains show that these interventions consistently improve reasoning accuracy. Crucially, these interpretability-guided methods unlock latent capabilities and improve performance without any parameter updates.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





