arXiv

Geometric Latent Reasoning Induces Shorter Generations in LLMs

June 2, 2026 · Shashi Kumar, Yacouba Kaloga, Petr Motlicek, Ina Kodrasi, Andrea Cavallaro

Abstract:

Large language models (LLMs) currently tackle complex tasks by producing extensive sequences of explicit reasoning tokens. Although this approach yields strong results, it introduces significant costs, sensitivity to length, and limitations tied to discrete natural language. While latent reasoning presents a continuous alternative, identifying effective structures for intermediate latent states remains an unresolved issue. This study addresses the gap by framing latent reasoning as a geometric path-approximation problem situated within the model’s pretrained token-embedding space. We propose Geometric Latent Reasoning (GLR), a method that employs a lightweight transition head to forecast iterative direction updates within the embedding space. Leveraging textual chain-of-thought traces as anchors, GLR learns to mimic discrete reasoning trajectories while allowing for continuous deviations from precise token embeddings. Assessments on mathematical reasoning benchmarks using Qwen3 models highlight an emergent behavior: geometric latent reasoning significantly reduces generation length without relying on an explicit length objective. By substituting initial explicit reasoning with continuous latent steps, models frequently achieve accurate results with far fewer total generation steps. These insights indicate that continuous trajectories serve as compact intermediate reasoning states, revealing a novel tradeoff involving the latent computation budget, output length, and accuracy.
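
The path-approximation idea can be illustrated with a toy sketch. It is not the authors' implementation: the single linear layer used as the transition head, the unit-norm direction, the fixed step size, and the mean squared distance to the anchors are all assumptions made for illustration, and every function name is hypothetical. The anchors stand in for the token embeddings of a textual chain-of-thought trace.

```python
import math


def unit(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def transition_head(z, weights, bias):
    """Hypothetical lightweight head: one linear layer mapping a latent
    state to a raw direction in embedding space (an assumption)."""
    return [sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(weights, bias)]


def rollout(z0, weights, bias, num_steps, step_size):
    """Iteratively update the latent state along the predicted direction.

    Returns the initial state followed by num_steps latent states.
    """
    states = [list(z0)]
    for _ in range(num_steps):
        direction = unit(transition_head(states[-1], weights, bias))
        states.append([x + step_size * d
                       for x, d in zip(states[-1], direction)])
    return states


def path_loss(states, anchors):
    """Mean squared distance between each latent step and its anchor.

    The latent states are free to sit off the exact anchor embeddings;
    the loss only measures how far the path strays from them.
    """
    total = 0.0
    for z, a in zip(states, anchors):
        total += sum((x - y) ** 2 for x, y in zip(z, a))
    return total / len(anchors)


if __name__ == "__main__":
    # Toy 2-D "embedding space": three anchors on a straight line.
    anchors = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    weights = [[0.0, 0.0], [0.0, 0.0]]  # head ignores the state here
    bias = [1.0, 0.0]                   # so it always points along +x

    exact = rollout([0.0, 0.0], weights, bias, num_steps=3, step_size=1.0)
    print(path_loss(exact[1:], anchors))  # path lands on every anchor

    short = rollout([0.0, 0.0], weights, bias, num_steps=3, step_size=0.9)
    print(path_loss(short[1:], anchors))  # small continuous deviation
```

In a trained system the weights of the head would be learned by minimizing a loss of this kind over many traces; here they are fixed by hand so that the rollout is easy to follow.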
