The Shape of Wisdom: Decision Trajectories in Language Models
Title: The Geometry of Insight: Mapping Decision Paths in Large Language Models
Abstract: Language models do not merely select a final output; their decisions evolve across network layers. We analyze 9,000 distinct trajectories from the MMLU benchmark across three prominent models: Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-v0.3. We observe that answer accuracy shifts in structured patterns with network depth. To characterize these trajectories, we use three metrics: the prevailing answer margin, the change in that margin at the next layer, and the proximity to a decision reversal.
Our primary finding is that correctness and stability are distinct. Surprisingly, the most prevalent category is "unstable-correct," rather than "stable-correct." By tracing a subset of these cases, we investigate which components influence the answer margin. In stable-correct cases, the attention contribution is, on average, aligned with the correct direction, whereas the Multi-Layer Perceptron (MLP) contribution is not. Furthermore, span deletion experiments indicate that removing text that supports the answer reduces the margin, while removing distractor-like content increases it.
This study does not provide a complete circuit explanation. Instead, it offers a reproducible methodology for identifying which answers are firmly established, which remain precarious, and which specific sources drive their changes.
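As a rough illustration of the three trajectory metrics named in the abstract, the sketch below computes them from per-layer scores over the answer options. It is not the authors' implementation. The input format (one list of option scores per layer, e.g. from some per-layer readout), the function names, and in particular the definition of "proximity to a decision reversal" as the number of layers until the leading option changes are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence


def option_margin(scores: Sequence[float], k: int) -> float:
    """Score of option k minus the best competing score.

    Positive when k is the leading option, negative when another option leads.
    """
    rival = max(s for j, s in enumerate(scores) if j != k)
    return scores[k] - rival


def trajectory_metrics(layer_scores: List[Sequence[float]]) -> List[Dict[str, Optional[float]]]:
    """Per-layer margin, next-layer drift, and layers until the leader flips.

    ASSUMPTION: layer_scores[l][j] is a hypothetical score for answer option j
    read out at layer l; the abstract does not specify how scores are obtained.
    """
    # Leading option at each layer (index of the highest score).
    leaders = [max(range(len(s)), key=lambda j: s[j]) for s in layer_scores]
    rows: List[Dict[str, Optional[float]]] = []
    for l, scores in enumerate(layer_scores):
        k = leaders[l]
        margin = option_margin(scores, k)
        # Drift: change in the current leader's margin at the next layer.
        drift: Optional[float] = None
        if l + 1 < len(layer_scores):
            drift = option_margin(layer_scores[l + 1], k) - margin
        # ASSUMED proxy for proximity to reversal: layers until the leader changes.
        layers_to_flip = next(
            (m - l for m in range(l + 1, len(leaders)) if leaders[m] != k),
            None,
        )
        rows.append(
            {
                "layer": l,
                "leader": k,
                "margin": margin,
                "drift": drift,
                "layers_to_flip": layers_to_flip,
            }
        )
    return rows
```

Under these assumptions, a trajectory whose leader has a small margin, negative drift, and a small `layers_to_flip` would count as precarious even when the leader is currently the correct option, which is the kind of case the "unstable-correct" label is meant to capture.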
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




