arXiv

A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners

Title: Investigating World Model Reconstruction in Supervised Fine-Tuned LLM Planners

Original: arXiv:2606.03685v1 (Announce Type: cross)

Abstract: Supervised fine-tuning (SFT) improves end-to-end classical planning in large language models (LLMs), but do these models also learn to represent and reason about the planning problems they are solving? Due to the relative complexity of classical planning problems and the challenge that end-to-end plan generation poses for LLMs, it has been difficult to explore this question. In our work, we devise and perform a series of interpretability experiments that holistically interrogate world model recovery by examining both internal representations and generative capabilities of fine-tuned LLMs. We find that: a) Supervised fine-tuning on valid action sequences enables LLMs to linearly encode action validity and some state predicates. b) Models that struggle to use output probabilities for classifying action validity may still learn internal representations that separate valid from invalid actions. c) Broader state space coverage during fine-tuning, such as from random walk data, yields more accurate recovery of the underlying world model. In summary, this work contributes a recipe for applying interpretability techniques to planning LLMs and generates insights that shed light on open questions about how knowledge is represented in LLMs.
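To make finding (a) concrete, the sketch below shows what "linearly encoding" action validity means in practice: a linear probe (here, plain logistic regression) is trained on hidden-state vectors to predict whether an action is valid. This is an illustration only and not the authors' code. The synthetic `features` array stands in for activations that would be extracted from a fine-tuned LLM, and the names `train_linear_probe` and `probe_accuracy` are hypothetical.

```python
import numpy as np

def train_linear_probe(features, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe (weights w, bias b) by gradient descent."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels            # gradient of the log-loss w.r.t. the logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, features, labels):
    """Fraction of examples on the correct side of the probe's hyperplane."""
    preds = (features @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# ASSUMPTION: synthetic stand-in for LLM hidden states. Valid (1) and
# invalid (0) actions are shifted in opposite directions along one axis,
# which is the situation a linear probe is able to detect.
rng = np.random.default_rng(0)
dim = 16
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=400).astype(float)
features = rng.normal(size=(400, dim)) + 3.0 * np.outer(2 * labels - 1, direction)

# Train on one split and evaluate on held-out examples.
train_x, test_x = features[:300], features[300:]
train_y, test_y = labels[:300], labels[300:]
w, b = train_linear_probe(train_x, train_y)
accuracy = probe_accuracy(w, b, test_x, test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

High held-out accuracy for such a probe is the usual evidence that a property is linearly decodable from the representations. On real activations, a control such as a probe trained on shuffled labels would normally be needed before drawing that conclusion.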

Rewritten:

Title: An In-Depth Analysis of World Model Reconstruction in Supervised Fine-Tuned LLM Planners

Abstract: Supervised fine-tuning (SFT) improves the ability of large language models (LLMs) to perform end-to-end classical planning, but it remains unclear whether the fine-tuned models also learn to represent and reason about the planning problems they solve. The relative complexity of classical planning problems, combined with the difficulty LLMs have in generating complete plans end-to-end, has made this question hard to investigate. This study designs and runs a series of interpretability experiments that assess world model recovery by examining both the internal representations and the generative capabilities of fine-tuned LLMs. There are three main findings. First, SFT on valid action sequences enables LLMs to linearly encode action validity and some state predicates. Second, models that struggle to use output probabilities to classify action validity may still learn internal representations that separate valid from invalid actions. Third, broader state space coverage during fine-tuning, such as that provided by random walk data, yields more accurate recovery of the underlying world model. The work contributes a recipe for applying interpretability techniques to planning LLMs and offers insights into open questions about how knowledge is represented in LLMs.
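The third finding refers to random walk data as a way to broaden state space coverage. As a rough illustration of how such data might be produced, the sketch below samples a random walk over valid actions in a toy domain and counts the distinct states it visits. The domain (a robot and one package on a line of cells) and every name in the code are hypothetical, chosen only for this example; the abstract does not describe the planning domains or the data pipeline used.

```python
import random

N_CELLS = 4  # ASSUMPTION: toy domain with cells 0..3, one robot, one package

def valid_actions(state):
    """Return the actions whose preconditions hold in `state`."""
    robot, package = state
    actions = []
    if robot > 0:
        actions.append("left")
    if robot < N_CELLS - 1:
        actions.append("right")
    if package == robot:       # package lies in the robot's cell
        actions.append("pick")
    if package == "held":      # robot is carrying the package
        actions.append("drop")
    return actions

def step(state, action):
    """Apply `action` to `state`; the caller must ensure it is valid."""
    robot, package = state
    if action == "left":
        return (robot - 1, package)
    if action == "right":
        return (robot + 1, package)
    if action == "pick":
        return (robot, "held")
    if action == "drop":
        return (robot, robot)
    raise ValueError(f"unknown action: {action}")

def random_walk(start, length, rng):
    """Sample `length` valid actions uniformly at random, tracking visited states."""
    state, trace, visited = start, [], {start}
    for _ in range(length):
        action = rng.choice(valid_actions(state))
        state = step(state, action)
        trace.append(action)
        visited.add(state)
    return trace, visited

rng = random.Random(0)
trace, visited = random_walk((0, 3), 200, rng)
print(len(visited), "distinct states visited out of", N_CELLS * (N_CELLS + 1))
```

Every action in `trace` is valid by construction, so the sequence could serve as a fine-tuning example of a valid action sequence. Unlike a goal-directed plan, the walk is not constrained to the states on a path to one goal, which is the sense in which it can widen coverage.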


Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
