arXiv

Can Large Language Models Generalize Procedures Across Representations?

Abstract: Although large language models (LLMs) are extensively trained and evaluated on symbolic formats such as code and graphs, users typically state their requirements in natural language. This raises the question of how well LLMs transfer skills between these representations. To investigate, we examine isomorphic tasks in which the same procedure is encoded as code, visualized as a graph, or described in natural language (for example, step scheduling in planning scenarios). Our results indicate that post-training exclusively on graph or code data does not ensure robust generalization to the natural language equivalents. Conversely, training exclusively on natural language data yields inefficient performance gains. To bridge this gap, we introduce a two-stage reinforcement learning curriculum that trains on symbolic data first and then transitions to natural language inputs. This approach significantly boosts performance across model architectures and task types. Notably, a 1.5B-parameter Qwen model trained with our method achieves performance comparable to zero-shot GPT-4o in naturalistic planning contexts. Furthermore, our analysis interprets successful cross-representation generalization as a type of generative analogy, a capability that our proposed curriculum actively fosters. The dataset and code used in this paper are available at https://github.com/fangru-lin/procedure_generalization_llm.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
