arXiv

Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

Abstract:

For real-world autonomous driving systems, the ability to generate safe and feasible trajectories is paramount. However, current learning-based planning approaches are heavily dependent on expert demonstrations. This reliance poses a significant risk: such data often lacks explicit safety awareness and may inadvertently teach the model undesirable habits, such as speeding, derived from suboptimal human driving records. Drawing inspiration from the advancements in large language models, we introduce Plan-R1, a novel two-stage trajectory planning framework that separates principle alignment from behavior learning.

The first stage involves pre-training a general trajectory predictor on expert data to capture a wide range of human-like driving behaviors. In the second stage, the model undergoes fine-tuning using Group Relative Policy Optimization (GRPO) with rule-based rewards. This process explicitly aligns the ego-vehicle’s planning with core principles, including traffic rule compliance, comfort, and safety. By adopting this two-stage approach, the framework preserves the naturalistic qualities of human driving while simultaneously enhancing safety awareness and filtering out negative patterns present in the original demonstrations.
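A rule-based reward of the kind used in the second stage might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the reward formula, so the `RolloutSummary` fields, the `rule_based_reward` function, the thresholds, and the equal weighting are all hypothetical. The one idea it is meant to show is that hard safety violations can override the softer rule-compliance and comfort terms.

```python
from dataclasses import dataclass


@dataclass
class RolloutSummary:
    """Hypothetical per-trajectory statistics; not taken from the paper."""
    collided: bool
    off_road: bool
    speed_over_limit: float  # m/s above the speed limit, 0.0 if compliant
    max_abs_jerk: float      # m/s^3, used here as a comfort proxy


def rule_based_reward(s: RolloutSummary,
                      speed_tolerance: float = 5.0,
                      jerk_limit: float = 4.0) -> float:
    """Return a reward in [0, 1]; all thresholds are assumed values."""
    # Safety dominates: any hard violation zeroes the reward.
    if s.collided or s.off_road:
        return 0.0
    # Rule compliance: linear penalty for exceeding the speed limit.
    speed_score = max(0.0, 1.0 - s.speed_over_limit / speed_tolerance)
    # Comfort: binary check against an assumed jerk limit.
    comfort_score = 1.0 if s.max_abs_jerk <= jerk_limit else 0.0
    return 0.5 * speed_score + 0.5 * comfort_score


clean = RolloutSummary(False, False, 0.0, 1.0)
speeding = RolloutSummary(False, False, 2.5, 1.0)
crash = RolloutSummary(True, False, 0.0, 1.0)
rewards = [rule_based_reward(r) for r in (clean, speeding, crash)]
```

In this sketch a collision yields zero reward regardless of how smooth or rule-compliant the rest of the trajectory is, which is one simple way to encode the priority of safety over comfort.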

We also identify a critical limitation of applying standard GRPO directly to planning tasks. Specifically, group-wise normalization divides each group's rewards by that group's standard deviation, which erases scale differences between groups. As a result, rare groups with high-variance safety violations receive advantages of similar magnitude to those of abundant, low-variance safe groups, which inadvertently suppresses the optimization of safety-critical objectives. To resolve this, we propose Variance-Decoupled GRPO (VD-GRPO). This method replaces standard normalization with centering and fixed scaling, thereby preserving the absolute magnitude of rewards. This adjustment ensures that safety-critical objectives remain dominant throughout training.
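The difference between the two advantage computations can be illustrated with a small numerical sketch. It assumes the usual GRPO form (subtract the group mean, divide by the group standard deviation) and models the variance-decoupled variant as centering followed by division by a constant. The function names, the `scale` value, and the example rewards are hypothetical; the abstract does not specify the constant the authors use.

```python
import numpy as np


def grpo_advantages(rewards, eps=1e-6):
    """Standard group-wise normalization: center, then divide by group std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def vd_grpo_advantages(rewards, scale=1.0):
    """Centering with a fixed scale (assumed constant), so magnitudes survive."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / scale


# Hypothetical reward groups: one with only small comfort differences,
# one where half of the sampled trajectories collide and score zero.
safe_group = [1.00, 0.98, 0.99, 0.97]
risky_group = [1.00, 0.00, 1.00, 0.00]

std_safe = np.abs(grpo_advantages(safe_group)).max()
std_risky = np.abs(grpo_advantages(risky_group)).max()
vd_safe = np.abs(vd_grpo_advantages(safe_group)).max()
vd_risky = np.abs(vd_grpo_advantages(risky_group)).max()

print(f"standard GRPO: safe={std_safe:.2f}, risky={std_risky:.2f}")
print(f"fixed scaling: safe={vd_safe:.3f}, risky={vd_risky:.3f}")
```

Under standard normalization both groups end up with advantages of roughly unit magnitude, so the collision signal carries no more weight than minor differences among safe trajectories. With fixed scaling, the group containing collisions keeps a much larger advantage magnitude than the safe group.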

Experiments conducted on the nuPlan benchmark indicate that Plan-R1 significantly enhances both the safety and feasibility of planning, achieving state-of-the-art results, especially within realistic reactive scenarios. Our code is publicly available at https://github.com/XiaolongTang23/Plan-R1.


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
