arXiv

MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction

June 2, 2026 · Wenchang Duan, Zhenguo Gao, Jinguo Xian, Yi Shi


Abstract:

Future motion prediction is a critical component of autonomous driving systems, as it directly influences collision detection, behavior planning, and control. Despite its importance, the task remains challenging because of dense vehicle interactions, heterogeneous agent behaviors, multimodal futures, and limited onboard computation. Graph-based, attention-driven, and generative models improve interaction reasoning and uncertainty quantification, but their high-capacity architectures often incur computational costs that are prohibitive for real-time use. Conversely, lightweight predictors and standard distillation techniques reduce inference cost but typically rely on static imitation learning and do not explicitly address the safety-critical biases present in teacher models.

To address these limitations, this study introduces MAVEN-T, a reinforced heterogeneous distillation framework for real-time multi-agent trajectory prediction. Its high-capacity teacher network uses a surround-aware graph encoder to model directed local interactions, combines efficient temporal filtering with shifted-window spatial attention, and decodes maneuver-specific future trajectories with a sparse Mixture-of-Experts head. The compact student network, built on a GRU-Squeeze-and-Excitation architecture with a Low-Rank Adapted policy head, is trained by distillation at the feature, attention, and semantic levels.
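
The abstract names three distillation levels (feature, attention, semantic) but does not specify their loss functions. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes mean squared error for the feature level, mean row-wise KL divergence between attention maps for the attention level, and temperature-softened KL over maneuver-mode logits for the semantic level. All function names, the loss weights `w_feat`, `w_attn`, `w_sem`, and the temperature are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def feature_loss(t_feat: Sequence[float], s_feat: Sequence[float]) -> float:
    """Feature level (assumed MSE) between embeddings of equal size.
    In practice the student embedding would first be projected to the teacher's width."""
    assert len(t_feat) == len(s_feat)
    return sum((t - s) ** 2 for t, s in zip(t_feat, s_feat)) / len(t_feat)


def attention_loss(t_attn, s_attn) -> float:
    """Attention level (assumed): mean row-wise KL between attention maps whose rows sum to 1."""
    return sum(kl_divergence(t, s) for t, s in zip(t_attn, s_attn)) / len(t_attn)


def semantic_loss(t_logits, s_logits, temperature: float = 2.0) -> float:
    """Semantic level (assumed): temperature-softened KL over maneuver-mode logits,
    scaled by T^2 as in standard logit distillation."""
    p_t = softmax(t_logits, temperature)
    p_s = softmax(s_logits, temperature)
    return temperature ** 2 * kl_divergence(p_t, p_s)


def distillation_loss(teacher: Dict, student: Dict,
                      w_feat: float = 1.0, w_attn: float = 0.5, w_sem: float = 1.0) -> float:
    """Weighted sum of the three terms; the weights are placeholder values."""
    return (w_feat * feature_loss(teacher["feat"], student["feat"])
            + w_attn * attention_loss(teacher["attn"], student["attn"])
            + w_sem * semantic_loss(teacher["mode_logits"], student["mode_logits"]))


# Toy outputs for a single agent (all numbers are made up for illustration).
teacher_out = {"feat": [0.2, -0.1, 0.4],
               "attn": [[0.7, 0.3], [0.4, 0.6]],
               "mode_logits": [2.0, 0.5, -1.0]}
student_out = {"feat": [0.1, -0.1, 0.5],
               "attn": [[0.6, 0.4], [0.5, 0.5]],
               "mode_logits": [1.5, 0.8, -0.5]}
print(distillation_loss(teacher_out, student_out))
```

Every term is zero when the student reproduces the teacher exactly, so the weights only control how strongly each level pulls on the student; the values MAVEN-T actually uses are not given in the abstract.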

To align predictions with downstream driving behavior, the student is further refined with Proximal Policy Optimization (PPO) using rewards for collision avoidance, comfort, and progress. A complexity-aware curriculum and Elastic Weight Consolidation (EWC) stabilize training across stages. The method is evaluated on the NGSIM, HighD, MoCAD, Argoverse 2, and Waymo Open Motion Dataset benchmarks in terms of accuracy, efficiency, generalization, robustness, and closed-loop safety. The student model achieves a 6.2× reduction in parameters and a 3.7× increase in inference speed, running with a latency of 14.6 ms on an NVIDIA Jetson AGX Orin while maintaining competitive accuracy.
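
The abstract lists the PPO reward criteria and the use of Elastic Weight Consolidation but gives no formulas. The sketch below shows one plausible shape under explicit assumptions: a hypothetical `driving_reward` with a gap-based collision penalty, a squared-jerk comfort penalty, and a distance-based progress bonus (all weights and thresholds invented for illustration); the standard PPO clipped surrogate objective; and the standard EWC quadratic penalty anchoring parameters to those saved at the end of the previous stage. Advantage estimation and the complexity-aware curriculum are omitted.

```python
from typing import Sequence


def driving_reward(min_gap_m: float, jerk_mps3: float, progress_m: float,
                   safe_gap_m: float = 2.0, w_col: float = 1.0,
                   w_comfort: float = 0.1, w_prog: float = 0.05) -> float:
    """Hypothetical per-step reward built from the three criteria named in the abstract.

    collision: penalty that grows as the gap to the nearest agent drops below safe_gap_m
    comfort:   penalty on squared jerk
    progress:  bonus for distance advanced along the route
    """
    collision_pen = max(0.0, safe_gap_m - min_gap_m) / safe_gap_m
    comfort_pen = jerk_mps3 ** 2
    return -w_col * collision_pen - w_comfort * comfort_pen + w_prog * progress_m


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate, to be maximized:
    mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def ewc_penalty(params: Sequence[float], anchor: Sequence[float],
                fisher_diag: Sequence[float], lam: float = 100.0) -> float:
    """Elastic Weight Consolidation: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta* are the parameters saved at the end of the previous stage
    and F_i is the diagonal Fisher information estimated on that stage."""
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher_diag))


def rl_stage_loss(ratios, advantages, params, anchor, fisher_diag) -> float:
    """Loss minimized during RL fine-tuning: negative PPO objective plus the EWC anchor."""
    return -ppo_clip_objective(ratios, advantages) + ewc_penalty(params, anchor, fisher_diag)


# Toy example: one rollout step and two parameters (all values are illustrative).
print(driving_reward(min_gap_m=1.5, jerk_mps3=0.8, progress_m=12.0))
print(rl_stage_loss(ratios=[1.1, 0.7], advantages=[0.5, -0.3],
                    params=[0.52, -0.18], anchor=[0.50, -0.20], fisher_diag=[3.0, 1.0]))
```

In this sketch `lam` trades plasticity for retention: larger values keep the RL-refined student closer to its distilled weights, which is the stabilizing role the abstract assigns to EWC across training stages.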
