arXiv

DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs

June 4, 2026 · Yixiang Zhu, Yonghao Chen, Zijie Yang, Yusong Hu, Xinyu Chen

Abstract:

To mask the high latency of large models, Vision-Language-Action (VLA) policies are increasingly adopting asynchronous inference. This approach eliminates the stop-and-go inefficiencies inherent in synchronous action-chunk execution but introduces a prediction-execution mismatch. Specifically, the model computes the next action chunk based on a stale observation captured at the start of inference, yet only executes it after the robot and environment have progressed. Consequently, actions optimized for the state at prediction time may become misaligned with the actual state at execution time. Current methods for runtime repair, behavior cloning, and preference alignment fail to explicitly train policies to correct for this stale-input discrepancy.
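The prediction-execution mismatch described above can be made concrete with a toy calculation. The following is only an illustrative sketch, not part of the paper: it assumes a one-dimensional state moving at constant velocity, and the function names, control rate, and speed are hypothetical values chosen for the example.

```python
def state_at_execution(observed_position, velocity, control_dt, delay_steps):
    """Position when the chunk starts executing, assuming constant-velocity motion."""
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    # The scene keeps moving for delay_steps control periods while inference runs.
    return observed_position + velocity * control_dt * delay_steps


def staleness_gap(velocity, control_dt, delay_steps):
    """Distance between the state the policy observed and the state it acts in."""
    return abs(state_at_execution(0.0, velocity, control_dt, delay_steps))


# Hypothetical numbers: 50 Hz control, 0.2 m/s motion, delays of 0 to 8 control steps.
gaps = {d: staleness_gap(0.2, 0.02, d) for d in (0, 2, 4, 8)}
for delay, gap in gaps.items():
    print(f"delay={delay} steps -> gap={gap:.3f} m")
```

Under these assumptions the gap grows linearly with the inference delay, which is why actions planned for the observed state drift further from the execution-time state as latency increases.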

We introduce DEFLECT, an offline post-training framework designed to enhance the delay robustness of asynchronous VLAs. DEFLECT transforms latency-induced mismatches into counterfactual preference signals. Using a frozen reference VLA, it generates a preferred action chunk from the future observation available at execution time and a rejected chunk from the stale observation available at prediction time. The trainable policy is then tasked with scoring both chunks against the same deployment-time input, thereby learning to prioritize actions aligned with the execution-time state. Simultaneously, a supervised fine-tuning anchor ensures the expert action manifold is preserved. Notably, DEFLECT operates without requiring human preference labels, reward models, online robot rollouts, architectural modifications, or extra computation during inference. Evaluations across the Kinetix and LIBERO benchmarks, as well as three real-robot tasks, demonstrate that DEFLECT significantly enhances delay robustness compared to strong asynchronous VLA baselines. It increases success rates under high latency by up to 6.4 percentage points and achieves a 4.6 percentage-point improvement at the maximum delay tested on a real-scale VLA.
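The pair construction and training signal described above can be sketched roughly as follows. The abstract does not give the loss, so this sketch makes several assumptions: a Bradley-Terry style logistic preference term, a policy that exposes one scalar score per action chunk (for example a log-likelihood), and a deployment-time input equal to the stale observation. All function names, `beta`, and `sft_weight` are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def build_pair(reference_policy, stale_obs, future_obs):
    """Query a frozen reference policy twice to form one preference pair."""
    preferred = reference_policy(future_obs)  # aligned with the execution-time state
    rejected = reference_policy(stale_obs)    # aligned with the prediction-time state
    return preferred, rejected


def preference_loss(score_preferred, score_rejected, beta=1.0):
    """Logistic loss that is small when the preferred chunk outscores the rejected one."""
    return -log_sigmoid(beta * (score_preferred - score_rejected))


def total_loss(score_preferred, score_rejected, sft_nll, beta=1.0, sft_weight=1.0):
    """Preference term plus a supervised anchor on expert actions (assumed additive)."""
    return preference_loss(score_preferred, score_rejected, beta) + sft_weight * sft_nll


# Toy usage: the "reference policy" just echoes its observation as a one-step chunk.
preferred, rejected = build_pair(lambda obs: [obs], stale_obs=0.0, future_obs=0.3)

# Hypothetical scores the trainable policy assigns to each chunk given the stale input.
loss_before = total_loss(score_preferred=-2.0, score_rejected=-1.0, sft_nll=0.5)
loss_after = total_loss(score_preferred=-1.0, score_rejected=-2.0, sft_nll=0.5)
print(loss_before > loss_after)
```

Both chunks are scored against the same input, so the only way to reduce the preference term is to shift probability mass toward the chunk generated from the future observation. The anchor term penalizes drifting away from expert actions while doing so.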