arXiv

DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs

Abstract:

To mask the high latency of large models, Vision-Language-Action (VLA) policies are increasingly adopting asynchronous inference. This approach eliminates the stop-and-go inefficiencies inherent in synchronous action-chunk execution but introduces a prediction-execution mismatch. Specifically, the model computes the next action chunk based on a stale observation captured at the start of inference, yet only executes it after the robot and environment have progressed. Consequently, actions optimized for the state at prediction time may become misaligned with the actual state at execution time. Current methods for runtime repair, behavior cloning, and preference alignment fail to explicitly train policies to correct for this stale-input discrepancy.
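The timing behind this mismatch can be sketched with a little index arithmetic. The Python below is an illustrative model only, not taken from the paper. It assumes that each chunk runs for a fixed number of control steps and that inference for the next chunk begins a fixed number of steps before the current chunk finishes. The names `ChunkTiming`, `async_schedule`, `exec_steps`, and `delay_steps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkTiming:
    observed_at: int  # control step at which the input observation was captured
    executes_at: int  # control step at which the chunk's first action runs

    @property
    def staleness(self) -> int:
        # Number of control steps the world has advanced since the observation.
        return self.executes_at - self.observed_at


def async_schedule(num_chunks: int, exec_steps: int, delay_steps: int) -> list[ChunkTiming]:
    """Toy timing model of asynchronous chunk execution (an assumption, not the paper's).

    Each chunk runs for `exec_steps` control steps. Inference for the next
    chunk starts `delay_steps` steps before the current chunk ends, so the
    robot never pauses, but every chunk after the first is computed from an
    observation that is `delay_steps` old by the time it executes.
    """
    if not 0 <= delay_steps <= exec_steps:
        raise ValueError("delay_steps must lie in [0, exec_steps]")
    timings = []
    for k in range(num_chunks):
        executes_at = k * exec_steps
        # The first chunk is computed before motion starts, so it is not stale.
        observed_at = max(0, executes_at - delay_steps)
        timings.append(ChunkTiming(observed_at, executes_at))
    return timings


schedule = async_schedule(num_chunks=3, exec_steps=8, delay_steps=3)
for timing in schedule:
    print(timing, "staleness =", timing.staleness)
```

In this toy model, synchronous execution corresponds to `delay_steps=0` with the robot idle during inference, whereas the asynchronous case trades that idle time for a nonzero staleness on every chunk after the first.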

We introduce DEFLECT, an offline post-training framework designed to enhance the delay robustness of asynchronous VLAs. DEFLECT transforms latency-induced mismatches into counterfactual preference signals. Using a frozen reference VLA, it generates a preferred action chunk from the future observation available at execution time and a rejected chunk from the stale observation available at prediction time. The trainable policy is then tasked with scoring both chunks against the same deployment-time input, thereby learning to prioritize actions aligned with the execution-time state. Simultaneously, a supervised fine-tuning anchor ensures the expert action manifold is preserved. Notably, DEFLECT operates without requiring human preference labels, reward models, online robot rollouts, architectural modifications, or extra computation during inference. Evaluations across the Kinetix and LIBERO benchmarks, as well as three real-robot tasks, demonstrate that DEFLECT significantly enhances delay robustness compared to strong asynchronous VLA baselines. It increases success rates under high latency by up to 6.4 percentage points and achieves a 4.6 percentage-point improvement at the maximum delay tested on a real-scale VLA.
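The abstract does not state the training objective, so the following is only a minimal sketch of one plausible form: a DPO-style preference term combined with a supervised fine-tuning anchor. It assumes that the policy and the frozen reference can each assign a log-probability to an action chunk, which may not hold directly for flow- or diffusion-based action heads. All five log-probabilities are assumed to be conditioned on the same stale, deployment-time input. The function name `deflect_style_loss` and the parameters `beta` and `sft_weight` are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for either sign of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def deflect_style_loss(
    policy_logp_preferred: float,  # policy log-prob of the chunk generated from the future observation
    policy_logp_rejected: float,   # policy log-prob of the chunk generated from the stale observation
    ref_logp_preferred: float,     # frozen-reference log-prob of the preferred chunk
    ref_logp_rejected: float,      # frozen-reference log-prob of the rejected chunk
    policy_logp_expert: float,     # policy log-prob of the demonstrated expert chunk
    beta: float = 0.1,             # assumed preference temperature
    sft_weight: float = 1.0,       # assumed weight of the SFT anchor
) -> float:
    """Assumed DPO-style preference loss plus an SFT anchor (not the paper's stated objective)."""
    # Reward margin: how much more the policy favors the preferred chunk than
    # the reference does, relative to the rejected chunk.
    margin = beta * (
        (policy_logp_preferred - ref_logp_preferred)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    preference_term = -log_sigmoid(margin)
    # Negative log-likelihood of the expert chunk keeps the policy near the demonstrations.
    sft_term = -policy_logp_expert
    return preference_term + sft_weight * sft_term


aligned = deflect_style_loss(-0.5, -2.0, -1.0, -1.0, -0.3)
misaligned = deflect_style_loss(-2.0, -0.5, -1.0, -1.0, -0.3)
print(aligned < misaligned)
```

Under these assumptions the loss falls as the policy shifts probability toward the chunk aligned with the execution-time state, while the anchor term penalizes drifting away from expert actions.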


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
