Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
Abstract
This paper introduces Echo-Infinity, an autoregressive (AR) framework for real-time generation of infinite-length videos. It uses a learnable, evolving memory mechanism to dynamically filter, abstract, and compress historical context at a constant computational cost, regardless of video length. Existing approaches typically manage memory through predefined KV-cache scheduling, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These methods tend to lose historical context and amplify compounding errors, largely because their cache windows are restricted and they do not account for the noise introduced by autoregressive generation.
Drawing inspiration from human memory consolidation, Echo-Infinity replaces manual memory curation with learnable Memory Queries. The queries are updated through attention and a gating mechanism whenever older frames are evicted from the local window. Because the queries are optimized end-to-end together with the video diffusion transformers (DiTs), the framework learns an evolving memory that supports arbitrary compression ratios at constant computation, independent of video duration. The memory also serves as a robust, generalizable generation prior: output quality improves even when only the optimized initial memory state is used.
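The memory update described above can be illustrated with a minimal sketch. The abstract does not specify the actual architecture, so everything here is an assumption made for illustration: the function name `update_memory`, the single-head dot-product attention without learned projections, and the scalar sigmoid gate with a hypothetical `gate_bias` parameter. The only property it is meant to show is that a fixed number of memory slots absorbs evicted frame tokens, so the memory size does not grow with video length.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update_memory(memory, evicted, gate_bias=0.0):
    """Fold evicted frame tokens into a fixed-size memory (illustrative only).

    memory:  list of k slot vectors (the "memory queries"), each of dim d
    evicted: list of n token vectors leaving the local window, each of dim d
    Returns a new list of k slot vectors; k never changes.
    """
    d = len(memory[0])
    scale = 1.0 / math.sqrt(d)
    new_memory = []
    for slot in memory:
        # Each slot attends over the evicted tokens (assumed: no learned projections).
        weights = softmax([dot(slot, tok) * scale for tok in evicted])
        read = [sum(w * tok[i] for w, tok in zip(weights, evicted)) for i in range(d)]
        # Assumed gate: a scalar sigmoid deciding how much new content to write.
        gate = 1.0 / (1.0 + math.exp(-(dot(slot, read) * scale + gate_bias)))
        new_memory.append([gate * r + (1.0 - gate) * s for r, s in zip(read, slot)])
    return new_memory


memory = [[1.0, 0.0], [0.0, 1.0]]
evicted = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for _ in range(5):  # five eviction events; memory stays at two slots
    memory = update_memory(memory, evicted)
```

In a trained system the projections and the gate would be learned jointly with the DiT; this sketch only fixes the shape of the computation, whose cost per eviction depends on the number of slots and evicted tokens rather than on how many frames have been generated so far.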
Additionally, we present the Unified Relative RoPE Recipe. It anchors sink frames at index 0 and ensures that the newest frame’s ID never exceeds the DiTs’ pretrained maximum temporal RoPE ID, during both training and inference. This removes the finite-RoPE constraint and closes the training-inference gap in RoPE extrapolation. Across both short and long video generation tasks, Echo-Infinity achieves state-of-the-art results. Notably, it demonstrates, for the first time, promising real-time rollouts beyond 24 hours (more than 1.3 million frames), offering a viable path toward practical infinite video generation.
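One way to picture the relative indexing idea is the sketch below. It is a guess at the general shape, not the paper's recipe: the abstract states only that sink frames start at index 0 and that the newest frame's ID stays within the pretrained maximum. The function name `relative_rope_ids`, the contiguous placement of the local window right after the sinks, and all parameter names are hypothetical, and the placement of memory tokens is left out because the abstract does not describe it.

```python
def relative_rope_ids(num_generated, num_sink, window, max_rope_id):
    """Assign temporal RoPE IDs to the currently visible frames (illustrative).

    num_generated: total frames produced so far (may be arbitrarily large)
    num_sink:      number of sink frames kept from the start of the video
    window:        number of most recent frames kept in the local window
    max_rope_id:   largest temporal RoPE ID seen during pretraining
    Returns (sink_ids, window_ids); the last window ID belongs to the newest frame.
    """
    if num_sink + window - 1 > max_rope_id:
        raise ValueError("sink frames plus window do not fit in the pretrained RoPE range")
    # Sink frames are anchored at index 0.
    sink_ids = list(range(min(num_sink, num_generated)))
    # Assumed layout: local frames take the IDs immediately after the sinks,
    # so IDs depend on position within the window, not on absolute frame number.
    num_local = min(window, max(num_generated - num_sink, 0))
    window_ids = list(range(len(sink_ids), len(sink_ids) + num_local))
    return sink_ids, window_ids


early = relative_rope_ids(num_generated=100, num_sink=2, window=6, max_rope_id=20)
late = relative_rope_ids(num_generated=1_300_000, num_sink=2, window=6, max_rope_id=20)
```

Under these assumptions `early` and `late` are identical, which is the point: the IDs seen by the model at frame 1,300,000 are the same ones it saw at frame 100, so inference never extrapolates beyond the range used in training.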
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC






