arXiv

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Abstract

This paper introduces Echo-Infinity, an autoregressive (AR) framework for real-time generation of infinite-length videos. It uses a learnable, evolving memory to dynamically filter, abstract, and compress the generation history at a constant computational cost, regardless of video length. Existing approaches typically manage memory through predefined KV-cache scheduling, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These methods tend to lose historical context and amplify compounding errors, largely because their cache windows are restricted and they disregard the noise introduced by autoregressive generation.

Drawing inspiration from human memory consolidation, Echo-Infinity replaces manual memory curation with learnable Memory Queries. These queries are updated through attention and a gating step whenever older frames are evicted from the local processing window. Because the queries are optimized end-to-end together with the video diffusion transformers (DiTs), the resulting evolving memory supports arbitrary compression ratios at a computational cost that stays constant regardless of video duration. The memory also serves as a robust, generalizable generation prior, improving output quality even when only its optimized initial state is used.
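To make the attention-plus-gating update more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not give the architecture, so the function names (`update_memory`, `softmax`, `dot`), the scalar sigmoid gate, and the absence of learned key/value projections are all assumptions made for illustration. The point it shows is that a fixed set of memory vectors can absorb evicted tokens while the memory size, and hence the per-step cost, stays constant.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def update_memory(memory, evicted, gate_weights, gate_bias=0.0):
    """Hypothetical update: each memory query attends over the evicted
    tokens, then a scalar gate blends the attended summary into the query.
    The number of memory vectors never changes."""
    dim = len(memory[0])
    updated = []
    for query in memory:
        # Scaled dot-product attention of one query over the evicted tokens.
        weights = softmax([dot(query, tok) / math.sqrt(dim) for tok in evicted])
        summary = [sum(w * tok[i] for w, tok in zip(weights, evicted))
                   for i in range(dim)]
        # Assumed gate: a sigmoid of a linear function of the summary.
        gate = 1.0 / (1.0 + math.exp(-(dot(gate_weights, summary) + gate_bias)))
        updated.append([(1.0 - gate) * q + gate * s
                        for q, s in zip(query, summary)])
    return updated

# Two memory vectors of dimension 4, updated over 1000 eviction steps.
memory = [[0.1, 0.0, -0.2, 0.3], [0.0, 0.5, 0.1, -0.1]]
gate_weights = [0.2, -0.1, 0.3, 0.1]
for step in range(1000):
    # Stand-in for tokens of frames leaving the local window.
    evicted = [[math.sin(step + i + j) for j in range(4)] for i in range(3)]
    memory = update_memory(memory, evicted, gate_weights)
```

After 1000 eviction steps the memory still holds two vectors of dimension 4, so the work per step depends only on the memory size and the number of evicted tokens, not on how many frames have been generated.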

The paper also presents the Unified Relative RoPE Recipe. It anchors sink frames so that they start at index 0 and ensures that the newest frame’s ID never exceeds the DiTs’ pretrained maximum temporal RoPE ID, during both training and inference. This removes the finite-RoPE constraint and closes the gap between training and inference in RoPE extrapolation. In evaluations on both short and long video generation, Echo-Infinity achieves state-of-the-art results. Notably, it demonstrates, for the first time, promising real-time rollouts beyond 24 hours (more than 1.3 million frames), offering a viable path toward practical infinite video generation.
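The index rule in the Unified Relative RoPE Recipe can be sketched as a small helper. This is one possible reading of the abstract, not the authors' code: the function name `relative_rope_ids`, its parameters, and the choice to shift the local window by a constant offset are assumptions. Sink frames keep IDs starting at 0, and window frames are shifted so the newest frame's ID is capped at the pretrained maximum.

```python
def relative_rope_ids(newest_index, num_sink, window, max_id):
    """Hypothetical temporal RoPE ID assignment.

    newest_index: absolute index of the newest frame (0-based).
    num_sink:     number of sink frames kept from the start of the video.
    window:       number of most recent frames in the local window.
    max_id:       largest temporal RoPE ID seen in pretraining (assumed known).
    """
    if max_id < num_sink + window - 1:
        raise ValueError("max_id too small to hold sink frames plus window")
    # Sink frames are anchored at 0, 1, ..., num_sink - 1.
    sink_ids = list(range(min(num_sink, newest_index + 1)))
    # Once the rollout passes max_id, shift the window back so the
    # newest frame lands exactly on max_id.
    shift = max(0, newest_index - max_id)
    first = max(num_sink, newest_index - window + 1)
    window_ids = [i - shift for i in range(first, newest_index + 1)]
    return sink_ids, window_ids

early = relative_rope_ids(5, num_sink=2, window=4, max_id=20)
late = relative_rope_ids(1_300_000, num_sink=2, window=4, max_id=20)
```

In this sketch `early` uses the absolute indices unchanged, while `late` maps the four newest frames to IDs 17 through 20, so the IDs stay inside the pretrained range however long the rollout runs.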


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
