MemoGen: Can Past Experience Improve Future Text-to-Image Generation?
Abstract:
While modern text-to-image models excel at visual synthesis, they often struggle with prompts demanding implicit visual constraints, relational reasoning, or external knowledge. Although retrieval-augmented and agentic approaches help by sourcing external references or refining prompts, they generally treat each generation task as an isolated event, failing to systematically retain past successes or failures for future application. This study investigates whether a text-to-image system can continuously enhance its performance through its own generation history without requiring updates to the underlying generator.
We introduce MemoGen, a training-free framework that integrates an agentic evolution layer with existing image generators. For every task, MemoGen deduces visual requirements, fetches external evidence and references as needed, and converts these into executable generation constraints. It then assesses the output and archives task comprehension, reference selection, visual feedback, effective strategies, and lessons learned from failures into a reusable experience memory. Over successive evolution rounds, the agent accesses this relevant experience to refine similar future generations, selectively fixing previous errors while maintaining successful outputs. This mechanism facilitates test-time self-evolution without altering model parameters.
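The loop described above (retrieve relevant experience, generate under constraints, evaluate, archive) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Experience` schema, the word-overlap retrieval, the `threshold` parameter, and the `toy_generate` / `toy_evaluate` stand-ins are all hypothetical, chosen only to make the control flow concrete. A real system would presumably call an image generator and a visual judge in their place.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Experience:
    """One archived record of a finished attempt (hypothetical schema)."""
    prompt: str
    constraints: list[str]
    score: float
    lesson: str


@dataclass
class ExperienceMemory:
    records: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.records.append(exp)

    def retrieve(self, prompt: str, k: int = 3) -> list[Experience]:
        """Return up to k records whose prompts share words with `prompt`.

        Word overlap is an assumed, deliberately simple similarity measure.
        """
        query = set(prompt.lower().split())

        def overlap(e: Experience) -> int:
            return len(query & set(e.prompt.lower().split()))

        ranked = sorted(self.records, key=overlap, reverse=True)
        return [e for e in ranked[:k] if overlap(e) > 0]


def evolve(
    prompts: list[str],
    memory: ExperienceMemory,
    generate: Callable[[list[str]], str],
    evaluate: Callable[[str, str], tuple[float, str]],
    rounds: int = 2,
    threshold: float = 0.8,
) -> dict[str, tuple[str, float]]:
    """Run several passes over the prompts; generator weights are never touched."""
    best: dict[str, tuple[str, float]] = {}
    for _ in range(rounds):
        for prompt in prompts:
            # Keep outputs that already succeeded; only retry failures.
            if prompt in best and best[prompt][1] >= threshold:
                continue
            # Turn retrieved lessons into extra generation constraints.
            lessons = [e.lesson for e in memory.retrieve(prompt) if e.lesson]
            constraints = [prompt] + list(dict.fromkeys(lessons))
            image = generate(constraints)
            score, lesson = evaluate(prompt, image)
            memory.add(Experience(prompt, constraints, score, lesson))
            if prompt not in best or score > best[prompt][1]:
                best[prompt] = (image, score)
    return best


def toy_generate(constraints: list[str]) -> str:
    # Stand-in for an image generator: the "image" is just the final prompt text.
    return "; ".join(constraints)


def toy_evaluate(prompt: str, image: str) -> tuple[float, str]:
    # Stand-in for a visual judge: passes only if the implicit fact is present.
    if "green" in image:
        return 1.0, ""
    return 0.2, "unripe bananas are green"
```

With the toy stand-ins, calling `evolve(["an unripe banana"], ExperienceMemory(), toy_generate, toy_evaluate)` fails in the first round, archives the lesson, and succeeds in the second round once the retrieved lesson is appended as a constraint. Further rounds skip the prompt because its score already meets the threshold.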
Extensive evaluations on knowledge-intensive and reasoning-focused benchmarks support this approach. After just two evolution rounds, MemoGen with the open-source Qwen-Image backbone outperforms strong proprietary systems such as Nano Banana Pro and GPT-Image-1 on both WISE and Mind-Bench. These results indicate that explicit experience memory serves as an effective continual learning signal that significantly improves the reliability of text-to-image generation.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC





