arXiv

UniVerse: A Unified Modulation Framework for Segmentation-Free, Disentangled Multi-Concept Personalization

June 2, 2026 · Quynh Phung, Sandesh Ghimire, Minsi Hu, Chung-Chi Tsai, Jia-Bin Huang

Abstract:

While personalized visual understanding has seen substantial progress, current methodologies face challenges in isolating and retrieving specific concepts from images containing multiple objects. Previous techniques often depend heavily on segmentation-based supervision or suffer from weak compositional generalization, which hinders their capacity to accurately separate and manipulate distinct concepts. To address these limitations, we introduce UniVerse, a Unified Modulation Framework designed for segmentation-free, disentangled multi-concept personalization within diffusion transformers. This approach facilitates both composable and decomposable concept extraction, allowing for precise localization and representation of target objects without the need for explicit segmentation masks.

UniVerse learns to break down intricate scenes into concept-specific representations, which are then integrated in a unified fashion. This mechanism supports robust personalization across a wide range of visual contexts. Our extensive evaluations across several benchmarks show that UniVerse significantly surpasses state-of-the-art baselines in both visual fidelity and localization accuracy. The qualitative and quantitative findings indicate that our method can accurately isolate target concepts even in cluttered environments, advancing the development of more flexible, interpretable, and personalized systems for visual generation and understanding.
