PRISM: Synergizing Vision Foundation Models via Self-organized Expert Specialization
Title: PRISM: Harmonizing Vision Foundation Models Through Self-Organized Expert Specialization
Abstract: Combining the distinct advantages of various Vision Foundation Models (VFMs) into a single, streamlined architecture is an attractive goal, yet it is often hindered by the negative transfer effects associated with monolithic distillation. To overcome these feature conflicts, we present PRISM, an innovative dual-stream Mixture-of-Experts (MoE) framework that unifies VFMs through modular specialization. Our approach employs a two-stage paradigm: first, expertise deconstruction, in which a teacher-conditional router directs experts to focus on separate representational subspaces, thereby reducing interference; second, dynamic recomposition, where the router acquires the ability to construct customized computational pathways by assembling these experts for specific downstream tasks. Evaluations on the PASCAL-Context and NYUD-v2 datasets demonstrate that PRISM achieves a new state of the art, confirming that sparse, emergent specialization serves as a scalable method for integrating diverse visual knowledge.
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



