arXiv

Representation Forcing for Bottleneck-Free Unified Multimodal Models

Title: Eliminating Bottlenecks in Unified Multimodal Models via Representation Forcing

Abstract:

Unified Multimodal Models (UMMs) aim to combine perception and generation in a single architecture. However, current implementations typically rely on a frozen, independently pretrained Variational Autoencoder (VAE) for image generation, which creates a structural bottleneck. Simply removing the VAE causes a significant drop in quality, because the model must then learn both high-level structure and low-level pixel detail from scratch. To address this, we introduce Representation Forcing (RF), which makes representation prediction a native capability of the model. Specifically, RF requires the decoder to autoregressively generate visual representations as intermediate tokens before predicting pixels; these tokens stay in the context window and guide pixel diffusion within the same backbone. By turning representations from outputs of perception tasks into targets for generation, RF removes the need for an external generative latent space. Our results indicate that RF improves both understanding and generation. For image generation, our pixel-space model with RF matches state-of-the-art UMMs that rely on VAEs. For image understanding, the pixel-space RF model generally outperforms its VAE-based counterpart. Together, these results are a significant step toward end-to-end, bottleneck-free UMMs.
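The generation order described in the abstract (representation tokens first, then pixel diffusion conditioned on them in the same context) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_backbone`, `generate_with_representation_forcing`, the averaging "prediction", and the fixed 0.5 denoising blend are all hypothetical stand-ins chosen only to make the control flow runnable.

```python
import random


def toy_backbone(context, dim):
    """Hypothetical stand-in for the shared backbone.

    Returns the element-wise mean of all vectors in the context.
    A real model would be a transformer; this only mimics
    "predict something from everything currently in context".
    """
    n = len(context)
    return [sum(vec[i] for vec in context) / n for i in range(dim)]


def generate_with_representation_forcing(
    prompt_tokens, num_rep_tokens, num_pixel_patches,
    num_denoise_steps, dim=4, seed=0,
):
    """Toy two-stage generation loop (assumed structure, not the paper's code)."""
    rng = random.Random(seed)
    context = list(prompt_tokens)

    # Stage 1: autoregressively emit representation tokens.
    # Each new token is appended to the context before the next is predicted.
    rep_tokens = []
    for _ in range(num_rep_tokens):
        next_token = toy_backbone(context, dim)
        rep_tokens.append(next_token)
        context.append(next_token)

    # Stage 2: pixel-space denoising with the same backbone.
    # The representation tokens remain in `context` and condition every step.
    patches = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(num_pixel_patches)
    ]
    for _ in range(num_denoise_steps):
        target = toy_backbone(context + patches, dim)
        # Toy update: blend each noisy patch halfway toward the prediction.
        patches = [
            [0.5 * p + 0.5 * t for p, t in zip(patch, target)]
            for patch in patches
        ]

    return rep_tokens, patches, context


prompt = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
rep_tokens, patches, context = generate_with_representation_forcing(
    prompt, num_rep_tokens=3, num_pixel_patches=2, num_denoise_steps=4
)
```

The point of the sketch is the ordering: the representation tokens are produced before any pixel patch is denoised, and they are never removed from the context, so the pixel stage always sees them alongside the prompt.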


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
