Zamba2-VL Technical Report

Abstract

This paper introduces Zamba2-VL, a family of vision-language models (VLMs) constructed upon Zamba2. The underlying Zamba2 architecture is a hybrid design that integrates Mamba2 state-space layers with a limited set of shared transformer blocks. Our evaluation demonstrates that Zamba2-VL performs competitively against top-tier open-weight Transformer-based VLMs of similar size, such as the Molmo2, Qwen3-VL, and InternVL3.5 series, across diverse tasks including image comprehension, reasoning, optical character recognition (OCR), grounding, and counting. Furthermore, it significantly surpasses earlier SSM-based and hybrid VLMs, including VL-Mamba, Cobra, and mmMamba.

Because it builds on the Zamba2 backbone, Zamba2-VL benefits from near-linear prefill compute cost and a recurrent state that stays small and nearly constant in size as the context grows. Consequently, these models achieve a time-to-first-token (TTFT) that is approximately ten times lower than comparable Transformer baselines. This efficiency advantage is particularly notable at the 1.2B and 2.7B parameter scales, which are critical for on-device and edge computing applications. We release three model variants (1.2B, 2.7B, and 7B), along with the corresponding inference code, at https://huggingface.co/collections/Zyphra/zamba2-vl.
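
The scaling argument behind the prefill and state-size claims can be illustrated with a toy cost model. The sketch below is not taken from the report: the function names, the FLOP constants, and the layer and dimension values are all hypothetical, chosen only to show how a quadratic attention term and a growing key-value cache compare with a linear state-space scan and a fixed-size recurrent state. It does not attempt to reproduce the reported TTFT ratio, and it ignores the small number of shared transformer blocks that make the real architecture near-linear rather than strictly linear.

```python
# Toy cost model. All constants and sizes are illustrative assumptions,
# not the actual Zamba2-VL configuration or measured numbers.

def attention_prefill_flops(n_tokens: int, d_model: int, n_layers: int) -> int:
    """Rough attention cost: score and value-mixing matmuls, ~4 * n^2 * d per layer."""
    return 4 * n_tokens * n_tokens * d_model * n_layers


def ssm_prefill_flops(n_tokens: int, d_model: int, d_state: int, n_layers: int) -> int:
    """Rough state-space scan cost: linear in sequence length (assumed constant factor 2)."""
    return 2 * n_tokens * d_model * d_state * n_layers


def kv_cache_elems(n_tokens: int, d_model: int, n_layers: int) -> int:
    """Keys and values stored for every token in every layer: grows with n_tokens."""
    return 2 * n_tokens * d_model * n_layers


def ssm_state_elems(d_model: int, d_state: int, n_layers: int) -> int:
    """Recurrent state per layer: independent of sequence length."""
    return d_model * d_state * n_layers


if __name__ == "__main__":
    d_model, d_state, n_layers = 2048, 64, 32  # hypothetical sizes
    for n in (1024, 2048, 4096):
        print(
            n,
            attention_prefill_flops(n, d_model, n_layers),
            ssm_prefill_flops(n, d_model, d_state, n_layers),
            kv_cache_elems(n, d_model, n_layers),
            ssm_state_elems(d_model, d_state, n_layers),
        )
```

In this model, doubling the number of input tokens quadruples the attention term but only doubles the scan term, while the recurrent state stays the same size. Image inputs can expand into long token sequences, so the gap in prefill cost widens as the visual context grows.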


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
