Global News Digest

arXiv

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Title: PolySpeech-100: A Comprehensive Benchmark for Speech Understanding Spanning Over 100 Languages and Dialects

Abstract:

As End-to-End (E2E) Speech-Large Language Models (Speech-LLMs) continue to advance rapidly, the methods used to evaluate them have not kept pace and still rely primarily on simple transcription tasks. Current benchmarks suffer from three major flaws: a strong skew toward high-resource languages, an emphasis on low-level automatic speech recognition (ASR) rather than semantic reasoning, and a general neglect of regional dialects. To address these shortcomings, we present PolySpeech-100, a large-scale benchmark for evaluating "native-level" speech comprehension across 110 linguistic variants. We use a hybrid construction pipeline that combines gold-standard human recordings with instruction-driven synthetic speech, enabling coverage of 19 distinct Chinese dialects and more than 80 low-resource languages.

Our extensive evaluation of 22 state-of-the-art models, including Gemini-3, GPT-Audio, and Qwen2.5-Omni, yields several key insights. First, we show that open-source E2E models outperform cascade systems (ASR+LLM) on heavily dialectal speech. This confirms that direct audio processing retains vital paralinguistic cues and prosodic features, such as intonation and stress, that are typically lost in standard transcription. Second, we identify a stark performance divide: while commercial models remain robust, open-source models degrade significantly on low-resource languages. Finally, and surprisingly, we find that under standard zero-shot conditions, Chain-of-Thought prompting often reduces speech understanding performance for most tested models, suggesting a potential modality alignment gap in current architectures. PolySpeech-100 sets a new, rigorous standard for the next generation of inclusive, omni-capable Speech-LLMs. The data, demo, and code are publicly available at https://github.com/YoungSeng/PolySpeech-100.


Source: arXiv | Generated at: 2026-06-02 00:00:00 UTC

Related Articles

Schroders Renewable Unit Targets AI Assets as Power Demand Soars
Bloomberg

Schroders’ renewable unit targets AI infrastructure, pivoting to meet soaring energy demand from artificial intelligence...

State Street's Paglia on SBI Group Partnership, ETFs
Bloomberg

State Street's Paglia discusses the SBI Group partnership and ETFs.

Nvidia Boss Says Workers Should Be Paid ‘as Much as Possible’
Bloomberg

Nvidia CEO Jensen Huang advocates for paying workers “as much as possible,” emphasizing maximum compensation. This stanc...

TSE Talking With Regulator For Easing ETF Listing Rules
Bloomberg

The Tokyo Stock Exchange is discussing with regulators to ease ETF listing rules. This aims to simplify market access an...

S&P DJI CEO on Japan Markets, Mega IPOs
Bloomberg

S&P DJI CEO discusses Japan's financial markets and major IPOs.