LaSR: Context-Aware Speech Recognition via Latent Reasoning
Original: arXiv:2606.00507v1 Announce Type: new
Abstract: Although recent advances in Speech Large Language Models (Speech LLMs) have markedly improved comprehension of and reasoning over spoken language, their sensitivity to context remains limited. These models often fail to produce transcriptions that accurately reflect the speaker’s intent and the surrounding topical context. To address this, we introduce LaSR (Latent Speech Reasoning), a new training framework that follows a context-aware reasoning trajectory driven by latent reasoning. Rather than producing explicit intermediate tokens, LaSR aligns chain-of-thought (CoT) supervision with the acoustic feature regions corresponding to target words and incorporates latent reasoning phases to ground context information and manage transcription transitions. In addition, to establish a robust benchmark for contextual recognition of specialized vocabulary, we present Spoken Darwin-Science, a comprehensive dataset centered on academic terminology. Initial experiments on Fun-Audio-Chat show that LaSR substantially improves terminology recognition accuracy without adding latency and consistently outperforms standard supervised fine-tuning baselines. These results underscore the promise of latent reasoning for building efficient, context-aware speech assistants.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
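
Illustrative sketch (not from the paper): the abstract says that supervision is aligned with the acoustic feature regions corresponding to target words, but gives no implementation details. The snippet below shows only one plausible preprocessing step, under stated assumptions: word-level timestamps are already available from some forced aligner, and the speech encoder emits one feature frame every 40 ms. The names WordSpan, target_frame_regions, and frame_ms are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordSpan:
    """One aligned word with start/end times in milliseconds (hypothetical type)."""
    word: str
    start_ms: int
    end_ms: int


def target_frame_regions(spans, target_terms, frame_ms=40):
    """Map each target word to a half-open [first, last) range of feature frames.

    frame_ms is an assumed encoder frame duration, not a value from the paper.
    """
    targets = {term.lower() for term in target_terms}
    regions = []
    for span in spans:
        if span.word.lower() not in targets:
            continue
        first = span.start_ms // frame_ms        # frame containing the word onset
        last = -(-span.end_ms // frame_ms)       # ceiling division: first frame past the word
        last = max(last, first + 1)              # every target word covers at least one frame
        regions.append((span.word, first, last))
    return regions


# Example: only the specialized term is marked as a target region.
spans = [
    WordSpan("the", 0, 200),
    WordSpan("mitochondria", 200, 1000),
    WordSpan("divide", 1000, 1400),
]
regions = target_frame_regions(spans, {"mitochondria"})
print(regions)
```

In a setup like this, the returned frame ranges could mark where reasoning-related supervision is attached during training; how LaSR actually does this is described in the paper itself, not here.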





