AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task
Abstract
This paper introduces AlignAtt4LLM, a simultaneous speech translation system developed for the IWSLT 2026 competition, targeting translation from English into German, Italian, and Chinese. The architecture is a synchronous cascade: Qwen3-ASR produces an incrementally updated source transcript using forced alignment, and Gemma-4 E4B-it translates that transcript under an AlignAtt policy applied on the machine translation (MT) side. To our knowledge, this is the first successful deployment of AlignAtt on a decoder-only large language model; previous systems relied on encoder-decoder cross-attention.
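For readers unfamiliar with AlignAtt, its usual decision rule can be sketched as follows: a draft token is emitted only if its strongest attention does not fall on the most recent, still unstable part of the source. This is a minimal illustration under stated assumptions, not the system's implementation. The function name `alignatt_emit_count`, the `frontier` parameter, and the assumption that attention has already been averaged over the selected heads into a draft-by-source matrix are all hypothetical.

```python
import numpy as np


def alignatt_emit_count(attn: np.ndarray, frontier: int) -> int:
    """Return how many leading draft tokens an AlignAtt-style rule would emit.

    attn:     (num_draft_tokens, num_source_tokens) attention weights,
              assumed to be averaged over the chosen alignment heads.
    frontier: number of trailing source tokens treated as unstable
              (hypothetical parameter name).
    """
    if attn.ndim != 2:
        raise ValueError("attn must be a 2-D draft-by-source matrix")
    num_draft, num_src = attn.shape
    # Index of the first source token inside the unstable tail.
    tail_start = num_src - frontier
    for i in range(num_draft):
        # Stop at the first draft token whose most-attended source
        # token lies in the unstable tail; emit everything before it.
        if int(np.argmax(attn[i])) >= tail_start:
            return i
    return num_draft


# Toy example: 3 draft tokens attending over 4 source tokens.
toy_attn = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.1, 0.2, 0.7],
])
emitted = alignatt_emit_count(toy_attn, frontier=2)
```

In the toy example the third draft token attends most strongly to the last source token, so only the first two draft tokens would be emitted and the system would wait for more source text.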
To establish a viable policy within this decoder-only context, we propose four key innovations: (1) the inclusion of an explicit source span within the prompt; (2) the offline identification of alignment heads specific to translation tasks; (3) the selective qk-fast replay of the draft-to-source attention block; and (4) runtime capture of queries and keys, ensuring that model outputs remain bit-identical.
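Items (3) and (4) suggest recomputing only the draft-to-source attention block from captured queries and keys, rather than materialising full attention maps. The sketch below shows one way such a replay could look in NumPy. It is an assumption-laden illustration: the function name `draft_to_source_attention`, the tensor layout, the renormalisation of the softmax over the source span only, and the default `1/sqrt(d)` scale are all hypothetical choices (the actual backbone may use a different scaling, grouped key heads, or a different normalisation).

```python
import numpy as np


def draft_to_source_attention(q, k, heads, scale=None):
    """Replay attention from draft queries to source keys for selected heads.

    q:     (num_heads, num_draft, d) captured queries at draft positions.
    k:     (num_heads, num_source, d) captured keys at source-span positions.
    heads: indices of the alignment heads to use (hypothetical selection).
    scale: score scaling factor; defaults to 1/sqrt(d) as an assumption.

    Returns a (num_draft, num_source) matrix averaged over the selected
    heads, with each row normalised over the source span only.
    """
    q_sel = np.asarray(q, dtype=np.float64)[heads]
    k_sel = np.asarray(k, dtype=np.float64)[heads]
    if scale is None:
        scale = 1.0 / np.sqrt(q_sel.shape[-1])
    # Scaled dot products between every draft query and every source key.
    scores = np.einsum("htd,hsd->hts", q_sel, k_sel) * scale
    # Numerically stable softmax restricted to the source span.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Average the per-head distributions into one alignment matrix.
    return weights.mean(axis=0)


# Toy check: one head, queries that match source keys 2 and 0 exactly.
toy_k = (np.eye(4) * 5.0)[None]            # (1, 4, 4)
toy_q = (np.eye(4)[[2, 0]] * 5.0)[None]    # (1, 2, 4)
toy_align = draft_to_source_attention(toy_q, toy_k, heads=[0])
```

The resulting matrix has one row per draft token and can be passed directly to a frontier check over its row-wise argmax. Because the replay only reads captured tensors, it would not alter the model's forward pass.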
Evaluation on the IWSLT 2026 development set shows that AlignAtt4LLM surpasses the provided baselines for the European target languages (English-to-German and English-to-Italian) in both the low-latency condition (approximately 2 seconds) and the high-latency condition (under 4 seconds), measured by CU-LongYAAL. Results for English-to-Chinese are more mixed. The method, however, is not tied to Gemma-4: because AlignAtt4LLM relies only on a deterministic prompt structure, calibrated attention heads, and query/key capture, the same policy can be adapted to more powerful, translation-focused decoder-only MT backbones for non-European target languages.
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



