arXiv

SALSA: Speech Aware LLM Adaptation via Learned Steering Activation Vectors

June 2, 2026 · Yekaterina Yegorova, Argyrios Gerogiannis, Haolong Zheng, Julia Hockenmaier, Chang D. Yoo, Mark A. Hasegawa-Johnson

Original: arXiv:2606.00460v1

Abstract: Speech-aware large language models often generalize poorly to out-of-domain speech. To address this, we introduce SALSA (Speech-Aware LLM Adaptation via Learned Steering Activations), a resource-efficient adaptation technique that learns layer-specific steering vectors. Unlike conventional steering methods, which derive their vectors from contrastive activation differences, SALSA optimizes the steering vectors directly with a supervised objective.
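To make the distinction between the two ways of obtaining a steering vector concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single frozen linear readout `W` standing in for the pretrained model, a constant domain shift `SHIFT` applied to the activations, and a mean-squared-error objective. All names (`learn_steering_vector`, `contrastive_vector`, `W`, `SHIFT`) and all numeric values are hypothetical. The abstract describes one vector per layer; a single layer is shown here for brevity.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mse(acts, targets, w, v):
    """Mean squared error of the frozen readout on steered activations h + v."""
    return sum((dot(w, add(h, v)) - y) ** 2 for h, y in zip(acts, targets)) / len(acts)

def learn_steering_vector(acts, targets, w, lr=0.05, steps=200):
    """Supervised route: gradient descent on v only; the readout w stays frozen."""
    v = [0.0] * len(w)
    n = len(acts)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for h, y in zip(acts, targets):
            err = dot(w, add(h, v)) - y
            for j, wj in enumerate(w):
                grad[j] += 2.0 * err * wj / n  # d/dv_j of the mean squared error
        v = [vj - lr * gj for vj, gj in zip(v, grad)]
    return v

def contrastive_vector(pos_acts, neg_acts):
    """Contrastive route: difference of mean activations, with no training."""
    dims = range(len(pos_acts[0]))
    mean = lambda acts, j: sum(h[j] for h in acts) / len(acts)
    return [mean(pos_acts, j) - mean(neg_acts, j) for j in dims]

# Hypothetical toy data: out-of-domain activations are in-domain ones minus SHIFT.
random.seed(0)
W = [1.0, -2.0, 0.5]       # frozen readout weights (assumed)
SHIFT = [0.8, -0.3, 1.2]   # assumed domain shift in activation space
in_domain = [[random.gauss(0, 1) for _ in W] for _ in range(32)]
out_of_domain = [[hi - si for hi, si in zip(h, SHIFT)] for h in in_domain]
targets = [dot(W, h) for h in in_domain]

zero = [0.0] * len(W)
v_learned = learn_steering_vector(out_of_domain, targets, W)
v_contrastive = contrastive_vector(in_domain, out_of_domain)
loss_before = mse(out_of_domain, targets, W, zero)
loss_after = mse(out_of_domain, targets, W, v_learned)
print(loss_before, loss_after)
```

In this sketch the supervised route needs only labeled out-of-domain examples, whereas the contrastive route needs paired sets of activations to difference. In a real speech-aware LLM the vector would be added to the hidden states of a chosen layer while all pretrained weights stay frozen.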

Across benchmarks covering children’s speech, multilingual speech, and Mandarin-English code-switching, SALSA significantly outperforms both zero-shot inference and speech in-context learning baselines, with relative gains of up to 46.8% over zero-shot inference. Further analysis shows that steering the encoder, especially its later layers, is more effective than steering the LLM backbone. These results suggest that the improvement in downstream automatic speech recognition (ASR) performance comes from adapting higher-level acoustic and phonetic representations to better align with the pretrained language model’s representation space, rather than from modifying the decoder.
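For readers unfamiliar with how a relative gain is computed, here is a small sketch. The abstract does not name the metric, so this assumes an error rate such as word error rate (WER), where lower is better. The function name `relative_reduction` and both input values are hypothetical illustrations, not results from the paper.

```python
def relative_reduction(baseline_error, adapted_error):
    """Relative improvement of an error-rate metric: (baseline - adapted) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - adapted_error) / baseline_error

# Hypothetical numbers chosen only to illustrate the arithmetic.
hypothetical_baseline_wer = 25.0
hypothetical_adapted_wer = 20.0
gain = relative_reduction(hypothetical_baseline_wer, hypothetical_adapted_wer)
print(f"{100 * gain:.1f}% relative reduction")
```

A relative gain is measured against the baseline value, so the same absolute drop in error rate corresponds to a larger relative gain when the baseline is lower.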
