Adaptive Head Budgeting for Efficient Multi-Head Attention
Original: arXiv:2604.22583v2
Abstract: While Transformer models leverage multi-head attention to capture a wide variety of representations, conventional approaches activate every attention head for each input, irrespective of the task's complexity. This static resource allocation can lead to superfluous computational costs, particularly for coarse-grained tasks like text classification, where pertinent data is often distributed globally. To address this, we introduce BudgetFormer, a novel Transformer framework that dynamically assigns attention heads based on individual inputs. The architecture simultaneously learns a head budget and a relevance distribution to identify the most informative heads. We also propose a training methodology designed to strike an optimal balance between exploration and exploitation, thereby facilitating effective head selection. Empirical evaluations on text classification benchmarks demonstrate that BudgetFormer significantly lowers both memory consumption and floating-point operations (FLOPs) without compromising accuracy, often outperforming standard multi-head attention mechanisms. These findings underscore the efficacy of adaptive head allocation in enhancing both the efficiency and performance of Transformer models.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
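
The abstract describes selecting a per-input subset of attention heads from a learned head budget and a relevance distribution, but it gives no implementation details. The following is only a minimal sketch of that general idea, not BudgetFormer's actual method. The names `select_heads`, `relevance_logits`, and `budget_fraction` are hypothetical, and deterministic top-k selection is an assumption standing in for whatever selection and exploration scheme the paper uses.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_heads(relevance_logits, budget_fraction):
    """Pick the active heads for one input.

    relevance_logits: one score per head (hypothetical; in a real model
        these would be predicted from the input).
    budget_fraction: share of heads to keep, in [0, 1] (hypothetical;
        stands in for the learned head budget).
    Returns the sorted indices of the heads to compute.
    """
    num_heads = len(relevance_logits)
    # Convert the budget into a head count, always keeping at least one head.
    k = max(1, min(num_heads, math.ceil(budget_fraction * num_heads)))
    # Relevance distribution over heads.
    probs = softmax(relevance_logits)
    # Assumption: keep the k most relevant heads (plain top-k, no sampling).
    ranked = sorted(range(num_heads), key=lambda h: probs[h], reverse=True)
    return sorted(ranked[:k])


# Eight heads with a 25% budget: only the two highest-scoring heads are kept.
scores = [2.0, -1.0, 0.5, 0.1, -0.3, 1.2, -2.0, 0.0]
print(select_heads(scores, 0.25))  # [0, 5]
```

In this sketch, heads outside the returned list would simply be skipped for that input, which is where savings in memory and FLOPs would come from. How the budget and relevance scores are trained is not specified in the abstract and is left out here.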






