Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation
Title: Investigating Stream Collapse in Hyper-Connections: Diagnosis and Mitigation Strategies
Abstract: Hyper-Connections (HC) depart from the standard Transformer architecture by replacing the single residual stream with multiple distinct streams, which introduces a permutation symmetry across stream indices. This study examines how that symmetry is resolved in practice: do streams reach balanced specialization, or does a single stream dominate usage? Using fine-grained diagnostics on HC-based language models, we map how multi-stream representations are actually utilized. Our analysis shows that, after an initial seeding phase, residual mixing tends to stay near the identity matrix, which restricts a core HC mechanism for exchanging information between streams. We further observe that both signals and interpretable features concentrate in a dominant stream, so the nominally multi-stream residual connection underuses its capacity and behaves more like a single-stream pathway. Finally, we demonstrate that breaking symmetry at stream initialization mitigates this dominance and improves performance across various mHC variants. Our code is publicly available.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
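
The abstract names two observations: residual mixing that stays near the identity matrix, and signal that concentrates in one dominant stream. The sketch below shows one way such quantities could be measured on toy data. It is not the authors' diagnostic code; the function names `offdiag_mass` and `stream_norm_share`, the tensor layout, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np


def offdiag_mass(mix):
    """Fraction of a square mixing matrix's absolute weight that lies off the diagonal.

    A value near 0 means the matrix is close to diagonal, so little
    information is exchanged between streams. (Hypothetical helper.)
    """
    mix = np.asarray(mix, dtype=float)
    total = np.abs(mix).sum()
    diag = np.abs(np.diag(mix)).sum()
    return float((total - diag) / total)


def stream_norm_share(streams):
    """Share of total squared norm carried by each stream.

    Assumes `streams` has shape (n_streams, n_tokens, d_model).
    (Hypothetical helper and layout.)
    """
    streams = np.asarray(streams, dtype=float)
    energy = (streams ** 2).sum(axis=(1, 2))
    return energy / energy.sum()


# Toy data only: a mixing matrix perturbed slightly away from identity,
# and four streams where stream 0 is artificially scaled up.
rng = np.random.default_rng(0)
n_streams = 4
near_identity = np.eye(n_streams) + 0.01 * rng.standard_normal((n_streams, n_streams))
streams = rng.standard_normal((n_streams, 8, 16))
streams[0] *= 10.0

share = stream_norm_share(streams)
print("off-diagonal mass:", offdiag_mass(near_identity))
print("per-stream norm share:", share)
```

In this toy setup, a small off-diagonal mass together with a norm share concentrated on one index would correspond to the single-stream-like behavior the abstract describes. Real measurements would need the model's actual mixing parameters and activations.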



