Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification
Title: A Parameter-Efficient Dual-Encoder Framework Using Differentiable Choquet Integral Fusion for Underwater Acoustic Classification
Abstract:
Despite its broad utility in oceanic operations, underwater acoustic classification is hindered by the growing complexity of acoustic environments. The field currently relies primarily on waveform and spectrogram representations for feature extraction. While spectrograms effectively capture harmonic dependencies, they are a reduced representation of the signal and may discard acoustic details that are crucial for accurate discrimination. Conversely, although raw waveforms retain phase information and therefore characterize the signal completely, their inherent noise and complexity make them difficult for models to process directly.
To address these issues, this study introduces a dual-encoder neural architecture that processes acoustic waveforms and spectrograms simultaneously. Pre-trained backbones equipped with parameter-efficient fine-tuning modules enable effective domain adaptation. The adapted branches are integrated through a novel fuzzy aggregation mechanism based on the differentiable Choquet integral, which learns to balance temporal and spectral information. This fusion approach enhances classification precision and also improves interpretability: analysis of the learned fuzzy measures reveals how the network’s reliance on each representation varies across classes. Furthermore, the proposed gating mechanism dynamically redirects attention toward the representation least affected by potential asymmetric channel distortions, thereby addressing the non-stationary challenges inherent to underwater settings.
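As an illustration of the fusion idea, the following is a minimal sketch of the discrete Choquet integral for two sources (waveform and spectrogram), written in NumPy. It is not the authors' implementation: the function name choquet_two_source, the per-class parameterization of the fuzzy measure, and all numeric values are assumptions made for this example. With two sources, the fuzzy measure is fully described by the values assigned to the two singletons, since the empty set has measure 0 and the full set has measure 1.

```python
import numpy as np

def choquet_two_source(h_wave, h_spec, g_wave, g_spec):
    """Discrete Choquet integral over two sources.

    h_wave, h_spec: per-class scores from the waveform and spectrogram branches.
    g_wave, g_spec: fuzzy-measure values of the singletons {waveform} and
    {spectrogram}, each in [0, 1]. The empty set has measure 0 and the full
    set has measure 1. (Hypothetical parameterization, not from the paper.)
    """
    h_wave = np.asarray(h_wave, dtype=float)
    h_spec = np.asarray(h_spec, dtype=float)
    high = np.maximum(h_wave, h_spec)
    low = np.minimum(h_wave, h_spec)
    # Measure of whichever source gives the larger score for each class.
    g_top = np.where(h_wave >= h_spec, g_wave, g_spec)
    # Sorted form: low * g(full set) + (high - low) * g({top source}).
    return low + (high - low) * g_top

# Illustrative values only: two classes, one score per branch and class.
wave_scores = np.array([0.8, 0.1])
spec_scores = np.array([0.2, 0.6])
g_wave = np.array([0.3, 0.3])  # assumed singleton measures per class
g_spec = np.array([0.7, 0.7])
fused = choquet_two_source(wave_scores, spec_scores, g_wave, g_spec)
```

When the two singleton measures sum to 1, the measure is additive and the integral reduces to a weighted average of the two scores. Setting both to 1 gives the maximum, and setting both to 0 gives the minimum. The output is piecewise linear in the scores and linear in the measure values, so gradients exist almost everywhere, which is what allows such a layer to be trained by backpropagation in an automatic-differentiation framework.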
Experiments on the DeepShip and ShipsEar datasets indicate that the proposed architecture achieves better classification performance than independent single-encoder baselines. Crucially, it does so while limiting the number of trainable parameters. This parameter efficiency reduces the likelihood of overfitting on scarce acoustic data and lowers the computational burden typically associated with fully fine-tuning foundation models.
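The abstract does not name the specific parameter-efficient fine-tuning module. To make the parameter saving concrete, the sketch below assumes a low-rank adapter (in the style of LoRA) attached to one frozen linear layer. The layer sizes, the rank, and the helper names low_rank_adapter_forward and trainable_fraction are hypothetical and chosen only for illustration.

```python
import numpy as np

def low_rank_adapter_forward(x, w_frozen, a, b):
    """Compute y = x W^T + (x A^T) B^T, where W is frozen and only A, B train."""
    return x @ w_frozen.T + (x @ a.T) @ b.T

def trainable_fraction(d_in, d_out, rank):
    """Trainable adapter parameters relative to the full weight matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)

rng = np.random.default_rng(0)
d_in, d_out, rank = 768, 768, 8          # illustrative sizes, not from the paper
w_frozen = rng.standard_normal((d_out, d_in))
a = rng.standard_normal((rank, d_in)) * 0.01
b = np.zeros((d_out, rank))              # zero init: adapter starts as a no-op
x = rng.standard_normal((4, d_in))

y = low_rank_adapter_forward(x, w_frozen, a, b)
fraction = trainable_fraction(d_in, d_out, rank)
```

Under these assumed sizes, the adapter trains 8 × (768 + 768) = 12,288 parameters against 589,824 in the frozen weight matrix, about 2% of the layer. Because the second factor starts at zero, the adapted layer initially reproduces the pre-trained layer exactly.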
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC





