arXiv

Boosting Multimodal Federated Learning via Chained Modality Optimization

June 2, 2026 · Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang, Changsheng Xu

Title: Enhancing Multimodal Federated Learning Through Sequential Modality Optimization

Abstract: Multimodal Federated Learning (MMFL) facilitates collaborative, privacy-preserving model training among decentralized clients, accommodating heterogeneous data distributions and varying modality availability. Despite its advantages, current MMFL approaches typically treat multimodal training as a simultaneous joint optimization task. This methodology often neglects a critical limitation known as modality competition, wherein stronger modalities overshadow weaker ones, resulting in inferior global model performance. To overcome this challenge, we introduce FedMChain, a novel framework designed to balance MMFL by organizing training into a sequence of modality-specific phases. By allocating dedicated local optimization intervals for each modality, this phased approach reduces competitive interference and enhances cross-modal synergy through an error-compensated regularizer. At the server level, we implement a sparse sign-guided aggregation mechanism. This strategy utilizes directional sign alignment to ensure robust intra-modality aggregation, prevents the degradation associated with destructive averaging, and enables reduced synchronization frequency to lower communication costs. Comprehensive evaluations across various multimodal benchmarks reveal that FedMChain consistently delivers superior predictive accuracy compared to baseline methods, all while demanding less frequent communication.
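Illustration: The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of what a sparse, sign-guided aggregation step could look like for a single modality. Everything in it is an assumption made for illustration, not the FedMChain algorithm: the function name `sign_guided_aggregate`, the `keep_ratio` parameter, top-k magnitude sparsification, and the choice to average only those entries whose sign matches the dominant direction of each coordinate.

```python
from typing import List


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_guided_aggregate(updates: List[List[float]], keep_ratio: float = 0.5) -> List[float]:
    """Aggregate one modality's client updates coordinate by coordinate.

    Assumed rule (hypothetical, not taken from the paper):
      1. Sparsify each client update, keeping its largest-magnitude entries.
      2. Per coordinate, use the sign of the summed sparse updates as the
         dominant direction.
      3. Average only the nonzero entries that agree with that direction.
    """
    if not updates:
        raise ValueError("need at least one client update")
    dim = len(updates[0])
    k = max(1, int(round(keep_ratio * dim)))

    sparse = []
    for u in updates:
        # Indices of the k largest-magnitude entries of this client's update.
        top = set(sorted(range(dim), key=lambda i: abs(u[i]), reverse=True)[:k])
        sparse.append([u[i] if i in top else 0.0 for i in range(dim)])

    aggregated = []
    for j in range(dim):
        column = [s[j] for s in sparse]
        direction = sign(sum(column))
        # Entries opposing the dominant direction are dropped rather than
        # averaged in, so they cannot cancel the majority update.
        agreeing = [v for v in column if v != 0.0 and sign(v) == direction]
        aggregated.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return aggregated


# Three hypothetical clients, four parameters each.
client_updates = [
    [1.0, -2.0, 0.1, 0.0],
    [3.0, 2.5, -0.2, 0.0],
    [-0.5, -3.0, 0.3, 0.0],
]
print(sign_guided_aggregate(client_updates, keep_ratio=0.5))  # [2.0, -2.5, 0.0, 0.0]
```

In this toy input, a plain mean of the second coordinate would be about -0.83 because one client pushes in the opposite direction. The sign-filtered average is -2.5, which is one way to picture how sign alignment could avoid the destructive averaging the abstract mentions.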
