arXiv

CL-DMDF: Dynamic Multimodal Data Fusion Model Based on Contrastive Learning

June 3, 2026 · Dong Li, Lingling Zhang, Binghao Han, Linlin Ding, Yue Kou

Title: CL-DMDF: A Contrastive Learning-Based Framework for Dynamic Multimodal Data Fusion

Abstract

Multimodal data fusion is a critical process for synthesizing and examining information derived from diverse sources, aiming to reveal hidden correlations and complementary patterns that ultimately improve data processing capabilities and decision-making outcomes. However, conventional approaches for structured multimodal inputs are often task-specific and rely on the assumption that all modalities are fully present. In practical scenarios, this assumption frequently fails, as modalities may be incomplete or absent due to various external factors. Furthermore, traditional models tend to prioritize local interactions within the context of missing modalities, thereby overlooking the broader, global complementary signals inherent in multimodal representations.

To address these challenges, we introduce the Dynamic Multimodal Data Fusion model based on Contrastive Learning (CL-DMDF). This framework features a novel attention mechanism that functions across both feature and modality dimensions, calculating robust attention scores that accurately reflect importance at each respective level. To bolster discriminative learning, CL-DMDF integrates an entity-centroid contrastive learning module, which generates positive samples based on entity centroids. Additionally, the model utilizes an adaptive fusion module designed to optimize both the efficiency and precision of dynamic fusion strategies. Comprehensive experiments performed on three distinct datasets validate the superior performance of CL-DMDF across a variety of multimodal fusion tasks.
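The abstract gives no formulas, so the following is only a minimal sketch of how two of the described ideas could look in code, not the authors' implementation. It assumes each modality is already encoded as a fixed-size vector; the function names, the weight arguments `w_feat` and `w_mod`, the masking scheme for absent modalities, and the cross-entropy form of the centroid loss are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def masked_modality_fusion(feats, present, w_feat, w_mod):
    """Fuse M modality embeddings of size D, skipping absent modalities.

    feats:   (M, D) array, one embedding per modality
    present: (M,) boolean mask, False where a modality is missing
    w_feat:  (D, D) hypothetical feature-scoring weights (assumption)
    w_mod:   (D,) hypothetical modality-scoring weights (assumption)
    """
    if not present.any():
        raise ValueError("at least one modality must be present")
    # Zero out absent rows so placeholder values (e.g. NaN) cannot leak in.
    feats = np.where(present[:, None], feats, 0.0)
    # Feature-level attention: a softmax over the D features of each modality.
    feat_attn = softmax(feats @ w_feat, axis=1)
    weighted = feats * feat_attn
    # Modality-level attention: one score per modality, absent ones masked out.
    scores = np.where(present, weighted @ w_mod, -np.inf)
    mod_attn = softmax(scores, axis=0)
    return mod_attn @ weighted, mod_attn

def centroid_contrastive_loss(z, entity_ids, temperature=0.1):
    """Pull each embedding toward its own entity centroid (the positive)
    and away from the other centroids, via a softmax cross-entropy."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    ids, inverse = np.unique(entity_ids, return_inverse=True)
    inverse = inverse.reshape(-1)
    centroids = np.stack([z[inverse == k].mean(axis=0) for k in range(len(ids))])
    centroids = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-12)
    logits = z @ centroids.T / temperature
    # Log-softmax over centroids, computed stably.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return -log_prob[np.arange(len(z)), inverse].mean()
```

In this sketch, an absent modality receives an attention weight of exactly zero and the remaining weights still sum to one, which is one simple way to let the fusion adapt to whichever modalities are available for a given sample.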
