Rethinking Incompleteness: Formalizing Protocol Divergence and Train-Once Learning for Robust IMVC
Title: Reevaluating Incompleteness: Formalizing Protocol Divergence and Train-Once Learning for Resilient IMVC
Original: arXiv:2606.04857v1 Announce Type: new Abstract: Standard IMVC evaluation retrains separate models for different missing-data configurations. We show that this paradigm obscures a fundamental vulnerability: missing rate alone is insufficient to characterize data incompleteness. Specifically, we show that protocols with identical nominal missing rates can differ by up to $50\times$ in their proportion of fully observed samples, inducing drastically different learning regimes. We formalize this phenomenon as incompleteness divergence, providing measures that capture structural disparities across missing-data protocols. We further prove that for a broad class of reconstruction-based objectives, learning becomes structurally ill-posed when the proportion of complete samples falls below a critical threshold, leading to near-random performance. To bypass this theoretical bound, we propose CRAFT (Complete-data Robust Attention-masked Fusion Transformer). CRAFT shifts the burden of robustness from the loss function to the architecture via two key properties: (i) per-sample independence, which removes reliance on complete-sample co-occurrence, and (ii) mask-aware variable-length fusion, which aggregates only observed views through attention masking. This design allows a single model, trained once on complete data, to generalize to diverse missing patterns at inference time without retraining. Extensive experiments on seven benchmarks show that CRAFT matches or outperforms per-configuration baselines while reducing training overhead by $8.8\times$, demonstrating that robustness to missing data can be achieved as an inherent architectural property. Code (CRAFT) and our imvc-audit toolkit are available at https://anonymous.4open.science/r/CRAFT-BF80/ and https://anonymous.4open.science/r/imvc-audit-8263/.
Rewrite: Incomplete Multi-View Clustering (IMVC) is usually evaluated by retraining a separate model for each missing-data configuration. This study shows that the practice masks a critical weakness: the missing rate alone does not characterize data incompleteness. Protocols that share the same nominal missing rate can differ by up to $50\times$ in the fraction of fully observed samples, which induces drastically different learning regimes. We formalize this effect as "incompleteness divergence" and provide measures that quantify structural differences among missing-data protocols. We further prove that, for a broad class of reconstruction-based objectives, learning becomes structurally ill-posed once the share of complete samples drops below a critical threshold, resulting in near-random performance. To bypass this bound, we propose CRAFT (Complete-data Robust Attention-masked Fusion Transformer). CRAFT shifts the burden of robustness from the loss function to the architecture through two properties: (i) per-sample independence, which removes the reliance on complete-sample co-occurrence, and (ii) mask-aware variable-length fusion, which uses attention masking to aggregate only the observed views. This design lets a single model, trained once on complete data, generalize to diverse missing patterns at inference time without retraining. Experiments on seven benchmarks show that CRAFT matches or outperforms per-configuration baselines while reducing training overhead by $8.8\times$, suggesting that robustness to missing data can be built into the architecture itself.
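The claim that one missing rate can hide very different amounts of complete data can be illustrated with a minimal Python sketch. It builds two view-availability masks with the same nominal missing rate and compares their fractions of fully observed samples. Both protocols (`entrywise_mask`, `concentrated_mask`), the helper names, and the sizes used are illustrative assumptions, not the protocols or divergence measures defined in the paper or in the imvc-audit toolkit.

```python
import random


def entrywise_mask(n, v, rate, seed=0):
    """Hypothetical protocol A: drop view instances uniformly at random,
    never removing the last remaining view of a sample."""
    rng = random.Random(seed)
    mask = [[1] * v for _ in range(n)]
    entries = [(i, j) for i in range(n) for j in range(v)]
    rng.shuffle(entries)
    to_drop = round(rate * n * v)
    for i, j in entries:
        if to_drop == 0:
            break
        if sum(mask[i]) > 1:  # keep at least one view per sample
            mask[i][j] = 0
            to_drop -= 1
    if to_drop:
        raise ValueError("rate too high to keep one view per sample")
    return mask


def concentrated_mask(n, v, rate, seed=0):
    """Hypothetical protocol B: spend the same budget of missing entries
    on as few samples as possible (each affected sample keeps one view)."""
    rng = random.Random(seed)
    mask = [[1] * v for _ in range(n)]
    to_drop = round(rate * n * v)
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        if to_drop == 0:
            break
        k = min(v - 1, to_drop)
        for j in rng.sample(range(v), k):
            mask[i][j] = 0
        to_drop -= k
    if to_drop:
        raise ValueError("rate too high to keep one view per sample")
    return mask


def missing_rate(mask):
    """Share of (sample, view) entries that are missing."""
    total = sum(len(row) for row in mask)
    return 1 - sum(map(sum, mask)) / total


def complete_fraction(mask):
    """Share of samples for which every view is observed."""
    return sum(all(row) for row in mask) / len(mask)


# Assumed toy setting: 1000 samples, 4 views, nominal missing rate 0.5.
entry = entrywise_mask(1000, 4, 0.5)
conc = concentrated_mask(1000, 4, 0.5)
for name, m in (("entry-wise", entry), ("concentrated", conc)):
    print(name, missing_rate(m), complete_fraction(m))
```

In this toy setting both masks report the same missing rate, while the concentrated protocol retains a third of the samples as complete and the entry-wise protocol retains considerably fewer. The size of the gap depends on the number of views and the rate, so it should not be read as a reproduction of the paper's $50\times$ figure.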
The source code for CRAFT and the imvc-audit toolkit can be accessed at https://anonymous.4open.science/r/CRAFT-BF80/ and https://anonymous.4open.science/r/imvc-audit-8263/.
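The two architectural properties named in the abstract, per-sample independence and mask-aware fusion over observed views only, can be sketched in a few lines of plain Python. This is a simplified single-query attention pooling, not CRAFT's Transformer or its released code; the function name `masked_attention_fusion`, the fixed `query` vector, and the toy embeddings are assumptions made for illustration.

```python
import math


def masked_attention_fusion(views, observed, query):
    """Fuse the per-view embeddings of ONE sample, attending only to observed views.

    views:    list of V embedding vectors (missing slots may hold any placeholder)
    observed: list of V booleans, True where the view is available
    query:    vector of the same dimension as the embeddings (assumed fixed here)
    """
    if not any(observed):
        raise ValueError("at least one view must be observed")
    d = len(query)
    # Scaled dot-product scores; missing views get -inf so softmax gives them weight 0.
    scores = [
        sum(q * x for q, x in zip(query, view)) / math.sqrt(d) if seen else -math.inf
        for view, seen in zip(views, observed)
    ]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    weights = [w / total for w in weights]
    # Aggregate observed views only, so placeholder content is never read.
    return [
        sum(w * view[k] for w, view, seen in zip(weights, views, observed) if seen)
        for k in range(d)
    ]


# Toy sample with three views of dimension 2; the third view is treated as missing.
views = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
query = [1.0, 1.0]
fused_two = masked_attention_fusion(views, [True, True, False], query)
fused_one = masked_attention_fusion(views, [True, False, False], query)
print(fused_two, fused_one)
```

Because the function takes a single sample and its own mask, it does not depend on any other sample being complete, and the number of attended views can vary from sample to sample. Replacing the content of a masked view leaves the fused output unchanged, which is the behavior the abstract attributes to attention masking.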
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC






