Rethinking Incompleteness: Formalizing Protocol Divergence and Train-Once Learning for Robust IMVC

Title: Reevaluating Incompleteness: Formalizing Protocol Divergence and Train-Once Learning for Resilient IMVC

Original (arXiv:2606.04857v1): Standard IMVC evaluation retrains separate models for different missing-data configurations. We show that this paradigm obscures a fundamental vulnerability: missing rate alone is insufficient to characterize data incompleteness. Specifically, we show that protocols with identical nominal missing rates can differ by up to $50\times$ in their proportion of fully observed samples, inducing drastically different learning regimes. We formalize this phenomenon as incompleteness divergence, providing measures that capture structural disparities across missing-data protocols. We further prove that for a broad class of reconstruction-based objectives, learning becomes structurally ill-posed when the proportion of complete samples falls below a critical threshold, leading to near-random performance. To bypass this theoretical bound, we propose CRAFT (Complete-data Robust Attention-masked Fusion Transformer). CRAFT shifts the burden of robustness from the loss function to the architecture via two key properties: (i) per-sample independence, which removes reliance on complete-sample co-occurrence, and (ii) mask-aware variable-length fusion, which aggregates only observed views through attention masking. This design allows a single model, trained once on complete data, to generalize to diverse missing patterns at inference time without retraining. Extensive experiments on seven benchmarks show that CRAFT matches or outperforms per-configuration baselines while reducing training overhead by $8.8\times$, demonstrating that robustness to missing data can be achieved as an inherent architectural property. Code (CRAFT) and our imvc-audit toolkit are available at https://anonymous.4open.science/r/CRAFT-BF80/ and https://anonymous.4open.science/r/imvc-audit-8263/.
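The claim that equal nominal missing rates can hide very different fractions of fully observed samples can be illustrated with a small simulation. The sketch below is not the paper's protocol definitions or its imvc-audit toolkit; the two protocols and the names `entrywise_mask`, `concentrated_mask`, and `summarize` are hypothetical and chosen only for illustration. Both protocols drop exactly the same number of (sample, view) entries, but one scatters the drops uniformly while the other packs them into as few samples as possible. The scattered protocol does not guarantee that every sample keeps at least one view, which a real IMVC protocol would normally enforce.

```python
import numpy as np


def entrywise_mask(n, v, rate, rng):
    """Hypothetical protocol A: drop round(rate*n*v) entries uniformly at random."""
    mask = np.ones(n * v, dtype=bool)
    n_drop = int(round(rate * n * v))
    mask[rng.choice(n * v, size=n_drop, replace=False)] = False
    return mask.reshape(n, v)


def concentrated_mask(n, v, rate, rng):
    """Hypothetical protocol B: same number of dropped entries, packed into
    as few samples as possible (each affected sample keeps at least one view)."""
    n_drop = int(round(rate * n * v))
    full, rem = divmod(n_drop, v - 1)          # samples losing v-1 views, leftover drops
    n_hit = full + (1 if rem else 0)
    if n_hit > n:
        raise ValueError("rate too high to keep one view per sample")
    mask = np.ones((n, v), dtype=bool)
    hit = rng.choice(n, size=n_hit, replace=False)
    for j, i in enumerate(hit):
        k = v - 1 if j < full else rem         # number of views to drop for this sample
        mask[i, rng.choice(v, size=k, replace=False)] = False
    return mask


def summarize(mask):
    """Entry-level missing rate and fraction of samples with all views observed."""
    return {
        "missing_rate": float(1.0 - mask.mean()),
        "complete_fraction": float(mask.all(axis=1).mean()),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = summarize(entrywise_mask(1000, 6, 0.5, rng))
    b = summarize(concentrated_mask(1000, 6, 0.5, rng))
    print("entrywise:   ", a)
    print("concentrated:", b)
```

With 1000 samples, 6 views, and a 0.5 missing rate, the concentrated protocol leaves 40% of samples complete by construction, while the scattered protocol leaves roughly (1 - 0.5)^6, or about 1.6%, complete in expectation. The exact gap depends on the number of views and the rate, so this toy ratio should not be read as the paper's $50\times$ figure.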

Rewrite: Standard evaluation of Incomplete Multi-View Clustering (IMVC) retrains a separate model for each missing-data configuration. This study shows that the practice masks a critical weakness: the missing rate alone does not adequately characterize data incompleteness. Protocols with the same nominal missing rate can differ by up to $50\times$ in the fraction of fully observed samples, which leads to drastically different learning regimes. The authors formalize this effect as "incompleteness divergence" and provide measures that quantify structural differences among missing-data protocols. They further prove that, for a broad class of reconstruction-based objectives, learning becomes structurally ill-posed once the share of complete samples falls below a critical threshold, resulting in near-random performance. To get around this limitation, they propose CRAFT (Complete-data Robust Attention-masked Fusion Transformer). CRAFT moves the burden of robustness from the loss function to the architecture through two properties: (i) per-sample independence, which removes the reliance on complete-sample co-occurrence, and (ii) mask-aware variable-length fusion, which uses attention masking to aggregate only the observed views. This design lets a single model, trained once on complete data, generalize to diverse missing patterns at inference time without retraining. Experiments on seven benchmarks show that CRAFT matches or outperforms baselines trained per configuration while reducing training overhead by $8.8\times$. These results indicate that robustness to missing data can be achieved as an inherent architectural property.
The source code for CRAFT and the imvc-audit toolkit can be accessed at https://anonymous.4open.science/r/CRAFT-BF80/ and https://anonymous.4open.science/r/imvc-audit-8263/.
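The idea of aggregating only observed views through attention masking can be sketched in a few lines of NumPy. This is not CRAFT's architecture, which the abstract does not specify in detail; it is a minimal single-query attention pooling, and the function name `masked_attention_fusion`, the fixed `query` vector, and the array shapes are assumptions made for illustration. Missing views receive a score of negative infinity before the softmax, so their weight is exactly zero, and each sample is fused using only its own views.

```python
import numpy as np


def masked_attention_fusion(views, observed, query):
    """Fuse per-view embeddings with attention restricted to observed views.

    views:    (n, v, d) float array of per-view embeddings (hypothetical layout)
    observed: (n, v) bool array, True where the view is available
    query:    (d,) vector standing in for a learned query (an assumption)
    Returns the fused (n, d) embeddings and the (n, v) attention weights.
    """
    if not observed.any(axis=1).all():
        raise ValueError("every sample needs at least one observed view")
    d = views.shape[-1]
    # Zero out missing slots so placeholder values (even NaN) cannot leak in.
    clean = np.where(observed[..., None], views, 0.0)
    scores = clean @ query / np.sqrt(d)              # (n, v) scaled dot products
    scores = np.where(observed, scores, -np.inf)     # mask missing views
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                         # exp(-inf) == 0 for masked views
    weights = weights / weights.sum(axis=1, keepdims=True)
    fused = np.einsum("nv,nvd->nd", weights, clean)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = rng.normal(size=(4, 3, 5))
    observed = np.array(
        [[1, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=bool
    )
    query = rng.normal(size=5)
    fused, weights = masked_attention_fusion(views, observed, query)
    print(fused.shape)
    print(weights.round(3))
```

Because every operation acts along the view axis of a single sample, fusing a sample alone gives the same result as fusing it inside a batch, which is one simple reading of the per-sample independence property named in the abstract.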


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
