Learning What Not to Impute: An Uncertainty-Aware Diffusion Framework for Meaningful Missingness
Title: Identifying Meaningful Absences: An Uncertainty-Driven Diffusion Approach for Selective Imputation
Abstract:
Handling missing values is a fundamental problem in machine learning, yet conventional imputation methods typically assume that every missing entry is an unobserved instance of an ordinary value that should be recovered. This assumption overlooks the fact that missingness in real-world datasets often arises from two fundamentally different causes: some entries are genuinely absent and their absence is semantically meaningful (meaningfully missing), while others are lost to observational limitations and should be reconstructed. To address this, we formulate the problem as selective imputation, which jointly decides which gaps should remain empty and which should be filled. We introduce Diff-Joint, a diffusion-based framework that jointly models tabular data and a latent mask encoding the missingness pattern. The algorithm iteratively refines both the imputed values and the classification of each missing entry by alternating between conditional sampling and uncertainty-aware aggregation. Experiments on synthetic and real-world benchmarks show that Diff-Joint distinguishes meaningfully missing entries, achieves competitive imputation accuracy, and improves performance on downstream tasks.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
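The abstract describes Diff-Joint only at a high level, so what follows is not the authors' implementation. It is a minimal, stdlib-only Python sketch of the alternating loop under stated assumptions. A hypothetical `sampler` callable stands in for the trained diffusion model and returns one joint draw of a value and a "keep missing" flag for every cell. Aggregation averages the flags into a keep probability and leaves a cell empty when that probability reaches `keep_threshold`. Otherwise it imputes the sample mean, and it commits that imputation as conditioning for the next round only when the sample standard deviation is at most `max_std`. The names `selective_impute`, `toy_sampler`, `keep_threshold`, and `max_std`, the thresholds, and the employed/salary toy data are all illustrative assumptions.

```python
import random
import statistics
from typing import Callable, Dict, List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # None marks a missing entry
# Hypothetical interface (assumption): one joint draw of (values, keep-missing
# flags) for every cell, conditioned on the current partially filled matrix.
Sampler = Callable[[Matrix, random.Random], Tuple[List[List[float]], List[List[bool]]]]


def selective_impute(data: Matrix, sampler: Sampler, n_samples: int = 50,
                     n_rounds: int = 3, keep_threshold: float = 0.5,
                     max_std: float = 2.0, seed: int = 0):
    """Alternate conditional sampling with uncertainty-aware aggregation.

    Returns the output matrix (None = deliberately left missing) and, for each
    originally missing cell, (keep probability, sample std or None if kept).
    """
    if not 0.0 < keep_threshold <= 1.0:
        raise ValueError("keep_threshold must lie in (0, 1]")
    rng = random.Random(seed)
    current = [list(row) for row in data]  # conditioning matrix for the sampler
    result = [list(row) for row in data]
    info: Dict[Tuple[int, int], Tuple[float, Optional[float]]] = {}
    open_cells = [(i, j) for i, row in enumerate(data)
                  for j, v in enumerate(row) if v is None]
    for _ in range(n_rounds):
        if not open_cells:
            break
        # Sampling step: several joint draws given the current conditioning.
        draws = [sampler(current, rng) for _ in range(n_samples)]
        still_open = []
        for i, j in open_cells:  # aggregation step
            p_keep = sum(mask[i][j] for _, mask in draws) / n_samples
            if p_keep >= keep_threshold:
                result[i][j] = None  # judged meaningfully missing; re-checked next round
                info[(i, j)] = (p_keep, None)
                still_open.append((i, j))
                continue
            # p_keep < 1 here, so at least one draw proposes a value.
            samples = [vals[i][j] for vals, mask in draws if not mask[i][j]]
            mean, std = statistics.fmean(samples), statistics.pstdev(samples)
            result[i][j] = mean
            info[(i, j)] = (p_keep, std)
            if std <= max_std:
                current[i][j] = mean  # confident: commit as conditioning
            else:
                still_open.append((i, j))  # too uncertain: resample next round
        open_cells = still_open
    return result, info


def toy_sampler(current: Matrix, rng: random.Random):
    """Hypothetical stand-in for a trained diffusion model (not Diff-Joint).

    Column 0 = employed flag (assumed fully observed), column 1 = salary.
    A missing salary is flagged as meaningfully missing with probability 0.95
    for non-employed rows and 0.05 otherwise; proposed values are drawn around
    the mean of the currently available salaries.
    """
    salary_mean = statistics.fmean(r[1] for r in current if r[1] is not None)
    vals, mask = [], []
    for row in current:
        v_row, m_row = [], []
        for v in row:
            if v is not None:  # observed or committed: pass through unchanged
                v_row.append(v)
                m_row.append(False)
            else:
                p_keep = 0.95 if row[0] == 0.0 else 0.05
                v_row.append(salary_mean + rng.gauss(0.0, 1.0))
                m_row.append(rng.random() < p_keep)
        vals.append(v_row)
        mask.append(m_row)
    return vals, mask


data = [
    [1.0, 50.0],
    [1.0, 60.0],
    [1.0, None],  # employed, salary lost: should be imputed
    [0.0, None],  # not employed, salary genuinely absent: should stay missing
]
result, info = selective_impute(data, toy_sampler)
print(result[2][1], result[3][1], info)
```

With these toy settings the employed row's salary should be imputed near the observed mean while the non-employed row's salary stays empty, which is the selective-imputation behavior described above. Committing only low-variance imputations is one plausible reading of "uncertainty-aware aggregation"; a real system would replace `toy_sampler` with a learned conditional sampler over both data and mask.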




