arXiv

Generative Augmented Inference

June 4, 2026 · Cheng Lu, Mengxin Wang, Dennis J. Zhang, Heng Zhang

Abstract:

While large language models make AI-generated annotations cheap to produce, using them for robust causal inference is difficult. Naively pooling human and AI data introduces bias, and existing techniques such as Prediction-Powered Inference (PPI; Angelopoulos et al., 2023a) assume that AI outputs serve as direct proxies for true labels, a condition frequently unmet in practical applications involving generative models. To address this, we introduce Generative Augmented Inference (GAI), a novel framework that treats AI outputs not as surrogate labels but as general, potentially high-dimensional informative features used to predict human annotations. By employing nonparametric methods to flexibly model this relationship, GAI ensures consistent estimation and valid statistical inference when merging human and AI datasets. We prove the asymptotic normality of our approach and show that, under random labeling, GAI is strictly more asymptotically efficient than estimators based solely on human data, provided the AI outputs contain information about the true labels. Our empirical evaluations across various real-world datasets confirm that GAI markedly lowers estimation error and improves the quality of confidence intervals, outperforming both human-only baselines and PPI-based methods across diverse generative data sources.
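The general idea of using AI outputs as features rather than as labels can be illustrated with a small sketch. This is not the authors' estimator: the abstract does not specify one, so the code below assumes a simple target (the mean of the human label), swaps in k-nearest-neighbour regression as the nonparametric model, and uses a cross-fitted prediction-plus-residual correction. The function names `knn_predict` and `augmented_mean`, the plug-in standard error, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k=10):
    """k-NN regression: average the labels of the k nearest training points."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    k = min(k, len(x_train))
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def augmented_mean(x_lab, y_lab, x_unlab, n_folds=5, k=10, seed=0):
    """Estimate E[Y] from a few human labels plus many AI-only rows.

    Returns (estimate, rough plug-in standard error).
    """
    rng = np.random.default_rng(seed)
    n, n_unlab = len(y_lab), len(x_unlab)
    pred_lab = np.empty(n)
    pred_unlab = np.zeros(n_unlab)
    # Cross-fitting: each labeled row is predicted by a model that never saw it.
    for held_out in np.array_split(rng.permutation(n), n_folds):
        train = np.setdiff1d(np.arange(n), held_out)
        pred_lab[held_out] = knn_predict(x_lab[train], y_lab[train], x_lab[held_out], k)
        pred_unlab += knn_predict(x_lab[train], y_lab[train], x_unlab, k) / n_folds

    # Mean prediction over all rows, corrected by the mean labeled residual.
    pooled_pred_mean = (pred_lab.sum() + pred_unlab.sum()) / (n + n_unlab)
    estimate = pooled_pred_mean + (y_lab - pred_lab).mean()

    # Plug-in variance, treating the fitted predictor as fixed (an assumption).
    w = n_unlab / (n + n_unlab)
    var = (np.var(y_lab - w * pred_lab, ddof=1) / n
           + w**2 * np.var(pred_unlab, ddof=1) / n_unlab)
    return estimate, np.sqrt(var)


# Synthetic demo: every number here is simulated, not taken from the paper.
rng = np.random.default_rng(1)
x = rng.normal(size=(5300, 2))  # stand-in for AI-generated features
y = np.sin(x[:, 0]) + x[:, 1] ** 2 + rng.normal(scale=0.3, size=5300)
x_lab, y_lab, x_unlab = x[:300], y[:300], x[300:]  # only 300 rows get human labels
estimate, se = augmented_mean(x_lab, y_lab, x_unlab)
human_se = y_lab.std(ddof=1) / np.sqrt(len(y_lab))
print(f"augmented: {estimate:.3f} (se {se:.3f})")
print(f"human-only: {y_lab.mean():.3f} (se {human_se:.3f})")
```

The residual correction is what keeps the estimate anchored to the human labels: if the predictor is poor, the residual term absorbs its error, and if it is informative, the large unlabeled pool reduces variance. The features never have to resemble the label itself, which is the distinction the abstract draws against surrogate-label approaches.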

Source: arXiv Generated at: 2026-06-04 00:00:00 UTC