Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data
Abstract:
Estimating the causal effect of time-varying interventions on survival outcomes in large observational datasets is computationally demanding, and the difficulty grows when the events of interest are rare. G-formula approaches such as the iterative conditional expectation (ICE) estimator offer a rigorous foundation for causal inference in longitudinal settings, but they are often prohibitively expensive to compute, particularly when variance estimation relies on the bootstrap. In addition, the scarcity of outcomes at individual time points creates severe class imbalance, which frequently causes convergence failures and instability in logistic regression and similar models. To address these obstacles, we introduce a robust subsampling and reweighting strategy designed for longitudinal survival data. The approach is compatible with a range of existing estimators for this setting, including ICE. It substantially reduces computational cost while preserving consistency and improving stability for rare outcomes. We demonstrate the method in simulation studies and in a large-scale electronic health record (EHR) cohort analysis of social and behavioral determinants of health (SBDH) and suicide risk, illustrating its utility for modeling rare longitudinal events.
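The abstract does not specify the subsampling scheme, so the following is only a minimal sketch of the general idea under an assumed design: every event row is kept, each non-event row is kept with a fixed probability, and retained rows carry inverse-probability weights. The function names, the `keep_prob` parameter, and the simulated 1% event rate are all hypothetical and are not taken from the paper.

```python
import random


def subsample_with_weights(outcomes, keep_prob, rng):
    """Keep every event row; keep each non-event row with probability keep_prob.

    Returns (index, weight) pairs, where the weight is the inverse of the
    row's sampling probability. This is an assumed, generic scheme.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    sampled = []
    for i, y in enumerate(outcomes):
        if y == 1:
            sampled.append((i, 1.0))              # events are always retained
        elif rng.random() < keep_prob:
            sampled.append((i, 1.0 / keep_prob))  # non-events are up-weighted
    return sampled


def weighted_event_rate(outcomes, sampled):
    """Weighted event proportion computed from the subsample only."""
    total_weight = sum(w for _, w in sampled)
    event_weight = sum(w for i, w in sampled if outcomes[i] == 1)
    return event_weight / total_weight


rng = random.Random(0)
# Hypothetical rare-outcome data: roughly 1% of rows are events.
outcomes = [1 if rng.random() < 0.01 else 0 for _ in range(200_000)]

sampled = subsample_with_weights(outcomes, keep_prob=0.05, rng=rng)
full_rate = sum(outcomes) / len(outcomes)
est_rate = weighted_event_rate(outcomes, sampled)

print(f"rows kept: {len(sampled)} of {len(outcomes)}")
print(f"full-data event rate: {full_rate:.5f}")
print(f"reweighted subsample: {est_rate:.5f}")
```

In this toy setting the reweighted subsample recovers the full-data event rate closely while using a small fraction of the rows. The same weights could in principle be passed to a weighted regression fit at each time point, though how the paper integrates weights into ICE is not stated in the abstract.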
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





