On the Recoverability of Causal Relations from Bulk Gene Expression Data
Title: Reassessing the Feasibility of Inferring Causal Links from Bulk Gene Expression Profiles
Abstract
Despite the rise of single-cell technologies, bulk gene expression profiling remains valuable. By pooling RNA across the cells of a biological specimen, it typically yields data that are less noisy, more sensitive, and more economical to obtain than data from single-cell assays. Consequently, a growing number of computational approaches aim to reconstruct causal gene interactions from bulk expression data. However, aggregation is a lossy, non-invertible simplification of the underlying cellular dynamics, and whether causal relationships can be reliably recovered from such coarse-grained data remains largely unresolved.
To address this gap, we define recoverability under aggregation in two senses: functional-form consistency and conditional-independence consistency. We derive necessary and sufficient conditions for each, showing that recoverability holds only when linear aggregations (such as sums or means) are paired with affine structural equations. To assess whether these conditions hold in practice, we analyzed four single-cell and four bulk gene expression datasets. In both data types, the estimated pairwise regulatory functions exhibit non-linear behavior, which provides little empirical support for the linearity assumptions that recoverability requires. These findings suggest that causal relations inferred from aggregated bulk expression data should be treated with caution unless they are backed by strong additional assumptions.
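The role of the affine condition can be seen in a toy calculation. The sketch below is an illustration only, not the paper's formal definition of functional-form consistency: the functions `affine` and `quadratic`, the helper `bulk`, and the two two-cell samples are all hypothetical choices made for this example. It takes two samples whose summed regulator expression is identical and checks whether the summed target expression is also identical.

```python
import numpy as np

# Hypothetical cell-level structural equations (noise omitted for clarity).
def affine(x):
    return 2.0 * x + 1.0

def quadratic(x):
    return x ** 2

def bulk(values):
    """Linear aggregation: sum expression over the cells of one sample."""
    return float(np.sum(values))

# Two samples with different per-cell regulator levels but the same bulk total.
sample_a = np.array([1.0, 3.0])
sample_b = np.array([2.0, 2.0])
bulk_x_a, bulk_x_b = bulk(sample_a), bulk(sample_b)

# Affine mechanism: the bulk target depends only on the bulk regulator
# (and the cell count), so equal bulk inputs give equal bulk outputs.
affine_a, affine_b = bulk(affine(sample_a)), bulk(affine(sample_b))

# Quadratic mechanism: the bulk target also depends on how expression is
# distributed across cells, so equal bulk inputs can give different outputs.
quad_a, quad_b = bulk(quadratic(sample_a)), bulk(quadratic(sample_b))

print("bulk regulator:", bulk_x_a, bulk_x_b)
print("affine target: ", affine_a, affine_b)
print("quadratic target:", quad_a, quad_b)
```

Under the quadratic mechanism the two samples share a bulk regulator level yet differ in bulk target level, so no function of the bulk regulator alone can reproduce the bulk target. Under the affine mechanism the bulk relation is again affine, with the intercept scaled by the number of cells.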
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





